[jira] [Updated] (PIG-5377) Move supportsParallelWriteToStoreLocation from StoreFunc to StoreFuncInterfce
[ https://issues.apache.org/jira/browse/PIG-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin J. Price updated PIG-5377:
--------------------------------
    Attachment: PIG-5377-2.patch

> Move supportsParallelWriteToStoreLocation from StoreFunc to StoreFuncInterfce
> -----------------------------------------------------------------------------
>
>                 Key: PIG-5377
>                 URL: https://issues.apache.org/jira/browse/PIG-5377
>             Project: Pig
>          Issue Type: Improvement
>          Components: internal-udfs, piggybank
>            Reporter: Kevin J. Price
>            Assignee: Kevin J. Price
>            Priority: Minor
>         Attachments: PIG-5377-2.patch, PIG-5377.patch
>
> Now that we're running on JDK8 and can have default implementations in
> interfaces, we can move supportsParallelWriteToStoreLocation() to the
> StoreFuncInterface class and properly set it on the supported built-in
> functions rather than having a static list.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
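For context, the JDK8 feature the ticket relies on is the default interface method: an interface can now carry an implementation, so a conservative default can live on `StoreFuncInterface` itself and each safe built-in can opt in by overriding it. A minimal sketch of the pattern (these are simplified stand-ins, not Pig's actual classes):

```java
// Sketch of the default-method pattern described in PIG-5377.
// Names mirror Pig's, but the bodies are illustrative stand-ins.
interface StoreFuncInterface {
    // Conservative default: a storer does not support parallel
    // writes to one store location unless it says otherwise.
    default boolean supportsParallelWriteToStoreLocation() {
        return false;
    }
}

// A built-in that is known to be safe opts in by overriding the default.
class ParallelSafeStorage implements StoreFuncInterface {
    @Override
    public boolean supportsParallelWriteToStoreLocation() {
        return true;
    }
}

// Any other implementation inherits the default with no code change,
// which is what removes the need for a static allow-list.
class LegacyStorage implements StoreFuncInterface {
}
```

Before JDK8 this was impossible without breaking every existing implementor of the interface, which is why the capability had to live in the `StoreFunc` base class plus a static list.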
[jira] [Updated] (PIG-5377) Move supportsParallelWriteToStoreLocation from StoreFunc to StoreFuncInterfce
[ https://issues.apache.org/jira/browse/PIG-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin J. Price updated PIG-5377:
--------------------------------
    Attachment: PIG-5377.patch
        Status: Patch Available  (was: Open)

> Move supportsParallelWriteToStoreLocation from StoreFunc to StoreFuncInterfce
> -----------------------------------------------------------------------------
>
>                 Key: PIG-5377
>                 URL: https://issues.apache.org/jira/browse/PIG-5377
>             Project: Pig
>          Issue Type: Improvement
>          Components: internal-udfs, piggybank
>            Reporter: Kevin J. Price
>            Assignee: Kevin J. Price
>            Priority: Minor
>         Attachments: PIG-5377.patch
>
> Now that we're running on JDK8 and can have default implementations in
> interfaces, we can move supportsParallelWriteToStoreLocation() to the
> StoreFuncInterface class and properly set it on the supported built-in
> functions rather than having a static list.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (PIG-5377) Move supportsParallelWriteToStoreLocation from StoreFunc to StoreFuncInterfce
Kevin J. Price created PIG-5377:
-----------------------------------

             Summary: Move supportsParallelWriteToStoreLocation from StoreFunc to StoreFuncInterfce
                 Key: PIG-5377
                 URL: https://issues.apache.org/jira/browse/PIG-5377
             Project: Pig
          Issue Type: Improvement
          Components: internal-udfs, piggybank
            Reporter: Kevin J. Price
            Assignee: Kevin J. Price

Now that we're running on JDK8 and can have default implementations in interfaces, we can move supportsParallelWriteToStoreLocation() to the StoreFuncInterface class and properly set it on the supported built-in functions rather than having a static list.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (PIG-4608) FOREACH ... UPDATE
[ https://issues.apache.org/jira/browse/PIG-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16327736#comment-16327736 ]

Kevin J. Price commented on PIG-4608:
--------------------------------------

This is Yahoo-centric, but would it be possible to grep our logs for existing pig jobs and see how many of them have keyword conflicts with 'update', 'delete', 'drop', etc.? I'm indifferent on 'delete' versus 'drop', but it would be interesting to know which one would impact fewer existing scripts.

As for 'update val AS col' versus 'update col BY val', I think the former looks less confusing to a current pig user. 'BY' is currently used only for key ordering in groups and joins, whereas 'AS' is already used for value assignment. I agree that there's a difference between 'GENERATE val AS col' and 'UPDATE val AS col', but it's a fairly philosophical difference from the user's perspective. In both cases, the user wants col to have the value val after the statement, so having the same syntax makes sense.

> FOREACH ... UPDATE
> ------------------
>
>                 Key: PIG-4608
>                 URL: https://issues.apache.org/jira/browse/PIG-4608
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Haley Thrapp
>            Priority: Major
>
> I would like to propose a new command in Pig, FOREACH...UPDATE.
> Syntactically, it would look much like FOREACH … GENERATE.
> Example:
> Input data:
> (1,2,3)
> (2,3,4)
> (3,4,5)
> -- Load the data
> three_numbers = LOAD 'input_data'
>     USING PigStorage()
>     AS (f1:int, f2:int, f3:int);
> -- Sum up the row
> updated = FOREACH three_numbers UPDATE
>     5 as f1,
>     f1+f2 as new_sum
>     ;
> Dump updated;
> (5,2,3,3)
> (5,3,4,5)
> (5,4,5,7)
> Fields to update must be specified by alias. Any fields in the UPDATE that do
> not match an existing field will be appended to the end of the tuple.
> This command is particularly desirable in scripts that deal with a large
> number of fields (in the 20-200 range). Often, we need to make
> modifications to only a few fields. The FOREACH ... UPDATE statement allows the
> developer to focus on the actual logical changes instead of having to list
> all of the fields that are also being passed through.
> My team has prototyped this with changes to FOREACH ... GENERATE. We believe
> this can be done with changes to the parser and the creation of a new
> LOUpdate. No physical plan changes should be needed because we will leverage
> what LOGenerate does.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
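The proposed update-or-append semantics from the ticket can be modeled in a few lines of plain Java (a sketch of the behavior only; the field names come from the example above, and the map-based tuple is an illustrative stand-in, not Pig's implementation):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Model of the proposed UPDATE semantics: a named field that already exists
// is overwritten in place; a name that does not match any existing field is
// appended to the end of the tuple.
class UpdateModel {
    static List<Integer> apply(Map<String, Integer> row, Map<String, Integer> updates) {
        Map<String, Integer> out = new LinkedHashMap<>(row);
        // LinkedHashMap keeps the original position when an existing key is
        // re-put, so updated fields stay in place and new fields land at the end.
        out.putAll(updates);
        return new ArrayList<>(out.values());
    }
}
```

For the first input row (1,2,3), applying `5 as f1` and `f1+f2 as new_sum` (with the expression evaluated against the original values, 1+2) yields (5,2,3,3), matching the ticket's expected output.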
[jira] [Commented] (PIG-3000) Optimize nested foreach
[ https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204921#comment-15204921 ]

Kevin J. Price commented on PIG-3000:
--------------------------------------

Did this patch just get dropped? This is still a serious problem.

> Optimize nested foreach
> -----------------------
>
>                 Key: PIG-3000
>                 URL: https://issues.apache.org/jira/browse/PIG-3000
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.10.0
>            Reporter: Richard Ding
>            Assignee: Mona Chitnis
>         Attachments: PIG-3000-6.patch, unit_tests.patch
>
> In this Pig script:
> {code}
> A = load 'data' as (a:chararray);
> B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 1 : 0); }
> {code}
> The Eval function UPPER is called twice for each record.
> This should be optimized so that UPPER is called only once for each record.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
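The duplication the ticket describes can be reproduced in miniature: the nested alias `c = UPPER(a)` is effectively inlined into each use, so the eval func runs once per reference rather than once per record. A self-contained Java sketch (with `upper` standing in for any expensive eval func) and its optimized counterpart:

```java
// Miniature reproduction of PIG-3000: the shared subexpression is
// evaluated once per reference instead of once per record.
class DuplicateEvalDemo {
    static int calls = 0;

    // Stand-in for an expensive eval func like UPPER; counts invocations.
    static String upper(String s) {
        calls++;
        return s.toUpperCase();
    }

    // What the unoptimized plan effectively executes: two calls per record.
    static int[] inlined(String a) {
        return new int[] {
            upper(a).equals("TEST") ? 1 : 0,
            upper(a).equals("DEV") ? 1 : 0
        };
    }

    // The optimization the ticket asks for: evaluate the shared
    // subexpression once and reuse the result in both projections.
    static int[] shared(String a) {
        String c = upper(a);
        return new int[] { c.equals("TEST") ? 1 : 0, c.equals("DEV") ? 1 : 0 };
    }
}
```

Both versions produce the same output tuple; only the invocation count differs, which is why this matters for user UDFs with real per-call cost.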
[jira] [Commented] (PIG-4608) FOREACH ... UPDATE
[ https://issues.apache.org/jira/browse/PIG-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596462#comment-14596462 ]

Kevin J. Price commented on PIG-4608:
--------------------------------------

Several of us actually discussed this at some length and didn't think it was worth differentiating between modified columns and appended columns in the command. Two ideas we had:

# A token, like you have, indicating that the remaining fields are being added. We were considering using an 'ADD' keyword. As in:
{code}
updated = FOREACH three_numbers UPDATE 3 AS f3, 6 AS f6 ADD f1+f2 AS new_sum;
{code}
# Separate statements for 'strict' versus 'non-strict' mode. E.g., for updating without appending you would use
{code}
updated = FOREACH three_numbers UPDATE_STRICT 3 AS f3, 6 AS f6;
{code}
and for updating with appending, you could use
{code}
updated = FOREACH three_numbers UPDATE 3 AS f3, 6 AS f6, f1+f2 AS new_sum;
{code}

However, our overall view from writing pig scripts is that very few people would ever want to use the strict mode, nor did we see much value in having the extra token (ADD or ...) separating out appended columns. From a programming viewpoint, it just makes more logical sense to us to view it as an implicit "update or add" construct.

> FOREACH ... UPDATE
> ------------------
>
>                 Key: PIG-4608
>                 URL: https://issues.apache.org/jira/browse/PIG-4608
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Haley Thrapp
>
> I would like to propose a new command in Pig, FOREACH...UPDATE.
> Syntactically, it would look much like FOREACH … GENERATE.
> Example:
> Input data:
> (1,2,3)
> (2,3,4)
> (3,4,5)
> -- Load the data
> three_numbers = LOAD 'input_data'
>     USING PigStorage()
>     AS (f1:int, f2:int, f3:int);
> -- Sum up the row
> updated = FOREACH three_numbers UPDATE
>     5 as f1,
>     f1+f2 as new_sum
>     ;
> Dump updated;
> (5,2,3,3)
> (5,3,4,5)
> (5,4,5,7)
> Fields to update must be specified by alias. Any fields in the UPDATE that do
> not match an existing field will be appended to the end of the tuple.
> This command is particularly desirable in scripts that deal with a large
> number of fields (in the 20-200 range). Often, we need to make
> modifications to only a few fields. The FOREACH ... UPDATE statement allows the
> developer to focus on the actual logical changes instead of having to list
> all of the fields that are also being passed through.
> My team has prototyped this with changes to FOREACH ... GENERATE. We believe
> this can be done with changes to the parser and the creation of a new
> LOUpdate. No physical plan changes should be needed because we will leverage
> what LOGenerate does.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
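The 'strict' alternative discussed in the comment (and ultimately not favored) would reject unknown fields instead of appending them. A hypothetical sketch of that rejected variant, again using a map-based tuple as an illustrative stand-in:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the rejected UPDATE_STRICT variant: naming a field
// that is not already in the tuple is an error, never an append.
class StrictUpdateModel {
    static Map<String, Integer> apply(Map<String, Integer> row, Map<String, Integer> updates) {
        Map<String, Integer> out = new LinkedHashMap<>(row);
        for (Map.Entry<String, Integer> e : updates.entrySet()) {
            if (!out.containsKey(e.getKey())) {
                throw new IllegalArgumentException("unknown field: " + e.getKey());
            }
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }
}
```

Comparing this with the update-or-append version makes the comment's point concrete: strict mode only adds a failure path that few scripts would want.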
[jira] [Commented] (PIG-4433) Loading bigdecimal in nested tuple does not work
[ https://issues.apache.org/jira/browse/PIG-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336632#comment-14336632 ]

Kevin J. Price commented on PIG-4433:
--------------------------------------

Thanks, Daniel! Will do.

> Loading bigdecimal in nested tuple does not work
> ------------------------------------------------
>
>                 Key: PIG-4433
>                 URL: https://issues.apache.org/jira/browse/PIG-4433
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.14.0, 0.14.1, 0.15.0
>            Reporter: Kevin J. Price
>            Assignee: Kevin J. Price
>             Fix For: 0.15.0
>
>         Attachments: PIG-4433-1.patch
>
> The parsing of BigDecimal data types in a nested tuple, as implemented by
> Utf8StorageConverter.java, does not work. There's a "break;" missing from a
> switch statement.
> Code example that demonstrates the problem:
> === input.txt ===
> (17,1234567890.0987654321)
> === pig_script ===
> inp = LOAD 'input.txt' AS (foo:tuple(bar:long, baz:bigdecimal));
> STORE inp INTO 'output';
> === output ===
> (17,)
> With the patch, the output becomes the expected:
> (17,1234567890.0987654321)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (PIG-4433) Loading bigdecimal in nested tuple does not work
[ https://issues.apache.org/jira/browse/PIG-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335375#comment-14335375 ]

Kevin J. Price commented on PIG-4433:
--------------------------------------

Pull request created on github: https://github.com/apache/pig/pull/16

> Loading bigdecimal in nested tuple does not work
> ------------------------------------------------
>
>                 Key: PIG-4433
>                 URL: https://issues.apache.org/jira/browse/PIG-4433
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.14.0, 0.14.1, 0.15.0
>            Reporter: Kevin J. Price
>             Fix For: 0.14.1, 0.15.0
>
> The parsing of BigDecimal data types in a nested tuple, as implemented by
> Utf8StorageConverter.java, does not work. There's a "break;" missing from a
> switch statement.
> Code example that demonstrates the problem:
> === input.txt ===
> (17,1234567890.0987654321)
> === pig_script ===
> inp = LOAD 'input.txt' AS (foo:tuple(bar:long, baz:bigdecimal));
> STORE inp INTO 'output';
> === output ===
> (17,)
> With the patch, the output becomes the expected:
> (17,1234567890.0987654321)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (PIG-4433) Loading bigdecimal in nested tuple does not work
[ https://issues.apache.org/jira/browse/PIG-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin J. Price updated PIG-4433:
--------------------------------
        Status: Patch Available  (was: Open)

diff --git a/src/org/apache/pig/builtin/Utf8StorageConverter.java b/src/org/apache/pig/builtin/Utf8StorageConverter.java
index 814c746..1b905e2 100644
--- a/src/org/apache/pig/builtin/Utf8StorageConverter.java
+++ b/src/org/apache/pig/builtin/Utf8StorageConverter.java
@@ -315,6 +315,7 @@ public class Utf8StorageConverter implements LoadStoreCaster {
             break;
         case DataType.BIGDECIMAL:
             field = bytesToBigDecimal(b);
+            break;
         case DataType.DATETIME:
             field = bytesToDateTime(b);
             break;
diff --git a/test/org/apache/pig/builtin/TestUtf8StorageConverter.java b/test/org/apache/pig/builtin/TestUtf8StorageConverter.java
new file mode 100644
index 000..8cc9e55
--- /dev/null
+++ b/test/org/apache/pig/builtin/TestUtf8StorageConverter.java
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.pig.builtin;
+
+import static org.junit.Assert.assertEquals;
+
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.pig.ResourceSchema;
+import org.apache.pig.ResourceSchema.ResourceFieldSchema;
+import org.apache.pig.data.DataByteArray;
+import org.apache.pig.data.Tuple;
+import org.apache.pig.impl.logicalLayer.schema.Schema.FieldSchema;
+import org.apache.pig.impl.util.Utils;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+import org.junit.Test;
+
+public class TestUtf8StorageConverter {
+
+    @Test
+    /* Test that the simple data types convert properly in a tuple context */
+    public void testSimpleTypes() throws Exception {
+        Utf8StorageConverter converter = new Utf8StorageConverter();
+        String schemaString = "a:int, b:long, c:float, d:double, e:chararray, f:bytearray, g:boolean, h:biginteger, i:bigdecimal, j:datetime";
+        String dataString = "(1,2,3.0,4.0,five,6,true,12345678901234567890,1234567890.0987654321,2007-04-05T14:30Z)";
+
+        ResourceSchema.ResourceFieldSchema rfs = new ResourceFieldSchema(new FieldSchema("schema", Utils.getSchemaFromString(schemaString)));
+        Tuple result = converter.bytesToTuple(dataString.getBytes(), rfs);
+        assertEquals(10, result.size());
+        assertEquals(new Integer(1), result.get(0));
+        assertEquals(new Long(2L), result.get(1));
+        assertEquals(new Float(3.0f), result.get(2));
+        assertEquals(new Double(4.0), result.get(3));
+        assertEquals("five", result.get(4));
+        assertEquals(new DataByteArray(new byte[] { (byte) '6' }), result.get(5));
+        assertEquals(new Boolean(true), result.get(6));
+        assertEquals(new BigInteger("12345678901234567890"), result.get(7));
+        assertEquals(new BigDecimal("1234567890.0987654321"), result.get(8));
+        assertEquals(new DateTime("2007-04-05T14:30Z", DateTimeZone.UTC), result.get(9));
+    }
+}

> Loading bigdecimal in nested tuple does not work
> ------------------------------------------------
>
>                 Key: PIG-4433
>                 URL: https://issues.apache.org/jira/browse/PIG-4433
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.14.0, 0.14.1, 0.15.0
>            Reporter: Kevin J. Price
>             Fix For: 0.14.1, 0.15.0
>
> The parsing of BigDecimal data types in a nested tuple, as implemented by
> Utf8StorageConverter.java, does not work. There's a "break;" missing from a
> switch statement.
> Code example that demonstrates the problem:
> === input.txt ===
> (17,1234567890.0987654321)
> === pig_script ===
> inp = LOAD 'input.txt' AS (foo:tuple(bar:long, baz:bigdecimal));
> STORE inp INTO 'output';
> === output ===
> (17,)
> With the patch, the output becomes the expected:
> (17,1234567890.0987654321)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
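The one-line fix works because of Java's switch fall-through: without the `break`, the `BIGDECIMAL` case runs straight into the `DATETIME` case, and the freshly parsed value is clobbered when the datetime conversion fails on a decimal string. A self-contained demo of that mechanism (the constants and the null-on-failure behavior are simplified stand-ins, not Pig's actual `DataType` values or converter):

```java
import java.math.BigDecimal;

// Stand-alone demo of the fall-through bug fixed by the patch above.
class FallThroughDemo {
    static final int BIGDECIMAL = 0;  // stand-in, not Pig's DataType value
    static final int DATETIME = 1;

    static Object convert(int type, String raw, boolean withBreak) {
        Object field = null;
        switch (type) {
            case BIGDECIMAL:
                field = new BigDecimal(raw);
                if (withBreak) {
                    break;            // the patch adds exactly this break
                }
                // without it, control falls through to the next case...
            case DATETIME:
                field = null;         // ...which models the datetime parse
                break;                // failing on a decimal string
        }
        return field;
    }
}
```

With the break in place the parsed BigDecimal survives; without it the field comes back empty, which is the `(17,)` output reported in the ticket.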
[jira] [Created] (PIG-4433) Loading bigdecimal in nested tuple does not work
Kevin J. Price created PIG-4433:
-----------------------------------

             Summary: Loading bigdecimal in nested tuple does not work
                 Key: PIG-4433
                 URL: https://issues.apache.org/jira/browse/PIG-4433
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.14.0, 0.14.1, 0.15.0
            Reporter: Kevin J. Price
             Fix For: 0.14.1, 0.15.0

The parsing of BigDecimal data types in a nested tuple, as implemented by Utf8StorageConverter.java, does not work. There's a "break;" missing from a switch statement.

Code example that demonstrates the problem:
=== input.txt ===
(17,1234567890.0987654321)
=== pig_script ===
inp = LOAD 'input.txt' AS (foo:tuple(bar:long, baz:bigdecimal));
STORE inp INTO 'output';
=== output ===
(17,)

With the patch, the output becomes the expected:
(17,1234567890.0987654321)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (PIG-2046) Properties defined through 'SET' are not passed through to fs commands
[ https://issues.apache.org/jira/browse/PIG-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030769#comment-13030769 ]

Kevin J. Price commented on PIG-2046:
--------------------------------------

Odd. It definitely works correctly if you set up a "pig-cluster-hadoop-site.xml" file in a conf directory and include it on the class path using -cp. That's the workaround I'm using right now.

> Properties defined through 'SET' are not passed through to fs commands
> ----------------------------------------------------------------------
>
>                 Key: PIG-2046
>                 URL: https://issues.apache.org/jira/browse/PIG-2046
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0, 0.9.0
>            Reporter: Vivek Padmanabhan
>
> The properties which are set through 'SET' commands are not passed through to
> FS commands.
> Ex:
> SET dfs.umaskmode '026'
> fs -touchz umasktest/file0
> It looks like the SET commands are processed by GruntParser after the FsShell
> creation happens with the current set of properties. Hence whatever properties are
> defined in SET will not be reflected for fs commands executed in the script.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
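The ordering problem described in the ticket is easy to model: if the shell takes a copy of the properties when it is constructed, a SET executed afterwards never reaches it. A simplified sketch (the class below is an illustrative stand-in, not Pig's GruntParser or Hadoop's FsShell):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the PIG-2046 ordering bug: the shell snapshots
// the properties at construction time, so SET commands processed later
// by the parser are invisible to it.
class ShellModel {
    private final Map<String, String> conf;

    ShellModel(Map<String, String> props) {
        this.conf = new HashMap<>(props);   // snapshot taken here
    }

    String get(String key) {
        return conf.get(key);
    }
}
```

This also explains the workaround in the comment: a pig-cluster-hadoop-site.xml on the classpath is loaded before the shell is constructed, so its properties make it into the snapshot, while a SET in the script runs too late.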