[jira] [Updated] (PIG-4173) Move to Spark 1.x
[ https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-4173: -- Attachment: PIG-4174_4.patch This patch fixed the unit tests. The version of Spark used is 1.0.2. In Spark 1.1.0, the CoGroupRDD is changed and breaks the cogroup runtime. I'm looking into this. Move to Spark 1.x - Key: PIG-4173 URL: https://issues.apache.org/jira/browse/PIG-4173 Project: Pig Issue Type: Sub-task Components: spark Reporter: bc Wong Assignee: Richard Ding Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, PIG-4174_4.patch, TEST-org.apache.pig.spark.TestSpark.txt The Spark branch is using Spark 0.9: https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably switch to Spark 1.x asap, due to Spark interface changes since 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4173) Move to Spark 1.x
[ https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-4173: -- Attachment: PIG-4174_5.patch This patch fixed the cogroup issue for Spark 1.1.0. Spark version is updated to 1.1.0. Move to Spark 1.x - Key: PIG-4173 URL: https://issues.apache.org/jira/browse/PIG-4173 Project: Pig Issue Type: Sub-task Components: spark Reporter: bc Wong Assignee: Richard Ding Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, PIG-4174_4.patch, PIG-4174_5.patch, TEST-org.apache.pig.spark.TestSpark.txt The Spark branch is using Spark 0.9: https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably switch to Spark 1.x asap, due to Spark interface changes since 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4173) Move to Spark 1.x
[ https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-4173: -- Attachment: PIG-4173_3.patch Thanks for the review. The new patch incorporates the changes requested in the comments. Move to Spark 1.x - Key: PIG-4173 URL: https://issues.apache.org/jira/browse/PIG-4173 Project: Pig Issue Type: Sub-task Components: spark Reporter: bc Wong Assignee: Richard Ding Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, TEST-org.apache.pig.spark.TestSpark.txt The Spark branch is using Spark 0.9: https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably switch to Spark 1.x asap, due to Spark interface changes since 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4173) Move to Spark 1.x
[ https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153701#comment-14153701 ] Richard Ding commented on PIG-4173: --- Hi [~praveenr019], Since PIG-4186 hasn't been checked in, it seems to make more sense to first build with Spark 1.x and then fix PIG-4186. What do you think? Thanks, -Richard Move to Spark 1.x - Key: PIG-4173 URL: https://issues.apache.org/jira/browse/PIG-4173 Project: Pig Issue Type: Sub-task Components: spark Reporter: bc Wong Assignee: Richard Ding Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, TEST-org.apache.pig.spark.TestSpark.txt The Spark branch is using Spark 0.9: https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably switch to Spark 1.x asap, due to Spark interface changes since 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4173) Move to Spark 1.x
[ https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153705#comment-14153705 ] Richard Ding commented on PIG-4173: --- Sorry, I meant PIG-4168. Move to Spark 1.x - Key: PIG-4173 URL: https://issues.apache.org/jira/browse/PIG-4173 Project: Pig Issue Type: Sub-task Components: spark Reporter: bc Wong Assignee: Richard Ding Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, TEST-org.apache.pig.spark.TestSpark.txt The Spark branch is using Spark 0.9: https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably switch to Spark 1.x asap, due to Spark interface changes since 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (PIG-4173) Move to Spark 1.x
[ https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding reassigned PIG-4173: - Assignee: Richard Ding Move to Spark 1.x - Key: PIG-4173 URL: https://issues.apache.org/jira/browse/PIG-4173 Project: Pig Issue Type: Sub-task Components: spark Reporter: bc Wong Assignee: Richard Ding The Spark branch is using Spark 0.9: https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably switch to Spark 1.x asap, due to Spark interface changes since 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4173) Move to Spark 1.x
[ https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-4173: -- Attachment: PIG-4173.patch Attaching the initial patch to upgrade Spark to 1.1.0. I made some local changes so that the patch now compiles with the latest Spark jar. I have a question though: why don't we use JavaRDD throughout the code? Is this due to performance concerns? Move to Spark 1.x - Key: PIG-4173 URL: https://issues.apache.org/jira/browse/PIG-4173 Project: Pig Issue Type: Sub-task Components: spark Reporter: bc Wong Assignee: Richard Ding Attachments: PIG-4173.patch The Spark branch is using Spark 0.9: https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably switch to Spark 1.x asap, due to Spark interface changes since 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4173) Move to Spark 1.x
[ https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-4173: -- Attachment: PIG-4173_2.patch Adding javax.servlet dependency Move to Spark 1.x - Key: PIG-4173 URL: https://issues.apache.org/jira/browse/PIG-4173 Project: Pig Issue Type: Sub-task Components: spark Reporter: bc Wong Assignee: Richard Ding Attachments: PIG-4173.patch, PIG-4173_2.patch The Spark branch is using Spark 0.9: https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably switch to Spark 1.x asap, due to Spark interface changes since 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key
[ https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858404#comment-13858404 ] Richard Ding commented on PIG-3608: --- Thanks [~cheolsoo]. ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key --- Key: PIG-3608 URL: https://issues.apache.org/jira/browse/PIG-3608 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.12.0 Reporter: Richard Ding Assignee: Richard Ding Priority: Minor Fix For: 0.13.0 Attachments: PIG-3608.patch, PIG-3608_2.patch One got the following exception: {code} java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with java.lang.String at org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80) {code} This is related to the change by PIG-3420. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key
[ https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-3608: -- Resolution: Fixed Fix Version/s: 0.13.0 Release Note: Committed to trunk. Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key --- Key: PIG-3608 URL: https://issues.apache.org/jira/browse/PIG-3608 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.12.0 Reporter: Richard Ding Assignee: Richard Ding Priority: Minor Fix For: 0.13.0 Attachments: PIG-3608.patch, PIG-3608_2.patch One got the following exception: {code} java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with java.lang.String at org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80) {code} This is related to the change by PIG-3420. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper
[ https://issues.apache.org/jira/browse/PIG-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856041#comment-13856041 ] Richard Ding commented on PIG-3609: --- [~cheolsoo], checking the size is an optimization; this is also what DefaultAbstractBag implements. +1 on the patch. ClassCastException when calling compareTo method on AvroBagWrapper --- Key: PIG-3609 URL: https://issues.apache.org/jira/browse/PIG-3609 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.12.0 Reporter: Richard Ding Assignee: Richard Ding Priority: Minor Attachments: PIG-3609.patch, PIG-3609_2.patch, PIG-3609_3.patch One got the following exception when calling compareTo method on AvroBagWrapper with an AvroBagWrapper object: {code} java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper incompatible with java.util.Collection at org.apache.avro.generic.GenericData.compare(GenericData.java:786) at org.apache.avro.generic.GenericData.compare(GenericData.java:760) at org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78) {code} Looking at the code, it compares objects with different types: {code} return GenericData.get().compare(theArray, o, theArray.getSchema()); {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
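The failure in this issue comes from handing a wrapper object to a comparison that expects a collection. A minimal sketch of the idea follows, assuming the fix unwraps the other bag before comparing and checks sizes first, as the comment mentions. `BagWrapperSketch` is a hypothetical stand-in that holds a plain `List<Integer>`; it is not the real AvroBagWrapper and does not reproduce the committed patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a bag wrapper; the real class wraps an Avro array.
final class BagWrapperSketch implements Comparable<Object> {
    private final List<Integer> theArray;

    BagWrapperSketch(List<Integer> theArray) {
        this.theArray = theArray;
    }

    // Unwrap the argument so that a list is always compared with a list.
    private static List<?> unwrap(Object o) {
        if (o instanceof BagWrapperSketch) {
            return ((BagWrapperSketch) o).theArray;
        }
        if (o instanceof List) {
            return (List<?>) o;
        }
        throw new ClassCastException("cannot compare a bag with " + o.getClass().getName());
    }

    @Override
    public int compareTo(Object o) {
        if (this == o) {
            return 0;
        }
        List<?> other = unwrap(o);
        // Cheap size check first; element comparison only runs for equal sizes.
        int bySize = Integer.compare(theArray.size(), other.size());
        if (bySize != 0) {
            return bySize;
        }
        for (int i = 0; i < theArray.size(); i++) {
            int c = Integer.compare(theArray.get(i), (Integer) other.get(i));
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        BagWrapperSketch a = new BagWrapperSketch(Arrays.asList(1, 2, 3));
        BagWrapperSketch b = new BagWrapperSketch(Arrays.asList(1, 2, 4));
        System.out.println(a.compareTo(b));
    }
}
```

Comparing by size first means a shorter bag always sorts before a longer one, so the ordering is not purely element-wise.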
[jira] [Commented] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key
[ https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856047#comment-13856047 ] Richard Ding commented on PIG-3608: --- Thanks for reviewing the patch. Right now I don't have a Pig script to demonstrate this use case. I hit this problem while iterating over an instance of AvroMapWrapper and found that I can't look up a value from the map using a key just retrieved from that same map. I think this breaks the basic contract of a map implementation. I think the check {code} if (isUtf8key && !(key instanceof Utf8)) {code} is more general. But I'm ok if it is restricted to String. ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key --- Key: PIG-3608 URL: https://issues.apache.org/jira/browse/PIG-3608 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.12.0 Reporter: Richard Ding Assignee: Richard Ding Priority: Minor Attachments: PIG-3608.patch, PIG-3608_2.patch One got the following exception: {code} java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with java.lang.String at org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80) {code} This is related to the change by PIG-3420. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
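The map contract raised in this comment (a key obtained from the map must be usable for a lookup) can be sketched without Avro on the classpath. In the sketch below, `Utf8Like` is a hypothetical stand-in for org.apache.avro.util.Utf8 and `KeyNormalizingMap` is a hypothetical wrapper, not the real AvroMapWrapper. The conversion rule shown (convert only when the map uses Utf8 keys and the caller passed something else) is an assumption based on the check discussed here, not the committed patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.avro.util.Utf8: a byte-backed string key.
final class Utf8Like {
    private final byte[] bytes;

    Utf8Like(String s) {
        this.bytes = s.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Utf8Like && Arrays.equals(bytes, ((Utf8Like) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    @Override
    public String toString() {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

// Hypothetical wrapper illustrating the lookup rule; not the real AvroMapWrapper.
final class KeyNormalizingMap {
    private final Map<Object, Object> innerMap;
    private final boolean isUtf8Key;

    KeyNormalizingMap(Map<Object, Object> innerMap, boolean isUtf8Key) {
        this.innerMap = innerMap;
        this.isUtf8Key = isUtf8Key;
    }

    Object get(Object key) {
        if (isUtf8Key && key != null && !(key instanceof Utf8Like)) {
            // The caller passed e.g. a String: convert it instead of casting.
            return innerMap.get(new Utf8Like(key.toString()));
        }
        // The key already has the map's own key type (or no conversion applies).
        return innerMap.get(key);
    }

    public static void main(String[] args) {
        Map<Object, Object> inner = new HashMap<>();
        inner.put(new Utf8Like("name"), "pig");
        KeyNormalizingMap map = new KeyNormalizingMap(inner, true);
        System.out.println(map.get("name"));
        System.out.println(map.get(new Utf8Like("name")));
    }
}
```

With this rule, iterating over the inner map's keys and looking each one up again succeeds, because a key that already has the map's key type is passed through unchanged.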
[jira] [Updated] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key
[ https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-3608: -- Attachment: PIG-3608_2.patch You are right. Update the patch with a test case. ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key --- Key: PIG-3608 URL: https://issues.apache.org/jira/browse/PIG-3608 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.12.0 Reporter: Richard Ding Assignee: Richard Ding Priority: Minor Attachments: PIG-3608.patch, PIG-3608_2.patch One got the following exception: {code} java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with java.lang.String at org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80) {code} This is related to the change by PIG-3420. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper
[ https://issues.apache.org/jira/browse/PIG-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-3609: -- Attachment: PIG-3609_2.patch New patch with a test case. ClassCastException when calling compareTo method on AvroBagWrapper --- Key: PIG-3609 URL: https://issues.apache.org/jira/browse/PIG-3609 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.12.0 Reporter: Richard Ding Assignee: Richard Ding Priority: Minor Attachments: PIG-3609.patch, PIG-3609_2.patch One got the following exception when calling compareTo method on AvroBagWrapper with an AvroBagWrapper object: {code} java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper incompatible with java.util.Collection at org.apache.avro.generic.GenericData.compare(GenericData.java:786) at org.apache.avro.generic.GenericData.compare(GenericData.java:760) at org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78) {code} Looking at the code, it compares objects with different types: {code} return GenericData.get().compare(theArray, o, theArray.getSchema()); {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key
[ https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding reassigned PIG-3608: - Assignee: Richard Ding ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key --- Key: PIG-3608 URL: https://issues.apache.org/jira/browse/PIG-3608 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.12.0 Reporter: Richard Ding Assignee: Richard Ding Priority: Minor Attachments: PIG-3608.patch One got the following exception: {code} java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with java.lang.String at org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80) {code} This is related to the change by PIG-3420. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key
[ https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-3608: -- Attachment: PIG-3608.patch Attach a simple patch. ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key --- Key: PIG-3608 URL: https://issues.apache.org/jira/browse/PIG-3608 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.12.0 Reporter: Richard Ding Priority: Minor Attachments: PIG-3608.patch One got the following exception: {code} java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with java.lang.String at org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80) {code} This is related to the change by PIG-3420. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key
[ https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-3608: -- Status: Patch Available (was: Open) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key --- Key: PIG-3608 URL: https://issues.apache.org/jira/browse/PIG-3608 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.12.0 Reporter: Richard Ding Assignee: Richard Ding Priority: Minor Attachments: PIG-3608.patch One got the following exception: {code} java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with java.lang.String at org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80) {code} This is related to the change by PIG-3420. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper
[ https://issues.apache.org/jira/browse/PIG-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-3609: -- Attachment: PIG-3609.patch Attaching a patch. ClassCastException when calling compareTo method on AvroBagWrapper --- Key: PIG-3609 URL: https://issues.apache.org/jira/browse/PIG-3609 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.12.0 Reporter: Richard Ding Priority: Minor Attachments: PIG-3609.patch One got the following exception when calling compareTo method on AvroBagWrapper with an AvroBagWrapper object: {code} java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper incompatible with java.util.Collection at org.apache.avro.generic.GenericData.compare(GenericData.java:786) at org.apache.avro.generic.GenericData.compare(GenericData.java:760) at org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78) {code} Looking at the code, it compares objects with different types: {code} return GenericData.get().compare(theArray, o, theArray.getSchema()); {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper
[ https://issues.apache.org/jira/browse/PIG-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-3609: -- Status: Patch Available (was: Open) ClassCastException when calling compareTo method on AvroBagWrapper --- Key: PIG-3609 URL: https://issues.apache.org/jira/browse/PIG-3609 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.12.0 Reporter: Richard Ding Assignee: Richard Ding Priority: Minor Attachments: PIG-3609.patch One got the following exception when calling compareTo method on AvroBagWrapper with an AvroBagWrapper object: {code} java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper incompatible with java.util.Collection at org.apache.avro.generic.GenericData.compare(GenericData.java:786) at org.apache.avro.generic.GenericData.compare(GenericData.java:760) at org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78) {code} Looking at the code, it compares objects with different types: {code} return GenericData.get().compare(theArray, o, theArray.getSchema()); {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper
[ https://issues.apache.org/jira/browse/PIG-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding reassigned PIG-3609: - Assignee: Richard Ding ClassCastException when calling compareTo method on AvroBagWrapper --- Key: PIG-3609 URL: https://issues.apache.org/jira/browse/PIG-3609 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.12.0 Reporter: Richard Ding Assignee: Richard Ding Priority: Minor Attachments: PIG-3609.patch One got the following exception when calling compareTo method on AvroBagWrapper with an AvroBagWrapper object: {code} java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper incompatible with java.util.Collection at org.apache.avro.generic.GenericData.compare(GenericData.java:786) at org.apache.avro.generic.GenericData.compare(GenericData.java:760) at org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78) {code} Looking at the code, it compares objects with different types: {code} return GenericData.get().compare(theArray, o, theArray.getSchema()); {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key
[ https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840791#comment-13840791 ] Richard Ding commented on PIG-3608: --- Actually I have a question: should it be {code} if (isUtf8key) { v = innerMap.get(key); } else { v = innerMap.get(new Utf8((String) key)); } {code} since isUtf8key == true means the key is already Utf8? ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key --- Key: PIG-3608 URL: https://issues.apache.org/jira/browse/PIG-3608 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.12.0 Reporter: Richard Ding Assignee: Richard Ding Priority: Minor Attachments: PIG-3608.patch One got the following exception: {code} java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with java.lang.String at org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80) {code} This is related to the change by PIG-3420. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper
Richard Ding created PIG-3609: - Summary: ClassCastException when calling compareTo method on AvroBagWrapper Key: PIG-3609 URL: https://issues.apache.org/jira/browse/PIG-3609 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.12.0 Reporter: Richard Ding Priority: Minor One got the following exception when calling compareTo method on AvroBagWrapper with an AvroBagWrapper object: {code} java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper incompatible with java.util.Collection at org.apache.avro.generic.GenericData.compare(GenericData.java:786) at org.apache.avro.generic.GenericData.compare(GenericData.java:760) at org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78) {code} Looking at the code, it compares objects with different types: {code} return GenericData.get().compare(theArray, o, theArray.getSchema()); {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3251) Bzip2TextInputFormat requires double the memory of maximum record size
[ https://issues.apache.org/jira/browse/PIG-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607820#comment-13607820 ] Richard Ding commented on PIG-3251: --- With HADOOP-7823, can we remove Bzip2TextInputFormat and just use PigTextInputFormat? Bzip2TextInputFormat requires double the memory of maximum record size -- Key: PIG-3251 URL: https://issues.apache.org/jira/browse/PIG-3251 Project: Pig Issue Type: Improvement Reporter: Koji Noguchi Assignee: Koji Noguchi Priority: Minor Attachments: pig-3251-trunk-v01.patch, pig-3251-trunk-v02.patch While looking at a user's OOM heap dump, I noticed that Pig's Bzip2TextInputFormat consumes memory both in Bzip2TextInputFormat.buffer (a ByteArrayOutputStream) and in the actual Text that is returned as the line. For example, for one record of 160MBytes, buffer was 268MBytes and Text was 160MBytes. We can probably eliminate one of them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
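The double memory use described in this issue can be illustrated with the JDK alone. The sketch below is not Pig's code: `InspectableBuffer` is a hypothetical subclass that exposes the protected `buf` array of `ByteArrayOutputStream`, and `toByteArray()` stands in for the copy into Hadoop's Text. The buffer's backing array grows ahead of its content, and the returned line is a second, separate allocation. The reported 268MBytes buffer would be consistent with a power-of-two capacity (2^28 bytes), though the issue does not confirm that.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper that exposes the backing array's length for inspection.
final class InspectableBuffer extends ByteArrayOutputStream {
    int capacity() {
        return buf.length; // buf is a protected field of ByteArrayOutputStream
    }
}

class DoubleBufferSketch {
    public static void main(String[] args) {
        InspectableBuffer buffer = new InspectableBuffer();
        // Accumulate one "record" byte by byte, as a line reader might.
        for (int i = 0; i < 160; i++) {
            buffer.write('x');
        }
        // Copying the record out allocates a second array of the record's size,
        // while the buffer keeps its own (possibly larger) backing array alive.
        byte[] line = buffer.toByteArray();
        System.out.println("record bytes:    " + buffer.size());
        System.out.println("buffer capacity: " + buffer.capacity());
        System.out.println("copied bytes:    " + line.length);
    }
}
```

Both arrays are reachable at the same time, so peak memory for one record is the buffer's capacity plus the size of the copy.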
[jira] [Created] (PIG-3097) HiveColumnarLoader doesn't correctly load partitioned Hive table
Richard Ding created PIG-3097:
-
Summary: HiveColumnarLoader doesn't correctly load partitioned Hive table
Key: PIG-3097
URL: https://issues.apache.org/jira/browse/PIG-3097
Project: Pig
Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding

Given a partitioned Hive table:
{code}
hive> describe mytable;
OK
f1            string
f2            string
f3            string
partition_dt  string
{code}
The following Pig script gives the correct schema:
{code}
grunt> A = load '/hive/warehouse/mytable' using org.apache.pig.piggybank.storage.HiveColumnarLoader('f1 string,f2 string,f3 string');
grunt> describe A
A: {f1: chararray,f2: chararray,f3: chararray,partition_dt: chararray}
{code}
But the command
{code}
grunt> dump A
{code}
only produces the first column of all records in the table (all four columns are expected).

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3058) Upgrade junit to at least 4.8
[ https://issues.apache.org/jira/browse/PIG-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-3058: -- These two failures were introduced by PIG-2924, which actually fixed a bug in the JobStats class, but the corresponding errors in TestPigRunner didn't get fixed. Upgrade junit to at least 4.8 - Key: PIG-3058 URL: https://issues.apache.org/jira/browse/PIG-3058 Project: Pig Issue Type: Bug Components: build Affects Versions: 0.11 Reporter: fang fang chen Assignee: fang fang chen Pig needs to upgrade junit version to at least 4.8. Otherwise, one gets following warnings. [javadoc] org/apache/hadoop/hbase/mapreduce/TestWALPlayer.class(org/apache/hadoop/hbase/mapreduce:TestWALPlayer.class): warning: Cannot find annotation method 'value()' in type 'org.junit.experimental.categories.Category': class file for org.junit.experimental.categories.Category not found -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2405) svn tags/release-0.9.1: some unit test case failed with open JDK
[ https://issues.apache.org/jira/browse/PIG-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-2405:
--
Fix Version/s: 0.11

svn tags/release-0.9.1: some unit test case failed with open JDK

Key: PIG-2405
URL: https://issues.apache.org/jira/browse/PIG-2405
Project: Pig
Issue Type: Bug
Affects Versions: 0.9.1
Environment: ant-1.8.2 open jdk: 1.6
Reporter: fang fang chen
Assignee: fang fang chen
Fix For: 0.11
Attachments: PIG-2405-trunk.patch

[junit] Test org.apache.pig.test.TestDataModel FAILED
Testcase: testTupleToString took 0.004 sec
FAILED
toString expected:...ad a little lamb)},[[hello#world,goodbye#all]],42,50,3.14... but was:...ad a little lamb)},[[goodbye#all,hello#world]],42,50,3.14...
junit.framework.ComparisonFailure: toString expected:...ad a little lamb)},[[hello#world,goodbye#all]],42,50,3.14... but was:...ad a little lamb)},[[goodbye#all,hello#world]],42,50,3.14...
    at org.apache.pig.test.TestDataModel.testTupleToString(TestDataModel.java:269)

[junit] Test org.apache.pig.test.TestHBaseStorage FAILED
Tests run: 18, Failures: 0, Errors: 12, Time elapsed: 188.612 sec
Testcase: testHeterogeneousScans took 0.018 sec
Caused an ERROR
java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml (Too many open files)
java.lang.RuntimeException: java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml (Too many open files)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1162)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1035)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:980)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:436)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:271)
    at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:155)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:167)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:130)
    at org.apache.pig.test.TestHBaseStorage.prepareTable(TestHBaseStorage.java:809)
    at org.apache.pig.test.TestHBaseStorage.testHeterogeneousScans(TestHBaseStorage.java:741)
Caused by: java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml (Too many open files)
    at java.io.FileInputStream.<init>(FileInputStream.java:112)
    at java.io.FileInputStream.<init>(FileInputStream.java:72)
    at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
    at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
    at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
    at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1079)
Caused an ERROR
Could not resolve the DNS name of hostname:39611
java.lang.IllegalArgumentException: Could not resolve the DNS name of hostname:39611
    at org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105)
    at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:66)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:755)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:590)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:555)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:145)
    at org.apache.pig.test.TestHBaseStorage.deleteAllRows(TestHBaseStorage.java:120)
    at org.apache.pig.test.TestHBaseStorage.tearDown(TestHBaseStorage.java:112)

[junit] Test org.apache.pig.test.TestMRCompiler FAILED
Testcase: testSortUDF1 took 0.045 sec
FAILED
null
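In the TestDataModel failure above, the expected and actual strings differ only in the order of the two map entries. One plausible reading, which the issue text does not confirm, is that the test depends on the iteration order of a hash-based map, and that order is not guaranteed across JVMs. A minimal sketch of an order-independent check follows; `MapOrderSketch` and `stableToString` are hypothetical names, and the `key#value` rendering simply mirrors the strings in the test output.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class MapOrderSketch {
    // Render entries as key#value in sorted key order, so the string does not
    // depend on the iteration order of the underlying map implementation.
    static String stableToString(Map<String, String> map) {
        StringBuilder sb = new StringBuilder("[");
        boolean first = true;
        for (Map.Entry<String, String> e : new TreeMap<>(map).entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(e.getKey()).append('#').append(e.getValue());
            first = false;
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("hello", "world");
        map.put("goodbye", "all");
        System.out.println(stableToString(map));
    }
}
```

Comparing the maps with `equals` instead of comparing their string forms avoids the ordering question altogether.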
[jira] [Updated] (PIG-3000) Optimize nested foreach
[ https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-3000: -- Description: In this Pig script: {case} A = load 'data' as (a:chararray); B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 1 : 0); } {case} Optimize nested foreach --- Key: PIG-3000 URL: https://issues.apache.org/jira/browse/PIG-3000 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.10.0 Reporter: Richard Ding In this Pig script: {case} A = load 'data' as (a:chararray); B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 1 : 0); } {case} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3000) Optimize nested foreach
[ https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-3000: -- Description: In this Pig script: {code} A = load 'data' as (a:chararray); B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 1 : 0); } {code} The Eval function UPPER is called twice for each record. This should be optimized so that the UPPER is called only once for each record was: In this Pig script: {case} A = load 'data' as (a:chararray); B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 1 : 0); } {case} Optimize nested foreach --- Key: PIG-3000 URL: https://issues.apache.org/jira/browse/PIG-3000 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.10.0 Reporter: Richard Ding In this Pig script: {code} A = load 'data' as (a:chararray); B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 1 : 0); } {code} The Eval function UPPER is called twice for each record. This should be optimized so that the UPPER is called only once for each record -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
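The optimization requested here amounts to evaluating the shared expression once per record and reusing the result. The plain-Java sketch below illustrates the difference; it is not Pig's planner. `upper` is a hypothetical stand-in for the UPPER eval function with a call counter added, and `naive` and `shared` model the two evaluation strategies for a single record.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

class NestedForeachSketch {
    // Counts how often the stand-in for UPPER is invoked.
    static final AtomicInteger upperCalls = new AtomicInteger();

    static String upper(String s) {
        upperCalls.incrementAndGet();
        return s.toUpperCase(Locale.ROOT);
    }

    // The alias c is expanded at each use: two calls per record.
    static int[] naive(String a) {
        return new int[] {
            upper(a).equals("TEST") ? 1 : 0,
            upper(a).equals("DEV") ? 1 : 0
        };
    }

    // The alias c is computed once and reused: one call per record.
    static int[] shared(String a) {
        String c = upper(a);
        return new int[] {
            c.equals("TEST") ? 1 : 0,
            c.equals("DEV") ? 1 : 0
        };
    }

    public static void main(String[] args) {
        naive("test");
        System.out.println("calls after naive:  " + upperCalls.getAndSet(0));
        shared("test");
        System.out.println("calls after shared: " + upperCalls.getAndSet(0));
    }
}
```

Both strategies produce the same output row; only the number of function calls differs, which matters when the function is expensive.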
[jira] [Updated] (PIG-2637) Command-line option -e throws TokenMgrError exception
[ https://issues.apache.org/jira/browse/PIG-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2637: -- Status: Patch Available (was: Open) Command-line option -e throws TokenMgrError exception - Key: PIG-2637 URL: https://issues.apache.org/jira/browse/PIG-2637 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.9.2 Reporter: Richard Ding Assignee: fang fang chen Priority: Minor Attachments: PIG-2637.patch The command-line: {code} java -cp pig.jar org.apache.pig.Main -x local -e a = load '1.txt'; {code} fails with exception: {code} ERROR 1000: Error during parsing. Lexical error at line 1, column 18. Encountered: EOF after : {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2744) Handle Pig command line with XML special characters
Richard Ding created PIG-2744: - Summary: Handle Pig command line with XML special characters Key: PIG-2744 URL: https://issues.apache.org/jira/browse/PIG-2744 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.10.0 Reporter: Richard Ding Assignee: Richard Ding Pig stores the Pig command-line string in the Hadoop job XML file. This fails if the command-line string contains XML special characters. Pig should treat the command-line string the way it treats the Pig script, by encoding it first. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
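To illustrate why an unencoded command line is a problem in an XML file, here is a small Python sketch using only the standard library. It is not Pig code: the property name `pig.command.line`, the sample command line, and the choice between XML escaping and Base64 are assumptions made for the example, since the issue only says the string should be encoded first.

```python
import base64
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical command line containing XML special characters.
command_line = "-e \"a = load 'in'; b = filter a by $0 < 5 and $1 > 2;\""


def property_xml(name, value):
    """Builds one <property> element as text, without any escaping."""
    return "<property><name>%s</name><value>%s</value></property>" % (name, value)


def parses(xml_text):
    """Returns True if the text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# "pig.command.line" is an illustrative property name (an assumption).
raw = property_xml("pig.command.line", command_line)
escaped = property_xml("pig.command.line", escape(command_line))
encoded_value = base64.b64encode(command_line.encode("utf-8")).decode("ascii")
encoded = property_xml("pig.command.line", encoded_value)

raw_ok = parses(raw)          # the bare "<" makes the document malformed
escaped_ok = parses(escaped)  # "&", "<" and ">" are replaced by entities
encoded_ok = parses(encoded)  # the Base64 alphabet has no XML special characters
round_trip = base64.b64decode(encoded_value).decode("utf-8")
```

Either encoding keeps the job file well-formed; an opaque encoding such as Base64 has the side benefit that the reader of the property does not depend on XML unescaping rules.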
[jira] [Updated] (PIG-2261) Restore support for parenthesis in Pig 0.9
[ https://issues.apache.org/jira/browse/PIG-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2261: -- Attachment: PIG-2261.patch Attaching patch that restores the support for parenthesis. Restore support for parenthesis in Pig 0.9 -- Key: PIG-2261 URL: https://issues.apache.org/jira/browse/PIG-2261 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.1 Attachments: PIG-2261.patch Pig 0.8 and earlier versions used to support syntax such as {code} A =(load ) {code} This was removed as useless in 0.9 when the grammar was redone. It turns out that some user is using this for ease of code generation so we want to restore it back. Just to clarify, Pig 0.9 continues to support composite statements such as {code} B = filter (load 'data' as (a, b)) by a 0; {code} It just removed useless parenthesis and doesn't support statements like {code} A = (load 'data' as (a, b)); {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2261) Restore support for parenthesis in Pig 0.9
[ https://issues.apache.org/jira/browse/PIG-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2261: -- Status: Patch Available (was: Open) Restore support for parenthesis in Pig 0.9 -- Key: PIG-2261 URL: https://issues.apache.org/jira/browse/PIG-2261 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.1 Attachments: PIG-2261.patch Pig 0.8 and earlier versions used to support syntax such as {code} A =(load ) {code} This was removed as useless in 0.9 when the grammar was redone. It turns out that some user is using this for ease of code generation so we want to restore it back. Just to clarify, Pig 0.9 continues to support composite statements such as {code} B = filter (load 'data' as (a, b)) by a 0; {code} It just removed useless parenthesis and doesn't support statements like {code} A = (load 'data' as (a, b)); {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2261) Restor support for parenthesis in Pig 0.9
Restor support for parenthesis in Pig 0.9 - Key: PIG-2261 URL: https://issues.apache.org/jira/browse/PIG-2261 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Fix For: 0.9.1 Pig 0.8 and earlier versions used to support syntax such as {code} A =(load ) {code} This was removed as useless in 0.9 when the grammar was redone. It turns out that some user is using this for ease of code generation so we want to restore it back. Just to clarify, Pig 0.9 continues to support composite statements such as {code} B = filter (load 'data' as (a, b)) by a 0; {code} It just removed useless parenthesis and doesn't support statements like {code} A = (load 'data' as (a, b)); {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2261) Restore support for parenthesis in Pig 0.9
[ https://issues.apache.org/jira/browse/PIG-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2261: -- Summary: Restore support for parenthesis in Pig 0.9 (was: Restor support for parenthesis in Pig 0.9) Restore support for parenthesis in Pig 0.9 -- Key: PIG-2261 URL: https://issues.apache.org/jira/browse/PIG-2261 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Fix For: 0.9.1 Pig 0.8 and earlier versions used to support syntax such as {code} A =(load ) {code} This was removed as useless in 0.9 when the grammar was redone. It turns out that some user is using this for ease of code generation so we want to restore it back. Just to clarify, Pig 0.9 continues to support composite statements such as {code} B = filter (load 'data' as (a, b)) by a 0; {code} It just removed useless parenthesis and doesn't support statements like {code} A = (load 'data' as (a, b)); {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2208) Restrict number of PIG generated Haddop counters
[ https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089330#comment-13089330 ] Richard Ding commented on PIG-2208: --- It only logs once per job in the front end so that user is informed that the multi-inputs (or outputs) counters are disabled. In the back-end the counters are simply disabled without logging. Restrict number of PIG generated Haddop counters - Key: PIG-2208 URL: https://issues.apache.org/jira/browse/PIG-2208 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.1, 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.1 Attachments: PIG-2208.patch PIG 8.0 implemented Hadoop counters to track the number of records read for each input and the number of records written for each output (PIG-1389 PIG-1299). On the other hand, Hadoop has imposed limit on per job counters (MAPREDUCE-1943) and jobs will fail if the counters exceed the limit. Therefore we need a way to cap the number of PIG generated counters. Here are the two options: 1. Add a integer property (e.g., pig.counter.limit) to the pig property file (e.g., 20). If the number of inputs of a job exceeds this number, the input counters are disabled. Similarly, if the number of outputs of a job exceeds this number, the output counters are disabled. 2. Add a boolean property (e.g., pig.disable.counters) to the pig property file (default: false). If this property is set to true, then the PIG generated counters are disabled. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2208) Restrict number of PIG generated Haddop counters
[ https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2208: -- Attachment: PIG-2208.patch This patch implements option 2. Augmenting Pig grammar will be more involved and could be done later. Restrict number of PIG generated Haddop counters - Key: PIG-2208 URL: https://issues.apache.org/jira/browse/PIG-2208 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.1, 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.1 Attachments: PIG-2208.patch PIG 8.0 implemented Hadoop counters to track the number of records read for each input and the number of records written for each output (PIG-1389 PIG-1299). On the other hand, Hadoop has imposed limit on per job counters (MAPREDUCE-1943) and jobs will fail if the counters exceed the limit. Therefore we need a way to cap the number of PIG generated counters. Here are the two options: 1. Add a integer property (e.g., pig.counter.limit) to the pig property file (e.g., 20). If the number of inputs of a job exceeds this number, the input counters are disabled. Similarly, if the number of outputs of a job exceeds this number, the output counters are disabled. 2. Add a boolean property (e.g., pig.disable.counters) to the pig property file (default: false). If this property is set to true, then the PIG generated counters are disabled. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2208) Restrict number of PIG generated Haddop counters
Restrict number of PIG generated Haddop counters - Key: PIG-2208 URL: https://issues.apache.org/jira/browse/PIG-2208 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0, 0.8.1 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.1 PIG 8.0 implemented Hadoop counters to track the number of records read for each input and the number of records written for each output (PIG-1389 PIG-1299). On the other hand, Hadoop has imposed a limit on per-job counters (MAPREDUCE-1943), and jobs will fail if the counters exceed the limit. Therefore we need a way to cap the number of PIG-generated counters. Here are the two options:
1. Add an integer property (e.g., pig.counter.limit) to the pig property file, with a value such as 20. If the number of inputs of a job exceeds this number, the input counters are disabled. Similarly, if the number of outputs of a job exceeds this number, the output counters are disabled.
2. Add a boolean property (e.g., pig.disable.counters) to the pig property file (default: false). If this property is set to true, then the PIG-generated counters are disabled.
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
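The two options in the PIG-2208 description can be sketched as a single decision function. This is an illustrative Python sketch, not the committed patch: the property names are the examples given in the issue, `counters_enabled` is a hypothetical helper, and a plain dict stands in for the pig property file.

```python
def counters_enabled(num_inputs, num_outputs, props):
    """Returns (input_counters_on, output_counters_on) for one job.

    `props` is a plain dict standing in for the pig property file.
    The property names follow the examples in the issue text; the
    fallback values used here are assumptions for illustration.
    """
    # Option 2: a boolean switch that turns off all Pig-generated counters.
    if str(props.get("pig.disable.counters", "false")).lower() == "true":
        return (False, False)

    # Option 1: an integer cap applied separately to inputs and outputs.
    limit = props.get("pig.counter.limit")
    if limit is None:
        return (True, True)
    limit = int(limit)
    return (num_inputs <= limit, num_outputs <= limit)
```

For example, `counters_enabled(25, 2, {"pig.counter.limit": "20"})` disables only the input counters, because 25 inputs exceed the cap while 2 outputs do not.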
[jira] [Commented] (PIG-2125) Make Pig work with hadoop .NEXT
[ https://issues.apache.org/jira/browse/PIG-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13068739#comment-13068739 ] Richard Ding commented on PIG-2125: --- +1 Make Pig work with hadoop .NEXT --- Key: PIG-2125 URL: https://issues.apache.org/jira/browse/PIG-2125 Project: Pig Issue Type: New Feature Components: impl Affects Versions: 0.10 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.10 Attachments: PIG-2125-1.patch, PIG-2125-2.patch, PIG-2125-3.patch, PIG-2125-4.patch, PIG-2125-5.patch We need to make Pig work with hadoop .NEXT, the svn branch currently is: https://svn.apache.org/repos/asf/hadoop/common/branches/MR-279 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2141) Do not bundle apache commons jars with pig-withouthadoop.jar
Do not bundle apache commons jars with pig-withouthadoop.jar Key: PIG-2141 URL: https://issues.apache.org/jira/browse/PIG-2141 Project: Pig Issue Type: Bug Components: build Affects Versions: site, 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 This jars are already available with hadoop installation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2141) Do not bundle apache commons jars with pig-withouthadoop.jar
[ https://issues.apache.org/jira/browse/PIG-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2141: -- Attachment: PIG-2141.patch Do not bundle apache commons jars with pig-withouthadoop.jar Key: PIG-2141 URL: https://issues.apache.org/jira/browse/PIG-2141 Project: Pig Issue Type: Bug Components: build Affects Versions: site, 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2141.patch This jars are already available with hadoop installation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2141) Do not bundle apache commons jars with pig-withouthadoop.jar
[ https://issues.apache.org/jira/browse/PIG-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2141: -- Description: These jars are already available with hadoop installation. (was: This jars are already available with hadoop installation. ) Do not bundle apache commons jars with pig-withouthadoop.jar Key: PIG-2141 URL: https://issues.apache.org/jira/browse/PIG-2141 Project: Pig Issue Type: Bug Components: build Affects Versions: site, 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2141.patch These jars are already available with hadoop installation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2083) bincond ERROR 1025: Invalid field projection when null is used
[ https://issues.apache.org/jira/browse/PIG-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13039242#comment-13039242 ] Richard Ding commented on PIG-2083: --- +1 bincond ERROR 1025: Invalid field projection when null is used -- Key: PIG-2083 URL: https://issues.apache.org/jira/browse/PIG-2083 Project: Pig Issue Type: Bug Components: build Affects Versions: 0.9.0 Environment: Linux 2.6.18-53.1.13.el5 #1 SMP Mon Feb 11 13:27:27 EST 2008 x86_64 x86_64 x86_64 GNU/Linux Hadoop 0.20.203.3.1104011556 -r 96519d04f65e22ffadf89b225d0d44ef1741d126 Compiled on Fri Apr 1 16:29:09 PDT 2011 Reporter: Araceli Henley Assignee: Thejas M Nair Fix For: 0.9.0 Attachments: PIG-2083.1.patch This is a regression for 9. a = load '1.txt' as (a0, a1); b = foreach a generate (a0==0?null:2); explain b; ERROR 1025: Invalid field projection. Projected field [null] does not exist in schema -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2084) pig is running validation for a statement at a time batch mode, instead of running it for whole script
[ https://issues.apache.org/jira/browse/PIG-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038097#comment-13038097 ] Richard Ding commented on PIG-2084: --- +1 pig is running validation for a statement at a time batch mode, instead of running it for whole script -- Key: PIG-2084 URL: https://issues.apache.org/jira/browse/PIG-2084 Project: Pig Issue Type: Bug Affects Versions: 0.9.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.9.0 Attachments: PIG-2084.1.patch In PIG-2059, a change was made to run validation for each statement instead of running it once for the whole script. This slows down the validation phase, and it ends up taking tens of seconds. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2088) Return alias validation failed when there is single line comment in the macro
Return alias validation failed when there is single line comment in the macro - Key: PIG-2088 URL: https://issues.apache.org/jira/browse/PIG-2088 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2088.patch The following script
{code}
define test() returns b {
    a = load 'data' as (name, age, gpa);
    -- message
    $b = filter a by (int)age 40;
};
beta = test();
store beta into 'output';
{code}
results in a validation failure:
{code}
ERROR 1200 Macro test missing return alias b
{code}
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
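The issue text does not state the root cause. One way a failure like this can arise (an assumption, not confirmed by PIG-2088) is a validator that treats a `--` comment as running past the end of its own line, so the statement that assigns the return alias is swallowed. The Python sketch below shows a check that is sensitive to exactly that; the macro body and the helper names are hypothetical.

```python
import re


def strip_line_comments(text):
    """Removes `--` comments, each running to the end of its own line.

    Simplification: does not handle `--` inside quoted strings.
    """
    return "\n".join(line.split("--", 1)[0] for line in text.split("\n"))


def has_return_alias(macro_body, alias):
    """Checks that `$alias` is assigned somewhere outside comments."""
    pattern = r"\$" + re.escape(alias) + r"\s*="
    return re.search(pattern, strip_line_comments(macro_body)) is not None


# Hypothetical macro body with a single-line comment before the return alias.
body = "\n".join([
    "a = load 'data' as (name, age, gpa);",
    "-- message",
    "$b = foreach a generate name;",
])

# Same body with line breaks lost, so the comment extends over the assignment.
flattened = body.replace("\n", " ")

found_with_newlines = has_return_alias(body, "b")
found_when_flattened = has_return_alias(flattened, "b")
```

With line breaks preserved the alias is found; once they are lost the comment hides the `$b` assignment and the check reports a missing return alias.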
[jira] [Updated] (PIG-2088) Return alias validation failed when there is single line comment in the macro
[ https://issues.apache.org/jira/browse/PIG-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2088: -- Attachment: PIG-2088.patch Return alias validation failed when there is single line comment in the macro - Key: PIG-2088 URL: https://issues.apache.org/jira/browse/PIG-2088 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2088.patch The following script {code} define test() returns b { a = load 'data' as (name, age, gpa); -- message $b = filter a by (int)age 40; }; beta = test(); store beta into 'output'; {code} results in a validation failure: {code} ERROR 1200 Macro test missing return alias b {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2088) Return alias validation failed when there is single line comment in the macro
[ https://issues.apache.org/jira/browse/PIG-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-2088. --- Resolution: Fixed Hadoop Flags: [Reviewed] Patch committed to trunk and 0.9 branch. Return alias validation failed when there is single line comment in the macro - Key: PIG-2088 URL: https://issues.apache.org/jira/browse/PIG-2088 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2088.patch The following script {code} define test() returns b { a = load 'data' as (name, age, gpa); -- message $b = filter a by (int)age 40; }; beta = test(); store beta into 'output'; {code} results in a validation failure: {code} ERROR 1200 Macro test missing return alias b {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2081) Dryrun gives wrong line numbers in error message for scripts containing macro.
[ https://issues.apache.org/jira/browse/PIG-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037003#comment-13037003 ] Richard Ding commented on PIG-2081: --- test-patch and unit tests pass. Dryrun gives wrong line numbers in error message for scripts containing macro. -- Key: PIG-2081 URL: https://issues.apache.org/jira/browse/PIG-2081 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2081.patch For following script (test.pig)
{code}
1 DEFINE my_macro (X,key) returns Y
2 {
3 tmp1 = foreach $X generate TOKENIZE((chararray)$key) as tokens;
4 tmp2 = foreach tmp1 generate flatten(tokens);
5 tmp3 = order tmp2 by $0;
6 $Y = distinct tmp3;
7 }
8
9 A = load 'sometext' using TextLoader() as (row) ;
10 E = my_macro(A,row);
11
12 A1 = load 'sometext2' using TextLoader() as (row1);
13 E1 = my_macro(A1,row1);
14
15 A3 = load 'sometext3' using TextLoader() as (row3);
16 E3 = my_macro(A3,$0);
17
18 F = cogroup E by $0, E1 by $0,E3 by $0;
19 dump F;
{code}
pig test.pig gives correct line number in error message:
{code}
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: file test.pig, line 16, column 17 mismatched input '$0' expecting set null
{code}
while pig -r test.pig gives incorrect line number in error message:
{code}
ERROR org.apache.pig.Main - ERROR 1200: file test.pig.substituted, line 1, column 17 mismatched input '$0' expecting set null
{code}
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2081) Dryrun gives wrong line numbers in error message for scripts containing macro.
[ https://issues.apache.org/jira/browse/PIG-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-2081. --- Resolution: Fixed Hadoop Flags: [Reviewed] patch committed to trunk and 0.9 branch Dryrun gives wrong line numbers in error message for scripts containing macro. -- Key: PIG-2081 URL: https://issues.apache.org/jira/browse/PIG-2081 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2081.patch For following script (test.pig) {code} 1 DEFINE my_macro (X,key) returns Y 2 { 3 tmp1 = foreach $X generate TOKENIZE((chararray)$key) as tokens; 4 tmp2 = foreach tmp1 generate flatten(tokens); 5 tmp3 = order tmp2 by $0; 6 $Y = distinct tmp3; 7 } 8 9 A = load 'sometext' using TextLoader() as (row) ; 10 E = my_macro(A,row); 11 12 A1 = load 'sometext2' using TextLoader() as (row1); 13 E1 = my_macro(A1,row1); 14 15 A3 = load 'sometext3' using TextLoader() as (row3); 16 E3 = my_macro(A3,$0); 17 18 F = cogroup E by $0, E1 by $0,E3 by $0; 19 dump F; {code} pig test.pig gives correct line number in error message: {code} ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: file test.pig, line 16, column 17 mismatched input '$0' expecting set null {code} while pig -r test.pig gives incorrect line number in error message: {code} ERROR org.apache.pig.Main - ERROR 1200: file test.pig.substituted, line 1, column 17 mismatched input '$0' expecting set null {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2029) Inconsistency in Pig Stats reports
[ https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-2029. --- Resolution: Fixed Hadoop Flags: [Reviewed] Inconsistency in Pig Stats reports --- Key: PIG-2029 URL: https://issues.apache.org/jira/browse/PIG-2029 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.1, 0.9.0 Reporter: Viraj Bhat Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2029.patch I have a Pig script which reports varying Stats for the same M/R job (same inputs). Sometimes the PigStats reports all the stats (such as Maps,Reduces,MaxMapTime,MinMapTime,AvgMapTime,MaxReduceTime, MinReduceTime and AvgReduceTime) for the M/R job as 0. Sometimes it reports it correctly. Enclosed are the stderr logs for 2 runs, you can notice that for Run 1 job_201103091134_556600 from Run 1; has 0 against all the columns whereas in Run 2, Hadoop job job_201104272229_75693 has some valid values. The actual Job Tracker link shows that they are non empty. This points to a bug in the interaction of the PigStats module with the Jobtracker. 
Run 1:
{quote}
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
job_201103091134_556458 160 100 552 191 368 1257 371 392 IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P DISTINCT,MULTI_QUERY
job_201103091134_556600 0 0 0 0 0 0 0 0 UNION5 MULTI_QUERY,MAP_ONLY /user/viraj/dir,,
job_201103091134_556601 7 100 17 8 14 200 15 27 CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER
job_201103091134_556602 0 0 0 0 0 0 0 0 CNJOIN3,GNJOIN3,sampleNJOIN3 GROUP_BY,COMBINER
job_201103091134_556603 0 0 0 0 0 0 0 0 CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER
job_201103091134_556604 2 100 13 7 10 34 13 31 CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER
job_201103091134_556644 0 0 0 0 0 0 0 0 ONJOIN15 SAMPLER
job_201103091134_556645 0 0 0 0 0 0 0 0 ONJOIN25 SAMPLER
job_201103091134_556646 0 0 0 0 0 0 0 0 ONJOIN3 SAMPLER
job_201103091134_556654 0 0 0 0 0 0 0 0 ONJOIN19 SAMPLER
job_201103091134_556662 0 0 0 0 0 0 0 0 ONJOIN19 ORDER_BY,COMBINER
..
{quote}
Run 2:
{quote}
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
job_201104272229_75503 159 100 484 192 353 396 308 321 IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P DISTINCT,MULTI_QUERY
job_201104272229_75693 18 0 31 14 24 0 0 UNION5 MULTI_QUERY,MAP_ONLY /user/viraj/dir,
job_201104272229_75694 7 100 34 13 22 46 20 25 CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER
job_201104272229_75695 125 100 19 11 15 32 18 26 CNJOIN3,GNJOIN3,sampleNJOIN3 GROUP_BY,COMBINER
job_201104272229_75698 1 100 12 12 12 13 9 11 CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER
job_201104272229_75702 2 100 21 5 13 35 22 26 CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER
job_201104272229_75724 1 1 4 4 4 11 11 11 ONJOIN15 SAMPLER
job_201104272229_75725 0 0 0 0 0 0 0 ONJOIN25 SAMPLER
job_201104272229_75726 6 1 8 6 8 24 24 24 ONJOIN3 SAMPLER
job_201104272229_75729 0 0 0 0 0 0 0 ONJOIN19 SAMPLER
job_201104272229_75752 1 100 5 5 5
[jira] [Resolved] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-1824. --- Resolution: Fixed Hadoop Flags: [Reviewed] patch committed to trunk. Thanks Woody! Support import modules in Jython UDF Key: PIG-1824 URL: https://issues.apache.org/jira/browse/PIG-1824 Project: Pig Issue Type: Improvement Affects Versions: 0.8.0, 0.9.0 Reporter: Richard Ding Assignee: Woody Anderson Fix For: 0.10 Attachments: 1824.patch, 1824_final.patch, 1824a.patch, 1824b.patch, 1824c.patch, 1824d.patch, 1824x.patch, TEST-org.apache.pig.test.TestGrunt.txt, TEST-org.apache.pig.test.TestScriptLanguage.txt, TEST-org.apache.pig.test.TestScriptUDF.txt Currently, a Jython UDF script doesn't support the Jython import statement, as in the following example:
{code}
#!/usr/bin/python
import re

@outputSchema("word:chararray")
def resplit(content, regex, index):
    return re.compile(regex).split(content)[index]
{code}
Can Pig automatically locate the Jython module file and ship it to the backend? Or should we add a ship clause to let the user explicitly specify the module to ship?
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
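For anyone who wants to try the example UDF from PIG-1824 outside Pig, the sketch below runs it under a plain Python interpreter. The `outputSchema` function here is a stub written for this example so the decorator resolves; it only records the schema string and is not Pig's own implementation.

```python
import re


def outputSchema(schema):
    """Stand-in for the decorator available to Jython UDF scripts in Pig.

    This stub only records the schema string so the example runs outside Pig.
    """
    def decorate(func):
        func.outputSchema = schema
        return func
    return decorate


@outputSchema("word:chararray")
def resplit(content, regex, index):
    # Split `content` on `regex` and return the piece at position `index`.
    return re.compile(regex).split(content)[index]


second = resplit("alpha,beta,gamma", ",", 1)
```

The function itself depends only on the `re` module, which is why the question in the issue is about locating and shipping imported modules rather than about the UDF logic.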
[jira] [Created] (PIG-2081) Dryrun gives wrong line numbers in error message for scripts containing macro.
Dryrun gives wrong line numbers in error message for scripts containing macro. -- Key: PIG-2081 URL: https://issues.apache.org/jira/browse/PIG-2081 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 For following script (test.pig) {code} 1 DEFINE my_macro (X,key) returns Y 2 { 3 tmp1 = foreach $X generate TOKENIZE((chararray)$key) as tokens; 4 tmp2 = foreach tmp1 generate flatten(tokens); 5 tmp3 = order tmp2 by $0; 6 $Y = distinct tmp3; 7 } 8 9 A = load 'sometext' using TextLoader() as (row) ; 10 E = my_macro(A,row); 11 12 A1 = load 'sometext2' using TextLoader() as (row1); 13 E1 = my_macro(A1,row1); 14 15 A3 = load 'sometext3' using TextLoader() as (row3); 16 E3 = my_macro(A3,$0); 17 18 F = cogroup E by $0, E1 by $0,E3 by $0; 19 dump F; {code} pig test.pig gives correct line number in error message: {code} ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: file test.pig, line 16, column 17 mismatched input '$0' expecting set null {code} while pig -r test.pig gives incorrect line number in error message: {code} ERROR org.apache.pig.Main - ERROR 1200: file test.pig.substituted, line 1, column 17 mismatched input '$0' expecting set null {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2029) Inconsistency in Pig Stats reports
[ https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036403#comment-13036403 ] Richard Ding commented on PIG-2029: --- Patch committed to trunk and 0.9 branch. Inconsistency in Pig Stats reports --- Key: PIG-2029 URL: https://issues.apache.org/jira/browse/PIG-2029 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.1, 0.9.0 Reporter: Viraj Bhat Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2029.patch
[jira] [Updated] (PIG-2081) Dryrun gives wrong line numbers in error message for scripts containing macro.
[ https://issues.apache.org/jira/browse/PIG-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-2081:
--
Attachment: PIG-2081.patch

Dryrun gives wrong line numbers in error message for scripts containing macro.
--
Key: PIG-2081
URL: https://issues.apache.org/jira/browse/PIG-2081
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
Fix For: 0.9.0
Attachments: PIG-2081.patch

For the following script (test.pig):
{code}
 1 DEFINE my_macro (X,key) returns Y
 2 {
 3 tmp1 = foreach $X generate TOKENIZE((chararray)$key) as tokens;
 4 tmp2 = foreach tmp1 generate flatten(tokens);
 5 tmp3 = order tmp2 by $0;
 6 $Y = distinct tmp3;
 7 }
 8
 9 A = load 'sometext' using TextLoader() as (row) ;
10 E = my_macro(A,row);
11
12 A1 = load 'sometext2' using TextLoader() as (row1);
13 E1 = my_macro(A1,row1);
14
15 A3 = load 'sometext3' using TextLoader() as (row3);
16 E3 = my_macro(A3,$0);
17
18 F = cogroup E by $0, E1 by $0,E3 by $0;
19 dump F;
{code}
Running "pig test.pig" gives the correct line number in the error message:
{code}
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: file test.pig, line 16, column 17 mismatched input '$0' expecting set null
{code}
while running "pig -r test.pig" gives an incorrect line number in the error message:
{code}
ERROR org.apache.pig.Main - ERROR 1200: file test.pig.substituted, line 1, column 17 mismatched input '$0' expecting set null
{code}
--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035542#comment-13035542 ]

Richard Ding commented on PIG-1824:
---
The new patch fixed the unit test errors reported earlier. I have one (different) failed test in TestGrunt; I'm not sure if it's related to the patch.

Support import modules in Jython UDF
Key: PIG-1824
URL: https://issues.apache.org/jira/browse/PIG-1824
Project: Pig
Issue Type: Improvement
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Assignee: Woody Anderson
Fix For: 0.10
Attachments: 1824.patch, 1824_final.patch, 1824a.patch, 1824b.patch, 1824c.patch, 1824d.patch, 1824x.patch, TEST-org.apache.pig.test.TestGrunt.txt, TEST-org.apache.pig.test.TestScriptLanguage.txt, TEST-org.apache.pig.test.TestScriptUDF.txt

Currently, Jython UDF scripts don't support the Jython import statement, as in the following example:
{code}
#!/usr/bin/python
import re

@outputSchema("word:chararray")
def resplit(content, regex, index):
    return re.compile(regex).split(content)[index]
{code}
Can Pig automatically locate the Jython module file and ship it to the backend? Or should we add a ship clause to let the user explicitly specify the module to ship?
--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2029) Inconsistency in Pig Stats reports
[ https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035010#comment-13035010 ]

Richard Ding commented on PIG-2029:
---
Currently Pig prints out zero (0) if the max/min/avg map/reduce time isn't available when querying Hadoop through the Hadoop client API. This is misleading. I propose that we change those values to 'n/a', as follows:
{code}
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
job_201104272229_434232 2 10 354 220 287 168 149 163 IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P DISTINCT,MULTI_QUERY
job_201104272229_434319 2 0 9 3 6 0 0 0 UNION5 MULTI_QUERY,MAP_ONLY /user/rding/verifypigstats2-UNION5,
job_201104272229_434320 2 10 n/a n/a n/a n/a n/a n/a CNJOIN3,GNJOIN3,sampleNJOIN3 GROUP_BY,COMBINER
job_201104272229_434321 1 10 5 5 5 23 9 17 CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER
job_201104272229_434322 2 10 n/a n/a n/a n/a n/a n/a CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER
job_201104272229_434323 2 10 n/a n/a n/a n/a n/a n/a CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER
job_201104272229_434331 2 1 n/a n/a n/a n/a n/a n/a ONJOIN15 SAMPLER
job_201104272229_434332 2 1 n/a n/a n/a n/a n/a n/a ONJOIN3 SAMPLER
job_201104272229_434333 1 1 2 2 2 13 13 13 ONJOIN25 SAMPLER
job_201104272229_434334 1 1 1 1 1 12 12 12 ONJOIN19 SAMPLER
job_201104272229_434342 1 10 2 2 2 16 8 11 ONJOIN25 ORDER_BY,COMBINER
{code}

Inconsistency in Pig Stats reports
---
Key: PIG-2029
URL: https://issues.apache.org/jira/browse/PIG-2029
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.8.1, 0.9.0
Reporter: Viraj Bhat
Assignee: Richard Ding
Fix For: 0.10

I have a Pig script which reports varying Stats for the same M/R job (same inputs).
Sometimes the PigStats reports all the stats (such as Maps,Reduces,MaxMapTime,MinMapTime,AvgMapTime,MaxReduceTime, MinReduceTime and AvgReduceTime) for the M/R job as 0. Sometimes it reports it correctly. Enclosed are the stderr logs for 2 runs, you can notice that for Run 1 job_201103091134_556600 from Run 1; has 0 against all the columns whereas in Run 2, Hadoop job job_201104272229_75693 has some valid values. The actual Job Tracker link shows that they are non empty. This points to a bug in the interaction of the PigStats module with the Jobtracker. Run 1: {quote} Job Stats (time in seconds): JobId MapsReduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs job_201103091134_556458 160 100 552 191 368 1257 371 392 IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P DISTINCT,MULTI_QUERY job_201103091134_556600 0 0 0 0 0 0 0 0 UNION5 MULTI_QUERY,MAP_ONLY/user/viraj/dir,, job_201103091134_556601 7 100 17 8 14 200 15 27 CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER job_201103091134_556602 0 0 0 0 0 0 0 0 CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER job_201103091134_556603 0 0 0 0 0 0 0 0 CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER job_201103091134_556604 2 100 13 7 10 34 13 31 CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER job_201103091134_556644 0 0 0 0 0 0 0 0 ONJOIN15SAMPLER job_201103091134_556645 0 0 0 0 0 0 0 0 ONJOIN25SAMPLER job_201103091134_556646 0 0 0 0 0 0 0 0 ONJOIN3 SAMPLER job_201103091134_556654 0 0 0 0 0 0 0 0 ONJOIN19
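To make the 'n/a' proposal in the comment above concrete, here is a minimal Python sketch of the formatting rule. It is not the code from PIG-2029.patch: the names `format_stat` and `format_job_row` are hypothetical, and the assumption is that an unavailable timing is represented as `None` rather than as 0, so a genuine zero stays distinguishable from a missing value.

```python
from typing import List, Optional

NOT_AVAILABLE = "n/a"


def format_stat(value: Optional[int]) -> str:
    """Render one timing cell; None means Hadoop returned no value for it."""
    return NOT_AVAILABLE if value is None else str(value)


def format_job_row(job_id: str, maps: int, reduces: int,
                   timings: List[Optional[int]]) -> str:
    """Build one tab-separated row of the Job Stats table.

    `timings` holds the six max/min/avg map and reduce times, in column order.
    """
    cells = [job_id, str(maps), str(reduces)]
    cells.extend(format_stat(t) for t in timings)
    return "\t".join(cells)
```

With this rule, a map-only job whose reduce times are truly zero still prints 0, while a job whose task reports could not be fetched prints n/a in every timing column.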
[jira] [Updated] (PIG-2029) Inconsistency in Pig Stats reports
[ https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2029: -- Attachment: PIG-2029.patch Inconsistency in Pig Stats reports --- Key: PIG-2029 URL: https://issues.apache.org/jira/browse/PIG-2029 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.1, 0.9.0 Reporter: Viraj Bhat Assignee: Richard Ding Fix For: 0.10 Attachments: PIG-2029.patch I have a Pig script which reports varying Stats for the same M/R job (same inputs). Sometimes the PigStats reports all the stats (such as Maps,Reduces,MaxMapTime,MinMapTime,AvgMapTime,MaxReduceTime, MinReduceTime and AvgReduceTime) for the M/R job as 0. Sometimes it reports it correctly. Enclosed are the stderr logs for 2 runs, you can notice that for Run 1 job_201103091134_556600 from Run 1; has 0 against all the columns whereas in Run 2, Hadoop job job_201104272229_75693 has some valid values. The actual Job Tracker link shows that they are non empty. This points to a bug in the interaction of the PigStats module with the Jobtracker. 
Run 1:
{quote}
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
job_201103091134_556458 160 100 552 191 368 1257 371 392 IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P DISTINCT,MULTI_QUERY
job_201103091134_556600 0 0 0 0 0 0 0 0 UNION5 MULTI_QUERY,MAP_ONLY /user/viraj/dir,,
job_201103091134_556601 7 100 17 8 14 200 15 27 CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER
job_201103091134_556602 0 0 0 0 0 0 0 0 CNJOIN3,GNJOIN3,sampleNJOIN3 GROUP_BY,COMBINER
job_201103091134_556603 0 0 0 0 0 0 0 0 CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER
job_201103091134_556604 2 100 13 7 10 34 13 31 CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER
job_201103091134_556644 0 0 0 0 0 0 0 0 ONJOIN15 SAMPLER
job_201103091134_556645 0 0 0 0 0 0 0 0 ONJOIN25 SAMPLER
job_201103091134_556646 0 0 0 0 0 0 0 0 ONJOIN3 SAMPLER
job_201103091134_556654 0 0 0 0 0 0 0 0 ONJOIN19 SAMPLER
job_201103091134_556662 0 0 0 0 0 0 0 0 ONJOIN19 ORDER_BY,COMBINER
..
{quote} Run 2: {quote} Job Stats (time in seconds): JobId MapsReduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs job_201104272229_75503159 100 484 192 353 396 308 321 IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P DISTINCT,MULTI_QUERY job_201104272229_7569318 0 31 14 24 0 0 UNION5 MULTI_QUERY,MAP_ONLY /user/viraj/dir, job_201104272229_756947 100 34 13 22 46 20 25 CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER job_201104272229_75695125 100 19 11 15 32 18 26 CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER job_201104272229_756981 100 12 12 12 13 9 11 CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER job_201104272229_757022 100 21 5 13 35 22 26 CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER job_201104272229_757241 1 4 4 4 11 11 11 ONJOIN15SAMPLER job_201104272229_757250 0 0 0 0 0 0 ONJOIN25SAMPLER job_201104272229_757266 1 8 6 8 24 24 24 ONJOIN3 SAMPLER job_201104272229_757290 0 0 0 0 0 0 ONJOIN19SAMPLER job_201104272229_757521 100 5 5 5 12 9 11
[jira] [Resolved] (PIG-2069) LoadFunc jar does not ship to backend in MultiQuery case
[ https://issues.apache.org/jira/browse/PIG-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-2069. --- Resolution: Fixed Hadoop Flags: [Reviewed] Unit tests pass. Patch committed to trunk and 0.9 branch. LoadFunc jar does not ship to backend in MultiQuery case Key: PIG-2069 URL: https://issues.apache.org/jira/browse/PIG-2069 Project: Pig Issue Type: Bug Affects Versions: 0.8.1, 0.9.0 Reporter: Daniel Dai Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2069.patch Pig is able to automatically figure out the jar containing the LoadFunc and ship them to backend. However, the following script didn't: {code} A = load '1.txt' using SomeLoadFunc(); B = filter A by $0==0; C = filter A by $1==1; D = join B by $0, C by $0; dump D; {code} The reason is this query is a multiquery (A is reused and thus create an implicit split). When we merge multiquery into one job, we didn't merge udfs list properly. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2070) Unknown appears in error message for an error case
[ https://issues.apache.org/jira/browse/PIG-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034135#comment-13034135 ] Richard Ding commented on PIG-2070: --- +1 Unknown appears in error message for an error case Key: PIG-2070 URL: https://issues.apache.org/jira/browse/PIG-2070 Project: Pig Issue Type: Bug Affects Versions: 0.9.0 Reporter: Xuefu Zhang Assignee: Thejas M Nair Fix For: 0.9.0 Attachments: PIG-2070.1.patch For the following query: a = load '1.txt' as (a0:int, a1:int); b = load '2.txt' as (a0:int, a1:chararray); c = cogroup a by (a0,a1), b by (a0,a1); Pig gives the following message, which includes unknown word. 2011-05-13 11:01:18,682 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1051: line 3, column 4 Cannot cast to Unknown The error message should be more meaningful. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-1819) For implicit binding, Jython embedded Pig should skip any variable/value that contains $.
[ https://issues.apache.org/jira/browse/PIG-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-1819. --- Resolution: Fixed This is fixed per PIG-1827. For implicit binding, Jython embedded Pig should skip any variable/value that contains $. -- Key: PIG-1819 URL: https://issues.apache.org/jira/browse/PIG-1819 Project: Pig Issue Type: Bug Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-1819.patch, PIG-1819_1.patch, PIG-1819_2.patch We use the Pig parameter substitution for the bindings so variable/value that contains $ cannot be used. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason
[ https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1827: -- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to trunk and 0.9 branch. When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason Key: PIG-1827 URL: https://issues.apache.org/jira/browse/PIG-1827 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Julien Le Dem Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-1827-1.patch, PIG-1827_2.patch, PIG-1827_3.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2067) FilterLogicExpressionSimplifier removed some branches in some cases
[ https://issues.apache.org/jira/browse/PIG-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033151#comment-13033151 ]

Richard Ding commented on PIG-2067:
---
+1

FilterLogicExpressionSimplifier removed some branches in some cases
---
Key: PIG-2067
URL: https://issues.apache.org/jira/browse/PIG-2067
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.8.1, 0.9.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.8.1, 0.9.0
Attachments: PIG-2067-1-0.8.patch, PIG-2067-1.patch

The following script produces a wrong result:
{code}
A = load 'a.dat' as (cookie);
B = load 'b.dat' as (cookie);
C = cogroup A by cookie, B by cookie;
E = filter C by COUNT(B)>0 AND COUNT(A)>0;
explain E;
{code}
a.dat: 1 1 2 2 3 3 4 4 5 5 6 6 7 7
b.dat: 3 3 4 4 5 5 6 6 7 7 8 8

Expected output:
(3,{(3)},{(3)})
(4,{(4)},{(4)})
(5,{(5)},{(5)})
(6,{(6)},{(6)})
(7,{(7)},{(7)})

We get:
(3,{(3)},{(3)})
(4,{(4)},{(4)})
(5,{(5)},{(5)})
(6,{(6)},{(6)})
(7,{(7)},{(7)})
(8,{},{(8)})
--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2069) LoadFunc jar does not ship to backend in MultiQuery case
[ https://issues.apache.org/jira/browse/PIG-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-2069:
--
Attachment: PIG-2069.patch

This happens when the original MapReduce DAG (before optimization) contains a diamond node. Users can work around this by explicitly registering the LoadFunc jar in the script. The attached patch provides a fix. It was verified with a manual test.

LoadFunc jar does not ship to backend in MultiQuery case
Key: PIG-2069
URL: https://issues.apache.org/jira/browse/PIG-2069
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.1, 0.9.0
Reporter: Daniel Dai
Assignee: Richard Ding
Fix For: 0.9.0
Attachments: PIG-2069.patch

Pig is able to automatically figure out the jar containing the LoadFunc and ship them to backend. However, the following script didn't:
{code}
A = load '1.txt' using SomeLoadFunc();
B = filter A by $0==0;
C = filter A by $1==1;
D = join B by $0, C by $0;
dump D;
{code}
The reason is this query is a multiquery (A is reused and thus create an implicit split). When we merge multiquery into one job, we didn't merge udfs list properly.
--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
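The root cause described above (the udfs list not being merged when several jobs are folded into one) can be illustrated with a small Python sketch. This is not the code from PIG-2069.patch; the `MapReduceOper` class and `merge_into` helper are hypothetical stand-ins, and the only assumption is that each job carries a list of UDF class names whose jars must be shipped.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class MapReduceOper:
    """Hypothetical stand-in for one MapReduce job in the plan."""
    name: str
    udfs: List[str] = field(default_factory=list)


def merge_into(target: MapReduceOper,
               others: Iterable[MapReduceOper]) -> MapReduceOper:
    """Fold `others` into `target`, carrying over every UDF exactly once.

    Dropping this step is the kind of omission the issue describes: the
    merged job runs, but the jar holding a merged-in LoadFunc never ships.
    """
    seen = set(target.udfs)
    for oper in others:
        for udf in oper.udfs:
            if udf not in seen:
                seen.add(udf)
                target.udfs.append(udf)
    return target
```

Keeping the list ordered and duplicate-free means the set of jars to ship can be derived from the merged job alone, regardless of how many branches of the split referenced the same loader.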
[jira] [Commented] (PIG-2076) update documentation, help command with correct default value of pig.cachedbag.memusage
[ https://issues.apache.org/jira/browse/PIG-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033394#comment-13033394 ] Richard Ding commented on PIG-2076: --- +1 update documentation, help command with correct default value of pig.cachedbag.memusage --- Key: PIG-2076 URL: https://issues.apache.org/jira/browse/PIG-2076 Project: Pig Issue Type: Bug Affects Versions: 0.9.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.9.0 Attachments: PIG-2076.1.patch The default value of pig.cachedbag.memusage was changed to 0.2 in pig 0.8, as part of changes in PIG-1447 . But the help command and documentation shows older default value of 0.1 . -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2056) Jython error messages should show script name
[ https://issues.apache.org/jira/browse/PIG-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding resolved PIG-2056.
---
Resolution: Fixed
Hadoop Flags: [Reviewed]

Unit tests pass. Patch committed to trunk and 0.9 branch.

Jython error messages should show script name
-
Key: PIG-2056
URL: https://issues.apache.org/jira/browse/PIG-2056
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
Fix For: 0.9.0
Attachments: PIG-2056.patch

Instead of messages like
{code}
Traceback (most recent call last):
  File "<iostream>", line 12, in <module>
{code}
It should display the script file name:
{code}
Traceback (most recent call last):
  File "test.py", line 12, in <module>
{code}
--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2058) Macro missing returns clause doesn't give a good error message
[ https://issues.apache.org/jira/browse/PIG-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-2058. --- Resolution: Fixed Hadoop Flags: [Reviewed] Patch committed to trunk and 0.9 branch. Macro missing returns clause doesn't give a good error message -- Key: PIG-2058 URL: https://issues.apache.org/jira/browse/PIG-2058 Project: Pig Issue Type: Bug Affects Versions: 0.9.0 Reporter: Xuefu Zhang Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2058.patch For the following query: define test( out1,out2 ){ A = load 'x' as (u:int, v:int); $B = filter A by u 3 and v 20; } Pig gives the following error message: Syntax error,unexpected symbol at or near '{' Previously, it gives: mismatched input '{' expecting RETURNS The previous message is more meaningful. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2056) Jython error messages should show script name
[ https://issues.apache.org/jira/browse/PIG-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031911#comment-13031911 ]

Richard Ding commented on PIG-2056:
---
Result of test-patch:
{code}
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
{code}

Jython error messages should show script name
-
Key: PIG-2056
URL: https://issues.apache.org/jira/browse/PIG-2056
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
Fix For: 0.9.0
Attachments: PIG-2056.patch

Instead of messages like
{code}
Traceback (most recent call last):
  File "<iostream>", line 12, in <module>
{code}
It should display the script file name:
{code}
Traceback (most recent call last):
  File "test.py", line 12, in <module>
{code}
--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2056) Jython error messages should show script name
Jython error messages should show script name
-
Key: PIG-2056
URL: https://issues.apache.org/jira/browse/PIG-2056
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
Fix For: 0.9.0

Instead of messages like
{code}
Traceback (most recent call last):
  File "<iostream>", line 12, in <module>
{code}
It should display the script file name:
{code}
Traceback (most recent call last):
  File "test.py", line 12, in <module>
{code}
--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2056) Jython error messages should show script name
[ https://issues.apache.org/jira/browse/PIG-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-2056:
--
Attachment: PIG-2056.patch

Jython error messages should show script name
-
Key: PIG-2056
URL: https://issues.apache.org/jira/browse/PIG-2056
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
Fix For: 0.9.0
Attachments: PIG-2056.patch

Instead of messages like
{code}
Traceback (most recent call last):
  File "<iostream>", line 12, in <module>
{code}
It should display the script file name:
{code}
Traceback (most recent call last):
  File "test.py", line 12, in <module>
{code}
--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro
[ https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-2035. --- Resolution: Fixed Hadoop Flags: [Reviewed] Patch committed to trunk and 0.9 branch. Macro expansion doesn't handle multiple expansions of same macro inside another macro - Key: PIG-2035 URL: https://issues.apache.org/jira/browse/PIG-2035 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2035_1.patch Here is the use case: {code} define test ( in, out, x ) returns c { a = load '$in' as (name, age, gpa); b = group a by gpa; $c = foreach b generate group, COUNT(a.$x); store $c into '$out'; }; define test2( in, out ) returns x { $x = test( '$in', '$out', 'name' ); $x = test( '$in', '$out.1', 'age' ); $x = test( '$in', '$out.2', 'gpa' ); }; x = test2('studenttab10k', 'myoutput'); {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2058) Macro missing returns clause doesn't give a good error message
[ https://issues.apache.org/jira/browse/PIG-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2058: -- Attachment: PIG-2058.patch Thanks Xuefu. Attaching a patch with the fix. Macro missing returns clause doesn't give a good error message -- Key: PIG-2058 URL: https://issues.apache.org/jira/browse/PIG-2058 Project: Pig Issue Type: Bug Affects Versions: 0.9.0 Reporter: Xuefu Zhang Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2058.patch For the following query: define test( out1,out2 ){ A = load 'x' as (u:int, v:int); $B = filter A by u 3 and v 20; } Pig gives the following error message: Syntax error,unexpected symbol at or near '{' Previously, it gives: mismatched input '{' expecting RETURNS The previous message is more meaningful. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2012) Comments at the begining of the file throws off line numbers in errors
[ https://issues.apache.org/jira/browse/PIG-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2012: -- Attachment: PIG-2012_2.patch Thanks Xuefu. The new patch addresses the review comments. Comments at the begining of the file throws off line numbers in errors -- Key: PIG-2012 URL: https://issues.apache.org/jira/browse/PIG-2012 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Alan Gates Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2012_1.patch, PIG-2012_2.patch, macro.pig The preprocessor does not appear to be handling leading comments properly when calculating line numbers for error messages. In the attached script, the error is reported to be on line 7. It is actually on line 10. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason
[ https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1827: -- Attachment: PIG-1827_3.patch We should limit this jira to fix the issue in embedded Pig (i.e. workaround the general parameter substitution) and visit parameter substitution parser and related code in a separate jira. When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason Key: PIG-1827 URL: https://issues.apache.org/jira/browse/PIG-1827 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Julien Le Dem Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-1827-1.patch, PIG-1827_2.patch, PIG-1827_3.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason
[ https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13030972#comment-13030972 ] Richard Ding commented on PIG-1827: --- New patch added a unit test case as suggested. When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason Key: PIG-1827 URL: https://issues.apache.org/jira/browse/PIG-1827 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Julien Le Dem Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-1827-1.patch, PIG-1827_2.patch, PIG-1827_3.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro
[ https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13030019#comment-13030019 ] Richard Ding commented on PIG-2035: --- Unit tests pass. Macro expansion doesn't handle multiple expansions of same macro inside another macro - Key: PIG-2035 URL: https://issues.apache.org/jira/browse/PIG-2035 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2035_1.patch Here is the use case: {code} define test ( in, out, x ) returns c { a = load '$in' as (name, age, gpa); b = group a by gpa; $c = foreach b generate group, COUNT(a.$x); store $c into '$out'; }; define test2( in, out ) returns x { $x = test( '$in', '$out', 'name' ); $x = test( '$in', '$out.1', 'age' ); $x = test( '$in', '$out.2', 'gpa' ); }; x = test2('studenttab10k', 'myoutput'); {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2049) Pig should display TokenMgrError consistently across all parsers
Pig should display TokenMgrError consistently across all parsers Key: PIG-2049 URL: https://issues.apache.org/jira/browse/PIG-2049 Project: Pig Issue Type: Bug Affects Versions: 0.8.0, 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Priority: Minor Fix For: 0.9.0 For example, for org.apache.pig.tools.pigscript.parser.TokenMgrError, Pig logs {code} ERROR 1000: Error during parsing. Lexical error at line 5, column 0. {code} But for org.apache.pig.tools.parameters.TokenMgrError, Pig logs {code} ERROR 2998: Unhandled internal error. Lexical error at line 10, column 0. {code} Both should have error code 1000. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
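The requested behaviour (both lexer errors reported under error code 1000) can be sketched in a few lines of Python. This is an illustration only, not Pig's implementation: the two exception classes below are hypothetical stand-ins for org.apache.pig.tools.pigscript.parser.TokenMgrError and org.apache.pig.tools.parameters.TokenMgrError, and the message prefixes are copied from the log lines quoted above.

```python
class PigScriptTokenMgrError(Exception):
    """Stand-in for the pigscript parser's TokenMgrError."""


class ParameterTokenMgrError(Exception):
    """Stand-in for the parameter-substitution parser's TokenMgrError."""


PARSE_ERROR = 1000      # "Error during parsing."
INTERNAL_ERROR = 2998   # "Unhandled internal error."

# Every lexer error type is listed once, so they all map to the same code.
LEXER_ERRORS = (PigScriptTokenMgrError, ParameterTokenMgrError)


def format_error(exc: Exception) -> str:
    """Return the log line for `exc`, treating all lexer errors alike."""
    if isinstance(exc, LEXER_ERRORS):
        return f"ERROR {PARSE_ERROR}: Error during parsing. {exc}"
    return f"ERROR {INTERNAL_ERROR}: Unhandled internal error. {exc}"
```

Listing the lexer error types in one tuple keeps the mapping in a single place, so a newly added parser cannot silently fall through to the internal-error branch without the omission being visible there.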
[jira] [Updated] (PIG-2049) Pig should display TokenMgrError consistently across all parsers
[ https://issues.apache.org/jira/browse/PIG-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2049: -- Attachment: PIG-2049.patch Pig should display TokenMgrError consistently across all parsers Key: PIG-2049 URL: https://issues.apache.org/jira/browse/PIG-2049 Project: Pig Issue Type: Bug Affects Versions: 0.8.0, 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Priority: Minor Fix For: 0.9.0 Attachments: PIG-2049.patch For example, for org.apache.pig.tools.pigscript.parser.TokenMgrError, Pig logs {code} ERROR 1000: Error during parsing. Lexical error at line 5, column 0. {code} But for org.apache.pig.tools.parameters.TokenMgrError, Pig logs {code} ERROR 2998: Unhandled internal error. Lexical error at line 10, column 0. {code} Both should have error code 1000. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2050) Pig can't reference auto-generated schema name for TOTUPLE
Pig can't reference auto-generated schema name for TOTUPLE
--
Key: PIG-2050
URL: https://issues.apache.org/jira/browse/PIG-2050
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Priority: Minor

Here is the use case:
{code}
grunt> A = load 'data' as (a0, a1, a2);
grunt> B = foreach A generate TOTUPLE(a0, a2);
grunt> describe B
B: {org.apache.pig.builtin.totuple_a0_3: (a0: bytearray,a2: bytearray)}
grunt> C = foreach B generate org.apache.pig.builtin.totuple_a0_3;
2011-05-06 14:38:14,635 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Invalid alias: org in {org.apache.pig.builtin.totuple_a0_1: (a0: bytearray,a2: bytearray)}
{code}
The workaround is to specify a user-defined schema name:
{code}
grunt> A = load 'data' as (a0, a1, a2);
grunt> B = foreach A generate TOTUPLE(a0, a2) as aa;
grunt> describe B
B: {aa: (a0: bytearray,a2: bytearray)}
grunt> C = foreach B generate aa;
grunt>
{code}
--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2049) Pig should display TokenMgrError message consistently across all parsers
[ https://issues.apache.org/jira/browse/PIG-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-2049. --- Resolution: Fixed Hadoop Flags: [Reviewed] Patch committed to trunk and 0.9 branch. Pig should display TokenMgrError message consistently across all parsers Key: PIG-2049 URL: https://issues.apache.org/jira/browse/PIG-2049 Project: Pig Issue Type: Bug Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Priority: Minor Fix For: 0.9.0 Attachments: PIG-2049.patch For example, for org.apache.pig.tools.pigscript.parser.TokenMgrError, Pig logs {code} ERROR 1000: Error during parsing. Lexical error at line 5, column 0. {code} But for org.apache.pig.tools.parameters.TokenMgrError, Pig logs {code} ERROR 2998: Unhandled internal error. Lexical error at line 10, column 0. {code} Both should have error code 1000. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
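The consistency rule from PIG-2049 (both lexical errors should surface as error code 1000 rather than 2998) can be sketched as below. This is only an illustration: the two nested error classes are hypothetical stand-ins for org.apache.pig.tools.pigscript.parser.TokenMgrError and org.apache.pig.tools.parameters.TokenMgrError, and errorCodeFor is an assumed helper name, not the method changed by the patch.

```java
/** Illustrative sketch only; not the code from PIG-2049.patch. */
final class LexicalErrorCodes {
    static final int PARSE_ERROR = 1000;              // "Error during parsing."
    static final int UNHANDLED_INTERNAL_ERROR = 2998; // "Unhandled internal error."

    /** Hypothetical stand-in for the Pig script parser's TokenMgrError. */
    static class ScriptTokenMgrError extends Error {
        ScriptTokenMgrError(String message) { super(message); }
    }

    /** Hypothetical stand-in for the parameter substitution parser's TokenMgrError. */
    static class ParamTokenMgrError extends Error {
        ParamTokenMgrError(String message) { super(message); }
    }

    private LexicalErrorCodes() {}

    /** Maps lexical errors from either parser to the same parsing error code. */
    static int errorCodeFor(Throwable t) {
        if (t instanceof ScriptTokenMgrError || t instanceof ParamTokenMgrError) {
            return PARSE_ERROR;
        }
        // Anything else is still reported as an unhandled internal error.
        return UNHANDLED_INTERNAL_ERROR;
    }
}
```

With this mapping, a lexical error raised by either parser would be logged with ERROR 1000, while unrelated throwables keep the 2998 fallback.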
[jira] [Resolved] (PIG-2033) Pig returns sucess for the failed Pig script
[ https://issues.apache.org/jira/browse/PIG-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-2033. --- Resolution: Fixed Hadoop Flags: [Reviewed] Unit tests pass on 0.8 branch. Patch committed to 0.8 branch, 0.9 branch and trunk. Pig returns sucess for the failed Pig script Key: PIG-2033 URL: https://issues.apache.org/jira/browse/PIG-2033 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.8.1, 0.9.0 Attachments: PIG-2033.patch Pig returns success when a Pig script fails but the count of failed MR jobs is zero. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2041) Minicluster should make each run independent
[ https://issues.apache.org/jira/browse/PIG-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029488#comment-13029488 ] Richard Ding commented on PIG-2041: --- +1 Minicluster should make each run independent Key: PIG-2041 URL: https://issues.apache.org/jira/browse/PIG-2041 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0, 0.9.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.9.0 Attachments: PIG-2041-1.patch Minicluster will reuse ~/pigtest/conf/hadoop-site.xml. If something is wrong in hadoop-site.xml, the next test will also be affected. This leads to some mysterious test failures. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2033) Pig returns sucess for the failed Pig script
[ https://issues.apache.org/jira/browse/PIG-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2033: -- Attachment: PIG-2033.patch We make sure that Pig returns success iff the number of successful jobs equals the number of compiled jobs. This patch doesn't include a unit test since it's difficult to simulate the failure case. Pig returns sucess for the failed Pig script Key: PIG-2033 URL: https://issues.apache.org/jira/browse/PIG-2033 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.8.1, 0.9.0 Attachments: PIG-2033.patch Pig returns success when a Pig script fails but the count of failed MR jobs is zero. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
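The difference between the two success checks described in PIG-2033 can be sketched as follows. This is a minimal illustration under assumptions: the class and method names are hypothetical and do not come from the patch, and the job counts are passed in as plain integers rather than read from Pig's job statistics.

```java
/** Illustrative sketch only; names are hypothetical, not taken from PIG-2033.patch. */
final class JobOutcome {
    private JobOutcome() {}

    /** Check described in the bug report: success whenever no MR job is counted as failed. */
    static boolean succeededByFailedCount(int failedJobs) {
        return failedJobs == 0;
    }

    /** Check described in the patch comment: success iff every compiled job succeeded. */
    static boolean succeededByCompiledCount(int compiledJobs, int successfulJobs) {
        return successfulJobs == compiledJobs;
    }

    public static void main(String[] args) {
        // A script compiled into 3 jobs that stops before any job completes:
        // zero jobs failed, but zero jobs succeeded as well.
        System.out.println(succeededByFailedCount(0));      // prints "true"
        System.out.println(succeededByCompiledCount(3, 0)); // prints "false"
    }
}
```

Comparing against the compiled job count catches the case where the script fails without any MR job being recorded as failed, which is the scenario the issue describes.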
[jira] [Created] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro
Macro expansion doesn't handle multiple expansions of same macro inside another macro - Key: PIG-2035 URL: https://issues.apache.org/jira/browse/PIG-2035 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Here is the use case: {code} define test ( in, out, x ) returns c { a = load '$in' as (name, age, gpa); b = group a by gpa; $c = foreach b generate group, COUNT(a.$x); store $c into '$out'; }; define test2( in, out ) returns x { $x = test( '$in', '$out', 'name' ); $x = test( '$in', '$out.1', 'age' ); $x = test( '$in', '$out.2', 'gpa' ); }; x = test2('studenttab10k', 'myoutput'); {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro
[ https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2035: -- Attachment: PIG-2035_1.patch Macro expansion doesn't handle multiple expansions of same macro inside another macro - Key: PIG-2035 URL: https://issues.apache.org/jira/browse/PIG-2035 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2035_1.patch Here is the use case: {code} define test ( in, out, x ) returns c { a = load '$in' as (name, age, gpa); b = group a by gpa; $c = foreach b generate group, COUNT(a.$x); store $c into '$out'; }; define test2( in, out ) returns x { $x = test( '$in', '$out', 'name' ); $x = test( '$in', '$out.1', 'age' ); $x = test( '$in', '$out.2', 'gpa' ); }; x = test2('studenttab10k', 'myoutput'); {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro
[ https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029045#comment-13029045 ] Richard Ding commented on PIG-2035: --- test-patch result: {code} [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] -1 release audit. The applied patch generated 585 release audit warnings (more than the trunk's current 584 warnings). {code} Macro expansion doesn't handle multiple expansions of same macro inside another macro - Key: PIG-2035 URL: https://issues.apache.org/jira/browse/PIG-2035 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2035_1.patch Here is the use case: {code} define test ( in, out, x ) returns c { a = load '$in' as (name, age, gpa); b = group a by gpa; $c = foreach b generate group, COUNT(a.$x); store $c into '$out'; }; define test2( in, out ) returns x { $x = test( '$in', '$out', 'name' ); $x = test( '$in', '$out.1', 'age' ); $x = test( '$in', '$out.2', 'gpa' ); }; x = test2('studenttab10k', 'myoutput'); {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2028) Speed up multiquery unit tests
[ https://issues.apache.org/jira/browse/PIG-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-2028. --- Resolution: Fixed Hadoop Flags: [Reviewed] Patch committed to trunk and 0.9 branch. Speed up multiquery unit tests --- Key: PIG-2028 URL: https://issues.apache.org/jira/browse/PIG-2028 Project: Pig Issue Type: Bug Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2028.patch, PIG-2028_1.patch Switch TestMultiQueryBasic and TestMultiQuery to use LOCAL mode. The results on my laptop: Using Mini Cluster: TestMultiQueryBasic: 17 min 17 sec TestMultiQuery: 23 min 2 sec Using LOCAL mode: TestMultiQueryBasic: 4 min 17 sec TestMultiQuery: 5 min 51 sec -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2028) Speed up multiquery unit tests
Speed up multiquery unit tests --- Key: PIG-2028 URL: https://issues.apache.org/jira/browse/PIG-2028 Project: Pig Issue Type: Bug Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Switch TestMultiQueryBasic and TestMultiQuery to use LOCAL mode. The results on my laptop: Using Mini Cluster: TestMultiQueryBasic: 17 min 17 sec TestMultiQuery: 23 min 2 sec Using LOCAL mode: TestMultiQueryBasic: 4 min 17 sec TestMultiQuery: 5 min 51 sec -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-1998) Allow macro to return void
[ https://issues.apache.org/jira/browse/PIG-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1998: -- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to trunk and 0.9 branch. Allow macro to return void -- Key: PIG-1998 URL: https://issues.apache.org/jira/browse/PIG-1998 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-1998_1.patch, PIG-1998_2.patch, PIG-1998_3.patch Pig macro is allowed to not have output alias. But this property isn't clear from macro definition and macro invocation (macro inline). Here we propose to make it clear: 1. If a macro doesn't output any alias, it must specify void as return value. For example: {code} define mymacro(...) returns void { ... ... }; {code} 2. If a macro doesn't output any alias, it must be invoked without return value. For example, to invoke above macro, just specify: {code} mymacro(...); {code} 3. Any non-void return alias in the macro definition must exist in the macro body and be prefixed with $. For example: {code} define mymacro(...) returns B { ... ... $B = filter ...; }; {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason
[ https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1827: -- Status: Patch Available (was: Open) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason Key: PIG-1827 URL: https://issues.apache.org/jira/browse/PIG-1827 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Julien Le Dem Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-1827-1.patch, PIG-1827_2.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2012) Comments at the begining of the file throws off line numbers in errors
[ https://issues.apache.org/jira/browse/PIG-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027174#comment-13027174 ] Richard Ding commented on PIG-2012: --- test-patch result: {code} [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] -1 javac. The applied patch generated 964 javac compiler warnings (more than the trunk's current 963 warnings). [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. {code} Comments at the begining of the file throws off line numbers in errors -- Key: PIG-2012 URL: https://issues.apache.org/jira/browse/PIG-2012 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Alan Gates Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2012_1.patch, macro.pig The preprocessor does not appear to be handling leading comments properly when calculating line numbers for error messages. In the attached script, the error is reported to be on line 7. It is actually on line 10. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2012) Comments at the begining of the file throws off line numbers in errors
[ https://issues.apache.org/jira/browse/PIG-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2012: -- Status: Patch Available (was: Open) Comments at the begining of the file throws off line numbers in errors -- Key: PIG-2012 URL: https://issues.apache.org/jira/browse/PIG-2012 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Alan Gates Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2012_1.patch, macro.pig The preprocessor does not appear to be handling leading comments properly when calculating line numbers for error messages. In the attached script, the error is reported to be on line 7. It is actually on line 10. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-1998) Allow macro to return void
[ https://issues.apache.org/jira/browse/PIG-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1998: -- Status: Patch Available (was: Open) Allow macro to return void -- Key: PIG-1998 URL: https://issues.apache.org/jira/browse/PIG-1998 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-1998_1.patch, PIG-1998_2.patch, PIG-1998_3.patch Pig macro is allowed to not have output alias. But this property isn't clear from macro definition and macro invocation (macro inline). Here we propose to make it clear: 1. If a macro doesn't output any alias, it must specify void as return value. For example: {code} define mymacro(...) returns void { ... ... }; {code} 2. If a macro doesn't output any alias, it must be invoked without return value. For example, to invoke above macro, just specify: {code} mymacro(...); {code} 3. Any non-void return alias in the macro definition must exist in the macro body and be prefixed with $. For example: {code} define mymacro(...) returns B { ... ... $B = filter ...; }; {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-1998) Allow macro to return void
[ https://issues.apache.org/jira/browse/PIG-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1998: -- Attachment: PIG-1998_3.patch The purpose of this validation is to give the user an early warning when an alias in the returns clause doesn't appear in the macro body as $alias. It performs simple parsing using StreamTokenizer. Allow macro to return void -- Key: PIG-1998 URL: https://issues.apache.org/jira/browse/PIG-1998 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-1998_1.patch, PIG-1998_2.patch, PIG-1998_3.patch Pig macro is allowed to not have output alias. But this property isn't clear from macro definition and macro invocation (macro inline). Here we propose to make it clear: 1. If a macro doesn't output any alias, it must specify void as return value. For example: {code} define mymacro(...) returns void { ... ... }; {code} 2. If a macro doesn't output any alias, it must be invoked without return value. For example, to invoke above macro, just specify: {code} mymacro(...); {code} 3. Any non-void return alias in the macro definition must exist in the macro body and be prefixed with $. For example: {code} define mymacro(...) returns B { ... ... $B = filter ...; }; {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
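A check of this kind could look roughly like the sketch below. It is a guess at the general shape, not the code from PIG-1998_3.patch: the class name, method name, and tokenizer settings are all assumptions. The tokenizer is configured so that `$` and `_` count as word characters, which makes `$B` arrive as a single word token, and `/` is made an ordinary character so that it is not treated as a comment start.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Illustrative sketch only; not the validation code from PIG-1998_3.patch. */
final class MacroReturnCheck {
    private MacroReturnCheck() {}

    /**
     * Returns true if the macro body contains the word token "$alias",
     * or if the macro is declared to return void (nothing to check).
     */
    static boolean bodyDefinesReturnAlias(String macroBody, String returnAlias) {
        if ("void".equals(returnAlias)) {
            return true;
        }
        StreamTokenizer st = new StreamTokenizer(new StringReader(macroBody));
        st.ordinaryChar('/');   // '/' is a comment character by default
        st.wordChars('$', '$'); // let "$B" be read as one word token
        st.wordChars('_', '_');
        String expected = "$" + returnAlias;
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                // Quoted strings such as '$out' arrive as quote tokens, not words,
                // so only unquoted occurrences of $alias are counted.
                if (st.ttype == StreamTokenizer.TT_WORD && expected.equals(st.sval)) {
                    return true;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return false;
    }
}
```

For a macro declared as `returns B`, a body containing `$B = filter ...;` passes the check, while a body that only assigns to `B` without the `$` prefix does not, which is when the early warning would be raised.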
[jira] [Updated] (PIG-2012) Comments at the begining of the file throws off line numbers in errors
[ https://issues.apache.org/jira/browse/PIG-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2012: -- Attachment: PIG-2012_1.patch Comments at the begining of the file throws off line numbers in errors -- Key: PIG-2012 URL: https://issues.apache.org/jira/browse/PIG-2012 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Alan Gates Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2012_1.patch, macro.pig The preprocessor does not appear to be handling leading comments properly when calculating line numbers for error messages. In the attached script, the error is reported to be on line 7. It is actually on line 10. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1998) Allow macro to return void
[ https://issues.apache.org/jira/browse/PIG-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025898#comment-13025898 ] Richard Ding commented on PIG-1998: --- Patch 2 committed to both trunk and 0.9 branch. I'll add new patches to address additional review comments. Allow macro to return void -- Key: PIG-1998 URL: https://issues.apache.org/jira/browse/PIG-1998 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-1998_1.patch, PIG-1998_2.patch Pig macro is allowed to not have output alias. But this property isn't clear from macro definition and macro invocation (macro inline). Here we propose to make it clear: 1. If a macro doesn't output any alias, it must specify void as return value. For example: {code} define mymacro(...) returns void { ... ... }; {code} 2. If a macro doesn't output any alias, it must be invoked without return value. For example, to invoke above macro, just specify: {code} mymacro(...); {code} 3. Any non-void return alias in the macro definition must exist in the macro body and be prefixed with $. For example: {code} define mymacro(...) returns B { ... ... $B = filter ...; }; {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-1998) Allow macro to return void
[ https://issues.apache.org/jira/browse/PIG-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1998: -- Attachment: PIG-1998_2.patch Attaching a new patch that addresses Xuefu's review comments. Allow macro to return void -- Key: PIG-1998 URL: https://issues.apache.org/jira/browse/PIG-1998 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-1998_1.patch, PIG-1998_2.patch Pig macro is allowed to not have output alias. But this property isn't clear from macro definition and macro invocation (macro inline). Here we propose to make it clear: 1. If a macro doesn't output any alias, it must specify void as return value. For example: {code} define mymacro(...) returns void { ... ... }; {code} 2. If a macro doesn't output any alias, it must be invoked without return value. For example, to invoke above macro, just specify: {code} mymacro(...); {code} 3. Any non-void return alias in the macro definition must exist in the macro body and be prefixed with $. For example: {code} define mymacro(...) returns B { ... ... $B = filter ...; }; {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason
[ https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024833#comment-13024833 ] Richard Ding commented on PIG-1827: --- Unit tests pass. When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason Key: PIG-1827 URL: https://issues.apache.org/jira/browse/PIG-1827 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Julien Le Dem Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-1827-1.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1865) BinStorage/PigStorageSchema cannot load data from a different namenode
[ https://issues.apache.org/jira/browse/PIG-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024927#comment-13024927 ] Richard Ding commented on PIG-1865: --- +1 BinStorage/PigStorageSchema cannot load data from a different namenode -- Key: PIG-1865 URL: https://issues.apache.org/jira/browse/PIG-1865 Project: Pig Issue Type: Bug Affects Versions: 0.7.0, 0.8.0, 0.9.0 Reporter: Vivek Padmanabhan Assignee: Daniel Dai Fix For: 0.9.0 Attachments: PIG-1865-1.patch BinStorage/PigStorageSchema cannot load data from a different namenode. The main reason for this is that, in the getSchema method, they use org.apache.pig.impl.io.FileLocalizer to check whether the file exists, but the filesystem in HDataStorage refers to the natively configured dfs. The test case is simple: a = load 'hdfs://nn2/input' using BinStorage(); dump a; Here, if I specify -Dmapreduce.job.hdfs-servers, it should have worked, but Pig still takes the fs from fs.default.name, so to make it work I had to override fs.default.name on the Pig command line. Raising this as a bug since the same scenario works with PigStorage. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1976) One more TwoLevelAccess to remove
[ https://issues.apache.org/jira/browse/PIG-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024928#comment-13024928 ] Richard Ding commented on PIG-1976: --- +1 One more TwoLevelAccess to remove - Key: PIG-1976 URL: https://issues.apache.org/jira/browse/PIG-1976 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.9.0 Attachments: PIG-1976-1.patch We removed two level access in PIG-847. However, there is another occurrence we missed in ResourceSchema.java: {code} if (type == DataType.BAG && fieldSchema.schema != null && !fieldSchema.schema.isTwoLevelAccessRequired()) { log.info("Insert two-level access to Resource Schema"); FieldSchema fs = new FieldSchema("t", fieldSchema.schema); inner = new Schema(fs); } {code} Though by default schema.isTwoLevelAccessRequired is false, we shall not use this flag in Pig. Users could set this flag in a legacy UDF. Thanks to Woody for uncovering this. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-1999) Macro alias masker should consider schema context
[ https://issues.apache.org/jira/browse/PIG-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1999: -- Attachment: PIG-1999_1.patch Macro alias masker should consider schema context -- Key: PIG-1999 URL: https://issues.apache.org/jira/browse/PIG-1999 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-1999_1.patch Macro alias masker doesn't consider the current schema context. This results in errors when deciding which alias to mask. Here is an example: {code} define toBytearray(in, intermediate) returns e { a = load '$in' as (name:chararray, age:long, gpa: float); b = group a by name; c = foreach b generate a, (1,2,3); store c into '$intermediate' using BinStorage(); d = load '$intermediate' using BinStorage() as (b:bag{t:tuple(x,y,z)}, t2:tuple(a,b,c)); $e = foreach d generate COUNT(b), t2.a, t2.b, t2.c; }; f = toBytearray ('data', 'output1'); {code} Now the alias masker mistakes b in COUNT(b) for an alias instead of the field b in the current schema. The workaround is to not use aliases as names in the schema definition. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2005) Discrepancy in the way dry run handles semicolon in macro definition
[ https://issues.apache.org/jira/browse/PIG-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023253#comment-13023253 ] Richard Ding commented on PIG-2005: --- Patch-test result: {code} [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. {code} Unit tests pass. Discrepancy in the way dry run handles semicolon in macro definition Key: PIG-2005 URL: https://issues.apache.org/jira/browse/PIG-2005 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2005_1.patch Macro definition requires a semicolon to mark the end. For example: {code} define mymacro(x) returns y {... ...}; {code} But when invoked through the command line, macro definitions without a semicolon also work, except in the case of dryrun. This discrepancy is due to GruntParser automatically appending a semicolon to Pig statements if the semicolon is absent at the end. The dryrun GruntParser should do the same. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
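The semicolon handling described in PIG-2005 amounts to normalizing a statement before it is parsed. The sketch below shows that idea in isolation; the class and method names are hypothetical, and it does not reproduce how GruntParser actually implements it.

```java
/** Illustrative sketch only; not GruntParser's implementation. */
final class StatementTerminator {
    private StatementTerminator() {}

    /** Appends a semicolon to a Pig statement if it does not already end with one. */
    static String ensureTrailingSemicolon(String statement) {
        String trimmed = statement.trim();
        if (trimmed.isEmpty() || trimmed.endsWith(";")) {
            return trimmed;
        }
        return trimmed + ";";
    }

    public static void main(String[] args) {
        // A macro definition given without the terminating semicolon.
        System.out.println(ensureTrailingSemicolon("define mymacro(x) returns y {... ...}"));
        // prints "define mymacro(x) returns y {... ...};"
    }
}
```

Applying the same normalization on the dryrun path as on the regular path would make both accept macro definitions that lack the trailing semicolon.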