[jira] [Updated] (PIG-4173) Move to Spark 1.x
[ https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-4173: -- Attachment: PIG-4174_5.patch This patch fixed the cogroup issue for Spark 1.1.0. Spark version is updated to 1.1.0. > Move to Spark 1.x > - > > Key: PIG-4173 > URL: https://issues.apache.org/jira/browse/PIG-4173 > Project: Pig > Issue Type: Sub-task > Components: spark >Reporter: bc Wong >Assignee: Richard Ding > Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, > PIG-4174_4.patch, PIG-4174_5.patch, TEST-org.apache.pig.spark.TestSpark.txt > > > The Spark branch is using Spark 0.9: > https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably > switch to Spark 1.x asap, due to Spark interface changes since 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4173) Move to Spark 1.x
[ https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-4173: -- Attachment: PIG-4174_4.patch This patch fixed the unit tests. The version of Spark used is 1.0.2. In Spark 1.1.0, the CoGroupRDD is changed and breaks the cogroup runtime. I'm looking into this. > Move to Spark 1.x > - > > Key: PIG-4173 > URL: https://issues.apache.org/jira/browse/PIG-4173 > Project: Pig > Issue Type: Sub-task > Components: spark >Reporter: bc Wong >Assignee: Richard Ding > Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, > PIG-4174_4.patch, TEST-org.apache.pig.spark.TestSpark.txt > > > The Spark branch is using Spark 0.9: > https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably > switch to Spark 1.x asap, due to Spark interface changes since 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4173) Move to Spark 1.x
[ https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153705#comment-14153705 ] Richard Ding commented on PIG-4173: --- Sorry, I meant PIG-4168. > Move to Spark 1.x > - > > Key: PIG-4173 > URL: https://issues.apache.org/jira/browse/PIG-4173 > Project: Pig > Issue Type: Sub-task > Components: spark >Reporter: bc Wong >Assignee: Richard Ding > Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, > TEST-org.apache.pig.spark.TestSpark.txt > > > The Spark branch is using Spark 0.9: > https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably > switch to Spark 1.x asap, due to Spark interface changes since 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4173) Move to Spark 1.x
[ https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153701#comment-14153701 ] Richard Ding commented on PIG-4173: --- Hi ~praveenr019, Since PIG-4186 hasn't been checked in, it seems to make more sense to first build with Spark 1.x and then fix PIG-4186. What do you think? Thanks, -Richard > Move to Spark 1.x > - > > Key: PIG-4173 > URL: https://issues.apache.org/jira/browse/PIG-4173 > Project: Pig > Issue Type: Sub-task > Components: spark >Reporter: bc Wong >Assignee: Richard Ding > Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, > TEST-org.apache.pig.spark.TestSpark.txt > > > The Spark branch is using Spark 0.9: > https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably > switch to Spark 1.x asap, due to Spark interface changes since 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4173) Move to Spark 1.x
[ https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-4173: -- Attachment: PIG-4173_3.patch Thanks for the review. The new patch incorporates the changes requested in the comments. > Move to Spark 1.x > - > > Key: PIG-4173 > URL: https://issues.apache.org/jira/browse/PIG-4173 > Project: Pig > Issue Type: Sub-task > Components: spark >Reporter: bc Wong >Assignee: Richard Ding > Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, > TEST-org.apache.pig.spark.TestSpark.txt > > > The Spark branch is using Spark 0.9: > https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably > switch to Spark 1.x asap, due to Spark interface changes since 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4173) Move to Spark 1.x
[ https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-4173: -- Attachment: PIG-4173_2.patch Adding javax.servlet dependency > Move to Spark 1.x > - > > Key: PIG-4173 > URL: https://issues.apache.org/jira/browse/PIG-4173 > Project: Pig > Issue Type: Sub-task > Components: spark >Reporter: bc Wong >Assignee: Richard Ding > Attachments: PIG-4173.patch, PIG-4173_2.patch > > > The Spark branch is using Spark 0.9: > https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably > switch to Spark 1.x asap, due to Spark interface changes since 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PIG-4173) Move to Spark 1.x
[ https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-4173: -- Attachment: PIG-4173.patch Attaching the initial patch to upgrade Spark to 1.1.0. I made some local changes so that the patch now compiles with the latest Spark jar. I have a question though: why don't we use JavaRDD throughout the code? Is this due to performance concerns? > Move to Spark 1.x > - > > Key: PIG-4173 > URL: https://issues.apache.org/jira/browse/PIG-4173 > Project: Pig > Issue Type: Sub-task > Components: spark >Reporter: bc Wong >Assignee: Richard Ding > Attachments: PIG-4173.patch > > > The Spark branch is using Spark 0.9: > https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably > switch to Spark 1.x asap, due to Spark interface changes since 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (PIG-4173) Move to Spark 1.x
[ https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding reassigned PIG-4173: - Assignee: Richard Ding > Move to Spark 1.x > - > > Key: PIG-4173 > URL: https://issues.apache.org/jira/browse/PIG-4173 > Project: Pig > Issue Type: Sub-task > Components: spark >Reporter: bc Wong >Assignee: Richard Ding > > The Spark branch is using Spark 0.9: > https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably > switch to Spark 1.x asap, due to Spark interface changes since 1.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key
[ https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858404#comment-13858404 ] Richard Ding commented on PIG-3608: --- Thanks [~cheolsoo]. > ClassCastException when looking up a value from AvroMapWrapper using a Utf8 > key > --- > > Key: PIG-3608 > URL: https://issues.apache.org/jira/browse/PIG-3608 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.12.0 >Reporter: Richard Ding >Assignee: Richard Ding >Priority: Minor > Fix For: 0.13.0 > > Attachments: PIG-3608.patch, PIG-3608_2.patch > > > One got the following exception: > {code} > java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with > java.lang.String > at > org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80) > {code} > This is related to the change by PIG-3420. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key
[ https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-3608: -- Resolution: Fixed Fix Version/s: 0.13.0 Release Note: Committed to trunk. Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) > ClassCastException when looking up a value from AvroMapWrapper using a Utf8 > key > --- > > Key: PIG-3608 > URL: https://issues.apache.org/jira/browse/PIG-3608 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.12.0 >Reporter: Richard Ding >Assignee: Richard Ding >Priority: Minor > Fix For: 0.13.0 > > Attachments: PIG-3608.patch, PIG-3608_2.patch > > > One got the following exception: > {code} > java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with > java.lang.String > at > org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80) > {code} > This is related to the change by PIG-3420. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key
[ https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856047#comment-13856047 ] Richard Ding commented on PIG-3608: --- Thanks for reviewing the patch. Right now I don't have a Pig script to demonstrate this use case. I hit this problem while iterating over an instance of AvroMapWrapper and found that I can't look up a value from the map using a key just retrieved from the same map. I think this breaks the basic contract of a map implementation. I think the check {code} if (isUtf8key && !(key instanceof Utf8)) {code} is more general. But I'm ok if it is restricted to String. > ClassCastException when looking up a value from AvroMapWrapper using a Utf8 > key > --- > > Key: PIG-3608 > URL: https://issues.apache.org/jira/browse/PIG-3608 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.12.0 >Reporter: Richard Ding >Assignee: Richard Ding >Priority: Minor > Attachments: PIG-3608.patch, PIG-3608_2.patch > > > One got the following exception: > {code} > java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with > java.lang.String > at > org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80) > {code} > This is related to the change by PIG-3420. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
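The check quoted in the comment above can be illustrated with a small, self-contained sketch. It is not the code from PIG-3608.patch: `FakeUtf8` is a hypothetical stand-in for `org.apache.avro.util.Utf8` so that the example runs without the Avro jar, and `Utf8KeyLookupSketch.get` is an illustrative helper, not a Pig API. It assumes that the flag means "the backing map is keyed by Utf8", and it converts the caller's key only when the key is not already of that type.

```java
import java.util.HashMap;
import java.util.Map;

class Utf8KeyLookupSketch {
    /** Hypothetical stand-in for org.apache.avro.util.Utf8 (assumption, not the Avro class). */
    static final class FakeUtf8 {
        private final String value;

        FakeUtf8(String value) { this.value = value; }

        @Override public boolean equals(Object o) {
            return o instanceof FakeUtf8 && ((FakeUtf8) o).value.equals(value);
        }

        @Override public int hashCode() { return value.hashCode(); }

        @Override public String toString() { return value; }
    }

    /**
     * Looks up a key in a map whose keys are FakeUtf8 when isUtf8Key is true.
     * The key is wrapped only if it is not already a FakeUtf8, so a key
     * obtained from the map itself is passed through without any cast.
     */
    static Object get(Map<Object, Object> innerMap, boolean isUtf8Key, Object key) {
        if (isUtf8Key && !(key instanceof FakeUtf8)) {
            return innerMap.get(new FakeUtf8(String.valueOf(key)));
        }
        return innerMap.get(key);
    }

    public static void main(String[] args) {
        Map<Object, Object> innerMap = new HashMap<>();
        innerMap.put(new FakeUtf8("color"), "red");

        // A key taken from the map itself must find its value again.
        for (Object k : innerMap.keySet()) {
            System.out.println(k + " -> " + get(innerMap, true, k));
        }
        // A plain String key is converted before the lookup.
        System.out.println("color -> " + get(innerMap, true, "color"));
    }
}
```

Unlike the cast to `String` discussed in the thread, this sketch wraps any non-Utf8 key via `String.valueOf`, which is one way to read "more general"; restricting the conversion to `String` keys would be an equally valid choice.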
[jira] [Commented] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper
[ https://issues.apache.org/jira/browse/PIG-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856041#comment-13856041 ] Richard Ding commented on PIG-3609: --- [~cheolsoo], checking size is an optimization, this is also what DefaultAbstractBag implements. +1 on the patch. > ClassCastException when calling compareTo method on AvroBagWrapper > --- > > Key: PIG-3609 > URL: https://issues.apache.org/jira/browse/PIG-3609 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.12.0 >Reporter: Richard Ding >Assignee: Richard Ding >Priority: Minor > Attachments: PIG-3609.patch, PIG-3609_2.patch, PIG-3609_3.patch > > > One got the following exception when calling compareTo method on > AvroBagWrapper with an AvroBagWrapper object: > {code} > java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper > incompatible with java.util.Collection > at org.apache.avro.generic.GenericData.compare(GenericData.java:786) > at org.apache.avro.generic.GenericData.compare(GenericData.java:760) > at > org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78) > {code} > Looking at the code, it compares objects with different types: > {code} > return GenericData.get().compare(theArray, o, theArray.getSchema()); > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper
[ https://issues.apache.org/jira/browse/PIG-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-3609: -- Attachment: PIG-3609_2.patch New patch with a test case. > ClassCastException when calling compareTo method on AvroBagWrapper > --- > > Key: PIG-3609 > URL: https://issues.apache.org/jira/browse/PIG-3609 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.12.0 >Reporter: Richard Ding >Assignee: Richard Ding >Priority: Minor > Attachments: PIG-3609.patch, PIG-3609_2.patch > > > One got the following exception when calling compareTo method on > AvroBagWrapper with an AvroBagWrapper object: > {code} > java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper > incompatible with java.util.Collection > at org.apache.avro.generic.GenericData.compare(GenericData.java:786) > at org.apache.avro.generic.GenericData.compare(GenericData.java:760) > at > org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78) > {code} > Looking at the code, it compares objects with different types: > {code} > return GenericData.get().compare(theArray, o, theArray.getSchema()); > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key
[ https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-3608: -- Attachment: PIG-3608_2.patch You are right. Updated the patch with a test case. > ClassCastException when looking up a value from AvroMapWrapper using a Utf8 > key > --- > > Key: PIG-3608 > URL: https://issues.apache.org/jira/browse/PIG-3608 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.12.0 >Reporter: Richard Ding >Assignee: Richard Ding >Priority: Minor > Attachments: PIG-3608.patch, PIG-3608_2.patch > > > One got the following exception: > {code} > java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with > java.lang.String > at > org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80) > {code} > This is related to the change by PIG-3420. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key
[ https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840791#comment-13840791 ] Richard Ding commented on PIG-3608: --- Actually I have a question: should it be {code} if (isUtf8key) { v = innerMap.get(key); } else { v = innerMap.get(new Utf8((String) key)); } {code} since isUtf8key == true means the key is already Utf8? > ClassCastException when looking up a value from AvroMapWrapper using a Utf8 > key > --- > > Key: PIG-3608 > URL: https://issues.apache.org/jira/browse/PIG-3608 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.12.0 >Reporter: Richard Ding >Assignee: Richard Ding >Priority: Minor > Attachments: PIG-3608.patch > > > One got the following exception: > {code} > java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with > java.lang.String > at > org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80) > {code} > This is related to the change by PIG-3420. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper
[ https://issues.apache.org/jira/browse/PIG-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-3609: -- Attachment: PIG-3609.patch Attaching a patch. > ClassCastException when calling compareTo method on AvroBagWrapper > --- > > Key: PIG-3609 > URL: https://issues.apache.org/jira/browse/PIG-3609 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.12.0 >Reporter: Richard Ding >Priority: Minor > Attachments: PIG-3609.patch > > > One got the following exception when calling compareTo method on > AvroBagWrapper with an AvroBagWrapper object: > {code} > java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper > incompatible with java.util.Collection > at org.apache.avro.generic.GenericData.compare(GenericData.java:786) > at org.apache.avro.generic.GenericData.compare(GenericData.java:760) > at > org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78) > {code} > Looking at the code, it compares objects with different types: > {code} > return GenericData.get().compare(theArray, o, theArray.getSchema()); > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper
[ https://issues.apache.org/jira/browse/PIG-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-3609: -- Status: Patch Available (was: Open) > ClassCastException when calling compareTo method on AvroBagWrapper > --- > > Key: PIG-3609 > URL: https://issues.apache.org/jira/browse/PIG-3609 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.12.0 >Reporter: Richard Ding >Assignee: Richard Ding >Priority: Minor > Attachments: PIG-3609.patch > > > One got the following exception when calling compareTo method on > AvroBagWrapper with an AvroBagWrapper object: > {code} > java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper > incompatible with java.util.Collection > at org.apache.avro.generic.GenericData.compare(GenericData.java:786) > at org.apache.avro.generic.GenericData.compare(GenericData.java:760) > at > org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78) > {code} > Looking at the code, it compares objects with different types: > {code} > return GenericData.get().compare(theArray, o, theArray.getSchema()); > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper
[ https://issues.apache.org/jira/browse/PIG-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding reassigned PIG-3609: - Assignee: Richard Ding > ClassCastException when calling compareTo method on AvroBagWrapper > --- > > Key: PIG-3609 > URL: https://issues.apache.org/jira/browse/PIG-3609 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.12.0 >Reporter: Richard Ding >Assignee: Richard Ding >Priority: Minor > Attachments: PIG-3609.patch > > > One got the following exception when calling compareTo method on > AvroBagWrapper with an AvroBagWrapper object: > {code} > java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper > incompatible with java.util.Collection > at org.apache.avro.generic.GenericData.compare(GenericData.java:786) > at org.apache.avro.generic.GenericData.compare(GenericData.java:760) > at > org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78) > {code} > Looking at the code, it compares objects with different types: > {code} > return GenericData.get().compare(theArray, o, theArray.getSchema()); > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key
[ https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-3608: -- Status: Patch Available (was: Open) > ClassCastException when looking up a value from AvroMapWrapper using a Utf8 > key > --- > > Key: PIG-3608 > URL: https://issues.apache.org/jira/browse/PIG-3608 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.12.0 >Reporter: Richard Ding >Assignee: Richard Ding >Priority: Minor > Attachments: PIG-3608.patch > > > One got the following exception: > {code} > java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with > java.lang.String > at > org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80) > {code} > This is related to the change by PIG-3420. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key
[ https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding reassigned PIG-3608: - Assignee: Richard Ding > ClassCastException when looking up a value from AvroMapWrapper using a Utf8 > key > --- > > Key: PIG-3608 > URL: https://issues.apache.org/jira/browse/PIG-3608 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.12.0 >Reporter: Richard Ding >Assignee: Richard Ding >Priority: Minor > Attachments: PIG-3608.patch > > > One got the following exception: > {code} > java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with > java.lang.String > at > org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80) > {code} > This is related to the change by PIG-3420. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key
[ https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-3608: -- Attachment: PIG-3608.patch Attaching a simple patch. > ClassCastException when looking up a value from AvroMapWrapper using a Utf8 > key > --- > > Key: PIG-3608 > URL: https://issues.apache.org/jira/browse/PIG-3608 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.12.0 >Reporter: Richard Ding >Priority: Minor > Attachments: PIG-3608.patch > > > One got the following exception: > {code} > java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with > java.lang.String > at > org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80) > {code} > This is related to the change by PIG-3420. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper
Richard Ding created PIG-3609: - Summary: ClassCastException when calling compareTo method on AvroBagWrapper Key: PIG-3609 URL: https://issues.apache.org/jira/browse/PIG-3609 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.12.0 Reporter: Richard Ding Priority: Minor One got the following exception when calling compareTo method on AvroBagWrapper with an AvroBagWrapper object: {code} java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper incompatible with java.util.Collection at org.apache.avro.generic.GenericData.compare(GenericData.java:786) at org.apache.avro.generic.GenericData.compare(GenericData.java:760) at org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78) {code} Looking at the code, it compares objects with different types: {code} return GenericData.get().compare(theArray, o, theArray.getSchema()); {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
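The type mismatch described in the report above (a wrapper is handed to a comparison that expects the wrapped collection) can be sketched without Avro as follows. This is an assumption-laden illustration, not the content of any PIG-3609 patch: `BagWrapperSketch` is a hypothetical class, its elements are plain integers, and the element-wise loop stands in for the schema-driven `GenericData` comparison. The size check comes first, as the thread mentions for `DefaultAbstractBag`.

```java
import java.util.Arrays;
import java.util.List;

class BagWrapperSketch implements Comparable<Object> {
    // The wrapped collection; plays the role of "theArray" in the report.
    private final List<Integer> theArray;

    BagWrapperSketch(List<Integer> items) { this.theArray = items; }

    @Override
    public int compareTo(Object o) {
        if (this == o) {
            return 0;
        }
        if (!(o instanceof BagWrapperSketch)) {
            // A null argument ends in a NullPointerException here, per the Comparable contract.
            throw new ClassCastException("cannot compare with " + o.getClass().getName());
        }
        // Unwrap the other side so that like is compared with like.
        List<Integer> other = ((BagWrapperSketch) o).theArray;

        // Cheap size check first; bags of different sizes are ordered by size.
        if (theArray.size() != other.size()) {
            return Integer.compare(theArray.size(), other.size());
        }
        // Same size: compare element by element.
        for (int i = 0; i < theArray.size(); i++) {
            int c = Integer.compare(theArray.get(i), other.get(i));
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        BagWrapperSketch a = new BagWrapperSketch(Arrays.asList(1, 2, 3));
        BagWrapperSketch b = new BagWrapperSketch(Arrays.asList(1, 2, 4));
        System.out.println(Integer.signum(a.compareTo(b)));
    }
}
```

The essential step is the unwrapping before the comparison; how non-wrapper arguments should be handled (throwing, as here, or delegating to a more general comparison) is a design choice this sketch does not settle.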
[jira] [Created] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key
Richard Ding created PIG-3608: - Summary: ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key Key: PIG-3608 URL: https://issues.apache.org/jira/browse/PIG-3608 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.12.0 Reporter: Richard Ding Priority: Minor One got the following exception: {code} java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with java.lang.String at org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80) {code} This is related to the change by PIG-3420. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3251) Bzip2TextInputFormat requires double the memory of maximum record size
[ https://issues.apache.org/jira/browse/PIG-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607820#comment-13607820 ] Richard Ding commented on PIG-3251: --- With HADOOP-7823, can we remove Bzip2TextInputFormat and just use PigTextInputFormat? > Bzip2TextInputFormat requires double the memory of maximum record size > -- > > Key: PIG-3251 > URL: https://issues.apache.org/jira/browse/PIG-3251 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-3251-trunk-v01.patch, pig-3251-trunk-v02.patch > > > While looking at user's OOM heap dump, noticed that pig's > Bzip2TextInputFormat consumes memory at both > Bzip2TextInputFormat.buffer (ByteArrayOutputStream) > and actual Text that is returned as line. > For example, when having one record with 160MBytes, buffer was 268MBytes and > Text was 160MBytes. > We can probably eliminate one of them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-3097) HiveColumnarLoader doesn't correctly load partitioned Hive table
Richard Ding created PIG-3097: - Summary: HiveColumnarLoader doesn't correctly load partitioned Hive table Key: PIG-3097 URL: https://issues.apache.org/jira/browse/PIG-3097 Project: Pig Issue Type: Bug Reporter: Richard Ding Assignee: Richard Ding Given a partitioned Hive table: {code} hive> describe mytable; OK f1 string f2 string f3 string partition_dt string {code} The following Pig script gives the correct schema: {code} grunt> A = load '/hive/warehouse/mytable' using org.apache.pig.piggybank.storage.HiveColumnarLoader('f1 string,f2string,f3 string'); grunt> describe A A: {f1: chararray,f2: chararray,f3: chararray,partition_dt: chararray} {code} But, the command {code} grunt> dump A {code} only produces the first column of all records in the table (all four columns are expected). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3058) Upgrade junit to at least 4.8
[ https://issues.apache.org/jira/browse/PIG-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-3058: -- These two failures were introduced by PIG-2924 which actually fixed a bug in JobStats class. But the corresponding errors in TestPigRunner didn't get fixed. > Upgrade junit to at least 4.8 > - > > Key: PIG-3058 > URL: https://issues.apache.org/jira/browse/PIG-3058 > Project: Pig > Issue Type: Bug > Components: build >Affects Versions: 0.11 >Reporter: fang fang chen >Assignee: fang fang chen > > Pig needs to upgrade junit version to at least 4.8. Otherwise, one gets > following warnings. > [javadoc] > org/apache/hadoop/hbase/mapreduce/TestWALPlayer.class(org/apache/hadoop/hbase/mapreduce:TestWALPlayer.class): > warning: Cannot find annotation method 'value()' in type > 'org.junit.experimental.categories.Category': class file for > org.junit.experimental.categories.Category not found -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2405) svn tags/release-0.9.1: some unit test case failed with open JDK
[ https://issues.apache.org/jira/browse/PIG-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2405: -- Fix Version/s: 0.11 > svn tags/release-0.9.1: some unit test case failed with open JDK > > > Key: PIG-2405 > URL: https://issues.apache.org/jira/browse/PIG-2405 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.1 > Environment: ant-1.8.2 > open jdk: 1.6 >Reporter: fang fang chen >Assignee: fang fang chen > Fix For: 0.11 > > Attachments: PIG-2405-trunk.patch > > > [junit] Test org.apache.pig.test.TestDataModel FAILED > Testcase: testTupleToString took 0.004 sec > FAILED > toString expected:<...ad a little > lamb)},[[hello#world,goodbye#all]],42,50,3.14...> but was:<...ad a > little lamb)},[[goodbye#all,hello#world]],42,50,3.14...> > junit.framework.ComparisonFailure: toString expected:<...ad a little > lamb)},[[hello#world,goodbye#all]],42,50,3.14...> but was:<...ad a > little lamb)},[[goodbye#all,hello#world]],42,50,3.14...> > at > org.apache.pig.test.TestDataModel.testTupleToString(TestDataModel.java:269 > [junit] Test org.apache.pig.test.TestHBaseStorage FAILED > Tests run: 18, Failures: 0, Errors: 12, Time elapsed: 188.612 sec > Testcase: testHeterogeneousScans took 0.018 sec > Caused an ERROR > java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml (Too many > open files) > java.lang.RuntimeException: java.io.FileNotFoundException: > /root/pigtest/conf/hadoop-site.xml (Too many open files) > at > org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1162) > at > org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1035) > at > org.apache.hadoop.conf.Configuration.getProps(Configuration.java:980) > at org.apache.hadoop.conf.Configuration.get(Configuration.java:436) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:271) > at > 
org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:155) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:167) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:130) > at > org.apache.pig.test.TestHBaseStorage.prepareTable(TestHBaseStorage.java:809) > at > org.apache.pig.test.TestHBaseStorage.testHeterogeneousScans(TestHBaseStorage.java:741) > Caused by: java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml > (Too many open files) > at java.io.FileInputStream.<init>(FileInputStream.java:112) > at java.io.FileInputStream.<init>(FileInputStream.java:72) > at > sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70) > at > sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161) > at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown > Source) > at > org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source) > at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) > at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) > at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) > at org.apache.xerces.parsers.DOMParser.parse(Unknown Source) > at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source) > at javax.xml.parsers.DocumentBuilder.parse(Unknown Source) > at > org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1079) > Caused an ERROR > Could not resolve the DNS name of hostname:39611 > java.lang.IllegalArgumentException: Could not resolve the DNS name of > hostname:39611 > at > org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105) > at > org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:66) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:755) > at > 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:590) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:555) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171) > at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:145) > at > org.apache.pig.test.TestHBaseStorage.deleteAllRows(TestHBaseStorage.java:120) > at > org.apache.pig.test.TestHBaseStorage.tearDown(TestHBaseStorage.java:112) > [junit] Test org.apache.pig.test.TestMRCompiler FAILED > Testcase
[jira] [Updated] (PIG-3000) Optimize nested foreach
[ https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-3000: -- Description: In this Pig script: {code} A = load 'data' as (a:chararray); B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 1 : 0); } {code} The Eval function UPPER is called twice for each record. This should be optimized so that the UPPER is called only once for each record was: In this Pig script: {case} A = load 'data' as (a:chararray); B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 1 : 0); } {case} > Optimize nested foreach > --- > > Key: PIG-3000 > URL: https://issues.apache.org/jira/browse/PIG-3000 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.10.0 >Reporter: Richard Ding > > In this Pig script: > {code} > A = load 'data' as (a:chararray); > B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') > ? 1 : 0); } > {code} > The Eval function UPPER is called twice for each record. > This should be optimized so that the UPPER is called only once for each record -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
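The effect described in PIG-3000 can be illustrated outside Pig with a small Python sketch. It is only an analogy under stated assumptions: `upper`, `generate_unoptimized` and `generate_optimized` are hypothetical stand-ins for the UPPER eval function and the two evaluation strategies, not Pig internals.

```python
# Hypothetical stand-ins: `upper` mimics the UPPER eval function and
# counts its invocations; none of this is Pig code.
calls = {"upper": 0}


def upper(value):
    calls["upper"] += 1
    return value.upper()


def generate_unoptimized(a):
    # Mirrors the reported behaviour: the nested alias `c` is expanded
    # into each expression, so UPPER runs once per use.
    return (1 if upper(a) == "TEST" else 0,
            1 if upper(a) == "DEV" else 0)


def generate_optimized(a):
    # Desired behaviour: evaluate `c` once per record and reuse it.
    c = upper(a)
    return (1 if c == "TEST" else 0,
            1 if c == "DEV" else 0)


records = ["test", "dev", "prod"]

calls["upper"] = 0
unoptimized = [generate_unoptimized(r) for r in records]
unoptimized_calls = calls["upper"]

calls["upper"] = 0
optimized = [generate_optimized(r) for r in records]
optimized_calls = calls["upper"]
```

Both strategies produce the same tuples; only the number of `upper` invocations differs, which is the cost the ticket asks to remove.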
[jira] [Updated] (PIG-3000) Optimize nested foreach
[ https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-3000: -- Description: In this Pig script: {case} A = load 'data' as (a:chararray); B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 1 : 0); } {case} > Optimize nested foreach > --- > > Key: PIG-3000 > URL: https://issues.apache.org/jira/browse/PIG-3000 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.10.0 >Reporter: Richard Ding > > In this Pig script: > {case} > A = load 'data' as (a:chararray); > B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') > ? 1 : 0); } > {case} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-3000) Optimize nested foreach
Richard Ding created PIG-3000: - Summary: Optimize nested foreach Key: PIG-3000 URL: https://issues.apache.org/jira/browse/PIG-3000 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.10.0 Reporter: Richard Ding
[jira] [Updated] (PIG-2637) Command-line option -e throws TokenMgrError exception
[ https://issues.apache.org/jira/browse/PIG-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2637: -- Status: Patch Available (was: Open) > Command-line option -e throws TokenMgrError exception > - > > Key: PIG-2637 > URL: https://issues.apache.org/jira/browse/PIG-2637 > Project: Pig > Issue Type: Bug > Components: grunt >Affects Versions: 0.9.2 >Reporter: Richard Ding >Assignee: fang fang chen >Priority: Minor > Attachments: PIG-2637.patch > > > The command-line: > {code} > java -cp pig.jar org.apache.pig.Main -x local -e "a = load '1.txt';" > {code} > fails with exception: > {code} > ERROR 1000: Error during parsing. Lexical error at line 1, column 18. > Encountered: after : "" > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2744) Handle Pig command line with XML special characters
[ https://issues.apache.org/jira/browse/PIG-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2744: -- Status: Patch Available (was: Open) > Handle Pig command line with XML special characters > --- > > Key: PIG-2744 > URL: https://issues.apache.org/jira/browse/PIG-2744 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.10.0 >Reporter: Richard Ding >Assignee: fang fang chen > Attachments: PIG-2744.patch > > > Pig writes the Pig command-line string to the Hadoop job XML file. This fails if the string contains XML special characters. Pig should treat the command string like the Pig script and encode it first.
[jira] [Updated] (PIG-2405) svn tags/release-0.9.1: some unit test case failed with open JDK
[ https://issues.apache.org/jira/browse/PIG-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2405: -- Status: Patch Available (was: Open) > svn tags/release-0.9.1: some unit test case failed with open JDK > > > Key: PIG-2405 > URL: https://issues.apache.org/jira/browse/PIG-2405 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.1 > Environment: ant-1.8.2 > open jdk: 1.6 >Reporter: fang fang chen >Assignee: fang fang chen > Attachments: 2405_1.patch, 2405_2.patch > > > [junit] Test org.apache.pig.test.TestDataModel FAILED > Testcase: testTupleToString took 0.004 sec > FAILED > toString expected:<...ad a little > lamb)},[[hello#world,goodbye#all]],42,50,3.14...> but was:<...ad a > little lamb)},[[goodbye#all,hello#world]],42,50,3.14...> > junit.framework.ComparisonFailure: toString expected:<...ad a little > lamb)},[[hello#world,goodbye#all]],42,50,3.14...> but was:<...ad a > little lamb)},[[goodbye#all,hello#world]],42,50,3.14...> > at > org.apache.pig.test.TestDataModel.testTupleToString(TestDataModel.java:269 > [junit] Test org.apache.pig.test.TestHBaseStorage FAILED > Tests run: 18, Failures: 0, Errors: 12, Time elapsed: 188.612 sec > Testcase: testHeterogeneousScans took 0.018 sec > Caused an ERROR > java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml (Too many > open files) > java.lang.RuntimeException: java.io.FileNotFoundException: > /root/pigtest/conf/hadoop-site.xml (Too many open files) > at > org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1162) > at > org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1035) > at > org.apache.hadoop.conf.Configuration.getProps(Configuration.java:980) > at org.apache.hadoop.conf.Configuration.get(Configuration.java:436) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.(HConnectionManager.java:271) > at > 
org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:155) > at org.apache.hadoop.hbase.client.HTable.(HTable.java:167) > at org.apache.hadoop.hbase.client.HTable.(HTable.java:130) > at > org.apache.pig.test.TestHBaseStorage.prepareTable(TestHBaseStorage.java:809) > at > org.apache.pig.test.TestHBaseStorage.testHeterogeneousScans(TestHBaseStorage.java:741) > Caused by: java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml > (Too many open files) > at java.io.FileInputStream.(FileInputStream.java:112) > at java.io.FileInputStream.(FileInputStream.java:72) > at > sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70) > at > sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161) > at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown > Source) > at > org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source) > at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) > at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) > at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) > at org.apache.xerces.parsers.DOMParser.parse(Unknown Source) > at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source) > at javax.xml.parsers.DocumentBuilder.parse(Unknown Source) > at > org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1079) > Caused an ERROR > Could not resolve the DNS name of hostname:39611 > java.lang.IllegalArgumentException: Could not resolve the DNS name of > hostname:39611 > at > org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105) > at > org.apache.hadoop.hbase.HServerAddress.(HServerAddress.java:66) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:755) > at > 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:590) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:555) > at org.apache.hadoop.hbase.client.HTable.(HTable.java:171) > at org.apache.hadoop.hbase.client.HTable.(HTable.java:145) > at > org.apache.pig.test.TestHBaseStorage.deleteAllRows(TestHBaseStorage.java:120) > at > org.apache.pig.test.TestHBaseStorage.tearDown(TestHBaseStorage.java:112) > [junit] Test org.apache.pig.test.TestMRCompiler FAILED > Testcase: testS
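The TestDataModel failure quoted in this report differs only in the order of the two map entries (`hello#world` and `goodbye#all`). That suggests, as an assumption not confirmed in the thread, that the test depends on map iteration order, which can vary between JDKs. A minimal Python sketch of an order-insensitive comparison follows; `parse_map` is a hypothetical helper and not part of Pig's test code.

```python
def parse_map(rendered):
    """Parse a '[k1#v1,k2#v2]' rendering into a dict (hypothetical helper)."""
    body = rendered.strip()[1:-1]
    if not body:
        return {}
    # Split each 'key#value' entry on the first '#'.
    return dict(entry.split("#", 1) for entry in body.split(","))


expected = "[hello#world,goodbye#all]"
actual = "[goodbye#all,hello#world]"

# Comparing the rendered strings depends on entry order ...
strings_equal = expected == actual
# ... while comparing the parsed maps does not.
maps_equal = parse_map(expected) == parse_map(actual)
```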
[jira] [Created] (PIG-2744) Handle Pig command line with XML special characters
Richard Ding created PIG-2744: - Summary: Handle Pig command line with XML special characters Key: PIG-2744 URL: https://issues.apache.org/jira/browse/PIG-2744 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.10.0 Reporter: Richard Ding Assignee: Richard Ding Pig writes the Pig command-line string to the Hadoop job XML file. This fails if the string contains XML special characters. Pig should treat the command string like the Pig script and encode it first.
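The failure mode in PIG-2744 can be reproduced in miniature with Python's standard XML tools. This is a sketch under assumptions: the property name, the sample command line and the `property_xml` helper are illustrative, and XML escaping stands in for whatever encoding the actual patch applies.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Illustrative command line containing an XML special character ('<').
command_line = "-x local -e \"a = load '1.txt'; b = filter a by $0 < 5;\""


def property_xml(name, value, encode):
    # Build a job-XML-style property element; `encode` toggles escaping.
    text = escape(value) if encode else value
    return "<property><name>%s</name><value>%s</value></property>" % (name, text)


def parses(document):
    # Report whether the document is well-formed XML.
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


raw = property_xml("pig.command.line", command_line, encode=False)
encoded = property_xml("pig.command.line", command_line, encode=True)

raw_ok = parses(raw)          # the unescaped '<' breaks the document
encoded_ok = parses(encoded)  # the escaped value parses
round_trip = ET.fromstring(encoded).find("value").text
```

Reading the value back returns the original command line, so encoding before writing does not lose information.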
[jira] [Updated] (PIG-2261) Restore support for parenthesis in Pig 0.9
[ https://issues.apache.org/jira/browse/PIG-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2261: -- Status: Patch Available (was: Open) > Restore support for parenthesis in Pig 0.9 > -- > > Key: PIG-2261 > URL: https://issues.apache.org/jira/browse/PIG-2261 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.1 > > Attachments: PIG-2261.patch > > > Pig 0.8 and earlier versions used to support syntax such as > > {code} > A =(load ) > {code} > This was removed as "useless" in 0.9 when the grammar was redone. It turns > out that some user is using this for ease of code generation so we want to > restore it back. > Just to clarify, Pig 0.9 continues to support composite statements such as > {code} > B = filter (load 'data' as (a, b)) by a > 0; > {code} > It just removed "useless" parenthesis and doesn't support statements like > {code} > A = (load 'data' as (a, b)); > {code} > -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2261) Restore support for parenthesis in Pig 0.9
[ https://issues.apache.org/jira/browse/PIG-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2261: -- Attachment: PIG-2261.patch Attaching patch that restores the support for parenthesis. > Restore support for parenthesis in Pig 0.9 > -- > > Key: PIG-2261 > URL: https://issues.apache.org/jira/browse/PIG-2261 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.1 > > Attachments: PIG-2261.patch > > > Pig 0.8 and earlier versions used to support syntax such as > > {code} > A =(load ) > {code} > This was removed as "useless" in 0.9 when the grammar was redone. It turns > out that some user is using this for ease of code generation so we want to > restore it back. > Just to clarify, Pig 0.9 continues to support composite statements such as > {code} > B = filter (load 'data' as (a, b)) by a > 0; > {code} > It just removed "useless" parenthesis and doesn't support statements like > {code} > A = (load 'data' as (a, b)); > {code} > -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2261) Restore support for parenthesis in Pig 0.9
[ https://issues.apache.org/jira/browse/PIG-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2261: -- Assignee: Richard Ding > Restore support for parenthesis in Pig 0.9 > -- > > Key: PIG-2261 > URL: https://issues.apache.org/jira/browse/PIG-2261 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.1 > > > Pig 0.8 and earlier versions used to support syntax such as > > {code} > A =(load ) > {code} > This was removed as "useless" in 0.9 when the grammar was redone. It turns > out that some user is using this for ease of code generation so we want to > restore it back. > Just to clarify, Pig 0.9 continues to support composite statements such as > {code} > B = filter (load 'data' as (a, b)) by a > 0; > {code} > It just removed "useless" parenthesis and doesn't support statements like > {code} > A = (load 'data' as (a, b)); > {code} > -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2261) Restore support for parenthesis in Pig 0.9
[ https://issues.apache.org/jira/browse/PIG-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2261: -- Summary: Restore support for parenthesis in Pig 0.9 (was: Restor support for parenthesis in Pig 0.9) > Restore support for parenthesis in Pig 0.9 > -- > > Key: PIG-2261 > URL: https://issues.apache.org/jira/browse/PIG-2261 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Richard Ding > Fix For: 0.9.1 > > > Pig 0.8 and earlier versions used to support syntax such as > > {code} > A =(load ) > {code} > This was removed as "useless" in 0.9 when the grammar was redone. It turns > out that some user is using this for ease of code generation so we want to > restore it back. > Just to clarify, Pig 0.9 continues to support composite statements such as > {code} > B = filter (load 'data' as (a, b)) by a > 0; > {code} > It just removed "useless" parenthesis and doesn't support statements like > {code} > A = (load 'data' as (a, b)); > {code} > -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2261) Restor support for parenthesis in Pig 0.9
Restor support for parenthesis in Pig 0.9 - Key: PIG-2261 URL: https://issues.apache.org/jira/browse/PIG-2261 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Fix For: 0.9.1 Pig 0.8 and earlier versions used to support syntax such as {code} A =(load ) {code} This was removed as "useless" in 0.9 when the grammar was redone. It turns out that some user is using this for ease of code generation so we want to restore it back. Just to clarify, Pig 0.9 continues to support composite statements such as {code} B = filter (load 'data' as (a, b)) by a > 0; {code} It just removed "useless" parenthesis and doesn't support statements like {code} A = (load 'data' as (a, b)); {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2208) Restrict number of PIG generated Haddop counters
[ https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089330#comment-13089330 ] Richard Ding commented on PIG-2208: --- It logs only once per job in the front end, so that the user is informed that the multi-input (or multi-output) counters are disabled. In the back end the counters are simply disabled without logging. > Restrict number of PIG generated Haddop counters > - > > Key: PIG-2208 > URL: https://issues.apache.org/jira/browse/PIG-2208 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.8.1, 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.1 > > Attachments: PIG-2208.patch > > > Pig 0.8 implemented Hadoop counters to track the number of records read for each input and the number of records written for each output (PIG-1389 & PIG-1299). However, Hadoop has imposed a limit on per-job counters (MAPREDUCE-1943), and jobs will fail if the counters exceed the limit. Therefore we need a way to cap the number of Pig-generated counters. > Here are the two options: > 1. Add an integer property (e.g., pig.counter.limit) to the Pig property file, with a value such as 20. If the number of inputs of a job exceeds this number, the input counters are disabled. Similarly, if the number of outputs of a job exceeds this number, the output counters are disabled. > 2. Add a boolean property (e.g., pig.disable.counters) to the Pig property file (default: false). If this property is set to true, then the Pig-generated counters are disabled.
[jira] [Updated] (PIG-2208) Restrict number of PIG generated Haddop counters
[ https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2208: -- Attachment: PIG-2208.patch This patch implements option 2. Augmenting the Pig grammar will be more involved and could be done later. > Restrict number of PIG generated Haddop counters > - > > Key: PIG-2208 > URL: https://issues.apache.org/jira/browse/PIG-2208 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.8.1, 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.1 > > Attachments: PIG-2208.patch > > > Pig 0.8 implemented Hadoop counters to track the number of records read for each input and the number of records written for each output (PIG-1389 & PIG-1299). However, Hadoop has imposed a limit on per-job counters (MAPREDUCE-1943), and jobs will fail if the counters exceed the limit. Therefore we need a way to cap the number of Pig-generated counters. > Here are the two options: > 1. Add an integer property (e.g., pig.counter.limit) to the Pig property file, with a value such as 20. If the number of inputs of a job exceeds this number, the input counters are disabled. Similarly, if the number of outputs of a job exceeds this number, the output counters are disabled. > 2. Add a boolean property (e.g., pig.disable.counters) to the Pig property file (default: false). If this property is set to true, then the Pig-generated counters are disabled.
[jira] [Created] (PIG-2208) Restrict number of PIG generated Haddop counters
Restrict number of PIG generated Haddop counters - Key: PIG-2208 URL: https://issues.apache.org/jira/browse/PIG-2208 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0, 0.8.1 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.1 Pig 0.8 implemented Hadoop counters to track the number of records read for each input and the number of records written for each output (PIG-1389 & PIG-1299). However, Hadoop has imposed a limit on per-job counters (MAPREDUCE-1943), and jobs will fail if the counters exceed the limit. Therefore we need a way to cap the number of Pig-generated counters. Here are the two options: 1. Add an integer property (e.g., pig.counter.limit) to the Pig property file, with a value such as 20. If the number of inputs of a job exceeds this number, the input counters are disabled. Similarly, if the number of outputs of a job exceeds this number, the output counters are disabled. 2. Add a boolean property (e.g., pig.disable.counters) to the Pig property file (default: false). If this property is set to true, then the Pig-generated counters are disabled.
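The two options proposed in PIG-2208 can be sketched as a single decision function. This is an illustration only: the property names are the examples given in the ticket, while `counters_enabled` and its handling of a missing limit are assumptions rather than Pig's implementation.

```python
def counters_enabled(properties, num_inputs, num_outputs):
    """Return (input_counters_enabled, output_counters_enabled).

    Hypothetical helper; `properties` is a plain dict of string values.
    """
    # Option 2: one boolean switch disables all Pig-generated counters.
    if properties.get("pig.disable.counters", "false").lower() == "true":
        return (False, False)

    # Option 1: counters are disabled once the number of inputs
    # (or outputs) of a job exceeds the configured limit.
    limit = properties.get("pig.counter.limit")
    if limit is None:
        # Assumption: no limit configured means counters stay enabled.
        return (True, True)
    limit = int(limit)
    return (num_inputs <= limit, num_outputs <= limit)


no_config = counters_enabled({}, 30, 30)
over_input_limit = counters_enabled({"pig.counter.limit": "20"}, 21, 5)
switched_off = counters_enabled({"pig.disable.counters": "true"}, 1, 1)
```

Option 1 treats inputs and outputs independently, so a job with many inputs but few outputs keeps its output counters; option 2 is a single switch.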
[jira] [Commented] (PIG-2125) Make Pig work with hadoop .NEXT
[ https://issues.apache.org/jira/browse/PIG-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068739#comment-13068739 ] Richard Ding commented on PIG-2125: --- +1 > Make Pig work with hadoop .NEXT > --- > > Key: PIG-2125 > URL: https://issues.apache.org/jira/browse/PIG-2125 > Project: Pig > Issue Type: New Feature > Components: impl >Affects Versions: 0.10 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.10 > > Attachments: PIG-2125-1.patch, PIG-2125-2.patch, PIG-2125-3.patch, > PIG-2125-4.patch, PIG-2125-5.patch > > > We need to make Pig work with hadoop .NEXT, the svn branch currently is: > https://svn.apache.org/repos/asf/hadoop/common/branches/MR-279 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2141) Do not bundle apache commons jars with pig-withouthadoop.jar
[ https://issues.apache.org/jira/browse/PIG-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2141: -- Description: These jars are already available with hadoop installation. (was: This jars are already available with hadoop installation. ) > Do not bundle apache commons jars with pig-withouthadoop.jar > > > Key: PIG-2141 > URL: https://issues.apache.org/jira/browse/PIG-2141 > Project: Pig > Issue Type: Bug > Components: build >Affects Versions: site, 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2141.patch > > > These jars are already available with hadoop installation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2141) Do not bundle apache commons jars with pig-withouthadoop.jar
[ https://issues.apache.org/jira/browse/PIG-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2141: -- Attachment: PIG-2141.patch > Do not bundle apache commons jars with pig-withouthadoop.jar > > > Key: PIG-2141 > URL: https://issues.apache.org/jira/browse/PIG-2141 > Project: Pig > Issue Type: Bug > Components: build >Affects Versions: site, 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2141.patch > > > This jars are already available with hadoop installation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2141) Do not bundle apache commons jars with pig-withouthadoop.jar
Do not bundle apache commons jars with pig-withouthadoop.jar Key: PIG-2141 URL: https://issues.apache.org/jira/browse/PIG-2141 Project: Pig Issue Type: Bug Components: build Affects Versions: site, 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 This jars are already available with hadoop installation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2083) bincond ERROR 1025: Invalid field projection when null is used
[ https://issues.apache.org/jira/browse/PIG-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039242#comment-13039242 ] Richard Ding commented on PIG-2083: --- +1 > bincond ERROR 1025: Invalid field projection when null is used > -- > > Key: PIG-2083 > URL: https://issues.apache.org/jira/browse/PIG-2083 > Project: Pig > Issue Type: Bug > Components: build >Affects Versions: 0.9.0 > Environment: Linux 2.6.18-53.1.13.el5 #1 SMP Mon Feb 11 13:27:27 EST > 2008 x86_64 x86_64 x86_64 GNU/Linux > Hadoop 0.20.203.3.1104011556 -r 96519d04f65e22ffadf89b225d0d44ef1741d126 > Compiled on Fri Apr 1 16:29:09 PDT 2011 >Reporter: Araceli Henley >Assignee: Thejas M Nair > Fix For: 0.9.0 > > Attachments: PIG-2083.1.patch > > > This is a regression for 9. > a = load '1.txt' as (a0, a1); > b = foreach a generate (a0==0?null:2); > explain b; > ERROR 1025: > Invalid field projection. Projected field [null] does not exist in schema -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2094) Move register command from Grunt Parser to Query Parser
Move register command from Grunt Parser to Query Parser --- Key: PIG-2094 URL: https://issues.apache.org/jira/browse/PIG-2094 Project: Pig Issue Type: Improvement Components: grunt Affects Versions: 0.9.0 Reporter: Richard Ding Fix For: 0.10 Like the define command, the register command should be processed by the Query Parser. This will allow the register command to be used inside macros (since a macro can only contain commands that the Query Parser can process).
[jira] [Resolved] (PIG-2088) Return alias validation failed when there is single line comment in the macro
[ https://issues.apache.org/jira/browse/PIG-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-2088. --- Resolution: Fixed Hadoop Flags: [Reviewed] Patch committed to trunk and 0.9 branch. > Return alias validation failed when there is single line comment in the macro > - > > Key: PIG-2088 > URL: https://issues.apache.org/jira/browse/PIG-2088 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2088.patch
>
> The following script
> {code}
> define test() returns b {
>    a = load 'data' as (name, age, gpa);
>    -- message
>    $b = filter a by (int)age > 40;
> };
> beta = test();
> store beta into 'output';
> {code}
> results in a validation failure:
> {code}
> ERROR 1200 "Macro test missing return alias b"
> {code}
[jira] [Updated] (PIG-2088) Return alias validation failed when there is single line comment in the macro
[ https://issues.apache.org/jira/browse/PIG-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2088: -- Attachment: PIG-2088.patch > Return alias validation failed when there is single line comment in the macro > - > > Key: PIG-2088 > URL: https://issues.apache.org/jira/browse/PIG-2088 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2088.patch
>
> The following script
> {code}
> define test() returns b {
>    a = load 'data' as (name, age, gpa);
>    -- message
>    $b = filter a by (int)age > 40;
> };
> beta = test();
> store beta into 'output';
> {code}
> results in a validation failure:
> {code}
> ERROR 1200 "Macro test missing return alias b"
> {code}
[jira] [Created] (PIG-2088) Return alias validation failed when there is single line comment in the macro
Return alias validation failed when there is single line comment in the macro - Key: PIG-2088 URL: https://issues.apache.org/jira/browse/PIG-2088 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Attachments: PIG-2088.patch

The following script
{code}
define test() returns b {
   a = load 'data' as (name, age, gpa);
   -- message
   $b = filter a by (int)age > 40;
};
beta = test();
store beta into 'output';
{code}
results in a validation failure:
{code}
ERROR 1200 "Macro test missing return alias b"
{code}
[jira] [Commented] (PIG-2084) pig is running validation for a statement at a time batch mode, instead of running it for whole script
[ https://issues.apache.org/jira/browse/PIG-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038097#comment-13038097 ] Richard Ding commented on PIG-2084: --- +1 > pig is running validation for a statement at a time batch mode, instead of > running it for whole script > -- > > Key: PIG-2084 > URL: https://issues.apache.org/jira/browse/PIG-2084 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.9.0 > > Attachments: PIG-2084.1.patch > > > In PIG-2059, a change was made to run validation for each statement instead > of running it once for the whole script. > This slows down the validation phase, and it ends up taking tens of seconds. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-1824. --- Resolution: Fixed Hadoop Flags: [Reviewed] patch committed to trunk. Thanks Woody! > Support import modules in Jython UDF > > > Key: PIG-1824 > URL: https://issues.apache.org/jira/browse/PIG-1824 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.8.0, 0.9.0 >Reporter: Richard Ding >Assignee: Woody Anderson > Fix For: 0.10 > > Attachments: 1824.patch, 1824_final.patch, 1824a.patch, 1824b.patch, > 1824c.patch, 1824d.patch, 1824x.patch, > TEST-org.apache.pig.test.TestGrunt.txt, > TEST-org.apache.pig.test.TestScriptLanguage.txt, > TEST-org.apache.pig.test.TestScriptUDF.txt > > > Currently, Jython UDF script doesn't support Jython import statement as in > the following example: > {code} > #!/usr/bin/python > import re > @outputSchema("word:chararray") > def resplit(content, regex, index): > return re.compile(regex).split(content)[index] > {code} > Can Pig automatically locate the Jython module file and ship it to the > backend? Or should we add a ship clause to let user explicitly specify the > module to ship? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037039#comment-13037039 ] Richard Ding commented on PIG-1824: --- Patch passed e2e python tests. > Support import modules in Jython UDF > > > Key: PIG-1824 > URL: https://issues.apache.org/jira/browse/PIG-1824 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.8.0, 0.9.0 >Reporter: Richard Ding >Assignee: Woody Anderson > Fix For: 0.10 > > Attachments: 1824.patch, 1824_final.patch, 1824a.patch, 1824b.patch, > 1824c.patch, 1824d.patch, 1824x.patch, > TEST-org.apache.pig.test.TestGrunt.txt, > TEST-org.apache.pig.test.TestScriptLanguage.txt, > TEST-org.apache.pig.test.TestScriptUDF.txt > > > Currently, Jython UDF script doesn't support Jython import statement as in > the following example: > {code} > #!/usr/bin/python > import re > @outputSchema("word:chararray") > def resplit(content, regex, index): > return re.compile(regex).split(content)[index] > {code} > Can Pig automatically locate the Jython module file and ship it to the > backend? Or should we add a ship clause to let user explicitly specify the > module to ship? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
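For reference, the UDF from the PIG-1824 description runs as plain Python once the decorator is available. In the sketch below `outputSchema` is a local stand-in (an assumption for illustration) that only records the schema string; in a real Jython UDF the decorator is supplied by Pig, and the sample input is invented.

```python
import re


def outputSchema(schema):
    # Stand-in for the decorator Pig supplies to Jython UDFs; here it
    # only attaches the schema string to the function.
    def decorate(func):
        func.outputSchema = schema
        return func
    return decorate


@outputSchema("word:chararray")
def resplit(content, regex, index):
    # Split `content` on `regex` and return the piece at `index`.
    return re.compile(regex).split(content)[index]


word = resplit("alpha,beta;gamma", "[,;]", 1)
```

The `import re` line is what the ticket is about: the standard-library module must be resolvable wherever the UDF executes, which is why the question of locating and shipping modules to the backend arises.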
[jira] [Resolved] (PIG-2029) Inconsistency in Pig Stats reports
[ https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-2029. --- Resolution: Fixed Hadoop Flags: [Reviewed] > Inconsistency in Pig Stats reports > --- > > Key: PIG-2029 > URL: https://issues.apache.org/jira/browse/PIG-2029 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.8.1, 0.9.0 >Reporter: Viraj Bhat >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2029.patch > > > I have a Pig script which reports varying Stats for the same M/R job (same > inputs). Sometimes the PigStats reports all the stats (such as > Maps,Reduces,MaxMapTime,MinMapTime,AvgMapTime,MaxReduceTime, MinReduceTime > and AvgReduceTime) for the M/R job as 0. Sometimes it reports it correctly. > Enclosed are the stderr logs for 2 runs, you can notice that for Run 1 > job_201103091134_556600 from Run 1; has 0 against all the columns whereas in > Run 2, Hadoop job job_201104272229_75693 has some valid values. > The actual Job Tracker link shows that they are non empty. This points to a > bug in the interaction of the PigStats module with the Jobtracker. 
> Run 1: > {quote} > Job Stats (time in seconds): > JobId MapsReduces MaxMapTime MinMapTIme AvgMapTime > MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs > job_201103091134_556458 160 100 552 191 368 1257 > 371 392 > IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P >DISTINCT,MULTI_QUERY > job_201103091134_556600 0 0 0 0 0 0 > 0 0 UNION5 MULTI_QUERY,MAP_ONLY/user/viraj/dir,, > job_201103091134_556601 7 100 17 8 14 200 > 15 27 CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER > job_201103091134_556602 0 0 0 0 0 0 > 0 0 CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER > job_201103091134_556603 0 0 0 0 0 0 > 0 0 CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER > job_201103091134_556604 2 100 13 7 10 34 > 13 31 CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER > job_201103091134_556644 0 0 0 0 0 0 > 0 0 ONJOIN15SAMPLER > job_201103091134_556645 0 0 0 0 0 0 > 0 0 ONJOIN25SAMPLER > job_201103091134_556646 0 0 0 0 0 0 > 0 0 ONJOIN3 SAMPLER > job_201103091134_556654 0 0 0 0 0 0 > 0 0 ONJOIN19SAMPLER > job_201103091134_556662 0 0 0 0 0 0 > 0 0 ONJOIN19ORDER_BY,COMBINER > .. 
> {quote} > Run 2: > {quote} > Job Stats (time in seconds): > JobId MapsReduces MaxMapTime MinMapTIme AvgMapTime > MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs > job_201104272229_75503159 100 484 192 353 396 > 308 321 > IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P >DISTINCT,MULTI_QUERY > job_201104272229_7569318 0 31 14 24 0 > 0 UNION5 MULTI_QUERY,MAP_ONLY /user/viraj/dir, > job_201104272229_756947 100 34 13 22 46 > 20 25 CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER > job_201104272229_75695125 100 19 11 15 32 > 18 26 CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER > job_201104272229_756981 100 12 12 12 13 > 9 11 CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER > job_201104272229_757022 100 21 5 13 35 > 22 26 CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER > job_201104272229_757241 1 4 4 4 11 > 11 11 ONJOIN15SAMPLER > job_201104272229_757250 0 0 0 0 0 > 0 ONJOIN25SAMPLER > job_201104272229_757266 1 8 6 8 24 > 24 24 ONJOIN3 SAMPLER > job_201104272229_757290 0 0 0 0 0 > 0 ONJOIN
[jira] [Resolved] (PIG-2081) Dryrun gives wrong line numbers in error message for scripts containing macro.
[ https://issues.apache.org/jira/browse/PIG-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-2081. --- Resolution: Fixed Hadoop Flags: [Reviewed] patch committed to trunk and 0.9 branch > Dryrun gives wrong line numbers in error message for scripts containing macro. > -- > > Key: PIG-2081 > URL: https://issues.apache.org/jira/browse/PIG-2081 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2081.patch > > > For following script (test.pig) > {code} > 1 DEFINE my_macro (X,key) returns Y > 2 { > 3 tmp1 = foreach $X generate TOKENIZE((chararray)$key) as tokens; > 4 tmp2 = foreach tmp1 generate flatten(tokens); > 5 tmp3 = order tmp2 by $0; > 6 $Y = distinct tmp3; > 7 } > 8 > 9 A = load 'sometext' using TextLoader() as (row) ; > 10 E = my_macro(A,row); > 11 > 12 A1 = load 'sometext2' using TextLoader() as (row1); > 13 E1 = my_macro(A1,row1); > 14 > 15 A3 = load 'sometext3' using TextLoader() as (row3); > 16 E3 = my_macro(A3,$0); > 17 > 18 F = cogroup E by $0, E1 by $0,E3 by $0; > 19 dump F; > {code} > pig test.pig gives correct line number in error message: > {code} > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: column 17> mismatched input '$0' expecting set null > {code} > while pig -r test.pig gives incorrect line number in error message: > {code} > ERROR org.apache.pig.Main - ERROR 1200: column 17> mismatched input '$0' expecting set null > {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2081) Dryrun gives wrong line numbers in error message for scripts containing macro.
[ https://issues.apache.org/jira/browse/PIG-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037003#comment-13037003 ] Richard Ding commented on PIG-2081: --- test-patch and unit tests pass. > Dryrun gives wrong line numbers in error message for scripts containing macro. > -- > > Key: PIG-2081 > URL: https://issues.apache.org/jira/browse/PIG-2081 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2081.patch > > > For following script (test.pig) > {code} > 1 DEFINE my_macro (X,key) returns Y > 2 { > 3 tmp1 = foreach $X generate TOKENIZE((chararray)$key) as tokens; > 4 tmp2 = foreach tmp1 generate flatten(tokens); > 5 tmp3 = order tmp2 by $0; > 6 $Y = distinct tmp3; > 7 } > 8 > 9 A = load 'sometext' using TextLoader() as (row) ; > 10 E = my_macro(A,row); > 11 > 12 A1 = load 'sometext2' using TextLoader() as (row1); > 13 E1 = my_macro(A1,row1); > 14 > 15 A3 = load 'sometext3' using TextLoader() as (row3); > 16 E3 = my_macro(A3,$0); > 17 > 18 F = cogroup E by $0, E1 by $0,E3 by $0; > 19 dump F; > {code} > pig test.pig gives correct line number in error message: > {code} > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: column 17> mismatched input '$0' expecting set null > {code} > while pig -r test.pig gives incorrect line number in error message: > {code} > ERROR org.apache.pig.Main - ERROR 1200: column 17> mismatched input '$0' expecting set null > {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2081) Dryrun gives wrong line numbers in error message for scripts containing macro.
[ https://issues.apache.org/jira/browse/PIG-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2081: -- Attachment: PIG-2081.patch > Dryrun gives wrong line numbers in error message for scripts containing macro. > -- > > Key: PIG-2081 > URL: https://issues.apache.org/jira/browse/PIG-2081 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2081.patch > > > For following script (test.pig) > {code} > 1 DEFINE my_macro (X,key) returns Y > 2 { > 3 tmp1 = foreach $X generate TOKENIZE((chararray)$key) as tokens; > 4 tmp2 = foreach tmp1 generate flatten(tokens); > 5 tmp3 = order tmp2 by $0; > 6 $Y = distinct tmp3; > 7 } > 8 > 9 A = load 'sometext' using TextLoader() as (row) ; > 10 E = my_macro(A,row); > 11 > 12 A1 = load 'sometext2' using TextLoader() as (row1); > 13 E1 = my_macro(A1,row1); > 14 > 15 A3 = load 'sometext3' using TextLoader() as (row3); > 16 E3 = my_macro(A3,$0); > 17 > 18 F = cogroup E by $0, E1 by $0,E3 by $0; > 19 dump F; > {code} > pig test.pig gives correct line number in error message: > {code} > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: column 17> mismatched input '$0' expecting set null > {code} > while pig -r test.pig gives incorrect line number in error message: > {code} > ERROR org.apache.pig.Main - ERROR 1200: column 17> mismatched input '$0' expecting set null > {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2029) Inconsistency in Pig Stats reports
[ https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036403#comment-13036403 ] Richard Ding commented on PIG-2029: --- Patch committed to trunk and 0.9 branch. > Inconsistency in Pig Stats reports > --- > > Key: PIG-2029 > URL: https://issues.apache.org/jira/browse/PIG-2029 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.8.1, 0.9.0 >Reporter: Viraj Bhat >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2029.patch > > > I have a Pig script which reports varying Stats for the same M/R job (same > inputs). Sometimes the PigStats reports all the stats (such as > Maps,Reduces,MaxMapTime,MinMapTime,AvgMapTime,MaxReduceTime, MinReduceTime > and AvgReduceTime) for the M/R job as 0. Sometimes it reports it correctly. > Enclosed are the stderr logs for 2 runs, you can notice that for Run 1 > job_201103091134_556600 from Run 1; has 0 against all the columns whereas in > Run 2, Hadoop job job_201104272229_75693 has some valid values. > The actual Job Tracker link shows that they are non empty. This points to a > bug in the interaction of the PigStats module with the Jobtracker. 
> Run 1: > {quote} > Job Stats (time in seconds): > JobId MapsReduces MaxMapTime MinMapTIme AvgMapTime > MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs > job_201103091134_556458 160 100 552 191 368 1257 > 371 392 > IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P >DISTINCT,MULTI_QUERY > job_201103091134_556600 0 0 0 0 0 0 > 0 0 UNION5 MULTI_QUERY,MAP_ONLY/user/viraj/dir,, > job_201103091134_556601 7 100 17 8 14 200 > 15 27 CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER > job_201103091134_556602 0 0 0 0 0 0 > 0 0 CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER > job_201103091134_556603 0 0 0 0 0 0 > 0 0 CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER > job_201103091134_556604 2 100 13 7 10 34 > 13 31 CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER > job_201103091134_556644 0 0 0 0 0 0 > 0 0 ONJOIN15SAMPLER > job_201103091134_556645 0 0 0 0 0 0 > 0 0 ONJOIN25SAMPLER > job_201103091134_556646 0 0 0 0 0 0 > 0 0 ONJOIN3 SAMPLER > job_201103091134_556654 0 0 0 0 0 0 > 0 0 ONJOIN19SAMPLER > job_201103091134_556662 0 0 0 0 0 0 > 0 0 ONJOIN19ORDER_BY,COMBINER > .. 
> {quote} > Run 2: > {quote} > Job Stats (time in seconds): > JobId MapsReduces MaxMapTime MinMapTIme AvgMapTime > MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs > job_201104272229_75503159 100 484 192 353 396 > 308 321 > IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P >DISTINCT,MULTI_QUERY > job_201104272229_7569318 0 31 14 24 0 > 0 UNION5 MULTI_QUERY,MAP_ONLY /user/viraj/dir, > job_201104272229_756947 100 34 13 22 46 > 20 25 CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER > job_201104272229_75695125 100 19 11 15 32 > 18 26 CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER > job_201104272229_756981 100 12 12 12 13 > 9 11 CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER > job_201104272229_757022 100 21 5 13 35 > 22 26 CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER > job_201104272229_757241 1 4 4 4 11 > 11 11 ONJOIN15SAMPLER > job_201104272229_757250 0 0 0 0 0 > 0 ONJOIN25SAMPLER > job_201104272229_757266 1 8 6 8 24 > 24 24 ONJOIN3 SAMPLER > job_201104272229_757290 0 0 0 0
[jira] [Created] (PIG-2081) Dryrun gives wrong line numbers in error message for scripts containing macro.
Dryrun gives wrong line numbers in error message for scripts containing macro.
------------------------------------------------------------------------------

                 Key: PIG-2081
                 URL: https://issues.apache.org/jira/browse/PIG-2081
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.9.0
            Reporter: Richard Ding
            Assignee: Richard Ding
             Fix For: 0.9.0

For the following script (test.pig):
{code}
1 DEFINE my_macro (X,key) returns Y
2 {
3 tmp1 = foreach $X generate TOKENIZE((chararray)$key) as tokens;
4 tmp2 = foreach tmp1 generate flatten(tokens);
5 tmp3 = order tmp2 by $0;
6 $Y = distinct tmp3;
7 }
8
9 A = load 'sometext' using TextLoader() as (row) ;
10 E = my_macro(A,row);
11
12 A1 = load 'sometext2' using TextLoader() as (row1);
13 E1 = my_macro(A1,row1);
14
15 A3 = load 'sometext3' using TextLoader() as (row3);
16 E3 = my_macro(A3,$0);
17
18 F = cogroup E by $0, E1 by $0,E3 by $0;
19 dump F;
{code}
pig test.pig gives the correct line number in the error message:
{code}
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: mismatched input '$0' expecting set null
{code}
while pig -r test.pig gives an incorrect line number in the error message:
{code}
ERROR org.apache.pig.Main - ERROR 1200: mismatched input '$0' expecting set null
{code}

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035542#comment-13035542 ] Richard Ding commented on PIG-1824: --- The new patch fixed the unit test errors reported earlier. I have one (different) failed test in TestGrunt, not sure if it's related to the patch. > Support import modules in Jython UDF > > > Key: PIG-1824 > URL: https://issues.apache.org/jira/browse/PIG-1824 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.8.0, 0.9.0 >Reporter: Richard Ding >Assignee: Woody Anderson > Fix For: 0.10 > > Attachments: 1824.patch, 1824_final.patch, 1824a.patch, 1824b.patch, > 1824c.patch, 1824d.patch, 1824x.patch, > TEST-org.apache.pig.test.TestGrunt.txt, > TEST-org.apache.pig.test.TestScriptLanguage.txt, > TEST-org.apache.pig.test.TestScriptUDF.txt > > > Currently, Jython UDF script doesn't support Jython import statement as in > the following example: > {code} > #!/usr/bin/python > import re > @outputSchema("word:chararray") > def resplit(content, regex, index): > return re.compile(regex).split(content)[index] > {code} > Can Pig automatically locate the Jython module file and ship it to the > backend? Or should we add a ship clause to let user explicitly specify the > module to ship? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2029) Inconsistency in Pig Stats reports
[ https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2029: -- Attachment: PIG-2029.patch > Inconsistency in Pig Stats reports > --- > > Key: PIG-2029 > URL: https://issues.apache.org/jira/browse/PIG-2029 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.8.1, 0.9.0 >Reporter: Viraj Bhat >Assignee: Richard Ding > Fix For: 0.10 > > Attachments: PIG-2029.patch > > > I have a Pig script which reports varying Stats for the same M/R job (same > inputs). Sometimes the PigStats reports all the stats (such as > Maps,Reduces,MaxMapTime,MinMapTime,AvgMapTime,MaxReduceTime, MinReduceTime > and AvgReduceTime) for the M/R job as 0. Sometimes it reports it correctly. > Enclosed are the stderr logs for 2 runs, you can notice that for Run 1 > job_201103091134_556600 from Run 1; has 0 against all the columns whereas in > Run 2, Hadoop job job_201104272229_75693 has some valid values. > The actual Job Tracker link shows that they are non empty. This points to a > bug in the interaction of the PigStats module with the Jobtracker. 
> Run 1: > {quote} > Job Stats (time in seconds): > JobId MapsReduces MaxMapTime MinMapTIme AvgMapTime > MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs > job_201103091134_556458 160 100 552 191 368 1257 > 371 392 > IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P >DISTINCT,MULTI_QUERY > job_201103091134_556600 0 0 0 0 0 0 > 0 0 UNION5 MULTI_QUERY,MAP_ONLY/user/viraj/dir,, > job_201103091134_556601 7 100 17 8 14 200 > 15 27 CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER > job_201103091134_556602 0 0 0 0 0 0 > 0 0 CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER > job_201103091134_556603 0 0 0 0 0 0 > 0 0 CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER > job_201103091134_556604 2 100 13 7 10 34 > 13 31 CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER > job_201103091134_556644 0 0 0 0 0 0 > 0 0 ONJOIN15SAMPLER > job_201103091134_556645 0 0 0 0 0 0 > 0 0 ONJOIN25SAMPLER > job_201103091134_556646 0 0 0 0 0 0 > 0 0 ONJOIN3 SAMPLER > job_201103091134_556654 0 0 0 0 0 0 > 0 0 ONJOIN19SAMPLER > job_201103091134_556662 0 0 0 0 0 0 > 0 0 ONJOIN19ORDER_BY,COMBINER > .. 
> {quote} > Run 2: > {quote} > Job Stats (time in seconds): > JobId MapsReduces MaxMapTime MinMapTIme AvgMapTime > MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs > job_201104272229_75503159 100 484 192 353 396 > 308 321 > IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P >DISTINCT,MULTI_QUERY > job_201104272229_7569318 0 31 14 24 0 > 0 UNION5 MULTI_QUERY,MAP_ONLY /user/viraj/dir, > job_201104272229_756947 100 34 13 22 46 > 20 25 CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER > job_201104272229_75695125 100 19 11 15 32 > 18 26 CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER > job_201104272229_756981 100 12 12 12 13 > 9 11 CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER > job_201104272229_757022 100 21 5 13 35 > 22 26 CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER > job_201104272229_757241 1 4 4 4 11 > 11 11 ONJOIN15SAMPLER > job_201104272229_757250 0 0 0 0 0 > 0 ONJOIN25SAMPLER > job_201104272229_757266 1 8 6 8 24 > 24 24 ONJOIN3 SAMPLER > job_201104272229_757290 0 0 0 0 0 > 0 ONJOIN19SAMPLER > job_2011
[jira] [Commented] (PIG-2029) Inconsistency in Pig Stats reports
[ https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035010#comment-13035010 ] Richard Ding commented on PIG-2029: --- Currently Pig prints out zero (0) if max/min/avg map/reduce time isn't available by querying hadoop using hadoop client API. This is misleading. I propose that we change those values to 'n/a' as following: {code} Job Stats (time in seconds): JobId MapsReduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs job_201104272229_434232 2 10 354 220 287 168 149 163 IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P DISTINCT,MULTI_QUERY job_201104272229_434319 2 0 9 3 6 0 0 0 UNION5 MULTI_QUERY,MAP_ONLY/user/rding/verifypigstats2-UNION5, job_201104272229_434320 2 10 n/a n/a n/a n/a n/a n/a CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER job_201104272229_434321 1 10 5 5 5 23 9 17 CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER job_201104272229_434322 2 10 n/a n/a n/a n/a n/a n/a CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER job_201104272229_434323 2 10 n/a n/a n/a n/a n/a n/a CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER job_201104272229_434331 2 1 n/a n/a n/a n/a n/a n/a ONJOIN15SAMPLER job_201104272229_434332 2 1 n/a n/a n/a n/a n/a n/a ONJOIN3 SAMPLER job_201104272229_434333 1 1 2 2 2 13 13 13 ONJOIN25SAMPLER job_201104272229_434334 1 1 1 1 1 12 12 12 ONJOIN19SAMPLER job_201104272229_434342 1 10 2 2 2 16 8 11 ONJOIN25ORDER_BY,COMBINER {code} > Inconsistency in Pig Stats reports > --- > > Key: PIG-2029 > URL: https://issues.apache.org/jira/browse/PIG-2029 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.8.1, 0.9.0 >Reporter: Viraj Bhat >Assignee: Richard Ding > Fix For: 0.10 > > > I have a Pig script which reports varying Stats for the same M/R job (same > inputs). 
Sometimes the PigStats reports all the stats (such as > Maps,Reduces,MaxMapTime,MinMapTime,AvgMapTime,MaxReduceTime, MinReduceTime > and AvgReduceTime) for the M/R job as 0. Sometimes it reports it correctly. > Enclosed are the stderr logs for 2 runs, you can notice that for Run 1 > job_201103091134_556600 from Run 1; has 0 against all the columns whereas in > Run 2, Hadoop job job_201104272229_75693 has some valid values. > The actual Job Tracker link shows that they are non empty. This points to a > bug in the interaction of the PigStats module with the Jobtracker. > Run 1: > {quote} > Job Stats (time in seconds): > JobId MapsReduces MaxMapTime MinMapTIme AvgMapTime > MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs > job_201103091134_556458 160 100 552 191 368 1257 > 371 392 > IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P >DISTINCT,MULTI_QUERY > job_201103091134_556600 0 0 0 0 0 0 > 0 0 UNION5 MULTI_QUERY,MAP_ONLY/user/viraj/dir,, > job_201103091134_556601 7 100 17 8 14 200 > 15 27 CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER > job_201103091134_556602 0 0 0 0 0 0 > 0 0 CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER > job_201103091134_556603 0 0 0 0 0 0 > 0 0 CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER > job_201103091134_556604 2 100 13 7 10 34 > 13 31 CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER > job_201103091134_556644 0 0 0 0 0 0 > 0 0 ONJOIN15SAMPLER > job_201103091134_556645 0 0 0 0 0 0 > 0 0 ONJOIN25SAMPLER > job_201103091134_556646 0 0 0 0 0 0 > 0 0 ONJOIN3 SAMPLER > job_201103091134_556654 0 0 0 0
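The 'n/a' proposal in the comment above can be sketched as follows. This is a minimal illustration, not Pig's actual code (Pig is written in Java): `format_stat`, `format_row`, and the job ids are hypothetical names, and the sketch assumes an unavailable timing value is represented as `None`.

```python
def format_stat(value):
    """Render a timing value, using 'n/a' when it could not be obtained."""
    return "n/a" if value is None else str(value)


def format_row(job_id, maps, reduces, times):
    """Join one job's columns with tabs; `times` holds the six max/min/avg map/reduce values."""
    columns = [job_id, str(maps), str(reduces)] + [format_stat(t) for t in times]
    return "\t".join(columns)


# One row with timings available, one where the query returned nothing.
known = format_row("job_A", 1, 10, [5, 5, 5, 23, 9, 17])
unknown = format_row("job_B", 2, 10, [None] * 6)
```

Keeping "unavailable" separate from the number zero is the point of the proposal: a map-only job can still report a genuine 0 for its reduce times, while a job whose timings could not be fetched shows 'n/a'.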
[jira] [Commented] (PIG-2070) "Unknown" appears in error message for an error case
[ https://issues.apache.org/jira/browse/PIG-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034135#comment-13034135 ]

Richard Ding commented on PIG-2070:
-----------------------------------

+1

> "Unknown" appears in error message for an error case
> ----------------------------------------------------
>
>                 Key: PIG-2070
>                 URL: https://issues.apache.org/jira/browse/PIG-2070
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Xuefu Zhang
>            Assignee: Thejas M Nair
>             Fix For: 0.9.0
>
>         Attachments: PIG-2070.1.patch
>
>
> For the following query:
> a = load '1.txt' as (a0:int, a1:int);
> b = load '2.txt' as (a0:int, a1:chararray);
> c = cogroup a by (a0,a1), b by (a0,a1);
> Pig gives the following message, which includes the word "unknown":
> 2011-05-13 11:01:18,682 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1051: Cannot cast to Unknown
> The error message should be more meaningful.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2069) LoadFunc jar does not ship to backend in MultiQuery case
[ https://issues.apache.org/jira/browse/PIG-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding resolved PIG-2069.
------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

Unit tests pass. Patch committed to trunk and 0.9 branch.

> LoadFunc jar does not ship to backend in MultiQuery case
> --------------------------------------------------------
>
>                 Key: PIG-2069
>                 URL: https://issues.apache.org/jira/browse/PIG-2069
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Daniel Dai
>            Assignee: Richard Ding
>             Fix For: 0.9.0
>
>         Attachments: PIG-2069.patch
>
>
> Pig is able to automatically figure out the jar containing the LoadFunc and
> ship it to the backend. However, for the following script it did not:
> {code}
> A = load '1.txt' using SomeLoadFunc();
> B = filter A by $0==0;
> C = filter A by $1==1;
> D = join B by $0, C by $0;
> dump D;
> {code}
> The reason is that this query is a multiquery (A is reused, which creates an
> implicit split). When we merged the multiquery into one job, we didn't merge
> the UDF lists properly.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2076) update documentation, help command with correct default value of pig.cachedbag.memusage
[ https://issues.apache.org/jira/browse/PIG-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033394#comment-13033394 ]

Richard Ding commented on PIG-2076:
-----------------------------------

+1

> update documentation, help command with correct default value of pig.cachedbag.memusage
> ---------------------------------------------------------------------------------------
>
>                 Key: PIG-2076
>                 URL: https://issues.apache.org/jira/browse/PIG-2076
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.9.0
>
>         Attachments: PIG-2076.1.patch
>
>
> The default value of pig.cachedbag.memusage was changed to 0.2 in Pig 0.8, as
> part of the changes in PIG-1447.
> But the help command and the documentation still show the older default value of 0.1.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2069) LoadFunc jar does not ship to backend in MultiQuery case
[ https://issues.apache.org/jira/browse/PIG-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2069: -- Attachment: PIG-2069.patch This happens when the original MapReduce DAG (before optimization) contains a diamond node. User can workaround this by explicitly registering the LoadFunc jar in the script. The attached patch provides a fix. It's verified with manual test. > LoadFunc jar does not ship to backend in MultiQuery case > > > Key: PIG-2069 > URL: https://issues.apache.org/jira/browse/PIG-2069 > Project: Pig > Issue Type: Bug >Affects Versions: 0.8.1, 0.9.0 >Reporter: Daniel Dai >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2069.patch > > > Pig is able to automatically figure out the jar containing the LoadFunc and > ship them to backend. However, the following script didn't: > {code} > A = load '1.txt' using SomeLoadFunc(); > B = filter A by $0==0; > C = filter A by $1==1; > D = join B by $0, C by $0; > dump D; > {code} > The reason is this query is a multiquery (A is reused and thus create an > implicit split). When we merge multiquery into one job, we didn't merge udfs > list properly. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
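The failure mode described above (two jobs merged by the multiquery optimizer, with one job's UDF list lost along the way) can be pictured with a small sketch. It is an illustration under stated assumptions, not the attached patch: the `Job` class, `merge_jobs`, and `SomeOtherUdf` are hypothetical names, and Pig's real plan classes are Java.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical stand-in for a MapReduce job in the plan."""
    name: str
    udfs: list = field(default_factory=list)


def merge_jobs(target, other):
    """Fold `other` into `target`, keeping every UDF class either job needs."""
    merged = list(target.udfs)
    for udf in other.udfs:
        if udf not in merged:  # avoid shipping the same class twice
            merged.append(udf)
    return Job(name=target.name, udfs=merged)


splitter = Job("split-A", ["SomeLoadFunc"])
branch = Job("filter-join", ["SomeOtherUdf"])  # hypothetical UDF name
combined = merge_jobs(branch, splitter)
```

If the merge kept only `target.udfs`, `SomeLoadFunc` would drop out of the combined job, which matches the symptom in the report: the jar containing the LoadFunc is never shipped unless the user registers it explicitly.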
[jira] [Commented] (PIG-2067) FilterLogicExpressionSimplifier removed some branches in some cases
[ https://issues.apache.org/jira/browse/PIG-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033151#comment-13033151 ]

Richard Ding commented on PIG-2067:
-----------------------------------

+1

> FilterLogicExpressionSimplifier removed some branches in some cases
> -------------------------------------------------------------------
>
>                 Key: PIG-2067
>                 URL: https://issues.apache.org/jira/browse/PIG-2067
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.8.1, 0.9.0
>
>         Attachments: PIG-2067-1-0.8.patch, PIG-2067-1.patch
>
>
> The following script produces the wrong result:
> {code}
> A = load 'a.dat' as (cookie);
> B = load 'b.dat' as (cookie);
> C = cogroup A by cookie, B by cookie;
> E = filter C by COUNT(B)>0 AND COUNT(A)>0;
> explain E;
> {code}
> a.dat:
> 1 1
> 2 2
> 3 3
> 4 4
> 5 5
> 6 6
> 7 7
> b.dat:
> 3 3
> 4 4
> 5 5
> 6 6
> 7 7
> 8 8
> Expected output:
> (3,{(3)},{(3)})
> (4,{(4)},{(4)})
> (5,{(5)},{(5)})
> (6,{(6)},{(6)})
> (7,{(7)},{(7)})
> We get:
> (3,{(3)},{(3)})
> (4,{(4)},{(4)})
> (5,{(5)},{(5)})
> (6,{(6)},{(6)})
> (7,{(7)},{(7)})
> (8,{},{(8)})

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
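To make the expected and reported outputs above easier to check by hand, here is a small emulation of the cogroup and filter in plain Python. It is a sketch only: `cogroup` is a hypothetical helper rather than a Pig API, and the input values (cookies 1 to 7 in a.dat, 3 to 8 in b.dat, one field per record) are assumptions inferred from the expected output.

```python
def cogroup(a, b):
    """Group two lists of single-field records by value, like `cogroup A by cookie, B by cookie`."""
    keys = sorted(set(a) | set(b))
    return [(k, [x for x in a if x == k], [x for x in b if x == k]) for k in keys]


a_cookies = [1, 2, 3, 4, 5, 6, 7]  # assumed contents of a.dat
b_cookies = [3, 4, 5, 6, 7, 8]     # assumed contents of b.dat

c = cogroup(a_cookies, b_cookies)

# Both conjuncts must hold: COUNT(B) > 0 AND COUNT(A) > 0.
correct = [row for row in c if len(row[2]) > 0 and len(row[1]) > 0]

# Evaluating only COUNT(B) > 0 lets through a row whose A bag is empty.
with_branch_dropped = [row for row in c if len(row[2]) > 0]
```

With both conditions applied, only keys 3 through 7 survive. The extra row for key 8 in the report, with an empty A bag, is consistent with the `COUNT(A)>0` branch having been removed by the simplifier.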
[jira] [Updated] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason
[ https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1827: -- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to trunk and 0.9 branch. > When passing a parameter to Pig, if the value contains $ it has to be escaped > for no apparent reason > > > Key: PIG-1827 > URL: https://issues.apache.org/jira/browse/PIG-1827 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.8.0 >Reporter: Julien Le Dem >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-1827-1.patch, PIG-1827_2.patch, PIG-1827_3.patch > > -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-1819) For implicit binding, Jython embedded Pig should skip any variable/value that contains $.
[ https://issues.apache.org/jira/browse/PIG-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding resolved PIG-1819.
------------------------------

    Resolution: Fixed

This is fixed per PIG-1827.

> For implicit binding, Jython embedded Pig should skip any variable/value that contains $.
> -----------------------------------------------------------------------------------------
>
>                 Key: PIG-1819
>                 URL: https://issues.apache.org/jira/browse/PIG-1819
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>             Fix For: 0.9.0
>
>         Attachments: PIG-1819.patch, PIG-1819_1.patch, PIG-1819_2.patch
>
>
> We use Pig parameter substitution for the bindings, so a variable/value that
> contains $ cannot be used.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2056) Jython error messages should show script name
[ https://issues.apache.org/jira/browse/PIG-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding resolved PIG-2056.
------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

Unit tests pass. Patch committed to trunk and 0.9 branch.

> Jython error messages should show script name
> ---------------------------------------------
>
>                 Key: PIG-2056
>                 URL: https://issues.apache.org/jira/browse/PIG-2056
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.9.0
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: PIG-2056.patch
>
>
> Instead of messages like
> {code}
> Traceback (most recent call last):
> File "", line 12, in
> {code}
> It should display the script file name:
> {code}
> Traceback (most recent call last):
> File "test.py", line 12, in
> {code}

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
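The behaviour requested in the issue above can be reproduced in plain Python, where the file name shown in a traceback is whatever name was handed to `compile()`. The sketch below is not the Pig patch: `run_script` is a hypothetical helper, and `<string>` is used as the placeholder name purely for illustration.

```python
import traceback


def run_script(source, filename="<string>"):
    """Compile and run `source`; return the formatted traceback on failure, else None."""
    try:
        code = compile(source, filename, "exec")
        exec(code, {"__name__": "__main__"})
    except Exception:
        return traceback.format_exc()
    return None


# Line 3 divides by zero, so both runs fail at the same place.
SCRIPT = "x = 1\ny = 0\nprint(x / y)\n"

anonymous = run_script(SCRIPT)         # traceback names the placeholder
named = run_script(SCRIPT, "test.py")  # traceback names the script file
```

Passing the script's real path as the second argument to `compile()` is enough for the traceback to point at `test.py`, which is the kind of message the issue asks for.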
[jira] [Commented] (PIG-2056) Jython error messages should show script name
[ https://issues.apache.org/jira/browse/PIG-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031911#comment-13031911 ]

Richard Ding commented on PIG-2056:
-----------------------------------

Result of test-patch:
{code}
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
{code}

> Jython error messages should show script name
> ---------------------------------------------
>
>                 Key: PIG-2056
>                 URL: https://issues.apache.org/jira/browse/PIG-2056
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.9.0
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: PIG-2056.patch
>
>
> Instead of messages like
> {code}
> Traceback (most recent call last):
> File "", line 12, in
> {code}
> It should display the script file name:
> {code}
> Traceback (most recent call last):
> File "test.py", line 12, in
> {code}

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2058) Macro missing returns clause doesn't give a good error message
[ https://issues.apache.org/jira/browse/PIG-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-2058. --- Resolution: Fixed Hadoop Flags: [Reviewed] Patch committed to trunk and 0.9 branch. > Macro missing returns clause doesn't give a good error message > -- > > Key: PIG-2058 > URL: https://issues.apache.org/jira/browse/PIG-2058 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Xuefu Zhang >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2058.patch > > > For the following query: > define test( out1,out2 ){ >A = load 'x' as (u:int, v:int); >$B = filter A by u < 3 and v < 20; > } > Pig gives the following error message: Syntax error,unexpected symbol at or > near '{' > Previously, it gives: mismatched input '{' expecting RETURNS > The previous message is more meaningful. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2058) Macro missing returns clause doesn't give a good error message
[ https://issues.apache.org/jira/browse/PIG-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2058: -- Attachment: PIG-2058.patch Thanks Xuefu. Attaching a patch with the fix. > Macro missing returns clause doesn't give a good error message > -- > > Key: PIG-2058 > URL: https://issues.apache.org/jira/browse/PIG-2058 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Xuefu Zhang >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2058.patch > > > For the following query: > define test( out1,out2 ){ >A = load 'x' as (u:int, v:int); >$B = filter A by u < 3 and v < 20; > } > Pig gives the following error message: Syntax error,unexpected symbol at or > near '{' > Previously, it gives: mismatched input '{' expecting RETURNS > The previous message is more meaningful. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro
[ https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-2035. --- Resolution: Fixed Hadoop Flags: [Reviewed] Patch committed to trunk and 0.9 branch. > Macro expansion doesn't handle multiple expansions of same macro inside > another macro > - > > Key: PIG-2035 > URL: https://issues.apache.org/jira/browse/PIG-2035 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2035_1.patch > > > Here is the use case: > {code} > define test ( in, out, x ) returns c { > a = load '$in' as (name, age, gpa); > b = group a by gpa; > $c = foreach b generate group, COUNT(a.$x); > store $c into '$out'; > }; > define test2( in, out ) returns x { > $x = test( '$in', '$out', 'name' ); > $x = test( '$in', '$out.1', 'age' ); > $x = test( '$in', '$out.2', 'gpa' ); > }; > x = test2('studenttab10k', 'myoutput'); > {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2056) Jython error messages should show script name
[ https://issues.apache.org/jira/browse/PIG-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2056: -- Attachment: PIG-2056.patch > Jython error messages should show script name > - > > Key: PIG-2056 > URL: https://issues.apache.org/jira/browse/PIG-2056 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding >Priority: Minor > Fix For: 0.9.0 > > Attachments: PIG-2056.patch > > > Instead of messages like > {code} > Traceback (most recent call last): > File "", line 12, in > {code} > It should display the script file name: > {code} > Traceback (most recent call last): > File "test.py", line 12, in > {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2056) Jython error messages should show script name
Jython error messages should show script name - Key: PIG-2056 URL: https://issues.apache.org/jira/browse/PIG-2056 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Priority: Minor Fix For: 0.9.0 Instead of messages like {code} Traceback (most recent call last): File "", line 12, in {code} It should display the script file name: {code} Traceback (most recent call last): File "test.py", line 12, in {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason
[ https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030972#comment-13030972 ] Richard Ding commented on PIG-1827: --- New patch added a unit test case as suggested. > When passing a parameter to Pig, if the value contains $ it has to be escaped > for no apparent reason > > > Key: PIG-1827 > URL: https://issues.apache.org/jira/browse/PIG-1827 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.8.0 >Reporter: Julien Le Dem >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-1827-1.patch, PIG-1827_2.patch, PIG-1827_3.patch > > -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason
[ https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1827: -- Attachment: PIG-1827_3.patch We should limit this jira to fixing the issue in embedded Pig (i.e. working around the general parameter substitution) and revisit the parameter substitution parser and related code in a separate jira. > When passing a parameter to Pig, if the value contains $ it has to be escaped > for no apparent reason > > > Key: PIG-1827 > URL: https://issues.apache.org/jira/browse/PIG-1827 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.8.0 >Reporter: Julien Le Dem >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-1827-1.patch, PIG-1827_2.patch, PIG-1827_3.patch > > -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2012) Comments at the begining of the file throws off line numbers in errors
[ https://issues.apache.org/jira/browse/PIG-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2012: -- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch 2 committed to trunk and 0.9 branch. > Comments at the begining of the file throws off line numbers in errors > -- > > Key: PIG-2012 > URL: https://issues.apache.org/jira/browse/PIG-2012 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Alan Gates >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2012_1.patch, PIG-2012_2.patch, macro.pig > > > The preprocessor does not appear to be handling leading comments properly > when calculating line numbers for error messages. In the attached script, > the error is reported to be on line 7. It is actually on line 10. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
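The line-number drift described in PIG-2012 can be illustrated with a minimal sketch, assuming the preprocessor removes leading comments before the parser sees the script. One way to keep reported line numbers aligned is to blank out the comment text while keeping every newline. The class and method names below (LeadingCommentStripper, blankLeadingComments, lineOf) are hypothetical and are not taken from the committed patch.

```java
// Hypothetical sketch, not Pig's preprocessor: blanks leading comments
// but preserves newlines so later line numbers do not shift.
final class LeadingCommentStripper {

    // Removes leading "--" line comments and leading block comments,
    // emitting one newline for every newline that was consumed.
    static String blankLeadingComments(String script) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int n = script.length();
        while (i < n) {
            char ch = script.charAt(i);
            if (Character.isWhitespace(ch)) {
                out.append(ch);          // keep whitespace, including newlines
                i++;
            } else if (script.startsWith("--", i)) {
                // Skip the comment text; the newline is kept by the branch above.
                while (i < n && script.charAt(i) != '\n') {
                    i++;
                }
            } else if (script.startsWith("/*", i)) {
                int end = script.indexOf("*/", i + 2);
                int stop = (end < 0) ? n : end + 2;
                for (; i < stop; i++) {
                    if (script.charAt(i) == '\n') {
                        out.append('\n'); // keep line structure inside the comment
                    }
                }
            } else {
                break;                   // first real statement reached
            }
        }
        return out.append(script, i, n).toString();
    }

    // Returns the 1-based line on which token first appears, or -1 if absent.
    static int lineOf(String text, String token) {
        int idx = text.indexOf(token);
        if (idx < 0) {
            return -1;
        }
        int line = 1;
        for (int k = 0; k < idx; k++) {
            if (text.charAt(k) == '\n') {
                line++;
            }
        }
        return line;
    }
}
```

With this approach a statement on line 7 of the original script is still on line 7 after the leading comments are blanked, so an error reported against the processed text points at the right line.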
[jira] [Updated] (PIG-2012) Comments at the begining of the file throws off line numbers in errors
[ https://issues.apache.org/jira/browse/PIG-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2012: -- Attachment: PIG-2012_2.patch Thanks Xuefu. The new patch addresses the review comments. > Comments at the begining of the file throws off line numbers in errors > -- > > Key: PIG-2012 > URL: https://issues.apache.org/jira/browse/PIG-2012 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Alan Gates >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2012_1.patch, PIG-2012_2.patch, macro.pig > > > The preprocessor does not appear to be handling leading comments properly > when calculating line numbers for error messages. In the attached script, > the error is reported to be on line 7. It is actually on line 10. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2033) Pig returns sucess for the failed Pig script
[ https://issues.apache.org/jira/browse/PIG-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-2033. --- Resolution: Fixed Hadoop Flags: [Reviewed] Unit tests pass on 0.8 branch. Patch committed to 0.8 branch, 0.9 branch and trunk. > Pig returns sucess for the failed Pig script > > > Key: PIG-2033 > URL: https://issues.apache.org/jira/browse/PIG-2033 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.8.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.8.1, 0.9.0 > > Attachments: PIG-2033.patch > > > Pig returns success when a Pig script fails but the count of failed MR jobs > is zero. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2049) Pig should display TokenMgrError message consistently across all parsers
[ https://issues.apache.org/jira/browse/PIG-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-2049. --- Resolution: Fixed Hadoop Flags: [Reviewed] Patch committed to trunk and 0.9 branch. > Pig should display TokenMgrError message consistently across all parsers > > > Key: PIG-2049 > URL: https://issues.apache.org/jira/browse/PIG-2049 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding >Priority: Minor > Fix For: 0.9.0 > > Attachments: PIG-2049.patch > > > For example, for org.apache.pig.tools.pigscript.parser.TokenMgrError, Pig logs > {code} > ERROR 1000: Error during parsing. Lexical error at line 5, column 0. > {code} > But for org.apache.pig.tools.parameters.TokenMgrError, Pig logs > {code} > ERROR 2998: Unhandled internal error. Lexical error at line 10, column 0. > {code} > Both should have error code 1000. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2050) Pig can't reference auto-generated schema name for TOTUPLE
Pig can't reference auto-generated schema name for TOTUPLE -- Key: PIG-2050 URL: https://issues.apache.org/jira/browse/PIG-2050 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0, 0.9.0 Reporter: Richard Ding Priority: Minor Here is the use case: {code} grunt> A = load 'data' as (a0, a1, a2); grunt> B = foreach A generate TOTUPLE(a0, a2); grunt> describe B B: {org.apache.pig.builtin.totuple_a0_3: (a0: bytearray,a2: bytearray)} grunt> C = foreach B generate org.apache.pig.builtin.totuple_a0_3; 2011-05-06 14:38:14,635 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Invalid alias: org in {org.apache.pig.builtin.totuple_a0_1: (a0: bytearray,a2: bytearray)} {code} The workaround is to specify a user-defined schema name: {code} grunt> A = load 'data' as (a0, a1, a2); grunt> B = foreach A generate TOTUPLE(a0, a2) as aa; grunt> describe B B: {aa: (a0: bytearray,a2: bytearray)} grunt> C = foreach B generate aa; grunt> {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2049) Pig should display TokenMgrError message consistently across all parsers
[ https://issues.apache.org/jira/browse/PIG-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2049: -- Affects Version/s: (was: 0.8.0) Summary: Pig should display TokenMgrError message consistently across all parsers (was: Pig should display TokenMgrError consistently across all parsers) > Pig should display TokenMgrError message consistently across all parsers > > > Key: PIG-2049 > URL: https://issues.apache.org/jira/browse/PIG-2049 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding >Priority: Minor > Fix For: 0.9.0 > > Attachments: PIG-2049.patch > > > For example, for org.apache.pig.tools.pigscript.parser.TokenMgrError, Pig logs > {code} > ERROR 1000: Error during parsing. Lexical error at line 5, column 0. > {code} > But for org.apache.pig.tools.parameters.TokenMgrError, Pig logs > {code} > ERROR 2998: Unhandled internal error. Lexical error at line 10, column 0. > {code} > Both should have error code 1000. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2049) Pig should display TokenMgrError consistently across all parsers
[ https://issues.apache.org/jira/browse/PIG-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2049: -- Attachment: PIG-2049.patch > Pig should display TokenMgrError consistently across all parsers > > > Key: PIG-2049 > URL: https://issues.apache.org/jira/browse/PIG-2049 > Project: Pig > Issue Type: Bug >Affects Versions: 0.8.0, 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding >Priority: Minor > Fix For: 0.9.0 > > Attachments: PIG-2049.patch > > > For example, for org.apache.pig.tools.pigscript.parser.TokenMgrError, Pig logs > {code} > ERROR 1000: Error during parsing. Lexical error at line 5, column 0. > {code} > But for org.apache.pig.tools.parameters.TokenMgrError, Pig logs > {code} > ERROR 2998: Unhandled internal error. Lexical error at line 10, column 0. > {code} > Both should have error code 1000. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2049) Pig should display TokenMgrError consistently across all parsers
Pig should display TokenMgrError consistently across all parsers Key: PIG-2049 URL: https://issues.apache.org/jira/browse/PIG-2049 Project: Pig Issue Type: Bug Affects Versions: 0.8.0, 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Priority: Minor Fix For: 0.9.0 For example, for org.apache.pig.tools.pigscript.parser.TokenMgrError, Pig logs {code} ERROR 1000: Error during parsing. Lexical error at line 5, column 0. {code} But for org.apache.pig.tools.parameters.TokenMgrError, Pig logs {code} ERROR 2998: Unhandled internal error. Lexical error at line 10, column 0. {code} Both should have error code 1000. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
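The behavior requested in PIG-2049 can be sketched as a single mapping that treats both lexer error types the same way. This is only an illustration under stated assumptions: the two error classes below are local stand-ins for org.apache.pig.tools.pigscript.parser.TokenMgrError and org.apache.pig.tools.parameters.TokenMgrError, and ErrorCodes, codeFor and format are hypothetical names, not Pig's API. Only the error codes 1000 and 2998 and the message texts come from the issue.

```java
// Stand-in for org.apache.pig.tools.pigscript.parser.TokenMgrError (assumption).
class PigScriptTokenMgrError extends Error {
    PigScriptTokenMgrError(String message) {
        super(message);
    }
}

// Stand-in for org.apache.pig.tools.parameters.TokenMgrError (assumption).
class ParameterTokenMgrError extends Error {
    ParameterTokenMgrError(String message) {
        super(message);
    }
}

// Hypothetical helper that maps a throwable to the logged error line.
final class ErrorCodes {
    static final int PARSE_ERROR = 1000;
    static final int UNHANDLED_INTERNAL_ERROR = 2998;

    // Both lexer error types are parse errors; anything else is internal.
    static int codeFor(Throwable t) {
        if (t instanceof PigScriptTokenMgrError || t instanceof ParameterTokenMgrError) {
            return PARSE_ERROR;
        }
        return UNHANDLED_INTERNAL_ERROR;
    }

    static String format(Throwable t) {
        int code = codeFor(t);
        String prefix = (code == PARSE_ERROR)
                ? "Error during parsing. "
                : "Unhandled internal error. ";
        return "ERROR " + code + ": " + prefix + t.getMessage();
    }
}
```

Routing both types through one mapping means a lexical error in a parameter file is reported with the same code and wording as one in the Pig script itself.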
[jira] [Commented] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro
[ https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030019#comment-13030019 ] Richard Ding commented on PIG-2035: --- Unit tests pass. > Macro expansion doesn't handle multiple expansions of same macro inside > another macro > - > > Key: PIG-2035 > URL: https://issues.apache.org/jira/browse/PIG-2035 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2035_1.patch > > > Here is the use case: > {code} > define test ( in, out, x ) returns c { > a = load '$in' as (name, age, gpa); > b = group a by gpa; > $c = foreach b generate group, COUNT(a.$x); > store $c into '$out'; > }; > define test2( in, out ) returns x { > $x = test( '$in', '$out', 'name' ); > $x = test( '$in', '$out.1', 'age' ); > $x = test( '$in', '$out.2', 'gpa' ); > }; > x = test2('studenttab10k', 'myoutput'); > {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2033) Pig returns sucess for the failed Pig script
[ https://issues.apache.org/jira/browse/PIG-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2033: -- Attachment: PIG-2033.patch We make sure that Pig returns success iff the number of successful jobs equals the number of compiled jobs. This patch doesn't include a unit test since it's difficult to simulate the failure case. > Pig returns sucess for the failed Pig script > > > Key: PIG-2033 > URL: https://issues.apache.org/jira/browse/PIG-2033 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.8.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.8.1, 0.9.0 > > Attachments: PIG-2033.patch > > > Pig returns success when a Pig script fails but the count of failed MR jobs > is zero. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
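The difference between the two success checks can be shown with a small sketch. It assumes the old check only looked at the count of failed MR jobs, as the issue description says, and that the new check compares succeeded jobs against compiled jobs, as the comment says. JobOutcome and its method names are hypothetical and do not reflect the patch's actual code.

```java
// Hypothetical illustration of the two success checks discussed in PIG-2033.
final class JobOutcome {

    // Old behavior per the issue: success whenever no MR job is counted as failed.
    static boolean oldIsSuccess(int failedJobs) {
        return failedJobs == 0;
    }

    // New behavior per the comment: success iff every compiled job succeeded.
    static boolean isSuccess(int compiledJobs, int succeededJobs) {
        return succeededJobs == compiledJobs;
    }
}
```

For example, if three jobs were compiled, one succeeded, and the script stopped before the other two were launched, the failed-job count is zero, so the old check reports success while the new check reports failure.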
[jira] [Commented] (PIG-2041) Minicluster should make each run independent
[ https://issues.apache.org/jira/browse/PIG-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029488#comment-13029488 ] Richard Ding commented on PIG-2041: --- +1 > Minicluster should make each run independent > > > Key: PIG-2041 > URL: https://issues.apache.org/jira/browse/PIG-2041 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.8.0, 0.9.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.9.0 > > Attachments: PIG-2041-1.patch > > > Minicluster will reuse ~/pigtest/conf/hadoop-site.xml. If something wrong in > hadoop-site.xml, next test will also be affected. This leads to some > mysterious test failures. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-1999) Macro alias masker should consider schema context
[ https://issues.apache.org/jira/browse/PIG-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-1999. --- Resolution: Fixed Hadoop Flags: [Reviewed] Unit tests pass. Patch committed to trunk and 0.9 branch. > Macro alias masker should consider schema context > -- > > Key: PIG-1999 > URL: https://issues.apache.org/jira/browse/PIG-1999 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-1999_1.patch, PIG-1999_2.patch > > > Macro alias masker doesn't consider the current schema context. This results > in errors when deciding which alias to mask. Here is an example: > {code} > define toBytearray(in, intermediate) returns e { >a = load '$in' as (name:chararray, age:long, gpa: float); >b = group a by name; >c = foreach b generate a, (1,2,3); >store c into '$intermediate' using BinStorage(); >d = load '$intermediate' using BinStorage() as (b:bag{t:tuple(x,y,z)}, > t2:tuple(a,b,c)); >$e = foreach d generate COUNT(b), t2.a, t2.b, t2.c; > }; > > f = toBytearray ('data', 'output1'); > {code} > Now the alias masker mistakes b in COUNT(b) as an alias instead of b in the > current schema. > The workaround is to not use aliases as names in the schema definition. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro
[ https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029045#comment-13029045 ] Richard Ding commented on PIG-2035: --- test-patch result: {code} [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] -1 release audit. The applied patch generated 585 release audit warnings (more than the trunk's current 584 warnings). {code} > Macro expansion doesn't handle multiple expansions of same macro inside > another macro > - > > Key: PIG-2035 > URL: https://issues.apache.org/jira/browse/PIG-2035 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2035_1.patch > > > Here is the use case: > {code} > define test ( in, out, x ) returns c { > a = load '$in' as (name, age, gpa); > b = group a by gpa; > $c = foreach b generate group, COUNT(a.$x); > store $c into '$out'; > }; > define test2( in, out ) returns x { > $x = test( '$in', '$out', 'name' ); > $x = test( '$in', '$out.1', 'age' ); > $x = test( '$in', '$out.2', 'gpa' ); > }; > x = test2('studenttab10k', 'myoutput'); > {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro
[ https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2035: -- Attachment: PIG-2035_1.patch > Macro expansion doesn't handle multiple expansions of same macro inside > another macro > - > > Key: PIG-2035 > URL: https://issues.apache.org/jira/browse/PIG-2035 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2035_1.patch > > > Here is the use case: > {code} > define test ( in, out, x ) returns c { > a = load '$in' as (name, age, gpa); > b = group a by gpa; > $c = foreach b generate group, COUNT(a.$x); > store $c into '$out'; > }; > define test2( in, out ) returns x { > $x = test( '$in', '$out', 'name' ); > $x = test( '$in', '$out.1', 'age' ); > $x = test( '$in', '$out.2', 'gpa' ); > }; > x = test2('studenttab10k', 'myoutput'); > {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2012) Comments at the begining of the file throws off line numbers in errors
[ https://issues.apache.org/jira/browse/PIG-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028864#comment-13028864 ] Richard Ding commented on PIG-2012: --- Unit tests pass. > Comments at the begining of the file throws off line numbers in errors > -- > > Key: PIG-2012 > URL: https://issues.apache.org/jira/browse/PIG-2012 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.9.0 >Reporter: Alan Gates >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2012_1.patch, macro.pig > > > The preprocessor does not appear to be handling leading comments properly > when calculating line numbers for error messages. In the attached script, > the error is reported to be on line 7. It is actually on line 10. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro
Macro expansion doesn't handle multiple expansions of same macro inside another macro - Key: PIG-2035 URL: https://issues.apache.org/jira/browse/PIG-2035 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.9.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.9.0 Here is the use case: {code} define test ( in, out, x ) returns c { a = load '$in' as (name, age, gpa); b = group a by gpa; $c = foreach b generate group, COUNT(a.$x); store $c into '$out'; }; define test2( in, out ) returns x { $x = test( '$in', '$out', 'name' ); $x = test( '$in', '$out.1', 'age' ); $x = test( '$in', '$out.2', 'gpa' ); }; x = test2('studenttab10k', 'myoutput'); {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
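One way an expander could keep repeated expansions of the same macro apart is to give each expansion its own index and fold that index into every masked alias. The sketch below is an assumption made for illustration: MacroAliasMasker, nextExpansion, mask and the "macro_name_alias_index" naming scheme are hypothetical and are not claimed to match the fix in PIG-2035_1.patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: per-expansion alias masking for inlined macros.
final class MacroAliasMasker {
    // Number of expansions seen so far, per macro name.
    private final Map<String, Integer> expansionCounts = new HashMap<>();

    // Returns the index for the next expansion of the macro (0, 1, 2, ...).
    int nextExpansion(String macroName) {
        int index = expansionCounts.getOrDefault(macroName, 0);
        expansionCounts.put(macroName, index + 1);
        return index;
    }

    // Builds a masked alias that is unique per macro, alias and expansion.
    static String mask(String macroName, String alias, int expansionIndex) {
        return "macro_" + macroName + "_" + alias + "_" + expansionIndex;
    }
}
```

Applied to the use case above, the three calls to test inside test2 would mask the inner alias a as macro_test_a_0, macro_test_a_1 and macro_test_a_2, so the expanded statements no longer overwrite each other.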
[jira] [Resolved] (PIG-2028) Speed up multiquery unit tests
[ https://issues.apache.org/jira/browse/PIG-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding resolved PIG-2028. --- Resolution: Fixed Hadoop Flags: [Reviewed] Patch committed to trunk and 0.9 branch. > Speed up multiquery unit tests > --- > > Key: PIG-2028 > URL: https://issues.apache.org/jira/browse/PIG-2028 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2028.patch, PIG-2028_1.patch > > > Switch TestMultiQueryBasic and TestMultiQuery to use LOCAL mode. The results > on my laptop: > Using Mini Cluster: > TestMultiQueryBasic: 17 min 17 sec > TestMultiQuery: 23 min 2 sec > Using LOCAL mode: > TestMultiQueryBasic: 4 min 17 sec > TestMultiQuery: 5 min 51 sec -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-2033) Pig returns sucess for the failed Pig script
Pig returns sucess for the failed Pig script Key: PIG-2033 URL: https://issues.apache.org/jira/browse/PIG-2033 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.8.1, 0.9.0 Pig returns success when a Pig script fails but the count of failed MR jobs is zero. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2028) Speed up multiquery unit tests
[ https://issues.apache.org/jira/browse/PIG-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028485#comment-13028485 ] Richard Ding commented on PIG-2028: --- The new patch simplifies the test cases, using Util.createLocalInputFile whenever possible. > Speed up multiquery unit tests > --- > > Key: PIG-2028 > URL: https://issues.apache.org/jira/browse/PIG-2028 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2028.patch, PIG-2028_1.patch > > > Switch TestMultiQueryBasic and TestMultiQuery to use LOCAL mode. The results > on my laptop: > Using Mini Cluster: > TestMultiQueryBasic: 17 min 17 sec > TestMultiQuery: 23 min 2 sec > Using LOCAL mode: > TestMultiQueryBasic: 4 min 17 sec > TestMultiQuery: 5 min 51 sec -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2028) Speed up multiquery unit tests
[ https://issues.apache.org/jira/browse/PIG-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-2028: -- Attachment: PIG-2028_1.patch > Speed up multiquery unit tests > --- > > Key: PIG-2028 > URL: https://issues.apache.org/jira/browse/PIG-2028 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.9.0 > > Attachments: PIG-2028.patch, PIG-2028_1.patch > > > Switch TestMultiQueryBasic and TestMultiQuery to use LOCAL mode. The results > on my laptop: > Using Mini Cluster: > TestMultiQueryBasic: 17 min 17 sec > TestMultiQuery: 23 min 2 sec > Using LOCAL mode: > TestMultiQueryBasic: 4 min 17 sec > TestMultiQuery: 5 min 51 sec -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1821) UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)
[ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028438#comment-13028438 ] Richard Ding commented on PIG-1821: --- +1 > UDFContext.getUDFProperties does not handle collisions in hashcode of udf > classname (+ arg hashcodes) > - > > Key: PIG-1821 > URL: https://issues.apache.org/jira/browse/PIG-1821 > Project: Pig > Issue Type: Bug >Affects Versions: 0.8.0 >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.9.0 > > Attachments: PIG-1821.1.patch, PIG-1821.2.patch > > > In code below, if generateKey() returns same value for two udfs, the udfs > would end up sharing the properties object. > {code} > private HashMap<Integer, Properties> udfConfs = new HashMap<Integer, Properties>(); > public Properties getUDFProperties(Class c) { > Integer k = generateKey(c); > Properties p = udfConfs.get(k); > if (p == null) { > p = new Properties(); > udfConfs.put(k, p); > } > return p; > } > private int generateKey(Class c) { > return c.getName().hashCode(); > } > public Properties getUDFProperties(Class c, String[] args) { > Integer k = generateKey(c, args); > Properties p = udfConfs.get(k); > if (p == null) { > p = new Properties(); > udfConfs.put(k, p); > } > return p; > } > private int generateKey(Class c, String[] args) { > int hc = c.getName().hashCode(); > for (int i = 0; i < args.length; i++) { > hc <<= 1; > hc ^= args[i].hashCode(); > } > return hc; > } > {code} > To prevent this, a new class (say X) that can hold the classname and args > should be created, and instead of HashMap<Integer, Properties>, HashMap<X, Properties> should be used. Then HashMap will deal with the collisions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
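The fix proposed in the PIG-1821 description (a key class "X" holding the class name and args) can be sketched as follows. This is a minimal illustration, not the committed patch: UdfKey and UdfPropertiesStore are hypothetical names, and UdfPropertiesStore only stands in for the relevant part of UDFContext. The key point is that the map key implements equals() as well as hashCode(), so two UDFs whose hash codes collide still get separate Properties objects.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical key type (the "X" suggested in the issue), not Pig's class.
final class UdfKey {
    private final String className;
    private final String[] args;

    UdfKey(Class<?> c, String[] args) {
        this.className = c.getName();
        // Defensive copy; a missing args array is treated as empty.
        this.args = (args == null) ? new String[0] : args.clone();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof UdfKey)) {
            return false;
        }
        UdfKey other = (UdfKey) o;
        return className.equals(other.className) && Arrays.equals(args, other.args);
    }

    @Override
    public int hashCode() {
        return 31 * className.hashCode() + Arrays.hashCode(args);
    }
}

// Hypothetical stand-in for the property lookup in UDFContext.
final class UdfPropertiesStore {
    private final Map<UdfKey, Properties> udfConfs = new HashMap<>();

    Properties getUDFProperties(Class<?> c, String[] args) {
        // HashMap resolves hash collisions through UdfKey.equals().
        return udfConfs.computeIfAbsent(new UdfKey(c, args), k -> new Properties());
    }

    Properties getUDFProperties(Class<?> c) {
        return getUDFProperties(c, null);
    }
}
```

As a usage example, the strings "Aa" and "BB" have the same hashCode() in Java, so an Integer key derived from them would make two UDF instances share one Properties object. With UdfKey the two lookups return different objects, while repeating a lookup with equal arguments returns the same one.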