[jira] [Updated] (PIG-4173) Move to Spark 1.x

2014-10-03 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-4173:
--
Attachment: PIG-4174_5.patch

This patch fixes the cogroup issue for Spark 1.1.0. The Spark version is now 
updated to 1.1.0.

> Move to Spark 1.x
> -
>
> Key: PIG-4173
> URL: https://issues.apache.org/jira/browse/PIG-4173
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: bc Wong
>Assignee: Richard Ding
> Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, 
> PIG-4174_4.patch, PIG-4174_5.patch, TEST-org.apache.pig.spark.TestSpark.txt
>
>
> The Spark branch is using Spark 0.9: 
> https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably 
> switch to Spark 1.x asap, due to Spark interface changes since 1.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4173) Move to Spark 1.x

2014-10-03 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-4173:
--
Attachment: PIG-4174_4.patch

This patch fixes the unit tests. 

The version of Spark used is 1.0.2. In Spark 1.1.0, CoGroupRDD changed in a 
way that breaks the cogroup runtime. I'm looking into this.

> Move to Spark 1.x
> -
>
> Key: PIG-4173
> URL: https://issues.apache.org/jira/browse/PIG-4173
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: bc Wong
>Assignee: Richard Ding
> Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, 
> PIG-4174_4.patch, TEST-org.apache.pig.spark.TestSpark.txt
>
>
> The Spark branch is using Spark 0.9: 
> https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably 
> switch to Spark 1.x asap, due to Spark interface changes since 1.0.





[jira] [Commented] (PIG-4173) Move to Spark 1.x

2014-09-30 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153705#comment-14153705
 ] 

Richard Ding commented on PIG-4173:
---

Sorry, I meant PIG-4168.

> Move to Spark 1.x
> -
>
> Key: PIG-4173
> URL: https://issues.apache.org/jira/browse/PIG-4173
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: bc Wong
>Assignee: Richard Ding
> Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, 
> TEST-org.apache.pig.spark.TestSpark.txt
>
>
> The Spark branch is using Spark 0.9: 
> https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably 
> switch to Spark 1.x asap, due to Spark interface changes since 1.0.





[jira] [Commented] (PIG-4173) Move to Spark 1.x

2014-09-30 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153701#comment-14153701
 ] 

Richard Ding commented on PIG-4173:
---

Hi ~praveenr019, 

Since PIG-4186 hasn't been checked in, it seems to make more sense to first 
build with Spark 1.x and then fix PIG-4186. What do you think?

Thanks,
-Richard

> Move to Spark 1.x
> -
>
> Key: PIG-4173
> URL: https://issues.apache.org/jira/browse/PIG-4173
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: bc Wong
>Assignee: Richard Ding
> Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, 
> TEST-org.apache.pig.spark.TestSpark.txt
>
>
> The Spark branch is using Spark 0.9: 
> https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably 
> switch to Spark 1.x asap, due to Spark interface changes since 1.0.





[jira] [Updated] (PIG-4173) Move to Spark 1.x

2014-09-30 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-4173:
--
Attachment: PIG-4173_3.patch

Thanks for the review. The new patch incorporates the changes requested in the comments. 

> Move to Spark 1.x
> -
>
> Key: PIG-4173
> URL: https://issues.apache.org/jira/browse/PIG-4173
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: bc Wong
>Assignee: Richard Ding
> Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, 
> TEST-org.apache.pig.spark.TestSpark.txt
>
>
> The Spark branch is using Spark 0.9: 
> https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably 
> switch to Spark 1.x asap, due to Spark interface changes since 1.0.





[jira] [Updated] (PIG-4173) Move to Spark 1.x

2014-09-26 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-4173:
--
Attachment: PIG-4173_2.patch

Adding javax.servlet dependency

> Move to Spark 1.x
> -
>
> Key: PIG-4173
> URL: https://issues.apache.org/jira/browse/PIG-4173
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: bc Wong
>Assignee: Richard Ding
> Attachments: PIG-4173.patch, PIG-4173_2.patch
>
>
> The Spark branch is using Spark 0.9: 
> https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably 
> switch to Spark 1.x asap, due to Spark interface changes since 1.0.





[jira] [Updated] (PIG-4173) Move to Spark 1.x

2014-09-26 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-4173:
--
Attachment: PIG-4173.patch

Attaching the initial patch to upgrade Spark to 1.1.0.

I made some local changes so that the patch now compiles with the latest Spark 
jar.

I have a question though: why don't we use JavaRDD throughout the code? Is this 
due to performance concerns?

> Move to Spark 1.x
> -
>
> Key: PIG-4173
> URL: https://issues.apache.org/jira/browse/PIG-4173
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: bc Wong
>Assignee: Richard Ding
> Attachments: PIG-4173.patch
>
>
> The Spark branch is using Spark 0.9: 
> https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably 
> switch to Spark 1.x asap, due to Spark interface changes since 1.0.





[jira] [Assigned] (PIG-4173) Move to Spark 1.x

2014-09-26 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding reassigned PIG-4173:
-

Assignee: Richard Ding

> Move to Spark 1.x
> -
>
> Key: PIG-4173
> URL: https://issues.apache.org/jira/browse/PIG-4173
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: bc Wong
>Assignee: Richard Ding
>
> The Spark branch is using Spark 0.9: 
> https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably 
> switch to Spark 1.x asap, due to Spark interface changes since 1.0.





[jira] [Commented] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key

2013-12-29 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858404#comment-13858404
 ] 

Richard Ding commented on PIG-3608:
---

Thanks [~cheolsoo].

> ClassCastException when looking up a value from AvroMapWrapper using a Utf8 
> key
> ---
>
> Key: PIG-3608
> URL: https://issues.apache.org/jira/browse/PIG-3608
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Richard Ding
>Assignee: Richard Ding
>Priority: Minor
> Fix For: 0.13.0
>
> Attachments: PIG-3608.patch, PIG-3608_2.patch
>
>
> One got the following exception:
> {code}
> java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with 
> java.lang.String 
> at 
> org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80)
> {code}
> This is related to the change by PIG-3420.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key

2013-12-27 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3608:
--

   Resolution: Fixed
Fix Version/s: 0.13.0
 Release Note: 
Committed to trunk.

 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

> ClassCastException when looking up a value from AvroMapWrapper using a Utf8 
> key
> ---
>
> Key: PIG-3608
> URL: https://issues.apache.org/jira/browse/PIG-3608
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Richard Ding
>Assignee: Richard Ding
>Priority: Minor
> Fix For: 0.13.0
>
> Attachments: PIG-3608.patch, PIG-3608_2.patch
>
>
> One got the following exception:
> {code}
> java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with 
> java.lang.String 
> at 
> org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80)
> {code}
> This is related to the change by PIG-3420.





[jira] [Commented] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key

2013-12-23 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856047#comment-13856047
 ] 

Richard Ding commented on PIG-3608:
---

Thanks for reviewing the patch.

Right now I don't have a Pig script to demonstrate this use case. I'm getting 
this problem while trying to iterate an instance of AvroMapWrapper and find out 
that I can't look up the value from the map using the key just retrieved from 
the map. I think this breaks the basic contract of a map implementation.

I think the check

{code}
if (isUtf8key && !(key instanceof Utf8))
{code}

is more general. But I'm ok if it is restricted to String.
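
The map contract mentioned above (a key obtained by iterating the map should 
also work for lookup) can be sketched as follows. This is only an illustration 
and not the AvroMapWrapper code or the attached patch: Utf8Like is a 
hypothetical stand-in for org.apache.avro.util.Utf8 so that the example needs 
no Avro jar, and MapWrapperSketch and its members are assumed names. The get 
method uses the check quoted above, converting the key only when the inner map 
is keyed by Utf8 and the caller passed something else.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper; the names are assumptions, not Pig's actual class.
final class MapWrapperSketch {
    private final Map<Object, Object> innerMap;
    private final boolean isUtf8key;

    MapWrapperSketch(Map<Object, Object> innerMap, boolean isUtf8key) {
        this.innerMap = innerMap;
        this.isUtf8key = isUtf8key;
    }

    Object get(Object key) {
        // Convert only when the inner map uses Utf8-style keys and the caller
        // passed a different type (for example a java.lang.String).
        if (isUtf8key && !(key instanceof Utf8Like)) {
            return innerMap.get(new Utf8Like(key.toString()));
        }
        // The key already has the inner map's key type: look it up unchanged.
        return innerMap.get(key);
    }

    Iterable<Object> keys() {
        return innerMap.keySet();
    }

    public static void main(String[] args) {
        Map<Object, Object> inner = new HashMap<>();
        inner.put(new Utf8Like("color"), "red");
        MapWrapperSketch m = new MapWrapperSketch(inner, true);
        for (Object k : m.keys()) {
            // A key retrieved from the map can be used for lookup.
            System.out.println(k + " -> " + m.get(k));
        }
        // A plain String key is converted before the lookup.
        System.out.println(m.get("color"));
    }
}

// Hypothetical stand-in for org.apache.avro.util.Utf8.
final class Utf8Like {
    private final String value;

    Utf8Like(String value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Utf8Like && ((Utf8Like) o).value.equals(value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public String toString() {
        return value;
    }
}
```

With this shape, a cast to String never happens on a key that is already in 
the inner map's key type, which is the situation the stack trace points at.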


> ClassCastException when looking up a value from AvroMapWrapper using a Utf8 
> key
> ---
>
> Key: PIG-3608
> URL: https://issues.apache.org/jira/browse/PIG-3608
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Richard Ding
>Assignee: Richard Ding
>Priority: Minor
> Attachments: PIG-3608.patch, PIG-3608_2.patch
>
>
> One got the following exception:
> {code}
> java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with 
> java.lang.String 
> at 
> org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80)
> {code}
> This is related to the change by PIG-3420.





[jira] [Commented] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper 

2013-12-23 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856041#comment-13856041
 ] 

Richard Ding commented on PIG-3609:
---

[~cheolsoo], checking size is an optimization, this is also what 
DefaultAbstractBag implements. 

+1 on the patch.
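
The size check mentioned above can be sketched as below. This is an 
illustration over plain java.util.List values, not the code of 
DefaultAbstractBag or AvroBagWrapper; the class and method names are 
hypothetical. Sizes are compared first, and elements are only walked when the 
sizes are equal.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper; not part of Pig.
final class SizeFirstCompare {
    static <T extends Comparable<T>> int compare(List<T> a, List<T> b) {
        // Cheap check first: collections of different size are ordered by size alone.
        if (a.size() != b.size()) {
            return a.size() < b.size() ? -1 : 1;
        }
        // Same size: fall back to element-by-element comparison.
        Iterator<T> ia = a.iterator();
        Iterator<T> ib = b.iterator();
        while (ia.hasNext()) {
            int c = ia.next().compareTo(ib.next());
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Different sizes: decided without looking at any element.
        System.out.println(compare(Arrays.asList(1, 2), Arrays.asList(9)));
        // Equal sizes: decided by the first differing element.
        System.out.println(compare(Arrays.asList(1, 2), Arrays.asList(1, 3)));
    }
}
```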

> ClassCastException when calling compareTo method on AvroBagWrapper 
> ---
>
> Key: PIG-3609
> URL: https://issues.apache.org/jira/browse/PIG-3609
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Richard Ding
>Assignee: Richard Ding
>Priority: Minor
> Attachments: PIG-3609.patch, PIG-3609_2.patch, PIG-3609_3.patch
>
>
> One got the following exception when calling compareTo method on 
> AvroBagWrapper with an AvroBagWrapper object:
> {code}
> java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper 
> incompatible with java.util.Collection
> at org.apache.avro.generic.GenericData.compare(GenericData.java:786)
> at org.apache.avro.generic.GenericData.compare(GenericData.java:760)
> at 
> org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78)
> {code}
> Looking at the code, it compares objects with different types:
> {code}
> return GenericData.get().compare(theArray, o, theArray.getSchema());
> {code}





[jira] [Updated] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper 

2013-12-06 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3609:
--

Attachment: PIG-3609_2.patch

New patch with a test case.

> ClassCastException when calling compareTo method on AvroBagWrapper 
> ---
>
> Key: PIG-3609
> URL: https://issues.apache.org/jira/browse/PIG-3609
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Richard Ding
>Assignee: Richard Ding
>Priority: Minor
> Attachments: PIG-3609.patch, PIG-3609_2.patch
>
>
> One got the following exception when calling compareTo method on 
> AvroBagWrapper with an AvroBagWrapper object:
> {code}
> java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper 
> incompatible with java.util.Collection
> at org.apache.avro.generic.GenericData.compare(GenericData.java:786)
> at org.apache.avro.generic.GenericData.compare(GenericData.java:760)
> at 
> org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78)
> {code}
> Looking at the code, it compares objects with different types:
> {code}
> return GenericData.get().compare(theArray, o, theArray.getSchema());
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key

2013-12-06 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3608:
--

Attachment: PIG-3608_2.patch

You are right. Updated the patch with a test case.

> ClassCastException when looking up a value from AvroMapWrapper using a Utf8 
> key
> ---
>
> Key: PIG-3608
> URL: https://issues.apache.org/jira/browse/PIG-3608
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Richard Ding
>Assignee: Richard Ding
>Priority: Minor
> Attachments: PIG-3608.patch, PIG-3608_2.patch
>
>
> One got the following exception:
> {code}
> java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with 
> java.lang.String 
> at 
> org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80)
> {code}
> This is related to the change by PIG-3420.





[jira] [Commented] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key

2013-12-05 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840791#comment-13840791
 ] 

Richard Ding commented on PIG-3608:
---

Actually I have a question: should it be

{code}
if (isUtf8key) {
  v = innerMap.get(key);
} else {
  v = innerMap.get(new Utf8((String) key));
}
{code}

since isUtf8key == true means the key is already Utf8?

> ClassCastException when looking up a value from AvroMapWrapper using a Utf8 
> key
> ---
>
> Key: PIG-3608
> URL: https://issues.apache.org/jira/browse/PIG-3608
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Richard Ding
>Assignee: Richard Ding
>Priority: Minor
> Attachments: PIG-3608.patch
>
>
> One got the following exception:
> {code}
> java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with 
> java.lang.String 
> at 
> org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80)
> {code}
> This is related to the change by PIG-3420.





[jira] [Updated] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper 

2013-12-05 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3609:
--

Attachment: PIG-3609.patch

Attaching a patch.

> ClassCastException when calling compareTo method on AvroBagWrapper 
> ---
>
> Key: PIG-3609
> URL: https://issues.apache.org/jira/browse/PIG-3609
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Richard Ding
>Priority: Minor
> Attachments: PIG-3609.patch
>
>
> One got the following exception when calling compareTo method on 
> AvroBagWrapper with an AvroBagWrapper object:
> {code}
> java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper 
> incompatible with java.util.Collection
> at org.apache.avro.generic.GenericData.compare(GenericData.java:786)
> at org.apache.avro.generic.GenericData.compare(GenericData.java:760)
> at 
> org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78)
> {code}
> Looking at the code, it compares objects with different types:
> {code}
> return GenericData.get().compare(theArray, o, theArray.getSchema());
> {code}





[jira] [Updated] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper 

2013-12-05 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3609:
--

Status: Patch Available  (was: Open)

> ClassCastException when calling compareTo method on AvroBagWrapper 
> ---
>
> Key: PIG-3609
> URL: https://issues.apache.org/jira/browse/PIG-3609
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Richard Ding
>Assignee: Richard Ding
>Priority: Minor
> Attachments: PIG-3609.patch
>
>
> One got the following exception when calling compareTo method on 
> AvroBagWrapper with an AvroBagWrapper object:
> {code}
> java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper 
> incompatible with java.util.Collection
> at org.apache.avro.generic.GenericData.compare(GenericData.java:786)
> at org.apache.avro.generic.GenericData.compare(GenericData.java:760)
> at 
> org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78)
> {code}
> Looking at the code, it compares objects with different types:
> {code}
> return GenericData.get().compare(theArray, o, theArray.getSchema());
> {code}





[jira] [Assigned] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper 

2013-12-05 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding reassigned PIG-3609:
-

Assignee: Richard Ding

> ClassCastException when calling compareTo method on AvroBagWrapper 
> ---
>
> Key: PIG-3609
> URL: https://issues.apache.org/jira/browse/PIG-3609
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Richard Ding
>Assignee: Richard Ding
>Priority: Minor
> Attachments: PIG-3609.patch
>
>
> One got the following exception when calling compareTo method on 
> AvroBagWrapper with an AvroBagWrapper object:
> {code}
> java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper 
> incompatible with java.util.Collection
> at org.apache.avro.generic.GenericData.compare(GenericData.java:786)
> at org.apache.avro.generic.GenericData.compare(GenericData.java:760)
> at 
> org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78)
> {code}
> Looking at the code, it compares objects with different types:
> {code}
> return GenericData.get().compare(theArray, o, theArray.getSchema());
> {code}





[jira] [Updated] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key

2013-12-05 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3608:
--

Status: Patch Available  (was: Open)

> ClassCastException when looking up a value from AvroMapWrapper using a Utf8 
> key
> ---
>
> Key: PIG-3608
> URL: https://issues.apache.org/jira/browse/PIG-3608
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Richard Ding
>Assignee: Richard Ding
>Priority: Minor
> Attachments: PIG-3608.patch
>
>
> One got the following exception:
> {code}
> java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with 
> java.lang.String 
> at 
> org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80)
> {code}
> This is related to the change by PIG-3420.





[jira] [Assigned] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key

2013-12-05 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding reassigned PIG-3608:
-

Assignee: Richard Ding

> ClassCastException when looking up a value from AvroMapWrapper using a Utf8 
> key
> ---
>
> Key: PIG-3608
> URL: https://issues.apache.org/jira/browse/PIG-3608
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Richard Ding
>Assignee: Richard Ding
>Priority: Minor
> Attachments: PIG-3608.patch
>
>
> One got the following exception:
> {code}
> java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with 
> java.lang.String 
> at 
> org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80)
> {code}
> This is related to the change by PIG-3420.





[jira] [Updated] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key

2013-12-05 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3608:
--

Attachment: PIG-3608.patch

Attaching a simple patch.

> ClassCastException when looking up a value from AvroMapWrapper using a Utf8 
> key
> ---
>
> Key: PIG-3608
> URL: https://issues.apache.org/jira/browse/PIG-3608
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.12.0
>Reporter: Richard Ding
>Priority: Minor
> Attachments: PIG-3608.patch
>
>
> One got the following exception:
> {code}
> java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with 
> java.lang.String 
> at 
> org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80)
> {code}
> This is related to the change by PIG-3420.





[jira] [Created] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper 

2013-12-04 Thread Richard Ding (JIRA)
Richard Ding created PIG-3609:
-

 Summary: ClassCastException when calling compareTo method on 
AvroBagWrapper 
 Key: PIG-3609
 URL: https://issues.apache.org/jira/browse/PIG-3609
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.12.0
Reporter: Richard Ding
Priority: Minor


One got the following exception when calling compareTo method on AvroBagWrapper 
with an AvroBagWrapper object:

{code}
java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper 
incompatible with java.util.Collection
at org.apache.avro.generic.GenericData.compare(GenericData.java:786)
at org.apache.avro.generic.GenericData.compare(GenericData.java:760)
at 
org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78)
{code}

Looking at the code, it compares objects with different types:

{code}
return GenericData.get().compare(theArray, o, theArray.getSchema());
{code}
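
The mismatch described above is that a wrapper object is handed to a 
comparison routine that expects the wrapped collection. One possible shape of 
a fix, shown only as a sketch and not necessarily what the attached patch 
does, is to unwrap the other object before delegating. BagWrapperSketch and 
compareCollections are hypothetical names; compareCollections stands in for a 
library comparison that casts both arguments to java.util.Collection, and the 
element type is assumed to be Integer for brevity.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// Hypothetical wrapper; only the field name theArray is taken from the snippet above.
final class BagWrapperSketch implements Comparable<Object> {
    private final List<Integer> theArray;

    BagWrapperSketch(List<Integer> theArray) {
        this.theArray = theArray;
    }

    // Stand-in for a library routine that casts both arguments to Collection.
    // Passing a BagWrapperSketch here would fail with a ClassCastException.
    private static int compareCollections(Object a, Object b) {
        Iterator<?> ia = ((Collection<?>) a).iterator();
        Iterator<?> ib = ((Collection<?>) b).iterator();
        while (ia.hasNext() && ib.hasNext()) {
            int c = ((Integer) ia.next()).compareTo((Integer) ib.next());
            if (c != 0) {
                return c;
            }
        }
        // The shorter collection sorts first when one is a prefix of the other.
        return ia.hasNext() ? 1 : (ib.hasNext() ? -1 : 0);
    }

    @Override
    public int compareTo(Object o) {
        // Unwrap first so that both sides are collections of the same kind.
        Object other = (o instanceof BagWrapperSketch)
                ? ((BagWrapperSketch) o).theArray
                : o;
        return compareCollections(theArray, other);
    }

    public static void main(String[] args) {
        BagWrapperSketch a = new BagWrapperSketch(Arrays.asList(1, 2));
        BagWrapperSketch b = new BagWrapperSketch(Arrays.asList(1, 3));
        // Wrapper against wrapper no longer reaches the cast with a wrapper object.
        System.out.println(a.compareTo(b));
    }
}
```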





[jira] [Created] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key

2013-12-04 Thread Richard Ding (JIRA)
Richard Ding created PIG-3608:
-

 Summary: ClassCastException when looking up a value from 
AvroMapWrapper using a Utf8 key
 Key: PIG-3608
 URL: https://issues.apache.org/jira/browse/PIG-3608
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.12.0
Reporter: Richard Ding
Priority: Minor


One got the following exception:

{code}
java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with 
java.lang.String 
at org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80)
{code}

This is related to the change by PIG-3420.





[jira] [Commented] (PIG-3251) Bzip2TextInputFormat requires double the memory of maximum record size

2013-03-20 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607820#comment-13607820
 ] 

Richard Ding commented on PIG-3251:
---

With HADOOP-7823, can we remove Bzip2TextInputFormat and just use 
PigTextInputFormat?

> Bzip2TextInputFormat requires double the memory of maximum record size
> --
>
> Key: PIG-3251
> URL: https://issues.apache.org/jira/browse/PIG-3251
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-3251-trunk-v01.patch, pig-3251-trunk-v02.patch
>
>
> While looking at user's OOM heap dump, noticed that pig's 
> Bzip2TextInputFormat consumes memory at both
> Bzip2TextInputFormat.buffer (ByteArrayOutputStream) 
> and actual Text that is returned as line.
> For example, when having one record with 160MBytes, buffer was 268MBytes and 
> Text was 160MBytes.  
> We can probably eliminate one of them.
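
The 268 MB figure above is consistent with a buffer that grows by doubling. 
As a sketch under stated assumptions (the report does not say how the buffer 
was sized): if the ByteArrayOutputStream starts at its default capacity of 32 
bytes and is filled in small writes, its capacity steps through powers of 
two, and the first one large enough for a 160 MB record is 2^28 = 268,435,456 
bytes. BufferGrowthSketch and its method are hypothetical names used only for 
this arithmetic.

```java
// Hypothetical helper that reproduces the arithmetic; not part of Pig.
final class BufferGrowthSketch {
    // Capacity reached by repeated doubling until the record fits.
    static long capacityAfterDoubling(long initialCapacity, long recordBytes) {
        long capacity = initialCapacity;
        while (capacity < recordBytes) {
            capacity *= 2;
        }
        return capacity;
    }

    public static void main(String[] args) {
        long record = 160L * 1000 * 1000;               // assumed: 160 MB record
        long buffer = capacityAfterDoubling(32, record); // assumed: 32-byte start
        System.out.println(buffer);          // 268435456
        System.out.println(buffer + record); // 428435456 held at the same time
    }
}
```

Under these assumptions the buffer alone is about 1.7 times the record size, 
and the copy returned as the line comes on top of that.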

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (PIG-3097) HiveColumnarLoader doesn't correctly load partitioned Hive table

2012-12-16 Thread Richard Ding (JIRA)
Richard Ding created PIG-3097:
-

 Summary: HiveColumnarLoader doesn't correctly load partitioned 
Hive table 
 Key: PIG-3097
 URL: https://issues.apache.org/jira/browse/PIG-3097
 Project: Pig
  Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding



Given a partitioned Hive table:

{code}
hive> describe mytable;
OK
f1 string  
f2 string  
f3 string  
partition_dt string
{code}

The following Pig script gives the correct schema:

{code}
grunt> A = load '/hive/warehouse/mytable' using 
org.apache.pig.piggybank.storage.HiveColumnarLoader('f1 string,f2 string,f3 
string');
grunt> describe A
A: {f1: chararray,f2: chararray,f3: chararray,partition_dt: chararray}
{code}

But, the command

{code}
grunt> dump A
{code}

only produces the first column of all records in the table (all four columns 
are expected).



[jira] [Updated] (PIG-3058) Upgrade junit to at least 4.8

2012-11-21 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3058:
--


These two failures were introduced by PIG-2924, which actually fixed a bug in 
the JobStats class, but the corresponding errors in TestPigRunner didn't get fixed.



> Upgrade junit to at least 4.8
> -
>
> Key: PIG-3058
> URL: https://issues.apache.org/jira/browse/PIG-3058
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.11
>Reporter: fang fang chen
>Assignee: fang fang chen
>
> Pig needs to upgrade junit version to at least 4.8. Otherwise, one gets 
> following warnings.
>   [javadoc] 
> org/apache/hadoop/hbase/mapreduce/TestWALPlayer.class(org/apache/hadoop/hbase/mapreduce:TestWALPlayer.class):
>  warning: Cannot find annotation method 'value()' in type 
> 'org.junit.experimental.categories.Category': class file for 
> org.junit.experimental.categories.Category not found



[jira] [Updated] (PIG-2405) svn tags/release-0.9.1: some unit test case failed with open JDK

2012-11-07 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2405:
--

Fix Version/s: 0.11

> svn tags/release-0.9.1: some unit test case failed with open JDK
> 
>
> Key: PIG-2405
> URL: https://issues.apache.org/jira/browse/PIG-2405
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.1
> Environment: ant-1.8.2
> open jdk: 1.6
>Reporter: fang fang chen
>Assignee: fang fang chen
> Fix For: 0.11
>
> Attachments: PIG-2405-trunk.patch
>
>
> [junit] Test org.apache.pig.test.TestDataModel FAILED
> Testcase: testTupleToString took 0.004 sec
> FAILED
> toString expected:<...ad a little 
> lamb)},[[hello#world,goodbye#all]],42,50,3.14...> but was:<...ad a 
> little lamb)},[[goodbye#all,hello#world]],42,50,3.14...>
> junit.framework.ComparisonFailure: toString expected:<...ad a little 
> lamb)},[[hello#world,goodbye#all]],42,50,3.14...> but was:<...ad a 
> little lamb)},[[goodbye#all,hello#world]],42,50,3.14...>
>  at 
> org.apache.pig.test.TestDataModel.testTupleToString(TestDataModel.java:269)
> [junit] Test org.apache.pig.test.TestHBaseStorage FAILED
> Tests run: 18, Failures: 0, Errors: 12, Time elapsed: 188.612 sec
> Testcase: testHeterogeneousScans took 0.018 sec
> Caused an ERROR
> java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml (Too many 
> open files)
> java.lang.RuntimeException: java.io.FileNotFoundException: 
> /root/pigtest/conf/hadoop-site.xml (Too many open files)
> at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1162)
> at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1035)
> at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:980)
> at org.apache.hadoop.conf.Configuration.get(Configuration.java:436)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:271)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:155)
> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:167)
> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:130)
> at 
> org.apache.pig.test.TestHBaseStorage.prepareTable(TestHBaseStorage.java:809)
> at 
> org.apache.pig.test.TestHBaseStorage.testHeterogeneousScans(TestHBaseStorage.java:741)
> Caused by: java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml 
> (Too many open files)
> at java.io.FileInputStream.<init>(FileInputStream.java:112)
> at java.io.FileInputStream.<init>(FileInputStream.java:72)
> at 
> sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
> at 
> sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
> at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown 
> Source)
> at 
> org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
> at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
> at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1079)
> Caused an ERROR
> Could not resolve the DNS name of hostname:39611
> java.lang.IllegalArgumentException: Could not resolve the DNS name of 
> hostname:39611
> at 
> org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105)
> at 
> org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:66)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:755)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:590)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:555)
> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171)
> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:145)
> at 
> org.apache.pig.test.TestHBaseStorage.deleteAllRows(TestHBaseStorage.java:120)
> at 
> org.apache.pig.test.TestHBaseStorage.tearDown(TestHBaseStorage.java:112)
> [junit] Test org.apache.pig.test.TestMRCompiler FAILED
> Testcase

[jira] [Updated] (PIG-3000) Optimize nested foreach

2012-10-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3000:
--

Description: 
In this Pig script:

{code}
A = load 'data' as (a:chararray);
B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 
1 : 0); }
{code}

The Eval function UPPER is called twice for each record.

This should be optimized so that UPPER is called only once for each record.

  was:
In this Pig script:

{case}
A = load 'data' as (a:chararray);
B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 
1 : 0); }
{case}


> Optimize nested foreach
> ---
>
> Key: PIG-3000
> URL: https://issues.apache.org/jira/browse/PIG-3000
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.10.0
>Reporter: Richard Ding
>
> In this Pig script:
> {code}
> A = load 'data' as (a:chararray);
> B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') 
> ? 1 : 0); }
> {code}
> The Eval function UPPER is called twice for each record.
> This should be optimized so that UPPER is called only once for each record.
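
The difference between the two evaluation strategies can be sketched in plain Python. This is only an illustration of the behavior described in the issue, not Pig's implementation: `upper`, `generate_naive`, `generate_shared` and the `calls` counter are hypothetical names introduced for the example.

```python
# Illustrative model of the nested foreach from the script in this issue.
# None of these names come from Pig's code base.

calls = {"upper": 0}


def upper(value):
    """Stand-in for the UPPER eval function; counts its invocations."""
    calls["upper"] += 1
    return value.upper()


def generate_naive(record):
    # The alias `c` is expanded at each use, so UPPER runs twice per record.
    return (1 if upper(record) == "TEST" else 0,
            1 if upper(record) == "DEV" else 0)


def generate_shared(record):
    # `c = UPPER(a)` is evaluated once and the result is reused.
    c = upper(record)
    return (1 if c == "TEST" else 0,
            1 if c == "DEV" else 0)
```

Both functions return the same tuple for a given record; only the number of `upper` calls differs, which matters when the function is expensive.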

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3000) Optimize nested foreach

2012-10-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3000:
--

Description: 
In this Pig script:

{case}
A = load 'data' as (a:chararray);
B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 
1 : 0); }
{case}

> Optimize nested foreach
> ---
>
> Key: PIG-3000
> URL: https://issues.apache.org/jira/browse/PIG-3000
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.10.0
>Reporter: Richard Ding
>
> In this Pig script:
> {case}
> A = load 'data' as (a:chararray);
> B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') 
> ? 1 : 0); }
> {case}



[jira] [Created] (PIG-3000) Optimize nested foreach

2012-10-23 Thread Richard Ding (JIRA)
Richard Ding created PIG-3000:
-

 Summary: Optimize nested foreach
 Key: PIG-3000
 URL: https://issues.apache.org/jira/browse/PIG-3000
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.10.0
Reporter: Richard Ding






[jira] [Updated] (PIG-2637) Command-line option -e throws TokenMgrError exception

2012-09-13 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2637:
--

Status: Patch Available  (was: Open)

> Command-line option -e throws TokenMgrError exception
> -
>
> Key: PIG-2637
> URL: https://issues.apache.org/jira/browse/PIG-2637
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.9.2
>Reporter: Richard Ding
>Assignee: fang fang chen
>Priority: Minor
> Attachments: PIG-2637.patch
>
>
> The command-line:
> {code}
> java -cp pig.jar org.apache.pig.Main -x local -e "a = load '1.txt';"
> {code}
> fails with exception:
> {code}
> ERROR 1000: Error during parsing. Lexical error at line 1, column 18.  
> Encountered: <EOF> after : ""
> {code}



[jira] [Updated] (PIG-2744) Handle Pig command line with XML special characters

2012-09-13 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2744:
--

Status: Patch Available  (was: Open)

> Handle Pig command line with XML special characters
> ---
>
> Key: PIG-2744
> URL: https://issues.apache.org/jira/browse/PIG-2744
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.10.0
>Reporter: Richard Ding
>Assignee: fang fang chen
> Attachments: PIG-2744.patch
>
>
> Pig stores the Pig command-line string in the Hadoop job XML file. This fails 
> if the command-line string contains XML special characters. Pig should treat 
> the command string like the Pig script by encoding it first.



[jira] [Updated] (PIG-2405) svn tags/release-0.9.1: some unit test case failed with open JDK

2012-09-13 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2405:
--

Status: Patch Available  (was: Open)

> svn tags/release-0.9.1: some unit test case failed with open JDK
> 
>
> Key: PIG-2405
> URL: https://issues.apache.org/jira/browse/PIG-2405
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.1
> Environment: ant-1.8.2
> open jdk: 1.6
>Reporter: fang fang chen
>Assignee: fang fang chen
> Attachments: 2405_1.patch, 2405_2.patch
>
>
> [junit] Test org.apache.pig.test.TestDataModel FAILED
> Testcase: testTupleToString took 0.004 sec
> FAILED
> toString expected:<...ad a little 
> lamb)},[[hello#world,goodbye#all]],42,50,3.14...> but was:<...ad a 
> little lamb)},[[goodbye#all,hello#world]],42,50,3.14...>
> junit.framework.ComparisonFailure: toString expected:<...ad a little 
> lamb)},[[hello#world,goodbye#all]],42,50,3.14...> but was:<...ad a 
> little lamb)},[[goodbye#all,hello#world]],42,50,3.14...>
>  at 
> org.apache.pig.test.TestDataModel.testTupleToString(TestDataModel.java:269)
> [junit] Test org.apache.pig.test.TestHBaseStorage FAILED
> Tests run: 18, Failures: 0, Errors: 12, Time elapsed: 188.612 sec
> Testcase: testHeterogeneousScans took 0.018 sec
> Caused an ERROR
> java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml (Too many 
> open files)
> java.lang.RuntimeException: java.io.FileNotFoundException: 
> /root/pigtest/conf/hadoop-site.xml (Too many open files)
> at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1162)
> at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1035)
> at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:980)
> at org.apache.hadoop.conf.Configuration.get(Configuration.java:436)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:271)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:155)
> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:167)
> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:130)
> at 
> org.apache.pig.test.TestHBaseStorage.prepareTable(TestHBaseStorage.java:809)
> at 
> org.apache.pig.test.TestHBaseStorage.testHeterogeneousScans(TestHBaseStorage.java:741)
> Caused by: java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml 
> (Too many open files)
> at java.io.FileInputStream.<init>(FileInputStream.java:112)
> at java.io.FileInputStream.<init>(FileInputStream.java:72)
> at 
> sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
> at 
> sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
> at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown 
> Source)
> at 
> org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
> at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
> at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1079)
> Caused an ERROR
> Could not resolve the DNS name of hostname:39611
> java.lang.IllegalArgumentException: Could not resolve the DNS name of 
> hostname:39611
> at 
> org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105)
> at 
> org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:66)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:755)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:590)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:555)
> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171)
> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:145)
> at 
> org.apache.pig.test.TestHBaseStorage.deleteAllRows(TestHBaseStorage.java:120)
> at 
> org.apache.pig.test.TestHBaseStorage.tearDown(TestHBaseStorage.java:112)
> [junit] Test org.apache.pig.test.TestMRCompiler FAILED
> Testcase: testS

[jira] [Created] (PIG-2744) Handle Pig command line with XML special characters

2012-06-08 Thread Richard Ding (JIRA)
Richard Ding created PIG-2744:
-

 Summary: Handle Pig command line with XML special characters
 Key: PIG-2744
 URL: https://issues.apache.org/jira/browse/PIG-2744
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.10.0
Reporter: Richard Ding
Assignee: Richard Ding


Pig stores the Pig command-line string in the Hadoop job XML file. This fails if 
the command-line string contains XML special characters. Pig should treat the 
command string like the Pig script by encoding it first.
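
A minimal Python sketch of the idea, assuming Base64 as the encoding (the issue does not say which encoding Pig uses). The helper names and the `property_xml` writer are hypothetical, and the property name `pig.command.line` is used only as an example key.

```python
import base64
import xml.etree.ElementTree as ET


def encode_command_line(command_line):
    """Base64-encode the string; the result contains no XML special characters."""
    return base64.b64encode(command_line.encode("utf-8")).decode("ascii")


def decode_command_line(encoded):
    """Reverse of encode_command_line."""
    return base64.b64decode(encoded).decode("utf-8")


def property_xml(name, value):
    # Naive string interpolation, standing in for a writer that does not
    # escape its values (an assumption made for this example).
    return "<property><name>%s</name><value>%s</value></property>" % (name, value)


if __name__ == "__main__":
    cmd = "-e \"b = filter a by $0 < 5;\" -p x=a&b"
    try:
        ET.fromstring(property_xml("pig.command.line", cmd))
    except ET.ParseError as err:
        print("raw command line breaks the XML:", err)
    elem = ET.fromstring(property_xml("pig.command.line", encode_command_line(cmd)))
    print(decode_command_line(elem.find("value").text) == cmd)
```

Encoding the whole string sidesteps the question of which characters need escaping, at the cost of the stored value no longer being human-readable.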





[jira] [Updated] (PIG-2261) Restore support for parenthesis in Pig 0.9

2011-09-22 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2261:
--

Status: Patch Available  (was: Open)

> Restore support for parenthesis in Pig 0.9
> --
>
> Key: PIG-2261
> URL: https://issues.apache.org/jira/browse/PIG-2261
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.1
>
> Attachments: PIG-2261.patch
>
>
> Pig 0.8 and earlier versions used to support syntax such as 
>  
> {code}
> A =(load )
> {code}
> This was removed as "useless" in 0.9 when the grammar was redone. It turns 
> out that some users rely on this for ease of code generation, so we want to 
> restore it.
> Just to clarify, Pig 0.9 continues to support composite statements such as
> {code}
> B = filter (load 'data' as (a, b)) by a > 0;
> {code}
> It just removed "useless" parentheses and doesn't support statements like
> {code}
> A = (load 'data' as (a, b));
> {code}
>  





[jira] [Updated] (PIG-2261) Restore support for parenthesis in Pig 0.9

2011-09-22 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2261:
--

Attachment: PIG-2261.patch

Attaching patch that restores the support for parenthesis.

> Restore support for parenthesis in Pig 0.9
> --
>
> Key: PIG-2261
> URL: https://issues.apache.org/jira/browse/PIG-2261
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.1
>
> Attachments: PIG-2261.patch
>
>
> Pig 0.8 and earlier versions used to support syntax such as 
>  
> {code}
> A =(load )
> {code}
> This was removed as "useless" in 0.9 when the grammar was redone. It turns 
> out that some users rely on this for ease of code generation, so we want to 
> restore it.
> Just to clarify, Pig 0.9 continues to support composite statements such as
> {code}
> B = filter (load 'data' as (a, b)) by a > 0;
> {code}
> It just removed "useless" parentheses and doesn't support statements like
> {code}
> A = (load 'data' as (a, b));
> {code}
>  





[jira] [Updated] (PIG-2261) Restore support for parenthesis in Pig 0.9

2011-09-07 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2261:
--

Assignee: Richard Ding

> Restore support for parenthesis in Pig 0.9
> --
>
> Key: PIG-2261
> URL: https://issues.apache.org/jira/browse/PIG-2261
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.1
>
>
> Pig 0.8 and earlier versions used to support syntax such as 
>  
> {code}
> A =(load )
> {code}
> This was removed as "useless" in 0.9 when the grammar was redone. It turns 
> out that some users rely on this for ease of code generation, so we want to 
> restore it.
> Just to clarify, Pig 0.9 continues to support composite statements such as
> {code}
> B = filter (load 'data' as (a, b)) by a > 0;
> {code}
> It just removed "useless" parentheses and doesn't support statements like
> {code}
> A = (load 'data' as (a, b));
> {code}
>  





[jira] [Updated] (PIG-2261) Restore support for parenthesis in Pig 0.9

2011-09-01 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2261:
--

Summary: Restore support for parenthesis in Pig 0.9  (was: Restor support 
for parenthesis in Pig 0.9)

> Restore support for parenthesis in Pig 0.9
> --
>
> Key: PIG-2261
> URL: https://issues.apache.org/jira/browse/PIG-2261
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
> Fix For: 0.9.1
>
>
> Pig 0.8 and earlier versions used to support syntax such as 
>  
> {code}
> A =(load )
> {code}
> This was removed as "useless" in 0.9 when the grammar was redone. It turns 
> out that some users rely on this for ease of code generation, so we want to 
> restore it.
> Just to clarify, Pig 0.9 continues to support composite statements such as
> {code}
> B = filter (load 'data' as (a, b)) by a > 0;
> {code}
> It just removed "useless" parentheses and doesn't support statements like
> {code}
> A = (load 'data' as (a, b));
> {code}
>  





[jira] [Created] (PIG-2261) Restor support for parenthesis in Pig 0.9

2011-09-01 Thread Richard Ding (JIRA)
Restor support for parenthesis in Pig 0.9
-

 Key: PIG-2261
 URL: https://issues.apache.org/jira/browse/PIG-2261
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
 Fix For: 0.9.1


Pig 0.8 and earlier versions used to support syntax such as 
 
{code}
A =(load )
{code}

This was removed as "useless" in 0.9 when the grammar was redone. It turns out 
that some users rely on this for ease of code generation, so we want to 
restore it.

Just to clarify, Pig 0.9 continues to support composite statements such as

{code}
B = filter (load 'data' as (a, b)) by a > 0;
{code}

It just removed "useless" parentheses and doesn't support statements like

{code}
A = (load 'data' as (a, b));
{code}
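
To show why a code generator might emit such statements, here is a small Python sketch. It assumes a generator that wraps every relational expression in parentheses so that expressions can be nested without tracking precedence. The helper names are hypothetical and do not describe any particular user's tool.

```python
# Hypothetical Pig Latin generator: every relational expression is wrapped
# in parentheses so it can be embedded in another expression as-is.

def load(path, schema):
    return "(load '%s' as (%s))" % (path, ", ".join(schema))


def filter_by(relation, condition):
    return "(filter %s by %s)" % (relation, condition)


def assign(alias, relation):
    return "%s = %s;" % (alias, relation)


if __name__ == "__main__":
    data = load("data", ["a", "b"])
    print(assign("A", data))
    print(assign("B", filter_by(data, "a > 0")))
```

With this scheme the top-level expression of every assignment carries outer parentheses, which is exactly the statement form that stopped parsing in 0.9.0.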
 





[jira] [Commented] (PIG-2208) Restrict number of PIG generated Haddop counters

2011-08-23 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089330#comment-13089330
 ] 

Richard Ding commented on PIG-2208:
---

It only logs once per job in the front end so that the user is informed that the 
multi-input (or multi-output) counters are disabled. In the back end the counters 
are simply disabled without logging. 

> Restrict number of PIG generated Haddop counters 
> -
>
> Key: PIG-2208
> URL: https://issues.apache.org/jira/browse/PIG-2208
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.1
>
> Attachments: PIG-2208.patch
>
>
> Pig 0.8 implemented Hadoop counters to track the number of records read for 
> each input and the number of records written for each output (PIG-1389 & 
> PIG-1299). However, Hadoop imposes a limit on per-job counters 
> (MAPREDUCE-1943), and jobs fail if the counters exceed the limit.
> Therefore we need a way to cap the number of Pig-generated counters.
> Here are two options:
> 1. Add an integer property (e.g., pig.counter.limit) to the Pig property file 
> (e.g., set to 20). If the number of inputs of a job exceeds this number, the 
> input counters are disabled. Similarly, if the number of outputs of a job 
> exceeds this number, the output counters are disabled.
> 2. Add a boolean property (e.g., pig.disable.counters) to the Pig property 
> file (default: false). If this property is set to true, then the 
> Pig-generated counters are disabled.
>   





[jira] [Updated] (PIG-2208) Restrict number of PIG generated Haddop counters

2011-08-11 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2208:
--

Attachment: PIG-2208.patch

This patch implements option 2. Augmenting Pig grammar will be more involved 
and could be done later.

> Restrict number of PIG generated Haddop counters 
> -
>
> Key: PIG-2208
> URL: https://issues.apache.org/jira/browse/PIG-2208
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.1
>
> Attachments: PIG-2208.patch
>
>
> Pig 0.8 implemented Hadoop counters to track the number of records read for 
> each input and the number of records written for each output (PIG-1389 & 
> PIG-1299). However, Hadoop imposes a limit on per-job counters 
> (MAPREDUCE-1943), and jobs fail if the counters exceed the limit.
> Therefore we need a way to cap the number of Pig-generated counters.
> Here are two options:
> 1. Add an integer property (e.g., pig.counter.limit) to the Pig property file 
> (e.g., set to 20). If the number of inputs of a job exceeds this number, the 
> input counters are disabled. Similarly, if the number of outputs of a job 
> exceeds this number, the output counters are disabled.
> 2. Add a boolean property (e.g., pig.disable.counters) to the Pig property 
> file (default: false). If this property is set to true, then the 
> Pig-generated counters are disabled.
>   





[jira] [Created] (PIG-2208) Restrict number of PIG generated Haddop counters

2011-08-09 Thread Richard Ding (JIRA)
Restrict number of PIG generated Haddop counters 
-

 Key: PIG-2208
 URL: https://issues.apache.org/jira/browse/PIG-2208
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0, 0.8.1
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.1


Pig 0.8 implemented Hadoop counters to track the number of records read for 
each input and the number of records written for each output (PIG-1389 & 
PIG-1299). However, Hadoop imposes a limit on per-job counters 
(MAPREDUCE-1943), and jobs fail if the counters exceed the limit.

Therefore we need a way to cap the number of Pig-generated counters.

Here are two options:

1. Add an integer property (e.g., pig.counter.limit) to the Pig property file 
(e.g., set to 20). If the number of inputs of a job exceeds this number, the 
input counters are disabled. Similarly, if the number of outputs of a job 
exceeds this number, the output counters are disabled.

2. Add a boolean property (e.g., pig.disable.counters) to the Pig property file 
(default: false). If this property is set to true, then the Pig-generated 
counters are disabled.
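
The decision logic of the two options can be sketched as follows. This is a minimal Python illustration, not the patch: the function name is hypothetical, the property names are the example names from the text above, and combining both options in one check is an assumption made for the sketch.

```python
def counters_enabled(num_inputs, num_outputs, properties):
    """Return (input_counters_on, output_counters_on) for one job.

    `properties` is a plain dict of string values, standing in for the
    Pig property file.
    """
    # Option 2: a boolean switch that disables all Pig-generated counters.
    if properties.get("pig.disable.counters", "false").lower() == "true":
        return (False, False)

    # Option 1: an integer cap applied separately to inputs and outputs.
    limit = properties.get("pig.counter.limit")
    if limit is None:
        return (True, True)
    limit = int(limit)
    return (num_inputs <= limit, num_outputs <= limit)
```

For example, with `pig.counter.limit` set to `20`, a job with 25 inputs and 2 outputs would keep its output counters but drop its input counters.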

  







[jira] [Commented] (PIG-2125) Make Pig work with hadoop .NEXT

2011-07-20 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068739#comment-13068739
 ] 

Richard Ding commented on PIG-2125:
---

+1

> Make Pig work with hadoop .NEXT
> ---
>
> Key: PIG-2125
> URL: https://issues.apache.org/jira/browse/PIG-2125
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.10
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.10
>
> Attachments: PIG-2125-1.patch, PIG-2125-2.patch, PIG-2125-3.patch, 
> PIG-2125-4.patch, PIG-2125-5.patch
>
>
> We need to make Pig work with hadoop .NEXT, the svn branch currently is: 
> https://svn.apache.org/repos/asf/hadoop/common/branches/MR-279





[jira] [Updated] (PIG-2141) Do not bundle apache commons jars with pig-withouthadoop.jar

2011-06-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2141:
--

Description: These jars are already available with hadoop installation.   
(was: This jars are already available with hadoop installation. )

> Do not bundle apache commons jars with pig-withouthadoop.jar
> 
>
> Key: PIG-2141
> URL: https://issues.apache.org/jira/browse/PIG-2141
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: site, 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2141.patch
>
>
> These jars are already available with hadoop installation. 





[jira] [Updated] (PIG-2141) Do not bundle apache commons jars with pig-withouthadoop.jar

2011-06-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2141:
--

Attachment: PIG-2141.patch

> Do not bundle apache commons jars with pig-withouthadoop.jar
> 
>
> Key: PIG-2141
> URL: https://issues.apache.org/jira/browse/PIG-2141
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: site, 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2141.patch
>
>
> This jars are already available with hadoop installation. 





[jira] [Created] (PIG-2141) Do not bundle apache commons jars with pig-withouthadoop.jar

2011-06-23 Thread Richard Ding (JIRA)
Do not bundle apache commons jars with pig-withouthadoop.jar


 Key: PIG-2141
 URL: https://issues.apache.org/jira/browse/PIG-2141
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: site, 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0


This jars are already available with hadoop installation. 





[jira] [Commented] (PIG-2083) bincond ERROR 1025: Invalid field projection when null is used

2011-05-25 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039242#comment-13039242
 ] 

Richard Ding commented on PIG-2083:
---

+1

> bincond ERROR 1025: Invalid field projection when null is used
> --
>
> Key: PIG-2083
> URL: https://issues.apache.org/jira/browse/PIG-2083
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.9.0
> Environment: Linux 2.6.18-53.1.13.el5 #1 SMP Mon Feb 11 13:27:27 EST 
> 2008 x86_64 x86_64 x86_64 GNU/Linux
> Hadoop 0.20.203.3.1104011556 -r 96519d04f65e22ffadf89b225d0d44ef1741d126
> Compiled on Fri Apr  1 16:29:09 PDT 2011
>Reporter: Araceli Henley
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
> Attachments: PIG-2083.1.patch
>
>
> This is a regression in 0.9.
> a = load '1.txt' as (a0, a1);
> b = foreach a generate (a0==0?null:2);
> explain b;
> ERROR 1025:
> Invalid field projection. Projected field [null] does not exist in schema



[jira] [Created] (PIG-2094) Move register command from Grunt Parser to Query Parser

2011-05-24 Thread Richard Ding (JIRA)
Move register command from Grunt Parser to Query Parser
---

 Key: PIG-2094
 URL: https://issues.apache.org/jira/browse/PIG-2094
 Project: Pig
  Issue Type: Improvement
  Components: grunt
Affects Versions: 0.9.0
Reporter: Richard Ding
 Fix For: 0.10


Like the define command, the register command should be processed by the Query 
Parser. This will allow the register command to be used inside macros (since a 
macro can only contain commands that can be processed by the Query Parser).



[jira] [Resolved] (PIG-2088) Return alias validation failed when there is single line comment in the macro

2011-05-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2088.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to trunk and 0.9 branch.

> Return alias validation failed when there is single line comment in the macro
> -
>
> Key: PIG-2088
> URL: https://issues.apache.org/jira/browse/PIG-2088
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2088.patch
>
>
> The following script
> {code}
> define test() returns b { 
>a = load 'data' as (name, age, gpa);
> -- message 
>$b = filter a by (int)age > 40; 
> };
> beta = test();
> store beta into 'output';
> {code}
> results in a validation failure:
> {code}
> ERROR 1200 "Macro test missing return alias b"
> {code}



[jira] [Updated] (PIG-2088) Return alias validation failed when there is single line comment in the macro

2011-05-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2088:
--

Attachment: PIG-2088.patch

> Return alias validation failed when there is single line comment in the macro
> -
>
> Key: PIG-2088
> URL: https://issues.apache.org/jira/browse/PIG-2088
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2088.patch
>
>
> The following script
> {code}
> define test() returns b { 
>a = load 'data' as (name, age, gpa);
> -- message 
>$b = filter a by (int)age > 40; 
> };
> beta = test();
> store beta into 'output';
> {code}
> results in a validation failure:
> {code}
> ERROR 1200 "Macro test missing return alias b"
> {code}



[jira] [Created] (PIG-2088) Return alias validation failed when there is single line comment in the macro

2011-05-23 Thread Richard Ding (JIRA)
Return alias validation failed when there is single line comment in the macro
-

 Key: PIG-2088
 URL: https://issues.apache.org/jira/browse/PIG-2088
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0
 Attachments: PIG-2088.patch

The following script

{code}
define test() returns b { 
   a = load 'data' as (name, age, gpa);
-- message 
   $b = filter a by (int)age > 40; 
};

beta = test();
store beta into 'output';
{code}

results in a validation failure:

{code}
ERROR 1200 "Macro test missing return alias b"
{code}



[jira] [Commented] (PIG-2084) pig is running validation for a statement at a time batch mode, instead of running it for whole script

2011-05-23 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038097#comment-13038097
 ] 

Richard Ding commented on PIG-2084:
---

+1

> pig is running validation for a statement at a time batch mode, instead of 
> running it for whole script
> --
>
> Key: PIG-2084
> URL: https://issues.apache.org/jira/browse/PIG-2084
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
> Attachments: PIG-2084.1.patch
>
>
> In PIG-2059, a change was made to run validation for each statement instead 
> of running it once for the whole script.
> This slows down the validation phase, and it ends up taking tens of seconds.



[jira] [Resolved] (PIG-1824) Support import modules in Jython UDF

2011-05-20 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-1824.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to trunk. Thanks, Woody!

> Support import modules in Jython UDF
> 
>
> Key: PIG-1824
> URL: https://issues.apache.org/jira/browse/PIG-1824
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Richard Ding
>Assignee: Woody Anderson
> Fix For: 0.10
>
> Attachments: 1824.patch, 1824_final.patch, 1824a.patch, 1824b.patch, 
> 1824c.patch, 1824d.patch, 1824x.patch, 
> TEST-org.apache.pig.test.TestGrunt.txt, 
> TEST-org.apache.pig.test.TestScriptLanguage.txt, 
> TEST-org.apache.pig.test.TestScriptUDF.txt
>
>
> Currently, Jython UDF script doesn't support Jython import statement as in 
> the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("word:chararray")
> def resplit(content, regex, index):
>     return re.compile(regex).split(content)[index]
> {code}
> Can Pig automatically locate the Jython module file and ship it to the 
> backend? Or should we add a ship clause to let user explicitly specify the 
> module to ship? 
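
For reference, here is the example UDF with its indentation restored, as a sketch that also runs under plain Python. The `outputSchema` stub is an assumption added only to make the snippet self-contained; under Pig the decorator is supplied by the Jython script engine, and the stub's behaviour (recording the schema string on the function) is illustrative.

```python
import re

def outputSchema(schema):
    """Stand-in for Pig's decorator (assumption): records the schema string."""
    def decorate(func):
        func.outputSchema = schema
        return func
    return decorate

@outputSchema("word:chararray")
def resplit(content, regex, index):
    # Split content on the regex and return the piece at the given index.
    return re.compile(regex).split(content)[index]

print(resplit("a,b,c", ",", 1))
```

The `import re` line is the part this issue is about: the UDF only works on the backend if the imported module can be found there.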



[jira] [Commented] (PIG-1824) Support import modules in Jython UDF

2011-05-20 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037039#comment-13037039
 ] 

Richard Ding commented on PIG-1824:
---

Patch passed e2e python tests.

> Support import modules in Jython UDF
> 
>
> Key: PIG-1824
> URL: https://issues.apache.org/jira/browse/PIG-1824
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Richard Ding
>Assignee: Woody Anderson
> Fix For: 0.10
>
> Attachments: 1824.patch, 1824_final.patch, 1824a.patch, 1824b.patch, 
> 1824c.patch, 1824d.patch, 1824x.patch, 
> TEST-org.apache.pig.test.TestGrunt.txt, 
> TEST-org.apache.pig.test.TestScriptLanguage.txt, 
> TEST-org.apache.pig.test.TestScriptUDF.txt
>
>
> Currently, Jython UDF script doesn't support Jython import statement as in 
> the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("word:chararray")
> def resplit(content, regex, index):
>     return re.compile(regex).split(content)[index]
> {code}
> Can Pig automatically locate the Jython module file and ship it to the 
> backend? Or should we add a ship clause to let user explicitly specify the 
> module to ship? 



[jira] [Resolved] (PIG-2029) Inconsistency in Pig Stats reports

2011-05-20 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2029.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

> Inconsistency in Pig Stats reports 
> ---
>
> Key: PIG-2029
> URL: https://issues.apache.org/jira/browse/PIG-2029
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Viraj Bhat
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2029.patch
>
>
> I have a Pig script which reports varying stats for the same M/R job (same 
> inputs). Sometimes PigStats reports all the stats (such as 
> Maps, Reduces, MaxMapTime, MinMapTime, AvgMapTime, MaxReduceTime, MinReduceTime 
> and AvgReduceTime) for the M/R job as 0. Sometimes it reports them correctly.
> Enclosed are the stderr logs for 2 runs. Notice that in Run 1, 
> job_201103091134_556600 has 0 in all the columns, whereas in 
> Run 2, Hadoop job job_201104272229_75693 has some valid values. 
> The actual Job Tracker link shows that they are non-empty. This points to a 
> bug in the interaction of the PigStats module with the JobTracker.
> Run 1:
> {quote}
> Job Stats (time in seconds):
> JobId MapsReduces MaxMapTime  MinMapTIme  AvgMapTime  
> MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
> job_201103091134_556458   160 100 552 191 368 1257
> 371 392 
> IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P
>DISTINCT,MULTI_QUERY
> job_201103091134_556600   0   0   0   0   0   0   
> 0   0   UNION5  MULTI_QUERY,MAP_ONLY/user/viraj/dir,,
> job_201103091134_556601   7   100 17  8   14  200 
> 15  27  CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER   
> job_201103091134_556602   0   0   0   0   0   0   
> 0   0   CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER   
> job_201103091134_556603   0   0   0   0   0   0   
> 0   0   CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER   
> job_201103091134_556604   2   100 13  7   10  34  
> 13  31  CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER   
> job_201103091134_556644   0   0   0   0   0   0   
> 0   0   ONJOIN15SAMPLER 
> job_201103091134_556645   0   0   0   0   0   0   
> 0   0   ONJOIN25SAMPLER 
> job_201103091134_556646   0   0   0   0   0   0   
> 0   0   ONJOIN3 SAMPLER 
> job_201103091134_556654   0   0   0   0   0   0   
> 0   0   ONJOIN19SAMPLER 
> job_201103091134_556662   0   0   0   0   0   0   
> 0   0   ONJOIN19ORDER_BY,COMBINER
> ..
> {quote}
> Run 2:
> {quote}
> Job Stats (time in seconds):
> JobId MapsReduces MaxMapTime  MinMapTIme  AvgMapTime  
> MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
> job_201104272229_75503159 100 484 192 353 396 
> 308 321 
> IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P
>DISTINCT,MULTI_QUERY
> job_201104272229_7569318  0   31  14  24  0   
> 0   UNION5 MULTI_QUERY,MAP_ONLY /user/viraj/dir,
> job_201104272229_756947   100 34  13  22  46  
> 20  25  CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER   
> job_201104272229_75695125 100 19  11  15  32  
> 18  26  CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER   
> job_201104272229_756981   100 12  12  12  13  
> 9   11  CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER   
> job_201104272229_757022   100 21  5   13  35  
> 22  26  CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER   
> job_201104272229_757241   1   4   4   4   11  
> 11  11  ONJOIN15SAMPLER 
> job_201104272229_757250   0   0   0   0   0   
> 0   ONJOIN25SAMPLER 
> job_201104272229_757266   1   8   6   8   24  
> 24  24  ONJOIN3 SAMPLER 
> job_201104272229_757290   0   0   0   0   0   
> 0   ONJOIN

[jira] [Resolved] (PIG-2081) Dryrun gives wrong line numbers in error message for scripts containing macro.

2011-05-20 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2081.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to trunk and the 0.9 branch.

> Dryrun gives wrong line numbers in error message for scripts containing macro.
> --
>
> Key: PIG-2081
> URL: https://issues.apache.org/jira/browse/PIG-2081
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2081.patch
>
>
> For following script (test.pig)
> {code}
>   1 DEFINE my_macro (X,key) returns Y
>   2 {
>   3 tmp1 = foreach  $X generate TOKENIZE((chararray)$key) as tokens;
>   4 tmp2 = foreach tmp1 generate flatten(tokens);
>   5 tmp3 = order tmp2 by $0;
>   6 $Y = distinct tmp3;
>   7 }
>   8 
>   9 A = load 'sometext' using TextLoader() as (row) ;
>  10 E = my_macro(A,row);
>  11 
>  12 A1 = load 'sometext2' using TextLoader() as (row1);
>  13 E1 = my_macro(A1,row1);
>  14 
>  15 A3 = load 'sometext3' using TextLoader() as (row3);
>  16 E3 = my_macro(A3,$0);
>  17 
>  18 F = cogroup E by $0, E1 by $0,E3 by $0;
>  19 dump F;
> {code}
> pig test.pig gives correct line number in error message:
> {code}
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200:  column 17>  mismatched input '$0' expecting set null
> {code}
> while pig -r test.pig gives incorrect line number in error message:
> {code}
> ERROR org.apache.pig.Main - ERROR 1200:  column 17>  mismatched input '$0' expecting set null
> {code}



[jira] [Commented] (PIG-2081) Dryrun gives wrong line numbers in error message for scripts containing macro.

2011-05-20 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037003#comment-13037003
 ] 

Richard Ding commented on PIG-2081:
---

test-patch and unit tests pass.

> Dryrun gives wrong line numbers in error message for scripts containing macro.
> --
>
> Key: PIG-2081
> URL: https://issues.apache.org/jira/browse/PIG-2081
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2081.patch
>
>
> For following script (test.pig)
> {code}
>   1 DEFINE my_macro (X,key) returns Y
>   2 {
>   3 tmp1 = foreach  $X generate TOKENIZE((chararray)$key) as tokens;
>   4 tmp2 = foreach tmp1 generate flatten(tokens);
>   5 tmp3 = order tmp2 by $0;
>   6 $Y = distinct tmp3;
>   7 }
>   8 
>   9 A = load 'sometext' using TextLoader() as (row) ;
>  10 E = my_macro(A,row);
>  11 
>  12 A1 = load 'sometext2' using TextLoader() as (row1);
>  13 E1 = my_macro(A1,row1);
>  14 
>  15 A3 = load 'sometext3' using TextLoader() as (row3);
>  16 E3 = my_macro(A3,$0);
>  17 
>  18 F = cogroup E by $0, E1 by $0,E3 by $0;
>  19 dump F;
> {code}
> pig test.pig gives correct line number in error message:
> {code}
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200:  column 17>  mismatched input '$0' expecting set null
> {code}
> while pig -r test.pig gives incorrect line number in error message:
> {code}
> ERROR org.apache.pig.Main - ERROR 1200:  column 17>  mismatched input '$0' expecting set null
> {code}



[jira] [Updated] (PIG-2081) Dryrun gives wrong line numbers in error message for scripts containing macro.

2011-05-19 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2081:
--

Attachment: PIG-2081.patch

> Dryrun gives wrong line numbers in error message for scripts containing macro.
> --
>
> Key: PIG-2081
> URL: https://issues.apache.org/jira/browse/PIG-2081
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2081.patch
>
>
> For following script (test.pig)
> {code}
>   1 DEFINE my_macro (X,key) returns Y
>   2 {
>   3 tmp1 = foreach  $X generate TOKENIZE((chararray)$key) as tokens;
>   4 tmp2 = foreach tmp1 generate flatten(tokens);
>   5 tmp3 = order tmp2 by $0;
>   6 $Y = distinct tmp3;
>   7 }
>   8 
>   9 A = load 'sometext' using TextLoader() as (row) ;
>  10 E = my_macro(A,row);
>  11 
>  12 A1 = load 'sometext2' using TextLoader() as (row1);
>  13 E1 = my_macro(A1,row1);
>  14 
>  15 A3 = load 'sometext3' using TextLoader() as (row3);
>  16 E3 = my_macro(A3,$0);
>  17 
>  18 F = cogroup E by $0, E1 by $0,E3 by $0;
>  19 dump F;
> {code}
> pig test.pig gives correct line number in error message:
> {code}
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200:  column 17>  mismatched input '$0' expecting set null
> {code}
> while pig -r test.pig gives incorrect line number in error message:
> {code}
> ERROR org.apache.pig.Main - ERROR 1200:  column 17>  mismatched input '$0' expecting set null
> {code}



[jira] [Commented] (PIG-2029) Inconsistency in Pig Stats reports

2011-05-19 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036403#comment-13036403
 ] 

Richard Ding commented on PIG-2029:
---

Patch committed to trunk and 0.9 branch.

> Inconsistency in Pig Stats reports 
> ---
>
> Key: PIG-2029
> URL: https://issues.apache.org/jira/browse/PIG-2029
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Viraj Bhat
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2029.patch
>
>
> I have a Pig script which reports varying stats for the same M/R job (same 
> inputs). Sometimes PigStats reports all the stats (such as 
> Maps, Reduces, MaxMapTime, MinMapTime, AvgMapTime, MaxReduceTime, MinReduceTime 
> and AvgReduceTime) for the M/R job as 0. Sometimes it reports them correctly.
> Enclosed are the stderr logs for 2 runs. Notice that in Run 1, 
> job_201103091134_556600 has 0 in all the columns, whereas in 
> Run 2, Hadoop job job_201104272229_75693 has some valid values. 
> The actual Job Tracker link shows that they are non-empty. This points to a 
> bug in the interaction of the PigStats module with the JobTracker.
> Run 1:
> {quote}
> Job Stats (time in seconds):
> JobId MapsReduces MaxMapTime  MinMapTIme  AvgMapTime  
> MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
> job_201103091134_556458   160 100 552 191 368 1257
> 371 392 
> IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P
>DISTINCT,MULTI_QUERY
> job_201103091134_556600   0   0   0   0   0   0   
> 0   0   UNION5  MULTI_QUERY,MAP_ONLY/user/viraj/dir,,
> job_201103091134_556601   7   100 17  8   14  200 
> 15  27  CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER   
> job_201103091134_556602   0   0   0   0   0   0   
> 0   0   CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER   
> job_201103091134_556603   0   0   0   0   0   0   
> 0   0   CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER   
> job_201103091134_556604   2   100 13  7   10  34  
> 13  31  CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER   
> job_201103091134_556644   0   0   0   0   0   0   
> 0   0   ONJOIN15SAMPLER 
> job_201103091134_556645   0   0   0   0   0   0   
> 0   0   ONJOIN25SAMPLER 
> job_201103091134_556646   0   0   0   0   0   0   
> 0   0   ONJOIN3 SAMPLER 
> job_201103091134_556654   0   0   0   0   0   0   
> 0   0   ONJOIN19SAMPLER 
> job_201103091134_556662   0   0   0   0   0   0   
> 0   0   ONJOIN19ORDER_BY,COMBINER
> ..
> {quote}
> Run 2:
> {quote}
> Job Stats (time in seconds):
> JobId MapsReduces MaxMapTime  MinMapTIme  AvgMapTime  
> MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
> job_201104272229_75503159 100 484 192 353 396 
> 308 321 
> IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P
>DISTINCT,MULTI_QUERY
> job_201104272229_7569318  0   31  14  24  0   
> 0   UNION5 MULTI_QUERY,MAP_ONLY /user/viraj/dir,
> job_201104272229_756947   100 34  13  22  46  
> 20  25  CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER   
> job_201104272229_75695125 100 19  11  15  32  
> 18  26  CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER   
> job_201104272229_756981   100 12  12  12  13  
> 9   11  CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER   
> job_201104272229_757022   100 21  5   13  35  
> 22  26  CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER   
> job_201104272229_757241   1   4   4   4   11  
> 11  11  ONJOIN15SAMPLER 
> job_201104272229_757250   0   0   0   0   0   
> 0   ONJOIN25SAMPLER 
> job_201104272229_757266   1   8   6   8   24  
> 24  24  ONJOIN3 SAMPLER 
> job_201104272229_757290   0   0   0   0 

[jira] [Created] (PIG-2081) Dryrun gives wrong line numbers in error message for scripts containing macro.

2011-05-19 Thread Richard Ding (JIRA)
Dryrun gives wrong line numbers in error message for scripts containing macro.
--

 Key: PIG-2081
 URL: https://issues.apache.org/jira/browse/PIG-2081
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0


For the following script (test.pig):

{code}
  1 DEFINE my_macro (X,key) returns Y
  2 {
  3 tmp1 = foreach  $X generate TOKENIZE((chararray)$key) as tokens;
  4 tmp2 = foreach tmp1 generate flatten(tokens);
  5 tmp3 = order tmp2 by $0;
  6 $Y = distinct tmp3;
  7 }
  8 
  9 A = load 'sometext' using TextLoader() as (row) ;
 10 E = my_macro(A,row);
 11 
 12 A1 = load 'sometext2' using TextLoader() as (row1);
 13 E1 = my_macro(A1,row1);
 14 
 15 A3 = load 'sometext3' using TextLoader() as (row3);
 16 E3 = my_macro(A3,$0);
 17 
 18 F = cogroup E by $0, E1 by $0,E3 by $0;
 19 dump F;
{code}

pig test.pig gives the correct line number in the error message:

{code}
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200:   mismatched input '$0' expecting set null
{code}

while pig -r test.pig gives an incorrect line number in the error message:

{code}
ERROR org.apache.pig.Main - ERROR 1200:   mismatched input '$0' expecting set null
{code}
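
A sketch of the underlying idea, under the assumption that macro expansion produces a script whose line layout differs from the original: keep, for every expanded line, the line number of the statement it came from, and report that number in errors. The names `expand` and `original_line` are hypothetical and the expansion is deliberately simplified (no parameter substitution); this is not how Pig implements it.

```python
def expand(lines, macro_name, macro_body):
    """Inline macro_body wherever macro_name is invoked.

    Returns (expanded_lines, origin), where origin[i] is the 1-based line
    number in the original script that expanded line i came from.
    """
    expanded, origin = [], []
    for lineno, line in enumerate(lines, start=1):
        if macro_name + "(" in line:
            for body_line in macro_body:
                expanded.append(body_line)
                origin.append(lineno)  # every inlined line maps to the call site
        else:
            expanded.append(line)
            origin.append(lineno)
    return expanded, origin

def original_line(origin, expanded_lineno):
    """Translate a 1-based line number in the expanded script back."""
    return origin[expanded_lineno - 1]

script = [
    "A = load 'sometext' using TextLoader() as (row);",
    "E = my_macro(A, row);",
    "dump E;",
]
body = ["tmp = foreach A generate row;", "E = distinct tmp;"]

expanded, origin = expand(script, "my_macro", body)
# 'dump E;' is line 4 after expansion but line 3 in the original script.
print(original_line(origin, 4))
```

Reporting the expanded line number directly would point at the wrong line of test.pig, which is the kind of mismatch described above.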



[jira] [Commented] (PIG-1824) Support import modules in Jython UDF

2011-05-18 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035542#comment-13035542
 ] 

Richard Ding commented on PIG-1824:
---

The new patch fixes the unit test errors reported earlier. One different test 
still fails in TestGrunt; I'm not sure whether it is related to the patch. 

> Support import modules in Jython UDF
> 
>
> Key: PIG-1824
> URL: https://issues.apache.org/jira/browse/PIG-1824
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Richard Ding
>Assignee: Woody Anderson
> Fix For: 0.10
>
> Attachments: 1824.patch, 1824_final.patch, 1824a.patch, 1824b.patch, 
> 1824c.patch, 1824d.patch, 1824x.patch, 
> TEST-org.apache.pig.test.TestGrunt.txt, 
> TEST-org.apache.pig.test.TestScriptLanguage.txt, 
> TEST-org.apache.pig.test.TestScriptUDF.txt
>
>
> Currently, Jython UDF script doesn't support Jython import statement as in 
> the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("word:chararray")
> def resplit(content, regex, index):
>     return re.compile(regex).split(content)[index]
> {code}
> Can Pig automatically locate the Jython module file and ship it to the 
> backend? Or should we add a ship clause to let user explicitly specify the 
> module to ship? 



[jira] [Updated] (PIG-2029) Inconsistency in Pig Stats reports

2011-05-17 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2029:
--

Attachment: PIG-2029.patch

> Inconsistency in Pig Stats reports 
> ---
>
> Key: PIG-2029
> URL: https://issues.apache.org/jira/browse/PIG-2029
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Viraj Bhat
>Assignee: Richard Ding
> Fix For: 0.10
>
> Attachments: PIG-2029.patch
>
>
> I have a Pig script which reports varying stats for the same M/R job (same 
> inputs). Sometimes PigStats reports all the stats (such as 
> Maps, Reduces, MaxMapTime, MinMapTime, AvgMapTime, MaxReduceTime, MinReduceTime 
> and AvgReduceTime) for the M/R job as 0. Sometimes it reports them correctly.
> Enclosed are the stderr logs for 2 runs. Notice that in Run 1, 
> job_201103091134_556600 has 0 in all the columns, whereas in 
> Run 2, Hadoop job job_201104272229_75693 has some valid values. 
> The actual Job Tracker link shows that they are non-empty. This points to a 
> bug in the interaction of the PigStats module with the JobTracker.
> Run 1:
> {quote}
> Job Stats (time in seconds):
> JobId MapsReduces MaxMapTime  MinMapTIme  AvgMapTime  
> MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
> job_201103091134_556458   160 100 552 191 368 1257
> 371 392 
> IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P
>DISTINCT,MULTI_QUERY
> job_201103091134_556600   0   0   0   0   0   0   
> 0   0   UNION5  MULTI_QUERY,MAP_ONLY/user/viraj/dir,,
> job_201103091134_556601   7   100 17  8   14  200 
> 15  27  CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER   
> job_201103091134_556602   0   0   0   0   0   0   
> 0   0   CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER   
> job_201103091134_556603   0   0   0   0   0   0   
> 0   0   CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER   
> job_201103091134_556604   2   100 13  7   10  34  
> 13  31  CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER   
> job_201103091134_556644   0   0   0   0   0   0   
> 0   0   ONJOIN15SAMPLER 
> job_201103091134_556645   0   0   0   0   0   0   
> 0   0   ONJOIN25SAMPLER 
> job_201103091134_556646   0   0   0   0   0   0   
> 0   0   ONJOIN3 SAMPLER 
> job_201103091134_556654   0   0   0   0   0   0   
> 0   0   ONJOIN19SAMPLER 
> job_201103091134_556662   0   0   0   0   0   0   
> 0   0   ONJOIN19ORDER_BY,COMBINER
> ..
> {quote}
> Run 2:
> {quote}
> Job Stats (time in seconds):
> JobId MapsReduces MaxMapTime  MinMapTIme  AvgMapTime  
> MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
> job_201104272229_75503159 100 484 192 353 396 
> 308 321 
> IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P
>DISTINCT,MULTI_QUERY
> job_201104272229_7569318  0   31  14  24  0   
> 0   UNION5 MULTI_QUERY,MAP_ONLY /user/viraj/dir,
> job_201104272229_756947   100 34  13  22  46  
> 20  25  CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER   
> job_201104272229_75695125 100 19  11  15  32  
> 18  26  CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER   
> job_201104272229_756981   100 12  12  12  13  
> 9   11  CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER   
> job_201104272229_757022   100 21  5   13  35  
> 22  26  CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER   
> job_201104272229_757241   1   4   4   4   11  
> 11  11  ONJOIN15SAMPLER 
> job_201104272229_757250   0   0   0   0   0   
> 0   ONJOIN25SAMPLER 
> job_201104272229_757266   1   8   6   8   24  
> 24  24  ONJOIN3 SAMPLER 
> job_201104272229_757290   0   0   0   0   0   
> 0   ONJOIN19SAMPLER 
> job_2011

[jira] [Commented] (PIG-2029) Inconsistency in Pig Stats reports

2011-05-17 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035010#comment-13035010
 ] 

Richard Ding commented on PIG-2029:
---

Currently, Pig prints zero (0) when the max/min/avg map/reduce times cannot be 
retrieved from Hadoop through the Hadoop client API. This is misleading, because 
a 0 then looks like a real measurement. I propose that we print 'n/a' for those 
values instead, as follows:

{code}
Job Stats (time in seconds):
JobId                    Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  Alias  Feature  Outputs
job_201104272229_434232  2     10       354         220         287         168            149            163            IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P  DISTINCT,MULTI_QUERY
job_201104272229_434319  2     0        9           3           6           0              0              0              UNION5  MULTI_QUERY,MAP_ONLY  /user/rding/verifypigstats2-UNION5,
job_201104272229_434320  2     10       n/a         n/a         n/a         n/a            n/a            n/a            CNJOIN3,GNJOIN3,sampleNJOIN3  GROUP_BY,COMBINER
job_201104272229_434321  1     10       5           5           5           23             9              17             CNJOIN25,GNJOIN25,sampleNJOIN25  GROUP_BY,COMBINER
job_201104272229_434322  2     10       n/a         n/a         n/a         n/a            n/a            n/a            CNJOIN15,GNJOIN15,sampleNJOIN15  GROUP_BY,COMBINER
job_201104272229_434323  2     10       n/a         n/a         n/a         n/a            n/a            n/a            CNJOIN19,GNJOIN19,sampleNJOIN19  GROUP_BY,COMBINER
job_201104272229_434331  2     1        n/a         n/a         n/a         n/a            n/a            n/a            ONJOIN15  SAMPLER
job_201104272229_434332  2     1        n/a         n/a         n/a         n/a            n/a            n/a            ONJOIN3  SAMPLER
job_201104272229_434333  1     1        2           2           2           13             13             13             ONJOIN25  SAMPLER
job_201104272229_434334  1     1        1           1           1           12             12             12             ONJOIN19  SAMPLER
job_201104272229_434342  1     10       2           2           2           16             8              11             ONJOIN25  ORDER_BY,COMBINER
{code}
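
The proposal can be sketched as follows, assuming an unavailable timing is represented as `None` rather than 0. `format_stat` and `format_row` are hypothetical helpers written for illustration; they are not part of PigStats.

```python
def format_stat(value):
    """Render a timing value, using 'n/a' when it could not be retrieved."""
    return "n/a" if value is None else str(value)

def format_row(job_id, maps, reduces, timings):
    """Build one tab-separated Job Stats row.

    timings holds max/min/avg map time followed by max/min/avg reduce time;
    any entry may be None if Hadoop did not return it.
    """
    cells = [job_id, str(maps), str(reduces)] + [format_stat(t) for t in timings]
    return "\t".join(cells)

# A genuine zero (map-only job, no reduce time) stays 0; a missing value becomes n/a.
print(format_row("job_1", 2, 0, [9, 3, 6, 0, 0, 0]))
print(format_row("job_2", 2, 10, [None] * 6))
```

Keeping "missing" distinct from 0 in the data model is what lets the report separate a map-only job from a job whose statistics could not be fetched.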

> Inconsistency in Pig Stats reports 
> ---
>
> Key: PIG-2029
> URL: https://issues.apache.org/jira/browse/PIG-2029
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Viraj Bhat
>Assignee: Richard Ding
> Fix For: 0.10
>
>
> I have a Pig script which reports varying stats for the same M/R job (same 
> inputs). Sometimes PigStats reports all the stats (such as 
> Maps, Reduces, MaxMapTime, MinMapTime, AvgMapTime, MaxReduceTime, MinReduceTime 
> and AvgReduceTime) for the M/R job as 0. Sometimes it reports them correctly.
> Enclosed are the stderr logs for 2 runs. Notice that in Run 1, 
> job_201103091134_556600 has 0 in all the columns, whereas in 
> Run 2, Hadoop job job_201104272229_75693 has some valid values. 
> The actual Job Tracker link shows that they are non-empty. This points to a 
> bug in the interaction of the PigStats module with the JobTracker.
> Run 1:
> {quote}
> Job Stats (time in seconds):
> JobId MapsReduces MaxMapTime  MinMapTIme  AvgMapTime  
> MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
> job_201103091134_556458   160 100 552 191 368 1257
> 371 392 
> IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P
>DISTINCT,MULTI_QUERY
> job_201103091134_556600   0   0   0   0   0   0   
> 0   0   UNION5  MULTI_QUERY,MAP_ONLY/user/viraj/dir,,
> job_201103091134_556601   7   100 17  8   14  200 
> 15  27  CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER   
> job_201103091134_556602   0   0   0   0   0   0   
> 0   0   CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER   
> job_201103091134_556603   0   0   0   0   0   0   
> 0   0   CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER   
> job_201103091134_556604   2   100 13  7   10  34  
> 13  31  CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER   
> job_201103091134_556644   0   0   0   0   0   0   
> 0   0   ONJOIN15SAMPLER 
> job_201103091134_556645   0   0   0   0   0   0   
> 0   0   ONJOIN25SAMPLER 
> job_201103091134_556646   0   0   0   0   0   0   
> 0   0   ONJOIN3 SAMPLER 
> job_201103091134_556654   0   0   0   0  

[jira] [Commented] (PIG-2070) "Unknown" appears in error message for an error case

2011-05-16 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034135#comment-13034135
 ] 

Richard Ding commented on PIG-2070:
---

+1

> "Unknown" appears in error message for an error case
> 
>
> Key: PIG-2070
> URL: https://issues.apache.org/jira/browse/PIG-2070
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Xuefu Zhang
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
> Attachments: PIG-2070.1.patch
>
>
> For the following query:
> a = load '1.txt' as (a0:int, a1:int);
> b = load '2.txt' as (a0:int, a1:chararray);
> c = cogroup a by (a0,a1), b by (a0,a1);
> Pig gives the following message, which includes the word "Unknown":
> 2011-05-13 11:01:18,682 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1051: Cannot cast to Unknown
> The error message should be more meaningful.



[jira] [Resolved] (PIG-2069) LoadFunc jar does not ship to backend in MultiQuery case

2011-05-16 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2069.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Unit tests pass. Patch committed to trunk and 0.9 branch.

> LoadFunc jar does not ship to backend in MultiQuery case
> 
>
> Key: PIG-2069
> URL: https://issues.apache.org/jira/browse/PIG-2069
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Daniel Dai
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2069.patch
>
>
> Pig is able to automatically figure out the jar containing the LoadFunc and 
> ship it to the backend. However, for the following script it did not:
> {code}
> A = load '1.txt' using SomeLoadFunc();
> B = filter A by $0==0;
> C = filter A by $1==1;
> D = join B by $0, C by $0;
> dump D;
> {code}
> The reason is that this query is a multiquery (A is reused, which creates an 
> implicit split). When we merge the multiquery into one job, we do not merge 
> the udfs list properly.



[jira] [Commented] (PIG-2076) update documentation, help command with correct default value of pig.cachedbag.memusage

2011-05-13 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033394#comment-13033394
 ] 

Richard Ding commented on PIG-2076:
---

+1

> update documentation, help command with correct default value of 
> pig.cachedbag.memusage
> ---
>
> Key: PIG-2076
> URL: https://issues.apache.org/jira/browse/PIG-2076
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
> Attachments: PIG-2076.1.patch
>
>
> The default value of pig.cachedbag.memusage was changed to 0.2 in Pig 0.8, as 
> part of the changes in PIG-1447.
> But the help command and documentation still show the older default value of 0.1.



[jira] [Updated] (PIG-2069) LoadFunc jar does not ship to backend in MultiQuery case

2011-05-13 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2069:
--

Attachment: PIG-2069.patch

This happens when the original MapReduce DAG (before optimization) contains a 
diamond node.

Users can work around this by explicitly registering the LoadFunc jar in the 
script.

The attached patch provides a fix. It has been verified with a manual test.

> LoadFunc jar does not ship to backend in MultiQuery case
> 
>
> Key: PIG-2069
> URL: https://issues.apache.org/jira/browse/PIG-2069
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Daniel Dai
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2069.patch
>
>
> Pig is able to automatically figure out the jar containing the LoadFunc and 
> ship it to the backend. However, for the following script it didn't:
> {code}
> A = load '1.txt' using SomeLoadFunc();
> B = filter A by $0==0;
> C = filter A by $1==1;
> D = join B by $0, C by $0;
> dump D;
> {code}
> The reason is that this query is a multiquery (A is reused, which creates an 
> implicit split). When we merged the multiquery into one job, we didn't merge the 
> UDF lists properly.
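
As a rough illustration of the fix described above (this is not the attached patch), merging the per-branch UDF lists when multiquery jobs are combined could be sketched in Python as follows. The function name `merge_udf_lists` and the UDF names in the example are hypothetical.

```python
def merge_udf_lists(*udf_lists):
    """Union several UDF lists, preserving first-seen order and dropping duplicates."""
    merged = []
    seen = set()
    for udfs in udf_lists:
        for udf in udfs:
            if udf not in seen:
                seen.add(udf)
                merged.append(udf)
    return merged


# Hypothetical UDF lists for the two filter branches that share load A.
branch_b = ["SomeLoadFunc"]
branch_c = ["SomeLoadFunc", "SomeOtherFunc"]
merged = merge_udf_lists(branch_b, branch_c)
```

The point of the sketch is only that the merged job should carry the union of the UDFs of its parts, so that every jar they need can be shipped to the backend.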



[jira] [Commented] (PIG-2067) FilterLogicExpressionSimplifier removed some branches in some cases

2011-05-13 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033151#comment-13033151
 ] 

Richard Ding commented on PIG-2067:
---

+1

> FilterLogicExpressionSimplifier removed some branches in some cases
> ---
>
> Key: PIG-2067
> URL: https://issues.apache.org/jira/browse/PIG-2067
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.1, 0.9.0
>
> Attachments: PIG-2067-1-0.8.patch, PIG-2067-1.patch
>
>
> The following script produces the wrong result:
> {code}
> A = load 'a.dat' as (cookie);
> B = load 'b.dat' as (cookie);
> C = cogroup A by cookie, B by cookie;
> E = filter C by COUNT(B)>0 AND COUNT(A)>0;
> explain E;
> {code}
> a.dat:
> 1   1
> 2   2
> 3   3
> 4   4
> 5   5
> 6   6
> 7   7
> b.dat:
> 3   3
> 4   4
> 5   5
> 6   6
> 7   7
> 8   8
> Expected output:
> (3,{(3)},{(3)})
> (4,{(4)},{(4)})
> (5,{(5)},{(5)})
> (6,{(6)},{(6)})
> (7,{(7)},{(7)})
> We get:
> (3,{(3)},{(3)})
> (4,{(4)},{(4)})
> (5,{(5)},{(5)})
> (6,{(6)},{(6)})
> (7,{(7)},{(7)})
> (8,{},{(8)})
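
To make the expected result concrete, here is a small Python sketch that mimics the cogroup and the filter on the first field of the sample data. It does not use Pig; the helper `cogroup` is hypothetical and only models the semantics for this example. The extra row for key 8 is what one would see if the COUNT(A)>0 condition were dropped, which is an assumption about the failure consistent with the output above.

```python
from collections import defaultdict


def cogroup(a_keys, b_keys):
    """Return (key, a_bag, b_bag) tuples sorted by key."""
    groups = defaultdict(lambda: ([], []))
    for key in a_keys:
        groups[key][0].append(key)
    for key in b_keys:
        groups[key][1].append(key)
    return [(key, bags[0], bags[1]) for key, bags in sorted(groups.items())]


a_dat = [1, 2, 3, 4, 5, 6, 7]  # first field of a.dat
b_dat = [3, 4, 5, 6, 7, 8]     # first field of b.dat
c = cogroup(a_dat, b_dat)

# Correct filter: both conditions are applied.
expected_keys = [k for k, a_bag, b_bag in c if len(b_bag) > 0 and len(a_bag) > 0]

# Assumed faulty filter: the COUNT(A)>0 condition is lost.
faulty_keys = [k for k, a_bag, b_bag in c if len(b_bag) > 0]
```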



[jira] [Updated] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason

2011-05-13 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1827:
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed to trunk and 0.9 branch.

> When passing a parameter to Pig, if the value contains $ it has to be escaped 
> for no apparent reason
> 
>
> Key: PIG-1827
> URL: https://issues.apache.org/jira/browse/PIG-1827
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Julien Le Dem
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-1827-1.patch, PIG-1827_2.patch, PIG-1827_3.patch
>
>




[jira] [Resolved] (PIG-1819) For implicit binding, Jython embedded Pig should skip any variable/value that contains $.

2011-05-13 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-1819.
---

Resolution: Fixed

This is fixed per PIG-1827.

> For implicit binding, Jython embedded Pig should skip any variable/value that 
> contains $. 
> --
>
> Key: PIG-1819
> URL: https://issues.apache.org/jira/browse/PIG-1819
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-1819.patch, PIG-1819_1.patch, PIG-1819_2.patch
>
>
> We use Pig parameter substitution for the bindings, so a variable or value 
> that contains $ cannot be used.
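
A minimal Python sketch of the behavior proposed in the issue title, assuming the implicit bindings are built from a dictionary of in-scope variables. The function name `implicit_bindings` and the sample variables are hypothetical and not taken from the Pig code base.

```python
def implicit_bindings(variables):
    """Keep only name/value pairs that are safe for $-based parameter substitution."""
    bindings = {}
    for name, value in variables.items():
        text = str(value)
        if "$" in name or "$" in text:
            continue  # skip: would be misread by parameter substitution
        bindings[name] = text
    return bindings


# Hypothetical variables visible to the embedded script.
scope = {"input": "studenttab10k", "pattern": "^a.*$", "limit": 10}
bound = implicit_bindings(scope)
```

Here `pattern` is left out of the bindings because its value contains `$`, while the other two variables are bound as strings.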



[jira] [Resolved] (PIG-2056) Jython error messages should show script name

2011-05-12 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2056.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Unit tests pass. Patch committed to trunk and 0.9 branch.

> Jython error messages should show script name
> -
>
> Key: PIG-2056
> URL: https://issues.apache.org/jira/browse/PIG-2056
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
>Priority: Minor
> Fix For: 0.9.0
>
> Attachments: PIG-2056.patch
>
>
> Instead of messages like
> {code}
> Traceback (most recent call last):
>   File "", line 12, in 
> {code}
> It should display the script file name:
> {code}
> Traceback (most recent call last):
>   File "test.py", line 12, in 
> {code}



[jira] [Commented] (PIG-2056) Jython error messages should show script name

2011-05-11 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031911#comment-13031911
 ] 

Richard Ding commented on PIG-2056:
---

Result of test-patch:

{code}
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
{code}

> Jython error messages should show script name
> -
>
> Key: PIG-2056
> URL: https://issues.apache.org/jira/browse/PIG-2056
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
>Priority: Minor
> Fix For: 0.9.0
>
> Attachments: PIG-2056.patch
>
>
> Instead of messages like
> {code}
> Traceback (most recent call last):
>   File "", line 12, in 
> {code}
> It should display the script file name:
> {code}
> Traceback (most recent call last):
>   File "test.py", line 12, in 
> {code}



[jira] [Resolved] (PIG-2058) Macro missing returns clause doesn't give a good error message

2011-05-11 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2058.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to trunk and 0.9 branch.

> Macro missing returns clause doesn't give a good error message
> --
>
> Key: PIG-2058
> URL: https://issues.apache.org/jira/browse/PIG-2058
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Xuefu Zhang
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2058.patch
>
>
> For the following query:
> define test( out1,out2 ){
>A  = load 'x' as (u:int, v:int);
>$B  = filter A by u < 3 and v <  20;
> }
> Pig gives the following error message: Syntax error,unexpected symbol at or 
> near '{'
> Previously, it gave: mismatched input '{' expecting RETURNS
> The previous message is more meaningful.



[jira] [Updated] (PIG-2058) Macro missing returns clause doesn't give a good error message

2011-05-10 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2058:
--

Attachment: PIG-2058.patch

Thanks Xuefu. Attaching a patch with the fix.

> Macro missing returns clause doesn't give a good error message
> --
>
> Key: PIG-2058
> URL: https://issues.apache.org/jira/browse/PIG-2058
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Xuefu Zhang
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2058.patch
>
>
> For the following query:
> define test( out1,out2 ){
>A  = load 'x' as (u:int, v:int);
>$B  = filter A by u < 3 and v <  20;
> }
> Pig gives the following error message: Syntax error,unexpected symbol at or 
> near '{'
> Previously, it gave: mismatched input '{' expecting RETURNS
> The previous message is more meaningful.



[jira] [Resolved] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro

2011-05-10 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2035.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to trunk and 0.9 branch.

> Macro expansion doesn't handle multiple expansions of same macro inside 
> another macro
> -
>
> Key: PIG-2035
> URL: https://issues.apache.org/jira/browse/PIG-2035
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2035_1.patch
>
>
> Here is the use case:
> {code}
> define test ( in, out, x ) returns c { 
> a = load '$in' as (name, age, gpa);
> b = group a by gpa;
> $c = foreach b generate group, COUNT(a.$x);
> store $c into '$out';
> };
> define test2( in, out ) returns x { 
> $x = test( '$in', '$out', 'name' );
> $x = test( '$in', '$out.1', 'age' );
> $x = test( '$in', '$out.2', 'gpa' );
> };
> x = test2('studenttab10k', 'myoutput');
> {code}



[jira] [Updated] (PIG-2056) Jython error messages should show script name

2011-05-10 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2056:
--

Attachment: PIG-2056.patch

> Jython error messages should show script name
> -
>
> Key: PIG-2056
> URL: https://issues.apache.org/jira/browse/PIG-2056
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
>Priority: Minor
> Fix For: 0.9.0
>
> Attachments: PIG-2056.patch
>
>
> Instead of messages like
> {code}
> Traceback (most recent call last):
>   File "", line 12, in 
> {code}
> It should display the script file name:
> {code}
> Traceback (most recent call last):
>   File "test.py", line 12, in 
> {code}



[jira] [Created] (PIG-2056) Jython error messages should show script name

2011-05-10 Thread Richard Ding (JIRA)
Jython error messages should show script name
-

 Key: PIG-2056
 URL: https://issues.apache.org/jira/browse/PIG-2056
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Fix For: 0.9.0


Instead of messages like

{code}
Traceback (most recent call last):
  File "", line 12, in 
{code}

It should display the script file name:

{code}
Traceback (most recent call last):
  File "test.py", line 12, in 
{code}
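
The effect can be illustrated with plain Python, where the file name shown in a traceback is whatever name the code was compiled under. This is only a sketch of the general mechanism using the built-in `compile`; it is not the Jython call used by Pig, and the placeholder name `<string>` is an assumption for the unnamed case.

```python
import traceback


def run_script(source, script_name):
    """Run source; return the formatted traceback if it raises, else None."""
    try:
        code = compile(source, script_name, "exec")
        exec(code, {})
    except Exception:
        return traceback.format_exc()
    return None


source = "x = 1\ny = x / 0\n"  # fails on line 2

anonymous = run_script(source, "<string>")  # no script name available
named = run_script(source, "test.py")       # script name passed through
```

With the script name passed through, the traceback points at `test.py` and the failing line, which is what makes the message useful to the user.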





[jira] [Commented] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason

2011-05-09 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030972#comment-13030972
 ] 

Richard Ding commented on PIG-1827:
---

The new patch adds a unit test case, as suggested.

> When passing a parameter to Pig, if the value contains $ it has to be escaped 
> for no apparent reason
> 
>
> Key: PIG-1827
> URL: https://issues.apache.org/jira/browse/PIG-1827
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Julien Le Dem
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-1827-1.patch, PIG-1827_2.patch, PIG-1827_3.patch
>
>




[jira] [Updated] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason

2011-05-09 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1827:
--

Attachment: PIG-1827_3.patch

We should limit this JIRA to fixing the issue in embedded Pig (i.e., working 
around the general parameter substitution) and revisit the parameter 
substitution parser and related code in a separate JIRA.

> When passing a parameter to Pig, if the value contains $ it has to be escaped 
> for no apparent reason
> 
>
> Key: PIG-1827
> URL: https://issues.apache.org/jira/browse/PIG-1827
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Julien Le Dem
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-1827-1.patch, PIG-1827_2.patch, PIG-1827_3.patch
>
>




[jira] [Updated] (PIG-2012) Comments at the begining of the file throws off line numbers in errors

2011-05-09 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2012:
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch 2 committed to trunk and 0.9 branch.

> Comments at the begining of the file throws off line numbers in errors
> --
>
> Key: PIG-2012
> URL: https://issues.apache.org/jira/browse/PIG-2012
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Alan Gates
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2012_1.patch, PIG-2012_2.patch, macro.pig
>
>
> The preprocessor does not appear to be handling leading comments properly 
> when calculating line numbers for error messages.  In the attached script, 
> the error is reported to be on line 7.  It is actually on line 10.
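
One way such an offset can arise is a preprocessing step that deletes comment lines before parsing. The Python sketch below shows the usual remedy under that assumption: blank the comment lines instead of removing them, so line numbers stay aligned with the original script. It handles only `--` comment lines, and both function names are hypothetical.

```python
def strip_comments_keep_lines(script):
    """Blank out '--' comment lines so line N of the output is line N of the input."""
    out = []
    for line in script.split("\n"):
        if line.lstrip().startswith("--"):
            out.append("")  # keep the line slot, drop the text
        else:
            out.append(line)
    return "\n".join(out)


def first_line_containing(text, needle):
    """Return the 1-based number of the first line containing needle, or None."""
    for number, line in enumerate(text.split("\n"), start=1):
        if needle in line:
            return number
    return None


script = (
    "-- header comment\n"
    "-- another comment\n"
    "-- and a third\n"
    "A = load 'x';\n"
    "B = fliter A by $0 > 0;\n"  # deliberate typo on line 5
)
processed = strip_comments_keep_lines(script)

# A naive version that drops comment lines shifts every later line number.
naive = "\n".join(l for l in script.split("\n") if not l.lstrip().startswith("--"))
```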



[jira] [Updated] (PIG-2012) Comments at the begining of the file throws off line numbers in errors

2011-05-09 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2012:
--

Attachment: PIG-2012_2.patch

Thanks Xuefu. The new patch addresses the review comments.

> Comments at the begining of the file throws off line numbers in errors
> --
>
> Key: PIG-2012
> URL: https://issues.apache.org/jira/browse/PIG-2012
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Alan Gates
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2012_1.patch, PIG-2012_2.patch, macro.pig
>
>
> The preprocessor does not appear to be handling leading comments properly 
> when calculating line numbers for error messages.  In the attached script, 
> the error is reported to be on line 7.  It is actually on line 10.



[jira] [Resolved] (PIG-2033) Pig returns sucess for the failed Pig script

2011-05-06 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2033.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Unit tests pass on 0.8 branch. Patch committed to 0.8 branch, 0.9 branch and 
trunk.

> Pig returns sucess for the failed Pig script
> 
>
> Key: PIG-2033
> URL: https://issues.apache.org/jira/browse/PIG-2033
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.8.1, 0.9.0
>
> Attachments: PIG-2033.patch
>
>
> Pig returns success when a Pig script fails but the count of failed MR jobs 
> is zero. 



[jira] [Resolved] (PIG-2049) Pig should display TokenMgrError message consistently across all parsers

2011-05-06 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2049.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to trunk and 0.9 branch.

> Pig should display TokenMgrError message consistently across all parsers
> 
>
> Key: PIG-2049
> URL: https://issues.apache.org/jira/browse/PIG-2049
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
>Priority: Minor
> Fix For: 0.9.0
>
> Attachments: PIG-2049.patch
>
>
> For example, for org.apache.pig.tools.pigscript.parser.TokenMgrError, Pig logs
> {code}
> ERROR 1000: Error during parsing. Lexical error at line 5, column 0.
> {code}
> But for org.apache.pig.tools.parameters.TokenMgrError, Pig logs
> {code}
> ERROR 2998: Unhandled internal error. Lexical error at line 10, column 0.
> {code}
> Both should have error code 1000.



[jira] [Created] (PIG-2050) Pig can't reference auto-generated schema name for TOTUPLE

2011-05-06 Thread Richard Ding (JIRA)
Pig can't reference auto-generated schema name for TOTUPLE
--

 Key: PIG-2050
 URL: https://issues.apache.org/jira/browse/PIG-2050
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Priority: Minor


Here is the use case:

{code}
grunt> A = load 'data' as (a0, a1, a2); 
grunt> B = foreach A generate TOTUPLE(a0, a2);  
grunt> describe B
B: {org.apache.pig.builtin.totuple_a0_3: (a0: bytearray,a2: bytearray)}
grunt> C = foreach B generate org.apache.pig.builtin.totuple_a0_3;
2011-05-06 14:38:14,635 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1000: Error during parsing. Invalid alias: org in 
{org.apache.pig.builtin.totuple_a0_1: (a0: bytearray,a2: bytearray)}
{code}

The workaround is to specify a user-defined schema name:

{code}
grunt> A = load 'data' as (a0, a1, a2); 
 
grunt> B = foreach A generate TOTUPLE(a0, a2) as aa;  
grunt> describe B 
B: {aa: (a0: bytearray,a2: bytearray)}
grunt> C = foreach B generate aa; 
grunt> 
{code}



[jira] [Updated] (PIG-2049) Pig should display TokenMgrError message consistently across all parsers

2011-05-06 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2049:
--

Affects Version/s: (was: 0.8.0)
  Summary: Pig should display TokenMgrError message consistently 
across all parsers  (was: Pig should display TokenMgrError consistently across 
all parsers)

> Pig should display TokenMgrError message consistently across all parsers
> 
>
> Key: PIG-2049
> URL: https://issues.apache.org/jira/browse/PIG-2049
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
>Priority: Minor
> Fix For: 0.9.0
>
> Attachments: PIG-2049.patch
>
>
> For example, for org.apache.pig.tools.pigscript.parser.TokenMgrError, Pig logs
> {code}
> ERROR 1000: Error during parsing. Lexical error at line 5, column 0.
> {code}
> But for org.apache.pig.tools.parameters.TokenMgrError, Pig logs
> {code}
> ERROR 2998: Unhandled internal error. Lexical error at line 10, column 0.
> {code}
> Both should have error code 1000.



[jira] [Updated] (PIG-2049) Pig should display TokenMgrError consistently across all parsers

2011-05-06 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2049:
--

Attachment: PIG-2049.patch

> Pig should display TokenMgrError consistently across all parsers
> 
>
> Key: PIG-2049
> URL: https://issues.apache.org/jira/browse/PIG-2049
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
>Priority: Minor
> Fix For: 0.9.0
>
> Attachments: PIG-2049.patch
>
>
> For example, for org.apache.pig.tools.pigscript.parser.TokenMgrError, Pig logs
> {code}
> ERROR 1000: Error during parsing. Lexical error at line 5, column 0.
> {code}
> But for org.apache.pig.tools.parameters.TokenMgrError, Pig logs
> {code}
> ERROR 2998: Unhandled internal error. Lexical error at line 10, column 0.
> {code}
> Both should have error code 1000.



[jira] [Created] (PIG-2049) Pig should display TokenMgrError consistently across all parsers

2011-05-06 Thread Richard Ding (JIRA)
Pig should display TokenMgrError consistently across all parsers


 Key: PIG-2049
 URL: https://issues.apache.org/jira/browse/PIG-2049
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Fix For: 0.9.0


For example, for org.apache.pig.tools.pigscript.parser.TokenMgrError, Pig logs

{code}
ERROR 1000: Error during parsing. Lexical error at line 5, column 0.
{code}

But for org.apache.pig.tools.parameters.TokenMgrError, Pig logs

{code}
ERROR 2998: Unhandled internal error. Lexical error at line 10, column 0.
{code}

Both should have error code 1000.
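
The intended behavior can be sketched in Python as a single formatting routine that treats lexical errors from both parsers the same way. The two exception classes below are hypothetical stand-ins for the two TokenMgrError classes, and `format_error` is not a function from the Pig code base.

```python
LEXICAL_ERROR_CODE = 1000  # "Error during parsing", per the log lines above


class PigScriptTokenMgrError(Exception):
    """Stand-in for org.apache.pig.tools.pigscript.parser.TokenMgrError."""


class ParameterTokenMgrError(Exception):
    """Stand-in for org.apache.pig.tools.parameters.TokenMgrError."""


def format_error(exc):
    """Report lexical errors from either parser with the same error code."""
    if isinstance(exc, (PigScriptTokenMgrError, ParameterTokenMgrError)):
        return "ERROR %d: Error during parsing. %s" % (LEXICAL_ERROR_CODE, exc)
    return "ERROR 2998: Unhandled internal error. %s" % exc


message = format_error(ParameterTokenMgrError("Lexical error at line 10, column 0."))
```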



[jira] [Commented] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro

2011-05-06 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030019#comment-13030019
 ] 

Richard Ding commented on PIG-2035:
---

Unit tests pass.

> Macro expansion doesn't handle multiple expansions of same macro inside 
> another macro
> -
>
> Key: PIG-2035
> URL: https://issues.apache.org/jira/browse/PIG-2035
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2035_1.patch
>
>
> Here is the use case:
> {code}
> define test ( in, out, x ) returns c { 
> a = load '$in' as (name, age, gpa);
> b = group a by gpa;
> $c = foreach b generate group, COUNT(a.$x);
> store $c into '$out';
> };
> define test2( in, out ) returns x { 
> $x = test( '$in', '$out', 'name' );
> $x = test( '$in', '$out.1', 'age' );
> $x = test( '$in', '$out.2', 'gpa' );
> };
> x = test2('studenttab10k', 'myoutput');
> {code}



[jira] [Updated] (PIG-2033) Pig returns sucess for the failed Pig script

2011-05-05 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2033:
--

Attachment: PIG-2033.patch

This patch makes sure that Pig returns success if and only if the number of 
successful jobs equals the number of compiled jobs.

This patch doesn't include a unit test since it's difficult to simulate the 
failure case.
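
The difference between the two checks can be sketched in Python. The "old" check is an assumption based on the issue description (success whenever no MR job is counted as failed); the function names and the job counts are hypothetical.

```python
def old_check(num_failed_jobs):
    """Assumed previous behavior: success whenever no job is counted as failed."""
    return num_failed_jobs == 0


def new_check(num_compiled_jobs, num_succeeded_jobs):
    """Success if and only if every compiled job succeeded."""
    return num_succeeded_jobs == num_compiled_jobs


# Hypothetical run: 3 jobs compiled, 1 succeeded, and the script stopped
# before the remaining jobs were launched, so none is counted as failed.
compiled, succeeded, failed = 3, 1, 0

old_result = old_check(failed)
new_result = new_check(compiled, succeeded)
```

In this scenario the old check reports success even though the script did not complete, while the new check reports failure.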

> Pig returns sucess for the failed Pig script
> 
>
> Key: PIG-2033
> URL: https://issues.apache.org/jira/browse/PIG-2033
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.8.1, 0.9.0
>
> Attachments: PIG-2033.patch
>
>
> Pig returns success when a Pig script fails but the count of failed MR jobs 
> is zero. 



[jira] [Commented] (PIG-2041) Minicluster should make each run independent

2011-05-05 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029488#comment-13029488
 ] 

Richard Ding commented on PIG-2041:
---

+1

> Minicluster should make each run independent
> 
>
> Key: PIG-2041
> URL: https://issues.apache.org/jira/browse/PIG-2041
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.0
>
> Attachments: PIG-2041-1.patch
>
>
> Minicluster will reuse ~/pigtest/conf/hadoop-site.xml. If something is wrong in 
> hadoop-site.xml, the next test will also be affected. This leads to some 
> mysterious test failures. 



[jira] [Resolved] (PIG-1999) Macro alias masker should consider schema context

2011-05-05 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-1999.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Unit tests pass. Patch committed to trunk and 0.9 branch.

> Macro alias masker should consider schema context 
> --
>
> Key: PIG-1999
> URL: https://issues.apache.org/jira/browse/PIG-1999
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-1999_1.patch, PIG-1999_2.patch
>
>
> Macro alias masker doesn't consider the current schema context. This results 
> in errors when deciding which alias to mask. Here is an example:
> {code}
> define toBytearray(in, intermediate) returns e { 
>a = load '$in' as (name:chararray, age:long, gpa: float);
>b = group a by  name;
>c = foreach b generate a, (1,2,3);
>store c into '$intermediate' using BinStorage();
>d = load '$intermediate' using BinStorage() as (b:bag{t:tuple(x,y,z)}, 
> t2:tuple(a,b,c));
>$e = foreach d generate COUNT(b), t2.a, t2.b, t2.c;
> };
>  
> f = toBytearray ('data', 'output1');
> {code} 
> Now the alias masker mistakes b in COUNT(b) for an alias instead of the field b 
> in the current schema.
> The workaround is to avoid using aliases as names in the schema definition. 



[jira] [Commented] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro

2011-05-04 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029045#comment-13029045
 ] 

Richard Ding commented on PIG-2035:
---

test-patch result:

{code}
 [exec] -1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] -1 release audit.  The applied patch generated 585 release 
audit warnings (more than the trunk's current 584 warnings).
{code}



> Macro expansion doesn't handle multiple expansions of same macro inside 
> another macro
> -
>
> Key: PIG-2035
> URL: https://issues.apache.org/jira/browse/PIG-2035
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2035_1.patch
>
>
> Here is the use case:
> {code}
> define test ( in, out, x ) returns c { 
> a = load '$in' as (name, age, gpa);
> b = group a by gpa;
> $c = foreach b generate group, COUNT(a.$x);
> store $c into '$out';
> };
> define test2( in, out ) returns x { 
> $x = test( '$in', '$out', 'name' );
> $x = test( '$in', '$out.1', 'age' );
> $x = test( '$in', '$out.2', 'gpa' );
> };
> x = test2('studenttab10k', 'myoutput');
> {code}



[jira] [Updated] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro

2011-05-04 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2035:
--

Attachment: PIG-2035_1.patch

> Macro expansion doesn't handle multiple expansions of same macro inside 
> another macro
> -
>
> Key: PIG-2035
> URL: https://issues.apache.org/jira/browse/PIG-2035
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2035_1.patch
>
>
> Here is the use case:
> {code}
> define test ( in, out, x ) returns c { 
> a = load '$in' as (name, age, gpa);
> b = group a by gpa;
> $c = foreach b generate group, COUNT(a.$x);
> store $c into '$out';
> };
> define test2( in, out ) returns x { 
> $x = test( '$in', '$out', 'name' );
> $x = test( '$in', '$out.1', 'age' );
> $x = test( '$in', '$out.2', 'gpa' );
> };
> x = test2('studenttab10k', 'myoutput');
> {code}



[jira] [Commented] (PIG-2012) Comments at the begining of the file throws off line numbers in errors

2011-05-04 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028864#comment-13028864
 ] 

Richard Ding commented on PIG-2012:
---

Unit tests pass.

> Comments at the begining of the file throws off line numbers in errors
> --
>
> Key: PIG-2012
> URL: https://issues.apache.org/jira/browse/PIG-2012
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Alan Gates
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2012_1.patch, macro.pig
>
>
> The preprocessor does not appear to be handling leading comments properly 
> when calculating line numbers for error messages.  In the attached script, 
> the error is reported to be on line 7.  It is actually on line 10.
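
One way an offset like this can arise, shown purely as an illustration: if a preprocessing step drops comment lines before parsing, every later line moves up, and an error position computed on the stripped text no longer matches the original script. The sketch below is not Pig's preprocessor. The names `LineNumberDemo`, `stripDropping`, `stripPreserving`, and `lineOf` are hypothetical, and only whole-line `--` comments are handled.

```java
class LineNumberDemo {
    // Drops whole-line "--" comments entirely, so later lines shift upward.
    static String stripDropping(String script) {
        StringBuilder sb = new StringBuilder();
        for (String line : script.split("\n", -1)) {
            if (!line.trim().startsWith("--")) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Replaces comment lines with empty lines, so line numbering is preserved.
    static String stripPreserving(String script) {
        StringBuilder sb = new StringBuilder();
        for (String line : script.split("\n", -1)) {
            sb.append(line.trim().startsWith("--") ? "" : line).append('\n');
        }
        return sb.toString();
    }

    // Returns the 1-based number of the first line containing the token, or -1.
    static int lineOf(String text, String token) {
        String[] lines = text.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].contains(token)) {
                return i + 1;
            }
        }
        return -1;
    }
}
```

With three leading comment lines, `stripDropping` makes a statement on line 5 appear on line 2, while `stripPreserving` keeps it on line 5.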



[jira] [Created] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro

2011-05-04 Thread Richard Ding (JIRA)
Macro expansion doesn't handle multiple expansions of same macro inside another 
macro
-

 Key: PIG-2035
 URL: https://issues.apache.org/jira/browse/PIG-2035
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0


Here is the use case:

{code}
define test ( in, out, x ) returns c {
    a = load '$in' as (name, age, gpa);
    b = group a by gpa;
    $c = foreach b generate group, COUNT(a.$x);
    store $c into '$out';
};

define test2( in, out ) returns x {
    $x = test( '$in', '$out', 'name' );
    $x = test( '$in', '$out.1', 'age' );
    $x = test( '$in', '$out.2', 'gpa' );
};

x = test2('studenttab10k', 'myoutput');
{code}



[jira] [Resolved] (PIG-2028) Speed up multiquery unit tests

2011-05-03 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2028.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to trunk and 0.9 branch.

> Speed up multiquery unit tests 
> ---
>
> Key: PIG-2028
> URL: https://issues.apache.org/jira/browse/PIG-2028
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2028.patch, PIG-2028_1.patch
>
>
> Switch TestMultiQueryBasic and TestMultiQuery to use LOCAL mode. The results 
> on my laptop:
> Using Mini Cluster:
> TestMultiQueryBasic: 17 min 17 sec
> TestMultiQuery:  23 min 2 sec
> Using LOCAL mode:
> TestMultiQueryBasic: 4 min 17 sec
> TestMultiQuery:  5 min 51 sec
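
As a quick check of the numbers above, the speedup can be computed from the quoted timings. The helper below is only a worked calculation; `SpeedupDemo` and its method names are made up for this example.

```java
class SpeedupDemo {
    // Converts a "min sec" timing into seconds.
    static int seconds(int min, int sec) {
        return min * 60 + sec;
    }

    // Ratio of the Mini Cluster time to the LOCAL mode time.
    static double speedup(int beforeSec, int afterSec) {
        return (double) beforeSec / afterSec;
    }

    // TestMultiQueryBasic: 17 min 17 sec versus 4 min 17 sec.
    static double basicSpeedup() {
        return speedup(seconds(17, 17), seconds(4, 17));
    }

    // TestMultiQuery: 23 min 2 sec versus 5 min 51 sec.
    static double multiQuerySpeedup() {
        return speedup(seconds(23, 2), seconds(5, 51));
    }
}
```

Both ratios come out close to 4, so each test class runs roughly four times faster in LOCAL mode on the reported machine.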



[jira] [Created] (PIG-2033) Pig returns success for the failed Pig script

2011-05-03 Thread Richard Ding (JIRA)
Pig returns success for the failed Pig script


 Key: PIG-2033
 URL: https://issues.apache.org/jira/browse/PIG-2033
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.1, 0.9.0


Pig returns success when a Pig script fails but the count of failed MR jobs is 
zero. 
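
The failure mode can be illustrated with a small sketch. It assumes, hypothetically, that the return code is derived only from the number of failed MR jobs. The names `ReturnCodeDemo`, `fromJobCountsOnly`, and `withScriptStatus`, and the numeric codes, are illustrative and not taken from Pig's source.

```java
class ReturnCodeDemo {
    static final int SUCCESS = 0;
    static final int FAILURE = 1; // illustrative value, not Pig's actual code

    // Looks only at the failed-job count: a script that fails while the
    // count is still zero is reported as a success.
    static int fromJobCountsOnly(int failedJobs) {
        return failedJobs > 0 ? FAILURE : SUCCESS;
    }

    // Also consults an explicit script-level failure flag (hypothetical).
    static int withScriptStatus(boolean scriptFailed, int failedJobs) {
        return (scriptFailed || failedJobs > 0) ? FAILURE : SUCCESS;
    }
}
```

Under this assumption, a script that fails with zero failed jobs gets `SUCCESS` from the first method and `FAILURE` from the second.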







[jira] [Commented] (PIG-2028) Speed up multiquery unit tests

2011-05-03 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028485#comment-13028485
 ] 

Richard Ding commented on PIG-2028:
---

Simplified the test cases, using Util.createLocalInputFile whenever possible.

> Speed up multiquery unit tests 
> ---
>
> Key: PIG-2028
> URL: https://issues.apache.org/jira/browse/PIG-2028
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2028.patch, PIG-2028_1.patch
>
>
> Switch TestMultiQueryBasic and TestMultiQuery to use LOCAL mode. The results 
> on my laptop:
> Using Mini Cluster:
> TestMultiQueryBasic: 17 min 17 sec
> TestMultiQuery:  23 min 2 sec
> Using LOCAL mode:
> TestMultiQueryBasic: 4 min 17 sec
> TestMultiQuery:  5 min 51 sec



[jira] [Updated] (PIG-2028) Speed up multiquery unit tests

2011-05-03 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2028:
--

Attachment: PIG-2028_1.patch

> Speed up multiquery unit tests 
> ---
>
> Key: PIG-2028
> URL: https://issues.apache.org/jira/browse/PIG-2028
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.0
>
> Attachments: PIG-2028.patch, PIG-2028_1.patch
>
>
> Switch TestMultiQueryBasic and TestMultiQuery to use LOCAL mode. The results 
> on my laptop:
> Using Mini Cluster:
> TestMultiQueryBasic: 17 min 17 sec
> TestMultiQuery:  23 min 2 sec
> Using LOCAL mode:
> TestMultiQueryBasic: 4 min 17 sec
> TestMultiQuery:  5 min 51 sec



[jira] [Commented] (PIG-1821) UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)

2011-05-03 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028438#comment-13028438
 ] 

Richard Ding commented on PIG-1821:
---

+1

> UDFContext.getUDFProperties does not handle collisions in hashcode of udf 
> classname (+ arg hashcodes)
> -
>
> Key: PIG-1821
> URL: https://issues.apache.org/jira/browse/PIG-1821
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.9.0
>
> Attachments: PIG-1821.1.patch, PIG-1821.2.patch
>
>
> In the code below, if generateKey() returns the same value for two UDFs, the 
> UDFs would end up sharing the Properties object. 
> {code}
> private HashMap<Integer, Properties> udfConfs = new HashMap<Integer, Properties>();
> public Properties getUDFProperties(Class c) {
>     Integer k = generateKey(c);
>     Properties p = udfConfs.get(k);
>     if (p == null) {
>         p = new Properties();
>         udfConfs.put(k, p);
>     }
>     return p;
> }
> private int generateKey(Class c) {
>     return c.getName().hashCode();
> }
> public Properties getUDFProperties(Class c, String[] args) {
>     Integer k = generateKey(c, args);
>     Properties p = udfConfs.get(k);
>     if (p == null) {
>         p = new Properties();
>         udfConfs.put(k, p);
>     }
>     return p;
> }
> private int generateKey(Class c, String[] args) {
>     int hc = c.getName().hashCode();
>     for (int i = 0; i < args.length; i++) {
>         hc <<= 1;
>         hc ^= args[i].hashCode();
>     }
>     return hc;
> }
> {code}
> To prevent this, a new class (say X) that can hold the classname and args 
> should be created, and instead of HashMap<Integer, Properties>, 
> HashMap<X, Properties> should be used. Then HashMap will deal with the 
> collisions. 
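
The proposed fix can be sketched as follows. This is a minimal illustration, not the committed patch: the key class `UdfKey` (the "X" above) and the wrapper class `UdfConfs` are hypothetical names, and only the shape of `getUDFProperties` follows the quoted code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

class UdfConfs {
    // Hypothetical key class ("X" in the description): holds the UDF class
    // name and its constructor args, and defines equality over both.
    static final class UdfKey {
        private final String className;
        private final String[] args;

        UdfKey(Class<?> c, String[] args) {
            this.className = c.getName();
            this.args = (args == null) ? new String[0] : args.clone();
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof UdfKey)) {
                return false;
            }
            UdfKey other = (UdfKey) o;
            return className.equals(other.className)
                    && Arrays.equals(args, other.args);
        }

        @Override
        public int hashCode() {
            return 31 * className.hashCode() + Arrays.hashCode(args);
        }
    }

    private final Map<UdfKey, Properties> udfConfs =
            new HashMap<UdfKey, Properties>();

    public Properties getUDFProperties(Class<?> c, String[] args) {
        UdfKey k = new UdfKey(c, args);
        Properties p = udfConfs.get(k);
        if (p == null) {
            p = new Properties();
            udfConfs.put(k, p);
        }
        return p;
    }

    public Properties getUDFProperties(Class<?> c) {
        return getUDFProperties(c, null);
    }
}
```

Two keys may still produce the same hash code, but HashMap then falls back on `equals`, so distinct UDFs get distinct Properties objects. For example, "Aa" and "BB" have the same `String.hashCode()`, so the int-keyed version would hand two UDF instances of one class with those args the same Properties object, while the key-object version keeps them apart.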


