[jira] [Updated] (PIG-4173) Move to Spark 1.x

2014-10-03 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-4173:
--
Attachment: PIG-4174_4.patch

This patch fixes the unit tests. 

The version of Spark used is 1.0.2. In Spark 1.1.0, CoGroupedRDD changed, 
which breaks the cogroup runtime. I'm looking into this.

 Move to Spark 1.x
 -

 Key: PIG-4173
 URL: https://issues.apache.org/jira/browse/PIG-4173
 Project: Pig
  Issue Type: Sub-task
  Components: spark
Reporter: bc Wong
Assignee: Richard Ding
 Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, 
 PIG-4174_4.patch, TEST-org.apache.pig.spark.TestSpark.txt


 The Spark branch is using Spark 0.9: 
 https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably 
 switch to Spark 1.x asap, due to Spark interface changes since 1.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4173) Move to Spark 1.x

2014-10-03 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-4173:
--
Attachment: PIG-4174_5.patch

This patch fixes the cogroup issue for Spark 1.1.0. The Spark version is 
updated to 1.1.0.

 Move to Spark 1.x
 -

 Key: PIG-4173
 URL: https://issues.apache.org/jira/browse/PIG-4173
 Project: Pig
  Issue Type: Sub-task
  Components: spark
Reporter: bc Wong
Assignee: Richard Ding
 Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, 
 PIG-4174_4.patch, PIG-4174_5.patch, TEST-org.apache.pig.spark.TestSpark.txt


 The Spark branch is using Spark 0.9: 
 https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably 
 switch to Spark 1.x asap, due to Spark interface changes since 1.0.





[jira] [Updated] (PIG-4173) Move to Spark 1.x

2014-09-30 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-4173:
--
Attachment: PIG-4173_3.patch

Thanks for the review. The new patch incorporates the changes requested in the comments. 

 Move to Spark 1.x
 -

 Key: PIG-4173
 URL: https://issues.apache.org/jira/browse/PIG-4173
 Project: Pig
  Issue Type: Sub-task
  Components: spark
Reporter: bc Wong
Assignee: Richard Ding
 Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, 
 TEST-org.apache.pig.spark.TestSpark.txt


 The Spark branch is using Spark 0.9: 
 https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably 
 switch to Spark 1.x asap, due to Spark interface changes since 1.0.





[jira] [Commented] (PIG-4173) Move to Spark 1.x

2014-09-30 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153701#comment-14153701
 ] 

Richard Ding commented on PIG-4173:
---

Hi ~praveenr019, 

Since PIG-4186 hasn't been checked in, it seems to make more sense to first 
build with Spark 1.x and then fix PIG-4186. What do you think?

Thanks,
-Richard

 Move to Spark 1.x
 -

 Key: PIG-4173
 URL: https://issues.apache.org/jira/browse/PIG-4173
 Project: Pig
  Issue Type: Sub-task
  Components: spark
Reporter: bc Wong
Assignee: Richard Ding
 Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, 
 TEST-org.apache.pig.spark.TestSpark.txt


 The Spark branch is using Spark 0.9: 
 https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably 
 switch to Spark 1.x asap, due to Spark interface changes since 1.0.





[jira] [Commented] (PIG-4173) Move to Spark 1.x

2014-09-30 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153705#comment-14153705
 ] 

Richard Ding commented on PIG-4173:
---

Sorry, I meant PIG-4168.

 Move to Spark 1.x
 -

 Key: PIG-4173
 URL: https://issues.apache.org/jira/browse/PIG-4173
 Project: Pig
  Issue Type: Sub-task
  Components: spark
Reporter: bc Wong
Assignee: Richard Ding
 Attachments: PIG-4173.patch, PIG-4173_2.patch, PIG-4173_3.patch, 
 TEST-org.apache.pig.spark.TestSpark.txt


 The Spark branch is using Spark 0.9: 
 https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably 
 switch to Spark 1.x asap, due to Spark interface changes since 1.0.





[jira] [Assigned] (PIG-4173) Move to Spark 1.x

2014-09-26 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding reassigned PIG-4173:
-

Assignee: Richard Ding

 Move to Spark 1.x
 -

 Key: PIG-4173
 URL: https://issues.apache.org/jira/browse/PIG-4173
 Project: Pig
  Issue Type: Sub-task
  Components: spark
Reporter: bc Wong
Assignee: Richard Ding

 The Spark branch is using Spark 0.9: 
 https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably 
 switch to Spark 1.x asap, due to Spark interface changes since 1.0.





[jira] [Updated] (PIG-4173) Move to Spark 1.x

2014-09-26 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-4173:
--
Attachment: PIG-4173.patch

Attaching the initial patch to upgrade Spark to 1.1.0.

I made some local changes so that the patch now compiles with the latest Spark 
jar.

I have a question though: why don't we use JavaRDD throughout the code? Is this 
due to performance concerns?

 Move to Spark 1.x
 -

 Key: PIG-4173
 URL: https://issues.apache.org/jira/browse/PIG-4173
 Project: Pig
  Issue Type: Sub-task
  Components: spark
Reporter: bc Wong
Assignee: Richard Ding
 Attachments: PIG-4173.patch


 The Spark branch is using Spark 0.9: 
 https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably 
 switch to Spark 1.x asap, due to Spark interface changes since 1.0.





[jira] [Updated] (PIG-4173) Move to Spark 1.x

2014-09-26 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-4173:
--
Attachment: PIG-4173_2.patch

Adding javax.servlet dependency

 Move to Spark 1.x
 -

 Key: PIG-4173
 URL: https://issues.apache.org/jira/browse/PIG-4173
 Project: Pig
  Issue Type: Sub-task
  Components: spark
Reporter: bc Wong
Assignee: Richard Ding
 Attachments: PIG-4173.patch, PIG-4173_2.patch


 The Spark branch is using Spark 0.9: 
 https://github.com/apache/pig/blob/spark/ivy.xml#L438. We should probably 
 switch to Spark 1.x asap, due to Spark interface changes since 1.0.





[jira] [Commented] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key

2013-12-29 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858404#comment-13858404
 ] 

Richard Ding commented on PIG-3608:
---

Thanks [~cheolsoo].

 ClassCastException when looking up a value from AvroMapWrapper using a Utf8 
 key
 ---

 Key: PIG-3608
 URL: https://issues.apache.org/jira/browse/PIG-3608
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.12.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Fix For: 0.13.0

 Attachments: PIG-3608.patch, PIG-3608_2.patch


 One got the following exception:
 {code}
 java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with 
 java.lang.String 
 at 
 org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80)
 {code}
 This is related to the change by PIG-3420.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key

2013-12-27 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3608:
--

   Resolution: Fixed
Fix Version/s: 0.13.0
 Release Note: 
Committed to trunk.

 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

 ClassCastException when looking up a value from AvroMapWrapper using a Utf8 
 key
 ---

 Key: PIG-3608
 URL: https://issues.apache.org/jira/browse/PIG-3608
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.12.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Fix For: 0.13.0

 Attachments: PIG-3608.patch, PIG-3608_2.patch


 One got the following exception:
 {code}
 java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with 
 java.lang.String 
 at 
 org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80)
 {code}
 This is related to the change by PIG-3420.





[jira] [Commented] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper 

2013-12-23 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856041#comment-13856041
 ] 

Richard Ding commented on PIG-3609:
---

[~cheolsoo], checking the size is an optimization; this is also what 
DefaultAbstractBag implements. 

+1 on the patch.

 ClassCastException when calling compareTo method on AvroBagWrapper 
 ---

 Key: PIG-3609
 URL: https://issues.apache.org/jira/browse/PIG-3609
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.12.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Attachments: PIG-3609.patch, PIG-3609_2.patch, PIG-3609_3.patch


 One got the following exception when calling compareTo method on 
 AvroBagWrapper with an AvroBagWrapper object:
 {code}
 java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper 
 incompatible with java.util.Collection
 at org.apache.avro.generic.GenericData.compare(GenericData.java:786)
 at org.apache.avro.generic.GenericData.compare(GenericData.java:760)
 at 
 org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78)
 {code}
 Looking at the code, it compares objects with different types:
 {code}
 return GenericData.get().compare(theArray, o, theArray.getSchema());
 {code}





[jira] [Commented] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key

2013-12-23 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856047#comment-13856047
 ] 

Richard Ding commented on PIG-3608:
---

Thanks for reviewing the patch.

Right now I don't have a Pig script to demonstrate this use case. I ran into 
this problem while iterating over an instance of AvroMapWrapper and found that 
I couldn't look up a value in the map using a key just retrieved from the 
same map. I think this breaks the basic contract of a map implementation.

I think the check

{code}
if (isUtf8key && !(key instanceof Utf8))
{code}

is more general. But I'm ok if it is restricted to String.
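
The map-contract point above can be illustrated with a minimal, self-contained sketch. It does not use the real Avro or Pig classes: `Utf8Like` is a hypothetical stand-in for a UTF-8 key type, `MapWrapperSketch` only mimics the shape of the lookup discussed here, and reading `isUtf8Key` as "the inner map stores UTF-8 keys" is an assumption based on this thread rather than on the actual patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a UTF-8 key type; NOT the real Avro Utf8 class. */
final class Utf8Like {
    private final byte[] bytes;

    Utf8Like(String s) {
        this.bytes = s.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Utf8Like && Arrays.equals(bytes, ((Utf8Like) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }
}

/** Hypothetical wrapper that mimics the lookup discussed in this thread. */
final class MapWrapperSketch {
    private final Map<Object, Object> innerMap;
    // Assumption: true means the inner map is keyed by Utf8Like objects.
    private final boolean isUtf8Key;

    MapWrapperSketch(Map<Object, Object> innerMap, boolean isUtf8Key) {
        this.innerMap = innerMap;
        this.isUtf8Key = isUtf8Key;
    }

    /** Converts the key only if the map uses UTF-8 keys and the caller passed another type. */
    Object get(Object key) {
        if (isUtf8Key && !(key instanceof Utf8Like)) {
            // Assumption: non-null keys of any other type are converted via toString().
            return innerMap.get(new Utf8Like(key.toString()));
        }
        // The key already has the map's own key type, so no cast is needed.
        return innerMap.get(key);
    }

    public static void main(String[] args) {
        Map<Object, Object> inner = new HashMap<>();
        inner.put(new Utf8Like("k"), "v");
        MapWrapperSketch wrapper = new MapWrapperSketch(inner, true);
        // Keys obtained by iterating the map can be used for lookups again.
        for (Object key : inner.keySet()) {
            System.out.println(wrapper.get(key));
        }
        System.out.println(wrapper.get("k"));
    }
}
```

With the `instanceof` check, a key taken from the map and a plain String both resolve to the same entry, and no cast to String is attempted on a key that is already in the map's key type.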


 ClassCastException when looking up a value from AvroMapWrapper using a Utf8 
 key
 ---

 Key: PIG-3608
 URL: https://issues.apache.org/jira/browse/PIG-3608
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.12.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Attachments: PIG-3608.patch, PIG-3608_2.patch


 One got the following exception:
 {code}
 java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with 
 java.lang.String 
 at 
 org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80)
 {code}
 This is related to the change by PIG-3420.





[jira] [Updated] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key

2013-12-06 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3608:
--

Attachment: PIG-3608_2.patch

You are right. Updated the patch with a test case.

 ClassCastException when looking up a value from AvroMapWrapper using a Utf8 
 key
 ---

 Key: PIG-3608
 URL: https://issues.apache.org/jira/browse/PIG-3608
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.12.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Attachments: PIG-3608.patch, PIG-3608_2.patch


 One got the following exception:
 {code}
 java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with 
 java.lang.String 
 at 
 org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80)
 {code}
 This is related to the change by PIG-3420.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper 

2013-12-06 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3609:
--

Attachment: PIG-3609_2.patch

New patch with a test case.

 ClassCastException when calling compareTo method on AvroBagWrapper 
 ---

 Key: PIG-3609
 URL: https://issues.apache.org/jira/browse/PIG-3609
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.12.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Attachments: PIG-3609.patch, PIG-3609_2.patch


 One got the following exception when calling compareTo method on 
 AvroBagWrapper with an AvroBagWrapper object:
 {code}
 java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper 
 incompatible with java.util.Collection
 at org.apache.avro.generic.GenericData.compare(GenericData.java:786)
 at org.apache.avro.generic.GenericData.compare(GenericData.java:760)
 at 
 org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78)
 {code}
 Looking at the code, it compares objects with different types:
 {code}
 return GenericData.get().compare(theArray, o, theArray.getSchema());
 {code}





[jira] [Assigned] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key

2013-12-05 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding reassigned PIG-3608:
-

Assignee: Richard Ding

 ClassCastException when looking up a value from AvroMapWrapper using a Utf8 
 key
 ---

 Key: PIG-3608
 URL: https://issues.apache.org/jira/browse/PIG-3608
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.12.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Attachments: PIG-3608.patch


 One got the following exception:
 {code}
 java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with 
 java.lang.String 
 at 
 org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80)
 {code}
 This is related to the change by PIG-3420.





[jira] [Updated] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key

2013-12-05 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3608:
--

Attachment: PIG-3608.patch

Attaching a simple patch.

 ClassCastException when looking up a value from AvroMapWrapper using a Utf8 
 key
 ---

 Key: PIG-3608
 URL: https://issues.apache.org/jira/browse/PIG-3608
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.12.0
Reporter: Richard Ding
Priority: Minor
 Attachments: PIG-3608.patch


 One got the following exception:
 {code}
 java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with 
 java.lang.String 
 at 
 org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80)
 {code}
 This is related to the change by PIG-3420.





[jira] [Updated] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key

2013-12-05 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3608:
--

Status: Patch Available  (was: Open)

 ClassCastException when looking up a value from AvroMapWrapper using a Utf8 
 key
 ---

 Key: PIG-3608
 URL: https://issues.apache.org/jira/browse/PIG-3608
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.12.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Attachments: PIG-3608.patch


 One got the following exception:
 {code}
 java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with 
 java.lang.String 
 at 
 org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80)
 {code}
 This is related to the change by PIG-3420.





[jira] [Updated] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper 

2013-12-05 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3609:
--

Attachment: PIG-3609.patch

Attaching a patch.

 ClassCastException when calling compareTo method on AvroBagWrapper 
 ---

 Key: PIG-3609
 URL: https://issues.apache.org/jira/browse/PIG-3609
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.12.0
Reporter: Richard Ding
Priority: Minor
 Attachments: PIG-3609.patch


 One got the following exception when calling compareTo method on 
 AvroBagWrapper with an AvroBagWrapper object:
 {code}
 java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper 
 incompatible with java.util.Collection
 at org.apache.avro.generic.GenericData.compare(GenericData.java:786)
 at org.apache.avro.generic.GenericData.compare(GenericData.java:760)
 at 
 org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78)
 {code}
 Looking at the code, it compares objects with different types:
 {code}
 return GenericData.get().compare(theArray, o, theArray.getSchema());
 {code}





[jira] [Updated] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper 

2013-12-05 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3609:
--

Status: Patch Available  (was: Open)

 ClassCastException when calling compareTo method on AvroBagWrapper 
 ---

 Key: PIG-3609
 URL: https://issues.apache.org/jira/browse/PIG-3609
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.12.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Attachments: PIG-3609.patch


 One got the following exception when calling compareTo method on 
 AvroBagWrapper with an AvroBagWrapper object:
 {code}
 java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper 
 incompatible with java.util.Collection
 at org.apache.avro.generic.GenericData.compare(GenericData.java:786)
 at org.apache.avro.generic.GenericData.compare(GenericData.java:760)
 at 
 org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78)
 {code}
 Looking at the code, it compares objects with different types:
 {code}
 return GenericData.get().compare(theArray, o, theArray.getSchema());
 {code}





[jira] [Assigned] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper 

2013-12-05 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding reassigned PIG-3609:
-

Assignee: Richard Ding

 ClassCastException when calling compareTo method on AvroBagWrapper 
 ---

 Key: PIG-3609
 URL: https://issues.apache.org/jira/browse/PIG-3609
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.12.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Attachments: PIG-3609.patch


 One got the following exception when calling compareTo method on 
 AvroBagWrapper with an AvroBagWrapper object:
 {code}
 java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper 
 incompatible with java.util.Collection
 at org.apache.avro.generic.GenericData.compare(GenericData.java:786)
 at org.apache.avro.generic.GenericData.compare(GenericData.java:760)
 at 
 org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78)
 {code}
 Looking at the code, it compares objects with different types:
 {code}
 return GenericData.get().compare(theArray, o, theArray.getSchema());
 {code}





[jira] [Commented] (PIG-3608) ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key

2013-12-05 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840791#comment-13840791
 ] 

Richard Ding commented on PIG-3608:
---

Actually I have a question: should it be

{code}
if (isUtf8key) {
  v = innerMap.get(key);
} else {
  v = innerMap.get(new Utf8((String) key));
}
{code}

since isUtf8key == true means the key is already Utf8?

 ClassCastException when looking up a value from AvroMapWrapper using a Utf8 
 key
 ---

 Key: PIG-3608
 URL: https://issues.apache.org/jira/browse/PIG-3608
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.12.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Attachments: PIG-3608.patch


 One got the following exception:
 {code}
 java.lang.ClassCastException: org.apache.avro.util.Utf8 incompatible with 
 java.lang.String 
 at 
 org.apache.pig.impl.util.avro.AvroMapWrapper.get(AvroMapWrapper.java:80)
 {code}
 This is related to the change by PIG-3420.





[jira] [Created] (PIG-3609) ClassCastException when calling compareTo method on AvroBagWrapper 

2013-12-04 Thread Richard Ding (JIRA)
Richard Ding created PIG-3609:
-

 Summary: ClassCastException when calling compareTo method on 
AvroBagWrapper 
 Key: PIG-3609
 URL: https://issues.apache.org/jira/browse/PIG-3609
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.12.0
Reporter: Richard Ding
Priority: Minor


One got the following exception when calling compareTo method on AvroBagWrapper 
with an AvroBagWrapper object:

{code}
java.lang.ClassCastException: org.apache.pig.impl.util.avro.AvroBagWrapper 
incompatible with java.util.Collection
at org.apache.avro.generic.GenericData.compare(GenericData.java:786)
at org.apache.avro.generic.GenericData.compare(GenericData.java:760)
at 
org.apache.pig.impl.util.avro.AvroBagWrapper.compareTo(AvroBagWrapper.java:78)
{code}

Looking at the code, it compares objects with different types:

{code}
return GenericData.get().compare(theArray, o, theArray.getSchema());
{code}
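
The type mismatch described above can be sketched without Avro. In this hedged example, `BagWrapperSketch` is a hypothetical class wrapping a plain `List<Integer>`; it is not the actual AvroBagWrapper and not the attached patch. The idea it shows is to unwrap the other object before comparing, so that the inner collections are compared with each other rather than a collection with a wrapper. Ordering by size first is an optional shortcut assumed here for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical bag wrapper; stands in for a wrapper around an array-like value. */
final class BagWrapperSketch implements Comparable<Object> {
    private final List<Integer> theArray;

    BagWrapperSketch(List<Integer> theArray) {
        this.theArray = theArray;
    }

    @Override
    public int compareTo(Object o) {
        if (this == o) {
            return 0;
        }
        if (o instanceof BagWrapperSketch) {
            // Unwrap first, so that a list is compared with a list.
            return compareLists(theArray, ((BagWrapperSketch) o).theArray);
        }
        // Assumption: other types are rejected with an explicit message.
        throw new ClassCastException("Cannot compare a bag with " + o.getClass().getName());
    }

    private static int compareLists(List<Integer> a, List<Integer> b) {
        // Shortcut: bags of different sizes are ordered by size alone.
        if (a.size() != b.size()) {
            return Integer.compare(a.size(), b.size());
        }
        // Same size: the first differing element decides.
        for (int i = 0; i < a.size(); i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        BagWrapperSketch x = new BagWrapperSketch(Arrays.asList(1, 2));
        BagWrapperSketch y = new BagWrapperSketch(Arrays.asList(1, 3));
        System.out.println(x.compareTo(y));
    }
}
```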





[jira] [Commented] (PIG-3251) Bzip2TextInputFormat requires double the memory of maximum record size

2013-03-20 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607820#comment-13607820
 ] 

Richard Ding commented on PIG-3251:
---

With HADOOP-7823, can we remove Bzip2TextInputFormat and just use 
PigTextInputFormat?

 Bzip2TextInputFormat requires double the memory of maximum record size
 --

 Key: PIG-3251
 URL: https://issues.apache.org/jira/browse/PIG-3251
 Project: Pig
  Issue Type: Improvement
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Minor
 Attachments: pig-3251-trunk-v01.patch, pig-3251-trunk-v02.patch


 While looking at a user's OOM heap dump, I noticed that Pig's 
 Bzip2TextInputFormat consumes memory in both
 Bzip2TextInputFormat.buffer (ByteArrayOutputStream) 
 and the actual Text that is returned as the line.
 For example, with one record of 160 MB, the buffer was 268 MB and the 
 Text was 160 MB.
 We can probably eliminate one of them.
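
The double allocation described above can be reproduced in miniature with the standard library alone. This is a hedged sketch, not Pig code: `InspectableBuffer` and `DoubleBufferSketch` are hypothetical names, and the record is 160 bytes rather than 160 MB. It accumulates a record in a `ByteArrayOutputStream` and then copies it out, so two full-size arrays exist at once. The backing array is usually larger than the data because it grows in steps; on common JDKs the growth is by doubling, which would be consistent with a 268 MB buffer (2^28 bytes) for a 160 MB record, but that growth policy is an implementation detail and not guaranteed.

```java
import java.io.ByteArrayOutputStream;

/** Exposes the size of the protected backing array for inspection. */
final class InspectableBuffer extends ByteArrayOutputStream {
    int capacity() {
        return buf.length;
    }
}

final class DoubleBufferSketch {
    /** Accumulates a record one byte at a time, as a line reader might. */
    static InspectableBuffer accumulate(byte[] record) {
        InspectableBuffer buffer = new InspectableBuffer();
        for (byte b : record) {
            buffer.write(b);
        }
        return buffer;
    }

    public static void main(String[] args) {
        byte[] record = new byte[160];
        InspectableBuffer buffer = accumulate(record);
        // toByteArray() allocates a second array holding the same bytes.
        byte[] line = buffer.toByteArray();
        System.out.println("record bytes:    " + record.length);
        System.out.println("buffer capacity: " + buffer.capacity());
        System.out.println("copied line:     " + line.length);
    }
}
```

Eliminating one of the two copies would mean either handing the accumulated bytes to the consumer without copying or writing into the final destination directly; which of these fits the real class is not something this sketch can decide.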

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (PIG-3097) HiveColumnarLoader doesn't correctly load partitioned Hive table

2012-12-16 Thread Richard Ding (JIRA)
Richard Ding created PIG-3097:
-

 Summary: HiveColumnarLoader doesn't correctly load partitioned 
Hive table 
 Key: PIG-3097
 URL: https://issues.apache.org/jira/browse/PIG-3097
 Project: Pig
  Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding



Given a partitioned Hive table:

{code}
hive> describe mytable;
OK
f1 string  
f2 string  
f3 string  
partition_dt string
{code}

The following Pig script gives the correct schema:

{code}
grunt> A = load '/hive/warehouse/mytable' using 
org.apache.pig.piggybank.storage.HiveColumnarLoader('f1 string,f2 string,f3 
string');
grunt> describe A
A: {f1: chararray,f2: chararray,f3: chararray,partition_dt: chararray}
{code}

But, the command

{code}
grunt> dump A
{code}

only produces the first column of all records in the table (all four columns 
are expected).



[jira] [Updated] (PIG-3058) Upgrade junit to at least 4.8

2012-11-21 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3058:
--


These two failures were introduced by PIG-2924, which actually fixed a bug in 
the JobStats class. But the corresponding errors in TestPigRunner didn't get fixed.



 Upgrade junit to at least 4.8
 -

 Key: PIG-3058
 URL: https://issues.apache.org/jira/browse/PIG-3058
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.11
Reporter: fang fang chen
Assignee: fang fang chen

 Pig needs to upgrade its junit version to at least 4.8. Otherwise, one gets 
 the following warning:
   [javadoc] 
 org/apache/hadoop/hbase/mapreduce/TestWALPlayer.class(org/apache/hadoop/hbase/mapreduce:TestWALPlayer.class):
  warning: Cannot find annotation method 'value()' in type 
 'org.junit.experimental.categories.Category': class file for 
 org.junit.experimental.categories.Category not found



[jira] [Updated] (PIG-2405) svn tags/release-0.9.1: some unit test case failed with open JDK

2012-11-07 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2405:
--

Fix Version/s: 0.11

 svn tags/release-0.9.1: some unit test case failed with open JDK
 

 Key: PIG-2405
 URL: https://issues.apache.org/jira/browse/PIG-2405
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.1
 Environment: ant-1.8.2
 open jdk: 1.6
Reporter: fang fang chen
Assignee: fang fang chen
 Fix For: 0.11

 Attachments: PIG-2405-trunk.patch


 [junit] Test org.apache.pig.test.TestDataModel FAILED
 Testcase: testTupleToString took 0.004 sec
 FAILED
 toString expected:<...ad a little 
 lamb)},[[hello#world,goodbye#all]],42,50,3.14...> but was:<...ad a 
 little lamb)},[[goodbye#all,hello#world]],42,50,3.14...>
 junit.framework.ComparisonFailure: toString expected:<...ad a little 
 lamb)},[[hello#world,goodbye#all]],42,50,3.14...> but was:<...ad a 
 little lamb)},[[goodbye#all,hello#world]],42,50,3.14...>
  at 
 org.apache.pig.test.TestDataModel.testTupleToString(TestDataModel.java:269)
 [junit] Test org.apache.pig.test.TestHBaseStorage FAILED
 Tests run: 18, Failures: 0, Errors: 12, Time elapsed: 188.612 sec
 Testcase: testHeterogeneousScans took 0.018 sec
 Caused an ERROR
 java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml (Too many 
 open files)
 java.lang.RuntimeException: java.io.FileNotFoundException: 
 /root/pigtest/conf/hadoop-site.xml (Too many open files)
 at 
 org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1162)
 at 
 org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1035)
 at 
 org.apache.hadoop.conf.Configuration.getProps(Configuration.java:980)
 at org.apache.hadoop.conf.Configuration.get(Configuration.java:436)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:271)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:155)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:167)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:130)
 at 
 org.apache.pig.test.TestHBaseStorage.prepareTable(TestHBaseStorage.java:809)
 at 
 org.apache.pig.test.TestHBaseStorage.testHeterogeneousScans(TestHBaseStorage.java:741)
 Caused by: java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml 
 (Too many open files)
 at java.io.FileInputStream.<init>(FileInputStream.java:112)
 at java.io.FileInputStream.<init>(FileInputStream.java:72)
 at 
 sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
 at 
 sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
 at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown 
 Source)
 at 
 org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
 at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
 at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
 at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
 at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
 at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
 at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
 at 
 org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1079)
 Caused an ERROR
 Could not resolve the DNS name of hostname:39611
 java.lang.IllegalArgumentException: Could not resolve the DNS name of 
 hostname:39611
 at 
 org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105)
 at 
 org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:66)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:755)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:590)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:555)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:145)
 at 
 org.apache.pig.test.TestHBaseStorage.deleteAllRows(TestHBaseStorage.java:120)
 at 
 org.apache.pig.test.TestHBaseStorage.tearDown(TestHBaseStorage.java:112)
 [junit] Test org.apache.pig.test.TestMRCompiler FAILED
 Testcase: testSortUDF1 took 0.045 sec
 FAILED
 null 

[jira] [Updated] (PIG-3000) Optimize nested foreach

2012-10-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3000:
--

Description: 
In this Pig script:

{case}
A = load 'data' as (a:chararray);
B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 
1 : 0); }
{case}

 Optimize nested foreach
 ---

 Key: PIG-3000
 URL: https://issues.apache.org/jira/browse/PIG-3000
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.10.0
Reporter: Richard Ding

 In this Pig script:
 {case}
 A = load 'data' as (a:chararray);
 B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') 
 ? 1 : 0); }
 {case}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3000) Optimize nested foreach

2012-10-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-3000:
--

Description: 
In this Pig script:

{code}
A = load 'data' as (a:chararray);
B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 
1 : 0); }
{code}

The Eval function UPPER is called twice for each record.

This should be optimized so that the UPPER is called only once for each record

  was:
In this Pig script:

{case}
A = load 'data' as (a:chararray);
B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 
1 : 0); }
{case}


 Optimize nested foreach
 ---

 Key: PIG-3000
 URL: https://issues.apache.org/jira/browse/PIG-3000
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.10.0
Reporter: Richard Ding

 In this Pig script:
 {code}
 A = load 'data' as (a:chararray);
 B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') 
 ? 1 : 0); }
 {code}
 The Eval function UPPER is called twice for each record.
 This should be optimized so that the UPPER is called only once for each record
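
The requested behavior can be illustrated outside Pig. The Python sketch below is only an analogy, not Pig's optimizer or execution plan: `upper_udf` and `foreach_record` are hypothetical names, and the call counter exists only to show that the nested alias `c` is evaluated once per record and then reused by both bincond expressions.

```python
calls = 0  # number of times the stand-in UDF has been invoked


def upper_udf(value):
    """Hypothetical stand-in for the UPPER eval function; counts invocations."""
    global calls
    calls += 1
    return value.upper()


def foreach_record(a):
    """Mimics the nested foreach: compute c once, reuse it in both expressions."""
    c = upper_udf(a)
    return (1 if c == 'TEST' else 0, 1 if c == 'DEV' else 0)


rows = [foreach_record(a) for a in ['test', 'dev', 'prod']]
```

With three input records, the stand-in function runs three times rather than six, which is the effect the issue asks for.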



[jira] [Updated] (PIG-2637) Command-line option -e throws TokenMgrError exception

2012-09-13 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2637:
--

Status: Patch Available  (was: Open)

 Command-line option -e throws TokenMgrError exception
 -

 Key: PIG-2637
 URL: https://issues.apache.org/jira/browse/PIG-2637
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.9.2
Reporter: Richard Ding
Assignee: fang fang chen
Priority: Minor
 Attachments: PIG-2637.patch


 The command-line:
 {code}
 java -cp pig.jar org.apache.pig.Main -x local -e a = load '1.txt';
 {code}
 fails with exception:
 {code}
 ERROR 1000: Error during parsing. Lexical error at line 1, column 18.  
 Encountered: <EOF> after : 
 {code}



[jira] [Created] (PIG-2744) Handle Pig command line with XML special characters

2012-06-08 Thread Richard Ding (JIRA)
Richard Ding created PIG-2744:
-

 Summary: Handle Pig command line with XML special characters
 Key: PIG-2744
 URL: https://issues.apache.org/jira/browse/PIG-2744
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.10.0
Reporter: Richard Ding
Assignee: Richard Ding


Pig stores the Pig command-line string in the Hadoop job XML file. The job 
fails if the command-line string contains XML special characters. Pig should 
treat the command string like the Pig script and encode it before storing it. 
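
A minimal sketch of the idea, in Python rather than Pig's own Java code. The issue does not say which encoding is meant; Base64 is assumed here purely for illustration because its output alphabet contains no XML special characters. The function names `encode_command_line` and `decode_command_line` are hypothetical.

```python
import base64


def encode_command_line(command_line):
    """Encode a command-line string so it is safe to store as an XML property value.

    Assumption: Base64 is used; its output contains only A-Z, a-z, 0-9, '+', '/'
    and '=', none of which need escaping in XML.
    """
    return base64.b64encode(command_line.encode('utf-8')).decode('ascii')


def decode_command_line(encoded):
    """Recover the original command-line string from its encoded form."""
    return base64.b64decode(encoded.encode('ascii')).decode('utf-8')


command = "-e \"a = load '1.txt'; b = filter a by $0 < 5 & $1 > 2;\""
encoded = encode_command_line(command)
```

The encoded value can be written to the job XML as-is and decoded again when the command line needs to be displayed.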





[jira] [Updated] (PIG-2261) Restore support for parenthesis in Pig 0.9

2011-09-22 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2261:
--

Attachment: PIG-2261.patch

Attaching a patch that restores support for parentheses.

 Restore support for parenthesis in Pig 0.9
 --

 Key: PIG-2261
 URL: https://issues.apache.org/jira/browse/PIG-2261
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.1

 Attachments: PIG-2261.patch


 Pig 0.8 and earlier versions used to support syntax such as 
  
 {code}
 A =(load )
 {code}
 This was removed as unnecessary in 0.9 when the grammar was redone. It turns 
 out that at least one user relies on this for ease of code generation, so we 
 want to restore it.
 Just to clarify, Pig 0.9 continues to support composite statements such as
 {code}
 B = filter (load 'data' as (a, b)) by a  0;
 {code}
 It just removed support for redundant enclosing parentheses, so it doesn't support statements like
 {code}
 A = (load 'data' as (a, b));
 {code}
  





[jira] [Updated] (PIG-2261) Restore support for parenthesis in Pig 0.9

2011-09-22 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2261:
--

Status: Patch Available  (was: Open)

 Restore support for parenthesis in Pig 0.9
 --

 Key: PIG-2261
 URL: https://issues.apache.org/jira/browse/PIG-2261
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.1

 Attachments: PIG-2261.patch


 Pig 0.8 and earlier versions used to support syntax such as 
  
 {code}
 A =(load )
 {code}
 This was removed as unnecessary in 0.9 when the grammar was redone. It turns 
 out that at least one user relies on this for ease of code generation, so we 
 want to restore it.
 Just to clarify, Pig 0.9 continues to support composite statements such as
 {code}
 B = filter (load 'data' as (a, b)) by a  0;
 {code}
 It just removed support for redundant enclosing parentheses, so it doesn't support statements like
 {code}
 A = (load 'data' as (a, b));
 {code}
  





[jira] [Created] (PIG-2261) Restor support for parenthesis in Pig 0.9

2011-09-01 Thread Richard Ding (JIRA)
Restor support for parenthesis in Pig 0.9
-

 Key: PIG-2261
 URL: https://issues.apache.org/jira/browse/PIG-2261
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
 Fix For: 0.9.1


Pig 0.8 and earlier versions used to support syntax such as 
 
{code}
A =(load )
{code}

This was removed as unnecessary in 0.9 when the grammar was redone. It turns out 
that at least one user relies on this for ease of code generation, so we want to 
restore it.

Just to clarify, Pig 0.9 continues to support composite statements such as

{code}
B = filter (load 'data' as (a, b)) by a  0;
{code}

It just removed support for redundant enclosing parentheses, so it doesn't support statements like

{code}
A = (load 'data' as (a, b));
{code}
 





[jira] [Updated] (PIG-2261) Restore support for parenthesis in Pig 0.9

2011-09-01 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2261:
--

Summary: Restore support for parenthesis in Pig 0.9  (was: Restor support 
for parenthesis in Pig 0.9)

 Restore support for parenthesis in Pig 0.9
 --

 Key: PIG-2261
 URL: https://issues.apache.org/jira/browse/PIG-2261
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
 Fix For: 0.9.1


 Pig 0.8 and earlier versions used to support syntax such as 
  
 {code}
 A =(load )
 {code}
 This was removed as unnecessary in 0.9 when the grammar was redone. It turns 
 out that at least one user relies on this for ease of code generation, so we 
 want to restore it.
 Just to clarify, Pig 0.9 continues to support composite statements such as
 {code}
 B = filter (load 'data' as (a, b)) by a  0;
 {code}
 It just removed support for redundant enclosing parentheses, so it doesn't support statements like
 {code}
 A = (load 'data' as (a, b));
 {code}
  





[jira] [Commented] (PIG-2208) Restrict number of PIG generated Haddop counters

2011-08-23 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089330#comment-13089330
 ] 

Richard Ding commented on PIG-2208:
---

It logs only once per job in the front end, so that the user is informed that 
the multi-input (or multi-output) counters are disabled. In the back end the 
counters are simply disabled without logging. 

 Restrict number of PIG generated Haddop counters 
 -

 Key: PIG-2208
 URL: https://issues.apache.org/jira/browse/PIG-2208
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.1, 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.1

 Attachments: PIG-2208.patch


 Pig 0.8 implemented Hadoop counters to track the number of records read for 
 each input and the number of records written for each output (PIG-1389 and 
 PIG-1299). However, Hadoop imposes a limit on per-job counters 
 (MAPREDUCE-1943), and jobs fail if the counters exceed that limit.
 Therefore we need a way to cap the number of Pig-generated counters.
 Here are the two options:
 1. Add an integer property (e.g., pig.counter.limit) to the Pig properties 
 file (e.g., set to 20). If the number of inputs of a job exceeds this number, 
 the input counters are disabled. Similarly, if the number of outputs of a job 
 exceeds this number, the output counters are disabled.
 2. Add a boolean property (e.g., pig.disable.counters) to the Pig properties 
 file (default: false). If this property is set to true, the Pig-generated 
 counters are disabled.
   





[jira] [Updated] (PIG-2208) Restrict number of PIG generated Haddop counters

2011-08-11 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2208:
--

Attachment: PIG-2208.patch

This patch implements option 2. Augmenting Pig grammar will be more involved 
and could be done later.

 Restrict number of PIG generated Haddop counters 
 -

 Key: PIG-2208
 URL: https://issues.apache.org/jira/browse/PIG-2208
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.1, 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.1

 Attachments: PIG-2208.patch


 Pig 0.8 implemented Hadoop counters to track the number of records read for 
 each input and the number of records written for each output (PIG-1389 and 
 PIG-1299). However, Hadoop imposes a limit on per-job counters 
 (MAPREDUCE-1943), and jobs fail if the counters exceed that limit.
 Therefore we need a way to cap the number of Pig-generated counters.
 Here are the two options:
 1. Add an integer property (e.g., pig.counter.limit) to the Pig properties 
 file (e.g., set to 20). If the number of inputs of a job exceeds this number, 
 the input counters are disabled. Similarly, if the number of outputs of a job 
 exceeds this number, the output counters are disabled.
 2. Add a boolean property (e.g., pig.disable.counters) to the Pig properties 
 file (default: false). If this property is set to true, the Pig-generated 
 counters are disabled.
   





[jira] [Created] (PIG-2208) Restrict number of PIG generated Haddop counters

2011-08-09 Thread Richard Ding (JIRA)
Restrict number of PIG generated Haddop counters 
-

 Key: PIG-2208
 URL: https://issues.apache.org/jira/browse/PIG-2208
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0, 0.8.1
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.1


Pig 0.8 implemented Hadoop counters to track the number of records read for 
each input and the number of records written for each output (PIG-1389 and 
PIG-1299). However, Hadoop imposes a limit on per-job counters 
(MAPREDUCE-1943), and jobs fail if the counters exceed that limit.

Therefore we need a way to cap the number of Pig-generated counters.

Here are the two options:

1. Add an integer property (e.g., pig.counter.limit) to the Pig properties file 
(e.g., set to 20). If the number of inputs of a job exceeds this number, the 
input counters are disabled. Similarly, if the number of outputs of a job 
exceeds this number, the output counters are disabled.

2. Add a boolean property (e.g., pig.disable.counters) to the Pig properties 
file (default: false). If this property is set to true, the Pig-generated 
counters are disabled.

  







[jira] [Commented] (PIG-2125) Make Pig work with hadoop .NEXT

2011-07-20 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068739#comment-13068739
 ] 

Richard Ding commented on PIG-2125:
---

+1

 Make Pig work with hadoop .NEXT
 ---

 Key: PIG-2125
 URL: https://issues.apache.org/jira/browse/PIG-2125
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.10
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.10

 Attachments: PIG-2125-1.patch, PIG-2125-2.patch, PIG-2125-3.patch, 
 PIG-2125-4.patch, PIG-2125-5.patch


 We need to make Pig work with hadoop .NEXT, the svn branch currently is: 
 https://svn.apache.org/repos/asf/hadoop/common/branches/MR-279





[jira] [Created] (PIG-2141) Do not bundle apache commons jars with pig-withouthadoop.jar

2011-06-23 Thread Richard Ding (JIRA)
Do not bundle apache commons jars with pig-withouthadoop.jar


 Key: PIG-2141
 URL: https://issues.apache.org/jira/browse/PIG-2141
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: site, 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0


This jars are already available with hadoop installation. 





[jira] [Updated] (PIG-2141) Do not bundle apache commons jars with pig-withouthadoop.jar

2011-06-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2141:
--

Attachment: PIG-2141.patch

 Do not bundle apache commons jars with pig-withouthadoop.jar
 

 Key: PIG-2141
 URL: https://issues.apache.org/jira/browse/PIG-2141
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: site, 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2141.patch


 This jars are already available with hadoop installation. 





[jira] [Updated] (PIG-2141) Do not bundle apache commons jars with pig-withouthadoop.jar

2011-06-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2141:
--

Description: These jars are already available with hadoop installation.   
(was: This jars are already available with hadoop installation. )

 Do not bundle apache commons jars with pig-withouthadoop.jar
 

 Key: PIG-2141
 URL: https://issues.apache.org/jira/browse/PIG-2141
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: site, 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2141.patch


 These jars are already available with hadoop installation. 





[jira] [Commented] (PIG-2083) bincond ERROR 1025: Invalid field projection when null is used

2011-05-25 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039242#comment-13039242
 ] 

Richard Ding commented on PIG-2083:
---

+1

 bincond ERROR 1025: Invalid field projection when null is used
 --

 Key: PIG-2083
 URL: https://issues.apache.org/jira/browse/PIG-2083
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.9.0
 Environment: Linux 2.6.18-53.1.13.el5 #1 SMP Mon Feb 11 13:27:27 EST 
 2008 x86_64 x86_64 x86_64 GNU/Linux
 Hadoop 0.20.203.3.1104011556 -r 96519d04f65e22ffadf89b225d0d44ef1741d126
 Compiled on Fri Apr  1 16:29:09 PDT 2011
Reporter: Araceli Henley
Assignee: Thejas M Nair
 Fix For: 0.9.0

 Attachments: PIG-2083.1.patch


 This is a regression in 0.9.
 a = load '1.txt' as (a0, a1);
 b = foreach a generate (a0==0?null:2);
 explain b;
 ERROR 1025:
 Invalid field projection. Projected field [null] does not exist in schema



[jira] [Commented] (PIG-2084) pig is running validation for a statement at a time batch mode, instead of running it for whole script

2011-05-23 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038097#comment-13038097
 ] 

Richard Ding commented on PIG-2084:
---

+1

 pig is running validation for a statement at a time batch mode, instead of 
 running it for whole script
 --

 Key: PIG-2084
 URL: https://issues.apache.org/jira/browse/PIG-2084
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.9.0

 Attachments: PIG-2084.1.patch


 In PIG-2059, a change was made to run validation for each statement instead 
 of running it once for the whole script.
 This slows down the validation phase, and it ends up taking tens of seconds.



[jira] [Created] (PIG-2088) Return alias validation failed when there is single line comment in the macro

2011-05-23 Thread Richard Ding (JIRA)
Return alias validation failed when there is single line comment in the macro
-

 Key: PIG-2088
 URL: https://issues.apache.org/jira/browse/PIG-2088
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0
 Attachments: PIG-2088.patch

The following script

{code}
define test() returns b { 
   a = load 'data' as (name, age, gpa);
-- message 
   $b = filter a by (int)age  40; 
};

beta = test();
store beta into 'output';
{code}

results in a validation failure:

{code}
ERROR 1200 Macro test missing return alias b
{code}



[jira] [Updated] (PIG-2088) Return alias validation failed when there is single line comment in the macro

2011-05-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2088:
--

Attachment: PIG-2088.patch

 Return alias validation failed when there is single line comment in the macro
 -

 Key: PIG-2088
 URL: https://issues.apache.org/jira/browse/PIG-2088
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2088.patch


 The following script
 {code}
 define test() returns b { 
a = load 'data' as (name, age, gpa);
 -- message 
$b = filter a by (int)age  40; 
 };
 beta = test();
 store beta into 'output';
 {code}
 results in a validation failure:
 {code}
 ERROR 1200 Macro test missing return alias b
 {code}



[jira] [Resolved] (PIG-2088) Return alias validation failed when there is single line comment in the macro

2011-05-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2088.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to trunk and 0.9 branch.

 Return alias validation failed when there is single line comment in the macro
 -

 Key: PIG-2088
 URL: https://issues.apache.org/jira/browse/PIG-2088
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2088.patch


 The following script
 {code}
 define test() returns b { 
a = load 'data' as (name, age, gpa);
 -- message 
$b = filter a by (int)age  40; 
 };
 beta = test();
 store beta into 'output';
 {code}
 results in a validation failure:
 {code}
 ERROR 1200 Macro test missing return alias b
 {code}



[jira] [Commented] (PIG-2081) Dryrun gives wrong line numbers in error message for scripts containing macro.

2011-05-20 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037003#comment-13037003
 ] 

Richard Ding commented on PIG-2081:
---

test-patch and unit tests pass.

 Dryrun gives wrong line numbers in error message for scripts containing macro.
 --

 Key: PIG-2081
 URL: https://issues.apache.org/jira/browse/PIG-2081
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2081.patch


 For the following script (test.pig):
 {code}
 1 DEFINE my_macro (X,key) returns Y
   2 {
   3 tmp1 = foreach  $X generate TOKENIZE((chararray)$key) as tokens;
   4 tmp2 = foreach tmp1 generate flatten(tokens);
   5 tmp3 = order tmp2 by $0;
   6 $Y = distinct tmp3;
   7 }
   8 
   9 A = load 'sometext' using TextLoader() as (row) ;
  10 E = my_macro(A,row);
  11 
  12 A1 = load 'sometext2' using TextLoader() as (row1);
  13 E1 = my_macro(A1,row1);
  14 
  15 A3 = load 'sometext3' using TextLoader() as (row3);
  16 E3 = my_macro(A3,$0);
  17 
  18 F = cogroup E by $0, E1 by $0,E3 by $0;
  19 dump F;
 {code}
 pig test.pig gives the correct line number in the error message:
 {code}
 ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: file test.pig, line 16, 
 column 17  mismatched input '$0' expecting set null
 {code}
 while pig -r test.pig gives an incorrect line number in the error message:
 {code}
 ERROR org.apache.pig.Main - ERROR 1200: file test.pig.substituted, line 1, 
 column 17  mismatched input '$0' expecting set null
 {code}



[jira] [Resolved] (PIG-2081) Dryrun gives wrong line numbers in error message for scripts containing macro.

2011-05-20 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2081.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

patch committed to trunk and 0.9 branch

 Dryrun gives wrong line numbers in error message for scripts containing macro.
 --

 Key: PIG-2081
 URL: https://issues.apache.org/jira/browse/PIG-2081
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2081.patch


 For the following script (test.pig):
 {code}
 1 DEFINE my_macro (X,key) returns Y
   2 {
   3 tmp1 = foreach  $X generate TOKENIZE((chararray)$key) as tokens;
   4 tmp2 = foreach tmp1 generate flatten(tokens);
   5 tmp3 = order tmp2 by $0;
   6 $Y = distinct tmp3;
   7 }
   8 
   9 A = load 'sometext' using TextLoader() as (row) ;
  10 E = my_macro(A,row);
  11 
  12 A1 = load 'sometext2' using TextLoader() as (row1);
  13 E1 = my_macro(A1,row1);
  14 
  15 A3 = load 'sometext3' using TextLoader() as (row3);
  16 E3 = my_macro(A3,$0);
  17 
  18 F = cogroup E by $0, E1 by $0,E3 by $0;
  19 dump F;
 {code}
 pig test.pig gives the correct line number in the error message:
 {code}
 ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: file test.pig, line 16, 
 column 17  mismatched input '$0' expecting set null
 {code}
 while pig -r test.pig gives an incorrect line number in the error message:
 {code}
 ERROR org.apache.pig.Main - ERROR 1200: file test.pig.substituted, line 1, 
 column 17  mismatched input '$0' expecting set null
 {code}



[jira] [Resolved] (PIG-2029) Inconsistency in Pig Stats reports

2011-05-20 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2029.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

 Inconsistency in Pig Stats reports 
 ---

 Key: PIG-2029
 URL: https://issues.apache.org/jira/browse/PIG-2029
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.1, 0.9.0
Reporter: Viraj Bhat
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2029.patch


 I have a Pig script that reports varying stats for the same M/R job (same 
 inputs). Sometimes PigStats reports all the stats (such as 
 Maps, Reduces, MaxMapTime, MinMapTime, AvgMapTime, MaxReduceTime, 
 MinReduceTime and AvgReduceTime) for the M/R job as 0. Sometimes it reports 
 them correctly.
 Enclosed are the stderr logs for two runs. In Run 1, job 
 job_201103091134_556600 has 0 in all the columns, whereas in Run 2 the 
 Hadoop job job_201104272229_75693 has valid values. 
 The actual JobTracker link shows that the values are non-empty. This points 
 to a bug in the interaction of the PigStats module with the JobTracker.
 Run 1:
 {quote}
 Job Stats (time in seconds):
 JobId  Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  Alias  Feature  Outputs
 job_201103091134_556458  160  100  552  191  368  1257  371  392  IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P  DISTINCT,MULTI_QUERY
 job_201103091134_556600  0  0  0  0  0  0  0  0  UNION5  MULTI_QUERY,MAP_ONLY  /user/viraj/dir,,
 job_201103091134_556601  7  100  17  8  14  200  15  27  CNJOIN25,GNJOIN25,sampleNJOIN25  GROUP_BY,COMBINER
 job_201103091134_556602  0  0  0  0  0  0  0  0  CNJOIN3,GNJOIN3,sampleNJOIN3  GROUP_BY,COMBINER
 job_201103091134_556603  0  0  0  0  0  0  0  0  CNJOIN15,GNJOIN15,sampleNJOIN15  GROUP_BY,COMBINER
 job_201103091134_556604  2  100  13  7  10  34  13  31  CNJOIN19,GNJOIN19,sampleNJOIN19  GROUP_BY,COMBINER
 job_201103091134_556644  0  0  0  0  0  0  0  0  ONJOIN15  SAMPLER
 job_201103091134_556645  0  0  0  0  0  0  0  0  ONJOIN25  SAMPLER
 job_201103091134_556646  0  0  0  0  0  0  0  0  ONJOIN3  SAMPLER
 job_201103091134_556654  0  0  0  0  0  0  0  0  ONJOIN19  SAMPLER
 job_201103091134_556662  0  0  0  0  0  0  0  0  ONJOIN19  ORDER_BY,COMBINER
 ..
 {quote}
 Run 2:
 {quote}
 Job Stats (time in seconds):
 JobId MapsReduces MaxMapTime  MinMapTIme  AvgMapTime  
 MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
 job_201104272229_75503159 100 484 192 353 396 
 308 321 
 IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P
DISTINCT,MULTI_QUERY
 job_201104272229_7569318  0   31  14  24  0   
 0   UNION5 MULTI_QUERY,MAP_ONLY /user/viraj/dir,
 job_201104272229_756947   100 34  13  22  46  
 20  25  CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER   
 job_201104272229_75695125 100 19  11  15  32  
 18  26  CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER   
 job_201104272229_756981   100 12  12  12  13  
 9   11  CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER   
 job_201104272229_757022   100 21  5   13  35  
 22  26  CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER   
 job_201104272229_757241   1   4   4   4   11  
 11  11  ONJOIN15SAMPLER 
 job_201104272229_757250   0   0   0   0   0   
 0   ONJOIN25SAMPLER 
 job_201104272229_757266   1   8   6   8   24  
 24  24  ONJOIN3 SAMPLER 
 job_201104272229_757290   0   0   0   0   0   
 0   ONJOIN19SAMPLER 
 job_201104272229_757521   100 5   5   5

[jira] [Resolved] (PIG-1824) Support import modules in Jython UDF

2011-05-20 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-1824.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

patch committed to trunk. Thanks Woody!

 Support import modules in Jython UDF
 

 Key: PIG-1824
 URL: https://issues.apache.org/jira/browse/PIG-1824
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Assignee: Woody Anderson
 Fix For: 0.10

 Attachments: 1824.patch, 1824_final.patch, 1824a.patch, 1824b.patch, 
 1824c.patch, 1824d.patch, 1824x.patch, 
 TEST-org.apache.pig.test.TestGrunt.txt, 
 TEST-org.apache.pig.test.TestScriptLanguage.txt, 
 TEST-org.apache.pig.test.TestScriptUDF.txt


 Currently, a Jython UDF script doesn't support the Jython import statement, 
 as in the following example:
 {code}
 #!/usr/bin/python
 import re

 # Declare the output schema for Pig; the schema string must be quoted.
 @outputSchema("word:chararray")
 def resplit(content, regex, index):
     # Split the content on the regex and return the element at the given index.
     return re.compile(regex).split(content)[index]
 {code}
 Can Pig automatically locate the Jython module file and ship it to the 
 backend? Or should we add a ship clause to let the user explicitly specify 
 the module to ship? 



[jira] [Created] (PIG-2081) Dryrun gives wrong line numbers in error message for scripts containing macro.

2011-05-19 Thread Richard Ding (JIRA)
Dryrun gives wrong line numbers in error message for scripts containing macro.
--

 Key: PIG-2081
 URL: https://issues.apache.org/jira/browse/PIG-2081
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0


For the following script (test.pig):

{code}
  1 DEFINE my_macro (X,key) returns Y
  2 {
  3 tmp1 = foreach  $X generate TOKENIZE((chararray)$key) as tokens;
  4 tmp2 = foreach tmp1 generate flatten(tokens);
  5 tmp3 = order tmp2 by $0;
  6 $Y = distinct tmp3;
  7 }
  8 
  9 A = load 'sometext' using TextLoader() as (row) ;
 10 E = my_macro(A,row);
 11 
 12 A1 = load 'sometext2' using TextLoader() as (row1);
 13 E1 = my_macro(A1,row1);
 14 
 15 A3 = load 'sometext3' using TextLoader() as (row3);
 16 E3 = my_macro(A3,$0);
 17 
 18 F = cogroup E by $0, E1 by $0,E3 by $0;
 19 dump F;
{code}

Running pig test.pig gives the correct line number in the error message:

{code}
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: file test.pig, line 16, 
column 17  mismatched input '$0' expecting set null
{code}

while running pig -r test.pig gives an incorrect line number in the error message:

{code}
ERROR org.apache.pig.Main - ERROR 1200: file test.pig.substituted, line 1, 
column 17  mismatched input '$0' expecting set null
{code}



[jira] [Commented] (PIG-2029) Inconsistency in Pig Stats reports

2011-05-19 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036403#comment-13036403
 ] 

Richard Ding commented on PIG-2029:
---

Patch committed to trunk and 0.9 branch.

 Inconsistency in Pig Stats reports 
 ---

 Key: PIG-2029
 URL: https://issues.apache.org/jira/browse/PIG-2029
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.1, 0.9.0
Reporter: Viraj Bhat
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2029.patch


 I have a Pig script which reports varying Stats for the same M/R job (same 
 inputs). Sometimes the PigStats reports all the stats (such as 
 Maps,Reduces,MaxMapTime,MinMapTime,AvgMapTime,MaxReduceTime, MinReduceTime 
 and AvgReduceTime) for the M/R job as 0. Sometimes it reports it correctly.
 Enclosed are the stderr logs for 2 runs. Notice that in Run 1, 
 job_201103091134_556600 has 0 in all the columns, whereas in 
 Run 2, Hadoop job job_201104272229_75693 has valid values. 
 The actual JobTracker page shows that they are non-empty. This points to a 
 bug in the interaction of the PigStats module with the JobTracker.
 Run 1:
 {quote}
 Job Stats (time in seconds):
 JobId MapsReduces MaxMapTime  MinMapTIme  AvgMapTime  
 MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
 job_201103091134_556458   160 100 552 191 368 1257
 371 392 
 IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P
DISTINCT,MULTI_QUERY
 job_201103091134_556600   0   0   0   0   0   0   
 0   0   UNION5  MULTI_QUERY,MAP_ONLY/user/viraj/dir,,
 job_201103091134_556601   7   100 17  8   14  200 
 15  27  CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER   
 job_201103091134_556602   0   0   0   0   0   0   
 0   0   CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER   
 job_201103091134_556603   0   0   0   0   0   0   
 0   0   CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER   
 job_201103091134_556604   2   100 13  7   10  34  
 13  31  CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER   
 job_201103091134_556644   0   0   0   0   0   0   
 0   0   ONJOIN15SAMPLER 
 job_201103091134_556645   0   0   0   0   0   0   
 0   0   ONJOIN25SAMPLER 
 job_201103091134_556646   0   0   0   0   0   0   
 0   0   ONJOIN3 SAMPLER 
 job_201103091134_556654   0   0   0   0   0   0   
 0   0   ONJOIN19SAMPLER 
 job_201103091134_556662   0   0   0   0   0   0   
 0   0   ONJOIN19ORDER_BY,COMBINER
 ..
 {quote}
 Run 2:
 {quote}
 Job Stats (time in seconds):
 JobId MapsReduces MaxMapTime  MinMapTIme  AvgMapTime  
 MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
 job_201104272229_75503159 100 484 192 353 396 
 308 321 
 IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P
DISTINCT,MULTI_QUERY
 job_201104272229_7569318  0   31  14  24  0   
 0   UNION5 MULTI_QUERY,MAP_ONLY /user/viraj/dir,
 job_201104272229_756947   100 34  13  22  46  
 20  25  CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER   
 job_201104272229_75695125 100 19  11  15  32  
 18  26  CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER   
 job_201104272229_756981   100 12  12  12  13  
 9   11  CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER   
 job_201104272229_757022   100 21  5   13  35  
 22  26  CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER   
 job_201104272229_757241   1   4   4   4   11  
 11  11  ONJOIN15SAMPLER 
 job_201104272229_757250   0   0   0   0   0   
 0   ONJOIN25SAMPLER 
 job_201104272229_757266   1   8   6   8   24  
 24  24  ONJOIN3 SAMPLER 
 job_201104272229_757290   0   0   0   0   0   
 0   ONJOIN19SAMPLER 
 job_201104272229_75752   

[jira] [Updated] (PIG-2081) Dryrun gives wrong line numbers in error message for scripts containing macro.

2011-05-19 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2081:
--

Attachment: PIG-2081.patch

 Dryrun gives wrong line numbers in error message for scripts containing macro.
 --

 Key: PIG-2081
 URL: https://issues.apache.org/jira/browse/PIG-2081
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2081.patch


 For the following script (test.pig):
 {code}
   1 DEFINE my_macro (X,key) returns Y
   2 {
   3 tmp1 = foreach  $X generate TOKENIZE((chararray)$key) as tokens;
   4 tmp2 = foreach tmp1 generate flatten(tokens);
   5 tmp3 = order tmp2 by $0;
   6 $Y = distinct tmp3;
   7 }
   8 
   9 A = load 'sometext' using TextLoader() as (row) ;
  10 E = my_macro(A,row);
  11 
  12 A1 = load 'sometext2' using TextLoader() as (row1);
  13 E1 = my_macro(A1,row1);
  14 
  15 A3 = load 'sometext3' using TextLoader() as (row3);
  16 E3 = my_macro(A3,$0);
  17 
  18 F = cogroup E by $0, E1 by $0,E3 by $0;
  19 dump F;
 {code}
 Running pig test.pig gives the correct line number in the error message:
 {code}
 ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: file test.pig, line 16, 
 column 17  mismatched input '$0' expecting set null
 {code}
 while running pig -r test.pig gives an incorrect line number in the error message:
 {code}
 ERROR org.apache.pig.Main - ERROR 1200: file test.pig.substituted, line 1, 
 column 17  mismatched input '$0' expecting set null
 {code}



[jira] [Commented] (PIG-1824) Support import modules in Jython UDF

2011-05-18 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035542#comment-13035542
 ] 

Richard Ding commented on PIG-1824:
---

The new patch fixed the unit test errors reported earlier. I have one 
(different) failed test in TestGrunt, not sure if it's related to the patch. 

 Support import modules in Jython UDF
 

 Key: PIG-1824
 URL: https://issues.apache.org/jira/browse/PIG-1824
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Assignee: Woody Anderson
 Fix For: 0.10

 Attachments: 1824.patch, 1824_final.patch, 1824a.patch, 1824b.patch, 
 1824c.patch, 1824d.patch, 1824x.patch, 
 TEST-org.apache.pig.test.TestGrunt.txt, 
 TEST-org.apache.pig.test.TestScriptLanguage.txt, 
 TEST-org.apache.pig.test.TestScriptUDF.txt


 Currently, a Jython UDF script doesn't support the Jython import statement, as in 
 the following example:
 {code}
 #!/usr/bin/python
 import re
 @outputSchema("word:chararray")
 def resplit(content, regex, index):
     return re.compile(regex).split(content)[index]
 {code}
 Can Pig automatically locate the Jython module file and ship it to the 
 backend? Or should we add a ship clause to let the user explicitly specify the 
 module to ship? 



[jira] [Commented] (PIG-2029) Inconsistency in Pig Stats reports

2011-05-17 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035010#comment-13035010
 ] 

Richard Ding commented on PIG-2029:
---

Currently, Pig prints zero (0) when the max/min/avg map/reduce times cannot be 
retrieved from Hadoop through the Hadoop client API. This is misleading, because 
0 is also a valid value. I propose that we change those values to 'n/a', as follows:

{code}
Job Stats (time in seconds):
JobId  Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  Alias  Feature  Outputs
job_201104272229_434232  2  10  354  220  287  168  149  163  IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P  DISTINCT,MULTI_QUERY
job_201104272229_434319  2  0  9  3  6  0  0  0  UNION5  MULTI_QUERY,MAP_ONLY  /user/rding/verifypigstats2-UNION5,
job_201104272229_434320  2  10  n/a  n/a  n/a  n/a  n/a  n/a  CNJOIN3,GNJOIN3,sampleNJOIN3  GROUP_BY,COMBINER
job_201104272229_434321  1  10  5  5  5  23  9  17  CNJOIN25,GNJOIN25,sampleNJOIN25  GROUP_BY,COMBINER
job_201104272229_434322  2  10  n/a  n/a  n/a  n/a  n/a  n/a  CNJOIN15,GNJOIN15,sampleNJOIN15  GROUP_BY,COMBINER
job_201104272229_434323  2  10  n/a  n/a  n/a  n/a  n/a  n/a  CNJOIN19,GNJOIN19,sampleNJOIN19  GROUP_BY,COMBINER
job_201104272229_434331  2  1  n/a  n/a  n/a  n/a  n/a  n/a  ONJOIN15  SAMPLER
job_201104272229_434332  2  1  n/a  n/a  n/a  n/a  n/a  n/a  ONJOIN3  SAMPLER
job_201104272229_434333  1  1  2  2  2  13  13  13  ONJOIN25  SAMPLER
job_201104272229_434334  1  1  1  1  1  12  12  12  ONJOIN19  SAMPLER
job_201104272229_434342  1  10  2  2  2  16  8  11  ONJOIN25  ORDER_BY,COMBINER
{code}
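
The idea behind the proposal can be sketched in a few lines: keep "not available" distinct from a real 0 and only turn it into 'n/a' at print time. This is a standalone Python illustration, not the patch itself; `format_stat` and `format_row` are hypothetical names, and using None for an unavailable timing is an assumption of the example.

```python
def format_stat(value):
    """Render a timing value, using 'n/a' when it could not be retrieved."""
    return "n/a" if value is None else str(value)

def format_row(job_id, maps, reduces, timings):
    """Build one tab-separated row.

    timings holds six values (max/min/avg map time, max/min/avg reduce time);
    None marks a value that was unavailable.
    """
    cells = [job_id, str(maps), str(reduces)] + [format_stat(t) for t in timings]
    return "\t".join(cells)

# Two rows taken from the sample report above.
print(format_row("job_201104272229_434320", 2, 10, [None] * 6))
print(format_row("job_201104272229_434321", 1, 10, [5, 5, 5, 23, 9, 17]))
```

A map-only job still prints genuine zeros for its reduce times, so a reader can tell "no reduce phase" apart from "stats unavailable".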

 Inconsistency in Pig Stats reports 
 ---

 Key: PIG-2029
 URL: https://issues.apache.org/jira/browse/PIG-2029
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.1, 0.9.0
Reporter: Viraj Bhat
Assignee: Richard Ding
 Fix For: 0.10


 I have a Pig script which reports varying Stats for the same M/R job (same 
 inputs). Sometimes the PigStats reports all the stats (such as 
 Maps,Reduces,MaxMapTime,MinMapTime,AvgMapTime,MaxReduceTime, MinReduceTime 
 and AvgReduceTime) for the M/R job as 0. Sometimes it reports it correctly.
 Enclosed are the stderr logs for 2 runs. Notice that in Run 1, 
 job_201103091134_556600 has 0 in all the columns, whereas in 
 Run 2, Hadoop job job_201104272229_75693 has valid values. 
 The actual JobTracker page shows that they are non-empty. This points to a 
 bug in the interaction of the PigStats module with the JobTracker.
 Run 1:
 {quote}
 Job Stats (time in seconds):
 JobId MapsReduces MaxMapTime  MinMapTIme  AvgMapTime  
 MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
 job_201103091134_556458   160 100 552 191 368 1257
 371 392 
 IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P
DISTINCT,MULTI_QUERY
 job_201103091134_556600   0   0   0   0   0   0   
 0   0   UNION5  MULTI_QUERY,MAP_ONLY/user/viraj/dir,,
 job_201103091134_556601   7   100 17  8   14  200 
 15  27  CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER   
 job_201103091134_556602   0   0   0   0   0   0   
 0   0   CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER   
 job_201103091134_556603   0   0   0   0   0   0   
 0   0   CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER   
 job_201103091134_556604   2   100 13  7   10  34  
 13  31  CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER   
 job_201103091134_556644   0   0   0   0   0   0   
 0   0   ONJOIN15SAMPLER 
 job_201103091134_556645   0   0   0   0   0   0   
 0   0   ONJOIN25SAMPLER 
 job_201103091134_556646   0   0   0   0   0   0   
 0   0   ONJOIN3 SAMPLER 
 job_201103091134_556654   0   0   0   0   0   0   
 0   0   ONJOIN19   

[jira] [Updated] (PIG-2029) Inconsistency in Pig Stats reports

2011-05-17 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2029:
--

Attachment: PIG-2029.patch

 Inconsistency in Pig Stats reports 
 ---

 Key: PIG-2029
 URL: https://issues.apache.org/jira/browse/PIG-2029
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.1, 0.9.0
Reporter: Viraj Bhat
Assignee: Richard Ding
 Fix For: 0.10

 Attachments: PIG-2029.patch


 I have a Pig script which reports varying Stats for the same M/R job (same 
 inputs). Sometimes the PigStats reports all the stats (such as 
 Maps,Reduces,MaxMapTime,MinMapTime,AvgMapTime,MaxReduceTime, MinReduceTime 
 and AvgReduceTime) for the M/R job as 0. Sometimes it reports it correctly.
 Enclosed are the stderr logs for 2 runs. Notice that in Run 1, 
 job_201103091134_556600 has 0 in all the columns, whereas in 
 Run 2, Hadoop job job_201104272229_75693 has valid values. 
 The actual JobTracker page shows that they are non-empty. This points to a 
 bug in the interaction of the PigStats module with the JobTracker.
 Run 1:
 {quote}
 Job Stats (time in seconds):
 JobId MapsReduces MaxMapTime  MinMapTIme  AvgMapTime  
 MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
 job_201103091134_556458   160 100 552 191 368 1257
 371 392 
 IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P
DISTINCT,MULTI_QUERY
 job_201103091134_556600   0   0   0   0   0   0   
 0   0   UNION5  MULTI_QUERY,MAP_ONLY/user/viraj/dir,,
 job_201103091134_556601   7   100 17  8   14  200 
 15  27  CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER   
 job_201103091134_556602   0   0   0   0   0   0   
 0   0   CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER   
 job_201103091134_556603   0   0   0   0   0   0   
 0   0   CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER   
 job_201103091134_556604   2   100 13  7   10  34  
 13  31  CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER   
 job_201103091134_556644   0   0   0   0   0   0   
 0   0   ONJOIN15SAMPLER 
 job_201103091134_556645   0   0   0   0   0   0   
 0   0   ONJOIN25SAMPLER 
 job_201103091134_556646   0   0   0   0   0   0   
 0   0   ONJOIN3 SAMPLER 
 job_201103091134_556654   0   0   0   0   0   0   
 0   0   ONJOIN19SAMPLER 
 job_201103091134_556662   0   0   0   0   0   0   
 0   0   ONJOIN19ORDER_BY,COMBINER
 ..
 {quote}
 Run 2:
 {quote}
 Job Stats (time in seconds):
 JobId MapsReduces MaxMapTime  MinMapTIme  AvgMapTime  
 MaxReduceTime   MinReduceTime   AvgReduceTime   Alias   Feature Outputs
 job_201104272229_75503159 100 484 192 353 396 
 308 321 
 IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P
DISTINCT,MULTI_QUERY
 job_201104272229_7569318  0   31  14  24  0   
 0   UNION5 MULTI_QUERY,MAP_ONLY /user/viraj/dir,
 job_201104272229_756947   100 34  13  22  46  
 20  25  CNJOIN25,GNJOIN25,sampleNJOIN25 GROUP_BY,COMBINER   
 job_201104272229_75695125 100 19  11  15  32  
 18  26  CNJOIN3,GNJOIN3,sampleNJOIN3GROUP_BY,COMBINER   
 job_201104272229_756981   100 12  12  12  13  
 9   11  CNJOIN15,GNJOIN15,sampleNJOIN15 GROUP_BY,COMBINER   
 job_201104272229_757022   100 21  5   13  35  
 22  26  CNJOIN19,GNJOIN19,sampleNJOIN19 GROUP_BY,COMBINER   
 job_201104272229_757241   1   4   4   4   11  
 11  11  ONJOIN15SAMPLER 
 job_201104272229_757250   0   0   0   0   0   
 0   ONJOIN25SAMPLER 
 job_201104272229_757266   1   8   6   8   24  
 24  24  ONJOIN3 SAMPLER 
 job_201104272229_757290   0   0   0   0   0   
 0   ONJOIN19SAMPLER 
 job_201104272229_757521   100 5   5   5   12  
 9   11  

[jira] [Resolved] (PIG-2069) LoadFunc jar does not ship to backend in MultiQuery case

2011-05-16 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2069.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Unit tests pass. Patch committed to trunk and 0.9 branch.

 LoadFunc jar does not ship to backend in MultiQuery case
 

 Key: PIG-2069
 URL: https://issues.apache.org/jira/browse/PIG-2069
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.1, 0.9.0
Reporter: Daniel Dai
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2069.patch


 Pig is able to automatically figure out which jar contains the LoadFunc and 
 ship it to the backend. However, for the following script it didn't:
 {code}
 A = load '1.txt' using SomeLoadFunc();
 B = filter A by $0==0;
 C = filter A by $1==1;
 D = join B by $0, C by $0;
 dump D;
 {code}
 The reason is that this query is a multiquery (A is reused, which creates an 
 implicit split). When we merged the multiquery into one job, we didn't merge 
 the UDF list properly.



[jira] [Commented] (PIG-2070) Unknown appears in error message for an error case

2011-05-16 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034135#comment-13034135
 ] 

Richard Ding commented on PIG-2070:
---

+1

 Unknown appears in error message for an error case
 

 Key: PIG-2070
 URL: https://issues.apache.org/jira/browse/PIG-2070
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Xuefu Zhang
Assignee: Thejas M Nair
 Fix For: 0.9.0

 Attachments: PIG-2070.1.patch


 For the following query:
 a = load '1.txt' as (a0:int, a1:int);
 b = load '2.txt' as (a0:int, a1:chararray);
 c = cogroup a by (a0,a1), b by (a0,a1);
 Pig gives the following message, which includes the word "Unknown":
 2011-05-13 11:01:18,682 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1051:
 line 3, column 4 Cannot cast to Unknown
 The error message should be more meaningful.



[jira] [Resolved] (PIG-1819) For implicit binding, Jython embedded Pig should skip any variable/value that contains $.

2011-05-13 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-1819.
---

Resolution: Fixed

This is fixed per PIG-1827.

 For implicit binding, Jython embedded Pig should skip any variable/value that 
 contains $. 
 --

 Key: PIG-1819
 URL: https://issues.apache.org/jira/browse/PIG-1819
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-1819.patch, PIG-1819_1.patch, PIG-1819_2.patch


 We use Pig parameter substitution for the bindings, so any variable or value that 
 contains $ cannot be used.
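
To illustrate the kind of substitution that avoids this problem, here is a small Python sketch in which each bound value is inserted literally and never re-scanned, so a value containing $ passes through unchanged. It is only an illustration: the `$name` pattern and the `substitute` helper are assumptions of the example and do not describe Pig's actual preprocessor.

```python
import re

# Hypothetical parameter syntax for this sketch: a dollar sign followed by an identifier.
PARAM = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")

def substitute(text, params):
    """Replace each $name in `text` with params[name], inserted literally."""
    def repl(match):
        name = match.group(1)
        # Unknown names are left untouched; substituted values are not re-scanned.
        return params.get(name, match.group(0))
    # Passing a function means the returned string is used verbatim, so any
    # special characters in the value ($ or backslashes) need no escaping.
    return PARAM.sub(repl, text)

print(substitute("a = load '$input';", {"input": "data/$2011/part"}))
```

Because the pattern is applied in a single pass over the original text, the `$2011` that arrives inside the value is never treated as another parameter.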



[jira] [Updated] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason

2011-05-13 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1827:
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed to trunk and 0.9 branch.

 When passing a parameter to Pig, if the value contains $ it has to be escaped 
 for no apparent reason
 

 Key: PIG-1827
 URL: https://issues.apache.org/jira/browse/PIG-1827
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Julien Le Dem
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-1827-1.patch, PIG-1827_2.patch, PIG-1827_3.patch






[jira] [Commented] (PIG-2067) FilterLogicExpressionSimplifier removed some branches in some cases

2011-05-13 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033151#comment-13033151
 ] 

Richard Ding commented on PIG-2067:
---

+1

 FilterLogicExpressionSimplifier removed some branches in some cases
 ---

 Key: PIG-2067
 URL: https://issues.apache.org/jira/browse/PIG-2067
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.1, 0.9.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.1, 0.9.0

 Attachments: PIG-2067-1-0.8.patch, PIG-2067-1.patch


 The following script produces the wrong result:
 {code}
 A = load 'a.dat' as (cookie);
 B = load 'b.dat' as (cookie);
 C = cogroup A by cookie, B by cookie;
 E = filter C by COUNT(B)>0 AND COUNT(A)>0;
 explain E;
 {code}
 a.dat:
 1   1
 2   2
 3   3
 4   4
 5   5
 6   6
 7   7
 b.dat:
 3   3
 4   4
 5   5
 6   6
 7   7
 8   8
 Expected output:
 (3,{(3)},{(3)})
 (4,{(4)},{(4)})
 (5,{(5)},{(5)})
 (6,{(6)},{(6)})
 (7,{(7)},{(7)})
 We get:
 (3,{(3)},{(3)})
 (4,{(4)},{(4)})
 (5,{(5)},{(5)})
 (6,{(6)},{(6)})
 (7,{(7)},{(7)})
 (8,{},{(8)})
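
The expected result can be checked by hand with a small simulation. The Python sketch below mimics the cogroup on the first column of the two sample files and keeps only the groups where both bags are non-empty, which is how the expected output reads the filter. The `cogroup` helper is a hypothetical stand-in written for this example, not Pig code.

```python
def cogroup(a_keys, b_keys):
    """Group two key lists by key, yielding (key, bag_a, bag_b) in key order."""
    keys = sorted(set(a_keys) | set(b_keys))
    return [(k, [x for x in a_keys if x == k], [x for x in b_keys if x == k])
            for k in keys]

a = [1, 2, 3, 4, 5, 6, 7]   # first column of a.dat
b = [3, 4, 5, 6, 7, 8]      # first column of b.dat

# Keep a group only if both bags are non-empty.
result = [row for row in cogroup(a, b) if len(row[2]) > 0 and len(row[1]) > 0]
for row in result:
    print(row)
```

Key 8 has an empty bag on the A side, so it must be dropped; seeing it in the actual output means one branch of the AND was lost.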



[jira] [Updated] (PIG-2069) LoadFunc jar does not ship to backend in MultiQuery case

2011-05-13 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2069:
--

Attachment: PIG-2069.patch

This happens when the original MapReduce DAG (before optimization) contains a 
diamond node.

Users can work around this by explicitly registering the LoadFunc jar in the 
script.

The attached patch provides a fix. It was verified with a manual test.

 LoadFunc jar does not ship to backend in MultiQuery case
 

 Key: PIG-2069
 URL: https://issues.apache.org/jira/browse/PIG-2069
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.1, 0.9.0
Reporter: Daniel Dai
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2069.patch


 Pig is able to automatically figure out which jar contains the LoadFunc and 
 ship it to the backend. However, for the following script it didn't:
 {code}
 A = load '1.txt' using SomeLoadFunc();
 B = filter A by $0==0;
 C = filter A by $1==1;
 D = join B by $0, C by $0;
 dump D;
 {code}
 The reason is that this query is a multiquery (A is reused, which creates an 
 implicit split). When we merged the multiquery into one job, we didn't merge 
 the UDF list properly.



[jira] [Commented] (PIG-2076) update documentation, help command with correct default value of pig.cachedbag.memusage

2011-05-13 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033394#comment-13033394
 ] 

Richard Ding commented on PIG-2076:
---

+1

 update documentation, help command with correct default value of 
 pig.cachedbag.memusage
 ---

 Key: PIG-2076
 URL: https://issues.apache.org/jira/browse/PIG-2076
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.9.0

 Attachments: PIG-2076.1.patch


 The default value of pig.cachedbag.memusage was changed to 0.2 in Pig 0.8, as 
 part of the changes in PIG-1447.
 But the help command and documentation still show the older default value of 0.1.



[jira] [Resolved] (PIG-2056) Jython error messages should show script name

2011-05-12 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2056.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Unit tests pass. Patch committed to trunk and 0.9 branch.

 Jython error messages should show script name
 -

 Key: PIG-2056
 URL: https://issues.apache.org/jira/browse/PIG-2056
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Fix For: 0.9.0

 Attachments: PIG-2056.patch


 Instead of messages like
 {code}
 Traceback (most recent call last):
   File "<iostream>", line 12, in <module>
 {code}
 It should display the script file name:
 {code}
 Traceback (most recent call last):
   File "test.py", line 12, in <module>
 {code}



[jira] [Resolved] (PIG-2058) Macro missing returns clause doesn't give a good error message

2011-05-11 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2058.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to trunk and 0.9 branch.

 Macro missing returns clause doesn't give a good error message
 --

 Key: PIG-2058
 URL: https://issues.apache.org/jira/browse/PIG-2058
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Xuefu Zhang
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2058.patch


 For the following query:
 define test( out1,out2 ){
A  = load 'x' as (u:int, v:int);
$B  = filter A by u  3 and v   20;
 }
 Pig gives the following error message: Syntax error,unexpected symbol at or 
 near '{'
 Previously, it gave: mismatched input '{' expecting RETURNS
 The previous message is more meaningful.



[jira] [Commented] (PIG-2056) Jython error messages should show script name

2011-05-11 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031911#comment-13031911
 ] 

Richard Ding commented on PIG-2056:
---

Result of test-patch:

{code}
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
{code}

 Jython error messages should show script name
 -

 Key: PIG-2056
 URL: https://issues.apache.org/jira/browse/PIG-2056
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Fix For: 0.9.0

 Attachments: PIG-2056.patch


 Instead of messages like
 {code}
 Traceback (most recent call last):
   File "<iostream>", line 12, in <module>
 {code}
 It should display the script file name:
 {code}
 Traceback (most recent call last):
   File "test.py", line 12, in <module>
 {code}



[jira] [Created] (PIG-2056) Jython error messages should show script name

2011-05-10 Thread Richard Ding (JIRA)
Jython error messages should show script name
-

 Key: PIG-2056
 URL: https://issues.apache.org/jira/browse/PIG-2056
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Fix For: 0.9.0


Instead of messages like

{code}
Traceback (most recent call last):
  File "<iostream>", line 12, in <module>
{code}

It should display the script file name:

{code}
Traceback (most recent call last):
  File "test.py", line 12, in <module>
{code}
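
As a rough illustration of where the file name in such a traceback comes from, the sketch below uses plain CPython rather than Pig's embedded Jython: whatever name is given to compile() is the name the traceback reports. The helper `run_and_capture` and the exact spelling of the placeholder name are assumptions made for this example, not Pig code.

```python
import traceback

# A two-line script whose second line fails.
SOURCE = "x = 1\nraise ValueError('boom')\n"

def run_and_capture(source, filename):
    """Execute `source` compiled under `filename`; return the traceback text."""
    try:
        exec(compile(source, filename, "exec"), {})
    except ValueError:
        return traceback.format_exc()
    return ""

# Compiled under a placeholder name, the traceback cannot point at a real file.
print(run_and_capture(SOURCE, "<iostream>"))
# Compiled under the script's own name, the traceback names that script.
print(run_and_capture(SOURCE, "test.py"))
```

The line number is the same in both cases; only the reported file name changes, which is the difference this issue asks for.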





[jira] [Updated] (PIG-2056) Jython error messages should show script name

2011-05-10 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2056:
--

Attachment: PIG-2056.patch

 Jython error messages should show script name
 -

 Key: PIG-2056
 URL: https://issues.apache.org/jira/browse/PIG-2056
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Fix For: 0.9.0

 Attachments: PIG-2056.patch


 Instead of messages like
 {code}
 Traceback (most recent call last):
   File "<iostream>", line 12, in <module>
 {code}
 It should display the script file name:
 {code}
 Traceback (most recent call last):
   File "test.py", line 12, in <module>
 {code}



[jira] [Resolved] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro

2011-05-10 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2035.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to trunk and 0.9 branch.

 Macro expansion doesn't handle multiple expansions of same macro inside 
 another macro
 -

 Key: PIG-2035
 URL: https://issues.apache.org/jira/browse/PIG-2035
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2035_1.patch


 Here is the use case:
 {code}
 define test ( in, out, x ) returns c { 
 a = load '$in' as (name, age, gpa);
 b = group a by gpa;
 $c = foreach b generate group, COUNT(a.$x);
 store $c into '$out';
 };
 define test2( in, out ) returns x { 
 $x = test( '$in', '$out', 'name' );
 $x = test( '$in', '$out.1', 'age' );
 $x = test( '$in', '$out.2', 'gpa' );
 };
 x = test2('studenttab10k', 'myoutput');
 {code}



[jira] [Updated] (PIG-2058) Macro missing returns clause doesn't give a good error message

2011-05-10 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2058:
--

Attachment: PIG-2058.patch

Thanks Xuefu. Attaching a patch with the fix.

 Macro missing returns clause doesn't give a good error message
 --

 Key: PIG-2058
 URL: https://issues.apache.org/jira/browse/PIG-2058
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Xuefu Zhang
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2058.patch


 For the following query:
 define test( out1,out2 ){
A  = load 'x' as (u:int, v:int);
$B  = filter A by u  3 and v   20;
 }
 Pig gives the following error message: Syntax error,unexpected symbol at or 
 near '{'
 Previously, it gave: mismatched input '{' expecting RETURNS
 The previous message is more meaningful.



[jira] [Updated] (PIG-2012) Comments at the begining of the file throws off line numbers in errors

2011-05-09 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2012:
--

Attachment: PIG-2012_2.patch

Thanks Xuefu. The new patch addresses the review comments.

 Comments at the begining of the file throws off line numbers in errors
 --

 Key: PIG-2012
 URL: https://issues.apache.org/jira/browse/PIG-2012
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Alan Gates
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2012_1.patch, PIG-2012_2.patch, macro.pig


 The preprocessor does not appear to be handling leading comments properly 
 when calculating line numbers for error messages.  In the attached script, 
 the error is reported to be on line 7.  It is actually on line 10.
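
One common way for line numbers to drift like this is a preprocessor that deletes comment text, newlines included, before the parser sees the script. The Java sketch below shows one way to avoid that: it overwrites comments with spaces and keeps every newline, so positions reported later still match the original file. It is an illustration only, not the attached patch. The class and method names are hypothetical, and for brevity it ignores comment markers that appear inside quoted strings.

```java
final class CommentBlanker {

    // Overwrite block comments and "--" line comments with spaces.
    // Newlines are copied through unchanged, so line numbers are preserved.
    static String blankComments(String script) {
        StringBuilder out = new StringBuilder(script.length());
        int n = script.length();
        int i = 0;
        while (i < n) {
            if (script.startsWith("/*", i)) {
                int end = script.indexOf("*/", i + 2);
                int stop = (end < 0) ? n : end + 2;
                for (int j = i; j < stop; j++) {
                    out.append(script.charAt(j) == '\n' ? '\n' : ' ');
                }
                i = stop;
            } else if (script.startsWith("--", i)) {
                int end = script.indexOf('\n', i);
                int stop = (end < 0) ? n : end;
                for (int j = i; j < stop; j++) {
                    out.append(' ');
                }
                i = stop;
            } else {
                out.append(script.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // 1-based line number of the first occurrence of needle, or -1 if absent.
    static int lineOf(String text, String needle) {
        int idx = text.indexOf(needle);
        if (idx < 0) {
            return -1;
        }
        int line = 1;
        for (int k = 0; k < idx; k++) {
            if (text.charAt(k) == '\n') {
                line++;
            }
        }
        return line;
    }
}
```

Because the blanked text has the same length and the same newlines as the input, a token on line 10 of the original script is still on line 10 after preprocessing.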



[jira] [Updated] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason

2011-05-09 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1827:
--

Attachment: PIG-1827_3.patch

We should limit this jira to fixing the issue in embedded Pig (i.e. work around the 
general parameter substitution) and revisit the parameter substitution parser and 
related code in a separate jira.

 When passing a parameter to Pig, if the value contains $ it has to be escaped 
 for no apparent reason
 

 Key: PIG-1827
 URL: https://issues.apache.org/jira/browse/PIG-1827
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Julien Le Dem
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-1827-1.patch, PIG-1827_2.patch, PIG-1827_3.patch






[jira] [Commented] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason

2011-05-09 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030972#comment-13030972
 ] 

Richard Ding commented on PIG-1827:
---

New patch added a unit test case as suggested.

 When passing a parameter to Pig, if the value contains $ it has to be escaped 
 for no apparent reason
 

 Key: PIG-1827
 URL: https://issues.apache.org/jira/browse/PIG-1827
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Julien Le Dem
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-1827-1.patch, PIG-1827_2.patch, PIG-1827_3.patch






[jira] [Commented] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro

2011-05-06 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030019#comment-13030019
 ] 

Richard Ding commented on PIG-2035:
---

Unit tests pass.

 Macro expansion doesn't handle multiple expansions of same macro inside 
 another macro
 -

 Key: PIG-2035
 URL: https://issues.apache.org/jira/browse/PIG-2035
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2035_1.patch


 Here is the use case:
 {code}
 define test ( in, out, x ) returns c { 
 a = load '$in' as (name, age, gpa);
 b = group a by gpa;
 $c = foreach b generate group, COUNT(a.$x);
 store $c into '$out';
 };
 define test2( in, out ) returns x { 
 $x = test( '$in', '$out', 'name' );
 $x = test( '$in', '$out.1', 'age' );
 $x = test( '$in', '$out.2', 'gpa' );
 };
 x = test2('studenttab10k', 'myoutput');
 {code}



[jira] [Created] (PIG-2049) Pig should display TokenMgrError consistently across all parsers

2011-05-06 Thread Richard Ding (JIRA)
Pig should display TokenMgrError consistently across all parsers


 Key: PIG-2049
 URL: https://issues.apache.org/jira/browse/PIG-2049
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Fix For: 0.9.0


For example, for org.apache.pig.tools.pigscript.parser.TokenMgrError, Pig logs

{code}
ERROR 1000: Error during parsing. Lexical error at line 5, column 0.
{code}

But for org.apache.pig.tools.parameters.TokenMgrError, Pig logs

{code}
ERROR 2998: Unhandled internal error. Lexical error at line 10, column 0.
{code}

Both should have error code 1000.
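
The requested behavior can be sketched in Java as follows. This is a hypothetical illustration, not the attached patch: the nested classes merely stand in for the two generated `TokenMgrError` classes, which share a simple name but live in different packages, and `errorCodeFor` and `format` are made-up helper names. The idea is that any lexer error, whichever parser raised it, maps to code 1000.

```java
final class LexicalErrorCodes {
    static final int PARSE_ERROR = 1000;     // "Error during parsing."
    static final int INTERNAL_ERROR = 2998;  // "Unhandled internal error."

    // Stand-in for org.apache.pig.tools.pigscript.parser.TokenMgrError.
    static final class ScriptParser {
        static final class TokenMgrError extends Error {
            TokenMgrError(String msg) { super(msg); }
        }
    }

    // Stand-in for org.apache.pig.tools.parameters.TokenMgrError.
    static final class ParamParser {
        static final class TokenMgrError extends Error {
            TokenMgrError(String msg) { super(msg); }
        }
    }

    // Treat every class whose simple name is TokenMgrError as a parse error.
    static int errorCodeFor(Throwable t) {
        return "TokenMgrError".equals(t.getClass().getSimpleName())
                ? PARSE_ERROR
                : INTERNAL_ERROR;
    }

    // Build a log line in the style of the messages quoted above.
    static String format(Throwable t) {
        int code = errorCodeFor(t);
        String prefix = (code == PARSE_ERROR)
                ? "Error during parsing. "
                : "Unhandled internal error. ";
        return "ERROR " + code + ": " + prefix + t.getMessage();
    }
}
```

With this mapping, both stand-in error classes produce a line starting with `ERROR 1000: Error during parsing.`, while unrelated exceptions keep code 2998.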



[jira] [Updated] (PIG-2049) Pig should display TokenMgrError consistently across all parsers

2011-05-06 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2049:
--

Attachment: PIG-2049.patch

 Pig should display TokenMgrError consistently across all parsers
 

 Key: PIG-2049
 URL: https://issues.apache.org/jira/browse/PIG-2049
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Fix For: 0.9.0

 Attachments: PIG-2049.patch


 For example, for org.apache.pig.tools.pigscript.parser.TokenMgrError, Pig logs
 {code}
 ERROR 1000: Error during parsing. Lexical error at line 5, column 0.
 {code}
 But for org.apache.pig.tools.parameters.TokenMgrError, Pig logs
 {code}
 ERROR 2998: Unhandled internal error. Lexical error at line 10, column 0.
 {code}
 Both should have error code 1000.



[jira] [Created] (PIG-2050) Pig can't reference auto-generated schema name for TOTUPLE

2011-05-06 Thread Richard Ding (JIRA)
Pig can't reference auto-generated schema name for TOTUPLE
--

 Key: PIG-2050
 URL: https://issues.apache.org/jira/browse/PIG-2050
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Richard Ding
Priority: Minor


Here is the use case:

{code}
grunt> A = load 'data' as (a0, a1, a2); 
grunt> B = foreach A generate TOTUPLE(a0, a2);  
grunt> describe B
B: {org.apache.pig.builtin.totuple_a0_3: (a0: bytearray,a2: bytearray)}
grunt> C = foreach B generate org.apache.pig.builtin.totuple_a0_3;
2011-05-06 14:38:14,635 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1000: Error during parsing. Invalid alias: org in 
{org.apache.pig.builtin.totuple_a0_1: (a0: bytearray,a2: bytearray)}
{code}

The workaround is to specify a user-defined schema name:

{code}
grunt> A = load 'data' as (a0, a1, a2);
grunt> B = foreach A generate TOTUPLE(a0, a2) as aa;
grunt> describe B
B: {aa: (a0: bytearray,a2: bytearray)}
grunt> C = foreach B generate aa;
grunt>
{code}



[jira] [Resolved] (PIG-2049) Pig should display TokenMgrError message consistently across all parsers

2011-05-06 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2049.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to trunk and 0.9 branch.

 Pig should display TokenMgrError message consistently across all parsers
 

 Key: PIG-2049
 URL: https://issues.apache.org/jira/browse/PIG-2049
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
Priority: Minor
 Fix For: 0.9.0

 Attachments: PIG-2049.patch


 For example, for org.apache.pig.tools.pigscript.parser.TokenMgrError, Pig logs
 {code}
 ERROR 1000: Error during parsing. Lexical error at line 5, column 0.
 {code}
 But for org.apache.pig.tools.parameters.TokenMgrError, Pig logs
 {code}
 ERROR 2998: Unhandled internal error. Lexical error at line 10, column 0.
 {code}
 Both should have error code 1000.



[jira] [Resolved] (PIG-2033) Pig returns sucess for the failed Pig script

2011-05-06 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2033.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Unit tests pass on 0.8 branch. Patch committed to 0.8 branch, 0.9 branch and 
trunk.

 Pig returns sucess for the failed Pig script
 

 Key: PIG-2033
 URL: https://issues.apache.org/jira/browse/PIG-2033
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.1, 0.9.0

 Attachments: PIG-2033.patch


 Pig returns success when a Pig script fails but the count of failed MR jobs 
 is zero. 



[jira] [Commented] (PIG-2041) Minicluster should make each run independent

2011-05-05 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029488#comment-13029488
 ] 

Richard Ding commented on PIG-2041:
---

+1

 Minicluster should make each run independent
 

 Key: PIG-2041
 URL: https://issues.apache.org/jira/browse/PIG-2041
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0, 0.9.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.9.0

 Attachments: PIG-2041-1.patch


 Minicluster will reuse ~/pigtest/conf/hadoop-site.xml. If something is wrong in 
 hadoop-site.xml, the next test will also be affected. This leads to some 
 mysterious test failures. 



[jira] [Updated] (PIG-2033) Pig returns sucess for the failed Pig script

2011-05-05 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2033:
--

Attachment: PIG-2033.patch

We make sure that Pig returns success iff the number of successful jobs equals 
the number of compiled jobs.

This patch doesn't include a unit test since it's difficult to simulate the 
failure case.
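
The difference between the two checks can be sketched in Java as follows. This is a minimal illustration, not the committed code: the class and method names are hypothetical. It contrasts the check implied by the bug report (success whenever no MR job is counted as failed) with the rule stated above.

```java
final class JobOutcome {

    // Check implied by the bug report: success whenever no job is counted
    // as failed. A script that aborts before its jobs run passes this.
    static boolean successByFailedCount(int failedJobs) {
        return failedJobs == 0;
    }

    // Rule stated above: success iff every compiled job actually succeeded.
    static boolean successByCompletedCount(int compiledJobs, int succeededJobs) {
        return succeededJobs == compiledJobs;
    }
}
```

For example, if three jobs were compiled, one succeeded, and the script then failed before the other two were launched, the failed-job count is zero, so the first check reports success while the second correctly reports failure.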

 Pig returns sucess for the failed Pig script
 

 Key: PIG-2033
 URL: https://issues.apache.org/jira/browse/PIG-2033
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.1, 0.9.0

 Attachments: PIG-2033.patch


 Pig returns success when a Pig script fails but the count of failed MR jobs 
 is zero. 



[jira] [Created] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro

2011-05-04 Thread Richard Ding (JIRA)
Macro expansion doesn't handle multiple expansions of same macro inside another 
macro
-

 Key: PIG-2035
 URL: https://issues.apache.org/jira/browse/PIG-2035
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0


Here is the use case:

{code}
define test ( in, out, x ) returns c { 
a = load '$in' as (name, age, gpa);
b = group a by gpa;
$c = foreach b generate group, COUNT(a.$x);
store $c into '$out';
};

define test2( in, out ) returns x { 
$x = test( '$in', '$out', 'name' );
$x = test( '$in', '$out.1', 'age' );
$x = test( '$in', '$out.2', 'gpa' );
};

x = test2('studenttab10k', 'myoutput');
{code}



[jira] [Updated] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro

2011-05-04 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2035:
--

Attachment: PIG-2035_1.patch

 Macro expansion doesn't handle multiple expansions of same macro inside 
 another macro
 -

 Key: PIG-2035
 URL: https://issues.apache.org/jira/browse/PIG-2035
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2035_1.patch


 Here is the use case:
 {code}
 define test ( in, out, x ) returns c { 
 a = load '$in' as (name, age, gpa);
 b = group a by gpa;
 $c = foreach b generate group, COUNT(a.$x);
 store $c into '$out';
 };
 define test2( in, out ) returns x { 
 $x = test( '$in', '$out', 'name' );
 $x = test( '$in', '$out.1', 'age' );
 $x = test( '$in', '$out.2', 'gpa' );
 };
 x = test2('studenttab10k', 'myoutput');
 {code}



[jira] [Commented] (PIG-2035) Macro expansion doesn't handle multiple expansions of same macro inside another macro

2011-05-04 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029045#comment-13029045
 ] 

Richard Ding commented on PIG-2035:
---

test-patch result:

{code}
 [exec] -1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] -1 release audit.  The applied patch generated 585 release 
audit warnings (more than the trunk's current 584 warnings).
{code}



 Macro expansion doesn't handle multiple expansions of same macro inside 
 another macro
 -

 Key: PIG-2035
 URL: https://issues.apache.org/jira/browse/PIG-2035
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2035_1.patch


 Here is the use case:
 {code}
 define test ( in, out, x ) returns c { 
 a = load '$in' as (name, age, gpa);
 b = group a by gpa;
 $c = foreach b generate group, COUNT(a.$x);
 store $c into '$out';
 };
 define test2( in, out ) returns x { 
 $x = test( '$in', '$out', 'name' );
 $x = test( '$in', '$out.1', 'age' );
 $x = test( '$in', '$out.2', 'gpa' );
 };
 x = test2('studenttab10k', 'myoutput');
 {code}



[jira] [Resolved] (PIG-2028) Speed up multiquery unit tests

2011-05-03 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2028.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to trunk and 0.9 branch.

 Speed up multiquery unit tests 
 ---

 Key: PIG-2028
 URL: https://issues.apache.org/jira/browse/PIG-2028
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2028.patch, PIG-2028_1.patch


 Switch TestMultiQueryBasic and TestMultiQuery to use LOCAL mode. The results 
 on my laptop:
 Using Mini Cluster:
 TestMultiQueryBasic: 17 min 17 sec
 TestMultiQuery:  23 min 2 sec
 Using LOCAL mode:
 TestMultiQueryBasic: 4 min 17 sec
 TestMultiQuery:  5 min 51 sec



[jira] [Created] (PIG-2028) Speed up multiquery unit tests

2011-05-02 Thread Richard Ding (JIRA)
Speed up multiquery unit tests 
---

 Key: PIG-2028
 URL: https://issues.apache.org/jira/browse/PIG-2028
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0


Switch TestMultiQueryBasic and TestMultiQuery to use LOCAL mode. The results on 
my laptop:

Using Mini Cluster:

TestMultiQueryBasic: 17 min 17 sec
TestMultiQuery:  23 min 2 sec

Using LOCAL mode:

TestMultiQueryBasic: 4 min 17 sec
TestMultiQuery:  5 min 51 sec






[jira] [Updated] (PIG-1998) Allow macro to return void

2011-05-02 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1998:
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed to trunk and 0.9 branch.

 Allow macro to return void
 --

 Key: PIG-1998
 URL: https://issues.apache.org/jira/browse/PIG-1998
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-1998_1.patch, PIG-1998_2.patch, PIG-1998_3.patch


 Pig macro is allowed to not have output alias. But this property isn't clear 
 from macro definition and macro invocation (macro inline). Here we propose to 
 make it clear:
 1. If a macro doesn't output any alias, it must specify void as return value. 
 For example:
 {code}  
 define mymacro(...) returns void {
... ...
 };
 {code}
 2. If a macro doesn't output any alias, it must be invoked without return 
 value. For example, to invoke above macro, just specify:
 {code}
 mymacro(...);
 {code}
 3. Any non-void return alias in the macro definition must exist in the macro 
 body and be prefixed with $. For example:
 {code}  
 define mymacro(...) returns B {
... ...
$B = filter ...;
 };
 {code}



[jira] [Updated] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason

2011-04-29 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1827:
--

Status: Patch Available  (was: Open)

 When passing a parameter to Pig, if the value contains $ it has to be escaped 
 for no apparent reason
 

 Key: PIG-1827
 URL: https://issues.apache.org/jira/browse/PIG-1827
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Julien Le Dem
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-1827-1.patch, PIG-1827_2.patch






[jira] [Commented] (PIG-2012) Comments at the begining of the file throws off line numbers in errors

2011-04-29 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027174#comment-13027174
 ] 

Richard Ding commented on PIG-2012:
---

test-patch result:

{code}
 [exec] -1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] -1 javac.  The applied patch generated 964 javac compiler 
warnings (more than the trunk's current 963 warnings).
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
{code}

 Comments at the begining of the file throws off line numbers in errors
 --

 Key: PIG-2012
 URL: https://issues.apache.org/jira/browse/PIG-2012
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Alan Gates
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2012_1.patch, macro.pig


 The preprocessor does not appear to be handling leading comments properly 
 when calculating line numbers for error messages.  In the attached script, 
 the error is reported to be on line 7.  It is actually on line 10.



[jira] [Updated] (PIG-2012) Comments at the begining of the file throws off line numbers in errors

2011-04-29 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2012:
--

Status: Patch Available  (was: Open)

 Comments at the begining of the file throws off line numbers in errors
 --

 Key: PIG-2012
 URL: https://issues.apache.org/jira/browse/PIG-2012
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Alan Gates
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2012_1.patch, macro.pig


 The preprocessor does not appear to be handling leading comments properly 
 when calculating line numbers for error messages.  In the attached script, 
 the error is reported to be on line 7.  It is actually on line 10.



[jira] [Updated] (PIG-1998) Allow macro to return void

2011-04-29 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1998:
--

Status: Patch Available  (was: Open)

 Allow macro to return void
 --

 Key: PIG-1998
 URL: https://issues.apache.org/jira/browse/PIG-1998
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-1998_1.patch, PIG-1998_2.patch, PIG-1998_3.patch


 Pig macro is allowed to not have output alias. But this property isn't clear 
 from macro definition and macro invocation (macro inline). Here we propose to 
 make it clear:
 1. If a macro doesn't output any alias, it must specify void as return value. 
 For example:
 {code}  
 define mymacro(...) returns void {
... ...
 };
 {code}
 2. If a macro doesn't output any alias, it must be invoked without return 
 value. For example, to invoke above macro, just specify:
 {code}
 mymacro(...);
 {code}
 3. Any non-void return alias in the macro definition must exist in the macro 
 body and be prefixed with $. For example:
 {code}  
 define mymacro(...) returns B {
... ...
$B = filter ...;
 };
 {code}



[jira] [Updated] (PIG-1998) Allow macro to return void

2011-04-29 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1998:
--

Attachment: PIG-1998_3.patch

The purpose of this validation is to give the user an early warning when an alias 
in the returns clause doesn't appear in the macro body as $alias. It performs a 
simple parse using StreamTokenizer.
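
A check of this kind might look roughly like the Java sketch below. It is an assumption-laden illustration, not the code in the patch: the class and method names are hypothetical, and the tokenizer settings are one plausible choice. `$` and `_` are made word characters so that `$B` arrives as a single word token, and quoted strings are deliberately skipped, so a `'$B'` inside a path does not count as a definition of the alias.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class ReturnAliasCheck {

    // Returns true if the macro body contains the token "$<alias>" outside
    // of any quoted string.
    static boolean bodyMentionsAlias(String macroBody, String alias) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(macroBody));
        st.wordChars('$', '$');   // let "$B" be read as one word token
        st.wordChars('_', '_');   // allow underscores inside alias names
        st.ordinaryChar('/');     // '/' is a comment character by default
        String wanted = "$" + alias;
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD && wanted.equals(st.sval)) {
                    return true;
                }
            }
        } catch (IOException e) {
            // Not expected for an in-memory StringReader.
            throw new UncheckedIOException(e);
        }
        return false;
    }
}
```

For a macro declared as `returns B`, a caller would warn when `bodyMentionsAlias(body, "B")` is false. A token scan like this is cheap and runs before full parsing, which fits the stated goal of an early warning rather than a complete validation.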

 Allow macro to return void
 --

 Key: PIG-1998
 URL: https://issues.apache.org/jira/browse/PIG-1998
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-1998_1.patch, PIG-1998_2.patch, PIG-1998_3.patch


 Pig macro is allowed to not have output alias. But this property isn't clear 
 from macro definition and macro invocation (macro inline). Here we propose to 
 make it clear:
 1. If a macro doesn't output any alias, it must specify void as return value. 
 For example:
 {code}  
 define mymacro(...) returns void {
... ...
 };
 {code}
 2. If a macro doesn't output any alias, it must be invoked without return 
 value. For example, to invoke above macro, just specify:
 {code}
 mymacro(...);
 {code}
 3. Any non-void return alias in the macro definition must exist in the macro 
 body and be prefixed with $. For example:
 {code}  
 define mymacro(...) returns B {
... ...
$B = filter ...;
 };
 {code}



[jira] [Updated] (PIG-2012) Comments at the begining of the file throws off line numbers in errors

2011-04-28 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-2012:
--

Attachment: PIG-2012_1.patch

 Comments at the begining of the file throws off line numbers in errors
 --

 Key: PIG-2012
 URL: https://issues.apache.org/jira/browse/PIG-2012
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Alan Gates
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2012_1.patch, macro.pig


 The preprocessor does not appear to be handling leading comments properly 
 when calculating line numbers for error messages.  In the attached script, 
 the error is reported to be on line 7.  It is actually on line 10.



[jira] [Commented] (PIG-1998) Allow macro to return void

2011-04-27 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025898#comment-13025898
 ] 

Richard Ding commented on PIG-1998:
---

Patch 2 committed to both trunk and 0.9 branch. I'll add new patches to address 
additional review comments.

 Allow macro to return void
 --

 Key: PIG-1998
 URL: https://issues.apache.org/jira/browse/PIG-1998
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-1998_1.patch, PIG-1998_2.patch


 Pig macro is allowed to not have output alias. But this property isn't clear 
 from macro definition and macro invocation (macro inline). Here we propose to 
 make it clear:
 1. If a macro doesn't output any alias, it must specify void as return value. 
 For example:
 {code}  
 define mymacro(...) returns void {
... ...
 };
 {code}
 2. If a macro doesn't output any alias, it must be invoked without return 
 value. For example, to invoke above macro, just specify:
 {code}
 mymacro(...);
 {code}
 3. Any non-void return alias in the macro definition must exist in the macro 
 body and be prefixed with $. For example:
 {code}  
 define mymacro(...) returns B {
... ...
$B = filter ...;
 };
 {code}



[jira] [Updated] (PIG-1998) Allow macro to return void

2011-04-26 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1998:
--

Attachment: PIG-1998_2.patch

Attaching a new patch that addresses Xuefu's review comments.

 Allow macro to return void
 --

 Key: PIG-1998
 URL: https://issues.apache.org/jira/browse/PIG-1998
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-1998_1.patch, PIG-1998_2.patch


 Pig macro is allowed to not have output alias. But this property isn't clear 
 from macro definition and macro invocation (macro inline). Here we propose to 
 make it clear:
 1. If a macro doesn't output any alias, it must specify void as return value. 
 For example:
 {code}  
 define mymacro(...) returns void {
... ...
 };
 {code}
 2. If a macro doesn't output any alias, it must be invoked without return 
 value. For example, to invoke above macro, just specify:
 {code}
 mymacro(...);
 {code}
 3. Any non-void return alias in the macro definition must exist in the macro 
 body and be prefixed with $. For example:
 {code}  
 define mymacro(...) returns B {
... ...
$B = filter ...;
 };
 {code}



[jira] [Commented] (PIG-1827) When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason

2011-04-25 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024833#comment-13024833
 ] 

Richard Ding commented on PIG-1827:
---

Unit tests pass.

 When passing a parameter to Pig, if the value contains $ it has to be escaped 
 for no apparent reason
 

 Key: PIG-1827
 URL: https://issues.apache.org/jira/browse/PIG-1827
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Julien Le Dem
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-1827-1.patch






[jira] [Commented] (PIG-1865) BinStorage/PigStorageSchema cannot load data from a different namenode

2011-04-25 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024927#comment-13024927
 ] 

Richard Ding commented on PIG-1865:
---

+1

 BinStorage/PigStorageSchema cannot load data from a different namenode
 --

 Key: PIG-1865
 URL: https://issues.apache.org/jira/browse/PIG-1865
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0, 0.9.0
Reporter: Vivek Padmanabhan
Assignee: Daniel Dai
 Fix For: 0.9.0

 Attachments: PIG-1865-1.patch


 BinStorage/PigStorageSchema cannot load data from a different namenode. The 
 main reason is that, in the getSchema method, they use 
 org.apache.pig.impl.io.FileLocalizer to check whether the file exists, but the 
 filesystem in HDataStorage refers to the natively configured dfs.
 The test case is simple:
 a = load 'hdfs://nn2/input' using BinStorage();
 dump a;
 Here, specifying -Dmapreduce.job.hdfs-servers should have worked, but 
 Pig still takes the fs from fs.default.name, so to make it work I had to 
 override fs.default.name on the Pig command line.
 Raising this as a bug since the same scenario works with PigStorage.
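
The resolution rule the report asks for can be sketched in plain Java. This is only an illustration under stated assumptions, not the code in PIG-1865-1.patch: the class FsResolutionSketch, the method fileSystemFor, and the default value hdfs://nn1 are all hypothetical names chosen for the example. The idea is that a fully qualified location should select its own namenode, and only unqualified paths should fall back to the configured default.

```java
import java.net.URI;

// Hypothetical sketch: decide which filesystem a load location refers to.
// This is not Pig or Hadoop code; names and values are assumptions.
class FsResolutionSketch {

    // Returns "scheme://authority" when the location is fully qualified,
    // otherwise the configured default filesystem.
    static String fileSystemFor(String location, String defaultFs) {
        URI uri = URI.create(location);
        if (uri.getScheme() != null && uri.getAuthority() != null) {
            return uri.getScheme() + "://" + uri.getAuthority();
        }
        return defaultFs;
    }

    public static void main(String[] args) {
        String defaultFs = "hdfs://nn1"; // stands in for fs.default.name
        // Fully qualified location: the namenode named in the URI is used.
        System.out.println(fileSystemFor("hdfs://nn2/input", defaultFs));
        // Unqualified path: falls back to the default filesystem.
        System.out.println(fileSystemFor("/input", defaultFs));
    }
}
```

In Hadoop itself, a filesystem can be obtained from the location rather than from the default configuration, for example via Path.getFileSystem(Configuration); whether the attached patch takes that route is not stated in this message.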



[jira] [Commented] (PIG-1976) One more TwoLevelAccess to remove

2011-04-25 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024928#comment-13024928
 ] 

Richard Ding commented on PIG-1976:
---

+1

 One more TwoLevelAccess to remove
 -

 Key: PIG-1976
 URL: https://issues.apache.org/jira/browse/PIG-1976
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.9.0

 Attachments: PIG-1976-1.patch


 We removed two-level access in PIG-847. However, there is another occurrence 
 that we missed in ResourceSchema.java:
 {code}
 if (type == DataType.BAG && fieldSchema.schema != null
         && !fieldSchema.schema.isTwoLevelAccessRequired()) { 
     log.info("Insert two-level access to Resource Schema");
     FieldSchema fs = new FieldSchema("t", fieldSchema.schema);
     inner = new Schema(fs);
 }
 {code}
 Although schema.isTwoLevelAccessRequired is false by default, we should not use 
 this flag in Pig. A user could set this flag in a legacy UDF.
 Thanks to Woody for uncovering this.



[jira] [Updated] (PIG-1999) Macro alias masker should consider schema context

2011-04-25 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1999:
--

Attachment: PIG-1999_1.patch

 Macro alias masker should consider schema context 
 --

 Key: PIG-1999
 URL: https://issues.apache.org/jira/browse/PIG-1999
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-1999_1.patch


 The macro alias masker doesn't consider the current schema context. This results 
 in errors when deciding which alias to mask. Here is an example:
 {code}
 define toBytearray(in, intermediate) returns e { 
a = load '$in' as (name:chararray, age:long, gpa: float);
b = group a by  name;
c = foreach b generate a, (1,2,3);
store c into '$intermediate' using BinStorage();
d = load '$intermediate' using BinStorage() as (b:bag{t:tuple(x,y,z)}, 
 t2:tuple(a,b,c));
$e = foreach d generate COUNT(b), t2.a, t2.b, t2.c;
 };
  
 f = toBytearray ('data', 'output1');
 {code} 
 Now the alias masker mistakes the b in COUNT(b) for an alias rather than the 
 field b in the current schema.
 The workaround is to avoid using alias names as field names in the schema definition.



[jira] [Commented] (PIG-2005) Discrepancy in the way dry run handles semicolon in macro definition

2011-04-22 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023253#comment-13023253
 ] 

Richard Ding commented on PIG-2005:
---

Patch-test result:

{code}
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
{code}

Unit tests pass.

 Discrepancy in the way dry run handles semicolon in macro definition
 

 Key: PIG-2005
 URL: https://issues.apache.org/jira/browse/PIG-2005
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2005_1.patch


 A macro definition requires a semicolon to mark its end. For example:
 {code}
 define mymacro(x) returns y {... ...};
 {code}
 But when invoked through the command line, macro definitions without a semicolon 
 also work, except in the case of dryrun. This discrepancy is due to 
 GruntParser automatically appending a semicolon to Pig statements if a semicolon is 
 absent at the end. Dryrun GruntParser should do the same.   
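
The normalization described above can be illustrated with a small Java sketch. It is a simplified assumption of the behavior, not the actual GruntParser code or the contents of PIG-2005_1.patch; the class SemicolonSketch and the method ensureTrailingSemicolon are hypothetical names. Applying the same step on both the normal and the dryrun path would remove the discrepancy.

```java
// Hypothetical sketch: append a terminating semicolon only when it is missing.
// Not Pig's GruntParser; names are assumptions made for illustration.
class SemicolonSketch {

    static String ensureTrailingSemicolon(String statement) {
        String trimmed = statement.trim();
        // Leave empty input and already terminated statements unchanged.
        if (trimmed.isEmpty() || trimmed.endsWith(";")) {
            return trimmed;
        }
        return trimmed + ";";
    }

    public static void main(String[] args) {
        // Definition written without the closing semicolon.
        System.out.println(ensureTrailingSemicolon("define mymacro(x) returns y {... ...}"));
        // Definition that is already terminated stays as it is.
        System.out.println(ensureTrailingSemicolon("define mymacro(x) returns y {... ...};"));
    }
}
```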


