[jira] [Updated] (PIG-2837) AvroStorage throws StackOverFlowError

2012-07-26 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2837:
---

Attachment: PIG-2837.patch

> AvroStorage throws StackOverFlowError
> -
>
> Key: PIG-2837
> URL: https://issues.apache.org/jira/browse/PIG-2837
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.10.0
>Reporter: Mubarak Seyed
>Assignee: Cheolsoo Park
> Attachments: PIG-2837.patch, avro_test_files.tar.gz
>
>
> When I try to dump Avro data using
> {code}
> records = LOAD '/logs/records/07262012/01/1/Record.1343265732700.avro' using 
> org.apache.pig.piggybank.storage.avro.AvroStorage(); 
> dump records;
> {code}
> {code}
> Pig Stack Trace 
> --- 
> ERROR 2998: Unhandled internal error. null
> java.lang.StackOverflowError 
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:258)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:262)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:262)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:271)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:284)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:262)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:271)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:284)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:262)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:271)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:284)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:262)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:271)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:284)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:262)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:271)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:284)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:262)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:271)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:284)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:262)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:271)
> at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:284)
> {code}
> I verified the Avro schema using avro-tools and dumped the data as JSON; 
> the data looks good.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2837) AvroStorage throws StackOverFlowError

2012-07-26 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2837:
---

Attachment: (was: PIG-2837.patch)





[jira] [Updated] (PIG-2837) AvroStorage throws StackOverFlowError

2012-07-26 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2837:
---

Status: Patch Available  (was: Open)

It seems that AvroStorage supports neither recursive records nor generic unions:

{quote}
1. Limited support for "record": we do not support recursively defined record 
because the number of fields in such records is data dependent.
2. Limited support for "union": we only accept nullable union like ["null", 
"some-type"].
{quote}
https://cwiki.apache.org/PIG/avrostorage.html

AvroStorage checks the above limitations and throws exceptions when they are 
violated; however, since #2 is checked before #1, we end up with a stack 
overflow if the schema is recursive. This can be avoided by changing the order 
of the checks so that AvroStorage fails fast if the schema is recursive.
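
The ordering problem can be sketched roughly as follows. This is a minimal 
illustration in Python over Avro schemas in their JSON form, not the code in 
AvroStorageUtils; the names is_recursive, contains_generic_union, and validate 
are hypothetical. It assumes that in a resolved schema object graph a recursive 
record points back at itself, so an unguarded traversal would never terminate, 
whereas a check that tracks the enclosing record names can reject the schema 
first.

```python
def is_recursive(schema, open_names=frozenset()):
    """Return True if a record refers to itself or to an enclosing record."""
    if isinstance(schema, str):
        # A bare string is a primitive or a reference to a named type.
        return schema in open_names
    if isinstance(schema, list):  # union: check every branch
        return any(is_recursive(s, open_names) for s in schema)
    if isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            inner = open_names | {schema["name"]}
            return any(is_recursive(f["type"], inner) for f in schema["fields"])
        if kind == "array":
            return is_recursive(schema["items"], open_names)
        if kind == "map":
            return is_recursive(schema["values"], open_names)
        return is_recursive(kind, open_names)
    return False


def contains_generic_union(schema):
    """Return True if any union is not of the nullable ["null", T] form."""
    if isinstance(schema, list):
        nullable = len(schema) == 2 and "null" in schema
        return (not nullable) or any(contains_generic_union(s) for s in schema)
    if isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            return any(contains_generic_union(f["type"]) for f in schema["fields"])
        if kind == "array":
            return contains_generic_union(schema["items"])
        if kind == "map":
            return contains_generic_union(schema["values"])
        return contains_generic_union(kind)
    return False


def validate(schema):
    """Fail fast: run the recursion check before the union check."""
    if is_recursive(schema):
        raise ValueError("recursive records are not supported")
    if contains_generic_union(schema):
        raise ValueError("only nullable unions are supported")


# Hypothetical linked-list schema: the "next" field refers back to "Node".
linked_list = {
    "type": "record",
    "name": "Node",
    "fields": [
        {"name": "value", "type": "int"},
        {"name": "next", "type": ["null", "Node"]},
    ],
}
```

With this ordering, validate(linked_list) raises a ValueError about recursive 
records instead of descending into the schema a second time.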

I uploaded a patch that changes the order of the checks and adds two test cases 
to TestAvroStorage to verify that the proper exception is thrown in each case. 
My tests can be run with the following commands:
{code}
tar -xf avro_test_files.tar.gz
ant clean compile-test piggybank -Dhadoopversion=20
cd contrib/piggybank/java
ant test -Dtestcase=TestAvroStorage
{code}


[jira] [Assigned] (PIG-2837) AvroStorage throws StackOverFlowError

2012-07-26 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park reassigned PIG-2837:
--

Assignee: Cheolsoo Park





[jira] [Updated] (PIG-2837) AvroStorage throws StackOverFlowError

2012-07-26 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2837:
---

Attachment: avro_test_files.tar.gz
PIG-2837.patch





[jira] [Commented] (PIG-2829) Use partial aggregation more aggresively

2012-07-26 Thread Jie Li (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423613#comment-13423613
 ] 

Jie Li commented on PIG-2829:
-

bq. Can you try these settings on queries where there are around 10+ group+agg 
that get combined into single MR job ?

Sure, the TPC-H Q1 I ran before has 8 aggregations. I'll double the number of 
aggregations and also change the group-by key so that every hash map fills up, 
so we can identify whether there is any memory issue.

bq. Can you do some benchmarks to see if there is any noticeable difference in 
runtime because of the delay in turning mapPartAgg off ? 

Yeah will compare these two settings.

> Use partial aggregation more aggresively
> 
>
> Key: PIG-2829
> URL: https://issues.apache.org/jira/browse/PIG-2829
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.10.0
>Reporter: Jie Li
> Attachments: 2829.1.patch, 2829.separate.options.patch, 
> pigmix-10G.png, tpch-10G.png
>
>
> Partial aggregation (Hash Aggregation, aka in-map combiner) is a new feature 
> in Pig 0.10 that performs aggregation within the map function. Its main 
> advantage over the combiner is that it avoids de/serializing and sorting the 
> data, and it can automatically disable itself if the data reduction rate is 
> low. Currently it is disabled by default.
> To leverage the power of PartialAgg more aggressively, several things need to 
> be revisited:
> 1. The threshold for auto-disabling. Currently each mapper looks at the first 
> 1k (hard-coded) records to see if there is enough data size reduction 
> (defaults to 10x, configurable). The check happens earlier if the hash table 
> fills up before 1k records have been processed (hash table size is controlled 
> by pig.cachedbag.memusage). We might want to relax these thresholds.
> 2. Dependency on the combiner. Currently PartialAgg won't work without a 
> combiner following it, so we need to provide separate options to enable each 
> independently.
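
The auto-disabling behaviour described in item 1 can be sketched as below. This 
is a simplified illustration in Python, not Pig's POPartialAgg implementation: 
the function name partial_aggregate and its parameters check_after and 
min_reduction are hypothetical, the aggregate is a plain sum, the reduction is 
measured in record counts rather than data size, and the "hash table is full" 
trigger is left out.

```python
def partial_aggregate(records, check_after=1000, min_reduction=10):
    """Sum values per key inside a mapper, yielding (key, value) pairs.

    After `check_after` input records, the number of records seen is compared
    with the number of distinct keys held in the table. If the reduction is
    below `min_reduction`, the table is flushed and every later record is
    passed through unaggregated.
    """
    table = {}
    seen = 0
    enabled = True
    for key, value in records:
        if not enabled:
            # Aggregation was disabled: emit the record as-is.
            yield key, value
            continue
        table[key] = table.get(key, 0) + value
        seen += 1
        if seen == check_after and seen < min_reduction * len(table):
            # Too little reduction: flush what we have and stop aggregating.
            enabled = False
            yield from table.items()
            table.clear()
    # Emit whatever is still held in the table at the end of the input.
    yield from table.items()
```

For example, list(partial_aggregate([("a", 1), ("b", 1)] * 1000)) collapses 
2000 records into two pairs, whereas an input of all-distinct keys is flushed 
at the check and then passed through unchanged.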





Total count of RandomSampleLoader is unpredictable

2012-07-26 Thread Prasanth J
Hello everyone

I am using RandomSampleLoader to load 1000 tuples per mapper. I have 11 map 
jobs for a small dataset and 109 map jobs for a large dataset. 

I expect 11000 tuples from the small dataset and 109000 tuples from the large 
dataset, but the actual number of tuples I get is always more than I expected. 
In the small dataset case I get 15000 tuples, whereas in the large dataset 
case I get 145000 (sometimes 15) tuples. 

Is this a bug, or is it expected behavior? If reservoir sampling is used by 
all mappers, why is the total number of samples higher?

Thanks
-- Prasanth
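
For reference, the expectation in the question (at most 1000 tuples from each 
mapper) matches textbook reservoir sampling, sketched below in Python. This is 
the generic Algorithm R, not RandomSampleLoader's code; the function name 
reservoir_sample and its parameters are hypothetical, and the sketch says 
nothing about how many sampling tasks Pig actually runs per job.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of at most k items from an iterable."""
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample
```

Each call returns exactly k items whenever its input holds at least k, so 11 
independent samplers with k = 1000 would yield 11000 items in total. A higher 
total would suggest that more sampling tasks ran than the number assumed.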



[jira] [Commented] (PIG-2829) Use partial aggregation more aggresively

2012-07-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423602#comment-13423602
 ] 

Thejas M Nair commented on PIG-2829:


I will review the patch soon. Some comments regarding the default 
configuration:

bq. 2: changes existing default values: 
After thinking about the multi-query use case, where you can have multiple 
POPartialAgg operators in a map task, I am having second thoughts about turning 
partial agg on by default. Can you try these settings on queries where around 
10+ group+agg operations get combined into a single MR job? Maybe we should 
address the potential OOM issues for this use case before we change the 
defaults. This is likely to become a bigger issue when we use 100k records to 
decide whether to turn partial aggregation on or off.

bq. 3: adds a property pig.exec.mapPartAgg.reduction.checkinterval which 
defaults to 100k, so after processing every 100k records mapagg will check the 
reduction rate to see if it should be disabled. Previously we only look at 
first 1000 records.
Can you do some benchmarks to see if there is any noticeable difference in 
runtime because of the delay in turning mapPartAgg off? 





Build failed in Jenkins: Pig-trunk #1283

2012-07-26 Thread Apache Jenkins Server
See 

Changes:

[julien] PIG-2740: get rid of "java[77427:1a03] Unable to load realm info from 
SCDynamicStore" log lines when running pig tests (julien)

[julien] PIG-2817: Documentation for Groovy UDFs (herberts via julien)

[julien] PIG-2839: mock.Storage overwrites output with the last relation 
written when storing UNION (julien)

[jcoveney] PIG-2840: Fix SchemaTuple bugs (jcoveney)

[julien] PIG-2842: TestNewPlanOperatorPlan fails when new Configuration() picks 
up a previous minicluster conf file (julien)

--
[...truncated 38015 lines...]
[junit] at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:788)
[junit] at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:566)
[junit] at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:550)
[junit] at org.apache.pig.test.MiniGenericCluster.shutdownMiniDfsClusters(MiniGenericCluster.java:87)
[junit] at org.apache.pig.test.MiniGenericCluster.shutdownMiniDfsAndMrClusters(MiniGenericCluster.java:77)
[junit] at org.apache.pig.test.MiniGenericCluster.shutDown(MiniGenericCluster.java:68)
[junit] at org.apache.pig.test.TestStore.oneTimeTearDown(TestStore.java:129)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[junit] at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
[junit] at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
[junit] at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
[junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
[junit] at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
[junit] 12/07/27 00:59:46 WARN datanode.FSDatasetAsyncDiskService: AsyncDiskService has already shut down.
[junit] 12/07/27 00:59:46 INFO mortbay.log: Stopped SelectChannelConnector@localhost:0
[junit] 12/07/27 00:59:46 INFO mapred.TaskTracker: Received 'KillJobAction' for job: job_20120727005135297_0012
[junit] 12/07/27 00:59:46 WARN mapred.TaskTracker: Unknown job job_20120727005135297_0012 being deleted.
[junit] 12/07/27 00:59:46 INFO ipc.Server: Stopping server on 48890
[junit] 12/07/27 00:59:46 INFO ipc.Server: IPC Server handler 2 on 48890: exiting
[junit] 12/07/27 00:59:46 INFO ipc.Server: IPC Server handler 0 on 48890: exiting
[junit] 12/07/27 00:59:46 INFO metrics.RpcInstrumentation: shut down
[junit] 12/07/27 00:59:46 INFO ipc.Server: Stopping IPC Server listener on 48890
[junit] 12/07/27 00:59:46 INFO ipc.Server: IPC Server handler 1 on 48890: exiting
[junit] 12/07/27 00:59:46 WARN datanode.DataNode: DatanodeRegistration(127.0.0.1:39613, storageID=DS-1996163325-67.195.138.20-39613-1343350294883, infoPort=41713, ipcPort=48890):DataXceiveServer:java.nio.channels.AsynchronousCloseException
[junit] at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
[junit] at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:159)
[junit] at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
[junit] at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:131)
[junit] at java.lang.Thread.run(Thread.java:662)
[junit] 
[junit] 12/07/27 00:59:46 INFO datanode.DataNode: Exiting DataXceiveServer
[junit] 12/07/27 00:59:46 INFO datanode.DataNode: Waiting for threadgroup to exit, active threads is 1
[junit] 12/07/27 00:59:46 INFO ipc.Server: Stopping IPC Server Responder
[junit] 12/07/27 00:59:46 INFO datanode.DataBlockScanner: Exiting DataBlockScanner thread.
[junit] 12/07/27 00:59:46 INFO datanode.DataNode: DatanodeRegistration(127.0.0.1:39613, storageID=DS-1996163325-67.195.138.20-39613-1343350294883, infoPort=41713, ipcPort=48890):Finishing DataNode in: FSDataset{dirpath='

[jira] [Updated] (PIG-2845) Configure hadoop.tmp.dir under build/tmp for MiniCluster tests

2012-07-26 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2845:
---

Attachment: PIG-2845_0.patch

> Configure hadoop.tmp.dir under build/tmp for MiniCluster tests
> --
>
> Key: PIG-2845
> URL: https://issues.apache.org/jira/browse/PIG-2845
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2845_0.patch
>
>






[jira] [Updated] (PIG-2845) Configure hadoop.tmp.dir under build/tmp for MiniCluster tests

2012-07-26 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2845:
---

Patch Info: Patch Available

> Configure hadoop.tmp.dir under build/tmp for MiniCluster tests
> --
>
> Key: PIG-2845
> URL: https://issues.apache.org/jira/browse/PIG-2845
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2845_0.patch
>
>






[jira] [Created] (PIG-2845) Configure hadoop.tmp.dir under build/tmp for MiniCluster tests

2012-07-26 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-2845:
--

 Summary: Configure hadoop.tmp.dir under build/tmp for MiniCluster 
tests
 Key: PIG-2845
 URL: https://issues.apache.org/jira/browse/PIG-2845
 Project: Pig
  Issue Type: Bug
Reporter: Julien Le Dem
Assignee: Julien Le Dem








[jira] [Updated] (PIG-2844) ant makepom is misconfigured

2012-07-26 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2844:
---

Attachment: PIG-2844_0.patch

> ant makepom is misconfigured
> 
>
> Key: PIG-2844
> URL: https://issues.apache.org/jira/browse/PIG-2844
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2844_0.patch
>
>
> Currently we manually maintain a pom. We should use the ant makepom target 
> for this





[jira] [Updated] (PIG-2844) ant makepom is misconfigured

2012-07-26 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2844:
---

Patch Info: Patch Available

> ant makepom is misconfigured
> 
>
> Key: PIG-2844
> URL: https://issues.apache.org/jira/browse/PIG-2844
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2844_0.patch
>
>
> Currently we manually maintain a pom. We should use the ant makepom target 
> for this





[jira] [Created] (PIG-2844) ant makepom is misconfigured

2012-07-26 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-2844:
--

 Summary: ant makepom is misconfigured
 Key: PIG-2844
 URL: https://issues.apache.org/jira/browse/PIG-2844
 Project: Pig
  Issue Type: Bug
Reporter: Julien Le Dem
Assignee: Julien Le Dem


Currently we manually maintain a pom. We should use the ant makepom target for 
this





[jira] [Resolved] (PIG-2740) get rid of "java[77427:1a03] Unable to load realm info from SCDynamicStore" log lines when running pig tests

2012-07-26 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-2740.


   Resolution: Fixed
Fix Version/s: 0.11

> get rid of "java[77427:1a03] Unable to load realm info from SCDynamicStore" 
> log lines when running pig tests
> 
>
> Key: PIG-2740
> URL: https://issues.apache.org/jira/browse/PIG-2740
> Project: Pig
>  Issue Type: Test
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 0.11
>
> Attachments: PIG-2740.patch
>
>
> see https://issues.apache.org/jira/browse/HADOOP-7489





[jira] [Resolved] (PIG-2817) Documentation for Groovy UDFs

2012-07-26 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-2817.


   Resolution: Fixed
Fix Version/s: 0.11

committed to TRUNK
Thank you Mathias!

> Documentation for Groovy UDFs
> -
>
> Key: PIG-2817
> URL: https://issues.apache.org/jira/browse/PIG-2817
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.11
>Reporter: Julien Le Dem
>Assignee: Mathias Herberts
> Fix For: 0.11
>
> Attachments: PIG-2817-1.patch, PIG-2817-2.patch
>
>
> Update the documentation in trunk/src/docs/src/documentation/content/xdocs by 
> looking at the Python UDF mentions:
> basic.xml
> cont.xml
> pig-index.xml
> start.xml
> udf.xml





[jira] [Updated] (PIG-2817) Documentation for Groovy UDFs

2012-07-26 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2817:
---

Attachment: PIG-2817-2.patch

PIG-2817-2.patch fixes the markup so that it passes DTD validation

> Documentation for Groovy UDFs
> -
>
> Key: PIG-2817
> URL: https://issues.apache.org/jira/browse/PIG-2817
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.11
>Reporter: Julien Le Dem
>Assignee: Mathias Herberts
> Attachments: PIG-2817-1.patch, PIG-2817-2.patch
>
>
> Update the documentation in trunk/src/docs/src/documentation/content/xdocs by 
> looking at the Python UDF mentions:
> basic.xml
> cont.xml
> pig-index.xml
> start.xml
> udf.xml





[jira] [Updated] (PIG-2829) Use partial aggregation more aggresively

2012-07-26 Thread Jie Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Li updated PIG-2829:


Attachment: 2829.1.patch

Attached a patch with all the rework (mostly learned from Hive):

1: separates options to enable combiner and mapagg

2: changes existing default values: 
||property||old default value||new default value||comment||
|pig.exec.nocombiner|false|true| disable combiner by default|
|pig.exec.mapPartAgg|false|true| enable mapagg by default|
|pig.exec.mapPartAgg.minReduction|10|2.0| more aggressive. also change from int to double|

3: adds a property pig.exec.mapPartAgg.reduction.checkinterval, which defaults 
to 100k, so that after every 100k records mapagg checks the reduction rate to 
decide whether it should be disabled. Previously we only looked at the 
first 1000 records.

4: previously the reduction check would also happen when the hash map became full. 
The patch removes this condition and instead keeps track of the total number of new 
hash map entries, so the reduction check is triggered only by 
pig.exec.mapPartAgg.reduction.checkinterval, which is easier to control.

Comments are welcome! I will work on fixing the unit tests and on performance 
testing.
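
To make points 3 and 4 concrete, here is a minimal Java sketch of an interval-driven reduction check. It is an illustration only and not code from 2829.1.patch: the class name, the fields, and the use of cumulative totals are assumptions; only the two property names come from the comment above.

```java
/**
 * Illustrative sketch only, not code from the attached patch.
 * Models a map-side aggregator that re-checks its reduction rate every
 * checkInterval records and disables itself when the rate is too low.
 */
final class ReductionCheckSketch {
    private final long checkInterval;  // stands in for pig.exec.mapPartAgg.reduction.checkinterval
    private final double minReduction; // stands in for pig.exec.mapPartAgg.minReduction
    private long recordsSeen = 0;      // total input records processed so far
    private long newEntries = 0;       // total new hash map entries created so far
    private boolean enabled = true;

    ReductionCheckSketch(long checkInterval, double minReduction) {
        this.checkInterval = checkInterval;
        this.minReduction = minReduction;
    }

    /** Processes one record; isNewKey tells whether it created a new hash map entry. */
    void process(boolean isNewKey) {
        if (!enabled) {
            return; // aggregation already switched off, nothing left to track
        }
        recordsSeen++;
        if (isNewKey) {
            newEntries++;
        }
        // The check fires only at interval boundaries, never on "hash map full".
        if (recordsSeen % checkInterval == 0) {
            double reduction = newEntries == 0
                    ? Double.POSITIVE_INFINITY
                    : (double) recordsSeen / newEntries;
            if (reduction < minReduction) {
                enabled = false;
            }
        }
    }

    boolean isEnabled() {
        return enabled;
    }

    public static void main(String[] args) {
        ReductionCheckSketch agg = new ReductionCheckSketch(4, 2.0);
        for (int i = 0; i < 4; i++) {
            agg.process(true); // every record is a distinct key, so nothing is reduced
        }
        System.out.println("enabled after 4 distinct keys: " + agg.isEnabled());
    }
}
```

With a single trigger (the record interval), the moment of the check no longer depends on hash map capacity, which is the "easier to control" property mentioned in point 4.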

> Use partial aggregation more aggresively
> 
>
> Key: PIG-2829
> URL: https://issues.apache.org/jira/browse/PIG-2829
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.10.0
>Reporter: Jie Li
> Attachments: 2829.1.patch, 2829.separate.options.patch, 
> pigmix-10G.png, tpch-10G.png
>
>
> Partial aggregation (Hash Aggregation, aka in-map combiner) is a new feature 
> in Pig 0.10 that performs aggregation within the map function. Its main 
> advantage over the combiner is that it avoids de/serializing and sorting the 
> data, and it can automatically disable itself if the data reduction rate is 
> low. Currently it's disabled by default.
> To leverage the power of PartialAgg more aggressively, several things need to 
> be revisited:
> 1. The threshold of auto-disabling. Currently each mapper looks at first 1k 
> (hard-coded) records to see if there's enough data size reduction (defaults 
> to 10x, configurable). The check would happen earlier if the hash table gets 
> full before processing the 1k records (hash table size is controlled by 
> pig.cachedbag.memusage). We might want to relax these thresholds.
> 2. Dependency on the combiner. Currently the PartialAgg won't work without a 
> combiner following it, so we need to provide separate options to enable each 
> independently. 





[jira] [Commented] (PIG-2839) mock.Storage overwrites output with the last relation written when storing UNION

2012-07-26 Thread Aneesh Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423479#comment-13423479
 ] 

Aneesh Sharma commented on PIG-2839:


Thanks, Julien!!

> mock.Storage overwrites output with the last relation written when storing 
> UNION
> 
>
> Key: PIG-2839
> URL: https://issues.apache.org/jira/browse/PIG-2839
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 0.11
>
> Attachments: PIG-2839.patch, PIG-2839_a.patch
>
>






[jira] [Updated] (PIG-2843) Typo in Documentation

2012-07-26 Thread Eric Spishak (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Spishak updated PIG-2843:
--

Attachment: PIG-2843-0.patch

> Typo in Documentation
> -
>
> Key: PIG-2843
> URL: https://issues.apache.org/jira/browse/PIG-2843
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.10.0
>Reporter: Eric Spishak
> Attachments: PIG-2843-0.patch
>
>
> There's a small typo in start.html (missing a space). The attached patch 
> fixes the issue.
> This same typo is in start.pdf as well, but I'm unsure how to update that 
> file. If someone can point me to directions I'll gladly add that to the patch.





[jira] [Created] (PIG-2843) Typo in Documentation

2012-07-26 Thread Eric Spishak (JIRA)
Eric Spishak created PIG-2843:
-

 Summary: Typo in Documentation
 Key: PIG-2843
 URL: https://issues.apache.org/jira/browse/PIG-2843
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.10.0
Reporter: Eric Spishak


There's a small typo in start.html (missing a space). The attached patch fixes 
the issue.

This same typo is in start.pdf as well, but I'm unsure how to update that file. 
If someone can point me to directions I'll gladly add that to the patch.





[jira] [Resolved] (PIG-2839) mock.Storage overwrites output with the last relation written when storing UNION

2012-07-26 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-2839.


   Resolution: Fixed
Fix Version/s: 0.11

> mock.Storage overwrites output with the last relation written when storing 
> UNION
> 
>
> Key: PIG-2839
> URL: https://issues.apache.org/jira/browse/PIG-2839
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 0.11
>
> Attachments: PIG-2839.patch, PIG-2839_a.patch
>
>






[jira] [Resolved] (PIG-2842) TestNewPlanOperatorPlan fails when new Configuration() picks up a previous minicluster conf file

2012-07-26 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-2842.


   Resolution: Fixed
Fix Version/s: 0.11

> TestNewPlanOperatorPlan fails when new Configuration() picks up a previous 
> minicluster conf file
> 
>
> Key: PIG-2842
> URL: https://issues.apache.org/jira/browse/PIG-2842
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 0.11
>
> Attachments: PIG-2842.patch
>
>






[jira] [Commented] (PIG-2840) Fix SchemaTuple bugs

2012-07-26 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423419#comment-13423419
 ] 

Julien Le Dem commented on PIG-2840:


+1 Looks good to me

> Fix SchemaTuple bugs
> 
>
> Key: PIG-2840
> URL: https://issues.apache.org/jira/browse/PIG-2840
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.11
>
> Attachments: PIG-2840-0.patch
>
>
> SchemaTuple had some subtle bugs that are now fixed.
> - hashCode should now be consistent with any normal Tuple
> - comparison had a subtle but nasty bug that is now fixed
> - some minor performance improvements
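
For readers unfamiliar with the hashCode point in the description: a specialized tuple that stores primitives must hash to the same value as a generic tuple holding the equal boxed fields, otherwise the two cannot be mixed as keys. The Java sketch below illustrates that contract with a made-up hashing scheme (seed 17, multiplier 31); the class names and the scheme are assumptions and are not taken from Pig's SchemaTuple or from PIG-2840-0.patch.

```java
/**
 * Illustrative sketch only; names and hashing scheme are hypothetical.
 * Shows a primitive-backed tuple whose hashCode matches a generic,
 * object-backed computation over the same field values.
 */
final class TupleHashSketch {

    /** Hash of a generic tuple whose fields are held as boxed objects. */
    static int genericHash(Object... fields) {
        int h = 17;
        for (Object f : fields) {
            h = 31 * h + (f == null ? 0 : f.hashCode());
        }
        return h;
    }

    /** Specialized tuple with one primitive int field and one String field. */
    static final class IntStringTuple {
        final int id;
        final String name;

        IntStringTuple(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public int hashCode() {
            // Same seed, multiplier, and field order as genericHash, and
            // Integer.hashCode(id) equals the hash of the boxed Integer.
            int h = 17;
            h = 31 * h + Integer.hashCode(id);
            h = 31 * h + (name == null ? 0 : name.hashCode());
            return h;
        }
    }

    public static void main(String[] args) {
        int specialized = new IntStringTuple(7, "a").hashCode();
        int generic = genericHash(7, "a");
        System.out.println("hashes match: " + (specialized == generic));
    }
}
```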





[jira] [Updated] (PIG-2842) TestNewPlanOperatorPlan fails when new Configuration() picks up a previous minicluster conf file

2012-07-26 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2842:
---

Patch Info: Patch Available

> TestNewPlanOperatorPlan fails when new Configuration() picks up a previous 
> minicluster conf file
> 
>
> Key: PIG-2842
> URL: https://issues.apache.org/jira/browse/PIG-2842
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2842.patch
>
>







[jira] [Commented] (PIG-2842) TestNewPlanOperatorPlan fails when new Configuration() picks up a previous minicluster conf file

2012-07-26 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423411#comment-13423411
 ] 

Jonathan Coveney commented on PIG-2842:
---

+1

> TestNewPlanOperatorPlan fails when new Configuration() picks up a previous 
> minicluster conf file
> 
>
> Key: PIG-2842
> URL: https://issues.apache.org/jira/browse/PIG-2842
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2842.patch
>
>






[jira] [Created] (PIG-2842) TestNewPlanOperatorPlan fails when new Configuration() picks up a previous minicluster conf file

2012-07-26 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-2842:
--

 Summary: TestNewPlanOperatorPlan fails when new Configuration() 
picks up a previous minicluster conf file
 Key: PIG-2842
 URL: https://issues.apache.org/jira/browse/PIG-2842
 Project: Pig
  Issue Type: Bug
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Attachments: PIG-2842.patch







[jira] [Updated] (PIG-2842) TestNewPlanOperatorPlan fails when new Configuration() picks up a previous minicluster conf file

2012-07-26 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2842:
---

Attachment: PIG-2842.patch

PIG-2842.patch fixes this

> TestNewPlanOperatorPlan fails when new Configuration() picks up a previous 
> minicluster conf file
> 
>
> Key: PIG-2842
> URL: https://issues.apache.org/jira/browse/PIG-2842
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2842.patch
>
>






[jira] [Commented] (PIG-2839) mock.Storage overwrites output with the last relation written when storing UNION

2012-07-26 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423360#comment-13423360
 ] 

Jonathan Coveney commented on PIG-2839:
---

+1 assuming test-commit passes (since that depends on it)

> mock.Storage overwrites output with the last relation written when storing 
> UNION
> 
>
> Key: PIG-2839
> URL: https://issues.apache.org/jira/browse/PIG-2839
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2839.patch, PIG-2839_a.patch
>
>






[jira] [Created] (PIG-2841) Inconsistent URL in Docs

2012-07-26 Thread Eric Spishak (JIRA)
Eric Spishak created PIG-2841:
-

 Summary: Inconsistent URL in Docs
 Key: PIG-2841
 URL: https://issues.apache.org/jira/browse/PIG-2841
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.10.0
Reporter: Eric Spishak
 Attachments: PIG-2841-0.patch

There are inconsistent links to "cont.html#Parameter-Sub" throughout the 
documentation. Some links write "Parameter-Sub" in all lowercase, while others 
use the casing shown here. 

At least for Chrome, this results in a broken link, where the browser won't 
scroll to the correct section in the page.

The attached patch updates all to use the "Parameter-Sub" casing.
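
The fix amounts to rewriting every variant of the link to one canonical casing. A minimal Java sketch of that normalization follows, assuming the browser compares the fragment with the anchor name case-sensitively, as observed for Chrome above. The helper `AnchorCaseSketch.normalize` is hypothetical and is not part of Pig or its documentation build.

```java
/**
 * Illustrative sketch only; this helper is hypothetical.
 * Rewrites any casing of the Parameter-Sub link to the canonical one.
 */
final class AnchorCaseSketch {
    static final String CANONICAL = "cont.html#Parameter-Sub";

    /** Returns the canonical link when href differs from it only by case. */
    static String normalize(String href) {
        return href.equalsIgnoreCase(CANONICAL) ? CANONICAL : href;
    }

    public static void main(String[] args) {
        // A lowercase link is rewritten; unrelated links pass through untouched.
        System.out.println(normalize("cont.html#parameter-sub"));
        System.out.println(normalize("start.html#install"));
    }
}
```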





[jira] [Updated] (PIG-2841) Inconsistent URL in Docs

2012-07-26 Thread Eric Spishak (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Spishak updated PIG-2841:
--

Attachment: PIG-2841-0.patch

> Inconsistent URL in Docs
> 
>
> Key: PIG-2841
> URL: https://issues.apache.org/jira/browse/PIG-2841
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.10.0
>Reporter: Eric Spishak
> Attachments: PIG-2841-0.patch
>
>
> There are inconsistent links to "cont.html#Parameter-Sub" throughout the 
> documentation. Some links write "Parameter-Sub" in all lowercase, while 
> others use the casing shown here. 
> At least for Chrome, this results in a broken link, where the browser won't 
> scroll to the correct section in the page.
> The attached patch updates all to use the "Parameter-Sub" casing.





[jira] [Commented] (PIG-2699) Reduce the number of instances of Load and Store Funcs down to 2+1. It should be 1 in the front-end and 1 in the backend

2012-07-26 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423356#comment-13423356
 ] 

Julien Le Dem commented on PIG-2699:


Hi Koji,
Yes this is "new Configuration()" picking up the config file generated for 
minicluster in the previous test.
I'll fix the test

> Reduce the number of instances of Load and Store Funcs down to 2+1. It should 
> be 1 in the front-end and 1 in the backend
> 
>
> Key: PIG-2699
> URL: https://issues.apache.org/jira/browse/PIG-2699
> Project: Pig
>  Issue Type: Bug
>  Components: internal-udfs
>Affects Versions: 0.10.0
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 0.11
>
> Attachments: PIG-2699.patch, PIG-2699_a.patch, PIG-2699_b.patch, 
> PIG-2699_c.patch, PIG-2699_d.patch, PIG-2699_e.patch, PIG-2699_f.patch
>
>
> Attached: a patch to get it down to 3.
> Here is the report of the remaining calls.
> Some methods are unnecessarily called multiple times; this should be improved 
> as well.
> {noformat}
> A = LOAD 'foo' USING TestLoadStoreFuncLifeCycle$Loader();
> STORE A INTO 'bar' USING TestLoadStoreFuncLifeCycle$Storer();
> report:
> 3 instances of Loader
> 20 calls to Loader
> 3 instances of Storer
> 24 calls to Storer
> all calls:
> Loader[1].<init>()
> Loader[1].relativeToAbsolutePath(foo, 
> file:/Users/julien/svn/pig/trunk-LoadStoreFunc-lifecycle)
> Loader[1].setUDFContextSignature(A_1-0)
> Loader[1].getSchema(foo, org.apache.hadoop.mapreduce.Job@7ee49dcd)
> Storer[1].<init>()
> Storer[1].setStoreFuncUDFContextSignature(A_1-1)
> Storer[1].relToAbsPathForStoreLocation(bar, 
> file:/Users/julien/svn/pig/trunk-LoadStoreFunc-lifecycle)
> Storer[1].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@776be68f)
> Storer[1].getOutputFormat()
> Loader[1].getStatistics(foo, org.apache.hadoop.mapreduce.Job@11e9c82e)
> Loader[1].setLocation(foo, org.apache.hadoop.mapreduce.Job@11e9c82e)
> Storer[1].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@57d840cd)
> Storer[2].<init>()
> Storer[2].setStoreFuncUDFContextSignature(A_1-1)
> Storer[2].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@76996cca)
> Storer[2].getOutputFormat()
> Loader[2].<init>()
> Loader[2].setUDFContextSignature(A_1-0)
> Loader[2].setLocation(foo, org.apache.hadoop.mapreduce.Job@317cfd38)
> Loader[2].getInputFormat()
> Storer[3].<init>()
> Storer[3].setStoreFuncUDFContextSignature(A_1-1)
> Storer[3].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@459d3b3a)
> Storer[3].getOutputFormat()
> Storer[3].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@225f1ae9)
> Loader[3].<init>()
> Loader[3].setUDFContextSignature(A_1-0)
> Loader[3].setLocation(foo, org.apache.hadoop.mapreduce.Job@6b98e8b4)
> Loader[3].getInputFormat()
> Storer[3].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@5fb11b79)
> Storer[3].getOutputFormat()
> Storer[3].prepareToWrite(org.apache.pig.builtin.mock.Storage$MockRecordWriter@49b09282)
> Loader[3].setUDFContextSignature(A_1-0)
> Loader[3].prepareToRead(org.apache.pig.builtin.mock.Storage$MockRecordReader@2c8c7d6,
>  Number of splits :1...)
> Loader[3].getNext()
> Storer[3].putNext((a))
> Loader[3].getNext()
> Storer[3].putNext((b))
> Loader[3].getNext()
> Storer[3].putNext((c))
> Loader[3].getNext()
> Storer[3].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@3ebfbbe3)
> Storer[3].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@14d964af)
> Storer[1].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job@644ca6b6)
> constructor calls:
> Loader[1].<init> called by 
> org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:565)
> org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:426)
> org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3170)
> org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1293)
> org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:791)
> org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:509)
> org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:384)
> org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:175)
> org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1602)
> org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1549)
> org.apache.pig.PigServer.registerQuery(PigServer.java:534)
> org.apache.pig.PigServer.registerQuery(PigServer.java:547)
> Storer[1].<init> called by 
> org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:565)
> org.apache.pig.parser.LogicalPlanBuilder.buildStoreOp(LogicalPlanBuilder.java:486)
> org.apache.pig.parser.LogicalPlanGenerator.store_clause(LogicalPlanGen

[jira] [Updated] (PIG-2839) mock.Storage overwrites output with the last relation written when storing UNION

2012-07-26 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2839:
---

Attachment: PIG-2839_a.patch

PIG-2839_a.patch fixes the bug

> mock.Storage overwrites output with the last relation written when storing 
> UNION
> 
>
> Key: PIG-2839
> URL: https://issues.apache.org/jira/browse/PIG-2839
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2839.patch, PIG-2839_a.patch
>
>






[jira] [Updated] (PIG-2779) Refactoring the code for setting number of reducers

2012-07-26 Thread Jie Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Li updated PIG-2779:


Attachment: PIG-2779.3.patch

Attached PIG-2779.3.patch, which sets the various parallelism values in the job 
conf for later analysis, as suggested by Bill. It also adds unit tests for them.

> Refactoring the code for setting number of reducers
> ---
>
> Key: PIG-2779
> URL: https://issues.apache.org/jira/browse/PIG-2779
> Project: Pig
>  Issue Type: Bug
>Reporter: Jie Li
>Assignee: Jie Li
> Fix For: 0.11
>
> Attachments: PIG-2779.0.patch, PIG-2779.1.patch, PIG-2779.2.patch, 
> PIG-2779.3.patch, TestNumberOfReducers.java, TestNumberOfReducers.java
>
>
> As PIG-2652 observed, currently the code for setting number of reducers is a 
> little messy. MapReduceOper.requestedParallelism seems to be misused in some 
> places, and now we support runtime estimation of #reducer which further 
> complicates the problem.
> For example, if we specify parallel 1 for the order-by, the estimated 
> #reducer will be used. If we specify parallel 2 while it estimates 4, 
> order-by will fail due to "Illegal partition for Null". If we specify 
> parallel 4 while it estimates 2, then some reducers will have nothing to do. 
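
One plausible reading of the "parallel 2 while it estimates 4" failure is that the order-by partitioning is derived from the estimated reducer count while the job runs with the requested one, so some keys map to a partition index that does not exist. The Java sketch below models that mismatch under this assumption. The class, the integer keys, and the boundary values are hypothetical, and the legality check only mimics the kind of range check that produces an "Illegal partition" error.

```java
/**
 * Illustrative sketch only; not Pig or Hadoop code.
 * A range partitioner built for 4 partitions, checked against a job
 * that actually runs with a different number of reducers.
 */
final class PartitionMismatchSketch {

    /** Maps a key to a partition; boundaries imply boundaries.length + 1 partitions. */
    static int partitionFor(int key, int[] boundaries) {
        int p = 0;
        while (p < boundaries.length && key >= boundaries[p]) {
            p++;
        }
        return p;
    }

    /** A partition index is usable only if a reducer with that index exists. */
    static boolean isLegal(int partition, int numReducers) {
        return partition >= 0 && partition < numReducers;
    }

    public static void main(String[] args) {
        int[] boundaries = {10, 20, 30};      // computed for an estimate of 4 reducers
        int p = partitionFor(35, boundaries); // lands in the last of the 4 partitions
        System.out.println("partition " + p + " legal with 2 reducers: " + isLegal(p, 2));
        System.out.println("partition " + p + " legal with 4 reducers: " + isLegal(p, 4));
    }
}
```

The opposite case in the description (parallel 4 with an estimate of 2) never produces an out-of-range index in this model; the extra reducers simply receive no keys.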





[jira] [Updated] (PIG-2839) mock.Storage overwrites output with the last relation written when storing UNION

2012-07-26 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2839:
---

Attachment: PIG-2839.patch

Attaching a test reproducing the bug

> mock.Storage overwrites output with the last relation written when storing 
> UNION
> 
>
> Key: PIG-2839
> URL: https://issues.apache.org/jira/browse/PIG-2839
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2839.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2840) Fix SchemaTuple bugs

2012-07-26 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-2840:
--

Status: Patch Available  (was: Open)

> Fix SchemaTuple bugs
> 
>
> Key: PIG-2840
> URL: https://issues.apache.org/jira/browse/PIG-2840
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.11
>
> Attachments: PIG-2840-0.patch
>
>
> SchemaTuple had some subtle bugs that are now fixed.
> - hashCode should now be consistent with any normal Tuple
> - comparison had a subtle but nasty bug that is now fixed
> - some minor performance improvements





[jira] [Updated] (PIG-2840) Fix SchemaTuple bugs

2012-07-26 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-2840:
--

Attachment: PIG-2840-0.patch

> Fix SchemaTuple bugs
> 
>
> Key: PIG-2840
> URL: https://issues.apache.org/jira/browse/PIG-2840
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.11
>
> Attachments: PIG-2840-0.patch
>
>
> SchemaTuple had some subtle bugs that are now fixed.
> - hashCode should now be consistent with any normal Tuple
> - comparison had a subtle but nasty bug that is now fixed
> - some minor performance improvements





[jira] [Created] (PIG-2840) Fix SchemaTuple bugs

2012-07-26 Thread Jonathan Coveney (JIRA)
Jonathan Coveney created PIG-2840:
-

 Summary: Fix SchemaTuple bugs
 Key: PIG-2840
 URL: https://issues.apache.org/jira/browse/PIG-2840
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.11


SchemaTuple had some subtle bugs that are now fixed.
- hashCode should now be consistent with any normal Tuple
- comparison had a subtle but nasty bug that is now fixed
- some minor performance improvements





[jira] [Updated] (PIG-2829) Use partial aggregation more aggresively

2012-07-26 Thread Jie Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Li updated PIG-2829:


Attachment: 2829.separate.options.patch

Attached an initial patch that separates options for enabling combiner and 
mapagg. Now both combiner and mapagg will trigger the CombinerOptimization, and 
the combiner plan will be removed if the combiner is not enabled.

> Use partial aggregation more aggresively
> 
>
> Key: PIG-2829
> URL: https://issues.apache.org/jira/browse/PIG-2829
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.10.0
>Reporter: Jie Li
> Attachments: 2829.separate.options.patch, pigmix-10G.png, tpch-10G.png
>
>
> Partial aggregation (Hash Aggregation, aka in-map combiner) is a new feature 
> in Pig 0.10 that performs aggregation within the map function. The main 
> advantage over the combiner is that it avoids de/serializing and sorting the 
> data, and it can automatically disable itself if the data reduction rate is 
> low. Currently it's disabled by default.
> To leverage the power of PartialAgg more aggressively, several things need to 
> be revisited:
> 1. The threshold of auto-disabling. Currently each mapper looks at the first 
> 1k (hard-coded) records to see if there's enough data size reduction 
> (defaults to 10x, configurable). The check happens earlier if the hash table 
> gets full before the 1k records are processed (hash table size is 
> controlled by pig.cachedbag.memusage). We might want to relax these 
> thresholds.
> 2. Dependency on the combiner. Currently PartialAgg won't work without a 
> combiner following it, so we need to provide separate options to enable each 
> independently.
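
The auto-disabling rule described in point 1 can be illustrated with a small sketch. This is not Pig's implementation: the class PartialSum, its parameters checkAfter and minReduction, and the use of a record-count ratio instead of a data-size ratio are all assumptions made for illustration, and the "hash table full" early check is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of in-map partial aggregation that disables itself
// when the observed reduction is too low. Names are illustrative only.
final class PartialSum {
    private final Map<String, Long> table = new HashMap<>();
    private final long checkAfter;      // records to observe before deciding (1k in the description)
    private final double minReduction;  // required input/output ratio (10x in the description)
    private long seen = 0;
    private boolean enabled = true;

    PartialSum(long checkAfter, double minReduction) {
        this.checkAfter = checkAfter;
        this.minReduction = minReduction;
    }

    // Returns true if the record was absorbed into the hash table, false if
    // aggregation is disabled and the caller should emit the record as-is.
    boolean add(String key, long value) {
        if (!enabled) {
            return false;
        }
        table.merge(key, value, Long::sum);
        seen++;
        // Assumption: reduction is measured as records seen per distinct key.
        if (seen == checkAfter && (double) seen / table.size() < minReduction) {
            enabled = false;
        }
        return true;
    }

    boolean isEnabled() {
        return enabled;
    }

    // Copy of the partial sums collected so far, for flushing to the output.
    Map<String, Long> snapshot() {
        return new HashMap<>(table);
    }
}
```

With checkAfter = 4 and minReduction = 2.0, four records with four distinct keys switch aggregation off, while four records sharing one key keep it on. A caller would flush snapshot() once isEnabled() turns false and pass later records through unaggregated.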





[jira] [Created] (PIG-2839) mock.Storage overwrites output with the last relation written when storing UNION

2012-07-26 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-2839:
--

 Summary: mock.Storage overwrites output with the last relation 
written when storing UNION
 Key: PIG-2839
 URL: https://issues.apache.org/jira/browse/PIG-2839
 Project: Pig
  Issue Type: Bug
Reporter: Julien Le Dem
Assignee: Julien Le Dem








[jira] [Commented] (PIG-2729) Macro expansion does not use pig.import.search.path - UnitTest borked

2012-07-26 Thread Johannes Schwenk (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422978#comment-13422978
 ] 

Johannes Schwenk commented on PIG-2729:
---

Thanks Rohini!

Meanwhile I was able to create a review request after a hint on the mailing 
list.

https://reviews.apache.org/r/6150/

> Macro expansion does not use pig.import.search.path - UnitTest borked
> -
>
> Key: PIG-2729
> URL: https://issues.apache.org/jira/browse/PIG-2729
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.2, 0.10.0
> Environment: pig-0.9.2 and pig-0.10.0, hadoop-0.20.2 from Cloudera's 
> distribution cdh3u3 on Kubuntu 12.04 64-bit.
>Reporter: Johannes Schwenk
> Fix For: 0.10.0
>
> Attachments: PIG-2729.patch, PIG-2729.patch, PIG-2729.patch, 
> PIG-2729.patch, test-macros.tar.gz, use-search-path-for-imports.patch
>
>
> In org.apache.pig.test.TestMacroExpansion, the function 
> importUsingSearchPathTest passes the full path /tmp/mytest2.pig to the 
> import statement, so pig.import.search.path is never used. I changed the 
> import to 
> import 'mytest2.pig';
> and ran the unit test again. This time the test failed, as I expected after 
> trying in vain earlier that day to get Pig to honor my 
> pig.import.search.path property. Other properties in the same custom 
> properties file (provided via the -propertyFile command line option), such as 
> udf.import.list, are read without any problem.
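
To make the expected behavior concrete, here is a rough sketch of how a relative import name could be resolved against a search path. It is not the code from the attached patch: ImportResolver is a hypothetical class, and treating the property value as a comma-separated list of directories is an assumption. Note how an absolute path bypasses the search path entirely, which is why the original test never exercised the property.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper, for illustration only.
final class ImportResolver {
    // Returns the first existing file for importName, or empty if none is found.
    // searchPath is assumed to be a comma-separated list of directories.
    static Optional<Path> resolve(String importName, String searchPath) {
        Path direct = Paths.get(importName);
        if (direct.isAbsolute()) {
            // An absolute import never consults the search path.
            return Files.isRegularFile(direct) ? Optional.of(direct) : Optional.empty();
        }
        for (String dir : searchPath.split(",")) {
            String trimmed = dir.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            Path candidate = Paths.get(trimmed).resolve(importName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

A test for the search path would then import 'mytest2.pig' by its bare name and place the file in a directory that is reachable only through the configured search path.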





[jira] [Commented] (PIG-2837) AvroStorage throws StackOverFlowError

2012-07-26 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423014#comment-13423014
 ] 

Harsh J commented on PIG-2837:
--

I can imagine this happening if there's a self-reference in the schema, for 
example a field whose type is an array of the enclosing record. In this case 
the record's type and schema are re-resolved on every visit, and the call 
recurses without end.

I suppose we could keep a set of visited names (with namespace, for 
correctness) and only recurse into a record if we have not already visited it?
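
A minimal sketch of that idea follows. It deliberately does not use the real Avro or AvroStorageUtils classes: SchemaNode and UnionCheck are hypothetical stand-ins, and the check is simplified to "contains any union" rather than whatever containsGenericUnion actually tests.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a schema node; not org.apache.avro.Schema.
final class SchemaNode {
    enum Kind { RECORD, ARRAY, UNION, PRIMITIVE }

    final Kind kind;
    final String fullName;  // namespace-qualified name, only meaningful for RECORD
    final List<SchemaNode> children = new ArrayList<>();

    SchemaNode(Kind kind, String fullName) {
        this.kind = kind;
        this.fullName = fullName;
    }
}

final class UnionCheck {
    // Entry point: starts with an empty set of visited record names.
    static boolean containsUnion(SchemaNode node) {
        return containsUnion(node, new HashSet<String>());
    }

    private static boolean containsUnion(SchemaNode node, Set<String> visited) {
        if (node.kind == SchemaNode.Kind.UNION) {
            return true;
        }
        // A record that was already entered cannot add new information;
        // returning here is what breaks the cycle.
        if (node.kind == SchemaNode.Kind.RECORD && !visited.add(node.fullName)) {
            return false;
        }
        for (SchemaNode child : node.children) {
            if (containsUnion(child, visited)) {
                return true;
            }
        }
        return false;
    }
}
```

Keying the visited set on the namespace-qualified name matters because two records in different namespaces may share a simple name.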

> AvroStorage throws StackOverFlowError
> -
>
> Key: PIG-2837
> URL: https://issues.apache.org/jira/browse/PIG-2837
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.10.0
>Reporter: Mubarak Seyed
>
> When I try to dump Avro data using
> {code}
> records = LOAD '/logs/records/07262012/01/1/Record.1343265732700.avro' using 
> org.apache.pig.piggybank.storage.avro.AvroStorage(); 
> dump records;
> {code}
> {code}
> Pig Stack Trace 
> --- 
> ERROR 2998: Unhandled internal error. null
> java.lang.StackOverflowError 
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:258)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:262)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:262)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:271)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:284)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:262)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:271)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:284)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:262)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:271)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:284)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:262)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:271)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:284)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:262)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:271)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:284)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:262)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:271)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:284)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:262)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:271)
>  
> at 
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.containsGenericUnion(AvroStorageUtils.java:284)
> {code}
> I verified the Avro schema using avro-tools and dumped the data in JSON 
> format; the data looks good.





[jira] [Created] (PIG-2838) Improve performance using sort avoidance in reduce phase

2012-07-26 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created PIG-2838:
---

 Summary: Improve performance using sort avoidance in reduce phase
 Key: PIG-2838
 URL: https://issues.apache.org/jira/browse/PIG-2838
 Project: Pig
  Issue Type: New Feature
Reporter: Rohini Palaniswamy


Pig should take advantage of sort avoidance in core mapreduce once 
MAPREDUCE-4039 is done for hash aggregation.





[jira] [Commented] (PIG-2779) Refactoring the code for setting number of reducers

2012-07-26 Thread Bill Graham (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422915#comment-13422915
 ] 

Bill Graham commented on PIG-2779:
--

I just checked, and {{default_parallel}} doesn't show up in the jobconf when 
set, so we should add it in the same format:

{noformat}
pig.info.reducers.default.parallel
{noformat}

Since {{pig.info.reducers.runtime.parallel}} is what becomes 
{{mapred.reduce.tasks}}, no, we don't need that one.


> Refactoring the code for setting number of reducers
> ---
>
> Key: PIG-2779
> URL: https://issues.apache.org/jira/browse/PIG-2779
> Project: Pig
>  Issue Type: Bug
>Reporter: Jie Li
>Assignee: Jie Li
> Fix For: 0.11
>
> Attachments: PIG-2779.0.patch, PIG-2779.1.patch, PIG-2779.2.patch, 
> TestNumberOfReducers.java, TestNumberOfReducers.java
>
>
> As PIG-2652 observed, currently the code for setting number of reducers is a 
> little messy. MapReduceOper.requestedParallelism seems to be misused in some 
> places, and now we support runtime estimation of #reducer which further 
> complicates the problem.
> For example, if we specify parallel 1 for the order-by, the estimated 
> #reducer will be used. If we specify parallel 2 while it estimates 4, 
> order-by will fail due to "Illegal partition for Null". If we specify 
> parallel 4 while it estimates 2, then some reducers will have nothing to do. 





Max tuples that can be handled by a reducer

2012-07-26 Thread Prasanth J
Hi

I wanted to find the maximum number of tuples a reducer can handle. For that I 
am using the following inside a UDF in a sampling job:

maxTuples = Runtime.getRuntime().maxMemory() / tuple.getInMemorySize();

I am a little skeptical about the maxMemory() usage, as it will be different 
in the sampling job and the actual job. 
Does this always provide a good estimate of the max tuples a reducer can handle?

Thanks
-- Prasanth
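
For reference, the calculation in the question can be written out as a small helper. This is only a sketch of the arithmetic, not a recommendation: TupleBudget and usableFraction are hypothetical names, and the fraction is an assumed safety margin for everything else on the heap. Runtime.getRuntime().maxMemory() reports the limit of the JVM the code runs in, so the number is only meaningful if the sampling job and the actual job run with the same heap settings.

```java
// Hypothetical helper, for illustration only.
final class TupleBudget {
    // Upper bound on the number of tuples of tupleBytes each that fit into
    // usableFraction of a heap of heapBytes.
    static long maxTuples(long heapBytes, long tupleBytes, double usableFraction) {
        if (tupleBytes <= 0) {
            throw new IllegalArgumentException("tupleBytes must be positive");
        }
        if (usableFraction <= 0.0 || usableFraction > 1.0) {
            throw new IllegalArgumentException("usableFraction must be in (0, 1]");
        }
        return (long) (heapBytes * usableFraction) / tupleBytes;
    }

    // Same estimate, using the heap limit of the JVM this code runs in.
    static long maxTuplesForThisJvm(long tupleBytes, double usableFraction) {
        return maxTuples(Runtime.getRuntime().maxMemory(), tupleBytes, usableFraction);
    }
}
```

Passing the reducer's configured heap size to maxTuples explicitly, instead of reading it from the sampling JVM, would sidestep the mismatch between the two jobs.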



Re: Reviewboard gives Error 500

2012-07-26 Thread Johannes Schwenk
On 25.07.2012 20:01, Prasanth J wrote:
> Is your patch generated using git? If so try using "pig-git" trunk instead of 
> pig trunk.

Thanks, that was it. Did not see the second entry for git... My bad, sorry!

Thanks,
Johannes

> Thanks
> -- Prasanth
> 
> On Jul 25, 2012, at 4:53 AM, Johannes Schwenk wrote:
> 
>> The review board seems to have an issue with my requests. I always get
>>
>> --
>> Something broke! (Error 500)
>>
>> It appears something broke when you tried to go to here. This is either
>> a bug in Review Board or a server configuration error. Please report
>> this to your administrator.
>> --
>>
>> I just wanted to post a review request for pig, /tags/release-0.10.0/,
>> and PIG-2729.patch. I get the above error with Firefox and Chromium.
>> Does anybody have a clue why that happens?
>>
>> Thanks,
>>
>> Johannes Schwenk
>>
>> -- 
>> Software Developer (Reporting)
>> 
>>
>> ADITION technologies AG
>> Schwarzwaldstraße 78b
>> 79117 Freiburg
>>
>> http://www.adition.com
>>
>> T +49 / (0)761 / 88147 - 30
>> F +49 / (0)761 / 88147 - 77
>> SUPPORT +49  / (0)1805 - ADITION
>>
>> (Landline rate 14 ct/min; mobile rates at most 42 ct/min)
>>
>> Registered at the Düsseldorf District Court under HRB 54076
>> Executive Board: Andreas Kleiser, Jörg Klekamp, Tihomir Perkovic, Marcus Schlüter
>> Chairman of the Supervisory Board: Daniel Raimer, attorney-at-law
>> VAT ID No.: DE 218 858 434
>>
> 
> 



Johannes Schwenk

-- 
Software Developer (Reporting)


ADITION technologies AG
Schwarzwaldstraße 78b
79117 Freiburg

http://www.adition.com

T +49 / (0)761 / 88147 - 30
F +49 / (0)761 / 88147 - 77
SUPPORT +49  / (0)1805 - ADITION

(Landline rate 14 ct/min; mobile rates at most 42 ct/min)

Registered at the Düsseldorf District Court under HRB 54076
Executive Board: Andreas Kleiser, Jörg Klekamp, Tihomir Perkovic, Marcus Schlüter
Chairman of the Supervisory Board: Daniel Raimer, attorney-at-law
VAT ID No.: DE 218 858 434


