Re: Pig 11.0

2013-01-28 Thread Bill Graham
I've just committed the two documentation JIRAs, and all Pig 0.11 issues are
resolved once again.


On Mon, Jan 28, 2013 at 7:22 PM, Rohini Palaniswamy wrote:

> e2e tests are fine on my end too, except for the HCat tests (I did not have
> HCat set up) and a few transient Errors test failures, which are a known
> issue. Ran with Hadoop 0.23.
>
> Regards,
> Rohini



-- 
*Note that I'm no longer using my Yahoo! email address. Please email me at
billgra...@gmail.com going forward.*


[jira] [Updated] (PIG-3141) Giving CSVExcelStorage an option to handle header rows

2013-01-28 Thread Bill Graham (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Graham updated PIG-3141:
-

Fix Version/s: (was: 0.11)
   0.12

> Giving CSVExcelStorage an option to handle header rows
> --
>
> Key: PIG-3141
> URL: https://issues.apache.org/jira/browse/PIG-3141
> Project: Pig
>  Issue Type: Improvement
>  Components: piggybank
>Affects Versions: 0.11
>Reporter: Jonathan Packer
> Fix For: 0.12
>
> Attachments: csv.patch
>
>
> Adds an argument to CSVExcelStorage to skip the header row when loading. This 
> works properly with multiple small files each with a header being combined 
> into one split, or a large file with a single header being split into 
> multiple splits.
> Also fixes a few bugs with CSVExcelStorage, including PIG-2470 and a bug 
> involving quoted fields at the end of a line not escaping properly.
> Removes the choice of delimiter, since a CSV file ought to only use a comma 
> delimiter, hence the name.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3140) Document PigProgressNotificationListener configs

2013-01-28 Thread Bill Graham (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Graham updated PIG-3140:
-

Release Note:   (was: Committed to trunk and 0.11 branch.)

Committed to trunk and 0.11 branch.

> Document PigProgressNotificationListener configs
> 
>
> Key: PIG-3140
> URL: https://issues.apache.org/jira/browse/PIG-3140
> Project: Pig
>  Issue Type: Bug
>Reporter: Bill Graham
>Assignee: Bill Graham
> Fix For: 0.11
>
> Attachments: PIG-3140_1.patch
>
>
> Add docs to describe what PPNL is and how to configure it.
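As I understand the 0.11 mechanism, a PPNL is plugged in by naming its class in a property (believed to be pig.notification.listener), with an optional string (pig.notification.listener.arg) handed to the listener's constructor; both property names are assumptions to verify against the committed docs. The sketch below models only that loading step. ProgressListener, RecordingListener and ListenerLoaderSketch are hypothetical; Pig's real PigProgressNotificationListener interface has more callbacks and different signatures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical, much-reduced stand-in for a progress notification listener. */
interface ProgressListener {
    void jobStarted(String scriptId, String jobId);
    void progressUpdated(String scriptId, int percent);
}

/** Example listener that records events; the optional constructor arg becomes a prefix. */
class RecordingListener implements ProgressListener {
    final List<String> events = new ArrayList<>();
    private final String prefix;

    public RecordingListener() { this(""); }
    public RecordingListener(String prefix) { this.prefix = prefix; }

    @Override
    public void jobStarted(String scriptId, String jobId) {
        events.add(prefix + scriptId + " started " + jobId);
    }

    @Override
    public void progressUpdated(String scriptId, int percent) {
        events.add(prefix + scriptId + " at " + percent + "%");
    }
}

class ListenerLoaderSketch {
    // Assumed property names; check them against the Pig 0.11 documentation.
    static final String LISTENER_KEY = "pig.notification.listener";
    static final String LISTENER_ARG_KEY = "pig.notification.listener.arg";

    /** Instantiates the configured listener, or returns null when none is configured. */
    static ProgressListener load(Properties props) {
        String className = props.getProperty(LISTENER_KEY);
        if (className == null || className.isEmpty()) {
            return null; // no listener configured
        }
        try {
            Class<? extends ProgressListener> cls =
                    Class.forName(className).asSubclass(ProgressListener.class);
            String arg = props.getProperty(LISTENER_ARG_KEY);
            // No arg: use the no-arg constructor; otherwise pass the arg as a String.
            return (arg == null)
                    ? cls.getDeclaredConstructor().newInstance()
                    : cls.getDeclaredConstructor(String.class).newInstance(arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate listener " + className, e);
        }
    }
}
```

In practice the listener class would be built into a jar on Pig's classpath and the properties set with -D or in pig.properties.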



[jira] [Resolved] (PIG-2756) Documentation for 0.11

2013-01-28 Thread Bill Graham (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Graham resolved PIG-2756.
--

Resolution: Fixed

All known documentation issues for Pig 0.11 have been resolved, closing.

> Documentation for 0.11
> --
>
> Key: PIG-2756
> URL: https://issues.apache.org/jira/browse/PIG-2756
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.11
>Reporter: Bill Graham
>Assignee: Olga Natkovich
> Fix For: 0.11
>
>
> Tracking areas where we need documentation on the pig.apache.org site 
> (Javadocs are typically pretty good). We can open child tasks as needed. 
> Please add to the list if you know of others.
> * Pluggable {{PigProgressNotificationListener}} isn't in the docs
> * Pluggable reducer estimators (see PIG-2574)
> * ILLUSTRATE seems to have dropped off the docs
> * {{HBaseStorage}} (see PIG-2341)



[jira] [Updated] (PIG-3139) Document reducer estimation

2013-01-28 Thread Bill Graham (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Graham updated PIG-3139:
-

Release Note:   (was: Committed to trunk an 0.11 branch.)

Committed to trunk and 0.11 branch.

> Document reducer estimation
> ---
>
> Key: PIG-3139
> URL: https://issues.apache.org/jira/browse/PIG-3139
> Project: Pig
>  Issue Type: Bug
>Reporter: Bill Graham
>Assignee: Bill Graham
> Fix For: 0.11
>
> Attachments: PIG-3139_1.patch
>
>
> Add docs to describe how default reducer estimation algo works and how to 
> override it.
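As I understand Pig 0.11's default input-size-based estimator, it allocates one reducer per pig.exec.reducers.bytes.per.reducer bytes of input (default 1,000,000,000), clamped between 1 and pig.exec.reducers.max (default 999), and a different estimator class can be named with pig.exec.reducer.estimator. Those names and defaults are stated from memory and should be checked against the committed docs; the arithmetic is sketched below with a hypothetical ReducerEstimateSketch class.

```java
/**
 * Hypothetical sketch of an input-size-based reducer estimate: one reducer per
 * bytesPerReducer bytes of input, rounded up and clamped to [1, maxReducers].
 */
class ReducerEstimateSketch {
    // Assumed Pig 0.11 defaults for pig.exec.reducers.bytes.per.reducer and
    // pig.exec.reducers.max; verify against the documentation.
    static final long DEFAULT_BYTES_PER_REDUCER = 1000L * 1000 * 1000;
    static final int DEFAULT_MAX_REDUCERS = 999;

    static int estimate(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        if (bytesPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("bytesPerReducer and maxReducers must be positive");
        }
        long bytes = Math.max(0L, totalInputBytes);
        // Ceiling division: a partial chunk of input still gets its own reducer.
        long reducers = (bytes + bytesPerReducer - 1) / bytesPerReducer;
        reducers = Math.max(1L, reducers);            // always at least one reducer
        return (int) Math.min(maxReducers, reducers); // never more than the cap
    }

    static int estimate(long totalInputBytes) {
        return estimate(totalInputBytes, DEFAULT_BYTES_PER_REDUCER, DEFAULT_MAX_REDUCERS);
    }
}
```

Under these assumed defaults, 2.5 GB of input yields 3 reducers, an empty input still gets 1, and 10 TB is capped at 999.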



[jira] [Updated] (PIG-3139) Document reducer estimation

2013-01-28 Thread Bill Graham (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Graham updated PIG-3139:
-

  Resolution: Fixed
Release Note: Committed to trunk an 0.11 branch.
  Status: Resolved  (was: Patch Available)



[jira] [Updated] (PIG-3140) Document PigProgressNotificationListener configs

2013-01-28 Thread Bill Graham (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Graham updated PIG-3140:
-

  Resolution: Fixed
Release Note: Committed to trunk and 0.11 branch.
  Status: Resolved  (was: Patch Available)



[jira] [Commented] (PIG-3145) Parameters in core-site.xml and mapred-site.xml are not correctly substituted

2013-01-28 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565042#comment-13565042
 ] 

Prashant Kommireddi commented on PIG-3145:
--

Agreed. I am guessing your changes would go primarily into HExecutionEngine.
It would be good to get both changes in at around the same time.



[jira] [Commented] (PIG-3145) Parameters in core-site.xml and mapred-site.xml are not correctly substituted

2013-01-28 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565038#comment-13565038
 ] 

Cheolsoo Park commented on PIG-3145:


Hi Prashant,

Thank you very much for pointing that out.

Indeed, PIG-3135 is in a similar category, but I think my problem is a bit
different from yours. In my case, the *-site.xml files are present on the
classpath, but system properties (-Dkey=value) that are passed to the JVM are
not honored. They are different, aren't they?



Re: Pig 11.0

2013-01-28 Thread Rohini Palaniswamy
e2e tests are fine on my end too, except for the HCat tests (I did not have
HCat set up) and a few transient Errors test failures, which are a known
issue. Ran with Hadoop 0.23.

Regards,
Rohini


On Mon, Jan 28, 2013 at 4:44 PM, Julien Le Dem wrote:

> It sounds like we are ready to go.
> There are two remaining documentation JIRAs that will be committed soon.
> Daniel, do you want to build the release?
> Julien


[jira] [Commented] (PIG-3145) Parameters in core-site.xml and mapred-site.xml are not correctly substituted

2013-01-28 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565033#comment-13565033
 ] 

Prashant Kommireddi commented on PIG-3145:
--

Hi Cheolsoo, this might also be in a similar category to PIG-3135. Currently
Pig expects the site XML files on the classpath; again, Configuration is not
used.



[jira] [Created] (PIG-3145) Parameters in core-site.xml and mapred-site.xml are not correctly substituted

2013-01-28 Thread Cheolsoo Park (JIRA)
Cheolsoo Park created PIG-3145:
--

 Summary: Parameters in core-site.xml and mapred-site.xml are not 
correctly substituted
 Key: PIG-3145
 URL: https://issues.apache.org/jira/browse/PIG-3145
 Project: Pig
  Issue Type: Bug
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park


To reproduce the issue, please do the following:

# Parameterize the address of the name node in core-site.xml.
{code}
<property>
  <name>fs.default.name</name>
  <value>hdfs://${foo}:8020</value>
</property>
{code}
# Set the value of "foo" via -D option.
{code}
export PIG_OPTS="-Dfoo=mr1-0.cheolsoo.com"
{code}
# Pig fails with the following error.
{code}
2013-01-28 18:54:02,786 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to 
hadoop file system at: hdfs://${foo}:8020
2013-01-28 18:54:02,805 [main] ERROR org.apache.pig.Main - ERROR 2999: 
Unexpected internal error. null
Details at logfile: /home/cheolsoo/pig-cdh/pig_1359428042522.log
{code}
Note that the parameter $\{foo\} in core-site.xml is not expanded. This is 
because the name node and job tracker addresses are read directly from the 
raw site-file properties instead of via Configuration.get(), which is where 
variable expansion happens.
{code:title=HExecutionEngine.java}
// properties is Java Properties
cluster = properties.getProperty(JOB_TRACKER_LOCATION);
nameNode = properties.getProperty(FILE_SYSTEM_LOCATION);
{code}
Replacing these lines with Configuration.get() fixes the issue.
{code:title=HExecutionEngine.java}
// jc is Hadoop Configuration
cluster = jc.get(JOB_TRACKER_LOCATION);
nameNode = jc.get(FILE_SYSTEM_LOCATION);
{code}
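The difference between the two snippets can be reproduced without Hadoop. The sketch below is a simplified, hypothetical model of the substitution that Configuration.get() performs (as I understand it: each ${name} is looked up among JVM system properties first, then among the configuration's own properties, up to a fixed number of passes), contrasted with a plain java.util.Properties lookup, which returns the raw string. VarExpansionSketch is illustrative only, not Hadoop's code.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VarExpansionSketch {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}$ ]+)\\}");
    private static final int MAX_SUBST = 20; // guards against self-referencing values

    /** Expands ${name}: JVM system properties first, then the given properties. */
    static String expand(String value, Properties props) {
        if (value == null) {
            return null;
        }
        String expr = value;
        for (int pass = 0; pass < MAX_SUBST; pass++) {
            Matcher m = VAR.matcher(expr);
            if (!m.find()) {
                return expr; // nothing left to substitute
            }
            String name = m.group(1);
            String replacement = System.getProperty(name);
            if (replacement == null) {
                replacement = props.getProperty(name);
            }
            if (replacement == null) {
                return expr; // unresolved variable is left as-is
            }
            // Plain string splicing, so '$' in the replacement needs no escaping.
            expr = expr.substring(0, m.start()) + replacement + expr.substring(m.end());
        }
        return expr;
    }
}
```

With -Dfoo=mr1-0.cheolsoo.com, a raw Properties lookup of fs.default.name still returns hdfs://${foo}:8020 (the value seen in the log above), while the expanding lookup returns hdfs://mr1-0.cheolsoo.com:8020.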



[jira] Subscription: PIG patch available

2013-01-28 Thread jira
Issue Subscription
Filter: PIG patch available (27 issues)

Subscriber: pigdaily

Key       Summary
PIG-3140  Document PigProgressNotificationListener configs
          https://issues.apache.org/jira/browse/PIG-3140
PIG-3139  Document reducer estimation
          https://issues.apache.org/jira/browse/PIG-3139
PIG-3136  Introduce a syntax making declared aliases optional
          https://issues.apache.org/jira/browse/PIG-3136
PIG-3123  Simplify Logical Plans By Removing Unneccessary Identity Projections
          https://issues.apache.org/jira/browse/PIG-3123
PIG-3122  Operators should not implicitly become reserved keywords
          https://issues.apache.org/jira/browse/PIG-3122
PIG-3114  Duplicated macro name error when using pigunit
          https://issues.apache.org/jira/browse/PIG-3114
PIG-3108  HBaseStorage returns empty maps when mixing wildcard- with other columns
          https://issues.apache.org/jira/browse/PIG-3108
PIG-3105  Fix TestJobSubmission unit test failure.
          https://issues.apache.org/jira/browse/PIG-3105
PIG-3098  Add another test for the self join case
          https://issues.apache.org/jira/browse/PIG-3098
PIG-3088  Add a builtin udf which removes prefixes
          https://issues.apache.org/jira/browse/PIG-3088
PIG-3069  Native Windows Compatibility for Pig E2E Tests and Harness
          https://issues.apache.org/jira/browse/PIG-3069
PIG-3028  testGrunt dev test needs some command filters to run correctly without cygwin
          https://issues.apache.org/jira/browse/PIG-3028
PIG-3027  pigTest unit test needs a newline filter for comparisons of golden multi-line
          https://issues.apache.org/jira/browse/PIG-3027
PIG-3026  Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
          https://issues.apache.org/jira/browse/PIG-3026
PIG-3025  TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification
          https://issues.apache.org/jira/browse/PIG-3025
PIG-3024  TestEmptyInputDir unit test - hadoop version detection logic is brittle
          https://issues.apache.org/jira/browse/PIG-3024
PIG-3015  Rewrite of AvroStorage
          https://issues.apache.org/jira/browse/PIG-3015
PIG-3010  Allow UDF's to flatten themselves
          https://issues.apache.org/jira/browse/PIG-3010
PIG-2959  Add a pig.cmd for Pig to run under Windows
          https://issues.apache.org/jira/browse/PIG-2959
PIG-2955  Fix bunch of Pig e2e tests on Windows
          https://issues.apache.org/jira/browse/PIG-2955
PIG-2873  Converting bin/pig shell script to python
          https://issues.apache.org/jira/browse/PIG-2873
PIG-2834  MultiStorage requires unused constructor argument
          https://issues.apache.org/jira/browse/PIG-2834
PIG-2661  Pig uses an extra job for loading data in Pigmix L9
          https://issues.apache.org/jira/browse/PIG-2661
PIG-2266  bug with input file joining optimization in Pig
          https://issues.apache.org/jira/browse/PIG-2266
PIG-1942  script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects
          https://issues.apache.org/jira/browse/PIG-1942
PIG-1914  Support load/store JSON data in Pig
          https://issues.apache.org/jira/browse/PIG-1914
PIG-1237  Piggybank MutliStorage - specify field to write in output
          https://issues.apache.org/jira/browse/PIG-1237

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Commented] (PIG-3015) Rewrite of AvroStorage

2013-01-28 Thread Joseph Adler (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564926#comment-13564926
 ] 

Joseph Adler commented on PIG-3015:
---

Sorry, I didn't mean to submit a patch with Avro 1.7.4-SNAPSHOT. I added a couple
of optimizations to Trevni so that its performance was comparable with Avro's.
(I'll submit that patch to Avro.)

> Rewrite of AvroStorage
> --
>
> Key: PIG-3015
> URL: https://issues.apache.org/jira/browse/PIG-3015
> Project: Pig
>  Issue Type: Improvement
>  Components: piggybank
>Reporter: Joseph Adler
>Assignee: Joseph Adler
> Attachments: bad.avro, good.avro, PIG-3015-2.patch, PIG-3015-3.patch, 
> PIG-3015-4.patch, PIG-3015-5.patch, PIG-3015-6.patch, PIG-3015-7.patch, 
> TestInput.java, Test.java
>
>
> The current AvroStorage implementation has a lot of issues: it requires old 
> versions of Avro, it copies data much more than needed, and it's verbose and 
> complicated. (One pet peeve of mine is that old versions of Avro don't 
> support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the 
> new implementation is significantly faster, and the code is a lot simpler. 
> Rewriting AvroStorage also enabled me to implement support for Trevni (as 
> TrevniStorage).
> I'm opening this ticket to facilitate discussion while I figure out the best 
> way to contribute the changes back to Apache.



[jira] [Commented] (PIG-2266) bug with input file joining optimization in Pig

2013-01-28 Thread Joseph Adler (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564924#comment-13564924
 ] 

Joseph Adler commented on PIG-2266:
---

Thanks for adding this fix!

> bug with input file joining optimization in Pig
> ---
>
> Key: PIG-2266
> URL: https://issues.apache.org/jira/browse/PIG-2266
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0, 0.10.0
>Reporter: Joseph Adler
>Assignee: Joseph Adler
> Attachments: PIG-2266.patch
>
>
> In 
> src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java,
> the function hasTooManyInputFiles instantiates a LoadFunc instance, then 
> calls setLocation before calling setUDFContextSignature. This is inconsistent 
> with the documentation for the LoadFunc interface (see 
> http://pig.apache.org/docs/r0.9.0/api/org/apache/pig/LoadFunc.html#setUDFContextSignature(java.lang.String)).
> (We've written UDFs that assume that setUDFContextSignature is called first.)
> I think you can fix this by adding
> {{loader.setUDFContextSignature(ld.getSignature());}}
> before
> {{loader.setLocation(location, job);}}
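The ordering problem can be illustrated with a small, self-contained model. FakeUdfContext and SignatureAwareLoader below are hypothetical stand-ins, not Pig's UDFContext or LoadFunc (whose setLocation also takes a Hadoop Job); they only show why a loader that keys per-instance state by its signature inside setLocation breaks when the signature has not been set yet, and how calling setUDFContextSignature first avoids that.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for UDFContext: one property bag per loader signature. */
class FakeUdfContext {
    private final Map<String, Map<String, String>> props = new HashMap<>();

    Map<String, String> propertiesFor(String signature) {
        if (signature == null) {
            throw new IllegalStateException("UDF context signature has not been set yet");
        }
        return props.computeIfAbsent(signature, k -> new HashMap<>());
    }
}

/** Hypothetical loader that, like the UDFs in the report, uses its signature in setLocation. */
class SignatureAwareLoader {
    private final FakeUdfContext ctx;
    private String signature;

    SignatureAwareLoader(FakeUdfContext ctx) {
        this.ctx = ctx;
    }

    void setUDFContextSignature(String signature) {
        this.signature = signature;
    }

    void setLocation(String location) {
        // Fails if setUDFContextSignature has not been called first.
        ctx.propertiesFor(signature).put("location", location);
    }
}

class LoaderOrderSketch {
    /** Mirrors the proposed fix: set the signature, then the location. */
    static void configure(SignatureAwareLoader loader, String signature, String location) {
        loader.setUDFContextSignature(signature);
        loader.setLocation(location);
    }
}
```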



[jira] [Commented] (PIG-2266) bug with input file joining optimization in Pig

2013-01-28 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564920#comment-13564920
 ] 

Cheolsoo Park commented on PIG-2266:


Thank you, Santhosh, for the review. I will commit it after running tests.



Re: Pig 11.0

2013-01-28 Thread Julien Le Dem
It sounds like we are ready to go.
There are two remaining documentation JIRAs that will be committed soon.
Daniel, do you want to build the release?
Julien

On Mon, Jan 28, 2013 at 10:12 AM, Daniel Dai wrote:

> All tests pass for me, with no failures or aborts.
>
> Daniel
>
> On Mon, Jan 28, 2013 at 8:33 AM, Cheolsoo Park wrote:
> > Here are my results:
> >
> > Hadoop-1.0.x:  [exec] Final results ,PASSED: 612  FAILED: 2
> >  SKIPPED: 24   ABORTED: 1   FAILED DEPENDENCY: 0
> > - The failures are simply because I didn't install HCatalog.
> >
> > Hadoop-2.0.x:  [exec] Final results ,PASSED: 567  FAILED: 3
> >  SKIPPED: 27   ABORTED: 42   FAILED DEPENDENCY: 0
> > - The failures seem to be due to issues in my cluster rather than Pig
> > issues. I will re-run them to verify.
> >
> >
> >
> > On Fri, Jan 25, 2013 at 5:31 PM, Cheolsoo Park wrote:
> >
> >> I will also run e2e on Hadoop-1.x and Hadoop-2.x.
> >>
> >>
> >> On Fri, Jan 25, 2013 at 5:02 PM, Daniel Dai wrote:
> >>
> >>> I will run e2e tests on Hadoop 1.x over the weekend.
> >>>
> >>> Thanks,
> >>> Daniel
> >>>
> >>> On Fri, Jan 25, 2013 at 4:27 PM, Rohini Palaniswamy wrote:
> >>> > That's good :). Unit tests have all been passing. I haven't run e2e
> >>> > tests on Pig 0.11 for some time. Will kick off one this weekend. It
> >>> > would be nice if Cheolsoo and Daniel could also kick off one run.
> >>> >
> >>> > Regards,
> >>> > Rohini
> >>> >
> >>> >
> >>> > On Fri, Jan 25, 2013 at 4:08 PM, Julien Le Dem wrote:
> >>> >
> >>> >> It looks like all the tickets for Pig 0.11 have been resolved as of
> >>> >> today. See:
> >>> >>
> >>> >> https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%20%220.11%22%20AND%20project%20%3D%20PIG%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%20created%20ASC%2C%20priority%20DESC
> >>> >>
> >>> >> I propose we make the release 0.11.0 next week.
> >>> >>
> >>> >> Julien
> >>> >>
> >>>
> >>
> >>
>


[jira] [Commented] (PIG-3140) Document PigProgressNotificationListener configs

2013-01-28 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564901#comment-13564901
 ] 

Julien Le Dem commented on PIG-3140:


+1



[jira] [Commented] (PIG-3139) Document reducer estimation

2013-01-28 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564902#comment-13564902
 ] 

Julien Le Dem commented on PIG-3139:


+1

> Document reducer estimation
> ---
>
> Key: PIG-3139
> URL: https://issues.apache.org/jira/browse/PIG-3139
> Project: Pig
>  Issue Type: Bug
>Reporter: Bill Graham
>Assignee: Bill Graham
> Fix For: 0.11
>
> Attachments: PIG-3139_1.patch
>
>
> Add docs to describe how default reducer estimation algo works and how to 
> override it.



[jira] [Commented] (PIG-2266) bug with input file joining optimization in Pig

2013-01-28 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564862#comment-13564862
 ] 

Santhosh Srinivasan commented on PIG-2266:
--

+1 to the patch.

> bug with input file joining optimization in Pig
> ---
>
> Key: PIG-2266
> URL: https://issues.apache.org/jira/browse/PIG-2266
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0, 0.10.0
>Reporter: Joseph Adler
>Assignee: Joseph Adler
> Attachments: PIG-2266.patch
>
>
> In
> src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java,
> the function hasTooManyInputFiles instantiates a LoadFunc, then
> calls setLocation before calling setUDFContextSignature. This is inconsistent
> with the documentation for the LoadFunc interface (see
> http://pig.apache.org/docs/r0.9.0/api/org/apache/pig/LoadFunc.html#setUDFContextSignature(java.lang.String)).
> (We've written UDFs that assume that setUDFContextSignature is called first.)
> I think you can fix this by adding
>    loader.setUDFContextSignature(ld.getSignature());
> before
>    loader.setLocation(location, job);
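The ordering contract this report relies on can be shown without Pig on the classpath. The sketch uses a hypothetical `RecordingLoader` as a stand-in for a LoadFunc subclass. Like the UDFs described above, it keys its per-script state on the UDF context signature, so calling `setLocation` first leaves it with no key to use. Only the method names `setUDFContextSignature` and `setLocation` come from the report. The static map is an illustrative stand-in for UDFContext properties, not Pig's API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a LoadFunc that keys its state on the UDF context signature. */
class RecordingLoader {
    // Illustrative stand-in for per-signature UDFContext properties (not Pig's API).
    static final Map<String, String> CONTEXT = new HashMap<>();
    private String signature;

    void setUDFContextSignature(String signature) {
        this.signature = signature;
    }

    void setLocation(String location) {
        if (signature == null) {
            // The failure mode reported in PIG-2266: no signature to key state on yet.
            throw new IllegalStateException("setLocation called before setUDFContextSignature");
        }
        CONTEXT.put(signature + ".location", location);
    }

    public static void main(String[] args) {
        RecordingLoader fixed = new RecordingLoader();
        fixed.setUDFContextSignature("load_sig_1"); // proposed fix: signature first
        fixed.setLocation("/data/input");
        System.out.println(CONTEXT.get("load_sig_1.location")); // /data/input

        RecordingLoader buggy = new RecordingLoader();
        try {
            buggy.setLocation("/data/input");         // the order used by hasTooManyInputFiles
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```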



[jira] [Commented] (PIG-3140) Document PigProgressNotificationListener configs

2013-01-28 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564841#comment-13564841
 ] 

Dmitriy V. Ryaboy commented on PIG-3140:


+1

> Document PigProgressNotificationListener configs
> 
>
> Key: PIG-3140
> URL: https://issues.apache.org/jira/browse/PIG-3140
> Project: Pig
>  Issue Type: Bug
>Reporter: Bill Graham
>Assignee: Bill Graham
> Fix For: 0.11
>
> Attachments: PIG-3140_1.patch
>
>
> Add docs to describe what PPNL is and how to configure it.



[jira] [Commented] (PIG-3139) Document reducer estimation

2013-01-28 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564840#comment-13564840
 ] 

Dmitriy V. Ryaboy commented on PIG-3139:


+1

> Document reducer estimation
> ---
>
> Key: PIG-3139
> URL: https://issues.apache.org/jira/browse/PIG-3139
> Project: Pig
>  Issue Type: Bug
>Reporter: Bill Graham
>Assignee: Bill Graham
> Fix For: 0.11
>
> Attachments: PIG-3139_1.patch
>
>
> Add docs describing how the default reducer estimation algorithm works and
> how to override it.



[jira] [Created] (PIG-3144) Erroneous map entry alias resolution leading to "Duplicate schema alias" errors

2013-01-28 Thread Kai Londenberg (JIRA)
Kai Londenberg created PIG-3144:
---

 Summary: Erroneous map entry alias resolution leading to 
"Duplicate schema alias" errors
 Key: PIG-3144
 URL: https://issues.apache.org/jira/browse/PIG-3144
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Kai Londenberg



The following code illustrates a problem with alias resolution in Pig 0.10.x.
The schema of D2 is incorrectly described as containing two "age" fields, and
the last step of the script fails with a "Duplicate schema alias" error.

I have only encountered this bug when using aliases for map fields.

{code}
DATA = LOAD 'file:///whatever' as (a:map[chararray], b:chararray);

D1 = FOREACH DATA GENERATE a#'name' as name, a#'age' as age, b;

D2 = FOREACH D1 GENERATE name, age, b;

DESCRIBE D2;

{code}

Output:
{code}
D2: {
age: chararray,
age: chararray,
b: chararray
}
{code}

{code}

D3 = FOREACH D2 GENERATE *;

DESCRIBE D3;
{code}

Output:

{code}
Duplicate schema alias: age
{code}
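The final error comes from a uniqueness check on field aliases. Once D2's schema wrongly contains two fields named `age`, projecting it with `GENERATE *` cannot pass that check. The check itself behaves correctly, and the bug lies in the earlier map-alias resolution. The sketch below is a simplified, hypothetical model of such a check, not Pig's `Schema` class. Given the D2 schema reported above, it reproduces the message. Given the expected schema `name, age, b`, it passes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of a duplicate-alias check on a flat schema. */
class AliasCheckSketch {
    static void checkUniqueAliases(List<String> aliases) {
        Set<String> seen = new HashSet<>();
        for (String alias : aliases) {
            // Set.add returns false when the alias is already present.
            if (!seen.add(alias)) {
                throw new IllegalArgumentException("Duplicate schema alias: " + alias);
            }
        }
    }

    public static void main(String[] args) {
        checkUniqueAliases(Arrays.asList("name", "age", "b")); // expected D2 schema: passes
        try {
            checkUniqueAliases(Arrays.asList("age", "age", "b")); // D2 as actually described
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Duplicate schema alias: age
        }
    }
}
```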





[jira] [Created] (PIG-3143) Enable TOKENIZE to use any configurable Lucene Tokenizer, if a config parameter is set and the JARs included

2013-01-28 Thread Russell Jurney (JIRA)
Russell Jurney created PIG-3143:
---

 Summary: Enable TOKENIZE to use any configurable Lucene Tokenizer, 
if a config parameter is set and the JARs included
 Key: PIG-3143
 URL: https://issues.apache.org/jira/browse/PIG-3143
 Project: Pig
  Issue Type: Improvement
  Components: impl, internal-udfs
Affects Versions: 0.11
Reporter: Russell Jurney
Assignee: Jonathan Coveney
 Fix For: 0.12


I'll do this in time for 0.12. TOKENIZE is literally useless as is. See: 

http://thedatachef.blogspot.com/2011/04/lucene-text-tokenization-udf-for-apache.html
https://github.com/Ganglion/varaha/blob/master/src/main/java/varaha/text/TokenizeText.java



[jira] [Created] (PIG-3142) Fixed-width load and store functions for the Piggybank

2013-01-28 Thread Jonathan Packer (JIRA)
Jonathan Packer created PIG-3142:


 Summary: Fixed-width load and store functions for the Piggybank
 Key: PIG-3142
 URL: https://issues.apache.org/jira/browse/PIG-3142
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.11
Reporter: Jonathan Packer
 Attachments: fixed-width.patch

Adds load/store functions for fixed-width data to the Piggybank. They use the
syntax of the Unix "cut" command to specify column positions, and have an
option to skip the header row when loading or to write a header row when
storing.

Header handling works correctly both when multiple small files, each with a
header, are combined into one split and when a large file with a single header
is divided into multiple splits.
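The `cut`-style column specification can be made concrete with a small parser. The sketch below reads a spec such as `1-5,6-8,9-` as 1-based, inclusive character positions, following `cut -c` conventions. In that convention, `N-` runs to the end of the line and `-M` starts at column 1. Each comma-separated range becomes one field. Trimming the space padding is an assumption of this sketch. The class and method names are hypothetical and do not describe the patch's actual FixedWidthLoader API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split a fixed-width line using a cut(1)-style column spec. */
class FixedWidthSketch {
    /** Inclusive, 1-based column range; end == Integer.MAX_VALUE means "to end of line". */
    static final class Range {
        final int start;
        final int end;
        Range(int start, int end) { this.start = start; this.end = end; }
    }

    static List<Range> parseSpec(String spec) {
        List<Range> ranges = new ArrayList<>();
        for (String part : spec.split(",")) {
            part = part.trim();
            int dash = part.indexOf('-');
            if (dash < 0) {                                   // "N": a single column
                int col = Integer.parseInt(part);
                ranges.add(new Range(col, col));
            } else {
                String lo = part.substring(0, dash);
                String hi = part.substring(dash + 1);
                int start = lo.isEmpty() ? 1 : Integer.parseInt(lo);               // "-M" starts at column 1
                int end = hi.isEmpty() ? Integer.MAX_VALUE : Integer.parseInt(hi); // "N-" runs to end of line
                ranges.add(new Range(start, end));
            }
        }
        return ranges;
    }

    static List<String> split(String line, List<Range> ranges) {
        List<String> fields = new ArrayList<>();
        for (Range r : ranges) {
            int from = Math.min(r.start - 1, line.length()); // 1-based inclusive -> 0-based, clamped
            int to = Math.min(r.end, line.length());          // inclusive end == exclusive 0-based end
            // Assumption: fields are space-padded, so surrounding whitespace is trimmed.
            fields.add(from < to ? line.substring(from, to).trim() : "");
        }
        return fields;
    }

    public static void main(String[] args) {
        List<Range> spec = parseSpec("1-5,6-8,9-");
        System.out.println(split("Alice 30 NYC", spec)); // [Alice, 30, NYC]
    }
}
```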



[jira] [Updated] (PIG-3142) Fixed-width load and store functions for the Piggybank

2013-01-28 Thread Jonathan Packer (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Packer updated PIG-3142:
-

Attachment: fixed-width.patch

Patch adds FixedWidthLoader, FixedWidthStorage, and unit tests as per the issue 
description.

> Fixed-width load and store functions for the Piggybank
> --
>
> Key: PIG-3142
> URL: https://issues.apache.org/jira/browse/PIG-3142
> Project: Pig
>  Issue Type: New Feature
>  Components: piggybank
>Affects Versions: 0.11
>Reporter: Jonathan Packer
> Attachments: fixed-width.patch
>
>
> Adds load/store functions for fixed-width data to the Piggybank. They use the 
> syntax of the Unix "cut" command to specify column positions, and have an 
> option to skip the header row when loading or to write a header row when 
> storing.
> The header handling works properly with multiple small files each with a 
> header being combined into one split, or a large file with a single header 
> being split into multiple splits.



[jira] [Updated] (PIG-3141) Giving CSVExcelStorage an option to handle header rows

2013-01-28 Thread Jonathan Packer (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Packer updated PIG-3141:
-

Attachment: csv.patch

See issue description

> Giving CSVExcelStorage an option to handle header rows
> --
>
> Key: PIG-3141
> URL: https://issues.apache.org/jira/browse/PIG-3141
> Project: Pig
>  Issue Type: Improvement
>  Components: piggybank
>Affects Versions: 0.11
>Reporter: Jonathan Packer
> Fix For: 0.11
>
> Attachments: csv.patch
>
>
> Adds an argument to CSVExcelStorage to skip the header row when loading. This
> works correctly both when multiple small files, each with a header, are
> combined into one split and when a large file with a single header is
> divided into multiple splits.
> Also fixes a few bugs in CSVExcelStorage, including PIG-2470 and a bug
> involving quoted fields at the end of a line not being escaped properly.
> Removes the choice of delimiter, since a CSV file should only use a comma
> delimiter (hence the name).



[jira] [Created] (PIG-3141) Giving CSVExcelStorage an option to handle header rows

2013-01-28 Thread Jonathan Packer (JIRA)
Jonathan Packer created PIG-3141:


 Summary: Giving CSVExcelStorage an option to handle header rows
 Key: PIG-3141
 URL: https://issues.apache.org/jira/browse/PIG-3141
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Affects Versions: 0.11
Reporter: Jonathan Packer
 Fix For: 0.11
 Attachments: csv.patch

Adds an argument to CSVExcelStorage to skip the header row when loading. This
works correctly both when multiple small files, each with a header, are
combined into one split and when a large file with a single header is divided
into multiple splits.

Also fixes a few bugs in CSVExcelStorage, including PIG-2470 and a bug
involving quoted fields at the end of a line not being escaped properly.

Removes the choice of delimiter, since a CSV file should only use a comma
delimiter (hence the name).
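The split-aware header handling comes down to one rule: only a piece of a file that starts at byte offset 0 contains that file's header. This rule holds whether several small files are combined into one split or one large file is cut into many splits. The sketch below models this rule with a hypothetical `Chunk` type (path, start offset, and lines) instead of Hadoop's split classes. It detects the header by position only, which is an assumption. How the patch actually identifies the header row is not described here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of header skipping for splits that may combine or subdivide files. */
class HeaderSkipSketch {
    /** One contiguous piece of a single file, as read by one task. */
    static final class Chunk {
        final String path;
        final long startOffset;   // byte offset of this piece within its file
        final List<String> lines; // lines assigned to this piece
        Chunk(String path, long startOffset, List<String> lines) {
            this.path = path;
            this.startOffset = startOffset;
            this.lines = lines;
        }
    }

    /** Returns data rows, dropping the first line only where a piece starts at the top of its file. */
    static List<String> readRows(List<Chunk> split) {
        List<String> rows = new ArrayList<>();
        for (Chunk chunk : split) {
            boolean atFileStart = chunk.startOffset == 0;
            for (int i = 0; i < chunk.lines.size(); i++) {
                if (atFileStart && i == 0) {
                    continue; // this file's header row
                }
                rows.add(chunk.lines.get(i));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two small files combined into one split: each contributes a header to skip.
        List<Chunk> combined = Arrays.asList(
                new Chunk("a.csv", 0, Arrays.asList("id,name", "1,x")),
                new Chunk("b.csv", 0, Arrays.asList("id,name", "2,y")));
        System.out.println(readRows(combined)); // [1,x, 2,y]

        // One large file cut into two splits: only the first split sees the header.
        List<Chunk> firstSplit = Arrays.asList(new Chunk("big.csv", 0, Arrays.asList("id,name", "3,z")));
        List<Chunk> secondSplit = Arrays.asList(new Chunk("big.csv", 1024, Arrays.asList("4,w", "5,v")));
        System.out.println(readRows(firstSplit));  // [3,z]
        System.out.println(readRows(secondSplit)); // [4,w, 5,v]
    }
}
```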



[jira] [Commented] (PIG-1914) Support load/store JSON data in Pig

2013-01-28 Thread Jonathan Packer (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564656#comment-13564656
 ] 

Jonathan Packer commented on PIG-1914:
--

A note about handling arrays: by default, the proposed JsonLoader wraps each
value of a flat JSON array, e.g. "arr": [1, 2, 3, 4], in a single-element tuple.
However, if a tuple schema, for example coords: (lat: double, long: double), is
specified for a field that is a flat JSON array, the JsonLoader casts the
array to a tuple. Nested arrays are loaded properly if a valid schema is
specified.
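The two array-handling modes can be shown with plain Java collections. This sketch uses `List` to stand in for both Pig tuples and bags, which is an assumption; the real loader produces Pig Tuple and DataBag objects. By default each element of a flat array becomes its own one-field tuple inside a bag. When the schema declares a tuple, the array's elements become that tuple's fields in order. The method names are hypothetical, and so is the length-mismatch behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of the two ways a flat JSON array can be mapped. */
class JsonArraySketch {
    /** Default mode: [1, 2, 3] -> {(1), (2), (3)}, i.e. a bag of one-field tuples. */
    static List<List<Object>> asBagOfSingletons(List<?> jsonArray) {
        List<List<Object>> bag = new ArrayList<>();
        for (Object value : jsonArray) {
            bag.add(Collections.singletonList(value));
        }
        return bag;
    }

    /** Tuple schema given: [37.7, -122.4] with (lat, long) -> (37.7, -122.4). */
    static List<Object> asTuple(List<?> jsonArray, List<String> tupleFieldNames) {
        if (jsonArray.size() != tupleFieldNames.size()) {
            // The real loader's handling of a length mismatch is not described; this sketch rejects it.
            throw new IllegalArgumentException("array length does not match tuple schema");
        }
        return new ArrayList<>(jsonArray);
    }

    public static void main(String[] args) {
        System.out.println(asBagOfSingletons(Arrays.asList(1, 2, 3, 4)));                     // [[1], [2], [3], [4]]
        System.out.println(asTuple(Arrays.asList(37.7, -122.4), Arrays.asList("lat", "long"))); // [37.7, -122.4]
    }
}
```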

> Support load/store JSON data in Pig
> ---
>
> Key: PIG-1914
> URL: https://issues.apache.org/jira/browse/PIG-1914
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.11
>Reporter: Chao Tian
> Attachments: json.patch, PIG-1914.patch
>
>
> JSON is a commonly used data storage format. It is popular for storing 
> structured data, especially for JavaScript data exchange. 
> Pig should have the ability to load/store JSON format data. I plan to write 
> one for the Piggybank.



[jira] [Updated] (PIG-1914) Support load/store JSON data in Pig

2013-01-28 Thread Jonathan Packer (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Packer updated PIG-1914:
-

 Tags: JSON LoadFunc StoreFunc  (was: JSON LoadFunc)
Affects Version/s: (was: 0.9.0)
   (was: 0.8.0)
   0.11
 Release Note: Adds Piggybank functions for loading/storing JSON 
without relying on storing metadata alongside it.  (was: Adds support for 
loading JSON data in Pig)
   Status: Patch Available  (was: Open)

Hi, I submitted a patch with an implementation of JSON load and store functions 
which do not rely on metadata being stored alongside the data. There is javadoc 
documentation for each function, but here is a summary of the features.

The JsonLoader can either be passed a schema as a string argument, or it can 
infer a schema if none is provided. If passed a schema, it will load fields in 
the JSON which match the field names in the schema, ignoring extra fields, 
writing nulls for missing fields, and handling out-of-order fields properly.

If not passed a schema, it will load the entire document as a map. The values 
of the map will either be bytearrays (for scalar values) or further maps/bags 
(for nested objects and arrays).

Example usage:

json = LOAD '$INPUT_PATH' USING org.apache.pig.piggybank.storage.JsonLoader('a: int, t: (i: int, j: int)');

STORE json INTO '$OUTPUT_PATH' USING org.apache.pig.piggybank.storage.JsonStorage();
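The schema-driven loading described above matches fields by name, ignores extra fields, writes nulls for missing ones, and tolerates any key order. Together, these rules amount to projecting a parsed JSON object onto the schema's field list. The sketch below shows only that projection, applied to an already-parsed `Map`. JSON parsing and type casting are deliberately left out, and the names are hypothetical rather than taken from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: project a parsed JSON object onto an ordered list of schema fields. */
class JsonProjectionSketch {
    static List<Object> project(Map<String, Object> document, List<String> schemaFields) {
        List<Object> tuple = new ArrayList<>(schemaFields.size());
        for (String field : schemaFields) {
            // Lookup by name makes key order irrelevant; a missing key yields null.
            tuple.add(document.get(field));
        }
        // Keys not named in the schema are never read, so extra fields are ignored.
        return tuple;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("c", "z");      // keys arrive out of schema order
        doc.put("extra", true); // not in the schema, so it is dropped
        doc.put("a", 1);
        System.out.println(project(doc, Arrays.asList("a", "b", "c"))); // [1, null, z]
    }
}
```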

Jonathan Packer (Mortar Data)

> Support load/store JSON data in Pig
> ---
>
> Key: PIG-1914
> URL: https://issues.apache.org/jira/browse/PIG-1914
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.11
>Reporter: Chao Tian
> Attachments: json.patch, PIG-1914.patch
>
>
> JSON is a commonly used data storage format. It is popular for storing 
> structured data, especially for JavaScript data exchange. 
> Pig should have the ability to load/store JSON format data. I plan to write 
> one for the Piggybank.



[jira] [Updated] (PIG-1914) Support load/store JSON data in Pig

2013-01-28 Thread Jonathan Packer (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Packer updated PIG-1914:
-

Attachment: json.patch

Adds Piggybank functions for loading/storing JSON without relying on storing 
metadata alongside it.

> Support load/store JSON data in Pig
> ---
>
> Key: PIG-1914
> URL: https://issues.apache.org/jira/browse/PIG-1914
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Chao Tian
> Attachments: json.patch, PIG-1914.patch
>
>
> JSON is a commonly used data storage format. It is popular for storing 
> structured data, especially for JavaScript data exchange. 
> Pig should have the ability to load/store JSON format data. I plan to write 
> one for the Piggybank.



[jira] [Updated] (PIG-3135) HExecutionEngine should look for resources in user passed Properties

2013-01-28 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3135:
-

Patch Info: Patch Available

> HExecutionEngine should look for resources in user passed Properties
> 
>
> Key: PIG-3135
> URL: https://issues.apache.org/jira/browse/PIG-3135
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Attachments: PIG-3135.patch
>
>
> Looking at this snippet:
> {code}
> private void init(Properties properties) throws ExecException {
>     // ...
>     // Check existence of hadoop-site.xml or core-site.xml
>     Configuration testConf = new Configuration();
>     ClassLoader cl = testConf.getClassLoader();
>     URL hadoop_site = cl.getResource(HADOOP_SITE);
>     URL core_site = cl.getResource(CORE_SITE);
>
>     if (hadoop_site == null && core_site == null) {
>         throw new ExecException("Cannot find hadoop configurations in classpath "
>                 + "(neither hadoop-site.xml nor core-site.xml was found in the classpath)."
>                 + " If you plan to use local mode, please put -x local option in command line",
>                 4010);
>     }
>     // ...
> }
> {code}
> This assumes the resources (*-site.xml) are on the classpath, which will not
> always be the case when Pig is run through its Java APIs. A caller may want to
> set the resources programmatically, so this code should also check whether
> they are available in the user-passed properties.
> Example: a Configuration object is created and resources are added to it
> before it is passed to Pig.
> {code}
> Configuration conf = new Configuration(false);
> conf.addResource("foo/core-site.xml");
> conf.addResource("bar/hadoop-site.xml");
> PigServer pServer = new PigServer(ExecType.MAPREDUCE, conf);
> {code}
> The above conf is not used right now to obtain resources.
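One way to express the requested behaviour is to treat the site files as present if the classpath has them, or if the caller has already registered them on its own configuration. The sketch below models this in two parts. It uses the standard `ClassLoader.getResource` lookup from the snippet above. A plain list of user-added resource names stands in for the passed-in Configuration/Properties. The class, method, and parameter names are hypothetical, and this is not the attached patch.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

/** Hypothetical sketch: accept site files from the classpath OR from user-added resources. */
class SiteConfigCheckSketch {
    static final String HADOOP_SITE = "hadoop-site.xml";
    static final String CORE_SITE = "core-site.xml";

    /** True if the resource name is the site file itself or a path ending in "/<siteFile>". */
    static boolean namesSiteFile(String resource, String siteFile) {
        return resource.equals(siteFile) || resource.endsWith("/" + siteFile);
    }

    static boolean hasHadoopSiteConfig(ClassLoader cl, Collection<String> userResources) {
        if (cl.getResource(HADOOP_SITE) != null || cl.getResource(CORE_SITE) != null) {
            return true; // the original classpath-only check
        }
        for (String resource : userResources) {
            if (namesSiteFile(resource, HADOOP_SITE) || namesSiteFile(resource, CORE_SITE)) {
                return true; // resource supplied programmatically, as in the PigServer example
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ClassLoader cl = SiteConfigCheckSketch.class.getClassLoader();
        // Result depends on whether the site files happen to be on this JVM's classpath.
        System.out.println(hasHadoopSiteConfig(cl, Collections.<String>emptyList()));
        // Found among the user-added resources, regardless of the classpath.
        System.out.println(hasHadoopSiteConfig(cl, Arrays.asList("foo/core-site.xml", "bar/hadoop-site.xml")));
    }
}
```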



[jira] [Updated] (PIG-3138) Decouple PigServer.executeBatch() from compilation of batch

2013-01-28 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3138:
-

Patch Info: Patch Available

> Decouple PigServer.executeBatch() from compilation of batch
> ---
>
> Key: PIG-3138
> URL: https://issues.apache.org/jira/browse/PIG-3138
> Project: Pig
>  Issue Type: Improvement
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Fix For: 0.12
>
> Attachments: PIG-3138.patch
>
>
> executeBatch() currently parses the script and builds the LogicalPlan in
> addition to executing it. It would be beneficial to separate parsing/building
> from execution: that would let us get a handle on load/store and other
> operators before the batch is executed, which is useful for folks using the
> PigServer API.
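The proposed split can be sketched as a two-phase API. A compile step parses the queued statements and returns a plan object that the caller can inspect, for example to list load and store operators. A separate execute step then runs that plan. Everything here is a hypothetical toy, including the naive text matching used in place of a real parser. It is not PigServer's actual API or the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical toy showing compile-then-execute as two separate calls. */
class TwoPhaseBatchSketch {
    /** Toy "plan": statements classified as loads, stores, or other operators. */
    static final class Plan {
        final List<String> loads = new ArrayList<>();
        final List<String> stores = new ArrayList<>();
        final List<String> others = new ArrayList<>();
    }

    private final List<String> batch = new ArrayList<>();

    void registerQuery(String statement) {
        batch.add(statement);
    }

    /** Phase 1: parse/build only; nothing runs, so callers can inspect the plan first. */
    Plan compileBatch() {
        Plan plan = new Plan();
        for (String stmt : batch) {
            String upper = stmt.trim().toUpperCase(Locale.ROOT);
            if (upper.startsWith("STORE ")) {
                plan.stores.add(stmt);
            } else if (upper.contains("= LOAD ")) { // naive stand-in for real parsing
                plan.loads.add(stmt);
            } else {
                plan.others.add(stmt);
            }
        }
        return plan;
    }

    /** Phase 2: run a previously compiled plan; here it only reports how many stores would run. */
    int executeBatch(Plan plan) {
        return plan.stores.size();
    }

    public static void main(String[] args) {
        TwoPhaseBatchSketch server = new TwoPhaseBatchSketch();
        server.registerQuery("a = LOAD 'in' AS (x:int);");
        server.registerQuery("b = FILTER a BY x > 0;");
        server.registerQuery("STORE b INTO 'out';");
        Plan plan = server.compileBatch();
        System.out.println("loads: " + plan.loads);   // inspect before anything executes
        System.out.println("stores: " + plan.stores);
        System.out.println("executed stores: " + server.executeBatch(plan));
    }
}
```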



Re: Pig 11.0

2013-01-28 Thread Daniel Dai
All tests pass for me, with no failures or aborts.

Daniel

On Mon, Jan 28, 2013 at 8:33 AM, Cheolsoo Park  wrote:
> Here are my results:
>
> Hadoop-1.0.x:  [exec] Final results ,PASSED: 612  FAILED: 2  SKIPPED: 24  ABORTED: 1  FAILED DEPENDENCY: 0
> - The failures are simply because I didn't install HCatalog.
>
> Hadoop-2.0.x:  [exec] Final results ,PASSED: 567  FAILED: 3  SKIPPED: 27  ABORTED: 42  FAILED DEPENDENCY: 0
> - The failures seem to be due to issues in my cluster rather than Pig issues. I
> will re-run them to verify.
>
>
>
> On Fri, Jan 25, 2013 at 5:31 PM, Cheolsoo Park wrote:
>
>> I will also run e2e on Hadoop-1.x and Hadoop-2.x.
>>
>>
>> On Fri, Jan 25, 2013 at 5:02 PM, Daniel Dai  wrote:
>>
>>> I will run e2e tests on Hadoop 1.x over the weekend.
>>>
>>> Thanks,
>>> Daniel
>>>
>>> On Fri, Jan 25, 2013 at 4:27 PM, Rohini Palaniswamy
>>>  wrote:
>>> > That's good :). Unit tests have all been passing. I haven't run e2e tests
>>> > on Pig 0.11 for some time. Will kick off one this weekend. It would be nice
>>> > if Cheolsoo and Daniel could also kick off a run.
>>> >
>>> > Regards,
>>> > Rohini
>>> >
>>> >
>>> > On Fri, Jan 25, 2013 at 4:08 PM, Julien Le Dem 
>>> wrote:
>>> >
>>> >> It looks like all the tickets for Pig 0.11 have been resolved as of
>>> today.
>>> >> See:
>>> >>
>>> >>
>>> https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%20%220.11%22%20AND%20project%20%3D%20PIG%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%20created%20ASC%2C%20priority%20DESC
>>> >>
>>> >> I propose we make the release 0.11.0 next week.
>>> >>
>>> >> Julien
>>> >>
>>>
>>
>>


[jira] [Updated] (PIG-3139) Document reducer estimation

2013-01-28 Thread Bill Graham (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Graham updated PIG-3139:
-

Status: Patch Available  (was: Open)

> Document reducer estimation
> ---
>
> Key: PIG-3139
> URL: https://issues.apache.org/jira/browse/PIG-3139
> Project: Pig
>  Issue Type: Bug
>Reporter: Bill Graham
>Assignee: Bill Graham
> Fix For: 0.11
>
> Attachments: PIG-3139_1.patch
>
>
> Add docs describing how the default reducer estimation algorithm works and
> how to override it.



Re: Pig 11.0

2013-01-28 Thread Cheolsoo Park
Here are my results:

Hadoop-1.0.x:  [exec] Final results ,PASSED: 612  FAILED: 2  SKIPPED: 24  ABORTED: 1  FAILED DEPENDENCY: 0
- The failures are simply because I didn't install HCatalog.

Hadoop-2.0.x:  [exec] Final results ,PASSED: 567  FAILED: 3  SKIPPED: 27  ABORTED: 42  FAILED DEPENDENCY: 0
- The failures seem to be due to issues in my cluster rather than Pig issues. I
will re-run them to verify.



On Fri, Jan 25, 2013 at 5:31 PM, Cheolsoo Park wrote:

> I will also run e2e on Hadoop-1.x and Hadoop-2.x.
>
>
> On Fri, Jan 25, 2013 at 5:02 PM, Daniel Dai  wrote:
>
>> I will run e2e tests on Hadoop 1.x over the weekend.
>>
>> Thanks,
>> Daniel
>>
>> On Fri, Jan 25, 2013 at 4:27 PM, Rohini Palaniswamy
>>  wrote:
>> > That's good :). Unit tests have all been passing. I haven't run e2e tests
>> > on Pig 0.11 for some time. Will kick off one this weekend. It would be nice
>> > if Cheolsoo and Daniel could also kick off a run.
>> >
>> > Regards,
>> > Rohini
>> >
>> >
>> > On Fri, Jan 25, 2013 at 4:08 PM, Julien Le Dem 
>> wrote:
>> >
>> >> It looks like all the tickets for Pig 0.11 have been resolved as of
>> today.
>> >> See:
>> >>
>> >>
>> https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%20%220.11%22%20AND%20project%20%3D%20PIG%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%20created%20ASC%2C%20priority%20DESC
>> >>
>> >> I propose we make the release 0.11.0 next week.
>> >>
>> >> Julien
>> >>
>>
>
>


[jira] [Commented] (PIG-3140) Document PigProgressNotificationListener configs

2013-01-28 Thread Jarek Jarcec Cecho (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564358#comment-13564358
 ] 

Jarek Jarcec Cecho commented on PIG-3140:
-

+1 (non-binding)

> Document PigProgressNotificationListener configs
> 
>
> Key: PIG-3140
> URL: https://issues.apache.org/jira/browse/PIG-3140
> Project: Pig
>  Issue Type: Bug
>Reporter: Bill Graham
>Assignee: Bill Graham
> Fix For: 0.11
>
> Attachments: PIG-3140_1.patch
>
>
> Add docs to describe what PPNL is and how to configure it.



Jenkins build is back to normal : Pig-trunk #1395

2013-01-28 Thread Apache Jenkins Server
See