[jira] [Commented] (PIG-3497) JobControlCompiler should only do reducer estimation when the job has a reduce phase

2013-10-03 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785951#comment-13785951
 ] 

Daniel Dai commented on PIG-3497:
-

+1 for 0.12. The change is simple enough and not likely to break anything.

> JobControlCompiler should only do reducer estimation when the job has a 
> reduce phase
> 
>
> Key: PIG-3497
> URL: https://issues.apache.org/jira/browse/PIG-3497
> Project: Pig
>  Issue Type: Bug
>Reporter: Akihiro Matsukawa
>Assignee: Akihiro Matsukawa
>Priority: Minor
> Attachments: reducer_estimation.patch
>
>
> Currently, JobControlCompiler makes an estimation for the number of reducers 
> required (by default based on input size into mappers) regardless of whether 
> there is a reduce phase in the job. This is unnecessary, especially when 
> running more complex custom reducer estimators. 
> Change to only estimate reducers when necessary.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3483) Document ASSERT keyword

2013-10-03 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785940#comment-13785940
 ] 

Cheolsoo Park commented on PIG-3483:


+1. Thank you for taking care of this!

Just found a typo in your example: the relation A doesn't have an a0 field. Do you 
mind fixing it when committing the patch?
{code}
Suppose we have relation A.

A = LOAD 'data' AS (a1:int,a2:int,a3:int);
...
Now, you can assert that a0 column in your data is >0, fail if otherwise
ASSERT A by a0 > 0 'a0 should be greater than 0';
{code}
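
For reference, a corrected version of the snippet (changing only the a0 references to a1, which relation A actually defines; the surrounding wording and syntax are kept exactly as in the draft above) would read:
{code}
Suppose we have relation A.

A = LOAD 'data' AS (a1:int,a2:int,a3:int);
...
Now, you can assert that a1 column in your data is >0, fail if otherwise
ASSERT A by a1 > 0 'a1 should be greater than 0';
{code}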


> Document ASSERT keyword
> ---
>
> Key: PIG-3483
> URL: https://issues.apache.org/jira/browse/PIG-3483
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.12.0
>Reporter: Cheolsoo Park
>Assignee: Aniket Mokashi
> Fix For: 0.12.0
>
> Attachments: PIG-3483.patch
>
>
> PIG-3367 added a new keyword ASSERT, so we need to document it.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (PIG-3483) Document ASSERT keyword

2013-10-03 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-3483:


Status: Patch Available  (was: Open)

> Document ASSERT keyword
> ---
>
> Key: PIG-3483
> URL: https://issues.apache.org/jira/browse/PIG-3483
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.12.0
>Reporter: Cheolsoo Park
>Assignee: Aniket Mokashi
> Fix For: 0.12.0
>
> Attachments: PIG-3483.patch
>
>
> PIG-3367 added a new keyword ASSERT, so we need to document it.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (PIG-3483) Document ASSERT keyword

2013-10-03 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-3483:


Attachment: PIG-3483.patch

> Document ASSERT keyword
> ---
>
> Key: PIG-3483
> URL: https://issues.apache.org/jira/browse/PIG-3483
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.12.0
>Reporter: Cheolsoo Park
>Assignee: Aniket Mokashi
> Fix For: 0.12.0
>
> Attachments: PIG-3483.patch
>
>
> PIG-3367 added a new keyword ASSERT, so we need to document it.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3483) Document ASSERT keyword

2013-10-03 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785932#comment-13785932
 ] 

Aniket Mokashi commented on PIG-3483:
-

[~cheolsoo], can you please review this patch?

> Document ASSERT keyword
> ---
>
> Key: PIG-3483
> URL: https://issues.apache.org/jira/browse/PIG-3483
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.12.0
>Reporter: Cheolsoo Park
>Assignee: Aniket Mokashi
> Fix For: 0.12.0
>
> Attachments: PIG-3483.patch
>
>
> PIG-3367 added a new keyword ASSERT, so we need to document it.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3497) JobControlCompiler should only do reducer estimation when the job has a reduce phase

2013-10-03 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785876#comment-13785876
 ] 

Aniket Mokashi commented on PIG-3497:
-

+1. Committed to trunk.
[~daijy], should we also commit this to 0.12?

> JobControlCompiler should only do reducer estimation when the job has a 
> reduce phase
> 
>
> Key: PIG-3497
> URL: https://issues.apache.org/jira/browse/PIG-3497
> Project: Pig
>  Issue Type: Bug
>Reporter: Akihiro Matsukawa
>Assignee: Akihiro Matsukawa
>Priority: Minor
> Attachments: reducer_estimation.patch
>
>
> Currently, JobControlCompiler makes an estimation for the number of reducers 
> required (by default based on input size into mappers) regardless of whether 
> there is a reduce phase in the job. This is unnecessary, especially when 
> running more complex custom reducer estimators. 
> Change to only estimate reducers when necessary.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3494) Several fixes for e2e tests

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785825#comment-13785825
 ] 

Hudson commented on PIG-3494:
-

SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #189 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/189/])
PIG-3494: Several fixes for e2e tests (daijy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528712)
* /pig/trunk/test/e2e/harness/test_harness.pl
* /pig/trunk/test/e2e/pig/conf/default.conf
* /pig/trunk/test/e2e/pig/conf/rpm.conf
* /pig/trunk/test/e2e/pig/drivers/TestDriverPig.pm
* /pig/trunk/test/e2e/pig/drivers/TestDriverScript.pm
* /pig/trunk/test/e2e/pig/tests/negative.conf
* /pig/trunk/test/e2e/pig/tests/nightly.conf


> Several fixes for e2e tests
> ---
>
> Key: PIG-3494
> URL: https://issues.apache.org/jira/browse/PIG-3494
> Project: Pig
>  Issue Type: Bug
>  Components: e2e harness
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12.0
>
> Attachments: PIG-3494-1.patch
>
>
> Address several issues in e2e tests:
> 1. Adding the capacity to test Pig installed by rpm (also involves 
> configurable piggybank.jar)
> 2. Remove hadoop23.res since it is no longer needed
> 3. Remove hadoop2 specific error message "UdfException_[1-4]" since they are 
> fixed by PIG-3360



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (PIG-3449) Move JobCreationException to org.apache.pig.backend.hadoop.executionengine

2013-10-03 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3449:
---

Attachment: PIG-3446-2.patch

Fix compilation error...

> Move JobCreationException to org.apache.pig.backend.hadoop.executionengine
> --
>
> Key: PIG-3449
> URL: https://issues.apache.org/jira/browse/PIG-3449
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3446-1.patch, PIG-3446-2.patch
>
>
> JobCreationException is not MR-specific, so it should be moved from  
> {{org.apache.pig.backend.hadoop.executionengine.mapReduceLayer}} to
>  {{org.apache.pig.backend.hadoop.executionengine}}.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3108) HBaseStorage returns empty maps when mixing wildcard- with other columns

2013-10-03 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785792#comment-13785792
 ] 

Harsh J commented on PIG-3108:
--

Moved from Release Notes to comments:

bq. Tested and committed. Thanks for the patch Christoph and sorry for the 
delay!

> HBaseStorage returns empty maps when mixing wildcard- with other columns
> 
>
> Key: PIG-3108
> URL: https://issues.apache.org/jira/browse/PIG-3108
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0, 0.9.1, 0.9.2, 0.10.0, 0.11, 0.10.1, 0.12.0
>Reporter: Christoph Bauer
>Assignee: Christoph Bauer
> Fix For: 0.12.0
>
> Attachments: PIG-3108.patch, PIG-3108.patch
>
>
> Consider the following:
> A and B should be the same (with different order, of course).
> {code}
> /*
> in hbase shell:
> create 'pigtest', 'pig'
> put 'pigtest' , '1', 'pig:name', 'A'
> put 'pigtest' , '1', 'pig:has_legs', 'true'
> put 'pigtest' , '1', 'pig:has_ribs', 'true'
> */
> A = LOAD 'hbase://pigtest' USING 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('pig:name pig:has*') AS 
> (name:chararray,parts);
> B = LOAD 'hbase://pigtest' USING 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('pig:has* pig:name') AS 
> (parts,name:chararray);
> dump A;
> dump B;
> {code}
> This is due to a bug in setLocation and initScan.
> For _A_ 
> # scan.addColumn(pig,name); // for 'pig:name'
> # scan.addFamily(pig); // for the 'pig:has*'
> So that's silently right.
> But for _B_
> # scan.addFamily(pig)
> # scan.addColumn(pig,name)
> will override the first call to addFamily, because you cannot mix them on the 
> same family.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (PIG-3108) HBaseStorage returns empty maps when mixing wildcard- with other columns

2013-10-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated PIG-3108:
-

Release Note:   (was: Tested and committed. Thanks for the patch Christoph 
and sorry for the delay!)

> HBaseStorage returns empty maps when mixing wildcard- with other columns
> 
>
> Key: PIG-3108
> URL: https://issues.apache.org/jira/browse/PIG-3108
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0, 0.9.1, 0.9.2, 0.10.0, 0.11, 0.10.1, 0.12.0
>Reporter: Christoph Bauer
>Assignee: Christoph Bauer
> Fix For: 0.12.0
>
> Attachments: PIG-3108.patch, PIG-3108.patch
>
>
> Consider the following:
> A and B should be the same (with different order, of course).
> {code}
> /*
> in hbase shell:
> create 'pigtest', 'pig'
> put 'pigtest' , '1', 'pig:name', 'A'
> put 'pigtest' , '1', 'pig:has_legs', 'true'
> put 'pigtest' , '1', 'pig:has_ribs', 'true'
> */
> A = LOAD 'hbase://pigtest' USING 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('pig:name pig:has*') AS 
> (name:chararray,parts);
> B = LOAD 'hbase://pigtest' USING 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('pig:has* pig:name') AS 
> (parts,name:chararray);
> dump A;
> dump B;
> {code}
> This is due to a bug in setLocation and initScan.
> For _A_ 
> # scan.addColumn(pig,name); // for 'pig:name'
> # scan.addFamily(pig); // for the 'pig:has*'
> So that's silently right.
> But for _B_
> # scan.addFamily(pig)
> # scan.addColumn(pig,name)
> will override the first call to addFamily, because you cannot mix them on the 
> same family.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig

2013-10-03 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785785#comment-13785785
 ] 

Daniel Dai commented on PIG-3445:
-

Great, thanks!

> Make Parquet format available out of the box in Pig
> ---
>
> Key: PIG-3445
> URL: https://issues.apache.org/jira/browse/PIG-3445
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
> Fix For: 0.12.0
>
> Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, 
> PIG-3445.patch
>
>
> We would add the Parquet jar in the Pig packages to make it available out of 
> the box to pig users.
> On top of that we could add the parquet.pig package to the list of packages 
> to search for UDFs. (alternatively, the parquet jar could contain classes 
> named org.apache.pig.builtin.ParquetLoader and ParquetStorer)
> This way users can use Parquet simply by typing:
> A = LOAD 'foo' USING ParquetLoader();
> STORE A INTO 'bar' USING ParquetStorer();



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (PIG-3446) Umbrella jira for Pig on Tez

2013-10-03 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3446:
---

Description: 
This is an umbrella jira for Pig on Tez. More detailed subtasks will be added.

More information can be found on the following wiki page:
https://cwiki.apache.org/confluence/display/PIG/Pig+on+Tez

To build tez-branch, you need to install the Tez jars into your local Maven repo first. 
Please check out the Apache Tez repo and run mvn install.

  was:
This is an umbrella jira for Pig on Tez. More detailed subtasks will be added.

More information can be found on the following wiki page:
https://cwiki.apache.org/confluence/display/PIG/Pig+on+Tez


> Umbrella jira for Pig on Tez
> 
>
> Key: PIG-3446
> URL: https://issues.apache.org/jira/browse/PIG-3446
> Project: Pig
>  Issue Type: New Feature
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
>
> This is an umbrella jira for Pig on Tez. More detailed subtasks will be added.
> More information can be found on the following wiki page:
> https://cwiki.apache.org/confluence/display/PIG/Pig+on+Tez
> To build tez-branch, you need to install the Tez jars into your local Maven repo first. 
> Please check out the Apache Tez repo and run mvn install.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] Subscription: PIG patch available

2013-10-03 Thread jira
Issue Subscription
Filter: PIG patch available (12 issues)

Subscriber: pigdaily

Key       Summary
PIG-3497  JobControlCompiler should only do reducer estimation when the job has a reduce phase
          https://issues.apache.org/jira/browse/PIG-3497
PIG-3496  Propagate HBase 0.95 jars to the backend
          https://issues.apache.org/jira/browse/PIG-3496
PIG-3451  EvalFunc ctor reflection to determine value of type param T is brittle
          https://issues.apache.org/jira/browse/PIG-3451
PIG-3449  Move JobCreationException to org.apache.pig.backend.hadoop.executionengine
          https://issues.apache.org/jira/browse/PIG-3449
PIG-3441  Allow Pig to use default resources from Configuration objects
          https://issues.apache.org/jira/browse/PIG-3441
PIG-3388  No support for Regex for row filter in org.apache.pig.backend.hadoop.hbase.HBaseStorage
          https://issues.apache.org/jira/browse/PIG-3388
PIG-3347  Store invocation in local mode brings side effect
          https://issues.apache.org/jira/browse/PIG-3347
PIG-3325  Adding a tuple to a bag is slow
          https://issues.apache.org/jira/browse/PIG-3325
PIG-3257  Add unique identifier UDF
          https://issues.apache.org/jira/browse/PIG-3257
PIG-3117  A debug mode in which pig does not delete temporary files
          https://issues.apache.org/jira/browse/PIG-3117
PIG-3088  Add a builtin udf which removes prefixes
          https://issues.apache.org/jira/browse/PIG-3088
PIG-3021  Split results missing records when there is null values in the column comparison
          https://issues.apache.org/jira/browse/PIG-3021

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Created] (PIG-3502) Initial implementation of TezLauncher

2013-10-03 Thread Cheolsoo Park (JIRA)
Cheolsoo Park created PIG-3502:
--

 Summary: Initial implementation of TezLauncher
 Key: PIG-3502
 URL: https://issues.apache.org/jira/browse/PIG-3502
 Project: Pig
  Issue Type: Sub-task
  Components: tez
Affects Versions: tez-branch
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: tez-branch


Once Tez DAG is built, TezLauncher submits it to Tez cluster using TezClient 
API.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3501) Initial implementation of TezJobControlCompiler

2013-10-03 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785759#comment-13785759
 ] 

Cheolsoo Park commented on PIG-3501:


Note the patch should be applied after PIG-3500 (TezCompiler).

> Initial implementation of TezJobControlCompiler
> ---
>
> Key: PIG-3501
> URL: https://issues.apache.org/jira/browse/PIG-3501
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3501-1.patch
>
>
> TezJobControlCompiler converts tez plan into tez DAG. Once tez DAG is built, 
> it is wrapped by JobControl before being submitted by TezLauncher.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (PIG-3501) Initial implementation of TezJobControlCompiler

2013-10-03 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3501:
---

Attachment: PIG-3501-1.patch

Attached includes an initial version of TezJobControlCompiler with unit tests. 
Tez DAG is built by TezDagBuilder which is an extension of TezOpPlanVisitor.

The unit tests can run with ant clean test -Dtestcase=TestTezJobControlCompiler.

> Initial implementation of TezJobControlCompiler
> ---
>
> Key: PIG-3501
> URL: https://issues.apache.org/jira/browse/PIG-3501
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3501-1.patch
>
>
> TezJobControlCompiler converts tez plan into tez DAG. Once tez DAG is built, 
> it is wrapped by JobControl before being submitted by TezLauncher.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (PIG-3501) Initial implementation of TezJobControlCompiler

2013-10-03 Thread Cheolsoo Park (JIRA)
Cheolsoo Park created PIG-3501:
--

 Summary: Initial implementation of TezJobControlCompiler
 Key: PIG-3501
 URL: https://issues.apache.org/jira/browse/PIG-3501
 Project: Pig
  Issue Type: Sub-task
  Components: tez
Affects Versions: tez-branch
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: tez-branch


TezJobControlCompiler builds tez plan into tez DAG. Once tez DAG is built, it 
is wrapped by JobControl before being submitted by TezLauncher.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (PIG-3501) Initial implementation of TezJobControlCompiler

2013-10-03 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3501:
---

Description: TezJobControlCompiler converts tez plan into tez DAG. Once tez 
DAG is built, it is wrapped by JobControl before being submitted by 
TezLauncher.  (was: TezJobControlCompiler builds tez plan into tez DAG. Once 
tez DAG is built, it is wrapped by JobControl before being submitted by 
TezLauncher.)

> Initial implementation of TezJobControlCompiler
> ---
>
> Key: PIG-3501
> URL: https://issues.apache.org/jira/browse/PIG-3501
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
>
> TezJobControlCompiler converts tez plan into tez DAG. Once tez DAG is built, 
> it is wrapped by JobControl before being submitted by TezLauncher.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig

2013-10-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785742#comment-13785742
 ] 

Julien Le Dem commented on PIG-3445:


I just released parquet-pig-bundle-1.2.3
this should show up in maven central overnight

> Make Parquet format available out of the box in Pig
> ---
>
> Key: PIG-3445
> URL: https://issues.apache.org/jira/browse/PIG-3445
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
> Fix For: 0.12.0
>
> Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, 
> PIG-3445.patch
>
>
> We would add the Parquet jar in the Pig packages to make it available out of 
> the box to pig users.
> On top of that we could add the parquet.pig package to the list of packages 
> to search for UDFs. (alternatively, the parquet jar could contain classes 
> named org.apache.pig.builtin.ParquetLoader and ParquetStorer)
> This way users can use Parquet simply by typing:
> A = LOAD 'foo' USING ParquetLoader();
> STORE A INTO 'bar' USING ParquetStorer();



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3499) Pig job fails when run in local mode with namenode HA(QJM)

2013-10-03 Thread venkata kamalnath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785735#comment-13785735
 ] 

venkata kamalnath commented on PIG-3499:


Yes, I am running the Pig job in local mode using pig -x local test.pig. I am 
testing Pig jobs against a Hadoop environment set up in HA mode.

Steps to reproduce:

Set up a Hadoop HA namenode and run Pig jobs using -x local mode.



> Pig job fails when run in local mode with namenode HA(QJM) 
> ---
>
> Key: PIG-3499
> URL: https://issues.apache.org/jira/browse/PIG-3499
> Project: Pig
>  Issue Type: Bug
>  Components: grunt, parser, tools
>Affects Versions: 0.10.0
>Reporter: venkata kamalnath
>
> When we run a Pig script with namenode HA (QJM), we always get an unknown host 
> exception. The nameserviceID is being treated as a host, so the Pig job fails with 
> an unknown host exception.
> I am working on a fix, but I want the community to validate whether any similar bug 
> has been reported. If not, I will provide the fix as soon as possible.
> The pig script is as below:
> testTable = LOAD 'hdfs://kdvenkata/user/kd/test.csv'
>   USING PigStorage(',')
>   AS (col1:chararray, col2:chararray, col3:int);
> STORE testTable into '/tmp/test_pig_output';
> Exception:
> Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: 
> kdvenkata
> at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:417)
> at 
> org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164)
> at 
> org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:412)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:379)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:123)
> at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2278)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:86)
> at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2312)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2294)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:317)
> at 
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:70)
> at 
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:53)
> at 
> org.apache.pig.builtin.JsonMetadata.findMetaFile(JsonMetadata.java:106)
> at 
> org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:188)
> at org.apache.pig.builtin.PigStorage.getSchema(PigStorage.java:465)
> at 
> org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151)
> at 
> org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:110)
> at 
> org.apache.pig.newplan.logical.relational.LOStore.getSchema(LOStore.java:68)
> at 
> org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:60)
> at 
> org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:84)
> at 
> org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
> at 
> org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
> at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
> at org.apache.pig.PigServer$Graph.compile(PigServer.java:1626)
> at org.apache.pig.PigServer$Graph.compile(PigServer.java:1620)
> at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1343)
> at org.apache.pig.PigServer.storeEx(PigServer.java:960)
> at org.apache.pig.PigServer.store(PigServer.java:928)
> at org.apache.pig.PigServer.openIterator(PigServer.java:841)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Comment Edited] (PIG-3500) Initial implementation of TezCompiler

2013-10-03 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785730#comment-13785730
 ] 

Cheolsoo Park edited comment on PIG-3500 at 10/4/13 12:29 AM:
--

Attached includes an initial version of TezCompiler with unit tests. Note that 
query #3 is compiled into 3 Tez vertices (two input vertices and one join 
vertex) unlike MR plan.

The unit test can run with ant test clean -Dtestcase=TestTezCompiler.


was (Author: cheolsoo):
Attached includes an initial version of TezCompiler with unit tests. Note that 
it query #3 is compiled into 3 Tez vertices (two input vertices and one join 
vertex) unlike MR plan.

The unit test can run with ant test clean -Dtestcase=TestTezCompiler.

> Initial implementation of TezCompiler
> -
>
> Key: PIG-3500
> URL: https://issues.apache.org/jira/browse/PIG-3500
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3500-1.patch
>
>
> Implement TezCompiler that compiles physical plan into tez plan. To begin 
> with, we can implement the initial version that works for basic queries as 
> follows:
> # Load-Filter-Store
> {code}
> a = load 'file:///tmp/input' as (x:int, y:int);
> b = filter a by x > 0;
> c = foreach b generate y;
> store c into 'file:///tmp/output';
> {code}
> # Load-Filter-GroupBy-Store
> {code}
> a = load 'file:///tmp/input' as (x:int, y:int);
> b = group a by x;
> c = foreach b generate group, a;
> store c into 'file:///tmp/output';
> {code}
> # Load1-Load2-Join-Store
> {code}
> a = load 'file:///tmp/input1' as (x:int, y:int);
> b = load 'file:///tmp/input2' as (x:int, z:int);
> c = join a by x, b by x;
> d = foreach c generate a::x as x, y, z;
> store d into 'file:///tmp/output';
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (PIG-3500) Initial implementation of TezCompiler

2013-10-03 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3500:
---

Attachment: PIG-3500-1.patch

Attached includes an initial version of TezCompiler with unit tests. Note that 
query #3 is compiled into 3 Tez vertices (two input vertices and one join 
vertex) unlike MR plan.

The unit test can run with ant test clean -Dtestcase=TestTezCompiler.

> Initial implementation of TezCompiler
> -
>
> Key: PIG-3500
> URL: https://issues.apache.org/jira/browse/PIG-3500
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Affects Versions: tez-branch
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: tez-branch
>
> Attachments: PIG-3500-1.patch
>
>
> Implement TezCompiler that compiles physical plan into tez plan. To begin 
> with, we can implement the initial version that works for basic queries as 
> follows:
> # Load-Filter-Store
> {code}
> a = load 'file:///tmp/input' as (x:int, y:int);
> b = filter a by x > 0;
> c = foreach b generate y;
> store c into 'file:///tmp/output';
> {code}
> # Load-Filter-GroupBy-Store
> {code}
> a = load 'file:///tmp/input' as (x:int, y:int);
> b = group a by x;
> c = foreach b generate group, a;
> store c into 'file:///tmp/output';
> {code}
> # Load1-Load2-Join-Store
> {code}
> a = load 'file:///tmp/input1' as (x:int, y:int);
> b = load 'file:///tmp/input2' as (x:int, z:int);
> c = join a by x, b by x;
> d = foreach c generate a::x as x, y, z;
> store d into 'file:///tmp/output';
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (PIG-3500) Initial implementation of TezCompiler

2013-10-03 Thread Cheolsoo Park (JIRA)
Cheolsoo Park created PIG-3500:
--

 Summary: Initial implementation of TezCompiler
 Key: PIG-3500
 URL: https://issues.apache.org/jira/browse/PIG-3500
 Project: Pig
  Issue Type: Sub-task
  Components: tez
Affects Versions: tez-branch
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: tez-branch


Implement TezCompiler that compiles physical plan into tez plan. To begin with, 
we can implement the initial version that works for basic queries as follows:
# Load-Filter-Store
{code}
a = load 'file:///tmp/input' as (x:int, y:int);
b = filter a by x > 0;
c = foreach b generate y;
store c into 'file:///tmp/output';
{code}
# Load-Filter-GroupBy-Store
{code}
a = load 'file:///tmp/input' as (x:int, y:int);
b = group a by x;
c = foreach b generate group, a;
store c into 'file:///tmp/output';
{code}
# Load1-Load2-Join-Store
{code}
a = load 'file:///tmp/input1' as (x:int, y:int);
b = load 'file:///tmp/input2' as (x:int, z:int);
c = join a by x, b by x;
d = foreach c generate a::x as x, y, z;
store d into 'file:///tmp/output';
{code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3499) Pig job fails when run in local mode with namenode HA(QJM)

2013-10-03 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785702#comment-13785702
 ] 

Prashant Kommireddi commented on PIG-3499:
--

[~kdvenkata] - why is local mode using HA? Are you sure you are running Pig's 
local mode?

> Pig job fails when run in local mode with namenode HA(QJM) 
> ---
>
> Key: PIG-3499
> URL: https://issues.apache.org/jira/browse/PIG-3499
> Project: Pig
>  Issue Type: Bug
>  Components: grunt, parser, tools
>Affects Versions: 0.10.0
>Reporter: venkata kamalnath
>
> When we run a Pig script with namenode HA (QJM), we always get an unknown host 
> exception. The nameserviceID is being treated as a host, so the Pig job fails with 
> an unknown host exception.
> I am working on a fix, but I want the community to validate whether any similar bug 
> has been reported. If not, I will provide the fix as soon as possible.
> The pig script is as below:
> testTable = LOAD 'hdfs://kdvenkata/user/kd/test.csv'
>   USING PigStorage(',')
>   AS (col1:chararray, col2:chararray, col3:int);
> STORE testTable into '/tmp/test_pig_output';
> Exception:
> Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: 
> kdvenkata
> at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:417)
> at 
> org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164)
> at 
> org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:412)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:379)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:123)
> at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2278)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:86)
> at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2312)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2294)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:317)
> at 
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:70)
> at 
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:53)
> at 
> org.apache.pig.builtin.JsonMetadata.findMetaFile(JsonMetadata.java:106)
> at 
> org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:188)
> at org.apache.pig.builtin.PigStorage.getSchema(PigStorage.java:465)
> at 
> org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151)
> at 
> org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:110)
> at 
> org.apache.pig.newplan.logical.relational.LOStore.getSchema(LOStore.java:68)
> at 
> org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:60)
> at 
> org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:84)
> at 
> org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
> at 
> org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
> at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
> at org.apache.pig.PigServer$Graph.compile(PigServer.java:1626)
> at org.apache.pig.PigServer$Graph.compile(PigServer.java:1620)
> at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1343)
> at org.apache.pig.PigServer.storeEx(PigServer.java:960)
> at org.apache.pig.PigServer.store(PigServer.java:928)
> at org.apache.pig.PigServer.openIterator(PigServer.java:841)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (PIG-3499) Pig job fails when run in local mode with namenode HA(QJM)

2013-10-03 Thread venkata kamalnath (JIRA)
venkata kamalnath created PIG-3499:
--

 Summary: Pig job fails when run in local mode with namenode 
HA(QJM) 
 Key: PIG-3499
 URL: https://issues.apache.org/jira/browse/PIG-3499
 Project: Pig
  Issue Type: Bug
  Components: grunt, parser, tools
Affects Versions: 0.10.0
Reporter: venkata kamalnath


When we run a Pig script with namenode HA (QJM), we always get an unknown host 
exception. The nameserviceID is being treated as a host, so the Pig job fails with 
an unknown host exception.

I am working on a fix, but I want the community to validate whether any similar bug 
has been reported. If not, I will provide the fix as soon as possible.

The pig script is as below:

testTable = LOAD 'hdfs://kdvenkata/user/kd/test.csv'
  USING PigStorage(',')
  AS (col1:chararray, col2:chararray, col3:int);
STORE testTable into '/tmp/test_pig_output';


Exception:


Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: kdvenkata
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:417)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:412)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:379)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:123)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2278)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:86)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2312)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2294)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:317)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:70)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:53)
at org.apache.pig.builtin.JsonMetadata.findMetaFile(JsonMetadata.java:106)
at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:188)
at org.apache.pig.builtin.PigStorage.getSchema(PigStorage.java:465)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:151)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:110)
at org.apache.pig.newplan.logical.relational.LOStore.getSchema(LOStore.java:68)
at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:60)
at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:84)
at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.PigServer$Graph.compile(PigServer.java:1626)
at org.apache.pig.PigServer$Graph.compile(PigServer.java:1620)
at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1343)
at org.apache.pig.PigServer.storeEx(PigServer.java:960)
at org.apache.pig.PigServer.store(PigServer.java:928)
at org.apache.pig.PigServer.openIterator(PigServer.java:841)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (PIG-3498) Make pig binary work on both HBase version 0.94 and 0.95

2013-10-03 Thread Jarek Jarcec Cecho (JIRA)
Jarek Jarcec Cecho created PIG-3498:
---

 Summary: Make pig binary work on both HBase version 0.94 and 0.95
 Key: PIG-3498
 URL: https://issues.apache.org/jira/browse/PIG-3498
 Project: Pig
  Issue Type: Task
Affects Versions: 0.11
Reporter: Jarek Jarcec Cecho


HBase 0.95+ support has been added via PIG-3390. While Pig can be compiled 
against both 0.94 and 0.95, due to binary incompatibilities inside HBase, Pig 
compiled against HBase 0.95 can't be used against 0.94 and vice versa.

One of the issues we are facing is the HBase class {{RowFilter}}, whose 
constructor changed between the two HBase releases:

* HBase 0.94: {{RowFilter(CompareOp, WritableByteArrayComparable)}}
* HBase 0.95: {{RowFilter(CompareOp, ByteArrayComparable)}}

We use subclasses of the classes in the second parameter, so the same code 
compiles against both HBase versions. However, as the entire constructor 
signature is saved into the compiled Java class, the generated binaries are 
compatible with only one HBase version.

As we're releasing only one pig binary, it would be useful to make Pig 
compatible with both versions at the same time.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3494) Several fixes for e2e tests

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785655#comment-13785655
 ] 

Hudson commented on PIG-3494:
-

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #123 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/123/])
PIG-3494: Several fixes for e2e tests (daijy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528712)
* /pig/trunk/test/e2e/harness/test_harness.pl
* /pig/trunk/test/e2e/pig/conf/default.conf
* /pig/trunk/test/e2e/pig/conf/rpm.conf
* /pig/trunk/test/e2e/pig/drivers/TestDriverPig.pm
* /pig/trunk/test/e2e/pig/drivers/TestDriverScript.pm
* /pig/trunk/test/e2e/pig/tests/negative.conf
* /pig/trunk/test/e2e/pig/tests/nightly.conf


> Several fixes for e2e tests
> ---
>
> Key: PIG-3494
> URL: https://issues.apache.org/jira/browse/PIG-3494
> Project: Pig
>  Issue Type: Bug
>  Components: e2e harness
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12.0
>
> Attachments: PIG-3494-1.patch
>
>
> Address several issues in e2e tests:
> 1. Adding the capacity to test Pig installed by rpm (also involves 
> configurable piggybank.jar)
> 2. Remove hadoop23.res since it is no longer needed
> 3. Remove hadoop2 specific error message "UdfException_[1-4]" since they are 
> fixed by PIG-3360



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig

2013-10-03 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785616#comment-13785616
 ] 

Daniel Dai commented on PIG-3445:
-

Hi, [~julienledem], I am trying to roll a Pig 0.12.0 RC tomorrow, can we get it 
done by then?

> Make Parquet format available out of the box in Pig
> ---
>
> Key: PIG-3445
> URL: https://issues.apache.org/jira/browse/PIG-3445
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
> Fix For: 0.12.0
>
> Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, 
> PIG-3445.patch
>
>
> We would add the Parquet jar in the Pig packages to make it available out of 
> the box to pig users.
> On top of that we could add the parquet.pig package to the list of packages 
> to search for UDFs. (alternatively, the parquet jar could contain classes 
> named org.apache.pig.builtin.ParquetLoader and ParquetStorer)
> This way users can use Parquet simply by typing:
> A = LOAD 'foo' USING ParquetLoader();
> STORE A INTO 'bar' USING ParquetStorer();



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig

2013-10-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785610#comment-13785610
 ] 

Julien Le Dem commented on PIG-3445:


We merged the PR for parquet-pig-bundle.
I'm making a release so that this can be merged in Pig 0.12.


> Make Parquet format available out of the box in Pig
> ---
>
> Key: PIG-3445
> URL: https://issues.apache.org/jira/browse/PIG-3445
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
> Fix For: 0.12.0
>
> Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, 
> PIG-3445.patch
>
>
> We would add the Parquet jar in the Pig packages to make it available out of 
> the box to pig users.
> On top of that we could add the parquet.pig package to the list of packages 
> to search for UDFs. (alternatively, the parquet jar could contain classes 
> name or.apache.pig.builtin.ParquetLoader and ParquetStorer)
> This way users can use Parquet simply by typing:
> A = LOAD 'foo' USING ParquetLoader();
> STORE A INTO 'bar' USING ParquetStorer();



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (PIG-3497) JobControlCompiler should only do reducer estimation when the job has a reduce phase

2013-10-03 Thread Bill Graham (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Graham updated PIG-3497:
-

Assignee: Akihiro Matsukawa

> JobControlCompiler should only do reducer estimation when the job has a 
> reduce phase
> 
>
> Key: PIG-3497
> URL: https://issues.apache.org/jira/browse/PIG-3497
> Project: Pig
>  Issue Type: Bug
>Reporter: Akihiro Matsukawa
>Assignee: Akihiro Matsukawa
>Priority: Minor
> Attachments: reducer_estimation.patch
>
>
> Currently, JobControlCompiler makes an estimation for the number of reducers 
> required (by default based on input size into mappers) regardless of whether 
> there is a reduce phase in the job. This is unnecessary, especially when 
> running more complex custom reducer estimators. 
> Change to only estimate reducers when necessary.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (PIG-3497) JobControlCompiler should only do reducer estimation when the job has a reduce phase

2013-10-03 Thread Akihiro Matsukawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akihiro Matsukawa updated PIG-3497:
---

Status: Patch Available  (was: Open)

> JobControlCompiler should only do reducer estimation when the job has a 
> reduce phase
> 
>
> Key: PIG-3497
> URL: https://issues.apache.org/jira/browse/PIG-3497
> Project: Pig
>  Issue Type: Bug
>Reporter: Akihiro Matsukawa
>Priority: Minor
> Attachments: reducer_estimation.patch
>
>
> Currently, JobControlCompiler makes an estimation for the number of reducers 
> required (by default based on input size into mappers) regardless of whether 
> there is a reduce phase in the job. This is unnecessary, especially when 
> running more complex custom reducer estimators. 
> Change to only estimate reducers when necessary.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (PIG-3497) JobControlCompiler should only do reducer estimation when the job has a reduce phase

2013-10-03 Thread Akihiro Matsukawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akihiro Matsukawa updated PIG-3497:
---

Attachment: reducer_estimation.patch

> JobControlCompiler should only do reducer estimation when the job has a 
> reduce phase
> 
>
> Key: PIG-3497
> URL: https://issues.apache.org/jira/browse/PIG-3497
> Project: Pig
>  Issue Type: Bug
>Reporter: Akihiro Matsukawa
>Priority: Minor
> Attachments: reducer_estimation.patch
>
>
> Currently, JobControlCompiler makes an estimation for the number of reducers 
> required (by default based on input size into mappers) regardless of whether 
> there is a reduce phase in the job. This is unnecessary, especially when 
> running more complex custom reducer estimators. 
> Change to only estimate reducers when necessary.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (PIG-3497) JobControlCompiler should only do reducer estimation when the job has a reduce phase

2013-10-03 Thread Akihiro Matsukawa (JIRA)
Akihiro Matsukawa created PIG-3497:
--

 Summary: JobControlCompiler should only do reducer estimation when 
the job has a reduce phase
 Key: PIG-3497
 URL: https://issues.apache.org/jira/browse/PIG-3497
 Project: Pig
  Issue Type: Bug
Reporter: Akihiro Matsukawa
Priority: Minor


Currently, JobControlCompiler makes an estimation for the number of reducers 
required (by default based on input size into mappers) regardless of whether 
there is a reduce phase in the job. This is unnecessary, especially when 
running more complex custom reducer estimators. 

Change to only estimate reducers when necessary.
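
As an illustration only (these scripts are not part of the attached patch), a purely map-side pipeline such as the first script below compiles to a map-only job with no reduce phase, so reducer estimation is wasted work there, while the second script introduces a reduce phase via GROUP and still needs the estimate:
{code}
-- map-only job: no reduce phase, so no reducer estimation is needed
a = load 'input' as (x:int, y:int);
b = filter a by x > 0;
store b into 'output_filtered';

-- GROUP introduces a reduce phase, so the estimator still applies here
c = group a by x;
d = foreach c generate group, COUNT(a);
store d into 'output_grouped';
{code}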



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (PIG-3496) Propagate HBase 0.95 jars to the backend

2013-10-03 Thread Jarek Jarcec Cecho (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarek Jarcec Cecho updated PIG-3496:


Attachment: PIG-3496.patch

> Propagate HBase 0.95 jars to the backend
> 
>
> Key: PIG-3496
> URL: https://issues.apache.org/jira/browse/PIG-3496
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Jarek Jarcec Cecho
>Assignee: Jarek Jarcec Cecho
>Priority: Minor
> Fix For: 0.13.0
>
> Attachments: PIG-3496.patch
>
>
> In PIG-3390 we've introduced support for HBase 0.95 that introduced a lot of 
> significant changes to HBase. One of the biggest user facing changes was 
> splitting one uber jar file into multiple independent jars (such as 
> {{hbase-common}}, {{hbase-client}}, ...).  
> {{HBaseStorage}} has [special 
> code|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java#L724]
>  for propagating HBase jar files and important dependencies to the backend. 
> This logic has not been altered to take into account the different HBase jars 
> after the split and as a result the HBase integration with 0.95 is not 
> working in fully distributed mode (it does work in local mode, though).



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (PIG-3496) Propagate HBase 0.95 jars to the backend

2013-10-03 Thread Jarek Jarcec Cecho (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarek Jarcec Cecho updated PIG-3496:


Fix Version/s: 0.13.0
   Status: Patch Available  (was: Open)

> Propagate HBase 0.95 jars to the backend
> 
>
> Key: PIG-3496
> URL: https://issues.apache.org/jira/browse/PIG-3496
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Jarek Jarcec Cecho
>Assignee: Jarek Jarcec Cecho
>Priority: Minor
> Fix For: 0.13.0
>
> Attachments: PIG-3496.patch
>
>
> In PIG-3390 we've introduced support for HBase 0.95 that introduced a lot of 
> significant changes to HBase. One of the biggest user facing changes was 
> splitting one uber jar file into multiple independent jars (such as 
> {{hbase-common}}, {{hbase-client}}, ...).  
> {{HBaseStorage}} has [special 
> code|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java#L724]
>  for propagating HBase jar files and important dependencies to the backend. 
> This logic has not been altered to take into account the different HBase jars 
> after the split and as a result the HBase integration with 0.95 is not 
> working in fully distributed mode (it does work in local mode, though).



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Review Request 14472: PIG-3496 Propagate HBase 0.95 jars to the backend

2013-10-03 Thread Jarek Cecho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14472/
---

Review request for pig.


Bugs: PIG-3496
https://issues.apache.org/jira/browse/PIG-3496


Repository: pig-git


Description
---

I've added the additional required jars into the 
initialiseHBaseClassLoaderResources() method, so that they get propagated into 
the backend.


Diffs
-

  src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java 67aa984 

Diff: https://reviews.apache.org/r/14472/diff/


Testing
---

Unit tests for both hbaseversion = 94 | 95 seem to be passing:

ant clean test -Dtestcase=TestHBaseStorage -Dhbaseversion=94 
-Dprotobuf-java.version=2.4.0a
ant clean test -Dtestcase=TestHBaseStorage -Dhbaseversion=95 
-Dprotobuf-java.version=2.5.0

I've also tried the patch on fully distributed clusters running both major 
HBase versions and everything seems to be working.


Thanks,

Jarek Cecho



[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig

2013-10-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785565#comment-13785565
 ] 

Julien Le Dem commented on PIG-3445:



parquet-format.version should be 1.0.0

> Make Parquet format available out of the box in Pig
> ---
>
> Key: PIG-3445
> URL: https://issues.apache.org/jira/browse/PIG-3445
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
> Fix For: 0.12.0
>
> Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, 
> PIG-3445.patch
>
>
> We would add the Parquet jar in the Pig packages to make it available out of 
> the box to pig users.
> On top of that we could add the parquet.pig package to the list of packages 
> to search for UDFs. (alternatively, the parquet jar could contain classes 
> named org.apache.pig.builtin.ParquetLoader and ParquetStorer)
> This way users can use Parquet simply by typing:
> A = LOAD 'foo' USING ParquetLoader();
> STORE A INTO 'bar' USING ParquetStorer();



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3445) Make Parquet format available out of the box in Pig

2013-10-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785564#comment-13785564
 ] 

Julien Le Dem commented on PIG-3445:


I added a parquet-pig-bundle and the shading of fastutil:
https://github.com/Parquet/parquet-mr/pull/186
We can make a new release to simplify

> Make Parquet format available out of the box in Pig
> ---
>
> Key: PIG-3445
> URL: https://issues.apache.org/jira/browse/PIG-3445
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
> Fix For: 0.12.0
>
> Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, 
> PIG-3445.patch
>
>
> We would add the Parquet jar in the Pig packages to make it available out of 
> the box to pig users.
> On top of that we could add the parquet.pig package to the list of packages 
> to search for UDFs. (alternatively, the parquet jar could contain classes 
> named org.apache.pig.builtin.ParquetLoader and ParquetStorer)
> This way users can use Parquet simply by typing:
> A = LOAD 'foo' USING ParquetLoader();
> STORE A INTO 'bar' USING ParquetStorer();



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (PIG-3496) Propagate HBase 0.95 jars to the backend

2013-10-03 Thread Jarek Jarcec Cecho (JIRA)
Jarek Jarcec Cecho created PIG-3496:
---

 Summary: Propagate HBase 0.95 jars to the backend
 Key: PIG-3496
 URL: https://issues.apache.org/jira/browse/PIG-3496
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Jarek Jarcec Cecho
Assignee: Jarek Jarcec Cecho
Priority: Minor


In PIG-3390 we've introduced support for HBase 0.95 that introduced a lot of 
significant changes to HBase. One of the biggest user facing changes was 
splitting one uber jar file into multiple independent jars (such as 
{{hbase-common}}, {{hbase-client}}, ...).  

{{HBaseStorage}} has [special 
code|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java#L724]
 for propagating HBase jar files and important dependencies to the backend. 
This logic has not been altered to take into account the different HBase jars 
after the split and as a result the HBase integration with 0.95 is not working 
in fully distributed mode (it does work in local mode, though).



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (PIG-3469) Skewed join can cause unrecoverable NullPointerException when one of its inputs is missing.

2013-10-03 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-3469:
-

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks, Jarcec!

> Skewed join can cause unrecoverable NullPointerException when one of its 
> inputs is missing.
> ---
>
> Key: PIG-3469
> URL: https://issues.apache.org/jira/browse/PIG-3469
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
> Environment: Apache Pig version 0.11.0-cdh4.4.0
> Happens in both local execution environment (os x) and cluster environment 
> (linux)
>Reporter: Christon DeWan
>Assignee: Jarek Jarcec Cecho
> Fix For: 0.13.0
>
> Attachments: PIG-3469.patch, PIG-3469.patch, PIG-3469.patch
>
>
> Run this script in the local execution environment (affects cluster mode too):
> {noformat}
> %declare DATA_EXISTS /tmp/test_data_exists.tsv
> %declare DATA_MISSING /tmp/test_data_missing.tsv
> %declare DUMMY `bash -c '(for (( i=0; \$i < 10; i++ )); do echo \$i; done) > 
> /tmp/test_data_exists.tsv; true'`
> exists = LOAD '$DATA_EXISTS' AS (a:long);
> missing = LOAD '$DATA_MISSING' AS (a:long);
> missing = FOREACH ( GROUP missing BY a ) GENERATE $0 AS a, COUNT_STAR($1);
> joined = JOIN exists BY a, missing BY a USING 'skewed';
> STORE joined INTO '/tmp/test_out.tsv';
> {noformat}
> Results in NullPointerException which halts entire pig execution, including 
> unrelated jobs. Expected: only dependencies of the error'd LOAD statement 
> should fail. 
> Error:
> {noformat}
> 2013-09-18 11:42:31,518 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 2017: Internal error creating job configuration.
> 2013-09-18 11:42:31,518 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException:
>  ERROR 2017: Internal error creating job configuration.
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:848)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:294)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:177)
>   at org.apache.pig.PigServer.launchPlan(PigServer.java:1266)
>   at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251)
>   at org.apache.pig.PigServer.execute(PigServer.java:1241)
>   at org.apache.pig.PigServer.executeBatch(PigServer.java:335)
>   at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:137)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
>   at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>   at org.apache.pig.Main.run(Main.java:604)
>   at org.apache.pig.Main.main(Main.java:157)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.adjustNumReducers(JobControlCompiler.java:868)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:480)
>   ... 17 more
> {noformat}
> The script above is as small as I can make it while still reproducing the issue. 
> Removing the group/foreach makes the join fail harmlessly (without stopping 
> Pig execution), as does using the default join. This did not occur on 0.10.1.
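
For illustration only, a hypothetical sketch (not the committed patch) of the kind 
of defensive fallback that keeps a missing skewed-join sample from aborting 
unrelated jobs with a NullPointerException; all names here are made up:

{code}
// Hypothetical sketch -- not the actual PIG-3469 fix. It shows a null guard
// around the reducer adjustment so a missing key-distribution sample falls
// back to the default parallelism instead of throwing.
import java.util.Map;

public class SkewedJoinGuardSketch {
    static int chooseReducers(Map<String, Long> keyDistribution, int defaultParallel) {
        if (keyDistribution == null || keyDistribution.isEmpty()) {
            // Sample input is missing or empty: fall back instead of dereferencing null.
            return defaultParallel;
        }
        return Math.max(defaultParallel, keyDistribution.size());
    }

    public static void main(String[] args) {
        // Prints 1 instead of failing with a NullPointerException.
        System.out.println(chooseReducers(null, 1));
    }
}
{code}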



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (PIG-2315) Make as clause work in generate

2013-10-03 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2315:


Attachment: PIG-2315-1.patch

Fix unit test failures.

> Make as clause work in generate
> ---
>
> Key: PIG-2315
> URL: https://issues.apache.org/jira/browse/PIG-2315
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Gianmarco De Francisci Morales
> Fix For: 0.12.0
>
> Attachments: PIG-2315-1.patch, PIG-2315-1.patch
>
>
> Currently, the following syntax is supported but ignored, causing confusion for 
> users:
> A1 = foreach A1 generate a as a:chararray ;
> After this statement, a simply retains its previous type.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3494) Several fixes for e2e tests

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785466#comment-13785466
 ] 

Hudson commented on PIG-3494:
-

ABORTED: Integrated in Hive-trunk-hadoop2 #472 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/472/])
PIG-3494: Several fixes for e2e tests (daijy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528712)
* /pig/trunk/test/e2e/harness/test_harness.pl
* /pig/trunk/test/e2e/pig/conf/default.conf
* /pig/trunk/test/e2e/pig/conf/rpm.conf
* /pig/trunk/test/e2e/pig/drivers/TestDriverPig.pm
* /pig/trunk/test/e2e/pig/drivers/TestDriverScript.pm
* /pig/trunk/test/e2e/pig/tests/negative.conf
* /pig/trunk/test/e2e/pig/tests/nightly.conf


> Several fixes for e2e tests
> ---
>
> Key: PIG-3494
> URL: https://issues.apache.org/jira/browse/PIG-3494
> Project: Pig
>  Issue Type: Bug
>  Components: e2e harness
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12.0
>
> Attachments: PIG-3494-1.patch
>
>
> Address several issues in e2e tests:
> 1. Add the capability to test Pig installed via rpm (also involves a 
> configurable piggybank.jar)
> 2. Remove hadoop23.res since it is no longer needed
> 3. Remove the hadoop2-specific error messages "UdfException_[1-4]" since they are 
> fixed by PIG-3360



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3082) outputSchema of a UDF allows two usages when describing a Tuple schema

2013-10-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785427#comment-13785427
 ] 

Julien Le Dem commented on PIG-3082:


This is intended.
The second behavior described above is really problematic.
If a UDF breaks because it returns a schema with more than one field, it should be 
changed to return a single field of type tuple. Once fixed, it works in all 
versions of Pig.
This change only removes an unsafe use of outputSchema in favor of the existing 
correct use.
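
A minimal sketch of the correct pattern, using a hypothetical {{MinMax}} UDF that 
returns a two-field tuple; the point is that outputSchema() returns a single 
FieldSchema of type TUPLE wrapping the inner schema, rather than the inner 
fields themselves:

{code}
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.pig.impl.logicalLayer.FrontendException;
import org.apache.pig.impl.logicalLayer.schema.Schema;

// Hypothetical UDF used only to illustrate the "right way" described above.
public class MinMax extends EvalFunc<Tuple> {
    @Override
    public Tuple exec(Tuple input) throws IOException {
        // ... compute and fill a (min, max) tuple from the input ...
        return TupleFactory.getInstance().newTuple(2);
    }

    @Override
    public Schema outputSchema(Schema input) {
        try {
            // Inner schema describing the fields of the returned tuple.
            Schema tupleSchema = new Schema();
            tupleSchema.add(new Schema.FieldSchema("min", DataType.LONG));
            tupleSchema.add(new Schema.FieldSchema("max", DataType.LONG));
            // One top-level field of type TUPLE that carries the inner schema.
            return new Schema(new Schema.FieldSchema("minmax", tupleSchema, DataType.TUPLE));
        } catch (FrontendException e) {
            throw new RuntimeException(e);
        }
    }
}
{code}

With this pattern a describe of the UDF's output shows something like 
minmax:(min:long,max:long) rather than two unrelated top-level fields.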

> outputSchema of a UDF allows two usages when describing a Tuple schema
> --
>
> Key: PIG-3082
> URL: https://issues.apache.org/jira/browse/PIG-3082
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Jonathan Coveney
> Fix For: 0.12.0
>
> Attachments: PIG-3082-0.patch, PIG-3082-1.patch
>
>
> When defining an EvalFunc that returns a Tuple, there are two ways you can 
> implement outputSchema().
> - The right way: return a schema that contains one Field carrying the 
> type and schema of the UDF's return type.
> - The unreliable way: return a schema that contains more than one field; it 
> will be understood as a tuple schema even though there is no type (which 
> lives in the Field class) to say so. This is particularly deceptive when the 
> output schema is derived from the input schema and the resulting Tuple 
> sometimes contains only one field. In such cases Pig understands the output 
> schema as a tuple only when there is more than one field, so sometimes it 
> works and sometimes it does not.
> We should at least issue a warning (for backward compatibility), if not outright 
> throw an exception, when the output schema contains more than one Field.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (PIG-3470) Print configuration variables in grunt

2013-10-03 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3470:


  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

+1. The patch is straightforward enough, and I am fine with committing it to 0.12. 
Patch committed to both branches. Thanks, Lorand!

> Print configuration variables in grunt
> --
>
> Key: PIG-3470
> URL: https://issues.apache.org/jira/browse/PIG-3470
> Project: Pig
>  Issue Type: Improvement
>  Components: grunt
>Reporter: Lorand Bendig
>Assignee: Lorand Bendig
>Priority: Minor
> Fix For: 0.12.0
>
> Attachments: PIG-3470-2.patch, PIG-3470.patch
>
>
> Although parameter handling in grunt is limited by design (PIG-2122), I would 
> sometimes find it useful to be able to list the jobConf properties when 
> testing statements or debugging scripts line by line. This patch extends the
> SET command: as an analogue to Hive, calling it without the key and value 
> parameters prints all the configuration variables. System properties 
> are prefixed with "system:".



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (PIG-3495) Streaming udf e2e tests failures on Windows

2013-10-03 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-3495.
-

  Resolution: Fixed
Hadoop Flags: Reviewed

Patch committed to branch 0.12 and trunk. 

> Streaming udf e2e tests failures on Windows
> ---
>
> Key: PIG-3495
> URL: https://issues.apache.org/jira/browse/PIG-3495
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12.0
>
> Attachments: PIG-3495-1.patch
>
>
> Registering a jython script with an absolute path fails. For example:
> {code}
> register 'D:\scriptingudf.py' using jython as myfuncs;
> a = load 'studenttab10k' using PigStorage() as (name, age:int, gpa:double);
> b = foreach a generate myfuncs.square(age);
> dump b;
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3494) Several fixes for e2e tests

2013-10-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785339#comment-13785339
 ] 

Hudson commented on PIG-3494:
-

FAILURE: Integrated in Hive-trunk-h0.21 #2376 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2376/])
PIG-3494: Several fixes for e2e tests (daijy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1528712)
* /pig/trunk/test/e2e/harness/test_harness.pl
* /pig/trunk/test/e2e/pig/conf/default.conf
* /pig/trunk/test/e2e/pig/conf/rpm.conf
* /pig/trunk/test/e2e/pig/drivers/TestDriverPig.pm
* /pig/trunk/test/e2e/pig/drivers/TestDriverScript.pm
* /pig/trunk/test/e2e/pig/tests/negative.conf
* /pig/trunk/test/e2e/pig/tests/nightly.conf


> Several fixes for e2e tests
> ---
>
> Key: PIG-3494
> URL: https://issues.apache.org/jira/browse/PIG-3494
> Project: Pig
>  Issue Type: Bug
>  Components: e2e harness
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12.0
>
> Attachments: PIG-3494-1.patch
>
>
> Address several issues in e2e tests:
> 1. Add the capability to test Pig installed via rpm (also involves a 
> configurable piggybank.jar)
> 2. Remove hadoop23.res since it is no longer needed
> 3. Remove the hadoop2-specific error messages "UdfException_[1-4]" since they are 
> fixed by PIG-3360



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (PIG-3495) Streaming udf e2e tests failures on Windows

2013-10-03 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785197#comment-13785197
 ] 

Rohini Palaniswamy commented on PIG-3495:
-

Thanks. +1

> Streaming udf e2e tests failures on Windows
> ---
>
> Key: PIG-3495
> URL: https://issues.apache.org/jira/browse/PIG-3495
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12.0
>
> Attachments: PIG-3495-1.patch
>
>
> Registering a jython script with an absolute path fails. For example:
> {code}
> register 'D:\scriptingudf.py' using jython as myfuncs;
> a = load 'studenttab10k' using PigStorage() as (name, age:int, gpa:double);
> b = foreach a generate myfuncs.square(age);
> dump b;
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (PIG-3470) Print configuration variables in grunt

2013-10-03 Thread Lorand Bendig (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lorand Bendig updated PIG-3470:
---

Attachment: PIG-3470-2.patch

The patch has been modified according to your comment.
If accepted, the docs for the set command will need to be updated.

> Print configuration variables in grunt
> --
>
> Key: PIG-3470
> URL: https://issues.apache.org/jira/browse/PIG-3470
> Project: Pig
>  Issue Type: Improvement
>  Components: grunt
>Reporter: Lorand Bendig
>Assignee: Lorand Bendig
>Priority: Minor
> Fix For: 0.12.0
>
> Attachments: PIG-3470-2.patch, PIG-3470.patch
>
>
> Although parameter handling in grunt is limited by design (PIG-2122), I would 
> sometimes find it useful to be able to list the jobConf properties when 
> testing statements or debugging scripts line by line. This patch extends the
> SET command: as an analogue to Hive, calling it without the key and value 
> parameters prints all the configuration variables. System properties 
> are prefixed with "system:".



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (PIG-3445) Make Parquet format available out of the box in Pig

2013-10-03 Thread Lorand Bendig (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lorand Bendig updated PIG-3445:
---

Attachment: PIG-3445-4.patch

[~dvryaboy] Thank you.
Well, yes, ParquetUtil is a general utility, so I merged it into JarManager.

> Make Parquet format available out of the box in Pig
> ---
>
> Key: PIG-3445
> URL: https://issues.apache.org/jira/browse/PIG-3445
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
> Fix For: 0.12.0
>
> Attachments: PIG-3445-2.patch, PIG-3445-3.patch, PIG-3445-4.patch, 
> PIG-3445.patch
>
>
> We would add the Parquet jar to the Pig packages to make it available out of 
> the box to Pig users.
> On top of that, we could add the parquet.pig package to the list of packages 
> searched for UDFs (alternatively, the Parquet jar could contain classes 
> named org.apache.pig.builtin.ParquetLoader and ParquetStorer).
> This way users can use Parquet simply by typing:
> A = LOAD 'foo' USING ParquetLoader();
> STORE A INTO 'bar' USING ParquetStorer();



--
This message was sent by Atlassian JIRA
(v6.1#6144)