[jira] [Updated] (PIG-4086) Fix Orc e2e tests for tez

2014-07-31 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4086:


Attachment: PIG-4086-1.patch

 Fix Orc e2e tests for tez
 -

 Key: PIG-4086
 URL: https://issues.apache.org/jira/browse/PIG-4086
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.14.0

 Attachments: PIG-4086-1.patch


 All Orc e2e tests fail on tez.
 There are two issues:
 1. hivelibdir etc. are not set in tez.conf
 2. OrcStorage produces an empty output file
 Digging into #2, the problem is in this code in PigProcessor:
 {code}
 if (fileOutput.isCommitRequired()) {
     fileOutput.commit();
 }
 {code}
 fileOutput.commit() invokes both RecordWriter.close() and
 committer.commitTask(). However, OrcNewOutputFormat generates the output
 file only when RecordWriter.close() is called (if the output file is small),
 so fileOutput.isCommitRequired() does not detect this file and
 fileOutput.commit() is skipped.
 Changing the code to invoke fileOutput.close() explicitly fixes the issue.
 fileOutput.commit() will invoke close() again, but this has no side effect
 since close() checks whether it has already been called.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (PIG-4086) Fix Orc e2e tests for tez

2014-07-31 Thread Daniel Dai (JIRA)
Daniel Dai created PIG-4086:
---

 Summary: Fix Orc e2e tests for tez
 Key: PIG-4086
 URL: https://issues.apache.org/jira/browse/PIG-4086
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.14.0
 Attachments: PIG-4086-1.patch

All Orc e2e tests fail on tez.

There are two issues:
1. hivelibdir etc. are not set in tez.conf
2. OrcStorage produces an empty output file

Digging into #2, the problem is in this code in PigProcessor:
{code}
if (fileOutput.isCommitRequired()) {
    fileOutput.commit();
}
{code}
fileOutput.commit() invokes both RecordWriter.close() and
committer.commitTask(). However, OrcNewOutputFormat generates the output file
only when RecordWriter.close() is called (if the output file is small), so
fileOutput.isCommitRequired() does not detect this file and
fileOutput.commit() is skipped.

Changing the code to invoke fileOutput.close() explicitly fixes the issue.
fileOutput.commit() will invoke close() again, but this has no side effect since
close() checks whether it has already been called.
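To make the ordering problem concrete, here is a minimal Java model of it. Every class in it (`LazyRecordWriter`, `FileOutput`, `CommitOrderingSketch`) is hypothetical and only mimics the behavior described above: a writer that creates its file only on close, an output whose `isCommitRequired()` looks for that file, and a `close()` guarded so that the second call from `commit()` is a no-op. It is not the Tez, ORC, or PigProcessor code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the PIG-4086 ordering problem; not the real Tez/ORC classes. */
public class CommitOrderingSketch {

    /** Writer that, like the small-file case described above, creates its file only on close(). */
    static class LazyRecordWriter {
        private final List<String> buffered = new ArrayList<>();
        private boolean fileCreated = false;

        void write(String record) { buffered.add(record); }

        void close() {
            // The "file" materializes only here, and only if there is data.
            if (!buffered.isEmpty()) {
                fileCreated = true;
            }
        }

        boolean fileExists() { return fileCreated; }
    }

    /** Stand-in for an output whose commit() closes the writer and commits the task. */
    static class FileOutput {
        private final LazyRecordWriter writer = new LazyRecordWriter();
        private boolean closed = false;
        private boolean committed = false;

        LazyRecordWriter getWriter() { return writer; }

        /** Idempotent close: any call after the first is a no-op. */
        void close() {
            if (closed) {
                return;
            }
            writer.close();
            closed = true;
        }

        /** A commit is needed only if the task produced an output file. */
        boolean isCommitRequired() { return writer.fileExists(); }

        void commit() {
            close();          // safe to call again thanks to the guard in close()
            committed = true; // stands in for committer.commitTask()
        }

        boolean isCommitted() { return committed; }
    }

    /** Original ordering: check first, so the lazily created file is never seen. */
    static boolean runOriginal() {
        FileOutput out = new FileOutput();
        out.getWriter().write("row");
        if (out.isCommitRequired()) {
            out.commit();
        }
        return out.isCommitted();
    }

    /** Patched ordering: close explicitly, then check. */
    static boolean runPatched() {
        FileOutput out = new FileOutput();
        out.getWriter().write("row");
        out.close();
        if (out.isCommitRequired()) {
            out.commit();
        }
        return out.isCommitted();
    }

    public static void main(String[] args) {
        System.out.println("original committed: " + runOriginal());
        System.out.println("patched committed:  " + runPatched());
    }
}
```

Running `main` reports `false` for the original ordering and `true` for the patched one. The sketch models only the symptom; it says nothing about whether a processor should call `close()` itself.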






[jira] [Commented] (PIG-4085) TEZ-1303 broke hadoop 2 compilation in trunk

2014-07-31 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080605#comment-14080605
 ] 

Daniel Dai commented on PIG-4085:
-

+1

 TEZ-1303 broke hadoop 2 compilation in trunk
 

 Key: PIG-4085
 URL: https://issues.apache.org/jira/browse/PIG-4085
 Project: Pig
  Issue Type: Bug
  Components: tez
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.14.0

 Attachments: PIG-4085-1.patch


 {code}
 [javac] /Users/cheolsoop/workspace/pig-apache/src/org/apache/pig/backend/hadoop/executionengine/tez/PartitionerDefinedVertexManager.java:45: error: PartitionerDefinedVertexManager is not abstract and does not override abstract method initialize() in VertexManagerPlugin
 [javac] public class PartitionerDefinedVertexManager extends VertexManagerPlugin {
 [javac] ^
 [javac] /Users/cheolsoop/workspace/pig-apache/src/org/apache/pig/backend/hadoop/executionengine/tez/PartitionerDefinedVertexManager.java:53: error: method does not override or implement a method from a supertype
 [javac] @Override
 [javac] ^
 {code}
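Both errors are what javac reports when an upstream base class gains a new abstract method and the old method that a subclass marked `@Override` disappears. The sketch below uses hypothetical stand-ins (`PluginBase`, `PluginContext`, `MyPlugin`), not the real Tez `VertexManagerPlugin` API. It assumes, as the no-argument `initialize()` in the message suggests, that the context is now supplied some other way, modeled here as a constructor argument.

```java
/** Hypothetical illustration of the compile failure; not the real Tez classes. */
public class AbstractMethodSketch {

    /** Stand-in for the context object a framework hands to a plugin. */
    static class PluginContext {
        private final String name;
        PluginContext(String name) { this.name = name; }
        String getName() { return name; }
    }

    /**
     * Base class after the assumed API change: the context arrives through the
     * constructor, and initialize() takes no arguments and is abstract.
     */
    static abstract class PluginBase {
        private final PluginContext context;
        protected PluginBase(PluginContext context) { this.context = context; }
        protected PluginContext getContext() { return context; }
        public abstract void initialize();
    }

    /**
     * Subclass updated to the new contract. Keeping only the old
     * "@Override public void initialize(PluginContext ctx)" here would
     * reproduce both javac errors quoted above.
     */
    static class MyPlugin extends PluginBase {
        private String initializedName;

        MyPlugin(PluginContext context) { super(context); }

        @Override
        public void initialize() {
            // Read what used to be passed as a parameter from the stored context.
            initializedName = getContext().getName();
        }

        String initializedFor() { return initializedName; }
    }

    public static void main(String[] args) {
        MyPlugin plugin = new MyPlugin(new PluginContext("vertex-1"));
        plugin.initialize();
        System.out.println(plugin.initializedFor());
    }
}
```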





[jira] [Commented] (PIG-4087) Download PIG link is not exist

2014-07-31 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081027#comment-14081027
 ] 

Akira AJISAKA commented on PIG-4087:


Moved to Pig project.

 Download PIG link is not exist 
 ---

 Key: PIG-4087
 URL: https://issues.apache.org/jira/browse/PIG-4087
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Reporter: evgeny
   Original Estimate: 1h
  Remaining Estimate: 1h

 To improve usability, we should add a download link to the main website.
 GitHub provides such a link, so instructions such as:
 svn checkout http://svn.apache.org/repos/asf/pig/trunk/
 or
 git clone https://github.com/apache/pig.git
 can be left to developers who want to join the project.
 Just make me happy by copying the link to the zip file from GitHub.
 Thanks.





[jira] [Moved] (PIG-4087) Download PIG link is not exist

2014-07-31 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA moved HADOOP-10916 to PIG-4087:
-

Component/s: (was: conf)
 documentation
 Issue Type: Improvement  (was: Bug)
Key: PIG-4087  (was: HADOOP-10916)
Project: Pig  (was: Hadoop Common)

 Download PIG link is not exist 
 ---

 Key: PIG-4087
 URL: https://issues.apache.org/jira/browse/PIG-4087
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Reporter: evgeny
   Original Estimate: 1h
  Remaining Estimate: 1h






[jira] [Commented] (PIG-4083) TestAccumuloPigCluster always failed with timeout error

2014-07-31 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081032#comment-14081032
 ] 

Josh Elser commented on PIG-4083:
-

I'll try to look into this, [~fang fang chen]. Any logs or other information 
you have would be helpful.

 TestAccumuloPigCluster always failed with timeout error
 ---

 Key: PIG-4083
 URL: https://issues.apache.org/jira/browse/PIG-4083
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: fang fang chen
Assignee: Josh Elser
Priority: Critical

 TestAccumuloPigCluster always fails with a timeout error.
 Tried with Sun JDK 6 and Sun JDK 7.





[jira] [Assigned] (PIG-4083) TestAccumuloPigCluster always failed with timeout error

2014-07-31 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser reassigned PIG-4083:
---

Assignee: Josh Elser

 TestAccumuloPigCluster always failed with timeout error
 ---

 Key: PIG-4083
 URL: https://issues.apache.org/jira/browse/PIG-4083
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: fang fang chen
Assignee: Josh Elser
Priority: Critical






[jira] [Commented] (PIG-4086) Fix Orc e2e tests for tez

2014-07-31 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081058#comment-14081058
 ] 

Rohini Palaniswamy commented on PIG-4086:
-

[~sseth] told me, when I did the patch to selectively start inputs, that we
should never call the close() methods of inputs or outputs; only the framework
should do that. [~daijy], can you check with him?

 Fix Orc e2e tests for tez
 -

 Key: PIG-4086
 URL: https://issues.apache.org/jira/browse/PIG-4086
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.14.0

 Attachments: PIG-4086-1.patch







[jira] [Commented] (PIG-4083) TestAccumuloPigCluster always failed with timeout error

2014-07-31 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081080#comment-14081080
 ] 

Josh Elser commented on PIG-4083:
-

This is passing for me using Oracle 1.7.0_55. It's possible that the
MiniAccumuloCluster started by the test is failing to start for a variety
of reasons (lack of memory is probably the most common). I can provide a quick
patch that adds some extra logging if you want to help me debug this.

 TestAccumuloPigCluster always failed with timeout error
 ---

 Key: PIG-4083
 URL: https://issues.apache.org/jira/browse/PIG-4083
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: fang fang chen
Assignee: Josh Elser
Priority: Critical






[jira] [Updated] (PIG-4085) TEZ-1303 broke hadoop 2 compilation in trunk

2014-07-31 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-4085:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Daniel for the review!

 TEZ-1303 broke hadoop 2 compilation in trunk
 

 Key: PIG-4085
 URL: https://issues.apache.org/jira/browse/PIG-4085
 Project: Pig
  Issue Type: Bug
  Components: tez
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.14.0

 Attachments: PIG-4085-1.patch







Re: An optimization for ROLLUP Operation on Apache Pig with 50% faster [PIG-4066]

2014-07-31 Thread Cheolsoo Park
This is due to hadoop 1 and 2 incompatibility. Did you compile Pig with
-Dhadoopversion=23?
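A `NoSuchMethodError` like the one quoted below is a linkage error: the JVM resolves a call by the owning class, the method name, and its descriptor (parameter types plus return type), so Pig classes compiled against one Hadoop line can fail against the other even when a method of that name exists. As a hypothetical diagnostic, not part of Pig or Hadoop, the helper below checks whether a class exposes a public method with exactly a given name, return type, and parameter list.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

/** Hypothetical helper for checking an exact method signature at runtime. */
public class DescriptorCheck {

    /** True if clazz has a public method with exactly this name, return type and parameters. */
    static boolean hasExactMethod(Class<?> clazz, String name, Class<?> returnType, Class<?>... params) {
        // getMethods() includes public methods inherited from superclasses and interfaces.
        for (Method m : clazz.getMethods()) {
            if (m.getName().equals(name)
                    && m.getReturnType().equals(returnType)
                    && Arrays.equals(m.getParameterTypes(), params)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // JDK example: String.valueOf(int) returns String, so only the first check passes.
        System.out.println(hasExactMethod(String.class, "valueOf", String.class, int.class));
        System.out.println(hasExactMethod(String.class, "valueOf", Object.class, int.class));
    }
}
```

On the cluster's classpath, the descriptor from the trace would correspond to `hasExactMethod(Class.forName("org.apache.hadoop.mapred.jobcontrol.JobControl"), "addJob", String.class, Class.forName("org.apache.hadoop.mapred.jobcontrol.Job"))`; a `false` result would be consistent with the Hadoop 1/2 mismatch.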


On Thu, Jul 31, 2014 at 9:13 AM, Quang-Nhat HOANG-XUAN 
hxquangn...@gmail.com wrote:

 Hi,
 I've rebased my patch to trunk, but I cannot run it on our cluster.
 Our hadoop version is 2.0.0-cdh4.4.0.
 Do you know why this happened?

 This is the error log from Pig Stack Trace:

 ERROR 2998: Unhandled internal error.
 org.apache.hadoop.mapred.jobcontrol.JobControl.addJob(Lorg/apache/hadoop/mapred/jobcontrol/Job;)Ljava/lang/String;

 java.lang.NoSuchMethodError: org.apache.hadoop.mapred.jobcontrol.JobControl.addJob(Lorg/apache/hadoop/mapred/jobcontrol/Job;)Ljava/lang/String;
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:327)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:195)
     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:279)
     at org.apache.pig.PigServer.launchPlan(PigServer.java:1378)
     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1363)
     at org.apache.pig.PigServer.execute(PigServer.java:1352)
     at org.apache.pig.PigServer.executeBatch(PigServer.java:403)
     at org.apache.pig.PigServer.executeBatch(PigServer.java:386)
     at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:170)
     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:233)
     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:204)
     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
     at org.apache.pig.Main.run(Main.java:620)
     at org.apache.pig.Main.main(Main.java:168)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

 Thank you.

 Quang-Nhat


 On Thu, Jul 24, 2014 at 10:32 PM, Quang-Nhat HOANG-XUAN 
 hxquangn...@gmail.com wrote:

  Very helpful comments.
  I will fix them up.
 
  Thank you
 
  Quang-Nhat
 
 
  On Thu, Jul 24, 2014 at 9:54 PM, Cheolsoo Park piaozhe...@gmail.com
  wrote:
 
  I added some comments to the review board. I love the idea, but the patch
  needs to be cleaned up to get committed. Also, please provide a patch that
  is based on trunk; otherwise it's not easy to test.
 
  Thanks!
  Cheolsoo
 
 
 
 
  On Thu, Jul 24, 2014 at 5:06 AM, Nhat Hoang hxquangn...@gmail.com
  wrote:
 
   Hello everyone,

   I am currently a master's student at EURECOM (www.eurecom.fr). I am working
   on a project related to Apache Pig in the context of an EU-funded project,
   Bigfoot (www.bigfootproject.eu).

   Based on our previous work, “Duy-Hung Phan, Matteo Dell’Amico, Pietro
   Michiardi: On the design space of MapReduce ROLLUP aggregates”
   (http://www.eurecom.fr/en/publication/4212/download/rs-publi-4212_2.pdf), I
   am working on a new family of algorithms to address some limitations of the
   current ROLLUP operator in Apache Pig: the IRG (in-reducer grouping), the
   hybrid IRG, and chained-IRG. I have an implementation that indicates
   superior performance to the existing ROLLUP implementation.

   You can find more information on this work here:
   https://issues.apache.org/jira/browse/PIG-4066. I've also created a review
   request on the review board: https://reviews.apache.org/r/23804/
   It would be very helpful if someone could review this patch and give some
   feedback.

   Looking forward to your feedback.

   Regards,
   Quang-Nhat HOANG-XUAN
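The IRG family is only named here, not specified. As a generic illustration related to the idea of grouping inside the reducer, here is a minimal sketch of one well-known principle: once rows arrive sorted by all grouping columns, every ROLLUP level can be aggregated in a single pass by tracking how long a key prefix is shared with the previous row. The names (`SortedRollupSketch`, `rollup`) are hypothetical, and this is not the algorithm from the PIG-4066 patch or the cited paper.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-pass ROLLUP over rows sorted by all grouping columns. */
public class SortedRollupSketch {

    /**
     * rows must be sorted lexicographically by their key columns; values[i] is
     * the measure for rows[i]. Emits one line per group at every rollup level,
     * with "*" marking rolled-up columns.
     */
    static List<String> rollup(String[][] rows, long[] values) {
        List<String> out = new ArrayList<>();
        if (rows.length == 0) {
            return out;
        }
        int n = rows[0].length;
        // sums[l] is the running total of the current group at level l,
        // where level l keeps the first l key columns (level 0 = grand total).
        long[] sums = new long[n + 1];
        String[] current = rows[0];
        for (int i = 0; i < rows.length; i++) {
            String[] row = rows[i];
            // Length of the key prefix shared with the previous row.
            int common = 0;
            while (common < n && row[common].equals(current[common])) {
                common++;
            }
            // Every level deeper than the shared prefix has just ended: emit it.
            for (int l = n; l > common; l--) {
                out.add(format(current, l, sums[l]));
                sums[l] = 0;
            }
            current = row;
            for (int l = 0; l <= n; l++) {
                sums[l] += values[i];
            }
        }
        // Flush the last group at every level, including the grand total.
        for (int l = n; l >= 0; l--) {
            out.add(format(current, l, sums[l]));
        }
        return out;
    }

    /** Renders the first `level` key columns and replaces the rest with "*". */
    private static String format(String[] keys, int level, long sum) {
        StringBuilder sb = new StringBuilder("(");
        for (int j = 0; j < keys.length; j++) {
            if (j > 0) {
                sb.append(',');
            }
            sb.append(j < level ? keys[j] : "*");
        }
        return sb.append(")=").append(sum).toString();
    }

    public static void main(String[] args) {
        String[][] rows = { {"a", "x"}, {"a", "x"}, {"a", "y"}, {"b", "x"} };
        long[] values = {1, 2, 3, 4};
        for (String line : rollup(rows, values)) {
            System.out.println(line);
        }
    }
}
```

For the sorted rows in `main`, this prints `(a,x)=3`, `(a,y)=3`, `(a,*)=6`, `(b,x)=4`, `(b,*)=4` and `(*,*)=10`, one per line. The single pass works only because sorting places every finer group contiguously inside its coarser group.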
  
 
 
 



Re: An optimization for ROLLUP Operation on Apache Pig with 50% faster [PIG-4066]

2014-07-31 Thread Quang-Nhat HOANG-XUAN
Yes, I did. On the previous trunk (r1579421), it worked perfectly when I
compiled with -Dhadoopversion=23.

Quang-Nhat


On Thu, Jul 31, 2014 at 6:46 PM, Cheolsoo Park piaozhe...@gmail.com wrote:

 This is due to hadoop 1 and 2 incompatibility. Did you compile Pig with
 -Dhadoopversion=23?





[jira] [Created] (PIG-4088) TEZ-1346 breaks hadoop 2 compilation in trunk

2014-07-31 Thread Cheolsoo Park (JIRA)
Cheolsoo Park created PIG-4088:
--

 Summary: TEZ-1346 breaks hadoop 2 compilation in trunk
 Key: PIG-4088
 URL: https://issues.apache.org/jira/browse/PIG-4088
 Project: Pig
  Issue Type: Bug
  Components: tez
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.14.0


TEZ-1346 has not been published to the Apache snapshot repo yet, but once it is,
it will break Pig trunk:
{code}
[javac] /Users/cheolsoop/workspace/pig-stash/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:67: error: PigProcessor is not abstract and does not override abstract method initialize() in Processor
[javac] public class PigProcessor implements LogicalIOProcessor {
[javac] ^
[javac] /Users/cheolsoop/workspace/pig-stash/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:102: error: method does not override or implement a method from a supertype
[javac] @Override
[javac] ^
{code}





[jira] [Updated] (PIG-4088) TEZ-1346 breaks hadoop 2 compilation in trunk

2014-07-31 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-4088:
---

Attachment: PIG-4088-1.patch

Uploading a patch.

 TEZ-1346 breaks hadoop 2 compilation in trunk
 -

 Key: PIG-4088
 URL: https://issues.apache.org/jira/browse/PIG-4088
 Project: Pig
  Issue Type: Bug
  Components: tez
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.14.0

 Attachments: PIG-4088-1.patch







[jira] [Created] (PIG-4089) TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in Hadoop 1

2014-07-31 Thread Cheolsoo Park (JIRA)
Cheolsoo Park created PIG-4089:
--

 Summary: TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in Hadoop 1
 Key: PIG-4089
 URL: https://issues.apache.org/jira/browse/PIG-4089
 Project: Pig
  Issue Type: Bug
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.14.0


The job fails with the following error in *Hadoop 1* local mode-
{code}
2014-07-31 05:55:06,630 [Thread-75] WARN  org.apache.hadoop.mapred.LocalJobRunner  - job_local_0021
java.io.IOException: Illegal partition for Null: false index: 0 5 (1)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:121)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
{code}
This is because Hadoop 1 doesn't support multiple reducers in local mode.
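The message format suggests a bounds check on the index returned by the partitioner: a map output partition has to lie in `[0, numReduceTasks)`. Assuming the `(1)` at the end of the message is that index, a job planned with a parallelism of 3 but run with the single reducer of Hadoop 1 local mode would fail in exactly this way. The sketch below is a hypothetical model of that check, not the Hadoop source. `hashPartition` uses the same sign-masking formula as Hadoop's `HashPartitioner`, while `fixedParallelismPartition` stands in for any partitioner that assumes a reducer count instead of reading it.

```java
/** Hypothetical model of the partition-range check; not the Hadoop source. */
public class PartitionRangeSketch {

    /** Parallelism the hypothetical partitioner assumes, as in the failing test. */
    static final int ASSUMED_PARALLELISM = 3;

    /** A partition index is valid only in [0, numReduceTasks). */
    static void checkPartition(Object key, int partition, int numReduceTasks) {
        if (partition < 0 || partition >= numReduceTasks) {
            throw new IllegalArgumentException(
                    "Illegal partition for " + key + " (" + partition + ")");
        }
    }

    /** Honors the reducer count it is given; masking the sign bit keeps negative hashes in range. */
    static int hashPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Ignores the actual reducer count and assumes ASSUMED_PARALLELISM instead. */
    static int fixedParallelismPartition(Object key) {
        return (key.hashCode() & Integer.MAX_VALUE) % ASSUMED_PARALLELISM;
    }

    public static void main(String[] args) {
        int localModeReducers = 1;   // Hadoop 1 local mode runs a single reducer
        Object key = 4;              // Integer.hashCode() of 4 is 4
        checkPartition(key, hashPartition(key, localModeReducers), localModeReducers);
        System.out.println("hash partition accepted");
        try {
            checkPartition(key, fixedParallelismPartition(key), localModeReducers);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` accepts the first partition and then prints `Illegal partition for 4 (1)`. Lowering the parallelism to 1 removes the mismatch between the assumed and actual reducer counts.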



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PIG-4089) TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in Hadoop 1

2014-07-31 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-4089:
---

Attachment: PIG-4089-1.patch

Attaching a patch that changes parallelism of reducers from 3 to 1 so that the 
test case will pass in both hadoop 1 and 2. I don't think there is any reason 
why the parallelism needs to be greater than 1 in the test case.

 TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in 
 Hadoop 1
 --

 Key: PIG-4089
 URL: https://issues.apache.org/jira/browse/PIG-4089
 Project: Pig
  Issue Type: Bug
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.14.0

 Attachments: PIG-4089-1.patch







[jira] [Updated] (PIG-4089) TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in Hadoop 1

2014-07-31 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-4089:
---

Status: Patch Available  (was: Open)

 TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in 
 Hadoop 1
 --

 Key: PIG-4089
 URL: https://issues.apache.org/jira/browse/PIG-4089
 Project: Pig
  Issue Type: Bug
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.14.0

 Attachments: PIG-4089-1.patch







[jira] [Commented] (PIG-4088) TEZ-1346 breaks hadoop 2 compilation in trunk

2014-07-31 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081225#comment-14081225
 ] 

Daniel Dai commented on PIG-4088:
-

+1

 TEZ-1346 breaks hadoop 2 compilation in trunk
 -

 Key: PIG-4088
 URL: https://issues.apache.org/jira/browse/PIG-4088
 Project: Pig
  Issue Type: Bug
  Components: tez
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.14.0

 Attachments: PIG-4088-1.patch







[jira] [Commented] (PIG-4089) TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in Hadoop 1

2014-07-31 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081222#comment-14081222
 ] 

Daniel Dai commented on PIG-4089:
-

+1

 TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in 
 Hadoop 1
 --

 Key: PIG-4089
 URL: https://issues.apache.org/jira/browse/PIG-4089
 Project: Pig
  Issue Type: Bug
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.14.0

 Attachments: PIG-4089-1.patch







[jira] [Updated] (PIG-4089) TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in Hadoop 1

2014-07-31 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-4089:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thank you Daniel for the review!

 TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in Hadoop 1
 --

 Key: PIG-4089
 URL: https://issues.apache.org/jira/browse/PIG-4089
 Project: Pig
  Issue Type: Bug
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.14.0

 Attachments: PIG-4089-1.patch







[jira] [Created] (PIG-4090) TEZ-1346 broke hadoop 2 compilation in trunk

2014-07-31 Thread Koji Noguchi (JIRA)
Koji Noguchi created PIG-4090:
-

 Summary: TEZ-1346 broke hadoop 2 compilation in trunk
 Key: PIG-4090
 URL: https://issues.apache.org/jira/browse/PIG-4090
 Project: Pig
  Issue Type: Bug
  Components: tez
Affects Versions: 0.14.0
Reporter: Koji Noguchi
Priority: Trivial


{noformat}
[javac] /Users/knoguchi/git/pig/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:67: error: PigProcessor is not abstract and does not override abstract method initialize() in Processor
[javac] public class PigProcessor implements LogicalIOProcessor {
[javac]^
[javac] /Users/knoguchi/git/pig/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:102: error: method does not override or implement a method from a supertype
[javac] @Override
[javac] ^
[javac]
{noformat}
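The two javac errors suggest that, after TEZ-1346, `Processor` declares a no-argument `initialize()`, while the method at line 102 still carries `@Override` for a signature the interface no longer has. The sketch below uses hypothetical stand-in interfaces (`ProcessorStub`, `ProcessorContextStub`), not the real Tez types, and assumes the context is now handed over through the constructor instead of through `initialize`. It only illustrates how an implementing class adapts to such an interface change.

```java
// Hypothetical stand-ins for the Tez interfaces; these are NOT the real org.apache.tez types.
final class TezApiChangeSketch {

    interface ProcessorContextStub {
        String getTaskName();
    }

    // Shape implied by the javac error: Processor now declares a no-argument initialize().
    interface ProcessorStub {
        void initialize() throws Exception;
        void close() throws Exception;
    }

    static final class PigProcessorSketch implements ProcessorStub {
        private final ProcessorContextStub context; // assumption: context now arrives via the constructor
        private boolean initialized;

        PigProcessorSketch(ProcessorContextStub context) {
            this.context = context;
        }

        // Keeping @Override on an old signature such as initialize(SomeContext ctx) is what
        // triggers "method does not override or implement a method from a supertype".
        @Override
        public void initialize() {
            initialized = true;
        }

        @Override
        public void close() {
            initialized = false;
        }

        boolean isInitialized() { return initialized; }

        String taskName() { return context.getTaskName(); }
    }

    public static void main(String[] args) {
        PigProcessorSketch processor = new PigProcessorSketch(() -> "task-1");
        processor.initialize();
        System.out.println(processor.taskName() + " initialized=" + processor.isInitialized());
    }
}
```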





[jira] [Updated] (PIG-4090) TEZ-1346 broke hadoop 2 compilation in trunk

2014-07-31 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-4090:
--

Attachment: pig-4090-v01.txt

I don't understand Tez at all, but I changed the code to fit the patch in 
TEZ-1346.

 TEZ-1346 broke hadoop 2 compilation in trunk
 

 Key: PIG-4090
 URL: https://issues.apache.org/jira/browse/PIG-4090
 Project: Pig
  Issue Type: Bug
  Components: tez
Affects Versions: 0.14.0
Reporter: Koji Noguchi
Priority: Trivial
 Attachments: pig-4090-v01.txt







[jira] [Updated] (PIG-4088) TEZ-1346 breaks hadoop 2 compilation in trunk

2014-07-31 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-4088:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk.

 TEZ-1346 breaks hadoop 2 compilation in trunk
 -

 Key: PIG-4088
 URL: https://issues.apache.org/jira/browse/PIG-4088
 Project: Pig
  Issue Type: Bug
  Components: tez
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.14.0

 Attachments: PIG-4088-1.patch


 TEZ-1346 has not been published to the Apache snapshot repo yet, but once it is, it 
 will break Pig trunk:
 {code}
 [javac] /Users/cheolsoop/workspace/pig-stash/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:67: error: PigProcessor is not abstract and does not override abstract method initialize() in Processor
 [javac] public class PigProcessor implements LogicalIOProcessor {
 [javac]^
 [javac] /Users/cheolsoop/workspace/pig-stash/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:102: error: method does not override or implement a method from a supertype
 [javac] @Override
 [javac] ^
 {code}





[jira] [Resolved] (PIG-4090) TEZ-1346 broke hadoop 2 compilation in trunk

2014-07-31 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park resolved PIG-4090.


Resolution: Duplicate

[~knoguchi], I committed PIG-4088. The identical patch. Hope we will no longer 
have to chase Tez changes soon.

 TEZ-1346 broke hadoop 2 compilation in trunk
 

 Key: PIG-4090
 URL: https://issues.apache.org/jira/browse/PIG-4090
 Project: Pig
  Issue Type: Bug
  Components: tez
Affects Versions: 0.14.0
Reporter: Koji Noguchi
Priority: Trivial
 Attachments: pig-4090-v01.txt







[jira] [Commented] (PIG-4090) TEZ-1346 broke hadoop 2 compilation in trunk

2014-07-31 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081360#comment-14081360
 ] 

Koji Noguchi commented on PIG-4090:
---

bq. The identical patch. Hope we will no longer have to chase Tez changes soon.

Ah, thanks.  I hope so too~.

 TEZ-1346 broke hadoop 2 compilation in trunk
 

 Key: PIG-4090
 URL: https://issues.apache.org/jira/browse/PIG-4090
 Project: Pig
  Issue Type: Bug
  Components: tez
Affects Versions: 0.14.0
Reporter: Koji Noguchi
Priority: Trivial
 Attachments: pig-4090-v01.txt







[jira] [Updated] (PIG-3760) Predicate pushdown for columnar file formats

2014-07-31 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3760:


Summary: Predicate pushdown for columnar file formats  (was: Predicate pushdown for ORC and Parquet)

 Predicate pushdown for columnar file formats
 

 Key: PIG-3760
 URL: https://issues.apache.org/jira/browse/PIG-3760
 Project: Pig
  Issue Type: New Feature
Reporter: Andrew Musselman
Assignee: Rohini Palaniswamy
 Fix For: 0.14.0


 From the conversation on dev@pig:
 Partition pruning for ORC is not addressed in PIG-3558. We will need
 to do partition pruning for both ORC and Parquet in a new ticket.
 Currently there is no interface to deal with this kind of pushdown.
 LoadMetadata.setPartitionFilter pushes the filter to the loader but removes
 the filter statement from the plan; for ORC/Parquet the filter is only a hint,
 and we need to apply the filter again in Pig even if it is pushed to the
 loader. So we will need to define a new interface for that. You are welcome
 to initiate the work. I know Aniket is also interested in doing that, so be
 sure to talk with him about this work.
 Thanks,
 Daniel
 On Mon, Feb 10, 2014 at 11:42 AM, Andrew Musselman
 andrew.mussel...@gmail.com wrote:
  I had a chat with a couple people last week about a feature request for
  Pig:  in a where or filter clause, when loading an ORC file, to skip
  directly to the right offset instead of scanning the whole file.
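 Andrew's request (skip to the right offset instead of scanning the whole file) and Daniel's point that the pushed filter is only a hint can be illustrated with a simplified model. The sketch assumes each block carries min/max statistics for a single int column (columnar formats such as ORC keep comparable statistics). The loader skips blocks that cannot match, and Pig still re-applies the filter because a surviving block may contain non-matching rows. `StatsSkippingSketch` and its methods are hypothetical, not the ORC reader or Pig's `LoadMetadata` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Simplified model for illustration only; not ORC or Pig code.
final class StatsSkippingSketch {

    // A block of values for one int column plus its min/max statistics.
    static final class Block {
        final int min;
        final int max;
        final int[] values;

        Block(int... values) {
            this.values = values;
            int lo = Integer.MAX_VALUE;
            int hi = Integer.MIN_VALUE;
            for (int v : values) {
                lo = Math.min(lo, v);
                hi = Math.max(hi, v);
            }
            this.min = lo;
            this.max = hi;
        }
    }

    // Loader side, for the pushed predicate x > threshold: only max is needed to prune.
    static List<Integer> loadWithHint(List<Block> blocks, int threshold) {
        List<Integer> rows = new ArrayList<>();
        for (Block b : blocks) {
            if (b.max <= threshold) {
                continue; // no value in this block can match: skip it without reading
            }
            for (int v : b.values) {
                rows.add(v); // a surviving block is returned whole, non-matching rows included
            }
        }
        return rows;
    }

    // Pig side: because the pushed filter is only a hint, the FILTER still runs.
    static List<Integer> reapplyFilter(List<Integer> rows, IntPredicate predicate) {
        List<Integer> out = new ArrayList<>();
        for (int v : rows) {
            if (predicate.test(v)) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(new Block(1, 5, 9), new Block(20, 3, 40), new Block(2, 4));
        List<Integer> loaded = loadWithHint(blocks, 10);
        System.out.println(loaded);                             // [20, 3, 40]
        System.out.println(reapplyFilter(loaded, v -> v > 10)); // [20, 40]
    }
}
```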





[jira] [Updated] (PIG-3760) Predicate pushdown for columnar file formats

2014-07-31 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3760:


Assignee: (was: Rohini Palaniswamy)

 Predicate pushdown for columnar file formats
 

 Key: PIG-3760
 URL: https://issues.apache.org/jira/browse/PIG-3760
 Project: Pig
  Issue Type: New Feature
Reporter: Andrew Musselman
 Fix For: 0.14.0







[jira] [Created] (PIG-4091) Predicate pushdown for ORC

2014-07-31 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created PIG-4091:
---

 Summary: Predicate pushdown for ORC
 Key: PIG-4091
 URL: https://issues.apache.org/jira/browse/PIG-4091
 Project: Pig
  Issue Type: Sub-task
Reporter: Rohini Palaniswamy








[jira] [Created] (PIG-4092) Predicate pushdown for Parquet

2014-07-31 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created PIG-4092:
---

 Summary: Predicate pushdown for Parquet
 Key: PIG-4092
 URL: https://issues.apache.org/jira/browse/PIG-4092
 Project: Pig
  Issue Type: Sub-task
Reporter: Rohini Palaniswamy








[jira] [Created] (PIG-4093) Predicate pushdown to support removing filters from pig plan

2014-07-31 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created PIG-4093:
---

 Summary: Predicate pushdown to support removing filters from pig plan
 Key: PIG-4093
 URL: https://issues.apache.org/jira/browse/PIG-4093
 Project: Pig
  Issue Type: Sub-task
Reporter: Rohini Palaniswamy


It is possible for loaders to evaluate the pushed filter conditions themselves. 
In that case it is not necessary to retain the filter conditions in the Pig plan. 
So we need to support two modes:
1) Filter conditions are pushed into the loader but also retained in the Pig plan, 
as the loader might only do best-effort filtering based on block metadata.
2) Filter conditions are pushed into the loader and removed from the Pig plan when 
the loader can evaluate the expression itself and filter out records. In this 
case, the loader can do lazy deserialization and avoid deserializing the full 
record.
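The two modes could be sketched as follows. All names (`PushdownMode`, `retainFilterInPlan`, `LazyRecord`, `loadExact`) are hypothetical and not part of Pig. A record is modeled as a comma-separated string whose first field is the filter column, to show how mode 2 lets the loader test one column before paying for full deserialization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical names throughout; this is not the PIG-4093 API.
final class PushdownModeSketch {

    enum PushdownMode {
        BEST_EFFORT, // mode 1: loader prunes with block metadata, Pig keeps its FILTER
        EXACT        // mode 2: loader evaluates the predicate per record, Pig drops its FILTER
    }

    static boolean retainFilterInPlan(PushdownMode mode) {
        return mode == PushdownMode.BEST_EFFORT;
    }

    // A raw record whose fields are deserialized only on demand.
    static final class LazyRecord {
        private final String raw;        // e.g. "id,name,payload"; assumed to contain a comma
        private boolean decoded = false; // set once the full record has been deserialized

        LazyRecord(String raw) { this.raw = raw; }

        // Reads only the leading id field without splitting the whole record.
        int id() { return Integer.parseInt(raw.substring(0, raw.indexOf(','))); }

        String[] decodeFully() {
            decoded = true;
            return raw.split(",");
        }

        boolean isDecoded() { return decoded; }
    }

    // Mode 2: test the filter column first, decode the rest only for rows that pass.
    static List<String[]> loadExact(List<LazyRecord> records, Predicate<Integer> idPredicate) {
        List<String[]> out = new ArrayList<>();
        for (LazyRecord r : records) {
            if (idPredicate.test(r.id())) {
                out.add(r.decodeFully());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<LazyRecord> recs = List.of(new LazyRecord("1,a,x"), new LazyRecord("7,b,y"), new LazyRecord("9,c,z"));
        List<String[]> rows = loadExact(recs, id -> id > 5);
        long decoded = recs.stream().filter(LazyRecord::isDecoded).count();
        System.out.println(rows.size() + " rows, " + decoded + " full decodes"); // 2 rows, 2 full decodes
        System.out.println("keep FILTER in plan: " + retainFilterInPlan(PushdownMode.EXACT)); // false
    }
}
```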





[jira] [Created] (PIG-4094) Predicate pushdown to support complex data types

2014-07-31 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created PIG-4094:
---

 Summary: Predicate pushdown to support complex data types
 Key: PIG-4094
 URL: https://issues.apache.org/jira/browse/PIG-4094
 Project: Pig
  Issue Type: Sub-task
Reporter: Rohini Palaniswamy
 Fix For: 0.14.0


  Parquet has support for pushing predicates on tuples, maps and bags, according 
to [~aniket486]. ORC currently supports only primitives, but will add support 
for structs (tuples) in the future. The API needs to be there even if it is not 
implemented yet, as it will be hard to change the interface once released.
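A minimal sketch of why the API can expose complex types before every loader implements them: each loader advertises which container types it can push predicates into, and a predicate on a nested field is pushed only if every container on its path is supported. The ORC-like and Parquet-like loaders mirror the capabilities stated above; all types and methods are hypothetical stand-ins, not Pig's interface.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical API shape; names are illustrative, not Pig's LoadPredicatePushdown.
final class ComplexTypePushdownSketch {

    enum FieldType { INT, LONG, CHARARRAY, TUPLE, MAP, BAG }

    // Each loader advertises the container types it can push predicates into.
    interface PushdownCapable {
        Set<FieldType> supportedContainerTypes();
    }

    static final class OrcLikeLoader implements PushdownCapable {
        @Override
        public Set<FieldType> supportedContainerTypes() {
            return EnumSet.noneOf(FieldType.class); // primitives only, for now
        }
    }

    static final class ParquetLikeLoader implements PushdownCapable {
        @Override
        public Set<FieldType> supportedContainerTypes() {
            return EnumSet.of(FieldType.TUPLE, FieldType.MAP, FieldType.BAG);
        }
    }

    // A predicate on e.g. user.age (a field inside a tuple) has path [TUPLE];
    // a top-level primitive column has an empty path and is always pushable.
    static boolean canPush(PushdownCapable loader, List<FieldType> containersOnPath) {
        return loader.supportedContainerTypes().containsAll(containersOnPath);
    }

    public static void main(String[] args) {
        List<FieldType> userAge = List.of(FieldType.TUPLE);
        System.out.println(canPush(new OrcLikeLoader(), userAge));     // false
        System.out.println(canPush(new ParquetLikeLoader(), userAge)); // true
        System.out.println(canPush(new OrcLikeLoader(), List.of()));   // true
    }
}
```

With this shape, adding struct support to ORC later would only change what `OrcLikeLoader` returns, not the interface itself.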





[jira] [Created] (PIG-4095) Collapse multiple OR conditions to IN and BETWEEN

2014-07-31 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created PIG-4095:
---

 Summary: Collapse multiple OR conditions to IN and BETWEEN
 Key: PIG-4095
 URL: https://issues.apache.org/jira/browse/PIG-4095
 Project: Pig
  Issue Type: Sub-task
Reporter: Rohini Palaniswamy


  ORC predicate pushdown supports the IN and BETWEEN operators. Pig needs equivalent 
expressions.
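A sketch of the collapsing rewrite over a hypothetical expression model (not Pig's logical-plan classes): equality disjuncts on the same column are grouped and turned into `IN`. The ticket does not say which patterns should become `BETWEEN`; this sketch assumes that, for an int column, equalities covering a contiguous run of values can be written as `BETWEEN lo AND hi`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical expression model; these are not Pig's logical-plan expression classes.
final class OrCollapseSketch {

    // One disjunct of the form `column == value` on an int column.
    static final class Eq {
        final String column;
        final int value;
        Eq(String column, int value) { this.column = column; this.value = value; }
    }

    // Groups the equality disjuncts of one OR chain by column and rewrites each group:
    //   single value          -> column == v
    //   contiguous int values -> column BETWEEN lo AND hi   (assumption, see lead-in)
    //   anything else         -> column IN (v1, v2, ...)
    // The returned terms are still combined with OR, one term per column.
    static List<String> collapse(List<Eq> disjuncts) {
        Map<String, TreeSet<Integer>> byColumn = new TreeMap<>();
        for (Eq e : disjuncts) {
            byColumn.computeIfAbsent(e.column, k -> new TreeSet<>()).add(e.value);
        }
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, TreeSet<Integer>> entry : byColumn.entrySet()) {
            String column = entry.getKey();
            TreeSet<Integer> values = entry.getValue(); // sorted, duplicates dropped
            int lo = values.first();
            int hi = values.last();
            if (values.size() == 1) {
                terms.add(column + " == " + lo);
            } else if ((long) hi - lo + 1 == values.size()) {
                terms.add(column + " BETWEEN " + lo + " AND " + hi);
            } else {
                String list = values.stream().map(String::valueOf).collect(Collectors.joining(", "));
                terms.add(column + " IN (" + list + ")");
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        List<Eq> chain = List.of(new Eq("a", 3), new Eq("a", 1), new Eq("a", 7),
                                 new Eq("b", 2), new Eq("b", 3), new Eq("b", 4), new Eq("c", 5));
        System.out.println(collapse(chain)); // [a IN (1, 3, 7), b BETWEEN 2 AND 4, c == 5]
    }
}
```

Range conditions such as `x >= lo AND x <= hi` are another natural source of `BETWEEN`; that rewrite works on AND chains and is not shown here.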





[jira] [Updated] (PIG-4091) Predicate pushdown for ORC

2014-07-31 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4091:


Attachment: PIG-3760-initial.patch

 Predicate pushdown for ORC
 --

 Key: PIG-4091
 URL: https://issues.apache.org/jira/browse/PIG-4091
 Project: Pig
  Issue Type: Sub-task
Reporter: Rohini Palaniswamy
 Fix For: 0.14.0

 Attachments: PIG-3760-initial.patch








[jira] [Commented] (PIG-4091) Predicate pushdown for ORC

2014-07-31 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081711#comment-14081711
 ] 

Rohini Palaniswamy commented on PIG-4091:
-

Attached an initial patch. It still has some pending TODOs:
   - Add e2e tests
   - Add tests for datatypes: boolean, byte, short, biginteger, bigdecimal, datetime

LoadPredicatePushdown interface needs some more enhancements. Filed PIG-4093 
and PIG-4094 for that. 

 Predicate pushdown for ORC
 --

 Key: PIG-4091
 URL: https://issues.apache.org/jira/browse/PIG-4091
 Project: Pig
  Issue Type: Sub-task
Reporter: Rohini Palaniswamy
 Fix For: 0.14.0

 Attachments: PIG-3760-initial.patch








[jira] [Commented] (PIG-3760) Predicate pushdown for columnar file formats

2014-07-31 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081715#comment-14081715
 ] 

Rohini Palaniswamy commented on PIG-3760:
-

Attached an initial patch to PIG-4091 with the basic functionality required of the 
Predicate Pushdown interface. The interface needs some more enhancements. Filed 
PIG-4093 and PIG-4094 for that. 

[~julienledem] / [~dvryaboy],
Is there someone at Twitter that we can work with for the Parquet 
implementation? It would help us flesh out and finalize the APIs. 

 Predicate pushdown for columnar file formats
 

 Key: PIG-3760
 URL: https://issues.apache.org/jira/browse/PIG-3760
 Project: Pig
  Issue Type: New Feature
Reporter: Andrew Musselman
 Fix For: 0.14.0







[jira] Subscription: PIG patch available

2014-07-31 Thread jira
Issue Subscription
Filter: PIG patch available (14 issues)

Subscriber: pigdaily

Key       Summary
PIG-4066  An optimization for ROLLUP operation in Pig
https://issues.apache.org/jira/browse/PIG-4066
PIG-4008  Pig code change to enable Tez Local mode
https://issues.apache.org/jira/browse/PIG-4008
PIG-4004  Upgrade the Pigmix queries from the (old) mapred API to mapreduce
https://issues.apache.org/jira/browse/PIG-4004
PIG-4002  Disable combiner when map-side aggregation is used
https://issues.apache.org/jira/browse/PIG-4002
PIG-3952  PigStorage accepts '-tagSplit' to return full split information
https://issues.apache.org/jira/browse/PIG-3952
PIG-3911  Define unique fields with @OutputSchema
https://issues.apache.org/jira/browse/PIG-3911
PIG-3877  Getting Geo Latitude/Longitude from Address Lines
https://issues.apache.org/jira/browse/PIG-3877
PIG-3873  Geo distance calculation using Haversine
https://issues.apache.org/jira/browse/PIG-3873
PIG-3866  Create ThreadLocal classloader per PigContext
https://issues.apache.org/jira/browse/PIG-3866
PIG-3861  duplicate jars get added to distributed cache
https://issues.apache.org/jira/browse/PIG-3861
PIG-3668  COR built-in function when atleast one of the coefficient values is NaN
https://issues.apache.org/jira/browse/PIG-3668
PIG-3635  Fix e2e tests for Hadoop 2.X on Windows
https://issues.apache.org/jira/browse/PIG-3635
PIG-3587  add functionality for rolling over dates
https://issues.apache.org/jira/browse/PIG-3587
PIG-3441  Allow Pig to use default resources from Configuration objects
https://issues.apache.org/jira/browse/PIG-3441

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225filterId=12322384


[jira] [Commented] (PIG-4083) TestAccumuloPigCluster always failed with timeout error

2014-07-31 Thread fang fang chen (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081817#comment-14081817
 ] 

fang fang chen commented on PIG-4083:
-

BTW, I was using Sun JDK 1.7.0_60/1.6.0_45 and IBM JDK 1.6.0/1.7.0, and all of them 
failed. If this is caused by the environment, I would like to know what causes it 
and how to resolve it. It would be helpful if Pig could provide this information. Thanks.

 TestAccumuloPigCluster always failed with timeout error
 ---

 Key: PIG-4083
 URL: https://issues.apache.org/jira/browse/PIG-4083
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: fang fang chen
Assignee: Josh Elser
Priority: Critical

 TestAccumuloPigCluster always failed with timeout error.
 Tried with sun jdk 6 and sun jdk 7.





[jira] [Commented] (PIG-4083) TestAccumuloPigCluster always failed with timeout error

2014-07-31 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081823#comment-14081823
 ] 

Josh Elser commented on PIG-4083:
-

Sounds good, I'll get a patch with some extra debugging here for you. Out of 
curiosity, does it fail quickly?

 TestAccumuloPigCluster always failed with timeout error
 ---

 Key: PIG-4083
 URL: https://issues.apache.org/jira/browse/PIG-4083
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: fang fang chen
Assignee: Josh Elser
Priority: Critical






[jira] [Updated] (PIG-4083) TestAccumuloPigCluster always failed with timeout error

2014-07-31 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated PIG-4083:


Attachment: PIG-4083-debug.patch

Ok, [~fang fang chen]. You can apply this using {{patch -p1 < PIG-4083-debug.patch}}.

Then, run just the testcase {{ant test -Dtestcase=TestAccumuloPigCluster}}.

After, please attach 
{{build/test/logs/TEST-org.apache.pig.backend.hadoop.accumulo.TestAccumuloPigCluster.txt}}.

Also, in that same log file, you will see a line that matches {{INFO  org.apache.pig.backend.hadoop.accumulo.TestAccumuloPigCluster  - Starting MiniAccumuloCluster in ...}}, where {{...}} is some directory on your local 
filesystem. That directory is where the MiniAccumuloCluster was started from. 
Please also attach the contents of the {{logs}} directory beneath that temporary 
directory.

Those two logs should help me better understand why this test was failing for 
you. Thanks.

 TestAccumuloPigCluster always failed with timeout error
 ---

 Key: PIG-4083
 URL: https://issues.apache.org/jira/browse/PIG-4083
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: fang fang chen
Assignee: Josh Elser
Priority: Critical
 Attachments: PIG-4083-debug.patch







[jira] [Commented] (PIG-3760) Predicate pushdown for columnar file formats

2014-07-31 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081944#comment-14081944
 ] 

Julien Le Dem commented on PIG-3760:


[~rohini] I added links to the Parquet filter API to the description of PIG-4092.

 Predicate pushdown for columnar file formats
 

 Key: PIG-3760
 URL: https://issues.apache.org/jira/browse/PIG-3760
 Project: Pig
  Issue Type: New Feature
Reporter: Andrew Musselman
 Fix For: 0.14.0







[jira] [Updated] (PIG-4092) Predicate pushdown for Parquet

2014-07-31 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-4092:
---

Description: 
See:
https://github.com/apache/incubator-parquet-mr/pull/4
and:
https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java

 Predicate pushdown for Parquet
 --

 Key: PIG-4092
 URL: https://issues.apache.org/jira/browse/PIG-4092
 Project: Pig
  Issue Type: Sub-task
Reporter: Rohini Palaniswamy
 Fix For: 0.14.0







[jira] [Updated] (PIG-4092) Predicate pushdown for Parquet

2014-07-31 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-4092:
---

Description: 
See:
https://github.com/apache/incubator-parquet-mr/pull/4
and:
https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java
[~alexlevenson] is the main author of this API

  was:
See:
https://github.com/apache/incubator-parquet-mr/pull/4
and:
https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java


 Predicate pushdown for Parquet
 --

 Key: PIG-4092
 URL: https://issues.apache.org/jira/browse/PIG-4092
 Project: Pig
  Issue Type: Sub-task
Reporter: Rohini Palaniswamy
 Fix For: 0.14.0







[jira] [Commented] (PIG-3760) Predicate pushdown for columnar file formats

2014-07-31 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081948#comment-14081948
 ] 

Julien Le Dem commented on PIG-3760:


FYI, in Parquet the filter is not a hint: it is applied to the records after the 
filtering based on the metadata.

 Predicate pushdown for columnar file formats
 

 Key: PIG-3760
 URL: https://issues.apache.org/jira/browse/PIG-3760
 Project: Pig
  Issue Type: New Feature
Reporter: Andrew Musselman
 Fix For: 0.14.0




