[jira] [Updated] (PIG-4086) Fix Orc e2e tests for tez
[ https://issues.apache.org/jira/browse/PIG-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-4086:
    Attachment: PIG-4086-1.patch

Fix Orc e2e tests for tez
    Key: PIG-4086
    URL: https://issues.apache.org/jira/browse/PIG-4086
    Project: Pig
    Issue Type: Bug
    Components: impl
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.14.0
    Attachments: PIG-4086-1.patch

All Orc e2e tests fail on tez. There are two issues:
1. hivelibdir etc. is not set in tez.conf
2. OrcStorage produces an empty output file

Digging into #2, the problem is in this code in PigProcessor:
{code}
if (fileOutput.isCommitRequired()) {
    fileOutput.commit();
}
{code}
fileOutput.commit() invokes both RecordWriter.close() and committer.commitTask(). However, OrcNewOutputFormat generates the output file only after RecordWriter.close() (if the output file is small), so fileOutput.isCommitRequired() does not detect this file and fileOutput.commit() is skipped. Changing the code to invoke fileOutput.close() explicitly fixes the issue. fileOutput.commit() will invoke close() again, but there is no side effect since close() checks whether it has already been called.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PIG-4086) Fix Orc e2e tests for tez
Daniel Dai created PIG-4086:
    Summary: Fix Orc e2e tests for tez
    Key: PIG-4086
    URL: https://issues.apache.org/jira/browse/PIG-4086
    Project: Pig
    Issue Type: Bug
    Components: impl
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.14.0
    Attachments: PIG-4086-1.patch

All Orc e2e tests fail on tez. There are two issues:
1. hivelibdir etc. is not set in tez.conf
2. OrcStorage produces an empty output file

Digging into #2, the problem is in this code in PigProcessor:
{code}
if (fileOutput.isCommitRequired()) {
    fileOutput.commit();
}
{code}
fileOutput.commit() invokes both RecordWriter.close() and committer.commitTask(). However, OrcNewOutputFormat generates the output file only after RecordWriter.close() (if the output file is small), so fileOutput.isCommitRequired() does not detect this file and fileOutput.commit() is skipped. Changing the code to invoke fileOutput.close() explicitly fixes the issue. fileOutput.commit() will invoke close() again, but there is no side effect since close() checks whether it has already been called.
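The ordering problem described in PIG-4086 can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: LazyOutputSketch is not the Tez or Pig class, it only mimics an output whose file appears on close(), whose isCommitRequired() looks for that file, and whose close() is guarded against repeated calls. It is meant to show why closing before the check changes the outcome, not how the actual patch is written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an output that materializes its file lazily.
class LazyOutputSketch {
    private final List<String> buffered = new ArrayList<>();
    private boolean fileExists = false; // stands in for "output file is on disk"
    private boolean closed = false;
    private int commits = 0;

    void write(String record) {
        buffered.add(record);
    }

    // Idempotent close: a second call returns without doing anything.
    void close() {
        if (closed) {
            return;
        }
        closed = true;
        if (!buffered.isEmpty()) {
            fileExists = true; // small outputs only reach "disk" here
        }
    }

    // Assumption: a commit is needed only if a file is already visible.
    boolean isCommitRequired() {
        return fileExists;
    }

    // commit() closes first, then counts the commit.
    void commit() {
        close();
        commits++;
    }

    // Runs one task and returns how many commits happened.
    static int commitsAfterTask(boolean closeFirst) {
        LazyOutputSketch out = new LazyOutputSketch();
        out.write("row");
        if (closeFirst) {
            out.close(); // the explicit close proposed in the issue
        }
        if (out.isCommitRequired()) {
            out.commit();
        }
        return out.commits;
    }

    public static void main(String[] args) {
        System.out.println(commitsAfterTask(false)); // commit is skipped
        System.out.println(commitsAfterTask(true));  // commit happens once
    }
}
```

Without the explicit close, the check runs before any file exists and the commit is skipped; with it, the later close() inside commit() is a no-op because of the guard.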
[jira] [Commented] (PIG-4085) TEZ-1303 broke hadoop 2 compilation in trunk
[ https://issues.apache.org/jira/browse/PIG-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080605#comment-14080605 ] Daniel Dai commented on PIG-4085: - +1 TEZ-1303 broke hadoop 2 compilation in trunk Key: PIG-4085 URL: https://issues.apache.org/jira/browse/PIG-4085 Project: Pig Issue Type: Bug Components: tez Reporter: Cheolsoo Park Assignee: Cheolsoo Park Fix For: 0.14.0 Attachments: PIG-4085-1.patch {code} [javac] /Users/cheolsoop/workspace/pig-apache/src/org/apache/pig/backend/hadoop/executionengine/tez/PartitionerDefinedVertexManager.java:45: error: PartitionerDefinedVertexManager is not abstract and does not override abstract method initialize() in VertexManagerPlugin [javac] public class PartitionerDefinedVertexManager extends VertexManagerPlugin { [javac]^ [javac] /Users/cheolsoop/workspace/pig-apache/src/org/apache/pig/backend/hadoop/executionengine/tez/PartitionerDefinedVertexManager.java:53: error: method does not override or implement a method from a supertype [javac] @Override [javac] ^ {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
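The two javac errors above are what one would expect when an abstract method in a base class changes its parameter list: the old override no longer matches anything (hence the @Override error) and the new abstract method is left unimplemented. A minimal sketch of the repaired shape follows. The class names and the String "context" are hypothetical placeholders, not the Tez API; only the no-argument initialize() is taken from the error message.

```java
// Hypothetical base class: initialize() takes no arguments, as in the
// error message. How the real plugin obtains its context is an assumption.
abstract class PluginSketch {
    private final String context;

    protected PluginSketch(String context) {
        this.context = context;
    }

    protected String getContext() {
        return context;
    }

    abstract void initialize();
}

// Hypothetical subclass adapted to the new signature.
class MyPluginSketch extends PluginSketch {
    String seen;

    MyPluginSketch(String context) {
        super(context);
    }

    // Matches the no-arg abstract method, so @Override compiles.
    // A leftover "@Override void initialize(String ctx)" would be rejected
    // with "method does not override or implement a method from a supertype".
    @Override
    void initialize() {
        seen = getContext();
    }

    public static void main(String[] args) {
        MyPluginSketch plugin = new MyPluginSketch("ctx");
        plugin.initialize();
        System.out.println(plugin.seen);
    }
}
```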
[jira] [Commented] (PIG-4087) Download PIG link is not exist
[ https://issues.apache.org/jira/browse/PIG-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081027#comment-14081027 ]

Akira AJISAKA commented on PIG-4087:

Moved to Pig project.

Download PIG link is not exist
    Key: PIG-4087
    URL: https://issues.apache.org/jira/browse/PIG-4087
    Project: Pig
    Issue Type: Improvement
    Components: documentation
    Reporter: evgeny
    Original Estimate: 1h
    Remaining Estimate: 1h

To improve usability, we should add a download link to the main website. GitHub provides such a link, so instructions such as
    svn checkout http://svn.apache.org/repos/asf/pig/trunk/
or
    git clone https://github.com/apache/pig.git
can be left to the developers who want to join our project. Please just copy the link to the zip file from GitHub. Thanks.
[jira] [Moved] (PIG-4087) Download PIG link is not exist
[ https://issues.apache.org/jira/browse/PIG-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira AJISAKA moved HADOOP-10916 to PIG-4087:
    Component/s: (was: conf) documentation
    Issue Type: Improvement (was: Bug)
    Key: PIG-4087 (was: HADOOP-10916)
    Project: Pig (was: Hadoop Common)

Download PIG link is not exist
    Key: PIG-4087
    URL: https://issues.apache.org/jira/browse/PIG-4087
    Project: Pig
    Issue Type: Improvement
    Components: documentation
    Reporter: evgeny
    Original Estimate: 1h
    Remaining Estimate: 1h

To improve usability, we should add a download link to the main website. GitHub provides such a link, so instructions such as
    svn checkout http://svn.apache.org/repos/asf/pig/trunk/
or
    git clone https://github.com/apache/pig.git
can be left to the developers who want to join our project. Please just copy the link to the zip file from GitHub. Thanks.
[jira] [Commented] (PIG-4083) TestAccumuloPigCluster always failed with timeout error
[ https://issues.apache.org/jira/browse/PIG-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081032#comment-14081032 ] Josh Elser commented on PIG-4083: - I'll try to look into this, [~fang fang chen]. Any logs or other information you have would be helpful. TestAccumuloPigCluster always failed with timeout error --- Key: PIG-4083 URL: https://issues.apache.org/jira/browse/PIG-4083 Project: Pig Issue Type: Bug Affects Versions: 0.13.0 Reporter: fang fang chen Assignee: Josh Elser Priority: Critical TestAccumuloPigCluster always failed with timeout error. Tried with sun jdk 6 and sun jdk 7. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (PIG-4083) TestAccumuloPigCluster always failed with timeout error
[ https://issues.apache.org/jira/browse/PIG-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser reassigned PIG-4083: --- Assignee: Josh Elser TestAccumuloPigCluster always failed with timeout error --- Key: PIG-4083 URL: https://issues.apache.org/jira/browse/PIG-4083 Project: Pig Issue Type: Bug Affects Versions: 0.13.0 Reporter: fang fang chen Assignee: Josh Elser Priority: Critical TestAccumuloPigCluster always failed with timeout error. Tried with sun jdk 6 and sun jdk 7. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PIG-4086) Fix Orc e2e tests for tez
[ https://issues.apache.org/jira/browse/PIG-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081058#comment-14081058 ]

Rohini Palaniswamy commented on PIG-4086:

[~sseth] told me, when I did the patch to selectively start inputs, that we should never call the close() methods of inputs or outputs; only the framework should do that. [~daijy], can you check with him?

Fix Orc e2e tests for tez
    Key: PIG-4086
    URL: https://issues.apache.org/jira/browse/PIG-4086
    Project: Pig
    Issue Type: Bug
    Components: impl
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Fix For: 0.14.0
    Attachments: PIG-4086-1.patch

All Orc e2e tests fail on tez. There are two issues:
1. hivelibdir etc. is not set in tez.conf
2. OrcStorage produces an empty output file

Digging into #2, the problem is in this code in PigProcessor:
{code}
if (fileOutput.isCommitRequired()) {
    fileOutput.commit();
}
{code}
fileOutput.commit() invokes both RecordWriter.close() and committer.commitTask(). However, OrcNewOutputFormat generates the output file only after RecordWriter.close() (if the output file is small), so fileOutput.isCommitRequired() does not detect this file and fileOutput.commit() is skipped. Changing the code to invoke fileOutput.close() explicitly fixes the issue. fileOutput.commit() will invoke close() again, but there is no side effect since close() checks whether it has already been called.
[jira] [Commented] (PIG-4083) TestAccumuloPigCluster always failed with timeout error
[ https://issues.apache.org/jira/browse/PIG-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081080#comment-14081080 ] Josh Elser commented on PIG-4083: - This is passing for me using Oracle 1.7.0_55. It's possible that the MiniAccumuloCluster being started by the test is failing to start for a variety of reasons (lack of memory probably the most common). I can provide a quick patch which will add some extra logging information if you want to help me debug this. TestAccumuloPigCluster always failed with timeout error --- Key: PIG-4083 URL: https://issues.apache.org/jira/browse/PIG-4083 Project: Pig Issue Type: Bug Affects Versions: 0.13.0 Reporter: fang fang chen Assignee: Josh Elser Priority: Critical TestAccumuloPigCluster always failed with timeout error. Tried with sun jdk 6 and sun jdk 7. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PIG-4085) TEZ-1303 broke hadoop 2 compilation in trunk
[ https://issues.apache.org/jira/browse/PIG-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park updated PIG-4085: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thanks Daniel for the review! TEZ-1303 broke hadoop 2 compilation in trunk Key: PIG-4085 URL: https://issues.apache.org/jira/browse/PIG-4085 Project: Pig Issue Type: Bug Components: tez Reporter: Cheolsoo Park Assignee: Cheolsoo Park Fix For: 0.14.0 Attachments: PIG-4085-1.patch {code} [javac] /Users/cheolsoop/workspace/pig-apache/src/org/apache/pig/backend/hadoop/executionengine/tez/PartitionerDefinedVertexManager.java:45: error: PartitionerDefinedVertexManager is not abstract and does not override abstract method initialize() in VertexManagerPlugin [javac] public class PartitionerDefinedVertexManager extends VertexManagerPlugin { [javac]^ [javac] /Users/cheolsoop/workspace/pig-apache/src/org/apache/pig/backend/hadoop/executionengine/tez/PartitionerDefinedVertexManager.java:53: error: method does not override or implement a method from a supertype [javac] @Override [javac] ^ {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: An optimization for ROLLUP Operation on Apache Pig with 50% faster [PIG-4066]
This is due to hadoop 1 and 2 incompatibility. Did you compile Pig with -Dhadoopversion=23?

On Thu, Jul 31, 2014 at 9:13 AM, Quang-Nhat HOANG-XUAN hxquangn...@gmail.com wrote:

Hi,

I've rebased my patch to trunk, but I cannot run it on our cluster. Our hadoop version is 2.0.0-cdh4.4.0. Do you know why this happened?

This is the error log from Pig Stack Trace:

ERROR 2998: Unhandled internal error. org.apache.hadoop.mapred.jobcontrol.JobControl.addJob(Lorg/apache/hadoop/mapred/jobcontrol/Job;)Ljava/lang/String;
java.lang.NoSuchMethodError: org.apache.hadoop.mapred.jobcontrol.JobControl.addJob(Lorg/apache/hadoop/mapred/jobcontrol/Job;)Ljava/lang/String;
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:327)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:195)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:279)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1378)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1363)
    at org.apache.pig.PigServer.execute(PigServer.java:1352)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:403)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:386)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:170)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:233)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:204)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
    at org.apache.pig.Main.run(Main.java:620)
    at org.apache.pig.Main.main(Main.java:168)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

Thank you.
Quang-Nhat

On Thu, Jul 24, 2014 at 10:32 PM, Quang-Nhat HOANG-XUAN hxquangn...@gmail.com wrote:

Very helpful comments. I will fix them up.

Thank you
Quang-Nhat

On Thu, Jul 24, 2014 at 9:54 PM, Cheolsoo Park piaozhe...@gmail.com wrote:

I added some comments to the review board. I love the idea, but the patch needs to be cleaned up to get committed. Also, please provide a patch that is based on trunk; otherwise it's not easy to test.

Thanks!
Cheolsoo

On Thu, Jul 24, 2014 at 5:06 AM, Nhat Hoang hxquangn...@gmail.com wrote:

Hello everyone,

I am currently a master's student at EURECOM (www.eurecom.fr). I am working on a project related to Apache Pig in the context of an EU-funded project, Bigfoot (www.bigfootproject.eu).

Based on our previous work, “Duy-Hung Phan, Matteo Dell’Amico, Pietro Michiardi: On the design space of MapReduce ROLLUP aggregates” (http://www.eurecom.fr/en/publication/4212/download/rs-publi-4212_2.pdf), I am working on a new family of algorithms to address some limitations of the current ROLLUP operator in Apache Pig: the IRG (in-reducer grouping), the hybrid IRG, and chained-IRG. I have an implementation that indicates superior performance to the existing ROLLUP implementation.

You can find more information on this work here: https://issues.apache.org/jira/browse/PIG-4066. I've also created a review request on the review board: https://reviews.apache.org/r/23804/

It would be very helpful for me if someone could review this patch and give some feedback. Looking forward to the feedback.

Regards,
Quang-Nhat HOANG-XUAN
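For readers unfamiliar with the string in the NoSuchMethodError above: the part in parentheses and the suffix after it form a JVM method descriptor, which encodes the parameter types and the return type. The JVM links a call by name plus the full descriptor, so a jar compiled against one library version can fail at run time if the other version's method differs in either part. The sketch below only demonstrates how such descriptors are built. It uses Runnable as a stand-in parameter type, because the Hadoop classes are not available in a standalone example, and DescriptorSketch is a hypothetical name.

```java
import java.lang.invoke.MethodType;

// Hypothetical helper that prints JVM method descriptors.
class DescriptorSketch {
    // Builds the descriptor string for a method with the given
    // return type and parameter types.
    static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes)
                .toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Same parameter, different return type: two different descriptors,
        // so code linked against one will not find the other.
        System.out.println(descriptor(String.class, Runnable.class));
        System.out.println(descriptor(void.class, Runnable.class));
    }
}
```

Read this way, the error in the trace says that no addJob method taking an org.apache.hadoop.mapred.jobcontrol.Job and returning a java.lang.String was found in the JobControl class on the cluster's classpath.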
Re: An optimization for ROLLUP Operation on Apache Pig with 50% faster [PIG-4066]
Yes, I did try. In the last trunk (r1579421), it worked perfectly when I compiled with -Dhadoopversion=23.

Quang-Nhat

On Thu, Jul 31, 2014 at 6:46 PM, Cheolsoo Park piaozhe...@gmail.com wrote:

This is due to hadoop 1 and 2 incompatibility. Did you compile Pig with -Dhadoopversion=23?
[jira] [Created] (PIG-4088) TEZ-1346 breaks hadoop 2 compilation in trunk
Cheolsoo Park created PIG-4088:
    Summary: TEZ-1346 breaks hadoop 2 compilation in trunk
    Key: PIG-4088
    URL: https://issues.apache.org/jira/browse/PIG-4088
    Project: Pig
    Issue Type: Bug
    Components: tez
    Reporter: Cheolsoo Park
    Assignee: Cheolsoo Park
    Fix For: 0.14.0

TEZ-1346 is not published to the Apache snapshot repo yet, but once it is, it will break Pig trunk:
{code}
[javac] /Users/cheolsoop/workspace/pig-stash/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:67: error: PigProcessor is not abstract and does not override abstract method initialize() in Processor
[javac] public class PigProcessor implements LogicalIOProcessor {
[javac]^
[javac] /Users/cheolsoop/workspace/pig-stash/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:102: error: method does not override or implement a method from a supertype
[javac] @Override
[javac] ^
{code}
[jira] [Updated] (PIG-4088) TEZ-1346 breaks hadoop 2 compilation in trunk
[ https://issues.apache.org/jira/browse/PIG-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-4088:
    Attachment: PIG-4088-1.patch

Uploading a patch.

TEZ-1346 breaks hadoop 2 compilation in trunk
    Key: PIG-4088
    URL: https://issues.apache.org/jira/browse/PIG-4088
    Project: Pig
    Issue Type: Bug
    Components: tez
    Reporter: Cheolsoo Park
    Assignee: Cheolsoo Park
    Fix For: 0.14.0
    Attachments: PIG-4088-1.patch

TEZ-1346 is not published to the Apache snapshot repo yet, but once it is, it will break Pig trunk:
{code}
[javac] /Users/cheolsoop/workspace/pig-stash/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:67: error: PigProcessor is not abstract and does not override abstract method initialize() in Processor
[javac] public class PigProcessor implements LogicalIOProcessor {
[javac]^
[javac] /Users/cheolsoop/workspace/pig-stash/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:102: error: method does not override or implement a method from a supertype
[javac] @Override
[javac] ^
{code}
[jira] [Created] (PIG-4089) TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in Hadoop 1
Cheolsoo Park created PIG-4089: -- Summary: TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in Hadoop 1 Key: PIG-4089 URL: https://issues.apache.org/jira/browse/PIG-4089 Project: Pig Issue Type: Bug Reporter: Cheolsoo Park Assignee: Cheolsoo Park Fix For: 0.14.0 The job fails with the following error in *Hadoop 1* local mode- {code} 2014-07-31 05:55:06,630 [Thread-75] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0021 java.io.IOException: Illegal partition for Null: false index: 0 5 (1) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691) at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:121) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212) {code} This is because Hadoop 1 doesn't support multiple reducers in local mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
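The "Illegal partition" failure above can be sketched in isolation. The trailing "(1)" in the message appears to be the partition number that was rejected, which fits a runner that has only one reducer (so only partition 0 is valid) receiving a partition computed for a higher parallelism. The code below is a hypothetical model of that mismatch, not the Hadoop or Pig source: PartitionSketch, partitionFor and accepts are names made up for illustration, and the hash-modulo partitioning is an assumption.

```java
import java.io.IOException;

// Hypothetical model of a partition computed for N reducers being
// handed to a collector that only has a single partition.
class PartitionSketch {
    // Assumed hash-modulo partitioning over the requested parallelism.
    static int partitionFor(Object key, int requestedParallelism) {
        return (key.hashCode() & Integer.MAX_VALUE) % requestedParallelism;
    }

    // Range check in the spirit of the error message in the trace.
    static void collect(Object key, int partition, int actualPartitions)
            throws IOException {
        if (partition < 0 || partition >= actualPartitions) {
            throw new IOException(
                    "Illegal partition for " + key + " (" + partition + ")");
        }
    }

    // Convenience wrapper: true if the collector accepts the record.
    static boolean accepts(Object key, int partition, int actualPartitions) {
        try {
            collect(key, partition, actualPartitions);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int partition = partitionFor(4, 3); // computed for 3 reducers
        System.out.println(accepts(4, partition, 3)); // 3 partitions available
        System.out.println(accepts(4, partition, 1)); // only 1 partition available
    }
}
```

With a requested parallelism of 1 every key maps to partition 0, which is always in range, so the same job runs regardless of how many reducers the runner actually provides.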
[jira] [Updated] (PIG-4089) TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in Hadoop 1
[ https://issues.apache.org/jira/browse/PIG-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park updated PIG-4089: --- Attachment: PIG-4089-1.patch Attaching a patch that changes parallelism of reducers from 3 to 1 so that the test case will pass in both hadoop 1 and 2. I don't think there is any reason why the parallelism needs to be greater than 1 in the test case. TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in Hadoop 1 -- Key: PIG-4089 URL: https://issues.apache.org/jira/browse/PIG-4089 Project: Pig Issue Type: Bug Reporter: Cheolsoo Park Assignee: Cheolsoo Park Fix For: 0.14.0 Attachments: PIG-4089-1.patch The job fails with the following error in *Hadoop 1* local mode- {code} 2014-07-31 05:55:06,630 [Thread-75] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0021 java.io.IOException: Illegal partition for Null: false index: 0 5 (1) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691) at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:121) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212) {code} This is because Hadoop 1 doesn't support multiple reducers in local mode. 
[jira] [Updated] (PIG-4089) TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in Hadoop 1
[ https://issues.apache.org/jira/browse/PIG-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park updated PIG-4089: --- Status: Patch Available (was: Open) TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in Hadoop 1 -- Key: PIG-4089 URL: https://issues.apache.org/jira/browse/PIG-4089 Project: Pig Issue Type: Bug Reporter: Cheolsoo Park Assignee: Cheolsoo Park Fix For: 0.14.0 Attachments: PIG-4089-1.patch The job fails with the following error in *Hadoop 1* local mode- {code} 2014-07-31 05:55:06,630 [Thread-75] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0021 java.io.IOException: Illegal partition for Null: false index: 0 5 (1) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691) at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:121) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212) {code} This is because Hadoop 1 doesn't support multiple reducers in local mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PIG-4088) TEZ-1346 breaks hadoop 2 compilation in trunk
[ https://issues.apache.org/jira/browse/PIG-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081225#comment-14081225 ]

Daniel Dai commented on PIG-4088:

+1

TEZ-1346 breaks hadoop 2 compilation in trunk
    Key: PIG-4088
    URL: https://issues.apache.org/jira/browse/PIG-4088
    Project: Pig
    Issue Type: Bug
    Components: tez
    Reporter: Cheolsoo Park
    Assignee: Cheolsoo Park
    Fix For: 0.14.0
    Attachments: PIG-4088-1.patch

TEZ-1346 is not published to the Apache snapshot repo yet, but once it is, it will break Pig trunk:
{code}
[javac] /Users/cheolsoop/workspace/pig-stash/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:67: error: PigProcessor is not abstract and does not override abstract method initialize() in Processor
[javac] public class PigProcessor implements LogicalIOProcessor {
[javac]^
[javac] /Users/cheolsoop/workspace/pig-stash/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:102: error: method does not override or implement a method from a supertype
[javac] @Override
[javac] ^
{code}
[jira] [Commented] (PIG-4089) TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in Hadoop 1
[ https://issues.apache.org/jira/browse/PIG-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081222#comment-14081222 ] Daniel Dai commented on PIG-4089: - +1 TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in Hadoop 1 -- Key: PIG-4089 URL: https://issues.apache.org/jira/browse/PIG-4089 Project: Pig Issue Type: Bug Reporter: Cheolsoo Park Assignee: Cheolsoo Park Fix For: 0.14.0 Attachments: PIG-4089-1.patch The job fails with the following error in *Hadoop 1* local mode- {code} 2014-07-31 05:55:06,630 [Thread-75] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0021 java.io.IOException: Illegal partition for Null: false index: 0 5 (1) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691) at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:121) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212) {code} This is because Hadoop 1 doesn't support multiple reducers in local mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PIG-4089) TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in Hadoop 1
[ https://issues.apache.org/jira/browse/PIG-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park updated PIG-4089: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thank you Daniel for the review! TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in Hadoop 1 -- Key: PIG-4089 URL: https://issues.apache.org/jira/browse/PIG-4089 Project: Pig Issue Type: Bug Reporter: Cheolsoo Park Assignee: Cheolsoo Park Fix For: 0.14.0 Attachments: PIG-4089-1.patch The job fails with the following error in *Hadoop 1* local mode- {code} 2014-07-31 05:55:06,630 [Thread-75] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0021 java.io.IOException: Illegal partition for Null: false index: 0 5 (1) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691) at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:121) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212) {code} This is because Hadoop 1 doesn't support multiple reducers in local mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PIG-4090) TEZ-1346 broke hadoop 2 compilation in trunk
Koji Noguchi created PIG-4090: - Summary: TEZ-1346 broke hadoop 2 compilation in trunk Key: PIG-4090 URL: https://issues.apache.org/jira/browse/PIG-4090 Project: Pig Issue Type: Bug Components: tez Affects Versions: 0.14.0 Reporter: Koji Noguchi Priority: Trivial {noformat} [javac] /Users/knoguchi/git/pig/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:67: error: PigProcessor is not abstract and does not override abstract method initialize() in Processor [javac] public class PigProcessor implements LogicalIOProcessor { [javac]^ [javac] /Users/knoguchi/git/pig/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:102: error: method does not override or implement a method from a supertype [javac] @Override [javac] ^ [javac] {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PIG-4090) TEZ-1346 broke hadoop 2 compilation in trunk
[ https://issues.apache.org/jira/browse/PIG-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi updated PIG-4090: -- Attachment: pig-4090-v01.txt Not understanding TEZ at all but changing the code to fit with the patch in TEZ-1346. TEZ-1346 broke hadoop 2 compilation in trunk Key: PIG-4090 URL: https://issues.apache.org/jira/browse/PIG-4090 Project: Pig Issue Type: Bug Components: tez Affects Versions: 0.14.0 Reporter: Koji Noguchi Priority: Trivial Attachments: pig-4090-v01.txt {noformat} [javac] /Users/knoguchi/git/pig/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:67: error: PigProcessor is not abstract and does not override abstract method initialize() in Processor [javac] public class PigProcessor implements LogicalIOProcessor { [javac]^ [javac] /Users/knoguchi/git/pig/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:102: error: method does not override or implement a method from a supertype [javac] @Override [javac] ^ [javac] {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PIG-4088) TEZ-1346 breaks hadoop 2 compilation in trunk
[ https://issues.apache.org/jira/browse/PIG-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park updated PIG-4088: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. TEZ-1346 breaks hadoop 2 compilation in trunk - Key: PIG-4088 URL: https://issues.apache.org/jira/browse/PIG-4088 Project: Pig Issue Type: Bug Components: tez Reporter: Cheolsoo Park Assignee: Cheolsoo Park Fix For: 0.14.0 Attachments: PIG-4088-1.patch TEZ-1346 has not been published to the Apache snapshot repo yet, but once it is, it will break Pig trunk: {code} [javac] /Users/cheolsoop/workspace/pig-stash/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:67: error: PigProcessor is not abstract and does not override abstract method initialize() in Processor [javac] public class PigProcessor implements LogicalIOProcessor { [javac]^ [javac] /Users/cheolsoop/workspace/pig-stash/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:102: error: method does not override or implement a method from a supertype [javac] @Override [javac] ^ {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
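The two javac errors above are what one would expect when an interface method changes its parameter list: the implementing class no longer provides the new no-argument initialize(), and its old method carries an @Override that matches nothing. A minimal sketch of that situation follows. The types `Context`, `Processor`, and `SketchProcessor` are hypothetical stand-ins, not the real Tez classes, and the assumption that the old method took a context argument is not stated in the report.

```java
// Hypothetical stand-ins for illustration only; these are NOT the real Tez types.
interface Context {
    String name();
}

// Assumed shape after the API change: initialize() takes no arguments.
interface Processor {
    void initialize();
}

class SketchProcessor implements Processor {
    private final Context context;
    private boolean initialized;

    // Assumption: the context is supplied up front instead of being passed to initialize().
    SketchProcessor(Context context) {
        this.context = context;
    }

    // Matches the new interface method, so @Override is valid here.
    @Override
    public void initialize() {
        initialized = true;
    }

    // Old-style overload. It no longer overrides anything, so putting @Override
    // on it would produce "method does not override or implement a method from a supertype".
    public void initialize(Context ignored) {
        initialize();
    }

    boolean isInitialized() {
        return initialized;
    }

    String contextName() {
        return context.name();
    }

    public static void main(String[] args) {
        SketchProcessor p = new SketchProcessor(() -> "vertex");
        p.initialize();
        System.out.println(p.contextName() + " initialized: " + p.isInitialized());
    }
}
```

Removing the no-argument initialize() from `SketchProcessor` would reproduce the first error, since the class would then leave an abstract method unimplemented.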
[jira] [Resolved] (PIG-4090) TEZ-1346 broke hadoop 2 compilation in trunk
[ https://issues.apache.org/jira/browse/PIG-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park resolved PIG-4090. Resolution: Duplicate [~knoguchi], I committed PIG-4088. The identical patch. Hope we will no longer have to chase Tez changes soon. TEZ-1346 broke hadoop 2 compilation in trunk Key: PIG-4090 URL: https://issues.apache.org/jira/browse/PIG-4090 Project: Pig Issue Type: Bug Components: tez Affects Versions: 0.14.0 Reporter: Koji Noguchi Priority: Trivial Attachments: pig-4090-v01.txt {noformat} [javac] /Users/knoguchi/git/pig/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:67: error: PigProcessor is not abstract and does not override abstract method initialize() in Processor [javac] public class PigProcessor implements LogicalIOProcessor { [javac]^ [javac] /Users/knoguchi/git/pig/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:102: error: method does not override or implement a method from a supertype [javac] @Override [javac] ^ [javac] {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PIG-4090) TEZ-1346 broke hadoop 2 compilation in trunk
[ https://issues.apache.org/jira/browse/PIG-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081360#comment-14081360 ] Koji Noguchi commented on PIG-4090: --- bq. The identical patch. Hope we will no longer have to chase Tez changes soon. Ah, thanks. I hope so too~. TEZ-1346 broke hadoop 2 compilation in trunk Key: PIG-4090 URL: https://issues.apache.org/jira/browse/PIG-4090 Project: Pig Issue Type: Bug Components: tez Affects Versions: 0.14.0 Reporter: Koji Noguchi Priority: Trivial Attachments: pig-4090-v01.txt {noformat} [javac] /Users/knoguchi/git/pig/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:67: error: PigProcessor is not abstract and does not override abstract method initialize() in Processor [javac] public class PigProcessor implements LogicalIOProcessor { [javac]^ [javac] /Users/knoguchi/git/pig/src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java:102: error: method does not override or implement a method from a supertype [javac] @Override [javac] ^ [javac] {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PIG-3760) Predicate pushdown for columnar file formats
[ https://issues.apache.org/jira/browse/PIG-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohini Palaniswamy updated PIG-3760: Summary: Predicate pushdown for columnar file formats (was: Predicate pushdown for ORC and Parquet) Predicate pushdown for columnar file formats Key: PIG-3760 URL: https://issues.apache.org/jira/browse/PIG-3760 Project: Pig Issue Type: New Feature Reporter: Andrew Musselman Assignee: Rohini Palaniswamy Fix For: 0.14.0 From the conversation on dev@pig: Partition pruning for ORC is not addressed in PIG-3558. We will need to do partition pruning for both ORC and Parquet in a new ticket. Currently there is no interface to deal with this kind of pushdown (LoadMetadata.setPartitionFilter pushes the filter to the loader but removes the filter statement; for ORC/Parquet, the filter is a hint, and we need to apply the filter again in Pig even if it is pushed to the loader), so we will need to define a new interface for that. You are welcome to initiate the work. I know Aniket is also interested in doing that, so be sure to talk with him about this work. Thanks, Daniel On Mon, Feb 10, 2014 at 11:42 AM, Andrew Musselman andrew.mussel...@gmail.com wrote: I had a chat with a couple people last week about a feature request for Pig: in a where or filter clause, when loading an ORC file, to skip directly to the right offset instead of scanning the whole file. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PIG-3760) Predicate pushdown for columnar file formats
[ https://issues.apache.org/jira/browse/PIG-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohini Palaniswamy updated PIG-3760: Assignee: (was: Rohini Palaniswamy) Predicate pushdown for columnar file formats Key: PIG-3760 URL: https://issues.apache.org/jira/browse/PIG-3760 Project: Pig Issue Type: New Feature Reporter: Andrew Musselman Fix For: 0.14.0 From the conversation on dev@pig: Partition pruning for ORC is not addressed in PIG-3558. We will need to do partition pruning for both ORC and Parquet in a new ticket. Currently there is no interface to deal with this kind of pushdown (LoadMetadata.setPartitionFilter pushes the filter to the loader but removes the filter statement; for ORC/Parquet, the filter is a hint, and we need to apply the filter again in Pig even if it is pushed to the loader), so we will need to define a new interface for that. You are welcome to initiate the work. I know Aniket is also interested in doing that, so be sure to talk with him about this work. Thanks, Daniel On Mon, Feb 10, 2014 at 11:42 AM, Andrew Musselman andrew.mussel...@gmail.com wrote: I had a chat with a couple people last week about a feature request for Pig: in a where or filter clause, when loading an ORC file, to skip directly to the right offset instead of scanning the whole file. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PIG-4091) Predicate pushdown for ORC
Rohini Palaniswamy created PIG-4091: --- Summary: Predicate pushdown for ORC Key: PIG-4091 URL: https://issues.apache.org/jira/browse/PIG-4091 Project: Pig Issue Type: Sub-task Reporter: Rohini Palaniswamy -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PIG-4092) Predicate pushdown for Parquet
Rohini Palaniswamy created PIG-4092: --- Summary: Predicate pushdown for Parquet Key: PIG-4092 URL: https://issues.apache.org/jira/browse/PIG-4092 Project: Pig Issue Type: Sub-task Reporter: Rohini Palaniswamy -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PIG-4093) Predicate pushdown to support removing filters from pig plan
Rohini Palaniswamy created PIG-4093: --- Summary: Predicate pushdown to support removing filters from pig plan Key: PIG-4093 URL: https://issues.apache.org/jira/browse/PIG-4093 Project: Pig Issue Type: Sub-task Reporter: Rohini Palaniswamy It is possible for loaders to evaluate the pushed filter conditions themselves. In that case it is not necessary to retain the filter conditions in the Pig plan. So we need to support two modes: 1) filter conditions are pushed into the loader but also retained in the Pig plan, as the loader might do only best-effort filtering based on block metadata; 2) filter conditions are pushed into the loader and removed from the Pig plan when the loader can evaluate the expression itself and filter out records. In this case, the loader can do lazy deserialization and avoid deserializing the full record. -- This message was sent by Atlassian JIRA (v6.2#6252)
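The two modes described in PIG-4093 can be sketched as follows. This is only an illustration under stated assumptions: `PushdownModes`, `Mode`, `load`, and `run` are hypothetical names, not part of Pig's LoadPredicatePushdown interface, and the best-effort loader is simulated in its most degenerate form (it skips nothing).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names come from Pig's actual API.
class PushdownModes {
    enum Mode { HINT_ONLY, FULLY_EVALUATED }

    // Simulated loader. In HINT_ONLY mode it treats the filter as a hint and may
    // return rows that do not match (here: all rows). In FULLY_EVALUATED mode it
    // applies the filter to every record itself.
    static List<Integer> load(List<Integer> rows, Predicate<Integer> filter, Mode mode) {
        List<Integer> out = new ArrayList<>();
        for (Integer row : rows) {
            if (mode == Mode.HINT_ONLY || filter.test(row)) {
                out.add(row);
            }
        }
        return out;
    }

    // Plan side: the filter stays in the plan only when the loader is best effort.
    static List<Integer> run(List<Integer> rows, Predicate<Integer> filter, Mode mode) {
        List<Integer> loaded = load(rows, filter, mode);
        if (mode == Mode.FULLY_EVALUATED) {
            return loaded; // mode 2: filter removed from the plan
        }
        List<Integer> out = new ArrayList<>();
        for (Integer row : loaded) { // mode 1: filter retained and re-applied
            if (filter.test(row)) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(run(rows, r -> r > 3, Mode.HINT_ONLY));
        System.out.println(run(rows, r -> r > 3, Mode.FULLY_EVALUATED));
    }
}
```

Both modes must produce the same rows; the difference is only where the filtering work happens, which is why the second mode allows the loader to skip deserializing records it rejects.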
[jira] [Created] (PIG-4094) Predicate pushdown to support complex data types
Rohini Palaniswamy created PIG-4094: --- Summary: Predicate pushdown to support complex data types Key: PIG-4094 URL: https://issues.apache.org/jira/browse/PIG-4094 Project: Pig Issue Type: Sub-task Reporter: Rohini Palaniswamy Fix For: 0.14.0 Parquet has support for pushing predicates on tuples, maps and bags according to [~aniket486]. ORC currently only supports primitives, but will add support for structs (tuples) in the future. The API needs to be there even if not implemented, as it will be hard to change the interface once released. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PIG-4095) Collapse multiple OR conditions to IN and BETWEEN
Rohini Palaniswamy created PIG-4095: --- Summary: Collapse multiple OR conditions to IN and BETWEEN Key: PIG-4095 URL: https://issues.apache.org/jira/browse/PIG-4095 Project: Pig Issue Type: Sub-task Reporter: Rohini Palaniswamy ORC predicate pushdown supports IN and BETWEEN operators. Need equivalent expressions in Pig. -- This message was sent by Atlassian JIRA (v6.2#6252)
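A minimal sketch of the IN half of this idea: a chain of equality tests on one column joined by OR is rewritten into a single IN expression. The class `OrCollapse`, its method, and the string output format are hypothetical and chosen only for illustration; they are not Pig's expression classes, and the BETWEEN case is not covered here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;
import java.util.TreeSet;

// Hypothetical sketch; not Pig's actual expression plan classes.
class OrCollapse {
    // Input models "column == v1 OR column == v2 OR ..." as the list of compared values.
    // Output is a single expression string: an equality for one distinct value,
    // otherwise an IN over the sorted distinct values.
    static String collapse(String column, List<Integer> values) {
        TreeSet<Integer> distinct = new TreeSet<>(values); // sorts and removes duplicates
        if (distinct.isEmpty()) {
            throw new IllegalArgumentException("at least one value is required");
        }
        if (distinct.size() == 1) {
            return column + " == " + distinct.first();
        }
        StringJoiner in = new StringJoiner(", ", column + " IN (", ")");
        for (Integer value : distinct) {
            in.add(value.toString());
        }
        return in.toString();
    }

    public static void main(String[] args) {
        // x == 3 OR x == 1 OR x == 2 OR x == 1
        System.out.println(collapse("x", Arrays.asList(3, 1, 2, 1)));
    }
}
```

Sorting and de-duplicating the values first keeps the collapsed expression stable regardless of the order in which the OR branches were written.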
[jira] [Updated] (PIG-4091) Predicate pushdown for ORC
[ https://issues.apache.org/jira/browse/PIG-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohini Palaniswamy updated PIG-4091: Attachment: PIG-3760-initial.patch Predicate pushdown for ORC -- Key: PIG-4091 URL: https://issues.apache.org/jira/browse/PIG-4091 Project: Pig Issue Type: Sub-task Reporter: Rohini Palaniswamy Fix For: 0.14.0 Attachments: PIG-3760-initial.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PIG-4091) Predicate pushdown for ORC
[ https://issues.apache.org/jira/browse/PIG-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081711#comment-14081711 ] Rohini Palaniswamy commented on PIG-4091: - Attached initial patch. Still has some pending TODOs - Add e2e tests - Add tests for datatypes - boolean, byte, short, biginteger, bigdecimal, datetime LoadPredicatePushdown interface needs some more enhancements. Filed PIG-4093 and PIG-4094 for that. Predicate pushdown for ORC -- Key: PIG-4091 URL: https://issues.apache.org/jira/browse/PIG-4091 Project: Pig Issue Type: Sub-task Reporter: Rohini Palaniswamy Fix For: 0.14.0 Attachments: PIG-3760-initial.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PIG-3760) Predicate pushdown for columnar file formats
[ https://issues.apache.org/jira/browse/PIG-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081715#comment-14081715 ] Rohini Palaniswamy commented on PIG-3760: - Attached an initial patch to PIG-4091 with the basic functionality required of the predicate pushdown interface. The interface needs some more enhancements. Filed PIG-4093 and PIG-4094 for that. [~julienledem]/ [~dvryaboy], Is there someone at Twitter that we can work with for the Parquet implementation? It would help us flesh out and finalize the APIs. Predicate pushdown for columnar file formats Key: PIG-3760 URL: https://issues.apache.org/jira/browse/PIG-3760 Project: Pig Issue Type: New Feature Reporter: Andrew Musselman Fix For: 0.14.0 From the conversation on dev@pig: Partition pruning for ORC is not addressed in PIG-3558. We will need to do partition pruning for both ORC and Parquet in a new ticket. Currently there is no interface to deal with this kind of pushdown (LoadMetadata.setPartitionFilter pushes the filter to the loader but removes the filter statement; for ORC/Parquet, the filter is a hint, and we need to apply the filter again in Pig even if it is pushed to the loader), so we will need to define a new interface for that. You are welcome to initiate the work. I know Aniket is also interested in doing that, so be sure to talk with him about this work. Thanks, Daniel On Mon, Feb 10, 2014 at 11:42 AM, Andrew Musselman andrew.mussel...@gmail.com wrote: I had a chat with a couple people last week about a feature request for Pig: in a where or filter clause, when loading an ORC file, to skip directly to the right offset instead of scanning the whole file. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (14 issues)
Subscriber: pigdaily

Key         Summary
PIG-4066    An optimization for ROLLUP operation in Pig
            https://issues.apache.org/jira/browse/PIG-4066
PIG-4008    Pig code change to enable Tez Local mode
            https://issues.apache.org/jira/browse/PIG-4008
PIG-4004    Upgrade the Pigmix queries from the (old) mapred API to mapreduce
            https://issues.apache.org/jira/browse/PIG-4004
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952
PIG-3911    Define unique fields with @OutputSchema
            https://issues.apache.org/jira/browse/PIG-3911
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3866    Create ThreadLocal classloader per PigContext
            https://issues.apache.org/jira/browse/PIG-3866
PIG-3861    duplicate jars get added to distributed cache
            https://issues.apache.org/jira/browse/PIG-3861
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3635    Fix e2e tests for Hadoop 2.X on Windows
            https://issues.apache.org/jira/browse/PIG-3635
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587
PIG-3441    Allow Pig to use default resources from Configuration objects
            https://issues.apache.org/jira/browse/PIG-3441

You may edit this subscription at: https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Commented] (PIG-4083) TestAccumuloPigCluster always failed with timeout error
[ https://issues.apache.org/jira/browse/PIG-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081817#comment-14081817 ] fang fang chen commented on PIG-4083: - BTW, I was using sun jdk 1.7.0_60/1.6.0_45 and ibm jdk 1.6.0/1.7.0. All failed. If this is caused by the environment, I want to know what caused this issue and how to resolve it. It would be helpful if Pig could provide this information. Thanks. TestAccumuloPigCluster always failed with timeout error --- Key: PIG-4083 URL: https://issues.apache.org/jira/browse/PIG-4083 Project: Pig Issue Type: Bug Affects Versions: 0.13.0 Reporter: fang fang chen Assignee: Josh Elser Priority: Critical TestAccumuloPigCluster always failed with timeout error. Tried with sun jdk 6 and sun jdk 7. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PIG-4083) TestAccumuloPigCluster always failed with timeout error
[ https://issues.apache.org/jira/browse/PIG-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081823#comment-14081823 ] Josh Elser commented on PIG-4083: - Sounds good, I'll get a patch with some extra debugging here for you. Out of curiosity, does it fail quickly? TestAccumuloPigCluster always failed with timeout error --- Key: PIG-4083 URL: https://issues.apache.org/jira/browse/PIG-4083 Project: Pig Issue Type: Bug Affects Versions: 0.13.0 Reporter: fang fang chen Assignee: Josh Elser Priority: Critical TestAccumuloPigCluster always failed with timeout error. Tried with sun jdk 6 and sun jdk 7. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PIG-4083) TestAccumuloPigCluster always failed with timeout error
[ https://issues.apache.org/jira/browse/PIG-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated PIG-4083: Attachment: PIG-4083-debug.patch Ok, [~fang fang chen]. You can apply this using {{patch -p1 < PIG-4083-debug.patch}}. Then, run just the testcase {{ant test -Dtestcase=TestAccumuloPigCluster}}. Afterwards, please attach {{build/test/logs/TEST-org.apache.pig.backend.hadoop.accumulo.TestAccumuloPigCluster.txt}}. In that same log file, you will also see a line that matches {{INFO org.apache.pig.backend.hadoop.accumulo.TestAccumuloPigCluster - Starting MiniAccumuloCluster in ...}}, where {{...}} is some directory on your local filesystem. That directory is where the MiniAccumuloCluster was started from. Please attach the contents of the {{logs}} directory beneath that temporary directory path as well. Those two logs should help me better understand why this test was failing for you. Thanks. TestAccumuloPigCluster always failed with timeout error --- Key: PIG-4083 URL: https://issues.apache.org/jira/browse/PIG-4083 Project: Pig Issue Type: Bug Affects Versions: 0.13.0 Reporter: fang fang chen Assignee: Josh Elser Priority: Critical Attachments: PIG-4083-debug.patch TestAccumuloPigCluster always failed with timeout error. Tried with sun jdk 6 and sun jdk 7. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PIG-3760) Predicate pushdown for columnar file formats
[ https://issues.apache.org/jira/browse/PIG-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081944#comment-14081944 ] Julien Le Dem commented on PIG-3760: [~rohini] I added to the description of PIG-4092 Predicate pushdown for columnar file formats Key: PIG-3760 URL: https://issues.apache.org/jira/browse/PIG-3760 Project: Pig Issue Type: New Feature Reporter: Andrew Musselman Fix For: 0.14.0 From the conversation on dev@pig: Partition pruning for ORC is not addressed in PIG-3558. We will need to do partition pruning for both ORC and Parquet in a new ticket. Currently there is no interface to deal with this kind of pushdown (LoadMetadata.setPartitionFilter pushes the filter to the loader but removes the filter statement; for ORC/Parquet, the filter is a hint, and we need to apply the filter again in Pig even if it is pushed to the loader), so we will need to define a new interface for that. You are welcome to initiate the work. I know Aniket is also interested in doing that, so be sure to talk with him about this work. Thanks, Daniel On Mon, Feb 10, 2014 at 11:42 AM, Andrew Musselman andrew.mussel...@gmail.com wrote: I had a chat with a couple people last week about a feature request for Pig: in a where or filter clause, when loading an ORC file, to skip directly to the right offset instead of scanning the whole file. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PIG-4092) Predicate pushdown for Parquet
[ https://issues.apache.org/jira/browse/PIG-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-4092: --- Description: See: https://github.com/apache/incubator-parquet-mr/pull/4 and: https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java Predicate pushdown for Parquet -- Key: PIG-4092 URL: https://issues.apache.org/jira/browse/PIG-4092 Project: Pig Issue Type: Sub-task Reporter: Rohini Palaniswamy Fix For: 0.14.0 See: https://github.com/apache/incubator-parquet-mr/pull/4 and: https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PIG-4092) Predicate pushdown for Parquet
[ https://issues.apache.org/jira/browse/PIG-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-4092: --- Description: See: https://github.com/apache/incubator-parquet-mr/pull/4 and: https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java [~alexlevenson] is the main author of this API was: See: https://github.com/apache/incubator-parquet-mr/pull/4 and: https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java Predicate pushdown for Parquet -- Key: PIG-4092 URL: https://issues.apache.org/jira/browse/PIG-4092 Project: Pig Issue Type: Sub-task Reporter: Rohini Palaniswamy Fix For: 0.14.0 See: https://github.com/apache/incubator-parquet-mr/pull/4 and: https://github.com/apache/incubator-parquet-mr/blob/master/parquet-column/src/main/java/parquet/filter2/predicate/FilterApi.java [~alexlevenson] is the main author of this API -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PIG-3760) Predicate pushdown for columnar file formats
[ https://issues.apache.org/jira/browse/PIG-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081948#comment-14081948 ] Julien Le Dem commented on PIG-3760: FYI, in Parquet the filter is not a hint: it is applied to the records as well, after it has been applied to the metadata. Predicate pushdown for columnar file formats Key: PIG-3760 URL: https://issues.apache.org/jira/browse/PIG-3760 Project: Pig Issue Type: New Feature Reporter: Andrew Musselman Fix For: 0.14.0 From the conversation on dev@pig: Partition pruning for ORC is not addressed in PIG-3558. We will need to do partition pruning for both ORC and Parquet in a new ticket. Currently there is no interface to deal with this kind of pushdown (LoadMetadata.setPartitionFilter pushes the filter to the loader but removes the filter statement; for ORC/Parquet, the filter is a hint, and we need to apply the filter again in Pig even if it is pushed to the loader), so we will need to define a new interface for that. You are welcome to initiate the work. I know Aniket is also interested in doing that, so be sure to talk with him about this work. Thanks, Daniel On Mon, Feb 10, 2014 at 11:42 AM, Andrew Musselman andrew.mussel...@gmail.com wrote: I had a chat with a couple people last week about a feature request for Pig: in a where or filter clause, when loading an ORC file, to skip directly to the right offset instead of scanning the whole file. -- This message was sent by Atlassian JIRA (v6.2#6252)