[jira] Commented: (PIG-766) java.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699915#action_12699915 ] Vadim Zaliva commented on PIG-766:
--
I have 1GB now; I could not go any higher.

java.lang.OutOfMemoryError: Java heap space
--
Key: PIG-766
URL: https://issues.apache.org/jira/browse/PIG-766
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.2.0
Environment: Hadoop-0.18.3 (Cloudera RPMs). mapred.child.java.opts=-Xmx1024m
Reporter: Vadim Zaliva

My pig script always fails with the following error:

java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
    at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:213)
    at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:291)
    at org.apache.pig.data.DefaultAbstractBag.write(DefaultAbstractBag.java:233)
    at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:162)
    at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:291)
    at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:83)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
    at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:156)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.spillSingleRecord(MapTask.java:857)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:467)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:101)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:219)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:208)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:86)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)
--
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-766) java.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699277#action_12699277 ] Vadim Zaliva commented on PIG-766:
--
Increasing the sort buffer to 500MB did not work for me. Since the implementation of many basic algorithms in Pig (such as counting the number of records in a relation) requires GROUP BY, which can produce very long records (up to the number of tuples in the relation), this is a very serious problem: a record could potentially exceed the available Java heap memory. What are the strategies for overcoming this limitation? Does Pig plan to address this?
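The failure mode in the comment above (a GROUP BY bag serialized as one record through a ByteArrayOutputStream, dying inside Arrays.copyOf) can be illustrated with the growth arithmetic of a doubling buffer: during each growth step both the old and the new array are live at once. This is a plain-Java model of that arithmetic under the doubling assumption, not Pig or Hadoop code:

```java
// Rough model of why serializing one huge record can OOM even with headroom:
// a ByteArrayOutputStream-style buffer doubles on overflow, and during the
// copy (Arrays.copyOf) both the old and the new array are live at once.
// This simulates the growth arithmetic only; it is not Pig/Hadoop code.
public class BufferGrowthModel {
    /** Peak bytes transiently held while writing `total` bytes into a doubling buffer. */
    static long peakFootprint(long total, long initialCapacity) {
        long cap = initialCapacity;
        long peak = cap;
        while (cap < total) {
            long next = cap * 2;
            peak = Math.max(peak, cap + next); // old + new array live during copyOf
            cap = next;
        }
        return peak;
    }

    public static void main(String[] args) {
        // A 400 MB serialized bag transiently needs nearly 2x that: the last
        // doubling holds a 256 MB and a 512 MB array at the same time.
        long mb = 1024L * 1024L;
        System.out.println(peakFootprint(400 * mb, 32) / mb); // prints 768
    }
}
```

Under this model a record well under the heap size can still fail to serialize, which is consistent with the 1GB heap in the environment above not being enough.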
[jira] Commented: (PIG-766) java.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699376#action_12699376 ] Vadim Zaliva commented on PIG-766:
--
I have increased it to 500MB and it is still not enough. I see this as a more general problem: at some point the memory I need to allocate for processing a big dataset will exceed any possible VM limit.
[jira] Issue Comment Edited: (PIG-766) java.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699376#action_12699376 ] Vadim Zaliva edited comment on PIG-766 at 4/15/09 2:04 PM:
---
I have increased it to 500MB and it is still not enough. I see this as a more general problem: at some point the memory I need to allocate for processing a big dataset will exceed any possible VM limit. In other words, I am concerned that the size of the dataset I can process is limited by the memory I can allocate to the Java VM, so the system has a fundamental scalability limit.

was (Author: vzaliva):
I have increased it to 500MB and it is still not enough. I see this as a more general problem: at some point the memory I need to allocate for processing a big dataset will exceed any possible VM limit.
[jira] Created: (PIG-766) java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
--
Key: PIG-766
URL: https://issues.apache.org/jira/browse/PIG-766
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.2.0
Environment: Hadoop-0.18.3 (Cloudera RPMs). mapred.child.java.opts=-Xmx1024m
Reporter: Vadim Zaliva

My pig script always fails with a java.lang.OutOfMemoryError ("Java heap space") raised while serializing map output; the full stack trace is quoted in the first message above.
[jira] Commented: (PIG-766) java.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699000#action_12699000 ] Vadim Zaliva commented on PIG-766:
--
I have at most 17M rows in my dataset. At some point I am doing a GROUP BY, and the longest row is about 500,000 tuples.
[jira] Issue Comment Edited: (PIG-766) java.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699021#action_12699021 ] Vadim Zaliva edited comment on PIG-766 at 4/14/09 6:53 PM:
---
I know for sure that at some point I have a record of 500K tuples, but with tuples of 40-50 bytes each, that is far from 90M. Even if it were, I do not see how this could cause an OutOfMemoryError in java.util.Arrays.copyOf(). Even if this record were, say, 200MB, with a total JVM heap of 1GB that could happen only if nearly all of that memory were already in use, leaving less than 200MB free. How can I increase the combiner buffer size? What are the possible remedies/workarounds for this problem?

was (Author: vzaliva):
I know for sure that at some point I have a record of 500K tuples, but with tuples of 40-50 bytes each, that is far from 90M. Even if it were, I do not see how this could cause an OutOfMemoryError in java.util.Arrays.copyOf(). Even if this record were, say, 200MB, with a total JVM heap of 1GB that could happen only if nearly all of that memory were already in use, leaving less than 200MB free. How can I increase the combiner buffer size?
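The size estimate in the comment above can be checked with trivial arithmetic. Plain Java, using only the tuple counts and per-tuple byte sizes quoted in the comment:

```java
// Back-of-envelope check of the record-size estimate: 500K tuples at
// 40-50 bytes each is on the order of 20-25 MB, nowhere near 90 MB.
public class RecordSizeEstimate {
    /** Serialized size of a record of `tuples` tuples at `bytesPerTuple` bytes each. */
    static long estimateBytes(long tuples, long bytesPerTuple) {
        return tuples * bytesPerTuple;
    }

    public static void main(String[] args) {
        System.out.println(estimateBytes(500_000, 40)); // prints 20000000 (~19 MB)
        System.out.println(estimateBytes(500_000, 50)); // prints 25000000 (~24 MB)
    }
}
```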
[jira] Created: (PIG-761) ERROR 2086 on simple JOIN
ERROR 2086 on simple JOIN
-
Key: PIG-761
URL: https://issues.apache.org/jira/browse/PIG-761
Project: Pig
Issue Type: Bug
Affects Versions: 0.2.0
Environment: mapreduce mode
Reporter: Vadim Zaliva

I am getting "ERROR 2086: Unexpected problem during optimization. Could not find all LocalRearrange operators." (org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 109) while doing a pretty straightforward join in one of my pig scripts. I am able to 'dump' both relations involved in this join; when I try to join them I get this error. Here is a full log:

ERROR 2086: Unexpected problem during optimization. Could not find all LocalRearrange operators.
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 109
    at org.apache.pig.PigServer.registerQuery(PigServer.java:296)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
    at org.apache.pig.Main.main(Main.java:319)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:274)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:700)
    at org.apache.pig.PigServer.execute(PigServer.java:691)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:292)
    ... 5 more
Caused by: org.apache.pig.impl.plan.optimizer.OptimizerException: ERROR 2086: Unexpected problem during optimization. Could not find all LocalRearrange operators.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.handlePackage(POPackageAnnotator.java:116)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.visitMROp(POPackageAnnotator.java:88)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:194)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:43)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:65)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
    at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:198)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:80)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
    ... 8 more

ERROR 1002: Unable to store alias 398
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 398
    at org.apache.pig.PigServer.registerQuery(PigServer.java:296)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
    at org.apache.pig.Main.main(Main.java:319)
Caused by: java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:669)
    at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:330)
    at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:41)
    at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:246)
    at org.apache.pig.PigServer.compilePp(PigServer.java:771)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:697)
    at org.apache.pig.PigServer.execute(PigServer.java:691)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:292)
    ... 5 more
[jira] Commented: (PIG-475) task failed to report status error
[ https://issues.apache.org/jira/browse/PIG-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682743#action_12682743 ] Vadim Zaliva commented on PIG-475:
--
I am using the SVN trunk version (checked out March 16, 2009) and getting these errors on one of my pig scripts:

Task attempt_200903131720_0047_m_000270_0 failed to report status for 627 seconds. Killing!
...

I have not been able to complete this task successfully even once. This task works with data in bz2 format; my other tasks work with non-bz2 data. Maybe that is the reason?

task failed to report status error
Key: PIG-475
URL: https://issues.apache.org/jira/browse/PIG-475
Project: Pig
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Olga Natkovich
Assignee: Shravan Matthur Narayanamurthy
Fix For: 1.0.0
Attachments: 475.patch

When running a very large query, a user got the following error after 90 minutes:

org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error message from task (reduce) tip_200810072143_0004_r_33 Task task_200810072143_0004_r_33_0 failed to report status for 600 seconds. Killing!

Looks like we missed reporting progress in a few places.
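The fix described in the issue ("we missed reporting progress in a few places") amounts to calling the reporter inside long-running per-record loops so the TaskTracker does not kill the attempt after the status timeout (600s by default). A minimal self-contained sketch of that pattern, using a stand-in interface instead of the real org.apache.hadoop.mapred.Reporter so it compiles without Hadoop on the classpath:

```java
import java.util.List;

public class HeartbeatLoop {
    /** Stand-in for org.apache.hadoop.mapred.Reporter; not the real Hadoop type. */
    interface Reporter { void progress(); }

    /** Processes records, pinging the reporter every `every` records so a slow
     *  task (e.g. decoding bz2 input) is not killed for failing to report status.
     *  Returns the number of progress pings, for testability. */
    static int process(List<String> records, Reporter reporter, int every) {
        int pings = 0, n = 0;
        for (String r : records) {
            // ... expensive per-record work would go here ...
            if (++n % every == 0) { reporter.progress(); pings++; }
        }
        return pings;
    }
}
```

Reporting every N records rather than every record keeps the overhead negligible while still signaling liveness well inside the timeout window.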
[jira] Created: (PIG-692) when running script file, automatically set up job name based on the file name
when running script file, automatically set up job name based on the file name
--
Key: PIG-692
URL: https://issues.apache.org/jira/browse/PIG-692
Project: Pig
Issue Type: Improvement
Components: tools
Affects Versions: types_branch
Reporter: Vadim Zaliva
Priority: Trivial
Fix For: types_branch
Attachments: pig-job-name.patch

When running a pig script from the command line like this:

pig scriptfile

the default job name is currently used. It would be convenient to have the job name set automatically based on the script name.
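The attached pig-job-name.patch is not included in this message, so the following is only a guess at the kind of derivation such a change would perform: take the last path component of the script file as the job name. The helper name and example path are illustrative, not from the patch:

```java
import java.nio.file.Paths;

public class JobNameFromScript {
    /** Derives a job name from a script path, e.g. "/jobs/daily.pig" -> "daily.pig".
     *  Hypothetical helper; the actual patch may set the name differently. */
    static String jobName(String scriptPath) {
        return Paths.get(scriptPath).getFileName().toString();
    }
}
```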
[jira] Commented: (PIG-620) find Max Tuple by 1st field UDF (for piggybank)
[ https://issues.apache.org/jira/browse/PIG-620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665677#action_12665677 ] Vadim Zaliva commented on PIG-620:
--
Sorry about the println; it was a debug statement and should be removed. I was not aware of the Apache header; I guess it can be added if needed. Do you want me to resubmit, or can a committer make these 2 trivial changes (1. remove the print, 2. add the header) before committing?

find Max Tuple by 1st field UDF (for piggybank)
---
Key: PIG-620
URL: https://issues.apache.org/jira/browse/PIG-620
Project: Pig
Issue Type: New Feature
Components: impl
Affects Versions: types_branch
Reporter: Vadim Zaliva
Fix For: types_branch
Attachments: MaxTupleBy1stField.java

This is a simple UDF which takes a bag of tuples and returns the one with the maximum 1st column. It is fairly trivial, but I have seen people asking for it. Detailed usage comments are in the Javadoc.
[jira] Updated: (PIG-620) find Max Tuple by 1st field UDF (for piggybank)
[ https://issues.apache.org/jira/browse/PIG-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim Zaliva updated PIG-620:
-
Status: Patch Available (was: Open)
[jira] Updated: (PIG-620) find Max Tuple by 1st field UDF (for piggybank)
[ https://issues.apache.org/jira/browse/PIG-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim Zaliva updated PIG-620:
-
Attachment: MaxTupleBy1stField.java (implementation)
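The attached MaxTupleBy1stField.java is not included in these messages; the following is a hedged plain-Java sketch of the same idea, using Object[] in place of Pig's Tuple and a List in place of a DataBag so it needs no Pig jars:

```java
import java.util.List;

public class MaxTupleBy1stFieldSketch {
    /** Returns the tuple whose first field is largest (null for an empty bag).
     *  A plain-Java sketch of the idea, not the attached MaxTupleBy1stField.java;
     *  assumes the first field of every tuple is mutually Comparable. */
    @SuppressWarnings("unchecked")
    static Object[] maxBy1stField(List<Object[]> bag) {
        Object[] best = null;
        for (Object[] t : bag) {
            if (best == null || ((Comparable<Object>) t[0]).compareTo(best[0]) > 0) {
                best = t;   // keep the tuple with the largest 1st column so far
            }
        }
        return best;
    }
}
```

A single pass suffices because "max by first field" is associative, which is also what makes this kind of UDF combinable in a map-reduce setting.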
[jira] Updated: (PIG-593) RegExLoader stops on a non-matching line
[ https://issues.apache.org/jira/browse/PIG-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim Zaliva updated PIG-593:
-
deco /Users/lord/java/pig-0.1.1/contrib/piggybank/java/src $ svn diff
Index: main/java/org/apache/pig/piggybank/storage/RegExLoader.java
===================================================================
--- main/java/org/apache/pig/piggybank/storage/RegExLoader.java (revision 730029)
+++ main/java/org/apache/pig/piggybank/storage/RegExLoader.java (working copy)
@@ -57,7 +57,8 @@
         Matcher matcher = pattern.matcher();
         String line;
-        if ((line = in.readLine(utf8, recordDel)) != null) {
+        while ((line = in.readLine(utf8, recordDel)) != null)
+        {
             if (line.length() > 0 && line.charAt(line.length() - 1) == '\r')
                 line = line.substring(0, line.length() - 1);

RegExLoader stops on a non-matching line
--
Key: PIG-593
URL: https://issues.apache.org/jira/browse/PIG-593
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.1.0
Reporter: Vadim Zaliva

Class RegExLoader and all its subclasses stop if a line does not match the provided regular expression. In particular, I noticed this when CombinedLogLoader stopped at the following line:

58.210.62.24 - - [29/Dec/2008:23:06:57 -0800] GET /tor/browse/?id=24746rel=FLY999%40Jack's+Teen+America+22%2FFLY999原創%40單掛D.C.資訊交流網+Jack's+Teen+America+22+cd1.avi HTTP/1.1 8952 200 http://img252.imageshack.us/tor/browse/?id=24746rel=FLY999%40Jack%27s+Teen+America+22 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; ) -

Looks like some Japanese characters here do not match the \S expression used. In general I expect it to skip such lines, not to stop processing the data file.
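The patch above turns a single `if` into a `while`, so a non-matching line is skipped and reading continues instead of the load ending at the first bad line. A self-contained sketch of that behavior using plain java.io and java.util.regex (not Pig's actual RegExLoader, which reads from its own input stream):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Matcher;    // placeholder; see imports below
import java.util.regex.Pattern;

public class SkipNonMatching {
    /** Returns the first capture group of every matching line, skipping the rest,
     *  mirroring the if -> while change in the patch above. */
    static List<String> loadMatching(String input, Pattern pattern) {
        List<String> out = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(input))) {
            String line;
            while ((line = in.readLine()) != null) {   // keep reading past bad lines
                java.util.regex.Matcher m = pattern.matcher(line);
                if (m.matches()) out.add(m.group(1));
                // a non-matching line is simply skipped, not treated as end-of-input
            }
        } catch (IOException e) {   // cannot happen for an in-memory reader
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Note that the original bug surfaced on non-ASCII characters only because the loader's pattern used \S-based groups; the skip-and-continue loop is agnostic to why a line fails to match.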
[jira] Created: (PIG-593) RegExLoader stops on a non-matching line
RegExLoader stops on a non-matching line
--
Key: PIG-593
URL: https://issues.apache.org/jira/browse/PIG-593
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.1.0
Reporter: Vadim Zaliva

Class RegExLoader and all its subclasses stop if a line does not match the provided regular expression; the full description, including the offending CombinedLogLoader log line, is quoted above.
[jira] Updated: (PIG-593) RegExLoader stops an non-matching line
[ https://issues.apache.org/jira/browse/PIG-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vadim Zaliva updated PIG-593:
-----------------------------
    Attachment: PIG-593.diff

Attaching patch.
[jira] Updated: (PIG-593) RegExLoader stops an non-matching line
[ https://issues.apache.org/jira/browse/PIG-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vadim Zaliva updated PIG-593:
-----------------------------
       Comment: was deleted
[jira] Updated: (PIG-593) RegExLoader stops an non-matching line
[ https://issues.apache.org/jira/browse/PIG-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vadim Zaliva updated PIG-593:
-----------------------------
      Priority: Minor  (was: Major)