[jira] Commented: (PIG-766) java.lang.OutOfMemoryError: Java heap space

2009-04-16 Thread Vadim Zaliva (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699915#action_12699915
 ] 

Vadim Zaliva commented on PIG-766:
--

I have 1 GB now and could not go any higher.





[jira] Commented: (PIG-766) java.lang.OutOfMemoryError: Java heap space

2009-04-15 Thread Vadim Zaliva (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699277#action_12699277
 ] 

Vadim Zaliva commented on PIG-766:
--

Increasing the sort buffer to 500 MB did not work for me.

Since the implementation of many basic algorithms in Pig (like counting the number of 
records in a relation) requires GROUP BY, which can produce very long records (up to the 
number of tuples in the relation), this is a very serious problem. Potentially a record 
could exceed the available Java heap memory. What are the strategies for overcoming this 
limitation? Does Pig plan to address this?
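
For plain counting, one workaround that may help is to lean on Pig's algebraic COUNT, which the combiner can apply to partial results on the map side, so the full bag for a key never has to be materialized in one place. A minimal Pig Latin sketch (the relation and field names below are made up for illustration and are not from the original script):

-- hypothetical input: one row per event, keyed by user_id
events   = LOAD 'events.tsv' AS (user_id: chararray, url: chararray, ts: long);
by_user  = GROUP events BY user_id;
-- COUNT is algebraic, so partial counts are folded in the combiner
-- instead of shipping one huge bag per key to the reducer
counts   = FOREACH by_user GENERATE group AS user_id, COUNT(events) AS n;
STORE counts INTO 'counts_per_user';

This only sidesteps the problem for aggregates the combiner can handle; it does not change the underlying limit the comment is asking about.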





[jira] Commented: (PIG-766) java.lang.OutOfMemoryError: Java heap space

2009-04-15 Thread Vadim Zaliva (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699376#action_12699376
 ] 

Vadim Zaliva commented on PIG-766:
--

I have increased it to 500 MB and it is still not enough. I see this as a more 
general problem: at some point the memory I need to allocate for processing a 
big dataset will exceed any possible VM limit.





[jira] Issue Comment Edited: (PIG-766) java.lang.OutOfMemoryError: Java heap space

2009-04-15 Thread Vadim Zaliva (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699376#action_12699376
 ] 

Vadim Zaliva edited comment on PIG-766 at 4/15/09 2:04 PM:
---

I have increased it to 500 MB and it is still not enough. I see this as a more 
general problem: at some point the memory I need to allocate for processing a 
big dataset will exceed any possible VM limit.

In other words, I am concerned that the size of the dataset I can process is limited by 
the memory I can allocate to the Java VM, so the system has a fundamental scalability 
limit.



  was (Author: vzaliva):
I have increased it to 500Mb and it is still not enough. I see this as a 
more general problem, as at some point the memory
I need to allocate for processing  big dataset will exceed all possible VM 
limits.

  



[jira] Created: (PIG-766) java.lang.OutOfMemoryError: Java heap space

2009-04-14 Thread Vadim Zaliva (JIRA)
java.lang.OutOfMemoryError: Java heap space
--

 Key: PIG-766
 URL: https://issues.apache.org/jira/browse/PIG-766
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
 Environment: Hadoop-0.18.3 (cloudera RPMs).
mapred.child.java.opts=-Xmx1024m

Reporter: Vadim Zaliva


My pig script always fails with the following error:

java.lang.OutOfMemoryError: Java heap space
   at java.util.Arrays.copyOf(Arrays.java:2786)
   at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
   at java.io.DataOutputStream.write(DataOutputStream.java:90)
   at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
   at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:213)
   at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:291)
   at org.apache.pig.data.DefaultAbstractBag.write(DefaultAbstractBag.java:233)
   at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:162)
   at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:291)
   at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:83)
   at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
   at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
   at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:156)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.spillSingleRecord(MapTask.java:857)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:467)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:101)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:219)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:208)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:86)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)






[jira] Commented: (PIG-766) java.lang.OutOfMemoryError: Java heap space

2009-04-14 Thread Vadim Zaliva (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699000#action_12699000
 ] 

Vadim Zaliva commented on PIG-766:
--

I have at most 17 million rows in my dataset.
At some point I am doing a GROUP BY, and the longest row is about 500,000 tuples.






[jira] Issue Comment Edited: (PIG-766) java.lang.OutOfMemoryError: Java heap space

2009-04-14 Thread Vadim Zaliva (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699021#action_12699021
 ] 

Vadim Zaliva edited comment on PIG-766 at 4/14/09 6:53 PM:
---

I know for sure that at some point I have a record of 500K tuples, but with tuples 
being 40-50 bytes each, that is far from 90M.

Even if this were the case, I do not see how it could cause an OutOfMemory 
exception in java.util.Arrays.copyOf(). Even if this record is, say, 200 MB, given a 
total JVM heap size of 1 GB, that could happen only if all of this memory is already 
used and 200 MB is not left.

How can I increase the combiner buffer size? What are the possible 
remedies/workarounds for this problem?



  was (Author: vzaliva):
I know for sure that I have at some point a record of 500K tuples, but 
tuples being 40-50 bytes each, it is far from 90M.

Even if this is the case, I do not see how this could cause OutOfMemory 
exception in java.util.Arrays.copyOf(). Even if this record
is, say, 200Mb, given JVM total heap memory size 1Gb, it could happen only if 
all of this memory is already used and does not have
200Mb left.

How can I increase combiner buffer size?
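
Aside from raising buffer sizes, a remedy that is sometimes suggested for this kind of failure is to project away columns that are not needed before the GROUP BY, so the bag serialized for each key stays much smaller than the full-width rows. A rough Pig Latin sketch (the names and schema here are assumptions, not the actual script):

raw      = LOAD 'events.tsv' AS (user_id: chararray, url: chararray, referrer: chararray, ts: long);
-- keep only the fields needed downstream; each tuple in the grouped bag shrinks accordingly
slim     = FOREACH raw GENERATE user_id, ts;
by_user  = GROUP slim BY user_id;
-- MIN is algebraic as well, so the combiner can reduce the bags map-side
firsts   = FOREACH by_user GENERATE group AS user_id, MIN(slim.ts) AS first_seen;
STORE firsts INTO 'first_seen_per_user';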

  



[jira] Created: (PIG-761) ERROR 2086 on simple JOIN

2009-04-08 Thread Vadim Zaliva (JIRA)
ERROR 2086 on simple JOIN
-

 Key: PIG-761
 URL: https://issues.apache.org/jira/browse/PIG-761
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
 Environment: mapreduce mode
Reporter: Vadim Zaliva


ERROR 2086: Unexpected problem during optimization. Could not find all LocalRearrange operators.
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 109

I am doing a pretty straightforward join in one of my Pig scripts. I am able to 'dump' 
both relations involved in this join, but when I try to join them I get this error.

Here is a full log:


ERROR 2086: Unexpected problem during optimization. Could not find all LocalRearrange operators.
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 109
   at org.apache.pig.PigServer.registerQuery(PigServer.java:296)
   at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
   at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
   at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
   at org.apache.pig.Main.main(Main.java:319)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
   at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:274)
   at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:700)
   at org.apache.pig.PigServer.execute(PigServer.java:691)
   at org.apache.pig.PigServer.registerQuery(PigServer.java:292)
   ... 5 more
Caused by: org.apache.pig.impl.plan.optimizer.OptimizerException: ERROR 2086: Unexpected problem during optimization. Could not find all LocalRearrange operators.
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.handlePackage(POPackageAnnotator.java:116)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.visitMROp(POPackageAnnotator.java:88)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:194)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:43)
   at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:65)
   at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
   at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
   at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
   at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
   at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:198)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:80)
   at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
   ... 8 more
ERROR 1002: Unable to store alias 398
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 398
   at org.apache.pig.PigServer.registerQuery(PigServer.java:296)
   at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
   at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
   at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
   at org.apache.pig.Main.main(Main.java:319)
Caused by: java.lang.NullPointerException
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:669)
   at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:330)
   at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:41)
   at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
   at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
   at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:246)
   at org.apache.pig.PigServer.compilePp(PigServer.java:771)
   at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:697)
   at org.apache.pig.PigServer.execute(PigServer.java:691)
   at org.apache.pig.PigServer.registerQuery(PigServer.java:292)
   ... 5 more





[jira] Commented: (PIG-475) task failed to report status error

2009-03-17 Thread Vadim Zaliva (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682743#action_12682743
 ] 

Vadim Zaliva commented on PIG-475:
--

I am using the SVN trunk version (checked out March 16, 2009) and getting these 
errors on one of my Pig scripts.

Task attempt_200903131720_0047_m_000270_0 failed to report status for 627 
seconds. Killing!
...

I could not successfully complete this task even once!

This task works with data in bz2 format; my other tasks work with non-bz2 data. 
Maybe that is the reason?



 task failed to report status error
 

 Key: PIG-475
 URL: https://issues.apache.org/jira/browse/PIG-475
 Project: Pig
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Olga Natkovich
Assignee: Shravan Matthur Narayanamurthy
 Fix For: 1.0.0

 Attachments: 475.patch


 When running a very large query a user got the following error after 90 
 minutes:
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher
  - Error message from task (reduce) tip_200810072143_0004_r_33Task
  task_200810072143_0004_r_33_0 failed to report status for 600 
  seconds. Killing!
 Looks like we missed reporting progress in a few places.




[jira] Created: (PIG-692) when running script file, automatically set up job name based on the file name

2009-02-27 Thread Vadim Zaliva (JIRA)
when running script file, automatically set up job name based on the file name
--

 Key: PIG-692
 URL: https://issues.apache.org/jira/browse/PIG-692
 Project: Pig
  Issue Type: Improvement
  Components: tools
Affects Versions: types_branch
Reporter: Vadim Zaliva
Priority: Trivial
 Fix For: types_branch
 Attachments: pig-job-name.patch

When running a Pig script from the command line, like this:

pig scriptfile

right now the default job name is used. It would be convenient to have the job name 
set automatically based on the script file name.






[jira] Commented: (PIG-620) find Max Tuple by 1st field UDF (for piggybank)

2009-01-20 Thread Vadim Zaliva (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665677#action_12665677
 ] 

Vadim Zaliva commented on PIG-620:
--

Sorry about the println; it was a debug statement and should be removed.

I was not aware of the Apache header. I guess it could be added if needed.

Do you want me to resubmit, or can a committer make these two trivial changes (1. 
remove the print, 2. add the header) before committing?




 find Max Tuple by 1st field UDF (for piggybank)
 ---

 Key: PIG-620
 URL: https://issues.apache.org/jira/browse/PIG-620
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: types_branch
Reporter: Vadim Zaliva
 Fix For: types_branch

 Attachments: MaxTupleBy1stField.java


 This is a simple UDF which takes a bag of tuples and returns the one with the maximum 
 first column. It is fairly trivial, but I have seen people asking for it. Detailed usage 
 comments are in the Javadoc.
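
As a rough illustration of the kind of usage the Javadoc would describe, here is a hedged Pig Latin sketch; the jar name, package path, and data layout below are assumptions made for the example, not taken from the attachment:

REGISTER piggybank.jar;
DEFINE MaxTuple org.apache.pig.piggybank.evaluation.util.MaxTupleBy1stField();

-- hypothetical data: (timestamp, user, action); pick the latest action per user
events   = LOAD 'events.tsv' AS (ts: long, user: chararray, action: chararray);
by_user  = GROUP events BY user;
-- the UDF receives the bag for each group and returns the tuple with the largest first field (ts)
latest   = FOREACH by_user GENERATE group AS user, MaxTuple(events);
STORE latest INTO 'latest_per_user';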




[jira] Updated: (PIG-620) find Max Tuple by 1st field UDF (for piggybank)

2009-01-15 Thread Vadim Zaliva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Zaliva updated PIG-620:
-

Status: Patch Available  (was: Open)

implementation






[jira] Updated: (PIG-620) find Max Tuple by 1st field UDF (for piggybank)

2009-01-15 Thread Vadim Zaliva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Zaliva updated PIG-620:
-

Attachment: MaxTupleBy1stField.java

implementation






[jira] Updated: (PIG-593) RegExLoader stops on a non-matching line

2008-12-30 Thread Vadim Zaliva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Zaliva updated PIG-593:
-

Status: Patch Available  (was: Open)

deco /Users/lord/java/pig-0.1.1/contrib/piggybank/java/src> svn diff
Index: main/java/org/apache/pig/piggybank/storage/RegExLoader.java
===================================================================
--- main/java/org/apache/pig/piggybank/storage/RegExLoader.java (revision 730029)
+++ main/java/org/apache/pig/piggybank/storage/RegExLoader.java (working copy)
@@ -57,7 +57,8 @@
 Matcher matcher = pattern.matcher("");
 
 String line;
-if ((line = in.readLine(utf8, recordDel)) != null) {
+while ((line = in.readLine(utf8, recordDel)) != null) 
+{
     if (line.length() > 0 && line.charAt(line.length() - 1) == '\r')
         line = line.substring(0, line.length() - 1);
 








[jira] Created: (PIG-593) RegExLoader stops on a non-matching line

2008-12-30 Thread Vadim Zaliva (JIRA)
RegExLoader stops on a non-matching line
--

 Key: PIG-593
 URL: https://issues.apache.org/jira/browse/PIG-593
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.1.0
Reporter: Vadim Zaliva


Class RegExLoader and all its subclasses stop if one of the lines does not match 
the provided regular expression.

In particular, I have noticed this when CombinedLogLoader stopped at the 
following line:

58.210.62.24 - - [29/Dec/2008:23:06:57 -0800] "GET /tor/browse/?id=24746&rel=FLY999%40Jack's+Teen+America+22%2FFLY999原創%40單掛D.C.資訊交流網+Jack's+Teen+America+22+cd1.avi HTTP/1.1" 8952 200 "http://img252.imageshack.us/tor/browse/?id=24746&rel=FLY999%40Jack%27s+Teen+America+22" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; )" -

Looks like some Japanese characters here do not match the \S expression used.

In general I expect it to skip such lines, not to stop processing the data file.












[jira] Updated: (PIG-593) RegExLoader stops on a non-matching line

2008-12-30 Thread Vadim Zaliva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Zaliva updated PIG-593:
-

Attachment: PIG-593.diff

Attaching patch.






[jira] Updated: (PIG-593) RegExLoader stops on a non-matching line

2008-12-30 Thread Vadim Zaliva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Zaliva updated PIG-593:
-

Comment: was deleted




[jira] Updated: (PIG-593) RegExLoader stops on a non-matching line

2008-12-30 Thread Vadim Zaliva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Zaliva updated PIG-593:
-

Priority: Minor  (was: Major)
