Re: Spark 1.6.1 : SPARK-12089 : java.lang.NegativeArraySizeException

2016-03-13 Thread Ted Yu
Here is the related code from BufferHolder.grow():

final int length = totalSize() + neededSize;
if (buffer.length < length) {
  // This will not happen frequently, because the buffer is re-used.
  final byte[] tmp = new byte[length * 2];

Looks like length was positive (since it was bigger than buffer.length), but
length * 2 overflowed int and became negative.
We just need to allocate length bytes instead of length * 2 bytes.
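
For illustration, here is a minimal sketch (not the actual Spark patch) of one
way to avoid that wrap-around: do the size arithmetic in long and cap the
doubled allocation at Integer.MAX_VALUE. Allocating exactly length bytes, as
suggested above, is the simpler variant. The class and field names below
(GrowableBuffer, cursor) are illustrative, not Spark's BufferHolder.

// Overflow-safe growth sketch; names are illustrative only.
public final class GrowableBuffer {
  private byte[] buffer = new byte[64];
  private int cursor = 0;  // bytes already written

  private int totalSize() {
    return cursor;
  }

  public void grow(int neededSize) {
    // Do the arithmetic in long so totalSize() + neededSize and the doubling
    // below cannot wrap around to a negative int.
    final long required = (long) totalSize() + neededSize;
    if (required > Integer.MAX_VALUE) {
      throw new UnsupportedOperationException(
          "Cannot grow buffer beyond " + Integer.MAX_VALUE + " bytes");
    }
    if (buffer.length < required) {
      // Keep the amortized doubling, but cap it instead of letting
      // length * 2 overflow into a negative array size.
      final long doubled = Math.min(required * 2, (long) Integer.MAX_VALUE);
      final byte[] tmp = new byte[(int) doubled];
      System.arraycopy(buffer, 0, tmp, 0, cursor);
      buffer = tmp;
    }
  }
}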

On Sun, Mar 13, 2016 at 10:39 PM, Ravindra Rawat wrote:

> Greetings,
>
> I am getting the following exception when joining a few Parquet files. The
> SPARK-12089 description has details of the overflow condition, which is
> marked as fixed in 1.6.1. I recall seeing another issue related to CSV files
> producing the same exception.
>
> Any pointers on how to debug this, or possible workarounds? Google searches
> and JIRA comments point to either a record size of more than 2 GB (less
> likely) or RDD sizes being too large.
>
> I had upgraded to Spark 1.6.1 due to serialization errors from Catalyst
> while reading Parquet files.
>
> Related JIRA Issue => https://issues.apache.org/jira/browse/SPARK-12089
>
> Related PR => https://github.com/apache/spark/pull/10142
>
>
> java.lang.NegativeArraySizeException
>   at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:45)
>   at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:196)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
>   at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>
>
> Thanks.
>
> --
> Regards
> Ravindra
>


Spark 1.6.1 : SPARK-12089 : java.lang.NegativeArraySizeException

2016-03-13 Thread Ravindra Rawat
Greetings,

I am getting the following exception when joining a few Parquet files.
The SPARK-12089 description has details of the overflow condition, which
is marked as fixed in 1.6.1. I recall seeing another issue related to CSV
files producing the same exception.

Any pointers on how to debug this, or possible workarounds? Google
searches and JIRA comments point to either a record size of more than
2 GB (less likely) or RDD sizes being too large.

I had upgraded to Spark 1.6.1 due to serialization errors from Catalyst
while reading Parquet files.

Related JIRA Issue => https://issues.apache.org/jira/browse/SPARK-12089

Related PR => https://github.com/apache/spark/pull/10142


java.lang.NegativeArraySizeException
  at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:45)
  at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:196)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
  at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
  at org.apache.spark.scheduler.Task.run(Task.scala:89)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)


Thanks.

-- 
Regards
Ravindra