Re: NegativeArraySizeException / segfault

2016-06-08 Thread Andres Perez
We were able to reproduce it with a minimal example. I've opened a JIRA
issue:

https://issues.apache.org/jira/browse/SPARK-15825

On Wed, Jun 8, 2016 at 12:43 PM, Koert Kuipers  wrote:

> great!
>
> we weren't able to reproduce it because the unit tests use a broadcast
> join while on the cluster it uses a sort-merge join, so the issue is in
> the sort-merge join.
>
> we are now able to reproduce it in tests by setting
> spark.sql.autoBroadcastJoinThreshold=-1; it produces weird-looking
> garbled results in the join. hoping to get a minimal reproducible
> example soon.
>
> On Wed, Jun 8, 2016 at 10:24 AM, Pete Robbins  wrote:
>
>> I just raised https://issues.apache.org/jira/browse/SPARK-15822 for a
>> similar-looking issue. Analyzing the core dump from the segv with Memory
>> Analyzer, it looks very much like a UTF8String is corrupt.
>>
>> Cheers,
>>
>>
>> On Fri, 27 May 2016 at 21:00 Koert Kuipers  wrote:
>>
>>> hello all,
>>> after getting our unit tests to pass on spark 2.0.0-SNAPSHOT we are now
>>> trying to run some algorithms at scale on our cluster.
>>> unfortunately this means that when i see errors i am having a harder
>>> time boiling it down to a small reproducible example.
>>>
>>> today we are running an iterative algo using the dataset api and we are
>>> seeing tasks fail with errors which seem to be related to unsafe operations.
>>> the same tasks succeed without issues in our unit tests.
>>>
>>> i see either:
>>>
>>> [NegativeArraySizeException stack trace and SIGSEGV crash header elided;
>>> quoted in full in the original message at the bottom of the thread]

Re: NegativeArraySizeException / segfault

2016-06-08 Thread Koert Kuipers
great!

we weren't able to reproduce it because the unit tests use a broadcast
join while on the cluster it uses a sort-merge join, so the issue is in
the sort-merge join.

we are now able to reproduce it in tests by setting
spark.sql.autoBroadcastJoinThreshold=-1; it produces weird-looking
garbled results in the join. hoping to get a minimal reproducible example
soon.
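
for reference, a minimal sketch of the kind of test setup that forces the
sort-merge join path (illustrative only: the session setup, case class, and
data below are made up, not our actual test code):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .master("local[2]")
    .appName("smj-repro")
    // disable broadcast joins so equi-joins fall back to sort-merge join
    .config("spark.sql.autoBroadcastJoinThreshold", "-1")
    .getOrCreate()
  import spark.implicits._

  case class KV(k: String, v: String)
  val left  = Seq(KV("a", "x"), KV("b", "y")).toDS()
  val right = Seq(KV("a", "1"), KV("b", "2")).toDS()

  // with the sort-merge join path this is where we saw garbled strings
  left.joinWith(right, left("k") === right("k")).show()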

On Wed, Jun 8, 2016 at 10:24 AM, Pete Robbins  wrote:

> I just raised https://issues.apache.org/jira/browse/SPARK-15822 for a
> similar-looking issue. Analyzing the core dump from the segv with Memory
> Analyzer, it looks very much like a UTF8String is corrupt.
>
> Cheers,
>
>
> On Fri, 27 May 2016 at 21:00 Koert Kuipers  wrote:
>
>> hello all,
>> after getting our unit tests to pass on spark 2.0.0-SNAPSHOT we are now
>> trying to run some algorithms at scale on our cluster.
>> unfortunately this means that when i see errors i am having a harder time
>> boiling it down to a small reproducible example.
>>
>> today we are running an iterative algo using the dataset api and we are
>> seeing tasks fail with errors which seem to be related to unsafe operations.
>> the same tasks succeed without issues in our unit tests.
>>
>> i see either:
>>
>> [NegativeArraySizeException stack trace and SIGSEGV crash header elided;
>> quoted in full in the original message at the bottom of the thread]
>>
>> i assume the best thing would be to try to get it to print out the
>> generated code that is causing this?
>> what switch do i need to use again to do so?
>> thanks,
>> koert
>>
>


Re: NegativeArraySizeException / segfault

2016-06-08 Thread Pete Robbins
I just raised https://issues.apache.org/jira/browse/SPARK-15822 for a
similar-looking issue. Analyzing the core dump from the segv with Memory
Analyzer, it looks very much like a UTF8String is corrupt.

Cheers,

On Fri, 27 May 2016 at 21:00 Koert Kuipers  wrote:

> hello all,
> after getting our unit tests to pass on spark 2.0.0-SNAPSHOT we are now
> trying to run some algorithms at scale on our cluster.
> unfortunately this means that when i see errors i am having a harder time
> boiling it down to a small reproducible example.
>
> today we are running an iterative algo using the dataset api and we are
> seeing tasks fail with errors which seem to be related to unsafe operations.
> the same tasks succeed without issues in our unit tests.
>
> i see either:
>
> [NegativeArraySizeException stack trace and SIGSEGV crash header elided;
> quoted in full in the original message at the bottom of the thread]
>
> i assume the best thing would be to try to get it to print out the
> generated code that is causing this?
> what switch do i need to use again to do so?
> thanks,
> koert
>


Re: NegativeArraySizeException / segfault

2016-05-30 Thread Jiří Syrový
I think I have seen this one before as the first indication that something
was wrong; it was related to
https://issues.apache.org/jira/browse/SPARK-13516

2016-05-28 1:34 GMT+02:00 Koert Kuipers :

> it seemed to be related to an Aggregator, so for tests we replaced it with
> an ordinary Dataset.reduce operation, and now we got:
>
> [NegativeArraySizeException stack trace elided; quoted in full in the
> message below]
>
> i did get the generated code, but it's like 17 subtrees and it's not a
> test but a real company program, so i cannot just send it over.
>
> i will try to create a small test program to reproduce it.
>
> On Fri, May 27, 2016 at 4:25 PM, Reynold Xin  wrote:
>
>> They should get printed if you turn on debug level logging.
>>
>> On Fri, May 27, 2016 at 1:00 PM, Koert Kuipers  wrote:
>>
>>> hello all,
>>> after getting our unit tests to pass on spark 2.0.0-SNAPSHOT we are now
>>> trying to run some algorithms at scale on our cluster.
>>> unfortunately this means that when i see errors i am having a harder
>>> time boiling it down to a small reproducible example.
>>>
>>> today we are running an iterative algo using the dataset api and we are
>>> seeing tasks fail with errors which seem to be related to unsafe operations.
>>> the same tasks succeed without issues in our unit tests.
>>>
>>> i see either:
>>>
>>> [NegativeArraySizeException stack trace elided; quoted in full in the
>>> original message at the bottom of the thread]

Re: NegativeArraySizeException / segfault

2016-05-27 Thread Koert Kuipers
it seemed to be related to an Aggregator, so for tests we replaced it with
an ordinary Dataset.reduce operation, and now we got:

java.lang.NegativeArraySizeException
at
org.apache.spark.unsafe.types.UTF8String.getBytes(UTF8String.java:229)
at
org.apache.spark.unsafe.types.UTF8String.toString(UTF8String.java:821)
at
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown
Source)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:147)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

i did get the generated code, but it's like 17 subtrees and it's not a
test but a real company program, so i cannot just send it over.

i will try to create a small test program to reproduce it.
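
for context, a rough sketch of the swap (the case class, aggregator, and
sum logic below are made up for illustration, not our actual code):

  import org.apache.spark.sql.{Encoder, Encoders}
  import org.apache.spark.sql.expressions.Aggregator

  case class Rec(key: String, value: Long)

  // before: a custom Aggregator, the path that seemed to trigger the error
  val sumValues = new Aggregator[Rec, Long, Long] {
    def zero: Long = 0L
    def reduce(buf: Long, rec: Rec): Long = buf + rec.value
    def merge(b1: Long, b2: Long): Long = b1 + b2
    def finish(buf: Long): Long = buf
    def bufferEncoder: Encoder[Long] = Encoders.scalaLong
    def outputEncoder: Encoder[Long] = Encoders.scalaLong
  }
  // ds.select(sumValues.toColumn)

  // after: an ordinary Dataset.reduce, which still hit the
  // NegativeArraySizeException above
  // ds.map(_.value).reduce(_ + _)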

On Fri, May 27, 2016 at 4:25 PM, Reynold Xin  wrote:

> They should get printed if you turn on debug level logging.
>
> On Fri, May 27, 2016 at 1:00 PM, Koert Kuipers  wrote:
>
>> hello all,
>> after getting our unit tests to pass on spark 2.0.0-SNAPSHOT we are now
>> trying to run some algorithms at scale on our cluster.
>> unfortunately this means that when i see errors i am having a harder time
>> boiling it down to a small reproducible example.
>>
>> today we are running an iterative algo using the dataset api and we are
>> seeing tasks fail with errors which seem to be related to unsafe operations.
>> the same tasks succeed without issues in our unit tests.
>>
>> i see either:
>>
>> [NegativeArraySizeException stack trace elided; quoted in full in the
>> original message at the bottom of the thread]

Re: NegativeArraySizeException / segfault

2016-05-27 Thread Reynold Xin
They should get printed if you turn on debug level logging.
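
for example, with the default log4j setup, turning the codegen package up
to debug level should make the generated source show up in the
driver/executor logs (a sketch; adjust to however your logging is
configured):

  import org.apache.log4j.{Level, Logger}

  // CodeGenerator logs the formatted generated code at debug level
  Logger.getLogger("org.apache.spark.sql.catalyst.expressions.codegen")
    .setLevel(Level.DEBUG)

or the equivalent line in conf/log4j.properties:

  log4j.logger.org.apache.spark.sql.catalyst.expressions.codegen=DEBUG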

On Fri, May 27, 2016 at 1:00 PM, Koert Kuipers  wrote:

> hello all,
> after getting our unit tests to pass on spark 2.0.0-SNAPSHOT we are now
> trying to run some algorithms at scale on our cluster.
> unfortunately this means that when i see errors i am having a harder time
> boiling it down to a small reproducible example.
>
> today we are running an iterative algo using the dataset api and we are
> seeing tasks fail with errors which seem to be related to unsafe operations.
> the same tasks succeed without issues in our unit tests.
>
> i see either:
>
> [NegativeArraySizeException stack trace and SIGSEGV crash header elided;
> quoted in full in the original message at the bottom of the thread]
>
> i assume the best thing would be to try to get it to print out the
> generated code that is causing this?
> what switch do i need to use again to do so?
> thanks,
> koert
>


NegativeArraySizeException / segfault

2016-05-27 Thread Koert Kuipers
hello all,
after getting our unit tests to pass on spark 2.0.0-SNAPSHOT we are now
trying to run some algorithms at scale on our cluster.
unfortunately this means that when i see errors i am having a harder time
boiling it down to a small reproducible example.

today we are running an iterative algo using the dataset api and we are
seeing tasks fail with errors which seem to be related to unsafe operations.
the same tasks succeed without issues in our unit tests.

i see either:

16/05/27 12:54:46 ERROR executor.Executor: Exception in task 31.0 in stage
21.0 (TID 1073)
java.lang.NegativeArraySizeException
at
org.apache.spark.unsafe.types.UTF8String.getBytes(UTF8String.java:229)
at
org.apache.spark.unsafe.types.UTF8String.toString(UTF8String.java:821)
at
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown
Source)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown
Source)
at
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
Source)
at
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$7$$anon$1.hasNext(WholeStageCodegenExec.scala:359)
at
org.apache.spark.sql.execution.aggregate.SortBasedAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortBasedAggregateExec.scala:74)
at
org.apache.spark.sql.execution.aggregate.SortBasedAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortBasedAggregateExec.scala:71)
at
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:775)
at
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:775)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

or alternatively:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7fe571041cba, pid=2450, tid=140622965913344
#
# JRE version: Java(TM) SE Runtime Environment (7.0_75-b13) (build
1.7.0_75-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.75-b04 mixed mode
linux-amd64 compressed oops)
# Problematic frame:
# v  ~StubRoutines::jbyte_disjoint_arraycopy

i assume the best thing would be to try to get it to print out the
generated code that is causing this?
what switch do i need to use again to do so?
thanks,
koert