Re: Came across Spark SQL hang/Error issue with Spark 1.5 Tungsten feature
Based on the latest Spark code (commit 608353c8e8e50461fafff91a2c885dca8af3aaa8), I ran the same Spark SQL query against two groups of combined configuration. From the results below, it seems that it currently does not work with the tungsten-sort shuffle manager:

Test 1 (PASSED):
spark.shuffle.manager=sort
spark.sql.codegen=true
spark.sql.unsafe.enabled=true

Test 2 (FAILED):
spark.shuffle.manager=tungsten-sort
spark.sql.codegen=true
spark.sql.unsafe.enabled=true

15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode4:50313
15/08/03 16:46:02 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 3 is 586 bytes
15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode2:60490
15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode2:56319
15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode1:58179
15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode1:32816
15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode3:55840
15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode3:46874
15/08/03 16:46:02 WARN scheduler.TaskSetManager: Lost task 42.0 in stage 158.0 (TID 1548, bignode4): java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:118)
    at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:107)
    at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
    at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:167)
    at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:140)
    at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:120)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:71)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Came-across-Spark-SQL-hang-Error-issue-with-Spark-1-5-Tungsten-feature-tp13537p13563.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
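For reference, the two property combinations above can be supplied through conf/spark-defaults.conf (or equivalently as --conf flags to spark-submit). A sketch of the failing Test 2 setup, using only the property names quoted in this thread:

```
# conf/spark-defaults.conf -- Test 2 (the failing combination)
spark.shuffle.manager      tungsten-sort
spark.sql.codegen          true
spark.sql.unsafe.enabled   true
```

Switching spark.shuffle.manager back to sort (Test 1) is the only difference between the passing and failing runs.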
Re: Came across Spark SQL hang/Error issue with Spark 1.5 Tungsten feature
Thank you for your reply! Do you mean that currently, if I want to use this Tungsten feature, I have to set the sort shuffle manager (spark.shuffle.manager=sort)? However, I saw a slide deck, "Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal", presented at Spark Summit 2015, and it seems to recommend the 'tungsten-sort' manager.

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Came-across-Spark-SQL-hang-Error-issue-with-Spark-1-5-Tungsten-feature-tp13537p13561.html
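Until the tungsten-sort path is sorted out, the workaround implied in the replies is to keep the sort shuffle manager while leaving the Tungsten SQL options on. A per-job sketch via spark-submit, where the class and jar names are placeholders, not part of this thread:

```
spark-submit \
  --conf spark.shuffle.manager=sort \
  --conf spark.sql.codegen=true \
  --conf spark.sql.unsafe.enabled=true \
  --class com.example.MyQueryJob \
  my-query-job.jar    # placeholder application class/jar
```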
Re: Came across Spark SQL hang/Error issue with Spark 1.5 Tungsten feature
It would also be great to test this with codegen and unsafe enabled, but while continuing to use the sort shuffle manager instead of the new tungsten-sort one.

On Fri, Jul 31, 2015 at 1:39 AM, Reynold Xin r...@databricks.com wrote:
> Is this deterministically reproducible? Can you try this on the latest master branch? It would be great to turn on debug logging and dump the generated code. It would also be great to dump the array size at your line 314 in UnsafeRow (and whatever the master branch's corresponding line is).
>
> On Fri, Jul 31, 2015 at 1:31 AM, james yiaz...@gmail.com wrote:
>> Another error:
>> [quoted MapOutputTracker log and java.lang.NegativeArraySizeException stack trace omitted; see james's original message below]

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Came-across-Spark-SQL-hang-Error-issue-with-Spark-1-5-Tungsten-feature-tp13537p13563.html
Re: Came across Spark SQL hang/Error issue with Spark 1.5 Tungsten feature
Another error:

15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode1:40443
15/07/31 16:15:28 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 3 is 583 bytes
15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode1:40474
15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode2:34052
15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode3:46929
15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode3:50890
15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode2:47795
15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode4:35120
15/07/31 16:15:28 INFO scheduler.TaskSetManager: Finished task 32.0 in stage 151.0 (TID 1203) in 155 ms on bignode3 (1/50)
15/07/31 16:15:28 INFO scheduler.TaskSetManager: Finished task 35.0 in stage 151.0 (TID 1204) in 157 ms on bignode2 (2/50)
15/07/31 16:15:28 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 151.0 (TID 1196) in 168 ms on bignode3 (3/50)
15/07/31 16:15:28 WARN scheduler.TaskSetManager: Lost task 46.0 in stage 151.0 (TID 1184, bignode1): java.lang.NegativeArraySizeException
    at org.apache.spark.sql.catalyst.expressions.UnsafeRow.getBinary(UnsafeRow.java:314)
    at org.apache.spark.sql.catalyst.expressions.UnsafeRow.getUTF8String(UnsafeRow.java:297)
    at SC$SpecificProjection.apply(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.FromUnsafeProjection.apply(Projection.scala:152)
    at org.apache.spark.sql.catalyst.expressions.FromUnsafeProjection.apply(Projection.scala:140)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.shuffle.unsafe.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:148)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:71)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Came-across-Spark-SQL-hang-Error-issue-with-Spark-1-5-Tungsten-feature-tp13537p13538.html
Re: Came across Spark SQL hang/Error issue with Spark 1.5 Tungsten feature
Is this deterministically reproducible? Can you try this on the latest master branch? It would be great to turn on debug logging and dump the generated code. It would also be great to dump the array size at your line 314 in UnsafeRow (and whatever the master branch's corresponding line is).

On Fri, Jul 31, 2015 at 1:31 AM, james yiaz...@gmail.com wrote:
> Another error:
> [quoted MapOutputTracker log and java.lang.NegativeArraySizeException stack trace omitted; see james's original message above]
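To act on the debug-logging suggestion, one option is to override conf/log4j.properties on the driver and executors. The logger name below is an assumption: it presumes the Spark 1.5-era code generators log their generated Java source at DEBUG under the catalyst codegen package, which is worth verifying against your build before relying on it:

```
# conf/log4j.properties -- keep the default console setup, but dump generated code
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Assumption: codegen classes emit generated source at DEBUG under this package
log4j.logger.org.apache.spark.sql.catalyst.expressions.codegen=DEBUG
```

Dumping the array size at UnsafeRow.java line 314 would still require a local patch (e.g. a temporary System.err.println before the allocation) and a rebuild; there is no configuration switch for that.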