Re: Came across Spark SQL hang/Error issue with Spark 1.5 Tungsten feature

2015-08-03 Thread james
Based on the latest Spark code (commit 608353c8e8e50461fafff91a2c885dca8af3aaa8),
I ran the same Spark SQL query against two groups of combined configuration
settings. Judging from the results below, it currently does not work correctly
with the tungsten-sort shuffle manager:

*Test 1# (PASSED)*
spark.shuffle.manager=sort
spark.sql.codegen=true
spark.sql.unsafe.enabled=true 

*Test 2# (FAILED)*
spark.shuffle.manager=tungsten-sort
spark.sql.codegen=true
spark.sql.unsafe.enabled=true 
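
For completeness, a minimal sketch of a driver that applies the failing Test 2#
combination programmatically (the class name and the query are placeholders, and
the same properties can just as well be passed via spark-submit --conf or
spark-defaults.conf); switching spark.shuffle.manager back to "sort" gives the
passing Test 1# setup:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SQLContext;

public class TungstenShuffleRepro {
  public static void main(String[] args) {
    // Test 2# combination; change "tungsten-sort" to "sort" for Test 1#.
    SparkConf conf = new SparkConf()
        .setAppName("tungsten-shuffle-repro")          // placeholder app name
        .set("spark.shuffle.manager", "tungsten-sort")
        .set("spark.sql.codegen", "true")
        .set("spark.sql.unsafe.enabled", "true");
    JavaSparkContext jsc = new JavaSparkContext(conf);
    SQLContext sqlContext = new SQLContext(jsc);
    // Run the same Spark SQL query used in both tests (omitted here).
    // sqlContext.sql("<query>").collect();
    jsc.stop();
  }
}

The executor log below is from the failing tungsten-sort run: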

15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode4:50313
15/08/03 16:46:02 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 3 is 586 bytes
15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode2:60490
15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode2:56319
15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode1:58179
15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode1:32816
15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode3:55840
15/08/03 16:46:02 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode3:46874
15/08/03 16:46:02 WARN scheduler.TaskSetManager: Lost task 42.0 in stage 158.0 (TID 1548, bignode4): java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:118)
    at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:107)
    at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
    at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:167)
    at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:140)
    at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:120)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:71)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)







Re: Came across Spark SQL hang/Error issue with Spark 1.5 Tungsten feature

2015-08-02 Thread james
Thank you for your reply!
Do you mean that, for now, if I want to use this Tungsten feature I have to
set the sort shuffle manager (spark.shuffle.manager=sort)? However, the slide
deck "Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal"
from Spark Summit 2015 seems to recommend the 'tungsten-sort' manager.






Re: Came across Spark SQL hang/Error issue with Spark 1.5 Tungsten feature

2015-07-31 Thread Josh Rosen
It would also be great to test this with codegen and unsafe enabled while
continuing to use the sort shuffle manager instead of the new tungsten-sort
one.

On Fri, Jul 31, 2015 at 1:39 AM, Reynold Xin r...@databricks.com wrote:

 Is this deterministically reproducible? Can you try this on the latest
 master branch?

 Would be great to turn on debug logging and dump the generated code. Also
 would be great to dump the array size at your line 314 in UnsafeRow (and
 whatever master branch's appropriate line is).

 On Fri, Jul 31, 2015 at 1:31 AM, james yiaz...@gmail.com wrote:

 Another error:
 15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode1:40443
 15/07/31 16:15:28 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 3 is 583 bytes
 15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode1:40474
 15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode2:34052
 15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode3:46929
 15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode3:50890
 15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode2:47795
 15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode4:35120
 15/07/31 16:15:28 INFO scheduler.TaskSetManager: Finished task 32.0 in stage 151.0 (TID 1203) in 155 ms on bignode3 (1/50)
 15/07/31 16:15:28 INFO scheduler.TaskSetManager: Finished task 35.0 in stage 151.0 (TID 1204) in 157 ms on bignode2 (2/50)
 15/07/31 16:15:28 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 151.0 (TID 1196) in 168 ms on bignode3 (3/50)
 15/07/31 16:15:28 WARN scheduler.TaskSetManager: Lost task 46.0 in stage 151.0 (TID 1184, bignode1): java.lang.NegativeArraySizeException
     at org.apache.spark.sql.catalyst.expressions.UnsafeRow.getBinary(UnsafeRow.java:314)
     at org.apache.spark.sql.catalyst.expressions.UnsafeRow.getUTF8String(UnsafeRow.java:297)
     at SC$SpecificProjection.apply(Unknown Source)
     at org.apache.spark.sql.catalyst.expressions.FromUnsafeProjection.apply(Projection.scala:152)
     at org.apache.spark.sql.catalyst.expressions.FromUnsafeProjection.apply(Projection.scala:140)
     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
     at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
     at org.apache.spark.shuffle.unsafe.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:148)
     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:71)
     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
     at org.apache.spark.scheduler.Task.run(Task.scala:86)
     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
     at java.lang.Thread.run(Thread.java:745)










Re: Came across Spark SQL hang/Error issue with Spark 1.5 Tungsten feature

2015-07-31 Thread james
Another error:
15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send
map output locations for shuffle 3 to bignode1:40443
15/07/31 16:15:28 INFO spark.MapOutputTrackerMaster: Size of output statuses
for shuffle 3 is 583 bytes
15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send
map output locations for shuffle 3 to bignode1:40474
15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send
map output locations for shuffle 3 to bignode2:34052
15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send
map output locations for shuffle 3 to bignode3:46929
15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send
map output locations for shuffle 3 to bignode3:50890
15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send
map output locations for shuffle 3 to bignode2:47795
15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send
map output locations for shuffle 3 to bignode4:35120
15/07/31 16:15:28 INFO scheduler.TaskSetManager: Finished task 32.0 in stage
151.0 (TID 1203) in 155 ms on bignode3 (1/50)
15/07/31 16:15:28 INFO scheduler.TaskSetManager: Finished task 35.0 in stage
151.0 (TID 1204) in 157 ms on bignode2 (2/50)
15/07/31 16:15:28 INFO scheduler.TaskSetManager: Finished task 8.0 in stage
151.0 (TID 1196) in 168 ms on bignode3 (3/50)
15/07/31 16:15:28 WARN scheduler.TaskSetManager: Lost task 46.0 in stage
151.0 (TID 1184, bignode1): java.lang.NegativeArraySizeException
at
org.apache.spark.sql.catalyst.expressions.UnsafeRow.getBinary(UnsafeRow.java:314)
at
org.apache.spark.sql.catalyst.expressions.UnsafeRow.getUTF8String(UnsafeRow.java:297)
at SC$SpecificProjection.apply(Unknown Source)
at
org.apache.spark.sql.catalyst.expressions.FromUnsafeProjection.apply(Projection.scala:152)
at
org.apache.spark.sql.catalyst.expressions.FromUnsafeProjection.apply(Projection.scala:140)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at
org.apache.spark.shuffle.unsafe.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:148)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:71)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)








Re: Came across Spark SQL hang/Error issue with Spark 1.5 Tungsten feature

2015-07-31 Thread Reynold Xin
Is this deterministically reproducible? Can you try this on the latest
master branch?

Would be great to turn on debug logging and dump the generated code. Also
would be great to dump the array size at your line 314 in UnsafeRow (and
whatever master branch's appropriate line is).
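
Something like the following (purely illustrative, not the actual UnsafeRow
source; the class and method names are made up for the sketch) would surface
the bad size in the executor logs right before the allocation fails:

// Hypothetical helper wrapping the byte[] allocation near UnsafeRow.java:314,
// so a corrupted length shows up in the executor log just before the
// NegativeArraySizeException is raised.
final class BinarySizeDebug {
  static byte[] allocateWithCheck(int size, int ordinal) {
    if (size < 0) {
      System.err.println("UnsafeRow.getBinary: negative size " + size
          + " at ordinal " + ordinal);
    }
    return new byte[size]; // still throws NegativeArraySizeException when size < 0
  }
}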

On Fri, Jul 31, 2015 at 1:31 AM, james yiaz...@gmail.com wrote:

 Another error:
 15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode1:40443
 15/07/31 16:15:28 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 3 is 583 bytes
 15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode1:40474
 15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode2:34052
 15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode3:46929
 15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode3:50890
 15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode2:47795
 15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to bignode4:35120
 15/07/31 16:15:28 INFO scheduler.TaskSetManager: Finished task 32.0 in stage 151.0 (TID 1203) in 155 ms on bignode3 (1/50)
 15/07/31 16:15:28 INFO scheduler.TaskSetManager: Finished task 35.0 in stage 151.0 (TID 1204) in 157 ms on bignode2 (2/50)
 15/07/31 16:15:28 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 151.0 (TID 1196) in 168 ms on bignode3 (3/50)
 15/07/31 16:15:28 WARN scheduler.TaskSetManager: Lost task 46.0 in stage 151.0 (TID 1184, bignode1): java.lang.NegativeArraySizeException
     at org.apache.spark.sql.catalyst.expressions.UnsafeRow.getBinary(UnsafeRow.java:314)
     at org.apache.spark.sql.catalyst.expressions.UnsafeRow.getUTF8String(UnsafeRow.java:297)
     at SC$SpecificProjection.apply(Unknown Source)
     at org.apache.spark.sql.catalyst.expressions.FromUnsafeProjection.apply(Projection.scala:152)
     at org.apache.spark.sql.catalyst.expressions.FromUnsafeProjection.apply(Projection.scala:140)
     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
     at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
     at org.apache.spark.shuffle.unsafe.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:148)
     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:71)
     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
     at org.apache.spark.scheduler.Task.run(Task.scala:86)
     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
     at java.lang.Thread.run(Thread.java:745)




