Hi everyone,

I added more logs for my use case: when I cache all my data (500 million records) and run a count, I get this:

16/06/16 10:09:25 ERROR TaskSetManager: Total size of serialized results of 27 tasks (1876.7 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)

That's weird, because all I ran was a count. After increasing maxResultSize to 10g, the result is still slow to come back and I get this error:

16/06/16 10:09:25 INFO BlockManagerInfo: Removed taskresult_94 on slave1:27743 in memory (size: 69.5 MB, free: 6.2 GB)
org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 15 tasks (1042.6 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1863)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1876)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1889)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1903)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:883)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:357)
  at org.apache.spark.rdd.RDD.collect(RDD.scala:882)
  at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:290)
  at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2122)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
  at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2436)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2121)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2128)
  at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2156)
  at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2155)
  at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2449)
  at org.apache.spark.sql.Dataset.count(Dataset.scala:2155)
  ... 48 elided

I lost all my executors.
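For reference, here is roughly how I raised the limit, as a minimal sketch (the app name is a placeholder; 10g is just the value I tried, and should be tuned to the driver heap — the default is 1g, which matches the 1024.0 MB in the error):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("count-cached-data")                  // placeholder app name
      .config("spark.driver.maxResultSize", "10g")   // default is 1g
      .getOrCreate()

The same setting can also be passed at launch: spark-shell --conf spark.driver.maxResultSize=10g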
> On Jun 15, 2016, at 8:44 PM, Chanh Le <giaosu...@gmail.com> wrote:
>
> Hi Gene,
> I am using Alluxio 1.1.0 with the Spark 2.0 preview version.
> I load from Alluxio, then cache and query a second time; Spark gets stuck.
>
>> On Jun 15, 2016, at 8:42 PM, Gene Pang <gene.p...@gmail.com> wrote:
>>
>> Hi,
>>
>> Which version of Alluxio are you using?
>>
>> Thanks,
>> Gene
>>
>> On Tue, Jun 14, 2016 at 3:45 AM, Chanh Le <giaosu...@gmail.com> wrote:
>> I am testing Spark 2.0.
>> I load data from Alluxio and cache it, then I query. The first query is fine because it kicks off the cache action, but when I run the query again it gets stuck.
>> I ran this on a 5-node cluster in spark-shell.
>>
>> Has anyone else seen this issue?
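For anyone trying to reproduce, the quoted report boils down to roughly this sequence, as a sketch (the Parquet format and the Alluxio path are assumptions; 19998 is only the default Alluxio master port):

    // spark-shell on the 5-node cluster, reading a dataset stored in Alluxio
    val df = spark.read.parquet("alluxio://master:19998/path/to/data")  // placeholder path
    df.cache()
    df.count()  // first action materializes the cache and completes
    df.count()  // the second run is where it hangs and/or exceeds maxResultSize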