Hi everyone,

I added more logs for my use case: when I cache all my data (500 million records) and run a count, I get this:

16/06/16 10:09:25 ERROR TaskSetManager: Total size of serialized results of 27 tasks (1876.7 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)

That's weird, because all I ran was a count. After increasing maxResultSize to 10g, the result is still slow to come back and I get this error:

16/06/16 10:09:25 INFO BlockManagerInfo: Removed taskresult_94 on slave1:27743 in memory (size: 69.5 MB, free: 6.2 GB)
org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 15 tasks (1042.6 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1863)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1876)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1889)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1903)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:883)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:357)
  at org.apache.spark.rdd.RDD.collect(RDD.scala:882)
  at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:290)
  at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2122)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
  at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2436)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2121)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2128)
  at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2156)
  at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2155)
  at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2449)
  at org.apache.spark.sql.Dataset.count(Dataset.scala:2155)
  ... 48 elided

I lost all my executors.
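For reference, here is roughly how I raised the limit, as a minimal sketch (the app name is a placeholder; 10g is just the value I tried, and should be tuned to the driver heap — the default is 1g, which matches the 1024.0 MB in the error):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("count-cached-data")                  // placeholder app name
      .config("spark.driver.maxResultSize", "10g")   // default is 1g
      .getOrCreate()

The same setting can also be passed at launch: spark-shell --conf spark.driver.maxResultSize=10g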
> On Jun 15, 2016, at 8:44 PM, Chanh Le <giaosu...@gmail.com> wrote:
>
> Hi Gene,
> I am using Alluxio 1.1.0 with the Spark 2.0 preview version.
> I load from Alluxio, then cache and query a second time; Spark gets stuck.
>
>> On Jun 15, 2016, at 8:42 PM, Gene Pang <gene.p...@gmail.com> wrote:
>>
>> Hi,
>>
>> Which version of Alluxio are you using?
>>
>> Thanks,
>> Gene
>>
>> On Tue, Jun 14, 2016 at 3:45 AM, Chanh Le <giaosu...@gmail.com> wrote:
>> I am testing Spark 2.0.
>> I load data from Alluxio and cache it, then I query. The first query is fine because it kicks off the cache action, but when I run the query again it gets stuck.
>> I ran this on a 5-node cluster in spark-shell.
>>
>> Has anyone else seen this issue?
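For anyone trying to reproduce, the quoted report boils down to roughly this sequence, as a sketch (the Parquet format and the Alluxio path are assumptions; 19998 is only the default Alluxio master port):

    // spark-shell on the 5-node cluster, reading a dataset stored in Alluxio
    val df = spark.read.parquet("alluxio://master:19998/path/to/data")  // placeholder path
    df.cache()
    df.count()  // first action materializes the cache and completes
    df.count()  // the second run is where it hangs and/or exceeds maxResultSize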