[ https://issues.apache.org/jira/browse/SPARK-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043730#comment-14043730 ]
Bharath Ravi Kumar commented on SPARK-1112:
-------------------------------------------

Can a clear workaround be specified for this bug, please? For those unable to upgrade to 1.0.1 or 1.1.0 in production, general instructions on the workaround are needed; otherwise this is a huge blocker for current production deployments (even on 1.0.0). For instance, running saveAsTextFile() on an RDD (~400MB) causes execution to freeze, with the last log statements seen on the driver being:

14/06/25 16:38:55 INFO spark.SparkContext: Starting job: saveAsTextFile at Test.java:99
14/06/25 16:38:55 INFO scheduler.DAGScheduler: Got job 6 (saveAsTextFile at Test.java:99) with 2 output partitions (allowLocal=false)
14/06/25 16:38:55 INFO scheduler.DAGScheduler: Final stage: Stage 6(saveAsTextFile at Test.java:99)
14/06/25 16:38:55 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/06/25 16:38:55 INFO scheduler.DAGScheduler: Missing parents: List()
14/06/25 16:38:55 INFO scheduler.DAGScheduler: Submitting Stage 6 (MappedRDD[558] at saveAsTextFile at Test.java:99), which has no missing parents
14/06/25 16:38:55 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 6 (MappedRDD[558] at saveAsTextFile at Test.java:99)
14/06/25 16:38:55 INFO scheduler.TaskSchedulerImpl: Adding task set 6.0 with 2 tasks
14/06/25 16:38:55 INFO scheduler.TaskSetManager: Starting task 6.0:0 as TID 5 on executor 1: somehost.corp (PROCESS_LOCAL)
14/06/25 16:38:55 INFO scheduler.TaskSetManager: Serialized task 6.0:0 as 351777 bytes in 36 ms
14/06/25 16:38:55 INFO scheduler.TaskSetManager: Starting task 6.0:1 as TID 6 on executor 0: someotherhost.corp (PROCESS_LOCAL)
14/06/25 16:38:55 INFO scheduler.TaskSetManager: Serialized task 6.0:1 as 186453 bytes in 16 ms

Thanks.
> When spark.akka.frameSize > 10, task results bigger than 10MiB block execution
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-1112
>                 URL: https://issues.apache.org/jira/browse/SPARK-1112
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.9.0, 1.0.0
>            Reporter: Guillaume Pitel
>            Assignee: Xiangrui Meng
>            Priority: Blocker
>             Fix For: 1.0.1, 1.1.0
>
>
> When I set spark.akka.frameSize to something over 10, the messages sent
> from the executors to the driver completely block execution if the
> message is bigger than 10MiB and smaller than the frameSize (if it's above
> the frameSize, it's OK).
>
> The workaround is to set spark.akka.frameSize to 10. In that case, since
> 0.8.1, the BlockManager deals with the data to be sent. It seems slower than
> Akka direct messaging, though.
>
> The configuration seems to be read correctly (see actorSystemConfig.txt), so
> I don't see where the 10MiB limit could come from.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
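For readers looking for concrete workaround instructions: based on the description above, the workaround is to leave spark.akka.frameSize at its 10 MiB default (or set it back explicitly) so that large task results go through the BlockManager rather than an Akka message. A minimal sketch of applying this from a Java driver (the app name and context setup here are illustrative, not from the original report):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class Test {
    public static void main(String[] args) {
        // Workaround from the issue description: keep spark.akka.frameSize
        // at 10 (MiB, the default) so results larger than 10MiB are shipped
        // via the BlockManager instead of being stuck in an Akka message.
        SparkConf conf = new SparkConf()
                .setAppName("FrameSizeWorkaround")
                .set("spark.akka.frameSize", "10");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... run jobs such as rdd.saveAsTextFile(...) as usual ...
        sc.stop();
    }
}
```

The same property can also be passed as a system property (-Dspark.akka.frameSize=10) when launching the driver; as noted above, routing through the BlockManager may be slower than direct Akka messages, but it avoids the hang.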