I have a Spark job running on 2 TB of data that creates more than 30,000 partitions. As a result, the job fails with the error "Map output statuses were 170415722 bytes which exceeds spark.akka.frameSize 52428800 bytes" (for a 1 TB run). However, when I increase spark.akka.frameSize to around 500 MB, the job hangs with no further progress.
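For context, this is roughly how I am setting the frame size (a minimal sketch; note that spark.akka.frameSize is specified in MB, and the app name is just illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    // spark.akka.frameSize is given in MB; 500 corresponds to the ~500 MB I tried
    val conf = new SparkConf()
      .setAppName("large-shuffle-job") // illustrative name
      .set("spark.akka.frameSize", "500")
    val sc = new SparkContext(conf)

The same setting can also be passed on the command line, e.g. spark-submit --conf spark.akka.frameSize=500 ...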
So, what is the ideal or maximum value I can assign to spark.akka.frameSize so that the map output statuses do not exceed the limit for large volumes of data? Is coalescing the data into a smaller number of partitions the only solution to this problem? Is there a better way than coalescing many intermediate RDDs in the program (sketched below)?

My settings:
Driver memory: 10 GB
Executor memory: 36 GB
Executor memory overhead: 3 GB
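To illustrate the coalescing workaround I am asking about (a sketch assuming the wide shuffle is what inflates the map output statuses; the paths, partition counts, and job logic are hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("coalesce-workaround") // illustrative
    val sc = new SparkContext(conf)

    // input path and partition counts are hypothetical
    val raw = sc.textFile("hdfs:///data/input", minPartitions = 30000)

    // coalesce shrinks the partition count without a full shuffle,
    // which reduces the size of the map output statuses the driver
    // must send to executors
    val fewer = raw.coalesce(2000)

    fewer.map(line => (line.length, 1L))
         .reduceByKey(_ + _)
         .saveAsTextFile("hdfs:///data/output")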