This is on Spark 1.2. I am loading ~6k Parquet files, roughly 500 MB each, into a SchemaRDD and calling count() on it.
After about 2705 tasks have completed (there is one per file), the job crashes with this error:

Total size of serialized results of 2705 tasks (1024.0 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)

This indicates that the result of each task is about 1024 MB / 2705 tasks ≈ 0.38 MB. Is that normal? I don't know exactly what the result of each task would be, but ~0.38 MB per task for a simple count() seems too high. Can anyone offer an explanation of what the normal size should be if this is too high, or suggest ways to reduce it? Thanks.
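As a workaround (not a fix for the underlying result size), the limit can be raised when submitting the job; this is a sketch assuming the job is launched via spark-submit with a hypothetical main class and jar name:

```shell
# Raise the driver's result-size cap from the 1g default to 2g.
# MyCountJob and my-job.jar are placeholders for your own application.
spark-submit \
  --conf spark.driver.maxResultSize=2g \
  --class MyCountJob \
  my-job.jar
```

The same property can also be set in conf/spark-defaults.conf, or via SparkConf.set("spark.driver.maxResultSize", "2g") before the SparkContext is created; setting it to 0 disables the limit entirely, at the risk of an out-of-memory error on the driver.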