Hi all, I currently have a mapPartitions job that flatMaps each value in the iterator, and I'm running into major GC costs on certain executions: some executors take 20 minutes, 15 of which are pure garbage collection. I believe much of it comes from the ArrayBuffer I am building for the output. Does anyone have suggestions on how I can produce some form of streamed output instead?
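For what it's worth, a common fix for this pattern is to avoid materializing the whole partition's output in an ArrayBuffer and instead return a lazy iterator from mapPartitions, so Spark can consume (and spill) elements one at a time. A minimal sketch, where `expand` is an illustrative stand-in for the per-element flatMap logic (not from the original post):

```scala
// Instead of eagerly collecting into an ArrayBuffer inside mapPartitions:
//
//   rdd.mapPartitions { iter =>
//     val buf = scala.collection.mutable.ArrayBuffer.empty[Int]
//     iter.foreach { x => buf ++= expand(x) }
//     buf.iterator            // whole partition's output lives on the heap
//   }
//
// return a lazy iterator so nothing is buffered per partition:
//
//   rdd.mapPartitions { iter => iter.flatMap(expand) }

// Self-contained demo of the lazy behaviour with plain Scala iterators
// (expand is a hypothetical example function):
def expand(x: Int): Iterator[Int] = Iterator(x, x * 10)

val input = Iterator(1, 2, 3)
val out = input.flatMap(expand) // nothing computed yet; fully lazy
val result = out.toList         // elements are produced one at a time
// result == List(1, 10, 2, 20, 3, 30)
```

Because `iter.flatMap` never holds more than the current element's expansion in memory, old-gen pressure from large per-partition buffers goes away.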
Also, does anyone have any advice in general for tracking down and addressing GC issues in Spark?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Garbage-collections-issue-on-MapPartitions-tp26104.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
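On tracking GC down: the "GC Time" column on the Spark UI's Executors and Stages pages shows how much of each task's time was spent in collection, and you can get per-collection detail by enabling GC logging on the executors. A hedged sketch, assuming a Java 8-era JVM (flag names differ on newer JVMs) and placeholder class/jar names:

```shell
# Enable executor GC logging so collection details land in executor stderr.
# com.example.MyJob and my-job.jar are placeholders for your application.
spark-submit \
  --class com.example.MyJob \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  my-job.jar
```

Correlating long stop-the-world pauses in those logs with the stages that build large per-partition collections usually points straight at the offending code.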