In general, Java processes fail with an OutOfMemoryError when your code and
data do not fit in the memory allocated to the runtime. In Spark, that
memory is controlled through the --executor-memory flag (or the
spark.executor.memory configuration).
If you are running Spark on YARN, the YARN configuration will dictate the
maximum memory you can request per container.
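As a sketch of the flags above, the executor memory can be set at submit
time like this (the application jar path and class name here are
hypothetical placeholders, not from the original text):

```shell
# Request 4 GiB per executor; on YARN this must fit within the
# container maximum set by yarn.scheduler.maximum-allocation-mb.
spark-submit \
  --master yarn \
  --executor-memory 4g \
  --class com.example.MyApp \
  my-app.jar
```

If YARN rejects the request, lower --executor-memory or raise the YARN
container limit; the two settings must agree.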
This article recommends setting spark.locality.wait to 10 milliseconds when
using Spark Streaming, and explains why the authors chose that value. If you
are using batch Spark, that value should still be a good starting point.
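A minimal sketch of applying that recommendation at submit time (the jar
path is a hypothetical placeholder; spark.locality.wait defaults to 3s, so
this lowers it substantially):

```shell
# Wait only 10 ms for a data-local task slot before falling back to a
# less local one; useful when low latency matters more than locality.
spark-submit \
  --conf spark.locality.wait=10ms \
  my-streaming-app.jar
```

The same value can also be set on the SparkConf in code before the context
is created.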
I just stumbled upon this issue as well in Spark 1.6.2 while trying to
write my own custom Sink. For anyone else who runs into it, there are two
relevant JIRAs that I found, but no solution yet:
- https://issues.apache.org/jira/browse/SPARK-14151 - Propose to refactor
and expose