Re: Do jobs fail because of other users of a cluster?

2017-01-23 Thread Matthew Dailey
In general, Java processes fail with an OutOfMemoryError when your code and data do not fit into the memory allocated to the runtime. In Spark, that memory is controlled by the --executor-memory flag. If you are running Spark on YARN, then YARN configuration will dictate the maximum memory
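As a rough illustration of the knobs mentioned above, here is a minimal Scala sketch of setting executor memory through SparkConf instead of the command-line flag. The 4g and 512 values are purely illustrative, and spark.yarn.executor.memoryOverhead is the Spark 1.x name of the YARN overhead setting; this is an assumption-laden example, not the poster's configuration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ExecutorMemoryExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("executor-memory-example")
      // Equivalent to passing --executor-memory 4g to spark-submit.
      .set("spark.executor.memory", "4g")
      // Extra off-heap overhead requested per executor container on YARN
      // (spark.yarn.executor.memoryOverhead, in MB, in Spark 1.x).
      // The total (heap + overhead) must stay below YARN's
      // yarn.scheduler.maximum-allocation-mb, or the container request fails.
      .set("spark.yarn.executor.memoryOverhead", "512")

    val sc = new SparkContext(conf)
    // ... job logic ...
    sc.stop()
  }
}
```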

Re: tuning the spark.locality.wait

2017-01-23 Thread Matthew Dailey
This article recommends setting spark.locality.wait to 10 (milliseconds) when using Spark Streaming, and explains why they chose that value. For batch Spark jobs, that value should still be a good starting place
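For concreteness, here is a small sketch of applying that setting in a streaming job. The app name, batch interval, and the explicit "10ms" value are illustrative assumptions, not taken from the article itself.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object LocalityWaitExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("locality-wait-example")
      // Default is 3s; with short streaming batches, waiting that long for a
      // data-local executor slot can stall the whole batch, so drop it to 10ms.
      .set("spark.locality.wait", "10ms")

    val ssc = new StreamingContext(conf, Seconds(1))
    // ... register input streams and output operations before ssc.start() ...
  }
}
```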

Re: Spark Metrics: custom source/sink configurations not getting recognized

2016-11-28 Thread Matthew Dailey
I just stumbled upon this issue as well in Spark 1.6.2 when trying to write my own custom Sink. For anyone else who runs into it, here are the two relevant JIRAs I found, though neither offers a solution yet: - https://issues.apache.org/jira/browse/SPARK-14151 - Propose to refactor and expose
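Since the thread is about custom sinks, here is a hypothetical sketch of what one looks like in Spark 1.6. The Sink trait is private[spark] (the point of SPARK-14151), so the commonly cited workaround is to declare the class inside the org.apache.spark.metrics.sink package; the class name MyCustomSink and the "period" property are my own illustrative choices, not from the thread.

```scala
package org.apache.spark.metrics.sink

import java.util.Properties
import java.util.concurrent.TimeUnit

import com.codahale.metrics.{ConsoleReporter, MetricRegistry}
import org.apache.spark.SecurityManager

// Spark instantiates sinks reflectively with exactly this three-argument
// constructor, passing the key/value pairs from metrics.properties.
class MyCustomSink(
    val property: Properties,
    val registry: MetricRegistry,
    securityMgr: SecurityManager) extends Sink {

  // Polling period in seconds, read from the sink's properties (illustrative).
  private val pollPeriod = property.getProperty("period", "10").toInt

  // Delegate to Dropwizard's ConsoleReporter just to keep the sketch small.
  private val reporter = ConsoleReporter.forRegistry(registry)
    .convertRatesTo(TimeUnit.SECONDS)
    .convertDurationsTo(TimeUnit.MILLISECONDS)
    .build()

  override def start(): Unit = reporter.start(pollPeriod, TimeUnit.SECONDS)
  override def stop(): Unit = reporter.stop()
  override def report(): Unit = reporter.report()
}
```

Such a sink would then be registered in conf/metrics.properties with a line like *.sink.mycustom.class=org.apache.spark.metrics.sink.MyCustomSink, with the jar on the driver and executor classpaths; whether that alone resolves the "not getting recognized" symptom in this thread is exactly what the JIRAs above track.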