Hi,
I'm using the Databricks spark-avro library to save some DataFrames out as
Avro (with Spark 1.6.1). When I do this, however, I lose the information in
the Spark events about the number of records and the size of the data written to
HDFS for each partition, which is available if I save an RDD out as a text file.
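For context, the write goes through the DataFrame writer rather than the RDD
save path. The paths and the Parquet input below are made up, and sc is an
existing SparkContext as in the shell, but the code is roughly this:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    val df = sqlContext.read.parquet("/data/input")  // any DataFrame

    // Saving through the spark-avro data source: this is where the
    // per-partition records/bytes-written metrics go missing.
    df.write
      .format("com.databricks.spark.avro")
      .save("/data/output-avro")

    // Saving the underlying RDD as a text file does report those metrics.
    df.rdd.map(_.mkString(",")).saveAsTextFile("/data/output-text")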
Hi,
I'm running a Spark job on YARN with 6 executors, each with 25 GB of
memory and spark.yarn.executor.memoryOverhead set to 5 GB. Despite this, I
still see YARN killing my executors for exceeding the memory limit.
Reading the docs, it looks like the overhead defaults to around 10% of the
executor memory.
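In case it matters, this is roughly how the job is configured. The app name
is a placeholder; the property takes an integer number of megabytes in 1.6,
so 5 GB is 5120:

    import org.apache.spark.{SparkConf, SparkContext}

    // spark.yarn.executor.memoryOverhead is an integer number of megabytes
    // in Spark 1.6 and must be set before the SparkContext is created.
    val conf = new SparkConf()
      .setAppName("my-yarn-job")  // placeholder name
      .set("spark.executor.instances", "6")
      .set("spark.executor.memory", "25g")
      .set("spark.yarn.executor.memoryOverhead", "5120")  // 5 GB in MiB

    val sc = new SparkContext(conf)

The same settings can also be passed on the command line with
spark-submit --conf flags.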