from:"Tim Moran"

OutputMetrics with data frames (spark-avro)

2016-10-17 Thread Tim Moran

Hi, I'm using the Databricks spark-avro library to save some DataFrames out as Avro (with Spark 1.6.1). When I do this, however, I lose the information in the Spark events about the number of records and the size of data written to HDFS for each partition, which is available if I save an RDD out as a

YARN memory overhead settings

2016-09-06 Thread Tim Moran

Hi, I'm running a Spark job on YARN, using 6 executors each with 25 GB of memory and spark.yarn.executor.overhead set to 5 GB. Despite this, I still seem to see YARN killing my executors for exceeding the memory limit. Reading the docs, it looks like the overhead defaults to around 10% of the
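
For reference, the arithmetic behind the numbers in this message can be sketched as below. This is only an illustration, assuming the default described in the Spark 1.6 documentation for spark.yarn.executor.memoryOverhead (10% of executor memory, with a 384 MiB minimum). The function names are hypothetical and not part of Spark.

```python
# Rough sizing of the YARN container requested for one executor.
# Assumption: default overhead = max(384 MiB, 10% of executor memory),
# as documented for Spark 1.6. Helper names are illustrative only.

MIN_OVERHEAD_MIB = 384


def default_overhead_mib(executor_memory_mib):
    """Overhead used when no explicit value is configured."""
    return max(MIN_OVERHEAD_MIB, executor_memory_mib // 10)


def container_request_mib(executor_memory_mib, overhead_mib=None):
    """Executor heap plus off-heap overhead, in MiB."""
    if overhead_mib is None:
        overhead_mib = default_overhead_mib(executor_memory_mib)
    return executor_memory_mib + overhead_mib


executor_mib = 25 * 1024  # 25 GB executors, as in the message

# Default overhead versus the explicit 5 GB setting
print(container_request_mib(executor_mib))
print(container_request_mib(executor_mib, 5 * 1024))
```

If the overhead setting is not picked up (for example because the property name does not match the documented one), the container limit would fall back to the smaller default figure, which is one thing worth checking when YARN reports executors exceeding their memory limit.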