This was a big help! For the benefit of my fellow travelers running Spark on EMR:
I made a JSON file with the following:

[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.nodemanager.pmem-check-enabled": "false",
      "yarn.nodemanager.vmem-check-enabled": "false"
    }
  }
]

and then I created my cluster like so:

aws emr create-cluster --configurations file:///Users/jsnavely/project/frick/spark_config/nomem.json ...

The other thing I noticed was that one of the dataframes I was joining against was actually coming from a gzip'd JSON file. gzip files are NOT splittable, so the read wasn't properly parallelized, which meant the joins were causing a lot of memory pressure. I recompressed it with bzip2 and my job has been running with no errors.

Thanks again!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemory-with-wide-289-column-dataframe-tp26651p26660.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
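For anyone who wants to do the recompression step programmatically rather than with zcat/bzip2 on the command line, here is a minimal Python sketch using only the standard library (the filenames are placeholders, and the function name gzip_to_bzip2 is my own, not from any Spark or AWS API). bzip2 is block-compressed, so Spark can split the resulting file across tasks:

```python
import bz2
import gzip

def gzip_to_bzip2(src, dst, chunk_size=1 << 20):
    """Stream-recompress a .gz file to .bz2 so Spark can split it.

    Reads in chunks (default 1 MiB) so even very large files are
    converted without loading everything into memory.
    """
    with gzip.open(src, "rb") as fin, bz2.open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)

# Example (hypothetical paths):
# gzip_to_bzip2("events.json.gz", "events.json.bz2")
```

After converting, point spark.read.json at the .bz2 file (or a directory of them) and the scan should fan out across multiple partitions instead of a single task.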