This was a big help! For the benefit of my fellow travelers running Spark on
EMR:

I made a JSON file with the following:

[ { "Classification": "yarn-site", "Properties": {
"yarn.nodemanager.pmem-check-enabled": "false",
"yarn.nodemanager.vmem-check-enabled": "false" } } ]

and then I created my cluster like so:

aws emr create-cluster --configurations \
    file:///Users/jsnavely/project/frick/spark_config/nomem.json \
    ...
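
If you're creating the cluster from Python rather than the CLI, the same
settings can go into boto3's run_job_flow call. This is only a rough sketch;
the cluster name, release label, instance types, and roles below are
placeholders, not what I actually used:

    import boto3

    emr = boto3.client("emr")

    # Same yarn-site overrides as in nomem.json above.
    yarn_overrides = [
        {
            "Classification": "yarn-site",
            "Properties": {
                "yarn.nodemanager.pmem-check-enabled": "false",
                "yarn.nodemanager.vmem-check-enabled": "false",
            },
        }
    ]

    emr.run_job_flow(
        Name="spark-wide-df",                   # placeholder cluster name
        ReleaseLabel="emr-4.4.0",               # whatever release you're on
        Applications=[{"Name": "Spark"}],
        Configurations=yarn_overrides,
        Instances={
            "MasterInstanceType": "m3.xlarge",  # placeholder instance types/count
            "SlaveInstanceType": "m3.xlarge",
            "InstanceCount": 3,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        JobFlowRole="EMR_EC2_DefaultRole",      # default EMR roles, adjust as needed
        ServiceRole="EMR_DefaultRole",
    )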

The other thing I noticed was that one of the dataframes I was joining
against was actually coming from a gzip'd JSON file. gzip files are NOT
splittable, so the read wasn't properly parallelized, which meant the join
was causing a lot of memory pressure. I recompressed it with bzip2, and my
job has been running with no errors.

You can see the splittability difference directly by checking partition
counts; a quick sketch is below.
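
This uses the Spark 1.x SQLContext API, and the S3 paths are made up:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="splittability-check")
    sqlContext = SQLContext(sc)

    # gzip is not splittable: a single .gz file comes in as one partition,
    # so one task ends up holding that entire side of the join.
    df_gz = sqlContext.read.json("s3://my-bucket/lookup.json.gz")
    print(df_gz.rdd.getNumPartitions())   # 1 for a single gzip file

    # bzip2 is splittable, so a large file fans out across many partitions.
    df_bz = sqlContext.read.json("s3://my-bucket/lookup.json.bz2")
    print(df_bz.rdd.getNumPartitions())   # grows with file size

    # If you're stuck with gzip input, repartitioning after the read at
    # least spreads the rows out before the join.
    df_gz = df_gz.repartition(200)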

Thanks again!


