Hello,

I am tinkering with Spark 1.6. I have a dataset of about 1.5 billion rows, to 
which I apply several window functions such as lag, first, etc. The job is 
quite expensive; I am running a small cluster whose executors have 70GB of 
RAM each.

With the new memory management system, the job fails around the middle with a 
heap-memory-limit-exceeded error. I also tried tinkering with several of the 
new memory settings, with no success. 70GB * 4 nodes is a lot of resources 
for this kind of job.
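The knobs I experimented with were along these lines (the fraction values 
below are just the 1.6 defaults, for illustration):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.memory.fraction", "0.75")        // heap share for execution + storage (1.6 default)
  .set("spark.memory.storageFraction", "0.5")  // portion of the above reserved for storage (1.6 default)
  .set("spark.executor.memory", "70g")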

Legacy-mode memory management runs this job successfully with default memory 
settings.
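For completeness, by legacy mode I mean:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.memory.useLegacyMode", "true")  // fall back to the pre-1.6 memory manager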

How could I analyze this problem further, to provide better diagnostics for 
anyone willing to help?
The whole job uses the DataFrame API, with nothing unusual (no UDFs or custom 
operations).

Saif
