Nong Li created SPARK-12546:
-------------------------------

             Summary: Writing to partitioned parquet table can fail with OOM
                 Key: SPARK-12546
                 URL: https://issues.apache.org/jira/browse/SPARK-12546
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.6.0
            Reporter: Nong Li


It is possible for jobs to fail with OOM when writing to a partitioned Parquet 
table. While this was probably always possible, it is more likely in 1.6 due to 
the memory manager changes. The unified memory manager lets Spark use more of 
the process memory (in particular, for execution), which gets us into this 
state more often. The issue can surface with libraries that consume a lot of 
memory themselves, such as Parquet. Prior to 1.6, these libraries would more 
likely use memory that Spark was not using (i.e. from the storage pool); in 
1.6, that storage memory can now be used for execution.

There are a couple of configs that can help with this issue.
  - parquet.memory.pool.ratio: a Parquet config that controls how much of the 
heap the Parquet writers should use. It defaults to 0.95; consider a much 
lower value (e.g. 0.1).
  - spark.memory.fraction: a Spark config that controls how much of the memory 
should be allocated to Spark. Consider setting this to 0.6.

These settings should cause jobs to spill more often but require less memory; 
more aggressive tuning will control this trade-off.
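
For reference, a minimal sketch of how these two settings might be applied 
when writing a partitioned Parquet table (Spark 1.6 API; the app name, paths, 
and partition column below are made up for illustration):

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Lower spark.memory.fraction so execution/storage take a smaller share of the
// heap, leaving headroom for non-Spark consumers such as the Parquet writers.
val conf = new SparkConf()
  .setAppName("partitioned-parquet-write")   // hypothetical app name
  .set("spark.memory.fraction", "0.6")

val sc = new SparkContext(conf)

// parquet.memory.pool.ratio is a Parquet (Hadoop) setting, so it goes into the
// Hadoop configuration rather than the SparkConf.
sc.hadoopConfiguration.set("parquet.memory.pool.ratio", "0.1")

val sqlContext = new SQLContext(sc)
val df = sqlContext.read.parquet("/tmp/input")  // hypothetical input path

df.write
  .partitionBy("date")                          // hypothetical partition column
  .parquet("/tmp/output_partitioned")           // hypothetical output path
{code}

The same can likely be done at submit time, e.g. --conf 
spark.memory.fraction=0.6 --conf spark.hadoop.parquet.memory.pool.ratio=0.1, 
since spark.hadoop.* properties are copied into the Hadoop configuration.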


