[ https://issues.apache.org/jira/browse/SPARK-28843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-28843.
----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 25545
[https://github.com/apache/spark/pull/25545]

> Set OMP_NUM_THREADS to executor cores to reduce Python memory consumption
> --------------------------------------------------------------------------
>
>                 Key: SPARK-28843
>                 URL: https://issues.apache.org/jira/browse/SPARK-28843
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.3.3, 2.4.3, 3.0.0
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>            Priority: Major
>              Labels: release-notes
>             Fix For: 3.0.0
>
>
> While testing hardware with more cores, we found that the amount of memory
> required by PySpark applications increased, and we tracked the problem to
> importing numpy. The numpy issue is
> [https://github.com/numpy/numpy/issues/10455]
> NumPy uses OpenMP, which starts a thread pool sized to the number of cores
> on the machine (and does not respect cgroups). When we set this value lower,
> we see a significant reduction in memory consumption.
> This parallelism setting should be set to the number of cores allocated to
> the executor, not the number of cores available on the machine.
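
For reference, below is a minimal sketch of the workaround that this change
automates. It is an illustration under stated assumptions, not the code from
pull request 25545: the executor core count of 4 and the app name are
hypothetical, and spark.executorEnv.* is the standard Spark mechanism for
exporting an environment variable to executor processes.

    import os

    # Assumption: 4 cores allocated per executor. OMP_NUM_THREADS must be
    # set before numpy is first imported, because OpenMP sizes its thread
    # pool once, at initialization, using the machine's full core count
    # (cgroup limits are ignored).
    os.environ["OMP_NUM_THREADS"] = "4"

    import numpy as np  # driver-side numpy now starts at most 4 OpenMP threads

    from pyspark.sql import SparkSession

    # Export the same cap to executor processes so that Python workers
    # importing numpy get a thread pool matching their allocated cores.
    spark = (
        SparkSession.builder
        .appName("omp-num-threads-sketch")  # hypothetical app name
        .config("spark.executor.cores", "4")
        .config("spark.executorEnv.OMP_NUM_THREADS", "4")
        .getOrCreate()
    )

With the fix in 3.0.0, Spark sets OMP_NUM_THREADS from the executor's core
allocation when it is not already set, so the manual executorEnv line above
should no longer be needed on that version.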