[ https://issues.apache.org/jira/browse/SPARK-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Khaitman updated SPARK-5782:
---------------------------------
    Description: 
I'm including the Shuffle component on this, as a brief scan through the code 
(which I'm not 100% familiar with just yet) shows that a large amount of 
memory handling happens there.

It appears that any type of join between two RDDs spawns twice as many 
pyspark.daemon workers as the default 1 task -> 1 core configuration in our 
environment. This can become problematic when you build up a tree of RDD 
joins, since the pyspark.daemon workers do not go away until the top-level 
join completes (or so it seems)... This can lead to memory exhaustion by a 
single framework, even though it is set to a 512MB Python worker memory limit 
and a few gigs of executor memory.
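
For reference, a rough sketch of the kind of job that shows the behaviour 
described above (the data, app name and memory settings here are placeholders 
for our actual workload, not anything taken from the Spark code):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("join-daemon-repro")               # hypothetical app name
        .set("spark.executor.memory", "4g")            # "a few gigs" of executor memory
        .set("spark.python.worker.memory", "512m"))    # the 512MB Python worker limit
sc = SparkContext(conf=conf)

# Stand-in data; our real RDDs come from elsewhere and are much larger.
a = sc.parallelize([(i, "a%d" % i) for i in range(1000000)])
b = sc.parallelize([(i, "b%d" % i) for i in range(1000000)])
c = sc.parallelize([(i, "c%d" % i) for i in range(1000000)])

# A small "tree" of joins. Each join appears to fork extra pyspark.daemon
# workers, and they seem to stick around until the top-level join completes.
tree = a.join(b).join(c)
tree.count()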

A related issue is that the individual Python workers are not supposed to go 
much beyond 512MB in the first place; once they hit that limit, they are 
supposed to spill to disk.
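
My understanding of the intended behaviour (paraphrased here as a standalone 
sketch, not the actual shuffle.py code) is roughly the following: check used 
memory every so often while merging, and spill the in-memory data to disk as 
soon as the limit is crossed:

import resource

MEMORY_LIMIT_MB = 512      # corresponds to the 512MB worker limit
CHECK_EVERY = 10000        # hypothetical check interval, in records

def used_memory_mb():
    # ru_maxrss is reported in kilobytes on Linux
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss >> 10

def merge(records, spill_to_disk):
    data = {}
    for i, (key, value) in enumerate(records):
        data.setdefault(key, []).append(value)
        if i % CHECK_EVERY == 0 and used_memory_mb() > MEMORY_LIMIT_MB:
            spill_to_disk(data)    # write partial results out to disk...
            data.clear()           # ...and release the in-memory map
    return data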

Some of our Python workers are somehow reaching 2GB each. Multiplied by the 
number of cores per executor (and, in some cases, by the number of joins 
running), that is enough for the Out-of-Memory killer to step up to its 
unfortunate job! :(

I originally thought _next_limit in shuffle.py had an issue, but I had misread 
it; the logic there looks good :) My current suspicion is that the 512MB limit 
is simply not being checked somewhere.
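
(For anyone else following along, a quick worked example of what _next_limit 
returns; the numbers below are made up:)

# _next_limit returns max(self.memory_limit, get_used_memory() * 1.05), so if
# memory is not released after a spill, the threshold just ratchets upward:
memory_limit = 512                 # MB, the configured limit
used_after_spill = 800             # MB still resident after spilling (hypothetical)
next_limit = max(memory_limit, used_after_spill * 1.05)
print(next_limit)                  # 840.0 -> spill again only once usage passes this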

I've only just started looking into the code and would definitely love to 
contribute to Spark, though I figured this might be resolved more quickly by 
someone who already owns the code!

  was:
I'm including the Shuffle component on this, as a brief scan through the code 
(which I'm not 100% familiar with just yet) shows that a large amount of 
memory handling happens there.

It appears that any type of join between two RDDs spawns twice as many 
pyspark.daemon workers as the default 1 task -> 1 core configuration in our 
environment. This can become problematic when you build up a tree of RDD 
joins, since the pyspark.daemon workers do not go away until the top-level 
join completes (or so it seems)... This can lead to memory exhaustion by a 
single framework, even though it is set to a 512MB Python worker memory limit 
and a few gigs of executor memory.

A related issue is that the individual Python workers are not supposed to go 
much beyond 512MB in the first place; once they hit that limit, they are 
supposed to spill to disk.

I came across this bit of code in shuffle.py which *may* have something to do 
with some of our Python workers somehow reaching 2GB each. Multiplied by the 
number of cores per executor (and, in some cases, by the number of joins 
running), that is enough for the Out-of-Memory killer to step up to its 
unfortunate job! :(

def _next_limit(self):
    """
    Return the next memory limit. If the memory is not released
    after spilling, it will dump the data only when the used memory
    starts to increase.
    """
    return max(self.memory_limit, get_used_memory() * 1.05)


I've only just started looking into the code and would definitely love to 
contribute to Spark, though I figured this might be resolved more quickly by 
someone who already owns the code!


> Python Worker / Pyspark Daemon Memory Issue
> -------------------------------------------
>
>                 Key: SPARK-5782
>                 URL: https://issues.apache.org/jira/browse/SPARK-5782
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Shuffle
>    Affects Versions: 1.3.0, 1.2.1, 1.2.2
>         Environment: CentOS 7, Spark Standalone
>            Reporter: Mark Khaitman
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
