I noticed that, by default, in CDH 5.1 (Spark 1.0.0), in both standalone and YARN mode, no GC options are set when an executor is launched. The only options passed in standalone mode are "-XX:MaxPermSize=128m -Xms16384M -Xmx16384M" (when I give each executor 16G).
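(Those -Xms/-Xmx values just reflect the configured executor memory. A minimal sketch of how that gets set, assuming the Spark 1.0.x SparkConf API, with a hypothetical app name:)

    import org.apache.spark.SparkConf

    // Executor heap: "16g" here is what shows up as
    // -Xms16384M -Xmx16384M on the executor command line.
    val conf = new SparkConf()
      .setAppName("my-app")                  // hypothetical
      .set("spark.executor.memory", "16g")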
In YARN mode, even fewer JVM options are set: "-server -XX:OnOutOfMemoryError=kill %p -Xms16384m -Xmx16384m".

Monitoring OS and heap usage side by side (using top and jmap), I see that my physical memory usage is anywhere between 2x and 5x the heap usage (the whole heap, not just live objects). So I set this:

SPARK_JAVA_OPTS="-XX:MaxPermSize=128m -XX:NewSize=1024m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70"

I am still monitoring, but I think my app is more stable now in standalone mode, whereas earlier, under YARN, the container would get killed for using too much memory.

How do I get YARN to enforce SPARK_JAVA_OPTS? Setting "spark.executor.extraJavaOptions" doesn't seem to work.

On Thu, Sep 11, 2014 at 1:50 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
> Which version of Spark are you running?
>
> If you are running the latest one, could you try running not a window but
> a simple event count on every 2-second batch, and see if you still run out
> of memory?
>
> TD
>
>
> On Thu, Sep 11, 2014 at 10:34 AM, Aniket Bhatnagar
> <aniket.bhatna...@gmail.com> wrote:
>>
>> I did change it to 1 GB. It still ran out of memory, just a little later.
>>
>> The streaming job isn't handling a lot of data: in any 2 seconds, it
>> doesn't get more than 50 records, and each record is no more than 500
>> bytes.
>>
>> On Sep 11, 2014 10:54 PM, "Bharat Venkat" <bvenkat.sp...@gmail.com> wrote:
>>>
>>> You could set "spark.executor.memory" to something bigger than the
>>> default (512mb).
>>>
>>>
>>> On Thu, Sep 11, 2014 at 8:31 AM, Aniket Bhatnagar
>>> <aniket.bhatna...@gmail.com> wrote:
>>>>
>>>> I am running a simple Spark Streaming program that pulls in data from
>>>> Kinesis at a batch interval of 10 seconds, windows it for 10 seconds,
>>>> maps the data, and persists it to a store.
>>>>
>>>> The program is running in local mode right now and runs out of memory
>>>> after a while. I have yet to look at heap dumps, but I suspect Spark
>>>> isn't releasing memory after processing completes. I have even tried
>>>> changing the storage level to disk only.
>>>>
>>>> Help!
>>>>
>>>> Thanks,
>>>> Aniket
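For reference, a minimal sketch of wiring those GC flags into the executors via "spark.executor.extraJavaOptions" (assuming the Spark 1.0.x SparkConf API; the heap size has to stay in spark.executor.memory, since heap-size flags are not accepted in extraJavaOptions):

    import org.apache.spark.SparkConf

    // GC flags for every executor JVM. The heap size itself is set
    // separately via spark.executor.memory, not in extraJavaOptions.
    val conf = new SparkConf()
      .set("spark.executor.memory", "16g")
      .set("spark.executor.extraJavaOptions",
        "-XX:MaxPermSize=128m -XX:NewSize=1024m -XX:+UseConcMarkSweepGC " +
        "-XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70")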
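As for the streaming job described above, a rough sketch of the shape of that program (the socket source, the map step, and the sink are stand-ins for the real Kinesis receiver and store, which aren't shown in the thread):

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // 10-second batches, a 10-second window, a map, then persistence.
    // local[2] gives the receiver its own thread in local mode.
    val conf = new SparkConf().setMaster("local[2]").setAppName("kinesis-window")
    val ssc = new StreamingContext(conf, Seconds(10))

    val records = ssc.socketTextStream("localhost", 9999) // stand-in for Kinesis
    records
      .window(Seconds(10))
      .map(_.length)                        // stand-in for the real map step
      .persist(StorageLevel.DISK_ONLY)      // the "disk only" level tried above
      .foreachRDD(rdd => println("batch size: " + rdd.count())) // stand-in store

    ssc.start()
    ssc.awaitTermination()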