[jira] [Commented] (SPARK-17380) Spark streaming with a multi shard Kinesis freezes after several days (memory/resource leak?)
[ https://issues.apache.org/jira/browse/SPARK-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824564#comment-15824564 ]

Xeto commented on SPARK-17380:
------------------------------

Hi. We switched to StorageLevel.MEMORY_AND_DISK_SER for consumption from Kinesis as suggested above, and also upgraded to EMR 5.2.0 (Spark 2.0.2). The job now looks stable even on a multi-shard Kinesis stream (it survived high load without executors being killed). Thanks!

> Spark streaming with a multi shard Kinesis freezes after several days (memory/resource leak?)
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-17380
>                 URL: https://issues.apache.org/jira/browse/SPARK-17380
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.0.0
>            Reporter: Xeto
>         Attachments: exec_Leak_Hunter.zip, memory-after-freeze.png, memory.png
>
> Running Spark Streaming 2.0.0 on AWS EMR 5.0.0, consuming from Kinesis (125 shards).
> Used memory keeps growing all the time according to Ganglia.
> The application works properly for about 3.5 days, until all free memory has been used.
> Then micro-batches start queuing up but none is served. Spark freezes.
> You can see in Ganglia that some memory is being freed, but it doesn't help the job recover.
> Is it a memory/resource leak?
> The job uses backpressure and Kryo.
> The code has a mapToPair(), groupByKey(), flatMap(), persist(StorageLevel.MEMORY_AND_DISK_SER_2()) and repartition(19), then stores to S3 using foreachRDD().
> Cluster size: 20 machines
> Spark configuration:
> spark.executor.extraJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:PermSize=256M -XX:MaxPermSize=256M -XX:OnOutOfMemoryError='kill -9 %p'
> spark.driver.extraJavaOptions -Dspark.driver.log.level=INFO -XX:+UseConcMarkSweepGC -XX:PermSize=256M -XX:MaxPermSize=256M -XX:OnOutOfMemoryError='kill -9 %p'
> spark.master yarn-cluster
> spark.executor.instances 19
> spark.executor.cores 7
> spark.executor.memory 7500M
> spark.driver.memory 7500M
> spark.default.parallelism 133
> spark.yarn.executor.memoryOverhead 2950
> spark.yarn.driver.memoryOverhead 2950
> spark.eventLog.enabled false
> spark.eventLog.dir hdfs:///spark-logs/
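For reference, a minimal Java sketch of the change described in this comment, assuming the Spark 2.0 spark-streaming-kinesis-asl API: the receiver is created with a serialized, non-replicated storage level. The app/stream names, endpoint, region, intervals, and the extractKey helper are placeholder assumptions, and the pipeline merely mirrors the shape given in the issue description above; it is not the reporter's actual code.

import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;
import org.apache.spark.SparkConf;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kinesis.KinesisUtils;
import scala.Tuple2;

public class KinesisPoc {

  // Hypothetical key extractor; the report does not say how records are keyed.
  private static String extractKey(byte[] record) {
    return String.valueOf(record.length);
  }

  public static void main(String[] args) throws InterruptedException {
    SparkConf conf = new SparkConf().setAppName("kinesis-poc");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(10000));

    // The fix: store received blocks serialized and non-replicated. The
    // replicated levels (*_2) exercise the BlockManager replication path
    // implicated in the Netty buffer leak reported below.
    JavaReceiverInputDStream<byte[]> stream = KinesisUtils.createStream(
        jssc,
        "kinesis-poc",                              // KCL app name (DynamoDB checkpoint table)
        "my-stream",                                // placeholder stream name
        "https://kinesis.us-east-1.amazonaws.com",  // placeholder endpoint
        "us-east-1",                                // placeholder region
        InitialPositionInStream.LATEST,
        new Duration(10000),                        // checkpoint interval
        StorageLevel.MEMORY_AND_DISK_SER());

    // Pipeline shape from the issue description:
    // mapToPair -> groupByKey -> flatMap -> persist -> repartition -> foreachRDD.
    JavaPairDStream<String, byte[]> keyed =
        stream.mapToPair((byte[] r) -> new Tuple2<String, byte[]>(extractKey(r), r));
    JavaDStream<byte[]> flattened =
        keyed.groupByKey().flatMap(kv -> kv._2().iterator());
    flattened.persist(StorageLevel.MEMORY_AND_DISK_SER_2());
    flattened.repartition(19).foreachRDD(rdd -> {
      // write to S3 here, e.g. rdd.saveAsObjectFile("s3://bucket/prefix/...")
    });

    jssc.start();
    jssc.awaitTermination();
  }
}

Note that the MEMORY_AND_DISK_SER_2 persist inside the job is left as reported; only the receiver-side storage level changes here.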
[jira] [Commented] (SPARK-17380) Spark streaming with a multi shard Kinesis freezes after several days (memory/resource leak?)
[ https://issues.apache.org/jira/browse/SPARK-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15680276#comment-15680276 ]

Udit Mehrotra commented on SPARK-17380:
---------------------------------------

The above leak was seen with Spark 2.0 running on EMR. I noticed that the code path causing the leak is the block replication code, so I switched the received Kinesis blocks from StorageLevel.MEMORY_AND_DISK_2 to StorageLevel.MEMORY_AND_DISK. After switching I no longer observe the memory leak in the logs, but the application still freezes after 3-3.5 days: Spark Streaming stops processing records, and the input queue of records received from Kinesis keeps growing until the executor runs out of memory.
[jira] [Commented] (SPARK-17380) Spark streaming with a multi shard Kinesis freezes after several days (memory/resource leak?)
[ https://issues.apache.org/jira/browse/SPARK-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15680270#comment-15680270 ]

Udit Mehrotra commented on SPARK-17380:
---------------------------------------

We came across this memory leak in the executor logs by using the JVM option '-Dio.netty.leakDetectionLevel=advanced'. It looks like good evidence of a memory leak, and it reports the location where the leaked buffer was created:

16/11/09 06:03:28 ERROR ResourceLeakDetector: LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records: 0
Created at:
    io.netty.buffer.CompositeByteBuf.<init>(CompositeByteBuf.java:103)
    io.netty.buffer.Unpooled.wrappedBuffer(Unpooled.java:335)
    io.netty.buffer.Unpooled.wrappedBuffer(Unpooled.java:247)
    org.apache.spark.util.io.ChunkedByteBuffer.toNetty(ChunkedByteBuffer.scala:69)
    org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$replicate(BlockManager.scala:1161)
    org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:976)
    org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:910)
    org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
    org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:910)
    org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:700)
    org.apache.spark.streaming.receiver.BlockManagerBasedBlockHandler.storeBlock(ReceivedBlockHandler.scala:80)
    org.apache.spark.streaming.receiver.ReceiverSupervisorImpl.pushAndReportBlock(ReceiverSupervisorImpl.scala:158)
    org.apache.spark.streaming.receiver.ReceiverSupervisorImpl.pushArrayBuffer(ReceiverSupervisorImpl.scala:129)
    org.apache.spark.streaming.receiver.Receiver.store(Receiver.scala:133)
    org.apache.spark.streaming.kinesis.KinesisReceiver.org$apache$spark$streaming$kinesis$KinesisReceiver$$storeBlockWithRanges(KinesisReceiver.scala:282)
    org.apache.spark.streaming.kinesis.KinesisReceiver$GeneratedBlockHandler.onPushBlock(KinesisReceiver.scala:352)
    org.apache.spark.streaming.receiver.BlockGenerator.pushBlock(BlockGenerator.scala:297)
    org.apache.spark.streaming.receiver.BlockGenerator.org$apache$spark$streaming$receiver$BlockGenerator$$keepPushingBlocks(BlockGenerator.scala:269)
    org.apache.spark.streaming.receiver.BlockGenerator$$anon$1.run(BlockGenerator.scala:110)

Can we please have some action on this JIRA?
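For anyone trying to reproduce the trace above: the leak detector is stock Netty functionality and can be enabled through the executor JVM options already used in this job's configuration, e.g. in spark-defaults.conf (illustrative line; append the flag to whatever options are already set, and note that 'paranoid' is the more exhaustive level if 'advanced' samples too few allocations):

spark.executor.extraJavaOptions  -Dio.netty.leakDetectionLevel=advanced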
[jira] [Commented] (SPARK-17380) Spark streaming with a multi shard Kinesis freezes after several days (memory/resource leak?)
[ https://issues.apache.org/jira/browse/SPARK-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15470399#comment-15470399 ]

Sean Owen commented on SPARK-17380:
-----------------------------------

Weirdly, this might be related to SPARK-17379, where we're upgrading Netty and finding some problems with its memory pool. It's similar to what you're showing here, with a lot of memory held by Netty pooled byte buffers. CC [~zsxwing] FYI
[jira] [Commented] (SPARK-17380) Spark streaming with a multi shard Kinesis freezes after several days (memory/resource leak?)
[ https://issues.apache.org/jira/browse/SPARK-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15470322#comment-15470322 ]

Xeto commented on SPARK-17380:
------------------------------

We decided to try more executors with less memory each, to reduce full GC time:

spark.executor.memory 2612M
spark.driver.memory 2612M
spark.yarn.executor.memoryOverhead 2612
spark.yarn.driver.memoryOverhead 2612

We also increased the network timeout:

spark.network.timeout 1800s

The cluster has been running without freezing for 1 day and 16 hours. However, used memory kept growing until almost all available memory was filled. Then 3 executors were killed and 3 new ones started. Judging from the container logs of the removed executors, the full GC failed and an OutOfMemoryError occurred. The executor stdout log follows; the same message can be seen on all 3 executors. Since I'm not storing any data of my own in long-term memory, it seems that Spark itself (or the Kinesis connector in spark-streaming-kinesis-asl) is leaking. We need Spark Streaming to run without freezing or killing executors for at least a week. Any input is appreciated. Thanks in advance.

2016-09-07T09:42:35.110+: [CMS-concurrent-mark-start]
2016-09-07T09:42:35.116+: [Full GC (Allocation Failure) 2016-09-07T09:42:35.116+: [CMS2016-09-07T09:42:36.090+: [CMS-concurrent-mark: 0.978/0.979 secs] [Times: user=1.98 sys=0.00, real=0.98 secs] (concurrent mode failure): 1993151K->1993151K(1993152K), 4.6558419 secs] 2606437K->2606098K(2606592K), [Metaspace: 49338K->49338K(1093632K)], 4.6559435 secs] [Times: user=5.63 sys=0.00, real=4.66 secs]
2016-09-07T09:42:39.772+: [Full GC (Allocation Failure) 2016-09-07T09:42:39.772+: [CMS: 1993151K->1993151K(1993152K), 2.9516622 secs] 2606098K->2606090K(2606592K), [Metaspace: 49338K->49338K(1093632K)], 2.9517595 secs] [Times: user=2.95 sys=0.00, real=2.95 secs]
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 18903"...
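One way to distinguish a leak from an undersized heap in this situation (a suggestion, not something from the thread): sample the executor JVM with the standard JDK tools and check whether full GCs ever reclaim old-generation space. Occupancy that climbs monotonically across full GCs, as in the log above, points to live references being retained. Here <executor-pid> is a placeholder for the executor process id.

# sample heap/GC utilization every 5 seconds; OU is old-gen utilization (%)
jstat -gcutil <executor-pid> 5000
# histogram of live objects (forces a full GC first), to see what accumulates
jmap -histo:live <executor-pid> | head -n 30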
[jira] [Commented] (SPARK-17380) Spark streaming with a multi shard Kinesis freezes after several days (memory/resource leak?)
[ https://issues.apache.org/jira/browse/SPARK-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15458862#comment-15458862 ]

Sean Owen commented on SPARK-17380:
-----------------------------------

This doesn't show evidence of a memory leak. You may simply be low on memory and experiencing full GCs, which would be consistent with these observations. That just means you need more memory, or to tune your GC better. Huge pauses can make any Java app appear to hang for a while.
[jira] [Commented] (SPARK-17380) Spark streaming with a multi shard Kinesis freezes after several days (memory/resource leak?)
[ https://issues.apache.org/jira/browse/SPARK-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15458669#comment-15458669 ]

Xeto commented on SPARK-17380:
------------------------------

Could you advise how to obtain such evidence? We're not storing anything in memory besides the persist with replication. It's very straightforward POC code, with no third-party cache or anything like that. The events processed are objects with nested HashMap/LinkedList structures. I noticed that after the freeze, used memory starts going down (eventual GC?), but that doesn't help the application recover.
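One standard way to gather such evidence, offered here as a suggestion rather than something from the thread: have the executor JVMs write a heap dump on OutOfMemoryError, or take one on demand, then inspect the dominator tree in a heap analyzer such as Eclipse MAT. The dump paths below are placeholders.

# in spark-defaults.conf, appended to the existing executor options:
spark.executor.extraJavaOptions -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/mnt/heap-dumps

# or on demand, against a running executor:
jmap -dump:live,format=b,file=/mnt/heap-dumps/executor.hprof <executor-pid>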