[ https://issues.apache.org/jira/browse/SPARK-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506754#comment-14506754 ]
Sean Owen commented on SPARK-7053:
----------------------------------

Why do you think it's a matter of leaked resources? You're building up a window of values, for example, and adding to an in-memory queue. Can you use a heap analyzer to see what's taking up the memory? I understand that you have one test that does what you expect and another that doesn't, but they have different behaviors. More info would help.

> KafkaUtils.createStream leaks resources
> ---------------------------------------
>
>                 Key: SPARK-7053
>                 URL: https://issues.apache.org/jira/browse/SPARK-7053
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.3.1
>         Environment: Spark Streaming 1.3.1, standalone mode running on just 1
> box: Ubuntu 14.04.2 LTS, 4 cores, 8GB RAM, java version "1.8.0_40"
>            Reporter: Platon Potapov
>         Attachments: from.kafka.direct.stream.gif, from.kafka.receiver.gif,
> from.queue.gif
>
>
> I have a simple Spark Streaming application that reads messages from a Kafka
> topic, applies trivial transforms (window, map, mapValues, foreachRDD), and
> throws away the results. A test application, really. The streaming batch
> duration is 2 seconds.
> The Kafka topic is steadily populated with 5000 messages per second by a
> data generator application. Each message is small (a few double values).
> The problem:
> * when KafkaUtils.createStream is used to create a receiver fetching
> messages from Kafka, "Processing Time" (as seen at
> http://localhost:4040/streaming) starts at hundreds of milliseconds but
> steadily increases over time, eventually resulting in scheduling delays and
> out-of-memory errors. Note that memory consumption (by the JVM) and garbage
> collection pauses also grow over time.
> * when KafkaUtils.createDirectStream is used (the rest of the setup being
> exactly the same), processing time starts at about 5 seconds (compared to
> hundreds of milliseconds with the Kafka receiver) but remains at that level.
> * when the input is replaced with an in-memory queue populated with the
> same messages at the same rate (streamingContext.queueStream(queue)),
> processing time is again hundreds of milliseconds and remains stable.
> This leads me to conclude that the setup with the KafkaUtils.createStream
> receiver leaks resources.
> Attached are graphs depicting each of the three scenarios.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
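For reference, the receiver-based setup described in the issue might look roughly like the sketch below. This is an assumption, not the reporter's actual code: the ZooKeeper address, consumer group, topic name, and window length are all placeholders, and the transforms are a plausible reading of "window, map, foreachRDD, throw away the results" against the Spark 1.3 API.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaWindowTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaWindowTest")
    // 2-second batch duration, as stated in the report
    val ssc = new StreamingContext(conf, Seconds(2))

    // Receiver-based input: the variant whose processing time reportedly
    // grows over time. All connection parameters below are assumptions.
    val stream = KafkaUtils.createStream(
      ssc,
      "localhost:2181",          // ZooKeeper quorum (assumed)
      "test-group",              // consumer group id (assumed)
      Map("test-topic" -> 1))    // topic -> receiver thread count (assumed)

    // Trivial transforms; the results are computed and discarded.
    stream
      .window(Seconds(10))       // window length is an assumption
      .map { case (key, value) => (key, value.length) }
      .foreachRDD(rdd => rdd.count())

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Swapping the `KafkaUtils.createStream` call for `KafkaUtils.createDirectStream` or for `ssc.queueStream(queue)` while keeping the transforms identical would reproduce the other two scenarios the reporter compares.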