Re: Long Running Spark Streaming getting slower
Right, without knowing exactly what the code does, it is difficult to say.

Have you analyzed the job in the Spark GUI? For example, by looking at the amount of spill and the spill size, as the DAG diagram shows below?

Three days is a short period of time, so it is concerning!

P.S. What is the nature of this Spark Streaming job, if you can divulge it?

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 10 June 2016 at 18:48, John Simon <john.si...@tapjoy.com> wrote:
> batch interval is 10 seconds, and I don't use sliding window.
> Typical message count per batch is ~100k.
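For anyone who wants to check spill without clicking through the UI, here is a minimal sketch that summarizes spill per stage from Spark's monitoring REST API (`/api/v1/applications/<app-id>/stages`). The base URL, the application id, and the sample records below are hypothetical; the sketch assumes the stage records carry `memoryBytesSpilled` and `diskBytesSpilled` fields, which is worth verifying against your Spark version.

```python
import json
from urllib.request import urlopen


def fetch_stages(base_url, app_id):
    """Fetch stage records from the Spark monitoring REST API.

    base_url and app_id are placeholders you must supply, for example the
    driver UI address and the YARN application id. Not called in this sketch.
    """
    url = "{}/api/v1/applications/{}/stages".format(base_url, app_id)
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def spill_report(stages):
    """Return (stageId, name, memory spill, disk spill) for every stage
    that spilled, sorted with the largest disk spill first."""
    rows = []
    for stage in stages:
        mem = stage.get("memoryBytesSpilled", 0)
        disk = stage.get("diskBytesSpilled", 0)
        if mem or disk:
            rows.append((stage["stageId"], stage.get("name", ""), mem, disk))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Hypothetical records in the shape of the REST response, for illustration only.
sample = [
    {"stageId": 1, "name": "map at App.scala:10",
     "memoryBytesSpilled": 0, "diskBytesSpilled": 0},
    {"stageId": 2, "name": "reduceByKey at App.scala:20",
     "memoryBytesSpilled": 4096, "diskBytesSpilled": 1024},
    {"stageId": 3, "name": "groupByKey at App.scala:30",
     "memoryBytesSpilled": 8192, "diskBytesSpilled": 2048},
]

for row in spill_report(sample):
    print(row)
```

Running the report periodically and comparing the totals between day one and day three would show whether spill is growing along with the batch processing time.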
Re: Long Running Spark Streaming getting slower
Hi Mich,

The batch interval is 10 seconds, and I don't use a sliding window.
The typical message count per batch is ~100k.

--
John Simon

On Fri, Jun 10, 2016 at 10:31 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> However, what are the batch interval, the window length and the sliding
> window interval?
>
> Also, how many messages are sent by Kafka in a typical batch interval?
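To put these numbers in context, here is a small worked calculation. It assumes batches are processed one at a time, so any processing time beyond the batch interval accumulates as scheduling delay. The 6 s and 12 s processing times are hypothetical, since the thread does not state the actual values.

```python
def ingest_rate(messages_per_batch, batch_interval_s):
    """Average messages per second implied by the batch size and interval."""
    return messages_per_batch / batch_interval_s


def scheduling_delay(processing_time_s, batch_interval_s, num_batches):
    """Delay accumulated after num_batches when batches run sequentially.

    If a batch finishes within the interval, no delay builds up; otherwise
    each batch adds (processing time - interval) seconds of backlog.
    """
    overrun = max(0.0, processing_time_s - batch_interval_s)
    return overrun * num_batches


# Figures from the thread: ~100k messages per 10-second batch.
print(ingest_rate(100000, 10))

# Hypothetical processing times over one hour (360 batches of 10 s each).
print(scheduling_delay(6.0, 10, 360))   # processing keeps up with the interval
print(scheduling_delay(12.0, 10, 360))  # processing overruns by 2 s per batch
```

The point is that a doubling of processing time is harmless only while it stays under the 10-second interval; once it crosses that line, the backlog grows with every batch.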
Re: Long Running Spark Streaming getting slower
Hi John,

I did not notice anything unusual in your env variables.

However, what are the batch interval, the window length and the sliding window interval?

Also, how many messages are sent by Kafka in a typical batch interval?

HTH

Dr Mich Talebzadeh

On 10 June 2016 at 18:21, john.simon <john.si...@tapjoy.com> wrote:
> I'm running Spark Streaming with Kafka Direct Stream, but after
> running a couple of days, the batch processing time almost doubles.
Long Running Spark Streaming getting slower
Hi all,

I'm running Spark Streaming with Kafka Direct Stream, but after running for a couple of days, the batch processing time almost doubles. I didn't find any slowdown in the JVM GC logs, but I did find that the Spark broadcast variable reading time is increasing. Initially it takes less than 10ms, but after 3 days it takes more than 60ms. It's really puzzling since I don't use broadcast variables at all.

My application needs to run 24/7, so I hope there's something I'm missing to correct this behavior.

FYI, we're running on AWS EMR with Spark version 1.6.1, in YARN client mode. Attached is the Spark application environment settings file.

--
John Simon

environment.txt (7K) <http://apache-spark-user-list.1001560.n3.nabble.com/attachment/27138/0/environment.txt>
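One way to quantify the broadcast read time described above is to extract it from the executor logs. The sketch below assumes the log lines have the form "Reading broadcast variable N took M ms"; the sample lines and their timestamps are hypothetical, and the format should be checked against your own logs. A possible explanation, not confirmed in this thread, is that Spark itself broadcasts the serialized task for each stage, so such lines can appear even when the application defines no broadcast variables.

```python
import re

# Assumed log format; verify against the actual executor logs.
PATTERN = re.compile(r"Reading broadcast variable (\d+) took (\d+) ms")


def broadcast_read_times(lines):
    """Extract (broadcast id, read time in ms) pairs from log lines."""
    pairs = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            pairs.append((int(match.group(1)), int(match.group(2))))
    return pairs


def mean_read_time(lines):
    """Mean read time in ms, or None if no matching lines were found."""
    times = [ms for _, ms in broadcast_read_times(lines)]
    return sum(times) / len(times) if times else None


# Hypothetical log excerpts from early and late in the run.
early = [
    "16/06/07 18:21:03 INFO TorrentBroadcast: Reading broadcast variable 12 took 8 ms",
    "16/06/07 18:21:13 INFO TorrentBroadcast: Reading broadcast variable 13 took 6 ms",
    "16/06/07 18:21:13 INFO Executor: Finished task 0.0 in stage 13.0",
]
late = [
    "16/06/10 18:21:03 INFO TorrentBroadcast: Reading broadcast variable 51840 took 62 ms",
    "16/06/10 18:21:13 INFO TorrentBroadcast: Reading broadcast variable 51841 took 66 ms",
]

print(mean_read_time(early))
print(mean_read_time(late))
```

Comparing the mean per hour or per day across the whole run would show whether the increase is gradual or happens in steps.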