Re: Long Running Spark Streaming getting slower

2016-06-10 Thread Mich Talebzadeh
Right, without knowing exactly what the code does, it is difficult to say.

Do you analyze this in your Spark UI, for example the amount of spill and
the spill size, as the DAG diagram below shows?

[DAG diagram image]

Three days is a short period of time, so this is concerning!


HTH

P.S. What is the nature of this Spark Streaming application, if you can divulge it?


Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com



Re: Long Running Spark Streaming getting slower

2016-06-10 Thread John Simon
Hi Mich,

The batch interval is 10 seconds, and I don't use a sliding window.
A typical batch contains ~100k messages.
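For scale, these figures work out to roughly 10,000 messages per second and about 25,920 batches over the three days mentioned in the original report. A small sketch of that arithmetic (Python is used only for illustration; the constant and function names are hypothetical):

```python
# Figures from this thread: 10 s batches, ~100k messages per batch,
# slowdown noticed after about 3 days.
BATCH_INTERVAL_S = 10
MESSAGES_PER_BATCH = 100_000
RUN_DAYS = 3


def messages_per_second(messages_per_batch, interval_s):
    """Average ingest rate implied by the batch size and batch interval."""
    return messages_per_batch / interval_s


def batches_over(days, interval_s):
    """Number of micro-batches generated over the given number of days."""
    return days * 24 * 60 * 60 // interval_s


if __name__ == "__main__":
    rate = messages_per_second(MESSAGES_PER_BATCH, BATCH_INTERVAL_S)
    batches = batches_over(RUN_DAYS, BATCH_INTERVAL_S)
    print(f"{rate:,.0f} messages/s, {batches:,} batches")
```

Running it prints `10,000 messages/s, 25,920 batches`.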


--
John Simon



Re: Long Running Spark Streaming getting slower

2016-06-10 Thread Mich Talebzadeh
Hi John,

I did not notice anything unusual in your env variables.

However, what are the batch interval, the window length and the sliding
window interval?

Also how many messages are sent by Kafka in a typical batch interval?
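For reference, Spark Streaming requires both the window length and the sliding interval to be multiples of the batch interval. A minimal Python sketch of that check, assuming durations in whole seconds (`check_window` is a hypothetical helper, not a Spark API):

```python
def check_window(batch_s, window_s, slide_s):
    """Return how many batches one window spans, or raise ValueError.

    Encodes the Spark Streaming rule that the window length and the sliding
    interval must both be multiples of the batch interval; non-positive values
    are rejected as well.
    """
    for name, value in (("window length", window_s), ("sliding interval", slide_s)):
        if value <= 0 or value % batch_s != 0:
            raise ValueError(
                f"{name} of {value}s is not a positive multiple of the {batch_s}s batch interval"
            )
    return window_s // batch_s
```

For example, `check_window(10, 60, 20)` returns 6 (a 60-second window over 10-second batches), while `check_window(10, 25, 10)` raises `ValueError`.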

HTH

Dr Mich Talebzadeh





Long Running Spark Streaming getting slower

2016-06-10 Thread john.simon
Hi all,

I'm running Spark Streaming with Kafka Direct Stream, but after
running for a couple of days, the batch processing time almost doubles.
I didn't see any slowdown in the JVM GC logs, but I did find that the Spark
broadcast variable read time keeps increasing.
Initially it takes less than 10 ms, but after 3 days it takes more than
60 ms. It's really puzzling, since I don't use broadcast variables at
all.
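One way to quantify this trend is to pull the read times out of the executor logs and compare early reads with recent ones. A minimal Python sketch, assuming log lines of the form `Reading broadcast variable 12 took 15 ms` (verify the exact wording against your own logs); the function names are hypothetical:

```python
import re
from statistics import median

# Assumed executor log format (verify against your own logs), e.g.
# "INFO TorrentBroadcast: Reading broadcast variable 12 took 15 ms"
BROADCAST_READ = re.compile(r"Reading broadcast variable (\d+) took\s*(\d+) ms")


def broadcast_read_times(lines):
    """Return (broadcast_id, millis) pairs for every matching log line, in order."""
    pairs = []
    for line in lines:
        match = BROADCAST_READ.search(line)
        if match:
            pairs.append((int(match.group(1)), int(match.group(2))))
    return pairs


def slowdown_ratio(times_ms, sample=100):
    """Median of the last `sample` read times divided by the median of the first `sample`.

    A value near 6 would match the growth from about 10 ms to about 60 ms described above.
    """
    if not times_ms:
        raise ValueError("no broadcast read times found")
    baseline = median(times_ms[:sample])
    recent = median(times_ms[-sample:])
    if baseline == 0:
        return float("inf")  # avoid dividing by a zero-millisecond baseline
    return recent / baseline
```

For example, `slowdown_ratio([ms for _, ms in broadcast_read_times(open("stderr"))])`, where `"stderr"` stands for a hypothetical local copy of a long-running executor's YARN container log.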

My application needs to run 24/7, so I hope I'm missing something that
would correct this behavior.
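A 24/7 stream only stays stable while each batch finishes within the batch interval. The sketch below shows why a doubling matters, assuming batches are processed one at a time, in order, and a new batch is generated every interval regardless of whether the previous one has finished; the per-batch processing times are hypothetical inputs and `scheduling_delays` is a hypothetical helper:

```python
def scheduling_delays(processing_s, interval_s):
    """Scheduling delay (seconds) of each batch, assuming batches run one at a time, in order.

    Batch i is generated at i * interval_s but cannot start until batch i-1 finishes,
    so delay[i] = max(0, delay[i-1] + processing[i-1] - interval_s).
    """
    delays = []
    delay = 0.0
    previous = None
    for processing in processing_s:
        if previous is not None:
            delay = max(0.0, delay + previous - interval_s)
        delays.append(delay)
        previous = processing
    return delays
```

With an illustrative 6 s processing time per 10 s batch (not a figure from this thread) the delay stays at zero; if it doubles to 12 s, each batch starts 2 s later than the one before and the backlog never drains.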

FYI, we're running on AWS EMR with Spark version 1.6.1, in YARN client mode.
I've attached the Spark application environment settings file.

--
John Simon


environment.txt (7K) 
<http://apache-spark-user-list.1001560.n3.nabble.com/attachment/27138/0/environment.txt>




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Long-Running-Spark-Streaming-getting-slower-tp27138.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.