Re: Problem in Spark Streaming

2014-06-11 Thread nilmish

I used these JVM flags to print the GC timings: -verbose:gc
-XX:-PrintGCDetails -XX:+PrintGCTimeStamps

Following is the output I got on the standard output :

4.092: [GC 4.092: [ParNew: 274752K->27199K(309056K), 0.0421460 secs]
274752K->27199K(995776K), 0.0422720 secs] [Times: user=0.33 sys=0.11,
real=0.04 secs]

16.630: [GC 16.630: [ParNew: 301951K->17854K(309056K), 0.0686940 secs]
301951K->23624K(995776K), 0.0689110 secs] [Times: user=0.36 sys=0.05,
real=0.07 secs]

32.440: [GC 32.441: [ParNew: 292606K->14985K(309056K), 0.0206040 secs]
298376K->20755K(995776K), 0.0208320 secs] [Times: user=0.20 sys=0.00,
real=0.02 secs]

42.626: [GC 42.626: [ParNew: 289737K->15467K(309056K), 0.0138100 secs]
295507K->21237K(995776K), 0.0139830 secs] [Times: user=0.10 sys=0.00,
real=0.01 secs]

56.633: [GC 56.633: [ParNew: 290219K->17334K(309056K), 0.0170930 secs]
295989K->23105K(995776K), 0.0173130 secs] [Times: user=0.12 sys=0.01,
real=0.02 secs]

Can anyone help me understand these GC messages?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Problem-in-Spark-Streaming-tp7310p7384.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Problem in Spark Streaming

2014-06-11 Thread vinay Bajaj
http://stackoverflow.com/questions/895444/java-garbage-collection-log-messages

http://stackoverflow.com/questions/16794783/how-to-read-a-verbosegc-output

I think this will help in understanding the logs.
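To make those lines concrete: each ParNew record reads as young-generation occupancy before->after(capacity) with the young-collection pause, followed by whole-heap occupancy before->after(capacity) with the total stop-the-world pause. A minimal sketch that pulls those numbers out of one such line with a regex (the field names are my own, not JVM terminology):

```python
import re

# One ParNew line in the shape shown above (arrows between before/after sizes).
line = ("4.092: [GC 4.092: [ParNew: 274752K->27199K(309056K), 0.0421460 secs] "
        "274752K->27199K(995776K), 0.0422720 secs] "
        "[Times: user=0.33 sys=0.11, real=0.04 secs]")

pattern = re.compile(
    r"(?P<ts>[\d.]+): \[GC [\d.]+: "
    r"\[ParNew: (?P<young_before>\d+)K->(?P<young_after>\d+)K"
    r"\((?P<young_cap>\d+)K\), [\d.]+ secs\] "
    r"(?P<heap_before>\d+)K->(?P<heap_after>\d+)K\((?P<heap_cap>\d+)K\), "
    r"(?P<pause>[\d.]+) secs\]"
)

m = pattern.search(line)
fields = {k: float(v) for k, v in m.groupdict().items()}

# Young gen dropped from ~268 MB to ~27 MB; the pause was ~42 ms of real time.
print(fields["young_before"], fields["young_after"], fields["pause"])
```

The "real" time in the trailing [Times: ...] bracket is the wall-clock pause the application actually observed; "user" is CPU time summed over the parallel GC threads, which is why it can exceed "real".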


On Wed, Jun 11, 2014 at 12:53 PM, nilmish nilmish@gmail.com wrote:


 I used these commands to show the GC timings : -verbose:gc
 -XX:-PrintGCDetails -XX:+PrintGCTimeStamps

 Following is the output I got on the standard output :

 4.092: [GC 4.092: [ParNew: 274752K->27199K(309056K), 0.0421460 secs]
 274752K->27199K(995776K), 0.0422720 secs] [Times: user=0.33 sys=0.11,
 real=0.04 secs]

 16.630: [GC 16.630: [ParNew: 301951K->17854K(309056K), 0.0686940 secs]
 301951K->23624K(995776K), 0.0689110 secs] [Times: user=0.36 sys=0.05,
 real=0.07 secs]

 32.440: [GC 32.441: [ParNew: 292606K->14985K(309056K), 0.0206040 secs]
 298376K->20755K(995776K), 0.0208320 secs] [Times: user=0.20 sys=0.00,
 real=0.02 secs]

 42.626: [GC 42.626: [ParNew: 289737K->15467K(309056K), 0.0138100 secs]
 295507K->21237K(995776K), 0.0139830 secs] [Times: user=0.10 sys=0.00,
 real=0.01 secs]

 56.633: [GC 56.633: [ParNew: 290219K->17334K(309056K), 0.0170930 secs]
 295989K->23105K(995776K), 0.0173130 secs] [Times: user=0.12 sys=0.01,
 real=0.02 secs]

 Can anyone help me understand these GC messages?



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Problem-in-Spark-Streaming-tp7310p7384.html



Re: Problem in Spark Streaming

2014-06-10 Thread Yingjun Wu
Hi Nilmish,

I am facing the same problem. I am wondering how you measure the latency?

Regards,
Yingjun



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Problem-in-Spark-Streaming-tp7310p7311.html


Re: Problem in Spark Streaming

2014-06-10 Thread nilmish
You can measure the latency from the logs. Search for the phrase Total delay
in the logs; it denotes the total end-to-end delay for a particular query.
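For example, a hedged sketch of scraping those values out of a driver log; the log lines here are made-up samples in the shape Spark Streaming prints, and the variable names are my own:

```python
import re

# Invented sample lines shaped like Spark Streaming's batch-completion output.
log_lines = [
    "Total delay: 136.983 s for time 1402409331000 ms (execution: 1.711 s)",
    "Total delay: 2.145 s for time 1402409333000 ms (execution: 0.903 s)",
]

delay_re = re.compile(r"Total delay: ([\d.]+) s for time (\d+) ms")

# (batch time in epoch ms, end-to-end delay in seconds) per completed batch.
delays = [(int(m.group(2)), float(m.group(1)))
          for line in log_lines
          if (m := delay_re.search(line))]

for batch_ms, delay_s in delays:
    print(batch_ms, delay_s)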



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Problem-in-Spark-Streaming-tp7310p7312.html


Re: Problem in Spark Streaming

2014-06-10 Thread Boduo Li
Hi Nilmish,

What's the data rate/node when you see the high latency? (It seems the
latency keeps increasing.) Do you still see it if you lower the data rate or
the frequency of the windowed query?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Problem-in-Spark-Streaming-tp7310p7321.html


Re: Problem in Spark Streaming

2014-06-10 Thread nilmish
How can I measure the data rate per node?

I am feeding the data through the Kafka API. I only know the total inflow data
rate, which remains almost constant. How can I figure out how much data is
distributed to each node in my cluster?

Latency does not keep increasing indefinitely. It goes up for some instant
and then drops back down to the normal level. I want to get rid of these
spikes in between.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Problem-in-Spark-Streaming-tp7310p7325.html


Re: Problem in Spark Streaming

2014-06-10 Thread Boduo Li
Oh, I mean the average data rate/node.

But when I want to know the input activity on each node (I use a custom
receiver instead of Kafka), I usually search the logs for records like this to
get a sense of it: BlockManagerInfo: Added input ... on [hostname:port] (size: xxx KB)

I also see some spikes in latency as I posted earlier:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-achieve-reasonable-performance-on-Spark-Streaming-tp7262.html
It's even worse in my case, as the spikes cause the latency to increase without
bound when the data rate is a little high, even though the machines are
underutilized. I can't explain it either. I'm not sure if the cause is the same
as yours.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Problem-in-Spark-Streaming-tp7310p7327.html


Re: Problem in Spark Streaming

2014-06-10 Thread Ashish Rangole
Have you considered the garbage collection impact and whether it coincides with
your latency spikes? You can enable GC logging by changing the Spark
configuration for your job.

Hi, as I search for the keyword Total delay in the console log, the delay
keeps increasing. I am not sure what this total delay means. For example, if I
perform a windowed wordcount with windowSize=1ms and
slidingStep=2000ms, is the delay measured from the 10th second?
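A minimal sketch of how that configuration might look at submit time; the class name, jar, and log path are placeholders, and the flags mirror the usual HotSpot GC-logging set:

```shell
# Hypothetical spark-submit invocation: pass GC-logging flags to executors
# via spark.executor.extraJavaOptions, writing to a file so stdout stays clean.
spark-submit \
  --class com.example.StreamingJob \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails \
-XX:+PrintGCTimeStamps -Xloggc:/tmp/executor-gc.log" \
  streaming-job.jar
```

The same flags can go in spark.driver.extraJavaOptions if the driver's GC is the suspect.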

A sample log is shown as follows:
Total delay: 136.983 s for time 1402409331000 ms (execution: 1.711s) --what
is execution time?
Finished TID 490 in 14 ms on  (progress: 1/6) --what is TID, and what
is the progress?



--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Problem-in-Spark-Streaming-tp7310p7329.html