Hi TD, I am GMT+8 from you. Tomorrow I will get the information you asked me
for.

Thanks

----- Original Message -----
From: "Tathagata Das" <tathagata.das1...@gmail.com>
Sent: 30/04/2014 00:57
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Spark's behavior

Strange! Can you just do lines.print() to print the raw data instead of doing 
the word count? Beyond that, we can do two things.


1. Can you check the Spark stage UI to see whether there are stages running 
during the 30-second period you referred to?
2. If you upgrade to the Spark master branch (or Spark 1.0 RC3, see the 
separate thread by Patrick), it has a streaming UI which shows the number of 
records received, the state of the receiver, etc. That may be more useful in 
debugging what's going on.
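As a minimal sketch of the first suggestion, assuming the same structure as 
JavaNetworkWordCount (the host, port, and batch interval below are placeholders, 
not your actual values):

import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class RawLinesPrint {
  public static void main(String[] args) {
    // Same master URL as your run; 1-second batches (placeholder value).
    JavaStreamingContext jssc = new JavaStreamingContext(
        "spark://192.168.0.13:7077", "RawLinesPrint", new Duration(1000));

    // Same TCP source as JavaNetworkWordCount, but print the raw lines
    // instead of counting words.
    JavaDStream<String> lines = jssc.socketTextStream("sourceHost", 9999);
    lines.print();  // prints the first few records of every batch

    jssc.start();
    jssc.awaitTermination();
  }
}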


TD 



On Tue, Apr 29, 2014 at 3:31 PM, Eduardo Costa Alfaia <e.costaalf...@unibs.it> 
wrote:

Hi TD,
We are not using the streaming context with master "local"; we have 1 master, 
8 workers, and 1 word source. The command line we are using is:
bin/run-example org.apache.spark.streaming.examples.JavaNetworkWordCount 
spark://192.168.0.13:7077
     

On Apr 30, 2014, at 0:09, Tathagata Das <tathagata.das1...@gmail.com> wrote:


Is your batch size 30 seconds by any chance?


Assuming not, please check whether you are creating the streaming context with 
master "local[n]" where n > 2. With "local" or "local[1]", the system has only 
one processing slot, which is occupied by the receiver, leaving no room for 
processing the received data. It could be that after 30 seconds the server 
disconnects, the receiver terminates, and the single slot is released for the 
processing to proceed.
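A sketch of where these two things show up when the context is created (the app 
name, port, and intervals here are placeholders):

import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class ContextCheck {
  public static void main(String[] args) {
    // The third argument is the batch interval: new Duration(30000) would make
    // the first counts appear only ~30 seconds after start; 1000 = 1s batches.
    Duration batchInterval = new Duration(1000);

    // "local" or "local[1]" gives the receiver the only slot, so received data
    // is never processed; "local[2]" (or more) leaves a slot for processing.
    JavaStreamingContext jssc =
        new JavaStreamingContext("local[2]", "NetworkWordCount", batchInterval);

    jssc.socketTextStream("localhost", 9999).print();
    jssc.start();
    jssc.awaitTermination();
  }
}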


TD



On Tue, Apr 29, 2014 at 2:28 PM, Eduardo Costa Alfaia <e.costaalf...@unibs.it> 
wrote:

Hi TD,

In my tests with Spark Streaming, I'm using the JavaNetworkWordCount code 
(modified) and a program I wrote that sends words to the Spark worker, using 
TCP as the transport. I verified that after Spark starts, it connects to my 
source, which then begins sending, but the first word count is reported 
approximately 30 seconds after the context is created. So I'm wondering where 
the 30 seconds of data already sent by the source is stored. Is this normal 
Spark behaviour? I see the same behaviour with the shipped 
JavaNetworkWordCount application.
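For context, the word source is essentially a small TCP server along these 
lines (a hypothetical sketch, not my actual program; the port and the text are 
placeholders):

import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class WordSource {
  public static void main(String[] args) throws Exception {
    int port = 9999;  // placeholder port
    try (ServerSocket server = new ServerSocket(port)) {
      while (true) {
        // Spark's socket receiver connects here when the streaming context starts.
        try (Socket client = server.accept();
             PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
          while (!out.checkError()) {                    // stop when the receiver disconnects
            out.println("hello world spark streaming");  // one text line per record
            Thread.sleep(10);
          }
        }
      }
    }
  }
}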

Many thanks.
--
Privacy notice: http://www.unibs.it/node/8155





