updateStateByKey performance / API

2015-03-18 Thread Nikos Viorres
Hi all, We are having a few issues with the performance of the updateStateByKey operation in Spark Streaming (1.2.1 at the moment) and any advice would be greatly appreciated. Specifically, on each tick of the system (which is set at 10 secs) we need to update a state tuple where the key is the

Re: Idempotent count

2015-03-18 Thread Arush Kharbanda
Hi, Yes, Spark Streaming is capable of stateful stream processing. With or without state is a way of classifying state. Checkpoints hold metadata and data. Thanks. On Wed, Mar 18, 2015 at 4:00 AM, Binh Nguyen Van binhn...@gmail.com wrote: Hi all, I am new to Spark so please forgive me if my

updateStateByKey performance API

2015-03-18 Thread nvrs
Hi all, We are having a few issues with the performance of the updateStateByKey operation in Spark Streaming (1.2.1 at the moment) and any advice would be greatly appreciated. Specifically, on each tick of the system (which is set at 10 secs) we need to update a state tuple where the key is the

RE: [spark-streaming] can shuffle write to disk be disabled?

2015-03-18 Thread Shao, Saisai
Would you please check your driver log or the streaming web UI to see each job's latency, including processing latency and total latency? From your code, it seems the sliding window is just 3 seconds, so you will process each 60 seconds of data in 3 seconds; if processing latency is larger than the sliding

Re: HIVE SparkSQL

2015-03-18 Thread Jörn Franke
Hello, Depending on your needs, a search technology such as SolrCloud or ElasticSearch makes more sense. If you go for the Cassandra solution you can use the Lucene text indexer... I am not sure if Hive or Spark SQL are very suitable for text. However, if you do not need text search then feel free
