Hi Spark guru
I am evaluating Spark Streaming,
In my application I need to maintain cumulative statistics (e.g the total
running word count), so I need to call the updateStateByKey function on
very micro-batch.
After setting those things, I got following behaviors:
* The Processing Time
seconds for checkpoint. Now my application have average 30
seconds latency and keep increasingly.
2015-11-06 11:11 GMT+07:00 Thúy Hằng Lê <thuyhang...@gmail.com>:
> Thankd all, it would be great to have this feature soon.
> Do you know what's the release plan for 1.6?
>
> In a
e.
}
}
Without using updateStageByKey, I'm only have the stats of the last
micro-batch.
Any advise on this?
2015-11-07 11:35 GMT+07:00 Aniket Bhatnagar <aniket.bhatna...@gmail.com>:
> Can you try storing the state (word count) in an external key value store?
>
>
our state object more complicated and try to prune out words
>with very few occurrences or that haven’t been updated for a long time
> - You can do this by emitting None from updateStateByKey
>
> Hope this helps,
> -adrian
>
> From: Thúy Hằng Lê
> Date: Monday, Novemb
"Adrian Tanase" <atan...@adobe.com> wrote:
> Nice! Thanks for sharing, I wasn’t aware of the new API.
>
> Left some comments on the JIRA and design doc.
>
> -adrian
>
> From: Shixiong Zhu
> Date: Tuesday, November 3, 2015 at 3:32 AM
> To: Thúy Hằng Lê
ecific for metrics -
> chose the RDDs that write to OpenTSDB using foreachRdd
>
> -adrian
>
> --
> *From:* Thúy Hằng Lê <thuyhang...@gmail.com
> <javascript:_e(%7B%7D,'cvml','thuyhang...@gmail.com');>>
> *Sent:* Sunday
Hi all,
I am going to build a financial application for Portfolio Manager, where
each portfolio contains a list of stocks, the number of shares purchased,
and the purchase price.
Another source of information is stocks price from market data. The
application need to calculate real-time gain or
Thanks all for the feedback so far.
I havn't decided which external storage will be used yet.
HBase is cool but it requires Hadoop in production. I only have 3-4 servers
for the whole things ( i am thinking of a relational database for this, can
be MariaDB, Memsql or mysql) but they are hard to
Thanks Cody,
I still have concerns about this.
What's do you mean by saying Spark direct stream doesn't have a default
partitioner? Could you please help me to explain more?
When i assign 20 cores to 20 Kafka partitions, I am expecting each core
will work on a partition. Is it correct?
I'm