Re: [Spark Streaming] How can we use consecutive data points as the features?

2014-08-17 Thread Tobias Pfeiffer
Hi,

On Sat, Aug 16, 2014 at 3:29 AM, Yan Fang yanfang...@gmail.com wrote:

 If all the consecutive data points are in one batch, it's not complicated,
 except that the order of data points within a batch is not guaranteed, so I
 have to use the timestamp in each data point to reach my goal. However, when
 the consecutive data points are spread across two or more batches, how can I
 do this?


You *could* use window operations. If there is an upper limit to how many
batches you need to look at, you can use a window large enough to cover them
and thereby avoid updateStateByKey.
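A minimal sketch of this windowed approach (the input format `machineId,timestamp,value` over a socket, the window sizes, and names like `extractable` readings are illustrative assumptions, not part of the original thread):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._

    // Sketch: gather the last few batches with a window, then sort each
    // key's points by their embedded timestamp before building features.
    object WindowedFeatures {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("WindowedFeatures").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(10))

        // Assume each line is "machineId,timestamp,value"
        val points = ssc.socketTextStream("localhost", 9999).map { line =>
          val Array(id, ts, value) = line.split(",")
          (id, (ts.toLong, value.toDouble))
        }

        // A 60s window sliding every 10s covers consecutive points that
        // span batch boundaries, without any updateStateByKey state.
        val features = points
          .groupByKeyAndWindow(Seconds(60), Seconds(10))
          .mapValues { values =>
            val ordered = values.toSeq.sortBy(_._1) // order by timestamp
            ordered.map(_._2)                       // consecutive readings as features
          }

        features.print()
        ssc.start()
        ssc.awaitTermination()
      }
    }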

Tobias


[Spark Streaming] How can we use consecutive data points as the features?

2014-08-15 Thread Yan Fang
Hi guys,

We have a use case where we need to use consecutive data points to predict
the status (yes, like using time series data to predict machine failure). Is
there a straightforward way to do this in Spark Streaming?

If all the consecutive data points are in one batch, it's not complicated,
except that the order of data points within a batch is not guaranteed, so I
have to use the timestamp in each data point to reach my goal. However, when
the consecutive data points are spread across two or more batches, how can I
do this? From my understanding, I need to use state management, but
updateStateByKey is not easy to use here: e.g. I would need to add one new
data point and delete the oldest one, but I cannot do that in a batch
fashion.
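
For reference, one way updateStateByKey could keep a bounded buffer of recent points per key (the buffer size `n`, the key type, and the record layout are assumptions for illustration; this is a sketch, not the thread's agreed solution):

    import org.apache.spark.streaming.StreamingContext._
    import org.apache.spark.streaming.dstream.DStream

    // Keep the n most recent (timestamp, reading) pairs per machine,
    // appending new points and dropping the oldest on every batch.
    // Note: updateStateByKey needs a checkpoint dir (ssc.checkpoint(...)).
    def latestPoints(points: DStream[(String, (Long, Double))], n: Int)
        : DStream[(String, Seq[(Long, Double)])] = {
      points.updateStateByKey[Seq[(Long, Double)]] {
        (newPoints: Seq[(Long, Double)], state: Option[Seq[(Long, Double)]]) =>
          val merged = (state.getOrElse(Seq.empty) ++ newPoints).sortBy(_._1)
          Some(merged.takeRight(n)) // keep only the n most recent points
      }
    }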

Does anyone in the community have a similar use case, and how do you solve
it? Thank you.

Cheers,

Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108