JavaDStream.foreachRDD
(https://spark.apache.org/docs/1.2.1/api/java/org/apache/spark/streaming/api/java/JavaDStreamLike.html#foreachRDD(org.apache.spark.api.java.function.Function))
and Statistics.corr
(https://spark.apache.org/docs/1.2.1/api/java/org/apache/spark/mllib/stat/Statistics.html#corr(org.apache.spark.rdd.RDD))
should be good starting points. -Xiangrui
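For the correlation step itself, here is a minimal plain-Java-7 sketch of what MLlib's Statistics.corr computes by default (Pearson correlation) between a window of measures and a predefined pattern, with a hypothetical threshold check. The pattern values, window values, and the 0.9 threshold below are made-up illustrations; inside foreachRDD you would build the window vector per key and call Statistics.corr on RDDs instead:

```java
public class CorrelationSketch {

    // Pearson correlation between two equal-length vectors --
    // the same statistic Statistics.corr computes by default.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0;
        for (int i = 0; i < n; i++) { sx += x[i]; sy += y[i]; }
        double mx = sx / n, my = sy / n;
        double cov = 0, vx = 0, vy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            cov += dx * dy;   // covariance numerator
            vx  += dx * dx;   // variance numerator of x
            vy  += dy * dy;   // variance numerator of y
        }
        return cov / Math.sqrt(vx * vy);
    }

    public static void main(String[] args) {
        // Hypothetical predefined pattern and a window of measures
        // (already sorted by timestamp) for one uniqueId.
        double[] pattern = {1.0, 2.0, 3.0, 4.0};
        double[] window  = {2.0, 4.1, 5.9, 8.2};
        double r = pearson(window, pattern);
        if (r > 0.9) {
            // Emit {uniqueId, doubleCorrelationValue} here.
            System.out.println("correlation above threshold: " + r);
        }
    }
}
```

In the streaming version, the per-key window vectors would come out of a 30-second window operation on the DStream, and this comparison would run inside the function passed to foreachRDD.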

On Mon, Feb 16, 2015 at 6:39 AM, Spico Florin <spicoflo...@gmail.com> wrote:
> Hello!
>   I'm a newbie to Spark, and I have the following case study:
> 1. A client sends the following data every 100 ms:
>   {uniqueId, timestamp, measure1, measure2 }
> 2. Every 30 seconds, I would like to correlate the data collected in the
> window with a predefined double-vector pattern for each given key. The
> predefined pattern has 300 records. The data should also be sorted by
> timestamp.
> 3. When the correlation is greater than a predefined threshold (e.g., 0.9), I
> would like to emit a new message containing {uniqueId,
> doubleCorrelationValue}.
> 4. For the correlation, I would like to use MLlib.
> 5. As a programming language, I would like to use Java 7.
>
> Can you please give me some suggestions on how to create the skeleton for
> the above scenario?
>
> Thanks.
>  Regards,
>  Florin
>

