...@gmail.com; user@spark.apache.org
*Cc:* Tobias Pfeiffer t...@preferred.jp; Ganelin, Ilya ilya.gane...@capitalone.com; derrickburns derrickrbu...@gmail.com; user@spark.apache.org
*Sent:* Friday, January 30, 2015 7:11 AM
*Subject:* Re: spark challenge: zip with next???
assuming the data can be partitioned then you have many timeseries for
which you want to detect potential gaps. also assuming the resulting gaps
info per timeseries is much smaller than the timeseries data itself,
then this is a classical example to me of a sorted (streaming) foldLeft.
What is the most efficient way to achieve this in Spark?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/spark-challenge-zip-with-next-tp21423.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
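The sorted (streaming) foldLeft suggested above reads naturally as a fold that carries the previous date along while scanning one date-sorted timeseries. A minimal plain-Python sketch (lists stand in for a partition; the two-day threshold and the dates are made up for illustration, and in Spark this logic would run per sorted partition, e.g. inside mapPartitions):

```python
from datetime import date, timedelta
from functools import reduce

def gaps_fold(acc, d, min_gap=timedelta(days=2)):
    """Fold step: carry (previous_date, gaps_found) through a date-sorted series."""
    prev, gaps = acc
    if prev is not None and d - prev > min_gap:
        gaps = gaps + [(prev, d)]
    return (d, gaps)

# One timeseries, already sorted by date (the "sorted" precondition above).
series = [date(2015, 1, 1), date(2015, 1, 2), date(2015, 1, 10), date(2015, 1, 11)]
last, gaps = reduce(gaps_fold, series, (None, []))
# gaps == [(date(2015, 1, 2), date(2015, 1, 10))]
```

The fold touches each record once and keeps only the (small) gap list per timeseries, which is the asymmetry the message relies on.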
Another solution would be to use the reduce action.
Mohammed
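Mohammed's message gives no code; one shape such a reduce could take, as a sketch under assumptions he did not state: lift each record into a (first, last, largest-gap) segment summary and merge adjacent segments in date order, so the only new gap appears at the seam between two segments. (A real Spark reduce also requires a commutative function, so this plain-Python version only illustrates the combine step.)

```python
from datetime import date, timedelta
from functools import reduce

def lift(d):
    # Lift one date into a segment summary: (first, last, largest_gap).
    return (d, d, timedelta(0))

def combine(a, b):
    # Merge two adjacent, date-ordered segments; the only new gap can
    # appear at the seam between a's last date and b's first date.
    return (a[0], b[1], max(a[2], b[2], b[0] - a[1]))

dates = sorted([date(2015, 1, 1), date(2015, 1, 10), date(2015, 1, 2)])
first, last, max_gap = reduce(combine, map(lift, dates))
# max_gap == timedelta(days=8)
```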
From: Ganelin, Ilya ilya.gane...@capitalone.com
Sent: Thursday, January 29, 2015 1:32 PM
To: 'derrickburns'; 'user@spark.apache.org'
Subject: RE: spark challenge: zip with next???
Make a copy of your RDD with an extra entry in the beginning to offset.
Then you can zip the two RDDs and run a map to generate an RDD of
differences.
Sent: Thursday, January 29, 2015 02:52 PM Eastern Standard Time
To: user@spark.apache.org
Subject: spark challenge: zip with next???
Here is a spark challenge for you!
I have a data set where each entry has a date. I would like to identify
gaps in the dates larger than a given length. For example
Hi,
On Fri, Jan 30, 2015 at 6:32 AM, Ganelin, Ilya ilya.gane...@capitalone.com
wrote:
Make a copy of your RDD with an extra entry in the beginning to offset.
Then you can zip the two RDDs and run a map to generate an RDD of
differences.
Does that work? I recently tried something to compute
http://mail-archives.apache.org/mod_mbox/spark-user/201405.mbox/%3ccalrvtpkn65rolzbetc+ddk4o+yjm+tfaf5dz8eucpl-2yhy...@mail.gmail.com%3E
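The copy-offset-zip recipe quoted above can be sanity-checked with plain lists (in an actual job the offset copy and the zip would be RDD operations, and RDD zip requires both sides to have the same number of partitions and elements per partition, a detail this list version sidesteps):

```python
from datetime import date, timedelta

# Pair each date with the next one ("zip with next") by zipping the
# sequence against itself shifted by one, then map to differences.
# Dates are assumed already sorted; the values are made up.
dates = [date(2015, 1, 1), date(2015, 1, 2), date(2015, 1, 10)]
pairs = list(zip(dates, dates[1:]))          # offset copy + zip
diffs = [b - a for a, b in pairs]            # the "map to differences" step
gaps = [p for p, d in zip(pairs, diffs) if d > timedelta(days=2)]
# gaps == [(date(2015, 1, 2), date(2015, 1, 10))]
```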
you can use the MLLib