Re: spark challenge: zip with next???

2015-01-30 Thread Michael Malak
(Quoting an earlier reply sent Friday, January 30, 2015 7:11 AM:) assuming the data can be partitioned then you have many timeseries for which you want to detect potential gaps. also assuming the resulting gaps info per ...

Re: spark challenge: zip with next???

2015-01-30 Thread Koert Kuipers
(Quoting the reply of January 30, 2015 7:11 AM:) assuming the data can be partitioned then you have many timeseries for which you want to detect potential gaps. also assuming the resulting gaps info per timeseries is much smaller data than the timeseries data itself ...

Re: spark challenge: zip with next???

2015-01-30 Thread Koert Kuipers
(Quoting the reply of Friday, January 30, 2015 7:11 AM:) assuming the data can ...

Re: spark challenge: zip with next???

2015-01-30 Thread Derrick Burns
(Quoting the reply of Friday, January 30, 2015 7:11 AM:) assuming the data can be partitioned ...

Re: spark challenge: zip with next???

2015-01-30 Thread Koert Kuipers
Assuming the data can be partitioned, you have many timeseries for which you want to detect potential gaps. Also assuming the resulting gaps info per timeseries is much smaller than the timeseries data itself, this is a classical example to me of a sorted (streaming) foldLeft, ...
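[Editor's sketch] A minimal illustration of the per-timeseries foldLeft idea; the thread itself contains no code, so the names, the sample data, and the one-minute threshold below are invented. This version uses groupByKey for simplicity, whereas the "streaming" variant described above would sort within partitions (e.g. repartitionAndSortWithinPartitions) and fold inside mapPartitions to avoid materializing a whole series at once.

    import org.apache.spark.{SparkConf, SparkContext}

    object TimeseriesGaps {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("timeseries-gaps").setMaster("local[*]"))

        // (seriesId, timestampMillis) -- illustrative data only.
        val records = sc.parallelize(Seq(
          ("sensorA", 0L), ("sensorA", 30000L), ("sensorA", 500000L),
          ("sensorB", 0L), ("sensorB", 10000L)
        ))
        val maxGapMillis = 60000L  // report gaps longer than one minute (assumed threshold)

        // Per series: sort the timestamps, then foldLeft carrying the previous timestamp
        // and accumulating (previous, current) pairs whose difference exceeds the threshold.
        val gaps = records.groupByKey().flatMapValues { ts =>
          val sorted = ts.toList.sorted
          sorted.foldLeft((Option.empty[Long], List.empty[(Long, Long)])) {
            case ((prev, acc), cur) =>
              val acc2 = prev match {
                case Some(p) if cur - p > maxGapMillis => (p, cur) :: acc
                case _                                 => acc
              }
              (Some(cur), acc2)
          }._2.reverse
        }

        gaps.collect().foreach(println)   // e.g. (sensorA,(30000,500000))
      }
    }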

spark challenge: zip with next???

2015-01-29 Thread derrickburns
... is the most efficient way to achieve this in Spark? (View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-challenge-zip-with-next-tp21423.html)

RE: spark challenge: zip with next???

2015-01-29 Thread Mohammed Guller
Another solution would be to use the reduce action. -- Mohammed (Quoting Ganelin, Ilya, Thursday, January 29, 2015 1:32 PM, RE: spark challenge: zip with next???:) Make a copy of your RDD with an extra entry ...
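[Editor's sketch] A rough sketch of the offset-and-zip idea from the quoted message. RDD.zip requires both RDDs to have the same partitioning and element counts, so rather than literally prepending an extra entry, this version pairs element i with element i+1 via zipWithIndex and a join on shifted indices; the data and the five-day threshold are invented for illustration.

    import org.apache.spark.{SparkConf, SparkContext}

    object ZipWithNext {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("zip-with-next").setMaster("local[*]"))

        // Sorted dates encoded as epoch days (illustrative data, not from the thread).
        val dates = sc.parallelize(Seq(100L, 101L, 102L, 110L, 111L)).sortBy(identity)

        // Pair each element with its index, then join element i with element i + 1.
        val indexed = dates.zipWithIndex().map { case (d, i) => (i, d) }   // (index, date)
        val shifted = indexed.map { case (i, d) => (i - 1, d) }            // (index - 1, nextDate)

        val pairs = indexed.join(shifted).values                           // (date, nextDate)
        val gaps  = pairs.filter { case (d, next) => next - d > 5 }        // gaps longer than 5 days

        gaps.collect().foreach { case (d, next) => println(s"gap: $d -> $next (${next - d} days)") }
      }
    }

Mohammed's reduce-based alternative is not shown in the preview; one plausible reading is an aggregate that tracks, per partition of sorted data, the first timestamp, last timestamp, and largest interior gap, merging those partition summaries at the end.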

RE: spark challenge: zip with next???

2015-01-29 Thread Ganelin, Ilya
(Quoting the original post of Thursday, January 29, 2015 02:52 PM Eastern Standard Time, subject: spark challenge: zip with next???:) Here is a spark challenge for you! I have a data set where each entry has a date. I would like to identify gaps in the dates larger than a given length. For example ...
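[Editor's sketch] The original example is cut off in the preview, so here is a small, made-up illustration of the problem in plain Scala (no Spark): given dated entries, find consecutive pairs whose spacing exceeds a threshold.

    import java.time.LocalDate
    import java.time.temporal.ChronoUnit

    object GapExample extends App {
      // Made-up dates; the poster's own example is truncated in the archive preview.
      val dates = Seq("2015-01-01", "2015-01-02", "2015-01-10", "2015-01-11").map(LocalDate.parse)
      val maxGapDays = 3L

      // "Zip with next": pair each date with the following one, then keep the large gaps.
      val gaps = dates.zip(dates.tail).filter { case (a, b) => ChronoUnit.DAYS.between(a, b) > maxGapDays }

      gaps.foreach { case (a, b) => println(s"gap of ${ChronoUnit.DAYS.between(a, b)} days between $a and $b") }
    }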

Re: spark challenge: zip with next???

2015-01-29 Thread Tobias Pfeiffer
Hi, On Fri, Jan 30, 2015 at 6:32 AM, Ganelin, Ilya wrote: "Make a copy of your RDD with an extra entry in the beginning to offset. Then you can zip the two RDDs and run a map to generate an RDD of differences. Does that work?" I recently tried something to compute ...

Re: spark challenge: zip with next???

2015-01-29 Thread Mohit Jaggi
http://mail-archives.apache.org/mod_mbox/spark-user/201405.mbox/%3ccalrvtpkn65rolzbetc+ddk4o+yjm+tfaf5dz8eucpl-2yhy...@mail.gmail.com%3E -- you can use the MLLib ...
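[Editor's sketch] The preview stops at "you can use the MLLib", so the exact suggestion is cut off; a likely candidate is MLlib's sliding-window helper on RDDs (org.apache.spark.mllib.rdd.RDDFunctions.sliding), which pairs consecutive elements across partition boundaries. A sketch under that assumption, with invented data and threshold:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.rdd.RDDFunctions._   // adds .sliding(...) to RDDs

    object SlidingGaps {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("sliding-gaps").setMaster("local[*]"))

        // Illustrative epoch-day values; assume they are already sorted.
        val dates = sc.parallelize(Seq(100L, 101L, 102L, 110L, 111L)).sortBy(identity)

        // sliding(2) yields windows of two consecutive elements, even across partition
        // boundaries, which is exactly the "zip with next" pairing.
        val gaps = dates.sliding(2)
          .map { case Array(a, b) => (a, b, b - a) }
          .filter { case (_, _, diff) => diff > 5 }   // gaps longer than 5 days

        gaps.collect().foreach { case (a, b, diff) => println(s"gap of $diff days between $a and $b") }
      }
    }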