Re: RDD to DStream

2014-11-12 Thread Jianshi Huang
(t...@databricks.com) Subject: Re: RDD to DStream Yeah, you're absolutely right Saisai. My point is we should allow this kind of logic in RDD, let's say transforming type RDD[(Key, Iterable[T])] to Seq[(Key, RDD[T])]. Make sense? Jianshi On Mon, Oct 27, 2014 at 3:56 PM, Shao
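The transformation Jianshi proposes can be sketched in plain Spark by splitting a grouped pair RDD into one RDD per key. This is a hypothetical helper (`splitByKey` is not a Spark API), and it assumes the set of distinct keys fits in driver memory; each key costs one filter pass over the parent RDD.

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

object SplitByKey {
  // Hypothetical helper, not part of Spark: turn RDD[(Key, Iterable[T])]
  // into Seq[(Key, RDD[T])] with one filter pass per key. Assumes the
  // distinct keys fit on the driver.
  def splitByKey[K: ClassTag, T: ClassTag](
      rdd: RDD[(K, Iterable[T])]): Seq[(K, RDD[T])] = {
    val keys = rdd.map(_._1).distinct().collect()  // small by assumption
    keys.toSeq.map { k =>
      k -> rdd.filter(_._1 == k).flatMap(_._2)     // lazily defined per key
    }
  }
}
```

Each resulting RDD re-scans the parent, so for many keys this is O(numKeys) Spark jobs; the thread's point is precisely that Spark offers no cheaper primitive, since nested RDDs inside closures are not supported.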

RE: RDD to DStream

2014-10-27 Thread Shao, Saisai
...@gmail.com] Sent: Monday, October 27, 2014 1:42 PM To: Tathagata Das Cc: Aniket Bhatnagar; user@spark.apache.org Subject: Re: RDD to DStream I have a similar requirement. But instead of grouping it by chunkSize, I would have the timeStamp be part of the data. So the function I want has

Re: RDD to DStream

2014-10-27 Thread Jianshi Huang
...@gmail.com] Sent: Monday, October 27, 2014 1:42 PM To: Tathagata Das Cc: Aniket Bhatnagar; user@spark.apache.org Subject: Re: RDD to DStream I have a similar requirement. But instead of grouping it by chunkSize, I would have the timeStamp be part of the data. So the function I want has

RE: RDD to DStream

2014-10-27 Thread Shao, Saisai
amount of data back to driver. Thanks Jerry From: Jianshi Huang [mailto:jianshi.hu...@gmail.com] Sent: Monday, October 27, 2014 2:39 PM To: Shao, Saisai Cc: user@spark.apache.org; Tathagata Das (t...@databricks.com) Subject: Re: RDD to DStream Hi Saisai, I understand it's non-trivial

Re: RDD to DStream

2014-10-27 Thread Jianshi Huang
@spark.apache.org; Tathagata Das (t...@databricks.com) Subject: Re: RDD to DStream Hi Saisai, I understand it's non-trivial, but the requirement of simulating offline data as stream is also fair. :) I just wrote a prototype, however, I need to do a collect and a bunch of parallelize

Re: RDD to DStream

2014-10-27 Thread Jianshi Huang
, October 27, 2014 2:39 PM To: Shao, Saisai Cc: user@spark.apache.org; Tathagata Das (t...@databricks.com) Subject: Re: RDD to DStream Hi Saisai, I understand it's non-trivial, but the requirement of simulating offline data as stream is also fair. :) I just wrote a prototype, however

RE: RDD to DStream

2014-10-27 Thread Shao, Saisai
cannot support nested RDD in closure. Thanks Jerry From: Jianshi Huang [mailto:jianshi.hu...@gmail.com] Sent: Monday, October 27, 2014 3:30 PM To: Shao, Saisai Cc: user@spark.apache.org; Tathagata Das (t...@databricks.com) Subject: Re: RDD to DStream Ok, back to Scala code, I'm wondering why I

Re: RDD to DStream

2014-10-27 Thread Jianshi Huang
nested RDD in closure. Thanks Jerry From: Jianshi Huang [mailto:jianshi.hu...@gmail.com] Sent: Monday, October 27, 2014 3:30 PM To: Shao, Saisai Cc: user@spark.apache.org; Tathagata Das (t...@databricks.com) Subject: Re: RDD to DStream Ok, back to Scala code, I'm wondering

RE: RDD to DStream

2014-10-27 Thread Shao, Saisai
@spark.apache.org; Tathagata Das (t...@databricks.com) Subject: Re: RDD to DStream Yeah, you're absolutely right Saisai. My point is we should allow this kind of logic in RDD, let's say transforming type RDD[(Key, Iterable[T])] to Seq[(Key, RDD[T])]. Make sense? Jianshi On Mon, Oct 27, 2014 at 3:56 PM

Re: RDD to DStream

2014-10-27 Thread Jianshi Huang
[mailto:jianshi.hu...@gmail.com] Sent: Monday, October 27, 2014 4:07 PM To: Shao, Saisai Cc: user@spark.apache.org; Tathagata Das (t...@databricks.com) Subject: Re: RDD to DStream Yeah, you're absolutely right Saisai. My point is we should allow this kind of logic in RDD, let's say transforming

Re: RDD to DStream

2014-10-26 Thread Jianshi Huang
I have a similar requirement. But instead of grouping it by chunkSize, I would have the timeStamp be part of the data. So the function I want has the following signature: // RDD of (timestamp, value) def rddToDStream[T](data: RDD[(Long, T)], timeWindow: Long)(implicit ssc: StreamingContext):
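One way to realize that signature is the queue-based approach mentioned elsewhere in this thread. The sketch below assumes the intended return type is DStream[T], buckets records by `timestamp / timeWindow`, and collects the distinct window indices to the driver (so it only suits a modest number of windows). It is an illustration of the idea, not a Spark-provided function.

```scala
import scala.collection.mutable
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.DStream

// Sketch: replay an offline RDD of (timestamp, value) as a DStream,
// one time window per batch. Window indices are collected to the driver.
def rddToDStream[T: ClassTag](data: RDD[(Long, T)], timeWindow: Long)
                             (implicit ssc: StreamingContext): DStream[T] = {
  val windows = data.map(_._1 / timeWindow).distinct().collect().sorted
  val perWindow = windows.map { w =>
    data.filter(_._1 / timeWindow == w).map(_._2)  // one RDD per window
  }
  ssc.queueStream(mutable.Queue(perWindow: _*), oneAtATime = true)
}
```

Like the prototype discussed above, this still needs a collect on the driver (for the window indices) and one filter pass per window, which is exactly the inefficiency the thread is complaining about.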

Re: RDD to DStream

2014-08-06 Thread Tathagata Das
Hey Aniket, Great thoughts! I understand the use case. But as you have realized yourself, it is not trivial to cleanly stream an RDD as a DStream. Since RDD operations are defined to be scan based, it is not efficient to define an RDD based on slices of data within a partition of another RDD, using

Re: RDD to DStream

2014-08-01 Thread Aniket Bhatnagar
Hi everyone, I haven't been receiving replies to my queries on the distribution list. Not pissed, but I am actually curious to know if my messages are actually going through or not. Can someone please confirm that my messages are getting delivered via this distribution list? Thanks, Aniket On 1

Re: RDD to DStream

2014-08-01 Thread Mayur Rustagi
Nice question :) Ideally you should use the queueStream interface to push RDDs into a queue; Spark Streaming can then handle the rest. Though why are you looking to convert an RDD to a DStream? Another workaround folks use is to source the DStream from folders and move files that they need reprocessed back into
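The queue-based workaround Mayur mentions looks roughly like the sketch below: pre-split an offline dataset into per-batch RDDs, enqueue them, and let queueStream emit one per batch interval. The app name, chunk count, and local master are placeholder choices for illustration.

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Replay a static dataset as a stream via queueStream.
val conf = new SparkConf().setMaster("local[2]").setAppName("rdd-replay")
val ssc = new StreamingContext(conf, Seconds(1))

// Pre-split the data into roughly equal chunks, one per batch.
val chunks = ssc.sparkContext.parallelize(1 to 100).randomSplit(Array.fill(10)(0.1))
val rddQueue = mutable.Queue(chunks: _*)

val stream = ssc.queueStream(rddQueue, oneAtATime = true)  // one RDD per batch
stream.count().print()

ssc.start()
ssc.awaitTerminationOrTimeout(12 * 1000)  // enough for ~10 one-second batches
ssc.stop()
```

The folder-based alternative works the same way operationally: StreamingContext.textFileStream watches a directory for new files, so moving files back into it re-emits their contents as fresh batches.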