*From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
*Sent:* Monday, October 27, 2014 4:07 PM
*To:* Shao, Saisai
*Cc:* user@spark.apache.org; Tathagata Das (t...@databricks.com)
*Subject:* Re: RDD to DStream

Yeah, you're absolutely right Saisai.
My point is we should allow this kind of logic in RDD, let's say transforming type RDD[(Key, Iterable[T])] to Seq[(Key, RDD[T])].
Make sense?

Jianshi
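For illustration, a rough sketch of what such a helper could look like with the existing API; splitByKey is a made-up name, and each per-key RDD is just a filter over the parent, so the parent gets rescanned once per key:

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Sketch only: split a grouped RDD into one RDD per key.
// Assumes the set of distinct keys is small enough to collect to the driver.
def splitByKey[K: ClassTag, T: ClassTag](grouped: RDD[(K, Iterable[T])]): Seq[(K, RDD[T])] = {
  val keys = grouped.map(_._1).distinct().collect()
  keys.toSeq.map { k =>
    // One filter-and-flatten pass per key, rescanning the parent RDD each time.
    k -> grouped.filter(_._1 == k).flatMap(_._2)
  }
}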
*From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
*Sent:* Monday, October 27, 2014 1:42 PM
*To:* Tathagata Das
*Cc:* Aniket Bhatnagar; user@spark.apache.org
*Subject:* Re: RDD to DStream

I have a similar requirement. But instead of grouping it by chunkSize, I would have the timeStamp be part of the data. So the function I want has the following signature:

// RDD of (timestamp, value)
def rddToDStream[T](data: RDD[(Long, T)], timeWindow: Long)(implicit ssc: StreamingContext): DStream[T]
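For illustration, one way a function with roughly that shape could be put together is via queueStream. This is a sketch only, not the prototype discussed later in the thread; it adds a ClassTag bound on T and assumes the number of distinct time windows is small enough to enumerate on the driver:

import scala.collection.mutable
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.DStream

// Sketch only: bucket the data by time window and replay the buckets through
// queueStream. Each window is built by filtering the source RDD again, and
// one queued RDD is emitted per batch interval.
def rddToDStream[T: ClassTag](data: RDD[(Long, T)], timeWindow: Long)
                             (implicit ssc: StreamingContext): DStream[T] = {
  val windows = data.map(_._1 / timeWindow).distinct().collect().sorted
  val queue = new mutable.Queue[RDD[T]]()
  windows.foreach { w =>
    queue += data.filter(_._1 / timeWindow == w).map(_._2)
  }
  ssc.queueStream(queue, oneAtATime = true)
}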
amount of data back to driver.
Thanks
Jerry
*From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
*Sent:* Monday, October 27, 2014 2:39 PM
*To:* Shao, Saisai
*Cc:* user@spark.apache.org; Tathagata Das (t...@databricks.com)
*Subject:* Re: RDD to DStream

Hi Saisai,
I understand it's non-trivial, but the requirement of simulating offline data as stream is also fair. :)
I just wrote a prototype, however, I need to do a collect and a bunch of parallelize
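For reference, a collect-and-parallelize version along those lines might look roughly like the following. It is an illustrative sketch rather than the actual prototype, and it shows why so much data ends up travelling through the driver:

import scala.reflect.ClassTag
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Sketch of a collect-and-parallelize approach (illustrative only):
// every window's records are pulled to the driver and re-distributed.
def windowsViaDriver[T: ClassTag](sc: SparkContext, data: RDD[(Long, T)],
                                  timeWindow: Long): Seq[RDD[T]] = {
  data.groupBy(_._1 / timeWindow)   // bucket records by window
    .collect()                      // brings everything back to the driver
    .sortBy(_._1)
    .toSeq
    .map { case (_, values) => sc.parallelize(values.map(_._2).toSeq) }
}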
cannot support nested RDD in closure.
Thanks
Jerry
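As an illustration of the limitation mentioned above, here is a small made-up example of a nested RDD in a closure:

import org.apache.spark.{SparkConf, SparkContext}

object NestedRddInClosure {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("nested-rdd-demo").setMaster("local[2]"))
    val grouped = sc.parallelize(Seq(1L -> Seq("a", "b"), 2L -> Seq("c")))

    // Not supported: RDDs and the SparkContext exist only on the driver, so
    // referencing sc inside another RDD's closure fails once Spark tries to
    // serialize the closure (typically "Task not serializable").
    val broken = grouped.map { case (window, values) =>
      (window, sc.parallelize(values))   // nested RDD inside a task closure
    }
    broken.collect()
    sc.stop()
  }
}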
*From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
*Sent:* Monday, October 27, 2014 3:30 PM
*To:* Shao, Saisai
*Cc:* user@spark.apache.org; Tathagata Das (t...@databricks.com)
*Subject:* Re: RDD to DStream

Ok, back to Scala code, I'm wondering why I
Hey Aniket,
Great thoughts! I understand the use case. But as you have realized yourself, it is not trivial to cleanly stream an RDD as a DStream. Since RDD operations are defined to be scan based, it is not efficient to define an RDD based on slices of data within a partition of another RDD, using
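To make the scan-based point concrete, a hypothetical slice helper could look like the following; the iterator still walks the partition from the start, so carving an RDD into many small slice-RDDs repeats the scan over and over:

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Hypothetical helper: an RDD view of records [from, until) of one partition
// of a parent RDD, built by skipping everything before `from`.
def partitionSlice[T: ClassTag](parent: RDD[T], partition: Int,
                                from: Int, until: Int): RDD[T] =
  parent.mapPartitionsWithIndex { (idx, iter) =>
    if (idx == partition) iter.slice(from, until) else Iterator.empty
  }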
Hi everyone
I haven't been receiving replies to my queries in the distribution list.
Not pissed but I am actually curious to know if my messages are actually
going through or not. Can someone please confirm that my msgs are getting
delivered via this distribution list?
Thanks,
Aniket
Nice question :)
Ideally you should use a queueStream interface to push RDDs into a queue, and then Spark Streaming can handle the rest.
Though, why are you looking to convert an RDD to a DStream? Another workaround folks use is to source the DStream from folders and move files that they need reprocessed back into the folder.
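For completeness, a minimal sketch of that folder-based workaround; the directory path and batch interval below are placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Spark Streaming monitors a directory, so moving files (back) into it
// replays their contents as a stream.
object ReplayFromFolder {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("replay-from-folder").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Every new file that appears under this directory becomes part of the stream.
    val lines = ssc.textFileStream("hdfs:///tmp/replay-input")
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}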