Re: NonSerializable Exception in foreachRDD

2014-10-31 Thread Akhil Das
Are you expecting something like this?


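// Stream of new text files that appear in the input directory
val data = ssc.textFileStream("hdfs://akhldz:9000/input/")

// Build the static RDD once on the driver, outside foreachRDD
val rdd = ssc.sparkContext.parallelize(Seq("foo", "bar"))

// foreachRDD returns Unit, so its result is not assigned to anything
data.foreachRDD(x => {
  val newRdd = x.union(rdd)
  newRdd.saveAsTextFile("hdfs://akhldz:9000/output/")
})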

Thanks
Best Regards

On Fri, Oct 31, 2014 at 10:46 AM, Tobias Pfeiffer wrote:

> Harold,
>
> just mentioning it in case you run into it: If you are in a separate
> thread, there are apparently stricter limits to what you can and cannot
> serialize:
>
> val someVal = ???  // placeholder: any value defined outside the future
> future {
>   // be very careful with defining RDD operations using someVal here
>   val myLocalVal = someVal
>   // use myLocalVal instead
> }
>
> On Thu, Oct 30, 2014 at 4:55 PM, Harold Nguyen wrote:
>
>> In Spark Streaming, when I do "foreachRDD" on my DStreams, I get a
>> NonSerializable exception when I try to do something like:
>>
>> DStream.foreachRDD( rdd => {
>>   var sc.parallelize(Seq(("test", "blah")))
>> })
>>
>
> Is this the code you are actually using? "var sc.parallelize(...)" doesn't
> really look like valid Scala to me.
>
> Tobias


Re: NonSerializable Exception in foreachRDD

2014-10-30 Thread Tobias Pfeiffer
Harold,

just mentioning it in case you run into it: If you are in a separate
thread, there are apparently stricter limits to what you can and cannot
serialize:

val someVal = ???  // placeholder: any value defined outside the future
future {
  // be very careful with defining RDD operations using someVal here
  val myLocalVal = someVal
  // use myLocalVal instead
}
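
A minimal sketch of why the local copy can help, using plain Scala and Java
serialization instead of Spark. It assumes the closure would otherwise
reference a field of a non-serializable enclosing class, which is one common
cause of this exception and not necessarily what happens inside a future.
Job, ClosureCheck and isSerializable are hypothetical names for illustration.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical enclosing class; note that it does NOT extend Serializable
class Job(val prefix: String) {
  // Refers to the field `prefix`, so the closure has to capture `this`
  def capturesThis: String => String = (s: String) => prefix + s

  // Copies the field into a local val first; the closure then captures
  // only that String and not the enclosing Job instance
  def capturesLocal: String => String = {
    val localPrefix = prefix
    (s: String) => localPrefix + s
  }
}

object ClosureCheck {
  // Tries to Java-serialize the object and reports whether that worked
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

With these definitions, ClosureCheck.isSerializable(new Job("x").capturesLocal)
should return true, while the capturesThis variant should return false because
serializing it drags in the Job instance.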

On Thu, Oct 30, 2014 at 4:55 PM, Harold Nguyen wrote:

> In Spark Streaming, when I do "foreachRDD" on my DStreams, I get a
> NonSerializable exception when I try to do something like:
>
> DStream.foreachRDD( rdd => {
>   var sc.parallelize(Seq(("test", "blah")))
> })
>

Is this the code you are actually using? "var sc.parallelize(...)" doesn't
really look like valid Scala to me.

Tobias


NonSerializable Exception in foreachRDD

2014-10-30 Thread Harold Nguyen
Hi all,

In Spark Streaming, when I do "foreachRDD" on my DStreams, I get a
NonSerializable exception when I try to do something like:

DStream.foreachRDD( rdd => {
  var sc.parallelize(Seq(("test", "blah")))
})

Is there any way around that?

Thanks,

Harold