How to distribute non-serializable object in transform task or broadcast ?

2015-08-07 Thread Hao Ren
Is there any workaround to distribute a non-serializable object for an RDD transformation or a broadcast variable? Say I have an object of class C which is not serializable. Class C is in a jar package, and I have no control over it. Now I need to distribute it either by RDD transformation or by broadcast.

Re: How to distribute non-serializable object in transform task or broadcast ?

2015-08-07 Thread Philip Weaver
If the object cannot be serialized, then I don't think broadcast will make it magically serializable. You can't transfer data structures between nodes without serializing them somehow.
On Fri, Aug 7, 2015 at 7:31 AM, Sujit Pal sujitatgt...@gmail.com wrote:
> Hi Hao, I think sc.broadcast will
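
The point about serialization can be checked outside Spark. The following is only a minimal sketch using plain Java serialization; `ThirdPartyC` is a hypothetical stand-in for the class C from the question, and `trySerialize` is a helper invented for this illustration, not a Spark API.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for class C: note that it does not extend Serializable.
class ThirdPartyC(val name: String)

object NotSerializableDemo {
  // Tries to write `value` with Java serialization.
  // Returns the number of bytes written, or the exception message on failure.
  def trySerialize(value: AnyRef): Either[String, Int] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try {
      out.writeObject(value)
      out.flush()
      Right(buffer.size())
    } catch {
      case e: NotSerializableException => Left(String.valueOf(e.getMessage))
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    println(trySerialize("a plain string"))       // Right(...): String is Serializable
    println(trySerialize(new ThirdPartyC("obj"))) // Left(...): NotSerializableException
  }
}
```

Anything that has to cross a node boundary goes through a step like `writeObject`, so an object of such a class fails in the same way regardless of whether it is captured in a closure or wrapped in a broadcast.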

Re: How to distribute non-serializable object in transform task or broadcast ?

2015-08-07 Thread Han JU
If the object is something like a utility object (say a DB connection handler), I often use: @transient lazy val someObj = MyFactory.getObj(...) So basically `@transient` marks the field so that it is not serialized, and the `lazy val` allows it to be initialized on each executor upon its
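
A minimal sketch of the `@transient lazy val` pattern, runnable without Spark: it round-trips a holder through Java serialization, which is roughly what happens when a closure is shipped to an executor. `LegacyClient`, `ClientHolder`, and `roundTrip` are hypothetical names made up for this illustration; `LegacyClient` plays the role of the non-serializable class.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical non-serializable class (it does not extend Serializable).
class LegacyClient(val endpoint: String) {
  def lookup(key: String): String = s"$endpoint/$key"
}

// Serializable holder: only the `endpoint` string is written out.
class ClientHolder(endpoint: String) extends Serializable {
  // Skipped by Java serialization and rebuilt on first access after deserialization.
  @transient lazy val client: LegacyClient = new LegacyClient(endpoint)
}

object TransientLazyDemo {
  // Serializes and deserializes a value in memory.
  def roundTrip[T <: AnyRef](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val holder = new ClientHolder("db://example")
    holder.client                      // force initialization on the "driver" side
    val copy = roundTrip(holder)       // succeeds because `client` is transient
    println(copy.client.lookup("k1"))  // prints "db://example/k1"
  }
}
```

The copy builds its own `LegacyClient` the first time `client` is accessed, so the non-serializable instance itself never has to be transferred. This only works when the object can be reconstructed from serializable inputs (here, the `endpoint` string).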

Re: How to distribute non-serializable object in transform task or broadcast ?

2015-08-07 Thread Eugene Morozov
Hao, I’d say there are a few possible ways to achieve that: 1. Use KryoSerializer. The flaw of KryoSerializer is that the current version (2.21) has an issue with internal state and it might not work for some objects. Spark gets the Kryo dependency transitively through chill and it’ll not be resolved