Sandy, On Mon, Nov 10, 2014 at 6:01 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote: > > The result array is 1867 x 5. It serialized is 80k bytes, which seems > about right: > scala> SparkEnv.get.closureSerializer.newInstance().serialize(arr) > res17: java.nio.ByteBuffer = java.nio.HeapByteBuffer[pos=0 lim=80027 > cap=80027] > > If I reference it from a simple function: > scala> def func(x: Long) => arr.length > scala> SparkEnv.get.closureSerializer.newInstance().serialize(func) > I get a NotSerializableException. >
(Are you in fact doing this from the Scala shell? Maybe try it from compiled code.) I may be wrong on that, but my understanding is that a "def" is a lot different from a "val" (holding a function) in terms of serialization. A "def" (such as yours defined above) that is a method of a class does not have a class file of its own, so it must be serialized with the whole object that it belongs to (which can fail easily if you reference a SparkContext etc.). A "val" (such as `val func: Long => Long = x => arr.length`) will be compiled into a class file of its own, so only that (pretty small) function object will have to be serialized. That may explain why some functions are serializable while others with the same content aren't, and also the difference in size. Tobias