Because it happens to reference something outside the closure's scope that
in turn references other objects (that you don't need), and so on,
resulting in a lot of things you don't want being serialized along with
your task. But sure, it's debatable and more my personal opinion.
2014-04-17 23:28
Ok, thanks. However, it turns out that there's a problem with that, and it's
not so safe to use Kryo serialization with Spark:
Exception in thread Executor task launch worker-0
java.lang.NullPointerException
at
Indeed, serialization is always tricky when you want to work on objects
that are more sophisticated than simple POJOs.
And you can sometimes get unexpected behaviour when using the deserialized
objects. In my case I had trouble when serializing/deserializing Avro
specific records with lists. The
You have two kinds of serialization: data and closures. Both use Java
serialization by default. This means that if in your function you reference
an object outside of it, that object gets serialized with your task. To
enable Kryo serialization for closures, set the spark.closure.serializer
property. But usually I don't, as the default allows me to detect such
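The closure-capture problem Eugen describes can be illustrated without Spark, using plain Java serialization. The `Heavy`, `ClosureCapture`, and `trySerialize` names below are illustrative stand-ins, not Spark API; the same capture mechanics are what bite inside a Spark task:

```java
import java.io.*;

public class ClosureCapture {
    // Stand-in for "some other object you don't need": not Serializable.
    static class Heavy { }

    Heavy heavy = new Heavy();
    String label = "task";

    // A Runnable that is also Serializable, like a Spark closure.
    interface SerializableRunnable extends Runnable, Serializable { }

    // Returns the serialized bytes, or null if the object graph
    // dragged in something non-serializable.
    byte[] trySerialize(Object o) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
            return bos.toByteArray();
        } catch (NotSerializableException e) {
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClosureCapture outer = new ClosureCapture();

        // Referencing outer.label captures the whole enclosing object,
        // including the non-serializable Heavy field -> serialization fails.
        SerializableRunnable bad = () -> System.out.println(outer.label);
        System.out.println(outer.trySerialize(bad) == null);   // true

        // Copying the needed value into a local first keeps the closure
        // small: only the String is captured.
        final String label = outer.label;
        SerializableRunnable good = () -> System.out.println(label);
        System.out.println(outer.trySerialize(good) != null);  // true
    }
}
```

The fix shown (copying the one field you need into a local variable before the closure) is the usual way to avoid shipping the whole enclosing object with a task.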
Hi to all,
in my application I read objects that are not serializable, and I cannot
modify their sources.
So I tried a workaround: creating a dummy class that extends the
unmodifiable one but implements Serializable.
All attributes of the parent class are Lists of objects (some of them are
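As a sketch of why this workaround is risky (the classes below are illustrative, not the actual ones from the question): in Java serialization, fields declared in a non-serializable superclass are not written at all, and on deserialization the superclass's no-arg constructor runs instead. The parent's Lists can therefore come back null, which would line up with the NullPointerException reported later in the thread:

```java
import java.io.*;
import java.util.*;

public class SubclassTrap {
    // Stand-in for the unmodifiable third-party class: NOT Serializable.
    static class Parent {
        List<String> items;                        // set only by the arg ctor
        Parent() { }                               // rerun on deserialization
        Parent(List<String> items) { this.items = items; }
    }

    // The workaround: a dummy subclass that merely adds Serializable.
    static class Dummy extends Parent implements Serializable {
        Dummy(List<String> items) { super(items); }
    }

    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(obj);
        oos.flush();
        ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()));
        return (T) in.readObject();
    }

    public static void main(String[] args) throws Exception {
        Dummy before = new Dummy(Arrays.asList("a", "b"));
        Dummy after = roundTrip(before);
        // Parent's fields are skipped on write; Parent() runs on read,
        // so the list silently comes back null -> later NullPointerException.
        System.out.println(before.items);  // [a, b]
        System.out.println(after.items);   // null
    }
}
```

Note that this only "works" at all because the parent has an accessible no-arg constructor; without one, deserialization fails outright with an InvalidClassException.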
Thanks Eugen for the reply. Could you explain why I have the problem? Why
doesn't my serialization work?
On Apr 14, 2014 6:40 PM, Eugen Cepoi cepoi.eu...@gmail.com wrote:
Hi,
as an easy workaround you can enable Kryo serialization
http://spark.apache.org/docs/latest/configuration.html
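For reference, with the Spark 1.x-era property name from the linked configuration page, enabling Kryo for data serialization is a single setting (sketch; shown in spark-defaults.conf form, the same key can also be set on a SparkConf):

```
spark.serializer    org.apache.spark.serializer.KryoSerializer
```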
Sure. As you have pointed out, those classes don't implement Serializable,
and Spark uses Java serialization by default (when you do collect, the data
from the workers will be serialized, collected by the driver and then
deserialized on the driver side). Kryo (as most other decent serialization
libs)
Ok, that's fair enough. But why do things work up to the collect? During
map and filter, are objects not serialized?
On Apr 15, 2014 12:31 AM, Eugen Cepoi cepoi.eu...@gmail.com wrote:
Sure. As you have pointed out, those classes don't implement Serializable,
and Spark uses Java serialization by default
Nope, those operations are lazy, meaning they will create the RDDs but won't
trigger any action. The computation is launched by operations such as
collect, count, save to HDFS, etc. And even if they were not lazy, no
serialization would happen. Serialization occurs only when data will be
transferred
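By analogy (Java streams here, not Spark, but the same lazy pattern): intermediate operations like map and filter only describe the pipeline, and nothing runs until a terminal operation, playing the role of Spark's collect or count, is invoked:

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.*;

public class LazyPipeline {
    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();

        // map/filter only describe the pipeline; nothing runs yet.
        Stream<Integer> pipeline = Stream.of(1, 2, 3, 4)
                .map(x -> { calls.incrementAndGet(); return x * 10; })
                .filter(x -> x > 15);

        System.out.println(calls.get());  // 0: no element processed yet

        // The terminal operation triggers the whole computation.
        List<Integer> result = pipeline.collect(Collectors.toList());
        System.out.println(calls.get());  // 4
        System.out.println(result);       // [20, 30, 40]
    }
}
```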