Indeed, serialization is always tricky when you want to work with objects
that are more sophisticated than simple POJOs, and you can sometimes get
unexpected behaviour when using the deserialized objects. In my case I had
trouble when serializing/deserializing Avro specific records with lists:
the java.util.List implementation used by Avro does not have a default
no-arg constructor and has initialization logic inside its constructors.


The best way to go (IMO) when you need some:
 - var: make a copy of it inside the function that forms the closure
 - function to use in your closure: define it in some stateless dummy
class that implements Serializable
 - another trick with vars is to declare them lazy, so they are created
inside the closure and the closure won't hold a reference to the outer
class (but you might get other surprises...)


2014-04-18 10:37 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:

> Ok thanks. However it turns out that there's a problem with that, and it's
> not so safe to use Kryo serialization with Spark:
>
> Exception in thread "Executor task launch worker-0"
> java.lang.NullPointerException
>  at
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1$$anonfun$6.apply(Executor.scala:267)
> at
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1$$anonfun$6.apply(Executor.scala:267)
>
> This error is reported also at
> http://mail-archives.apache.org/mod_mbox/spark-user/201312.mbox/%3CCAPud8Tq7fK5j2Up9dDdRQ=y1efwidjnmqc55o9jm5dh7rpd...@mail.gmail.com%3E
> .
>
>
> On Fri, Apr 18, 2014 at 10:31 AM, Eugen Cepoi <cepoi.eu...@gmail.com> wrote:
>
>> Because it happens to reference something outside the closure's scope that
>> will reference some other objects (that you don't need) and so on,
>> resulting in a lot of things you don't want being serialized along with
>> your task. But sure, it is debatable and it's more my personal opinion.
>>
>>
>> 2014-04-17 23:28 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>
>> Thanks again Eugen! I don't get the point... why do you prefer to avoid
>>> Kryo serialization for closures? Is there any problem with that?
>>>  On Apr 17, 2014 11:10 PM, "Eugen Cepoi" <cepoi.eu...@gmail.com> wrote:
>>>
>>>> You have two kinds of serialization: data and closures. They both use
>>>> Java serialization by default. This means that if in your function you
>>>> reference an object outside of it, that object gets serialized with your
>>>> task. To enable Kryo serialization for closures, set the
>>>> spark.closure.serializer property. But usually I don't, as the default
>>>> lets me detect such unwanted references.
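Concretely, the properties involved might look like this (a sketch assuming Spark 0.9.x property names; per the advice above you would normally set only the first and leave the closure serializer on its Java default):

```properties
# Kryo for data serialization (the usual recommendation)
spark.serializer            org.apache.spark.serializer.KryoSerializer
# Kryo for closures too (Eugen advises leaving this at the Java default)
spark.closure.serializer    org.apache.spark.serializer.KryoSerializer
```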
>>>> On Apr 17, 2014 10:17 PM, "Flavio Pompermaier" <pomperma...@okkam.it>
>>>> wrote:
>>>>
>>>>> Now I have another problem... I have to pass one of these non-serializable
>>>>> objects to a PairFunction, and I received another non-serializable
>>>>> exception... it seems that Kryo doesn't work within Functions. Am I wrong,
>>>>> or is this a limitation of Spark?
>>>>> On Apr 15, 2014 1:36 PM, "Flavio Pompermaier" <pomperma...@okkam.it>
>>>>> wrote:
>>>>>
>>>>>> Ok thanks for the help!
>>>>>>
>>>>>> Best,
>>>>>> Flavio
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 15, 2014 at 12:43 AM, Eugen Cepoi
>>>>>> <cepoi.eu...@gmail.com> wrote:
>>>>>>
>>>>>>> Nope, those operations are lazy, meaning they will create the RDDs but
>>>>>>> won't trigger any "action". The computation is launched by operations
>>>>>>> such as collect, count, save to HDFS etc. And even if they were not
>>>>>>> lazy, no serialization would happen. Serialization occurs only when
>>>>>>> data is transferred (collect, shuffle, maybe persist to disk - but I
>>>>>>> am not sure about this one).
>>>>>>>
>>>>>>>
>>>>>>> 2014-04-15 0:34 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>>>>>>>
>>>>>>> Ok, that's fair enough. But why do things work up to the collect?
>>>>>>>> During map and filter, are objects not serialized?
>>>>>>>>  On Apr 15, 2014 12:31 AM, "Eugen Cepoi" <cepoi.eu...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Sure. As you have pointed out, those classes don't implement
>>>>>>>>> Serializable, and Spark uses Java serialization by default (when you
>>>>>>>>> do collect, the data from the workers will be serialized, "collected"
>>>>>>>>> by the driver and then deserialized on the driver side). Kryo (like
>>>>>>>>> most other decent serialization libs) doesn't require you to
>>>>>>>>> implement Serializable.
>>>>>>>>>
>>>>>>>>> As for the missing attributes, it's because Java serialization does
>>>>>>>>> not ser/deser attributes of classes that don't implement
>>>>>>>>> Serializable (in your case the parent classes).
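A minimal plain-Java sketch of this behaviour (hypothetical `Parent`/`Child` names): the non-Serializable parent's state is not written to the stream; on deserialization it is reinitialized by the parent's no-arg constructor, which is why the inherited attributes come back empty.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerDemo {
    // Non-serializable parent, like the classes that cannot be modified.
    static class Parent {
        String label;              // NOT written to the stream
        Parent() { label = null; } // re-run on deserialization instead
    }

    // The "dummy" Serializable subclass from the workaround.
    static class Child extends Parent implements Serializable {
        int id;                    // written to the stream normally
        Child(int id, String label) { this.id = id; this.label = label; }
    }

    // Serialize to bytes and back, as a collect would do across the wire.
    static Child roundTrip(Child c) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(c);
        return (Child) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
    }

    public static void main(String[] args) throws Exception {
        Child back = roundTrip(new Child(42, "filled"));
        System.out.println(back.id);    // 42   - the child's own field survives
        System.out.println(back.label); // null - the parent's state is gone
    }
}
```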
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-04-14 23:17 GMT+02:00 Flavio Pompermaier <
>>>>>>>>> pomperma...@okkam.it>:
>>>>>>>>>
>>>>>>>>>> Thanks Eugen for the reply. Could you explain why I have the
>>>>>>>>>> problem? Why doesn't my serialization work?
>>>>>>>>>> On Apr 14, 2014 6:40 PM, "Eugen Cepoi" <cepoi.eu...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> as an easy workaround you can enable Kryo serialization:
>>>>>>>>>>> http://spark.apache.org/docs/latest/configuration.html
>>>>>>>>>>>
>>>>>>>>>>> Eugen
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2014-04-14 18:21 GMT+02:00 Flavio Pompermaier <
>>>>>>>>>>> pomperma...@okkam.it>:
>>>>>>>>>>>
>>>>>>>>>>>> Hi to all,
>>>>>>>>>>>>
>>>>>>>>>>>> in my application I read objects that are not serializable
>>>>>>>>>>>> because I cannot modify the sources.
>>>>>>>>>>>> So I tried to do a workaround creating a dummy class that
>>>>>>>>>>>> extends the unmodifiable one but implements serializable.
>>>>>>>>>>>> All attributes of the parent class are Lists of objects (some
>>>>>>>>>>>> of them still not serializable, and some of them serializable,
>>>>>>>>>>>> e.g. List<String>).
>>>>>>>>>>>>
>>>>>>>>>>>> As long as I only do map and filter on the RDD, the objects are
>>>>>>>>>>>> filled correctly (I checked that via the Eclipse debugger), but
>>>>>>>>>>>> when I do collect, all the attributes of my objects are empty.
>>>>>>>>>>>> Could you help me please?
>>>>>>>>>>>> I'm using spark-core_2.10, version 0.9.0-incubating.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Flavio
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>
>
>
> --
>
> Flavio Pompermaier
>
> *Development Department*
> *OKKAM**Srl **- www.okkam.it <http://www.okkam.it/>*
>
> *Phone:* +(39) 0461 283 702
> *Fax:* + (39) 0461 186 6433
> *Email:* pomperma...@okkam.it
> *Headquarters:* Trento (Italy), fraz. Villazzano, Salita dei Molini 2
> *Registered office:* Trento (Italy), via Segantini 23
>
>
>
