Re: Do all classes involving RDD operation need to be registered?

2014-03-29 Thread anny9699
Thanks so much Sonal! I am much clearer now.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Do-all-classes-involving-RDD-operation-need-to-be-registered-tp3439p3472.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Do all classes involving RDD operation need to be registered?

2014-03-29 Thread Sonal Goyal
From my limited knowledge, all classes involved in RDD operations
should extend Serializable if you want Java serialization (the default).
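
A minimal sketch of the Java-serialization point, using only the JDK and no
Spark. PlainScaler and SerializableScaler are hypothetical classes invented
for this illustration. When a closure passed to an RDD operation calls a
method on a class instance, that instance has to be serialized along with
the task, which is presumably why a class that does not extend Serializable
fails while a Scala object (which is not captured as an instance) works.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class with a constructor parameter; it does NOT implement Serializable.
class PlainScaler {
    final int factor;
    PlainScaler(int factor) { this.factor = factor; }
    int scale(int x) { return x * factor; }
}

// The same class, but marked Serializable.
class SerializableScaler implements Serializable {
    private static final long serialVersionUID = 1L;
    final int factor;
    SerializableScaler(int factor) { this.factor = factor; }
    int scale(int x) { return x * factor; }
}

class JavaSerializationCheck {
    // Returns true if plain Java serialization accepts the object,
    // false if it rejects it with NotSerializableException.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("PlainScaler serializable: " + canSerialize(new PlainScaler(2)));
        System.out.println("SerializableScaler serializable: " + canSerialize(new SerializableScaler(2)));
    }
}
```

This only exercises java.io serialization directly; inside Spark the same
failure would surface wrapped in a task-serialization error rather than as a
bare NotSerializableException.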

However, if you want Kryo serialization, you can call
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
If you also want custom serialization, for example to set or compute some
variables differently during deserialization, you would create a custom
registrator, register your classes with it, and call
conf.set("spark.kryo.registrator", "mypkg.MyKryoRegistrator");
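
A small sketch of the same two settings, written against the JDK only so it
runs without Spark on the classpath. It relies on the assumption that the
SparkConf is created with its default constructor, which, as far as I know,
also picks up any spark.* Java system properties; the properties would have
to be set before the SparkConf is built. KryoProperties and enableKryo are
hypothetical names, and mypkg.MyKryoRegistrator is the class name from the
message above. The comment shows roughly what such a registrator looks like;
MyClass is a placeholder.

```java
class KryoProperties {
    // Sets the two Spark properties discussed above as Java system properties.
    // Assumption: a default-constructed SparkConf reads spark.* system properties.
    static void enableKryo(String registratorClass) {
        System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        System.setProperty("spark.kryo.registrator", registratorClass);
    }

    // Rough shape of the registrator itself (needs Spark and Kryo to compile,
    // so it is left as a comment here):
    //
    //   public class MyKryoRegistrator implements org.apache.spark.serializer.KryoRegistrator {
    //       public void registerClasses(com.esotericsoftware.kryo.Kryo kryo) {
    //           kryo.register(MyClass.class);   // MyClass is hypothetical
    //       }
    //   }

    public static void main(String[] args) {
        enableKryo("mypkg.MyKryoRegistrator");
        System.out.println(System.getProperty("spark.serializer"));
        System.out.println(System.getProperty("spark.kryo.registrator"));
    }
}
```

Calling conf.set(...) directly, as above, has the same effect and keeps the
settings local to one SparkConf instead of the whole JVM.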

If I am missing something, please feel free to correct me.

Best Regards,
Sonal
Nube Technologies 








Re: Do all classes involving RDD operation need to be registered?

2014-03-28 Thread anny9699
Thanks a lot Ognen!

It's not a fancy class that I wrote, and now I realize that I neither extended
Serializable nor registered it with Kryo, and that's why it is not working.





Re: Do all classes involving RDD operation need to be registered?

2014-03-28 Thread Ognen Duzlevski
There is also this quote from the Tuning guide
(http://spark.incubator.apache.org/docs/latest/tuning.html):
"Finally, if you don't register your classes, Kryo will still work, but
it will have to store the full class name with each object, which is
wasteful."


It implies that you don't really have to register your classes with 
Kryo. However, what kind of waste are we talking about? :)
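
A back-of-the-envelope sketch of the waste in question. This is not Kryo's
actual wire format, only arithmetic on the idea in the quote: a full class
name written with each object versus a short numeric id for a registered
class. The class name mypkg.MyRecord, the 2-byte id size, and the record
count are all assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

class ClassNameOverhead {
    // Bytes needed to spell out a class name once, encoded as UTF-8.
    static int nameBytes(String className) {
        return className.getBytes(StandardCharsets.UTF_8).length;
    }

    // Rough extra bytes over `records` objects if each one carries its full
    // class name instead of an id occupying `idBytes` bytes (an assumed size).
    static long extraBytes(String className, int idBytes, long records) {
        return (nameBytes(className) - idBytes) * records;
    }

    public static void main(String[] args) {
        String name = "mypkg.MyRecord";   // hypothetical class name
        long records = 1_000_000L;        // hypothetical number of objects
        System.out.println("name bytes per object: " + nameBytes(name));
        System.out.println("extra bytes overall:   " + extraBytes(name, 2, records));
    }
}
```

Under these assumptions the overhead grows linearly with the number of
objects and with the length of the package and class name, so it matters
most for many small records.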

Ognen





Re: Do all classes involving RDD operation need to be registered?

2014-03-28 Thread Debasish Das
Classes are serialized and sent to all the workers as Akka messages.

For singletons and case classes, I am not sure whether they are
Java-serialized or Kryo-serialized by default.

But your own classes will definitely be much more efficient if serialized
with Kryo. There is a comparison that Matei did of all the serialization
options, and Kryo was the fastest at that time.

> Hi,
>
> I am sorry if this has been asked before. I found that if I wrap some
> methods in a class with parameters, Spark throws a "Task not serializable"
> exception; however, if they are wrapped in an object or a case class
> without parameters, it works fine. Is it true that all classes involved in
> RDD operations should be registered so that SparkContext can recognize them?
>
> Thanks a lot!


