Object serialisation inside closures

2014-09-04 Thread Andrianasolo Fanilo
Hello Spark fellows :) I'm a new user of Spark and Scala and have been using both for 6 months without too many problems. Here I'm looking for best practices for using non-serializable classes inside closures. I'm using Spark 0.9.0-incubating with Hadoop 2.2. Suppose I am using OpenCSV
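The original message is truncated in this archive, but the failure mode it describes is the classic one. A minimal sketch of the kind of code that triggers it, assuming OpenCSV's `CSVParser` (which does not implement `java.io.Serializable`) and a hypothetical `data.csv` input:

```scala
import au.com.bytecode.opencsv.CSVParser
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("csv-demo"))

// The parser is created on the driver, then captured by the closure below.
val parser = new CSVParser(',')

// Spark must serialise the closure (and everything it references) to ship it
// to the executors, so this fails with a NotSerializableException for CSVParser.
val broken = sc.textFile("data.csv").map(line => parser.parseLine(line))
```

This is a sketch, not the poster's actual code; the point is only that any non-serializable object referenced from inside the closure gets dragged into serialisation.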

Re: Object serialisation inside closures

2014-09-04 Thread Sean Owen
In your original version, the object is referenced by the function but it's on the driver, and so has to be serialized. This leads to an error since it's not serializable. Instead, you want to recreate the object locally on each of the remote machines. In your third version you are holding the
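The fix Sean describes, recreating the object on the remote machines rather than serialising it from the driver, is usually done with `mapPartitions`, so the object is constructed once per partition instead of once per record. A sketch under the same assumptions as above:

```scala
import au.com.bytecode.opencsv.CSVParser

// The parser is constructed *inside* the closure, so it is created on the
// worker JVM for each partition and never has to be serialised at all.
val fixed = sc.textFile("data.csv").mapPartitions { lines =>
  val parser = new CSVParser(',')
  lines.map(line => parser.parseLine(line))
}
```

Constructing it inside a plain `map` would also avoid the serialisation error, but would pay the construction cost on every line; `mapPartitions` amortises it across a partition.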

Re: Object serialisation inside closures

2014-09-04 Thread Yana Kadiyska
In the third case the object does not get shipped around. Each executor will create its own instance. I got bitten by this here: http://apache-spark-user-list.1001560.n3.nabble.com/Help-with-object-access-from-mapper-simple-question-tt8125.html On Thu, Sep 4, 2014 at 9:29 AM, Andrianasolo
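The "third version" from the original post isn't shown in this archive, but the per-executor behaviour Yana describes is what you get when the instance lives in a Scala singleton `object`: singletons are initialised independently in each executor JVM, not serialised from the driver. A hypothetical sketch:

```scala
import au.com.bytecode.opencsv.CSVParser

// A singleton object is initialised lazily, once per JVM. Each executor
// therefore builds its own parser; nothing is shipped from the driver.
object Parsers {
  lazy val parser = new CSVParser(',')
}

val perExecutor = sc.textFile("data.csv")
  .map(line => Parsers.parser.parseLine(line))
```

The flip side, as the linked thread and the follow-up below note, is that the instance is shared by all tasks in an executor, so it must be thread-safe, and any state it accumulates stays local to that executor.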

RE: Object serialisation inside closures

2014-09-04 Thread Andrianasolo Fanilo
data within an executor sadly... Thanks for the input Fanilo -----Original Message----- From: Sean Owen [mailto:so...@cloudera.com] Sent: Thursday, September 4, 2014 15:36 To: Andrianasolo Fanilo Cc: user@spark.apache.org Subject: Re: Object serialisation inside closures In your original version

Re: Object serialisation inside closures

2014-09-04 Thread Mohit Jaggi
: Object serialisation inside closures In your original version, the object is referenced by the function but it's on the driver, and so has to be serialized. This leads to an error since it's not serializable. Instead, you want to recreate the object locally on each of the remote machines