Hello Spark fellows :) I'm a new user of Spark and Scala and have been using both for 6 months without too many problems. Here I'm looking for best practices for using non-serializable classes inside closure. I'm using Spark-0.9.0-incubating here with Hadoop 2.2.
Suppose I am using OpenCSV parser to parse an input file. So inside my main : val sc = new SparkContext("local[2]", "App") val heyRDD = sc.textFile("...") val csvparser = new CSVParser(';') val heyMap = heyRDD.map { line => val temp = csvparser.parseLine(line) (temp(1), temp(4)) } This gives me a java.io.NotSerializableException: au.com.bytecode.opencsv.CSVParser, which seems reasonable. >From here I could see 3 solutions : 1/ Extending CSVParser with Serialisable properties, which adds a lot of boilerplate code if you ask me 2/ Using Kryo Serialization (still need to define a serializer) 3/ Creating an object with an instance of the class I want to use, typically : object CSVParserPlus { val csvParser = new CSVParser(';') def parse(line: String) = { csvParser.parseLine(line) } } val heyMap = heyRDD.map { line => val temp = CSVParserPlus.parse(line) (temp(1), temp(4)) } Third solution works and I don't get how, so I was wondering how worked the closure system inside Spark to be able to serialize an object with a non-serializable instance. How does that work ? Does it hinder performance ? Is it a good solution ? How do you manage this problem ? Any input would be greatly appreciated Best regards, Fanilo ________________________________ Ce message et les pi?ces jointes sont confidentiels et r?serv?s ? l'usage exclusif de ses destinataires. Il peut ?galement ?tre prot?g? par le secret professionnel. Si vous recevez ce message par erreur, merci d'en avertir imm?diatement l'exp?diteur et de le d?truire. L'int?grit? du message ne pouvant ?tre assur?e sur Internet, la responsabilit? de Worldline ne pourra ?tre recherch?e quant au contenu de ce message. Bien que les meilleurs efforts soient faits pour maintenir cette transmission exempte de tout virus, l'exp?diteur ne donne aucune garantie ? cet ?gard et sa responsabilit? ne saurait ?tre recherch?e pour tout dommage r?sultant d'un virus transmis. This e-mail and the documents attached are confidential and intended solely for the addressee; it may also be privileged. If you receive this e-mail in error, please notify the sender immediately and destroy it. As its integrity cannot be secured on the Internet, the Worldline liability cannot be triggered for the message content. Although the sender endeavours to maintain a computer virus-free network, the sender does not warrant that this transmission is virus-free and will not be liable for any damages resulting from any virus transmitted.