Hello Spark fellows :)

I'm a new user of Spark and Scala and have been using both for 6 months without 
too many problems.
Here I'm looking for best practices for using non-serializable classes inside 
closure. I'm using Spark-0.9.0-incubating here with Hadoop 2.2.

Suppose I am using OpenCSV parser to parse an input file. So inside my main :

val sc = new SparkContext("local[2]", "App")
val heyRDD = sc.textFile("...")

val csvparser = new CSVParser(';')
val heyMap = heyRDD.map { line =>
      val temp = csvparser.parseLine(line)
      (temp(1), temp(4))
}


This gives me a java.io.NotSerializableException: 
au.com.bytecode.opencsv.CSVParser, which seems reasonable.

>From here I could see 3 solutions :
1/ Extending CSVParser with Serialisable properties, which adds a lot of 
boilerplate code if you ask me
2/ Using Kryo Serialization (still need to define a serializer)
3/ Creating an object with an instance of the class I want to use, typically :

object CSVParserPlus {

  val csvParser = new CSVParser(';')

  def parse(line: String) = {
    csvParser.parseLine(line)
  }
}


    val heyMap = heyRDD.map { line =>
      val temp = CSVParserPlus.parse(line)
      (temp(1), temp(4))
    }

Third solution works and I don't get how, so I was wondering how worked the 
closure system inside Spark to be able to serialize an object with a 
non-serializable instance. How does that work ? Does it hinder performance ? Is 
it a good solution ? How do you manage this problem ?

Any input would be greatly appreciated

Best regards,
Fanilo

________________________________

Ce message et les pi?ces jointes sont confidentiels et r?serv?s ? l'usage 
exclusif de ses destinataires. Il peut ?galement ?tre prot?g? par le secret 
professionnel. Si vous recevez ce message par erreur, merci d'en avertir 
imm?diatement l'exp?diteur et de le d?truire. L'int?grit? du message ne pouvant 
?tre assur?e sur Internet, la responsabilit? de Worldline ne pourra ?tre 
recherch?e quant au contenu de ce message. Bien que les meilleurs efforts 
soient faits pour maintenir cette transmission exempte de tout virus, 
l'exp?diteur ne donne aucune garantie ? cet ?gard et sa responsabilit? ne 
saurait ?tre recherch?e pour tout dommage r?sultant d'un virus transmis.

This e-mail and the documents attached are confidential and intended solely for 
the addressee; it may also be privileged. If you receive this e-mail in error, 
please notify the sender immediately and destroy it. As its integrity cannot be 
secured on the Internet, the Worldline liability cannot be triggered for the 
message content. Although the sender endeavours to maintain a computer 
virus-free network, the sender does not warrant that this transmission is 
virus-free and will not be liable for any damages resulting from any virus 
transmitted.

Reply via email to