We have gotten this to work, but it requires instantiating the CoreNLP object on the worker side. Because of the initialization cost, it makes a lot of sense to do this inside a .mapPartitions rather than a .map, for example.
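For what it's worth, the shape of that pattern looks roughly like the sketch below. ExpensivePipeline and process are hypothetical stand-ins, not real CoreNLP or Spark calls; with Spark the same body would sit inside rdd.mapPartitions { iter => ... }.

```scala
// Minimal sketch of per-partition initialization. ExpensivePipeline is a
// stand-in for StanfordCoreNLP, which is costly to construct and not
// serializable, so it must be built on the worker rather than shipped.
object PerPartitionInit {

  // Stand-in for `new StanfordCoreNLP(props)`.
  class ExpensivePipeline {
    def annotate(text: String): String = text.toUpperCase
  }

  // With Spark this would be the function passed to rdd.mapPartitions.
  def process(partition: Iterator[String]): Iterator[String] = {
    val pipeline = new ExpensivePipeline // constructed once per partition
    partition.map(pipeline.annotate)     // reused for every record in it
  }

  def main(args: Array[String]): Unit =
    process(Iterator("hello", "world")).foreach(println)
}
```

The point is simply that the expensive constructor runs once per partition on the worker, instead of once per record (as it would inside a .map) or once on the driver (where it would have to be serialized).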
As an aside, if you're using it from Scala, have a look at sistanlp, which provides a nicer, Scala-friendly interface to CoreNLP.

> On Nov 24, 2014, at 7:46 AM, tvas <theodoros.vasilou...@gmail.com> wrote:
>
> Hello,
>
> I was wondering if anyone has gotten the Stanford CoreNLP Java library to
> work with Spark.
>
> My attempts to use the parser/annotator fail with task serialization
> errors, since the class StanfordCoreNLP cannot be serialized.
>
> I've tried the remedies of registering StanfordCoreNLP through Kryo, as well
> as using chill.MeatLocker, but these still produce serialization errors.
> Marking the StanfordCoreNLP object as transient leads to a
> NullPointerException instead.
>
> Has anybody managed to get this to work?
>
> Regards,
> Theodore
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-and-Stanford-CoreNLP-tp19654.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> ---------------------------------------------------------------------