This is a cross post of a StackOverflow question that has not been answered.
I have just started a 50pt bounty on it there:
http://stackoverflow.com/questions/30744294/error-using-json4s-with-apache-spark-in-spark-shell
You can answer it there if you prefer.


I am trying to use the case class extraction feature of json4s in Spark,
i.e. calling `jvalue.extract[MyCaseClass]`. It works fine if I bring the
`JValue` objects into the master and do the extraction there, but the same
calls fail in the workers:

    import org.json4s._
    import org.json4s.jackson.JsonMethods._
    import scala.util.{Try, Success, Failure}

    val sqx = sqlContext

    val data = sc.textFile(inpath).coalesce(2000)

    case class PageView(
      client: Option[String]
    )

    def extract(json: JValue) = {
      implicit def formats = org.json4s.DefaultFormats
      Try(json.extract[PageView]).toOption
    }

    val json = data.map(parse(_)).sample(false, 1e-6).cache()

    // count initial inputs
    val raw = json.count

    // count successful extractions locally -- same value as above
    val loc = json.toLocalIterator.flatMap(extract).size

    // distributed count -- always returns zero
    val dist = json.flatMap(extract).count

    // this throws "org.json4s.package$MappingException: Parsed JSON values do not match with class constructor"
    json.map(x => { implicit def formats = org.json4s.DefaultFormats; x.extract[PageView] }).count

The implicit for `Formats` is defined locally in the `extract` function
because `DefaultFormats` is not serializable; defining it at top level
caused it to be serialized for transmission to the workers rather than
constructed there. I think the problem still has something to do with the
remote initialization of `DefaultFormats`, but I am not sure what it is.
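
For clarity, the top-level variant that ran into the serialization problem
looked roughly like this (reconstructed from memory, not a verbatim copy;
the helper name is just for illustration):

    // Reconstructed sketch: with the implicit at top level, the closure captures it,
    // so Spark tries to ship DefaultFormats to the workers instead of letting each
    // worker construct its own instance.
    implicit val formats = org.json4s.DefaultFormats

    def extractWithOuterFormats(json: JValue) =  // hypothetical name, for illustration
      Try(json.extract[PageView]).toOption       // picks up the top-level implicit

    json.flatMap(extractWithOuterFormats).count  // this is the variant that failed to serialize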

When I call the `extract` method directly on the `JValue`, instead of going
through my `extract` function, as in the last example, it no longer
complains about serialization but just throws an error that the JSON does
not match the expected structure.

How can I get the extraction to work when distributed to the workers?

Wesley Miao has reproduced the problem and found that it is specific to
spark-shell. He reports that this code works as a standalone application.
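
For reference, a minimal sketch of what that standalone packaging looks
like (the object name, app name and argument handling are my own
placeholders, not Wesley's actual code):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.json4s._
    import org.json4s.jackson.JsonMethods._
    import scala.util.Try

    object ExtractApp {
      case class PageView(client: Option[String])

      def extract(json: JValue): Option[PageView] = {
        // constructed on whichever JVM runs this, master or worker
        implicit val formats = DefaultFormats
        Try(json.extract[PageView]).toOption
      }

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("json4s-extract-test"))
        val json = sc.textFile(args(0)).map(parse(_)).cache()
        // reportedly returns the expected count when run via spark-submit,
        // unlike the same flatMap in spark-shell
        println(json.flatMap(extract).count)
        sc.stop()
      }
    }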

thanks
Daniel Mahler