I'm getting low performance while parsing JSON data. My cluster runs Spark 1.2.0 with 10 nodes, each with 15 GB of memory and 4 cores.
I tried both scala.util.parsing.json.JSON and FasterXML's Jackson parser. This is basically what I do:

// Approach 1: Jackson with the Scala module
val jsonStream = myDStream.map { x =>
  val mapper = new ObjectMapper() with ScalaObjectMapper
  mapper.registerModule(DefaultScalaModule)
  mapper.readValue[Map[String, Any]](x)
}
jsonStream.count().print()

// Approach 2: scala.util.parsing.json
val jsonStream2 = myDStream.map(
  JSON.parseFull(_).get.asInstanceOf[scala.collection.immutable.Map[String, Any]])
jsonStream2.count().print()

It takes around 15-20 seconds to process/parse the 35k JSON documents (containing nested documents and arrays) that I put into the stream. Is there a better approach/parser to process them faster? I also tried mapPartitions, but it did not make any difference.

Thanks
Best Regards
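For what it's worth, the gain from mapPartitions normally comes from constructing the expensive ObjectMapper once per partition instead of once per record (Approach 1 above builds a fresh mapper for every document); if the mapper is still constructed inside the per-record loop, that amortization never happens. A minimal plain-Scala sketch of the difference, with a hypothetical CostlyParser standing in for ObjectMapper:

```scala
// Sketch of why mapPartitions is usually suggested for Jackson: build the
// expensive parser once per partition, not once per record. In Spark the
// pattern would be roughly:
//   myDStream.mapPartitions { iter =>
//     val mapper = new ObjectMapper() with ScalaObjectMapper
//     mapper.registerModule(DefaultScalaModule)
//     iter.map(mapper.readValue[Map[String, Any]](_))
//   }
// Below, CostlyParser is a hypothetical stand-in for ObjectMapper, and
// `constructions` counts how many times it gets built.
object MapPartitionsSketch {
  private var constructions = 0

  class CostlyParser {
    constructions += 1                      // runs once per instantiation
    def parse(s: String): Int = s.length    // stand-in for readValue
  }

  /** Returns (parsers built map-style, parsers built mapPartitions-style). */
  def run(): (Int, Int) = {
    val partitions = Seq(Seq("a", "bb"), Seq("ccc", "dddd"))

    constructions = 0
    // map-style: one parser per record
    partitions.foreach(_.foreach(r => new CostlyParser().parse(r)))
    val perRecord = constructions

    constructions = 0
    // mapPartitions-style: one parser per partition, reused for every record
    partitions.foreach { part =>
      val parser = new CostlyParser()
      part.foreach(parser.parse)
    }
    (perRecord, constructions)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

With 2 partitions of 2 records each, the map-style path builds 4 parsers and the mapPartitions-style path builds 2; on 35k documents the per-record construction cost dominates in the same way.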