Thanks Enno, let me have a look at the Stream Parser version of Jackson.
Thanks
Best Regards
On Sat, Feb 14, 2015 at 9:30 PM, Enno Shioji eshi...@gmail.com wrote:
Huh, that would come to 6.5 ms per JSON record. That does feel like a lot, but
if your JSON records are big enough, I guess you could get that sort of
processing time.
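(For reference, a minimal sketch of Jackson's streaming API, the "Stream
Parser" mentioned above; the sample JSON and field name are made up for
illustration:)

import com.fasterxml.jackson.core.{JsonFactory, JsonToken}

val factory = new JsonFactory()
val parser = factory.createParser("""{"name": "foo", "value": 42}""")
// Walk the token stream instead of building a full object tree.
while (parser.nextToken() != null) {
  if (parser.getCurrentToken == JsonToken.FIELD_NAME && parser.getCurrentName == "value") {
    parser.nextToken()
    println(parser.getIntValue)
  }
}
parser.close()

Because no intermediate tree is materialized, the streaming API generally has
the lowest per-record overhead of Jackson's APIs (streaming, tree, databind).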
I'm getting low performance while parsing JSON data. My cluster setup is
Spark 1.2.0 with 10 nodes, each having 15 GB of memory and 4 cores.
I tried both scala.util.parsing.json.JSON and fasterxml's Jackson parser.
This is basically what I do:
// Approach 1:
val jsonStream =
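(The code above is truncated in the archive. A minimal sketch of what the two
approaches presumably looked like; the input name "lines" and its type are
assumptions:)

import scala.util.parsing.json.JSON
import com.fasterxml.jackson.databind.ObjectMapper

// lines: an RDD[String] or DStream[String], one JSON record per element.

// Approach 1: Scala's built-in parser, returns Option[Any].
val parsed1 = lines.map(record => JSON.parseFull(record))

// Approach 2: Jackson's tree model.
// NB: referencing a driver-side ObjectMapper inside the closure is what
// triggers the serialization issue discussed later in this thread.
val mapper = new ObjectMapper()
val parsed2 = lines.map(record => mapper.readTree(record))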
I see. I'd really benchmark how the parsing performs outside Spark (in a
tight loop or something; see the sketch below). If *that* is slow, you know
it's the parsing. If not, it's not the parsing.
Another thing you want to look at is CPU usage. If the actual parsing
really is the bottleneck, you should see very high CPU utilization on the
worker nodes.
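(A minimal sketch of such a tight-loop benchmark; the input file name is a
placeholder:)

import com.fasterxml.jackson.databind.ObjectMapper
import scala.io.Source

object ParseBench {
  def main(args: Array[String]): Unit = {
    val mapper = new ObjectMapper()
    // One JSON record per line; "records.json" is a placeholder path.
    val records = Source.fromFile("records.json").getLines().toArray
    val start = System.nanoTime()
    records.foreach(r => mapper.readTree(r))
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(f"${records.length} records in $elapsedMs%.0f ms " +
      f"(${elapsedMs / records.length}%.2f ms/record)")
  }
}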
Ah, my bad, it works without the serialization exception now. But there isn't
much of a performance difference, though.
Thanks
Best Regards
On Sat, Feb 14, 2015 at 7:45 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Thanks for the suggestion, but doing that gives me this exception:
Huh, that would come to 6.5 ms per JSON record (52 s / 8,000 records). That
does feel like a lot, but if your JSON records are big enough, I guess you
could get that sort of processing time.
Jackson is more or less the most efficient JSON parser out there, so unless
the Scala API is somehow affecting it, I don't see an obvious way to do much
better.
(adding back user)
Fair enough. Regarding the serialization exception, the hack I use is to have
an object with a transient lazy field, like so:
import com.fasterxml.jackson.databind.ObjectMapper

object Holder extends Serializable {
  // Not serialized with the closure; lazily re-created on first use in each JVM.
  @transient lazy val mapper = new ObjectMapper()
}
This way, the ObjectMapper will be instantiated at the receiving end, i.e.,
once per executor JVM, instead of being shipped with the closure.
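(A minimal usage sketch, assuming an RDD[String] of JSON records named
jsonRDD:)

val parsed = jsonRDD.map { record =>
  // Holder.mapper resolves on the executor, so no ObjectMapper is serialized.
  Holder.mapper.readTree(record)
}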
Thanks for the suggestion, but doing that gives me this exception:
http://pastebin.com/ni80NqKn
Over this piece of code:
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper // path as of 2.5.x

object Holder extends Serializable {
  @transient lazy val mapper = new ObjectMapper() with ScalaObjectMapper
  mapper.registerModule(DefaultScalaModule)
}
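(For what it's worth, a quick sketch of what the ScalaObjectMapper mixin buys
you once DefaultScalaModule is registered; the sample JSON is made up:)

// JSON can be bound directly to Scala types via Manifest-based readValue:
val record = Holder.mapper.readValue[Map[String, Any]]("""{"a": 1, "b": "x"}""")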
Thanks again!
It's the parser only: I just tried the parser
https://gist.github.com/akhld/3948a5d91d218eaf809d without Spark, and it took
52 seconds to process 8k JSON records. Not sure if there's an efficient way
to do this in Spark; I know if I use Spark SQL with SchemaRDD and all it will
be faster.
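(For reference, a minimal sketch of the Spark SQL route on Spark 1.2, where
sqlContext.jsonRDD infers a schema and parses the records; the name jsonRDD
is an assumption:)

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
// jsonRDD: an RDD[String] with one JSON record per element.
val schemaRDD = sqlContext.jsonRDD(jsonRDD)
schemaRDD.registerTempTable("records")
sqlContext.sql("SELECT COUNT(*) FROM records").collect()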