----- Original Message -----

> I am not sure I fully understand this reasoning. I imagine that lift-json
> is only one of hundreds of packages that would have to be built if you
> wanted to build all of Spark's transitive dependencies from source.

This is absolutely true.  However, many of Spark's dependencies are already 
available in operating system distributions.  In fact, in the case I am most 
familiar with (packaging Spark for Fedora), Lift is the biggest one left that 
isn't already available or under review.

> Additionally, to make sure I understand the impact -- this is only intended
> to simplify the process of packaging Spark on a new OS distribution that
> disallows pulling in binaries?

Yes, this was my main motivation.  Since the process of building Lift and its 
transitive dependencies is disproportionately complex compared to how much 
Spark uses lift-json, I thought it would be nice to replace it with something 
that could be built as just a JSON library.  I would argue that -- all else 
being equal -- it generally makes sense to make software development choices 
that facilitate packaging for distributions like Fedora and Debian.

There are other actual and potential advantages, though; here are a few:

1.  Based on some simple timing runs I did, json4s-jackson is faster all around 
when running warm (i.e. on subsequent timing runs in the same VM or timing runs 
with enough iterations to last for more than a few seconds), slightly slower 
when running cold on very small parsing tasks, and significantly (~10x) faster 
on large parsing tasks whether cold or warm.  The knee in the cold lift-json 
performance curve is somewhere between 2 KB and 50 KB of JSON source text.  
json4s-jackson is nominally faster cold with a 12 KB file, 40% faster with a 
50 KB file, 2.6x faster with a 500 KB file and 10x faster with files ranging 
from 4 to 20 MB.  Given how Spark uses JSON at the moment, the improved large-file 
parsing performance seems unlikely to be a huge practical advantage for 
json4s-jackson, but it's worth noting.
2.  The release schedule of json4s isn't coupled to the release schedule of a 
larger project.
3.  json4s is intended to provide a uniform interface to Scala JSON libraries, 
and it provides multiple backends, which offers potential flexibility in the 
future.  (To be fair, this interface is heavily based on the one provided by 
Lift, so going from lift-json to json4s, as my patch does, is only slightly 
more work than switching between json4s backends would be.)
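
To make the cold/warm distinction in item 1 concrete, here is a minimal sketch 
of the kind of harness that could produce such measurements.  It is not the 
code behind the numbers above: the object name ParseTiming, its helpers, and 
the stand-in parser are all hypothetical, and a real comparison would pass a 
lift-json or json4s-jackson parse call where the stand-in is used.

```scala
// Hypothetical timing harness; an illustration, not the code used for the
// measurements quoted in this message.
object ParseTiming {
  /** Calls `parse` on `input` `iterations` times in the same VM and returns
    * the elapsed nanoseconds of each call, in order. */
  def timeRuns[A](input: String, iterations: Int)(parse: String => A): Vector[Long] =
    Vector.fill(iterations) {
      val start = System.nanoTime()
      parse(input)
      System.nanoTime() - start
    }

  /** Treats the first run as "cold" and the median of the remaining runs
    * as "warm". */
  def coldAndWarm(runs: Vector[Long]): (Long, Long) = {
    require(runs.size >= 2, "need at least two runs")
    val warm = runs.tail.sorted
    (runs.head, warm(warm.size / 2))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a trivial stand-in so the sketch has no dependencies.
    // Replace it with the parser under test.
    val standIn: String => Int = _.count(_ == ':')

    // Build a JSON-like object with 1000 fields as sample input.
    val input = (1 to 1000)
      .map(i => "\"k" + i + "\": " + i)
      .mkString("{", ", ", "}")

    val (cold, warm) = coldAndWarm(timeRuns(input, 50)(standIn))
    println(s"cold: $cold ns, warm (median): $warm ns")
  }
}
```

Varying the size of the input from a few KB up to several MB is what would 
expose a knee like the one described in item 1.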

Again, this change is primarily motivated by a desire to make life easier for 
downstream packagers, but there is no obvious downside (beyond the downsides 
inherent in changing library dependencies) and several minor advantages.


best,
wb
