Will, thanks for the clarifications. I think Spark's main use-case is
"warm, small inputs" right now, but the change seems reasonable to me
nevertheless.

Paul, do you know of any fixes in Jackson 2.3.2 that are relevant to Spark?
We would also have to wait for json4s to release a new version that depends
on 2.3.2, or else pull it in ourselves.


On Wed, Feb 12, 2014 at 9:47 AM, Paul Brown <p...@mult.ifario.us> wrote:

> And, with my FasterXML hat on, if you ask, you'll find the Jackson folks
> will turn around issues quickly.  FWIW, there is a full-suite Jackson 2.3.2
> release rolling right up if you wait a couple of days to pull that in.
>
> -- Paul
>
> --
> p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>
>
> On Wed, Feb 12, 2014 at 8:12 AM, Will Benton <wi...@redhat.com> wrote:
>
> > ----- Original Message -----
> >
> > > I am not sure I fully understand this reasoning. I imagine that
> > > lift-json is only one of hundreds of packages that would have to be
> > > built if you wanted to build all of Spark's transitive dependencies
> > > from source.
> >
> > This is absolutely true.  However, many of Spark's dependencies are
> > already available in operating system distributions.  In fact, in the
> > case I am most familiar with (packaging Spark for Fedora), Lift is the
> > biggest one left that isn't already available or under review.
> >
> > > Additionally, to make sure I understand the impact -- this is only
> > > intended to simplify the process of packaging Spark on a new OS
> > > distribution that disallows pulling in binaries?
> >
> > Yes, this was my main motivation.  Since the process of building Lift and
> > its transitive dependencies is disproportionately complex compared to how
> > much Spark uses lift-json, I thought it would be nice to replace it with
> > something that could be built as just a JSON library.  I would argue that
> > -- all else being equal -- it generally makes sense to make software
> > development choices that facilitate packaging for distributions like
> > Fedora and Debian.
> >
> > There are other actual and potential advantages, though; here are a few:
> >
> > 1.  Based on some simple timing runs I did, json4s-jackson is faster all
> > around when running warm (i.e. on subsequent timing runs in the same VM
> > or timing runs with enough iterations to last for more than a few
> > seconds), slightly slower when running cold on very small parsing tasks,
> > and significantly (~10x) faster on large parsing tasks whether cold or
> > warm.  The knee in the cold lift-json performance curve is somewhere
> > between 2 KB and 50 KB of JSON source text.  json4s-jackson is nominally
> > faster cold with a 12 KB file, 40% faster with a 50 KB file, 2.6x faster
> > with a 500 KB file and 10x faster with files ranging from 4-20 MB.
> > Given how Spark uses JSON at the moment, the improved large-file parsing
> > performance seems unlikely to be a huge practical advantage for
> > json4s-jackson, but it's worth noting.
> > 2.  The release schedule of json4s isn't coupled to the release schedule
> > of a larger project.
> > 3.  json4s is intended to provide a uniform interface to Scala JSON
> > libraries, and it provides multiple backends, which offers potential
> > flexibility in the future.  (To be fair, this interface is heavily based
> > on the one provided by Lift, so going from lift-json to json4s, as my
> > patch does, is only slightly more work than switching between json4s
> > backends.)
> >
> > Again, this change is primarily motivated by a desire to make life easier
> > for downstream packagers, but there is no obvious downside (beyond the
> > downsides inherent in changing library dependencies) and several minor
> > advantages.
> >
> >
> > best,
> > wb
> >
>
