Re: Thrift?

Ted Dunning Sat, 15 Sep 2012 04:47:37 -0700

I have heard very good words about MessagePack ( http://msgpack.org/ ) as a
binary JSON format.  It supports C++ and Java and is very fast.


Kryo is nice (see Storm for a success story) but it is very much limited to
Java, and your wire protocol depends on your class structure, which results in
a very bad abstraction leak.

Our requirements for the internal API technology include:

- support both C++ and Java

- super fast

- abstracted away from the actual class definition

- other stuff

Kryo fails badly on C++ and abstraction.  It is fabulous as a replacement
for Java serialization, but just isn't portable.  I love it for what it is,
but it isn't what we need here.
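
To make the abstraction leak concrete, here is a minimal Java sketch. It is
an illustration under assumptions, not Kryo itself: the JDK's built-in
serialization stands in for any class-driven serializer, and the Point class,
the field tags 1 and 2, and all method names are hypothetical. The first
encoder lets the class definition dictate the bytes; the second writes
explicit (tag, value) pairs, so the bytes depend only on an agreed schema.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

public class WireFormatSketch {

    // Hypothetical record type, used only for illustration.
    public static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        public final int x;
        public final int y;
        public Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Class-driven encoding: the JDK derives the wire format from the class,
    // so the class name and field names end up in the stream.
    public static byte[] classDriven(Point p) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(p);
        }
        return buf.toByteArray();
    }

    // Schema-driven encoding: one tag byte followed by a 4-byte big-endian int
    // per field. Nothing here refers to a Java class, so a C++ reader only
    // needs to know the tags (assumed: 1 = x, 2 = y).
    public static byte[] schemaDriven(int x, int y) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeByte(1);
            out.writeInt(x);
            out.writeByte(2);
            out.writeInt(y);
        }
        return buf.toByteArray();
    }

    // Decodes the (tag, value) pairs back into {x, y}.
    public static int[] decodeSchemaDriven(byte[] bytes) throws IOException {
        int[] fields = new int[2];
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bytes))) {
            while (in.available() > 0) {
                int tag = in.readUnsignedByte();
                int value = in.readInt();
                if (tag == 1) {
                    fields[0] = value;
                } else if (tag == 2) {
                    fields[1] = value;
                }
                // Unknown tags are skipped; every value is a 4-byte int here.
            }
        }
        return fields;
    }

    // True if the raw bytes contain the given ASCII name.
    public static boolean mentions(byte[] bytes, String name) {
        return new String(bytes, StandardCharsets.ISO_8859_1).contains(name);
    }

    public static void main(String[] args) throws IOException {
        byte[] a = classDriven(new Point(3, 4));
        byte[] b = schemaDriven(3, 4);
        System.out.println("class-driven mentions Point:  " + mentions(a, "Point"));
        System.out.println("schema-driven mentions Point: " + mentions(b, "Point"));
        System.out.println("schema-driven size in bytes:  " + b.length);
    }
}
```

Renaming or restructuring Point changes the class-driven bytes, while the
schema-driven bytes stay the same as long as the tags do. Protobuf-style
formats follow the same basic idea, with a far more capable encoding.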

On Sat, Sep 15, 2012 at 4:39 AM, Min Zhou <[email protected]> wrote:

> There should be two types of serialization method. One defines its
> schema and is used for RPC and the user wire API; the other needs no
> schema and is typically used for internal data transfer. I think fastjson
> or kryo is quite suitable for the latter purpose.
>
>
> Regards,
> Min
>
> On Sat, Sep 15, 2012 at 5:49 PM, Michael Hausenblas <
> [email protected]> wrote:
>
> >
> > Point taken … +1 for protobuf - from my POV we can close ISSUE-1
> >
> > > The question of an internal wire format, btw, does not constrain the
> > > project relative to external access.
> >
> > Sounds sensible.
> >
> > The one thing I really don't get is: why did you put Avro and JSON
> > into the proposal [1] in the first place? Or is this the 'external
> > access' from above?
> >
> > Cheers,
> >            Michael
> >
> > [1] http://wiki.apache.org/incubator/DrillProposal
> >
> > --
> > Michael Hausenblas
> > Ireland, Europe
> > http://mhausenblas.info/
> >
> > On 14 Sep 2012, at 22:31, Ted Dunning wrote:
> >
> > > I think that it is important to ask a few questions leading up to a
> > > decision here.
> > >
> > > The first is a (rhetorical) show of hands about how many people believe
> > > that there are no serious performance or expressivity killers when
> > > comparing alternative serialization frameworks.  As far as I know,
> > > performance differences are not massive (and protobufs is one of the
> > > leaders in any case) and the expressivity differences are essentially
> > > nil.
> > > If somebody feels that there is a serious show-stopper with any option,
> > > they should speak.
> > >
> > > The second is to ask the sense of the community whether they judge
> > > progress or perfection in this decision to be more important to the
> > > project.  My guess is that almost everybody would prefer to see progress
> > > as long as the technical choice is not subject to some horrid missing bit.
> > >
> > > The final question is whether it is reasonable to go along with
> > > protobufs given that several very experienced engineers prefer it and
> > > would like to produce code based on it.  If the answers to the first two
> > > questions are that protobufs is about as good as we will find and that
> > > progress trumps small differences, then following Jason and Ryan's
> > > preference for protobufs seems like a reasonable thing to do.
> > >
> > > The question of an internal wire format, btw, does not constrain the
> > > project relative to external access.  I think it is important to
> > > support JDBC and ODBC and whatever is in common use for querying.  For
> > > external access the question is quite different.  Whereas for the
> > > internal format consensus around a single choice has large benefits, the
> > > external format choice is nearly the opposite.  For an external format,
> > > limiting ourselves to a single choice seems like a bad idea and
> > > increasing the audience seems like a better choice.
> > >
> > > On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <[email protected]>
> > wrote:
> > >
> > >> Hi folks,
> > >>
> > >> I just commented on this first JIRA.  Here is my text:
> > >>
> > >> This issue has been hashed over a lot in the Hadoop projects. There
> > >> was work done to compare thrift vs avro vs protobuf. The conclusion
> > >> was to use protobuf.
> > >>
> > >> Prior to this move, there had been a lot of noise about pluggable RPC
> > >> transports, and whatnot. It held up adoption of a backwards compatible
> > >> serialization framework for a long time. The problem ended up being
> > >> the analysis-paralysis, rather than the specific implementation
> > >> problem. In other words, the problem was a LACK of implementation
> > >> rather than actual REAL problems.
> > >>
> > >> Based on this experience, I'd strongly suggest adopting protobuf and
> > >> moving on. Forget about pluggable RPC implementations; the complexity
> > >> doesn't deliver benefits. The benefit of protobuf is that it's the RPC
> > >> format for Hadoop and HBase, which allows Drill to draw on the broad
> > >> experience of those communities who need to implement high performance
> > >> backwards compatible RPC serialization.
> > >>
> > >> ====
> > >>
> > >> Expanding a bit, I've looked into this issue a lot, and there are very
> > >> few significant concrete reasons to choose protobuf vs thrift.  Tiny
> > >> percent faster at this and that, etc.  I'd strongly suggest protobuf
> > >> for the expanded community.  There is no particular Apache imperative
> > >> that Apache projects re-use libraries.  Use what makes sense for your
> > >> project.
> > >>
> > >> As regards Avro, it's a fine serialization format for long term
> > >> data retention, but the complexities that exist to enable that make it
> > >> non-ideal for an RPC.  I know of no one who uses AvroRPC in any form.
> > >>
> > >> -ryan
> > >>
> > >> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <[email protected]>
> > >> wrote:
> > >>> We plan to propose the architecture and interfaces in the next couple
> > >>> weeks, which will make it easy to divide the project into clear
> > >>> building blocks. At that point it will be easier to start contributing
> > >>> different data sources, data formats, operators, query languages, etc.
> > >>>
> > >>> The contributions are done in the usual Apache way. It's best to open
> > >>> a JIRA and then post a patch so that others can review and then a
> > >>> committer can check it in.
> > >>>
> > >>> On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia <
> > >> [email protected]
> > >>>> wrote:
> > >>>
> > >>>> Hi
> > >>>>
> > >>>> What is the process to become a contributor to Drill?
> > >>>>
> > >>>> Regards
> > >>>> chandan
> > >>>>
> > >>>> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <[email protected]>
> > >> wrote:
> > >>>>
> > >>>>> Suffice it to say that if *you* think it is important enough to
> > >>>>> implement and maintain, then the group shouldn't say nay.  The
> > >>>>> consensus stuff should only block things that break something else.
> > >>>>> Additive features that are highly maintainable (or which come with
> > >>>>> commitments) shouldn't generally be blocked.
> > >>>>>
> > >>>>> On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas <
> > >>>>> [email protected]> wrote:
> > >>>>>
> > >>>>>> Good. Feel free to put me down for that, if the group as a whole
> > >>>>>> thinks that (supporting Thrift) makes sense.
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Tomer Shiran
> > >>> Director of Product Management | MapR Technologies | 650-804-8657
> > >>
> >
> >
>
>
> --
> My research interests are distributed systems, parallel computing and
> bytecode based virtual machine.
>
> My profile:
> http://www.linkedin.com/in/coderplay
> My blog:
> http://coderplay.javaeye.com
>
