Funny thing, given how much use protobuf has been put through, I think one
could make the argument that it's more battle-tested than ASN.1 ...
On Fri, Sep 14, 2012 at 3:24 PM, Constantine Peresypkin <[email protected]> wrote:
> Protobuf is an attempt to make ASN.1 more developer-friendly (not a bad
> attempt). It's simpler, has far fewer features, is easier to implement, and
> has a compact encoding. But on the other hand it's non-standard, a
> "reinvented wheel" (they could just have done a "better than PER" encoding
> for ASN.1), and AFAIK it has no support for the new and shiny Google
> encodings, like "group varint". All in all, in the current situation it
> seems a better choice than ASN.1, not even arguing about something even
> more vague and non-standard such as Thrift.
>
> On Sat, Sep 15, 2012 at 12:38 AM, Ryan Rawson <[email protected]> wrote:
>
>> Thanks for that, Ted.
>>
>> Correct - internal wire format doesn't mean 'drill only supports
>> protobuf encoded data'.
>>
>> Part of the reason to favor protobuf is that a lot of people in the
>> broader 'big data' community are building a lot of experience with it.
>> Hadoop and HBase are both moving to/have moved to protobuf on the wire.
>> Being able to leverage this expertise is valuable.
>>
>> There is a JIRA in Hadoop-land where someone did a deep-dive
>> 'bake-off' between Thrift, protobuf, and Avro. The ultimate choice was
>> protobuf, for a number of reasons. If people want to redo the
>> analysis, I'd like to see it in the context of THAT analysis (e.g. why
>> the assumptions there do not hold for Drill)... if anything it'd
>> give concrete form to what can be a mire.
>>
>> For what it's worth, I've had many discussions along these lines with
>> a variety of people, including committers on Thrift, and the consensus
>> is that both are good choices.
>>
>> -ryan
>>
>> On Fri, Sep 14, 2012 at 2:31 PM, Ted Dunning <[email protected]> wrote:
>> > I think that it is important to ask a few questions leading up to a
>> > decision here.
>> >
>> > The first is a (rhetorical) show of hands about how many people believe
>> > that there are no serious performance or expressivity killers when
>> > comparing alternative serialization frameworks. As far as I know,
>> > performance differences are not massive (and protobuf is one of the
>> > leaders in any case) and the expressivity differences are essentially
>> > nil. If somebody feels that there is a serious show-stopper with any
>> > option, they should speak up.
>> >
>> > The second is to ask the sense of the community on whether progress or
>> > perfection is more important to the project in this decision. My guess
>> > is that almost everybody would prefer to see progress, as long as the
>> > technical choice is not subject to some horrid missing bit.
>> >
>> > The final question is whether it is reasonable to go along with protobuf
>> > given that several very experienced engineers prefer it and would like
>> > to produce code based on it. If the first two questions are answered to
>> > the effect that protobuf is about as good as we will find and that
>> > progress trumps small differences, then it seems that following this
>> > preference of Jason and Ryan for protobuf might be a reasonable thing
>> > to do.
>> >
>> > The question of an internal wire format, btw, does not constrain the
>> > project with respect to external access. I think it is important to
>> > support JDBC and ODBC and whatever is in common use for querying. For
>> > external access the question is quite different. Whereas for the
>> > internal format consensus around a single choice has large benefits,
>> > the external format choice is nearly the opposite. For an external
>> > format, limiting ourselves to a single choice seems like a bad idea,
>> > and increasing the audience seems like a better choice.
>> >
>> > On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <[email protected]> wrote:
>> >
>> >> Hi folks,
>> >>
>> >> I just commented on the first JIRA. Here is my text:
>> >>
>> >> This issue has been hashed over a lot in the Hadoop projects. There
>> >> was work done to compare Thrift vs. Avro vs. protobuf. The conclusion
>> >> was to use protobuf.
>> >>
>> >> Prior to this move, there had been a lot of noise about pluggable RPC
>> >> transports and whatnot. It held up the adoption of a backwards-compatible
>> >> serialization framework for a long time. The problem ended up being
>> >> analysis paralysis rather than any specific implementation problem.
>> >> In other words, the problem was a LACK of implementation rather than
>> >> actual REAL problems.
>> >>
>> >> Based on this experience, I'd strongly suggest adopting protobuf and
>> >> moving on. Forget about pluggable RPC implementations; the complexity
>> >> doesn't deliver benefits. The benefit of protobuf is that it's the RPC
>> >> format for Hadoop and HBase, which allows Drill to draw on the broad
>> >> experience of those communities, who need to implement high-performance,
>> >> backwards-compatible RPC serialization.
>> >>
>> >> ====
>> >>
>> >> Expanding a bit: I've looked into this issue a lot, and there are very
>> >> few significant concrete reasons to choose protobuf over Thrift - a
>> >> tiny percent faster at this or that, etc. I'd still strongly suggest
>> >> protobuf for the expanded community. There is no particular Apache
>> >> imperative that Apache projects re-use libraries. Use what makes sense
>> >> for your project.
>> >>
>> >> As regards Avro, it's a fine serialization format for long-term
>> >> data retention, but the complexities that exist to enable that make it
>> >> non-ideal for RPC. I know of no one who uses AvroRPC in any form.
>> >>
>> >> -ryan
>> >>
>> >> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <[email protected]> wrote:
>> >> > We plan to propose the architecture and interfaces in the next couple
>> >> > of weeks, which will make it easy to divide the project into clear
>> >> > building blocks. At that point it will be easier to start contributing
>> >> > different data sources, data formats, operators, query languages, etc.
>> >> >
>> >> > Contributions are done in the usual Apache way. It's best to open a
>> >> > JIRA and then post a patch so that others can review it and a
>> >> > committer can check it in.
>> >> >
>> >> > On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia <[email protected]> wrote:
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> What is the process to become a contributor to Drill?
>> >> >>
>> >> >> Regards,
>> >> >> chandan
>> >> >>
>> >> >> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <[email protected]> wrote:
>> >> >>
>> >> >> > Suffice it to say that if *you* think it is important enough to
>> >> >> > implement and maintain, then the group shouldn't say nay. The
>> >> >> > consensus process should only block things that break something
>> >> >> > else. Additive features that are highly maintainable (or which
>> >> >> > come with commitments) shouldn't generally be blocked.
>> >> >> >
>> >> >> > On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas <[email protected]> wrote:
>> >> >> >
>> >> >> > > Good. Feel free to put me down for that, if the group as a whole
>> >> >> > > thinks that (supporting Thrift) makes sense.
>> >> >> > >
>> >> >
>> >> > --
>> >> > Tomer Shiran
>> >> > Director of Product Management | MapR Technologies | 650-804-8657
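[Editor's note: the "compact encoding" and varints that come up several times
in this thread are easy to illustrate. Below is a minimal sketch, in plain
Python, of the base-128 varint scheme that protobuf uses for integer fields.
This is illustrative only, not Google's implementation; the function names
`encode_varint` and `decode_varint` are made up for this example.]

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 "varint" byte string.

    Each byte carries 7 bits of payload, least-significant group first;
    the high bit of a byte is set when more bytes follow.
    """
    out = bytearray()
    while True:
        group = n & 0x7F              # low 7 bits of the remaining value
        n >>= 7
        if n:
            out.append(group | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(group)         # final byte: high bit clear
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Inverse of encode_varint; assumes data is exactly one varint."""
    result = 0
    for shift, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * shift)
    return result


# Small values cost a single byte; the classic example, 300, costs two:
# encode_varint(1)   -> b'\x01'
# encode_varint(300) -> b'\xac\x02'
```

The space win over fixed-width integers is the point of the "compact
encoding" remark: most field numbers and counts are small, so most integers
on the wire fit in one or two bytes.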
