>
> There is a JIRA in Hadoop-land where someone did a deep-dive
> 'bake off' between thrift, protobuf and avro.  The ultimate choice was
> protobuf for a number of reasons.  If people want to re-do the
> analysis, I'd like to see it done in the context of THAT analysis (e.g. why
> the assumptions there do not hold for Drill)... if anything it'd
> give concrete form to what can otherwise be a mire.

Could you please provide a pointer to that JIRA? It would be useful
background information for me, and perhaps for others in the group as well.
thanks,


> On Fri, Sep 14, 2012 at 2:31 PM, Ted Dunning <[email protected]> wrote:
>> I think that it is important to ask a few questions leading up to a
>> decision here.
>>
>> The first is a (rhetorical) show of hands about how many people believe
>> that there are no serious performance or expressivity killers when
>> comparing alternative serialization frameworks.  As far as I know,
>> performance differences are not massive (and protobufs is one of the
>> leaders in any case) and the expressivity differences are essentially nil.
>>  If somebody feels that there is a serious show-stopper with any option,
>> they should speak.
>>
>> The second is to ask the sense of the community as to whether progress
>> or perfection is more important to the project in this decision.  My guess
>> is that almost everybody would prefer to see progress as long as the
>> technical choice is not subject to some horrid missing bit.
>>
>> The final question is whether it is reasonable to go along with protobufs
>> given that several very experienced engineers prefer it and would like to
>> produce code based on it.  If the first two questions are answered to the
>> effect that protobufs is about as good as we will find and that progress
>> trumps small differences, then following Jason and Ryan's preference for
>> protobufs seems like a reasonable thing to do.
>>
>> The question of an internal wire format, btw, does not constrain the
>> project relative to external access.  I think it is important to support
>> JDBC and ODBC and whatever is in common use for querying.  For external
>> access the question is quite different.  Whereas for the internal format
>> consensus around a single choice has large benefits, the external format
>> choice is nearly the opposite.  For an external format, limiting ourselves
>> to a single choice seems like a bad idea and increasing the audience seems
>> like a better choice.
>>
>> On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <[email protected]> wrote:
>>
>>> Hi folks,
>>>
>>> I just commented on this first JIRA.  Here is my text:
>>>
>>> This issue has been hashed over a lot in the Hadoop projects.  There
>>> was work done to compare thrift vs avro vs protobuf, and the conclusion
>>> was to use protobuf.
>>>
>>> Prior to this move, there had been a lot of noise about pluggable RPC
>>> transports and whatnot.  It held up adoption of a backwards-compatible
>>> serialization framework for a long time.  The problem ended up being
>>> analysis-paralysis rather than any specific implementation
>>> problem.  In other words, the problem was a LACK of implementation rather
>>> than actual REAL problems.
>>>
>>> Based on this experience, I'd strongly suggest adopting protobuf and
>>> moving on.  Forget about pluggable RPC implementations; the complexity
>>> doesn't deliver benefits.  The benefit of protobuf is that it's the RPC
>>> format for Hadoop and HBase, which allows Drill to draw on the broad
>>> experience of those communities, which have had to implement high-performance,
>>> backwards-compatible RPC serialization.
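To make the backwards-compatibility property concrete, here is a minimal pure-Python sketch of protobuf's tagged varint wire format (an illustration only, not the real protobuf library; the field numbers and values are made up). Because every field on the wire carries a numeric tag, a reader built against an older schema can skip fields it does not know about, which is the mechanism that lets old and new peers interoperate:

```python
# Illustration of protobuf-style tagged varint encoding (not the real
# protobuf library).  Each field is prefixed with a key encoding its
# field number, so readers can skip fields added by newer writers.

def encode_varint(n):
    """Encode a non-negative int as a protobuf-style varint."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)  # continuation bit set
        else:
            out.append(b)
            return bytes(out)

def decode_varint(data, pos):
    """Decode a varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        b = data[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def encode_field(field_number, value):
    """Encode an int field as key (field_number << 3, wire type 0) + value."""
    return encode_varint(field_number << 3) + encode_varint(value)

def decode_known_fields(data, known):
    """Decode varint fields, silently dropping unknown field numbers."""
    pos, fields = 0, {}
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        value, pos = decode_varint(data, pos)
        if key >> 3 in known:
            fields[key >> 3] = value
    return fields

# A "new" writer emits fields 1, 2 and a newly added field 3; an "old"
# reader that only knows fields 1 and 2 still decodes the message.
msg = encode_field(1, 150) + encode_field(2, 7) + encode_field(3, 999)
print(decode_known_fields(msg, known={1, 2}))  # {1: 150, 2: 7}
```

The key point is that skipping an unknown field is always possible because the tag tells the reader how to consume it, so adding fields never breaks old readers.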
>>>
>>> ====
>>>
>>> Expanding a bit, I've looked into this issue a lot, and there are very
>>> few significant concrete reasons to choose protobuf over thrift: a tiny
>>> percentage faster at this or that, etc.  I'd strongly suggest protobuf
>>> for the expanded community.  There is no particular Apache imperative
>>> that Apache projects re-use libraries.  Use what makes sense for your
>>> project.
>>>
>>> As regards Avro, it's a fine serialization format for long-term
>>> data retention, but the complexities that exist to enable that make it
>>> non-ideal for RPC.  I know of no one who uses AvroRPC in any form.
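For contrast, Avro's binary encoding is tag-free: field values are written back-to-back in schema order, so a reader needs the writer's schema to decode at all. A hypothetical sketch (fixed-width big-endian ints stand in for Avro's actual variable-length encoding; the field names are invented) of why that is compact for stored data, where the schema travels with the file, but awkward for RPC, where every peer must agree on or negotiate schemas:

```python
# Illustration of Avro-style tag-free encoding (not the real Avro
# library).  Values are packed in schema order with no per-field tags,
# so decoding is only correct if the reader uses the writer's schema.

import struct

SCHEMA_V1 = ["id", "count"]  # writer's schema: two 4-byte ints, in order

def tagless_encode(record, schema):
    """Pack int fields in schema order -- no tags, no field names."""
    return b"".join(struct.pack(">i", record[name]) for name in schema)

def tagless_decode(data, schema):
    """Unpack fields assuming `schema` matches the writer's schema."""
    values = struct.unpack(">" + "i" * len(schema), data)
    return dict(zip(schema, values))

payload = tagless_encode({"id": 42, "count": 7}, SCHEMA_V1)
print(tagless_decode(payload, SCHEMA_V1))        # {'id': 42, 'count': 7}

# A reader assuming a different field order decodes without error but
# gets silently wrong data -- there are no tags to catch the mismatch:
print(tagless_decode(payload, ["count", "id"]))  # {'count': 42, 'id': 7}
```

This is the trade-off in a nutshell: dropping per-field tags saves bytes, but it shifts the burden of schema agreement onto the transport, which is straightforward for files and a complication for RPC.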
>>>
>>> -ryan
>>>
>>> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <[email protected]>
>>> wrote:
>>> > We plan to propose the architecture and interfaces in the next couple
>>> > weeks, which will make it easy to divide the project into clear building
>>> > blocks. At that point it will be easier to start contributing different
>>> > data sources, data formats, operators, query languages, etc.
>>> >
>>> > The contributions are done in the usual Apache way. It's best to open a
>>> > JIRA and then post a patch so that others can review and then a committer
>>> > can check it in.
>>> >
>>> > On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia <
>>> > [email protected]> wrote:
>>> >
>>> >> Hi
>>> >>
>>> >> What is the process to become a contributor to drill ?
>>> >>
>>> >> Regards
>>> >> chandan
>>> >>
>>> >> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <[email protected]>
>>> >> wrote:
>>> >>
>>> >> > Suffice it to say that if *you* think it is important enough to
>>> >> > implement and maintain, then the group shouldn't say nay.  The
>>> >> > consensus stuff should only block things that break something else.
>>> >> > Additive features that are highly maintainable (or which come with
>>> >> > commitments) shouldn't generally be blocked.
>>> >> >
>>> >> > On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas <
>>> >> > [email protected]> wrote:
>>> >> >
>>> >> > > Good. Feel free to put me down for that, if the group as a whole
>>> >> > > thinks that (supporting Thrift) makes sense.
>>> >> > >
>>> >> >
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Tomer Shiran
>>> > Director of Product Management | MapR Technologies | 650-804-8657
>>>
