>
> There is a JIRA in Hadoop-land where someone had done a deep dive
> 'bake off' between thrift, protobuf and avro. The ultimate choice was
> protobuf for a number of reasons. If people want to re-do the
> analysis, I'd like to see it in the context of THAT analysis (eg: why
> the assumptions there are not the same for Drill)... if anything it'd
> give a concrete form to what can be a mire.
Could you please provide a pointer to that JIRA? It will be useful
background information for me and perhaps others in the group as well.

Thanks,

> On Fri, Sep 14, 2012 at 2:31 PM, Ted Dunning <[email protected]> wrote:
>> I think that it is important to ask a few questions leading up to a decision
>> here.
>>
>> The first is a (rhetorical) show of hands about how many people believe
>> that there are no serious performance or expressivity killers when
>> comparing alternative serialization frameworks. As far as I know,
>> performance differences are not massive (and protobufs is one of the
>> leaders in any case) and the expressivity differences are essentially nil.
>> If somebody feels that there is a serious show-stopper with any option,
>> they should speak.
>>
>> The second is to ask the sense of the community whether they judge progress
>> or perfection in this decision to be more important to the project. My guess
>> is that almost everybody would prefer to see progress as long as the
>> technical choice is not subject to some horrid missing bit.
>>
>> The final question is whether it is reasonable to go along with protobufs
>> given that several very experienced engineers prefer it and would like to
>> produce code based on it. If the first two questions are answered to the
>> effect that protobufs is about as good as we will find and that progress
>> trumps small differences, then it seems that moving to follow this
>> preference of Jason and Ryan for protobufs might be a reasonable thing to
>> do.
>>
>> The question of an internal wire format, btw, does not constrain the
>> project relative to external access. I think it is important to support
>> JDBC and ODBC and whatever is in common use for querying. For external
>> access the question is quite different. Whereas for the internal format
>> consensus around a single choice has large benefits, the external format
>> choice is nearly the opposite.
>> For an external format, limiting ourselves
>> to a single choice seems like a bad idea and increasing the audience seems
>> like a better choice.
>>
>> On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <[email protected]> wrote:
>>
>>> Hi folks,
>>>
>>> I just commented on this first JIRA. Here is my text:
>>>
>>> This issue has been hashed over a lot in the Hadoop projects. There
>>> was work done to compare thrift vs avro vs protobuf. The conclusion
>>> was protobuf was the decision to use.
>>>
>>> Prior to this move, there had been a lot of noise about pluggable RPC
>>> transports, and whatnot. It held up adoption of a backwards compatible
>>> serialization framework for a long time. The problem ended up being
>>> the analysis-paralysis, rather than the specific implementation
>>> problem. In other words, the problem was a LACK of implementation rather than
>>> actual REAL problems.
>>>
>>> Based on this experience, I'd strongly suggest adopting protobuf and
>>> moving on. Forget about pluggable RPC implementations, the complexity
>>> doesn't deliver benefits. The benefit of protobuf is that it's the RPC
>>> format for Hadoop and HBase, which allows Drill to draw on the broad
>>> experience of those communities who need to implement high performance
>>> backwards compatible RPC serialization.
>>>
>>> ====
>>>
>>> Expanding a bit, I've looked into this issue a lot, and there are very
>>> few significant concrete reasons to choose protobuf vs thrift. Tiny
>>> percent faster of this, and that, etc. I'd strongly suggest protobuf
>>> for the expanded community. There is no particular Apache imperative
>>> that Apache projects re-use libraries. Use what makes sense for your
>>> project.
>>>
>>> As regards Avro, it's a fine serialization format for long term
>>> data retention, but the complexities that exist to enable that make it
>>> non-ideal for an RPC. I know of no one who uses AvroRPC in any form.
>>>
>>> -ryan
>>>
>>> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <[email protected]> wrote:
>>> > We plan to propose the architecture and interfaces in the next couple
>>> > weeks, which will make it easy to divide the project into clear building
>>> > blocks. At that point it will be easier to start contributing different
>>> > data sources, data formats, operators, query languages, etc.
>>> >
>>> > The contributions are done in the usual Apache way. It's best to open a
>>> > JIRA and then post a patch so that others can review and then a committer
>>> > can check it in.
>>> >
>>> > On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia <[email protected]> wrote:
>>> >
>>> >> Hi
>>> >>
>>> >> What is the process to become a contributor to Drill?
>>> >>
>>> >> Regards
>>> >> chandan
>>> >>
>>> >> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <[email protected]> wrote:
>>> >>
>>> >> > Suffice it to say that if *you* think it is important enough to implement
>>> >> > and maintain, then the group shouldn't say nay. The consensus stuff
>>> >> > should only block things that break something else. Additive features that
>>> >> > are highly maintainable (or which come with commitments) shouldn't
>>> >> > generally be blocked.
>>> >> >
>>> >> > On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas <[email protected]> wrote:
>>> >> >
>>> >> > > Good. Feel free to put me down for that, if the group as a whole thinks
>>> >> > > that (supporting Thrift) makes sense.
>>> >
>>> > --
>>> > Tomer Shiran
>>> > Director of Product Management | MapR Technologies | 650-804-8657
