I think that it is important to ask a few questions leading up to a decision here.
The first is a (rhetorical) show of hands: how many people believe that there are no serious performance or expressivity killers among the alternative serialization frameworks? As far as I know, the performance differences are not massive (and protobufs is one of the leaders in any case), and the expressivity differences are essentially nil. If somebody feels that there is a serious show-stopper with any option, they should speak up.

The second is to ask the community whether progress or perfection in this decision is more important to the project. My guess is that almost everybody would prefer to see progress as long as the technical choice is not missing some horrid bit.

The final question is whether it is reasonable to go along with protobufs, given that several very experienced engineers prefer it and would like to produce code based on it. If the answers to the first two questions are that protobufs is about as good as we will find and that progress trumps small differences, then following Jason and Ryan's preference for protobufs seems like a reasonable thing to do.

The question of an internal wire format, btw, does not constrain the project with respect to external access. I think it is important to support JDBC, ODBC, and whatever else is in common use for querying. For external access, the question is quite different. For the internal format, consensus around a single choice has large benefits; for the external format, it is nearly the opposite. Limiting ourselves to a single external format seems like a bad idea, and supporting several to reach a larger audience seems like a better choice.

On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson wrote:
> Hi folks,
>
> I just commented on this first JIRA. Here is my text:
>
> This issue has been hashed over a lot in the Hadoop projects. There
> was work done to compare thrift vs avro vs protobuf.
> The conclusion was that protobuf was the one to use.
>
> Prior to this move, there had been a lot of noise about pluggable RPC
> transports, and whatnot. It held up adoption of a backwards compatible
> serialization framework for a long time. The problem ended up being
> the analysis-paralysis, rather than the specific implementation
> problem. In other words, the problem was a LACK of implementation rather
> than actual REAL problems.
>
> Based on this experience, I'd strongly suggest adopting protobuf and
> moving on. Forget about pluggable RPC implementations; the complexity
> doesn't deliver benefits. The benefit of protobuf is that it's the RPC
> format for Hadoop and HBase, which allows Drill to draw on the broad
> experience of those communities, who need to implement high performance,
> backwards compatible RPC serialization.
>
> ====
>
> Expanding a bit, I've looked into this issue a lot, and there are very
> few significant concrete reasons to choose protobuf vs thrift. A tiny
> percent faster at this and that, etc. I'd strongly suggest protobuf
> for the expanded community. There is no particular Apache imperative
> that Apache projects re-use libraries. Use what makes sense for your
> project.
>
> As regards Avro, it's a fine serialization format for long term
> data retention, but the complexities that exist to enable that make it
> non-ideal for an RPC. I know of no one who uses AvroRPC in any form.
>
> -ryan
>
> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran wrote:
> > We plan to propose the architecture and interfaces in the next couple of
> > weeks, which will make it easy to divide the project into clear building
> > blocks. At that point it will be easier to start contributing different
> > data sources, data formats, operators, query languages, etc.
> >
> > The contributions are done in the usual Apache way. It's best to open a
> > JIRA and then post a patch so that others can review and then a committer
> > can check it in.
> >
> > On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia wrote:
> >
> >> Hi
> >>
> >> What is the process to become a contributor to Drill?
> >>
> >> Regards
> >> chandan
> >>
> >> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning wrote:
> >>
> >> > Suffice it to say that if *you* think it is important enough to
> >> > implement and maintain, then the group shouldn't say nay. The consensus
> >> > stuff should only block things that break something else. Additive
> >> > features that are highly maintainable (or which come with commitments)
> >> > shouldn't generally be blocked.
> >> >
> >> > On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas wrote:
> >> >
> >> > > Good. Feel free to put me down for that, if the group as a whole
> >> > > thinks that (supporting Thrift) makes sense.
> >
> > --
> > Tomer Shiran
> > Director of Product Management | MapR Technologies | 650-804-8657
