There should be two types of serialization method. One defines an explicit schema and is used for RPC and the user-facing wire API; the other needs no schema and is typically used for internal data transfer. I think fastjson or kryo is quite suitable for the latter purpose.
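To make the two styles concrete, here is a rough sketch in Java (16 or later, for records and pattern matching) using only `java.io`. It is not the protobuf, fastjson or kryo API: the class `SerializationStyles`, the `Point` record and the one-byte type tags are all hypothetical. The schema-defined pair writes only the values and relies on both ends agreeing on field order and types. The schemaless pair writes each field's name and a type tag, so the reader needs no prior schema, at the cost of a larger payload.

```java
import java.io.*;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: all names are hypothetical, not the protobuf/fastjson/kryo API.
public class SerializationStyles {

    // A record whose layout (field order and types) both endpoints agree on in advance.
    public record Point(int id, String label) {}

    // Schema-defined style: the schema lives in code on both sides,
    // so only the raw values go on the wire.
    public static byte[] writeWithSchema(Point p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(p.id());    // field 1: 4-byte big-endian int
            out.writeUTF(p.label()); // field 2: 2-byte length + modified UTF-8
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The reader must already know the schema: values are read back in write order
    // (Java evaluates constructor arguments left to right).
    public static Point readWithSchema(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            return new Point(in.readInt(), in.readUTF());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Schemaless style: every value carries its field name and a one-byte type tag,
    // so a reader can decode the payload without a prior schema.
    public static byte[] writeSelfDescribing(Map<String, Object> fields) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(fields.size());
            for (Map.Entry<String, Object> e : fields.entrySet()) {
                out.writeUTF(e.getKey());
                Object v = e.getValue();
                if (v instanceof Integer i) {
                    out.writeByte('I');
                    out.writeInt(i);
                } else if (v instanceof String s) {
                    out.writeByte('S');
                    out.writeUTF(s);
                } else {
                    throw new IllegalArgumentException("unsupported type: " + v);
                }
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static Map<String, Object> readSelfDescribing(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int n = in.readInt();
            Map<String, Object> fields = new LinkedHashMap<>();
            for (int k = 0; k < n; k++) {
                String name = in.readUTF();
                byte tag = in.readByte();
                if (tag == 'I') {
                    fields.put(name, in.readInt());
                } else if (tag == 'S') {
                    fields.put(name, in.readUTF());
                } else {
                    throw new IllegalStateException("unknown type tag: " + tag);
                }
            }
            return fields;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Point p = new Point(7, "drill");
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("id", 7);
        m.put("label", "drill");
        byte[] compact = writeWithSchema(p);
        byte[] tagged = writeSelfDescribing(m);
        System.out.println(readWithSchema(compact));    // Point[id=7, label=drill]
        System.out.println(readSelfDescribing(tagged)); // {id=7, label=drill}
        System.out.println(compact.length + " vs " + tagged.length + " bytes"); // 11 vs 28 bytes
    }
}
```

The same two fields cost 11 bytes in the schema-defined form and 28 bytes in the self-describing form. That gap, plus the freedom to skip schema management, is the trade-off behind using a schema for the external wire API and a schemaless serializer for internal transfer.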
Regards,
Min

On Sat, Sep 15, 2012 at 5:49 PM, Michael Hausenblas <[email protected]> wrote:
>
> Point taken … +1 for protobuf - from my POV we can close ISSUE-1
>
> > The question of an internal wire format, btw, does not constrain the
> > project relative to external access.
>
> Sounds sensible.
>
> The one thing I really don't get is: why did you put Avro and JSON
> into the proposal [1] in the first place? Or is this the 'external access'
> from above?
>
> Cheers,
>   Michael
>
> [1] http://wiki.apache.org/incubator/DrillProposal
>
> --
> Michael Hausenblas
> Ireland, Europe
> http://mhausenblas.info/
>
> On 14 Sep 2012, at 22:31, Ted Dunning wrote:
>
> > I think that it is important to ask a few questions leading up to a
> > decision here.
> >
> > The first is a (rhetorical) show of hands about how many people believe
> > that there are no serious performance or expressivity killers when
> > comparing alternative serialization frameworks. As far as I know,
> > performance differences are not massive (and protobufs is one of the
> > leaders in any case) and the expressivity differences are essentially
> > nil. If somebody feels that there is a serious show-stopper with any
> > option, they should speak.
> >
> > The second is to ask the sense of the community whether they judge
> > progress or perfection in this decision is most important to the
> > project. My guess is that almost everybody would prefer to see progress
> > as long as the technical choice is not subject to some horrid missing bit.
> >
> > The final question is whether it is reasonable to go along with protobufs
> > given that several very experienced engineers prefer it and would like to
> > produce code based on it. If the first two questions are answered to the
> > effect that protobufs is about as good as we will find and that progress
> > trumps small differences, then it seems that moving to follow this
> > preference of Jason and Ryan for protobufs might be a reasonable thing to
> > do.
> >
> > The question of an internal wire format, btw, does not constrain the
> > project relative to external access. I think it is important to support
> > JDBC and ODBC and whatever is in common use for querying. For external
> > access the question is quite different. Whereas for the internal format
> > consensus around a single choice has large benefits, the external format
> > choice is nearly the opposite. For an external format, limiting ourselves
> > to a single choice seems like a bad idea and increasing the audience
> > seems like a better choice.
> >
> > On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <[email protected]> wrote:
> >
> >> Hi folks,
> >>
> >> I just commented on this first JIRA. Here is my text:
> >>
> >> This issue has been hashed over a lot in the Hadoop projects. There
> >> was work done to compare thrift vs avro vs protobuf. The conclusion
> >> was to use protobuf.
> >>
> >> Prior to this move, there had been a lot of noise about pluggable RPC
> >> transports, and whatnot. It held up adoption of a backwards compatible
> >> serialization framework for a long time. The problem ended up being
> >> the analysis paralysis, rather than the specific implementation
> >> problem. In other words, the problem was a LACK of implementation rather
> >> than actual REAL problems.
> >>
> >> Based on this experience, I'd strongly suggest adopting protobuf and
> >> moving on. Forget about pluggable RPC implementations; the complexity
> >> doesn't deliver benefits. The benefit of protobuf is that it's the RPC
> >> format for Hadoop and HBase, which allows Drill to draw on the broad
> >> experience of those communities who need to implement high performance
> >> backwards compatible RPC serialization.
> >>
> >> ====
> >>
> >> Expanding a bit, I've looked into this issue a lot, and there are very
> >> few significant concrete reasons to choose protobuf vs thrift. Tiny
> >> percent faster of this, and that, etc.
> >> I'd strongly suggest protobuf for the expanded community. There is no
> >> particular Apache imperative that Apache projects re-use libraries.
> >> Use what makes sense for your project.
> >>
> >> As regards Avro, it's a fine serialization format for long-term
> >> data retention, but the complexities that exist to enable that make it
> >> non-ideal for RPC. I know of no one who uses AvroRPC in any form.
> >>
> >> -ryan
> >>
> >> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <[email protected]> wrote:
> >>> We plan to propose the architecture and interfaces in the next couple
> >>> of weeks, which will make it easy to divide the project into clear
> >>> building blocks. At that point it will be easier to start contributing
> >>> different data sources, data formats, operators, query languages, etc.
> >>>
> >>> Contributions are made in the usual Apache way. It's best to open a
> >>> JIRA and then post a patch so that others can review it and then a
> >>> committer can check it in.
> >>>
> >>> On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia <[email protected]> wrote:
> >>>
> >>>> Hi
> >>>>
> >>>> What is the process to become a contributor to Drill?
> >>>>
> >>>> Regards
> >>>> chandan
> >>>>
> >>>> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <[email protected]> wrote:
> >>>>
> >>>>> Suffice it to say that if *you* think it is important enough to
> >>>>> implement and maintain, then the group shouldn't say nay. The
> >>>>> consensus stuff should only block things that break something else.
> >>>>> Additive features that are highly maintainable (or which come with
> >>>>> commitments) shouldn't generally be blocked.
> >>>>>
> >>>>> On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas <[email protected]> wrote:
> >>>>>
> >>>>>> Good. Feel free to put me down for that, if the group as a whole
> >>>>>> thinks that (supporting Thrift) makes sense.
> >>>
> >>> --
> >>> Tomer Shiran
> >>> Director of Product Management | MapR Technologies | 650-804-8657

--
My research interests are distributed systems, parallel computing and
bytecode-based virtual machines.
My profile: http://www.linkedin.com/in/coderplay
My blog: http://coderplay.javaeye.com
