I have heard very good things about MessagePack (http://msgpack.org/) as a binary JSON format. It supports C++ and Java and is very fast.
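To make the "binary JSON" description concrete, here is a rough sketch of how a small JSON-like document maps onto MessagePack bytes. It is a hand-rolled encoder written only for illustration, not the msgpack library: the `pack` function is a hypothetical helper, and it covers only a small subset of the format (nil, booleans, 32-bit integers, short strings, short lists and maps).

```python
import json
import struct


def pack(value):
    """Encode a value using a small subset of the MessagePack format.

    Hypothetical illustration helper, not the msgpack library API.
    """
    if value is None:
        return b"\xc0"  # nil
    if value is True:
        return b"\xc3"  # true
    if value is False:
        return b"\xc2"  # false
    if isinstance(value, int):
        if 0 <= value <= 0x7F:
            return bytes([value])  # positive fixint: the value is the byte
        if -32 <= value < 0:
            return struct.pack("b", value)  # negative fixint (0xe0..0xff)
        if -(2 ** 31) <= value < 2 ** 31:
            return b"\xd2" + struct.pack(">i", value)  # int 32, big-endian
        raise ValueError("integer out of range for this sketch")
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) > 31:
            raise ValueError("only fixstr (up to 31 bytes) is supported here")
        return bytes([0xA0 | len(data)]) + data  # fixstr
    if isinstance(value, list):
        if len(value) > 15:
            raise ValueError("only fixarray (up to 15 items) is supported here")
        return bytes([0x90 | len(value)]) + b"".join(pack(v) for v in value)
    if isinstance(value, dict):
        if len(value) > 15:
            raise ValueError("only fixmap (up to 15 pairs) is supported here")
        body = b"".join(pack(k) + pack(v) for k, v in value.items())
        return bytes([0x80 | len(value)]) + body  # fixmap
    raise TypeError("unsupported type: %s" % type(value).__name__)


doc = {"compact": True, "schema": 0}
as_json = json.dumps(doc, separators=(",", ":")).encode("utf-8")
as_msgpack = pack(doc)
print(len(as_json), len(as_msgpack))  # prints "27 18"
print(as_msgpack.hex())
```

Note that the field names still travel on the wire with every message, as they do in JSON. The format is self-describing and needs no schema, but nothing in it ties the bytes to a particular class layout.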
Kryo is nice (see Storm for a success story), but it is very much limited to Java, and your wire protocol depends on your class structure, which results in a very bad abstraction leak.

Our requirements for the internal API technology include:

- support both C++ and Java
- super fast
- abstracted away from the actual class definition
- other stuff

Kryo fails badly on C++ and abstraction. It is fabulous as a replacement for Java serialization, but just isn't portable. I love it for what it is, but it isn't what we need here.

On Sat, Sep 15, 2012 at 4:39 AM, Min Zhou <[email protected]> wrote:

> There should be 2 types of serialization method. One should define its
> schema, for the use of RPC, user wire API; while the other need not
> define schema, it typically for internal data transfer, I think
> fastjson or kryo is quite suitable for the latter purpose.
>
> Regards,
> Min
>
> On Sat, Sep 15, 2012 at 5:49 PM, Michael Hausenblas
> <[email protected]> wrote:
>
> > Point taken … +1 for protobuf - from my POV we can close ISSUE-1
> >
> > > The question of an internal wire format, btw, does not constrain
> > > the project relative to external access.
> >
> > Sounds sensible.
> >
> > The only one thing I really don't get is: why did you put Avro and
> > JSON into the proposal [1] in the first place? Or is this the
> > 'external access' from above?
> >
> > Cheers,
> > Michael
> >
> > [1] http://wiki.apache.org/incubator/DrillProposal
> >
> > --
> > Michael Hausenblas
> > Ireland, Europe
> > http://mhausenblas.info/
> >
> > On 14 Sep 2012, at 22:31, Ted Dunning wrote:
> >
> > > I think that it is important to ask a few questions leading up a
> > > decision here.
> > >
> > > The first is a (rhetorical) show of hands about how many people
> > > believe that there are no serious performance or expressivity
> > > killers when comparing alternative serialization frameworks. As
> > > far as I know, performance differences are not massive (and
> > > protobufs is one of the leaders in any case) and the expressivity
> > > differences are essentially nil. If somebody feels that there is
> > > a serious show-stopper with any option, they should speak.
> > >
> > > The second is to ask the sense of the community whether they
> > > judge progress or perfection in this decision is most important
> > > to the project. My guess is that almost everybody would prefer to
> > > see progress as long as the technical choice is not subject to
> > > some horrid missing bit.
> > >
> > > The final question is whether it is reasonable to go along with
> > > protobufs given that several very experienced engineers prefer it
> > > and would like to produce code based on it. If the first two
> > > answers are answered to the effect of protobufs is about as good
> > > as we will find and that progress trumps small differences, then
> > > it seems that moving to follow this preference of Jason and Ryan
> > > for protobufs might be a reasonable thing to do.
> > >
> > > The question of an internal wire format, btw, does not constrain
> > > the project relative to external access. I think it is important
> > > to support JDBC and ODBC and whatever is in common use for
> > > querying. For external access the question is quite different.
> > > Whereas for the internal format consensus around a single choice
> > > has large benefits, the external format choice is nearly the
> > > opposite. For an external format, limiting ourselves to a single
> > > choice seems like a bad idea and increasing the audience seems
> > > like a better choice.
> > >
> > > On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <[email protected]>
> > > wrote:
> > >
> > >> Hi folks,
> > >>
> > >> I just commented on this first JIRA. Here is my text:
> > >>
> > >> This issue has been hashed over a lot in the Hadoop projects.
> > >> There was work done to compare thrift vs avro vs protobuf. The
> > >> conclusion was protobuf was the decision to use.
> > >>
> > >> Prior to this move, there had been a lot of noise about
> > >> pluggable RPC transports, and whatnot. It held up adoption of a
> > >> backwards compatible serialization framework for a long time.
> > >> The problem ended up being the analysis-paralysis, rather than
> > >> the specific implementation problem. In other words, the problem
> > >> was a LACK of implementation than actual REAL problems.
> > >>
> > >> Based on this experience, I'd strongly suggest adopting protobuf
> > >> and moving on. Forget about pluggable RPC implementations, the
> > >> complexity doesnt deliver benefits. The benefits of protobuf is
> > >> that its the RPC format for Hadoop and HBase, which allows Drill
> > >> to draw on the broad experience of those communities who need to
> > >> implement high performance backwards compatible RPC
> > >> serialization.
> > >>
> > >> ====
> > >>
> > >> Expanding a bit, I've looked in to this issue a lot, and there
> > >> is very few significant concrete reasons to choose protobuf vs
> > >> thrift. Tiny percent faster of this, and that, etc. I'd strongly
> > >> suggest protobuf for the expanded community. There is no
> > >> particular Apache imperative that Apache projects re-use
> > >> libraries. Use what makes sense for your project.
> > >>
> > >> As regards to Avro, it's a fine serialization format for long
> > >> term data retention, but the complexities that exist to enable
> > >> that make it non-ideal for an RPC. I know of no one who uses
> > >> AvroRPC in any form.
> > >>
> > >> -ryan
> > >>
> > >> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <[email protected]>
> > >> wrote:
> > >>> We plan to propose the architecture and interfaces in the next
> > >>> couple weeks, which will make it easy to divide the project
> > >>> into clear building blocks. At that point it will be easier to
> > >>> start contributing different data sources, data formats,
> > >>> operators, query languages, etc.
> > >>>
> > >>> The contributions are done in the usual Apache way. It's best
> > >>> to open a JIRA and then post a patch so that others can review
> > >>> and then a committer can check it in.
> > >>>
> > >>> On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia
> > >>> <[email protected]> wrote:
> > >>>
> > >>>> Hi
> > >>>>
> > >>>> Hi
> > >>>>
> > >>>> What is the process to become a contributor to drill ?
> > >>>>
> > >>>> Regards
> > >>>> chandan
> > >>>>
> > >>>> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <[email protected]>
> > >>>> wrote:
> > >>>>
> > >>>>> Suffice it to say that if *you* think it is important enough
> > >>>>> to implement and maintain, then the group shouldn't say naye.
> > >>>>> The consensus stuff should only block things that break
> > >>>>> something else. Additive features that are highly
> > >>>>> maintainable (or which come with commitments) shouldn't
> > >>>>> generally be blocked.
> > >>>>>
> > >>>>> On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas
> > >>>>> <[email protected]> wrote:
> > >>>>>
> > >>>>>> Good. Feel free to put me down for that, if the group as a
> > >>>>>> whole thinks that (supporting Thrift) makes sense.
> > >>>>>>
> > >>>
> > >>> --
> > >>> Tomer Shiran
> > >>> Director of Product Management | MapR Technologies | 650-804-8657
>
> --
> My research interests are distributed systems, parallel computing and
> bytecode based virtual machine.
>
> My profile:
> http://www.linkedin.com/in/coderplay
> My blog:
> http://coderplay.javaeye.com
