Min,

Thank you for your comments.

On Sun, Sep 16, 2012 at 12:59 PM, Min Zhou <[email protected]> wrote:

> Hi, Hyunsik,
>
> Hadoop was born before the java nio became popular. Initially, Hadoop's
> IPC was written in the way of block io. Although, the recent version of
> hadoop and Yarn has change its IPC implementation use nio, but for
> historical reason, its not a typical way how to use java nio. We did a
> benchmark on YarnRPC, the throughput is no more than 50,000 ops, to be
> worse, to earn a good result, you should increase the number of RPC
> handlers.
> We've developed another RPC following the best practice how to use java
> nio. Under the power of mina2/netty/grizzly, with a good io thread-model,
> a good memory management, carefully avoid memory copies, and system
> context switches, we made the throughput up to 168,000 ops.
> see http://code.google.com/p/nfs-rpc/
>
> Thanks,
> Min
>
> On Sun, Sep 16, 2012 at 11:15 AM, Hyunsik Choi <[email protected]> wrote:
>
> > Ted,
> >
> > Thank you for detail description. I agree that.
> > In terms of productivity, I mentioned YarnRPC,
> >
> > --
> > Hyunsik Choi
> >
> > On Sun, Sep 16, 2012 at 6:16 AM, Ted Dunning <[email protected]> wrote:
> >
> > > YarnRPC is based on the original Hadoop RPC. The major change is the
> > > change from Jute serialization to protobufs.
> > >
> > > Some of the limitations that I know of are:
> > >
> > > - only has a Java implementation
> > >
> > > - uses lots of synchronization instead of using lockless structures
> > >   where possible
> > >
> > > - it is only client/server, not peer to peer
> > >
> > > - it doesn't support actor-style messages
> > >
> > > I have to admit that I haven't read the details for some time.
> > >
> > > On Sat, Sep 15, 2012 at 6:59 AM, Hyunsik Choi <[email protected]> wrote:
> > >
> > > > You are right. I missed ProcobufRpcEngine transforms the proto
> > > > message into a RpcResponseWritable object. Currently, It is not
> > > > portable to other languages.
> > > >
> > > > However, why do you think YarnRPC is inefficient? It is just
> > > > curious =)
> > > >
> > > > --
> > > > Hyunsik Choi
> > > >
> > > > On Sat, Sep 15, 2012 at 9:26 PM, Min Zhou <[email protected]> wrote:
> > > >
> > > > > YarnRPC -1
> > > > >
> > > > > That's quite inefficient in my experience and doesn't support
> > > > > multi-languages currently.
> > > > >
> > > > > On Sat, Sep 15, 2012 at 7:51 PM, Hyunsik Choi <[email protected]> wrote:
> > > > >
> > > > > > +1 for json as initial data format
> > > > > >
> > > > > > In addition, I recommend YarnRPC with protocol buffer for
> > > > > > internal RPC and API RPC. Protocol buffer is portable to other
> > > > > > languages. If we use another RPC system, we have to additionally
> > > > > > consider the security aspect of Hadoop.
> > > > > >
> > > > > > --
> > > > > > Hyunsik Choi
> > > > > >
> > > > > > On Sat, Sep 15, 2012 at 8:39 PM, Min Zhou <[email protected]> wrote:
> > > > > >
> > > > > > > There should be 2 types of serialization method. One should
> > > > > > > define its schema, for the use of RPC, user wire API; while
> > > > > > > the other need not define schema, it typically for internal
> > > > > > > data transfer, I think fastjson or kryo is quite suitable for
> > > > > > > the latter purpose.
> > > > > > >
> > > > > > >
> > > > > > > Regards,
> > > > > > > Min
> > > > > > >
> > > > > > > On Sat, Sep 15, 2012 at 5:49 PM, Michael Hausenblas <[email protected]> wrote:
> > > > > > >
> > > > > > > >
> > > > > > > > Point taken … +1 for protobuf - from my POV we can close
> > > > > > > > ISSUE-1
> > > > > > > >
> > > > > > > > > The question of an internal wire format, btw, does not
> > > > > > > > > constrain the project relative to external access.
> > > > > > > >
> > > > > > > > Sounds sensible.
> > > > > > > >
> > > > > > > > The only one thing I really don't get is: why did you put
> > > > > > > > Avro and JSON into the proposal [1] in the first place? Or
> > > > > > > > is this the 'external access' from above?
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Michael
> > > > > > > >
> > > > > > > > [1] http://wiki.apache.org/incubator/DrillProposal
> > > > > > > >
> > > > > > > > --
> > > > > > > > Michael Hausenblas
> > > > > > > > Ireland, Europe
> > > > > > > > http://mhausenblas.info/
> > > > > > > >
> > > > > > > > On 14 Sep 2012, at 22:31, Ted Dunning wrote:
> > > > > > > >
> > > > > > > > > I think that it is important to ask a few questions
> > > > > > > > > leading up a decision here.
> > > > > > > > >
> > > > > > > > > The first is a (rhetorical) show of hands about how many
> > > > > > > > > people believe that there are no serious performance or
> > > > > > > > > expressivity killers when comparing alternative
> > > > > > > > > serialization frameworks. As far as I know, performance
> > > > > > > > > differences are not massive (and protobufs is one of the
> > > > > > > > > leaders in any case) and the expressivity differences are
> > > > > > > > > essentially nil. If somebody feels that there is a serious
> > > > > > > > > show-stopper with any option, they should speak.
> > > > > > > > >
> > > > > > > > > The second is to ask the sense of the community whether
> > > > > > > > > they judge progress or perfection in this decision is most
> > > > > > > > > important to the project. My guess is that almost
> > > > > > > > > everybody would prefer to see progress as long as the
> > > > > > > > > technical choice is not subject to some horrid missing bit.
> > > > > > > > >
> > > > > > > > > The final question is whether it is reasonable to go along
> > > > > > > > > with protobufs given that several very experienced
> > > > > > > > > engineers prefer it and would like to produce code based
> > > > > > > > > on it. If the first two answers are answered to the effect
> > > > > > > > > of protobufs is about as good as we will find and that
> > > > > > > > > progress trumps small differences, then it seems that
> > > > > > > > > moving to follow this preference of Jason and Ryan for
> > > > > > > > > protobufs might be a reasonable thing to do.
> > > > > > > > >
> > > > > > > > > The question of an internal wire format, btw, does not
> > > > > > > > > constrain the project relative to external access. I think
> > > > > > > > > it is important to support JDBC and ODBC and whatever is
> > > > > > > > > in common use for querying. For external access the
> > > > > > > > > question is quite different. Whereas for the internal
> > > > > > > > > format consensus around a single choice has large
> > > > > > > > > benefits, the external format choice is nearly the
> > > > > > > > > opposite. For an external format, limiting ourselves to a
> > > > > > > > > single choice seems like a bad idea and increasing the
> > > > > > > > > audience seems like a better choice.
> > > > > > > > >
> > > > > > > > > On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > >> Hi folks,
> > > > > > > > >>
> > > > > > > > >> I just commented on this first JIRA. Here is my text:
> > > > > > > > >>
> > > > > > > > >> This issue has been hashed over a lot in the Hadoop projects.
> > > > > > > > >> There was work done to compare thrift vs avro vs protobuf.
> > > > > > > > >> The conclusion was protobuf was the decision to use.
> > > > > > > > >>
> > > > > > > > >> Prior to this move, there had been a lot of noise about
> > > > > > > > >> pluggable RPC transports, and whatnot. It held up adoption
> > > > > > > > >> of a backwards compatible serialization framework for a
> > > > > > > > >> long time. The problem ended up being the
> > > > > > > > >> analysis-paralysis, rather than the specific
> > > > > > > > >> implementation problem. In other words, the problem was a
> > > > > > > > >> LACK of implementation than actual REAL problems.
> > > > > > > > >>
> > > > > > > > >> Based on this experience, I'd strongly suggest adopting
> > > > > > > > >> protobuf and moving on. Forget about pluggable RPC
> > > > > > > > >> implementations, the complexity doesnt deliver benefits.
> > > > > > > > >> The benefits of protobuf is that its the RPC format for
> > > > > > > > >> Hadoop and HBase, which allows Drill to draw on the broad
> > > > > > > > >> experience of those communities who need to implement high
> > > > > > > > >> performance backwards compatible RPC serialization.
> > > > > > > > >>
> > > > > > > > >> ====
> > > > > > > > >>
> > > > > > > > >> Expanding a bit, I've looked in to this issue a lot, and
> > > > > > > > >> there is very few significant concrete reasons to choose
> > > > > > > > >> protobuf vs thrift. Tiny percent faster of this, and that,
> > > > > > > > >> etc. I'd strongly suggest protobuf for the expanded
> > > > > > > > >> community. There is no particular Apache imperative that
> > > > > > > > >> Apache projects re-use libraries. Use what makes sense for
> > > > > > > > >> your project.
> > > > > > > > >>
> > > > > > > > >> As regards to Avro, it's a fine serialization format for
> > > > > > > > >> long term data retention, but the complexities that exist
> > > > > > > > >> to enable that make it non-ideal for an RPC. I know of no
> > > > > > > > >> one who uses AvroRPC in any form.
> > > > > > > > >>
> > > > > > > > >> -ryan
> > > > > > > > >>
> > > > > > > > >> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <[email protected]> wrote:
> > > > > > > > >>> We plan to propose the architecture and interfaces in the
> > > > > > > > >>> next couple weeks, which will make it easy to divide the
> > > > > > > > >>> project into clear building blocks. At that point it will
> > > > > > > > >>> be easier to start contributing different data sources,
> > > > > > > > >>> data formats, operators, query languages, etc.
> > > > > > > > >>>
> > > > > > > > >>> The contributions are done in the usual Apache way. It's
> > > > > > > > >>> best to open a JIRA and then post a patch so that others
> > > > > > > > >>> can review and then a committer can check it in.
> > > > > > > > >>>
> > > > > > > > >>> On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia <[email protected]> wrote:
> > > > > > > > >>>
> > > > > > > > >>>> Hi
> > > > > > > > >>>>
> > > > > > > > >>>> Hi
> > > > > > > > >>>>
> > > > > > > > >>>> What is the process to become a contributor to drill ?
> > > > > > > > >>>>
> > > > > > > > >>>> Regards
> > > > > > > > >>>> chandan
> > > > > > > > >>>>
> > > > > > > > >>>> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <[email protected]> wrote:
> > > > > > > > >>>>
> > > > > > > > >>>>> Suffice it to say that if *you* think it is important
> > > > > > > > >>>>> enough to implement and maintain, then the group
> > > > > > > > >>>>> shouldn't say naye. The consensus stuff should only
> > > > > > > > >>>>> block things that break something else. Additive
> > > > > > > > >>>>> features that are highly maintainable (or which come
> > > > > > > > >>>>> with commitments) shouldn't generally be blocked.
> > > > > > > > >>>>>
> > > > > > > > >>>>> On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas <[email protected]> wrote:
> > > > > > > > >>>>>
> > > > > > > > >>>>>> Good. Feel free to put me down for that, if the group
> > > > > > > > >>>>>> as a whole thinks that (supporting Thrift) makes sense.
> > > > > > > > >>>>>>
> > > > > > > > >>>>>
> > > > > > > > >>>>
> > > > > > > > >>>
> > > > > > > > >>> --
> > > > > > > > >>> Tomer Shiran
> > > > > > > > >>> Director of Product Management | MapR Technologies | 650-804-8657
> > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > My research interests are distributed systems, parallel
> > > > > > > computing and bytecode based virtual machine.
> > > > > > >
> > > > > > > My profile:
> > > > > > > http://www.linkedin.com/in/coderplay
> > > > > > > My blog:
> > > > > > > http://coderplay.javaeye.com
> > > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > My research interests are distributed systems, parallel computing
> > > > > and bytecode based virtual machine.
> > > > >
> > > > > My profile:
> > > > > http://www.linkedin.com/in/coderplay
> > > > > My blog:
> > > > > http://coderplay.javaeye.com
> > > > >
> > > >
> > >
> >
>
> --
> My research interests are distributed systems, parallel computing and
> bytecode based virtual machine.
>
> My profile:
> http://www.linkedin.com/in/coderplay
> My blog:
> http://coderplay.javaeye.com
>
