Hi, Hyunsik,

Hadoop was born before Java NIO became popular, so Hadoop's IPC was
originally written with blocking I/O. Recent versions of Hadoop and
YARN have changed the IPC implementation to use NIO, but for historical
reasons it is not a typical use of Java NIO. We benchmarked YarnRPC:
its throughput is no more than 50,000 ops, and what's worse, to get even
that result you have to increase the number of RPC handlers.
We've developed another RPC framework that follows the best practices
for Java NIO. Built on mina2/netty/grizzly, with a good I/O thread
model, good memory management, and careful avoidance of memory copies
and system context switches, we pushed the throughput up to 168,000 ops.
See http://code.google.com/p/nfs-rpc/
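
To make the thread-model point concrete, here is a minimal sketch of the
selector-based, non-blocking style that mina2/netty/grizzly are built around:
a single selector thread multiplexes all connections, and a direct buffer
keeps socket I/O off the Java heap. This is only an illustrative echo server;
the class and helper names are made up, and it is not code from nfs-rpc or
Hadoop.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

public class NioEchoSketch {

    // One selector thread multiplexes accept/read events for every
    // connection, instead of one blocking thread per connection.
    static void serve(ServerSocketChannel server, Selector selector) throws IOException {
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        // A direct buffer lets the kernel fill/drain it without an extra heap copy.
        ByteBuffer buf = ByteBuffer.allocateDirect(4096);
        while (selector.isOpen()) {
            if (selector.select(100) == 0) continue;
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel ch = server.accept();
                    ch.configureBlocking(false);
                    ch.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel ch = (SocketChannel) key.channel();
                    buf.clear();
                    int n = ch.read(buf);
                    if (n < 0) { key.cancel(); ch.close(); continue; }
                    buf.flip();
                    while (buf.hasRemaining()) ch.write(buf); // echo the bytes back
                }
            }
        }
    }

    // Demonstration helper: starts the event loop, round-trips one message.
    public static String echoOnce(String msg) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        int port = ((InetSocketAddress) server.getLocalAddress()).getPort();
        Thread loop = new Thread(() -> {
            try { serve(server, selector); } catch (Exception ignored) { }
        });
        loop.setDaemon(true);
        loop.start();
        byte[] payload = msg.getBytes(StandardCharsets.UTF_8);
        try (SocketChannel client = SocketChannel.open(new InetSocketAddress("127.0.0.1", port))) {
            client.write(ByteBuffer.wrap(payload));
            ByteBuffer in = ByteBuffer.allocate(payload.length);
            while (in.hasRemaining() && client.read(in) >= 0) { }
            in.flip();
            return StandardCharsets.UTF_8.decode(in).toString();
        } finally {
            selector.close();
            server.close();
        }
    }
}
```

A real RPC layer would add message framing and handler dispatch on top of
this loop, but the core event loop has the same shape.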

Thanks,
Min

On Sun, Sep 16, 2012 at 11:15 AM, Hyunsik Choi <[email protected]> wrote:

> Ted,
>
> Thank you for the detailed description. I agree with that.
> I mentioned YarnRPC in terms of productivity.
>
> --
> Hyunsik Choi
>
> On Sun, Sep 16, 2012 at 6:16 AM, Ted Dunning <[email protected]>
> wrote:
>
> > YarnRPC is based on the original Hadoop RPC.  The major change is the
> > change from Jute serialization to protobufs.
> >
> > Some of the limitations that I know of are:
> >
> > - only has a Java implementation
> >
> > - uses lots of synchronization instead of using lockless structures where
> > possible
> >
> > - it is only client/server, not peer to peer
> >
> > - it doesn't support actor-style messages
> >
> > I have to admit that I haven't read the details for some time.
> >
> > On Sat, Sep 15, 2012 at 6:59 AM, Hyunsik Choi <[email protected]
> > >wrote:
> >
> > > You are right. I missed that ProtobufRpcEngine transforms the proto
> > > message into an RpcResponseWritable object. Currently, it is not
> > > portable to other languages.
> > >
> > > However, why do you think YarnRPC is inefficient? I am just curious =)
> > >
> > > --
> > > Hyunsik Choi
> > >
> > > On Sat, Sep 15, 2012 at 9:26 PM, Min Zhou <[email protected]> wrote:
> > >
> > > > YarnRPC -1
> > > >
> > > > In my experience it's quite inefficient, and it currently doesn't
> > > > support multiple languages.
> > > >
> > > > On Sat, Sep 15, 2012 at 7:51 PM, Hyunsik Choi <
> [email protected]
> > > > >wrote:
> > > >
> > > > > +1 for json as initial data format
> > > > >
> > > > > In addition, I recommend YarnRPC with protocol buffer for internal
> > RPC
> > > > and
> > > > > API RPC. Protocol buffer is portable to other languages. If we use
> > > > another
> > > > > RPC system, we have to additionally consider the security aspect of
> > > > Hadoop.
> > > > >
> > > > > --
> > > > > Hyunsik Choi
> > > > >
> > > > > On Sat, Sep 15, 2012 at 8:39 PM, Min Zhou <[email protected]>
> > wrote:
> > > > >
> > > > > > There should be two types of serialization method. One requires a
> > > > > > defined schema, for use in RPC and user wire APIs; the other needs
> > > > > > no schema, typically for internal data transfer. I think fastjson
> > > > > > or kryo is quite suitable for the latter purpose.
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Min
> > > > > >
> > > > > > On Sat, Sep 15, 2012 at 5:49 PM, Michael Hausenblas <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > > >
> > > > > > > Point taken … +1 for protobuf - from my POV we can close
> ISSUE-1
> > > > > > >
> > > > > > > > The question of an internal wire format, btw, does not
> > constrain
> > > > the
> > > > > > > project relative to external access.
> > > > > > >
> > > > > > > Sounds sensible.
> > > > > > >
> > > > > > > The one thing I really don't get is: why did you put Avro and
> > > > > > > JSON into the proposal [1] in the first place? Or is this the
> > > > > > > 'external access' from above?
> > > > > > >
> > > > > > > Cheers,
> > > > > > >            Michael
> > > > > > >
> > > > > > > [1] http://wiki.apache.org/incubator/DrillProposal
> > > > > > >
> > > > > > > --
> > > > > > > Michael Hausenblas
> > > > > > > Ireland, Europe
> > > > > > > http://mhausenblas.info/
> > > > > > >
> > > > > > > On 14 Sep 2012, at 22:31, Ted Dunning wrote:
> > > > > > >
> > > > > > > > I think that it is important to ask a few questions leading up
> > > > > > > > to a decision here.
> > > > > > > >
> > > > > > > > The first is a (rhetorical) show of hands about how many
> people
> > > > > believe
> > > > > > > > that there are no serious performance or expressivity killers
> > > when
> > > > > > > > comparing alternative serialization frameworks.  As far as I
> > > know,
> > > > > > > > performance differences are not massive (and protobufs is one
> > of
> > > > the
> > > > > > > > leaders in any case) and the expressivity differences are
> > > > essentially
> > > > > > > nil.
> > > > > > > > If somebody feels that there is a serious show-stopper with
> any
> > > > > option,
> > > > > > > > they should speak.
> > > > > > > >
> > > > > > > > The second is to ask the sense of the community: whether they
> > > > > > > > judge progress or perfection to be most important to the
> > > > > > > > project in this decision.  My guess is that almost everybody
> > > > > > > > would prefer to see progress as long as the technical choice is
> > > > > > > > not subject to some horrid missing bit.
> > > > > > > >
> > > > > > > > The final question is whether it is reasonable to go along
> with
> > > > > > protobufs
> > > > > > > > given that several very experienced engineers prefer it and
> > would
> > > > > like
> > > > > > to
> > > > > > > > produce code based on it.  If the first two questions are
> > > > > > > > answered to the effect that protobufs is about as good as we
> > > > > > > > will find, and that progress trumps small differences, then
> > > > > > > > moving to follow Jason and Ryan's preference for protobufs
> > > > > > > > seems like a reasonable thing to do.
> > > > > > > >
> > > > > > > > The question of an internal wire format, btw, does not
> > constrain
> > > > the
> > > > > > > > project relative to external access.  I think it is important
> > to
> > > > > > support
> > > > > > > > JDBC and ODBC and whatever is in common use for querying.
>  For
> > > > > external
> > > > > > > > access the question is quite different.  Whereas for the
> > internal
> > > > > > format
> > > > > > > > consensus around a single choice has large benefits, the
> > external
> > > > > > format
> > > > > > > > choice is nearly the opposite.  For an external format,
> > limiting
> > > > > > > ourselves
> > > > > > > > to a single choice seems like a bad idea and increasing the
> > > > audience
> > > > > > > seems
> > > > > > > > like a better choice.
> > > > > > > >
> > > > > > > > On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <
> > > [email protected]>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Hi folks,
> > > > > > > >>
> > > > > > > >> I just commented on this first JIRA.  Here is my text:
> > > > > > > >>
> > > > > > > >> This issue has been hashed over a lot in the Hadoop projects.
> > > > > > > >> There was work done to compare thrift vs avro vs protobuf.
> > > > > > > >> The conclusion was to use protobuf.
> > > > > > > >>
> > > > > > > >> Prior to this move, there had been a lot of noise about
> > > pluggable
> > > > > RPC
> > > > > > > >> transports, and whatnot. It held up adoption of a backwards
> > > > > compatible
> > > > > > > >> serialization framework for a long time. The problem ended up
> > > > > > > >> being the analysis-paralysis rather than any specific
> > > > > > > >> implementation problem. In other words, the problem was a LACK
> > > > > > > >> of implementation rather than actual REAL problems.
> > > > > > > >>
> > > > > > > >> Based on this experience, I'd strongly suggest adopting
> > > > > > > >> protobuf and moving on. Forget about pluggable RPC
> > > > > > > >> implementations; the complexity doesn't deliver benefits. The
> > > > > > > >> benefit of protobuf is that it's the RPC format for Hadoop and
> > > > > > > >> HBase, which allows Drill to draw on the broad experience of
> > > > > > > >> those communities, who need to implement high-performance,
> > > > > > > >> backwards-compatible RPC serialization.
> > > > > > > >>
> > > > > > > >> ====
> > > > > > > >>
> > > > > > > >> Expanding a bit, I've looked into this issue a lot, and there
> > > > > > > >> are very few significant concrete reasons to choose protobuf
> > > > > > > >> vs thrift.  A tiny percent faster at this or that, etc.  I'd
> > > > > > > >> strongly suggest protobuf for the expanded community.  There
> > > > > > > >> is no particular imperative that Apache projects re-use
> > > > > > > >> libraries.  Use what makes sense for your project.
> > > > > > > >>
> > > > > > > >> As regards Avro, it's a fine serialization format for
> > > > > > > >> long-term data retention, but the complexities that exist to
> > > > > > > >> enable that make it non-ideal for RPC.  I know of no one who
> > > > > > > >> uses AvroRPC in any form.
> > > > > > > >>
> > > > > > > >> -ryan
> > > > > > > >>
> > > > > > > >> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <
> > > > [email protected]
> > > > > >
> > > > > > > >> wrote:
> > > > > > > >>> We plan to propose the architecture and interfaces in the
> > next
> > > > > couple
> > > > > > > >>> weeks, which will make it easy to divide the project into
> > clear
> > > > > > > building
> > > > > > > >>> blocks. At that point it will be easier to start
> contributing
> > > > > > different
> > > > > > > >>> data sources, data formats, operators, query languages,
> etc.
> > > > > > > >>>
> > > > > > > >>> The contributions are done in the usual Apache way. It's
> best
> > > to
> > > > > > open a
> > > > > > > >>> JIRA and then post a patch so that others can review and
> > then a
> > > > > > > committer
> > > > > > > >>> can check it in.
> > > > > > > >>>
> > > > > > > >>> On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia <
> > > > > > > >> [email protected]
> > > > > > > >>>> wrote:
> > > > > > > >>>
> > > > > > > >>>> Hi
> > > > > > > >>>>
> > > > > > > >>>> What is the process to become a contributor to Drill?
> > > > > > > >>>>
> > > > > > > >>>> Regards
> > > > > > > >>>> chandan
> > > > > > > >>>>
> > > > > > > >>>> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <
> > > > > [email protected]>
> > > > > > > >> wrote:
> > > > > > > >>>>
> > > > > > > >>>>> Suffice it to say that if *you* think it is important
> > enough
> > > to
> > > > > > > >> implement
> > > > > > > >>>>> and maintain, then the group shouldn't say nay.  The
> > > consensus
> > > > > > stuff
> > > > > > > >>>>> should only block things that break something else.
> >  Additive
> > > > > > > features
> > > > > > > >>>> that
> > > > > > > >>>>> are highly maintainable (or which come with commitments)
> > > > > shouldn't
> > > > > > > >>>>> generally be blocked.
> > > > > > > >>>>>
> > > > > > > >>>>> On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas <
> > > > > > > >>>>> [email protected]> wrote:
> > > > > > > >>>>>
> > > > > > > >>>>>> Good. Feel free to put me down for that, if the group
> as a
> > > > whole
> > > > > > > >> thinks
> > > > > > > >>>>>> that (supporting Thrift) makes sense.
> > > > > > > >>>>>>
> > > > > > > >>>>>
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> --
> > > > > > > >>> Tomer Shiran
> > > > > > > >>> Director of Product Management | MapR Technologies |
> > > 650-804-8657
> > > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
>



-- 
My research interests are distributed systems, parallel computing, and
bytecode-based virtual machines.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com
