Hi Matt,

How do you plan to support the kernel clients, which go through libceph.ko, with Accelio/libxio?

Best Regards,
-Dieter

On Wed, Dec 11, 2013 at 11:32:28PM +0100, Matt W. Benjamin wrote:
> Hi Ceph devs,
>
> For the last several weeks, we've been working with engineers at
> Mellanox on a prototype Ceph messaging implementation that runs on
> the Accelio RDMA messaging service (libxio).
>
> Accelio is a rather new effort to build a high-performance, high-throughput
> message passing framework atop openfabrics ibverbs and rdmacm primitives.
>
> It's early days, but the implementation has started to take shape, and
> gives a feel for what the Accelio architecture looks like when using the
> request-response model, as well as for our prototype mapping of the
> xio framework concepts to the Ceph ones.
>
> The current classes and responsibilities break down somewhat as follows.
> The key classes in the TCP messaging implementation are:
>
> Messenger (abstract, represents a set of bidirectional communication
> endpoints)
>
> SimpleMessenger (concrete TCP messenger)
>
> Message (abstract, models a message between endpoints; all Ceph protocol
> messages derive from Message, obviously)
>
> Connection (concrete, though it -feels- abstract; Connection models a
> communication endpoint identifiable by address, but has -some- coupling
> with the internals of SimpleMessenger, in particular, with its Pipe, below).
>
> Pipe (concrete, an active (threaded) object that encapsulates various
> operations on one side (send or recv) of a TCP connection. The Pipe is
> really where a -lot- of the heavy lifting of SimpleMessenger is localized,
> and not just in the obvious ways--e.g., Pipe drives the dispatch queue in
> SimpleMessenger, so a lot of its visible semantics are built in
> cooperation with Pipe).
>
> Dispatcher (abstract, models the application processing messages and sending
> replies--i.e., the upper edge of Messenger).
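To check my reading of the breakdown above, here is a minimal sketch of how the abstract roles (Messenger, Message, Connection, Dispatcher) relate. Everything in it is a simplified stand-in: LoopbackMessenger, PingMessage, CountingDispatcher, run_loopback_demo and the method signatures are hypothetical and are not taken from the Ceph tree.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-ins for the roles described above.
// None of these signatures are copied from Ceph.

struct Message {                 // abstract: a message between endpoints
  virtual ~Message() = default;
  virtual std::string type_name() const = 0;
};

struct PingMessage : Message {   // illustrative concrete protocol message
  std::string type_name() const override { return "ping"; }
};

struct Connection {              // a communication endpoint, identified by address
  std::string peer_addr;
};

struct Dispatcher {              // upper edge: the application handling messages
  virtual ~Dispatcher() = default;
  // Returns true if this dispatcher consumed the message.
  virtual bool ms_dispatch(const Message& m) = 0;
};

class Messenger {                // abstract: a set of communication endpoints
public:
  virtual ~Messenger() = default;
  void add_dispatcher(Dispatcher* d) { dispatchers_.push_back(d); }
  virtual int send_message(const Message& m, Connection& con) = 0;

protected:
  // Offer an incoming message to each dispatcher until one accepts it.
  bool deliver(const Message& m) {
    for (Dispatcher* d : dispatchers_)
      if (d->ms_dispatch(m)) return true;
    return false;
  }

private:
  std::vector<Dispatcher*> dispatchers_;
};

// Toy transport: "sending" hands the message straight to the local dispatchers.
class LoopbackMessenger : public Messenger {
public:
  int send_message(const Message& m, Connection&) override {
    return deliver(m) ? 0 : -1;
  }
};

struct CountingDispatcher : Dispatcher {
  int seen = 0;
  bool ms_dispatch(const Message&) override { ++seen; return true; }
};

// Sends n pings through the loopback messenger; returns how many were dispatched.
int run_loopback_demo(int n) {
  LoopbackMessenger msgr;
  CountingDispatcher disp;
  msgr.add_dispatcher(&disp);
  Connection con{"127.0.0.1:6800"};
  PingMessage ping;
  for (int i = 0; i < n; ++i) msgr.send_message(ping, con);
  return disp.seen;
}
```

The point of the sketch is only that a transport (TCP or Accelio) lives entirely below Messenger, while the Dispatcher side stays unchanged; a real messenger would of course deliver to a remote peer rather than loop back.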
>
> The approach I took in incorporating Accelio was to build on the key
> abstractions of Messenger, Connection, Dispatcher, and Message, and build a
> corresponding family of concrete classes:
>
> XioMessenger (concrete, implements Messenger, encapsulates xio endpoints,
> aggregates dispatchers as normal).
>
> XioConnection (concrete, implements Connection)
>
> XioPortal (concrete, a new class that represents worker thread contexts for
> all XioConnections in a given XioMessenger)
>
> XioMsg (concrete, a "transfer" class linking a sequence of low-level Accelio
> datagrams with a Message being sent)
>
> XioReplyHook (concrete, derived from Ceph::Context [indirectly via
> Message::ReplyHook], links a sequence of low-level Accelio datagrams for a
> Message that has been received--that is, part of a new "reply" abstraction
> exposed to Message and Messenger).
>
> As noted above, there is some leakage of SimpleMessenger primitives into
> classes that are intended to be abstract, and some refactoring was needed to
> fit XioMessenger into the framework. The main changes I prototyped are as
> follows:
>
> All traces of Pipe are removed from Connection, which is made abstract. A new
> PipeConnection is introduced that knows about Pipes. SimpleMessenger now
> uses instances of PipeConnection as its concrete connection type.
>
> The most interesting changes I introduced are driven by the need to support
> Accelio's request/response model, which exists mainly to support RDMA memory
> registration primitives, and needs a concrete realization in the Messenger
> framework.
>
> To accommodate it, I've introduced two concepts. First, callers replying to a
> Message use a new Messenger::send_reply(Message *msg, Message *reply) method.
> In SimpleMessenger, this just maps to a call to send_message(Message *,
> Connection*), but in XioMessenger, the reply is delivered through a new
> Message::reply_hook completion functor that XioConnection sets when a message
> is being dispatched. This is a general mechanism: new Messenger
> implementations can derive from Message::ReplyHook to define their own reply
> behavior, as needed.
>
> A lot of low-level details of the mapping from Message to Accelio messaging
> are currently in flux, but the basic idea is to re-use the current
> encode/decode primitives as far as possible, while eliding the acks,
> sequence # and tids, and timestamp behaviors of Pipe, or rather, replacing
> them with mappings to Accelio primitives. I have some wrapper classes that
> help with this. For the moment, the existing Ceph message headers and
> footers are still there, but are now encoded/decoded, rather than
> hand-marshalled. This means that checksumming is probably mostly intact.
> Message signatures are not implemented.
>
> What works. The current prototype isn't integrated with the main server
> daemons (e.g., OSD) but experimental work on that is in progress. I've
> created a pair of simple standalone client/server applications,
> simple_server/simple_client and a matching xio_server/xio_client, that
> provide a minimal message dispatch loop with a new SimpleDispatcher class
> and some other helpers, as a way to work with both messengers side-by-side.
> These are currently very primitive, but will probably do more things soon.
> The current prototype sends messages over Accelio, but has some issue with
> replies that should be fixed shortly. It leaks lots of memory, etc.
>
> We've pushed a work-in-progress branch "xio-messenger" to our external github
> repository, for community review. Find it here:
>
> https://github.com/linuxbox2/linuxbox-ceph
>
> Thanks!
>
> Matt
>
> --
> Matt Benjamin
> CohortFS, LLC.
> 206 South Fifth Ave.
> Suite 150
> Ann Arbor, MI 48104
>
> http://cohortfs.com
>
> tel. 734-761-4689
> fax. 734-769-8938
> cel. 734-216-5309
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
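For reference, this is roughly how I understand the send_reply / reply_hook mechanism from the quoted message. It is a sketch under my own assumptions, not the code in the xio-messenger branch: only the names Messenger::send_reply(Message *msg, Message *reply), send_message(Message *, Connection *), Message::ReplyHook and Message::reply_hook come from the description; TcpLikeMessenger, HookedMessenger, the payload/sent members and the demo functions are hypothetical, and the real ReplyHook is said to derive from Ceph's Context, which is omitted here.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: records what was "sent" so the demo can inspect it.
struct Connection {
  std::vector<std::string> sent;
};

struct Message {
  // Completion functor a messenger may attach when dispatching a request.
  struct ReplyHook {
    virtual ~ReplyHook() = default;
    virtual void reply(const Message& r) = 0;
  };

  explicit Message(std::string p) : payload(std::move(p)) {}

  std::string payload;
  Connection* con = nullptr;               // connection the message arrived on
  std::shared_ptr<ReplyHook> reply_hook;   // unset for TCP-style messengers
};

class Messenger {
public:
  virtual ~Messenger() = default;
  virtual int send_message(Message* m, Connection* con) = 0;
  virtual int send_reply(Message* msg, Message* reply) = 0;
};

// SimpleMessenger-like behavior: a reply is just a send on the request's connection.
class TcpLikeMessenger : public Messenger {
public:
  int send_message(Message* m, Connection* con) override {
    con->sent.push_back(m->payload);
    return 0;
  }
  int send_reply(Message* msg, Message* reply) override {
    return send_message(reply, msg->con);
  }
};

// XioMessenger-like behavior: the reply goes through the hook set at dispatch time.
class HookedMessenger : public Messenger {
public:
  struct Hook : Message::ReplyHook {
    explicit Hook(std::vector<std::string>* o) : out(o) {}
    void reply(const Message& r) override { out->push_back("hook:" + r.payload); }
    std::vector<std::string>* out;
  };

  int send_message(Message* m, Connection* con) override {
    con->sent.push_back(m->payload);
    return 0;
  }
  int send_reply(Message* msg, Message* reply) override {
    if (!msg->reply_hook) return -1;       // request was never dispatched by us
    msg->reply_hook->reply(*reply);
    return 0;
  }
};

std::string demo_tcp_like() {
  Connection con;
  Message req("req");
  req.con = &con;
  Message rep("rep");
  TcpLikeMessenger m;
  m.send_reply(&req, &rep);
  return con.sent.at(0);
}

std::string demo_hooked() {
  std::vector<std::string> wire;
  Message req("req");
  // Stands in for the connection setting the hook while dispatching the request.
  req.reply_hook = std::make_shared<HookedMessenger::Hook>(&wire);
  Message rep("rep");
  HookedMessenger m;
  m.send_reply(&req, &rep);
  return wire.at(0);
}
```

If this reading is right, callers only ever see send_reply, and the request/response pairing stays inside the messenger. Whether an in-kernel transport could offer the same pairing is what my question about libceph.ko is getting at.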
