Hi Matt,

How do you plan to support the kernel clients, which go through libceph.ko,
with Accelio/libxio?

Best Regards,
-Dieter

On Wed, Dec 11, 2013 at 11:32:28PM +0100, Matt W. Benjamin wrote:
> Hi Ceph devs,
> 
> For the last several weeks, we've been working with engineers at
> Mellanox on a prototype Ceph messaging implementation that runs on
> the Accelio RDMA messaging service (libxio).
> 
> Accelio is a rather new effort to build a high-performance, high-throughput
> message passing framework atop openfabrics ibverbs and rdmacm primitives.
> 
> It's early days, but the implementation has started to take shape, and
> gives a feel for what the Accelio architecture looks like when using the
> request-response model, as well as for our prototype mapping of the
> xio framework concepts to the Ceph ones.
> 
> The current classes and responsibilities break down roughly as follows.
> The key classes in the TCP messaging implementation are:
> 
> Messenger (abstract, represents a set of bidirectional communication 
> endpoints)
> SimpleMessenger (concrete TCP messenger)
> 
> Message (abstract, models a message between endpoints, all Ceph protocol
> messages derive from Message, obviously)
> 
> Connection (concrete, though it -feels- abstract; Connection models a
> communication endpoint identifiable by address, but has -some- coupling
> with the internals of SimpleMessenger, in particular, with its Pipe, below).
> 
> Pipe (concrete, an active (threaded) object that encapsulates various
> operations on one side (send or recv) of a TCP connection.  The Pipe is
> really where a -lot- of the heavy lifting of SimpleMessenger is localized,
> and not just in the obvious ways: e.g., Pipe drives the dispatch queue in
> SimpleMessenger, so a lot of its visible semantics are built in cooperation
> with Pipe).
> 
> Dispatcher (abstract, models the application processing messages and sending
> replies, i.e., the upper edge of Messenger).
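
The Messenger/Dispatcher relationship described above can be sketched roughly
as follows. This is a simplified illustration, not the Ceph source:
MessengerSketch, PingDispatcher, and the reduced Message struct are
hypothetical stand-ins, and the "first dispatcher that accepts the message
wins" rule is an assumption about how dispatchers are aggregated.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily reduced Message: a type tag plus a payload.
struct Message {
  int type;
  std::string payload;
};

// Upper edge of the messenger: the application implements this interface.
class Dispatcher {
public:
  virtual ~Dispatcher() = default;
  // Return true if this dispatcher consumed the message.
  virtual bool ms_dispatch(Message* m) = 0;
};

// Stand-in for the dispatch side of a Messenger (assumed behavior):
// offer each incoming message to the registered dispatchers in order.
class MessengerSketch {
public:
  void add_dispatcher_tail(Dispatcher* d) { dispatchers.push_back(d); }

  bool deliver(Message* m) {
    for (Dispatcher* d : dispatchers) {
      if (d->ms_dispatch(m)) return true;  // first taker wins
    }
    return false;  // nobody handled it
  }

private:
  std::vector<Dispatcher*> dispatchers;
};

// Example application dispatcher: handles only messages of type 1.
class PingDispatcher : public Dispatcher {
public:
  int handled = 0;

  bool ms_dispatch(Message* m) override {
    if (m->type != 1) return false;
    ++handled;
    return true;
  }
};
```

The point of the sketch is that the dispatcher never sees a socket or a Pipe,
which is what should let the same application code sit on top of either
transport.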
> 
> The approach I took in incorporating Accelio was to build on the key
> abstractions of Messenger, Connection, Dispatcher, and Message, and build
> a corresponding family of concrete classes:
> 
> XioMessenger (concrete, implements Messenger, encapsulates xio endpoints,
> aggregates dispatchers as normal).
> 
> XioConnection (concrete, implements Connection)
> 
> XioPortal (concrete, a new class that represents worker thread contexts for 
> all XioConnections in a given XioMessenger)
> 
> XioMsg (concrete, a "transfer" class linking a sequence of low-level Accelio 
> datagrams with a Message being sent)
> 
> XioReplyHook (concrete, derived from Ceph::Context [indirectly via 
> Message::ReplyHook], links a sequence of low-level Accelio datagrams for a 
> Message that has been received-- that is, part of a new "reply" abstraction 
> exposed to Message and Messenger).
> 
> As noted above, there is some leakage of SimpleMessenger primitives into 
> classes that are intended to be abstract, and some refactoring was needed to 
> fit XioMessenger into the framework.  The main changes I prototyped are as 
> follows:
> 
> All traces of Pipe are removed from Connection, which is made abstract.  A
> new PipeConnection is introduced that knows about Pipes.  SimpleMessenger
> now uses instances of PipeConnection as its concrete connection type.
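
A minimal sketch of the Connection refactoring described above, assuming a
much smaller interface than the real classes have. The member names
(get_peer_addr, is_connected, attach_pipe, on_session_established) and the
empty Pipe struct are hypothetical; only the class names Connection,
PipeConnection, and XioConnection come from the post.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for Pipe, the threaded object that owns a TCP socket.
struct Pipe {
  int fd = -1;
};

// Abstract Connection: an endpoint identified by address, with no
// knowledge of the transport underneath.
class Connection {
public:
  explicit Connection(std::string peer) : peer_addr(std::move(peer)) {}
  virtual ~Connection() = default;

  const std::string& get_peer_addr() const { return peer_addr; }
  virtual bool is_connected() const = 0;

private:
  std::string peer_addr;
};

// TCP flavor: the only Connection subclass that knows about Pipes.
class PipeConnection : public Connection {
public:
  using Connection::Connection;

  void attach_pipe(std::shared_ptr<Pipe> p) { pipe = std::move(p); }
  bool is_connected() const override { return pipe != nullptr; }

private:
  std::shared_ptr<Pipe> pipe;
};

// Accelio flavor: tracks session state instead of a Pipe (assumed detail).
class XioConnection : public Connection {
public:
  using Connection::Connection;

  void on_session_established() { session_up = true; }
  bool is_connected() const override { return session_up; }

private:
  bool session_up = false;
};
```

Code that only holds a Connection reference can then work with either
messenger, while anything that needs the Pipe has to ask for a PipeConnection
explicitly.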
> 
> The most interesting changes I introduced are driven by the need to support
> Accelio's request/response model, which exists mainly to support RDMA memory
> registration primitives, and needs a concrete realization in the Messenger
> framework.
> 
> To accommodate it, I've introduced two concepts.  First, callers replying
> to a Message use a new Messenger::send_reply(Message *msg, Message *reply)
> method.  In SimpleMessenger, this just maps to a call to
> send_message(Message *, Connection*), but in XioMessenger, the reply is
> delivered through a new Message::reply_hook completion functor that
> XioConnection sets when a message is being dispatched.  This is a general
> mechanism: new Messenger implementations can derive from Message::ReplyHook
> to define their own reply behavior, as needed.
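
The send_reply()/reply_hook mechanism described above might look something
like the following. This is a sketch under stated assumptions, not the code in
the branch: the post nests ReplyHook inside Message and derives it from
Context, whereas here it is a free-standing struct for brevity, and the
finish() method, the *Sketch class names, the "xio-rsp:" tag, and the -1 error
return are all hypothetical. Message ownership and reference counting are not
modeled.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Message;

// Hypothetical shape of the completion functor a transport attaches
// to an incoming Message.
struct ReplyHook {
  virtual ~ReplyHook() = default;
  virtual void finish(Message* reply) = 0;
};

// Reduced Connection: just records what was sent on it.
struct Connection {
  std::vector<std::string> sent;
};

struct Message {
  std::string payload;
  Connection* connection = nullptr;       // where the message arrived
  std::unique_ptr<ReplyHook> reply_hook;  // set by the transport, if any

  explicit Message(std::string p) : payload(std::move(p)) {}
};

class Messenger {
public:
  virtual ~Messenger() = default;

  virtual int send_message(Message* m, Connection* con) {
    con->sent.push_back(m->payload);
    return 0;
  }

  // New entry point: reply to msg with reply.
  virtual int send_reply(Message* msg, Message* reply) = 0;
};

// TCP case: a reply is just an ordinary send on the request's connection.
class SimpleMessengerSketch : public Messenger {
public:
  int send_reply(Message* msg, Message* reply) override {
    return send_message(reply, msg->connection);
  }
};

// Accelio case: the hook ties the reply to the request it answers.
struct XioReplyHookSketch : ReplyHook {
  std::vector<std::string>* wire;  // stand-in for the xio response path

  explicit XioReplyHookSketch(std::vector<std::string>* w) : wire(w) {}

  void finish(Message* reply) override {
    wire->push_back("xio-rsp:" + reply->payload);
  }
};

class XioMessengerSketch : public Messenger {
public:
  int send_reply(Message* msg, Message* reply) override {
    if (!msg->reply_hook) return -1;  // no request context to answer
    msg->reply_hook->finish(reply);
    return 0;
  }
};
```

Callers use send_reply() in both cases and do not need to know whether the
reply travels as an independent message or as the response half of a
request/response pair.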
> 
> A lot of low-level details of the mapping from Message to Accelio messaging
> are currently in flux, but the basic idea is to re-use the current
> encode/decode primitives as far as possible, while eliding the acks,
> sequence numbers and tids, and timestamp behaviors of Pipe, or rather,
> replacing them with mappings to Accelio primitives.  I have some wrapper
> classes that help with this.  For the moment, the existing Ceph message
> headers and footers are still there, but are now encoded/decoded, rather
> than hand-marshalled.  This means that checksumming is probably mostly
> intact.  Message signatures are not implemented.
> 
> What works.  The current prototype isn't integrated with the main server
> daemons (e.g., OSD), but experimental work on that is in progress.  I've
> created a pair of simple standalone client/server applications,
> simple_server/simple_client and a matching xio_server/xio_client, that
> provide a minimal message dispatch loop with a new SimpleDispatcher class
> and some other helpers, as a way to work with both messengers side-by-side.
> These are currently very primitive, but will probably do more things soon.
> The current prototype sends messages over Accelio, but has some issues with
> replies, which should be fixed shortly.  It leaks lots of memory, etc.
> 
> We've pushed a work-in-progress branch "xio-messenger" to our external github
> repository, for community review.  Find it here:
> 
> https://github.com/linuxbox2/linuxbox-ceph
> 
> Thanks!
> 
> Matt
> 
> -- 
> Matt Benjamin
> CohortFS, LLC.
> 206 South Fifth Ave. Suite 150
> Ann Arbor, MI  48104
> 
> http://cohortfs.com
> 
> tel.  734-761-4689 
> fax.  734-769-8938 
> cel.  734-216-5309 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html