On Mon, Nov 15, 2010 at 05:56:44PM +0200, Nir Muchtar wrote:

> > I'd really prefer that this not be separate modules. I think it would
> > be good to stick the core stuff as part of ib_uverbs. Especially since
> > it doesn't look too big. For embedded you can have a
> > CONFIG_RDMA_NETLINK or something. (For embedded I think it would use
> > less memory to be able to forgo sysfs and use netlink entirely, someday.)

> Well, the main reason for the module separation is to allow
> extensibility and independence.

Those are orthogonal issues; you can keep a pluggable API without
having modules around it.

> Code separation is just a bonus. I like the idea of having the runtime
> option to use this interface (or not).

Well, I do not like that option at all. It means you can't rely on
netlink being available, so people won't use it. If you have to rely
on an admin to add a bunch of module names to /etc/modules then you
have already lost, IMHO. Modules are best used when auto-detection is
possible; other cases are troublesome. I've already seen this: using
ib_ucm's interface, for instance, is virtually impossible because most
sites don't load the module, and the end-users who want to run the
software that relies on it can't get an admin to install it.

More non-automatic modules == bad.

> What I wanted to achieve is an IB independent infrastructure that can be
> used in parts. 

That doesn't seem useful; what is really needed here is an RDMA
*dependent* netlink interface. I.e., I think your plug-in point is in
the wrong place: ib_verbs should be enumerating all QPs and calling
back to the code that owns them to fill in additional information for
each QP. SRP can annotate what the host ID is, RDMA-CM can include the
IP address, IB-CM can include the path records, etc.
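
A minimal sketch of that plug-in point (the ibnl_* names here are
invented for illustration, nothing like this exists in the tree today,
and locking is omitted):

#include <linux/list.h>
#include <linux/skbuff.h>
#include <rdma/ib_verbs.h>

struct ibnl_qp_annotator {
        struct list_head list;
        /* Return 0 on success, -EMSGSIZE if the skb ran out of room. */
        int (*fill)(struct sk_buff *skb, struct ib_qp *qp);
};

static LIST_HEAD(qp_annotators);

/* RDMA-CM, IB-CM, SRP, etc. would each call this at module init. */
void ibnl_register_qp_annotator(struct ibnl_qp_annotator *a)
{
        list_add_tail(&a->list, &qp_annotators);
}

/* Called by the core dump loop for every QP it enumerates. */
static int ibnl_annotate_qp(struct sk_buff *skb, struct ib_qp *qp)
{
        struct ibnl_qp_annotator *a;
        int ret;

        list_for_each_entry(a, &qp_annotators, list) {
                ret = a->fill(skb, qp);
                if (ret)
                        return ret;
        }
        return 0;
}

That way the core doesn't need to know anything about the services;
each service only fills in the attributes it owns.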

> This way only the modules of interest are joined to the infrastructure. 
> The necessity of this flexibility can be examined of course.

*shrug* why would anyone care except for embedded?

> I agree. That's one of the main goals.
> For example, I have plans for adding ipoib exports as well as other
> ideas.

I have a patch someplace that exports the IPOIB path as part of the
normal netlink neighbour dump, which, IMHO, is the appropriate place
for most IPOIB information. I can send it to you if you like; a
similar approach could be taken for the multicast paths. The locking
was problematic, which is why it was never sent to the list.
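
The idea, roughly (a sketch only: NDA_IPOIB_PATH and
ipoib_path_from_neigh() are invented names, and the real patch needs a
proper hook in the neighbour dump, which is where the locking gets
hairy):

#include <net/netlink.h>
#include <net/neighbour.h>
#include "ipoib.h"              /* driver-internal, for struct ipoib_path */

#define NDA_IPOIB_PATH 0x100    /* placeholder attribute type */

/* Append the resolved path record to one neighbour's dump message. */
static int ipoib_fill_neigh(struct sk_buff *skb, struct neighbour *neigh)
{
        struct ipoib_path *path = ipoib_path_from_neigh(neigh); /* invented */

        if (!path)
                return 0;       /* no path resolved yet, nothing to add */
        return nla_put(skb, NDA_IPOIB_PATH, sizeof(path->pathrec),
                       &path->pathrec);
}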

> > I'll comment on what you have specifically later, but just a quick
> > glance makes me wonder if you reviewed how the 'ss' program exchanges
> > very similar information over netlink for IP sockets when you designed
> > this??

> Yes I have actually. Some ideas are from NETLINK_INET_DIAG which is the
> back-end for ss.
> There are a few differences here that made the result different. 
> I'd say this is a mix between NETLINK_INET_DIAG and NETLINK_NETFILTER.

Well, I don't see it; your code should have calls to
netlink_dump_start and lots of calls to
RTA_DATA(__RTA_PUT(skb, attrtype, attrlen)).

I.e., the NETLINK_INET_DIAG reply is returned as a series of netlink
messages carrying inet_diag_msg structures, with sub-structures such
as INET_DIAG_INFO/INET_DIAG_VEGASINFO/etc, terminated by NLMSG_DONE.
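
In outline, the per-entry pattern looks like this (RDMA_NL_GET_QP and
QP_ATTR_QPN are placeholder names for what such an interface might
define):

#include <net/netlink.h>
#include <rdma/ib_verbs.h>

#define RDMA_NL_GET_QP 0x10     /* placeholder message type */
enum { QP_ATTR_UNSPEC, QP_ATTR_QPN };

/*
 * Emit one NLM_F_MULTI message per table entry from the dump callback
 * that netlink_dump_start() drives; the netlink core appends
 * NLMSG_DONE once the callback finally returns 0.
 */
static int qp_fill_one(struct sk_buff *skb, struct ib_qp *qp,
                       u32 pid, u32 seq)
{
        struct nlmsghdr *nlh;

        nlh = nlmsg_put(skb, pid, seq, RDMA_NL_GET_QP, 0, NLM_F_MULTI);
        if (!nlh)
                return -EMSGSIZE;

        if (nla_put_u32(skb, QP_ATTR_QPN, qp->qp_num))
                goto cancel;

        nlmsg_end(skb, nlh);
        return 0;

cancel:
        nlmsg_cancel(skb, nlh);
        return -EMSGSIZE;
}

Userspace then walks the reply with the standard NLMSG_NEXT/NLMSG_OK
loop, so partial reads and re-calling recv() just work.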

What you have done is just concatenate rdma_cm_id_stats structures,
which is not extensible, doesn't have natural netlink message
boundaries to let userspace re-call recv, and introduces a 32/64 bit
issue (a big no-go: a raw struct layout with longs, pointers, or
implicit padding will not match between a 32-bit userspace and a
64-bit kernel).

So for QPs I'd imagine something similar: a netlink message for each
QP, carrying basic information like the QPN and RDMA device ID, then
sub-structures like QP_RDMA_CM (port numbers and IP addresses),
QP_IB_CM (path records), etc, each holding the additional information
provided by that service.
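
Concretely, something like this (illustrative names only, fleshing out
the placeholders from the sketch above):

#include <net/netlink.h>

enum {
        QP_ATTR_UNSPEC,
        QP_ATTR_QPN,            /* u32: queue pair number */
        QP_ATTR_DEV_INDEX,      /* u32: RDMA device ID */
        QP_ATTR_RDMA_CM,        /* nested: port numbers, IP addresses */
        QP_ATTR_IB_CM,          /* nested: path records */
        __QP_ATTR_MAX,
};

enum {
        RDMA_CM_ATTR_UNSPEC,
        RDMA_CM_ATTR_SPORT,     /* u16: source port, host order */
        RDMA_CM_ATTR_DPORT,     /* u16: destination port, host order */
        __RDMA_CM_ATTR_MAX,
};

/* How the RDMA-CM annotator might fill its nested group: */
static int fill_rdma_cm(struct sk_buff *skb, __be16 sport, __be16 dport)
{
        struct nlattr *nest = nla_nest_start(skb, QP_ATTR_RDMA_CM);

        if (!nest)
                return -EMSGSIZE;
        if (nla_put_u16(skb, RDMA_CM_ATTR_SPORT, ntohs(sport)) ||
            nla_put_u16(skb, RDMA_CM_ATTR_DPORT, ntohs(dport))) {
                nla_nest_cancel(skb, nest);
                return -EMSGSIZE;
        }
        nla_nest_end(skb, nest);
        return 0;
}

Since each group is a length-delimited nested attribute, old userspace
silently skips groups it doesn't understand, and new services can add
their own group without breaking anything.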

And like I said, maybe today you only dump the RDMA-CM table, but the
userspace API should be built to dump the entire QP table and support
QPs created without a CM, with UCM, and with RDMA-CM, which is a
trivial API to build if you use netlink the way it was meant to be
used.

Jason