Re: [ewg] Re: [PATCH] rdmaoe/libibverbs: handle binary compatibility
On Thu, Dec 10, 2009 at 02:49:45PM -0700, Jason Gunthorpe wrote: > > - Change the library API for ibv_port_attr to include a link_layer > - Change the kernel API to retrieve link_layer > - Add ibv_cmd_get_mac and other stuff to support RDMAoE > - Update verbs examples to support RDMAoE > - Update man pages (you missed these) > Will be addressed in the next patch set. > >--- a/include/infiniband/kern-abi.h > >+++ b/include/infiniband/kern-abi.h > >@@ -46,7 +46,7 @@ > > * The minimum and maximum kernel ABI that we can handle. > > */ > > #define IB_USER_VERBS_MIN_ABI_VERSION 1 > > -#define IB_USER_VERBS_MAX_ABI_VERSION 6 > > +#define IB_USER_VERBS_MAX_ABI_VERSION 7 > > Whats this about? That seems like it needs a much bigger review, > changing the kernel ABI version instantly breaks every existing > libibverbs, shouldn't be done without alot of discussion!! I think we can do without chagning the ABI version so I am going to ommit it in the next patch set. > > Extra include? Yes, thanks. > > >@@ -86,6 +86,7 @@ default_symver(__ibv_query_device, ibv_query_device); > > int __ibv_query_port(struct ibv_context *context, uint8_t port_num, > > struct ibv_port_attr *port_attr) > > { > >+ port_attr->link_layer = IBV_LINK_LAYER_UNSPECIFIED; > >return context->ops.query_port(context, port_num, port_attr); > > } > > Seems like this should be just > return ___ibv_query_port(context,port_num,port_attr); > A leftover, thanks. > > diff --git a/drivers/infiniband/core/uverbs_cmd.c > > b/drivers/infiniband/core/uverbs_cmd.c > > index 012aadf..d592bd2 100644 > > +++ b/drivers/infiniband/core/uverbs_cmd.c > > @@ -452,7 +452,8 @@ ssize_t ib_uverbs_query_port(struct ib_uverbs_file > > *file, > > resp.active_width= attr.active_width; > > resp.active_speed= attr.active_speed; > > resp.phys_state = attr.phys_state; > > - resp.transport = attr.transport; > > + resp.transport = attr.transport == RDMA_TRANSPORT_RDMAOE ? > > + IB_LINK_LAYER_ETHERNET : IB_LINK_LAYER_INFINIBAND; > > Are you going to change the kernel patches to use the new link_layer > name? > Yes, in the next patch set. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ewg] Re: [PATCH] rdmaoe/libibverbs: handle binary compatibility
On Thu, Dec 10, 2009 at 01:50:00PM -0800, Sean Hefty wrote: > >@@ -306,7 +314,7 @@ static struct pingpong_context *pp_init_ctx(struct > >ibv_device *ib_dev, int size, > > return NULL; > > } > > > >-memset(ctx->buf, 0, size); > >+memset(ctx->buf, 0x7b + is_server, size); > > > > ctx->context = ibv_open_device(ib_dev); > > if (!ctx->context) { > >@@ -481,6 +489,7 @@ static void usage(const char *argv0) > > printf(" -n, --iters=number of exchanges (default 1000)\n"); > > printf(" -l, --sl= service level value\n"); > > printf(" -e, --events sleep on CQ events (default poll)\n"); > >+printf(" -g, --gid= gid of the other port\n"); > > It seems easier for the user if the tests discovered its local GID and > exchanged > it with the remote side like it does with the LID and QPN. > Will be changed in the next patch set -- the user will specify the index to the GID table of the local port. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
InfiniBand/RDMA merge plans for 2.6.33
Since 2.6.32 is already out, it's probably a good time (or at least I shouldn't wait longer!) to talk about 2.6.33 merge plans. Unfortunately I lost pretty much a week and a half after Thanksgiving to the flu, so I'm pretty late and didn't get as much done as I hoped. Anyway, all the pending things that I'm aware of are listed below. I've merged and pushed out what I plan to merge in my for-next branch, and everything has cooked a couple of days in -next, so I'll send a pull request to Linus shortly. Boilerplate: If something isn't already in my tree and it isn't listed below, I probably missed it or dropped it unintentionally. Please remind me. As usual, when submitting a patch: - Give a good changelog that explains what issue your patch addresses, how you address the issue, how serious the issue is, and any other information that would be useful to someone evaluating your patch or reading it years from now. - Please make sure that you include a "Signed-off-by:" line, and put any extra junk that should not go into the final kernel log *after* the "---" line so that git tools strip it off automatically. Make the subject line be appropriate for inclusion in the kernel log as well once the leading "[PATCH ...]" stuff is stripped off. I waste a lot of time fixing patches by hand that could otherwise be spent doing something productive like watching youtube. - Run your patch through checkpatch.pl so I don't have to nag you to fix trivial issues (or spend time fixing them myself). - Read your patch over so I don't see a memory leak or deadlock as soon as I look at it. - Build your patch with sparse checking ("C=2 CF=-D__CHECK_ENDIAN__") and make sure it doesn't introduce new warnings. (A big bonus in goodwill for sending patches that fix old warnings) - Test your patch on a kernel with things like slab debugging and lockdep turned on. And while you're waiting for me to get to your patch, I sure wouldn't mind if you read and commented on someone else's patch. None of this means you shouldn't remind me about pending patches, since I often lose track of things and drop them accidentally. Core: - Merged a series improving IPv6 CMA support, with work from David Wilder, Jason Gunthorpe and Sean Hefty. - Merged a bunch of small fixes from various people, including a number found via code analysis. ULPs: - Nothing major -- I think maybe one IPoIB fix and one iSER fix. HW specific: - Merged a series of fixes from Frank Zago to make the error handling behavior of various post_send / post_recv implementations more consistent. - Merged a bunch of fixes and cleanups for the nes driver. Nice to see continuing work to improve the code from Intel. Here are a few topics that are not ready in time for the 2.6.33 window and will need to wait for 2.6.34 at least: - Userspace MMU notifiers ("ummunotify"). When I requested that this be pulled earlier, I got feedback asking me to explore moving the userspace ABI to the perf events framework. I haven't had time to do this yet, so things are kind of stalled. I'm not convinced that perf events fit ummunotify particularly well, but we need to see. It also might be worth exploring again whether we can define a set of semantics tightly enough to do kernel-level MR caching to avoid the complications of defining a more general kernel facility. - SRP faster failover. I haven't had a chance to look closely at the latest patches. I still have the feeling that some corner cases are not being handled, but the main problem here was my lack of time. If anyone else (Dave Dillow?) wants to weigh in, that would be appreciated too. - mlx4 SR-IOV support. Again, main problem was my lack of time. I agree in principle with this stuff, just want to be careful that we don't turn the mlx4 driver into an unmaintainable mess of "if (sriov) something; else something_else" all over. - New QLogic qib driver. Needs at least one more iteration of cleanups; and I have not had time to look at the latest code in detail to see exactly what cleanups are needed. I am concerned that QLogic chose to abandon the ipath driver as unmaintainable, and now wants to replace it with an even bigger driver (measured by lines of code) that does not support the HT device that ipath supported. How can we make sure qib has a longer future? - Jack's XRC patch set. I still need time to work through and clean up the code. I am in the middle of adding reference counting to handle sharing XRC domains between processes, and once that is done I'll need to merge the mlx4 changes (and hopefully take some of the generic reference counting back into the core). If someone else wants to take a stab, that would be fine by me. - IBoE. In principle I think this is starting to get there. Still want to see better ABI compatibility at least, and also make sur
Re: [ewg] Re: [PATCH] rdmaoe/libibverbs: handle binary compatibility
On Mon, Dec 14, 2009 at 10:41:27AM +0200, Eli Cohen wrote: > On Thu, Dec 10, 2009 at 01:50:00PM -0800, Sean Hefty wrote: > > >@@ -306,7 +314,7 @@ static struct pingpong_context *pp_init_ctx(struct > > >ibv_device *ib_dev, int size, > > > return NULL; > > > } > > > > > >- memset(ctx->buf, 0, size); > > >+ memset(ctx->buf, 0x7b + is_server, size); > > > > > > ctx->context = ibv_open_device(ib_dev); > > > if (!ctx->context) { > > >@@ -481,6 +489,7 @@ static void usage(const char *argv0) > > > printf(" -n, --iters=number of exchanges (default 1000)\n"); > > > printf(" -l, --sl= service level value\n"); > > > printf(" -e, --events sleep on CQ events (default poll)\n"); > > >+ printf(" -g, --gid= gid of the other port\n"); > > > > It seems easier for the user if the tests discovered its local GID and > > exchanged > > it with the remote side like it does with the LID and QPN. > > > > Will be changed in the next patch set -- the user will specify the > index to the GID table of the local port. It would be nice if the tools would consistently let you choose the source device based on gid.. I've been meaning to make a patch for verbs to add a ibv_open_device_by_gid() function or something like that. Using gid idx is pretty unfriendly ... But it is just a test prorgam, no worries. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: InfiniBand/RDMA merge plans for 2.6.33
On Mon, 2009-12-14 at 10:37 -0800, Roland Dreier wrote: > - New QLogic qib driver. Needs at least one more iteration of >cleanups; and I have not had time to look at the latest code in >detail to see exactly what cleanups are needed. I am concerned >that QLogic chose to abandon the ipath driver as unmaintainable, >and now wants to replace it with an even bigger driver (measured >by lines of code) that does not support the HT device that ipath >supported. How can we make sure qib has a longer future? I understand your frustration in having to deal with a large amount of code. If you included all the Mellanox firmware in the mlx4 driver, it would be huge too. I'm limited in what I can do given the complexity of the IBTA spec. QLogic is not "abandoning the ipath driver as unmaintainable". The thought was that trying to incrementally patch in all the changes needed to support dual ports, QDR, chip register addresses, etc. would result in larger patches than renaming the driver. It was a chicken-and-egg problem because until the new code was fully written and debugged, we couldn't post it and we couldn't patch ipath until we knew all the places that needed to be changed. We had a pretty strong business requirement to get support for our QDR product into OFED 1.5 and the IB interrop tests required we get something into OFED. The chip schedule just didn't leave time to debug everything, get it reviewed and then merge with OFED. Also, I was out for 5 weeks and that delayed the submission to linux-rdma. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SRPT and SCST
On Thu, Nov 5, 2009 at 9:51 AM, Philip Pokorny wrote: >> >> [ ... ] >> >> [ 8864.326702] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL: >> incrementing retry_cmds 1 >> [ 8864.326709] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished, >> direct retry (finished_cmds=2023031, tgt->finished_cmds=2023137, >> retry_cmds=0) >> [ ... ] (replying to an e-mail of one month ago) The above issue should be fixed in SCST revision 1391, together with the following changes: * Support for OFED 1.5. * The maximum supported value for the ib_srp kernel module parameter srp_sg_tablesize has been increased from 58 to 128, at least when ib_srpt is loaded with default parameters. Modifying the new ib_srpt kernel module parameter srp_max_rdma_size makes it possible to support even higher values of srp_sg_tablesize. * Various small performance improvements. The results I obtained with QDR HCA's, a RAM disk as target and a vanilla 2.6.27.39 kernel + CFQ on the initiator system are: * for asynchronous (buffered) I/O, writing speeds up to 1070 MB/s (srp_sg_tablesize = 128; block size = 16 KB). * for asynchronous (buffered) I/O, reading speeds up to 930 MB/s (srp_sg_tablesize = 12; block size = 16 KB). * for direct (non-buffered) I/O, writing speeds up to 1000 MB/s (srp_sg_tablesize = 32; block size = 2 MB). * for direct (non-buffered) I/O, reading speeds up to 1710 MB/s (srp_sg_tablesize = 16; block size = 64 MB). Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
SRP issues with OpenSM 3.3.3
Sasha, Hal, I have found that the following patch caused our SRP connected storage to break. patch: 3d20f82edd3246879063b77721d0bcef927bdc48 opensm/osm_sa_path_record.c: separate router guid resolution code Move off subnet destination (router address) resolution code to separate function to improve readability. Signed-off-by: Sasha Khapyorsky I further traced the problem to pr_rcv_build_pr and the patch below fixes the bug. 16:03:34 > git diff diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c index be0cd71..32c889f 100644 --- a/opensm/opensm/osm_sa_path_record.c +++ b/opensm/opensm/osm_sa_path_record.c @@ -800,7 +800,7 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const osm_port_t * p_src_port, /* Only set HopLimit if going through a router */ if (is_nonzero_gid) - p_pr->hop_flow_raw |= cl_hton32(IB_HOPLIMIT_MAX); + p_pr->hop_flow_raw |= IB_HOPLIMIT_MAX; p_pr->pkey = p_parms->pkey; ib_path_rec_set_sl(p_pr, p_parms->sl); "hop_flow_raw" is not really a 32bit value but rather 4 fields: Component Length(bits) Offset(bits) RawTraffic1 352 Reserved 3 353 FlowLabel 20 356 HopLimit 8 384 However, I don't understand the comment "Only set HopLimit if going through a router"? Was the intent to only set p_dgid in pr_rcv_get_end_points if we are heading through a router? I don't think it matters if we are going through a router or not does it? If not, I think the comment needs to be removed in the patch above. Thanks, Ira -- Ira Weiny Math Programmer/Computer Scientist Lawrence Livermore National Lab 925-423-8008 wei...@llnl.gov -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html