Re: [ewg] Re: [PATCH] rdmaoe/libibverbs: handle binary compatibility

2009-12-14 Thread Eli Cohen
On Thu, Dec 10, 2009 at 02:49:45PM -0700, Jason Gunthorpe wrote:
> 
> - Change the library API for ibv_port_attr to include a link_layer
> - Change the kernel API to retrieve link_layer
> - Add ibv_cmd_get_mac and other stuff to support RDMAoE
> - Update verbs examples to support RDMAoE
> - Update man pages (you missed these)
> 

Will be addressed in the next patch set.
> >--- a/include/infiniband/kern-abi.h
> >+++ b/include/infiniband/kern-abi.h
> >@@ -46,7 +46,7 @@
> >  * The minimum and maximum kernel ABI that we can handle.
> >  */
> >  #define IB_USER_VERBS_MIN_ABI_VERSION  1
> > -#define IB_USER_VERBS_MAX_ABI_VERSION  6
> > +#define IB_USER_VERBS_MAX_ABI_VERSION  7
>  
> Whats this about? That seems like it needs a much bigger review,
> changing the kernel ABI version instantly breaks every existing
> libibverbs, shouldn't be done without alot of discussion!!
I think we can do without chagning the ABI version so I am going to
ommit it in the next patch set.

> 
> Extra include?

Yes, thanks.


> 
> >@@ -86,6 +86,7 @@ default_symver(__ibv_query_device, ibv_query_device);
> > int __ibv_query_port(struct ibv_context *context, uint8_t port_num,
> > struct ibv_port_attr *port_attr)
> > {
> >+   port_attr->link_layer = IBV_LINK_LAYER_UNSPECIFIED;
> >return context->ops.query_port(context, port_num, port_attr);
> > }
> 
> Seems like this should be just
>  return ___ibv_query_port(context,port_num,port_attr);
> 

A leftover, thanks.

> > diff --git a/drivers/infiniband/core/uverbs_cmd.c 
> > b/drivers/infiniband/core/uverbs_cmd.c
> > index 012aadf..d592bd2 100644
> > +++ b/drivers/infiniband/core/uverbs_cmd.c
> > @@ -452,7 +452,8 @@ ssize_t ib_uverbs_query_port(struct ib_uverbs_file 
> > *file,
> > resp.active_width= attr.active_width;
> > resp.active_speed= attr.active_speed;
> > resp.phys_state  = attr.phys_state;
> > -   resp.transport   = attr.transport;
> > +   resp.transport   = attr.transport == RDMA_TRANSPORT_RDMAOE ?
> > +   IB_LINK_LAYER_ETHERNET : IB_LINK_LAYER_INFINIBAND;
> 
> Are you going to change the kernel patches to use the new link_layer
> name?
> 

Yes, in the next patch set.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ewg] Re: [PATCH] rdmaoe/libibverbs: handle binary compatibility

2009-12-14 Thread Eli Cohen
On Thu, Dec 10, 2009 at 01:50:00PM -0800, Sean Hefty wrote:
> >@@ -306,7 +314,7 @@ static struct pingpong_context *pp_init_ctx(struct
> >ibv_device *ib_dev, int size,
> > return NULL;
> > }
> >
> >-memset(ctx->buf, 0, size);
> >+memset(ctx->buf, 0x7b + is_server, size);
> >
> > ctx->context = ibv_open_device(ib_dev);
> > if (!ctx->context) {
> >@@ -481,6 +489,7 @@ static void usage(const char *argv0)
> > printf("  -n, --iters=number of exchanges (default 1000)\n");
> > printf("  -l, --sl=  service level value\n");
> > printf("  -e, --events   sleep on CQ events (default poll)\n");
> >+printf("  -g, --gid= gid of the other port\n");
> 
> It seems easier for the user if the tests discovered its local GID and 
> exchanged
> it with the remote side like it does with the LID and QPN.
>  

Will be changed in the next patch set -- the user will specify the
index to the GID table of the local port.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


InfiniBand/RDMA merge plans for 2.6.33

2009-12-14 Thread Roland Dreier
Since 2.6.32 is already out, it's probably a good time (or at least I
shouldn't wait longer!) to talk about 2.6.33 merge plans.
Unfortunately I lost pretty much a week and a half after Thanksgiving
to the flu, so I'm pretty late and didn't get as much done as I hoped.
Anyway, all the pending things that I'm aware of are listed below.

I've merged and pushed out what I plan to merge in my for-next branch,
and everything has cooked a couple of days in -next, so I'll send a
pull request to Linus shortly.

Boilerplate:

If something isn't already in my tree and it isn't listed below, I
probably missed it or dropped it unintentionally.  Please remind me.

As usual, when submitting a patch:

 - Give a good changelog that explains what issue your patch
   addresses, how you address the issue, how serious the issue is, and
   any other information that would be useful to someone evaluating
   your patch or reading it years from now.

 - Please make sure that you include a "Signed-off-by:" line, and put
   any extra junk that should not go into the final kernel log *after*
   the "---" line so that git tools strip it off automatically.  Make
   the subject line be appropriate for inclusion in the kernel log as
   well once the leading "[PATCH ...]" stuff is stripped off.  I waste a
   lot of time fixing patches by hand that could otherwise be spent
   doing something productive like watching youtube.

 - Run your patch through checkpatch.pl so I don't have to nag you to
   fix trivial issues (or spend time fixing them myself).

 - Read your patch over so I don't see a memory leak or deadlock as
   soon as I look at it.

 - Build your patch with sparse checking ("C=2 CF=-D__CHECK_ENDIAN__")
   and make sure it doesn't introduce new warnings.  (A big bonus in
   goodwill for sending patches that fix old warnings)

 - Test your patch on a kernel with things like slab debugging and
   lockdep turned on.

And while you're waiting for me to get to your patch, I sure wouldn't
mind if you read and commented on someone else's patch.  None of this
means you shouldn't remind me about pending patches, since I often
lose track of things and drop them accidentally.

Core:

 - Merged a series improving IPv6 CMA support, with work from David
   Wilder, Jason Gunthorpe and Sean Hefty.

 - Merged a bunch of small fixes from various people, including a
   number found via code analysis.

ULPs:

 - Nothing major -- I think maybe one IPoIB fix and one iSER fix.

HW specific:

 - Merged a series of fixes from Frank Zago to make the error handling
   behavior of various post_send / post_recv implementations more
   consistent.

 - Merged a bunch of fixes and cleanups for the nes driver.  Nice to
   see continuing work to improve the code from Intel.

Here are a few topics that are not ready in time for the 2.6.33 window
and will need to wait for 2.6.34 at least:

 - Userspace MMU notifiers ("ummunotify").  When I requested that this
   be pulled earlier, I got feedback asking me to explore moving the
   userspace ABI to the perf events framework.  I haven't had time to
   do this yet, so things are kind of stalled.  I'm not convinced that
   perf events fit ummunotify particularly well, but we need to see.
   It also might be worth exploring again whether we can define a set
   of semantics tightly enough to do kernel-level MR caching to avoid
   the complications of defining a more general kernel facility.

 - SRP faster failover.  I haven't had a chance to look closely at the
   latest patches.  I still have the feeling that some corner cases
   are not being handled, but the main problem here was my lack of
   time.  If anyone else (Dave Dillow?) wants to weigh in, that would
   be appreciated too.

 - mlx4 SR-IOV support.  Again, main problem was my lack of time.  I
   agree in principle with this stuff, just want to be careful that we
   don't turn the mlx4 driver into an unmaintainable mess of "if
   (sriov) something; else something_else" all over.

 - New QLogic qib driver.  Needs at least one more iteration of
   cleanups; and I have not had time to look at the latest code in
   detail to see exactly what cleanups are needed.  I am concerned
   that QLogic chose to abandon the ipath driver as unmaintainable,
   and now wants to replace it with an even bigger driver (measured
   by lines of code) that does not support the HT device that ipath
   supported.  How can we make sure qib has a longer future?

 - Jack's XRC patch set.  I still need time to work through and clean
   up the code.  I am in the middle of adding reference counting to
   handle sharing XRC domains between processes, and once that is done
   I'll need to merge the mlx4 changes (and hopefully take some of the
   generic reference counting back into the core).  If someone else
   wants to take a stab, that would be fine by me.

 - IBoE.  In principle I think this is starting to get there.  Still
   want to see better ABI compatibility at least, and also make sur

Re: [ewg] Re: [PATCH] rdmaoe/libibverbs: handle binary compatibility

2009-12-14 Thread Jason Gunthorpe
On Mon, Dec 14, 2009 at 10:41:27AM +0200, Eli Cohen wrote:
> On Thu, Dec 10, 2009 at 01:50:00PM -0800, Sean Hefty wrote:
> > >@@ -306,7 +314,7 @@ static struct pingpong_context *pp_init_ctx(struct
> > >ibv_device *ib_dev, int size,
> > >   return NULL;
> > >   }
> > >
> > >-  memset(ctx->buf, 0, size);
> > >+  memset(ctx->buf, 0x7b + is_server, size);
> > >
> > >   ctx->context = ibv_open_device(ib_dev);
> > >   if (!ctx->context) {
> > >@@ -481,6 +489,7 @@ static void usage(const char *argv0)
> > >   printf("  -n, --iters=number of exchanges (default 1000)\n");
> > >   printf("  -l, --sl=  service level value\n");
> > >   printf("  -e, --events   sleep on CQ events (default poll)\n");
> > >+  printf("  -g, --gid= gid of the other port\n");
> > 
> > It seems easier for the user if the tests discovered its local GID and 
> > exchanged
> > it with the remote side like it does with the LID and QPN.
> >  
> 
> Will be changed in the next patch set -- the user will specify the
> index to the GID table of the local port.

It would be nice if the tools would consistently let you choose the
source device based on gid.. I've been meaning to make a patch for
verbs to add a ibv_open_device_by_gid() function or something like
that.

Using gid idx is pretty unfriendly ...

But it is just a test prorgam, no worries.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: InfiniBand/RDMA merge plans for 2.6.33

2009-12-14 Thread Ralph Campbell
On Mon, 2009-12-14 at 10:37 -0800, Roland Dreier wrote:

>  - New QLogic qib driver.  Needs at least one more iteration of
>cleanups; and I have not had time to look at the latest code in
>detail to see exactly what cleanups are needed.  I am concerned
>that QLogic chose to abandon the ipath driver as unmaintainable,
>and now wants to replace it with an even bigger driver (measured
>by lines of code) that does not support the HT device that ipath
>supported.  How can we make sure qib has a longer future?

I understand your frustration in having to deal with a large amount
of code. If you included all the Mellanox firmware in the mlx4
driver, it would be huge too. I'm limited in what I can do given
the complexity of the IBTA spec.

QLogic is not "abandoning the ipath driver as unmaintainable".
The thought was that trying to incrementally patch in all the changes
needed to support dual ports, QDR, chip register addresses, etc.
would result in larger patches than renaming the driver. It was a
chicken-and-egg problem because until the new code was fully
written and debugged, we couldn't post it and we couldn't patch
ipath until we knew all the places that needed to be changed.

We had a pretty strong business requirement to get support for 
our QDR product into OFED 1.5 and the IB interrop tests required
we get something into OFED. The chip schedule just didn't leave time
to debug everything, get it reviewed and then merge with OFED.
Also, I was out for 5 weeks and that delayed the submission to
linux-rdma.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SRPT and SCST

2009-12-14 Thread Bart Van Assche
On Thu, Nov 5, 2009 at 9:51 AM, Philip Pokorny
 wrote:
>>
>> [ ... ]
>>
>> [ 8864.326702] <6>[0]: scst_queue_retry_cmd:1099:TGT QUEUE FULL:
>> incrementing retry_cmds 1
>> [ 8864.326709] <6>[0]: scst_queue_retry_cmd:1106:Some command(s) finished,
>> direct retry (finished_cmds=2023031, tgt->finished_cmds=2023137,
>> retry_cmds=0)
>> [ ... ]

(replying to an e-mail of one month ago)

The above issue should be fixed in SCST revision 1391, together with
the following changes:
* Support for OFED 1.5.
* The maximum supported value for the ib_srp kernel module parameter
srp_sg_tablesize has been increased from 58 to 128, at least when
ib_srpt is loaded with default parameters. Modifying the new ib_srpt
kernel module parameter srp_max_rdma_size makes it possible to support
even higher values of srp_sg_tablesize.
* Various small performance improvements.

The results I obtained with QDR HCA's, a RAM disk as target and a
vanilla 2.6.27.39 kernel + CFQ on the initiator system are:
* for asynchronous (buffered) I/O, writing speeds up to 1070 MB/s
(srp_sg_tablesize = 128; block size = 16 KB).
* for asynchronous (buffered) I/O, reading speeds up to 930 MB/s
(srp_sg_tablesize = 12; block size = 16 KB).
* for direct (non-buffered) I/O, writing speeds up to 1000 MB/s
(srp_sg_tablesize = 32; block size = 2 MB).
* for direct (non-buffered) I/O, reading speeds up to 1710 MB/s
(srp_sg_tablesize = 16; block size = 64 MB).

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


SRP issues with OpenSM 3.3.3

2009-12-14 Thread Ira Weiny
Sasha, Hal,

I have found that the following patch caused our SRP connected storage to
break.

patch: 3d20f82edd3246879063b77721d0bcef927bdc48

opensm/osm_sa_path_record.c: separate router guid resolution code

Move off subnet destination (router address) resolution code to separate
function to improve readability.

Signed-off-by: Sasha Khapyorsky 

I further traced the problem to pr_rcv_build_pr and the patch below fixes the
bug.

16:03:34 > git diff 
diff --git a/opensm/opensm/osm_sa_path_record.c 
b/opensm/opensm/osm_sa_path_record.c
index be0cd71..32c889f 100644
--- a/opensm/opensm/osm_sa_path_record.c
+++ b/opensm/opensm/osm_sa_path_record.c
@@ -800,7 +800,7 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const 
osm_port_t * p_src_port,
 
/* Only set HopLimit if going through a router */
if (is_nonzero_gid)
-   p_pr->hop_flow_raw |= cl_hton32(IB_HOPLIMIT_MAX);
+   p_pr->hop_flow_raw |= IB_HOPLIMIT_MAX;
 
p_pr->pkey = p_parms->pkey;
ib_path_rec_set_sl(p_pr, p_parms->sl);


"hop_flow_raw" is not really a 32bit value but rather 4 fields:
Component Length(bits)   Offset(bits)
RawTraffic1  352
Reserved  3  353
FlowLabel 20 356
HopLimit  8  384


However, I don't understand the comment "Only set HopLimit if going through a
router"?

Was the intent to only set p_dgid in pr_rcv_get_end_points if we are heading
through a router?  I don't think it matters if we are going through a router
or not does it?  If not, I think the comment needs to be removed in the patch
above.

Thanks,
Ira

-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html