Re: help with IB_WC_MW_BIND_ERR

2014-06-26 Thread Shirley Ma
Hello Eli, Or,

Do you know who can help with this? NFSoRDMA hits this error case with
Mellanox ConnectX-2 HCAs.

Thanks
Shirley

On 05/20/2014 11:55 AM, Chuck Lever wrote:
 Hi-
 
 What does it mean when a LOCAL_INV work request fails with an
 IB_WC_MW_BIND_ERR completion?
 
 --
 Chuck Lever
 chuck[dot]lever[at]oracle[dot]com
 
 
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-rdma in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 


Re: help with IB_WC_MW_BIND_ERR

2014-05-20 Thread Wendy Cheng
On Tue, May 20, 2014 at 11:55 AM, Chuck Lever chuck.le...@oracle.com wrote:
 Hi-

 What does it mean when a LOCAL_INV work request fails with an
 IB_WC_MW_BIND_ERR completion?


Mapping an IB error code has been a great pain (at least for me)
unless you have access to the HCA firmware. In this case, I think it
implies a memory protection error (a registration issue). In the cxgb4
driver, for example, it is associated with invalidating a shared MR or
invalidating an MR that has a bound memory window (with a QP):

    case T4_ERR_INVALIDATE_SHARED_MR:
    case T4_ERR_INVALIDATE_MR_WITH_MW_BOUND:
            wc->status = IB_WC_MW_BIND_ERR;
            break;

(drivers/infiniband/hw/cxgb4/cq.c, around line 654)
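To illustrate why one status value is hard to map back to a cause, here is a minimal stand-alone sketch of the same pattern. It is not the cxgb4 code: the enum names and values below are hypothetical stand-ins defined locally, and only the two error names mirror the cases quoted above.

```c
#include <assert.h>

/* Hypothetical provider-specific hardware errors. The two named cases
 * mirror the cxgb4 ones quoted above; the numeric values are made up. */
enum hw_err {
    HW_ERR_NONE = 0,
    HW_ERR_INVALIDATE_SHARED_MR,
    HW_ERR_INVALIDATE_MR_WITH_MW_BOUND,
    HW_ERR_OTHER,
};

/* Local stand-in for the completion status; not the kernel's enum. */
enum wc_status {
    WC_SUCCESS = 0,
    WC_MW_BIND_ERR,
    WC_GENERAL_ERR,
};

/* Several distinct hardware errors collapse into one status, so the
 * consumer cannot tell from the status alone which one occurred. */
enum wc_status map_hw_err(enum hw_err err)
{
    switch (err) {
    case HW_ERR_NONE:
        return WC_SUCCESS;
    case HW_ERR_INVALIDATE_SHARED_MR:
    case HW_ERR_INVALIDATE_MR_WITH_MW_BOUND:
        return WC_MW_BIND_ERR;
    default:
        return WC_GENERAL_ERR;
    }
}

/* Printable name for a status, for log messages. */
const char *wc_status_name(enum wc_status s)
{
    static const char *const names[] = {
        "WC_SUCCESS", "WC_MW_BIND_ERR", "WC_GENERAL_ERR",
    };

    assert((unsigned int)s < sizeof(names) / sizeof(names[0]));
    return names[s];
}
```

Both invalidate errors come back as WC_MW_BIND_ERR here, which is why the provider (or firmware) usually has to be consulted to learn which underlying condition was hit.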

You'll probably need to mention the HCA name so the firmware people,
if they are reading this, can pinpoint the exact cause.

-- Wendy


Re: help with IB_WC_MW_BIND_ERR

2014-05-20 Thread Chuck Lever

On May 20, 2014, at 3:39 PM, Wendy Cheng s.wendy.ch...@gmail.com wrote:

 On Tue, May 20, 2014 at 11:55 AM, Chuck Lever chuck.le...@oracle.com wrote:
 Hi-
 
 What does it mean when a LOCAL_INV work request fails with an
 IB_WC_MW_BIND_ERR completion?
 
 
 Mapping an IB error code has been a great pain (at least for me)
 unless you have access to the HCA firmware. In this case, I think it
 implies memory protection error (registration issues) say in cxgb4
 driver, it is associated with invalidate shared MR or invalidate bound
 memory window (with a QP):
 
     case T4_ERR_INVALIDATE_SHARED_MR:
     case T4_ERR_INVALIDATE_MR_WITH_MW_BOUND:
             wc->status = IB_WC_MW_BIND_ERR;
             break;
 
 drivers/infiniband/hw/cxgb4/cq.c line 654 of 898 --72%-- col 11-25
 
 You'll probably need to mention the HCA name so the firmware people,
 if they are reading this, could pinpoint the exact cause.

Thanks. ConnectX-2, mlx4 provider.

The IB architecture spec lists five conditions that could result in an
IB_WC_MW_BIND_ERR completion of LOCAL_INV:

1. Memory access was attempted on an L_Key or R_Key that is in the
   Invalid State;

2. Memory Region could not be Invalidated, because it is a Shared
   Memory Region;

3. Memory Region cannot be invalidated because it has a bound Memory
   Window;

4. Memory Region could not be Invalidated, because it was created
   through a Register Memory Region or Reregister Memory Region; or

5. Memory Window could not be Invalidated, because it was a Type 1
   Memory Window.

This is with FRMR, and the MR is not shared (I think?). So I expect I’m
dealing with condition 1. But I can’t seem to make any more headway on
confirming that, or on working out how it got that way.
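The elimination above can be written down as a small stand-alone C sketch. Everything in it is hypothetical (the flag names, the struct, and the helper are not part of any verbs API), and it assumes two things not stated outright in this thread: that an FRMR does not count as "created through Register Memory Region", and that no Memory Window is bound to the MR.

```c
#include <assert.h>
#include <stdbool.h>

/* The five listed conditions as bit flags (hypothetical names). */
enum inv_cause {
    CAUSE_KEY_INVALID_STATE = 1 << 0, /* condition 1 */
    CAUSE_SHARED_MR         = 1 << 1, /* condition 2 */
    CAUSE_MW_BOUND          = 1 << 2, /* condition 3 */
    CAUSE_REG_MR            = 1 << 3, /* condition 4 */
    CAUSE_TYPE1_MW          = 1 << 4, /* condition 5 */
    CAUSE_ALL               = (1 << 5) - 1,
};

/* What is known about the object being invalidated. */
struct inv_target {
    bool is_memory_window; /* target is an MW rather than an MR */
    bool is_fast_reg;      /* MR allocated for fast registration (FRMR) */
    bool is_shared;        /* MR is a Shared Memory Region */
    bool has_bound_mw;     /* an MW is currently bound to the MR */
};

/* Return the conditions that cannot be ruled out from what is known.
 * Condition 1 is never ruled out here, since it depends on key state. */
unsigned int possible_causes(const struct inv_target *t)
{
    unsigned int causes = CAUSE_ALL;

    assert(t != 0);

    if (t->is_memory_window) {
        /* Conditions 2-4 describe Memory Regions only. */
        causes &= ~(unsigned int)(CAUSE_SHARED_MR | CAUSE_MW_BOUND |
                                  CAUSE_REG_MR);
    } else {
        /* Condition 5 describes Memory Windows only. */
        causes &= ~(unsigned int)CAUSE_TYPE1_MW;
        if (!t->is_shared)
            causes &= ~(unsigned int)CAUSE_SHARED_MR;
        if (!t->has_bound_mw)
            causes &= ~(unsigned int)CAUSE_MW_BOUND;
        if (t->is_fast_reg)
            causes &= ~(unsigned int)CAUSE_REG_MR;
    }
    return causes;
}
```

For an unshared FRMR with no bound window, only CAUSE_KEY_INVALID_STATE remains, which matches the guess above. If the "not shared" assumption turns out to be wrong, condition 2 comes back into play.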

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


