Re: help with IB_WC_MW_BIND_ERR
Hello Eli, Or,

Do you know who can help with this? NFSoRDMA hits this error case with Mellanox ConnectX-2 HCAs.

Thanks,
Shirley

On 05/20/2014 11:55 AM, Chuck Lever wrote:
> Hi-
>
> What does it mean when a LOCAL_INV work request fails with an IB_WC_MW_BIND_ERR completion?
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
help with IB_WC_MW_BIND_ERR
Hi-

What does it mean when a LOCAL_INV work request fails with an IB_WC_MW_BIND_ERR completion?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
Re: help with IB_WC_MW_BIND_ERR
On Tue, May 20, 2014 at 11:55 AM, Chuck Lever chuck.le...@oracle.com wrote:
> Hi-
>
> What does it mean when a LOCAL_INV work request fails with an IB_WC_MW_BIND_ERR completion?

Mapping an IB error code to its cause has been a great pain (at least for me) unless you have access to the HCA firmware. In this case, I think it implies a memory protection error (a registration issue). In the cxgb4 driver, for example, it is associated with invalidating a shared MR or invalidating an MR with a bound memory window (with a QP):

    case T4_ERR_INVALIDATE_SHARED_MR:
    case T4_ERR_INVALIDATE_MR_WITH_MW_BOUND:
            wc->status = IB_WC_MW_BIND_ERR;
            break;

(drivers/infiniband/hw/cxgb4/cq.c, line 654 of 898)

You'll probably need to mention the HCA name so the firmware people, if they are reading this, can pinpoint the exact cause.

-- Wendy
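The cxgb4 fragment quoted above shows a general pattern: a provider can fold several hardware-specific failure codes into one generic work completion status, which is why the generic status alone may not say which condition fired. Below is a minimal, self-contained sketch of that pattern. Every name in it (the DEMO_ enums and the demo_ functions) is a stand-in invented for illustration, not a kernel definition, and the mapping covers only the two cases from the quoted fragment.

```c
/* Hypothetical stand-ins for provider-specific completion error codes. */
enum demo_hw_err {
    DEMO_HW_OK,
    DEMO_HW_INVALIDATE_SHARED_MR,
    DEMO_HW_INVALIDATE_MR_WITH_MW_BOUND,
    DEMO_HW_OTHER
};

/* Hypothetical stand-ins for the generic work completion statuses. */
enum demo_wc_status {
    DEMO_WC_SUCCESS,
    DEMO_WC_MW_BIND_ERR,
    DEMO_WC_GENERAL_ERR
};

/*
 * Map a provider-specific error onto a generic status. Two distinct
 * hardware causes collapse into one generic value, so the information
 * about which one occurred is lost at this point.
 */
enum demo_wc_status demo_map_status(enum demo_hw_err err)
{
    switch (err) {
    case DEMO_HW_OK:
        return DEMO_WC_SUCCESS;
    case DEMO_HW_INVALIDATE_SHARED_MR:
    case DEMO_HW_INVALIDATE_MR_WITH_MW_BOUND:
        return DEMO_WC_MW_BIND_ERR;
    default:
        return DEMO_WC_GENERAL_ERR;
    }
}

/* Human-readable name for a generic status, e.g. for a log message. */
const char *demo_wc_status_str(enum demo_wc_status status)
{
    switch (status) {
    case DEMO_WC_SUCCESS:
        return "success";
    case DEMO_WC_MW_BIND_ERR:
        return "memory window bind error";
    default:
        return "general error";
    }
}
```

A caller that only sees DEMO_WC_MW_BIND_ERR cannot tell the shared-MR case from the bound-MW case, which matches the difficulty described above: the distinction, if it is kept at all, lives in the provider or the firmware.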
Re: help with IB_WC_MW_BIND_ERR
On May 20, 2014, at 3:39 PM, Wendy Cheng s.wendy.ch...@gmail.com wrote:
> On Tue, May 20, 2014 at 11:55 AM, Chuck Lever chuck.le...@oracle.com wrote:
>> Hi-
>>
>> What does it mean when a LOCAL_INV work request fails with an IB_WC_MW_BIND_ERR completion?
>
> Mapping an IB error code to its cause has been a great pain (at least for me) unless you have access to the HCA firmware. In this case, I think it implies a memory protection error (a registration issue). In the cxgb4 driver, for example, it is associated with invalidating a shared MR or invalidating an MR with a bound memory window (with a QP):
>
>     case T4_ERR_INVALIDATE_SHARED_MR:
>     case T4_ERR_INVALIDATE_MR_WITH_MW_BOUND:
>             wc->status = IB_WC_MW_BIND_ERR;
>             break;
>
> (drivers/infiniband/hw/cxgb4/cq.c, line 654 of 898)
>
> You'll probably need to mention the HCA name so the firmware people, if they are reading this, can pinpoint the exact cause.

Thanks. ConnectX-2, mlx4 provider.

The IB architecture spec lists five conditions that could result in an IB_WC_MW_BIND_ERR completion of LOCAL_INV:

1. Memory access was attempted on an L_Key or R_Key that is in the Invalid State;
2. Memory Region could not be Invalidated, because it is a Shared Memory Region;
3. Memory Region could not be Invalidated, because it has a bound Memory Window;
4. Memory Region could not be Invalidated, because it was created through a Register Memory Region or Reregister Memory Region; or
5. Memory Window could not be Invalidated, because it was a Type 1 Memory Window.

This is with FRMR and the MR is not shared (I think?). So I expect I’m dealing with condition 1. But I can’t seem to make any more headway on confirming that, or on how it got that way.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
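The process of elimination in the message above can be written down as a small, self-contained sketch. It encodes the five conditions as listed in this thread (not as quoted from the spec text itself) and filters them by what is known about the object being invalidated. All names here (the DEMO_COND_ flags, struct demo_mr_facts, demo_plausible_conditions) are hypothetical and exist only for illustration; the sketch also assumes that condition 1 can never be ruled out from static facts alone.

```c
#include <stdbool.h>

/* One flag per condition, numbered as in the list in this thread. */
#define DEMO_COND_INVALID_KEY  (1u << 0) /* 1: L_Key/R_Key in Invalid State   */
#define DEMO_COND_SHARED_MR    (1u << 1) /* 2: Shared Memory Region           */
#define DEMO_COND_MW_BOUND     (1u << 2) /* 3: MR has a bound Memory Window   */
#define DEMO_COND_REG_MR       (1u << 3) /* 4: created by Register/Reregister */
#define DEMO_COND_TYPE1_MW     (1u << 4) /* 5: Type 1 Memory Window           */

/* Hypothetical description of what the caller knows about the object. */
struct demo_mr_facts {
    bool is_memory_window; /* object is a Memory Window, not a Region */
    bool mw_type1;         /* only meaningful for a Memory Window     */
    bool shared;           /* Memory Region is shared                 */
    bool has_bound_mw;     /* a Memory Window is bound to the Region  */
    bool fast_registered;  /* Region was set up via FRMR              */
};

/* Return the set of conditions that the known facts do not rule out. */
unsigned int demo_plausible_conditions(const struct demo_mr_facts *facts)
{
    /* Assumption: a stale or invalid key is always a candidate. */
    unsigned int mask = DEMO_COND_INVALID_KEY;

    if (facts->is_memory_window) {
        if (facts->mw_type1)
            mask |= DEMO_COND_TYPE1_MW;
        return mask;
    }
    if (facts->shared)
        mask |= DEMO_COND_SHARED_MR;
    if (facts->has_bound_mw)
        mask |= DEMO_COND_MW_BOUND;
    if (!facts->fast_registered)
        mask |= DEMO_COND_REG_MR;
    return mask;
}
```

With the facts stated in the message (FRMR, not shared, and assuming no Memory Window is bound), only DEMO_COND_INVALID_KEY remains, which is the same conclusion reached above. The sketch does nothing to explain how the key ended up invalid; that part of the question stays open.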