Re: [PATCH v4 09/14] IB/cm: Expose BTH P_Key in CM and SIDR request events
On 30/08/2015 21:23, Sagi Grimberg wrote:
>
> Looks like for some reason cm_get_bth_pkey got pkey_index of 0xffff
> instead of 0 (working on the default pkey 0xffff at entry 0).

It looks like the mlx5 driver doesn't interpret the completion format
correctly. It takes a field defined in the programmer reference manual
as pkey, and interprets it as pkey_index [1].

> log:
> infiniband mlx5_0: ib_cm: Couldn't retrieve pkey for incoming request (port
> 1, pkey index 65535). -22
> ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0x2c90300ed0960, t_port_id
> 0x2c90300ed0950:0x2c90300ed0950 and it_iu_len 260 on port 1
> (guid=0xfe80:0x2c90300ed0950)
> ib_srpt Session : kernel thread ib_srpt_compl (PID 8584) started
> infiniband mlx5_0: ib_cm: Couldn't retrieve pkey for incoming request (port
> 1, pkey index 65535). -22
> ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0x2c90300ed0960, t_port_id
> 0x2c90300ed0950:0x2c90300ed0950 and it_iu_len 260 on port 1
> (guid=0xfe80:0x2c90300ed0950)
> ib_srpt Session : kernel thread ib_srpt_compl (PID 8585) started
> mlx5_0:dump_cqe:238:(pid 8584): dump error cqe
>
>
> 002b
> 94003004 002c b8e0
> ib_srpt receiving failed for idx 0 with status 4
> :04:00.0:poll_health:151:(pid 0): device's health compromised
> assert_var[0] 0x0094
> assert_var[1] 0x
> assert_var[2] 0x
> assert_var[3] 0x
> assert_var[4] 0x
> assert_exit_ptr 0x0061d35c
> assert_callra 0x0067a5f4
> fw_ver 0xa0641900
> hw_id 0x01ff
> irisc_index 2
> synd 0x1: firmware internal error
> ext_sync 0x
> :04:00.0:health_care:76:(pid 7943): handling bad device here
> ib_srpt Received DREQ and sent DREP for session
> 0x0002c90300ed0960.
> ib_srpt Received DREQ and sent DREP for session
> 0x0002c90300ed0960.
> ib_srpt Received IB TimeWait exit for cm_id 88046d1fb200.
> ib_srpt Received IB TimeWait exit for cm_id 880454ffa000.
> ib_srpt Session 0x0002c90300ed0960: kernel thread
> ib_srpt_compl (PID 8585) stopped

I don't know how that can cause all the other errors though.

Haggai

[1] http://lxr.free-electrons.com/source/drivers/infiniband/hw/mlx5/cq.c?v=4.1#L230
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: mlx4 problems with 4.2-rc8
On 8/31/2015 1:38 AM, Doug Ledford wrote: On 08/29/2015 09:13 PM, Or Gerlitz wrote: On Fri, Aug 28, 2015 at 10:27 PM, Doug Ledfordwrote: I'm seeing this with rc8 on a dual port mlx4 adapter set to IB/Eth mode: mmm, both Amir and myself are just finishing vacations... so WB notes are not always lovely as you want them to be, life [ 77.883513] IPv6: ADDRCONF(NETDEV_UP): mlx4_roce: link is not ready [ 77.892044] mlx4_en: mlx4_roce: frag:0 - size:1518 prefix:0 stride:1536 [ 77.903129] genirq: Flags mismatch irq 135. (mlx4-65@:05:00.0) vs. (mlx4-65@:05:00.0) is this strict regression from some known point in the past on this system/config -- i.e 4.1 or 4.2-rc1?! Yes. When I was submitting the 4.2-rc changes this machine worked. This is one of my IB/Eth SRIOV machines. I tested with SRIOV disabled and it didn't effect things. Can you please send the mlx4 driver output when you load it with debug prints on? also do things work if you set the ports type to be ib/ib or eth/eth? It should work as ib/ib given that in ib/eth mode the ib port works. I doubt eth/eth would work, but I'll try and see. OK, Eth/Eth mode fails too (at least on the second port, I can say on the first port for certain as I can't bring it up, it's still plugged into an IB switch). However, now in Eth/Eth mode, attempts to bring up the interface manually at the command line have hung, which it didn't do in IB/Eth mode. I'll try to ping things down further, but that's what I have so far. And as requested, the config is attached. send us your compressed .config Matan, any idea what goes wrong here? Or. [ 77.914965] CPU: 0 PID: 1541 Comm: NetworkManager Not tainted 4.2.0-rc8 #58 [ 77.923292] Hardware name: Dell Inc. 
PowerEdge R820/04K5X5, BIOS 2.2.3 07/09/2014 [ 77.932205] c16e3ce1 8820365ab498 8167e6ff [ 77.941072] 8820339e9a00 8820365ab4f8 810d2b6e [ 77.949938] 0246 881032e67aa4 881035e10ba0 c16e3ce1 [ 77.958812] Call Trace: [ 77.962109] [] dump_stack+0x45/0x57 [ 77.968412] [] __setup_irq+0x51e/0x590 [ 77.975018] [] ? mlx4_interrupt+0x80/0x80 [mlx4_core] [ 77.983072] [] request_threaded_irq+0xf4/0x1a0 [ 77.990468] [] mlx4_assign_eq+0x135/0x360 [mlx4_core] [ 77.998513] [] mlx4_en_activate_cq+0x2a7/0x310 [mlx4_en] [ 78.006853] [] ? alloc_cpumask_var_node+0x28/0x40 [ 78.014542] [] ? find_next_bit+0x19/0x20 [ 78.021334] [] ? cpumask_next_and+0x34/0x50 [ 78.028425] [] mlx4_en_start_port+0x1bb/0xb60 [mlx4_en] [ 78.036689] [] ? mlx4_free_cmd_mailbox+0x31/0x40 [mlx4_core] [ 78.045435] [] mlx4_en_open+0x349/0x630 [mlx4_en] [ 78.053107] [] __dev_open+0xc9/0x140 [ 78.059538] [] __dev_change_flags+0xa1/0x160 [ 78.066718] [] dev_change_flags+0x29/0x60 [ 78.073602] [] do_setlink+0x5be/0xa70 [ 78.080097] [] ? mga_imageblit+0x2f/0x40 [mgag200] [ 78.087859] [] ? mga_dirty_update+0x1e6/0x2f0 [mgag200] [ 78.096112] [] ? mga_imageblit+0x2f/0x40 [mgag200] [ 78.103873] [] rtnl_newlink+0x4f0/0x880 [ 78.110586] [] ? rtnl_newlink+0xf3/0x880 [ 78.117372] [] ? security_capable+0x48/0x60 [ 78.124452] [] ? ns_capable+0x2d/0x60 [ 78.130950] [] rtnetlink_rcv_msg+0xa4/0x250 [ 78.138028] [] ? sock_has_perm+0x70/0x90 [ 78.144824] [] ? rtnetlink_rcv+0x40/0x40 [ 78.151615] [] netlink_rcv_skb+0xaf/0xc0 [ 78.158425] [] rtnetlink_rcv+0x2c/0x40 [ 78.164997] [] netlink_unicast+0x101/0x1f0 [ 78.171937] [] netlink_sendmsg+0x401/0x660 [ 78.178867] [] sock_sendmsg+0x38/0x50 [ 78.185335] [] ___sys_sendmsg+0x275/0x290 [ 78.192176] [] ? sysctl_head_finish+0x46/0x50 [ 78.199411] [] ? proc_sys_call_handler+0x88/0xe0 [ 78.206946] [] ? 
lockref_put_or_lock+0x4c/0x80 [ 78.214296] [] __sys_sendmsg+0x57/0xa0 [ 78.220878] [] SyS_sendmsg+0x12/0x20 [ 78.227283] [] entry_SYSCALL_64_fastpath+0x12/0x71 [ 78.235114] mlx4_en :05:00.0: Failed assigning an EQ to \xfff\xffb6Z6 \xff88\x\x\xff84\xffa20\xff81\x\x\x\x [ 78.243732] mlx4_en: mlx4_roce: Failed activating Rx CQ [ 78.319027] mlx4_en: mlx4_roce: Failed starting port:2 The interface in question is unusable. -- Doug Ledford GPG KeyID: 0E572FDD Actually, it looks like the dump stack we've got before [1] was fixed. This happens when the mlx4 driver is used in setups where number of cores >= 32. Doug, is that the case? [1] http://www.spinics.net/lists/netdev/msg341171.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] mlx5: Fix incorrect wc pkey_index assignment for GSI messages
Since patch series "Demux IB CM requests in the rdma_cm module" the
P_Key index is taken from the work completion rather than the message itself
(see http://www.spinics.net/lists/netdev/msg335599.html).

The HCA provides us with the message P_Key. In order
to provide the P_Key index, we need to look it up. Given
that this is relevant only for GSI messages (session establishments)
which is less performance critical, micro-optimize against the GSI
(is_qp1) branch.

Signed-off-by: Sagi Grimberg
Cc: Haggai Eran
---
 drivers/infiniband/hw/mlx5/cq.c      | 10 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  5 +
 drivers/infiniband/hw/mlx5/qp.c      |  5 -
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 640c54e..3dfd287 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -33,6 +33,7 @@
 #include
 #include
 #include
+#include
 #include "mlx5_ib.h"
 #include "user.h"
 
@@ -227,7 +228,14 @@ static void handle_responder(struct ib_wc *wc, struct mlx5_cqe64 *cqe,
 	wc->dlid_path_bits = cqe->ml_path;
 	g = (be32_to_cpu(cqe->flags_rqpn) >> 28) & 3;
 	wc->wc_flags |= g ? IB_WC_GRH : 0;
-	wc->pkey_index = be32_to_cpu(cqe->imm_inval_pkey) & 0xffff;
+	if (unlikely(is_qp1(qp->ibqp.qp_type))) {
+		u16 pkey = be32_to_cpu(cqe->imm_inval_pkey) & 0xffff;
+
+		ib_find_cached_pkey(&dev->ib_dev, qp->port, pkey,
+				    &wc->pkey_index);
+	} else {
+		wc->pkey_index = 0;
+	}
 }
 
 static void dump_cqe(struct mlx5_ib_dev *dev, struct mlx5_err_cqe *cqe)
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index fc987fe..a4ef6a7 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -656,6 +656,11 @@ static inline u8 convert_access(int acc)
 	       MLX5_PERM_LOCAL_READ;
 }
 
+static inline int is_qp1(enum ib_qp_type qp_type)
+{
+	return qp_type == IB_QPT_GSI;
+}
+
 #define MLX5_MAX_UMR_SHIFT 16
 #define MLX5_MAX_UMR_PAGES (1 << MLX5_MAX_UMR_SHIFT)
 
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 9380d2d..8c51ea3 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -76,11 +76,6 @@ static int is_qp0(enum ib_qp_type qp_type)
 	return qp_type == IB_QPT_SMI;
 }
 
-static int is_qp1(enum ib_qp_type qp_type)
-{
-	return qp_type == IB_QPT_GSI;
-}
-
 static int is_sqp(enum ib_qp_type qp_type)
 {
 	return is_qp0(qp_type) || is_qp1(qp_type);
-- 
1.7.1
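The lookup step the commit message describes (going from a P_Key value to its index in the port's P_Key table) can be sketched in plain userspace C. This is a simplified illustration under stated assumptions, not the kernel's ib_find_cached_pkey: struct pkey_cache and find_pkey_index are hypothetical names invented here. It assumes the usual P_Key layout, where the low 15 bits identify the partition and bit 15 marks full membership, and it prefers a full-membership entry over a limited one. Empty (zero) table entries are not treated specially.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cached per-port P_Key table. */
struct pkey_cache {
	size_t len;            /* number of entries in table */
	const uint16_t *table; /* P_Key values, indexed by pkey_index */
};

/*
 * Find the table index whose entry names the same partition as pkey.
 * Only the low 15 bits are compared; a full-membership entry (bit 15
 * set) wins over a limited-membership one.
 * Returns 0 and stores the index on success, -ENOENT if nothing matches.
 */
static int find_pkey_index(const struct pkey_cache *cache, uint16_t pkey,
			   uint16_t *index)
{
	int partial = -1; /* first limited-membership match, if any */

	assert(cache != NULL && index != NULL);

	for (size_t i = 0; i < cache->len; i++) {
		uint16_t entry = cache->table[i];

		if ((entry & 0x7fff) != (pkey & 0x7fff))
			continue;

		if (entry & 0x8000) {
			*index = (uint16_t)i;
			return 0;
		}
		if (partial < 0)
			partial = (int)i;
	}

	if (partial >= 0) {
		*index = (uint16_t)partial;
		return 0;
	}
	return -ENOENT;
}
```

With a table of { 0x7fff, 0x8001, 0xffff }, looking up 0xffff yields index 2 (the full-membership entry is preferred over the limited one at index 0), and looking up an absent partition returns -ENOENT. A linear scan like this is the reason the patch confines the lookup to the GSI branch, which is off the data path.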
Re: [PATCH v3 01/49] IB/core: Add header definitions
Hal, I'm working on a couple of patches to address these comments. I will be submitting them in the next day or so. On Wed, Jun 17, 2015 at 10:12:41AM -0400, Hal Rosenstock wrote: > On 6/17/2015 8:28 AM, Mike Marciniszyn wrote: > > From: Ira Weiny> > > > Add common OPA header definitions for driver > > build: > > - opa_port_info.h > > - opa_smi.h > > - hfi1_user.sh > > > > Additionally, ib_mad.h, has additional definitions > > that are common to ib_drivers including: > > - trap support > > - cca support > > > > The qib driver has the duplication removed in favor > > those in ib_mad.h > > > > Reviewed-by: Mike Marciniszyn > > Reviewed-by: John, Jubin > > Signed-off-by: Ira Weiny > > --- > > drivers/infiniband/hw/qib/qib_mad.h | 147 +--- > > include/rdma/ib_mad.h | 138 +++ > > include/rdma/opa_port_info.h| 433 > > +++ > > Should opa_port_info.h be in include/rdma or in drivers/infiniband/hw/hfi1 ? This file and opa_smi.h were placed here following the pattern of the same ib_*.h files. Indeed because there is currently only 1 OPA driver it could be moved to the hfi1 driver directory. However, I prefer to keep them in rdma. If Doug prefers I can send a patch to move them. > > + > > +/* > > + * Generic trap/notice producers > > + */ > > +#define IB_NOTICE_PROD_CA cpu_to_be16(1) > > +#define IB_NOTICE_PROD_SWITCH cpu_to_be16(2) > > +#define IB_NOTICE_PROD_ROUTER cpu_to_be16(3) > > +#define IB_NOTICE_PROD_CLASS_MGR cpu_to_be16(4) > > + > > +/* > > + * Generic trap/notice numbers > > SM Class trap/notice numbers > > As such, should they be in ib_smi.h rather than ib_mad.h ? Fixed in my patch series. 
> > > + */ > > +#define IB_NOTICE_TRAP_LLI_THRESH cpu_to_be16(129) > > +#define IB_NOTICE_TRAP_EBO_THRESH cpu_to_be16(130) > > +#define IB_NOTICE_TRAP_FLOW_UPDATE cpu_to_be16(131) > > +#define IB_NOTICE_TRAP_CAP_MASK_CHGcpu_to_be16(144) > > +#define IB_NOTICE_TRAP_SYS_GUID_CHGcpu_to_be16(145) > > +#define IB_NOTICE_TRAP_BAD_MKEYcpu_to_be16(256) > > +#define IB_NOTICE_TRAP_BAD_PKEYcpu_to_be16(257) > > +#define IB_NOTICE_TRAP_BAD_QKEYcpu_to_be16(258) > > + > > +/* > > + * Repress trap/notice flags > > + */ > > +#define IB_NOTICE_REPRESS_LLI_THRESH (1 << 0) > > +#define IB_NOTICE_REPRESS_EBO_THRESH (1 << 1) > > +#define IB_NOTICE_REPRESS_FLOW_UPDATE (1 << 2) > > +#define IB_NOTICE_REPRESS_CAP_MASK_CHG (1 << 3) > > +#define IB_NOTICE_REPRESS_SYS_GUID_CHG (1 << 4) > > +#define IB_NOTICE_REPRESS_BAD_MKEY (1 << 5) > > +#define IB_NOTICE_REPRESS_BAD_PKEY (1 << 6) > > +#define IB_NOTICE_REPRESS_BAD_QKEY (1 << 7) > > What does this correspond to ? Is this some standard thing or are these > defines driver specific ? > Fixed in my patch series. > > > + > > +/* > > + * Generic trap/notice other local changes flags (trap 144). > > SM Class trap/notice other local changes flags (trap 144) > > As such, should they be in ib_smi.h rather than ib_mad.h ? Fixed in my patch series. > > > + */ > > +#define IB_NOTICE_TRAP_LSE_CHG 0x04/* Link Speed Enable > > changed */ > > +#define IB_NOTICE_TRAP_LWE_CHG 0x02/* Link Width Enable > > changed */ > > +#define IB_NOTICE_TRAP_NODE_DESC_CHG 0x01 > > + > > +/* > > + * Generic trap/notice M_Key volation flags in dr_trunc_hop (trap 256). > > SM Class trap/notice M_Key violation flags in dr_trunc_hop (trap 256) > > As such, should they be in ib_smi.h rather than ib_mad.h ? Fixed in my patch series. 
> > > + */ > > +#define IB_NOTICE_TRAP_DR_NOTICE 0x80 > > +#define IB_NOTICE_TRAP_DR_TRUNC0x40 > > + > > enum { > > IB_MGMT_MAD_HDR = 24, > > IB_MGMT_MAD_DATA = 232, > > @@ -240,6 +294,90 @@ struct ib_class_port_info { > > __be32 trap_qkey; > > }; > > > > +struct ib_node_info { > > + u8 base_version; > > + u8 class_version; > > + u8 node_type; > > + u8 num_ports; > > + __be64 sys_guid; > > + __be64 node_guid; > > + __be64 port_guid; > > + __be16 partition_cap; > > + __be16 device_id; > > + __be32 revision; > > + u8 local_port_num; > > + u8 vendor_id[3]; > > +} __packed; > > This is SM attribute. Should it go into ib_smi.h like ib_port_info ? Fixed in my patch series. > > > + > > +struct ib_mad_notice_attr { > > + u8 generic_type; > > + u8 prod_type_msb; > > + __be16 prod_type_lsb; > > + __be16 trap_num; > > + __be16 issuer_lid; > > + __be16 toggle_count; > > + > > + union { > > + struct { > > + u8 details[54]; > > + } raw_data; > > + > > + struct { > > + __be16 reserved; > > + __be16 lid;/* where violation happened */ > > + u8 port_num;
Re: shrink struct ib_send_wr V3
On Sun, Aug 30, 2015 at 06:31:35PM +0300, Sagi Grimberg wrote:
>> - patch 2 now explicitly replaces the weird overloading in the mlx5
>>   driver with an explicit embedding of struct ib_send_wr, similar
>>   to what we do for all other MRs.
>
> That's nice,
>
> There is one non-trivial spot that was missed in mlx5_ib_post_send
> though:

Oh, that was a weird abuse of the old casts.

I've folded both your fixes and force pushed to the wr-cleanup branch.

I do not plan to resend the series until the merge window for 4.4
is open. Doug, any chance you could pick up the first patch in the
series for 4.3-rc? It's marked for stable as well.
Re: [PATCH] mlx5: Fix incorrect wc pkey_index assignment for GSI messages
On Mon, Aug 31, 2015 at 6:24 PM, Sagi Grimberg wrote:
> Since patch series "Demux IB CM requests in the rdma_cm module" the
> P_Key index is taken from the work completion rather than the message itself

so prior to this series nobody in the IB core (and maybe across the
whole upstream kernel) uses ib_wc->pkey_index?!

> (see http://www.spinics.net/lists/netdev/msg335599.html).

better to have pointer here to upstream commit and not to an archive
URL which is possibly gonna die some day

> The HCA provides us with the message P_Key. In order
> to provide the P_Key index, we need to look it up. Given
> that this is relevant only for GSI messages (session establishments)
> which is less performance critical, micro-optimize against the GSI
> (is_qp1) branch.
>
> Signed-off-by: Sagi Grimberg
> Cc: Haggai Eran
> ---
>  drivers/infiniband/hw/mlx5/cq.c      | 10 +-
>  drivers/infiniband/hw/mlx5/mlx5_ib.h |  5 +
>  drivers/infiniband/hw/mlx5/qp.c      |  5 -
>  3 files changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
> index 640c54e..3dfd287 100644
> --- a/drivers/infiniband/hw/mlx5/cq.c
> +++ b/drivers/infiniband/hw/mlx5/cq.c
> @@ -33,6 +33,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include "mlx5_ib.h"
>  #include "user.h"
>
> @@ -227,7 +228,14 @@ static void handle_responder(struct ib_wc *wc, struct
> mlx5_cqe64 *cqe,
>         wc->dlid_path_bits = cqe->ml_path;
>         g = (be32_to_cpu(cqe->flags_rqpn) >> 28) & 3;
>         wc->wc_flags |= g ? IB_WC_GRH : 0;
> -       wc->pkey_index = be32_to_cpu(cqe->imm_inval_pkey) & 0xffff;
> +       if (unlikely(is_qp1(qp->ibqp.qp_type))) {
> +               u16 pkey = be32_to_cpu(cqe->imm_inval_pkey) & 0xffff;
> +
> +               ib_find_cached_pkey(&dev->ib_dev, qp->port, pkey,
> +                                   &wc->pkey_index);
> +       } else {
> +               wc->pkey_index = 0;
> +       }
>  }
>
>  static void dump_cqe(struct mlx5_ib_dev *dev, struct mlx5_err_cqe *cqe)
> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index fc987fe..a4ef6a7 100644
> --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -656,6 +656,11 @@ static inline u8 convert_access(int acc)
>                MLX5_PERM_LOCAL_READ;
>  }
>
> +static inline int is_qp1(enum ib_qp_type qp_type)
> +{
> +       return qp_type == IB_QPT_GSI;
> +}
> +
>  #define MLX5_MAX_UMR_SHIFT 16
>  #define MLX5_MAX_UMR_PAGES (1 << MLX5_MAX_UMR_SHIFT)
>
> diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
> index 9380d2d..8c51ea3 100644
> --- a/drivers/infiniband/hw/mlx5/qp.c
> +++ b/drivers/infiniband/hw/mlx5/qp.c
> @@ -76,11 +76,6 @@ static int is_qp0(enum ib_qp_type qp_type)
>         return qp_type == IB_QPT_SMI;
>  }
>
> -static int is_qp1(enum ib_qp_type qp_type)
> -{
> -       return qp_type == IB_QPT_GSI;
> -}
> -
>  static int is_sqp(enum ib_qp_type qp_type)
>  {
>         return is_qp0(qp_type) || is_qp1(qp_type);
> --
> 1.7.1
Re: mlx4 problems with 4.2-rc8
On Mon, Aug 31, 2015 at 4:02 PM, Doug Ledford wrote:
> On 08/31/2015 03:09 AM, Matan Barak wrote:

>> Actually, it looks like the dump stack we've got before [1] was fixed.
>> This happens when the mlx4 driver is used in setups where number of
>> cores >= 32.

>> Doug, is that the case?

> Indeed, 48 cores on this machine.

so do we have bingo here? the patch is in the net-next tree (and we
can't put it in 4.2, only through -stable, since 4.2 is released by
now), does it solve the problem?

Or.

>> [1] http://www.spinics.net/lists/netdev/msg341171.html
Re: mlx4 problems with 4.2-rc8
On 08/31/2015 04:21 PM, Or Gerlitz wrote:
> On Mon, Aug 31, 2015 at 4:02 PM, Doug Ledford wrote:
>> On 08/31/2015 03:09 AM, Matan Barak wrote:
>
>>> Actually, it looks like the dump stack we've got before [1] was fixed.
>>> This happens when the mlx4 driver is used in setups where number of
>>> cores >= 32.
>
>>> Doug, is that the case?
>
>> Indeed, 48 cores on this machine.
>
> so do we have bingo here? the patch is in the net-next tree (and we
> can't put it in 4.2, only through -stable, since 4.2 is released by
> now), does it solve the problem?

Yes, it solved the problem. I pulled the patch into my testing branch
to confirm.

-- 
Doug Ledford
GPG KeyID: 0E572FDD
Re: [PATCH] IB/cma: Fix net_dev reference leak with failed requests
On 08/31/2015 11:20 AM, Weiny, Ira wrote:
>>
>> On 08/30/2015 02:12 AM, Or Gerlitz wrote:
>>> On Thu, Aug 27, 2015 at 5:55 AM, Haggai Eran wrote:
>>>> When no matching listening ID is found for a given request, the net_dev
>>>> that was used to find the request isn't released.
>>>>
>>>> Fixes: 20c36836ecad ("IB/cma: Use found net_dev for passive connections")
>>>
>>> same here, Doug,
>>
>> Same as the last email, I have the commit ID now, and I fixed up the commit
>> message.
>>
>
> Doug I'm working on the clean up Hal suggested to the ib_mad.h file which was
> updated in your to-be-rebased-4.3 branch via this patch.
>
> Fixes: abde0260e47b ("IB/core: Add header definitions")
>
> It looks like this is the patch destined for 4.3 on this branch k.o/for-4.3?
>
> Fixes: d4ab347005fb ("IB/core: Add core header changes needed for OPA")

Correct. I squashed the first two patches (which both touched core
files) down to just one.

> I personally did not mind the rebasing except for this issue.
>
> Let me know which branch I should base these changes off of.

Base it off of the k.o/for-4.3. That's where it will go. I'll just end
up applying this to the top of the stack.

-- 
Doug Ledford
GPG KeyID: 0E572FDD
Re: shrink struct ib_send_wr V3
On 08/31/2015 12:11 PM, Christoph Hellwig wrote:
> On Sun, Aug 30, 2015 at 06:31:35PM +0300, Sagi Grimberg wrote:
>>> - patch 2 now explicitly replaces the weird overloading in the mlx5
>>>   driver with an explicit embedding of struct ib_send_wr, similar
>>>   to what we do for all other MRs.
>>
>> That's nice,
>>
>> There is one non-trivial spot that was missed in mlx5_ib_post_send
>> though:
>
> Oh, that was a weird abuse of the old casts.
>
> I've folded both your fixes and force pushed to the wr-cleanup branch.
>
> I do not plan to resend the series until the merge window for 4.4
> is open. Doug, any chance you could pick up the first patch in the
> series for 4.3-rc? It's marked for stable as well.

Yes, I can do that.

-- 
Doug Ledford
GPG KeyID: 0E572FDD
Re: [PATCH v4 09/14] IB/cm: Expose BTH P_Key in CM and SIDR request events
On 8/31/2015 9:50 AM, Haggai Eran wrote:
> On 30/08/2015 21:23, Sagi Grimberg wrote:
>>
>> Looks like for some reason cm_get_bth_pkey got pkey_index of 0xffff
>> instead of 0 (working on the default pkey 0xffff at entry 0).
>
> It looks like the mlx5 driver doesn't interpret the completion format
> correctly. It takes a field defined in the programmer reference manual
> as pkey, and interprets it as pkey_index [1].

You're right! I wonder how this ever used to work (and it did...).
So the driver needs to lookup a pkey_index on each GSI packet?

>> log:
>> infiniband mlx5_0: ib_cm: Couldn't retrieve pkey for incoming request (port
>> 1, pkey index 65535). -22
>> ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0x2c90300ed0960, t_port_id
>> 0x2c90300ed0950:0x2c90300ed0950 and it_iu_len 260 on port 1
>> (guid=0xfe80:0x2c90300ed0950)
>> ib_srpt Session : kernel thread ib_srpt_compl (PID 8584) started
>> infiniband mlx5_0: ib_cm: Couldn't retrieve pkey for incoming request (port
>> 1, pkey index 65535). -22
>> ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0x2c90300ed0960, t_port_id
>> 0x2c90300ed0950:0x2c90300ed0950 and it_iu_len 260 on port 1
>> (guid=0xfe80:0x2c90300ed0950)
>> ib_srpt Session : kernel thread ib_srpt_compl (PID 8585) started
>> mlx5_0:dump_cqe:238:(pid 8584): dump error cqe
>> 002b
>> 94003004 002c b8e0
>> ib_srpt receiving failed for idx 0 with status 4
>> :04:00.0:poll_health:151:(pid 0): device's health compromised
>> assert_var[0] 0x0094
>> assert_var[1] 0x
>> assert_var[2] 0x
>> assert_var[3] 0x
>> assert_var[4] 0x
>> assert_exit_ptr 0x0061d35c
>> assert_callra 0x0067a5f4
>> fw_ver 0xa0641900
>> hw_id 0x01ff
>> irisc_index 2
>> synd 0x1: firmware internal error
>> ext_sync 0x
>> :04:00.0:health_care:76:(pid 7943): handling bad device here
>> ib_srpt Received DREQ and sent DREP for session 0x0002c90300ed0960.
>> ib_srpt Received DREQ and sent DREP for session 0x0002c90300ed0960.
>> ib_srpt Received IB TimeWait exit for cm_id 88046d1fb200.
>> ib_srpt Received IB TimeWait exit for cm_id 880454ffa000.
>> ib_srpt Session 0x0002c90300ed0960: kernel thread ib_srpt_compl (PID 8585) stopped
>
> I don't know how that can cause all the other errors though.

Me neither...