On 30/08/2015 21:23, Sagi Grimberg wrote:
>
> Looks like for some reason cm_get_bth_pkey got pkey_index of 0xffff
> instead of 0 (working on the default pkey 0xffff at entry 0).

It looks like the mlx5 driver doesn't interpret the completion format
correctly. It takes a field defined in the programmer reference manual
as pkey, and interprets it as pkey_index [1].

> log:
> infiniband mlx5_0: ib_cm: Couldn't retrieve pkey for incoming request (port
> 1, pkey index 65535). -22
> ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0x2c90300ed0960, t_port_id
> 0x2c90300ed0950:0x2c90300ed0950 and it_iu_len 260 on port 1
> (guid=0xfe80000000000000:0x2c90300ed0950)
> ib_srpt Session : kernel thread ib_srpt_compl (PID 8584) started
> infiniband mlx5_0: ib_cm: Couldn't retrieve pkey for incoming request (port
> 1, pkey index 65535). -22
> ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0x2c90300ed0960, t_port_id
> 0x2c90300ed0950:0x2c90300ed0950 and it_iu_len 260 on port 1
> (guid=0xfe80000000000000:0x2c90300ed0950)
> ib_srpt Session : kernel thread ib_srpt_compl (PID 8585) started
> mlx5_0:dump_cqe:238:(pid 8584): dump error cqe
> 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000
> 0000002b 00000000 00000000 00000000
> 00000000 94003004 0000002c 0000b8e0
> ib_srpt receiving failed for idx 0 with status 4
> 0000:04:00.0:poll_health:151:(pid 0): device's health compromised
> assert_var[0] 0x00000094
> assert_var[1] 0x00000000
> assert_var[2] 0x00000000
> assert_var[3] 0x00000000
> assert_var[4] 0x00000000
> assert_exit_ptr 0x0061d35c
> assert_callra 0x0067a5f4
> fw_ver 0xa0641900
> hw_id 0x000001ff
> irisc_index 2
> synd 0x1: firmware internal error
> ext_sync 0x0000
> 0000:04:00.0:health_care:76:(pid 7943): handling bad device here
> ib_srpt Received DREQ and sent DREP for session
> 0x00000000000000000002c90300ed0960.
> ib_srpt Received DREQ and sent DREP for session
> 0x00000000000000000002c90300ed0960.
> ib_srpt Received IB TimeWait exit for cm_id ffff88046d1fb200.
> ib_srpt Received IB TimeWait exit for cm_id ffff880454ffa000.
> ib_srpt Session 0x00000000000000000002c90300ed0960: kernel thread
> ib_srpt_compl (PID 8585) stopped

I don't know how that can cause all the other errors though.

Haggai

[1] http://lxr.free-electrons.com/source/drivers/infiniband/hw/mlx5/cq.c?v=4.1#L230
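
To illustrate the pkey versus pkey_index mix-up described above, here is a
minimal Python sketch. It is not the mlx5 or ib_cm code: the table contents
and the helpers query_pkey and find_pkey_index are hypothetical names made up
for this example, and it assumes the completion field really carries a pkey
value, as the diagnosis above suggests. Matching on the low 15 bits is a
simplification based on the top bit of a pkey being the membership bit.

```python
import errno

# Hypothetical port pkey table: entry 0 holds the default pkey 0xffff.
# The second entry is an arbitrary illustrative value.
PKEY_TABLE = [0xFFFF, 0x8001]


def query_pkey(table, index):
    """Return the pkey stored at `index`, or -EINVAL if it is out of range."""
    if not 0 <= index < len(table):
        return -errno.EINVAL
    return table[index]


def find_pkey_index(table, pkey):
    """Return the index of the entry matching `pkey`, or -EINVAL if absent.

    Only the low 15 bits are compared; the top bit is the membership bit.
    """
    for index, entry in enumerate(table):
        if (entry & 0x7FFF) == (pkey & 0x7FFF):
            return index
    return -errno.EINVAL


# Value taken from the completion (assumed here to be a pkey, not an index).
cqe_pkey_field = 0xFFFF

# Misreading the pkey value as an index: 65535 is far outside the table.
wrong = query_pkey(PKEY_TABLE, cqe_pkey_field)

# Treating it as a pkey value and searching the table for its index.
right = find_pkey_index(PKEY_TABLE, cqe_pkey_field)

print(wrong, right)
```

On Linux, where EINVAL is 22, the misread lookup fails with -22, which is the
same error code as in the "Couldn't retrieve pkey" log line, while the table
search returns index 0 for the default pkey.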