On 8/31/2015 9:50 AM, Haggai Eran wrote:
On 30/08/2015 21:23, Sagi Grimberg wrote:

It looks like, for some reason, cm_get_bth_pkey got a pkey_index of 0xffff
instead of 0 (we are using the default pkey 0xffff, which is at entry 0).

It looks like the mlx5 driver doesn't interpret the completion format
correctly. It takes a field defined in the programmer reference manual
as pkey, and interprets it as pkey_index [1].

You're right! I wonder how this ever used to work (and it did...).
So the driver needs to look up a pkey_index on each GSI packet?


log:
infiniband mlx5_0: ib_cm: Couldn't retrieve pkey for incoming request (port 1, pkey index 65535). -22
ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0x2c90300ed0960, t_port_id 0x2c90300ed0950:0x2c90300ed0950 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300ed0950)
ib_srpt Session : kernel thread ib_srpt_compl (PID 8584) started
infiniband mlx5_0: ib_cm: Couldn't retrieve pkey for incoming request (port 1, pkey index 65535). -22
ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0x2c90300ed0960, t_port_id 0x2c90300ed0950:0x2c90300ed0950 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300ed0950)
ib_srpt Session : kernel thread ib_srpt_compl (PID 8585) started
mlx5_0:dump_cqe:238:(pid 8584): dump error cqe
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
0000002b 00000000 00000000 00000000
00000000 94003004 0000002c 0000b8e0
ib_srpt receiving failed for idx 0 with status 4
0000:04:00.0:poll_health:151:(pid 0): device's health compromised
assert_var[0] 0x00000094
assert_var[1] 0x00000000
assert_var[2] 0x00000000
assert_var[3] 0x00000000
assert_var[4] 0x00000000
assert_exit_ptr 0x0061d35c
assert_callra 0x0067a5f4
fw_ver 0xa0641900
hw_id 0x000001ff
irisc_index 2
synd 0x1: firmware internal error
ext_sync 0x0000
0000:04:00.0:health_care:76:(pid 7943): handling bad device here
ib_srpt Received DREQ and sent DREP for session 0x00000000000000000002c90300ed0960.
ib_srpt Received DREQ and sent DREP for session 0x00000000000000000002c90300ed0960.
ib_srpt Received IB TimeWait exit for cm_id ffff88046d1fb200.
ib_srpt Received IB TimeWait exit for cm_id ffff880454ffa000.
ib_srpt Session 0x00000000000000000002c90300ed0960: kernel thread ib_srpt_compl (PID 8585) stopped

I don't know how that can cause all the other errors though.

Me neither...