Bart Van Assche wrote:
On 02/05/13 21:54, Or Gerlitz wrote:
On Tue, Feb 5, 2013 at 6:25 PM, Bart Van Assche <bvanass...@acm.org> wrote:
On 02/04/13 22:11, Or Gerlitz wrote:
Bart, I'd like to sharpen the point: could you please clarify if the
series posted to linux-rdma stands for itself in the sense that SRP HA
scheme X (please state it) now works/better when the patches applied
on top of the latest 3.8-rc cut? OR for X to do better/work, one needs
this series AND the one you posted to linux-scsi.

Hello Or,

A huge number of patches have been taken upstream between 3.8-rc1 and 3.8-rc6. I have retested these three patches with 3.8-rc6 and would appreciate if you would also repeat your tests.

Thanks,

Bart.
Hello Bart,

I tested your 3.8 v3 patchset. I did the following:
- cloned Roland's ib tree and checked out the for-next branch
- applied Bart's 3.8 v3 patchset
- applied the "save & restore host_scribble during error handling" patch - http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg17809.html

I have two paths to the target, through ports 1 and 2 (scsi_host host9 and host10).

- ran I/Os
- disabled port 1 @ 19:11:30
- error recovery for host9 kicked in @ 19:12:04
- multipath removed the path, I/Os failed over @ 19:12:51
- error recovery was still going on for host9 (sysfs entry for host9 still intact)
- enabled port 1 @ 19:15:00
- host9 reconnected to the target through error recovery and multipathd reinstated the path in the kernel; host9 was then REMOVED, and user-space "multipath -l" did not show a reinstated path through host9

Feb  6 19:15:04 vsa30 kernel: scsi host9: SRP abort called
Feb 6 19:15:05 vsa30 multipathd: overflow in attribute '/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/host9/target9:0:0/9:0:0:2/state'
Feb  6 19:15:14 vsa30 kernel: scsi host9: SRP abort called
Feb  6 19:15:14 vsa30 kernel: scsi host9: SRP reset_device called
Feb  6 19:15:14 vsa30 kernel: scsi host9: ib_srp: SRP reset_host called
Feb  6 19:15:14 vsa30 kernel: scsi host9: ib_srp: reconnect succeeded
Feb 6 19:15:26 vsa30 multipathd: 3600144f0665c4400000050a522180003: sdd - tur checker reports path is up
Feb  6 19:15:26 vsa30 multipathd: 8:48: reinstated
Feb 6 19:15:26 vsa30 multipathd: 3600144f0665c4400000050a522180003: remaining active paths: 2
Feb 6 19:15:26 vsa30 multipathd: 3600144f0665c4400000050a522180002: sdc - tur checker reports path is up
Feb  6 19:15:26 vsa30 multipathd: 8:32: reinstated
Feb 6 19:15:26 vsa30 multipathd: 3600144f0665c4400000050a522180002: remaining active paths: 2
Feb  6 19:15:26 vsa30 multipathd: sdc: remove path (uevent)
Feb 6 19:15:26 vsa30 multipathd: 3600144f0665c4400000050a522180002: load table [0 409600 multipath 0 0 1 1 round-robin 0 1 1 8:80 1]
Feb 6 19:15:26 vsa30 multipathd: sdc: path removed from map 3600144f0665c4400000050a522180002
Feb  6 19:15:26 vsa30 kernel: sd 9:0:0:1: [sdc] Synchronizing SCSI cache
Feb  6 19:15:26 vsa30 multipathd: sdd: remove path (uevent)
Feb 6 19:15:26 vsa30 multipathd: 3600144f0665c4400000050a522180003: load table [0 409600 multipath 0 0 1 1 round-robin 0 1 1 8:96 1]
Feb 6 19:15:26 vsa30 multipathd: sdd: path removed from map 3600144f0665c4400000050a522180003
Feb  6 19:15:26 vsa30 kernel: sd 9:0:0:2: [sdd] Synchronizing SCSI cache

- disabled port 2 @ 19:22:50
- error recovery kicked in on host10 @ 19:23:40
- I/Os failed with NO path to target @ 19:24:27
- without port 2 being enabled, error recovery on host10 continued until 19:57:52 and then stopped
- host10 was still present in sysfs (/sys/class/scsi_host/host10) and still held a reference on the ib_srp module
- enabled port 2 - nothing happened.
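
The "dangling host" state described above can be checked with a small script along these lines. It is a sketch under assumptions: the function name host_status and the sysfs_root parameter are hypothetical (the parameter exists only so the check can be tried against a copy of the tree), and /sys/module/ib_srp/refcnt is only present when the kernel is built with module unloading support.

```python
import os


def host_status(host, module="ib_srp", sysfs_root="/sys"):
    """Report whether a SCSI host is still in sysfs and the module refcount.

    host is a name such as "host10". module_refcnt is None when the
    refcnt file is missing or unreadable.
    """
    host_path = os.path.join(sysfs_root, "class", "scsi_host", host)
    refcnt_path = os.path.join(sysfs_root, "module", module, "refcnt")

    refcnt = None
    try:
        with open(refcnt_path) as f:
            refcnt = int(f.read().strip())
    except (OSError, ValueError):
        pass  # file absent or not a number: leave refcnt as None

    return {
        "host_present": os.path.exists(host_path),
        "module_refcnt": refcnt,
    }


if __name__ == "__main__":
    print(host_status("host10"))
```

A host that is still present with a non-zero ib_srp refcount long after error recovery has stopped would match the situation seen with host10.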

Conclusion:
1. If the port/path is disabled long enough (>35 minutes), we are left with a dangling scsi host.
2. If the port is enabled within 30 minutes, the scsi host re-establishes the connection and the path is reinstated, but the scsi_host is then removed (no entry in sysfs).

I attached a log here to show what happened above.

thanks,
-vu

Attachment: messages.bz2
Description: Binary data
