Bug#700444: multipath-tools: Automatic path recovery not occurring on IBM Power with dual VIOS served disks.

2013-03-10 Thread Ritesh Raj Sarraf
On Sunday 10 March 2013 07:21 PM, Frank Fegert wrote:
> ststnagios02:~# dmesg | tail
> [6835466.341578] sd 0:0:1:0: [sda] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
> [6835466.341592] end_request: I/O error, dev sda, sector 0
> [6835467.341974] sd 1:0:1:0: [sdb] Unhandled error code
> [6835467.341983] sd 1:0:1:0: [sdb]  Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> [6835467.341990] sd 1:0:1:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
> [6835467.342003] end_request: I/O error, dev sdb, sector 0
> [6835471.342506] sd 0:0:1:0: [sda] Unhandled error code
> [6835471.342516] sd 0:0:1:0: [sda]  Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> [6835471.342523] sd 0:0:1:0: [sda] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
> [6835471.342536] end_request: I/O error, dev sda, sector 0
>
> ststnagios02:~# /sbin/scsiinfo -l
> /dev/sda /dev/sdb
>
> ststnagios02:~# /sbin/scsiinfo -i /dev/sda
> INQUIRY command status  = 1
>
> ststnagios02:~# /sbin/scsiinfo -i /dev/sdb
> INQUIRY command status  = 1
>
> The Debian system is currently still in the "floating" state, so if
> there's any other test you'd like me to run, that'd be no problem.
> All other AIX systems on the test hardware successfully recovered their
> paths though.


That means multipath is reporting the correct status. The SCSI devices
never recovered.
Which device driver (and HBA) is in use? Is it supported
under Debian?
I see you are on 2.6.39. Have you tried a newer kernel?

You should also check if rescanning the SCSI bus changes any state. Use
rescan-scsi-bus.
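
One way to see whether a rescan changes anything is to record the
kernel's view of each SCSI disk before and after running it. The sketch
below is only an illustration: it assumes the usual sysfs layout, where
/sys/block/sdX/device/state holds the device state (the same "running"
value that multipath -ll prints in the last column), and the function
name scsi_device_states is made up for this example.

```python
from pathlib import Path


def scsi_device_states(sys_block="/sys/block"):
    """Return {disk_name: state} for every sd* disk that has a SCSI state file."""
    states = {}
    for dev in sorted(Path(sys_block).glob("sd*")):
        state_file = dev / "device" / "state"
        # Skip entries without a SCSI state attribute.
        if state_file.is_file():
            states[dev.name] = state_file.read_text().strip()
    return states
```

Calling scsi_device_states() once before and once after the rescan and
comparing the two dictionaries shows whether any device changed state.
Note that a device can report "running" while I/O to it still fails, so
this complements the INQUIRY test rather than replacing it.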

-- 
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."






Bug#700444: multipath-tools: Automatic path recovery not occurring on IBM Power with dual VIOS served disks.

2013-03-10 Thread Frank Fegert
Hello,

sorry for the delayed reply!

On Wed, Feb 13, 2013 at 11:47:00PM +0100, Frank Fegert wrote:
> > When you thought the paths were back, were they responding to SCSI commands?
> 
> Sorry, didn't check that at the time.
> 
> > You could use tools from the sg3-utils package or the scsi_id
> > program to confirm that.
> 
> I'll try to set up a test environment.

In a test environment, after rebooting each of the two VIOS in turn:

ststnagios02:~# multipath -ll
mpath0 (360050768019181279800023b) dm-0 AIX,VDASD
size=36G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 0:0:1:0 sda 8:0  failed faulty running
  `- 1:0:1:0 sdb 8:16 failed faulty running

ststnagios02:~# dmesg | tail
[6835466.341578] sd 0:0:1:0: [sda] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[6835466.341592] end_request: I/O error, dev sda, sector 0
[6835467.341974] sd 1:0:1:0: [sdb] Unhandled error code
[6835467.341983] sd 1:0:1:0: [sdb]  Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[6835467.341990] sd 1:0:1:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[6835467.342003] end_request: I/O error, dev sdb, sector 0
[6835471.342506] sd 0:0:1:0: [sda] Unhandled error code
[6835471.342516] sd 0:0:1:0: [sda]  Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[6835471.342523] sd 0:0:1:0: [sda] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[6835471.342536] end_request: I/O error, dev sda, sector 0

ststnagios02:~# /sbin/scsiinfo -l
/dev/sda /dev/sdb

ststnagios02:~# /sbin/scsiinfo -i /dev/sda
INQUIRY command status  = 1

ststnagios02:~# /sbin/scsiinfo -i /dev/sdb
INQUIRY command status  = 1

The Debian system is currently still in the "floating" state, so if
there's any other test you'd like me to run, that'd be no problem.
All other AIX systems on the test hardware successfully recovered their
paths though.

Thanks & best regards,

Frank Fegert


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#700444: multipath-tools: Automatic path recovery not occurring on IBM Power with dual VIOS served disks.

2013-02-13 Thread Frank Fegert
Hello,

On Wed, Feb 13, 2013 at 11:47:28AM +0530, Ritesh Raj Sarraf wrote:
> When you said temporary, are you sure the paths had actually recovered?

Well, all other LPARs running AIX got their paths back ;-) So the
backend and the VIOSes were providing multiple paths again. After
the Debian LPAR was rebooted, all paths were back to normal. 

> When you thought the paths were back, were they responding to SCSI commands?

Sorry, didn't check that at the time.

> You could use tools from the sg3-utils package or the scsi_id
> program to confirm that.

I'll try to set up a test environment.

> Any good reason for using "no_path_retry 10"?

Off the top of my head, no. It might be a relic from earlier times, or
it might be derived from the IBM SVC recommendations for Linux host
attachments ('no_path_retry "5"'). In this case, though, the Debian
LPAR gets its SVC backed disks from the two VIOS rather than directly
from the SVC.
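
As a rough illustration of what a numeric no_path_retry means:
multipathd keeps queueing I/O for about that many checker intervals
after the last path has failed, then starts failing it. The sketch
assumes polling_interval is at its default of 5 seconds (the posted
config does not set it, and the failed-path messages in the console log
arrive about 5 seconds apart). The helper name is hypothetical and the
result is an approximation, not an exact timer.

```python
def queueing_window_seconds(no_path_retry, polling_interval=5):
    """Approximate how long I/O stays queued once every path has failed.

    Returns 0 for "fail", None for "queue" (queue indefinitely), and
    retries * polling_interval seconds for a numeric setting.
    """
    if no_path_retry == "fail":
        return 0
    if no_path_retry == "queue":
        return None
    return int(no_path_retry) * polling_interval


print(queueing_window_seconds(10))  # value from the posted multipath.conf
print(queueing_window_seconds(5))   # value from the SVC recommendation
```

Under that assumption the two settings give roughly 50 and 25 seconds
of queueing. Either way the setting only controls how long I/O is held
while no path is usable; it does not by itself decide whether a failed
path is reinstated.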

Thanks & best regards,

Frank Fegert





Bug#700444: multipath-tools: Automatic path recovery not occurring on IBM Power with dual VIOS served disks.

2013-02-12 Thread Ritesh Raj Sarraf
On Wednesday 13 February 2013 12:12 AM, Frank Fegert wrote:
> Hello,
>
> on an IBM Power LPAR with dual virtual I/O server (VIOS) backed disk
> devices, multipath seems not to be able to recover temporarily failed
> disk paths (e.g. after one VIOS is restarted after maintenance). The
> serial console shows:

When you said temporary, are you sure the paths had actually recovered?
When you thought the paths were back, were they responding to SCSI commands?
You could use tools from the sg3-utils package or the scsi_id
program to confirm that.

>
> [  325.695570] ibmvscsi 3003: Virtual adapter failed rc 2!
> [  325.799041] ibmvscsi 3003: SRP_VERSION: 16.a
> [  325.799076] ibmvscsi 3003: Partner adapter not ready
> [  325.799087] ibmvscsi 3003: error after reset
> [  326.072253] sd 1:0:1:0: [sdb] Unhandled error code
> [  326.072271] sd 1:0:1:0: [sdb]  Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> [  326.072281] sd 1:0:1:0: [sdb] CDB: Read(10): 28 00 04 42 23 23 00 00 08 00
> [  326.072308] end_request: I/O error, dev sdb, sector 71443235
> [  326.072321] device-mapper: multipath: Failing path 8:16.
> [  330.538142] sd 1:0:1:0: [sdb] Unhandled error code
> [  330.538157] sd 1:0:1:0: [sdb]  Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> [  330.538165] sd 1:0:1:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
> [  330.538183] end_request: I/O error, dev sdb, sector 0
> [  335.538861] sd 1:0:1:0: [sdb] Unhandled error code
> [  335.538876] sd 1:0:1:0: [sdb]  Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> [  335.538884] sd 1:0:1:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
> [  335.538902] end_request: I/O error, dev sdb, sector 0
> ...
> ... loops forever ...


That could very well be the messages triggered by the multipath checkerloop.
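
The CDB bytes in the log can be decoded to check this. The sketch below
unpacks a READ(10) command (opcode 0x28, 4-byte big-endian LBA in bytes
2 to 5, 2-byte transfer length in bytes 7 and 8); the function name is
only for illustration.

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Returns (lba, number_of_blocks).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks
    return lba, blocks


# First failed request in the log, then the request that keeps repeating.
print(decode_read10("28 00 04 42 23 23 00 00 08 00"))
print(decode_read10("28 00 00 00 00 00 00 00 08 00"))
```

The first request decodes to LBA 71443235, matching the sector in the
end_request line, so it looks like ordinary I/O. The repeating one is an
8-block read at LBA 0 roughly every 5 seconds, which is consistent with
a path checker probing the start of the device, although the log alone
does not show which checker is configured.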

>
> Any ideas what may be causing this and/or how it could be resolved?
>
> Thanks & best regards,
>
> Frank Fegert
>
>
> -- Package-specific info:
> Contents of /etc/multipath.conf:
> defaults {
>     getuid_callout  "/lib/udev/scsi_id -g -u -d /dev/%n"
>     no_path_retry  10
>     user_friendly_names yes
> }
>

Any good reason for using "no_path_retry 10"?

-- 
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."






Bug#700444: multipath-tools: Automatic path recovery not occurring on IBM Power with dual VIOS served disks.

2013-02-12 Thread Frank Fegert
Package: multipath-tools
Version: 0.4.8+git0.761c66f-10
Severity: normal

Hello,

on an IBM Power LPAR with dual virtual I/O server (VIOS) backed disk
devices, multipath seems not to be able to recover temporarily failed
disk paths (e.g. after one VIOS is restarted after maintenance). The
serial console shows:

[  325.695570] ibmvscsi 3003: Virtual adapter failed rc 2!
[  325.799041] ibmvscsi 3003: SRP_VERSION: 16.a
[  325.799076] ibmvscsi 3003: Partner adapter not ready
[  325.799087] ibmvscsi 3003: error after reset
[  326.072253] sd 1:0:1:0: [sdb] Unhandled error code
[  326.072271] sd 1:0:1:0: [sdb]  Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[  326.072281] sd 1:0:1:0: [sdb] CDB: Read(10): 28 00 04 42 23 23 00 00 08 00
[  326.072308] end_request: I/O error, dev sdb, sector 71443235
[  326.072321] device-mapper: multipath: Failing path 8:16.
[  330.538142] sd 1:0:1:0: [sdb] Unhandled error code
[  330.538157] sd 1:0:1:0: [sdb]  Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[  330.538165] sd 1:0:1:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[  330.538183] end_request: I/O error, dev sdb, sector 0
[  335.538861] sd 1:0:1:0: [sdb] Unhandled error code
[  335.538876] sd 1:0:1:0: [sdb]  Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[  335.538884] sd 1:0:1:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
[  335.538902] end_request: I/O error, dev sdb, sector 0
...
... loops forever ...

Any ideas what may be causing this and/or how it could be resolved?

Thanks & best regards,

Frank Fegert


-- Package-specific info:
Contents of /etc/multipath.conf:
defaults {
    getuid_callout  "/lib/udev/scsi_id -g -u -d /dev/%n"
    no_path_retry  10
    user_friendly_names yes
}
blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z][[0-9]*]"
    devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
}
multipaths {
    multipath {
        wwid 360050768019181279800023C
    }
}
devices {
}


-- System Information:
Debian Release: 6.0.6
  APT prefers stable
  APT policy: (990, 'stable'), (500, 'stable-updates')
Architecture: powerpc (ppc64)

Kernel: Linux 2.6.39-bpo.2-powerpc (SMP w/6 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages multipath-tools depends on:
ii  initscripts        2.88dsf-13.1+squeeze1 scripts for initializing and shutt
ii  kpartx             0.4.8+git0.761c66f-10 create device mappings for partiti
ii  libaio1            0.3.107-7             Linux kernel AIO access library - 
ii  libc6              2.11.3-4              Embedded GNU C Library: Shared lib
ii  libdevmapper1.02.1 2:1.02.48-5           The Linux Kernel Device Mapper use
ii  libncurses5        5.7+20100313-5        shared libraries for terminal hand
ii  libreadline6       6.1-3                 GNU readline and history libraries
ii  lsb-base           3.2-23.2squeeze1      Linux Standard Base 3.2 init scrip
ii  udev               164-3                 /dev/ and hotplug management daemo

multipath-tools recommends no packages.

Versions of packages multipath-tools suggests:
ii  multipath-tools-bo 0.4.8+git0.761c66f-10 Support booting from multipath dev

-- no debconf information

