Hi all,

We have been battling with this issue with help from IBM support, but I'd like 
to understand the situation a bit better, so thought I'd go to the source ;-)

We seem to have a work-around, but it is rather woolly and sledgehammer-like: 
we re-ran dracut to rebuild the initramfs copies of the /etc/multipath/bindings 
and wwids files, then effectively tore down and rebuilt the multipaths and the 
overlying LVM storage from scratch. It appears that the copy of bindings in the 
initramfs assigned a different mpath id to the affected wwids than the copy in 
/etc/multipath. I would not have expected multipathd to be inspecting the 
contents of the initramfs (and don't think it is), and I'm not sure whether 
this mismatch is a symptom or a cause.
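
One rough way to check for this kind of mismatch is sketched below in Python. 
It assumes the usual bindings layout of one "alias wwid" pair per line with 
"#" comments, and that the initramfs copy has already been extracted to text 
by some other means (for instance with dracut's lsinitrd), which is not shown. 
The function names and the sample "initramfs" contents are hypothetical, made 
up purely for illustration; only the two WWIDs come from the output further 
down.

```python
def parse_bindings(text):
    """Return {wwid: alias} from the text of a multipath bindings file."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            alias, wwid = fields[0], fields[1]
            mapping[wwid] = alias
    return mapping


def binding_conflicts(root_text, initramfs_text):
    """List (wwid, root_alias, initramfs_alias) where the two copies disagree."""
    root = parse_bindings(root_text)
    initrd = parse_bindings(initramfs_text)
    return sorted(
        (wwid, root[wwid], initrd[wwid])
        for wwid in root.keys() & initrd.keys()
        if root[wwid] != initrd[wwid]
    )


# Invented sample data: the aliases in the "initramfs" copy are an assumption
# used only to show what a mismatch would look like.
ROOT_COPY = """\
# Multipath bindings
mpathm 360050763008080eef80000000000002c
mpathn 360050763008080eef80000000000002d
"""
INITRAMFS_COPY = """\
mpathm 360050763008080eef80000000000002d
"""

for wwid, root_alias, initrd_alias in binding_conflicts(ROOT_COPY, INITRAMFS_COPY):
    print(f"{wwid}: {root_alias} in /etc/multipath, {initrd_alias} in initramfs")
```

Only WWIDs present in both copies are compared, so an entry missing from the 
initramfs copy is not reported as a conflict by this sketch.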

The multipaths had been added several weeks earlier and had been working 
properly ever since, but I don't believe there had been a reboot since they 
were added, nor had the initramfs been updated. We had a scheduled SAN 
hardware maintenance restart, and later noticed strange filesystem corruption 
on a guest server.

The issue is that several mpath devices ended up with overlapping block devices 
after an interruption to half of the paths, caused by a hardware restart of one 
of the SAN nodes. All the block devices belonging to mpathn ended up being used 
by a second map, mpathm, as well:

Truncated output of multipath -ll:

mpathn (360050763008080eef80000000000002d) dm-13 IBM     ,2145
| |- 1:0:0:12 sdm  8:192  active undef running
| `- 3:0:0:12 sdao 66:128 active undef running
  |- 3:0:1:12 sdbc 67:96  failed undef running
  `- 1:0:1:12 sdaa 65:160 active undef running
mpathm (360050763008080eef80000000000002c) dm-12 IBM     ,2145
| |- 1:0:1:12 sdaa 65:160 active undef running
| `- 3:0:1:12 sdbc 67:96  active undef running
| |- 1:0:1:11 sdz  65:144 active undef running
| |- 3:0:1:11 sdbb 67:80  failed undef running
| |- 1:0:0:12 sdm  8:192  active undef running
| `- 3:0:0:12 sdao 66:128 active undef running
  |- 1:0:0:11 sdl  8:176  active undef running
  `- 3:0:0:11 sdan 66:112 active undef running

Note: the failed sdbb device eventually came back as active; this output was 
taken just after the hardware was reset, before everything had settled down 
again. However, the overlap never resolved itself.
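
The overlap can also be spotted mechanically. The following is only a sketch, 
assuming output shaped like the listing above: a header line of the form 
"alias (wwid) dm-N ..." followed by path lines containing an H:C:T:L address, 
an sdX name and a major:minor pair. The function name shared_paths and the 
regular expressions are illustrative assumptions, not part of multipath-tools, 
and the header pattern would not match maps listed without an alias.

```python
import re
from collections import defaultdict

# Header line, e.g. "mpathn (3600...2d) dm-13 IBM     ,2145"
MAP_RE = re.compile(r"^(\S+) \(([0-9a-f]+)\) (dm-\d+)")
# Path line, e.g. "| |- 1:0:0:12 sdm  8:192  active undef running"
PATH_RE = re.compile(r"\b\d+:\d+:\d+:\d+\s+(sd[a-z]+)\s+\d+:\d+\b")


def shared_paths(multipath_ll_output):
    """Return {device: sorted map names} for devices listed under several maps."""
    owners = defaultdict(set)
    current = None
    for line in multipath_ll_output.splitlines():
        header = MAP_RE.match(line)
        if header:
            current = header.group(1)
            continue
        path = PATH_RE.search(line)
        if path and current is not None:
            owners[path.group(1)].add(current)
    return {dev: sorted(maps) for dev, maps in owners.items() if len(maps) > 1}


# The truncated listing from this message, used as sample input.
SAMPLE = """\
mpathn (360050763008080eef80000000000002d) dm-13 IBM     ,2145
| |- 1:0:0:12 sdm  8:192  active undef running
| `- 3:0:0:12 sdao 66:128 active undef running
  |- 3:0:1:12 sdbc 67:96  failed undef running
  `- 1:0:1:12 sdaa 65:160 active undef running
mpathm (360050763008080eef80000000000002c) dm-12 IBM     ,2145
| |- 1:0:1:12 sdaa 65:160 active undef running
| `- 3:0:1:12 sdbc 67:96  active undef running
| |- 1:0:1:11 sdz  65:144 active undef running
| |- 3:0:1:11 sdbb 67:80  failed undef running
| |- 1:0:0:12 sdm  8:192  active undef running
| `- 3:0:0:12 sdao 66:128 active undef running
  |- 1:0:0:11 sdl  8:176  active undef running
  `- 3:0:0:11 sdan 66:112 active undef running
"""

for device, maps in sorted(shared_paths(SAMPLE).items()):
    print(device, "is claimed by", ", ".join(maps))
```

On the listing above this flags the four LUN 12 paths (sdm, sdao, sdbc and 
sdaa) as belonging to both mpathm and mpathn, while the LUN 11 paths appear 
only under mpathm.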

Environment:
uname -a
Linux p8-srvr1 3.10.82-2042.1.pkvm2_1_1.71.ppc64 #1 SMP Fri Jul 31 09:52:38 CDT 2015 ppc64 ppc64 ppc64 GNU/Linux

cat /etc/issue
IBM_PowerKVM release 2.1.1 build 62 service (pkvm2_1_1)
Kernel \r on a \m (\l)

rpm -qa |grep multipath
device-mapper-multipath-libs-0.4.9-51.pkvm2_1.5.ppc64
device-mapper-multipath-0.4.9-51.pkvm2_1.5.ppc64

So my questions are:
a) Is this expected behaviour or a bug?
b) If a bug, is there a fix?
c) Is there any further information you need to help diagnose?

Regards,

Andy D'Arcy Jewell
Linux/FOSS Operations
CSI LTD



--
dm-devel mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/dm-devel
