[Bug 1057054] Re: poor performance after upgrade to Precise
Then the only possible actors left are the actual initramdisk contents: unpack the image (e.g. zcat | cpio -id) and examine all the scripts (init, conf/modules) to determine how it could be loaded. The absence of the module reference in all of /etc and /usr/share/initramfs-tools/ tells me that whatever is loading that module is doing so as a side effect or an administration artifact, e.g. someone wrote a udev rule and forgot.

Another possible source, and this is also way out in left field, is if modules.dep was compromised and scsi_dh_rdac was added as a dependency of another module and thus loaded indirectly.

  root@nashira:/lib/modules/2.6.32-41-generic# grep scsi_dh modules*
  modules.builtin:kernel/drivers/scsi/device_handler/scsi_dh.ko
  modules.dep:kernel/drivers/scsi/device_handler/scsi_dh_rdac.ko:
  modules.dep:kernel/drivers/scsi/device_handler/scsi_dh_hp_sw.ko:
  modules.dep:kernel/drivers/scsi/device_handler/scsi_dh_emc.ko:
  modules.dep:kernel/drivers/scsi/device_handler/scsi_dh_alua.ko:
  Binary file modules.dep.bin matches
  modules.order:kernel/drivers/scsi/device_handler/scsi_dh_rdac.ko
  modules.order:kernel/drivers/scsi/device_handler/scsi_dh_hp_sw.ko
  modules.order:kernel/drivers/scsi/device_handler/scsi_dh_emc.ko
  modules.order:kernel/drivers/scsi/device_handler/scsi_dh_alua.ko
  root@nashira:/lib/modules/2.6.32-41-generic# vim modules.dep
  root@nashira:/lib/modules/2.6.32-41-generic# vim modules.builtin

Fine here.

Concerning boot probe, SCSI discovery is asynchronous, there's no expectation of order. Performance tuning is where I get off, and I also don't know much about iSCSI transport, though yeah, jumbo frames are probably wise.

If you're seeing those messages before the file system is mounted then the actors are definitely *in the ramdisk*, you just need to find them.

-- 
You received this bug notification because you are a member of Ubuntu Server Team, which is subscribed to multipath-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1057054 Title: poor performance after upgrade to Precise To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1057054/+subscriptions -- Ubuntu-server-bugs mailing list Ubuntu-server-bugs@lists.ubuntu.com Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs
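The modules.dep theory in the message above (scsi_dh_rdac pulled in as a dependency of some other module) can be checked mechanically. What follows is only a sketch: the function name dependents_of is made up for illustration, and it assumes the plain-text modules.dep layout shown in the grep output, "module.ko: dep1.ko dep2.ko", with uncompressed .ko names.

```shell
# Hypothetical helper: print every module in a modules.dep file that lists
# the given module as a dependency. Assumes "path/mod.ko: path/dep.ko ..."
# lines with uncompressed .ko file names.
dependents_of() {
    dep_file=$1
    module=$2
    awk -v mod="$module" '{
        # Field 1 is the module itself; fields 2..NF are its dependencies.
        for (i = 2; i <= NF; i++) {
            n = split($i, parts, "/")
            name = parts[n]
            sub(/\.ko$/, "", name)
            if (name == mod) {
                sub(/:$/, "", $1)   # drop the trailing colon
                print $1
                break
            }
        }
    }' "$dep_file"
}
```

For example, dependents_of /lib/modules/2.6.32-41-generic/modules.dep scsi_dh_rdac prints nothing when no other module depends on the handler, which is what the empty right-hand sides in the grep output suggest.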
[Bug 1057054] Re: poor performance after upgrade to Precise
nothing.

From the 10.04 syslog, I do see this right after all the scsi attach events:

  Sep 24 13:54:01 file3 multipathd: sdm: add path (uevent)
  Sep 24 13:54:02 file3 kernel: [7.297268] sd 10:0:0:0: rdac: LUN 0 (unowned)
  Sep 24 13:54:02 file3 kernel: [7.299115] sd 8:0:0:0: rdac: LUN 0 (owned)
  Sep 24 13:54:02 file3 kernel: [7.300844] sd 9:0:0:0: rdac: LUN 0 (owned)
  Sep 24 13:54:02 file3 kernel: [7.302519] sd 10:0:0:1: rdac: LUN 1 (owned)
  Sep 24 13:54:02 file3 kernel: [7.304256] sd 10:0:0:2: rdac: LUN 2 (owned)
  Sep 24 13:54:02 file3 kernel: [7.306406] sd 10:0:0:3: rdac: LUN 3 (unowned)
  Sep 24 13:54:02 file3 kernel: [7.308048] sd 8:0:0:1: rdac: LUN 1 (unowned)
  Sep 24 13:54:02 file3 kernel: [7.309676] sd 9:0:0:1: rdac: LUN 1 (unowned)
  Sep 24 13:54:02 file3 kernel: [7.311221] sd 9:0:0:2: rdac: LUN 2 (unowned)
  Sep 24 13:54:02 file3 kernel: [7.313092] sd 8:0:0:2: rdac: LUN 2 (unowned)
  Sep 24 13:54:02 file3 kernel: [7.314769] sd 9:0:0:3: rdac: LUN 3 (owned)
  Sep 24 13:54:02 file3 kernel: [7.316383] sd 8:0:0:3: rdac: LUN 3 (owned)
  Sep 24 13:54:02 file3 kernel: [7.316388] rdac: device handler registered

and similar with the first 12.04 boot, except the "owned/unowned" lines are intermixed with the scsi attaches instead of all at the end, so that's different.

As far as active/active, I am seeing 140MB/s writes, so RR is working well enough. I think the MD3000 processor is the bottleneck. It's not renowned for being super fast. I did try adjusting the rr_weight and rr_min_io, but it doesn't seem to have changed much. Defaults are working well. I'm not running jumbo frames yet, so maybe that'll help some.
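To compare the rdac ownership lines between a 10.04 and a 12.04 boot without eyeballing them, something like the following could tally owned and unowned paths per LUN. It is a rough sketch: rdac_ownership is an invented name, and it assumes the kernel message wording quoted above ("rdac: LUN n (owned|unowned)").

```shell
# Hypothetical helper: count owned/unowned paths per LUN from syslog lines
# given on stdin or as file arguments.
rdac_ownership() {
    awk '/rdac: LUN [0-9]+ \((owned|unowned)\)/ {
        # Locate the "LUN" token; the next two fields are the LUN number
        # and the ownership state in parentheses.
        for (i = 1; i <= NF; i++) {
            if ($i == "LUN") {
                lun = $(i + 1)
                state = $(i + 2)
                gsub(/[()]/, "", state)
                count[lun "," state]++
                seen[lun] = 1
            }
        }
    }
    END {
        for (lun in seen)
            printf "LUN %s owned=%d unowned=%d\n", lun, count[lun ",owned"], count[lun ",unowned"]
    }' "$@" | sort -k2,2n
}
```

Fed the twelve ownership lines quoted above, this reports two owned paths and one unowned for LUN 0 and LUN 3, and one owned and two unowned for LUN 1 and LUN 2. Running it against both syslogs shows whether the ownership split itself changed between releases, as opposed to only the ordering of the messages.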
[Bug 1057054] Re: poor performance after upgrade to Precise
There's nothing in any of the priority checkers except scsi cmds. See for yourself.

bzr+ssh://bazaar.launchpad.net/+branch/ubuntu/lucid/multipath-tools/
path_priority/pp_rdac/pp_rdac.c
[Bug 1057054] Re: poor performance after upgrade to Precise
Then we might have a distro bug here, which is weird as I've done hundreds of SAN installs with lucid and have had to manage the scsi_dh modules every time.

So on your lucid system, /etc/initramfs-tools/modules should be empty except for the commented out examples. The next thing to check is the initram disks themselves. Assuming there are no directives in the previous config file, the presence of the scsi_dh kos in the initrd would indicate that they were globbed in by another initramfs helper.

Actually...

  root@nashira:~# zcat /boot/initrd.img-3.2.0-26-generic | cpio -it | grep scsi_dh
  lib/modules/3.2.0-26-generic/kernel/drivers/scsi/device_handler/scsi_dh_hp_sw.ko
  lib/modules/3.2.0-26-generic/kernel/drivers/scsi/device_handler/scsi_dh_alua.ko
  lib/modules/3.2.0-26-generic/kernel/drivers/scsi/device_handler/scsi_dh_rdac.ko
  lib/modules/3.2.0-26-generic/kernel/drivers/scsi/device_handler/scsi_dh_emc.ko
  81384 blocks
  root@nashira:~# zcat /boot/initrd.img-2.6.32-41-generic | cpio -it | grep scsi_dh
  lib/modules/2.6.32-41-generic/kernel/drivers/scsi/device_handler/scsi_dh_hp_sw.ko
  lib/modules/2.6.32-41-generic/kernel/drivers/scsi/device_handler/scsi_dh_alua.ko
  lib/modules/2.6.32-41-generic/kernel/drivers/scsi/device_handler/scsi_dh_rdac.ko
  lib/modules/2.6.32-41-generic/kernel/drivers/scsi/device_handler/scsi_dh_emc.ko
  64508 blocks

OK, I'm surprised :)

In my 2.6.32 initrd I find:

  root@nashira:~/2.6.32# cat conf/modules
  scsi_dh_alua

which I suppose prompts my device handler to be installed. That is driven by 'modules' and 'modules.d' in /usr/share/initramfs-tools.

So the modules apparently have always been there, thanks to this hook script.

  /usr/share/initramfs-tools/hook-functions

  scsi)
      copy_modules_dir kernel/drivers/scsi
      for x in mptfc mptsas mptscsih mptspi zfcp; do
          manual_add_modules "${x}"
      done
      ;;

Which is how the scsi_dh_* kos got on the initramfs, but that doesn't explain how the module was loaded. That directive had to come from somewhere, so it's either already in your /usr/share/initramfs-tools/modules|modules.d or it's in your /etc/initramfs-tools/modules.d|modules and you missed it. *Something* has to be prompting its inclusion.

What's the output of the following, as root?

  grep -Rl scsi_dh /etc /usr/share/initramfs-tools/
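The "empty except for the commented out examples" check on the modules files can be made explicit by printing only the lines that are live directives. This is a minimal sketch; active_modules is a name chosen here for illustration, and it assumes the usual file layout of one module name (plus optional parameters) per line with # comments.

```shell
# Hypothetical helper: print the module name (first field) of every line
# that is neither blank nor a comment in the given modules files.
active_modules() {
    awk '!/^[ \t]*(#|$)/ { print $1 }' "$@"
}
```

Run as active_modules /etc/initramfs-tools/modules, or against conf/modules inside an unpacked initrd. No output means nothing in that file asks for a module to be loaded.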
[Bug 1057054] Re: poor performance after upgrade to Precise
After the upgrade and initial reboot, I changed the config file and verified multipath -ll was as I expected, then rebooted and did the pv test. I did nothing else between my first posting and the reply verifying lsmod, so yes, it was loaded on reboot.

I have both controllers in the MD3000i, and each has 2 ports. A LUN can be on either controller, and active/active on that one controller, with the other controller as standby. I also have LUNs on it used by ESXi, and the RR paths appear to be the same.

I did not check which driver got loaded on lucid, and the Dell docs are really kinda all over the place so I was trying to ignore them. rdac did seem to be what was happening based on how multipath -ll showed things, and I did see roughly equal traffic on 2 NICs.

In my first attempt with Precise I did initially see the ping-ponging you mention, before I understood how to apply rdac configs. With Lucid I put the rdac config in right from the start and did not see that behavior.

Is it possible the old mpath_prio_rdac callout was loading the driver early in the process? I was not aware of scsi_dh_rdac's existence until this thread, so I have no explanation of how it's getting loaded when you think it shouldn't be.

It's nice to see the config codified, I'll have to experiment with that some. Thanks for that info.
[Bug 1057054] Re: poor performance after upgrade to Precise
Hmm, that's interesting, does that mean that after reboot scsi_dh_rdac is loaded? Please verify.

Yes, it is available with the lucid kernel. Also note that if you were to reference your vendor documentation, it would probably recommend that you load the rdac driver (it has been around for a long time, actually).

NOTE: multipath is basically the same on every distro, so instructions for RH provided by your vendor are just as good for Ubuntu etc. However, kernels change frequently, which is why vendors and others need to be involved.

Let's assume that rdac is never loaded and life is good on lucid. That leaves multipath itself and the Linux kernel, both of which have jumped dramatically between lucid and precise. Since there are no real regression tests for HW SAN, and the vendors aren't pitching in, it's easily possible that something weird like this could occur.

It's my opinion that if you're using ALUA, it's up to you to determine whether an additional device handler is necessary. What may have happened is that multipath in lucid was biasing the primary storage controller and forcing a trespass unbeknownst to you. This would have made ALUA irrelevant and ping-ponged your LUNs behind the RAID for a period of time. It also means you weren't using both your storage processors like you intended.

That would have been a bona fide bug in lucid. It's likely that it was rectified in precise considering the outcome. Going back and finding it is an academic exercise, as no matter what I find, it'll probably break lucid. RDAC must be loaded.

[ALUA architecture example]
http://virtualgeek.typepad.com/virtual_geek/2009/09/a-couple-important-alua-and-srm-notes.html

As for the kernel, it does what it's told. Something with that horrendous an impact would likely have impacted your non-SAN disks as well; if it affected only one, that would be double weird :)

So it likely comes down to how the IO was queued to begin with, and multipathd is responsible for that. Between the two MP versions, your SAN actually got a codified config, meaning you don't need to provide your own if you don't want to, it's built in. 0.4.8 didn't have this.

[libmultipath/hwtable.c]

  {
          /* DELL MD3000 */
          .vendor        = "DELL",
          .product       = "MD3000",
          .getuid        = DEFAULT_GETUID,
          .features      = "2 pg_init_retries 50",
          .hwhandler     = "1 rdac",
          .selector      = DEFAULT_SELECTOR,
          .pgpolicy      = GROUP_BY_PRIO,
          .pgfailback    = -FAILBACK_IMMEDIATE,
          .rr_weight     = RR_WEIGHT_NONE,
          .no_path_retry = 15,
          .minio         = DEFAULT_MINIO,
          .checker_name  = RDAC,
          .prio_name     = PRIO_RDAC,
          .prio_args     = NULL,
  },

Your config overrides any member you defined; the rest come through from this built-in entry, like minio.

BTW you might wish to double check exactly what your SAN can do. Active/Active isn't what it used to be and is really "dual active".

http://gestaltit.com/all/tech/storage/stephen/multipath-activepassive-dual-active-activeactive/
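For anyone who prefers to spell the built-in entry out in /etc/multipath.conf, the struct above would correspond roughly to the device section below. This is my own hedged translation of the quoted hwtable.c members into 0.4.9-style keywords, not something taken from the package; the product string is copied from the struct as quoted and may need adjusting to whatever your array actually reports, and members left at their defaults (getuid, selector, minio) are simply omitted.

```
# Sketch only: mirrors the quoted hwtable.c entry, keyword mapping assumed.
devices {
    device {
        vendor                "DELL"
        product               "MD3000"
        path_grouping_policy  group_by_prio
        prio                  rdac
        path_checker          rdac
        hardware_handler      "1 rdac"
        features              "2 pg_init_retries 50"
        failback              immediate
        rr_weight             uniform
        no_path_retry         15
    }
}
```

Any keyword set in a section like this replaces the built-in value for that member only, which matches the override behaviour described above.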
[Bug 1057054] Re: poor performance after upgrade to Precise
But it survived a reboot. Several, in fact. I always reboot after major config changes to reduce the chance of a 2am phone call after a power outage.

I don't lightly file bugs, it's 3 days and 4 OS re-installs to get me here. This was working on Lucid, and required additional setup for Precise, and I found documentation difficult to find. It's repeatable, I've been repeating it all week, both on upgrades and clean installs.

Did lucid also use scsi_dh_rdac? I had no issues there. Something changed. I wish I could help find out what. Is there a debug flag and/or log file I can send?
[Bug 1057054] Re: poor performance after upgrade to Precise
There is no feature that has ever existed that has the capacity to *at runtime* examine all attached disks and cross-reference their SCSI INQUIRY data to a table of available device handlers. That table does not exist; if it did, it would be miserable to maintain.

I checked the udev rules and initramfs scripts from lucid -> precise. We never loaded dh modules automatically. The multipath C code has no facility to modprobe or insmod anything.

So the only logical conclusion left is that the module was loaded without your knowledge, which means your configuration as it was would never survive a reboot. If that's not true, and you can reproduce that, I would be interested to see it.

However, even if I had the answer, that doesn't completely make up for a complete lack of vendor participation in qualifying your SAN with our operating system [1]. We cannot be expected to regression test every SAN in creation, and we rely on users like you (or vendors) to test and stay engaged. Please contact your vendor expressing support for official Ubuntu support for your SAN.

multipath-tools is supported by the Community, not Canonical; I volunteer to maintain it. That multipath section in the server guide? I wrote it with the Precise LTS as the deadline, months of effort. multipath as a whole is light years better than it was in lucid, or ever for that matter (many helped).

I'm not disagreeing with you that things are missing and there's certainly room for improvement. You've pointed out several issues, like the dialog box, man page, etc. That's all good stuff, please file a separate bug for each so we can track them.

It's simply a matter of triage and bandwidth: a good multipath bug can soak up weeks of time, so configuration polish like you mentioned falls by the wayside. However, that sort of work is low-hanging fruit, and doesn't require a kernel storage engineer with years of experience to accomplish. Contributions are most certainly welcome.

FYI, there really isn't a hard spec for multipath.conf. It actually functions a lot like YAML, where keywords are globbed and the values are integrated and override the defaults. There's no one place in the code where you can go and discover "this is how config works"; it's scattered everywhere, which makes creating regression tests prohibitive if not practically impossible.

[1] The implication is that multipath may have changed so dramatically from 0.4.8 to 0.4.9 that the scsi_dh_rdac driver may not have been as necessary before. There's no way we could have caught that on code review, testing was required.
[Bug 1057054] Re: poor performance after upgrade to Precise
My devices are on iSCSI, and I'm not booting off them. They are not connected until after the root goes live and the network comes active, via normal means.

I never manually loaded scsi_dh_rdac before, it's not in my initrd nor my modules files. It loaded automatically somehow. Why isn't it loading automatically earlier? What loaded it if the multipath driver didn't? Other drivers load child drivers, why can't multipath?

I never found that HTML server guide in my searching and it is helpful. The PDF I only found after failing a couple of times. Other software presents prompts and links when config formats change, or even changes them for you. It's become expected, multipath should do it too. The local manpage for multipath.conf still has prio_callout and not prio documented, and has zero mention of having to manually load drivers.

Whether this is a bug in how the drivers get loaded, or how the documentation is presented, or how the upgrade transition is handled isn't that relevant, it's still a bug. It's a Lucid to Precise upgrade regression bug. It's not smooth and it's not even easy to find why it's failing.

If the upgrade had presented me with a screen saying the config and driver loading formats have changed, and that HTML link, I'd have figured it out and we wouldn't be here.
[Bug 1057054] Re: poor performance after upgrade to Precise
That's never how it works. multipath has no kernel module loading ability. Would make a nice feature though. I admit that discovering "which dh is the necessary one" is a bit arcane and not well documented anywhere.

The best practice here is to load the necessary device handler into your initrd so it's attached at the same time the SD devices are initially discovered. You still need to have this module loaded to provision additional LUNs at runtime and not have their performance plummet.

Also, I discovered a typo in the multipath documentation.
https://bugs.launchpad.net/serverguide/+bug/1057071

Had that worked to begin with, you would never have encountered this issue in the first place.

Closing this issue as "Invalid" as it's not a bug. Thanks for the report.

** Changed in: multipath-tools (Ubuntu)
       Status: Incomplete => Invalid
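The best practice above (device handler in the initrd) comes down to listing the module in /etc/initramfs-tools/modules and rebuilding the initramfs. A sketch of the first half, written to be safe to re-run; ensure_module_listed is an illustrative name, and the sketch assumes the modules file ends with a newline, as the stock file does.

```shell
# Hypothetical helper: append a module name to an initramfs-tools modules
# file unless an identical line is already present.
ensure_module_listed() {
    modules_file=$1
    module=$2
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF "$module" "$modules_file" 2>/dev/null; then
        printf '%s\n' "$module" >> "$modules_file"
    fi
}

# Typical use, as root, followed by an initramfs rebuild:
#   ensure_module_listed /etc/initramfs-tools/modules scsi_dh_rdac
#   update-initramfs -u
```

The rebuild only takes effect on the next boot; on a running system the handler still has to be loaded by hand (modprobe scsi_dh_rdac) before the paths are discovered.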
[Bug 1057054] Re: poor performance after upgrade to Precise
Via lsmod? scsi_dh_rdac is loaded.

Does it need to be in the initramfs even if I'm not booting off it? Doing so does "fix" the performance, but it is also counter-intuitive. Shouldn't the multipath driver read the conf file and load the needed modules before "finding" anything?
[Bug 1057054] Re: poor performance after upgrade to Precise
Have you verified that the rdac driver is loaded? Also please account for the contents of /etc/initramfs-tools/modules, scsi_dh_rdac must be loaded at boot time to be discovered correctly.

** Changed in: multipath-tools (Ubuntu)
       Status: New => Incomplete
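Verifying that the handler is loaded can be scripted instead of reading lsmod output by eye. A small sketch, with is_module_loaded as an invented name; it reads /proc/modules, where the first field of each line is the module name, and takes an optional file argument purely so it can be tried against a saved copy.

```shell
# Hypothetical helper: succeed (exit 0) if the named module appears in
# /proc/modules, or in the file given as the second argument.
is_module_loaded() {
    module=$1
    proc_file=${2:-/proc/modules}
    awk -v mod="$module" '$1 == mod { found = 1 } END { exit !found }' "$proc_file"
}

# Example:
#   is_module_loaded scsi_dh_rdac && echo "rdac handler is loaded"
```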
[Bug 1057054] Re: poor performance after upgrade to Precise
I have also tried a clean install of precise and I see the same results.
[Bug 1057054] Re: poor performance after upgrade to Precise
** Attachment added: "my conf file"
   https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1057054/+attachment/3346034/+files/multipath.conf