On Tue, Dec 18, 2012 at 11:00 AM, Lars Ellenberg <[email protected]> wrote:
> On Mon, Dec 17, 2012 at 04:09:25PM +0100, Rasto Levrinc wrote:
>> On Mon, Dec 17, 2012 at 3:39 PM, Caspar Smit <[email protected]> wrote:
>> > 2012/12/17 Rasto Levrinc <[email protected]>
>> >>
>> >> On Mon, Dec 17, 2012 at 3:09 PM, Caspar Smit <[email protected]> wrote:
>> >> > Hi Rasto,
>> >> >
>> >> > I noticed this in one of my clusters:
>> >> >
>> >> ...
>> >>
>> >> > /usr/local/bin/lcmc-gui-helper-1.4.2 hw-info-daemon
>> >> > root 9869 0.0 0.0 39856 1516 pts/7 S+ 14:49 0:00
>> >> > \_ sudo -E -p DRBD MC sudo pwd: /usr/local/bin/lcmc-gui-helper-1.4.2
>> >> > hw-info-daemon
>> >> > root 9870 0.3 0.0 27676 5000 pts/7 S+ 14:49 0:00
>> >> > \_ /usr/bin/perl /usr/local/bin/lcmc-gui-helper-1.4.2 hw-info-daemon
>> >> > root 18176 0.0 0.0 9060 1180 pts/7 S+ 14:50 0:00
>> >> > \_ sh -c /sbin/pvdisplay -C --noheadings -o pv_name,vg_name 2>/dev/null
>> >> > root 18177 0.0 0.0 17872 1604 pts/7 D+ 14:50 0:00
>> >> > \_ /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
>> >> >
>> >> > Why is LCMC running so many pvdisplay commands at once?
>> >>
>> >> Hi Caspar,
>> >>
>> >> it is running it once in 10 seconds, to see if something has changed.
>> >> Can you check what does it do on your nodes?
>> >>
>> >> /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
>> >>
>> >> Rasto
>> >>
>> >
>> > # /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
>> > /dev/sdb single_array3
>> > /dev/sdc single_array3
>> > /dev/sdd single_array3
>> > /dev/sdh replicated_array1and2
>> > /dev/sdi replicated_array1and2
>> > /dev/sdj replicated_array1and2
>> > /dev/sdk replicated_array1and2
>> >
>> > I know that LCMC does monitor changes with the lcmc-gui-helper script, but
>> > I presume the "hw-info-daemon" part has to run only once and not 5(+) times
>> > concurrently?
>> >
>> > Running 5x pvdisplay concurrently can really slow things down.
>>
>> It shouldn't run this 5x concurrently.
>> What here probably happens, is that
>> the hw daemon takes too long and is assumed dead and is restarted.
>
> Which does not really improve things in this case ;-)

That's actually a regression: the old daemon must be killed when anything
hangs, so that's the first bug.

>
>> Can it be that /sbin/pvdisplay -C --noheadings -o pv_name,vg_name
>> hangs on or takes very long on your system, at least sometimes?
>
> We have seen lvm commands that scan meta data take several *minutes* to
> complete on a moderately busy server.
> [0.1 seconds when the system is idle,
> virtually "forever" when it is really busy :-)]
>
> In part because of too many devices to be scanned, badly chosen filter
> settings, badly chosen bio flags for O_DIRECT (that has been fixed since
> in kernel), too long device queues (too large nr_requests),
> and evil io scheduler interactions.
> All tuneable, or possible to work around.
> Still that brought it down to ~ 20 seconds only.
>
>> Anyway I can/should fix LCMC to deal with this situation.
>
> You should probably not initiate a full device scan every ten seconds,
> but preferably on demand only,
> or maybe every once in a while if loadavg is low.

LCMC needs reasonably up-to-date knowledge about LVM, if possible. It
could run the scan only if any pv, vg or lv has changed. I've figured out
that I could check the timestamp of the lvm .cache file, which is written
every time any of the lvm commands is run. That wouldn't help if the lvm
cache is disabled, but I guess anybody who disables it has also set up
some device filters. An update every 10 seconds plus the time the scan
runs, or so, would be acceptable. Or maybe something clever like
(time the scan runs) * 2 + 10.

Rasto

_______________________________________________
drbd-mc mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-mc
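
For illustration only, here is a minimal Python sketch of the two ideas floated above: skip the scan while the lvm .cache timestamp is unchanged, and wait (time the scan runs) * 2 + 10 seconds after a scan. This is not LCMC code (the real helper is a Perl script). The cache path /etc/lvm/cache/.cache and all function names are assumptions made for this example; check lvm.conf on the node for the actual cache location.

```python
import os
import subprocess
import time

# ASSUMPTION: location of the lvm .cache file; not stated in the thread.
LVM_CACHE_PATH = "/etc/lvm/cache/.cache"
BASE_INTERVAL = 10.0  # seconds, the polling interval mentioned in the thread

# The scan command quoted in the thread.
PVDISPLAY = ["/sbin/pvdisplay", "-C", "--noheadings", "-o", "pv_name,vg_name"]


def next_delay(scan_seconds, base=BASE_INTERVAL):
    """Delay before the next poll: (time the scan runs) * 2 + 10."""
    return scan_seconds * 2 + base


def cache_mtime(path=LVM_CACHE_PATH):
    """Return the cache file's mtime, or None if it cannot be read."""
    try:
        return os.stat(path).st_mtime
    except OSError:
        return None


def should_scan(last_seen_mtime, current_mtime):
    """Scan if the cache changed, or if there is no cache to compare against."""
    if current_mtime is None:
        return True  # cache disabled or missing: no way to tell, so scan
    return current_mtime != last_seen_mtime


def run_pvdisplay():
    """Run the scan once and return its raw output (hypothetical helper)."""
    return subprocess.run(PVDISPLAY, capture_output=True, text=True).stdout


def poll_once(run_scan, last_seen_mtime, path=LVM_CACHE_PATH,
              clock=time.monotonic):
    """Do one polling step; return (mtime to remember, seconds to sleep)."""
    if not should_scan(last_seen_mtime, cache_mtime(path)):
        return last_seen_mtime, BASE_INTERVAL
    started = clock()
    run_scan()
    elapsed = clock() - started
    # Re-read the timestamp after the scan: the scan is itself an lvm
    # command, so it may rewrite the cache file.
    return cache_mtime(path), next_delay(elapsed)
```

A caller would loop over `poll_once(run_pvdisplay, seen)` and sleep for the returned delay. With the ~20 second scans mentioned above, that gives a 50 second pause between scans instead of a fixed 10. One known gap in this sketch: a change made by another lvm command between the end of the scan and the timestamp re-read would be missed until the next change.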
