Hello community,

here is the log from the commit of package lvm2 for openSUSE:Factory checked in at 2018-02-03 15:37:06
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/lvm2 (Old)
 and      /work/SRC/openSUSE:Factory/.lvm2.new (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "lvm2"

Sat Feb 3 15:37:06 2018 rev:114 rq:570450 version:2.02.177

Changes:
--------
--- /work/SRC/openSUSE:Factory/lvm2/device-mapper.changes 2018-01-16 09:39:54.748592628 +0100
+++ /work/SRC/openSUSE:Factory/.lvm2.new/device-mapper.changes 2018-02-03 15:37:07.644434746 +0100
@@ -1,0 +2,7 @@
+Tue Jan 16 11:53:36 UTC 2018 - z...@suse.com
+
+- clvmd: try to refresh device cache on the first failure
+  (bsc#978055, bsc#1076042)
+  + bug-978055_clvmd-try-to-refresh-device-cache-on-the-first-failu.patch
+
+-------------------------------------------------------------------

lvm2-clvm.changes: same change
lvm2.changes: same change

New:
----
  bug-978055_clvmd-try-to-refresh-device-cache-on-the-first-failu.patch

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ lvm2-clvm.spec ++++++
--- /var/tmp/diff_new_pack.1gPOpo/_old 2018-02-03 15:37:08.560391963 +0100
+++ /var/tmp/diff_new_pack.1gPOpo/_new 2018-02-03 15:37:08.564391776 +0100
@@ -61,6 +61,9 @@
 Patch2001: bug-1012973_simplify-special-case-for-md-in-69-dm-lvm-metadata.patch
 ### COMMON-PATCH-END ###
 
+# Patches for clvmd and cmirrord
+Patch3001: bug-978055_clvmd-try-to-refresh-device-cache-on-the-first-failu.patch
+
 %description
 A daemon for using LVM2 Logival Volumes in a clustered environment.
 
@@ -76,6 +79,8 @@
 %patch2001 -p1
 ### COMMON-PREP-END ###
 
+%patch3001 -p1
+
 %build
 extra_opts="
  --enable-applib

++++++ bug-978055_clvmd-try-to-refresh-device-cache-on-the-first-failu.patch ++++++
>From 4f0681b1a296d88ac1dbdb26e46afed3285ad1bf Mon Sep 17 00:00:00 2001
From: Eric Ren <z...@suse.com>
Date: Tue, 23 May 2017 15:09:46 +0800
Subject: [PATCH 09/10] clvmd: try to refresh device cache on the first failure

1. The original problem

$ sudo lvchange -ay testvg/testlv
Error locking on node 1302cf30: Volume group for uuid not found: qBKu65bSxfRq7gUf91NZuH4epLza4ifDieQJFd2to2WruVi5Brn7DxxsEgi5Zodw

2. This problem can be easily replicated

a. Run clvmd in a cluster environment;
b. Assume you have created LV "testlv" in local VG 'testvg' on an MD device 'md0';
c. Make sure 'md0' is stopped and not in the device cache, by executing 'clvmd -R' or 'pvscan';
d. Assemble 'md0' by issuing 'mdadm --assemble --scan --name md0';
e. Try to activate 'testlv'; you will see the 'Error locking' problem.

3. Analysis

a. After step 2.d, 'pvscan --cache ...' is triggered by udev rules to signal that
'md0' is ready. But pvscan exits very early because lvmetad is not being used,
and thus doesn't go through the lock manager. Therefore, clvmd isn't aware of
this udev event, and its device cache does not contain 'md0'.

b. In step 2.e, the client, the 'lvchange -ay testvg/testlv' command, can find
'testlv' correctly in the client metadata, because the device list is gathered
by the call chain:
  lvm_run_command()->init_filters()->persistent_filter_load()->dev_cache_scan().
Then it asks clvmd for "Locking VG V_testvg CR", which just drops the metadata
in clvmd via the call chain:
  do_lock_vg()->lvmcache_drop_metadata(),
but the device cache is *not* refreshed.

c. Finally, clvmd fails to find the lvid in the activation path:
  do_lock_lv()->do_activate_lv()->lv_info_by_lvid()

Apparently, the metadata DB in clvmd is not complete without a complete device
cache. However, upstream says the pvscan tool is intended to be used only with
lvmetad, and suggested not hacking there. So we'd better fix this issue within
the clvmd code.

Sometimes the device cache in clvmd can be out of date. "clvmd -R" was invented
for this issue. However, running "clvmd -R" manually is not convenient, because
it's hard to predict when a device change will happen.

This patch retries once after refreshing the device cache. In the normal case
it has no side effects. In the case of the issue above, it's worth a retry.

Signed-off-by: Eric Ren <z...@suse.com>
---
 daemons/clvmd/lvm-functions.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/daemons/clvmd/lvm-functions.c b/daemons/clvmd/lvm-functions.c
index 2446fd1..dcd3f9b 100644
--- a/daemons/clvmd/lvm-functions.c
+++ b/daemons/clvmd/lvm-functions.c
@@ -509,11 +509,14 @@ const char *do_lock_query(char *resource)
 int do_lock_lv(unsigned char command, unsigned char lock_flags, char *resource)
 {
 	int status = 0;
+	int do_refresh = 0;
 
 	DEBUGLOG("do_lock_lv: resource '%s', cmd = %s, flags = %s, critical_section = %d\n",
 		 resource, decode_locking_cmd(command), decode_flags(lock_flags), critical_section());
 
-	if (!cmd->initialized.config || config_files_changed(cmd)) {
+again:
+	if (!cmd->initialized.config || config_files_changed(cmd)
+	    || do_refresh) {
 		/* Reinitialise various settings inc. logging, filters */
 		if (do_refresh_cache()) {
 			log_error("Updated config file invalid. Aborting.");
@@ -579,6 +582,12 @@ int do_lock_lv(unsigned char command, unsigned char lock_flags, char *resource)
 	init_test(0);
 	pthread_mutex_unlock(&lvm_lock);
 
+	/* Try again in case device cache is stale */
+	if (status == EIO && !do_refresh) {
+		do_refresh = 1;
+		goto again;
+	}
+
 	DEBUGLOG("Command return is %d, critical_section is %d\n", status, critical_section());
 	return status;
 }
-- 
2.10.2
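
For readers who want to see the control flow of the patch in isolation, here is a minimal standalone sketch of the "refresh once on EIO, then retry" idea. It is not clvmd code: struct device_cache, refresh_device_cache(), try_activate() and lock_with_retry() are hypothetical names invented for illustration, and the sketch uses a loop where the patch uses a goto label.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for clvmd's device cache: it only records
 * whether the device backing the LV is known to the cache. */
struct device_cache {
	bool has_device;    /* true once the cache knows the device */
	int refresh_count;  /* how many times the cache was rebuilt */
};

/* Hypothetical refresh: a rescan picks up the device only if it is
 * actually present (the role do_refresh_cache() plays in the patch). */
static void refresh_device_cache(struct device_cache *cache, bool device_present)
{
	cache->has_device = device_present;
	cache->refresh_count++;
}

/* Hypothetical activation: fails with EIO while the cache is stale. */
static int try_activate(const struct device_cache *cache)
{
	return cache->has_device ? 0 : EIO;
}

/* Retry at most once, and only after an EIO failure. */
int lock_with_retry(struct device_cache *cache, bool device_present)
{
	bool refreshed = false;
	int status;

	assert(cache != NULL);

	for (;;) {
		status = try_activate(cache);
		if (status != EIO || refreshed)
			return status;
		refresh_device_cache(cache, device_present);
		refreshed = true;
	}
}
```

The refreshed flag plays the role of do_refresh in the patch: it bounds the retry to a single extra attempt, so a device that is genuinely missing still returns EIO instead of looping, and the success path never pays for a refresh.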