[lustre-discuss] Lustre/ZFS snapshots mount error
Hi all,

We have two filesystems, fsA & fsB (eadc below). Both get snapshots taken daily, rotated over a week. It’s a beautiful feature we’ve been using in production ever since it was introduced with 2.10.

-) We’ve got Lustre/ZFS 2.10.4 on CentOS 7.5.
-) Both fsA & fsB have changelogs active.
-) fsA has combined mgt/mdt on a single ZFS filesystem.
-) fsB has a single mdt on a single ZFS filesystem.
-) for fsA, I have no issues mounting any of the snapshots via lctl.
-) for fsB, I can mount the three most recent snapshots, then encounter errors:

[root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Mon
mounted the snapshot eadc_AutoSS-Mon with fsname 3d40bbc
[root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Mon
[root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Sun
mounted the snapshot eadc_AutoSS-Sun with fsname 584c07a
[root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Sun
[root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Sat
mounted the snapshot eadc_AutoSS-Sat with fsname 4e646fe
[root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Sat
[root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Fri
mount.lustre: mount metadata/meta-eadc@eadc_AutoSS-Fri at /mnt/eadc_AutoSS-Fri_MDT failed: Read-only file system
Can't mount the snapshot eadc_AutoSS-Fri: Read-only file system

The relevant bits from dmesg are

[1353434.417762] Lustre: 3d40bbc-MDT: set dev_rdonly on this device
[1353434.417765] Lustre: Skipped 3 previous similar messages
[1353434.649480] Lustre: 3d40bbc-MDT: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[1353434.649484] Lustre: Skipped 3 previous similar messages
[1353434.866228] Lustre: 3d40bbc-MDD: changelog on
[1353434.866233] Lustre: Skipped 1 previous similar message
[1353435.427744] Lustre: 3d40bbc-MDT: Connection restored to ...@tcp (at ...@tcp)
[1353435.427747] Lustre: Skipped 23 previous similar messages
[1353445.255899] Lustre: Failing over 3d40bbc-MDT
[1353445.255903] Lustre: Skipped 3 previous similar messages
[1353445.256150] LustreError: 11-0: 3d40bbc-OST-osc-MDT: operation ost_disconnect to node ...@tcp failed: rc = -107
[1353445.257896] LustreError: Skipped 23 previous similar messages
[1353445.353874] Lustre: server umount 3d40bbc-MDT complete
[1353445.353877] Lustre: Skipped 3 previous similar messages
[1353475.302224] Lustre: 4e646fe-MDD: changelog on
[1353475.302228] Lustre: Skipped 1 previous similar message
[1353498.964016] LustreError: 25582:0:(osd_handler.c:341:osd_trans_create()) 36ca26b-MDT-osd: someone try to start transaction under readonly mode, should be disabled.
[1353498.967260] LustreError: 25582:0:(osd_handler.c:341:osd_trans_create()) Skipped 1 previous similar message
[1353498.968829] CPU: 6 PID: 25582 Comm: mount.lustre Kdump: loaded Tainted: P OE 3.10.0-862.6.3.el7.x86_64 #1
[1353498.968830] Hardware name: Supermicro SYS-6027TR-D71FRF/X9DRT, BIOS 3.2a 08/04/2015
[1353498.968832] Call Trace:
[1353498.968841] [] dump_stack+0x19/0x1b
[1353498.968851] [] osd_trans_create+0x38b/0x3d0 [osd_zfs]
[1353498.968876] [] llog_destroy+0x1f4/0x3f0 [obdclass]
[1353498.968887] [] llog_cat_reverse_process_cb+0x246/0x3f0 [obdclass]
[1353498.968897] [] llog_reverse_process+0x38c/0xaa0 [obdclass]
[1353498.968910] [] ? llog_cat_process_cb+0x4e0/0x4e0 [obdclass]
[1353498.968922] [] llog_cat_reverse_process+0x179/0x270 [obdclass]
[1353498.968932] [] ? llog_init_handle+0xd5/0x9a0 [obdclass]
[1353498.968943] [] ? llog_open_create+0x78/0x320 [obdclass]
[1353498.968949] [] ? mdd_root_get+0xf0/0xf0 [mdd]
[1353498.968954] [] mdd_prepare+0x13ff/0x1c70 [mdd]
[1353498.968966] [] mdt_prepare+0x57/0x3b0 [mdt]
[1353498.968983] [] server_start_targets+0x234d/0x2bd0 [obdclass]
[1353498.968999] [] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
[1353498.969012] [] server_fill_super+0x109d/0x185a [obdclass]
[1353498.969025] [] lustre_fill_super+0x328/0x950 [obdclass]
[1353498.969038] [] ? lustre_common_put_super+0x270/0x270 [obdclass]
[1353498.969041] [] mount_nodev+0x4f/0xb0
[1353498.969053] [] lustre_mount+0x38/0x60 [obdclass]
[1353498.969055] [] mount_fs+0x3e/0x1b0
[1353498.969060] [] vfs_kern_mount+0x67/0x110
[1353498.969062] [] do_mount+0x1ef/0xce0
[1353498.969066] [] ? kmem_cache_alloc_trace+0x3c/0x200
[1353498.969069] [] SyS_mount+0x83/0xd0
[1353498.969074] [] system_call_fastpath+0x1c/0x21
[1353498.969079] LustreError: 25582:0:(llog_cat.c:1027:llog_cat_reverse_process_cb()) 36ca26b-MDD: fail to destroy empty log: rc = -30
[1353498.970785] CPU: 6 PID: 25582 Comm: mount.lustre Kdump: loaded Tainted: P OE 3.10.0-862.6.3.el7.x86_64 #1
[1353498.970786] Hardware name: Supermicro SYS-6027TR-D71FRF/X9DRT, BIOS 3.2a 08/04/2015
[1353498.970787] Call Trace:
[1353498.970790] [] dump_stack+0x19/0x1b
[
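The manual mount/umount sequence in the post above can be scripted to see which snapshots fail. What follows is only a minimal sketch, not a tested tool: it assumes `lctl` is on the PATH and that `lctl snapshot_mount` exits non-zero on failure (the thread does not show exit codes). The helper names `run_lctl` and `probe_snapshots` are made up for illustration; only the `snapshot_mount` / `snapshot_umount` invocations with `-F` and `-n` come from the session above.

```python
import subprocess
from typing import Callable, Dict, Iterable, List, Tuple

# A runner takes the lctl arguments and returns (exit status, output text).
Runner = Callable[[List[str]], Tuple[int, str]]


def run_lctl(args: List[str]) -> Tuple[int, str]:
    """Run `lctl <args>` and return its exit status and combined output."""
    proc = subprocess.run(
        ["lctl"] + list(args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout.strip()


def probe_snapshots(fsname: str, names: Iterable[str],
                    run: Runner = run_lctl) -> Dict[str, str]:
    """Try to mount each snapshot, unmounting again on success.

    Returns a dict mapping snapshot name to "ok" or to the error text.
    ASSUMPTION: a non-zero exit status from snapshot_mount means failure.
    """
    results: Dict[str, str] = {}
    for name in names:
        rc, out = run(["snapshot_mount", "-F", fsname, "-n", name])
        if rc == 0:
            # Mounted fine; release it again, as in the manual session.
            run(["snapshot_umount", "-F", fsname, "-n", name])
            results[name] = "ok"
        else:
            results[name] = out or "failed (rc=%d)" % rc
    return results
```

On the MDS this could be called as `probe_snapshots("eadc", ["eadc_AutoSS-Mon", "eadc_AutoSS-Sun", "eadc_AutoSS-Sat", "eadc_AutoSS-Fri"])`. The `run` parameter exists so the loop can be exercised with a stand-in runner on a machine without Lustre.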
Re: [lustre-discuss] Lustre/ZFS snapshots mount error
It's probably best to file an LU ticket for this issue.

It looks like something in the log processing at mount time is trying to modify the configuration files. I'm not sure whether that should be allowed or not.

Does fsB have the same MGS as fsA? Does it have the same MDS node as fsA? If it has a different MDS, you might consider giving it its own MGS as well. That doesn't have to be a separate MGS node, just a separate filesystem (ZFS fileset in the same zpool) on the MDS node.

Cheers, Andreas

> On Aug 27, 2018, at 10:18, Kirk, Benjamin (JSC-EG311) wrote:
>
> [...]
Re: [lustre-discuss] Lustre/ZFS snapshots mount error
According to the stack trace, something was trying to clean up old empty llogs while the snapshot was being mounted. We do NOT allow any modification while mounting a snapshot; otherwise, it would trigger a BUG() in the ZFS backend. That is why we added an LASSERT() at the start of a transaction. One possible solution is to add a check in the llog logic to avoid modifying llogs in snapshot mode.

--
Cheers,
Nasf

-----Original Message-----
From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Andreas Dilger
Sent: Tuesday, August 28, 2018 5:57 AM
To: Kirk, Benjamin (JSC-EG311)
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error

[...]
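The check proposed in the reply above can be pictured with a small toy model. This is not Lustre code and not the eventual patch: the class `Device` and the function `cleanup_empty_logs` are hypothetical stand-ins for the roles that `osd_trans_create` and `llog_destroy` play in the stack trace. The only grounded detail is the error number: the `rc = -30` in the dmesg output is `-EROFS` ("Read-only file system") on Linux.

```python
import errno
from typing import Dict, List


class Device:
    """Toy backing device holding named logs (hypothetical, not a Lustre type)."""

    def __init__(self, read_only: bool) -> None:
        self.read_only = read_only
        self.logs: Dict[str, List[str]] = {}  # log name -> records

    def start_transaction(self) -> None:
        # Mirrors the behaviour described in the thread: no transaction
        # may be started on a read-only (snapshot) device.
        if self.read_only:
            raise OSError(errno.EROFS, "transaction started under read-only mode")

    def destroy_log(self, name: str) -> None:
        self.start_transaction()
        del self.logs[name]


def cleanup_empty_logs(dev: Device) -> List[str]:
    """Destroy empty logs and return their names.

    The guard at the top is the "skip" logic: on a read-only device
    nothing is modified, so no transaction is ever attempted.
    """
    if dev.read_only:
        return []
    destroyed = []
    for name in [n for n, records in dev.logs.items() if not records]:
        dev.destroy_log(name)
        destroyed.append(name)
    return destroyed
```

Without the guard, the first empty log found on a snapshot would reach `destroy_log` and fail with `EROFS`, which is the shape of the failure in the trace. Whether a snapshot contains an empty log depends on its state when it was taken, which may be why only some snapshots fail to mount.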
Re: [lustre-discuss] Lustre/ZFS snapshots mount error
The MDS situation is very basic: active/passive mds0/mds1 for both fsA & fsB. fsA has the combined mgs/mdt in a single zfs filesystem, and fsB has its own mdt in a separate zfs filesystem. mds0 is primary for all.

fsA & fsB DO both have changelogs enabled to feed robinhood databases.

What’s the recommended procedure we should follow before mounting the snapshots?

1) disable changelogs on the active mdt’s (this will compromise robinhood, requiring a rescan…), or
2) temporarily halt changelog consumption / cleanup (e.g. stop robinhood in our case) and then mount the snapshot?

Thanks for the help!

--
Benjamin S. Kirk, Ph.D.
NASA Lyndon B. Johnson Space Center
Acting Chief, Aeroscience & Flight Mechanics Division

On Aug 27, 2018, at 7:33 PM, Yong, Fan <fan.y...@intel.com> wrote:

[...]
Re: [lustre-discuss] Lustre/ZFS snapshots mount error
I would say that it was not the order of your operations that caused the trouble. Instead, it is related to the snapshot mount logic. As mentioned in my earlier reply, we need a patch for the llog logic to avoid modifying llogs in snapshot mode.

--
Cheers,
Nasf

From: Kirk, Benjamin (JSC-EG311) [mailto:benjamin.k...@nasa.gov]
Sent: Tuesday, August 28, 2018 7:53 PM
To: lustre-discuss@lists.lustre.org
Cc: Andreas Dilger; Yong, Fan
Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error

[...]
Re: [lustre-discuss] Lustre/ZFS snapshots mount error
Dear All,

we have a similar setup with Lustre on ZFS and we make regular use of snapshots for backups (backups to tape use snapshots as the source). We would like to use robinhood in the future, and the question now is how to do it.

Would it be a workaround to disable the robinhood daemon temporarily during the mount process?
Does the problem only occur when changelogs are consumed during the process of mounting a snapshot? Or is it also a problem when changelogs are consumed while the snapshot remains mounted (which for us is typically several hours)?
Is there already an LU ticket about this issue?

Thanks!
Robert

--
Dr. Robert Redl
Scientific Programmer, "Waves to Weather" (SFB/TRR165)
Meteorologisches Institut
Ludwig-Maximilians-Universität München
Theresienstr. 37, 80333 München, Germany

On 03.09.2018 at 08:16, Yong, Fan wrote:
> [...]
Re: [lustre-discuss] Lustre/ZFS snapshots mount error
It is suspected that there were some llogs still to be handled when the snapshot was taken. Then, when such a snapshot is mounted, some conditions trigger the llog cleanup/modification automatically. So it is not related to your actions when mounting the snapshot. Since we cannot control the system state when the snapshot is taken, we have to skip llog-related cleanup/modification against the snapshot when mounting it. Such “skip” logic is just what we need.

Cheers,
Nasf

From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Robert Redl
Sent: Saturday, September 8, 2018 9:04 PM
To: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error

[...]
Re: [lustre-discuss] Lustre/ZFS snapshots mount error
Thanks for the fast reply! If I understood correctly, it is currently not possible to use the changelog feature together with the snapshot feature, right? Is there already a LU-Ticket about that?

Cheers,
Robert

On 09/10/2018 02:57 PM, Yong, Fan wrote:
> It is suspected that there were some llog to be handled when the
> snapshot was making Then when mount-up such snapshot, some conditions
> trigger the llog cleanup/modification automatically. So it is not
> related with your actions when mount the snapshot. Since we cannot
> control the system status when making the snapshot, then we have to
> skip llog related cleanup/modification against the snapshot when mount
> the snapshot. Such “skip” related logic is just what we need.
>
> Cheers,
> Nasf
Re: [lustre-discuss] Lustre/ZFS snapshots mount error
Changelog is just one of the users of llog; there are many others. That means that even without changelogs it is still possible to hit this problem. Running robinhood while a snapshot is being taken may make the race more likely, but disabling robinhood does not resolve the issue. The final solution should be an enhancement of the snapshot-mount logic. I did not find a related LU ticket for this issue.

--
Cheers,
Nasf

From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Robert Redl
Sent: Tuesday, September 11, 2018 6:54 PM
To: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error

Thanks for the fast reply! If I understood correctly, it is currently not possible to use the changelog feature together with the snapshot feature, right? Is there already an LU ticket about that?

Cheers,
Robert
Re: [lustre-discuss] Lustre/ZFS snapshots mount error
I just opened an LU on the issue https://jira.whamcloud.com/browse/LU-11411 for anyone interested. Thanks a lot!

-Ben

On Aug 27, 2018, at 4:56 PM, Andreas Dilger <adil...@whamcloud.com> wrote:

It's probably best to file an LU ticket for this issue.

It looks like there is something with the log processing at mount that is trying to modify the configuration files. I'm not sure whether that should be allowed or not.

Does fsB have the same MGS as fsA? Does it have the same MDS node as fsA? If it has a different MDS, you might consider giving it its own MGS as well. That doesn't have to be a separate MGS node, just a separate filesystem (ZFS fileset in the same zpool) on the MDS node.

Cheers, Andreas

On Aug 27, 2018, at 10:18, Kirk, Benjamin (JSC-EG311) <benjamin.k...@nasa.gov> wrote:

Hi all,

We have two filesystems, fsA & fsB (eadc below). Both of which get snapshots taken daily, rotated over a week. It’s a beautiful feature we’ve been using in production ever since it was introduced with 2.10.

-) We’ve got Lustre/ZFS 2.10.4 on CentOS 7.5.
-) Both fsA & fsB have changelogs active.
-) fsA has combined mgt/mdt on a single ZFS filesystem.
-) fsB has a single mdt on a single ZFS filesystem.
-) for fsA, I have no issues mounting any of the snapshots via lctl.
-) for fsB, I can mount the three most recent snapshots, then encounter errors:

[root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Mon
mounted the snapshot eadc_AutoSS-Mon with fsname 3d40bbc
[root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Mon
[root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Sun
mounted the snapshot eadc_AutoSS-Sun with fsname 584c07a
[root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Sun
[root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Sat
mounted the snapshot eadc_AutoSS-Sat with fsname 4e646fe
[root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Sat
[root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Fri
mount.lustre: mount metadata/meta-eadc@eadc_AutoSS-Fri at /mnt/eadc_AutoSS-Fri_MDT failed: Read-only file system
Can't mount the snapshot eadc_AutoSS-Fri: Read-only file system

The relevant bits from dmesg are

[1353434.417762] Lustre: 3d40bbc-MDT: set dev_rdonly on this device
[1353434.417765] Lustre: Skipped 3 previous similar messages
[1353434.649480] Lustre: 3d40bbc-MDT: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[1353434.649484] Lustre: Skipped 3 previous similar messages
[1353434.866228] Lustre: 3d40bbc-MDD: changelog on
[1353434.866233] Lustre: Skipped 1 previous similar message
[1353435.427744] Lustre: 3d40bbc-MDT: Connection restored to ...@tcp (at ...@tcp)
[1353435.427747] Lustre: Skipped 23 previous similar messages
[1353445.255899] Lustre: Failing over 3d40bbc-MDT
[1353445.255903] Lustre: Skipped 3 previous similar messages
[1353445.256150] LustreError: 11-0: 3d40bbc-OST-osc-MDT: operation ost_disconnect to node ...@tcp failed: rc = -107
[1353445.257896] LustreError: Skipped 23 previous similar messages
[1353445.353874] Lustre: server umount 3d40bbc-MDT complete
[1353445.353877] Lustre: Skipped 3 previous similar messages
[1353475.302224] Lustre: 4e646fe-MDD: changelog on
[1353475.302228] Lustre: Skipped 1 previous similar message
[1353498.964016] LustreError: 25582:0:(osd_handler.c:341:osd_trans_create()) 36ca26b-MDT-osd: someone try to start transaction under readonly mode, should be disabled.
[1353498.967260] LustreError: 25582:0:(osd_handler.c:341:osd_trans_create()) Skipped 1 previous similar message
[1353498.968829] CPU: 6 PID: 25582 Comm: mount.lustre Kdump: loaded Tainted: P OE 3.10.0-862.6.3.el7.x86_64 #1
[1353498.968830] Hardware name: Supermicro SYS-6027TR-D71FRF/X9DRT, BIOS 3.2a 08/04/2015
[1353498.968832] Call Trace:
[1353498.968841] [] dump_stack+0x19/0x1b
[1353498.968851] [] osd_trans_create+0x38b/0x3d0 [osd_zfs]
[1353498.968876] [] llog_destroy+0x1f4/0x3f0 [obdclass]
[1353498.968887] [] llog_cat_reverse_process_cb+0x246/0x3f0 [obdclass]
[1353498.968897] [] llog_reverse_process+0x38c/0xaa0 [obdclass]
[1353498.968910] [] ? llog_cat_process_cb+0x4e0/0x4e0 [obdclass]
[1353498.968922] [] llog_cat_reverse_process+0x179/0x270 [obdclass]
[1353498.968932] [] ? llog_init_handle+0xd5/0x9a0 [obdclass]
[1353498.968943] [] ? llog_open_create+0x78/0x320 [obdclass]
[1353498.968949] [] ? mdd_root_get+0xf0/0xf0 [mdd]
[1353498.968954] [] mdd_prepare+0x13ff/0x1c70 [mdd]
[1353498.968966] [] mdt_prepare+0x57/0x3b0 [mdt]
[1353498.968983] [] server_start_targets+0x234d/0x2bd0 [obdclass]
[1353498.968999] [] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
[1353498.969012] [] server_fill_super+0x109d/0x185a [obdclass]
[1353498.969025] [] lustre_fill_super+0x328/0x950 [obdclass]
[1353498.969038] [] ? lustre_common_put_super+0x270/0x270 [obdc
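For anyone who wants to check whether a failed snapshot mount hit the same path, the signature in the dmesg excerpt above can be searched for mechanically. This is only a minimal sketch: the function names `has_ro_trans_error` and `llog_frames` are made up for illustration, the matched strings are copied from the log lines quoted in this thread, and other Lustre versions may word the message differently.

```shell
# Hypothetical helper: succeed if the kernel log text on stdin contains the
# "start transaction under readonly mode" error quoted in this thread.
# grep reads all of stdin (no -q), so an upstream `dmesg` is never cut short.
has_ro_trans_error() {
    grep -F 'start transaction under readonly mode' >/dev/null
}

# Hypothetical helper: print the llog_* function names found in a call trace
# on stdin, one per line, in the order they appear.
llog_frames() {
    grep -o 'llog_[a-z_]*+0x' | sed 's/+0x$//'
}
```

A possible use on the MDS right after a failed mount would be `dmesg | has_ro_trans_error && dmesg | llog_frames`; seeing `llog_destroy` in the output would match the trace reported here.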
Re: [lustre-discuss] Lustre/ZFS snapshots mount error
To follow up here, the LU appears to be a duplicate of a DNE-triggered issue as well. There is a patch available which resolved the issue for us when using 2.10.5 in our environment. For details and a link to the patch see https://jira.whamcloud.com/browse/LU-11411

-Ben

On Sep 20, 2018, at 1:53 PM, Kirk, Benjamin (JSC-EG311) <benjamin.k...@nasa.gov> wrote:

I just opened an LU on the issue https://jira.whamcloud.com/browse/LU-11411 for anyone interested. Thanks a lot!

-Ben
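To confirm that every snapshot in a rotation mounts cleanly (for example after applying a fix), the manual mount/umount sequence from the original report can be wrapped in a loop. The following is a rough sketch rather than a tested tool: `check_snapshot` and `check_snapshots` are hypothetical names, only the two `lctl` invocations shown in this thread are used, and it assumes `lctl snapshot_mount` exits non-zero when the mount fails.

```shell
# Hypothetical helper: mount one snapshot, unmount it again, and print
# "OK <name>" or "FAIL <name>". Assumes lctl returns non-zero on failure.
check_snapshot() {
    cs_fsname=$1
    cs_ssname=$2
    if lctl snapshot_mount -F "$cs_fsname" -n "$cs_ssname" >/dev/null 2>&1; then
        lctl snapshot_umount -F "$cs_fsname" -n "$cs_ssname" >/dev/null 2>&1
        echo "OK $cs_ssname"
    else
        echo "FAIL $cs_ssname"
        return 1
    fi
}

# Hypothetical helper: first argument is the filesystem name, the rest are
# snapshot names. Returns non-zero if any snapshot failed to mount.
check_snapshots() {
    all_fsname=$1
    shift
    all_rc=0
    for all_ssname in "$@"; do
        check_snapshot "$all_fsname" "$all_ssname" || all_rc=1
    done
    return "$all_rc"
}
```

With the names from this thread, a run would look like `check_snapshots eadc eadc_AutoSS-Mon eadc_AutoSS-Sun eadc_AutoSS-Sat eadc_AutoSS-Fri`, issued as root on the same node where the `lctl snapshot_mount` commands above were run.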