Re: [lustre-discuss] Lustre/ZFS snapshots mount error

2018-08-27 Thread Yong, Fan
According to the stack trace, something was trying to clean up old empty llogs 
while mounting the snapshot. We do NOT allow any modification while a snapshot 
is being mounted; otherwise, it would trigger a BUG() in the ZFS backend. That 
is why we added an LASSERT() where the transaction is started. One possible 
solution is to add a check in the llog logic so that llogs are not modified in 
snapshot mode.
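
As a rough illustration of that check, here is a toy model in Python. It is 
only a sketch: `ReadOnlyDevice` and `cleanup_empty_llogs` are hypothetical 
names, not Lustre symbols, and the llog layout (name mapped to record count) is 
assumed. The point is that the cleanup returns early on a read-only device 
instead of starting a transaction that will be refused with -EROFS (rc = -30).

```python
import errno


class ReadOnlyDevice:
    """Hypothetical stand-in for a backend device; not a real Lustre structure."""

    def __init__(self, read_only):
        self.read_only = read_only
        self.destroyed = []

    def start_transaction(self):
        # A transaction on a read-only (snapshot) device is refused.
        if self.read_only:
            return -errno.EROFS
        return 0

    def destroy(self, name):
        rc = self.start_transaction()
        if rc == 0:
            self.destroyed.append(name)
        return rc


def cleanup_empty_llogs(dev, llogs):
    """Destroy empty llogs, but skip all modification in snapshot mode.

    `llogs` maps a log name to its record count (assumed layout).
    Returns the names that were actually destroyed.
    """
    if dev.read_only:
        # The proposed check: never try to modify llogs on a snapshot.
        return []
    removed = []
    for name, records in llogs.items():
        if records == 0 and dev.destroy(name) == 0:
            removed.append(name)
    return removed


if __name__ == "__main__":
    logs = {"llog-a": 0, "llog-b": 5}
    print(cleanup_empty_llogs(ReadOnlyDevice(read_only=True), logs))
    print(cleanup_empty_llogs(ReadOnlyDevice(read_only=False), logs))
```

Where exactly such a check belongs in the real llog code would need to be 
decided in the LU ticket.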


--
Cheers,
Nasf

-----Original Message-----
From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf 
Of Andreas Dilger
Sent: Tuesday, August 28, 2018 5:57 AM
To: Kirk, Benjamin (JSC-EG311) 
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error

It's probably best to file an LU ticket for this issue.

It looks like there is something with the log processing at mount that is 
trying to modify the configuration files.  I'm not sure whether that should be 
allowed or not.

Does fsB have the same MGS as fsA?  Does it have the same MDS node as fsA?
If it has a different MDS, you might consider giving it its own MGS as well.
That doesn't have to be a separate MGS node, just a separate filesystem (ZFS 
fileset in the same zpool) on the MDS node.

Cheers, Andreas

> On Aug 27, 2018, at 10:18, Kirk, Benjamin (JSC-EG311) 
>  wrote:
> 
> Hi all,
> 
> We have two filesystems, fsA & fsB (eadc below).  Both get snapshots taken 
> daily, rotated over a week.  It’s a beautiful feature we’ve been using 
> in production ever since it was introduced with 2.10.
> 
> -) We’ve got Lustre/ZFS 2.10.4 on CentOS 7.5.
> -) Both fsA & fsB have changelogs active.
> -) fsA has combined mgt/mdt on a single ZFS filesystem.
> -) fsB has a single mdt on a single ZFS filesystem.
> -) for fsA, I have no issues mounting any of the snapshots via lctl.
> -) for fsB, I can mount the three most recent snapshots, then encounter 
> errors:
> 
> [root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Mon
> mounted the snapshot eadc_AutoSS-Mon with fsname 3d40bbc
> [root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Mon
> [root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Sun
> mounted the snapshot eadc_AutoSS-Sun with fsname 584c07a
> [root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Sun
> [root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Sat
> mounted the snapshot eadc_AutoSS-Sat with fsname 4e646fe
> [root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Sat
> [root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Fri
> mount.lustre: mount metadata/meta-eadc@eadc_AutoSS-Fri at 
> /mnt/eadc_AutoSS-Fri_MDT failed: Read-only file system
> Can't mount the snapshot eadc_AutoSS-Fri: Read-only file system
> 
> The relevant bits from dmesg are
> [1353434.417762] Lustre: 3d40bbc-MDT: set dev_rdonly on this device
> [1353434.417765] Lustre: Skipped 3 previous similar messages
> [1353434.649480] Lustre: 3d40bbc-MDT: Imperative Recovery enabled, 
> recovery window shrunk from 300-900 down to 150-900
> [1353434.649484] Lustre: Skipped 3 previous similar messages
> [1353434.866228] Lustre: 3d40bbc-MDD: changelog on
> [1353434.866233] Lustre: Skipped 1 previous similar message
> [1353435.427744] Lustre: 3d40bbc-MDT: Connection restored to ...@tcp (at 
> ...@tcp)
> [1353435.427747] Lustre: Skipped 23 previous similar messages
> [1353445.255899] Lustre: Failing over 3d40bbc-MDT
> [1353445.255903] Lustre: Skipped 3 previous similar messages
> [1353445.256150] LustreError: 11-0: 3d40bbc-OST-osc-MDT: operation 
> ost_disconnect to node ...@tcp failed: rc = -107
> [1353445.257896] LustreError: Skipped 23 previous similar messages
> [1353445.353874] Lustre: server umount 3d40bbc-MDT complete
> [1353445.353877] Lustre: Skipped 3 previous similar messages
> [1353475.302224] Lustre: 4e646fe-MDD: changelog on
> [1353475.302228] Lustre: Skipped 1 previous similar message
> [1353498.964016] LustreError: 25582:0:(osd_handler.c:341:osd_trans_create()) 
> 36ca26b-MDT-osd: someone try to start transaction under readonly mode, 
> should be disabled.
> [1353498.967260] LustreError: 25582:0:(osd_handler.c:341:osd_trans_create()) 
> Skipped 1 previous similar message
> [1353498.968829] CPU: 6 PID: 25582 Comm: mount.lustre Kdump: loaded Tainted: 
> P   OE     3.10.0-862.6.3.el7.x86_64 #1
> [1353498.968830] Hardware name: Supermicro SYS-6027TR-D71FRF/X9DRT, BIOS 3.2a 
> 08/04/2015
> [1353498.968832] Call Trace:
> [1353498.968841]  [] dump_stack+0x19/0x1b
> [1353498.968851]  [] osd_trans_create+0x38b/0x3d0 [osd_zfs]
> [1353498.968876]  [] llog_destroy+0x1f4/0x3f0 [obdclass]
> [1353498.968887]  [] llog_cat_reverse_process_cb+0x246/0x3f0 [obdclass]
> [1353498.968897]  [] llog_reverse_process+0x38c/0xaa0 [obdclass]
> [1353498.968910]  [] ? llog_cat_process_cb+0x4e0/0x4e0 [obdclass]
> [1353498.968922]  []
> 

Re: [lustre-discuss] Lustre/ZFS snapshots mount error

2018-08-27 Thread Andreas Dilger
It's probably best to file an LU ticket for this issue.

It looks like there is something with the log processing at mount that is 
trying to modify the configuration files.  I'm not sure whether that should be 
allowed or not.
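
To see which path is doing the modification, the call trace in the report can 
be reduced to its function chain. A small Python sketch follows; the regular 
expression assumes the dmesg layout quoted in this thread (addresses stripped 
to `[]`, one frame per line), and `call_chain` is just an illustrative helper 
name. Frames marked with `?` are treated as unreliable and skipped.

```python
import re

# Matches e.g. "[1353498.968876]  [] llog_destroy+0x1f4/0x3f0 [obdclass]".
# An optional "? " before the symbol marks an unreliable frame.
FRAME = re.compile(
    r"\[\s*\d+\.\d+\]\s+\[[^\]]*\]\s+(\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def call_chain(dmesg_text):
    """Return the reliable frames, innermost first, as function names."""
    chain = []
    for match in FRAME.finditer(dmesg_text):
        unreliable, func = match.group(1), match.group(2)
        if not unreliable:
            chain.append(func)
    return chain


# A few frames copied from the trace in the original report.
SAMPLE = """\
[1353498.968841]  [] dump_stack+0x19/0x1b
[1353498.968851]  [] osd_trans_create+0x38b/0x3d0 [osd_zfs]
[1353498.968876]  [] llog_destroy+0x1f4/0x3f0 [obdclass]
[1353498.968910]  [] ? llog_cat_process_cb+0x4e0/0x4e0 [obdclass]
[1353498.968954]  [] mdd_prepare+0x13ff/0x1c70 [mdd]
"""

if __name__ == "__main__":
    print(" <- ".join(call_chain(SAMPLE)))
```

On the sample frames this shows llog_destroy being reached from mdd_prepare and 
then calling osd_trans_create, which is where the read-only error is reported.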

Does fsB have the same MGS as fsA?  Does it have the same MDS node as fsA?
If it has a different MDS, you might consider giving it its own MGS as well.
That doesn't have to be a separate MGS node, just a separate filesystem (ZFS 
fileset in the same zpool) on the MDS node.

Cheers, Andreas

> On Aug 27, 2018, at 10:18, Kirk, Benjamin (JSC-EG311) 
>  wrote:
> 
> Hi all,
> 
> We have two filesystems, fsA & fsB (eadc below).  Both get snapshots taken 
> daily, rotated over a week.  It’s a beautiful feature we’ve been using 
> in production ever since it was introduced with 2.10.
> 
> -) We’ve got Lustre/ZFS 2.10.4 on CentOS 7.5.
> -) Both fsA & fsB have changelogs active.
> -) fsA has combined mgt/mdt on a single ZFS filesystem.
> -) fsB has a single mdt on a single ZFS filesystem.
> -) for fsA, I have no issues mounting any of the snapshots via lctl.
> -) for fsB, I can mount the three most recent snapshots, then encounter 
> errors:
> 
> [root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Mon
> mounted the snapshot eadc_AutoSS-Mon with fsname 3d40bbc
> [root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Mon
> [root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Sun
> mounted the snapshot eadc_AutoSS-Sun with fsname 584c07a
> [root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Sun
> [root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Sat
> mounted the snapshot eadc_AutoSS-Sat with fsname 4e646fe
> [root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Sat
> [root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Fri
> mount.lustre: mount metadata/meta-eadc@eadc_AutoSS-Fri at 
> /mnt/eadc_AutoSS-Fri_MDT failed: Read-only file system
> Can't mount the snapshot eadc_AutoSS-Fri: Read-only file system
> 
> The relevant bits from dmesg are
> [1353434.417762] Lustre: 3d40bbc-MDT: set dev_rdonly on this device
> [1353434.417765] Lustre: Skipped 3 previous similar messages
> [1353434.649480] Lustre: 3d40bbc-MDT: Imperative Recovery enabled, 
> recovery window shrunk from 300-900 down to 150-900
> [1353434.649484] Lustre: Skipped 3 previous similar messages
> [1353434.866228] Lustre: 3d40bbc-MDD: changelog on
> [1353434.866233] Lustre: Skipped 1 previous similar message
> [1353435.427744] Lustre: 3d40bbc-MDT: Connection restored to ...@tcp (at 
> ...@tcp)
> [1353435.427747] Lustre: Skipped 23 previous similar messages
> [1353445.255899] Lustre: Failing over 3d40bbc-MDT
> [1353445.255903] Lustre: Skipped 3 previous similar messages
> [1353445.256150] LustreError: 11-0: 3d40bbc-OST-osc-MDT: operation 
> ost_disconnect to node ...@tcp failed: rc = -107
> [1353445.257896] LustreError: Skipped 23 previous similar messages
> [1353445.353874] Lustre: server umount 3d40bbc-MDT complete
> [1353445.353877] Lustre: Skipped 3 previous similar messages
> [1353475.302224] Lustre: 4e646fe-MDD: changelog on
> [1353475.302228] Lustre: Skipped 1 previous similar message
> [1353498.964016] LustreError: 25582:0:(osd_handler.c:341:osd_trans_create()) 
> 36ca26b-MDT-osd: someone try to start transaction under readonly mode, 
> should be disabled.
> [1353498.967260] LustreError: 25582:0:(osd_handler.c:341:osd_trans_create()) 
> Skipped 1 previous similar message
> [1353498.968829] CPU: 6 PID: 25582 Comm: mount.lustre Kdump: loaded Tainted: 
> P   OE     3.10.0-862.6.3.el7.x86_64 #1
> [1353498.968830] Hardware name: Supermicro SYS-6027TR-D71FRF/X9DRT, BIOS 3.2a 
> 08/04/2015
> [1353498.968832] Call Trace:
> [1353498.968841]  [] dump_stack+0x19/0x1b
> [1353498.968851]  [] osd_trans_create+0x38b/0x3d0 [osd_zfs]
> [1353498.968876]  [] llog_destroy+0x1f4/0x3f0 [obdclass]
> [1353498.968887]  [] 
> llog_cat_reverse_process_cb+0x246/0x3f0 [obdclass]
> [1353498.968897]  [] llog_reverse_process+0x38c/0xaa0 
> [obdclass]
> [1353498.968910]  [] ? llog_cat_process_cb+0x4e0/0x4e0 
> [obdclass]
> [1353498.968922]  [] llog_cat_reverse_process+0x179/0x270 
> [obdclass]
> [1353498.968932]  [] ? llog_init_handle+0xd5/0x9a0 
> [obdclass]
> [1353498.968943]  [] ? llog_open_create+0x78/0x320 
> [obdclass]
> [1353498.968949]  [] ? mdd_root_get+0xf0/0xf0 [mdd]
> [1353498.968954]  [] mdd_prepare+0x13ff/0x1c70 [mdd]
> [1353498.968966]  [] mdt_prepare+0x57/0x3b0 [mdt]
> [1353498.968983]  [] server_start_targets+0x234d/0x2bd0 
> [obdclass]
> [1353498.968999]  [] ? 
> class_config_dump_handler+0x7e0/0x7e0 [obdclass]
> [1353498.969012]  [] server_fill_super+0x109d/0x185a 
> [obdclass]
> [1353498.969025]  [] lustre_fill_super+0x328/0x950 
> [obdclass]
> [1353498.969038]  [] ? lustre_common_put_super+0x270/0x270 
> [obdclass]
> [1353498.969041]  [] mount_nodev+0x4f/0xb0
> 

[lustre-discuss] Lustre 2.10.5 released

2018-08-27 Thread Peter Jones
We are pleased to announce that the Lustre 2.10.5 Release has been declared GA 
and is available for download. You can also grab the source from git.

Along with a number of useful bug fixes, this maintenance release includes the 
following notable enhancements over 2.10.4:


· Ubuntu 16.04 packages are now correctly produced (LU-11176)

· Mellanox OFED 4.4-2 is now the default version built and tested


Details of changes since 2.10.4 can be found in the 2.10.5 change log.

Please log any issues found in the issue tracking system.

Thanks to all those who have contributed to the creation of this release.

We expect to release Lustre 2.10.6 in the coming months.

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Lustre/ZFS snapshots mount error

2018-08-27 Thread Kirk, Benjamin (JSC-EG311)
Hi all,

We have two filesystems, fsA & fsB (eadc below).  Both get snapshots taken 
daily, rotated over a week.  It’s a beautiful feature we’ve been using in 
production ever since it was introduced with 2.10.

-) We’ve got Lustre/ZFS 2.10.4 on CentOS 7.5.
-) Both fsA & fsB have changelogs active.
-) fsA has combined mgt/mdt on a single ZFS filesystem.
-) fsB has a single mdt on a single ZFS filesystem.
-) for fsA, I have no issues mounting any of the snapshots via lctl.
-) for fsB, I can mount the three most recent snapshots, then encounter errors:

[root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Mon
mounted the snapshot eadc_AutoSS-Mon with fsname 3d40bbc
[root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Mon
[root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Sun
mounted the snapshot eadc_AutoSS-Sun with fsname 584c07a
[root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Sun
[root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Sat
mounted the snapshot eadc_AutoSS-Sat with fsname 4e646fe
[root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Sat
[root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Fri
mount.lustre: mount metadata/meta-eadc@eadc_AutoSS-Fri at 
/mnt/eadc_AutoSS-Fri_MDT failed: Read-only file system
Can't mount the snapshot eadc_AutoSS-Fri: Read-only file system
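
For anyone wanting to repeat the check across a whole week of snapshots, the 
manual sequence above could be scripted roughly as follows. This is only a 
sketch: `probe_snapshots` is a made-up helper name, and it assumes that lctl 
exits non-zero when snapshot_mount fails, which I have not verified. The `run` 
parameter exists only so the function can be exercised without lctl present.

```python
import subprocess


def probe_snapshots(fsname, snapshots, run=subprocess.run):
    """Mount and unmount each snapshot; return {name: error text} for failures.

    Uses the same commands as the manual session:
      lctl snapshot_mount  -F <fsname> -n <snapshot>
      lctl snapshot_umount -F <fsname> -n <snapshot>
    """
    failures = {}
    for name in snapshots:
        mount = run(
            ["lctl", "snapshot_mount", "-F", fsname, "-n", name],
            capture_output=True,
            text=True,
        )
        if mount.returncode != 0:
            # Assumption: a failed mount is signalled by a non-zero exit code.
            failures[name] = (mount.stderr or mount.stdout).strip()
            continue
        # Only unmount what was actually mounted.
        run(
            ["lctl", "snapshot_umount", "-F", fsname, "-n", name],
            capture_output=True,
            text=True,
        )
    return failures
```

Run as root on the MDS, a call such as 
probe_snapshots("eadc", ["eadc_AutoSS-Sat", "eadc_AutoSS-Fri"]) would return a 
dictionary holding only the snapshots that failed to mount.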

The relevant bits from dmesg are
[1353434.417762] Lustre: 3d40bbc-MDT: set dev_rdonly on this device
[1353434.417765] Lustre: Skipped 3 previous similar messages
[1353434.649480] Lustre: 3d40bbc-MDT: Imperative Recovery enabled, recovery 
window shrunk from 300-900 down to 150-900
[1353434.649484] Lustre: Skipped 3 previous similar messages
[1353434.866228] Lustre: 3d40bbc-MDD: changelog on
[1353434.866233] Lustre: Skipped 1 previous similar message
[1353435.427744] Lustre: 3d40bbc-MDT: Connection restored to ...@tcp (at 
...@tcp)
[1353435.427747] Lustre: Skipped 23 previous similar messages
[1353445.255899] Lustre: Failing over 3d40bbc-MDT
[1353445.255903] Lustre: Skipped 3 previous similar messages
[1353445.256150] LustreError: 11-0: 3d40bbc-OST-osc-MDT: operation 
ost_disconnect to node ...@tcp failed: rc = -107
[1353445.257896] LustreError: Skipped 23 previous similar messages
[1353445.353874] Lustre: server umount 3d40bbc-MDT complete
[1353445.353877] Lustre: Skipped 3 previous similar messages
[1353475.302224] Lustre: 4e646fe-MDD: changelog on
[1353475.302228] Lustre: Skipped 1 previous similar message
[1353498.964016] LustreError: 25582:0:(osd_handler.c:341:osd_trans_create()) 
36ca26b-MDT-osd: someone try to start transaction under readonly mode, 
should be disabled.
[1353498.967260] LustreError: 25582:0:(osd_handler.c:341:osd_trans_create()) 
Skipped 1 previous similar message
[1353498.968829] CPU: 6 PID: 25582 Comm: mount.lustre Kdump: loaded Tainted: P  
 OE     3.10.0-862.6.3.el7.x86_64 #1
[1353498.968830] Hardware name: Supermicro SYS-6027TR-D71FRF/X9DRT, BIOS 3.2a 
08/04/2015
[1353498.968832] Call Trace:
[1353498.968841]  [] dump_stack+0x19/0x1b
[1353498.968851]  [] osd_trans_create+0x38b/0x3d0 [osd_zfs]
[1353498.968876]  [] llog_destroy+0x1f4/0x3f0 [obdclass]
[1353498.968887]  [] llog_cat_reverse_process_cb+0x246/0x3f0 
[obdclass]
[1353498.968897]  [] llog_reverse_process+0x38c/0xaa0 
[obdclass]
[1353498.968910]  [] ? llog_cat_process_cb+0x4e0/0x4e0 
[obdclass]
[1353498.968922]  [] llog_cat_reverse_process+0x179/0x270 
[obdclass]
[1353498.968932]  [] ? llog_init_handle+0xd5/0x9a0 [obdclass]
[1353498.968943]  [] ? llog_open_create+0x78/0x320 [obdclass]
[1353498.968949]  [] ? mdd_root_get+0xf0/0xf0 [mdd]
[1353498.968954]  [] mdd_prepare+0x13ff/0x1c70 [mdd]
[1353498.968966]  [] mdt_prepare+0x57/0x3b0 [mdt]
[1353498.968983]  [] server_start_targets+0x234d/0x2bd0 
[obdclass]
[1353498.968999]  [] ? class_config_dump_handler+0x7e0/0x7e0 
[obdclass]
[1353498.969012]  [] server_fill_super+0x109d/0x185a 
[obdclass]
[1353498.969025]  [] lustre_fill_super+0x328/0x950 [obdclass]
[1353498.969038]  [] ? lustre_common_put_super+0x270/0x270 
[obdclass]
[1353498.969041]  [] mount_nodev+0x4f/0xb0
[1353498.969053]  [] lustre_mount+0x38/0x60 [obdclass]
[1353498.969055]  [] mount_fs+0x3e/0x1b0
[1353498.969060]  [] vfs_kern_mount+0x67/0x110
[1353498.969062]  [] do_mount+0x1ef/0xce0
[1353498.969066]  [] ? kmem_cache_alloc_trace+0x3c/0x200
[1353498.969069]  [] SyS_mount+0x83/0xd0
[1353498.969074]  [] system_call_fastpath+0x1c/0x21
[1353498.969079] LustreError: 
25582:0:(llog_cat.c:1027:llog_cat_reverse_process_cb()) 36ca26b-MDD: fail 
to destroy empty log: rc = -30
[1353498.970785] CPU: 6 PID: 25582 Comm: mount.lustre Kdump: loaded Tainted: P  
 OE     3.10.0-862.6.3.el7.x86_64 #1
[1353498.970786] Hardware name: Supermicro SYS-6027TR-D71FRF/X9DRT, BIOS 3.2a 
08/04/2015
[1353498.970787] Call Trace:
[1353498.970790]  [] dump_stack+0x19/0x1b