Bugs should be created at jira.whamcloud.com, and you can create an account 
when you are there.

Cheers, Andreas

> On Sep 11, 2018, at 06:10, Kirk, Benjamin <benjamin.k...@nasa.gov> wrote:
> 
> I will attempt to create an LU with our specific information. I’ve not done 
> that before, I believe I’ll need an account somewhere??
> 
>> On Sep 11, 2018, at 7:02 AM, "lustre-discuss-requ...@lists.lustre.org" 
>> <lustre-discuss-requ...@lists.lustre.org> wrote:
>> 
>> 
>> 
>> Today's Topics:
>> 
>>  1. Re: Lustre/ZFS snapshots mount error (Yong, Fan)
>> 
>> 
>> ----------------------------------------------------------------------
>> 
>> Message: 1
>> Date: Tue, 11 Sep 2018 12:00:00 +0000
>> From: "Yong, Fan" <fan.y...@intel.com>
>> To: Robert Redl <robert.r...@lmu.de>,
>>   "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
>> Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error
>> Message-ID:
>>   <7fb055e0b36b6f4eb93e637e0640a56fcc040...@fmsmsx125.amr.corp.intel.com>
>> 
>> 
>> Changelog is just one of many users of llog; there are many others. That 
>> means even without changelogs it is still possible to hit this problem. 
>> Running robinhood while a snapshot is being taken may increase the chance 
>> of the race, but disabling robinhood does not resolve the issue. The real 
>> fix is an enhancement of the snapshot-mount logic.
>> 
>> I did not find a related LU ticket for this issue.
>> 
>> --
>> Cheers,
>> Nasf
>> 
>> From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On 
>> Behalf Of Robert Redl
>> Sent: Tuesday, September 11, 2018 6:54 PM
>> To: lustre-discuss@lists.lustre.org
>> Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error
>> 
>> 
>> Thanks for the fast reply! If I understood correctly, it is currently not 
>> possible to use the changelog feature together with the snapshot feature, 
>> right?
>> 
>> Is there already a LU-Ticket about that?
>> 
>> Cheers,
>> Robert
>> 
>> On 09/10/2018 02:57 PM, Yong, Fan wrote:
>> It is suspected that some llog records were pending when the snapshot was 
>> taken. When such a snapshot is mounted, certain conditions trigger llog 
>> cleanup/modification automatically, so the failure is not related to how 
>> you mount the snapshot. Since we cannot control the system state when the 
>> snapshot is taken, we have to skip llog-related cleanup/modification when 
>> mounting the snapshot. Such "skip" logic is exactly what we need.
>> 
>> Cheers,
>> Nasf
>> From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On 
>> Behalf Of Robert Redl
>> Sent: Saturday, September 8, 2018 9:04 PM
>> To: lustre-discuss@lists.lustre.org
>> Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error
>> 
>> 
>> Dear All,
>> 
>> we have a similar setup with Lustre on ZFS and we make regular use of 
>> snapshots for the purpose of backups (backups on tape use snapshots as 
>> source). We would like to use robinhood in future and the question is now 
>> how to do it.
>> 
>> Would it be a workaround to disable the robinhood daemon temporarily 
>> during the mount process?
>> Does the problem only occur when changelogs are consumed during the process 
>> of mounting a snapshot? Or is it also a problem when changelogs are consumed 
>> while the snapshot remains mounted (which for us is typically several hours)?
>> Is there already an LU-ticket about this issue?
>> 
>> Thanks!
>> Robert
>> --
>> Dr. Robert Redl
>> Scientific Programmer, "Waves to Weather" (SFB/TRR165)
>> Meteorologisches Institut
>> Ludwig-Maximilians-Universität München
>> Theresienstr. 37, 80333 München, Germany
>> On 03.09.2018 at 08:16, Yong, Fan wrote:
>> I would say that it is not the order of your operations that caused the 
>> trouble. Instead, it is related to the snapshot-mount logic. As mentioned 
>> in my earlier reply, we need a patch to the llog logic to avoid modifying 
>> llogs under snapshot mode.
>> 
>> 
>> --
>> Cheers,
>> Nasf
>> From: Kirk, Benjamin (JSC-EG311) [mailto:benjamin.k...@nasa.gov]
>> Sent: Tuesday, August 28, 2018 7:53 PM
>> To: lustre-discuss@lists.lustre.org
>> Cc: Andreas Dilger <adil...@whamcloud.com>; Yong, Fan <fan.y...@intel.com>
>> Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error
>> 
>> The MDS situation is very basic: active/passive mds0/mds1 for both fsA & 
>> fsB.  fsA has the combined mgs/mdt in a single ZFS filesystem, and fsB has 
>> its own mdt in a separate ZFS filesystem.  mds0 is primary for all.
>> 
>> fsA & fsB DO both have changelogs enabled to feed robinhood databases.
>> 
>> What's the recommended procedure we should follow before mounting the 
>> snapshots?
>> 
>> 1) disable changelogs on the active MDTs (this will compromise robinhood, 
>> requiring a rescan?), or
>> 2) temporarily halt changelog consumption/cleanup (e.g. stop robinhood in 
>> our case) and then mount the snapshot?
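[Editor's note: option 2 above can be sketched as a small wrapper script. This is only a sketch under stated assumptions: it assumes the robinhood changelog reader runs as a systemd service literally named "robinhood", and it uses the eadc fsname and a snapshot name that appear later in this thread; adjust both for your site. It prints the commands (dry-run) rather than executing them.]

```shell
#!/bin/sh
# Sketch of option 2: pause changelog consumption, mount the snapshot,
# then resume. DRY_RUN=1 only prints each command; unset it to actually
# run them on the MDS. Service/fs/snapshot names are assumptions.
FSNAME=eadc
SNAPNAME=eadc_AutoSS-Fri
DRY_RUN=1

run() {
    if [ -n "$DRY_RUN" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run systemctl stop robinhood                          # halt changelog consumption
run lctl snapshot_mount -F "$FSNAME" -n "$SNAPNAME"
# ... read from the mounted snapshot here (backups, verification) ...
run lctl snapshot_umount -F "$FSNAME" -n "$SNAPNAME"
run systemctl start robinhood                         # resume changelog consumption
```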
>> 
>> Thanks for the help!
>> 
>> --
>> Benjamin S. Kirk, Ph.D.
>> NASA Lyndon B. Johnson Space Center
>> Acting Chief, Aeroscience & Flight Mechanics Division
>> 
>> On Aug 27, 2018, at 7:33 PM, Yong, Fan <fan.y...@intel.com> wrote:
>> 
>> According to the stack trace, something was trying to clean up old empty 
>> llogs while the snapshot was being mounted. We do NOT allow any 
>> modification during snapshot mount; otherwise it would trigger a ZFS 
>> backend BUG(). That is why we added the LASSERT() when starting the 
>> transaction. One possible solution is to add a check in the llog logic to 
>> avoid modifying llogs under snapshot mode.
>> 
>> 
>> --
>> Cheers,
>> Nasf
>> 
>> -----Original Message-----
>> From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On 
>> Behalf Of Andreas Dilger
>> Sent: Tuesday, August 28, 2018 5:57 AM
>> To: Kirk, Benjamin (JSC-EG311) <benjamin.k...@nasa.gov>
>> Cc: lustre-discuss@lists.lustre.org
>> Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error
>> 
>> It's probably best to file an LU ticket for this issue.
>> 
>> It looks like something in the log processing at mount time is trying to 
>> modify the configuration files.  I'm not sure whether that should be 
>> allowed or not.
>> 
>> Does fsB have the same MGS as fsA?  Does it have the same MDS node as fsA?
>> If it has a different MDS, you might consider giving it its own MGS as well.
>> That doesn't have to be a separate MGS node, just a separate filesystem (ZFS 
>> fileset in the same zpool) on the MDS node.
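[Editor's note: a minimal sketch of that suggestion, assuming the MDT pool is named "metadata" and using a hypothetical fileset name "fsB-mgs" (neither name is from this thread). The function only prints the commands an admin would run:]

```shell
#!/bin/sh
# Sketch: a standalone MGS as its own small ZFS fileset in the existing
# pool, formatted with --mgs only (no MDT role). Names are illustrative.
mgs_setup_cmds() {
    echo "mkfs.lustre --mgs --backfstype=zfs metadata/fsB-mgs"
    echo "mkdir -p /mnt/fsB-mgs"
    echo "mount -t lustre metadata/fsB-mgs /mnt/fsB-mgs"
}
mgs_setup_cmds
```

On a real system the targets of fsB would also need their --mgsnode settings updated (e.g. via tunefs.lustre) to point at the new MGS; the commands above only cover creating and mounting the MGS itself.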
>> 
>> Cheers, Andreas
>> 
>> 
>> 
>> 
>> On Aug 27, 2018, at 10:18, Kirk, Benjamin (JSC-EG311) <benjamin.k...@nasa.gov> wrote:
>> 
>> Hi all,
>> 
>> We have two filesystems, fsA & fsB (eadc below).  Both get snapshots taken 
>> daily, rotated over a week.  It's a beautiful feature we've been using in 
>> production ever since it was introduced in 2.10.
>> 
>> -) We've got Lustre/ZFS 2.10.4 on CentOS 7.5.
>> -) Both fsA & fsB have changelogs active.
>> -) fsA has combined mgt/mdt on a single ZFS filesystem.
>> -) fsB has a single mdt on a single ZFS filesystem.
>> -) for fsA, I have no issues mounting any of the snapshots via lctl.
>> -) for fsB, I can mount the three most recent snapshots, but then 
>> encounter errors:
>> 
>> [root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Mon
>> mounted the snapshot eadc_AutoSS-Mon with fsname 3d40bbc
>> [root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Mon
>> [root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Sun
>> mounted the snapshot eadc_AutoSS-Sun with fsname 584c07a
>> [root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Sun
>> [root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Sat
>> mounted the snapshot eadc_AutoSS-Sat with fsname 4e646fe
>> [root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Sat
>> [root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Fri
>> mount.lustre: mount metadata/meta-eadc@eadc_AutoSS-Fri at /mnt/eadc_AutoSS-Fri_MDT0000 failed: Read-only file system
>> Can't mount the snapshot eadc_AutoSS-Fri: Read-only file system
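[Editor's note: if it helps triage, the mount/umount sequence above can be scripted to report which snapshots fail. A sketch, assuming the eadc_AutoSS-<Day> naming used above; LCTL is set to echo the commands so nothing runs until you change it to the real lctl on the MDS.]

```shell
#!/bin/sh
# Try to mount and immediately unmount each daily snapshot, reporting
# failures. LCTL="echo lctl" prints the commands instead of running them;
# set LCTL=lctl to execute for real on the MDS.
FSNAME=eadc
LCTL="echo lctl"
for day in Mon Sun Sat Fri Thu Wed Tue; do
    snap="eadc_AutoSS-$day"
    if $LCTL snapshot_mount -F "$FSNAME" -n "$snap"; then
        $LCTL snapshot_umount -F "$FSNAME" -n "$snap"
    else
        echo "FAILED to mount $snap" >&2
    fi
done
```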
>> 
>> The relevant bits from dmesg are:
>> 
>> [1353434.417762] Lustre: 3d40bbc-MDT0000: set dev_rdonly on this device
>> [1353434.417765] Lustre: Skipped 3 previous similar messages
>> [1353434.649480] Lustre: 3d40bbc-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
>> [1353434.649484] Lustre: Skipped 3 previous similar messages
>> [1353434.866228] Lustre: 3d40bbc-MDD0000: changelog on
>> [1353434.866233] Lustre: Skipped 1 previous similar message
>> [1353435.427744] Lustre: 3d40bbc-MDT0000: Connection restored to ...@tcp (at ...@tcp)
>> [1353435.427747] Lustre: Skipped 23 previous similar messages
>> [1353445.255899] Lustre: Failing over 3d40bbc-MDT0000
>> [1353445.255903] Lustre: Skipped 3 previous similar messages
>> [1353445.256150] LustreError: 11-0: 3d40bbc-OST0000-osc-MDT0000: operation ost_disconnect to node ...@tcp failed: rc = -107
>> [1353445.257896] LustreError: Skipped 23 previous similar messages
>> [1353445.353874] Lustre: server umount 3d40bbc-MDT0000 complete
>> [1353445.353877] Lustre: Skipped 3 previous similar messages
>> [1353475.302224] Lustre: 4e646fe-MDD0000: changelog on
>> [1353475.302228] Lustre: Skipped 1 previous similar message
>> [1353498.964016] LustreError: 25582:0:(osd_handler.c:341:osd_trans_create()) 36ca26b-MDT0000-osd: someone try to start transaction under readonly mode, should be disabled.
>> [1353498.967260] LustreError: 25582:0:(osd_handler.c:341:osd_trans_create()) Skipped 1 previous similar message
>> [1353498.968829] CPU: 6 PID: 25582 Comm: mount.lustre Kdump: loaded Tainted: P           OE  ------------   3.10.0-862.6.3.el7.x86_64 #1
>> [1353498.968830] Hardware name: Supermicro SYS-6027TR-D71FRF/X9DRT, BIOS 3.2a 08/04/2015
>> [1353498.968832] Call Trace:
>> [1353498.968841]  [<ffffffffb5b0e80e>] dump_stack+0x19/0x1b
>> [1353498.968851]  [<ffffffffc0cbe5db>] osd_trans_create+0x38b/0x3d0 [osd_zfs]
>> [1353498.968876]  [<ffffffffc1116044>] llog_destroy+0x1f4/0x3f0 [obdclass]
>> [1353498.968887]  [<ffffffffc111f0f6>] llog_cat_reverse_process_cb+0x246/0x3f0 [obdclass]
>> [1353498.968897]  [<ffffffffc111a32c>] llog_reverse_process+0x38c/0xaa0 [obdclass]
>> [1353498.968910]  [<ffffffffc111eeb0>] ? llog_cat_process_cb+0x4e0/0x4e0 [obdclass]
>> [1353498.968922]  [<ffffffffc111af69>] llog_cat_reverse_process+0x179/0x270 [obdclass]
>> [1353498.968932]  [<ffffffffc1115585>] ? llog_init_handle+0xd5/0x9a0 [obdclass]
>> [1353498.968943]  [<ffffffffc1116e78>] ? llog_open_create+0x78/0x320 [obdclass]
>> [1353498.968949]  [<ffffffffc12e55f0>] ? mdd_root_get+0xf0/0xf0 [mdd]
>> [1353498.968954]  [<ffffffffc12ec7af>] mdd_prepare+0x13ff/0x1c70 [mdd]
>> [1353498.968966]  [<ffffffffc166b037>] mdt_prepare+0x57/0x3b0 [mdt]
>> [1353498.968983]  [<ffffffffc1183afd>] server_start_targets+0x234d/0x2bd0 [obdclass]
>> [1353498.968999]  [<ffffffffc1153500>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
>> [1353498.969012]  [<ffffffffc118541d>] server_fill_super+0x109d/0x185a [obdclass]
>> [1353498.969025]  [<ffffffffc115cef8>] lustre_fill_super+0x328/0x950 [obdclass]
>> [1353498.969038]  [<ffffffffc115cbd0>] ? lustre_common_put_super+0x270/0x270 [obdclass]
>> [1353498.969041]  [<ffffffffb561f3bf>] mount_nodev+0x4f/0xb0
>> [1353498.969053]  [<ffffffffc1154f18>] lustre_mount+0x38/0x60 [obdclass]
>> [1353498.969055]  [<ffffffffb561ff3e>] mount_fs+0x3e/0x1b0
>> [1353498.969060]  [<ffffffffb563d4b7>] vfs_kern_mount+0x67/0x110
>> [1353498.969062]  [<ffffffffb563fadf>] do_mount+0x1ef/0xce0
>> [1353498.969066]  [<ffffffffb55f7c2c>] ? kmem_cache_alloc_trace+0x3c/0x200
>> [1353498.969069]  [<ffffffffb5640913>] SyS_mount+0x83/0xd0
>> [1353498.969074]  [<ffffffffb5b20795>] system_call_fastpath+0x1c/0x21
>> [1353498.969079] LustreError: 25582:0:(llog_cat.c:1027:llog_cat_reverse_process_cb()) 36ca26b-MDD0000: fail to destroy empty log: rc = -30
>> [1353498.970785] CPU: 6 PID: 25582 Comm: mount.lustre Kdump: loaded Tainted: P           OE  ------------   3.10.0-862.6.3.el7.x86_64 #1
>> [1353498.970786] Hardware name: Supermicro SYS-6027TR-D71FRF/X9DRT, BIOS 3.2a 08/04/2015
>> [1353498.970787] Call Trace:
>> [1353498.970790]  [<ffffffffb5b0e80e>] dump_stack+0x19/0x1b
>> [1353498.970795]  [<ffffffffc0cbe5db>] osd_trans_create+0x38b/0x3d0 [osd_zfs]
>> [1353498.970807]  [<ffffffffc1117921>] llog_cancel_rec+0xc1/0x880 [obdclass]
>> [1353498.970817]  [<ffffffffc111e13b>] llog_cat_cleanup+0xdb/0x380 [obdclass]
>> [1353498.970827]  [<ffffffffc111f14d>] llog_cat_reverse_process_cb+0x29d/0x3f0 [obdclass]
>> [1353498.970838]  [<ffffffffc111a32c>] llog_reverse_process+0x38c/0xaa0 [obdclass]
>> [1353498.970848]  [<ffffffffc111eeb0>] ? llog_cat_process_cb+0x4e0/0x4e0 [obdclass]
>> [1353498.970858]  [<ffffffffc111af69>] llog_cat_reverse_process+0x179/0x270 [obdclass]
>> [1353498.970868]  [<ffffffffc1115585>] ? llog_init_handle+0xd5/0x9a0 [obdclass]
>> [1353498.970878]  [<ffffffffc1116e78>] ? llog_open_create+0x78/0x320 [obdclass]
>> [1353498.970883]  [<ffffffffc12e55f0>] ? mdd_root_get+0xf0/0xf0 [mdd]
>> [1353498.970887]  [<ffffffffc12ec7af>] mdd_prepare+0x13ff/0x1c70 [mdd]
>> [1353498.970894]  [<ffffffffc166b037>] mdt_prepare+0x57/0x3b0 [mdt]
>> [1353498.970908]  [<ffffffffc1183afd>] server_start_targets+0x234d/0x2bd0 [obdclass]
>> [1353498.970924]  [<ffffffffc1153500>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
>> [1353498.970938]  [<ffffffffc118541d>] server_fill_super+0x109d/0x185a [obdclass]
>> [1353498.970950]  [<ffffffffc115cef8>] lustre_fill_super+0x328/0x950 [obdclass]
>> [1353498.970962]  [<ffffffffc115cbd0>] ? lustre_common_put_super+0x270/0x270 [obdclass]
>> [1353498.970964]  [<ffffffffb561f3bf>] mount_nodev+0x4f/0xb0
>> [1353498.970976]  [<ffffffffc1154f18>] lustre_mount+0x38/0x60 [obdclass]
>> [1353498.970978]  [<ffffffffb561ff3e>] mount_fs+0x3e/0x1b0
>> [1353498.970980]  [<ffffffffb563d4b7>] vfs_kern_mount+0x67/0x110
>> [1353498.970982]  [<ffffffffb563fadf>] do_mount+0x1ef/0xce0
>> [1353498.970984]  [<ffffffffb55f7c2c>] ? kmem_cache_alloc_trace+0x3c/0x200
>> [1353498.970986]  [<ffffffffb5640913>] SyS_mount+0x83/0xd0
>> [1353498.970989]  [<ffffffffb5b20795>] system_call_fastpath+0x1c/0x21
>> [1353498.970996] LustreError: 25582:0:(mdd_device.c:354:mdd_changelog_llog_init()) 36ca26b-MDD0000: changelog init failed: rc = -30
>> [1353498.972790] LustreError: 25582:0:(mdd_device.c:427:mdd_changelog_init()) 36ca26b-MDD0000: changelog setup during init failed: rc = -30
>> [1353498.974525] LustreError: 25582:0:(mdd_device.c:1061:mdd_prepare()) 36ca26b-MDD0000: failed to initialize changelog: rc = -30
>> [1353498.976229] LustreError: 25582:0:(obd_mount_server.c:1879:server_fill_super()) Unable to start targets: -30
>> [1353499.072002] LustreError: 25582:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount  (-30)
>> 
>> 
>> I'm hoping those traces mean something to someone - any ideas?
>> 
>> Thanks!
>> 
>> --
>> Benjamin S. Kirk
>> 
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>> 
>> Cheers, Andreas
>> ---
>> Andreas Dilger
>> CTO Whamcloud
>> 
>> 
>> 
>> ------------------------------
>> 
>> End of lustre-discuss Digest, Vol 150, Issue 14
>> ***********************************************

Cheers, Andreas
---
Andreas Dilger
CTO Whamcloud





