Bugs should be created at jira.whamcloud.com; you can create an account there.
Cheers, Andreas

> On Sep 11, 2018, at 06:10, Kirk, Benjamin <benjamin.k...@nasa.gov> wrote:
>
> I will attempt to create an LU with our specific information. I've not done
> that before; I believe I'll need an account somewhere?
>
>> On Sep 11, 2018, at 7:02 AM, "lustre-discuss-requ...@lists.lustre.org"
>> <lustre-discuss-requ...@lists.lustre.org> wrote:
>>
>> Message: 1
>> Date: Tue, 11 Sep 2018 12:00:00 +0000
>> From: "Yong, Fan" <fan.y...@intel.com>
>> To: Robert Redl <robert.r...@lmu.de>, lustre-discuss@lists.lustre.org
>> Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error
>>
>> Changelog is just one of the users of llog; there are many others. That
>> means the problem can be hit even without changelogs enabled. Running
>> robinhood while a snapshot is being taken may increase the likelihood of
>> the race, but disabling robinhood does not resolve the issue. The proper
>> fix is an enhancement of the snapshot-mount logic.
>>
>> I did not find a related LU ticket for this issue.
>>
>> --
>> Cheers,
>> Nasf
>>
>> From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Robert Redl
>> Sent: Tuesday, September 11, 2018 6:54 PM
>> To: lustre-discuss@lists.lustre.org
>> Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error
>>
>> Thanks for the fast reply! If I understood correctly, it is currently not
>> possible to use the changelog feature together with the snapshot feature,
>> right?
>>
>> Is there already an LU ticket about that?
>>
>> Cheers,
>> Robert
>>
>> On 09/10/2018 02:57 PM, Yong, Fan wrote:
>> It is suspected that some llog records were pending when the snapshot was
>> taken. When such a snapshot is mounted, certain conditions trigger llog
>> cleanup/modification automatically, so it is not related to your actions
>> when mounting the snapshot. Since we cannot control the system state at
>> the time the snapshot is taken, we have to skip llog-related
>> cleanup/modification when mounting a snapshot. Such "skip" logic is just
>> what we need.
>>
>> Cheers,
>> Nasf
>>
>> From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Robert Redl
>> Sent: Saturday, September 8, 2018 9:04 PM
>> To: lustre-discuss@lists.lustre.org
>> Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error
>>
>> Dear All,
>>
>> we have a similar setup with Lustre on ZFS, and we make regular use of
>> snapshots for backups (backups on tape use the snapshots as their source).
>> We would like to use robinhood in the future, and the question now is how
>> to do it.
>>
>> Would it be a workaround to disable the robinhood daemon temporarily
>> during the mount process?
>> Does the problem only occur when changelogs are consumed during the
>> process of mounting a snapshot? Or is it also a problem when changelogs
>> are consumed while the snapshot remains mounted (which for us is
>> typically several hours)?
>> Is there already an LU ticket about this issue?
>>
>> Thanks!
>> Robert
>> --
>> Dr. Robert Redl
>> Scientific Programmer, "Waves to Weather" (SFB/TRR165)
>> Meteorologisches Institut
>> Ludwig-Maximilians-Universität München
>> Theresienstr. 37, 80333 München, Germany
>>
>> On 03.09.2018 at 08:16, Yong, Fan wrote:
>> I would say that it is not the order of your operations that caused the
>> trouble. Rather, it is related to the snapshot mount logic. As mentioned
>> in my earlier reply, we need a patch for the llog logic to avoid
>> modifying llogs under snapshot mode.
>>
>> --
>> Cheers,
>> Nasf
>>
>> From: Kirk, Benjamin (JSC-EG311) [mailto:benjamin.k...@nasa.gov]
>> Sent: Tuesday, August 28, 2018 7:53 PM
>> To: lustre-discuss@lists.lustre.org
>> Cc: Andreas Dilger <adil...@whamcloud.com>; Yong, Fan <fan.y...@intel.com>
>> Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error
>>
>> The MDS situation is very basic: active/passive mds0/mds1 for both fsA &
>> fsB. fsA has the combined mgs/mdt in a single ZFS filesystem, and fsB has
>> its own mdt in a separate ZFS filesystem. mds0 is primary for all.
>>
>> fsA & fsB DO both have changelogs enabled to feed robinhood databases.
>>
>> What's the recommended procedure we should follow before mounting the
>> snapshots?
>>
>> 1) disable changelogs on the active MDTs (this will compromise robinhood,
>> requiring a rescan?), or
>> 2) temporarily halt changelog consumption/cleanup (e.g. stop robinhood in
>> our case) and then mount the snapshot?
>>
>> Thanks for the help!
>>
>> --
>> Benjamin S. Kirk, Ph.D.
>> NASA Lyndon B. Johnson Space Center
>> Acting Chief, Aeroscience & Flight Mechanics Division
>>
>> On Aug 27, 2018, at 7:33 PM, Yong, Fan <fan.y...@intel.com> wrote:
>>
>> According to the stack trace, something was trying to clean up old empty
>> llogs while mounting the snapshot.
>> We do NOT allow any modification while mounting a snapshot; otherwise it
>> would trigger a ZFS backend BUG(). That is why we added the LASSERT()
>> when starting the transaction. One possible solution is to add a check
>> in the llog logic to avoid modifying llogs under snapshot mode.
>>
>> --
>> Cheers,
>> Nasf
>>
>> -----Original Message-----
>> From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Andreas Dilger
>> Sent: Tuesday, August 28, 2018 5:57 AM
>> To: Kirk, Benjamin (JSC-EG311) <benjamin.k...@nasa.gov>
>> Cc: lustre-discuss@lists.lustre.org
>> Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error
>>
>> It's probably best to file an LU ticket for this issue.
>>
>> It looks like something in the log processing at mount time is trying to
>> modify the configuration files. I'm not sure whether that should be
>> allowed or not.
>>
>> Does fsB have the same MGS as fsA? Does it have the same MDS node as fsA?
>> If it has a different MDS, you might consider giving it its own MGS as
>> well. That doesn't have to be a separate MGS node, just a separate
>> filesystem (a ZFS fileset in the same zpool) on the MDS node.
>>
>> Cheers, Andreas
>>
>> On Aug 27, 2018, at 10:18, Kirk, Benjamin (JSC-EG311) <benjamin.k...@nasa.gov> wrote:
>>
>> Hi all,
>>
>> We have two filesystems, fsA & fsB (eadc below), both of which get
>> snapshots taken daily, rotated over a week. It's a beautiful feature
>> we've been using in production ever since it was introduced with 2.10.
>>
>> -) We've got Lustre/ZFS 2.10.4 on CentOS 7.5.
>> -) Both fsA & fsB have changelogs active.
>> -) fsA has a combined mgt/mdt on a single ZFS filesystem.
>> -) fsB has a single mdt on a single ZFS filesystem.
>> -) for fsA, I have no issues mounting any of the snapshots via lctl.
>> -) for fsB, I can mount the three most recent snapshots, but then
>> encounter errors:
>>
>> [root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Mon
>> mounted the snapshot eadc_AutoSS-Mon with fsname 3d40bbc
>> [root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Mon
>> [root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Sun
>> mounted the snapshot eadc_AutoSS-Sun with fsname 584c07a
>> [root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Sun
>> [root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Sat
>> mounted the snapshot eadc_AutoSS-Sat with fsname 4e646fe
>> [root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Sat
>> [root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Fri
>> mount.lustre: mount metadata/meta-eadc@eadc_AutoSS-Fri at /mnt/eadc_AutoSS-Fri_MDT0000 failed: Read-only file system
>> Can't mount the snapshot eadc_AutoSS-Fri: Read-only file system
>>
>> The relevant bits from dmesg are:
>>
>> [1353434.417762] Lustre: 3d40bbc-MDT0000: set dev_rdonly on this device
>> [1353434.417765] Lustre: Skipped 3 previous similar messages
>> [1353434.649480] Lustre: 3d40bbc-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
>> [1353434.649484] Lustre: Skipped 3 previous similar messages
>> [1353434.866228] Lustre: 3d40bbc-MDD0000: changelog on
>> [1353434.866233] Lustre: Skipped 1 previous similar message
>> [1353435.427744] Lustre: 3d40bbc-MDT0000: Connection restored to ...@tcp (at ...@tcp)
>> [1353435.427747] Lustre: Skipped 23 previous similar messages
>> [1353445.255899] Lustre: Failing over 3d40bbc-MDT0000
>> [1353445.255903] Lustre: Skipped 3 previous similar messages
>> [1353445.256150] LustreError: 11-0: 3d40bbc-OST0000-osc-MDT0000: operation ost_disconnect to node ...@tcp failed: rc = -107
>> [1353445.257896] LustreError: Skipped 23 previous similar messages
>> [1353445.353874] Lustre: server umount 3d40bbc-MDT0000 complete
>> [1353445.353877] Lustre: Skipped 3 previous similar messages
>> [1353475.302224] Lustre: 4e646fe-MDD0000: changelog on
>> [1353475.302228] Lustre: Skipped 1 previous similar message
>> [1353498.964016] LustreError: 25582:0:(osd_handler.c:341:osd_trans_create()) 36ca26b-MDT0000-osd: someone try to start transaction under readonly mode, should be disabled.
>> [1353498.967260] LustreError: 25582:0:(osd_handler.c:341:osd_trans_create()) Skipped 1 previous similar message
>> [1353498.968829] CPU: 6 PID: 25582 Comm: mount.lustre Kdump: loaded Tainted: P OE ------------ 3.10.0-862.6.3.el7.x86_64 #1
>> [1353498.968830] Hardware name: Supermicro SYS-6027TR-D71FRF/X9DRT, BIOS 3.2a 08/04/2015
>> [1353498.968832] Call Trace:
>> [1353498.968841] [<ffffffffb5b0e80e>] dump_stack+0x19/0x1b
>> [1353498.968851] [<ffffffffc0cbe5db>] osd_trans_create+0x38b/0x3d0 [osd_zfs]
>> [1353498.968876] [<ffffffffc1116044>] llog_destroy+0x1f4/0x3f0 [obdclass]
>> [1353498.968887] [<ffffffffc111f0f6>] llog_cat_reverse_process_cb+0x246/0x3f0 [obdclass]
>> [1353498.968897] [<ffffffffc111a32c>] llog_reverse_process+0x38c/0xaa0 [obdclass]
>> [1353498.968910] [<ffffffffc111eeb0>] ? llog_cat_process_cb+0x4e0/0x4e0 [obdclass]
>> [1353498.968922] [<ffffffffc111af69>] llog_cat_reverse_process+0x179/0x270 [obdclass]
>> [1353498.968932] [<ffffffffc1115585>] ? llog_init_handle+0xd5/0x9a0 [obdclass]
>> [1353498.968943] [<ffffffffc1116e78>] ? llog_open_create+0x78/0x320 [obdclass]
>> [1353498.968949] [<ffffffffc12e55f0>] ? mdd_root_get+0xf0/0xf0 [mdd]
>> [1353498.968954] [<ffffffffc12ec7af>] mdd_prepare+0x13ff/0x1c70 [mdd]
>> [1353498.968966] [<ffffffffc166b037>] mdt_prepare+0x57/0x3b0 [mdt]
>> [1353498.968983] [<ffffffffc1183afd>] server_start_targets+0x234d/0x2bd0 [obdclass]
>> [1353498.968999] [<ffffffffc1153500>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
>> [1353498.969012] [<ffffffffc118541d>] server_fill_super+0x109d/0x185a [obdclass]
>> [1353498.969025] [<ffffffffc115cef8>] lustre_fill_super+0x328/0x950 [obdclass]
>> [1353498.969038] [<ffffffffc115cbd0>] ? lustre_common_put_super+0x270/0x270 [obdclass]
>> [1353498.969041] [<ffffffffb561f3bf>] mount_nodev+0x4f/0xb0
>> [1353498.969053] [<ffffffffc1154f18>] lustre_mount+0x38/0x60 [obdclass]
>> [1353498.969055] [<ffffffffb561ff3e>] mount_fs+0x3e/0x1b0
>> [1353498.969060] [<ffffffffb563d4b7>] vfs_kern_mount+0x67/0x110
>> [1353498.969062] [<ffffffffb563fadf>] do_mount+0x1ef/0xce0
>> [1353498.969066] [<ffffffffb55f7c2c>] ? kmem_cache_alloc_trace+0x3c/0x200
>> [1353498.969069] [<ffffffffb5640913>] SyS_mount+0x83/0xd0
>> [1353498.969074] [<ffffffffb5b20795>] system_call_fastpath+0x1c/0x21
>> [1353498.969079] LustreError: 25582:0:(llog_cat.c:1027:llog_cat_reverse_process_cb()) 36ca26b-MDD0000: fail to destroy empty log: rc = -30
>> [1353498.970785] CPU: 6 PID: 25582 Comm: mount.lustre Kdump: loaded Tainted: P OE ------------ 3.10.0-862.6.3.el7.x86_64 #1
>> [1353498.970786] Hardware name: Supermicro SYS-6027TR-D71FRF/X9DRT, BIOS 3.2a 08/04/2015
>> [1353498.970787] Call Trace:
>> [1353498.970790] [<ffffffffb5b0e80e>] dump_stack+0x19/0x1b
>> [1353498.970795] [<ffffffffc0cbe5db>] osd_trans_create+0x38b/0x3d0 [osd_zfs]
>> [1353498.970807] [<ffffffffc1117921>] llog_cancel_rec+0xc1/0x880 [obdclass]
>> [1353498.970817] [<ffffffffc111e13b>] llog_cat_cleanup+0xdb/0x380 [obdclass]
>> [1353498.970827] [<ffffffffc111f14d>] llog_cat_reverse_process_cb+0x29d/0x3f0 [obdclass]
>> [1353498.970838] [<ffffffffc111a32c>] llog_reverse_process+0x38c/0xaa0 [obdclass]
>> [1353498.970848] [<ffffffffc111eeb0>] ? llog_cat_process_cb+0x4e0/0x4e0 [obdclass]
>> [1353498.970858] [<ffffffffc111af69>] llog_cat_reverse_process+0x179/0x270 [obdclass]
>> [1353498.970868] [<ffffffffc1115585>] ? llog_init_handle+0xd5/0x9a0 [obdclass]
>> [1353498.970878] [<ffffffffc1116e78>] ? llog_open_create+0x78/0x320 [obdclass]
>> [1353498.970883] [<ffffffffc12e55f0>] ? mdd_root_get+0xf0/0xf0 [mdd]
>> [1353498.970887] [<ffffffffc12ec7af>] mdd_prepare+0x13ff/0x1c70 [mdd]
>> [1353498.970894] [<ffffffffc166b037>] mdt_prepare+0x57/0x3b0 [mdt]
>> [1353498.970908] [<ffffffffc1183afd>] server_start_targets+0x234d/0x2bd0 [obdclass]
>> [1353498.970924] [<ffffffffc1153500>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
>> [1353498.970938] [<ffffffffc118541d>] server_fill_super+0x109d/0x185a [obdclass]
>> [1353498.970950] [<ffffffffc115cef8>] lustre_fill_super+0x328/0x950 [obdclass]
>> [1353498.970962] [<ffffffffc115cbd0>] ? lustre_common_put_super+0x270/0x270 [obdclass]
>> [1353498.970964] [<ffffffffb561f3bf>] mount_nodev+0x4f/0xb0
>> [1353498.970976] [<ffffffffc1154f18>] lustre_mount+0x38/0x60 [obdclass]
>> [1353498.970978] [<ffffffffb561ff3e>] mount_fs+0x3e/0x1b0
>> [1353498.970980] [<ffffffffb563d4b7>] vfs_kern_mount+0x67/0x110
>> [1353498.970982] [<ffffffffb563fadf>] do_mount+0x1ef/0xce0
>> [1353498.970984] [<ffffffffb55f7c2c>] ? kmem_cache_alloc_trace+0x3c/0x200
>> [1353498.970986] [<ffffffffb5640913>] SyS_mount+0x83/0xd0
>> [1353498.970989] [<ffffffffb5b20795>] system_call_fastpath+0x1c/0x21
>> [1353498.970996] LustreError: 25582:0:(mdd_device.c:354:mdd_changelog_llog_init()) 36ca26b-MDD0000: changelog init failed: rc = -30
>> [1353498.972790] LustreError: 25582:0:(mdd_device.c:427:mdd_changelog_init()) 36ca26b-MDD0000: changelog setup during init failed: rc = -30
>> [1353498.974525] LustreError: 25582:0:(mdd_device.c:1061:mdd_prepare()) 36ca26b-MDD0000: failed to initialize changelog: rc = -30
>> [1353498.976229] LustreError: 25582:0:(obd_mount_server.c:1879:server_fill_super()) Unable to start targets: -30
>> [1353499.072002] LustreError: 25582:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount (-30)
>>
>> I'm hoping those traces mean something to someone - any ideas?
>>
>> Thanks!
>>
>> --
>> Benjamin S. Kirk
>>
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>> Cheers, Andreas
>> ---
>> Andreas Dilger
>> CTO Whamcloud
>>
>> ------------------------------
>>
>> End of lustre-discuss Digest, Vol 150, Issue 14
>> ***********************************************

Cheers, Andreas
---
Andreas Dilger
CTO Whamcloud
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
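[Editor's note] Option (2) discussed in the thread — temporarily halting changelog consumption (e.g. stopping robinhood) before mounting a snapshot, then resuming afterwards — can be sketched as the wrapper below. The `lctl snapshot_mount`/`snapshot_umount` invocations are the ones shown in the thread; the `robinhood` service name, fsname, and snapshot name are illustrative assumptions for your site. The script prints the commands (dry run) unless `DRYRUN=0` is set, since these steps only make sense on the MDS itself.

```shell
#!/bin/sh
# Sketch: quiesce changelog consumers around a Lustre snapshot mount.
# Assumed names: "robinhood" systemd unit, fsname "eadc", snapshot
# "eadc_AutoSS-Fri". Dry-run by default: prints each command instead
# of executing it.

run() {
    if [ "${DRYRUN:-1}" = "1" ]; then
        echo "+ $*"          # show what would be executed
    else
        "$@" || exit 1       # execute for real; abort on failure
    fi
}

FSNAME=eadc
SNAP=eadc_AutoSS-Fri

run systemctl stop robinhood                      # halt changelog consumption
run lctl snapshot_mount -F "$FSNAME" -n "$SNAP"   # mount the read-only snapshot
# ... run the backup against the mounted snapshot here ...
run lctl snapshot_umount -F "$FSNAME" -n "$SNAP"
run systemctl start robinhood                     # resume changelog consumption
```

Note this only reduces the window for the race Fan Yong describes; per his replies, the real fix is skipping llog cleanup/modification in the snapshot-mount path itself.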