We were able to get our LFS back up using the fix in LU-13189 and have been 
stable since.  But I'd still appreciate some help backing out of this.  

* Is the "lfs setquota -p 1" the likely cause of our crash?
* If so:
        * Why would it take 1 week to show up?
        * What is the best way to reverse any ill effects the "lfs setquota -p 
1" command may have caused?
        * Should there be some protection in the lustre source for this?
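
To make the "reverse any ill effects" question concrete, here is roughly what I assume backing out would look like: clear the project-1 limits we set and spot-check whether any files actually picked up a project ID.  I haven't run this yet and would appreciate a sanity check first (the directory below is just a placeholder):

        # Clear the limits we set for project 1 (my understanding is that limits of 0 remove enforcement)
        lfs setquota -p 1 -b 0 -B 0 -i 0 -I 0 /nobackup

        # Spot-check whether anything actually got a project ID or inherit flag
        # (placeholder path; a recursive walk of the whole filesystem would be huge)
        lfs project -r /nobackup/<some-directory>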


-----Original Message-----
From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of 
"Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] via lustre-discuss" 
<lustre-discuss@lists.lustre.org>
Reply-To: "Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.]" 
<darby.vicke...@nasa.gov>
Date: Thursday, September 30, 2021 at 11:41 AM
To: Colin Faber via lustre-discuss <lustre-discuss@lists.lustre.org>
Subject: [EXTERNAL] [lustre-discuss] ASSERTION( obj->oo_with_projid ) failed

    Hello everyone,

    We've run into a pretty nasty LBUG that took our LFS down.  We're not 
exactly sure of the cause and could use some help.  It's pretty much identical 
to this:

        https://jira.whamcloud.com/browse/LU-13189

    One of our OSSes started crashing repeatedly last night.  We are configured 
with HA and tried failing over to its partner, only to have that OSS crash in 
the same way.  We are in the process of applying the workaround mentioned in 
the above LU to get back up and running, but we'd like to fix this without the 
#undef ZFS_PROJINHERIT if possible.  A couple of months ago we updated our 
servers to 2.14 (stock, no modifications) and we'd like to get back to stock 
2.14.  Up until last night, our experience with 2.14 had been great: very 
stable compared to what we were running previously (a very old 2.10) and 
better performing.  Our specific stack trace from the crash dump is below in 
case it helps.  Our servers are running kernel 3.10.0-1160.31.1.el7.x86_64, 
and the MDT and OSTs are both on ZFS (version 2.0).

    There are two things that could have contributed to the crash.  

    First, about a week ago, we tried to use project quotas for the first time.  
Without reading the lustre manual, I just tried to set a project quota like this:


        lfs setquota -p 1 -b 307200 -B 309200 -i 10000 -I 11000 .


    But it was pretty obvious that didn't work.


        # lfs quota -p 1 /nobackup/
        Unexpected quotactl error: Operation not supported
        Disk quotas for prj 1 (pid 1):
             Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
            /nobackup/     [0]     [0]     [0]       -     [0]     [0]     [0]       -
        Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.
        #
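
    For what it's worth, the only check I know of for whether the targets are 
actually enforcing project quotas is something like the following, run on the 
MDS/OSS nodes; I'm not certain this is the definitive test:

        # Shows which quota types (user/group/project) each OSD is enforcing
        lctl get_param osd-*.*.quota_slave.info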


    Then, after reading section 25.2 in the lustre manual 
(https://doc.lustre.org/lustre_manual.xhtml#enabling_disk_quotas), I saw that 
zfs version >= 0.8 with a kernel version < 4.5 requires a patched kernel for 
project quotas.  So I just moved on, figuring project quotas would not work 
since we are using the stock kernel.  But now it appears this might be the 
cause of our problem.  As of right now, I see this in the zfs properties for 
our metadata pool:

        [root@hpfs-fsl-mds0 ~]# zpool get all mds0-0-new  | grep proj
        mds0-0-new  feature@project_quota          active                        local
        [root@hpfs-fsl-mds0 ~]#
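
    I assume the OST pools need the same check, something like this on each 
OSS (the pool name below is a placeholder for ours):

        # Check whether the project_quota feature is active on the OST pools too
        zpool get feature@project_quota <ost-pool>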


    Several questions come to mind.  

    * Is this the likely cause of our crash?
    * Why would it take a week to show up?
    * What is the best way to reverse any ill effects the "lfs setquota -p 1" command may have caused?



    The second possible contributor is related to some maintenance we just 
finished on the metadata server yesterday morning.  After the update to 2.14 
(and the zfs update from 0.7 to 2.0), we got this message from "zpool status" 
on our MDT pool:


      pool: mds0-0
     state: ONLINE
    status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
    action: Replace affected devices with devices that support the
        configured block size, or migrate data to a properly configured
        pool.
      scan: scrub repaired 0B in 1 days 17:49:23 with 0 errors on Fri Jul  9 21:03:24 2021
    config:

        NAME        STATE     READ WRITE CKSUM
        mds0-0      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            mpathm  ONLINE       0     0     0  block size: 512B configured, 4096B native
            mpathn  ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-1  ONLINE       0     0     0
            mpatho  ONLINE       0     0     0  block size: 512B configured, 4096B native
            mpathp  ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-2  ONLINE       0     0     0
            mpathq  ONLINE       0     0     0  block size: 512B configured, 4096B native
            mpathr  ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-3  ONLINE       0     0     0
            mpaths  ONLINE       0     0     0  block size: 512B configured, 4096B native
            mpatht  ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-4  ONLINE       0     0     0
            mpathu  ONLINE       0     0     0  block size: 512B configured, 4096B native
            mpathv  ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-5  ONLINE       0     0     0
            mpathw  ONLINE       0     0     0  block size: 512B configured, 4096B native
            mpathx  ONLINE       0     0     0  block size: 512B configured, 4096B native
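
    For anyone chasing the same warning, I believe the physical vs. logical 
sector sizes it refers to can be confirmed directly from the OS, assuming the 
multipath devices show up under /dev/mapper as they do for us:

        # PHY-SEC = physical sector size, LOG-SEC = logical sector size
        lsblk -o NAME,PHY-SEC,LOG-SEC /dev/mapper/mpathm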



    This is related to the SSDs we are using for the MDT.  The physical block 
size is 4k (ashift=12) but the logical block size is 0.5k (ashift=9).  
Apparently, the old version of zfs (under which the original pool was built) 
picked ashift=9, but after the update zfs 2.0 was telling us we should be 
using the larger block size to match the physical block size of these drives.  
Despite this mismatch, our mdtest results (via io500) were greatly improved 
with the lustre 2.14 update.  But it's still something we wanted to fix, which 
was the purpose of our maintenance outage yesterday.  So we backed up the 
mds0-0/meta-fsl file system to a separate pool, destroyed the old pool, 
rebuilt it (this time with zfs choosing ashift=12 for the block size), and 
copied the data back to the newly created pool.  However, this process failed.  
Our old metadata file system (512B block size) was using about 490 GB of our 
2.2 TB pool.  Due to the increase in block size, the data takes up more space 
in the new file system, potentially 8x more if an entry is less than 512 B to 
begin with, and we filled up the new ashift=12 pool.  So we had to revert to 
an ashift=9 pool.  We are going to have to buy more or bigger SSDs (or use 
raidz instead of raid10) if we want to go to a bigger ashift.
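
    For the record, here is the rough worst-case arithmetic that caught us out 
(the zdb command is, I believe, one way to confirm the ashift a pool was 
actually built with; pool name as in the zpool status output above):

        # Print the cached pool config, including the per-vdev ashift
        zdb -C mds0-0 | grep ashift

        # Worst case: every record <= 512 B grows to a full 4 KiB block, so
        # ~490 GB used at ashift=9 could become up to 490 GB * (4096/512) = ~3.9 TB
        # at ashift=12, well over the 2.2 TB pool, hence the restore filled it up.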

    So this could be related too.  Theoretically, nothing should have changed 
as far as lustre was concerned.  But it's hard to ignore that we put the file 
system back in service yesterday morning and about 10 hours later ran into 
this problem.


    If anyone has ideas, please let us know.  We're happy to post details here 
or to an LU.  

    Thanks,
    Darby Vicker




    [  138.597710] LustreError: 2476:0:(tgt_grant.c:803:tgt_grant_check()) hpfs-fsl-OST0005: cli cd0fda1d-691d-bb4f-1548-c45f8c2e578d is replaying OST_WRITE while one rnb hasn't OBD_BRW_FROM_GRANT set (0x8)
    [  138.699120] LustreError: 2476:0:(osd_object.c:1353:osd_attr_set()) ASSERTION( obj->oo_with_projid ) failed:
    [  138.699155] LustreError: 2476:0:(osd_object.c:1353:osd_attr_set()) LBUG
    [  138.699176] Pid: 2476, comm: tgt_recover_5 3.10.0-1160.31.1.el7.x86_64 #1 SMP Thu Jun 10 13:32:12 UTC 2021
    [  138.699177] Call Trace:
    [  138.699184]  [<ffffffffc104167c>] libcfs_call_trace+0x8c/0xc0 [libcfs]
    [  138.699194]  [<ffffffffc104199c>] lbug_with_loc+0x4c/0xa0 [libcfs]
    [  138.699199]  [<ffffffffc17a62db>] osd_attr_set+0xdeb/0xe60 [osd_zfs]
    [  138.699207]  [<ffffffffc18cf50e>] ofd_write_attr_set+0x87e/0xd20 [ofd]
    [  138.699213]  [<ffffffffc18cfc03>] ofd_commitrw_write+0x253/0x1510 [ofd]
    [  138.699218]  [<ffffffffc18d484d>] ofd_commitrw+0x2ad/0x9a0 [ofd]
    [  138.699223]  [<ffffffffc15b85d1>] tgt_brw_write+0xe51/0x1a10 [ptlrpc]
    [  138.699273]  [<ffffffffc15bca5a>] tgt_request_handle+0x7ea/0x1750 [ptlrpc]
    [  138.699299]  [<ffffffffc150a096>] handle_recovery_req+0x96/0x290 [ptlrpc]
    [  138.699317]  [<ffffffffc151406b>] replay_request_or_update.isra.25+0x2fb/0x930 [ptlrpc]
    [  138.699336]  [<ffffffffc1514dbd>] target_recovery_thread+0x71d/0x11d0 [ptlrpc]
    [  138.699354]  [<ffffffffba6c5e31>] kthread+0xd1/0xe0
    [  138.699357]  [<ffffffffbad95df7>] ret_from_fork_nospec_end+0x0/0x39
    [  138.699360]  [<ffffffffffffffff>] 0xffffffffffffffff
    [  138.699380] Kernel panic - not syncing: LBUG
    [  138.699395] CPU: 1 PID: 2476 Comm: tgt_recover_5 Kdump: loaded Tainted: P           OE  ------------   3.10.0-1160.31.1.el7.x86_64 #1
    [  138.699429] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a 08/04/2015
    [  138.699449] Call Trace:
    [  138.699460]  [<ffffffffbad835a9>] dump_stack+0x19/0x1b
    [  138.699477]  [<ffffffffbad7d2b1>] panic+0xe8/0x21f
    [  138.699496]  [<ffffffffc10419eb>] lbug_with_loc+0x9b/0xa0 [libcfs]
    [  138.699519]  [<ffffffffc17a62db>] osd_attr_set+0xdeb/0xe60 [osd_zfs]
    [  138.699543]  [<ffffffffc18ca5cd>] ? ofd_attr_handle_id+0x12d/0x410 [ofd]
    [  138.699566]  [<ffffffffc18cf50e>] ofd_write_attr_set+0x87e/0xd20 [ofd]
    [  138.699588]  [<ffffffffba7de42d>] ? kzfree+0x2d/0x70
    [  138.699607]  [<ffffffffc18cfc03>] ofd_commitrw_write+0x253/0x1510 [ofd]
    [  138.699628]  [<ffffffffba7c7675>] ? __free_pages+0x25/0x30
    [  138.699649]  [<ffffffffc18d484d>] ofd_commitrw+0x2ad/0x9a0 [ofd]
    [  138.699693]  [<ffffffffc15b85d1>] tgt_brw_write+0xe51/0x1a10 [ptlrpc]
    [  138.699738]  [<ffffffffc15bca5a>] tgt_request_handle+0x7ea/0x1750 [ptlrpc]
    [  138.699761]  [<ffffffffba6aee98>] ? add_timer+0x18/0x20
    [  138.699779]  [<ffffffffba6bc13b>] ? __queue_delayed_work+0x8b/0x1a0
    [  138.699822]  [<ffffffffc15bc270>] ? tgt_hpreq_handler+0x2c0/0x2c0 [ptlrpc]
    [  138.699861]  [<ffffffffc150a096>] handle_recovery_req+0x96/0x290 [ptlrpc]
    [  138.699899]  [<ffffffffc151406b>] replay_request_or_update.isra.25+0x2fb/0x930 [ptlrpc]
    [  138.699940]  [<ffffffffc1514dbd>] target_recovery_thread+0x71d/0x11d0 [ptlrpc]
    [  138.699963]  [<ffffffffbad88e60>] ? __schedule+0x320/0x680
    [  138.699998]  [<ffffffffc15146a0>] ? replay_request_or_update.isra.25+0x930/0x930 [ptlrpc]
    [  138.700023]  [<ffffffffba6c5e31>] kthread+0xd1/0xe0
    [  138.700039]  [<ffffffffba6c5d60>] ? insert_kthread_work+0x40/0x40
    [  138.700059]  [<ffffffffbad95df7>] ret_from_fork_nospec_begin+0x21/0x21
    [  138.700079]  [<ffffffffba6c5d60>] ? insert_kthread_work+0x40/0x40


_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org