Re: [ceph-users] kernel cephfs - too many caps used by client
Only the OSDs are still on v12.2.8; all of the MDS and MON daemons are on v12.2.12.

# ceph versions
{
    "mon": {
        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 3
    },
    "mgr": {
        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 4
    },
    "osd": {
        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 24,
        "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 203
    },
    "mds": {
        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 5
    },
    "rgw": {
        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 1
    },
    "overall": {
        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 37,
        "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 203
    }
}

Lei Liu wrote on Sat, Oct 19, 2019 at 10:09 AM:

> Thanks for your reply.
>
> Yes, it is already set:
>
> [mds]
> mds_max_caps_per_client = 10485760  # default is 1048576
>
> I think the current value is already large enough per client. Do I need
> to keep increasing it?
>
> Thanks.
>
> Patrick Donnelly wrote on Sat, Oct 19, 2019 at 6:30 AM:
>
>> Hello Lei,
>>
>> On Thu, Oct 17, 2019 at 8:43 PM Lei Liu wrote:
>> >
>> > Hi cephers,
>> >
>> > We have some Ceph clusters using CephFS in production (mounted with the
>> > kernel client), but several of the clients often keep a lot of caps
>> > (millions) unreleased.
>> > I know this is due to the client's inability to finish releasing its
>> > cache; errors might have been encountered, but there are no logs.
>> >
>> > client kernel version is 3.10.0-957.21.3.el7.x86_64
>> > ceph version is mostly v12.2.8
>> >
>> > ceph status shows:
>> >
>> > x clients failing to respond to cache pressure
>> >
>> > client kernel debug shows:
>> >
>> > # cat /sys/kernel/debug/ceph/a00cc99c-f9f9-4dd9-9281-43cd12310e41.client11291811/caps
>> > total     23801585
>> > avail     1074
>> > used      23800511
>> > reserved  0
>> > min       1024
>> >
>> > mds config:
>> > [mds]
>> > mds_max_caps_per_client = 10485760
>> > # 50G
>> > mds_cache_memory_limit = 53687091200
>> >
>> > I want to know if some ceph configuration can solve this problem?
>>
>> mds_max_caps_per_client is new in Luminous 12.2.12. See [1]. You need
>> to upgrade.
>>
>> [1] https://tracker.ceph.com/issues/38130
>>
>> --
>> Patrick Donnelly, Ph.D.
>> He / Him / His
>> Senior Software Engineer
>> Red Hat Sunnyvale, CA
>> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
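In case it helps to narrow down which clients are the worst offenders, the per-session cap counts are also visible from the MDS side. A sketch only: "mds.a" is a placeholder for the active MDS daemon name, and the exact field names in the session ls output may vary slightly between releases.

    # on the host running the active MDS
    ceph daemon mds.a session ls > sessions.json

    # sessions sorted by cap count, highest first
    jq 'sort_by(.num_caps) | reverse | .[] | {client: .inst, caps: .num_caps}' sessions.json

Comparing that against the kernel debugfs caps file on the client should show the same runaway sessions.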
Re: [ceph-users] kernel cephfs - too many caps used by client
Thanks for your reply.

Yes, it is already set:

[mds]
mds_max_caps_per_client = 10485760  # default is 1048576

I think the current value is already large enough per client. Do I need to
keep increasing it?

Thanks.

Patrick Donnelly wrote on Sat, Oct 19, 2019 at 6:30 AM:

> Hello Lei,
>
> On Thu, Oct 17, 2019 at 8:43 PM Lei Liu wrote:
> >
> > Hi cephers,
> >
> > We have some Ceph clusters using CephFS in production (mounted with the
> > kernel client), but several of the clients often keep a lot of caps
> > (millions) unreleased.
> > I know this is due to the client's inability to finish releasing its
> > cache; errors might have been encountered, but there are no logs.
> >
> > client kernel version is 3.10.0-957.21.3.el7.x86_64
> > ceph version is mostly v12.2.8
> >
> > ceph status shows:
> >
> > x clients failing to respond to cache pressure
> >
> > client kernel debug shows:
> >
> > # cat /sys/kernel/debug/ceph/a00cc99c-f9f9-4dd9-9281-43cd12310e41.client11291811/caps
> > total     23801585
> > avail     1074
> > used      23800511
> > reserved  0
> > min       1024
> >
> > mds config:
> > [mds]
> > mds_max_caps_per_client = 10485760
> > # 50G
> > mds_cache_memory_limit = 53687091200
> >
> > I want to know if some ceph configuration can solve this problem?
>
> mds_max_caps_per_client is new in Luminous 12.2.12. See [1]. You need
> to upgrade.
>
> [1] https://tracker.ceph.com/issues/38130
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Senior Software Engineer
> Red Hat Sunnyvale, CA
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
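One quick sanity check, if not done already, is to confirm what value the running MDS actually picked up from ceph.conf. A sketch; "mds.a" stands for your MDS daemon id, run on the host where that daemon lives:

    ceph daemon mds.a config get mds_max_caps_per_client
    ceph daemon mds.a config get mds_cache_memory_limit

If the daemon was started before the [mds] section was edited, it will still report the old value until restarted or until the setting is injected at runtime.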
Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?
Full disclosure - I have not created an erasure coded pool yet! I have been
wanting to do the same thing that you are attempting and have these links
saved. I believe this is what you are looking for.

This link covers decompiling and recompiling the CRUSH map:
https://docs.ceph.com/docs/luminous/rados/operations/crush-map-edits/

This link covers creating the EC rules for 4+2 with only 3 hosts:
https://ceph.io/planet/erasure-code-on-small-clusters/

I hope that helps!

Chris

On 2019-10-18 2:55 pm, Salsa wrote:

Ok, I'm lost here. How am I supposed to write a crush rule? So far I managed
to run:

# ceph osd crush rule dump test -o test.txt

so I can edit the rule. Now I have two problems:

1. What are the functions and operations to use here? Is there documentation
   anywhere about this?
2. How may I create a crush rule using this file? 'ceph osd crush rule
   create ... -i test.txt' does not work.

Am I taking the wrong approach here?

--
Salsa

Sent with ProtonMail Secure Email.

‐‐‐ Original Message ‐‐‐
On Friday, October 18, 2019 3:56 PM, Paul Emmerich wrote:

The default failure domain in Ceph is "host" (see the EC profile), i.e., you
need at least k+m hosts (and at least k+m+1 is better for production setups).
You can change that to OSD, but that's not a good idea for a production setup
for obvious reasons. It's slightly better to write a crush rule that
explicitly picks two disks on each of 3 different hosts.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Fri, Oct 18, 2019 at 8:45 PM Salsa sa...@protonmail.com wrote:

> I have probably misunderstood how to create erasure coded pools, so I may
> be in need of some theory; I'd appreciate it if you can point me to
> documentation that may clarify my doubts.
> I have so far 1 cluster with 3 hosts and 30 OSDs (10 per host).
> I tried to create an erasure code profile like so:
>
> # ceph osd erasure-code-profile get ec4x2rs
> crush-device-class=
> crush-failure-domain=host
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=4
> m=2
> plugin=jerasure
> technique=reed_sol_van
> w=8
>
> If I create a pool using this profile, or any profile where k+m > hosts,
> then the pool gets stuck:
>
> # ceph -s
>   cluster:
>     id:     eb4aea44-0c63-4202-b826-e16ea60ed54d
>     health: HEALTH_WARN
>             Reduced data availability: 16 pgs inactive, 16 pgs incomplete
>             2 pools have too many placement groups
>             too few PGs per OSD (4 < min 30)
>
>   services:
>     mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 11d)
>     mgr: ceph01(active, since 74m), standbys: ceph03, ceph02
>     osd: 30 osds: 30 up (since 2w), 30 in (since 2w)
>
>   data:
>     pools:   11 pools, 32 pgs
>     objects: 0 objects, 0 B
>     usage:   32 GiB used, 109 TiB / 109 TiB avail
>     pgs:     50.000% pgs not active
>              16 active+clean
>              16 creating+incomplete
>
> # ceph osd pool ls
> test_ec
> test_ec2
>
> The pool will never leave this "creating+incomplete" state.
> The pools were created like this:
>
> # ceph osd pool create test_ec2 16 16 erasure ec4x2rs
> # ceph osd pool create test_ec 16 16 erasure
>
> The default profile pool is created correctly.
> My profiles are like this:
>
> # ceph osd erasure-code-profile get default
> k=2
> m=1
> plugin=jerasure
> technique=reed_sol_van
>
> # ceph osd erasure-code-profile get ec4x2rs
> crush-device-class=
> crush-failure-domain=host
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=4
> m=2
> plugin=jerasure
> technique=reed_sol_van
> w=8
>
> From what I've read it seems to be possible to create erasure coded pools
> with k+m greater than the number of hosts. Is this not so?
> What am I doing wrong? Do I have to create any special crush map rule?
>
> --
> Salsa
>
> Sent with ProtonMail Secure Email.
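For reference, the second link describes roughly this pattern for 4+2 on 3 hosts: a hand-written erasure rule that first picks 3 hosts and then 2 OSDs on each, so every host carries exactly 2 of the 6 chunks. A sketch only (untested here; the rule name and id are arbitrary, and it has to be compiled into the CRUSH map as described in the first link):

    rule ec42_3hosts {
            id 99
            type erasure
            min_size 3
            max_size 6
            step set_chooseleaf_tries 5
            step set_choose_tries 100
            step take default
            # pick 3 hosts, then 2 OSDs (leaves) on each -> 6 chunks total
            step choose indep 3 type host
            step chooseleaf indep 2 type osd
            step emit
    }

With m=2, losing a single host then costs exactly two chunks, which the pool can still tolerate. An existing EC pool can be pointed at the new rule with something like "ceph osd pool set test_ec2 crush_rule ec42_3hosts".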
Re: [ceph-users] kernel cephfs - too many caps used by client
Hello Lei,

On Thu, Oct 17, 2019 at 8:43 PM Lei Liu wrote:
>
> Hi cephers,
>
> We have some Ceph clusters using CephFS in production (mounted with the
> kernel client), but several of the clients often keep a lot of caps
> (millions) unreleased.
> I know this is due to the client's inability to finish releasing its
> cache; errors might have been encountered, but there are no logs.
>
> client kernel version is 3.10.0-957.21.3.el7.x86_64
> ceph version is mostly v12.2.8
>
> ceph status shows:
>
> x clients failing to respond to cache pressure
>
> client kernel debug shows:
>
> # cat /sys/kernel/debug/ceph/a00cc99c-f9f9-4dd9-9281-43cd12310e41.client11291811/caps
> total     23801585
> avail     1074
> used      23800511
> reserved  0
> min       1024
>
> mds config:
> [mds]
> mds_max_caps_per_client = 10485760
> # 50G
> mds_cache_memory_limit = 53687091200
>
> I want to know if some ceph configuration can solve this problem?

mds_max_caps_per_client is new in Luminous 12.2.12. See [1]. You need
to upgrade.

[1] https://tracker.ceph.com/issues/38130

--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
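Once the MDS daemons are on 12.2.12, the limit can also be adjusted at runtime without a restart. A sketch; "mds.a" is a placeholder for the daemon id, and an injected value does not persist across restarts, so keep the ceph.conf entry as well:

    ceph tell mds.a injectargs '--mds_max_caps_per_client=1048576'

The setting caps how many capabilities the MDS will let a single session accumulate before recalling them, which is the mechanism the tracker issue above adds.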
Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?
Ok, I'm lost here. How am I supposed to write a crush rule? So far I managed
to run:

# ceph osd crush rule dump test -o test.txt

so I can edit the rule. Now I have two problems:

1. What are the functions and operations to use here? Is there documentation
   anywhere about this?
2. How may I create a crush rule using this file? 'ceph osd crush rule
   create ... -i test.txt' does not work.

Am I taking the wrong approach here?

--
Salsa

Sent with ProtonMail Secure Email.

‐‐‐ Original Message ‐‐‐
On Friday, October 18, 2019 3:56 PM, Paul Emmerich wrote:

> The default failure domain in Ceph is "host" (see the EC profile), i.e., you
> need at least k+m hosts (and at least k+m+1 is better for production
> setups).
> You can change that to OSD, but that's not a good idea for a production
> setup for obvious reasons. It's slightly better to write a crush rule that
> explicitly picks two disks on each of 3 different hosts.
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Fri, Oct 18, 2019 at 8:45 PM Salsa sa...@protonmail.com wrote:
>
> > I have probably misunderstood how to create erasure coded pools, so I may
> > be in need of some theory; I'd appreciate it if you can point me to
> > documentation that may clarify my doubts.
> > I have so far 1 cluster with 3 hosts and 30 OSDs (10 per host).
> > I tried to create an erasure code profile like so:
> >
> > # ceph osd erasure-code-profile get ec4x2rs
> > crush-device-class=
> > crush-failure-domain=host
> > crush-root=default
> > jerasure-per-chunk-alignment=false
> > k=4
> > m=2
> > plugin=jerasure
> > technique=reed_sol_van
> > w=8
> >
> > If I create a pool using this profile, or any profile where k+m > hosts,
> > then the pool gets stuck:
> >
> > # ceph -s
> >   cluster:
> >     id:     eb4aea44-0c63-4202-b826-e16ea60ed54d
> >     health: HEALTH_WARN
> >             Reduced data availability: 16 pgs inactive, 16 pgs incomplete
> >             2 pools have too many placement groups
> >             too few PGs per OSD (4 < min 30)
> >
> >   services:
> >     mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 11d)
> >     mgr: ceph01(active, since 74m), standbys: ceph03, ceph02
> >     osd: 30 osds: 30 up (since 2w), 30 in (since 2w)
> >
> >   data:
> >     pools:   11 pools, 32 pgs
> >     objects: 0 objects, 0 B
> >     usage:   32 GiB used, 109 TiB / 109 TiB avail
> >     pgs:     50.000% pgs not active
> >              16 active+clean
> >              16 creating+incomplete
> >
> > # ceph osd pool ls
> > test_ec
> > test_ec2
> >
> > The pool will never leave this "creating+incomplete" state.
> > The pools were created like this:
> >
> > # ceph osd pool create test_ec2 16 16 erasure ec4x2rs
> > # ceph osd pool create test_ec 16 16 erasure
> >
> > The default profile pool is created correctly.
> > My profiles are like this:
> >
> > # ceph osd erasure-code-profile get default
> > k=2
> > m=1
> > plugin=jerasure
> > technique=reed_sol_van
> >
> > # ceph osd erasure-code-profile get ec4x2rs
> > crush-device-class=
> > crush-failure-domain=host
> > crush-root=default
> > jerasure-per-chunk-alignment=false
> > k=4
> > m=2
> > plugin=jerasure
> > technique=reed_sol_van
> > w=8
> >
> > From what I've read it seems to be possible to create erasure coded pools
> > with k+m greater than the number of hosts. Is this not so?
> > What am I doing wrong? Do I have to create any special crush map rule?
> >
> > --
> > Salsa
> >
> > Sent with ProtonMail Secure Email.
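On question 2: a rule dumped as JSON cannot be fed back in directly. The usual cycle, per the crush-map-edits documentation linked earlier, is to pull the whole map, edit the decompiled text, and inject it back. A sketch only; file names are arbitrary:

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt     # decompile to editable text
    # add the new rule alongside the existing "rule ..." blocks in crush.txt
    crushtool -c crush.txt -o crush.new     # recompile
    ceph osd setcrushmap -i crush.new       # inject the updated map

crushtool can also exercise the new rule before injection, e.g. "crushtool -i crush.new --test --rule 99 --num-rep 6 --show-bad-mappings", which should report no bad mappings if the rule can always find 6 OSDs.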
Re: [ceph-users] Can't create erasure coded pools with k+m greater than hosts?
The default failure domain in Ceph is "host" (see the EC profile), i.e., you
need at least k+m hosts (and at least k+m+1 is better for production setups).
You can change that to OSD, but that's not a good idea for a production setup
for obvious reasons. It's slightly better to write a crush rule that
explicitly picks two disks on each of 3 different hosts.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Fri, Oct 18, 2019 at 8:45 PM Salsa wrote:
>
> I have probably misunderstood how to create erasure coded pools, so I may
> be in need of some theory; I'd appreciate it if you can point me to
> documentation that may clarify my doubts.
>
> I have so far 1 cluster with 3 hosts and 30 OSDs (10 per host).
>
> I tried to create an erasure code profile like so:
>
> # ceph osd erasure-code-profile get ec4x2rs
> crush-device-class=
> crush-failure-domain=host
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=4
> m=2
> plugin=jerasure
> technique=reed_sol_van
> w=8
>
> If I create a pool using this profile, or any profile where k+m > hosts,
> then the pool gets stuck:
>
> # ceph -s
>   cluster:
>     id:     eb4aea44-0c63-4202-b826-e16ea60ed54d
>     health: HEALTH_WARN
>             Reduced data availability: 16 pgs inactive, 16 pgs incomplete
>             2 pools have too many placement groups
>             too few PGs per OSD (4 < min 30)
>
>   services:
>     mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 11d)
>     mgr: ceph01(active, since 74m), standbys: ceph03, ceph02
>     osd: 30 osds: 30 up (since 2w), 30 in (since 2w)
>
>   data:
>     pools:   11 pools, 32 pgs
>     objects: 0 objects, 0 B
>     usage:   32 GiB used, 109 TiB / 109 TiB avail
>     pgs:     50.000% pgs not active
>              16 active+clean
>              16 creating+incomplete
>
> # ceph osd pool ls
> test_ec
> test_ec2
>
> The pool will never leave this "creating+incomplete" state.
>
> The pools were created like this:
>
> # ceph osd pool create test_ec2 16 16 erasure ec4x2rs
> # ceph osd pool create test_ec 16 16 erasure
>
> The default profile pool is created correctly.
>
> My profiles are like this:
>
> # ceph osd erasure-code-profile get default
> k=2
> m=1
> plugin=jerasure
> technique=reed_sol_van
>
> # ceph osd erasure-code-profile get ec4x2rs
> crush-device-class=
> crush-failure-domain=host
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=4
> m=2
> plugin=jerasure
> technique=reed_sol_van
> w=8
>
> From what I've read it seems to be possible to create erasure coded pools
> with k+m greater than the number of hosts. Is this not so?
> What am I doing wrong? Do I have to create any special crush map rule?
>
> --
> Salsa
>
> Sent with ProtonMail Secure Email.
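To make the "change that to OSD" option concrete, it would look roughly like the sketch below. The profile and pool names are made up, and, per the warning above, an OSD failure domain means one host failure can take out several chunks of the same object, so it is only reasonable for test setups:

    # new profile with the failure domain relaxed from host to osd
    ceph osd erasure-code-profile set ec4x2osd k=4 m=2 crush-failure-domain=osd crush-root=default

    # pool created from it (16 PGs, matching the examples in the thread)
    ceph osd pool create test_ec_osd 16 16 erasure ec4x2osd

The crush-rule approach (two OSDs on each of three hosts) avoids that exposure by pinning exactly two chunks per host.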
[ceph-users] Can't create erasure coded pools with k+m greater than hosts?
I have probably misunderstood how to create erasure coded pools, so I may be
in need of some theory; I'd appreciate it if you can point me to
documentation that may clarify my doubts.

I have so far 1 cluster with 3 hosts and 30 OSDs (10 per host).

I tried to create an erasure code profile like so:

"
# ceph osd erasure-code-profile get ec4x2rs
crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8
"

If I create a pool using this profile, or any profile where k+m > hosts, then
the pool gets stuck:

"
# ceph -s
  cluster:
    id:     eb4aea44-0c63-4202-b826-e16ea60ed54d
    health: HEALTH_WARN
            Reduced data availability: 16 pgs inactive, 16 pgs incomplete
            2 pools have too many placement groups
            too few PGs per OSD (4 < min 30)

  services:
    mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 11d)
    mgr: ceph01(active, since 74m), standbys: ceph03, ceph02
    osd: 30 osds: 30 up (since 2w), 30 in (since 2w)

  data:
    pools:   11 pools, 32 pgs
    objects: 0 objects, 0 B
    usage:   32 GiB used, 109 TiB / 109 TiB avail
    pgs:     50.000% pgs not active
             16 active+clean
             16 creating+incomplete

# ceph osd pool ls
test_ec
test_ec2
"

The pool will never leave this "creating+incomplete" state.

The pools were created like this:

"
# ceph osd pool create test_ec2 16 16 erasure ec4x2rs
# ceph osd pool create test_ec 16 16 erasure
"

The default profile pool is created correctly.

My profiles are like this:

"
# ceph osd erasure-code-profile get default
k=2
m=1
plugin=jerasure
technique=reed_sol_van

# ceph osd erasure-code-profile get ec4x2rs
crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8
"

From what I've read it seems to be possible to create erasure coded pools
with k+m greater than the number of hosts. Is this not so?
What am I doing wrong? Do I have to create any special crush map rule?

--
Salsa

Sent with ProtonMail Secure Email.
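For what it's worth, the reason the PGs sit in creating+incomplete can usually be read straight out of the PG state. A sketch; "2.0" is a placeholder PG id, take a real one from the dump output:

    ceph health detail            # names the inactive/incomplete PGs
    ceph pg dump_stuck inactive   # lists them with their up/acting sets
    ceph pg 2.0 query             # detail for one stuck PG

With k+m = 6 chunks, a host failure domain, and only 3 hosts, CRUSH cannot fill all 6 positions, so the up/acting sets of those PGs will show missing slots, which is consistent with the answers later in this thread.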
[ceph-users] Problematic inode preventing ceph-mds from starting
Last week I asked about a rogue inode that was causing ceph-mds to segfault
during replay. We didn't get any suggestions from this list, so we have been
familiarizing ourselves with the ceph source code, and have added the
following patch (note: the first argument to dump() below is the formatter
declared just above it):

--- a/src/mds/CInode.cc
+++ b/src/mds/CInode.cc
@@ -736,6 +736,13 @@ CDir *CInode::get_approx_dirfrag(frag_t fg)

 CDir *CInode::get_or_open_dirfrag(MDCache *mdcache, frag_t fg)
 {
+  if (!is_dir()) {
+    ostringstream oss;
+    JSONFormatter f(true);
+    dump(&f, DUMP_PATH | DUMP_INODE_STORE_BASE | DUMP_MDS_CACHE_OBJECT | DUMP_LOCKS | DUMP_STATE | DUMP_CAPS | DUMP_DIRFRAGS);
+    f.flush(oss);
+    dout(0) << oss.str() << dendl;
+  }
   ceph_assert(is_dir());
   // have it?

This has given us a culprit (the JSON keys were stripped when pasting, but the path is the useful part):

-2> 2019-10-18 16:19:06.934 7faefa470700 0 mds.0.cache.ino(0x1995e63) "/unimportant/path/we/can/tolerate/losing/compat.py" 10995216789470 "2018-03-24 03:18:17.621969" "2018-03-24 03:18:17.620969" 3318855521001 { "dir_hash": 0 } { "stripe_unit": 4194304, "stripe_count": 1, "object_size": 4194304, "pool_id": 1, "pool_ns": "" } [] 3411844674407370955161500 "2015-01-27 16:01:52.467669" "2018-03-24 03:18:17.621969" 21-1 [] { "version": 0, "mtime": "0.00", "num_files": 0, "num_subdirs": 0 } { "version": 0, "rbytes": 34, "rfiles": 1, "rsubdirs": 0, "rsnaps": 0, "rctime": "0.00" } { "version": 0, "rbytes": 34, "rfiles": 1, "rsubdirs": 0, "rsnaps": 0, "rctime": "0.00" } 2540123 [] { "splits": [] } true { "replicas": {} } { "authority": [ 0, -2 ], "replica_nonce": 0 } 0 false false {} 0 { "gather_set": [], "state": "lock", "is_leased": false, "num_rdlocks": 0, "num_wrlocks": 0, "num_xlocks": 0, "xlock_by": {} } {} {} {} {} {} {} {} {} {} [ "auth" ] [] -1 -1 [] []

-1> 2019-10-18 16:19:06.964 7faefa470700 -1 /opt/app-root/src/ceph/src/mds/CInode.cc: In function 'CDir* CInode::get_or_open_dirfrag(MDCache*, frag_t)' thread 7faefa470700 time 2019-10-18 16:19:06.934662
/opt/app-root/src/ceph/src/mds/CInode.cc: 746: FAILED ceph_assert(is_dir())

ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1aa) [0x7faf0a9ce39e]
 2: (()+0x12a8620) [0x7faf0a9ce620]
 3: (CInode::get_or_open_dirfrag(MDCache*, frag_t)+0x253) [0x557562a4b1ad]
 4: (OpenFileTable::_prefetch_dirfrags()+0x4db) [0x557562b63d63]
 5: (OpenFileTable::_open_ino_finish(inodeno_t, int)+0x16a) [0x557562b63720]
 6: (C_OFT_OpenInoFinish::finish(int)+0x2d) [0x557562b67699]
 7: (Context::complete(int)+0x27) [0x557562657fbf]
 8: (MDSContext::complete(int)+0x152) [0x557562b04aa4]
 9: (void finish_contexts > >(CephContext*, std::vector >&, int)+0x2c8) [0x557562660e36]
 10: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, int)+0x185) [0x557562844c4d]
 11: (MDCache::_open_ino_backtrace_fetched(inodeno_t, ceph::buffer::v14_2_0::list&, int)+0xbbf) [0x557562842785]
 12: (C_IO_MDC_OpenInoBacktraceFetched::finish(int)+0x37) [0x557562886a31]
 13: (Context::complete(int)+0x27) [0x557562657fbf]
 14: (MDSContext::complete(int)+0x152) [0x557562b04aa4]
 15: (MDSIOContextBase::complete(int)+0x345) [0x557562b0522d]
 16: (Finisher::finisher_thread_entry()+0x38b) [0x7faf0a9033e1]
 17: (Finisher::FinisherThread::entry()+0x1c) [0x5575626a2772]
 18: (Thread::entry_wrapper()+0x78) [0x7faf0a97203c]
 19: (Thread::_entry_func(void*)+0x18) [0x7faf0a971fba]
 20: (()+0x7dd5) [0x7faf07844dd5]
 21: (clone()+0x6d) [0x7faf064f502d]

I tried removing it, but it does not show up in the omapkeys for that inode:

lima:/home/neale$ ceph -- rados -p cephfs_metadata listomapkeys 1995e63.
__about__.py_head
__init__.py_head
__pycache___head
_compat.py_head
_structures.py_head
markers.py_head
requirements.py_head
specifiers.py_head
utils.py_head
version.py_head

lima:/home/neale$ ceph -- rados -p cephfs_metadata rmomapkey 1995e63. _compat.py_head
lima:/home/neale$ ceph -- rados -p cephfs_metadata rmomapkey 1995e63. compat.py_head
lima:/home/neale$ ceph -- rados -p cephfs_metadata rmomapkey 1995e63. file-does-not-exist_head
lima:/home/neale$ ceph -- rados -p cephfs_metadata listomapkeys 1995e63.
__about__.py_head
__init__.py_head
__pycache___head
_structures.py_head
markers.py_head
requirements.py_head
specifiers.py_head
utils.py_head
version.py_head

Predictably, this did nothing to solve our problem, and ceph-mds is still
dying during startup. Any suggestions?

Neale Pickett
A-4: Advanced Research in Cyber Systems
Los Alamos National Laboratory
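Two read-only inspection steps that might help narrow this down, offered as a sketch only. The first reuses the dirfrag object name from the commands above; the second is based on the backtrace, where the assert fires from OpenFileTable::_prefetch_dirfrags(), i.e. while the MDS replays its open-file table at startup, so the stale reference may live there rather than only in the directory omap. The openfiles object name is an assumption from memory; verify with the ls before relying on it:

    # dump the raw dentry values (key plus encoded inode), not just the keys,
    # to see which dentry actually carries the offending inode 0x1995e63
    rados -p cephfs_metadata listomapvals 1995e63.

    # look for the MDS open-file table objects referenced by the backtrace
    # (name below is an assumption; confirm it from the listing first)
    rados -p cephfs_metadata ls | grep openfiles
    rados -p cephfs_metadata listomapvals mds0_openfiles.0

Neither command modifies anything; they only show where the bad inode is still being referenced from.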