Re: [ceph-users] MDS crashes shortly after startup while trying to purge stray files.
Hi,

> Did you edit the code before trying Luminous?

Yes, I'm still on Jewel.

> I also noticed from your original mail that it appears you're using
> multiple active metadata servers? If so, that's not stable in Jewel.
> You may have tripped on one of many bugs fixed in Luminous for that
> configuration.

No, I'm using an active/backup configuration.

Micha Krause
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] MDS crashes shortly after startup while trying to purge stray files.
On Thu, Sep 28, 2017 at 5:16 AM, Micha Krause wrote:
> Hi,
>
> I had a chance to catch John Spray at the Ceph Day, and he suggested that I
> try to reproduce this bug in Luminous.

Did you edit the code before trying Luminous? I also noticed from your original mail that it appears you're using multiple active metadata servers? If so, that's not stable in Jewel. You may have tripped on one of many bugs fixed in Luminous for that configuration.

--
Patrick Donnelly
Re: [ceph-users] MDS crashes shortly after startup while trying to purge stray files.
On Thu, Sep 28, 2017 at 5:16 AM Micha Krause wrote:
> Hi,
>
> I had a chance to catch John Spray at the Ceph Day, and he suggested that
> I try to reproduce this bug in Luminous.
>
> To fix my immediate problem we discussed 2 ideas:
>
> 1. Manually edit the metadata. Unfortunately I was not able to find any
>    information on how the metadata is structured :-(
>
> 2. Edit the code to set the link count to 0 if it is negative:
>
> diff --git a/src/mds/StrayManager.cc b/src/mds/StrayManager.cc
> index 9e53907..2ca1449 100644
> --- a/src/mds/StrayManager.cc
> +++ b/src/mds/StrayManager.cc
> @@ -553,6 +553,10 @@ bool StrayManager::__eval_stray(CDentry *dn, bool delay)
>      logger->set(l_mdc_num_strays_delayed, num_strays_delayed);
>    }
>
> +  if (in->inode.nlink < 0) {
> +    in->inode.nlink=0;
> +  }
> +
>    // purge?
>    if (in->inode.nlink == 0) {
>      // past snaprealm parents imply snapped dentry remote links.
> diff --git a/src/xxHash b/src/xxHash
> --- a/src/xxHash
> +++ b/src/xxHash
> @@ -1 +1 @@
>
> I'm not sure if this works. The patched MDS no longer crashes, however I
> expected that this value:
>
> root@mds02:~ # ceph daemonperf mds.1
> -----mds------ --mds_server-- ---objecter--- -----mds_cache----- ---mds_log----
> rlat inos caps|hsr hcs hcr |writ read actv|recd recy stry purg|segs evts subm|
>    0 100k    0|  0   0   0 |   0    0    0|   0    0 625k    0|  30  25k    0
>
> should go down, but it stays at 625k. Unfortunately I don't have another
> system to compare.
>
> After I started the patched MDS once, I reverted back to an unpatched MDS,
> and it also stopped crashing, so I guess it did "fix" something.
>
> A question just out of curiosity, I tried to log these events with
> something like:
>
> dout(10) << "Fixed negative inode count";
>
> or
>
> derr << "Fixed negative inode count";
>
> But my compiler yelled at me for trying this.

dout and derr are big macros. You need to end the line with " << dendl;" to close it off.
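To illustrate why a missing dendl is a compile error rather than just a missing newline, here is a minimal, self-contained sketch of the macro pattern. The names my_dout, my_dendl, g_debug_level and g_log_sink are hypothetical stand-ins invented for this example; Ceph's real dout/derr/dendl definitions are more involved, but the key idea is the same: the opening macro starts a block and the closing macro finishes it.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-ins for illustration only; not Ceph's real definitions.
static int g_debug_level = 10;   // messages above this level are dropped
static std::string g_log_sink;   // collects emitted log lines

// Opens a do/if block and leaves a stream expression "open" at the end.
#define my_dout(level)                 \
  do {                                 \
    if ((level) <= g_debug_level) {    \
      std::ostringstream out_;         \
      out_

// Terminates the stream expression and closes both blocks opened above.
// Without it the braces are unbalanced and the code cannot compile.
#define my_dendl                       \
      "\n";                            \
      g_log_sink += out_.str();        \
    }                                  \
  } while (0)

void log_fixed_nlink(int old_nlink) {
  my_dout(10) << "fixed negative link count (was " << old_nlink << ")" << my_dendl;
}

// Small self-check of the behaviour described above.
void check_logging() {
  g_log_sink.clear();
  log_fixed_nlink(-1);
  assert(g_log_sink == "fixed negative link count (was -1)\n");
}
```

In the patched StrayManager.cc the equivalent line would presumably read `dout(10) << "Fixed negative inode count" << dendl;`, as the reply above suggests.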
Re: [ceph-users] MDS crashes shortly after startup while trying to purge stray files.
Hi,

I had a chance to catch John Spray at the Ceph Day, and he suggested that I try to reproduce this bug in Luminous.

To fix my immediate problem we discussed 2 ideas:

1. Manually edit the metadata. Unfortunately I was not able to find any information on how the metadata is structured :-(

2. Edit the code to set the link count to 0 if it is negative:

diff --git a/src/mds/StrayManager.cc b/src/mds/StrayManager.cc
index 9e53907..2ca1449 100644
--- a/src/mds/StrayManager.cc
+++ b/src/mds/StrayManager.cc
@@ -553,6 +553,10 @@ bool StrayManager::__eval_stray(CDentry *dn, bool delay)
     logger->set(l_mdc_num_strays_delayed, num_strays_delayed);
   }

+  if (in->inode.nlink < 0) {
+    in->inode.nlink=0;
+  }
+
   // purge?
   if (in->inode.nlink == 0) {
     // past snaprealm parents imply snapped dentry remote links.
diff --git a/src/xxHash b/src/xxHash
--- a/src/xxHash
+++ b/src/xxHash
@@ -1 +1 @@

I'm not sure if this works. The patched MDS no longer crashes, however I expected that this value:

root@mds02:~ # ceph daemonperf mds.1
-----mds------ --mds_server-- ---objecter--- -----mds_cache----- ---mds_log----
rlat inos caps|hsr hcs hcr |writ read actv|recd recy stry purg|segs evts subm|
   0 100k    0|  0   0   0 |   0    0    0|   0    0 625k    0|  30  25k    0

should go down, but it stays at 625k. Unfortunately I don't have another system to compare.

After I started the patched MDS once, I reverted back to an unpatched MDS, and it also stopped crashing, so I guess it did "fix" something.

A question just out of curiosity: I tried to log these events with something like:

dout(10) << "Fixed negative inode count";

or

derr << "Fixed negative inode count";

But my compiler yelled at me for trying this.

Micha Krause
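The workaround in the diff can be sketched in isolation as follows. This is only a toy model: ToyInode and clamp_negative_nlink are hypothetical names made up for the example, and the 32-bit signed type for nlink is an assumption (the field must be signed for -1 to appear at all). It shows the one thing the patch does, which is to force a negative link count to 0 before the "purge?" check so that the inode takes the purge path.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the in-memory inode; only the field that matters here.
struct ToyInode {
  std::int32_t nlink;  // assumed signed, since a value of -1 was observed
};

// Mirrors the patched lines: clamp a negative link count to 0.
// Returns true when a repair was made, so a caller could log it.
bool clamp_negative_nlink(ToyInode& in) {
  if (in.nlink < 0) {
    in.nlink = 0;
    return true;
  }
  return false;
}

// Small self-check: -1 is repaired, valid counts are left untouched.
void check_clamp() {
  ToyInode broken{-1};
  assert(clamp_negative_nlink(broken) && broken.nlink == 0);

  ToyInode healthy{1};
  assert(!clamp_negative_nlink(healthy) && healthy.nlink == 1);
}
```

Returning a flag instead of logging inside the helper keeps the sketch independent of any logging macros; in the real patch the log line would go inside the `if` block.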
Re: [ceph-users] MDS crashes shortly after startup while trying to purge stray files.
I think this is a serious MDS problem. Can anyone fix it?

Regards.

On Thu, Sep 14, 2017 at 19:55 Micha Krause wrote:
> Hi,
>
> looking at the code, and running with debug mds = 10, it looks like I have
> an inode with a negative link count.
>
> -2> 2017-09-14 13:28:39.249399 7f3919616700 10 mds.0.cache.strays
> eval_stray [dentry #100/stray7/17aa2f6 [2,head] auth (dversion lock)
> pv=0 v=23058565 inode=0x7f394b7e0730 0x7f3945a96270]
> -1> 2017-09-14 13:28:39.249445 7f3919616700 10 mds.0.cache.strays
> inode is [inode 17aa2f6 [2,head] ~mds0/stray7/17aa2f6 auth
> v23057120 s=4476488 nl=-1 n(v0 b4476488 1=1+0) (iversion lock) 0x7f394b7e
>
> I guess "nl" stands for number of links.
>
> The code in StrayManager.cc checks for:
>
> if (in->inode.nlink == 0) { ... }
> else {
>   eval_remote_stray(dn, NULL);
> }
>
> void StrayManager::eval_remote_stray(CDentry *stray_dn, CDentry *remote_dn)
> {
>   ...
>   assert(stray_in->inode.nlink >= 1);
>   ...
> }
>
> So if my link count is indeed -1, ceph will die here.
>
> The question is: how can I get rid of this inode?
>
> Micha Krause
Re: [ceph-users] MDS crashes shortly after startup while trying to purge stray files.
Hi,

looking at the code, and running with debug mds = 10, it looks like I have an inode with a negative link count.

-2> 2017-09-14 13:28:39.249399 7f3919616700 10 mds.0.cache.strays eval_stray [dentry #100/stray7/17aa2f6 [2,head] auth (dversion lock) pv=0 v=23058565 inode=0x7f394b7e0730 0x7f3945a96270]
-1> 2017-09-14 13:28:39.249445 7f3919616700 10 mds.0.cache.strays inode is [inode 17aa2f6 [2,head] ~mds0/stray7/17aa2f6 auth v23057120 s=4476488 nl=-1 n(v0 b4476488 1=1+0) (iversion lock) 0x7f394b7e

I guess "nl" stands for number of links.

The code in StrayManager.cc checks for:

if (in->inode.nlink == 0) { ... }
else {
  eval_remote_stray(dn, NULL);
}

void StrayManager::eval_remote_stray(CDentry *stray_dn, CDentry *remote_dn)
{
  ...
  assert(stray_in->inode.nlink >= 1);
  ...
}

So if my link count is indeed -1, ceph will die here: the value is not 0, so the else branch runs, and the assert in eval_remote_stray() fails.

The question is: how can I get rid of this inode?

Micha Krause
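The control flow described above can be modelled in a few lines. This is a simplified sketch, not Ceph code: StrayOutcome and classify_stray are hypothetical names, the signed 32-bit type is an assumption, and the real functions do far more work. It only captures which path a given link count takes and why -1 ends in the failed assertion.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the three possible outcomes for a stray inode.
enum class StrayOutcome { Purge, RemoteStray, AssertionFailure };

// Mirrors the quoted logic: nlink == 0 takes the purge branch; anything else
// goes to eval_remote_stray(), which asserts nlink >= 1.
StrayOutcome classify_stray(std::int32_t nlink) {
  if (nlink == 0) {
    return StrayOutcome::Purge;
  }
  if (nlink >= 1) {
    return StrayOutcome::RemoteStray;
  }
  // Negative link count: not 0, so not purged, and the assert would abort the MDS.
  return StrayOutcome::AssertionFailure;
}

// Small self-check covering the three cases, including the nl=-1 from the log.
void check_classify_stray() {
  assert(classify_stray(0) == StrayOutcome::Purge);
  assert(classify_stray(1) == StrayOutcome::RemoteStray);
  assert(classify_stray(-1) == StrayOutcome::AssertionFailure);
}
```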