Re: [ceph-users] MDS crashes shortly after startup while trying to purge stray files.

2017-10-04 Thread Micha Krause

Hi,


> Did you edit the code before trying Luminous?

Yes, I'm still on Jewel.

> I also noticed from your original mail that it appears you're using
> multiple active metadata servers? If so, that's not stable in Jewel.
> You may have tripped on one of many bugs fixed in Luminous for that
> configuration.

No, I'm using an active/backup configuration.


Micha Krause
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS crashes shortly after startup while trying to purge stray files.

2017-10-02 Thread Patrick Donnelly
On Thu, Sep 28, 2017 at 5:16 AM, Micha Krause wrote:
> Hi,
>
> I had a chance to catch John Spray at the Ceph Day, and he suggested that I
> try to reproduce this bug in Luminous.

Did you edit the code before trying Luminous? I also noticed from your
original mail that it appears you're using multiple active metadata
servers? If so, that's not stable in Jewel. You may have tripped on
one of many bugs fixed in Luminous for that configuration.

-- 
Patrick Donnelly


Re: [ceph-users] MDS crashes shortly after startup while trying to purge stray files.

2017-09-28 Thread Gregory Farnum
On Thu, Sep 28, 2017 at 5:16 AM Micha Krause  wrote:

> Hi,
>
> I had a chance to catch John Spray at the Ceph Day, and he suggested that
> I try to reproduce this bug in Luminous.
>
> To fix my immediate problem we discussed 2 ideas:
>
> 1. Manually edit the metadata. Unfortunately, I was not able to find any
> information on how the metadata is structured :-(
>
> 2. Edit the code to set the link count to 0 if it is negative:
>
>
> diff --git a/src/mds/StrayManager.cc b/src/mds/StrayManager.cc
> index 9e53907..2ca1449 100644
> --- a/src/mds/StrayManager.cc
> +++ b/src/mds/StrayManager.cc
> @@ -553,6 +553,10 @@ bool StrayManager::__eval_stray(CDentry *dn, bool delay)
>      logger->set(l_mdc_num_strays_delayed, num_strays_delayed);
>    }
>
> +  if (in->inode.nlink < 0) {
> +    in->inode.nlink=0;
> +  }
> +
>    // purge?
>    if (in->inode.nlink == 0) {
>      // past snaprealm parents imply snapped dentry remote links.
> diff --git a/src/xxHash b/src/xxHash
> --- a/src/xxHash
> +++ b/src/xxHash
> @@ -1 +1 @@
>
>
> I'm not sure if this works. The patched MDS no longer crashes; however, I
> expected that this value:
>
> root@mds02:~ # ceph daemonperf mds.1
> -----mds------ --mds_server-- ---objecter--- -----mds_cache----- ---mds_log----
> rlat inos caps|hsr  hcs  hcr |writ read actv|recd recy stry purg|segs evts subm|
>    0 100k    0|  0    0    0 |   0    0    0|   0    0 625k    0|  30  25k    0|
>
>
> should go down, but it stays at 625k. Unfortunately, I don't have another
> system to compare against.
>
> After I started the patched MDS once, I reverted to an unpatched MDS,
> and it also stopped crashing, so I guess it did "fix" something.
>
>
> A question just out of curiosity, I tried to log these events with
> something like:
>
>   dout(10) << "Fixed negative inode count";
>
> or
>
>   derr << "Fixed negative inode count";
>
> But my compiler yelled at me for trying this.
>

dout and derr are big macros. You need to end the line with " << dendl;" to
close it off.
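
To illustrate why the terminator is required, here is a minimal,
self-contained sketch of the pattern. It does not use Ceph's real macros:
my_dout, my_dendl, g_debug_level, g_log_lines, fix_negative_nlink and
last_log_line are all hypothetical stand-ins made up for this example. The
point is only that the opening macro starts a block and an unfinished stream
expression, so the statement cannot compile until the closing macro ends it.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins, NOT Ceph's actual dout/dendl implementation.
static int g_debug_level = 10;                // assumed debug threshold
static std::vector<std::string> g_log_lines;  // collects output for the demo

// Opens "do { if (...) {" and leaves a stream expression unfinished.
#define my_dout(level)                     \
  do {                                     \
    if ((level) <= g_debug_level) {        \
      std::ostringstream my_os;            \
      my_os

// Finishes the stream expression and closes everything my_dout opened.
#define my_dendl                           \
      "";                                  \
      g_log_lines.push_back(my_os.str());  \
    }                                      \
  } while (0)

// Clamp a negative link count to zero and log that it happened.
int fix_negative_nlink(int nlink) {
  if (nlink < 0) {
    // Without "<< my_dendl" this line would leave the block unclosed.
    my_dout(10) << "Fixed negative inode link count " << nlink << my_dendl;
    return 0;
  }
  return nlink;
}

// Return the most recent log line; only valid once something was logged.
const std::string& last_log_line() {
  assert(!g_log_lines.empty());
  return g_log_lines.back();
}
```

In the patch above, the equivalent change would presumably be to append
" << dendl;" to the dout(10) or derr line, as described.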


>
> Micha Krause


Re: [ceph-users] MDS crashes shortly after startup while trying to purge stray files.

2017-09-28 Thread Micha Krause

Hi,

I had a chance to catch John Spray at the Ceph Day, and he suggested that I try
to reproduce this bug in Luminous.

To fix my immediate problem we discussed 2 ideas:

1. Manually edit the metadata. Unfortunately, I was not able to find any
information on how the metadata is structured :-(

2. Edit the code to set the link count to 0 if it is negative:


diff --git a/src/mds/StrayManager.cc b/src/mds/StrayManager.cc
index 9e53907..2ca1449 100644
--- a/src/mds/StrayManager.cc
+++ b/src/mds/StrayManager.cc
@@ -553,6 +553,10 @@ bool StrayManager::__eval_stray(CDentry *dn, bool delay)
     logger->set(l_mdc_num_strays_delayed, num_strays_delayed);
   }

+  if (in->inode.nlink < 0) {
+    in->inode.nlink=0;
+  }
+
   // purge?
   if (in->inode.nlink == 0) {
     // past snaprealm parents imply snapped dentry remote links.
diff --git a/src/xxHash b/src/xxHash
--- a/src/xxHash
+++ b/src/xxHash
@@ -1 +1 @@


I'm not sure if this works. The patched MDS no longer crashes; however, I
expected that this value:

root@mds02:~ # ceph daemonperf mds.1
-----mds------ --mds_server-- ---objecter--- -----mds_cache----- ---mds_log----
rlat inos caps|hsr  hcs  hcr |writ read actv|recd recy stry purg|segs evts subm|
   0 100k    0|  0    0    0 |   0    0    0|   0    0 625k    0|  30  25k    0|
   

should go down, but it stays at 625k. Unfortunately, I don't have another system
to compare against.

After I started the patched MDS once, I reverted to an unpatched MDS, and it also
stopped crashing, so I guess it did "fix" something.


A question just out of curiosity, I tried to log these events with something 
like:

 dout(10) << "Fixed negative inode count";

or

 derr << "Fixed negative inode count";

But my compiler yelled at me for trying this.


Micha Krause


Re: [ceph-users] MDS crashes shortly after startup while trying to purge stray files.

2017-09-14 Thread Meyers Mark
This looks like a serious MDS problem, I think.
Can anyone fix it?

Regards.

On Thu, Sep 14, 2017 at 19:55 Micha Krause  wrote:

> Hi,
>
> Looking at the code and running with debug mds = 10, it looks like I have
> an inode with a negative link count.
>
>  -2> 2017-09-14 13:28:39.249399 7f3919616700 10 mds.0.cache.strays
> eval_stray [dentry #100/stray7/17aa2f6 [2,head] auth (dversion lock)
> pv=0 v=23058565 inode=0x7f394b7e0730 0x7f3945a96270]
>  -1> 2017-09-14 13:28:39.249445 7f3919616700 10 mds.0.cache.strays
> inode is [inode 17aa2f6 [2,head] ~mds0/stray7/17aa2f6 auth
> v23057120 s=4476488 nl=-1 n(v0 b4476488 1=1+0) (iversion lock) 0x7f394b7e
>
> I guess "nl" stands for number of links.
>
> The code in StrayManager.cc checks for:
>
> if (in->inode.nlink == 0) { ... }
> else {
>   eval_remote_stray(dn, NULL);
> }
>
> void StrayManager::eval_remote_stray(CDentry *stray_dn, CDentry *remote_dn)
> {
>   ...
>   assert(stray_in->inode.nlink >= 1);
>   ...
> }
>
> So if my link count is indeed -1, Ceph will die here.
>
>
> The question is: how can I get rid of this inode?
>
>
> Micha Krause


Re: [ceph-users] MDS crashes shortly after startup while trying to purge stray files.

2017-09-14 Thread Micha Krause

Hi,

Looking at the code and running with debug mds = 10, it looks like I have an
inode with a negative link count.

-2> 2017-09-14 13:28:39.249399 7f3919616700 10 mds.0.cache.strays 
eval_stray [dentry #100/stray7/17aa2f6 [2,head] auth (dversion lock) pv=0 
v=23058565 inode=0x7f394b7e0730 0x7f3945a96270]
-1> 2017-09-14 13:28:39.249445 7f3919616700 10 mds.0.cache.strays  inode is 
[inode 17aa2f6 [2,head] ~mds0/stray7/17aa2f6 auth v23057120 s=4476488 
nl=-1 n(v0 b4476488 1=1+0) (iversion lock) 0x7f394b7e

I guess "nl" stands for number of links.
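
For checking other debug lines for the same symptom, a small helper along
these lines could extract the value. This is only a sketch: it assumes, as
guessed above, that the "nl=" token carries the link count, and parse_nl is a
hypothetical name made up for this example, not part of Ceph.

```cpp
#include <cassert>
#include <cctype>
#include <exception>
#include <string>

// Find a whitespace-delimited "nl=<number>" token and store the number in
// nlink_out. Returns false if no such token can be parsed.
bool parse_nl(const std::string& line, long& nlink_out) {
  const std::string key = "nl=";
  std::string::size_type pos = line.find(key);
  while (pos != std::string::npos) {
    // Only accept the key at the start of a token, so that e.g. "xnl=" is skipped.
    const bool at_token_start =
        pos == 0 || std::isspace(static_cast<unsigned char>(line[pos - 1]));
    if (at_token_start) {
      try {
        // std::stol stops at the first character that is not part of the number.
        nlink_out = std::stol(line.substr(pos + key.size()));
        return true;
      } catch (const std::exception&) {
        return false;  // "nl=" was not followed by a number
      }
    }
    pos = line.find(key, pos + 1);
  }
  return false;
}

// Convenience check: true when the line reports a negative link count.
bool has_negative_nl(const std::string& line) {
  long nlink = 0;
  const bool found = parse_nl(line, nlink);
  assert(!found || nlink == nlink);  // nlink is only meaningful when found
  return found && nlink < 0;
}
```

Fed the "inode is" line above, has_negative_nl would report true because of
the nl=-1 token.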

The code in StrayManager.cc checks for:

if (in->inode.nlink == 0) { ... }
else {
  eval_remote_stray(dn, NULL);
}

void StrayManager::eval_remote_stray(CDentry *stray_dn, CDentry *remote_dn)
{
  ...
  assert(stray_in->inode.nlink >= 1);
  ...
}

So if my link count is indeed -1, Ceph will die here.
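
The reasoning can be written down as a tiny standalone model. This is a
sketch of the two checks quoted above, not the real StrayManager code: the
StrayOutcome enum and the classify_stray and would_crash functions are
hypothetical names used only for illustration.

```cpp
#include <cassert>

// Possible results of evaluating a stray inode in this simplified model.
enum class StrayOutcome { Purge, RemoteStray, AssertFailure };

// Mirrors the two checks from the excerpt: "nlink == 0" selects the purge
// path, anything else goes to eval_remote_stray(), which asserts nlink >= 1.
StrayOutcome classify_stray(int nlink) {
  if (nlink == 0) {
    return StrayOutcome::Purge;
  }
  // else branch: eval_remote_stray(dn, NULL)
  if (nlink >= 1) {
    return StrayOutcome::RemoteStray;
  }
  // A negative count is neither purged nor a valid remote stray.
  return StrayOutcome::AssertFailure;
}

// True when the model predicts that the assert in eval_remote_stray fires.
bool would_crash(int nlink) {
  const StrayOutcome outcome = classify_stray(nlink);
  assert(outcome != StrayOutcome::Purge || nlink == 0);  // model sanity check
  return outcome == StrayOutcome::AssertFailure;
}
```

With nl=-1 from the log, the model lands in the else branch and fails the
nlink >= 1 check, which matches the crash I am seeing.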


The question is: how can I get rid of this inode?


Micha Krause