Hi Venky,

Thanks for taking the time. I'm afraid I still don't get the difference. Maybe 
the Ceph developer terminology means something different from what I use. Let's 
look at this statement; I think it summarises my misery quite well:

> It's an implementation difference. In octopus, each child dir (direct
> descendent of the ephemeral pinned directory) is pinned to a target
> MDS based on the hash of its (child dir) inode number. From pacific
> onwards, the dirfrags are distributed across ranks. This limits the
> number of subtrees.

Let's say we have /home/{a..c} and I enable ephemeral pinning on /home. Let's 
also say that each of /home/{a..c} has a number of directory fragments, maybe 
somewhere deeper down in the hierarchy. As far as I understand it, ephemeral 
distributed pinning means that a static pin based on a hash function is 
assigned to each of /home/{a..c}, which, in turn, is then inherited by all of 
their child directories. Meaning that all directories under /home/a/ have the 
same effective static pin as /home/a and likewise for /home/b/... and 
/home/c/...

To me, this implies that any directory fragment that is a descendent of /home/a 
is also pinned to the same MDS as /home/a. I really don't understand what the 
difference between "each child dir (direct descendent of the ephemeral pinned 
directory) is pinned to a target MDS" (octopus) and "the dirfrags are 
distributed across ranks" (pacific) is. In other words, if /home/a is assigned 
a rank pin and all of its descendants inherit this rank pin, how can any 
directory fragment of (a descendant of) /home/a end up on an MDS that is 
different than the one assigned to /home/a?
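
The reading described above can be written down as a small toy model. This is 
only a sketch of that interpretation, not of the MDS code: the SHA-256 based 
`rank_for_inode`, the helper `effective_rank`, and the inode numbers are all 
made up for illustration, and the real MDS uses its own hash.

```python
import hashlib
from pathlib import PurePosixPath


def rank_for_inode(ino: int, num_ranks: int) -> int:
    """Map an inode number to a rank (stand-in hash, NOT the MDS's own)."""
    digest = hashlib.sha256(ino.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big") % num_ranks


def effective_rank(path: str, pinned_dir: str, child_inodes: dict, num_ranks: int) -> int:
    """Rank a path would inherit from the immediate child of pinned_dir above it."""
    rel = PurePosixPath(path).relative_to(pinned_dir)  # ValueError if outside pinned_dir
    if not rel.parts:
        raise ValueError("path is the pinned directory itself, not a descendant")
    return rank_for_inode(child_inodes[rel.parts[0]], num_ranks)


# Hypothetical inode numbers for /home/a, /home/b, /home/c.
child_inodes = {"a": 0x10000000001, "b": 0x10000000002, "c": 0x10000000003}
num_ranks = 4

rank_xyz = effective_rank("/home/a/some/dir/xyz", "/home", child_inodes, num_ranks)
rank_uvw = effective_rank("/home/a/other/uvw", "/home", child_inodes, num_ranks)
print(rank_xyz == rank_uvw)  # True: both paths inherit the pin of /home/a
```

Under this model, two paths below /home/a can never end up on different ranks, 
which is exactly the contradiction with the observation described next.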

What I observed is that /home/a/.../xyz and /home/a/..../uvw ended up on 
different ranks, and none of the descriptions I have seen so far explains why 
this is expected. All explanations I have seen state that these should be on 
the same MDS in both octopus and pacific.

It would be great if you could help me out here. Maybe it really is just 
terminology?

Thanks a lot for your time again!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Venky Shankar <vshan...@redhat.com>
Sent: 29 November 2022 15:54:12
To: Frank Schilder
Cc: Reed Dier; ceph-users
Subject: Re: [ceph-users] Re: MDS stuck ops

Hi Frank,

On Tue, Nov 29, 2022 at 5:38 PM Frank Schilder <fr...@dtu.dk> wrote:
>
> Hi Venky,
>
> maybe you can help me clarify the situation a bit. I don't understand the 
> difference between the two pinning implementations you describe in your reply 
> and I also don't see any difference in meaning in the documentation between 
> octopus and quincy; the difference is just in wording. Both texts state that 
> "all of a directory’s immediate children should be ephemerally pinned" 
> (octopus) and "This has the effect of distributing immediate children across 
> a range of MDS ranks" (quincy).
>
> To me, both mean that, if I enable distributed ephemeral pinning on /home, 
> then for every child /home/X of home it follows that /home/X and any 
> directory under /home/X/ are pinned to the same MDS rank. Meaning their 
> information in cache exists on this rank only and no other MDS is serving 
> requests for any of these directories.
>
> Is there something wrong with this interpretation?

Distributed ephemeral pins will distribute immediate children across a
range of MDS ranks - /home/X might be on rank 1, /home/Y on rank 2,
/home/Z on rank 0, and so on.

>
> I tried it with octopus and the cache for directories under /home/X/ was all 
> over the place. Nothing was pinned to a single rank and on top of that the 
> number of sub-trees was extremely unevenly assigned and excessively large. 
> After I set an explicit pin on every child /home/X of /home, only then was 
> all cache information about all subdirs of /home/X/ handled by the MDS I 
> pinned it to.

The directories (children) are spread across MDSs based on the
(consistent) hash of their inode numbers. The distribution should be
uniform across ranks.
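
For comparison, the explicit per-child pinning mentioned in the quoted
paragraph above can be scripted. The sketch below is an assumption about
what such a script might look like, not the script that was actually
used: it spreads the immediate children round-robin over the ranks and
sets the `ceph.dir.pin` extended attribute on each. `plan_pins` and
`apply_pins` are hypothetical names, and `os.setxattr` is Linux-only.

```python
import os


def plan_pins(children: list, num_ranks: int) -> dict:
    """Assign each child directory name a rank, round-robin in sorted order."""
    return {name: i % num_ranks for i, name in enumerate(sorted(children))}


def apply_pins(parent: str, num_ranks: int, dry_run: bool = True) -> dict:
    """Pin every immediate subdirectory of parent via the ceph.dir.pin xattr."""
    children = [e.name for e in os.scandir(parent) if e.is_dir(follow_symlinks=False)]
    plan = plan_pins(children, num_ranks)
    for name, rank in plan.items():
        path = os.path.join(parent, name)
        if dry_run:
            print(f"would pin {path} to rank {rank}")
        else:
            # Same effect as: setfattr -n ceph.dir.pin -v <rank> <path>
            os.setxattr(path, "ceph.dir.pin", str(rank).encode())
    return plan


print(plan_pins(["c", "a", "b"], 2))  # {'a': 0, 'b': 1, 'c': 0}
```

Calling `apply_pins("/mnt/cephfs/home", 4, dry_run=False)` on a CephFS
mount (the mount path is a placeholder) would then set the pins.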

>
> What should the result of distributed ephemeral pinning actually be when set 
> on /home?
> What would be different between octopus and quincy?

It's an implementation difference. In octopus, each child dir (direct
descendent of the ephemeral pinned directory) is pinned to a target
MDS based on the hash of its (child dir) inode number. From pacific
onwards, the dirfrags are distributed across ranks. This limits the
number of subtrees.
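
One way to picture why distributing dirfrags limits the number of
subtrees is the toy comparison below. It is a sketch under assumptions,
not the MDS implementation: the hash is a SHA-256 stand-in, the fragment
count and the mapping of a child name to a fragment are simplifications,
and all function names are hypothetical.

```python
import hashlib


def _hash(data: str) -> int:
    """Stand-in hash; the MDS uses its own hashing."""
    return int.from_bytes(hashlib.sha256(data.encode()).digest()[:8], "big")


def pin_per_child(child_inodes: dict, num_ranks: int) -> dict:
    """Octopus-style model: every immediate child becomes its own pinned subtree."""
    return {name: _hash(str(ino)) % num_ranks for name, ino in child_inodes.items()}


def pin_per_dirfrag(parent_ino: int, num_frags: int, num_ranks: int) -> dict:
    """Pacific-style model: each fragment of the pinned directory gets a rank."""
    return {frag: _hash(f"{parent_ino}.{frag}") % num_ranks for frag in range(num_frags)}


def frag_of(name: str, num_frags: int) -> int:
    """Assumed rule: a child lands in a fragment of its parent by hashing its name."""
    return _hash(name) % num_frags


# 1000 hypothetical children of the pinned directory, 4 ranks, 8 fragments.
children = {f"user{i}": 0x10000000001 + i for i in range(1000)}
per_child = pin_per_child(children, num_ranks=4)
per_frag = pin_per_dirfrag(parent_ino=0x10000000000, num_frags=8, num_ranks=4)

print(len(per_child))  # 1000 pinned subtrees, one per child
print(len(per_frag))   # 8 pinned fragments, independent of the number of children
rank_of_user7 = per_frag[frag_of("user7", 8)]  # rank serving the fragment holding user7
```

In this model the subtree count grows with the number of children in the
first case and stays fixed at the fragment count in the second, which is
one reading of "This limits the number of subtrees".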

> Is the documentation (for octopus) misleading or does the implementation not 
> match documentation?

I think the docs are fine - quincy docs do mention that the directory
fragments are distributed while the octopus docs do not. I agree, the
wording is a bit subtle.

>
> Thanks for any insight!
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Venky Shankar <vshan...@redhat.com>
> Sent: 29 November 2022 10:09:21
> To: Frank Schilder
> Cc: Reed Dier; ceph-users
> Subject: Re: [ceph-users] Re: MDS stuck ops
>
> On Tue, Nov 29, 2022 at 1:42 PM Frank Schilder <fr...@dtu.dk> wrote:
> >
> > Hi Venky.
> >
> > > You most likely ran into performance issues with distributed ephemeral
> > > pins with octopus. It'd be nice to try out one of the latest releases
> > > for this.
> >
> > I ran into the problem that distributed ephemeral pinning does not seem to 
> > be actually implemented in octopus. This mode didn't pin anything, see also 
> > the recent conversation with Patrick:
>
> Distributed ephemeral pins used to distribute inodes under a directory
> amongst MDSs, which had scalability issues due to the sheer number of
> subtrees. This was changed to distribute dirfrags and I think those
> changes were not in octopus.
>
> >
> > https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/YEB34F5SREAOOMATOKC6NO3G2GVCSOOZ
> >
> > I sent him a couple of dumps, but am not sure if he is doing anything with 
> > it. I wrote a small script to do the distributed pinning by hand and it 
> > solved all sorts of problems.
>
> Distributing dirfrags solved a lot of scalability issues and those
> changes are available in pacific and beyond. We aren't backporting to
> octopus anymore, so the options are limited.
>
> >
> > Best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
>
>
> --
> Cheers,
> Venky
>


--
Cheers,
Venky

