[ceph-users] Re: Num values for 3 DC 4+2 crush rule

2024-03-16 Thread Eugen Block

Hi Torkil,

Num is 0 but it's not replicated so how does this translate to  
picking 3 of 3 datacenters?


it doesn't really make a difference whether the pool is replicated or not; num
just defines how many crush buckets to choose at that step, so it applies in
the same way as for your replicated pools.


I am thinking we should just change 3 to 2 for the chooseleaf line  
for the 4+2 rule since for 4+5 each DC needs 3 shards and for 4+2  
each DC needs 2 shards. Comments?


Unless your output of the 4+2 rule is incomplete, it doesn't currently contain
a line specifying how many datacenters to choose, so don't forget to add
that. ;-) But yeah, you could either have



step choose indep 0 type datacenter
step chooseleaf indep 2 type host


or


step choose indep 3 type datacenter
step chooseleaf indep 2 type host


The result should be the same. But I recommend verifying with crushtool:

# get current crushmap
ceph osd getcrushmap -o crushmap.bin
# decompile crushmap
crushtool -d crushmap.bin -o crushmap.txt
# change your crush rule in crushmap.txt, then recompile it
crushtool -c crushmap.txt -o crushmap.test
# test it (insert the ID of the rule you want to test)
crushtool -i crushmap.test --test --rule <rule_id> --show-mappings --num-rep 6
crushtool -i crushmap.test --test --rule <rule_id> --show-bad-mappings --num-rep 6


You'll see the osd mappings which will tell you if the PGs would be  
distributed as required.
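
For reference, a sketch of what the full 4+2 rule could look like after the change, based on your existing rbd_ec_data rule (using the explicit num=3 variant; with your three datacenters the num=0 variant behaves the same):

rule rbd_ec_data {
id 0
type erasure
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step choose indep 3 type datacenter
step chooseleaf indep 2 type host
step emit
}

Three datacenters with two hosts each gives you the 6 chunks needed for 4+2.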


Regards,
Eugen

Quoting Torkil Svensgaard:

I was just looking at our crush rules as we need to change them from  
failure domain host to failure domain datacenter. The replicated  
ones seem trivial but what about this one for EC 4+2?


rule rbd_ec_data {
id 0
type erasure
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}

We already have this crush rule for EC 4+5:

"
rule cephfs.hdd.data {
id 7
type erasure
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step choose indep 0 type datacenter
step chooseleaf indep 3 type host
step emit
}
"

I don't understand the "num" argument for the choose step. The  
documentation[1] says:


"
If {num} == 0, choose pool-num-replicas buckets (as many buckets as  
are available).


If pool-num-replicas > {num} > 0, choose that many buckets.

If {num} < 0, choose pool-num-replicas - {num} buckets.
"

Num is 0 but it's not replicated so how does this translate to  
picking 3 of 3 datacenters?


I am thinking we should just change 3 to 2 for the chooseleaf line  
for the 4+2 rule since for 4+5 each DC needs 3 shards and for 4+2  
each DC needs 2 shards. Comments?


Best regards,

Torkil

[1] https://docs.ceph.com/en/reef/rados/operations/crush-map-edits/

--
Torkil Svensgaard
Systems Administrator
Danish Research Centre for Magnetic Resonance DRCMR, Section 714
Copenhagen University Hospital Amager and Hvidovre
Kettegaard Allé 30, 2650 Hvidovre, Denmark
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Fwd: Ceph fs snapshot problem

2024-03-16 Thread Neeraj Pratap Singh
As per the error message you mentioned (Permission denied): it seems that the
'subvolume' flag has been set on the root directory, and we cannot create
snapshots in directories under a subvolume directory.

Can you please retry creating the snapshot directory after unsetting it with:
setfattr -n ceph.dir.subvolume -v 0 /mnt
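
For example, a minimal sequence, assuming the filesystem is mounted at /mnt as in the command above:

# check whether the subvolume flag is set (needs the attr package)
getfattr -n ceph.dir.subvolume /mnt
# clear the flag if it reports a value of 1
setfattr -n ceph.dir.subvolume -v 0 /mnt
# then retry the snapshot, e.g.
mkdir /mnt/dir/.snap/othersnap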


On Sat, Mar 16, 2024 at 2:00 PM Marcus  wrote:

> Hi,
> There is no such attribute.
> /mnt: ceph.dir.subvolume: No such attribute
>
> I did not have getfattr installed, so I needed to install the attr package. Could
> it be that this package was not installed when the fs was created, so
> ceph.dir.subvolume could not be set at creation?
> Did not get any warnings at creation though.
>
> Thanks for your help!!
>
> On Sat, Mar 16 2024 at 00:53:22 +0530, Neeraj Pratap Singh <neesi...@redhat.com> wrote:
>
> Can you please run getfattr on the root directory and tell us what the output is?
> Run this command: getfattr -n ceph.dir.subvolume /mnt
>
> On Thu, Mar 14, 2024 at 4:38 PM Marcus  wrote:
>
>>
>> Hi all,
>> I have just setup a small ceph cluster with ceph fs.
>> The setup is reef 18.2.1 on Debian bookworm.
>> The system is up and running the way it should,
>> though I have a problem with ceph fs snapshots.
>>
>> When I read the doc I should be able to make a
>> snapshot in any directory in the filesystem.
>> I can do a snapshot in the root of the filesystem
>> but if I try somewhere else I get:
>> Operation not permitted
>> This is the same if I do it with mkdir or
>> with ceph fs subvolume snapshot create ...
>>
>> I have created an auth client with rws:
>> [client.snap-mount]
>> key = 
>> caps mds = "allow rws fsname=gds-common"
>> caps mon = "allow r fsname=gds-common"
>> caps osd = "allow rw tag cephfs data=gds-common"
>>
>> Where the filesystem is called gds-common,
>> saved in a file on the client: /etc/ceph/ceph.client.snap-mount.keyring
>>
>> I mount ceph fs with:
>> mount -t ceph :/ -o name=snap-mount /mnt
>>
>> If I create a snapshot in root, it works fine, as in:
>> mkdir /mnt/.snap/mysnap
>> I also notice that in every subdir there is a "snapshot dir" as well
>> with the name _mysnap_1, as in:
>> /mnt/dir/.snap/_mysnap_1
>>
>> My guess is that it is part of the snapshot system; this "snapshot"
>> disappears when the snapshot is removed with:
>> rmdir /mnt/.snap/mysnap
>>
>> If I try to make a snapshot in another directory this does not work:
>> mkdir /mnt/dir/.snap/othersnap
>> Get the error:
>> cannot create directory ‘/mnt/dir/.snap/othersnap’: Operation not
>> permitted
>>
>> It is the same thing on the commandline, root works:
>> ceph fs subvolume snapshot create gds-common / fromcmd
>>
>> But not in a subdir:
>> ceph fs subvolume snapshot create gds-common /dir dirsnap
>> Error EINVAL: invalid value specified for ceph.dir.subvolume
>>
>> I also notice that when you use any command of type:
>> ceph fs subvolume snapshot ...
>> You get a new directory (volumes) in the root:
>> /mnt/volumes/_legacy/cd76f96956469e7be39d750cc7d9.meta
>>
>> I do not know if I am missing something, some lacking of
>> config or so.
>>
>> Thanks for your help!!
>>
>> Best regards
>> Marcus
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: activating+undersized+degraded+remapped

2024-03-16 Thread Eugen Block
Yeah, the whole story would help to give better advice. With EC the
default min_size is k+1; you could reduce min_size to 5 temporarily,
which might bring the PGs back online. But the long-term fix is to have
all required OSDs up and enough OSDs to sustain an outage.
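
As a sketch, something like this (the pool name here is just an example, use your actual EC data pool):

# check the current min_size of the EC data pool
ceph osd pool get cephfs_data_ec min_size
# temporarily lower it to k (5 for a 5+3 profile) so the inactive PGs can activate
ceph osd pool set cephfs_data_ec min_size 5
# once all OSDs are back and recovery has finished, set it back to k+1
ceph osd pool set cephfs_data_ec min_size 6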


Quoting Wesley Dillingham:


Please share "ceph osd tree" and "ceph osd df tree". I suspect you do not have
enough hosts to satisfy the EC profile.

On Sat, Mar 16, 2024, 8:04 AM Deep Dish  wrote:


Hello

I found myself in the following situation:

[WRN] PG_AVAILABILITY: Reduced data availability: 3 pgs inactive

pg 4.3d is stuck inactive for 8d, current state
activating+undersized+degraded+remapped, last acting
[4,NONE,46,NONE,10,13,NONE,74]

pg 4.6e is stuck inactive for 9d, current state
activating+undersized+degraded+remapped, last acting
[NONE,27,77,79,55,48,50,NONE]

pg 4.cb is stuck inactive for 8d, current state
activating+undersized+degraded+remapped, last acting
[6,NONE,42,8,60,22,35,45]


I have one cephfs with two backing pools -- one for replicated data, the
other for erasure data.  Each pool is mapped to REPLICATED/ vs. ERASURE/
directories on the filesystem.


The above PGs are affecting the ERASURE pool (5+3) backing the FS. How
can I get Ceph to recover these three PGs?



Thank you.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: activating+undersized+degraded+remapped

2024-03-16 Thread Wesley Dillingham
Please share "ceph osd tree" and "ceph osd df tree". I suspect you do not have
enough hosts to satisfy the EC profile.

On Sat, Mar 16, 2024, 8:04 AM Deep Dish  wrote:

> Hello
>
> I found myself in the following situation:
>
> [WRN] PG_AVAILABILITY: Reduced data availability: 3 pgs inactive
>
> pg 4.3d is stuck inactive for 8d, current state
> activating+undersized+degraded+remapped, last acting
> [4,NONE,46,NONE,10,13,NONE,74]
>
> pg 4.6e is stuck inactive for 9d, current state
> activating+undersized+degraded+remapped, last acting
> [NONE,27,77,79,55,48,50,NONE]
>
> pg 4.cb is stuck inactive for 8d, current state
> activating+undersized+degraded+remapped, last acting
> [6,NONE,42,8,60,22,35,45]
>
>
> I have one cephfs with two backing pools -- one for replicated data, the
> other for erasure data.  Each pool is mapped to REPLICATED/ vs. ERASURE/
> directories on the filesystem.
>
>
> The above PGs are affecting the ERASURE pool (5+3) backing the FS. How
> can I get Ceph to recover these three PGs?
>
>
>
> Thank you.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] activating+undersized+degraded+remapped

2024-03-16 Thread Deep Dish
Hello

I found myself in the following situation:

[WRN] PG_AVAILABILITY: Reduced data availability: 3 pgs inactive

pg 4.3d is stuck inactive for 8d, current state
activating+undersized+degraded+remapped, last acting
[4,NONE,46,NONE,10,13,NONE,74]

pg 4.6e is stuck inactive for 9d, current state
activating+undersized+degraded+remapped, last acting
[NONE,27,77,79,55,48,50,NONE]

pg 4.cb is stuck inactive for 8d, current state
activating+undersized+degraded+remapped, last acting
[6,NONE,42,8,60,22,35,45]


I have one cephfs with two backing pools -- one for replicated data, the
other for erasure data.  Each pool is mapped to REPLICATED/ vs. ERASURE/
directories on the filesystem.


The above PGs are affecting the ERASURE pool (5+3) backing the FS. How
can I get Ceph to recover these three PGs?



Thank you.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-03-16 Thread Frédéric Nass
 
  
Hello Van Diep, 
  
 
I read this after you got out of trouble. 
  
According to your ceph osd tree, it looks like your problems started when the 
ceph orchestrator created osd.29 on node 'cephgw03' because it looks very 
unlikely that you created a 100MB OSD on a node that's named after "GW". 
  
You may have added the 'osds' label to the 'cephgw03' node and/or played with 
the service_type:osd and/or added the cephgw03 node to the crushmap, which 
triggered the creation of osd.29 by the orchestrator. 
With the cephgw03 node being part of the 'default' root bucket, other OSDs legitimately
started to send objects to osd.29, which is way too small to accommodate them, and the
PGs then became 'backfill_toofull'.
  
To get out of this situation, you could have: 
  
$ ceph osd crush add-bucket closet root 
$ ceph osd crush move cephgw03 root=closet 
  
This would have moved the 'cephgw03' node out of the 'default' root and probably
fixed your problem instantly.
 
Regards,  
 
Frédéric.  

   

-----Original Message-----

From: Anthony
To: nguyenvandiep
Cc: ceph-users
Sent: Saturday, February 24, 2024, 16:24 CET
Subject: [ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

There ya go. 

You have 4 hosts, one of which appears to be down and has a single OSD that is
so small as to not be useful. Whatever cephgw03 is, it looks like a mistake.
OSDs much smaller than, say, 1TB often aren’t very useful.

Your pools appear to be replicated, size=3. 

So each of your cephosd* hosts stores one replica of each RADOS object. 

You added the 10TB spinners to only two of your hosts, which means that they’re 
only being used as though they were 4TB OSDs. That’s part of what’s going on. 

You want to add a 10TB spinner to cephosd02. That will help your situation 
significantly. 

After that, consider adding a cephosd04 host. Having at least one more failure 
domain than replicas lets you better use uneven host capacities. 




> On Feb 24, 2024, at 10:06 AM, nguyenvand...@baoviet.com.vn wrote: 
> 
> Hi Mr Anthony, 
> 
> pls check the output 
> 
> https://anotepad.com/notes/s7nykdmc 
> ___ 
> ceph-users mailing list -- ceph-users@ceph.io 
> To unsubscribe send an email to ceph-users-le...@ceph.io 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io  
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Fwd: Ceph fs snapshot problem

2024-03-16 Thread Marcus

Hi,
There is no such attribute.
/mnt: ceph.dir.subvolume: No such attribute

I did not have getfattr installed, so I needed to install the attr package.
Could it be that this package was not installed when the fs was created, so
ceph.dir.subvolume could not be set at creation?

Did not get any warnings at creation though.

Thanks for your help!!

On Sat, Mar 16 2024 at 00:53:22 +0530, Neeraj Pratap Singh wrote:

Can you please run getfattr on the root directory and tell us what the output is?
Run this command: getfattr -n ceph.dir.subvolume /mnt

On Thu, Mar 14, 2024 at 4:38 PM Marcus wrote:


 Hi all,
 I have just setup a small ceph cluster with ceph fs.
 The setup is reef 18.2.1 on Debian bookworm.
 The system is up and running the way it should,
 though I have a problem with ceph fs snapshots.

 When I read the doc I should be able to make a
 snapshot in any directory in the filesystem.
 I can do a snapshot in the root of the filesystem
 but if I try somewhere else I get:
 Operation not permitted
 This is the same if I do it with mkdir or
 with ceph fs subvolume snapshot create ...

 I have created an auth client with rws:
 [client.snap-mount]
 key = 
 caps mds = "allow rws fsname=gds-common"
 caps mon = "allow r fsname=gds-common"
 caps osd = "allow rw tag cephfs data=gds-common"

 Where the filesystem is called gds-common,
 saved in a file on the client: 
/etc/ceph/ceph.client.snap-mount.keyring


 I mount ceph fs with:
 mount -t ceph :/ -o name=snap-mount /mnt

 If I create a snapshot in root, it works fine, as in:
 mkdir /mnt/.snap/mysnap
 I also notice that in every subdir there is a "snapshot dir" as well
 with the name _mysnap_1, as in:
 /mnt/dir/.snap/_mysnap_1

 My guess is that it is part of the snapshot system; this "snapshot"
 disappears when the snapshot is removed with:
 rmdir /mnt/.snap/mysnap

 If I try to make a snapshot in another directory this does not work:
 mkdir /mnt/dir/.snap/othersnap
 Get the error:
 cannot create directory ‘/mnt/dir/.snap/othersnap’: Operation not permitted

 It is the same thing on the commandline, root works:
 ceph fs subvolume snapshot create gds-common / fromcmd

 But not in a subdir:
 ceph fs subvolume snapshot create gds-common /dir dirsnap
 Error EINVAL: invalid value specified for ceph.dir.subvolume

 I also notice that when you use any command of type:
 ceph fs subvolume snapshot ...
 You get a new directory (volumes) in the root:
 /mnt/volumes/_legacy/cd76f96956469e7be39d750cc7d9.meta

 I do not know if I am missing something, some lacking of
 config or so.

 Thanks for your help!!

 Best regards
 Marcus

 ___
 ceph-users mailing list -- ceph-users@ceph.io
 To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io