[ceph-users] Why is the disk usage much larger than the available space displayed by the `df` command after disabling ext4 journal?

2022-10-12 Thread 郑亮
Hi all,
I have created a pod using an rbd image as backend storage, then mapped the rbd
image to a local block device and mounted it with an ext4 filesystem. After
disabling the ext4 journal, `df` reports a used size much larger than the
available space. The steps to reproduce are below; thanks in advance.

Environment details

   - Image/version of Ceph CSI driver : cephcsi:v3.5.1
   - Kernel version :

🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) uname -a
Linux k1 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020
x86_64 x86_64 x86_64 GNU/Linux


   - Mounter used for mounting PVC (for CephFS it's fuse or kernel; for rbd it's
     krbd or rbd-nbd) : krbd
   - Kubernetes cluster version :

🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) kubectl version
Client Version: version.Info{Major:"1", Minor:"22",
GitVersion:"v1.22.7",
GitCommit:"b56e432f2191419647a6a13b9f5867801850f969",
GitTreeState:"clean", BuildDate:"2022-02-16T11:50:27Z",
GoVersion:"go1.16.14", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22",
GitVersion:"v1.22.7",
GitCommit:"b56e432f2191419647a6a13b9f5867801850f969",
GitTreeState:"clean", BuildDate:"2022-02-16T11:43:55Z",
GoVersion:"go1.16.14", Compiler:"gc", Platform:"linux/amd64"}


   - Ceph cluster version :

🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) ceph --version
ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)

Steps to reproduce

Steps to reproduce the behavior:

   1. Create a storageclass from the example storageclass.yaml in the ceph-csi
      repository
      (https://github.com/ceph/ceph-csi/blob/devel/examples/rbd/storageclass.yaml).
   2. Then create a pvc and a test pod like below:

➜ /root ☞ cat csi-rbd/examples/pvc.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: csi-rbd-sc

🍺 /root ☞ cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: csi-rbd-demo-pod
spec:
  containers:
    - name: web-server
      image: docker.io/library/nginx:latest
      volumeMounts:
        - name: mypvc
          mountPath: /var/lib/www/html
  volumes:
    - name: mypvc
      persistentVolumeClaim:
        claimName: rbd-pvc
        readOnly: false


   3. The following steps are executed on the ceph cluster

🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) rbd ls -p pool-51312494-44b2-43bc-8ba1-9c4f5eda3287
csi-vol-ad0bba2a-49fc-11ed-8ab9-3a534777138b

🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) rbd map pool-51312494-44b2-43bc-8ba1-9c4f5eda3287/csi-vol-ad0bba2a-49fc-11ed-8ab9-3a534777138b
/dev/rbd0

🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) lsblk -f
NAME            FSTYPE       LABEL  UUID                                     MOUNTPOINT
sr0
vda
├─vda1          xfs                 a080444c-7927-49f7-b94f-e20f823bbc95     /boot
├─vda2          LVM2_member         jDjk4o-AaZU-He1S-8t56-4YEY-ujTp-ozFrK5
│ ├─centos-root xfs                 5e322b94-4141-4a15-ae29-4136ae9c2e15     /
│ └─centos-swap swap                d59f7992-9027-407a-84b3-ec69c3dadd4e
└─vda3          LVM2_member         Qn0c4t-Sf93-oIDr-e57o-XQ73-DsyG-pGI8X0
  └─centos-root xfs                 5e322b94-4141-4a15-ae29-4136ae9c2e15     /
vdb
vdc
rbd0            ext4                e381fa9f-9f94-43d1-8f3a-c2d90bc8de27

🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) mount /dev/rbd0 /mnt/ext4
🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) df -hT | egrep 'rbd|Type'
Filesystem  Type  Size  Used Avail Use% Mounted on
/dev/rbd0   ext4   49G   53M   49G   1% /mnt/ext4

🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) umount /mnt/ext4
🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) tune2fs -o journal_data_writeback /dev/rbd0
tune2fs 1.46.5 (30-Dec-2021)
🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) tune2fs -O "^has_journal" /dev/rbd0   <= disable ext4 journal
tune2fs 1.46.5 (30-Dec-2021)
🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) e2fsck -f /dev/rbd0
e2fsck 1.46.5 (30-Dec-2021)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/rbd0: 11/3276800 files (0.0% non-contiguous), 219022/13107200 blocks

🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) mount /dev/rbd0 /mnt/ext4
🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) df -hT | egrep 'rbd|Type'
Filesystem  Type  Size  Used Avail Use% Mounted on
/dev/rbd0   ext4   64Z   64Z   50G 100% /mnt/ext4

🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) mount | grep rbd
/dev/rbd0 on /mnt/ext4 type ext4 (rw,relatime,stripe=1024)
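
(For reference, the journal can be put back the same way after unmounting, a sketch:

umount /mnt/ext4
tune2fs -O has_journal /dev/rbd0
e2fsck -f /dev/rbd0)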

Actual results

After disabling the ext4 journal, the disk usage reported by the df command is much
larger than the available space:

🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) df -T | egrep 'rbd|Type'
Filesystem  Type             1K-blocks                  Used  Available  Use%  Mounted on
/dev/rbd0   ext4  73786976277711028224  73786976277659475512   51536328  100%  /mnt/ext4

Expected behavior

The df command should show the disk usage correctly after disabling the ext4 journal.

[ceph-users] Re: Inherited CEPH nightmare

2022-10-12 Thread Janne Johansson
> I've changed some elements of the config now and the results are much better 
> but still quite poor relative to what I would consider normal SSD performance.
> The number of PGs has been increased from 128 to 256.  Not yet run JJ Balancer.
> In terms of performance, I measured the time it takes for ProxMox to clone a 
> 127GB VM. It now clones in around 18 minutes, rather than 1 hour 55 mins 
> before the config changes, so there is progress here.
> Any other suggestions are welcome.
> root@cl1-h1-lv:~# ceph osd df
> ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
>  4  ssd    0.90970   1.0      932 GiB  635 GiB  632 GiB  1.1 MiB  2.5 GiB  297 GiB  68.12  1.03   79   up
>  9  ssd    0.90970   1.0      932 GiB  643 GiB  640 GiB   62 MiB  2.1 GiB  289 GiB  68.98  1.05   81   up

It would be possible (and perhaps a slight bit more of an improvement) to allow
even more PGs for the large pools. You have around 80 PGs per OSD now, and
100-200 per OSD, counting all pools, is supposed to be an ok figure, so if you
are at ~80 now with 256 PGs on the main pool, you could bump it to 512, unless
you plan to add lots more pools later without expanding the number of OSDs.
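
If you decide to do that, the bump itself is a one-liner (a sketch; substitute the
actual pool name, and note that from Nautilus on pgp_num is raised gradually to
follow it):

ceph osd pool ls detail                  # check current pg_num per pool
ceph osd pool set <poolname> pg_num 512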

Not a huge win, but more "placing it at the middle of the comfort
zone" in terms of "slightly faster scrubs", "spread work around
several OSDs when one large operation is requested" and so on.

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why is the disk usage much larger than the available space displayed by the `df` command after disabling ext4 journal?

2022-10-12 Thread Ilya Dryomov
On Wed, Oct 12, 2022 at 9:37 AM 郑亮  wrote:
>
> Hi all,
> I have create a pod using rbd image as backend storage, then map rbd image
> to local block device, and mount it with ext4 filesystem. The `df`
> displays the disk usage much larger than the available space displayed
> after disabling ext4 journal. The following is the steps to reproduce,
> thanks in advance.
>
> Environment details
>
>- Image/version of Ceph CSI driver : cephcsi:v3.5.1
>- Kernel version :
>
> 🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) uname -a
> Linux k1 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020
> x86_64 x86_64 x86_64 GNU/Linux
>
>
>- Mounter used for mounting PVC (for cephFS its fuse or kernel. for rbd
>its
>krbd or rbd-nbd) : krbd
>- Kubernetes cluster version :
>
> 🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) kubectl version
> Client Version: version.Info{Major:"1", Minor:"22",
> GitVersion:"v1.22.7",
> GitCommit:"b56e432f2191419647a6a13b9f5867801850f969",
> GitTreeState:"clean", BuildDate:"2022-02-16T11:50:27Z",
> GoVersion:"go1.16.14", Compiler:"gc", Platform:"linux/amd64"}
> Server Version: version.Info{Major:"1", Minor:"22",
> GitVersion:"v1.22.7",
> GitCommit:"b56e432f2191419647a6a13b9f5867801850f969",
> GitTreeState:"clean", BuildDate:"2022-02-16T11:43:55Z",
> GoVersion:"go1.16.14", Compiler:"gc", Platform:"linux/amd64"}
>
>
>- Ceph cluster version :
>
> 🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel)ceph --version
> ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus 
> (stable)
>
> Steps to reproduce
>
> Steps to reproduce the behavior:
>
>1. Create a storageclass with storageclass
>
> 
>2. Then create pvc, and test pod like below
>
> ➜ /root ☞ cat csi-rbd/examples/pvc.yaml
> ---
> apiVersion: v1
> kind: PersistentVolumeClaim
> metadata:
>   name: rbd-pvc
> spec:
>   accessModes:
> - ReadWriteOnce
>   resources:
> requests:
>   storage: 50Gi
>   storageClassName: csi-rbd-sc
>
> 🍺 /root ☞cat pod.yaml
> apiVersion: v1
> kind: Pod
> metadata:
>   name: csi-rbd-demo-pod
> spec:
>   containers:
> - name: web-server
>   image: docker.io/library/nginx:latest
>   volumeMounts:
> - name: mypvc
>   mountPath: /var/lib/www/html
>   volumes:
> - name: mypvc
>   persistentVolumeClaim:
> claimName: rbd-pvc
> readOnly: false
>
>
>1. The following steps are executed in ceph cluster
>
> 🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) rbd ls -p
> pool-51312494-44b2-43bc-8ba1-9c4f5eda3287
> csi-vol-ad0bba2a-49fc-11ed-8ab9-3a534777138b
>
> 🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) rbd map
> pool-51312494-44b2-43bc-8ba1-9c4f5eda3287/csi-vol-ad0bba2a-49fc-11ed-8ab9-3a534777138b
> /dev/rbd0🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) lsblk -f
> NAMEFSTYPE  LABEL UUID
>   MOUNTPOINT
> sr0
> vda
> ├─vda1  xfs   a080444c-7927-49f7-b94f-e20f823bbc95   /boot
> ├─vda2  LVM2_member   jDjk4o-AaZU-He1S-8t56-4YEY-ujTp-ozFrK5
> │ ├─centos-root xfs   5e322b94-4141-4a15-ae29-4136ae9c2e15   /
> │ └─centos-swap swap  d59f7992-9027-407a-84b3-ec69c3dadd4e
> └─vda3  LVM2_member   Qn0c4t-Sf93-oIDr-e57o-XQ73-DsyG-pGI8X0
>   └─centos-root xfs   5e322b94-4141-4a15-ae29-4136ae9c2e15   /
> vdb
> vdc
> rbd0ext4  e381fa9f-9f94-43d1-8f3a-c2d90bc8de27
>
> 🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) mount /dev/rbd0
> /mnt/ext4🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) df -hT | egrep
> 'rbd|Type'
> Filesystem  Type  Size  Used Avail Use% Mounted on
> /dev/rbd0   ext4   49G   53M   49G   1% /mnt/ext4🍺
> /root/go/src/ceph/ceph-csi ☞ git:(devel) umount /mnt/ext4🍺
> /root/go/src/ceph/ceph-csi ☞ git:(devel) tune2fs -o
> journal_data_writeback /dev/rbd0
> tune2fs 1.46.5 (30-Dec-2021)🍺 /root/go/src/ceph/ceph-csi ☞
> git:(devel) tune2fs -O "^has_journal" /dev/rbd0   * <= disable
> ext4 journal*
> tune2fs 1.46.5 (30-Dec-2021)🍺 /root/go/src/ceph/ceph-csi ☞
> git:(devel) e2fsck -f  /dev/rbd0
> e2fsck 1.46.5 (30-Dec-2021)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> /dev/rbd0: 11/3276800 files (0.0% non-contiguous), 219022/13107200
> blocks🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) mount /dev/rbd0
> /mnt/ext4🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) df -hT | egrep
> 'rbd|Type'
> Filesystem  Type  Size  Used Avail Use% Mounted on
> /dev/rbd0   ext4   64Z   64Z   50G 100% /mnt/ext4
>
> 🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel)mount | grep rbd
> /dev/rbd0 on /mnt/ext4 type ext4 (rw,relatime,stripe=1024)
>
> Actual results
>
> The disk usage much larger than the available space displayed by the df 
> command
> after disabling ext4 journal

[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-12 Thread Frank Schilder
Hi Szabo.

You mean like copy+paste what I wrote before:

> ID   CLASS  WEIGHT   REWEIGHT  SIZE    RAW USE  DATA    OMAP    META    AVAIL   %USE   VAR   PGS  STATUS  TYPE NAME
>  29  ssd    0.09099   1.0      93 GiB  49 GiB   17 GiB  16 GiB  15 GiB  44 GiB  52.42  1.91  104  up      osd.29
>  44  ssd    0.09099   1.0      93 GiB  50 GiB   23 GiB  10 GiB  16 GiB  43 GiB  53.88  1.96  121  up      osd.44
>  58  ssd    0.09099   1.0      93 GiB  49 GiB   16 GiB  15 GiB  18 GiB  44 GiB  52.81  1.92  123  up      osd.58
> 984  ssd    0.09099   1.0      93 GiB  57 GiB   26 GiB  13 GiB  17 GiB  37 GiB  60.81  2.21  133  up      osd.984

This is representative for the entire pool. I'm done with getting the cluster 
up again and these disks are now almost empty. The problem seems to be that 
100G OSDs are just a bit too small for octopus.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Szabo, Istvan (Agoda) 
Sent: 07 October 2022 14:28
To: Frank Schilder
Cc: Igor Fedotov; ceph-users@ceph.io
Subject: RE: [ceph-users] Re: OSD crashes during upgrade mimic->octopus

Finally how is your pg distribution? How many pg/disk?

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-Original Message-
From: Frank Schilder 
Sent: Friday, October 7, 2022 6:50 PM
To: Igor Fedotov ; ceph-users@ceph.io
Subject: [ceph-users] Re: OSD crashes during upgrade mimic->octopus

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !


Hi all,

trying to respond to 4 past emails :)

We started using manual conversion and, if the conversion fails, it fails in the
last step. So far, we have had a failure on 1 out of 8 OSDs. The OSD can be repaired
by running a compaction plus another repair, which completes the last step. It looks
like we are just on the edge and can get away with double-compaction.
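
A sketch of one way to do that compaction + repair (with the OSD stopped; adjust the
path and id to your deployment):

systemctl stop ceph-osd@<id>
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<id> compact
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-<id>
systemctl start ceph-osd@<id>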

For the interested future reader, we have subdivided 400G high-performance SSDs 
into 4x100G OSDs for our FS meta data pool. The increased concurrency improves 
performance a lot. But yes, we are on the edge. OMAP+META is almost 50%.

In our case, just merging 2x100 into 1x200 will probably not improve things as 
we will end up with an even more insane number of objects per PG than what we 
have already today. I will plan for having more OSDs for the meta-data pool 
available and also plan for having the infamous 60G temp space available with a 
bit more margin than what we have now.

Thanks to everyone who helped!

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Igor Fedotov 
Sent: 07 October 2022 13:21:29
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: OSD crashes during upgrade mimic->octopus

Hi Frank,

there are no tools to defragment an OSD atm. The only way to defragment an OSD is to 
redeploy it...


Thanks,

Igor


On 10/7/2022 3:04 AM, Frank Schilder wrote:
> Hi Igor,
>
> sorry for the extra e-mail. I forgot to ask: I'm interested in a tool to 
> de-fragment the OSD. It doesn't look like the fsck command does that. Is 
> there any such tool?
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Frank Schilder 
> Sent: 07 October 2022 01:53:20
> To: Igor Fedotov; ceph-users@ceph.io
> Subject: [ceph-users] Re: OSD crashes during upgrade mimic->octopus
>
> Hi Igor,
>
> I added a sample of OSDs on identical disks. The usage is quite well 
> balanced, so the numbers I included are representative. I don't believe that 
> we had one such extreme outlier. Maybe it ran full during conversion. Most of 
> the data is OMAP after all.
>
> I can't dump the free-dumps into paste bin, they are too large. Not sure if 
> you can access ceph-post-files. I will send you a tgz in a separate e-mail 
> directly to you.
>
>> And once again - do other non-starting OSDs show the same ENOSPC error?
>> Evidently I'm unable to make any generalization about the root cause
>> due to lack of the info...
> As I said before, I need more time to check this and give you the answer you 
> actually want. The stupid answer is they don't, because the other 3 are taken 
> down the moment 16 crashes and don't reach the same point. I need to take 
> them out of the grouped management and start them by hand, which I can do 
> tomorrow. I'm too tired now to play on our production system.
>
> The free-dumps are on their separate way. I included one for OSD 17 as well 
> (on the same disk).
>
> Best regar

[ceph-users] Re: Invalid crush class

2022-10-12 Thread Frank Schilder
https://tracker.ceph.com/issues/45253
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Michael Thomas 
Sent: 08 October 2022 16:40:37
To: ceph-users@ceph.io
Subject: [ceph-users] Invalid crush class

In 15.2.7, how can I remove an invalid crush class?  I'm surprised that
I was able to create it in the first place:

[root@ceph1 bin]# ceph osd crush class ls
[
 "ssd",
 "JBOD.hdd",
 "nvme",
 "hdd"
]


[root@ceph1 bin]# ceph osd crush class ls-osd JBOD.hdd
Invalid command: invalid chars . in JBOD.hdd
osd crush class ls-osd <class> :  list all osds belonging to the specific <class>
Error EINVAL: invalid command

There are no devices mapped to this class:

[root@ceph1 bin]# ceph osd crush tree | grep JBOD | wc -l
0

--Mike
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Inherited CEPH nightmare

2022-10-12 Thread Anthony D'Atri
I agree with Janne’s thoughts here, especially since you’re on SSDs. 

> On Oct 12, 2022, at 03:38, Janne Johansson  wrote:
> 
>> I've changed some elements of the config now and the results are much better 
>> but still quite poor relative to what I would consider normal SSD 
>> performance.
> The number of PGs has been increased from 128 to 256.  Not yet run JJ 
> Balancer.
>> In terms of performance, I measured the time it takes for ProxMox to clone a 
>> 127GB VM. It now clones in around 18 minutes, rather than 1 hour 55 mins 
>> before the config changes, so there is progress here.
>> Any other suggestions are welcome.
>> root@cl1-h1-lv:~# ceph osd df
>> ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
>>  4  ssd    0.90970   1.0      932 GiB  635 GiB  632 GiB  1.1 MiB  2.5 GiB  297 GiB  68.12  1.03   79   up
>>  9  ssd    0.90970   1.0      932 GiB  643 GiB  640 GiB   62 MiB  2.1 GiB  289 GiB  68.98  1.05   81   up
> 
> It would be possible (and perhaps improve a slight bit more) to allow
> even more PGs to the large pools, you have around 80 PGs per OSD now,
> and between 100-200 is supposed to be an ok figure, given all pools,
> so if you are at ~80 now with 256 PGs on the main pool, you could bump
> it to 512 unless you plan to add lots more pools later without
> expanding the amount of OSDs.
> 
> Not a huge win, but more "placing it at the middle of the comfort
> zone" in terms of "slightly faster scrubs", "spread work around
> several OSDs when one large operation is requested" and so on.
> 
> -- 
> May the most significant bit of your life be positive.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: crush hierarchy backwards and upmaps ...

2022-10-12 Thread Frank Schilder
Hi Dan,

your comment is very important: https://tracker.ceph.com/issues/57348

By the way, is anyone looking at new cases? I submitted a couple since spring 
and in the past it took not more than 2 weeks to 1 month until someone assigned 
them to a project. Since spring I can't see any such activity any more. The 
issue tracker seems to have turned into a black hole. Do you know what the 
reason might be?

thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Dan van der Ster 
Sent: 11 October 2022 19:39:11
To: Christopher Durham
Cc: Ceph Users
Subject: [ceph-users] Re: crush hierarchy backwards and upmaps ...

Hi Chris,

Just curious, does this rule make sense and help with the multi level crush
map issue?
(Maybe it also results in zero movement, or at least less than the
alternative you proposed?)

step choose indep 4 type rack
step chooseleaf indep 2 type chassis
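
In context, the full rule would then look roughly like this (a sketch based on the
rule quoted below, with the 'pod' buckets renamed to 'chassis'):

rule mypoolname {
    id -5
    type erasure
    step take myroot
    step choose indep 4 type rack
    step chooseleaf indep 2 type chassis
    step emit
}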

Cheers, Dan




On Tue, Oct 11, 2022, 19:29 Christopher Durham  wrote:

> Dan,
>
> Thank you.
>
> I did what you said regarding --test-map-pgs-dump and it wants to move 3
> OSDs in every PG. Yuk.
>
> So before I do that, I tried this rule, after changing all my 'pod' bucket
> definitions to 'chassis', and compiling and
> injecting the new crushmap to an osdmap:
>
>
> rule mypoolname {
> id -5
> type erasure
> step take myroot
> step choose indep 4 type rack
> step choose indep 2 type chassis
> step chooseleaf indep 1 type host
> step emit
>
> }
>
> --test-pg-upmap-entries shows there were NO changes to be done after
> comparing it with the original!!!
>
> However, --upmap-cleanup says:
>
> verify_upmap number of buckets 8 exceeds desired number of 2
> check_pg_upmaps verify_upmap of poolid.pgid returning -22
>
> This is output for every current upmap, but I really do want 8 total
> buckets per PG, as my pool is a 6+2.
>
> The upmap-cleanup output wants me to remove all of my upmaps.
>
> This seems consistent with a bug report that says that there is a problem
> with the balancer on a
> multi-level rule such as the above, albeit on 14.2.x. Any thoughts?
>
> https://tracker.ceph.com/issues/51729
>
> I am leaning towards just eliminating the middle rule and going directly from
> rack to host, even though
> it wants to move a LARGE amount of data according to a diff before and
> after of --test-pg-upmap-entries.
> In this scenario, I don't see any unexpected errors with --upmap-cleanup
> and I do not want to get stuck
>
> rule mypoolname {
> id -5
> type erasure
> step take myroot
> step choose indep 4 type rack
> step chooseleaf indep 2 type host
> step emit
> }
>
> -Chris
>
>
> -Original Message-
> From: Dan van der Ster 
> To: Christopher Durham 
> Cc: Ceph Users 
> Sent: Mon, Oct 10, 2022 12:22 pm
> Subject: [ceph-users] Re: crush hierarchy backwards and upmaps ...
>
> Hi,
>
> Here's a similar bug: https://tracker.ceph.com/issues/47361
>
> Back then, upmap would generate mappings that invalidate the crush rule. I
> don't know if that is still the case, but indeed you'll want to correct
> your rule.
>
> Something else you can do before applying the new crush map is use
> osdmaptool to compare the PGs placement before and after, something like:
>
> osdmaptool --test-map-pgs-dump osdmap.before > before.txt
>
> osdmaptool --test-map-pgs-dump osdmap.after > after.txt
>
> diff -u before.txt after.txt
>
> The above will help you estimate how much data will move after injecting
> the fixed crush map. So depending on the impact you can schedule the change
> appropriately.
>
> I also recommend to keep a backup of the previous crushmap so that you can
> quickly restore it if anything goes wrong.
>
> Cheers, Dan
>
>
>
>
>
> On Mon, Oct 10, 2022, 19:31 Christopher Durham  wrote:
>
> > Hello,
> > I am using pacific 16.2.10 on Rocky 8.6 Linux.
> >
> > After setting upmap_max_deviation to 1 on the ceph balancer in ceph-mgr,
> I
> > achieved a near perfect balance of PGs and space on my OSDs. This is
> great.
> >
> > However, I started getting the following errors on my ceph-mon logs,
> every
> three minutes, for each of the OSDs that had been mapped by the balancer:
> >2022-10-07T17:10:39.619+ 7f7c2786d700 1 verify_upmap unable to get
> > parent of osd.497, skipping for now
> >
> > After banging my head against the wall for a bit trying to figure this
> > out, I think I have discovered the issue:
> >
> > Currently, I have my pool EC Pool configured with the following crush
> rule:
> >
> > rule mypoolname {
> >id -5
> >type erasure
> >step take myroot
> >step choose indep 4 type rack
> >step choose indep 2 type pod
> >step chooseleaf indep 1 type host
> >step emit
> > }
> >
> > Basically, pick 4 racks, then 2 pods in each rack, and then one host in
> > each pod, for a total of
> > 8 chunks. (The pool is a is a 6+2). The 4 racks are chosen from the
> myroot
> > root entry,

[ceph-users] Re: Iinfinite backfill loop + number of pgp groups stuck at wrong value

2022-10-12 Thread Frank Schilder
Hi Nicola,

it's not noise. Even though the modules seem disabled and pool flags are set to 
false, they still linger around in the background and interfere. See the recent 
thread 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/WST6K5A4UQGGISBFGJEZS4HFL2VVWW32/
 .

With all the settings you have, the last one would be setting

ceph config set mgr target_max_misplaced_ratio 1

and all the balancer- and scaling modules will just do what you tell them, 
assuming you know what you are doing. I restored default behaviour with instant 
application of changes and don't have any problems with it.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Josh Baergen 
Sent: 07 October 2022 17:16:49
To: Nicola Mori
Cc: ceph-users
Subject: [ceph-users] Re: Iinfinite backfill loop + number of pgp groups stuck 
at wrong value

As of Nautilus+, when you set pg_num, it actually internally sets
pg(p)_num_target, and then slowly increases (or decreases, if you're
merging) pg_num and then pgp_num until it reaches the target. The
amount of backfill scheduled into the system is controlled by
target_max_misplaced_ratio.
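
You can watch that convergence directly while it happens, e.g. (a sketch; substitute
the pool name):

ceph osd pool get <pool> pg_num
ceph osd pool get <pool> pgp_num
ceph config get mgr target_max_misplaced_ratio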

Josh

On Fri, Oct 7, 2022 at 3:50 AM Nicola Mori  wrote:
>
> The situation got solved by itself, since probably there was no error. I
> manually increased the number of PGs and PGPs to 128 some days ago, and
> the PGP count was being updated step by step. Actually after a bump from
> 5% to 7% in the count of misplaced object I noticed that the number of
> PGPs was updated to 126, and after a last bump it is now at 128 with a
> ~4% of misplaced objects currently decreasing.
> Sorry for the noise,
>
> Nicola
>
> On 07/10/22 09:15, Nicola Mori wrote:
> > Dear Ceph users,
> >
> > my cluster is stuck since several days with some PG backfilling. The
> > number of misplaced objects slowly decreases down to 5%, and at that
> > point jumps up again to about 7%, and so on. I found several possible
> > reasons for this behavior. One is related to the balancer, which anyway
> > I think is not operating:
> >
> > # ceph balancer status
> > {
> >  "active": false,
> >  "last_optimize_duration": "0:00:00.000938",
> >  "last_optimize_started": "Thu Oct  6 16:19:59 2022",
> >  "mode": "upmap",
> >  "optimize_result": "Too many objects (0.071539 > 0.05) are
> > misplaced; try again later",
> >  "plans": []
> > }
> >
> > (the lase optimize result is from yesterday when I disabled it, and
> > since then the backfill loop has happened several times).
> > Another possible reason seems to be an imbalance of PG and PGB  numbers.
> > Effectively I found such an imbalance on one of my pools:
> >
> > # ceph osd pool get wizard_data pg_num
> > pg_num: 128
> > # ceph osd pool get wizard_data pgp_num
> > pgp_num: 123
> >
> > but I cannot fix it:
> > # ceph osd pool set wizard_data pgp_num 128
> > set pool 3 pgp_num to 128
> > # ceph osd pool get wizard_data pgp_num
> > pgp_num: 123
> >
> > The autoscaler is off for that pool:
> >
> > POOL   SIZE  TARGET SIZERATE  RAW CAPACITY
> > RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM
> > AUTOSCALE  BULK
> > wizard_data   8951G   1.333730697632152.8T
> > 0.0763  1.0 128  off
> > False
> >
> > so I don't understand why the PGP number is stuck at 123.
> > Thanks in advance for any help,
> >
> > Nicola
>
> --
> Nicola Mori, Ph.D.
> INFN sezione di Firenze
> Via Bruno Rossi 1, 50019 Sesto F.no (Italy)
> +390554572660
> m...@fi.infn.it
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to force PG merging in one step?

2022-10-12 Thread Eugen Block

Hi Frank,

> thanks, that was a great hint! I have a strong déjà vu feeling, we
> discussed this before with increasing pg_num, didn't we?


although I don't have a feeling of déjà vu I believe it's a recurring
issue so chances are you're right. ;-)


> I just set it to 1 and it did exactly what I wanted. It's the same
> number of PGs backfilling, but pgp_num=1024, so while the
> rebalancing load is the same, I got rid of any redundant data
> movements and I can actually see the progress of the merge just with
> ceph status.


It's helpful to know that setting the target_max_misplaced_ratio to 1  
doesn't cause unwanted side effects. I agree with your point of view  
to reduce unnecessary data movement as much as possible and this seems  
to do the trick (in this case). I'll keep that in mind for future  
recovery scenarios, thanks for testing it in the real world. ;-)


> Related to that, I have set mon_max_pg_per_osd=300 and do have OSDs
> with more than 400 PGs. Still, I don't see the promised health
> warning in ceph status. Is this a known issue?


During recovery there's another factor involved  
(osd_max_pg_per_osd_hard_ratio), the default is 3. I had to deal with  
that a few months back when I got inactive PGs due to many chunks and  
"only" a factor of 3. In that specific cluster I increased it to 5 and  
didn't encounter inactive PGs anymore.
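
If you ever need to raise it, it's a single config change, e.g. (a sketch; 5 is just
the value that worked in the cluster mentioned above):

ceph config set osd osd_max_pg_per_osd_hard_ratio 5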


Regards,
Eugen

Zitat von Frank Schilder :


Hi Eugen,

thanks, that was a great hint! I have a strong déjà vu feeling, we  
discussed this before with increasing pg_num, didn't we? I just set  
it to 1 and it did exactly what I wanted. Its the same number of PGs  
backfilling, but pgp_num=1024, so while the rebalancing load is the  
same, I got rid of any redundant data movements and I can actually  
see the progress of the merge just with ceph status.


Related to that, I have set mon_max_pg_per_osd=300 and do have OSDs  
with more than 400 PGs. Still, I don't see the promised health  
warning in ceph status. Is this a known issue?


Opinion part.

Returning to the above setting, I have to say that the assignment of  
which parameter influences what seems a bit unintuitive if not  
inconsistent. The parameter target_max_misplaced_ratio belongs to  
the balancer module, but merging PGs clearly is a task of the  
pg_autoscaler module. I'm not balancing, I'm scaling PG numbers.  
Such cross dependencies make it really hard to find relevant  
information in the section of the documentation where one would be  
looking for it. It starts being distributed all over the place.


If its not possible to have such things separated and specific tasks  
consistently explained in a single section, there could at least be  
a hint including also the case of PG merging/splitting in the  
description of target_max_misplaced_ratio so that a search for these  
terms brings up this page. There should also be a cross reference  
from "ceph osd pool set pg[p]_num" to target_max_misplaced_ratio.  
Well, its now here in this message for google to reveal.


I have to add that, while I understand the motivation behind adding  
these baby sitting modules, I would actually appreciate if one could  
disable them. I personally find them to be really annoying  
especially in emergency situations, but also in normal operations. I  
would consider them a nice to have and not enforce it on people who  
want to be in charge.


For example, in my current situation, I'm halving the PG count of a  
pool. Doing the merge in one go or letting the  
target_max_misplaced_ratio "help" me leads to exactly the same  
number of PGs backfilling at any time. Which means both cases,  
target_max_misplaced_ratio=0.05 and 1 lead to exactly the same  
interference of rebalancing IO with user IO. The difference is that  
with target_max_misplaced_ratio=0.05 this phase of reduced  
performance will take longer, because every time the module decides  
to change pgp_num it will inevitably also rebalance objects again  
that have been moved before. I find it difficult to consider this an  
improvement. I prefer to avoid any redundant writes at all cost for  
the benefit of disk life time. If I really need to reduce the impact  
of recovery IO I can set recovery_sleep.
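
(For completeness, that is also just a config knob, e.g. a sketch with an arbitrary
value:

ceph config set osd osd_recovery_sleep_hdd 0.1

with osd_recovery_sleep_ssd and osd_recovery_sleep_hybrid as the variants for other
device types.)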


My personal opinion to the user group.

Thanks for your help and have a nice evening!

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Eugen Block 
Sent: 11 October 2022 14:13:45
To: ceph-users@ceph.io
Subject: [ceph-users] Re: How to force PG merging in one step?

Hi Frank,

I don't think it's the autoscaler interfering here but the default 5%
target_max_misplaced_ratio. I haven't tested the impacts of increasing
that to a much higher value, so be careful.

Regards,
Eugen


Zitat von Frank Schilder :


Hi all,

I need to reduce the number of PGs in a pool from 2048 to 512 and
would really like to do that in a single step. I executed the set
pg_num 512 command, b

[ceph-users] Upgrade from Mimic to Pacific, hidden zone in RGW?

2022-10-12 Thread Federico Lazcano
Hi everyone! I'm looking for help with an upgrade from Mimic.
I've managed to upgrade the MONs, MGRs and OSDs from Mimic to Nautilus, Octopus and
Pacific, in that order.

But I'm having trouble migrating the RGW service. It seems that when I added two
more RGW servers they were somehow created in a different zone than the original
(Mimic) RGW servers.


root@ceph11-test:/# ceph -s
  cluster:
id: dfade847-e28f-4551-99dc-21e3094d9c8f
health: HEALTH_WARN
mons are allowing insecure global_id reclaim   <--- There are
still RGW in Mimic.
   services:
mon: 3 daemons, quorum ceph11-test,ceph12-test,ceph13-test (age 8d)
mgr: ceph11-test(active, since 8d), standbys: ceph12-test, ceph13-test
osd: 6 osds: 6 up (since 8d), 6 in (since 8d)
rgw: 4 daemons active (4 hosts, 2 zones)   <--- TWO ZONES?


  data:
pools:   8 pools, 416 pgs
objects: 238.31k objects, 42 GiB
usage:   88 GiB used, 2.8 TiB / 2.9 TiB avail
pgs: 416 active+clean


But I can't find a way to list the OTHER Zone



root@ceph11-test:/# radosgw-admin realm list
{
"default_info": "4bd729f3-9e52-43d8-995c-8683d4bf4fbf",
"realms": [
"default"
]
}
root@ceph11-test:/# radosgw-admin zonegroup list
{
"default_info": "47f2c5e8-f942-4e68-8cd9-6372a0ee6935",
"zonegroups": [
"default"
]
}
root@ceph11-test:/# radosgw-admin zone list
{
"default_info": "c91cebf7-81c7-40e2-b107-2e58036cdb92",
"zones": [
"default"
]
}


There are only the default pools


root@ceph11-test:/# ceph osd pool ls
default.rgw.meta
default.rgw.log
default.rgw.buckets.index
default.rgw.buckets.data
.rgw.root
default.rgw.control
default.rgw.buckets.non-ec
device_health_metrics


I'm using HAProxy to publish the RGW servers.
(extract from haproxy.cfg)
server ceph-rgw3-test ceph-rgw3-test:7480 check fall 3 rise 2   # OLD servers in Mimic
server ceph-rgw4-test ceph-rgw4-test:7480 check fall 3 rise 2   # OLD servers in Mimic
server ceph11-test ceph11-test:7480 check fall 3 rise 2         # NEW servers in Pacific
server ceph12-test ceph12-test:7480 check fall 3 rise 2         # NEW servers in Pacific

When I configure the old (Mimic) RGW as the backend, everything works ok,
but if I configure the
new RGW (Pacific) I get HTTP 301 errors when I try to use existing buckets.

*** with old RGW servers - Mimic ***
 s3cmd ls s3://test
2022-10-04 02:19   2097152  s3://test/s3loop-2fc8a8a9-6b93-4680-b0a3-6875efaa6cb4.bin.s3
2022-10-04 02:18   2097152  s3://test/s3loop-74bef337-92ea-4e83-938c-4865d4ee795a.bin.s3
 s3cmd get s3://test/s3loop-2fc8a8a9-6b93-4680-b0a3-6875efaa6cb4.bin.s3
download: 's3://test/s3loop-2fc8a8a9-6b93-4680-b0a3-6875efaa6cb4.bin.s3' -> './s3loop-2fc8a8a9-6b93-4680-b0a3-6875efaa6cb4.bin.s3'  [1 of 1]
 2097152 of 2097152   100% in 6s   307.83 KB/s  done

*** with new RGW servers - Pacific ***
 s3cmd ls s3://test
 (no results)
 s3cmd get s3://test/s3loop-2fc8a8a9-6b93-4680-b0a3-6875efaa6cb4.bin.s3
download: 's3://test/s3loop-2fc8a8a9-6b93-4680-b0a3-6875efaa6cb4.bin.s3' -> './s3loop-2fc8a8a9-6b93-4680-b0a3-6875efaa6cb4.bin.s3'  [1 of 1]
ERROR: Download of './s3loop-2fc8a8a9-6b93-4680-b0a3-6875efaa6cb4.bin.s3' failed (Reason: 404 (NoSuchBucket))
ERROR: S3 error: 404 (NoSuchBucket)


I suspect this behavior reflects that the old servers and the new servers
are in different zones...but I can't «see» the other zone configuration.

Thanks in advance.

-- 
Federico Lazcano
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why is the disk usage much larger than the available space displayed by the `df` command after disabling ext4 journal?

2022-10-12 Thread 郑亮
Hi Ilya,
Thanks for your quick response!
As mentioned in the ceph-csi ticket [1], the behaviour is normal when I do the same
on a loop device or on an rbd image that was created through a k8s PVC but not
attached to a pod.
[1] https://github.com/ceph/ceph-csi/issues/3424
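
For anyone who wants to repeat the loop-device check, it looks roughly like this
(a sketch; paths and sizes are arbitrary):

truncate -s 50G /tmp/ext4-test.img
losetup -f --show /tmp/ext4-test.img        # prints the loop device, e.g. /dev/loop0
mkfs.ext4 /dev/loop0
tune2fs -O "^has_journal" /dev/loop0
e2fsck -f /dev/loop0
mount /dev/loop0 /mnt/test && df -hT /mnt/test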

Best Regards,
Liang Zheng

Ilya Dryomov  wrote on Wednesday, 12 October 2022 at 19:15:

> On Wed, Oct 12, 2022 at 9:37 AM 郑亮  wrote:
> >
> > Hi all,
> > I have create a pod using rbd image as backend storage, then map rbd
> image
> > to local block device, and mount it with ext4 filesystem. The `df`
> > displays the disk usage much larger than the available space displayed
> > after disabling ext4 journal. The following is the steps to reproduce,
> > thanks in advance.
> >
> > Environment details
> >
> >- Image/version of Ceph CSI driver : cephcsi:v3.5.1
> >- Kernel version :
> >
> > 🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) uname -a
> > Linux k1 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020
> > x86_64 x86_64 x86_64 GNU/Linux
> >
> >
> >- Mounter used for mounting PVC (for cephFS its fuse or kernel. for
> rbd
> >its
> >krbd or rbd-nbd) : krbd
> >- Kubernetes cluster version :
> >
> > 🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) kubectl version
> > Client Version: version.Info{Major:"1", Minor:"22",
> > GitVersion:"v1.22.7",
> > GitCommit:"b56e432f2191419647a6a13b9f5867801850f969",
> > GitTreeState:"clean", BuildDate:"2022-02-16T11:50:27Z",
> > GoVersion:"go1.16.14", Compiler:"gc", Platform:"linux/amd64"}
> > Server Version: version.Info{Major:"1", Minor:"22",
> > GitVersion:"v1.22.7",
> > GitCommit:"b56e432f2191419647a6a13b9f5867801850f969",
> > GitTreeState:"clean", BuildDate:"2022-02-16T11:43:55Z",
> > GoVersion:"go1.16.14", Compiler:"gc", Platform:"linux/amd64"}
> >
> >
> >- Ceph cluster version :
> >
> > 🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel)ceph --version
> > ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus
> (stable)
> >
> > Steps to reproduce
> >
> > Steps to reproduce the behavior:
> >
> >1. Create a storageclass with storageclass
> ><
> https://github.com/ceph/ceph-csi/blob/devel/examples/rbd/storageclass.yaml
> >
> >2. Then create pvc, and test pod like below
> >
> > ➜ /root ☞ cat csi-rbd/examples/pvc.yaml
> > ---
> > apiVersion: v1
> > kind: PersistentVolumeClaim
> > metadata:
> >   name: rbd-pvc
> > spec:
> >   accessModes:
> > - ReadWriteOnce
> >   resources:
> > requests:
> >   storage: 50Gi
> >   storageClassName: csi-rbd-sc
> >
> > 🍺 /root ☞cat pod.yaml
> > apiVersion: v1
> > kind: Pod
> > metadata:
> >   name: csi-rbd-demo-pod
> > spec:
> >   containers:
> > - name: web-server
> >   image: docker.io/library/nginx:latest
> >   volumeMounts:
> > - name: mypvc
> >   mountPath: /var/lib/www/html
> >   volumes:
> > - name: mypvc
> >   persistentVolumeClaim:
> > claimName: rbd-pvc
> > readOnly: false
> >
> >
> >1. The following steps are executed in ceph cluster
> >
> > 🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) rbd ls -p
> > pool-51312494-44b2-43bc-8ba1-9c4f5eda3287
> > csi-vol-ad0bba2a-49fc-11ed-8ab9-3a534777138b
> >
> > 🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) rbd map
> >
> pool-51312494-44b2-43bc-8ba1-9c4f5eda3287/csi-vol-ad0bba2a-49fc-11ed-8ab9-3a534777138b
> > /dev/rbd0🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) lsblk -f
> > NAMEFSTYPE  LABEL UUID
> >   MOUNTPOINT
> > sr0
> > vda
> > ├─vda1  xfs   a080444c-7927-49f7-b94f-e20f823bbc95
>  /boot
> > ├─vda2  LVM2_member   jDjk4o-AaZU-He1S-8t56-4YEY-ujTp-ozFrK5
> > │ ├─centos-root xfs   5e322b94-4141-4a15-ae29-4136ae9c2e15
>  /
> > │ └─centos-swap swap  d59f7992-9027-407a-84b3-ec69c3dadd4e
> > └─vda3  LVM2_member   Qn0c4t-Sf93-oIDr-e57o-XQ73-DsyG-pGI8X0
> >   └─centos-root xfs   5e322b94-4141-4a15-ae29-4136ae9c2e15
>  /
> > vdb
> > vdc
> > rbd0ext4  e381fa9f-9f94-43d1-8f3a-c2d90bc8de27
> >
> > 🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) mount /dev/rbd0
> > /mnt/ext4🍺 /root/go/src/ceph/ceph-csi ☞ git:(devel) df -hT | egrep
> > 'rbd|Type'
> > Filesystem  Type  Size  Used Avail Use% Mounted on
> > /dev/rbd0   ext4   49G   53M   49G   1% /mnt/ext4🍺
> > /root/go/src/ceph/ceph-csi ☞ git:(devel) umount /mnt/ext4🍺
> > /root/go/src/ceph/ceph-csi ☞ git:(devel) tune2fs -o
> > journal_data_writeback /dev/rbd0
> > tune2fs 1.46.5 (30-Dec-2021)🍺 /root/go/src/ceph/ceph-csi ☞
> > git:(devel) tune2fs -O "^has_journal" /dev/rbd0   * <= disable
> > ext4 journal*
> > tune2fs 1.46.5 (30-Dec-2021)🍺 /root/go/src/ceph/ceph-csi ☞
> > git:(devel) e2fsck -f  /dev/rbd0
> > e2fsck 1.46.5 (30-Dec-2021)
> > Pass 1: Checking inodes, blocks, and sizes
> > Pass 2: Checking directory structure
> > Pass 3: Checking directory connectivity
> > Pass 4: Checking reference counts
> > Pass 5: Checking group summary information
> > /dev/rbd0: 11/3276800 files (0.0% non-c

[ceph-users] Re: How to force PG merging in one step?

2022-10-12 Thread Frank Schilder
Hi Eugen.

> During recovery there's another factor involved 
> (osd_max_pg_per_osd_hard_ratio), the default is 3. I had to deal with 
> that a few months back when I got inactive PGs due to many chunks and 
> "only" a factor of 3. In that specific cluster I increased it to 5 and 
> didn't encounter inactive PGs anymore.

Yes, I looked at this as well and I remember cases where people got stuck with 
temporary PG numbers being too high. This is precisely why I wanted to see this 
warning. If it's off during recovery, the only way to notice that something is 
going wrong is when you hit the hard limit. But then it's too late.

I actually wanted to see this during recovery to have an early warning sign. I 
purposefully did not increase pg_num_max to 500 to make sure that warning shows 
up. I personally consider it really bad behaviour if recovery/rebalancing 
disables this warning. Recovery is the operation where exceeding a PG limit 
without knowing will hurt most.

Thanks for the heads up. Probably need to watch my * a bit more with certain 
things.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD crashes during upgrade mimic->octopus

2022-10-12 Thread Alexander E. Patrakov
Fri, 7 Oct 2022 at 19:50, Frank Schilder :
> For the interested future reader, we have subdivided 400G high-performance 
> SSDs into 4x100G OSDs for our FS meta data pool. The increased concurrency 
> improves performance a lot. But yes, we are on the edge. OMAP+META is almost 
> 50%.

Please be careful with that. In the past, I had to help a customer who
ran out of disk space on small SSD partitions. This has happened
because MONs keep a history of all OSD and PG maps until at least the
clean state. So, during a prolonged semi-outage (when the cluster is
not healthy) they will slowly creep and accumulate and eat disk space
- and the problematic part is that this creepage is replicated to
OSDs.
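
A simple way to keep an eye on that growth is to watch the monitor store itself,
e.g. (a sketch; the path is the default for package-based installs):

du -sh /var/lib/ceph/mon/ceph-*/store.db

Ceph also raises the MON_DISK_BIG health warning once the store grows past
mon_data_size_warn (15 GiB by default).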


-- 
Alexander E. Patrakov
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Leadership Team Meeting Minutes - 2022 Oct 12

2022-10-12 Thread Patrick Donnelly
Hello all,

Here's today's minutes:

- Update on OVH use by Ceph Foundation
- https://tracker.ceph.com/issues/57778 -- release doc and feature/bug
changes out of sync?
  + Create a ceph branch for docs, e.g. pacific-docs (updates with
each release; can be updated out-of-sync with release)
  + Maybe modify doc changes to release branches to include version
the change will be released in; difficulty is keeping the version
numbers accurate due to hotfixes.
  + Need to update readthedocs to use the new branch and update release process.
  + How to backport release notes? Statically edit release branch to
link to /latest? change readthedocs to no longer checkout main release
table. TODO: patrick
- Ceph Virtual 2022 https://github.com/ceph/ceph.io/pull/450
  + https://virtual-event-2022.ceph.io/en/community/events/2022/ceph-virtual/
- Question about k8s host recommendations re: cephfs kernel client
  + Experienced an outage because some k8s cluster (with cephfs pvcs)
was using kernel 5.16.13, which has a known null deref bug, fixed in
5.18. (the kernel came with Fedora CoreOS)
  + What is the recommended k8s host os in the rhel world for ceph kclients?


--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] rbd: Snapshot Only Permissions

2022-10-12 Thread Dan Poltawski
Hi All,

Is there any way to configure capabilities for a user to allow the client to 
*only* create/delete snapshots? I can't find anything which suggests this is 
possible on https://docs.ceph.com/en/latest/rados/operations/user-management/.

Context: I'm writing a script to automatically create and delete snapshots. 
Ideally I'd like to restrict the permissions for this user so it can't do 
anything else with rbd images and give it the least privileges possible.
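
For context, the script itself only needs the plain snapshot commands; the user in
the sketch below is a placeholder that still carries an ordinary pool-scoped rbd
profile, i.e. much broader than snapshot-only:

ceph auth get-or-create client.snap-script mon 'profile rbd' osd 'profile rbd pool=mypool'
rbd --id snap-script snap create mypool/myimage@nightly-20221012
rbd --id snap-script snap rm mypool/myimage@nightly-20221012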

thanks,
Dan





___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Iinfinite backfill loop + number of pgp groups stuck at wrong value

2022-10-12 Thread Nicola Mori
Thank you Frank for the insight. I'll need to study the details of all of this 
a bit more, but I certainly understand it a bit better now.


Nicola
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io