[ceph-users] [ Ceph ] - Downgrade path failure

2021-08-12 Thread Lokendra Rathour
Hi Team, We have installed the pacific release of Ceph using Ceph-Ansible. Now we are planning to downgrade the Ceph release from Pacific to Octopus. We have tried this but it fails with the error message "• stderr: 'Error EPERM: require_osd_release cannot be lowered once it has been set'" Is
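
For context, the flag named in the error can be inspected but not lowered. A minimal sketch of the commands involved (the downgrade attempt itself was driven by Ceph-Ansible; the direct CLI call below is only illustrative):

    # show the currently recorded minimum OSD release
    ceph osd dump | grep require_osd_release

    # trying to lower it after a Pacific upgrade is rejected by the monitors
    ceph osd require-osd-release octopus
    # Error EPERM: require_osd_release cannot be lowered once it has been set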

[ceph-users] Re: PSA: upgrading older clusters without CephFS

2021-08-12 Thread Alexandre Marangone
This part confuses me a bit: "If your cluster has not used CephFS since before the Jewel release". Can you clarify whether this applies only to clusters deployed before Jewel, or to any cluster, however recently deployed, that has not used CephFS? Thanks, Alex On Thu, Aug 5, 2021 at 8:44 PM Patrick Donnelly

[ceph-users] Re: Very slow I/O during rebalance - options to tune?

2021-08-12 Thread Nico Schottelius
Wow, thanks everyone for the amazing pointers - I've not seen the osd op queue settings so far! At the moment the cluster is configured with [18:49:55] server4.place6:~# ceph config dump WHO MASK LEVEL OPTION

[ceph-users] Re: Very slow I/O during rebalance - options to tune?

2021-08-12 Thread Frank Schilder
For reference, here are my settings:
osd  class:hdd       advanced  osd_recovery_sleep  0.05
osd  class:rbd_data  advanced  osd_recovery_sleep  0.025000
osd  class:rbd_meta  advanced  osd_recovery_sleep  0.002500
osd  class:ssd
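
For reference, per-device-class values like these can be set centrally; a minimal sketch using `ceph config set` with a class mask (the class names mirror the output above and may not exist on other clusters):

    ceph config set osd/class:hdd      osd_recovery_sleep 0.05
    ceph config set osd/class:rbd_data osd_recovery_sleep 0.025
    ceph config set osd/class:rbd_meta osd_recovery_sleep 0.0025
    ceph config dump | grep osd_recovery_sleep   # verify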

[ceph-users] Re: Very slow I/O during rebalance - options to tune?

2021-08-12 Thread Frank Schilder
> Wow, that is impressive and sounds opposite of what we see around here. Often rebalances directly and strongly impact client I/O. It might be the missing settings: osd_op_queue = wpq, osd_op_queue_cut_off = high. If the cluster comes from Kraken, these might be inherited with different
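
A minimal sketch of applying the two settings mentioned above through the central config (both are assumed to require an OSD restart to take effect):

    ceph config set osd osd_op_queue wpq
    ceph config set osd osd_op_queue_cut_off high
    # restart the OSDs afterwards, e.g. host by host, for the scheduler change to apply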

[ceph-users] Re: Discard / Trim does not shrink rbd image size when disk is partitioned

2021-08-12 Thread Ilya Dryomov
On Thu, Aug 12, 2021 at 5:03 PM Boris Behrens wrote: > Hi everybody, we just stumbled over a problem where the rbd image does not shrink when files are removed. This only happens when the rbd image is partitioned. * We tested it with centos8/ubuntu20.04 with ext4 and a gpt

[ceph-users] Re: Very slow I/O during rebalance - options to tune?

2021-08-12 Thread Nico Schottelius
Hey Frank, Frank Schilder writes: > The recovery_sleep options are the next choice to look at. Increase them and clients will get more I/O time slots. However, with your settings, I'm surprised clients are impacted at all. I usually leave the op-priority at its default and use

[ceph-users] Re: ceph osd continously fails

2021-08-12 Thread Wesley Dillingham
Can you send the results of "ceph daemon osd.0 status" and maybe do that for a couple of OSD IDs? You may need to target ones which are currently running. Respectfully, Wes Dillingham w...@wesdillingham.com LinkedIn On Wed, Aug 11, 2021 at 9:51
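
A small sketch of collecting that output for a few OSDs (run on the host that owns each OSD's admin socket; the IDs are placeholders):

    for id in 0 1 2; do
        echo "--- osd.$id ---"
        ceph daemon osd.$id status
    done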

[ceph-users] Re: Very slow I/O during rebalance - options to tune?

2021-08-12 Thread Steven Pine
Yes, osd_op_queue_cut_off makes a significant difference. And as mentioned, be sure to check your OSD recovery sleep settings; there are several, depending on your underlying drives: "osd_recovery_sleep": "0.00", "osd_recovery_sleep_hdd": "0.05", "osd_recovery_sleep_hybrid":
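
One way to check which of these values an OSD is actually using, as a sketch (osd.0 is a placeholder):

    ceph config show-with-defaults osd.0 | grep osd_recovery_sleep
    # or via the admin socket on the OSD's host:
    ceph daemon osd.0 config show | grep osd_recovery_sleep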

[ceph-users] Re: Discard / Trim does not shrink rbd image size when disk is partitioned

2021-08-12 Thread Eugen Block
Hi, have you checked 'rbd sparsify' to reclaim unused space? Quoting Boris Behrens: Hi everybody, we just stumbled over a problem where the rbd image does not shrink when files are removed. This only happens when the rbd image is partitioned. * We tested it with centos8/ubuntu20.04 with
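
For reference, a minimal sketch of the command Eugen mentions (pool and image names are placeholders; sparsify reclaims space for zeroed extents and can generate noticeable I/O):

    rbd sparsify mypool/myimage
    rbd du mypool/myimage    # compare provisioned vs. used size afterwards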

[ceph-users] Re: Very slow I/O during rebalance - options to tune?

2021-08-12 Thread Peter Lieven
On 12.08.21 at 17:25, Frank Schilder wrote: Wow, that is impressive and sounds opposite of what we see around here. Often rebalances directly and strongly impact client I/O. It might be the missing settings: osd_op_queue = wpq, osd_op_queue_cut_off = high. AFAIK, the default for

[ceph-users] Discard / Trim does not shrink rbd image size when disk is partitioned

2021-08-12 Thread Boris Behrens
Hi everybody, we just stumbled over a problem where the rbd image does not shrink when files are removed. This only happens when the rbd image is partitioned. * We tested it with centos8/ubuntu20.04 with ext4 and a gpt partition table (/boot and /) * the kvm device is virtio-scsi-pci with krbd
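
A small sketch of how the effect can be observed (names are placeholders; fstrim inside the guest issues the discards, and rbd du on the Ceph side shows whether space was actually reclaimed):

    # inside the VM: trim all mounted filesystems
    fstrim -av

    # on a Ceph admin/client node: compare provisioned vs. used size
    rbd du mypool/vm-disk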

[ceph-users] Re: [EXTERNAL] Re: OSDs flapping with "_open_alloc loaded 132 GiB in 2930776 extents available 113 GiB"

2021-08-12 Thread Dave Piper
Hi Igor, Just to update you on our progress. - We've not had another repro of this since switching to the bitmap allocator / upgrading to the latest Octopus release. I'll try to gather the full set of diags if we do see this again. - I think my issues with an empty /var/lib/ceph/osd/ceph-N/ folder
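
For context, a minimal sketch of the allocator switch referred to above (takes effect when the OSDs are restarted):

    ceph config set osd bluestore_allocator bitmap
    # restart the affected OSDs so they start using the new allocator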

[ceph-users] Re: Ceph Pacific mon is not starting after host reboot

2021-08-12 Thread David Orman
https://github.com/ceph/ceph/pull/42690 looks like it might be a fix, but it's pending review. On Thu, Aug 12, 2021 at 7:46 AM André Gemünd wrote: > We're seeing the same here with v16.2.5 on CentOS 8.3. Do you know of any progress? Best Greetings, André - On Aug 9, 2021 at

[ceph-users] Not able to reach quorum during update

2021-08-12 Thread Michael Wodniok
Hi, during an upgrade from Octopus (15.2.13) to Pacific (16.2.4) I was not able to complete the upgrade. I upgraded using the `ceph orch upgrade start` command. After one mon was upgraded, the second upgraded mon is no longer able to form a quorum, looping the following messages in its journal:
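
For reference, a minimal sketch of the orchestrator upgrade workflow being described (the target version is the one named in the post):

    ceph orch upgrade start --ceph-version 16.2.4
    ceph orch upgrade status    # watch progress
    ceph -s                     # keep an eye on mon quorum and health while mons are redeployed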

[ceph-users] Re: Ceph Pacific mon is not starting after host reboot

2021-08-12 Thread André Gemünd
We're seeing the same here with v16.2.5 on CentOS 8.3. Do you know of any progress? Best Greetings, André - On Aug 9, 2021 at 18:15, David Orman orma...@corenode.com wrote: > Hi, we are seeing very similar behavior on 16.2.5, and also have noticed that an undeploy/deploy cycle fixes

[ceph-users] Re: [EXTERNAL] Re: OSDs flapping with "_open_alloc loaded 132 GiB in 2930776 extents available 113 GiB"

2021-08-12 Thread Igor Fedotov
Hi Dave, thanks for the update. I'm curious whether reverting to the default allocator on the latest release would be OK as well. Please try it if possible. Thanks, Igor On 8/12/2021 2:00 PM, Dave Piper wrote: Hi Igor, Just to update you on our progress. - We've not had another repro of
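
A sketch of what reverting to the default allocator could look like (removing the override lets the release default apply again after an OSD restart):

    ceph config rm osd bluestore_allocator
    # restart the OSDs so they come back up with the default allocator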

[ceph-users] Re: Docker container snapshots accumulate until disk full failure?

2021-08-12 Thread Sebastian Knust
Dear Harry, `docker image prune -a` removes all dangling images as well as all images not referenced by any running container. I successfully used it in my setups to remove old versions. In RHEL/CentOS, podman is used and thus you should use `podman image prune -a` instead. HTH, Cheers
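
As a minimal sketch of running the cleanup non-interactively (the -f flag only skips the confirmation prompt):

    docker image prune -a -f
    # or, on podman-based hosts:
    podman image prune -a -f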