[ceph-users] Why is min_size of erasure pools set to k+1

2023-11-20 Thread Vladimir Brik
Could someone help me understand why it's a bad idea to set min_size of erasure-coded pools to k? From what I've read, the argument for k+1 is that if min_size is k and you lose an OSD during recovery after a failure of m OSDs, data will become unavailable. But how does setting min_size to k+1
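For reference, a minimal sketch of how to inspect and change these settings (the pool and profile names here are hypothetical, and 6 assumes a k=5, m=2 profile):

    # Show k and m for the pool's erasure-code profile
    ceph osd erasure-code-profile get myprofile
    # Show and change the pool's min_size
    ceph osd pool get ecpool min_size
    ceph osd pool set ecpool min_size 6    # k+1 for k=5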

[ceph-users] Re: OSD tries (and fails) to scrub the same PGs over and over

2023-07-21 Thread Vladimir Brik
> What's the cluster status? Is there recovery or backfilling going on? No. Everything is good except this PG is not getting scrubbed. Vlad On 7/21/23 01:41, Eugen Block wrote: Hi, what's the cluster status? Is there recovery or backfilling going on? Quoting Vladimir Brik:

[ceph-users] OSD tries (and fails) to scrub the same PGs over and over

2023-07-19 Thread Vladimir Brik
I have a PG that hasn't been scrubbed in over a month and not deep-scrubbed in over two months. I tried forcing with `ceph pg (deep-)scrub`, but with no success. Looking at the logs of that PG's primary OSD, it looks like every once in a while it attempts (and apparently fails) to scrub that
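A sketch of the kind of checks discussed in this thread (the PG id and OSD id are placeholders, and dump_scrubs depends on your release supporting that admin-socket command):

    # Last scrub timestamps for the PG
    ceph pg 43.2a query | grep -E 'last_(deep_)?scrub_stamp'
    # What the primary OSD has scheduled (run on the OSD's host)
    ceph daemon osd.12 dump_scrubs
    # Force another scrub attempt
    ceph pg deep-scrub 43.2a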

[ceph-users] Re: Enable Centralized Logging in Dashboard.

2023-05-17 Thread Vladimir Brik
How do I create a username and password that I could use to log in to Grafana? Vlad On 11/16/22 08:42, E Taka wrote: Thank you, Nizam. I wasn't aware that the Dashboard login is not the same as the Grafana login. Now I have access to the logfiles. On Wed., 16 Nov. 2022 at 15:06,
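With cephadm-deployed monitoring, one way to get a known Grafana admin login is to set the initial admin password in the grafana service spec; a sketch, assuming a recent release that supports this field (the password is a placeholder):

    cat > grafana.yaml <<'EOF'
    service_type: grafana
    spec:
      initial_admin_password: changeme
    EOF
    ceph orch apply -i grafana.yaml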

[ceph-users] Any issues with podman 4.2 and Quincy?

2023-02-13 Thread Vladimir Brik
Has anybody run into issues with Quincy and podman 4.2? The podman 4.x series is not mentioned in https://docs.ceph.com/en/quincy/cephadm/compatibility/ but podman 3.x is no longer available in Alma Linux. Vlad
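A quick, non-destructive way to see what cephadm makes of the host's container runtime (just a sketch of the checks, run on the host in question):

    podman --version
    cephadm check-host        # sanity-checks the container runtime and host config
    ceph orch host ls         # confirm the orchestrator still sees the host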

[ceph-users] Re: What happens when a DB/WAL device runs out of space?

2022-12-13 Thread Vladimir Brik
On 12/13/22 12:46, Janne Johansson wrote: On Tue, 13 Dec 2022 at 17:47, Vladimir Brik wrote: Hello I have a bunch of HDD OSDs with DB/WAL devices on SSD. If the current trends continue, the DB/WAL devices will become full before the HDDs completely fill up (e.g. a 50% full HDD has DB/WAL devi

[ceph-users] What happens when a DB/WAL device runs out of space?

2022-12-13 Thread Vladimir Brik
Hello I have a bunch of HDD OSDs with DB/WAL devices on SSD. If the current trends continue, the DB/WAL devices will become full before the HDDs completely fill up (e.g. a 50% full HDD has DB/WAL device that is about 65% full). Will anything terrible happen when DB/WAL devices fill up?
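When RocksDB no longer fits on the fast device, recent releases spill over to the slow (HDD) device and raise a health warning rather than failing outright; a sketch of how to watch for this (the OSD id is a placeholder):

    ceph health detail | grep -i spillover
    # Per-OSD view of DB usage on the fast device vs. the slow device
    ceph daemon osd.12 perf dump bluefs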

[ceph-users] Re: cephfs-top doesn't work

2022-10-05 Thread Vladimir Brik
e know? Thanks. On Tue, 19 Apr 2022 at 01:14, Vladimir Brik wrote: Does anybody know why cephfs-top may only display header lines (date, client types, metric names) but no actual data? When I run it, cephfs-top consumes quite a bit of
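For context, cephfs-top relies on the mgr "stats" module and, by default, a client.fstop user; a sketch of the usual setup (the exact caps are an assumption, check the docs for your release):

    ceph mgr module enable stats
    ceph auth get-or-create client.fstop mon 'allow r' mds 'allow r' osd 'allow r' mgr 'allow r' \
        > /etc/ceph/ceph.client.fstop.keyring
    cephfs-top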

[ceph-users] How to report a potential security issue

2022-10-04 Thread Vladimir Brik
Hello I think I may have run into a bug in cephfs that has security implications. I am not sure it's a good idea to send the details to the public mailing list or create a public ticket for it. How should I proceed? Thanks Vlad

[ceph-users] How to orch apply single site rgw with custom front-end

2021-06-15 Thread Vladimir Brik
Hello How can I use ceph orch apply to deploy single-site rgw daemons with a custom frontend configuration? Basically, I have three servers in a DNS round-robin, each running a 15.2.12 rgw daemon with this configuration: rgw_frontends = civetweb num_threads=5000 port=443s
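A sketch of what the cephadm equivalent might look like (hostnames and service id are placeholders; SSL certificate handling and frontend thread tuning are left out, and spec field names vary somewhat between releases):

    cat > rgw.yaml <<'EOF'
    service_type: rgw
    service_id: mysite
    placement:
      hosts:
        - host1
        - host2
        - host3
    spec:
      rgw_frontend_port: 443
      ssl: true
    EOF
    ceph orch apply -i rgw.yaml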

[ceph-users] Is it safe to mix Octopus and Pacific mons?

2021-06-09 Thread Vladimir Brik
Hello My attempt to upgrade from Octopus to Pacific ran into issues, and I currently have one 16.2.4 mon and two 15.2.12 mons. Is it safe to run the cluster like this, or should I shut down the 16.2.4 mon until I figure out what to do next with the upgrade? Thanks, Vlad
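A sketch of commands for checking where a stuck orchestrated upgrade stands before deciding to stop a daemon:

    ceph versions              # which daemons are running which release
    ceph orch upgrade status   # is the upgrade still in progress, and where
    ceph orch upgrade pause    # optionally pause it while investigating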

[ceph-users] Upgrade to 16 failed: wrong /sys/fs/cgroup path

2021-06-09 Thread Vladimir Brik
Hello My upgrade from 15.2.12 to 16.2.4 is stuck because a mon daemon failed to upgrade. Systemctl status of the mon showed this error: Error: open /sys/fs/cgroup/cpuacct,cpu/system.slice/... It turns out there is no /sys/fs/cgroup/cpuacct,cpu directory on my system. Instead, I have
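This kind of error usually comes down to the host using the cgroup v2 layout while the container runtime expects v1 paths; a quick, non-destructive way to check (a sketch):

    stat -fc %T /sys/fs/cgroup    # "cgroup2fs" => unified cgroup v2, "tmpfs" => v1
    podman --version
    podman info | grep -i cgroup  # cgroup manager/version podman was built to use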

[ceph-users] Stray hosts and daemons

2021-05-20 Thread Vladimir Brik
I am not sure how to interpret CEPHADM_STRAY_HOST and CEPHADM_STRAY_DAEMON warnings. They seem to be inconsistent. I converted my cluster to be managed by cephadm by adopting the mons and all other daemons, and they show up in ceph orch ps, but ceph health says the mons are stray: [WRN]
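A sketch of how this is typically investigated: compare what cephadm manages against what the cluster reports, and adopt anything that was missed (the hostname below is a placeholder):

    ceph health detail                 # lists the exact stray host/daemon names
    ceph orch host ls                  # hosts cephadm knows about
    ceph orch ps --daemon-type mon     # mons cephadm manages
    # On the host in question, adopt a legacy daemon cephadm missed:
    cephadm adopt --style legacy --name mon.myhost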

[ceph-users] Re: Balancer not balancing (14.2.7, crush-compat)

2020-04-09 Thread Vladimir Brik
One possibly relevant detail: the cluster has 8 nodes, and the new pool I created uses k5 m2 erasure coding. Vlad On 4/9/20 11:28 AM, Vladimir Brik wrote: Hello I am running ceph 14.2.7 with balancer in crush-compat mode (needed because of old clients), but it doesn't seem to be doing

[ceph-users] Balancer not balancing (14.2.7, crush-compat)

2020-04-09 Thread Vladimir Brik
Hello I am running ceph 14.2.7 with the balancer in crush-compat mode (needed because of old clients), but it doesn't seem to be doing anything. It used to work in the past. I am not sure what changed. I created a big pool, ~285TB stored, and it doesn't look like it ever got balanced: pool 43
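A sketch of the usual checks when the balancer appears idle:

    ceph balancer status        # mode, whether it is active, any queued plans
    ceph balancer eval          # current distribution score (lower is better)
    ceph balancer mode crush-compat
    ceph balancer on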

[ceph-users] A fast tool to export/copy a pool

2020-03-09 Thread Vladimir Brik
I am wondering if there exists a tool, faster than "rados export", that can copy and restore read-only pools (to/from another pool or file system). It looks like "rados export" is very slow because it is single-threaded (as best I can tell, --workers doesn't make a difference). Vlad
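One commonly suggested workaround is to drive the copy object-by-object in parallel; a rough sketch (pool names and worker count are placeholders, and note this does not preserve xattrs or omap data, so it is not suitable for RGW index/metadata pools):

    # Copy every object from srcpool to dstpool with 8 parallel workers
    rados -p srcpool ls | xargs -P 8 -I{} sh -c \
        'rados -p srcpool get "{}" /dev/stdout | rados -p dstpool put "{}" /dev/stdin'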

[ceph-users] Migrating data to a more efficient EC pool

2020-02-24 Thread Vladimir Brik
Hello I have ~300TB of data in a default.rgw.buckets.data k2m2 pool and I would like to move it to a new k5m2 pool. I found instructions using cache tiering[1], but they come with a vague, scary warning, and it looks like EC-EC tiering may not even be possible [2] (is that still the case?). Can
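For the destination pool itself, the usual first step looks something like this (profile and pool names, PG count, and failure domain are placeholders):

    ceph osd erasure-code-profile set ec52 k=5 m=2 crush-failure-domain=host
    ceph osd pool create new.rgw.buckets.data 256 256 erasure ec52
    ceph osd pool application enable new.rgw.buckets.data rgw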