[ceph-users] Re: Upgraded 16.2.14 to 16.2.15

2024-03-04 Thread Eugen Block

Hi,


1. RocksDB options, which I provided to each mon via their configuration
files, got overwritten during mon redeployment and I had to re-add
mon_rocksdb_options back.


IIRC, you didn't use extra_entrypoint_args for that option but
added it directly to the container's unit.run file. So it's expected
that it gets removed after an update. If you want it to persist
across container updates, you should consider using extra_entrypoint_args:


cat mon.yaml
service_type: mon
service_name: mon
placement:
  hosts:
  - host1
  - host2
  - host3
extra_entrypoint_args:
  - '--mon-rocksdb-options=write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression,max_background_jobs=4,max_subcompactions=2'
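
The spec can then be applied with the orchestrator, e.g. (assuming the file above is saved as mon.yaml):

ceph orch apply -i mon.yaml
ceph orch ls mon --export   # verify that the args ended up in the service spec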


Regards,
Eugen

Quoting Zakhar Kirpichenko:


Hi,

I have upgraded my test and production cephadm-managed clusters from
16.2.14 to 16.2.15. The upgrade was smooth and completed without issues.
There were a few things which I noticed after each upgrade:

1. RocksDB options, which I had provided to each mon via its configuration
file, got overwritten during mon redeployment, and I had to re-add
mon_rocksdb_options.

2. The monitor debug_rocksdb option was silently reset to the default of 4/5;
I had to set it back to 1/5.

3. For roughly 2 hours after the upgrade, despite the clusters being
healthy and operating normally, all monitors would run manual compactions
very often and write to disks at very high rates. For example, production
monitors had their rocksdb:low0 thread write to store.db:

monitors without RocksDB compression: ~8 GB/5 min, or ~96 GB/hour;
monitors with RocksDB compression: ~1.5 GB/5 min, or ~18 GB/hour.

After roughly 2 hours with no changes to the cluster, the write rates
dropped to ~0.4-0.6 GB/5 min and ~120 MB/5 min, respectively. The reason for
the frequent manual compactions and high write rates wasn't immediately
apparent.

4. Crash deployment broke the ownership of /var/lib/ceph/FSID/crash and
/var/lib/ceph/FSID/crash/posted, even though I had already fixed it manually
after the upgrade to 16.2.14, which had broken it as well.

5. Mgr RAM usage appears to be increasing at a slower rate than it did with
16.2.14, although it's too early to tell whether the issue with mgrs
randomly consuming all RAM and getting OOM-killed has been fixed - with
16.2.14 this would normally take several days.

Overall, things look good. Thanks to the Ceph team for this release!

Zakhar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io





[ceph-users] Help with deep scrub warnings

2024-03-04 Thread Nicola Mori

Dear Ceph users,

in order to reduce the deep scrub load on my cluster, I set the deep
scrub interval to 2 weeks and tuned other parameters as follows:


# ceph config get osd osd_deep_scrub_interval
1209600.00
# ceph config get osd osd_scrub_sleep
0.10
# ceph config get osd osd_scrub_load_threshold
0.30
# ceph config get osd osd_deep_scrub_randomize_ratio
0.10
# ceph config get osd osd_scrub_min_interval
259200.00
# ceph config get osd osd_scrub_max_interval
1209600.00

To my admittedly limited knowledge of Ceph's deep scrub procedures, these
settings should spread the deep scrub operations over two weeks instead of
the default one week, lowering the scrub frequency and the related load.
But I'm currently getting warnings like:


[WRN] PG_NOT_DEEP_SCRUBBED: 56 pgs not deep-scrubbed in time
pg 3.1e1 not deep-scrubbed since 2024-02-22T00:22:55.296213+
pg 3.1d9 not deep-scrubbed since 2024-02-20T03:41:25.461002+
pg 3.1d5 not deep-scrubbed since 2024-02-20T09:52:57.334058+
pg 3.1cb not deep-scrubbed since 2024-02-20T03:30:40.510979+
. . .

I don't understand the first one: since the deep scrub interval should
be two weeks, I don't expect warnings for PGs which have been
deep-scrubbed less than 14 days ago (at the moment I'm writing it's Tue
Mar  5 07:39:07 UTC 2024).
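
A rough way to check what the mon/mgr side uses and which PGs are the oldest (a sketch only; option and JSON field names may vary slightly between releases, and jq is assumed to be installed):

ceph config get mon mon_warn_pg_not_deep_scrubbed_ratio   # default 0.75
ceph config get mgr osd_deep_scrub_interval               # interval the health check sees
ceph pg dump pgs -f json 2>/dev/null \
  | jq -r '.pg_stats[] | [.pgid, .last_deep_scrub_stamp] | @tsv' \
  | sort -k2 | head -20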


Moreover, I don't understand why deep scrubbing is lagging behind for so
many PGs. Is there something wrong with my settings?


Thanks in advance for any help,

Nicola


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Upgraded 16.2.14 to 16.2.15

2024-03-04 Thread Zakhar Kirpichenko
Hi,

I have upgraded my test and production cephadm-managed clusters from
16.2.14 to 16.2.15. The upgrade was smooth and completed without issues.
There were a few things which I noticed after each upgrade:

1. RocksDB options, which I had provided to each mon via its configuration
file, got overwritten during mon redeployment, and I had to re-add
mon_rocksdb_options.

2. The monitor debug_rocksdb option was silently reset to the default of 4/5;
I had to set it back to 1/5.
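
One way that might make this survive future redeployments is to keep the setting in the cluster configuration database instead of in per-daemon config files, for example:

ceph config set mon debug_rocksdb 1/5
ceph config get mon debug_rocksdb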

3. For roughly 2 hours after the upgrade, despite the clusters being
healthy and operating normally, all monitors would run manual compactions
very often and write to disks at very high rates. For example, production
monitors had their rocksdb:low0 thread write to store.db:

monitors without RocksDB compression: ~8 GB/5 min, or ~96 GB/hour;
monitors with RocksDB compression: ~1.5 GB/5 min, or ~18 GB/hour.

After roughly 2 hours with no changes to the cluster, the write rates
dropped to ~0.4-0.6 GB/5 min and ~120 MB/5 min, respectively. The reason for
the frequent manual compactions and high write rates wasn't immediately
apparent.

4. Crash deployment broke the ownership of /var/lib/ceph/FSID/crash and
/var/lib/ceph/FSID/crash/posted, even though I had already fixed it manually
after the upgrade to 16.2.14, which had broken it as well.

5. Mgr RAM usage appears to be increasing at a slower rate than it did with
16.2.14, although it's too early to tell whether the issue with mgrs
randomly consuming all RAM and getting OOM-killed has been fixed - with
16.2.14 this would normally take several days.

Overall, things look good. Thanks to the Ceph team for this release!

Zakhar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] [RGW] Restrict a subuser to access only one specific bucket

2024-03-04 Thread Huy Nguyen
Hi community,
I have a user that owns some buckets. I want to create a subuser that has 
permission to access only one bucket. What can I do to achieve this?

Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] debian-reef_OLD?

2024-03-04 Thread Daniel Brown

I likely missed an announcement, and if so, please forgive me. 


I'm seeing some failures when running apt on a cluster of Ubuntu machines;
it looks like a directory has changed on https://download.ceph.com/


Was: 

debian-reef/

Now appears to be:

debian-reef_OLD/

  
Was reef pulled? 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs not balanced

2024-03-04 Thread Joshua Baergen
The balancer will operate on all pools unless otherwise specified.
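
For reference, this can be checked, and optionally restricted, with the balancer pool commands (the pool name below is only an example):

ceph balancer status
ceph balancer pool ls        # empty output means the balancer considers all pools
ceph balancer pool add rbd   # optional: limit balancing to a specific pool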

Josh

On Mon, Mar 4, 2024 at 1:12 PM Cedric  wrote:
>
> Does the balancer have any pools enabled? ("ceph balancer pool ls")
>
> Actually, I am wondering whether the balancer does anything when no pools
> are added.
>
>
>
> On Mon, Mar 4, 2024, 11:30 Ml Ml  wrote:
>
> > Hello,
> >
> > i wonder why my autobalancer is not working here:
> >
> > root@ceph01:~# ceph -s
> >   cluster:
> > id: 5436dd5d-83d4-4dc8-a93b-60ab5db145df
> > health: HEALTH_ERR
> > 1 backfillfull osd(s)
> > 1 full osd(s)
> > 1 nearfull osd(s)
> > 4 pool(s) full
> >
> > => osd.17 was too full (92% or something like that)
> >
> > root@ceph01:~# ceph osd df tree
> > ID   CLASS  WEIGHT REWEIGHT  SIZE ... %USE  ... PGS TYPE NAME
> > -25 209.50084 -  213 TiB  ... 69.56 ...   - datacenter
> > xxx-dc-root
> > -19  84.59369 -   86 TiB  ... 56.97 ...   - rack
> > RZ1.Reihe4.R10
> >  -3  35.49313 -   37 TiB  ... 57.88 ...   - host
> > ceph02
> >   2hdd1.7   1.0  1.7 TiB  ... 58.77 ...  44
> >  osd.2
> >   3hdd1.0   1.0  2.7 TiB  ... 22.14 ...  25
> >  osd.3
> >   7hdd2.5   1.0  2.7 TiB  ... 58.84 ...  70
> >  osd.7
> >   9hdd9.5   1.0  9.5 TiB  ... 63.07 ... 268
> >  osd.9
> >  13hdd2.67029   1.0  2.7 TiB  ... 53.59 ...  65
> >  osd.13
> >  16hdd2.8   1.0  2.7 TiB  ... 59.35 ...  71
> >  osd.16
> >  19hdd1.7   1.0  1.7 TiB  ... 48.98 ...  37
> >  osd.19
> >  23hdd2.38419   1.0  2.4 TiB  ... 59.33 ...  64
> >  osd.23
> >  24hdd1.3   1.0  1.7 TiB  ... 51.23 ...  39
> >  osd.24
> >  28hdd3.63869   1.0  3.6 TiB  ... 64.17 ... 104
> >  osd.28
> >  31hdd2.7   1.0  2.7 TiB  ... 64.73 ...  76
> >  osd.31
> >  32hdd3.3   1.0  3.3 TiB  ... 67.28 ... 101
> >  osd.32
> >  -9  22.88817 -   23 TiB  ... 56.96 ...   - host
> > ceph06
> >  35hdd7.15259   1.0  7.2 TiB  ... 55.71 ... 182
> >  osd.35
> >  36hdd5.24519   1.0  5.2 TiB  ... 53.75 ... 128
> >  osd.36
> >  45hdd5.24519   1.0  5.2 TiB  ... 60.91 ... 144
> >  osd.45
> >  48hdd5.24519   1.0  5.2 TiB  ... 57.94 ... 139
> >  osd.48
> > -17  26.21239 -   26 TiB  ... 55.67 ...   - host
> > ceph08
> >  37hdd6.67569   1.0  6.7 TiB  ... 58.17 ... 174
> >  osd.37
> >  40hdd9.53670   1.0  9.5 TiB  ... 58.54 ... 250
> >  osd.40
> >  46hdd5.0   1.0  5.0 TiB  ... 52.39 ... 116
> >  osd.46
> >  47hdd5.0   1.0  5.0 TiB  ... 50.05 ... 112
> >  osd.47
> > -20  59.11053 -   60 TiB  ... 82.47 ...   - rack
> > RZ1.Reihe4.R9
> >  -4  23.09996 -   24 TiB  ... 79.92 ...   - host
> > ceph03
> >   5hdd1.7   0.75006  1.7 TiB  ... 87.24 ...  66
> >  osd.5
> >   6hdd1.7   0.44998  1.7 TiB  ... 47.30 ...  36
> >  osd.6
> >  10hdd2.7   0.85004  2.7 TiB  ... 83.23 ... 100
> >  osd.10
> >  15hdd2.7   0.75006  2.7 TiB  ... 74.26 ...  88
> >  osd.15
> >  17hdd0.5   0.85004  1.6 TiB  ... 91.44 ...  67
> >  osd.17
> >  20hdd2.0   0.85004  1.7 TiB  ... 88.41 ...  68
> >  osd.20
> >  21hdd2.7   0.75006  2.7 TiB  ... 77.25 ...  91
> >  osd.21
> >  25hdd1.7   0.90002  1.7 TiB  ... 78.31 ...  60
> >  osd.25
> >  26hdd2.7   1.0  2.7 TiB  ... 82.75 ...  99
> >  osd.26
> >  27hdd2.7   0.90002  2.7 TiB  ... 84.26 ... 101
> >  osd.27
> >  63hdd1.8   0.90002  1.7 TiB  ... 84.15 ...  65
> >  osd.63
> > -13  36.01057 -   36 TiB  ... 84.12 ...   - host
> > ceph05
> >  11hdd7.15259   0.90002  7.2 TiB  ... 85.45 ... 273
> >  osd.11
> >  39hdd7.2   0.85004  7.2 TiB  ... 80.90 ... 257
> >  osd.39
> >  41hdd7.2   0.75006  7.2 TiB  ... 74.95 ... 239
> >  osd.41
> >  42hdd9.0   1.0  9.5 TiB  ... 92.00 ... 392
> >  osd.42
> >  43hdd5.45799   1.0  5.5 TiB  ... 84.84 ... 207
> >  osd.43
> > -21  65.79662 -   66 TiB  ... 74.29 ...   - rack
> > RZ3.Reihe3.R10
> >  -2  28.49664 -   29 TiB  ... 74.79 ...   - host
> > ceph01
> >   0hdd2.7   1.0  2.7 TiB  ... 73.82 ...  88
> >  osd.0
> >   1hdd3.63869   1.0  3.6 TiB  ... 73.47 ... 121
> >  osd.1
> >   4hdd2.7   1.0  2.7 TiB  ... 74.63 ...  89
> >  osd.4
> >   8hdd2.7   1.0  2.7 TiB  ... 77.10 ...  92
> >  osd.8
> >  12hdd2.7   1.0  2.7 TiB  ... 78.76 ...  94
> >  osd.12
> >  14hdd5.45799   1.0  5.5 TiB  ... 78.86 ... 193
> >  osd.14
> >  18hdd1.8   1.0  2.7 TiB  ... 63.79 ...  76
> >  osd.18
> >  22hdd

[ceph-users] Re: OSDs not balanced

2024-03-04 Thread Cedric
Does the balancer have any pools enabled? ("ceph balancer pool ls")

Actually, I am wondering whether the balancer does anything when no pools are
added.



On Mon, Mar 4, 2024, 11:30 Ml Ml  wrote:

> Hello,
>
> i wonder why my autobalancer is not working here:
>
> root@ceph01:~# ceph -s
>   cluster:
> id: 5436dd5d-83d4-4dc8-a93b-60ab5db145df
> health: HEALTH_ERR
> 1 backfillfull osd(s)
> 1 full osd(s)
> 1 nearfull osd(s)
> 4 pool(s) full
>
> => osd.17 was too full (92% or something like that)
>
> root@ceph01:~# ceph osd df tree
> ID   CLASS  WEIGHT REWEIGHT  SIZE ... %USE  ... PGS TYPE NAME
> -25 209.50084 -  213 TiB  ... 69.56 ...   - datacenter
> xxx-dc-root
> -19  84.59369 -   86 TiB  ... 56.97 ...   - rack
> RZ1.Reihe4.R10
>  -3  35.49313 -   37 TiB  ... 57.88 ...   - host
> ceph02
>   2hdd1.7   1.0  1.7 TiB  ... 58.77 ...  44
>  osd.2
>   3hdd1.0   1.0  2.7 TiB  ... 22.14 ...  25
>  osd.3
>   7hdd2.5   1.0  2.7 TiB  ... 58.84 ...  70
>  osd.7
>   9hdd9.5   1.0  9.5 TiB  ... 63.07 ... 268
>  osd.9
>  13hdd2.67029   1.0  2.7 TiB  ... 53.59 ...  65
>  osd.13
>  16hdd2.8   1.0  2.7 TiB  ... 59.35 ...  71
>  osd.16
>  19hdd1.7   1.0  1.7 TiB  ... 48.98 ...  37
>  osd.19
>  23hdd2.38419   1.0  2.4 TiB  ... 59.33 ...  64
>  osd.23
>  24hdd1.3   1.0  1.7 TiB  ... 51.23 ...  39
>  osd.24
>  28hdd3.63869   1.0  3.6 TiB  ... 64.17 ... 104
>  osd.28
>  31hdd2.7   1.0  2.7 TiB  ... 64.73 ...  76
>  osd.31
>  32hdd3.3   1.0  3.3 TiB  ... 67.28 ... 101
>  osd.32
>  -9  22.88817 -   23 TiB  ... 56.96 ...   - host
> ceph06
>  35hdd7.15259   1.0  7.2 TiB  ... 55.71 ... 182
>  osd.35
>  36hdd5.24519   1.0  5.2 TiB  ... 53.75 ... 128
>  osd.36
>  45hdd5.24519   1.0  5.2 TiB  ... 60.91 ... 144
>  osd.45
>  48hdd5.24519   1.0  5.2 TiB  ... 57.94 ... 139
>  osd.48
> -17  26.21239 -   26 TiB  ... 55.67 ...   - host
> ceph08
>  37hdd6.67569   1.0  6.7 TiB  ... 58.17 ... 174
>  osd.37
>  40hdd9.53670   1.0  9.5 TiB  ... 58.54 ... 250
>  osd.40
>  46hdd5.0   1.0  5.0 TiB  ... 52.39 ... 116
>  osd.46
>  47hdd5.0   1.0  5.0 TiB  ... 50.05 ... 112
>  osd.47
> -20  59.11053 -   60 TiB  ... 82.47 ...   - rack
> RZ1.Reihe4.R9
>  -4  23.09996 -   24 TiB  ... 79.92 ...   - host
> ceph03
>   5hdd1.7   0.75006  1.7 TiB  ... 87.24 ...  66
>  osd.5
>   6hdd1.7   0.44998  1.7 TiB  ... 47.30 ...  36
>  osd.6
>  10hdd2.7   0.85004  2.7 TiB  ... 83.23 ... 100
>  osd.10
>  15hdd2.7   0.75006  2.7 TiB  ... 74.26 ...  88
>  osd.15
>  17hdd0.5   0.85004  1.6 TiB  ... 91.44 ...  67
>  osd.17
>  20hdd2.0   0.85004  1.7 TiB  ... 88.41 ...  68
>  osd.20
>  21hdd2.7   0.75006  2.7 TiB  ... 77.25 ...  91
>  osd.21
>  25hdd1.7   0.90002  1.7 TiB  ... 78.31 ...  60
>  osd.25
>  26hdd2.7   1.0  2.7 TiB  ... 82.75 ...  99
>  osd.26
>  27hdd2.7   0.90002  2.7 TiB  ... 84.26 ... 101
>  osd.27
>  63hdd1.8   0.90002  1.7 TiB  ... 84.15 ...  65
>  osd.63
> -13  36.01057 -   36 TiB  ... 84.12 ...   - host
> ceph05
>  11hdd7.15259   0.90002  7.2 TiB  ... 85.45 ... 273
>  osd.11
>  39hdd7.2   0.85004  7.2 TiB  ... 80.90 ... 257
>  osd.39
>  41hdd7.2   0.75006  7.2 TiB  ... 74.95 ... 239
>  osd.41
>  42hdd9.0   1.0  9.5 TiB  ... 92.00 ... 392
>  osd.42
>  43hdd5.45799   1.0  5.5 TiB  ... 84.84 ... 207
>  osd.43
> -21  65.79662 -   66 TiB  ... 74.29 ...   - rack
> RZ3.Reihe3.R10
>  -2  28.49664 -   29 TiB  ... 74.79 ...   - host
> ceph01
>   0hdd2.7   1.0  2.7 TiB  ... 73.82 ...  88
>  osd.0
>   1hdd3.63869   1.0  3.6 TiB  ... 73.47 ... 121
>  osd.1
>   4hdd2.7   1.0  2.7 TiB  ... 74.63 ...  89
>  osd.4
>   8hdd2.7   1.0  2.7 TiB  ... 77.10 ...  92
>  osd.8
>  12hdd2.7   1.0  2.7 TiB  ... 78.76 ...  94
>  osd.12
>  14hdd5.45799   1.0  5.5 TiB  ... 78.86 ... 193
>  osd.14
>  18hdd1.8   1.0  2.7 TiB  ... 63.79 ...  76
>  osd.18
>  22hdd1.7   1.0  1.7 TiB  ... 74.85 ...  57
>  osd.22
>  30hdd1.7   1.0  1.7 TiB  ... 76.34 ...  59
>  osd.30
>  64hdd3.2   1.0  3.3 TiB  ... 73.48 ... 110
>  osd.64
> -11  12.3 -   12 TiB  ... 73.40 ...   - host
> ceph04
>  34hdd5.2   1.0  5.2 TiB  ... 72.81 ... 171
>  osd.34
>  44hdd7.2   1.0 

[ceph-users] Re: Performance improvement suggestion

2024-03-04 Thread Mark Nelson


On 3/4/24 08:40, Maged Mokhtar wrote:


On 04/03/2024 15:37, Frank Schilder wrote:
Fast write enabled would mean that the primary OSD sends #size 
copies to the
entire active set (including itself) in parallel and sends an ACK 
to the
client as soon as min_size ACKs have been received from the peers 
(including
itself). In this way, one can tolerate (size-min_size) slow(er) 
OSDs (slow
for whatever reason) without suffering performance penalties 
immediately
(only after too many requests started piling up, which will show 
as a slow

requests warning).

What happens if an error occurs on the slowest OSD after the
min_size ACK has already been sent to the client?


This should not be different from what exists today... unless, of course,
the error happens on the local/primary OSD.
Can this be addressed with reasonable effort? I don't expect this to 
be a quick-fix and it should be tested. However, beating the 
tail-latency statistics with the extra redundancy should be worth it. 
I observe fluctuations of latencies, OSDs become randomly slow for 
whatever reason for short time intervals and then return to normal.


A reason for this could be DB compaction. I think during compaction 
latency tends to spike.


A fast-write option would effectively remove the impact of this.

Best regards and thanks for considering this!


I think this is something the RADOS devs need to weigh in on. It does sound
worth investigating. It is not just for cases with DB compaction but, more
importantly, for the normal (happy) IO path, where it will have the most
impact.



Typically a L0->L1 compaction will have two primary effects:


1) It will cause large IO read/write traffic to the disk potentially 
impacting other IO taking place if the disk is already saturated.


2) It will block memtable flushes until the compaction finishes. This 
means that more and more data will accumulate in the memtables/WAL which 
can trigger throttling and eventually stalls if you run out of buffer 
space.  By default, we allow up to 1GB of writes to WAL/memtables before 
writes are fully stalled, but RocksDB will typically throttle writes
before you get to that point.  It's possible a larger buffer may allow 
you to absorb traffic spikes for longer at the expense of more disk and 
memory usage.  Ultimately though, if you are hitting throttling, it 
means that the DB can't keep up with the WAL ingestion rate.
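
For reference, that memtable/WAL budget comes from the RocksDB options embedded in BlueStore. A rough way to inspect it, and carefully enlarge it, might look like this (a sketch only; the values are purely illustrative, the _annex option exists only in recent releases, and OSDs need a restart to pick up the change):

ceph config get osd bluestore_rocksdb_options
ceph config set osd bluestore_rocksdb_options_annex \
    'write_buffer_size=536870912,max_write_buffer_number=4'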



Mark





___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


--
Best Regards,
Mark Nelson
Head of Research and Development

Clyso GmbH
p: +49 89 21552391 12 | a: Minnesota, USA
w: https://clyso.com | e: mark.nel...@clyso.com

We are hiring: https://www.clyso.com/jobs/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs not balanced

2024-03-04 Thread Anthony D'Atri


> I think the short answer is "because you have so wildly varying sizes
> both for drives and hosts".

Arguably OP's OSDs *are* balanced in that their PGs are roughly in line with 
their sizes, but indeed the size disparity is problematic in some ways.

Notably, the 500GB OSD should just be removed.  I think balancing doesn't
account for WAL/DB/other overhead, so it won't be accurately accounted for and
can't hold much data anyway.

This cluster shows evidence of reweight-by-utilization having been run, but 
only on two of the hosts.  If the balancer module is active, those override 
weights will confound it.
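
If the balancer is meant to take over again, one approach might be to clear those old override weights and switch to upmap mode, roughly as follows (the OSD ID is only an example, and clearing overrides on very full OSDs should be done gradually):

ceph osd reweight 5 1.0      # reset the override weight of osd.5 back to 1.0
ceph balancer mode upmap
ceph balancer status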


> 
> If your drive sizes span from 0.5 to 9.5, there will naturally be
> skewed data, and it is not a huge surprise that the automation has
> some troubles getting it "good". When the balancer places a PG on a
> 0.5-sized drive compared to a 9.5-sized one, it eats up 19x more of
> the "free space" on the smaller one, so there are very few good
> options when the sizes are so different. Even if you placed all PGs
> correctly according to size, the 9.5-sized disk would end up getting 19x
> more IO than the small drive, and for HDDs it is seldom possible to
> gracefully handle a 19-fold increase in IO; most of the time will
> probably be spent on seeks.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v16.2.15 Pacific released

2024-03-04 Thread Zakhar Kirpichenko
This is great news! Many thanks!

/Z

On Mon, 4 Mar 2024 at 17:25, Yuri Weinstein  wrote:

> We're happy to announce the 15th, and expected to be the last,
> backport release in the Pacific series.
>
> https://ceph.io/en/news/blog/2024/v16-2-15-pacific-released/
>
> Notable Changes
> ---
>
> * `ceph config dump --format <json|xml>` output will display the localized
>   option names instead of their normalized version. For example,
>   "mgr/prometheus/x/server_port" will be displayed instead of
>   "mgr/prometheus/server_port". This matches the output of the non
> pretty-print
>   formatted version of the command.
>
> * CephFS: MDS evicts clients who are not advancing their request tids,
> which causes
>   a large buildup of session metadata, resulting in the MDS going
> read-only due to
>   the RADOS operation exceeding the size threshold. The
> `mds_session_metadata_threshold`
>   config controls the maximum size that an (encoded) session metadata can
> grow to.
>
> * RADOS: The `get_pool_is_selfmanaged_snaps_mode` C++ API has been
> deprecated
>   due to its susceptibility to false negative results.  Its safer
> replacement is
>   `pool_is_in_selfmanaged_snaps_mode`.
>
> * RBD: When diffing against the beginning of time (`fromsnapname == NULL`)
> in
>   fast-diff mode (`whole_object == true` with `fast-diff` image feature
> enabled
>   and valid), diff-iterate is now guaranteed to execute locally if
> exclusive
>   lock is available.  This brings a dramatic performance improvement for
> QEMU
>   live disk synchronization and backup use cases.
>
> Getting Ceph
> 
> * Git at git://github.com/ceph/ceph.git
> * Tarball at https://download.ceph.com/tarballs/ceph-16.2.15.tar.gz
> * Containers at https://quay.io/repository/ceph/ceph
> * For packages, see https://docs.ceph.com/en/latest/install/get-packages/
> * Release git sha1: 618f440892089921c3e944a991122ddc44e60516
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] v16.2.15 Pacific released

2024-03-04 Thread Yuri Weinstein
We're happy to announce the 15th, and expected to be the last,
backport release in the Pacific series.

https://ceph.io/en/news/blog/2024/v16-2-15-pacific-released/

Notable Changes
---

* `ceph config dump --format <json|xml>` output will display the localized
  option names instead of their normalized version. For example,
  "mgr/prometheus/x/server_port" will be displayed instead of
  "mgr/prometheus/server_port". This matches the output of the non pretty-print
  formatted version of the command.

* CephFS: MDS evicts clients who are not advancing their request tids,
which causes
  a large buildup of session metadata, resulting in the MDS going
read-only due to
  the RADOS operation exceeding the size threshold. The
`mds_session_metadata_threshold`
  config controls the maximum size that an (encoded) session metadata can grow to.

* RADOS: The `get_pool_is_selfmanaged_snaps_mode` C++ API has been deprecated
  due to its susceptibility to false negative results.  Its safer replacement is
  `pool_is_in_selfmanaged_snaps_mode`.

* RBD: When diffing against the beginning of time (`fromsnapname == NULL`) in
  fast-diff mode (`whole_object == true` with `fast-diff` image feature enabled
  and valid), diff-iterate is now guaranteed to execute locally if exclusive
  lock is available.  This brings a dramatic performance improvement for QEMU
  live disk synchronization and backup use cases.

Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-16.2.15.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/en/latest/install/get-packages/
* Release git sha1: 618f440892089921c3e944a991122ddc44e60516
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Performance improvement suggestion

2024-03-04 Thread Maged Mokhtar



On 04/03/2024 15:37, Frank Schilder wrote:

Fast write enabled would mean that the primary OSD sends #size copies to the
entire active set (including itself) in parallel and sends an ACK to the
client as soon as min_size ACKs have been received from the peers (including
itself). In this way, one can tolerate (size-min_size) slow(er) OSDs (slow
for whatever reason) without suffering performance penalties immediately
(only after too many requests started piling up, which will show as a slow
requests warning).


What happens if an error occurs on the slowest OSD after the min_size ACK
has already been sent to the client?


This should not be different from what exists today... unless, of course,
the error happens on the local/primary OSD.

Can this be addressed with reasonable effort? I don't expect this to be a 
quick-fix and it should be tested. However, beating the tail-latency statistics 
with the extra redundancy should be worth it. I observe fluctuations of 
latencies, OSDs become randomly slow for whatever reason for short time 
intervals and then return to normal.

A reason for this could be DB compaction. I think during compaction latency 
tends to spike.

A fast-write option would effectively remove the impact of this.

Best regards and thanks for considering this!


I think this is something the RADOS devs need to weigh in on. It does sound
worth investigating. It is not just for cases with DB compaction but, more
importantly, for the normal (happy) IO path, where it will have the most impact.


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Performance improvement suggestion

2024-03-04 Thread Frank Schilder
>>> Fast write enabled would mean that the primary OSD sends #size copies to the
>>> entire active set (including itself) in parallel and sends an ACK to the
>>> client as soon as min_size ACKs have been received from the peers (including
>>> itself). In this way, one can tolerate (size-min_size) slow(er) OSDs (slow
>>> for whatever reason) without suffering performance penalties immediately
>>> (only after too many requests started piling up, which will show as a slow
>>> requests warning).
>>>
>> What happens if an error occurs on the slowest OSD after the min_size ACK
>> has already been sent to the client?
>>
>This should not be different from what exists today... unless, of course,
>the error happens on the local/primary OSD

Can this be addressed with reasonable effort? I don't expect this to be a 
quick-fix and it should be tested. However, beating the tail-latency statistics 
with the extra redundancy should be worth it. I observe fluctuations of 
latencies, OSDs become randomly slow for whatever reason for short time 
intervals and then return to normal.

A reason for this could be DB compaction. I think during compaction latency 
tends to spike.

A fast-write option would effectively remove the impact of this.

Best regards and thanks for considering this!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] [Quincy] cannot configure dashboard to listen on all ports

2024-03-04 Thread wodel youchi
Hi,
The Ceph dashboard fails to listen on all IPs:

log_channel(cluster) log [ERR] : Unhandled exception from module 'dashboard'
while running on mgr.controllera: OSError("No socket could be created --
(('0.0.0.0', 8443): [Errno -2] Name or service not known) -- (('::', 8443,
0, 0):


ceph version 17.2.7  quincy (stable)
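
In case it helps to narrow this down, the dashboard bind address and port are normally controlled by the mgr config keys below (a sketch only, not a verified fix for this particular error):

ceph config set mgr mgr/dashboard/server_addr 0.0.0.0
ceph config set mgr mgr/dashboard/server_port 8443
ceph mgr module disable dashboard
ceph mgr module enable dashboard
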
Regards.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Performance improvement suggestion

2024-03-04 Thread Maged Mokhtar



On 04/03/2024 13:35, Marc wrote:

Fast write enabled would mean that the primary OSD sends #size copies to the
entire active set (including itself) in parallel and sends an ACK to the
client as soon as min_size ACKs have been received from the peers (including
itself). In this way, one can tolerate (size-min_size) slow(er) OSDs (slow
for whatever reason) without suffering performance penalties immediately
(only after too many requests started piling up, which will show as a slow
requests warning).


What happens if an error occurs on the slowest OSD after the min_size ACK
has already been sent to the client?

This should not be different from what exists today... unless, of course,
the error happens on the local/primary OSD.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Performance improvement suggestion

2024-03-04 Thread Marc
> 
> Fast write enabled would mean that the primary OSD sends #size copies to the
> entire active set (including itself) in parallel and sends an ACK to the
> client as soon as min_size ACKs have been received from the peers (including
> itself). In this way, one can tolerate (size-min_size) slow(er) OSDs (slow
> for whatever reason) without suffering performance penalties immediately
> (only after too many requests started piling up, which will show as a slow
> requests warning).
> 

What happens if an error occurs on the slowest OSD after the min_size ACK
has already been sent to the client?


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs not balanced

2024-03-04 Thread Janne Johansson
On Mon, 4 Mar 2024 at 11:30, Ml Ml wrote:
>
> Hello,
>
> i wonder why my autobalancer is not working here:

I think the short answer is "because you have so wildly varying sizes
both for drives and hosts".

If your drive sizes span from 0.5 to 9.5, there will naturally be
skewed data, and it is not a huge surprise that the automation has
some troubles getting it "good". When the balancer places a PG on a
0.5-sized drive compared to a 9.5-sized one, it eats up 19x more of
the "free space" on the smaller one, so there are very few good
options when the sizes are so different. Even if you placed all PGs
correctly according to size, the 9.5-sized disk would end up getting 19x
more IO than the small drive, and for HDDs it is seldom possible to
gracefully handle a 19-fold increase in IO; most of the time will
probably be spent on seeks.

> root@ceph01:~# ceph -s
>   cluster:
> id: 5436dd5d-83d4-4dc8-a93b-60ab5db145df
> health: HEALTH_ERR
> 1 backfillfull osd(s)
> 1 full osd(s)
> 1 nearfull osd(s)
> 4 pool(s) full
>
> => osd.17 was too full (92% or something like that)
>
> root@ceph01:~# ceph osd df tree
> ID   CLASS  WEIGHT REWEIGHT  SIZE ... %USE  ... PGS TYPE NAME
> -25 209.50084 -  213 TiB  ... 69.56 ...   - datacenter
> xxx-dc-root
> -19  84.59369 -   86 TiB  ... 56.97 ...   - rack
> RZ1.Reihe4.R10
>  -3  35.49313 -   37 TiB  ... 57.88 ...   - host 
> ceph02
>   2hdd1.7   1.0  1.7 TiB  ... 58.77 ...  44 osd.2
>   3hdd1.0   1.0  2.7 TiB  ... 22.14 ...  25 osd.3
>   7hdd2.5   1.0  2.7 TiB  ... 58.84 ...  70 osd.7
>   9hdd9.5   1.0  9.5 TiB  ... 63.07 ... 268 osd.9
>  13hdd2.67029   1.0  2.7 TiB  ... 53.59 ...  65 osd.13
>  16hdd2.8   1.0  2.7 TiB  ... 59.35 ...  71 osd.16
>  19hdd1.7   1.0  1.7 TiB  ... 48.98 ...  37 osd.19
>  23hdd2.38419   1.0  2.4 TiB  ... 59.33 ...  64 osd.23
>  24hdd1.3   1.0  1.7 TiB  ... 51.23 ...  39 osd.24
>  28hdd3.63869   1.0  3.6 TiB  ... 64.17 ... 104 osd.28
>  31hdd2.7   1.0  2.7 TiB  ... 64.73 ...  76 osd.31
>  32hdd3.3   1.0  3.3 TiB  ... 67.28 ... 101 osd.32
>  -9  22.88817 -   23 TiB  ... 56.96 ...   - host 
> ceph06
>  35hdd7.15259   1.0  7.2 TiB  ... 55.71 ... 182 osd.35
>  36hdd5.24519   1.0  5.2 TiB  ... 53.75 ... 128 osd.36
>  45hdd5.24519   1.0  5.2 TiB  ... 60.91 ... 144 osd.45
>  48hdd5.24519   1.0  5.2 TiB  ... 57.94 ... 139 osd.48
> -17  26.21239 -   26 TiB  ... 55.67 ...   - host 
> ceph08
>  37hdd6.67569   1.0  6.7 TiB  ... 58.17 ... 174 osd.37
>  40hdd9.53670   1.0  9.5 TiB  ... 58.54 ... 250 osd.40
>  46hdd5.0   1.0  5.0 TiB  ... 52.39 ... 116 osd.46
>  47hdd5.0   1.0  5.0 TiB  ... 50.05 ... 112 osd.47
> -20  59.11053 -   60 TiB  ... 82.47 ...   - rack
> RZ1.Reihe4.R9
>  -4  23.09996 -   24 TiB  ... 79.92 ...   - host 
> ceph03
>   5hdd1.7   0.75006  1.7 TiB  ... 87.24 ...  66 osd.5
>   6hdd1.7   0.44998  1.7 TiB  ... 47.30 ...  36 osd.6
>  10hdd2.7   0.85004  2.7 TiB  ... 83.23 ... 100 osd.10
>  15hdd2.7   0.75006  2.7 TiB  ... 74.26 ...  88 osd.15
>  17hdd0.5   0.85004  1.6 TiB  ... 91.44 ...  67 osd.17
>  20hdd2.0   0.85004  1.7 TiB  ... 88.41 ...  68 osd.20
>  21hdd2.7   0.75006  2.7 TiB  ... 77.25 ...  91 osd.21
>  25hdd1.7   0.90002  1.7 TiB  ... 78.31 ...  60 osd.25
>  26hdd2.7   1.0  2.7 TiB  ... 82.75 ...  99 osd.26
>  27hdd2.7   0.90002  2.7 TiB  ... 84.26 ... 101 osd.27
>  63hdd1.8   0.90002  1.7 TiB  ... 84.15 ...  65 osd.63
> -13  36.01057 -   36 TiB  ... 84.12 ...   - host 
> ceph05
>  11hdd7.15259   0.90002  7.2 TiB  ... 85.45 ... 273 osd.11
>  39hdd7.2   0.85004  7.2 TiB  ... 80.90 ... 257 osd.39
>  41hdd7.2   0.75006  7.2 TiB  ... 74.95 ... 239 osd.41
>  42hdd9.0   1.0  9.5 TiB  ... 92.00 ... 392 osd.42
>  43hdd5.45799   1.0  5.5 TiB  ... 84.84 ... 207 osd.43
> -21  65.79662 -   66 TiB  ... 74.29 ...   - rack
> RZ3.Reihe3.R10
>  -2  28.49664 -   29 TiB  ... 74.79 ...   - 

[ceph-users] OSDs not balanced

2024-03-04 Thread Ml Ml
Hello,

i wonder why my autobalancer is not working here:

root@ceph01:~# ceph -s
  cluster:
id: 5436dd5d-83d4-4dc8-a93b-60ab5db145df
health: HEALTH_ERR
1 backfillfull osd(s)
1 full osd(s)
1 nearfull osd(s)
4 pool(s) full

=> osd.17 was too full (92% or something like that)

root@ceph01:~# ceph osd df tree
ID   CLASS  WEIGHT REWEIGHT  SIZE ... %USE  ... PGS TYPE NAME
-25 209.50084 -  213 TiB  ... 69.56 ...   - datacenter
xxx-dc-root
-19  84.59369 -   86 TiB  ... 56.97 ...   - rack
RZ1.Reihe4.R10
 -3  35.49313 -   37 TiB  ... 57.88 ...   - host ceph02
  2hdd1.7   1.0  1.7 TiB  ... 58.77 ...  44 osd.2
  3hdd1.0   1.0  2.7 TiB  ... 22.14 ...  25 osd.3
  7hdd2.5   1.0  2.7 TiB  ... 58.84 ...  70 osd.7
  9hdd9.5   1.0  9.5 TiB  ... 63.07 ... 268 osd.9
 13hdd2.67029   1.0  2.7 TiB  ... 53.59 ...  65 osd.13
 16hdd2.8   1.0  2.7 TiB  ... 59.35 ...  71 osd.16
 19hdd1.7   1.0  1.7 TiB  ... 48.98 ...  37 osd.19
 23hdd2.38419   1.0  2.4 TiB  ... 59.33 ...  64 osd.23
 24hdd1.3   1.0  1.7 TiB  ... 51.23 ...  39 osd.24
 28hdd3.63869   1.0  3.6 TiB  ... 64.17 ... 104 osd.28
 31hdd2.7   1.0  2.7 TiB  ... 64.73 ...  76 osd.31
 32hdd3.3   1.0  3.3 TiB  ... 67.28 ... 101 osd.32
 -9  22.88817 -   23 TiB  ... 56.96 ...   - host ceph06
 35hdd7.15259   1.0  7.2 TiB  ... 55.71 ... 182 osd.35
 36hdd5.24519   1.0  5.2 TiB  ... 53.75 ... 128 osd.36
 45hdd5.24519   1.0  5.2 TiB  ... 60.91 ... 144 osd.45
 48hdd5.24519   1.0  5.2 TiB  ... 57.94 ... 139 osd.48
-17  26.21239 -   26 TiB  ... 55.67 ...   - host ceph08
 37hdd6.67569   1.0  6.7 TiB  ... 58.17 ... 174 osd.37
 40hdd9.53670   1.0  9.5 TiB  ... 58.54 ... 250 osd.40
 46hdd5.0   1.0  5.0 TiB  ... 52.39 ... 116 osd.46
 47hdd5.0   1.0  5.0 TiB  ... 50.05 ... 112 osd.47
-20  59.11053 -   60 TiB  ... 82.47 ...   - rack
RZ1.Reihe4.R9
 -4  23.09996 -   24 TiB  ... 79.92 ...   - host ceph03
  5hdd1.7   0.75006  1.7 TiB  ... 87.24 ...  66 osd.5
  6hdd1.7   0.44998  1.7 TiB  ... 47.30 ...  36 osd.6
 10hdd2.7   0.85004  2.7 TiB  ... 83.23 ... 100 osd.10
 15hdd2.7   0.75006  2.7 TiB  ... 74.26 ...  88 osd.15
 17hdd0.5   0.85004  1.6 TiB  ... 91.44 ...  67 osd.17
 20hdd2.0   0.85004  1.7 TiB  ... 88.41 ...  68 osd.20
 21hdd2.7   0.75006  2.7 TiB  ... 77.25 ...  91 osd.21
 25hdd1.7   0.90002  1.7 TiB  ... 78.31 ...  60 osd.25
 26hdd2.7   1.0  2.7 TiB  ... 82.75 ...  99 osd.26
 27hdd2.7   0.90002  2.7 TiB  ... 84.26 ... 101 osd.27
 63hdd1.8   0.90002  1.7 TiB  ... 84.15 ...  65 osd.63
-13  36.01057 -   36 TiB  ... 84.12 ...   - host ceph05
 11hdd7.15259   0.90002  7.2 TiB  ... 85.45 ... 273 osd.11
 39hdd7.2   0.85004  7.2 TiB  ... 80.90 ... 257 osd.39
 41hdd7.2   0.75006  7.2 TiB  ... 74.95 ... 239 osd.41
 42hdd9.0   1.0  9.5 TiB  ... 92.00 ... 392 osd.42
 43hdd5.45799   1.0  5.5 TiB  ... 84.84 ... 207 osd.43
-21  65.79662 -   66 TiB  ... 74.29 ...   - rack
RZ3.Reihe3.R10
 -2  28.49664 -   29 TiB  ... 74.79 ...   - host ceph01
  0hdd2.7   1.0  2.7 TiB  ... 73.82 ...  88 osd.0
  1hdd3.63869   1.0  3.6 TiB  ... 73.47 ... 121 osd.1
  4hdd2.7   1.0  2.7 TiB  ... 74.63 ...  89 osd.4
  8hdd2.7   1.0  2.7 TiB  ... 77.10 ...  92 osd.8
 12hdd2.7   1.0  2.7 TiB  ... 78.76 ...  94 osd.12
 14hdd5.45799   1.0  5.5 TiB  ... 78.86 ... 193 osd.14
 18hdd1.8   1.0  2.7 TiB  ... 63.79 ...  76 osd.18
 22hdd1.7   1.0  1.7 TiB  ... 74.85 ...  57 osd.22
 30hdd1.7   1.0  1.7 TiB  ... 76.34 ...  59 osd.30
 64hdd3.2   1.0  3.3 TiB  ... 73.48 ... 110 osd.64
-11  12.3 -   12 TiB  ... 73.40 ...   - host ceph04
 34hdd5.2   1.0  5.2 TiB  

[ceph-users] Re: Performance improvement suggestion

2024-03-04 Thread Frank Schilder
Hi all, coming late to the party but I want to chip in as well with some
experience.

The problem of tail latencies of individual OSDs is a real pain for any
redundant storage system. However, there is an elegant way to deal with this
when using large replication factors. The idea is to use the counterpart of
the "fast read" option that exists for EC pools and:

1) make this option available to replicated pools as well (it is on the
roadmap as far as I know), but also
2) implement an option "fast write" for all pool types.

Fast write enabled would mean that the primary OSD sends #size copies to the 
entire active set (including itself) in parallel and sends an ACK to the client 
as soon as min_size ACKs have been received from the peers (including itself). 
In this way, one can tolerate (size-min_size) slow(er) OSDs (slow for whatever 
reason) without suffering performance penalties immediately (only after too 
many requests started piling up, which will show as a slow requests warning).

I have fast read enabled on all EC pools. This does increase the 
cluster-internal network traffic, which is nowadays absolutely no problem (in 
the good old 1G times it potentially would be). In return, the read latencies 
on the client side are lower and much more predictable. In effect, the user 
experience improved dramatically.
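
For anyone who wants to try it, fast read is a per-pool flag on EC pools, e.g. (the pool name is a placeholder):

ceph osd pool set <ec-pool> fast_read 1
ceph osd pool get <ec-pool> fast_read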

I would really wish that such an option gets added as we use wide replication 
profiles (rep-(4,2) and EC(8+3), each with 2 "spare" OSDs) and exploiting large 
replication factors (more precisely, large (size-min_size)) to mitigate the 
impact of slow OSDs would be awesome. It would also add some incentive to stop 
the ridiculous size=2 min_size=1 habit, because one gets an extra gain from 
replication on top of redundancy.

In the long run, the ceph write path should try to deal with a-priori known 
different-latency connections (fast local ACK with async remote completion, was 
asked for a couple of times), for example, for stretched clusters where one has 
an internal connection for the local part and external connections for the 
remote parts. It would be great to have similar ways of mitigating some 
penalties of the slow write paths to remote sites.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Peter Grandi 
Sent: Wednesday, February 21, 2024 1:10 PM
To: list Linux fs Ceph
Subject: [ceph-users] Re: Performance improvement suggestion

> 1. Write object A from client.
> 2. Fsync to primary device completes.
> 3. Ack to client.
> 4. Writes sent to replicas.
[...]

As mentioned in the discussion this proposal is the opposite of
what the current policy, is, which is to wait for all replicas
to be written before writes are acknowledged to the client:

https://github.com/ceph/ceph/blob/main/doc/architecture.rst

   "After identifying the target placement group, the client
   writes the object to the identified placement group's primary
   OSD. The primary OSD then [...] confirms that the object was
   stored successfully in the secondary and tertiary OSDs, and
   reports to the client that the object was stored
   successfully."

A more revolutionary option would be for 'librados' to write in
parallel to all the "active set" OSDs and report this to the
primary, but that would greatly increase client-Ceph traffic,
while the current logic increases traffic only among OSDs.

> So I think that to maintain any semblance of reliability,
> you'd need to at least wait for a commit ack from the first
> replica (i.e. min_size=2).

Perhaps it could be similar to 'k'+'m' for EC, that is 'k'
synchronous (write completes to the client only when all at
least 'k' replicas, including primary, have been committed) and
'm' asynchronous, instead of 'k' being just 1 or 2.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-crash NOT reporting crashes due to wrong permissions on /var/lib/ceph/crash/posted (Debian / Ubuntu packages)

2024-03-04 Thread Eneko Lacunza

Hi,

El 2/3/24 a las 18:00, Tyler Stachecki escribió:

On 23.02.24 16:18, Christian Rohmann wrote:

I just noticed issues with ceph-crash using the Debian /Ubuntu
packages (package: ceph-base):

While the /var/lib/ceph/crash/posted folder is created by the package
install,
it's not properly chowned to ceph:ceph by the postinst script.

...

You might want to check if you might be affected as well.
Failing to post crashes to the local cluster results in them not being
reported back via telemetry.

Sorry to bluntly bump this again, but did nobody else notice this on
your clusters?
Call me egoistic, but the more clusters return crash reports the more
stable my Ceph likely becomes ;-)

I do observe the ownership does not match ceph:ceph on Debian with v17.2.7.
$ sudo ls -l /var/lib/ceph/crash | grep posted
drwxr-xr-x 2 root root 4096 Feb 10 19:23 posted

The issue seems to be that the postinst script does not recursively
chown and only chowns subdirectories directly under /var/lib/ceph:
https://github.com/ceph/ceph/blob/91e8cea0d31775de0e59936b3608a9a453353a45/debian/ceph-base.postinst#L40

The rpm spec looks to do subdirectories under /var/lib/ceph as well,
but explicitly lists everything out instead of globs, and also
includes posted:
https://github.com/ceph/ceph/blob/91e8cea0d31775de0e59936b3608a9a453353a45/ceph.spec.in#L1643
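
Until the packaging is fixed, a manual workaround on affected hosts might be something like this (assuming the default ceph user/group and the packaged ceph-crash unit):

sudo chown ceph:ceph /var/lib/ceph/crash/posted
sudo systemctl restart ceph-crash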


This seems to have been fixed in Proxmox recently:

* master (reef?): 
https://lists.proxmox.com/pipermail/pve-devel/2024-February/061803.html
* quincy: 
https://lists.proxmox.com/pipermail/pve-devel/2024-February/061798.html


Not sure this has been upstreamed.

Cheers

Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io