[ceph-users] Re: Lousy recovery for mclock and reef

2024-05-25 Thread Zakhar Kirpichenko
Hi!

Could you please elaborate on what you meant by "adding another disk to the
recovery process"?

/Z


On Sat, 25 May 2024, 22:49 Mazzystr,  wrote:

> Well, this was an interesting journey through the bowels of Ceph.  I have
> about 6 hours into tweaking every setting imaginable, just to circle back to
> my basic configuration and a 2 GB memory target per OSD.  I was never able to
> exceed a 22 MiB/s recovery rate during that journey.
>
> I did end up fixing the issue and now I see the following -
>
>   io:
> recovery: 129 MiB/s, 33 objects/s
>
> This is normal for my measly cluster.  I like micro ceph clusters.  I have
> a lot of them. :)
>
> What was the fix?  Adding another disk to the recovery process!  I was
> recovering to one disk; now I'm recovering to two.  I have three in total that
> need to be recovered.  Somehow that one disk was completely swamped, yet I was
> unable to see it in htop, atop, or iostat; disk busyness was 6% max.
>
> My config is back to mclock scheduler, profile high_recovery_ops, and
> backfills of 256.
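> For anyone who wants to reproduce that, it boils down to roughly the
> following (a sketch, not copied from my terminal; on mclock you may also
> need to allow overriding the recovery/backfill limits, and osd_op_queue
> changes require an OSD restart):
>
> ceph config set osd osd_op_queue mclock_scheduler
> ceph config set osd osd_mclock_profile high_recovery_ops
> ceph config set osd osd_mclock_override_recovery_settings true
> ceph config set osd osd_max_backfills 256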
>
> Thank you to everyone who took the time to review and contribute.  Hopefully
> this provides some up-to-date information for the next person who runs into
> slow recovery.
>
> /Chris C
>
>
>
>
>
> On Fri, May 24, 2024 at 1:43 PM Kai Stian Olstad 
> wrote:
>
> > On 24.05.2024 21:07, Mazzystr wrote:
> > > I did the obnoxious task of updating ceph.conf and restarting all my
> > > osds.
> > >
> > > ceph --admin-daemon /var/run/ceph/ceph-osd.*.asok config get
> > > osd_op_queue
> > > {
> > > "osd_op_queue": "wpq"
> > > }
> > >
> > > I have some spare memory on my target host/OSD and increased the target
> > > memory of that OSD to 10 GB and restarted.  No effect observed.  In fact,
> > > memory usage on the host is stable, so I don't think the change took effect
> > > even with updating ceph.conf, a restart, and a direct asok config set.  The
> > > target memory value is confirmed to be set via asok config get.
> > >
> > > Nothing has helped.  I still cannot break the 21 MiB/s barrier.
> > >
> > > Does anyone have any more ideas?
> >
> > For recovery you can adjust the following.
> >
> > osd_max_backfills defaults to 1; on my system I get the best performance
> > with 3 and wpq.
> >
> > The following I have not adjusted myself, but you can try:
> > osd_recovery_max_active defaults to 3.
> > osd_recovery_op_priority defaults to 3; a higher number increases the
> > priority of recovery ops.
> >
> > All of them can be runtime adjusted.
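> > For example, something along these lines should work (a sketch; the
> > values are just the ones mentioned above):
> >
> > ceph config set osd osd_max_backfills 3
> > ceph config set osd osd_recovery_max_active 3
> >
> > or, for a one-off change that is not persisted:
> >
> > ceph tell 'osd.*' injectargs '--osd-max-backfills 3'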
> >
> >
> > --
> > Kai Stian Olstad
> >


[ceph-users] Re: Remove failed OSD

2024-05-04 Thread Zakhar Kirpichenko
I ended up manually cleaning up the OSD host, removing stale LVs and DM
entries, and then purging the OSD with `ceph osd purge osd.19`. Looks like
it's gone for good.
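In case it helps the next person, the cleanup was roughly along these lines
(the VG/LV and DM names below are placeholders for the stale entries on the
host):

lvremove <vg_name>/<lv_name>
dmsetup remove <stale_dm_name>
ceph osd purge osd.19 --yes-i-really-mean-it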

/Z

On Sat, 4 May 2024 at 08:29, Zakhar Kirpichenko  wrote:

> Hi!
>
> An OSD failed in our 16.2.15 cluster. I prepared it for removal and ran
> `ceph orch daemon rm osd.19 --force`. Somehow that didn't work as expected,
> so now we still have osd.19 in the crush map:
>
> -10 122.66965  host ceph02
>  19   1.0  osd.19 down 0  1.0
>
> But the OSD has been cleaned up on the host, although incompletely, as both
> the block and block.db LVs still exist.
>
> If I try to remove the OSD again, I get an error:
>
> # ceph orch daemon rm osd.19  --force
> Error EINVAL: Unable to find daemon(s) ['osd.19']
>
> How can I clean up this OSD and get rid of it completely, including the
> crush map? I would appreciate any suggestions or pointers.
>
> Best regards,
> Zakhar
>


[ceph-users] Remove failed OSD

2024-05-03 Thread Zakhar Kirpichenko
Hi!

An OSD failed in our 16.2.15 cluster. I prepared it for removal and ran
`ceph orch daemon rm osd.19 --force`. Somehow that didn't work as expected,
so now we still have osd.19 in the crush map:

-10 122.66965  host ceph02
 19   1.0  osd.19 down 0  1.0

But the OSD has been cleaned up on the host, although incompletely, as both
the block and block.db LVs still exist.

If I try to remove the OSD again, I get an error:

# ceph orch daemon rm osd.19  --force
Error EINVAL: Unable to find daemon(s) ['osd.19']

How can I clean up this OSD and get rid of it completely, including the
crush map? I would appreciate any suggestions or pointers.

Best regards,
Zakhar


[ceph-users] Re: [EXTERN] Re: Ceph 16.2.x mon compactions, disk writes

2024-04-16 Thread Zakhar Kirpichenko
I remember that I found the part which said "if something goes wrong,
monitors will fail" rather discouraging :-)

/Z

On Tue, 16 Apr 2024 at 18:59, Eugen Block  wrote:

> Sorry, I meant extra-entrypoint-arguments:
>
> https://www.spinics.net/lists/ceph-users/msg79251.html
>
> Zitat von Eugen Block :
>
> > You can use the extra container arguments I pointed out a few months
> > ago. Those work in my test clusters, although I haven’t enabled that
> > in production yet. But it shouldn’t make a difference if it’s a test
> > cluster or not. 
> >
> > Zitat von Zakhar Kirpichenko :
> >
> >> Hi,
> >>
> >>> Did you notice any downsides with your compression settings so far?
> >>
> >> None, at least on our systems. Except the part that I haven't found a
> way
> >> to make the settings persist.
> >>
> >>> Do you have all mons now on compression?
> >>
> >> I have 3 out of 5 monitors with compression and 2 without it. The 2
> >> monitors with uncompressed RocksDB have much larger disks which do not
> >> suffer from writes as much as the other 3. I keep them uncompressed
> "just
> >> in case", i.e. for the unlikely event if the 3 monitors with compressed
> >> RocksDB fail or have any issues specifically because of the
> compression. I
> >> have to say that this hasn't happened yet, and this precaution may be
> >> unnecessary.
> >>
> >>> Did release updates go through without issues?
> >>
> >> In our case, container updates overwrite the monitors' configurations and
> >> reset RocksDB options, so each updated monitor runs with no RocksDB
> >> compression until the options are added back manually. Other than that, I
> >> have not encountered any issues related to compression during the updates.
> >>
> >>> Do you know if this works also with reef (we see massive writes as well
> >> there)?
> >>
> >> Unfortunately, I can't comment on Reef as we're still using Pacific.
> >>
> >> /Z
> >>
> >> On Tue, 16 Apr 2024 at 18:08, Dietmar Rieder <
> dietmar.rie...@i-med.ac.at>
> >> wrote:
> >>
> >>> Hi Zakhar, hello List,
> >>>
> >>> I just wanted to follow up on this and ask a few questions:
> >>>
> >>> Did you notice any downsides with your compression settings so far?
> >>> Do you have all mons now on compression?
> >>> Did release updates go through without issues?
> >>> Do you know if this works also with reef (we see massive writes as well
> >>> there)?
> >>>
> >>> Can you briefly tabulate the commands you used to persistently set the
> >>> compression options?
> >>>
> >>> Thanks so much,
> >>>
> >>>   Dietmar
> >>>
> >>>
> >>> On 10/18/23 06:14, Zakhar Kirpichenko wrote:
> >>>> Many thanks for this, Eugen! I very much appreciate yours and Mykola's
> >>>> efforts and insight!
> >>>>
> >>>> Another thing I noticed was a reduction of RocksDB store after the
> >>>> reduction of the total PG number by 30%, from 590-600 MB:
> >>>>
> >>>> 65M 3675511.sst
> >>>> 65M 3675512.sst
> >>>> 65M 3675513.sst
> >>>> 65M 3675514.sst
> >>>> 65M 3675515.sst
> >>>> 65M 3675516.sst
> >>>> 65M 3675517.sst
> >>>> 65M 3675518.sst
> >>>> 62M 3675519.sst
> >>>>
> >>>> to about half of the original size:
> >>>>
> >>>> -rw-r--r-- 1 167 167  7218886 Oct 13 16:16 3056869.log
> >>>> -rw-r--r-- 1 167 167 67250650 Oct 13 16:15 3056871.sst
> >>>> -rw-r--r-- 1 167 167 67367527 Oct 13 16:15 3056872.sst
> >>>> -rw-r--r-- 1 167 167 63268486 Oct 13 16:15 3056873.sst
> >>>>
> >>>> Then when I restarted the monitors one by one before adding
> compression,
> >>>> RocksDB store reduced even further. I am not sure why and what exactly
> >>> got
> >>>> automatically removed from the store:
> >>>>
> >>>> -rw-r--r-- 1 167 167   841960 Oct 18 03:31 018779.log
> >>>> -rw-r--r-- 1 167 167 67290532 Oct 18 03:31 018781.sst
> >>>> -rw-r--r-- 1 167 167 53287626 Oct 18 03:31 018782.sst
> >>>>
> >>>> Then I have en

[ceph-users] Re: [EXTERN] Re: Ceph 16.2.x mon compactions, disk writes

2024-04-16 Thread Zakhar Kirpichenko
Hi,

>Did you notice any downsides with your compression settings so far?

None, at least on our systems. Except the part that I haven't found a way
to make the settings persist.

>Do you have all mons now on compression?

I have 3 out of 5 monitors with compression and 2 without it. The 2
monitors with uncompressed RocksDB have much larger disks which do not
suffer from writes as much as the other 3. I keep them uncompressed "just
in case", i.e. for the unlikely event if the 3 monitors with compressed
RocksDB fail or have any issues specifically because of the compression. I
have to say that this hasn't happened yet, and this precaution may be
unnecessary.

>Did release updates go through without issues?

In our case, container updates overwrite the monitors' configurations and
reset RocksDB options, so each updated monitor runs with no RocksDB
compression until the options are added back manually. Other than that, I have
not encountered any issues related to compression during the updates.

>Do you know if this works also with reef (we see massive writes as well
there)?

Unfortunately, I can't comment on Reef as we're still using Pacific.
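As for the commands: I haven't found a persistent way yet, so what I
currently do, and have to redo after every mon redeploy, is roughly this
(the full option string I use, for reference):

ceph config set mon mon_rocksdb_options "write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression,max_background_jobs=4,max_subcompactions=2"

plus the same mon_rocksdb_options line in each mon's own config file
(/var/lib/ceph/FSID/mon.*/config), which is what actually takes effect at
mon startup.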

/Z

On Tue, 16 Apr 2024 at 18:08, Dietmar Rieder 
wrote:

> Hi Zakhar, hello List,
>
> I just wanted to follow up on this and ask a few questions:
>
> Did you notice any downsides with your compression settings so far?
> Do you have all mons now on compression?
> Did release updates go through without issues?
> Do you know if this works also with reef (we see massive writes as well
> there)?
>
> Can you briefly tabulate the commands you used to persistently set the
> compression options?
>
> Thanks so much,
>
>Dietmar
>
>
> On 10/18/23 06:14, Zakhar Kirpichenko wrote:
> > Many thanks for this, Eugen! I very much appreciate yours and Mykola's
> > efforts and insight!
> >
> > Another thing I noticed was a reduction of RocksDB store after the
> > reduction of the total PG number by 30%, from 590-600 MB:
> >
> > 65M 3675511.sst
> > 65M 3675512.sst
> > 65M 3675513.sst
> > 65M 3675514.sst
> > 65M 3675515.sst
> > 65M 3675516.sst
> > 65M 3675517.sst
> > 65M 3675518.sst
> > 62M 3675519.sst
> >
> > to about half of the original size:
> >
> > -rw-r--r-- 1 167 167  7218886 Oct 13 16:16 3056869.log
> > -rw-r--r-- 1 167 167 67250650 Oct 13 16:15 3056871.sst
> > -rw-r--r-- 1 167 167 67367527 Oct 13 16:15 3056872.sst
> > -rw-r--r-- 1 167 167 63268486 Oct 13 16:15 3056873.sst
> >
> > Then when I restarted the monitors one by one before adding compression,
> > RocksDB store reduced even further. I am not sure why and what exactly
> got
> > automatically removed from the store:
> >
> > -rw-r--r-- 1 167 167   841960 Oct 18 03:31 018779.log
> > -rw-r--r-- 1 167 167 67290532 Oct 18 03:31 018781.sst
> > -rw-r--r-- 1 167 167 53287626 Oct 18 03:31 018782.sst
> >
> > Then I have enabled LZ4 and LZ4HC compression in our small production
> > cluster (6 nodes, 96 OSDs) on 3 out of 5
> > monitors:
> compression=kLZ4Compression,bottommost_compression=kLZ4HCCompression.
> > I specifically went for LZ4 and LZ4HC because of the balance between
> > compression/decompression speed and impact on CPU usage. The compression
> > doesn't seem to affect the cluster in any negative way, the 3 monitors
> with
> > compression are operating normally. The effect of the compression on
> > RocksDB store size and disk writes is quite noticeable:
> >
> > Compression disabled, 155 MB store.db, ~125 MB RocksDB sst, and ~530 MB
> > writes over 5 minutes:
> >
> > -rw-r--r-- 1 167 167  4227337 Oct 18 03:58 3080868.log
> > -rw-r--r-- 1 167 167 67253592 Oct 18 03:57 3080870.sst
> > -rw-r--r-- 1 167 167 57783180 Oct 18 03:57 3080871.sst
> >
> > # du -hs
> > /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db/;
> > iotop -ao -bn 2 -d 300 2>&1 | grep ceph-mon
> > 155M
> >   /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db/
> > 2471602 be/4 167   6.05 M473.24 M  0.00 %  0.16 % ceph-mon -n
> > mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
> > --default-log-to-stderr=true --default-log-stderr-prefix=debug
> >   --default-mon-cluster-log-to-file=false
> > --default-mon-cluster-log-to-stderr=true [rocksdb:low0]
> > 2471633 be/4 167 188.00 K 40.91 M  0.00 %  0.02 % ceph-mon -n
> > mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
> > --default-log-to-stderr=true --default-log-stderr-prefix=debug
> >   --default-mon-cluster-log-to-f

[ceph-users] Re: cephadm: daemon osd.x on yyy is in error state

2024-04-06 Thread Zakhar Kirpichenko
Well, I've replaced the failed drives and that cleared the error. Arguably,
it was a better solution :-)

/Z

On Sat, 6 Apr 2024 at 10:13,  wrote:

> did it help?  Maybe you found a better solution?


[ceph-users] Re: Pacific 16.2.15 `osd noin`

2024-04-04 Thread Zakhar Kirpichenko
Thank you, Eugen. This makes sense.

/Z

On Thu, 4 Apr 2024 at 10:32, Eugen Block  wrote:

> Hi,
>
> the noin flag seems to be applicable only to existing OSDs which are
> already in the crushmap. It doesn't apply to newly created OSDs; I
> could confirm that in a small test cluster with Pacific and Reef. I
> don't have any insight into whether that is by design, but I assume it's
> supposed to work like that.
> If you want to prevent data movement when creating new OSDs you could
> use the osd_crush_initial_weight config option and set it to 0. We
> have that in our cluster as well, but of course you'd have to reweight
> new OSDs manually.
>
> Regards,
> Eugen
>
> Zitat von Zakhar Kirpichenko :
>
> > Any comments regarding `osd noin`, please?
> >
> > /Z
> >
> > On Tue, 2 Apr 2024 at 16:09, Zakhar Kirpichenko 
> wrote:
> >
> >> Hi,
> >>
> >> I'm adding a few OSDs to an existing cluster, the cluster is running
> with
> >> `osd noout,noin`:
> >>
> >>   cluster:
> >> id: 3f50555a-ae2a-11eb-a2fc-ffde44714d86
> >> health: HEALTH_WARN
> >> noout,noin flag(s) set
> >>
> >> Specifically `noin` is documented as "prevents booting OSDs from being
> >> marked in". But freshly added OSDs were immediately marked `up` and
> `in`:
> >>
> >>   services:
> >> ...
> >> osd: 96 osds: 96 up (since 5m), 96 in (since 6m); 338 remapped pgs
> >>  flags noout,noin
> >>
> >> # ceph osd tree in | grep -E "osd.11|osd.12|osd.26"
> >>  11   hdd   9.38680   osd.11   up   1.0   1.0
> >>  12   hdd   9.38680   osd.12   up   1.0   1.0
> >>  26   hdd   9.38680   osd.26   up   1.0   1.0
> >>
> >> Is this expected behavior? Do I misunderstand the purpose of the `noin`
> >> option?
> >>
> >> Best regards,
> >> Zakhar
> >>
> >>


[ceph-users] Re: Pacific 16.2.15 `osd noin`

2024-04-04 Thread Zakhar Kirpichenko
Thanks, this is a good suggestion!
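If I understood correctly, that boils down to something like this before the
new OSDs are created (untested on my side; the reweight value is whatever the
OSD should eventually have):

ceph config set osd osd_crush_initial_weight 0.0001

and then, once backfilling is convenient:

ceph osd crush reweight osd.<id> <target_weight>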

/Z

On Thu, 4 Apr 2024 at 10:29, Janne Johansson  wrote:

> Den tors 4 apr. 2024 kl 06:11 skrev Zakhar Kirpichenko :
> > Any comments regarding `osd noin`, please?
> > >
> > > I'm adding a few OSDs to an existing cluster, the cluster is running
> with
> > > `osd noout,noin`:
> > >
> > >   cluster:
> > > id: 3f50555a-ae2a-11eb-a2fc-ffde44714d86
> > > health: HEALTH_WARN
> > > noout,noin flag(s) set
> > >
> > > Specifically `noin` is documented as "prevents booting OSDs from being
> > > marked in". But freshly added OSDs were immediately marked `up` and
> `in`:
>
> Only that, instead of using "noin" when adding new OSDs, we mostly set
> "initial osd crush weight" to 0.0001 so they are up and in at first start
> but don't receive PGs.
>
> --
> May the most significant bit of your life be positive.
>


[ceph-users] Re: Pacific 16.2.15 `osd noin`

2024-04-03 Thread Zakhar Kirpichenko
Any comments regarding `osd noin`, please?

/Z

On Tue, 2 Apr 2024 at 16:09, Zakhar Kirpichenko  wrote:

> Hi,
>
> I'm adding a few OSDs to an existing cluster, the cluster is running with
> `osd noout,noin`:
>
>   cluster:
> id: 3f50555a-ae2a-11eb-a2fc-ffde44714d86
> health: HEALTH_WARN
> noout,noin flag(s) set
>
> Specifically `noin` is documented as "prevents booting OSDs from being
> marked in". But freshly added OSDs were immediately marked `up` and `in`:
>
>   services:
> ...
> osd: 96 osds: 96 up (since 5m), 96 in (since 6m); 338 remapped pgs
>  flags noout,noin
>
> # ceph osd tree in | grep -E "osd.11|osd.12|osd.26"
>  11   hdd   9.38680   osd.11   up   1.0   1.0
>  12   hdd   9.38680   osd.12   up   1.0   1.0
>  26   hdd   9.38680   osd.26   up   1.0   1.0
>
> Is this expected behavior? Do I misunderstand the purpose of the `noin`
> option?
>
> Best regards,
> Zakhar
>
>


[ceph-users] Re: Replace block drives of combined NVME+HDD OSDs

2024-04-02 Thread Zakhar Kirpichenko
Thank you, Eugen.

It was actually very straightforward. I'm happy to report back that there
were no issues with removing and zapping the OSDs whose data devices were
unavailable. I had to manually remove stale dm entries, but that was it.
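For the record, the per-OSD sequence was essentially the following (OSD ID
and DM name are placeholders):

ceph orch osd rm <osd_id> --replace --zap
dmsetup remove <stale_dm_name>

followed by the physical drive swap.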

/Z

On Tue, 2 Apr 2024 at 11:00, Eugen Block  wrote:

> Hi,
>
> here's the link to the docs [1] how to replace OSDs.
>
> ceph orch osd rm  --replace --zap [--force]
>
> This should zap both the data drive and the DB LV (yes, its data is
> useless without the data drive); I'm not sure how it will handle the case
> where the data drive isn't accessible, though.
> One thing I'm not sure about is how your spec file will be handled.
> Since the drive letters can change, I recommend using a more generic
> approach, for example the rotational flags and drive sizes instead of
> paths. But if the drive letters won't change for the replaced drives
> it should work. I also don't expect an impact on the rest of the OSDs
> (except for backfilling, of course).
>
> Regards,
> Eugen
>
> [1] https://docs.ceph.com/en/latest/cephadm/services/osd/#replacing-an-osd
>
> Zitat von Zakhar Kirpichenko :
>
> > Hi,
> >
> > Unfortunately, some of our HDDs failed and we need to replace these
> drives
> > which are parts of "combined" OSDs (DB/WAL on NVME, block storage on
> HDD).
> > All OSDs are defined with a service definition similar to this one:
> >
> > ```
> > service_type: osd
> > service_id: ceph02_combined_osd
> > service_name: osd.ceph02_combined_osd
> > placement:
> >   hosts:
> >   - ceph02
> > spec:
> >   data_devices:
> > paths:
> > - /dev/sda
> > - /dev/sdb
> > - /dev/sdc
> > - /dev/sdd
> > - /dev/sde
> > - /dev/sdf
> > - /dev/sdg
> > - /dev/sdh
> > - /dev/sdi
> >   db_devices:
> > paths:
> > - /dev/nvme0n1
> > - /dev/nvme1n1
> >   filter_logic: AND
> >   objectstore: bluestore
> > ```
> >
> > In the above example, HDDs `sda` and `sdb` are not readable and data
> cannot
> > be copied over to new HDDs. NVME partitions of `nvme0n1` with DB/WAL data
> > are intact, but I guess that data is useless. I think the best approach
> is
> > to replace the dead drives and completely rebuild each affected OSD. How
> > should we go about this, preferably in a way that other OSDs on the node
> > remain unaffected and operational?
> >
> > I would appreciate any advice or pointers to the relevant documentation.
> >
> > Best regards,
> > Zakhar


[ceph-users] Pacific 16.2.15 `osd noin`

2024-04-02 Thread Zakhar Kirpichenko
Hi,

I'm adding a few OSDs to an existing cluster, the cluster is running with
`osd noout,noin`:

  cluster:
id: 3f50555a-ae2a-11eb-a2fc-ffde44714d86
health: HEALTH_WARN
noout,noin flag(s) set

Specifically `noin` is documented as "prevents booting OSDs from being
marked in". But freshly added OSDs were immediately marked `up` and `in`:

  services:
...
osd: 96 osds: 96 up (since 5m), 96 in (since 6m); 338 remapped pgs
 flags noout,noin

# ceph osd tree in | grep -E "osd.11|osd.12|osd.26"
 11   hdd   9.38680   osd.11   up   1.0   1.0
 12   hdd   9.38680   osd.12   up   1.0   1.0
 26   hdd   9.38680   osd.26   up   1.0   1.0

Is this expected behavior? Do I misunderstand the purpose of the `noin`
option?

Best regards,
Zakhar


[ceph-users] Replace block drives of combined NVME+HDD OSDs

2024-04-01 Thread Zakhar Kirpichenko
Hi,

Unfortunately, some of our HDDs failed and we need to replace these drives
which are parts of "combined" OSDs (DB/WAL on NVME, block storage on HDD).
All OSDs are defined with a service definition similar to this one:

```
service_type: osd
service_id: ceph02_combined_osd
service_name: osd.ceph02_combined_osd
placement:
  hosts:
  - ceph02
spec:
  data_devices:
paths:
- /dev/sda
- /dev/sdb
- /dev/sdc
- /dev/sdd
- /dev/sde
- /dev/sdf
- /dev/sdg
- /dev/sdh
- /dev/sdi
  db_devices:
paths:
- /dev/nvme0n1
- /dev/nvme1n1
  filter_logic: AND
  objectstore: bluestore
```

In the above example, HDDs `sda` and `sdb` are not readable and their data
cannot be copied over to new HDDs. The NVMe partitions of `nvme0n1` holding
their DB/WAL data are intact, but I guess that data is useless on its own. I
think the best approach is
to replace the dead drives and completely rebuild each affected OSD. How
should we go about this, preferably in a way that other OSDs on the node
remain unaffected and operational?

I would appreciate any advice or pointers to the relevant documentation.

Best regards,
Zakhar


[ceph-users] cephadm: daemon osd.x on yyy is in error state

2024-03-30 Thread Zakhar Kirpichenko
Hi,

A disk failed in our cephadm-managed 16.2.15 cluster, the affected OSD is
down, out and stopped with cephadm, I also removed the failed drive from
the host's service definition. The cluster has finished recovering but the
following warning persists:

[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
daemon osd.11 on ceph02 is in error state

Is it possible to remove or suppress this warning without having to
completely remove the OSD?

I would appreciate any advice or pointers.

Best regards,
Zakhar


[ceph-users] Re: Upgraded 16.2.14 to 16.2.15

2024-03-05 Thread Zakhar Kirpichenko
Well, that option could be included in new mon configs generated during mon
upgrades. But it isn't being used; a minimal config is written instead.
In other words, the configuration option seems useless for all intents and
purposes, as it doesn't appear to be taken into account at any stage of a
mon's lifecycle.

/Z

On Tue, 5 Mar 2024 at 10:09, Eugen Block  wrote:

> Hi,
>
> > I also added it to the cluster config
> > with "ceph config set mon mon_rocksdb_options", but it seems that this
> > option doesn't have any effect at all.
>
> that's because it's an option that has to be present *during* mon
> startup, not *after* the startup when it can read the config store.
>
> Zitat von Zakhar Kirpichenko :
>
> > Hi Eugen,
> >
> > It is correct that I manually added the configuration, but not to the
> > unit.run but rather to each mon's config (i.e.
> > /var/lib/ceph/FSID/mon.*/config). I also added it to the cluster config
> > with "ceph config set mon mon_rocksdb_options", but it seems that this
> > option doesn't have any effect at all.
> >
> > /Z
> >
> > On Tue, 5 Mar 2024 at 09:58, Eugen Block  wrote:
> >
> >> Hi,
> >>
> >> > 1. RocksDB options, which I provided to each mon via their
> configuration
> >> > files, got overwritten during mon redeployment and I had to re-add
> >> > mon_rocksdb_options back.
> >>
> >> IIRC, you didn't use the extra_entrypoint_args for that option but
> >> added it directly to the container unit.run file. So it's expected
> >> that it's removed after an update. If you want it to persist a
> >> container update you should consider using the extra_entrypoint_args:
> >>
> >> cat mon.yaml
> >> service_type: mon
> >> service_name: mon
> >> placement:
> >>hosts:
> >>- host1
> >>- host2
> >>- host3
> >> extra_entrypoint_args:
> >>-
> >>
> >>
> '--mon-rocksdb-options=write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression,max_background_jobs=4,max_subcompactions=2'
> >>
> >> Regards,
> >> Eugen
> >>
> >> Zitat von Zakhar Kirpichenko :
> >>
> >> > Hi,
> >> >
> >> > I have upgraded my test and production cephadm-managed clusters from
> >> > 16.2.14 to 16.2.15. The upgrade was smooth and completed without
> issues.
> >> > There were a few things which I noticed after each upgrade:
> >> >
> >> > 1. RocksDB options, which I provided to each mon via their
> configuration
> >> > files, got overwritten during mon redeployment and I had to re-add
> >> > mon_rocksdb_options back.
> >> >
> >> > 2. The monitor debug_rocksdb option got silently reset back to the
> >> > default of 4/5, and I had to set it back to 1/5.
> >> >
> >> > 3. For roughly 2 hours after the upgrade, despite the clusters being
> >> > healthy and operating normally, all monitors would run manual
> compactions
> >> > very often and write to disks at very high rates. For example,
> production
> >> > monitors had their rocksdb:low0 thread write to store.db:
> >> >
> >> > monitors without RocksDB compression: ~8 GB/5 min, or ~96 GB/hour;
> >> > monitors with RocksDB compression: ~1.5 GB/5 min, or ~18 GB/hour.
> >> >
> >> > After roughly 2 hours with no changes to the cluster the write rates
> >> > dropped to ~0.4-0.6 GB/5 min and ~120 MB/5 min respectively. The
> reason
> >> for
> >> > frequent manual compactions and high write rates wasn't immediately
> >> > apparent.
> >> >
> >> > 4. Crash deployment broke the ownership of /var/lib/ceph/FSID/crash and
> >> > /var/lib/ceph/FSID/crash/posted, even though I had already fixed it
> >> > manually after the upgrade to 16.2.14, which had broken it as well.
> >> >
> >> > 5. Mgr RAM usage appears to be increasing at a slower rate than it did
> >> with
> >> > 16.2.14, although it's too early to tell whether the issue with mgrs
> >> > randomly consuming all RAM and getting OOM-killed has been fixed -
> with
> >> > 16.2.14 this would normally take several days.
> >> >
> >> > Overall, things look good. Thanks to the Ceph team for this release!
> >> >
> >> > Zakhar


[ceph-users] Re: Upgraded 16.2.14 to 16.2.15

2024-03-05 Thread Zakhar Kirpichenko
Hi Eugen,

It is correct that I added the configuration manually, but not to the
unit.run file; rather, I added it to each mon's config (i.e.
/var/lib/ceph/FSID/mon.*/config). I also added it to the cluster config
with "ceph config set mon mon_rocksdb_options", but it seems that this
option doesn't have any effect at all.
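For reference, the per-mon config file entry looks roughly like this (the
section name is illustrative):

[mon]
mon_rocksdb_options = write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression,max_background_jobs=4,max_subcompactions=2

and that file is exactly what gets overwritten on redeploy.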

/Z

On Tue, 5 Mar 2024 at 09:58, Eugen Block  wrote:

> Hi,
>
> > 1. RocksDB options, which I provided to each mon via their configuration
> > files, got overwritten during mon redeployment and I had to re-add
> > mon_rocksdb_options back.
>
> IIRC, you didn't use the extra_entrypoint_args for that option but
> added it directly to the container unit.run file. So it's expected
> that it's removed after an update. If you want it to persist a
> container update you should consider using the extra_entrypoint_args:
>
> cat mon.yaml
> service_type: mon
> service_name: mon
> placement:
>hosts:
>- host1
>- host2
>- host3
> extra_entrypoint_args:
>-
>
> '--mon-rocksdb-options=write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression,max_background_jobs=4,max_subcompactions=2'
>
> Regards,
> Eugen
>
> Zitat von Zakhar Kirpichenko :
>
> > Hi,
> >
> > I have upgraded my test and production cephadm-managed clusters from
> > 16.2.14 to 16.2.15. The upgrade was smooth and completed without issues.
> > There were a few things which I noticed after each upgrade:
> >
> > 1. RocksDB options, which I provided to each mon via their configuration
> > files, got overwritten during mon redeployment and I had to re-add
> > mon_rocksdb_options back.
> >
> > 2. The monitor debug_rocksdb option got silently reset back to the default
> > of 4/5, and I had to set it back to 1/5.
> >
> > 3. For roughly 2 hours after the upgrade, despite the clusters being
> > healthy and operating normally, all monitors would run manual compactions
> > very often and write to disks at very high rates. For example, production
> > monitors had their rocksdb:low0 thread write to store.db:
> >
> > monitors without RocksDB compression: ~8 GB/5 min, or ~96 GB/hour;
> > monitors with RocksDB compression: ~1.5 GB/5 min, or ~18 GB/hour.
> >
> > After roughly 2 hours with no changes to the cluster the write rates
> > dropped to ~0.4-0.6 GB/5 min and ~120 MB/5 min respectively. The reason
> for
> > frequent manual compactions and high write rates wasn't immediately
> > apparent.
> >
> > 4. Crash deployment broke the ownership of /var/lib/ceph/FSID/crash and
> > /var/lib/ceph/FSID/crash/posted, even though I had already fixed it
> > manually after the upgrade to 16.2.14, which had broken it as well.
> >
> > 5. Mgr RAM usage appears to be increasing at a slower rate than it did
> with
> > 16.2.14, although it's too early to tell whether the issue with mgrs
> > randomly consuming all RAM and getting OOM-killed has been fixed - with
> > 16.2.14 this would normally take several days.
> >
> > Overall, things look good. Thanks to the Ceph team for this release!
> >
> > Zakhar


[ceph-users] Upgraded 16.2.14 to 16.2.15

2024-03-04 Thread Zakhar Kirpichenko
Hi,

I have upgraded my test and production cephadm-managed clusters from
16.2.14 to 16.2.15. The upgrade was smooth and completed without issues.
There were a few things which I noticed after each upgrade:

1. RocksDB options, which I provided to each mon via their configuration
files, got overwritten during mon redeployment and I had to re-add
mon_rocksdb_options back.

2. The monitor debug_rocksdb option got silently reset back to the default of
4/5, and I had to set it back to 1/5.

3. For roughly 2 hours after the upgrade, despite the clusters being
healthy and operating normally, all monitors would run manual compactions
very often and write to disks at very high rates. For example, production
monitors had their rocksdb:low0 thread write to store.db:

monitors without RocksDB compression: ~8 GB/5 min, or ~96 GB/hour;
monitors with RocksDB compression: ~1.5 GB/5 min, or ~18 GB/hour.

After roughly 2 hours with no changes to the cluster the write rates
dropped to ~0.4-0.6 GB/5 min and ~120 MB/5 min respectively. The reason for
frequent manual compactions and high write rates wasn't immediately
apparent.

4. Crash deployment broke the ownership of /var/lib/ceph/FSID/crash and
/var/lib/ceph/FSID/crash/posted, even though I had already fixed it manually
after the upgrade to 16.2.14, which had broken it as well (rough fixes for
this and item 2 are sketched below).

5. Mgr RAM usage appears to be increasing at a slower rate than it did with
16.2.14, although it's too early to tell whether the issue with mgrs
randomly consuming all RAM and getting OOM-killed has been fixed - with
16.2.14 this would normally take several days.
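
Regarding items 2 and 4, the quick fixes I applied were roughly the
following (FSID abbreviated; 167:167 is the uid:gid the containerized
daemons use in my deployment):

ceph config set mon debug_rocksdb 1/5
chown -R 167:167 /var/lib/ceph/FSID/crash /var/lib/ceph/FSID/crash/posted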

Overall, things look good. Thanks to the Ceph team for this release!

Zakhar


[ceph-users] Re: v16.2.15 Pacific released

2024-03-04 Thread Zakhar Kirpichenko
This is great news! Many thanks!

/Z

On Mon, 4 Mar 2024 at 17:25, Yuri Weinstein  wrote:

> We're happy to announce the 15th, and expected to be the last,
> backport release in the Pacific series.
>
> https://ceph.io/en/news/blog/2024/v16-2-15-pacific-released/
>
> Notable Changes
> ---
>
> * `ceph config dump --format ` output will display the localized
>   option names instead of their normalized version. For example,
>   "mgr/prometheus/x/server_port" will be displayed instead of
>   "mgr/prometheus/server_port". This matches the output of the non
> pretty-print
>   formatted version of the command.
>
> * CephFS: MDS evicts clients who are not advancing their request tids,
> which causes
>   a large buildup of session metadata, resulting in the MDS going
> read-only due to
>   the RADOS operation exceeding the size threshold. The
> `mds_session_metadata_threshold`
>   config controls the maximum size that an (encoded) session metadata can
> grow.
>
> * RADOS: The `get_pool_is_selfmanaged_snaps_mode` C++ API has been
> deprecated
>   due to its susceptibility to false negative results.  Its safer
> replacement is
>   `pool_is_in_selfmanaged_snaps_mode`.
>
> * RBD: When diffing against the beginning of time (`fromsnapname == NULL`)
> in
>   fast-diff mode (`whole_object == true` with `fast-diff` image feature
> enabled
>   and valid), diff-iterate is now guaranteed to execute locally if
> exclusive
>   lock is available.  This brings a dramatic performance improvement for
> QEMU
>   live disk synchronization and backup use cases.
>
> Getting Ceph
> 
> * Git at git://github.com/ceph/ceph.git
> * Tarball at https://download.ceph.com/tarballs/ceph-16.2.15.tar.gz
> * Containers at https://quay.io/repository/ceph/ceph
> * For packages, see https://docs.ceph.com/en/latest/install/get-packages/
> * Release git sha1: 618f440892089921c3e944a991122ddc44e60516


[ceph-users] What's up with 16.2.15?

2024-02-29 Thread Zakhar Kirpichenko
Hi,

We randomly got several Pacific package updates to 16.2.15 available for
Ubuntu 20.04. As far as I can see, 16.2.15 hasn't been released and there's
been no release announcement. The updates seem to be no longer available.
What's going on with 16.2.15?

/Z


[ceph-users] Re: cephadm Failed to apply 1 service(s)

2024-02-16 Thread Zakhar Kirpichenko
Many thanks for your help, Eugen! Things are back to normal now :-)

/Z

On Fri, 16 Feb 2024 at 14:52, Eugen Block  wrote:

> Sure, you can save the drivegroup spec in a file, edit it according to
> your requirements (not sure if having device paths in there makes
> sense though) and apply it:
>
> ceph orch apply -i new-drivegroup.yml
>
> Zitat von Zakhar Kirpichenko :
>
> > Many thanks for your response, Eugen!
> >
> > I tried to fail mgr twice, unfortunately that had no effect on the issue.
> > Neither `cephadm ceph-volume inventory` nor `ceph device ls-by-host
> ceph03`
> > have the failed drive on the list.
> >
> > Though your assumption is correct, the spec appears to explicitly include
> > the failed drive:
> >
> > ---
> > service_type: osd
> > service_id: ceph03_combined_osd
> > service_name: osd.ceph03_combined_osd
> > placement:
> >   hosts:
> >   - ceph03
> > spec:
> >   data_devices:
> > paths:
> > ...
> > - /dev/sde
> > ...
> >   db_devices:
> > paths:
> > - /dev/nvme0n1
> > - /dev/nvme1n1
> >   filter_logic: AND
> >   objectstore: bluestore
> > ---
> >
> > Do you know the best way to remove the device from the spec?
> >
> > /Z
> >
> > On Fri, 16 Feb 2024 at 14:10, Eugen Block  wrote:
> >
> >> Hi,
> >>
> >> sometimes the easiest fix is to failover the mgr, have you tried that?
> >> If that didn't work, can you share the drivegroup spec?
> >>
> >> ceph orch ls  --export
> >>
> >> Does it contain specific device paths or something? Does 'cephadm ls'
> >> on that node show any traces of the previous OSD?
> >> I'd probably try to check some things like
> >>
> >> cephadm ceph-volume inventory
> >> ceph device ls-by-host 
> >>
> >> Regards,
> >> Eugen
> >>
> >> Zitat von Zakhar Kirpichenko :
> >>
> >> > Hi,
> >> >
> >> > We had a physical drive malfunction in one of our Ceph OSD hosts
> managed
> >> by
> >> > cephadm (Ceph 16.2.14). I have removed the drive from the system, and
> the
> >> > kernel no longer sees it:
> >> >
> >> > ceph03 ~]# ls -al /dev/sde
> >> > ls: cannot access '/dev/sde': No such file or directory
> >> >
> >> > I have removed the corresponding OSD from cephadm, crush map, etc. For
> >> all
> >> > intents and purposes that OSD and its block device no longer exist:
> >> >
> >> > root@ceph01:/# ceph orch ps | grep osd.26
> >> > root@ceph01:/# ceph osd tree| grep 26
> >> > root@ceph01:/# ceph orch device ls | grep -E "ceph03.*sde"
> >> >
> >> > None of the above commands return anything. Cephadm correctly sees 8
> >> > remaining OSDs on the host:
> >> >
> >> > root@ceph01:/# ceph orch ls | grep ceph03_c
> >> > osd.ceph03_combined_osd 8  33s ago2y   ceph03
> >> >
> >> > Unfortunately, cephadm appears to be trying to apply a spec to host
> >> ceph03
> >> > including the disk that is now missing:
> >> >
> >> > RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host
> >> > --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume
> >> > --privileged --group-add=disk --init -e CONTAINER_IMAGE=
> >> >
> >>
> quay.io/ceph/ceph@sha256:843f112990e6489362c625229c3ea3d90b8734bd5e14e0aeaf89942fbb980a8b
> >> > -e NODE_NAME=ceph03 -e CEPH_USE_RANDOM_NONCE=1 -e
> >> > CEPH_VOLUME_OSDSPEC_AFFINITY=ceph03_combined_osd -e
> >> > CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v
> >> > /var/run/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86:/var/run/ceph:z -v
> >> > /var/log/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86:/var/log/ceph:z -v
> >> >
> >>
> /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/crash:/var/lib/ceph/crash:z
> >> > -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm
> -v
> >> > /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v
> >> > /tmp/ceph-tmpc7b33pf0:/etc/ceph/ceph.conf:z -v
> >> > /tmp/ceph-tmpq45nkmd6:/var/lib/ceph/bootstrap-osd/ceph.keyring:z
> >> >
> >>
> quay.io/ceph/ceph@sha256:843f112990e6489362c625229c3ea3d90b8734bd5e14e0aeaf89942fbb980a8b
> >> > lvm batch --no-auto /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde
> /d

[ceph-users] Re: cephadm Failed to apply 1 service(s)

2024-02-16 Thread Zakhar Kirpichenko
Answering my own question: I exported the spec, removed the failed drive
from it, and re-applied it; the spec appears to have been updated
correctly and the warning is gone.
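Roughly, for anyone hitting the same thing (the file name is arbitrary):

ceph orch ls osd --export > osd-specs.yml
(delete the "- /dev/sde" line from the ceph03 spec in that file)
ceph orch apply -i osd-specs.yml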

/Z

On Fri, 16 Feb 2024 at 14:33, Zakhar Kirpichenko  wrote:

> Many thanks for your response, Eugen!
>
> I tried to fail mgr twice, unfortunately that had no effect on the issue.
> Neither `cephadm ceph-volume inventory` nor `ceph device ls-by-host ceph03`
> have the failed drive on the list.
>
> Though your assumption is correct, the spec appears to explicitly include
> the failed drive:
>
> ---
> service_type: osd
> service_id: ceph03_combined_osd
> service_name: osd.ceph03_combined_osd
> placement:
>   hosts:
>   - ceph03
> spec:
>   data_devices:
> paths:
> ...
> - /dev/sde
> ...
>   db_devices:
> paths:
> - /dev/nvme0n1
> - /dev/nvme1n1
>   filter_logic: AND
>   objectstore: bluestore
> ---
>
> Do you know the best way to remove the device from the spec?
>
> /Z
>
> On Fri, 16 Feb 2024 at 14:10, Eugen Block  wrote:
>
>> Hi,
>>
>> sometimes the easiest fix is to failover the mgr, have you tried that?
>> If that didn't work, can you share the drivegroup spec?
>>
>> ceph orch ls  --export
>>
>> Does it contain specific device paths or something? Does 'cephadm ls'
>> on that node show any traces of the previous OSD?
>> I'd probably try to check some things like
>>
>> cephadm ceph-volume inventory
>> ceph device ls-by-host 
>>
>> Regards,
>> Eugen
>>
>> Zitat von Zakhar Kirpichenko :
>>
>> > Hi,
>> >
>> > We had a physical drive malfunction in one of our Ceph OSD hosts
>> managed by
>> > cephadm (Ceph 16.2.14). I have removed the drive from the system, and
>> the
>> > kernel no longer sees it:
>> >
>> > ceph03 ~]# ls -al /dev/sde
>> > ls: cannot access '/dev/sde': No such file or directory
>> >
>> > I have removed the corresponding OSD from cephadm, crush map, etc. For
>> all
>> > intents and purposes that OSD and its block device no longer exist:
>> >
>> > root@ceph01:/# ceph orch ps | grep osd.26
>> > root@ceph01:/# ceph osd tree| grep 26
>> > root@ceph01:/# ceph orch device ls | grep -E "ceph03.*sde"
>> >
>> > None of the above commands return anything. Cephadm correctly sees 8
>> > remaining OSDs on the host:
>> >
>> > root@ceph01:/# ceph orch ls | grep ceph03_c
>> > osd.ceph03_combined_osd 8  33s ago2y   ceph03
>> >
>> > Unfortunately, cephadm appears to be trying to apply a spec to host
>> ceph03
>> > including the disk that is now missing:
>> >
>> > RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host
>> > --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume
>> > --privileged --group-add=disk --init -e CONTAINER_IMAGE=
>> >
>> quay.io/ceph/ceph@sha256:843f112990e6489362c625229c3ea3d90b8734bd5e14e0aeaf89942fbb980a8b
>> > -e NODE_NAME=ceph03 -e CEPH_USE_RANDOM_NONCE=1 -e
>> > CEPH_VOLUME_OSDSPEC_AFFINITY=ceph03_combined_osd -e
>> > CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v
>> > /var/run/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86:/var/run/ceph:z -v
>> > /var/log/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86:/var/log/ceph:z -v
>> >
>> /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/crash:/var/lib/ceph/crash:z
>> > -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
>> > /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v
>> > /tmp/ceph-tmpc7b33pf0:/etc/ceph/ceph.conf:z -v
>> > /tmp/ceph-tmpq45nkmd6:/var/lib/ceph/bootstrap-osd/ceph.keyring:z
>> >
>> quay.io/ceph/ceph@sha256:843f112990e6489362c625229c3ea3d90b8734bd5e14e0aeaf89942fbb980a8b
>> > lvm batch --no-auto /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde
>> /dev/sdf
>> > /dev/sdg /dev/sdh /dev/sdi --db-devices /dev/nvme0n1 /dev/nvme1n1 --yes
>> > --no-systemd
>> >
>> > Note that `lvm batch` includes the missing drive, /dev/sde. This fails
>> > because the drive no longer exists. Other than this cephadm ceph-volume
> >> > thingy, the cluster is healthy. How can I tell cephadm that it should stop
> >> > trying to use /dev/sde, which no longer exists, without affecting other
>> > OSDs on the host?
>> >
>> > I would very much appreciate any advice or pointers.
>> >
>> > Best regards,
>> > Zakhar


[ceph-users] Re: cephadm Failed to apply 1 service(s)

2024-02-16 Thread Zakhar Kirpichenko
Many thanks for your response, Eugen!

I tried to fail mgr twice, unfortunately that had no effect on the issue.
Neither `cephadm ceph-volume inventory` nor `ceph device ls-by-host ceph03`
have the failed drive on the list.

Though your assumption is correct, the spec appears to explicitly include
the failed drive:

---
service_type: osd
service_id: ceph03_combined_osd
service_name: osd.ceph03_combined_osd
placement:
  hosts:
  - ceph03
spec:
  data_devices:
paths:
...
- /dev/sde
...
  db_devices:
paths:
- /dev/nvme0n1
- /dev/nvme1n1
  filter_logic: AND
  objectstore: bluestore
---

Do you know the best way to remove the device from the spec?

/Z

On Fri, 16 Feb 2024 at 14:10, Eugen Block  wrote:

> Hi,
>
> sometimes the easiest fix is to failover the mgr, have you tried that?
> If that didn't work, can you share the drivegroup spec?
>
> ceph orch ls  --export
>
> Does it contain specific device paths or something? Does 'cephadm ls'
> on that node show any traces of the previous OSD?
> I'd probably try to check some things like
>
> cephadm ceph-volume inventory
> ceph device ls-by-host 
>
> Regards,
> Eugen
>
> Zitat von Zakhar Kirpichenko :
>
> > Hi,
> >
> > We had a physical drive malfunction in one of our Ceph OSD hosts managed
> by
> > cephadm (Ceph 16.2.14). I have removed the drive from the system, and the
> > kernel no longer sees it:
> >
> > ceph03 ~]# ls -al /dev/sde
> > ls: cannot access '/dev/sde': No such file or directory
> >
> > I have removed the corresponding OSD from cephadm, crush map, etc. For
> all
> > intents and purposes that OSD and its block device no longer exist:
> >
> > root@ceph01:/# ceph orch ps | grep osd.26
> > root@ceph01:/# ceph osd tree| grep 26
> > root@ceph01:/# ceph orch device ls | grep -E "ceph03.*sde"
> >
> > None of the above commands return anything. Cephadm correctly sees 8
> > remaining OSDs on the host:
> >
> > root@ceph01:/# ceph orch ls | grep ceph03_c
> > osd.ceph03_combined_osd 8  33s ago2y   ceph03
> >
> > Unfortunately, cephadm appears to be trying to apply a spec to host
> ceph03
> > including the disk that is now missing:
> >
> > RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host
> > --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume
> > --privileged --group-add=disk --init -e CONTAINER_IMAGE=
> >
> quay.io/ceph/ceph@sha256:843f112990e6489362c625229c3ea3d90b8734bd5e14e0aeaf89942fbb980a8b
> > -e NODE_NAME=ceph03 -e CEPH_USE_RANDOM_NONCE=1 -e
> > CEPH_VOLUME_OSDSPEC_AFFINITY=ceph03_combined_osd -e
> > CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v
> > /var/run/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86:/var/run/ceph:z -v
> > /var/log/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86:/var/log/ceph:z -v
> >
> /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/crash:/var/lib/ceph/crash:z
> > -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
> > /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v
> > /tmp/ceph-tmpc7b33pf0:/etc/ceph/ceph.conf:z -v
> > /tmp/ceph-tmpq45nkmd6:/var/lib/ceph/bootstrap-osd/ceph.keyring:z
> >
> quay.io/ceph/ceph@sha256:843f112990e6489362c625229c3ea3d90b8734bd5e14e0aeaf89942fbb980a8b
> > lvm batch --no-auto /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
> > /dev/sdg /dev/sdh /dev/sdi --db-devices /dev/nvme0n1 /dev/nvme1n1 --yes
> > --no-systemd
> >
> > Note that `lvm batch` includes the missing drive, /dev/sde. This fails
> > because the drive no longer exists. Other than this cephadm ceph-volume
> > thingy, the cluster is healthy. How can I tell cephadm that it should stop
> > trying to use /dev/sde, which no longer exists, without affecting other
> > OSDs on the host?
> >
> > I would very much appreciate any advice or pointers.
> >
> > Best regards,
> > Zakhar


[ceph-users] cephadm Failed to apply 1 service(s)

2024-02-16 Thread Zakhar Kirpichenko
Hi,

We had a physical drive malfunction in one of our Ceph OSD hosts managed by
cephadm (Ceph 16.2.14). I have removed the drive from the system, and the
kernel no longer sees it:

ceph03 ~]# ls -al /dev/sde
ls: cannot access '/dev/sde': No such file or directory

I have removed the corresponding OSD from cephadm, crush map, etc. For all
intents and purposes that OSD and its block device no longer exist:

root@ceph01:/# ceph orch ps | grep osd.26
root@ceph01:/# ceph osd tree| grep 26
root@ceph01:/# ceph orch device ls | grep -E "ceph03.*sde"

None of the above commands return anything. Cephadm correctly sees 8
remaining OSDs on the host:

root@ceph01:/# ceph orch ls | grep ceph03_c
osd.ceph03_combined_osd 8  33s ago2y   ceph03

Unfortunately, cephadm appears to be trying to apply a spec to host ceph03
including the disk that is now missing:

RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host
--stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume
--privileged --group-add=disk --init -e CONTAINER_IMAGE=
quay.io/ceph/ceph@sha256:843f112990e6489362c625229c3ea3d90b8734bd5e14e0aeaf89942fbb980a8b
-e NODE_NAME=ceph03 -e CEPH_USE_RANDOM_NONCE=1 -e
CEPH_VOLUME_OSDSPEC_AFFINITY=ceph03_combined_osd -e
CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v
/var/run/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86:/var/run/ceph:z -v
/var/log/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86:/var/log/ceph:z -v
/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/crash:/var/lib/ceph/crash:z
-v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
/run/lock/lvm:/run/lock/lvm -v /:/rootfs -v
/tmp/ceph-tmpc7b33pf0:/etc/ceph/ceph.conf:z -v
/tmp/ceph-tmpq45nkmd6:/var/lib/ceph/bootstrap-osd/ceph.keyring:z
quay.io/ceph/ceph@sha256:843f112990e6489362c625229c3ea3d90b8734bd5e14e0aeaf89942fbb980a8b
lvm batch --no-auto /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
/dev/sdg /dev/sdh /dev/sdi --db-devices /dev/nvme0n1 /dev/nvme1n1 --yes
--no-systemd

Note that `lvm batch` includes the missing drive, /dev/sde. This fails
because the drive no longer exists. Other than this cephadm ceph-volume
thingy, the cluster is healthy. How can I tell cephadm that it should stop
trying to use /dev/sde, which no longer exists, without affecting other
OSDs on the host?

I would very much appreciate any advice or pointers.

Best regards,
Zakhar


[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-07 Thread Zakhar Kirpichenko
Indeed, it looks like it's been recently reopened. Thanks for this!

/Z

On Wed, 7 Feb 2024 at 15:43, David Orman  wrote:

> That tracker's last update indicates it's slated for inclusion.
>
> On Thu, Feb 1, 2024, at 10:47, Zakhar Kirpichenko wrote:
> > Hi,
> >
> > Please consider not leaving this behind:
> https://github.com/ceph/ceph/pull/55109
> >
> > It's a serious bug which potentially affects whole-node stability if
> > the affected mgr is colocated with OSDs. The bug has been known for quite a
> > while and really shouldn't be left unfixed.
> >
> > /Z
> >
> > On Thu, 1 Feb 2024 at 18:45, Nizamudeen A  wrote:
> >> Thanks Laura,
> >>
> >> Raised a PR for  https://tracker.ceph.com/issues/57386
> >> https://github.com/ceph/ceph/pull/55415
> >>
> >>
> >> On Thu, Feb 1, 2024 at 5:15 AM Laura Flores  wrote:
> >>
> >> > I reviewed the rados suite. @Adam King ,
> @Nizamudeen A
> >> >  would appreciate a look from you, as there are some
> >> > orchestrator and dashboard trackers that came up.
> >> >
> >> > pacific-release, 16.2.15
> >> >
> >> > Failures:
> >> > 1. https://tracker.ceph.com/issues/62225
> >> > 2. https://tracker.ceph.com/issues/64278
> >> > 3. https://tracker.ceph.com/issues/58659
> >> > 4. https://tracker.ceph.com/issues/58658
> >> > 5. https://tracker.ceph.com/issues/64280 -- new tracker, worth a
> look
> >> > from Orch
> >> > 6. https://tracker.ceph.com/issues/63577
> >> > 7. https://tracker.ceph.com/issues/63894
> >> > 8. https://tracker.ceph.com/issues/64126
> >> > 9. https://tracker.ceph.com/issues/63887
> >> > 10. https://tracker.ceph.com/issues/61602
> >> > 11. https://tracker.ceph.com/issues/54071
> >> > 12. https://tracker.ceph.com/issues/57386
> >> > 13. https://tracker.ceph.com/issues/64281
> >> > 14. https://tracker.ceph.com/issues/49287
> >> >
> >> > Details:
> >> > 1. pacific upgrade test fails on 'ceph versions | jq -e' command -
> >> > Ceph - RADOS
> >> > 2. Unable to update caps for client.iscsi.iscsi.a - Ceph -
> Orchestrator
> >> > 3. mds_upgrade_sequence: failure when deploying node-exporter -
> Ceph -
> >> > Orchestrator
> >> > 4. mds_upgrade_sequence: Error: initializing source
> >> > docker://prom/alertmanager:v0.20.0 - Ceph - Orchestrator
> >> > 5. mgr-nfs-upgrade test times out from failed cephadm daemons -
> Ceph -
> >> > Orchestrator
> >> > 6. cephadm: docker.io/library/haproxy: toomanyrequests: You have
> >> > reached your pull rate limit. You may increase the limit by
> authenticating
> >> > and upgrading: https://www.docker.com/increase-rate-limit - Ceph -
> >> > Orchestrator
> >> > 7. qa: cephadm failed with an error code 1, alertmanager
> container not
> >> > found. - Ceph - Orchestrator
> >> > 8. ceph-iscsi build was retriggered and now missing
> >> > package_manager_version attribute - Ceph
> >> > 9. Starting alertmanager fails from missing container - Ceph -
> >> > Orchestrator
> >> > 10. pacific: cls/test_cls_sdk.sh: Health check failed: 1 pool(s)
> do
> >> > not have an application enabled (POOL_APP_NOT_ENABLED) - Ceph - RADOS
> >> > 11. rados/cephadm/osds: Invalid command: missing required
> parameter
> >> > hostname() - Ceph - Orchestrator
> >> > 12. cephadm/test_dashboard_e2e.sh: Expected to find content:
> '/^foo$/'
> >> > within the selector: 'cd-modal .badge' but never did - Ceph - Mgr -
> >> > Dashboard
> >> > 13. Failed to download key at
> >> > http://download.ceph.com/keys/autobuild.asc: Request failed:  >> > error [Errno 101] Network is unreachable> - Infrastructure
> >> > 14. podman: setting cgroup config for procHooks process caused:
> Unit
> >> > libpod-$hash.scope not found - Ceph - Orchestrator
> >> >
> >> > On Wed, Jan 31, 2024 at 1:41 PM Casey Bodley 
> wrote:
> >> >
> >> >> On Mon, Jan 29, 2024 at 4:39 PM Yuri Weinstein 
> >> >> wrote:
> >> >> >
> >> >> > Details of this release are summarized here:
> >> >> >
> >> >> > https://tracker.ceph.com/iss

[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-01 Thread Zakhar Kirpichenko
Hi,

Please consider not leaving this behind:
https://github.com/ceph/ceph/pull/55109

It's a serious bug which potentially affects whole-node stability if the
affected mgr is colocated with OSDs. The bug has been known for quite a while
and really shouldn't be left unfixed.

/Z

On Thu, 1 Feb 2024 at 18:45, Nizamudeen A  wrote:

> Thanks Laura,
>
> Raised a PR for  https://tracker.ceph.com/issues/57386
> https://github.com/ceph/ceph/pull/55415
>
>
> On Thu, Feb 1, 2024 at 5:15 AM Laura Flores  wrote:
>
> > I reviewed the rados suite. @Adam King , @Nizamudeen
> A
> >  would appreciate a look from you, as there are some
> > orchestrator and dashboard trackers that came up.
> >
> > pacific-release, 16.2.15
> >
> > Failures:
> > 1. https://tracker.ceph.com/issues/62225
> > 2. https://tracker.ceph.com/issues/64278
> > 3. https://tracker.ceph.com/issues/58659
> > 4. https://tracker.ceph.com/issues/58658
> > 5. https://tracker.ceph.com/issues/64280 -- new tracker, worth a
> look
> > from Orch
> > 6. https://tracker.ceph.com/issues/63577
> > 7. https://tracker.ceph.com/issues/63894
> > 8. https://tracker.ceph.com/issues/64126
> > 9. https://tracker.ceph.com/issues/63887
> > 10. https://tracker.ceph.com/issues/61602
> > 11. https://tracker.ceph.com/issues/54071
> > 12. https://tracker.ceph.com/issues/57386
> > 13. https://tracker.ceph.com/issues/64281
> > 14. https://tracker.ceph.com/issues/49287
> >
> > Details:
> > 1. pacific upgrade test fails on 'ceph versions | jq -e' command -
> > Ceph - RADOS
> > 2. Unable to update caps for client.iscsi.iscsi.a - Ceph -
> Orchestrator
> > 3. mds_upgrade_sequence: failure when deploying node-exporter - Ceph
> -
> > Orchestrator
> > 4. mds_upgrade_sequence: Error: initializing source
> > docker://prom/alertmanager:v0.20.0 - Ceph - Orchestrator
> > 5. mgr-nfs-upgrade test times out from failed cephadm daemons - Ceph
> -
> > Orchestrator
> > 6. cephadm: docker.io/library/haproxy: toomanyrequests: You have
> > reached your pull rate limit. You may increase the limit by
> authenticating
> > and upgrading: https://www.docker.com/increase-rate-limit - Ceph -
> > Orchestrator
> > 7. qa: cephadm failed with an error code 1, alertmanager container
> not
> > found. - Ceph - Orchestrator
> > 8. ceph-iscsi build was retriggered and now missing
> > package_manager_version attribute - Ceph
> > 9. Starting alertmanager fails from missing container - Ceph -
> > Orchestrator
> > 10. pacific: cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do
> > not have an application enabled (POOL_APP_NOT_ENABLED) - Ceph - RADOS
> > 11. rados/cephadm/osds: Invalid command: missing required parameter
> > hostname() - Ceph - Orchestrator
> > 12. cephadm/test_dashboard_e2e.sh: Expected to find content:
> '/^foo$/'
> > within the selector: 'cd-modal .badge' but never did - Ceph - Mgr -
> > Dashboard
> > 13. Failed to download key at
> > http://download.ceph.com/keys/autobuild.asc: Request failed: <urlopen error [Errno 101] Network is unreachable> - Infrastructure
> > 14. podman: setting cgroup config for procHooks process caused: Unit
> > libpod-$hash.scope not found - Ceph - Orchestrator
> >
> > On Wed, Jan 31, 2024 at 1:41 PM Casey Bodley  wrote:
> >
> >> On Mon, Jan 29, 2024 at 4:39 PM Yuri Weinstein 
> >> wrote:
> >> >
> >> > Details of this release are summarized here:
> >> >
> >> > https://tracker.ceph.com/issues/64151#note-1
> >> >
> >> > Seeking approvals/reviews for:
> >> >
> >> > rados - Radek, Laura, Travis, Ernesto, Adam King
> >> > rgw - Casey
> >>
> >> rgw approved, thanks
> >>
> >> > fs - Venky
> >> > rbd - Ilya
> >> > krbd - in progress
> >> >
> >> > upgrade/nautilus-x (pacific) - Casey PTL (regweed tests failed)
> >> > upgrade/octopus-x (pacific) - Casey PTL (regweed tests failed)
> >> >
> >> > upgrade/pacific-x (quincy) - in progress
> >> > upgrade/pacific-p2p - Ilya PTL (maybe rbd related?)
> >> >
> >> > ceph-volume - Guillaume
> >> >
> >> > TIA
> >> > YuriW
> >> > ___
> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >> >
> >> ___
> >> Dev mailing list -- d...@ceph.io
> >> To unsubscribe send an email to dev-le...@ceph.io
> >>
> >
> >
> > --
> >
> > Laura Flores
> >
> > She/Her/Hers
> >
> > Software Engineer, Ceph Storage 
> >
> > Chicago, IL
> >
> > lflo...@ibm.com | lflo...@redhat.com 
> > M: +17087388804
> >
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph OSD reported Slow operations

2024-01-28 Thread Zakhar Kirpichenko
Hi,

You have 67 TB of raw space available. With a replication factor of 3,
which is what you seem to be using, that is ~22 TB usable space under ideal
conditions.

MAX AVAIL column shows the available space, taking into account the raw
space, the replication factor and the CRUSH map, before the first OSD
becomes full. In other words, because of the way the data is distributed
across the OSDs in your cluster, you won't be able to utilize the whole 22
TB because one or more OSDs will get full if you write another 16 TB of
data to your pools.
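
As a rough sanity check of those numbers (a quick sketch, assuming 3x
replicated pools and the figures from your screenshot):

  echo "scale=1; 67 / 3"   | bc   # ~22.3 TB theoretically usable out of the 67 TB raw still free
  echo "scale=1; 112 / 37" | bc   # ~3.0  -> USED is about 3x STORED, consistent with replica 3

The pool MAX AVAIL of 16 TB being lower than those ~22 TB is exactly the
effect of the uneven distribution described above.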



/Z

On Sun, 28 Jan 2024 at 18:36, V A Prabha  wrote:

> Hi
> Just a continuation of this mail, Could you help me out to understand the
> ceph df output. PFA the screenshot with this mail.
>
> 1. Raw storage is 180 TB
>
> 2. Stored Value is 37 TB
>
> 3. Used Value is 112 TB
>
> 4. Available Value is 67 TB
>
> 5. Pool Max Available Value is 16 TB
> Though the Available Value is still 67 TB how can it be utilized for the
> pools?
> On November 6, 2023 at 4:18 PM V A Prabha  wrote:
>
> Please clarify my query.
> I had 700+ volumes  (220 applications) running in 36 OSDs when it reported
> the slow operations. Due to emergency, we migrated 200+ VMs to another
> virtualization environment. So we have shutdown all the related VMs in our
> Openstack production setup running with Ceph.
> We have not deleted the 200+ volumes from Ceph as waiting for the
> concurrence from the departments.
> My query is: even when the applications are down, do the volumes at
> the backend keep our cluster busy when there are no active transactions?
> Is there any parameter in the ceph osd log and ceph mon log that gives me
> a clue about the cluster busyness?
> Is there any zombie or unwanted process that makes the ceph cluster busy, or
> is it the IOPS budget of the disks that makes the cluster busy?
>
>
> On November 4, 2023 at 4:29 PM Zakhar Kirpichenko 
> wrote:
>
> You have an IOPS budget, i.e. how much I/O your spinners can deliver.
> Space utilization doesn't affect it much.
>
> You can try disabling write (not read!) cache on your HDDs with sdparm
> (for example, sdparm -c WCE /dev/bla); in my experience this allows HDDs to
> deliver 50-100% more write IOPS. If there is lots of free RAM on the OSD
> nodes, you can play with osd_memory_target and bluestore_cache_size_hdd OSD
> options; be careful though: depending on your workload, the performance
> impact may be insignificant, but your OSDs may run out of memory.
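
A minimal sketch of the quoted advice, assuming /dev/sdX stands for one of
the HDDs backing an OSD, and that any memory increase is first tested on a
node with spare RAM:

  sdparm -c WCE /dev/sdX                                    # clear (disable) the volatile write cache
  sdparm -g WCE /dev/sdX                                    # verify: WCE should now report 0
  ceph config set osd osd_memory_target 6442450944          # e.g. 6 GiB per OSD, in bytes
  ceph config set osd bluestore_cache_size_hdd 4294967296   # e.g. 4 GiB BlueStore cache for HDD OSDs

As noted above, the memory settings only pay off if the nodes actually have
RAM to spare.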
>
> /z
>
> On Sat, 4 Nov 2023 at 12:04, V A Prabha < prab...@cdac.in> wrote:
>
> Now, in this situation, how can I stabilize my production setup, given that
> you have mentioned the cluster is very busy?
> Is there any configuration parameter tuning that will help, or is the only
> option to reduce the applications running on the cluster?
> Though if I have 1.6 TB of free storage available in each of my OSDs,
> that will not help with my IOPS issue, right?
> Please guide me
>
> On November 2, 2023 at 12:47 PM Zakhar Kirpichenko < zak...@gmail.com>
> wrote:
>
> >1. The calculated IOPS is for the rw operation right ?
>
> Total drive IOPS, read or write. Depending on the exact drive models, it
> may be lower or higher than 200. I took the average for a smaller sized
> 7.2k rpm SAS drive. Modern drives usually deliver lower read IOPS and
> higher write IOPS.
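
To put the quoted IOPS budget into rough numbers for this cluster (36 HDD
OSDs, ~200 IOPS each, 3x replication; a back-of-the-envelope sketch only):

  echo "36 * 200" | bc      # ~7200 raw backend IOPS across all spinners
  echo "7200 / 3" | bc      # ~2400 client write IOPS at best, since each write hits 3 OSDs

With WAL and metadata overhead the practical ceiling is lower still, which is
why roughly 3k IOPS of mixed traffic is already enough to produce slow ops
here.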
>
> >2. Cluster is very busy? Is there any misconfiguration or missing tuning
> paramater that makes the cluster busy?
>
> You have almost 3k IOPS and your OSDs report slow ops. I'd say the cluster
> is busy, as in loaded with I/O, perhaps more I/O than it can handle well.
>
> >3. Nodes are not balanced?  you mean to say that the count of OSDs in
> each server differs. But we have enabled autoscale and optimal distribution
> so that you can see from the output of ceph osd df tree that is count of
> pgs(45/OSD) and use% (65 to 67%). Is that not significant?
>
> Yes, the OSD count differs. This means that the CPU, memory usage, network
> load and latency differ per node and may cause performance variations,
> depending on your workload.
>
> /Z
>
> On Thu, 2 Nov 2023 at 08:18, V A Prabha < prab...@cdac.in> wrote:
>
> Thanks for your prompt reply ..
> But the query is
> 1.The calculated IOPS is for the rw operation right ?
> 2. Cluster is very busy? Is there any misconfiguration or missing tuning
> paramater that makes the cluster busy?
> 3. Nodes are not balanced?  you mean to say that the count of OSDs in each
> server differs. But we have enabled autoscale and optimal distribution so
> that you can see from the output of ceph osd df tree that is count of
> pgs(45/OSD) and use% (65 to 67%). Is that not significant?
> Correct me if my queries a

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2024-01-24 Thread Zakhar Kirpichenko
I have to say that not including a fix for a serious issue into the last
minor release of Pacific is a rather odd decision.

/Z

On Thu, 25 Jan 2024 at 09:00, Konstantin Shalygin  wrote:

> Hi,
>
> The backport to pacific was rejected [1], you may switch to reef, when [2]
> merged and released
>
>
> [1] https://github.com/ceph/ceph/pull/55109
> [2] https://github.com/ceph/ceph/pull/55110
>
> k
> Sent from my iPhone
>
> > On Jan 25, 2024, at 04:12, changzhi tan <544463...@qq.com> wrote:
> >
> > Is there any way to solve this problem?thanks
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2024-01-24 Thread Zakhar Kirpichenko
I found that quickly restarting the affected mgr every 2 days is an okay
kludge. It takes less than a second to restart, and with that schedule it
never grows to the dangerous size at which it randomly starts ballooning.
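
For reference, a sketch of how such a periodic restart can be scheduled; the
schedule, paths and daemon name are just examples, and recent releases accept
'ceph mgr fail' without an argument to fail over the currently active mgr
(verify on your release):

  # either: fail over to the standby mgr every 2nd day at 03:00 (root crontab)
  0 3 */2 * * /usr/bin/ceph mgr fail
  # or: restart the specific daemon via the orchestrator (name as shown by 'ceph orch ps')
  0 3 */2 * * /usr/bin/ceph orch daemon restart mgr.ceph01.vankui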

/Z

On Thu, 25 Jan 2024, 03:12 changzhi tan, <544463...@qq.com> wrote:

> Is there any way to solve this problem?thanks
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-12-18 Thread Zakhar Kirpichenko
Hi,

Today after 3 weeks of normal operation the mgr reached memory usage of
1600 MB, quickly ballooned to over 100 GB for no apparent reason and got
oom-killed again. There were no suspicious messages in the logs until the
message indicating that the mgr failed to allocate more memory. Any
thoughts?

/Z

On Mon, 11 Dec 2023 at 12:34, Zakhar Kirpichenko  wrote:

> Hi,
>
> Another update: after 2 more weeks the mgr process grew to ~1.5 GB, which
> again was expected:
>
> mgr.ceph01.vankui ceph01  *:8443,9283  running (2w)102s ago   2y
>  1519M-  16.2.14  fc0182d6cda5  3451f8c6c07e
> mgr.ceph02.shsinf ceph02  *:8443,9283  running (2w)102s ago   7M
>   112M-  16.2.14  fc0182d6cda5  1c3d2d83b6df
>
> The cluster is healthy and operating normally, the mgr process is growing
> slowly. It's still unclear what caused the ballooning and OOM issue under
> very similar conditions.
>
> /Z
>
> On Sat, 25 Nov 2023 at 08:31, Zakhar Kirpichenko  wrote:
>
>> Hi,
>>
>> A small update: after disabling 'progress' module the active mgr (on
>> ceph01) used up ~1.3 GB of memory in 3 days, which was expected:
>>
>> mgr.ceph01.vankui ceph01  *:8443,9283  running (3d)  9m ago   2y
>>1284M-  16.2.14  fc0182d6cda5  3451f8c6c07e
>> mgr.ceph02.shsinf ceph02  *:8443,9283  running (3d)  9m ago   7M
>> 374M-  16.2.14  fc0182d6cda5  1c3d2d83b6df
>>
>> The cluster is healthy and operating normally. The mgr process is growing
>> slowly, at roughly about 1-2 MB per 10 minutes give or take, which is not
>> quick enough to balloon to over 100 GB RSS over several days, which likely
>> means that whatever triggers the issue happens randomly and quite suddenly.
>> I'll continue monitoring the mgr and get back with more observations.
>>
>> /Z
>>
>> On Wed, 22 Nov 2023 at 16:33, Zakhar Kirpichenko 
>> wrote:
>>
>>> Thanks for this. This looks similar to what we're observing. Although we
>>> don't use the API apart from the usage by Ceph deployment itself - which I
>>> guess still counts.
>>>
>>> /Z
>>>
>>> On Wed, 22 Nov 2023, 15:22 Adrien Georget, 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> This memory leak with ceph-mgr seems to be due to a change in Ceph
>>>> 16.2.12.
>>>> Check this issue : https://tracker.ceph.com/issues/59580
>>>> We are also affected by this, with or without containerized services.
>>>>
>>>> Cheers,
>>>> Adrien
>>>>
>>>> Le 22/11/2023 à 14:14, Eugen Block a écrit :
>>>> > One other difference is you use docker, right? We use podman, could
>>>> it
>>>> > be some docker restriction?
>>>> >
>>>> > Zitat von Zakhar Kirpichenko :
>>>> >
>>>> >> It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node
>>>> has
>>>> >> 384
>>>> >> GB of RAM, each OSD has a memory target of 16 GB, about 100 GB of
>>>> >> memory,
>>>> >> give or take, is available (mostly used by page cache) on each node
>>>> >> during
>>>> >> normal operation. Nothing unusual there, tbh.
>>>> >>
>>>> >> No unusual mgr modules or settings either, except for disabled
>>>> progress:
>>>> >>
>>>> >> {
>>>> >> "always_on_modules": [
>>>> >> "balancer",
>>>> >> "crash",
>>>> >> "devicehealth",
>>>> >> "orchestrator",
>>>> >> "pg_autoscaler",
>>>> >> "progress",
>>>> >> "rbd_support",
>>>> >> "status",
>>>> >> "telemetry",
>>>> >> "volumes"
>>>> >> ],
>>>> >> "enabled_modules": [
>>>> >> "cephadm",
>>>> >> "dashboard",
>>>> >> "iostat",
>>>> >> "prometheus",
>>>> >> "restful"
>>>> >> ],
>>>> >>
>>>> >> /Z
>>>> >>
>>>> >> On Wed, 22 Nov 2023, 14:52 Eugen Block,  wrote:
>>>> >>
>

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-12-11 Thread Zakhar Kirpichenko
Hi,

Another update: after 2 more weeks the mgr process grew to ~1.5 GB, which
again was expected:

mgr.ceph01.vankui ceph01  *:8443,9283  running (2w)102s ago   2y
 1519M-  16.2.14  fc0182d6cda5  3451f8c6c07e
mgr.ceph02.shsinf ceph02  *:8443,9283  running (2w)102s ago   7M
  112M-  16.2.14  fc0182d6cda5  1c3d2d83b6df

The cluster is healthy and operating normally, the mgr process is growing
slowly. It's still unclear what caused the ballooning and OOM issue under
very similar conditions.

/Z

On Sat, 25 Nov 2023 at 08:31, Zakhar Kirpichenko  wrote:

> Hi,
>
> A small update: after disabling 'progress' module the active mgr (on
> ceph01) used up ~1.3 GB of memory in 3 days, which was expected:
>
> mgr.ceph01.vankui ceph01  *:8443,9283  running (3d)  9m ago   2y
>  1284M-  16.2.14  fc0182d6cda5  3451f8c6c07e
> mgr.ceph02.shsinf ceph02  *:8443,9283  running (3d)  9m ago   7M
>   374M-  16.2.14  fc0182d6cda5  1c3d2d83b6df
>
> The cluster is healthy and operating normally. The mgr process is growing
> slowly, at roughly about 1-2 MB per 10 minutes give or take, which is not
> quick enough to balloon to over 100 GB RSS over several days, which likely
> means that whatever triggers the issue happens randomly and quite suddenly.
> I'll continue monitoring the mgr and get back with more observations.
>
> /Z
>
> On Wed, 22 Nov 2023 at 16:33, Zakhar Kirpichenko  wrote:
>
>> Thanks for this. This looks similar to what we're observing. Although we
>> don't use the API apart from the usage by Ceph deployment itself - which I
>> guess still counts.
>>
>> /Z
>>
>> On Wed, 22 Nov 2023, 15:22 Adrien Georget, 
>> wrote:
>>
>>> Hi,
>>>
>>> This memory leak with ceph-mgr seems to be due to a change in Ceph
>>> 16.2.12.
>>> Check this issue : https://tracker.ceph.com/issues/59580
>>> We are also affected by this, with or without containerized services.
>>>
>>> Cheers,
>>> Adrien
>>>
>>> Le 22/11/2023 à 14:14, Eugen Block a écrit :
>>> > One other difference is you use docker, right? We use podman, could it
>>> > be some docker restriction?
>>> >
>>> > Zitat von Zakhar Kirpichenko :
>>> >
>>> >> It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node has
>>> >> 384
>>> >> GB of RAM, each OSD has a memory target of 16 GB, about 100 GB of
>>> >> memory,
>>> >> give or take, is available (mostly used by page cache) on each node
>>> >> during
>>> >> normal operation. Nothing unusual there, tbh.
>>> >>
>>> >> No unusual mgr modules or settings either, except for disabled
>>> progress:
>>> >>
>>> >> {
>>> >> "always_on_modules": [
>>> >> "balancer",
>>> >> "crash",
>>> >> "devicehealth",
>>> >> "orchestrator",
>>> >> "pg_autoscaler",
>>> >> "progress",
>>> >> "rbd_support",
>>> >> "status",
>>> >> "telemetry",
>>> >> "volumes"
>>> >> ],
>>> >> "enabled_modules": [
>>> >> "cephadm",
>>> >> "dashboard",
>>> >> "iostat",
>>> >> "prometheus",
>>> >> "restful"
>>> >> ],
>>> >>
>>> >> /Z
>>> >>
>>> >> On Wed, 22 Nov 2023, 14:52 Eugen Block,  wrote:
>>> >>
>>> >>> What does your hardware look like memory-wise? Just for comparison,
>>> >>> one customer cluster has 4,5 GB in use (middle-sized cluster for
>>> >>> openstack, 280 OSDs):
>>> >>>
>>> >>>  PID USER  PR  NIVIRTRESSHR S  %CPU %MEM
>>> TIME+
>>> >>> COMMAND
>>> >>> 6077 ceph  20   0 6357560 4,522g  22316 S 12,00 1,797
>>> >>> 57022:54 ceph-mgr
>>> >>>
>>> >>> In our own cluster (smaller than that and not really heavily used)
>>> the
>>> >>> mgr uses almost 2 GB. So those numbers you have seem relatively
>>> small.
>>> >>>
>>> &

[ceph-users] Re: Ceph 16.2.14: osd crash, bdev() _aio_thread got r=-1 ((1) Operation not permitted)

2023-12-05 Thread Zakhar Kirpichenko
Thank you, Tyler. Unfortunately (or fortunately?) the drive is fine in this
case: there were no errors reported by the kernel at the time, and I
successfully managed to run a bunch of tests on the drive for many hours
before rebooting the host. The drive has worked without any issues for 3
days now.

I've already checked the file descriptor numbers, the defaults already are
very high and the usage is rather low.

/Z

On Wed, 6 Dec 2023 at 03:24, Tyler Stachecki 
wrote:

> On Tue, Dec 5, 2023 at 10:13 AM Zakhar Kirpichenko 
> wrote:
> >
> > Any input from anyone?
> >
> > /Z
>
> It's not clear whether or not these issues are related. I see three
> things in this e-mail chain:
> 1) bdev() _aio_thread with EPERM, as in the subject of this e-mail chain
> 2) bdev() _aio_thread with the I/O error condition (see [1], which is
> a *slightly* different if/else switch than the EPERM)
> 3) The tracker, which seems related to BlueFS (?):
> https://tracker.ceph.com/issues/53906
>
> Shooting in the dark a bit... but for the EPERM issue you mention, maybe
> try raising the number of file descriptors available to the
> process/container/system? Not sure why else would you get EPERM in
> this context.
>
> On 2) I have definitely seen that happen before and it's always
> matched with an I/O error reported by the kernel in `dmesg` output.
> Sometimes the drive keeps going along fine after a restart of the
> OSD/reboot of the system, sometimes not.
>
> Cheers,
> Tyler
>
> >
> > On Mon, 4 Dec 2023 at 12:52, Zakhar Kirpichenko 
> wrote:
> >
> > > Hi,
> > >
> > > Just to reiterate, I'm referring to an OSD crash loop because of the
> > > following error:
> > >
> > > "2023-12-03T04:00:36.686+ 7f08520e2700 -1 bdev(0x55f02a28a400
> > > /var/lib/ceph/osd/ceph-56/block) _aio_thread got r=-1 ((1) Operation
> not
> > > permitted)". More relevant log entries: https://pastebin.com/gDat6rfk
> > >
> > > The crash log suggested that there could be a hardware issue but there
> was
> > > none, I was able to access the block device for testing purposes
> without
> > > any issues, and the problem went away after I rebooted the host, this
> OSD
> > > is currently operating without any issues under load.
> > >
> > > Any ideas?
> > >
> > > /Z
> > >
> > > On Sun, 3 Dec 2023 at 16:09, Zakhar Kirpichenko 
> wrote:
> > >
> > >> Thanks! The bug I referenced is the reason for the 1st OSD crash, but
> not
> > >> for the subsequent crashes. The reason for those is described where
> you
> > >> . I'm asking for help with that one.
> > >>
> > >> /Z
> > >>
> > >> On Sun, 3 Dec 2023 at 15:31, Kai Stian Olstad 
> > >> wrote:
> > >>
> > >>> On Sun, Dec 03, 2023 at 06:53:08AM +0200, Zakhar Kirpichenko wrote:
> > >>> >One of our 16.2.14 cluster OSDs crashed again because of the dreaded
> > >>> >https://tracker.ceph.com/issues/53906 bug.
> > >>>
> > >>> 
> > >>>
> > >>> >It would be good to understand what has triggered this condition and
> > >>> how it
> > >>> >can be resolved without rebooting the whole host. I would very much
> > >>> >appreciate any suggestions.
> > >>>
> > >>> If you look closely at 53906 you'll see it's a duplicate of
> > >>> https://tracker.ceph.com/issues/53907
> > >>>
> > >>> In there you have the fix and a workaround until next minor is
> released.
> > >>>
> > >>> --
> > >>> Kai Stian Olstad
> > >>>
> > >>
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: osd crash, bdev() _aio_thread got r=-1 ((1) Operation not permitted)

2023-12-05 Thread Zakhar Kirpichenko
Any input from anyone?

/Z

On Mon, 4 Dec 2023 at 12:52, Zakhar Kirpichenko  wrote:

> Hi,
>
> Just to reiterate, I'm referring to an OSD crash loop because of the
> following error:
>
> "2023-12-03T04:00:36.686+ 7f08520e2700 -1 bdev(0x55f02a28a400
> /var/lib/ceph/osd/ceph-56/block) _aio_thread got r=-1 ((1) Operation not
> permitted)". More relevant log entries: https://pastebin.com/gDat6rfk
>
> The crash log suggested that there could be a hardware issue but there was
> none, I was able to access the block device for testing purposes without
> any issues, and the problem went away after I rebooted the host, this OSD
> is currently operating without any issues under load.
>
> Any ideas?
>
> /Z
>
> On Sun, 3 Dec 2023 at 16:09, Zakhar Kirpichenko  wrote:
>
>> Thanks! The bug I referenced is the reason for the 1st OSD crash, but not
>> for the subsequent crashes. The reason for those is described where you
>> snipped. I'm asking for help with that one.
>>
>> /Z
>>
>> On Sun, 3 Dec 2023 at 15:31, Kai Stian Olstad 
>> wrote:
>>
>>> On Sun, Dec 03, 2023 at 06:53:08AM +0200, Zakhar Kirpichenko wrote:
>>> >One of our 16.2.14 cluster OSDs crashed again because of the dreaded
>>> >https://tracker.ceph.com/issues/53906 bug.
>>>
>>> 
>>>
>>> >It would be good to understand what has triggered this condition and
>>> how it
>>> >can be resolved without rebooting the whole host. I would very much
>>> >appreciate any suggestions.
>>>
>>> If you look closely at 53906 you'll see it's a duplicate of
>>> https://tracker.ceph.com/issues/53907
>>>
>>> In there you have the fix and a workaround until next minor is released.
>>>
>>> --
>>> Kai Stian Olstad
>>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: osd crash, bdev() _aio_thread got r=-1 ((1) Operation not permitted)

2023-12-04 Thread Zakhar Kirpichenko
Hi,

Just to reiterate, I'm referring to an OSD crash loop because of the
following error:

"2023-12-03T04:00:36.686+ 7f08520e2700 -1 bdev(0x55f02a28a400
/var/lib/ceph/osd/ceph-56/block) _aio_thread got r=-1 ((1) Operation not
permitted)". More relevant log entries: https://pastebin.com/gDat6rfk

The crash log suggested that there could be a hardware issue but there was
none, I was able to access the block device for testing purposes without
any issues, and the problem went away after I rebooted the host, this OSD
is currently operating without any issues under load.

Any ideas?

/Z

On Sun, 3 Dec 2023 at 16:09, Zakhar Kirpichenko  wrote:

> Thanks! The bug I referenced is the reason for the 1st OSD crash, but not
> for the subsequent crashes. The reason for those is described where you
> snipped. I'm asking for help with that one.
>
> /Z
>
> On Sun, 3 Dec 2023 at 15:31, Kai Stian Olstad 
> wrote:
>
>> On Sun, Dec 03, 2023 at 06:53:08AM +0200, Zakhar Kirpichenko wrote:
>> >One of our 16.2.14 cluster OSDs crashed again because of the dreaded
>> >https://tracker.ceph.com/issues/53906 bug.
>>
>> 
>>
>> >It would be good to understand what has triggered this condition and how
>> it
>> >can be resolved without rebooting the whole host. I would very much
>> >appreciate any suggestions.
>>
>> If you look closely at 53906 you'll see it's a duplicate of
>> https://tracker.ceph.com/issues/53907
>>
>> In there you have the fix and a workaround until next minor is released.
>>
>> --
>> Kai Stian Olstad
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: osd crash, bdev() _aio_thread got r=-1 ((1) Operation not permitted)

2023-12-03 Thread Zakhar Kirpichenko
Thanks! The bug I referenced is the reason for the 1st OSD crash, but not
for the subsequent crashes. The reason for those is described where you
snipped. I'm asking for help with that one.

/Z

On Sun, 3 Dec 2023 at 15:31, Kai Stian Olstad  wrote:

> On Sun, Dec 03, 2023 at 06:53:08AM +0200, Zakhar Kirpichenko wrote:
> >One of our 16.2.14 cluster OSDs crashed again because of the dreaded
> >https://tracker.ceph.com/issues/53906 bug.
>
> 
>
> >It would be good to understand what has triggered this condition and how
> it
> >can be resolved without rebooting the whole host. I would very much
> >appreciate any suggestions.
>
> If you look closely at 53906 you'll see it's a duplicate of
> https://tracker.ceph.com/issues/53907
>
> In there you have the fix and a workaround until next minor is released.
>
> --
> Kai Stian Olstad
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph 16.2.14: osd crash, bdev() _aio_thread got r=-1 ((1) Operation not permitted)

2023-12-02 Thread Zakhar Kirpichenko
Hi,

One of our 16.2.14 cluster OSDs crashed again because of the dreaded
https://tracker.ceph.com/issues/53906 bug. Usually an OSD, which crashed
because of this bug, restarts within seconds and continues normal
operation. This time it failed to restart and kept crashing:

"assert_condition": "abort",
"assert_file":
"/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.14/rpm/el8/BUILD/ceph-16.2.14/src/blk/kernel/KernelDevice.cc",
"assert_func": "void KernelDevice::_aio_thread()",
"assert_line": 604,
"assert_msg":
"/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.14/rpm/el8/BUILD/ceph-16.2.14/src/blk/kernel/KernelDevice.cc:
In function 'void KernelDevice::_aio_thread()' thread 7f08520e2700 time
2023-12-03T04:00:36.689614+\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.14/rpm/el8/BUILD/ceph-16.2.14/src/blk/kernel/KernelDevice.cc:
604: ceph_abort_msg(\"Unexpected IO error. This may suggest HW issue.
Please check your dmesg!\")\n",
"assert_thread_name": "bstore_aio",
"backtrace": [
"/lib64/libpthread.so.0(+0x12cf0) [0x7f085e308cf0]",
"gsignal()",
"abort()",
"(ceph::__ceph_abort(char const*, int, char const*,
std::__cxx11::basic_string,
std::allocator > const&)+0x1b6) [0x55f01d9494cb]",
"(KernelDevice::_aio_thread()+0x1285) [0x55f01e4b5c15]",
"(KernelDevice::AioCompletionThread::entry()+0x11)
[0x55f01e4c0ee1]",
"/lib64/libpthread.so.0(+0x81ca) [0x7f085e2fe1ca]",
"clone()"
],

There was nothing in dmesg though and the block device looked healthy. I
took the OSD down, ran a long SMART test on its block drive, ran a read
test on the drive and found no issues. I tried restarting the OSD again and
found in its debug that it failed because of an
"2023-12-03T04:00:36.686+ 7f08520e2700 -1 bdev(0x55f02a28a400
/var/lib/ceph/osd/ceph-56/block) _aio_thread got r=-1 ((1) Operation not
permitted)" error: https://pastebin.com/gDat6rfk

I remember hitting this previously:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/GYL72G3F4PPCSWG5STQ7WLUXTNNI676S/,
and this time a host reboot completely resolved the issue.

It would be good to understand what has triggered this condition and how it
can be resolved without rebooting the whole host. I would very much
appreciate any suggestions.

Best regards,
Zakhar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-24 Thread Zakhar Kirpichenko
Hi,

A small update: after disabling 'progress' module the active mgr (on
ceph01) used up ~1.3 GB of memory in 3 days, which was expected:

mgr.ceph01.vankui ceph01  *:8443,9283  running (3d)  9m ago   2y
 1284M-  16.2.14  fc0182d6cda5  3451f8c6c07e
mgr.ceph02.shsinf ceph02  *:8443,9283  running (3d)  9m ago   7M
  374M-  16.2.14  fc0182d6cda5  1c3d2d83b6df

The cluster is healthy and operating normally. The mgr process is growing
slowly, at roughly about 1-2 MB per 10 minutes give or take, which is not
quick enough to balloon to over 100 GB RSS over several days, which likely
means that whatever triggers the issue happens randomly and quite suddenly.
I'll continue monitoring the mgr and get back with more observations.
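
A trivial way to keep an eye on it in the meantime (a sketch; assumes the
host's 'ps' can see the containerized mgr process, and the log path is
arbitrary):

  while true; do date; ps -o pid,rss,vsz,cmd -C ceph-mgr; sleep 600; done >> /var/log/mgr-rss.log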

/Z

On Wed, 22 Nov 2023 at 16:33, Zakhar Kirpichenko  wrote:

> Thanks for this. This looks similar to what we're observing. Although we
> don't use the API apart from the usage by Ceph deployment itself - which I
> guess still counts.
>
> /Z
>
> On Wed, 22 Nov 2023, 15:22 Adrien Georget, 
> wrote:
>
>> Hi,
>>
>> This memory leak with ceph-mgr seems to be due to a change in Ceph
>> 16.2.12.
>> Check this issue : https://tracker.ceph.com/issues/59580
>> We are also affected by this, with or without containerized services.
>>
>> Cheers,
>> Adrien
>>
>> Le 22/11/2023 à 14:14, Eugen Block a écrit :
>> > One other difference is you use docker, right? We use podman, could it
>> > be some docker restriction?
>> >
>> > Zitat von Zakhar Kirpichenko :
>> >
>> >> It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node has
>> >> 384
>> >> GB of RAM, each OSD has a memory target of 16 GB, about 100 GB of
>> >> memory,
>> >> give or take, is available (mostly used by page cache) on each node
>> >> during
>> >> normal operation. Nothing unusual there, tbh.
>> >>
>> >> No unusual mgr modules or settings either, except for disabled
>> progress:
>> >>
>> >> {
>> >> "always_on_modules": [
>> >> "balancer",
>> >> "crash",
>> >> "devicehealth",
>> >> "orchestrator",
>> >> "pg_autoscaler",
>> >> "progress",
>> >> "rbd_support",
>> >> "status",
>> >> "telemetry",
>> >> "volumes"
>> >> ],
>> >> "enabled_modules": [
>> >> "cephadm",
>> >> "dashboard",
>> >> "iostat",
>> >> "prometheus",
>> >> "restful"
>> >> ],
>> >>
>> >> /Z
>> >>
>> >> On Wed, 22 Nov 2023, 14:52 Eugen Block,  wrote:
>> >>
>> >>> What does your hardware look like memory-wise? Just for comparison,
>> >>> one customer cluster has 4,5 GB in use (middle-sized cluster for
>> >>> openstack, 280 OSDs):
>> >>>
>> >>>  PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
>> >>> COMMAND
>> >>> 6077 ceph  20   0 6357560 4,522g  22316 S 12,00 1,797
>> >>> 57022:54 ceph-mgr
>> >>>
>> >>> In our own cluster (smaller than that and not really heavily used) the
>> >>> mgr uses almost 2 GB. So those numbers you have seem relatively small.
>> >>>
>> >>> Zitat von Zakhar Kirpichenko :
>> >>>
>> >>> > I've disabled the progress module entirely and will see how it goes.
>> >>> > Otherwise, mgr memory usage keeps increasing slowly, from past
>> >>> experience
>> >>> > it will stabilize at around 1.5-1.6 GB. Other than this event
>> >>> warning,
>> >>> it's
>> >>> > unclear what could have caused random memory ballooning.
>> >>> >
>> >>> > /Z
>> >>> >
>> >>> > On Wed, 22 Nov 2023 at 13:07, Eugen Block  wrote:
>> >>> >
>> >>> >> I see these progress messages all the time, I don't think they
>> cause
>> >>> >> it, but I might be wrong. You can disable it just to rule that out.
>> >>> >>
>> >>> >> Zitat von Zakhar Kirpichenko :
>> >>> >

[ceph-users] Re: cephadm vs ceph.conf

2023-11-23 Thread Zakhar Kirpichenko
Hi,

Please note that there are cases where the use of ceph.conf inside a
container is justified. For example, I was unable to set the monitor's
mon_rocksdb_options by any means except for providing them in the monitor's own
ceph.conf within the container; all other attempts to pass these settings
were ignored by the monitor.
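
For anyone needing the same workaround, it looks roughly like this in a
cephadm layout (fsid, hostname and the option value are placeholders; the
per-daemon 'config' file is what gets bind-mounted into the container as its
ceph.conf):

  # edit /var/lib/ceph/<fsid>/mon.<host>/config on the mon host and add, e.g.:
  #   [mon]
  #       mon_rocksdb_options = <desired RocksDB option string>
  # then restart the monitor so it re-reads the file:
  systemctl restart ceph-<fsid>@mon.<host>.service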

/Z

On Thu, 23 Nov 2023 at 16:53, Albert Shih  wrote:

> Le 23/11/2023 à 15:35:25+0100, Michel Jouvin a écrit
> Hi,
>
> >
> > You should never edit any file in the containers, cephadm takes care of
> it.
> > Most of the parameters described in the doc you mentioned are better
> managed
> > with "ceph config" command in the Ceph configuration database. If you
> want
> > to run the ceph commnand on a Ceph machine outside a container, you can
> add
>
> Ok. Of course I'll not touch anything inside any container, I juste check
> the overlay to see if the container use this file.
>
> It's just I see in lot of place in the documentation some configuration
> to put in the /etc/ceph/ceph.conf
>
> > the label _admin to your host in "ceph orch host" so that cephadm takes
> care
> > of maintaining your /etc/ceph.conf (outside the container).
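
For reference, the quoted advice maps to commands along these lines (host
name and the example setting are placeholders):

  ceph orch host label add <host> _admin    # cephadm then maintains /etc/ceph/ceph.conf on that host
  ceph config set osd osd_max_backfills 2   # example of a setting kept in the config database instead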
>
> Ok. I'm indeed using ceph orch & Cie.
>
> Thanks.
>
> Regards.
>
> JAS
> --
> Albert SHIH 嶺 
> Observatoire de Paris
> France
> Heure locale/Local time:
> jeu. 23 nov. 2023 15:48:36 CET
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Zakhar Kirpichenko
Thanks for this. This looks similar to what we're observing. Although we
don't use the API apart from the usage by Ceph deployment itself - which I
guess still counts.

/Z

On Wed, 22 Nov 2023, 15:22 Adrien Georget, 
wrote:

> Hi,
>
> This memory leak with ceph-mgr seems to be due to a change in Ceph 16.2.12.
> Check this issue : https://tracker.ceph.com/issues/59580
> We are also affected by this, with or without containerized services.
>
> Cheers,
> Adrien
>
> Le 22/11/2023 à 14:14, Eugen Block a écrit :
> > One other difference is you use docker, right? We use podman, could it
> > be some docker restriction?
> >
> > Zitat von Zakhar Kirpichenko :
> >
> >> It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node has
> >> 384
> >> GB of RAM, each OSD has a memory target of 16 GB, about 100 GB of
> >> memory,
> >> give or take, is available (mostly used by page cache) on each node
> >> during
> >> normal operation. Nothing unusual there, tbh.
> >>
> >> No unusual mgr modules or settings either, except for disabled progress:
> >>
> >> {
> >> "always_on_modules": [
> >> "balancer",
> >> "crash",
> >> "devicehealth",
> >> "orchestrator",
> >> "pg_autoscaler",
> >> "progress",
> >> "rbd_support",
> >> "status",
> >> "telemetry",
> >> "volumes"
> >> ],
> >> "enabled_modules": [
> >> "cephadm",
> >> "dashboard",
> >> "iostat",
> >> "prometheus",
> >> "restful"
> >> ],
> >>
> >> /Z
> >>
> >> On Wed, 22 Nov 2023, 14:52 Eugen Block,  wrote:
> >>
> >>> What does your hardware look like memory-wise? Just for comparison,
> >>> one customer cluster has 4,5 GB in use (middle-sized cluster for
> >>> openstack, 280 OSDs):
> >>>
> >>>  PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
> >>> COMMAND
> >>> 6077 ceph  20   0 6357560 4,522g  22316 S 12,00 1,797
> >>> 57022:54 ceph-mgr
> >>>
> >>> In our own cluster (smaller than that and not really heavily used) the
> >>> mgr uses almost 2 GB. So those numbers you have seem relatively small.
> >>>
> >>> Zitat von Zakhar Kirpichenko :
> >>>
> >>> > I've disabled the progress module entirely and will see how it goes.
> >>> > Otherwise, mgr memory usage keeps increasing slowly, from past
> >>> experience
> >>> > it will stabilize at around 1.5-1.6 GB. Other than this event
> >>> warning,
> >>> it's
> >>> > unclear what could have caused random memory ballooning.
> >>> >
> >>> > /Z
> >>> >
> >>> > On Wed, 22 Nov 2023 at 13:07, Eugen Block  wrote:
> >>> >
> >>> >> I see these progress messages all the time, I don't think they cause
> >>> >> it, but I might be wrong. You can disable it just to rule that out.
> >>> >>
> >>> >> Zitat von Zakhar Kirpichenko :
> >>> >>
> >>> >> > Unfortunately, I don't have a full stack trace because there's no
> >>> crash
> >>> >> > when the mgr gets oom-killed. There's just the mgr log, which
> >>> looks
> >>> >> > completely normal until about 2-3 minutes before the oom-kill,
> >>> when
> >>> >> > tmalloc warnings show up.
> >>> >> >
> >>> >> > I'm not sure that it's the same issue that is described in the
> >>> tracker.
> >>> >> We
> >>> >> > seem to have some stale "events" in the progress module though:
> >>> >> >
> >>> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> >>> 2023-11-21T14:56:30.718+
> >>> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >>> >> > cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist
> >>> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> >>> 2023-11-21T14:56:30.718+
> >>> >> > 7f4bb19ef700

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Zakhar Kirpichenko
Yes, we use docker, though we haven't had any issues because of it. I don't
think that docker itself can cause mgr memory leaks.

/Z

On Wed, 22 Nov 2023, 15:14 Eugen Block,  wrote:

> One other difference is you use docker, right? We use podman, could it
> be some docker restriction?
>
> Zitat von Zakhar Kirpichenko :
>
> > It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node has 384
> > GB of RAM, each OSD has a memory target of 16 GB, about 100 GB of memory,
> > give or take, is available (mostly used by page cache) on each node
> during
> > normal operation. Nothing unusual there, tbh.
> >
> > No unusual mgr modules or settings either, except for disabled progress:
> >
> > {
> > "always_on_modules": [
> > "balancer",
> > "crash",
> > "devicehealth",
> > "orchestrator",
> > "pg_autoscaler",
> > "progress",
> > "rbd_support",
> > "status",
> > "telemetry",
> > "volumes"
> > ],
> > "enabled_modules": [
> > "cephadm",
> > "dashboard",
> > "iostat",
> > "prometheus",
> > "restful"
> > ],
> >
> > /Z
> >
> > On Wed, 22 Nov 2023, 14:52 Eugen Block,  wrote:
> >
> >> What does your hardware look like memory-wise? Just for comparison,
> >> one customer cluster has 4,5 GB in use (middle-sized cluster for
> >> openstack, 280 OSDs):
> >>
> >>  PID USER  PR  NIVIRTRESSHR S  %CPU  %MEM TIME+
> >> COMMAND
> >> 6077 ceph  20   0 6357560 4,522g  22316 S 12,00 1,797
> >> 57022:54 ceph-mgr
> >>
> >> In our own cluster (smaller than that and not really heavily used) the
> >> mgr uses almost 2 GB. So those numbers you have seem relatively small.
> >>
> >> Zitat von Zakhar Kirpichenko :
> >>
> >> > I've disabled the progress module entirely and will see how it goes.
> >> > Otherwise, mgr memory usage keeps increasing slowly, from past
> experience
> >> > it will stabilize at around 1.5-1.6 GB. Other than this event warning,
> >> it's
> >> > unclear what could have caused random memory ballooning.
> >> >
> >> > /Z
> >> >
> >> > On Wed, 22 Nov 2023 at 13:07, Eugen Block  wrote:
> >> >
> >> >> I see these progress messages all the time, I don't think they cause
> >> >> it, but I might be wrong. You can disable it just to rule that out.
> >> >>
> >> >> Zitat von Zakhar Kirpichenko :
> >> >>
> >> >> > Unfortunately, I don't have a full stack trace because there's no
> >> crash
> >> >> > when the mgr gets oom-killed. There's just the mgr log, which looks
> >> >> > completely normal until about 2-3 minutes before the oom-kill, when
> >> >> > tmalloc warnings show up.
> >> >> >
> >> >> > I'm not sure that it's the same issue that is described in the
> >> tracker.
> >> >> We
> >> >> > seem to have some stale "events" in the progress module though:
> >> >> >
> >> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> >> 2023-11-21T14:56:30.718+
> >> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> >> > cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist
> >> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> >> 2023-11-21T14:56:30.718+
> >> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> >> > 44824331-3f6b-45c4-b925-423d098c3c76 does not exist
> >> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> >> 2023-11-21T14:56:30.718+
> >> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> >> > 0139bc54-ae42-4483-b278-851d77f23f9f does not exist
> >> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> >> 2023-11-21T14:56:30.718+
> >> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> >> > f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist
> >> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> >> 2023-11-21T14:56:30.718+
> >> >> > 7f4

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Zakhar Kirpichenko
It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node has 384
GB of RAM, each OSD has a memory target of 16 GB, about 100 GB of memory,
give or take, is available (mostly used by page cache) on each node during
normal operation. Nothing unusual there, tbh.

No unusual mgr modules or settings either, except for disabled progress:

{
"always_on_modules": [
"balancer",
"crash",
"devicehealth",
"orchestrator",
"pg_autoscaler",
"progress",
"rbd_support",
"status",
"telemetry",
"volumes"
],
"enabled_modules": [
"cephadm",
"dashboard",
"iostat",
"prometheus",
"restful"
],

/Z

On Wed, 22 Nov 2023, 14:52 Eugen Block,  wrote:

> What does your hardware look like memory-wise? Just for comparison,
> one customer cluster has 4,5 GB in use (middle-sized cluster for
> openstack, 280 OSDs):
>
>  PID USER  PR  NIVIRTRESSHR S  %CPU  %MEM TIME+
> COMMAND
> 6077 ceph  20   0 6357560 4,522g  22316 S 12,00 1,797
> 57022:54 ceph-mgr
>
> In our own cluster (smaller than that and not really heavily used) the
> mgr uses almost 2 GB. So those numbers you have seem relatively small.
>
> Zitat von Zakhar Kirpichenko :
>
> > I've disabled the progress module entirely and will see how it goes.
> > Otherwise, mgr memory usage keeps increasing slowly, from past experience
> > it will stabilize at around 1.5-1.6 GB. Other than this event warning,
> it's
> > unclear what could have caused random memory ballooning.
> >
> > /Z
> >
> > On Wed, 22 Nov 2023 at 13:07, Eugen Block  wrote:
> >
> >> I see these progress messages all the time, I don't think they cause
> >> it, but I might be wrong. You can disable it just to rule that out.
> >>
> >> Zitat von Zakhar Kirpichenko :
> >>
> >> > Unfortunately, I don't have a full stack trace because there's no
> crash
> >> > when the mgr gets oom-killed. There's just the mgr log, which looks
> >> > completely normal until about 2-3 minutes before the oom-kill, when
> >> > tmalloc warnings show up.
> >> >
> >> > I'm not sure that it's the same issue that is described in the
> tracker.
> >> We
> >> > seem to have some stale "events" in the progress module though:
> >> >
> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> 2023-11-21T14:56:30.718+
> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> > cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist
> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> 2023-11-21T14:56:30.718+
> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> > 44824331-3f6b-45c4-b925-423d098c3c76 does not exist
> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> 2023-11-21T14:56:30.718+
> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> > 0139bc54-ae42-4483-b278-851d77f23f9f does not exist
> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> 2023-11-21T14:56:30.718+
> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> > f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist
> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> 2023-11-21T14:56:30.718+
> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> > 1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist
> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> 2023-11-21T14:56:30.718+
> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> > 7f14d01c-498c-413f-b2ef-05521050190a does not exist
> >> > Nov 21 14:57:35 ceph01 bash[3941523]: debug
> 2023-11-21T14:57:35.950+
> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> > 48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist
> >> >
> >> > I tried clearing them but they keep showing up. I am wondering if
> these
> >> > missing events can cause memory leaks over time.
> >> >
> >> > /Z
> >> >
> >> > On Wed, 22 Nov 2023 at 11:12, Eugen Block  wrote:
> >> >
> >> >> Do you have the full stack trace? The pastebin only contains the
> >> >> "tcmalloc: large alloc" messages (same as in the tracker issue).
> Maybe
> >> >> comment in the tracker issue directly 

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Zakhar Kirpichenko
I've disabled the progress module entirely and will see how it goes.
Otherwise, mgr memory usage keeps increasing slowly, from past experience
it will stabilize at around 1.5-1.6 GB. Other than this event warning, it's
unclear what could have caused random memory ballooning.
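
For anyone wanting to try the same, the progress module can typically be
toggled like this (a sketch; subcommand availability may vary slightly
between releases, since progress is an always-on module):

  ceph progress off     # stop the progress module from generating/tracking events
  ceph progress clear   # drop any existing progress events
  ceph progress on      # re-enable later if desired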

/Z

On Wed, 22 Nov 2023 at 13:07, Eugen Block  wrote:

> I see these progress messages all the time, I don't think they cause
> it, but I might be wrong. You can disable it just to rule that out.
>
> Zitat von Zakhar Kirpichenko :
>
> > Unfortunately, I don't have a full stack trace because there's no crash
> > when the mgr gets oom-killed. There's just the mgr log, which looks
> > completely normal until about 2-3 minutes before the oom-kill, when
> > tmalloc warnings show up.
> >
> > I'm not sure that it's the same issue that is described in the tracker.
> We
> > seem to have some stale "events" in the progress module though:
> >
> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> > cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist
> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> > 44824331-3f6b-45c4-b925-423d098c3c76 does not exist
> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> > 0139bc54-ae42-4483-b278-851d77f23f9f does not exist
> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> > f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist
> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> > 1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist
> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> > 7f14d01c-498c-413f-b2ef-05521050190a does not exist
> > Nov 21 14:57:35 ceph01 bash[3941523]: debug 2023-11-21T14:57:35.950+
> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> > 48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist
> >
> > I tried clearing them but they keep showing up. I am wondering if these
> > missing events can cause memory leaks over time.
> >
> > /Z
> >
> > On Wed, 22 Nov 2023 at 11:12, Eugen Block  wrote:
> >
> >> Do you have the full stack trace? The pastebin only contains the
> >> "tcmalloc: large alloc" messages (same as in the tracker issue). Maybe
> >> comment in the tracker issue directly since Radek asked for someone
> >> with a similar problem in a newer release.
> >>
> >> Zitat von Zakhar Kirpichenko :
> >>
> >> > Thanks, Eugen. It is similar in the sense that the mgr is getting
> >> > OOM-killed.
> >> >
> >> > It started happening in our cluster after the upgrade to 16.2.14. We
> >> > haven't had this issue with earlier Pacific releases.
> >> >
> >> > /Z
> >> >
> >> > On Tue, 21 Nov 2023, 21:53 Eugen Block,  wrote:
> >> >
> >> >> Just checking it on the phone, but isn’t this quite similar?
> >> >>
> >> >> https://tracker.ceph.com/issues/45136
> >> >>
> >> >> Zitat von Zakhar Kirpichenko :
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > I'm facing a rather new issue with our Ceph cluster: from time to
> time
> >> >> > ceph-mgr on one of the two mgr nodes gets oom-killed after
> consuming
> >> over
> >> >> > 100 GB RAM:
> >> >> >
> >> >> > [Nov21 15:02] tp_osd_tp invoked oom-killer:
> >> >> > gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
> >> >> > [  +0.10]  oom_kill_process.cold+0xb/0x10
> >> >> > [  +0.02] [  pid  ]   uid  tgid total_vm  rss
> pgtables_bytes
> >> >> > swapents oom_score_adj name
> >> >> > [  +0.08]
> >> >> >
> >> >>
> >>
> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167
> >> >> > [  +0.000697] Out of me

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Zakhar Kirpichenko
Unfortunately, I don't have a full stack trace because there's no crash
when the mgr gets oom-killed. There's just the mgr log, which looks
completely normal until about 2-3 minutes before the oom-kill, when
tcmalloc warnings show up.

I'm not sure that it's the same issue that is described in the tracker. We
seem to have some stale "events" in the progress module though:

Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
7f4bb19ef700  0 [progress WARNING root] complete: ev
cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist
Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
7f4bb19ef700  0 [progress WARNING root] complete: ev
44824331-3f6b-45c4-b925-423d098c3c76 does not exist
Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
7f4bb19ef700  0 [progress WARNING root] complete: ev
0139bc54-ae42-4483-b278-851d77f23f9f does not exist
Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
7f4bb19ef700  0 [progress WARNING root] complete: ev
f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist
Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
7f4bb19ef700  0 [progress WARNING root] complete: ev
1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist
Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
7f4bb19ef700  0 [progress WARNING root] complete: ev
7f14d01c-498c-413f-b2ef-05521050190a does not exist
Nov 21 14:57:35 ceph01 bash[3941523]: debug 2023-11-21T14:57:35.950+
7f4bb19ef700  0 [progress WARNING root] complete: ev
48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist

I tried clearing them but they keep showing up. I am wondering if these
missing events can cause memory leaks over time.

/Z

On Wed, 22 Nov 2023 at 11:12, Eugen Block  wrote:

> Do you have the full stack trace? The pastebin only contains the
> "tcmalloc: large alloc" messages (same as in the tracker issue). Maybe
> comment in the tracker issue directly since Radek asked for someone
> with a similar problem in a newer release.
>
> Zitat von Zakhar Kirpichenko :
>
> > Thanks, Eugen. It is similar in the sense that the mgr is getting
> > OOM-killed.
> >
> > It started happening in our cluster after the upgrade to 16.2.14. We
> > haven't had this issue with earlier Pacific releases.
> >
> > /Z
> >
> > On Tue, 21 Nov 2023, 21:53 Eugen Block,  wrote:
> >
> >> Just checking it on the phone, but isn’t this quite similar?
> >>
> >> https://tracker.ceph.com/issues/45136
> >>
> >> Zitat von Zakhar Kirpichenko :
> >>
> >> > Hi,
> >> >
> >> > I'm facing a rather new issue with our Ceph cluster: from time to time
> >> > ceph-mgr on one of the two mgr nodes gets oom-killed after consuming
> over
> >> > 100 GB RAM:
> >> >
> >> > [Nov21 15:02] tp_osd_tp invoked oom-killer:
> >> > gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
> >> > [  +0.10]  oom_kill_process.cold+0xb/0x10
> >> > [  +0.02] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes
> >> > swapents oom_score_adj name
> >> > [  +0.08]
> >> >
> >>
> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167
> >> > [  +0.000697] Out of memory: Killed process 3941610 (ceph-mgr)
> >> > total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB,
> shmem-rss:0kB,
> >> > UID:167 pgtables:260356kB oom_score_adj:0
> >> > [  +6.509769] oom_reaper: reaped process 3941610 (ceph-mgr), now
> >> > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> >> >
> >> > The cluster is stable and operating normally, there's nothing unusual
> >> going
> >> > on before, during or after the kill, thus it's unclear what causes the
> >> mgr
> >> > to balloon, use all RAM and get killed. Systemd logs aren't very
> helpful:
> >> > they just show normal mgr operations until it fails to allocate memory
> >> and
> >> > gets killed: https://pastebin.com/MLyw9iVi
> >> >
> >> > The mgr experienced this issue several times in the last 2 months, and
> >> the
> >> > events don't appear to correlate with any other events in the cluster
> >> > because basically nothing else happened at around those times. How
> can I
> >> > investigate this and figure out what's causing the mgr to consume all
> >> > memory and get kill

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-21 Thread Zakhar Kirpichenko
Thanks, Eugen. It is similar in the sense that the mgr is getting
OOM-killed.

It started happening in our cluster after the upgrade to 16.2.14. We
haven't had this issue with earlier Pacific releases.

/Z

On Tue, 21 Nov 2023, 21:53 Eugen Block,  wrote:

> Just checking it on the phone, but isn’t this quite similar?
>
> https://tracker.ceph.com/issues/45136
>
> Zitat von Zakhar Kirpichenko :
>
> > Hi,
> >
> > I'm facing a rather new issue with our Ceph cluster: from time to time
> > ceph-mgr on one of the two mgr nodes gets oom-killed after consuming over
> > 100 GB RAM:
> >
> > [Nov21 15:02] tp_osd_tp invoked oom-killer:
> > gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
> > [  +0.10]  oom_kill_process.cold+0xb/0x10
> > [  +0.02] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes
> > swapents oom_score_adj name
> > [  +0.08]
> >
> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167
> > [  +0.000697] Out of memory: Killed process 3941610 (ceph-mgr)
> > total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB, shmem-rss:0kB,
> > UID:167 pgtables:260356kB oom_score_adj:0
> > [  +6.509769] oom_reaper: reaped process 3941610 (ceph-mgr), now
> > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> >
> > The cluster is stable and operating normally, there's nothing unusual
> going
> > on before, during or after the kill, thus it's unclear what causes the
> mgr
> > to balloon, use all RAM and get killed. Systemd logs aren't very helpful:
> > they just show normal mgr operations until it fails to allocate memory
> and
> > gets killed: https://pastebin.com/MLyw9iVi
> >
> > The mgr experienced this issue several times in the last 2 months, and
> the
> > events don't appear to correlate with any other events in the cluster
> > because basically nothing else happened at around those times. How can I
> > investigate this and figure out what's causing the mgr to consume all
> > memory and get killed?
> >
> > I would very much appreciate any advice!
> >
> > Best regards,
> > Zakhar
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-21 Thread Zakhar Kirpichenko
Hi,

I'm facing a rather new issue with our Ceph cluster: from time to time
ceph-mgr on one of the two mgr nodes gets oom-killed after consuming over
100 GB RAM:

[Nov21 15:02] tp_osd_tp invoked oom-killer:
gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[  +0.10]  oom_kill_process.cold+0xb/0x10
[  +0.02] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes
swapents oom_score_adj name
[  +0.08]
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167
[  +0.000697] Out of memory: Killed process 3941610 (ceph-mgr)
total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB, shmem-rss:0kB,
UID:167 pgtables:260356kB oom_score_adj:0
[  +6.509769] oom_reaper: reaped process 3941610 (ceph-mgr), now
anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

The cluster is stable and operating normally, there's nothing unusual going
on before, during or after the kill, thus it's unclear what causes the mgr
to balloon, use all RAM and get killed. Systemd logs aren't very helpful:
they just show normal mgr operations until it fails to allocate memory and
gets killed: https://pastebin.com/MLyw9iVi

The mgr experienced this issue several times in the last 2 months, and the
events don't appear to correlate with any other events in the cluster
because basically nothing else happened at around those times. How can I
investigate this and figure out what's causing the mgr to consume all
memory and get killed?

I would very much appreciate any advice!

Best regards,
Zakhar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [CEPH] OSD Memory Usage

2023-11-16 Thread Zakhar Kirpichenko
ceph orch ps seems to show the virtual memory size (VSZ) rather than the resident set size (RSS).
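
To compare the two on an OSD host, something like this should do (a rough sketch; it assumes the ps utility on the host and that the OSD processes are visible there):

ceph orch ps | grep '^osd'               # MEM USE as reported by the orchestrator
ps -o pid,vsz,rss,cmd -C ceph-osd        # VSZ vs RSS as the kernel sees them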

/Z

On Thu, 16 Nov 2023 at 09:43, Nguyễn Hữu Khôi 
wrote:

> Hello,
> Yes, I see it does not exceed RSS, but in "ceph orch ps" it shows as over the
> target. Does MEM USE include cache? Am I right?
>
> NAMEHOST  PORTSSTATUS REFRESHED
>  AGE  MEM USE  MEM LIM  VERSIONIMAGE ID  CONTAINER ID
>
> osd.7   sg-osd01   running (3d)  8m ago
> 4w4231M4096M  17.2.6 90a2664234e1  922185643cb8
> osd.8   sg-osd03   running (3d)  7m ago
> 4w3407M4096M  17.2.6 90a2664234e1  0ec74fe54bbe
> osd.9   sg-osd01   running (3d)  8m ago
> 4w4575M4096M  17.2.6 90a2664234e1  c2f1c1ee2087
> osd.10  sg-osd03   running (3d)  7m ago
> 4w3821M4096M  17.2.6 90a2664234e1  fecbd5e910de
> osd.11  sg-osd01   running (3d)  8m ago
> 4w3578M4096M  17.2.6 90a2664234e1  f201704e9026
> osd.12  sg-osd03   running (3d)  7m ago
> 4w3076M4096M  17.2.6 90a2664234e1  e741b67b6582
> osd.13  sg-osd01   running (3d)  8m ago
> 4w3688M4096M  17.2.6 90a2664234e1  bffa59278fc2
> osd.14  sg-osd03   running (3d)  7m ago
> 4w3652M4096M  17.2.6 90a2664234e1  7d9eb3fb9c1e
> osd.15  sg-osd01   running (3d)  8m ago
> 4w3343M4096M  17.2.6 90a2664234e1  d96a425ae5c9
> osd.16  sg-osd03   running (3d)  7m ago
> 4w2492M4096M  17.2.6 90a2664234e1  637c43176fdc
> osd.17  sg-osd01   running (3d)  8m ago
> 4w3011M4096M  17.2.6 90a2664234e1  a39456dd2c0c
> osd.18  sg-osd03   running (3d)  7m ago
> 4w2341M4096M  17.2.6 90a2664234e1  7b750672391b
> osd.19  sg-osd01   running (3d)  8m ago
> 4w2672M4096M  17.2.6 90a2664234e1  6358234e95f5
> osd.20  sg-osd03   running (3d)  7m ago
> 4w3297M4096M  17.2.6 90a2664234e1  2ecba6b066fd
> osd.21  sg-osd01   running (3d)  8m ago
> 4w5147M4096M  17.2.6 90a2664234e1  1d0e4efe48bd
> osd.22  sg-osd03   running (3d)  7m ago
> 4w3432M4096M  17.2.6 90a2664234e1  5bb6d4f71f9d
> osd.23  sg-osd03   running (3d)  7m ago
> 4w2893M4096M  17.2.6 90a2664234e1  f7e1948e57d5
> osd.24  sg-osd02   running (3d)  7m ago
>  12d3007M4096M  17.2.6 90a2664234e1  85d896abe467
> osd.25  sg-osd02   running (3d)  7m ago
>  12d2666M4096M  17.2.6 90a2664234e1  9800cd8ff1a1
> osd.26  sg-osd02   running (3d)  7m ago
>  12d2918M4096M  17.2.6 90a2664234e1  f2e0b2d50625
> osd.27  sg-osd02   running (3d)  7m ago
>  12d3586M4096M  17.2.6 90a2664234e1  ee2fa3a9b40a
> osd.28  sg-osd02   running (3d)  7m ago
>  12d2391M4096M  17.2.6 90a2664234e1  4cf7adf9f60a
> osd.29  sg-osd02   running (3d)  7m ago
>  12d5642M4096M  17.2.6 90a2664234e1  8c1ba98a1738
> osd.30  sg-osd02   running (3d)  7m ago
>  12d4728M4096M  17.2.6 90a2664234e1  e308497de2e5
> osd.31  sg-osd02   running (3d)  7m ago
>  12d3615M4096M  17.2.6 90a2664234e1  89b80d464627
> osd.32  sg-osd02   running (3d)  7m ago
>  12d1703M4096M  17.2.6 90a2664234e1  1e4608786078
> osd.33  sg-osd02   running (3d)  7m ago
>  12d3039M4096M  17.2.6 90a2664234e1  16e04a1da987
> osd.34  sg-osd02   running (3d)  7m ago
>  12d2434M4096M  17.2.6 90a2664234e1  014076e28182
>
>
>
> By the way, as you said, I feel this value does not have much impact: whether we
> set 1 GB or 4 GB, an OSD can still consume much more memory when it needs to.
>
> Nguyen Huu Khoi
>
>
> On Thu, Nov 16, 2023 at 2:13 PM Zakhar Kirpichenko 
> wrote:
>
>> You're most welcome!
>>
>> I'd say that real leak issues are very rare. For example, these are my
>> OSDs with memory target=16GB which have been running for quite a while, as
>> you can see they don't exceed 16 GB RSS:
>>
>>  

[ceph-users] Re: [CEPH] OSD Memory Usage

2023-11-15 Thread Zakhar Kirpichenko
You're most welcome!

I'd say that real leak issues are very rare. For example, these are my OSDs
with a 16 GB memory target, which have been running for quite a while; as you
can see, they don't exceed 16 GB RSS:

    PID USER  PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  92298 167   20   0   18.7g  15.8g  12264 S   1.3   4.2   1974:06 ceph-osd
  94527 167   20   0   19.5g  15.8g  12248 S   2.3   4.2   2287:26 ceph-osd
  93749 167   20   0   19.1g  15.7g  12804 S   2.3   4.2   1768:22 ceph-osd
  89534 167   20   0   20.1g  15.7g  12412 S   4.0   4.2   2512:18 ceph-osd
3706552 167   20   0   20.5g  15.7g  15588 S   2.3   4.2   1385:26 ceph-osd
  90297 167   20   0   19.5g  15.6g  12432 S   3.0   4.1   2261:00 ceph-osd
   9799 167   20   0   22.9g  15.4g  12432 S   2.0   4.1   2494:00 ceph-osd
   9778 167   20   0   23.1g  15.3g  12556 S   2.6   4.1   2591:25 ceph-osd
   9815 167   20   0   23.4g  15.1g  12584 S   2.0   4.0   2722:28 ceph-osd
   9809 167   20   0   22.3g  15.1g  12068 S   3.6   4.0   5234:52 ceph-osd
   9811 167   20   0   23.4g  14.9g  12952 S   2.6   4.0   2593:19 ceph-osd
   9819 167   20   0   23.9g  14.9g  12636 S   2.6   4.0   3043:19 ceph-osd
   9820 167   20   0   23.3g  14.8g  12884 S   2.0   3.9   3073:43 ceph-osd
   9769 167   20   0   22.4g  14.7g  12612 S   2.6   3.9   2840:22 ceph-osd
   9836 167   20   0   24.0g  14.7g  12648 S   2.6   3.9   3300:34 ceph-osd
   9818 167   20   0   22.0g  14.7g  12152 S   2.3   3.9   5729:06 ceph-osd

Long story short, if you set reasonable targets, OSDs are unlikely to
exceed them during normal operations. If you set memory targets too low, they
are likely to be exceeded, as OSDs need reasonable amounts of memory to
operate.
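
If it helps, this is roughly how I'd double-check what an OSD is configured with versus what it actually uses (a sketch; osd.7 is just an example daemon ID):

ceph config get osd osd_memory_target        # cluster-wide value
ceph config show osd.7 osd_memory_target     # effective value for one daemon
ps -o pid,rss,cmd -C ceph-osd                # actual resident memory on the host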

/Z

On Thu, 16 Nov 2023 at 08:37, Nguyễn Hữu Khôi 
wrote:

> Hello. Thank you very much for your explanation.
>
> I thought that osd_memory_target would help me limit OSD memory
> usage, which would help prevent memory leaks - I searched Google and many people
> talked about memory leaks. A kind person, @Anthony D'Atri  ,
> on this forum helped me understand that it won't hard-limit OSD usage.
>
> I set it to 1 GB because I wanted to see how this option works.
>
> I will read up on and test the cache options.
>
> Nguyen Huu Khoi
>
>
> On Thu, Nov 16, 2023 at 12:23 PM Zakhar Kirpichenko 
> wrote:
>
>> Hi,
>>
>> osd_memory_target is a "target", i.e. an OSD makes an effort to keep its
>> consumption within the specified amount of RAM, but won't consume less than
>> required for its operation and caches, which have minimum values such as
>> osd_memory_cache_min, bluestore_cache_size,
>> bluestore_cache_size_hdd, bluestore_cache_size_ssd, etc. The recommended
>> and default OSD memory target is 4 GB.
>>
>> Your nodes have a sufficient amount of RAM, thus I don't see why you
>> would want to reduce OSD memory consumption below the recommended defaults,
>> especially considering that in-memory caches are important for Ceph
>> operations as they're many times faster than the fastest storage devices. I
>> run my OSDs with osd_memory_target=17179869184 (16 GB) and it helps,
>> especially with slower HDD-backed OSDs.
>>
>> /Z
>>
>> On Thu, 16 Nov 2023 at 01:02, Nguyễn Hữu Khôi 
>> wrote:
>>
>>> Hello,
>>> I am using a CEPH cluster. After monitoring it, I set:
>>>
>>> ceph config set osd osd_memory_target_autotune false
>>>
>>> ceph config set osd osd_memory_target 1G
>>>
>>> Then I restarted all OSD services and tested again: I ran fio from multiple
>>> clients and saw that OSD memory consumption exceeded 1 GB. Would you
>>> like to help me understand this case?
>>>
>>> Ceph version: Quincy
>>>
>>> OSD: 3 nodes with 11 nvme each and 512GB ram per node.
>>>
>>> CPU: 2 socket xeon gold 6138 cpu with 56 cores per socket.
>>>
>>> Network: 25Gbps x 2 for public network and 25Gbps x 2 for storage
>>> network.
>>> MTU is 9000
>>>
>>> Thank you very much.
>>>
>>>
>>> Nguyen Huu Khoi
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [CEPH] OSD Memory Usage

2023-11-15 Thread Zakhar Kirpichenko
Hi,

osd_memory_target is a "target", i.e. an OSD makes an effort to keep its
consumption within the specified amount of RAM, but won't consume less than
required for its operation and caches, which have minimum values such as
osd_memory_cache_min, bluestore_cache_size,
bluestore_cache_size_hdd, bluestore_cache_size_ssd, etc. The recommended
and default OSD memory target is 4 GB.
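
For reference, the current values of those floors can be checked like this (just a sketch using the option names above; defaults may differ between releases):

ceph config get osd osd_memory_cache_min
ceph config get osd bluestore_cache_size_hdd
ceph config get osd bluestore_cache_size_ssd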

Your nodes have a sufficient amount of RAM, thus I don't see why you would
want to reduce OSD memory consumption below the recommended defaults,
especially considering that in-memory caches are important for Ceph
operations as they're many times faster than the fastest storage devices. I
run my OSDs with osd_memory_target=17179869184 (16 GB) and it helps,
especially with slower HDD-backed OSDs.

/Z

On Thu, 16 Nov 2023 at 01:02, Nguyễn Hữu Khôi 
wrote:

> Hello,
> I am using a CEPH cluster. After monitoring it, I set:
>
> ceph config set osd osd_memory_target_autotune false
>
> ceph config set osd osd_memory_target 1G
>
> Then I restarted all OSD services and tested again: I ran fio from multiple
> clients and saw that OSD memory consumption exceeded 1 GB. Would you
> like to help me understand this case?
>
> Ceph version: Quincy
>
> OSD: 3 nodes with 11 nvme each and 512GB ram per node.
>
> CPU: 2 socket xeon gold 6138 cpu with 56 cores per socket.
>
> Network: 25Gbps x 2 for public network and 25Gbps x 2 for storage network.
> MTU is 9000
>
> Thank you very much.
>
>
> Nguyen Huu Khoi
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-08 Thread Zakhar Kirpichenko
Take a hint from this: "544 pgs not deep-scrubbed in time". Your OSDs are
unable to scrub their data in time, likely because they cannot cope with
the combined client and scrubbing I/O. In other words, there's too much data
on too few, too-slow spindles.

You can play with osd_deep_scrub_interval and increase the deep-scrub interval
from the default 604800 seconds (1 week) to 1209600 (2 weeks) or more. It
may also be a good idea to manually force scrubbing of some PGs to spread
scrubbing time more evenly over the selected period, for example as sketched below.
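
A minimal sketch of both ideas (the interval value and the PG ID 2.1a are purely illustrative):

ceph config set osd osd_deep_scrub_interval 1209600   # 2 weeks, in seconds
ceph pg deep-scrub 2.1a                               # manually start a deep scrub of one PG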

But in general this is not a balanced setup and little can be done to
alleviate the lack of spindle performance.

/Z

On Wed, 8 Nov 2023 at 17:22,  wrote:

> Hi Eugen
>  Please find the details below
>
> root@meghdootctr1:/var/log/ceph# ceph -s
> cluster:
> id: c59da971-57d1-43bd-b2b7-865d392412a5
> health: HEALTH_WARN
> nodeep-scrub flag(s) set
> 544 pgs not deep-scrubbed in time
>
> services:
> mon: 3 daemons, quorum meghdootctr1,meghdootctr2,meghdootctr3 (age 5d)
> mgr: meghdootctr1(active, since 5d), standbys: meghdootctr2, meghdootctr3
> mds: 3 up:standby
> osd: 36 osds: 36 up (since 34h), 36 in (since 34h)
> flags nodeep-scrub
>
> data:
> pools: 2 pools, 544 pgs
> objects: 10.14M objects, 39 TiB
> usage: 116 TiB used, 63 TiB / 179 TiB avail
> pgs: 544 active+clean
>
> io:
> client: 24 MiB/s rd, 16 MiB/s wr, 2.02k op/s rd, 907 op/s wr
>
>
> Ceph Versions:
> root@meghdootctr1:/var/log/ceph# ceph --version
> ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus
> (stable)
>
> Ceph df -h
> https://pastebin.com/1ffucyJg
>
> Ceph OSD performance dump
> https://pastebin.com/1R6YQksE
>
> Ceph tell osd.XX bench  (Out of 36 osds only 8 OSDs give High IOPS value
> of 250 +. Out of that 4 OSDs are from HP 3PAR and 4 OSDS from DELL EMC. We
> are using only 4 OSDs from HP 3PAR and they are working fine without any
> latency and iops issues from the beginning but the remaining 32 OSDs are
> from DELL EMC in which 4 OSDs are much better than the remaining 28 OSDs)
>
> https://pastebin.com/CixaQmBi
>
> Please help me to identify if the issue is with the DELL EMC Storage, Ceph
> configuration parameter tuning or the Overload in the cloud setup
>
>
>
> On November 1, 2023 at 9:48 PM Eugen Block  wrote:
> > Hi,
> >
> > for starters please add more cluster details like 'ceph status', 'ceph
> > versions', 'ceph osd df tree'. Increasing the to 10G was the right
> > thing to do, you don't get far with 1G with real cluster load. How are
> > the OSDs configured (HDD only, SSD only or HDD with rocksdb on SSD)?
> > How is the disk utilization?
> >
> > Regards,
> > Eugen
> >
> > Zitat von prab...@cdac.in:
> >
> > > In a production setup of 36 OSDs( SAS disks) totalling 180 TB
> > > allocated to a single Ceph Cluster with 3 monitors and 3 managers.
> > > There were 830 volumes and VMs created in Openstack with Ceph as a
> > > backend. On Sep 21, users reported slowness in accessing the VMs.
> > > Analysing the logs lead us to problem with SAS , Network congestion
> > > and Ceph configuration( as all default values were used). We updated
> > > the Network from 1Gbps to 10Gbps for public and cluster networking.
> > > There was no change.
> > > The ceph benchmark performance showed that 28 OSDs out of 36 OSDs
> > > reported very low IOPS of 30 to 50 while the remaining showed 300+
> > > IOPS.
> > > We gradually started reducing the load on the ceph cluster and now
> > > the volumes count is 650. Now the slow operations has gradually
> > > reduced but I am aware that this is not the solution.
> > > Ceph configuration is updated with increasing the
> > > osd_journal_size to 10 GB,
> > > osd_max_backfills = 1
> > > osd_recovery_max_active = 1
> > > osd_recovery_op_priority = 1
> > > bluestore_cache_trim_max_skip_pinned=1
> > >
> > > After one month, now we faced another issue with Mgr daemon stopped
> > > in all 3 quorums and 16 OSDs went down. From the
> > > ceph-mon,ceph-mgr.log could not get the reason. Please guide me as
> > > its a production setup
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-06 Thread Zakhar Kirpichenko
Only client I/O, cluster recovery I/O and/or data scrubbing I/O make the
cluster "busy". If you have removed client workloads and the cluster is
healthy, it should be mostly idle. Simply having data sitting in the
cluster but not being accessed or modified doesn't make the cluster do any
work, except for scheduled data scrubbing. Note that depending on the data
volume and the OSD performance, scrubbing may take considerable time.
Monitor logs generally provide a good idea of what's going on with the
cluster and whether scrubbing is active.
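
Two quick checks that are usually enough (a sketch; the exact output format varies a bit between releases):

ceph -s | grep -i scrub                                  # scrubbing PGs show up in the pg summary
ceph pg dump pgs_brief 2>/dev/null | grep -c scrubbing   # rough count of PGs currently scrubbing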

/Z

On Mon, 6 Nov 2023 at 12:48, V A Prabha  wrote:

> Please clarify my query.
> I had 700+ volumes (220 applications) running on 36 OSDs when the slow
> operations were reported. Due to an emergency, we migrated 200+ VMs to another
> virtualization environment, so we have shut down all the related VMs in our
> OpenStack production setup running with Ceph.
> We have not deleted the 200+ volumes from Ceph, as we are waiting for
> concurrence from the departments.
> My query is: even when the applications are down, do the volumes at
> the backend keep our cluster busy when there are no active transactions?
> Is there any parameter in the ceph-osd or ceph-mon logs that gives me
> a clue about the cluster's busyness?
> Is there any zombie or unwanted process that makes the Ceph cluster busy, or
> is it the IOPS budget of the disks that makes the cluster busy?
>
>
> On November 4, 2023 at 4:29 PM Zakhar Kirpichenko 
> wrote:
>
> You have an IOPS budget, i.e. how much I/O your spinners can deliver.
> Space utilization doesn't affect it much.
>
> You can try disabling write (not read!) cache on your HDDs with sdparm
> (for example, sdparm -c WCE /dev/bla); in my experience this allows HDDs to
> deliver 50-100% more write IOPS. If there is lots of free RAM on the OSD
> nodes, you can play with osd_memory_target and bluestore_cache_size_hdd OSD
> options; be careful though: depending on your workload, the performance
> impact may be insignificant, but your OSDs may run out of memory.
>
> /z
>
> On Sat, 4 Nov 2023 at 12:04, V A Prabha < prab...@cdac.in> wrote:
>
> Now in this situation how can stabilize my production setup as you have
> mentioned the cluster is very busy.
> Is there any configuration parameter tuning will help or the only option
> is to reduce the applications running on the cluster.
> Though if I have free available storage of 1.6 TB free in each of my OSD,
> that will not help in my IOPS issue right?
> Please guide me
>
> On November 2, 2023 at 12:47 PM Zakhar Kirpichenko < zak...@gmail.com>
> wrote:
>
> >1. The calculated IOPS is for the rw operation right ?
>
> Total drive IOPS, read or write. Depending on the exact drive models, it
> may be lower or higher than 200. I took the average for a smaller sized
> 7.2k rpm SAS drive. Modern drives usually deliver lower read IOPS and
> higher write IOPS.
>
> >2. Cluster is very busy? Is there any misconfiguration or missing tuning
> parameter that makes the cluster busy?
>
> You have almost 3k IOPS and your OSDs report slow ops. I'd say the cluster
> is busy, as in loaded with I/O, perhaps more I/O than it can handle well.
>
> >3. Nodes are not balanced?  you mean to say that the count of OSDs in
> each server differs. But we have enabled autoscale and optimal distribution
> so that you can see from the output of ceph osd df tree that is count of
> pgs(45/OSD) and use% (65 to 67%). Is that not significant?
>
> Yes, the OSD count differs. This means that the CPU, memory usage, network
> load and latency differ per node and may cause performance variations,
> depending on your workload.
>
> /Z
>
> On Thu, 2 Nov 2023 at 08:18, V A Prabha < prab...@cdac.in> wrote:
>
> Thanks for your prompt reply ..
> But the query is
> 1.The calculated IOPS is for the rw operation right ?
> 2. Cluster is very busy? Is there any misconfiguration or missing tuning
> parameter that makes the cluster busy?
> 3. Nodes are not balanced?  you mean to say that the count of OSDs in each
> server differs. But we have enabled autoscale and optimal distribution so
> that you can see from the output of ceph osd df tree that is count of
> pgs(45/OSD) and use% (65 to 67%). Is that not significant?
> Correct me if my queries are irrelevant
>
>
>
> On November 2, 2023 at 11:36 AM Zakhar Kirpichenko < zak...@gmail.com>
> wrote:
>
> Sure, it's 36 OSDs at 200 IOPS each (tops, likely lower), I assume size=3
> replication so 1/3 of the total performance, and some 30%-ish OSD
> overhead.
>
> (36 x 200) * 1/3 * 0.7 = 1680. That's how many IOPS you can realistically
> expect from your cluster. You get more than that, but the cluster is very
> busy and OSDs ar

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-04 Thread Zakhar Kirpichenko
You have an IOPS budget, i.e. how much I/O your spinners can deliver. Space
utilization doesn't affect it much.

You can try disabling write (not read!) cache on your HDDs with sdparm (for
example, sdparm -c WCE /dev/bla); in my experience this allows HDDs to
deliver 50-100% more write IOPS. If there is lots of free RAM on the OSD
nodes, you can play with osd_memory_target and bluestore_cache_size_hdd OSD
options; be careful though: depending on your workload, the performance
impact may be insignificant, but your OSDs may run out of memory.
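
To make that concrete, something along these lines (a sketch only; /dev/sdX and the sizes are placeholders, so check the current cache state first and trial the change on a single OSD host):

sdparm -g WCE /dev/sdX                                    # show the current write-cache setting
sdparm -c WCE /dev/sdX                                    # disable the volatile write cache
ceph config set osd osd_memory_target 8589934592          # e.g. 8 GiB per OSD
ceph config set osd bluestore_cache_size_hdd 4294967296   # e.g. 4 GiB BlueStore cache for HDD OSDs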

/z

On Sat, 4 Nov 2023 at 12:04, V A Prabha  wrote:

> Now in this situation how can stabilize my production setup as you have
> mentioned the cluster is very busy.
> Is there any configuration parameter tuning will help or the only option
> is to reduce the applications running on the cluster.
> Though if I have free available storage of 1.6 TB free in each of my OSD,
> that will not help in my IOPS issue right?
> Please guide me
>
> On November 2, 2023 at 12:47 PM Zakhar Kirpichenko 
> wrote:
>
> >1. The calculated IOPS is for the rw operation right ?
>
> Total drive IOPS, read or write. Depending on the exact drive models, it
> may be lower or higher than 200. I took the average for a smaller sized
> 7.2k rpm SAS drive. Modern drives usually deliver lower read IOPS and
> higher write IOPS.
>
> >2. Cluster is very busy? Is there any misconfiguration or missing tuning
> parameter that makes the cluster busy?
>
> You have almost 3k IOPS and your OSDs report slow ops. I'd say the cluster
> is busy, as in loaded with I/O, perhaps more I/O than it can handle well.
>
> >3. Nodes are not balanced?  you mean to say that the count of OSDs in
> each server differs. But we have enabled autoscale and optimal distribution
> so that you can see from the output of ceph osd df tree that is count of
> pgs(45/OSD) and use% (65 to 67%). Is that not significant?
>
> Yes, the OSD count differs. This means that the CPU, memory usage, network
> load and latency differ per node and may cause performance variations,
> depending on your workload.
>
> /Z
>
> On Thu, 2 Nov 2023 at 08:18, V A Prabha < prab...@cdac.in> wrote:
>
> Thanks for your prompt reply ..
> But the query is
> 1.The calculated IOPS is for the rw operation right ?
> 2. Cluster is very busy? Is there any misconfiguration or missing tuning
> parameter that makes the cluster busy?
> 3. Nodes are not balanced?  you mean to say that the count of OSDs in each
> server differs. But we have enabled autoscale and optimal distribution so
> that you can see from the output of ceph osd df tree that is count of
> pgs(45/OSD) and use% (65 to 67%). Is that not significant?
> Correct me if my queries are irrelevant
>
>
>
> On November 2, 2023 at 11:36 AM Zakhar Kirpichenko < zak...@gmail.com>
> wrote:
>
> Sure, it's 36 OSDs at 200 IOPS each (tops, likely lower), I assume size=3
> replication so 1/3 of the total performance, and some 30%-ish OSD
> overhead.
>
> (36 x 200) * 1/3 * 0.7 = 1680. That's how many IOPS you can realistically
> expect from your cluster. You get more than that, but the cluster is very
> busy and OSDs aren't coping.
>
> Also your nodes are not balanced.
>
> /Z
>
> On Thu, 2 Nov 2023 at 07:33, V A Prabha < prab...@cdac.in> wrote:
>
> Can you please elaborate on what you identified and on that statement?
>
>
> On November 2, 2023 at 9:40 AM Zakhar Kirpichenko < zak...@gmail.com>
> wrote:
>
> I'm afraid you're simply hitting the I/O limits of your disks.
>
> /Z
>
> On Thu, 2 Nov 2023 at 03:40, V A Prabha < prab...@cdac.in> wrote:
>
>  Hi Eugen
>  Please find the details below
>
>
> root@meghdootctr1:/var/log/ceph# ceph -s
> cluster:
> id: c59da971-57d1-43bd-b2b7-865d392412a5
> health: HEALTH_WARN
> nodeep-scrub flag(s) set
> 544 pgs not deep-scrubbed in time
>
> services:
> mon: 3 daemons, quorum meghdootctr1,meghdootctr2,meghdootctr3 (age 5d)
> mgr: meghdootctr1(active, since 5d), standbys: meghdootctr2, meghdootctr3
> mds: 3 up:standby
> osd: 36 osds: 36 up (since 34h), 36 in (since 34h)
> flags nodeep-scrub
>
> data:
> pools: 2 pools, 544 pgs
> objects: 10.14M objects, 39 TiB
> usage: 116 TiB used, 63 TiB / 179 TiB avail
> pgs: 544 active+clean
>
> io:
> client: 24 MiB/s rd, 16 MiB/s wr, 2.02k op/s rd, 907 op/s wr
>
>
> Ceph Versions:
>
> root@meghdootctr1:/var/log/ceph# ceph --version
> ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus
> (stable)
>
> Ceph df -h
> https://pastebin.com/1ffucyJg
>
> Ceph OSD performance dump
> https://pastebin.com/1R6YQksE
>
> Ceph tell osd.XX bench  (Out of 36 osds only 8

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-02 Thread Zakhar Kirpichenko
>1. The calculated IOPS is for the rw operation right ?

Total drive IOPS, read or write. Depending on the exact drive models, it
may be lower or higher than 200. I took the average for a smaller sized
7.2k rpm SAS drive. Modern drives usually deliver lower read IOPS and
higher write IOPS.

>2. Cluster is very busy? Is there any misconfiguration or missing tuning
parameter that makes the cluster busy?

You have almost 3k IOPS and your OSDs report slow ops. I'd say the cluster
is busy, as in loaded with I/O, perhaps more I/O than it can handle well.

>3. Nodes are not balanced?  you mean to say that the count of OSDs in each
server differs. But we have enabled autoscale and optimal distribution so
that you can see from the output of ceph osd df tree that is count of
pgs(45/OSD) and use% (65 to 67%). Is that not significant?

Yes, the OSD count differs. This means that the CPU, memory usage, network
load and latency differ per node and may cause performance variations,
depending on your workload.
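
If you want to see where the pressure lands, per-OSD latencies and per-disk utilisation can be watched like this (a sketch; iostat comes from the sysstat package):

ceph osd perf        # commit/apply latency per OSD
iostat -x 5          # %util and await per physical disk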

/Z

On Thu, 2 Nov 2023 at 08:18, V A Prabha  wrote:

> Thanks for your prompt reply ..
> But the query is
> 1.The calculated IOPS is for the rw operation right ?
> 2. Cluster is very busy? Is there any misconfiguration or missing tuning
> parameter that makes the cluster busy?
> 3. Nodes are not balanced?  you mean to say that the count of OSDs in each
> server differs. But we have enabled autoscale and optimal distribution so
> that you can see from the output of ceph osd df tree that is count of
> pgs(45/OSD) and use% (65 to 67%). Is that not significant?
> Correct me if my queries are irrelevant
>
>
>
> On November 2, 2023 at 11:36 AM Zakhar Kirpichenko 
> wrote:
>
> Sure, it's 36 OSDs at 200 IOPS each (tops, likely lower), I assume size=3
> replication so 1/3 of the total performance, and some 30%-ish OSD
> overhead.
>
> (36 x 200) * 1/3 * 0.7 = 1680. That's how many IOPS you can realistically
> expect from your cluster. You get more than that, but the cluster is very
> busy and OSDs aren't coping.
>
> Also your nodes are not balanced.
>
> /Z
>
> On Thu, 2 Nov 2023 at 07:33, V A Prabha < prab...@cdac.in> wrote:
>
> Can you please elaborate on what you identified and on that statement?
>
>
> On November 2, 2023 at 9:40 AM Zakhar Kirpichenko < zak...@gmail.com>
> wrote:
>
> I'm afraid you're simply hitting the I/O limits of your disks.
>
> /Z
>
> On Thu, 2 Nov 2023 at 03:40, V A Prabha < prab...@cdac.in> wrote:
>
>  Hi Eugen
>  Please find the details below
>
>
> root@meghdootctr1:/var/log/ceph# ceph -s
> cluster:
> id: c59da971-57d1-43bd-b2b7-865d392412a5
> health: HEALTH_WARN
> nodeep-scrub flag(s) set
> 544 pgs not deep-scrubbed in time
>
> services:
> mon: 3 daemons, quorum meghdootctr1,meghdootctr2,meghdootctr3 (age 5d)
> mgr: meghdootctr1(active, since 5d), standbys: meghdootctr2, meghdootctr3
> mds: 3 up:standby
> osd: 36 osds: 36 up (since 34h), 36 in (since 34h)
> flags nodeep-scrub
>
> data:
> pools: 2 pools, 544 pgs
> objects: 10.14M objects, 39 TiB
> usage: 116 TiB used, 63 TiB / 179 TiB avail
> pgs: 544 active+clean
>
> io:
> client: 24 MiB/s rd, 16 MiB/s wr, 2.02k op/s rd, 907 op/s wr
>
>
> Ceph Versions:
>
> root@meghdootctr1:/var/log/ceph# ceph --version
> ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus
> (stable)
>
> Ceph df -h
> https://pastebin.com/1ffucyJg
>
> Ceph OSD performance dump
> https://pastebin.com/1R6YQksE
>
> Ceph tell osd.XX bench  (Out of 36 osds only 8 OSDs give High IOPS value
> of 250
> +. Out of that 4 OSDs are from HP 3PAR and 4 OSDS from DELL EMC. We are
> using
> only 4 OSDs from HP3 par and it is working fine without any latency and
> iops
> issues from the beginning but the remaining 32 OSDs are from DELL EMC in
> which 4
> OSDs are much better than the remaining 28 OSDs)
>
> https://pastebin.com/CixaQmBi
>
> Please help me to identify if the issue is with the DELL EMC Storage, Ceph
> configuration parameter tuning or the Overload in the cloud setup
>
>
>
> On November 1, 2023 at 9:48 PM Eugen Block < ebl...@nde.ag> wrote:
> > Hi,
> >
> > for starters please add more cluster details like 'ceph status', 'ceph
> > versions', 'ceph osd df tree'. Increasing the to 10G was the right
> > thing to do, you don't get far with 1G with real cluster load. How are
> > the OSDs configured (HDD only, SSD only or HDD with rocksdb on SSD)?
> > How is the disk utilization?
> >
> > Regards,
> > Eugen
> >
> > Zitat von prab...@cdac.in:
> >
> > > In a production setup of 36 OSDs( SAS disks) totalling 180 TB
> > > allocated to a single Ceph Clus

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-02 Thread Zakhar Kirpichenko
Sure, it's 36 OSDs at 200 IOPS each (tops, likely lower), I assume size=3
replication so 1/3 of the total performance, and some 30%-ish OSD overhead.

(36 x 200) * 1/3 * 0.7 = 1680. That's how many IOPS you can realistically
expect from your cluster. You get more than that, but the cluster is very
busy and OSDs aren't coping.

Also your nodes are not balanced.

/Z

On Thu, 2 Nov 2023 at 07:33, V A Prabha  wrote:

> Can you please elaborate on what you identified and on that statement?
>
>
> On November 2, 2023 at 9:40 AM Zakhar Kirpichenko 
> wrote:
>
> I'm afraid you're simply hitting the I/O limits of your disks.
>
> /Z
>
> On Thu, 2 Nov 2023 at 03:40, V A Prabha < prab...@cdac.in> wrote:
>
>  Hi Eugen
>  Please find the details below
>
>
> root@meghdootctr1:/var/log/ceph# ceph -s
> cluster:
> id: c59da971-57d1-43bd-b2b7-865d392412a5
> health: HEALTH_WARN
> nodeep-scrub flag(s) set
> 544 pgs not deep-scrubbed in time
>
> services:
> mon: 3 daemons, quorum meghdootctr1,meghdootctr2,meghdootctr3 (age 5d)
> mgr: meghdootctr1(active, since 5d), standbys: meghdootctr2, meghdootctr3
> mds: 3 up:standby
> osd: 36 osds: 36 up (since 34h), 36 in (since 34h)
> flags nodeep-scrub
>
> data:
> pools: 2 pools, 544 pgs
> objects: 10.14M objects, 39 TiB
> usage: 116 TiB used, 63 TiB / 179 TiB avail
> pgs: 544 active+clean
>
> io:
> client: 24 MiB/s rd, 16 MiB/s wr, 2.02k op/s rd, 907 op/s wr
>
>
> Ceph Versions:
>
> root@meghdootctr1:/var/log/ceph# ceph --version
> ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus
> (stable)
>
> Ceph df -h
> https://pastebin.com/1ffucyJg
>
> Ceph OSD performance dump
> https://pastebin.com/1R6YQksE
>
> Ceph tell osd.XX bench  (Out of 36 osds only 8 OSDs give High IOPS value
> of 250
> +. Out of that 4 OSDs are from HP 3PAR and 4 OSDS from DELL EMC. We are
> using
> only 4 OSDs from HP 3PAR and they are working fine without any latency and
> iops
> issues from the beginning but the remaining 32 OSDs are from DELL EMC in
> which 4
> OSDs are much better than the remaining 28 OSDs)
>
> https://pastebin.com/CixaQmBi
>
> Please help me to identify if the issue is with the DELL EMC Storage, Ceph
> configuration parameter tuning or the Overload in the cloud setup
>
>
>
> On November 1, 2023 at 9:48 PM Eugen Block < ebl...@nde.ag> wrote:
> > Hi,
> >
> > for starters please add more cluster details like 'ceph status', 'ceph
> > versions', 'ceph osd df tree'. Increasing the to 10G was the right
> > thing to do, you don't get far with 1G with real cluster load. How are
> > the OSDs configured (HDD only, SSD only or HDD with rocksdb on SSD)?
> > How is the disk utilization?
> >
> > Regards,
> > Eugen
> >
> > Zitat von prab...@cdac.in:
> >
> > > In a production setup of 36 OSDs( SAS disks) totalling 180 TB
> > > allocated to a single Ceph Cluster with 3 monitors and 3 managers.
> > > There were 830 volumes and VMs created in Openstack with Ceph as a
> > > backend. On Sep 21, users reported slowness in accessing the VMs.
> > > Analysing the logs lead us to problem with SAS , Network congestion
> > > and Ceph configuration( as all default values were used). We updated
> > > the Network from 1Gbps to 10Gbps for public and cluster networking.
> > > There was no change.
> > > The ceph benchmark performance showed that 28 OSDs out of 36 OSDs
> > > reported very low IOPS of 30 to 50 while the remaining showed 300+
> > > IOPS.
> > > We gradually started reducing the load on the ceph cluster and now
> > > the volumes count is 650. Now the slow operations has gradually
> > > reduced but I am aware that this is not the solution.
> > > Ceph configuration is updated with increasing the
> > > osd_journal_size to 10 GB,
> > > osd_max_backfills = 1
> > > osd_recovery_max_active = 1
> > > osd_recovery_op_priority = 1
> > > bluestore_cache_trim_max_skip_pinned=1
> > >
> > > After one month, now we faced another issue with Mgr daemon stopped
> > > in all 3 quorums and 16 OSDs went down. From the
> > > ceph-mon,ceph-mgr.log could not get the reason. Please guide me as
> > > its a production setup
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ce

[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-01 Thread Zakhar Kirpichenko
I'm afraid you're simply hitting the I/O limits of your disks.

/Z

On Thu, 2 Nov 2023 at 03:40, V A Prabha  wrote:

>  Hi Eugen
>  Please find the details below
>
>
> root@meghdootctr1:/var/log/ceph# ceph -s
> cluster:
> id: c59da971-57d1-43bd-b2b7-865d392412a5
> health: HEALTH_WARN
> nodeep-scrub flag(s) set
> 544 pgs not deep-scrubbed in time
>
> services:
> mon: 3 daemons, quorum meghdootctr1,meghdootctr2,meghdootctr3 (age 5d)
> mgr: meghdootctr1(active, since 5d), standbys: meghdootctr2, meghdootctr3
> mds: 3 up:standby
> osd: 36 osds: 36 up (since 34h), 36 in (since 34h)
> flags nodeep-scrub
>
> data:
> pools: 2 pools, 544 pgs
> objects: 10.14M objects, 39 TiB
> usage: 116 TiB used, 63 TiB / 179 TiB avail
> pgs: 544 active+clean
>
> io:
> client: 24 MiB/s rd, 16 MiB/s wr, 2.02k op/s rd, 907 op/s wr
>
>
> Ceph Versions:
>
> root@meghdootctr1:/var/log/ceph# ceph --version
> ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus
> (stable)
>
> Ceph df -h
> https://pastebin.com/1ffucyJg
>
> Ceph OSD performance dump
> https://pastebin.com/1R6YQksE
>
> Ceph tell osd.XX bench  (Out of 36 osds only 8 OSDs give High IOPS value
> of 250
> +. Out of that 4 OSDs are from HP 3PAR and 4 OSDS from DELL EMC. We are
> using
> only 4 OSDs from HP 3PAR and they are working fine without any latency and
> iops
> issues from the beginning but the remaining 32 OSDs are from DELL EMC in
> which 4
> OSDs are much better than the remaining 28 OSDs)
>
> https://pastebin.com/CixaQmBi
>
> Please help me to identify if the issue is with the DELL EMC Storage, Ceph
> configuration parameter tuning or the Overload in the cloud setup
>
>
>
> On November 1, 2023 at 9:48 PM Eugen Block  wrote:
> > Hi,
> >
> > for starters please add more cluster details like 'ceph status', 'ceph
> > versions', 'ceph osd df tree'. Increasing the to 10G was the right
> > thing to do, you don't get far with 1G with real cluster load. How are
> > the OSDs configured (HDD only, SSD only or HDD with rocksdb on SSD)?
> > How is the disk utilization?
> >
> > Regards,
> > Eugen
> >
> > Zitat von prab...@cdac.in:
> >
> > > In a production setup of 36 OSDs( SAS disks) totalling 180 TB
> > > allocated to a single Ceph Cluster with 3 monitors and 3 managers.
> > > There were 830 volumes and VMs created in Openstack with Ceph as a
> > > backend. On Sep 21, users reported slowness in accessing the VMs.
> > > Analysing the logs lead us to problem with SAS , Network congestion
> > > and Ceph configuration( as all default values were used). We updated
> > > the Network from 1Gbps to 10Gbps for public and cluster networking.
> > > There was no change.
> > > The ceph benchmark performance showed that 28 OSDs out of 36 OSDs
> > > reported very low IOPS of 30 to 50 while the remaining showed 300+
> > > IOPS.
> > > We gradually started reducing the load on the ceph cluster and now
> > > the volumes count is 650. Now the slow operations has gradually
> > > reduced but I am aware that this is not the solution.
> > > Ceph configuration is updated with increasing the
> > > osd_journal_size to 10 GB,
> > > osd_max_backfills = 1
> > > osd_recovery_max_active = 1
> > > osd_recovery_op_priority = 1
> > > bluestore_cache_trim_max_skip_pinned=1
> > >
> > > After one month, now we faced another issue with Mgr daemon stopped
> > > in all 3 quorums and 16 OSDs went down. From the
> > > ceph-mon,ceph-mgr.log could not get the reason. Please guide me as
> > > its a production setup
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> Thanks & Regards,
> Ms V A Prabha
> Joint Director
> Centre for Development of Advanced Computing (C-DAC)
> Tidel Park, 8th Floor, "D" Block (North), No.4, Rajiv Gandhi Salai,
> Taramani, Chennai – 600113
> Ph. No.: 044-22542226/27, Fax No.: 044-22542294

[ceph-users] Re: Ceph 16.2.14: pgmap updated every few seconds for no apparent reason

2023-10-25 Thread Zakhar Kirpichenko
Thanks for the warning, Eugen.
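
For anyone finding this later, the combination you describe would presumably look something like this (a sketch, untested on my side; the values follow your observations):

ceph config set mgr mgr_tick_period 30       # 30 s was the highest value that stayed stable in your tests
ceph config get mon mon_mgr_beacon_grace     # default 30 s; worth checking before going any higher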

/Z

On Wed, 25 Oct 2023 at 13:04, Eugen Block  wrote:

> Hi,
>
> this setting is not as harmless as I assumed. There seem to be more
> ticks/periods/health_checks involved. When I choose a mgr_tick_period
> value > 30 seconds the two MGRs keep respawning. 30 seconds are the
> highest value that still seemed to work without MGR respawn, even with
> increased mon_mgr_beacon_grace (default 30 sec.). So if you decide to
> increase the mgr_tick_period don't go over 30 unless you find out what
> else you need to change.
>
> Regards,
> Eugen
>
>
> Zitat von Eugen Block :
>
> > Hi,
> >
> > you can change the report interval with this config option (default
> > 2 seconds):
> >
> > $ ceph config get mgr mgr_tick_period
> > 2
> >
> > $ ceph config set mgr mgr_tick_period 10
> >
> > Regards,
> > Eugen
> >
> > Zitat von Chris Palmer :
> >
> >> I have just checked 2 quincy 17.2.6 clusters, and I see exactly the
> >> same. The pgmap version is bumping every two seconds (which ties in
> >> with the frequency you observed). Both clusters are healthy with
> >> nothing apart from client IO happening.
> >>
> >> On 13/10/2023 12:09, Zakhar Kirpichenko wrote:
> >>> Hi,
> >>>
> >>> I am investigating excessive mon writes in our cluster and wondering
> >>> whether excessive pgmap updates could be the culprit. Basically pgmap
> is
> >>> updated every few seconds, sometimes over ten times per minute, in a
> >>> healthy cluster with no OSD and/or PG changes:
> >>>
> >>> Oct 13 11:03:03 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:01.515438+
> >>> mgr.ceph01.vankui (mgr.336635131) 838252 : cluster [DBG] pgmap v606575:
> >>> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >>> TiB used, 716 TiB / 777 TiB avail; 60 MiB/s rd, 109 MiB/s wr, 5.65k
> op/s
> >>> Oct 13 11:03:04 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:03.520953+
> >>> mgr.ceph01.vankui (mgr.336635131) 838253 : cluster [DBG] pgmap v606576:
> >>> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >>> TiB used, 716 TiB / 777 TiB avail; 64 MiB/s rd, 128 MiB/s wr, 5.76k
> op/s
> >>> Oct 13 11:03:06 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:05.524474+
> >>> mgr.ceph01.vankui (mgr.336635131) 838255 : cluster [DBG] pgmap v606577:
> >>> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >>> TiB used, 716 TiB / 777 TiB avail; 64 MiB/s rd, 122 MiB/s wr, 5.57k
> op/s
> >>> Oct 13 11:03:08 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:07.530484+
> >>> mgr.ceph01.vankui (mgr.336635131) 838256 : cluster [DBG] pgmap v606578:
> >>> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >>> TiB used, 716 TiB / 777 TiB avail; 79 MiB/s rd, 127 MiB/s wr, 6.62k
> op/s
> >>> Oct 13 11:03:10 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:09.57+
> >>> mgr.ceph01.vankui (mgr.336635131) 838258 : cluster [DBG] pgmap v606579:
> >>> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >>> TiB used, 716 TiB / 777 TiB avail; 66 MiB/s rd, 104 MiB/s wr, 5.38k
> op/s
> >>> Oct 13 11:03:12 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:11.537908+
> >>> mgr.ceph01.vankui (mgr.336635131) 838259 : cluster [DBG] pgmap v606580:
> >>> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >>> TiB used, 716 TiB / 777 TiB avail; 85 MiB/s rd, 121 MiB/s wr, 6.43k
> op/s
> >>> Oct 13 11:03:13 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:13.543490+
> >>> mgr.ceph01.vankui (mgr.336635131) 838260 : cluster [DBG] pgmap v606581:
> >>> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >>> TiB used, 716 TiB / 777 TiB avail; 78 MiB/s rd, 127 MiB/s wr, 6.54k
> op/s
> >>> Oct 13 11:03:16 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:15.547122+
> >>> mgr.ceph01.vankui (mgr.336635131) 838262 : cluster [DBG] pgmap v606582:
> >>> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >>> TiB used, 716 TiB / 777 TiB avail; 71 MiB/s rd, 122 MiB/s wr, 6.08k
> op/s
> >>> Oct 13 11:03:18 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:17.553180+
> >>> mgr.ceph01.vankui (mgr.336635131) 83

[ceph-users] Re: Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

2023-10-20 Thread Zakhar Kirpichenko
Thank you, Igor. I was just reading the detailed list of changes for
16.2.14, as I suspected that we might not be able to go back to the
previous minor release :-) Thanks again for the suggestions, we'll consider
our options.
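
For the record, the gradual per-daemon override you describe would presumably look something like this (a sketch only, not something we have applied yet; osd.12 is a placeholder, the change presumably needs an OSD restart to take effect, and the RocksDB side would follow the spec attached to the linked PR):

ceph config set osd.12 bluestore_volume_selection_policy fit_to_fast
ceph config get osd.12 bluestore_volume_selection_policy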

/Z

On Fri, 20 Oct 2023 at 16:08, Igor Fedotov  wrote:

> Zakhar,
>
> my general concern about downgrading to previous versions is that this
> procedure is generally neither assumed nor tested by dev team. Although is
> possible most of the time. But in this specific case it is not doable due
> to (at least) https://github.com/ceph/ceph/pull/52212 which enables 4K
> bluefs allocation unit support - once some daemon gets it - there is no way
> back.
>
> I'm still thinking that setting "fit_to_fast" mode without enabling
> dynamic compaction levels is quite safe but definitely it's better to be
> tested in the real environment and under actual payload first. Also you
> might want to apply such a workaround gradually - one daemon first, bake it
> for a while, then apply for the full node, bake a bit more and finally go
> forward and update the remaining. Or even better - bake it in a test
> cluster first.
>
> Alternatively you might consider building updated code yourself and make
> patched binaries on top of .14...
>
>
> Thanks,
>
> Igor
>
>
> On 20/10/2023 15:10, Zakhar Kirpichenko wrote:
>
> Thank you, Igor.
>
> It is somewhat disappointing that fixing this bug in Pacific has such a
> low priority, considering its impact on existing clusters.
>
> The document attached to the PR explicitly says about
> `level_compaction_dynamic_level_bytes` that "enabling it on an existing DB
> requires special caution", we'd rather not experiment with something that
> has the potential to cause data corruption or loss in a production cluster.
> Perhaps a downgrade to the previous version, 16.2.13 which worked for us
> without any issues, is an option, or would you advise against such a
> downgrade from 16.2.14?
>
> /Z
>
> On Fri, 20 Oct 2023 at 14:46, Igor Fedotov  wrote:
>
>> Hi Zakhar,
>>
>> Definitely we expect one more (and apparently the last) Pacific minor
>> release. There is no specific date yet though - the plans are to release
>> Quincy and Reef minor releases prior to it. Hopefully to be done before the
>> Christmas/New Year.
>>
>> Meanwhile you might want to workaround the issue by tuning
>> bluestore_volume_selection_policy. Unfortunately most likely my original
>> proposal to set it to rocksdb_original wouldn't work in this case so you
>> better try "fit_to_fast" mode. This should be coupled with enabling
>> 'level_compaction_dynamic_level_bytes' mode in RocksDB - there is pretty
>> good spec on applying this mode to BlueStore attached to
>> https://github.com/ceph/ceph/pull/37156.
>>
>>
>> Thanks,
>>
>> Igor
>> On 20/10/2023 06:03, Zakhar Kirpichenko wrote:
>>
>> Igor, I noticed that there's no roadmap for the next 16.2.x release. May
>> I ask what time frame we are looking at with regards to a possible fix?
>>
>> We're experiencing several OSD crashes caused by this issue per day.
>>
>> /Z
>>
>> On Mon, 16 Oct 2023 at 14:19, Igor Fedotov  wrote:
>>
>>> That's true.
>>> On 16/10/2023 14:13, Zakhar Kirpichenko wrote:
>>>
>>> Many thanks, Igor. I found previously submitted bug reports and
>>> subscribed to them. My understanding is that the issue is going to be fixed
>>> in the next Pacific minor release.
>>>
>>> /Z
>>>
>>> On Mon, 16 Oct 2023 at 14:03, Igor Fedotov 
>>> wrote:
>>>
>>>> Hi Zakhar,
>>>>
>>>> please see my reply for the post on the similar issue at:
>>>>
>>>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Igor
>>>>
>>>> On 16/10/2023 09:26, Zakhar Kirpichenko wrote:
>>>> > Hi,
>>>> >
>>>> > After upgrading to Ceph 16.2.14 we had several OSD crashes
>>>> > in bstore_kv_sync thread:
>>>> >
>>>> >
>>>> > 1. "assert_thread_name": "bstore_kv_sync",
>>>> > 2. "backtrace": [
>>>> > 3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
>>>> > 4. "gsignal()",
>>>> > 5. "abort()",
>>>> > 6. "(ceph::__ceph_assert_fail(char const*, char const*, i

[ceph-users] Re: Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

2023-10-20 Thread Zakhar Kirpichenko
Thanks, Tyler. I appreciate what you're saying, though I can't fully agree:
16.2.13 didn't have crashing OSDs, so the crashes in 16.2.14 seem like a
regression - please correct me if I'm wrong. If it is indeed a regression,
then I'm not sure that suggesting to upgrade is the right thing to do in
this case.

We would consider upgrading, but unfortunately our Openstack Wallaby is
holding us back as its cinder doesn't support Ceph 17.x, so we're stuck
with having to find a solution for Ceph 16.x.

/Z

On Fri, 20 Oct 2023 at 15:39, Tyler Stachecki 
wrote:

> On Fri, Oct 20, 2023, 8:11 AM Zakhar Kirpichenko  wrote:
>
>> Thank you, Igor.
>>
>> It is somewhat disappointing that fixing this bug in Pacific has such a
>> low
>> priority, considering its impact on existing clusters.
>>
>
> Unfortunately, the hard truth here is that Pacific (stable) was released
> over 30 months ago. It has had a good run for a freely distributed product,
> and there's only so much time you can dedicate to backporting bugfixes --
> it claws time away from other forward-thinking initiatives.
>
> Speaking from someone who's been at the helm of production clusters, I
> know Ceph upgrades can be an experience and it's frustrating to hear, but
> you have to jump sometime...
>
> Regards,
> Tyler
>
>
>> On Fri, 20 Oct 2023 at 14:46, Igor Fedotov  wrote:
>>
>> > Hi Zakhar,
>> >
>> > Definitely we expect one more (and apparently the last) Pacific minor
>> > release. There is no specific date yet though - the plans are to release
>> > Quincy and Reef minor releases prior to it. Hopefully to be done before
>> the
>> > Christmas/New Year.
>> >
>> > Meanwhile you might want to workaround the issue by tuning
>> > bluestore_volume_selection_policy. Unfortunately most likely my original
>> > proposal to set it to rocksdb_original wouldn't work in this case so you
>> > better try "fit_to_fast" mode. This should be coupled with enabling
>> > 'level_compaction_dynamic_level_bytes' mode in RocksDB - there is pretty
>> > good spec on applying this mode to BlueStore attached to
>> > https://github.com/ceph/ceph/pull/37156.
>> >
>> >
>> > Thanks,
>> >
>> > Igor
>> > On 20/10/2023 06:03, Zakhar Kirpichenko wrote:
>> >
>> > Igor, I noticed that there's no roadmap for the next 16.2.x release.
>> May I
>> > ask what time frame we are looking at with regards to a possible fix?
>> >
>> > We're experiencing several OSD crashes caused by this issue per day.
>> >
>> > /Z
>> >
>> > On Mon, 16 Oct 2023 at 14:19, Igor Fedotov 
>> wrote:
>> >
>> >> That's true.
>> >> On 16/10/2023 14:13, Zakhar Kirpichenko wrote:
>> >>
>> >> Many thanks, Igor. I found previously submitted bug reports and
>> >> subscribed to them. My understanding is that the issue is going to be
>> fixed
>> >> in the next Pacific minor release.
>> >>
>> >> /Z
>> >>
>> >> On Mon, 16 Oct 2023 at 14:03, Igor Fedotov 
>> wrote:
>> >>
>> >>> Hi Zakhar,
>> >>>
>> >>> please see my reply for the post on the similar issue at:
>> >>>
>> >>>
>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/
>> >>>
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Igor
>> >>>
>> >>> On 16/10/2023 09:26, Zakhar Kirpichenko wrote:
>> >>> > Hi,
>> >>> >
>> >>> > After upgrading to Ceph 16.2.14 we had several OSD crashes
>> >>> > in bstore_kv_sync thread:
>> >>> >
>> >>> >
>> >>> > 1. "assert_thread_name": "bstore_kv_sync",
>> >>> > 2. "backtrace": [
>> >>> > 3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
>> >>> > 4. "gsignal()",
>> >>> > 5. "abort()",
>> >>> > 6. "(ceph::__ceph_assert_fail(char const*, char const*, int,
>> char
>> >>> > const*)+0x1a9) [0x564dc5f87d0b]",
>> >>> > 7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
>> >>> > 8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*,
>> bluefs_fnode_t
>> >>> > const&)+0x15e) [0x564dc6604a9e]"

[ceph-users] Re: Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

2023-10-20 Thread Zakhar Kirpichenko
Thank you, Igor.

It is somewhat disappointing that fixing this bug in Pacific has such a low
priority, considering its impact on existing clusters.

The document attached to the PR explicitly says about
`level_compaction_dynamic_level_bytes` that "enabling it on an existing DB
requires special caution", we'd rather not experiment with something that
has the potential to cause data corruption or loss in a production cluster.
Perhaps a downgrade to the previous version, 16.2.13 which worked for us
without any issues, is an option, or would you advise against such a
downgrade from 16.2.14?

/Z

On Fri, 20 Oct 2023 at 14:46, Igor Fedotov  wrote:

> Hi Zakhar,
>
> Definitely we expect one more (and apparently the last) Pacific minor
> release. There is no specific date yet though - the plans are to release
> Quincy and Reef minor releases prior to it. Hopefully to be done before the
> Christmas/New Year.
>
> Meanwhile you might want to workaround the issue by tuning
> bluestore_volume_selection_policy. Unfortunately most likely my original
> proposal to set it to rocksdb_original wouldn't work in this case so you
> better try "fit_to_fast" mode. This should be coupled with enabling
> 'level_compaction_dynamic_level_bytes' mode in RocksDB - there is pretty
> good spec on applying this mode to BlueStore attached to
> https://github.com/ceph/ceph/pull/37156.
>
>
> Thanks,
>
> Igor
> On 20/10/2023 06:03, Zakhar Kirpichenko wrote:
>
> Igor, I noticed that there's no roadmap for the next 16.2.x release. May I
> ask what time frame we are looking at with regards to a possible fix?
>
> We're experiencing several OSD crashes caused by this issue per day.
>
> /Z
>
> On Mon, 16 Oct 2023 at 14:19, Igor Fedotov  wrote:
>
>> That's true.
>> On 16/10/2023 14:13, Zakhar Kirpichenko wrote:
>>
>> Many thanks, Igor. I found previously submitted bug reports and
>> subscribed to them. My understanding is that the issue is going to be fixed
>> in the next Pacific minor release.
>>
>> /Z
>>
>> On Mon, 16 Oct 2023 at 14:03, Igor Fedotov  wrote:
>>
>>> Hi Zakhar,
>>>
>>> please see my reply for the post on the similar issue at:
>>>
>>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/
>>>
>>>
>>> Thanks,
>>>
>>> Igor
>>>
>>> On 16/10/2023 09:26, Zakhar Kirpichenko wrote:
>>> > Hi,
>>> >
>>> > After upgrading to Ceph 16.2.14 we had several OSD crashes
>>> > in bstore_kv_sync thread:
>>> >
>>> >
>>> > 1. "assert_thread_name": "bstore_kv_sync",
>>> > 2. "backtrace": [
>>> > 3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
>>> > 4. "gsignal()",
>>> > 5. "abort()",
>>> > 6. "(ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> > const*)+0x1a9) [0x564dc5f87d0b]",
>>> > 7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
>>> > 8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t
>>> > const&)+0x15e) [0x564dc6604a9e]",
>>> > 9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long,
>>> unsigned
>>> > long)+0x77d) [0x564dc66951cd]",
>>> > 10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90)
>>> > [0x564dc6695670]",
>>> > 11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b) [0x564dc66b1a6b]",
>>> > 12. "(BlueRocksWritableFile::Sync()+0x18) [0x564dc66c1768]",
>>> > 13. "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions
>>> > const&, rocksdb::IODebugContext*)+0x1f) [0x564dc6b6496f]",
>>> > 14. "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402)
>>> > [0x564dc6c761c2]",
>>> > 15. "(rocksdb::WritableFileWriter::Sync(bool)+0x88)
>>> [0x564dc6c77808]",
>>> > 16. "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup
>>> > const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned
>>> > long)+0x309) [0x564dc6b780c9]",
>>> > 17. "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&,
>>> > rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*,
>>> unsigned
>>> > long, bool, unsigned long*, unsigned long,

[ceph-users] Re: Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

2023-10-19 Thread Zakhar Kirpichenko
Igor, I noticed that there's no roadmap for the next 16.2.x release. May I
ask what time frame we are looking at with regards to a possible fix?

We're experiencing several OSD crashes caused by this issue per day.

/Z

On Mon, 16 Oct 2023 at 14:19, Igor Fedotov  wrote:

> That's true.
> On 16/10/2023 14:13, Zakhar Kirpichenko wrote:
>
> Many thanks, Igor. I found previously submitted bug reports and subscribed
> to them. My understanding is that the issue is going to be fixed in the
> next Pacific minor release.
>
> /Z
>
> On Mon, 16 Oct 2023 at 14:03, Igor Fedotov  wrote:
>
>> Hi Zakhar,
>>
>> please see my reply for the post on the similar issue at:
>>
>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/
>>
>>
>> Thanks,
>>
>> Igor
>>
>> On 16/10/2023 09:26, Zakhar Kirpichenko wrote:
>> > Hi,
>> >
>> > After upgrading to Ceph 16.2.14 we had several OSD crashes
>> > in bstore_kv_sync thread:
>> >
>> >
>> > 1. "assert_thread_name": "bstore_kv_sync",
>> > 2. "backtrace": [
>> > 3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
>> > 4. "gsignal()",
>> > 5. "abort()",
>> > 6. "(ceph::__ceph_assert_fail(char const*, char const*, int, char
>> > const*)+0x1a9) [0x564dc5f87d0b]",
>> > 7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
>> > 8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t
>> > const&)+0x15e) [0x564dc6604a9e]",
>> > 9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long,
>> unsigned
>> > long)+0x77d) [0x564dc66951cd]",
>> > 10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90)
>> > [0x564dc6695670]",
>> > 11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b) [0x564dc66b1a6b]",
>> > 12. "(BlueRocksWritableFile::Sync()+0x18) [0x564dc66c1768]",
>> > 13. "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions
>> > const&, rocksdb::IODebugContext*)+0x1f) [0x564dc6b6496f]",
>> > 14. "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402)
>> > [0x564dc6c761c2]",
>> > 15. "(rocksdb::WritableFileWriter::Sync(bool)+0x88)
>> [0x564dc6c77808]",
>> > 16. "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup
>> > const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned
>> > long)+0x309) [0x564dc6b780c9]",
>> > 17. "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&,
>> > rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*,
>> unsigned
>> > long, bool, unsigned long*, unsigned long,
>> > rocksdb::PreReleaseCallback*)+0x2629) [0x564dc6b80c69]",
>> > 18. "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&,
>> > rocksdb::WriteBatch*)+0x21) [0x564dc6b80e61]",
>> > 19. "(RocksDBStore::submit_common(rocksdb::WriteOptions&,
>> > std::shared_ptr)+0x84)
>> [0x564dc6b1f644]",
>> > 20.
>> "(RocksDBStore::submit_transaction_sync(std::shared_ptr)+0x9a)
>> > [0x564dc6b2004a]",
>> > 21. "(BlueStore::_kv_sync_thread()+0x30d8) [0x564dc6602ec8]",
>> > 22. "(BlueStore::KVSyncThread::entry()+0x11) [0x564dc662ab61]",
>> > 23. "/lib64/libpthread.so.0(+0x81ca) [0x7ff2f67461ca]",
>> > 24. "clone()"
>> > 25. ],
>> >
>> >
>> > I am attaching two instances of crash info for further reference:
>> > https://pastebin.com/E6myaHNU
>> >
>> > OSD configuration is rather simple and close to default:
>> >
>> > osd.6 dev   bluestore_cache_size_hdd4294967296
>> >osd.6 dev
>> > bluestore_cache_size_ssd4294967296
>> >osd   advanced  debug_rocksdb
>> >1/5
>>  osd
>> >  advanced  osd_max_backfills   2
>> >  osd   basic
>> > osd_memory_target   17179869184
>> >  osd   advanced  osd_recovery_max_active
>> >  2 osd
>> >  advanced  osd_scrub_sleep 0.10
>> >osd   advanced
>> >   rbd_balance_parent_readsfalse
>> >
>> > debug_rocksdb is a recent change, otherwise this configuration has been
>> > running without issues for months. The crashes happened on two different
>> > hosts with identical hardware, the hosts and storage (NVME DB/WAL, HDD
>> > block) don't exhibit any issues. We have not experienced such crashes
>> with
>> > Ceph < 16.2.14.
>> >
>> > Is this a known issue, or should I open a bug report?
>> >
>> > Best regards,
>> > Zakhar
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: pgmap updated every few seconds for no apparent reason

2023-10-19 Thread Zakhar Kirpichenko
Thanks, Eugen. This is a useful setting.
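
For anyone else tuning this, the effect is easy to see in the cluster log
before and after the change. Roughly what this looks like (the 10-second
value is just the example from your message, and the journalctl filter
assumes the cluster log ends up in journald as it does on our hosts):

ceph config set mgr mgr_tick_period 10
journalctl -f | grep 'pgmap v'

After the change the pgmap version bumps should appear roughly every 10
seconds instead of every 2.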

/Z

On Thu, 19 Oct 2023 at 10:43, Eugen Block  wrote:

> Hi,
>
> you can change the report interval with this config option (default 2
> seconds):
>
> $ ceph config get mgr mgr_tick_period
> 2
>
> $ ceph config set mgr mgr_tick_period 10
>
> Regards,
> Eugen
>
> Zitat von Chris Palmer :
>
> > I have just checked 2 quincy 17.2.6 clusters, and I see exactly the
> > same. The pgmap version is bumping every two seconds (which ties in
> > with the frequency you observed). Both clusters are healthy with
> > nothing apart from client IO happening.
> >
> > On 13/10/2023 12:09, Zakhar Kirpichenko wrote:
> >> Hi,
> >>
> >> I am investigating excessive mon writes in our cluster and wondering
> >> whether excessive pgmap updates could be the culprit. Basically pgmap is
> >> updated every few seconds, sometimes over ten times per minute, in a
> >> healthy cluster with no OSD and/or PG changes:
> >>
> >> Oct 13 11:03:03 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:01.515438+
> >> mgr.ceph01.vankui (mgr.336635131) 838252 : cluster [DBG] pgmap v606575:
> >> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >> TiB used, 716 TiB / 777 TiB avail; 60 MiB/s rd, 109 MiB/s wr, 5.65k op/s
> >> Oct 13 11:03:04 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:03.520953+
> >> mgr.ceph01.vankui (mgr.336635131) 838253 : cluster [DBG] pgmap v606576:
> >> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >> TiB used, 716 TiB / 777 TiB avail; 64 MiB/s rd, 128 MiB/s wr, 5.76k op/s
> >> Oct 13 11:03:06 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:05.524474+
> >> mgr.ceph01.vankui (mgr.336635131) 838255 : cluster [DBG] pgmap v606577:
> >> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >> TiB used, 716 TiB / 777 TiB avail; 64 MiB/s rd, 122 MiB/s wr, 5.57k op/s
> >> Oct 13 11:03:08 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:07.530484+
> >> mgr.ceph01.vankui (mgr.336635131) 838256 : cluster [DBG] pgmap v606578:
> >> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >> TiB used, 716 TiB / 777 TiB avail; 79 MiB/s rd, 127 MiB/s wr, 6.62k op/s
> >> Oct 13 11:03:10 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:09.57+
> >> mgr.ceph01.vankui (mgr.336635131) 838258 : cluster [DBG] pgmap v606579:
> >> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >> TiB used, 716 TiB / 777 TiB avail; 66 MiB/s rd, 104 MiB/s wr, 5.38k op/s
> >> Oct 13 11:03:12 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:11.537908+
> >> mgr.ceph01.vankui (mgr.336635131) 838259 : cluster [DBG] pgmap v606580:
> >> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >> TiB used, 716 TiB / 777 TiB avail; 85 MiB/s rd, 121 MiB/s wr, 6.43k op/s
> >> Oct 13 11:03:13 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:13.543490+
> >> mgr.ceph01.vankui (mgr.336635131) 838260 : cluster [DBG] pgmap v606581:
> >> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >> TiB used, 716 TiB / 777 TiB avail; 78 MiB/s rd, 127 MiB/s wr, 6.54k op/s
> >> Oct 13 11:03:16 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:15.547122+
> >> mgr.ceph01.vankui (mgr.336635131) 838262 : cluster [DBG] pgmap v606582:
> >> 2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB
> data, 61
> >> TiB used, 716 TiB / 777 TiB avail; 71 MiB/s rd, 122 MiB/s wr, 6.08k op/s
> >> Oct 13 11:03:18 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:17.553180+
> >> mgr.ceph01.vankui (mgr.336635131) 838263 : cluster [DBG] pgmap v606583:
> >> 2400 pgs: 1 active+clean+scrubbing, 5 active+clean+scrubbing+deep, 2394
> >> active+clean; 16 TiB data, 61 TiB used, 716 TiB / 777 TiB avail; 75
> MiB/s
> >> rd, 176 MiB/s wr, 6.83k op/s
> >> Oct 13 11:03:20 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:19.555960+
> >> mgr.ceph01.vankui (mgr.336635131) 838264 : cluster [DBG] pgmap v606584:
> >> 2400 pgs: 1 active+clean+scrubbing, 5 active+clean+scrubbing+deep, 2394
> >> active+clean; 16 TiB data, 61 TiB used, 716 TiB / 777 TiB avail; 58
> MiB/s
> >> rd, 161 MiB/s wr, 5.55k op/s
> >> Oct 13 11:03:22 ceph03 bash[4019]: cluster
> 2023-10-13T11:03:21.560597+
> >> mgr.ceph01.vankui (mgr.336635131) 838266 : cluster [DBG] pgmap v606585:
> >> 2400 pg

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-18 Thread Zakhar Kirpichenko
Frank,

The only changes in ceph.conf are the compression settings; most of
the cluster configuration is in the monitor database, so my ceph.conf is
rather short:

---
[global]
fsid = xxx
mon_host = [list of mons]

[mon.yyy]
public network = a.b.c.d/e
mon_rocksdb_options =
"write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression"
---

Note that my bottommost_compression choice is LZ4HC, whose compression is
better than LZ4 at the expense of higher CPU usage. My nodes have lots of
CPU to spare, so I went for LZ4HC for better space savings and a lower
amount of writes. In general, I would recommend trying a faster and less
intensive compression first; LZ4 across the board is a good starting choice.
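
As a quick sanity check after restarting a monitor, the effective RocksDB
options are printed in the mon's startup output, so it is easy to confirm
the settings actually took effect. Roughly (the unit name below is how our
cephadm deployment names it; adjust the fsid and hostname):

journalctl -u ceph-<fsid>@mon.<host>.service --since "10 min ago" \
    | grep -iE 'Options.compression|bottommost_compression'

With the settings above you should see kLZ4Compression and
kLZ4HCCompression reported instead of NoCompression.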

/Z

On Wed, 18 Oct 2023 at 12:02, Frank Schilder  wrote:

> Hi Zakhar,
>
> since it's a bit beyond the basic scope, could you please post the
> complete ceph.conf config section for these changes for reference?
>
> Thanks!
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ____________
> From: Zakhar Kirpichenko 
> Sent: Wednesday, October 18, 2023 6:14 AM
> To: Eugen Block
> Cc: ceph-users@ceph.io
> Subject: [ceph-users] Re: Ceph 16.2.x mon compactions, disk writes
>
> Many thanks for this, Eugen! I very much appreciate your and Mykola's
> efforts and insight!
>
> Another thing I noticed was a reduction of the RocksDB store size after the
> reduction of the total PG number by 30%, from 590-600 MB:
>
> 65M 3675511.sst
> 65M 3675512.sst
> 65M 3675513.sst
> 65M 3675514.sst
> 65M 3675515.sst
> 65M 3675516.sst
> 65M 3675517.sst
> 65M 3675518.sst
> 62M 3675519.sst
>
> to about half of the original size:
>
> -rw-r--r-- 1 167 167  7218886 Oct 13 16:16 3056869.log
> -rw-r--r-- 1 167 167 67250650 Oct 13 16:15 3056871.sst
> -rw-r--r-- 1 167 167 67367527 Oct 13 16:15 3056872.sst
> -rw-r--r-- 1 167 167 63268486 Oct 13 16:15 3056873.sst
>
> Then, when I restarted the monitors one by one before adding compression,
> the RocksDB store shrank even further. I am not sure why, or what exactly got
> automatically removed from the store:
>
> -rw-r--r-- 1 167 167   841960 Oct 18 03:31 018779.log
> -rw-r--r-- 1 167 167 67290532 Oct 18 03:31 018781.sst
> -rw-r--r-- 1 167 167 53287626 Oct 18 03:31 018782.sst
>
> Then I have enabled LZ4 and LZ4HC compression in our small production
> cluster (6 nodes, 96 OSDs) on 3 out of 5
> monitors:
> compression=kLZ4Compression,bottommost_compression=kLZ4HCCompression.
> I specifically went for LZ4 and LZ4HC because of the balance between
> compression/decompression speed and impact on CPU usage. The compression
> doesn't seem to affect the cluster in any negative way, the 3 monitors with
> compression are operating normally. The effect of the compression on
> RocksDB store size and disk writes is quite noticeable:
>
> Compression disabled, 155 MB store.db, ~125 MB RocksDB sst, and ~530 MB
> writes over 5 minutes:
>
> -rw-r--r-- 1 167 167  4227337 Oct 18 03:58 3080868.log
> -rw-r--r-- 1 167 167 67253592 Oct 18 03:57 3080870.sst
> -rw-r--r-- 1 167 167 57783180 Oct 18 03:57 3080871.sst
>
> # du -hs
> /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db/;
> iotop -ao -bn 2 -d 300 2>&1 | grep ceph-mon
> 155M
>  /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db/
> 2471602 be/4 167   6.05 M473.24 M  0.00 %  0.16 % ceph-mon -n
> mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
> --default-log-to-stderr=true --default-log-stderr-prefix=debug
>  --default-mon-cluster-log-to-file=false
> --default-mon-cluster-log-to-stderr=true [rocksdb:low0]
> 2471633 be/4 167 188.00 K 40.91 M  0.00 %  0.02 % ceph-mon -n
> mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
> --default-log-to-stderr=true --default-log-stderr-prefix=debug
>  --default-mon-cluster-log-to-file=false
> --default-mon-cluster-log-to-stderr=true [ms_dispatch]
> 2471603 be/4 167  16.00 K 24.16 M  0.00 %  0.01 % ceph-mon -n
> mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
> --default-log-to-stderr=true --default-log-stderr-prefix=debug
>  --default-mon-cluster-log-to-file=false
> --default-mon-cluster-log-to-stderr=true [rocksdb:high0]
>
> Compression enabled, 60 MB store.db, ~23 MB RocksDB sst, and ~130 MB of
> writes over 5 minutes:
>
> -rw-r--r-- 1 167 167  5766659 Oct 18 03:56 3723355.log
> -rw-r--r-- 1 167 167 22240390 Oct 18 03:56 3723357.sst
>
> # du -hs
> /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-17 Thread Zakhar Kirpichenko
should result in lower compression ratios but much faster compression.

I hope this helps. My plan is to keep the monitors with the current
settings, i.e. 3 with compression + 2 without compression, until the next
minor release of Pacific to see whether the monitors with compressed
RocksDB store can be upgraded without issues.
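
In the meantime, comparing the on-disk store size between the compressed
and uncompressed monitors is a simple way to keep an eye on things; this
is essentially what we run on each mon host (the path is from our cephadm
layout, adjust the fsid):

du -sh /var/lib/ceph/<fsid>/mon.*/store.db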

/Z


On Tue, 17 Oct 2023 at 23:45, Eugen Block  wrote:

> Hi Zakhar,
>
> I took a closer look into what the MONs really do (again with Mykola's
> help) and why manual compaction is triggered so frequently. With
> debug_paxos=20 I noticed that paxosservice and paxos triggered manual
> compactions. So I played with these values:
>
> paxos_service_trim_max = 1000 (default 500)
> paxos_service_trim_min = 500 (default 250)
> paxos_trim_max = 1000 (default 500)
> paxos_trim_min = 500 (default 250)
>
> This reduced the amount of writes by a factor of 3 or 4; the iotop
> values fluctuate a bit, of course. As Mykola suggested, I created
> a tracker issue [1] to increase the default values, since they don't
> seem suitable for a production environment. Although I haven't
> tested that in production yet, I'll ask one of our customers to do so
> in their secondary cluster (for rbd mirroring), where they also suffer
> from large mon stores and heavy writes to the mon store. Your findings
> with the compaction were quite helpful as well; we'll test that too.
> Igor mentioned that the default bluestore_rocksdb config for OSDs will
> enable compression because of positive test results. If we can confirm
> that compression works well for MONs too, compression could be enabled
> by default as well.
>
> Regards,
> Eugen
>
> https://tracker.ceph.com/issues/63229
>
> Zitat von Zakhar Kirpichenko :
>
> > With the help of community members, I managed to enable RocksDB
> compression
> > for a test monitor, and it seems to be working well.
> >
> > Monitor w/o compression writes about 750 MB to disk in 5 minutes:
> >
> >4854 be/4 167   4.97 M755.02 M  0.00 %  0.24 % ceph-mon -n
> > mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
> > --default-log-to-stderr=true --default-log-stderr-prefix=debug
> >  --default-mon-cluster-log-to-file=false
> > --default-mon-cluster-log-to-stderr=true [rocksdb:low0]
> >
> > Monitor with LZ4 compression writes about 1/4 of that over the same time
> > period:
> >
> > 2034728 be/4 167 172.00 K199.27 M  0.00 %  0.06 % ceph-mon -n
> > mon.ceph05 -f --setuser ceph --setgroup ceph --default-log-to-file=false
> > --default-log-to-stderr=true --default-log-stderr-prefix=debug
> >  --default-mon-cluster-log-to-file=false
> > --default-mon-cluster-log-to-stderr=true [rocksdb:low0]
> >
> > This is caused by the apparent difference in store.db sizes.
> >
> > Mon store.db w/o compression:
> >
> > # ls -al
> > /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db
> > total 257196
> > drwxr-xr-x 2 167 167 4096 Oct 16 14:00 .
> > drwx-- 3 167 167 4096 Aug 31 05:22 ..
> > -rw-r--r-- 1 167 167  1517623 Oct 16 14:00 3073035.log
> > -rw-r--r-- 1 167 167 67285944 Oct 16 14:00 3073037.sst
> > -rw-r--r-- 1 167 167 67402325 Oct 16 14:00 3073038.sst
> > -rw-r--r-- 1 167 167 62364991 Oct 16 14:00 3073039.sst
> >
> > Mon store.db with compression:
> >
> > # ls -al
> > /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph05/store.db
> > total 91188
> > drwxr-xr-x 2 167 167 4096 Oct 16 14:00 .
> > drwx-- 3 167 167 4096 Oct 16 13:35 ..
> > -rw-r--r-- 1 167 167  1760114 Oct 16 14:00 012693.log
> > -rw-r--r-- 1 167 167 52236087 Oct 16 14:00 012695.sst
> >
> > There are no apparent downsides thus far. If everything works well, I
> will
> > try adding compression to other monitors.
> >
> > /Z
> >
> > On Mon, 16 Oct 2023 at 14:57, Zakhar Kirpichenko 
> wrote:
> >
> >> The issue persists, although to a lesser extent. Any comments from the
> >> Ceph team please?
> >>
> >> /Z
> >>
> >> On Fri, 13 Oct 2023 at 20:51, Zakhar Kirpichenko 
> wrote:
> >>
> >>> > Some of it is transferable to RocksDB on mons nonetheless.
> >>>
> >>> Please point me to relevant Ceph documentation, i.e. a description of
> how
> >>> various Ceph monitor and RocksDB tunables affect the operations of
> >>> monitors, I'll gladly look into it.
> >>>
> >>> > Please point me to such recommendations, if they're on docs.ceph.com
> I'll
> >>> get them updated.
> >>>
&g

[ceph-users] Re: Ceph 16.2.14: how to set mon_rocksdb_options to enable RocksDB compression?

2023-10-17 Thread Zakhar Kirpichenko
Thanks for this, Eugen. I think I'll stick to adding the option to the
config file; it seems like a safer way to do it.
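
For reference, the config-file route boils down to a stanza like the one
below in the mon's local ceph.conf, followed by a restart of that monitor
(mon name and options taken from my earlier messages; adjust to your
environment):

[mon.ceph05]
mon_rocksdb_options = "write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression"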

/Z

On Tue, 17 Oct 2023, 15:21 Eugen Block,  wrote:

> Hi,
>
> I managed to get the compression setting into the MONs by using the
> extra-entrypoint-arguments [1]:
>
> ceph01:~ # cat mon-specs.yaml
> service_type: mon
> placement:
>hosts:
>- ceph01
>- ceph02
>- ceph03
> extra_entrypoint_args:
>-
>
> '--mon-rocksdb-options=write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression,max_background_jobs=4,max_subcompactions=2'
>
> Just note that if you make a mistake and run 'ceph orch apply -i
> mon-specs.yaml' with incorrect options your MON containers will all
> fail. So test that in a non-critical environment first. In case the
> daemons fail to start you can remove those options from the unit.run
> file and get them up again.
> But for me that worked and the daemons have the compression setting
> enabled now. What remains unclear is which config options can be
> changed as usual with the config database and which require this
> extra-entrypoint-argument.
>
> Thanks again, Mykola!
> Eugen
>
> [1]
>
> https://docs.ceph.com/en/quincy/cephadm/services/#extra-entrypoint-arguments
>
> Zitat von Zakhar Kirpichenko :
>
> > Thanks for the suggestion, Josh!
> >
> >  That part is relatively simple: the container gets ceph.conf from the
> > host's filesystem, for example:
> >
> > "HostConfig": {
> > "Binds": [
> > "/dev:/dev",
> > "/run/udev:/run/udev",
> >
> > "/var/run/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86:/var/run/ceph:z",
> >
> > "/var/log/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86:/var/log/ceph:z",
> >
> >
> "/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/crash:/var/lib/ceph/crash:z",
> >
> >
> "/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph05:/var/lib/ceph/mon/ceph-ceph05:z",
> >
> >
> "/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph05/config:/etc/ceph/ceph.conf:z"
> > ],
> >
> > When I stop the monitor, edit the file directly and restart the monitor,
> > mon_rocksdb_options seem to be applied correctly!
> >
> > Unfortunately, I specify global mon_rocksdb_options and redeploy the
> > monitor, the new ceph.conf doesn't have mon_rocksdb_options at all. I am
> > not sure that this is a reliable way to enable compression, but it works
> -
> > so it's better than other ways which don't work :-)
> >
> > /Z
> >
> > On Mon, 16 Oct 2023 at 16:16, Josh Baergen 
> > wrote:
> >
> >> > the resulting ceph.conf inside the monitor container doesn't have
> >> mon_rocksdb_options
> >>
> >> I don't know where this particular ceph.conf copy comes from, but I
> >> still suspect that this is where this particular option needs to be
> >> set. The reason I think this is that rocksdb mount options are needed
> >> _before_ the mon is able to access any of the centralized conf data,
> >> which I believe is itself stored in rocksdb.
> >>
> >> Josh
> >>
> >> On Sun, Oct 15, 2023 at 10:29 PM Zakhar Kirpichenko 
> >> wrote:
> >> >
> >> > Out of curiosity, I tried setting mon_rocksdb_options via ceph.conf.
> >> This didn't work either: ceph.conf gets overridden at monitor start, the
> >> resulting ceph.conf inside the monitor container doesn't have
> >> mon_rocksdb_options, the monitor starts with no RocksDB compression.
> >> >
> >> > I would appreciate it if someone from the Ceph team could please chip
> in
> >> and suggest a working way to enable RocksDB compression in Ceph
> monitors.
> >> >
> >> > /Z
> >> >
> >> > On Sat, 14 Oct 2023 at 19:16, Zakhar Kirpichenko 
> >> wrote:
> >> >>
> >> >> Thanks for your response, Josh. Our ceph.conf doesn't have anything
> but
> >> the mon addresses, modern Ceph versions store their configuration in the
> >> monitor configuration database.
> >> >>
> >> >> This works rather well for various Ceph components, including the
> >> monitors. RocksDB options are also applied to monitors correctly, but
> for
> >> some reason are being ignored.
> >> >>
> >> >> /Z
> >> >>
> &

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-16 Thread Zakhar Kirpichenko
With the help of community members, I managed to enable RocksDB compression
for a test monitor, and it seems to be working well.

Monitor w/o compression writes about 750 MB to disk in 5 minutes:

   4854 be/4 167   4.97 M755.02 M  0.00 %  0.24 % ceph-mon -n
mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:low0]

Monitor with LZ4 compression writes about 1/4 of that over the same time
period:

2034728 be/4 167 172.00 K199.27 M  0.00 %  0.06 % ceph-mon -n
mon.ceph05 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:low0]
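
For completeness, the write figures above come from watching the mon
process with iotop over a fixed window, the same way as in my earlier
message; roughly (the 300-second window is just what we happened to use):

iotop -ao -bn 2 -d 300 2>&1 | grep ceph-mon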

This is caused by the apparent difference in store.db sizes.

Mon store.db w/o compression:

# ls -al
/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db
total 257196
drwxr-xr-x 2 167 167 4096 Oct 16 14:00 .
drwx-- 3 167 167 4096 Aug 31 05:22 ..
-rw-r--r-- 1 167 167  1517623 Oct 16 14:00 3073035.log
-rw-r--r-- 1 167 167 67285944 Oct 16 14:00 3073037.sst
-rw-r--r-- 1 167 167 67402325 Oct 16 14:00 3073038.sst
-rw-r--r-- 1 167 167 62364991 Oct 16 14:00 3073039.sst

Mon store.db with compression:

# ls -al
/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph05/store.db
total 91188
drwxr-xr-x 2 167 167 4096 Oct 16 14:00 .
drwx-- 3 167 167 4096 Oct 16 13:35 ..
-rw-r--r-- 1 167 167  1760114 Oct 16 14:00 012693.log
-rw-r--r-- 1 167 167 52236087 Oct 16 14:00 012695.sst

There are no apparent downsides thus far. If everything works well, I will
try adding compression to other monitors.

/Z

On Mon, 16 Oct 2023 at 14:57, Zakhar Kirpichenko  wrote:

> The issue persists, although to a lesser extent. Any comments from the
> Ceph team please?
>
> /Z
>
> On Fri, 13 Oct 2023 at 20:51, Zakhar Kirpichenko  wrote:
>
>> > Some of it is transferable to RocksDB on mons nonetheless.
>>
>> Please point me to relevant Ceph documentation, i.e. a description of how
>> various Ceph monitor and RocksDB tunables affect the operations of
>> monitors, I'll gladly look into it.
>>
>> > Please point me to such recommendations, if they're on docs.ceph.com I'll
>> get them updated.
>>
>> These are the recommendations we used when we built our Pacific cluster:
>> https://docs.ceph.com/en/pacific/start/hardware-recommendations/
>>
>> Our drives are 4x times larger than recommended by this guide. The drives
>> are rated for < 0.5 DWPD, which is more than sufficient for boot drives and
>> storage of rarely modified files. It is not documented or suggested
>> anywhere that monitor processes write several hundred gigabytes of data per
>> day, exceeding the amount of data written by OSDs. Which is why I am not
>> convinced that what we're observing is expected behavior, but it's not easy
>> to get a definitive answer from the Ceph community.
>>
>> /Z
>>
>> On Fri, 13 Oct 2023 at 20:35, Anthony D'Atri 
>> wrote:
>>
>>> Some of it is transferable to RocksDB on mons nonetheless.
>>>
>>> but their specs exceed Ceph hardware recommendations by a good margin
>>>
>>>
>>> Please point me to such recommendations, if they're on docs.ceph.com I'll
>>> get them updated.
>>>
>>> On Oct 13, 2023, at 13:34, Zakhar Kirpichenko  wrote:
>>>
>>> Thank you, Anthony. As I explained to you earlier, the article you had
>>> sent is about RocksDB tuning for Bluestore OSDs, while the issue at hand is
>>> not with OSDs but rather monitors and their RocksDB store. Indeed, the
>>> drives are not enterprise-grade, but their specs exceed Ceph hardware
>>> recommendations by a good margin, they're being used as boot drives only
>>> and aren't supposed to be written to continuously at high rates - which is
>>> what unfortunately is happening. I am trying to determine why it is
>>> happening and how the issue can be alleviated or resolved, unfortunately
>>> monitor RocksDB usage and tunables appear to be not documented at all.
>>>
>>> /Z
>>>
>>> On Fri, 13 Oct 2023 at 20:11, Anthony D'Atri 
>>> wrote:
>>>
>>>> cf. Mark's article I sent you re RocksDB tuning.  I suspect that with
>>>> Reef you would experience fewer writes.  Universal compaction might also
>>>> help, but in the end this SSD is a client SKU and really not suited for
>>>> enterprise use.  If you had the 1TB SKU you'd get much longer life, or you
>>>> could change the overprovisioning on the ones you have.
>>>>
>>>> On Oct 13, 2023, at 12:30, Zakhar Kirpichenko  wrote:
>>>>
>>>> I would very much appreciate it if someone with a better understanding
>>>> of
>>>> monitor internals and use of RocksDB could please chip in.
>>>>
>>>>
>>>>
>>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: how to set mon_rocksdb_options to enable RocksDB compression?

2023-10-16 Thread Zakhar Kirpichenko
Thanks for the suggestion, Josh!

 That part is relatively simple: the container gets ceph.conf from the
host's filesystem, for example:

"HostConfig": {
"Binds": [
"/dev:/dev",
"/run/udev:/run/udev",

"/var/run/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86:/var/run/ceph:z",

"/var/log/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86:/var/log/ceph:z",

"/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/crash:/var/lib/ceph/crash:z",

"/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph05:/var/lib/ceph/mon/ceph-ceph05:z",

"/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph05/config:/etc/ceph/ceph.conf:z"
],

When I stop the monitor, edit the file directly and restart the monitor,
mon_rocksdb_options seem to be applied correctly!

Unfortunately, if I specify mon_rocksdb_options globally and redeploy the
monitor, the new ceph.conf doesn't have mon_rocksdb_options at all. I am
not sure that editing the file by hand is a reliable way to enable
compression, but it works - so it's better than the other ways, which don't
work :-)
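
Sketched out, the manual procedure looks roughly like this on the mon host
(paths and unit name are from our cephadm deployment, so the fsid and mon
name need adjusting; this is simply what worked for me, not an officially
supported method):

systemctl stop ceph-<fsid>@mon.ceph05.service
# append the option to the mon's local config, which is bind-mounted
# into the container as /etc/ceph/ceph.conf
cat >> /var/lib/ceph/<fsid>/mon.ceph05/config <<'EOF'
[mon.ceph05]
mon_rocksdb_options = "write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression"
EOF
systemctl start ceph-<fsid>@mon.ceph05.service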

/Z

On Mon, 16 Oct 2023 at 16:16, Josh Baergen 
wrote:

> > the resulting ceph.conf inside the monitor container doesn't have
> mon_rocksdb_options
>
> I don't know where this particular ceph.conf copy comes from, but I
> still suspect that this is where this particular option needs to be
> set. The reason I think this is that rocksdb mount options are needed
> _before_ the mon is able to access any of the centralized conf data,
> which I believe is itself stored in rocksdb.
>
> Josh
>
> On Sun, Oct 15, 2023 at 10:29 PM Zakhar Kirpichenko 
> wrote:
> >
> > Out of curiosity, I tried setting mon_rocksdb_options via ceph.conf.
> This didn't work either: ceph.conf gets overridden at monitor start, the
> resulting ceph.conf inside the monitor container doesn't have
> mon_rocksdb_options, the monitor starts with no RocksDB compression.
> >
> > I would appreciate it if someone from the Ceph team could please chip in
> and suggest a working way to enable RocksDB compression in Ceph monitors.
> >
> > /Z
> >
> > On Sat, 14 Oct 2023 at 19:16, Zakhar Kirpichenko 
> wrote:
> >>
> >> Thanks for your response, Josh. Our ceph.conf doesn't have anything but
> the mon addresses, modern Ceph versions store their configuration in the
> monitor configuration database.
> >>
> >> This works rather well for various Ceph components, including the
> monitors. RocksDB options are also applied to monitors correctly, but for
> some reason are being ignored.
> >>
> >> /Z
> >>
> >> On Sat, 14 Oct 2023, 17:40 Josh Baergen, 
> wrote:
> >>>
> >>> Apologies if you tried this already and I missed it - have you tried
> >>> configuring that setting in /etc/ceph/ceph.conf (or wherever your conf
> >>> file is) instead of via 'ceph config'? I wonder if mon settings like
> >>> this one won't actually apply the way you want because they're needed
> >>> before the mon has the ability to obtain configuration from,
> >>> effectively, itself.
> >>>
> >>> Josh
> >>>
> >>> On Sat, Oct 14, 2023 at 1:32 AM Zakhar Kirpichenko 
> wrote:
> >>> >
> >>> > I also tried setting RocksDB compression options and deploying a new
> >>> > monitor. The monitor started with no RocksDB compression again.
> >>> >
> >>> > Ceph monitors seem to ignore mon_rocksdb_options set at runtime, at
> mon
> >>> > start and at mon deploy. How can I enable RocksDB compression in Ceph
> >>> > monitors?
> >>> >
> >>> > Any input from anyone, please?
> >>> >
> >>> > /Z
> >>> >
> >>> > On Fri, 13 Oct 2023 at 23:01, Zakhar Kirpichenko 
> wrote:
> >>> >
> >>> > > Hi,
> >>> > >
> >>> > > I'm still trying to fight large Ceph monitor writes. One option I
> >>> > > considered is enabling RocksDB compression, as our nodes have more
> than
> >>> > > sufficient RAM and CPU. Unfortunately, monitors seem to completely
> ignore
> >>> > > the compression setting:
> >>> > >
> >>> > > I tried:
> >>> > >
> >>> > > - setting ceph config set mon.ceph05 mon_rocksdb_options
> >>> > >
> "write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true",
> >>> > > res

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-16 Thread Zakhar Kirpichenko
The issue persists, although to a lesser extent. Any comments from the Ceph
team please?

/Z

On Fri, 13 Oct 2023 at 20:51, Zakhar Kirpichenko  wrote:

> > Some of it is transferable to RocksDB on mons nonetheless.
>
> Please point me to relevant Ceph documentation, i.e. a description of how
> various Ceph monitor and RocksDB tunables affect the operations of
> monitors, I'll gladly look into it.
>
> > Please point me to such recommendations, if they're on docs.ceph.com I'll
> get them updated.
>
> These are the recommendations we used when we built our Pacific cluster:
> https://docs.ceph.com/en/pacific/start/hardware-recommendations/
>
> Our drives are 4x times larger than recommended by this guide. The drives
> are rated for < 0.5 DWPD, which is more than sufficient for boot drives and
> storage of rarely modified files. It is not documented or suggested
> anywhere that monitor processes write several hundred gigabytes of data per
> day, exceeding the amount of data written by OSDs. Which is why I am not
> convinced that what we're observing is expected behavior, but it's not easy
> to get a definitive answer from the Ceph community.
>
> /Z
>
> On Fri, 13 Oct 2023 at 20:35, Anthony D'Atri 
> wrote:
>
>> Some of it is transferable to RocksDB on mons nonetheless.
>>
>> but their specs exceed Ceph hardware recommendations by a good margin
>>
>>
>> Please point me to such recommendations, if they're on docs.ceph.com I'll
>> get them updated.
>>
>> On Oct 13, 2023, at 13:34, Zakhar Kirpichenko  wrote:
>>
>> Thank you, Anthony. As I explained to you earlier, the article you had
>> sent is about RocksDB tuning for Bluestore OSDs, while the issue at hand is
>> not with OSDs but rather monitors and their RocksDB store. Indeed, the
>> drives are not enterprise-grade, but their specs exceed Ceph hardware
>> recommendations by a good margin, they're being used as boot drives only
>> and aren't supposed to be written to continuously at high rates - which is
>> what unfortunately is happening. I am trying to determine why it is
>> happening and how the issue can be alleviated or resolved, unfortunately
>> monitor RocksDB usage and tunables appear to be not documented at all.
>>
>> /Z
>>
>> On Fri, 13 Oct 2023 at 20:11, Anthony D'Atri 
>> wrote:
>>
>>> cf. Mark's article I sent you re RocksDB tuning.  I suspect that with
>>> Reef you would experience fewer writes.  Universal compaction might also
>>> help, but in the end this SSD is a client SKU and really not suited for
>>> enterprise use.  If you had the 1TB SKU you'd get much longer life, or you
>>> could change the overprovisioning on the ones you have.
>>>
>>> On Oct 13, 2023, at 12:30, Zakhar Kirpichenko  wrote:
>>>
>>> I would very much appreciate it if someone with a better understanding of
>>> monitor internals and use of RocksDB could please chip in.
>>>
>>>
>>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

2023-10-16 Thread Zakhar Kirpichenko
Many thanks, Igor. I found previously submitted bug reports and subscribed
to them. My understanding is that the issue is going to be fixed in the
next Pacific minor release.

/Z

On Mon, 16 Oct 2023 at 14:03, Igor Fedotov  wrote:

> Hi Zakhar,
>
> please see my reply for the post on the similar issue at:
>
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/
>
>
> Thanks,
>
> Igor
>
> On 16/10/2023 09:26, Zakhar Kirpichenko wrote:
> > Hi,
> >
> > After upgrading to Ceph 16.2.14 we had several OSD crashes
> > in bstore_kv_sync thread:
> >
> >
> > 1. "assert_thread_name": "bstore_kv_sync",
> > 2. "backtrace": [
> > 3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
> > 4. "gsignal()",
> > 5. "abort()",
> > 6. "(ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x1a9) [0x564dc5f87d0b]",
> > 7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
> > 8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t
> > const&)+0x15e) [0x564dc6604a9e]",
> > 9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long,
> unsigned
> > long)+0x77d) [0x564dc66951cd]",
> > 10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90)
> > [0x564dc6695670]",
> > 11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b) [0x564dc66b1a6b]",
> > 12. "(BlueRocksWritableFile::Sync()+0x18) [0x564dc66c1768]",
> > 13. "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions
> > const&, rocksdb::IODebugContext*)+0x1f) [0x564dc6b6496f]",
> > 14. "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402)
> > [0x564dc6c761c2]",
> > 15. "(rocksdb::WritableFileWriter::Sync(bool)+0x88)
> [0x564dc6c77808]",
> > 16. "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup
> > const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned
> > long)+0x309) [0x564dc6b780c9]",
> > 17. "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&,
> > rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*,
> unsigned
> > long, bool, unsigned long*, unsigned long,
> > rocksdb::PreReleaseCallback*)+0x2629) [0x564dc6b80c69]",
> > 18. "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&,
> > rocksdb::WriteBatch*)+0x21) [0x564dc6b80e61]",
> > 19. "(RocksDBStore::submit_common(rocksdb::WriteOptions&,
> > std::shared_ptr)+0x84)
> [0x564dc6b1f644]",
> > 20.
> "(RocksDBStore::submit_transaction_sync(std::shared_ptr)+0x9a)
> > [0x564dc6b2004a]",
> > 21. "(BlueStore::_kv_sync_thread()+0x30d8) [0x564dc6602ec8]",
> > 22. "(BlueStore::KVSyncThread::entry()+0x11) [0x564dc662ab61]",
> > 23. "/lib64/libpthread.so.0(+0x81ca) [0x7ff2f67461ca]",
> > 24. "clone()"
> > 25. ],
> >
> >
> > I am attaching two instances of crash info for further reference:
> > https://pastebin.com/E6myaHNU
> >
> > OSD configuration is rather simple and close to default:
> >
> > osd.6 dev   bluestore_cache_size_hdd4294967296
> >osd.6 dev
> > bluestore_cache_size_ssd4294967296
> >osd   advanced  debug_rocksdb
> >1/5
>  osd
> >  advanced  osd_max_backfills   2
> >  osd   basic
> > osd_memory_target   17179869184
> >  osd   advanced  osd_recovery_max_active
> >  2 osd
> >  advanced  osd_scrub_sleep 0.10
> >osd   advanced
> >   rbd_balance_parent_readsfalse
> >
> > debug_rocksdb is a recent change, otherwise this configuration has been
> > running without issues for months. The crashes happened on two different
> > hosts with identical hardware, the hosts and storage (NVME DB/WAL, HDD
> > block) don't exhibit any issues. We have not experienced such crashes
> with
> > Ceph < 16.2.14.
> >
> > Is this a known issue, or should I open a bug report?
> >
> > Best regards,
> > Zakhar
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

2023-10-16 Thread Zakhar Kirpichenko
Unfortunately, the OSD log from the earlier crash is not available. I have
extracted the OSD log, including the recent events, from the latest crash:
https://www.dropbox.com/scl/fi/1ne8h85iuc5vx78qm1t93/20231016_osd6.zip?rlkey=fxyn242q7c69ec5lkv29csx13=0
I hope this helps to identify the crash reason.

The log entries that I find suspicious are these right before the crash:

debug  -1726> 2023-10-15T22:31:21.575+ 7f961ccb8700  5 prioritycache
tune_memory target: 17179869184 mapped: 17024319488 unmapped: 4164763648
heap: 21189083136 old mem: 13797582406 new mem: 13797582406
...
debug  -1723> 2023-10-15T22:31:22.579+ 7f961ccb8700  5 prioritycache
tune_memory target: 17179869184 mapped: 17024589824 unmapped: 4164493312
heap: 21189083136 old mem: 13797582406 new mem: 13797582406
...
debug  -1718> 2023-10-15T22:31:23.579+ 7f961ccb8700  5 prioritycache
tune_memory target: 17179869184 mapped: 17027031040 unmapped: 4162052096
heap: 21189083136 old mem: 13797582406 new mem: 13797582406
...
debug  -1714> 2023-10-15T22:31:24.579+ 7f961ccb8700  5 prioritycache
tune_memory target: 17179869184 mapped: 17026301952 unmapped: 4162781184
heap: 21189083136 old mem: 13797582406 new mem: 13797582406
debug  -1713> 2023-10-15T22:31:25.383+ 7f961ccb8700  5
bluestore.MempoolThread(0x55c5bee8cb98) _resize_shards cache_size:
13797582406 kv_alloc: 8321499136 kv_used: 8245313424 kv_onode_alloc:
4697620480 kv_onode_used: 4690617424 meta_alloc: 469762048 meta_used:
371122625 data_alloc: 134217728 data_used: 44314624
...
debug  -1710> 2023-10-15T22:31:25.583+ 7f961ccb8700  5 prioritycache
tune_memory target: 17179869184 mapped: 17026367488 unmapped: 4162715648
heap: 21189083136 old mem: 13797582406 new mem: 13797582406
...
debug  -1707> 2023-10-15T22:31:26.583+ 7f961ccb8700  5 prioritycache
tune_memory target: 17179869184 mapped: 17026211840 unmapped: 4162871296
heap: 21189083136 old mem: 13797582406 new mem: 13797582406
...
debug  -1704> 2023-10-15T22:31:27.583+ 7f961ccb8700  5 prioritycache
tune_memory target: 17179869184 mapped: 17024548864 unmapped: 4164534272
heap: 21189083136 old mem: 13797582406 new mem: 13797582406

There's plenty of RAM in the system, about 120 GB free and used for cache.
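
In case it is useful for comparison with the prioritycache numbers above,
the heap and mempool figures can also be pulled straight from the daemon;
this is roughly what we looked at (osd.6 as the example, and the second
command needs the admin socket, i.e. it has to be run on the OSD host or
inside the container):

ceph tell osd.6 heap stats
ceph daemon osd.6 dump_mempools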

/Z

On Mon, 16 Oct 2023 at 09:26, Zakhar Kirpichenko  wrote:

> Hi,
>
> After upgrading to Ceph 16.2.14 we had several OSD crashes
> in bstore_kv_sync thread:
>
>
>1. "assert_thread_name": "bstore_kv_sync",
>2. "backtrace": [
>3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
>4. "gsignal()",
>5. "abort()",
>6. "(ceph::__ceph_assert_fail(char const*, char const*, int, char
>const*)+0x1a9) [0x564dc5f87d0b]",
>7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
>8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t
>const&)+0x15e) [0x564dc6604a9e]",
>9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long,
>unsigned long)+0x77d) [0x564dc66951cd]",
>10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90)
>[0x564dc6695670]",
>11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b) [0x564dc66b1a6b]",
>12. "(BlueRocksWritableFile::Sync()+0x18) [0x564dc66c1768]",
>13. "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions
>const&, rocksdb::IODebugContext*)+0x1f) [0x564dc6b6496f]",
>14. "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402)
>[0x564dc6c761c2]",
>15. "(rocksdb::WritableFileWriter::Sync(bool)+0x88) [0x564dc6c77808]",
>16. "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup
>const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned
>long)+0x309) [0x564dc6b780c9]",
>17. "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&,
>rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned
>long, bool, unsigned long*, unsigned long,
>rocksdb::PreReleaseCallback*)+0x2629) [0x564dc6b80c69]",
>18. "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&,
>rocksdb::WriteBatch*)+0x21) [0x564dc6b80e61]",
>19. "(RocksDBStore::submit_common(rocksdb::WriteOptions&,
>std::shared_ptr)+0x84) [0x564dc6b1f644]",
>20. 
> "(RocksDBStore::submit_transaction_sync(std::shared_ptr)+0x9a)
>[0x564dc6b2004a]",
>21. "(BlueStore::_kv_sync_thread()+0x30d8) [0x564dc6602ec8]",
>22. "(BlueStore::KVSyncThread::entry()+0x11) [0x564dc662ab61]",
>23. "/lib64/libpthread.so.0(+0x81ca) [0x7ff2f67461ca]",
>24. "clone()"
>25. ],
>
>
> I am attaching two instances of crash info for furth

[ceph-users] Re: Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

2023-10-16 Thread Zakhar Kirpichenko
Not sure how it managed to screw up the formatting; here is the OSD
configuration in a more readable form: https://pastebin.com/mrC6UdzN
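
For what it's worth, the listing is just the OSD-related part of the
cluster configuration; something along the lines of the command below
should reproduce it without the mangled whitespace:

ceph config dump | grep osd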

/Z

On Mon, 16 Oct 2023 at 09:26, Zakhar Kirpichenko  wrote:

> Hi,
>
> After upgrading to Ceph 16.2.14 we had several OSD crashes
> in bstore_kv_sync thread:
>
>
>1. "assert_thread_name": "bstore_kv_sync",
>2. "backtrace": [
>3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
>4. "gsignal()",
>5. "abort()",
>6. "(ceph::__ceph_assert_fail(char const*, char const*, int, char
>const*)+0x1a9) [0x564dc5f87d0b]",
>7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
>8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t
>const&)+0x15e) [0x564dc6604a9e]",
>9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long,
>unsigned long)+0x77d) [0x564dc66951cd]",
>10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90)
>[0x564dc6695670]",
>11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b) [0x564dc66b1a6b]",
>12. "(BlueRocksWritableFile::Sync()+0x18) [0x564dc66c1768]",
>13. "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions
>const&, rocksdb::IODebugContext*)+0x1f) [0x564dc6b6496f]",
>14. "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402)
>[0x564dc6c761c2]",
>15. "(rocksdb::WritableFileWriter::Sync(bool)+0x88) [0x564dc6c77808]",
>16. "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup
>const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned
>long)+0x309) [0x564dc6b780c9]",
>17. "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&,
>rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned
>long, bool, unsigned long*, unsigned long,
>rocksdb::PreReleaseCallback*)+0x2629) [0x564dc6b80c69]",
>18. "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&,
>rocksdb::WriteBatch*)+0x21) [0x564dc6b80e61]",
>19. "(RocksDBStore::submit_common(rocksdb::WriteOptions&,
>std::shared_ptr)+0x84) [0x564dc6b1f644]",
>20. 
> "(RocksDBStore::submit_transaction_sync(std::shared_ptr)+0x9a)
>[0x564dc6b2004a]",
>21. "(BlueStore::_kv_sync_thread()+0x30d8) [0x564dc6602ec8]",
>22. "(BlueStore::KVSyncThread::entry()+0x11) [0x564dc662ab61]",
>23. "/lib64/libpthread.so.0(+0x81ca) [0x7ff2f67461ca]",
>24. "clone()"
>25. ],
>
>
> I am attaching two instances of crash info for further reference:
> https://pastebin.com/E6myaHNU
>
> OSD configuration is rather simple and close to default:
>
> osd.6 dev   bluestore_cache_size_hdd4294967296
>   osd.6 dev
> bluestore_cache_size_ssd4294967296
>   osd   advanced  debug_rocksdb
>   1/5 osd
> advanced  osd_max_backfills   2
> osd   basic
> osd_memory_target   17179869184
> osd   advanced  osd_recovery_max_active
> 2 osd
> advanced  osd_scrub_sleep 0.10
>   osd   advanced
>  rbd_balance_parent_readsfalse
>
> debug_rocksdb is a recent change, otherwise this configuration has been
> running without issues for months. The crashes happened on two different
> hosts with identical hardware, the hosts and storage (NVME DB/WAL, HDD
> block) don't exhibit any issues. We have not experienced such crashes with
> Ceph < 16.2.14.
>
> Is this a known issue, or should I open a bug report?
>
> Best regards,
> Zakhar
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

2023-10-16 Thread Zakhar Kirpichenko
Hi,

After upgrading to Ceph 16.2.14 we have had several OSD crashes
in the bstore_kv_sync thread:


   1. "assert_thread_name": "bstore_kv_sync",
   2. "backtrace": [
   3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
   4. "gsignal()",
   5. "abort()",
   6. "(ceph::__ceph_assert_fail(char const*, char const*, int, char
   const*)+0x1a9) [0x564dc5f87d0b]",
   7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
   8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t
   const&)+0x15e) [0x564dc6604a9e]",
   9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned
   long)+0x77d) [0x564dc66951cd]",
   10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90)
   [0x564dc6695670]",
   11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b) [0x564dc66b1a6b]",
   12. "(BlueRocksWritableFile::Sync()+0x18) [0x564dc66c1768]",
   13. "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions
   const&, rocksdb::IODebugContext*)+0x1f) [0x564dc6b6496f]",
   14. "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402)
   [0x564dc6c761c2]",
   15. "(rocksdb::WritableFileWriter::Sync(bool)+0x88) [0x564dc6c77808]",
   16. "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup
   const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned
   long)+0x309) [0x564dc6b780c9]",
   17. "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&,
   rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned
   long, bool, unsigned long*, unsigned long,
   rocksdb::PreReleaseCallback*)+0x2629) [0x564dc6b80c69]",
   18. "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&,
   rocksdb::WriteBatch*)+0x21) [0x564dc6b80e61]",
   19. "(RocksDBStore::submit_common(rocksdb::WriteOptions&,
   std::shared_ptr)+0x84) [0x564dc6b1f644]",
   20. 
"(RocksDBStore::submit_transaction_sync(std::shared_ptr)+0x9a)
   [0x564dc6b2004a]",
   21. "(BlueStore::_kv_sync_thread()+0x30d8) [0x564dc6602ec8]",
   22. "(BlueStore::KVSyncThread::entry()+0x11) [0x564dc662ab61]",
   23. "/lib64/libpthread.so.0(+0x81ca) [0x7ff2f67461ca]",
   24. "clone()"
   25. ],


I am attaching two instances of crash info for further reference:
https://pastebin.com/E6myaHNU
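
For reference, dumps like these can be pulled from the built-in crash
module (the crash ID below is a placeholder for one of the listed IDs):

ceph crash ls
ceph crash info <crash-id>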

OSD configuration is rather simple and close to default:

osd.6  dev       bluestore_cache_size_hdd     4294967296
osd.6  dev       bluestore_cache_size_ssd     4294967296
osd    advanced  debug_rocksdb                1/5
osd    advanced  osd_max_backfills            2
osd    basic     osd_memory_target            17179869184
osd    advanced  osd_recovery_max_active      2
osd    advanced  osd_scrub_sleep              0.10
osd    advanced  rbd_balance_parent_reads     false

debug_rocksdb is a recent change; otherwise this configuration has been
running without issues for months. The crashes happened on two different
hosts with identical hardware, and the hosts and storage (NVMe DB/WAL, HDD
block) don't exhibit any issues. We have not experienced such crashes with
Ceph < 16.2.14.

Is this a known issue, or should I open a bug report?

Best regards,
Zakhar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: how to set mon_rocksdb_options to enable RocksDB compression?

2023-10-15 Thread Zakhar Kirpichenko
Out of curiosity, I tried setting mon_rocksdb_options via ceph.conf. This
didn't work either: ceph.conf gets overridden at monitor start, the
resulting ceph.conf inside the monitor container doesn't have
mon_rocksdb_options, and the monitor starts with no RocksDB compression.
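
The quickest way to see the override is to compare the config database
with the file the container actually gets (paths are from our cephadm
setup; adjust the fsid and mon name):

ceph config get mon.ceph05 mon_rocksdb_options
cat /var/lib/ceph/<fsid>/mon.ceph05/config

The first returns the options we set, while the regenerated config file
has no mon_rocksdb_options line at all.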

I would appreciate it if someone from the Ceph team could please chip in
and suggest a working way to enable RocksDB compression in Ceph monitors.

/Z

On Sat, 14 Oct 2023 at 19:16, Zakhar Kirpichenko  wrote:

> Thanks for your response, Josh. Our ceph.conf doesn't have anything but
> the mon addresses, modern Ceph versions store their configuration in the
> monitor configuration database.
>
> This works rather well for various Ceph components, including the
> monitors. RocksDB options are also applied to monitors correctly, but for
> some reason are being ignored.
>
> /Z
>
> On Sat, 14 Oct 2023, 17:40 Josh Baergen, 
> wrote:
>
>> Apologies if you tried this already and I missed it - have you tried
>> configuring that setting in /etc/ceph/ceph.conf (or wherever your conf
>> file is) instead of via 'ceph config'? I wonder if mon settings like
>> this one won't actually apply the way you want because they're needed
>> before the mon has the ability to obtain configuration from,
>> effectively, itself.
>>
>> Josh
>>
>> On Sat, Oct 14, 2023 at 1:32 AM Zakhar Kirpichenko 
>> wrote:
>> >
>> > I also tried setting RocksDB compression options and deploying a new
>> > monitor. The monitor started with no RocksDB compression again.
>> >
>> > Ceph monitors seem to ignore mon_rocksdb_options set at runtime, at mon
>> > start and at mon deploy. How can I enable RocksDB compression in Ceph
>> > monitors?
>> >
>> > Any input from anyone, please?
>> >
>> > /Z
>> >
>> > On Fri, 13 Oct 2023 at 23:01, Zakhar Kirpichenko 
>> wrote:
>> >
>> > > Hi,
>> > >
>> > > I'm still trying to fight large Ceph monitor writes. One option I
>> > > considered is enabling RocksDB compression, as our nodes have more
>> than
>> > > sufficient RAM and CPU. Unfortunately, monitors seem to completely
>> ignore
>> > > the compression setting:
>> > >
>> > > I tried:
>> > >
>> > > - setting ceph config set mon.ceph05 mon_rocksdb_options
>> > >
>> "write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true",
>> > > restarting the test monitor. The monitor started with no RocksDB
>> > > compression:
>> > >
>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>> Compression
>> > > algorithms supported:
>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>> > > kZSTDNotFinalCompression supported: 0
>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>> > > kXpressCompression supported: 0
>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>> > > kLZ4HCCompression supported: 1
>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>> > > kLZ4Compression supported: 1
>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>> > > kBZip2Compression supported: 0
>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>> > > kZlibCompression supported: 1
>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>> > > kSnappyCompression supported: 1
>> > > ...
>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>> > >  Options.compression: NoCompression
>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>> > >Options.bottommost_compression: Disabled
>> > >
>> > > - setting ceph config set mon mon_rocksdb_options
>> > >
>> "write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true",
>> > > restarting the test monitor. The monitor started with no RocksDB
>> > > compression, the same way as above.
>> > >
>> > > In each case config options were correctly set and readable with
>> config
>> > > get. I also found a suggestion in ceph-users (
>> > >
>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/KJM232IHN7FKYI5LODUREN7SVO45BL42/
>> )
>> > > to set compression in a similar manner. Unfortunately, these options
>> appear
>> > > to be ignored.
>> > >
>> > > How can I enable RocksDB compression in Ceph monitors?
>> > >
>> > > I would very much appreciate your advices and comments.
>> > >
>> > > Best regards,
>> > > Zakhar
>> > >
>> > >
>> > >
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: how to set mon_rocksdb_options to enable RocksDB compression?

2023-10-14 Thread Zakhar Kirpichenko
Thanks for your response, Josh. Our ceph.conf doesn't have anything but the
mon addresses; modern Ceph versions store their configuration in the
monitor configuration database.

This works rather well for various Ceph components, including the monitors.
The RocksDB options are set for the monitors correctly, but for some reason
they are being ignored.

/Z

On Sat, 14 Oct 2023, 17:40 Josh Baergen,  wrote:

> Apologies if you tried this already and I missed it - have you tried
> configuring that setting in /etc/ceph/ceph.conf (or wherever your conf
> file is) instead of via 'ceph config'? I wonder if mon settings like
> this one won't actually apply the way you want because they're needed
> before the mon has the ability to obtain configuration from,
> effectively, itself.
>
> Josh
>
> On Sat, Oct 14, 2023 at 1:32 AM Zakhar Kirpichenko 
> wrote:
> >
> > I also tried setting RocksDB compression options and deploying a new
> > monitor. The monitor started with no RocksDB compression again.
> >
> > Ceph monitors seem to ignore mon_rocksdb_options set at runtime, at mon
> > start and at mon deploy. How can I enable RocksDB compression in Ceph
> > monitors?
> >
> > Any input from anyone, please?
> >
> > /Z
> >
> > On Fri, 13 Oct 2023 at 23:01, Zakhar Kirpichenko 
> wrote:
> >
> > > Hi,
> > >
> > > I'm still trying to fight large Ceph monitor writes. One option I
> > > considered is enabling RocksDB compression, as our nodes have more than
> > > sufficient RAM and CPU. Unfortunately, monitors seem to completely
> ignore
> > > the compression setting:
> > >
> > > I tried:
> > >
> > > - setting ceph config set mon.ceph05 mon_rocksdb_options
> > >
> "write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true",
> > > restarting the test monitor. The monitor started with no RocksDB
> > > compression:
> > >
> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb: Compression
> > > algorithms supported:
> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> > > kZSTDNotFinalCompression supported: 0
> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> > > kXpressCompression supported: 0
> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> > > kLZ4HCCompression supported: 1
> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> > > kLZ4Compression supported: 1
> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> > > kBZip2Compression supported: 0
> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> > > kZlibCompression supported: 1
> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> > > kSnappyCompression supported: 1
> > > ...
> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> > >  Options.compression: NoCompression
> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> > >Options.bottommost_compression: Disabled
> > >
> > > - setting ceph config set mon mon_rocksdb_options
> > >
> "write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true",
> > > restarting the test monitor. The monitor started with no RocksDB
> > > compression, the same way as above.
> > >
> > > In each case config options were correctly set and readable with config
> > > get. I also found a suggestion in ceph-users (
> > >
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/KJM232IHN7FKYI5LODUREN7SVO45BL42/
> )
> > > to set compression in a similar manner. Unfortunately, these options
> appear
> > > to be ignored.
> > >
> > > How can I enable RocksDB compression in Ceph monitors?
> > >
> > > I would very much appreciate your advices and comments.
> > >
> > > Best regards,
> > > Zakhar
> > >
> > >
> > >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: how to set mon_rocksdb_options to enable RocksDB compression?

2023-10-14 Thread Zakhar Kirpichenko
I also tried setting RocksDB compression options and deploying a new
monitor. The monitor started with no RocksDB compression again.

Ceph monitors seem to ignore mon_rocksdb_options set at runtime, at mon
start and at mon deploy. How can I enable RocksDB compression in Ceph
monitors?

Any input from anyone, please?

/Z

On Fri, 13 Oct 2023 at 23:01, Zakhar Kirpichenko  wrote:

> Hi,
>
> I'm still trying to fight large Ceph monitor writes. One option I
> considered is enabling RocksDB compression, as our nodes have more than
> sufficient RAM and CPU. Unfortunately, monitors seem to completely ignore
> the compression setting:
>
> I tried:
>
> - setting ceph config set mon.ceph05 mon_rocksdb_options
> "write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true",
> restarting the test monitor. The monitor started with no RocksDB
> compression:
>
> debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb: Compression
> algorithms supported:
> debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> kZSTDNotFinalCompression supported: 0
> debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> kXpressCompression supported: 0
> debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> kLZ4HCCompression supported: 1
> debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> kLZ4Compression supported: 1
> debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> kBZip2Compression supported: 0
> debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> kZlibCompression supported: 1
> debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> kSnappyCompression supported: 1
> ...
> debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>  Options.compression: NoCompression
> debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>Options.bottommost_compression: Disabled
>
> - setting ceph config set mon mon_rocksdb_options
> "write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true",
> restarting the test monitor. The monitor started with no RocksDB
> compression, the same way as above.
>
> In each case config options were correctly set and readable with config
> get. I also found a suggestion in ceph-users (
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/KJM232IHN7FKYI5LODUREN7SVO45BL42/)
> to set compression in a similar manner. Unfortunately, these options appear
> to be ignored.
>
> How can I enable RocksDB compression in Ceph monitors?
>
> I would very much appreciate your advice and comments.
>
> Best regards,
> Zakhar
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph 16.2.14: how to set mon_rocksdb_options to enable RocksDB compression?

2023-10-13 Thread Zakhar Kirpichenko
Hi,

I'm still trying to fight large Ceph monitor writes. One option I
considered is enabling RocksDB compression, as our nodes have more than
sufficient RAM and CPU. Unfortunately, monitors seem to completely ignore
the compression setting:

I tried:

- setting ceph config set mon.ceph05 mon_rocksdb_options
"write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true",
restarting the test monitor. The monitor started with no RocksDB
compression:

debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb: Compression
algorithms supported:
debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
kZSTDNotFinalCompression supported: 0
debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
kXpressCompression supported: 0
debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
kLZ4HCCompression supported: 1
debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
kLZ4Compression supported: 1
debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
kBZip2Compression supported: 0
debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
kZlibCompression supported: 1
debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
kSnappyCompression supported: 1
...
debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
 Options.compression: NoCompression
debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
 Options.bottommost_compression: Disabled

- setting ceph config set mon mon_rocksdb_options
"write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true",
restarting the test monitor. The monitor started with no RocksDB
compression, the same way as above.

In each case config options were correctly set and readable with config
get. I also found a suggestion in ceph-users (
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/KJM232IHN7FKYI5LODUREN7SVO45BL42/)
to set compression in a similar manner. Unfortunately, these options appear
to be ignored.
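
For reference, this is roughly the sequence I use to check whether the option
actually reaches the daemon (a sketch only; it assumes a cephadm/docker
deployment, that mon.ceph05 is the test monitor, and the container ID is a
placeholder):

ceph config get mon.ceph05 mon_rocksdb_options        # confirm the value is stored
ceph orch daemon restart mon.ceph05                   # restart so the mon re-reads its RocksDB options
docker logs <mon-container-id> 2>&1 | grep -E 'Options\.(bottommost_)?compression'   # effective options after start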

How can I enable RocksDB compression in Ceph monitors?

I would very much appreciate your advice and comments.

Best regards,
Zakhar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-13 Thread Zakhar Kirpichenko
> Some of it is transferable to RocksDB on mons nonetheless.

Please point me to relevant Ceph documentation, i.e. a description of how
various Ceph monitor and RocksDB tunables affect the operations of
monitors, I'll gladly look into it.

> Please point me to such recommendations, if they're on docs.ceph.com I'll
get them updated.

These are the recommendations we used when we built our Pacific cluster:
https://docs.ceph.com/en/pacific/start/hardware-recommendations/

Our drives are 4x larger than recommended by this guide. The drives
are rated for < 0.5 DWPD, which is more than sufficient for boot drives and
storage of rarely modified files. It is not documented or suggested
anywhere that monitor processes write several hundred gigabytes of data per
day, exceeding the amount of data written by OSDs. Which is why I am not
convinced that what we're observing is expected behavior, but it's not easy
to get a definitive answer from the Ceph community.

/Z

On Fri, 13 Oct 2023 at 20:35, Anthony D'Atri 
wrote:

> Some of it is transferable to RocksDB on mons nonetheless.
>
> but their specs exceed Ceph hardware recommendations by a good margin
>
>
> Please point me to such recommendations, if they're on docs.ceph.com I'll
> get them updated.
>
> On Oct 13, 2023, at 13:34, Zakhar Kirpichenko  wrote:
>
> Thank you, Anthony. As I explained to you earlier, the article you had
> sent is about RocksDB tuning for Bluestore OSDs, while the issue at hand is
> not with OSDs but rather monitors and their RocksDB store. Indeed, the
> drives are not enterprise-grade, but their specs exceed Ceph hardware
> recommendations by a good margin, they're being used as boot drives only
> and aren't supposed to be written to continuously at high rates - which is
> what unfortunately is happening. I am trying to determine why it is
> happening and how the issue can be alleviated or resolved, unfortunately
> monitor RocksDB usage and tunables appear to be not documented at all.
>
> /Z
>
> On Fri, 13 Oct 2023 at 20:11, Anthony D'Atri 
> wrote:
>
>> cf. Mark's article I sent you re RocksDB tuning.  I suspect that with
>> Reef you would experience fewer writes.  Universal compaction might also
>> help, but in the end this SSD is a client SKU and really not suited for
>> enterprise use.  If you had the 1TB SKU you'd get much longer life, or you
>> could change the overprovisioning on the ones you have.
>>
>> On Oct 13, 2023, at 12:30, Zakhar Kirpichenko  wrote:
>>
>> I would very much appreciate it if someone with a better understanding of
>> monitor internals and use of RocksDB could please chip in.
>>
>>
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-13 Thread Zakhar Kirpichenko
Thank you, Anthony. As I explained to you earlier, the article you had sent
is about RocksDB tuning for Bluestore OSDs, while the issue at hand is not
with OSDs but rather monitors and their RocksDB store. Indeed, the drives
are not enterprise-grade, but their specs exceed Ceph hardware
recommendations by a good margin, they're being used as boot drives only
and aren't supposed to be written to continuously at high rates - which is
what unfortunately is happening. I am trying to determine why it is
happening and how the issue can be alleviated or resolved, unfortunately
monitor RocksDB usage and tunables appear to be not documented at all.

/Z

On Fri, 13 Oct 2023 at 20:11, Anthony D'Atri 
wrote:

> cf. Mark's article I sent you re RocksDB tuning.  I suspect that with Reef
> you would experience fewer writes.  Universal compaction might also help,
> but in the end this SSD is a client SKU and really not suited for
> enterprise use.  If you had the 1TB SKU you'd get much longer life, or you
> could change the overprovisioning on the ones you have.
>
> On Oct 13, 2023, at 12:30, Zakhar Kirpichenko  wrote:
>
> I would very much appreciate it if someone with a better understanding of
> monitor internals and use of RocksDB could please chip in.
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-13 Thread Zakhar Kirpichenko
An interesting find:

I reduced the number of PGs for some of the less utilized pools, which
brought the total number of PGs in the cluster down from 2400 to 1664. The
cluster is healthy and the only change was the 30% reduction of PGs, but mons
now have a much smaller store.db, far fewer "manual compaction" events, and
write significantly less data.

Store.db is about 1/2 smaller, ~260 MB in 3 .sst files compared to 590-600
MB in 9 .sst files before the PG changes:

total 258012
drwxr-xr-x 2 167 167 4096 Oct 13 16:15 .
drwx-- 3 167 167 4096 Aug 31 05:22 ..
-rw-r--r-- 1 167 167  7218886 Oct 13 16:16 3056869.log
-rw-r--r-- 1 167 167 67250650 Oct 13 16:15 3056871.sst
-rw-r--r-- 1 167 167 67367527 Oct 13 16:15 3056872.sst
-rw-r--r-- 1 167 167 63268486 Oct 13 16:15 3056873.sst
-rw-r--r-- 1 167 167   17 Sep 18 11:53 CURRENT
-rw-r--r-- 1 167 167   37 Nov  3  2021 IDENTITY
-rw-r--r-- 1 167 167        0 Nov  3  2021 LOCK
-rw-r--r-- 1 167 167 27039408 Oct 13 16:15 MANIFEST-2785821
-rw-r--r-- 1 167 167 5287 Sep  1 04:39 OPTIONS-2710412
-rw-r--r-- 1 167 167 5287 Sep 18 11:53 OPTIONS-2785824

"Manual compaction" events now run half as often compared to before the
change.

Before the change, compaction events per hour:

# docker logs a4615a23b4c6 2>&1| grep -i 2023-10-13T10 | grep -ci "manual
compaction from"
88

After the change, compaction events per hour:

# docker logs a4615a23b4c6 2>&1| grep -i 2023-10-13T15 | grep -ci "manual
compaction from"
45

I ran several iotop measurements, mons consistently write 550-750 MB to
disk every 5 minutes compared to 1.5-2.5 GB every 5 min before the changes:

   4919 be/4 167       7.29 M    754.04 M  0.00 %  0.17 % ceph-mon -n
mon.ceph03 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:low0]
   4919 be/4 167       8.12 M    554.53 M  0.00 %  0.12 % ceph-mon -n
mon.ceph03 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:low0]
   4919 be/4 167     532.00 K    750.40 M  0.00 %  0.16 % ceph-mon -n
mon.ceph03 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:low0]

It is a significant reduction of store.db and associated disk writes.
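
For completeness, the PG reduction itself was just a per-pool pg_num change; a
minimal sketch of the kind of commands involved (pool name and target value
are placeholders, and switching the autoscaler off first is my assumption to
keep it from reverting the change):

ceph osd pool get <pool> pg_num                      # current value
ceph osd pool set <pool> pg_autoscale_mode off       # assumption: keep the autoscaler from undoing the change
ceph osd pool set <pool> pg_num <new-lower-value>    # PG merging proceeds in the background
ceph -s                                              # wait until all PGs are active+clean again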

I would very much appreciate it if someone with a better understanding of
monitor internals and use of RocksDB could please chip in.

/Z

On Wed, 11 Oct 2023 at 19:00, Zakhar Kirpichenko  wrote:

> Thank you, Frank. This confirms that monitors indeed do this.
>
> Our boot drives in 3 systems are smaller 1 DWPD drives (RAID1 to protect
> against a random single drive failure), and over 3 years mons have eaten
> through 60% of their endurance. Other systems have larger boot drives and
> 2% of their endurance were used up over 1.5 years.
>
> It would still be good to get an understanding why monitors do this, and
> whether there is any way to reduce the amount of writes. Unfortunately,
> Ceph documentation in this regard is severely lacking.
>
> I'm copying this to ceph-docs, perhaps someone will find it useful and
> adjust the hardware recommendations.
>
> /Z
>
> On Wed, 11 Oct 2023, 18:23 Frank Schilder,  wrote:
>
>> Oh wow! I never bothered looking, because on our hardware the wear is so
>> low:
>>
>> # iotop -ao -bn 2 -d 300
>> Total DISK READ :   0.00 B/s | Total DISK WRITE :   6.46 M/s
>> Actual DISK READ:   0.00 B/s | Actual DISK WRITE:   6.47 M/s
>> TID  PRIO  USER DISK READ  DISK WRITE  SWAPIN  IOCOMMAND
>>2230 be/4 ceph  0.00 B   1818.71 M  0.00 %  0.46 % ceph-mon
>> --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01
>> --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65
>> [rocksdb:low0]
>>2256 be/4 ceph  0.00 B 19.27 M  0.00 %  0.43 % ceph-mon
>> --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01
>> --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65
>> [safe_timer]
>>2250 be/4 ceph  0.00 B 42.38 M  0.00 %  0.26 % ceph-mon
>> --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01
>> --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65
>> [fn_monstore]
>>2231 be/4 ceph  0.00 B 58.36 M  0.00 %  0.01 % ceph-mon
>> --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01

[ceph-users] Re: Please help collecting stats of Ceph monitor disk writes

2023-10-13 Thread Zakhar Kirpichenko
Thank you, Frank.

Tbh, I think it doesn't matter if the number of manual compactions is for
24h or for a smaller period, as long as it's over a reasonable period of
time, so that an average number of compactions per hour can be calculated.

/Z

On Fri, 13 Oct 2023 at 16:01, Frank Schilder  wrote:

> Hi Zakhar,
>
> I'm pretty sure you wanted the #manual compactions for an entire day, not
> from whenever the log starts to current time, which is most often not
> 23:59. You need to get the date from the previous day and make sure the log
> contains a full 00:00-23:59 window.
>
> 1) iotop results:
> TID  PRIO  USER DISK READ  DISK WRITE  SWAPIN  IOCOMMAND
>2256 be/4 ceph  0.00 B 17.48 M  0.00 %  0.80 % ceph-mon
> --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01
> --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65
> [safe_timer]
>2230 be/4 ceph  0.00 B   1514.19 M  0.00 %  0.37 % ceph-mon
> --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01
> --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65
> [rocksdb:low0]
>2250 be/4 ceph  0.00 B 36.23 M  0.00 %  0.15 % ceph-mon
> --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01
> --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65
> [fn_monstore]
>2231 be/4 ceph  0.00 B 50.52 M  0.00 %  0.02 % ceph-mon
> --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01
> --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65
> [rocksdb:high0]
>2225 be/4 ceph  0.00 B    120.00 K  0.00 %  0.00 % ceph-mon
> --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01
> --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65 [log]
>
> 2) manual compactions (over a full 24h window): 1882
>
> 3) monitor store.db size: 616M
>
> 4) cluster version and status:
>
> ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus
> (stable)
>
>   cluster:
> id: xxx
> health: HEALTH_WARN
> 1 large omap objects
>
>   services:
> mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 7w)
> mgr: ceph-25(active, since 4w), standbys: ceph-26, ceph-01, ceph-03,
> ceph-02
> mds: con-fs2:8 4 up:standby 8 up:active
> osd: 1284 osds: 1282 up (since 27h), 1282 in (since 3w)
>
>   task status:
>
>   data:
> pools:   14 pools, 25065 pgs
> objects: 2.18G objects, 3.9 PiB
> usage:   4.8 PiB used, 8.3 PiB / 13 PiB avail
> pgs: 25037 active+clean
>  26    active+clean+scrubbing+deep
>  2     active+clean+scrubbing
>
>   io:
> client:   1.7 GiB/s rd, 1013 MiB/s wr, 3.02k op/s rd, 1.78k op/s wr
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph 16.2.14: pgmap updated every few seconds for no apparent reason

2023-10-13 Thread Zakhar Kirpichenko
Hi,

I am investigating excessive mon writes in our cluster and wondering
whether excessive pgmap updates could be the culprit. Basically pgmap is
updated every few seconds, sometimes over ten times per minute, in a
healthy cluster with no OSD and/or PG changes:

Oct 13 11:03:03 ceph03 bash[4019]: cluster 2023-10-13T11:03:01.515438+
mgr.ceph01.vankui (mgr.336635131) 838252 : cluster [DBG] pgmap v606575:
2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB data, 61
TiB used, 716 TiB / 777 TiB avail; 60 MiB/s rd, 109 MiB/s wr, 5.65k op/s
Oct 13 11:03:04 ceph03 bash[4019]: cluster 2023-10-13T11:03:03.520953+
mgr.ceph01.vankui (mgr.336635131) 838253 : cluster [DBG] pgmap v606576:
2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB data, 61
TiB used, 716 TiB / 777 TiB avail; 64 MiB/s rd, 128 MiB/s wr, 5.76k op/s
Oct 13 11:03:06 ceph03 bash[4019]: cluster 2023-10-13T11:03:05.524474+
mgr.ceph01.vankui (mgr.336635131) 838255 : cluster [DBG] pgmap v606577:
2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB data, 61
TiB used, 716 TiB / 777 TiB avail; 64 MiB/s rd, 122 MiB/s wr, 5.57k op/s
Oct 13 11:03:08 ceph03 bash[4019]: cluster 2023-10-13T11:03:07.530484+
mgr.ceph01.vankui (mgr.336635131) 838256 : cluster [DBG] pgmap v606578:
2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB data, 61
TiB used, 716 TiB / 777 TiB avail; 79 MiB/s rd, 127 MiB/s wr, 6.62k op/s
Oct 13 11:03:10 ceph03 bash[4019]: cluster 2023-10-13T11:03:09.57+
mgr.ceph01.vankui (mgr.336635131) 838258 : cluster [DBG] pgmap v606579:
2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB data, 61
TiB used, 716 TiB / 777 TiB avail; 66 MiB/s rd, 104 MiB/s wr, 5.38k op/s
Oct 13 11:03:12 ceph03 bash[4019]: cluster 2023-10-13T11:03:11.537908+
mgr.ceph01.vankui (mgr.336635131) 838259 : cluster [DBG] pgmap v606580:
2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB data, 61
TiB used, 716 TiB / 777 TiB avail; 85 MiB/s rd, 121 MiB/s wr, 6.43k op/s
Oct 13 11:03:13 ceph03 bash[4019]: cluster 2023-10-13T11:03:13.543490+
mgr.ceph01.vankui (mgr.336635131) 838260 : cluster [DBG] pgmap v606581:
2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB data, 61
TiB used, 716 TiB / 777 TiB avail; 78 MiB/s rd, 127 MiB/s wr, 6.54k op/s
Oct 13 11:03:16 ceph03 bash[4019]: cluster 2023-10-13T11:03:15.547122+
mgr.ceph01.vankui (mgr.336635131) 838262 : cluster [DBG] pgmap v606582:
2400 pgs: 5 active+clean+scrubbing+deep, 2395 active+clean; 16 TiB data, 61
TiB used, 716 TiB / 777 TiB avail; 71 MiB/s rd, 122 MiB/s wr, 6.08k op/s
Oct 13 11:03:18 ceph03 bash[4019]: cluster 2023-10-13T11:03:17.553180+
mgr.ceph01.vankui (mgr.336635131) 838263 : cluster [DBG] pgmap v606583:
2400 pgs: 1 active+clean+scrubbing, 5 active+clean+scrubbing+deep, 2394
active+clean; 16 TiB data, 61 TiB used, 716 TiB / 777 TiB avail; 75 MiB/s
rd, 176 MiB/s wr, 6.83k op/s
Oct 13 11:03:20 ceph03 bash[4019]: cluster 2023-10-13T11:03:19.555960+
mgr.ceph01.vankui (mgr.336635131) 838264 : cluster [DBG] pgmap v606584:
2400 pgs: 1 active+clean+scrubbing, 5 active+clean+scrubbing+deep, 2394
active+clean; 16 TiB data, 61 TiB used, 716 TiB / 777 TiB avail; 58 MiB/s
rd, 161 MiB/s wr, 5.55k op/s
Oct 13 11:03:22 ceph03 bash[4019]: cluster 2023-10-13T11:03:21.560597+
mgr.ceph01.vankui (mgr.336635131) 838266 : cluster [DBG] pgmap v606585:
2400 pgs: 1 active+clean+scrubbing, 5 active+clean+scrubbing+deep, 2394
active+clean; 16 TiB data, 61 TiB used, 716 TiB / 777 TiB avail; 62 MiB/s
rd, 221 MiB/s wr, 6.19k op/s
Oct 13 11:03:24 ceph03 bash[4019]: cluster 2023-10-13T11:03:23.565974+
mgr.ceph01.vankui (mgr.336635131) 838267 : cluster [DBG] pgmap v606586:
2400 pgs: 1 active+clean+scrubbing, 5 active+clean+scrubbing+deep, 2394
active+clean; 16 TiB data, 61 TiB used, 716 TiB / 777 TiB avail; 50 MiB/s
rd, 246 MiB/s wr, 5.93k op/s
Oct 13 11:03:26 ceph03 bash[4019]: cluster 2023-10-13T11:03:25.569471+
mgr.ceph01.vankui (mgr.336635131) 838269 : cluster [DBG] pgmap v606587:
2400 pgs: 1 active+clean+scrubbing, 5 active+clean+scrubbing+deep, 2394
active+clean; 16 TiB data, 61 TiB used, 716 TiB / 777 TiB avail; 41 MiB/s
rd, 240 MiB/s wr, 4.99k op/s
Oct 13 11:03:28 ceph03 bash[4019]: cluster 2023-10-13T11:03:27.575618+
mgr.ceph01.vankui (mgr.336635131) 838270 : cluster [DBG] pgmap v606588:
2400 pgs: 4 active+clean+scrubbing+deep, 2396 active+clean; 16 TiB data, 61
TiB used, 716 TiB / 777 TiB avail; 44 MiB/s rd, 259 MiB/s wr, 5.38k op/s
Oct 13 11:03:30 ceph03 bash[4019]: cluster 2023-10-13T11:03:29.578262+
mgr.ceph01.vankui (mgr.336635131) 838271 : cluster [DBG] pgmap v606589:
2400 pgs: 4 active+clean+scrubbing+deep, 2396 active+clean; 16 TiB data, 61
TiB used, 716 TiB / 777 TiB avail; 31 MiB/s rd, 195 MiB/s wr, 4.06k op/s
Oct 13 11:03:32 ceph03 bash[4019]: cluster 2023-10-13T11:03:31.582849+
mgr.ceph01.vankui (mgr.336635131) 838272 : cluster [DBG] pgmap v606590:
2400 pgs: 4 
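
To put a rough number on how often pgmap versions are committed, this is the
kind of check I run on a mon host (a sketch; it assumes the cluster log ends
up in the journal, as in the excerpt above):

journalctl --since "10 minutes ago" | grep -c "pgmap v"    # total pgmap updates in the last 10 minutes
# divide by 10 for an updates-per-minute figure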

[ceph-users] Please help collecting stats of Ceph monitor disk writes

2023-10-13 Thread Zakhar Kirpichenko
Hi!

Further to my thread "Ceph 16.2.x mon compactions, disk writes" (
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/XGCI2LFW5RH3GUOQFJ542ISCSZH3FRX2/)
where we have established that Ceph monitors indeed write considerable
amounts of data to disks, I would like to request fellow Ceph users to
provide feedback and help gather some statistics regarding whether this
happens on all clusters or on some specific subset of clusters.

The procedure is rather simple and won't take much of your time.

If you are willing to help, please follow this procedure:

-

1. Install iotop and run the following command on any of your monitor nodes:

iotop -ao -bn 2 -d 300 2>&1 | grep -E "TID|ceph-mon"

This will collect a 5-minute disk I/O statistics and produce an output
containing the stats for Ceph monitor threads running on the node:

TID  PRIO  USER DISK READ  DISK WRITE  SWAPIN  IOCOMMAND
TID  PRIO  USER DISK READ  DISK WRITE  SWAPIN  IOCOMMAND
   4854 be/4 167   8.62 M  2.27 G  0.00 %  0.72 % ceph-mon -n
mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:low0]
   4919 be/4 167   0.00 B 39.43 M  0.00 %  0.02 % ceph-mon -n
mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [ms_dispatch]
   4855 be/4 167   8.00 K 19.55 M  0.00 %  0.00 % ceph-mon -n
mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:high0]

We're particularly interested in the amount of written data.

-

2. Optional: collect the number of "manual compaction" events from the
monitor.

This step will depend on how your monitor runs. My cluster is managed by
cephadm and monitors run in docker containers, thus I can do something like
this, where MYMONCONTAINERID is the container ID of Ceph monitor:

# date; d=$(date +'%Y-%m-%d'); docker logs MYMONCONTAINERID 2>&1 | grep $d
| grep -ci "manual compaction from"
Fri 13 Oct 2023 06:29:39 AM UTC
580

Alternatively, I could run the command against the log file MYMONLOGFILE,
whose location I obtained with docker inspect:

# date; d=$(date +'%Y-%m-%d'); grep $d MYMONLOGFILE | grep -ci "manual
compaction from"
Fri 13 Oct 2023 06:35:27 AM UTC
588

If you run monitors with podman or without containerization, please get
this information the way that is most convenient in your setup.
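
For example, something along these lines should give an equivalent count on
journald-based setups (a sketch; the unit name assumes a cephadm-style
ceph-<fsid>@mon.<name> service, and <fsid>/<name> are placeholders):

d=$(date +%F)
journalctl -u ceph-<fsid>@mon.<name> --since "$d" | grep -ci "manual compaction from"

# or, via cephadm regardless of the container runtime:
cephadm logs --name mon.<name> | grep "$d" | grep -ci "manual compaction from"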

-

3. Optional: collect the monitor store.db size.

Usually the monitor store.db is available at
/var/lib/ceph/FSID/mon.NAME/store.db/, for example:

# du -hs
/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db/
642M
 /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db/

-

4. Optional: collect Ceph cluster version and status.

For example:

root@ceph01:/# ceph version; ceph -s
ceph version 16.2.14 (238ba602515df21ea7ffc75c88db29f9e5ef12c9) pacific
(stable)
  cluster:
id: 3f50555a-ae2a-11eb-a2fc-ffde44714d86
health: HEALTH_OK

  services:
mon: 5 daemons, quorum ceph01,ceph03,ceph04,ceph05,ceph02 (age 2w)
mgr: ceph01.vankui(active, since 13d), standbys: ceph02.shsinf
osd: 96 osds: 96 up (since 2w), 95 in (since 3w)

  data:
pools:   10 pools, 2400 pgs
objects: 6.30M objects, 16 TiB
usage:   61 TiB used, 716 TiB / 777 TiB avail
pgs: 2396 active+clean
     3    active+clean+scrubbing+deep
     1    active+clean+scrubbing

  io:
client:   71 MiB/s rd, 60 MiB/s wr, 2.94k op/s rd, 2.56k op/s wr

-

5. Reply to this thread and submit the collected information.

For example:

1) iotop results:
... Paste data obtained in step 1)

2) manual compactions:
... Paste data obtained in step 2), or put "N/A"

3) monitor store.db size:
... Paste data obtained in step 3), or put "N/A"

4) cluster version and status:
... Paste data obtained in step 4), or put "N/A"

-

I would very much appreciate your effort and help with gathering these
stats. Please don't hesitate to contact me with any questions or concerns.
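
To make the collection a bit easier, here is a rough all-in-one helper
covering items 1, 3 and 4 (a sketch only; it assumes a cephadm layout, FSID
and MON are placeholders, and step 2 still needs whatever log access your
setup provides):

#!/bin/bash
# sketch: gather mon disk-write stats for this thread (not battle-tested)
FSID=<your-cluster-fsid>
MON=<your-mon-name>

echo "== 1) 5-minute iotop sample =="
iotop -ao -bn 2 -d 300 2>&1 | grep -E "TID|ceph-mon"

echo "== 3) monitor store.db size =="
du -hs /var/lib/ceph/$FSID/mon.$MON/store.db/

echo "== 4) cluster version and status =="
ceph version
ceph -s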

Best regards,

Zakhar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Zakhar Kirpichenko
Thank you, Frank. This confirms that monitors indeed do this.

Our boot drives in 3 systems are smaller 1 DWPD drives (RAID1 to protect
against a random single drive failure), and over 3 years mons have eaten
through 60% of their endurance. Other systems have larger boot drives and
2% of their endurance were used up over 1.5 years.
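
For anyone wanting to compare their own drives, a sketch of how such wear
figures can be read from SMART data (this is my assumption about where to
look; attribute names vary by vendor, and the device paths are placeholders):

smartctl -a /dev/sda | grep -iE "wear|percent|written"                   # SATA SSD
smartctl -a /dev/nvme0 | grep -iE "percentage used|data units written"   # NVMe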

It would still be good to get an understanding why monitors do this, and
whether there is any way to reduce the amount of writes. Unfortunately,
Ceph documentation in this regard is severely lacking.

I'm copying this to ceph-docs, perhaps someone will find it useful and
adjust the hardware recommendations.

/Z

On Wed, 11 Oct 2023, 18:23 Frank Schilder,  wrote:

> Oh wow! I never bothered looking, because on our hardware the wear is so
> low:
>
> # iotop -ao -bn 2 -d 300
> Total DISK READ :   0.00 B/s | Total DISK WRITE :   6.46 M/s
> Actual DISK READ:   0.00 B/s | Actual DISK WRITE:   6.47 M/s
> TID  PRIO  USER DISK READ  DISK WRITE  SWAPIN  IOCOMMAND
>2230 be/4 ceph  0.00 B   1818.71 M  0.00 %  0.46 % ceph-mon
> --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01
> --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65
> [rocksdb:low0]
>2256 be/4 ceph  0.00 B 19.27 M  0.00 %  0.43 % ceph-mon
> --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01
> --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65
> [safe_timer]
>2250 be/4 ceph  0.00 B 42.38 M  0.00 %  0.26 % ceph-mon
> --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01
> --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65
> [fn_monstore]
>2231 be/4 ceph  0.00 B 58.36 M  0.00 %  0.01 % ceph-mon
> --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01
> --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65
> [rocksdb:high0]
> 644 be/3 root  0.00 B    576.00 K  0.00 %  0.00 % [jbd2/sda3-8]
>2225 be/4 ceph  0.00 B    128.00 K  0.00 %  0.00 % ceph-mon
> --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01
> --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65 [log]
> 1637141 be/4 root  0.00 B  0.00 B  0.00 %  0.00 %
> [kworker/u113:2-flush-8:0]
> 1636453 be/4 root  0.00 B  0.00 B  0.00 %  0.00 %
> [kworker/u112:0-ceph0]
>1560 be/4 root  0.00 B 20.00 K  0.00 %  0.00 % rsyslogd -n
> [in:imjournal]
>1561 be/4 root  0.00 B 56.00 K  0.00 %  0.00 % rsyslogd -n
> [rs:main Q:Reg]
>
> 1.8GB every 5 minutes, that's 518GB per day. The 400G drives we have are
> rated 10DWPD and with the 6-drive RAID10 config this gives plenty of
> life-time. I guess this write load will kill any low-grade SSD (typical
> boot devices, even enterprise ones), specifically if it's smaller drives and
> the controller doesn't reallocate cells according to remaining write
> endurance.
>
> I guess there was a reason for the recommendations by Dell. I always
> thought that the recent recommendation for MON store storage in the ceph
> docs are a "bit unrealistic", apparently both, in size and in performance
> (including endurance). Well, I guess you need to look for write intensive
> drives with decent specs. If you do, also go for sufficient size. This will
> absorb temporary usage peaks that can be very large and also provide extra
> endurance with SSDs with good controllers.
>
> I also think the recommendations on the ceph docs deserve a reality check.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Zakhar Kirpichenko 
> Sent: Wednesday, October 11, 2023 4:30 PM
> To: Eugen Block
> Cc: Frank Schilder; ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: Ceph 16.2.x mon compactions, disk writes
>
> Eugen,
>
> Thanks for your response. May I ask what numbers you're referring to?
>
> I am not referring to monitor store.db sizes. I am specifically referring
> to writes monitors do to their store.db file by frequently rotating and
> replacing them with new versions during compactions. The size of the
> store.db remains more or less the same.
>
> This is a 300s iotop snippet, sorted by aggregated disk writes:
>
> Total DISK READ:35.56 M/s | Total DISK WRITE:23.89 M/s
> Current DISK READ:  35.64 M/s | Current DISK WRITE:  24.09 M/s
> TID  PRIO  USER DISK READ DISK WRITE>  SWAPIN  IOCOMMAND
>4919 be/4 167  16.75 M  2.24 G  0.00 %  1.34 % ceph-mon -n
> mon.ceph03 -f --setuser ceph --setgr~lt-mon-cluster-log-to-stderr=true
> 

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Zakhar Kirpichenko
on/ceph-ceph03/store.db/3677170.sst
ceph-mon 4838  167  208r  REG 253,11 67226761 14813316
/var/lib/ceph/mon/ceph-ceph03/store.db/3677171.sst
ceph-mon 4838  167  220r  REG 253,11 67258798 14813332
/var/lib/ceph/mon/ceph-ceph03/store.db/3677172.sst
ceph-mon 4838  167  221r  REG 253,11 67224665 14813345
/var/lib/ceph/mon/ceph-ceph03/store.db/3677173.sst
ceph-mon 4838  167  224r  REG 253,11 67224123 14813348
/var/lib/ceph/mon/ceph-ceph03/store.db/3677174.sst
ceph-mon 4838  167  228r  REG 253,11 62195349 14813381
/var/lib/ceph/mon/ceph-ceph03/store.db/3677175.sst

I hope this clears up the situation.

Do you observe this behavior in your clusters? Can you please check whether
your mons do something similar and store.db/*.sst change often?

/Z

On Wed, 11 Oct 2023 at 16:22, Eugen Block  wrote:

> That all looks normal to me, to be honest. Can you show some details
> how you calculate the "hundreds of GB per day"? I see similar stats as
> Frank on different clusters with different client IO.
>
> Zitat von Zakhar Kirpichenko :
>
> > Sure, nothing unusual there:
> >
> > ---
> >
> >   cluster:
> > id: 3f50555a-ae2a-11eb-a2fc-ffde44714d86
> > health: HEALTH_OK
> >
> >   services:
> > mon: 5 daemons, quorum ceph01,ceph03,ceph04,ceph05,ceph02 (age 2w)
> > mgr: ceph01.vankui(active, since 12d), standbys: ceph02.shsinf
> > osd: 96 osds: 96 up (since 2w), 95 in (since 3w)
> >
> >   data:
> > pools:   10 pools, 2400 pgs
> > objects: 6.23M objects, 16 TiB
> > usage:   61 TiB used, 716 TiB / 777 TiB avail
> > pgs: 2396 active+clean
> >  3    active+clean+scrubbing+deep
> >  1    active+clean+scrubbing
> >
> >   io:
> > client:   2.7 GiB/s rd, 27 MiB/s wr, 46.95k op/s rd, 2.17k op/s wr
> >
> > ---
> >
> > Please disregard the big read number, a customer is running a
> > read-intensive job. Mon store writes keep happening when the cluster is
> > much more quiet, thus I think that intensive reads have no effect on the
> > mons.
> >
> > Mgr:
> >
> > "always_on_modules": [
> > "balancer",
> > "crash",
> > "devicehealth",
> > "orchestrator",
> > "pg_autoscaler",
> > "progress",
> > "rbd_support",
> > "status",
> > "telemetry",
> > "volumes"
> > ],
> > "enabled_modules": [
> > "cephadm",
> > "dashboard",
> > "iostat",
> > "prometheus",
> > "restful"
> > ],
> >
> > ---
> >
> > /Z
> >
> >
> > On Wed, 11 Oct 2023 at 14:50, Eugen Block  wrote:
> >
> >> Can you add some more details as requested by Frank? Which mgr modules
> >> are enabled? What's the current 'ceph -s' output?
> >>
> >> > Is autoscaler running and doing stuff?
> >> > Is balancer running and doing stuff?
> >> > Is backfill going on?
> >> > Is recovery going on?
> >> > Is your ceph version affected by the "excessive logging to MON
> >> > store" issue that was present starting with pacific but should have
> >> > been addressed
> >>
> >>
> >> Zitat von Zakhar Kirpichenko :
> >>
> >> > We don't use CephFS at all and don't have RBD snapshots apart from
> some
> >> > cloning for Openstack images.
> >> >
> >> > The size of mon stores isn't an issue, it's < 600 MB. But it gets
> >> > overwritten often causing lots of disk writes, and that is an issue
> for
> >> us.
> >> >
> >> > /Z
> >> >
> >> > On Wed, 11 Oct 2023 at 14:37, Eugen Block  wrote:
> >> >
> >> >> Do you use many snapshots (rbd or cephfs)? That can cause a heavy
> >> >> monitor usage, we've seen large mon stores on  customer clusters with
> >> >> rbd mirroring on snapshot basis. In a healthy cluster they have mon
> >> >> stores of around 2GB in size.
> >> >>
> >> >> >> @Eugen: Was there not an option to limit logging to the MON store?
> >> >>
> >> >> I don't recall at the moment, worth checking tough.
> >> >>
> >>

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Zakhar Kirpichenko
Sure, nothing unusual there:

---

  cluster:
id: 3f50555a-ae2a-11eb-a2fc-ffde44714d86
health: HEALTH_OK

  services:
mon: 5 daemons, quorum ceph01,ceph03,ceph04,ceph05,ceph02 (age 2w)
mgr: ceph01.vankui(active, since 12d), standbys: ceph02.shsinf
osd: 96 osds: 96 up (since 2w), 95 in (since 3w)

  data:
pools:   10 pools, 2400 pgs
objects: 6.23M objects, 16 TiB
usage:   61 TiB used, 716 TiB / 777 TiB avail
pgs: 2396 active+clean
     3    active+clean+scrubbing+deep
     1    active+clean+scrubbing

  io:
client:   2.7 GiB/s rd, 27 MiB/s wr, 46.95k op/s rd, 2.17k op/s wr

---

Please disregard the big read number, a customer is running a
read-intensive job. Mon store writes keep happening when the cluster is
much more quiet, thus I think that intensive reads have no effect on the
mons.

Mgr:

"always_on_modules": [
"balancer",
"crash",
"devicehealth",
"orchestrator",
"pg_autoscaler",
"progress",
"rbd_support",
"status",
"telemetry",
"volumes"
],
"enabled_modules": [
"cephadm",
"dashboard",
"iostat",
"prometheus",
"restful"
],

---

/Z


On Wed, 11 Oct 2023 at 14:50, Eugen Block  wrote:

> Can you add some more details as requested by Frank? Which mgr modules
> are enabled? What's the current 'ceph -s' output?
>
> > Is autoscaler running and doing stuff?
> > Is balancer running and doing stuff?
> > Is backfill going on?
> > Is recovery going on?
> > Is your ceph version affected by the "excessive logging to MON
> > store" issue that was present starting with pacific but should have
> > been addressed
>
>
> Zitat von Zakhar Kirpichenko :
>
> > We don't use CephFS at all and don't have RBD snapshots apart from some
> > cloning for Openstack images.
> >
> > The size of mon stores isn't an issue, it's < 600 MB. But it gets
> > overwritten often causing lots of disk writes, and that is an issue for
> us.
> >
> > /Z
> >
> > On Wed, 11 Oct 2023 at 14:37, Eugen Block  wrote:
> >
> >> Do you use many snapshots (rbd or cephfs)? That can cause a heavy
> >> monitor usage, we've seen large mon stores on  customer clusters with
> >> rbd mirroring on snapshot basis. In a healthy cluster they have mon
> >> stores of around 2GB in size.
> >>
> >> >> @Eugen: Was there not an option to limit logging to the MON store?
> >>
> >> I don't recall at the moment, worth checking tough.
> >>
> >> Zitat von Zakhar Kirpichenko :
> >>
> >> > Thank you, Frank.
> >> >
> >> > The cluster is healthy, operating normally, nothing unusual is going
> on.
> >> We
> >> > observe lots of writes by mon processes into mon rocksdb stores,
> >> > specifically:
> >> >
> >> > /var/lib/ceph/mon/ceph-cephXX/store.db:
> >> > 65M 3675511.sst
> >> > 65M 3675512.sst
> >> > 65M 3675513.sst
> >> > 65M 3675514.sst
> >> > 65M 3675515.sst
> >> > 65M 3675516.sst
> >> > 65M 3675517.sst
> >> > 65M 3675518.sst
> >> > 62M 3675519.sst
> >> >
> >> > The size of the files is not huge, but monitors rotate and write out
> >> these
> >> > files often, sometimes several times per minute, resulting in lots of
> >> data
> >> > written to disk. The writes coincide with "manual compaction" events
> >> logged
> >> > by the monitors, for example:
> >> >
> >> > debug 2023-10-11T11:10:10.483+ 7f48a3a9b700  4 rocksdb:
> >> > [compaction/compaction_job.cc:1676] [default] [JOB 70854] Compacting
> 1@5
> >> +
> >> > 9@6 files to L6, score -1.00
> >> > debug 2023-10-11T11:10:10.483+ 7f48a3a9b700  4 rocksdb:
> EVENT_LOG_v1
> >> > {"time_micros": 1697022610487624, "job": 70854, "event":
> >> > "compaction_started", "compaction_reason": "ManualCompaction",
> >> "files_L5":
> >> > [3675543], "files_L6": [3675533, 3675534, 3675535, 3675536, 3675537,
> >> > 3675538, 3675539, 3675540, 3675541], "score": -1, "input_data_size":
> >> > 601117031}
> >> > debug 2023-10-11T11:10:10.619+ 7f48a3a

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Zakhar Kirpichenko
We don't use CephFS at all and don't have RBD snapshots apart from some
cloning for Openstack images.

The size of mon stores isn't an issue, it's < 600 MB. But it gets
overwritten often causing lots of disk writes, and that is an issue for us.
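
A crude way to watch that turnover, for anyone who wants to reproduce the
observation (a sketch; the path assumes a cephadm layout, and <fsid>/<name>
are placeholders):

watch -n 30 "ls -l /var/lib/ceph/<fsid>/mon.<name>/store.db/*.sst"

The .sst file numbers keep rotating while the total size stays roughly flat,
which is exactly the pattern described above.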

/Z

On Wed, 11 Oct 2023 at 14:37, Eugen Block  wrote:

> Do you use many snapshots (rbd or cephfs)? That can cause a heavy
> monitor usage, we've seen large mon stores on  customer clusters with
> rbd mirroring on snapshot basis. In a healthy cluster they have mon
> stores of around 2GB in size.
>
> >> @Eugen: Was there not an option to limit logging to the MON store?
>
> I don't recall at the moment, worth checking though.
>
> Zitat von Zakhar Kirpichenko :
>
> > Thank you, Frank.
> >
> > The cluster is healthy, operating normally, nothing unusual is going on.
> We
> > observe lots of writes by mon processes into mon rocksdb stores,
> > specifically:
> >
> > /var/lib/ceph/mon/ceph-cephXX/store.db:
> > 65M 3675511.sst
> > 65M 3675512.sst
> > 65M 3675513.sst
> > 65M 3675514.sst
> > 65M 3675515.sst
> > 65M 3675516.sst
> > 65M 3675517.sst
> > 65M 3675518.sst
> > 62M 3675519.sst
> >
> > The size of the files is not huge, but monitors rotate and write out
> these
> > files often, sometimes several times per minute, resulting in lots of
> data
> > written to disk. The writes coincide with "manual compaction" events
> logged
> > by the monitors, for example:
> >
> > debug 2023-10-11T11:10:10.483+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1676] [default] [JOB 70854] Compacting 1@5
> +
> > 9@6 files to L6, score -1.00
> > debug 2023-10-11T11:10:10.483+ 7f48a3a9b700  4 rocksdb: EVENT_LOG_v1
> > {"time_micros": 1697022610487624, "job": 70854, "event":
> > "compaction_started", "compaction_reason": "ManualCompaction",
> "files_L5":
> > [3675543], "files_L6": [3675533, 3675534, 3675535, 3675536, 3675537,
> > 3675538, 3675539, 3675540, 3675541], "score": -1, "input_data_size":
> > 601117031}
> > debug 2023-10-11T11:10:10.619+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> > #3675544: 2015 keys, 67287115 bytes
> > debug 2023-10-11T11:10:10.763+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> > #3675545: 24343 keys, 67336225 bytes
> > debug 2023-10-11T11:10:10.899+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> > #3675546: 1196 keys, 67225813 bytes
> > debug 2023-10-11T11:10:11.035+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> > #3675547: 1049 keys, 67252678 bytes
> > debug 2023-10-11T11:10:11.167+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> > #3675548: 1081 keys, 67216638 bytes
> > debug 2023-10-11T11:10:11.303+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> > #3675549: 1196 keys, 67245376 bytes
> > debug 2023-10-11T11:10:12.023+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> > #3675550: 1195 keys, 67246813 bytes
> > debug 2023-10-11T11:10:13.059+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> > #3675551: 1205 keys, 67223302 bytes
> > debug 2023-10-11T11:10:13.903+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> > #3675552: 1312 keys, 56416011 bytes
> > debug 2023-10-11T11:10:13.911+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1415] [default] [JOB 70854] Compacted 1@5
> +
> > 9@6 files to L6 => 594449971 bytes
> > debug 2023-10-11T11:10:13.915+ 7f48a3a9b700  4 rocksdb: (Original Log
> > Time 2023/10/11-11:10:13.920991) [compaction/compaction_job.cc:760]
> > [default] compacted to: base level 5 level multiplier 10.00 max bytes
> base
> > 268435456 files[0 0 0 0 0 0 9] max score 0.00, MB/sec: 175.8 rd, 173.9
> wr,
> > level 6, files in(1, 9) out(9) MB in(0.3, 572.9) out(566.9),
> > read-write-amplify(3434.6) write-amplify(1707.7) OK, records in: 35108,
> > records dropped: 516 output_compression: NoCompression
> > debug 2023-10-11T11:10:13.915+ 7f48a3a9b700  

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Zakhar Kirpichenko
luster map continuously.
>
> Is autoscaler running and doing stuff?
> Is balancer running and doing stuff?
> Is backfill going on?
> Is recovery going on?
> Is your ceph version affected by the "excessive logging to MON store"
> issue that was present starting with pacific but should have been addressed
> by now?
>
> @Eugen: Was there not an option to limit logging to the MON store?
>
> For information to readers, we followed old recommendations from a Dell
> white paper for building a ceph cluster and have a 1TB Raid10 array on 6x
> write intensive SSDs for the MON stores. After 5 years we are below 10%
> wear. Average size of the MON store for a healthy cluster is 500M-1G, but
> we have seen this ballooning to 100+GB in degraded conditions.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Zakhar Kirpichenko 
> Sent: Wednesday, October 11, 2023 12:00 PM
> To: Eugen Block
> Cc: ceph-users@ceph.io
> Subject: [ceph-users] Re: Ceph 16.2.x mon compactions, disk writes
>
> Thank you, Eugen.
>
> I'm interested specifically to find out whether the huge amount of data
> written by monitors is expected. It is eating through the endurance of our
> system drives, which were not specced for high DWPD/TBW, as this is not a
> documented requirement, and monitors produce hundreds of gigabytes of
> writes per day. I am looking for ways to reduce the amount of writes, if
> possible.
>
> /Z
>
> On Wed, 11 Oct 2023 at 12:41, Eugen Block  wrote:
>
> > Hi,
> >
> > what you report is the expected behaviour, at least I see the same on
> > all clusters. I can't answer why the compaction is required that
> > often, but you can control the log level of the rocksdb output:
> >
> > ceph config set mon debug_rocksdb 1/5 (default is 4/5)
> >
> > This reduces the log entries and you wouldn't see the manual
> > compaction logs anymore. There are a couple more rocksdb options but I
> > probably wouldn't change too much, only if you know what you're doing.
> > Maybe Igor can comment if some other tuning makes sense here.
> >
> > Regards,
> > Eugen
> >
> > Zitat von Zakhar Kirpichenko :
> >
> > > Any input from anyone, please?
> > >
> > > On Tue, 10 Oct 2023 at 09:44, Zakhar Kirpichenko 
> > wrote:
> > >
> > >> Any input from anyone, please?
> > >>
> > >> It's another thing that seems to be rather poorly documented: it's
> > unclear
> > >> what to expect, what 'normal' behavior should be, and what can be done
> > >> about the huge amount of writes by monitors.
> > >>
> > >> /Z
> > >>
> > >> On Mon, 9 Oct 2023 at 12:40, Zakhar Kirpichenko 
> > wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> Monitors in our 16.2.14 cluster appear to quite often run "manual
> > >>> compaction" tasks:
> > >>>
> > >>> debug 2023-10-09T09:30:53.888+ 7f48a329a700  4 rocksdb:
> > EVENT_LOG_v1
> > >>> {"time_micros": 1696843853892760, "job": 64225, "event":
> > "flush_started",
> > >>> "num_memtables": 1, "num_entries": 715, "num_deletes": 251,
> > >>> "total_data_size": 3870352, "memory_usage": 3886744, "flush_reason":
> > >>> "Manual Compaction"}
> > >>> debug 2023-10-09T09:30:53.904+ 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:30:53.908+ 7f48a3a9b700  4 rocksdb: (Original
> > Log
> > >>> Time 2023/10/09-09:30:53.910204)
> > [db_impl/db_impl_compaction_flush.cc:2516]
> > >>> [default] Manual compaction from level-0 to level-5 from 'paxos ..
> > 'paxos;
> > >>> will stop at (end)
> > >>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:14

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Zakhar Kirpichenko
Thank you, Eugen.

I'm interested specifically to find out whether the huge amount of data
written by monitors is expected. It is eating through the endurance of our
system drives, which were not specced for high DWPD/TBW, as this is not a
documented requirement, and monitors produce hundreds of gigabytes of
writes per day. I am looking for ways to reduce the amount of writes, if
possible.
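
One low-overhead way to quantify the writes over a longer window is the
kernel's per-process I/O accounting; a minimal sketch (it assumes a single
ceph-mon process on the host, and note the counter resets when the daemon
restarts):

PID=$(pgrep -xo ceph-mon)
grep write_bytes /proc/$PID/io      # note the value
sleep 3600
grep write_bytes /proc/$PID/io      # the difference is bytes written in the last hour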

/Z

On Wed, 11 Oct 2023 at 12:41, Eugen Block  wrote:

> Hi,
>
> what you report is the expected behaviour, at least I see the same on
> all clusters. I can't answer why the compaction is required that
> often, but you can control the log level of the rocksdb output:
>
> ceph config set mon debug_rocksdb 1/5 (default is 4/5)
>
> This reduces the log entries and you wouldn't see the manual
> compaction logs anymore. There are a couple more rocksdb options but I
> probably wouldn't change too much, only if you know what you're doing.
> Maybe Igor can comment if some other tuning makes sense here.
>
> Regards,
> Eugen
>
> Zitat von Zakhar Kirpichenko :
>
> > Any input from anyone, please?
> >
> > On Tue, 10 Oct 2023 at 09:44, Zakhar Kirpichenko 
> wrote:
> >
> >> Any input from anyone, please?
> >>
> >> It's another thing that seems to be rather poorly documented: it's
> unclear
> >> what to expect, what 'normal' behavior should be, and what can be done
> >> about the huge amount of writes by monitors.
> >>
> >> /Z
> >>
> >> On Mon, 9 Oct 2023 at 12:40, Zakhar Kirpichenko 
> wrote:
> >>
> >>> Hi,
> >>>
> >>> Monitors in our 16.2.14 cluster appear to quite often run "manual
> >>> compaction" tasks:
> >>>
> >>> debug 2023-10-09T09:30:53.888+ 7f48a329a700  4 rocksdb:
> EVENT_LOG_v1
> >>> {"time_micros": 1696843853892760, "job": 64225, "event":
> "flush_started",
> >>> "num_memtables": 1, "num_entries": 715, "num_deletes": 251,
> >>> "total_data_size": 3870352, "memory_usage": 3886744, "flush_reason":
> >>> "Manual Compaction"}
> >>> debug 2023-10-09T09:30:53.904+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:30:53.908+ 7f48a3a9b700  4 rocksdb: (Original
> Log
> >>> Time 2023/10/09-09:30:53.910204)
> [db_impl/db_impl_compaction_flush.cc:2516]
> >>> [default] Manual compaction from level-0 to level-5 from 'paxos ..
> 'paxos;
> >>> will stop at (end)
> >>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:30:53.908+ 7f48a3a9b700  4 rocksdb: (Original
> Log
> >>> Time 2023/10/09-09:30:53.911004)
> [db_impl/db_impl_compaction_flush.cc:2516]
> >>> [default] Manual compaction from level-5 to level-6 from 'paxos ..
> 'paxos;
> >>> will stop at (end)
> >>> debug 2023-10-09T09:32:08.956+ 7f48a329a700  4 rocksdb:
> EVENT_LOG_v1
> >>> {"time_micros": 1696843928961390, "job": 64228, "event":
> "flush_started",
> >>> "num_memtables": 1, "num_entries": 1580, "num_deletes": 502,
> >>> "total_data_size": 8404605, "memory_usage": 8465840, "flush_reason":
> >>> "Manual Compaction"}
> >>> debug 2023-10-09T09:32:08.972+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:32:08.976+ 7f48a3a9b700  4 rocksdb: (Original
> Log

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Zakhar Kirpichenko
Any input from anyone, please?

On Tue, 10 Oct 2023 at 09:44, Zakhar Kirpichenko  wrote:

> Any input from anyone, please?
>
> It's another thing that seems to be rather poorly documented: it's unclear
> what to expect, what 'normal' behavior should be, and what can be done
> about the huge amount of writes by monitors.
>
> /Z
>
> On Mon, 9 Oct 2023 at 12:40, Zakhar Kirpichenko  wrote:
>
>> Hi,
>>
>> Monitors in our 16.2.14 cluster appear to quite often run "manual
>> compaction" tasks:
>>
>> debug 2023-10-09T09:30:53.888+ 7f48a329a700  4 rocksdb: EVENT_LOG_v1
>> {"time_micros": 1696843853892760, "job": 64225, "event": "flush_started",
>> "num_memtables": 1, "num_entries": 715, "num_deletes": 251,
>> "total_data_size": 3870352, "memory_usage": 3886744, "flush_reason":
>> "Manual Compaction"}
>> debug 2023-10-09T09:30:53.904+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:30:53.908+ 7f48a3a9b700  4 rocksdb: (Original Log
>> Time 2023/10/09-09:30:53.910204) [db_impl/db_impl_compaction_flush.cc:2516]
>> [default] Manual compaction from level-0 to level-5 from 'paxos .. 'paxos;
>> will stop at (end)
>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:30:53.908+ 7f48a3a9b700  4 rocksdb: (Original Log
>> Time 2023/10/09-09:30:53.911004) [db_impl/db_impl_compaction_flush.cc:2516]
>> [default] Manual compaction from level-5 to level-6 from 'paxos .. 'paxos;
>> will stop at (end)
>> debug 2023-10-09T09:32:08.956+ 7f48a329a700  4 rocksdb: EVENT_LOG_v1
>> {"time_micros": 1696843928961390, "job": 64228, "event": "flush_started",
>> "num_memtables": 1, "num_entries": 1580, "num_deletes": 502,
>> "total_data_size": 8404605, "memory_usage": 8465840, "flush_reason":
>> "Manual Compaction"}
>> debug 2023-10-09T09:32:08.972+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:08.976+ 7f48a3a9b700  4 rocksdb: (Original Log
>> Time 2023/10/09-09:32:08.977739) [db_impl/db_impl_compaction_flush.cc:2516]
>> [default] Manual compaction from level-0 to level-5 from 'logm .. 'logm;
>> will stop at (end)
>> debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:08.976+ 7f48a3a9b700  4 rocksdb: (Original Log
>> Time 2023/10/09-09:32:08.978512) [db_impl/db_impl_compaction_flush.cc:2516]
>> [default] Manual compaction from level-5 to level-6 from 'logm .. 'logm;
>> will stop at (end)
>> debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
>

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-10 Thread Zakhar Kirpichenko
Any input from anyone, please?

It's another thing that seems to be rather poorly documented: it's unclear
what to expect, what 'normal' behavior should be, and what can be done
about the huge amount of writes by monitors.

/Z

On Mon, 9 Oct 2023 at 12:40, Zakhar Kirpichenko  wrote:

> Hi,
>
> Monitors in our 16.2.14 cluster appear to quite often run "manual
> compaction" tasks:
>
> debug 2023-10-09T09:30:53.888+ 7f48a329a700  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1696843853892760, "job": 64225, "event": "flush_started",
> "num_memtables": 1, "num_entries": 715, "num_deletes": 251,
> "total_data_size": 3870352, "memory_usage": 3886744, "flush_reason":
> "Manual Compaction"}
> debug 2023-10-09T09:30:53.904+ 7f4899286700  4 rocksdb:
> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> starting
> debug 2023-10-09T09:30:53.908+ 7f48a3a9b700  4 rocksdb: (Original Log
> Time 2023/10/09-09:30:53.910204) [db_impl/db_impl_compaction_flush.cc:2516]
> [default] Manual compaction from level-0 to level-5 from 'paxos .. 'paxos;
> will stop at (end)
> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> starting
> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> starting
> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> starting
> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> starting
> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> starting
> debug 2023-10-09T09:30:53.908+ 7f48a3a9b700  4 rocksdb: (Original Log
> Time 2023/10/09-09:30:53.911004) [db_impl/db_impl_compaction_flush.cc:2516]
> [default] Manual compaction from level-5 to level-6 from 'paxos .. 'paxos;
> will stop at (end)
> debug 2023-10-09T09:32:08.956+ 7f48a329a700  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1696843928961390, "job": 64228, "event": "flush_started",
> "num_memtables": 1, "num_entries": 1580, "num_deletes": 502,
> "total_data_size": 8404605, "memory_usage": 8465840, "flush_reason":
> "Manual Compaction"}
> debug 2023-10-09T09:32:08.972+ 7f4899286700  4 rocksdb:
> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> starting
> debug 2023-10-09T09:32:08.976+ 7f48a3a9b700  4 rocksdb: (Original Log
> Time 2023/10/09-09:32:08.977739) [db_impl/db_impl_compaction_flush.cc:2516]
> [default] Manual compaction from level-0 to level-5 from 'logm .. 'logm;
> will stop at (end)
> debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> starting
> debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> starting
> debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> starting
> debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> starting
> debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> starting
> debug 2023-10-09T09:32:08.976+ 7f48a3a9b700  4 rocksdb: (Original Log
> Time 2023/10/09-09:32:08.978512) [db_impl/db_impl_compaction_flush.cc:2516]
> [default] Manual compaction from level-5 to level-6 from 'logm .. 'logm;
> will stop at (end)
> debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> starting
> debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> starting
> debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> starting
> debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> starting
> debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> starting
> debug 2023-10-09T09:32:12

[ceph-users] Re: Ceph 16.2.x excessive logging, how to reduce?

2023-10-09 Thread Zakhar Kirpichenko
Thanks for the suggestion. That PID belongs to the mon process, i.e. the
monitor itself is logging all client connections and commands.
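
For reference, the mapping is easy to confirm with something like this (PID
taken from the syslog line quoted below):

ps -o pid,ppid,cmd -p 4019                      # show the process behind the PID
systemctl status 4019 --no-pager | head -n 5    # show which unit/container owns it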

/Z

On Mon, 9 Oct 2023 at 14:24, Kai Stian Olstad  wrote:

> On 09.10.2023 10:05, Zakhar Kirpichenko wrote:
> > I did try to play with various debug settings. The issue is that mons
> > produce logs of all commands issued by clients, not just mgr. For
> > example,
> > an Openstack Cinder node asking for space it can use:
> >
> > Oct  9 07:59:01 ceph03 bash[4019]: debug 2023-10-09T07:59:01.303+
>
> This log says that it's bash with PID 4019 that is creating the log
> entry.
> Maybe start there and check what else you are running on the
> server that creates these messages.
>
> --
> Kai Stian Olstad
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.x excessive logging, how to reduce?

2023-10-09 Thread Zakhar Kirpichenko
I have `mon debug_client`, but it's already at 0/5 by default.
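
(For anyone checking their own cluster: the value can be read from the config
database with `ceph config get mon debug_client`, or at runtime on a mon host
with `ceph daemon mon.<id> config get debug_client`.)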

/Z

On Mon, 9 Oct 2023 at 11:24, Marc  wrote:

> Hi Zakhar,
>
> >
> > I did try to play with various debug settings. The issue is that mons
> > produce logs of all commands issued by clients, not just mgr. For
> example,
> > an Openstack Cinder node asking for space it can use:
> >
> > Oct  9 07:59:01 ceph03 bash[4019]: debug 2023-10-09T07:59:01.303+
> > 7f489da8f700  0 log_channel(audit) log [DBG] : from='client.?
> > 10.208.1.11:0/3286277243'
> > entity='client.cinder' cmd=[{"prefix":"osd pool get-quota", "pool":
> > "volumes-ssd", "format":"json"}]: dispatch
>
> I am on an older version of ceph still, so I am not sure if I even have
> these. There is also an option in ceph.conf to do client logging
>
>   [client]
>   #debug client = 5
>
>
> >
> > It is unclear which particular mon debug option out of many controls this
> > particular type of debug. I tried searching for documentation of mon
> debug
> > options to no avail.
> >
>
> Maybe there is something equivalent to this for logging?
> ceph daemon mon.a perf schema|less
> ceph daemon osd.0 perf schema|less
>
>
> >
> >
> >   Did you do something like this
> >
> >   Getting keys with
> >   ceph daemon mon.a config show | grep debug_ | grep mgr
> >
> >   ceph tell mon.* injectargs --$monk=0/0
> >
> >   >
> >   > Any input from anyone, please?
> >   >
> >   > This part of Ceph is very poorly documented. Perhaps there's a
> > better place
> >   > to ask this question? Please let me know.
> >   >
> >   > /Z
> >   >
> >   > On Sat, 7 Oct 2023 at 22:00, Zakhar Kirpichenko  wrote:
> >   >
> >   > > Hi!
> >   > >
> >   > > I am still fighting excessive logging. I've reduced unnecessary
> > logging
> >   > > from most components except for mon audit:
> > https://pastebin.com/jjWvUEcQ
> >   > >
> >   > > How can I stop logging this particular type of messages?
> >   > >
> >   > > I would appreciate your help and advice.
> >   > >
> >   > > /Z
> >   > >
> >   > >
> >   > >> Thank you for your response, Igor.
> >   > >>
> >   > >> Currently debug_rocksdb is set to 4/5:
> >   > >>
> >   > >> # ceph config get osd debug_rocksdb
> >   > >> 4/5
> >   > >>
> >   > >> This setting seems to be default. Is my understanding correct
> > that
> >   > you're
> >   > >> suggesting setting it to 3/5 or even 0/5? Would setting it to
> > 0/5 have
> >   > any
> >   > >> negative effects on the cluster?
> >   > >>
> >   > >> /Z
> >   > >>
> >   > >> On Wed, 4 Oct 2023 at 21:23, Igor Fedotov  wrote:
> >   > >>
> >   > >>> Hi Zakhar,
> >   > >>>
> >   > >>> to reduce rocksdb logging verbosity you might want to set
> > debug_rocksdb
> >   > >>> to 3 (or 0).
> >   > >>>
> >   > >>> I presume it produces a  significant part of the logging
> > traffic.
> >   > >>>
> >   > >>>
> >   > >>> Thanks,
> >   > >>>
> >   > >>> Igor
> >   > >>>
> >   > >>> On 04/10/2023 20:51, Zakhar Kirpichenko wrote:
> >   > >>> > Any input from anyone, please?
> >   > >>> >
> >   > >>> > On Tue, 19 Sept 2023 at 09:01, Zakhar Kirpichenko  wrote:
> >   > >>> >
> >   > >>> >> Hi,
> >   > >>> >>
> >   > >>> >> Our Ceph 16.2.x cluster managed by cephadm is logging a
> lot
> > of very
> >

[ceph-users] Ceph 16.2.x mon compactions, disk writes

2023-10-09 Thread Zakhar Kirpichenko
Hi,

Monitors in our 16.2.14 cluster appear to quite often run "manual
compaction" tasks:

debug 2023-10-09T09:30:53.888+ 7f48a329a700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1696843853892760, "job": 64225, "event": "flush_started",
"num_memtables": 1, "num_entries": 715, "num_deletes": 251,
"total_data_size": 3870352, "memory_usage": 3886744, "flush_reason":
"Manual Compaction"}
debug 2023-10-09T09:30:53.904+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:30:53.908+ 7f48a3a9b700  4 rocksdb: (Original Log
Time 2023/10/09-09:30:53.910204) [db_impl/db_impl_compaction_flush.cc:2516]
[default] Manual compaction from level-0 to level-5 from 'paxos .. 'paxos;
will stop at (end)
debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:30:53.908+ 7f48a3a9b700  4 rocksdb: (Original Log
Time 2023/10/09-09:30:53.911004) [db_impl/db_impl_compaction_flush.cc:2516]
[default] Manual compaction from level-5 to level-6 from 'paxos .. 'paxos;
will stop at (end)
debug 2023-10-09T09:32:08.956+ 7f48a329a700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1696843928961390, "job": 64228, "event": "flush_started",
"num_memtables": 1, "num_entries": 1580, "num_deletes": 502,
"total_data_size": 8404605, "memory_usage": 8465840, "flush_reason":
"Manual Compaction"}
debug 2023-10-09T09:32:08.972+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:08.976+ 7f48a3a9b700  4 rocksdb: (Original Log
Time 2023/10/09-09:32:08.977739) [db_impl/db_impl_compaction_flush.cc:2516]
[default] Manual compaction from level-0 to level-5 from 'logm .. 'logm;
will stop at (end)
debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:08.976+ 7f48a3a9b700  4 rocksdb: (Original Log
Time 2023/10/09-09:32:08.978512) [db_impl/db_impl_compaction_flush.cc:2516]
[default] Manual compaction from level-5 to level-6 from 'logm .. 'logm;
will stop at (end)
debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:33:29.028+ 7f48a329a700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1696844009033151, "job": 64231, "event": "flush_started",
"num_memtables": 1, "num_entries": 1430, "num_deletes": 251,
"total_data_size": 8975535, "memory_usage": 9035920, "flush_reason":
"Manual Compaction"}
debug 2023-10-09T09:33:29.044+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:33:29.048+ 7f48a3a9b700  4 rocksdb: (Original Log
Time 2023/10/09-09:33:29.049585) [db_impl/db_impl_compaction_flush.cc:2516]
[default] Manual compaction from level-0 to level-5 from 'paxos .. 'paxos;
will stop at (end)
debug 2023-10-09T09:33:29.048+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction

[ceph-users] Re: Ceph 16.2.x excessive logging, how to reduce?

2023-10-09 Thread Zakhar Kirpichenko
Thanks for your reply, Marc!

I did try to play with various debug settings. The issue is that the mons
log all commands issued by clients, not just the mgr. For example, here is
an Openstack Cinder node asking how much space it can use:

Oct  9 07:59:01 ceph03 bash[4019]: debug 2023-10-09T07:59:01.303+
7f489da8f700  0 log_channel(audit) log [DBG] : from='client.?
10.208.1.11:0/3286277243' entity='client.cinder' cmd=[{"prefix":"osd pool
get-quota", "pool": "volumes-ssd", "format":"json"}]: dispatch

It is unclear which of the many mon debug options controls this particular
type of debug output. I tried searching for documentation of the mon debug
options to no avail.
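
Since the offending entries are tagged log_channel(audit), as in the example
above, I suspect the relevant knob is the mon cluster-log configuration rather
than any debug_* subsystem. A sketch of what I'm considering, with option
names taken from the docs and not yet verified on this cluster:

ceph config get mon mon_cluster_log_file             # where each channel is written
ceph config get mon mon_cluster_log_file_level       # current file log level
ceph config set mon mon_cluster_log_file_level info  # stop writing DBG-level entries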

/Z


On Mon, 9 Oct 2023 at 10:03, Marc  wrote:

>
> Did you do something like this
>
> Getting keys with
> ceph daemon mon.a config show | grep debug_ | grep mgr
>
> ceph tell mon.* injectargs --$monk=0/0
>
> >
> > Any input from anyone, please?
> >
> > This part of Ceph is very poorly documented. Perhaps there's a better
> place
> > to ask this question? Please let me know.
> >
> > /Z
> >
> > On Sat, 7 Oct 2023 at 22:00, Zakhar Kirpichenko 
> wrote:
> >
> > > Hi!
> > >
> > > I am still fighting excessive logging. I've reduced unnecessary logging
> > > from most components except for mon audit:
> https://pastebin.com/jjWvUEcQ
> > >
> > > How can I stop logging this particular type of messages?
> > >
> > > I would appreciate your help and advice.
> > >
> > > /Z
> > >
> > > On Thu, 5 Oct 2023 at 06:47, Zakhar Kirpichenko 
> wrote:
> > >
> > >> Thank you for your response, Igor.
> > >>
> > >> Currently debug_rocksdb is set to 4/5:
> > >>
> > >> # ceph config get osd debug_rocksdb
> > >> 4/5
> > >>
> > >> This setting seems to be default. Is my understanding correct that
> > you're
> > >> suggesting setting it to 3/5 or even 0/5? Would setting it to 0/5 have
> > any
> > >> negative effects on the cluster?
> > >>
> > >> /Z
> > >>
> > >> On Wed, 4 Oct 2023 at 21:23, Igor Fedotov 
> wrote:
> > >>
> > >>> Hi Zakhar,
> > >>>
> > >>> to reduce rocksdb logging verbosity you might want to set
> debug_rocksdb
> > >>> to 3 (or 0).
> > >>>
> > >>> I presume it produces a  significant part of the logging traffic.
> > >>>
> > >>>
> > >>> Thanks,
> > >>>
> > >>> Igor
> > >>>
> > >>> On 04/10/2023 20:51, Zakhar Kirpichenko wrote:
> > >>> > Any input from anyone, please?
> > >>> >
> > >>> > On Tue, 19 Sept 2023 at 09:01, Zakhar Kirpichenko <
> zak...@gmail.com>
> > >>> wrote:
> > >>> >
> > >>> >> Hi,
> > >>> >>
> > >>> >> Our Ceph 16.2.x cluster managed by cephadm is logging a lot of
> very
> > >>> >> detailed messages, Ceph logs alone on hosts with monitors and
> > several
> > >>> OSDs
> > >>> >> has already eaten through 50% of the endurance of the flash system
> > >>> drives
> > >>> >> over a couple of years.
> > >>> >>
> > >>> >> Cluster logging settings are default, and it seems that all
> daemons
> > >>> are
> > >>> >> writing lots and lots of debug information to the logs, such as
> for
> > >>> >> example: https://pastebin.com/ebZq8KZk (it's just a snippet, but
> > >>> there's
> > >>> >> lots and lots of various information).
> > >>> >>
> > >>> >> Is there a way to reduce the amount of logging and, for example,
> > >>> limit the
> > >>> >> logging to warnings or important messages so that it doesn't
> include
> > >>> every
> > >>> >> successful authentication attempt, compaction etc, etc, when the
> > >>> cluster is
> > >>> >> healthy and operating normally?
> > >>> >>
> > >>> >> I would very much appreciate your advice on this.
> > >>> >>
> > >>> >> Best regards,
> > >>> >> Zakhar
> > >>> >>
> > >>> >>
> > >>> >>
> > >>> > ___
> > >>> > ceph-users mailing list -- ceph-users@ceph.io
> > >>> > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >>>
> > >>
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.x excessive logging, how to reduce?

2023-10-09 Thread Zakhar Kirpichenko
Any input from anyone, please?

This part of Ceph is very poorly documented. Perhaps there's a better place
to ask this question? Please let me know.

/Z

On Sat, 7 Oct 2023 at 22:00, Zakhar Kirpichenko  wrote:

> Hi!
>
> I am still fighting excessive logging. I've reduced unnecessary logging
> from most components except for mon audit: https://pastebin.com/jjWvUEcQ
>
> How can I stop logging this particular type of message?
>
> I would appreciate your help and advice.
>
> /Z
>
> On Thu, 5 Oct 2023 at 06:47, Zakhar Kirpichenko  wrote:
>
>> Thank you for your response, Igor.
>>
>> Currently debug_rocksdb is set to 4/5:
>>
>> # ceph config get osd debug_rocksdb
>> 4/5
>>
>> This setting seems to be default. Is my understanding correct that you're
>> suggesting setting it to 3/5 or even 0/5? Would setting it to 0/5 have any
>> negative effects on the cluster?
>>
>> /Z
>>
>> On Wed, 4 Oct 2023 at 21:23, Igor Fedotov  wrote:
>>
>>> Hi Zakhar,
>>>
>>> to reduce rocksdb logging verbosity you might want to set debug_rocksdb
>>> to 3 (or 0).
>>>
>>> I presume it produces a  significant part of the logging traffic.
>>>
>>>
>>> Thanks,
>>>
>>> Igor
>>>
>>> On 04/10/2023 20:51, Zakhar Kirpichenko wrote:
>>> > Any input from anyone, please?
>>> >
>>> > On Tue, 19 Sept 2023 at 09:01, Zakhar Kirpichenko 
>>> wrote:
>>> >
>>> >> Hi,
>>> >>
>>> >> Our Ceph 16.2.x cluster managed by cephadm is logging a lot of very
>>> >> detailed messages, Ceph logs alone on hosts with monitors and several
>>> OSDs
>>> >> has already eaten through 50% of the endurance of the flash system
>>> drives
>>> >> over a couple of years.
>>> >>
>>> >> Cluster logging settings are default, and it seems that all daemons
>>> are
>>> >> writing lots and lots of debug information to the logs, such as for
>>> >> example: https://pastebin.com/ebZq8KZk (it's just a snippet, but
>>> there's
>>> >> lots and lots of various information).
>>> >>
>>> >> Is there a way to reduce the amount of logging and, for example,
>>> limit the
>>> >> logging to warnings or important messages so that it doesn't include
>>> every
>>> >> successful authentication attempt, compaction etc, etc, when the
>>> cluster is
>>> >> healthy and operating normally?
>>> >>
>>> >> I would very much appreciate your advice on this.
>>> >>
>>> >> Best regards,
>>> >> Zakhar
>>> >>
>>> >>
>>> >>
>>> > ___
>>> > ceph-users mailing list -- ceph-users@ceph.io
>>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.x excessive logging, how to reduce?

2023-10-07 Thread Zakhar Kirpichenko
Hi!

I am still fighting excessive logging. I've reduced unnecessary logging
from most components except for mon audit: https://pastebin.com/jjWvUEcQ
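
As far as I can tell the remaining entries all come in via the monitors'
audit channel, so the settings I'm looking at (not yet changed) are e.g.:

ceph config get mon mon_cluster_log_to_file
ceph config get mon mon_cluster_log_file_level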

How can I stop logging this particular type of message?

I would appreciate your help and advice.

/Z

On Thu, 5 Oct 2023 at 06:47, Zakhar Kirpichenko  wrote:

> Thank you for your response, Igor.
>
> Currently debug_rocksdb is set to 4/5:
>
> # ceph config get osd debug_rocksdb
> 4/5
>
> This setting seems to be the default. Is my understanding correct that you're
> suggesting setting it to 3/5 or even 0/5? Would setting it to 0/5 have any
> negative effects on the cluster?
>
> /Z
>
> On Wed, 4 Oct 2023 at 21:23, Igor Fedotov  wrote:
>
>> Hi Zakhar,
>>
>> to reduce rocksdb logging verbosity you might want to set debug_rocksdb
>> to 3 (or 0).
>>
>> I presume it produces a  significant part of the logging traffic.
>>
>>
>> Thanks,
>>
>> Igor
>>
>> On 04/10/2023 20:51, Zakhar Kirpichenko wrote:
>> > Any input from anyone, please?
>> >
>> > On Tue, 19 Sept 2023 at 09:01, Zakhar Kirpichenko 
>> wrote:
>> >
>> >> Hi,
>> >>
>> >> Our Ceph 16.2.x cluster managed by cephadm is logging a lot of very
>> >> detailed messages, Ceph logs alone on hosts with monitors and several
>> OSDs
>> >> has already eaten through 50% of the endurance of the flash system
>> drives
>> >> over a couple of years.
>> >>
>> >> Cluster logging settings are default, and it seems that all daemons are
>> >> writing lots and lots of debug information to the logs, such as for
>> >> example: https://pastebin.com/ebZq8KZk (it's just a snippet, but
>> there's
>> >> lots and lots of various information).
>> >>
>> >> Is there a way to reduce the amount of logging and, for example, limit
>> the
>> >> logging to warnings or important messages so that it doesn't include
>> every
>> >> successful authentication attempt, compaction etc, etc, when the
>> cluster is
>> >> healthy and operating normally?
>> >>
>> >> I would very much appreciate your advice on this.
>> >>
>> >> Best regards,
>> >> Zakhar
>> >>
>> >>
>> >>
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.x excessive logging, how to reduce?

2023-10-04 Thread Zakhar Kirpichenko
Thank you for your response, Igor.

Currently debug_rocksdb is set to 4/5:

# ceph config get osd debug_rocksdb
4/5

This setting seems to be the default. Is my understanding correct that you're
suggesting setting it to 3/5 or even 0/5? Would setting it to 0/5 have any
negative effects on the cluster?
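
For the record, if I go ahead with lowering it, I assume the change would be
applied roughly like this (a sketch only, not yet done here):

ceph config set osd debug_rocksdb 3/5              # persist in the config database
ceph tell osd.* injectargs '--debug_rocksdb=3/5'   # apply to running OSDs without restart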

/Z

On Wed, 4 Oct 2023 at 21:23, Igor Fedotov  wrote:

> Hi Zakhar,
>
> to reduce rocksdb logging verbosity you might want to set debug_rocksdb
> to 3 (or 0).
>
> I presume it produces a  significant part of the logging traffic.
>
>
> Thanks,
>
> Igor
>
> On 04/10/2023 20:51, Zakhar Kirpichenko wrote:
> > Any input from anyone, please?
> >
> > On Tue, 19 Sept 2023 at 09:01, Zakhar Kirpichenko 
> wrote:
> >
> >> Hi,
> >>
> >> Our Ceph 16.2.x cluster managed by cephadm is logging a lot of very
> >> detailed messages, Ceph logs alone on hosts with monitors and several
> OSDs
> >> has already eaten through 50% of the endurance of the flash system
> drives
> >> over a couple of years.
> >>
> >> Cluster logging settings are default, and it seems that all daemons are
> >> writing lots and lots of debug information to the logs, such as for
> >> example: https://pastebin.com/ebZq8KZk (it's just a snippet, but
> there's
> >> lots and lots of various information).
> >>
> >> Is there a way to reduce the amount of logging and, for example, limit
> the
> >> logging to warnings or important messages so that it doesn't include
> every
> >> successful authentication attempt, compaction etc, etc, when the
> cluster is
> >> healthy and operating normally?
> >>
> >> I would very much appreciate your advice on this.
> >>
> >> Best regards,
> >> Zakhar
> >>
> >>
> >>
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.x excessive logging, how to reduce?

2023-10-04 Thread Zakhar Kirpichenko
Any input from anyone, please?

On Tue, 19 Sept 2023 at 09:01, Zakhar Kirpichenko  wrote:

> Hi,
>
> Our Ceph 16.2.x cluster managed by cephadm is logging a lot of very
> detailed messages, Ceph logs alone on hosts with monitors and several OSDs
> have already eaten through 50% of the endurance of the flash system drives
> over a couple of years.
>
> Cluster logging settings are default, and it seems that all daemons are
> writing lots and lots of debug information to the logs, such as for
> example: https://pastebin.com/ebZq8KZk (it's just a snippet, but there's
> lots and lots of various information).
>
> Is there a way to reduce the amount of logging and, for example, limit the
> logging to warnings or important messages so that it doesn't include every
> successful authentication attempt, compaction, etc., when the cluster is
> healthy and operating normally?
>
> I would very much appreciate your advice on this.
>
> Best regards,
> Zakhar
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.14: [progress WARNING root] complete: ev {UUID} does not exist

2023-09-29 Thread Zakhar Kirpichenko
Many thanks for the clarification!

/Z

On Fri, 29 Sept 2023 at 16:43, Tyler Stachecki 
wrote:

>
>
> On Fri, Sep 29, 2023, 9:40 AM Zakhar Kirpichenko  wrote:
>
>> Thanks for the suggestion, Tyler! Do you think switching the progress
>> module off will have no material impact on the operation of the cluster?
>>
>
> It does not. It literally just tracks the completion rate of certain
> actions so that it can render progress bars and ETAs in e.g. `ceph -s` output.
>
>
>> /Z
>>
>> On Fri, 29 Sept 2023 at 14:13, Tyler Stachecki 
>> wrote:
>>
>>> On Fri, Sep 29, 2023, 5:55 AM Zakhar Kirpichenko 
>>> wrote:
>>>
>>>> Thank you, Eugen.
>>>>
>>>> Indeed it looks like the progress module had some stale events from the
>>>> time when we added new OSDs and set a specific number of PGs for pools,
>>>> while the autoscaler tried to scale them down. Somehow the scale-down
>>>> events got stuck in the progress log, although these tasks have
>>>> finished a
>>>> long time ago. Failing over to another MGR didn't help, so I have
>>>> cleared
>>>> the progress log.
>>>>
>>>> I also restarted both mgrs, but unfortunately the warnings are still
>>>> being
>>>> logged.
>>>>
>>>> /Z
>>>
>>>
>>> I would recommend just turning off the progress module via `ceph
>>> progress off`. It's historically been a source of bugs (like this...) and
>>> does not do much in the grand scheme of things.
>>>
>>>
>>>> On Fri, 29 Sept 2023 at 11:32, Eugen Block  wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > this is from the mgr progress module [1]. I haven't played too much
>>>> > with it yet, you can check out the output of 'ceph progress json',
>>>> > maybe there are old events from a (failed) upgrade etc. You can reset
>>>> > it with 'ceph progress clear', you could also turn it off ('ceph
>>>> > progress off') but I don't know what impact that would have, so maybe
>>>> > investigate first and then try just clearing it. Maybe a mgr failover
>>>> > would do the same, not sure.
>>>> >
>>>> > Regards,
>>>> > Eugen
>>>> >
>>>> > [1]
>>>> >
>>>> >
>>>> https://github.com/ceph/ceph/blob/1d10b71792f3be8887a7631e69851ac2df3585af/src/pybind/mgr/progress/module.py#L797
>>>> >
>>>> > Zitat von Zakhar Kirpichenko :
>>>> >
>>>> > > Hi,
>>>> > >
>>>> > > Mgr of my cluster logs this every few seconds:
>>>> > >
>>>> > > [progress WARNING root] complete: ev
>>>> 7de5bb74-790b-4fda-8838-e4af4af18c62
>>>> > > does not exist
>>>> > > [progress WARNING root] complete: ev
>>>> fff93fce-b630-4141-81ee-19e7a3e61483
>>>> > > does not exist
>>>> > > [progress WARNING root] complete: ev
>>>> a02f6966-5b9f-49e8-89c4-b4fb8e6f4423
>>>> > > does not exist
>>>> > > [progress WARNING root] complete: ev
>>>> 8d318560-ff1a-477f-9386-43f6b51080bf
>>>> > > does not exist
>>>> > > [progress WARNING root] complete: ev
>>>> ff3740a9-6434-470a-808f-a2762fb542a0
>>>> > > does not exist
>>>> > > [progress WARNING root] complete: ev
>>>> 7d0589f1-545e-4970-867b-8482ce48d7f0
>>>> > > does not exist
>>>> > > [progress WARNING root] complete: ev
>>>> 78d57e43-5be5-43f0-8b1a-cdc60e410892
>>>> > > does not exist
>>>> > >
>>>> > > I would appreciate an advice on what these warnings mean and how
>>>> they can
>>>> > > be resolved.
>>>> > >
>>>> > > Best regards,
>>>> > > Zakhar
>>>> > > ___
>>>> > > ceph-users mailing list -- ceph-users@ceph.io
>>>> > > To unsubscribe send an email to ceph-users-le...@ceph.io
>>>> >
>>>> >
>>>> > ___
>>>> > ceph-users mailing list -- ceph-users@ceph.io
>>>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>>> >
>>>> ___
>>>> ceph-users mailing list -- ceph-users@ceph.io
>>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>>
>>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.14: [progress WARNING root] complete: ev {UUID} does not exist

2023-09-29 Thread Zakhar Kirpichenko
Thanks for the suggestion, Tyler! Do you think switching the progress
module off will have no material impact on the operation of the cluster?
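
(Just to be sure I understand the suggestion correctly, I assume it amounts
to running `ceph progress off`, with `ceph progress on` available to
re-enable the module later if needed.)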

/Z

On Fri, 29 Sept 2023 at 14:13, Tyler Stachecki 
wrote:

> On Fri, Sep 29, 2023, 5:55 AM Zakhar Kirpichenko  wrote:
>
>> Thank you, Eugen.
>>
>> Indeed it looks like the progress module had some stale events from the
>> time when we added new OSDs and set a specific number of PGs for pools,
>> while the autoscaler tried to scale them down. Somehow the scale-down
> events got stuck in the progress log, although these tasks had finished a
>> long time ago. Failing over to another MGR didn't help, so I have cleared
>> the progress log.
>>
>> I also restarted both mgrs, but unfortunately the warnings are still being
>> logged.
>>
>> /Z
>
>
> I would recommend just turning off the progress module via `ceph progress
> off`. It's historically been a source of bugs (like this...) and does not
> do much in the grand scheme of things.
>
>
>> On Fri, 29 Sept 2023 at 11:32, Eugen Block  wrote:
>>
>> > Hi,
>> >
>> > this is from the mgr progress module [1]. I haven't played too much
>> > with it yet, you can check out the output of 'ceph progress json',
>> > maybe there are old events from a (failed) upgrade etc. You can reset
>> > it with 'ceph progress clear', you could also turn it off ('ceph
>> > progress off') but I don't know what impact that would have, so maybe
>> > investigate first and then try just clearing it. Maybe a mgr failover
>> > would do the same, not sure.
>> >
>> > Regards,
>> > Eugen
>> >
>> > [1]
>> >
>> >
>> https://github.com/ceph/ceph/blob/1d10b71792f3be8887a7631e69851ac2df3585af/src/pybind/mgr/progress/module.py#L797
>> >
>> > Zitat von Zakhar Kirpichenko :
>> >
>> > > Hi,
>> > >
>> > > Mgr of my cluster logs this every few seconds:
>> > >
>> > > [progress WARNING root] complete: ev
>> 7de5bb74-790b-4fda-8838-e4af4af18c62
>> > > does not exist
>> > > [progress WARNING root] complete: ev
>> fff93fce-b630-4141-81ee-19e7a3e61483
>> > > does not exist
>> > > [progress WARNING root] complete: ev
>> a02f6966-5b9f-49e8-89c4-b4fb8e6f4423
>> > > does not exist
>> > > [progress WARNING root] complete: ev
>> 8d318560-ff1a-477f-9386-43f6b51080bf
>> > > does not exist
>> > > [progress WARNING root] complete: ev
>> ff3740a9-6434-470a-808f-a2762fb542a0
>> > > does not exist
>> > > [progress WARNING root] complete: ev
>> 7d0589f1-545e-4970-867b-8482ce48d7f0
>> > > does not exist
>> > > [progress WARNING root] complete: ev
>> 78d57e43-5be5-43f0-8b1a-cdc60e410892
>> > > does not exist
>> > >
> > > I would appreciate advice on what these warnings mean and how they
>> can
>> > > be resolved.
>> > >
>> > > Best regards,
>> > > Zakhar
>> > > ___
>> > > ceph-users mailing list -- ceph-users@ceph.io
>> > > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>> >
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.14: [progress WARNING root] complete: ev {UUID} does not exist

2023-09-29 Thread Zakhar Kirpichenko
Thank you, Eugen.

Indeed it looks like the progress module had some stale events from the
time when we added new OSDs and set a specific number of PGs for pools,
while the autoscaler tried to scale them down. Somehow the scale-down
events got stuck in the progress log, although these tasks had finished a
long time ago. Failing over to another MGR didn't help, so I have cleared
the progress log.

I also restarted both mgrs, but unfortunately the warnings are still being
logged.
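
In case it helps anyone else hitting this, the steps above map to commands
along these lines (cephadm-managed cluster, daemon names are placeholders):

ceph progress json                          # inspect the recorded events
ceph progress clear                         # drop the stale ones
ceph mgr fail                               # fail over to the standby mgr
ceph orch daemon restart mgr.<host>.<id>    # restart a specific mgr daemon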

/Z

On Fri, 29 Sept 2023 at 11:32, Eugen Block  wrote:

> Hi,
>
> this is from the mgr progress module [1]. I haven't played too much
> with it yet, you can check out the output of 'ceph progress json',
> maybe there are old events from a (failed) upgrade etc. You can reset
> it with 'ceph progress clear', you could also turn it off ('ceph
> progress off') but I don't know what impact that would have, so maybe
> investigate first and then try just clearing it. Maybe a mgr failover
> would do the same, not sure.
>
> Regards,
> Eugen
>
> [1]
>
> https://github.com/ceph/ceph/blob/1d10b71792f3be8887a7631e69851ac2df3585af/src/pybind/mgr/progress/module.py#L797
>
> Zitat von Zakhar Kirpichenko :
>
> > Hi,
> >
> > Mgr of my cluster logs this every few seconds:
> >
> > [progress WARNING root] complete: ev 7de5bb74-790b-4fda-8838-e4af4af18c62
> > does not exist
> > [progress WARNING root] complete: ev fff93fce-b630-4141-81ee-19e7a3e61483
> > does not exist
> > [progress WARNING root] complete: ev a02f6966-5b9f-49e8-89c4-b4fb8e6f4423
> > does not exist
> > [progress WARNING root] complete: ev 8d318560-ff1a-477f-9386-43f6b51080bf
> > does not exist
> > [progress WARNING root] complete: ev ff3740a9-6434-470a-808f-a2762fb542a0
> > does not exist
> > [progress WARNING root] complete: ev 7d0589f1-545e-4970-867b-8482ce48d7f0
> > does not exist
> > [progress WARNING root] complete: ev 78d57e43-5be5-43f0-8b1a-cdc60e410892
> > does not exist
> >
> > I would appreciate advice on what these warnings mean and how they can
> > be resolved.
> >
> > Best regards,
> > Zakhar
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] 16.2.14: [progress WARNING root] complete: ev {UUID} does not exist

2023-09-29 Thread Zakhar Kirpichenko
Hi,

Mgr of my cluster logs this every few seconds:

[progress WARNING root] complete: ev 7de5bb74-790b-4fda-8838-e4af4af18c62
does not exist
[progress WARNING root] complete: ev fff93fce-b630-4141-81ee-19e7a3e61483
does not exist
[progress WARNING root] complete: ev a02f6966-5b9f-49e8-89c4-b4fb8e6f4423
does not exist
[progress WARNING root] complete: ev 8d318560-ff1a-477f-9386-43f6b51080bf
does not exist
[progress WARNING root] complete: ev ff3740a9-6434-470a-808f-a2762fb542a0
does not exist
[progress WARNING root] complete: ev 7d0589f1-545e-4970-867b-8482ce48d7f0
does not exist
[progress WARNING root] complete: ev 78d57e43-5be5-43f0-8b1a-cdc60e410892
does not exist

I would appreciate advice on what these warnings mean and how they can
be resolved.

Best regards,
Zakhar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.x excessive logging, how to reduce?

2023-09-19 Thread Zakhar Kirpichenko
Any input from anyone, please?

On Tue, 19 Sept 2023 at 09:01, Zakhar Kirpichenko  wrote:

> Hi,
>
> Our Ceph 16.2.x cluster managed by cephadm is logging a lot of very
> detailed messages, Ceph logs alone on hosts with monitors and several OSDs
> have already eaten through 50% of the endurance of the flash system drives
> over a couple of years.
>
> Cluster logging settings are default, and it seems that all daemons are
> writing lots and lots of debug information to the logs, such as for
> example: https://pastebin.com/ebZq8KZk (it's just a snippet, but there's
> lots and lots of various information).
>
> Is there a way to reduce the amount of logging and, for example, limit the
> logging to warnings or important messages so that it doesn't include every
> successful authentication attempt, compaction, etc., when the cluster is
> healthy and operating normally?
>
> I would very much appreciate your advice on this.
>
> Best regards,
> Zakhar
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph 16.2.x excessive logging, how to reduce?

2023-09-19 Thread Zakhar Kirpichenko
Hi,

Our Ceph 16.2.x cluster managed by cephadm is logging a lot of very
detailed messages; the Ceph logs alone on hosts with monitors and several
OSDs have already eaten through 50% of the endurance of the flash system
drives over a couple of years.

Cluster logging settings are default, and it seems that all daemons are
writing large amounts of debug information to the logs, for example:
https://pastebin.com/ebZq8KZk (it's just a snippet, but there's much more
of the same).
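
(Nothing logging-related appears to be overridden from the defaults; as far
as I can tell, a quick check along the lines of `ceph config dump | grep -iE
'debug|log'` comes back essentially empty on this cluster.)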

Is there a way to reduce the amount of logging and, for example, limit the
logging to warnings or important messages, so that it doesn't include every
successful authentication attempt, compaction, etc., when the cluster is
healthy and operating normally?

I would very much appreciate your advice on this.

Best regards,
Zakhar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

