[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-16 Thread Laura Flores
On behalf of @Radoslaw Zarzynski , rados approved.

Summary of known failures here:
https://tracker.ceph.com/projects/rados/wiki/QUINCY#Quincy-v1727-validation

On Mon, Oct 16, 2023 at 3:17 PM Ilya Dryomov  wrote:

> On Mon, Oct 16, 2023 at 8:52 PM Yuri Weinstein 
> wrote:
> >
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/63219#note-2
> > Release Notes - TBD
> >
> > Issue https://tracker.ceph.com/issues/63192 appears to be failing
> several runs.
> > Should it be fixed for this release?
> >
> > Seeking approvals/reviews for:
> >
> > smoke - Laura
> > rados - Laura, Radek, Travis, Ernesto, Adam King
> >
> > rgw - Casey
> > fs - Venky
> > orch - Adam King
> >
> > rbd - Ilya
> > krbd - Ilya
>
> rbd and krbd approved.
>
> Thanks,
>
> Ilya
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-16 Thread Ilya Dryomov
On Mon, Oct 16, 2023 at 8:52 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/63219#note-2
> Release Notes - TBD
>
> Issue https://tracker.ceph.com/issues/63192 appears to be failing several 
> runs.
> Should it be fixed for this release?
>
> Seeking approvals/reviews for:
>
> smoke - Laura
> rados - Laura, Radek, Travis, Ernesto, Adam King
>
> rgw - Casey
> fs - Venky
> orch - Adam King
>
> rbd - Ilya
> krbd - Ilya

rbd and krbd approved.

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Dashboard and Object Gateway

2023-10-16 Thread Tim Holloway
First, an abject apology for the horrors I'm about to unveil. I made a
cold migration from GlusterFS to Ceph a few months back, so it was a
learn-/screwup/-as-you-go affair.

For reasons of presumed compatibility with some of my older servers, I
started with Ceph Octopus. Unfortunately, Octopus seems to have been a
nexus of transitions from older Ceph organization and management to a
newer (cephadm) system combined with a relocation of many ceph
resources and compounded by stale bits of documentation (notably some
references to SysV procedures and an obsolete installer that doesn't
even come with Octopus).

A far bigger problem was a known issue where actions would be scheduled
but never executed if the system was even slightly dirty. And of
course, since my system was hopelessly dirty, that was a major issue.
Finally I took a risk and bumped up to Pacific, where that issue no
longer exists. I won't say that I'm 100% clean even now, but at least
the remaining crud is in areas where it cannot do any harm. Presumably.

Given that, the only bar now remaining to total joy has been my
inability to connect via the Ceph Dashboard to the Object Gateway.

This seems to be an oft-reported problem, but generally referenced
relative to higher-level administrative interfaces like Kubernetes and
rook. I'm interfacing more directly, however. Regardless, the error
reported is notably familiar:

[quote]
The Object Gateway Service is not configured
Error connecting to Object Gateway: RGW REST API failed request with
status code 404
(b'{"Code":"NoSuchBucket","Message":"","BucketName":"default","RequestI
d":"tx00' b'000dd0c65b8bda685b4-00652d8e0f-5e3a9b-
default","HostId":"5e3a9b-default-defa' b'ult"}')
Please consult the documentation on how to configure and enable the
Object Gateway management functionality. 
[/quote]

In point of fact, what this REALLY means in my case is that the bucket
that is supposed to contain the necessary information for the dashboard
and rgw to communicate has not been created. Presumably that SHOULD have
been done by the "ceph dashboard set-rgw-credentials" command, but
apparently wasn't, because the default zone has no buckets at all, much
less one named "default".
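
For what it's worth, the only sanity checks I can think of are along these
lines (the "dashboard" uid is a guess at what set-rgw-credentials is
supposed to create):

radosgw-admin user list
radosgw-admin user info --uid=dashboard
ceph dashboard set-rgw-credentials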

By way of reference, the dashboard is definitely trying to interact
with the rgw container, because trying object gateway options on the
dashboard results in the container logging the following.

beast: 0x7efd29621620: 10.0.1.16 - dashboard [16/Oct/2023:19:25:03.678
+] "GET /default/metadata/user?myself HTTP/1.1" 404

To make everything happy, I'd be glad to accept instructions on how to
manually brute-force construct this bucket.

Of course, as a cleaner long-term solution, it would be nice if the
failure to create the bucket could be detected and logged.

And of course, the ultimate solution: something that would assist in
making whatever processes are unhappy be happy.

Thanks,
  Tim
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] quincy v17.2.7 QE Validation status

2023-10-16 Thread Yuri Weinstein
Details of this release are summarized here:

https://tracker.ceph.com/issues/63219#note-2
Release Notes - TBD

Issue https://tracker.ceph.com/issues/63192 appears to be failing several runs.
Should it be fixed for this release?

Seeking approvals/reviews for:

smoke - Laura
rados - Laura, Radek, Travis, Ernesto, Adam King

rgw - Casey
fs - Venky
orch - Adam King

rbd - Ilya
krbd - Ilya

upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve

client-upgrade-quincy-reef - Laura

powercycle - Brad pls confirm

ceph-volume - Guillaume pls take a look

Please reply to this email with approval and/or trackers of known
issues/PRs to address them.

Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef release.

Thx
YuriW
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Unable to delete rbd images

2023-10-16 Thread Mohammad Alam
Hello All,
Greetings. We have a Ceph cluster running version
*ceph version 14.2.16-402-g7d47dbaf4d
(7d47dbaf4d0960a2e910628360ae36def84ed913) nautilus (stable)
=

Issue: unable to delete RBD images

We deleted the target from the dashboard and are now trying to delete the RBD
images from the CLI, but the deletion fails.

When we run "rbd rm -f tegile-500tb -p iscsi-images", it returns:
2023-10-16 15:22:16.719 7f90bb332700 -1 librbd::image::PreRemoveRequest: 
0x7f90a80041a0 check_image_watchers: image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try again 
after closing/unmapping it or waiting 30s for the crashed client to timeout.


The image also cannot be deleted from the dashboard.


We also tried to list the watchers, but the command returns nothing except "no
such file or directory":



"rbd info iscsi-images/tegile-500tb"
rbd: error opening image tegile-500tb: (2) No such file or directory



The image does not appear in "rbd showmapped" output either, hence we cannot
unmap it.

We cannot restart the iSCSI gateway because it is in active use and we cannot
interrupt it.
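
Before doing anything risky, we were thinking of something along these lines to
find and evict the stale watcher (the <image_id> would come from "rbd info", if
it worked, and the watcher address from the listing), but we are not sure it is
safe while the gateway is running:

rbd status iscsi-images/tegile-500tb
rados -p iscsi-images listwatchers rbd_header.<image_id>
ceph osd blacklist add <watcher_address>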

===

Please suggest how to fix this issue.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-10-16 Thread 544463199
I encountered a similar problem on Ceph 17.2.5. Did you manage to find which 
commit caused it?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How do you handle large Ceph object storage cluster?

2023-10-16 Thread pawel . przestrzelski
Hi Everyone,

My company is dealing with a quite large Ceph cluster (>10k OSDs, >60 PB of 
data). It is entirely dedicated to object storage with an S3 interface. 
Maintenance and expansion are getting more and more problematic and time 
consuming. We are considering splitting it into two or more completely separate 
clusters (without replication of data among them) and creating an S3 abstraction 
layer with some additional metadata that will allow us to use these 2+ physically 
independent instances as one logical cluster. Additionally, the newest data is 
in highest demand, so we have to spread it evenly among the clusters to avoid 
skews in cluster load.

Do you have any similar experience? How did you handle it? Maybe you have some 
advice? I'm not a Ceph expert, just a Ceph user and software developer who 
doesn't like to duplicate someone else's work.

Best,
Paweł
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-16 Thread Zakhar Kirpichenko
With the help of community members, I managed to enable RocksDB compression
for a test monitor, and it seems to be working well.

Monitor w/o compression writes about 750 MB to disk in 5 minutes:

   4854 be/4 167      4.97 M  755.02 M  0.00 %  0.24 % ceph-mon -n
mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:low0]

Monitor with LZ4 compression writes about 1/4 of that over the same time
period:

2034728 be/4 167    172.00 K  199.27 M  0.00 %  0.06 % ceph-mon -n
mon.ceph05 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:low0]
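
These are accumulated per-thread figures from iotop, collected with something
along these lines (exact invocation from memory):

# iotop -a -o    # accumulating mode, only threads doing I/O; watch the [rocksdb:low0] rows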

This is caused by the apparent difference in store.db sizes.

Mon store.db w/o compression:

# ls -al
/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db
total 257196
drwxr-xr-x 2 167 167 4096 Oct 16 14:00 .
drwx-- 3 167 167 4096 Aug 31 05:22 ..
-rw-r--r-- 1 167 167  1517623 Oct 16 14:00 3073035.log
-rw-r--r-- 1 167 167 67285944 Oct 16 14:00 3073037.sst
-rw-r--r-- 1 167 167 67402325 Oct 16 14:00 3073038.sst
-rw-r--r-- 1 167 167 62364991 Oct 16 14:00 3073039.sst

Mon store.db with compression:

# ls -al
/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph05/store.db
total 91188
drwxr-xr-x 2 167 167 4096 Oct 16 14:00 .
drwx-- 3 167 167 4096 Oct 16 13:35 ..
-rw-r--r-- 1 167 167  1760114 Oct 16 14:00 012693.log
-rw-r--r-- 1 167 167 52236087 Oct 16 14:00 012695.sst

There are no apparent downsides thus far. If everything works well, I will
try adding compression to other monitors.
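
In case it is useful to others: whether compression is actually in effect can be
seen from the monitor's startup log. With a cephadm deployment, something like
this shows the RocksDB options the mon came up with (mon name as above):

# cephadm logs --name mon.ceph05 | grep -E 'Options.compression|kLZ4Compression'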

/Z

On Mon, 16 Oct 2023 at 14:57, Zakhar Kirpichenko  wrote:

> The issue persists, although to a lesser extent. Any comments from the
> Ceph team please?
>
> /Z
>
> On Fri, 13 Oct 2023 at 20:51, Zakhar Kirpichenko  wrote:
>
>> > Some of it is transferable to RocksDB on mons nonetheless.
>>
>> Please point me to relevant Ceph documentation, i.e. a description of how
>> various Ceph monitor and RocksDB tunables affect the operations of
>> monitors, I'll gladly look into it.
>>
>> > Please point me to such recommendations, if they're on docs.ceph.com I'll
>> get them updated.
>>
>> This are the recommendations we used when we built our Pacific cluster:
>> https://docs.ceph.com/en/pacific/start/hardware-recommendations/
>>
>> Our drives are 4x times larger than recommended by this guide. The drives
>> are rated for < 0.5 DWPD, which is more than sufficient for boot drives and
>> storage of rarely modified files. It is not documented or suggested
>> anywhere that monitor processes write several hundred gigabytes of data per
>> day, exceeding the amount of data written by OSDs. Which is why I am not
>> convinced that what we're observing is expected behavior, but it's not easy
>> to get a definitive answer from the Ceph community.
>>
>> /Z
>>
>> On Fri, 13 Oct 2023 at 20:35, Anthony D'Atri 
>> wrote:
>>
>>> Some of it is transferable to RocksDB on mons nonetheless.
>>>
>>> but their specs exceed Ceph hardware recommendations by a good margin
>>>
>>>
>>> Please point me to such recommendations, if they're on docs.ceph.com I'll
>>> get them updated.
>>>
>>> On Oct 13, 2023, at 13:34, Zakhar Kirpichenko  wrote:
>>>
>>> Thank you, Anthony. As I explained to you earlier, the article you had
>>> sent is about RocksDB tuning for Bluestore OSDs, while the issue at hand is
>>> not with OSDs but rather monitors and their RocksDB store. Indeed, the
>>> drives are not enterprise-grade, but their specs exceed Ceph hardware
>>> recommendations by a good margin, they're being used as boot drives only
>>> and aren't supposed to be written to continuously at high rates - which is
>>> what unfortunately is happening. I am trying to determine why it is
>>> happening and how the issue can be alleviated or resolved, unfortunately
>>> monitor RocksDB usage and tunables appear to be not documented at all.
>>>
>>> /Z
>>>
>>> On Fri, 13 Oct 2023 at 20:11, Anthony D'Atri 
>>> wrote:
>>>
 cf. Mark's article I sent you re RocksDB tuning.  I suspect that with
 Reef you would experience fewer writes.  Universal compaction might also
 help, but in the end this SSD is a client SKU and really not suited for
 enterprise use.  If you had the 1TB SKU you'd get much longer life, or you
 could change the overprovisioning on the ones you have.

 On Oct 13, 2023, at 12:30, Zakhar Kirpichenko  wrote:

 I would very much appreciate it if someone with a better understanding
 of
 monitor internals and use of RocksDB could please chip in.



>>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: how to set mon_rocksdb_options to enable RocksDB compression?

2023-10-16 Thread Zakhar Kirpichenko
Thanks for the suggestion, Josh!

 That part is relatively simple: the container gets ceph.conf from the
host's filesystem, for example:

"HostConfig": {
"Binds": [
"/dev:/dev",
"/run/udev:/run/udev",

"/var/run/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86:/var/run/ceph:z",

"/var/log/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86:/var/log/ceph:z",

"/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/crash:/var/lib/ceph/crash:z",

"/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph05:/var/lib/ceph/mon/ceph-ceph05:z",

"/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph05/config:/etc/ceph/ceph.conf:z"
],

When I stop the monitor, edit the file directly and restart the monitor,
mon_rocksdb_options seem to be applied correctly!
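
For reference, the kind of entry that has to end up in that file is along these
lines (I am not certain whether the section matters, [mon] vs [global]; the
option string is the same one used earlier in this thread):

[mon]
        mon_rocksdb_options = write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true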

Unfortunately, if I specify mon_rocksdb_options globally and redeploy the
monitor, the new ceph.conf doesn't have mon_rocksdb_options at all. I am
not sure that editing the file directly is a reliable way to enable
compression, but it works - so it's better than other ways which don't work :-)

/Z

On Mon, 16 Oct 2023 at 16:16, Josh Baergen 
wrote:

> > the resulting ceph.conf inside the monitor container doesn't have
> mon_rocksdb_options
>
> I don't know where this particular ceph.conf copy comes from, but I
> still suspect that this is where this particular option needs to be
> set. The reason I think this is that rocksdb mount options are needed
> _before_ the mon is able to access any of the centralized conf data,
> which I believe is itself stored in rocksdb.
>
> Josh
>
> On Sun, Oct 15, 2023 at 10:29 PM Zakhar Kirpichenko 
> wrote:
> >
> > Out of curiosity, I tried setting mon_rocksdb_options via ceph.conf.
> This didn't work either: ceph.conf gets overridden at monitor start, the
> resulting ceph.conf inside the monitor container doesn't have
> mon_rocksdb_options, the monitor starts with no RocksDB compression.
> >
> > I would appreciate it if someone from the Ceph team could please chip in
> and suggest a working way to enable RocksDB compression in Ceph monitors.
> >
> > /Z
> >
> > On Sat, 14 Oct 2023 at 19:16, Zakhar Kirpichenko 
> wrote:
> >>
> >> Thanks for your response, Josh. Our ceph.conf doesn't have anything but
> the mon addresses, modern Ceph versions store their configuration in the
> monitor configuration database.
> >>
> >> This works rather well for various Ceph components, including the
> monitors. RocksDB options are also applied to monitors correctly, but for
> some reason are being ignored.
> >>
> >> /Z
> >>
> >> On Sat, 14 Oct 2023, 17:40 Josh Baergen, 
> wrote:
> >>>
> >>> Apologies if you tried this already and I missed it - have you tried
> >>> configuring that setting in /etc/ceph/ceph.conf (or wherever your conf
> >>> file is) instead of via 'ceph config'? I wonder if mon settings like
> >>> this one won't actually apply the way you want because they're needed
> >>> before the mon has the ability to obtain configuration from,
> >>> effectively, itself.
> >>>
> >>> Josh
> >>>
> >>> On Sat, Oct 14, 2023 at 1:32 AM Zakhar Kirpichenko 
> wrote:
> >>> >
> >>> > I also tried setting RocksDB compression options and deploying a new
> >>> > monitor. The monitor started with no RocksDB compression again.
> >>> >
> >>> > Ceph monitors seem to ignore mon_rocksdb_options set at runtime, at
> mon
> >>> > start and at mon deploy. How can I enable RocksDB compression in Ceph
> >>> > monitors?
> >>> >
> >>> > Any input from anyone, please?
> >>> >
> >>> > /Z
> >>> >
> >>> > On Fri, 13 Oct 2023 at 23:01, Zakhar Kirpichenko 
> wrote:
> >>> >
> >>> > > Hi,
> >>> > >
> >>> > > I'm still trying to fight large Ceph monitor writes. One option I
> >>> > > considered is enabling RocksDB compression, as our nodes have more
> than
> >>> > > sufficient RAM and CPU. Unfortunately, monitors seem to completely
> ignore
> >>> > > the compression setting:
> >>> > >
> >>> > > I tried:
> >>> > >
> >>> > > - setting ceph config set mon.ceph05 mon_rocksdb_options
> >>> > >
> "write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true",
> >>> > > restarting the test monitor. The monitor started with no RocksDB
> >>> > > compression:
> >>> > >
> >>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> Compression
> >>> > > algorithms supported:
> >>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> >>> > > kZSTDNotFinalCompression supported: 0
> >>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> >>> > > kXpressCompression supported: 0
> >>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> >>> > > kLZ4HCCompression supported: 1
> >>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> >>> > > kLZ4Compression supported: 1
> >>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> >>> > > kBZip2Compression supported: 0
> >>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
> >>> > > kZlibCompression 

[ceph-users] Re: Ceph 16.2.14: how to set mon_rocksdb_options to enable RocksDB compression?

2023-10-16 Thread Josh Baergen
> the resulting ceph.conf inside the monitor container doesn't have 
> mon_rocksdb_options

I don't know where this particular ceph.conf copy comes from, but I
still suspect that this is where this particular option needs to be
set. The reason I think this is that rocksdb mount options are needed
_before_ the mon is able to access any of the centralized conf data,
which I believe is itself stored in rocksdb.

Josh

On Sun, Oct 15, 2023 at 10:29 PM Zakhar Kirpichenko  wrote:
>
> Out of curiosity, I tried setting mon_rocksdb_options via ceph.conf. This 
> didn't work either: ceph.conf gets overridden at monitor start, the resulting 
> ceph.conf inside the monitor container doesn't have mon_rocksdb_options, the 
> monitor starts with no RocksDB compression.
>
> I would appreciate it if someone from the Ceph team could please chip in and 
> suggest a working way to enable RocksDB compression in Ceph monitors.
>
> /Z
>
> On Sat, 14 Oct 2023 at 19:16, Zakhar Kirpichenko  wrote:
>>
>> Thanks for your response, Josh. Our ceph.conf doesn't have anything but the 
>> mon addresses, modern Ceph versions store their configuration in the monitor 
>> configuration database.
>>
>> This works rather well for various Ceph components, including the monitors. 
>> RocksDB options are also applied to monitors correctly, but for some reason 
>> are being ignored.
>>
>> /Z
>>
>> On Sat, 14 Oct 2023, 17:40 Josh Baergen,  wrote:
>>>
>>> Apologies if you tried this already and I missed it - have you tried
>>> configuring that setting in /etc/ceph/ceph.conf (or wherever your conf
>>> file is) instead of via 'ceph config'? I wonder if mon settings like
>>> this one won't actually apply the way you want because they're needed
>>> before the mon has the ability to obtain configuration from,
>>> effectively, itself.
>>>
>>> Josh
>>>
>>> On Sat, Oct 14, 2023 at 1:32 AM Zakhar Kirpichenko  wrote:
>>> >
>>> > I also tried setting RocksDB compression options and deploying a new
>>> > monitor. The monitor started with no RocksDB compression again.
>>> >
>>> > Ceph monitors seem to ignore mon_rocksdb_options set at runtime, at mon
>>> > start and at mon deploy. How can I enable RocksDB compression in Ceph
>>> > monitors?
>>> >
>>> > Any input from anyone, please?
>>> >
>>> > /Z
>>> >
>>> > On Fri, 13 Oct 2023 at 23:01, Zakhar Kirpichenko  wrote:
>>> >
>>> > > Hi,
>>> > >
>>> > > I'm still trying to fight large Ceph monitor writes. One option I
>>> > > considered is enabling RocksDB compression, as our nodes have more than
>>> > > sufficient RAM and CPU. Unfortunately, monitors seem to completely 
>>> > > ignore
>>> > > the compression setting:
>>> > >
>>> > > I tried:
>>> > >
>>> > > - setting ceph config set mon.ceph05 mon_rocksdb_options
>>> > > "write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true",
>>> > > restarting the test monitor. The monitor started with no RocksDB
>>> > > compression:
>>> > >
>>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb: Compression
>>> > > algorithms supported:
>>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>>> > > kZSTDNotFinalCompression supported: 0
>>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>>> > > kXpressCompression supported: 0
>>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>>> > > kLZ4HCCompression supported: 1
>>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>>> > > kLZ4Compression supported: 1
>>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>>> > > kBZip2Compression supported: 0
>>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>>> > > kZlibCompression supported: 1
>>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>>> > > kSnappyCompression supported: 1
>>> > > ...
>>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>>> > >  Options.compression: NoCompression
>>> > > debug 2023-10-13T19:47:00.403+ 7f1cd967a880  4 rocksdb:
>>> > >Options.bottommost_compression: Disabled
>>> > >
>>> > > - setting ceph config set mon mon_rocksdb_options
>>> > > "write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true",
>>> > > restarting the test monitor. The monitor started with no RocksDB
>>> > > compression, the same way as above.
>>> > >
>>> > > In each case config options were correctly set and readable with config
>>> > > get. I also found a suggestion in ceph-users (
>>> > > https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/KJM232IHN7FKYI5LODUREN7SVO45BL42/)
>>> > > to set compression in a similar manner. Unfortunately, these options 
>>> > > appear
>>> > > to be ignored.
>>> > >
>>> > > How can I enable RocksDB compression in Ceph monitors?
>>> > >
>>> > > I would very much appreciate your advices and comments.
>>> > >
>>> > > Best regards,
>>> > > Zakhar
>>> > >
>>> > >
>>> > >
>>> > ___
>>> 

[ceph-users] Re: find PG with large omap object

2023-10-16 Thread Frank Schilder
Hi Eugen,

the warning threshold is per omap object, not per PG (which apparently has more 
than one omap object). Still, I misread the numbers by an order of magnitude, and 
the difference between the last two entries (roughly 90,000 keys) does point 
towards a large omap object in PG 12.193.

I issued a deep-scrub on this PG and the warning is resolved.
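
For the archives, that was just a matter of something like:

# ceph pg deep-scrub 12.193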

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Eugen Block 
Sent: Monday, October 16, 2023 2:41 PM
To: Frank Schilder
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: find PG with large omap object

Hi Frank,

> # jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_keys}] |
> sort_by(.nk)' pgs.dump | tail
>   },
>   {
> "id": "12.17b",
> "nk": 1493776
>   },
>   {
> "id": "12.193",
> "nk": 1583589
>   }

those numbers are > 1 million and the warning threshold is 200k. So a
warning is expected.

Zitat von Frank Schilder :

> Hi Eugen,
>
> thanks for the one-liner :) I'm afraid I'm in the same position as
> before though.
>
> I dumped all PGs to a file and executed these 2 commands:
>
> # jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_bytes}]
> | sort_by(.nk)' pgs.dump | tail
>   },
>   {
> "id": "12.193",
> "nk": 1002401056
>   },
>   {
> "id": "21.0",
> "nk": 1235777228
>   }
> ]
>
> # jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_keys}] |
> sort_by(.nk)' pgs.dump | tail
>   },
>   {
> "id": "12.17b",
> "nk": 1493776
>   },
>   {
> "id": "12.193",
> "nk": 1583589
>   }
> ]
>
> Neither is beyond the warn limit and pool 12 is indeed the pool
> where the warnings came from. OK, now back to the logs:
>
> # zgrep -i 'Large omap object found. Object:' /var/log/ceph/ceph.log-*
> /var/log/ceph/ceph.log-20231008.gz:2023-10-05T01:25:14.581962+0200
> osd.592 (osd.592) 104 : cluster [WRN] Large omap object found.
> Object: 12:c05de58b:::63b.:head PG: 12.d1a7ba03 (12.3) Key
> count: 21 Size (bytes): 230080309
> /var/log/ceph/ceph.log-20231008.gz:2023-10-07T04:33:02.678879+0200
> osd.949 (osd.949) 6897 : cluster [WRN] Large omap object found.
> Object: 12:c9a32586:::63a.:head PG: 12.61a4c593 (12.193) Key
> count: 200243 Size (bytes): 230307097
> /var/log/ceph/ceph.log-20231008.gz:2023-10-07T07:22:40.512228+0200
> osd.988 (osd.988) 4365 : cluster [WRN] Large omap object found.
> Object: 12:eb96322f:::637.:head PG: 12.f44c69d7 (12.1d7) Key
> count: 200329 Size (bytes): 230310393
> /var/log/ceph/ceph.log-20231008.gz:2023-10-07T15:08:03.785186+0200
> osd.50 (osd.50) 4549 : cluster [WRN] Large omap object found.
> Object: 12:08fb0eb7:::635.:head PG: 12.ed70df10 (12.110) Key
> count: 200183 Size (bytes): 230150641
> /var/log/ceph/ceph.log-20231008.gz:2023-10-07T16:37:12.901470+0200
> osd.18 (osd.18) 7011 : cluster [WRN] Large omap object found.
> Object: 12:d6758956:::634.:head PG: 12.6a91ae6b (12.6b) Key
> count: 200247 Size (bytes): 230343371
> /var/log/ceph/ceph.log-20231008.gz:2023-10-08T01:25:16.125068+0200
> osd.980 (osd.980) 308 : cluster [WRN] Large omap object found.
> Object: 12:63f985e7:::639.:head PG: 12.e7a19fc6 (12.1c6) Key
> count: 200160 Size (bytes): 230179282
> /var/log/ceph/ceph.log-20231015:2023-10-09T00:51:32.587849+0200
> osd.563 (osd.563) 3661 : cluster [WRN] Large omap object found.
> Object: 12:44346421:::632.:head PG: 12.84262c22 (12.22) Key
> count: 200325 Size (bytes): 230481029
> /var/log/ceph/ceph.log-20231015:2023-10-09T15:35:28.803117+0200
> osd.949 (osd.949) 7088 : cluster [WRN] Large omap object found.
> Object: 12:c9a32586:::63a.:head PG: 12.61a4c593 (12.193) Key
> count: 200327 Size (bytes): 230404872
> /var/log/ceph/ceph.log-20231015:2023-10-09T18:51:35.615096+0200
> osd.592 (osd.592) 461 : cluster [WRN] Large omap object found.
> Object: 12:c05de58b:::63b.:head PG: 12.d1a7ba03 (12.3) Key
> count: 200228 Size (bytes): 230347361
>
> The warnings report a key count > 20, but none of the PGs in the
> dump does. Apparently, all these PGs were (deep-) scrubbed already
> and the omap key count was updated (or am I misunderstanding
> something here). I still don't know and can neither conclude which
> PG the warning originates from. As far as I can tell, the warning
> should not be there.
>
> Do you have an idea how to continue diagnosis from here apart from
> just trying a deep scrub on all PGs in the list from the log?
>
> Thanks and best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Eugen Block 
> Sent: Monday, October 16, 2023 1:41 PM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: find PG with large omap object
>
> Hi,
> not sure if this is what you need, but if you know the pool id (you
> probably should) you could try this, it's from an Octopus test cluster
> (assuming the warning was for 

[ceph-users] Re: Fixing BlueFS spillover (pacific 16.2.14)

2023-10-16 Thread Igor Fedotov

Hi Chris,

for the first question (osd.76) you might want to try ceph-volume's "lvm 
migrate --from data --target ..." command. It looks like some 
persistent DB remnants are still kept on the main device, causing the alert.
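
A sketch only - the osd fsid and the target VG/LV are placeholders, and the OSD 
has to be stopped first:

# ceph orch daemon stop osd.76
# cephadm shell --name osd.76 -- ceph-volume lvm migrate \
      --osd-id 76 --osd-fsid <osd-fsid> --from data --target <db_vg>/<db_lv>
# ceph orch daemon start osd.76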


W.r.t. osd.86's question - the line "SLOW    0 B    3.0 GiB    59 GiB" means 
that RocksDB higher-level data (usually L3+) is spread over the DB and main 
(aka slow) devices as 3 GB and 59 GB respectively.


In other words, the SLOW row refers to DB data which is originally supposed 
to be on the SLOW device (due to RocksDB's data mapping mechanics), but 
improved BlueFS logic (introduced by 
https://github.com/ceph/ceph/pull/29687) permits extra DB disk usage 
for a part of this data.


Resizing the DB volume followed by a DB compaction should do the trick and 
move all the data to the DB device. Alternatively, ceph-volume's lvm migrate 
command should do the same, but without resizing the DB volume the result 
will be rather temporary.
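
I.e., after extending the LV, roughly (a sketch using the osd from your "bonus" 
example):

# ceph orch daemon stop osd.85
# cephadm shell --name osd.85 -- ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-85
# ceph orch daemon start osd.85
# ceph tell osd.85 compact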


Hope this helps.


Thanks,

Igor

On 06/10/2023 06:55, Chris Dunlop wrote:

Hi,

tl;dr why are my osds still spilling?

I've recently upgraded to 16.2.14 from 16.2.9 and started receiving 
bluefs spillover warnings (due to the "fix spillover alert" per the 
16.2.14 release notes). E.g. from 'ceph health detail', the warning on 
one of these (there are a few):


osd.76 spilled over 128 KiB metadata from 'db' device (56 GiB used of 
60 GiB) to slow device


This is a 15T HDD with only a 60G SSD for the db so it's not 
surprising it spilled as it's way below the recommendation for rbd 
usage at db size 1-2% of the storage size.


There was some spare space on the db ssd so I increased the size of 
the db LV up over 400G and did a bluefs-bdev-expand.


However, days later, I'm still getting the spillover warning for that 
osd, including after running a manual compact:


# ceph tell osd.76 compact

See attached perf-dump-76 for the perf dump output:

# cephadm enter --name 'osd.76' ceph daemon 'osd.76' perf dump" | jq 
-r '.bluefs'


In particular, if my understanding is correct, that's telling me the 
db available size is 487G (i.e. the LV expand worked), of which it's 
using 59G, and there's 128K spilled to the slow device:


"db_total_bytes": 512309059584,  # 487G
"db_used_bytes": 63470305280,    # 59G
"slow_used_bytes": 131072,   # 128K

A "bluefs stats" also says the db is using 128K of slow storage 
(although perhaps it's getting the info from the same place as the 
perf dump?):


# ceph tell osd.76 bluefs stats
1 : device size 0x7747ffe000 : using 0xea620(59 GiB)
2 : device size 0xe8d7fc0 : using 0x6554d689000(6.3 TiB)
RocksDBBlueFSVolumeSelector Usage Matrix:
DEV/LEV   WAL       DB        SLOW      *         *         REAL      FILES
LOG       0 B       10 MiB    0 B       0 B       0 B       8.8 MiB   1
WAL       0 B       2.5 GiB   0 B       0 B       0 B       751 MiB   8
DB        0 B       56 GiB    128 KiB   0 B       0 B       50 GiB    842
SLOW      0 B       0 B       0 B       0 B       0 B       0 B       0
TOTAL     0 B       58 GiB    128 KiB   0 B       0 B       0 B       850
MAXIMUMS:
LOG       0 B       22 MiB    0 B       0 B       0 B       18 MiB
WAL       0 B       3.9 GiB   0 B       0 B       0 B       1.0 GiB
DB        0 B       71 GiB    282 MiB   0 B       0 B       62 GiB
SLOW      0 B       0 B       0 B       0 B       0 B       0 B
TOTAL     0 B       74 GiB    282 MiB   0 B       0 B       0 B
SIZE <<   0 B       453 GiB   14 TiB


I had a look at the "DUMPING STATS" output in the logs but I don't 
know how to interpret it. I did try calculating the total of the sizes 
on the "Sum" lines but that comes to 100G so I don't know what that 
all means. See attached log-stats-76.


I also tried "ceph-kvstore-tool bluestore-kv ... stats":

$ {
  cephadm unit --fsid $clusterid --name osd.76 stop
  cephadm shell --fsid $clusterid --name osd.76 -- ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-76 stats
  cephadm unit --fsid $clusterid --name osd.76 start
}

Output attached as bluestore-kv-stats-76. I can't see anything 
interesting in there, although again I don't really know how to 
interpret it.


So... why is this osd db still spilling onto slow storage, and how do 
I fix things so it's no longer using the slow storage?



And a bonus issue...  on another osd that hasn't yet been resized 
(i.e.  again with a grossly undersized 60G db on SSD with a 15T HDD) 
I'm also getting a spillover warning. The "bluefs stats" seems to be 
saying the db is NOT currently spilling (i.e. "0 B" the DB/SLOW 
position in the matrix), but there's "something" currently using 59G 
on the slow device:


$ ceph tell osd.85 bluefs stats
1 : device size 0xee000 : using 0x3a390(15 GiB)
2 : device size 0xe8d7fc0 : using 

[ceph-users] Re: find PG with large omap object

2023-10-16 Thread Eugen Block

Hi Frank,

# jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_keys}] |  
sort_by(.nk)' pgs.dump | tail

  },
  {
"id": "12.17b",
"nk": 1493776
  },
  {
"id": "12.193",
"nk": 1583589
  }


those numbers are > 1 million and the warning threshold is 200k. So a  
warning is expected.


Zitat von Frank Schilder :


Hi Eugen,

thanks for the one-liner :) I'm afraid I'm in the same position as  
before though.


I dumped all PGs to a file and executed these 2 commands:

# jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_bytes}]  
| sort_by(.nk)' pgs.dump | tail

  },
  {
"id": "12.193",
"nk": 1002401056
  },
  {
"id": "21.0",
"nk": 1235777228
  }
]

# jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_keys}] |  
sort_by(.nk)' pgs.dump | tail

  },
  {
"id": "12.17b",
"nk": 1493776
  },
  {
"id": "12.193",
"nk": 1583589
  }
]

Neither is beyond the warn limit and pool 12 is indeed the pool  
where the warnings came from. OK, now back to the logs:


# zgrep -i 'Large omap object found. Object:' /var/log/ceph/ceph.log-*
/var/log/ceph/ceph.log-20231008.gz:2023-10-05T01:25:14.581962+0200  
osd.592 (osd.592) 104 : cluster [WRN] Large omap object found.  
Object: 12:c05de58b:::63b.:head PG: 12.d1a7ba03 (12.3) Key  
count: 21 Size (bytes): 230080309
/var/log/ceph/ceph.log-20231008.gz:2023-10-07T04:33:02.678879+0200  
osd.949 (osd.949) 6897 : cluster [WRN] Large omap object found.  
Object: 12:c9a32586:::63a.:head PG: 12.61a4c593 (12.193) Key  
count: 200243 Size (bytes): 230307097
/var/log/ceph/ceph.log-20231008.gz:2023-10-07T07:22:40.512228+0200  
osd.988 (osd.988) 4365 : cluster [WRN] Large omap object found.  
Object: 12:eb96322f:::637.:head PG: 12.f44c69d7 (12.1d7) Key  
count: 200329 Size (bytes): 230310393
/var/log/ceph/ceph.log-20231008.gz:2023-10-07T15:08:03.785186+0200  
osd.50 (osd.50) 4549 : cluster [WRN] Large omap object found.  
Object: 12:08fb0eb7:::635.:head PG: 12.ed70df10 (12.110) Key  
count: 200183 Size (bytes): 230150641
/var/log/ceph/ceph.log-20231008.gz:2023-10-07T16:37:12.901470+0200  
osd.18 (osd.18) 7011 : cluster [WRN] Large omap object found.  
Object: 12:d6758956:::634.:head PG: 12.6a91ae6b (12.6b) Key  
count: 200247 Size (bytes): 230343371
/var/log/ceph/ceph.log-20231008.gz:2023-10-08T01:25:16.125068+0200  
osd.980 (osd.980) 308 : cluster [WRN] Large omap object found.  
Object: 12:63f985e7:::639.:head PG: 12.e7a19fc6 (12.1c6) Key  
count: 200160 Size (bytes): 230179282
/var/log/ceph/ceph.log-20231015:2023-10-09T00:51:32.587849+0200  
osd.563 (osd.563) 3661 : cluster [WRN] Large omap object found.  
Object: 12:44346421:::632.:head PG: 12.84262c22 (12.22) Key  
count: 200325 Size (bytes): 230481029
/var/log/ceph/ceph.log-20231015:2023-10-09T15:35:28.803117+0200  
osd.949 (osd.949) 7088 : cluster [WRN] Large omap object found.  
Object: 12:c9a32586:::63a.:head PG: 12.61a4c593 (12.193) Key  
count: 200327 Size (bytes): 230404872
/var/log/ceph/ceph.log-20231015:2023-10-09T18:51:35.615096+0200  
osd.592 (osd.592) 461 : cluster [WRN] Large omap object found.  
Object: 12:c05de58b:::63b.:head PG: 12.d1a7ba03 (12.3) Key  
count: 200228 Size (bytes): 230347361


The warnings report a key count > 20, but none of the PGs in the  
dump does. Apparently, all these PGs were (deep-) scrubbed already  
and the omap key count was updated (or am I misunderstanding  
something here). I still don't know and can neither conclude which  
PG the warning originates from. As far as I can tell, the warning  
should not be there.


Do you have an idea how to continue diagnosis from here apart from  
just trying a deep scrub on all PGs in the list from the log?


Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Eugen Block 
Sent: Monday, October 16, 2023 1:41 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: find PG with large omap object

Hi,
not sure if this is what you need, but if you know the pool id (you
probably should) you could try this, it's from an Octopus test cluster
(assuming the warning was for the number of keys, not bytes):

$ ceph -f json pg dump pgs 2>/dev/null | jq -r '.pg_stats[] | select
(.pgid | startswith("17.")) |  .pgid + " " +
"\(.stat_sum.num_omap_keys)"'
17.6 191
17.7 759
17.4 358
17.5 0
17.2 177
17.3 1
17.0 375
17.1 176

If you don't know the pool you could sort the output by the second
column and see which PG has the largest number of omap_keys.

Regards,
Eugen

Zitat von Frank Schilder :


Hi all,

we had a bunch of large omap object warnings after a user deleted a
lot of files on a ceph fs with snapshots. After the snapshots were
rotated out, all but one of these warnings disappeared over time.
However, one warning is stuck and I wonder if its something else.

Is there a reasonable way (say, one-liner with no more than 120
characters) to 

[ceph-users] Re: find PG with large omap object

2023-10-16 Thread Frank Schilder
Hi Eugen,

thanks for the one-liner :) I'm afraid I'm in the same position as before 
though.

I dumped all PGs to a file and executed these 2 commands:

# jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_bytes}] | 
sort_by(.nk)' pgs.dump | tail
  },
  {
"id": "12.193",
"nk": 1002401056
  },
  {
"id": "21.0",
"nk": 1235777228
  }
]

# jq '[.pg_stats[] | {"id": .pgid, "nk": .stat_sum.num_omap_keys}] | 
sort_by(.nk)' pgs.dump | tail
  },
  {
"id": "12.17b",
"nk": 1493776
  },
  {
"id": "12.193",
"nk": 1583589
  }
]

Neither is beyond the warn limit and pool 12 is indeed the pool where the 
warnings came from. OK, now back to the logs:

# zgrep -i 'Large omap object found. Object:' /var/log/ceph/ceph.log-*   
/var/log/ceph/ceph.log-20231008.gz:2023-10-05T01:25:14.581962+0200 osd.592 
(osd.592) 104 : cluster [WRN] Large omap object found. Object: 
12:c05de58b:::63b.:head PG: 12.d1a7ba03 (12.3) Key count: 21 Size 
(bytes): 230080309
/var/log/ceph/ceph.log-20231008.gz:2023-10-07T04:33:02.678879+0200 osd.949 
(osd.949) 6897 : cluster [WRN] Large omap object found. Object: 
12:c9a32586:::63a.:head PG: 12.61a4c593 (12.193) Key count: 200243 Size 
(bytes): 230307097
/var/log/ceph/ceph.log-20231008.gz:2023-10-07T07:22:40.512228+0200 osd.988 
(osd.988) 4365 : cluster [WRN] Large omap object found. Object: 
12:eb96322f:::637.:head PG: 12.f44c69d7 (12.1d7) Key count: 200329 Size 
(bytes): 230310393
/var/log/ceph/ceph.log-20231008.gz:2023-10-07T15:08:03.785186+0200 osd.50 
(osd.50) 4549 : cluster [WRN] Large omap object found. Object: 
12:08fb0eb7:::635.:head PG: 12.ed70df10 (12.110) Key count: 200183 Size 
(bytes): 230150641
/var/log/ceph/ceph.log-20231008.gz:2023-10-07T16:37:12.901470+0200 osd.18 
(osd.18) 7011 : cluster [WRN] Large omap object found. Object: 
12:d6758956:::634.:head PG: 12.6a91ae6b (12.6b) Key count: 200247 Size 
(bytes): 230343371
/var/log/ceph/ceph.log-20231008.gz:2023-10-08T01:25:16.125068+0200 osd.980 
(osd.980) 308 : cluster [WRN] Large omap object found. Object: 
12:63f985e7:::639.:head PG: 12.e7a19fc6 (12.1c6) Key count: 200160 Size 
(bytes): 230179282
/var/log/ceph/ceph.log-20231015:2023-10-09T00:51:32.587849+0200 osd.563 
(osd.563) 3661 : cluster [WRN] Large omap object found. Object: 
12:44346421:::632.:head PG: 12.84262c22 (12.22) Key count: 200325 Size 
(bytes): 230481029
/var/log/ceph/ceph.log-20231015:2023-10-09T15:35:28.803117+0200 osd.949 
(osd.949) 7088 : cluster [WRN] Large omap object found. Object: 
12:c9a32586:::63a.:head PG: 12.61a4c593 (12.193) Key count: 200327 Size 
(bytes): 230404872
/var/log/ceph/ceph.log-20231015:2023-10-09T18:51:35.615096+0200 osd.592 
(osd.592) 461 : cluster [WRN] Large omap object found. Object: 
12:c05de58b:::63b.:head PG: 12.d1a7ba03 (12.3) Key count: 200228 Size 
(bytes): 230347361

The warnings report a key count > 20, but none of the PGs in the dump does. 
Apparently, all these PGs were (deep-) scrubbed already and the omap key count 
was updated (or am I misunderstanding something here). I still don't know and 
can neither conclude which PG the warning originates from. As far as I can 
tell, the warning should not be there.

Do you have an idea how to continue diagnosis from here apart from just trying 
a deep scrub on all PGs in the list from the log?
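
If it comes to that, the brute-force version would be something like pulling the 
PG ids out of the warnings and deep-scrubbing each of them:

# zgrep -h 'Large omap object found' /var/log/ceph/ceph.log-* \
    | grep -oP '\(\K[0-9]+\.[0-9a-f]+(?=\))' | sort -u \
    | while read pg; do ceph pg deep-scrub "$pg"; done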

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Eugen Block 
Sent: Monday, October 16, 2023 1:41 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: find PG with large omap object

Hi,
not sure if this is what you need, but if you know the pool id (you
probably should) you could try this, it's from an Octopus test cluster
(assuming the warning was for the number of keys, not bytes):

$ ceph -f json pg dump pgs 2>/dev/null | jq -r '.pg_stats[] | select
(.pgid | startswith("17.")) |  .pgid + " " +
"\(.stat_sum.num_omap_keys)"'
17.6 191
17.7 759
17.4 358
17.5 0
17.2 177
17.3 1
17.0 375
17.1 176

If you don't know the pool you could sort the output by the second
column and see which PG has the largest number of omap_keys.

Regards,
Eugen

Zitat von Frank Schilder :

> Hi all,
>
> we had a bunch of large omap object warnings after a user deleted a
> lot of files on a ceph fs with snapshots. After the snapshots were
> rotated out, all but one of these warnings disappeared over time.
> However, one warning is stuck and I wonder if its something else.
>
> Is there a reasonable way (say, one-liner with no more than 120
> characters) to get ceph to tell me which PG this is coming from? I
> just want to issue a deep scrub to check if it disappears and going
> through the logs and querying every single object for its key count
> seems a bit of a hassle for something that ought to be part of "ceph
> health detail".
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-16 Thread Zakhar Kirpichenko
The issue persists, although to a lesser extent. Any comments from the Ceph
team please?

/Z

On Fri, 13 Oct 2023 at 20:51, Zakhar Kirpichenko  wrote:

> > Some of it is transferable to RocksDB on mons nonetheless.
>
> Please point me to relevant Ceph documentation, i.e. a description of how
> various Ceph monitor and RocksDB tunables affect the operations of
> monitors, I'll gladly look into it.
>
> > Please point me to such recommendations, if they're on docs.ceph.com I'll
> get them updated.
>
> This are the recommendations we used when we built our Pacific cluster:
> https://docs.ceph.com/en/pacific/start/hardware-recommendations/
>
> Our drives are 4x times larger than recommended by this guide. The drives
> are rated for < 0.5 DWPD, which is more than sufficient for boot drives and
> storage of rarely modified files. It is not documented or suggested
> anywhere that monitor processes write several hundred gigabytes of data per
> day, exceeding the amount of data written by OSDs. Which is why I am not
> convinced that what we're observing is expected behavior, but it's not easy
> to get a definitive answer from the Ceph community.
>
> /Z
>
> On Fri, 13 Oct 2023 at 20:35, Anthony D'Atri 
> wrote:
>
>> Some of it is transferable to RocksDB on mons nonetheless.
>>
>> but their specs exceed Ceph hardware recommendations by a good margin
>>
>>
>> Please point me to such recommendations, if they're on docs.ceph.com I'll
>> get them updated.
>>
>> On Oct 13, 2023, at 13:34, Zakhar Kirpichenko  wrote:
>>
>> Thank you, Anthony. As I explained to you earlier, the article you had
>> sent is about RocksDB tuning for Bluestore OSDs, while the issue at hand is
>> not with OSDs but rather monitors and their RocksDB store. Indeed, the
>> drives are not enterprise-grade, but their specs exceed Ceph hardware
>> recommendations by a good margin, they're being used as boot drives only
>> and aren't supposed to be written to continuously at high rates - which is
>> what unfortunately is happening. I am trying to determine why it is
>> happening and how the issue can be alleviated or resolved, unfortunately
>> monitor RocksDB usage and tunables appear to be not documented at all.
>>
>> /Z
>>
>> On Fri, 13 Oct 2023 at 20:11, Anthony D'Atri 
>> wrote:
>>
>>> cf. Mark's article I sent you re RocksDB tuning.  I suspect that with
>>> Reef you would experience fewer writes.  Universal compaction might also
>>> help, but in the end this SSD is a client SKU and really not suited for
>>> enterprise use.  If you had the 1TB SKU you'd get much longer life, or you
>>> could change the overprovisioning on the ones you have.
>>>
>>> On Oct 13, 2023, at 12:30, Zakhar Kirpichenko  wrote:
>>>
>>> I would very much appreciate it if someone with a better understanding of
>>> monitor internals and use of RocksDB could please chip in.
>>>
>>>
>>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: find PG with large omap object

2023-10-16 Thread Eugen Block

Hi,
not sure if this is what you need, but if you know the pool id (you  
probably should) you could try this, it's from an Octopus test cluster  
(assuming the warning was for the number of keys, not bytes):


$ ceph -f json pg dump pgs 2>/dev/null | jq -r '.pg_stats[] | select  
(.pgid | startswith("17.")) |  .pgid + " " +  
"\(.stat_sum.num_omap_keys)"'

17.6 191
17.7 759
17.4 358
17.5 0
17.2 177
17.3 1
17.0 375
17.1 176

If you don't know the pool you could sort the output by the second  
column and see which PG has the largest number of omap_keys.
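
E.g. something like this (same jq, just without the pool filter, sorted  
numerically on the second column):

$ ceph -f json pg dump pgs 2>/dev/null | jq -r '.pg_stats[] | .pgid + " " + "\(.stat_sum.num_omap_keys)"' | sort -k2 -n | tail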


Regards,
Eugen

Zitat von Frank Schilder :


Hi all,

we had a bunch of large omap object warnings after a user deleted a  
lot of files on a ceph fs with snapshots. After the snapshots were  
rotated out, all but one of these warnings disappeared over time.  
However, one warning is stuck and I wonder if its something else.


Is there a reasonable way (say, one-liner with no more than 120  
characters) to get ceph to tell me which PG this is coming from? I  
just want to issue a deep scrub to check if it disappears and going  
through the logs and querying every single object for its key count  
seems a bit of a hassle for something that ought to be part of "ceph  
health detail".


Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

2023-10-16 Thread Igor Fedotov

That's true.

On 16/10/2023 14:13, Zakhar Kirpichenko wrote:
Many thanks, Igor. I found previously submitted bug reports and 
subscribed to them. My understanding is that the issue is going to be 
fixed in the next Pacific minor release.


/Z

On Mon, 16 Oct 2023 at 14:03, Igor Fedotov  wrote:

Hi Zakhar,

please see my reply for the post on the similar issue at:

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/


Thanks,

Igor

On 16/10/2023 09:26, Zakhar Kirpichenko wrote:
> Hi,
>
> After upgrading to Ceph 16.2.14 we had several OSD crashes
> in bstore_kv_sync thread:
>
>
>     1. "assert_thread_name": "bstore_kv_sync",
>     2. "backtrace": [
>     3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
>     4. "gsignal()",
>     5. "abort()",
>     6. "(ceph::__ceph_assert_fail(char const*, char const*, int,
char
>     const*)+0x1a9) [0x564dc5f87d0b]",
>     7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
>     8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*,
bluefs_fnode_t
>     const&)+0x15e) [0x564dc6604a9e]",
>     9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned
long, unsigned
>     long)+0x77d) [0x564dc66951cd]",
>     10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90)
>     [0x564dc6695670]",
>     11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b)
[0x564dc66b1a6b]",
>     12. "(BlueRocksWritableFile::Sync()+0x18) [0x564dc66c1768]",
>     13.
"(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions
>     const&, rocksdb::IODebugContext*)+0x1f) [0x564dc6b6496f]",
>     14. "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402)
>     [0x564dc6c761c2]",
>     15. "(rocksdb::WritableFileWriter::Sync(bool)+0x88)
[0x564dc6c77808]",
>     16.
"(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup
>     const&, rocksdb::log::Writer*, unsigned long*, bool, bool,
unsigned
>     long)+0x309) [0x564dc6b780c9]",
>     17. "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&,
>     rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned
long*, unsigned
>     long, bool, unsigned long*, unsigned long,
>     rocksdb::PreReleaseCallback*)+0x2629) [0x564dc6b80c69]",
>     18. "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&,
>     rocksdb::WriteBatch*)+0x21) [0x564dc6b80e61]",
>     19. "(RocksDBStore::submit_common(rocksdb::WriteOptions&,
>  std::shared_ptr)+0x84)
[0x564dc6b1f644]",
>     20.

"(RocksDBStore::submit_transaction_sync(std::shared_ptr)+0x9a)
>     [0x564dc6b2004a]",
>     21. "(BlueStore::_kv_sync_thread()+0x30d8) [0x564dc6602ec8]",
>     22. "(BlueStore::KVSyncThread::entry()+0x11) [0x564dc662ab61]",
>     23. "/lib64/libpthread.so.0(+0x81ca) [0x7ff2f67461ca]",
>     24. "clone()"
>     25. ],
>
>
> I am attaching two instances of crash info for further reference:
> https://pastebin.com/E6myaHNU
>
> OSD configuration is rather simple and close to default:
>
> osd.6   dev       bluestore_cache_size_hdd    4294967296
> osd.6   dev       bluestore_cache_size_ssd    4294967296
> osd     advanced  debug_rocksdb               1/5
> osd     advanced  osd_max_backfills           2
> osd     basic     osd_memory_target           17179869184
> osd     advanced  osd_recovery_max_active     2
> osd     advanced  osd_scrub_sleep             0.10
> osd     advanced  rbd_balance_parent_reads    false
>
> debug_rocksdb is a recent change, otherwise this configuration
has been
> running without issues for months. The crashes happened on two
different
> hosts with identical hardware, the hosts and storage (NVME
DB/WAL, HDD
> block) don't exhibit any issues. We have not experienced such
crashes with
> Ceph < 16.2.14.
>
> Is this a known issue, or should I open a bug report?
>
> Best regards,
> Zakhar
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

2023-10-16 Thread Zakhar Kirpichenko
Many thanks, Igor. I found previously submitted bug reports and subscribed
to them. My understanding is that the issue is going to be fixed in the
next Pacific minor release.

/Z

On Mon, 16 Oct 2023 at 14:03, Igor Fedotov  wrote:

> Hi Zakhar,
>
> please see my reply for the post on the similar issue at:
>
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/
>
>
> Thanks,
>
> Igor
>
> On 16/10/2023 09:26, Zakhar Kirpichenko wrote:
> > Hi,
> >
> > After upgrading to Ceph 16.2.14 we had several OSD crashes
> > in bstore_kv_sync thread:
> >
> >
> > 1. "assert_thread_name": "bstore_kv_sync",
> > 2. "backtrace": [
> > 3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
> > 4. "gsignal()",
> > 5. "abort()",
> > 6. "(ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x1a9) [0x564dc5f87d0b]",
> > 7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
> > 8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t
> > const&)+0x15e) [0x564dc6604a9e]",
> > 9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long,
> unsigned
> > long)+0x77d) [0x564dc66951cd]",
> > 10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90)
> > [0x564dc6695670]",
> > 11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b) [0x564dc66b1a6b]",
> > 12. "(BlueRocksWritableFile::Sync()+0x18) [0x564dc66c1768]",
> > 13. "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions
> > const&, rocksdb::IODebugContext*)+0x1f) [0x564dc6b6496f]",
> > 14. "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402)
> > [0x564dc6c761c2]",
> > 15. "(rocksdb::WritableFileWriter::Sync(bool)+0x88)
> [0x564dc6c77808]",
> > 16. "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup
> > const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned
> > long)+0x309) [0x564dc6b780c9]",
> > 17. "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&,
> > rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*,
> unsigned
> > long, bool, unsigned long*, unsigned long,
> > rocksdb::PreReleaseCallback*)+0x2629) [0x564dc6b80c69]",
> > 18. "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&,
> > rocksdb::WriteBatch*)+0x21) [0x564dc6b80e61]",
> > 19. "(RocksDBStore::submit_common(rocksdb::WriteOptions&,
> > std::shared_ptr)+0x84)
> [0x564dc6b1f644]",
> > 20.
> "(RocksDBStore::submit_transaction_sync(std::shared_ptr)+0x9a)
> > [0x564dc6b2004a]",
> > 21. "(BlueStore::_kv_sync_thread()+0x30d8) [0x564dc6602ec8]",
> > 22. "(BlueStore::KVSyncThread::entry()+0x11) [0x564dc662ab61]",
> > 23. "/lib64/libpthread.so.0(+0x81ca) [0x7ff2f67461ca]",
> > 24. "clone()"
> > 25. ],
> >
> >
> > I am attaching two instances of crash info for further reference:
> > https://pastebin.com/E6myaHNU
> >
> > OSD configuration is rather simple and close to default:
> >
> > osd.6   dev       bluestore_cache_size_hdd    4294967296
> > osd.6   dev       bluestore_cache_size_ssd    4294967296
> > osd     advanced  debug_rocksdb               1/5
> > osd     advanced  osd_max_backfills           2
> > osd     basic     osd_memory_target           17179869184
> > osd     advanced  osd_recovery_max_active     2
> > osd     advanced  osd_scrub_sleep             0.10
> > osd     advanced  rbd_balance_parent_reads    false
> >
> > debug_rocksdb is a recent change, otherwise this configuration has been
> > running without issues for months. The crashes happened on two different
> > hosts with identical hardware, the hosts and storage (NVME DB/WAL, HDD
> > block) don't exhibit any issues. We have not experienced such crashes
> with
> > Ceph < 16.2.14.
> >
> > Is this a known issue, or should I open a bug report?
> >
> > Best regards,
> > Zakhar
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

2023-10-16 Thread Igor Fedotov

Hi Zakhar,

please see my reply for the post on the similar issue at: 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/



Thanks,

Igor

On 16/10/2023 09:26, Zakhar Kirpichenko wrote:

Hi,

After upgrading to Ceph 16.2.14 we had several OSD crashes
in bstore_kv_sync thread:


1. "assert_thread_name": "bstore_kv_sync",
2. "backtrace": [
3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
4. "gsignal()",
5. "abort()",
6. "(ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1a9) [0x564dc5f87d0b]",
7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t
const&)+0x15e) [0x564dc6604a9e]",
9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned
long)+0x77d) [0x564dc66951cd]",
10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90)
[0x564dc6695670]",
11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b) [0x564dc66b1a6b]",
12. "(BlueRocksWritableFile::Sync()+0x18) [0x564dc66c1768]",
13. "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions
const&, rocksdb::IODebugContext*)+0x1f) [0x564dc6b6496f]",
14. "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402)
[0x564dc6c761c2]",
15. "(rocksdb::WritableFileWriter::Sync(bool)+0x88) [0x564dc6c77808]",
16. "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup
const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned
long)+0x309) [0x564dc6b780c9]",
17. "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&,
rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned
long, bool, unsigned long*, unsigned long,
rocksdb::PreReleaseCallback*)+0x2629) [0x564dc6b80c69]",
18. "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&,
rocksdb::WriteBatch*)+0x21) [0x564dc6b80e61]",
19. "(RocksDBStore::submit_common(rocksdb::WriteOptions&,
std::shared_ptr)+0x84) [0x564dc6b1f644]",
20. 
"(RocksDBStore::submit_transaction_sync(std::shared_ptr)+0x9a)
[0x564dc6b2004a]",
21. "(BlueStore::_kv_sync_thread()+0x30d8) [0x564dc6602ec8]",
22. "(BlueStore::KVSyncThread::entry()+0x11) [0x564dc662ab61]",
23. "/lib64/libpthread.so.0(+0x81ca) [0x7ff2f67461ca]",
24. "clone()"
25. ],


I am attaching two instances of crash info for further reference:
https://pastebin.com/E6myaHNU

OSD configuration is rather simple and close to default:

osd.6  dev       bluestore_cache_size_hdd   4294967296
osd.6  dev       bluestore_cache_size_ssd   4294967296
osd    advanced  debug_rocksdb              1/5
osd    advanced  osd_max_backfills          2
osd    basic     osd_memory_target          17179869184
osd    advanced  osd_recovery_max_active    2
osd    advanced  osd_scrub_sleep            0.10
osd    advanced  rbd_balance_parent_reads   false

debug_rocksdb is a recent change; otherwise this configuration has been
running without issues for months. The crashes happened on two different
hosts with identical hardware, and the hosts and storage (NVMe DB/WAL, HDD
block) don't exhibit any issues. We have not experienced such crashes with
Ceph < 16.2.14.

Is this a known issue, or should I open a bug report?

Best regards,
Zakhar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] find PG with large omap object

2023-10-16 Thread Frank Schilder
Hi all,

we had a bunch of large omap object warnings after a user deleted a lot of 
files on a ceph fs with snapshots. After the snapshots were rotated out, all 
but one of these warnings disappeared over time. However, one warning is stuck, 
and I wonder if it's something else.

Is there a reasonable way (say, a one-liner of no more than 120 characters) to 
get ceph to tell me which PG this warning is coming from? I just want to issue a 
deep scrub to check whether it disappears, and going through the logs and querying 
every single object for its key count seems a bit of a hassle for something that 
ought to be part of "ceph health detail".
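
For illustration, something along these lines is what I have in mind; it is only 
a sketch and assumes the original warning is still in the cluster log and that 
the PG stats expose num_large_omap_objects (field names may differ between releases):

# grep the cluster log for the warning, which names the offending object/PG
grep -i 'large omap object' /var/log/ceph/ceph.log | tail -n 5

# or list PGs whose stats still count a large omap object
ceph pg ls -f json | jq -r '.pg_stats[] | select(.stat_sum.num_large_omap_objects > 0) | .pgid'

# then deep-scrub the reported PG to see whether the warning clears
ceph pg deep-scrub <pgid>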

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

2023-10-16 Thread Zakhar Kirpichenko
Unfortunately, the OSD log from the earlier crash is not available. I have
extracted the OSD log, including the recent events, from the latest crash:
https://www.dropbox.com/scl/fi/1ne8h85iuc5vx78qm1t93/20231016_osd6.zip?rlkey=fxyn242q7c69ec5lkv29csx13=0
I hope this helps to identify the crash reason.
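
For reference, a similar log window can be pulled from an OSD with something like
the command below; the unit name and time range are only examples and depend on
the deployment:

journalctl -u ceph-osd@6 --since '2023-10-15 22:00' --until '2023-10-15 22:35' > osd6.log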

The log entries that I find suspicious are these right before the crash:

debug  -1726> 2023-10-15T22:31:21.575+ 7f961ccb8700  5 prioritycache
tune_memory target: 17179869184 mapped: 17024319488 unmapped: 4164763648
heap: 21189083136 old mem: 13797582406 new mem: 13797582406
...
debug  -1723> 2023-10-15T22:31:22.579+ 7f961ccb8700  5 prioritycache
tune_memory target: 17179869184 mapped: 17024589824 unmapped: 4164493312
heap: 21189083136 old mem: 13797582406 new mem: 13797582406
...
debug  -1718> 2023-10-15T22:31:23.579+ 7f961ccb8700  5 prioritycache
tune_memory target: 17179869184 mapped: 17027031040 unmapped: 4162052096
heap: 21189083136 old mem: 13797582406 new mem: 13797582406
...
debug  -1714> 2023-10-15T22:31:24.579+ 7f961ccb8700  5 prioritycache
tune_memory target: 17179869184 mapped: 17026301952 unmapped: 4162781184
heap: 21189083136 old mem: 13797582406 new mem: 13797582406
debug  -1713> 2023-10-15T22:31:25.383+ 7f961ccb8700  5
bluestore.MempoolThread(0x55c5bee8cb98) _resize_shards cache_size:
13797582406 kv_alloc: 8321499136 kv_used: 8245313424 kv_onode_alloc:
4697620480 kv_onode_used: 4690617424 meta_alloc: 469762048 meta_used:
371122625 data_alloc: 134217728 data_used: 44314624
...
debug  -1710> 2023-10-15T22:31:25.583+ 7f961ccb8700  5 prioritycache
tune_memory target: 17179869184 mapped: 17026367488 unmapped: 4162715648
heap: 21189083136 old mem: 13797582406 new mem: 13797582406
...
debug  -1707> 2023-10-15T22:31:26.583+ 7f961ccb8700  5 prioritycache
tune_memory target: 17179869184 mapped: 17026211840 unmapped: 4162871296
heap: 21189083136 old mem: 13797582406 new mem: 13797582406
...
debug  -1704> 2023-10-15T22:31:27.583+ 7f961ccb8700  5 prioritycache
tune_memory target: 17179869184 mapped: 17024548864 unmapped: 4164534272
heap: 21189083136 old mem: 13797582406 new mem: 13797582406

There's plenty of RAM in the system: about 120 GB is free or used for page cache.
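
In case it is useful, this is roughly how I cross-check the allocator numbers the
autotuner prints; it is only a sketch and assumes the OSDs are built with tcmalloc:

ceph tell osd.6 heap stats
ceph tell osd.6 dump_mempools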

/Z

On Mon, 16 Oct 2023 at 09:26, Zakhar Kirpichenko  wrote:

> Hi,
>
> After upgrading to Ceph 16.2.14 we had several OSD crashes
> in bstore_kv_sync thread:
>
>
>1. "assert_thread_name": "bstore_kv_sync",
>2. "backtrace": [
>3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
>4. "gsignal()",
>5. "abort()",
>6. "(ceph::__ceph_assert_fail(char const*, char const*, int, char
>const*)+0x1a9) [0x564dc5f87d0b]",
>7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
>8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t
>const&)+0x15e) [0x564dc6604a9e]",
>9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long,
>unsigned long)+0x77d) [0x564dc66951cd]",
>10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90)
>[0x564dc6695670]",
>11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b) [0x564dc66b1a6b]",
>12. "(BlueRocksWritableFile::Sync()+0x18) [0x564dc66c1768]",
>13. "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions
>const&, rocksdb::IODebugContext*)+0x1f) [0x564dc6b6496f]",
>14. "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402)
>[0x564dc6c761c2]",
>15. "(rocksdb::WritableFileWriter::Sync(bool)+0x88) [0x564dc6c77808]",
>16. "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup
>const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned
>long)+0x309) [0x564dc6b780c9]",
>17. "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&,
>rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned
>long, bool, unsigned long*, unsigned long,
>rocksdb::PreReleaseCallback*)+0x2629) [0x564dc6b80c69]",
>18. "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&,
>rocksdb::WriteBatch*)+0x21) [0x564dc6b80e61]",
>19. "(RocksDBStore::submit_common(rocksdb::WriteOptions&,
>std::shared_ptr)+0x84) [0x564dc6b1f644]",
>20. 
> "(RocksDBStore::submit_transaction_sync(std::shared_ptr)+0x9a)
>[0x564dc6b2004a]",
>21. "(BlueStore::_kv_sync_thread()+0x30d8) [0x564dc6602ec8]",
>22. "(BlueStore::KVSyncThread::entry()+0x11) [0x564dc662ab61]",
>23. "/lib64/libpthread.so.0(+0x81ca) [0x7ff2f67461ca]",
>24. "clone()"
>25. ],
>
>
> I am attaching two instances of crash info for further reference:
> https://pastebin.com/E6myaHNU
>
> OSD configuration is rather simple and close to default:
>
> osd.6  dev       bluestore_cache_size_hdd   4294967296
> osd.6  dev       bluestore_cache_size_ssd   4294967296
> osd    advanced  debug_rocksdb              1/5
> osd    advanced  

[ceph-users] Re: Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

2023-10-16 Thread Zakhar Kirpichenko
Not sure how the formatting got mangled in my previous message; here is the OSD
configuration in a more readable form: https://pastebin.com/mrC6UdzN
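
The same listing can be regenerated at any time with something along these lines
(the grep filter is just an example):

ceph config dump | grep -E '^osd'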

/Z

On Mon, 16 Oct 2023 at 09:26, Zakhar Kirpichenko  wrote:

> Hi,
>
> After upgrading to Ceph 16.2.14 we had several OSD crashes
> in bstore_kv_sync thread:
>
>
>1. "assert_thread_name": "bstore_kv_sync",
>2. "backtrace": [
>3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
>4. "gsignal()",
>5. "abort()",
>6. "(ceph::__ceph_assert_fail(char const*, char const*, int, char
>const*)+0x1a9) [0x564dc5f87d0b]",
>7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
>8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t
>const&)+0x15e) [0x564dc6604a9e]",
>9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long,
>unsigned long)+0x77d) [0x564dc66951cd]",
>10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90)
>[0x564dc6695670]",
>11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b) [0x564dc66b1a6b]",
>12. "(BlueRocksWritableFile::Sync()+0x18) [0x564dc66c1768]",
>13. "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions
>const&, rocksdb::IODebugContext*)+0x1f) [0x564dc6b6496f]",
>14. "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402)
>[0x564dc6c761c2]",
>15. "(rocksdb::WritableFileWriter::Sync(bool)+0x88) [0x564dc6c77808]",
>16. "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup
>const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned
>long)+0x309) [0x564dc6b780c9]",
>17. "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&,
>rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned
>long, bool, unsigned long*, unsigned long,
>rocksdb::PreReleaseCallback*)+0x2629) [0x564dc6b80c69]",
>18. "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&,
>rocksdb::WriteBatch*)+0x21) [0x564dc6b80e61]",
>19. "(RocksDBStore::submit_common(rocksdb::WriteOptions&,
>std::shared_ptr)+0x84) [0x564dc6b1f644]",
>20. 
> "(RocksDBStore::submit_transaction_sync(std::shared_ptr)+0x9a)
>[0x564dc6b2004a]",
>21. "(BlueStore::_kv_sync_thread()+0x30d8) [0x564dc6602ec8]",
>22. "(BlueStore::KVSyncThread::entry()+0x11) [0x564dc662ab61]",
>23. "/lib64/libpthread.so.0(+0x81ca) [0x7ff2f67461ca]",
>24. "clone()"
>25. ],
>
>
> I am attaching two instances of crash info for further reference:
> https://pastebin.com/E6myaHNU
>
> OSD configuration is rather simple and close to default:
>
> osd.6  dev       bluestore_cache_size_hdd   4294967296
> osd.6  dev       bluestore_cache_size_ssd   4294967296
> osd    advanced  debug_rocksdb              1/5
> osd    advanced  osd_max_backfills          2
> osd    basic     osd_memory_target          17179869184
> osd    advanced  osd_recovery_max_active    2
> osd    advanced  osd_scrub_sleep            0.10
> osd    advanced  rbd_balance_parent_reads   false
>
> debug_rocksdb is a recent change, otherwise this configuration has been
> running without issues for months. The crashes happened on two different
> hosts with identical hardware, the hosts and storage (NVME DB/WAL, HDD
> block) don't exhibit any issues. We have not experienced such crashes with
> Ceph < 16.2.14.
>
> Is this a known issue, or should I open a bug report?
>
> Best regards,
> Zakhar
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph 16.2.14: OSDs randomly crash in bstore_kv_sync

2023-10-16 Thread Zakhar Kirpichenko
Hi,

After upgrading to Ceph 16.2.14, we had several OSD crashes
in the bstore_kv_sync thread:


   1. "assert_thread_name": "bstore_kv_sync",
   2. "backtrace": [
   3. "/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
   4. "gsignal()",
   5. "abort()",
   6. "(ceph::__ceph_assert_fail(char const*, char const*, int, char
   const*)+0x1a9) [0x564dc5f87d0b]",
   7. "/usr/bin/ceph-osd(+0x584ed4) [0x564dc5f87ed4]",
   8. "(RocksDBBlueFSVolumeSelector::sub_usage(void*, bluefs_fnode_t
   const&)+0x15e) [0x564dc6604a9e]",
   9. "(BlueFS::_flush_range_F(BlueFS::FileWriter*, unsigned long, unsigned
   long)+0x77d) [0x564dc66951cd]",
   10. "(BlueFS::_flush_F(BlueFS::FileWriter*, bool, bool*)+0x90)
   [0x564dc6695670]",
   11. "(BlueFS::fsync(BlueFS::FileWriter*)+0x18b) [0x564dc66b1a6b]",
   12. "(BlueRocksWritableFile::Sync()+0x18) [0x564dc66c1768]",
   13. "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions
   const&, rocksdb::IODebugContext*)+0x1f) [0x564dc6b6496f]",
   14. "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x402)
   [0x564dc6c761c2]",
   15. "(rocksdb::WritableFileWriter::Sync(bool)+0x88) [0x564dc6c77808]",
   16. "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup
   const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned
   long)+0x309) [0x564dc6b780c9]",
   17. "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&,
   rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned
   long, bool, unsigned long*, unsigned long,
   rocksdb::PreReleaseCallback*)+0x2629) [0x564dc6b80c69]",
   18. "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&,
   rocksdb::WriteBatch*)+0x21) [0x564dc6b80e61]",
   19. "(RocksDBStore::submit_common(rocksdb::WriteOptions&,
   std::shared_ptr)+0x84) [0x564dc6b1f644]",
   20. 
"(RocksDBStore::submit_transaction_sync(std::shared_ptr)+0x9a)
   [0x564dc6b2004a]",
   21. "(BlueStore::_kv_sync_thread()+0x30d8) [0x564dc6602ec8]",
   22. "(BlueStore::KVSyncThread::entry()+0x11) [0x564dc662ab61]",
   23. "/lib64/libpthread.so.0(+0x81ca) [0x7ff2f67461ca]",
   24. "clone()"
   25. ],


I am attaching two instances of crash info for further reference:
https://pastebin.com/E6myaHNU

OSD configuration is rather simple and close to default:

osd.6  dev       bluestore_cache_size_hdd   4294967296
osd.6  dev       bluestore_cache_size_ssd   4294967296
osd    advanced  debug_rocksdb              1/5
osd    advanced  osd_max_backfills          2
osd    basic     osd_memory_target          17179869184
osd    advanced  osd_recovery_max_active    2
osd    advanced  osd_scrub_sleep            0.10
osd    advanced  rbd_balance_parent_reads   false

debug_rocksdb is a recent change; otherwise this configuration has been
running without issues for months. The crashes happened on two different
hosts with identical hardware, and the hosts and storage (NVMe DB/WAL, HDD
block) don't exhibit any issues. We have not experienced such crashes with
Ceph < 16.2.14.

Is this a known issue, or should I open a bug report?
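
If a bug report is the way to go, I assume the metadata collected by the crash
module is the right thing to attach; a rough sketch (the crash ID is a placeholder):

ceph crash ls
ceph crash info <crash-id>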

Best regards,
Zakhar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io