[ceph-users] Re: Ceph iSCSI rbd-target.api Failed to Load
On 10/09/2022 12:50, duluxoz wrote:

Hi Guys,

So, I finally got things sorted :-) Time to eat some crow-pie :-P

Turns out I had two issues, both of which involved typos (don't they always?). The first was that I had transposed two digits of an IP address in the `iscsi-gateway.cfg` -> `trusted_ip_list`. The second was that I had named the `iscsi-gateway.cfg` file `isci-gateway.cfg`. Okay, that would be the reason why `api_secure` was using the default value. Thanks! DOH!

Thanks for all your help - if I hadn't had a couple of people to bounce ideas off and point out the blindingly obvious (to confirm I wasn't going crazy) then I don't think I would have found these errors so quickly. Thank you!

Cheers
Dulux-Oz

On 10/09/2022 00:40, Bailey Allison wrote:

Hi Matt,

No problem. Looking at the output of `gwcli -d` there, it looks like it's having issues reaching the API endpoint. Are you able to try running:

curl --user admin:admin -X GET http://X.X.X.X:5000/api

or

curl http://X.X.X.X:5000/api

replacing the IP address with that of the node hosting the iSCSI gateway? It should spit out a bunch of stuff, but it would at least let us know whether the API itself is listening or not.

Also, here's the output of `gwcli -d` from our cluster to compare:

root@ubuntu-gw01:~# gwcli -d
Adding ceph cluster 'ceph' to the UI
Fetching ceph osd information
Querying ceph for state information
Refreshing disk information from the config object
- Scanning will use 8 scan threads
- rbd image scan complete: 0s
Refreshing gateway & client information
- checking iSCSI/API ports on ubuntu-gw01
- checking iSCSI/API ports on ubuntu-gw02
1 gateway is inaccessible - updates will be disabled
Querying ceph for state information
Gathering pool stats for cluster 'ceph'

Regards,
Bailey

-----Original Message-----
From: duluxoz
Sent: September 9, 2022 4:11 AM
To: Bailey Allison ; ceph-users@ceph.io
Subject: [ceph-users] Re: Ceph iSCSI rbd-target.api Failed to Load

Hi Bailey,

Sorry for the delay in getting back to you (I had a few non-related issues to resolve) - and thanks for replying.

The results from `gwcli -d`:

~~~
Adding ceph cluster 'ceph' to the UI
Fetching ceph osd information
Querying ceph for state information
REST API failure, code : 500
Unable to access the configuration object
Traceback (most recent call last):
  File "/usr/bin/gwcli", line 194, in <module>
    main()
  File "/usr/bin/gwcli", line 108, in main
    "({})".format(settings.config.api_endpoint))
AttributeError: 'Settings' object has no attribute 'api_endpoint'
~~~

Checked all of the other things you mentioned: all good. Any further ideas?

Cheers

On 08/09/2022 10:08, Bailey Allison wrote:

Hi Dulux-oz,

Are you able to share the output of "gwcli -d" from your iSCSI node?

Just a few things I can think to check off the top of my head: is port 5000 accessible/opened on the node running iSCSI? I think by default the API tries to listen on/use a pool called rbd, so does your cluster have a pool named that? It looks like it does based on your logs, but it's something to check anyway; otherwise I believe you can change the pool it uses within the iscsi-gateway.cfg file. If there are any blocklisted OSDs on the node you're running iSCSI on, that will also prevent rbd-target-api from starting, as I have found from experience, but again per your logs it looks like there aren't any.
Just in case it might help, I've also attached an iscsi-gateway.cfg file from a cluster we've got with it working here:

# This is seed configuration used by the ceph_iscsi_config modules
# when handling configuration tasks for iscsi gateway(s)
#
# Please do not change this file directly since it is managed by Ansible and will be overwritten
[config]
api_password = admin
api_port = 5000
# API settings.
# The API supports a number of options that allow you to tailor it to your
# local environment. If you want to run the API under https, you will need to
# create cert/key files that are compatible for each iSCSI gateway node, that is
# not locked to a specific node. SSL cert and key files *must* be called
# 'iscsi-gateway.crt' and 'iscsi-gateway.key' and placed in the '/etc/ceph/' directory
# on *each* gateway node. With the SSL files in place, you can use 'api_secure = true'
# to switch to https mode.
# To support the API, the bare minimum settings are:
api_secure = False
# Optional settings related to the CLI/API service
api_user = admin
cluster_name = ceph
loop_delay = 1
pool = rbd
trusted_ip_list = X.X.X.X,X.X.X.X,X.X.X.X,X.X.X.X

-----Original Message-----
From: duluxoz
Sent: September 7, 2022 6:38 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Ceph iSCSI rbd-target.api Failed to Load

Hi All,

I've followed the instructions on the CEPH Doco website on Configuring the iSCSI Target. Everything went AOK up to the point where I try to start the rbd-target-api service, which fails (the rbd-target-gw service started OK). A `systemctl status
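For anyone hitting a similar rbd-target-api startup failure, the checks discussed in this thread boil down to roughly the sketch below (the IP address is a placeholder, as in the thread):

~~~
# Confirm the config file has the exact expected name and location:
ls -l /etc/ceph/iscsi-gateway.cfg

# Confirm trusted_ip_list, api_secure and the pool are what you think they are:
grep -E 'trusted_ip_list|api_secure|pool' /etc/ceph/iscsi-gateway.cfg

# Check the service, and whether the API is actually listening on port 5000:
systemctl status rbd-target-api
curl --user admin:admin -X GET http://X.X.X.X:5000/api
~~~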
[ceph-users] Re: [ceph-users] OSD Crash in recovery: SST file contains data beyond the point of corruption.
Hi Igor,

looks like the setting won't work; the container now starts with a different error message saying that the setting is an invalid argument. Did I do something wrong by setting:

ceph config set osd.4 bluestore_rocksdb_options_annex "wal_recovery_mode=kSkipAnyCorruptedRecord"

?

debug 2022-09-12T20:20:45.044+ 8714e040 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-4/block size 2.7 TiB
debug 2022-09-12T20:20:45.044+ 8714e040 1 bluefs mount
debug 2022-09-12T20:20:45.044+ 8714e040 1 bluefs _init_alloc shared, id 1, capacity 0x2baa100, block size 0x1
debug 2022-09-12T20:20:45.608+ 8714e040 1 bluefs mount shared_bdev_used = 0
debug 2022-09-12T20:20:45.608+ 8714e040 1 bluestore(/var/lib/ceph/osd/ceph-4) _prepare_db_environment set db_paths to db,2850558889164 db.slow,2850558889164
debug 2022-09-12T20:20:45.608+ 8714e040 -1 rocksdb: Invalid argument: No mapping for enum : wal_recovery_mode
debug 2022-09-12T20:20:45.608+ 8714e040 -1 rocksdb: Invalid argument: No mapping for enum : wal_recovery_mode
debug 2022-09-12T20:20:45.608+ 8714e040 1 rocksdb: do_open load rocksdb options failed
debug 2022-09-12T20:20:45.608+ 8714e040 -1 bluestore(/var/lib/ceph/osd/ceph-4) _open_db erroring opening db:
debug 2022-09-12T20:20:45.608+ 8714e040 1 bluefs umount
debug 2022-09-12T20:20:45.608+ 8714e040 1 bdev(0xec8e3c00 /var/lib/ceph/osd/ceph-4/block) close
debug 2022-09-12T20:20:45.836+ 8714e040 1 bdev(0xec8e2400 /var/lib/ceph/osd/ceph-4/block) close
debug 2022-09-12T20:20:46.088+ 8714e040 -1 osd.4 0 OSD:init: unable to mount object store
debug 2022-09-12T20:20:46.088+ 8714e040 -1 ** ERROR: osd init failed: (5) Input/output error

Regards and many thanks for the help!

Ben

On Monday, September 12, 2022 21:14 CEST, Igor Fedotov wrote:

Hi Benjamin,

honestly the following advice is unlikely to help, but you may want to try setting bluestore_rocksdb_options_annex to one of the following options:

- wal_recovery_mode=kTolerateCorruptedTailRecords
- wal_recovery_mode=kSkipAnyCorruptedRecord

The indication that the setting is in effect would be the respective value at the end of the following log line:

debug 2022-09-12T17:37:05.574+ a8316040 4 rocksdb: Options.wal_recovery_mode: 2

It should show 0 and 3 respectively.

Hope this helps,
Igor

On 9/12/2022 9:09 PM, Benjamin Naber wrote:
> Hi Everybody,
>
> I'm struggling now for a couple of days with a degraded Ceph cluster.
> It's a simple 3-node cluster with 6 OSDs: 3 SSD-based, 3 HDD-based. A couple
> of days ago one of the nodes crashed because of a hard disk failure. I replaced
> the hard disk and the recovery process started without any issues.
> While the node was still recovering, the newly replaced OSD drive switched to
> backfillfull. And this is where the pain started. I added another node,
> bought a hard drive and wiped the replacement OSD.
> The cluster was then a 4-node cluster with 3 OSDs for the SSD pool and
> 4 OSDs for the HDD pool.
> Then I started the recovery process from the beginning. Ceph also started
> a reassignment of misplaced objects at this point.
> Then a power failure hit one of the remaining nodes, and now I'm
> stuck with a degraded cluster and 49 pgs inactive, 3 pgs incomplete.
> The OSD container on the node that had the power failure doesn't come up anymore because of a
> rocksdb error. Any advice on how to recover the corrupt rocksdb?
> Container log and rocksdb error:
>
> https://pastebin.com/gvGJdubx
>
> Regards and thanks for your help!
>
> Ben

--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx

--
Benjamin Naber • Holzstraße 7 • D-73650 Winterbach
Mobil: +49 (0) 152.34087809
E-Mail: benjamin.na...@coders-area.de
[ceph-users] Re: OSD Crash in recovery: SST file contains data beyond the point of corruption.
Hi Benjamin,

honestly the following advice is unlikely to help, but you may want to try setting bluestore_rocksdb_options_annex to one of the following options:

- wal_recovery_mode=kTolerateCorruptedTailRecords
- wal_recovery_mode=kSkipAnyCorruptedRecord

The indication that the setting is in effect would be the respective value at the end of the following log line:

debug 2022-09-12T17:37:05.574+ a8316040 4 rocksdb: Options.wal_recovery_mode: 2

It should show 0 and 3 respectively.

Hope this helps,
Igor

On 9/12/2022 9:09 PM, Benjamin Naber wrote:

Hi Everybody,

I'm struggling now for a couple of days with a degraded Ceph cluster. It's a simple 3-node cluster with 6 OSDs: 3 SSD-based, 3 HDD-based. A couple of days ago one of the nodes crashed because of a hard disk failure. I replaced the hard disk and the recovery process started without any issues.

While the node was still recovering, the newly replaced OSD drive switched to backfillfull. And this is where the pain started. I added another node, bought a hard drive and wiped the replacement OSD. The cluster was then a 4-node cluster with 3 OSDs for the SSD pool and 4 OSDs for the HDD pool. Then I started the recovery process from the beginning. Ceph also started a reassignment of misplaced objects at this point.

Then a power failure hit one of the remaining nodes, and now I'm stuck with a degraded cluster and 49 pgs inactive, 3 pgs incomplete. The OSD container on the node that had the power failure doesn't come up anymore because of a rocksdb error. Any advice on how to recover the corrupt rocksdb?

Container log and rocksdb error:

https://pastebin.com/gvGJdubx

Regards and thanks for your help!

Ben

--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
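A minimal sketch of how Igor's suggestion might be applied and verified, assuming the OSD id is osd.4 as elsewhere in this thread (log locations differ between packaged and containerized deployments); note that the reply further up this thread reports that some builds reject the symbolic enum name with an "Invalid argument" error:

~~~
# Apply one of the suggested WAL recovery modes via the RocksDB annex options:
ceph config set osd.4 bluestore_rocksdb_options_annex "wal_recovery_mode=kSkipAnyCorruptedRecord"

# Restart the OSD so BlueStore re-reads the options (cephadm-managed example;
# plain systemd or rook deployments restart the daemon differently):
ceph orch daemon restart osd.4

# Check the effective mode in the OSD log; per Igor's note it should read
# 0 for kTolerateCorruptedTailRecords or 3 for kSkipAnyCorruptedRecord.
# In containerized setups, grep the container/journal logs instead.
grep 'Options.wal_recovery_mode' /var/log/ceph/ceph-osd.4.log
~~~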
[ceph-users] Re: CephFS MDS sizing
On Tue, Sep 6, 2022 at 11:29 AM Vladimir Brik wrote:
>
> > What problem are you actually
> > trying to solve with that information?
> I suspect that the mds_cache_memory_limit we set (~60GB) is
> sub-optimal and I am wondering if we would be better off if,
> say, we halved the cache limits and doubled the number of
> MDSes. I am looking for metrics to quantify this, and
> cache_hit_rate and others in "dump loads" seem relevant.

There are other indirect ways to measure cache effectiveness. Using the MDS `perf dump` command, you can look at objecter.omap_rd to see how often the MDS goes out to directory objects to read dentries. You can also look at mds_mem.ino+ and mds_mem.ino- to see how often inodes go in and out of the cache.

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
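To make the above concrete, here is a minimal sketch of how one might sample those counters from a running MDS; the daemon name `mds.cephfs-a` is a placeholder, and the exact JSON layout of `perf dump` can vary slightly between releases:

~~~
# Sample the counters twice and compare the deltas: a rising objecter.omap_rd
# means the MDS keeps fetching dentries from the metadata pool (cache misses),
# and large ino+ / ino- deltas mean inodes churn in and out of the cache.
MDS=mds.cephfs-a   # placeholder daemon name

ceph tell "$MDS" perf dump | jq '{omap_rd: .objecter.omap_rd,
                                  ino_in:  .mds_mem."ino+",
                                  ino_out: .mds_mem."ino-"}'
sleep 60
ceph tell "$MDS" perf dump | jq '{omap_rd: .objecter.omap_rd,
                                  ino_in:  .mds_mem."ino+",
                                  ino_out: .mds_mem."ino-"}'
~~~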
[ceph-users] OSD Crash in recovery: SST file contains data beyond the point of corruption.
Hi Everybody,

I'm struggling now for a couple of days with a degraded Ceph cluster. It's a simple 3-node cluster with 6 OSDs: 3 SSD-based, 3 HDD-based.

A couple of days ago one of the nodes crashed because of a hard disk failure. I replaced the hard disk and the recovery process started without any issues. While the node was still recovering, the newly replaced OSD drive switched to backfillfull. And this is where the pain started. I added another node, bought a hard drive and wiped the replacement OSD. The cluster was then a 4-node cluster with 3 OSDs for the SSD pool and 4 OSDs for the HDD pool. Then I started the recovery process from the beginning. Ceph also started a reassignment of misplaced objects at this point.

Then a power failure hit one of the remaining nodes, and now I'm stuck with a degraded cluster and 49 pgs inactive, 3 pgs incomplete. The OSD container on the node that had the power failure doesn't come up anymore because of a rocksdb error. Any advice on how to recover the corrupt rocksdb?

Container log and rocksdb error:

https://pastebin.com/gvGJdubx

Regards and thanks for your help!

Ben
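Not from the thread, but when an OSD refuses to start because of RocksDB/BlueStore corruption, one common first diagnostic is an offline fsck of the affected OSD. A sketch, assuming the OSD is osd.4 as in the follow-ups and that it is stopped:

~~~
# Run offline consistency checks against the stopped OSD's data directory.
# In a containerized deployment this must run inside the OSD's container
# environment (e.g. via `cephadm shell` or the rook toolbox) so the path resolves.
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-4

# A `repair` subcommand also exists, but it modifies on-disk state - take care
# before running it on the only remaining copy of the data.
~~~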
[ceph-users] RGW multisite Cloud Sync module with support for client side encryption?
Hello Ceph-Users,

I have a question regarding support for any client-side encryption in the Cloud Sync Module for RGW (https://docs.ceph.com/en/latest/radosgw/cloud-sync-module/).

While a "regular" multi-site setup (https://docs.ceph.com/en/latest/radosgw/multisite/) usually syncs data between Ceph clusters, RGWs and other supporting infrastructure in the same administrative domain, this might be different when looking at cloud sync. One could set up a sync to e.g. AWS S3 or any other compatible S3 implementation that is provided as a service by another provider.

1) I was wondering if there is any transparent way to apply client-side encryption to those objects that are sent to the remote service? Even something like a single static key (see https://github.com/ceph/ceph/blob/1c9e84a447bb628f2235134f8d54928f7d6b7796/doc/radosgw/encryption.rst#automatic-encryption-for-testing-only) would protect against the remote provider being able to look at the data.

2) What happens to objects that are encrypted on the source RGW via SSE-S3? (https://docs.ceph.com/en/quincy/radosgw/encryption/#sse-s3) I suppose those remain encrypted? But this does require users to actively make use of SSE-S3, right?

Thanks and kind regards,
Christian
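For context on question 2, the sketch below shows what "actively making use of SSE-S3" typically looks like from the client side; the endpoint, bucket and key are placeholders, the gateway must already be configured for SSE-S3, and whether the encrypted form survives a cloud sync to a remote provider is exactly the open question above:

~~~
# Placeholder endpoint/bucket/object; requires an RGW with SSE-S3 configured.
# The --server-side-encryption flag asks the gateway to encrypt this object
# with its own managed key (SSE-S3), rather than a client-supplied key.
aws --endpoint-url http://rgw.example.com:8080 s3api put-object \
    --bucket mybucket \
    --key reports/secret-report.pdf \
    --body ./secret-report.pdf \
    --server-side-encryption AES256
~~~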
[ceph-users] Re: Increasing number of unscrubbed PGs
Hi,

On 9/12/22 11:44, Eugen Block wrote:

Hi,

I'm still not sure why increasing the interval doesn't help (maybe there's some flag set on the PG or something), but you could just increase osd_max_scrubs if your OSDs are not too busy. On one customer cluster with high load during the day we configured the scrubs to run during the night, but then with osd_max_scrubs = 6. What is your current value for osd_max_scrubs?

This is the complete OSD-related configuration:

osd   advanced   bluefs_buffered_io        true
osd   advanced   osd_command_max_records   1024
osd   advanced   osd_deep_scrub_interval   4838400.00
osd   advanced   osd_max_backfills         5
osd   advanced   osd_max_scrubs            10
osd   advanced   osd_op_complaint_time     10.00
osd   advanced   osd_scrub_sleep           0.00

Regards,
Burkhard
[ceph-users] Re: mds's stay in up:standby
Hi,

what happened to the cluster? Several services report a short uptime (68 minutes). If you shared some MDS logs someone might find a hint why they won't become active. If the regular logs don't reveal anything, enable debug logs.

Quoting Tobias Florek:

Hi!

I am running a rook-managed hyperconverged Ceph cluster on Kubernetes using ceph 17.2.3 with a single-rank, single-fs CephFS. I am now facing the problem that the MDSs stay in up:standby. I tried setting allow_standby_replay to false and restarting both MDS daemons, but nothing changed.

ceph -s
  cluster:
    id:     08f51f08-9551-488f-9419-787a7717555e
    health: HEALTH_ERR
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged

  services:
    mon: 5 daemons, quorum cy,dt,du,dv,dw (age 68m)
    mgr: a(active, since 64m), standbys: b
    mds: 0/1 daemons up, 2 standby
    osd: 10 osds: 10 up (since 68m), 10 in (since 3d)

  data:
    volumes: 0/1 healthy, 1 recovering; 1 damaged
    pools:   14 pools, 273 pgs
    objects: 834.69k objects, 1.2 TiB
    usage:   3.7 TiB used, 23 TiB / 26 TiB avail
    pgs:     273 active+clean

The journal looks ok though:

cephfs-journal-tool --rank cephfs:0 journal inspect
Overall journal integrity: OK

cephfs-journal-tool --rank cephfs:0 header get
{
    "magic": "ceph fs volume v011",
    "write_pos": 2344234253408,
    "expire_pos": 2344068406026,
    "trimmed_pos": 2344041316352,
    "stream_format": 1,
    "layout": {
        "stripe_unit": 4194304,
        "stripe_count": 1,
        "object_size": 4194304,
        "pool_id": 10,
        "pool_ns": ""
    }
}

cephfs-journal-tool --rank cephfs:0 event get summary
Events by type:
  OPEN: 47779
  SESSION: 24
  SUBTREEMAP: 113
  UPDATE: 53346
Errors: 0

ceph fs dump
e269368
enable_multiple, ever_enabled_multiple: 1,1
default compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 1

Filesystem 'cephfs' (1)
fs_name cephfs
epoch   269356
flags   32 joinable allow_snaps allow_multimds_snaps allow_standby_replay
created 2020-05-05T21:54:21.907356+
modified        2022-09-07T13:32:13.263940+
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   1099511627776
required_client_features        {}
last_failure    0
last_failure_osd_epoch  69305
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds 1
in      0
up      {}
failed
damaged 0
stopped
data_pools      [11,14]
metadata_pool   10
inline_data     disabled
balancer
standby_count_wanted    1

Standby daemons:

[mds.cephfs-a{-1:94490181} state up:standby seq 1 join_fscid=1 addr [v2:172.21.0.75:6800/3162134136,v1:172.21.0.75:6801/3162134136] compat {c=[1],r=[1],i=[7ff]}]
[mds.cephfs-b{-1:94519600} state up:standby seq 1 join_fscid=1 addr [v2:172.21.0.76:6800/2282837495,v1:172.21.0.76:6801/2282837495] compat {c=[1],r=[1],i=[7ff]}]
dumped fsmap epoch 269368

Thank you for your help!
Tobias Florek
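Not part of the original exchange, but since the status shows "1 mds daemon damaged" while the journal inspects clean, one common next step is to clear the damaged flag on rank 0 so a standby can be assigned. A sketch, assuming the journal really is intact and the filesystem is named cephfs as in the `ceph fs dump` above:

~~~
# Confirm which rank is marked damaged:
ceph fs dump | grep -A2 damaged

# Clear the damaged flag on rank 0; a standby MDS should then be assigned.
# If the rank was damaged for a reason the journal tools don't show, it may
# simply be marked damaged again - watch the log of the MDS that takes over.
ceph mds repaired cephfs:0

# Watch the rank go through replay/resolve/rejoin to active:
ceph fs status cephfs
~~~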
[ceph-users] Re: Increasing number of unscrubbed PGs
Hi,

I'm still not sure why increasing the interval doesn't help (maybe there's some flag set on the PG or something), but you could just increase osd_max_scrubs if your OSDs are not too busy. On one customer cluster with high load during the day we configured the scrubs to run during the night, but then with osd_max_scrubs = 6. What is your current value for osd_max_scrubs?

Regards,
Eugen

Quoting Burkhard Linke:

Hi,

our cluster is running pacific 16.2.10. Since the upgrade the cluster has started to report an increasing number of PGs without a timely deep-scrub:

# ceph -s
  cluster:
    id:
    health: HEALTH_WARN
            1073 pgs not deep-scrubbed in time

  services:
    mon: 3 daemons, quorum XXX,XXX,XXX (age 10d)
    mgr: XXX(active, since 3w), standbys: XXX, XXX
    mds: 2/2 daemons up, 2 standby
    osd: 460 osds: 459 up (since 3d), 459 in (since 5d)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 2/2 healthy
    pools:   16 pools, 5073 pgs
    objects: 733.76M objects, 1.1 PiB
    usage:   1.6 PiB used, 3.3 PiB / 4.9 PiB avail
    pgs:     4941 active+clean
             105  active+clean+scrubbing
             27   active+clean+scrubbing+deep

The cluster is healthy otherwise, with the exception of one failed OSD. It has been marked out and should not interfere with scrubbing.

Scrubbing itself is running, but there are too few deep-scrubs. If I remember correctly we had a larger number of deep scrubs before the last upgrade. I tried to extend the deep-scrub interval, but to no avail yet.

The majority of PGs is part of a Ceph data pool (4096 of 4941 pgs), and those are also most of the PGs reported. The pool is backed by 12 machines with 48 disks each, so there should be enough I/O capacity for running deep-scrubs. Load on these machines and disks is also pretty low.

Any hints on debugging this? The number of affected PGs has risen from 600 to over 1000 during the weekend and continues to rise...

Best regards,
Burkhard Linke
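For anyone debugging the same symptom, a quick way to see the current scrub settings and which PGs are overdue is sketched below; the value 6 is the example from the reply above, not a recommendation from the original poster:

~~~
# Show the scrub-related settings currently in effect for OSDs:
ceph config get osd osd_max_scrubs
ceph config get osd osd_deep_scrub_interval

# List the overdue PGs reported by the health check, roughly oldest first
# (the timestamp is the 6th field of each "not deep-scrubbed since" line):
ceph health detail | grep 'not deep-scrubbed since' | sort -k6 | head -20

# If the OSDs have headroom, allow more parallel scrubs as suggested above:
ceph config set osd osd_max_scrubs 6
~~~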
[ceph-users] Increasing number of unscrubbed PGs
Hi,

our cluster is running pacific 16.2.10. Since the upgrade the cluster has started to report an increasing number of PGs without a timely deep-scrub:

# ceph -s
  cluster:
    id:
    health: HEALTH_WARN
            1073 pgs not deep-scrubbed in time

  services:
    mon: 3 daemons, quorum XXX,XXX,XXX (age 10d)
    mgr: XXX(active, since 3w), standbys: XXX, XXX
    mds: 2/2 daemons up, 2 standby
    osd: 460 osds: 459 up (since 3d), 459 in (since 5d)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 2/2 healthy
    pools:   16 pools, 5073 pgs
    objects: 733.76M objects, 1.1 PiB
    usage:   1.6 PiB used, 3.3 PiB / 4.9 PiB avail
    pgs:     4941 active+clean
             105  active+clean+scrubbing
             27   active+clean+scrubbing+deep

The cluster is healthy otherwise, with the exception of one failed OSD. It has been marked out and should not interfere with scrubbing.

Scrubbing itself is running, but there are too few deep-scrubs. If I remember correctly we had a larger number of deep scrubs before the last upgrade. I tried to extend the deep-scrub interval, but to no avail yet.

The majority of PGs is part of a Ceph data pool (4096 of 4941 pgs), and those are also most of the PGs reported. The pool is backed by 12 machines with 48 disks each, so there should be enough I/O capacity for running deep-scrubs. Load on these machines and disks is also pretty low.

Any hints on debugging this? The number of affected PGs has risen from 600 to over 1000 during the weekend and continues to rise...

Best regards,
Burkhard Linke