Hi,

The PG numbers still look very low to me: you have 42 OSDs and only 614 PGs, which is roughly 15 PGs per OSD. That is quite far from the rule of thumb of about 100 PGs per OSD.

But maybe your problem is located somewhere else. You may want to check whether all of your `rbd-target-api` services are up and running; gwcli relies on them.
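A rough sketch of those checks, assuming a cephadm-managed cluster with the iSCSI gateways deployed by the orchestrator (adjust service and unit names to your deployment):

    # PGs per OSD (PGS column) and space distribution
    ceph osd df

    # state of the iSCSI gateway daemons as the orchestrator sees them
    ceph orch ps --daemon_type iscsi

    # on each gateway host, if the gateways are not containerized
    systemctl status rbd-target-api tcmu-runner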
Kind regards,
Laszlo Budai

On 9/30/25 10:31, Kardos László wrote:
Hello,

I apologize for sending the wrong pool details earlier. We store the data in the following data pool: xxxx0-data

pool 15 'xxxx0-data' erasure profile laurel_ec size 4 min_size 3 crush_rule 8 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode off last_change 30830 lfor 0/0/30825 flags hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 12288 application rbd,rgw

The overall cluster status:

  cluster:
    id:     c404fafe-767c-11ee-bc37-0509d00921ba
    health: HEALTH_OK

  services:
    mon:         5 daemons, quorum v188-ceph-mgr0,v188-ceph-mgr1,v188-ceph-iscsigw2,v188-ceph6,v188-ceph5 (age 5d)
    mgr:         v188-ceph-mgr0.rxcecw (active, since 11w), standbys: v188-ceph-mgr1.hmbuma
    mds:         1/1 daemons up, 1 standby
    osd:         42 osds: 42 up (since 2M), 42 in (since 3M)
    tcmu-runner: 10 portals active (4 hosts)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 614 pgs
    objects: 13.63M objects, 51 TiB
    usage:   75 TiB used, 71 TiB / 147 TiB avail
    pgs:     613 active+clean
             1   active+clean+scrubbing+deep

  io:
    client: 8.1 MiB/s rd, 105 MiB/s wr, 320 op/s rd, 2.31k op/s wr

Best Regards,
Laszlo Kardos

-----Original Message-----
From: Eugen Block <[email protected]>
Sent: Tuesday, September 30, 2025 9:03 AM
To: [email protected]
Subject: [ceph-users] Re: Ceph GWCLI issue

Hi,

I don't have an answer as to why the image is in an unknown state, but I'd be concerned about the pool's pg_num. You have terabytes in a pool with a single PG? That's awful and should be increased to a more suitable value. I can't say whether that would fix anything regarding the unknown state, but it's definitely not good at all.

What is the overall Ceph status (ceph -s)?

Regards,
Eugen
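A minimal sketch of the kind of pg_num increase meant here, using the pool name quoted further down in the thread; the target value of 32 is purely illustrative and should be sized for the data the pool actually holds:

    # either let the autoscaler grow the pool by marking it as bulk ...
    ceph osd pool set replicated_xxxx bulk true

    # ... or set an explicit target yourself (on recent releases pgp_num follows
    # pg_num automatically, but setting it does no harm)
    ceph osd pool set replicated_xxxx pg_num 32
    ceph osd pool set replicated_xxxx pgp_num 32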
Zitat von Kardos László <[email protected]>:

Hello,

We have encountered the following issue in our production environment: a new RBD image was created within an existing pool, and its status is reported as "unknown" in GWCLI. Based on our tests, this does not appear to cause operational issues, but we would like to investigate the root cause. No relevant information regarding this issue was found in the logs.

GWCLI output:

o- / ................................................................................ [...]
o- cluster .................................................................. [Clusters: 1]
| o- ceph ...................................................................... [HEALTH_OK]
| o- pools .................................................................... [Pools: 11]
| | o- .mgr ......................... [(x3), Commit: 0.00Y/15591725M (0%), Used: 194124K]
| | o- .nfs .......................... [(x3), Commit: 0.00Y/15591725M (0%), Used: 16924b]
| | o- xxxx-test ..................... [(2+1), Commit: 0.00Y/23727198M (0%), Used: 0.00Y]
| | o- xxxxx-erasure-0 ......... [(2+1), Commit: 0.00Y/23727198M (0%), Used: 61519257668K]
| | o- xxxxxx-repl .................. [(x3), Commit: 0.00Y/15591725M (0%), Used: 130084b]
| | o- cephfs.cephfs-test.data ..... [(x3), Commit: 0.00Y/15591725M (0%), Used: 9090444K]
| | o- cephfs.cephfs-test.meta ... [(x3), Commit: 0.00Y/15591725M (0%), Used: 516415713b]
| | o- xxxxx-data ............... [(3+1), Commit: 0.00Y/9604386M (0%), Used: 7547753556K]
| | o- xxxxx-rpl .................. [(x3), Commit: 12.0T/4268616M (294%), Used: 85265b]
| | o- xxxxx-data .............. [(3+1), Commit: 0.00Y/5011626M (0%), Used: 10955179612K]
| | o- replicated_xxxx ......... [(x3), Commit: 25.0T/2280846592K (1176%), Used: 46912b]
| o- topology ....................................................... [OSDs: 42, MONs: 5]
o- disks .............................................................. [37.0T, Disks: 3]
| o- xxxx-rpl ........................................................... [xxxx-rpl (12.0T)]
| | o- xxxxx_lun0 .................................. [xxxx-rpl/xxxxx_lun0 (Online, 12.0T)]
| o- replicated_xxxx ............................................ [replicated_xxxx (25.0T)]
| o- xxxx_lun0 ................................ [replicated_xxxx/xxxx_lun0 (Online, 12.0T)]
| o- xxxx_lun_new ......................... [replicated_xxxx/xxxx_lun_new (Unknown, 13.0T)]

The image (xxxx_lun_new) is provisioned to multiple ESXi hosts, mounted, and formatted with VMFS6. The datastore is readable and writable by the hosts. One difference is the object size of the RBD images: the older RBD images use 4 MiB objects (order 22), while the new RBD image uses 512 KiB objects (order 19).

RBD image parameters:

For replicated_xxxx / xxxx_lun0 (Online status in GWCLI):

rbd image 'xxxx_lun0':
        size 12 TiB in 3145728 objects
        order 22 (4 MiB objects)
        snapshot_count: 0
        id: 5c1b5ecfdfa46
        data_pool: xxxx0-data
        block_name_prefix: rbd_data.14.5c1b5ecfdfa46
        format: 2
        features: exclusive-lock, data-pool
        op_features:
        flags:
        create_timestamp: Tue Jul 8 13:02:11 2025
        access_timestamp: Thu Sep 25 13:49:47 2025
        modify_timestamp: Thu Sep 25 13:50:05 2025

For replicated_xxxx / xxxx_lun_new (Unknown status in GWCLI):

rbd image 'xxxx_lun_new':
        size 13 TiB in 27262976 objects
        order 19 (512 KiB objects)
        snapshot_count: 0
        id: 1945d9cf9f41ab
        data_pool: xxxx0-data
        block_name_prefix: rbd_data.14.1945d9cf9f41ab
        format: 2
        features: exclusive-lock, data-pool
        op_features:
        flags:
        create_timestamp: Wed Sep 24 11:21:21 2025
        access_timestamp: Thu Sep 25 13:50:42 2025
        modify_timestamp: Thu Sep 25 13:49:48 2025

Pool parameters:

pool 14 'replicated_xxxx' replicated size 3 min_size 2 crush_rule 7 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 30743 flags hashpspool stripe_width 0 application rbd,rgw

Ceph version:

ceph --version
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)

Question: what could be causing the RBD image (xxxx_lun_new) to appear in an "unknown" state in GWCLI?
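A rough sketch of a few checks on the gateway side for the Unknown image; the gateway.conf object name and the pool it lives in are assumptions about a default ceph-iscsi setup, not details taken from this cluster:

    # confirm the image itself looks sane from the RBD side
    rbd info replicated_xxxx/xxxx_lun_new

    # re-list the gateway view of the pools and disks
    gwcli ls

    # dump the ceph-iscsi configuration object (by default 'gateway.conf' in the
    # 'rbd' pool) to see whether the new image was registered on every gateway
    rados -p rbd get gateway.conf /tmp/gateway.conf.json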
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
