Hi,

The PG numbers are still very low in my opinion. You have 42 OSDs and only 614
PGs, which is roughly 15 PGs per OSD. That's quite far from the rule of thumb of
100 PGs per OSD. But maybe your problem is located elsewhere: you may want to
check whether all your `rbd-target-api` services are up and running, since
gwcli relies on them.
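
If it helps, here is a rough sketch of how to check both points (the exact
service handling depends on whether the gateways run the classic ceph-iscsi
packages or are managed by cephadm):

    # PGs per OSD (see the PGS column) and per-pool autoscaler state
    ceph osd df
    ceph osd pool autoscale-status

    # on each iSCSI gateway node, for a classic ceph-iscsi deployment
    systemctl status rbd-target-api rbd-target-gw tcmu-runner

    # for a cephadm-managed cluster the iSCSI daemons show up in the orchestrator instead
    ceph orch ps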

Kind regards,
Laszlo Budai

On 9/30/25 10:31, Kardos László wrote:
Hello,
I apologize for sending the wrong pool details earlier.
We store the data in the following data pool:  xxxx0-data

pool 15 'xxxx0-data' erasure profile laurel_ec size 4 min_size 3 crush_rule
8 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode off last_change
30830 lfor 0/0/30825 flags hashpspool,ec_overwrites,selfmanaged_snaps
stripe_width 12288 application rbd,rgw

  cluster:
    id:     c404fafe-767c-11ee-bc37-0509d00921ba
    health: HEALTH_OK

  services:
    mon:         5 daemons, quorum v188-ceph-mgr0,v188-ceph-mgr1,v188-ceph-iscsigw2,v188-ceph6,v188-ceph5 (age 5d)
    mgr:         v188-ceph-mgr0.rxcecw(active, since 11w), standbys: v188-ceph-mgr1.hmbuma
    mds:         1/1 daemons up, 1 standby
    osd:         42 osds: 42 up (since 2M), 42 in (since 3M)
    tcmu-runner: 10 portals active (4 hosts)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 614 pgs
    objects: 13.63M objects, 51 TiB
    usage:   75 TiB used, 71 TiB / 147 TiB avail
    pgs:     613 active+clean
             1   active+clean+scrubbing+deep

  io:
    client:   8.1 MiB/s rd, 105 MiB/s wr, 320 op/s rd, 2.31k op/s wr


Best Regards,
Laszlo Kardos

-----Original Message-----
From: Eugen Block<[email protected]>
Sent: Tuesday, September 30, 2025 9:03 AM
To:[email protected]
Subject: [ceph-users] Re: Ceph GWCLI issue


Hi,

I don't have an answer as to why the image is in an unknown state, but I'd be
concerned about the pool's pg_num. You have terabytes in a pool with a
single PG? That's far too low and should be increased to a more suitable value. I
can't say whether that would fix anything regarding the unknown issue, but it's
definitely not good at all.
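
Just to illustrate, raising it would look roughly like this (128 is only an
example value, and since the pool has autoscale_mode on, the autoscaler may
change it again unless it is disabled or given a target ratio first):

    # optionally stop the autoscaler from overriding the manual value
    ceph osd pool set replicated_xxxx pg_autoscale_mode off

    # raise the PG count; pick a value that fits your OSD count
    ceph osd pool set replicated_xxxx pg_num 128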

What is the overall Ceph status (ceph -s)?

Regards,
Eugen


Quoting Kardos László <[email protected]>:

Hello,

We have encountered the following issue in our production environment:

A new RBD Image was created within an existing pool, and its status is
reported as "unknown" in GWCLI. Based on our tests, this does not
appear to cause operational issues, but we would like to investigate
the root cause. No relevant information regarding this issue was found in
the logs.

GWCLI output:



o- / ......................................................................... [...]
  o- cluster ........................................................ [Clusters: 1]
  | o- ceph ............................................................ [HEALTH_OK]
  |   o- pools .......................................................... [Pools: 11]
  |   | o- .mgr .................. [(x3), Commit: 0.00Y/15591725M (0%), Used: 194124K]
  |   | o- .nfs ................... [(x3), Commit: 0.00Y/15591725M (0%), Used: 16924b]
  |   | o- xxxx-test ............. [(2+1), Commit: 0.00Y/23727198M (0%), Used: 0.00Y]
  |   | o- xxxxx-erasure-0 . [(2+1), Commit: 0.00Y/23727198M (0%), Used: 61519257668K]
  |   | o- xxxxxx-repl ........... [(x3), Commit: 0.00Y/15591725M (0%), Used: 130084b]
  |   | o- cephfs.cephfs-test.data  [(x3), Commit: 0.00Y/15591725M (0%), Used: 9090444K]
  |   | o- cephfs.cephfs-test.meta  [(x3), Commit: 0.00Y/15591725M (0%), Used: 516415713b]
  |   | o- xxxxx-data ......... [(3+1), Commit: 0.00Y/9604386M (0%), Used: 7547753556K]
  |   | o- xxxxx-rpl ............. [(x3), Commit: 12.0T/4268616M (294%), Used: 85265b]
  |   | o- xxxxx-data ........ [(3+1), Commit: 0.00Y/5011626M (0%), Used: 10955179612K]
  |   | o- replicated_xxxx .... [(x3), Commit: 25.0T/2280846592K (1176%), Used: 46912b]
  |   o- topology ................................................ [OSDs: 42,MONs: 5]
  o- disks ...................................................... [37.0T, Disks: 3]
  | o- xxxx-rpl ................................................. [xxxx-rpl (12.0T)]
  | | o- xxxxx_lun0 ............................ [xxxx-rpl/xxxxx_lun0 (Online, 12.0T)]
  | o- replicated_xxxx ................................... [replicated_xxxx (25.0T)]
  |   o- xxxx_lun0 ........................ [replicated_xxxx/xxxx_lun0 (Online, 12.0T)]
  |   o- xxxx_lun_new ................. [replicated_xxxx/xxxx_lun_new (Unknown, 13.0T)]



The image (xxxx_lun_new) is provisioned to multiple ESXi hosts,
mounted, and formatted with VMFS6. The datastore is writable and
readable by the hosts.

There is a difference in the object size of the RBD images: the older RBD
images use a 4 MiB object size (order 22), while the new RBD image uses a
512 KiB object size (order 19).
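
For reference, the object size is fixed when an image is created; a create
command along these lines (purely illustrative, not the exact command we used)
would produce an image like the new one:

    # 13 TiB image with 512 KiB objects, data placed in the EC data pool
    rbd create replicated_xxxx/xxxx_lun_new --size 13T --object-size 512K --data-pool xxxx0-data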

RBD Image Parameters:

For replicated_xxxx / xxxx_lun0 (Online status in GWCLI):



rbd image 'xxxx_lun0':
         size 12 TiB in 3145728 objects
         order 22 (4 MiB objects)
         snapshot_count: 0
         id: 5c1b5ecfdfa46
         data_pool: xxxx0-data
         block_name_prefix: rbd_data.14.5c1b5ecfdfa46
         format: 2
         features: exclusive-lock, data-pool
         op_features:
         flags:
         create_timestamp: Tue Jul  8 13:02:11 2025
         access_timestamp: Thu Sep 25 13:49:47 2025
         modify_timestamp: Thu Sep 25 13:50:05 2025

For replicated_xxxx / xxxx_lun_new (Unknown status in GWCLI):

rbd image 'xxxx_lun_new':
         size 13 TiB in 27262976 objects
         order 19 (512 KiB objects)
         snapshot_count: 0
         id: 1945d9cf9f41ab
         data_pool: xxxx0-data
         block_name_prefix: rbd_data.14.1945d9cf9f41ab
         format: 2
         features: exclusive-lock, data-pool
         op_features:
         flags:
         create_timestamp: Wed Sep 24 11:21:21 2025
         access_timestamp: Thu Sep 25 13:50:42 2025
         modify_timestamp: Thu Sep 25 13:49:48 2025



Pool Parameters:

pool 14 'replicated_xxxx' replicated size 3 min_size 2 crush_rule 7
object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change
30743 flags hashpspool stripe_width 0 application rbd,rgw

Ceph version:

ceph --version

ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy
(stable)



Question:

What could be causing the RBD Image (xxxx_lun_new) to appear in an
"unknown" state in GWCLI?

ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
