Hi Bill

On Fri, 5 Dec 2025 at 10:02, Bill Scales <[email protected]> wrote:

> Hi Reto,
>
> Sorry to hear about your problems with turning on ec optimizations. I've
> led the team that developed this feature, so we are keen to help understand
> what happened to your system.
>
> Your configuration looks fine as far as support for ec optimizations is
> concerned. The daemons (mons, osds and mgr) need to be running tentacle
> code to use this feature; there is no requirement to update any of your
> clients.
>
> Do you have any more examples of the inconsistent objects logs that you
> can share with me?
>
> There was a bug that we fixed late in the development cycle where scrubbing
> incorrectly reported a size mismatch for objects written prior to turning
> on optimizations. This is because, prior to turning on optimizations,
> objects are padded to a multiple of the stripe_width; afterwards objects
> are no longer padded. The scrub code was occasionally getting confused and
> incorrectly reporting an inconsistency. In this case the scrub was
> reporting false positive errors - there was nothing wrong with the data and
> no problems accessing the data. I notice the log you shared in the email
> was for a size mismatch; I'm interested in whether all the logs were for
> mismatched sizes. We will do some further work to confirm that the fix is
> in the 20.2.0 tentacle release and that there are no problems with the
> backport.
>

As far as I could see, the errors were all about mismatched sizes.
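
In case it is useful, this is roughly how the error types can be listed
per PG (jq is only used here for readability; <pool> and <pgid> are
placeholders):

# PGs currently flagged inconsistent in a pool
rados list-inconsistent-pg <pool>
# per-object error types for one of those PGs
rados list-inconsistent-obj <pgid> --format=json-pretty | jq '.inconsistents[].errors'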


>
> You also mention that you saw some OSD crashes. Do you have any further
> information about these?
>

I was just able to cause another crash by starting up a Windows VM and
running a Windows File History backup to a drive that is on an RBD image
on pool rbd_ecpool, where I had the allow_ec_optimization flag enabled.
The libvirt disk definition for that drive is:

<disk type="network" device="disk">
  <driver name="qemu" type="raw" cache="writethrough" io="threads" discard="unmap" detect_zeroes="unmap"/>
  <auth username="admin">
    <secret type="ceph" uuid="878b0bc5-c471-4ec6-a92a-f65282ffbdf6"/>
  </auth>
  <source protocol="rbd" name="rbd/game_windows_backup_drive" index="3">
    <host name="zephir" port="6789"/>
  </source>
  <target dev="vdd" bus="virtio"/>
  <alias name="virtio-disk3"/>
  <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0"/>
</disk>
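
In case the image/pool mapping matters, it can be confirmed with
something like this (image name as in the XML above, pool name as
mentioned; <profile-name> is a placeholder):

# shows the data pool the image writes to
rbd info rbd/game_windows_backup_drive
# shows the erasure code profile assigned to the EC pool, and its settings
ceph osd pool get rbd_ecpool erasure_code_profile
ceph osd erasure-code-profile get <profile-name>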

root@zephir:~# ceph -s
  cluster:
    id:     27923302-87a5-11ec-ac5b-976d21a49941
    health: HEALTH_WARN
            2 osds down
            Reduced data availability: 49 pgs inactive
            Degraded data redundancy: 15722192/79848340 objects degraded (19.690%), 246 pgs degraded, 247 pgs undersized

  services:
    mon:           3 daemons, quorum zephir,debian,raspi (age 20h) [leader: zephir]
    mgr:           zephir.enywvy(active, since 21h), standbys: debian.nusuye
    mds:           3/3 daemons up, 3 standby
    osd:           18 osds: 16 up (since 5m), 18 in (since 3d); 8 remapped pgs
    cephfs-mirror: 2 daemons active (2 hosts)
    rbd-mirror:    2 daemons active (2 hosts)
    rgw:           1 daemon active (1 hosts, 1 zones)
    tcmu-runner:   5 portals active (2 hosts)

  data:
    volumes: 3/3 healthy
    pools:   25 pools, 450 pgs
    objects: 17.52M objects, 62 TiB
    usage:   116 TiB used, 63 TiB / 179 TiB avail
    pgs:     10.889% pgs not active
             15722192/79848340 objects degraded (19.690%)
             197 active+undersized+degraded
             183 active+clean
             49  undersized+degraded+peered
             15  active+clean+scrubbing
             5   active+clean+scrubbing+deep
             1   active+undersized

  io:
    client:   1.7 KiB/s rd, 1023 B/s wr, 1 op/s rd, 0 op/s wr
root@zephir:~# ceph osd status
ID  HOST     USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
 0  debian  11.2T  5289G      0        0       0        0   exists,up
 1  zephir  11.6T  4846G      0        0       0        0   exists
 2  zephir  10.7T  5827G      0        0       2        4   exists,up
 3  zephir  11.0T  5497G      0        0       0        0   exists
 4  zephir   159G  1376G      0        0       2        0   exists,up
 5  zephir   120G  1415G      0        0       0        0   exists,up
 6  debian  11.3T  5200G      0        0       1        0   exists,up
 7  zephir   305G  1230G      0        0       0        0   exists,up
 8  zephir  10.9T  5594G      0        0       0        0   exists,up
 9  zephir  11.9T  4616G      0        0       0        0   exists,up
10  zephir   235G  1300G      0        0       0        0   exists,up
11  zephir   223G  1312G      0        0       0        0   exists,up
12  zephir  11.0T  5479G      0        0       0        0   exists,up
13  debian   118G   631G      0        0       3        8   exists,up
14  debian   345G  1190G      0        0       0        0   exists,up
15  debian  11.4T  5079G      0        0       0        0   exists,up
16  debian   554G  3029G      0        0       3        8   exists,up
17  debian  12.5T  5874G      0        0       1        0   exists,up
root@zephir:~#
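
Regarding the daemon code levels you mentioned, the release each daemon
is actually running can be checked with:

# reports the version every mon/mgr/osd/mds daemon is running
ceph versions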

I uploaded the log of osd.1 to
https://filebin.net/jwvs6kuqrc7hx8id
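
The crash reports recorded by the mgr crash module are also available if
they help; they can be pulled with:

# list recorded crashes, then dump metadata/backtrace for one of them
ceph crash ls
ceph crash info <crash-id>   # <crash-id> taken from the list above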

Best Regards,

Reto



>