Hi,
I'm facing a strange problem: one of several MDS daemons stays stuck in the up:resolve state. The following log lines keep repeating:
```
...
2025-08-29 20:01:39.582 7f8656fe3700 1 mds.ceph-prod-60 Updating MDS map to version 1655087 from mon.3
2025-08-29 20:01:39.582 7f8656fe3700 10 mds.ceph-prod-60 my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
2025-08-29 20:01:39.582 7f8656fe3700 10 mds.ceph-prod-60 mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
2025-08-29 20:01:39.582 7f8656fe3700 10 mds.ceph-prod-60 my gid is 91681643
2025-08-29 20:01:39.582 7f8656fe3700 10 mds.ceph-prod-60 map says I am mds.12.1646436 state up:resolve
2025-08-29 20:01:39.582 7f8656fe3700 10 mds.ceph-prod-60 msgr says I am [v2:7.33.104.23:6988/1418417955,v1:7.33.104.23:6989/1418417955]
2025-08-29 20:01:39.582 7f8656fe3700 10 mds.ceph-prod-60 handle_mds_map: handling map as rank 12
2025-08-29 20:01:39.606 7f86527da700 10 mds.12.cache cache not ready for trimming
2025-08-29 20:01:40.606 7f86527da700 10 mds.12.cache cache not ready for trimming
2025-08-29 20:01:40.698 7f86547de700 5 mds.beacon.ceph-prod-60 Sending beacon up:resolve seq 6940
2025-08-29 20:01:40.698 7f86597e8700 5 mds.beacon.ceph-prod-60 received beacon reply up:resolve seq 6940 rtt 0
2025-08-29 20:01:41.606 7f86527da700 10 mds.12.cache cache not ready for trimming
...
```
The MDS status looks like this:
```
=========
+------+---------+---------------------------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+---------+---------------------------+---------------+-------+-------+
| 0 | active | ceph-prod-45 | Reqs: 0 /s | 113k | 112k |
| 1 | active | ceph-prod-46 | Reqs: 0 /s | 114k | 113k |
| 2 | active | ceph-prod-47 | Reqs: 50 /s | 3967k | 3924k |
| 3 | active | ceph-prod-10 | Reqs: 14 /s | 2402k | 2391k |
| 4 | active | ceph-prod-02 | Reqs: 0 /s | 31.6k | 27.0k |
| 5 | active | ceph-prod-48 | Reqs: 0 /s | 357k | 356k |
| 6 | active | ceph-prod-11 | Reqs: 0 /s | 1144k | 1144k |
| 7 | active | ceph-prod-57 | Reqs: 0 /s | 168k | 168k |
| 8 | active | ceph-prod-44 | Reqs: 30 /s | 5007k | 5007k |
| 9 | active | ceph-prod-20 | Reqs: 0 /s | 195k | 195k |
| 10 | active | ceph-prod-43 | Reqs: 0 /s | 1757k | 1750k |
| 11 | active | ceph-prod-01 | Reqs: 0 /s | 2879k | 2849k |
| 12 | resolve | ceph-prod-60 | | 652 | 655 |
| 13 | active | fuxi-aliyun-ceph-res-tmp3 | Reqs: 0 /s | 79.9k | 59.6k |
+------+---------+---------------------------+---------------+-------+-------+
+-----------------+----------+-------+-------+
| Pool | type | used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 2110G | 2738G |
| cephfs_data | data | 1457T | 205T |
+-----------------+----------+-------+-------+
```
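For reference, a minimal sketch of the extra diagnostics I could run next (standard Ceph CLI; the daemon name and rank are taken from the status above), in case anyone wants more detail:
```
# Raise logging on the stuck daemon to see what resolve is waiting on
ceph config set mds.ceph-prod-60 debug_mds 20
ceph config set mds.ceph-prod-60 debug_ms 1
# Ask the daemon directly for its view (run on the host carrying the MDS)
ceph daemon mds.ceph-prod-60 status
ceph daemon mds.ceph-prod-60 dump_ops_in_flight
# Overall FS / MDS map state
ceph fs status
ceph fs dump
```
If there is a better way to see which rank(s) it is still trying to resolve against, please point me at it.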
Can anyone help? Thanks.
---- Replied Message ----
From: <[email protected]>
Date: 8/28/2025 16:54
To: <[email protected]>
Subject: ceph-users Digest, Vol 134, Issue 88
Today's Topics:
1. Re: OSD crc errors: Faulty SSD? (Anthony D'Atri)
2. Re: OSD crc errors: Faulty SSD? (Igor Fedotov)
3. Debian Packages for Trixie (Andrew)
----------------------------------------------------------------------
Date: Wed, 27 Aug 2025 10:08:16 -0400
From: Anthony D'Atri <[email protected]>
Subject: [ceph-users] Re: OSD crc errors: Faulty SSD?
To: Roland Giesler <[email protected]>
Cc: [email protected]
Please:
* Identify the SSD involved
* Look for messages in `dmesg` and `/var/log/{syslog,messages}` for that device
* `smartctl -a /dev/xxxx`
* If you got the device from Dell or HP, look for a firmware update.
Are you setting non-default RocksDB options?
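A rough command sketch of those checks (the device path /dev/nvme0n1 is a placeholder; osd.40 is taken from your log):
```
# Map the OSD to its backing device
ceph device ls-by-daemon osd.40
# Kernel and system log messages for the device
dmesg -T | grep -i nvme
grep -i nvme /var/log/syslog        # or /var/log/messages
# Drive health via SMART / NVMe log
smartctl -a /dev/nvme0n1
# Any non-default RocksDB options on that OSD
ceph config show osd.40 | grep -i rocksdb
```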
On Aug 27, 2025, at 10:01 AM, Roland Giesler <[email protected]> wrote:
I have a relatively new Samsung Enterprise NVMe drive in a node that is generating the following errors:
2025-08-26T15:56:43.870+0200 7fe8ac968700 0 bad crc in data 3326000616 != exp 1246001655 from v1:192.168.131.4:0/1799093090
2025-08-26T16:03:54.757+0200 7fe8ad96a700 0 bad crc in data 3195468789 != exp 4291467912 from v1:192.168.131.3:0/315398791
2025-08-26T16:17:34.160+0200 7fe8ad96a700 0 bad crc in data 1471079732 != exp 1408597599 from v1:192.168.131.3:0/315398791
2025-08-26T16:33:34.035+0200 7fe8ad96a700 0 bad crc in data 724234454 != exp 3110238891 from v1:192.168.131.3:0/315398791
2025-08-26T16:36:34.265+0200 7fe8ad96a700 0 bad crc in data 96649884 != exp 3724606899 from v1:192.168.131.3:0/315398791
2025-08-26T16:40:34.395+0200 7fe8ad96a700 0 bad crc in data 1554359919 != exp 1420125995 from v1:192.168.131.3:0/315398791
2025-08-26T16:54:18.323+0200 7fe8ad169700 0 bad crc in data 362320144 != exp 1850249930 from v1:192.168.131.1:0/1316652062
This is ceph osd.40. More details from the log:
2025-08-26T17:00:15.013+0200 7fe8a06fe700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1756220415016487, "job": 447787, "event": "table_file_deletion", "file_number": 389377}
2025-08-26T17:00:15.013+0200 7fe8a06fe700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1756220415016839, "job": 447787, "event": "table_file_deletion", "file_number": 389359}
2025-08-26T17:00:15.013+0200 7fe8a06fe700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1756220415017245, "job": 447787, "event": "table_file_deletion", "file_number": 389342}
2025-08-26T17:00:15.013+0200 7fe8a06fe700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1756220415017633, "job": 447787, "event": "table_file_deletion", "file_number": 389276}
2025-08-26T17:00:15.041+0200 7fe8a06fe700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1756220415047481, "job": 447787, "event": "table_file_deletion", "file_number": 389254}
2025-08-26T17:00:15.045+0200 7fe8a06fe700 4 rocksdb: (Original Log Time 2025/08/26-17:00:15.047592) [db/db_impl/db_impl_compaction_flush.cc:2818] Compaction nothing to do
2025-08-26T17:04:35.776+0200 7fe899ee2700 4 rocksdb: [db/db_impl/db_impl.cc:901] ------- DUMPING STATS -------
2025-08-26T17:04:35.776+0200 7fe899ee2700 4 rocksdb: [db/db_impl/db_impl.cc:903]
** DB Stats **
Uptime(secs): 24012063.5 total, 600.0 interval
Cumulative writes: 10G writes, 37G keys, 10G commit groups, 1.0 writes per commit group, ingest: 19073.99 GB, 0.81 MB/s
Cumulative WAL: 10G writes, 4860M syncs, 2.14 writes per sync, written: 19073.99 GB, 0.81 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 159K writes, 546K keys, 159K commit groups, 1.0 writes per commit group, ingest: 202.96 MB, 0.34 MB/s
Interval WAL: 159K writes, 76K syncs, 2.10 writes per sync, written: 0.20 MB, 0.34 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent
** Compaction Stats [default] **
Level  Files  Size       Score  Read(GB)  Rn(GB)  Rnp1(GB)  Write(GB)  Wnew(GB)  Moved(GB)  W-Amp   Rd(MB/s)  Wr(MB/s)  Comp(sec)  CompMergeCPU(sec)  Comp(cnt)  Avg(sec)  KeyIn  KeyDrop
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0     0/0    0.00 KB    0.0    0.0       0.0     0.0       2.4        2.4       0.0        1.0     0.0       25.1      98.33      49.99              19074      0.005     0      0
L1     2/0    137.69 MB  1.0    299.7     2.4     297.3     297.9      0.7       0.0        123.5   59.6      59.2      5152.28    4745.40            4769       1.080     8130M  47M
L2     7/0    410.80 MB  0.2    1.3       0.2     1.1       1.2        0.1       0.3        5.8     79.4      72.1      17.00      15.59              4          4.250     32M    3822K
Sum    9/0    548.48 MB  0.0    301.0     2.6     298.4     301.5      3.1       0.3        125.0   58.5      58.6      5267.61    4810.98            23847      0.221     8162M  51M
Int    0/0    0.00 KB    0.0    0.1       0.0     0.1       0.1        0.0       0.0        3194.3  48.0      47.9      2.87       2.72               2          1.437     5633K  2173
** Compaction Stats [default] **
Priority  Files  Size     Score  Read(GB)  Rn(GB)  Rnp1(GB)  Write(GB)  Wnew(GB)  Moved(GB)  W-Amp  Rd(MB/s)  Wr(MB/s)  Comp(sec)  CompMergeCPU(sec)  Comp(cnt)  Avg(sec)  KeyIn  KeyDrop
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Low       0/0    0.00 KB  0.0    301.0     2.6     298.4     299.1      0.7       0.0        0.0    59.6      59.3      5169.28    4761.00            4773       1.083     8162M  51M
High      0/0    0.00 KB  0.0    0.0       0.0     0.0       2.4        2.4       0.0        0.0    0.0       25.1      98.33      49.99              19073      0.005     0      0
User      0/0    0.00 KB  0.0    0.0       0.0     0.0       0.0        0.0       0.0        0.0    0.0       32.8      0.00       0.00               1          0.001     0      0
This gets repeated many times until finally all seems well again. Then, a few minutes later, the problem repeats itself.
Is this caused by a faulty SSD?
Environment:
# pveversion: pve-manager/7.4-19/f98bf8d4 (running kernel: 5.15.131-2-pve)
# ceph version 17.2.7 (29dffbfe59476a6bb5363cf5cc629089b25654e3) quincy (stable)
------------------------------
Date: Wed, 27 Aug 2025 17:22:19 +0300
From: Igor Fedotov <[email protected]>
Subject: [ceph-users] Re: OSD crc errors: Faulty SSD?
To: Roland Giesler <[email protected]>, [email protected]
Hi Roland,
this looks like a messenger error to me. Hence it's rather a transport/networking issue, not a data-at-rest one.
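If you want to rule the network in or out, a few generic checks on both ends might help narrow it down (a sketch; eth0 is a placeholder for the cluster-network interface, the peer address is taken from your log):
```
# Interface error/drop counters
ip -s link show dev eth0
ethtool -S eth0 | grep -iE 'err|drop|crc'
# NIC driver/firmware info
ethtool -i eth0
# MTU / fragmentation test towards a peer seen in the log
# (-s 8972 assumes a 9000-byte MTU; use 1472 for a 1500-byte MTU)
ping -M do -c 5 -s 8972 192.168.131.3
```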
Thanks,
Igor
------------------------------
Date: Thu, 28 Aug 2025 18:52:05 +1000
From: Andrew <[email protected]>
Subject: [ceph-users] Debian Packages for Trixie
To: [email protected]
Hello Team,
I can see that there are official packages for Debian Old-Stable
(Bookworm) available on the official site:
https://download.ceph.com/debian-19.2.3/dists/
Does anyone know if there is an approximate (or likely) ETA for the Debian packages to be released for the current stable version (Trixie)?
Many thanks,
Andrew
------------------------------
------------------------------
End of ceph-users Digest, Vol 134, Issue 88
*******************************************
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]