It happened again today:

2021-09-15 04:25:20.551098 [INF]  Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-15 04:19:01.512425 [INF]  Health check cleared: POOL_FULL (was: 1 pools full)
2021-09-15 04:19:01.512389 [WRN]  Health check failed: 1 pools nearfull (POOL_NEAR_FULL)
2021-09-15 04:18:05.015251 [INF]  Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-15 04:18:05.015217 [ERR]  Health check failed: 1 pools full (POOL_FULL)
2021-09-15 04:13:45.312115 [WRN]  Health check failed: 1 pools nearfull (POOL_NEAR_FULL)

During this time window we run our regular snapshot rotation on RBD images. Could this have anything to do with it?
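
For context, the rotation is the usual create-then-prune cycle per image, along these lines (pool, image and snapshot names are placeholders, not our actual ones):

    # take a new snapshot, then prune the oldest one
    rbd snap create <pool>/<image>@<new-snap>
    rbd snap ls <pool>/<image>
    rbd snap rm <pool>/<image>@<oldest-snap>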

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <fr...@dtu.dk>
Sent: 13 September 2021 12:20
To: ceph-users
Subject: [ceph-users] Health check failed: 1 pools full

Hi all,

I recently had a strange blip in the ceph logs:

2021-09-09 04:19:09.612111 [INF]  Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-09 04:13:18.187602 [INF]  Health check cleared: POOL_FULL (was: 1 pools full)
2021-09-09 04:13:18.187566 [WRN]  Health check failed: 1 pools nearfull (POOL_NEAR_FULL)
2021-09-09 04:12:09.078878 [INF]  Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull)
2021-09-09 04:12:09.078850 [ERR]  Health check failed: 1 pools full (POOL_FULL)
2021-09-09 04:08:16.898112 [WRN]  Health check failed: 1 pools nearfull (POOL_NEAR_FULL)

None of our pools are anywhere near full or close to their quotas:

# ceph df detail
GLOBAL:
    SIZE       AVAIL       RAW USED     %RAW USED     OBJECTS
    11 PiB     9.6 PiB      1.8 PiB         16.11     845.1 M
POOLS:
    NAME                     ID     QUOTA OBJECTS     QUOTA BYTES     USED        %USED     MAX AVAIL     OBJECTS        DIRTY       READ        WRITE       RAW USED
    sr-rbd-meta-one          1      N/A               500 GiB          90 GiB      0.21        41 TiB         31558      31.56 k     799 MiB     338 MiB      270 GiB
    sr-rbd-data-one          2      N/A               70 TiB           36 TiB     27.96        93 TiB      13966792      13.97 M     4.2 GiB     2.5 GiB       48 TiB
    sr-rbd-one-stretch       3      N/A               1 TiB           222 GiB      0.52        41 TiB         68813      68.81 k     863 MiB     860 MiB      667 GiB
    con-rbd-meta-hpc-one     7      N/A               10 GiB           51 KiB         0       1.7 TiB            61           61     7.0 MiB     3.8 MiB      154 KiB
    con-rbd-data-hpc-one     8      N/A               5 TiB            35 GiB         0       5.9 PiB          9245       9.24 k     144 MiB      78 MiB       44 GiB
    sr-rbd-data-one-hdd      11     N/A               200 TiB         118 TiB     39.90       177 TiB      31460630      31.46 M      14 GiB     2.2 GiB      157 TiB
    con-fs2-meta1            12     N/A               250 GiB         2.0 GiB      0.15       1.3 TiB      18045470      18.05 M      20 MiB     108 MiB      7.9 GiB
    con-fs2-meta2            13     N/A               100 GiB             0 B         0       1.3 TiB     216425275      216.4 M     141 KiB     7.9 MiB          0 B
    con-fs2-data             14     N/A               2.0 PiB         1.3 PiB     18.41       5.9 PiB     541502957      541.5 M     4.9 GiB     5.0 GiB      1.7 PiB
    con-fs2-data-ec-ssd      17     N/A               1 TiB           239 GiB      5.29       4.2 TiB       3225690       3.23 M      17 MiB         0 B      299 GiB
    ms-rbd-one               18     N/A               1 TiB           262 GiB      0.62        41 TiB         73711      73.71 k     4.8 MiB     1.5 GiB      786 GiB
    con-fs2-data2            19     N/A               5 PiB            29 TiB      0.52       5.4 PiB      20322725      20.32 M      83 MiB      97 MiB       39 TiB
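
The quotas shown in the QUOTA BYTES column can also be read per pool with the usual commands (pool name is a placeholder):

    # per-pool quota as the cluster sees it
    ceph osd pool get-quota <pool-name>
    # pool flags (if a pool had hit its quota, I would expect a full flag to show up here)
    ceph osd pool ls detail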

I'm not sure whether I/O actually stopped; it does not look like it, so the blip may well have been artificial. I could not find any information about which pool(s) caused this.
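
Is polling health detail the right way to catch which pool it is next time? Something like this rough sketch (interval picked arbitrarily):

    # poll health detail so the detail lines hopefully name the offending pool
    while true; do
        date
        ceph health detail
        sleep 30
    done >> /tmp/ceph-health-detail.log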

We are running ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) 
mimic (stable).

Any ideas about what is going on, or whether this could be a problem?

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io