Hi Muhammad,

Yes, that tool helps! Thank you for pointing it out!

With a combination of openSeaChest_Info and smartctl I was able to extract the following stats from our cluster, and the numbers are very surprising to me. I hope someone here can explain what we see below:
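
For reference, this is roughly how the read/written totals were collected. It is a minimal sketch, not the exact script: it assumes drives that expose the ATA device-statistics log via "smartctl -l devstat" and that use 512-byte logical sectors, and the device list is just an example (openSeaChest_Info reports the annualized workload rate directly on our drives):

#!/usr/bin/env python3
# Sketch: pull lifetime read/written totals and power-on hours via smartctl.
# Assumes the drive supports the ATA device-statistics log ("smartctl -l devstat")
# and 512-byte logical sectors; adjust the device list for your own hosts.
import subprocess

DEVICES = ["/dev/sda", "/dev/sdb", "/dev/sdc"]  # example only

def devstat_counter(output: str, description: str) -> int:
    """Return the raw counter whose description matches, e.g. 'Logical Sectors Written'."""
    for line in output.splitlines():
        if line.strip().endswith(description):
            return int(line.split()[3])  # the value is the 4th column of the devstat output
    raise KeyError(description)

for dev in DEVICES:
    out = subprocess.run(["smartctl", "-l", "devstat", dev],
                         capture_output=True, text=True).stdout
    read_tb = devstat_counter(out, "Logical Sectors Read") * 512 / 1e12
    written_tb = devstat_counter(out, "Logical Sectors Written") * 512 / 1e12
    hours = devstat_counter(out, "Power-on Hours")
    print(f"{dev}: read {read_tb:.2f} TB, written {written_tb:.2f} TB, {hours} power-on hours")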

node1   AnnualWrkload   Read    Written         Power On Hours  
osd0     93.14          318.79    19.48         31815.65        
osd1     94.38          322.67    20.11         31815.42        
osd2     41.08           38.95    11.33         10722.47        new disk
osd3     94.56          323.98    19.45         31815.35        
osd12   124.20          340.11    20.09         25406.73        
osd13   112.43          308.18    17.88         25405.72        
osd14   120.67          330.96    19.01         25405.65        
osd15   105.59          287.78    18.45         25405.90        
ssd journal               0.46  1643.58         31813.00        
                                        
node2                                   
osd4    697.75          2390     151.23         31864.88        (2.39PB)
osd5    677.74          2320     144.94         31864.68        (2.32PB)
osd6    687.13          2340     157.11         31865.05        (2.34PB)
osd7    619.19          2100     151.08         31864.67        (2.10PB)
osd16   827.57          2260     142.81         25405.93        (2.26PB)
osd17   996.03          2720     167.97         25405.87        (2.72PB)
osd18   809.36          2210     137.96         25405.82        (2.21PB)
osd19   844.06          2300     146.84         25405.90        (2.30PB)
ssd journal             0.46    1637.60         31862.00        
                                        
node3                                   
osd8     75.30          258.79    14.67         31813.67        
osd9     77.30          264.87    15.85         31813.68        
osd10    82.32          282.43    16.53         31813.60        
osd11    82.26          282.72    16.01         31813.73        
osd20    96.86          265.25    15.65         25404.37        
osd21    93.18          256.11    14.12         25404.22        
osd22   108.43          298.29    16.15         25404.23        
osd23    30.80           33.61    10.78         12625.07        new disk
ssd journal               0.46  1644.83         31811.00        
AnnualWrkload = Annualized Workload Rate (TB/year)
Read = Total Bytes Read (TB)
Written = Total Bytes Written (TB)
Power On Hours = total hours the drive has been powered on
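
As a sanity check on the legend: the AnnualWrkload column appears to simply be (Read + Written) / Power On Hours, scaled to a full year of 8760 hours. That is my inference from the numbers, not something documented by the tools, but it reproduces the table values, e.g.:

# AnnualWrkload looks like (Read + Written) / Power On Hours * 8760 (hours/year).
def annual_workload_tb(read_tb, written_tb, power_on_hours):
    return (read_tb + written_tb) / power_on_hours * 8760

print(annual_workload_tb(318.79, 19.48, 31815.65))   # osd0: ~93.1, table says 93.14
print(annual_workload_tb(2390.0, 151.23, 31864.88))  # osd4: ~699, table says 697.75 (Read rounded from 2.39 PB)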

From the numbers above, it seems the OSDs on node2 are used FAR more heavily than those on the other two nodes. The drives on node2 even report their totals in PB, while the other nodes report in TB (we converted the PB values to TB using https://www.gbmb.org/pb-to-tb, to make sure there are no conversion errors).

However, SSD journal usage across the three nodes looks similar.

All OSDs have the same weight:
root@node2:~# ceph osd tree
ID CLASS WEIGHT   TYPE NAME     STATUS REWEIGHT PRI-AFF
-1       87.35376 root default
-2       29.11688     host pm1
 0   hdd  3.64000         osd.0      up  1.00000 1.00000
 1   hdd  3.64000         osd.1      up  1.00000 1.00000
 2   hdd  3.63689         osd.2      up  1.00000 1.00000
 3   hdd  3.64000         osd.3      up  1.00000 1.00000
12   hdd  3.64000         osd.12     up  1.00000 1.00000
13   hdd  3.64000         osd.13     up  1.00000 1.00000
14   hdd  3.64000         osd.14     up  1.00000 1.00000
15   hdd  3.64000         osd.15     up  1.00000 1.00000
-3       29.12000     host pm2
 4   hdd  3.64000         osd.4      up  1.00000 1.00000
 5   hdd  3.64000         osd.5      up  1.00000 1.00000
 6   hdd  3.64000         osd.6      up  1.00000 1.00000
 7   hdd  3.64000         osd.7      up  1.00000 1.00000
16   hdd  3.64000         osd.16     up  1.00000 1.00000
17   hdd  3.64000         osd.17     up  1.00000 1.00000
18   hdd  3.64000         osd.18     up  1.00000 1.00000
19   hdd  3.64000         osd.19     up  1.00000 1.00000
-4       29.11688     host pm3
 8   hdd  3.64000         osd.8      up  1.00000 1.00000
 9   hdd  3.64000         osd.9      up  1.00000 1.00000
10   hdd  3.64000         osd.10     up  1.00000 1.00000
11   hdd  3.64000         osd.11     up  1.00000 1.00000
20   hdd  3.64000         osd.20     up  1.00000 1.00000
21   hdd  3.64000         osd.21     up  1.00000 1.00000
22   hdd  3.64000         osd.22     up  1.00000 1.00000
23   hdd  3.63689         osd.23     up  1.00000 1.00000

Disk usage also looks ok:
root@pm2:~# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
 0   hdd 3.64000  1.00000 3.64TiB 2.01TiB 1.62TiB 55.34 0.98 137
 1   hdd 3.64000  1.00000 3.64TiB 2.09TiB 1.54TiB 57.56 1.02 141
 2   hdd 3.63689  1.00000 3.64TiB 1.92TiB 1.72TiB 52.79 0.94 128
 3   hdd 3.64000  1.00000 3.64TiB 2.07TiB 1.57TiB 56.90 1.01 143
12   hdd 3.64000  1.00000 3.64TiB 2.15TiB 1.48TiB 59.18 1.05 138
13   hdd 3.64000  1.00000 3.64TiB 1.99TiB 1.64TiB 54.80 0.97 131
14   hdd 3.64000  1.00000 3.64TiB 1.93TiB 1.70TiB 53.13 0.94 127
15   hdd 3.64000  1.00000 3.64TiB 2.19TiB 1.45TiB 60.10 1.07 143
 4   hdd 3.64000  1.00000 3.64TiB 2.11TiB 1.53TiB 57.97 1.03 142
 5   hdd 3.64000  1.00000 3.64TiB 1.97TiB 1.67TiB 54.11 0.96 134
 6   hdd 3.64000  1.00000 3.64TiB 2.12TiB 1.51TiB 58.40 1.04 142
 7   hdd 3.64000  1.00000 3.64TiB 1.97TiB 1.66TiB 54.28 0.97 128
16   hdd 3.64000  1.00000 3.64TiB 2.00TiB 1.64TiB 54.90 0.98 133
17   hdd 3.64000  1.00000 3.64TiB 2.33TiB 1.30TiB 64.14 1.14 153
18   hdd 3.64000  1.00000 3.64TiB 1.97TiB 1.67TiB 54.07 0.96 132
19   hdd 3.64000  1.00000 3.64TiB 1.89TiB 1.75TiB 51.94 0.92 124
 8   hdd 3.64000  1.00000 3.64TiB 1.79TiB 1.85TiB 49.24 0.88 123
 9   hdd 3.64000  1.00000 3.64TiB 2.17TiB 1.46TiB 59.72 1.06 144
10   hdd 3.64000  1.00000 3.64TiB 2.40TiB 1.24TiB 65.88 1.17 157
11   hdd 3.64000  1.00000 3.64TiB 2.06TiB 1.58TiB 56.64 1.01 133
20   hdd 3.64000  1.00000 3.64TiB 2.19TiB 1.45TiB 60.23 1.07 148
21   hdd 3.64000  1.00000 3.64TiB 1.74TiB 1.90TiB 47.80 0.85 115
22   hdd 3.64000  1.00000 3.64TiB 2.05TiB 1.59TiB 56.27 1.00 138
23   hdd 3.63689  1.00000 3.64TiB 1.96TiB 1.67TiB 54.01 0.96 130
                    TOTAL 87.3TiB 49.1TiB 38.2TiB 56.23
MIN/MAX VAR: 0.85/1.17  STDDEV: 4.08

The cluster is HEALTH_OK and seems to be working fine.

When comparing "iostat -x 1" output between node2 and the other two nodes, we see similar %util for the OSD disks on all nodes.
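
For completeness, the %util comparison was done roughly as follows: a small sketch that averages %util per disk over a handful of one-second samples, assuming a sysstat iostat where %util is the last column and the OSD disks are the sdX devices.

#!/usr/bin/env python3
# Sketch: average %util per disk over a few 1-second iostat samples.
# The first report (averages since boot) is included here for simplicity.
import subprocess
from collections import defaultdict

out = subprocess.run(["iostat", "-dx", "1", "10"], capture_output=True, text=True).stdout

samples = defaultdict(list)
for line in out.splitlines():
    fields = line.split()
    if fields and fields[0].startswith("sd"):         # only the sdX OSD disks
        samples[fields[0]].append(float(fields[-1]))  # %util is the last column

for dev, utils in sorted(samples.items()):
    print(f"{dev}: avg %util {sum(utils) / len(utils):.1f} over {len(utils)} samples")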

How can the reported disk stats for node2 be SO different from those of the other two nodes, while in every other respect the cluster seems to be running as it should?

Or are we missing something?

Thanks!

MJ