Hi,

We would like to replace the current Seagate ST4000NM0034 HDDs in our
Ceph cluster with SSDs. Before doing that, we would like to look at the
typical usage of the current drives over the last few years, so we can
select the best (price/performance/endurance) SSD to replace them with.

I am trying to extract this information from the fields "Blocks sent to
initiator" / "Blocks received from initiator", as these are the fields
smartctl reports for these Seagate disks. But the resulting numbers look
strange, and I would like to ask for feedback here.
We have three identical nodes, 8 OSDs per node, all 4TB ST4000NM0034
(filestore) HDDs with SSD-based journals:
root@node1:~# ceph osd crush tree
ID CLASS WEIGHT   TYPE NAME
-1       87.35376 root default
-2       29.11688     host node1
 0   hdd  3.64000         osd.0
 1   hdd  3.64000         osd.1
 2   hdd  3.63689         osd.2
 3   hdd  3.64000         osd.3
12   hdd  3.64000         osd.12
13   hdd  3.64000         osd.13
14   hdd  3.64000         osd.14
15   hdd  3.64000         osd.15
-3       29.12000     host node2
 4   hdd  3.64000         osd.4
 5   hdd  3.64000         osd.5
 6   hdd  3.64000         osd.6
 7   hdd  3.64000         osd.7
16   hdd  3.64000         osd.16
17   hdd  3.64000         osd.17
18   hdd  3.64000         osd.18
19   hdd  3.64000         osd.19
-4       29.11688     host node3
 8   hdd  3.64000         osd.8
 9   hdd  3.64000         osd.9
10   hdd  3.64000         osd.10
11   hdd  3.64000         osd.11
20   hdd  3.64000         osd.20
21   hdd  3.64000         osd.21
22   hdd  3.64000         osd.22
23   hdd  3.63689         osd.23
We are looking at the numbers from smartctl, and basing our calculations
on the following output, collected for each individual OSD:
Vendor (Seagate) cache information
Blocks sent to initiator = 3783529066
Blocks received from initiator = 3121186120
Blocks read from cache and sent to initiator = 545427169
Number of read and write commands whose size <= segment size = 93877358
Number of read and write commands whose size > segment size = 2290879
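For reference, this is roughly how we pull the two counters from each
disk; a quick sketch rather than our exact script, assuming the vendor
cache lines appear in plain "smartctl -a" output exactly as shown above
(device letters as on node1):

for dev in sda sdb sdc sdd sdg sdh sdi sdj; do
  smartctl -a /dev/$dev | awk -v dev="$dev" '
    /Blocks sent to initiator/       { sent = $NF }
    /Blocks received from initiator/ { recv = $NF }
    END {
      total = sent + recv
      # blocks sent to the host = reads, blocks received from the host = writes
      printf "%s: read %.2f%%  write %.2f%%\n", dev, 100*sent/total, 100*recv/total
    }'
done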
I created the following spreadsheet (the "aka" column is the device name):

              blocks sent   blocks received    total blocks
       aka   to initiator    from initiator      calculated    read%   write%
node1
osd0   sda      905060564        1900663448      2805724012   32.26%   67.74%
osd1   sdb     2270442418        3756215880      6026658298   37.67%   62.33%
osd2   sdc     3531938448        3940249192      7472187640   47.27%   52.73%
osd3   sdd     2824808123        3130655416      5955463539   47.43%   52.57%
osd12  sdg     1956722491        1294854032      3251576523   60.18%   39.82%
osd13  sdh     3410188306        1265443936      4675632242   72.94%   27.06%
osd14  sdi     3765454090        3115079112      6880533202   54.73%   45.27%
osd15  sdj     2272246730        2218847264      4491093994   50.59%   49.41%
node2
osd4   sda     3974937107         740853712      4715790819   84.29%   15.71%
osd5   sdb     1181377668        2109150744      3290528412   35.90%   64.10%
osd6   sdc     1903438106         608869008      2512307114   75.76%   24.24%
osd7   sdd     3511170043         724345936      4235515979   82.90%   17.10%
osd16  sdg     2642731906        3981984640      6624716546   39.89%   60.11%
osd17  sdh     3994977805        3703856288      7698834093   51.89%   48.11%
osd18  sdi     3992157229        2096991672      6089148901   65.56%   34.44%
osd19  sdj      279766405        1053039640      1332806045   20.99%   79.01%
node3
osd8   sda     3711322586         234696960      3946019546   94.05%    5.95%
osd9   sdb     1203912715        3132990000      4336902715   27.76%   72.24%
osd10  sdc      912356010        1681434416      2593790426   35.17%   64.83%
osd11  sdd      810488345        2626589896      3437078241   23.58%   76.42%
osd20  sdg     1506879946        2421596680      3928476626   38.36%   61.64%
osd21  sdh     2991526593           7525120      2999051713   99.75%    0.25%
osd22  sdi       29560337        3226114552      3255674889    0.91%   99.09%
osd23  sdj     2019195656        2563506320      4582701976   44.06%   55.94%
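To be explicit about the arithmetic: read% = blocks sent to initiator /
total blocks and write% = blocks received from initiator / total blocks,
e.g. for osd0: 905060564 / 2805724012 ≈ 32.26% read and
1900663448 / 2805724012 ≈ 67.74% write.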
But as can be seen above, this results in some very strange numbers: for
example node3/osd21, node2/osd19 and node3/osd8 look very unlikely. So we
are probably doing something wrong in our logic here.

Can someone explain what we are doing wrong? And is it possible to obtain
stats like these from Ceph directly? Does Ceph keep historical stats like
the above?
MJ