[ceph-users] Re: extract disk usage stats from running ceph cluster

2020-02-10 Thread ceph
Hello MJ,

Perhaps your PGs are unbalanced?

Ceph osd df tree

Greetz
Mehmet 

On 10 February 2020 at 14:58:25 CET, lists wrote:
>Hi,
>
>We would like to replace the current Seagate ST4000NM0034 HDDs in our
>ceph cluster with SSDs, and before doing that, we would like to check out
>the typical usage of our current drives over the last years, so we can
>select the best (price/performance/endurance) SSD to replace them with.
>
>I am trying to extract this info from the fields "Blocks received from
>initiator" / "Blocks sent to initiator", as these are the fields
>smartctl gets from the Seagate disks. But the numbers seem strange, and
>I would like to request feedback here.
>
>Three nodes, all equal, 8 OSDs per node, all 4TB ST4000NM0034 
>(filestore) HDDs with SSD-based journals:
>
>> root@node1:~# ceph osd crush tree
>> ID CLASS WEIGHT   TYPE NAME
>> -1   87.35376 root default
>> -2   29.11688 host node1
>>  0   hdd  3.64000 osd.0
>>  1   hdd  3.64000 osd.1
>>  2   hdd  3.63689 osd.2
>>  3   hdd  3.64000 osd.3
>> 12   hdd  3.64000 osd.12
>> 13   hdd  3.64000 osd.13
>> 14   hdd  3.64000 osd.14
>> 15   hdd  3.64000 osd.15
>> -3   29.12000 host node2
>>  4   hdd  3.64000 osd.4
>>  5   hdd  3.64000 osd.5
>>  6   hdd  3.64000 osd.6
>>  7   hdd  3.64000 osd.7
>> 16   hdd  3.64000 osd.16
>> 17   hdd  3.64000 osd.17
>> 18   hdd  3.64000 osd.18
>> 19   hdd  3.64000 osd.19
>> -4   29.11688 host node3
>>  8   hdd  3.64000 osd.8
>>  9   hdd  3.64000 osd.9
>> 10   hdd  3.64000 osd.10
>> 11   hdd  3.64000 osd.11
>> 20   hdd  3.64000 osd.20
>> 21   hdd  3.64000 osd.21
>> 22   hdd  3.64000 osd.22
>> 23   hdd  3.63689 osd.23
>
>We are looking at the numbers from smartctl, and basing our calculations
>on this output for each individual OSD:
>> Vendor (Seagate) cache information
>>   Blocks sent to initiator = 3783529066
>>   Blocks received from initiator = 3121186120
>>   Blocks read from cache and sent to initiator = 545427169
>>   Number of read and write commands whose size <= segment size = 93877358
>>   Number of read and write commands whose size > segment size = 2290879
>
>I created the following spreadsheet:
>
>>          blocks sent    blocks received   total blocks
>>          to initiator   from initiator    calculated     read%    write%   aka
>> node1
>> osd0     905060564      1900663448        2805724012     32,26%   67,74%   sda
>> osd1     2270442418     3756215880        6026658298     37,67%   62,33%   sdb
>> osd2     3531938448     3940249192        7472187640     47,27%   52,73%   sdc
>> osd3     2824808123     3130655416        5955463539     47,43%   52,57%   sdd
>> osd12    1956722491     1294854032        3251576523     60,18%   39,82%   sdg
>> osd13    3410188306     1265443936        4675632242     72,94%   27,06%   sdh
>> osd14    3765454090     3115079112        6880533202     54,73%   45,27%   sdi
>> osd15    2272246730     2218847264        4491093994     50,59%   49,41%   sdj
>>
>> node2
>> osd4     3974937107     740853712         4715790819     84,29%   15,71%   sda
>> osd5     1181377668     2109150744        3290528412     35,90%   64,10%   sdb
>> osd6     1903438106     608869008         2512307114     75,76%   24,24%   sdc
>> osd7     3511170043     724345936         4235515979     82,90%   17,10%   sdd
>> osd16    2642731906     3981984640        6624716546     39,89%   60,11%   sdg
>> osd17    3994977805     3703856288        7698834093     51,89%   48,11%   sdh
>> osd18    3992157229     2096991672        6089148901     65,56%   34,44%   sdi
>> osd19    279766405      1053039640        1332806045     20,99%   79,01%   sdj
>>
>> node3
>> osd8     3711322586     234696960         3946019546     94,05%   5,95%    sda
>> osd9     1203912715     3132990000        4336902715     27,76%   72,24%   sdb
>> osd10    912356010      1681434416        2593790426     35,17%   64,83%   sdc
>> osd11    810488345      2626589896        3437078241     23,58%   76,42%   sdd
>> osd20    1506879946     2421596680        3928476626     38,36%   61,64%   sdg
>> osd21    2991526593     7525120           2999051713     99,75%   0,25%    sdh
>> osd22    29560337       3226114552        3255674889     0,91%    99,09%   sdi
>> osd23    2019195656     2563506320        4582701976     44,06%   55,94%   s
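
For reference, the read%/write% columns in the spreadsheet above can be reproduced straight from the smartctl output; a minimal sketch, assuming smartctl prints the Seagate vendor cache counters in the format quoted above and that the /dev/sdX device really is the OSD you think it is:

#!/bin/bash
# Rough helper (hypothetical): derive the read/write split for one disk from the
# Seagate "Blocks sent/received to initiator" counters reported by smartctl.
# "sent to initiator" = blocks read from the disk, "received" = blocks written to it.
# Note: these vendor counters may wrap, so treat the absolute totals with some care.
dev=${1:-/dev/sda}
smartctl -a "$dev" | awk -v dev="$dev" -F'= *' '
    /Blocks sent to initiator/       { sent = $2 }
    /Blocks received from initiator/ { recv = $2 }
    END {
        total = sent + recv
        if (total > 0)
            printf "%s  read=%.2f%%  write=%.2f%%  (total blocks %.0f)\n",
                   dev, 100 * sent / total, 100 * recv / total, total
    }'

Looping this over /dev/sd[a-j] on each node would rebuild the table above.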

[ceph-users] Re: extract disk usage stats from running ceph cluster

2020-02-10 Thread Joe Comeau
try from admin node
 
ceph osd df
ceph osd status
thanks Joe
 


[ceph-users] Re: extract disk usage stats from running ceph cluster

2020-02-11 Thread lists

Hi Joe and Mehmet!

Thanks for your responses!

The requested outputs at the end of the message.

But to make my question more clear:

What we are actually after is not the CURRENT usage of our OSDs, but
stats on the total GBs written in the cluster, per OSD, and the read/write ratio.


With those numbers, we would be able to identify suitable replacement
SSDs for our current HDDs, and select specifically for OUR typical use
(taking into account endurance, speed, price, etc.).


And it seems smartctl on our Seagate ST4000NM0034 drives does not give us
data on total bytes written or read (...or are we simply not looking in
the right place?).


Requested outputs below:


root@node1:~# ceph osd df tree
ID CLASS WEIGHT   REWEIGHT SIZEUSE AVAIL   %USE  VAR  PGS TYPE NAME
-1   87.35376- 87.3TiB 49.1TiB 38.2TiB 56.22 1.00   - root default
-2   29.11688- 29.1TiB 16.4TiB 12.7TiB 56.23 1.00   - host node1
 0   hdd  3.64000  1.0 3.64TiB 2.01TiB 1.62TiB 55.34 0.98 137 osd.0
 1   hdd  3.64000  1.0 3.64TiB 2.09TiB 1.54TiB 57.56 1.02 141 osd.1
 2   hdd  3.63689  1.0 3.64TiB 1.92TiB 1.72TiB 52.79 0.94 128 osd.2
 3   hdd  3.64000  1.0 3.64TiB 2.07TiB 1.57TiB 56.90 1.01 143 osd.3
12   hdd  3.64000  1.0 3.64TiB 2.15TiB 1.48TiB 59.18 1.05 138 osd.12
13   hdd  3.64000  1.0 3.64TiB 1.99TiB 1.64TiB 54.80 0.97 131 osd.13
14   hdd  3.64000  1.0 3.64TiB 1.93TiB 1.70TiB 53.13 0.94 127 osd.14
15   hdd  3.64000  1.0 3.64TiB 2.19TiB 1.45TiB 60.10 1.07 143 osd.15
-3   29.12000- 29.1TiB 16.4TiB 12.7TiB 56.22 1.00   - host node2
 4   hdd  3.64000  1.0 3.64TiB 2.11TiB 1.53TiB 57.97 1.03 142 osd.4
 5   hdd  3.64000  1.0 3.64TiB 1.97TiB 1.67TiB 54.11 0.96 134 osd.5
 6   hdd  3.64000  1.0 3.64TiB 2.12TiB 1.51TiB 58.40 1.04 142 osd.6
 7   hdd  3.64000  1.0 3.64TiB 1.97TiB 1.66TiB 54.28 0.97 128 osd.7
16   hdd  3.64000  1.0 3.64TiB 2.00TiB 1.64TiB 54.90 0.98 133 osd.16
17   hdd  3.64000  1.0 3.64TiB 2.33TiB 1.30TiB 64.14 1.14 153 osd.17
18   hdd  3.64000  1.0 3.64TiB 1.97TiB 1.67TiB 54.07 0.96 132 osd.18
19   hdd  3.64000  1.0 3.64TiB 1.89TiB 1.75TiB 51.93 0.92 124 osd.19
-4   29.11688- 29.1TiB 16.4TiB 12.7TiB 56.22 1.00   - host node3
 8   hdd  3.64000  1.0 3.64TiB 1.79TiB 1.85TiB 49.24 0.88 123 osd.8
 9   hdd  3.64000  1.0 3.64TiB 2.17TiB 1.47TiB 59.72 1.06 144 osd.9
10   hdd  3.64000  1.0 3.64TiB 2.40TiB 1.24TiB 65.88 1.17 157 osd.10
11   hdd  3.64000  1.0 3.64TiB 2.06TiB 1.58TiB 56.64 1.01 133 osd.11
20   hdd  3.64000  1.0 3.64TiB 2.19TiB 1.45TiB 60.23 1.07 148 osd.20
21   hdd  3.64000  1.0 3.64TiB 1.74TiB 1.90TiB 47.80 0.85 115 osd.21
22   hdd  3.64000  1.0 3.64TiB 2.05TiB 1.59TiB 56.27 1.00 138 osd.22
23   hdd  3.63689  1.0 3.64TiB 1.96TiB 1.67TiB 54.01 0.96 130 osd.23
 TOTAL 87.3TiB 49.1TiB 38.2TiB 56.22
MIN/MAX VAR: 0.85/1.17  STDDEV: 4.08
root@node1:~# ceph osd status
++--+---+---++-++-+---+
| id | host |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
++--+---+---++-++-+---+
| 0  | node1  | 2061G | 1663G |   38   |  5168k  |    3   |  1491k  | exists,up |
| 1  | node1  | 2143G | 1580G |    4   |  1092k  |    9   |  2243k  | exists,up |
| 2  | node1  | 1965G | 1758G |   20   |  3643k  |    5   |  1758k  | exists,up |
| 3  | node1  | 2119G | 1605G |   17   |  99.5k  |    4   |  3904k  | exists,up |
| 4  | node2  | 2158G | 1565G |   12   |   527k  |    1   |  2632k  | exists,up |
| 5  | node2  | 2014G | 1709G |   15   |   239k  |    0   |   889k  | exists,up |
| 6  | node2  | 2174G | 1549G |   11   |  1677k  |    5   |  1931k  | exists,up |
| 7  | node2  | 2021G | 1702G |    2   |   597k  |    0   |  1638k  | exists,up |
| 8  | node3  | 1833G | 1890G |    4   |   564k  |    4   |  5595k  | exists,up |
| 9  | node3  | 2223G | 1500G |    6   |  1124k  |   10   |  4864k  | exists,up |
| 10 | node3  | 2453G | 1270G |    8   |  1257k  |    3   |  1447k  | exists,up |
| 11 | node3  | 2109G | 1614G |   14   |  2889k  |    3   |  1449k  | exists,up |
| 12 | node1  | 2204G | 1520G |   17   |  1596k  |    4   |  1806k  | exists,up |
| 13 | node1  | 2040G | 1683G |   15   |  2526k  |    0   |   819k  | exists,up |
| 14 | node1  | 1978G | 1745G |   11   |  1713k  |    8   |  3489k  | exists,up |
| 15 | node1  | 2238G | 1485G |   25   |  5151k  |    5   |  2715k  | exists,up |
| 16 | node2  | 2044G | 1679G |    2   |  43.3k  |    1   |  3371k  | exists,up |
| 17 | node2  | 2388G | 1335G |   14   |  1736k  |    9   |  5315k  | exists,up |
| 18 | node2  | 2013G | 1710G |    8   |  1907k  |    2   |  2004k  | exists,up |
| 19 | node2

[ceph-users] Re: extract disk usage stats from running ceph cluster

2020-02-11 Thread Muhammad Ahmad
>>> And it seems smartctl on our Seagate ST4000NM0034 drives does not give us
>>> data on total bytes written or read

If it's a SAS device, it's not always obvious where to find this information.

You can use Seagate's openSeaChest toolset.

For any (SAS/SATA, HDD/SSD) device, the --deviceInfo option will give you
some of the info you are looking for, e.g.:

sudo ./openSeaChest_Info -d /dev/sg1 --deviceInfo | grep Total
Total Bytes Read (TB): 82.46
Total Bytes Written (TB): 311.56
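
To collect those two numbers for every drive in one go, a small loop along these lines may help; just a sketch, assuming the openSeaChest_Info binary sits in the current directory as in the example above and that each /dev/sg* node is really one of the OSD disks (adjust the glob to your controller layout):

# Hypothetical wrapper: print lifetime bytes read/written per SCSI generic device.
for dev in /dev/sg[0-9]*; do
    echo "== $dev"
    sudo ./openSeaChest_Info -d "$dev" --deviceInfo | grep -E 'Total Bytes (Read|Written)'
done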




[ceph-users] Re: extract disk usage stats from running ceph cluster

2020-02-12 Thread mj

Hi Muhammad,

Yes, that tool helps! Thank you for pointing it out!

With a combination of openSeaChest_Info and smartctl I was able to
extract the following stats for our cluster, and the numbers are very
surprising to me. I hope someone here can explain what we see below:



node1         AnnualWrkload   Read       Written   Power On Hours
osd0           93.14           318.79     19.48    31815.65
osd1           94.38           322.67     20.11    31815.42
osd2           41.08            38.95     11.33    10722.47   new disk
osd3           94.56           323.98     19.45    31815.35
osd12         124.20           340.11     20.09    25406.73
osd13         112.43           308.18     17.88    25405.72
osd14         120.67           330.96     19.01    25405.65
osd15         105.59           287.78     18.45    25405.90
ssd journal                      0.46   1643.58    31813.00

node2
osd4          697.75          2390       151.23    31864.88   (2.39PB)
osd5          677.74          2320       144.94    31864.68   (2.32PB)
osd6          687.13          2340       157.11    31865.05   (2.34PB)
osd7          619.19          2100       151.08    31864.67   (2.10PB)
osd16         827.57          2260       142.81    25405.93   (2.26PB)
osd17         996.03          2720       167.97    25405.87   (2.72PB)
osd18         809.36          2210       137.96    25405.82   (2.21PB)
osd19         844.06          2300       146.84    25405.90   (2.30PB)
ssd journal                      0.46   1637.60    31862.00

node3
osd8           75.30           258.79     14.67    31813.67
osd9           77.30           264.87     15.85    31813.68
osd10          82.32           282.43     16.53    31813.60
osd11          82.26           282.72     16.01    31813.73
osd20          96.86           265.25     15.65    25404.37
osd21          93.18           256.11     14.12    25404.22
osd22         108.43           298.29     16.15    25404.23
osd23          30.80            33.61     10.78    12625.07   new disk
ssd journal                      0.46   1644.83    31811.00

AnnualWrkload = Annualized Workload Rate (TB/year)
Read = Total Bytes Read (TB)
Written = Total Bytes Written (TB)
Power On Hours = hours the drive has been used
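
As a sanity check on the columns: the Annualized Workload Rate appears to be simply (Read + Written) scaled to a full year of power-on time. For osd0, for example:

# (318.79 TB read + 19.48 TB written) over 31815.65 power-on hours (8760 h per year):
awk 'BEGIN { printf "%.2f TB/year\n", (318.79 + 19.48) / (31815.65 / 8760) }'
# prints 93.14 TB/year, matching the AnnualWrkload value for osd0 above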

From the numbers above, it seems the OSDs on node2 are used INCREDIBLY
much more heavily than those on the other two nodes. The numbers for node2
are even reported in PB, while the other nodes are in TB. (Converted to TB
using https://www.gbmb.org/pb-to-tb, to make sure there are no conversion errors.)


However, SSD journal usage across the three nodes looks similar.

All OSDs have the same weight:

root@node2:~# ceph osd tree
ID CLASS WEIGHT   TYPE NAME  STATUS REWEIGHT PRI-AFF 
-1   87.35376 root default   
-2   29.11688 host pm1   
 0   hdd  3.64000 osd.0  up  1.0 1.0 
 1   hdd  3.64000 osd.1  up  1.0 1.0 
 2   hdd  3.63689 osd.2  up  1.0 1.0 
 3   hdd  3.64000 osd.3  up  1.0 1.0 
12   hdd  3.64000 osd.12 up  1.0 1.0 
13   hdd  3.64000 osd.13 up  1.0 1.0 
14   hdd  3.64000 osd.14 up  1.0 1.0 
15   hdd  3.64000 osd.15 up  1.0 1.0 
-3   29.12000 host pm2   
 4   hdd  3.64000 osd.4  up  1.0 1.0 
 5   hdd  3.64000 osd.5  up  1.0 1.0 
 6   hdd  3.64000 osd.6  up  1.0 1.0 
 7   hdd  3.64000 osd.7  up  1.0 1.0 
16   hdd  3.64000 osd.16 up  1.0 1.0 
17   hdd  3.64000 osd.17 up  1.0 1.0 
18   hdd  3.64000 osd.18 up  1.0 1.0 
19   hdd  3.64000 osd.19 up  1.0 1.0 
-4   29.11688 host pm3   
 8   hdd  3.64000 osd.8  up  1.0 1.0 
 9   hdd  3.64000 osd.9  up  1.0 1.0 
10   hdd  3.64000 osd.10 up  1.0 1.0 
11   hdd  3.64000 osd.11 up  1.0 1.0 
20   hdd  3.64000 osd.20 up  1.0 1.0 
21   hdd  3.64000 osd.21 up  1.0 1.0 
22   hdd  3.64000 osd.22 up  1.0 1.0 
23   hdd  3.63689 osd.23 up  1.0 1.0


Disk usage also looks ok:

root@pm2:~# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZEUSE AVAIL   %USE  VAR  PGS 
 0   hdd 3.64000  1.0 3.64TiB 2.01TiB 1.62TiB 55.34 0.98 137 
 1   hdd 3.64000  1.0 3.64TiB 2.09TiB 1.54TiB 57.56 1.02 141 
 2   hdd 3.63689  1.0 3.64TiB 1.92TiB 1.72TiB 52.79 0.94 128 
 3   hdd 3.64000  1.0 3.64TiB 2.07TiB 1.57TiB 56.90 1.01 143 
12  

[ceph-users] Re: extract disk usage stats from running ceph cluster

2020-02-12 Thread mj




On 2/12/20 11:23 AM, mj wrote:

Better layout for the disk usage stats:

https://pastebin.com/8V5VDXNt


[ceph-users] Re: extract disk usage stats from running ceph cluster

2020-02-13 Thread mj

Hi,

I would like to understand why the OSD HDDs on node2 of my three
identical ceph hosts claim to have processed 10 times more reads/writes
than those in the other two nodes.


OSD weights are all similar, disk space usage also, same disk sizes, same
reported power-on hours, etc. All data (QEMU VMs) is in one large 3/2 pool
called "ceph-storage". This is on ceph version 12.2.10.
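
One way to cross-check the SMART/openSeaChest numbers against what Ceph itself has counted would be the per-OSD perf counters over the admin socket; a rough sketch only, run on each node for its local OSDs (counter names vary between releases, and the counters reset whenever an OSD restarts, so they are not lifetime totals):

# Hypothetical check on node2 for its local OSDs:
for id in 4 5 6 7 16 17 18 19; do
    echo "== osd.$id"
    ceph daemon osd.$id perf dump | python -m json.tool | grep -i bytes
done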


The spreadsheet with all the stats:
https://docs.google.com/spreadsheets/d/1n8aOC1tpPPMi2iALhxfHzSQTRmfIz6wCERlagNShVss/edit?usp=sharing

I hope the information is now both complete and readable; let me know if
anything else is needed.


Curious for any insights.

The previously requested outputs below.

root@node1:~# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZEUSE AVAIL   %USE  VAR  PGS
 0   hdd 3.64000  1.0 3.64TiB 2.01TiB 1.62TiB 55.35 0.98 137
 1   hdd 3.64000  1.0 3.64TiB 2.09TiB 1.54TiB 57.56 1.02 141
 2   hdd 3.63689  1.0 3.64TiB 1.92TiB 1.72TiB 52.80 0.94 128
 3   hdd 3.64000  1.0 3.64TiB 2.07TiB 1.57TiB 56.91 1.01 143
12   hdd 3.64000  1.0 3.64TiB 2.15TiB 1.48TiB 59.19 1.05 138
13   hdd 3.64000  1.0 3.64TiB 1.99TiB 1.64TiB 54.81 0.97 131
14   hdd 3.64000  1.0 3.64TiB 1.93TiB 1.70TiB 53.14 0.94 127
15   hdd 3.64000  1.0 3.64TiB 2.19TiB 1.45TiB 60.11 1.07 143
 4   hdd 3.64000  1.0 3.64TiB 2.11TiB 1.53TiB 57.98 1.03 142
 5   hdd 3.64000  1.0 3.64TiB 1.97TiB 1.67TiB 54.11 0.96 134
 6   hdd 3.64000  1.0 3.64TiB 2.12TiB 1.51TiB 58.41 1.04 142
 7   hdd 3.64000  1.0 3.64TiB 1.97TiB 1.66TiB 54.29 0.97 128
16   hdd 3.64000  1.0 3.64TiB 2.00TiB 1.64TiB 54.90 0.98 133
17   hdd 3.64000  1.0 3.64TiB 2.33TiB 1.30TiB 64.15 1.14 153
18   hdd 3.64000  1.0 3.64TiB 1.97TiB 1.67TiB 54.08 0.96 132
19   hdd 3.64000  1.0 3.64TiB 1.89TiB 1.75TiB 51.94 0.92 124
 8   hdd 3.64000  1.0 3.64TiB 1.79TiB 1.85TiB 49.25 0.88 123
 9   hdd 3.64000  1.0 3.64TiB 2.17TiB 1.46TiB 59.73 1.06 144
10   hdd 3.64000  1.0 3.64TiB 2.40TiB 1.24TiB 65.89 1.17 157
11   hdd 3.64000  1.0 3.64TiB 2.06TiB 1.58TiB 56.65 1.01 133
20   hdd 3.64000  1.0 3.64TiB 2.19TiB 1.45TiB 60.24 1.07 148
21   hdd 3.64000  1.0 3.64TiB 1.74TiB 1.90TiB 47.80 0.85 115
22   hdd 3.64000  1.0 3.64TiB 2.05TiB 1.59TiB 56.28 1.00 138
23   hdd 3.63689  1.0 3.64TiB 1.96TiB 1.67TiB 54.02 0.96 130
TOTAL 87.3TiB 49.1TiB 38.2TiB 56.23
MIN/MAX VAR: 0.85/1.17  STDDEV: 4.08

and

root@node1:~# ceph osd tree
ID CLASS WEIGHT   TYPE NAME  STATUS REWEIGHT PRI-AFF
-1   87.35376 root default
-2   29.11688 host node1
 0   hdd  3.64000 osd.0  up  1.0 1.0
 1   hdd  3.64000 osd.1  up  1.0 1.0
 2   hdd  3.63689 osd.2  up  1.0 1.0
 3   hdd  3.64000 osd.3  up  1.0 1.0
12   hdd  3.64000 osd.12 up  1.0 1.0
13   hdd  3.64000 osd.13 up  1.0 1.0
14   hdd  3.64000 osd.14 up  1.0 1.0
15   hdd  3.64000 osd.15 up  1.0 1.0
-3   29.12000 host node2
 4   hdd  3.64000 osd.4  up  1.0 1.0
 5   hdd  3.64000 osd.5  up  1.0 1.0
 6   hdd  3.64000 osd.6  up  1.0 1.0
 7   hdd  3.64000 osd.7  up  1.0 1.0
16   hdd  3.64000 osd.16 up  1.0 1.0
17   hdd  3.64000 osd.17 up  1.0 1.0
18   hdd  3.64000 osd.18 up  1.0 1.0
19   hdd  3.64000 osd.19 up  1.0 1.0
-4   29.11688 host node3
 8   hdd  3.64000 osd.8  up  1.0 1.0
 9   hdd  3.64000 osd.9  up  1.0 1.0
10   hdd  3.64000 osd.10 up  1.0 1.0
11   hdd  3.64000 osd.11 up  1.0 1.0
20   hdd  3.64000 osd.20 up  1.0 1.0
21   hdd  3.64000 osd.21 up  1.0 1.0
22   hdd  3.64000 osd.22 up  1.0 1.0
23   hdd  3.63689 osd.23 up  1.0 1.0

MJ