[ceph-users] Re: cephfs slow, howto investigate and tune mds configuration?

2020-02-11 Thread Marc Roos
Thanks Samy, I will give this a try. 

It would be helpful if there were some value that shows cache misses, 
so you have a more precise idea of how much you need to increase the 
cache. I have now added a couple of GBs to see if the extra cache is 
being used and helps speed things up.

PS. I have been looking at the mds with 'ceph daemonperf mds.a'


-Original Message-
From: Samy Ascha [mailto:s...@xel.nl] 
Sent: 11 February 2020 17:10
To: Marc Roos
Cc: ceph-users
Subject: Re: [ceph-users] cephfs slow, howto investigate and tune mds 
configuration?



> 
> 
> Say I think my cephfs is slow when I rsync to it, slower than it used 
> to be. First of all, I do not get why it reads so much data. I assume 
> the file attributes need to come from the mds server, so the rsync 
> backup should mostly cause writes, not reads?
> 
> I think it started being slow, after enabling snapshots on the file 
> system.
> 
> - how can I determine if mds_cache_memory_limit = 80 is still 
> correct?
> 
> - how can I test the mds performance from the command line, so I can 
> experiment with cpu power configurations, and see if this brings a 
> significant change?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

Hi,

Incidentally, I was checking this in my CephFS cluster too, and I have 
used this to monitor cache usage:

# while sleep 1; do ceph daemon mds.your-mds perf dump | jq 
'.mds_mem.rss'; ceph daemon mds.your-mds dump_mempools | jq -c 
'.mempool.by_pool.mds_co'; done

You will need `jq` for this example, or you can filter the JSON however 
you prefer.

This will show you, once per second, the MDS's resident memory size and 
the cache mempool usage in items and bytes.

I found this somewhere on the net. Basically, you can use those JSON 
stats to check whether your cache is actually being used and whether it 
is big enough.
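
For example, here is a minimal sketch of how you could compare the cache 
mempool against the configured limit (assuming your MDS is called 
mds.your-mds as above, jq is installed, and the option/field names match 
your release; please double-check):

limit=$(ceph daemon mds.your-mds config get mds_cache_memory_limit | jq -r '.mds_cache_memory_limit')
used=$(ceph daemon mds.your-mds dump_mempools | jq '.mempool.by_pool.mds_co.bytes')
echo "mds_co cache: ${used} of ${limit} bytes used"

If mds_co stays pinned near the limit while clients feel slow, that is 
usually a hint that a bigger cache might help.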

I'm no expert, and also still learning how to best monitor my CephFS 
performance. This did give me some insight, though.

It's not a lot I have to offer, but since I got help on the list 
recently, I thought I might as well share what small bits I can ;)

Samy
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] MDS: obscene buffer_anon memory use when scanning lots of files (continued)

2020-02-11 Thread John Madden
Following the list migration I need to re-open this thread:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2020-January/038014.html

...

Upgraded to 14.2.7, doesn't appear to have affected the behavior. As requested:

~$ ceph tell mds.mds1 heap stats
2020-02-10 16:52:44.313 7fbda2cae700  0 client.59208005
ms_handle_reset on v2:x.x.x.x:6800/3372494505
2020-02-10 16:52:44.337 7fbda3cb0700  0 client.59249562
ms_handle_reset on v2:x.x.x.x:6800/3372494505
mds.mds1 tcmalloc heap stats:
MALLOC:  50000388656 (47684.1 MiB) Bytes in use by application
MALLOC: +0 (0.0 MiB) Bytes in page heap freelist
MALLOC: +174879528 (  166.8 MiB) Bytes in central cache freelist
MALLOC: + 14511680 (   13.8 MiB) Bytes in transfer cache freelist
MALLOC: + 14089320 (   13.4 MiB) Bytes in thread cache freelists
MALLOC: + 90534048 (   86.3 MiB) Bytes in malloc metadata
MALLOC:   
MALLOC: =  50294403232 (47964.5 MiB) Actual memory used (physical + swap)
MALLOC: + 50987008 (   48.6 MiB) Bytes released to OS (aka unmapped)
MALLOC:   
MALLOC: =  50345390240 (48013.1 MiB) Virtual address space used
MALLOC:
MALLOC: 260018  Spans in use
MALLOC: 20  Thread heaps in use
MALLOC:   8192  Tcmalloc page size

Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

~$ ceph tell mds.mds1 heap release
2020-02-10 16:52:47.205 7f037eff5700  0 client.59249625
ms_handle_reset on v2:x.x.x.x:6800/3372494505
2020-02-10 16:52:47.237 7f037fff7700  0 client.59249634
ms_handle_reset on v2:x.x.x.x:6800/3372494505
mds.mds1 releasing free RAM back to system.

The pools over 15 minutes or so:

~$ ceph daemon mds.mds1 dump_mempools | jq .mempool.by_pool.buffer_anon
{
  "items": 2045,
  "bytes": 3069493686
}
~$ ceph daemon mds.mds1 dump_mempools | jq .mempool.by_pool.buffer_anon
{
  "items": 2445,
  "bytes": 362538
}
~$ ceph daemon mds.mds1 dump_mempools | jq .mempool.by_pool.buffer_anon
{
  "items": 7850,
  "bytes": 7658678767
}
~$ ceph daemon mds.mds1 dump_mempools | jq .mempool.by_pool.buffer_anon
{
  "items": 12274,
  "bytes": 11436728978
}
~$ ceph daemon mds.mds1 dump_mempools | jq .mempool.by_pool.buffer_anon
{
  "items": 13747,
  "bytes": 11539478519
}
~$ ceph daemon mds.mds1 dump_mempools | jq .mempool.by_pool.buffer_anon
{
  "items": 14615,
  "bytes": 13859676992
}
~$ ceph daemon mds.mds1 dump_mempools | jq .mempool.by_pool.buffer_anon
{
  "items": 23267,
  "bytes": 22290063830
}
~$ ceph daemon mds.mds1 dump_mempools | jq .mempool.by_pool.buffer_anon
{
  "items": 44944,
  "bytes": 40726959425
}

And one about a minute after the heap release showing continued growth:

~$ ceph daemon mds.mds1 dump_mempools | jq .mempool.by_pool.buffer_anon
{
  "items": 50694,
  "bytes": 47343942094
}
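
For reference, the samples above were taken by hand; a rough sampling 
loop like this (assuming jq and access to the MDS admin socket on that 
host) makes it easier to correlate the growth with the workload:

while sleep 60; do
  echo -n "$(date -u +%FT%TZ) "
  ceph daemon mds.mds1 dump_mempools | jq -c '.mempool.by_pool.buffer_anon'
done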

This is on a single active MDS with 2 standbys. The workload scans
about a million files with about 20 parallel threads on two clients,
opening and reading each file if it exists.
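
The client side is roughly equivalent to this sketch (simplified;
candidate_files.txt is a placeholder for the real file list, and each
client runs against its own chunk of it). It reads every listed file
that still exists with 20 parallel readers, discarding the contents and
the errors for files that are gone:

~$ cat candidate_files.txt | xargs -P 20 -n 50 cat > /dev/null 2>/dev/null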
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: extract disk usage stats from running ceph cluster

2020-02-11 Thread Muhammad Ahmad
>>> And it seems smartctl on our Seagate ST4000NM0034 drives does not give
>>> us data on total bytes written or read

If it's a SAS device, it's not always obvious where to find this information.

You can use Seagate's openseachest toolset.

For any (SAS/SATA, HDD/SSD) device, the --deviceInfo will give you
some of the info you are looking for; e.g.

sudo ./openSeaChest_Info -d /dev/sg1 --deviceInfo | grep Total
Total Bytes Read (TB): 82.46
Total Bytes Written (TB): 311.56
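
If you want to grab this for every drive in one go, a quick loop along
these lines should do it (a rough sketch; it assumes the tool sits in
the current directory and that every /dev/sg* node is actually a disk,
so filter out enclosures/expanders as needed):

for dev in /dev/sg*; do
  echo "== ${dev} =="
  sudo ./openSeaChest_Info -d "${dev}" --deviceInfo | grep Total
done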



On Tue, Feb 11, 2020 at 3:10 AM lists  wrote:
>
> Hi Joe and Mehmet!
>
> Thanks for your responses!
>
> The requested outputs at the end of the message.
>
> But to make my question clearer:
>
> What we are actually after is not the CURRENT usage of our OSDs, but
> stats on the total GBs written in the cluster, per OSD, and the
> read/write ratio.
>
> With those numbers, we would be able to identify suitable replacement
> SSDs for our current HDDs, and select specifically for OUR typical use
> (taking into account endurance, speed, price, etc).
>
> And it seems smartctl on our Seagate ST4000NM0034 drives does not give
> us data on total bytes written or read. (...or are we simply not
> looking in the right place?)
>
> Requested outputs below:
>
> > root@node1:~# ceph osd df tree
> > ID CLASS WEIGHT   REWEIGHT SIZEUSE AVAIL   %USE  VAR  PGS TYPE NAME
> > -1   87.35376- 87.3TiB 49.1TiB 38.2TiB 56.22 1.00   - root default
> > -2   29.11688- 29.1TiB 16.4TiB 12.7TiB 56.23 1.00   - host node1
> >  0   hdd  3.64000  1.0 3.64TiB 2.01TiB 1.62TiB 55.34 0.98 137 osd.0
> >  1   hdd  3.64000  1.0 3.64TiB 2.09TiB 1.54TiB 57.56 1.02 141 osd.1
> >  2   hdd  3.63689  1.0 3.64TiB 1.92TiB 1.72TiB 52.79 0.94 128 osd.2
> >  3   hdd  3.64000  1.0 3.64TiB 2.07TiB 1.57TiB 56.90 1.01 143 osd.3
> > 12   hdd  3.64000  1.0 3.64TiB 2.15TiB 1.48TiB 59.18 1.05 138 osd.12
> > 13   hdd  3.64000  1.0 3.64TiB 1.99TiB 1.64TiB 54.80 0.97 131 osd.13
> > 14   hdd  3.64000  1.0 3.64TiB 1.93TiB 1.70TiB 53.13 0.94 127 osd.14
> > 15   hdd  3.64000  1.0 3.64TiB 2.19TiB 1.45TiB 60.10 1.07 143 osd.15
> > -3   29.12000- 29.1TiB 16.4TiB 12.7TiB 56.22 1.00   - host node2
> >  4   hdd  3.64000  1.0 3.64TiB 2.11TiB 1.53TiB 57.97 1.03 142 osd.4
> >  5   hdd  3.64000  1.0 3.64TiB 1.97TiB 1.67TiB 54.11 0.96 134 osd.5
> >  6   hdd  3.64000  1.0 3.64TiB 2.12TiB 1.51TiB 58.40 1.04 142 osd.6
> >  7   hdd  3.64000  1.0 3.64TiB 1.97TiB 1.66TiB 54.28 0.97 128 osd.7
> > 16   hdd  3.64000  1.0 3.64TiB 2.00TiB 1.64TiB 54.90 0.98 133 osd.16
> > 17   hdd  3.64000  1.0 3.64TiB 2.33TiB 1.30TiB 64.14 1.14 153 osd.17
> > 18   hdd  3.64000  1.0 3.64TiB 1.97TiB 1.67TiB 54.07 0.96 132 osd.18
> > 19   hdd  3.64000  1.0 3.64TiB 1.89TiB 1.75TiB 51.93 0.92 124 osd.19
> > -4   29.11688- 29.1TiB 16.4TiB 12.7TiB 56.22 1.00   - host node3
> >  8   hdd  3.64000  1.0 3.64TiB 1.79TiB 1.85TiB 49.24 0.88 123 osd.8
> >  9   hdd  3.64000  1.0 3.64TiB 2.17TiB 1.47TiB 59.72 1.06 144 osd.9
> > 10   hdd  3.64000  1.0 3.64TiB 2.40TiB 1.24TiB 65.88 1.17 157 osd.10
> > 11   hdd  3.64000  1.0 3.64TiB 2.06TiB 1.58TiB 56.64 1.01 133 osd.11
> > 20   hdd  3.64000  1.0 3.64TiB 2.19TiB 1.45TiB 60.23 1.07 148 osd.20
> > 21   hdd  3.64000  1.0 3.64TiB 1.74TiB 1.90TiB 47.80 0.85 115 osd.21
> > 22   hdd  3.64000  1.0 3.64TiB 2.05TiB 1.59TiB 56.27 1.00 138 osd.22
> > 23   hdd  3.63689  1.0 3.64TiB 1.96TiB 1.67TiB 54.01 0.96 130 osd.23
> >  TOTAL 87.3TiB 49.1TiB 38.2TiB 56.22
> > MIN/MAX VAR: 0.85/1.17  STDDEV: 4.08
> > root@node1:~# ceph osd status
> > ++--+---+---++-++-+---+
> > | id | host |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
> > ++--+---+---++-++-+---+
> > | 0  | node1  | 2061G | 1663G |   38   |  5168k  |3   |  1491k  | exists,up |
> > | 1  | node1  | 2143G | 1580G |4   |  1092k  |9   |  2243k  | exists,up |
> > | 2  | node1  | 1965G | 1758G |   20   |  3643k  |5   |  1758k  | exists,up |
> > | 3  | node1  | 2119G | 1605G |   17   |  99.5k  |4   |  3904k  | exists,up |
> > | 4  | node2  | 2158G | 1565G |   12   |   527k  |1   |  2632k  | exists,up |
> > | 5  | node2  | 2014G | 1709G |   15   |   239k  |0   |   889k  | exists,up |
> > | 6  | node2  | 2174G | 1549G |   11   |  1677k  |5   |  1931k  | exists,up |
> > | 7  | node2  | 2021G | 1702G |2   |   597k  |0   |  1638k  | exists,up |
> > | 8  | node3  | 

[ceph-users] Re: Bluestore cache parameter precedence

2020-02-11 Thread borepstein
Igor,

You are exactly right - it is my fault, I have failed to read the code 
correctly.

Cheers,

Boris.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Fwd: PrimaryLogPG.cc: 11550: FAILED ceph_assert(head_obc)

2020-02-11 Thread Jake Grimmett
Quick Update in case anyone reads my previous post.

No ideas were forthcoming on how to fix the assert that was flapping the
OSD (caused by deleting unfound objects).

The affected pg was readable, so we decided to recycle the OSD...

destroy the flapping primary OSD
# ceph osd destroy 443 --force

purge the lvm entry for this disk
# lvremove /dev/ceph-64b0010b-e397-49c2-ab01-6e43e6e5b41a/osd-block-fb824e45-d35f-486c-a4ca-05e5937eceae

zap the disk, it's the only way to be sure...
# ceph-volume lvm zap  /dev/sdab

reuse the drive & OSD number
# ceph-volume lvm prepare --osd-id 443 --data /dev/sdab

activate the OSD
# ceph-volume lvm activate 443 6e252371-d158-4d16-ac31-fed8f7d0cb1f

Now watching to see if the cluster recovers...
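
(For the record, I'm keeping an eye on it roughly like this; the pg
state name below is what our Nautilus cluster reports, so adjust for
your release:)

watch the overall cluster state
# watch -n 10 ceph -s

list pgs that still have unfound objects, and pgs mapped to the rebuilt OSD
# ceph pg ls recovery_unfound
# ceph pg ls-by-osd 443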

best,

Jake

On 2/10/20 3:31 PM, Jake Grimmett wrote:
> Dear All,
> 
> Following a clunky* cluster restart, we had
> 
> 23 "objects unfound"
> 14 pg recovery_unfound
> 
> We could see no way to recover the unfound objects, we decided to mark
> the objects in one pg unfound...
> 
> [root@ceph1 bad_oid]# ceph pg 5.f2f mark_unfound_lost delete
> pg has 2 objects unfound and apparently lost marking
> 
> Unfortunately, this immediately crashed the primary OSD for this PG:
> 
> OSD log showing the osd crashing 3 times here: 
> 
> the assert was :>
> 
> 2020-02-10 13:38:45.003 7fa713ef3700 -1
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.6/rpm/el7/BUILD/ceph-14.2.6/src/osd/PrimaryLogPG.cc:
> In function 'int PrimaryLogPG::recover_missing(const hobject_t&,
> eversion_t, int, PGBackend::RecoveryHandle*)' thread 7fa713ef3700 time
> 2020-02-10 13:38:45.000875
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.6/rpm/el7/BUILD/ceph-14.2.6/src/osd/PrimaryLogPG.cc:
> 11550: FAILED ceph_assert(head_obc)
> 
> 
> Questions..
> 
> 1) Is it possible to recover the flapping OSD? or should we fail out the
> flapping OSD and hope the cluster recovers?
> 
> 2) We have 13 other pg with unfound objects. Do we need to mark_unfound
> these one at a time, and then fail out their primary OSD? (allowing the
> cluster to recover before mark_unfound the next pg & failing it's
> primary OSD)
> 
> 
> 
> * thread describing the bad restart :>
> 
> 
> many thanks!
> 
> Jake
> 


-- 
Dr Jake Grimmett
MRC Laboratory of Molecular Biology
Francis Crick Avenue,
Cambridge CB2 0QH, UK.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs slow, howto investigate and tune mds configuration?

2020-02-11 Thread Samy Ascha



> On 11 Feb 2020, at 14:53, Marc Roos  wrote:
> 
> 
> Say I think my cephfs is slow when I rsync to it, slower than it used to 
> be. First of all, I do not get why it reads so much data. I assume the 
> file attributes need to come from the mds server, so the rsync backup 
> should mostly cause writes, not reads?
> 
> I think it started being slow, after enabling snapshots on the file 
> system.
> 
> - how can I determine if mds_cache_memory_limit = 80 is still 
> correct?
> 
> - how can I test the mds performance from the command line, so I can 
> experiment with cpu power configurations, and see if this brings a 
> significant change?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

Hi,

Incidentally, I was checking this in my CephFS cluster too, and I have used 
this to monitor cache usage:

# while sleep 1; do ceph daemon mds.your-mds perf dump | jq '.mds_mem.rss'; 
ceph daemon mds.your-mds dump_mempools | jq -c '.mempool.by_pool.mds_co'; done

You will need `jq` for this example, or you can filter the JSON however you 
prefer.

This will show you, once per second, the MDS's resident memory size and the 
cache mempool usage in items and bytes.

I found this somewhere on the net. Basically, you can use those JSON stats 
to check whether your cache is actually being used and whether it is big enough.

I'm no expert, and also still learning how to best monitor my CephFS 
performance. This did give me some insight, though.

It's not a lot I have to offer, but since I got help on the list recently, I 
thought I might as well share what small bits I can ;)

Samy
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephfs slow, howto investigate and tune mds configuration?

2020-02-11 Thread Marc Roos


Say I think my cephfs is slow when I rsync to it, slower than it used to 
be. First of all, I do not get why it reads so much data. I assume the 
file attributes need to come from the mds server, so the rsync backup 
should mostly cause writes, not reads?

I think it started being slow, after enabling snapshots on the file 
system.

- how can I determine if mds_cache_memory_limit = 80 is still 
correct?

- how can I test the mds performance from the command line, so I can 
experiment with cpu power configurations, and see if this brings a 
significant change?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ERROR: osd init failed: (1) Operation not permitted

2020-02-11 Thread Alwin Antreich
Hi Mario,

On Mon, Feb 10, 2020 at 07:50:15PM +0100, Ml Ml wrote:
> Hello List,
> 
> first of all: yes, I made mistakes. Now I am trying to recover :-/
> 
> I had a healthy 3-node cluster which I wanted to convert to a single one.
> My goal was to reinstall a fresh 3-node cluster and start with 2 nodes.
> 
> I was able to turn it from a 3-node cluster into a healthy 2-node cluster.
> Then the problems began.
> 
> I started to change size=1 and min_size=1. (I know, I know, I will
> never ever do that again!)
> Health was okay until here. Then all of a sudden both nodes got
> fenced... one node refused to boot, mons were missing, etc. To make a
> long story short, here is where I am right now:
First off, you better have a backup. ;)

> 
> 
> root@node03:~ # ceph -s
> cluster b3be313f-d0ef-42d5-80c8-6b41380a47e3
>  health HEALTH_WARN
> 53 pgs stale
> 53 pgs stuck stale
>  monmap e4: 2 mons at {0=10.15.15.3:6789/0,1=10.15.15.2:6789/0}
> election epoch 298, quorum 0,1 1,0
>  osdmap e6097: 14 osds: 9 up, 9 in
>   pgmap v93644673: 512 pgs, 1 pools, 1193 GB data, 304 kobjects
> 1088 GB used, 32277 GB / 33366 GB avail
>  459 active+clean
>   53 stale+active+clean
> 
> root@node03:~ # ceph osd tree
> ID WEIGHT   TYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 32.56990 root default
> -2 25.35992 host node03
>  0  3.57999 osd.0up  1.0  1.0
>  5  3.62999 osd.5up  1.0  1.0
>  6  3.62999 osd.6up  1.0  1.0
>  7  3.62999 osd.7up  1.0  1.0
>  8  3.62999 osd.8up  1.0  1.0
> 19  3.62999 osd.19   up  1.0  1.0
> 20  3.62999 osd.20   up  1.0  1.0
> -3  7.20998 host node02
>  3  3.62999 osd.3up  1.0  1.0
>  4  3.57999 osd.4up  1.0  1.0
>  10 osd.1  down0  1.0
>  90 osd.9  down0  1.0
> 100 osd.10 down0  1.0
> 170 osd.17 down0  1.0
> 180 osd.18 down0  1.0
> 
> 
> 
> my main mistakes seemed to be:
> 
> ceph osd out osd.1
> ceph auth del osd.1
> systemctl stop ceph-osd@1
> ceph osd rm 1
> umount /var/lib/ceph/osd/ceph-1
> ceph osd crush remove osd.1
> 
> As far as I can tell, Ceph waits for and needs data from that osd.1
> (which I removed)
> 
> 
> 
> root@node03:~ # ceph health detail
> HEALTH_WARN 53 pgs stale; 53 pgs stuck stale
> pg 0.1a6 is stuck stale for 5086.552795, current state
> stale+active+clean, last acting [1]
> pg 0.142 is stuck stale for 5086.552784, current state
> stale+active+clean, last acting [1]
> pg 0.1e is stuck stale for 5086.552820, current state
> stale+active+clean, last acting [1]
> pg 0.e0 is stuck stale for 5086.552855, current state
> stale+active+clean, last acting [1]
> pg 0.1d is stuck stale for 5086.552822, current state
> stale+active+clean, last acting [1]
> pg 0.13c is stuck stale for 5086.552791, current state
> stale+active+clean, last acting [1]
> [...] SNIP [...]
> pg 0.e9 is stuck stale for 5086.552955, current state
> stale+active+clean, last acting [1]
> pg 0.87 is stuck stale for 5086.552939, current state
> stale+active+clean, last acting [1]
> 
> 
> When i try to start ODS.1 manually, i get:
> 
> 2020-02-10 18:48:26.107444 7f9ce31dd880  0 ceph version 0.94.10
> (b1e0532418e4631af01acbc0cedd426f1905f4af), process ceph-osd, pid
> 10210
> 2020-02-10 18:48:26.134417 7f9ce31dd880  0
> filestore(/var/lib/ceph/osd/ceph-1) backend xfs (magic 0x58465342)
> 2020-02-10 18:48:26.184202 7f9ce31dd880  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
> FIEMAP ioctl is supported and appears to work
> 2020-02-10 18:48:26.184209 7f9ce31dd880  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
> FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2020-02-10 18:48:26.184526 7f9ce31dd880  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
> syncfs(2) syscall fully supported (by glibc and kernel)
> 2020-02-10 18:48:26.184585 7f9ce31dd880  0
> xfsfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_feature: extsize
> is disabled by conf
> 2020-02-10 18:48:26.309755 7f9ce31dd880  0
> filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal
> mode: checkpoint is not enabled
> 2020-02-10 18:48:26.633926 7f9ce31dd880  1 journal _open
> /var/lib/ceph/osd/ceph-1/journal fd 20: 5367660544 bytes, block size
> 4096 bytes, directio = 1, aio = 1
> 2020-02-10 18:48:26.642185 7f9ce31dd880  1 journal _open
> /var/lib/ceph/osd/ceph-1/journal fd 20: 5367660544 bytes, block size
> 4096 bytes, 

[ceph-users] Re: extract disk usage stats from running ceph cluster

2020-02-11 Thread lists

Hi Joe and Mehmet!

Thanks for your responses!

The requested outputs at the end of the message.

But to make my question clearer:

What we are actually after is not the CURRENT usage of our OSDs, but 
stats on the total GBs written in the cluster, per OSD, and the 
read/write ratio.

With those numbers, we would be able to identify suitable replacement 
SSDs for our current HDDs, and select specifically for OUR typical use 
(taking into account endurance, speed, price, etc).

And it seems smartctl on our Seagate ST4000NM0034 drives does not give 
us data on total bytes written or read. (...or are we simply not 
looking in the right place?)
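
A fallback we may look at: the per-OSD perf counters, which (as far as 
I understand) only count client bytes read/written since the OSD 
process last started, so they are not lifetime totals, and the exact 
counter names seem to vary per release. Something like:

root@node1:~# ceph daemon osd.0 perf dump | jq '.osd | {op_r, op_w, op_in_bytes, op_out_bytes}'

Sampling that over a long enough period would at least give us the 
read/write ratio per OSD, even if it cannot replace a lifetime-written 
figure from the drive itself.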


Requested outputs below:


root@node1:~# ceph osd df tree
ID CLASS WEIGHT   REWEIGHT SIZEUSE AVAIL   %USE  VAR  PGS TYPE NAME
-1   87.35376- 87.3TiB 49.1TiB 38.2TiB 56.22 1.00   - root default
-2   29.11688- 29.1TiB 16.4TiB 12.7TiB 56.23 1.00   - host node1
 0   hdd  3.64000  1.0 3.64TiB 2.01TiB 1.62TiB 55.34 0.98 137 osd.0
 1   hdd  3.64000  1.0 3.64TiB 2.09TiB 1.54TiB 57.56 1.02 141 osd.1
 2   hdd  3.63689  1.0 3.64TiB 1.92TiB 1.72TiB 52.79 0.94 128 osd.2
 3   hdd  3.64000  1.0 3.64TiB 2.07TiB 1.57TiB 56.90 1.01 143 osd.3
12   hdd  3.64000  1.0 3.64TiB 2.15TiB 1.48TiB 59.18 1.05 138 osd.12
13   hdd  3.64000  1.0 3.64TiB 1.99TiB 1.64TiB 54.80 0.97 131 osd.13
14   hdd  3.64000  1.0 3.64TiB 1.93TiB 1.70TiB 53.13 0.94 127 osd.14
15   hdd  3.64000  1.0 3.64TiB 2.19TiB 1.45TiB 60.10 1.07 143 osd.15
-3   29.12000- 29.1TiB 16.4TiB 12.7TiB 56.22 1.00   - host node2
 4   hdd  3.64000  1.0 3.64TiB 2.11TiB 1.53TiB 57.97 1.03 142 osd.4
 5   hdd  3.64000  1.0 3.64TiB 1.97TiB 1.67TiB 54.11 0.96 134 osd.5
 6   hdd  3.64000  1.0 3.64TiB 2.12TiB 1.51TiB 58.40 1.04 142 osd.6
 7   hdd  3.64000  1.0 3.64TiB 1.97TiB 1.66TiB 54.28 0.97 128 osd.7
16   hdd  3.64000  1.0 3.64TiB 2.00TiB 1.64TiB 54.90 0.98 133 osd.16
17   hdd  3.64000  1.0 3.64TiB 2.33TiB 1.30TiB 64.14 1.14 153 osd.17
18   hdd  3.64000  1.0 3.64TiB 1.97TiB 1.67TiB 54.07 0.96 132 osd.18
19   hdd  3.64000  1.0 3.64TiB 1.89TiB 1.75TiB 51.93 0.92 124 osd.19
-4   29.11688- 29.1TiB 16.4TiB 12.7TiB 56.22 1.00   - host node3
 8   hdd  3.64000  1.0 3.64TiB 1.79TiB 1.85TiB 49.24 0.88 123 osd.8
 9   hdd  3.64000  1.0 3.64TiB 2.17TiB 1.47TiB 59.72 1.06 144 osd.9
10   hdd  3.64000  1.0 3.64TiB 2.40TiB 1.24TiB 65.88 1.17 157 osd.10
11   hdd  3.64000  1.0 3.64TiB 2.06TiB 1.58TiB 56.64 1.01 133 osd.11
20   hdd  3.64000  1.0 3.64TiB 2.19TiB 1.45TiB 60.23 1.07 148 osd.20
21   hdd  3.64000  1.0 3.64TiB 1.74TiB 1.90TiB 47.80 0.85 115 osd.21
22   hdd  3.64000  1.0 3.64TiB 2.05TiB 1.59TiB 56.27 1.00 138 osd.22
23   hdd  3.63689  1.0 3.64TiB 1.96TiB 1.67TiB 54.01 0.96 130 osd.23
 TOTAL 87.3TiB 49.1TiB 38.2TiB 56.22
MIN/MAX VAR: 0.85/1.17  STDDEV: 4.08
root@node1:~# ceph osd status
++--+---+---++-++-+---+
| id | host |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
++--+---+---++-++-+---+
| 0  | node1  | 2061G | 1663G |   38   |  5168k  |3   |  1491k  | exists,up |
| 1  | node1  | 2143G | 1580G |4   |  1092k  |9   |  2243k  | exists,up |
| 2  | node1  | 1965G | 1758G |   20   |  3643k  |5   |  1758k  | exists,up |
| 3  | node1  | 2119G | 1605G |   17   |  99.5k  |4   |  3904k  | exists,up |
| 4  | node2  | 2158G | 1565G |   12   |   527k  |1   |  2632k  | exists,up |
| 5  | node2  | 2014G | 1709G |   15   |   239k  |0   |   889k  | exists,up |
| 6  | node2  | 2174G | 1549G |   11   |  1677k  |5   |  1931k  | exists,up |
| 7  | node2  | 2021G | 1702G |2   |   597k  |0   |  1638k  | exists,up |
| 8  | node3  | 1833G | 1890G |4   |   564k  |4   |  5595k  | exists,up |
| 9  | node3  | 2223G | 1500G |6   |  1124k  |   10   |  4864k  | exists,up |
| 10 | node3  | 2453G | 1270G |8   |  1257k  |3   |  1447k  | exists,up |
| 11 | node3  | 2109G | 1614G |   14   |  2889k  |3   |  1449k  | exists,up |
| 12 | node1  | 2204G | 1520G |   17   |  1596k  |4   |  1806k  | exists,up |
| 13 | node1  | 2040G | 1683G |   15   |  2526k  |0   |   819k  | exists,up |
| 14 | node1  | 1978G | 1745G |   11   |  1713k  |8   |  3489k  | exists,up |
| 15 | node1  | 2238G | 1485G |   25   |  5151k  |5   |  2715k  | exists,up |
| 16 | node2  | 2044G | 1679G |2   |  43.3k  |1   |  3371k  | exists,up |
| 17 | node2  | 2388G | 1335G |   14   |  1736k  |9   |  5315k  | exists,up |
| 18 | node2  | 2013G | 1710G |8   |  1907k  |2   |  2004k  | exists,up |
| 19 |