Hi Igor. 

It seems the numbers are identical: 

.rgw.buckets 19 15 TiB 78.22 4.3 TiB 8786934 

# cat /root/ceph-rgw.buckets-rados-ls-all |wc -l 
8786934 
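
For completeness, the file above is just a full object listing of the pool,
produced along these lines (a plain "rados ls" on ~8.8M objects takes a
while):

# rados ls -p .rgw.buckets > /root/ceph-rgw.buckets-rados-ls-all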

Cheers 

> From: "Igor Fedotov" <ifedo...@suse.de>
> To: "andrei" <and...@arhont.com>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>
> Sent: Wednesday, 3 July, 2019 13:49:02
> Subject: Re: [ceph-users] troubleshooting space usage

> Looks fine - comparing bluestore_allocated vs. bluestore_stored shows only a
> little difference. So that's not the allocation overhead.
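
For anyone following along, the two counters can be pulled side by side like
this (a sketch, assuming jq is available on the OSD host):

# ceph daemon osd.0 perf dump | jq '.bluestore | {bluestore_allocated, bluestore_stored}'

Near-equal values mean almost nothing is lost to allocation-unit rounding.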

> What about comparing the object counts reported by the ceph and radosgw tools?
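
That comparison can be scripted along these lines (a sketch, assuming jq and
that every bucket reports an "rgw.main" usage section):

# ceph df detail -f json | jq '.pools[] | select(.name == ".rgw.buckets") | .stats.objects'
# radosgw-admin bucket stats | jq '[.[].usage["rgw.main"].num_objects] | add'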

> Igor.

> On 7/3/2019 3:25 PM, Andrei Mikhailovsky wrote:

>> Thanks Igor. Here is a link to the ceph perf data on several OSDs:

>> [ https://paste.ee/p/IzDMy | https://paste.ee/p/IzDMy ]

>> In terms of the object sizes: we use rgw to back up data from various
>> workstations and servers, so the sizes would range from a few KB to a few
>> GB per individual file.

>> Cheers

>>> From: "Igor Fedotov" [ mailto:ifedo...@suse.de | <ifedo...@suse.de> ]
>>> To: "andrei" [ mailto:and...@arhont.com | <and...@arhont.com> ]
>>> Cc: "ceph-users" [ mailto:ceph-users@lists.ceph.com |
>>> <ceph-users@lists.ceph.com> ]
>>> Sent: Wednesday, 3 July, 2019 12:29:33
>>> Subject: Re: [ceph-users] troubleshooting space usage

>>> Hi Andrei,

>>> Additionally I'd like to see a performance counter dump for a couple of
>>> HDD OSDs (obtained through the 'ceph daemon osd.N perf dump' command).

>>> W.r.t. average object size - I was thinking that you might know what
>>> objects had been uploaded... If not, then you might want to estimate it
>>> with "rados get" on the pool: retrieve some random set of objects and
>>> check their sizes. But let's check the performance counters first - most
>>> probably they will show losses caused by allocation.
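
A sketch of that sampling - "rados stat" avoids downloading the object
bodies, which may be preferable to a full "rados get":

# rados ls -p .rgw.buckets > /tmp/objects
# shuf -n 100 /tmp/objects | while read -r o; do rados -p .rgw.buckets stat "$o"; done | awk '{ s += $NF; n++ } END { print s/n, "bytes on average" }'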

>>> Also I've just found a similar issue (still unresolved) in our internal
>>> tracker - but its root cause is definitely different from allocation
>>> overhead. It looks like some orphaned objects in the pool. Could you
>>> please compare and share the object counts in the pool as reported by
>>> "ceph (or rados) df detail" and by the radosgw tools?

>>> Thanks,

>>> Igor

>>> On 7/3/2019 12:56 PM, Andrei Mikhailovsky wrote:

>>>> Hi Igor,

>>>> Many thanks for your reply. Here are the details about the cluster:

>>>> 1. Ceph version - 13.2.5-1xenial (installed from the Ceph repository
>>>> for Ubuntu 16.04)

>>>> 2. Main devices for the radosgw pool - HDD. We do use a few SSDs for
>>>> the other pool, but it is not used by radosgw.

>>>> 3. We use BlueStore.

>>>> 4. Average rgw object size - I have no idea how to check that, and
>>>> couldn't find a simple answer on Google either. Could you please let me
>>>> know how to check it?

>>>> 5. Ceph osd df tree: see the output below.

>>>> 6. Other useful info on the cluster:

>>>> # ceph osd df tree
>>>> ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS TYPE NAME

>>>> -1 112.17979 - 113 TiB 90 TiB 23 TiB 79.25 1.00 - root uk
>>>> -5 112.17979 - 113 TiB 90 TiB 23 TiB 79.25 1.00 - datacenter ldex
>>>> -11 112.17979 - 113 TiB 90 TiB 23 TiB 79.25 1.00 - room ldex-dc3
>>>> -13 112.17979 - 113 TiB 90 TiB 23 TiB 79.25 1.00 - row row-a
>>>> -4 112.17979 - 113 TiB 90 TiB 23 TiB 79.25 1.00 - rack ldex-rack-a5
>>>> -2 28.04495 - 28 TiB 22 TiB 6.2 TiB 77.96 0.98 - host arh-ibstorage1-ib

>>>> 0 hdd 2.73000 0.79999 2.8 TiB 2.3 TiB 519 GiB 81.61 1.03 145 osd.0
>>>> 1 hdd 2.73000 1.00000 2.8 TiB 1.9 TiB 847 GiB 70.00 0.88 130 osd.1
>>>> 2 hdd 2.73000 1.00000 2.8 TiB 2.2 TiB 561 GiB 80.12 1.01 152 osd.2
>>>> 3 hdd 2.73000 1.00000 2.8 TiB 2.3 TiB 469 GiB 83.41 1.05 160 osd.3
>>>> 4 hdd 2.73000 1.00000 2.8 TiB 1.8 TiB 983 GiB 65.18 0.82 141 osd.4
>>>> 32 hdd 5.45999 1.00000 5.5 TiB 4.4 TiB 1.1 TiB 80.68 1.02 306 osd.32
>>>> 35 hdd 2.73000 1.00000 2.8 TiB 1.7 TiB 1.0 TiB 62.89 0.79 126 osd.35
>>>> 36 hdd 2.73000 1.00000 2.8 TiB 2.3 TiB 464 GiB 83.58 1.05 175 osd.36
>>>> 37 hdd 2.73000 0.89999 2.8 TiB 2.5 TiB 301 GiB 89.34 1.13 160 osd.37
>>>> 5 ssd 0.74500 1.00000 745 GiB 642 GiB 103 GiB 86.15 1.09 65 osd.5

>>>> -3 28.04495 - 28 TiB 24 TiB 4.5 TiB 84.03 1.06 - host arh-ibstorage2-ib
>>>> 9 hdd 2.73000 0.95000 2.8 TiB 2.4 TiB 405 GiB 85.65 1.08 158 osd.9
>>>> 10 hdd 2.73000 0.89999 2.8 TiB 2.4 TiB 352 GiB 87.52 1.10 169 osd.10
>>>> 11 hdd 2.73000 1.00000 2.8 TiB 2.0 TiB 783 GiB 72.28 0.91 160 osd.11
>>>> 12 hdd 2.73000 0.84999 2.8 TiB 2.4 TiB 359 GiB 87.27 1.10 153 osd.12
>>>> 13 hdd 2.73000 1.00000 2.8 TiB 2.4 TiB 348 GiB 87.69 1.11 169 osd.13
>>>> 14 hdd 2.73000 1.00000 2.8 TiB 2.5 TiB 283 GiB 89.97 1.14 170 osd.14
>>>> 15 hdd 2.73000 1.00000 2.8 TiB 2.2 TiB 560 GiB 80.18 1.01 155 osd.15
>>>> 16 hdd 2.73000 0.95000 2.8 TiB 2.4 TiB 332 GiB 88.26 1.11 178 osd.16
>>>> 26 hdd 5.45999 1.00000 5.5 TiB 4.4 TiB 1.0 TiB 81.04 1.02 324 osd.26
>>>> 7 ssd 0.74500 1.00000 745 GiB 607 GiB 138 GiB 81.48 1.03 62 osd.7

>>>> -15 28.04495 - 28 TiB 22 TiB 6.4 TiB 77.40 0.98 - host arh-ibstorage3-ib
>>>> 18 hdd 2.73000 0.95000 2.8 TiB 2.5 TiB 312 GiB 88.96 1.12 156 osd.18
>>>> 19 hdd 2.73000 1.00000 2.8 TiB 2.0 TiB 771 GiB 72.68 0.92 162 osd.19
>>>> 20 hdd 2.73000 1.00000 2.8 TiB 2.0 TiB 733 GiB 74.04 0.93 149 osd.20
>>>> 21 hdd 2.73000 1.00000 2.8 TiB 2.2 TiB 533 GiB 81.12 1.02 155 osd.21
>>>> 22 hdd 2.73000 1.00000 2.8 TiB 2.1 TiB 692 GiB 75.48 0.95 144 osd.22
>>>> 23 hdd 2.73000 1.00000 2.8 TiB 1.6 TiB 1.1 TiB 58.43 0.74 130 osd.23
>>>> 24 hdd 2.73000 1.00000 2.8 TiB 2.2 TiB 579 GiB 79.51 1.00 146 osd.24
>>>> 25 hdd 2.73000 1.00000 2.8 TiB 1.9 TiB 886 GiB 68.63 0.87 147 osd.25
>>>> 31 hdd 5.45999 1.00000 5.5 TiB 4.7 TiB 758 GiB 86.50 1.09 326 osd.31
>>>> 6 ssd 0.74500 0.89999 744 GiB 640 GiB 104 GiB 86.01 1.09 61 osd.6

>>>> -17 28.04494 - 28 TiB 22 TiB 6.3 TiB 77.61 0.98 - host arh-ibstorage4-ib
>>>> 8 hdd 2.73000 1.00000 2.8 TiB 1.9 TiB 909 GiB 67.80 0.86 141 osd.8
>>>> 17 hdd 2.73000 1.00000 2.8 TiB 1.9 TiB 904 GiB 67.99 0.86 144 osd.17
>>>> 27 hdd 2.73000 1.00000 2.8 TiB 2.1 TiB 654 GiB 76.84 0.97 152 osd.27
>>>> 28 hdd 2.73000 1.00000 2.8 TiB 2.3 TiB 481 GiB 82.98 1.05 153 osd.28
>>>> 29 hdd 2.73000 1.00000 2.8 TiB 1.9 TiB 829 GiB 70.65 0.89 137 osd.29
>>>> 30 hdd 2.73000 1.00000 2.8 TiB 2.0 TiB 762 GiB 73.03 0.92 142 osd.30
>>>> 33 hdd 2.73000 1.00000 2.8 TiB 2.3 TiB 501 GiB 82.25 1.04 166 osd.33
>>>> 34 hdd 5.45998 1.00000 5.5 TiB 4.5 TiB 968 GiB 82.77 1.04 325 osd.34
>>>> 39 hdd 2.73000 0.95000 2.8 TiB 2.4 TiB 402 GiB 85.77 1.08 162 osd.39
>>>> 38 ssd 0.74500 1.00000 745 GiB 671 GiB 74 GiB 90.02 1.14 68 osd.38
>>>> TOTAL 113 TiB 90 TiB 23 TiB 79.25
>>>> MIN/MAX VAR: 0.74/1.14 STDDEV: 8.14

>>>> # for i in $(radosgw-admin bucket list | jq -r '.[]'); do \
>>>>     radosgw-admin bucket stats --bucket=$i | jq '.usage | ."rgw.main" | .size_kb'; \
>>>>   done | awk '{ SUM += $1 } END { print SUM/1024/1024/1024 }'
>>>> 6.59098
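
The same total can also be had in a single pass, assuming jq and that all
data sits in the "rgw.main" usage category:

# radosgw-admin bucket stats | jq '[.[].usage["rgw.main"].size_kb] | add / 1024 / 1024 / 1024'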

>>>> # ceph df

>>>> GLOBAL:
>>>> SIZE AVAIL RAW USED %RAW USED
>>>> 113 TiB 23 TiB 90 TiB 79.25

>>>> POOLS:
>>>> NAME ID USED %USED MAX AVAIL OBJECTS
>>>> Primary-ubuntu-1 5 27 TiB 87.56 3.9 TiB 7302534
>>>> .users.uid 15 6.8 KiB 0 3.9 TiB 39
>>>> .users 16 335 B 0 3.9 TiB 20
>>>> .users.swift 17 14 B 0 3.9 TiB 1
>>>> .rgw.buckets 19 15 TiB 79.88 3.9 TiB 8787763
>>>> .users.email 22 0 B 0 3.9 TiB 0
>>>> .log 24 109 MiB 0 3.9 TiB 102301
>>>> .rgw.buckets.extra 37 0 B 0 2.6 TiB 0
>>>> .rgw.root 44 2.9 KiB 0 2.6 TiB 16
>>>> .rgw.meta 45 1.7 MiB 0 2.6 TiB 6249
>>>> .rgw.control 46 0 B 0 2.6 TiB 8
>>>> .rgw.gc 47 0 B 0 2.6 TiB 32
>>>> .usage 52 0 B 0 2.6 TiB 0
>>>> .intent-log 53 0 B 0 2.6 TiB 0
>>>> default.rgw.buckets.non-ec 54 0 B 0 2.6 TiB 0
>>>> .rgw.buckets.index 55 0 B 0 2.6 TiB 11485
>>>> .rgw 56 491 KiB 0 2.6 TiB 1686
>>>> Primary-ubuntu-1-ssd 57 1.2 TiB 92.39 105 GiB 379516

>>>> I am not too sure the issue relates to BlueStore overhead, as I would
>>>> probably have seen the discrepancy in my Primary-ubuntu-1 pool as well.
>>>> The data usage on the Primary-ubuntu-1 pool seems to be consistent with
>>>> my expectations (precise numbers to be verified soon). The issue seems
>>>> to be only with the .rgw.buckets pool, where the "ceph df" output shows
>>>> 15TB of usage while the sum of all buckets in that pool comes to just
>>>> over 6.5TB.

>>>> Cheers

>>>> Andrei

>>>>> From: "Igor Fedotov" [ mailto:ifedo...@suse.de | <ifedo...@suse.de> ]
>>>>> To: "andrei" [ mailto:and...@arhont.com | <and...@arhont.com> ] , 
>>>>> "ceph-users" [
>>>>> mailto:ceph-users@lists.ceph.com | <ceph-users@lists.ceph.com> ]
>>>>> Sent: Tuesday, 2 July, 2019 10:58:54
>>>>> Subject: Re: [ceph-users] troubleshooting space usage

>>>>> Hi Andrei,

>>>>> The most obvious reason is space usage overhead caused by BlueStore
>>>>> allocation granularity: e.g. if bluestore_min_alloc_size is 64K and the
>>>>> average object size is 16K, one will waste 48K per object on average.
>>>>> This is rather speculation so far, as we lack key information about
>>>>> your cluster:

>>>>> - Ceph version

>>>>> - What are the main devices for OSD: hdd or ssd.

>>>>> - BlueStore or FileStore.

>>>>> - average RGW object size.
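
The effective granularity can be checked on a running OSD along these lines
(a sketch; note that min_alloc_size is baked in when the OSD is created, so
the running config may not reflect what older OSDs actually use on disk):

# ceph daemon osd.0 config get bluestore_min_alloc_size_hdd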

>>>>> You might also want to collect and share performance counter dumps
>>>>> (ceph daemon osd.N perf dump) and " " reports from a couple of your
>>>>> OSDs.

>>>>> Thanks,

>>>>> Igor

>>>>> On 7/2/2019 11:43 AM, Andrei Mikhailovsky wrote:

>>>>>> Bump!

>>>>>>> From: "Andrei Mikhailovsky" [ mailto:and...@arhont.com | 
>>>>>>> <and...@arhont.com> ]
>>>>>>> To: "ceph-users" [ mailto:ceph-users@lists.ceph.com |
>>>>>>> <ceph-users@lists.ceph.com> ]
>>>>>>> Sent: Friday, 28 June, 2019 14:54:53
>>>>>>> Subject: [ceph-users] troubleshooting space usage

>>>>>>> Hi

>>>>>>> Could someone please explain / show how to troubleshoot space usage
>>>>>>> in Ceph and how to reclaim unused space?

>>>>>>> I have a small cluster with 40 OSDs, replica of 2, mainly used as a
>>>>>>> backend for CloudStack as well as the S3 gateway. The used space
>>>>>>> doesn't make any sense to me, especially the rgw pool, so I am
>>>>>>> seeking help.

>>>>>>> Here is what I found from the client:

>>>>>>> ceph -s shows:

>>>>>>> usage: 89 TiB used, 24 TiB / 113 TiB avail

>>>>>>> ceph df shows:

>>>>>>> Primary-ubuntu-1 5 27 TiB 90.11 3.0 TiB 7201098
>>>>>>> Primary-ubuntu-1-ssd 57 1.2 TiB 89.62 143 GiB 359260
>>>>>>> .rgw.buckets 19 15 TiB 83.73 3.0 TiB 8742222

>>>>>>> The usage of the Primary-ubuntu-1 and Primary-ubuntu-1-ssd pools is
>>>>>>> in line with my expectations. However, the .rgw.buckets pool seems to
>>>>>>> be using way too much. The usage of all rgw buckets comes to 6.5TB
>>>>>>> (looking at the size_kb values from "radosgw-admin bucket stats"). I
>>>>>>> am trying to figure out why .rgw.buckets is using 15TB of space
>>>>>>> instead of the 6.5TB shown by the bucket usage.

>>>>>>> Thanks

>>>>>>> Andrei

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
