Thanks for trying to help, Igor.

> From: "Igor Fedotov" <ifedo...@suse.de>
> To: "Andrei Mikhailovsky" <and...@arhont.com>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>
> Sent: Thursday, 4 July, 2019 12:52:16
> Subject: Re: [ceph-users] troubleshooting space usage
> Yep, this looks fine... Hmm... sorry, but I'm out of ideas as to what's
> happening. Anyway, I think the ceph reports are more trustworthy than the
> rgw ones. Looks like some issue with rgw reporting, or maybe some object
> leakage.
>
> Regards,
> Igor
>
> On 7/3/2019 6:34 PM, Andrei Mikhailovsky wrote:
>> Hi Igor,
>>
>> The numbers are identical, it seems:
>>
>> .rgw.buckets 19 15 TiB 78.22 4.3 TiB 8786934
>>
>> # cat /root/ceph-rgw.buckets-rados-ls-all | wc -l
>> 8786934
>>
>> Cheers
>>
>>> From: "Igor Fedotov" <ifedo...@suse.de>
>>> To: "andrei" <and...@arhont.com>
>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>
>>> Sent: Wednesday, 3 July, 2019 13:49:02
>>> Subject: Re: [ceph-users] troubleshooting space usage
>>>
>>> Looks fine - comparing bluestore_allocated vs. bluestore_stored shows
>>> little difference, so that's not the allocation overhead. What about
>>> comparing the object counts reported by the ceph and radosgw tools?
>>>
>>> Igor.
>>>
>>> On 7/3/2019 3:25 PM, Andrei Mikhailovsky wrote:
>>>> Thanks Igor. Here is a link to the ceph perf data on several OSDs:
>>>> https://paste.ee/p/IzDMy
>>>>
>>>> In terms of the object sizes: we use rgw to back up the data from
>>>> various workstations and servers, so the sizes range from a few KB to
>>>> a few GB per individual file.
>>>>
>>>> Cheers
>>>>
>>>>> From: "Igor Fedotov" <ifedo...@suse.de>
>>>>> To: "andrei" <and...@arhont.com>
>>>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>
>>>>> Sent: Wednesday, 3 July, 2019 12:29:33
>>>>> Subject: Re: [ceph-users] troubleshooting space usage
>>>>>
>>>>> Hi Andrei,
>>>>>
>>>>> Additionally, I'd like to see a performance counter dump for a couple
>>>>> of HDD OSDs (obtained with the 'ceph daemon osd.N perf dump' command).
>>>>>
>>>>> W.r.t. the average object size - I was thinking that you might know
>>>>> what objects had been uploaded... If not, you can estimate it by using
>>>>> "rados get" on the pool: retrieve some random set of objects and check
>>>>> their sizes. But let's check the performance counters first - most
>>>>> probably they will show losses caused by allocation.
>>>>>
>>>>> Also, I've just found a similar issue (still unresolved) in our
>>>>> internal tracker - but its root cause is definitely different from
>>>>> allocation overhead. It looks like some orphaned objects in the pool.
>>>>> Could you please compare and share the object counts for the pool as
>>>>> reported by "ceph (or rados) df detail" and by the radosgw tools?
>>>>>
>>>>> Thanks,
>>>>> Igor
>>>>>
>>>>> On 7/3/2019 12:56 PM, Andrei Mikhailovsky wrote:
>>>>>> Hi Igor,
>>>>>>
>>>>>> Many thanks for your reply. Here are the details about the cluster:
>>>>>>
>>>>>> 1. Ceph version - 13.2.5-1xenial (installed from the Ceph repository
>>>>>>    for Ubuntu 16.04).
>>>>>> 2. Main devices for the radosgw pool - HDD. We do use a few SSDs for
>>>>>>    the other pool, but it is not used by radosgw.
>>>>>> 3. We use BlueStore.
>>>>>> 4. Average rgw object size - I have no idea how to check that, and
>>>>>>    couldn't find a simple answer on Google either. Could you please
>>>>>>    let me know how to check that?
>>>>>> 5. ceph osd df tree - see below.
>>>>>> 6. Other useful info on the cluster:
>>>>>>
>>>>>> # ceph osd df tree
>>>>>> ID  CLASS WEIGHT    REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS TYPE NAME
>>>>>> -1        112.17979        - 113 TiB 90 TiB  23 TiB  79.25 1.00   - root uk
>>>>>> -5        112.17979        - 113 TiB 90 TiB  23 TiB  79.25 1.00   - datacenter ldex
>>>>>> -11       112.17979        - 113 TiB 90 TiB  23 TiB  79.25 1.00   - room ldex-dc3
>>>>>> -13       112.17979        - 113 TiB 90 TiB  23 TiB  79.25 1.00   - row row-a
>>>>>> -4        112.17979        - 113 TiB 90 TiB  23 TiB  79.25 1.00   - rack ldex-rack-a5
>>>>>> -2        28.04495         - 28 TiB  22 TiB  6.2 TiB 77.96 0.98   - host arh-ibstorage1-ib
>>>>>> 0   hdd   2.73000    0.79999 2.8 TiB 2.3 TiB 519 GiB 81.61 1.03 145 osd.0
>>>>>> 1   hdd   2.73000    1.00000 2.8 TiB 1.9 TiB 847 GiB 70.00 0.88 130 osd.1
>>>>>> 2   hdd   2.73000    1.00000 2.8 TiB 2.2 TiB 561 GiB 80.12 1.01 152 osd.2
>>>>>> 3   hdd   2.73000    1.00000 2.8 TiB 2.3 TiB 469 GiB 83.41 1.05 160 osd.3
>>>>>> 4   hdd   2.73000    1.00000 2.8 TiB 1.8 TiB 983 GiB 65.18 0.82 141 osd.4
>>>>>> 32  hdd   5.45999    1.00000 5.5 TiB 4.4 TiB 1.1 TiB 80.68 1.02 306 osd.32
>>>>>> 35  hdd   2.73000    1.00000 2.8 TiB 1.7 TiB 1.0 TiB 62.89 0.79 126 osd.35
>>>>>> 36  hdd   2.73000    1.00000 2.8 TiB 2.3 TiB 464 GiB 83.58 1.05 175 osd.36
>>>>>> 37  hdd   2.73000    0.89999 2.8 TiB 2.5 TiB 301 GiB 89.34 1.13 160 osd.37
>>>>>> 5   ssd   0.74500    1.00000 745 GiB 642 GiB 103 GiB 86.15 1.09  65 osd.5
>>>>>> -3        28.04495         - 28 TiB  24 TiB  4.5 TiB 84.03 1.06   - host arh-ibstorage2-ib
>>>>>> 9   hdd   2.73000    0.95000 2.8 TiB 2.4 TiB 405 GiB 85.65 1.08 158 osd.9
>>>>>> 10  hdd   2.73000    0.89999 2.8 TiB 2.4 TiB 352 GiB 87.52 1.10 169 osd.10
>>>>>> 11  hdd   2.73000    1.00000 2.8 TiB 2.0 TiB 783 GiB 72.28 0.91 160 osd.11
>>>>>> 12  hdd   2.73000    0.84999 2.8 TiB 2.4 TiB 359 GiB 87.27 1.10 153 osd.12
>>>>>> 13  hdd   2.73000    1.00000 2.8 TiB 2.4 TiB 348 GiB 87.69 1.11 169 osd.13
>>>>>> 14  hdd   2.73000    1.00000 2.8 TiB 2.5 TiB 283 GiB 89.97 1.14 170 osd.14
>>>>>> 15  hdd   2.73000    1.00000 2.8 TiB 2.2 TiB 560 GiB 80.18 1.01 155 osd.15
>>>>>> 16  hdd   2.73000    0.95000 2.8 TiB 2.4 TiB 332 GiB 88.26 1.11 178 osd.16
>>>>>> 26  hdd   5.45999    1.00000 5.5 TiB 4.4 TiB 1.0 TiB 81.04 1.02 324 osd.26
>>>>>> 7   ssd   0.74500    1.00000 745 GiB 607 GiB 138 GiB 81.48 1.03  62 osd.7
>>>>>> -15       28.04495         - 28 TiB  22 TiB  6.4 TiB 77.40 0.98   - host arh-ibstorage3-ib
>>>>>> 18  hdd   2.73000    0.95000 2.8 TiB 2.5 TiB 312 GiB 88.96 1.12 156 osd.18
>>>>>> 19  hdd   2.73000    1.00000 2.8 TiB 2.0 TiB 771 GiB 72.68 0.92 162 osd.19
>>>>>> 20  hdd   2.73000    1.00000 2.8 TiB 2.0 TiB 733 GiB 74.04 0.93 149 osd.20
>>>>>> 21  hdd   2.73000    1.00000 2.8 TiB 2.2 TiB 533 GiB 81.12 1.02 155 osd.21
>>>>>> 22  hdd   2.73000    1.00000 2.8 TiB 2.1 TiB 692 GiB 75.48 0.95 144 osd.22
>>>>>> 23  hdd   2.73000    1.00000 2.8 TiB 1.6 TiB 1.1 TiB 58.43 0.74 130 osd.23
>>>>>> 24  hdd   2.73000    1.00000 2.8 TiB 2.2 TiB 579 GiB 79.51 1.00 146 osd.24
>>>>>> 25  hdd   2.73000    1.00000 2.8 TiB 1.9 TiB 886 GiB 68.63 0.87 147 osd.25
>>>>>> 31  hdd   5.45999    1.00000 5.5 TiB 4.7 TiB 758 GiB 86.50 1.09 326 osd.31
>>>>>> 6   ssd   0.74500    0.89999 744 GiB 640 GiB 104 GiB 86.01 1.09  61 osd.6
>>>>>> -17       28.04494         - 28 TiB  22 TiB  6.3 TiB 77.61 0.98   - host arh-ibstorage4-ib
>>>>>> 8   hdd   2.73000    1.00000 2.8 TiB 1.9 TiB 909 GiB 67.80 0.86 141 osd.8
>>>>>> 17  hdd   2.73000    1.00000 2.8 TiB 1.9 TiB 904 GiB 67.99 0.86 144 osd.17
>>>>>> 27  hdd   2.73000    1.00000 2.8 TiB 2.1 TiB 654 GiB 76.84 0.97 152 osd.27
>>>>>> 28  hdd   2.73000    1.00000 2.8 TiB 2.3 TiB 481 GiB 82.98 1.05 153 osd.28
>>>>>> 29  hdd   2.73000    1.00000 2.8 TiB 1.9 TiB 829 GiB 70.65 0.89 137 osd.29
>>>>>> 30  hdd   2.73000    1.00000 2.8 TiB 2.0 TiB 762 GiB 73.03 0.92 142 osd.30
>>>>>> 33  hdd   2.73000    1.00000 2.8 TiB 2.3 TiB 501 GiB 82.25 1.04 166 osd.33
>>>>>> 34  hdd   5.45998    1.00000 5.5 TiB 4.5 TiB 968 GiB 82.77 1.04 325 osd.34
>>>>>> 39  hdd   2.73000    0.95000 2.8 TiB 2.4 TiB 402 GiB 85.77 1.08 162 osd.39
>>>>>> 38  ssd   0.74500    1.00000 745 GiB 671 GiB 74 GiB  90.02 1.14  68 osd.38
>>>>>>                      TOTAL   113 TiB 90 TiB  23 TiB  79.25
>>>>>> MIN/MAX VAR: 0.74/1.14  STDDEV: 8.14
>>>>>>
>>>>>> # for i in $(radosgw-admin bucket list | jq -r '.[]'); do
>>>>>>     radosgw-admin bucket stats --bucket=$i | jq '.usage | ."rgw.main" | .size_kb'
>>>>>>   done | awk '{ SUM += $1 } END { print SUM/1024/1024/1024 }'
>>>>>> 6.59098
>>>>>>
>>>>>> # ceph df
>>>>>> GLOBAL:
>>>>>>     SIZE      AVAIL    RAW USED   %RAW USED
>>>>>>     113 TiB   23 TiB   90 TiB     79.25
>>>>>> POOLS:
>>>>>>     NAME                        ID   USED      %USED   MAX AVAIL   OBJECTS
>>>>>>     Primary-ubuntu-1            5    27 TiB    87.56   3.9 TiB     7302534
>>>>>>     .users.uid                  15   6.8 KiB   0       3.9 TiB     39
>>>>>>     .users                      16   335 B     0       3.9 TiB     20
>>>>>>     .users.swift                17   14 B      0       3.9 TiB     1
>>>>>>     .rgw.buckets                19   15 TiB    79.88   3.9 TiB     8787763
>>>>>>     .users.email                22   0 B       0       3.9 TiB     0
>>>>>>     .log                        24   109 MiB   0       3.9 TiB     102301
>>>>>>     .rgw.buckets.extra          37   0 B       0       2.6 TiB     0
>>>>>>     .rgw.root                   44   2.9 KiB   0       2.6 TiB     16
>>>>>>     .rgw.meta                   45   1.7 MiB   0       2.6 TiB     6249
>>>>>>     .rgw.control                46   0 B       0       2.6 TiB     8
>>>>>>     .rgw.gc                     47   0 B       0       2.6 TiB     32
>>>>>>     .usage                      52   0 B       0       2.6 TiB     0
>>>>>>     .intent-log                 53   0 B       0       2.6 TiB     0
>>>>>>     default.rgw.buckets.non-ec  54   0 B       0       2.6 TiB     0
>>>>>>     .rgw.buckets.index          55   0 B       0       2.6 TiB     11485
>>>>>>     .rgw                        56   491 KiB   0       2.6 TiB     1686
>>>>>>     Primary-ubuntu-1-ssd        57   1.2 TiB   92.39   105 GiB     379516
>>>>>>
>>>>>> I am not too sure the issue relates to BlueStore overhead, as I would
>>>>>> probably have seen the discrepancy in my Primary-ubuntu-1 pool as
>>>>>> well. The data usage on the Primary-ubuntu-1 pool seems consistent
>>>>>> with my expectations (precise numbers to be verified soon). The issue
>>>>>> seems to be only with the .rgw.buckets pool, where the "ceph df"
>>>>>> output shows 15 TiB of usage while the sum of all buckets in that
>>>>>> pool comes to just over 6.5 TiB.
>>>>>>
>>>>>> Cheers
>>>>>> Andrei
>>>>>>
>>>>>>> From: "Igor Fedotov" <ifedo...@suse.de>
>>>>>>> To: "andrei" <and...@arhont.com>, "ceph-users" <ceph-users@lists.ceph.com>
>>>>>>> Sent: Tuesday, 2 July, 2019 10:58:54
>>>>>>> Subject: Re: [ceph-users] troubleshooting space usage
>>>>>>>
>>>>>>> Hi Andrei,
>>>>>>>
>>>>>>> The most obvious reason is space usage overhead caused by BlueStore
>>>>>>> allocation granularity. E.g. if bluestore_min_alloc_size is 64K and
>>>>>>> the average object size is 16K, one wastes 48K per object on average.
>>>>>>> This is rather speculation so far, as we lack the key information
>>>>>>> about your cluster:
>>>>>>> - Ceph version
>>>>>>> - the main devices for the OSDs: HDD or SSD
>>>>>>> - BlueStore or FileStore
>>>>>>> - average RGW object size
>>>>>>>
>>>>>>> You might also want to collect and share performance counter dumps
>>>>>>> (ceph daemon osd.N perf dump) and " " reports from a couple of your
>>>>>>> OSDs.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Igor
>>>>>>>
>>>>>>> On 7/2/2019 11:43 AM, Andrei Mikhailovsky wrote:
>>>>>>>> Bump!
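
[For readers following along: Igor's allocation-overhead check is easy to
script. A minimal sketch of it, assuming the Mimic-era counter names
(bluestore_allocated / bluestore_stored under the "bluestore" section of
the perf dump) and that it runs on the host that owns each OSD's admin
socket:]

    # Compare allocated vs stored bytes for a few OSDs; a large gap
    # points at allocation-granularity overhead. The OSD ids are
    # placeholders - substitute your own.
    for id in 0 1 2; do
        echo "osd.$id:"
        ceph daemon osd.$id perf dump | \
            jq '.bluestore | {allocated: .bluestore_allocated,
                              stored: .bluestore_stored,
                              lost_to_allocation: (.bluestore_allocated - .bluestore_stored)}'
    done

[If allocated is close to stored, as in the paste.ee dump shared above,
allocation granularity is not where the space is going.]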
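[The average-object-size estimate Igor asks for can likewise be done
without fetching any data: "rados stat" reports each object's size. A
sketch that samples from the pool listing already saved earlier in the
thread at /root/ceph-rgw.buckets-rados-ls-all; the sample size of 200 and
the 64K bluestore_min_alloc_size are assumptions to adjust:]

    # Sample 200 random object names and average their sizes; "rados stat"
    # ends its output line with "... size N", so the size is the last field.
    # Also estimate the per-object waste at a 64K allocation unit.
    # Object names containing unusual whitespace may need extra quoting.
    shuf -n 200 /root/ceph-rgw.buckets-rados-ls-all | while read -r obj; do
        rados -p .rgw.buckets stat "$obj"
    done | awk '{ n++; sum += $NF; alloc = 65536;
                  waste += int(($NF + alloc - 1) / alloc) * alloc - $NF }
                END { if (n) printf "objects: %d  avg size: %.0f B  avg waste@64K: %.0f B\n",
                                    n, sum/n, waste/n }'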
>>>>>>>>> From: "Andrei Mikhailovsky" <and...@arhont.com>
>>>>>>>>> To: "ceph-users" <ceph-users@lists.ceph.com>
>>>>>>>>> Sent: Friday, 28 June, 2019 14:54:53
>>>>>>>>> Subject: [ceph-users] troubleshooting space usage
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Could someone please explain / show how to troubleshoot space usage
>>>>>>>>> in Ceph and how to reclaim unused space?
>>>>>>>>>
>>>>>>>>> I have a small cluster with 40 OSDs, replica of 2, mainly used as a
>>>>>>>>> backend for CloudStack as well as the S3 gateway. The used space
>>>>>>>>> doesn't make any sense to me, especially for the rgw pool, so I am
>>>>>>>>> seeking help.
>>>>>>>>>
>>>>>>>>> Here is what I found from the client:
>>>>>>>>>
>>>>>>>>> ceph -s shows the usage: 89 TiB used, 24 TiB / 113 TiB avail
>>>>>>>>>
>>>>>>>>> ceph df shows:
>>>>>>>>> Primary-ubuntu-1       5    27 TiB    90.11   3.0 TiB   7201098
>>>>>>>>> Primary-ubuntu-1-ssd   57   1.2 TiB   89.62   143 GiB   359260
>>>>>>>>> .rgw.buckets           19   15 TiB    83.73   3.0 TiB   8742222
>>>>>>>>>
>>>>>>>>> The usage of Primary-ubuntu-1 and Primary-ubuntu-1-ssd is in line
>>>>>>>>> with my expectations. However, the .rgw.buckets pool seems to be
>>>>>>>>> using far too much. The usage of all rgw buckets adds up to 6.5 TiB
>>>>>>>>> (looking at the size_kb values from "radosgw-admin bucket stats").
>>>>>>>>> I am trying to figure out why .rgw.buckets is using 15 TiB of space
>>>>>>>>> instead of the 6.5 TiB shown by the bucket usage.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Andrei
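
[The thread leaves off at Igor's two suspects: rgw accounting and leaked
objects. Two follow-up checks available in radosgw-admin as of 13.2.x are
the garbage-collector backlog and the orphan scan; a sketch, with the job
id being an arbitrary placeholder:]

    # 1. Pending garbage collection can hold deleted-object space for a
    #    while; a long list here would explain part of the gap.
    radosgw-admin gc list --include-all | head
    radosgw-admin gc process        # force a GC pass

    # 2. Scan for rados objects not referenced by any bucket index.
    #    The scan is expensive on 8.7M objects and only reports
    #    candidates - it does not delete anything.
    radosgw-admin orphans find --pool=.rgw.buckets --job-id=orphan-hunt
    radosgw-admin orphans list-jobs
    radosgw-admin orphans finish --job-id=orphan-hunt   # clean up scan state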
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com