Looks fine - comparing bluestore_allocated vs. bluestore_stored shows little difference. So allocation overhead isn't the culprit.
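
For reference, the two counters can be pulled out of a saved dump with something like this (the file name is illustrative):

# compare allocated vs. stored bytes from a saved 'ceph daemon osd.N perf dump'
jq '.bluestore | {allocated: .bluestore_allocated,
                  stored: .bluestore_stored,
                  overhead: (.bluestore_allocated - .bluestore_stored)}' osd-N-perf-dump.json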

What about comparing the object counts reported by the ceph and radosgw tools?


Igor.


On 7/3/2019 3:25 PM, Andrei Mikhailovsky wrote:
Thanks Igor. Here is a link to the ceph perf data for several OSDs.

https://paste.ee/p/IzDMy

In terms of object sizes: we use rgw to back up data from various workstations and servers, so the sizes range from a few KB to a few GB per individual file.

Cheers



------------------------------------------------------------------------

    *From: *"Igor Fedotov" <ifedo...@suse.de>
    *To: *"andrei" <and...@arhont.com>
    *Cc: *"ceph-users" <ceph-users@lists.ceph.com>
    *Sent: *Wednesday, 3 July, 2019 12:29:33
    *Subject: *Re: [ceph-users] troubleshooting space usage

    Hi Andrei,

    Additionally I'd like to see performance counters dump for a
    couple of HDD OSDs (obtained through 'ceph daemon osd.N perf dump'
    command).

    W.r.t. average object size - I was thinking that you might know
    what objects had been uploaded... If not, you could estimate it
    by using the "rados get" command on the pool: retrieve some
    random set of objects and check their sizes. But let's check the
    performance counters first - most probably they will show losses
    caused by allocation.
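
    For instance, something like this gives a rough sample (pool name
    and sample size are illustrative; "rados stat" prints the size
    without downloading the data, unlike "rados get"):

    # list the pool, pick 20 random objects, print their sizes
    # (listing a pool with millions of objects can take a while)
    rados -p .rgw.buckets ls | shuf -n 20 | while read -r obj; do
        rados -p .rgw.buckets stat "$obj"
    done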


    Also, I've just found a similar issue (still unresolved) in our
    internal tracker - but its root cause is definitely different
    from allocation overhead; it looks like orphaned objects in the
    pool. Could you please compare and share the object counts for
    the pool reported by "ceph (or rados) df detail" and by the
    radosgw tools?
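
    For instance (pool name is illustrative; the jq path follows the
    "radosgw-admin bucket stats" JSON layout):

    # object count as RADOS sees it
    rados df | grep rgw.buckets        # or: ceph df detail

    # sum of per-bucket object counts as radosgw sees them
    for b in $(radosgw-admin bucket list | jq -r '.[]'); do
        radosgw-admin bucket stats --bucket="$b" | jq '.usage."rgw.main".num_objects'
    done | awk '{ sum += $1 } END { print sum }'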


    Thanks,

    Igor


    On 7/3/2019 12:56 PM, Andrei Mikhailovsky wrote:

        Hi Igor,

        Many thanks for your reply. Here are the details about the
        cluster:

        1. Ceph version - 13.2.5-1xenial (installed from the Ceph
        repository for Ubuntu 16.04)

        2. Main devices for the radosgw pool - HDD. We do use a few
        SSDs for the other pool, but it is not used by radosgw.

        3. we use BlueStore

        4. Average rgw object size - I have no idea how to check
        that, and couldn't find a simple answer on Google either.
        Could you please let me know how to check it?

        5. Ceph osd df tree (output below)

        6. Other useful info on the cluster (the bucket-usage sum and
        "ceph df" output, below)

        # ceph osd df tree
        ID  CLASS WEIGHT    REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS TYPE NAME
         -1       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   - root uk
         -5       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   -     datacenter ldex
        -11       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   -         room ldex-dc3
        -13       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   -             row row-a
         -4       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   -                 rack ldex-rack-a5
         -2        28.04495        -  28 TiB  22 TiB 6.2 TiB 77.96 0.98   -                     host arh-ibstorage1-ib
          0   hdd   2.73000  0.79999 2.8 TiB 2.3 TiB 519 GiB 81.61 1.03 145                         osd.0
          1   hdd   2.73000  1.00000 2.8 TiB 1.9 TiB 847 GiB 70.00 0.88 130                         osd.1
          2   hdd   2.73000  1.00000 2.8 TiB 2.2 TiB 561 GiB 80.12 1.01 152                         osd.2
          3   hdd   2.73000  1.00000 2.8 TiB 2.3 TiB 469 GiB 83.41 1.05 160                         osd.3
          4   hdd   2.73000  1.00000 2.8 TiB 1.8 TiB 983 GiB 65.18 0.82 141                         osd.4
         32   hdd   5.45999  1.00000 5.5 TiB 4.4 TiB 1.1 TiB 80.68 1.02 306                         osd.32
         35   hdd   2.73000  1.00000 2.8 TiB 1.7 TiB 1.0 TiB 62.89 0.79 126                         osd.35
         36   hdd   2.73000  1.00000 2.8 TiB 2.3 TiB 464 GiB 83.58 1.05 175                         osd.36
         37   hdd   2.73000  0.89999 2.8 TiB 2.5 TiB 301 GiB 89.34 1.13 160                         osd.37
          5   ssd   0.74500  1.00000 745 GiB 642 GiB 103 GiB 86.15 1.09  65                         osd.5
         -3        28.04495        -  28 TiB  24 TiB 4.5 TiB 84.03 1.06   -                     host arh-ibstorage2-ib
          9   hdd   2.73000  0.95000 2.8 TiB 2.4 TiB 405 GiB 85.65 1.08 158                         osd.9
         10   hdd   2.73000  0.89999 2.8 TiB 2.4 TiB 352 GiB 87.52 1.10 169                         osd.10
         11   hdd   2.73000  1.00000 2.8 TiB 2.0 TiB 783 GiB 72.28 0.91 160                         osd.11
         12   hdd   2.73000  0.84999 2.8 TiB 2.4 TiB 359 GiB 87.27 1.10 153                         osd.12
         13   hdd   2.73000  1.00000 2.8 TiB 2.4 TiB 348 GiB 87.69 1.11 169                         osd.13
         14   hdd   2.73000  1.00000 2.8 TiB 2.5 TiB 283 GiB 89.97 1.14 170                         osd.14
         15   hdd   2.73000  1.00000 2.8 TiB 2.2 TiB 560 GiB 80.18 1.01 155                         osd.15
         16   hdd   2.73000  0.95000 2.8 TiB 2.4 TiB 332 GiB 88.26 1.11 178                         osd.16
         26   hdd   5.45999  1.00000 5.5 TiB 4.4 TiB 1.0 TiB 81.04 1.02 324                         osd.26
          7   ssd   0.74500  1.00000 745 GiB 607 GiB 138 GiB 81.48 1.03  62                         osd.7
        -15        28.04495        -  28 TiB  22 TiB 6.4 TiB 77.40 0.98   -                     host arh-ibstorage3-ib
         18   hdd   2.73000  0.95000 2.8 TiB 2.5 TiB 312 GiB 88.96 1.12 156                         osd.18
         19   hdd   2.73000  1.00000 2.8 TiB 2.0 TiB 771 GiB 72.68 0.92 162                         osd.19
         20   hdd   2.73000  1.00000 2.8 TiB 2.0 TiB 733 GiB 74.04 0.93 149                         osd.20
         21   hdd   2.73000  1.00000 2.8 TiB 2.2 TiB 533 GiB 81.12 1.02 155                         osd.21
         22   hdd   2.73000  1.00000 2.8 TiB 2.1 TiB 692 GiB 75.48 0.95 144                         osd.22
         23   hdd   2.73000  1.00000 2.8 TiB 1.6 TiB 1.1 TiB 58.43 0.74 130                         osd.23
         24   hdd   2.73000  1.00000 2.8 TiB 2.2 TiB 579 GiB 79.51 1.00 146                         osd.24
         25   hdd   2.73000  1.00000 2.8 TiB 1.9 TiB 886 GiB 68.63 0.87 147                         osd.25
         31   hdd   5.45999  1.00000 5.5 TiB 4.7 TiB 758 GiB 86.50 1.09 326                         osd.31
          6   ssd   0.74500  0.89999 744 GiB 640 GiB 104 GiB 86.01 1.09  61                         osd.6
        -17        28.04494        -  28 TiB  22 TiB 6.3 TiB 77.61 0.98   -                     host arh-ibstorage4-ib
          8   hdd   2.73000  1.00000 2.8 TiB 1.9 TiB 909 GiB 67.80 0.86 141                         osd.8
         17   hdd   2.73000  1.00000 2.8 TiB 1.9 TiB 904 GiB 67.99 0.86 144                         osd.17
         27   hdd   2.73000  1.00000 2.8 TiB 2.1 TiB 654 GiB 76.84 0.97 152                         osd.27
         28   hdd   2.73000  1.00000 2.8 TiB 2.3 TiB 481 GiB 82.98 1.05 153                         osd.28
         29   hdd   2.73000  1.00000 2.8 TiB 1.9 TiB 829 GiB 70.65 0.89 137                         osd.29
         30   hdd   2.73000  1.00000 2.8 TiB 2.0 TiB 762 GiB 73.03 0.92 142                         osd.30
         33   hdd   2.73000  1.00000 2.8 TiB 2.3 TiB 501 GiB 82.25 1.04 166                         osd.33
         34   hdd   5.45998  1.00000 5.5 TiB 4.5 TiB 968 GiB 82.77 1.04 325                         osd.34
         39   hdd   2.73000  0.95000 2.8 TiB 2.4 TiB 402 GiB 85.77 1.08 162                         osd.39
         38   ssd   0.74500  1.00000 745 GiB 671 GiB  74 GiB 90.02 1.14  68                         osd.38
                               TOTAL 113 TiB  90 TiB  23 TiB 79.25
        MIN/MAX VAR: 0.74/1.14  STDDEV: 8.14



        # for i in $(radosgw-admin bucket list | jq -r '.[]'); do
        >     radosgw-admin bucket stats --bucket=$i | jq '.usage | ."rgw.main" | .size_kb'
        > done | awk '{ SUM += $1 } END { print SUM/1024/1024/1024 }'
        6.59098


        # ceph df

        GLOBAL:
            SIZE        AVAIL      RAW USED     %RAW USED
            113 TiB     23 TiB     90 TiB       79.25

        POOLS:
            NAME                           ID     USED        %USED     MAX AVAIL     OBJECTS
            Primary-ubuntu-1               5      27 TiB      87.56     3.9 TiB       7302534
            .users.uid                     15     6.8 KiB     0         3.9 TiB       39
            .users                         16     335 B       0         3.9 TiB       20
            .users.swift                   17     14 B        0         3.9 TiB       1
            .rgw.buckets                   19     15 TiB      79.88     3.9 TiB       8787763
            .users.email                   22     0 B         0         3.9 TiB       0
            .log                           24     109 MiB     0         3.9 TiB       102301
            .rgw.buckets.extra             37     0 B         0         2.6 TiB       0
            .rgw.root                      44     2.9 KiB     0         2.6 TiB       16
            .rgw.meta                      45     1.7 MiB     0         2.6 TiB       6249
            .rgw.control                   46     0 B         0         2.6 TiB       8
            .rgw.gc                        47     0 B         0         2.6 TiB       32
            .usage                         52     0 B         0         2.6 TiB       0
            .intent-log                    53     0 B         0         2.6 TiB       0
            default.rgw.buckets.non-ec     54     0 B         0         2.6 TiB       0
            .rgw.buckets.index             55     0 B         0         2.6 TiB       11485
            .rgw                           56     491 KiB     0         2.6 TiB       1686
            Primary-ubuntu-1-ssd           57     1.2 TiB     92.39     105 GiB       379516


        I am not sure the issue relates to BlueStore overhead, as I
        would probably have seen the same discrepancy in my
        Primary-ubuntu-1 pool as well. The data usage of the
        Primary-ubuntu-1 pool seems consistent with my expectations
        (precise numbers to be verified soon). The issue seems to
        affect only the .rgw.buckets pool, where the "ceph df" output
        shows 15TB of usage while the sum over all buckets in that
        pool comes to just over 6.5TB.

        Cheers

        Andrei


        ------------------------------------------------------------------------

            *From: *"Igor Fedotov" <ifedo...@suse.de>
            *To: *"andrei" <and...@arhont.com>, "ceph-users"
            <ceph-users@lists.ceph.com>
            *Sent: *Tuesday, 2 July, 2019 10:58:54
            *Subject: *Re: [ceph-users] troubleshooting space usage

            Hi Andrei,

            The most obvious reason is space usage overhead caused by
            BlueStore allocation granularity: e.g. if
            bluestore_min_alloc_size is 64K and the average object
            size is 16K, one wastes 48K per object on average (a quick
            check of this arithmetic is sketched after the list
            below). This is rather speculation so far, as we lack key
            information about your cluster:

            - Ceph version

            - What are the main devices for OSD: hdd or ssd.

            - BlueStore or FileStore.

            - average RGW object size.
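
            As a quick check of that arithmetic, a small shell sketch
            (sizes in KiB; the helper name is made up for
            illustration):

            # waste per object = round_up(size, alloc_unit) - size
            alloc_waste() {
                local size_kb=$1 alloc_kb=$2
                echo $(( (size_kb + alloc_kb - 1) / alloc_kb * alloc_kb - size_kb ))
            }
            alloc_waste 16 64    # -> 48 (KiB wasted, as in the example above)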

            You might also want to collect and share performance
            counter dumps (ceph daemon osd.N perf dump) and "…"
            reports from a couple of your OSDs.

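            For example (OSD ids are illustrative; run the command on
            the host where each OSD runs, since it talks to the local
            admin socket):

            # save perf counter dumps for a few OSDs
            for n in 0 1 2; do
                ceph daemon osd.$n perf dump > osd.$n-perf-dump.json
            done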

            Thanks,

            Igor


            On 7/2/2019 11:43 AM, Andrei Mikhailovsky wrote:

                Bump!


                
                ------------------------------------------------------------------------

                    *From: *"Andrei Mikhailovsky" <and...@arhont.com>
                    *To: *"ceph-users" <ceph-users@lists.ceph.com>
                    *Sent: *Friday, 28 June, 2019 14:54:53
                    *Subject: *[ceph-users] troubleshooting space usage

                    Hi

                    Could someone please explain / show how to
                    troubleshoot the space usage in Ceph and how to
                    reclaim the unused space?

                    I have a small cluster with 40 OSDs and a replica
                    count of 2, mainly used as a backend for our cloud
                    stack as well as the S3 gateway. The used space
                    doesn't make any sense to me, especially for the
                    rgw pool, so I am seeking help.

                    Here is what I found from the client:

                    ceph -s shows:

                     usage:   89 TiB used, 24 TiB / 113 TiB avail

                    ceph df shows:

                    Primary-ubuntu-1        5    27 TiB     90.11    3.0 TiB    7201098
                    Primary-ubuntu-1-ssd    57   1.2 TiB    89.62    143 GiB     359260
                    .rgw.buckets            19   15 TiB     83.73    3.0 TiB    8742222

                    The usage of Primary-ubuntu-1 and
                    Primary-ubuntu-1-ssd is in line with my
                    expectations. However, the .rgw.buckets pool seems
                    to be using far too much: summing the size_kb
                    values from "radosgw-admin bucket stats" across
                    all buckets gives about 6.5TB. I am trying to
                    figure out why .rgw.buckets is using 15TB of space
                    instead of the 6.5TB indicated by the bucket
                    usage.

                    Thanks

                    Andrei




_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
