In Luminous, a lot of the RGW metadata pools were consolidated by using
namespaces inside the pools. I would say that the GC pool was consolidated
into the log pool, based on the correlation you've found with the primary
OSDs. At least that mystery is solved as to why those 8 OSDs. I don't know
why their logs are being spammed with GC messages, though. Hopefully
someone else can shed some light on that. I've cc'd Yehuda, the primary
RGW dev, on this.
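
To double-check that, the GC entries should show up under their own
namespace in the log pool. A rough sketch (assuming the GC objects sit in
a "gc" namespace of default.rgw.log, which is what I'd expect on a fresh
Luminous deployment):

  # list every object in the log pool together with its namespace
  rados -p default.rgw.log ls --all
  # or restrict the listing to the gc namespace only
  rados -p default.rgw.log --namespace gc ls

With --all, each line should come out as "namespace<TAB>object", so any
gc.* objects are easy to spot.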

On Mon, Aug 20, 2018, 7:09 AM Jakub Jaszewski <jaszewski.ja...@gmail.com>
wrote:

> Hi David,
>
> Right, we use this cluster (v12.2.5, fresh installation) for RGW;
> however, I don't see a default.rgw.gc pool like we have on another
> cluster that was upgraded to Luminous (10.2.9 -> 10.2.10 -> 12.2.2). I
> believe the default.rgw.gc pool there dates from when RGW was set up on
> the Jewel version, when the pool was created automatically.
>
> On the impacted cluster we have the pools below:
> pool 1 '.rgw.root' replicated size 3 min_size 2 crush_rule 2 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 1499 flags hashpspool stripe_width
> 0 application rgw
> pool 2 'rbd' replicated size 3 min_size 2 crush_rule 2 object_hash
> rjenkins pg_num 2048 pgp_num 2048 last_change 2230 flags hashpspool
> stripe_width 0 application rbd
> pool 3 'default.rgw.control' replicated size 3 min_size 2 crush_rule 2
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 1501 flags hashpspool
> stripe_width 0 application rgw
> pool 4 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 2
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 1491 flags hashpspool
> stripe_width 0 application rgw
> pool 5 'default.rgw.log' replicated size 3 min_size 2 crush_rule 2
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 1486 flags hashpspool
> stripe_width 0 application rgw
> pool 7 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule
> 2 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1483 owner
> 18446744073709551615 flags hashpspool stripe_width 0 application rgw
> pool 8 'default.rgw.buckets.data' erasure size 12 min_size 10 crush_rule 3
> object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 2228 flags
> hashpspool max_bytes 879609302220800 stripe_width 36864 application rgw
> pool 9 'default.rgw.buckets.non-ec' replicated size 3 min_size 2
> crush_rule 2 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1506 owner
> 18446744073709551615 flags hashpspool stripe_width 0 application rgw
>
> The eight noisy OSDs match the primary OSDs of the PGs that make up the
> default.rgw.log pool.
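>
> For reference, this is roughly how the mapping can be cross-checked (a
> sketch; the pool name and pool id 5 are taken from the listing above):
>
>   # list the PGs of default.rgw.log with their acting sets; the first
>   # OSD in each acting set is the primary
>   ceph pg ls-by-pool default.rgw.log
>   # or map a single PG of pool 5
>   ceph pg map 5.0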
>
> Many thanks
> Jakub
>
>
> On Mon, Aug 20, 2018 at 11:54 AM David Turner <drakonst...@gmail.com>
> wrote:
>
>> I'm assuming you use RGW and that you have a GC pool for RGW. It might
>> also be that your GC pool only has 8 PGs.  Are any of those guesses
>> correct?
>>
>> On Mon, Aug 20, 2018, 5:13 AM Jakub Jaszewski <jaszewski.ja...@gmail.com>
>> wrote:
>>
>>> Issue tracker: http://tracker.ceph.com/issues/23801.
>>> I still don't know why only particular OSDs write this information to
>>> their log files.
>>>
>>> Jakub
>>>
>>> On Wed, Aug 8, 2018 at 12:02 PM Jakub Jaszewski <
>>> jaszewski.ja...@gmail.com> wrote:
>>>
>>>> Hi All, it's exactly the same story today: the same 8 OSDs and a lot of
>>>> garbage collection objects to process.
>>>>
>>>> Below is the number of "cls_rgw.cc:3284: gc_iterate_entries end_key="
>>>> entries per OSD log file (see the grep sketch after the list):
>>>> hostA:
>>>>   /var/log/ceph/ceph-osd.58.log
>>>>   1826467
>>>> hostB:
>>>>   /var/log/ceph/ceph-osd.88.log
>>>>   2924241
>>>> hostC:
>>>>   /var/log/ceph/ceph-osd.153.log
>>>>   581002
>>>>   /var/log/ceph/ceph-osd.164.log
>>>>   3278606
>>>> hostD:
>>>>   /var/log/ceph/ceph-osd.95.log
>>>>   1426963
>>>> hostE:
>>>>   /var/log/ceph/ceph-osd.4.log
>>>>   2716914
>>>>   /var/log/ceph/ceph-osd.53.log
>>>>   943749
>>>> hostF:
>>>>   /var/log/ceph/ceph-osd.172.log
>>>>   4085334
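>>>>
>>>> The counts above were produced with something along these lines, run on
>>>> each host (a sketch, assuming the log paths listed above):
>>>>
>>>>   grep -c 'gc_iterate_entries end_key=' /var/log/ceph/ceph-osd.58.log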
>>>>
>>>>
>>>> # radosgw-admin gc list --include-all|grep oid |wc -l
>>>> 302357
>>>> #
>>>>
>>>> Can anyone please explain what is going on?
>>>>
>>>> Thanks!
>>>> Jakub
>>>>
>>>> On Tue, Aug 7, 2018 at 3:03 PM Jakub Jaszewski <
>>>> jaszewski.ja...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> 8 out of 192 OSDs in our cluster (version 12.2.5) write plenty of
>>>>> records like "cls_rgw.cc:3284: gc_iterate_entries end_key=" to the
>>>>> corresponding log files, e.g.
>>>>>
>>>>> 2018-08-07 04:34:06.000585 7fdd8f012700  0 <cls>
>>>>> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
>>>>> end_key=1_01533616446.000580407
>>>>> 2018-08-07 04:34:06.001888 7fdd8f012700  0 <cls>
>>>>> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
>>>>> end_key=1_01533616446.001886318
>>>>> 2018-08-07 04:34:06.003395 7fdd8f012700  0 <cls>
>>>>> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
>>>>> end_key=1_01533616446.003390299
>>>>> 2018-08-07 04:34:06.005205 7fdd8f012700  0 <cls>
>>>>> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
>>>>> end_key=1_01533616446.005200341
>>>>>
>>>>> # grep '2018-08-07 04:34:06' /var/log/ceph/ceph-osd.4.log |wc -l
>>>>> 712
>>>>> #
>>>>>
>>>>> At the same time there were around 500,000 expired garbage collection
>>>>> objects.
>>>>>
>>>>> The OSD subsystem log level is set to the default 1/5 across all OSDs.
>>>>>
>>>>> I wonder why only a few OSDs record this information, and whether it is
>>>>> something that should be logged at level 1 or only at a higher debug
>>>>> level?
>>>>> https://github.com/ceph/ceph/blob/v12.2.5/src/cls/rgw/cls_rgw.cc#L3284
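>>>>>
>>>>> A quick way to see which debug levels are actually in effect on one of
>>>>> the affected OSDs (a sketch, run on that OSD's host, and assuming the
>>>>> cls messages are governed by the osd/objclass debug subsystems):
>>>>>
>>>>>   ceph daemon osd.4 config show | grep -E 'debug_osd|debug_objclass'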
>>>>>
>>>>> Thanks
>>>>> Jakub
>>>>>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
