Re: [ceph-users] Understanding reshard issues

Martin Emrich Thu, 14 Dec 2017 02:01:47 -0800

Hi!

On 13.12.17 at 20:50, Graham Allan wrote:

After our Jewel to Luminous 12.2.2 upgrade, I ran into some of the same issues reported earlier on the list under "rgw resharding operation seemingly won't end".


Yes, those were/are my threads; I have this issue too.

I was able to correct the buckets using the "radosgw-admin bucket check --fix" command, and later disabled the auto resharding.

Were you able to manually reshard a bucket after the "--fix"? Here, once a bucket has been damaged, the manual reshard process freezes.

As an experiment, I selected an unsharded bucket to attempt a manual reshard. I added it to the reshard list, then ran "radosgw-admin reshard execute". The bucket in question contains 184000 objects and was being converted from 1 to 3 shards.
I'm trying to understand what I found...
1) the "radosgw-admin reshard execute" never returned. Somehow I expected it to kick off a background operation, but possibly this was mistaken.

Yes, same behaviour here. Someone on the list mentioned that resharding should actually happen quite fast (at most a few minutes).

So there's clearly something wrong here, and I am glad I am not the only one experiencing it.
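To put the numbers from the quoted experiment in perspective (184000 objects, going from 1 to 3 shards), here is a rough back-of-the-envelope sketch in Python. It assumes index entries spread evenly across shards, and it assumes the Luminous default of 100000 for rgw_max_objs_per_shard; neither figure comes from this thread, so check your own configuration. The helper names are made up for illustration. With versioning enabled, a bucket index holds more entries than the plain object count suggests, so treat the results as a lower bound.

```python
import math

# Figures taken from the quoted experiment.
NUM_OBJECTS = 184_000
OLD_SHARDS = 1
NEW_SHARDS = 3

# Assumption: default value of rgw_max_objs_per_shard (not stated in this thread).
MAX_OBJS_PER_SHARD = 100_000


def objects_per_shard(num_objects, shards):
    """Approximate index entries per shard, assuming an even spread."""
    return math.ceil(num_objects / shards)


def min_shards(num_objects, max_per_shard=MAX_OBJS_PER_SHARD):
    """Smallest shard count that keeps every shard at or under the limit."""
    return max(1, math.ceil(num_objects / max_per_shard))


print("per shard before:", objects_per_shard(NUM_OBJECTS, OLD_SHARDS))
print("per shard after: ", objects_per_shard(NUM_OBJECTS, NEW_SHARDS))
print("minimum shards:  ", min_shards(NUM_OBJECTS))
```

Under these assumptions, the unsharded bucket sits well above the per-shard limit, and 3 shards leaves comfortable headroom, so the target shard count itself looks reasonable.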


For comparison: what is your infrastructure? Mine is:

* three beefy hosts (64GB RAM) with 4 OSDs each for data (HDD), and 2 OSDs each on SSDs for the index.

* all bluestore (DB/WAL for the HDD OSDs also on SSD partitions)

* radosgw runs on each of these OSD hosts (since they are mostly idle, I don't think running the rados gateways on the OSD hosts is the cause of my poor performance)

* 3 separate monitor/mgr hosts
* OS is CentOS 7, running Ceph 12.2.2

* We use several buckets, all with versioning enabled, for many (100k to 12M) rather small objects.


pool settings:
# ceph osd pool ls detail

pool 1 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 174 lfor 0/172 flags hashpspool stripe_width 0
pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 842 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 3 'default.rgw.control' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 last_change 843 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 950 lfor 0/948 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.log' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 last_change 845 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 6 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 last_change 846 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 7 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 847 lfor 0/246 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 8 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 849 flags hashpspool stripe_width 0 application rgw
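If you want to compare your own pool settings against these, a small Python sketch like the following pulls the interesting fields (size, min_size, crush_rule, pg_num, pgp_num) out of "ceph osd pool ls detail" output. The regular expression is only an assumption based on the line layout shown above, and parse_pools is a made-up helper name; other Ceph releases may print the fields differently. The sample text is copied from two of the pools above.

```python
import re

# Assumed line layout, based on the listing above:
# pool <id> '<name>' <type> size N min_size N crush_rule N ... pg_num N pgp_num N ...
POOL_RE = re.compile(
    r"^pool (?P<id>\d+) '(?P<name>[^']+)' \w+ size (?P<size>\d+) "
    r"min_size (?P<min_size>\d+) crush_rule (?P<crush_rule>\d+) "
    r".*?pg_num (?P<pg_num>\d+) pgp_num (?P<pgp_num>\d+)"
)


def parse_pools(text):
    """Return {pool name: {field: int}} for every line that matches POOL_RE."""
    pools = {}
    for line in text.splitlines():
        match = POOL_RE.match(line.strip())
        if match:
            fields = match.groupdict()
            name = fields.pop("name")
            pools[name] = {key: int(value) for key, value in fields.items()}
    return pools


SAMPLE = """\
pool 6 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 last_change 846 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 7 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 847 lfor 0/246 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
"""

for name, info in parse_pools(SAMPLE).items():
    print(name, "crush_rule", info["crush_rule"], "pg_num", info["pg_num"])
```

To run it against a live cluster, feed it the text printed by the command above instead of SAMPLE.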


Regards,

Martin
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
