Re: [ceph-users] Understanding reshard issues
On 12/14/2017 04:00 AM, Martin Emrich wrote:
> Hi!
>
> On 13.12.2017 at 20:50, Graham Allan wrote:
>> After our Jewel to Luminous 12.2.2 upgrade, I ran into some of the
>> same issues reported earlier on the list under "rgw resharding
>> operation seemingly won't end".
>
> Yes, those were/are my threads; I also have this issue. I was able to
> correct the buckets using the "radosgw-admin bucket check --fix"
> command, and later disabled the auto resharding.
>
> Were you able to manually reshard a bucket after the "--fix"? Here,
> after a bucket has been damaged once, the manual reshard process
> freezes.

Interesting... the test bucket I tried to reshard below was one that
had previously needed "bucket check --fix". I just tried the same thing
on another old (and small, ~100 object) bucket which had not previously
seen problems - I got the same hang.

Although, I was doing a "reshard add" and "reshard execute" on the
bucket, which I guess is more of a manually-triggered automatic
reshard, as opposed to a true manual "bucket reshard" command. Having
said that, the manual "bucket reshard" command also now freezes on that
bucket.

>> As an experiment, I selected an unsharded bucket to attempt a manual
>> reshard. I added it to the reshard list, then ran "radosgw-admin
>> reshard execute". The bucket in question contains 184000 objects and
>> was being converted from 1 to 3 shards. I'm trying to understand
>> what I found...
>>
>> 1) The "radosgw-admin reshard execute" never returned. Somehow I
>> expected it to kick off a background operation, but possibly this
>> was mistaken.
>
> Yes, same behaviour here. Someone on the list mentioned that
> resharding should actually happen quite fast (at most a few minutes).
> So there's clearly something wrong here, and I am glad I am not the
> only one experiencing it.
>
> To compare: What is your infrastructure? Mine is:
>
> * three beefy hosts (64GB RAM) with 4 OSDs each for data (HDD), and 2
>   OSDs each on SSDs for the index
> * all bluestore (DB/WAL for the HDD OSDs also on SSD partitions)
> * radosgw runs on each of these OSD hosts (as they are mostly idling,
>   I don't attribute my poor performance to running the rados gateways
>   on the OSD hosts)
> * 3 separate monitor/mgr hosts
> * OS is CentOS 7, running Ceph 12.2.2
> * We use several buckets, all with versioning enabled, for many (100k
>   to 12M) rather small objects

This cluster has been around for some time (since firefly), and is
running Ubuntu 14.04. I will be converting it to CentOS 7 over the next
few weeks or months. It's only used for object store, no rbd or cephfs.

* 3 dedicated mons
* 9 large osd nodes with ~60x 6TB osds each, plus a handful of SSDs
* 4 radosgw nodes (2 Ubuntu, 2 CentOS 7)

The radosgw main storage pools are ec42 on filestore spinning drives;
the indexes are on 3-way replicated filestore SSDs.

--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
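For anyone following the thread, the command sequence under discussion
looks roughly like this (the bucket name `testbucket` is a placeholder;
this is an operational sketch against a live cluster, so try it on a
throwaway bucket first):

```shell
# Queue a bucket for resharding (the "manually triggered automatic
# reshard" path discussed above).
radosgw-admin reshard add --bucket=testbucket --num-shards=3

# Inspect the queue and per-bucket state, including the
# new_bucket_instance_id once resharding has started.
radosgw-admin reshard list
radosgw-admin reshard status --bucket=testbucket

# Process the reshard queue; in this thread the command hangs
# instead of returning.
radosgw-admin reshard execute

# The "true" manual reshard path, bypassing the queue.
radosgw-admin bucket reshard --bucket=testbucket --num-shards=3

# Repair a bucket index left inconsistent by an interrupted reshard.
radosgw-admin bucket check --fix --bucket=testbucket
```

These subcommands exist in Luminous 12.2.x, though later releases
rename `reshard execute` to `reshard process`.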
Re: [ceph-users] Understanding reshard issues
Hi!

On 13.12.2017 at 20:50, Graham Allan wrote:
> After our Jewel to Luminous 12.2.2 upgrade, I ran into some of the
> same issues reported earlier on the list under "rgw resharding
> operation seemingly won't end".

Yes, those were/are my threads; I also have this issue. I was able to
correct the buckets using the "radosgw-admin bucket check --fix"
command, and later disabled the auto resharding.

Were you able to manually reshard a bucket after the "--fix"? Here,
after a bucket has been damaged once, the manual reshard process
freezes.

> As an experiment, I selected an unsharded bucket to attempt a manual
> reshard. I added it to the reshard list, then ran "radosgw-admin
> reshard execute". The bucket in question contains 184000 objects and
> was being converted from 1 to 3 shards. I'm trying to understand what
> I found...
>
> 1) The "radosgw-admin reshard execute" never returned. Somehow I
> expected it to kick off a background operation, but possibly this was
> mistaken.

Yes, same behaviour here. Someone on the list mentioned that resharding
should actually happen quite fast (at most a few minutes). So there's
clearly something wrong here, and I am glad I am not the only one
experiencing it.

To compare: What is your infrastructure? Mine is:

* three beefy hosts (64GB RAM) with 4 OSDs each for data (HDD), and 2
  OSDs each on SSDs for the index
* all bluestore (DB/WAL for the HDD OSDs also on SSD partitions)
* radosgw runs on each of these OSD hosts (as they are mostly idling, I
  don't attribute my poor performance to running the rados gateways on
  the OSD hosts)
* 3 separate monitor/mgr hosts
* OS is CentOS 7, running Ceph 12.2.2
* We use several buckets, all with versioning enabled, for many (100k
  to 12M) rather small objects
pool settings:

# ceph osd pool ls detail
pool 1 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 174 lfor 0/172 flags hashpspool stripe_width 0
pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 842 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 3 'default.rgw.control' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 last_change 843 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 950 lfor 0/948 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.log' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 last_change 845 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 6 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 last_change 846 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 7 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 847 lfor 0/246 owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 8 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 849 flags hashpspool stripe_width 0 application rgw

Regards,

Martin
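For reference, disabling the dynamic resharding mentioned above is a
single ceph.conf option on the radosgw hosts (the section name below is
a hypothetical instance name; adjust to your own rgw instances):

```ini
; ceph.conf on each radosgw host
[client.rgw.gateway1]   ; hypothetical rgw instance name
rgw_dynamic_resharding = false
```

The option defaults to true in Luminous; a restart of the radosgw
daemons is needed for the change to take effect.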
[ceph-users] Understanding reshard issues
After our Jewel to Luminous 12.2.2 upgrade, I ran into some of the same
issues reported earlier on the list under "rgw resharding operation
seemingly won't end". Some buckets were automatically added to the
reshard list, and something happened overnight such that they couldn't
be written to. A couple of our radosgw nodes hung due to inadequate
limits on file handles, so that might possibly have been a cause. I was
able to correct the buckets using the "radosgw-admin bucket check
--fix" command, and later disabled the auto resharding.

As an experiment, I selected an unsharded bucket to attempt a manual
reshard. I added it to the reshard list, then ran "radosgw-admin
reshard execute". The bucket in question contains 184000 objects and
was being converted from 1 to 3 shards. I'm trying to understand what I
found...

1) The "radosgw-admin reshard execute" never returned. Somehow I
expected it to kick off a background operation, but possibly this was
mistaken.

2) After 2 days it was still running. Is there any way to check
progress? Such as querying something about the
"new_bucket_instance_id" reported by "reshard status"?

3) When I tested uploading an object to the bucket I got an error - the
client reported response code "UnknownError" - while radosgw logged:

2017-12-13 10:56:44.486131 7f02b2985700 0 block_while_resharding ERROR: bucket is still resharding, please retry
2017-12-13 10:56:44.488657 7f02b2985700 0 NOTICE: resharding operation on bucket index detected, blocking

But the introduction to dynamic resharding says that "there is no need
to stop IO operations that go to the bucket (although some concurrent
operations may experience additional latency when resharding is in
progress)" - so I feel sure something must be wrong here.
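On the file-handle exhaustion that hung the radosgw nodes: on a
systemd-based distribution, one hedged way to guard against this is a
unit drop-in raising the open-file limit for the radosgw service (the
drop-in path and the limit value below are assumptions to adapt to your
deployment and index sizes):

```ini
; /etc/systemd/system/ceph-radosgw@.service.d/nofile.conf
; Hypothetical override; size the limit to your bucket/index workload.
[Service]
LimitNOFILE=65536
```

Followed by `systemctl daemon-reload` and a restart of the radosgw
units for the new limit to apply.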
I'd like to get a feel for how long it might take to reshard a smallish
bucket of this sort, and whether it can be done without making it
unwriteable, before considering how to handle our older and more
pathological buckets (multi-million objects in a single shard).

Thanks for any pointers,

Graham

--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu
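As a rough sizing sketch for the pathological buckets mentioned above:
Luminous's dynamic resharding aims for about 100k objects per shard
(the `rgw_max_objs_per_shard` default), so a back-of-envelope target
shard count is a ceiling division. The helper below is mine, not a Ceph
API; note the thread's 184k-object bucket was resharded to 3 shards,
slightly above this floor:

```python
import math

def recommended_shards(num_objects, objs_per_shard=100000):
    """Rough shard count from the ~100k objects/shard guideline."""
    return max(1, math.ceil(num_objects / objs_per_shard))

print(recommended_shards(184000))    # the 184k-object test bucket -> 2
print(recommended_shards(12000000))  # a 12M-object bucket -> 120
```

Some operators round the result up to a prime to spread index entries
more evenly across shards.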