Re: [ceph-users] Understanding reshard issues

2017-12-14 Thread Graham Allan



On 12/14/2017 04:00 AM, Martin Emrich wrote:

Hi!

Am 13.12.17 um 20:50 schrieb Graham Allan:
After our Jewel to Luminous 12.2.2 upgrade, I ran into some of the 
same issues reported earlier on the list under "rgw resharding 
operation seemingly won't end". 


Yes, those were/are my threads; I'm hitting this issue as well.


I was able to correct the buckets using "radosgw-admin bucket check 
--fix" command, and later disabled the auto resharding.


Were you able to manually reshard a bucket after the "--fix"? Here, 
after a bucket was damaged once, the manual reshard process will freeze.


Interesting... the test bucket I tried to reshard below was one that had 
previously needed "bucket check --fix".


I just tried the same thing on another old (and small, ~100 object) 
bucket which had not previously seen problems - I got the same hang.


Admittedly I was doing a "reshard add" and "reshard execute" on the 
bucket, which I guess is more of a manually triggered automatic reshard 
than a true manual "bucket reshard" command. Having said that, the 
manual "bucket reshard" command now also freezes on that bucket.


As an experiment, I selected an unsharded bucket to attempt a manual 
reshard. I added it to the reshard list, then ran "radosgw-admin reshard 
execute". The bucket in question contains 184,000 objects and was being 
converted from 1 to 3 shards.


I'm trying to understand what I found...

1) the "radosgw-admin reshard execute" never returned. Somehow I 
expected it to kick off a background operation, but possibly this was 
mistaken.


Yes, same behaviour here. Someone on the list mentioned that resharding 
should actually happen quite fast (at most a few minutes).


So there's clearly something wrong here, and I am glad I am not the only 
one experiencing it.


For comparison, what is your infrastructure? Mine is:

* three beefy hosts (64GB RAM) with 4 OSDs each for data (HDD), and 2 
OSDs each on SSDs for the index.

* all bluestore (DB/WAL for the HDD OSDs also on SSD partitions)
* radosgw runs on each of these OSD hosts (as they are mostly idle, I 
don't see running the rados gateways on the OSD hosts as the cause of my 
poor performance)

* 3 separate monitor/mgr hosts
* OS is CentOS 7, running Ceph 12.2.2
* We use several buckets, all with Versioning enabled, for many (100k to 
12M) rather small objects.


This cluster has been around for some time (since Firefly), and is 
running Ubuntu 14.04. I will be converting it to CentOS 7 over the next 
few weeks or months. It's only used for object storage, no rbd or cephfs.


3 dedicated mons
9 large osd nodes with ~60x 6TB osds each, plus a handful of SSDs
4 radosgw nodes (2 ubuntu, 2 centos 7)

The radosgw main storage pools are EC 4+2 filestore on spinning drives; 
the indexes are on 3-way replicated filestore SSDs.


--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Understanding reshard issues

2017-12-14 Thread Martin Emrich

Hi!

Am 13.12.17 um 20:50 schrieb Graham Allan:
After our Jewel to Luminous 12.2.2 upgrade, I ran into some of the same 
issues reported earlier on the list under "rgw resharding operation 
seemingly won't end". 


Yes, those were/are my threads; I'm hitting this issue as well.


I was able to correct the buckets using "radosgw-admin bucket check 
--fix" command, and later disabled the auto resharding.


Were you able to manually reshard a bucket after the "--fix"? Here, 
after a bucket was damaged once, the manual reshard process will freeze.


As an experiment, I selected an unsharded bucket to attempt a manual 
reshard. I added it to the reshard list, then ran "radosgw-admin reshard 
execute". The bucket in question contains 184,000 objects and was being 
converted from 1 to 3 shards.


I'm trying to understand what I found...

1) the "radosgw-admin reshard execute" never returned. Somehow I 
expected it to kick off a background operation, but possibly this was 
mistaken.


Yes, same behaviour here. Someone on the list mentioned that resharding 
should actually happen quite fast (at most a few minutes).


So there's clearly something wrong here, and I am glad I am not the only 
one experiencing it.


For comparison, what is your infrastructure? Mine is:

* three beefy hosts (64GB RAM) with 4 OSDs each for data (HDD), and 2 
OSDs each on SSDs for the index.

* all bluestore (DB/WAL for the HDD OSDs also on SSD partitions)
* radosgw runs on each of these OSD hosts (as they are mostly idle, I 
don't see running the rados gateways on the OSD hosts as the cause of my 
poor performance)

* 3 separate monitor/mgr hosts
* OS is CentOS 7, running Ceph 12.2.2
* We use several buckets, all with Versioning enabled, for many (100k to 
12M) rather small objects.


pool settings:
# ceph osd pool ls detail
pool 1 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash 
rjenkins pg_num 256 pgp_num 256 last_change 174 lfor 0/172 flags 
hashpspool stripe_width 0
pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash 
rjenkins pg_num 8 pgp_num 8 last_change 842 owner 18446744073709551615 
flags hashpspool stripe_width 0 application rgw
pool 3 'default.rgw.control' replicated size 3 min_size 2 crush_rule 1 
object_hash rjenkins pg_num 8 pgp_num 8 last_change 843 owner 
18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 1 
object_hash rjenkins pg_num 128 pgp_num 128 last_change 950 lfor 0/948 
owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.log' replicated size 3 min_size 2 crush_rule 1 
object_hash rjenkins pg_num 8 pgp_num 8 last_change 845 owner 
18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 6 'default.rgw.buckets.index' replicated size 3 min_size 2 
crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 last_change 846 
owner 18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 7 'default.rgw.buckets.data' replicated size 3 min_size 2 
crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 847 
lfor 0/246 owner 18446744073709551615 flags hashpspool stripe_width 0 
application rgw
pool 8 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 
crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 849 
flags hashpspool stripe_width 0 application rgw


Regards,

Martin


[ceph-users] Understanding reshard issues

2017-12-13 Thread Graham Allan
After our Jewel to Luminous 12.2.2 upgrade, I ran into some of the same 
issues reported earlier on the list under "rgw resharding operation 
seemingly won't end". Some buckets were automatically added to the 
reshard list, and something happened overnight such that they couldn't 
be written to. A couple of our radosgw nodes hung due to inadequate 
file-handle limits, so that may have been a cause.


I was able to correct the buckets using "radosgw-admin bucket check 
--fix" command, and later disabled the auto resharding.


As an experiment, I selected an unsharded bucket to attempt a manual 
reshard. I added it to the reshard list, then ran "radosgw-admin reshard 
execute". The bucket in question contains 184,000 objects and was being 
converted from 1 to 3 shards.


I'm trying to understand what I found...

1) the "radosgw-admin reshard execute" never returned. Somehow I 
expected it to kick off a background operation, but possibly this was 
mistaken.


2) After 2 days it was still running. Is there any way to check 
progress? Such as querying something about the "new_bucket_instance_id" 
reported by "reshard status"?


3) When I tested uploading an object to the bucket I got an error - the 
client reported response code "UnknownError" - while radosgw logged:



2017-12-13 10:56:44.486131 7f02b2985700  0 block_while_resharding ERROR: bucket 
is still resharding, please retry
2017-12-13 10:56:44.488657 7f02b2985700  0 NOTICE: resharding operation on 
bucket index detected, blocking


But the introduction to dynamic resharding says that "there is no need 
to stop IO operations that go to the bucket (although some concurrent 
operations may experience additional latency when resharding is in 
progress)" - so I feel sure something must be wrong here.


I'd like to get a feel for how long it might take to reshard a smallish 
bucket of this sort, and whether it can be done without making the 
bucket unwritable, before considering how to handle our older and more 
pathological buckets (multi-million objects in a single shard).
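As a rough rule of thumb, RGW aims for on the order of 100k objects per index shard (the Luminous default for rgw_max_objs_per_shard is 100000). A naive estimate of a target shard count, which is not RGW's exact selection logic (it evidently chose 3 shards for the bucket above), can be sketched as:

```python
# Naive shard-count estimate from the ~100k-objects-per-shard rule of
# thumb; rgw_max_objs_per_shard defaults to 100000 in Luminous.
import math

def recommended_shards(num_objects, objs_per_shard=100_000):
    """Smallest shard count keeping each shard under objs_per_shard."""
    return max(1, math.ceil(num_objects / objs_per_shard))

print(recommended_shards(184_000))     # the ~184k-object bucket above
print(recommended_shards(12_000_000))  # a multi-million-object bucket
```

By this naive rule the 184,000-object bucket needs 2 shards and a 12M-object bucket would need 120, which gives a sense of how far a single-shard multi-million-object index is from the target.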


Thanks for any pointers,

Graham
--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu