Hi,

Thanks for the response. I'm still unsure what will happen to the
"marker" reference in the bucket metadata, as that is the object being
detected as large. Will the bucket generate a new "marker" reference in
the bucket metadata?
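For anyone following along, here is a minimal sketch of how one can compare the marker and bucket_id for a bucket, and pull the metadata for a specific instance. The bucket name "CLIENTBUCKET" and instance ID "default.5689810.107" are taken from this thread; the jq usage is just an assumption for readability (any JSON viewer works):

```shell
# Show the bucket's current marker and bucket_id ("id"). The marker
# prefixes the index objects (.dir.<marker> or .dir.<instance_id>.<shard>)
# in the .rgw.buckets.index pool.
radosgw-admin bucket stats --bucket "CLIENTBUCKET" \
    | jq '{marker: .marker, bucket_id: .id}'

# Fetch full instance metadata (including reshard_status and
# new_bucket_instance_id) for one specific bucket instance:
radosgw-admin metadata get \
    bucket.instance:CLIENTBUCKET:default.5689810.107
```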
I've been reading this page to try to get a better understanding:
http://docs.ceph.com/docs/luminous/radosgw/layout/

However I'm no clearer on this (or what the "marker" is used for), or on
why there are multiple separate "bucket_id" values (with different mtime
stamps) that all show the same number of shards.

If I were to remove the old bucket index, would I just execute:

rados -p .rgw.buckets.index rm .dir.default.5689810.107

Is the differing marker/bucket_id in the other buckets that were found
also an indicator? As I say, there's a good number of these; here are
some additional examples, though these aren't necessarily reporting as
large omap objects:

"BUCKET1", "default.281853840.479", "default.105206134.5",
"BUCKET2", "default.364663174.1", "default.349712129.3674",

Checking these other buckets, they exhibit the same symptoms as the
first (multiple instances of "radosgw-admin metadata get" showing what
seem to be multiple resharding processes having been run, with different
mtimes recorded).

Thanks

Chris

On Thu, 4 Oct 2018 at 16:21 Konstantin Shalygin <k0...@k0ste.ru> wrote:

> Hi,
>
> Ceph version: Luminous 12.2.7
>
> Following upgrading to Luminous from Jewel we have been stuck with a
> cluster in HEALTH_WARN state that is complaining about large omap
> objects. These all seem to be located in our .rgw.buckets.index pool.
> We've disabled auto resharding on bucket indexes due to apparent
> looping issues after our upgrade.
> We've reduced the number of reported large omap objects by initially
> increasing the following value:
>
> ~# ceph daemon mon.ceph-mon-1 config get osd_deep_scrub_large_omap_object_value_sum_threshold
> {
>     "osd_deep_scrub_large_omap_object_value_sum_threshold": "2147483648"
> }
>
> However we're still getting a warning about a single large omap
> object, though I don't believe this is related to an unsharded index -
> here's the log entry:
>
> 2018-10-01 13:46:24.427213 osd.477 osd.477 172.26.216.6:6804/2311858 8482 :
> cluster [WRN] Large omap object found. Object:
> 15:333d5ad7:::.dir.default.5689810.107:head Key count: 17467251
> Size (bytes): 4458647149
>
> The object in the logs is the "marker" object, rather than the
> bucket_id - I've put some details regarding the bucket here:
> https://pastebin.com/hW53kTxL
>
> The bucket limit check shows that the index is sharded, so I think
> this might be related to versioning, although I was unable to confirm
> that the bucket in question has versioning enabled through the aws cli
> (snipped debug output below):
>
> 2018-10-02 15:11:17,530 - MainThread - botocore.parsers - DEBUG - Response
> headers: {'date': 'Tue, 02 Oct 2018 14:11:17 GMT', 'content-length': '137',
> 'x-amz-request-id': 'tx0000000000000020e3b15-005bb37c85-15870fe0-default',
> 'content-type': 'application/xml'}
> 2018-10-02 15:11:17,530 - MainThread - botocore.parsers - DEBUG - Response
> body:
> <?xml version="1.0" encoding="UTF-8"?><VersioningConfiguration
> xmlns="http://s3.amazonaws.com/doc/2006-03-01/"></VersioningConfiguration>
>
> After dumping the contents of the large omap object mentioned above
> into a file, it does seem to be a simple listing of the bucket
> contents, potentially an old index:
>
> ~# wc -l omap_keys
> 17467251 omap_keys
>
> This is approximately 5 million below the currently reported number of
> objects in the bucket.
>
> When running the commands listed here:
> http://tracker.ceph.com/issues/34307#note-1
>
> The problematic bucket is listed in the output (along with 72 other
> buckets):
>
> "CLIENTBUCKET", "default.294495648.690", "default.5689810.107"
>
> As this tests for bucket_id and marker fields not matching before
> printing the information, is the implication that both of these should
> match once a bucket has fully migrated to the new sharded index?
>
> I was able to do a "metadata get" using what appears to be the old
> index object ID, which seems to support this (there's a
> "new_bucket_instance_id" field containing a newer "bucket_id", and
> reshard_status is 2, which seems to suggest resharding has completed).
>
> I am able to take the "new_bucket_instance_id" and get additional
> metadata about the bucket; each time I do this I get a slightly newer
> "new_bucket_instance_id", until it stops suggesting updated indexes.
>
> It's probably worth pointing out that when going through this process,
> the final "bucket_id" doesn't match the one I currently get when
> running 'radosgw-admin bucket stats --bucket "CLIENTBUCKET"', even
> though that also suggests no further resharding has been done, as
> "reshard_status" = 0 and "new_bucket_instance_id" is blank. The output
> is available to view here:
> https://pastebin.com/g1TJfKLU
>
> It would be useful if anyone could offer some clarification on how to
> proceed from this situation, identifying and removing any old/stale
> indexes from the index pool (if that is the case), as I've not been
> able to spot anything in the archives.
>
> If there's any further information needed for additional context,
> please let me know.
>
>
> Usually, when your bucket is automatically resharded, in some cases
> the old big index is not deleted - this is your large omap object.
>
> This index is safe to delete. Also look at [1].
>
>
> [1] https://tracker.ceph.com/issues/24457
>
>
>
> k
>
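To make Konstantin's "safe to delete" advice concrete, here is a hedged sketch of how one might verify and then remove a stale index object. All identifiers (CLIENTBUCKET, default.5689810.107) are taken from this thread; do confirm that the instance is not the live index reported by "bucket stats" before removing anything:

```shell
# 1. Confirm which index the bucket currently uses; never delete
#    objects whose prefix matches this live bucket_id.
radosgw-admin bucket stats --bucket "CLIENTBUCKET" | grep '"id"'

# 2. List index objects for the suspected stale instance. A sharded
#    index has one RADOS object per shard (.dir.<instance_id>.<shard>);
#    an old unsharded index is a single .dir.<instance_id> object.
rados -p .rgw.buckets.index ls | grep 'default\.5689810\.107'

# 3. Optionally sanity-check its contents (omap keys are index entries):
rados -p .rgw.buckets.index listomapkeys .dir.default.5689810.107 | head

# 4. Remove the stale index object (repeat per shard object if the
#    stale instance was itself sharded):
rados -p .rgw.buckets.index rm .dir.default.5689810.107

# 5. Optionally remove the stale bucket instance metadata as well:
radosgw-admin metadata rm bucket.instance:CLIENTBUCKET:default.5689810.107
```

Note that later Ceph releases grew "radosgw-admin reshard stale-instances list" and "reshard stale-instances rm" to automate this cleanup, but I believe those are not available in 12.2.7.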
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com