Sorry to be posting a second mystery at the same time - though this feels unconnected to my other one.

We had a user complain that they can't list the contents of one of their buckets (they can access certain objects within the bucket).

I started by running a simple command to get data on the bucket:

root@cephmon1:~# radosgw-admin bucket stats --bucket=mccuelab
error getting bucket stats ret=-2

Not encouraging, but it jogged a memory... we had a bucket some time ago which had 32 index shards while the metadata showed num_shards=0, and that gave the same error. So, looking at the bucket metadata...

root@cephmon1:~# radosgw-admin metadata get bucket.instance:mccuelab:default.2049236.2
{
    "key": "bucket.instance:mccuelab:default.2049236.2",
    "ver": {
        "tag": "_pOR6OLmXKQxYuFBa0E-eEmK",
        "ver": 17
    },
    "mtime": "2018-02-15 17:50:28.225135Z",
    "data": {
        "bucket_info": {
            "bucket": {
                "name": "mccuelab",
                "marker": "default.2049236.2",
                "bucket_id": "default.2049236.2",
                "tenant": "",
                "explicit_placement": {
                    "data_pool": ".rgw.buckets",
                    "data_extra_pool": "",
                    "index_pool": ".rgw.buckets"
                }
            },
            "creation_time": "0.000000",
            "owner": "uid=12093",
            "flags": 0,
            "zonegroup": "default",
            "placement_rule": "",
            "has_instance_obj": "true",
            "quota": {
                "enabled": false,
                "check_on_raw": false,
                "max_size": -1024,
                "max_size_kb": 0,
                "max_objects": -1
            },
            "num_shards": 32,
            "bi_shard_hash_type": 0,
            "requester_pays": "false",
            "has_website": "false",
            "swift_versioning": "false",
            "swift_ver_location": "",
            "index_type": 0,
            "mdsearch_config": [],
            "reshard_status": 0,
            "new_bucket_instance_id": ""
        },
        "attrs": [
            {
                "key": "user.rgw.acl",
                "val": 
"AgKxAAAAAwImAAAACQAAAHVpZD0xMjA5MxUAAABSb2JlcnQgSmFtZXMgU2NoYWVmZXIEA38AAAABAQAAAAkAAAB1aWQ9MTIwOTMPAAAAAQAAAAkAAAB1aWQ9MTIwOTMFA0oAAAACAgQAAAAAAAAACQAAAHVpZD0xMjA5MwAAAAAAAAAAAgIEAAAADwAAABUAAABSb2JlcnQgSmFtZXMgU2NoYWVmZXIAAAAAAAAAAAAAAAAAAAAA"
            },
            {
                "key": "user.rgw.idtag",
                "val": ""
            }
        ]
    }
}

But the index pool (.rgw.buckets.index) doesn't contain all 32 of these shards - only 15:

root@cephmon1:~# rados -p .rgw.buckets.index ls - | grep "default.2049236.2"
.dir.default.2049236.2.22
.dir.default.2049236.2.3
.dir.default.2049236.2.10
.dir.default.2049236.2.31
.dir.default.2049236.2.12
.dir.default.2049236.2.0
.dir.default.2049236.2.18
.dir.default.2049236.2.13
.dir.default.2049236.2.16
.dir.default.2049236.2.11
.dir.default.2049236.2.23
.dir.default.2049236.2.17
.dir.default.2049236.2.9
.dir.default.2049236.2.29
.dir.default.2049236.2.24


But, wait a minute - I wasn't reading carefully. This is a really old bucket: its explicit_placement has both the data pool and the index pool set to .rgw.buckets.

OK, so I checked for index objects in the .rgw.buckets pool instead, and there shards 0..31 are all present - that's good.
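
That check was roughly the following (a reconstruction rather than my exact command - it just runs rados stat against each expected shard object):

root@cephmon1:~# for n in $(seq 0 31); do rados -p .rgw.buckets stat .dir.default.2049236.2.$n > /dev/null 2>&1 || echo "shard $n missing"; done

It prints nothing against .rgw.buckets; pointed at .rgw.buckets.index it should flag the 17 missing shards.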

So why do any index objects exist in the .rgw.buckets.index pool at all...?

I set debug rgw=1 and debug ms=1 and ran "radosgw-admin bi list"... amongst a lot of other output I see...
- a query to osd 204 pg 100.1c;
- it finds and lists the first entry from ".dir.default.2049236.2.0"
- then a query to osd 164 pg 100.3d
- which returns "file not found" for ".dir.default.2049236.2.1"...
- consistent with shard #0 existing in .rgw.buckets.index, but not #1.

2018-02-16 18:13:10.405545 7f0f539beb80  1 -- 10.32.16.93:0/3172453804 --> 
10.31.0.65:6812/58901 -- osd_op(unknown.0.0:97 100.1c 
100:3a18c885:::.dir.default.2049236.2.0:head [call rgw.bi_list] snapc 0=[] 
ondisk+read+known_if_redirected e507701) v8 -- 0x7f0f55921f90 con 0
2018-02-16 18:13:10.410902 7f0f40a2f700  1 -- 10.32.16.93:0/3172453804 <== 
osd.204 10.31.0.65:6812/58901 1 ==== osd_op_reply(97 .dir.default.2049236.2.0 
[call] v0'0 uv1416558 ondisk = 0) v8 ==== 168+0+317 (1847036665 0 2928341486) 
0x7f0f3403d510 con 0x7f0f55925720[
    {
        "type": "plain",
        "idx": "durwa004/Copenhagen_bam_files_3.tar.xz",
        "entry": {
            "name": "durwa004/Copenhagen_bam_files_3.tar.xz",
            "instance": "",
            "ver": {
                "pool": 23,
                "epoch": 179629
            },
            "locator": "",
            "exists": "true",
            "meta": {
                "category": 1,
                "size": 291210535540,
                "mtime": "2018-02-09 04:59:43.869899Z",
                "etag": "e75dc95f44944fe9df6a102c809566be-272",
                "owner": "uid=12093",
                "owner_display_name": "Robert James Schaefer",
                "content_type": "application/x-xz",
                "accounted_size": 291210535540,
                "user_data": ""
            },
            "tag": "default.8366086.124333",
            "flags": 0,
            "pending_map": [],
            "versioned_epoch": 0
        }
    }
2018-02-16 18:13:10.411748 7f0f539beb80  1 -- 10.32.16.93:0/3172453804 --> 
10.31.0.71:6816/59857 -- osd_op(unknown.0.0:98 100.3d 
100:bd22ae7d:::.dir.default.2049236.2.1:head [call rgw.bi_list] snapc 0=[] 
ondisk+read+known_if_redirected e507701) v8 -- 0x7f0f55924b40 con 0
ERROR: bi_list(): (2) No such file or directory
2018-02-16 18:13:10.414018 7f0f41a31700  1 -- 10.32.16.93:0/3172453804 <== 
osd.164 10.31.0.71:6816/59857 1 ==== osd_op_reply(98 .dir.default.2049236.2.1 
[call] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 168+0+0 
(600468650 0 0) 0x7f0f3c03d540 con 0x7f0f5592ad50
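
For reference, the run above was along these lines (invocation reconstructed from memory; the debug settings are just passed as the usual ceph command-line overrides):

root@cephmon1:~# radosgw-admin bi list --bucket=mccuelab --debug-rgw=1 --debug-ms=1 2>&1 | less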

It seems to me rgw should be looking in pool 23 (.rgw.buckets), not pool 100 (.rgw.buckets.index)?
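
(The pool id -> name mapping can be double-checked with something like:

root@cephmon1:~# ceph osd pool ls detail | grep -E "^pool (23|100) "

which should show 23 = .rgw.buckets and 100 = .rgw.buckets.index, assuming I'm reading the pg ids in those osd_ops correctly.)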

Presumably at some point it started using the default index pool defined in the rgw zone (including for adding new entries?), rather than the pool defined by explicit_placement in the bucket metadata.
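
One way to test that would be to compare the omap contents of the same shard object in each pool - just a sketch, I haven't run this yet:

root@cephmon1:~# rados -p .rgw.buckets listomapkeys .dir.default.2049236.2.0 > /tmp/shard0.old
root@cephmon1:~# rados -p .rgw.buckets.index listomapkeys .dir.default.2049236.2.0 > /tmp/shard0.new
root@cephmon1:~# diff /tmp/shard0.old /tmp/shard0.new

If recently-uploaded object names only show up in the .rgw.buckets.index copy, that would back up the theory.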

The last time we had "issues" of this sort (such as the incorrect num_shards) was a long time ago, associated with the hammer->jewel upgrade circa 11/2016. I find it hard to believe this has been going on unnoticed for that long. Maybe it's related to our jewel->luminous upgrade (12/2017)? I'll ask the user when they last listed the bucket successfully.
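
Another way to bracket the timeline might be the mtimes on the shard objects themselves - if rgw switched pools, the copies in .rgw.buckets should have stopped being updated around that point. Something like (again, just a sketch):

root@cephmon1:~# for n in $(seq 0 31); do rados -p .rgw.buckets stat .dir.default.2049236.2.$n; done

rados stat prints each object's mtime, so the newest mtime there would roughly date the switch.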

Here is the current zone, btw - I also have a json dump of this from 11/2016, and it seems largely unchanged (some fields, like lc_pool and reshard_pool, didn't exist back then).

root@cephmon1:~# radosgw-admin zone get default
{
    "id": "default",
    "name": "default",
    "domain_root": ".rgw",
    "control_pool": ".rgw.control",
    "gc_pool": ".rgw.gc",
    "lc_pool": ".log:lc",
    "log_pool": ".log",
    "intent_log_pool": ".intent-log",
    "usage_log_pool": ".usage",
    "reshard_pool": ".log:reshard",
    "user_keys_pool": ".users",
    "user_email_pool": ".users.email",
    "user_swift_pool": ".users.swift",
    "user_uid_pool": ".users.uid",
    "system_key": {
        "access_key": "",
        "secret_key": ""
    },
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": ".rgw.buckets.index",
                "data_pool": ".rgw.buckets",
                "data_extra_pool": ".rgw.buckets.extra",
                "index_type": 0,
                "compression": ""
            }
        },
        {
            "key": "ec42-placement",
            "val": {
                "index_pool": ".rgw.buckets.index",
                "data_pool": ".rgw.buckets.ec42",
                "data_extra_pool": ".rgw.buckets.extra",
                "index_type": 0,
                "compression": ""
            }
        }
    ],
    "metadata_heap": ".rgw.meta",
    "tier_config": [],
    "realm_id": "dbfd45d9-e250-41b0-be3e-ab9430215d5b"
}


Graham
--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu
