Hi Orit!

I did some tests, and indeed the combination of Versioning/Lifecycle with 
Resharding is the problem:


  *   If I do not enable Versioning/Lifecycle, Autoresharding works fine.
  *   If I disable Autoresharding but enable Versioning+Lifecycle, pushing data
works fine until I manually reshard (see the command sketch below). The manual
reshard also hangs.
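
(For reference, the manual reshard is triggered with radosgw-admin, roughly like
this; bucket name and shard count are placeholders:)

    radosgw-admin bucket reshard --bucket=<bucket> --num-shards=32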

My lifecycle rule (which should expire all noncurrent versions after 60 days and
remove expired delete markers):

{
    "Rules": [{
        "Status": "Enabled",
        "Prefix": "",
        "NoncurrentVersionExpiration": {
            "NoncurrentDays": 60
        },
        "Expiration": {
            "ExpiredObjectDeleteMarker": true
        },
        "ID": "expire-60days"
    }]
}
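
For reference, applying this rule and enabling versioning can be done roughly as in
the following boto3 sketch (endpoint, credentials and bucket name are placeholders,
not the actual values):

# Minimal sketch: enable versioning and apply the lifecycle rule via boto3.
# Endpoint, credentials and bucket name below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.com:7480",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)
bucket = "test-bucket"

# Versioning must be enabled for NoncurrentVersionExpiration to take effect.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Apply the same 60-day rule as above.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [{
            "ID": "expire-60days",
            "Status": "Enabled",
            "Prefix": "",
            "NoncurrentVersionExpiration": {"NoncurrentDays": 60},
            "Expiration": {"ExpiredObjectDeleteMarker": True},
        }]
    },
)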

I am currently testing with an application containing customer data, but I am also
generating some random test data so that I can produce logs I can share.
I will also test whether the versioning itself is the culprit, or if it is the 
lifecycle rule.
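
Generating such random test data is basically a loop like the following boto3
sketch (same placeholder endpoint/credentials as above; object count and size are
arbitrary):

# Minimal sketch: push ~100000 small random objects into the test bucket.
# Around this object count the automatic resharding kicks in.
import os
import uuid
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.com:7480",   # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)
bucket = "test-bucket"

for i in range(100000):
    key = "testdata/%06d-%s" % (i, uuid.uuid4())
    s3.put_object(Bucket=bucket, Key=key, Body=os.urandom(4096))  # 4 KiB random payload
    if i % 1000 == 0:
        print("uploaded %d objects" % i)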

Regards,
Martin

From: Orit Wasserman <owass...@redhat.com>
Date: Tuesday, 16 January 2018 at 18:38
To: Martin Emrich <martin.emr...@empolis.com>
Cc: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Bug in RadosGW resharding? Hangs again...

Hi Martin,

On Mon, Jan 15, 2018 at 6:04 PM, Martin Emrich <martin.emr...@empolis.com> wrote:
Hi!

After ending up with a completely broken radosgw setup due to damaged buckets, I
deleted all rgw pools and started from scratch.

But my problem is reproducible: after pushing ca. 100000 objects into a bucket,
the resharding process appears to start, and the bucket becomes unresponsive.

Sorry to hear that.
Can you share radosgw logs with --debug_rgw=20 --debug_ms=1?
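
(A minimal sketch of the equivalent ceph.conf settings, assuming the section name
matches the running rgw instance:)

    [client.rgw.<instance>]
        debug rgw = 20
        debug ms = 1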

I just see lots of these messages in all rgw logs:

2018-01-15 16:57:45.108826 7fd1779b1700  0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-01-15 16:57:45.119184 7fd1779b1700  0 NOTICE: resharding operation on bucket index detected, blocking
2018-01-15 16:57:45.260751 7fd1120e6700  0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-01-15 16:57:45.280410 7fd1120e6700  0 NOTICE: resharding operation on bucket index detected, blocking
2018-01-15 16:57:45.300775 7fd15b979700  0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-01-15 16:57:45.300971 7fd15b979700  0 WARNING: set_req_state_err err_no=2300 resorting to 500
2018-01-15 16:57:45.301042 7fd15b979700  0 ERROR: RESTFUL_IO(s)->complete_header() returned err=Input/output error

One radosgw process and two OSDs housing the bucket index/metadata are still busy,
but the resharding seems to be stuck again.

How long is this resharding process supposed to take? I cannot believe that an 
application is supposed to block for more than half an hour...

I feel inclined to open a bug report, but I am not yet sure where the problem
lies.

Some information:

* 3 RGW processes, 3 OSD hosts with 12 HDD OSDs and 6 SSD OSDs
* Ceph 12.2.2
* Auto-Resharding on, Bucket Versioning & Lifecycle rule enabled.

What lifecycle rules do you use?

Regards,
Orit
Thanks,

Martin
