[ceph-users] Re: Hanging request in S3

2024-03-12 Thread Christian Kugler
Hi Casey,

Interesting. Especially since the request it hangs on is a GET request.
I set the option and restarted the RGW I test with.

The POSTs for deleting take a while, but there are no longer any blocking GET
or POST requests.
Thank you!

Best,
Christian

PS: Sorry for pressing the wrong reply button, Casey


[ceph-users] Re: Hanging request in S3

2024-03-06 Thread Casey Bodley
hey Christian, i'm guessing this relates to
https://tracker.ceph.com/issues/63373 which tracks a deadlock in s3
DeleteObjects requests when multisite is enabled.
rgw_multi_obj_del_max_aio can be set to 1 as a workaround until the
reef backport lands
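
A rough sketch of applying that workaround, assuming the gateways read their
options from the central config database under the client.rgw section (the
section name is an assumption; substitute whichever section your RGW daemons
actually use):

```shell
# Limit each DeleteObjects request to one concurrent delete operation, as a
# workaround for https://tracker.ceph.com/issues/63373.
# "client.rgw" is an assumed config section, not taken from this thread.
ceph config set client.rgw rgw_multi_obj_del_max_aio 1

# Read the value back to confirm it was stored.
ceph config get client.rgw rgw_multi_obj_del_max_aio
```

In the follow-up to this message, the RGW was restarted after the option was
set.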

On Wed, Mar 6, 2024 at 2:41 PM Christian Kugler wrote:
>
> Hi,
>
> I am having some trouble with some S3 requests and I am at a loss.
>
> After upgrading to reef a couple of weeks ago, some requests get stuck and
> never return. The two Ceph clusters are set up to sync the S3 realm
> bidirectionally. The bucket has 479 shards (dynamic resharding) at the moment.
>
> Putting an object (/etc/services) into the bucket via s3cmd works, and
> deleting it works as well. So I know it is not just the entire bucket that is
> somehow faulty.
>
> When I try to delete a specific prefix, the request for listing all objects
> never comes back. In the example below I only included the request in
> question, which I aborted with ^C.
>
> $ s3cmd rm -r s3://sql20/pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/ -d
> [...snip...]
> DEBUG: Canonical Request:
> GET
> /sql20/
> prefix=pgbackrest%2Fbackup%2Fadrpb%2F20240130-200410F%2Fpg_data%2Fbase%2F16560%2F
> host:[...snip...]
> x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> x-amz-date:20240306T183435Z
>
> host;x-amz-content-sha256;x-amz-date
> e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> --
> DEBUG: signature-v4 headers: {'x-amz-date': '20240306T183435Z',
> 'Authorization': 'AWS4-HMAC-SHA256
> Credential=VL0FRB7CYGMHBGCD419M/20240306/[...snip...]/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=45b133675535ab611bbf2b9a7a6e40f9f510c0774bf155091dc9a05b76856cb7',
> 'x-amz-content-sha256':
> 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'}
> DEBUG: Processing request, please wait...
> DEBUG: get_hostname(sql20): [...snip...]
> DEBUG: ConnMan.get(): re-using connection: [...snip...]#1
> DEBUG: format_uri():
> /sql20/?prefix=pgbackrest%2Fbackup%2Fadrpb%2F20240130-200410F%2Fpg_data%2Fbase%2F16560%2F
> DEBUG: Sending request method_string='GET',
> uri='/sql20/?prefix=pgbackrest%2Fbackup%2Fadrpb%2F20240130-200410F%2Fpg_data%2Fbase%2F16560%2F',
> headers={'x-amz-date': '20240306T183435Z', 'Authorization':
> 'AWS4-HMAC-SHA256
> Credential=VL0FRB7CYGMHBGCD419M/20240306/[...snip...]/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=45b133675535ab611bbf2b9a7a6e40f9f510c0774bf155091dc9a05b76856cb7',
> 'x-amz-content-sha256':
> 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'},
> body=(0 bytes)
> ^CDEBUG: Response:
> {}
> See ya!
>
> The request did not show up normally in the logs so I set debug_rgw=20 and
> debug_ms=20 via ceph config set.
>
> I tried to isolate the request and looked for its request id:
> 13321243250692796422
> The following is a grep for the request id:
>
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.0s
> s3:list_bucket verifying op params
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.0s
> s3:list_bucket pre-executing
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.0s
> s3:list_bucket check rate limiting
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.0s
> s3:list_bucket executing
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.0s
> s3:list_bucket list_objects_ordered: starting attempt 1
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.0s
> s3:list_bucket cls_bucket_list_ordered: request from each of 479 shard(s)
> for 8 entries to get 1001 total entries
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
> s3:list_bucket cls_bucket_list_ordered: currently processing
> pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101438318.gz
> from shard 437
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
> s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860
> obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101438318.gz
> state=0x55d4237419e8 s->prefetch_data=0
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
> s3:list_bucket cls_bucket_list_ordered: skipping
> pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101438318.gz[]
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
> s3:list_bucket cls_bucket_list_ordered: currently processing
> pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457659_fsm.gz
> from shard 202
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
> s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860
> obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457659_fsm.gz
> state=0x55d4237419e8 s->prefetch_data=0
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
> s3:list_bucket cls_bucket_list_ordered: skippin