Hi everyone

We have recently had several RGW outages in which RGW returned HTTP 503: first 
for a few requests, then for most, and finally for all of them, over the course 
of 1-2 hours. This seems to have started after we updated from 15.2.4 to 15.2.5.

The line that accompanies these outages in the log is the following:

        s3:list_bucket Scheduling request failed with -2218

It first pops up a few times here and there, until it eventually applies to all 
requests. It seems to indicate that the throttler has reached the limit of open 
connections.

We run a pair of HAProxy instances in front of RGW, which limit the number of 
connections to the two RGW instances to 400, so the throttler's limit should 
never be reached. We do use RGW metadata sync between the instances, which could 
account for some extra connections, but I never count more than 20 open TCP 
connections between the instances at any given time.

I also noticed that some connections in the RGW log seem to never complete. 
That is, I can find a ‘starting new request’ line, but no associated ‘req done’ 
or ‘beast’ line.

I don’t think there are any hung connections around, as they are killed by 
HAProxy after a short timeout.

Looking at the code, it seems that the throttler in use (SimpleThrottler) 
eventually reaches its maximum of 1024 outstanding requests 
(outstanding_requests) and never recovers. I suspect that the request_complete 
function is not called in all cases, but I am not familiar with the Ceph 
codebase, so I am not sure.

See 
https://github.com/ceph/ceph/blob/cc17681b478594aa39dd80437256a54e388432f0/src/rgw/rgw_dmclock_async_scheduler.h#L166-L214

Does anyone see the same phenomenon? Could this be a bug in the request 
handling of RGW, or am I wrong in my assumptions?

For now we’re just restarting our RGWs regularly, which seems to keep the 
problem at bay.

Thanks for any hints.

Denis
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
