We are experiencing something similar (slow GET responses) when sending, for example, 1k delete requests, on ceph v16.2.13.
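In case someone wants to try to reproduce it, this is roughly how we drive the load -- a minimal boto3 sketch, where the endpoint, bucket, credentials and key names are placeholders rather than our real setup, and the objects are assumed to exist already:

    # Minimal load sketch: ~1k concurrent DELETEs with a few GETs interleaved.
    # Endpoint, bucket, credentials and key names are placeholders.
    import concurrent.futures

    import boto3

    ENDPOINT = "http://rgw.example.test:8080"   # placeholder RGW endpoint
    BUCKET = "load-test"                        # placeholder bucket

    s3 = boto3.client(
        "s3",
        endpoint_url=ENDPOINT,
        aws_access_key_id="ACCESS_KEY",         # placeholder credentials
        aws_secret_access_key="SECRET_KEY",
    )

    def delete_one(i):
        # Fire a single DELETE; the object is assumed to exist already.
        s3.delete_object(Bucket=BUCKET, Key=f"obj-{i}")

    def get_one(i):
        # GETs issued alongside the deletes; this is where we see the slowdown.
        s3.get_object(Bucket=BUCKET, Key=f"probe-{i}")["Body"].read()

    with concurrent.futures.ThreadPoolExecutor(max_workers=64) as pool:
        futures = [pool.submit(delete_one, i) for i in range(1000)]
        futures += [pool.submit(get_one, i) for i in range(10)]
        concurrent.futures.wait(futures)

With this kind of mixed traffic the DELETEs go through, but the GETs slow down noticeably.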
Rok

On Mon, Jun 12, 2023 at 7:16 PM grin <g...@grin.hu> wrote:
> Hello,
>
> ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
>
> There is a single (test) radosgw serving plenty of test traffic. When
> under heavy req/s ("heavy" in a low sense, about 1k req/s) it pretty
> reliably hangs: low-traffic threads seem to work (like handling occasional
> PUTs), but GETs are completely nonresponsive; all attention seems to be
> spent on futexes.
>
> The effect is extremely similar to
> https://ceph-users.ceph.narkive.com/I4uFVzH9/radosgw-civetweb-hangs-once-around-850-established-connections
> (subject: "Radosgw (civetweb) hangs once around 850 established connections"),
> except this is quincy, so it's beast instead of civetweb. The effect is the
> same as described there, except the cluster is much smaller (about 20-40
> OSDs).
>
> I observed that when I start radosgw -f with debug 20/20 it almost never
> hangs, so my guess is some ugly race condition. However, I am a bit
> clueless about how to actually debug it, since debugging makes it go away.
> Debug 1 (the default) with -d seems to hang after a while, but it's not
> that simple to induce; I'm still testing under 4/4.
>
> Also, I do not see much to configure about beast.
>
> To answer the questions in the original (2016) thread:
> - Debian stable
> - no visible limits issue
> - no obvious memory leak observed
> - no other visible resource shortage
> - strace says everyone's waiting on futexes, about 600-800 threads, apart
>   from the one serving occasional PUTs
> - tcp port doesn't respond.
>
> IRC didn't react. ;-)
>
> Thanks,
> Peter
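Regarding the "not much to configure about beast" part above: the knobs we have been poking at are roughly the ones below. The section name and values are placeholders for our test instance, not recommendations -- please double-check the option names against the quincy docs before relying on them.

    # ceph.conf fragment; [client.rgw.test] is a placeholder section name
    [client.rgw.test]
        rgw frontends = beast port=8080
        # default is 512, which roughly matches the 600-800 threads seen in strace
        rgw thread pool size = 512
        debug rgw = 4/4
        debug ms = 1

    # debug levels can also be raised at runtime (daemon name is a placeholder):
    #   ceph config set client.rgw.test debug_rgw 20/20

So far, raising debug_rgw makes the hang much harder to trigger for us as well, which matches what is described above.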