I'm curious how the num_threads option to civetweb relates to 'rgw thread pool size'. Should I make them equal?
ie:

rgw frontends = civetweb enable_keep_alive=yes port=80 num_threads=125 error_log_file=/var/log/ceph/civetweb.error.log access_log_file=/var/log/ceph/civetweb.access.log

-Ben

On Thu, Feb 9, 2017 at 12:30 PM, Wido den Hollander <w...@42on.com> wrote:
>
> > Op 9 februari 2017 om 19:34 schreef Mark Nelson <mnel...@redhat.com>:
> >
> > I'm not really an RGW expert, but I'd suggest increasing the
> > "rgw_thread_pool_size" option to something much higher than the default
> > 100 threads if you haven't already. RGW requires at least 1 thread per
> > client connection, so with many concurrent connections some of them
> > might end up timing out. You can scale the number of threads and even
> > the number of RGW instances on a single server, but at some point you'll
> > run out of threads at the OS level. Probably before that actually
> > happens, though, you'll want to think about multiple RGW gateway nodes
> > behind a load balancer. AFAIK that's how the big sites do it.
>
> In addition, have you tried to use more RADOS handles?
>
> rgw_num_rados_handles = 8
>
> That with more RGW threads as Mark mentioned.
>
> Wido
>
> > I believe some folks are considering trying to migrate RGW to a
> > threadpool/event-processing model, but it sounds like it would be quite
> > a bit of work.
> >
> > Mark
> >
> > On 02/09/2017 12:25 PM, Benjeman Meekhof wrote:
> > > Hi all,
> > >
> > > We're doing some stress testing with clients hitting our rados gw
> > > nodes with simultaneous connections. When the number of client
> > > connections exceeds about 5400, we start seeing 403 Forbidden errors
> > > and log messages like the following:
> > >
> > > 2017-02-09 08:53:16.915536 7f8c667bc700 0 NOTICE: request time skew
> > > too big now=2017-02-09 08:53:16.000000 req_time=2017-02-09 08:37:18.000000
> > >
> > > This is version 10.2.5 using embedded civetweb. There's just one
> > > instance per node, and they all start generating 403 errors and the
> > > above log messages when enough clients start hitting them. The
> > > hardware is not being taxed at all: negligible load and network
> > > throughput, and the OSDs don't show any appreciable increase in CPU
> > > load or I/O wait on journal/data devices. Unless I'm missing
> > > something, it looks like RGW is just not scaling to fill out the
> > > hardware it is on.
> > >
> > > Does anyone have advice on scaling RGW to fully utilize a host?
> > >
> > > thanks,
> > > Ben
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
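Pulling the suggestions above together, a minimal ceph.conf sketch might look like the following. The section name and the value 512 are illustrative assumptions, not recommendations; the point is only that num_threads (civetweb workers, one per active connection) and rgw_thread_pool_size are commonly raised in step, with extra RADOS handles as Wido suggests:

```ini
[client.rgw.gateway1]        ; section name is hypothetical
; civetweb worker threads; each active client connection occupies one
rgw frontends = civetweb port=80 num_threads=512
; RGW's own thread pool; commonly set to match num_threads
rgw thread pool size = 512
; more librados handles, per Wido's suggestion
rgw num rados handles = 8
```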
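Since the advice above mentions running out of threads at the OS level, a quick sanity check of the host's thread headroom before raising these knobs might look like this (standard Linux procfs paths; actual values vary per host):

```shell
# Inspect OS-level thread limits relevant to a high num_threads /
# rgw_thread_pool_size setting. Values are host-specific.
echo "kernel threads-max: $(cat /proc/sys/kernel/threads-max)"
echo "kernel pid_max:     $(cat /proc/sys/kernel/pid_max)"
echo "ulimit -u (nproc):  $(ulimit -u)"
```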