Re: [naviserver-devel] naviserver with connection thread queue
On 06.12.12 13:13, Stephen Deasey wrote:
> I guess it depends on how the website is deployed: in a more modern
> set-up CSS is often compiled from SASS or LESS; javascript needs to be
> minified and combined, possibly compiled using Google's optimising
> compiler, maybe from CoffeeScript; images are compressed, etc. Making
> gzip versions of static text/* files is just one more target in a
> Makefile. Which is a little different than the old PHP/OpenACS
> perspective where everything happens at run-time.

Modern PHP/OpenACS installations use reverse proxies like nginx for static content, where one has the option to compress files on the fly or to deliver pre-compressed binaries. When we switched our production site to gzip delivery for the dynamic content, we did not notice any difference in CPU load. Sure, delivering static gzipped content is faster than zipping on the fly, but I would like to keep the burden on the site master low.

Not sure why we are discussing this now. My original argument was that the API structure for deliveries is overly complicated (to put it mildly) and not orthogonal (I failed to understand it without drawing the call graph). There is a lot of room for improvement.

-gustaf neumann

--
LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial
Remotely access PCs and mobile devices and provide instant support
Improve your efficiency, and focus on delivering more value-add services
Discover what IT Professionals Know. Rescue delivers
http://p.sf.net/sfu/logmein_12329d2d
___
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel
Re: [naviserver-devel] naviserver with connection thread queue
On Tue, Dec 4, 2012 at 10:55 PM, Gustaf Neumann wrote:
> Am 04.12.12 20:06, schrieb Stephen Deasey:
>> - we should actually ship some code which searches for *.gz versions of
>> static files
>
> this would mean to keep a .gz version and a non-.gz version in the file
> system for the cases where gzip is not an accepted encoding. Not sure I
> would like to manage these files and keep them in sync; the fast-path
> cache could keep gzipped copies, invalidation is already there.

I guess it depends on how the website is deployed: in a more modern set-up CSS is often compiled from SASS or LESS; javascript needs to be minified and combined, possibly compiled using Google's optimising compiler, maybe from CoffeeScript; images are compressed, etc. Making gzip versions of static text/* files is just one more target in a Makefile. Which is a little different than the old PHP/OpenACS perspective where everything happens at run-time.
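[Editorial note] The pre-compressed-variant idea discussed above can be sketched in C: before serving a static file, check whether the client accepts gzip and whether a ".gz" sibling exists on disk. The function name and shape below are hypothetical illustrations, not NaviServer's fastpath API.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Returns 1 and fills 'out' with "<path>.gz" if the client accepts gzip
 * and the pre-compressed file exists; otherwise returns 0 and the
 * original file should be served (possibly compressed on the fly). */
static int
pick_gz_variant(const char *path, const char *accept_encoding,
                char *out, size_t outlen)
{
    struct stat st;

    if (accept_encoding == NULL || strstr(accept_encoding, "gzip") == NULL) {
        return 0;                      /* client cannot handle gzip */
    }
    if ((size_t)snprintf(out, outlen, "%s.gz", path) >= outlen) {
        return 0;                      /* name would be truncated */
    }
    return stat(out, &st) == 0 && S_ISREG(st.st_mode);
}
```

A server would then send the ".gz" file with "Content-Encoding: gzip" and add "Vary: Accept-Encoding"; keeping the two variants in sync is exactly the burden Gustaf objects to, which the fast-path cache approach avoids.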
Re: [naviserver-devel] naviserver with connection thread queue
On Tue, Dec 4, 2012 at 10:24 PM, Gustaf Neumann wrote:
> Today, I was hunting another problem in connection with nsssl, which
> turns out to be a weakness of our interfaces. The source of the problem
> is that the buffer management of OpenSSL is not aligned with the buffer
> management in NaviServer. In the NaviServer driver, all receive requests
> are triggered via poll, when sockets are readable. With OpenSSL it is
> also possible that data remains as a leftover from an earlier receive
> when a smaller buffer is provided. During upload spooling, NaviServer
> requested receive operations with a 4KB buffer. OpenSSL might receive
> "at once" 16KB. The read operation with the small buffer will not drain
> the OpenSSL buffer, and later, poll() will not be triggered by the fact
> that the socket is readable (since the buffer is still quite full). The
> problem happened in NaviServer when the input was spooled (e.g. file
> uploads). I have doubts that this combination ever worked. I have
> corrected the problem by increasing the buffer variable in driver.c.
> The cleaner implementation would be to add an "Ns_DriverReadableProc
> Readable" similar to the "Ns_DriverKeepProc Keep", but that would
> affect the interface of all drivers.

Another way to use the openssl library is to manage socket read/writes yourself and hand memory buffers to openssl to encrypt/decrypt.
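[Editorial note] The buffering mismatch described above can be modeled without a real TLS stack: the transport layer receives a 16KB record "at once", a 4KB read does not drain it, and poll() on the raw socket stays silent about the leftover. The drain loop below mirrors what OpenSSL's real SSL_pending() enables; the Transport type is a toy model for illustration, not driver code.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    char   buf[16384];   /* data already received "at once", e.g. a 16KB TLS record */
    size_t len;          /* bytes buffered by the transport layer */
    size_t off;          /* bytes already handed to the application */
} Transport;

/* Like SSL_read() with a small buffer: copies at most bufsize bytes. */
static size_t
transport_read(Transport *t, char *out, size_t bufsize)
{
    size_t n = t->len - t->off;

    if (n > bufsize) {
        n = bufsize;
    }
    memcpy(out, t->buf + t->off, n);
    t->off += n;
    return n;
}

/* Like SSL_pending(): bytes still buffered above the socket, which
 * poll() on the raw fd will never report as readable. */
static size_t
transport_pending(const Transport *t)
{
    return t->len - t->off;
}

/* The fix: after a read, keep draining while the layer reports
 * leftover bytes, instead of returning to the poll loop. */
static size_t
drain(Transport *t, char *out, size_t bufsize)
{
    size_t total = 0;

    do {
        total += transport_read(t, out, bufsize);
    } while (transport_pending(t) > 0);
    return total;
}
```

This is also roughly what the proposed "Ns_DriverReadableProc Readable" would expose: a per-driver way to ask "is there buffered input poll() cannot see?".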
Re: [naviserver-devel] naviserver with connection thread queue
Am 05.12.12 00:41, schrieb Stephen Deasey:
> On Wed, Nov 28, 2012 at 10:38 AM, Gustaf Neumann wrote:
>> It is interesting to see that with always 5 connection threads running and
>> using jemalloc, we see an rss consumption only slightly larger than with
>> plain tcl and zippy malloc having maxthreads == 2, having fewer requests
>> queued.
>>
>> Similarly, with tcmalloc we see with minthreads 5, maxthreads 10
>>
>> requests 2062 spools 49 queued 3 connthreads 6 rss 376
>> requests 7743 spools 429 queued 359 connthreads 11 rss 466
>> requests 8389 spools 451 queued 366 connthreads 12 rss 466
>>
>> which is even better.
>
> Min/max threads 5/10 better than 2/10?

the numbers show that 5/10 with tcmalloc is better than 5/10 with jemalloc and only slightly worse than 2/2 with zippy malloc.

-gn
Re: [naviserver-devel] naviserver with connection thread queue
On Wed, Nov 28, 2012 at 10:38 AM, Gustaf Neumann wrote:
> It is interesting to see, that with always 5 connections threads running and
> using jemalloc, we see a rss consumption only slightly larger than with
> plain tcl and zippy malloc having maxthreads == 2, having less requests
> queued.
>
> Similarly, with tcmalloc we see with minthreads to 5, maxthreads 10
>
> requests 2062 spools 49 queued 3 connthreads 6 rss 376
> requests 7743 spools 429 queued 359 connthreads 11 rss 466
> requests 8389 spools 451 queued 366 connthreads 12 rss 466
>
> which is even better.

Min/max threads 5/10 better than 2/10? How about 7/10? When you hit 10/10 you can delete an awful lot of code :-)
Re: [naviserver-devel] naviserver with connection thread queue
On Tue, Dec 4, 2012 at 10:55 PM, Gustaf Neumann wrote:
> The code in naviserver-connthreadqueue already handles read-aheads with
> SSL. I have removed these hacks there already; I think these were in
> part responsible for the sometimes erratic response times with SSL.

Well, I think the thing here is once-upon-a-time SSL was considered computationally expensive (I don't know if it still is, with recent Intel cpus having dedicated AES instructions etc.). Read-ahead is good because you don't want an expensive conn thread waiting around for the whole request to arrive, packet by packet. But with SSL the single driver thread will be decrypting read-ahead data for multiple sockets and may run out of cpu, stalling the request pipeline, starving the conn threads. By making the SSL driver thread non-async you lose out on read-ahead as that all happens on the conn thread, but you gain cpu resources on a multi-cpu system (all of them, today). AOLserver 4.5 added a pool of read-ahead threads, one per-socket IIRC, to keep the benefits of read-ahead while gaining cpu parallelism.

- does a single driver thread have enough computational resources to decrypt all sockets currently in read-ahead? This is going to depend on the algorithm. Might want to favour AES if you know your cpu has support.
- which is worse, losing read-ahead, or losing cpu-parallelism?
- if a read-ahead thread-pool is added, should it be one thread per-socket, which is simple, or one thread per-cpu and some kind of balancing mechanism?
Re: [naviserver-devel] naviserver with connection thread queue
Am 04.12.12 20:06, schrieb Stephen Deasey:
> - we should actually ship some code which searches for *.gz versions of
> static files

this would mean to keep a .gz version and a non-.gz version in the file system for the cases where gzip is not an accepted encoding. Not sure I would like to manage these files and keep them in sync; the fast-path cache could keep gzipped copies, invalidation is already there.

>> * Similarly, range requests are not handled when the data is not sent
>> ReturnOpen to the writer queue.
>
> The diagram shows Ns_ConnReturnData also calls ReturnRange, and hence
> the other leg of fastpath and all the main data sending routines should
> handle range requests.

this path is ok. When neither mmap nor cache is set, fastpath can call ReturnOpenFd, and ReturnOpen sends the data blindly to the writer if configured, which does not handle ranges. This needs some refactoring.

>> * there is quite some potential to simplify / orthogonalize the server's
>> infrastructure.
>> * improving this structure has nothing to do with
>> naviserver-connthreadqueue, and should happen at some time in the main tip.
>
> The writer thread was one of the last bits of code to land before things
> quietened down, and a lot of the stuff that got talked about didn't get
> implemented.

I am not complaining, just trying to understand the historical layers. Without the call-graph the current code is hard to follow.

> One thing that was mentioned was having a call-back interface where you
> submit a function to the writer thread and it runs it. This would allow
> other kinds of requests to be served async.
>
> One of the things we've been talking about with the connthread work is
> simplification. The current code, with its workarounds for stalls and
> managing thread counts, is very complicated. If it were simplified and
> genericised it could also be used for background writer threads, and SSL
> read-ahead threads (as in aolserver > 4.5). So, that's another +1 for
> keeping the conn threads simple.

The code in naviserver-connthreadqueue already handles read-aheads with SSL. I have removed these hacks there already; I think these were in part responsible for the sometimes erratic response times with SSL.

-gustaf neumann
Re: [naviserver-devel] naviserver with connection thread queue
Am 04.12.12 20:25, schrieb Stephen Deasey:
> I found this nifty site the other day:
>
> https://www.ssllabs.com/ssltest/analyze.html?d=next-scripting.org
>
> It's highlighting a few things that need fixed in the nsssl module,
> including a couple of security bugs. Looks like relatively little code
> though.

The report is already much better: now everything is green. Most of the complaints could be removed via configuration; just two issues required code changes (one requires a flag which is not available in all current OpenSSL implementations, such as the one on Mac OS X, the other required adding a callback). The security rating is now better than that of nginx.

Today, I was hunting another problem in connection with nsssl, which turns out to be a weakness of our interfaces. The source of the problem is that the buffer management of OpenSSL is not aligned with the buffer management in NaviServer. In the NaviServer driver, all receive requests are triggered via poll, when sockets are readable. With OpenSSL it is also possible that data remains as a leftover from an earlier receive when a smaller buffer is provided. During upload spooling, NaviServer requested receive operations with a 4KB buffer. OpenSSL might receive "at once" 16KB. The read operation with the small buffer will not drain the OpenSSL buffer, and later, poll() will not be triggered by the fact that the socket is readable (since the buffer is still quite full). The problem happened in NaviServer when the input was spooled (e.g. file uploads). I have doubts that this combination ever worked. I have corrected the problem by increasing the buffer variable in driver.c. The cleaner implementation would be to add an "Ns_DriverReadableProc Readable" similar to the "Ns_DriverKeepProc Keep", but that would affect the interface of all drivers.

-gustaf neumann
Re: [naviserver-devel] naviserver with connection thread queue
On Thu, Nov 29, 2012 at 6:51 PM, Gustaf Neumann wrote:
> It turned out that the large queueing time came from requests from
> Taipei, which contained several 404 errors. The size of the 404 request
> is 727 bytes, and therefore under the writersize, which was configured
> as 1000. The delivery of an error message takes to this site more than a
> second. Funny enough, the delivery of the error message blocked the
> connection thread longer than the delivery of the image when it is above
> the writersize.
>
> I will reduce the writersize further, but still a slow delivery can even
> slow down the delivery of the headers, which happens still in the
> connection thread.

This shouldn't be the case for strings, or data sent from the fast path cache, such as a small file (a custom 404), as eventually those should work their way down to Ns_ConnWriteData which will construct the headers if not already sent and pass them, along with the data payload, to writev(2). Linux should coalesce the buffers and send in a single packet, if small enough. I wonder if this is some kind of weird nsssl interaction.

(For things like sendfile without ssl we could use TCP_CORK to coalesce the headers with the body)
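[Editorial note] The coalescing Stephen describes can be sketched with writev(2): headers and body go out as two iovecs in a single system call, so the kernel can emit them in one packet when they fit. This is demonstrated on a pipe; send_response is a hypothetical helper, not Ns_ConnWriteData itself, and error handling is minimal.

```c
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hand headers and body to the kernel in one writev() call; the kernel
 * may coalesce both buffers into a single TCP segment. */
static ssize_t
send_response(int fd, const char *headers, const char *body)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)headers;
    iov[0].iov_len  = strlen(headers);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = strlen(body);
    return writev(fd, iov, 2);   /* one syscall for both buffers */
}
```

For sendfile-style deliveries, where the body never passes through user space, the TCP_CORK socket option mentioned above achieves the same effect at the kernel level.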
Re: [naviserver-devel] naviserver with connection thread queue
On Mon, Dec 3, 2012 at 10:38 AM, Gustaf Neumann wrote:
> All changes are on bitbucket (nsssl and naviserver-connthreadqueue).

I found this nifty site the other day:

https://www.ssllabs.com/ssltest/analyze.html?d=next-scripting.org

It's highlighting a few things that need fixed in the nsssl module, including a couple of security bugs. Looks like relatively little code though.

Also, there's this:

https://insouciant.org/tech/ssl-performance-case-study/

which is a pretty good explanation of things from a performance point of view.

I haven't spent much time looking at SSL. Looks like there could be some big wins. For example, some of the stuff to do with certificate chains could probably be automated - the server could spit out an informative error to the log if things look poorly optimised.
Re: [naviserver-devel] naviserver with connection thread queue
On Tue, Dec 4, 2012 at 5:21 PM, Gustaf Neumann wrote:
> * Only content sent via Ns_ConnWriteVChars has the chance to get
> compressed, i.e. dynamic content with a text/* mime-type.

The idea here was you don't want to try and compress gifs and so on, and static content could be pre-compressed on disk - at runtime simply look for a *.gz version of the content. This could be cleaned up a bit by:

- having an extendable white-list of mime-types which should be compressed: text/*, application/javascript, application/xml etc.
- we should actually ship some code which searches for *.gz versions of static files

> * Similarly, range requests are not handled when the data is not sent
> ReturnOpen to the writer queue.

The diagram shows Ns_ConnReturnData also calls ReturnRange, and hence the other leg of fastpath and all the main data sending routines should handle range requests.

> * there is quite some potential to simplify / orthogonalize the server's
> infrastructure.
> * improving this structure has nothing to do with
> naviserver-connthreadqueue, and should happen at some time in the main tip.

The writer thread was one of the last bits of code to land before things quietened down, and a lot of the stuff that got talked about didn't get implemented.

One thing that was mentioned was having a call-back interface where you submit a function to the writer thread and it runs it. This would allow other kinds of requests to be served async.

One of the things we've been talking about with the connthread work is simplification. The current code, with its workarounds for stalls and managing thread counts, is very complicated. If it were simplified and genericised it could also be used for background writer threads, and SSL read-ahead threads (as in aolserver > 4.5). So, that's another +1 for keeping the conn threads simple.
Re: [naviserver-devel] naviserver with connection thread queue
Am 29.11.12 19:51, schrieb Gustaf Neumann:
> However, i am still in the process to clean up and address
> some strange interactions (e.g. for nsssl some socket
> closing interactions between driver and connection threads
> seem too complex to me), so i am still for a while busy with that

The problem in the bad interaction was that the driver poll reported that there should be data to read from the socket, while the nsssl receive said there is none, and vice versa. Since nsssl was defined as sync, the socket processing happens in the connection threads via NsGetRequest(), which returned frequently empty, meaning a useless wakeup and ConnRun etc. in the connection thread. The reason for this was that nsssl sets the socket to non-blocking without defining the driver as ASYNC, so that it was possible that a receive returned 0 bytes. Setting the driver simply to ASYNC did not help either, since the driver used its own event loop via Ns_SockTimedWait(), which interacts badly with the driver's poll loop (on Linux it did not realize that a socket became readable, while this worked on Mac OS X). So I changed the receive operation to simply report the read state back to the main driver, which now works apparently ok (tested on Mac OS X and Linux). To be on the safe side, I have tagged the previous version of nsssl on bitbucket and incremented the version number. This change led as well to a simplification in the main driver.

The new version has been running since Saturday on next-scripting.org and shows even better results. The average queue time is lower by again a factor of 4; the standard deviation is as well much better.

         accept    queue     run      total
avg      0,0599    0,0002    0,0183   0,0196
stdev    0,3382    0,0039    0,0413   0,0530
min      0,        0,0001    0,0011   0,0012
max     12,1002    0,2324    0,8958   2,0112
median   0,0088    0,0001    0,0118   0,0120

The accept times show up now since nsssl is async, but these are often not meaningful for the performance (due to closewait, the accept might be a previous accept operation).
Now, all performance measurement happens relative to queue time (similar as in naviserver in the main tip). Note that this data is based not on a synthetic test but on a real-world web site with potentially different traffic patterns from day to day, so don't draw too many conclusions from this. By moving from sync to async, some potentially time consuming processing (waiting for more data) is moved from the connection thread to the driver's event loop, so the time span in the connection thread is reduced (for nsssl the same way as it was before for nssock). Therefore the "runtime" has less randomness and is shorter, leading to the improvements.

The behavior of the async log writer is now configurable (by default turned off); log-rolling and shutdown should now work reliably with this as well. All changes are on bitbucket (nsssl and naviserver-connthreadqueue). The server now also measures the filter time separately from the runtime, but I am just in the process of collecting data on that...

-gustaf neumann

one more update. Here is now the data of a full day with the Async-Writer thread. The avg queueing time dropped from 0,0547 to 0,0009, the standard deviation from 0,7481 to 0,0211. This is a noticeable improvement.

         accept    queue     run       total
avg      0,        0,0009    0,1304    0,1313
stdev    0,        0,0211    0,9022    0,9027
min      0,        0,        0,0032    0,0033
max      0,        0,9772   28,1699   28,1700
median   0,        0,0001    0,0208    0,0209

But still, the sometimes large values are worth investigating. It turned out that the large queueing time came from requests from Taipei, which contained several 404 errors. The size of the 404 request is 727 bytes, and therefore under the writersize, which was configured as 1000. The delivery of an error message takes to this site more than a second. Funny enough, the delivery of the error message blocked the connection thread longer than the delivery of the image when it is above the writersize.
I will reduce the writersize further, but still a slow delivery can even slow down the delivery of the headers, which happens still in the connection thread. Most likely, I won't address this now; however, I will check the effects of the fastpath cache, which was deactivated so far...

best regards
-gustaf neumann

Am 28.11.12 11:38, schrieb Gustaf Neumann:
Dear all, here is a short update of the findings and changes since last week. One can now activate for nslog "logpartialtimes", which adds an entry to the access log containing the partial request times (accept time, queuing time, and run time). The sample site runs now with minthreads == 5, maxthreads 10. If one analyzes the data of one day of our sample site, one can see that in this case the accept time is always 0 (this is caused by nsssl, which is a sync driver; header parsing happens solely in the connection thread), the queue time is on
Re: [naviserver-devel] naviserver with connection thread queue
Am 29.11.12 20:24, schrieb Jeff Rogers:
> Hi Gustaf,
>
> One quick idea on the writer thread is to, regardless of size, always
> make one write attempt in the conn thread, and if less than the
> complete buffer was written, then pass the remainder off to the writer
> thread. This would get the best of both worlds - fast/small requests
> don't incur the overhead of duplicating the buffers, while
> large/slow requests wouldn't block the conn thread.

yes, that might be interesting to investigate as well. One has to differentiate between synchronous and asynchronous drivers and between file-deliveries and string-deliveries, etc., so in the general case it might be some more work. However, I am still in the process of cleaning up and addressing some strange interactions (e.g. for nsssl, some socket closing interactions between driver and connection threads seem too complex to me), so I am still busy with that for a while. I hope that I don't create too much collateral damage and will let you know when the new stuff is stabilized...

-gustaf neumann

--
Univ.Prof. Dr. Gustaf Neumann
Institute of Information Systems and New Media
WU Vienna
Augasse 2-6, A-1090 Vienna, AUSTRIA

--
Keep yourself connected to Go Parallel: TUNE You got it built. Now make it sing. Tune shows you how. http://goparallel.sourceforge.net
___
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel
Re: [naviserver-devel] naviserver with connection thread queue
This is good engineering work. I raise my hat to that. Congrats, Maurizio

From: Gustaf Neumann [mailto:neum...@wu.ac.at]
Sent: 29 November 2012 19:51
To: naviserver-devel@lists.sourceforge.net
Subject: Re: [naviserver-devel] naviserver with connection thread queue

Dear all, one more update. Here is now the data of a full day with the Async-Writer thread. The avg queueing time dropped from 0,0547 to 0,0009, the standard deviation from 0,7481 to 0,0211. This is a noticeable improvement.

         accept    queue     run       total
avg      0,        0,0009    0,1304    0,1313
stdev    0,        0,0211    0,9022    0,9027
min      0,        0,        0,0032    0,0033
max      0,        0,9772   28,1699   28,1700
median   0,        0,0001    0,0208    0,0209

But still, the sometimes large values are worth investigating. It turned out that the large queueing time came from requests from Taipei, which contained several 404 errors. The size of the 404 request is 727 bytes, and therefore under the writersize, which was configured as 1000. The delivery of an error message takes to this site more than a second. Funny enough, the delivery of the error message blocked the connection thread longer than the delivery of the image when it is above the writersize.

I will reduce the writersize further, but still a slow delivery can even slow down the delivery of the headers, which happens still in the connection thread. Most likely, I won't address this now; however, I will check the effects of the fastpath cache, which was deactivated so far...

best regards
-gustaf neumann

Am 28.11.12 11:38, schrieb Gustaf Neumann:
Dear all, here is a short update of the findings and changes since last week. One can now activate for nslog "logpartialtimes", which adds an entry to the access log containing the partial request times (accept time, queuing time, and run time). The sample site runs now with minthreads == 5, maxthreads 10.
If one analyzes the data of one day of our sample site, one can see that in this case the accept time is always 0 (this is caused by nsssl, which is a sync driver; header parsing happens solely in the connection thread), the queue time is on average 54 ms (milliseconds), which is huge, while the median is 0.1 ms. The reason is that we can see a maximum queue time of 21 secs! As a consequence, the standard deviation is as well huge. When one looks more closely at the log file, one can see that at times when the queuing time is huge, the runtime is often huge as well, leading to cascading effects as described below. Cause and reason are hard to determine, since we saw large runtimes even on delivery of rather small files.

         accept    queue     run       total
avg      0,        0,0547    0,2656    0,3203
stdev    0,        0,7481    0,9740    1,4681
min      0,        0,        0,0134    0,0135
max      0,       21,9594   16,6905   29,1668
median   0,        0,0001    0,0329    0,0330

The causes for the "random" slow cases are at least from two sources: slow delivery (some clients are slow in retrieving data, especially some bots seem to be bad) and system-specific latencies (e.g. backup running, other servers becoming active, etc.). To address this problem, I moved more file deliveries to the writer thread (reducing writersize) and activated TCP_DEFER_ACCEPT. One sees significant improvements, the average queuing time is 4.6 ms, but still the standard deviation is huge, caused by some "erratic" large times.

         accept    queue     run       total
avg      0,        0,0046    0,1538    0,1583
stdev    0,        0,0934    0,6868    0,6967
min      0,        0,        0,0137    0,0138
max      0,        4,6041   22,4101   22,4102
median   0,        0,0001    0,0216    0,0217

There are still some huge values, and the queuing time was "lucky" not to be influenced by the still large runtimes of requests. Aside from the top values, one sees sometimes unexplained delays of a few seconds which seem to block everything. A large source of latencies is the file system.
On our production system we already invested some effort in the past to tune the file system, so we don't see effects as large as on this sample site. Since NaviServer should work well also on non-tuned file systems, I implemented an async writer thread that decouples writing to files (access.log and error.log) from the connection threads. The interface is the same as unix write(), but it does not block. I have this running since yesterday evening on the sample site, and so far it seems to help to reduce the random latencies significantly. I'll wait with posting data until I have a similar time range to compare; maybe the good times I am seeing are accidental. I have not yet looked into the details of handling the async writer within log rolling and shutdown.

On another front, I was experimenting with jemalloc and tcmalloc. Previously we had the following figures, using tcl as distributed with zippy malloc with minthreads=2, maxthreads=2

requests 10307 spools
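[Editorial note] The async writer idea above, a write()-like call that never blocks the caller, can be sketched with a mutex-protected queue drained by a single consumer thread. This is a minimal illustration with hypothetical names, not the NaviServer implementation (which additionally handles log rolling and shutdown).

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

typedef struct LogEntry {
    struct LogEntry *next;
    size_t len;
    char data[];                 /* flexible array member holding the copy */
} LogEntry;

typedef struct {
    int fd;                      /* destination, e.g. access.log */
    LogEntry *head, *tail;
    int shutdown;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    pthread_t thread;
} AsyncWriter;

/* Consumer: performs the possibly slow write(2) calls off the hot path. */
static void *
writer_main(void *arg)
{
    AsyncWriter *w = arg;

    pthread_mutex_lock(&w->lock);
    while (!w->shutdown || w->head != NULL) {
        while (w->head == NULL && !w->shutdown) {
            pthread_cond_wait(&w->cond, &w->lock);
        }
        while (w->head != NULL) {
            LogEntry *e = w->head;
            w->head = e->next;
            if (w->head == NULL) w->tail = NULL;
            pthread_mutex_unlock(&w->lock);
            (void)write(w->fd, e->data, e->len);  /* may block: fine here */
            free(e);
            pthread_mutex_lock(&w->lock);
        }
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Same shape as write(2), but never blocks the caller: copy + enqueue. */
static void
async_write(AsyncWriter *w, const char *buf, size_t len)
{
    LogEntry *e = malloc(sizeof *e + len);

    e->next = NULL;
    e->len = len;
    memcpy(e->data, buf, len);
    pthread_mutex_lock(&w->lock);
    if (w->tail != NULL) w->tail->next = e; else w->head = e;
    w->tail = e;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

static void
writer_start(AsyncWriter *w, int fd)
{
    memset(w, 0, sizeof *w);
    w->fd = fd;
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    pthread_create(&w->thread, NULL, writer_main, w);
}

/* Drains the queue before joining, so no entries are lost at shutdown. */
static void
writer_stop(AsyncWriter *w)
{
    pthread_mutex_lock(&w->lock);
    w->shutdown = 1;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->lock);
    pthread_join(w->thread, NULL);
}
```

The price of never blocking is that each entry is copied; for log lines of a few hundred bytes that is cheap compared to a file-system stall in a connection thread.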
Re: [naviserver-devel] naviserver with connection thread queue
Gustaf Neumann wrote:
> Funny enough, the delivery of the error message blocked the connection
> thread longer than the delivery of the image when it is above the
> writersize.
>
> I will reduce the writersize further, but still a slow delivery can even
> slow down the delivery of the headers, which happens still in the
> connection thread. Most likely, i won't address this now. however, i
> will check the effects of fastpath cache, which was deactivated so far...

Hi Gustaf,

This has been some really interesting bits you've been sharing. I keep meaning to add in my thoughts but I haven't had the time.

One quick idea on the writer thread is to, regardless of size, always make one write attempt in the conn thread, and if less than the complete buffer was written, then pass the remainder off to the writer thread. This would get the best of both worlds - fast/small requests don't incur the overhead of duplicating the buffers, while large/slow requests wouldn't block the conn thread.

-J
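[Editorial note] Jeff's suggestion can be sketched as follows: one non-blocking write attempt in the conn thread, and only the unsent remainder is handed to the writer thread. queue_to_writer is a hypothetical stub recording the handoff; a real implementation would copy the remainder before returning, since the caller's buffer may not outlive the call.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Stand-in for the real handoff to the background writer thread. */
static const char *queued_buf = NULL;
static size_t queued_len = 0;

static void
queue_to_writer(int fd, const char *buf, size_t len)
{
    (void)fd;
    queued_buf = buf;      /* a real handoff would copy 'buf' here */
    queued_len = len;
}

/* One write attempt on a non-blocking fd.  Returns 0 if everything was
 * sent inline, 1 if a remainder was queued for the writer thread, and
 * -1 on a real error. */
static int
send_or_queue(int fd, const char *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);

    if (n < 0) {
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            queue_to_writer(fd, buf, len);       /* socket full: queue all */
            return 1;
        }
        return -1;
    }
    if ((size_t)n < len) {
        queue_to_writer(fd, buf + n, len - (size_t)n);  /* partial: queue rest */
        return 1;
    }
    return 0;                                    /* fast path: sent inline */
}
```

Small responses that fit in the socket buffer take the fast path with no copying, which is exactly the "best of both worlds" behaviour described above.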
Re: [naviserver-devel] naviserver with connection thread queue
Dear all, one more update. Here is now the data of a full day with the async-writer thread. The avg queueing time dropped from 0.0547 to 0.0009, the standard deviation from 0.7481 to 0.0211. This is a noticeable improvement.

           accept    queue      run    total
  avg      0.0000   0.0009   0.1304   0.1313
  stdev    0.0000   0.0211   0.9022   0.9027
  min      0.0000   0.0000   0.0032   0.0033
  max      0.0000   0.9772  28.1699  28.1700
  median   0.0000   0.0001   0.0208   0.0209

But still, the occasional large values are worth investigating. It turned out that the large queueing times came from requests from Taipei which contained several 404 errors. The size of the 404 response is 727 bytes, and therefore under the writersize, which was configured as 1000. Delivering an error message to this site takes more than a second. Funny enough, the delivery of the error message blocked the connection thread longer than the delivery of the image when it is above the writersize. I will reduce the writersize further, but still a slow delivery can even slow down the delivery of the headers, which still happens in the connection thread. Most likely, i won't address this now. However, i will check the effects of the fastpath cache, which was deactivated so far... best regards -gustaf neumann On 28.11.12 11:38, Gustaf Neumann wrote: Dear all, here is a short update of the findings and changes since last week. One can now activate for nslog "logpartialtimes", which adds an entry to the access log containing the partial request times (accept time, queuing time, and run time). The sample site runs now with minthreads == 5, maxthreads == 10. If one analyzes the data of one day of our sample site, one can see that in this case the accept time is always 0 (this is caused by nsssl, which is a sync driver; header parsing happens solely in the connection thread), and the queue time is on average 54 ms (milliseconds), which is huge, while the median is 0.1 ms. The reason is that we see a maximum queue time of 21 secs! As a consequence, the standard deviation is huge as well. 
When one looks more closely at the log file, one can see that at times when the queuing time is huge, the runtime is often huge as well, leading to cascading effects as described below. Cause and reason are hard to determine, since we saw large runtimes even on delivery of rather small files.

           accept    queue      run    total
  avg      0.0000   0.0547   0.2656   0.3203
  stdev    0.0000   0.7481   0.9740   1.4681
  min      0.0000   0.0000   0.0134   0.0135
  max      0.0000  21.9594  16.6905  29.1668
  median   0.0000   0.0001   0.0329   0.0330

The causes for the "random" slow cases come from at least two sources: slow delivery (some clients are slow in retrieving data; especially some bots seem to be bad) and system-specific latencies (e.g. a backup running, other servers becoming active, etc.). To address this problem, i moved more file deliveries to the writer thread (reducing writersize) and activated TCP_DEFER_ACCEPT. One sees significant improvements; the average queuing time is 4.6 ms, but the standard deviation is still huge, caused by some "erratic" large times.

           accept    queue      run    total
  avg      0.0000   0.0046   0.1538   0.1583
  stdev    0.0000   0.0934   0.6868   0.6967
  min      0.0000   0.0000   0.0137   0.0138
  max      0.0000   4.6041  22.4101  22.4102
  median   0.0000   0.0001   0.0216   0.0217

There are still some huge values, and the queuing time was "lucky" not to be influenced by the still-large runtimes of requests. Aside from the top values, one sometimes sees unexplained delays of a few seconds, which seem to block everything. A large source of latency is the file system. On our production system we already invested some effort in tuning the file system, so we don't see effects as large as on this sample site. Since NaviServer should also work well on non-tuned file systems, i implemented an async writer thread that decouples writing to files (access.log and error.log) from the connection threads. The interface is the same as unix write(), but it does not block. 
I have had this running since yesterday evening on the sample site, and so far it seems to reduce the random latencies significantly. i'll wait with posting data until i have a similar time range to compare; maybe the good times i am seeing are accidental. I have not yet looked into the details of handling the async writer during log rotation and shutdown. On another front, i was experimenting with jemalloc and tcmalloc. Previously we had the following figures, using tcl as distributed (zippy malloc) with minthreads=2, maxthreads=2:

  requests 10307  spools 1280  queued 2704  connthreads 11  rss 425

When changing minthreads to 5, maxthreads to 10, but using jemalloc, one sees the following figures:

  requests  3933  spools  188  queued  67  connthreads  7  rss 488
  requests  6325  spools  256  queued  90  connthreads  9  rss 528
  requests 10021  spools  378  queued 114  connthreads 14  rss 530
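The summary rows in the tables above (avg, stdev, min, max, median) are easy to recompute from per-request times taken out of the access log; a small helper of the assumed shape (illustrative only, not part of nslog):

```python
import statistics

def summarize(times):
    """Summary statistics for a list of per-request times in seconds,
    in the same shape as the tables posted in this thread."""
    return {
        "avg": statistics.fmean(times),
        "stdev": statistics.stdev(times),
        "min": min(times),
        "max": max(times),
        "median": statistics.median(times),
    }
```

Printing the median alongside the average makes the skew visible immediately: a handful of multi-second outliers drags the average far above the median.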
Re: [naviserver-devel] naviserver with connection thread queue
Dear all, here is a short update of the findings and changes since last week. One can now activate for nslog "logpartialtimes", which adds an entry to the access log containing the partial request times (accept time, queuing time, and run time). The sample site runs now with minthreads == 5, maxthreads == 10. If one analyzes the data of one day of our sample site, one can see that in this case the accept time is always 0 (this is caused by nsssl, which is a sync driver; header parsing happens solely in the connection thread), and the queue time is on average 54 ms (milliseconds), which is huge, while the median is 0.1 ms. The reason is that we see a maximum queue time of 21 secs! As a consequence, the standard deviation is huge as well. When one looks more closely at the log file, one can see that at times when the queuing time is huge, the runtime is often huge as well, leading to cascading effects as described below. Cause and reason are hard to determine, since we saw large runtimes even on delivery of rather small files.

           accept    queue      run    total
  avg      0.0000   0.0547   0.2656   0.3203
  stdev    0.0000   0.7481   0.9740   1.4681
  min      0.0000   0.0000   0.0134   0.0135
  max      0.0000  21.9594  16.6905  29.1668
  median   0.0000   0.0001   0.0329   0.0330

The causes for the "random" slow cases come from at least two sources: slow delivery (some clients are slow in retrieving data; especially some bots seem to be bad) and system-specific latencies (e.g. a backup running, other servers becoming active, etc.). To address this problem, i moved more file deliveries to the writer thread (reducing writersize) and activated TCP_DEFER_ACCEPT. One sees significant improvements; the average queuing time is 4.6 ms, but the standard deviation is still huge, caused by some "erratic" large times. 
           accept    queue      run    total
  avg      0.0000   0.0046   0.1538   0.1583
  stdev    0.0000   0.0934   0.6868   0.6967
  min      0.0000   0.0000   0.0137   0.0138
  max      0.0000   4.6041  22.4101  22.4102
  median   0.0000   0.0001   0.0216   0.0217

There are still some huge values, and the queuing time was "lucky" not to be influenced by the still-large runtimes of requests. Aside from the top values, one sometimes sees unexplained delays of a few seconds, which seem to block everything. A large source of latency is the file system. On our production system we already invested some effort in tuning the file system, so we don't see effects as large as on this sample site. Since NaviServer should also work well on non-tuned file systems, i implemented an async writer thread that decouples writing to files (access.log and error.log) from the connection threads. The interface is the same as unix write(), but it does not block. I have had this running since yesterday evening on the sample site, and so far it seems to reduce the random latencies significantly. i'll wait with posting data until i have a similar time range to compare; maybe the good times i am seeing are accidental. I have not yet looked into the details of handling the async writer during log rotation and shutdown. On another front, i was experimenting with jemalloc and tcmalloc. Previously we had the following figures, using tcl as distributed (zippy malloc) with minthreads=2, maxthreads=2:

  requests 10307  spools 1280  queued 2704  connthreads 11  rss 425

When changing minthreads to 5, maxthreads to 10, but using jemalloc, one sees the following figures:

  requests  3933  spools  188  queued  67  connthreads  7  rss 488
  requests  6325  spools  256  queued  90  connthreads  9  rss 528
  requests 10021  spools  378  queued 114  connthreads 14  rss 530

It is interesting to see that with 5 connection threads always running and using jemalloc, we see an rss consumption only slightly larger than with plain tcl and zippy malloc with maxthreads == 2, while fewer requests are queued. 
Similarly, with tcmalloc and minthreads 5, maxthreads 10, we see

  requests 2062  spools  49  queued   3  connthreads  6  rss 376
  requests 7743  spools 429  queued 359  connthreads 11  rss 466
  requests 8389  spools 451  queued 366  connthreads 12  rss 466

which is even better. For more information on malloc tests see https://next-scripting.org/xowiki/docs/misc/thread-mallocs or the tcl-core mailing list. That's all for now -gustaf neumann On 20.11.12 20:07, Gustaf Neumann wrote: Dear all, The idea of controlling the number of running threads via queuing latency is interesting, but i have to look into the details before i can comment on this. Before one can consider controlling the number of running threads via queuing latency, one has to improve the awareness in NaviServer about the various phases in a request's lifetime. In the experimental version, we have now the following time stamps recorded:
- acceptTime (the time a socket was accepted)
- requestQueueTime (the time the request was queued; was startTime)
- requestDequeueTime
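The timestamps quoted here, and the metrics derived from them in the full message, can be written down compactly (an illustrative Python model; the field names follow the thread's terminology and are not an actual NaviServer structure):

```python
from dataclasses import dataclass

@dataclass
class ConnTimes:
    accept: float     # socket accepted (acceptTime)
    queued: float     # request placed on the queue (requestQueueTime)
    dequeued: float   # conn thread picked it up (requestDequeueTime)
    done: float       # delivered, or handed to the writer thread (endTime)

    @property
    def queue_time(self):   # time spent waiting for a conn thread
        return self.dequeued - self.queued

    @property
    def run_time(self):     # pure runtime in the conn thread
        return self.done - self.dequeued

    @property
    def total_time(self):   # what the client experiences
        return self.done - self.accept
```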
Re: [naviserver-devel] naviserver with connection thread queue
Dear all, The idea of controlling the number of running threads via queuing latency is interesting, but i have to look into the details before i can comment on this. Before one can consider controlling the number of running threads via queuing latency, one has to improve the awareness in NaviServer about the various phases in a request's lifetime. In the experimental version, we have now the following time stamps recorded:
- acceptTime (the time a socket was accepted)
- requestQueueTime (the time the request was queued; was startTime)
- requestDequeueTime (the time the request was dequeued)
The difference between requestQueueTime and acceptTime is the setup cost and depends on the amount of work the driver does. For instance, nssock of naviserver performs read-ahead, while nsssl does not and passes the connection on right away. So the previously used startTime (which is actually the time the request was queued) was not correct for drivers with read-ahead. In the experimental version, [ns_conn start] now always returns the accept time. The next paragraph uses the term endTime, which is the time when a connection thread is done with a request (either the content was delivered, or the content was handed over to a writer thread). The difference between requestDequeueTime and requestQueueTime is the time spent in the queue. The difference between endTime and requestDequeueTime is the pure runtime; the difference between endTime and acceptTime is the totalTime. As a rough approximation, the time between requestDequeueTime and acceptTime is pretty much influenced by the server setup, and the runtime by the application. I used the term "approximation" since the runtime of certain other requests influences the queuing time, as we see in the following: Consider a server with two running connection threads receiving 6 requests, where requests 2-5 are received in a very short time. The first three requests are directly assigned to connection threads and have fromQueue == 0. 
These have queuing times between 88 and 110 microseconds, which includes signal sending/receiving, the thread change, and initial setup in the connection thread. The runtimes for these requests are pretty bad, in the range of 0.24 to 3.8 seconds elapsed time.

  [1] waiting 0 current 2 idle 1 ncons 999 fromQueue 0 accept 0.000000 queue 0.000110 run 0.637781 total 0.637891
  [2] waiting 3 current 2 idle 0 ncons 998 fromQueue 0 accept 0.000000 queue 0.000090 run 0.245030 total 0.245120
  [3] waiting 2 current 2 idle 0 ncons 987 fromQueue 0 accept 0.000000 queue 0.000088 run 0.432421 total 0.432509
  [4] waiting 1 current 2 idle 0 ncons 997 fromQueue 1 accept 0.000000 queue 0.244246 run 0.249208 total 0.493454
  [5] waiting 0 current 2 idle 0 ncons 986 fromQueue 1 accept 0.000000 queue 0.431545 run 3.713331 total 4.144876
  [6] waiting 0 current 2 idle 0 ncons 996 fromQueue 1 accept 0.000000 queue 0.480382 run 3.799818 total 4.280200

Requests [4, 5, 6] were queued and have queuing times between 0.2 and 0.5 seconds. These queuing times are pretty much the runtimes of [2, 3, 4]; the runtime therefore determines the queuing time. For example, the totalTime of request [4] was 0.493454 secs; half of that time it was waiting in the queue. Request [4] can consider itself happy that it was not scheduled after [5] or [6], where its totalTime would likely have been in the range of 4 secs (10 times slower). Low waiting times are essential for good performance. This example shows pretty well the importance of async delivery mechanisms like the writer thread or bgdelivery in OpenACS. A file being delivered by the connection thread over a slow internet connection might block later requests substantially (as in the cases above). This is even more important for today's web sites, where a single view might entail 60+ embedded requests for js, css, and images, and where it is not feasible to define hundreds of connection threads. Before going further into detail i'll provide further introspection mechanisms in the experimental version: 
- [ns_server stats] ... adds the total waiting time
- [ns_conn queuetime] ... time spent in the queue
- [ns_conn dequeue] ... time stamp when the request actually starts to run (similar to [ns_conn start])
The first can be used for server monitoring, the other two for single-connection introspection. The queuetime can be useful for better awareness and for optional output in the access log, and the dequeue time stamp for application-level profiling, as the base for a difference against the current time. Further wishes, suggestions, comments? -gustaf neumann -- Monitor your physical, virtual and cloud infrastructure from a single web console. Get in-depth insight into apps, servers, databases, vmware, SAP, cloud infrastructure, etc. Download 30-day Free Trial. Pricing starts from $795 for 25 servers or applications! http://p.sf.net/sfu/zoho_dev2dev_nov
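The cascading effect in the six-request example above — the queuing times of [4, 5, 6] being essentially the runtimes of [2, 3, 4] — is easy to reproduce with a toy model (all requests arriving at once, FIFO queue, fixed thread pool; purely illustrative, not server code):

```python
import heapq

def simulate(run_times, nthreads):
    """Return (queue_time, total_time) per request for a FIFO queue
    served by 'nthreads' conn threads, all requests arriving at t=0."""
    free = [0.0] * nthreads            # when each thread is next available
    heapq.heapify(free)
    results = []
    for rt in run_times:
        start = heapq.heappop(free)    # earliest-free thread takes the request
        heapq.heappush(free, start + rt)
        results.append((start, start + rt))
    return results
```

With run times resembling the six requests above and two threads, a later request's wait is exactly the runtime of an earlier one; being scheduled behind a 3.7-second delivery instead would multiply its total time several-fold.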
Re: [naviserver-devel] naviserver with connection thread queue
On 18.11.12 20:34, Stephen Deasey wrote: On Sun, Nov 18, 2012 at 1:22 PM, Gustaf Neumann wrote: Here are some actual figures with a comparable number of requests: with minthreads==maxthreads==2 requests 10182 queued 2695 connthreads 11 cpu 00:05:27 rss 415 below are the previous values, completed by the number of queuing operations and the rss size in MB with minthreads=2, create when queue >= 2 requests 10104 queued 1584 connthreads 27 cpu 00:06:14 rss 466 as anticipated, thread creations and cpu consumption went down, but the number of queued requests (requests that could not be executed immediately) increased significantly. I was thinking of the opposite: make min/max threads equal by increasing min threads. Requests would never stall in the queue, unlike the experiment you ran with max threads reduced to min threads. On the site we have maxthreads 10, so setting minthreads to 10 as well has the consequence of a larger memsize (and queued substantially reduced). But there's another benefit: unlike the dynamic scenario, requests would also never stall in the queue when a new thread had to be started when min < max threads. You are talking about naviserver before 4.99.4. Both the tip naviserver repository and the forked version already provide warmed-up threads. The version on the main tip starts to listen to the wakeup signals only once it is warmed up; the version in the fork adds a thread to the conn queue only after its startup is complete. So in both cases there is no stall. In earlier versions this was as you described. What is the down side to increasing min threads up to max threads? Higher memory consumption, maybe more open database connections, allocating resources which are not needed. The degree of wastefulness certainly depends on maxthreads. 
I would assume that for an admin carefully watching the server's needs, setting minthreads==maxthreads to the "right value" can lead to slight improvements as long as the load is rather constant over time. Maybe the most significant benefit of a low maxthreads value is the reduced memory consumption. On this machine we are using plain Tcl with its "zippy malloc", which does not release memory (once allocated to its pool) back to the OS. So the measured memsize depends on the max number of threads with tcl interps, especially with large blueprints (as in the case of OpenACS). Right: the max number of threads *ever*, not just currently. So by killing threads you don't reduce memory usage, but you do increase latency for some requests which have to wait for a thread+interp to be created. Not really, with the warm-up feature. Is it convenient to measure latency distribution (not just average)? I guess not: we record conn.startTime when a connection is taken out of the queue and passed to a conn thread, but we don't record the time when a socket was accepted. We could record the socket accept time and measure the difference until the start of the connection runtime; when we output this to the access log (like logreqtime) we could run whatever statistics we want. Actually, managing request latency is another area we don't handle so well. You can influence it by adjusting the OS listen socket accept queue length, you can adjust the length of the naviserver queue, and with the proposed change here you can change how aggressively new threads are created to process requests in the queue. But queue-depth is a roundabout way of specifying milliseconds of latency. And not just roundabout but inherently imprecise, as different URLs are going to require different amounts of time to complete, and which URLs are requested is a function of current traffic. 
If instead of queue size you could specify a target latency, then we could maybe do smarter things with the queue, such as pulling requests off the back of the queue which have been waiting longer than the target latency, making room for fresh requests at the front of the queue. The idea of controlling the number of running threads via queuing latency is interesting, but i have to look into the details before i can comment on this. -gustaf
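The target-latency idea — shedding requests that have already waited too long to make room for fresh ones — could look roughly like this (a sketch only; TARGET_LATENCY and the deque layout are assumptions, nothing like this exists in NaviServer):

```python
import time
from collections import deque

TARGET_LATENCY = 0.5    # seconds; hypothetical config value

def enqueue(q, conn, now=None):
    """q holds (enqueue_time, conn) pairs, oldest at the left.  Before
    queuing a fresh request, drop requests that have already waited
    longer than the target latency; return the dropped ones so the
    caller can answer them with an error."""
    if now is None:
        now = time.monotonic()
    dropped = []
    while q and now - q[0][0] > TARGET_LATENCY:
        dropped.append(q.popleft()[1])
    q.append((now, conn))
    return dropped
```

The design choice here is that a request which has already blown its latency budget is worth less than a fresh one, so the queue sheds from the old end instead of rejecting new arrivals.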
Re: [naviserver-devel] naviserver with connection thread queue
On Sun, Nov 18, 2012 at 1:22 PM, Gustaf Neumann wrote: > On 14.11.12 09:51, Gustaf Neumann wrote: > > On 13.11.12 15:02, Stephen Deasey wrote: > > On Tue, Nov 13, 2012 at 11:18 AM, Gustaf Neumann wrote: > > minthreads = 2 > > creating threads, when idle == 0 > 10468 requests, connthreads 267 > total cputime 00:10:32 > > creating threads, when queue >= 5 > requests 10104 connthreads 27 > total cputime 00:06:14 > > What if you set minthreads == maxthreads? > > The number of thread create operations will go further down. > > Here are some actual figures with a comparable number of requests: > > with minthreads==maxthreads==2 >requests 10182 queued 2695 connthreads 11 cpu 00:05:27 rss 415 > > below are the previous values, completed by the number of queuing operations > and the rss size in MB > > with minthreads=2, create when queue >= 2 >requests 10104 queued 1584 connthreads 27 cpu 00:06:14 rss 466 > > as anticipated, thread creations and cpu consumption went down, but the > number of queued requests (requests that could not be executed immediately) > increased significantly. I was thinking of the opposite: make min/max threads equal by increasing min threads. Requests would never stall in the queue, unlike the experiment you ran with max threads reduced to min threads. But there's another benefit: unlike the dynamic scenario requests would also never stall in the queue when a new thread had to be started when min < max threads. What is the down side to increasing min threads up to max threads? > Maybe the most significant benefit of a low maxthreads value is the reduced > memory consumption. On this machine we are using plain Tcl with its "zippy > malloc", which does not release memory (once allocated to its pool) back to > the OS. So, the measured memsize depends on the max number of threads with > tcl interps, especially with large blueprints (as in the case of OpenACS). Right: the max number of threads *ever*, not just currently. 
So by killing threads you don't reduce memory usage, but you do increase latency for some requests which have to wait for a thread+interp to be created. Is it convenient to measure latency distribution (not just average)? I guess not: we record conn.startTime when a connection is taken out of the queue and passed to a conn thread, but we don't record the time when a socket was accepted. Actually, managing request latency is another area we don't handle so well. You can influence it by adjusting the OS listen socket accept queue length, you can adjust the length of the naviserver queue, and with the proposed change here you can change how aggressively new threads are created to process requests in the queue. But queue-depth is a roundabout way of specifying milliseconds of latency. And not just roundabout but inherently imprecise, as different URLs are going to require different amounts of time to complete, and which URLs are requested is a function of current traffic. If instead of queue size you could specify a target latency then we could maybe do smarter things with the queue, such as pull requests off the back of the queue which have been waiting longer than the target latency, making room for fresh requests on the front of the queue.
Re: [naviserver-devel] naviserver with connection thread queue
On 14.11.12 09:51, Gustaf Neumann wrote: On 13.11.12 15:02, Stephen Deasey wrote: On Tue, Nov 13, 2012 at 11:18 AM, Gustaf Neumann wrote: minthreads = 2 creating threads, when idle == 0 10468 requests, connthreads 267 total cputime 00:10:32 creating threads, when queue >= 5 requests 10104 connthreads 27 total cputime 00:06:14 What if you set minthreads == maxthreads? The number of thread create operations will go further down. Here are some actual figures with a comparable number of requests: with minthreads==maxthreads==2 requests 10182 queued 2695 connthreads 11 cpu 00:05:27 rss 415 below are the previous values, completed by the number of queuing operations and the rss size in MB with minthreads=2, create when queue >= 2 requests 10104 queued 1584 connthreads 27 cpu 00:06:14 rss 466 as anticipated, thread creations and cpu consumption went down, but the number of queued requests (requests that could not be executed immediately) increased significantly. Maybe the most significant benefit of a low maxthreads value is the reduced memory consumption. On this machine we are using plain Tcl with its "zippy malloc", which does not release memory (once allocated to its pool) back to the OS. So the measured memsize depends on the max number of threads with tcl interps, especially with large blueprints (as in the case of OpenACS). This situation can be improved with e.g. jemalloc (which we are using in production; it requires a modified tcl), but after about 2 or 3 days of running a server the rss sizes are very similar (most likely due to fragmentation). -gustaf neumann When running already at minthreads, the connection thread timeout is ignored (otherwise there would be a high number of thread create operations just after the timeout expires, to ensure minthreads running connection threads). With connsperthread == 1000, there will be about 10 thread create operations for 10,000 requests (not counting the 2 initial create operations during startup for minthreads == 2). 
So, the cpu consumption will be lower, but the server would not scale when the request frequency requires more connection threads. Furthermore, most likely more requests will be put into the queue instead of being served immediately. If we assume that with minthreads == maxthreads == 2 there won't be more than, say, 20 requests queued, a similar effect could be achieved by allowing additional thread creation when more than 20 requests are in the queue. Or, even more conservatively, allowing thread creation only when the request queue is completely full (setting the low water mark to 100%) would still be better than minthreads == maxthreads, since the server will at least start to create additional threads in this rather hopeless situation, whereas with minthreads == maxthreads it won't.
Re: [naviserver-devel] naviserver with connection thread queue
On 13.11.12 15:02, Stephen Deasey wrote: > On Tue, Nov 13, 2012 at 11:18 AM, Gustaf Neumann wrote: >> minthreads = 2 >> >> creating threads, when idle == 0 >> 10468 requests, connthreads 267 >> total cputime 00:10:32 >> >> creating threads, when queue >= 5 >> requests 10104 connthreads 27 >> total cputime 00:06:14 > What if you set minthreads == maxthreads? The number of thread create operations will go further down. When running already at minthreads, the connection thread timeout is ignored (otherwise there would be a high number of thread create operations just after the timeout expires, to ensure minthreads running connection threads). With connsperthread == 1000, there will be about 10 thread create operations for 10,000 requests (not counting the 2 initial create operations during startup for minthreads == 2). So, the cpu consumption will be lower, but the server would not scale when the request frequency requires more connection threads. Furthermore, most likely more requests will be put into the queue instead of being served immediately. If we assume that with minthreads == maxthreads == 2 there won't be more than, say, 20 requests queued, a similar effect could be achieved by allowing additional thread creation when more than 20 requests are in the queue. Or, even more conservatively, allowing thread creation only when the request queue is completely full (setting the low water mark to 100%) would still be better than minthreads == maxthreads, since the server will at least start to create additional threads in this rather hopeless situation, whereas with minthreads == maxthreads it won't.
Re: [naviserver-devel] naviserver with connection thread queue
On Tue, Nov 13, 2012 at 11:18 AM, Gustaf Neumann wrote: > > minthreads = 2 > > creating threads, when idle == 0 >10468 requests, connthreads 267 >total cputime 00:10:32 > > creating threads, when queue >= 5 >requests 10104 connthreads 27 >total cputime 00:06:14 What if you set minthreads == maxthreads?
Re: [naviserver-devel] naviserver with connection thread queue
Dear all, again some update: The mechanism sketched below now works in the regression test as well. There is now a backchannel in place that lets conn threads notify the driver to check the liveness of the server. This backchannel also makes the timeout-based liveness checking obsolete. By using the lowwatermark parameter to control thread creation, the resource consumption went down significantly without sacrificing speed for this setup. Here is some data from next-scripting.org, which is a rather idle site with real-world traffic (including bots etc.). The server has minthreads = 2 defined, is running 2 drivers (nssock + nsssl), and uses a writer thread. before (creating threads when idle == 0, running the server for 2 days): 10468 requests, connthreads 267, total cputime 00:10:32 new (creating threads when queue >= 5, running the server for 2 days): requests 10104, connthreads 27, total cputime 00:06:14 One can see that the number of create operations for connection threads went down by a factor of 10 (from 267 to 27), and that the cpu consumption was reduced by about 40% (thread initialization costs 0.64 secs in this configuration). One can get a behavior similar to idle==0 by setting the low water mark to 0. The shutdown mechanism is now adjusted to the new infrastructure (connection threads have their own condition variable, so one cannot use the old broadcast to all conn threads anymore). -gustaf neumann On 07.11.12 02:54, Gustaf Neumann wrote: > Some update: after some more testing with the new code, i still think > the version is promising, but it needs a few tweaks. I have started to > address the thread creation. 
> > To sum up the thread creation behavior/configuration of naviserver-tip: > > - minthreads (try to keep at least minthreads threads idle) > - spread (fight against thread mass extinction due to round robin) > - threadtimeout (useless due to round robin) > - connsperthread (the only parameter effectively controlling the > lifespan of a conn thread) > - maxconnections (controls the maximum number of connections in the > waiting queue, including running threads) > - concurrentcreatethreshold (percentage of the waiting queue that must be > full before threads are created concurrently) > > Due to the policy of keeping at least minthreads threads idle, threads are > preallocated when the load is high, and the number of threads never falls > below minthreads by construction. Threads stop mostly due to connsperthread. > > NaviServer with thread queue (fork): > > - minthreads (try to keep at least minthreads threads idle) > - threadtimeout (works effectively, default 120 secs) > - connsperthread (as before, just not varied via spread) > - maxconnections (as before; maybe use "queuesize" instead) > - lowwatermark (new) > - highwatermark (was concurrentcreatethreshold) > > The parameter "spread" is already deleted, since the enqueueing takes > care of a certain distribution, at least when several threads are > created. Threads are often deleted before reaching connsperthread, due to the > timeout. Experiments show furthermore that the rather aggressive > preallocation policy with minthreads idle threads now causes many more > thread destroy and thread create operations than before. With > OpenACS, thread creation is compute-intensive (about 1 sec). > > In the experimental version, connections are only queued when no > connection thread is available (the tip version places every > connection into the queue). Queueing happens with "bulky" requests, > when e.g. a view causes a bunch (on average 5, often 10+, sometimes > 50+) of requests for embedded resources (style files, javascript, > images). 
> It seems that permitting a few queued requests is often a
> good idea, since the connection threads typically pick these up
> very quickly.
>
> To make the aggressiveness of the thread creation policy better
> configurable, the experimental version bases thread creation solely
> on the number of queued requests, controlled by two parameters:
>
> - lowwatermark (if the actual queue size is below this value, don't
>   try to create threads; default 5%)
> - highwatermark (if the actual queue size is above this value, allow
>   parallel thread creates; default 80%)
>
> To increase the aggressiveness, one could set lowwatermark to e.g. 0,
> causing thread creates whenever a connection is queued. Increasing
> the lowwatermark reduces the willingness to create new threads. The
> highwatermark might be useful for benchmark situations, where the
> queue fills up quickly.
>
> The default values seem to work quite well; they are currently used on
> http://next-scripting.org. However, we still need some more experiments
> on different sites to get a better understanding.
>
> hmm, final comment: for the regression test, I had to add the policy of
> creating threads when all connection threads are busy. The config file
> of the regression test uses connsperthread 0 (which is the default, but
> not very good as such) ...
Re: [naviserver-devel] naviserver with connection thread queue
Some update: after some more testing with the new code, I still think the version is promising, but it needs a few tweaks. I have started to address the thread creation.

To sum up the thread creation behavior/configuration of naviserver-tip:

- minthreads (try to keep at least minthreads threads idle)
- spread (fight against thread mass extinction due to round robin)
- threadtimeout (useless due to round robin)
- connsperthread (the only parameter effectively controlling the lifespan of a conn thread)
- maxconnections (controls the maximum number of connections in the waiting queue, including running threads)
- concurrentcreatethreshold (percentage of the waiting queue full at which to create threads concurrently)

Due to the policy of keeping at least minthreads threads idle, threads are preallocated when the load is high, and the number of threads never falls below minthreads by construction. Threads stop mostly due to connsperthread.

NaviServer with thread queue (fork):

- minthreads (try to keep at least minthreads threads idle)
- threadtimeout (works effectively, default 120 secs)
- connsperthread (as before, just not varied via spread)
- maxconnections (as before; maybe use "queuesize" instead)
- lowwatermark (new)
- highwatermark (was concurrentcreatethreshold)

The parameter "spread" is already deleted, since the enqueueing takes care of a certain distribution, at least when several threads are created. Threads are often deleted before connsperthread due to the timeout. Experiments furthermore show that the rather aggressive preallocation policy with minthreads idle threads now causes many more thread destroy and thread create operations than before. With OpenACS, thread creation is compute-intensive (about 1 sec).

In the experimental version, connections are only queued when no connection thread is available (the tip version places every connection into the queue). Queueing happens with "bulky" requests, when e.g.
a view causes a bunch (on average 5, often 10+, sometimes 50+) of requests for embedded resources (style files, javascript, images). It seems that permitting a few queued requests is often a good idea, since the connection threads typically pick these up very quickly.

To make the aggressiveness of the thread creation policy better configurable, the experimental version bases thread creation solely on the number of queued requests, controlled by two parameters:

- lowwatermark (if the actual queue size is below this value, don't try to create threads; default 5%)
- highwatermark (if the actual queue size is above this value, allow parallel thread creates; default 80%)

To increase the aggressiveness, one could set lowwatermark to e.g. 0, causing thread creates whenever a connection is queued. Increasing the lowwatermark reduces the willingness to create new threads. The highwatermark might be useful for benchmark situations, where the queue fills up quickly.

The default values seem to work quite well; they are currently used on http://next-scripting.org. However, we still need some more experiments on different sites to get a better understanding.

hmm, final comment: for the regression test, I had to add the policy of creating threads when all connection threads are busy. The config file of the regression test uses connsperthread 0 (which is the default, but not very good as such), causing every connection thread to exit after every request. So it can happen that when a request comes in, a thread is busy but nothing is queued, and there would seem to be no need to create a new thread. However, when that conn thread exits, the single queued request would not be processed. So, much more testing is needed.

-gustaf neumann

On 01.11.12 20:17, Gustaf Neumann wrote:
> Dear all,
>
> There is now a version on bitbucket, which works quite nicely
> and stably, as far as I can tell.
> I have split up the rather
> coarse lock of all pools and introduced finer locks for the
> waiting queue (wqueue) and the thread queue (tqueue) per pool.
> The changes lead to significantly finer lock granularity and
> improve scalability.
>
> I have tested this new version with a synthetic load of 120
> requests per second, some slower requests and some faster
> ones, and it appears to be pretty stable. This load keeps
> about 20 connection threads quite busy on my home machine.
> The contention on the new locks is very low: in this test
> we saw 12 busy locks out of 217,000 locks on the waiting queue,
> and 9 busy locks out of 83,000 locks on the thread queue.
> These measures are much better than in current NaviServer,
> which, on the same test, has 248,000 locks on the queue with
> 190 busy ones. The total waiting time for locks is reduced
> by a factor of 10. One has to add that it was not so bad
> before either. The benefit will be larger when multiple
> pools are used.
>
> Finally, I think the code is clearer than before, where the
> lock duration was quite tricky to determine.
[naviserver-devel] naviserver with connection thread queue
Dear all,

There is now a version on bitbucket, which works quite nicely and stably, as far as I can tell. I have split up the rather coarse lock of all pools and introduced finer locks for the waiting queue (wqueue) and the thread queue (tqueue) per pool. The changes lead to significantly finer lock granularity and improve scalability.

I have tested this new version with a synthetic load of 120 requests per second, some slower requests and some faster ones, and it appears to be pretty stable. This load keeps about 20 connection threads quite busy on my home machine. The contention on the new locks is very low: in this test we saw 12 busy locks out of 217,000 locks on the waiting queue, and 9 busy locks out of 83,000 locks on the thread queue. These measures are much better than in current NaviServer, which, on the same test, has 248,000 locks on the queue with 190 busy ones. The total waiting time for locks is reduced by a factor of 10. One has to add that it was not so bad before either. The benefit will be larger when multiple pools are used.

Finally, I think the code is clearer than before, where the lock duration was quite tricky to determine.

opinions?

-gustaf neumann

PS: For the changes, see:
https://bitbucket.org/gustafn/naviserver-connthreadqueue/changesets

PS2: I have not addressed the server exit signaling yet.

On 29.10.12 13:41, Gustaf Neumann wrote:
> A version of this is in the following fork:
>
> https://bitbucket.org/gustafn/naviserver-connthreadqueue/changesets
>
> So far, the contention on the pool mutex is quite high, but
> I think it can be improved. Currently the pool mutex is used
> primarily for conn thread life-cycle management, and it
> is needed from the main/drivers/spoolers as well as from the
> connection threads to update the idle/running/... counters
> needed for controlling thread creation etc. Differentiating
> these mutexes should help.
>
> I have not addressed the termination signaling, but that's
> rather simple.
>
> -gustaf neumann
>
> On 28.10.12 03:08, Gustaf Neumann wrote:
>> I've just implemented a lightweight version of the above (just
>> a few lines of code) by extending the connThread Arg
>> structure;