Re: [naviserver-devel] naviserver with connection thread queue
This is good engineering work. I raise my hat to that. Congrats,

Maurizio
Re: [naviserver-devel] naviserver with connection thread queue
Gustaf Neumann wrote:
> Funny enough, the delivery of the error message blocked the connection
> thread longer than the delivery of the image when it is above the
> writersize.
>
> I will reduce the writersize further, but a slow delivery can still
> slow down the delivery of the headers, which still happens in the
> connection thread. Most likely, I won't address this now. However, I
> will check the effects of the fastpath cache, which was deactivated so far...

Hi Gustaf,

These are some really interesting bits you've been sharing. I keep meaning to add my thoughts, but I haven't had the time.

One quick idea on the writer thread: regardless of size, always make one write attempt in the conn thread, and if less than the complete buffer was written, pass the remainder off to the writer thread. This would get the best of both worlds: fast/small requests don't incur the overhead of duplicating the buffers, while large/slow requests wouldn't block the conn thread.

-J
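To make the proposal concrete, here is a minimal sketch in C under stated assumptions: send() with MSG_DONTWAIT stands in for the driver's socket I/O, and QueueToWriterThread() is a hypothetical handoff, not NaviServer's actual API.

/*
 * Sketch of the idea above, not NaviServer's actual API: make one
 * non-blocking write attempt in the connection thread and hand only
 * the unwritten remainder to the writer thread.
 */
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical handoff: a real server would enqueue the remainder
 * for asynchronous delivery by the writer thread. */
static int
QueueToWriterThread(int sock, const char *buf, size_t len)
{
    (void)sock; (void)buf; (void)len;
    return 0;                           /* placeholder */
}

static int
SendOrSpool(int sock, const char *buf, size_t len)
{
    ssize_t sent = send(sock, buf, len, MSG_DONTWAIT);

    if (sent < 0) {
        if (errno != EAGAIN && errno != EWOULDBLOCK) {
            return -1;                  /* hard error */
        }
        sent = 0;                       /* socket not ready: spool everything */
    }
    if ((size_t)sent == len) {
        return 0;                       /* fast path: writer thread not involved */
    }
    /* slow client or large response: let the writer thread finish it */
    return QueueToWriterThread(sock, buf + sent, len - (size_t)sent);
}

On the fast path a small response costs a single syscall and no buffer copy; only partial writes pay for the copy and the queue handoff.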
Re: [naviserver-devel] naviserver with connection thread queue
Dear all,

one more update. Here is now the data of a full day with the async-writer thread. The average queueing time dropped from 0.0547 to 0.0009 seconds, the standard deviation from 0.7481 to 0.0211. This is a noticeable improvement.

        accept    queue       run     total
avg     0.0000    0.0009    0.1304    0.1313
stdev   0.0000    0.0211    0.9022    0.9027
min     0.0000    0.0000    0.0032    0.0033
max     0.0000    0.9772   28.1699   28.1700
median  0.0000    0.0001    0.0208    0.0209

But the occasional large values are still worth investigating. It turned out that the large queueing times came from requests from Taipei, which produced several 404 errors. The 404 response is 727 bytes, and therefore under the writersize, which was configured as 1000 bytes. Delivering an error message to this site takes more than a second. Funny enough, the delivery of the error message blocked the connection thread longer than the delivery of the image when it is above the writersize.

I will reduce the writersize further, but a slow delivery can still slow down the delivery of the headers, which still happens in the connection thread. Most likely, I won't address this now. However, I will check the effects of the fastpath cache, which was deactivated so far...

best regards
-gustaf neumann

On 28.11.12 11:38, Gustaf Neumann wrote:

Dear all,

here is a short update on the findings and changes since last week. One can now activate "logpartialtimes" for nslog, which adds an entry to the access log containing the partial request times (accept time, queueing time, and run time). The sample site now runs with minthreads == 5, maxthreads == 10.

If one analyzes the data of one day of our sample site, one can see that in this case the accept time is always 0 (this is caused by nsssl, which is a sync driver; header parsing happens solely in the connection thread). The queue time is on average 54 ms (milliseconds), which is huge, while the median is 0.1 ms. The reason is that we see a maximum queue time of 21 seconds! As a consequence, the standard deviation is huge as well. When one looks more closely at the log file, one can see that at times when the queueing time is huge, the runtime is often huge as well, leading to cascading effects as described below. Cause and effect are hard to disentangle, since we saw large runtimes even on delivery of rather small files.

        accept    queue       run     total
avg     0.0000    0.0547    0.2656    0.3203
stdev   0.0000    0.7481    0.9740    1.4681
min     0.0000    0.0000    0.0134    0.0135
max     0.0000   21.9594   16.6905   29.1668
median  0.0000    0.0001    0.0329    0.0330

The "random" slow cases have at least two sources: slow delivery (some clients are slow in retrieving data; especially some bots seem to be bad) and system-specific latencies (e.g. a backup running, other servers becoming active, etc.). To address this problem, I moved more file deliveries to the writer thread (by reducing writersize) and activated TCP_DEFER_ACCEPT. One sees significant improvements: the average queueing time is now 4.6 ms, but the standard deviation is still huge, caused by some "erratic" large times.

        accept    queue       run     total
avg     0.0000    0.0046    0.1538    0.1583
stdev   0.0000    0.0934    0.6868    0.6967
min     0.0000    0.0000    0.0137    0.0138
max     0.0000    4.6041   22.4101   22.4102
median  0.0000    0.0001    0.0216    0.0217

There are still some huge values, and the queueing time was "lucky" not to be influenced by the still-large runtimes of some requests.
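For reference, TCP_DEFER_ACCEPT (activated above) is a Linux-specific option on the listening socket: accept() does not wake up until request data has actually arrived on the connection. A minimal sketch, with an illustrative timeout value:

/* Linux-only: delay accept() until data arrives on the connection,
 * or until the timeout (in seconds) expires. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

static int
EnableDeferAccept(int listenfd)
{
    int timeout = 30;   /* illustrative value, in seconds */

    return setsockopt(listenfd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
                      &timeout, sizeof(timeout));
}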
Aside from the top values, one sometimes sees unexplained delays of a few seconds which seem to block everything. A large source of latency is the file system. On our production system we have already invested some effort in tuning the file system in the past, such that we don't see effects as large as on this sample site. Since NaviServer should also work well on non-tuned file systems, I implemented an async writer thread that decouples writing to files (access.log and error.log) from the connection threads. The interface is the same as unix write(), but it does not block. I have had this running since yesterday evening on the sample site, and so far it seems to reduce the random latencies significantly. I'll wait with posting data until I have a similar time range to compare; maybe the good times I am seeing are accidental. I have not yet looked into the details of handling the async writer within logrotate and shutdown.

On another front, I was experimenting with jemalloc and tcmalloc. Previously we had the following figures, using Tcl as distributed (i.e., with zippy malloc) and minthreads=2, maxthreads=2:

requests 10307  spools 1280  queued 2704  connthreads 11  rss 425

When changing minthreads to 5, maxthreads to 10, but using jemalloc, one sees the following figures:

requests 3933   spools 188   queued 67   connthreads 7   rss 488
requests 6325   spools 256   queued 90   connthreads 9   rss 528
requests 10021  spools ...
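To illustrate the async-writer pattern described above (a write()-like call that never blocks the caller), here is a rough sketch; the names and data structures are illustrative, not NaviServer's implementation, and, like the version described above, it ignores log rotation and shutdown:

/*
 * Illustrative sketch, not NaviServer's code: AsyncWrite() only
 * appends a copy of the buffer to a queue; a background thread
 * (started once with pthread_create) performs the potentially
 * blocking write(2) against the log file descriptor.
 */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

typedef struct Entry {
    struct Entry *next;
    int           fd;
    size_t        len;
    char          data[];               /* flexible array member */
} Entry;

static Entry          *head, *tail;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

/* write()-like interface that never blocks on the file system */
int
AsyncWrite(int fd, const char *buf, size_t len)
{
    Entry *e = malloc(sizeof(Entry) + len);

    if (e == NULL) {
        return -1;
    }
    e->next = NULL; e->fd = fd; e->len = len;
    memcpy(e->data, buf, len);

    pthread_mutex_lock(&lock);
    if (tail != NULL) { tail->next = e; } else { head = e; }
    tail = e;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
    return 0;
}

/* background thread: drain the queue, doing the blocking writes */
void *
WriterThread(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (head == NULL) {
            pthread_cond_wait(&cond, &lock);
        }
        Entry *e = head;
        head = e->next;
        if (head == NULL) { tail = NULL; }
        pthread_mutex_unlock(&lock);

        (void)write(e->fd, e->data, e->len);   /* may block; fine here */
        free(e);
    }
    return NULL;
}

The point of the decoupling is that a slow disk or a running backup stalls only the writer thread; connection threads merely enqueue and move on.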