Re: [naviserver-devel] naviserver with connection thread queue

2012-11-29 Thread Gustaf Neumann

Dear all,

one more update. Here is the data for a full day with the async writer 
thread. The average queueing time dropped from 0.0547 s to 0.0009 s, and 
the standard deviation from 0.7481 to 0.0211. This is a noticeable 
improvement.


         accept   queue     run       total      (all times in seconds)
avg      0.0000   0.0009    0.1304    0.1313
stdev    0.0000   0.0211    0.9022    0.9027
min      0.0000   0.0000    0.0032    0.0033
max      0.0000   0.9772    28.1699   28.1700
median   0.0000   0.0001    0.0208    0.0209

Still, the occasional large values are worth investigating. It turned 
out that the large queueing times came from requests from Taipei which 
resulted in several 404 errors. The 404 reply is 727 bytes and therefore 
below the writersize, which was configured as 1000, so it is delivered 
directly by the connection thread. Delivering this error message to that 
site takes more than a second. Funnily enough, the delivery of the error 
message therefore blocked the connection thread longer than the delivery 
of the image does when the image is above the writersize.
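
To spell out the mechanism: the delivery path is chosen purely by the 
size of the content relative to writersize. Roughly like this (an 
illustrative sketch with a made-up function name, not the actual 
NaviServer source):

  /*
   * Illustrative only: content of at least writersize bytes is handed
   * to the writer thread; anything smaller, like the 727-byte 404
   * page, is sent directly from the connection thread and can block
   * it on a slow client.
   */
  #include <stdbool.h>
  #include <stddef.h>

  static bool UseWriterThread(size_t contentLength, size_t writersize)
  {
      return contentLength >= writersize;  /* 727 < 1000: stays in the conn thread */
  }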


I will reduce the writersize further, but even then a slow delivery can 
still slow down the delivery of the response headers, which still 
happens in the connection thread. Most likely I won't address this now; 
however, I will check the effects of the fast-path cache, which has been 
deactivated so far...


best regards
-gustaf neumann

On 28.11.12 11:38, Gustaf Neumann wrote:

Dear all,

here is a short update of the findings and changes since last week.

One can now activate logpartialtimes for nslog, which adds an entry 
to the access log containing the partial request times (accept time, 
queuing time, and run time). The sample site now runs with minthreads 
== 5 and maxthreads == 10.
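
The three partial times are simply the differences between consecutive 
timestamps in the life of a request; schematically (the struct, field, 
and output names below are made up for illustration, this is not the 
nslog source):

  #include <stdio.h>

  /* Illustrative only: timestamps (in seconds) taken at the four
   * stages of a request; the logged partial times are the deltas. */
  typedef struct RequestTimes {
      double acceptStamp;    /* socket accepted by the driver       */
      double queueStamp;     /* request parsed and put on the queue */
      double dequeueStamp;   /* picked up by a connection thread    */
      double doneStamp;      /* response completed                  */
  } RequestTimes;

  static void LogPartialTimes(const RequestTimes *t)
  {
      printf("accepttime %.4f queuetime %.4f runtime %.4f\n",
             t->queueStamp   - t->acceptStamp,    /* accept time */
             t->dequeueStamp - t->queueStamp,     /* queue time  */
             t->doneStamp    - t->dequeueStamp);  /* run time    */
  }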


If one analyzes one day of data from our sample site, one can see that 
in this case the accept time is always 0 (this is caused by nsssl, 
which is a sync driver; header parsing happens solely in the connection 
thread). The queue time is on average 54 ms (milliseconds), which is 
huge, while the median is 0.1 ms. The reason is a maximum queue time of 
21 seconds! As a consequence, the standard deviation is huge as well. 
When one looks more closely at the log file, one can see that at times 
when the queueing time is huge, the run time is often huge as well, 
leading to cascading effects as described below. Cause and effect are 
hard to determine, since we saw large run times even for deliveries of 
rather small files.

         accept   queue     run       total      (all times in seconds)
avg      0.0000   0.0547    0.2656    0.3203
stdev    0.0000   0.7481    0.9740    1.4681
min      0.0000   0.0000    0.0134    0.0135
max      0.0000   21.9594   16.6905   29.1668
median   0.0000   0.0001    0.0329    0.0330

The causes of the random slow cases come from at least two sources: 
slow delivery (some clients are slow in retrieving data; especially 
some bots seem to be bad) and system-specific latencies (e.g. a backup 
running, other servers becoming active, etc.). To address this problem, 
I moved more file deliveries to the writer thread (by reducing the 
writersize) and activated TCP_DEFER_ACCEPT (sketched below, after the 
table). One sees significant improvements: the average queueing time is 
now 4.6 ms, but the standard deviation is still huge, caused by a few 
erratically large times.

         accept   queue     run       total      (all times in seconds)
avg      0.0000   0.0046    0.1538    0.1583
stdev    0.0000   0.0934    0.6868    0.6967
min      0.0000   0.0000    0.0137    0.0138
max      0.0000   4.6041    22.4101   22.4102
median   0.0000   0.0001    0.0216    0.0217
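
For reference, TCP_DEFER_ACCEPT mentioned above is a Linux-specific 
socket option set on the listening socket; the kernel then wakes the 
server only once request data has actually arrived, so the accept/read 
path does not sit on an idle, freshly accepted socket. A minimal sketch 
(not the NaviServer driver code):

  #include <netinet/in.h>
  #include <netinet/tcp.h>
  #include <sys/socket.h>

  /* Enable TCP_DEFER_ACCEPT on a listening socket; the timeout is in
   * seconds and bounds how long the kernel waits for the first data. */
  static int EnableDeferAccept(int listenfd, int timeoutSecs)
  {
      return setsockopt(listenfd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
                        &timeoutSecs, sizeof(timeoutSecs));
  }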

There are still some huge values, and we were lucky that the queueing 
time was not influenced by the still large run times of some requests. 
Aside from the top values, one sometimes sees unexplained delays of a 
few seconds, which seem to block everything.


A large source of latencies is the file system. On our production 
system we have already invested some effort in tuning the file system 
in the past, so we don't see effects as large as on this sample site. 
Since NaviServer should also work well on non-tuned file systems, I 
implemented an async writer thread that decouples writing to files 
(access.log and error.log) from the connection threads. The interface 
is the same as the Unix write() call, but it does not block.
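
The idea in condensed form (a sketch only, with made-up names, not the 
actual NaviServer implementation): the connection thread copies the log 
line into a queue and returns immediately, while a single background 
thread performs the potentially blocking write(2) calls.

  #include <pthread.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  typedef struct Entry {
      struct Entry *nextPtr;
      int           fd;
      size_t        size;
      char          data[1];          /* variable-length payload */
  } Entry;

  static Entry          *firstPtr = NULL, *lastPtr = NULL;
  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

  /* Called from connection threads: copy the buffer, enqueue, return. */
  void AsyncWrite(int fd, const char *buf, size_t size)
  {
      Entry *e = malloc(sizeof(Entry) + size);

      e->nextPtr = NULL;
      e->fd = fd;
      e->size = size;
      memcpy(e->data, buf, size);
      pthread_mutex_lock(&lock);
      if (lastPtr != NULL) {
          lastPtr->nextPtr = e;
      } else {
          firstPtr = e;
      }
      lastPtr = e;
      pthread_cond_signal(&cond);
      pthread_mutex_unlock(&lock);
  }

  /* Single background thread (started once at server startup via
   * pthread_create(..., AsyncWriterThread, NULL)): drains the queue
   * in order and performs the blocking write(2) calls. */
  void *AsyncWriterThread(void *arg)
  {
      (void)arg;
      for (;;) {
          Entry *e;

          pthread_mutex_lock(&lock);
          while (firstPtr == NULL) {
              pthread_cond_wait(&cond, &lock);
          }
          e = firstPtr;
          firstPtr = e->nextPtr;
          if (firstPtr == NULL) {
              lastPtr = NULL;
          }
          pthread_mutex_unlock(&lock);

          (void) write(e->fd, e->data, e->size);
          free(e);
      }
      return NULL;
  }

A real implementation additionally has to drain the queue at shutdown 
and coordinate with log rotation; that is exactly the open point 
mentioned further below.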


I have had this running since yesterday evening on the sample site, and 
so far it seems to reduce the random latencies significantly. I'll wait 
with posting data until I have a similar time range to compare; maybe 
the good times I am seeing are accidental.


I have not yet looked into the details of handling the async writer 
during log rotation and shutdown.


On another front, I was experimenting with jemalloc and tcmalloc.
Previously we had the following figures, using Tcl as distributed 
(i.e. with zippy malloc):


  with minthreads=2, maxthreads=2
  requests 10307 spools 1280 queued 2704 connthreads 11 rss 425

When changing minthreads to 5 and maxthreads to 10, but using jemalloc,
one sees the following figures:
requests 3933 spools 188 queued 67 connthreads 7 rss 488
requests 6325 spools 256 queued 90 connthreads 9 rss 528
requests 10021 spools 378 

Re: [naviserver-devel] naviserver with connection thread queue

2012-11-29 Thread Jeff Rogers
Gustaf Neumann wrote:
> Funny enough, the delivery of the error message blocked the connection
> thread longer than the delivery of the image when it is above the
> writersize.
>
> I will reduce the writersize further, but still a slow delivery can even
> slow down the delivery of the headers, which happens still in the
> connection thread. Most likely, i won't address this now.  however, i
> will check the effects of fastpath cache, which was deactivated so far...

Hi Gustaf,

These are some really interesting bits you've been sharing.  I keep 
meaning to add my thoughts, but I haven't had the time.

One quick idea on the writer thread: regardless of size, always make 
one write attempt in the conn thread, and if less than the complete 
buffer was written, pass the remainder off to the writer thread. 
This would get the best of both worlds - fast/small requests don't incur 
the overhead of duplicating and handing off the buffers, while large/slow 
requests wouldn't block the conn thread.
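
Roughly something like this (just a sketch; WriterQueue here is a 
stand-in for the hand-off to the writer thread, not an actual 
NaviServer call):

  #include <stddef.h>
  #include <sys/socket.h>
  #include <sys/types.h>

  /* Stand-in for handing the remaining bytes to the writer thread. */
  extern void WriterQueue(int sock, const char *buf, size_t len);

  void SendOrSpool(int sock, const char *buf, size_t len)
  {
      /* One non-blocking send attempt from the conn thread. */
      ssize_t sent = send(sock, buf, len, MSG_DONTWAIT);

      if (sent < 0) {
          sent = 0;                    /* EAGAIN etc.: nothing was sent */
      }
      if ((size_t)sent < len) {
          /* Partial write: spool only the remainder, so the conn
           * thread is not blocked by a slow client. */
          WriterQueue(sock, buf + sent, len - (size_t)sent);
      }
  }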

-J



Re: [naviserver-devel] naviserver with connection thread queue

2012-11-29 Thread Maurizio Martignano
This is good engineering work. I raise my hat to that.

Congrats,

Maurizio

 

 
