Re: [naviserver-devel] naviserver with connection thread queue

2012-12-06 Thread Gustaf Neumann
On 06.12.12 13:13, Stephen Deasey wrote:
> I guess it depends on how the website is deployed: in a more modern
> set-up CSS is often compiled from SASS or LESS; javascript needs to be
> minified and combined, possibly compiled using Google's optimising
> compiler, maybe from coffee script; images are compressed, etc. Making
> gzip versions of static text/* files is just one more target in a
> Makefile.  Which is a little different than the old PHP/OpenACS
> perspective where everything happens at run-time.
Modern PHP/OpenACS installations use reverse proxies like 
nginx for static content, which offer the option to 
compress files on the fly or to deliver pre-compressed 
files. When we switched our production site to gzip 
delivery for the dynamic content, we did not notice any 
difference in CPU load. Sure, delivering static gzipped 
content is faster than zipping on the fly, but I would like 
to keep the burden on the site maintainer low.

I am not sure why we are discussing this now. My original 
argument was that the API structure for deliveries is overly 
complicated (to put it mildly) and not orthogonal (I failed 
to understand it without drawing the call graph). There is a 
lot of room for improvement.

-gustaf neumann



Re: [naviserver-devel] naviserver with connection thread queue

2012-12-06 Thread Stephen Deasey
On Tue, Dec 4, 2012 at 10:55 PM, Gustaf Neumann  wrote:
> Am 04.12.12 20:06, schrieb Stephen Deasey:
>
> - we should actually ship some code which searches for *.gz versions of
> static files
>
> this would mean to keep a .gz version and a non-.gz version in the file
> system for the cases where gzip is not an accepted encoding. Not sure I
> would like to manage these files and keep them in sync; the fast-path
> cache could keep gzipped copies, invalidation is already there.

I guess it depends on how the website is deployed: in a more modern
set-up CSS is often compiled from SASS or LESS; javascript needs to be
minified and combined, possibly compiled using Google's optimising
compiler, maybe from coffee script; images are compressed, etc. Making
gzip versions of static text/* files is just one more target in a
Makefile.  Which is a little different than the old PHP/OpenACS
perspective where everything happens at run-time.



Re: [naviserver-devel] naviserver with connection thread queue

2012-12-06 Thread Stephen Deasey
On Tue, Dec 4, 2012 at 10:24 PM, Gustaf Neumann  wrote:
>
> Today, I was hunting another problem in connection with nsssl, which
> turns out to be a weakness of our interfaces. The source of the problem
> is that the buffer management of OpenSSL is not aligned with the buffer
> management in NaviServer. In the NaviServer driver, all receive requests
> are triggered via poll(), when sockets are readable. With OpenSSL it
> might as well be possible that data is left over from an earlier
> receive when a smaller buffer is provided. During upload spooling,
> NaviServer requested receive operations with a 4KB buffer, while OpenSSL
> might receive 16KB "at once". The read operation with the small buffer
> will not drain the OpenSSL buffer, and later, poll() will not be
> triggered, since the socket itself is not readable again (the remaining
> data already sits in OpenSSL's buffer). The problem happened in
> NaviServer when the input was spooled (e.g. file uploads). I have doubts
> that this combination ever worked. I have corrected the problem by
> increasing the buffer size in driver.c. The cleaner implementation would
> be to add an "Ns_DriverReadableProc Readable" similar to the
> "Ns_DriverKeepProc Keep", but that would affect the interface of all
> drivers.

Another way to use the OpenSSL library is to manage socket reads/writes
yourself and hand memory buffers to OpenSSL to encrypt/decrypt.
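
A minimal sketch of that memory-BIO approach (the TlsConn struct, helper
names, and buffer handling below are illustrative only, not the nsssl
driver code):

/*
 * Sketch: drive OpenSSL through memory BIOs so that the server keeps full
 * control over socket I/O and buffering.  Illustrative code, not nsssl.
 */
#include <openssl/ssl.h>
#include <string.h>

typedef struct TlsConn {
    SSL *ssl;
    BIO *rbio;   /* ciphertext read from the socket is BIO_write()n here */
    BIO *wbio;   /* OpenSSL puts outgoing ciphertext here; we send it    */
} TlsConn;

static int
TlsConnInit(TlsConn *tc, SSL_CTX *ctx)
{
    tc->ssl  = SSL_new(ctx);
    tc->rbio = BIO_new(BIO_s_mem());
    tc->wbio = BIO_new(BIO_s_mem());
    if (tc->ssl == NULL || tc->rbio == NULL || tc->wbio == NULL) {
        return -1;
    }
    /* SSL_set_bio() transfers ownership of the BIOs to the SSL object. */
    SSL_set_bio(tc->ssl, tc->rbio, tc->wbio);
    SSL_set_accept_state(tc->ssl);               /* server side */
    return 0;
}

/*
 * Feed ciphertext received from the socket into OpenSSL and drain as much
 * plaintext as is currently available.  Returns the number of plaintext
 * bytes, 0 if more ciphertext is needed, or -1 on error.  Handshake output
 * accumulates in tc->wbio and must be sent to the socket by the caller
 * (check BIO_ctrl_pending(tc->wbio)).
 */
static int
TlsConnDecrypt(TlsConn *tc, const char *cipher, int cipherLen,
               char *plain, int plainSize)
{
    int total = 0;

    if (BIO_write(tc->rbio, cipher, cipherLen) <= 0) {
        return -1;
    }
    while (total < plainSize) {
        int n = SSL_read(tc->ssl, plain + total, plainSize - total);

        if (n > 0) {
            total += n;                          /* keep draining */
        } else if (SSL_get_error(tc->ssl, n) == SSL_ERROR_WANT_READ) {
            break;                               /* needs more ciphertext */
        } else {
            return -1;
        }
    }
    return total;
}

With this arrangement the driver polls the plain socket itself, feeds
whatever arrives into rbio, and sends whatever shows up in wbio, so the
buffer mismatch described in the previous mail cannot occur.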



Re: [naviserver-devel] naviserver with connection thread queue

2012-12-04 Thread Gustaf Neumann
Am 05.12.12 00:41, schrieb Stephen Deasey:
> On Wed, Nov 28, 2012 at 10:38 AM, Gustaf Neumann  wrote:
>> It is interesting to see that with always 5 connection threads running and
>> using jemalloc, we see an rss consumption only slightly larger than with
>> plain tcl and zippy malloc having maxthreads == 2, with fewer requests
>> queued.
>>
>> Similarly, with tcmalloc we see with minthreads to 5, maxthreads 10
>>
>> requests 2062 spools 49 queued 3 connthreads 6 rss 376
>> requests 7743 spools 429 queued 359 connthreads 11 rss 466
>> requests 8389 spools 451 queued 366 connthreads 12 rss 466
>>
>> which is even better.
> Min/max threads 5/10 better than 2/10?
the numbers show that 5/10 with tcmalloc is better than 5/10 with 
jemalloc and only slightly worse than 2/2 with zippy malloc.

-gn



Re: [naviserver-devel] naviserver with connection thread queue

2012-12-04 Thread Stephen Deasey
On Wed, Nov 28, 2012 at 10:38 AM, Gustaf Neumann  wrote:
>
> It is interesting to see that with always 5 connection threads running and
> using jemalloc, we see an rss consumption only slightly larger than with
> plain tcl and zippy malloc having maxthreads == 2, with fewer requests
> queued.
>
> Similarly, with tcmalloc we see with minthreads to 5, maxthreads 10
>
>requests 2062 spools 49 queued 3 connthreads 6 rss 376
>requests 7743 spools 429 queued 359 connthreads 11 rss 466
>requests 8389 spools 451 queued 366 connthreads 12 rss 466
>
> which is even better.

Min/max threads 5/10 better than 2/10? How about 7/10? When you hit
10/10 you can delete an awful lot of code :-)



Re: [naviserver-devel] naviserver with connection thread queue

2012-12-04 Thread Stephen Deasey
On Tue, Dec 4, 2012 at 10:55 PM, Gustaf Neumann  wrote:
>
> The code in naviserver-connthreadqueue already handles read-aheads with SSL.
> I have removed these hacks there already; I think these were in part
> responsible for the sometimes erratic response times with SSL.

Well, I think the thing here is that once upon a time SSL was considered
computationally expensive (I don't know if it still is, with recent
Intel cpus having dedicated AES instructions etc.). Read-ahead is good
because you don't want an expensive conn thread waiting around for the
whole request to arrive, packet by packet. But with SSL the single
driver thread will be decrypting read-ahead data for multiple sockets
and may run out of cpu, stalling the request pipeline, starving the
conn threads. By making the SSL driver thread non-async you lose out
on read-ahead as that all happens on the conn thread, but you gain cpu
resources on a multi-cpu system (all of them, today). AOLserver 4.5
added a pool of read-ahead threads, one per-socket IIRC, to keep the
benefits of read-ahead while gaining cpu parallelism.

- does a single driver thread have enough computational resources to
decrypt all sockets currently in read-ahead? This is going to depend
on the algorithm. Might want to favour AES if you know your cpu has
support.
- which is worse, losing read-ahead, or losing cpu-parallelism?
- if a read-ahead thread-pool is added, should it be one thread
per-socket, which is simple, or one thread per-cpu and some kind of
balancing mechanism?



Re: [naviserver-devel] naviserver with connection thread queue

2012-12-04 Thread Gustaf Neumann

Am 04.12.12 20:06, schrieb Stephen Deasey:
> - we should actually ship some code which searches for *.gz versions
> of static files

this would mean to keep a .gz version and a non-.gz version in the file
system for the cases where gzip is not an accepted encoding. Not sure I
would like to manage these files and keep them in sync; the fast-path
cache could keep gzipped copies, invalidation is already there.

>> * Similarly, range requests are not handled when the data is not
>> sent ReturnOpen to the writer queue.
>
> The diagram shows Ns_ConnReturnData also calls ReturnRange, and hence
> the other leg of fastpath and all the main data sending routines
> should handle range requests.

this path is ok. When neither mmap nor cache is set, fastpath can call
ReturnOpenFd, and ReturnOpen sends the data blindly to the writer if
configured, which does not handle ranges. This needs some refactoring.

>> * there is quite some potential to simplify / orthogonalize the
>> server's infrastructure.
>> * improving this structure has nothing to do with
>> naviserver-connthreadqueue, and should happen at some time in the
>> main tip.
>
> The writer thread was one of the last bits of code to land before
> things quietened down, and a lot of the stuff that got talked about
> didn't get implemented.

i am not complaining, just trying to understand the historical layers.
Without the call-graph the current code is hard to follow.

> One thing that was mentioned was having a call-back interface where
> you submit a function to the writer thread and it runs it. This would
> allow other kinds of requests to be served async.
>
> One of the things we've been talking about with the connthread work is
> simplification. The current code, with its workarounds for stalls and
> managing thread counts, is very complicated. If it were simplified and
> genericised it could also be used for background writer threads, and
> SSL read-ahead threads (as in aolserver > 4.5). So, that's another +1
> for keeping the conn threads simple.

The code in naviserver-connthreadqueue already handles read-aheads with
SSL. I have removed these hacks there already; I think these were in
part responsible for the sometimes erratic response times with SSL.

-gustaf neumann





Re: [naviserver-devel] naviserver with connection thread queue

2012-12-04 Thread Gustaf Neumann
Am 04.12.12 20:25, schrieb Stephen Deasey:
> I found this nifty site the other day:
>
>  https://www.ssllabs.com/ssltest/analyze.html?d=next-scripting.org
>
> It's highlighting a few things that need to be fixed in the nsssl module,
> including a couple of security bugs. Looks like relatively little code
> though.
The report is already much better: now everything is green. Most of the 
complaints could be addressed via configuration; just two issues required 
code changes (one needs a flag that is not available in all current 
OpenSSL implementations, such as the one on Mac OS X, and the other needed 
an additional callback). The security rating is now better than that of nginx.

Today, I was hunting another problem in connection with nsssl, which 
turns out to be a weakness of our interfaces. The source of the problem 
is that the buffer management of OpenSSL is not aligned with the buffer 
management in NaviServer. In the NaviServer driver, all receive requests 
are triggered via poll(), when sockets are readable. With OpenSSL it 
might as well be possible that data is left over from an earlier 
receive when a smaller buffer is provided. During upload spooling, 
NaviServer requested receive operations with a 4KB buffer, while OpenSSL 
might receive 16KB "at once". The read operation with the small buffer 
will not drain the OpenSSL buffer, and later, poll() will not be 
triggered, since the socket itself is not readable again (the remaining 
data already sits in OpenSSL's buffer). The problem happened in 
NaviServer when the input was spooled (e.g. file uploads). I have doubts 
that this combination ever worked. I have corrected the problem by 
increasing the buffer size in driver.c. The cleaner implementation would 
be to add an "Ns_DriverReadableProc Readable" similar to the 
"Ns_DriverKeepProc Keep", but that would affect the interface of all 
drivers.
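
For illustration, a hedged sketch of what such a "Readable" check could
look like for SSL connections: besides the poll() result it consults
OpenSSL's own buffer via SSL_pending(), so leftover decrypted data is not
missed. The function name and its use by the driver are assumptions here,
not the existing NaviServer interface:

/*
 * Sketch only: poll() sees the kernel socket buffer, SSL_pending() sees
 * data that OpenSSL has already buffered but that was not consumed because
 * the last read used a too-small buffer.
 */
#include <openssl/ssl.h>
#include <poll.h>

static int
SslReadable(SSL *ssl, int sock, int timeoutMs)
{
    struct pollfd pfd;

    if (SSL_pending(ssl) > 0) {
        return 1;                 /* leftover bytes inside OpenSSL */
    }
    pfd.fd = sock;
    pfd.events = POLLIN;
    pfd.revents = 0;

    return poll(&pfd, 1, timeoutMs) > 0 && (pfd.revents & POLLIN) != 0;
}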

-gustaf neumann




Re: [naviserver-devel] naviserver with connection thread queue

2012-12-04 Thread Stephen Deasey
On Thu, Nov 29, 2012 at 6:51 PM, Gustaf Neumann  wrote:
>
> It turned out
> that the large queueing times came from requests from Taipeh, which contained
> several 404 errors. The size of the 404 response is 727 bytes, and therefore
> under the writersize, which was configured as 1000. Delivering the error
> message to this site takes more than a second. Funny enough, the delivery of
> the error message blocked the connection thread longer than the delivery of
> the image when it is above the writersize.
>
> I will reduce the writersize further, but a slow delivery can still
> slow down the delivery of the headers, which still happens in the connection
> thread.

This shouldn't be the case for strings, or data sent from the fast
path cache, such as a small file (a custom 404), as eventually those
should work their way down to Ns_ConnWriteData, which will construct
the headers if not already sent and pass them, along with the data
payload, to writev(2). Linux should coalesce the buffers and send them in a
single packet, if small enough.

I wonder if this is some kind of weird nsssl interaction.

(For things like sendfile without ssl we could use TCP_CORK to
coalesce the headers with the body)
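
A rough sketch of that TCP_CORK idea (Linux-specific, error handling
trimmed; the function name is made up for illustration): cork the socket,
write the headers, hand the body to sendfile(2), then uncork, so headers
and body leave in as few packets as possible.

/* Sketch: coalesce response headers with a sendfile()d body via TCP_CORK. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <unistd.h>

static int
SendFileWithHeaders(int sock, const char *hdr, size_t hdrLen,
                    int fd, off_t offset, size_t length)
{
    int on = 1, off = 0;

    setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
    if (write(sock, hdr, hdrLen) != (ssize_t)hdrLen) {
        return -1;
    }
    while (length > 0) {
        ssize_t sent = sendfile(sock, fd, &offset, length);

        if (sent <= 0) {
            return -1;            /* real code: handle EINTR/EAGAIN */
        }
        length -= (size_t)sent;
    }
    /* Uncorking flushes the partially filled last segment. */
    setsockopt(sock, IPPROTO_TCP, TCP_CORK, &off, sizeof(off));
    return 0;
}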



Re: [naviserver-devel] naviserver with connection thread queue

2012-12-04 Thread Stephen Deasey
On Mon, Dec 3, 2012 at 10:38 AM, Gustaf Neumann  wrote:
>
> All changes are on bitbucket (nsssl and naviserver-connthreadqueue).


I found this nifty site the other day:

https://www.ssllabs.com/ssltest/analyze.html?d=next-scripting.org

It's highlighting a few things that need to be fixed in the nsssl module,
including a couple of security bugs. Looks like relatively little code
though.

Also, there's this:

https://insouciant.org/tech/ssl-performance-case-study/

which is a pretty good explanation of things from a performance point
of view. I haven't spent much time looking at SSL. Looks like there
could be some big wins. For example, some of the stuff to do with
certificate chains could probably be automated - the server could spit
out an informative error to the log if things look poorly optimised.



Re: [naviserver-devel] naviserver with connection thread queue

2012-12-04 Thread Stephen Deasey
On Tue, Dec 4, 2012 at 5:21 PM, Gustaf Neumann  wrote:

>
> * Only content sent via Ns_ConnWriteVChars has the chance to get
> compressed.
>

i.e. dynamic content with a text/* mime-type. The idea here was that you don't
want to try to compress gifs and so on, and static content could be
pre-compressed on disk - at runtime, simply look for a *.gz version of the
content.

This could be cleaned up a bit by:

- having an extendable white-list of mime-types which should be compressed:
text/*, application/javascript, application/xml etc.

- we should actually ship some code which searches for *.gz versions of
static files
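
A minimal sketch of that *.gz lookup (the helper name and behaviour are
invented for illustration; this is not the fast-path code): if the client
accepts gzip and a sibling <file>.gz exists that is at least as new as the
original, serve it with "Content-Encoding: gzip" and add
"Vary: Accept-Encoding".

/*
 * Sketch: decide whether a pre-compressed sibling of a static file should
 * be served.  Just the idea of "look for <file>.gz at request time".
 */
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 and fills gzPath when path.gz exists, is a regular file and is
 * not older than the original; otherwise returns 0. */
static int
FindGzipVariant(const char *path, int clientAcceptsGzip,
                char *gzPath, size_t gzPathSize)
{
    struct stat orig, gz;

    if (!clientAcceptsGzip
        || stat(path, &orig) != 0
        || snprintf(gzPath, gzPathSize, "%s.gz", path) >= (int)gzPathSize
        || stat(gzPath, &gz) != 0
        || !S_ISREG(gz.st_mode)
        || gz.st_mtime < orig.st_mtime) {
        return 0;
    }
    return 1;
}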



> * Similarly, range requests are not handled when the data is not sent
> ReturnOpen to the writer Queue.
>

The diagram shows Ns_ConnReturnData also calls ReturnRange, and hence the
other leg of fastpath and all the main data sending routines should handle
range requests.



>
> * there is quite some potential to simplify / orthogonalize the server's
> infrastructure.
> * improving this structure has nothing to do with
> naviserver-connthreadqueue, and should happen at some time in the main tip.
>


The writer thread was one of the last bits of code to land before things
quietened down, and a lot of the stuff that got talked about didn't get
implemented. One thing that was mentioned was having a call-back interface
where you submit a function to the writer thread and it runs it. This would
allow other kinds of requests to be served async.

One of the things we've been talking about with the connthread work is
simplification. The current code, with its workarounds for stalls and
managing thread counts, is very complicated. If it were simplified and
genericised it could also be used for background writer threads, and SSL
read-ahead threads (as in aolserver > 4.5). So, that's another +1 for
keeping the conn threads simple.


Re: [naviserver-devel] naviserver with connection thread queue

2012-12-03 Thread Gustaf Neumann

Am 29.11.12 19:51, schrieb Gustaf Neumann:
> However, I am still in the process of cleaning up and addressing
> some strange interactions (e.g. for nsssl, some socket-closing
> interactions between driver and connection threads seem too complex
> to me), so I will still be busy with that for a while

The problem in the bad interaction was that the driver poll reported 
that there should be data to read from the socket, while the nsssl 
receive said there is none, and vice versa. Since nsssl was defined as 
sync, the socket processing happens in the connection threads via 
NsGetRequest(), which frequently returned empty, meaning a useless 
wakeup and ConnRun etc. in the connection thread. The reason for this 
was that nsssl sets the socket to non-blocking without defining the 
driver as ASYNC, so that it was possible that a receive returned 0 
bytes. Simply setting the driver to ASYNC did not help either, since the 
driver used its own event loop via Ns_SockTimedWait(), which interacts 
badly with the driver's poll loop (on Linux it did not realize that a 
socket becomes readable, while this worked on Mac OS X). So I changed 
the receive operation to simply report the read state back to the main 
driver, which now apparently works fine (tested on Mac OS X and Linux). 
To be on the safe side, I have tagged the previous version of nsssl on 
bitbucket and incremented the version number. This change also led to 
a simplification in the main driver.
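
As an illustration of the general pattern of "report the read state back
to the driver" (the ReadState enum and function below are invented for
this sketch and are not the actual driver interface): classify the
SSL_read() result so the caller's poll loop knows whether to wait for
readability, wait for writability, or treat the connection as closed.

#include <openssl/ssl.h>

typedef enum {
    READ_DATA,        /* n > 0 bytes were placed into the buffer        */
    READ_AGAIN,       /* no data yet: wait until the socket is readable */
    READ_WANT_WRITE,  /* TLS must send first: wait for writability      */
    READ_EOF,         /* orderly shutdown by the peer                   */
    READ_ERROR
} ReadState;

static ReadState
SslReceive(SSL *ssl, char *buf, int bufSize, int *nreadPtr)
{
    int n = SSL_read(ssl, buf, bufSize);

    if (n > 0) {
        *nreadPtr = n;
        return READ_DATA;
    }
    *nreadPtr = 0;
    switch (SSL_get_error(ssl, n)) {
    case SSL_ERROR_WANT_READ:   return READ_AGAIN;
    case SSL_ERROR_WANT_WRITE:  return READ_WANT_WRITE;
    case SSL_ERROR_ZERO_RETURN: return READ_EOF;
    case SSL_ERROR_SYSCALL:     return (n == 0) ? READ_EOF : READ_ERROR;
    default:                    return READ_ERROR;
    }
}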


The new version has been running since Saturday on next-scripting.org and 
shows even better results. The average queue time is again lower by a 
factor of 4, and the standard deviation is much better as well.

        accept   queue    run      total
avg     0,0599   0,0002   0,0183   0,0196
stdev   0,3382   0,0039   0,0413   0,0530
min     0,0000   0,0001   0,0011   0,0012
max     12,1002  0,2324   0,8958   2,0112
median  0,0088   0,0001   0,0118   0,0120

The accept times now show up since nsssl is async, but they are often 
not meaningful for the performance (due to closewait, the accept might be 
a previous accept operation). Now, all performance measurement happens 
relative to the queue time (similar to naviserver in the main tip).


Note that this data is based not on a synthetic test but on a 
real-world web site with potentially different traffic patterns from day 
to day, so don't draw too many conclusions from it. By moving from 
sync to async, some potentially time-consuming processing (waiting for 
more data) is moved from the connection thread to the driver's event 
loop, so the time span in the connection thread is reduced (for nsssl in 
the same way as it was before for nssock). Therefore the "runtime" has 
less randomness and is shorter, leading to the improvements.


The behavior of the async log writer is now configurable (by default it 
is turned off); log rolling and shutdown should now work reliably with it 
as well.


All changes are on bitbucket (nsssl and naviserver-connthreadqueue). The 
server now also measures the filter time separately from the runtime, 
but I am still in the process of collecting data on that...


-gustaf neumann


one more update. Here is now the data of a full day with the 
Async-Writer thread. The avg queueing time dropped from 0,0547 to 
0,0009, the standard deviation from 0,7481 to 0,0211. This is a 
noticeable improvement.

        accept  queue   run      total
avg     0,0000  0,0009  0,1304   0,1313
stdev   0,0000  0,0211  0,9022   0,9027
min     0,0000  0,0000  0,0032   0,0033
max     0,0000  0,9772  28,1699  28,1700
median  0,0000  0,0001  0,0208   0,0209

But still, the sometimes large values are worth investigating. It 
turned out that the large queueing times came from requests from 
Taipeh, which contained several 404 errors. The size of the 404 
response is 727 bytes, and therefore under the writersize, which was 
configured as 1000. Delivering the error message to this site takes 
more than a second. Funny enough, the delivery of the error message 
blocked the connection thread longer than the delivery of the image 
when it is above the writersize.

I will reduce the writersize further, but a slow delivery can still 
slow down the delivery of the headers, which still happens in the 
connection thread. Most likely, I won't address this now; however, I 
will check the effects of the fastpath cache, which was deactivated so far...


best regards
-gustaf neumann

Am 28.11.12 11:38, schrieb Gustaf Neumann:

Dear all,

here is a short update of the findings and changes since last week.

One can now activate for nslog "logpartialtimes", which adds an entry 
to the access log containing the partial request times (accept time, 
queuing time, and run time). The sample site runs now with minthreads 
== 5, maxthreads 10.


If one analyzes the data of one day of our sample site, one can see 
that in this case the accept time is always 0 (this is caused by 
nsssl, which is a sync driver; header parsing happens solely in the 
connection thread), and the queue time is on average 54 ms, while the 
median is 0.1 ms.

Re: [naviserver-devel] naviserver with connection thread queue

2012-11-30 Thread Gustaf Neumann
Am 29.11.12 20:24, schrieb Jeff Rogers:
>
> Hi Gustaf,
>
> One quick idea on the writer thread is to always make one write attempt 
> in the conn thread, regardless of size, and if less than the complete 
> buffer was written, then pass the remainder off to the writer 
> thread. This would get the best of both worlds - fast/small requests 
> don't incur the overhead of moving/duplicating the buffers, while 
> large/slow requests wouldn't block the conn thread.
yes, that might be interesting to investigate as well. One has to 
differentiate between synchronous and asynchronous drivers and between 
file deliveries and string deliveries, etc., so in the general case it 
might be some more work. However, I am still in the process of cleaning 
up and addressing some strange interactions (e.g. for nsssl, some 
socket-closing interactions between driver and connection threads seem 
too complex to me), so I will still be busy with that for a while. I 
hope that I don't create too much collateral damage, and will let you 
know when the new stuff is stabilized...

-gustaf neumann

-- 
Univ.Prof. Dr. Gustaf Neumann
Institute of Information Systems and New Media
WU Vienna
Augasse 2-6, A-1090 Vienna, AUSTRIA




Re: [naviserver-devel] naviserver with connection thread queue

2012-11-29 Thread Maurizio Martignano
This is good engineering work. I raise my hat to that.

Congrats,

Maurizio

 

 

From: Gustaf Neumann [mailto:neum...@wu.ac.at] 
Sent: 29 November 2012 19:51
To: naviserver-devel@lists.sourceforge.net
Subject: Re: [naviserver-devel] naviserver with connection thread queue

 

Dear all,

one more update. Here is now the data of a full day with the Async-Writer
thread. The avg queueing time dropped from 0,0547 to 0,0009, the standard
deviation from 0,7481 to 0,0211. This is a noticeable improvement.

        accept  queue   run      total
avg     0,0000  0,0009  0,1304   0,1313
stdev   0,0000  0,0211  0,9022   0,9027
min     0,0000  0,0000  0,0032   0,0033
max     0,0000  0,9772  28,1699  28,1700
median  0,0000  0,0001  0,0208   0,0209

But still, the sometimes large values are worth investigating. It turned out
that the large queueing times came from requests from Taipeh, which contained
several 404 errors. The size of the 404 response is 727 bytes, and therefore
under the writersize, which was configured as 1000. Delivering the error
message to this site takes more than a second. Funny enough, the delivery of
the error message blocked the connection thread longer than the delivery of
the image when it is above the writersize.

I will reduce the writersize further, but a slow delivery can still
slow down the delivery of the headers, which still happens in the connection
thread. Most likely, I won't address this now; however, I will check the
effects of the fastpath cache, which was deactivated so far...

best regards
-gustaf neumann

Am 28.11.12 11:38, schrieb Gustaf Neumann:

Dear all,

here is a short update of the findings and changes since last week. 

One can now activate for nslog "logpartialtimes", which adds an entry to the
access log containing the partial request times (accept time, queuing time,
and run time). The sample site runs now with minthreads == 5, maxthreads 10.

If one analyzes the data of one day of our sample site, one can see that in
this case the accept time is always 0 (this is caused by nsssl, which is a
sync driver; header parsing happens solely in the connection thread), and the
queue time is on average 54 ms (milliseconds), which is huge, while the
median is 0.1 ms. The reason is that we can see a maximum queue time of 21
secs! As a consequence, the standard deviation is huge as well. When one
looks more closely at the log file, one can see that at times when the
queuing time is huge, the runtime is often huge as well, leading to
cascading effects as described below. Cause and reason are hard to
determine, since we saw large runtimes even on delivery of rather small
files.

         accept  queue    run      total
avg      0,0000  0,0547   0,2656   0,3203
stdev    0,0000  0,7481   0,9740   1,4681
min      0,0000  0,0000   0,0134   0,0135
max      0,0000  21,9594  16,6905  29,1668
median   0,0000  0,0001   0,0329   0,0330

The causes for the "random" slow cases are at least from two sources: slow
delivery (some clients are slow in retrieving data; especially some bots
seem to be bad) and system-specific latencies (e.g. backup running, other
servers becoming active, etc.). To address this problem, I moved more file
deliveries to the writer thread (reducing writersize) and activated
TCP_DEFER_ACCEPT. One sees significant improvements, the average queuing
time is 4.6 ms, but still the standard deviation is huge, caused by some
"erratic" large times.

         accept  queue   run      total
avg      0,0000  0,0046  0,1538   0,1583
stdev    0,0000  0,0934  0,6868   0,6967
min      0,0000  0,0000  0,0137   0,0138
max      0,0000  4,6041  22,4101  22,4102
median   0,0000  0,0001  0,0216   0,0217
 

There are still some huge values, and the queuing time was "lucky" not to be
influenced by the still large runtimes of requests. Aside from the top values,
one sometimes sees unexplained delays of a few seconds, which seem to block
everything.

A large source of latencies is the file system. On our production system we
already invested some effort to tune the file system in the past, so that we
don't see effects as large as on this sample site. Since NaviServer should
work well also on non-tuned file systems, I implemented an async writer
thread that decouples writing to files (access.log and error.log) from the
connection threads. The interface is the same as unix write(), but it does
not block.

I have had this running since yesterday evening on the sample site, and so far
it seems to help to reduce the random latencies significantly. I'll wait
with posting data until I have a similar time range to compare; maybe the
good times I am seeing are accidental.

I have not yet looked into the details of handling the async writer with
logrotate and shutdown.

On another front, I was experimenting with jemalloc and tcmalloc.
Previously we had the following figures, using Tcl as distributed with zippy
malloc

  with minthreads=2, maxthreads=2
  requests 10307 spools 1280 queued 2704 connthreads 11 rss 425

Re: [naviserver-devel] naviserver with connection thread queue

2012-11-29 Thread Jeff Rogers
Gustaf Neumann wrote:
> Funny enough, the delivery of the error message blocked the connection
> thread longer than the delivery of the image when it is above the
> writersize.
>
> I will reduce the writersize further, but still a slow delivery can even
> slow down the delivery of the headers, which happens still in the
> connection thread. Most likely, i won't address this now.  however, i
> will check the effects of fastpath cache, which was deactivated so far...

Hi Gustaf,

These have been some really interesting bits you've been sharing.  I keep 
meaning to add in my thoughts but I haven't had the time.

One quick idea on the writer thread is to always make one write attempt 
in the conn thread, regardless of size, and if less than the complete 
buffer was written, then pass the remainder off to the writer thread. 
This would get the best of both worlds - fast/small requests don't incur 
the overhead of moving/duplicating the buffers, while large/slow 
requests wouldn't block the conn thread.

-J



Re: [naviserver-devel] naviserver with connection thread queue

2012-11-29 Thread Gustaf Neumann

Dear all,

one more update. Here is now the data of a full day with the 
Async-Writer thread. The avg queueing time dropped from 0,0547 to 
0,0009, the standard deviation from 0,7481 to 0,0211. This is a 
noticeable improvement.

        accept  queue   run      total
avg     0,0000  0,0009  0,1304   0,1313
stdev   0,0000  0,0211  0,9022   0,9027
min     0,0000  0,0000  0,0032   0,0033
max     0,0000  0,9772  28,1699  28,1700
median  0,0000  0,0001  0,0208   0,0209

But still, the sometimes large values are worth investigating. It turned 
out that the large queueing times came from requests from Taipeh, which 
contained several 404 errors. The size of the 404 response is 727 bytes, 
and therefore under the writersize, which was configured as 1000. 
Delivering the error message to this site takes more than a second. 
Funny enough, the delivery of the error message blocked the connection 
thread longer than the delivery of the image when it is above the 
writersize.

I will reduce the writersize further, but a slow delivery can still 
slow down the delivery of the headers, which still happens in the 
connection thread. Most likely, I won't address this now; however, I 
will check the effects of the fastpath cache, which was deactivated so far...


best regards
-gustaf neumann

Am 28.11.12 11:38, schrieb Gustaf Neumann:

Dear all,

here is a short update of the findings and changes since last week.

One can now activate for nslog "logpartialtimes", which adds an entry 
to the access log containing the partial request times (accept time, 
queuing time, and run time). The sample site runs now with minthreads 
== 5, maxthreads 10.


If one analyzes the data of one day of our sample site, one can see 
that in this case the accept time is always 0 (this is caused by 
nsssl, which is a sync driver; header parsing happens solely in the 
connection thread), and the queue time is on average 54 ms (milliseconds), 
which is huge, while the median is 0.1 ms. The reason is that we can 
see a maximum queue time of 21 secs! As a consequence, the standard 
deviation is huge as well. When one looks more closely at the log 
file, one can see that at times when the queuing time is huge, the 
runtime is often huge as well, leading to cascading effects as 
described below. Cause and reason are hard to determine, since we saw 
large runtimes even on delivery of rather small files.

         accept  queue    run      total
avg      0,0000  0,0547   0,2656   0,3203
stdev    0,0000  0,7481   0,9740   1,4681
min      0,0000  0,0000   0,0134   0,0135
max      0,0000  21,9594  16,6905  29,1668
median   0,0000  0,0001   0,0329   0,0330

The causes for the "random" slow cases are at least from two sources: 
slow delivery (some clients are slow in retrieving data; especially 
some bots seem to be bad) and system-specific latencies (e.g. backup 
running, other servers becoming active, etc.). To address this 
problem, I moved more file deliveries to the writer thread (reducing 
writersize) and activated TCP_DEFER_ACCEPT. One sees significant 
improvements, the average queuing time is 4.6 ms, but still the 
standard deviation is huge, caused by some "erratic" large times.

         accept  queue   run      total
avg      0,0000  0,0046  0,1538   0,1583
stdev    0,0000  0,0934  0,6868   0,6967
min      0,0000  0,0000  0,0137   0,0138
max      0,0000  4,6041  22,4101  22,4102
median   0,0000  0,0001  0,0216   0,0217

There are still some huge values, and the queuing time was "lucky" not 
to be influenced by the still large runtimes of requests. Aside from the 
top values, one sometimes sees unexplained delays of a few seconds, 
which seem to block everything.

A large source of latencies is the file system. On our production 
system we already invested some effort to tune the file system in the 
past, so that we don't see effects as large as on this sample site. Since 
NaviServer should work well also on non-tuned file systems, I 
implemented an async writer thread that decouples writing to files 
(access.log and error.log) from the connection threads. The interface 
is the same as unix write(), but it does not block.

I have had this running since yesterday evening on the sample site, and so 
far it seems to help to reduce the random latencies significantly. 
I'll wait with posting data until I have a similar time range to 
compare; maybe the good times I am seeing are accidental.

I have not yet looked into the details of handling the async writer 
with logrotate and shutdown.

On another front, I was experimenting with jemalloc and tcmalloc.
Previously we had the following figures, using Tcl as distributed with 
zippy malloc

  with minthreads=2, maxthreads=2
  requests 10307 spools 1280 queued 2704 connthreads 11 rss 425

When changing minthreads to 5, maxthreads 10, but using jemalloc,
one sees the following figures
   requests 3933 spools 188 queued 67 connthreads 7 rss 488
   requests 6325 spools 256 queued 90 connthreads 9 rss 528
   requests 10021 spools 378 queued 114 connthreads 14 rss 530

Re: [naviserver-devel] naviserver with connection thread queue

2012-11-28 Thread Gustaf Neumann

Dear all,

here is a short update of the findings and changes since last week.

One can now activate for nslog "logpartialtimes", which adds an entry to 
the access log containing the partial request times (accept time, 
queuing time, and run time). The sample site runs now with minthreads == 
5, maxthreads 10.


If one analyzes the data of one day of our sample site, one can see that 
in this case the accept time is always 0 (this is caused by nsssl, 
which is a sync driver; header parsing happens solely in the connection 
thread), and the queue time is on average 54 ms (milliseconds), which is 
huge, while the median is 0.1 ms. The reason is that we can see a 
maximum queue time of 21 secs! As a consequence, the standard deviation 
is huge as well. When one looks more closely at the log file, one can 
see that at times when the queuing time is huge, the runtime is often 
huge as well, leading to cascading effects as described below. Cause and 
reason are hard to determine, since we saw large runtimes even on 
delivery of rather small files.


        accept  queue    run      total
avg     0,0000  0,0547   0,2656   0,3203
stdev   0,0000  0,7481   0,9740   1,4681
min     0,0000  0,0000   0,0134   0,0135
max     0,0000  21,9594  16,6905  29,1668
median  0,0000  0,0001   0,0329   0,0330

The causes for the "random" slow cases are at least from two sources: 
slow delivery (some clients are slow in retrieving data, especially some 
bots seems to be bad) and system specific latencies (e.g. backup 
running, other servers becoming active, etc.). To address this problem, 
i moved more file deliveries to the writer thread (reducing writersize) 
and activated TCP_DEFER_ACCEPT. One sees significant improvements, the 
average queuing time is 4.6 ms, but still the standard deviation is 
huge, caused by some "erratic" some large times.


        accept  queue   run      total
avg     0,0000  0,0046  0,1538   0,1583
stdev   0,0000  0,0934  0,6868   0,6967
min     0,0000  0,0000  0,0137   0,0138
max     0,0000  4,6041  22,4101  22,4102
median  0,0000  0,0001  0,0216   0,0217

There are still some huge values, and the queuing time was "lucky" not 
to be influenced by the still large runtimes of requests. Aside from the 
top values, one sometimes sees unexplained delays of a few seconds, 
which seem to block everything.


A large source of latencies is the file system. On our production 
system we already invested some effort to tune the file system in the 
past, so that we don't see effects as large as on this sample site. Since 
NaviServer should work well also on non-tuned file systems, I 
implemented an async writer thread that decouples writing to files 
(access.log and error.log) from the connection threads. The interface is 
the same as unix write(), but it does not block.
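
A minimal sketch of such a non-blocking write() front end (type and
function names are illustrative, not the actual implementation): callers
enqueue a copy of the buffer and return immediately, while a background
thread performs the blocking write()s; log rolling and shutdown handling
are omitted.

#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

typedef struct LogEntry {
    struct LogEntry *next;
    size_t           length;
    char             data[1];        /* allocated together with the entry */
} LogEntry;

typedef struct AsyncLog {
    pthread_mutex_t  lock;           /* initialised at startup */
    pthread_cond_t   cond;
    LogEntry        *head, *tail;
    int              fd;             /* target log file descriptor */
} AsyncLog;

/* Same calling convention as write(), but never blocks on the file. */
static ssize_t
AsyncLogWrite(AsyncLog *log, const void *buf, size_t len)
{
    LogEntry *e = malloc(sizeof(LogEntry) + len);

    if (e == NULL) {
        return -1;
    }
    e->next = NULL;
    e->length = len;
    memcpy(e->data, buf, len);

    pthread_mutex_lock(&log->lock);
    if (log->tail != NULL) { log->tail->next = e; } else { log->head = e; }
    log->tail = e;
    pthread_cond_signal(&log->cond);
    pthread_mutex_unlock(&log->lock);
    return (ssize_t)len;
}

/* Background thread: drains the queue with ordinary blocking write()s. */
static void *
AsyncLogWriterThread(void *arg)
{
    AsyncLog *log = arg;

    for (;;) {
        LogEntry *e;

        pthread_mutex_lock(&log->lock);
        while (log->head == NULL) {
            pthread_cond_wait(&log->cond, &log->lock);
        }
        e = log->head;
        log->head = e->next;
        if (log->head == NULL) {
            log->tail = NULL;
        }
        pthread_mutex_unlock(&log->lock);

        (void)write(log->fd, e->data, e->length);
        free(e);
    }
    return NULL;
}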


I have had this running since yesterday evening on the sample site, and so 
far it seems to help to reduce the random latencies significantly. I'll 
wait with posting data until I have a similar time range to compare; 
maybe the good times I am seeing are accidental.


I have not yet looked into the details of handling the async writer 
with logrotate and shutdown.


On another front, i was experimenting with jemalloc and tcmalloc.
Previously we had the following figures, using tcl as distributed with 
zippy malloc


  with minthreads=2, maxthreads=2
  requests 10307 spools 1280 queued 2704 connthreads 11 rss 425

When changing minthreads to 5, maxthreads 10, but using jemalloc,
one sees the following figures

   requests 3933 spools 188 queued 67 connthreads 7 rss 488
   requests 6325 spools 256 queued 90 connthreads 9 rss 528
   requests 10021 spools 378 queued 114 connthreads 14 rss 530

It is interesting to see that with always 5 connection threads running 
and using jemalloc, we see an rss consumption only slightly larger than 
with plain tcl and zippy malloc having maxthreads == 2, with fewer 
requests queued.


Similarly, with tcmalloc we see with minthreads to 5, maxthreads 10

   requests 2062 spools 49 queued 3 connthreads 6 rss 376
   requests 7743 spools 429 queued 359 connthreads 11 rss 466
   requests 8389 spools 451 queued 366 connthreads 12 rss 466

which is even better.

For more information on malloc tests see
https://next-scripting.org/xowiki/docs/misc/thread-mallocs
or the tcl-core mailing list.

That's all for now

-gustaf neumann

Am 20.11.12 20:07, schrieb Gustaf Neumann:

Dear all,

The idea of controlling the number of running threads via queuing 
latency is interesting, but i have to look into the details before i 
can comment on this.
before one can consider controlling the number of running threads via 
queuing latency, one has to improve the awareness in NaviServer about 
the various phases in the requests lifetime.


In the experimental version, we have now the following time stamps 
recorded:


-  acceptTime (the time a socket was accepted)
-  requestQueueTime (the time the request was queued; was startTime)
-  requestDequeueTime (the time the request was dequeued)

Re: [naviserver-devel] naviserver with connection thread queue

2012-11-20 Thread Gustaf Neumann

Dear all,

The idea of controlling the number of running threads via queuing 
latency is interesting, but i have to look into the details before i 
can comment on this.
Before one can consider controlling the number of running threads via 
queuing latency, one has to improve the awareness in NaviServer about 
the various phases in the request's lifetime.


In the experimental version, we now have the following time stamps recorded:

-  acceptTime (the time a socket was accepted)
-  requestQueueTime (the time the request was queued; was startTime)
-  requestDequeueTime (the time the request was dequeued)

The difference between requestQueueTime and acceptTime is the setup cost 
and depends on the amount of work the driver does. For instance, nssock 
of NaviServer performs read-ahead, while nsssl does not and passes the 
connection on right away. So, the previously used startTime (which is 
actually the time the request was queued) was not correct for drivers 
with read-ahead. In the experimental version, [ns_conn start] now always 
returns the accept time.


The next paragraph uses the term endTime, which is the time when a 
connection thread is done with a request (either the content was 
delivered, or the content was handed over to a writer thread).


The difference between requestDequeueTime and requestQueueTime is the 
time spent in the queue. The difference between endTime and 
requestDequeueTime is the pure runtime, the difference between endTime 
and acceptTime is the totalTime. As a rough approximation the time 
between requestDequeueTime and acceptTime is pretty much influenced by 
the server setup, and the runtime by the application. I used the term 
"approximation" since the runtime of certain other requests influences 
the queuing time, as we see in the following:


Consider a server with two running connection threads receiving 6 
requests, where requests 2-5 are received in a very short time. The 
first three requests are directly assigned to connection threads and have 
fromQueue == 0. These have queuing times between 88 and 110 micro 
seconds, which includes signal sending/receiving, thread change, and 
initial setup in the connection thread. The runtimes for these requests 
are pretty bad, in the range of 0.24 to 3.8 seconds elapsed time.


[1] waiting 0 current 2 idle 1 ncons 999 fromQueue 0 accept 0.000000 queue 0.000110 run 0.637781 total 0.637891
[2] waiting 3 current 2 idle 0 ncons 998 fromQueue 0 accept 0.000000 queue 0.000090 run 0.245030 total 0.245120
[3] waiting 2 current 2 idle 0 ncons 987 fromQueue 0 accept 0.000000 queue 0.000088 run 0.432421 total 0.432509
[4] waiting 1 current 2 idle 0 ncons 997 fromQueue 1 accept 0.000000 queue 0.244246 run 0.249208 total 0.493454
[5] waiting 0 current 2 idle 0 ncons 986 fromQueue 1 accept 0.000000 queue 0.431545 run 3.713331 total 4.144876
[6] waiting 0 current 2 idle 0 ncons 996 fromQueue 1 accept 0.000000 queue 0.480382 run 3.799818 total 4.280200

Requests [4, 5, 6] are queued, and have queuing times between 0.2 and 
0.5 seconds. The queuing times are pretty much the runtimes of [2, 3, 
4], therefore the runtime determines the queuing time. For example, the 
totalTime of request [4] was 0.493454 secs, half of the time it was 
waiting in the queue. Request [4] can consider itself happy that it was 
not scheduled after [5] or [6], where its totalTime would have been 
likely in the range of 4 secs (10 times slower). Low waiting times are 
essential for good performance.
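
Expressed as a small calculation (the struct and field names below are
illustrative only), the partial times discussed above are simple
differences of the recorded time stamps:

#include <time.h>

typedef struct ConnTimes {
    struct timespec acceptTime;          /* socket accepted              */
    struct timespec requestQueueTime;    /* request put into the queue   */
    struct timespec requestDequeueTime;  /* picked up by a conn thread   */
    struct timespec endTime;             /* delivered / handed to writer */
} ConnTimes;

static double
DiffSeconds(struct timespec from, struct timespec to)
{
    return (double)(to.tv_sec - from.tv_sec)
         + (double)(to.tv_nsec - from.tv_nsec) * 1e-9;
}

static void
PartialTimes(const ConnTimes *t, double *acceptPtr, double *queuePtr,
             double *runPtr, double *totalPtr)
{
    *acceptPtr = DiffSeconds(t->acceptTime, t->requestQueueTime);
    *queuePtr  = DiffSeconds(t->requestQueueTime, t->requestDequeueTime);
    *runPtr    = DiffSeconds(t->requestDequeueTime, t->endTime);
    *totalPtr  = DiffSeconds(t->acceptTime, t->endTime);
}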


This example shows pretty well the importance of async delivery 
mechanisms like the writer thread or bgdelivery in OpenACS. A file being 
delivered by the connection thread over a slow internet connection might 
block later requests substantially (as in the cases above). This is even 
more important for today's web sites, where a single view might entail 
60+ embedded requests for js, css, images, etc., where it is not feasible 
to define hundreds of connection threads.


Before going further into detail, I'll provide further introspection 
mechanisms in the experimental version:


 - [ns_server stats] adding total waiting time
 - [ns_conn queuetime] ... time spent in queue
 - [ns_conn dequeue]  ... time stamp when the req starts to actually 
run (similar to [ns_conn start])


The first can be used for server monitoring, the next two for single 
connection introspection. The queuetime can be useful for better 
awareness and for optional output in the access log, and the dequeue 
time-stamp for application-level profiling as a base for a difference with 
the current time.


Further wishes, suggestions, comments?

-gustaf neumann


Re: [naviserver-devel] naviserver with connection thread queue

2012-11-19 Thread Gustaf Neumann

On 18.11.12 20:34, Stephen Deasey wrote:
> On Sun, Nov 18, 2012 at 1:22 PM, Gustaf Neumann wrote:
>> Here are some actual figures with a comparable number of requests:
>>
>> with minthreads==maxthreads==2
>>    requests 10182 queued 2695 connthreads 11 cpu 00:05:27 rss 415
>>
>> below are the previous values, completed by the number of queuing
>> operations and the rss size in MB
>>
>> with minthreads=2, create when queue >= 2
>>    requests 10104 queued 1584 connthreads 27 cpu 00:06:14 rss 466
>>
>> as anticipated, thread creations and cpu consumption went down, but the
>> number of queued requests (requests that could not be executed
>> immediately) increased significantly.
>
> I was thinking of the opposite: make min/max threads equal by
> increasing min threads. Requests would never stall in the queue,
> unlike the experiment you ran with max threads reduced to min threads.

on the site, we have maxthreads 10. So setting minthreads as 
well to 10 has the consequence of a larger memsize (and 
queued substantially reduced).

> But there's another benefit: unlike the dynamic scenario requests
> would also never stall in the queue when a new thread had to be
> started when min < max threads.

you are talking about NaviServer before 4.99.4. Both the 
version in the tip naviserver repository and the forked 
version already provide warmed-up threads. The version on 
the main tip starts to listen to the wakeup signals only once 
it is warmed up; the version in the fork adds a thread to 
the conn queue as well only after the startup is complete. 
So, in both cases there is no stall. In earlier versions, 
this was as you describe.

> What is the down side to increasing min threads up to max threads?

higher memory consumption, maybe more open database 
connections, allocating resources which are not needed. The 
degree of wastefulness depends certainly on maxthreads. I 
would assume that for an admin carefully watching the server's 
needs, setting minthreads == maxthreads to the "right value" 
can lead to slight improvements as long as the load is rather 
constant over time.

>> Maybe the most significant benefit of a low maxthreads value is the reduced
>> memory consumption. On this machine we are using plain Tcl with its "zippy
>> malloc", which does not release memory (once allocated to its pool) back to
>> the OS. So, the measured memsize depends on the max number of threads with
>> tcl interps, especially with large blueprints (as in the case of OpenACS).
>
> Right: the max number of threads *ever*, not just currently. So by
> killing threads you don't reduce memory usage, but you do increase
> latency for some requests which have to wait for a thread+interp to be
> created.

not really with the warm-up feature.

> Is it convenient to measure latency distribution (not just average)? I
> guess not: we record conn.startTime when a connection is taken out of
> the queue and passed to a conn thread, but we don't record the time
> when a socket was accepted.

we could record the socket accept time and measure the 
difference until the start of the connection runtime; when 
we output this to the access log (like logreqtime) we could 
run whatever statistics we want.

> Actually, managing request latency is another area we don't handle so
> well. You can influence it by adjusting the OS listen socket accept
> queue length, you can adjust the length of the naviserver queue, and
> with the proposed change here you can change how aggressive new
> threads are created to process requests in the queue. But queue-depth
> is a roundabout way of specifying milliseconds of latency. And not
> just round-about but inherently imprecise as different URLs are going
> to require different amounts of time to complete, and which URLs are
> requested is a function of current traffic. If instead of queue size
> you could specify a target latency then we could maybe do smarter
> things with the queue, such as pull requests off the back of the queue
> which have been waiting longer than the target latency, making room
> for fresh requests on the front of the queue.

The idea of controlling the number of running threads via 
queuing latency is interesting, but I have to look into the 
details before I can comment on this.


-gustaf



Re: [naviserver-devel] naviserver with connection thread queue

2012-11-18 Thread Stephen Deasey
On Sun, Nov 18, 2012 at 1:22 PM, Gustaf Neumann  wrote:
> On 14.11.12 09:51, Gustaf Neumann wrote:
>
> On 13.11.12 15:02, Stephen Deasey wrote:
>
> On Tue, Nov 13, 2012 at 11:18 AM, Gustaf Neumann  wrote:
>
> minthreads = 2
>
> creating threads, when idle == 0
> 10468 requests, connthreads 267
> total cputime 00:10:32
>
> creating threads, when queue >= 5
> requests 10104 connthreads 27
> total cputime 00:06:14
>
> What if you set minthreads == maxthreads?
>
> The number of thread create operations will go further down.
>
> Here are some actual figures with a comparable number of requests:
>
> with minthreads==maxthreads==2
>requests 10182 queued 2695 connthreads 11 cpu 00:05:27 rss 415
>
> below are the previous values, completed by the number of queuing operations
> and the rss size in MB
>
> with minthreads=2, create when queue >= 2
>requests 10104 queued 1584 connthreads 27 cpu 00:06:14 rss 466
>
> as anticipated, thread creations and cpu consumption went down, but the
> number of queued requests (requests that could not be executed immediately)
> increased significantly.

I was thinking of the opposite: make min/max threads equal by
increasing min threads. Requests would never stall in the queue,
unlike the experiment you ran with max threads reduced to min threads.
But there's another benefit: unlike the dynamic scenario requests
would also never stall in the queue when a new thread had to be
started when min < max threads.

What is the down side to increasing min threads up to max threads?

> Maybe the most significant benefit of a low maxthreads value is the reduced
> memory consumption. On this machine we are using plain Tcl with its "zippy
> malloc", which does not release memory (once allocated to its pool) back to
> the OS. So, the measured memsize depends on the max number of threads with
> tcl interps, especially with large blueprints (as in the case of OpenACS).

Right: the max number of threads *ever*, not just currently. So by
killing threads you don't reduce memory usage, but you do increase
latency for some requests which have to wait for a thread+interp to be
created.

Is it convenient to measure latency distribution (not just average)? I
guess not: we record conn.startTime when a connection is taken out of
the queue and passed to a conn thread, but we don't record the time
when a socket was accepted.


Actually, managing request latency is another area we don't handle so
well. You can influence it by adjusting the OS listen socket accept
queue length, you can adjust the length of the naviserver queue, and
with the proposed change here you can change how aggressive new
threads are created to process requests in the queue. But queue-depth
is a roundabout way of specifying milliseconds of latency. And not
just round-about but inherently imprecise as different URLs are going
to require different amounts of time to complete, and which URLs are
requested is a function of current traffic. If instead of queue size
you could specify a target latency then we could maybe do smarter
things with the queue, such as pull requests off the back of the queue
which have been waiting longer than the target latency, making room
for fresh requests on the front of the queue.
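
A hedged sketch of the dequeue side of that idea (the Queue and Request
types and the RejectRequest hook are invented for illustration; the real
conn queue differs): when a conn thread asks for work, requests that have
already waited longer than the target latency are discarded, e.g. answered
with an error, so that fresher requests get served instead.

#include <stddef.h>
#include <time.h>

typedef struct Request {
    struct Request *next;
    struct timespec queuedAt;
} Request;

typedef struct Queue {
    Request *head;               /* oldest request */
    Request *tail;               /* newest request */
    double   targetLatency;      /* seconds */
} Queue;

static double
WaitedSeconds(const Request *req, const struct timespec *now)
{
    return (double)(now->tv_sec - req->queuedAt.tv_sec)
         + (double)(now->tv_nsec - req->queuedAt.tv_nsec) * 1e-9;
}

/* Hypothetical reject hook, e.g. send a 503 and close the socket. */
extern void RejectRequest(Request *req);

/* Returns the next request worth running, discarding those that have
 * already waited longer than the target latency. */
static Request *
DequeueWithinLatency(Queue *q)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);

    while (q->head != NULL && WaitedSeconds(q->head, &now) > q->targetLatency) {
        Request *expired = q->head;

        q->head = expired->next;
        if (q->head == NULL) {
            q->tail = NULL;
        }
        RejectRequest(expired);  /* too late to be useful; free the slot */
    }
    return q->head;              /* caller unlinks and runs it, or NULL */
}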



Re: [naviserver-devel] naviserver with connection thread queue

2012-11-18 Thread Gustaf Neumann

On 14.11.12 09:51, Gustaf Neumann wrote:

On 13.11.12 15:02, Stephen Deasey wrote:

On Tue, Nov 13, 2012 at 11:18 AM, Gustaf Neumann  wrote:

minthreads = 2

creating threads, when idle == 0
 10468 requests, connthreads 267
 total cputime 00:10:32

creating threads, when queue >= 5
 requests 10104 connthreads 27
 total cputime 00:06:14

What if you set minthreads == maxthreads?

The number of thread create operations will go further down.
Here are some actual figures with a comparable number of 
requests:


with minthreads==maxthreads==2
   requests 10182 queued 2695 connthreads 11 cpu 00:05:27 rss 415

below are the previous values, completed by the number of queuing
operations and the rss size in MB

with minthreads=2, create when queue >= 2
   requests 10104 queued 1584 connthreads 27 cpu 00:06:14 rss 466

as anticipated, thread creations and cpu consumption went down, but the
number of queued requests (requests that could not be executed
immediately) increased significantly.


Maybe the most significant benefit of a low maxthreads value is the
reduced memory consumption. On this machine we are using plain Tcl with
its "zippy malloc", which does not release memory (once allocated to its
pool) back to the OS. So, the measured memsize depends on the maximum
number of threads with tcl interps, especially with large blueprints (as
in the case of OpenACS). This situation can be improved with e.g.
jemalloc (which we use in production, and which requires a modified
tcl), but after about 2 or 3 days of running a server the rss sizes are
very similar (most likely due to fragmentation).


-gustaf neumann

When running already at minthreads, the connection thread timeout is
ignored (otherwise there would be a high number of thread create
operations just after the timeout expires, to ensure minthreads running
connection threads). With connsperthread == 1000, there will be about 10
thread create operations for 10,000 requests (not counting the 2 initial
create operations during startup for minthreads == 2). So the cpu
consumption will be lower, but the server would not scale when the
request frequency required more connection threads. Furthermore, most
likely more requests will be put into the queue instead of being served
immediately.

If we assume that with minthreads == maxthreads == 2 there won't be more
than, say, 20 requests queued, a similar effect could be achieved by
allowing additional thread creation once more than 20 requests are in
the queue. Even more conservatively, allowing thread creation only when
the request queue is completely full (setting the low water mark to
100%) would still be better than minthreads == maxthreads, since the
server would at least start to create additional threads in this rather
hopeless situation, whereas with minthreads == maxthreads it won't.
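
To connect that to the watermark parameters: assuming a waiting queue of
maxconnections = 100 entries (the numbers here are only for illustration),
the "more than 20 queued requests" rule is the same as a low water mark of
20%:

    set maxconnections 100   ;# assumed queue size, for illustration only
    set queuedThreshold 20   ;# "create a thread once more than 20 requests are queued"
    set lowwatermark [expr {100 * $queuedThreshold / $maxconnections}]
    puts "equivalent low water mark: ${lowwatermark}%"   ;# -> 20%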









Re: [naviserver-devel] naviserver with connection thread queue

2012-11-14 Thread Gustaf Neumann
On 13.11.12 15:02, Stephen Deasey wrote:
> On Tue, Nov 13, 2012 at 11:18 AM, Gustaf Neumann  wrote:
>> minthreads = 2
>>
>> creating threads, when idle == 0
>> 10468 requests, connthreads 267
>> total cputime 00:10:32
>>
>> creating threads, when queue >= 5
>> requests 10104 connthreads 27
>> total cputime 00:06:14
> What if you set minthreads == maxthreads?
The number of thread create operations will go further down. 
When running already at minthreads, the connection thread timeout is
ignored (otherwise there would be a high number of thread create
operations just after the timeout expires, to ensure minthreads running
connection threads). With connsperthread == 1000, there will be about 10
thread create operations for 10,000 requests (not counting the 2 initial
create operations during startup for minthreads == 2). So the cpu
consumption will be lower, but the server would not scale when the
request frequency required more connection threads. Furthermore, most
likely more requests will be put into the queue instead of being served
immediately.

If we assume that with minthreads == maxthreads == 2 there won't be more
than, say, 20 requests queued, a similar effect could be achieved by
allowing additional thread creation once more than 20 requests are in
the queue. Even more conservatively, allowing thread creation only when
the request queue is completely full (setting the low water mark to
100%) would still be better than minthreads == maxthreads, since the
server would at least start to create additional threads in this rather
hopeless situation, whereas with minthreads == maxthreads it won't.







Re: [naviserver-devel] naviserver with connection thread queue

2012-11-13 Thread Stephen Deasey
On Tue, Nov 13, 2012 at 11:18 AM, Gustaf Neumann  wrote:
>
> minthreads = 2
>
> creating threads, when idle == 0
>10468 requests, connthreads 267
>total cputime 00:10:32
>
> creating threads, when queue >= 5
>requests 10104 connthreads 27
>total cputime 00:06:14

What if you set minthreads == maxthreads?



Re: [naviserver-devel] naviserver with connection thread queue

2012-11-13 Thread Gustaf Neumann
Dear all,

again some update:

The mechanism sketched below now works in the regression test as well.
There is now a backchannel in place that lets conn threads notify the
driver to check the liveness of the server. This backchannel also makes
the timeout-based liveness checking obsolete. By using the lowwatermark
parameter to control thread creation, the resource consumption went down
significantly without sacrificing speed for this setup.

Here is some data from next-scripting.org, which is a rather idle site
with real-world traffic (including bots etc.). The server is configured
with minthreads = 2, runs 2 drivers (nssock + nsssl) and uses a writer
thread.

before (creating threads, when idle == 0, running server for 2 days)
   10468 requests, connthreads 267
   total cputime 00:10:32

new (creating threads, when queue >= 5, running server for 2 days)
   requests 10104 connthreads 27
   total cputime 00:06:14

One can see that the number of create operations for connection threads
went down by a factor of 10 (from 267 to 27), and that the cpu
consumption was reduced by about 40% (thread initialization costs 0.64
secs in this configuration). One can get a behavior similar to idle==0
by setting the low water mark to 0.
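
For reference, a configuration sketch for such a setup; the section name and
the numeric values are assumptions for illustration, only the parameter names
come from this discussion:

    ns_section ns/server/server1
    ns_param   minthreads      2
    ns_param   maxthreads      10
    ns_param   connsperthread  1000
    ns_param   threadtimeout   120   ;# seconds a conn thread may stay idle
    ns_param   lowwatermark    5     ;# percent of the queue filled before creating a thread
    ns_param   highwatermark   80    ;# percent filled before threads are created in parallel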

The shutdown mechanism is now adjusted to the new infrastructure 
(connection threads have their own condition variable, so one cannot use 
the old broadcast to all conn threads anymore).

-gustaf neumann


On 07.11.12 02:54, Gustaf Neumann wrote:
> Some update: after some more testing with the new code, i still think, 
> the version is promising, but needs a few tweaks. I have started to 
> address the thread creation.
>
> To sum up the thread creation behavior/configuration of naviserver-tip:
>
>   - minthreads (try to keep at least minthreads threads idle)
>   - spread (fight against thread mass extinction due to round robin)
>   - threadtimeout (useless due to round robin)
>   - connsperthread (the only parameter effectively controlling the 
> lifespan of an conn thread)
>   - maxconnections (controls maximum number of connections in the 
> waiting queue, including running threads)
>   - concurrentcreatethreshold (percentage of waiting queue full, when 
> to create threads in concurrently)
>
> Due to the policy of keeping at least minthreads idle, threads are 
> preallocated when the load is high, the number of threads never falls 
> under minthreads by construct. Threads stop mostly due to connsperthread.
>
> Naviserver with thread queue (fork)
>
>   - minthreads (try to keep at least minthreads threads idle)
>   - threadtimeout (works effectively, default 120 secs)
>   - connsperthread (as before, just not varied via spread)
>   - maxconnections (as before; use maybe "queuesize" instead)
>   - lowwatermark (new)
>   - highwatermark (was concurrentcreatethreshold)
>
> The parameter "spread" is already deleted, since the enqueueing takes 
> care for a certain distribution, at least, when several threads are 
> created. Threads are deleted often before connsperthread due to the 
> timeout. Experiments show furthermore, that the rather agressive 
> preallocation policy with minthreads idle threads causes now much more 
> thread destroy and thread create operations than before. With with 
> OpenACS, thread creation is compute-intense (about 1 sec).
>
> In the experimental version, connections are only queued when no 
> connection thread is available (the tip version places every 
> connection into the queue). Queueing happens with "bulky" requests, 
> when e.g. a view causes a bunch (on average 5, often 10+, sometimes 
> 50+) of requests for embedded resources (style files, javascript, 
> images). It seems that permitting a few queued requests is often a 
> good idea, since the connection threads will pick these up typically 
> very quickly.
>
> To make the aggressiveness of the thread creation policy better 
> configurable, the experimental version uses for this purpose solely 
> the number of queued requests based on two parameters:
>
>   - lowwatermark (if the actual queue size is below this value, don't 
> try to create threads; default 5%)
>   - highwatermark (if the actual queue size is above this value, allow 
> parallel thread creates; default 80%)
>
> To increase the aggressiveness, one could set lowwatermark to e.g. 0, 
> causing thread-creates, whenever a connection is queued. Increasing 
> the lowwatermark reduces the willingness to create new threads. The 
> highwatermark might be useful for benchmark situations, where the 
> queue is filled up quickly.
>
> The default values seems to work quite well, it is used currently on 
> http://next-scripting.org. However we still need some more experiments 
> on different sites to get a better understanding.
>
> hmm final comment: for the regression test, i had to add the policy to 
> create threads, when all connection threads are busy. The config file 
> of the regression test uses connsperthread 0 (which is t

Re: [naviserver-devel] naviserver with connection thread queue

2012-11-06 Thread Gustaf Neumann
Some update: after some more testing with the new code, i still think 
the version is promising, but it needs a few tweaks. I have started to 
address the thread creation.

To sum up the thread creation behavior/configuration of naviserver-tip:

   - minthreads (try to keep at least minthreads threads idle)
   - spread (fight against thread mass extinction due to round robin)
   - threadtimeout (useless due to round robin)
   - connsperthread (the only parameter effectively controlling the lifespan of a conn thread)
   - maxconnections (controls the maximum number of connections in the waiting queue, including running threads)
   - concurrentcreatethreshold (percentage of the waiting queue being full above which threads are created concurrently)

Due to the policy of keeping at least minthreads threads idle, threads
are preallocated when the load is high, and the number of threads never
falls below minthreads by construction. Threads stop mostly due to
connsperthread.

Naviserver with thread queue (fork)

   - minthreads (try to keep at least minthreads threads idle)
   - threadtimeout (works effectively, default 120 secs)
   - connsperthread (as before, just not varied via spread)
   - maxconnections (as before; maybe rename to "queuesize")
   - lowwatermark (new)
   - highwatermark (was concurrentcreatethreshold)

The parameter "spread" is already deleted, since the enqueueing takes
care of a certain distribution, at least when several threads are
created. Threads are often deleted before reaching connsperthread due to
the timeout. Experiments furthermore show that the rather aggressive
preallocation policy with minthreads idle threads now causes many more
thread destroy and thread create operations than before. With OpenACS,
thread creation is compute-intensive (about 1 sec).

In the experimental version, connections are only queued when no
connection thread is available (the tip version places every connection
into the queue). Queueing happens with "bulky" requests, when e.g. a
view causes a bunch (on average 5, often 10+, sometimes 50+) of requests
for embedded resources (style files, javascript, images). It seems that
permitting a few queued requests is often a good idea, since the
connection threads will typically pick these up very quickly.

To make the aggressiveness of the thread creation policy better
configurable, the experimental version bases thread creation solely on
the number of queued requests, controlled by two parameters:

   - lowwatermark (if the actual queue fill is below this value, don't try to create threads; default 5%)
   - highwatermark (if the actual queue fill is above this value, allow parallel thread creates; default 80%)

To increase the aggressiveness, one could set lowwatermark to e.g. 0,
causing thread creates whenever a connection is queued. Increasing the
lowwatermark reduces the willingness to create new threads. The
highwatermark might be useful for benchmark situations, where the queue
fills up quickly.
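
To make the policy concrete, here is a small Tcl model of that decision (the
proc name and its return values are invented; the real logic lives in the C
driver, so treat this as a sketch of the intended behavior, not the
implementation):

    # Decide whether to create a conn thread, based only on how full the
    # waiting queue is, relative to the low and high water marks (percent).
    proc create-thread? {queued queuesize running maxthreads lowPct highPct} {
        if {$queued == 0 || $running >= $maxthreads} {
            return none       ;# nothing waiting, or already at maxthreads
        }
        set fillPct [expr {100.0 * $queued / $queuesize}]
        if {$fillPct < $lowPct} {
            return none       ;# queue nearly empty: don't bother
        } elseif {$fillPct > $highPct} {
            return parallel   ;# queue nearly full: allow concurrent creates
        }
        return single         ;# otherwise create at most one thread at a time
    }

    # 6 of 100 queue slots filled, 2 of 10 threads running, defaults 5%/80%:
    puts [create-thread? 6 100 2 10 5 80]   ;# -> single

With the defaults and a queue of 100 slots this creates nothing while fewer
than 5 requests are queued, a single thread in between, and allows parallel
creates only once more than 80 requests are waiting; lowwatermark 0 makes it
create on every queued connection, lowwatermark 100 only when the queue is
completely full.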

The default values seem to work quite well; they are currently used on
http://next-scripting.org. However, we still need some more experiments
on different sites to get a better understanding.

hmm, final comment: for the regression test, i had to add the policy to
create threads when all connection threads are busy. The config file of
the regression test uses connsperthread 0 (which is the default, but not
very good as such), causing every connection thread to exit after a
single request. So when a request comes in we may have one thread busy
but nothing queued, and therefore seemingly no need to create a new
thread. However, when that conn thread exits, the single queued request
would not be processed.

So, much more testing is needed.
-gustaf neumann

On 01.11.12 20:17, Gustaf Neumann wrote:
> Dear all,
>
> There is now a version on bitbucket, which works quite nice
> and stable, as far i can tell. I have split up the rather
> coarse lock of all pools and introduced finer locks for
> waiting queue (wqueue) and thread queue (tqueue) per pool.
> The changes lead to significant finer lock granularity and
> improve scalability.
>
> I have tested this new version with a synthetic load of 120
> requests per seconds, some slower requests and some faster
> ones, and it appears to be pretty stable. This load keeps
> about 20 connection threads quite busy on my home machine.
> The contention of the new locks is very little: on this test
> we saw 12 busy locks on 217.000 locks on the waiting queue,
> and 9 busy locks out of 83.000 locks on the thread queue.
> These measures are much better than in current naviserver,
> which has on the same test on the queue 248.000 locks with
> 190 busy ones. The total waiting time for locks is reduced
> by a factor of 10. One has to add, that it was not so bad
> before either. The benefit will be larger when multiple
> pools are used.
>
> Finally i think, the code is clearer than before, where the
> lock duration was quite tricky to determine

[naviserver-devel] naviserver with connection thread queue

2012-11-01 Thread Gustaf Neumann
Dear all,

There is now a version on bitbucket, which works quite nicely and
stably, as far as i can tell. I have split up the rather coarse lock
over all pools and introduced finer locks for the waiting queue (wqueue)
and the thread queue (tqueue) per pool. The changes lead to
significantly finer lock granularity and improve scalability.
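
To picture the split, here is a rough Tcl model of the per-pool lock layout,
built on the Thread package (the pool names and the dict layout are invented;
the real structures are C structs inside the server):

    package require Thread

    # Each pool gets its own pair of mutexes: one guarding the waiting queue
    # (wqueue), one guarding the conn thread queue (tqueue).
    array set poolLocks {}
    foreach pool {default slow} {
        set poolLocks($pool) [dict create \
            wqueue [thread::mutex create] \
            tqueue [thread::mutex create]]
    }

    # The driver only needs the waiting-queue mutex to enqueue a request:
    set m [dict get $poolLocks(default) wqueue]
    thread::mutex lock $m
    # ... append the request to the pool's waiting queue ...
    thread::mutex unlock $m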

I have tested this new version with a synthetic load of 120 requests per
second, some slower requests and some faster ones, and it appears to be
pretty stable. This load keeps about 20 connection threads quite busy on
my home machine. The contention on the new locks is very low: in this
test we saw 12 busy locks out of 217,000 lock operations on the waiting
queue, and 9 busy locks out of 83,000 lock operations on the thread
queue. These figures are much better than in current naviserver, which
on the same test shows 248,000 lock operations on the queue, 190 of them
busy. The total waiting time for locks is reduced by a factor of 10. One
has to add that it was not so bad before either. The benefit will be
larger when multiple pools are used.

Finally, i think the code is clearer than before, where the lock
duration was quite tricky to determine.

opinions?
-gustaf neumann

PS: For the changes, see: 
https://bitbucket.org/gustafn/naviserver-connthreadqueue/changesets 

PS2: have not addressed the server exit signaling yet.

On 29.10.12 13:41, Gustaf Neumann wrote:
> A version of this is in the following fork:
>
> https://bitbucket.org/gustafn/naviserver-connthreadqueue/changesets
>
> So far, the competition on the pool mutex is quite high, but
> i think, it can be improved. Currently primarily the pool
> mutex is used for conn thread life-cycle management, and it
> is needed from the main/drivers/spoolers as well from the
> connection threads to update the idle/running/.. counters
> needed for controlling thread creation etc. Differentiating
> these mutexes should help.
>
> i have not addressed the termination signaling, but that's
> rather simple.
>
> -gustaf neumann
>
> On 28.10.12 03:08, Gustaf Neumann wrote:
>> i've just implemented lightweight version of the above (just
>> a few lines of code) by extending the connThread Arg
>> structure; 
>

