Re: building async-calls

2007-12-17 Thread Robert Collins

On Sun, 2007-12-16 at 21:57 +0900, Adrian Chadd wrote:

 On Sun, Dec 16, 2007, Tsantilas Christos wrote:
 
  OK Adrian I fixed this too. You can build the async-calls without
  enabling of ICAP client.
  
   Next question - if I read this code right, a class is instanced
 for every
   async callback being scheduled, is this true?
 
  Yes this is true. An AsyncCall class instanced for every async
 callback.
 
 And the comm code is going to register one of these per comm events?
 Have you benchmarked what that'll do to performance? :)
 
 Robert's comm code changes in -3 did exactly this, and it trashed
 performance
 at high throughput; hence why I started unwinding it..

I find it hard to buy that root cause analysis has been done here. There
are *lots* of other changes in -3 that will cause memory allocation; and
if allocation is the problem for these async call result classes, then
its likely fixable (for instance, in this specific case, a ring buffer
allocator may be ideal).

There are *python* programs that can saturate gigabit links with a
similar high allocation model.

-Rob


-- 
GPG key available at: http://www.robertcollins.net/keys.txt.


signature.asc
Description: This is a digitally signed message part


Re: building async-calls

2007-12-17 Thread Tsantilas Christos
Hi Adrian,
  As I am seeing squid3 spends time in  EventScheduler::schedule method.
This method did not affected by the new AsyncCall code. Also this method
is similar with squid 2.6 eventAdd function.

The only I can say is that possibly we are scheduling a huge number of
events in squid3.
If the time spend while creating the ev_entry class possibly we need a
(better?) memory allocator.

Also there is a for loop in this method. Most event scheduled with when=0
so they attached at the end (or near the end) of the events list (but we
are running all the list). If we have a big number of events (1000 or
more) it is possible to spend a lot of time here.
But if this is the problem it is easy for someone to fix it, we must not
worry about it at this time

--
  Christos


 CPU: AMD64 processors, speed 2613.46 MHz (estimated)
 Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
 mask of 0x00 (No unit mask) count 10
 samples  %image name   symbol name
 2701344.2713  squidEventScheduler::schedule(char
 const*, void (*)(void*), void*, double, int, bool)
 2618  4.2906  libc-2.6.1.somemcpy
 2217  3.6334  libc-2.6.1.somemset
 1425  2.3354  squidMemPool::clean(long)
 913   1.4963  squidmem_node::dataRange() const

 Might want to fix that..



 Adrian






Re: building async-calls

2007-12-17 Thread Adrian Chadd
On Mon, Dec 17, 2007, Tsantilas Christos wrote:
 Hi Adrian,
 
 If you just worry about performance of AsyncCalls/JobCalls design lets
 waiting to have a progress in AsyncCalls development and see how performs.

Hey I'm glad to, as long as people test resource use regressions as they're
developing.

I do what I can with polygraph here just to make sure that I haven't boned
anything locally during development - and occasionally i do put a  where =
is needed or similar and this blows things up! - and I'm generally quite
surprised that I pick up performance issues in squid-3 every time I sit
down and look at it. :)

 About the event subsystem in squid, it is  less than 500 lines of C/C++
 code and looking in it I have in my mind 2-3 easy to implement hacks to
 improve significant its performance, if it is really problem. Because it
 is a small part  of squid can easily be rewritten at any time

Again, I don't think its a great idea to overload that events system for
general callback events. Robert had the right idea - create a seperate
event dispatcher, register it, and use that for immediate callbacks.

 And there are other parts of squid3 which also can improved, for example
 HTTP parser.

I think you'll find the HTTP parser isn't where most of the inefficencies
are. :) Profiling locally shows, in roughly this order:

* regex handling (refresh_patterns, ACL lookups)
* memset (which isn't a problem in squid-2.HEAD as much)
* memcpy (which is becoming less of a problem in Squid-2.HEAD + store_copy)
* vsnprintf and friends (need to sort -that- one out sometime)
* the mempools and system malloc
* httpHeaderNameById (need to look into this one!)

In -3 you'll probably see headersEnd() high up on the list, and maybe the 
headers
are being parsed a couple times too many.

Fixing the allocator overhead doesn't mean we need a faster allocator! just 
yet,
it means maybe we could use the allocator less. The biggest allocator user?
HttpHeaderEntry's and String buffers.

Fixing memset() has been done by only zeroing buffers that need to be.
Again, this becomes less of a point when you're allocating less (as the 
memset()s
are basically only done during allocation.)

vsnprintf() is annoying - it seems to be used internally by a number of 
functions
(eg inet_ntoa()) as well as some of the request/reply packing functions. I'll
need to look into that some more.

memcpy(), in rough order:

* storeClientCopy()
* storeAppend()
* header packing
* string copying into http headers during parsing

1) is gone in squid-2.HEAD store_copy and it seems to be fine. I'm working on 
4).
2) will take a little more effort as the http.c[c] code is slighly convoluted in
   its buffer use, but I expect to sort through that later on. 3), well, it 
depends
   on how more efficient writev() is. :)


 All these thinks should not worry us because they can fixed without the
 need of huge changes in squid3 code, and probably a number of them will
 fixed before 3.1 release.
 
 My opinion is that there are other thinks for which someone should worry
  about. For example squid (2 or 3) can not use efficiently a multi
 core/multi processor system.

If you have a slow, badly organised codebase (and I'm talking about Squid-*
here, not just -2 or -3 in particular) then SMP is just hiding the problem.
Ideally I'd like to see Squid on a single CPU, with non-disk workloads,
satisfy almost anyone's requirements in the medium term. Other web servers,
albeit without doing anywhere near as much as Squid does, can pull off
1gbps/10k req/sec on a modern single core; Squid shouldn't be any different.
SMP is important, but in this instance hardware -has- moved on and you'll
find SMP will be more important in handling things like inline content
processing, ACL matching (regular expression lookups) much more than the
raw shovelling-of-HTTP-bits. That stuff is (moderately) easy. :)





Adrian



Re: building async-calls

2007-12-16 Thread Tsantilas Christos
Hi Adrian,
  I commit some changes to async-calls branch and now compiles (but does
not link) without ICAP client enabled.

Currently you have to enable ICAP client to build async-calls. The
reason is that the src/ICAP/AsyncJob.o used in AsyncCalls but built only
as part of ICAP client library.
Later AsyncJob.* files will be moved under the src directory and the
problem will be fixed (I hope :-) )

Regards,
   Christos

Adrian Chadd wrote:
 trying to build async-calls:
 
 
 client_side_request.h:67: error: expected class-name before ‘{’ token
 cc1plus: warnings being treated as errors
 client_side_request.h:67: warning: ‘class ClientHttpRequest’ has virtual 
 functions but non-virtual destructor
 client_side_request.h: In member function ‘virtual bool 
 ClientHttpRequest::doneAll() const’:
 client_side_request.h:158: error: ‘ICAPInitiator’ has not been declared
 client_side_request.h:159: error: cannot call member function ‘virtual bool 
 AsyncJob::doneAll() const’ without object
 client_side_request.h:67: error: expected class-name before ‘{’ token
 make[3]: *** [client_side.o] Error 1
 make[3]: *** Waiting for unfinished jobs
 cc1plus: warnings being treated as errors
 client_side_request.h:67: warning: ‘class ClientHttpRequest’ has virtual 
 functions but non-virtual destructor
 client_side_request.h: In member function ‘virtual bool 
 ClientHttpRequest::doneAll() const’:
 client_side_request.h:158: error: ‘ICAPInitiator’ has not been declared
 client_side_request.h:159: error: cannot call member function ‘virtual bool 
 AsyncJob::doneAll() const’ without object
 make[3]: *** [client_side_reply.o] Error 1
 
 



Re: building async-calls

2007-12-16 Thread Adrian Chadd
On Sun, Dec 16, 2007, Tsantilas Christos wrote:
 Hi Adrian,
   I commit some changes to async-calls branch and now compiles (but does
 not link) without ICAP client enabled.
 
 Currently you have to enable ICAP client to build async-calls. The
 reason is that the src/ICAP/AsyncJob.o used in AsyncCalls but built only
 as part of ICAP client library.
 Later AsyncJob.* files will be moved under the src directory and the
 problem will be fixed (I hope :-) )

Cool.

Next question - if I read this code right, a class is instanced for every
async callback being scheduled, is this true?




Adrian



Re: building async-calls

2007-12-16 Thread Tsantilas Christos
Adrian Chadd wrote:
 On Sun, Dec 16, 2007, Tsantilas Christos wrote:
 Currently you have to enable ICAP client to build async-calls. The
 reason is that the src/ICAP/AsyncJob.o used in AsyncCalls but built only
 as part of ICAP client library.

OK Adrian I fixed this too. You can build the async-calls without
enabling of ICAP client.

 Next question - if I read this code right, a class is instanced for every
 async callback being scheduled, is this true?
 
Yes this is true. An AsyncCall class instanced for every async callback.

-- 
  Christos



Re: building async-calls

2007-12-16 Thread Adrian Chadd
On Sun, Dec 16, 2007, Tsantilas Christos wrote:

 OK Adrian I fixed this too. You can build the async-calls without
 enabling of ICAP client.
 
  Next question - if I read this code right, a class is instanced for every
  async callback being scheduled, is this true?

 Yes this is true. An AsyncCall class instanced for every async callback.

And the comm code is going to register one of these per comm events?
Have you benchmarked what that'll do to performance? :)

Robert's comm code changes in -3 did exactly this, and it trashed performance
at high throughput; hence why I started unwinding it..



Adrian



Re: building async-calls

2007-12-16 Thread Tsantilas Christos
Adrian Chadd wrote:
 On Sun, Dec 16, 2007, Tsantilas Christos wrote:
 
 
 Yes this is true. An AsyncCall class instanced for every async callback.
 
 And the comm code is going to register one of these per comm events?
Yes.

 Have you benchmarked what that'll do to performance? :)

Maybe has some performance penalty. But if there is a performance
decrease, I do not think that it is huge.
Normally creating a class is not more costly than creating a C struct
and initialize it. The AsyncCall classes are very simple classes.

Moreover, I believe with current design you are loosing more in
performance, trying to prevent problems adding extra code for tests or
workarounds.

 
 Robert's comm code changes in -3 did exactly this, and it trashed performance
 at high throughput; hence why I started unwinding it..
 
 
 
 Adrian
 
 



Re: building async-calls

2007-12-16 Thread Adrian Chadd
On Sun, Dec 16, 2007, Tsantilas Christos wrote:

  Have you benchmarked what that'll do to performance? :)
 
 Maybe has some performance penalty. But if there is a performance
 decrease, I do not think that it is huge.
 Normally creating a class is not more costly than creating a C struct
 and initialize it. The AsyncCall classes are very simple classes.

Well, the previous C comm loop code didn't create a C struct and
initialise it. That was the whole point. :)

 Moreover, I believe with current design you are loosing more in
 performance, trying to prevent problems adding extra code for tests or
 workarounds.

I suggest just benchmarking it under high transaction load (ie,
which pegs the CPU 100%) and see where the CPU is going.
You'll see the memory allocator taking up quite a bit of CPU
time. Remove the low-hanging fruit which will be eliminated with
a few easy changes (headersEnd() taking loads of CPU, memcpy(),
memset()) and if its anything like what I saw before you'll see
the allocator taking up more CPU than is fair..



Adrian



Re: building async-calls

2007-12-16 Thread Amos Jeffries
 On Sun, Dec 16, 2007, Tsantilas Christos wrote:

  Have you benchmarked what that'll do to performance? :)

 Maybe has some performance penalty. But if there is a performance
 decrease, I do not think that it is huge.
 Normally creating a class is not more costly than creating a C struct
 and initialize it. The AsyncCall classes are very simple classes.

 Well, the previous C comm loop code didn't create a C struct and
 initialise it. That was the whole point. :)

 Moreover, I believe with current design you are loosing more in
 performance, trying to prevent problems adding extra code for tests or
 workarounds.

 I suggest just benchmarking it under high transaction load (ie,
 which pegs the CPU 100%) and see where the CPU is going.
 You'll see the memory allocator taking up quite a bit of CPU
 time. Remove the low-hanging fruit which will be eliminated with
 a few easy changes (headersEnd() taking loads of CPU, memcpy(),
 memset()) and if its anything like what I saw before you'll see
 the allocator taking up more CPU than is fair..

 Adrian


Adrian, I've tried to do some profiling myself recently but am stuck
getting those nice stats you post out of it. (Last time I did profiling
was in VisualStudio).

Could you send me or the list a how-to on using the cpu-profiling feature,
from enable to viewing stats please?

Amos




Re: building async-calls

2007-12-16 Thread Adrian Chadd
On Mon, Dec 17, 2007, Amos Jeffries wrote:

 
 Adrian, I've tried to do some profiling myself recently but am stuck
 getting those nice stats you post out of it. (Last time I did profiling
 was in VisualStudio).
 
 Could you send me or the list a how-to on using the cpu-profiling feature,
 from enable to viewing stats please?

* install linux
* install oprofile
* read the docs
* then, find the wiki page outlining how its done. :)

I've been meaning to write a shim or two to overload the -pg feature in GCC -
it'll call your function enter/exit stubs with some stack information -
so I can generate call graphs inside Squid without having to do the inaccurate
gprof time profiling.



Adrian



Re: building async-calls

2007-12-16 Thread Amos Jeffries
 On Mon, Dec 17, 2007, Amos Jeffries wrote:


 Adrian, I've tried to do some profiling myself recently but am stuck
 getting those nice stats you post out of it. (Last time I did profiling
 was in VisualStudio).

 Could you send me or the list a how-to on using the cpu-profiling
 feature,
 from enable to viewing stats please?

 * install linux
pre-done :-)
 * install oprofile
ah, thats why I couldn't find 'profile'.

 * read the docs
 * then, find the wiki page outlining how its done. :)

this?
http://wiki.squid-cache.org/LinuxOprofile


 I've been meaning to write a shim or two to overload the -pg feature in
 GCC -
 it'll call your function enter/exit stubs with some stack information -
 so I can generate call graphs inside Squid without having to do the
 inaccurate
 gprof time profiling.

That went right over my head, all but the 'expensive ... time profiling'
which kind of sticks to something from wayback.

Thans

Amos




building async-calls

2007-12-15 Thread Adrian Chadd
trying to build async-calls:


client_side_request.h:67: error: expected class-name before ???{??? token
cc1plus: warnings being treated as errors
client_side_request.h:67: warning: ???class ClientHttpRequest??? has virtual 
functions but non-virtual destructor
client_side_request.h: In member function ???virtual bool 
ClientHttpRequest::doneAll() const???:
client_side_request.h:158: error: ???ICAPInitiator??? has not been declared
client_side_request.h:159: error: cannot call member function ???virtual bool 
AsyncJob::doneAll() const??? without object
client_side_request.h:67: error: expected class-name before ???{??? token
make[3]: *** [client_side.o] Error 1
make[3]: *** Waiting for unfinished jobs
cc1plus: warnings being treated as errors
client_side_request.h:67: warning: ???class ClientHttpRequest??? has virtual 
functions but non-virtual destructor
client_side_request.h: In member function ???virtual bool 
ClientHttpRequest::doneAll() const???:
client_side_request.h:158: error: ???ICAPInitiator??? has not been declared
client_side_request.h:159: error: cannot call member function ???virtual bool 
AsyncJob::doneAll() const??? without object
make[3]: *** [client_side_reply.o] Error 1