Re: building async-calls
On Sun, 2007-12-16 at 21:57 +0900, Adrian Chadd wrote: On Sun, Dec 16, 2007, Tsantilas Christos wrote: OK Adrian I fixed this too. You can build the async-calls without enabling of ICAP client. Next question - if I read this code right, a class is instanced for every async callback being scheduled, is this true? Yes this is true. An AsyncCall class instanced for every async callback. And the comm code is going to register one of these per comm events? Have you benchmarked what that'll do to performance? :) Robert's comm code changes in -3 did exactly this, and it trashed performance at high throughput; hence why I started unwinding it.. I find it hard to buy that root cause analysis has been done here. There are *lots* of other changes in -3 that will cause memory allocation; and if allocation is the problem for these async call result classes, then its likely fixable (for instance, in this specific case, a ring buffer allocator may be ideal). There are *python* programs that can saturate gigabit links with a similar high allocation model. -Rob -- GPG key available at: http://www.robertcollins.net/keys.txt. signature.asc Description: This is a digitally signed message part
Re: building async-calls
Hi Adrian, As I am seeing squid3 spends time in EventScheduler::schedule method. This method did not affected by the new AsyncCall code. Also this method is similar with squid 2.6 eventAdd function. The only I can say is that possibly we are scheduling a huge number of events in squid3. If the time spend while creating the ev_entry class possibly we need a (better?) memory allocator. Also there is a for loop in this method. Most event scheduled with when=0 so they attached at the end (or near the end) of the events list (but we are running all the list). If we have a big number of events (1000 or more) it is possible to spend a lot of time here. But if this is the problem it is easy for someone to fix it, we must not worry about it at this time -- Christos CPU: AMD64 processors, speed 2613.46 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 10 samples %image name symbol name 2701344.2713 squidEventScheduler::schedule(char const*, void (*)(void*), void*, double, int, bool) 2618 4.2906 libc-2.6.1.somemcpy 2217 3.6334 libc-2.6.1.somemset 1425 2.3354 squidMemPool::clean(long) 913 1.4963 squidmem_node::dataRange() const Might want to fix that.. Adrian
Re: building async-calls
On Mon, Dec 17, 2007, Tsantilas Christos wrote: Hi Adrian, If you just worry about performance of AsyncCalls/JobCalls design lets waiting to have a progress in AsyncCalls development and see how performs. Hey I'm glad to, as long as people test resource use regressions as they're developing. I do what I can with polygraph here just to make sure that I haven't boned anything locally during development - and occasionally i do put a where = is needed or similar and this blows things up! - and I'm generally quite surprised that I pick up performance issues in squid-3 every time I sit down and look at it. :) About the event subsystem in squid, it is less than 500 lines of C/C++ code and looking in it I have in my mind 2-3 easy to implement hacks to improve significant its performance, if it is really problem. Because it is a small part of squid can easily be rewritten at any time Again, I don't think its a great idea to overload that events system for general callback events. Robert had the right idea - create a seperate event dispatcher, register it, and use that for immediate callbacks. And there are other parts of squid3 which also can improved, for example HTTP parser. I think you'll find the HTTP parser isn't where most of the inefficencies are. :) Profiling locally shows, in roughly this order: * regex handling (refresh_patterns, ACL lookups) * memset (which isn't a problem in squid-2.HEAD as much) * memcpy (which is becoming less of a problem in Squid-2.HEAD + store_copy) * vsnprintf and friends (need to sort -that- one out sometime) * the mempools and system malloc * httpHeaderNameById (need to look into this one!) In -3 you'll probably see headersEnd() high up on the list, and maybe the headers are being parsed a couple times too many. Fixing the allocator overhead doesn't mean we need a faster allocator! just yet, it means maybe we could use the allocator less. The biggest allocator user? HttpHeaderEntry's and String buffers. Fixing memset() has been done by only zeroing buffers that need to be. Again, this becomes less of a point when you're allocating less (as the memset()s are basically only done during allocation.) vsnprintf() is annoying - it seems to be used internally by a number of functions (eg inet_ntoa()) as well as some of the request/reply packing functions. I'll need to look into that some more. memcpy(), in rough order: * storeClientCopy() * storeAppend() * header packing * string copying into http headers during parsing 1) is gone in squid-2.HEAD store_copy and it seems to be fine. I'm working on 4). 2) will take a little more effort as the http.c[c] code is slighly convoluted in its buffer use, but I expect to sort through that later on. 3), well, it depends on how more efficient writev() is. :) All these thinks should not worry us because they can fixed without the need of huge changes in squid3 code, and probably a number of them will fixed before 3.1 release. My opinion is that there are other thinks for which someone should worry about. For example squid (2 or 3) can not use efficiently a multi core/multi processor system. If you have a slow, badly organised codebase (and I'm talking about Squid-* here, not just -2 or -3 in particular) then SMP is just hiding the problem. Ideally I'd like to see Squid on a single CPU, with non-disk workloads, satisfy almost anyone's requirements in the medium term. Other web servers, albeit without doing anywhere near as much as Squid does, can pull off 1gbps/10k req/sec on a modern single core; Squid shouldn't be any different. SMP is important, but in this instance hardware -has- moved on and you'll find SMP will be more important in handling things like inline content processing, ACL matching (regular expression lookups) much more than the raw shovelling-of-HTTP-bits. That stuff is (moderately) easy. :) Adrian
Re: building async-calls
Hi Adrian, I commit some changes to async-calls branch and now compiles (but does not link) without ICAP client enabled. Currently you have to enable ICAP client to build async-calls. The reason is that the src/ICAP/AsyncJob.o used in AsyncCalls but built only as part of ICAP client library. Later AsyncJob.* files will be moved under the src directory and the problem will be fixed (I hope :-) ) Regards, Christos Adrian Chadd wrote: trying to build async-calls: client_side_request.h:67: error: expected class-name before ‘{’ token cc1plus: warnings being treated as errors client_side_request.h:67: warning: ‘class ClientHttpRequest’ has virtual functions but non-virtual destructor client_side_request.h: In member function ‘virtual bool ClientHttpRequest::doneAll() const’: client_side_request.h:158: error: ‘ICAPInitiator’ has not been declared client_side_request.h:159: error: cannot call member function ‘virtual bool AsyncJob::doneAll() const’ without object client_side_request.h:67: error: expected class-name before ‘{’ token make[3]: *** [client_side.o] Error 1 make[3]: *** Waiting for unfinished jobs cc1plus: warnings being treated as errors client_side_request.h:67: warning: ‘class ClientHttpRequest’ has virtual functions but non-virtual destructor client_side_request.h: In member function ‘virtual bool ClientHttpRequest::doneAll() const’: client_side_request.h:158: error: ‘ICAPInitiator’ has not been declared client_side_request.h:159: error: cannot call member function ‘virtual bool AsyncJob::doneAll() const’ without object make[3]: *** [client_side_reply.o] Error 1
Re: building async-calls
On Sun, Dec 16, 2007, Tsantilas Christos wrote: Hi Adrian, I commit some changes to async-calls branch and now compiles (but does not link) without ICAP client enabled. Currently you have to enable ICAP client to build async-calls. The reason is that the src/ICAP/AsyncJob.o used in AsyncCalls but built only as part of ICAP client library. Later AsyncJob.* files will be moved under the src directory and the problem will be fixed (I hope :-) ) Cool. Next question - if I read this code right, a class is instanced for every async callback being scheduled, is this true? Adrian
Re: building async-calls
Adrian Chadd wrote: On Sun, Dec 16, 2007, Tsantilas Christos wrote: Currently you have to enable ICAP client to build async-calls. The reason is that the src/ICAP/AsyncJob.o used in AsyncCalls but built only as part of ICAP client library. OK Adrian I fixed this too. You can build the async-calls without enabling of ICAP client. Next question - if I read this code right, a class is instanced for every async callback being scheduled, is this true? Yes this is true. An AsyncCall class instanced for every async callback. -- Christos
Re: building async-calls
On Sun, Dec 16, 2007, Tsantilas Christos wrote: OK Adrian I fixed this too. You can build the async-calls without enabling of ICAP client. Next question - if I read this code right, a class is instanced for every async callback being scheduled, is this true? Yes this is true. An AsyncCall class instanced for every async callback. And the comm code is going to register one of these per comm events? Have you benchmarked what that'll do to performance? :) Robert's comm code changes in -3 did exactly this, and it trashed performance at high throughput; hence why I started unwinding it.. Adrian
Re: building async-calls
Adrian Chadd wrote: On Sun, Dec 16, 2007, Tsantilas Christos wrote: Yes this is true. An AsyncCall class instanced for every async callback. And the comm code is going to register one of these per comm events? Yes. Have you benchmarked what that'll do to performance? :) Maybe has some performance penalty. But if there is a performance decrease, I do not think that it is huge. Normally creating a class is not more costly than creating a C struct and initialize it. The AsyncCall classes are very simple classes. Moreover, I believe with current design you are loosing more in performance, trying to prevent problems adding extra code for tests or workarounds. Robert's comm code changes in -3 did exactly this, and it trashed performance at high throughput; hence why I started unwinding it.. Adrian
Re: building async-calls
On Sun, Dec 16, 2007, Tsantilas Christos wrote: Have you benchmarked what that'll do to performance? :) Maybe has some performance penalty. But if there is a performance decrease, I do not think that it is huge. Normally creating a class is not more costly than creating a C struct and initialize it. The AsyncCall classes are very simple classes. Well, the previous C comm loop code didn't create a C struct and initialise it. That was the whole point. :) Moreover, I believe with current design you are loosing more in performance, trying to prevent problems adding extra code for tests or workarounds. I suggest just benchmarking it under high transaction load (ie, which pegs the CPU 100%) and see where the CPU is going. You'll see the memory allocator taking up quite a bit of CPU time. Remove the low-hanging fruit which will be eliminated with a few easy changes (headersEnd() taking loads of CPU, memcpy(), memset()) and if its anything like what I saw before you'll see the allocator taking up more CPU than is fair.. Adrian
Re: building async-calls
On Sun, Dec 16, 2007, Tsantilas Christos wrote: Have you benchmarked what that'll do to performance? :) Maybe has some performance penalty. But if there is a performance decrease, I do not think that it is huge. Normally creating a class is not more costly than creating a C struct and initialize it. The AsyncCall classes are very simple classes. Well, the previous C comm loop code didn't create a C struct and initialise it. That was the whole point. :) Moreover, I believe with current design you are loosing more in performance, trying to prevent problems adding extra code for tests or workarounds. I suggest just benchmarking it under high transaction load (ie, which pegs the CPU 100%) and see where the CPU is going. You'll see the memory allocator taking up quite a bit of CPU time. Remove the low-hanging fruit which will be eliminated with a few easy changes (headersEnd() taking loads of CPU, memcpy(), memset()) and if its anything like what I saw before you'll see the allocator taking up more CPU than is fair.. Adrian Adrian, I've tried to do some profiling myself recently but am stuck getting those nice stats you post out of it. (Last time I did profiling was in VisualStudio). Could you send me or the list a how-to on using the cpu-profiling feature, from enable to viewing stats please? Amos
Re: building async-calls
On Mon, Dec 17, 2007, Amos Jeffries wrote: Adrian, I've tried to do some profiling myself recently but am stuck getting those nice stats you post out of it. (Last time I did profiling was in VisualStudio). Could you send me or the list a how-to on using the cpu-profiling feature, from enable to viewing stats please? * install linux * install oprofile * read the docs * then, find the wiki page outlining how its done. :) I've been meaning to write a shim or two to overload the -pg feature in GCC - it'll call your function enter/exit stubs with some stack information - so I can generate call graphs inside Squid without having to do the inaccurate gprof time profiling. Adrian
Re: building async-calls
On Mon, Dec 17, 2007, Amos Jeffries wrote: Adrian, I've tried to do some profiling myself recently but am stuck getting those nice stats you post out of it. (Last time I did profiling was in VisualStudio). Could you send me or the list a how-to on using the cpu-profiling feature, from enable to viewing stats please? * install linux pre-done :-) * install oprofile ah, thats why I couldn't find 'profile'. * read the docs * then, find the wiki page outlining how its done. :) this? http://wiki.squid-cache.org/LinuxOprofile I've been meaning to write a shim or two to overload the -pg feature in GCC - it'll call your function enter/exit stubs with some stack information - so I can generate call graphs inside Squid without having to do the inaccurate gprof time profiling. That went right over my head, all but the 'expensive ... time profiling' which kind of sticks to something from wayback. Thans Amos
building async-calls
trying to build async-calls: client_side_request.h:67: error: expected class-name before ???{??? token cc1plus: warnings being treated as errors client_side_request.h:67: warning: ???class ClientHttpRequest??? has virtual functions but non-virtual destructor client_side_request.h: In member function ???virtual bool ClientHttpRequest::doneAll() const???: client_side_request.h:158: error: ???ICAPInitiator??? has not been declared client_side_request.h:159: error: cannot call member function ???virtual bool AsyncJob::doneAll() const??? without object client_side_request.h:67: error: expected class-name before ???{??? token make[3]: *** [client_side.o] Error 1 make[3]: *** Waiting for unfinished jobs cc1plus: warnings being treated as errors client_side_request.h:67: warning: ???class ClientHttpRequest??? has virtual functions but non-virtual destructor client_side_request.h: In member function ???virtual bool ClientHttpRequest::doneAll() const???: client_side_request.h:158: error: ???ICAPInitiator??? has not been declared client_side_request.h:159: error: cannot call member function ???virtual bool AsyncJob::doneAll() const??? without object make[3]: *** [client_side_reply.o] Error 1