Re: squid-smp: synchronization issue & solutions
On Wed, Nov 25, 2009 at 7:48 AM, Amos Jeffries wrote: > On Tue, 24 Nov 2009 16:13:37 -0700, Alex Rousskov > wrote: >> On 11/20/2009 10:59 PM, Robert Collins wrote: >>> On Tue, 2009-11-17 at 08:45 -0700, Alex Rousskov wrote: >> Q1. What are the major areas or units of asynchronous code > execution? >> Some of us may prefer large areas such as "http_port acceptor" or >> "cache" or "server side". Others may root for AsyncJob as the > largest >> asynchronous unit of execution. These two approaches and their >> implications differ a lot. There may be other designs worth >> considering. >> >>> I'd like to let people start writing (and perf testing!) patches. To >>> unblock people. I think the primary questions are: >>> - do we permit multiple approaches inside the same code base. E.g. >>> OpenMP in some bits, pthreads / windows threads elsewhere, and 'job >>> queues' or some such abstraction elsewhere ? >>> (I vote yes, but with caution: someone trying something we don't >>> already do should keep it on a branch and really measure it well until >>> its got plenty of buy in). >> >> I vote for multiple approaches at lower levels of the architecture and >> against multiple approaches at highest level of the architecture. My Q1 >> was only about the highest levels, BTW. >> >> For example, I do not think it is a good idea to allow a combination of >> OpenMP, ACE, and something else as a top-level design. Understanding, >> supporting, and tuning such a mix would be a nightmare, IMO. >> >> On the other hand, using threads within some disk storage schemes while >> using processes for things like "cache" may make a lot of sense, and we >> already have examples of some of that working. >> > > OpenMP seems almost unanimous negative by the people who know it. > OK >> >> This is why I believe that the decision of processes versus threads *at >> the highest level* of the architecture is so important. Yes, we are, >> can, and will use threads at lower levels. 
There is no argument there. >> The question is whether we can also use threads to split Squid into >> several instances of "major areas" like client side(s), cache(s), and >> server side(s). >> >> See Henrik's email on why it is difficult to use threads at highest >> levels. I am not convinced yet, but I do see Henrik's point, and I >> consider the dangers he cites critical for the right Q1 answer. >> >> >>> - If we do *not* permit multiple approaches, then what approach do we >>> want for parallelisation. E.g. a number of long lived threads that take >>> on work, or many transient threads as particular bits of the code need >>> threads. I favour the former (long lived 'worker' threads). >> >> For highest-level models, I do not think that "one job per >> thread/process", "one call per thread/process", or any other "one little >> short-lived something per thread/process" is a good idea. I do believe >> we have to parallelize "major areas", and I think we should support >> multiple instances of some of those "areas" (e.g., multiple client >> sides). Each "major area" would be long-lived process/thread, of course. > > Agreed, mostly. > > As Rob points out, the idea is for one smallish pathway of the code to be > run N times with different state data each time by a single thread. > > Sachin's initial AcceptFD thread proposal would perhaps be an exemplar for > this type of thread, where one thread does the comm layer: accept() through > to the scheduling call hand-off to handlers outside comm. Then it goes back > for the next accept(). > > The only performance issue brought up was by you, that this particular case > might flood the slower main process if done first. Not all code can be done > this way. > > Overheads are simply moving the state data in/out of the thread. IMO > starting/stopping threads too often is a fairly bad idea.
Most events will > end up being grouped together into types (perhaps categorized by > component, perhaps by client request, perhaps by pathway) with a small > thread dedicated to handling that type of call. > >> >> Again for higher-level models, I am also skeptical that it is a good >> idea to just split Squid into N mostly non-cooperating nearly identical >> instances. It may be the right first step, but I would like to offer >> more than that in terms of overall performance and tunability. > > The answer to that is: of all the SMP models we theorize, that one is the > only proven model so far. > Administrators are already doing it with all the instance management > manually handled on quad+ core machines. With a lot of performance success. > > In last night's discussion on IRC we covered what issues are outstanding > from making this automatic and all are resolvable except the cache index. It's > not easily shareable between instances. > >> >> I hope the above explains why I consider Q1 critical for the meant >> "highest level" scope and why "we already use processes and threads" is >> certainly true but irrelevant within that scope.
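The hand-off model discussed above (one thread per call type, with state data moved in and out through a queue) could be sketched roughly as follows. This is an illustrative sketch, not Squid code: the `CallQueue` class and its method names are invented for the example.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

// Sketch of the hand-off described above: the acceptor-style thread runs
// one pathway, and the only state crossing thread boundaries is the call
// queue itself. Names are illustrative assumptions, not Squid's API.
class CallQueue {
public:
    // Called by the producing thread (e.g. the acceptor) to hand work off.
    void post(std::function<void()> call) {
        std::lock_guard<std::mutex> lock(mtx_);
        calls_.push(std::move(call));
        cv_.notify_one();
    }

    // Run one queued call if any is pending; returns whether one ran.
    // A dedicated worker thread would loop on this, blocking on cv_.
    bool drainOne() {
        std::function<void()> call;
        {
            std::lock_guard<std::mutex> lock(mtx_);
            if (calls_.empty())
                return false;
            call = std::move(calls_.front());
            calls_.pop();
        }
        call(); // run outside the lock so handlers cannot deadlock on it
        return true;
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> calls_;
};
```

The overhead Amos mentions is visible here: each hand-off costs a lock acquisition and a copy/move of the state, which is why grouping events by type into one long-lived thread beats starting a thread per event.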
Re: squid-smp: synchronization issue & solutions
On Tue, 24 Nov 2009 16:13:37 -0700, Alex Rousskov wrote: > On 11/20/2009 10:59 PM, Robert Collins wrote: >> On Tue, 2009-11-17 at 08:45 -0700, Alex Rousskov wrote: > Q1. What are the major areas or units of asynchronous code execution? > Some of us may prefer large areas such as "http_port acceptor" or > "cache" or "server side". Others may root for AsyncJob as the largest > asynchronous unit of execution. These two approaches and their > implications differ a lot. There may be other designs worth > considering. > >> I'd like to let people start writing (and perf testing!) patches. To >> unblock people. I think the primary questions are: >> - do we permit multiple approaches inside the same code base. E.g. >> OpenMP in some bits, pthreads / windows threads elsewhere, and 'job >> queues' or some such abstraction elsewhere ? >> (I vote yes, but with caution: someone trying something we don't >> already do should keep it on a branch and really measure it well until >> its got plenty of buy in). > > I vote for multiple approaches at lower levels of the architecture and > against multiple approaches at highest level of the architecture. My Q1 > was only about the highest levels, BTW. > > For example, I do not think it is a good idea to allow a combination of > OpenMP, ACE, and something else as a top-level design. Understanding, > supporting, and tuning such a mix would be a nightmare, IMO. > > On the other hand, using threads within some disk storage schemes while > using processes for things like "cache" may make a lot of sense, and we > already have examples of some of that working. > OpenMP seems almost unanimous negative by the people who know it. > > This is why I believe that the decision of processes versus threads *at > the highest level* of the architecture is so important. Yes, we are, > can, and will use threads at lower levels. There is no argument there. 
> The question is whether we can also use threads to split Squid into > several instances of "major areas" like client side(s), cache(s), and > server side(s). > > See Henrik's email on why it is difficult to use threads at highest > levels. I am not convinced yet, but I do see Henrik's point, and I > consider the dangers he cites critical for the right Q1 answer. > > >> - If we do *not* permit multiple approaches, then what approach do we >> want for parallelisation. E.g. a number of long lived threads that take >> on work, or many transient threads as particular bits of the code need >> threads. I favour the former (long lived 'worker' threads). > > For highest-level models, I do not think that "one job per > thread/process", "one call per thread/process", or any other "one little > short-lived something per thread/process" is a good idea. I do believe > we have to parallelize "major areas", and I think we should support > multiple instances of some of those "areas" (e.g., multiple client > sides). Each "major area" would be long-lived process/thread, of course. Agreed, mostly. As Rob points out, the idea is for one smallish pathway of the code to be run N times with different state data each time by a single thread. Sachin's initial AcceptFD thread proposal would perhaps be an exemplar for this type of thread, where one thread does the comm layer: accept() through to the scheduling call hand-off to handlers outside comm. Then it goes back for the next accept(). The only performance issue brought up was by you, that this particular case might flood the slower main process if done first. Not all code can be done this way. Overheads are simply moving the state data in/out of the thread. IMO starting/stopping threads too often is a fairly bad idea. Most events will end up being grouped together into types (perhaps categorized by component, perhaps by client request, perhaps by pathway) with a small thread dedicated to handling that type of call.
> > Again for higher-level models, I am also skeptical that it is a good > idea to just split Squid into N mostly non-cooperating nearly identical > instances. It may be the right first step, but I would like to offer > more than that in terms of overall performance and tunability. The answer to that is: of all the SMP models we theorize, that one is the only proven model so far. Administrators are already doing it with all the instance management manually handled on quad+ core machines. With a lot of performance success. In last night's discussion on IRC we covered what issues are outstanding from making this automatic and all are resolvable except the cache index. It's not easily shareable between instances. > > I hope the above explains why I consider Q1 critical for the meant > "highest level" scope and why "we already use processes and threads" is > certainly true but irrelevant within that scope. > > > Thank you, > > Alex. Thank you for clarifying that. I now think we are all more or less headed in the same direction(s). With three models proposed for t
Re: squid-smp: synchronization issue & solutions
On Tue, 2009-11-24 at 16:13 -0700, Alex Rousskov wrote: > For example, I do not think it is a good idea to allow a combination of > OpenMP, ACE, and something else as a top-level design. Understanding, > supporting, and tuning such a mix would be a nightmare, IMO. I think that would be hard, yes. > See Henrik's email on why it is difficult to use threads at highest > levels. I am not convinced yet, but I do see Henrik's point, and I > consider the dangers he cites critical for the right Q1 answer. > > - If we do *not* permit multiple approaches, then what approach do we > > want for parallelisation. E.g. a number of long lived threads that take > > on work, or many transient threads as particular bits of the code need > > threads. I favour the former (long lived 'worker' threads). > > For highest-level models, I do not think that "one job per > thread/process", "one call per thread/process", or any other "one little > short-lived something per thread/process" is a good idea. Neither do I. Short-lived things have a high overhead. But consider that a queue of tasks in a single long-lived thread doesn't have the high overhead of making a new thread or process per item in the queue. Using ACLs as an example, ACL checking is callback-based nearly everywhere; we could have a thread that does ACL checking and free up the main thread to continue doing work. Later on, with more auditing, we could have multiple concurrent ACL-checking threads. -Rob
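Rob's ACL example above could look roughly like this. It is only a sketch of the callback-based hand-off he describes: `aclMatches` is an invented stand-in (real Squid ACLs are far richer), and the blocking `join()` is a simplification a real design would avoid.

```cpp
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for an ACL match; not Squid's ACL code.
bool aclMatches(const std::string &srcIp) {
    return srcIp.rfind("10.", 0) == 0; // allow only the 10/8 range
}

// Evaluate checks on a separate thread and deliver each verdict through
// a callback, mirroring the callback-based ACL flow described above.
void checkAcls(const std::vector<std::string> &ips,
               const std::function<void(const std::string &, bool)> &cb) {
    std::thread worker([&] {
        for (const auto &ip : ips)
            cb(ip, aclMatches(ip));
    });
    worker.join(); // sketch only: a real design frees the main thread instead
}
```

Because the existing code already delivers ACL verdicts via callbacks, the caller does not care which thread ran the check, which is what makes this one of the easier pieces to move off the main thread.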
Re: squid-smp: synchronization issue & solutions
On 11/20/2009 10:59 PM, Robert Collins wrote: > On Tue, 2009-11-17 at 08:45 -0700, Alex Rousskov wrote: Q1. What are the major areas or units of asynchronous code execution? Some of us may prefer large areas such as "http_port acceptor" or "cache" or "server side". Others may root for AsyncJob as the largest asynchronous unit of execution. These two approaches and their implications differ a lot. There may be other designs worth considering. > I'd like to let people start writing (and perf testing!) patches. To > unblock people. I think the primary questions are: > - do we permit multiple approaches inside the same code base. E.g. > OpenMP in some bits, pthreads / windows threads elsewhere, and 'job > queues' or some such abstraction elsewhere ? > (I vote yes, but with caution: someone trying something we don't > already do should keep it on a branch and really measure it well until > its got plenty of buy in). I vote for multiple approaches at lower levels of the architecture and against multiple approaches at highest level of the architecture. My Q1 was only about the highest levels, BTW. For example, I do not think it is a good idea to allow a combination of OpenMP, ACE, and something else as a top-level design. Understanding, supporting, and tuning such a mix would be a nightmare, IMO. On the other hand, using threads within some disk storage schemes while using processes for things like "cache" may make a lot of sense, and we already have examples of some of that working. This is why I believe that the decision of processes versus threads *at the highest level* of the architecture is so important. Yes, we are, can, and will use threads at lower levels. There is no argument there. The question is whether we can also use threads to split Squid into several instances of "major areas" like client side(s), cache(s), and server side(s). See Henrik's email on why it is difficult to use threads at highest levels. 
I am not convinced yet, but I do see Henrik's point, and I consider the dangers he cites critical for the right Q1 answer. > - If we do *not* permit multiple approaches, then what approach do we > want for parallelisation. E.g. a number of long lived threads that take > on work, or many transient threads as particular bits of the code need > threads. I favour the former (long lived 'worker' threads). For highest-level models, I do not think that "one job per thread/process", "one call per thread/process", or any other "one little short-lived something per thread/process" is a good idea. I do believe we have to parallelize "major areas", and I think we should support multiple instances of some of those "areas" (e.g., multiple client sides). Each "major area" would be long-lived process/thread, of course. Again for higher-level models, I am also skeptical that it is a good idea to just split Squid into N mostly non-cooperating nearly identical instances. It may be the right first step, but I would like to offer more than that in terms of overall performance and tunability. I hope the above explains why I consider Q1 critical for the meant "highest level" scope and why "we already use processes and threads" is certainly true but irrelevant within that scope. Thank you, Alex.
Re: squid-smp: synchronization issue & solutions
On Tue, Nov 24, 2009 at 6:08 PM, Henrik Nordstrom wrote: > ons 2009-11-25 klockan 00:55 +1300 skrev Amos Jeffries: > >> I kind of mean that by the "smaller units". I'm thinking primarily here >> of the internal DNS. Its API is very isolated from the work. > > And also a good example of where the CPU usage is negligible. > > And no, it's not really that isolated. It's allocating data for the > response which is then handed to the caller, and modified in other parts > of the code via ipcache.. > > But yes, it's a good example of where one can try scheduling the > processing on a separate thread to experiment with such a model. It's not only about how much CPU usage we distribute among threads. We also have to consider that a thread works best inside its own memory, so shared data should be kept small (and it must be). If we can let a thread work inside its own private memory most of the time, it is worth creating the thread: a thread scheduled on a core and accessing its own cache will definitely speed up Squid. We also have to consider how the OS does read/write operations, because all write operations must be done serially when using a WRITE-THROUGH policy to update all levels of memory (cache or main); otherwise there are no issues. > Regards > Henrik > > -- Mr. S. H. Malave Computer Science & Engineering Department, Walchand College of Engineering, Sangli. sachinmal...@wce.org.in
Re: squid-smp: synchronization issue & solutions
ons 2009-11-25 klockan 00:55 +1300 skrev Amos Jeffries: > I kind of mean that by the "smaller units". I'm thinking primarily here > of the internal DNS. Its API is very isolated from the work. And also a good example of where the CPU usage is negligible. And no, it's not really that isolated. It's allocating data for the response which is then handed to the caller, and modified in other parts of the code via ipcache. But yes, it's a good example of where one can try scheduling the processing on a separate thread to experiment with such a model. Regards Henrik
Re: squid-smp: synchronization issue & solutions
Henrik Nordstrom wrote: sön 2009-11-22 klockan 00:12 +1300 skrev Amos Jeffries: I think we can open the doors earlier than after that. I'm happy with an approach that would see the smaller units of Squid growing in parallelism to encompass two full cores. And I have a more careful opinion. Introducing threads in the current Squid core processing is very non-trivial. This is due to the relatively high amount of shared data with no access protection. We already have sufficient nightmares from data access synchronization issues in the current non-threaded design, and trying to synchronize access in threaded operations is many orders of magnitude more complex. The day the code base is cleaned up to the level that one can actually assess what data is being accessed, threads may become a viable discussion; but as things are today it's almost impossible to judge what data will be directly or indirectly accessed by any larger operation. I kind of mean that by the "smaller units". I'm thinking primarily here of the internal DNS. Its API is very isolated from the work. Using threads for micro operations will not help us. The overhead involved in scheduling an operation to a thread is comparable to most operations we are performing, and if we add to this the amount of synchronization needed to shield the data accessed by that operation, then the overhead will in nearly all cases far outweigh the actual processing time of the micro operations, resulting only in a net loss of performance. There are some isolated cases I can think of, like SSL handshake negotiation, where actual processing may be significant, but at the general level I don't see many operations which would be candidates for micro threading. These are the ones I can see without really looking ...
* receive DNS packet
* validate
* add to cache
* schedule event
* repeat
::shared: call event data, IP memory block (copy?), queue access, any stats counted

or the one Sachin found:

* accept connection
* perform NAT if needed
* perform SSL handshakes if needed
* generate connection state objects
* schedule
* repeat
::shared: state data object (write), SSL context (read-only?), call event data, call queue access, any stats counted

or the request body pump is a dead-end for handling:

* read data chunk
* compress/decompress
* write to disk
* write data chunk to client
* repeat
::shared: state data object (read-only, if the thread provides its own data buffer), 2N FD data (read-only), any stats counted

Yes, this last is overkill unless bunching up the concurrency a little/lot in each thread, so the request body data pump can pull/push up to N active client connections at once. Using threads for isolated things like disk I/O is one thing. The code running in those threads is very, very isolated and limited in what it's allowed to do (it may only access the data given to it, and may NOT allocate new data or look up any other global data), but it is still heavily penalized by synchronization overhead. Further, the only reason we have the threaded I/O model is that POSIX AIO does not provide a rich enough interface, missing open/close operations which may both block for a significant amount of time. So we had to implement our own alternative having open/close operations. If you look closely at the threads I/O code you will see that it goes to quite great lengths to isolate the threads from the main code, with obvious performance drawbacks. The initial code even went much further in isolation, but core changes have over time provided a somewhat more suitable environment for some of those operations. For the same reasons I don't see OpenMP as fitting for the problem scope we have.
The strength of OpenMP is to parallelize CPU-intensive operations of the code where those regions are well defined in what data they access, not to deal with a large scale of concurrent operations with access to unknown amounts of shared data. Trying to thread the Squid core engine is in many ways similar to the problems kernel developers have had to fight in making the OS kernels multithreaded, except that we don't even have threads of execution (the OS developers at least had processes). If trying to do the same with the Squid code then we would need an approach like the following:

1. Create a big Squid main lock, always held except for audited regions known to use more fine-grained locking.
2. Set up N threads of execution, all initially fighting for that big main lock in each operation.
3. Gradually work over the code identifying areas where that big lock does not need to be held, transitioning over to more fine-grained locking. Start at the main loops and work down from there.

This is not a path I favor for the Squid code. It's a transition which is larger than the Squid-3 transition, and which has even bigger negative impacts on performance until most of the work has been completed. Another alternative is to start on Squid-4, rewriting the code base completely from scratch, starting from a parallel design, and then plugging in any pieces that can be rescued from earlier Squid generations, if any.
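The per-type pipelines Amos lists (e.g. the DNS receive/validate/cache/schedule loop) could be sketched as the sequence a dedicated thread would run per packet. Everything here is illustrative, not Squid's internal API; `ipCache` stands in for exactly the kind of shared state Henrik warns would need locking.

```cpp
#include <map>
#include <string>

// Sketch of the "receive / validate / add to cache / schedule" DNS loop,
// one iteration per received packet. All names are invented examples.
struct DnsReply {
    std::string name;
    std::string ip;
    bool valid;
};

std::map<std::string, std::string> ipCache; // shared: needs a lock in real use

bool validate(const DnsReply &r) {
    return r.valid && !r.name.empty();
}

bool handleReply(const DnsReply &r) {
    if (!validate(r))
        return false;        // drop a malformed packet
    ipCache[r.name] = r.ip;  // add to cache: the shared-data hazard noted above
    // scheduling the callback back to the waiting job would happen here
    return true;
}
```

The sketch makes Henrik's objection concrete: the loop itself is trivial, but the cache write touches state owned by the rest of the process, so the cost is in the locking, not the computation.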
Re: squid-smp: synchronization issue & solutions
sön 2009-11-22 klockan 00:12 +1300 skrev Amos Jeffries: > I think we can open the doors earlier than after that. I'm happy with an > approach that would see the smaller units of Squid growing in > parallelism to encompass two full cores. And I have a more careful opinion. Introducing threads in the current Squid core processing is very non-trivial. This is due to the relatively high amount of shared data with no access protection. We already have sufficient nightmares from data access synchronization issues in the current non-threaded design, and trying to synchronize access in threaded operations is many orders of magnitude more complex. The day the code base is cleaned up to the level that one can actually assess what data is being accessed, threads may become a viable discussion; but as things are today it's almost impossible to judge what data will be directly or indirectly accessed by any larger operation. Using threads for micro operations will not help us. The overhead involved in scheduling an operation to a thread is comparable to most operations we are performing, and if we add to this the amount of synchronization needed to shield the data accessed by that operation, then the overhead will in nearly all cases far outweigh the actual processing time of the micro operations, resulting only in a net loss of performance. There are some isolated cases I can think of, like SSL handshake negotiation, where actual processing may be significant, but at the general level I don't see many operations which would be candidates for micro threading. Using threads for isolated things like disk I/O is one thing. The code running in those threads is very, very isolated and limited in what it's allowed to do (it may only access the data given to it, and may NOT allocate new data or look up any other global data), but it is still heavily penalized by synchronization overhead.
Further, the only reason we have the threaded I/O model is that POSIX AIO does not provide a rich enough interface, missing open/close operations which may both block for a significant amount of time. So we had to implement our own alternative having open/close operations. If you look closely at the threads I/O code you will see that it goes to quite great lengths to isolate the threads from the main code, with obvious performance drawbacks. The initial code even went much further in isolation, but core changes have over time provided a somewhat more suitable environment for some of those operations. For the same reasons I don't see OpenMP as fitting for the problem scope we have. The strength of OpenMP is to parallelize CPU-intensive operations of the code where those regions are well defined in what data they access, not to deal with a large scale of concurrent operations with access to unknown amounts of shared data. Trying to thread the Squid core engine is in many ways similar to the problems kernel developers have had to fight in making the OS kernels multithreaded, except that we don't even have threads of execution (the OS developers at least had processes). If trying to do the same with the Squid code then we would need an approach like the following:

1. Create a big Squid main lock, always held except for audited regions known to use more fine-grained locking.
2. Set up N threads of execution, all initially fighting for that big main lock in each operation.
3. Gradually work over the code identifying areas where that big lock does not need to be held, transitioning over to more fine-grained locking. Start at the main loops and work down from there.

This is not a path I favor for the Squid code. It's a transition which is larger than the Squid-3 transition, and which has even bigger negative impacts on performance until most of the work has been completed.
Another alternative is to start on Squid-4, rewriting the code base completely from scratch, starting from a parallel design, and then plugging in any pieces that can be rescued from earlier Squid generations, if any. But for obvious staffing reasons this is an approach I do not recommend in this project. It's effectively starting another project, with very little shared with the Squid we have today. For these reasons I am more in favor of multi-process approaches. The amount of work needed for making Squid multi-process capable is fairly limited and mainly revolves around the cache index and a couple of other areas that need to be shared for proper operation. We can fully parallelize Squid today at the process level if we disable the persistent shared cache + digest auth, and this is done by many users already. Squid-2 can even do it on the same http_port, letting the OS schedule connections to the available Squid processes. Regards Henrik
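Henrik's staged locking plan could be sketched as below. This is a toy illustration of the pattern, not a proposal for actual Squid locks: step 1 introduces one big lock, and step 3 migrates an audited region to its own finer lock. The DNS counter is an invented example of such an audited region.

```cpp
#include <mutex>

// Sketch of the staged plan: a big lock guards all unaudited core work,
// while audited regions migrate to fine-grained locks over time.
// All names here are invented for illustration.
std::mutex bigSquidLock;  // step 1: held for everything not yet audited
std::mutex dnsStatsLock;  // step 3: fine-grained lock for one audited region

int dnsLookups = 0;

void recordDnsLookup() {
    // Audited: provably touches only the DNS counter, so it can take
    // the small lock instead of fighting for the big one.
    std::lock_guard<std::mutex> guard(dnsStatsLock);
    ++dnsLookups;
}

void coreWork() {
    // Unaudited core processing stays under the big lock (step 2's
    // N threads would all contend here until step 3 shrinks it).
    std::lock_guard<std::mutex> guard(bigSquidLock);
    // ... shared-data manipulation ...
}
```

The sketch also shows why Henrik expects performance to suffer until most of the work is done: while `coreWork` still covers nearly everything, N threads serialize on `bigSquidLock` and add locking overhead without adding parallelism.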
Re: squid-smp: synchronization issue & solutions
Robert Collins wrote: On Tue, 2009-11-17 at 08:45 -0700, Alex Rousskov wrote: Important features of OPENMP, you might be interested in... ** If your compiler does not support OPENMP then you don't have to do anything special; the compiler simply ignores these #pragmas and runs the code as a sequential single-threaded program, without affecting the end goal. I don't think this is useful to us: all the platforms we consider important have threading libraries. Support does seem widespread. ** Programmers need not create any locking mechanism or worry about critical sections. We have to worry about this because: - OpenMP is designed for large data set manipulation - few of our datasets are large except for: - some ACLs - the main hash table So we'll need separate threads created around large constructs like 'process a request' (unless we take a thread-per-CPU approach and a queue of jobs). Either approach will require careful synchronisation on the 20 or so shared data structures. ** By default it creates a number of threads equal to the number of processors (× cores per processor) in your system. All of the above makes me think that OPENMP-enabled Squid may be significantly slower than multi-instance Squid. I doubt OPENMP is so smart that it can correctly and efficiently orchestrate the work of Squid "threads" that are often not even visible/identifiable in the current code. I think it could, if we had a shared-nothing model under the hood so that we could 'simply' parallelise the front end dispatch and let everything run. However, that doesn't really fit our problem. - Designed for parallelizing computation-intensive programs such as various math models running on massively parallel computers. AFAICT, the OpenMP steering group is comprised of folks that deal with such models in such environments. Our environment and performance goals are rather different.
But that doesn't mean that we cannot have independent threads. It means that there is a high probability that it will not work well for other, very different, problem areas. It may work, but not work well enough. I agree. From my reading OpenMP isn't really suitable to our domain. I've asked around a little and no one has said 'Yes! you should Do It'. The similar servers I know of, like Drizzle (MySQL), do not do it. I think our first questions should instead include: Q1. What are the major areas or units of asynchronous code execution? Some of us may prefer large areas such as "http_port acceptor" or "cache" or "server side". Others may root for AsyncJob as the largest asynchronous unit of execution. These two approaches and their implications differ a lot. There may be other designs worth considering. I'd like to let people start writing (and perf testing!) patches. To unblock people. I think the primary questions are: - do we permit multiple approaches inside the same code base. E.g. OpenMP in some bits, pthreads / windows threads elsewhere, and 'job queues' or some such abstraction elsewhere ? (I vote yes, but with caution: someone trying something we don't already do should keep it on a branch and really measure it well until it's got plenty of buy-in). I'm also in favor of the mixed approach, with care that the particular approach taken at each point is appropriate for the operation being done. For example, I wouldn't place each Call into a process. But a thread each might be arguable. Whereas a Job might be a process with multiple threads, or a thread with async hops in time. - If we do *not* permit multiple approaches, then what approach do we want for parallelisation. E.g. a number of long lived threads that take on work, or many transient threads as particular bits of the code need threads. I favour the former (long lived 'worker' threads).
If we can reach either a 'yes' on the first of these two questions or a decision on the second, then folk can start working on their favourite part of the code base. As long as it's well tested and delivered with appropriate synchronisation, I think the benefit of letting folk scratch itches will be considerable. I know you have processes vs threads as a key question, but I don't actually think it is. I don't think so either. Sounds like a good question, but it's a choice of two alternatives where the best alternative is number 3: both. We _already_ have a mixed environment. The helpers and diskd/unlinkd are perfect examples of having chosen the process model for some small internal units of Squid, and the idns vs dnsserver is an example of the other choice being made. We are not deciding on how to make Squid parallel, but how to make it massively _more_ parallel than it already is. We *already* have significant experience with threads (threaded disk io engine) and multiple processes (diskd io engine, helpers). We shouldn't require a single answer for breaking Squid up, rather good analysis by the person doing the work on breaking a particular bit of it up.
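The OpenMP property quoted above (a compiler without OpenMP support simply ignores the pragma and produces a sequential program with the same result) can be seen in a minimal example. The function itself is invented for illustration; compiled without an OpenMP flag, the `#pragma` is skipped and the loop runs on one thread.

```cpp
#include <vector>

// With OpenMP enabled, the loop iterations are split across threads and
// the partial sums are combined by the reduction clause. Without OpenMP,
// the pragma is ignored and the loop runs sequentially: same result.
long sumOfSquares(const std::vector<int> &values) {
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < static_cast<int>(values.size()); ++i)
        total += static_cast<long>(values[i]) * values[i];
    return total;
}
```

This example also illustrates Robert's point about fit: OpenMP shines on exactly this kind of data-parallel numeric loop over a well-defined array, which is not what Squid's request-processing workload looks like.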
Re: squid-smp: synchronization issue & solutions
On Tue, 2009-11-17 at 08:45 -0700, Alex Rousskov wrote: > > Important features of OPENMP, you might be interested in... > > > > ** If your compiler does not support OPENMP then you don't have to do > > anything special; the compiler simply ignores these #pragmas > > and runs the code as a sequential single-threaded program, > > without affecting the end goal. I don't think this is useful to us: all the platforms we consider important have threading libraries. Support does seem widespread. > > ** Programmers need not create any locking mechanism or worry > > about critical sections. We have to worry about this because: - OpenMP is designed for large data set manipulation - few of our datasets are large except for: - some ACLs - the main hash table So we'll need separate threads created around large constructs like 'process a request' (unless we take a thread-per-CPU approach and a queue of jobs). Either approach will require careful synchronisation on the 20 or so shared data structures. > > ** By default it creates a number of threads equal to the number of > > processors (× cores per processor) in your system. > > All of the above makes me think that OPENMP-enabled Squid may be > significantly slower than multi-instance Squid. I doubt OPENMP is so > smart that it can correctly and efficiently orchestrate the work of > Squid "threads" that are often not even visible/identifiable in the > current code. I think it could, if we had a shared-nothing model under the hood so that we could 'simply' parallelise the front end dispatch and let everything run. However, that doesn't really fit our problem. > >> - Designed for parallelizing computation-intensive programs such as > >> various math models running on massively parallel computers. AFAICT, the > >> OpenMP steering group is comprised of folks that deal with such models > >> in such environments. Our environment and performance goals are rather > >> different.
> >> > > > > But that doesnt mean that we can not have independent threads, > > It means that there is a high probability that it will not work well for > other, very different, problem areas. It may work, but not work well enough. I agree. From my reading, OpenMP isn't really suitable to our domain. I've asked around a little and no one has said 'Yes! You should do it.' The similar servers I know of, like Drizzle (MySQL), do not use it. > >> I think our first questions should instead include: > >> > >> Q1. What are the major areas or units of asynchronous code execution? > >> Some of us may prefer large areas such as "http_port acceptor" or > >> "cache" or "server side". Others may root for AsyncJob as the largest > >> asynchronous unit of execution. These two approaches and their > >> implications differ a lot. There may be other designs worth considering. I'd like to let people start writing (and perf testing!) patches, to unblock people. I think the primary questions are:
 - do we permit multiple approaches inside the same code base? E.g. OpenMP in some bits, pthreads / windows threads elsewhere, and 'job queues' or some such abstraction elsewhere? (I vote yes, but with caution: someone trying something we don't already do should keep it on a branch and really measure it well until it's got plenty of buy-in.)
 - if we do *not* permit multiple approaches, then what approach do we want for parallelisation? E.g. a number of long-lived threads that take on work, or many transient threads as particular bits of the code need threads. I favour the former (long-lived 'worker' threads).
If we can reach either a 'yes' on the first of these two questions or a decision on the second, then folk can start working on their favourite part of the code base. As long as it's well tested and delivered with appropriate synchronisation, I think the benefit of letting folk scratch itches will be considerable. 
I know you have processes vs threads as a key question, but I don't actually think it is. We *already* have significant experience with threads (threaded disk io engine) and multiple processes (diskd io engine, helpers). We shouldn't require a single answer for breaking Squid up; rather, good analysis by the person doing the work on breaking a particular bit of it up. > > I AM THINKING ABOUT HYBRID OF BOTH... > > > > Somebody might implement process model, Then we would merge both > > process and thread models .. together we could have a better squid.. > > :) > > What do u think? > > I doubt we have the resources to do a generic process model so I would > rather decide on a single primary direction (processes or threads) and > try to generalize that later if needed. However, a process (if we decide > to go down that route) may still have lower-level threads, but that is a > secondary question/decision. We could simply adopt ACE wholesale and focus on the Squid-unique bits of the stack. Squid is a pretty typical 'all in one' bundle at the moment; I'd like to see us focus and reuse/split ou
Re: squid-smp: synchronization issue & solutions
Right. That's the easy bit. I could even do that in Squid-2 with a little bit of luck. The hard bit is rewriting the relevant code that relies on cbdata-style reference-counting behaviour. That is the tricky bit. Adrian 2009/11/20 Robert Collins : > On Wed, 2009-11-18 at 10:46 +0800, Adrian Chadd wrote: >> Plenty of kernels nowdays do a bit of TCP and socket process in >> process/thread context; so you need to do your socket TX/RX in >> different processes/threads to get parallelism in the networking side >> of things. > > Very good point. > >> You could fake it somewhat by pushing socket IO into different threads >> but then you have all the overhead of shuffling IO and completed IO >> between threads. This may be .. complicated. > > The event loop I put together for -3 should be able to do that without > changing the loop - just extending the modules that hook into it. > > -Rob >
Re: squid-smp: synchronization issue & solutions
On Wed, 2009-11-18 at 10:46 +0800, Adrian Chadd wrote: > Plenty of kernels nowdays do a bit of TCP and socket process in > process/thread context; so you need to do your socket TX/RX in > different processes/threads to get parallelism in the networking side > of things. Very good point. > You could fake it somewhat by pushing socket IO into different threads > but then you have all the overhead of shuffling IO and completed IO > between threads. This may be .. complicated. The event loop I put together for -3 should be able to do that without changing the loop - just extending the modules that hook into it. -Rob
Re: squid-smp: synchronization issue & solutions
Plenty of kernels nowadays do a bit of TCP and socket processing in process/thread context; so you need to do your socket TX/RX in different processes/threads to get parallelism in the networking side of things. You could fake it somewhat by pushing socket IO into different threads, but then you have all the overhead of shuffling IO and completed IO between threads. This may be .. complicated. Adrian 2009/11/18 Gonzalo Arana : > On Tue, Nov 17, 2009 at 12:45 PM, Alex Rousskov > wrote: >> On 11/17/2009 04:09 AM, Sachin Malave wrote: >> >> >> >>> I AM THINKING ABOUT HYBRID OF BOTH... >>> >>> Somebody might implement process model, Then we would merge both >>> process and thread models .. together we could have a better squid.. >>> :) >>> What do u think? > > In my limited squid expierence, cpu usage is hardly a bottleneck. So, > why not just use smp for the cpu/disk-intensive parts? > > The candidates I can think of are: > * evaluating regular expressions (url_regex acls). > * aufs/diskd (squid already has support for this). > > Best regards, > > -- > Gonzalo A. Arana > >
Re: squid-smp: synchronization issue & solutions
On Tue, 2009-11-17 at 15:49 -0300, Gonzalo Arana wrote: > In my limited squid expierence, cpu usage is hardly a bottleneck. So, > why not just use smp for the cpu/disk-intensive parts? > > The candidates I can think of are: > * evaluating regular expressions (url_regex acls). > * aufs/diskd (squid already has support for this). So, we can drive Squid to 100% CPU in production high-load environments. To scale further we need:
 - more CPUs
 - more performance from the CPUs we have
Adrian is working on the latter, and the SMP discussion is about the former. Simply putting each request in its own thread would go a long way towards getting much more bang for buck - but that's not actually trivial to do :) -Rob
Re: squid-smp: synchronization issue & solutions
On Tue, Nov 17, 2009 at 9:15 PM, Alex Rousskov wrote: > On 11/17/2009 04:09 AM, Sachin Malave wrote: > >>> After spending 2 minutes on openmp.org, I am not very excited about >>> using OpenMP. Please correct me if I am wrong, but OpenMP seems to be: >>> >>> - An "approach" or "model" requiring compiler support and language >>> extensions. It is _not_ a library. You examples with #pragmas is a good >>> illustration. > >> Important features of OPENMP, you might be interested in... >> >> ** If your compiler is not supporting OPENMP then you dont have to do >> any special thing, Compiler simply ignores these #pragmas.. >> and runs codes as if they are in sequential single thread program, >> without affecting the end goal. >> >> ** Programmers need not to create any locking mechanism and worry >> about critical sections, >> >> ** By default it creates number threads equals to processors( * cores >> per processor) in your system. > > All of the above make me think that OPENMP-enabled Squid may be > significantly slower than multi-instance Squid. I doubt OPENMP is so > smart that it can correctly and efficiently orchestrate the work of > Squid "threads" that are often not even visible/identifiable in the > current code. > >>> - Designed for parallelizing computation-intensive programs such as >>> various math models running on massively parallel computers. AFAICT, the >>> OpenMP steering group is comprised of folks that deal with such models >>> in such environments. Our environment and performance goals are rather >>> different. >>> >> >> But that doesnt mean that we can not have independent threads, > > It means that there is a high probability that it will not work well for > other, very different, problem areas. It may work, but not work well enough. > >>> I think our first questions should instead include: >>> >>> Q1. What are the major areas or units of asynchronous code execution? 
>>> Some of us may prefer large areas such as "http_port acceptor" or >>> "cache" or "server side". Others may root for AsyncJob as the largest >>> asynchronous unit of execution. These two approaches and their >>> implications differ a lot. There may be other designs worth considering. >>> >> >> See my sample codes, I sent in last mail.. There i have separated out >> the schedule() and dial() functions, Where one thread is registering >> calls in AsyncCallQueue and another is dispatching them.. >> Well, We can concentrate on other areas also > > scheedule() and dial() are low level routines that are irrelevant for Q1. > >>> Q2. Threads versus processes. Depending on Q1, we may have a choice. The >>> choice will affect the required locking mechanism and other key decisions. >>> >> >> If you are planning to use processes then it is as good as running >> multiple squids on single machine.., > > I am not planning to use processes yet, but if they are indeed as good > as running multiple Squids, that is a plus. Hopefully, we can do better > than multi-instance Squid, but we should be at least as bad/good. > > >> Only thing is they must be >> accepting requests on different ports... But if we want distribute >> single squid's work then i feel threading is the best choice.. > > You can have a process accepting a request and then forwarding the work > to another process or receiving a cache hit from another process. > Inter-process communication is slower than inter-thread communication, > but it is not impossible. > > >> I AM THINKING ABOUT HYBRID OF BOTH... >> >> Somebody might implement process model, Then we would merge both >> process and thread models .. together we could have a better squid.. >> :) >> What do u think? > > I doubt we have the resources to do a generic process model so I would > rather decide on a single primary direction (processes or threads) and > try to generalize that later if needed. 
However, a process (if we decide > to go down that route) may still have lower-level threads, but that is a > secondary question/decision. > OK then, please be precise: what exactly are you thinking? Tell me the areas where I should concentrate. I want to know what is going on in your mind so that I can start working and experimenting in that direction... :) Meanwhile I will also keep experimenting with threading, as I am doing right now; it will help when we start actual development. Is that OK? Thanks.. > Cheers, > > Alex. > -- Mr. S. H. Malave Computer Science & Engineering Department, Walchand College of Engineering, Sangli. sachinmal...@wce.org.in
Re: squid-smp: synchronization issue & solutions
On Tue, Nov 17, 2009 at 12:45 PM, Alex Rousskov wrote: > On 11/17/2009 04:09 AM, Sachin Malave wrote: > > > >> I AM THINKING ABOUT HYBRID OF BOTH... >> >> Somebody might implement process model, Then we would merge both >> process and thread models .. together we could have a better squid.. >> :) >> What do u think? In my limited Squid experience, CPU usage is hardly a bottleneck. So, why not just use SMP for the CPU/disk-intensive parts? The candidates I can think of are:
 * evaluating regular expressions (url_regex acls).
 * aufs/diskd (squid already has support for this).
Best regards, -- Gonzalo A. Arana
Re: squid-smp: synchronization issue & solutions
On 11/17/2009 04:09 AM, Sachin Malave wrote: >> After spending 2 minutes on openmp.org, I am not very excited about >> using OpenMP. Please correct me if I am wrong, but OpenMP seems to be: >> >> - An "approach" or "model" requiring compiler support and language >> extensions. It is _not_ a library. You examples with #pragmas is a good >> illustration. > Important features of OPENMP, you might be interested in... > > ** If your compiler is not supporting OPENMP then you dont have to do > any special thing, Compiler simply ignores these #pragmas.. > and runs codes as if they are in sequential single thread program, > without affecting the end goal. > > ** Programmers need not to create any locking mechanism and worry > about critical sections, > > ** By default it creates number threads equals to processors( * cores > per processor) in your system. All of the above make me think that OPENMP-enabled Squid may be significantly slower than multi-instance Squid. I doubt OPENMP is so smart that it can correctly and efficiently orchestrate the work of Squid "threads" that are often not even visible/identifiable in the current code. >> - Designed for parallelizing computation-intensive programs such as >> various math models running on massively parallel computers. AFAICT, the >> OpenMP steering group is comprised of folks that deal with such models >> in such environments. Our environment and performance goals are rather >> different. >> > > But that doesnt mean that we can not have independent threads, It means that there is a high probability that it will not work well for other, very different, problem areas. It may work, but not work well enough. >> I think our first questions should instead include: >> >> Q1. What are the major areas or units of asynchronous code execution? >> Some of us may prefer large areas such as "http_port acceptor" or >> "cache" or "server side". Others may root for AsyncJob as the largest >> asynchronous unit of execution. 
These two approaches and their >> implications differ a lot. There may be other designs worth considering. >> > > See my sample codes, I sent in last mail.. There i have separated out > the schedule() and dial() functions, Where one thread is registering > calls in AsyncCallQueue and another is dispatching them.. > Well, We can concentrate on other areas also schedule() and dial() are low-level routines that are irrelevant for Q1. >> Q2. Threads versus processes. Depending on Q1, we may have a choice. The >> choice will affect the required locking mechanism and other key decisions. >> > > If you are planning to use processes then it is as good as running > multiple squids on single machine.., I am not planning to use processes yet, but if they are indeed as good as running multiple Squids, that is a plus. Hopefully, we can do better than multi-instance Squid, but we should be at least as bad/good. > Only thing is they must be > accepting requests on different ports... But if we want distribute > single squid's work then i feel threading is the best choice.. You can have a process accepting a request and then forwarding the work to another process or receiving a cache hit from another process. Inter-process communication is slower than inter-thread communication, but it is not impossible. > I AM THINKING ABOUT HYBRID OF BOTH... > > Somebody might implement process model, Then we would merge both > process and thread models .. together we could have a better squid.. > :) > What do u think? I doubt we have the resources to do a generic process model so I would rather decide on a single primary direction (processes or threads) and try to generalize that later if needed. However, a process (if we decide to go down that route) may still have lower-level threads, but that is a secondary question/decision. Cheers, Alex.
Re: squid-smp: synchronization issue & solutions
On Mon, Nov 16, 2009 at 9:43 PM, Alex Rousskov wrote: > On 11/15/2009 11:59 AM, Sachin Malave wrote: > >> Since last few days i am analyzing squid code for smp support, I found >> one big issue regarding debugs() function, It is very hard get rid of >> this issue as it is appearing at almost everywhere in the code. So for >> testing purpose i have disable the debug option in squid.conf as >> follows >> >> --- >> debug_options 0,0 >> --- >> >> Well this was only way, as did not want to spend time on this issue. > > You can certainly disable any feature as an intermediate step as long as > the overall approach allows for the later efficient support of the > temporary disabled feature. Debugging is probably the worst feature to > disable though because without it we do not know much about Squid operation. > I agree, we should find a way to re-enable this feature; it is only temporarily disabled. Of course, locking debugs() was not the solution; that is why it is disabled... > >> Now concentrating on locking mechanism... > > I would not recommend starting with such low-level decisions as locking > mechanisms. We need to decide what needs to be locked first. AFAIK, > there is currently no consensus whether we start with processes or > threads, for example. The locking mechanism would depend on that. > > >> As OpenMP library is widely supported by almost all platforms and >> compilers, I am inheriting locking mechanism from the same >> Just include omp.h & compile code with -fopenmp option if using gcc, >> Other may use similar thing on their platform, Well that is not a big >> issue.. > > After spending 2 minutes on openmp.org, I am not very excited about > using OpenMP. Please correct me if I am wrong, but OpenMP seems to be: > > - An "approach" or "model" requiring compiler support and language > extensions. It is _not_ a library. You examples with #pragmas is a good > illustration. 
> We have to use something to create and manage threads. There are some other libraries and models too, but I feel we need something that will work on all platforms. Important features of OPENMP you might be interested in:

** If your compiler does not support OPENMP then you don't have to do any special thing; the compiler simply ignores these #pragmas and runs the code as if it were a sequential single-threaded program, without affecting the end goal.

** Programmers need not create any locking mechanism or worry about critical sections.

** By default it creates a number of threads equal to the number of processors (* cores per processor) in your system.

** Its fork-and-join model is scalable (of course, we must find such areas in the existing code).

** OPENMP is OLD but still growing, providing new features with new releases. Think about other threading libraries: I think their development has stopped, some of them are not freely available, and some of them are available only on WINDOWS.

** IT IS FREE and OPEN-SOURCE like us.

** INTEL has just released TBB (Threading Building Blocks), but I doubt its performance on AMD (non-Intel) hardware.

** You might be thinking about old pthreads, but I think OPENMP is very safe and better than pthreads for programmers, ESPECIALLY ONE WHO IS MAKING CHANGES IN EXISTING CODE, and easier to debug. Please think about my last point... :)

> - Designed for parallelizing computation-intensive programs such as > various math models running on massively parallel computers. AFAICT, the > OpenMP steering group is comprised of folks that deal with such models > in such environments. Our environment and performance goals are rather > different. > But that doesn't mean that we cannot have independent threads. The only thing is that we have to start these threads in main(), because main() never ends; otherwise those independent threads would die after returning to the calling function. > >> 1. hash_link LOCKED >> >> 2. dlink_list LOCKED >> >> 3. 
ipcache, fqdncache LOCKED, >> >> 4. FD / fde handling ---WELL, SEEMS NOT CREATING PROBLEM, If any then >> please discuss. >> >> 5. statistic counters --- NOT LOCKED ( I know this is very important, >> But these are scattered all around squid code, Write now they may be >> holding wrong values) >> >> 6. memory manager --- DID NOT FOLLOW >> >> 7. configuration objects --- DID NOT FOLLOW > > I worry that the end result of this exercise would produce a slow and > buggy Squid for several reasons: > > - Globally locking low-level but interdependent objects is likely to > create deadlocks when two or more locked objects need to lock other > locked objects in a circular fashion. > Is there any other option? As discussed, Amos is trying to make these areas as independent as possible, so that we would need less locking in the code. > - Locking low-level objects without an overall performance-aware plan is > likely to result in performance-killing competition for critical locks. > I believe that with
Re: squid-smp: synchronization issue & solutions
On 11/15/2009 11:59 AM, Sachin Malave wrote: > Since last few days i am analyzing squid code for smp support, I found > one big issue regarding debugs() function, It is very hard get rid of > this issue as it is appearing at almost everywhere in the code. So for > testing purpose i have disable the debug option in squid.conf as > follows > > --- > debug_options 0,0 > --- > > Well this was only way, as did not want to spend time on this issue. You can certainly disable any feature as an intermediate step as long as the overall approach allows for the later efficient support of the temporarily disabled feature. Debugging is probably the worst feature to disable though because without it we do not know much about Squid operation. > Now concentrating on locking mechanism... I would not recommend starting with such low-level decisions as locking mechanisms. We need to decide what needs to be locked first. AFAIK, there is currently no consensus whether we start with processes or threads, for example. The locking mechanism would depend on that. > As OpenMP library is widely supported by almost all platforms and > compilers, I am inheriting locking mechanism from the same > Just include omp.h & compile code with -fopenmp option if using gcc, > Other may use similar thing on their platform, Well that is not a big > issue.. After spending 2 minutes on openmp.org, I am not very excited about using OpenMP. Please correct me if I am wrong, but OpenMP seems to be: - An "approach" or "model" requiring compiler support and language extensions. It is _not_ a library. Your examples with #pragmas are a good illustration. - Designed for parallelizing computation-intensive programs such as various math models running on massively parallel computers. AFAICT, the OpenMP steering group is comprised of folks that deal with such models in such environments. Our environment and performance goals are rather different. > 1. hash_link LOCKED > > 2. dlink_list LOCKED > > 3. 
ipcache, fqdncache LOCKED, > > 4. FD / fde handling ---WELL, SEEMS NOT CREATING PROBLEM, If any then > please discuss. > > 5. statistic counters --- NOT LOCKED ( I know this is very important, > But these are scattered all around squid code, Write now they may be > holding wrong values) > > 6. memory manager --- DID NOT FOLLOW > > 7. configuration objects --- DID NOT FOLLOW I worry that the end result of this exercise would produce a slow and buggy Squid for several reasons: - Globally locking low-level but interdependent objects is likely to create deadlocks when two or more locked objects need to lock other locked objects in a circular fashion. - Locking low-level objects without an overall performance-aware plan is likely to result in performance-killing competition for critical locks. I believe that with the right design, many locks can be avoided. I think our first questions should instead include: Q1. What are the major areas or units of asynchronous code execution? Some of us may prefer large areas such as "http_port acceptor" or "cache" or "server side". Others may root for AsyncJob as the largest asynchronous unit of execution. These two approaches and their implications differ a lot. There may be other designs worth considering. Q2. Threads versus processes. Depending on Q1, we may have a choice. The choice will affect the required locking mechanism and other key decisions. Thank you, Alex.
Re: squid-smp: synchronization issue & solutions
[NP: eliding recipients I know are getting these mails through squid-dev anyway] On Mon, 16 Nov 2009 12:52:15 +1100, Robert Collins wrote: > On Mon, 2009-11-16 at 00:29 +0530, Sachin Malave wrote: >> Hello, >> >> Since last few days i am analyzing squid code for smp support, I found >> one big issue regarding debugs() function, It is very hard get rid of >> this issue as it is appearing at almost everywhere in the code. So for >> testing purpose i have disable the debug option in squid.conf as >> follows >> >> --- >> debug_options 0,0 >> --- >> >> Well this was only way, as did not want to spend time on this issue. > > Its very important that debugs works. What exactly were the problems identified? > > >> 1. hash_link LOCKED > > Bad idea, not all hashes will be cross-thread, so making the primitive > lock incurs massive overhead for all threads. > >> 2. dlink_list LOCKED > > Ditto. > Aye. These two need to be checked for thread-safe implementations and any locking done in the caller code per the distinctly named hash/dlink. >> 3. ipcache, fqdncache LOCKED, > > Probably important. > >> 4. FD / fde handling ---WELL, SEEMS NOT CREATING PROBLEM, If any then >> please discuss. > > we need analysis and proof, not 'seems to work'. Aye. NP: this is one of the critical data stores in Squid. I wouldn't be far off in generalizing that everything up and down the request handling path uses it, semi-'random access', directly or indirectly. > >> 5. statistic counters --- NOT LOCKED ( I know this is very important, >> But these are scattered all around squid code, Write now they may be >> holding wrong values) > > Will need to be fixed. > >> 6. memory manager --- DID NOT FOLLOW > > Will need attention, e.g. per thread allocators. > >> 7. configuration objects --- DID NOT FOLLOW > > ACL's are not threadsafe. 
> >> AND FINALLY, Two sections in EventLoop.cc are separated and executed >> in two threads simultaneously >> as follows (#pragma lines added in existing code, no other changes) > > I'm not at all sure that splitting the event loop like that is sensible. > > Better to have the dispatcher dispatch to threads. > > -Rob Amos
Re: squid-smp: synchronization issue & solutions
On Mon, 2009-11-16 at 00:29 +0530, Sachin Malave wrote: > Hello, > > Since last few days i am analyzing squid code for smp support, I found > one big issue regarding debugs() function, It is very hard get rid of > this issue as it is appearing at almost everywhere in the code. So for > testing purpose i have disable the debug option in squid.conf as > follows > > --- > debug_options 0,0 > --- > > Well this was only way, as did not want to spend time on this issue. It's very important that debugs() works. > 1. hash_link LOCKED Bad idea, not all hashes will be cross-thread, so making the primitive lock incurs massive overhead for all threads. > 2. dlink_list LOCKED Ditto. > 3. ipcache, fqdncache LOCKED, Probably important. > 4. FD / fde handling ---WELL, SEEMS NOT CREATING PROBLEM, If any then > please discuss. we need analysis and proof, not 'seems to work'. > 5. statistic counters --- NOT LOCKED ( I know this is very important, > But these are scattered all around squid code, Write now they may be > holding wrong values) Will need to be fixed. > 6. memory manager --- DID NOT FOLLOW Will need attention, e.g. per thread allocators. > 7. configuration objects --- DID NOT FOLLOW ACL's are not threadsafe. > AND FINALLY, Two sections in EventLoop.cc are separated and executed > in two threads simultaneously > as follows (#pragma lines added in existing code, no other changes) I'm not at all sure that splitting the event loop like that is sensible. Better to have the dispatcher dispatch to threads. -Rob
squid-smp: synchronization issue & solutions
Hello,

For the last few days I have been analyzing the Squid code for SMP support. I found one big issue regarding the debugs() function; it is very hard to get rid of, as it appears almost everywhere in the code. So for testing purposes I have disabled the debug option in squid.conf as follows:

---
debug_options 0,0
---

Well, this was the only way, as I did not want to spend time on this issue.

Now concentrating on the locking mechanism...

As the OpenMP library is widely supported by almost all platforms and compilers, I am inheriting the locking mechanism from it. Just include omp.h and compile the code with the -fopenmp option if using gcc; others may use a similar thing on their platform. That is not a big issue.

BUT, is it wise to take support from this library? Please discuss this issue. I feel it is really easy to manage threads and critical sections if we use OPENMP, AS DISCUSSED BEFORE, AND details are available at http://wiki.squid-cache.org/Features/SmpScale

I think I have solved SOME critical section problems in the existing squid code.

*AsyncCallQueue.cc*

void AsyncCallQueue::schedule(AsyncCall::Pointer &call)
{
#pragma omp critical (AsyncCallQueueLock_c) // HERE IS THE LOCK
    {
        if (theHead != NULL) { // append
            assert(!theTail->theNext);
            theTail->theNext = call;
            theTail = call;
        } else { // create queue from scratch
            theHead = theTail = call;
        }
    }
}

// AND THEN

AsyncCallQueue::fireNext()
{
    AsyncCall::Pointer call;
#pragma omp critical (AsyncCallQueueLock_c) // SAME LOCK
    {
        call = theHead;
        theHead = call->theNext;
        call->theNext = NULL;
        if (theTail == call)
            theTail = NULL;
    }
}

IT'S WORKING, AS THE SAME CRITICAL SECTION (i.e. AsyncCallQueueLock_c) CANNOT BE ENTERED SIMULTANEOUSLY.

In the same way, the following things, as they appear on /Features/SmpScale, are also locked (maybe incompletely):

1. hash_link LOCKED

2. dlink_list LOCKED

3. ipcache, fqdncache LOCKED,

4. FD / fde handling --- WELL, SEEMS NOT CREATING PROBLEM. If any, then please discuss.

5. 
statistic counters --- NOT LOCKED (I know this is very important, but these are scattered all around the squid code; right now they may be holding wrong values)

6. memory manager --- DID NOT FOLLOW

7. configuration objects --- DID NOT FOLLOW

AND FINALLY, two sections in EventLoop.cc are separated and executed in two threads simultaneously as follows (#pragma lines added to existing code, no other changes):

**EventLoop.cc

#pragma omp parallel sections // PARALLEL SECTIONS
    {
#pragma omp section // THREAD-1
        {
            if (waitingEngine != NULL)
                checkEngine(waitingEngine, true);
            if (timeService != NULL)
                timeService->tick();
            checked = true;
        }
#pragma omp section // THREAD-2
        {
            while (1) {
                if (lastRound == true)
                    break;
                sawActivity = dispatchCalls();
                if (sawActivity)
                    runOnceResult = false;
                if (checked == true)
                    lastRound = true;
            }
        }
    }

It may need deep testing, but it is working. Am I on the right path?

Thank you, -- Mr. S. H. Malave Computer Science & Engineering Department, Walchand College of Engineering, Sangli. sachinmal...@wce.org.in