On Friday 04 January 2013, Daniel Lescohier wrote:
> I just saw this from last month from Stefan Fritsch and Niklas
> Edmundsson:
>
> >> The fact that the client ip shows up on all threads points to some
> >> potential optimization: Recently active threads should be
> >> preferred, because their memory is more likely to be in the cpu
> >> caches. Right now, the thread that has been idle the longest
> >> time will do the work.
> >>
> >> Ah, virtually guaranteeing that the thread with the coldest
> >> cache gets to do the work...
> >
> > I definitely agree on the potential for improvement here, you would
> > most likely want to select either the thread that processed this
> > request last time, or the most recently active idle thread.
> > These two conditions kinda collides though, so the challenge is
> > probably to come up with some rather cheap selection algorithm
> > that is good enough.
>
> Which CPU memory caches are you referring to?
>
> 1. For stack, on a new request, you're writing a new call stack,
> the prior request's stack was unwound.

Even if the previous request's stack memory is freed, the new request
will use the same memory, which will likely be in the cpu cache. And
since the stack is written to in portions smaller than a cache line,
the cpu would have to read the stack from memory otherwise.

> 2. For heap, you're creating a new request pool, so you will be
> popping an apr memnode from the head of the allocator's free list.
> It may even be the same memnode that was pushed onto the free
> list when the prior request's apr pool was freed.

Mpm event uses per-connection allocators. The most recently freed
allocator will be used first. If this is done by the most recently
active thread, it may be more likely that the allocator memory is in
the cache of the correct cpu. But this may be more complex in
practice. Besides, it is quite possible that the most recently used
memnode is not at the top of the allocator.

> 3. On a new request, the next code/instructions it will execute
> will be the functions that reads the http request and populates
> the request_rec. That's different code than happened at the end of
> the prior request (serving the request, logging the request,
> etc.).

If a request is handled by a previously inactive thread, it may be
more likely that that thread is scheduled on a cpu core that did not
previously run httpd at all. This would then cause a cold instruction
cache and require a context switch. However, this depends very much on
the OS and the details of the workload.

> So, I'm not convinced a thread selection algorithm is needed.

For 1., a better thread selection would definitely be a win. For 2.
and 3., it is less obvious.
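
To make the two selection policies concrete, here is a rough sketch of
an idle-worker list that can hand out either the longest-idle thread
(the current behaviour described above) or the most recently active
one. This is not httpd's actual code: idle_list, idle_push,
idle_pop_oldest and idle_pop_newest are hypothetical names, workers
are plain integer ids, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IDLE 64

/* Hypothetical idle-worker list (illustration only, not httpd's
 * data structure). Workers are stored in the order they went idle. */
typedef struct {
    int ids[MAX_IDLE];  /* ring buffer of idle worker ids */
    size_t head;        /* index of the longest-idle worker */
    size_t count;       /* number of idle workers */
} idle_list;

static void idle_init(idle_list *l)
{
    l->head = 0;
    l->count = 0;
}

/* Record that a worker has just finished a request and is idle. */
static void idle_push(idle_list *l, int id)
{
    assert(l->count < MAX_IDLE);
    l->ids[(l->head + l->count) % MAX_IDLE] = id;
    l->count++;
}

/* FIFO policy: pick the worker that has been idle the longest,
 * i.e. the one most likely to have a cold cache.
 * Returns -1 if no worker is idle. */
static int idle_pop_oldest(idle_list *l)
{
    int id;
    if (l->count == 0)
        return -1;
    id = l->ids[l->head];
    l->head = (l->head + 1) % MAX_IDLE;
    l->count--;
    return id;
}

/* LIFO policy: pick the worker that went idle most recently.
 * Returns -1 if no worker is idle. */
static int idle_pop_newest(idle_list *l)
{
    if (l->count == 0)
        return -1;
    l->count--;
    return l->ids[(l->head + l->count) % MAX_IDLE];
}
```

With workers 1, 2 and 3 going idle in that order, idle_pop_oldest()
returns 1 and idle_pop_newest() returns 3. Both operations are O(1),
so the LIFO variant would be a cheap selection algorithm in the sense
asked for above; whether it actually helps depends on how the OS
schedules the threads.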
