sebb wrote:
On 14/06/07, Dmytro Fedonin <[EMAIL PROTECTED]> wrote:
Looking through 'server/mpm/worker/worker.c' I have found such a
combination of TODO/FIXME comments:
1)
/* TODO: requests_this_child should be synchronized - aaron */
if (requests_this_child <= 0) {
2)
requests_this_child--; /* FIXME: should be synchronized - aaron */

And I can not see any point here. These are one word CPU operations,
thus there is no way to preempt inside this kind of operation. So, one
CPU is safe by nature of basic operation. If we have several CPUs they
will synchronize caches anyway, thus we will never get inconsistent
state here. We can only lose time trying to synchronize it in code. Am I
not right?

The decrement operation is a read-modify-write cycle. Two CPUs can overlap their operations so that both read the same initial value, and the result is an observable lost decrement.
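To make the overlap concrete, here is a minimal single-threaded model of the interleaving, not real concurrent code. The names cpu_t, cpu_load(), cpu_dec() and cpu_store() are hypothetical and only stand in for the three phases of the read-modify-write cycle.

```c
#include <assert.h>

/* Hypothetical model: each CPU has one private register. */
typedef struct { int reg; } cpu_t;

/* The three phases of a non-atomic decrement. */
static void cpu_load(cpu_t *c, const int *mem)
{
    assert(mem != 0);
    c->reg = *mem;              /* read */
}
static void cpu_dec(cpu_t *c)   { c->reg -= 1; }   /* modify */
static void cpu_store(const cpu_t *c, int *mem) { *mem = c->reg; } /* write */

/* Both CPUs read before either writes: one decrement is lost. */
static int overlapped_decrements(int initial)
{
    int mem = initial;
    cpu_t cpu1, cpu2;

    cpu_load(&cpu1, &mem);
    cpu_load(&cpu2, &mem);      /* sees the same initial value as cpu1 */
    cpu_dec(&cpu1);
    cpu_dec(&cpu2);
    cpu_store(&cpu1, &mem);
    cpu_store(&cpu2, &mem);     /* writes the same value again */
    return mem;
}

/* Each CPU completes its whole cycle before the other starts. */
static int serialized_decrements(int initial)
{
    int mem = initial;
    cpu_t cpu1, cpu2;

    cpu_load(&cpu1, &mem);
    cpu_dec(&cpu1);
    cpu_store(&cpu1, &mem);

    cpu_load(&cpu2, &mem);
    cpu_dec(&cpu2);
    cpu_store(&cpu2, &mem);
    return mem;
}
```

Starting from 10, overlapped_decrements() ends at 9 while serialized_decrements() ends at 8, even though both perform two decrements.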

On IA32/x86 the "DEC" assembly instruction can be prefixed with "LOCK", which makes the CPU keep asserting the memory bus lock for the duration of the instruction. CPU2 then has no way to perform a read access until CPU1 completes the instruction and releases control of the memory bus. This is effectively what atomic_dec() enforces.

The performance lost by using atomic_xxx() really is minimal. With any luck only that one cache line remains locked, not the entire memory bus.
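As a rough sketch of what an atomic decrement looks like in source, here is a version using C11 <stdatomic.h>. That is a swapped-in technique, not what worker.c uses, and the names requests_left, set_request_budget(), consume_request() and requests_remaining() are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for requests_this_child in worker.c. */
static atomic_int requests_left;

/* Set the number of requests this child may still serve. */
static void set_request_budget(int n)
{
    assert(n >= 0);
    atomic_store(&requests_left, n);
}

/*
 * Atomically decrement the counter as one indivisible
 * read-modify-write. atomic_fetch_sub() returns the value held
 * before the subtraction, so exactly one caller observes the
 * transition from 1 to 0.
 * Returns 1 if this call consumed the last request, else 0.
 */
static int consume_request(void)
{
    int before = atomic_fetch_sub(&requests_left, 1);
    return before == 1;
}

/* Atomic read of the current value. */
static int requests_remaining(void)
{
    return atomic_load(&requests_left);
}
```

Because the old value comes back from the same atomic operation, there is no separate "test, then decrement" window for another thread to slip into.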


The decrement operation may be handled as load, decrement, store on
some architectures, so can be pre-empted by a different CPU.

There is no other way to handle it :) Memory itself can't perform arithmetic operations, so the decrement always happens inside the ALU inside the CPU.

It is true that non-SMP aware CPUs might maintain memory bus acquisition during the 'decrement' (aka modify) phase of the operation, since there is no reason to give it up when they are the only user of memory.

Holding the bus like that becomes a performance bottleneck for any SMP capable CPU whose cache can operate at full CPU clock speed. The 'decrement' (aka modify) phase is going to require at least 1 clock cycle to perform, so why not let another CPU make use of the memory bus in the meantime?


Also some hardware architectures (e.g. HP Alpha) have an unusual
memory model. One CPU may see memory updates in a different order from
another CPU. Software that relies on the updates being seen across all
CPUs must use the appropriate memory synchronisation instructions.

I don't know if these considerations apply to this code.

Memory update ordering applies when considering how 2 or more distinct machine words are updated with respect to each other, when those updates are observed from another CPU.

The concern here is with a single machine word being updated on SMP systems.
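For contrast, the two-word ordering case can be sketched like this, again with C11 atomics as an assumed toolset. The names payload, ready, publish() and try_consume() are hypothetical. The release store and acquire load are the "appropriate memory synchronisation" that keeps the two updates ordered as seen from another CPU.

```c
#include <assert.h>
#include <stdatomic.h>

static int payload;        /* first word: plain data */
static atomic_int ready;   /* second word: flag, starts at 0 */

/* Writer: store the data, then set the flag with release ordering,
 * so the payload write cannot be observed after the flag write. */
static void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Reader: check the flag with acquire ordering. If the flag is
 * seen as set, the payload written before it is visible as well.
 * Returns 1 and fills *out on success, 0 if nothing is published. */
static int try_consume(int *out)
{
    assert(out != 0);
    if (atomic_load_explicit(&ready, memory_order_acquire) == 0) {
        return 0;
    }
    *out = payload;
    return 1;
}
```

None of this is needed for a lone counter such as requests_this_child, where the only requirement is that each decrement is itself atomic.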


Darryl
