sebb wrote:
On 14/06/07, Dmytro Fedonin <[EMAIL PROTECTED]> wrote:
Looking through 'server/mpm/worker/worker.c' I have found such a
combination of TODO/FIXME comments:
1)
/* TODO: requests_this_child should be synchronized - aaron */
if (requests_this_child <= 0) {
2)
requests_this_child--; /* FIXME: should be synchronized - aaron */
And I cannot see any point here. These are single-word CPU operations,
so there is no way to be preempted in the middle of such an operation.
So, on one CPU this is safe by the nature of the basic operation. If we
have several CPUs, they will synchronize their caches anyway, so we will
never get an inconsistent state here. We can only lose time trying to
synchronize it in code. Am I not right?
The decrement operation is a read-modify-write cycle, so it is possible
for 2 CPUs to overlap their operations and end up with an observable lost
decrement, since they both read the same initial value.
On IA-32/x86 the "DEC" instruction can be given the "LOCK" prefix. This
makes the CPU keep the memory bus locked for the duration of the
instruction, so CPU2 cannot perform a read access until CPU1 releases
the bus when the instruction completes. This is effectively what
atomic_dec() enforces.
The performance cost of using atomic_xxx() really is minimal; with any
luck, only the affected cache line remains locked, not the entire memory
bus.
The decrement operation may be handled as load, decrement, store on
some architectures, so it can be pre-empted by a different CPU.
There is no other way to handle it :) Memory itself can't perform
arithmetic, so the decrement always happens in the CPU's ALU.
It is true that non-SMP-aware CPUs might hold the memory bus during the
'decrement' (aka modify) phase of the operation, since they are the only
user of memory and have no reason to give it up.
This becomes a performance bottleneck for any SMP-capable CPU whose
cache can operate at full CPU clock speed. The 'decrement' (aka modify)
phase is going to take at least 1 clock cycle, so why not let another
CPU make use of the memory bus in the meantime?
Also, some hardware architectures (e.g. HP Alpha) have an unusual
memory model: one CPU may see memory updates in a different order from
another CPU. Software that relies on updates being seen in the same
order by all CPUs must use the appropriate memory synchronisation
instructions.
I don't know if these considerations apply to this code.
Memory update ordering matters when considering how 2 or more distinct
machine words are updated with respect to each other, as observed from
another CPU.
The example here concerns a single machine word being updated on SMP
systems.
Darryl