(moved here from fpc-devel)
On 30 Jun 2011, at 11:31, Hans-Peter Diettrich wrote:
> Vinzent Höfler wrote:
>>> When it's up to every coder to insert explicit synchronization
>>> whenever required, how can one determine the places where explicit
>>> code is required?
>> By careful analysis. Although there may exist tools which detect
>> potentially un-synchronised accesses to shared variables, there will
>> be no tool that inserts synchronisation code automatically for you.
> I wouldn't like such tools, except the compiler itself :-(
Data races can only be detected without false positives/negatives via
dynamic analysis, by intercepting all memory accesses and all
synchronization operations.
This will report all potential data races on the execution path of
that particular run (i.e., not only the cases where some value
actually did get lost due to timing during that particular run, but
all of the ones that could have happened during that run), but on the
other hand it's also *only* for that particular run (so potential data
races on different execution paths will not be caught).
You can read about the principle here: http://escher.elis.ugent.be/publ/Edocs/DOC/P104_116.pdf
(and it contains further references to other papers that go into
even more detail).
If a program contained nothing but FPC-compiled code, it would in
theory be possible to modify the RTL and the compiler to insert all of
the required analysis code, but I would not consider that to be worth
the effort. The reason is that it would only work for programs that
contain no assembler code and perform no system calls (system calls
can also read/write memory).
> Consider the shareable doubly-linked list, where insertion requires
> code like this:
>
>   list.Lock;          // prevent concurrent access
>   ...                 // determine affected list elements
>   new.prev := prev;   // prev must be guaranteed to be valid
>   new.next := next;
>   prev.next := new;
>   next.prev := new;
>   list.Unlock;
>
> What can we expect from the Lock method/instruction - what kind of
> synchronization (memory barrier) can, will or should it provide?
Use a critical section as provided by the FPC RTL, and all necessary
synchronization will be performed (including the required memory
barriers). Manual memory barriers are only required if you use
lock-free multithreading (which I would not recommend unless you are
already an absolute expert in multi-threaded programming) or when
writing your own synchronization primitives.
And to repeat what I mentioned before: the program state cannot become
unsynchronized if a thread switches from one core to another in the
middle of a critical section. If you (I don't mean you personally
here) think that you are observing something like that, you have
another bug. Using the standard synchronization primitives is all you
need to write correct multi-threaded programs.
> My understanding is that a *full* cache synchronization would slow
> down not only the current core and its cache, but all other caches
> as well? If so, would it help to enclose the above instructions in
> e.g.
>
>   Synchronized begin
>     update the links...
>   end;
>
> so that the compiler makes all memory references (at least reads)
> read/write-through inside such a code block?
Even if that were possible, disabling caching would in no way solve
any kind of data race. It would slow down programs without any gain
whatsoever as far as thread safety is concerned.
As Nikolai mentioned, you need a critical section (or a lock-free
equivalent specific to an individual case, but that cannot be
automatically generated -- and even the first published papers with
manually constructed lock-free algorithms were later found to contain
errors).
> After these considerations I'd understand that using Interlocked
> instructions in the code would ensure such read/write-through, but
> merely as a side effect - they also lock the bus for every
> instruction,
They only did so on ancient x86 processors; modern x86 CPUs lock just
the affected cache line rather than the whole bus (unless the operand
straddles a cache-line boundary). On most other architectures they
never locked the bus in the first place.
> We need documentation of the FPC-specific means of cache
> synchronization, with their guaranteed effects on every target[1].
Such documentation is not compiler-specific, but architecture-
specific. The FPC documentation is not a tutorial on computer
architecture, nor a tutorial on multi-threaded programming.
FPC exports routines for all existing kinds of memory barriers
(ReadBarrier, ReadDependencyBarrier, ReadWriteBarrier and
WriteBarrier), as well as for various synchronization primitives
(criticalsection/mutex, event/conditional signal, ...). That is the
job of the compiler/RTL. Language extensions are of course always
possible, but that is unrelated to "documentation of the FPC specific
means of cache synchronization, with their guaranteed effects on every
target" (language constructs by definition would have the same effect
everywhere).
Tutorials and architecture documentation are a separate department.
> Furthermore we need concrete examples[2] of how (and to what extent)
> these special instructions/procedures must be used, in cases like the
> one above.
Yes, many people could use tutorials about computer architecture,
operating system principles and multi-threading. But again, those are
completely unrelated to the development of FPC itself.
A good starter would probably be Module 4 of
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-823-computer-system-architecture-fall-2005/lecture-notes/
Jonas
_______________________________________________
fpc-other maillist - fpc-other@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-other