(moved here from fpc-devel)

On 30 Jun 2011, at 11:31, Hans-Peter Diettrich wrote:

Vinzent Höfler wrote:

When it's up to every coder to insert explicit synchronization whenever required, how do you determine the places where explicit code is required?
By careful analysis. Although there may exist tools which detect potentially un-synchronised accesses to shared variables, there will be no tool that
inserts synchronisation code automatically for you.

I wouldn't like such tools, except the compiler itself :-(

Data races can only be detected without false positives/negatives via dynamic analysis, by intercepting all memory accesses and all synchronization operations.

This will report all potential data races on the execution path of that particular run (i.e., not only the cases where some value actually did get lost due to timing during that particular run, but all of the ones that could have happened during that run), but on the other hand it's also *only* for that particular run (so potential data races on different execution paths will not be caught).

You can read about the principle here: http://escher.elis.ugent.be/publ/Edocs/DOC/P104_116.pdf (and it contains further references to other papers that go into even more detail).

If a program contained nothing but FPC-compiled code, it would in theory be possible to modify the RTL and the compiler to insert all required analysis code, but I would not consider that to be worth the effort. The reason is that it would only work for programs that contain no assembler code and perform no system calls (system calls can also read/write memory).

Consider a shareable doubly-linked list, where insertion requires code like this:
 list.Lock; //prevent concurrent access
 ... //determine affected list elements
 new.prev := prev; //prev must be guaranteed to be valid
 new.next := next;
 prev.next := new;
 next.prev := new;
 list.Unlock;
What can we expect from the Lock method/instruction - what kind of synchronization (memory barrier) can, will or should it provide?

Use a critical section as provided by the FPC RTL, and all necessary synchronization will be performed (including the required memory barriers). Manual memory barriers are only required if you use lock-free multithreading (which I would not recommend unless you are already an absolute expert in multi-threaded programming) or when writing your own synchronization primitives.
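
For the list example above, that means something like this (just a sketch: the TNode/TSharedList types and the InsertAfter routine are made up for the illustration, the critical section routines are the ones exported by the FPC RTL):

  program lockedlist;
  {$mode objfpc}

  uses
    {$ifdef unix}cthreads,{$endif}
    sysutils;

  type
    PNode = ^TNode;
    TNode = record
      prev, next: PNode;
      data: longint;
    end;

    TSharedList = record
      head: PNode;                  // hypothetical layout, e.g. a sentinel node
      lock: TRTLCriticalSection;    // RTL critical section protecting the list
    end;

  { Insert newNode right after prev. The critical section ensures that no other
    thread can observe the list in a half-updated state, and entering/leaving it
    also performs all memory barriers required on the target architecture. }
  procedure InsertAfter(var list: TSharedList; prev, newNode: PNode);
  begin
    EnterCriticalSection(list.lock);
    try
      newNode^.prev := prev;
      newNode^.next := prev^.next;
      if prev^.next <> nil then
        prev^.next^.prev := newNode;
      prev^.next := newNode;
    finally
      LeaveCriticalSection(list.lock);
    end;
  end;

  var
    list: TSharedList;
  begin
    InitCriticalSection(list.lock);  // create the lock before starting any threads
    { ... start threads that call InsertAfter ... }
    DoneCriticalSection(list.lock);
  end.

All threads that touch the list must of course enter the same critical section, including read-only traversals.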

And to repeat what I mentioned before: the program state cannot become unsynchronized if a thread switches from one core to another in the middle of a critical section. If you (I don't mean you personally here) think that you are observing something like that, you have another bug. Using the standard synchronization primitives is all you need to write correct multi-threaded programs.

My understanding is that a *full* cache synchronization would slow down not only the current core and its cache, but also all other caches?

If so, would it help to enclose the above instructions in e.g.
 Synchronized begin
   update the links...
 end;
so that the compiler can make all memory references (at least the reads) inside such a code block read/write-through?

Even if that were possible, disabling caching in no way solves any kind of data race. It would slow down programs without any gain whatsoever as far as thread safety is concerned.

As Nikolai mentioned, you need a critical section (or a lock-free equivalent specific to an individual case, but that cannot be automatically generated -- and even the first published papers with manually constructed lock-free algorithms turned out to contain errors afterwards).

After these considerations I'd understand that using Interlocked instructions in the code would ensure such read/write-through, but merely as a side effect - they also lock the bus for every instruction,

They only did that on ancient x86 processors; nowadays they no longer do. On most other architectures they never did.
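
What the interlocked routines do give you is atomicity plus the required ordering, so simple shared counters and flags can be updated without a lock. A sketch (the worker counter is made up for the example; InterlockedIncrement/InterlockedDecrement are the routines exported by the FPC RTL):

  program interlockeddemo;
  {$mode objfpc}

  var
    activeWorkers: longint = 0;   // shared counter, only touched via interlocked ops

  { Atomically bump the counter; the interlocked routines already provide the
    atomicity and memory ordering the target architecture needs, without the
    caller having to care about bus locks or caches. }
  procedure WorkerStarted;
  begin
    InterlockedIncrement(activeWorkers);
  end;

  procedure WorkerFinished;
  begin
    if InterlockedDecrement(activeWorkers) = 0 then
      WriteLn('all workers done');   // e.g. signal completion here
  end;

  begin
    WorkerStarted;
    WorkerFinished;
  end.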

We need documentation of the FPC-specific means of cache synchronization, with their guaranteed effects on every target[1].

Such documentation is not compiler-specific, but architecture-specific. The FPC documentation is not a tutorial on computer architecture, nor a tutorial on multi-threaded programming.

FPC exports routines for all existing kinds of memory barriers (ReadBarrier, ReadDependencyBarrier, ReadWriteBarrier and WriteBarrier), as well as for various synchronization primitives (critical section/mutex, event/conditional signal, ...). That is the job of the compiler/RTL. Language extensions are of course always possible, but that is unrelated to "documentation of the FPC-specific means of cache synchronization, with their guaranteed effects on every target" (language constructs by definition would have the same effect everywhere).
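
To give an idea of where those barrier routines come in at all (and only there; in normal code you simply use a critical section or an event), here is a hand-rolled flag-based publication, purely as a sketch:

  program barrierdemo;
  {$mode objfpc}

  var
    payload: longint;
    ready: longint = 0;   // 0 = not published yet, 1 = payload is valid

  { Publisher: write the data, then the flag, with a write barrier in between so
    that no other core can observe the flag set before the data is visible. }
  procedure Publish(value: longint);
  begin
    payload := value;
    WriteBarrier;
    ready := 1;
  end;

  { Consumer: only read the data after the flag was seen set, with a read barrier
    so the payload read cannot be reordered before the flag read. }
  function TryConsume(out value: longint): boolean;
  begin
    Result := (ready = 1);
    if Result then
    begin
      ReadBarrier;
      value := payload;
    end;
  end;

  var
    v: longint;
  begin
    Publish(42);
    if TryConsume(v) then
      WriteLn(v);
  end.

Getting even such a trivial pattern right in general (alignment, atomicity of the flag write, ...) is exactly why I recommend sticking to the standard primitives.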

Tutorials and architecture documentation are a separate department.

Furthermore, we need concrete examples[2] of how (and to what extent) these special instructions/procedures have to be used, in cases like the one above.

Yes, many people could use tutorials about computer architecture, operating system principles and multi-threading. But again, those are completely unrelated to the development of FPC itself.

A good starting point would probably be Module 4 of
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-823-computer-system-architecture-fall-2005/lecture-notes/


Jonas

_______________________________________________
fpc-other maillist  -  fpc-other@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-other
