Lluís Vilanova wrote on 2015-09-04 16:00:
Mark Burton writes:
[...]
* What to do about icount?
What is the impact of multi-thread on icount? Do we need to disable it
for MTTCG, or can it be correct per-CPU? Can it be updated in lock-step?
We need some input from the guys that use icount the most.
That means Edgar. :)
Hi!
IMO it would be nice if we could run the cores in some kind of lock-step
with a configurable number of instructions (X) that they can run ahead.
For example, if X is 10000, every thread/core would checkpoint at
10000-insn boundaries and wait for the other cores. Between these
checkpoints, the cores will not be in sync. We might also need to
consider synchronizing at I/O accesses to avoid weird timing issues
when reading counter registers, for example.
Of course the devil will be in the details, but an approach roughly
like that sounds useful to me.
And "works" in other domains.
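The checkpoint-at-X-instructions idea above can be sketched with a plain
barrier: each simulated core runs a quantum of instructions unsynchronized,
then waits for the others at the boundary. This is only an illustrative
sketch in Python; the names (`Core`, `run_lockstep`, `QUANTUM`) are
hypothetical and not QEMU APIs.

```python
import threading

QUANTUM = 10000  # hypothetical "X": insns a core may run unsynchronized

class Core(threading.Thread):
    """One simulated core that checkpoints every QUANTUM instructions."""
    def __init__(self, cpu_id, barrier, total_insns, log):
        super().__init__()
        self.cpu_id = cpu_id
        self.barrier = barrier
        self.total_insns = total_insns
        self.log = log          # shared checkpoint log (list.append is atomic)
        self.icount = 0

    def run(self):
        while self.icount < self.total_insns:
            # Between checkpoints the cores are not in sync.
            self.icount += min(QUANTUM, self.total_insns - self.icount)
            # Checkpoint: wait for every other core at the insn boundary.
            self.barrier.wait()
            self.log.append((self.cpu_id, self.icount))

def run_lockstep(n_cores=2, total_insns=30000):
    barrier = threading.Barrier(n_cores)
    log = []
    cores = [Core(i, barrier, total_insns, log) for i in range(n_cores)]
    for c in cores:
        c.start()
    for c in cores:
        c.join()
    return log
```

With two cores and 30000 instructions, every core logs exactly the
boundaries 10000, 20000, 30000, and no core crosses a boundary before
the others reach it.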
Theoretically we don't need to sync at I/O (dynamic quanta); for most
systems that have "normal" I/O it's usually less efficient, I believe.
However, the trouble is that the user typically doesn't know, and
mucking about with quantum lengths, dynamic quantum switches, etc. is
probably a royal pain in the butt. And if you don't set your quantum
right, the thing will run really slowly (or will break)...
The choice is between a rock and a hard place. Dynamic quanta risk
being slow (you'll be forcing an expensive "sync" - all CPUs will have
to exit, etc.) on each I/O access from each core... not great. Syncing
with host time (e.g. each CPU tries to sync with the host clock as best
it can) will fail when one or another CPU can't keep up... In the end
you are left handing the user a nice long bit of string and a message
saying "hang yourself here".
That price would not be paid when icount is disabled. Well, the code
complexity price is always paid... I meant runtime. :)
Then, I think this depends on what type of guarantees you require from
icount. I see two possible semantics:
* All CPUs are *exactly* synchronized at icount granularity.
Every icount instructions, everyone has to stop and synchronize.
* All CPUs are *loosely* synchronized at icount granularity.
You can implement it in a way that ensures that every CPU has *at
least* reached a certain timestamp, so CPUs can keep running
regardless.
Does this third possibility look sane?
* All CPUs synchronize at shared-memory operations.
When somebody tries to read or write shared memory, it waits until all
the others have reached the same icount.
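The "wait until all others have reached the same icount" rule above can
be sketched with a condition variable: each CPU publishes its icount as
it executes, and a CPU about to touch shared memory blocks until the
slowest CPU has caught up to it. This is a hypothetical sketch in
Python, not a QEMU API; all the names are made up.

```python
import threading

class IcountSync:
    """Sketch: a CPU touching shared memory waits until every CPU has
    at least reached its own icount."""
    def __init__(self, n_cpus):
        self.icounts = [0] * n_cpus
        self.cond = threading.Condition()

    def advance(self, cpu, insns):
        # Called as a CPU executes; wakes anyone waiting on the slowest CPU.
        with self.cond:
            self.icounts[cpu] += insns
            self.cond.notify_all()

    def shared_access(self, cpu):
        # Block until all other CPUs have reached this CPU's icount.
        with self.cond:
            target = self.icounts[cpu]
            self.cond.wait_for(lambda: min(self.icounts) >= target)
```

If both CPUs have advanced to the same icount, `shared_access` returns
immediately; a CPU that has run ahead blocks until the others catch up.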
The downside is that the latter loses the ability to do reproducible
runs, which IMHO are useful. A more complex option is to merge both:
icount sets the "synchronization granularity" and another parameter
sets the maximum delta between CPUs (i.e., set it to 0 to get the first
option, and to infinity for the second).
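The merged scheme (a granularity plus a maximum delta between CPUs) can
be sketched as follows: at every granularity boundary a CPU checks in
and blocks while it is more than `max_delta` ahead of the slowest CPU.
Setting `max_delta` to 0 degenerates into strict lock-step, and
`float('inf')` lets CPUs run freely. Again, a hypothetical Python
sketch, with made-up names.

```python
import threading

class BoundedSkew:
    """Sketch of the merged scheme: 'granularity' is the icount step at
    which each CPU checks in; 'max_delta' bounds how far ahead of the
    slowest CPU it may run (0 = strict lock-step, inf = free-running)."""
    def __init__(self, n_cpus, granularity, max_delta):
        self.icounts = [0] * n_cpus
        self.granularity = granularity
        self.max_delta = max_delta
        self.cond = threading.Condition()

    def step(self, cpu):
        with self.cond:
            self.icounts[cpu] += self.granularity
            self.cond.notify_all()
            # Block while this CPU is more than max_delta ahead of the
            # slowest CPU.
            self.cond.wait_for(
                lambda: self.icounts[cpu] - min(self.icounts)
                        <= self.max_delta)
```

With `max_delta=0`, two CPUs stepping concurrently stay in lock-step at
every granularity boundary and finish with identical icounts.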
Pavel Dovgalyuk