Mark Burton writes:
[...]
>>>> * What to do about icount?
>>>>
>>>> What is the impact of multi-thread on icount? Do we need to disable it
>>>> for MTTCG or can it be correct per-cpu? Can it be updated lock-step?
>>>>
>>>> We need some input from the guys that use icount the most.
>>>
>>> That means Edgar. :)
>>
>> Hi!
>>
>> IMO it would be nice if we could run the cores in some kind of lock-step
>> with a configurable amount of instructions that they can run ahead
>> of time (X).
>>
>> For example, if X is 10000, every thread/core would checkpoint at
>> 10000 insn boundaries and wait for other cores. Between these
>> checkpoints, the cores will not be in sync. We might need to
>> consider synchronizing at I/O accesses as well to avoid weird
>> timing issues when reading counter registers for example.
>>
>> Of course the devil will be in the details but an approach roughly
>> like that sounds useful to me.
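
As a rough illustration of the checkpointing scheme quoted above, here is a
minimal sketch in Python. It is not QEMU code: run_lockstep, vcpu and the
icount list are hypothetical names, and each "instruction" is just a counter
increment. Every simulated core runs freely for up to X instructions and then
waits at a barrier until all other cores have reached the same boundary.

```python
import threading

def run_lockstep(num_cores, quantum, total_insns):
    """Simulate num_cores vCPUs that checkpoint every `quantum` instructions.

    Returns a list with one snapshot of all per-core instruction counts
    per checkpoint. All names here are illustrative, not QEMU APIs.
    """
    icount = [0] * num_cores
    checkpoints = []

    def snapshot():
        # Called by exactly one thread once every core has arrived,
        # before any of them is released again.
        checkpoints.append(list(icount))

    barrier = threading.Barrier(num_cores, action=snapshot)

    def vcpu(cpu):
        while icount[cpu] < total_insns:
            # Between checkpoints the cores are not in sync.
            budget = min(quantum, total_insns - icount[cpu])
            for _ in range(budget):
                icount[cpu] += 1  # stand-in for executing one guest insn
            # Checkpoint: wait for the other cores.
            barrier.wait()

    threads = [threading.Thread(target=vcpu, args=(c,)) for c in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return checkpoints
```

With run_lockstep(3, 10000, 25000), every snapshot shows all three cores at
the same instruction count. The sketch ignores I/O entirely, so the question
of also synchronizing at I/O accesses is left open here.
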
> And "works" in other domains.
> Theoretically we don't need to sync at IO (Dynamic quantums), for most systems
> that have 'normal' IO it's normally less efficient I believe. However, the
> trouble is, the user typically doesn't know, and mucking about with quantum
> lengths, dynamic quantum switches etc is probably a royal pain in the butt.
> And if you don't set your quantum right, the thing will run really slowly
> (or will break)…
>
> The choices are a rock or a hard place. Dynamic quantums risk being slow
> (you'll be forcing an expensive 'sync' - all CPUs will have to exit etc) on
> each IO access from each core…. not great. Syncing with host time (e.g. each
> CPU tries to sync with host clock as best it can) will fail when one or
> other CPU can't keep up…. In the end you end up with leaving the user with a
> nice long bit of string and a message saying "hang yourself here".

That price would not be paid when icount is disabled.

Well, the code complexity price is always paid... I meant runtime :)

Then, I think this depends on what type of guarantees you require from
icount. I see two possible semantics:

* All CPUs are *exactly* synchronized at icount granularity

  This means that every icount instructions everyone has to stop and
  synchronize.

* All CPUs are *loosely* synchronized at icount granularity

  You can implement it in a way that ensures that every cpu has *at least*
  reached a certain timestamp. So cpus can keep on running nonetheless.

The downside is that the latter loses the ability for reproducible runs, which
IMHO are useful.

A more complex option is to merge both: icount sets the "synchronization
granularity" and another parameter sets the maximum delta between cpus (i.e.,
set it to 0 to have the first option, and infinite for the second).


Cheers,
  Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth
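
PS: the merged option (a synchronization granularity plus a maximum delta
between cpus) can be sketched as a small deterministic simulation in Python.
This is only an illustration under assumptions: may_run, simulate and the
speeds parameter are hypothetical names, not QEMU code, and "time" is counted
in whole granules. A cpu may start its next granule only if it is at most
max_delta instructions ahead of the slowest cpu.

```python
import math

def may_run(cpu, icount, max_delta):
    """True if `cpu` is at most max_delta insns ahead of the slowest cpu."""
    return icount[cpu] - min(icount) <= max_delta

def simulate(speeds, granularity, max_delta, steps):
    """Round-robin over cpus; cpu i gets a host time slice every speeds[i] steps.

    Returns the final per-cpu instruction counts and the largest skew
    (fastest minus slowest) that was ever observed.
    """
    icount = [0] * len(speeds)
    max_skew = 0
    for step in range(steps):
        for cpu, speed in enumerate(speeds):
            if step % speed != 0:
                continue  # this cpu got no host time in this step
            if may_run(cpu, icount, max_delta):
                icount[cpu] += granularity  # run one full granule
            # otherwise the cpu waits for the slower ones to catch up
            max_skew = max(max_skew, max(icount) - min(icount))
    return icount, max_skew

# max_delta = 0: cpus never drift apart by more than one granule.
strict = simulate([1, 3], 100, 0, 12)
# max_delta = infinity: nobody ever waits, so the fast cpu runs away.
loose = simulate([1, 3], 100, math.inf, 12)
```

In this sketch the skew is bounded by granularity + max_delta, so setting
max_delta to 0 gives lock-step at granule boundaries, and infinity removes all
waiting. Note that the infinite setting here imposes no minimum-progress
guarantee at all, which is weaker than the "loosely synchronized" semantics
described above.
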