Hello everybody,

While working with the "features" branch, I came across a few issues, which I've
also fixed. Many of these findings should also apply to the "master" branch.

I thought it would be wise to share my patches, but unfortunately, I cannot
provide a clean diff at this time, as the fixes are buried under a tonne of code for what I'm actually doing (it's a master's thesis project), and I'm working
under a fair bit of time pressure at the moment.

I did not check whether similar patches have been submitted in the meantime,
and I also did not document my changes, so please bear with me if some of the
following information is slightly inaccurate or incomplete.

1) Switching back and forth between emulation and simulation

There was one major issue:

- With the OOO core, only the thread performing the ptlcall to switch the
simulation mode was guaranteed to continue in a consistent context state. All
other threads could be anywhere between SOM and EOM at the point where QEMU
takes over, so, among many other things, a subroutine call could already have
written to the stack while the stack pointer or instruction pointer had not
been updated yet.

Luckily, the beginnings of code to handle that were already in place, and
connecting the dots turned out to be relatively easy (compared to tracking
down the issue). Analogous to handle_interrupt_at_next_eom, the OOO
ThreadContext class also has a member stop_at_next_eom and a commit result
COMMIT_RESULT_STOP. To make use of those, a few more modifications were
needed: supplying a stop-as-soon-as-possible flag to the cores and their
individual threads when emitting the run-cycle signal, having the threads
stop at the next EOM, and having the core return a flag to the machine
indicating whether it is permissible to switch to emulation now.
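
In case it helps, here is a very rough sketch of that handshake. Apart from
stop_at_next_eom and COMMIT_RESULT_STOP, all names and the class layout are
made up for illustration; the real MARSS code is organised quite differently.

    // Simplified stand-ins for the core/thread classes, only illustrating
    // the stop-at-EOM handshake between machine, core and threads.
    #include <vector>

    enum CommitResult { COMMIT_RESULT_OK, COMMIT_RESULT_STOP };

    struct SketchThread {
        bool stop_at_next_eom = false;  // analogous to handle_interrupt_at_next_eom
        bool at_eom = false;            // set when an x86 instruction fully commits

        CommitResult commit() {
            // ... commit uops as usual ...
            if (stop_at_next_eom && at_eom)
                return COMMIT_RESULT_STOP;  // stop on a clean instruction boundary
            return COMMIT_RESULT_OK;
        }
    };

    struct SketchCore {
        std::vector<SketchThread> threads;

        // Returns true once every thread has stopped at an EOM, i.e. it is
        // permissible to hand the contexts back to QEMU.
        bool run_cycle(bool stop_requested) {
            bool all_stopped = true;
            for (auto& t : threads) {
                t.stop_at_next_eom = stop_requested;
                if (t.commit() != COMMIT_RESULT_STOP)
                    all_stopped = false;
            }
            return stop_requested && all_stopped;
        }
    };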

I also found a few minor issues, which might or might not have contributed to
the switching problem:

- The memory hierarchy was not reset when switching to simulation, so many
outdated memory requests were still in flight, and the cache lines also held
obsolete address information. There was no sign that resetting the memory
hierarchy had ever been considered, but adding a lot of boilerplate reset
functions to the relevant classes worked fine (a rough sketch of that
boilerplate follows this list).

- The VM clock and tick counter jumped in both directions when switching to
emulation. Since they are expected to be monotonic, the guest might, at the
very least, falsely detect time-outs and stalls. Updating the timer offsets
when leaving simulation mode seemed like a good idea (also sketched after
this list).

- The halt operation was a no-op. When implementing halting, the handling of
exceptions and interrupts of all CPUs had to be moved before the call to
ptl_simulate() in sim_cpu_exec(), analogous to the original cpu_exec(), as the
handling code would not un-halt the simulated CPUs by itself, and switching to
emulation right after handling the interrupts would either leave the virtual
CPUs sitting idle even though they have work to do, or leave exceptions
unhandled (see the control-flow sketch after this list).
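
A rough sketch of the reset boilerplate for the memory hierarchy; all class
and member names here are invented, and the real controllers carry a lot more
state:

    #include <deque>
    #include <vector>

    struct PendingRequest { /* elided */ };

    struct SketchCacheLine {
        bool valid = false;
        unsigned long long tag = 0;
    };

    struct SketchCacheController {
        std::vector<SketchCacheLine> lines;
        std::deque<PendingRequest*> pending;  // in-flight requests

        void reset() {
            pending.clear();                              // drop stale requests
            for (auto& l : lines) l = SketchCacheLine();  // drop stale address info
        }
    };

    struct SketchMemoryHierarchy {
        std::vector<SketchCacheController> controllers;

        // Called when switching from emulation to simulation, so that nothing
        // from the previous simulation run survives into the new one.
        void reset() {
            for (auto& c : controllers) c.reset();
        }
    };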
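
The timer fix boils down to re-basing an offset at the switch instead of
letting the guest-visible time jump. The sketch below only illustrates the
idea, with entirely hypothetical names; the real change touches the
QEMU/PTLsim clock plumbing.

    struct SketchGuestClock {
        long long backend_now = 0;  // time reported by whichever backend runs
        long long offset = 0;       // correction applied on top of it

        long long guest_time() const { return backend_now + offset; }
    };

    // When leaving simulation mode, re-base the emulation-side clock so the
    // guest-visible time continues from where simulation left off instead of
    // jumping forwards or backwards.
    void on_leave_simulation(SketchGuestClock& emu_clock, long long sim_guest_time) {
        emu_clock.offset = sim_guest_time - emu_clock.backend_now;
    }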
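
And the reordering in sim_cpu_exec() looks roughly like this, heavily
condensed; the helpers are stand-ins, not the real QEMU interrupt code:

    struct SketchVCpu {
        bool halted = false;
        bool pending_interrupt = false;
        bool pending_exception = false;
    };

    // Stand-in for the per-CPU exception/interrupt handling that the real
    // cpu_exec() performs (delivering pending events and clearing the halted
    // state of a CPU that now has work).
    static void handle_exceptions_and_interrupts(SketchVCpu& cpu) {
        if (cpu.pending_exception)
            cpu.pending_exception = false;
        if (cpu.pending_interrupt) {
            cpu.pending_interrupt = false;
            cpu.halted = false;   // an interrupt gives a halted CPU work again
        }
    }

    static void ptl_simulate_stub() { /* run the simulated cores */ }

    // The point is the ordering: handle exceptions/interrupts of *all* CPUs
    // first, then enter the simulator, so that no CPU sits idle inside the
    // simulation and no exception is left pending across the mode switch.
    void sim_cpu_exec_sketch(SketchVCpu* cpus, int n) {
        for (int i = 0; i < n; ++i)
            handle_exceptions_and_interrupts(cpus[i]);
        ptl_simulate_stub();
    }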

After fixing these issues, I could switch back and forth 10,000 times in a row
in a C program without any ill effects. I wanted to do the same with ls in a
shell script as something like the ultimate test, but alas, I haven't gotten
around to that yet. However, I haven't seen the division-overflow assertion
failure or the guest kernel stalls ever since.

2) TSX

- The member in_tsx_ of class TsxCache was not initialized to 0 in its
constructor. This caused occasional pipeline stalls after switching to
simulation mode.

- The member tsxMemoryBuffer of class ThreadContext used a cache-line size
that was not a power of two, and the code using it only worked correctly for
loading and storing 64-bit values. When performing smaller updates on
transactional data, parts of the stored 64-bit words were lost. Owing to the
little-endianness of the x86 architecture, the code to properly access
sub-words turned out to be a bit of a mess, but fortunately all accesses are
aligned in this part of the machine (a sketch follows this list). Furthermore,
the geometry of this buffer did not match the L1D's, so false sharing and
other behaviour diverged from the real machine.
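
The sub-word access ended up along the lines of the following sketch
(simplified; the names are made up, and the real code additionally has to map
addresses onto the buffer's geometry). It relies on the accesses being aligned
and never crossing a 64-bit word, which holds here.

    #include <cassert>
    #include <cstdint>

    // Read 'size' bytes (1, 2, 4 or 8) at byte offset 'offset' of a 64-bit
    // backing word. On little-endian x86, byte k of the word lives in
    // bits [8k, 8k+8), so a shift and a mask are enough.
    static inline uint64_t load_subword(uint64_t word, unsigned offset, unsigned size) {
        assert(offset + size <= 8 && offset % size == 0);
        if (size == 8) return word;
        uint64_t mask = (uint64_t(1) << (size * 8)) - 1;
        return (word >> (offset * 8)) & mask;
    }

    // Merge 'size' bytes of 'value' into the word without touching the rest,
    // so that narrow transactional stores no longer clobber their neighbours.
    static inline void store_subword(uint64_t& word, unsigned offset,
                                     unsigned size, uint64_t value) {
        assert(offset + size <= 8 && offset % size == 0);
        if (size == 8) { word = value; return; }
        uint64_t mask = ((uint64_t(1) << (size * 8)) - 1) << (offset * 8);
        word = (word & ~mask) | ((value << (offset * 8)) & mask);
    }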

3) Spurious assertion failures

In ooo-pipe.cpp, line 2035, "assert(physreg->data);" failed without any code
actually branching to 0. After looking at the ASF branch, changing the line to
"assert(physreg->data || isbarrier(uop.opcode));" seemed to do the trick.

4) Dynamic memory allocation

While not actually a bug, class MemoryRequest uses new and delete for its
member history on every memory request, along with a fairly expensive STL
stringbuf. This runs contrary to what seems to be one of PTLsim's design
objectives, namely avoiding dynamic allocation (and the STL) during simulation
altogether, and it is done entirely in vain when not logging, which is
probably the default production case. A simple check whether config.loglevel
is 0 avoids the unnecessary dynamic memory allocation and the feeding of a
stringbuf that is never read, and also seemed to speed up simulation slightly.
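
The guard itself is trivial; roughly like this, with a simplified stand-in for
the MemoryRequest code (config.loglevel is the real knob, everything else is
made up):

    #include <sstream>

    struct SketchConfig { int loglevel = 0; };
    static SketchConfig config;

    struct SketchMemoryRequest {
        std::stringstream* history = nullptr;

        void init() {
            // Only pay for the allocation and the stringbuf when the history
            // will actually be written out.
            if (config.loglevel > 0)
                history = new std::stringstream();
        }

        void annotate(const char* what) {
            if (history)              // no-op in the default non-logging case
                *history << what << ' ';
        }

        ~SketchMemoryRequest() { delete history; }
    };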


This is what I remember concerning the bug-fixes I've made so far. I hope to be
able to provide a clean diff soon.


Regards,

gunnar.


P.S.: Since I do not seem to have the credentials necessary to access the
mailing list's HTTPS URL advertised on the MARSS site, please kindly add me to
the mailing list, thank you. :)

