Hello everybody,

While working with the "features" branch, I came across a few issues, which I've
also fixed. Many of these findings should also apply to the "master" branch.

I thought it would be wise to share my patches, but unfortunately, I cannot
provide a clean diff at this time, as the fixes are buried under a tonne of code for what I'm actually doing (it's a master's thesis project), and I'm working
under a fair bit of time pressure at the moment.

I did not check whether similar patches have been submitted in the meantime,
and I also did not document my changes, so please bear with me if some of the
following information is slightly inaccurate or incomplete.

1) Switching back and forth between emulation and simulation

There was one major issue:

- With the OOO core, only the thread performing the ptlcall to switch the
simulation mode was guaranteed to continue in a consistent context state. All
other threads could be anywhere between SOM and EOM at the point where QEMU
takes over, so, among many other things, a subroutine call could already have
written to the stack while the stack pointer or instruction pointer had not
been updated yet.

Luckily, the beginnings of code to handle that were already in place, and
connecting the dots turned out to be relatively easy (compared to tracking
down the issue). Analogous to handle_interrupt_at_next_eom, the OOO
ThreadContext class also has a member stop_at_next_eom and a commit result
COMMIT_RESULT_STOP. To make use of those, a few more modifications were
needed: supplying a stop-as-soon-as-possible flag to the cores and their
individual threads when emitting the run-cycle signal, having the threads
stop at the next EOM, and having the core return a flag to the machine
indicating whether it is permissible to switch to emulation now.
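
In case it helps, here is a very rough sketch of that handshake. Apart from
stop_at_next_eom and COMMIT_RESULT_STOP, all names and the class layout are
made up for illustration; the real MARSS code is organised quite differently.

    // Simplified stand-ins for the core/thread classes, only illustrating
    // the stop-at-EOM handshake between machine, core and threads.
    #include <vector>

    enum CommitResult { COMMIT_RESULT_OK, COMMIT_RESULT_STOP };

    struct SketchThread {
        bool stop_at_next_eom = false;  // analogous to handle_interrupt_at_next_eom
        bool at_eom = false;            // set when an x86 instruction fully commits

        CommitResult commit() {
            // ... commit uops as usual ...
            if (stop_at_next_eom && at_eom)
                return COMMIT_RESULT_STOP;  // stop on a clean instruction boundary
            return COMMIT_RESULT_OK;
        }
    };

    struct SketchCore {
        std::vector<SketchThread> threads;

        // Returns true once every thread has stopped at an EOM, i.e. it is
        // permissible to hand the contexts back to QEMU.
        bool run_cycle(bool stop_requested) {
            bool all_stopped = true;
            for (auto& t : threads) {
                t.stop_at_next_eom = stop_requested;
                if (t.commit() != COMMIT_RESULT_STOP)
                    all_stopped = false;
            }
            return stop_requested && all_stopped;
        }
    };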

I also found a few minor issues, which might or might not have contributed to
the switching problem:

- The memory hierarchy was not reset when switching to simulation, so many
outdated memory requests were still in flight, and the cache lines also held
obsolete address information. There was no sign that resetting the memory
hierarchy had ever been considered, but adding a lot of boilerplate reset
functions to the relevant classes worked fine (a rough sketch of that
boilerplate follows this list).

- The VM clock and tick counter jumped in both directions when switching to
emulation. Since they are expected to be monotonic, the guest might, at the
very least, falsely detect time-outs and stalls. Updating the timer offsets
when leaving simulation mode seemed like a good idea (also sketched after
this list).

- The halt operation was a no-op. When implementing halting, the handling of
exceptions and interrupts of all CPUs had to be moved before the call to
ptl_simulate() in sim_cpu_exec(), analogous to the original cpu_exec(), as the
handling code would not un-halt the simulated CPUs by itself, and switching to
emulation right after handling the interrupts would either leave the virtual
CPUs sitting idle even though they have work to do, or leave exceptions
unhandled (see the control-flow sketch after this list).
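
A rough sketch of the reset boilerplate for the memory hierarchy; all class
and member names here are invented, and the real controllers carry a lot more
state:

    #include <deque>
    #include <vector>

    struct PendingRequest { /* elided */ };

    struct SketchCacheLine {
        bool valid = false;
        unsigned long long tag = 0;
    };

    struct SketchCacheController {
        std::vector<SketchCacheLine> lines;
        std::deque<PendingRequest*> pending;  // in-flight requests

        void reset() {
            pending.clear();                              // drop stale requests
            for (auto& l : lines) l = SketchCacheLine();  // drop stale address info
        }
    };

    struct SketchMemoryHierarchy {
        std::vector<SketchCacheController> controllers;

        // Called when switching from emulation to simulation, so that nothing
        // from the previous simulation run survives into the new one.
        void reset() {
            for (auto& c : controllers) c.reset();
        }
    };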
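
The timer fix boils down to re-basing an offset at the switch instead of
letting the guest-visible time jump. The sketch below only illustrates the
idea, with entirely hypothetical names; the real change touches the
QEMU/PTLsim clock plumbing.

    struct SketchGuestClock {
        long long backend_now = 0;  // time reported by whichever backend runs
        long long offset = 0;       // correction applied on top of it

        long long guest_time() const { return backend_now + offset; }
    };

    // When leaving simulation mode, re-base the emulation-side clock so the
    // guest-visible time continues from where simulation left off instead of
    // jumping forwards or backwards.
    void on_leave_simulation(SketchGuestClock& emu_clock, long long sim_guest_time) {
        emu_clock.offset = sim_guest_time - emu_clock.backend_now;
    }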
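
And the reordering in sim_cpu_exec() looks roughly like this, heavily
condensed; the helpers are stand-ins, not the real QEMU interrupt code:

    struct SketchVCpu {
        bool halted = false;
        bool pending_interrupt = false;
        bool pending_exception = false;
    };

    // Stand-in for the per-CPU exception/interrupt handling that the real
    // cpu_exec() performs (delivering pending events and clearing the halted
    // state of a CPU that now has work).
    static void handle_exceptions_and_interrupts(SketchVCpu& cpu) {
        if (cpu.pending_exception)
            cpu.pending_exception = false;
        if (cpu.pending_interrupt) {
            cpu.pending_interrupt = false;
            cpu.halted = false;   // an interrupt gives a halted CPU work again
        }
    }

    static void ptl_simulate_stub() { /* run the simulated cores */ }

    // The point is the ordering: handle exceptions/interrupts of *all* CPUs
    // first, then enter the simulator, so that no CPU sits idle inside the
    // simulation and no exception is left pending across the mode switch.
    void sim_cpu_exec_sketch(SketchVCpu* cpus, int n) {
        for (int i = 0; i < n; ++i)
            handle_exceptions_and_interrupts(cpus[i]);
        ptl_simulate_stub();
    }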

After fixing these issues, I could switch back and forth 10,000 times in a row
in a C program without any ill effects. I wanted to do the same with ls in a
shell script as something like the ultimate test, but alas, I haven't gotten
around to that yet. However, I haven't seen the division-overflow assertion
failure or the guest kernel stalls ever since.

2) TSX

- The member in_tsx_ of class TsxCache was not initialized to 0 in its
constructor. This caused occasional pipeline stalls after switching to
simulation mode.

- The member tsxMemoryBuffer of class ThreadContext used a cache-line size
that was not a power of two, and the code using it only worked correctly for
loading and storing 64-bit values. When performing smaller updates on
transactional data, parts of the stored 64-bit words were lost. Owing to the
little-endianness of the x86 architecture, the code to properly access
sub-words turned out to be a bit of a mess, but fortunately all accesses are
aligned in this part of the machine (a sketch follows this list). Furthermore,
the geometry of this buffer did not match the L1D's, so false sharing and
other behaviour diverged from the real machine.
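
The sub-word access ended up along the lines of the following sketch
(simplified; the names are made up, and the real code additionally has to map
addresses onto the buffer's geometry). It relies on the accesses being aligned
and never crossing a 64-bit word, which holds here.

    #include <cassert>
    #include <cstdint>

    // Read 'size' bytes (1, 2, 4 or 8) at byte offset 'offset' of a 64-bit
    // backing word. On little-endian x86, byte k of the word lives in
    // bits [8k, 8k+8), so a shift and a mask are enough.
    static inline uint64_t load_subword(uint64_t word, unsigned offset, unsigned size) {
        assert(offset + size <= 8 && offset % size == 0);
        if (size == 8) return word;
        uint64_t mask = (uint64_t(1) << (size * 8)) - 1;
        return (word >> (offset * 8)) & mask;
    }

    // Merge 'size' bytes of 'value' into the word without touching the rest,
    // so that narrow transactional stores no longer clobber their neighbours.
    static inline void store_subword(uint64_t& word, unsigned offset,
                                     unsigned size, uint64_t value) {
        assert(offset + size <= 8 && offset % size == 0);
        if (size == 8) { word = value; return; }
        uint64_t mask = ((uint64_t(1) << (size * 8)) - 1) << (offset * 8);
        word = (word & ~mask) | ((value << (offset * 8)) & mask);
    }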

3) Spurious assertion failures

In ooo-pipe.cpp, line 2035, "assert(physreg->data);" failed without any code
actually branching to 0. After looking at the ASF branch, changing the line to
"assert(physreg->data || isbarrier(uop.opcode));" seemed to do the trick.

4) Dynamic memory allocation

While not actually a bug, class MemoryRequest uses new and delete for its
member history on every memory request, along with a fairly expensive STL
stringbuf. This runs contrary to what seems to be one of PTLsim's design
objectives, namely avoiding dynamic allocation (and the STL) during simulation
altogether, and it is done entirely in vain when not logging, which is
probably the default production case. A simple check whether config.loglevel
is 0 avoids the unnecessary dynamic memory allocation and the feeding of a
stringbuf that is never read, and also seemed to speed up simulation slightly.
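
The guard itself is trivial; roughly like this, with a simplified stand-in for
the MemoryRequest code (config.loglevel is the real knob, everything else is
made up):

    #include <sstream>

    struct SketchConfig { int loglevel = 0; };
    static SketchConfig config;

    struct SketchMemoryRequest {
        std::stringstream* history = nullptr;

        void init() {
            // Only pay for the allocation and the stringbuf when the history
            // will actually be written out.
            if (config.loglevel > 0)
                history = new std::stringstream();
        }

        void annotate(const char* what) {
            if (history)              // no-op in the default non-logging case
                *history << what << ' ';
        }

        ~SketchMemoryRequest() { delete history; }
    };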


This is what I remember concerning the bug-fixes I've made so far. I hope to be
able to provide a clean diff soon.


Regards,

gunnar.


P.S.: Since I do not seem to have the credentials necessary to access the
mailing list's HTTPS URL advertised on the MARSS site, please kindly add me to
the mailing list, thank you. :)

