Thanks a lot, Tyler, for the more detailed explanation. This helps me understand the accuracy of the simulator and of QEMU's execution. I was actually a bit curious about QEMU's internals, especially its cache architecture, coherency mechanisms and SMP implementation, but unfortunately I could not find much on the net. Is a source code walk-through the only way? Any pointers to documentation on these would be appreciated.

Thanks for your time,
karthik

On Wed, Apr 1, 2015 at 6:26 PM, <[email protected]> wrote:
> Atomics, locking, coherency, etc. are all *implemented* correctly in qemu...
>
> They're just not *modeled* accurately. You'll probably get unrealistic
> timings and behavior because of how the VCPUs are scheduled (one at a
> time, round-robin, for several instructions at a time). A VCPU will spin
> for tens to hundreds (?) of instructions at a time -- all the while other
> VCPUs are effectively suspended. OTOH, MARSS keeps all the cores in
> lockstep, cycle for cycle.
>
> AFAIK, qemu doesn't even have I/D caches because it doesn't need them;
> everything gets dumped straight to memory because it's just an emulator
> and the x86 is pretty transparent as far as software-architected cache,
> TLB, etc. support goes.
>
> Tyler
>
>> Hello Tyler,
>>
>> Based on your previous response, I was a bit inquisitive about a couple
>> of things regarding the emulator. It would be great if you could shed
>> some light on these:
>>
>> 1) Are atomic instructions modelled properly in the emulator?
>> 2) When you say the coherency logic is not modelled well in the emulator,
>> do you mean the cache coherency is not accurate and the caches are
>> incoherent? In that case I should see bugs in my parallel program,
>> which I currently do not. So I believe the emulator handles coherency
>> correctly, but it may not perform all the coherency steps that real
>> hardware does and instead uses some shortcuts to reach the same result.
>> Am I correct? I could not find much information about the emulator's
>> coherency online.
>>
>> Thanks for your time,
>> karthik
>>
>>
>> On Mon, Mar 30, 2015 at 11:40 AM, <[email protected]> wrote:
>>> Sorry, I misread your second question --
>>>
>>> You should definitely run your algorithms through the simulator. The
>>> emulator does NOT model the coherency logic well.
>>>
>>> Avadh got some speed-up by running the multi-threaded version of MARSS:
>>> http://marssandbeyond.blogspot.com/2012/01/multi-threaded-simulation-in-marss.html
>>>
>>> I'm not sure of the state of that branch, but it's worth a try if things
>>> are running too slowly for you.
>>>
>>> Tyler
>>>
>>>> There are some patches to qemu that have an effect even when running in
>>>> just plain emulation mode. MARSS leverages qemu to do some page table
>>>> book-keeping that I believe runs even when in pure emulation mode, for
>>>> example. If you're curious, you can grep for MARSS_QEMU in the qemu/
>>>> directory to see such changes. That being said, these changes should not
>>>> have that much of an effect on qemu's performance when running in
>>>> emulation mode... have you tried running a stock qemu (without KVM, just
>>>> TCG)?
>>>>
>>>> Regarding lock contention, the research community will absolutely accept
>>>> your work. MARSS models the coherency logic between CPUs very accurately
>>>> (and it's configurable). If you want to be especially crafty, you could
>>>> use the DRAMSim2 plugin to model the RAMs with high accuracy as well, but
>>>> you're probably more concerned with the coherency simulation (which is
>>>> provided by the default configuration).
>>>>
>>>> Tyler
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to use MARSS for my research work on lock contention
>>>>> issues in parallel programs running on future many-core processors.
>>>>> When I compiled MARSS for 32 cores and ran my parallel programs, I
>>>>> found that they take a lot of time. But when I just emulate (using the
>>>>> default QEMU available) instead of switching to simulation, I can,
>>>>> as expected, run my parallel programs faster and still reproduce the
>>>>> lock contention. I have a few questions arising from these
>>>>> observations on which I would like clarification:
>>>>>
>>>>> 1) When MARSS is running in emulation mode, is it just another QEMU?
>>>>> or is there any difference?
>>>>> 2) Since I am able to reproduce my lock contention problem using
>>>>> emulation (and the simulator is too slow for large core counts), I am
>>>>> thinking of working with the emulator to test my algorithms. Will the
>>>>> research community accept results obtained from an emulator? Kindly
>>>>> let me know.
>>>>>
>>>>> Thanks for your time,
>>>>> karthik
>>>>>
>>>>> _______________________________________________
>>>>> http://www.marss86.org
>>>>> Marss86-Devel mailing list
>>>>> [email protected]
>>>>> https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
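
The scheduling difference Tyler describes in the thread (VCPUs run one at a time, round-robin, for many instructions, versus all cores advancing in lockstep) can be illustrated with a toy model. The sketch below is not QEMU or MARSS code; the VCPU class, the step and run functions, and all parameter values are hypothetical and chosen only for illustration. It counts how many "instructions" each virtual CPU wastes spinning on a single shared lock under the two scheduling styles.

```python
from dataclasses import dataclass


@dataclass
class VCPU:
    ident: int
    remaining: int      # critical sections this VCPU still has to run
    hold_left: int = 0  # instructions left in the current critical section
    spins: int = 0      # instructions wasted spinning on a held lock


def step(cpu, lock, hold):
    """Execute one toy 'instruction' on cpu; lock[0] is the owner id or None."""
    if cpu.remaining == 0:
        return                      # this VCPU has finished its work
    if lock[0] == cpu.ident:
        cpu.hold_left -= 1          # one instruction inside the critical section
        if cpu.hold_left == 0:
            lock[0] = None          # release the lock
            cpu.remaining -= 1
    elif lock[0] is None:
        lock[0] = cpu.ident         # acquire the free lock
        cpu.hold_left = hold
    else:
        cpu.spins += 1              # lock held by another VCPU: spin


def run(n_cpus, sections, hold, quantum):
    """Run each VCPU for `quantum` instructions in turn; return per-VCPU spin counts."""
    cpus = [VCPU(i, sections) for i in range(n_cpus)]
    lock = [None]
    while any(c.remaining for c in cpus):
        for cpu in cpus:
            for _ in range(quantum):
                step(cpu, lock, hold)
    return [c.spins for c in cpus]


if __name__ == "__main__":
    # quantum=1 interleaves the VCPUs instruction by instruction (lockstep-like);
    # quantum=60 lets each VCPU run a long slice while the others are suspended.
    print("lockstep-like spins:", run(4, 10, 5, 1))
    print("time-sliced spins:  ", run(4, 10, 5, 60))
```

With these assumed numbers, a 60-instruction slice happens to cover all of one VCPU's critical sections, so the time-sliced run sees no contention at all, while the lockstep-like run spends many instructions spinning. A slice that ends in the middle of a critical section (for example quantum=50) distorts things the other way: the remaining VCPUs burn their entire slices spinning on a lock whose holder is suspended. Either way the program still runs correctly, but the contention it reports depends on the scheduler, not on the modelled hardware.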
