Thanks for the info, Gabe... I didn't realize you had made it that far. Would it be worth making your code available somehow in case someone wants to build on it?
Gabe's email also points up a distinction I didn't make... there are really two ways you could leverage a faster emulation environment: one is as a new integrated gem5 CPU model, as Gabe was doing, and the other is just to have "checkpoint interoperability" between two semi-independent systems, which is the route I was taking with my SimNow experimentation and what I was implicitly assuming in my earlier email.

The former is more work but potentially more useful, as you could switch back and forth between the fast and slow models within a single simulation. The latter requires you to fast-forward to an interesting point in the fast model, dump a checkpoint (or a few checkpoints), then load that state up in a separate run of gem5. So it's not as elegant, but it's potentially less work and less disruptive to the existing code base.

I don't object to the former, but I'm not convinced it's worth the additional work. In particular, even with a "sampled" simulation, once you get going you will very likely want to keep the caches warm even during the fast-forward phases, which might mean that you can't use the faster emulated environment anyway, or even if you can, that the speedup you get will be swamped by the cache simulation overhead.

I think it also depends on where you're coming from... the "checkpoint interoperability" approach makes the most sense if you're trying to use an existing large code base like QEMU or SimNow, while the CPU model approach is more sensible for something like KVM, which is really an interface and not a separate application. So it looks like Gabe and I agree that probably the most promising path would be to continue Gabe's work on building a KVM-based CPU model, even if we may not agree on what the second-best alternative is.

Interesting about the timer... I wonder if they've added a way to turn that off? Seems like it should be optional to have KVM handle that directly. If someone did want to pursue that, it might even be worth trying to get a patch into KVM to control it.

Steve

On Mon, Mar 26, 2012 at 1:12 PM, Gabriel Michael Black <[email protected]> wrote:

> I'd made some decent progress on using KVM as a CPU model. Execution got decently far along, but the problem I had when I stopped working on it was that the timer Linux wanted to use was provided by KVM for performance reasons, and it ran at actual speed, while gem5 (m5 at the time) has its own carefully controlled and usually much slower version of time. There may have been other plumbing issues too, like making sure interrupts were being piped to the right places, but basically at a certain point into booting things went crazy, Linux got upset and quit working.
>
> As far as how to hook binary translation into gem5, it would be nice, but there are a few problems. First, as Steve said, internal state isn't necessarily that portable between the other environments and gem5. I'm less optimistic than he is as far as the CPU state goes, but we agree that devices are a major issue. There have been attempts at doing this sort of thing, but I don't think any of them worked out well enough to become a permanent part of things.
>
> Another option would be to actually implement a binary-translating CPU model in gem5 that worked with everything else in the simulator by design, rather than being bolted on after the fact. This would be a lot of work and probably wouldn't be as good as other, similar implementations, just because a lot of people have spent a lot of time on those.
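[On the timer problem Gabe describes and Steve's question about turning it off: in the stock KVM API the in-kernel timer devices are opt-in, so userspace decides how much of that KVM handles. Below is a minimal standalone sketch; the ioctls are the real KVM interface, but whether skipping the in-kernel PIT/APIC (and not advertising the paravirtual kvmclock via KVM_SET_CPUID2) is by itself enough to keep guest time under gem5's simulated clock is an assumption that would need testing.]

    // Minimal sketch, not gem5 code: create a KVM vcpu *without* asking the
    // kernel to emulate timer/interrupt-controller devices.  KVM_CREATE_IRQCHIP
    // and KVM_CREATE_PIT2 are opt-in ioctls; if userspace never issues them,
    // interrupt delivery and timers stay in userspace, where a simulator could
    // drive them from its own (slower) simulated clock.  x86 register layout
    // assumed; error handling trimmed for brevity.
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/kvm.h>
    #include <cstdio>

    int main()
    {
        int kvm = open("/dev/kvm", O_RDWR);
        if (kvm < 0 || ioctl(kvm, KVM_GET_API_VERSION, 0) != 12)
            return 1;

        int vm = ioctl(kvm, KVM_CREATE_VM, 0);
        // Deliberately *no* KVM_CREATE_IRQCHIP / KVM_CREATE_PIT2 calls here.

        int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
        int sz = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);
        kvm_run *run = static_cast<kvm_run *>(
            mmap(nullptr, sz, PROT_READ | PROT_WRITE, MAP_SHARED, vcpu, 0));

        // Architectural register state moves in and out with plain ioctls;
        // this is also the state a checkpoint-interoperability scheme would
        // have to capture and hand to gem5.
        kvm_regs regs;
        ioctl(vcpu, KVM_GET_REGS, &regs);
        std::printf("guest rip = 0x%llx\n",
                    static_cast<unsigned long long>(regs.rip));

        // A real CPU model would set up guest memory with
        // KVM_SET_USER_MEMORY_REGION, then loop on KVM_RUN and hand each exit
        // (run->exit_reason: I/O, MMIO, ...) back to the simulator's devices.
        (void)run;
        return 0;
    }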
> A gem5 version could probably be decent, though, and would be better than not having anything. Also, as Steve said, it would be a *lot* of work. I'd want our ISA definitions to somehow fit in with both the interpreted and translated systems so that they'd be consistent and we wouldn't have two separate implementations to maintain, so those would have to be reworked, both the underlying mechanism and the descriptions themselves. That alone would be a fairly daunting task. Then you'd also have to write the translation engine itself, which would basically be like rewriting QEMU. Actually hooking it into the rest of the simulator would be more straightforward at that point.
>
> In general, I agree with Steve. This sort of thing could work out, but it would be really difficult to pull off, and especially to do well enough for it to be considered a real implementation and not just a sort of working proof of concept.
>
> If you decide to do it anyway, my recommendation would be to either try to hook KVM in as a CPU model or to implement binary translation from scratch as a CPU model. KVM would probably be much more tractable in scope. No matter what you do, be sure to discuss it with the dev list so we can help you make the right decisions earlier rather than later, so you don't have to throw away a bunch of work. Also, you should avoid needing to change any existing pieces of the simulator beyond what's absolutely necessary (ISA descriptions are ok, for instance). That will keep your design cleaner, and also avoid cluttering things up if/while your new work is in progress.
>
> Gabe
>
> Quoting Steve Reinhardt <[email protected]>:
>
>> Thanks for your interest in improving gem5!
>>
>> The idea of doing binary translation to improve performance (particularly for functional fast-forwarding) has come up before, but we haven't crossed that bridge for several reasons:
>> 1. Most of all, it's really, really hard, and there are always plenty of other more pressing things to work on.
>> 2. Our current ISA descriptions weren't set up with this in mind, so it would probably require reworking the ISA descriptions in addition to building the framework.
>> 3. Other groups (like QEMU and AMD's SimNow) have already built binary translation tools that are way better than anything we would do.
>> 4. For x86, at least, the idea of using hardware virtualization provides an alternative that could have even higher performance than binary translation.
>>
>> What we've generally been thinking of as a more desirable and achievable alternative would be to interoperate with another environment like QEMU, SimNow, or KVM, so that you could run at high speed in one of these other tools, extract the system state, load it into gem5, and then run a detailed simulation from there. Gabe Black did a little exploration of KVM quite a while ago, but I don't think he got that far (correct me if I'm wrong, Gabe). I also did a little internal playing around with SimNow, but nothing I can release. Other than that, I don't know of anyone who's worked on this yet.
>>
>> Since the issues are pretty much the same, I'll use the term EE to refer to a high-speed emulated environment, whether it's QEMU, SimNow, KVM, or something else.
>> In theory, it's pretty straightforward; architectural CPU and memory state is pretty well defined, and most of these systems have checkpoint/snapshot capability, so it's simply a matter of running in one of these EEs, saving a checkpoint, and loading it up into gem5. The big challenge really revolves around devices: the set of devices that gem5 supports doesn't necessarily intersect with those that these EEs support, and the internal state representation is guaranteed to be different.
>>
>> I think the best solution to the device problem is to find a way to use the *same* device models in both the EE and in gem5, either by grafting the EE's device models into gem5 or the other way around. For KVM, you'd have to use gem5's models, since KVM by itself has no device models. For other EEs, there are potential benefits to finding a way to port their device models into gem5, since I expect they have more models (and more complete models) than we do (certainly for SimNow I know that's true).
>>
>> However, a big potential downside of incorporating other device models is licensing. I know QEMU is GPL, which is problematic for us (since we use a BSD-based license, and that's very important to us given the number of companies involved with gem5). Anything that would contaminate gem5 with GPL is unacceptable. I haven't looked into QEMU enough to know if this is something that can be worked around or not.
>>
>> Also, while SimNow has a lot of appeal for those of us at AMD, I can see where people would prefer an open-source and multi-ISA solution. SimNow is probably more feasible than you might think, though, since there is a free binary version available (http://developer.amd.com/tools/simnow/pages/default.aspx), and we have contacts in the SimNow group to explore opening up additional internal interfaces etc. if that proves necessary.
>>
>> I think KVM might be the most appealing avenue; it does tie us even more to Linux than we are already, but that's the only major downside I see. It also doesn't support all our ISAs, but Wikipedia says it does support PowerPC in addition to x86, and there is an ARM port in the works (http://systems.cs.columbia.edu/projects/kvm-arm/). I expect that x86+ARM covers the vast and growing majority of our user base.
>>
>> Just to be complete, I'll mention that I'm sure there are opportunities to improve the performance of the existing gem5 ISA simulation/emulation that are simpler and more feasible than doing binary translation in gem5, but I expect those opportunities are more like tens of percent speedup rather than the order(s?) of magnitude or so you'd probably get out of going to something like KVM.
>>
>> I'd really be glad to see something along these lines happen, and am happy to help to the extent I can. I'm also interested if some of the other developers have a different opinion or further insights.
>>
>> Steve
>>
>> On Sun, Mar 25, 2012 at 7:39 PM, Pablo Ortiz <[email protected]> wrote:
>>
>>> Hello dev group,
>>>
>>> My group is looking at the possibility of improving the performance of GEM5 for the purpose of simulating an Android environment.
>>> In QEMU, there is a step performed during binary translation in which basic code blocks are translated and cached for execution, to avoid the overhead of having to re-translate common, previously translated code blocks. Would such an optimization be reasonable, or doable, or even sensible in the context of GEM5? I would love to hear the thoughts of the mailing list. I would like to thank, in advance, any who wish to respond to this email.
>>>
>>> Cheers,
>>> El
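[Pablo's description of QEMU's mechanism is essentially right: translated blocks are kept in a cache keyed by guest PC, so each block pays the translation cost once. Purely as an illustration of that lookup/translate/cache/execute loop (this is not QEMU or gem5 code; real translators emit host machine code rather than C++ callables), a toy sketch:]

    // Toy translation cache: illustrates the caching idea only.  A real
    // binary translator (e.g. QEMU's TCG) generates host machine code; here
    // a "translated block" is just a C++ callable standing in for it.
    #include <cstdint>
    #include <functional>
    #include <iostream>
    #include <unordered_map>

    struct GuestState {
        std::uint64_t pc = 0;
        std::uint64_t regs[16] = {};
    };

    // A translated basic block: executes some guest instructions and
    // returns the guest PC of the next block.
    using TranslatedBlock = std::function<std::uint64_t(GuestState &)>;

    class TranslationCache {
      public:
        // Look up a block by guest PC; translate and insert only on a miss,
        // so repeated execution of hot blocks skips the decode/translate cost.
        const TranslatedBlock &lookup(std::uint64_t pc) {
            auto it = cache_.find(pc);
            if (it == cache_.end())
                it = cache_.emplace(pc, translate(pc)).first;
            return it->second;
        }
      private:
        // Stand-in for the expensive decode-and-translate step.
        static TranslatedBlock translate(std::uint64_t pc) {
            std::cout << "translating block at 0x" << std::hex << pc << "\n";
            return [](GuestState &s) { s.regs[0] += 1; return s.pc + 4; };
        }
        std::unordered_map<std::uint64_t, TranslatedBlock> cache_;
    };

    int main() {
        GuestState cpu;
        TranslationCache tc;
        // Revisit the same two guest PCs: only the first visit to each one
        // triggers "translation"; the later visits hit the cache.
        for (std::uint64_t pc : {0x1000ULL, 0x1040ULL, 0x1000ULL, 0x1040ULL}) {
            cpu.pc = pc;
            cpu.pc = tc.lookup(pc)(cpu);
        }
        return 0;
    }

[In a real engine, most of the difficulty lies elsewhere: generating good host code, knowing when cached blocks must be invalidated (self-modifying code, remapped pages), and chaining blocks together so execution rarely returns to the dispatch loop, which is part of why Gabe's estimate above treats the translation engine itself as the daunting part.]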
