On 02/21/18 08:16, Peter Corlett via cctalk wrote:
>
> A programmer of modest ability should be able to knock up a simple
> switch()-based step-by-step CPU emulation in a few hours. This is analogous to
> a simple microcoded CPU and the performance will suck.

Yeah, please don't code this way. Big huge case statements really suck.
Build a function that decodes opcodes and then dispatches through an
array of function pointers instead. Build tables in the emulator that
help (and accelerate) the opcode decoder, and that also have fields
holding opcode cycle timings and opcode sizes (so you can increment the
PC, the program counter, by the size of the opcode).

You could also cache the results of the opcode decoder, but you'll need
to do a bunch of memory management for that, and detect when a cached
location in memory has been overwritten with new code. However, this
can yield very useful information if you also add a bunch of extra
fields to your instruction cache, such as what register values were
used last time, the CPU cycle on which it last ran, whether the opcode
accessed I/O or RAM, what MMU context it was in, etc. These extra bits
of recorded "flight data" are gold when debugging or reverse
engineering the running OS/apps later.

For example, on CPUs that access I/O via memory space, if you see a
MOVE to an address held in a register in a disassembly, you don't
really know what it's doing unless you see the value in that register
at the time it ran. Then you can think, "Aha! 0xfc0020 - that's this
I/O register on this specific device here on this bus", which can help
you locate I/O drivers in the kernel.

Now, you can also do something else that's interesting. If you reverse
engineer the driver a bit and find its entry and exit points (basically
how it's called and what it returns), you can trap that in your
emulator: when your emulator's CPU calls that block of code, you don't
execute the emulated code, but instead do whatever that driver does
natively on the host and return the right values on the
stack/registers/target memory/etc. This can speed up your emulator
quite a bit. For example, say you have some code that loads a file
from disk.
Rather than emulating several thousand opcodes, you can replace the
whole thing with a block read from a file, return to the caller, and
skip all the bit-banging. (But do update the CPU cycle count.)

If, say, the firmware insists on checking the RAM by writing thousands
of different patterns over many megs of memory, you can detect that in
your emulator and skip it. No need to make the user wait 5 minutes for
the machine to warm up; speed up the boot process. (You can make these
things optional "hacks" that the user can enable or disable.)

> Making it *cycle-accurate* involves deep understanding of the emulated CPU's
> internal architecture. If part of the platform requires cycle-accurate timing
> for bit-banging some hardware device, you're going to need this.

Hopefully this is already documented; if not, having schematics might
help here, but would need lots of work. (Assuming a multi-IC CPU as
opposed to a single-chip CPU, which would likely have great docs
already.)

> Making it *fast* also involves being an expert in compiler backends for the
> target architecture, because this requires decompiling and then recompiling
> the code on the fly.
>
> ... and that's the easy bit. Now you get to emulate the hardware.

Yeah, it's a huge job, but I think in the end, it's totally worth it.
It just takes a lot of commitment and free time, and a love for the
machine you're trying to emulate.
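
For what it's worth, here is a minimal sketch in C of the two ideas
above: dispatching through a table of function pointers that also
carries opcode sizes and cycle counts, and trapping a known entry point
so the host does the work natively. Everything in it is invented for
illustration and not taken from any real machine: the toy CPU (one
accumulator, a link register), the opcode numbers, the names Cpu,
OpInfo, Trap, cpu_step and native_loader, the trap address 0x8000 and
its 5000-cycle charge are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy CPU; registers and opcodes are made up for this sketch. */
typedef struct Cpu {
    uint16_t pc;          /* program counter */
    uint16_t lr;          /* link register: return address of the last CALL */
    uint8_t  a;           /* accumulator */
    uint64_t cycles;      /* running cycle count */
    int      halted;
    uint8_t  mem[65536];
} Cpu;

/* Handlers receive the address of the first operand byte. */
typedef void (*OpHandler)(Cpu *cpu, uint16_t operand);

/* One table entry per opcode: the handler plus static metadata. */
typedef struct OpInfo {
    OpHandler handler;
    uint8_t   size;       /* instruction length in bytes, used to advance the PC */
    uint8_t   cycles;     /* base cycle cost */
} OpInfo;

static void op_nop(Cpu *c, uint16_t o)  { (void)c; (void)o; }
static void op_lda(Cpu *c, uint16_t o)  { c->a = c->mem[o]; }
static void op_add(Cpu *c, uint16_t o)  { c->a = (uint8_t)(c->a + c->mem[o]); }
static void op_call(Cpu *c, uint16_t o) {
    c->lr = c->pc;        /* the PC already points past the CALL */
    c->pc = (uint16_t)(c->mem[o] | (c->mem[(uint16_t)(o + 1)] << 8));
}
static void op_ret(Cpu *c, uint16_t o)  { (void)o; c->pc = c->lr; }
static void op_halt(Cpu *c, uint16_t o) { (void)o; c->halted = 1; }

/* Opcodes not listed stay zero-initialized, so handler == NULL marks them illegal. */
static const OpInfo op_table[256] = {
    [0x00] = { op_nop,  1, 1 },
    [0x01] = { op_lda,  2, 2 },   /* LDA #imm */
    [0x02] = { op_add,  2, 2 },   /* ADD #imm */
    [0x03] = { op_call, 3, 4 },   /* CALL abs (little-endian address) */
    [0x04] = { op_ret,  1, 3 },
    [0xFF] = { op_halt, 1, 1 },
};

/* A trap replaces an emulated routine with host code. */
typedef struct Trap {
    uint16_t entry;               /* entry point of the emulated routine */
    uint32_t cycles;              /* cycles to charge so timing stays plausible */
    void   (*native)(Cpu *cpu);   /* host-side replacement */
} Trap;

/* Stand-in for something like a host block read; here it only sets A. */
static void native_loader(Cpu *c) { c->a = 42; }

static const Trap traps[] = { { 0x8000, 5000, native_loader } };

/* Run one instruction or one trapped routine.
 * Returns 0 on success, -1 if the CPU is halted or the opcode is illegal. */
static int cpu_step(Cpu *c) {
    if (c->halted) return -1;

    for (size_t i = 0; i < sizeof traps / sizeof traps[0]; i++) {
        if (c->pc == traps[i].entry) {
            traps[i].native(c);
            c->cycles += traps[i].cycles;
            c->pc = c->lr;        /* simulate the routine's RET */
            return 0;
        }
    }

    uint16_t at = c->pc;
    const OpInfo *info = &op_table[c->mem[at]];
    if (info->handler == NULL) return -1;

    c->pc = (uint16_t)(at + info->size);  /* advance first so CALL/RET can override */
    info->handler(c, (uint16_t)(at + 1));
    c->cycles += info->cycles;
    return 0;
}
```

To try it, zero a Cpu, poke a few program bytes into mem and call
cpu_step() in a loop until it returns -1. A CALL to 0x8000 never
executes emulated code there; it runs native_loader() and charges the
cycles listed in the trap. A real emulator would probably replace the
linear trap scan with a lookup keyed on address, and would return
through whatever calling convention the real driver uses.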