On reflection, it's just not that hard to write a C invlpg that calls putcr3 instead when it's running on a 386, which lacks the INVLPG instruction. We use LOCK INCL for reference counts so that they work on the 386, we never use BSWAP, and we only turn on CR0.WP after checking cpuid.
For actually running a system, VMware and Qemu are pretty much neck and neck, and Bochs is far behind. For debugging a system, Bochs is slightly ahead of Qemu and VMware is far behind. Bochs has a reasonable debugger interface that knows about 16-bit mode, segments and offsets, page tables and everything.

Qemu is much faster but has no built-in debugger. You can attach to it with gdb, but gdb only knows about 32-bit, user-level sorts of things, so I usually have to set a breakpoint with gdb and then dump the register state from qemu's console.

I lost count of the number of times one or the other got me out of a jam while I was writing the new VM code or the real-mode code that made vesa-in-the-kernel possible.

Russ