Hi Peter, Vadim,

Great to know the QEMU team has thought about this before.  I concede that
my idea might not fit nicely into QEMU's architecture; it would likely have
to be an entirely different accelerator or a standalone emulator, and even
then it would be hit-or-miss for users whether it actually improves on the
slowness of full TCG emulation.  Like I said, I've thought about doing
something like this, but I'm not sure if/when I would actually get around
to it (most likely only when it's already too late!)
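
To make the idea from my earlier message concrete (catch the
illegal-instruction exception, deal with the offending instruction in
software, then resume the program), here is a minimal sketch of just the
trap-and-resume skeleton.  It is not QEMU code and does not call TCG.  It
assumes Linux with glibc; the names on_sigill, unsupported_instruction and
run_demo are made up for illustration.  The "emulation" step is only a
placeholder that skips one known undefined instruction, where a real tool
would decode the bytes at the faulting address and emulate them.

```c
#define _GNU_SOURCE          /* exposes REG_RIP in <ucontext.h> on glibc */
#include <signal.h>
#include <string.h>
#include <ucontext.h>

/* Number of instructions the handler has trapped so far. */
static volatile sig_atomic_t trapped = 0;

/* SIGILL handler: a real tool would decode and emulate the faulting
 * instruction here.  This sketch only steps over the one instruction
 * that unsupported_instruction() is known to execute. */
static void on_sigill(int sig, siginfo_t *info, void *ctx)
{
    ucontext_t *uc = (ucontext_t *)ctx;
    (void)sig;
    (void)info;
    (void)uc;
    trapped++;
#if defined(__x86_64__)
    uc->uc_mcontext.gregs[REG_RIP] += 2;   /* UD2 is 2 bytes: 0F 0B */
#elif defined(__aarch64__)
    uc->uc_mcontext.pc += 4;               /* every A64 instruction is 4 bytes */
#endif
    /* Other hosts: the signal comes from raise(), so nothing to skip. */
}

/* Stand-in for "an instruction this CPU does not implement". */
static void unsupported_instruction(void)
{
#if defined(__x86_64__)
    __asm__ volatile("ud2");
#elif defined(__aarch64__)
    __asm__ volatile(".inst 0x00000000");  /* UDF #0, permanently undefined */
#else
    raise(SIGILL);                         /* portable fallback for the demo */
#endif
}

/* Installs the handler, triggers one trap, and returns how many
 * instructions were trapped (1 on success, -1 if sigaction fails). */
int run_demo(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigill;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, NULL) != 0)
        return -1;

    trapped = 0;
    unsupported_instruction();   /* faults, handler runs, execution resumes */
    return (int)trapped;
}
```

Calling run_demo() from a main() should return 1, meaning the program
survived the undefined instruction.  As Peter points out below, this only
covers new instructions that fault on the old CPU; it cannot help with
features that change the behaviour of existing instructions.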

FYI - I tried my troubled application with QEMU *User Emulation* of x86_64
on x86_64, and it worked better, but it nevertheless crashed in an x86_64
system call with *SIGABRT*.  I did not use -L at all, and I'm not clear
whether that option is of much use when you are emulating the host
architecture.  The same application worked fine on the same machine under
QEMU System Emulation, so maybe there's still hope? (BTW - I found
performance best under the Slackware-based Porteus Linux Distribution, but
awful and unusable under the Ubuntu-based Bodhi Linux Distribution -
choosing the right distribution really makes a difference!)

Furthermore on User Mode Emulation, I'm wondering, since the program is
going to run entirely under TCG anyway, whether I might be better off
trying to run a different architecture build of the application (e.g. ARM)
rather than using the same architecture?  In that case, would all the ARM
shared libraries and system dependencies need to be stored under a sort of
ARM chroot pointed to by the -L argument?

Much appreciation for all your help and comments!


CP

On Tue, Aug 13, 2024, 5:53 AM Peter Maydell <peter.mayd...@linaro.org>
wrote:

> On Mon, 12 Aug 2024 at 18:47, Chris Parker <ch...@parkerfamily.name>
> wrote:
> > FYI - I have been interested for some time in making an accelerator that
> would run in a sort-of hybrid KVM mode, where host-supported instructions
> are executed by the processor, but unsupported functions are provided by
> something like the TCG. How feasible that is depends on how new and
> different the missing instructions are.  For example, this first came up
> when I wanted to run a x86 32-bit program with SSE2 instructions on a
> Pentium III which only had SSE instructions; that would be doable in theory
> since SSE2 only adds new processor instructions, but not any new registers
> (if it did, that would be a lot more complicated or even impossible to
> support in this way).  My original idea was to handle the ILLEGAL
> INSTRUCTION processor exception and examine the offending instruction to
> see if it was covered in the TCG implementation, and if so execute the TCG
> and then resume the program, which in theory is totally possible. Now I am
> facing a similar issue running x64 64-bit SSE4.1 instructions on a 64-bit
> processor that only supports up to SSE2, so it's like deja vu! I don't know
> if/when I'll ever get around to coding something like this, but as you can
> see, the problem isn't going away, and demand for this sort of thing is
> likely to grow in the coming years as applications and operating systems
> continue to disregard old hardware and begin to use newer processor
> features.
>
>
> Yes, this kind of idea is one that's come up before. It's
> certainly in theory possible, but QEMU as it stands isn't really
> designed to be able to do this -- the choice of accelerator
> is an all-or-nothing one, with no thought of "use TCG only
> for this next instruction or two and then do something else".
> If you do it by catching the illegal-instruction exception
> this also limits you to only new CPU features which add new
> instructions that previously would cause an exception and not
> any that change any of the behaviour of existing instructions.
>
> If you want an example of using this approach with something
> other than QEMU:
>
> https://community.arm.com/arm-community-blogs/b/high-performance-computing-blog/posts/emulating-sve-on-armv8-using-dynamorio-and-armie
> discusses the "ArmIE" tool which is based on DynamoRIO.
> (DynamoRIO is an open source framework: https://dynamorio.org/
> and ArmIE a closed-source tool built using it.)
> I think because DynamoRIO is an instrumentation framework
> it's much more suited to this kind of "leave most instructions
> the way they are but when you see XYZ do something else"
> design than QEMU is (QEMU's emulation always converts to
> an intermediate representation, does register allocation
> and optimisation and then emits code for that, whether the
> host architecture is the same as the guest or different).
>
> thanks
> -- PMM
>
