Re: How to clone CPUState in a new thread?
On Thu, Nov 7, 2019 at 1:50 PM Michael Goffioul wrote:

> Thanks for the input. I have a feeling that the problem is related to the
> stack setup for the emulated code, or rather, the lack thereof. I've
> successfully run JNI code in a separate Java thread, while still originally
> loaded and tcg-processed in the main thread, for things like integer
> computation or even creating and returning a Java string. However something
> like sending a string to logcat makes the thing crash. The primary suspect
> is the TLS, which, if I'm not mistaken, is located somewhere in the
> thread's stack. Would that make sense?

It turned out the problem really was the missing TLS initialization in the emulated code. In particular, Android P and lower use one of the 8 TLS slots defined in bionic to manage errno, meaning that any system call would most likely lead to a crash. With some trickery, I managed to initialize the emulated side of the thread properly, and the code now runs fine.
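To make the failure mode concrete, here is a toy model in Python. It is not bionic or QEMU code. It assumes a 32-bit guest, an errno slot index of 2 chosen purely for illustration, and made-up addresses; `ToyGuestMemory`, `init_thread_tls` and `set_errno` are hypothetical names. The sketch only shows why storing errno through a TLS base that was never set up faults, and why mapping the slot area for the new thread first avoids that.

```python
SLOT_COUNT = 8   # number of TLS slots, as stated in the message above
ERRNO_SLOT = 2   # assumption for illustration; the thread does not give the index
WORD = 4         # assumption: 32-bit ARM guest, 4-byte slots


class GuestFault(Exception):
    """Raised when the toy guest touches an address that is not mapped."""


class ToyGuestMemory:
    """Very small stand-in for guest memory: a dict of mapped words."""

    def __init__(self):
        self.words = {}

    def map_zeroed(self, base, count):
        # Map `count` zero-filled words starting at `base`.
        for i in range(count):
            self.words[base + i * WORD] = 0

    def load(self, addr):
        if addr not in self.words:
            raise GuestFault(hex(addr))
        return self.words[addr]

    def store(self, addr, value):
        if addr not in self.words:
            raise GuestFault(hex(addr))
        self.words[addr] = value


def init_thread_tls(mem, tls_base):
    """Map the slot area for a thread so that errno has somewhere to live."""
    mem.map_zeroed(tls_base, SLOT_COUNT)


def set_errno(mem, tls_base, value):
    """Model of a failing syscall wrapper: it writes errno via the TLS base."""
    mem.store(tls_base + ERRNO_SLOT * WORD, value)
```

In this model, calling `set_errno(mem, 0, 13)` on a thread whose TLS base is still 0 raises `GuestFault`, while the same call after `init_thread_tls(mem, base)` succeeds.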
Re: How to clone CPUState in a new thread?
On Thu, Nov 7, 2019 at 7:53 AM Peter Maydell wrote:

> On Thu, 7 Nov 2019 at 12:46, Michael Goffioul wrote:
> > Side question: is this the right mailing list to discuss this, or is
> > there a more appropriate one?
>
> You're more likely to find actual QEMU developers reading qemu-devel;
> qemu-discuss has fewer contributors and they tend to be more
> likely to be end-users or interested in end-user questions
> rather than internals.
>
> As for your original question, if you're creating a new
> thread and want the new thread's TCG CPU state to match
> that of the old thread then the linux-user 'clone' call
> is what you want to follow. Duplicating the existing CPU
> state is a bit of a hacky codepath but it does work.
> If you have a thread that doesn't want to follow the
> existing CPU state of some other creating thread but
> instead is started with a fixed entirely-known state,
> then the code in linux-user/main.c which starts the first
> thread of the process might be a better model.
>
> Overall, though, QEMU's code is not designed to be embedded
> into some other runtime environment in the way you're doing
> it, so I would expect there to be pain involved in trying
> to get it to work (especially surrounding threads, new
> processes and signals).

Thanks for the input. I have a feeling that the problem is related to the stack setup for the emulated code, or rather, the lack thereof. I've successfully run JNI code in a separate Java thread, while still originally loaded and tcg-processed in the main thread, for things like integer computation or even creating and returning a Java string. However, something like sending a string to logcat makes the thing crash. The primary suspect is the TLS, which, if I'm not mistaken, is located somewhere in the thread's stack. Would that make sense?
Re: How to clone CPUState in a new thread?
On Thu, 7 Nov 2019 at 12:46, Michael Goffioul wrote:

> Side question: is this the right mailing list to discuss this, or is there a
> more appropriate one?

You're more likely to find actual QEMU developers reading qemu-devel; qemu-discuss has fewer contributors and they tend to be more likely to be end-users or interested in end-user questions rather than internals.

As for your original question, if you're creating a new thread and want the new thread's TCG CPU state to match that of the old thread then the linux-user 'clone' call is what you want to follow. Duplicating the existing CPU state is a bit of a hacky codepath but it does work. If you have a thread that doesn't want to follow the existing CPU state of some other creating thread but instead is started with a fixed entirely-known state, then the code in linux-user/main.c which starts the first thread of the process might be a better model.

Overall, though, QEMU's code is not designed to be embedded into some other runtime environment in the way you're doing it, so I would expect there to be pain involved in trying to get it to work (especially surrounding threads, new processes and signals).

thanks
-- PMM
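For readers following along, the two options described above (duplicate the creating thread's CPU state, or start from a fixed known state) can be sketched as a toy model in Python. This is not QEMU code, and the real linux-user clone path does considerably more than this. `ToyCPU`, `clone_cpu`, `fresh_cpu`, `bind_to_current_thread` and `current_cpu` are hypothetical names, and `threading.local` merely stands in for a thread-local variable such as `thread_cpu`.

```python
import copy
import threading

# Stand-in for a thread-local pointer like `thread_cpu`:
# every thread sees its own value, and a new thread starts with none.
_per_thread = threading.local()


class ToyCPU:
    """Hypothetical miniature CPU state: 16 registers plus a TLS base."""

    def __init__(self):
        self.regs = [0] * 16
        self.tls_base = 0


def clone_cpu(parent, new_sp, new_tls):
    """Option 1: copy the creating thread's state, then patch SP and TLS."""
    child = copy.deepcopy(parent)
    child.regs[13] = new_sp      # r13 is the stack pointer on ARM
    child.tls_base = new_tls
    return child


def fresh_cpu(entry_pc, new_sp, new_tls):
    """Option 2: start from a fixed, entirely known state."""
    cpu = ToyCPU()
    cpu.regs[15] = entry_pc      # r15 is the program counter on ARM
    cpu.regs[13] = new_sp
    cpu.tls_base = new_tls
    return cpu


def bind_to_current_thread(cpu):
    """Make `cpu` the calling thread's CPU; other threads are unaffected."""
    _per_thread.cpu = cpu


def current_cpu():
    """Return the calling thread's CPU, or None if it never bound one."""
    return getattr(_per_thread, "cpu", None)
```

A thread that did not go through one of the two constructors and `bind_to_current_thread` gets `None` from `current_cpu()`, which is the toy equivalent of the missing per-thread state discussed in this thread.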
Re: How to clone CPUState in a new thread?
On Thu, Nov 7, 2019 at 7:38 AM Michael Goffioul wrote:

> [...]
> So far, the emulation is working fine and I'm able to run simple
> ARM-compiled apps on Android-x86, even if the native code spawns new
> threads. My current (hopefully last) problem is when a Java thread
> (different from the one that initialized the qemu engine) is trying to run
> native code. I need to set up a new CPUState/CPUArchState instance for this
> Java thread.

Side question: is this the right mailing list to discuss this, or is there a more appropriate one?
Re: How to clone CPUState in a new thread?
On Thu, Nov 7, 2019 at 4:57 AM Jakob Bohm wrote:

> On 07/11/2019 01:44, Michael Goffioul wrote:
> > I'm working on a project that wants to replace houdini (ARM-to-x86
> > translation layer for Android from Intel) with a free open-source
> > implementation. I'm trying to leverage qemu user-mode to achieve that,
> > but it requires code changes to allow executing dynamically loaded
> > functions instead of running a single executable.
>
> Basic question: Isn't the qemu user-mode emulator already able to run a
> "single executable" that loads DLLs, creates dynamic code etc. in the
> emulated instruction set?
>
> The obvious exception would be to skip the ARM instruction set intermediary
> when translating Dalvik byte code from .dex files.
>
> From this perspective, emulated ARM thread creation would be just letting
> qemu emulate the ARM code that would be called, including letting qemu
> emulate the system calls such as "clone".
>
> A special case would be if houdini allows direct calls between ARM and x86
> .so files. I don't know if qemu-user has the ability to expose host
> native DLLs to emulated code.

Basically, Houdini implements the native bridge interface, as defined here:
https://android.googlesource.com/platform/system/core/+/refs/tags/android-10.0.0_r11/libnativebridge/include/nativebridge/native_bridge.h#172

It allows running Android APKs that contain ARM-compiled native/JNI code on an Android-x86 OS. It does so by taking care of loading the ARM .so JNI files and providing trampoline stubs to the Android runtime JVM. It does not expose the host native .so to the emulated code; instead it provides a set of ARM-compiled core libraries from Android. It is actually very similar to running dynamically linked code in qemu-user with a chroot'ed ARM environment. Actual interaction with the native host happens mostly/only through the binder socket.

To initialize the qemu-user engine, I make it load a custom ARM .so/ELF file that uses the Android linker (from the ARM pseudo-chroot environment) as interpreter. This allows me to delegate all dynamic linking aspects.

So far, the emulation is working fine and I'm able to run simple ARM-compiled apps on Android-x86, even if the native code spawns new threads. My current (hopefully last) problem is when a Java thread (different from the one that initialized the qemu engine) is trying to run native code. I need to set up a new CPUState/CPUArchState instance for this Java thread.
Re: How to clone CPUState in a new thread?
On 07/11/2019 01:44, Michael Goffioul wrote:

> Hi,
>
> I'm working on a project that wants to replace houdini (ARM-to-x86
> translation layer for Android from Intel) with a free open-source
> implementation. I'm trying to leverage qemu user-mode to achieve that,
> but it requires code changes to allow executing dynamically loaded
> functions instead of running a single executable.

Basic question: Isn't the qemu user-mode emulator already able to run a "single executable" that loads DLLs, creates dynamic code etc. in the emulated instruction set?

The obvious exception would be to skip the ARM instruction set intermediary when translating Dalvik byte code from .dex files.

From this perspective, emulated ARM thread creation would be just letting qemu emulate the ARM code that would be called, including letting qemu emulate the system calls such as "clone".

A special case would be if houdini allows direct calls between ARM and x86 .so files. I don't know if qemu-user has the ability to expose host native DLLs to emulated code.

> [...]

Enjoy

Jakob
-- 
Jakob Bohm, CIO, Partner, WiseMo A/S. http://www.wisemo.com
Transformervej 29, 2860 Soborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
How to clone CPUState in a new thread?
Hi,

I'm working on a project that wants to replace houdini (ARM-to-x86 translation layer for Android from Intel) with a free open-source implementation. I'm trying to leverage qemu user-mode to achieve that, but it requires code changes to allow executing dynamically loaded functions instead of running a single executable.

In a nutshell, using ideas from unicorn-engine, I've enhanced CPUARMState with a stop address. Whenever this address is encountered in the translator, it generates a YIELD exception, which then makes cpu_loop exit. It works fine for simple cases, but I'm having trouble with the multi-threading aspect. Threads created from the native/ARM side do seem to work properly. The problem is when a new Java thread (not created from native/ARM) attempts to execute native code. The QEMU engine has been initialized in the main thread, but new Java threads do not have access to the thread-local variable thread_cpu. I've tried (maybe naively) to recreate what the clone syscall is doing to create a new CPUState/CPUArchState object, usable from the new thread, but executing any ARM code quickly leads to a crash. I suppose I'm doing something wrong, or missing something needed to properly initialize a new cpu.

I'm hoping that someone could help me solve this problem. I've attached the current QEMU patch I'm using; most of the Android glue layer is in linux-user/main.c. It contains a set of utility functions that my Android native bridge implementation is using.

Attachment: qemu-android.diff.bz2
Description: application/bzip
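The stop-address idea described above can be illustrated with a toy interpreter in Python. This is a sketch under stated assumptions, not the attached patch and not QEMU code: the two-instruction "ISA", the `STOP_ADDR` value, and the choice to place the stop address in the link register so that a guest return lands on it are all assumptions made for the example, and `ToyCPU`, `step`, `cpu_loop` and `call_guest` are hypothetical names.

```python
STOP_ADDR = 0xFFFF0000  # hypothetical address that never holds real guest code


class Yield(Exception):
    """Toy counterpart of the YIELD exception mentioned in the message."""


class ToyCPU:
    def __init__(self, pc, lr, stop_addr):
        self.pc = pc                # program counter
        self.lr = lr                # link register (return address)
        self.stop_addr = stop_addr  # execution yields when pc reaches this
        self.acc = 0                # single accumulator for the toy ISA


def step(cpu, program):
    """Execute one toy instruction, or raise Yield at the stop address."""
    if cpu.pc == cpu.stop_addr:
        raise Yield()
    op = program[cpu.pc]
    if op[0] == "add":
        cpu.acc += op[1]
        cpu.pc += 4
    elif op[0] == "ret":
        cpu.pc = cpu.lr
    else:
        raise ValueError("unknown toy instruction: %r" % (op,))


def cpu_loop(cpu, program):
    """Run until the stop address is reached, then hand back the result."""
    try:
        while True:
            step(cpu, program)
    except Yield:
        return cpu.acc


def call_guest(program, entry):
    """Call a guest function and return to the host when it returns."""
    cpu = ToyCPU(pc=entry, lr=STOP_ADDR, stop_addr=STOP_ADDR)
    return cpu_loop(cpu, program)
```

With a program such as `{0x1000: ("add", 2), 0x1004: ("add", 3), 0x1008: ("ret",)}`, `call_guest(program, 0x1000)` runs the two additions, follows the return to `STOP_ADDR`, and leaves the loop there instead of trying to fetch an instruction.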