On Fri, Mar 11, 2022 at 01:04:33PM +0100, Paolo Bonzini wrote:
> On 3/11/22 10:27, Stefan Hajnoczi wrote:
> > > Not quite voluntarily, but I noticed I had to add one 0 to make them
> > > run for a decent amount of time.  So yeah, it's much faster than
> > > siglongjmp.
> > That's a nice first indication that performance will be good. I guess
> > that deep coroutine_fn stacks could be less efficient with stackless
> > coroutines compared to ucontext, but the cost of switching between
> > coroutines (enter/yield) will be lower with stackless coroutines.
>
> Note that right now I'm not placing the coroutine_fn stack on the heap, it's
> still allocated from a contiguous area in virtual address space.  The
> contiguous allocation is wrapped by coroutine_stack_alloc and
> coroutine_stack_free, so it's really easy to change them to malloc and free.
>
> I also do not have to walk up the whole call stack on coroutine_fn yields,
> because calls from one coroutine_fn to the next are tail calls; in exchange
> for that, I have more indirect calls than if the code did
>
>     if (next_call() == COROUTINE_YIELD) {
>         return COROUTINE_YIELD;
>     }
>
> For now the choice was again just the one that made the translation easiest.
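
For readers following along, here is a minimal sketch of the
return-value propagation style quoted above, i.e. the alternative to
tail calls, where every caller checks for COROUTINE_YIELD and unwinds.
It is not taken from the prototype under discussion: the frame structs,
the CoroutineAction enum, and the functions inner_fn, outer_fn and
run_outer are hypothetical names made up for illustration. Only the
COROUTINE_YIELD check itself comes from the quoted mail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type; only COROUTINE_YIELD appears in the thread. */
typedef enum { COROUTINE_DONE, COROUTINE_YIELD } CoroutineAction;

/* Each coroutine_fn keeps its locals and resume point in a frame
 * instead of on the C stack (assumed layout, for illustration only). */
typedef struct {
    int state;   /* where to resume inside inner_fn */
    int total;   /* a "local variable" that survives yields */
} InnerFrame;

typedef struct {
    int state;        /* where to resume inside outer_fn */
    InnerFrame inner; /* callee frame embedded in the caller frame */
    int result;
} OuterFrame;

/* Callee: yields twice, then completes with total == 30. */
static CoroutineAction inner_fn(InnerFrame *f)
{
    switch (f->state) {
    case 0:
        f->total = 10;
        f->state = 1;
        return COROUTINE_YIELD;
    case 1:
        f->total += 20;
        f->state = 2;
        return COROUTINE_YIELD;
    default:
        return COROUTINE_DONE;
    }
}

/* Caller: propagates the callee's yield upward, as in the quoted snippet.
 * On re-entry it calls back down into the callee, so the whole call
 * chain is walked on every yield and every resume. */
static CoroutineAction outer_fn(OuterFrame *f)
{
    switch (f->state) {
    case 0:
        f->inner.state = 0;
        f->state = 1;
        /* fall through */
    case 1:
        if (inner_fn(&f->inner) == COROUTINE_YIELD) {
            return COROUTINE_YIELD;
        }
        f->result = f->inner.total;
        f->state = 2;
        /* fall through */
    default:
        return COROUTINE_DONE;
    }
}

/* Driver: enter the coroutine until it finishes; returns the enter count. */
static int run_outer(OuterFrame *f)
{
    int enters = 0;

    assert(f != NULL);
    f->state = 0;
    do {
        enters++;
    } while (outer_fn(f) == COROUTINE_YIELD);
    return enters;
}
```

Calling run_outer() on an OuterFrame enters the coroutine three times
(two yields plus the final completion) and leaves 30 in the result
field. The cost being traded off is visible in outer_fn: each yield
returns through every caller, whereas the tail-call approach described
above avoids that walk at the price of more indirect calls.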
>
> Today I also managed to implement a QEMU-like API on top of C++ coroutines:
>
>     CoroutineFn<int> return_int() {
>         co_await qemu_coroutine_yield();
>         co_return 30;
>     }
>
>     CoroutineFn<void> return_void() {
>         co_await qemu_coroutine_yield();
>     }
>
>     CoroutineFn<void> co(void *) {
>         co_await return_void();
>         printf("%d\n", co_await return_int());
>         co_await qemu_coroutine_yield();
>     }
>
>     int main() {
>         Coroutine *f = qemu_coroutine_create(co, NULL);
>         printf("--- 0\n");
>         qemu_coroutine_enter(f);
>         printf("--- 1\n");
>         qemu_coroutine_enter(f);
>         printf("--- 2\n");
>         qemu_coroutine_enter(f);
>         printf("--- 3\n");
>         qemu_coroutine_enter(f);
>         printf("--- 4\n");
>     }
>
> The runtime code is absurdly obscure; my favorite bit is
>
>     Yield qemu_coroutine_yield()
>     {
>         return Yield();
>     }
>
> :)  However, at 200 lines of code it's certainly smaller than a
> source-to-source translator.  It might be worth investigating a bit more.
> Only files that define or use a coroutine_fn (which includes callers of
> qemu_coroutine_create) would have to be compiled as C++.

Unless I'm misunderstanding what you mean, "define a coroutine_fn"
covers a very large number of functions/files:

  $ git grep coroutine_fn | wc -l
  806
  $ git grep -l coroutine_fn | wc -l
  132

Dominated by the block layer of course, but with tentacles spreading
out into a lot of other code.

Feels like identifying all callers would be tedious/unpleasant enough
that, practically speaking, we would have to just compile all of QEMU
as C++.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|