On Nov 22 10:57, Aaron Lindsay wrote:
> On Nov 21 18:22, Richard Henderson wrote:
> > On 11/21/22 13:51, Alex Bennée wrote:
> > > 
> > > Aaron Lindsay <aa...@os.amperecomputing.com> writes:
> > > 
> > > > On Nov 15 22:36, Alex Bennée wrote:
> > > > > Aaron Lindsay <aa...@os.amperecomputing.com> writes:
> > > > > > I believe the code *should* always reset `cpu->plugin_mem_cbs` to
> > > > > > NULL at the end of an instruction/TB's execution, so it's not
> > > > > > exactly clear to me how this is occurring. However, I suspect it
> > > > > > may be relevant that we are calling `free_dyn_cb_arr()` because
> > > > > > my plugin called `qemu_plugin_reset()`.
> > > > > 
> > > > > Hmm, I'm going to have to remind myself about how this bit works.
> > > > 
> > > > When is it expected that cpu->plugin_mem_cbs is reset to NULL if it
> > > > is set for an instruction? Is it guaranteed it is reset by the end
> > > > of the tb?
> > > 
> > > It should be by the end of the instruction. See
> > > inject_mem_disable_helper(), which inserts TCG code to disable the
> > > helpers. We also have plugin_gen_disable_mem_helpers(), which should
> > > catch every exit out of a block (exit_tb, goto_tb, goto_ptr). That is
> > > why qemu_plugin_disable_mem_helpers() is only really concerned with
> > > the case where we longjmp out of the loop.
> > > 
> > > > If I were to put an assertion in cpu_tb_exec() just after the call
> > > > to tcg_qemu_tb_exec(), should cpu->plugin_mem_cbs always be NULL
> > > > there?
> > > 
> > > Yes, I think so.
> > 
> > Indeed.
> 
> Well, the good news is that if this is an assumption we're relying on, it
> is now trivial to reproduce the problem!
> 
> Compile some simple program (it doesn't really matter which; the issue
> gets triggered early):
> 
> $ echo "int main() { return 0; }" > simple.c && gcc simple.c -o simple
> 
> Make this change to cpu_tb_exec():
> 
> > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> > index 356fe348de..50a010327d 100644
> > --- a/accel/tcg/cpu-exec.c
> > +++ b/accel/tcg/cpu-exec.c
> > @@ -436,6 +436,9 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int *tb_exit)
> >  
> >      qemu_thread_jit_execute();
> >      ret = tcg_qemu_tb_exec(env, tb_ptr);
> > +    if (cpu->plugin_mem_cbs != NULL) {
> > +        g_assert_not_reached();
> > +    }
> >      cpu->can_do_io = 1;
> >      /*
> >       * TODO: Delay swapping back to the read-write region of the TB
> 
> And run:
> 
> $ ./build/qemu-aarch64 -plugin contrib/plugins/libexeclog.so -d plugin ./simple
> 
> You should fairly quickly see something like:
> 
> > [snip]
> > 0, 0x5502814d04, 0xb4000082, ""
> > 0, 0x5502814d08, 0xf9400440, "", load, 0x5502844ed0
> > 0, 0x5502814d0c, 0xf1001c1f, ""
> > **
> > ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code should not be reached
> > Bail out! ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code should not
> > be reached
> 
> When digging through my other failure in `rr`, I saw the cpu->plugin_mem_cbs
> pointer changing from one non-NULL value to another (which also seems to
> indicate it is not being cleared between instructions).
> 
> Does this hint that there are cases where the code to reset
> cpu->plugin_mem_cbs to NULL is getting optimized away, but not the code
> that sets it in the first place?
Is there anyone who could help take a look at this from the code gen
perspective?

-Aaron