On Nov 22 10:57, Aaron Lindsay wrote:
> On Nov 21 18:22, Richard Henderson wrote:
> > On 11/21/22 13:51, Alex Bennée wrote:
> > > 
> > > Aaron Lindsay <aa...@os.amperecomputing.com> writes:
> > > 
> > > > On Nov 15 22:36, Alex Bennée wrote:
> > > > > Aaron Lindsay <aa...@os.amperecomputing.com> writes:
> > > > > > I believe the code *should* always reset `cpu->plugin_mem_cbs` to
> > > > > > NULL at the end of an instruction/TB's execution, so it's not
> > > > > > exactly clear to me how this is occurring. However, I suspect it
> > > > > > may be relevant that we are calling `free_dyn_cb_arr()` because my
> > > > > > plugin called `qemu_plugin_reset()`.
> > > > > 
> > > > > Hmm I'm going to have to remind myself about how this bit works.
> > > > 
> > > > When is it expected that cpu->plugin_mem_cbs is reset to NULL if it is
> > > > set for an instruction? Is it guaranteed to be reset by the end of the
> > > > TB?
> > > 
> > > It should be by the end of the instruction. See
> > > inject_mem_disable_helper() which inserts TCG code to disable the
> > > helpers. We also have plugin_gen_disable_mem_helpers() which should
> > > catch every exit out of a block (exit_tb, goto_tb, goto_ptr). That is
> > > why qemu_plugin_disable_mem_helpers() is only really concerned about
> > > when we longjmp out of the loop.
> > > 
> > > > If I were to put an assertion in cpu_tb_exec() just after the call
> > > > to tcg_qemu_tb_exec(), should cpu->plugin_mem_cbs always be NULL
> > > > there?
> > > 
> > > Yes I think so.
> > 
> > Indeed.
> 
> Well, the good news is that if this is an assumption we're relying on, it is
> now trivial to reproduce the problem!
> 
> Compile some simple program (doesn't really matter, the issue gets triggered
> early):
> 
> $ echo "int main() { return 0; }" > simple.c && gcc simple.c -o simple
> 
> Make this change to cpu_tb_exec():
> 
> > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> > index 356fe348de..50a010327d 100644
> > --- a/accel/tcg/cpu-exec.c
> > +++ b/accel/tcg/cpu-exec.c
> > @@ -436,6 +436,9 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int *tb_exit)
> > 
> >      qemu_thread_jit_execute();
> >      ret = tcg_qemu_tb_exec(env, tb_ptr);
> > +    if (cpu->plugin_mem_cbs != NULL) {
> > +        g_assert_not_reached();
> > +    }
> >      cpu->can_do_io = 1;
> >      /*
> >       * TODO: Delay swapping back to the read-write region of the TB
> 
> And run:
> 
> $ ./build/qemu-aarch64 -plugin contrib/plugins/libexeclog.so -d plugin ./simple
> 
> You should fairly quickly see something like:
> 
> > [snip]
> > 0, 0x5502814d04, 0xb4000082, ""
> > 0, 0x5502814d08, 0xf9400440, "", load, 0x5502844ed0
> > 0, 0x5502814d0c, 0xf1001c1f, ""
> > **
> > ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code should not be reached
> > Bail out! ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code should not be reached
> 
> When digging through my other failure in `rr` I saw the cpu->plugin_mem_cbs
> pointer changing from one non-null value to another (which also seems to
> indicate it is not being cleared between instructions).
> 
> Does this hint that there are cases where the code to reset
> cpu->plugin_mem_cbs to NULL is getting optimized away, but not the code that
> sets it in the first place?

Is there anyone who could help take a look at this from the code gen
perspective?

-Aaron
