On Wed, 30 Mar 2022 20:23:21 PDT (-0700), alistai...@gmail.com wrote:
On Thu, Mar 31, 2022 at 3:11 AM Idan Horowitz <idan.horow...@gmail.com> wrote:
On Wed, 30 Mar 2022 at 19:11, Palmer Dabbelt <pal...@dabbelt.com> wrote:
>
>
> Presumably you mean "revert" here? That might be the right way to go,
> just to avoid breaking users (even if we fix the kernel bug, it'll take
> a while to get everyone to update). That said, this smells like the
> sort of thing that's going to crop up at arbitrary times in dynamic
> systems so while a revert looks like it'd work around the boot issue we
> might be making more headaches for folks down the road.
>
The opposite, in fact: I did not suggest reverting it, but rather undoing
the revert (as Alistair already removed it from the apply-next tree),
since my original patch fixes buggy behaviour that is blocking the
testing of some embedded software on QEMU.
Ah, sorry -- the QEMU tree I was looking at still had the patch in
there, must have just been an old one.
So, this is a little tricky.
We want to apply the fix, but that will break current users.
Once the fix is merged into Linux we can apply it here. That should
hopefully be right at the start of the 7.1 QEMU development window,
which should give time for the fix to propagate into stable kernels
and not break too many people by the time QEMU is released.
If you think this is a Linux bug then that makes sense, but I think this
is a QEMU bug -- I sent a patch, not sure if it went through as it didn't
make it to lore.
I also think the bug will manifest without the TB exit patch, maybe in
single-step mode and definitely if we happen to exit the TB at that
point for other reasons. Assuming my reasoning is correct in that
patch, we may also be hitting this as arbitrary corruption anywhere.
I'd started to write up a "QEMU errata" Linux patch for this, but then
convinced myself that just adding the sfence.vma was insufficient.