On Sat, 14 Jan 2023, Akihiko Odaki wrote:
On 2023/01/13 22:43, BALATON Zoltan wrote:
On Thu, 5 Jan 2023, BALATON Zoltan wrote:
Hello,
I got reports from several users trying to run AmigaOS4 on sam460ex on
Apple silicon Macs that they get missing graphics that I can't reproduce
on x86_64. With help from the users who get the problem we've narrowed it
down to the following:
It looks like data written to the sm501's RAM in
qemu/hw/display/sm501.c::sm501_2d_operation() is then not seen from
sm501_update_display() in the same file. The sm501_2d_operation() function
is called when the guest accesses the emulated card so it may run in a
different thread than sm501_update_display() which is called by the ui
backend but I'm not sure how QEMU calls these. Is device code running in
iothread and display update in main thread? The problem is also
independent of the display backend and was reproduced with both -display
cocoa and -display sdl.
We have confirmed it's not the pixman routines that sm501_2d_operation()
uses as the same issue is seen also with QEMU 4.x where pixman wasn't used
and with all versions up to 7.2 so it's also not some bisectable change in
QEMU. It also happens with --enable-debug so it doesn't seem to be related
to optimisation either and I don't get it on x86_64 but even x86_64 QEMU
builds run on Apple M1 with Rosetta 2 show the problem. It also only seems
to affect graphics written from sm501_2d_operation() which AmigaOS4 uses
extensively but other OSes don't and just render graphics with the vcpu
which work without problem also on the M1 Macs that show this problem with
AmigaOS4. Theoretically this could be some missing synchronisation, which is
something ARM and PPC may need while x86 doesn't, but I don't know if this
is really the reason and, if so, where and how to fix it. Any idea what may
cause this and what could be a fix to try?
Any idea anyone? At least some explanation if the above is plausible or if
there's an option to disable the iothread and run everything in a single
thread to verify the theory could help. I've got reports from at least 3
people getting this problem but I can't do much to fix it without some
help.
(Info on how to run it is here:
http://zero.eik.bme.hu/~balaton/qemu/amiga/#amigaos
but AmigaOS4 is not freely distributable so it's a bit hard to reproduce.
Some Linux X servers that support sm501/sm502 may also use the card's 2d
engine but I don't know about any live CDs that readily run on sam460ex.)
Thank you,
BALATON Zoltan
Sorry, I missed the email.
Indeed the ui backend should call sm501_update_display() in the main thread,
which should be different from the thread calling sm501_2d_operation().
However, if I understand it correctly, both of the functions should be called
with iothread lock held so there should be no race condition in theory.
But there is an exception: memory_region_snapshot_and_clear_dirty() releases
iothread lock, and that broke raspi3b display device:
https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=wn+k8dqneb_...@mail.gmail.com/T/
It is unexpected that gfx_update() callback releases iothread lock so it may
break things in peculiar ways.
Peter, is there any change in the situation regarding the race introduced by
memory_region_snapshot_and_clear_dirty()?
For now, to work around the issue, I think you can create another mutex and
make the entire sm501_2d_engine_write() and sm501_update_display() critical
sections.
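A minimal sketch of that workaround, written against plain pthreads so it can be compiled on its own. The functions engine_write() and update_display() are hypothetical stand-ins for sm501_2d_engine_write() and sm501_update_display(), and sm501_lock is an assumed name; none of this is the actual sm501.c code. In QEMU itself the lock would presumably be a QemuMutex taken with qemu_mutex_lock()/qemu_mutex_unlock().

```c
/* Sketch only: stand-ins for the sm501 code paths, not actual QEMU code. */
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define VRAM_SIZE 4096

static uint8_t vram[VRAM_SIZE];     /* stand-in for the sm501 local memory */
static uint8_t surface[VRAM_SIZE];  /* stand-in for the display surface */

/* Hypothetical extra mutex shared by both code paths. */
static pthread_mutex_t sm501_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for sm501_2d_engine_write(): the whole body is one critical
 * section, so a fill can never be observed half done. */
static void engine_write(uint8_t colour)
{
    pthread_mutex_lock(&sm501_lock);
    memset(vram, colour, VRAM_SIZE);
    pthread_mutex_unlock(&sm501_lock);
}

/* Stand-in for sm501_update_display(): copies VRAM under the same lock. */
static void update_display(void)
{
    pthread_mutex_lock(&sm501_lock);
    memcpy(surface, vram, VRAM_SIZE);
    pthread_mutex_unlock(&sm501_lock);
}

/* Plays the role of the thread running guest accesses. */
static void *guest_thread(void *arg)
{
    (void)arg;
    for (int i = 1; i <= 100; i++) {
        engine_write((uint8_t)i);
    }
    return NULL;
}

/* Returns 1 if no update ever saw a partially written fill and the last
 * update shows the final fill colour. */
int run_mutex_demo(void)
{
    pthread_t t;
    int torn = 0;
    int rc = pthread_create(&t, NULL, guest_thread, NULL);
    assert(rc == 0);

    for (int i = 0; i < 1000; i++) {
        update_display();
        for (int j = 1; j < VRAM_SIZE; j++) {
            if (surface[j] != surface[0]) {
                torn = 1;
            }
        }
    }

    pthread_join(t, NULL);
    update_display();
    return !torn && surface[0] == 100;
}
```

Both paths take the same lock for their whole duration, so the mutex's lock/unlock pairs also provide the memory ordering between the two threads, independent of the host architecture.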
Interesting thread, but I'm not sure it's the same problem, so this workaround
may not be enough to fix my issue. Here's a video posted by one of the
people who reported it showing the problem on M1 Mac:
https://www.youtube.com/watch?v=FDqoNbp6PQs
and here's what it looks like on other machines:
https://www.youtube.com/watch?v=ML7-F4HNFKQ
There are also videos showing it running on RPi 4 and G5 Mac without this
issue so it seems to only happen on Apple Silicon M1 Macs. What's strange
is that graphics elements are not just delayed, which is what I'd expect
from missing thread synchronisation: the update callback would miss some
pixels rendered while it is running, but subsequent update callbacks would
eventually draw those, wouldn't they? Also, setting full_update to 1 in the
sm501_update_display() callback to disable dirty tracking does not fix the
problem. So it looks as if sm501_2d_operation() running on one CPU core
only writes data to the local cache of that core, which
sm501_update_display() running on another core can't see, so maybe some
cache synchronisation is needed in memory_region_set_dirty() or if that's
already there maybe I should call it for all changes not only those in the
visible display area? I'm still not sure I understand the problem and
don't know what could be a fix for it so anything to test to identify the
issue better might also bring us closer to a solution.
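To make the kind of synchronisation I'm wondering about concrete, here is a small standalone sketch using C11 atomics and pthreads. It is not QEMU code: fb, frame_ready and blitter() are made-up names, and whether anything like this is actually missing in the sm501 path is only my guess. The writer publishes its plain stores with a release store to a flag, and the reader uses an acquire load of that flag before looking at the pixels; I'd assume QEMU's own barrier helpers such as smp_wmb()/smp_rmb() would play the same role there.

```c
/* Sketch only: release/acquire publication of a buffer between threads. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define FB_PIXELS 1024
#define FILL 0xFF00FF00u

static uint32_t fb[FB_PIXELS];   /* plain, non-atomic "framebuffer" */
static atomic_int frame_ready;   /* hypothetical "dirty" flag, starts at 0 */

/* Plays the role of the 2D engine: fill the buffer, then publish it. */
static void *blitter(void *arg)
{
    (void)arg;
    for (int i = 0; i < FB_PIXELS; i++) {
        fb[i] = FILL;
    }
    /* Release: all pixel stores above are ordered before the flag store. */
    atomic_store_explicit(&frame_ready, 1, memory_order_release);
    return NULL;
}

/* Plays the role of the display update: returns 1 if every pixel written
 * before the flag was set is visible after the flag is observed. */
int run_barrier_demo(void)
{
    pthread_t t;
    int ok = 1;
    int rc = pthread_create(&t, NULL, blitter, NULL);
    assert(rc == 0);

    /* Acquire: pairs with the release store in blitter(). */
    while (!atomic_load_explicit(&frame_ready, memory_order_acquire)) {
        /* spin until the writer publishes the frame */
    }

    for (int i = 0; i < FB_PIXELS; i++) {
        if (fb[i] != FILL) {
            ok = 0;
        }
    }

    pthread_join(t, NULL);
    return ok;
}
```

With relaxed ordering on the flag instead, the C11 model would no longer guarantee that the reader sees the pixels, and a weakly ordered host such as ARM is where that kind of difference would be expected to show up, while x86 tends to hide it.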
Regards,
BALATON Zoltan