On 29.06.2023 at 01:45, Andy Smith wrote:
On Wed, Jun 28, 2023 at 09:40:52PM +0200, Paul Leiber wrote:
In the meantime, I have upgraded Dom0 to Debian Bookworm with
linux-image-amd64 6.1.27-1 and Xen 4.17. The issue persists, seemingly
unchanged.
In DomU dmesg, there are several "swiotlb buffer is full" entries while
failing to load the driver during the boot process:
I don't think you'll get a better response here than you will from the
Xen project. You might want to try asking on xen-devel as opposed to
xen-users, though they will expect you to use their upstream source
and maybe build your own kernel as well, while attempting to
diagnose things.
Well, you did give me new information, so asking here was not in vain.
Going to xen-devel is indeed the next step.
You should be able to build an upstream 4.17.x Xen image (the xen.gz
file that goes into /boot) and boot that without altering anything
else on your system, so that may be useful for testing things out.
I don't do PCI passthrough but I did once have a driver that kept
complaining about "swiotlb buffer is full". At that time Xen devs
told me something like the driver was doing DMA differently because
the dom0 had a low amount of RAM and that if I gave the dom0 4GiB or
more RAM this would force the driver to do DMA in the expected way.
This did fix things for me with that driver.
That would be done with:
GRUB_CMDLINE_XEN="dom0_mem=4096M …"
in /etc/default/grub, but I see that you only have 4G memory total
in this hardware so maybe you could try:
GRUB_CMDLINE_XEN="dom0_mem=1024M,max:4096M …"
and see if that helps at all? (as far as I understand, the ,max: bit
tells the dom0 kernel that it may have that much memory at some
point, without actually allocating it)
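
For anyone following along, here is a rough sketch of how the setting above might be checked and applied on a Debian dom0. The helper name dom0_mem_setting is made up for illustration, and the commands in the comments assume a stock Debian GRUB and xl toolstack setup; they are not taken from this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the dom0_mem= option found on the
# GRUB_CMDLINE_XEN line of a GRUB defaults file (default: /etc/default/grub).
dom0_mem_setting() {
    file="${1:-/etc/default/grub}"
    # Keep only the uncommented GRUB_CMDLINE_XEN line, then extract the
    # dom0_mem token up to the next space or closing quote.
    grep -E '^GRUB_CMDLINE_XEN=' "$file" | grep -oE "dom0_mem=[^ \"']+"
}

# On the Xen host, as root (assumed workflow, not executed by this script):
#   update-grub                # regenerate grub.cfg after editing /etc/default/grub
#   reboot
#   xl list Domain-0           # the Mem column shows dom0's current allocation in MiB
#   xl info | grep -E 'total_memory|free_memory'
```

Calling dom0_mem_setting with no argument reads /etc/default/grub; the xl commands after the reboot should show whether the hypervisor actually picked up the new value.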
Tried that, didn't help. Thanks anyway!
If that doesn't help, I am out of ideas, and again recommend
xen-devel.
Just to state the obvious, what's bothering me is that my setup was
working with kernel 5.10.0-21 and that it stopped working with kernel
5.10.178-3. So it seems that somewhere in between, changes have been
introduced that lead to the non-working state. If I am not wrong, then
that's called a regression, although one that seems to affect only a
very limited number of users (who's the other one?). In principle, it
should be possible to identify this change, but I lack the knowledge to
do so.
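
One common way to narrow down such a change is git bisect on the upstream stable kernel tree. The following is only a sketch under assumptions: the function name start_kernel_bisect is invented, the tags v5.10.162 and v5.10.178 are my guess at the upstream releases behind the two Debian packages (check the package changelogs), and Debian carries its own patches, so the upstream tags may not reproduce the problem exactly.

```shell
#!/bin/sh
# Hypothetical helper: start a bisect between a known-good and a known-bad
# kernel revision in an existing clone of the stable tree.
# Example (assumed tags, see the note above):
#   start_kernel_bisect ~/src/linux v5.10.162 v5.10.178
start_kernel_bisect() {
    repo="$1"; good="$2"; bad="$3"
    # git bisect start takes the bad revision first, then the good one.
    git -C "$repo" bisect start "$bad" "$good"
}

# After each step, build and install the kernel that git checked out, boot the
# affected domain with it, look for "swiotlb buffer is full" in dmesg, then
# record the result from inside the clone:
#   git bisect good     # the driver loads, no swiotlb errors
#   git bisect bad      # the failure reproduces
#   git bisect skip     # this revision cannot be built or booted
# When git reports "<sha> is the first bad commit", clean up with:
#   git bisect reset
```

Each step roughly halves the remaining range, so even a few thousand commits between two stable releases come down to about a dozen build-and-boot cycles.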