Bug#1028251: New Patch (Was: Re: Bug#1028251: xen: FTBFS when building xen binary packages for sid on x86_64)
On 1/13/2023 9:08 PM, Chuck Zmudzinski wrote: > On 1/13/23 6:59 PM, Hans van Kranenburg wrote: > > Hi, > > > > On 1/13/23 22:45, Chuck Zmudzinski wrote: > >> On 1/13/23 7:39 AM, Marek Marczykowski-Górecki wrote: > >>> On Fri, Jan 13, 2023 at 12:58:29AM -0500, Chuck Zmudzinski wrote: > >>>> On 1/11/2023 10:58 PM, Chuck Zmudzinski wrote: > >>>>> On 1/9/23 12:55 PM, Hans van Kranenburg wrote: > >>>>>> Hi! > > [...] > > Yolo style cutting out lines here... > > [...] > >>>> > > >> > >> Perhaps this is an opportunity for you to try to fix 922033 again. > >> I see it has been sitting there for a few years now. Let's see > >> what Hans thinks. > > > > Yeah, well, so, the thing here is... > > > > When Debian started to package Xen (thanks! Bastian, in 200X), the > > upstream init scripts were copy pasted, and adjusted to have the ability > > to have different Hypervisor-ABI-incompatible versions installed at the > > same time. Also, this is related to the collection of Makefile patches > > we carry around to have ABI-incompatible stuff end up in a directory > > like /usr/lib/xen-4.14/ and /usr/lib/xen-4.17/ ! > > That is a nice feature of the Xen Debian packages, to have the ability > to manage guests on different versions of the hypervisor. > > > > > What does this mean? Well, in the most basic sense it means that you > > could apt-get (dist-)upgrade and then still be able to xl shutdown a > > domU afterwards before doing reboot, because it will choose the right > > tools which match with the ABI of the *now* running hypervisor instead > > of being left with a dumpster fire, which in the end causes you to shout > > curse words and cause you to have to go to the machine and hold the > > power button for 5 seconds to force power it off. 
> > > > This is the thing about where you upgrade from Xen 4.14 to Xen 4.17 > > during the upgrade from Debian 11/Bullseye to Debian 12/Bookworm, it > > will allow you, if booting the whole new thing is a huge failure, to > > reset the computer, and in grub, choose to use the previous Xen (and > > possibly do that in combination with previous Debian linux kernel) and > > then have a system where you again at least can start your domUs again > > *) and first have a good rest, night of sleep before starting to dig > > into what's going wrong. > > > > So, this is exactly the same way of doing stuff like how you can also > > reboot back into the previous Linux kernel (ABI-compatible) one during a > > system upgrade, even if you're not using Xen at all! > > > > I like this very much. This is the kind of thing that helps admins of > > systems that have just local disks and a few domUs. Like, the case where > > you support some non-profit organization with their server stuff running > > on donated hardware. (Yes, I also do some of those, I do!) And, in case > > something does fail (there could always be something like a misbehaving > > mpt3sas card in the hardware or anything that no one else spotted yet), > > the admin does not have to end up in total panic mode after doing the > > upgrade on a Friday afternoon lying upside down inside a broom closet, > > but they can just at least recover from the situation and have something > > that's running again, and then a day later, or 2 or 3 days or a week > > later return on another planned moment to fix it, after asking around. > > > > Upstream Xen stuff doesn't have anything like that. > > > > But, they actually look at us, and they think, ooh, this is actually > > nice, we should have that also by default. > > > > The fact that we have this changed/altered/divergent init scripts in > > Debian is the main reason that we cannot just enable systemd things > > which will put upstream whatever on the system. 
> > I understand the problem here. > > > > > So, what could we do about this? > > > > The project plan (that could be drafted on an A4 paper) could look like, > > gather around all distro maintainers of Linux distro's that are shipping > > Xen, and then search for a 'Project owner', which we totally need to be > > someone that is actually employed at a company that actually cares about > > getting the results of this. "Totally need to be someone that is actually employed at a company." I am curious about that statement. Has Debian given up on the idea that members of the FLOSS community can band together and solve a problem like this without corpo
Bug#1028251: New Patch (Was: Re: Bug#1028251: xen: FTBFS when building xen binary packages for sid on x86_64)
On 1/13/23 6:59 PM, Hans van Kranenburg wrote:
> Hi,
>
> On 1/13/23 22:45, Chuck Zmudzinski wrote:
>> On 1/13/23 7:39 AM, Marek Marczykowski-Górecki wrote:
>>> On Fri, Jan 13, 2023 at 12:58:29AM -0500, Chuck Zmudzinski wrote:
>>>> On 1/11/2023 10:58 PM, Chuck Zmudzinski wrote:
>>>>> On 1/9/23 12:55 PM, Hans van Kranenburg wrote:
>>>>>> Hi!
> [...]
> Yolo style cutting out lines here...
> [...]
>>>>
>>>> Regarding the systemd files causing ftbfs, this explains it:
>>>>
>>>> https://salsa.debian.org/xen-team/debian-xen/-/blob/master/m4/systemd.m4#L119
>>>>
>>>> and this:
>>>>
>>>> https://salsa.debian.org/xen-team/debian-xen/-/blob/master/tools/configure.ac#L480
>>>>
>>>> The comments indicate that using AX_AVAILABLE_SYSTEMD() will
>>>> by default enable systemd if systemd development files are on the
>>>> build system, and AX_ALLOW_SYSTEMD() means --enable-systemd
>>>> must explicitly be passed to tools/configure to enable it. Upstream
>>>> uses the former, so build systems with systemd development files
>>>> by default will ftbfs because that produces missing files that dh_missing
>>>> in debian/rules does not like.
>>>>
>>>> So the reason there is ftbfs on my system is that my system has
>>>> the systemd development package installed.
>>>
>>> By the way, maybe a better fix would be to pass --enable-systemd, add
>>> libsystemd-dev
>>> build-dep and list them in the package? They might require patching to
>>> support Debian-specific upgrade machinery, though...
>>>
>>> Not installing xendriverdomain.service is one of things missing for
>>> driver domains support
>>> (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=922033).
>>>
>>
>> Hi Marek,
>>
>> I wouldn't be against fixing it that way. In fact, I would prefer
>> that Debian packaged Xen with full support for native systemd units.
>> I am willing to wait until if/when the package maintainers have
>> full systemd support in the Xen packages.
>>
>> Perhaps this is an opportunity for you to try to fix 922033 again.
>> I see it has been sitting there for a few years now. Let's see
>> what Hans thinks.
>
> Yeah, well, so, the thing here is...
>
> When Debian started to package Xen (thanks! Bastian, in 200X), the
> upstream init scripts were copy pasted, and adjusted to have the ability
> to have different Hypervisor-ABI-incompatible versions installed at the
> same time. Also, this is related to the collection of Makefile patches
> we carry around to have ABI-incompatible stuff end up in a directory
> like /usr/lib/xen-4.14/ and /usr/lib/xen-4.17/ !

That is a nice feature of the Xen Debian packages, to have the ability to manage guests on different versions of the hypervisor.

>
> What does this mean? Well, in the most basic sense it means that you
> could apt-get (dist-)upgrade and then still be able to xl shutdown a
> domU afterwards before doing reboot, because it will choose the right
> tools which match with the ABI of the *now* running hypervisor instead
> of being left with a dumpster fire, which in the end causes you to shout
> curse words and cause you to have to go to the machine and hold the
> power button for 5 seconds to force power it off.
>
> This is the thing about where you upgrade from Xen 4.14 to Xen 4.17
> during the upgrade from Debian 11/Bullseye to Debian 12/Bookworm, it
> will allow you, if booting the whole new thing is a huge failure, to
> reset the computer, and in grub, choose to use the previous Xen (and
> possibly do that in combination with previous Debian linux kernel) and
> then have a system where you again at least can start your domUs again
> *) and first have a good rest, night of sleep before starting to dig
> into what's going wrong.
>
> So, this is exactly the same way of doing stuff like how you can also
> reboot back into the previous Linux kernel (ABI-compatible) one during a
> system upgrade, even if you're not using Xen at all!
>
> I like this very much. This is the kind of thing that helps admins of
> systems that have just local disks and a few domUs. Like, the case where
> you support some non-profit organization with their server stuff running
> on donated hardware. (Yes, I also do some of those, I do!) And, in case
> something does fail (there could always be something like a misbehaving
> mpt3sas card in the hardware or anything that no one else spotted yet),
> the admin does not have to end up in total panic
Bug#1028251: [Pkg-xen-devel] Bug#1028251: New Patch (Was: Re: Bug#1028251: xen: FTBFS when building xen binary packages for sid on x86_64)
On 1/13/23 7:39 AM, Marek Marczykowski-Górecki wrote: > On Fri, Jan 13, 2023 at 12:58:29AM -0500, Chuck Zmudzinski wrote: >> On 1/11/2023 10:58 PM, Chuck Zmudzinski wrote: >> > On 1/9/23 12:55 PM, Hans van Kranenburg wrote: >> > > Hi! >> > > >> > > On 09/01/2023 18:44, Chuck Zmudzinski wrote: >> > >> Control: tag -1 + moreinfo >> > >> >> > >> thanks >> > >> >> > >> On 1/9/23 8:09 AM, Hans van Kranenburg wrote: >> > >>> Hi Chuck, >> > >>> >> > >>> On 1/8/23 23:18, Chuck Zmudzinski wrote: >> > >>>> [...] >> > >>>> >> > >>>> The build failed: >> > >>>> >> > >>>>debian/rules override_dh_missing >> > >>>> make[1]: Entering directory '/home/chuckz/sources-sid/xen/xen-4.17.0' >> > >>>> dh_missing --list-missing >> > >>>> dh_missing: warning: usr/lib/modules-load.d/xen.conf exists in >> > >>>> debian/tmp but is not installed to anywhere >> > >>>> dh_missing: warning: usr/lib/systemd/system/proc-xen.mount exists in >> > >>>> debian/tmp but is not installed to anywhere >> > >>>> dh_missing: warning: usr/lib/systemd/system/xen-init-dom0.service >> > >>>> exists in debian/tmp but is not installed to anywhere >> > >>>> dh_missing: warning: >> > >>>> usr/lib/systemd/system/xen-qemu-dom0-disk-backend.service exists in >> > >>>> debian/tmp but is not installed to anywhere >> > >>>> dh_missing: warning: usr/lib/systemd/system/xen-watchdog.service >> > >>>> exists in debian/tmp but is not installed to anywhere >> > >>>> dh_missing: warning: usr/lib/systemd/system/xenconsoled.service >> > >>>> exists in debian/tmp but is not installed to anywhere >> > >>>> dh_missing: warning: usr/lib/systemd/system/xendomains.service exists >> > >>>> in debian/tmp but is not installed to anywhere >> > >>>> dh_missing: warning: usr/lib/systemd/system/xendriverdomain.service >> > >>>> exists in debian/tmp but is not installed to anywhere >> > >>>> dh_missing: warning: usr/lib/systemd/system/xenstored.service exists >> > >>>> in debian/tmp but is not installed to anywhere >> > >>> >> > >>> I cannot 
reproduce this error here locally and the CI build also >> > >>> succeeds: >> > >>> >> > >>> https://salsa.debian.org/xen-team/debian-xen/-/pipelines/481577 >> > >> >> > >> I thought I had a fairly clean sid install, but I think the problem >> > >> on my system could be caused by some obscure grandfathered in >> > >> setting because the sid I am using was updated from all the way back to >> > >> an original install of jessie many years ago... >> > >> >> > >> It might be time for me to refresh my sid with a clean installation. >> > >> >> > >> Out of curiosity and if you have time, can you answer a couple of >> > >> question if you know the answer? >> > >> >> > >> 1. Do the builds on a clean environment produce the missing files >> > >> listed in my build? >> > > >> > > No, after my local package build, there's no such things in there: >> > > >> > > ~/build/xen/debian-xen/debian/tmp/usr/lib m (master) 1-$ ll >> > > total 0 >> > > drwxr-xr-x 1 knorrie knorrie 110 Jan 8 23:51 debug >> > > drwxr-xr-x 1 knorrie knorrie 2048 Jan 8 23:50 x86_64-linux-gnu >> > > drwxr-xr-x 1 knorrie knorrie 20 Jan 8 23:51 xen-4.17 >> > > >> > >> >> > >> 2. Are those systemd service files installed anywhere in the xen >> > >> binary packages, either in arch=x86_64 packages or for the arch=all >> > >> packages such as xen-utils-common? >> > > >> > > No, they are not: >> > > >> > > https://packages.debian.org/search?searchon=contents&keywords=xenconsoled.service&mode=path&suite=unstable&arch=any >> > > >> > >> If you don't know the answer to these questions I will investigate >> > >> myself to find the answers, so you can work on more important things. >
Bug#1028251: New Patch (Was: Re: [Pkg-xen-devel] Bug#1028251: xen: FTBFS when building xen binary packages for sid on x86_64)
On 1/11/2023 10:58 PM, Chuck Zmudzinski wrote: > On 1/9/23 12:55 PM, Hans van Kranenburg wrote: > > Hi! > > > > On 09/01/2023 18:44, Chuck Zmudzinski wrote: > >> Control: tag -1 + moreinfo > >> > >> thanks > >> > >> On 1/9/23 8:09 AM, Hans van Kranenburg wrote: > >>> Hi Chuck, > >>> > >>> On 1/8/23 23:18, Chuck Zmudzinski wrote: > >>>> [...] > >>>> > >>>> The build failed: > >>>> > >>>>debian/rules override_dh_missing > >>>> make[1]: Entering directory '/home/chuckz/sources-sid/xen/xen-4.17.0' > >>>> dh_missing --list-missing > >>>> dh_missing: warning: usr/lib/modules-load.d/xen.conf exists in > >>>> debian/tmp but is not installed to anywhere > >>>> dh_missing: warning: usr/lib/systemd/system/proc-xen.mount exists in > >>>> debian/tmp but is not installed to anywhere > >>>> dh_missing: warning: usr/lib/systemd/system/xen-init-dom0.service exists > >>>> in debian/tmp but is not installed to anywhere > >>>> dh_missing: warning: > >>>> usr/lib/systemd/system/xen-qemu-dom0-disk-backend.service exists in > >>>> debian/tmp but is not installed to anywhere > >>>> dh_missing: warning: usr/lib/systemd/system/xen-watchdog.service exists > >>>> in debian/tmp but is not installed to anywhere > >>>> dh_missing: warning: usr/lib/systemd/system/xenconsoled.service exists > >>>> in debian/tmp but is not installed to anywhere > >>>> dh_missing: warning: usr/lib/systemd/system/xendomains.service exists in > >>>> debian/tmp but is not installed to anywhere > >>>> dh_missing: warning: usr/lib/systemd/system/xendriverdomain.service > >>>> exists in debian/tmp but is not installed to anywhere > >>>> dh_missing: warning: usr/lib/systemd/system/xenstored.service exists in > >>>> debian/tmp but is not installed to anywhere > >>> > >>> I cannot reproduce this error here locally and the CI build also succeeds: > >>> > >>> https://salsa.debian.org/xen-team/debian-xen/-/pipelines/481577 > >> > >> I thought I had a fairly clean sid install, but I think the problem > >> on my system could be 
caused by some obscure grandfathered in > >> setting because the sid I am using was updated from all the way back to > >> an original install of jessie many years ago... > >> > >> It might be time for me to refresh my sid with a clean installation. > >> > >> Out of curiosity and if you have time, can you answer a couple of > >> question if you know the answer? > >> > >> 1. Do the builds on a clean environment produce the missing files > >> listed in my build? > > > > No, after my local package build, there's no such things in there: > > > > ~/build/xen/debian-xen/debian/tmp/usr/lib m (master) 1-$ ll > > total 0 > > drwxr-xr-x 1 knorrie knorrie 110 Jan 8 23:51 debug > > drwxr-xr-x 1 knorrie knorrie 2048 Jan 8 23:50 x86_64-linux-gnu > > drwxr-xr-x 1 knorrie knorrie 20 Jan 8 23:51 xen-4.17 > > > >> > >> 2. Are those systemd service files installed anywhere in the xen > >> binary packages, either in arch=x86_64 packages or for the arch=all > >> packages such as xen-utils-common? > > > > No, they are not: > > > > https://packages.debian.org/search?searchon=contents&keywords=xenconsoled.service&mode=path&suite=unstable&arch=any > > > >> If you don't know the answer to these questions I will investigate > >> myself to find the answers, so you can work on more important things. > >> > >>> > >>> How are you building the packages? In a clean build environment, using > >>> for example sbuild or pbuilder, or in an environment where unrelated > >>> other build dependencies could be present, that are not included in the > >>> xen list, but maybe 'wake up and do something' if they're present? > >> > >> As I said, I am building on a sid install that might have some > >> stuff grandfathered in from old releases going back to jessie. > >> I also might have some stale stuff around from my private builds > >> of the traditional device model available from xen that is not > >> part of the Debian packages. I will investigate these possible causes. > >> > >> I use
Bug#1028557: general: The Debian Social Contract (DSC) is meaningless
Package: general
Severity: normal

Dear Maintainer,

It is a bug that Debian considers the DSC so important, yet, the concept of a contract is totally meaningless outside of the context of a legal system where the obligations and rights that arise from the terms of the contract can be enforced.

Proposed Fix:

Replace the DSC with a disclaimer that admits the document formerly known as the Debian Social Contract is totally meaningless because it cannot be legally enforced and refers the reader to the actual software licenses of the software in the distribution that *can* be enforced in a legal system.

Thanks
Bug#1028251: [Pkg-xen-devel] Bug#1028251: xen: FTBFS when building xen binary packages for sid on x86_64
On 1/9/23 12:55 PM, Hans van Kranenburg wrote: > Hi! > > On 09/01/2023 18:44, Chuck Zmudzinski wrote: >> Control: tag -1 + moreinfo >> >> thanks >> >> On 1/9/23 8:09 AM, Hans van Kranenburg wrote: >>> Hi Chuck, >>> >>> On 1/8/23 23:18, Chuck Zmudzinski wrote: >>>> [...] >>>> >>>> The build failed: >>>> >>>>debian/rules override_dh_missing >>>> make[1]: Entering directory '/home/chuckz/sources-sid/xen/xen-4.17.0' >>>> dh_missing --list-missing >>>> dh_missing: warning: usr/lib/modules-load.d/xen.conf exists in debian/tmp >>>> but is not installed to anywhere >>>> dh_missing: warning: usr/lib/systemd/system/proc-xen.mount exists in >>>> debian/tmp but is not installed to anywhere >>>> dh_missing: warning: usr/lib/systemd/system/xen-init-dom0.service exists >>>> in debian/tmp but is not installed to anywhere >>>> dh_missing: warning: >>>> usr/lib/systemd/system/xen-qemu-dom0-disk-backend.service exists in >>>> debian/tmp but is not installed to anywhere >>>> dh_missing: warning: usr/lib/systemd/system/xen-watchdog.service exists in >>>> debian/tmp but is not installed to anywhere >>>> dh_missing: warning: usr/lib/systemd/system/xenconsoled.service exists in >>>> debian/tmp but is not installed to anywhere >>>> dh_missing: warning: usr/lib/systemd/system/xendomains.service exists in >>>> debian/tmp but is not installed to anywhere >>>> dh_missing: warning: usr/lib/systemd/system/xendriverdomain.service exists >>>> in debian/tmp but is not installed to anywhere >>>> dh_missing: warning: usr/lib/systemd/system/xenstored.service exists in >>>> debian/tmp but is not installed to anywhere >>> >>> I cannot reproduce this error here locally and the CI build also succeeds: >>> >>> https://salsa.debian.org/xen-team/debian-xen/-/pipelines/481577 >> >> I thought I had a fairly clean sid install, but I think the problem >> on my system could be caused by some obscure grandfathered in >> setting because the sid I am using was updated from all the way back to >> an original install 
of jessie many years ago... >> >> It might be time for me to refresh my sid with a clean installation. >> >> Out of curiosity and if you have time, can you answer a couple of >> question if you know the answer? >> >> 1. Do the builds on a clean environment produce the missing files >> listed in my build? > > No, after my local package build, there's no such things in there: > > ~/build/xen/debian-xen/debian/tmp/usr/lib m (master) 1-$ ll > total 0 > drwxr-xr-x 1 knorrie knorrie 110 Jan 8 23:51 debug > drwxr-xr-x 1 knorrie knorrie 2048 Jan 8 23:50 x86_64-linux-gnu > drwxr-xr-x 1 knorrie knorrie 20 Jan 8 23:51 xen-4.17 > >> >> 2. Are those systemd service files installed anywhere in the xen >> binary packages, either in arch=x86_64 packages or for the arch=all >> packages such as xen-utils-common? > > No, they are not: > > https://packages.debian.org/search?searchon=contents&keywords=xenconsoled.service&mode=path&suite=unstable&arch=any > >> If you don't know the answer to these questions I will investigate >> myself to find the answers, so you can work on more important things. >> >>> >>> How are you building the packages? In a clean build environment, using >>> for example sbuild or pbuilder, or in an environment where unrelated >>> other build dependencies could be present, that are not included in the >>> xen list, but maybe 'wake up and do something' if they're present? >> >> As I said, I am building on a sid install that might have some >> stuff grandfathered in from old releases going back to jessie. >> I also might have some stale stuff around from my private builds >> of the traditional device model available from xen that is not >> part of the Debian packages. I will investigate these possible causes. >> >> I use debuild as a frontend to dpkg-buildpackage to build the packages. > > Yes. 
So (I'm not entirely sure how it works, but as example, just making > something up here): After doing something else first, you might end up > with a system that has for example dh-systemd-yolo-all-the-things-helper > installed. And, it might be that only it being present means that the > package build process changes. It might even be a 'feature' of that > helper... "jus
Bug#1028251: [Pkg-xen-devel] Bug#1028251: xen: FTBFS when building xen binary packages for sid on x86_64
On 1/9/23 12:55 PM, Hans van Kranenburg wrote:
> Hi!
>
> On 09/01/2023 18:44, Chuck Zmudzinski wrote:
> ...
> This is why it is very much recommended to build the packages using
> something like sbuild, so that you can be sure that every time it will
> start with a super minimal chroot which only has some essential things,
> and that the only build dependencies used will be the ones that are
> explicitly defined in the debian/control of the package.

Thanks for the advice - it is now on my TODO list to learn to use sbuild or some other tool that makes it easy to do builds in a minimal chroot.

Kind regards,

Chuck
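Editorial sketch, not part of the original message: the advice above amounts to "only the build dependencies declared in debian/control should influence the build". As a rough illustration of how one might look for stray packages on a long-lived system, the Python below extracts package names from a Build-Depends value and lists installed -dev packages that are not declared. The function names and the sample field are hypothetical, the field parsing is simplified, and packages pulled in indirectly (for example through build-essential) would show up as false positives, so the result is only a hint. It is no substitute for a clean chroot.

```python
import re
from typing import Iterable, List, Set


def declared_build_deps(build_depends: str) -> Set[str]:
    """Extract bare package names from a Build-Depends field value."""
    names = set()
    for clause in build_depends.split(","):
        for alternative in clause.split("|"):
            # Drop version constraints "(>= 1.0)", architecture lists
            # "[amd64]" and build profiles "<!nocheck>".
            bare = re.sub(r"\(.*?\)|\[.*?\]|<.*?>", "", alternative).strip()
            # Drop a multiarch qualifier such as ":any" or ":native".
            bare = bare.split(":")[0]
            if bare:
                names.add(bare)
    return names


def undeclared_dev_packages(build_depends: str, installed: Iterable[str]) -> List[str]:
    """List installed -dev packages that the Build-Depends field does not name."""
    declared = declared_build_deps(build_depends)
    return sorted(p for p in installed if p.endswith("-dev") and p not in declared)


if __name__ == "__main__":
    # Hypothetical field value and package list, for illustration only.
    field = "debhelper-compat (= 13), libncurses-dev, libfoo-dev [amd64] <!nocheck>"
    print(undeclared_dev_packages(field, ["libsystemd-dev", "libncurses-dev", "bash"]))
```

On a real system the installed list could come from the package manager's own listing; that step is left out here to keep the sketch free of external calls.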
Bug#1028251: [Pkg-xen-devel] Bug#1028251: xen: FTBFS when building xen binary packages for sid on x86_64
Control: tag -1 + moreinfo
thanks

On 1/9/23 8:09 AM, Hans van Kranenburg wrote:
> Hi Chuck,
>
> On 1/8/23 23:18, Chuck Zmudzinski wrote:
>> [...]
>>
>> The build failed:
>>
>>debian/rules override_dh_missing
>> make[1]: Entering directory '/home/chuckz/sources-sid/xen/xen-4.17.0'
>> dh_missing --list-missing
>> dh_missing: warning: usr/lib/modules-load.d/xen.conf exists in debian/tmp
>> but is not installed to anywhere
>> dh_missing: warning: usr/lib/systemd/system/proc-xen.mount exists in
>> debian/tmp but is not installed to anywhere
>> dh_missing: warning: usr/lib/systemd/system/xen-init-dom0.service exists in
>> debian/tmp but is not installed to anywhere
>> dh_missing: warning:
>> usr/lib/systemd/system/xen-qemu-dom0-disk-backend.service exists in
>> debian/tmp but is not installed to anywhere
>> dh_missing: warning: usr/lib/systemd/system/xen-watchdog.service exists in
>> debian/tmp but is not installed to anywhere
>> dh_missing: warning: usr/lib/systemd/system/xenconsoled.service exists in
>> debian/tmp but is not installed to anywhere
>> dh_missing: warning: usr/lib/systemd/system/xendomains.service exists in
>> debian/tmp but is not installed to anywhere
>> dh_missing: warning: usr/lib/systemd/system/xendriverdomain.service exists
>> in debian/tmp but is not installed to anywhere
>> dh_missing: warning: usr/lib/systemd/system/xenstored.service exists in
>> debian/tmp but is not installed to anywhere
>
> I cannot reproduce this error here locally and the CI build also succeeds:
>
> https://salsa.debian.org/xen-team/debian-xen/-/pipelines/481577

I thought I had a fairly clean sid install, but I think the problem on my system could be caused by some obscure grandfathered in setting because the sid I am using was updated from all the way back to an original install of jessie many years ago...

It might be time for me to refresh my sid with a clean installation.

Out of curiosity and if you have time, can you answer a couple of questions if you know the answer?

1. Do the builds on a clean environment produce the missing files listed in my build?

2. Are those systemd service files installed anywhere in the xen binary packages, either in arch=x86_64 packages or for the arch=all packages such as xen-utils-common?

If you don't know the answer to these questions I will investigate myself to find the answers, so you can work on more important things.

>
> How are you building the packages? In a clean build environment, using
> for example sbuild or pbuilder, or in an environment where unrelated
> other build dependencies could be present, that are not included in the
> xen list, but maybe 'wake up and do something' if they're present?

As I said, I am building on a sid install that might have some stuff grandfathered in from old releases going back to jessie. I also might have some stale stuff around from my private builds of the traditional device model available from xen that is not part of the Debian packages. I will investigate these possible causes.

I use debuild as a frontend to dpkg-buildpackage to build the packages.

>
> You can also compare your own build output with the full one from the CI
> job:
>
> https://salsa.debian.org/xen-team/debian-xen/-/jobs/3767564/raw

I will take a look at that when I get a chance. This is not a real high priority for me, so I am content to let this be until I get a chance to investigate the quirks of my current installation of sid, and I also added the moreinfo tag, so you can ignore this bug if you wish until I do some further research.

Cheers,

Chuck
Bug#1028251: Updated Patch (Was: xen: FTBFS when building xen binary packages for sid on x86_64)
Sorry, the patch I posted in the original message will not apply properly. I forgot I also edited the comment:

Here is the correct patch:

--- rules 2022-12-21 16:34:51.0 -0500
+++ rules.new 2023-01-08 05:31:24.0 -0500
@@ -327,9 +327,9 @@
 | xargs -0r gzip -9vn

 # By default, files in debian/tmp which are not handled by anything
-# in rules are ignored. This makes them into errors.
+# in rules are ignored. This lists them.
 override_dh_missing:
-	dh_missing --fail-missing
+	dh_missing --list-missing

 # We are dropping the config file /etc/default/xen which appeared in
--snip--

Thanks for all your work. Apart from this little problem, it appears Xen 4.17 will work well on Bookworm.

Kind regards,

Chuck
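Editorial sketch, not part of the original message: the first patch failed to apply because its context lines no longer matched the pristine file. One way to avoid that is to generate the patch mechanically from the untouched and the edited text. The Python below does this with the standard difflib module. The helper names and the two-line sample rules fragment are made up for illustration, and the sketch only swaps the dh_missing option; it does not update the comment that the corrected patch above also changes.

```python
import difflib


def relax_dh_missing(rules_text: str) -> str:
    """Return rules_text with dh_missing switched from failing to listing."""
    return rules_text.replace(
        "dh_missing --fail-missing", "dh_missing --list-missing"
    )


def make_patch(old: str, new: str, path: str = "debian/rules") -> str:
    """Build a unified diff between two versions of the same file."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="a/" + path,
        tofile="b/" + path,
    )
    return "".join(diff)


if __name__ == "__main__":
    # Hypothetical two-line excerpt standing in for the real debian/rules.
    pristine = "override_dh_missing:\n\tdh_missing --fail-missing\n"
    print(make_patch(pristine, relax_dh_missing(pristine)), end="")
```

Because both sides of the diff come from the same pristine input, the context lines in the result match the file the patch is meant to be applied to.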
Bug#1028251: xen: FTBFS when building xen binary packages for sid on x86_64
Source: xen
Version: 4.17.0-1
Severity: normal
Tags: ftbfs patch

Dear Maintainer,

Hi,

I needed to test a patch to libxl so I started by trying to build xen from source on an up-to-date sid installation.

The build failed:

debian/rules override_dh_missing
make[1]: Entering directory '/home/chuckz/sources-sid/xen/xen-4.17.0'
dh_missing --list-missing
dh_missing: warning: usr/lib/modules-load.d/xen.conf exists in debian/tmp but is not installed to anywhere
dh_missing: warning: usr/lib/systemd/system/proc-xen.mount exists in debian/tmp but is not installed to anywhere
dh_missing: warning: usr/lib/systemd/system/xen-init-dom0.service exists in debian/tmp but is not installed to anywhere
dh_missing: warning: usr/lib/systemd/system/xen-qemu-dom0-disk-backend.service exists in debian/tmp but is not installed to anywhere
dh_missing: warning: usr/lib/systemd/system/xen-watchdog.service exists in debian/tmp but is not installed to anywhere
dh_missing: warning: usr/lib/systemd/system/xenconsoled.service exists in debian/tmp but is not installed to anywhere
dh_missing: warning: usr/lib/systemd/system/xendomains.service exists in debian/tmp but is not installed to anywhere
dh_missing: warning: usr/lib/systemd/system/xendriverdomain.service exists in debian/tmp but is not installed to anywhere
dh_missing: warning: usr/lib/systemd/system/xenstored.service exists in debian/tmp but is not installed to anywhere

Please note that this output is after editing the line in debian/rules that is currently

dh_missing --fail-missing

with

dh_missing --list-missing

so the missing files only induce a warning instead of FTBFS.

So the workaround is this patch to debian/rules:

--- a/debian/rules 2023-01-08 16:36:01.605863417 -0500
+++ b/debian/rules 2023-01-08 05:31:24.0 -0500
@@ -329,7 +329,7 @@
 # By default, files in debian/tmp which are not handled by anything
 # in rules are ignored. This lists them.
 override_dh_missing:
-	dh_missing --fail-missing
+	dh_missing --list-missing

 # We are dropping the config file /etc/default/xen which appeared in
---snip-

I presume you know about this and plan to fix it before the next upload, but perhaps a recent systemd update is causing this so I am reporting it here.

I also request that if the missing systemd files cannot be installed properly before the next upload of a new version you apply a workaround such as this patch or another workaround until the missing systemd files are installed and configured correctly.

Kind regards,

Chuck

-- System Information:
Debian Release: bookworm/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.0.0-6-amd64 (SMP w/4 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
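Editorial sketch, not part of the original report: when comparing a local build against a clean one, it can help to reduce the dh_missing output to a plain list of paths. The Python below does that for warnings worded like the ones in this report. The function names are made up for illustration, and the pattern is tied to the exact warning text shown here, which other debhelper versions may phrase differently. Whitespace is collapsed first because mail clients tend to wrap the long warning lines.

```python
import re
from typing import List

# Matches warnings worded as in the report above, for example:
#   dh_missing: warning: usr/lib/... exists in debian/tmp but is not installed to anywhere
WARNING_RE = re.compile(
    r"dh_missing: warning: (?P<path>\S+) exists in debian/tmp "
    r"but is not installed to anywhere"
)


def missing_files(build_log: str) -> List[str]:
    """Return the paths reported by dh_missing, in the order they appear."""
    # Collapse all whitespace so warnings wrapped across lines still match.
    flattened = " ".join(build_log.split())
    return [m.group("path") for m in WARNING_RE.finditer(flattened)]


def systemd_units(paths: List[str]) -> List[str]:
    """Keep only the paths that live in the systemd system unit directory."""
    return [p for p in paths if p.startswith("usr/lib/systemd/system/")]


if __name__ == "__main__":
    sample = (
        "dh_missing: warning: usr/lib/modules-load.d/xen.conf exists in debian/tmp\n"
        "but is not installed to anywhere\n"
        "dh_missing: warning: usr/lib/systemd/system/xenstored.service exists in\n"
        "debian/tmp but is not installed to anywhere\n"
    )
    for path in systemd_units(missing_files(sample)):
        print(path)
```

Running the same extraction over a log from a clean build environment and comparing the two lists would show which files only appear on the local system.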
Bug#988333: [Pkg-xen-devel] xen/4.14.3-1: VGA Intel IGD Passthrough to Debian Xen HVM DomUs not working: xl -vvv create log
On 10/26/2021 10:06 AM, Chuck Zmudzinski wrote:
On 10/25/2021 4:45 PM, Chuck Zmudzinski wrote:
On 10/23/2021 11:11 AM, Hans van Kranenburg wrote:
Hi!
On 5/10/2021 1:33 PM, Chuck Zmudzinski wrote:
[...] with buster and bullseye running as the Dom0, I can only get the VGA/Passthrough feature to work with Windows Xen HVMs. I would expect both Windows and Linux HVMs to work comparably well.

A possible time-saver that I can recommend is to send a post to the upstream xen-users list [0] about this already. Like "Hi all, I'm starting a HVM Linux domU with Linux 5.10.70 on a Xen 4.14.3 system with also 5.10.70 dom0 kernel, with this and this domU config file. It fails to start, this is the xl -vvv create output, and this error (the irq stuff) appears in the dom0 kernel log.". Try to keep it simple and not too long initially, without the surrounding stories, to increase chance of it being fully read.

I can do this soon - I have some more interesting tests to share here and with the Xen developers upstream. I will need to think a little about how to present this bug to the Xen upstream developers in a short and simple enough way for them to be likely to read it initially.

For now, I will report here some results from the journal log entries of both Bullseye dom0 and Bullseye domU for two different configurations. These logs are not generated with the -vvv option, but they do provide quite a bit of interesting information and are already somewhat overwhelming, even without the -vvv option. So I will hold off for now before making the logs even more verbose with -vvv.
Now I add output of xl create with -vvv option:

chuckz@debian:~$ sudo xl -vvv create bullseye-hvm.cfg
Parsing config from bullseye-hvm.cfg
libxl: debug: libxl_create.c:2017:do_domain_create: ao 0x55c97f27e180: create: how=(nil) callback=(nil) poller=0x55c97f27e220
libxl: detail: libxl_create.c:622:libxl__domain_make: passthrough: sync_pt
libxl: debug: libxl_device.c:379:libxl__device_disk_set_backend: Disk vdev=xvda spec.backend=unknown
libxl: debug: libxl_device.c:413:libxl__device_disk_set_backend: Disk vdev=xvda, using backend phy
libxl: debug: libxl_device.c:379:libxl__device_disk_set_backend: Disk vdev=xvdb spec.backend=unknown
libxl: debug: libxl_device.c:413:libxl__device_disk_set_backend: Disk vdev=xvdb, using backend phy
libxl: debug: libxl_create.c:1279:initiate_domain_create: Domain 2:running bootloader
libxl: debug: libxl_bootloader.c:328:libxl__bootloader_run: Domain 2:not a PV/PVH domain, skipping bootloader
libxl: debug: libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x55c97f284148: deregister unregistered
libxl: detail: libxl_x86.c:338:hvm_set_viridian_features: base group enabled
libxl: detail: libxl_x86.c:338:hvm_set_viridian_features: freq group enabled
libxl: detail: libxl_x86.c:338:hvm_set_viridian_features: time_ref_count group enabled
libxl: detail: libxl_x86.c:338:hvm_set_viridian_features: apic_assist group enabled
libxl: detail: libxl_x86.c:338:hvm_set_viridian_features: crash_ctl group enabled
domainbuilder: detail: xc_dom_allocate: cmdline="", features=""
domainbuilder: detail: xc_dom_kernel_file: filename="/usr/lib/xen-4.14/boot/hvmloader"
domainbuilder: detail: xc_dom_malloc_filemap : 329 kB
libxl: debug: libxl_dom.c:829:libxl__load_hvm_firmware_module: Loading BIOS: /usr/share/seabios/bios-256k.bin
domainbuilder: detail: xc_dom_boot_xen_init: ver 4.14, caps xen-3.0-x86_64 hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
domainbuilder: detail: xc_dom_parse_image: called
domainbuilder: detail: xc_dom_find_loader: trying multiboot-binary loader ...
domainbuilder: detail: loader probe failed
domainbuilder: detail: xc_dom_find_loader: trying HVM-generic loader ...
domainbuilder: detail: loader probe OK
xc: detail: ELF: phdr: paddr=0x10 memsz=0x5bfc4
xc: detail: ELF: memory: 0x10 -> 0x15bfc4
domainbuilder: detail: xc_dom_mem_init: mem 3072 MB, pages 0xc pages, 4k each
domainbuilder: detail: xc_dom_mem_init: 0xc pages
domainbuilder: detail: xc_dom_boot_mem_init: called
domainbuilder: detail: range: start=0x0 end=0xc000
xc: detail: PHYSICAL MEMORY ALLOCATION:
xc: detail: 4KB PAGES: 0x0200
xc: detail: 2MB PAGES: 0x01ff
xc: detail: 1GB PAGES: 0x0002
domainbuilder: detail: xc_dom_build_image: called
domainbuilder: detail: xc_dom_pfn_to_ptr_retcount: domU mapping: pfn 0x100+0x5c at 0x7f20de301000
domainbuilder: detail: xc_dom_alloc_segment: kernel : 0x10 -> 0x15c000 (pfn 0x100 + 0x5c pages)
xc: detail: ELF: phdr 0 at 0x7f20de2a5000 -> 0x7f20de2f7420
domainbuilder: detail: xc_dom_pfn_to_ptr_retcount: domU mapping: pfn 0x15c+0x40 at 0x7f20de2c1000
domainbuilder: detail: xc_dom_alloc_segment: System Firmware module : 0x15c000 -> 0x19c000 (pfn 0x15c + 0x40 pages)
domainbuilder: detail: xc_dom_pfn_to_ptr_retcount: domU mapping: pfn 0x19c+0x1 at 0
Bug#988333: [Pkg-xen-devel] linux-image-5.10.0-6-amd64: VGA Intel IGD Passthrough to Debian Xen HVM DomUs not working, but Windows Xen HVMs do work
On 10/26/2021 10:06 AM, Chuck Zmudzinski wrote: On 10/25/2021 4:45 PM, Chuck Zmudzinski wrote: On 10/23/2021 11:11 AM, Hans van Kranenburg wrote: Hi! On 5/10/2021 1:33 PM, Chuck Zmudzinski wrote: [...] with buster and bullseye running as the Dom0, I can only get the VGA/Passthrough feature to work with Windows Xen HVMs. I would expect both Windows and Linux HVMs to work comparably well. A possible time-saver that I can recommend is to send a post to the upstream xen-users list [0] about this already. Like "Hi all, I'm starting a HVM Linux domU with Linux 5.10.70 on a Xen 4.14.3 system with also 5.10.70 dom0 kernel, with this and this domU config file. It fails to start, this is the xl -vvv create output, and this error (the irq stuff) appears in the dom0 kernel log.". Try to keep it simple and not too long initially, without the surrounding stories, to increase chance of it being fully read. I can do this soon - I have some more interesting tests to share here and with the Xen developers upstream. I will need to think a little about how to present this bug to the Xen upstream developers in a short and simple enough way for them to be likely to read it initially. For now, I will report here some results from the journal log entries of both Bullseye dom0 and Bullseye domU for two different configurations. These logs are not generated with the -vvv option, but they do provide quite a bit of interesting information and are already somewhat overwhelming, even without the -vvv option. So I will hold off for now before making the logs even more verbose with -vvv. The intention of this message is to provide detailed logs for a detailed analysis of the problem, not to describe the problem in simple terms. A few days ago I ran two tests, and I have four different log files attached from those tests. 
In both tests, the Bullseye HVM was configured for PCI/IGD passthrough using the domain config file and preparation for passthrough in dom0 described in the earlier message #31: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=988333#31 The two tests were: 1. Bullseye dom0, Debian 11.1 / Bullseye HVM domU, Debian 11.1 This first test essentially confirmed that the updated versions of the packages for both Bullseye dom0 and Bullseye domU since the original report five months ago do not fix the problem. In this test case, I am using all the official packages of Debian 11.1 (Bullseye). It is important to note that the version of the device model used in this test is the official upstream version of qemu for Bullseye. On Debian, Xen uses by default the qemu-system-i386 binary from the qemu-system-x86 package, and Bullseye currently uses qemu version 5.2+dfsg-11+deb11u1 as the default device model. I attached two log files from this test: qemu-upstream-hvm.txt and qemu-upstream-dom0.txt. They are the logged journal entries for the Bullseye HVM and Bullseye dom0 domains, respectively. They are fairly complete logs, showing the kernel version running in both the dom0 and the HVM, the kernel command line for both the dom0 and the domU, the command that was used to create the HVM domain, etc. One might recall that in the original report I said it was difficult to capture logs from the domU, but this time I was able to capture the log by waiting a few minutes before shutting it down. I also discovered, in contrast to what I said in the earlier report, that it is possible to gracefully shut down the domU using xl shutdown by waiting long enough before trying to shut it down, and also it takes a few minutes instead of the normal few seconds to shut it down because of the problems caused by this configuration. 
By waiting for the graceful shutdown instead of using xl destroy, I was able to view the log of the attempted boot in the domU on a subsequent normal boot (without PCI passthrough) using journalctl, and capture some useful Call Traces. For this first test, although the shutdown succeeds, the domain never boots to the point where one can log in, either at the terminal or remotely via ssh. The boot messages were displayed on the passed-through video device, but only very slowly: it took almost two minutes before the boot messages started to appear, and it also took a couple of minutes after issuing the xl shutdown command in dom0 before the passed-through video device indicated that the HVM domain had shut down and powered off. The second test: 2. Same as the first test, except using the qemu traditional device model instead of the qemu upstream model, which on Debian comes from the qemu-system-x86 package. I also attached two log files from this test: qemu-traditional-hvm.txt and qemu-traditional-dom0.txt, and these also are fairly complete logs showing the kernel version in use, etc. Since Debian does not provide the traditional device model, I had to build it from xenbits.xen.org: https://xenbits.xen.org/gitweb/?p=qemu-xen-traditional.git;a=shortlog;h=refs/heads/st
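The log retrieval described above can be sketched roughly as follows. This is only an illustration: the function name call_traces and the default of 25 context lines are assumptions of this sketch, and it assumes the domU keeps a persistent journal so that the failed boot is still available as the previous boot.

```shell
#!/bin/bash
# Hypothetical helper: read a kernel log on stdin and print every
# "Call Trace:" line plus the N lines that follow it (default 25).
call_traces() {
    local after="${1:-25}"
    grep -A "$after" 'Call Trace:'
}

# Typical use inside the domU after a normal boot (assumes a persistent
# journal): -k selects kernel messages, -b -1 selects the previous boot.
# journalctl -k -b -1 --no-pager | call_traces 30
```

If more than one boot has happened since the failed attempt, journalctl --list-boots shows which offset to pass to -b.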
Bug#988333: [Pkg-xen-devel] linux-image-5.10.0-6-amd64: VGA Intel IGD Passthrough to Debian Xen HVM DomUs not working, but Windows Xen HVMs do work
On 10/23/2021 11:11 AM, Hans van Kranenburg wrote: Can you share the domU config file? Yes, here it is:

builder = 'hvm'
memory = '3072'
vcpus = '4'
device_model_version = 'qemu-xen'
# device_model_version = 'qemu-xen-traditional'
# This is now bullseye
disk = ['/dev/systems/linux,,xvda,w','/dev/data/linuxdata,,xvdb,w']
name = 'bullseye-hvm'
vif = [ 'mac=00:16:3E:27:2C:AA,model=e1000,script=vif-route.hvm,ip=192.168.1.4' ]
on_poweroff = 'destroy'
on_reboot = 'restart'
on_crash = 'restart'
boot = 'c'
acpi = '1'
apic = '1'
viridian = '1'
xen_platform_pci = '1'
serial = 'pty'
vga = 'none'
sdl = '0'
vnc = '0'
gfx_passthru = '1'
pci = [ '00:1b.0', '00:14.0,rdm_policy=relaxed', '00:02.0' ]

And, other configs you need to have in place to exclude the devices from being seen as normal devices directly in dom0? (I haven't used passthrough myself yet, but I read that this is needed.) I run this script in Dom0 before starting the domain:

#!/bin/bash
modprobe xen-pciback
xl pci-assignable-add 00:02.0
xl pci-assignable-add 00:14.0
xl pci-assignable-add 00:1b.0
xl pci-assignable-list

The script makes the Intel IGD, USB 3.0 controller, and sound device available to an unprivileged domain. The pci = ... statement in the domain config corresponds to these same three PCI devices. I forgot to add that you need to run lspci in dom0 to get the PCI bus, slot and function numbers of the PCI devices you want to pass through to the unprivileged domain.
On my system, this is what I got:

$ lspci
00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05)
00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-V (rev 05)
00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 05)
00:1b.0 Audio device: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d5)
00:1c.3 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d5)
00:1c.4 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #5 (rev d5)
00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation B85 Express LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 05)
00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 05)
02:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge (rev 04)
04:00.0 Network controller: Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) (rev 01)

The PCI slot and function numbers might be different for the USB controller and sound card on other systems, but it is always 00:02.0 for the Intel IGD from what I have read.
On my system, from the output of lspci listed above the USB 3.0 controller (xHCI as opposed to EHCI which indicates an older and slower USB 2.0 controller) is 00:14.0 and the sound card is 00:1b.0, and this is where the arguments for the pci = ... statement in the domU config file and the xl pci-assignable-add commands come from. Also, it was necessary to add the rdm_policy=relaxed option to the USB card in the pci = statement, and YMMV with different hardware as far as how compatible your PCI devices are with PCI passthrough. From what I have read, the relaxed rdm_policy setting is needed because my USB card's memory overlaps with other devices, and I think Intel does a better job isolating the PCI devices with newer hardware. My box is now almost seven years old and I think newer hardware might not need that relaxed rdm_policy setting. It would be better to have hardware that works without this relaxed rdm_policy because allowing passthrough of devices that overlap with other devices' memory is obviously a security concern, but my setup does not involve any untrusted domains so I am comfortable using it in my environment. All the best, Chuck
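The lookup described in this message, from lspci output to the arguments of the pci = ... statement, can be sketched as two small shell functions. This is a minimal sketch, not part of the original setup: the names find_bdf and pci_line are hypothetical, and the search patterns are only examples.

```shell
#!/bin/bash
# Hypothetical helper: read lspci output on stdin and print the
# bus:slot.function of every device whose line matches the given
# case-insensitive pattern.
find_bdf() {
    local pattern="$1"
    grep -i -- "$pattern" | awk '{ print $1 }'
}

# Hypothetical helper: format its arguments as a domU config "pci = [ ... ]"
# statement. Each argument is one device, optionally followed by per-device
# options such as ,rdm_policy=relaxed.
pci_line() {
    local out="" dev
    for dev in "$@"; do
        out+="${out:+, }'${dev}'"
    done
    printf 'pci = [ %s ]\n' "$out"
}
```

On the system listed above, lspci | find_bdf 'xHCI' would print 00:14.0, and pci_line 00:1b.0 00:14.0,rdm_policy=relaxed 00:02.0 reproduces the pci line from the config file. A pattern can match more than one device (for example 'Audio device' matches both 00:03.0 and 00:1b.0 here), so the result still needs to be checked by eye.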
Bug#988333: [Pkg-xen-devel] linux-image-5.10.0-6-amd64: VGA Intel IGD Passthrough to Debian Xen HVM DomUs not working, but Windows Xen HVMs do work
On 10/23/2021 11:11 AM, Hans van Kranenburg wrote: Hi! On 10/19/21 5:44 AM, Chuck Zmudzinski wrote: On 5/10/2021 1:33 PM, Chuck Zmudzinski wrote: [...] with buster and bullseye running as the Dom0, I can only get the VGA/Passthrough feature to work with Windows Xen HVMs. I would expect both Windows and Linux HVMs to work comparably well. You don't mention the used Xen version (Debian package version) for buster and bullseye anywhere, so I'll assume it's the latest 4.14.3-1(~deb11u1) one. Yes, that's the version. The original report from five months ago was against an earlier version but the latest version still behaves the same way. I just tested it a couple of days ago. [...] The biggest problems were that the Dom0 reported problems with IRQ 16 being disabled after starting the bullseye HVM DomU, and only xl destroy could be used to stop the corrupted process. Well, at least we have an error somewhere already. That's a starting point. Can you share the domU config file? Yes, here it is:

builder = 'hvm'
memory = '3072'
vcpus = '4'
device_model_version = 'qemu-xen'
# device_model_version = 'qemu-xen-traditional'
# This is now bullseye
disk = ['/dev/systems/linux,,xvda,w','/dev/data/linuxdata,,xvdb,w']
name = 'bullseye-hvm'
vif = [ 'mac=00:16:3E:27:2C:AA,model=e1000,script=vif-route.hvm,ip=192.168.1.4' ]
on_poweroff = 'destroy'
on_reboot = 'restart'
on_crash = 'restart'
boot = 'c'
acpi = '1'
apic = '1'
viridian = '1'
xen_platform_pci = '1'
serial = 'pty'
vga = 'none'
sdl = '0'
vnc = '0'
gfx_passthru = '1'
pci = [ '00:1b.0', '00:14.0,rdm_policy=relaxed', '00:02.0' ]

And, other configs you need to have in place to exclude the devices from being seen as normal devices directly in dom0? (I haven't used passthrough myself yet, but I read that this is needed.)
I run this script in Dom0 before starting the domain:

#!/bin/bash
modprobe xen-pciback
xl pci-assignable-add 00:02.0
xl pci-assignable-add 00:14.0
xl pci-assignable-add 00:1b.0
xl pci-assignable-list

The script makes the Intel IGD, USB 3.0 controller, and sound device available to the domain. The pci = ... statement in the domain config corresponds to these same three PCI devices. Can you share more verbose logging done by xl create when using xl -vvv create? I don't have time now, but will do this and report tomorrow. But, AFAIK what you want to do should be possible yes. The bullseye HVM DomU still fails to boot on an up-to-date bullseye Xen Dom0 configured to pass through the same PCI/IGD devices. The bullseye HVM DomU with IGD passthrough has so far only been verified to work on an old, slightly modified jessie Xen Dom0. More Details: These latest tests are with linux version 5.10.70-1 for bullseye stable. For the jessie Dom0, which worked with the unmodified bullseye HVM DomU, I had to add a few patches to the old jessie Xen packages so the unmodified bullseye Xen HVM [...] Ok, yes, clear, that makes the domU kernel not the primary suspect. These tests demonstrate that a fix for this bug is possible in src:xen rather than in src:linux, but the patches needed to fix this bug in Xen 4.14, which is the version of Xen on bullseye, are not yet identified. It might also be possible (just a wild guess) that for Xen 4.14, the options in the domU config file need to be different than for Xen 4.4. They are a little different already: 4.4 did not need the rdm_policy setting. But you are right, there are other settings I haven't checked yet. I will report on some more tests I have done tomorrow when I have more ti [...] I will continue to investigate this issue and try to bisect the problem as it recurs in Dom0 for some version of Xen > 4.4 and <= 4.14. It will obviously take some time since there are so many differences between Xen 4.4 and 4.14.
If you can make progress on that, and find an actual commit that changes the behavior, then we're probably at 95% towards finding a cause and solution. :) That'd be great. A possible time-saver that I can recommend is to send a post to the upstream xen-users list [0] about this already. Like "Hi all, I'm starting a HVM Linux domU with Linux 5.10.70 on a Xen 4.14.3 system with also 5.10.70 dom0 kernel, with this and this domU config file. It fails to start, this is the xl -vvv create output, and this error (the irq stuff) appears in the dom0 kernel log.". Try to keep it simple and not too long initially, without the surrounding stories, to increase chance of it being fully read. I can do this soon - I have some more interesting tests to share here and with the Xen developers upstream. If I find a fix in src:xen for Xen >=4.14 Dom0 on bullseye or sid, I will r
Bug#988333: linux-image-5.10.0-6-amd64: VGA Intel IGD Passthrough to Debian Xen HVM DomUs not working, but Windows Xen HVMs do work
On 5/10/2021 1:33 PM, Chuck Zmudzinski wrote: Package: src:linux Version: 5.10.28-1 Severity: normal Tags: upstream Dear Maintainer, I have been using Xen's PCI and VGA passthrough feature since wheezy and jessie were the stable versions, and back then both Windows HVMs and Linux HVMs would function with the Intel Integrated Graphics Device (IGD), the audio device, and the USB 3 controller passed to them. But with buster and bullseye running as the Dom0, I can only get the VGA/Passthrough feature to work with Windows Xen HVMs. I would expect both Windows and Linux HVMs to work comparably well. Dear Debian Kernel Team and Debian Xen Team, I originally reported this bug in src:linux, as described above, but recent tests indicate a fix can be made in src:xen without any modifications to src:linux, so I suggest reassigning it from src:linux to src:xen. My explanation follows: On my system which is an ASRock B85M Pro4 (Haswell), with BIOS P2.50 12/11/2015, and with a jessie Xen Dom0 with a few patches to the old jessie Xen packages, I was able to successfully pass through the USB 3.0 controller, the sound card, and the Intel IGD to an unmodified bullseye HVM DomU without any of the problems I reported in the original bug report (message #5): https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=988333#5 The biggest problems were that the Dom0 reported problems with IRQ 16 being disabled after starting the bullseye HVM DomU, and only xl destroy could be used to stop the corrupted process. The bullseye HVM DomU still fails to boot on an up-to-date bullseye Xen Dom0 configured to pass through the same PCI/IGD devices. The bullseye HVM DomU with IGD passthrough has so far only been verified to work on an old, slightly modified jessie Xen Dom0. More Details: These latest tests are with linux version 5.10.70-1 for bullseye stable. 
For the jessie Dom0, which worked with the unmodified bullseye HVM DomU, I had to add a few patches to the old jessie Xen packages so the unmodified bullseye Xen HVM DomU would boot on the jessie Xen Dom0 that uses a fairly old version of Xen (version 4.4). Specifically, it was necessary to add two upstream Xen patches to the old jessie Xen-4.4 packages: 1. https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=98297f0 2. https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=09a4ef8 The first patch is needed to support booting Linux kernels >= 4.10 in a Xen HVM DomU on a Xen 4.4 Dom0, since that is when the Linux kernel started validating the timestamp counter adjust msr for hvm guests, and the validation fails on Xen 4.4 without the first patch. Modern versions of Linux expect the Dom0 to provide feature-XXX flags in xenstore for the DomU. The 4.4 version of libxl does not provide this so the second patch provides it for version 4.4 of libxl. Neither of these two patches specifically solves the problem with IGD passthrough, they are simply needed to enable the old Xen 4.4 hypervisor and tools to boot modern Linux kernels in a Xen HVM DomU, with or without PCI/IGD passthrough. It was also necessary to use the ancient Xen qemu traditional device model since the old Xen 4.4 does not support IGD passthrough with the older upstream qemu device model for jessie. I did not do anything special here. I just compiled the qemu-xen-traditional binary from the xenbits git repository: https://xenbits.xen.org/gitweb/?p=qemu-xen-traditional.git;a=commit;h=204a7fc1 and recompiled hvmloader with rombios as required by qemu-xen-traditional and integrated these with the Debian packaging of Xen 4.4 for jessie the way it was done with Xen 4.1 on wheezy before Debian removed qemu-xen-traditional from the Debian Xen packages. On the jessie Dom0, no other changes were made. For example, it used the latest 3.16.0-11 kernel for jessie. 
These tests demonstrate that a fix for this bug is possible in src:xen rather than in src:linux, but the patches needed to fix this bug in Xen 4.14, which is the version of Xen on bullseye, are not yet identified. I will continue to investigate this issue and try to bisect the problem as it recurs in Dom0 for some version of Xen > 4.4 and <= 4.14. It will obviously take some time since there are so many differences between Xen 4.4 and 4.14. If I find a fix in src:xen for Xen >=4.14 Dom0 on bullseye or sid, I will reassign #988333 to src:xen myself. Until then, I will leave it to the discretion of the Debian Kernel Team to decide whether or not to reassign it to src:xen now. Regards, Chuck
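To give a rough idea of the effort involved in the bisection mentioned above, here is a minimal sketch of a binary search over Xen release series. It is an illustration only: first_bad and ask_operator are hypothetical names, the search works at release granularity rather than per commit (git bisect in the Xen repository would do the same thing commit by commit), and it assumes the failure, once introduced, persists in every later version. Intermediate versions may also need their own adjustments before a test is meaningful, as Xen 4.4 did.

```shell
#!/bin/bash
# first_bad PREDICATE VERSION...
# Binary search over versions ordered oldest to newest. Assumes every
# version before some point is good, every version from that point on is
# bad, and the last version listed is known to be bad.
first_bad() {
    local is_bad="$1"; shift
    local -a v=("$@")
    local lo=0 hi=$(( ${#v[@]} - 1 )) mid
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        if "$is_bad" "${v[mid]}"; then
            hi=$mid            # first bad version is at mid or earlier
        else
            lo=$(( mid + 1 ))  # first bad version is after mid
        fi
    done
    printf '%s\n' "${v[lo]}"
}

# Hypothetical predicate: the operator builds and boots the given Xen
# version by hand, tries the passthrough domU, and reports the result.
ask_operator() {
    local answer
    read -r -p "Does IGD passthrough fail on Xen $1? [y/n] " answer
    [ "$answer" = "y" ]
}
```

Called as first_bad ask_operator 4.5 4.6 4.7 4.8 4.9 4.10 4.11 4.12 4.13 4.14, this needs at most four test builds to narrow the ten release series down to the first failing one.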
Bug#994899: xen-hypervisor-4.14-amd64 breaks system poweroff on bullseye
On 10/4/2021 1:51 PM, Diederik de Haas wrote: On Monday, 4 October 2021 17:27:22 CEST Chuck Zmudzinski wrote: I can confirm these 4 fix the bug on my hardware. \o/ Thanks for testing and reporting back :-) Cheers, Diederik Thank you, Diederik, for your good work finding the commits from upstream that fix the bug. And also thanks to you, Andy, for helping fix this bug in the IRC and for your interest and support of the Debian Xen Team's work. Cheers, Chuck
Bug#994899: Bug#991967: Simply ACPI powerdown/reset issue?
As discussed in message #91, the submitter of this bug accepts the package maintainer's fix which will close this bug.
Bug#994899: Bug#991967: Simply ACPI powerdown/reset issue?
On 10/4/2021 6:57 AM, Diederik de Haas wrote: On Monday, 4 October 2021 11:46:54 CEST Hans van Kranenburg wrote: The 4th one is not explicitly tagged with Fixes: 1c4aa69ca1e1, but I agree with Diederik that we should keep them all together. Context: Those 4 are part of 1 patch-set posted here: https://lists.xen.org/archives/html/xen-devel/2020-11/msg01516.html The 5th was already debatable and I choose to include it in my MR, but I'm fine with not including that one. Cheers, Diederik As the submitter of #994899, I can confirm these 4 fix the bug on my hardware. I agree this fix can close #994899 and #995341, since as Hans noted, they are part of the upstream stable 4.15 branch and I presume that will make them stable enough for bullseye. Thank you Hans, Diederik, and Elliott. All the best, Chuck
Bug#995341: Highly inappropriate behavior which the RT should be aware of
On 10/3/2021 11:21 AM, Chuck Zmudzinski wrote: On 10/1/2021 5:48 AM, Diederik de Haas wrote: We've already identified a possible fix, which I can point to if so desired, I think the fix referred to is here: https://salsa.debian.org/xen-team/debian-xen/-/tree/knorrie/for-diederik-3-fixes AFAICT, this fix involves adding three more commits Slight correction - it actually looks like this proposal involves 3 fixes for diederik, but it actually involves five new commits from the upstream unstable Xen 4.16 branch, as indicated by the five new patches in the debian/patches directory. from the unstable upstream Xen 4.16 branch in addition to the nine commits already added from unstable upstream Xen 4.16 to provide better support for the Raspberry Pi 4 but with the unintended side effect of #994899. I do not object to this fix for Debian's current unstable distribution. However, this bug concerns Debian's current stable version, bullseye, not sid/unstable. I would respectfully disagree with the Release Team's decision to migrate the aforementioned fix from the unstable release to bullseye unless the fix is accepted by the upstream Xen project in its stable 4.14 branch and its future stable point releases 4.14.x. IMO, the debdiff attached to message #30: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=995341#30 is a better suited fix more in accordance with the stability/security requirements of a typical Debian stable release. Regards, Chuck Zmudzinski
Bug#995341: release.debian.org: Xen dom0 does not power off on bullseye (stable)
The original submitter has proposed a fix (see messages #30 and #35). Another contributor to this report has indicated the package maintainer does not endorse the submitter of the bug's proposed fix and is working on another fix (see messages #40 and #65). The original submitter of the bug thinks the possible solution proposed in messages #40 and #65 does not meet the typical stability/security requirements for a typical Debian stable release.
Bug#995341: Highly inappropriate behavior which the RT should be aware of
On 10/1/2021 5:48 AM, Diederik de Haas wrote: We've already identified a possible fix, which I can point to if so desired, I think the fix referred to is here: https://salsa.debian.org/xen-team/debian-xen/-/tree/knorrie/for-diederik-3-fixes AFAICT, this fix involves adding three more commits from the unstable upstream Xen 4.16 branch in addition to the nine commits already added from unstable upstream Xen 4.16 to provide better support for the Raspberry Pi 4 but with the unintended side effect of #994899. I do not object to this fix for Debian's current unstable distribution. However, this bug concerns Debian's current stable version, bullseye, not sid/unstable. I would respectfully disagree with the Release Team's decision to migrate the aforementioned fix from the unstable release to bullseye unless the fix is accepted by the upstream Xen project in its stable 4.14 branch and its future stable point releases 4.14.x. IMO, the debdiff attached to message #30: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=995341#30 is a better suited fix more in accordance with the stability/security requirements of a typical Debian stable release. Regards, Chuck Zmudzinski
Bug#995341: Highly inappropriate behavior which the RT should be aware of
On 10/1/2021 5:48 AM, Diederik de Haas wrote: Hi Release Team, I want to make sure that you're aware of what I consider HIGHLY inappropriate behavior by Chuck where he is trying to sidestep/override the Xen maintainers by filing this bug directly to the release.debian.org pseudo package. This only appeared on the Debian Xen maintainers' ML because Chuck went on a severity-dance where he *also* changed the severity of bug #994899, which _is_ assigned to the Xen package and therefore the Xen maintainers could see it. In https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=994899#10 I tried to steer the efforts to getting the issue fixed in a more constructive manner. I failed at that. We've already identified a possible fix, which I can point to if so desired, but I don't think the RT should be bothered with this (dispute). I respectfully disagree, because, as I have mentioned repeatedly both in this report and in #994899, the process of migrating the stable package of the xen hypervisor for amd64 broke because when bullseye was released on August 14, 2021, that package still contained patches from the unstable upstream Xen 4.16 branch, whereas the version advertised by Debian for the stable release was, and still is, the stable upstream of Xen, version 4.14. It may be news to Chuck, but not the RT, but a package maintainer has the prerogative to include additional patches in the package that gets uploaded to the Debian archive. (It happens all the time) And that can introduce bugs. Shit happens. You learn from that. And then you go on fixing those bugs *in coordination with* the package maintainers. Perhaps it may be news to Diederik that the Release Team does have the prerogative to review and either accept or reject a series of patches from an unstable upstream branch into the stable release and respond to a request from a user/volunteer to review such patches that obviously can and in fact did cause bug #994899 in this case.
It may have just been an oversight, but in this case, IMO, the package maintainers *should* have notified the release team of the unstable patches from Xen 4.16 that were in the supposedly stable 4.14 xen hypervisor package for amd64 sometime BEFORE bullseye was released as the new stable version on August 14, 2021, so the Release Team could decide if the unstable patches could stay in the formal release of bullseye. IMHO, it is up to the Release Team, not the package maintainers, to decide if Debian specific patches from an UNSTABLE upstream branch can remain in a package of the STABLE upstream version at the time of the stable release. The package maintainers never gave the Release Team a chance to review the upstream unstable patches before bullseye was released. It is also for the Release Team, not the package maintainers, to decide if those unstable patches can remain after a user/volunteer requests that they be removed as the appropriate way to fix a bug in the stable release that is caused by the presence of the unstable patches in the stable release. I would be much less inclined to request that the Release Team review the unstable patches that are causing #994899 if there was some evidence that upstream plans to eventually backport those patches from Xen 4.16 to Xen 4.14. At present no such evidence exists, and perhaps a way to resolve this controversy is for Debian to submit a pull request to the Xen project to merge the unstable patches in Debian's current Xen 4.14 packages into Xen's stable Xen 4.14 branch. If upstream endorses the unstable patches as suitable for their 4.14 stable release and eventually commits them to their 4.14 branch and subsequent upstream point releases, then I would also accept them as appropriate for the Debian package of the upstream stable 4.14 version of Xen that targets the stable version, currently bullseye. Regards, Chuck Zmudzinski What you don't do, is try to go above/around them by addressing the RT directly.
One should have at least the decency to directly To/CC the package maintainer when you do, which in 99.99+% of cases you REALLY should not do. Regards, Diederik
Bug#995341: Highly inappropriate behavior which the RT should be aware of
On 10/1/2021 5:48 AM, Diederik de Haas wrote: Hi Release Team, I want to make sure that you're aware of what I consider HIGHLY inappropriate behavior by Chuck where he is trying to sidestep/override the Xen maintainers by filing this bug directly to the release.debian.org pseudo package. I consider it also highly inappropriate for one volunteer to criticize a newcomer volunteer without at least a Cc to the volunteer he is criticizing, to give the volunteer under attack an opportunity to respond and defend herself/himself. This only appeared on the Debian Xen maintainers' ML because Chuck went on a severity-dance where he *also* changed the severity of bug #994899, which _is_ assigned to the Xen package and therefore the Xen maintainers could see it. In https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=994899#10 I tried to steer the efforts to getting the issue fixed in a more constructive manner. I failed at that. We've already identified a possible fix, which I can point to if so desired, but I don't think the RT should be bothered with this (dispute). It may be news to Chuck, but not the RT, but a package maintainer has the prerogative to include additional patches in the package that gets uploaded to the Debian archive. (It happens all the time) And that can introduce bugs. Shit happens. You learn from that. And then you go on fixing those bugs *in coordination with* the package maintainers. I agree package maintainers must have a say, and as far as I can tell the Release Team has the final say on what goes into the stable release. I have tried to cooperate with volunteers for the package maintainers, but they refused to cooperate with me. When volunteers for the package maintainers are uncooperative and excessively critical and unfair to a newcomer volunteer, what is a newcomer to do? Does Debian really consider this the best way to sustain the community and acquire new developers as veterans move on or quit?
IMHO, following Diederik's approach toward newcomers will result in a slow and painful death for Debian as competent developers move on and no one is there to replace them because it is just not worth the personal attacks one must endure when trying to contribute to Debian. What you don't do, is try to go above/around them by addressing the RT directly. One should have at least the decency to directly To/CC the package maintainer when you do, which in 99.99+% of cases you REALLY should not do. One should also have the decency to Cc a person one is criticizing, something I have done by sending a Cc to the person I am criticizing (in my defense of his attack on me). Diederik did not have this decency when he criticized me. Respectfully, Chuck Zmudzinski Regards, Diederik
Bug#995341: release.debian.org: Xen dom0 does not power off on bullseye (stable)
On 9/29/2021 7:26 PM, Chuck Zmudzinski wrote:

> Special instructions for applying the debdiff:

Please note that an updated debdiff has been provided to target the correct distribution and use the correct version number for the updated package at the following link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=995341#30

Also, a clarification: The special instructions below (the quilt pop -a and quilt push -a commands) are only required to test the debdiff on a local build. No special processing should be required on automated test builds.

Chuck

> 1. Download the attached debdiff, bug#994899.diff, into a working directory
>
> Then, in the working directory containing the attached bug#994899.diff and the compressed source archives for xen_4.14.3-1~deb11u1:
>
> 2. dpkg-source -x xen_4.14.3-1~deb11u1.dsc
> 3. cd xen-4.14.3
> 4. quilt pop -a
> 5. patch -p1 < ../bug#994899.diff
> 6. quilt push -a
>
> After these 6 steps, the tree is ready to build the source/binary packages.
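The steps listed above can be sketched as a small dry-run script. This is only a sketch under stated assumptions: the helper names `run` and `apply_debdiff` and the `DRY_RUN` variable are made up for illustration and are not part of the bug report, and `patch -p1 -i FILE` is used in place of the `patch -p1 < FILE` redirection so that every step can be passed through `run`. By default it only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of steps 2-6. Names here (run, apply_debdiff, DRY_RUN)
# are hypothetical helpers, not something shipped with the debdiff.
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1: .dsc file, $2: unpacked source directory, $3: debdiff path
#     (relative to the unpacked source directory)
apply_debdiff() {
    run dpkg-source -x "$1" &&
    run cd "$2" &&
    run quilt pop -a &&          # unapply the packaged patch series
    run patch -p1 -i "$3" &&     # same effect as: patch -p1 < "$3"
    run quilt push -a            # reapply the (now modified) series
}

apply_debdiff 'xen_4.14.3-1~deb11u1.dsc' 'xen-4.14.3' '../bug#994899.diff'
```

Running it with `DRY_RUN=0` in the working directory would execute the same five commands for real, stopping at the first one that fails.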
Bug#995341: release.debian.org: Xen dom0 does not power off on bullseye (stable)
On 9/29/2021 7:26 PM, Chuck Zmudzinski wrote: Ordinarily, as I understand the process, a bug in the stable version is first fixed in the unstable release and then the fix is migrated (backported) to the stable release. But it appears to me a fix in the unstable release will not be forthcoming soon, or it might be a different bug (see #991967, affecting the unstable release, sid, for more details). Another way to look at this unusual situation: At the present time the Xen packages targeting the unstable version are identical to the Xen packages targeting the stable version. In other words, either the stable version is not really stable or the unstable version is actually stable. I argue it is the former and somewhere along the line the process for migrating a stable version of Xen into bullseye broke. I have identified when the process broke. It was when patches from unstable upstream Xen 4.16 were migrated to bullseye even though the upstream version of Xen for both stable and unstable was stable upstream Xen 4.14. In other words, the current Debian version of Xen targeting the stable distribution is actually an unstable version of Debian Xen that is a mixture of mostly stable Xen 4.14 and nine unstable patches from upstream Xen 4.16. This is causing instabilities and bugs such as #994899 and #991967 on amd64 and also likely i386. This upload to stable fixes this by removing the instabilities on amd64 and i386 without removing the good work done by the Debian Xen Team improving support for arm devices. So it is a win-win to accept this upload to stable. 
Going forward, work on Debian's unstable version of Xen can continue with investigating and fixing #991967 and eventually updating to a newer upstream version, which will probably be at least Xen 4.16. My tests already show that #994899 is fixed upstream in Xen 4.16, as discussed here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=994899#5

To quote (with corrections and clarifications) from the aforementioned message:

I also tested the current unstable (master) branch from Xen upstream, which is xen-c76cfad, which upstream calls Xen-4.16-unstable. I tested the current bullseye kernel (5.10.46-4) as a dom0 on that upstream Xen-4.16 hypervisor and did not see the bug, so this most definitely is NOT an upstream bug.

So, for all practical purposes, #994899 *is* fixed upstream in upstream's unstable version Xen 4.16. So we should be free to patch it in stable now.

Chuck
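The idea behind the proposed upload, keeping the RPI4 patches for arm while leaving them out on amd64 and i386, can be illustrated with a short shell sketch. The function name `wants_rpi4_patches` is hypothetical and this is not the actual debian/rules change from the debdiff; in a real package build the architecture would more likely come from `dpkg-architecture -qDEB_HOST_ARCH` than from a hard-coded list.

```shell
#!/bin/sh
# Hypothetical helper: should the RPI4 patch series be applied when
# building for the given Debian architecture name?
wants_rpi4_patches() {
    case "$1" in
        amd64|i386) return 1 ;;  # x86: leave the RPI4 patches out
        *)          return 0 ;;  # everything else (e.g. arm64): keep them
    esac
}

# Illustration only: loop over a few architecture names.
for arch in amd64 i386 arm64 armhf; do
    if wants_rpi4_patches "$arch"; then
        echo "$arch: apply RPI4 patches"
    else
        echo "$arch: skip RPI4 patches"
    fi
done
```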
Bug#995341: release.debian.org: Xen dom0 does not power off on bullseye (stable)
On 9/30/2021 2:57 PM, Paul Gevers wrote: Hi Chuck, On 30-09-2021 18:15, Chuck Zmudzinski wrote: ... the debdiff I uploaded to BTS has UNRELEASED rather than bullseye for the distribution field of the changelog, and the new target version is ...deb11u1.1 instead of deb11u2. That is how dch formatted the changelog when I generated the debdiff. Let me know if I need to fix the debdiff filename and changelog with a new debdiff that corrects those, and I will upload it to 995...@bugs.debian.org. You could indeed send a follow-up with a debdiff that fixes these issues as that saves the stable release managers a round trip to you and less brain power. You'd need to fix it locally anyways to actually do the upload. Update debdiff with these fixes to changelog is attached to this message. Changelog also Closes #994899. Chuck diff -Nru xen-4.14.3/debian/changelog xen-4.14.3/debian/changelog --- xen-4.14.3/debian/changelog 2021-09-13 10:28:21.0 -0400 +++ xen-4.14.3/debian/changelog 2021-09-30 16:06:50.0 -0400 @@ -1,3 +1,12 @@ +xen (4.14.3-1~deb11u2) bullseye; urgency=medium + + * Non-maintainer upload. 
+ * debian/patches - move RPI4 patches into a separate directory + * debian/rules - disable RPI4 patches on amd64|i386 (Closes: #994899) + * debian/control - add Build Dependency quilt + + -- Chuck Zmudzinski Thu, 30 Sep 2021 16:06:50 -0400 + xen (4.14.3-1~deb11u1) bullseye-security; urgency=medium * Rebuild for bullseye-security diff -Nru xen-4.14.3/debian/control xen-4.14.3/debian/control --- xen-4.14.3/debian/control 2021-07-10 08:01:39.0 -0400 +++ xen-4.14.3/debian/control 2021-09-26 22:21:51.0 -0400 @@ -34,6 +34,7 @@ markdown, ocaml-native-compilers | ocaml-nox, ocaml-findlib, + quilt, Homepage: https://xenproject.org/ Vcs-Browser: https://salsa.debian.org/xen-team/debian-xen Vcs-Git: https://salsa.debian.org/xen-team/debian-xen.git diff -Nru xen-4.14.3/debian/patches/0027-xen-rpi4-implement-watchdog-based-reset.patch xen-4.14.3/debian/patches/0027-xen-rpi4-implement-watchdog-based-reset.patch --- xen-4.14.3/debian/patches/0027-xen-rpi4-implement-watchdog-based-reset.patch 2021-09-13 10:25:25.0 -0400 +++ xen-4.14.3/debian/patches/0027-xen-rpi4-implement-watchdog-based-reset.patch 1969-12-31 19:00:00.0 -0500 @@ -1,105 +0,0 @@ -From: Stefano Stabellini -Date: Fri, 2 Oct 2020 13:47:17 -0700 -Subject: xen/rpi4: implement watchdog-based reset - -The preferred method to reboot RPi4 is PSCI. If it is not available, -touching the watchdog is required to be able to reboot the board. - -The implementation is based on -drivers/watchdog/bcm2835_wdt.c:__bcm2835_restart in Linux v5.9-rc7. 
- -Signed-off-by: Stefano Stabellini -Acked-by: Julien Grall -Reviewed-by: Bertrand Marquis -Tested-by: Roman Shaposhnik -CC: ro...@zededa.com -(cherry picked from commit 25849c8b16f2a5b7fcd0a823e80a5f1b590291f9) - xen/arch/arm/platforms/brcm-raspberry-pi.c | 61 ++ - 1 file changed, 61 insertions(+) - -diff --git a/xen/arch/arm/platforms/brcm-raspberry-pi.c b/xen/arch/arm/platforms/brcm-raspberry-pi.c -index f5ae58a..811b40b 100644 a/xen/arch/arm/platforms/brcm-raspberry-pi.c -+++ b/xen/arch/arm/platforms/brcm-raspberry-pi.c -@@ -17,6 +17,10 @@ - * GNU General Public License for more details. - */ - -+#include -+#include -+#include -+#include - #include - - static const char *const rpi4_dt_compat[] __initconst = -@@ -37,12 +41,69 @@ static const struct dt_device_match rpi4_blacklist_dev[] __initconst = - * The aux peripheral also shares a page with the aux UART. - */ - DT_MATCH_COMPATIBLE("brcm,bcm2835-aux"), -+/* Special device used for rebooting */ -+DT_MATCH_COMPATIBLE("brcm,bcm2835-pm"), - { /* sentinel */ }, - }; - -+ -+#define PM_PASSWORD 0x5a00 -+#define PM_RSTC 0x1c -+#define PM_WDOG 0x24 -+#define PM_RSTC_WRCFG_FULL_RESET0x0020 -+#define PM_RSTC_WRCFG_CLR 0xffcf -+ -+static void __iomem *rpi4_map_watchdog(void) -+{ -+void __iomem *base; -+struct dt_device_node *node; -+paddr_t start, len; -+int ret; -+ -+node = dt_find_compatible_node(NULL, NULL, "brcm,bcm2835-pm"); -+if ( !node ) -+return NULL; -+ -+ret = dt_device_get_address(node, 0, &start, &len); -+if ( ret ) -+{ -+printk("Cannot read watchdog register address\n"); -+return NULL; -+} -+ -+base = ioremap_nocache(start & PAGE_MASK, PAGE_SIZE); -+if ( !base ) -+{ -+printk("Unable to map watchdog register!\n"); -+return NULL; -+} -+ -+return base; -+} -+ -+static void rpi4_reset(void) -+{ -+uint32_t val; -+void __iomem *base = rpi4_map_watchdog(); -+ -+if ( !base ) -+return; -+ -+/* use a timeout of 10 ticks (~150us) */ -+
Bug#995341: release.debian.org: Xen dom0 does not power off on bullseye (stable)
On 9/30/2021 2:57 PM, Paul Gevers wrote:

> Hi Chuck,
>
> On 30-09-2021 18:15, Chuck Zmudzinski wrote:
> > ... the debdiff I uploaded to BTS has UNRELEASED rather than bullseye for the distribution field of the changelog, and the new target version is ...deb11u1.1 instead of deb11u2. That is how dch formatted the changelog when I generated the debdiff. Let me know if I need to fix the debdiff filename and changelog with a new debdiff that corrects those, and I will upload it to 995...@bugs.debian.org.
>
> You could indeed send a follow-up with a debdiff that fixes these issues as that saves the stable release managers a round trip to you and less brain power. You'd need to fix it locally anyways to actually do the upload.
>
> Two notes:
> 1) https://lists.debian.org/debian-devel-announce/2019/08/msg0.html (under "Workflow").
> 2) https://lists.debian.org/debian-live/2021/09/msg00027.html
>
> Paul

I will prepare the updated debdiff. I understand I am asking for an exception to the first of the Release Team's "usual" criteria for acceptance:

* The bug you want to fix in stable must be fixed in unstable already (and not waiting in NEW or the delayed queue)

I don't know if they will be willing to make an exception. Probably not before 11.1 comes out, as I am sure they are busy dealing with the upcoming point release. I do hope they will read my whole report before deciding. What happened here is very unusual for Debian, and IMHO Debian would not be renowned for stability if it happened more often.

Chuck
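The changelog problem discussed above (UNRELEASED instead of bullseye, and deb11u1.1 instead of deb11u2) concerns the first line of debian/changelog. As a rough sketch, that header line can be checked with a few lines of shell before a debdiff is sent. The function name `changelog_field` is made up for this illustration, and it only understands the simple `package (version) distribution; urgency=...` form seen in this thread; on a real Debian system `dpkg-parsechangelog` is the more robust tool.

```shell
#!/bin/sh
# Hypothetical helper: extract a field from a changelog header line.
# $1: "version" or "distribution"
# $2: header line, e.g. "xen (4.14.3-1~deb11u2) bullseye; urgency=medium"
changelog_field() {
    case "$1" in
        version)
            printf '%s\n' "$2" | sed -n 's/^[^ ]* (\([^)]*\)) .*$/\1/p' ;;
        distribution)
            printf '%s\n' "$2" | sed -n 's/^[^ ]* ([^)]*) \([^;]*\);.*$/\1/p' ;;
    esac
}

# The two header lines that appear in the debdiffs in this thread.
for header in \
    'xen (4.14.3-1~deb11u1.1) UNRELEASED; urgency=medium' \
    'xen (4.14.3-1~deb11u2) bullseye; urgency=medium'
do
    dist="$(changelog_field distribution "$header")"
    ver="$(changelog_field version "$header")"
    if [ "$dist" = "bullseye" ]; then
        echo "$ver: targets bullseye"
    else
        echo "$ver: targets $dist, expected bullseye"
    fi
done
```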
Bug#995341: release.debian.org: Xen dom0 does not power off on bullseye (stable)
Control: severity -1 normal

After reading some other bug reports, I now think this bug's severity should be normal, not important.

Regards,
Chuck
Bug#995341: release.debian.org: Xen dom0 does not power off on bullseye (stable)
ild I am just a user, not a developer, so I could only test the patch on my amd64 system for the amd64 package. Other architectures (i386, arm64, etc.) and crossbuilding (if this package is crossbuilt on buildd) need to be tested/verified before uploading. If you upload this patch (or another patch that does the same) you can close this bug and #994899. For more information about this problem, please see the messages in #994899: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=994899 and in #991967: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=991967 All the best, Chuck Zmudzinski diff -Nru xen-4.14.3/debian/changelog xen-4.14.3/debian/changelog --- xen-4.14.3/debian/changelog 2021-09-13 10:28:21.0 -0400 +++ xen-4.14.3/debian/changelog 2021-09-27 11:51:02.0 -0400 @@ -1,3 +1,12 @@ +xen (4.14.3-1~deb11u1.1) UNRELEASED; urgency=medium + + * Non-maintainer upload. + * debian/patches - move RPI4 patches into a separate directory + * debian/rules - disable RPI4 patches on amd64|i386 to fix #994899 + * debian/control - add Build Dependency quilt + + -- Chuck Zmudzinski Mon, 27 Sep 2021 11:51:04 -0400 + xen (4.14.3-1~deb11u1) bullseye-security; urgency=medium * Rebuild for bullseye-security diff -Nru xen-4.14.3/debian/control xen-4.14.3/debian/control --- xen-4.14.3/debian/control 2021-07-10 08:01:39.0 -0400 +++ xen-4.14.3/debian/control 2021-09-26 22:21:51.0 -0400 @@ -34,6 +34,7 @@ markdown, ocaml-native-compilers | ocaml-nox, ocaml-findlib, + quilt, Homepage: https://xenproject.org/ Vcs-Browser: https://salsa.debian.org/xen-team/debian-xen Vcs-Git: https://salsa.debian.org/xen-team/debian-xen.git diff -Nru xen-4.14.3/debian/patches/0027-xen-rpi4-implement-watchdog-based-reset.patch xen-4.14.3/debian/patches/0027-xen-rpi4-implement-watchdog-based-reset.patch --- xen-4.14.3/debian/patches/0027-xen-rpi4-implement-watchdog-based-reset.patch 2021-09-13 10:25:25.0 -0400 +++ xen-4.14.3/debian/patches/0027-xen-rpi4-implement-watchdog-based-reset.patch 1969-12-31 19:00:00.0 
-0500 @@ -1,105 +0,0 @@ -From: Stefano Stabellini -Date: Fri, 2 Oct 2020 13:47:17 -0700 -Subject: xen/rpi4: implement watchdog-based reset - -The preferred method to reboot RPi4 is PSCI. If it is not available, -touching the watchdog is required to be able to reboot the board. - -The implementation is based on -drivers/watchdog/bcm2835_wdt.c:__bcm2835_restart in Linux v5.9-rc7. - -Signed-off-by: Stefano Stabellini -Acked-by: Julien Grall -Reviewed-by: Bertrand Marquis -Tested-by: Roman Shaposhnik -CC: ro...@zededa.com -(cherry picked from commit 25849c8b16f2a5b7fcd0a823e80a5f1b590291f9) - xen/arch/arm/platforms/brcm-raspberry-pi.c | 61 ++ - 1 file changed, 61 insertions(+) - -diff --git a/xen/arch/arm/platforms/brcm-raspberry-pi.c b/xen/arch/arm/platforms/brcm-raspberry-pi.c -index f5ae58a..811b40b 100644 a/xen/arch/arm/platforms/brcm-raspberry-pi.c -+++ b/xen/arch/arm/platforms/brcm-raspberry-pi.c -@@ -17,6 +17,10 @@ - * GNU General Public License for more details. - */ - -+#include -+#include -+#include -+#include - #include - - static const char *const rpi4_dt_compat[] __initconst = -@@ -37,12 +41,69 @@ static const struct dt_device_match rpi4_blacklist_dev[] __initconst = - * The aux peripheral also shares a page with the aux UART. 
- */ - DT_MATCH_COMPATIBLE("brcm,bcm2835-aux"), -+/* Special device used for rebooting */ -+DT_MATCH_COMPATIBLE("brcm,bcm2835-pm"), - { /* sentinel */ }, - }; - -+ -+#define PM_PASSWORD 0x5a00 -+#define PM_RSTC 0x1c -+#define PM_WDOG 0x24 -+#define PM_RSTC_WRCFG_FULL_RESET0x0020 -+#define PM_RSTC_WRCFG_CLR 0xffcf -+ -+static void __iomem *rpi4_map_watchdog(void) -+{ -+void __iomem *base; -+struct dt_device_node *node; -+paddr_t start, len; -+int ret; -+ -+node = dt_find_compatible_node(NULL, NULL, "brcm,bcm2835-pm"); -+if ( !node ) -+return NULL; -+ -+ret = dt_device_get_address(node, 0, &start, &len); -+if ( ret ) -+{ -+printk("Cannot read watchdog register address\n"); -+return NULL; -+} -+ -+base = ioremap_nocache(start & PAGE_MASK, PAGE_SIZE); -+if ( !base ) -+{ -+printk("Unable to map watchdog register!\n"); -+return NULL; -+} -+ -+return base; -+} -+ -+static void rpi4_reset(void) -+{ -+uint32_t val; -+void __iomem *base = rpi4_map_watchdog(); -+ -+if ( !base ) -+return; -+ -+/* use a timeout of 10 ticks (~150us) */ -+writel(10 | PM_PASSWORD, base + PM_WDOG); -+val = readl(base + PM_RSTC); -+val &= PM_RSTC_WRCFG_CLR; -+val |= PM_PASSWORD | PM_RSTC_WRCFG_FULL_RESET; -+writel(val, base + PM_RSTC); -+ -+/* No sleeping, possibly ato
Bug#991967: Simply ACPI powerdown/reset issue?
This corrects typos - I referenced the wrong bug # in a few places.

On 9/25/2021 11:27 PM, Elliott Mitchell wrote:

> Since the purpose of the bug reports is to find and diagnose bugs, I did a bit of experimentation and made some observations.
>
> I checked out the Debian Xen source via git. I got the current "master" branch which is presently the candidate 4.14.3-1 version, which includes urgent fixes. The hash is: e7a17db0305c8de891b366ad3528e5a43015
>
> On top of this I cherry-picked 3 commits from Xen's main branch:
> 5a4087004d1adbbb223925f3306db0e5824a2bdc
> 0f089bbf43ecce6f27576cb548ba4341d0ec46a8
> bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b

By main branch, I presume you mean the unstable 4.16 branch of Xen. Correct?

> (these can be retrieved via Xen's gitweb at https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=<$hash> which is suitable for the `git am` command)
>
> With these I built 4.14.3-1 and then tried kernels 4.19.181-1 and 4.19.194-3 (this system is presently mostly on oldstable). The results were:
>
> Xen 4.14.3-1 with Linux 4.19.181-1: system reboots were successful
> Xen 4.14.3-1 with Linux 4.19.194-3: system reboots hung

Interesting. Looks like you are homing in on solving this bug.

I notice at the beginning of this message you quoted an older message of mine which does not take into account that I have reported a new bug https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=994899 because I did come to the conclusion, as you did, that there are in fact two bugs.

I wonder if the results of your modified Xen 4.14.3-1 with 4.19.181-1 and 4.19.194-3 on my hardware would be of help. I have, as you might recall, an older (Haswell) Intel, EFI boot system, and systemd for init/shutdown services. If I get the same result, then I would agree we are seeing a regression between those two versions of Linux. Otherwise, there may also be some tests involving EFI vs. BIOS to do. Or, based on what I have learned at #994899, possibly we also need to check systemd vs. sysv-init.

Do you want me to do the test on my hardware?

> Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with Linux 4.19.181-1. I believe this combination would have hung during reboot.

I can confirm it did hang on my hardware with this combination of Xen and Linux versions.

> As such, I believe there are in fact two distinct bugs being observed. The presence of EITHER of these is sufficient to cause hangs during powerdown or reboot.

And we already have two distinct bugs on BTS.

> First, some patch originally from Linux's main branch that breaks Xen reboots was backported somewhere between 4.19.181-1 and 4.19.194-3. This may either have been introduced before 5.10 diverged from main, or may also have been backported to 5.10. THIS is Debian bug #991967.

I agree. I believe you.

> Second, the Xen patch 3c428e9ecb1f290689080c11e0c37b793425bef1 which is valuable to ARM devices breaks reboots and powerdowns on x86. This is correctly fixed by 0f089bbf43ecce6f27576cb548ba4341d0ec46a8. Presently this has no Debian bug report.

That looks a lot like #994899. Have you ruled out the possibility that this bug is #994899 in disguise? If so, how? Or do you think #994899 is a third bug?

> The first is presently unidentified, someone enthusiastic either needs to read git logs/source code, or bisect and build to find where it got broken.

Yeah, that's a lot of work. That's how I found my solution for #994899. For that bug, since the working version was Xen 4.11 and the broken version was Xen 4.14, the cause could have been in 4.12, 4.13, or 4.14. So that required a bit of detective work studying git logs, but in the end, I just tested 4.12, and it was good, then 4.13 and it was good. I also tested the first Debian version of 4.14, which was actually experimental on Debian if I recall correctly. It did not include the RPI4 patches, and it was good too. So I knew the bug was introduced sometime after that, and I soon identified the RPI4 patches as the place where the bug (#994899) first appeared on my hardware.

> The second we seem to have a fix. The only question is how many patches to cherry pick? bc141e8ca562 is non-urgent as it is merely superficial and not needed for functionality. 5a4087004d1a is a workaround for Linux kernel breakage, but how likely are we to see that fixed in the Linux kernel packages? The fix is well-contained and needed for some highly popular ARM devices.

When you decide what to do here, I would like to check it to see if it works on my hardware, and if you don't hear anything from me, you can assume it worked fine on my hardware.

Cheers,

Chuck
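The version-by-version search described above (4.11 known good, then 4.12, 4.13 and the first 4.14 tested in turn) is a manual form of bisection. A minimal shell sketch of the idea follows. Everything in it is illustrative: `first_bad` and `reboots_cleanly` are made-up names, the version labels are stand-ins, and the predicate is hard-coded from the results reported in this message, whereas a real test would build, install, and reboot each candidate. It assumes that once a version is bad, all later ones are bad too.

```shell
#!/bin/sh
# first_bad PREDICATE CANDIDATE...
# Binary search over candidates ordered oldest to newest. PREDICATE is the
# name of a function that returns 0 for a good candidate. Prints the first
# bad candidate, or returns non-zero if every candidate is good.
first_bad() {
    pred="$1"; shift
    lo=1
    hi=$#
    result=""
    while [ "$lo" -le "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        eval "candidate=\${$mid}"     # fetch the mid-th positional argument
        if "$pred" "$candidate"; then
            lo=$((mid + 1))           # good: the breakage is later
        else
            result="$candidate"       # bad: remember it, look earlier
            hi=$((mid - 1))
        fi
    done
    [ -n "$result" ] && printf '%s\n' "$result"
}

# Stand-in predicate, hard-coded from the results quoted in this message.
reboots_cleanly() {
    case "$1" in
        4.11|4.12|4.13|4.14-experimental) return 0 ;;
        *) return 1 ;;
    esac
}

first_bad reboots_cleanly 4.11 4.12 4.13 4.14-experimental 4.14-with-rpi4-patches
```

With five candidates this needs three test boots instead of up to five, which matters when each test means a full package build and reboot.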
Bug#991967: Simply ACPI powerdown/reset issue?
On 9/25/2021 11:27 PM, Elliott Mitchell wrote:

> I checked out the Debian Xen source via git. I got the current "master" branch which is presently the candidate 4.14.3-1 version, which includes urgent fixes. The hash is: e7a17db0305c8de891b366ad3528e5a43015
>
> On top of this I cherry-picked 3 commits from Xen's main branch:
> 5a4087004d1adbbb223925f3306db0e5824a2bdc
> 0f089bbf43ecce6f27576cb548ba4341d0ec46a8
> bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b
>
> (these can be retrieved via Xen's gitweb at https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=<$hash> which is suitable for the `git am` command)
>
> With these I built 4.14.3-1 and then tried kernels 4.19.181-1 and 4.19.194-3 (this system is presently mostly on oldstable). The results were:
>
> Xen 4.14.3-1 with Linux 4.19.181-1: system reboots were successful
> Xen 4.14.3-1 with Linux 4.19.194-3: system reboots hung

I presume the Xen 4.14.3-1 you are referring to is not the official version, but the one patched with the three extra aforementioned commits.

Note: I use quilt to manage the packages, and quilt rejected the last commit because the context within three lines of the patched code was changed. A "goto bad" was changed to "goto done" by another commit on the Xen unstable branch, so I fixed the patch file and changed the 'done' to 'bad' to get the third patch to succeed.

Let's call this patched version of Xen version 4.14.3-1.1.

I tried these on my hardware, which is a Haswell processor, EFI boot, and systemd for init, and my results are:

Xen 4.14.3-1.1 with Linux 4.19.181-1: system reboots hung
Xen 4.14.3-1.1 with Linux 4.19.194-3: system reboots hung
Xen 4.14.3-1.1 with Linux 5.10.46-4: system reboots hung

I still cannot reproduce your result (successful reboots with Linux 4.19.181-1), not even with the extra three commits. Perhaps it depends on differences in the BIOS or EFI, or maybe systemd vs. sysv.

I share this result in case it is of help to you.

Regards,

Chuck Zmudzinski
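For anyone repeating the test, the gitweb URL pattern quoted above can be expanded for the three commit hashes with a few lines of shell. This is only a sketch: the function name `xen_patch_url` is made up for illustration, and the block just prints the URLs rather than downloading anything. Feeding each downloaded patch to `git am`, as the quoted message suggests, is left as a manual step.

```shell
#!/bin/sh
# Hypothetical helper: build the gitweb "patch" URL for a Xen commit hash,
# following the URL pattern quoted in the message above.
xen_patch_url() {
    printf 'https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=%s\n' "$1"
}

# The three cherry-picked commits listed in the quoted message.
for hash in \
    5a4087004d1adbbb223925f3306db0e5824a2bdc \
    0f089bbf43ecce6f27576cb548ba4341d0ec46a8 \
    bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b
do
    xen_patch_url "$hash"
done
```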
Bug#994899: xen-hypervisor-4.14-amd64 breaks system poweroff on bullseye
A patch has been uploaded (see message #67). For more information, see message #34.
Bug#994899: patch
Patch is attached. (An improved version of the patch in message #55.)

Changes from the patch in message #55:

    debian/rules: quilt pop 0026-... instead of quilt pop 14

This ensures builds succeed when the number of patches in debian/patches increases.

Patch generated by:

    debdiff xen_4.14.3-1~deb11u1.dsc xen_4.14.3-1~deb11u1.1.dsc > bug#994899.diff

What it does: Rebuilds the xen packages without the RPI4 patches on amd64 and i386

Tested on: Native amd64 build

Fixes this bug on my amd64 system

Build Instructions: Since the .pc directory is changed in the new package, we need quilt to rebuild it correctly. So the following commands should work to build the packages. Start in an empty directory.

    if [ ! -e /usr/bin/quilt ]; then sudo apt install quilt; fi
    dget -x https://snapshot.debian.org/archive/debian-security/20210920T191155Z/pool/updates/main/x/xen/xen_4.14.3-1~deb11u1.dsc
    cd xen-4.14.3
    dpkg-checkbuilddeps
    quilt pop -a
    cd ..
    patch -p0 < bug#994899.diff
    cd xen-4.14.3
    quilt push -a
    debuild -i -us -uc -b

To build the source package:

    debuild -i -us -uc -S

diff -Nru xen-4.14.3/debian/changelog xen-4.14.3/debian/changelog --- xen-4.14.3/debian/changelog 2021-09-13 10:28:21.0 -0400 +++ xen-4.14.3/debian/changelog 2021-09-27 11:51:02.0 -0400 @@ -1,3 +1,12 @@ +xen (4.14.3-1~deb11u1.1) UNRELEASED; urgency=medium + + * Non-maintainer upload. 
+ * debian/patches - move RPI4 patches into a separate directory + * debian/rules - disable RPI4 patches on amd64|i386 to fix #994899 + * debian/control - add Build Dependency quilt + + -- Chuck Zmudzinski Mon, 27 Sep 2021 11:51:04 -0400 + xen (4.14.3-1~deb11u1) bullseye-security; urgency=medium * Rebuild for bullseye-security diff -Nru xen-4.14.3/debian/control xen-4.14.3/debian/control --- xen-4.14.3/debian/control 2021-07-10 08:01:39.0 -0400 +++ xen-4.14.3/debian/control 2021-09-26 22:21:51.0 -0400 @@ -34,6 +34,7 @@ markdown, ocaml-native-compilers | ocaml-nox, ocaml-findlib, + quilt, Homepage: https://xenproject.org/ Vcs-Browser: https://salsa.debian.org/xen-team/debian-xen Vcs-Git: https://salsa.debian.org/xen-team/debian-xen.git diff -Nru xen-4.14.3/debian/patches/0027-xen-rpi4-implement-watchdog-based-reset.patch xen-4.14.3/debian/patches/0027-xen-rpi4-implement-watchdog-based-reset.patch --- xen-4.14.3/debian/patches/0027-xen-rpi4-implement-watchdog-based-reset.patch 2021-09-13 10:25:25.0 -0400 +++ xen-4.14.3/debian/patches/0027-xen-rpi4-implement-watchdog-based-reset.patch 1969-12-31 19:00:00.0 -0500 @@ -1,105 +0,0 @@ -From: Stefano Stabellini -Date: Fri, 2 Oct 2020 13:47:17 -0700 -Subject: xen/rpi4: implement watchdog-based reset - -The preferred method to reboot RPi4 is PSCI. If it is not available, -touching the watchdog is required to be able to reboot the board. - -The implementation is based on -drivers/watchdog/bcm2835_wdt.c:__bcm2835_restart in Linux v5.9-rc7. 
- -Signed-off-by: Stefano Stabellini -Acked-by: Julien Grall -Reviewed-by: Bertrand Marquis -Tested-by: Roman Shaposhnik -CC: ro...@zededa.com -(cherry picked from commit 25849c8b16f2a5b7fcd0a823e80a5f1b590291f9) - xen/arch/arm/platforms/brcm-raspberry-pi.c | 61 ++ - 1 file changed, 61 insertions(+) - -diff --git a/xen/arch/arm/platforms/brcm-raspberry-pi.c b/xen/arch/arm/platforms/brcm-raspberry-pi.c -index f5ae58a..811b40b 100644 a/xen/arch/arm/platforms/brcm-raspberry-pi.c -+++ b/xen/arch/arm/platforms/brcm-raspberry-pi.c -@@ -17,6 +17,10 @@ - * GNU General Public License for more details. - */ - -+#include -+#include -+#include -+#include - #include - - static const char *const rpi4_dt_compat[] __initconst = -@@ -37,12 +41,69 @@ static const struct dt_device_match rpi4_blacklist_dev[] __initconst = - * The aux peripheral also shares a page with the aux UART. - */ - DT_MATCH_COMPATIBLE("brcm,bcm2835-aux"), -+/* Special device used for rebooting */ -+DT_MATCH_COMPATIBLE("brcm,bcm2835-pm"), - { /* sentinel */ }, - }; - -+ -+#define PM_PASSWORD 0x5a00 -+#define PM_RSTC 0x1c -+#define PM_WDOG 0x24 -+#define PM_RSTC_WRCFG_FULL_RESET0x0020 -+#define PM_RSTC_WRCFG_CLR 0xffcf -+ -+static void __iomem *rpi4_map_watchdog(void) -+{ -+void __iomem *base; -+struct dt_device_node *node; -+paddr_t start, len; -+int ret; -+ -+node = dt_find_compatible_node(NULL, NULL, "brcm,bcm2835-pm"); -+if ( !node ) -+return NULL; -+ -+ret = dt_device_get_address(node, 0, &start, &len); -+if ( ret ) -+{ -+printk("Cannot read watchdog register address\n"); -+return NULL; -+} -+ -+base = ioremap_nocache(start & PAGE_MASK, PAGE_SIZE); -+if ( !base ) -+{ -+printk("Unable to map watchdog register!\n"); -+
Bug#994899: xen-hypervisor-4.14-amd64 breaks system poweroff on bullseye
A patch has been uploaded (message #55). For more information, see message #34.
Bug#994899: patch
Patch is attached.

Patch generated by:

    debdiff xen_4.14.3-1~deb11u1.dsc xen_4.14.3-1~deb11u1.1.dsc > bug#994899.diff

What it does: Rebuilds the xen packages without the RPI4 patches on amd64 and i386

Tested on: Native amd64 build

Fixes this bug on my amd64 system

Build Instructions: Since the .pc directory is changed in the new package, we need quilt to rebuild it correctly. So the following commands should work to build the packages. Start in an empty directory.

    if [ ! -e /usr/bin/quilt ]; then sudo apt install quilt; fi
    dget -x https://snapshot.debian.org/archive/debian-security/20210920T191155Z/pool/updates/main/x/xen/xen_4.14.3-1~deb11u1.dsc
    cd xen-4.14.3
    dpkg-checkbuilddeps
    quilt pop -a
    cd ..
    patch -p0 < bug#994899.diff
    cd xen-4.14.3
    debuild -i -us -uc -b

To build the source package:

    debuild -i -us -uc -S

diff -Nru xen-4.14.3/debian/changelog xen-4.14.3/debian/changelog --- xen-4.14.3/debian/changelog 2021-09-13 10:28:21.0 -0400 +++ xen-4.14.3/debian/changelog 2021-09-26 22:22:56.0 -0400 @@ -1,3 +1,12 @@ +xen (4.14.3-1~deb11u1.1) UNRELEASED; urgency=medium + + * Non-maintainer upload. 
+ * debian/patches - move RPI4 patches into a separate directory + * debian/rules - disable RPI4 patches on amd64|i386 to fix #994899 + * debian/control - add Build Dependency quilt + + -- Chuck Zmudzinski Sun, 26 Sep 2021 22:23:21 -0400 + xen (4.14.3-1~deb11u1) bullseye-security; urgency=medium * Rebuild for bullseye-security diff -Nru xen-4.14.3/debian/control xen-4.14.3/debian/control --- xen-4.14.3/debian/control 2021-07-10 08:01:39.0 -0400 +++ xen-4.14.3/debian/control 2021-09-26 22:21:51.0 -0400 @@ -34,6 +34,7 @@ markdown, ocaml-native-compilers | ocaml-nox, ocaml-findlib, + quilt, Homepage: https://xenproject.org/ Vcs-Browser: https://salsa.debian.org/xen-team/debian-xen Vcs-Git: https://salsa.debian.org/xen-team/debian-xen.git diff -Nru xen-4.14.3/debian/patches/0027-xen-rpi4-implement-watchdog-based-reset.patch xen-4.14.3/debian/patches/0027-xen-rpi4-implement-watchdog-based-reset.patch --- xen-4.14.3/debian/patches/0027-xen-rpi4-implement-watchdog-based-reset.patch 2021-09-13 10:25:25.0 -0400 +++ xen-4.14.3/debian/patches/0027-xen-rpi4-implement-watchdog-based-reset.patch 1969-12-31 19:00:00.0 -0500 @@ -1,105 +0,0 @@ -From: Stefano Stabellini -Date: Fri, 2 Oct 2020 13:47:17 -0700 -Subject: xen/rpi4: implement watchdog-based reset - -The preferred method to reboot RPi4 is PSCI. If it is not available, -touching the watchdog is required to be able to reboot the board. - -The implementation is based on -drivers/watchdog/bcm2835_wdt.c:__bcm2835_restart in Linux v5.9-rc7. 
- -Signed-off-by: Stefano Stabellini -Acked-by: Julien Grall -Reviewed-by: Bertrand Marquis -Tested-by: Roman Shaposhnik -CC: ro...@zededa.com -(cherry picked from commit 25849c8b16f2a5b7fcd0a823e80a5f1b590291f9) - xen/arch/arm/platforms/brcm-raspberry-pi.c | 61 ++ - 1 file changed, 61 insertions(+) - -diff --git a/xen/arch/arm/platforms/brcm-raspberry-pi.c b/xen/arch/arm/platforms/brcm-raspberry-pi.c -index f5ae58a..811b40b 100644 a/xen/arch/arm/platforms/brcm-raspberry-pi.c -+++ b/xen/arch/arm/platforms/brcm-raspberry-pi.c -@@ -17,6 +17,10 @@ - * GNU General Public License for more details. - */ - -+#include -+#include -+#include -+#include - #include - - static const char *const rpi4_dt_compat[] __initconst = -@@ -37,12 +41,69 @@ static const struct dt_device_match rpi4_blacklist_dev[] __initconst = - * The aux peripheral also shares a page with the aux UART. - */ - DT_MATCH_COMPATIBLE("brcm,bcm2835-aux"), -+/* Special device used for rebooting */ -+DT_MATCH_COMPATIBLE("brcm,bcm2835-pm"), - { /* sentinel */ }, - }; - -+ -+#define PM_PASSWORD 0x5a00 -+#define PM_RSTC 0x1c -+#define PM_WDOG 0x24 -+#define PM_RSTC_WRCFG_FULL_RESET0x0020 -+#define PM_RSTC_WRCFG_CLR 0xffcf -+ -+static void __iomem *rpi4_map_watchdog(void) -+{ -+void __iomem *base; -+struct dt_device_node *node; -+paddr_t start, len; -+int ret; -+ -+node = dt_find_compatible_node(NULL, NULL, "brcm,bcm2835-pm"); -+if ( !node ) -+return NULL; -+ -+ret = dt_device_get_address(node, 0, &start, &len); -+if ( ret ) -+{ -+printk("Cannot read watchdog register address\n"); -+return NULL; -+} -+ -+base = ioremap_nocache(start & PAGE_MASK, PAGE_SIZE); -+if ( !base ) -+{ -+printk("Unable to map watchdog register!\n"); -+return NULL; -+} -+ -+return base; -+} -+ -+static void rpi4_reset(void) -+{ -+uint32_t val; -+void __iomem *base = rpi4_map_watchdog(); -+ -+if ( !base ) -+return; -+ -+/* use a timeout of 10 ticks (~150us)
Bug#994899: xen-hypervisor-4.14-amd64 breaks system poweroff on bullseye
Added tag upstream. Explanation is in discussion at related bug #991967 here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=991967#169 and here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=991967#174 Briefly, since we are currently shipping a fork of Xen-4.14 on our unstable, testing, and stable versions of the hypervisor to better support arm devices but there is an annoying bug also in x86 (amd64) in these versions, IMO we should 1) Notify upstream of the fork we are doing and 2) Notify our users, especially on the stable branch, that our version of Xen is actually a fork of Xen-4.14. I know this can be discovered by reading the changelog, but to find it one must go back to the unstable version that was released back in December of 2020 to find where the fork started. Not many people (including me) would look there to try to find such a significant change to the package, especially on stable where ordinarily only vanilla security patches from upstream are in the changelog. So, as a courtesy to our users, I think the visibility of this change needs to be elevated to the status of at least a README.Debian file, if not an actual notification to the user by dpkg when installing. Of course the changelog should also note explicitly that this is a fork of Xen 4.14 in all the released versions that have patches from Xen upstream 4.16. Perhaps there is a way to also indicate this in the version name and number of the packages, but I do not know if there are conventions or policies to handle a version change that is really the start of a fork. If so, we should follow them.
Bug#991967: Simply ACPI powerdown/reset issue?
On 9/26/2021 8:46 AM, Chuck Zmudzinski wrote: On 9/25/2021 11:27 PM, Elliott Mitchell wrote: Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with Linux 4.19.181-1. I believe this combination would have hung during reboot. In light of what I discovered while investigating the cause of bug #994899, I would tend to think calling Debian 4.14.2+25-gb6a8c4f72d-2 "vanilla" an interesting choice of words. To me, vanilla connotes boring, uninteresting. But that version of Debian Xen, and also the current version in the stable distribution, bullseye, are not boring or uninteresting as I have studied these versions and concluded they actually are now a fork of upstream Xen's 4.14 version, since they contain patches from upstream Xen's 4.16 unstable branch to better support the Raspberry Pi 4, as noted in the changelogs of those versions. So I am adding the tag upstream, Actually, I will add the upstream tag to the bug I reported in Xen, #994899, since we are talking about upstream Xen, not upstream Linux.
Bug#991967: Simply ACPI powerdown/reset issue?
On 9/25/2021 11:27 PM, Elliott Mitchell wrote: Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with Linux 4.19.181-1. I believe this combination would have hung during reboot. In light of what I discovered while investigating the cause of bug #994899, I would tend to think calling Debian 4.14.2+25-gb6a8c4f72d-2 "vanilla" an interesting choice of words. To me, vanilla connotes boring, uninteresting. But that version of Debian Xen, and also the current version in the stable distribution, bullseye, are not boring or uninteresting as I have studied these versions and concluded they actually are now a fork of upstream Xen's 4.14 version, since they contain patches from upstream Xen's 4.16 unstable branch to better support the Raspberry Pi 4, as noted in the changelogs of those versions. So I am adding the tag upstream, and I suggest that the Debian Xen Team notify upstream Xen that we are planning a fork of Xen to better support popular arm devices and we are already shipping a testing version of it in our current bullseye release. We could tell upstream we are willing to stop this fork if they could assist us with backporting the reworking of the xen/arm/acpi and xen/x86/acpi code that is in upstream Xen 4.16 unstable to xen 4.14. We can tell them if they are interested in what we are doing, they can take a look at the work we are doing on our public development servers (salsa). For our own users, especially in the stable version, we should make a note of this fact in a README.Debian file and place it in an appropriate place of the binary packages. We should also note that there are encouraging results with this version for improved support on arm, but some tests indicate an annoying bug causing problems shutting down Domain 0 appear to have surfaced on x86 (amd64). For details, see bugs #991967 and #994899 on the Debian Bug Tracking System. 
I think this is the BEST way to truly proceed in accordance with the Debian Social Policy of courtesy and cooperation with the free software projects that are available to the public in our main repositories, and to properly inform our users what we are doing in our current Xen packages for unstable, testing, and stable.
Bug#991967: Simply ACPI powerdown/reset issue?
On 9/25/2021 11:27 PM, Elliott Mitchell wrote:
> The second we seem to have a fix. The only question is how many patches to cherry pick? bc141e8ca562 is non-urgent as it is merely superficial and not needed for functionality. 5a4087004d1a is a workaround for Linux kernel breakage, but how likely are we to see that fixed in the Linux kernel packages? The fix is well-contained and needed for some highly popular ARM devices.

I suspect that depends on how highly motivated Debian is to support those highly popular ARM devices not just with Linux, but with Linux as a Xen Dom0 on those devices. Even if they are highly popular devices, what matters, ultimately, I think, is if there is a reason for them to be popular as devices that run a Xen dom0. Then maybe there is a chance to get some patches into the Linux kernel for this purpose.

Just my two cents, FWIW.
Bug#991967: Simply ACPI powerdown/reset issue?
On 9/25/2021 11:27 PM, Elliott Mitchell wrote: On Tue, Sep 21, 2021 at 06:33:20AM -0400, Chuck Zmudzinski wrote: I presume you are suggesting I try booting 4.19.181-1 on the current version of Xen-4.14 for bullseye as a dom0. I am not inclined to try it until an official Debian developer endorses your opinion that the bug I am seeing is distinct from #991967, at which point I will report the bug I am seeing as a new bug. Chuck Zmudzinski you are getting rather close to my threshold for calling harrassment. You're not /quite/ there, but I'm concerned. Sorry if I offended you in some way, I didn't mean to. Since the purpose of the bug reports is to find and diagnose bugs, I did a bit of experimentation and made some observations. I checked out the Debian Xen source via git. I got the current "master" branch which is presently the candidate 4.14.3-1 version, which includes urgent fixes. The hash is: e7a17db0305c8de891b366ad3528e5a43015 On top of this I cherry-picked 3 commits from Xen's main branch: 5a4087004d1adbbb223925f3306db0e5824a2bdc 0f089bbf43ecce6f27576cb548ba4341d0ec46a8 bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b By main branch, I presume you mean the unstable 4.16 branch of Xen. Correct? (these can be retrieved via Xen's gitweb at https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=<$hash> which is suitable for the `git am` command) With these I built 4.14.3-1 and then tried kernels 4.19.181-1 and 4.19.194-3 (this system is presently mostly on oldstable). The results were: Xen 4.14.3-1 with Linux 4.19.181-1: system reboots were successful Xen 4.14.3-1 with Linux 4.19.194-3: system reboots hung Interesting. Looks like you are honing in on solving this bug. I notice at the beginning of this message you quoted an older message of mine which does not take into account that I have reported a new bug https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=994899 because I did come to the conclusion, as you did, that there are in fact two bugs. 
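For anyone wanting to repeat the cherry-pick experiment described above, here is a minimal sketch that only builds the gitweb patch URLs for the three commits named in the message. The URL pattern and the hashes are taken from the message itself; the function and constant names are hypothetical, and nothing here downloads or applies anything.

```python
# URL pattern quoted in the message (the "<$hash>" placeholder becomes {commit}).
GITWEB_PATCH_URL = "https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h={commit}"

# The three commits cherry-picked onto the Debian 4.14.3-1 candidate,
# in the order they are listed in the message.
COMMITS = [
    "5a4087004d1adbbb223925f3306db0e5824a2bdc",
    "0f089bbf43ecce6f27576cb548ba4341d0ec46a8",
    "bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b",
]


def patch_urls(commits):
    """Return one gitweb patch URL per commit hash (hypothetical helper)."""
    return [GITWEB_PATCH_URL.format(commit=c) for c in commits]


for url in patch_urls(COMMITS):
    print(url)
```

Each printed URL should serve a mailbox-format patch that can be saved to a file and passed to `git am`, as the message suggests. Whether the patches apply cleanly to a given branch is a separate question.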
I wonder if the results of your modified Xen 4.14.3-1 with 4.19.181-1 and 4.19.194-3 on my hardware would be of help. I have, as you might recall, an older (Haswell) Intel system with EFI boot, and systemd for init/shutdown services. If I get the same result, then I would agree we are seeing a regression between those two versions of Linux. Otherwise, there may also be some tests involving EFI vs. BIOS to do. Or, based on what I have learned at #994899, possibly we also need to check systemd vs. sysv-init. Do you want me to do the test on my hardware?

> Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with Linux 4.19.181-1. I believe this combination would have hung during reboot.

I can confirm it did hang on my hardware with this combination of Xen and Linux versions.

> As such, I believe there are in fact two distinct bugs being observed. The presence of EITHER of these is sufficient to cause hangs during powerdown or reboot.

And we already have two distinct bugs on BTS.

> First, some patch originally from Linux's main branch that breaks Xen reboots was backported somewhere between 4.19.181-1 and 4.19.194-3. This may either have been introduced before 5.10 diverged from main, or may also have been backported to 5.10. THIS is Debian bug #991967.

I agree. I believe you.

> Second, the Xen patch 3c428e9ecb1f290689080c11e0c37b793425bef1 which is valuable to ARM devices breaks reboots and powerdowns on x86. This is correctly fixed by 0f089bbf43ecce6f27576cb548ba4341d0ec46a8. Presently this has no Debian bug report.

That looks a lot like #994899. Have you ruled out the possibility that this bug is #994899 in disguise? If so, how? Or do you think #994899 is a third bug?

> The first is presently unidentified; someone enthusiastic either needs to read git logs/source code, or bisect and build to find where it got broken.

Yeah, that's a lot of work. That's how I found my solution for #994899.
For that bug, since the working version was Xen 4.11 and the broken version was Xen 4.14, the cause could have been in 4.12, 4.13, or 4.14. So that required a bit of detective work studying git logs, but in the end, I just tested 4.12, and it was good, then 4.13, and it was good. I also tested the first Debian version of 4.14, which was actually in experimental on Debian if I recall correctly. It did not include the RPI4 patches, and it was good too. So I knew the bug was introduced sometime after that, and I soon identified the RPI4 patches as the place where the bug (#994899) first appeared on my hardware.

> The second we seem to have a fix. The only question is how many patches to cherry pick? bc141e8ca562 is non-urgent as it is merely superficial and not needed for functionality. 5a4087004d1a is a workaround for Linux kernel breakage, but how likely are we to see that fixed in the Linux kernel packages? The fix is well-
Bug#994899: xen-hypervisor-4.14-amd64 breaks system poweroff on bullseye
Lowered severity to minor because the information so far indicates the bug may only affect a limited set of hardware/software combinations. It does affect my system, but I have found a solution for it in accord with the Debian principles of free software. I understand the free software development world cannot stop everything it is doing to work on this little bug. Nevertheless, if Debian wishes to try to implement a true and full solution that fixes both this bug and #991967, I am willing to cooperate with anyone who will not accuse me of wrongdoing in this public forum without first discussing the matter with me in a private email. Cheers, Chuck Zmudzinski
Bug#994899: [Pkg-xen-devel] Bug#994899: xen-hypervisor-4.14-amd64 breaks system poweroff on bullseye
Based on the technical information so far provided by the Debian community in this report and in the related bug #991967, I consider this bug closed. For me it is fixed. I found the solution for it on my hardware and shared it with the Debian community. I do not care if the official Debian developers implement my suggestion or not. I will not run Debian's version, which is really a mix of Xen-4.14 stable and Xen-4.16 unstable. Instead I will run the version with the patch I suggested in this bug report, and AFAICT I will have a more stable and bug-free version of Xen than anyone who runs the current so-called stable version of Xen for Debian.

Since the information about this bug is scattered in various places in this bug report and in #991967, I will say this bug concerned the following hardware/software configuration:

Motherboard/CPU: ASRock B85M Pro4, BIOS P2.50 12/11/2015, with a Haswell CPU (core i5-4590S)
Boot system: EFI, not using secure boot, booting xen hypervisor and dom0 bullseye with grub-efi package for bullseye, and it boots the xen-4.14-amd64.gz file, not the xen-4.14-amd64.efi file.
Init system: systemd
Xen domain type: Domain 0
Linux Kernel Versions: all Linux kernel versions of both buster and bullseye that I tested running as a dom0 exhibit the bug, but no Debian stable Linux kernel version since 4.19.0-16 running as a dom0 exhibited the bug with the Xen hypervisor for buster.

If you are experiencing the symptoms described in this bug report, the solution I proposed for Debian in both this bug report and in #991967 might fix the bug if your hardware and software configuration is similar to what I have described above. However, to fix it you will have to test it yourself, and that would involve building the Xen package for bullseye from source, unless and until Debian decides to implement the fix I proposed in my original bug report.
If you use BIOS boot and/or sysv-init instead of EFI boot and/or systemd, it is likely the fix I have described here will not fix the bug. Also, if you have a newer Intel CPU or an AMD CPU, the fix might not work on your hardware, but it might work if you use EFI and systemd in those latter cases. If you try the solution I proposed here and it does not solve your issue, I would suggest that you look at #991967 before reporting a new bug. I will not take action to close this bug; that is up to the Debian developers to decide. Instead, I will select this message to be a summary of the bug. Happy computing on Debian, Chuck Zmudzinski
Bug#994899: [Pkg-xen-devel] Bug#994899: xen-hypervisor-4.14-amd64 breaks system poweroff on bullseye
On 9/23/2021 5:50 PM, Diederik de Haas wrote:
> On Thursday 23 September 2021 21:54:49 CEST Chuck Zmudzinski wrote:
> > While I did respond point by point privately to the author
>
> Don't do that. Any discussion relevant to the bug should be sent to the bug itself so that everyone has all the relevant information. I actually learned myself how to build Xen packages so I could assist you as good as possible. You won't see any more effort or participation on my part. Bye.

Sorry to hear that.

Cheers,
Chuck Zmudzinski
Bug#994899: xen-hypervisor-4.14-amd64 breaks system poweroff on bullseye
On 9/23/2021 12:49 PM, Diederik de Haas wrote: Control: tag -1 -newcomer Control: tag -1 -upstream On woensdag 22 september 2021 21:50:16 CEST Chuck Zmudzinski wrote: Finally, I tag the bug newcomer simply because there is a known solution but That's what the 'patch' tag is for. 'newcomer' is similar to 'good first issue', which this is not. Hence removing the 'newcomer' tag. the Debian Xen package maintainer seems to want the Debian Kernel Team to find a way to fix the bug in the Linux kernel, as evidenced by the recent discussion over at #991976, instead of implementing the fix in the Xen hypervisor as proposed here. You're claiming, possibly correctly, that the issue is with the Debian Xen package, not with the upstream code, so removing the 'upstream' tag as well. It's good that you filed this bug against the Debian Xen package because it's (quite) possible that there is both an issue with the Linux kernel which #991976 is about and with the Xen package, what this issue will be about. They way you went about it ... not so good. By filing a bug you want others to spend their free time to (help) fix an issue you are having (and in this case, me too). To make the best use of their time and your chances of it being fixed, you should state the problem as short and succinct as possible. And in the case of a 'patch' as may be the case here, the actual patch. You did neither. You did go on a rant where you made (incorrect) claims and accusations. I don't think that helps your goal, which is getting this issue fixed. Do you? F.e. you make claims on the Debian Xen package maintainers' position, while this is the first time they've been made (explicitly) aware of this issue. So they did not have a chance to (formulate and) state their position. I had written a point-by-point description of what *I* think was wrong with your bug report, but that would only keep the negative cycle in place. 
FTR: I'm just a contributor to Debian (by participating in this bug), just like you are (by submitting a bug). And so is Elliot. For uploading packages to the Debian archive you *do* need special permissions. For almost all other things, everyone can contribute. Package: xen-hypervisor-4.14-amd64 Version: 4.14.3-1~deb11u1 Severity: important Since I am not a developer, I only tagged this bug important, but if I were a developer, I would tag it serious and implement a fix that does what I will propose below. https://www.debian.org/Bugs/Developer#severities explains what the severity levels entail. There is no correlation between severity and some (claimed) role within the project. IMO this bug is *at most* important. Let's leave it to the Debian Xen package maintainers to change the severity if they think that's appropriate. I refer you here for my first description of the problem to the Debian Bug System: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=991967#34 IMO, this is just wrong. You've filed a new bug so make the exact problem the primary part of this bug. Don't ask of others to read a '50 page document' and expect them to distill YOUR problem themselves. Doing a copy+paste of the *relevant* part is absolutely fine. So please reply to this with the following (minimal) info: - Your hardware - Whether you use BIOS or UEFI - Your init system - What you did and what the result of that was. Item 2 & 3 may seem 'odd' at first, but should become clear later on. another Debian user confirmed the bug on bullseye: Yep. If I do 'poweroff' on my Xen server, it looks like it does the whole shutdown procedure correctly, but it doesn't actually poweroff the machine. I can drill down the cause of this bug in the stable version of debian to a series of nine commits from upstream in order to improve Raspberry Pi 4 support in version 4.14.0+88-g1d1d1f5391-1. Do those 9 commits correspond to 9 patches from the /debian/patches/ directory? If so, which 9? 
If you add the 'patch' tag, do indeed include the patch in the bug report. I built Xen packages based on 4.14.3-1~deb11u1 but remove patches 0029-0034, but after installing those packages and rebooting into my patched version, my Xen server still did NOT power off. Other patches didn't seem relevant *to me*, but I can be wrong. If you share your changes, I can try whether that will fix the problem with me (too). My Xen server uses BIOS (not UEFI, which I think you do) and has sysv-init as init system. That may be relevant as well. So the bug was introduced in the Debian Xen unstable/testing package on 15 Dec 2020 according to the changelog. You *think* that was the case? Or did your Bullseye system actually poweroff correctly when installing version 4.14.0+80-gd101b417b7-1 or earlier? That was the version before the RPi4 related patches were added. Version N-1=good, version N=b
Bug#994899: xen-hypervisor-4.14-amd64 breaks system poweroff on bullseye
Package: xen-hypervisor-4.14-amd64 Version: 4.14.3-1~deb11u1 Severity: important Tags: patch newcomer upstream Dear Maintainer, This bug is related to #991976, reported by Elliott Mitchell, who happens to be the person who requested the patches that are causing this bug. I understand he is a Debian Xen developer. Since I am not a developer, I only tagged this bug important, but if I were a developer, I would tag it serious and implement a fix that does what I will propose below. I hereby humbly request that you elevate this bug to serious, since it is entirely wrong to release software that causes a modern workstation/server to not power down properly and renders it unable to be managed remotely, which is what this bug does. A bug like this is normal on an unstable or testing distribution, but unacceptable/serious on the current stable release. I refer you here for my first description of the problem to the Debian Bug System: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=991967#34 I also point out that another Debian user confirmed the bug on bullseye: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=991967#54 Elliott is of the opinion that what I am seeing is a bug distinct from #991976. I am inclined to agree, as I have not been able to reproduce it the way he describes it, although the symptoms he sees are very similar. That is why I am submitting a new report. I can drill down the cause of this bug in the stable version of debian to the Debian Xen Team's decision to include a series of nine commits from upstream in order to improve Raspberry Pi 4 support in version 4.14.0+88-g1d1d1f5391-1. So the bug was introduced in the Debian Xen unstable/testing package on 15 Dec 2020 according to the changelog. I understand the original reporter of #991976 wants to keep these patches in the stable version of Xen to better support the Raspberry Pi 4, and that he is a Debian Xen developer. 
But I will strenuously and respectfully disagree with any decision by the Debian Xen Team to not apply a very reasonable compromise solution. Over on #991967, I argued passionately for removing the nine Raspberry Pi 4 patches from the stable Xen version because it was, and still is, my opinion that experimenting with patches from unstable upstream branches is not appropriate for a package in a stable version. That is why I would expect the release team to tag this bug as serious even if the Debian Xen Team refuses to tag it as serious.

Nevertheless, I propose the following compromise: simply ship a package for the stable version that omits the nine Raspberry Pi 4 patches from unstable upstream when building for the amd64|i386 architectures. Even though I am not an official developer, I was able to implement such a fix in just a few hours. It is really a trivial fix: all I did was add a rule in debian/rules to use quilt to disable the nine patches on amd64|i386. I made it easy by moving the nine RP4 patches from debian/patches to debian/patches/rpi4, so I could use sed s/rpi4/#rpi4/ debian/series to disable the patches for the amd64|i386 case. I am sure there are other ways to implement the fix, and it really is trivial. It would fix this bug and still allow the Raspberry Pi 4 patches to be included where they are needed, which I believe is in the arm64 architecture.

I also tested the current unstable branch from Xen upstream, which is xen-c76cfad, which unstable calls Xen-4.16-unstable. I tested the current bullseye kernel as a dom0 on that upstream Xen-4.16 hypervisor and did not see the bug, so this most definitely is NOT an upstream bug. It is a Debian Xen packaging bug. I expect that perhaps some commits on the Xen-4.16 upstream branch that are missing on the Xen-4.14 branch might also fix this bug, but until such a solution is found, I suggest the aforementioned solution as a workaround.
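To make the proposed workaround concrete, here is a minimal sketch of what the sed substitution does to a quilt series file: every entry that lives under the rpi4/ subdirectory gets a leading "#", which quilt treats as a comment, so the patch is skipped. This is an illustration only, not the rule actually used in debian/rules. The function name and the patch file names in the example are hypothetical.

```python
def disable_patches(series_text, subdir="rpi4/"):
    """Comment out every series entry whose path starts with `subdir`.

    Lines beginning with '#' are comments in a quilt series file, so a
    commented-out entry is simply not applied.
    """
    out = []
    for line in series_text.splitlines():
        if line.startswith(subdir):
            out.append("#" + line)   # disable this patch
        else:
            out.append(line)         # keep everything else untouched
    return "\n".join(out) + "\n"


# Hypothetical series content; the real patch names are not given in the report.
example_series = (
    "0001-some-debian-change.patch\n"
    "rpi4/0029-hypothetical-rpi4-change.patch\n"
    "rpi4/0030-another-hypothetical-rpi4-change.patch\n"
)

print(disable_patches(example_series), end="")
```

In a real package build this would only run for the amd64 and i386 architectures, leaving the series file unchanged for arm64.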
The reason I tagged this bug as upstream is that I think it would be advisable to make upstream aware that our current xen-4.14 package is not really a true Xen-4.14 but one with some patches from Xen-unstable that are causing this bug, and perhaps they can help eventually find the best solution for their Xen-4.14 stable branch.

Finally, I tag the bug newcomer simply because there is a known solution but the Debian Xen package maintainer seems to want the Debian Kernel Team to find a way to fix the bug in the Linux kernel, as evidenced by the recent discussion over at #991967, instead of implementing the fix in the Xen hypervisor as proposed here.

Regards,
Chuck Zmudzinski

-- System Information: Debian Release: 11.0 APT prefers stable-updates APT policy: (500,
Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel
On 8/24/2021 7:12 PM, Ben Hutchings wrote: On Tue, Aug 24, 2021 at 03:27:19PM -0400, Phillip Susi wrote: Ben Hutchings writes: I think a proper fix would be one of: a. If the Xen virtual keyboard driver is advertising capabilities it doesn't have, stop it doing that. b. Change the implementation of modalias attributes to allow longer values. It's not clear to me whether the Xen driver is advertising correctly or not. If it is, then the solution should be b, but that may be too disruptive a change to the kernel. So a reasonable workaround might be: c. Change the input subsystem to limit the length of the capabilities part of the modalias. The problem with a) is that the Xen keyboard is not a physical keyboard and so it has no way of knowing what keys it actually has. It is a fake input device designed to pass through whatever input the Xen hypervisor sends down. As such, any key could come in. If it doesn't advertise that it has all of these keys, then they would not be accepted by libinput when the hypervisor sends them down. Right, that's what I feared. xen-kbdfront is setting the bits for keys in the ranges [KEY_ESC, KEY_UNKNOWN) and [KEY_OK, KEY_MAX), which I think works out to 654 keys and 2362 bytes in the modalias. This seems to be the heart of the problem: libinput was designed assuming that all keyboards can and must report what keys are actually present, and then libinput tries to cram that information into the modalias rather than some other sysfs attribute as it should ( or not at all... I still don't see how this information is actually supposed to be useful to userspace ). I think modaliases aren't intended to be interpreted by user-space, other than processing wildcards when matching to modules. For input devices, the same information is available through other variables in the uevent, in a more compact form. The information *is* useful for user-space; e.g.
in initramfs-tools we recognise keyboard devices and add their drivers to the initramfs but ignore other input devices. As for b), the problem isn't with the modalias attribute itself, but when the kernel tries to copy it into the environment block for the udev callout. The environment block is only a single page, and so limited to 4 KB. And that's for everything else that goes into the environment, not just the modalias. Text-based sysfs attributes are limited to a page, but udev receives uevents through netlink, not sysfs. The current limit on the environment of a uevent appears to be 2 KB (UEVENT_BUFFER_SIZE defined in <linux/kobject.h>). That seems like it *might* be easier to change, so long as user-space doesn't have a similar limit. I looked into systemd/udev, and it seems to use an 8 KB buffer for receiving uevents: https://sources.debian.org/src/systemd/247.9-1/src/libsystemd/sd-device/device-monitor.c/?hl=390#L390 But as a first step I think increasing the kernel buffer size to 4 KB would be enough. Perhaps someone could test whether this patch to the domU kernel makes udev happier:

--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -30,7 +30,7 @@
 #define UEVENT_HELPER_PATH_LEN 256
 #define UEVENT_NUM_ENVP 64 /* number of env pointers */
-#define UEVENT_BUFFER_SIZE 2048 /* buffer for the variables */
+#define UEVENT_BUFFER_SIZE 4096 /* buffer for the variables */
 #ifdef CONFIG_UEVENT_HELPER
 /* path to the userspace helper executed on an event */
--- END ---

?

Ben.

Even though this patch has been tested to apparently fix this bug and the bug has been elevated to important and tagged patch and upstream, AFAICT there is no action yet upstream or anywhere else after more than three weeks. Is this patch dead as a possible fix for this bug?

Best wishes,
Chuck
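The figures quoted in this thread (654 keys, 2362 bytes) can be rechecked with a short calculation. This is a sketch under two assumptions that are not stated in the thread: the key-code values below are the ones I recall from the kernel's input-event-codes.h, and each advertised key is assumed to appear in the modalias as its code in uppercase hex followed by a comma. The function names are hypothetical.

```python
# Key-code constants (assumed values, recalled from input-event-codes.h).
KEY_ESC = 1
KEY_UNKNOWN = 240
KEY_OK = 0x160
KEY_MAX = 0x2FF

# Buffer sizes discussed in the thread.
UEVENT_BUFFER_SIZE_OLD = 2048
UEVENT_BUFFER_SIZE_NEW = 4096


def advertised_keys():
    """Key codes in the half-open ranges [KEY_ESC, KEY_UNKNOWN) and [KEY_OK, KEY_MAX)."""
    return list(range(KEY_ESC, KEY_UNKNOWN)) + list(range(KEY_OK, KEY_MAX))


def modalias_key_bytes(codes):
    """Bytes needed if each code is written as uppercase hex plus a comma (assumption)."""
    return sum(len(f"{code:X},") for code in codes)


keys = advertised_keys()
size = modalias_key_bytes(keys)
print(len(keys), "keys,", size, "bytes for the key list")
print("fits in 2048-byte buffer:", size <= UEVENT_BUFFER_SIZE_OLD)
print("fits in 4096-byte buffer:", size <= UEVENT_BUFFER_SIZE_NEW)
```

Under these assumptions the key list alone exceeds the 2 KB uevent buffer but fits within 4 KB, which is consistent with the reasoning behind the proposed one-line change. The rest of the uevent environment needs room as well, so this is a lower bound, not a full accounting.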
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/21/2021 9:13 AM, Chuck Zmudzinski wrote: On 9/20/2021 10:37 PM, Elliott Mitchell wrote: On Mon, Sep 20, 2021 at 10:23:39PM -0400, Chuck Zmudzinski wrote: On 9/20/21 7:39 PM, Diederik de Haas wrote: On dinsdag 21 september 2021 01:15:15 CEST Elliott Mitchell wrote: Merely having the path is a sufficiently strong indicator for me to simply wave it past. I though would suggest Debian should instead cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8. This is available as a patch at: https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8 You probably then also want the following commit, which is a fix on that patch: https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b Found that via the following url/query: https://xenbits.xen.org/gitweb/?p=xen.git&a=search&h=HEAD&st=commit&s=x86%2FACPI I don't know whether others should be used from that as well. I tried these two commits (adapted for the xen-4.14 branch) but this approach did not fix the bug - with these patches applied the dom0 did not power down. My advice for the Debian Xen Team is to consult with upstream and get their advice on whether or not it is advisable for Debian to retain the patches from the Xen-4.16 branch that have been added to the Debian 4.14 package in an attempt to support some arm devices that panic during on an unpatched Xen-4.14. If upstream cannot help Debian backport fixes for arm panics from Xen-4.16/unstable to Xen-4.14 stable, I think the Debian Xen team should remove aggressive patches that really have now turned the Debian Xen-4.14 package into a Frankenstein version that is a mixture of Xen-4.14 and Xen-4.16, and decide that support for those arm devices must wait until Debian gets Xen 4.16 up and running on the unstable and hopefully soon, testing distribution. It is still not established you're running into #991967. 
Unless the one you're pointing towards was backported to the Xen 4.11 packages (which I doubt) it cannot explain #991967, since at the time 4.11 was in use. Could be this is a second bug with symptoms similar to #991967. Now that a fix for the second bug has been identified, you might try a 4.19.181-1 kernel and see whether that fixes things. FWIW, I tried this. Sorry, not only does this not fix things, when I shut down the dom0 running with the official Debian 4.19.181-1 kernel on the current official Debian Xen-4.14 hypervisor, the dom0 not only did not power off, it did not even reach the systemd poweroff target. Slight correction - after a few minutes, it did finally reach the systemd poweroff target, but the power did not turn off. Yet, it works perfectly on the official Debian Xen-4.11 hypervisor. Again, my tests cannot confirm that there is a bug in src:linux; the only common denominator for this bug in all my testing is src:xen, and it appears in all the 4.14 Xen versions for bullseye, for every single Linux version tested. Chuck
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/20/2021 10:37 PM, Elliott Mitchell wrote: On Mon, Sep 20, 2021 at 10:23:39PM -0400, Chuck Zmudzinski wrote: On 9/20/21 7:39 PM, Diederik de Haas wrote: On dinsdag 21 september 2021 01:15:15 CEST Elliott Mitchell wrote: Merely having the path is a sufficiently strong indicator for me to simply wave it past. I though would suggest Debian should instead cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8. This is available as a patch at: https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8 You probably then also want the following commit, which is a fix on that patch: https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b Found that via the following url/query: https://xenbits.xen.org/gitweb/?p=xen.git&a=search&h=HEAD&st=commit&s=x86%2FACPI I don't know whether others should be used from that as well. I tried these two commits (adapted for the xen-4.14 branch) but this approach did not fix the bug - with these patches applied the dom0 did not power down. My advice for the Debian Xen Team is to consult with upstream and get their advice on whether or not it is advisable for Debian to retain the patches from the Xen-4.16 branch that have been added to the Debian 4.14 package in an attempt to support some arm devices that panic during on an unpatched Xen-4.14. If upstream cannot help Debian backport fixes for arm panics from Xen-4.16/unstable to Xen-4.14 stable, I think the Debian Xen team should remove aggressive patches that really have now turned the Debian Xen-4.14 package into a Frankenstein version that is a mixture of Xen-4.14 and Xen-4.16, and decide that support for those arm devices must wait until Debian gets Xen 4.16 up and running on the unstable and hopefully soon, testing distribution. It is still not established you're running into #991967. 
Unless the one you're pointing towards was backported to the Xen 4.11 packages (which I doubt) it cannot explain #991967, since at the time 4.11 was in use. Could be this is a second bug with symptoms similar to #991967. Now that a fix for the second bug has been identified, you might try a 4.19.181-1 kernel and see whether that fixes things. FWIW, I tried this. Sorry, not only does this not fix things, when I shut down the dom0 running with the official Debian 4.19.181-1 kernel on the current official Debian Xen-4.14 hypervisor, the dom0 not only did not power off, it did not even reach the systemd poweroff target. Yet, it works perfectly on the official Debian Xen-4.11 hypervisor. Again, my tests cannot confirm that there is a bug in src:linux; the only common denominator for this bug in all my testing is src:xen, and it appears in all the 4.14 Xen versions for bullseye, for every single Linux version tested. Chuck
Bug#991967: linux-src 4.19.194-3 breaks Xen Dom0 powerdown and reboot
On 9/21/21 7:22 AM, Fr. Chuck Zmudzinski, C.P.M. wrote:
> On Sat, 7 Aug 2021 08:40:14 +0200 Salvatore Bonaccorso wrote:
> > Control: tags -1 + moreinfo
> >
> > Hi,
> >
> > On Fri, Aug 06, 2021 at 11:50:54AM -0700, Elliott Mitchell wrote:
> > > Package: src:linux
> > > Version: 4.19.194-3
> > > Control: affects -1 src:xen
> > >
> > > SSIA. Previous versions of 4.19 had no issues (4.19.181-1 according to
> > > notes), but this cropped up with 4.19.194-3 (-1 and -2 weren't tested).
> > >
> > > When a Xen domain 0 tries to reboot or power down the computer, it hangs
> > > with the display off, but the power supply is active.
> > >
> > > I'm rebuilding from source, so I imagine this also affects
> > > linux-image-4.19.0-17-amd64.
> >
> > Can you please try to bisect which commit introduced the issue? Does
> > it affect as well current upstream 4.19.201?
> >
> > Regards,
> > Salvatore
>
> Dear Salvatore,
>
> As you have noticed, much more information about this bug has been added to this bug report, but the original reporter is of the opinion that much of that new information concerns a bug related to but distinct from the bug he reported. Both bugs have the same symptom: dom0 does not power down when shutting down the system, and it is clear that both bugs are related to x86 acpi code in either the Linux kernel or in the Xen hypervisor. But I cannot reproduce his original bug which occurred in Linux 4.19.194-3 on Xen-4.11 from buster. I have only seen the bug in Xen-4.14 for bullseye, and I always see it with Xen-4.14 regardless of the Linux kernel version.
>
> As far as I can tell, another participant in this bug report has reproduced the behavior I am seeing, but not the behavior the original reporter is seeing:
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=991967#54
>
> Would you endorse the original reporter's belief that there are two distinct bugs being discussed here? If so, I would be inclined to report the bug I am seeing as a distinct but related bug in the bullseye version of src:xen. Otherwise, I respectfully ask that you reclassify this as a bug in src:xen, since the original reporter has not been able to identify a commit in src:linux that caused the bug and no one has been able to reproduce the bug on Xen-4.11/Linux-4.19.194-3.
>
> Regards,
>
> Chuck Zmudzinski

Allow me to propose the following arguments in favor of changing this to a bug in src:xen:

1) The original report of this bug in src:linux version 4.19.194-3 with Xen 4.11 has not been reproduced by anyone.
2) The same symptom has been reproduced in recent versions of src:xen for bullseye, with Xen version 4.14.x.
3) For the future, what is the point of trying to fix a bug in oldstable? Why not concern ourselves with fixing the bug as it now appears in stable?

Regards,

Chuck Zmudzinski
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/20/21 10:12 PM, Chuck Zmudzinski wrote: On 9/20/21 6:29 PM, Chuck Zmudzinski wrote: On 9/20/21 1:43 PM, Chuck Zmudzinski wrote: On 9/20/21 12:27 AM, Elliott Mitchell wrote: On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote: I suspect the following patch is the culprit for problems shutting down on the amd64 architecture: 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch This patch does affect amd64 acpi code, and is probably causing the problem on my amd64 system, so my build of the xen-4.14 hypervisor without this patch fixed the problem. Of the ones listed that is the only one which has any overlap with x86 code. The next reproduction step is `apt-get source xen && patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch && dpkg-buildpackage -b`. Then try with this to confirm that patch is what does it. Thing is that delta is rather small. I don't have a simulator, but that is rather small to be the culprit. I just tested the build with patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch applied before building the package and I can confirm that this is the patch causing the trouble for dom0 poweroff on x86/amd64. Reverting this patch fixes it on my amd64 system. But this would probably break the arm build. I think one possible fix would require modifying 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch so it only applies at runtime to the arm architecture. I will try some modifications to the patch instead of removing it, and if I get something that works on amd64 and also might work on arm, I will post it for Elliott to try. I have an encouraging result. 
I found a very simple patch to xen/arch/x86/acpi/lib.c that fixes the dom0 poweroff bug on my system and it should not affect the arm patches at all:

--
This patch partially reverts previous patch
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch

This hopefully fixes #991967

--- a/xen/arch/x86/acpi/lib.c 2021-09-20 16:49:08.0 -0400
+++ b/xen/arch/x86/acpi/lib.c 2021-09-20 16:25:05.572038000 -0400
@@ -46,10 +46,6 @@
     if ((phys + size) <= (1 * 1024 * 1024))
         return __va(phys);
 
-    /* No further arch specific implementation after early boot */
-    if (system_state >= SYS_STATE_boot)
-        return NULL;
-
     offset = phys & (PAGE_SIZE - 1);
     mapped_size = PAGE_SIZE - offset;
     set_fixmap(FIX_ACPI_END, phys);
--

Further testing with this patch revealed a problem. Although this simple patch causes dom0 to power off when shutting down, on the next reboot the system dropped to a single-user shell because it mixed up my SSD and my hard disk. Normally the system assigns my SSD as /dev/sda and my hard disk as /dev/sdb. But on the first reboot after running the Xen hypervisor, the system reversed them so my SSD was /dev/sdb and my hard disk was /dev/sda. Since the EFI partition, which is a vfat partition, is on the SSD and /etc/fstab mounts it from /dev/sda1, it was now at /dev/sdb1, and the first partition on the hard disk is not a vfat partition, so the system dropped to a root shell for system maintenance.

This switching of the devices on the subsequent reboot is another symptom of this bug I have seen in the past, and usually the ordinary behavior is restored on the next reboot or after resetting and powering off or unplugging from power. So this patch does not really fix the bug reliably.
To clarify things, I saw this strange behavior of the system switching the disk devices with this patch under the following conditions:

1) Boot using this simple patch - dom0 shuts down properly
2) Boot using Elliott's suggested patch in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=991967#94
3) It was when booting using Elliott's suggested patch that I saw the drop to single-user root for system maintenance. Moreover, Elliott's suggested patch did not fix the dom0 power off bug.

So it might be the case that this simple patch would work for both amd64 and arm devices nicely, but Elliott refuses to test it with his arm devices. Sigh.
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/20/21 10:37 PM, Elliott Mitchell wrote:
> On Mon, Sep 20, 2021 at 10:23:39PM -0400, Chuck Zmudzinski wrote:
> > On 9/20/21 7:39 PM, Diederik de Haas wrote:
> > > On Tuesday, 21 September 2021 01:15:15 CEST Elliott Mitchell wrote:
> > > > Merely having the path is a sufficiently strong indicator for me to simply wave it past. I would, though, suggest Debian should instead cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.
> > >
> > > This is available as a patch at: https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8
> > >
> > > You probably then also want the following commit, which is a fix on that patch: https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b
> > >
> > > Found that via the following url/query: https://xenbits.xen.org/gitweb/?p=xen.git&a=search&h=HEAD&st=commit&s=x86%2FACPI
> > >
> > > I don't know whether others should be used from that as well.
> >
> > I tried these two commits (adapted for the xen-4.14 branch) but this approach did not fix the bug - with these patches applied the dom0 did not power down.
> >
> > My advice for the Debian Xen Team is to consult with upstream and get their advice on whether or not it is advisable for Debian to retain the patches from the Xen-4.16 branch that have been added to the Debian 4.14 package in an attempt to support some arm devices that panic on an unpatched Xen-4.14. If upstream cannot help Debian backport fixes for arm panics from Xen-4.16/unstable to Xen-4.14 stable, I think the Debian Xen team should remove aggressive patches that really have now turned the Debian Xen-4.14 package into a Frankenstein version that is a mixture of Xen-4.14 and Xen-4.16, and decide that support for those arm devices must wait until Debian gets Xen 4.16 up and running on the unstable and, hopefully soon, the testing distribution.
>
> It is still not established you're running into #991967. Unless the one you're pointing towards was backported to the Xen 4.11 packages (which I doubt) it cannot explain #991967, since at the time 4.11 was in use. Could be this is a second bug with symptoms similar to #991967.
>
> Now that a fix for the second bug has been identified, you might try a 4.19.181-1 kernel and see whether that fixes things.

I presume you are suggesting I try booting 4.19.181-1 on the current version of Xen-4.14 for bullseye as a dom0. I am not inclined to try it until an official Debian developer endorses your opinion that the bug I am seeing is distinct from #991967, at which point I will report the bug I am seeing as a new bug.

Regards,

Chuck Zmudzinski
Bug#991967: linux-src 4.19.194-3 breaks Xen Dom0 powerdown and reboot
On Sat, 7 Aug 2021 08:40:14 +0200 Salvatore Bonaccorso wrote:
> Control: tags -1 + moreinfo
>
> Hi,
>
> On Fri, Aug 06, 2021 at 11:50:54AM -0700, Elliott Mitchell wrote:
> > Package: src:linux
> > Version: 4.19.194-3
> > Control: affects -1 src:xen
> >
> > SSIA. Previous versions of 4.19 had no issues (4.19.181-1 according to
> > notes), but this cropped up with 4.19.194-3 (-1 and -2 weren't tested).
> >
> > When a Xen domain 0 tries to reboot or power down the computer, it hangs
> > with the display off, but the power supply is active.
> >
> > I'm rebuilding from source, so I imagine this also affects
> > linux-image-4.19.0-17-amd64.
>
> Can you please try to bisect which commit introduced the issue? Does
> it affect as well current upstream 4.19.201?
>
> Regards,
> Salvatore

Dear Salvatore,

As you have noticed, much more information about this bug has been added to this bug report, but the original reporter is of the opinion that much of that new information concerns a bug related to but distinct from the bug he reported. Both bugs have the same symptom: dom0 does not power down when shutting down the system, and it is clear that both bugs are related to x86 acpi code in either the Linux kernel or in the Xen hypervisor. But I cannot reproduce his original bug which occurred in Linux 4.19.194-3 on Xen-4.11 from buster. I have only seen the bug in Xen-4.14 for bullseye, and I always see it with Xen-4.14 regardless of the Linux kernel version.

As far as I can tell, another participant in this bug report has reproduced the behavior I am seeing, but not the behavior the original reporter is seeing:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=991967#54

Would you endorse the original reporter's belief that there are two distinct bugs being discussed here? If so, I would be inclined to report the bug I am seeing as a distinct but related bug in the bullseye version of src:xen. Otherwise, I respectfully ask that you reclassify this as a bug in src:xen, since the original reporter has not been able to identify a commit in src:linux that caused the bug and no one has been able to reproduce the bug on Xen-4.11/Linux-4.19.194-3.

Regards,

Chuck Zmudzinski
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/20/21 7:39 PM, Diederik de Haas wrote:
> On Tuesday, 21 September 2021 01:15:15 CEST Elliott Mitchell wrote:
> > Merely having the path is a sufficiently strong indicator for me to simply wave it past. I would, though, suggest Debian should instead cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.
>
> This is available as a patch at: https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8
>
> You probably then also want the following commit, which is a fix on that patch: https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b
>
> Found that via the following url/query: https://xenbits.xen.org/gitweb/?p=xen.git&a=search&h=HEAD&st=commit&s=x86%2FACPI
>
> I don't know whether others should be used from that as well.

I tried these two commits (adapted for the xen-4.14 branch) but this approach did not fix the bug - with these patches applied the dom0 did not power down.

My advice for the Debian Xen Team is to consult with upstream and get their advice on whether or not it is advisable for Debian to retain the patches from the Xen-4.16 branch that have been added to the Debian 4.14 package in an attempt to support some arm devices that panic on an unpatched Xen-4.14. If upstream cannot help Debian backport fixes for arm panics from Xen-4.16/unstable to Xen-4.14 stable, I think the Debian Xen team should remove aggressive patches that really have now turned the Debian Xen-4.14 package into a Frankenstein version that is a mixture of Xen-4.14 and Xen-4.16, and decide that support for those arm devices must wait until Debian gets Xen 4.16 up and running on the unstable and, hopefully soon, the testing distribution.
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/20/21 6:29 PM, Chuck Zmudzinski wrote: On 9/20/21 1:43 PM, Chuck Zmudzinski wrote: On 9/20/21 12:27 AM, Elliott Mitchell wrote: On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote: I suspect the following patch is the culprit for problems shutting down on the amd64 architecture: 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch This patch does affect amd64 acpi code, and is probably causing the problem on my amd64 system, so my build of the xen-4.14 hypervisor without this patch fixed the problem. Of the ones listed that is the only one which has any overlap with x86 code. The next reproduction step is `apt-get source xen && patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch && dpkg-buildpackage -b`. Then try with this to confirm that patch is what does it. Thing is that delta is rather small. I don't have a simulator, but that is rather small to be the culprit. I just tested the build with patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch applied before building the package and I can confirm that this is the patch causing the trouble for dom0 poweroff on x86/amd64. Reverting this patch fixes it on my amd64 system. But this would probably break the arm build. I think one possible fix would require modifying 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch so it only applies at runtime to the arm architecture. I will try some modifications to the patch instead of removing it, and if I get something that works on amd64 and also might work on arm, I will post it for Elliott to try. I have an encouraging result. 
I found a very simple patch to xen/arch/x86/acpi/lib.c that fixes the dom0 poweroff bug on my system and it should not affect the arm patches at all:

--
This patch partially reverts previous patch
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch

This hopefully fixes #991967

--- a/xen/arch/x86/acpi/lib.c 2021-09-20 16:49:08.0 -0400
+++ b/xen/arch/x86/acpi/lib.c 2021-09-20 16:25:05.572038000 -0400
@@ -46,10 +46,6 @@
     if ((phys + size) <= (1 * 1024 * 1024))
         return __va(phys);
 
-    /* No further arch specific implementation after early boot */
-    if (system_state >= SYS_STATE_boot)
-        return NULL;
-
     offset = phys & (PAGE_SIZE - 1);
     mapped_size = PAGE_SIZE - offset;
     set_fixmap(FIX_ACPI_END, phys);
--

Further testing with this patch revealed a problem. Although this simple patch causes dom0 to power off when shutting down, on the next reboot the system dropped to a single-user shell because it mixed up my SSD and my hard disk. Normally the system assigns my SSD as /dev/sda and my hard disk as /dev/sdb. But on the first reboot after running the Xen hypervisor, the system reversed them so my SSD was /dev/sdb and my hard disk was /dev/sda. Since the EFI partition, which is a vfat partition, is on the SSD and /etc/fstab mounts it from /dev/sda1, it was now at /dev/sdb1, and the first partition on the hard disk is not a vfat partition, so the system dropped to a root shell for system maintenance.

This switching of the devices on the subsequent reboot is another symptom of this bug I have seen in the past, and usually the ordinary behavior is restored on the next reboot or after resetting and powering off or unplugging from power. So this patch does not really fix the bug reliably.
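The drop to a maintenance shell described in this message follows from /etc/fstab naming the EFI partition by its kernel-assigned device name, which can change from one boot to the next. Independently of the hypervisor bug, referring to the filesystem by UUID would avoid that particular failure. What follows is only a minimal sketch: the function name fstab_use_uuid and the sample UUID are hypothetical, and the UUID is assumed to have been looked up separately.

```shell
#!/bin/sh
# Hypothetical helper: print an fstab with one device path replaced by a
# UUID= reference, so the mount no longer depends on sda/sdb ordering.
fstab_use_uuid() {
    device=$1   # device path currently used in fstab, e.g. /dev/sda1
    uuid=$2     # filesystem UUID, looked up separately
    fstab=$3    # path of the fstab file to read
    # Assigning to $1 makes awk rebuild the matched line with single
    # spaces between fields; all other lines are printed unchanged.
    awk -v dev="$device" -v id="$uuid" \
        '$1 == dev { $1 = "UUID=" id } { print }' "$fstab"
}

# Example (the UUID shown is a placeholder, not a real value):
#   fstab_use_uuid /dev/sda1 ABCD-1234 /etc/fstab > fstab.new
```

The function only writes the modified table to standard output, so the result can be reviewed before it replaces the real file. On most systems the UUID itself can be read as root with `blkid -s UUID -o value /dev/sda1`.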
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/20/21 1:43 PM, Chuck Zmudzinski wrote: On 9/20/21 12:27 AM, Elliott Mitchell wrote: On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote: I suspect the following patch is the culprit for problems shutting down on the amd64 architecture: 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch This patch does affect amd64 acpi code, and is probably causing the problem on my amd64 system, so my build of the xen-4.14 hypervisor without this patch fixed the problem. Of the ones listed that is the only one which has any overlap with x86 code. The next reproduction step is `apt-get source xen && patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch && dpkg-buildpackage -b`. Then try with this to confirm that patch is what does it. Thing is that delta is rather small. I don't have a simulator, but that is rather small to be the culprit. I just tested the build with patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch applied before building the package and I can confirm that this is the patch causing the trouble for dom0 poweroff on x86/amd64. Reverting this patch fixes it on my amd64 system. But this would probably break the arm build. I think one possible fix would require modifying 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch so it only applies at runtime to the arm architecture. I will try some modifications to the patch instead of removing it, and if I get something that works on amd64 and also might work on arm, I will post it for Elliott to try. I have an encouraging result. 
I found a very simple patch to xen/arch/x86/acpi/lib.c that fixes the dom0 poweroff bug on my system and it should not affect the arm patches at all:

--
This patch partially reverts previous patch
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch

This hopefully fixes #991967

--- a/xen/arch/x86/acpi/lib.c 2021-09-20 16:49:08.0 -0400
+++ b/xen/arch/x86/acpi/lib.c 2021-09-20 16:25:05.572038000 -0400
@@ -46,10 +46,6 @@
     if ((phys + size) <= (1 * 1024 * 1024))
         return __va(phys);
 
-    /* No further arch specific implementation after early boot */
-    if (system_state >= SYS_STATE_boot)
-        return NULL;
-
     offset = phys & (PAGE_SIZE - 1);
     mapped_size = PAGE_SIZE - offset;
     set_fixmap(FIX_ACPI_END, phys);
--

Can you try this patch to src:xen and see if your arm devices are OK with it?
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/20/21 12:27 AM, Elliott Mitchell wrote: On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote: I suspect the following patch is the culprit for problems shutting down on the amd64 architecture: 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch This patch does affect amd64 acpi code, and is probably causing the problem on my amd64 system, so my build of the xen-4.14 hypervisor without this patch fixed the problem. Of the ones listed that is the only one which has any overlap with x86 code. The next reproduction step is `apt-get source xen && patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch && dpkg-buildpackage -b`. Then try with this to confirm that patch is what does it. Thing is that delta is rather small. I don't have a simulator, but that is rather small to be the culprit. I just tested the build with patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch applied before building the package and I can confirm that this is the patch causing the trouble for dom0 poweroff on x86/amd64. Reverting this patch fixes it on my amd64 system. But this would probably break the arm build. I think one possible fix would require modifying 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch so it only applies at runtime to the arm architecture. I will try some modifications to the patch instead of removing it, and if I get something that works on amd64 and also might work on arm, I will post it for Elliott to try.
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/20/21 12:27 AM, Elliott Mitchell wrote: On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote: xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64 linux kernel version: 5.10.46-4 (the current amd64 kernel for bullseye) Boot system: EFI, not using secure boot, booting xen hypervisor and dom0 bullseye with grub-efi package for bullseye, and it boots the xen-4.14-amd64.gz file, not the xen-4.14-amd64.efi file. I also tested a buster dom0 with the 4.19 series kernel on the xen-4.14 hypervisor from bullseye and saw the problem, but I did not see the problem with either a buster (linux 4.19) or bullseye (linux 5.10) dom0 on the xen-4.11 hypervisor, so I think the problem is with the Debian version of the xen-4.14 hypervisor, not with src:linux. You're referencing several software versions which are mismatches for #991967. #991967 was observed with Xen 4.11 and Linux kernel 4.19.194-3, but not Linux kernel 4.19.181. The fact it correlates with a Linux kernel update rather strongly points to the Linux kernel. I could believe the situation is partially the fault of both though. I don't see it with Xen-4.11 and Linux kernel 4.19.194-3 which is the current default dom0 configuration on Debian buster, but I do see it with Debian's version of Xen-4.14 and either Linux kernel 4.19.194-3 from buster or Linux kernel 5.10.46-4 from bullseye as the dom0. So I only saw it with the update of the Xen hypervisor from 4.11 to 4.14. Of course you have different hardware and a different acpi implementation which is also likely to be a factor that determines whether or not the dom0 poweroff bug manifests itself. I suspect the following patch is the culprit for problems shutting down on the amd64 architecture: 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch This patch does affect amd64 acpi code, and is probably causing the problem on my amd64 system, so my build of the xen-4.14 hypervisor without this patch fixed the problem. 
Of the ones listed that is the only one which has any overlap with x86 code. The next reproduction step is `apt-get source xen && patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch && dpkg-buildpackage -b`. Then try with this to confirm that patch is what does it. Thing is that delta is rather small. I don't have a simulator, but that is rather small to be the culprit. I did try to remove this single patch from the xen build using quilt, but quilt was not happy when it tried to apply the subsequent arm patch, so I just removed all the subsequent arm patches to keep quilt happy with my modified xen src tree. I will try it now, though. If it is this small a delta that is causing the problem on x86/amd64, then maybe we can come up with a workaround in src:xen that is acceptable for both arm and x86/amd64. I think this bug should be re-classified as a bug in src:xen. There could be a separate bug in src:xen, but that is not #991967. I also would inquire with the Debian Xen Team about why they are backporting patches from the upstream xen unstable branch into Debian's 4.14 package that is currently shipping on Debian stable (bullseye). IMHO, the aforementioned patches that are not in the stable 4.14 branch upstream should not be included in the xen package for Debian stable. It was requested since someone trying to have Xen operational on a device needed those for operation. Rather a lot of bugfix or very small standalone feature patches get cherry-picked. Presently I haven't been convinced this is a Xen bug (though it does affect Xen installations). Any chance you've got the tools to build and try a 5.5.0 or 5.10.0 Linux kernel? I'm suspecting something got incorrectly backported on the Linux side (alternatively the Xen project seems a bit poor at keeping needed patches in Linux).
Yes, I recently built and tested a slightly modified Debian bullseye kernel to test a fix for #983357: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=983357 If you have a patch for Debian's 5.10 bullseye kernel that might fix the dom0 poweroff bug I am seeing on bullseye with Debian's current Xen 4.14, I am willing to try it out on my system as an alternate fix from the fix I discovered in src:xen that unfortunately removes arm patches that are needed by some devices.
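The quilt workaround mentioned in this message (dropping the suspect patch together with every patch listed after it) amounts to truncating the quilt series file. A minimal sketch of that step, assuming the package keeps its patch list in the usual debian/patches/series location; the function name truncate_series is hypothetical and is not part of quilt:

```shell
#!/bin/sh
# Hypothetical helper: print a quilt series file up to, but not including,
# the named patch, so that patch and everything after it is dropped.
truncate_series() {
    series_file=$1
    first_dropped=$2
    # Compare only the first field, because a series line may carry
    # patch options (such as -p1) after the file name.
    awk -v stop="$first_dropped" '$1 == stop { exit } { print }' "$series_file"
}

# Example use from the top of an unpacked source tree (paths assumed):
#   truncate_series debian/patches/series \
#       0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch \
#       > series.new && mv series.new debian/patches/series
```

A build made this way no longer carries the later arm patches either, so it is only useful for confirming the regression on amd64, not as a candidate fix for the package.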
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/19/2021 9:30 PM, Chuck Zmudzinski wrote: On 9/19/2021 4:53 PM, Elliott Mitchell wrote: On Sun, Sep 19, 2021 at 03:54:01PM -0400, Chuck Zmudzinski wrote: On 9/19/2021 1:29 PM, Elliott Mitchell wrote: Have you tried memory ballooning with PVH or HVM domains? That combination has been reliably crashing Xen for me for a while. Apparently few others have run into it, yet it is reliable for me. Have you tried the combination? Works? Panics? I have not tried ballooning HVM or PVH domains. If the Xen hypervisor is crashing when ballooning unprivileged domains, doesn't that support my belief that there are bugs in src:xen rather than in src:linux? No. I still think the patches to fix a panic on devices using the arm architecture are a bit aggressive for the Debian Xen package for Debian stable. Those patches upstream are intended for Xen unstable, which is currently Xen 4.16. Such patches do not belong in a stable Xen 4.14 package for Debian stable, especially after it can be proven they cause a regression for Xen users of amd64 devices, the regression being that they break the proper shutdown functioning of amd64 devices. I think the correct Debian way to support the arm devices that panic on a true upstream Xen 4.14 hypervisor without the patches for arm that cause dom0 to not power off properly on amd64 is by first testing the arm patches as part of a new Xen 4.16 unstable Xen package for Debian unstable, then follow ordinary development procedures for porting Xen 4.16 to bookworm/testing, and then finally a backport of Xen 4.16 to bullseye. That is the only way I can see this being done without causing grief to Xen users who want a stable Xen on a stable Debian, unless upstream can help with porting the arm patches back to Xen 4.14 in such a way that they don't break things on amd64. This was also deliberately not copied to #991967 since this is unrelated. 
I'm concerned this second one might be Debian, but the small delta makes me think it likely originates from upstream Xen. I was wondering whether you had seen it since I haven't found other reports. (note, if you try recreating, this is a Xen panic, all domains get lost) This is off-topic for bug #991968. Regards, Chuck Also off-topic for bug #991967 - sorry about the typo. Chuck
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/19/2021 4:53 PM, Elliott Mitchell wrote: On Sun, Sep 19, 2021 at 03:54:01PM -0400, Chuck Zmudzinski wrote: On 9/19/2021 1:29 PM, Elliott Mitchell wrote: Have you tried memory ballooning with PVH or HVM domains? That combination has been reliably crashing Xen for me for a while. Apparently few others have run into it, yet it is reliable for me. Have you tried the combination? Works? Panics? I have not tried ballooning HVM or PVH domains. If the Xen hypervisor is crashing when ballooning unprivileged domains, doesn't that support my belief that there are bugs in src:xen rather than in src:linux? No. I still think the patches to fix a panic on devices using the arm architecture are a bit aggressive for the Debian Xen package for Debian stable. Those patches upstream are intended for Xen unstable, which is currently Xen 4.16. Such patches do not belong in a stable Xen 4.14 package for Debian stable, especially after it can be proven they cause a regression for Xen users of amd64 devices, the regression being that they break the proper shutdown functioning of amd64 devices. I think the correct Debian way to support the arm devices that panic on a true upstream Xen 4.14 hypervisor without the patches for arm that cause dom0 to not power off properly on amd64 is by first testing the arm patches as part of a new Xen 4.16 unstable Xen package for Debian unstable, then follow ordinary development procedures for porting Xen 4.16 to bookworm/testing, and then finally a backport of Xen 4.16 to bullseye. That is the only way I can see this being done without causing grief to Xen users who want a stable Xen on a stable Debian, unless upstream can help with porting the arm patches back to Xen 4.14 in such a way that they don't break things on amd64. This was also deliberately not copied to #991967 since this is unrelated. I'm concerned this second one might be Debian, but the small delta makes me think it likely originates from upstream Xen. 
I was wondering whether you had seen it since I haven't found other reports. (note, if you try recreating, this is a Xen panic, all domains get lost) This is off-topic for bug #991968. Regards, Chuck
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/19/2021 1:29 PM, Elliott Mitchell wrote:
> On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:
> > I noticed this bug on bullseye ever since I have been running bullseye as a dom0, but my testing indicates that the problem is not in src:linux; it appeared in src:xen with the 4.14 version of xen on bullseye.
> >
> > Elliott, are you only seeing the problem on Debian's xen-4.14 hypervisor? Also, which architecture, arm or amd64?
> >
> > I only see the problem on the Debian xen-4.14 hypervisor, and I have only tested on amd64, and I have found a fix for my amd64 system which is as follows:
> >
> > Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015, with a Haswell CPU (core i5-4590S)
> > xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64
> > linux kernel version: 5.10.46-4 (the current amd64 kernel for bullseye)
> > Boot system: EFI, not using secure boot, booting xen hypervisor and dom0 bullseye with grub-efi package for bullseye, and it boots the xen-4.14-amd64.gz file, not the xen-4.14-amd64.efi file.
>
> Actually hardware which is pretty different from mine, so you may run into distinct bugs. Have you tried PVH or HVM domains?

HVM domains: Yes, and they work normally on all Debian versions I have tried.
PVH domains: No, I have not tried these on Debian.

> Have you tried memory ballooning with PVH or HVM domains? That combination has been reliably crashing Xen for me for a while. Apparently few others have run into it, yet it is reliable for me. Have you tried the combination? Works? Panics?

I have not tried ballooning HVM or PVH domains. If the Xen hypervisor is crashing when ballooning unprivileged domains, doesn't that support my belief that there are bugs in src:xen rather than in src:linux?

Regards,

Chuck
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/19/2021 10:56 AM, Elliott Mitchell wrote: On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote: On Sat, 11 Sep 2021 13:29:12 +0200 Salvatore Bonaccorso wrote: > > On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote: > > An experiment led to a potential alternative explanation for #991967. > > The issue may be ACPI (non-UEFI) powerdown/reset was broken at > > 4.19.194-3. Presence of Xen on the system may be unrelated. > > > > Failing that, it could be Xen and non-UEFI systems are affected. (Xen > > was tried on a UEFI system and the issue wasn't observed) > > Following up on https://bugs.debian.org/991967#12 > > Did you succeed in bisecting the issue as you seem to have it > reproducible? I noticed this bug on bullseye ever since I have been running bullseye as a dom0, but my testing indicates there is no problem with src:linux but the problem appeared in src:xen with the 4.14 version of xen on bullseye. I ask Elliott if you are only seeing the problem on Debian's xen-4.14 hypervisor? Also, which architecture, arm or amd64? I only see the problem on the Debian xen-4.14 hypervisor, and I have only tested on amd64, and I have found a fix for my amd64 system which is as follows: Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015, with a Haswell CPU (core i5-4590S) xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64 linux kernel version: 5.10.46-4 (the current amd64 kernel for bullseye) Nope. As per the report the problem appeared with kernel 4.19.194-3 and at the time using Xen 4.11. The kernel you're listing is rather more recent, which might suggest a patch which had been backported from 5.x to 4.19. I could believe a Xen security update being the trigger though (I don't recall there being one at the right time, but I wouldn't rule it out). 
Boot system: EFI, not using secure boot, booting xen hypervisor and dom0 bullseye with grub-efi package for bullseye, and it boots the xen-4.14-amd64.gz file, not the xen-4.14-amd64.efi file.
I also tested a buster dom0 with the 4.19 series kernel on the xen-4.14 hypervisor from bullseye and saw the problem, but I did not see the problem with either a buster (linux 4.19) or bullseye (linux 5.10) dom0 on the xen-4.11 hypervisor, so I think the problem is with the Debian version of the xen-4.14 hypervisor, not with src:linux. Just to make sure, the kernel you were testing was 4.19.194-3? The issue didn't manifest with kernels earlier than that. I will check again with a buster dom0 when I get a chance, probably late tonight or tomorrow. I think it was 4.19.194-3 if that is the latest buster kernel because I don't think there has been an update to the buster kernel since I tested it. Could be we're seeing distinct bugs. I could agree if the problem shows up on my system with the 4.19.194-3 kernel dom0 on xen-4.11, but if not, then it is probably the same bug, a bug that is in src:xen, not src:linux. This patch does affect amd64 acpi code, and is probably causing the problem on my amd64 system, so my build of the xen-4.14 hypervisor without this patch fixed the problem. While that commit modifies the code path the processor takes, the modified path appears identical. I also would inquire with the Debian Xen Team about why they are backporting patches from the upstream xen unstable branch into Debian's 4.14 package that is currently shipping on Debian stable (bullseye). IMHO, the aforementioned patches that are not in the stable 4.14 branch upstream should not be included in the xen package for Debian stable. Some people are asking for those. Those are bugfixes for an extremely popular device which panics on boot without the patches. The raspberry pi, I presume. Meanwhile turned out between 5.10.0 and 5.10.30 the ARM64 device-trees were modified in a way which broke Xen 4.14 on ARM64. The change violated Linux's own standards for device-trees, yet still appeared in a stable branch. 
In other news, if you see device-trees compared to ACPI tables, they're not very comparable. 99% of ACPI tables work for all versions of all OSes. Any given device-tree is only likely to work for a single version of a single OS. While a useful abstraction for portions of kernel code, device-trees are utter garbage compared to ACPI tables. Well, now we are at Debian stable with 5.10.x for linux and 4.14.x for xen, so we are kind of stuck with these versions on Debian stable now. I am all for tweaking the Debian stable packages to support raspberry and amd64. The question is, what is the quickest and least disturbing way to fix it now? All the best, Chuck
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/19/2021 1:05 AM, Chuck Zmudzinski wrote: Hello Elliott and Salvatore, I noticed this bug on bullseye ever since I have been running bullseye as a dom0, but my testing indicates there is no problem with src:linux but the problem appeared in src:xen with the 4.14 version of xen on bullseye. I ask Elliott if you are only seeing the problem on Debian's xen-4.14 hypervisor? Also, which architecture, arm or amd64? I only see the problem on the Debian xen-4.14 hypervisor, and I have only tested on amd64, and I have found a fix for my amd64 system which is as follows: Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015, with a Haswell CPU (core i5-4590S) xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64 linux kernel version: 5.10.46-4 (the current amd64 kernel for bullseye) Boot system: EFI, not using secure boot, booting xen hypervisor and dom0 bullseye with grub-efi package for bullseye, and it boots the xen-4.14-amd64.gz file, not the xen-4.14-amd64.efi file. I also tested a buster dom0 with the 4.19 series kernel on the xen-4.14 hypervisor from bullseye and saw the problem, but I did not see the problem with either a buster (linux 4.19) or bullseye (linux 5.10) dom0 on the xen-4.11 hypervisor, so I think the problem is with the Debian version of the xen-4.14 hypervisor, not with src:linux. I also found a fix in src:xen: I noticed the series of patches in debian/patches of the 4.14.2+25-gb6a8c4f72d-2 version of src:xen (and earlier versions of xen-4.14 on Debian) have several patches backported from the unstable branch of xen upstream. By removing some of these patches from the patches series of the src:xen package, the dom0 shuts down as expected on my ASRock Haswell motherboard. 
I rebuilt the src:xen package after removing the following patches from the debian/patches series and the result was that the computer shuts down as expected if I boot using the patched hypervisor: 0027-xen-rpi4-implement-watchdog-based-reset.patch 0028-tools-python-Pass-linker-to-Python-build-process.patch 0029-xen-arm-acpi-Don-t-fail-if-SPCR-table-is-absent.patch 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch 0031-xen-arm-acpi-The-fixmap-area-should-always-be-cleare.patch 0032-xen-arm-Check-if-the-platform-is-not-using-ACPI-befo.patch 0033-xen-arm-Introduce-fw_unreserved_regions-and-use-it.patch 0034-xen-arm-acpi-add-BAD_MADT_GICC_ENTRY-macro.patch 0035-xen-arm-traps-Don-t-panic-when-receiving-an-unknown-.patch Most of these patches seem unrelated to the amd64 architecture and instead affect the arm architecture, and removing all these patches is probably more than is needed to fix this bug, but I removed them all because I could not find them upstream on the 4.14 branch but instead only saw them on the xen unstable branch upstream (I did not check if they are on the 4.15 branch upstream), and I wanted to test a true upstream 4.14 version without these seemingly aggressive patches added by Debian from the unstable branch of xen upstream, and I discovered by being more conservative and not adding these patches from the unstable branch upstream fixed the problem! I suspect the following patch is the culprit for problems shutting down on the amd64 architecture: 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch The commit log for this patch states: From: Julien Grall Date: Sat, 26 Sep 2020 17:44:29 +0100 Subject: xen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory() The functions acpi_os_{un,}map_memory() are meant to be arch-agnostic while the __acpi_os_{un,}map_memory() are meant to be arch-specific. Currently, the former are still containing x86 specific code. 
To avoid this rather strange split, the generic helpers are reworked so they are arch-agnostic. This requires the introduction of a new helper __acpi_os_unmap_memory() that will undo any mapping done by __acpi_os_map_memory(). Currently, the arch-helper for unmap is basically a no-op so it only returns whether the mapping was arch specific. But this will change in the future. Note that the x86 version of acpi_os_map_memory() was already able to able the 1MB region. Hence why there is no addition of new code.

Signed-off-by: Julien Grall
Reviewed-by: Rahul Singh
Reviewed-by: Jan Beulich
Acked-by: Stefano Stabellini
Tested-by: Rahul Singh
Tested-by: Elliott Mitchell
(cherry picked from commit 1c4aa69ca1e1fad20b2158051eb152276d1eb973)
---

This patch does affect amd64 acpi code, and is probably causing the problem on my amd64 system, so my build of the xen-4.14 hypervisor without this patch fixed the problem.

I think this bug should be re-classified as a bug in src:xen.

I also would inquire with the Debian Xen Team about why they are backporting patches from the upstream xen unstable branch into Debian's 4.14 package that is currently shipping on Debian stable (bullseye). IMHO, the aforementioned patches that are not in the stable 4.14 branch upstream should not be included in the xen package for Debian stable.

Regards,
Chuck Zmudzinski
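The experiment described in this message (rebuilding src:xen with some patches dropped from the series) can be sketched as a small filter over the series file. This is only an illustration: it assumes the usual quilt-style layout with one patch name per line in debian/patches/series, the function name `drop_patches` is made up here, and the patch names are the ones listed in the message.

```python
from typing import Iterable

# Patch names copied from the message above. The series-file layout
# (one patch name per line) is an assumption, not taken from the thread.
REMOVED_PATCHES = [
    "0027-xen-rpi4-implement-watchdog-based-reset.patch",
    "0028-tools-python-Pass-linker-to-Python-build-process.patch",
    "0029-xen-arm-acpi-Don-t-fail-if-SPCR-table-is-absent.patch",
    "0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch",
    "0031-xen-arm-acpi-The-fixmap-area-should-always-be-cleare.patch",
    "0032-xen-arm-Check-if-the-platform-is-not-using-ACPI-befo.patch",
    "0033-xen-arm-Introduce-fw_unreserved_regions-and-use-it.patch",
    "0034-xen-arm-acpi-add-BAD_MADT_GICC_ENTRY-macro.patch",
    "0035-xen-arm-traps-Don-t-panic-when-receiving-an-unknown-.patch",
]


def drop_patches(series_text: str, to_drop: Iterable[str]) -> str:
    """Return the series text with the named patches removed.

    Lines that do not exactly match a name in to_drop (after stripping
    surrounding whitespace) are kept unchanged and in order.
    """
    drop = set(to_drop)
    kept = [line for line in series_text.splitlines()
            if line.strip() not in drop]
    return "\n".join(kept) + "\n"
```

To narrow the search to the one patch suspected for amd64, the same function could be called with only the 0030 entry, which would leave the ARM-specific patches in place.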
Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel
On 8/26/2021 8:01 AM, Chuck Zmudzinski wrote:
On 8/24/2021 7:12 PM, Ben Hutchings wrote:
The current limit on the environment of a uevent appears to be 2 KB (UEVENT_BUFFER_SIZE defined in ). That seems like it *might* be easier to change, so long as user-space doesn't have a similar limit.

I looked into systemd/udev, and it seems to use an 8 KB buffer for receiving uevents: https://sources.debian.org/src/systemd/247.9-1/src/libsystemd/sd-device/device-monitor.c/?hl=390#L390

But as a first step I think increasing the kernel buffer size to 4 KB would be enough. Perhaps someone could test whether this patch to the domU kernel makes udev happier:

--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -30,7 +30,7 @@
 
 #define UEVENT_HELPER_PATH_LEN		256
 #define UEVENT_NUM_ENVP			64	/* number of env pointers */
-#define UEVENT_BUFFER_SIZE		2048	/* buffer for the variables */
+#define UEVENT_BUFFER_SIZE		4096	/* buffer for the variables */
 
 #ifdef CONFIG_UEVENT_HELPER
 /* path to the userspace helper executed on an event */
--- END ---
?
Ben.

I tested this patch on my Xen HVM bullseye system and it appears 4k is enough for the UEVENT_BUFFER_SIZE to accommodate the Xen Virtual Keyboard's large modalias. I needed to follow the instructions in the Kernel team's handbook for changing the ABI name of the kernel for the build to succeed with the patch. I just bumped it from 8 to 8.1.

Results:

1. No coldplug failure reported at boot time.
2. With the patch the system can write uevent data to sysfs for the Xen Virtual Keyboard device.

With the current 5.10.0-8 kernel:

chuckz@debian:~$ cat /sys/devices/virtual/input/input2/uevent
chuckz@debian:~$

With the patched kernel with a change to the ABI version from 8 to 8.1:

chuckz@debian:~$ uname -r
5.10.0-8.1-amd64
chuckz@debian:~$ cat /sys/devices/virtual/input/input2/uevent
PRODUCT=1/5853//0
NAME="Xen Virtual Keyboard"
PHYS="xenbus/device/vkbd/0"
PROP=0
EV=3
KEY=7fff ...
MODALIAS=input:b0001v5853pe-e0,1,k71,72... really long MODALIAS
---

So I think a test of the installation media in a Xen HVM with the 4k buffer in the kernel is the next step. I would also like to test a live CD in a Xen HVM with this patch. It was also reported to fail to boot in a Xen HVM on the debian-user list.

BTW, my compliments to the Debian Kernel Team for the excellent handbook on building kernels for Debian. It is easy to understand and made it very easy for me to build and test the patch even though I have not built a Linux kernel in many years, and I never built a Debian kernel before.

All the best,
Chuck

Results of more tests with the patched kernel:

1. Boot on dom0 - works normally, can create VMs, run Linux container, etc.
2. Boot in Xen PV - works normally
3. Boot on bare hardware - works normally

I do not see any issues with the patched kernel on my system.

Cheers,
Chuck
Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel
On 8/24/2021 7:12 PM, Ben Hutchings wrote: The current limit on the environment of a uevent appears to be 2 KB (UEVENT_BUFFER_SIZE defined in ). That seems like it *might* be easier to change, so long as user-space doesn't have a similar limit. I looked into systemd/udev, and it seems to use an 8 KB buffer for receiving uevents: https://sources.debian.org/src/systemd/247.9-1/src/libsystemd/sd-device/device-monitor.c/?hl=390#L390 But as a first step I think increasing the kernel buffer size to 4 KB would be enough. Perhaps someone could test whether this patch to the domU kernel makes udev happier: --- a/include/linux/kobject.h +++ b/include/linux/kobject.h @@ -30,7 +30,7 @@ #define UEVENT_HELPER_PATH_LEN 256 #define UEVENT_NUM_ENVP 64 /* number of env pointers */ -#define UEVENT_BUFFER_SIZE 2048/* buffer for the variables */ +#define UEVENT_BUFFER_SIZE 4096/* buffer for the variables */ #ifdef CONFIG_UEVENT_HELPER /* path to the userspace helper executed on an event */ --- END --- ? Ben. I tested this patch on my Xen HVM bullseye system and it appears 4k is enough for the UEVENT_BUFFER_SIZE to accommodate the Xen Virtual Keyboard's large modalias. I needed to follow the instructions in the Kernel team's handbook for changing the ABI name of the kernel for the build to succeed with the patch. I just bumped it from 8 to 8.1. Results: 1. No coldplug failure reported at boot time. 2. With the patch the system can write uevent data to sysfs for the Xen Virtual Keyboard device. With the current 5.10.0-8 kernel: chuckz@debian:~$ cat /sys/devices/virtual/input/input2/uevent chuckz@debian:~$ With the patched kernel with a change to the ABI version from 8 to 8.1: chuckz@debian:~$ uname -r 5.10.0-8.1-amd64 chuckz@debian:~$ cat /sys/devices/virtual/input/input2/uevent PRODUCT=1/5853//0 NAME="Xen Virtual Keyboard" PHYS="xenbus/device/vkbd/0" PROP=0 EV=3 KEY=7fff ... MODALIAS=input:b0001v5853pe-e0,1,k71,72... 
really long MODALIAS
---

So I think a test of the installation media in a Xen HVM with the 4k buffer in the kernel is the next step. I would also like to test a live CD in a Xen HVM with this patch. It was also reported to fail to boot in a Xen HVM on the debian-user list.

BTW, my compliments to the Debian Kernel Team for the excellent handbook on building kernels for Debian. It is easy to understand and made it very easy for me to build and test the patch even though I have not built a Linux kernel in many years, and I never built a Debian kernel before.

All the best,
Chuck
Bug#988776: Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel
On 8/25/2021 4:16 PM, Phillip Susi wrote: Chuck Zmudzinski writes: If it doesn't work, I am also willing to try approach a by patching the Linux kernel xen-kbdfront driver by removing the for loops that advertise those 654 keys. I tend to agree with Philip that this is totally unnecessary, but I suppose I could be wrong about that. I read the discussion Philip had with the Xen developers and they seemed to want to keep the Xen keyboard driver as it is. That was the first thing I tried and the libinput maintainer pointed out that if you don't advertise the keys, you can't use the keys. In other words, somebody presses that key on their keyboard and the domU won't recognize it. Well, good news - It looks like Ben's patch works, I just tested it in my full install in a Xen HVM domU and all looks good. I did not see the Coldplug failure at the beginning of the boot - it is hard to miss in the bright red letters on the console, and even more convincing is the fact that another symptom of the bug is gone. This bug manifests itself in udev not being able to write uevent data to sysfs for the Xen Virtual Keyboard. With Ben's patch of increasing the UEVENT_BUFFER_SIZE from 2048 to 4096, udev can write its uevent data to sysfs for the Xen Virtual Keyboard: With the current 5.10.0-8 kernel: chuckz@debian:~$ cat /sys/devices/virtual/input/input2/uevent chuckz@debian:~$ With the patched kernel with a change to the ABI version from 8 to 8.1: chuckz@debian:~$ uname -r 5.10.0-8.1-amd64 chuckz@debian:~$ cat /sys/devices/virtual/input/input2/uevent PRODUCT=1/5853//0 NAME="Xen Virtual Keyboard" PHYS="xenbus/device/vkbd/0" PROP=0 EV=3 KEY=7fff ... MODALIAS=input:b0001v5853pe-e0,1,k71,72... really long MODALIAS I expect with that patch the installation media will work in a Xen HVM domU. Cheers, Chuck
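The before/after check in this message (an empty uevent file with the stock kernel, a populated one with the patched kernel) could be automated roughly as follows. This is a sketch under assumptions: the helper names `parse_uevent` and `describe_uevent` are invented for illustration, and the sysfs path with `input2` is the one from the transcript, which may well differ on another system.

```python
from typing import Dict


def parse_uevent(text: str) -> Dict[str, str]:
    """Parse the KEY=value lines of a sysfs uevent file into a dict."""
    env: Dict[str, str] = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        env[key] = value
    return env


def describe_uevent(text: str) -> str:
    """Summarise a uevent file: empty, or variable count and MODALIAS size."""
    env = parse_uevent(text)
    if not env:
        return "empty uevent (the symptom seen with the unpatched kernel)"
    modalias = env.get("MODALIAS", "")
    return f"{len(env)} variables, MODALIAS is {len(modalias)} bytes"


if __name__ == "__main__":
    # Path taken from the transcript above; the input2 index is system-specific.
    path = "/sys/devices/virtual/input/input2/uevent"
    try:
        with open(path, "r", errors="replace") as fh:
            print(describe_uevent(fh.read()))
    except OSError as exc:
        print(f"could not read {path}: {exc}")
```

An empty result only shows that the kernel could not build the uevent environment for that device; it does not by itself say which buffer limit was hit.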
Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel
On 8/24/2021 7:12 PM, Ben Hutchings wrote:
Text-based sysfs attributes are limited to a page, but udev receives uevents through netlink, not sysfs. The current limit on the environment of a uevent appears to be 2 KB (UEVENT_BUFFER_SIZE defined in ). That seems like it *might* be easier to change, so long as user-space doesn't have a similar limit.

I looked into systemd/udev, and it seems to use an 8 KB buffer for receiving uevents: https://sources.debian.org/src/systemd/247.9-1/src/libsystemd/sd-device/device-monitor.c/?hl=390#L390

But as a first step I think increasing the kernel buffer size to 4 KB would be enough. Perhaps someone could test whether this patch to the domU kernel makes udev happier:

--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -30,7 +30,7 @@
 
 #define UEVENT_HELPER_PATH_LEN		256
 #define UEVENT_NUM_ENVP			64	/* number of env pointers */
-#define UEVENT_BUFFER_SIZE		2048	/* buffer for the variables */
+#define UEVENT_BUFFER_SIZE		4096	/* buffer for the variables */
 
 #ifdef CONFIG_UEVENT_HELPER
 /* path to the userspace helper executed on an event */
--- END ---
?
Ben.

I tried this patch but the build failed - it ran for over an hour. I am not sure why as I have not built a Linux kernel in many years. So I will do this:

1) Try to build the unmodified kernel on my system just to be sure I am building the kernel correctly and that my hardware is OK. Once I could not build the Linux kernel until I replaced a bad memory card.

2) If that succeeds, I will try the patch with a bump to the abi version.

From the output of the failed build and what I read in the section on the Debian kernel ABI name, I think that the system detected an ABI change and so it failed. The build was checking symbols when it failed.

This will take a little while because it takes over an hour to build the kernel on my system.

Chuck
Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel
On 8/25/2021 12:45 PM, Chuck Zmudzinski wrote:
On 8/24/2021 7:12 PM, Ben Hutchings wrote:
On Tue, Aug 24, 2021 at 03:27:19PM -0400, Phillip Susi wrote:
Ben Hutchings writes:
I think a proper fix would be one of:

a. If the Xen virtual keyboard driver is advertising capabilities it doesn't have, stop it doing that.
b. Change the implementation of modalias attributes to allow longer values.

It's not clear to me whether the Xen driver is advertising correctly or not. If it is, then the solution should be b, but that may be too disruptive a change to the kernel. So a reasonable workaround might be:

c. Change the input subsystem to limit the length of the capabilities part of the modalias.

The problem with a) is that the Xen keyboard is not a physical keyboard and so it has no way of knowing what keys it actually has. It is a fake input device designed to pass through whatever input the Xen hypervisor sends down. As such, any key could come in. If it doesn't advertise that it has all of these keys, then they would not be accepted by libinput when the hypervisor sends them down.

Right, that's what I feared. xen-kbdfront is setting the bits for keys in the ranges [KEY_ESC, KEY_UNKNOWN) and [KEY_OK, KEY_MAX), which I think works out to 654 keys and 2362 bytes in the modalias.

This seems to be the heart of the problem: libinput was designed assuming that all keyboards can and must report what keys are actually present, and then libinput tries to cram that information into the modalias rather than some other sysfs attribute as it should ( or not at all... I still don't see how this information is actually supposed to be useful to userspace ).

I think modaliases aren't intended to be interpreted by user-space, other than processing wildcards when matching to modules. For input devices, the same information is available through other variables in the uevent, in a more compact form. The information *is* useful for user-space; e.g. in initramfs-tools we recognise keyboard devices and add their drivers to the initramfs but ignore other input devices.

As for b), the problem isn't with the modalias attribute itself, but when the kernel tries to copy it into the environment block for the udev callout. The environment block is only a single page, and so limited to 4 KB. And that's for everything else that goes into the environment, not just the modalias.

Text-based sysfs attributes are limited to a page, but udev receives uevents through netlink, not sysfs. The current limit on the environment of a uevent appears to be 2 KB (UEVENT_BUFFER_SIZE defined in ). That seems like it *might* be easier to change, so long as user-space doesn't have a similar limit.

I looked into systemd/udev, and it seems to use an 8 KB buffer for receiving uevents: https://sources.debian.org/src/systemd/247.9-1/src/libsystemd/sd-device/device-monitor.c/?hl=390#L390

But as a first step I think increasing the kernel buffer size to 4 KB would be enough. Perhaps someone could test whether this patch to the domU kernel makes udev happier:

--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -30,7 +30,7 @@
 
 #define UEVENT_HELPER_PATH_LEN		256
 #define UEVENT_NUM_ENVP			64	/* number of env pointers */
-#define UEVENT_BUFFER_SIZE		2048	/* buffer for the variables */
+#define UEVENT_BUFFER_SIZE		4096	/* buffer for the variables */
 
 #ifdef CONFIG_UEVENT_HELPER
 /* path to the userspace helper executed on an event */
--- END ---
?
Ben.

I will try it in my bullseye Xen HVM DomU. I am not sure how to rebuild the installation media with a patched systemd, but I can patch my installed Xen HVM DomU system with a patched systemd with the increased buffer size and see if the Coldplug failure early in the boot process goes away. If so, then it is likely this patch to systemd would also fix the installation media.

If it doesn't work, I am also willing to try approach a by patching the Linux kernel xen-kbdfront driver by removing the for loops that advertise those 654 keys. I tend to agree with Philip that this is totally unnecessary, but I suppose I could be wrong about that. I read the discussion Philip had with the Xen developers and they seemed to want to keep the Xen keyboard driver as it is.

Chuck

The build failed with an error. I used the test-patches script to start the build:

chuckz@debian:~/linuxdata/sources-bullseye/kernel/linux-5.10.46$ bash debian/bin/test-patches ../patch

with Ben's patch to UEVENT_BUFFER_SIZE in ../patch. The build was running for over an hour and then failed with the last few lines on the console as:

RT_SYMBOL zl10039_attach
Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel
On 8/24/2021 7:12 PM, Ben Hutchings wrote: On Tue, Aug 24, 2021 at 03:27:19PM -0400, Phillip Susi wrote: Ben Hutchings writes: I think a proper fix would be one of: a. If the Xen virtual keyboard driver is advertising capabilities it doesn't have, stop it doing that. b. Change the implementation of modalias attributes to allow longer values. It's not clear to me whether the Xen driver is advertising correctly or not. If it is, then�the solution should be b, but that may be too disruptive a change to the kernel. So a reasonable workaround might be: c. Change the input subsystem to limit the length of the capabilities part of the modalias. The problem with a) is that the Xen keyboard is not a physical keyboard and so it has no way of knowing what keys it actually has. It is a fake input device designed to pass through whatever input the Xen hypervisor sends down. As such, any key could come in. If it doesn't advertise that it has all of these keys, then they would not be accepted by libinput when the hypervisor sends them down. Right, that's what I feared. xen-kbdfront is setting the bits for keys in the ranges [KEY_ESC, KEY_UNKNOWN) and [KEY_OK, KEY_MAX), which I think works out to 654 keys and 2362 bytes in the modalias. This seems to be the heart of the problem: libinput was designed assuming that all keyboards can and must report what keys are actually present, and then libinput tries to cram that information into the modalias rather than some other sysfs attribute as it should ( or not at all... I still don't see how this information is actually supposed to be useful to userspace ). I think modaliases aren't intended to be interpreted by user-space, other than processing wildcards when matching to modules. For input devices, the same information is available through other variables in the uevent, in a more compact form. The information *is* useful for user-space; e.g. 
in initramfs-tools we recognise keyboard devices and add their drivers to the initramfs but ignore other input devices. As for b), the problem isn't with the modalias attribute itself, but when the kernel tries to copy it into the environment block for the udev callout. The environment block is only a single page, and so limited to 4 KB. And that's for everything else that goes into the environment, not just the modalias. Text-based sysfs attributes are limited to a page, but udev receives uevents through netlink, not sysfs. The current limit on the environment of a uevent appears to be 2 KB (UEVENT_BUFFER_SIZE defined in ). That seems like it *might* be easier to change, so long as user-space doesn't have a similar limit. I looked into systemd/udev, and it seems to use an 8 KB buffer for receiving uevents: https://sources.debian.org/src/systemd/247.9-1/src/libsystemd/sd-device/device-monitor.c/?hl=390#L390 But as a first step I think increasing the kernel buffer size to 4 KB would be enough. Perhaps someone could test whether this patch to the domU kernel makes udev happier: --- a/include/linux/kobject.h +++ b/include/linux/kobject.h @@ -30,7 +30,7 @@ #define UEVENT_HELPER_PATH_LEN 256 #define UEVENT_NUM_ENVP 64 /* number of env pointers */ -#define UEVENT_BUFFER_SIZE 2048/* buffer for the variables */ +#define UEVENT_BUFFER_SIZE 4096/* buffer for the variables */ #ifdef CONFIG_UEVENT_HELPER /* path to the userspace helper executed on an event */ --- END --- ? Ben. I will try it in my bullseye Xen HVM DomU. I am not sure how to rebuild the installation media with a patched systemd, but I can patch my installed Xen HVM DomU system with a patched systemd with the increased buffer size and see if the Coldplug failure early in the boot process goes away. If so, then it is likely this patch to systemd would also fix the installation media. 
If it doesn't work, I am also willing to try approach a by patching the Linux kernel xen-kbdfront driver by removing the for loops that advertise those 654 keys. I tend to agree with Philip that this is totally unnecessary, but I suppose I could be wrong about that. I read the discussion Philip had with the Xen developers and they seemed to want to keep the Xen keyboard driver as it is. Chuck
Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel
On 8/25/2021 10:54 AM, Ben Hutchings wrote: On Tue, 2021-08-24 at 15:19 -0400, Chuck Zmudzinski wrote: On 8/24/2021 1:12 PM, Ben Hutchings wrote: [...] I think a proper fix would be one of: a. If the Xen virtual keyboard driver is advertising capabilities it doesn't have, stop it doing that. b. Change the implementation of modalias attributes to allow longer values. It's not clear to me whether the Xen driver is advertising correctly or not. If it is, then the solution should be b, but that may be too disruptive a change to the kernel. So a reasonable workaround might be: c. Change the input subsystem to limit the length of the capabilities part of the modalias. Ben. So workaround c would not involve disruptions to the kernel or systemd? Workaround c seems too disruptive for stable to me, but maybe could go into unstable and eventually into testing. I don't think it would be very disruptive. It might require a kernel ABI bump, but we do those regularly during a stable release. And this bug is severe enough that I think a fix would be suitable for Debian stable. A problem with the approach of fixing this bug in the Xen keyboard driver is that the fix must be implemented in the underlying Dom0 system, which could be almost anything - another Linux distro or Debian stable or oldstable. Any fix upstream would probably get into a bullseye Dom0, but not oldstable Dom0, but perhaps it could be provided as a backport for anyone who is still on oldstable for their Xen Dom0. [...] I agree that we need to fix this for domU independently of any protocol change to allow discovery of which keys the underlying input device has. So we can't solve this with approach a. Ben. Actually, now I think my comments about approach a are wrong. I was thinking the Linux kernel was reading the modalias of the Xen Virtual Keyboard from through some interface provided by xen - the hypervisor or libxl or some such component running in Dom0. 
After further investigation, now I think the modalias of the Xen Virtual Keyboard is coming from here: https://github.com/torvalds/linux/blob/6e764bcd1cf72a2846c0e53d3975a09b242c04c9/drivers/input/misc/xen-kbdfront.c#L257

This is the xen-kbdfront.c driver, which is part of the Linux kernel. At line 257 of that driver, we have:

for (i = KEY_ESC; i < KEY_UNKNOWN; i++)
	__set_bit(i, kbd->keybit);
for (i = KEY_OK; i < KEY_MAX; i++)
	__set_bit(i, kbd->keybit);

This is advertising too many keys, making the modalias absurdly large. The Xen virtual keyboard driver in the Linux kernel has been doing this at least since 2011 when the Xen virtual keyboard driver was moved to its current location in the Linux kernel source tree.

So this can probably be fixed in the Linux kernel without any patches to the Xen hypervisor or libxl running in Dom0. Probably just removing those two for loops would fix it.

Chuck
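As a rough cross-check of why these two loops matter, the sketch below counts the key codes they set and estimates how many bytes those codes occupy in the modalias. It rests on two assumptions that are not taken from this message: the numeric values of the KEY_* constants (as I recall them from the kernel's input-event-codes.h) and the per-key format of uppercase hex followed by a comma. Both are worth verifying against the kernel tree in use.

```python
# Key-code values as recalled from include/uapi/linux/input-event-codes.h.
# These are assumptions; check them against your kernel source.
KEY_ESC = 1
KEY_UNKNOWN = 240
KEY_OK = 0x160
KEY_MAX = 0x2FF

# Buffer sizes before and after the patch discussed in this thread.
OLD_UEVENT_BUFFER_SIZE = 2048
NEW_UEVENT_BUFFER_SIZE = 4096


def advertised_keys():
    """Key codes set by the two for loops; both ranges are half-open."""
    return list(range(KEY_ESC, KEY_UNKNOWN)) + list(range(KEY_OK, KEY_MAX))


def key_part_length(codes):
    """Bytes needed if each code is written as uppercase hex plus a comma.

    The "%X," per-key format is an assumption about how the input core
    builds the modalias string.
    """
    return sum(len(f"{code:X},") for code in codes)


if __name__ == "__main__":
    keys = advertised_keys()
    size = key_part_length(keys)
    print(f"{len(keys)} keys, {size} bytes in the key list")
    print("fits in old buffer:", size < OLD_UEVENT_BUFFER_SIZE)
    print("fits in new buffer:", size < NEW_UEVENT_BUFFER_SIZE)
```

Under these assumptions the count comes to 654 keys and 2362 bytes, which agrees with the figures Ben Hutchings gives elsewhere in this thread: too large for a 2048-byte uevent buffer, but within 4096 bytes if the remaining variables stay small.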
Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel
On 8/24/2021 1:12 PM, Ben Hutchings wrote: On Tue, 2021-08-24 at 10:56 -0400, Chuck Zmudzinski wrote: On 5/24/2021 3:30 AM, Michael Biebl wrote: Hi Phillip Am 24.05.2021 um 06:19 schrieb Cyril Brulebois: trigger to cold plug all devices. Both scripts are set -e. The Xen Virtual Keyboard driver and at least one other driver have always failed to trigger due to having absurdly long modalias, but the error used to be ignored. The kernel now returns the error to udevadm So this is a change in behaviour in the kernel? What happens if you boot the installed system? Does udevadm trigger fail there as well? I feel a bit uneasy changing the udev start script this late in the release cycle (especially when it appears like covering up an issue someplace else). I'll let Marco make the judgement on this though, as he has the most experience with those udev udeb start scripts as the original author. Michael After reviewing Philip's message at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=983357#43 which seems to point to the root cause of this bug, I can add: On my Xen HVM DomU I see the absurdly long modalias for the Xen Virtual keyboard that seems to be causing this crash in sysfs at /sys/devices/virtual/input/input2/modalias But at /sys/devices/vkbd-0/modalias, I see just 'xen:vkbd', which would probably not result in an error in the udev script if this was also written as the modalias at /sys/devices/virtual/input/input2/modalias So the Xen virtual keyboard appears more than once in sysfs, and modalias is not the same in the different places. This seems to be a problem. They are two different devices, and they should have different modaliases. Linux has code for discovering devices on each kind of bus, including virtual buses, and that code creates "bus devices" such as vkbd-0. At this point the kernel doesn't know what the device is capable of. The modalias for a bus device carries some identifying information that can be used to select a driver module for it. 
The driver does know what the device is capable of, and how to use it. It will normally create one or more "class devices" that support a particular set of operations; in this case input device operations. Class devices typically don't have modaliases, since they don't need another layer of drivers on top. However, for input devices the modalias carries information about the device's capabilities. These may trigger loading of the evdev or joydev module.

I understand the correct way to fix this bug is by modifying the Xen virtual keyboard (and any other devices that might cause this crash) and not the start-udev script on the netinst installation media, which is so far the only available workaround. Hopefully Xen will accept a fix if we can come up with a fix.

[...]

I think a proper fix would be one of:

a. If the Xen virtual keyboard driver is advertising capabilities it doesn't have, stop it doing that.
b. Change the implementation of modalias attributes to allow longer values.

It's not clear to me whether the Xen driver is advertising correctly or not. If it is, then the solution should be b, but that may be too disruptive a change to the kernel. So a reasonable workaround might be:

c. Change the input subsystem to limit the length of the capabilities part of the modalias.

Ben.

So workaround c would not involve disruptions to the kernel or systemd? Workaround c seems too disruptive for stable to me, but maybe could go into unstable and eventually into testing.

A problem with the approach of fixing this bug in the Xen keyboard driver is that the fix must be implemented in the underlying Dom0 system, which could be almost anything - another Linux distro or Debian stable or oldstable. Any fix upstream would probably get into a bullseye Dom0, but not oldstable Dom0, but perhaps it could be provided as a backport for anyone who is still on oldstable for their Xen Dom0.

Anyway, I will look into the Xen virtual keyboard capabilities.
The only capability I can think of that would be useful in this context is that it supports live migration of a VM through some sort of hot-swapping capability. If it has that capability, a workaround to support it would be good. But if it does not have that capability or if such a capability is not needed for a keyboard, then it should probably stop advertising itself as being able or needing to do that. Ultimately, it is up to Xen to decide if they are going to make changes to its virtual keyboard. Chuck
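The comparison above between /sys/devices/vkbd-0/modalias and /sys/devices/virtual/input/input2/modalias can be repeated across a whole sysfs tree. The following is only a rough sketch in Python: the function names and the ASSUMED_LIMIT threshold are made up for illustration, and the thread does not state the actual length at which the kernel gives up.

```python
from pathlib import Path

# Hypothetical threshold for illustration only; the thread does not state
# the real length at which the kernel refuses to generate the uevent.
ASSUMED_LIMIT = 1024


def read_modaliases(sysfs_root):
    """Return {path: modalias} for every 'modalias' file under sysfs_root."""
    result = {}
    for path in sorted(Path(sysfs_root).rglob("modalias")):
        try:
            result[str(path)] = path.read_text().strip()
        except OSError:
            # Some sysfs attributes cannot be read; skip them.
            continue
    return result


def long_modaliases(sysfs_root, limit=ASSUMED_LIMIT):
    """Return (path, length) pairs whose modalias exceeds limit, longest first."""
    pairs = [(p, len(m)) for p, m in read_modaliases(sysfs_root).items()]
    return sorted((x for x in pairs if x[1] > limit), key=lambda x: -x[1])


if __name__ == "__main__":
    # On a Xen HVM DomU the input class device is expected to show up here,
    # while the bus device (xen:vkbd) should not.
    for path, length in long_modaliases("/sys/devices"):
        print(length, path)
```

Running it inside the guest would show which devices carry unusually long modaliases, which might help decide whether fix a, b or c applies to each of them.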
Bug#983357: Netinst crashes xen domU when loading kernel
On 5/24/2021 3:30 AM, Michael Biebl wrote:

Hi Phillip

On 24.05.2021 at 06:19, Cyril Brulebois wrote:

trigger to cold plug all devices. Both scripts are set -e. The Xen Virtual Keyboard driver and at least one other driver have always failed to trigger due to having absurdly long modalias, but the error used to be ignored. The kernel now returns the error to udevadm

So this is a change in behaviour in the kernel? What happens if you boot the installed system? Does udevadm trigger fail there as well?

I feel a bit uneasy changing the udev start script this late in the release cycle (especially when it appears like covering up an issue someplace else). I'll let Marco make the judgement on this though, as he has the most experience with those udev udeb start scripts as the original author.

Michael

After reviewing Phillip's message at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=983357#43 which seems to point to the root cause of this bug, I can add:

On my Xen HVM DomU I see the absurdly long modalias for the Xen Virtual keyboard that seems to be causing this crash in sysfs at /sys/devices/virtual/input/input2/modalias

But at /sys/devices/vkbd-0/modalias, I see just 'xen:vkbd', which would probably not result in an error in the udev script if this was also written as the modalias at /sys/devices/virtual/input/input2/modalias

So the Xen virtual keyboard appears more than once in sysfs, and modalias is not the same in the different places. This seems to be a problem.

I understand the correct way to fix this bug is by modifying the Xen virtual keyboard (and any other devices that might cause this crash) and not the start-udev script on the netinst installation media, which is so far the only available workaround. Hopefully Xen will accept a fix if we can come up with one.

I am willing to try to debug this by testing patches to the Xen virtual keyboard, and any tips on how udev works would be helpful.
Is there documentation in udev for device developers somewhere to consult that explains how to update old device drivers so they are compatible with the modern version? Does the Xen virtual keyboard need to be managed by udev? Is there a simple way to disable incompatible devices so udev ignores them? Chuck Zmudzinski
Bug#983357: Netinst crashes xen domU when loading kernel
On 5/25/2021 2:38 PM, Phillip Susi wrote:

Michael Biebl writes:

So this is a change in behaviour in the kernel?

Yes, this commit fixed the kernel to report the error instead of silently failing:

commit df44b479654f62b478c18ee4d8bc4e9f897a9844
Author: Peter Rajnoha
Date: Wed Dec 5 12:27:44 2018 +0100

kobject: return error code if writing /sys/.../uevent fails

Propagate error code back to userspace if writing the /sys/.../uevent file fails. Before, the write operation always returned with success, even if we failed to recognize the input string or if we failed to generate the uevent itself.

With the error codes properly propagated back to userspace, we are able to react in userspace accordingly by not assuming and awaiting a uevent that is not delivered.

Signed-off-by: Peter Rajnoha
Signed-off-by: Greg Kroah-Hartman

What happens if you boot the installed system? Does udevadm trigger fail there as well?

Yes, it does; that is how I was able to track down the problem.

I feel a bit uneasy changing the udev start script this late in the release cycle (especially when it appears like covering up an issue someplace else). I'll let Marco make the judgement on this though, as he has the most experience with those udev udeb start scripts as the original author.

So far I have been removing the -e from the shebang line in the start-udev script and remastering the iso so I can get it to boot. It would probably be a better idea to just add a || true to the udevadm trigger call. I feel fairly certain that no matter what the cause of the coldplug failure, the user is going to be better off ignoring it and trying to proceed than a kernel panic.
Hello,

This bug was noticed on the debian-user list recently and I have been testing various workarounds. Instead of removing -e from the shebang line, I came up with prepending the udevadm trigger call in the start-udev script with

dmesg | grep DMI: | grep 'Xen HVM domU' ||

This causes the offending udevadm trigger call to never be invoked when running in a Xen HVM DomU. On all other systems, the call should be invoked as normal. With this hack, I was able to create a modified ISO and run the bullseye installer from it in a Xen HVM DomU and complete an install without the crash and reboot.

I can also confirm that I always see the coldplug failure on the installed system in a Xen HVM DomU, but in that case the failure does not cause a crash and the system boots normally after reporting the failure. I also do not see the problem in a Xen PV DomU, which I think is what the /install.amd/xen folder on the installation media is for.

Chuck Zmudzinski
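The logic of the two workarounds discussed above (skipping the trigger on a Xen HVM DomU, and ignoring a failed trigger in the spirit of || true) can be modelled roughly as follows. This is a sketch in Python rather than the actual start-udev shell script: the function names are made up, and the exact arguments that start-udev passes to udevadm trigger are not shown in this thread, so the default command here is an assumption.

```python
import subprocess


def is_xen_hvm_domu(dmesg_text):
    """Mimic `grep DMI: | grep 'Xen HVM domU'`: both strings on the same line."""
    return any("DMI:" in line and "Xen HVM domU" in line
               for line in dmesg_text.splitlines())


def coldplug(dmesg_text, trigger_cmd=("udevadm", "trigger")):
    """Return 'skipped', 'ok' or 'failed-ignored'; a failed trigger never raises.

    trigger_cmd is an assumption: the real start-udev call may pass options.
    """
    if is_xen_hvm_domu(dmesg_text):
        # Same effect as the `grep ... ||` prefix: the trigger is never run.
        return "skipped"
    try:
        rc = subprocess.run(list(trigger_cmd), check=False).returncode
    except OSError:
        rc = 127  # command missing or not executable
    # Same effect as `|| true`: report the failure but carry on.
    return "ok" if rc == 0 else "failed-ignored"
```

The first branch is the narrower hack (only Xen HVM guests are affected), while the tolerant return value corresponds to the broader suggestion of carrying on after any coldplug failure.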
Bug#990055: qemu-system-x86: Cannot set PCI slot 2 for Intel IGD Passthrough using Xen
Package: qemu-system-x86
Version: 6.0+dfsg-1~exp0 and 5.2+dfsg-10
Severity: normal
Tags: patch upstream

Dear Maintainer,

I find that when using qemu with a Windows Xen HVM DomU and also passing through the Intel integrated graphics device (IGD) to the Windows Xen HVM DomU, it is much more reliable if the Intel IGD is at PCI slot 2 in the HVM DomU. When using the ancient qemu-xen-traditional device model provided by the Xen project, the Intel IGD always grabs slot 2 when it is passed through to the Xen HVM DomU using the gfx_passthru option in xl.cfg, but not when using what the Xen project refers to as the upstream qemu device model, which is the device model provided by the qemu-system-x86 package. Intel says the IGD device needs to be at PCI slot 2, but it will be at a different slot when using the qemu version provided by the qemu-system-x86 package.

One problem that occurs is that Windows sometimes reports code 43 errors in the Windows Device Manager when the Intel IGD is not set to PCI slot 2, and this prevents IGD passthrough from working because the Windows code 43 error causes Windows to disable the affected device. Other times, the screen is a little fuzzy at first but it usually clears up later.

I investigated and found out how the ancient qemu-xen-traditional model ensures the IGD grabs PCI slot 2: it patches the hw/pci/pci.c file. However, the patch in qemu-xen-traditional is not appropriate for this version of qemu because, unlike qemu-xen-traditional, this version is designed to support more configurations than just Xen.

I was able to develop a patch that causes the Intel IGD to grab slot 2 with these versions of qemu when qemu is running with xen as the accelerator and when using the xenlight (xl/libxl) toolstack to build the Xen HVM. The patch is designed to only affect Xen HVMs with IGD passthrough, that is, when using the xenlight toolstack and setting gfx_passthru to '1' or 'igd' in the Xen HVM DomU's xl.cfg file.
The following patch is for the 6.0+dfsg-1~exp0 package, but it also applies to the 5.2+dfsg-10 package with some fuzz. I used it as the last patch in the series of patches in debian/patches, and it works well. It uses CONFIG_DEVICES to only compile on platforms with CONFIG_XEN_IGD_PASSTHROUGH set by the meson build system, it only applies at runtime if the gfx_passthru option is set, and all of the patch is in the xen part of the qemu code.

-Start of Patch
--- a/hw/i386/xen/xen-hvm.c 2021-04-29 13:18:58.0 -0400
+++ b/hw/i386/xen/xen-hvm.c 2021-06-18 09:44:58.0 -0400
@@ -9,6 +9,7 @@
  */
 #include "qemu/osdep.h"
+#include CONFIG_DEVICES
 #include "qemu/units.h"
 #include "cpu.h"
@@ -38,6 +39,11 @@
 #include
 #include
+#ifdef CONFIG_XEN_IGD_PASSTHROUGH
+#include "hw/pci/pci_bus.h"
+#include "hw/xen/xen_pt.h"
+#endif
+
 //#define DEBUG_XEN_HVM
 #ifdef DEBUG_XEN_HVM
@@ -1530,6 +1536,21 @@
 exit(1);
 }
+#ifdef CONFIG_XEN_IGD_PASSTHROUGH
+/* Reserve pci slot 2 for the Intel IGD */
+void xen_hvm_reserve_igd_slot(PCIBus *pci_bus)
+{
+DPRINTF("Checking if igd-passthrough is set...\n");
+if (xen_igd_gfx_pt_enabled()) {
+DPRINTF("Reserving PCI slot 0x02 for IGD...\n");
+pci_bus->slot_reserved_mask = XEN_IGD_PCI_SLOT;
+}
+else {
+DPRINTF("IGD passthrough is not set\n");
+}
+}
+#endif
+
 void destroy_hvm_domain(bool reboot)
 {
 xc_interface *xc_handle;
--- a/hw/i386/pc_piix.c 2021-06-18 09:39:56.0 -0400
+++ b/hw/i386/pc_piix.c 2021-06-18 09:49:15.0 -0400
@@ -208,6 +208,13 @@
 pci_memory, ram_memory);
 pcms->bus = pci_bus;
+#ifdef CONFIG_XEN_IGD_PASSTHROUGH
+/* This function checks if igd-passthru is enabled and
+ * if so, reserve slot 2 for it on the PCI Bus */
+if (xen_enabled()) {
+xen_hvm_reserve_igd_slot(pci_bus);
+}
+#endif
 piix3 = piix3_create(pci_bus, &isa_bus);
 piix3->pic = x86ms->gsi;
 piix3_devfn = piix3->dev.devfn;
--- a/include/hw/xen/xen-x86.h 2021-04-29 13:18:58.0 -0400
+++ b/include/hw/xen/xen-x86.h 2021-06-18 09:54:05.0 -0400
@@ -12,4 +12,8 @@
 void xen_hvm_init_pc(PCMachineState *pcms, MemoryRegion **ram_memory);
+#ifdef CONFIG_XEN_IGD_PASSTHROUGH
+void xen_hvm_reserve_igd_slot(PCIBus *pci_bus);
+#endif
+
 #endif /* QEMU_HW_XEN_X86_H */
--- a/hw/xen/xen_pt.c 2021-04-29 13:18:58.0 -0400
+++ b/hw/xen/xen_pt.c 2021-06-18 10:07:42.0 -0400
@@ -53,6 +53,7 @@
  */
 #include "qemu/osdep.h"
+#include CONFIG_DEVICES
 #include "qapi/error.h"
 #include
@@ -65,6 +66,10 @@
 #include "xen_pt.h"
 #include "qemu/range.h"
 #include "exec/address-spaces.h"
+#ifdef CONFIG_XEN_IGD_PASSTHROUGH
+#include "hw/pci/pci_bus.h"
+static void xen_pt_clear_igd_slot(DeviceState *qdev, Error **errp);
+#endif
 static bool has_igd_gfx_passthru;
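The idea behind the reservation in the patch above can be illustrated with a toy model of automatic slot assignment. This is a Python sketch, not QEMU code: first_free_slot is a made-up helper, and the mask value is an assumption, since the definition of XEN_IGD_PCI_SLOT is not visible in the part of the patch quoted here.

```python
# Assumption: the reserved mask has only bit 2 set. The real value of
# XEN_IGD_PCI_SLOT is not shown in the quoted part of the patch.
ASSUMED_IGD_SLOT = 2
ASSUMED_IGD_MASK = 1 << ASSUMED_IGD_SLOT


def first_free_slot(used_mask, reserved_mask, num_slots=32):
    """Return the lowest slot that is neither in use nor reserved, or None."""
    for slot in range(num_slots):
        bit = 1 << slot
        if not (used_mask & bit) and not (reserved_mask & bit):
            return slot
    return None


# Slots 0 and 1 are already taken by earlier devices.
used = 0b11

# Without a reservation, the next automatically placed device lands in slot 2.
print(first_free_slot(used, reserved_mask=0))

# With slot 2 reserved, the next device is pushed to slot 3 instead.
print(first_free_slot(used, reserved_mask=ASSUMED_IGD_MASK))

# Clearing the bit again makes slot 2 available, presumably what the
# xen_pt_clear_igd_slot() declared in the patch is for.
print(first_free_slot(used, reserved_mask=ASSUMED_IGD_MASK & ~ASSUMED_IGD_MASK))
```

In this model the reservation only keeps other devices out of slot 2; something still has to release the slot at the moment the IGD itself is attached.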
Bug#988333: linux-image-5.10.0-6-amd64: VGA Intel IGD Passthrough to Debian Xen HVM DomUs not working, but Windows Xen HVMs do work
Package: src:linux Version: 5.10.28-1 Severity: normal Tags: upstream Dear Maintainer, I have been using Xen's PCI and VGA passthrough feature since wheezy and jessie were the stable versions, and back then both Windows HVMs and Linux HVMs would function with the Intel Integrated Graphics Device (IGD), the audio device, and the USB 3 controller passed to them. But with buster and bullseye running as the Dom0, I can only get the VGA/Passthrough feature to work with Windows Xen HVMs. I would expect both Windows and Linux HVMs to work comparably well. -- Package-specific info: Linux version 5.10.0-6-amd64 (debian-ker...@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.28-1 (2021-04-09) BOOT_IMAGE=/boot/vmlinuz-5.10.0-6-amd64 root=UUID=332b3875-d57c-4083-9d46-3faa28d60691 ro xen-fbfront.video=24,1368,768 quiet - this is what I have on the bullseye DomU. On the Dom0, I have BOOT_IMAGE=/boot/vmlinuz-5.10.0-6-amd64 root=/dev/debian/bullseye ro reboot=bios quiet console=tty1 console=hvc0 On Dom0, the Xen commandline and version (from xl dmesg): dom0_mem=2G,max:2G smt=false pv-l1tf=false iommu=1 no-real-mode edd=off Xen version 4.14.2-pre (Debian 4.14.1+11-gb0b734a8b3-1) (pkg-xen-de...@lists.alioth.debian.org) (x86_64-linux-gnu-gcc (Debian 10.2.1-6) 10.2.1 20210110) debug=n Sun Feb 28 18:49:45 UTC 2021 Bootloader: GRUB 2.02+dfsg1-20+deb10u2 kernel logs (problems reported in Dom0's syslog when trying to start this Debian bullseye Xen HVM DomU with Xen VGA/PCI passthrough configured): May 9 10:52:20 bullseye kernel: [0.00] Linux version 5.10.0-6-amd64 (debian-ker...@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.28-1 (2021-04-09) May 9 10:52:20 bullseye kernel: [0.00] Command line: placeholder root=/dev/debian/bullseye ro reboot=bios quiet console=tty1 console=hvc0 . . . 
Start a bullseye Xen HVM configured for PCI/VGA passthrough using the bullseye Xen and Qemu packages for bullseye on Dom0 (Haswell Intel IGD + audio device + USB 3.0 controller):

May 10 08:50:03 bullseye kernel: [79077.644346] pciback :00:1b.0: xen_pciback: vpci: assign to virtual slot 0
May 10 08:50:03 bullseye kernel: [79077.644478] pciback :00:1b.0: registering for 16
May 10 08:50:03 bullseye kernel: [79077.644732] pciback :00:14.0: xen_pciback: vpci: assign to virtual slot 1
May 10 08:50:03 bullseye kernel: [79077.644874] pciback :00:14.0: registering for 16
May 10 08:50:03 bullseye kernel: [79077.645024] pciback :00:02.0: xen_pciback: vpci: assign to virtual slot 2
May 10 08:50:03 bullseye kernel: [79077.645107] pciback :00:02.0: registering for 16
May 10 08:50:30 bullseye kernel: [79105.273876] vif vif-16-0 vif16.0: Guest Rx ready
May 10 08:50:30 bullseye kernel: [79105.273893] IPv6: ADDRCONF(NETDEV_CHANGE): vif16.0: link becomes ready
May 10 08:50:30 bullseye kernel: [79105.278023] xen-blkback: backend/vbd/16/51712: using 4 queues, protocol 1 (x86_64-abi) persistent grants
May 10 08:50:44 bullseye kernel: [79119.104937] irq 16: nobody cared (try booting with the "irqpoll" option)
May 10 08:50:44 bullseye kernel: [79119.104973] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-6-amd64 #1 Debian 5.10.28-1
May 10 08:50:44 bullseye kernel: [79119.104976] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B85M Pro4, BIOS P2.50 12/11/2015
May 10 08:50:44 bullseye kernel: [79119.104979] Call Trace:
May 10 08:50:44 bullseye kernel: [79119.104984]
May 10 08:50:44 bullseye kernel: [79119.104998] dump_stack+0x6b/0x83
May 10 08:50:44 bullseye kernel: [79119.105008] __report_bad_irq+0x35/0xa7
May 10 08:50:44 bullseye kernel: [79119.105014] note_interrupt.cold+0xb/0x61
May 10 08:50:44 bullseye kernel: [79119.105024] handle_irq_event+0xa8/0xb0
May 10 08:50:44 bullseye kernel: [79119.105030] handle_fasteoi_irq+0x78/0x1c0
May 10 08:50:44 bullseye kernel: [79119.105037] generic_handle_irq+0x47/0x50
May 10 08:50:44 bullseye kernel: [79119.105044] __evtchn_fifo_handle_events+0x175/0x190
May 10 08:50:44 bullseye kernel: [79119.105054] __xen_evtchn_do_upcall+0x66/0xb0
May 10 08:50:44 bullseye kernel: [79119.105063] __xen_pv_evtchn_do_upcall+0x11/0x20
May 10 08:50:44 bullseye kernel: [79119.105069] asm_call_irq_on_stack+0x12/0x20
May 10 08:50:44 bullseye kernel: [79119.105072]
May 10 08:50:44 bullseye kernel: [79119.105079] xen_pv_evtchn_do_upcall+0xa2/0xc0
May 10 08:50:44 bullseye kernel: [79119.105084] exc_xen_hypervisor_callback+0x8/0x10
May 10 08:50:44 bullseye kernel: [79119.105091] RIP: e030:xen_hypercall_sched_op+0xa/0x20
May 10 08:50:44 bullseye kernel: [79119.105097] Code: 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
May 10 08:50:44 bullseye kernel:
Bug#776742: Solved for jessie: Bug#776742: xen-utils-common: no support for VGA Passthrough
s are not needed for the build process, and they are not installed in any of the binary packages produced by my builds. 6. I am willing to contribute a fix based on one of my nmu packages to the Debian project if the Debian Xen team is interested in adding it to the Debian packages archive, either in main, contrib, or non-free as appropriate. I am not including any debdiff patches here because they are too large. Chuck Zmudzinski
Bug#776742: xen-utils-common: no support for VGA Passthrough
Here are my hardware specs: Motherboard: ASRock B85M Pro4 LGA 1150 Intel B85 HDMI SATA 6Gb/s USB 3.0 Micro ATX Intel (About 2 and a half years old) CPU: Intel core i5-4590S (Haswell 4th Generation) Chipset: Intel B85 Intel Integrated Graphics: HD4600, No external video card, I pass through the Intel integrated graphics card to my DomUs Sound: Realtek integrated on the motherboard, this can also be passed through to DomUs using PCI passthrough, as well as the USB on the motherboard, although I could not get USB 3 ports to work with passthrough, the USB 2 ports work fine though in DomUs I also have 16 GB RAM and a 240 GB SSD, with a 1 TB HD. I bought the system specifically because it had hardware specs that were known to support VGA passthrough at that time. I don't know what the best options are now if you are looking for hardware that supports VGA passthrough. About Debian stretch: As I said I have not tried it. But I think it is more likely to support VGA passthrough out of the box than Jessie because the version of Xen on stretch is 4.8, and version 4.8 supports VGA passthrough using either the newer upstream qemu or the traditional qemu, but the version of Xen on jessie is 4.4, and in that version VGA passthrough is only supported using the older traditional version of qemu which is included in Wheezy but is not available out of the box on jessie. Compare the man page of xl.cfg on jessie with the man page of xl.cfg on stretch. Look at what each says about supporting gfx passthrough: On jessie: gfx_passthru is currently only supported with the qemu-xen- traditional device-model. Upstream qemu-xen device-model currently does not have support for gfx_passthru. On stretch: gfx_passthru is currently supported both with the qemu-xen-traditional device-model and upstream qemu-xen device-model. 
The reason gfx passthrough does not work on jessie is that Debian took the traditional qemu device model out of its xen package, but that component is required for that version of Xen to function with VGA passthrough. Unless Debian backports version 4.7 or 4.8 to jessie so the upstream qemu device model will work with VGA passthrough, or unless Debian provides a supported way to install the traditional qemu device model on jessie, I don't think there will ever be a supported configuration on jessie that supports VGA passthrough. I got VGA passthrough working on my jessie system by hacking the xen source package for jessie, but I don't think it is possible to get VGA passthrough working with the current version of Debian's xen package for jessie, no matter what hardware you have. You should try it on stretch and wheezy to test your hardware for VGA passthrough functionality. Wheezy also has a better chance of working and it also works on my system with wheezy, but it is a little flaky and I had to hold back upgrades of the hypervisor for it to continue to work on wheezy. But wheezy has the traditional qemu device model, but jessie doesn't. For these reasons, you are better off trying wheezy or stretch for VGA passthrough until Debian provides a solution for jessie. Chuck On 02/12/2017 01:37 PM, Juergen Schinker wrote: can you add your hw specs and also what about Debian-Stretch? J - On 12 Feb, 2017, at 18:05, Chuck Zmudzinski brchu...@netscape.net wrote: This bug, at its core, is that currently there is no supported solution for VGA passthrough on Xen for stable version Jessie from Debian. After browsing Xen's repositories, I found out that Xen did not claim to support VGA passthrough with the upstream qemu-xen device model until Sep 25, 2015, the date the xl.cfg man page was updated to indicate support for VGA passthrough with upstream qemu-xen. 
This change to the xl.cfg man page was only made on the Xen version 4.7 and 4.8 branches, so if you want to use VGA passthrough without the traditional qemu-dm binary, you must upgrade to at least Xen version 4.7. Debian testing (currently stretch) uses Xen 4.8 and it presumably supports VGA passthrough without qemu-xen-traditional but I have not tried it. This situation leaves users of Debian stable (currently Jessie) with no supported solution from Debian for VGA passthrough on Xen. Obviously there are two solutions. Backport Xen 4.7 or greater to Jessie, or restore the traditional qemu-dm binary to the Xen 4.4.x package for Jessie. A couple of months ago I decided to try and rebuild the Xen source package for Jessie with support for qemu-xen-traditional from upstream included. It did not take long to get a working package that solves this bug. I discovered the following facts: 1. Adding qemu-xen-traditional in a way supported by Xen also requires rombios which, like qemu-xen-traditional, is disabled in Debian's official build of Xen for Jessie. 2. After configuring the build for qemu-xen-traditional and rombios, the only binary package that is modified significantly is xen-utils-4.4, which
Bug#776742: xen-utils-common: no support for VGA Passthrough
't like the motherboard changed from other them.

As far as I can tell, Xen still maintains the traditional qemu-dm, and I was recently able to rebuild the xen package for Jessie and get VGA Passthrough working on Jessie with the most recent version of the traditional device model that is available from Xen for the stable 4.4 release of Xen.

So I ask again, why can't the Debian Xen team restore qemu-dm to its official Xen package for Jessie? The only reasonable explanations I can think of are that there is some free software licensing issue with the rombios modules that are statically linked to hvmloader or with some necessary component of qemu-dm, or that the Debian Xen team has too few resources and is devoting its efforts to developing Xen for stretch rather than adding features that did not make it into Jessie when it was released. But why should oldstable Wheezy have a feature that stable Jessie does not have?

In any case, I hope the Debian Xen team can explain why qemu-dm cannot be restored to Debian Jessie's official xen package.

Thank you for your consideration of my question.

Sincerely,

Chuck Zmudzinski