Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/21/2021 9:13 AM, Chuck Zmudzinski wrote: On 9/20/2021 10:37 PM, Elliott Mitchell wrote: On Mon, Sep 20, 2021 at 10:23:39PM -0400, Chuck Zmudzinski wrote: On 9/20/21 7:39 PM, Diederik de Haas wrote: On dinsdag 21 september 2021 01:15:15 CEST Elliott Mitchell wrote: Merely having the path is a sufficiently strong indicator for me to simply wave it past. I though would suggest Debian should instead cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8. This is available as a patch at: https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8 You probably then also want the following commit, which is a fix on that patch: https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b Found that via the following url/query: https://xenbits.xen.org/gitweb/?p=xen.git&a=search&h=HEAD&st=commit&s=x86%2FACPI I don't know whether others should be used from that as well. I tried these two commits (adapted for the xen-4.14 branch) but this approach did not fix the bug - with these patches applied the dom0 did not power down. My advice for the Debian Xen Team is to consult with upstream and get their advice on whether or not it is advisable for Debian to retain the patches from the Xen-4.16 branch that have been added to the Debian 4.14 package in an attempt to support some arm devices that panic during on an unpatched Xen-4.14. If upstream cannot help Debian backport fixes for arm panics from Xen-4.16/unstable to Xen-4.14 stable, I think the Debian Xen team should remove aggressive patches that really have now turned the Debian Xen-4.14 package into a Frankenstein version that is a mixture of Xen-4.14 and Xen-4.16, and decide that support for those arm devices must wait until Debian gets Xen 4.16 up and running on the unstable and hopefully soon, testing distribution. It is still not established you're running into #991967. 
Unless the one you're pointing towards was backported to the Xen 4.11 packages (which I doubt) it cannot explain #991967, since at the time 4.11 was in use. Could be this is a second bug with symptoms similar to #991967. Now that a fix for the second bug has been identified, you might try a 4.19.181-1 kernel and see whether that fixes things.

FWIW, I tried this. Sorry, not only does this not fix things, when I shut down the dom0 running with the official Debian 4.19.181-1 kernel on the current official Debian Xen-4.14 hypervisor, the dom0 not only did not power off, it did not even reach the systemd poweroff target.

Slight correction - after a few minutes, it did finally reach the systemd poweroff target, but the power did not turn off.

Yet, it works perfectly on the official Debian Xen-4.11 hypervisor. Again, my tests cannot confirm that there is a bug in src:linux; the only common denominator for this bug in all my testing is src:xen, and it appears in all the 4.14 Xen versions for bullseye, for every single Linux version tested.

Chuck
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/20/2021 10:37 PM, Elliott Mitchell wrote: On Mon, Sep 20, 2021 at 10:23:39PM -0400, Chuck Zmudzinski wrote: On 9/20/21 7:39 PM, Diederik de Haas wrote: On dinsdag 21 september 2021 01:15:15 CEST Elliott Mitchell wrote: Merely having the path is a sufficiently strong indicator for me to simply wave it past. I though would suggest Debian should instead cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8. This is available as a patch at: https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8 You probably then also want the following commit, which is a fix on that patch: https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b Found that via the following url/query: https://xenbits.xen.org/gitweb/?p=xen.git&a=search&h=HEAD&st=commit&s=x86%2FACPI I don't know whether others should be used from that as well. I tried these two commits (adapted for the xen-4.14 branch) but this approach did not fix the bug - with these patches applied the dom0 did not power down. My advice for the Debian Xen Team is to consult with upstream and get their advice on whether or not it is advisable for Debian to retain the patches from the Xen-4.16 branch that have been added to the Debian 4.14 package in an attempt to support some arm devices that panic during on an unpatched Xen-4.14. If upstream cannot help Debian backport fixes for arm panics from Xen-4.16/unstable to Xen-4.14 stable, I think the Debian Xen team should remove aggressive patches that really have now turned the Debian Xen-4.14 package into a Frankenstein version that is a mixture of Xen-4.14 and Xen-4.16, and decide that support for those arm devices must wait until Debian gets Xen 4.16 up and running on the unstable and hopefully soon, testing distribution. It is still not established you're running into #991967. 
Unless the one you're pointing towards was backported to the Xen 4.11 packages (which I doubt) it cannot explain #991967, since at the time 4.11 was in use. Could be this is a second bug with symptoms similar to #991967. Now that a fix for the second bug has been identified, you might try a 4.19.181-1 kernel and see whether that fixes things.

FWIW, I tried this. Sorry, not only does this not fix things, when I shut down the dom0 running with the official Debian 4.19.181-1 kernel on the current official Debian Xen-4.14 hypervisor, the dom0 not only did not power off, it did not even reach the systemd poweroff target.

Yet, it works perfectly on the official Debian Xen-4.11 hypervisor. Again, my tests cannot confirm that there is a bug in src:linux; the only common denominator for this bug in all my testing is src:xen, and it appears in all the 4.14 Xen versions for bullseye, for every single Linux version tested.

Chuck
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/20/21 10:12 PM, Chuck Zmudzinski wrote: On 9/20/21 6:29 PM, Chuck Zmudzinski wrote: On 9/20/21 1:43 PM, Chuck Zmudzinski wrote: On 9/20/21 12:27 AM, Elliott Mitchell wrote: On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote: I suspect the following patch is the culprit for problems shutting down on the amd64 architecture: 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch This patch does affect amd64 acpi code, and is probably causing the problem on my amd64 system, so my build of the xen-4.14 hypervisor without this patch fixed the problem. Of the ones listed that is the only one which has any overlap with x86 code. The next reproduction step is `apt-get source xen && patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch && dpkg-buildpackage -b`. Then try with this to confirm that patch is what does it. Thing is that delta is rather small. I don't have a simulator, but that is rather small to be the culprit. I just tested the build with patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch applied before building the package and I can confirm that this is the patch causing the trouble for dom0 poweroff on x86/amd64. Reverting this patch fixes it on my amd64 system. But this would probably break the arm build. I think one possible fix would require modifying 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch so it only applies at runtime to the arm architecture. I will try some modifications to the patch instead of removing it, and if I get something that works on amd64 and also might work on arm, I will post it for Elliott to try. I have an encouraging result. 
I found a very simple patch to xen/arch/x86/acpi/lib.c that fixes the dom0 poweroff bug on my system and it should not affect the arm patches at all:

--
This patch partially reverts previous patch
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch

This hopefully fixes #991967

--- a/xen/arch/x86/acpi/lib.c 2021-09-20 16:49:08.0 -0400
+++ b/xen/arch/x86/acpi/lib.c 2021-09-20 16:25:05.572038000 -0400
@@ -46,10 +46,6 @@
     if ((phys + size) <= (1 * 1024 * 1024))
         return __va(phys);
 
-    /* No further arch specific implementation after early boot */
-    if (system_state >= SYS_STATE_boot)
-        return NULL;
-
     offset = phys & (PAGE_SIZE - 1);
     mapped_size = PAGE_SIZE - offset;
     set_fixmap(FIX_ACPI_END, phys);
--

Further testing with this patch revealed a problem. Although this simple patch causes dom0 to power off when shutting down, on the next reboot the system dropped to a single-user shell because it mixed up my SSD and my hard disk. Normally the system assigns my SSD as /dev/sda and my hard disk as /dev/sdb. But on the first reboot after running the Xen hypervisor, the system reversed them so my SSD was /dev/sdb and my hard disk was /dev/sda. The EFI partition, which is a vfat partition, is on the SSD, and /etc/fstab mounts it from /dev/sda1. After the swap it is at /dev/sdb1, and the first partition on the hard disk is not vfat, so the system drops to a root shell for system maintenance.

This switching of the devices on the subsequent reboot is another symptom of this bug I have seen in the past, and usually the ordinary behavior is restored on the next reboot or after resetting and powering off or unplugging from power. So this patch does not really fix the bug reliably.
To clarify things, I saw this strange behavior of the system switching the disk devices with this patch under the following conditions:

1) Boot using this simple patch - dom0 shuts down properly
2) Boot using Elliott's suggested patch in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=991967#94
3) It was when booting using Elliott's suggested patch that I saw the drop to single-user root for system maintenance.

Moreover, Elliott's suggested patch did not fix the dom0 power off bug. So it might be the case that this simple patch would work for both amd64 and arm devices nicely, but Elliott refuses to test it with his arm devices. Sigh.
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/20/21 10:37 PM, Elliott Mitchell wrote: On Mon, Sep 20, 2021 at 10:23:39PM -0400, Chuck Zmudzinski wrote: On 9/20/21 7:39 PM, Diederik de Haas wrote: On dinsdag 21 september 2021 01:15:15 CEST Elliott Mitchell wrote: Merely having the path is a sufficiently strong indicator for me to simply wave it past. I though would suggest Debian should instead cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8. This is available as a patch at: https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8 You probably then also want the following commit, which is a fix on that patch: https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b Found that via the following url/query: https://xenbits.xen.org/gitweb/?p=xen.git&a=search&h=HEAD&st=commit&s=x86%2FACPI I don't know whether others should be used from that as well. I tried these two commits (adapted for the xen-4.14 branch) but this approach did not fix the bug - with these patches applied the dom0 did not power down. My advice for the Debian Xen Team is to consult with upstream and get their advice on whether or not it is advisable for Debian to retain the patches from the Xen-4.16 branch that have been added to the Debian 4.14 package in an attempt to support some arm devices that panic during on an unpatched Xen-4.14. If upstream cannot help Debian backport fixes for arm panics from Xen-4.16/unstable to Xen-4.14 stable, I think the Debian Xen team should remove aggressive patches that really have now turned the Debian Xen-4.14 package into a Frankenstein version that is a mixture of Xen-4.14 and Xen-4.16, and decide that support for those arm devices must wait until Debian gets Xen 4.16 up and running on the unstable and hopefully soon, testing distribution. It is still not established you're running into #991967. 
Unless the one you're pointing towards was backported to the Xen 4.11 packages (which I doubt) it cannot explain #991967, since at the time 4.11 was in use. Could be this is a second bug with symptoms similar to #991967. Now that a fix for the second bug has been identified, you might try a 4.19.181-1 kernel and see whether that fixes things. I presume you are suggesting I try booting 4.19.181-1 on the current version of Xen-4.14 for bullseye as a dom0. I am not inclined to try it until an official Debian developer endorses your opinion that the bug I am seeing is distinct from #991967, at which point I will report the bug I am seeing as a new bug. Regards, Chuck Zmudzinski
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/20/21 7:39 PM, Diederik de Haas wrote:
> On Tuesday, 21 September 2021 01:15:15 CEST Elliott Mitchell wrote:
>> Merely having the path is a sufficiently strong indicator for me to
>> simply wave it past. I would, though, suggest Debian should instead
>> cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.
>>
>> This is available as a patch at:
>>
>> https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8
> You probably then also want the following commit, which is a fix on that
> patch:
> https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b
>
> Found that via the following url/query:
> https://xenbits.xen.org/gitweb/?p=xen.git&a=search&h=HEAD&st=commit&s=x86%2FACPI
>
> I don't know whether others should be used from that as well.

I tried these two commits (adapted for the xen-4.14 branch) but this approach did not fix the bug - with these patches applied the dom0 did not power down.

My advice for the Debian Xen Team is to consult with upstream and get their advice on whether or not it is advisable for Debian to retain the patches from the Xen-4.16 branch that have been added to the Debian 4.14 package in an attempt to support some arm devices that panic on an unpatched Xen-4.14. If upstream cannot help Debian backport fixes for arm panics from Xen-4.16/unstable to Xen-4.14 stable, I think the Debian Xen team should remove aggressive patches that really have now turned the Debian Xen-4.14 package into a Frankenstein version that is a mixture of Xen-4.14 and Xen-4.16, and decide that support for those arm devices must wait until Debian gets Xen 4.16 up and running on the unstable and, hopefully soon, testing distributions.
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/20/21 6:29 PM, Chuck Zmudzinski wrote: On 9/20/21 1:43 PM, Chuck Zmudzinski wrote: On 9/20/21 12:27 AM, Elliott Mitchell wrote: On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote: I suspect the following patch is the culprit for problems shutting down on the amd64 architecture: 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch This patch does affect amd64 acpi code, and is probably causing the problem on my amd64 system, so my build of the xen-4.14 hypervisor without this patch fixed the problem. Of the ones listed that is the only one which has any overlap with x86 code. The next reproduction step is `apt-get source xen && patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch && dpkg-buildpackage -b`. Then try with this to confirm that patch is what does it. Thing is that delta is rather small. I don't have a simulator, but that is rather small to be the culprit. I just tested the build with patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch applied before building the package and I can confirm that this is the patch causing the trouble for dom0 poweroff on x86/amd64. Reverting this patch fixes it on my amd64 system. But this would probably break the arm build. I think one possible fix would require modifying 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch so it only applies at runtime to the arm architecture. I will try some modifications to the patch instead of removing it, and if I get something that works on amd64 and also might work on arm, I will post it for Elliott to try. I have an encouraging result. 
I found a very simple patch to xen/arch/x86/acpi/lib.c that fixes the dom0 poweroff bug on my system and it should not affect the arm patches at all:

--
This patch partially reverts previous patch
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch

This hopefully fixes #991967

--- a/xen/arch/x86/acpi/lib.c 2021-09-20 16:49:08.0 -0400
+++ b/xen/arch/x86/acpi/lib.c 2021-09-20 16:25:05.572038000 -0400
@@ -46,10 +46,6 @@
     if ((phys + size) <= (1 * 1024 * 1024))
         return __va(phys);
 
-    /* No further arch specific implementation after early boot */
-    if (system_state >= SYS_STATE_boot)
-        return NULL;
-
     offset = phys & (PAGE_SIZE - 1);
     mapped_size = PAGE_SIZE - offset;
     set_fixmap(FIX_ACPI_END, phys);
--

Further testing with this patch revealed a problem. Although this simple patch causes dom0 to power off when shutting down, on the next reboot the system dropped to a single-user shell because it mixed up my SSD and my hard disk. Normally the system assigns my SSD as /dev/sda and my hard disk as /dev/sdb. But on the first reboot after running the Xen hypervisor, the system reversed them so my SSD was /dev/sdb and my hard disk was /dev/sda. The EFI partition, which is a vfat partition, is on the SSD, and /etc/fstab mounts it from /dev/sda1. After the swap it is at /dev/sdb1, and the first partition on the hard disk is not vfat, so the system drops to a root shell for system maintenance.

This switching of the devices on the subsequent reboot is another symptom of this bug I have seen in the past, and usually the ordinary behavior is restored on the next reboot or after resetting and powering off or unplugging from power. So this patch does not really fix the bug reliably.
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On Mon, Sep 20, 2021 at 10:23:39PM -0400, Chuck Zmudzinski wrote:
>
> On 9/20/21 7:39 PM, Diederik de Haas wrote:
> > On Tuesday, 21 September 2021 01:15:15 CEST Elliott Mitchell wrote:
> >> Merely having the path is a sufficiently strong indicator for me to
> >> simply wave it past. I would, though, suggest Debian should instead
> >> cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.
> >>
> >> This is available as a patch at:
> >>
> >> https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8
> > You probably then also want the following commit, which is a fix on that
> > patch:
> > https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b
> >
> > Found that via the following url/query:
> > https://xenbits.xen.org/gitweb/?p=xen.git&a=search&h=HEAD&st=commit&s=x86%2FACPI
> >
> > I don't know whether others should be used from that as well.
>
> I tried these two commits (adapted for the xen-4.14 branch) but this
> approach did not fix the bug - with these patches applied the dom0
> did not power down.
>
> My advice for the Debian Xen Team is to consult with upstream and
> get their advice on whether or not it is advisable for Debian to
> retain the patches from the Xen-4.16 branch that have been
> added to the Debian 4.14 package in an attempt to support
> some arm devices that panic on an unpatched Xen-4.14.
> If upstream cannot help Debian backport fixes for arm panics
> from Xen-4.16/unstable to Xen-4.14 stable, I think the Debian
> Xen team should remove aggressive patches that really have now
> turned the Debian Xen-4.14 package into a Frankenstein version
> that is a mixture of Xen-4.14 and Xen-4.16, and decide that support
> for those arm devices must wait until Debian gets Xen 4.16 up
> and running on the unstable and, hopefully soon, testing distributions.

It is still not established you're running into #991967.

Unless the one you're pointing towards was backported to the Xen 4.11 packages (which I doubt) it cannot explain #991967, since at the time 4.11 was in use. Could be this is a second bug with symptoms similar to #991967. Now that a fix for the second bug has been identified, you might try a 4.19.181-1 kernel and see whether that fixes things.

--
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
 \BS (| ehem+sig...@m5p.com PGP 87145445 |) /
 \_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/20/21 1:43 PM, Chuck Zmudzinski wrote: On 9/20/21 12:27 AM, Elliott Mitchell wrote: On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote: I suspect the following patch is the culprit for problems shutting down on the amd64 architecture: 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch This patch does affect amd64 acpi code, and is probably causing the problem on my amd64 system, so my build of the xen-4.14 hypervisor without this patch fixed the problem. Of the ones listed that is the only one which has any overlap with x86 code. The next reproduction step is `apt-get source xen && patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch && dpkg-buildpackage -b`. Then try with this to confirm that patch is what does it. Thing is that delta is rather small. I don't have a simulator, but that is rather small to be the culprit. I just tested the build with patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch applied before building the package and I can confirm that this is the patch causing the trouble for dom0 poweroff on x86/amd64. Reverting this patch fixes it on my amd64 system. But this would probably break the arm build. I think one possible fix would require modifying 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch so it only applies at runtime to the arm architecture. I will try some modifications to the patch instead of removing it, and if I get something that works on amd64 and also might work on arm, I will post it for Elliott to try. I have an encouraging result. 
I found a very simple patch to xen/arch/x86/acpi/lib.c that fixes the dom0 poweroff bug on my system and it should not affect the arm patches at all:

--
This patch partially reverts previous patch
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch

This hopefully fixes #991967

--- a/xen/arch/x86/acpi/lib.c 2021-09-20 16:49:08.0 -0400
+++ b/xen/arch/x86/acpi/lib.c 2021-09-20 16:25:05.572038000 -0400
@@ -46,10 +46,6 @@
     if ((phys + size) <= (1 * 1024 * 1024))
         return __va(phys);
 
-    /* No further arch specific implementation after early boot */
-    if (system_state >= SYS_STATE_boot)
-        return NULL;
-
     offset = phys & (PAGE_SIZE - 1);
     mapped_size = PAGE_SIZE - offset;
     set_fixmap(FIX_ACPI_END, phys);
--

Can you try this patch to src:xen and see if your arm devices are OK with it?
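For readers following the diff above, here is a minimal sketch of what the removed guard does, written as a standalone C model and not taken from the Xen source. The names `map_table_model`, `fake_low_memory`, `fake_fixmap` and the `guarded` flag are hypothetical, and the three-value enum is a simplified stand-in for the hypervisor's `system_state`. Only the two checks themselves come from the hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the hypervisor's boot-state tracking (assumption:
 * only the ordering early_boot < boot < active matters for this model). */
enum sys_state_model {
    MODEL_STATE_early_boot,
    MODEL_STATE_boot,
    MODEL_STATE_active
};

/* Dummy objects standing in for a direct-map address and a fixmap slot. */
static char fake_low_memory;
static char fake_fixmap;

/*
 * Hypothetical model of the mapping helper touched by the diff.
 * guarded != 0 keeps the "after early boot" check that the diff removes.
 */
static void *map_table_model(enum sys_state_model state,
                             uint64_t phys, uint64_t size, int guarded)
{
    assert(size > 0);

    /* Regions entirely below 1 MiB are always reachable (first check in the hunk). */
    if (phys + size <= (uint64_t)1 * 1024 * 1024)
        return &fake_low_memory;

    /* The removed lines: refuse to map once early boot is over. */
    if (guarded && state >= MODEL_STATE_boot)
        return NULL;

    /* Otherwise fall through to the early-boot fixmap path. */
    return &fake_fixmap;
}
```

In this model, a request for a region above 1 MiB made late in the system's life (`MODEL_STATE_active`) returns NULL while the guard is present, and falls through to the fixmap path once the guard is removed. Whether that fallback is safe outside early boot is a question the thread leaves open.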
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On Tuesday, 21 September 2021 01:15:15 CEST Elliott Mitchell wrote:
> Merely having the path is a sufficiently strong indicator for me to
> simply wave it past. I would, though, suggest Debian should instead
> cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.
>
> This is available as a patch at:
>
> https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8

You probably then also want the following commit, which is a fix on that patch:
https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b

Found that via the following url/query:
https://xenbits.xen.org/gitweb/?p=xen.git&a=search&h=HEAD&st=commit&s=x86%2FACPI

I don't know whether others should be used from that as well.
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On Mon, Sep 20, 2021 at 06:29:49PM -0400, Chuck Zmudzinski wrote:
> On 9/20/21 1:43 PM, Chuck Zmudzinski wrote:
> >
> > On 9/20/21 12:27 AM, Elliott Mitchell wrote:
> >> On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:
> >>
> >>> I suspect the following patch is the culprit for problems
> >>> shutting down on the amd64 architecture:
> >>>
> >>> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> >>> This patch does affect amd64 acpi code, and is probably causing
> >>> the problem on my amd64 system, so my build of the xen-4.14
> >>> hypervisor without this patch fixed the problem.
> >> Of the ones listed that is the only one which has any overlap with x86
> >> code. The next reproduction step is `apt-get source xen &&
> >> patch -p1 -R <
> >> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> >> && dpkg-buildpackage -b`. Then try with this to confirm that patch
> >> is what does it.
> >>
> >> Thing is that delta is rather small. I don't have a simulator, but that
> >> is rather small to be the culprit.
> >
> > I just tested the build with
> > patch -p1 -R <
> > 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> > applied before building the package and I can confirm that this is the
> > patch causing the trouble for dom0 poweroff on x86/amd64. Reverting this
> > patch fixes it on my amd64 system. But this would probably break the arm
> > build.
> >
> > I think one possible fix would require modifying
> > 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> > so it only applies at runtime to the arm architecture. I will try some
> > modifications to the patch instead of removing it, and if I get something
> > that works on amd64 and also might work on arm, I will post it
> > for Elliott to try.
>
> I have an encouraging result. I found a very simple patch
> to xen/arch/x86/acpi/lib.c that fixes the dom0 poweroff
> bug on my system and it should not affect the arm patches
> at all:
> --
> This patch partially reverts previous patch
> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
>
> This hopefully fixes #991967
>
> --- a/xen/arch/x86/acpi/lib.c 2021-09-20 16:49:08.0 -0400
> +++ b/xen/arch/x86/acpi/lib.c 2021-09-20 16:25:05.572038000 -0400
> @@ -46,10 +46,6 @@
>      if ((phys + size) <= (1 * 1024 * 1024))
>          return __va(phys);
>
> -    /* No further arch specific implementation after early boot */
> -    if (system_state >= SYS_STATE_boot)
> -        return NULL;
> -
>      offset = phys & (PAGE_SIZE - 1);
>      mapped_size = PAGE_SIZE - offset;
>      set_fixmap(FIX_ACPI_END, phys);
> --
>
> Can you try this patch to src:xen and see if your
> arm devices are OK with it?

Merely having the path is a sufficiently strong indicator for me to simply wave it past. I would, though, suggest Debian should instead cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.

This is available as a patch at:

https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8

The other commit I would suggest being picked by src:xen is 5a4087004d1adbbb223925f3306db0e5824a2bdc. This is for device-tree funkiness which got added between linux-5.10.0 and linux-5.10.y (if the Debian kernel team wants to maintain a fix in Debian's kernel source, that works too).

BTW have I mentioned I've become rather skeptical of device-trees being a usable way of representing hardware information?

--
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/)
 \BS (| ehem+sig...@m5p.com PGP 87145445 |) /
 \_CS\ | _ -O #include O- _ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/20/21 12:27 AM, Elliott Mitchell wrote: On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote: I suspect the following patch is the culprit for problems shutting down on the amd64 architecture: 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch This patch does affect amd64 acpi code, and is probably causing the problem on my amd64 system, so my build of the xen-4.14 hypervisor without this patch fixed the problem. Of the ones listed that is the only one which has any overlap with x86 code. The next reproduction step is `apt-get source xen && patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch && dpkg-buildpackage -b`. Then try with this to confirm that patch is what does it. Thing is that delta is rather small. I don't have a simulator, but that is rather small to be the culprit. I just tested the build with patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch applied before building the package and I can confirm that this is the patch causing the trouble for dom0 poweroff on x86/amd64. Reverting this patch fixes it on my amd64 system. But this would probably break the arm build. I think one possible fix would require modifying 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch so it only applies at runtime to the arm architecture. I will try some modifications to the patch instead of removing it, and if I get something that works on amd64 and also might work on arm, I will post it for Elliott to try.
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/20/21 12:27 AM, Elliott Mitchell wrote: On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote: xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64 linux kernel version: 5.10.46-4 (the current amd64 kernel for bullseye) Boot system: EFI, not using secure boot, booting xen hypervisor and dom0 bullseye with grub-efi package for bullseye, and it boots the xen-4.14-amd64.gz file, not the xen-4.14-amd64.efi file. I also tested a buster dom0 with the 4.19 series kernel on the xen-4.14 hypervisor from bullseye and saw the problem, but I did not see the problem with either a buster (linux 4.19) or bullseye (linux 5.10) dom0 on the xen-4.11 hypervisor, so I think the problem is with the Debian version of the xen-4.14 hypervisor, not with src:linux. You're referencing several software versions which are mismatches for #991967. #991967 was observed with Xen 4.11 and Linux kernel 4.19.194-3, but not Linux kernel 4.19.181. The fact it correlates with a Linux kernel update rather strongly points to the Linux kernel. I could believe the situation is partially the fault of both though. I don't see it with Xen-4.11 and Linux kernel 4.19.194-3 which is the current default dom0 configuration on Debian buster, but I do see it with Debian's version of Xen-4.14 and either Linux kernel 4.19.194-3 from buster or Linux kernel 5.10.46-4 from bullseye as the dom0. So I only saw it with the update of the Xen hypervisor from 4.11 to 4.14. Of course you have different hardware and a different acpi implementation which is also likely to be a factor that determines whether or not the dom0 poweroff bug manifests itself. I suspect the following patch is the culprit for problems shutting down on the amd64 architecture: 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch This patch does affect amd64 acpi code, and is probably causing the problem on my amd64 system, so my build of the xen-4.14 hypervisor without this patch fixed the problem. 
Of the ones listed that is the only one which has any overlap with x86 code. The next reproduction step is `apt-get source xen && patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch && dpkg-buildpackage -b`. Then try with this to confirm that patch is what does it. Thing is that delta is rather small. I don't have a simulator, but that is rather small to be the culprit.

I did try to remove this single patch from the xen build using quilt, but quilt was not happy when it tried to apply the subsequent arm patch, so I just removed all the subsequent arm patches to keep quilt happy with my modified xen src tree. I will try it now, though. If it is this small a delta that is causing the problem on x86/amd64, then maybe we can come up with a workaround in src:xen that is acceptable for both arm and x86/amd64.

I think this bug should be re-classified as a bug in src:xen.

There could be a separate bug in src:xen, but that is not #991967.

I also would inquire with the Debian Xen Team about why they are backporting patches from the upstream xen unstable branch into Debian's 4.14 package that is currently shipping on Debian stable (bullseye). IMHO, the aforementioned patches that are not in the stable 4.14 branch upstream should not be included in the xen package for Debian stable.

It was requested since someone trying to have Xen operational on a device needed those for operation. Rather a lot of bugfix or very small standalone feature patches get cherry-picked. Presently I haven't been convinced this is a Xen bug (though it does affect Xen installations). Any chance you've got the tools to build and try a 5.5.0 or 5.10.0 Linux kernel? I suspect something got incorrectly backported on the Linux side (alternatively the Xen project seems a bit poor at keeping needed patches in Linux).
Yes, I recently built and tested a slightly modified Debian bullseye kernel to test a fix for #983357: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=983357 If you have a patch for Debian's 5.10 bullseye kernel that might fix the dom0 poweroff bug I am seeing on bullseye with Debian's current Xen 4.14, I am willing to try it out on my system as an alternate fix from the fix I discovered in src:xen that unfortunately removes arm patches that are needed by some devices.
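The results reported in this message for the ASRock/Haswell test system can be tabulated to show why they point at the hypervisor rather than the kernel. The following is only a minimal sketch: the table layout and the helper names (`OBSERVATIONS`, `outcomes_by`, `predicts_outcome`) are made up for illustration, and it covers only the three combinations stated explicitly above, not the Xen 4.11 / 4.19.181 observation from the original #991967 report, which was made on different hardware.

```python
from collections import defaultdict

# (Xen hypervisor, dom0 kernel, dom0 powers off correctly?)
# Only the combinations stated in the message above, on one amd64 machine.
OBSERVATIONS = [
    ("4.11", "4.19.194-3", True),
    ("4.14", "4.19.194-3", False),
    ("4.14", "5.10.46-4", False),
]


def outcomes_by(column):
    """Group the observed outcomes by the value found in the given column."""
    grouped = defaultdict(set)
    for row in OBSERVATIONS:
        grouped[row[column]].add(row[2])
    return dict(grouped)


def predicts_outcome(column):
    """True if every value in the column always led to the same outcome."""
    return all(len(results) == 1 for results in outcomes_by(column).values())


if __name__ == "__main__":
    print("Xen version alone predicts the outcome:", predicts_outcome(0))
    print("Kernel version alone predicts the outcome:", predicts_outcome(1))
```

On this small data set the Xen version separates working from failing shutdowns while the kernel version does not, since 4.19.194-3 appears on both sides. That says nothing about other hardware, where a different ACPI implementation may change the picture.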
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/19/2021 9:30 PM, Chuck Zmudzinski wrote:
> On 9/19/2021 4:53 PM, Elliott Mitchell wrote:
>> On Sun, Sep 19, 2021 at 03:54:01PM -0400, Chuck Zmudzinski wrote:
>>> On 9/19/2021 1:29 PM, Elliott Mitchell wrote:
>>>> Have you tried memory ballooning with PVH or HVM domains? That combination has been reliably crashing Xen for me for a while. Apparently few others have run into it, yet it is reliable for me. Have you tried the combination? Works? Panics?
>>>
>>> I have not tried ballooning HVM or PVH domains. If the Xen hypervisor is crashing when ballooning unprivileged domains, doesn't that support my belief that there are bugs in src:xen rather than in src:linux?
>>
>> No.
>
> I still think the patches to fix a panic on devices using the arm architecture are a bit aggressive for the Debian Xen package for Debian stable. Those patches upstream are intended for Xen unstable, which is currently Xen 4.16. Such patches do not belong in a stable Xen 4.14 package for Debian stable, especially after it can be proven they cause a regression for Xen users of amd64 devices, the regression being that they break the proper shutdown functioning of amd64 devices.
>
> I think the correct Debian way to support the arm devices that panic on a true upstream Xen 4.14 hypervisor without the patches for arm that cause dom0 to not power off properly on amd64 is by first testing the arm patches as part of a new Xen 4.16 unstable Xen package for Debian unstable, then follow ordinary development procedures for porting Xen 4.16 to bookworm/testing, and then finally a backport of Xen 4.16 to bullseye. That is the only way I can see this being done without causing grief to Xen users who want a stable Xen on a stable Debian, unless upstream can help with porting the arm patches back to Xen 4.14 in such a way that they don't break things on amd64.
>
>> This was also deliberately not copied to #991967 since this is unrelated.
>>
>> I'm concerned this second one might be Debian, but the small delta makes me think it likely originates from upstream Xen. I was wondering whether you had seen it since I haven't found other reports. (note, if you try recreating, this is a Xen panic, all domains get lost)
>
> This is off-topic for bug #991968.
>
> Regards,
> Chuck

Also off-topic for bug #991967 - sorry about the typo.

Chuck
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/19/2021 4:53 PM, Elliott Mitchell wrote:
> On Sun, Sep 19, 2021 at 03:54:01PM -0400, Chuck Zmudzinski wrote:
>> On 9/19/2021 1:29 PM, Elliott Mitchell wrote:
>>> Have you tried memory ballooning with PVH or HVM domains? That combination has been reliably crashing Xen for me for a while. Apparently few others have run into it, yet it is reliable for me. Have you tried the combination? Works? Panics?
>>
>> I have not tried ballooning HVM or PVH domains. If the Xen hypervisor is crashing when ballooning unprivileged domains, doesn't that support my belief that there are bugs in src:xen rather than in src:linux?
>
> No.

I still think the patches to fix a panic on devices using the arm architecture are a bit aggressive for the Debian Xen package for Debian stable. Those patches upstream are intended for Xen unstable, which is currently Xen 4.16. Such patches do not belong in a stable Xen 4.14 package for Debian stable, especially after it can be proven they cause a regression for Xen users of amd64 devices, the regression being that they break the proper shutdown functioning of amd64 devices.

I think the correct Debian way to support the arm devices that panic on a true upstream Xen 4.14 hypervisor without the patches for arm that cause dom0 to not power off properly on amd64 is by first testing the arm patches as part of a new Xen 4.16 unstable Xen package for Debian unstable, then follow ordinary development procedures for porting Xen 4.16 to bookworm/testing, and then finally a backport of Xen 4.16 to bullseye. That is the only way I can see this being done without causing grief to Xen users who want a stable Xen on a stable Debian, unless upstream can help with porting the arm patches back to Xen 4.14 in such a way that they don't break things on amd64.

> This was also deliberately not copied to #991967 since this is unrelated.
>
> I'm concerned this second one might be Debian, but the small delta makes me think it likely originates from upstream Xen. I was wondering whether you had seen it since I haven't found other reports. (note, if you try recreating, this is a Xen panic, all domains get lost)

This is off-topic for bug #991968.

Regards,
Chuck
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:
> xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64
>
> linux kernel version: 5.10.46-4 (the current amd64 kernel for bullseye)
>
> Boot system: EFI, not using secure boot, booting xen hypervisor and dom0 bullseye with grub-efi package for bullseye, and it boots the xen-4.14-amd64.gz file, not the xen-4.14-amd64.efi file.
>
> I also tested a buster dom0 with the 4.19 series kernel on the xen-4.14 hypervisor from bullseye and saw the problem, but I did not see the problem with either a buster (linux 4.19) or bullseye (linux 5.10) dom0 on the xen-4.11 hypervisor, so I think the problem is with the Debian version of the xen-4.14 hypervisor, not with src:linux.

You're referencing several software versions which are mismatches for #991967. #991967 was observed with Xen 4.11 and Linux kernel 4.19.194-3, but not Linux kernel 4.19.181. The fact it correlates with a Linux kernel update rather strongly points to the Linux kernel. I could believe the situation is partially the fault of both though.

> I suspect the following patch is the culprit for problems shutting down on the amd64 architecture:
>
> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
>
> This patch does affect amd64 acpi code, and is probably causing the problem on my amd64 system, so my build of the xen-4.14 hypervisor without this patch fixed the problem.

Of the ones listed that is the only one which has any overlap with x86 code. The next reproduction step is `apt-get source xen && cd xen-*/ && patch -p1 -R < debian/patches/0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch && dpkg-buildpackage -b`. Then try with this to confirm that patch is what does it. Thing is that delta is rather small. I don't have a simulator, but that is rather small to be the culprit.

> I think this bug should be re-classified as a bug in src:xen.

There could be a separate bug in src:xen, but that is not #991967.

> I also would inquire with the Debian Xen Team about why they are backporting patches from the upstream xen unstable branch into Debian's 4.14 package that is currently shipping on Debian stable (bullseye). IMHO, the aforementioned patches that are not in the stable 4.14 branch upstream should not be included in the xen package for Debian stable.

It was requested since someone trying to have Xen operational on a device needed those for operation. Rather a lot of bugfix or very small standalone feature patches get cherry-picked.

Presently I haven't been convinced this is a Xen bug (though it does affect Xen installations).

Any chance you've got the tools to build and try a 5.5.0 or 5.10.0 Linux kernel? I'm suspecting something got incorrectly backported on the Linux side (alternatively the Xen project seems a bit poor at keeping needed patches in Linux).

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/19/2021 1:29 PM, Elliott Mitchell wrote:
> On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:
>> I have noticed this bug on bullseye ever since I started running bullseye as a dom0, but my testing indicates that the problem is not in src:linux; it appeared in src:xen with the 4.14 version of xen on bullseye.
>>
>> Elliott, are you only seeing the problem on Debian's xen-4.14 hypervisor? Also, which architecture, arm or amd64? I only see the problem on the Debian xen-4.14 hypervisor, I have only tested on amd64, and I have found a fix for my amd64 system, which is as follows:
>>
>> Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015, with a Haswell CPU (core i5-4590S)
>>
>> xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64
>>
>> linux kernel version: 5.10.46-4 (the current amd64 kernel for bullseye)
>>
>> Boot system: EFI, not using secure boot, booting xen hypervisor and dom0 bullseye with grub-efi package for bullseye, and it boots the xen-4.14-amd64.gz file, not the xen-4.14-amd64.efi file.
>
> Actually hardware which is pretty different from mine, so you may run into distinct bugs.
>
> Have you tried PVH or HVM domains?

HVM domains: Yes, and they work normally on all Debian versions I have tried.
PVH domains: No, I have not tried these on Debian.

> Have you tried memory ballooning with PVH or HVM domains? That combination has been reliably crashing Xen for me for a while. Apparently few others have run into it, yet it is reliable for me. Have you tried the combination? Works? Panics?

I have not tried ballooning HVM or PVH domains. If the Xen hypervisor is crashing when ballooning unprivileged domains, doesn't that support my belief that there are bugs in src:xen rather than in src:linux?

Regards,
Chuck
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/19/2021 10:56 AM, Elliott Mitchell wrote:
> On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:
>> On Sat, 11 Sep 2021 13:29:12 +0200 Salvatore Bonaccorso wrote:
>>> On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote:
>>>> An experiment led to a potential alternative explanation for #991967. The issue may be ACPI (non-UEFI) powerdown/reset was broken at 4.19.194-3. Presence of Xen on the system may be unrelated.
>>>>
>>>> Failing that, it could be Xen and non-UEFI systems are affected. (Xen was tried on a UEFI system and the issue wasn't observed)
>>>
>>> Following up on https://bugs.debian.org/991967#12
>>>
>>> Did you succeed in bisecting the issue as you seem to have it reproducible?
>>
>> I have noticed this bug on bullseye ever since I started running bullseye as a dom0, but my testing indicates that the problem is not in src:linux; it appeared in src:xen with the 4.14 version of xen on bullseye.
>>
>> Elliott, are you only seeing the problem on Debian's xen-4.14 hypervisor? Also, which architecture, arm or amd64? I only see the problem on the Debian xen-4.14 hypervisor, I have only tested on amd64, and I have found a fix for my amd64 system, which is as follows:
>>
>> Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015, with a Haswell CPU (core i5-4590S)
>>
>> xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64
>>
>> linux kernel version: 5.10.46-4 (the current amd64 kernel for bullseye)
>
> Nope. As per the report the problem appeared with kernel 4.19.194-3 and at the time using Xen 4.11. The kernel you're listing is rather more recent, which might suggest a patch which had been backported from 5.x to 4.19.
>
> I could believe a Xen security update being the trigger though (I don't recall there being one at the right time, but I wouldn't rule it out).
>
>> Boot system: EFI, not using secure boot, booting xen hypervisor and dom0 bullseye with grub-efi package for bullseye, and it boots the xen-4.14-amd64.gz file, not the xen-4.14-amd64.efi file.
>>
>> I also tested a buster dom0 with the 4.19 series kernel on the xen-4.14 hypervisor from bullseye and saw the problem, but I did not see the problem with either a buster (linux 4.19) or bullseye (linux 5.10) dom0 on the xen-4.11 hypervisor, so I think the problem is with the Debian version of the xen-4.14 hypervisor, not with src:linux.
>
> Just to make sure, the kernel you were testing was 4.19.194-3? The issue didn't manifest with kernels earlier than that.

I will check again with a buster dom0 when I get a chance, probably late tonight or tomorrow. I think it was 4.19.194-3, if that is the latest buster kernel, because I don't think there has been an update to the buster kernel since I tested it.

> Could be we're seeing distinct bugs.

I could agree if the problem shows up on my system with the 4.19.194-3 kernel dom0 on xen-4.11, but if not, then it is probably the same bug, a bug that is in src:xen, not src:linux.

>> This patch does affect amd64 acpi code, and is probably causing the problem on my amd64 system, so my build of the xen-4.14 hypervisor without this patch fixed the problem.
>
> While that commit modifies the code path the processor takes, the modified path appears identical.
>
>> I also would inquire with the Debian Xen Team about why they are backporting patches from the upstream xen unstable branch into Debian's 4.14 package that is currently shipping on Debian stable (bullseye). IMHO, the aforementioned patches that are not in the stable 4.14 branch upstream should not be included in the xen package for Debian stable.
>
> Some people are asking for those. Those are bugfixes for an extremely popular device which panics on boot without the patches.

The raspberry pi, I presume.

> Meanwhile it turned out that between 5.10.0 and 5.10.30 the ARM64 device-trees were modified in a way which broke Xen 4.14 on ARM64. The change violated Linux's own standards for device-trees, yet still appeared in a stable branch.
>
> In other news, if you see device-trees compared to ACPI tables, they're not very comparable. 99% of ACPI tables work for all versions of all OSes. Any given device-tree is only likely to work for a single version of a single OS. While a useful abstraction for portions of kernel code, device-trees are utter garbage compared to ACPI tables.

Well, now we are at Debian stable with 5.10.x for linux and 4.14.x for xen, so we are kind of stuck with these versions on Debian stable now. I am all for tweaking the Debian stable packages to support raspberry and amd64. The question is, what is the quickest and least disturbing way to fix it now?

All the best,
Chuck
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On 9/19/2021 1:05 AM, Chuck Zmudzinski wrote:
> Hello Elliott and Salvatore,
>
> I have noticed this bug on bullseye ever since I started running bullseye as a dom0, but my testing indicates that the problem is not in src:linux; it appeared in src:xen with the 4.14 version of xen on bullseye.
>
> Elliott, are you only seeing the problem on Debian's xen-4.14 hypervisor? Also, which architecture, arm or amd64? I only see the problem on the Debian xen-4.14 hypervisor, I have only tested on amd64, and I have found a fix for my amd64 system, which is as follows:
>
> Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015, with a Haswell CPU (core i5-4590S)
>
> xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64
>
> linux kernel version: 5.10.46-4 (the current amd64 kernel for bullseye)
>
> Boot system: EFI, not using secure boot, booting xen hypervisor and dom0 bullseye with grub-efi package for bullseye, and it boots the xen-4.14-amd64.gz file, not the xen-4.14-amd64.efi file.
>
> I also tested a buster dom0 with the 4.19 series kernel on the xen-4.14 hypervisor from bullseye and saw the problem, but I did not see the problem with either a buster (linux 4.19) or bullseye (linux 5.10) dom0 on the xen-4.11 hypervisor, so I think the problem is with the Debian version of the xen-4.14 hypervisor, not with src:linux.
>
> I also found a fix in src:xen:
>
> I noticed the series of patches in debian/patches of the 4.14.2+25-gb6a8c4f72d-2 version of src:xen (and earlier versions of xen-4.14 on Debian) have several patches backported from the unstable branch of xen upstream. By removing some of these patches from the patches series of the src:xen package, the dom0 shuts down as expected on my ASRock Haswell motherboard.
>
> I rebuilt the src:xen package after removing the following patches from the debian/patches series and the result was that the computer shuts down as expected if I boot using the patched hypervisor:
>
> 0027-xen-rpi4-implement-watchdog-based-reset.patch
> 0028-tools-python-Pass-linker-to-Python-build-process.patch
> 0029-xen-arm-acpi-Don-t-fail-if-SPCR-table-is-absent.patch
> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> 0031-xen-arm-acpi-The-fixmap-area-should-always-be-cleare.patch
> 0032-xen-arm-Check-if-the-platform-is-not-using-ACPI-befo.patch
> 0033-xen-arm-Introduce-fw_unreserved_regions-and-use-it.patch
> 0034-xen-arm-acpi-add-BAD_MADT_GICC_ENTRY-macro.patch
> 0035-xen-arm-traps-Don-t-panic-when-receiving-an-unknown-.patch
>
> Most of these patches seem unrelated to the amd64 architecture and instead affect the arm architecture, so removing all of them is probably more than is needed to fix this bug. I removed them all because I could not find them upstream on the 4.14 branch, only on the xen unstable branch (I did not check whether they are on the 4.15 branch upstream), and I wanted to test a true upstream 4.14 version without these seemingly aggressive patches added by Debian from the unstable branch of xen upstream. Being more conservative and not adding these patches from the unstable branch upstream fixed the problem!
>
> I suspect the following patch is the culprit for problems shutting down on the amd64 architecture:
>
> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
>
> The commit log for this patch states:
>
> From: Julien Grall
> Date: Sat, 26 Sep 2020 17:44:29 +0100
> Subject: xen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory()
>
> The functions acpi_os_{un,}map_memory() are meant to be arch-agnostic while the __acpi_os_{un,}map_memory() are meant to be arch-specific.
>
> Currently, the former are still containing x86 specific code.
>
> To avoid this rather strange split, the generic helpers are reworked so they are arch-agnostic. This requires the introduction of a new helper __acpi_os_unmap_memory() that will undo any mapping done by __acpi_os_map_memory().
>
> Currently, the arch-helper for unmap is basically a no-op so it only returns whether the mapping was arch specific. But this will change in the future.
>
> Note that the x86 version of acpi_os_map_memory() was already able to able the 1MB region. Hence why there is no addition of new code.
>
> Signed-off-by: Julien Grall
> Reviewed-by: Rahul Singh
> Reviewed-by: Jan Beulich
> Acked-by: Stefano Stabellini
> Tested-by: Rahul Singh
> Tested-by: Elliott Mitchell
> (cherry picked from commit 1c4aa69ca1e1fad20b2158051eb152276d1eb973)
> ---
>
> This patch does affect amd64 acpi code, and is probably causing the problem on my amd64 system, so my build of the xen-4.14 hypervisor without this patch fixed the problem.
>
> I think this bug should be re-classified as a bug in src:xen.
>
> I also would inquire with the Debian Xen Team about why they are backporting patches from the upstream xen unstable branch into Debian's 4.14 package that is currently shipping on Debian stable (bullseye). IMHO, the aforementioned patches that are not in the stable 4.14 branch upstre
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:
> On Sat, 11 Sep 2021 13:29:12 +0200 Salvatore Bonaccorso wrote:
> >
> > On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote:
> > > An experiment led to a potential alternative explanation for #991967. The issue may be ACPI (non-UEFI) powerdown/reset was broken at 4.19.194-3. Presence of Xen on the system may be unrelated.
> > >
> > > Failing that, it could be Xen and non-UEFI systems are affected. (Xen was tried on a UEFI system and the issue wasn't observed)
> >
> > Following up on https://bugs.debian.org/991967#12
> >
> > Did you succeed in bisecting the issue as you seem to have it reproducible?
>
> I have noticed this bug on bullseye ever since I started running bullseye as a dom0, but my testing indicates that the problem is not in src:linux; it appeared in src:xen with the 4.14 version of xen on bullseye.
>
> Elliott, are you only seeing the problem on Debian's xen-4.14 hypervisor? Also, which architecture, arm or amd64? I only see the problem on the Debian xen-4.14 hypervisor, I have only tested on amd64, and I have found a fix for my amd64 system, which is as follows:
>
> Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015, with a Haswell CPU (core i5-4590S)
>
> xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64
>
> linux kernel version: 5.10.46-4 (the current amd64 kernel for bullseye)

Nope. As per the report the problem appeared with kernel 4.19.194-3 and at the time using Xen 4.11. The kernel you're listing is rather more recent, which might suggest a patch which had been backported from 5.x to 4.19.

I could believe a Xen security update being the trigger though (I don't recall there being one at the right time, but I wouldn't rule it out).

> Boot system: EFI, not using secure boot, booting xen hypervisor and dom0 bullseye with grub-efi package for bullseye, and it boots the xen-4.14-amd64.gz file, not the xen-4.14-amd64.efi file.
>
> I also tested a buster dom0 with the 4.19 series kernel on the xen-4.14 hypervisor from bullseye and saw the problem, but I did not see the problem with either a buster (linux 4.19) or bullseye (linux 5.10) dom0 on the xen-4.11 hypervisor, so I think the problem is with the Debian version of the xen-4.14 hypervisor, not with src:linux.

Just to make sure, the kernel you were testing was 4.19.194-3? The issue didn't manifest with kernels earlier than that.

Could be we're seeing distinct bugs.

> This patch does affect amd64 acpi code, and is probably causing the problem on my amd64 system, so my build of the xen-4.14 hypervisor without this patch fixed the problem.

While that commit modifies the code path the processor takes, the modified path appears identical.

> I also would inquire with the Debian Xen Team about why they are backporting patches from the upstream xen unstable branch into Debian's 4.14 package that is currently shipping on Debian stable (bullseye). IMHO, the aforementioned patches that are not in the stable 4.14 branch upstream should not be included in the xen package for Debian stable.

Some people are asking for those. Those are bugfixes for an extremely popular device which panics on boot without the patches.

Meanwhile it turned out that between 5.10.0 and 5.10.30 the ARM64 device-trees were modified in a way which broke Xen 4.14 on ARM64. The change violated Linux's own standards for device-trees, yet still appeared in a stable branch.

In other news, if you see device-trees compared to ACPI tables, they're not very comparable. 99% of ACPI tables work for all versions of all OSes. Any given device-tree is only likely to work for a single version of a single OS. While a useful abstraction for portions of kernel code, device-trees are utter garbage compared to ACPI tables.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On Sat, 11 Sep 2021 13:29:12 +0200 Salvatore Bonaccorso wrote:
> Hi Elliott,
>
> On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote:
> > An experiment led to a potential alternative explanation for #991967. The issue may be ACPI (non-UEFI) powerdown/reset was broken at 4.19.194-3. Presence of Xen on the system may be unrelated.
> >
> > Failing that, it could be Xen and non-UEFI systems are affected. (Xen was tried on a UEFI system and the issue wasn't observed)
>
> Following up on https://bugs.debian.org/991967#12
>
> Did you succeed in bisecting the issue as you seem to have it reproducible?
>
> Regards,
> Salvatore

Hello Elliott and Salvatore,

I have noticed this bug on bullseye ever since I started running bullseye as a dom0, but my testing indicates that the problem is not in src:linux; it appeared in src:xen with the 4.14 version of xen on bullseye.

Elliott, are you only seeing the problem on Debian's xen-4.14 hypervisor? Also, which architecture, arm or amd64? I only see the problem on the Debian xen-4.14 hypervisor, I have only tested on amd64, and I have found a fix for my amd64 system, which is as follows:

Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015, with a Haswell CPU (core i5-4590S)

xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64

linux kernel version: 5.10.46-4 (the current amd64 kernel for bullseye)

Boot system: EFI, not using secure boot, booting xen hypervisor and dom0 bullseye with grub-efi package for bullseye, and it boots the xen-4.14-amd64.gz file, not the xen-4.14-amd64.efi file.

I also tested a buster dom0 with the 4.19 series kernel on the xen-4.14 hypervisor from bullseye and saw the problem, but I did not see the problem with either a buster (linux 4.19) or bullseye (linux 5.10) dom0 on the xen-4.11 hypervisor, so I think the problem is with the Debian version of the xen-4.14 hypervisor, not with src:linux.

I also found a fix in src:xen:

I noticed the series of patches in debian/patches of the 4.14.2+25-gb6a8c4f72d-2 version of src:xen (and earlier versions of xen-4.14 on Debian) have several patches backported from the unstable branch of xen upstream. By removing some of these patches from the patches series of the src:xen package, the dom0 shuts down as expected on my ASRock Haswell motherboard.

I rebuilt the src:xen package after removing the following patches from the debian/patches series and the result was that the computer shuts down as expected if I boot using the patched hypervisor:

0027-xen-rpi4-implement-watchdog-based-reset.patch
0028-tools-python-Pass-linker-to-Python-build-process.patch
0029-xen-arm-acpi-Don-t-fail-if-SPCR-table-is-absent.patch
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
0031-xen-arm-acpi-The-fixmap-area-should-always-be-cleare.patch
0032-xen-arm-Check-if-the-platform-is-not-using-ACPI-befo.patch
0033-xen-arm-Introduce-fw_unreserved_regions-and-use-it.patch
0034-xen-arm-acpi-add-BAD_MADT_GICC_ENTRY-macro.patch
0035-xen-arm-traps-Don-t-panic-when-receiving-an-unknown-.patch

Most of these patches seem unrelated to the amd64 architecture and instead affect the arm architecture, so removing all of them is probably more than is needed to fix this bug. I removed them all because I could not find them upstream on the 4.14 branch, only on the xen unstable branch (I did not check whether they are on the 4.15 branch upstream), and I wanted to test a true upstream 4.14 version without these seemingly aggressive patches added by Debian from the unstable branch of xen upstream. Being more conservative and not adding these patches from the unstable branch upstream fixed the problem!

I suspect the following patch is the culprit for problems shutting down on the amd64 architecture:

0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch

The commit log for this patch states:

From: Julien Grall
Date: Sat, 26 Sep 2020 17:44:29 +0100
Subject: xen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory()

The functions acpi_os_{un,}map_memory() are meant to be arch-agnostic while the __acpi_os_{un,}map_memory() are meant to be arch-specific.

Currently, the former are still containing x86 specific code.

To avoid this rather strange split, the generic helpers are reworked so they are arch-agnostic. This requires the introduction of a new helper __acpi_os_unmap_memory() that will undo any mapping done by __acpi_os_map_memory().

Currently, the arch-helper for unmap is basically a no-op so it only returns whether the mapping was arch specific. But this will change in the future.

Note that the x86 version of acpi_os_map_memory() was already able to able the 1MB region. Hence why there is no addition of new code.

Signed-off-by: Julien Grall
Reviewed-by: Rahul Singh
Reviewed-by: Jan Beulich
Acked-by: Stefano Stabellini
Tested-by: Rahul Singh
Tested-by: Elliott Mitchell
(cherry picked from commit 1
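
The patch removal described in this message amounts to deleting nine entries from debian/patches/series before rebuilding. A minimal sketch of that one step follows, assuming the usual quilt series layout (one patch name per line, optional options after the name, `#` comment lines). The helper name `drop_patches` and the extra entry `0026-hypothetical-kept.patch` are invented for illustration and do not come from the Debian package.

```python
# Patches removed from debian/patches/series in the test build described above.
REMOVED = [
    "0027-xen-rpi4-implement-watchdog-based-reset.patch",
    "0028-tools-python-Pass-linker-to-Python-build-process.patch",
    "0029-xen-arm-acpi-Don-t-fail-if-SPCR-table-is-absent.patch",
    "0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch",
    "0031-xen-arm-acpi-The-fixmap-area-should-always-be-cleare.patch",
    "0032-xen-arm-Check-if-the-platform-is-not-using-ACPI-befo.patch",
    "0033-xen-arm-Introduce-fw_unreserved_regions-and-use-it.patch",
    "0034-xen-arm-acpi-add-BAD_MADT_GICC_ENTRY-macro.patch",
    "0035-xen-arm-traps-Don-t-panic-when-receiving-an-unknown-.patch",
]


def drop_patches(series_text, unwanted):
    """Return the series file content without the entries named in `unwanted`.

    The patch name is taken to be the first whitespace-separated field of a
    line; blank lines and comment lines are kept unchanged.
    """
    unwanted = set(unwanted)
    kept = []
    for line in series_text.splitlines():
        fields = line.split()
        if fields and fields[0] in unwanted:
            continue  # skip this patch entry
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Hypothetical series content: one made-up earlier patch plus the nine above.
    sample = "\n".join(["0026-hypothetical-kept.patch"] + REMOVED) + "\n"
    print(drop_patches(sample, REMOVED), end="")
```

This only edits the list of patches. Whether the remaining series still applies cleanly is a separate question; removing a patch that later entries build on can make quilt fail on those later entries.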
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
On Sat, Sep 11, 2021 at 01:29:12PM +0200, Salvatore Bonaccorso wrote:
> On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote:
> > An experiment led to a potential alternative explanation for #991967. The issue may be ACPI (non-UEFI) powerdown/reset was broken at 4.19.194-3. Presence of Xen on the system may be unrelated.
> >
> > Failing that, it could be Xen and non-UEFI systems are affected. (Xen was tried on a UEFI system and the issue wasn't observed)
>
> Following up on https://bugs.debian.org/991967#12
>
> Did you succeed in bisecting the issue as you seem to have it reproducible?

Problem is, that is rather a lot of kernel builds, which also means a lot of downtime... Right now the distribution update seems worthy of greater attention.

The one notable bit is the one I sent in the last message. The system does NOT have UEFI, and a test system with UEFI seemed to have no problem.

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
Hi Elliott,

On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote:
> An experiment led to a potential alternative explanation for #991967. The issue may be ACPI (non-UEFI) powerdown/reset was broken at 4.19.194-3. Presence of Xen on the system may be unrelated.
>
> Failing that, it could be Xen and non-UEFI systems are affected. (Xen was tried on a UEFI system and the issue wasn't observed)

Following up on https://bugs.debian.org/991967#12

Did you succeed in bisecting the issue as you seem to have it reproducible?

Regards,
Salvatore
Bug#991967: #991967: Simply ACPI powerdown/reset issue?
An experiment led to a potential alternative explanation for #991967. The issue may be ACPI (non-UEFI) powerdown/reset was broken at 4.19.194-3. Presence of Xen on the system may be unrelated.

Failing that, it could be Xen and non-UEFI systems are affected. (Xen was tried on a UEFI system and the issue wasn't observed)

-- 
(\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445