Hi, IS_FATAL_ERROR sounds like a utility macro, I would add a parameter to test the flags and make it return a Bool compatible value. My 2 cents comments. Cheers
Le 2 janv. 2018 11:05, "Stefan Fritsch" <s...@sfritsch.de> a écrit : Hi, on a Fujitsu E744 laptop we have a problem that sometimes there is a very long delay (up to several minutes) when booting from hard disk. It seems accessing the DVD drive (which has no disk inserted) sometimes fails with some errors that are not correctly handled by grub, which leads to each access being stalled until the 20s timeout triggers. This seems to happen when grub is trying to read filesystem/partition data. Example debug output: disk/ahci.c:1020: AHCI command scheduled disk/ahci.c:1022: AHCI tfd = 50 disk/ahci.c:1024: AHCI inten = 7dc000ff disk/ahci.c:1026: AHCI intstatus = 3 disk/ahci.c:1031: AHCI inten = 7dc000ff disk/ahci.c:1033: AHCI tfd = 50 disk/ahci.c:1036: AHCI sig = eb140101 disk/ahci.c:1038: AHCI tfd = 2051 disk/ahci.c:1049: AHCI status <1 1 40000001 2051> disk/ahci.c:926: AHCI tfd = 2051 disk/ahci.c:932: AHCI tfd = 2051 disk/ahci.c:936: AHCI tfd = 2051 disk/ahci.c:942: grub_ahci_read (size=0, cmdsize = 0) disk/ahci.c:957: AHCI tfd = 2051, CL=0xda455000 disk/ahci.c:968: AHCI tfd = 2051 disk/ahci.c:980: AHCI tfd = 2051 disk/ahci.c:987: AHCI tfd = 2051 disk/ahci.c:998: cfis: 27 80 08 00 00 00 00 00 disk/ahci.c:1003: cfis: 00 00 00 00 00 00 00 00 disk/ahci.c:1015: PRDT = da453000, 0, ffffffff (128) The problem is that the command_issue bit that is checked in the loop is only reset if the "HBA receives a FIS which clears the BSY, DRQ, and ERR bits for the command", but the ERR bit is never cleared. The patch below seems to fix the issue by checking the error bits in the interrupt status register. According to the AHCI 1.2 spec, "Interrupt sources that are disabled (‘0’) are still reflected in the status registers.", so this should work even though grub uses polling. The relevant bit in our case is the Task File Error Status (TFES), which is equivalent to the ERR bit 0 in tfd. But the patch below also checks the other error bits except for the "Interface non-fatal error status" bit. Cheers, Stefan diff --git a/grub-core/disk/ahci.c b/grub-core/disk/ahci.c --- a/grub-core/disk/ahci.c +++ b/grub-core/disk/ahci.c @@ -82,6 +82,19 @@ enum grub_ahci_hba_port_command GRUB_AHCI_HBA_PORT_CMD_FR = 0x4000, }; +enum grub_ahci_hba_port_int_status + { + GRUB_AHCI_HBA_PORT_IS_IFS = (1UL << 27), + GRUB_AHCI_HBA_PORT_IS_HBDS = (1UL << 28), + GRUB_AHCI_HBA_PORT_IS_HBFS = (1UL << 29), + GRUB_AHCI_HBA_PORT_IS_TFES = (1UL << 30), + }; +#define IS_FATAL_ERROR (\ + GRUB_AHCI_HBA_PORT_IS_IFS| \ + GRUB_AHCI_HBA_PORT_IS_HBDS|\ + GRUB_AHCI_HBA_PORT_IS_HBFS|\ + GRUB_AHCI_HBA_PORT_IS_TFES) + struct grub_ahci_hba { grub_uint32_t cap; @@ -1026,7 +1039,8 @@ grub_ahci_readwrite_real (struct grub_ahci_device *dev, endtime = grub_get_time_ms () + (spinup ? 20000 : 20000); while ((dev->hba->ports[dev->port].command_issue & 1)) - if (grub_get_time_ms () > endtime) + if (grub_get_time_ms () > endtime || + (dev->hba->ports[dev->port].intstatus & IS_FATAL_ERROR)) { grub_dprintf ("ahci", "AHCI status <%x %x %x %x>\n", dev->hba->ports[dev->port].command_issue, @@ -1034,7 +1048,10 @@ grub_ahci_readwrite_real (struct grub_ahci_device *dev, dev->hba->ports[dev->port].intstatus, dev->hba->ports[dev->port].task_file_data); dev->hba->ports[dev->port].command_issue = 0; - err = grub_error (GRUB_ERR_IO, "AHCI transfer timed out"); + if (dev->hba->ports[dev->port].intstatus & IS_FATAL_ERROR) + err = grub_error (GRUB_ERR_IO, "AHCI transfer error"); + else + err = grub_error (GRUB_ERR_IO, "AHCI transfer timed out"); if (!reset) grub_ahci_reset_port (dev, 1); break; _______________________________________________ Grub-devel mailing list Grub-devel@gnu.org https://lists.gnu.org/mailman/listinfo/grub-devel
_______________________________________________ Grub-devel mailing list Grub-devel@gnu.org https://lists.gnu.org/mailman/listinfo/grub-devel