Re: [coreboot] [RFH] Native AMD fam10-15 support

2018-06-15 Thread qtux
On 15/06/18 13:51, Kyösti Mälkki wrote:
> On Fri, Jun 15, 2018 at 2:14 PM, qtux  wrote:
> 
>>
>>>> Coreboot did work well, but froze sometimes when booting during the
>>>> assigning resources step (more or less exactly after assigning the PCI
>>>> 14.3 or PNP 002e.2 device, which happen to be close to each other inside
>>>> the devicetree). I had to remove the power cord in order to be able to
>>>> boot again (or to get the next random freeze...). Rarely, after such a
>>>> recovery, I got flooded by IOMMU warnings in Linux, which would only
>>>> disappear after another reboot.
>>>
>>> Ah, that resume reboot-loop issue. The bit that tells to do S3 resume
>>> is a sticky register backed up by Vstb rail. With [2] you should not
>>> need to do full power-cycling at least. We should extend this work to
>>> other platforms.
>>>
>>
>> I am not sure whether the term resume reboot-loop applies to my issue
>> (side note: I used a serial connection to monitor the boot process):
>>
>> Rebooting (via holding the power button for some seconds) after
>> encountering a freeze (i.e. stopping at the assign resources step)
>> resulted in no serial output at all. I could repeat this with no
>> effect at all; the computer seemed to be dead. Only removing the
>> power cord could solve the issue.
>>
>> This issue occurred not only when rebooting but also when cold booting.
>>
> 
> One of these boards had LPC related lockups. I think the solution was to
> disable serial console or to set console to low loglevel.
> 

If that is the case, I would suggest altering the default board
configuration to reflect this. Nevertheless, it seems strange to me that
such a lockup disappeared after changing SPI chips.

> 
> 
>> Answers are inside the text.
>> I forgot to mention that I am currently on commit 793ae846e8.
>>
> 
> Let's take the parent of that, commit 4a027e6e -- the one you refer to only
> appears on gerrit review branch.
> 
> Kyösti
> 
> 
> 

That is fine, sorry for the misleading commit hash.

Cheers,
Matthias

-- 
coreboot mailing list: coreboot@coreboot.org
https://mail.coreboot.org/mailman/listinfo/coreboot

Re: [coreboot] [RFH] Native AMD fam10-15 support

2018-06-15 Thread qtux
On 15/06/18 02:42, Kyösti Mälkki wrote:
> On Thu, Jun 14, 2018 at 4:38 AM, qtux  wrote:
>> On 13/06/18 22:12, Kyösti Mälkki wrote:
>> Hi,
>>
>> S3 is __not__ working on my KCMA-D8. The last time I tried, I had to
>> remove the power cord for a couple of seconds to be able to boot again.
>>
>> Interestingly, this issue looks similar to another one I had with a
>> flash chip which seems not to be supported by coreboot. Here is the
>> relevant part of the logs regarding the bad chip:
>> Manufacturer: ef
>> SF: Unsupported Winbond ID 4014
>> SF: Unsupported manufacturer!
>>
> 
> Thanks, [1] should take care of this.
> 

Thank you! I will test the Winbond chip when I find some time to do so.

>> Coreboot did work well, but froze sometimes when booting during the
>> assigning resources step (more or less exactly after assigning the PCI
>> 14.3 or PNP 002e.2 device, which happen to be close to each other inside
>> the devicetree). I had to remove the power cord in order to be able to
>> boot again (or to get the next random freeze...). Rarely, after such a
>> recovery, I got flooded by IOMMU warnings in Linux, which would only
>> disappear after another reboot.
> 
> Ah, that resume reboot-loop issue. The bit that tells to do S3 resume
> is a sticky register backed up by Vstb rail. With [2] you should not
> need to do full power-cycling at least. We should extend this work to
> other platforms.
> 

I am not sure whether the term resume reboot-loop applies to my issue
(side note: I used a serial connection to monitor the boot process):

Rebooting (via holding the power button for some seconds) after
encountering a freeze (i.e. stopping at the assign resources step)
resulted in no serial output at all. I could repeat this with no
effect at all; the computer seemed to be dead. Only removing the
power cord could solve the issue.

This issue occurred not only when rebooting but also when cold booting.

>> Replacing the chip seems to have solved this random boot freeze problem.
>> But maybe the S3 issue and the issue I had with the wrong chip are
>> related as they both lock down the machine until I remove the power cord.
> 
> Yes, it's connected. Having a non-supported SPI part ID there would
> prevent ACPI S3 resume, and likely enter the loop.

Just to be sure: The S3 resume does not work with the __supported__ SPI
chip. I did not test S3 with the unsupported one.

> If someone takes the task of testing and/or bisecting please note:
> 
> Regression present between: 714709f .. a26377b
> 
> Regression present between: 9e94dbf .. c2a921b (for kcma-d8) 8a8386e
> (for kgpe-d16)
> 
> Now, since commit babb2e6 that claims to add S3 support on kgpe-d16 is
> within the latter period, I do not quite see how S3 support could have
> worked with that commit on kgpe-d16.  Or maybe this feature was never
> retested once it was rebased and upstreamed. Nor can I see how it
> could have worked for any commit in 4.6, so I must be missing
> something here. So I will need some logs.
> 
> 
> [1] https://review.coreboot.org/#/c/coreboot/+/27107
> [2] https://review.coreboot.org/#/c/coreboot/+/27108
> 
> Kyösti
> 

Answers are inside the text.
I forgot to mention that I am currently on commit 793ae846e8.

Cheers,
Matthias


Re: [coreboot] [RFH] Native AMD fam10-15 support

2018-06-13 Thread qtux
On 13/06/18 22:12, Kyösti Mälkki wrote:
> Hi
> 
> Now that we wiped out K8, I'd like to put my eyes on fam10-15 boards.
> 
> Couple questions for board owners:
> 
> First, about asus/kcma-d8 and asus/kgpe-d16: Do these have working S3
> support? I remember rumours they originally worked at some point, but
> regressed during the rebase / upstream process. Anyone willing to
> bisect/fix it if necessary?
> 
> I am asking, because these are the last two remaining boards with
> combination of HAVE_ACPI_RESUME=y and RELOCATABLE_RAMSTAGE=n, and we
> have to drag along some back-and-forth memory copy code to keep OS
> memory intact for these two.
> 
> Second, I would like to move forwards with AMD fam10 to have
> RELOCATABLE_RAMSTAGE=y, that would also solve above-mentioned issue and
> open up doors for some new features.
> 
> If it was my decision, RELOCATABLE_RAMSTAGE for x86 would be one
> criteria to survive the next (October 2018?) release. POSTCAR_STAGE
> for May 2019. I am probably too late to make such wishes, but I hope
> these will happen in the next two years nevertheless.
> 
> Kyösti
> 

Hi,

S3 is __not__ working on my KCMA-D8. The last time I tried, I had to
remove the power cord for a couple of seconds to be able to boot again.

Interestingly, this issue looks similar to another one I had with a
flash chip which seems not to be supported by coreboot. Here is the
relevant part of the logs regarding the bad chip:
Manufacturer: ef
SF: Unsupported Winbond ID 4014
SF: Unsupported manufacturer!

Coreboot did work well, but froze sometimes when booting during the
assigning resources step (more or less exactly after assigning the PCI
14.3 or PNP 002e.2 device, which happen to be close to each other inside
the devicetree). I had to remove the power cord in order to be able to
boot again (or to get the next random freeze...). Rarely, after such a
recovery, I got flooded by IOMMU warnings in Linux, which would only
disappear after another reboot.

Replacing the chip seems to have solved this random boot freeze problem.
But maybe the S3 issue and the issue I had with the wrong chip are
related as they both lock down the machine until I remove the power cord.

Cheers,
Matthias


Re: [coreboot] [RFH] Status of the Lenovo X201

2018-05-18 Thread qtux
On 03/05/18 21:51, qtux wrote:
> I uploaded a status report for the X201 and it contains the smashed
> stack message. Since then I booted several times but was not able to
> reproduce this stack smashing issue. It seems that this kind of
> error occurs only once after flashing. Please find attached a diff of
> the notable differences of a new console log compared to the one I
> pushed to board-status. It suggests that there is an issue with the
> initial raminit.
> 
> Cheers,
> Matthias
> 
> On 02/05/18 20:12, ron minnich wrote:
>> Yeah I think you want to hunt this stack smash error down, it's not
>> something you want to ignore.
>>
>> On Wed, May 2, 2018 at 11:09 AM Kyösti Mälkki 
>> wrote:
>>
>>> On Wed, May 2, 2018 at 8:53 PM, Nico Huber  wrote:
>>>> On 02.05.2018 18:37, qtux wrote:
>>>>> Thanks for your detailed explanation. So in essence shall I ignore the
>>>>> messages or blacklist lpc_ich?
>>>>
>>>> Yes, either ;)
>>>>
>>>>>
>>>>> Besides, while preparing the status report, I sometimes find a "Smashed
>>>>> stack detected in romstage!" message in the console log, just before
>>>>> ramstage is starting. Is there something to worry about there?
>>>>
>>>> Um, yes. I think that's not good. But I wonder why it's not happening
>>>> consistently.
>>>
>>> I commented about that earlier in this thread. Seemed like actual
>>> raminit eats a lot of stack, but loading from MRC cache or equivalent
>>> does not. One could find that struct and move it to BSS, declared with
>>> CAR_GLOBAL. I would rather not extend the boundary for stack-smashing
>>> detection.
>>>
>>> Kyösti
>>>
>>

I uploaded a proposed fix for the smashed stack issue:

https://review.coreboot.org/#/c/coreboot/+/26388/

Side note: I tried adding CAR_GLOBAL to the ram_training and the raminfo
structs. Neither had any effect on the issue.
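
For context, a stack check of this kind typically works with guard
words: a known pattern is written to the lowest words of the stack
before raminit, and every word that no longer holds the pattern
afterwards triggers one message. The following is a minimal,
self-contained C sketch of that idea, not the coreboot implementation;
the pattern, the guard count and all function names are assumptions
chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_GUARD 0xdeadbeefu	/* assumed pattern, for illustration only */
#define NUM_GUARDS  4		/* assumed number of guard words */

/* Write the guard pattern to the lowest words of the stack region. */
void stack_guards_init(uint32_t *stack_base)
{
	assert(stack_base != NULL);
	for (size_t i = 0; i < NUM_GUARDS; i++)
		stack_base[i] = STACK_GUARD;
}

/*
 * Count the guard words that were overwritten.
 * 0 means the stack never grew down into the guarded area.
 */
size_t stack_guards_smashed(const uint32_t *stack_base)
{
	size_t smashed = 0;

	assert(stack_base != NULL);
	for (size_t i = 0; i < NUM_GUARDS; i++) {
		if (stack_base[i] != STACK_GUARD)
			smashed++;
	}
	return smashed;
}
```

If the real check reports once per overwritten guard word, a run that
overwrites all four would print the message four times. That is only my
reading of the repeated lines in the console log, not something I
verified in the source.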

Cheers,
Matthias


Re: [coreboot] [RFH] Status of the Lenovo X201

2018-05-03 Thread qtux
I uploaded a status report for the X201 and it contains the smashed
stack message. Since then I booted several times but was not able to
reproduce this stack smashing issue. It seems that this kind of
error occurs only once after flashing. Please find attached a diff of
the notable differences of a new console log compared to the one I
pushed to board-status. It suggests that there is an issue with the
initial raminit.

Cheers,
Matthias

On 02/05/18 20:12, ron minnich wrote:
> Yeah I think you want to hunt this stack smash error down, it's not
> something you want to ignore.
> 
> On Wed, May 2, 2018 at 11:09 AM Kyösti Mälkki 
> wrote:
> 
>> On Wed, May 2, 2018 at 8:53 PM, Nico Huber  wrote:
>>> On 02.05.2018 18:37, qtux wrote:
>>>> Thanks for your detailed explanation. So in essence shall I ignore the
>>>> messages or blacklist lpc_ich?
>>>
>>> Yes, either ;)
>>>
>>>>
>>>> Besides, while preparing the status report, I sometimes find a "Smashed
>>>> stack detected in romstage!" message in the console log, just before
>>>> ramstage is starting. Is there something to worry about there?
>>>
>>> Um, yes. I think that's not good. But I wonder why it's not happening
>>> consistently.
>>
>> I commented about that earlier in this thread. Seemed like actual
>> raminit eats a lot of stack, but loading from MRC cache or equivalent
>> does not. One could find that struct and move it to BSS, declared with
>> CAR_GLOBAL. I would rather not extend the boundary for stack-smashing
>> detection.
>>
>> Kyösti
>>
> 
diff --git a/lenovo/x201/4.7-994-ga940e384b6/2018-05-03T17_19_05Z/coreboot_console.txt b/lenovo/x201/4.7-994-ga940e384b6/2018-05-03T17_19_05Z/coreboot_console.txt
index 6c73a758a..20bffc5c7 100644
--- a/lenovo/x201/4.7-994-ga940e384b6/2018-05-03T17_19_05Z/coreboot_console.txt
+++ b/lenovo/x201/4.7-994-ga940e384b6/2018-05-03T17_19_05Z/coreboot_console.txt
@@ -99,10 +100,6 @@ ME: Error Code  : No Error
 ME: Progress Phase  : BUP Phase
 ME: Power Management Event  : Clean Moff->Mx wake
 ME: Progress Phase State: 0x41
-Smashed stack detected in romstage!
-Smashed stack detected in romstage!
-Smashed stack detected in romstage!
-Smashed stack detected in romstage!
 MTRR Range: Start=ff80 End=0 (Size 80)
 MTRR Range: Start=0 End=100 (Size 100)
 MTRR Range: Start=bf00 End=bf80 (Size 80)
@@ -860,7 +857,7 @@ SMM Module: stub loaded at bf808000. Will call bf8101a6()
 Initializing southbridge SMI... ... pmbase = 0x0500
 
 SMI_STS: MCSMI PM1 
-PM1_STS: WAK BM TMROF 
+PM1_STS: WAK BM 
 GPE0_STS: GPIO14 GPIO11 GPIO9 GPIO5 GPIO4 GPIO3 GPIO2 GPIO1 GPIO0 
 ALT_GP_SMI_STS: GPI14 GPI13 GPI11 GPI10 GPI9 GPI7 GPI6 GPI5 GPI4 GPI3 GPI2 GPI1 GPI0 
 TCO_STS: 
@@ -1301,17 +1298,13 @@ Updating MRC cache data.
 CBFS: 'Master Header Locator' located CBFS at [700200:7fffc0)
 CBFS: Locating 'mrc.cache'
 CBFS: Found @ offset 1fdc0 size 1
-find_current_mrc_cache_local: picked entry 0 from cache block
-Manufacturer: c2
-SF: Detected MX25L6405D with sector size 0x1000, total 0x80
-find_next_mrc_cache: picked next entry from cache block at fff21000
-Finally: write MRC cache update to flash at fff21000
-Successfully wrote MRC cache
-BS: BS_DEV_INIT times (us): entry 5 run 145643 exit 14099
+find_current_mrc_cache_local: picked entry 1 from cache block
+MRC data in flash is up to date. No update.
+BS: BS_DEV_INIT times (us): entry 5 run 145972 exit 12026
 Finalize devices...
 PCI: 00:1f.0 final
 Devices finalized
@@ -1472,7 +1465,7 @@ SF: Detected MX25L6405D with sector size 0x1000, total 0x80
 CBFS: 'Master Header Locator' located CBFS at [700200:7fffc0)
 FMAP: Found "FLASH" version 1.1 at 70.
 FMAP: base = ff80 size = 80 #areas = 3
-Wrote coreboot table at: bf746000, 0x36c bytes, checksum acd2
+Wrote coreboot table at: bf746000, 0x36c bytes, checksum 50d2
 coreboot table: 900 bytes.
 IMD ROOT0. bf7ff000 1000
 IMD SMALL   1. bf7fe000 1000
@@ -1538,14 +1531,16 @@ AHCI controller at 00:1f.2, iobase 0xcfd26000, irq 11
 Found 0 lpt ports
 Found 0 serial ports
 Searching bootorder for: /rom@img/memtest
-Discarding ps2 data aa (status=11)
 Searching bootorder for: /pci@i0cf8/*@1f,2/drive@0/disk@0
 AHCI/0: Set transfer mode to UDMA-5
 AHCI/0: registering: "AHCI/0: M4-CT128M4SSD2 ATA-9 Hard-Disk (119 GiBytes)"
 Initialized USB HUB (0 ports used)
+WARNING - Timeout at ps2_recvbyte:182!
+Discarding ps2 data aa (status=11)
+WARNING - Timeout at ps2_recvbyte:182!
 PS2 keyboard initialized
 WARNING - Timeout at ehci_wait_td:516!
-ehci p

Re: [coreboot] [RFH] Status of the Lenovo X201

2018-05-02 Thread qtux
Thanks for your detailed explanation. So in essence shall I ignore the
messages or blacklist lpc_ich?

Besides, while preparing the status report, I sometimes find a "Smashed
stack detected in romstage!" message in the console log, just before
ramstage is starting. Is there something to worry about there?

It may correlate with another issue I found: Sometimes (mostly after
some experimentation) SeaBIOS takes a long time to load (about 10 to 20
seconds) and is not able to find my SATA drive (though payloads from
CBFS can still be loaded). Restarting with Ctrl+Alt+Del is sufficient in
these cases to solve the issue until the next time I tinker with
coreboot (in particular, experimenting with me_cleaner seems to cause
this issue quite often).

Cheers,
Matthias

On 02/05/18 00:42, Nico Huber wrote:
> On 02.05.2018 00:03, qtux wrote:
>> ...
>> ACPI Warning: SystemIO range 0x0480-0x04AF
>> conflicts with OpRegion 0x0480-0x04EB (\GPIO)
>> (20180105/utaddress-247)
>> ACPI: If an ACPI driver is available for this device, you should use it
>> instead of the native driver
>> lpc_ich: Resource conflict(s) found affecting gpio_ich
>>
>> Maybe these are also caused by copy pasting Sandy Bridge code as I found
>> a reference to PMIO and GPIO with matching addresses in
>> src/southbridge/intel/bd82x6x/acpi/pch.asl. Do you have any ideas on
>> this issue?
> 
> It's pretty simple. When the firmware was written (both coreboot and the
> one from Lenovo) this `lpc_ich` driver didn't exist in Linux and wasn't
> accounted for. From a firmware point of view, that driver shouldn't
> exist at all and our ACPI code claims the device's resources therefore.
> I don't think the driver was meant to be included into generic Linux
> distributions.
> 
> Related story: The same applies to other drivers like the buggy
> intel-spi. That one even warns in Kconfig "Say N here unless you know
> what you are doing.". Due to a simple off-by-one in the code, it
> bricked[1] a lot of systems with Ubuntu 17.10 and they had to withdraw
> their images.
> That's what you get when you blindly enable all modules and ship them to
> humble users.
> 
> Well, you better know what you are doing ;)
> 
> Nico
> 
> [1] It only write-protected the firmware flash by accident, the actual
> brick was caused by the UEFI shipping on the affected systems.
> 



Re: [coreboot] [RFH] Status of the Lenovo X201

2018-05-01 Thread qtux
Thank you Kyösti, your patch solves all IRQ issues and USB is working
again on my X201i.
I opened a review for adding the X201i as an X201 variant:
https://review.coreboot.org/#/c/coreboot/+/25971/

Apart from that I have the following ACPI conflict with PMIO and GPIO:

ACPI: Battery Slot [BAT0] (battery present)
ACPI: Battery Slot [BAT1] (battery absent)
ACPI: AC Adapter [AC] (on-line)
ACPI Warning: SystemIO range 0x0528-0x052F
conflicts with OpRegion 0x0500-0x057F (\PMIO)
(20180105/utaddress-247)
ACPI: If an ACPI driver is available for this device, you should use it
instead of the native driver
ACPI Warning: SystemIO range 0x04C0-0x04CF
conflicts with OpRegion 0x0480-0x04EB (\GPIO)
(20180105/utaddress-247)
ACPI: If an ACPI driver is available for this device, you should use it
instead of the native driver
ACPI Warning: SystemIO range 0x04B0-0x04BF
conflicts with OpRegion 0x0480-0x04EB (\GPIO)
(20180105/utaddress-247)
ACPI: If an ACPI driver is available for this device, you should use it
instead of the native driver
ACPI Warning: SystemIO range 0x0480-0x04AF
conflicts with OpRegion 0x0480-0x04EB (\GPIO)
(20180105/utaddress-247)
ACPI: If an ACPI driver is available for this device, you should use it
instead of the native driver
lpc_ich: Resource conflict(s) found affecting gpio_ich

Maybe these are also caused by copy-pasting Sandy Bridge code, as I found
a reference to PMIO and GPIO with matching addresses in
src/southbridge/intel/bd82x6x/acpi/pch.asl. Do you have any ideas on
this issue?
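
For reference, the warnings above boil down to plain interval overlaps:
each SystemIO range requested by the native driver falls inside an
OpRegion that the ACPI tables already declare. Here is a minimal C
sketch of that check using the addresses from the log. The struct and
function names are made up for illustration and are neither kernel nor
coreboot APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive I/O port range, as printed in the ACPI warnings. */
struct io_range {
	uint16_t start;
	uint16_t end;
};

/* OpRegions taken from the log above. */
const struct io_range pmio_opregion = { 0x0500, 0x057f };
const struct io_range gpio_opregion = { 0x0480, 0x04eb };

/* Two inclusive ranges overlap if each one starts before the other ends. */
bool io_ranges_overlap(struct io_range a, struct io_range b)
{
	assert(a.start <= a.end && b.start <= b.end);
	return a.start <= b.end && b.start <= a.end;
}
```

Feeding it 0x0528-0x052F reports an overlap with \PMIO only, while
0x04C0-0x04CF, 0x04B0-0x04BF and 0x0480-0x04AF all overlap with \GPIO,
which matches the four warnings.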

Cheers,
Matthias


On 01/05/18 19:35, qtux wrote:
> Beware that patch is incomplete! Coreboot dies at
> src/southbridge/intel/common/acpi_pirq_gen.c line 97:
> 
> if (!lpcb_path)
>   die("ACPI_PIRQ_GEN: Missing LPCB ACPI path\n");
> 
> You have to add the lpc_acpi_name function to
> src/southbridge/intel/ibexpeak/lpc.c as in
> src/southbridge/intel/bd82x6x/lpc.c to circumvent this issue.
> 
> I was preparing to upload my patch when I saw yours (which is almost
> identical). Additionally, I already tested my patch on a Lenovo X201i.
> Shall I edit your patch on gerrit or upload mine in a separate merge
> request, or do something else?
> 
> Cheers,
> Matthias
> 
> On 01/05/18 18:45, Kyösti Mälkki wrote:
>> On Mon, Apr 30, 2018 at 6:46 AM, qtux  wrote:
>>> I wrote a patch [0] for the finalize code issue. With that my X201i is 
>>> working fine on current master apart from a regression introduced in commit 
>>> 7f5efd90e598320791200e03f761309ee04b58a3 [1]. With that regression USB and 
>>> the SD card are not working anymore and it raises the following errors:
>>
>> Thanks for patching _and_ testing, your patch for finalize was just merged.
>>
>>> can't derive routing for PCI INT A
>>> PCI INT A: no GSI
>>
>> As for IRQ regressions, I think I can see where it went wrong, find my
>> attempt to fix it blind-folded [1].
>>
>> [1] https://review.coreboot.org/#/c/coreboot/+/25965
>>
>> Kyösti
>>
> 


Re: [coreboot] [RFH] Status of the Lenovo X201

2018-05-01 Thread qtux
Beware that patch is incomplete! Coreboot dies at
src/southbridge/intel/common/acpi_pirq_gen.c line 97:

	if (!lpcb_path)
		die("ACPI_PIRQ_GEN: Missing LPCB ACPI path\n");

You have to add the lpc_acpi_name function to
src/southbridge/intel/ibexpeak/lpc.c as in
src/southbridge/intel/bd82x6x/lpc.c to circumvent this issue.
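
Roughly, the missing piece looks like the sketch below. The stand-in
types only exist so that the snippet compiles outside the tree, the
returned name "LPCB" is my assumption based on the error message, and
lookup_acpi_name() is a hypothetical helper that mimics the lookup; it
is not a coreboot function.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for coreboot's types so the sketch compiles on its own. */
struct device;

struct device_operations {
	const char *(*acpi_name)(const struct device *dev);
};

/* Report the ACPI name of the LPC bridge ("LPCB" is an assumption). */
const char *lpc_acpi_name(const struct device *dev)
{
	(void)dev;	/* the name does not depend on the device here */
	return "LPCB";
}

/* Device operations with the acpi_name hook filled in. */
const struct device_operations lpc_ops = {
	.acpi_name = lpc_acpi_name,
};

/* Hypothetical lookup: NULL when the driver provides no acpi_name hook. */
const char *lookup_acpi_name(const struct device_operations *ops,
			     const struct device *dev)
{
	assert(ops != NULL);
	return ops->acpi_name ? ops->acpi_name(dev) : NULL;
}
```

Without the hook the lookup yields NULL, which is the condition that
makes the PIRQ generator die.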

I was preparing to upload my patch when I saw yours (which is almost
identical). Additionally, I already tested my patch on a Lenovo X201i.
Shall I edit your patch on gerrit or upload mine in a separate merge
request, or do something else?

Cheers,
Matthias

On 01/05/18 18:45, Kyösti Mälkki wrote:
> On Mon, Apr 30, 2018 at 6:46 AM, qtux  wrote:
>> I wrote a patch [0] for the finalize code issue. With that my X201i is 
>> working fine on current master apart from a regression introduced in commit 
>> 7f5efd90e598320791200e03f761309ee04b58a3 [1]. With that regression USB and 
>> the SD card are not working anymore and it raises the following errors:
> 
> Thanks for patching _and_ testing, your patch for finalize was just merged.
> 
>> can't derive routing for PCI INT A
>> PCI INT A: no GSI
> 
> As for IRQ regressions, I think I can see where it went wrong, find my
> attempt to fix it blind-folded [1].
> 
> [1] https://review.coreboot.org/#/c/coreboot/+/25965
> 
> Kyösti
> 


Re: [coreboot] [RFH] Status of the Lenovo X201

2018-04-29 Thread qtux
I wrote a patch [0] for the finalize code issue. With that, my X201i is working
fine on current master apart from a regression introduced in commit
7f5efd90e598320791200e03f761309ee04b58a3 [1]. With that regression, USB and the
SD card are not working anymore, and the following errors appear:

[   17.986754] usb 1-1: device not accepting address 2, error -110
[   18.110095] usb 1-1: new high-speed USB device number 3 using ehci-pci
[   18.200089] usb 2-1: device not accepting address 2, error -110
[   18.323421] usb 2-1: new high-speed USB device number 3 using ehci-pci
[   34.200083] usb 1-1: device not accepting address 3, error -110
[   34.200169] usb usb1-port1: attempt power cycle
[   34.413364] usb 2-1: device not accepting address 3, error -110
[   34.413439] usb usb2-port1: attempt power cycle
[   34.636752] usb 1-1: new high-speed USB device number 4 using ehci-pci
[   34.850085] usb 2-1: new high-speed USB device number 4 using ehci-pci
[   45.293417] usb 1-1: device not accepting address 4, error -110
[   45.416732] usb 1-1: new high-speed USB device number 5 using ehci-pci
[   45.506783] usb 2-1: device not accepting address 4, error -110
[   45.630088] usb 2-1: new high-speed USB device number 5 using ehci-pci
[   56.173393] usb 1-1: device not accepting address 5, error -110
[   56.173445] usb usb1-port1: unable to enumerate USB device
[   56.386753] usb 2-1: device not accepting address 5, error -110
[   56.386845] usb usb2-port1: unable to enumerate USB device

Additionally there are some IRQ errors inside the kernel messages like

can't derive routing for PCI INT A
PCI INT A: no GSI

for different devices which seem related to the change in [1].

Cheers,
Matthias

[0] https://review.coreboot.org/#/c/coreboot/+/25914/
[1] https://review.coreboot.org/#/c/coreboot/+/22859/


On 29/04/18 14:14, Kyösti Mälkki wrote:
> On Sun, Apr 29, 2018 at 1:35 PM, Nicola Corna  wrote:
>> April 28, 2018 5:59 PM, "Nico Huber"  wrote:
>>
>>> Yes, that's very likely a problem. It looks like the whole finalize code
>>> path of the X201 was untested all the time (even on resume). I don't
>>> remember if EHCI debug works in SMM? If it does, you could enable
>>> logging for the SMI handler as well (if you want to debug it).
>>>
>>> Nico
>> Attached you can find the log with the SMM debug enabled, but it doesn't seem
>> to me much different from the non-debug log.
>>
>> Nicola
> DEBUG_SMI does not output to EHCI, I have considered it too unstable.
>
> You can try your luck with attached patch to have DEBUG_SMI=y output
> on EHCI debug. EHCI console code does not take precautions against
> someone else touching the same register set so it's likely to fail
> once payload and/or OS loads its EHCI driver, possibly making USB
> media and keyboard unusable as well.
>
> Kyösti



[coreboot] Lenovo X201i not booting and not resuming from S3 due to SMM finalize

2018-04-19 Thread qtux
Hello,

After updating my Lenovo X201i, I found that commit
d533b16669a3bacb19b2824e6b4bc76a2a18c92a [1] causes a freeze during
boot. Additionally, if INTEL_CHIPSET_LOCKDOWN is disabled (and thus
booting works), resuming from S3 no longer works. I found that the
following patch circumvents the S3 resume issue (I have not yet tested
it with INTEL_CHIPSET_LOCKDOWN enabled):

diff --git a/src/cpu/intel/model_2065x/finalize.c b/src/cpu/intel/model_2065x/finalize.c
index 50e00bf74a..693aafe37a 100644
--- a/src/cpu/intel/model_2065x/finalize.c
+++ b/src/cpu/intel/model_2065x/finalize.c
@@ -56,5 +56,5 @@ void intel_model_2065x_finalize_smm(void)
 	msr_set_bit(MSR_MISC_PWR_MGMT, 22);
 
 	/* Lock memory configuration to protect SMM */
-	msr_set_bit(MSR_LT_LOCK_MEMORY, 0);
+	//msr_set_bit(MSR_LT_LOCK_MEMORY, 0);
 }
diff --git a/src/northbridge/intel/nehalem/finalize.c b/src/northbridge/intel/nehalem/finalize.c
index 0b5cb74ce2..c9bce581bf 100644
--- a/src/northbridge/intel/nehalem/finalize.c
+++ b/src/northbridge/intel/nehalem/finalize.c
@@ -26,7 +26,7 @@ void intel_nehalem_finalize_smm(void)
 	pci_or_config32(PCI_DEV_SNB, 0x5c, 1 << 0);	/* DPR */
 	pci_or_config32(PCI_DEV_SNB, 0x78, 1 << 10);	/* ME */
 	pci_or_config32(PCI_DEV_SNB, 0x90, 1 << 0);	/* REMAPBASE */
-	pci_or_config32(PCI_DEV_SNB, 0x98, 1 << 0);	/* REMAPLIMIT */
+	//pci_or_config32(PCI_DEV_SNB, 0x98, 1 << 0);	/* REMAPLIMIT */
 	pci_or_config32(PCI_DEV_SNB, 0xa0, 1 << 0);	/* TOM */
 	pci_or_config32(PCI_DEV_SNB, 0xa8, 1 << 0);	/* TOUUD */
 	pci_or_config32(PCI_DEV_SNB, 0xb0, 1 << 0);	/* BDSM */

I tried to find some clues in the datasheets of the first [2] and
second [3] generation mobile Core CPUs, as the finalize code seems to
be copied from the Sandy Bridge code. I found no reference in [2] to a
register locking mechanism comparable to the one described in [3] and
used inside the SMM finalization code, which seems odd to me.
Additionally, the register sizes and locations appear to be different.
I suspect that the Nehalem finalize function does not work as intended
and can probably be removed altogether.
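
To illustrate why a wrong register layout matters here:
pci_or_config32() is a read-modify-write that ORs bits into a config
register, and the finalize code relies on bit 0 being a lock bit. Below
is a small C model of such a lockable register. The struct and helpers
are invented for illustration and do not model any real Nehalem or
Sandy Bridge register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_BIT (1u << 0)	/* bit 0 as lock, like the "1 << 0" in the diff */

/* Simulated lockable register (hypothetical, for illustration only). */
struct lockable_reg {
	uint32_t value;
};

/* Writes are dropped once the lock bit is set; returns whether the write landed. */
bool reg_write(struct lockable_reg *reg, uint32_t value)
{
	assert(reg != NULL);
	if (reg->value & LOCK_BIT)
		return false;
	reg->value = value;
	return true;
}

/* Read-modify-write OR, the same kind of operation as pci_or_config32(). */
bool reg_or(struct lockable_reg *reg, uint32_t bits)
{
	assert(reg != NULL);
	return reg_write(reg, reg->value | bits);
}
```

If bit 0 is an ordinary data bit on Nehalem instead, the same OR would
silently change the register's value rather than lock it, which is the
failure mode I suspect. That remains a guess until someone confirms the
register layout.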

Do you have any ideas on what is wrong here? I added my shortconfig [4]
in case you need it.

Cheers,
Matthias

[1] https://review.coreboot.org/21129
[2]
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/core-mobile-datasheet-vol-2.pdf
[3]
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/2nd-gen-core-family-mobile-vol-2-datasheet.pdf
[4] shortconfig:
# This image was built using coreboot 4.7-784-g08c4ce851e-dirty
CONFIG_VENDOR_LENOVO=y
CONFIG_HAVE_IFD_BIN=y
CONFIG_HAVE_ME_BIN=y
CONFIG_HAVE_GBE_BIN=y
CONFIG_BOARD_LENOVO_X201=y
# CONFIG_INTEL_CHIPSET_LOCKDOWN is not set
CONFIG_MEMTEST_SECONDARY_PAYLOAD=y
