Re: [coreboot] GeodeLX RAM initialisation issue

2009-11-09 Thread Marc Jones
On Fri, Nov 6, 2009 at 7:57 AM, Nathan Williams  wrote:
> Marc Jones wrote:
>>
>> When linux does the reset, is the coreboot output the same? Does it do
>> the "Resetting the processor"?
>
> Yes, it does "Resetting the processor after PLL configuration for the
> changes to take effect"
>
> I captured an example of my issue:
>
> http://coreboot.pastebin.com/m7f5ed367
>
> I made some more observations today:
>
> From Linux, a reboot from the command line works fine.  It only seems to
> die when an fsck check fails on boot and forces a reboot.  The motherboard
> doesn't have an RTC backup battery at the moment, so to test I have been
> setting the clock in Linux, shutting down, removing the power supply for a
> few seconds, then booting up again.  Because the default time is in 1999,
> Linux runs an fsck which causes it to restart and die in coreboot.
>
> Once coreboot crashes, a hardware reset doesn't fix it.  Coreboot will
> always stop at the same point.  Even removing power from the motherboard
> doesn't help.  However, I did find that after swapping the SODIMM for a
> different RAM module, the board would boot.  I know it doesn't sound very
> scientific, but it's what appeared to happen.
>
> Is it possible that coreboot or maybe SeaBIOS is using incorrect values from
> non-volatile ram?
>
> Another observation I made was that by setting the debug_level to BIOS_CRIT,
> instead of dying at the usual spot in disable_car() and stopping, coreboot
> would reset continuously (cycling every 1-2 seconds)
>
> Another issue that's partly related is the ability for coreboot to set the
> GeodeLink speed depending on the detected RAM speed.  As a work-around, we
> are only using 333MHz SODIMMs and have set the bootstrap bits for
> GLCP_SYS_RSTPLL[7:1] (section 6.14.2.13 of LX databook) to 500MHz CPU,
> 333MHz GLIU instead of bypass mode.  In bypass mode, the GLIU is 266MHz and
> some of our 333MHz RAM will fail in disable_car(). As a test, I have
> experimented with pll_reset(MANUALCONF, PLLMSRHI, PLLMSRLO) in initram.c in
> an attempt to change the GLIU to 333MHz.  I probably didn't have the correct
> bits set, so even though I managed to set GLIU, it failed the last test (DLL)
> in sdram_enable() and would reset.

Your second problem might explain the first. You should look closely
at the detection problem. It depends on the reset and the state of the
rstpll flags. There could be a corner case or something unusual going
on. How did you set the bootstrap bits with hardware (straps)? You
should use pll_reset(ManualConf) settings to change it with hardware.

Marc


-- 
http://marcjonesconsulting.com

-- 
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] GeodeLX RAM initialisation issue

2009-11-10 Thread Nathan Williams

Marc Jones wrote:
> On Fri, Nov 6, 2009 at 7:57 AM, Nathan Williams  wrote:
>> Another observation I made was that by setting the debug_level to BIOS_CRIT,
>> instead of dying at the usual spot in disable_car() and stopping, coreboot
>> would reset continuously (cycling every 1-2 seconds)


Since I needed a BIOS without much debugging enabled for a customer
sample, I looked a bit deeper to find the cause of this continuous reset
behaviour.  Even changing the debug level from BIOS_SPEW to BIOS_DEBUG
caused the reset.  I tracked it down to a single printk; with my attached
patch it works at BIOS_CRIT now, just with a few extra debug lines.
Without the printk, the code gets to "missing phase4_read_resources"
(just a few lines down from my patch) before restarting.
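
To illustrate why the level of that one printk matters, here is a minimal
sketch of syslog-style level filtering.  The numeric ordering and the
would_print() helper are assumptions made for illustration, not code taken
from coreboot v3.  The point is only that a BIOS_SPEW message is dropped as
soon as the console level is BIOS_DEBUG or lower, which matches the printk
disappearing (and the resets starting) at those levels.

```c
/* Syslog-style levels: a lower number means a more severe message.
 * The ordering is assumed here for illustration. */
enum loglevel {
	BIOS_EMERG = 0, BIOS_ALERT, BIOS_CRIT, BIOS_ERR, BIOS_WARNING,
	BIOS_NOTICE, BIOS_INFO, BIOS_DEBUG, BIOS_SPEW
};

/* Hypothetical helper: returns 1 if a message tagged msg_level would be
 * emitted when the console is configured at console_level. */
static int would_print(enum loglevel console_level, enum loglevel msg_level)
{
	return msg_level <= console_level;
}
```

With this model, would_print(BIOS_DEBUG, BIOS_SPEW) is 0, while
would_print(BIOS_CRIT, BIOS_CRIT) is 1, which is why retagging the message
as BIOS_CRIT keeps it in the output at every level I tried.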




>> Another issue that's partly related is the ability for coreboot to set the
>> GeodeLink speed depending on the detected RAM speed.  As a work-around, we
>> are only using 333MHz SODIMMs and have set the bootstrap bits for
>> GLCP_SYS_RSTPLL[7:1] (section 6.14.2.13 of LX databook) to 500MHz CPU,
>> 333MHz GLIU instead of bypass mode.  In bypass mode, the GLIU is 266MHz and
>> some of our 333MHz RAM will fail in disable_car(). As a test, I have
>> experimented with
>> pll_reset(MANUALCONF, PLLMSRHI, PLLMSRLO) in initram.c in an attempt to
>> change the GLIU to 333MHz.  I probably didn't have the correct bits set, so
>> even though I managed to set GLIU, it failed the last test (DLL) in
>> sdram_enable() and would reset.


> Your second problem might explain the first. You should look closely
> at the detection problem. It depends on the reset and the state of the
> rstpll flags. There could be a corner case or something unusual going
> on. How did you set the boot strap bits with hardware (straps)? You
> should use pll_reset(ManualConf) settings to change it with hardware.
>
> Marc




Sorry, I should have explained that we set the bootstrap bits in hardware:

Bit 7: PW1 pad - active high when the PCI clock is 66 MHz, low for 33 MHz.
Bit 6: IRQ13 pad - active high for stall-on-reset debug feature,
       otherwise low.
Bit 5: PW0 pad - part of CPU/GLIU frequency selects.
Bit 4: SUSPA# pad - part of CPU/GLIU frequency selects.
Bit 3: GNT2# pad - part of CPU/GLIU frequency selects.
Bit 2: GNT1# pad - part of CPU/GLIU frequency selects.
Bit 1: GNT0# pad - part of CPU/GLIU frequency selects.

We have pulled these pins up or down to be "0010110", which corresponds
to CPU 500MHz, GLIU 333MHz in table 6-87.  This should also mean that,
on reset, the value of GLCP_SYS_RSTPLL should be 049C_0300182Ch (except
that SWFLAGS (GLCP_SYS_RSTPLL[31:26]) is only reset to 0 on Power On
Reset (POR)).  So should I be using pll_reset(ManualConf)?  I'll try it
later today and see if I can get some debugging output.
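
As a cross-check of the strap value against the expected register contents,
here is a small sketch that pulls the fields listed above out of the low
32 bits of GLCP_SYS_RSTPLL.  The helper names are hypothetical (they are not
coreboot functions); the bit positions are the ones given in this mail.

```c
#include <stdint.h>

/* All helpers take the low 32 bits of GLCP_SYS_RSTPLL.
 * Names are made up for this sketch. */

/* Bits [7:1]: the seven bootstrap bits as sampled from the pads. */
static unsigned rstpll_bootstrap(uint32_t lo) { return (lo >> 1) & 0x7Fu; }

/* Bit 7: PW1 pad, 1 = 66 MHz PCI clock, 0 = 33 MHz. */
static unsigned rstpll_pci66(uint32_t lo) { return (lo >> 7) & 1u; }

/* Bit 6: IRQ13 pad, 1 = stall-on-reset debug feature. */
static unsigned rstpll_stall(uint32_t lo) { return (lo >> 6) & 1u; }

/* Bits [5:1]: CPU/GLIU frequency select straps. */
static unsigned rstpll_freq_sel(uint32_t lo) { return (lo >> 1) & 0x1Fu; }

/* Bits [31:26]: SWFLAGS, only cleared on Power On Reset. */
static unsigned rstpll_swflags(uint32_t lo) { return (lo >> 26) & 0x3Fu; }
```

Feeding in 0x0300182C, rstpll_bootstrap() gives 0x16, which is binary
0010110 and agrees with the strap pattern above; PCI66 and stall-on-reset
both read 0, and SWFLAGS reads 0 as expected straight after a POR.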


Regards,
Nathan

--- a/device/device.c
+++ b/device/device.c
@@ -282,7 +282,7 @@ void read_resources(struct bus *bus)
 	/* Walk through all devices and find which resources they need. */
 	for (curdev = bus->children; curdev; curdev = curdev->sibling) {
 		int i;
-		printk(BIOS_SPEW,
+		printk(BIOS_CRIT,
 		   "%s: %s(%s) dtsname %s enabled %d\n",
 		   __func__, bus->dev ? bus->dev->dtsname : "NOBUSDEV",
 		   bus->dev ? dev_path(bus->dev) : "NOBUSDEV",

Re: [coreboot] GeodeLX RAM initialisation issue

2009-11-10 Thread Marc Jones
On Tue, Nov 10, 2009 at 1:26 PM, Nathan Williams  wrote:
> Marc Jones wrote:
>>
>> On Fri, Nov 6, 2009 at 7:57 AM, Nathan Williams 
>> wrote:
>>>
>>> Another observation I made was that by setting the debug_level to
>>> BIOS_CRIT,
>>> instead of dying at the usual spot in disable_car() and stopping,
>>> coreboot
>>> would reset continuously (cycling every 1-2 seconds)
>
> Since I needed to have a BIOS that didn't have much debugging enabled for a
> customer sample, I looked a bit deeper to find the cause of this continuous
> reset behaviour.  Even changing the debug level from BIOS_SPEW to BIOS_DEBUG
> caused the reset.  I tracked it down to a single  printk and my attached
> patch means it works at BIOS_CRIT now, just with a few extra debug lines.
>  Without the printk, the code gets to "missing phase4_read_resources" (just
> a few lines down from my patch) before restarting.

This sounds like it is probably blowing the stack or the stack hits
memory that isn't working correctly.
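
One generic way to tell those two cases apart is to paint the stack region
with a known pattern early on and check later how much of it is still
intact.  The sketch below is only an illustration of that idea: the pattern,
the function names and the idea of passing the region as an array are all
assumptions, and nothing in this thread shows coreboot v3 providing such a
helper.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xDEADBEEFu	/* arbitrary marker value (assumption) */

/* Fill the not-yet-used stack region with the marker pattern. */
static void stack_paint(uint32_t *base, size_t words)
{
	for (size_t i = 0; i < words; i++)
		base[i] = STACK_PAINT;
}

/* Count marker words still intact, starting at the lowest address.
 * The stack is assumed to grow downwards towards base, so this is the
 * headroom that was never touched. */
static size_t stack_headroom(const uint32_t *base, size_t words)
{
	size_t i = 0;

	while (i < words && base[i] == STACK_PAINT)
		i++;
	return i;
}
```

If the headroom reads as zero, the stack has probably overflowed.  If the
pattern does not read back correctly immediately after painting, the memory
(or cache-as-RAM region) behind the stack is the more likely culprit.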


>
>>>
>>> Another issue that's partly related is the ability for coreboot to set
>>>  the
>>> GeodeLink speed depending on the detected RAM speed.  As a work-around,
>>> we
>>> are only using 333MHz SODIMMs and have set the bootstrap bits for
>>> GLCP_SYS_RSTPLL[7:1] (section 6.14.2.13 of LX databook) to 500MHz CPU,
>>> 333MHz GLIU instead of bypass mode.  In bypass mode, the GLIU is 266MHz
>>> and
>>> some of our 333MHz RAM will fail in disable_car(). As a test, I have
>>> experimented with
>>> pll_reset(MANUALCONF, PLLMSRHI, PLLMSRLO) in initram.c in an attempt to
>>> change the GLIU to 333MHz.  I probably didn't have the correct bits set,
>>> so
>>> even though I managed to set GLIU, it failed the last test (DLL) in
>>> sdram_enable() and would reset.
>>
>> Your second problem might explain the first. You should look closely
>> at the detection problem. It depends on the reset and the state of the
>> rstpll flags. There could be a corner case or something unusual going
>> on. How did you set the boot strap bits with hardware (straps)? You
>> should use pll_reset(ManualConf) settings to change it with hardware.
>>
>> Marc
>>
>>
>
> Sorry, I should have explained that we set the bootstrap bits in hardware:
>
> Bit 7: PW1 pad - active high when the PCI clock is 66 MHz, low for 33 MHz.
> Bit 6: IRQ13 pad - active high for stall-on-reset debug feature, otherwise
> low.
> Bit 5: PW0 pad - part of CPU/GLIU frequency selects.
> Bit 4: SUSPA# pad - part of CPU/GLIU frequency selects.
> Bit 3: GNT2# pad - part of CPU/GLIU frequency selects.
> Bit 2: GNT1# pad - part of CPU/GLIU frequency selects.
> Bit 1: GNT0# pad - part of CPU/GLIU frequency selects.
>
> We have pulled these pins up or down to be "0010110", which corresponds to
> CPU 500MHz, GLIU 333MHz in table 6-87.  This should also mean that the on
> reset, the value of GLCP_SYS_RSTPLL should be 049C_0300182Ch (except
> that SWFLAGS (GLCP_SYS_RSTPLL[31:26]) is only reset to 0 on Power On Reset
> (POR).  So I should be using pll_reset(ManualConf)?  I'll try it later today
> and see if I can get some debugging output.

If it is set by straps, it should be doing the right thing and you
don't need to use the ManualConf. There could still be a corner case,
and you should try to trace through the soft reset that is causing the
problem. Also, have you diff'd the MC settings between the BIOS and
coreboot? I would be interested in any discrepancies.

Marc


-- 
http://marcjonesconsulting.com



Re: [coreboot] GeodeLX RAM initialisation issue

2009-11-22 Thread Nathan Williams
Marc Jones wrote:
> On Tue, Nov 10, 2009 at 1:26 PM, Nathan Williams  
> wrote:
>> Marc Jones wrote:
>>> On Fri, Nov 6, 2009 at 7:57 AM, Nathan Williams 
>>> wrote:
>>>> Another observation I made was that by setting the debug_level to
>>>> BIOS_CRIT,
>>>> instead of dying at the usual spot in disable_car() and stopping,
>>>> coreboot
>>>> would reset continuously (cycling every 1-2 seconds)
>> Since I needed to have a BIOS that didn't have much debugging enabled for a
>> customer sample, I looked a bit deeper to find the cause of this continuous
>> reset behaviour.  Even changing the debug level from BIOS_SPEW to BIOS_DEBUG
>> caused the reset.  I tracked it down to a single  printk and my attached
>> patch means it works at BIOS_CRIT now, just with a few extra debug lines.
>>  Without the printk, the code gets to "missing phase4_read_resources" (just
>> a few lines down from my patch) before restarting.
> 
> This sounds like it is probably blowing the stack or the stack hits
> memory that isn't working correctly.
> 
> 
>>>> Another issue that's partly related is the ability for coreboot to set
>>>>  the
>>>> GeodeLink speed depending on the detected RAM speed.  As a work-around,
>>>> we
>>>> are only using 333MHz SODIMMs and have set the bootstrap bits for
>>>> GLCP_SYS_RSTPLL[7:1] (section 6.14.2.13 of LX databook) to 500MHz CPU,
>>>> 333MHz GLIU instead of bypass mode.  In bypass mode, the GLIU is 266MHz
>>>> and
>>>> some of our 333MHz RAM will fail in disable_car(). As a test, I have
>>>> experimented with
>>>> pll_reset(MANUALCONF, PLLMSRHI, PLLMSRLO) in initram.c in an attempt to
>>>> change the GLIU to 333MHz.  I probably didn't have the correct bits set,
>>>> so
>>>> even though I managed to set GLIU, it failed the last test (DLL) in
>>>> sdram_enable() and would reset.
>>> Your second problem might explain the first. You should look closely
>>> at the detection problem. It depends on the reset and the state of the
>>> rstpll flags. There could be a corner case or something unusual going
>>> on. How did you set the boot strap bits with hardware (straps)? You
>>> should use pll_reset(ManualConf) settings to change it with hardware.
>>>
>>> Marc
>>>
>>>
>> Sorry, I should have explained that we set the bootstrap bits in hardware:
>>
>> Bit 7: PW1 pad - active high when the PCI clock is 66 MHz, low for 33 MHz.
>> Bit 6: IRQ13 pad - active high for stall-on-reset debug feature, otherwise
>> low.
>> Bit 5: PW0 pad - part of CPU/GLIU frequency selects.
>> Bit 4: SUSPA# pad - part of CPU/GLIU frequency selects.
>> Bit 3: GNT2# pad - part of CPU/GLIU frequency selects.
>> Bit 2: GNT1# pad - part of CPU/GLIU frequency selects.
>> Bit 1: GNT0# pad - part of CPU/GLIU frequency selects.
>>
>> We have pulled these pins up or down to be "0010110", which corresponds to
>> CPU 500MHz, GLIU 333MHz in table 6-87.  This should also mean that the on
>> reset, the value of GLCP_SYS_RSTPLL should be 049C_0300182Ch (except
>> that SWFLAGS (GLCP_SYS_RSTPLL[31:26]) is only reset to 0 on Power On Reset
>> (POR).  So I should be using pll_reset(ManualConf)?  I'll try it later today
>> and see if I can get some debugging output.
> 
> If it is set by straps, it should be doing the right thing and you
> don't need to use the ManualConf. There could still be a corner case
> and you should try trace through the soft reset that is causing the
> problem. Also, have you diff'd the MC settings between the BIOS and
> coreboot. I would be interested in discrepancies.
> 
> Marc
> 
> 

I managed to get the commercial BIOS to boot on my board and diffed it with
coreboot:

http://coreboot.pastebin.com/m39b22c21

The only differences I can see are related to interrupts, which shouldn't
matter in relation to my RAM problems.

I have also run memtest86 with the commercial BIOS (from a bootable CD-ROM)
and as a payload in coreboot.  The commercial BIOS didn't have any errors,
but my coreboot did.  So the hardware can't be too bad.

Nathan



Re: [coreboot] GeodeLX RAM initialisation issue

2009-11-23 Thread Marc Jones
On Mon, Nov 23, 2009 at 12:27 AM, Nathan Williams
 wrote:
> Marc Jones wrote:
>> On Tue, Nov 10, 2009 at 1:26 PM, Nathan Williams  
>> wrote:
>>> Marc Jones wrote:
>>>> On Fri, Nov 6, 2009 at 7:57 AM, Nathan Williams
>>>> wrote:
>>>>> Another observation I made was that by setting the debug_level to
>>>>> BIOS_CRIT,
>>>>> instead of dying at the usual spot in disable_car() and stopping,
>>>>> coreboot
>>>>> would reset continuously (cycling every 1-2 seconds)
>>> Since I needed to have a BIOS that didn't have much debugging enabled for a
>>> customer sample, I looked a bit deeper to find the cause of this continuous
>>> reset behaviour.  Even changing the debug level from BIOS_SPEW to BIOS_DEBUG
>>> caused the reset.  I tracked it down to a single  printk and my attached
>>> patch means it works at BIOS_CRIT now, just with a few extra debug lines.
>>>  Without the printk, the code gets to "missing phase4_read_resources" (just
>>> a few lines down from my patch) before restarting.
>>
>> This sounds like it is probably blowing the stack or the stack hits
>> memory that isn't working correctly.
>>
>>
>>>>> Another issue that's partly related is the ability for coreboot to set
>>>>>  the
>>>>> GeodeLink speed depending on the detected RAM speed.  As a work-around,
>>>>> we
>>>>> are only using 333MHz SODIMMs and have set the bootstrap bits for
>>>>> GLCP_SYS_RSTPLL[7:1] (section 6.14.2.13 of LX databook) to 500MHz CPU,
>>>>> 333MHz GLIU instead of bypass mode.  In bypass mode, the GLIU is 266MHz
>>>>> and
>>>>> some of our 333MHz RAM will fail in disable_car(). As a test, I have
>>>>> experimented with
>>>>> pll_reset(MANUALCONF, PLLMSRHI, PLLMSRLO) in initram.c in an attempt to
>>>>> change the GLIU to 333MHz.  I probably didn't have the correct bits set,
>>>>> so
>>>>> even though I managed to set GLIU, it failed the last test (DLL) in
>>>>> sdram_enable() and would reset.
>>>> Your second problem might explain the first. You should look closely
>>>> at the detection problem. It depends on the reset and the state of the
>>>> rstpll flags. There could be a corner case or something unusual going
>>>> on. How did you set the boot strap bits with hardware (straps)? You
>>>> should use pll_reset(ManualConf) settings to change it with hardware.
>>>>
>>>> Marc
>>>>
>>>>
>>> Sorry, I should have explained that we set the bootstrap bits in hardware:
>>>
>>> Bit 7: PW1 pad - active high when the PCI clock is 66 MHz, low for 33 MHz.
>>> Bit 6: IRQ13 pad - active high for stall-on-reset debug feature, otherwise
>>> low.
>>> Bit 5: PW0 pad - part of CPU/GLIU frequency selects.
>>> Bit 4: SUSPA# pad - part of CPU/GLIU frequency selects.
>>> Bit 3: GNT2# pad - part of CPU/GLIU frequency selects.
>>> Bit 2: GNT1# pad - part of CPU/GLIU frequency selects.
>>> Bit 1: GNT0# pad - part of CPU/GLIU frequency selects.
>>>
>>> We have pulled these pins up or down to be "0010110", which corresponds to
>>> CPU 500MHz, GLIU 333MHz in table 6-87.  This should also mean that the on
>>> reset, the value of GLCP_SYS_RSTPLL should be 049C_0300182Ch (except
>>> that SWFLAGS (GLCP_SYS_RSTPLL[31:26]) is only reset to 0 on Power On Reset
>>> (POR).  So I should be using pll_reset(ManualConf)?  I'll try it later today
>>> and see if I can get some debugging output.
>>
>> If it is set by straps, it should be doing the right thing and you
>> don't need to use the ManualConf. There could still be a corner case
>> and you should try trace through the soft reset that is causing the
>> problem. Also, have you diff'd the MC settings between the BIOS and
>> coreboot. I would be interested in discrepancies.
>>
>> Marc
>>
>>
>
> I managed to get the commercial BIOS to boot on my board and diffed it with 
> coreboot:
>
> http://coreboot.pastebin.com/m39b22c21
>
> The only differences I can see are related to interrupts, which shouldn't 
> matter in relation to
> my RAM problems.
>
> I have also run a memtest86 with the commercial BIOS (from bootable CDROM) 
> and as a payload in coreboot.
> The commercial BIOS didn't have any errors, but my coreboot did.  So the 
> hardware can't be too bad.

That looks like just the southbridge cs5536 target. The memory
differences would be in the processor geodelx target. Can you send
those results?

Marc




-- 
http://marcjonesconsulting.com



Re: [coreboot] GeodeLX RAM initialisation issue

2009-11-24 Thread Nathan Williams
Marc Jones wrote:
> On Mon, Nov 23, 2009 at 12:27 AM, Nathan Williams
>  wrote:
>> I managed to get the commercial BIOS to boot on my board and diffed it with 
>> coreboot:
>>
>> http://coreboot.pastebin.com/m39b22c21
>>
>> The only differences I can see are related to interrupts, which shouldn't 
>> matter in relation to
>> my RAM problems.
>>
>> I have also run a memtest86 with the commercial BIOS (from bootable CDROM) 
>> and as a payload in coreboot.
>> The commercial BIOS didn't have any errors, but my coreboot did.  So the 
>> hardware can't be too bad.
> 
> That looks like just the southbridge cs5536 target. The memory
> differences would be in the processor geodelx target. Can you send
> those results?
> 
> Marc
> 

I did some new MSR dumps.

Diff:
./msrtool -t geodelx -t cs5536 -d amd_ref_bios
http://coreboot.pastebin.com/m5e487f87

AMD NAS reference BIOS:
./msrtool -t geodelx -t cs5536 -l -s amd_ref_bios
http://coreboot.pastebin.com/madc04ac

My Coreboot:
./msrtool -t geodelx -t cs5536 -l -s nathan_bios
http://coreboot.pastebin.com/m7f35d855


The diffs I did today show some differences in GLCP_DELAY_CONTROLS.
Last time I added some code to force it to match the commercial BIOS's
GLCP_DELAY_CONTROLS MSR, but it didn't seem to make any difference.

I also tested all the SODIMMs I have here (about 10) with the commercial BIOS.
Each time I ran an msrtool diff against a dump I had saved on disk.

Most are 333MHz, but 2 are 400MHz.  There weren't any changes to the MSRs.

Could there be an issue with the initialisation sequence that reading MSRs
after booting won't show?  Also, quite a few MSRs aren't defined in geodelx.c
yet.  Are there any obvious ones that should be added?

Regards,
Nathan



Re: [coreboot] GeodeLX RAM initialisation issue

2009-11-24 Thread Marc Jones
On Tue, Nov 24, 2009 at 1:09 AM, Nathan Williams  wrote:
> Marc Jones wrote:
>> On Mon, Nov 23, 2009 at 12:27 AM, Nathan Williams
>>  wrote:
>>> I managed to get the commercial BIOS to boot on my board and diffed it with 
>>> coreboot:
>>>
>>> http://coreboot.pastebin.com/m39b22c21
>>>
>>> The only differences I can see are related to interrupts, which shouldn't 
>>> matter in relation to
>>> my RAM problems.
>>>
>>> I have also run a memtest86 with the commercial BIOS (from bootable CDROM) 
>>> and as a payload in coreboot.
>>> The commercial BIOS didn't have any errors, but my coreboot did.  So the 
>>> hardware can't be too bad.
>>
>> That looks like just the southbridge cs5536 target. The memory
>> differences would be in the processor geodelx target. Can you send
>> those results?
>>
>> Marc
>>
>
> I did some new MSR dumps.
>
> Diff:
> ./msrtool -t geodelx -t cs5536 -d amd_ref_bios
> http://coreboot.pastebin.com/m5e487f87
>
> AMD NAS reference BIOS:
> ./msrtool -t geodelx -t cs5536 -l -s amd_ref_bios
> http://coreboot.pastebin.com/madc04ac
>
> My Coreboot:
> ./msrtool -t geodelx -t cs5536 -l -s nathan_bios
> http://coreboot.pastebin.com/m7f35d855
>
>
> The diffs I did today show some differences with GLCP_DELAY_CONTROLS.
> Last time I added some code to force it to match the commercial BIOS
> GLCP_DELAY_CONTROLS MSR, but it didn't seem to make any difference.
>
> I also tested all the SODIMMS I have here (about 10) with the commercial BIOS.
> Each time I did a msrtool diff to one I saved on disk.
>
> Most are 333MHz, but 2 are 400MHz.  There weren't any changes to the MSRs.
>
> Could there be an issue with the initialisation sequence that reading MSRs
> after booting won't show?  Also, quite a few MSRs aren't defined in geodelx.c 
> yet.
> Are there any obvious ones that should be added in?
>

--- AMD NAS reference BIOS
+++ Nathan's coreboot v3
#
# GLCP_DELAY_CONTROLS
#
-0x4c0f 0x83f1_00aa_5696_0404
+0x4c0f 0x8271_005a_5696_0404

It looks like coreboot and the ref BIOS detect a different DIMM
configuration. This timing setup could be part of the instability (I
don't think it explains the reset problem). Look at the code in
SetDelayControl(void) and anywhere else that GLCP_DELAY_CONTROLS gets
set to see what might be happening. Make sure that MTest is disabled
in the ref BIOS setup. This setting is based on the number of devices
(load) on the DIMM.
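
To see exactly which bits differ between the two GLCP_DELAY_CONTROLS values
above, XOR them. This is a minimal sketch; the helper names are made up for
the example and it says nothing about what the individual fields mean (check
the databook for that).

```c
#include <stdint.h>

/* Bits set in the result are the bits where the two MSR values differ. */
static uint64_t msr_diff_mask(uint64_t a, uint64_t b)
{
	return a ^ b;
}

/* Count the set bits in v (clears the lowest set bit each iteration). */
static int count_bits(uint64_t v)
{
	int n = 0;

	while (v) {
		v &= v - 1;
		n++;
	}
	return n;
}
```

For 0x83f100aa56960404 against 0x8271005a56960404 the mask is
0x018000f000000000: six bits differ, all in the upper 32 bits, and the
lower 32 bits are identical.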

I didn't realize that so few registers were in the msr tool for
geodelx. You should add these:
2018h R/W Refresh and SDRAM Program (MC_CF07_DATA) 10071007_0040h Page 227
2019h R/W Timing and Mode Program (MC_CF8F_DATA) 1808_287337A3h Page 229
201Ah R/W Feature Enables (MC_CF1017_DATA) _11080001h Page 231
201Bh RO Performance Counters (MC_CFPERF_CNT1) _h Page 232
201Ch R/W Counter and CAS Control (MC_PERCNT2) _00FF00FFh Page 233
201Dh R/W Clocking and Debug (MC_CFCLK_DBUG) _1300h Page 233

4C0Fh R/W GLCP I/O Delay Controls (GLCP_DELAY_CONTROLS) _h Page 549
4C14h R/W GLCP System Reset and PLL Control (GLCP_SYS_RSTPLL) Bootstrap specific Page 554
Marc

-- 
http://marcjonesconsulting.com



Re: [coreboot] GeodeLX RAM initialisation issue

2009-11-25 Thread Nathan Williams
Marc Jones wrote:
> On Tue, Nov 24, 2009 at 1:09 AM, Nathan Williams  
> wrote:
>> Marc Jones wrote:
>>> On Mon, Nov 23, 2009 at 12:27 AM, Nathan Williams
>>>  wrote:
>>>> I managed to get the commercial BIOS to boot on my board and diffed it
>>>> with coreboot:
>>>>
>>>> http://coreboot.pastebin.com/m39b22c21
>>>>
>>>> The only differences I can see are related to interrupts, which shouldn't
>>>> matter in relation to
>>>> my RAM problems.
>>>>
>>>> I have also run a memtest86 with the commercial BIOS (from bootable CDROM)
>>>> and as a payload in coreboot.
>>>> The commercial BIOS didn't have any errors, but my coreboot did.  So the
>>>> hardware can't be too bad.
>>> That looks like just the southbridge cs5536 target. The memory
>>> differences would be in the processor geodelx target. Can you send
>>> those results?
>>>
>>> Marc
>>>
>> I did some new MSR dumps.
>>
>> Diff:
>> ./msrtool -t geodelx -t cs5536 -d amd_ref_bios
>> http://coreboot.pastebin.com/m5e487f87
>>
>> AMD NAS reference BIOS:
>> ./msrtool -t geodelx -t cs5536 -l -s amd_ref_bios
>> http://coreboot.pastebin.com/madc04ac
>>
>> My Coreboot:
>> ./msrtool -t geodelx -t cs5536 -l -s nathan_bios
>> http://coreboot.pastebin.com/m7f35d855
>>
>>
>> The diffs I did today show some differences with GLCP_DELAY_CONTROLS.
>> Last time I added some code to force it to match the commercial BIOS
>> GLCP_DELAY_CONTROLS MSR, but it didn't seem to make any difference.
>>
>> I also tested all the SODIMMS I have here (about 10) with the commercial 
>> BIOS.
>> Each time I did a msrtool diff to one I saved on disk.
>>
>> Most are 333MHz, but 2 are 400MHz.  There weren't any changes to the MSRs.
>>
>> Could there be an issue with the initialisation sequence that reading MSRs
>> after booting won't show?  Also, quite a few MSRs aren't defined in 
>> geodelx.c yet.
>> Are there any obvious ones that should be added in?
>>
> 
> --- AMD NAS reference BIOS
> +++ Nathan's coreboot v3
> #
> # GLCP_DELAY_CONTROLS
> #
> -0x4c0f 0x83f1_00aa_5696_0404
> +0x4c0f 0x8271_005a_5696_0404
> 
> It looks like coreboot and the ref bios detect different dimm
> configuration. This timing setup could be part of the instability (I
> don't think it explains the reset problem). Look at the code here:
> SetDelayControl(void) and anywhere else that GLCP_DELAY_CONTROLS gets
> set to see what might be happening. Make sure that MTest is disabled
> in the ref bios setup. This setting is based on the number of devices
> (load) there is on the dimm.
> 
> I didn't realize that so few registers were in the msr tool for
> geodelx. You should add these:
> 2018h R/W Refresh and SDRAM Program (MC_CF07_DATA)
> 10071007_0040h Page 227
> 2019h R/W Timing and Mode Program (MC_CF8F_DATA) 1808_287337A3h Page 
> 229
> 201Ah R/W Feature Enables (MC_CF1017_DATA) _11080001h Page 231
> 201Bh RO Performance Counters (MC_CFPERF_CNT1) _h Page 232
> 201Ch R/W Counter and CAS Control (MC_PERCNT2) _00FF00FFh Page 233
> 201Dh R/W Clocking and Debug (MC_CFCLK_DBUG) _1300h Page 233
> 
> 4C0Fh R/W GLCP I/O Delay
> Controls(GLCP_DELAY_CONTROLS)_h Page 549
> 4C14h R/W GLCP System Reset and PLL Control (GLCP_SYS_RSTPLL)
> Bootstrap specific Page 554
> 
> Marc
> 

I've now added the MSRs and uploaded to pastebin:

AMD NAS:
http://coreboot.pastebin.com/m53aed60b

My coreboot:
http://coreboot.pastebin.com/md23bc6a

./msrtool -d AMD_NAS:
http://coreboot.pastebin.com/m77663de5

Tomorrow I'll try the tests on the NAS hardware, instead of our own motherboards
just in case there are some hidden hardware issues.

Regards,
Nathan



Re: [coreboot] GeodeLX RAM initialisation issue

2009-11-27 Thread Nathan Williams
Nathan Williams wrote:
> Marc Jones wrote:
>> On Tue, Nov 24, 2009 at 1:09 AM, Nathan Williams  
>> wrote:
>>> Marc Jones wrote:
>>>> On Mon, Nov 23, 2009 at 12:27 AM, Nathan Williams
>>>>  wrote:
>>>>> I managed to get the commercial BIOS to boot on my board and diffed it
>>>>> with coreboot:
>>>>>
>>>>> http://coreboot.pastebin.com/m39b22c21
>>>>>
>>>>> The only differences I can see are related to interrupts, which shouldn't
>>>>> matter in relation to
>>>>> my RAM problems.
>>>>>
>>>>> I have also run a memtest86 with the commercial BIOS (from bootable
>>>>> CDROM) and as a payload in coreboot.
>>>>> The commercial BIOS didn't have any errors, but my coreboot did.  So the
>>>>> hardware can't be too bad.
>>>> That looks like just the southbridge cs5536 target. The memory
>>>> differences would be in the processor geodelx target. Can you send
>>>> those results?
>>>>
>>>> Marc
>>>>
>>> I did some new MSR dumps.
>>>
>>> Diff:
>>> ./msrtool -t geodelx -t cs5536 -d amd_ref_bios
>>> http://coreboot.pastebin.com/m5e487f87
>>>
>>> AMD NAS reference BIOS:
>>> ./msrtool -t geodelx -t cs5536 -l -s amd_ref_bios
>>> http://coreboot.pastebin.com/madc04ac
>>>
>>> My Coreboot:
>>> ./msrtool -t geodelx -t cs5536 -l -s nathan_bios
>>> http://coreboot.pastebin.com/m7f35d855
>>>
>>>
>>> The diffs I did today show some differences with GLCP_DELAY_CONTROLS.
>>> Last time I added some code to force it to match the commercial BIOS
>>> GLCP_DELAY_CONTROLS MSR, but it didn't seem to make any difference.
>>>
>>> I also tested all the SODIMMS I have here (about 10) with the commercial 
>>> BIOS.
>>> Each time I did a msrtool diff to one I saved on disk.
>>>
>>> Most are 333MHz, but 2 are 400MHz.  There weren't any changes to the MSRs.
>>>
>>> Could there be an issue with the initialisation sequence that reading MSRs
>>> after booting won't show?  Also, quite a few MSRs aren't defined in 
>>> geodelx.c yet.
>>> Are there any obvious ones that should be added in?
>>>
>> --- AMD NAS reference BIOS
>> +++ Nathan's coreboot v3
>> #
>> # GLCP_DELAY_CONTROLS
>> #
>> -0x4c0f 0x83f1_00aa_5696_0404
>> +0x4c0f 0x8271_005a_5696_0404
>>
>> It looks like coreboot and the ref bios detect different dimm
>> configuration. This timing setup could be part of the instability (I
>> don't think it explains the reset problem). Look at the code here:
>> SetDelayControl(void) and anywhere else that GLCP_DELAY_CONTROLS gets
>> set to see what might be happening. Make sure that MTest is disabled
>> in the ref bios setup. This setting is based on the number of devices
>> (load) there is on the dimm.
>>
>> I didn't realize that so few registers were in the msr tool for
>> geodelx. You should add these:
>> 2018h R/W Refresh and SDRAM Program (MC_CF07_DATA)
>> 10071007_0040h Page 227
>> 2019h R/W Timing and Mode Program (MC_CF8F_DATA) 1808_287337A3h Page 
>> 229
>> 201Ah R/W Feature Enables (MC_CF1017_DATA) _11080001h Page 231
>> 201Bh RO Performance Counters (MC_CFPERF_CNT1) _h Page 
>> 232
>> 201Ch R/W Counter and CAS Control (MC_PERCNT2) _00FF00FFh Page 
>> 233
>> 201Dh R/W Clocking and Debug (MC_CFCLK_DBUG) _1300h Page 233
>>
>> 4C0Fh R/W GLCP I/O Delay
>> Controls(GLCP_DELAY_CONTROLS)_h Page 549
>> 4C14h R/W GLCP System Reset and PLL Control (GLCP_SYS_RSTPLL)
>> Bootstrap specific Page 554
>>
>> Marc
>>
> 
> I've now added the MSRs and uploaded to pastebin:
> 
> AMD NAS:
> http://coreboot.pastebin.com/m53aed60b
> 
> My coreboot:
> http://coreboot.pastebin.com/md23bc6a
> 
> ./msrtool -d AMD_NAS:
> http://coreboot.pastebin.com/m77663de5
> 
> Tomorrow I'll try the tests on the NAS hardware, instead of our own 
> motherboards
> just in case there are some hidden hardware issues.
> 
> Regards,
> Nathan
> 

On the NAS reference board I got the following diff between coreboot
and the commercial BIOS:

http://coreboot.pastebin.com/m1353db1a

As you can see there are a lot of latency differences.
Unfortunately it was only later that I realised that the differences are
because the bootstraps are set to bypass, which means coreboot uses 266 as the
speed, whereas the commercial BIOS uses 333.  When I repeat the same test on
our boards, the only difference in the geodelx MSRs is:

# MC_CFCLK_DBUG
-0x201d 0x
+0x201d 0x1000
#12 TRISTATE_DIS TRI-STATE Disable
-0: Tri-stating enabled
+1: Tri-stating disabled
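
For anyone checking a raw dump by hand, this is a minimal sketch of testing
that one bit. The macro and function names are made up for the example; only
the bit number (12) and the coreboot value 0x1000 come from the diff above.

```c
#include <stdint.h>

/* Bit 12 of MC_CFCLK_DBUG (MSR 0x201d): TRISTATE_DIS. */
#define MC_CFCLK_DBUG_TRISTATE_DIS (UINT64_C(1) << 12)

/* Returns 1 if tri-stating is disabled in the given MSR value, else 0. */
static int tristate_disabled(uint64_t msr)
{
	return (msr & MC_CFCLK_DBUG_TRISTATE_DIS) != 0;
}
```

tristate_disabled(0x1000) returns 1, matching the coreboot side of the diff;
any value with bit 12 clear returns 0.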

Nathan



Re: [coreboot] GeodeLX RAM initialisation issue

2009-11-30 Thread Marc Jones
On Fri, Nov 27, 2009 at 2:05 AM, Nathan Williams  wrote:
> Nathan Williams wrote:
>> Marc Jones wrote:
>>> On Tue, Nov 24, 2009 at 1:09 AM, Nathan Williams  
>>> wrote:
>>>> Marc Jones wrote:
>>>>> On Mon, Nov 23, 2009 at 12:27 AM, Nathan Williams
>>>>>  wrote:
>>>>>> I managed to get the commercial BIOS to boot on my board and diffed it
>>>>>> with coreboot:
>>>>>>
>>>>>> http://coreboot.pastebin.com/m39b22c21
>>>>>>
>>>>>> The only differences I can see are related to interrupts, which
>>>>>> shouldn't matter in relation to
>>>>>> my RAM problems.
>>>>>>
>>>>>> I have also run a memtest86 with the commercial BIOS (from bootable
>>>>>> CDROM) and as a payload in coreboot.
>>>>>> The commercial BIOS didn't have any errors, but my coreboot did.  So the
>>>>>> hardware can't be too bad.
>>>>> That looks like just the southbridge cs5536 target. The memory
>>>>> differences would be in the processor geodelx target. Can you send
>>>>> those results?
>>>>>
>>>>> Marc
>>>>>
>>>> I did some new MSR dumps.
>>>>
>>>> Diff:
>>>> ./msrtool -t geodelx -t cs5536 -d amd_ref_bios
>>>> http://coreboot.pastebin.com/m5e487f87
>>>>
>>>> AMD NAS reference BIOS:
>>>> ./msrtool -t geodelx -t cs5536 -l -s amd_ref_bios
>>>> http://coreboot.pastebin.com/madc04ac
>>>>
>>>> My Coreboot:
>>>> ./msrtool -t geodelx -t cs5536 -l -s nathan_bios
>>>> http://coreboot.pastebin.com/m7f35d855
>>>>
>>>>
>>>> The diffs I did today show some differences with GLCP_DELAY_CONTROLS.
>>>> Last time I added some code to force it to match the commercial BIOS
>>>> GLCP_DELAY_CONTROLS MSR, but it didn't seem to make any difference.
>>>>
>>>> I also tested all the SODIMMS I have here (about 10) with the commercial
>>>> BIOS.
>>>> Each time I did a msrtool diff to one I saved on disk.
>>>>
>>>> Most are 333MHz, but 2 are 400MHz.  There weren't any changes to the MSRs.
>>>>
>>>> Could there be an issue with the initialisation sequence that reading MSRs
>>>> after booting won't show?  Also, quite a few MSRs aren't defined in
>>>> geodelx.c yet.
>>>> Are there any obvious ones that should be added in?
>>>>
>>> --- AMD NAS reference BIOS
>>> +++ Nathan's coreboot v3
>>> #
>>> # GLCP_DELAY_CONTROLS
>>> #
>>> -0x4c0f 0x83f1_00aa_5696_0404
>>> +0x4c0f 0x8271_005a_ 5696_ 0404
>>>
>>> It looks like coreboot and the ref bios detect different dimm
>>> configuration. This timing setup could be part of the instability (I
>>> don't think it explains the reset problem). Look at the code here:
>>> SetDelayControl(void) and anywhere else that GLCP_DELAY_CONTROLS gets
>>> set to see what might be happening. Make sure that MTest is disabled
>>> in the ref bios setup. This setting is based on the number of devices
>>> (load) there is on the dimm.
>>>
>>> I didn't realize that so few registers were in the msr tool for
>>> geodelx. You should add these:
>>> 2018h R/W Refresh and SDRAM Program (MC_CF07_DATA)
>>> 10071007_0040h Page 227
>>> 2019h R/W Timing and Mode Program (MC_CF8F_DATA) 1808_287337A3h 
>>> Page 229
>>> 201Ah R/W Feature Enables (MC_CF1017_DATA) _11080001h Page 231
>>> 201Bh RO Performance Counters (MC_CFPERF_CNT1) _h Page 
>>> 232
>>> 201Ch R/W Counter and CAS Control (MC_PERCNT2) _00FF00FFh Page 
>>> 233
>>> 201Dh R/W Clocking and Debug (MC_CFCLK_DBUG) _1300h Page 233
>>>
>>> 4C0Fh R/W GLCP I/O Delay
>>> Controls(GLCP_DELAY_CONTROLS)_h Page 549
>>> 4C14h R/W GLCP System Reset and PLL Control (GLCP_SYS_RSTPLL)
>>> Bootstrap specific Page 554
>>>
>>> Marc
>>>
>>
>> I've now added the MSRs and uploaded to pastebin:
>>
>> AMD NAS:
>> http://coreboot.pastebin.com/m53aed60b
>>
>> My coreboot:
>> http://coreboot.pastebin.com/md23bc6a
>>
>> ./msrtool -d AMD_NAS:
>> http://coreboot.pastebin.com/m77663de5
>>
>> Tomorrow I'll try the tests on the NAS hardware, instead of our own 
>> motherboards
>> just in case there are some hidden hardware issues.
>>
>> Regards,
>> Nathan
>>
>
> On the NAS reference board I got the following diff between coreboot
> and the commercial BIOS:
>
> http://coreboot.pastebin.com/m1353db1a
>
> As you can see there are a lot of latency differences.
> Unfortunately it was only later that I realised that the differences are 
> because the bootstraps are set to bypass, which means coreboot uses 266 as 
> the speed, whereas the commercial BIOS uses 333.  So when I repeat the same 
> on our boards, the only difference in the geodelx MSRs is:
>
> # MC_CFCLK_DBUG
> -0x201d 0x
> +0x201d 0x1000
> #    12 TRISTATE_DIS TRI-STATE Disable
> -0: Tri-stating enabled
> +1: Tri-stating disabled


Nathan,

I don't think the tri-state disable bit explains the problems you have
seen. Since the memory has the same settings, the problem must be
somewhere else. You will need to go back to the reboot path to
investigate. It seems like something in the reset isn't doing a
complete reset, which causes a problem with the c

Re: [coreboot] GeodeLX RAM initialisation issue

2009-11-30 Thread Nathan Williams
Marc Jones wrote:
> On Fri, Nov 27, 2009 at 2:05 AM, Nathan Williams  
> wrote:
>> On the NAS reference board I got the following diff between coreboot
>> and the commercial BIOS:
>>
>> http://coreboot.pastebin.com/m1353db1a
>>
>> As you can see there are a lot of latency differences.
>> Unfortunately it was only later that I realised that the differences are 
>> because the bootstraps are set to bypass, which means coreboot uses 266 as 
>> the speed, whereas the commercial BIOS uses 333.  So when I repeat the same 
>> on our boards, the only difference in the geodelx MSRs is:
>>
>> # MC_CFCLK_DBUG
>> -0x201d 0x
>> +0x201d 0x1000
>> #12 TRISTATE_DIS TRI-STATE Disable
>> -0: Tri-stating enabled
>> +1: Tri-stating disabled
> 
> 
> Nathan,
> 
> I don't think the tri-state disable bit explains the problems you have
> seen. Since the memory has the same settings, the problem must be
> somewhere el

Re: [coreboot] GeodeLX RAM initialisation issue

2009-12-02 Thread Daniel Mack
(sorry I can't post a proper reply message, I picked that up from the
archives)

Nathan Williams  wrote:

> I am suspicious that the reset problem only occurs when I'm using a
> laptop hard drive off the 44pin IDE connector on our board. I have tried
> booting with a 3.5" drive and external 12V, but I can't replicate the
> problem. With the 3.5" drive, a reboot from fsck works fine. Hopefully
> the next PCB revision should perform better because we've moved the 5V
> plane further away from the DDR tracks.
> 
> I don't know if I mentioned another problem that has similar symptoms.
> Some RAM causes the same cache disable problem, even if there are no
> IDE devices connected. This happens from power-up, so it's not a reset
> issue.

I'm facing a very similar issue here on an ALIX.2D board which is based
on the same chipset. The problem occurs only sometimes, just like you
described it, and resetting from Linux gives a higher chance of
provoking it. However, I have also seen it on roughly one out of 20
power-up cycles.

What's really strange about that is the fact that sometimes, not even
power cycling will fix it: coreboot will always stop at the same
point (from what I've traced, exactly at the same instructions that you
pointed out). Powering off and waiting for ~10 minutes usually brings the
board back to life.

Connecting or disconnecting external IDE44 drives does not appear to
affect the probability, though. One more thing worth noting is that
the effect is harder to trigger when booting from an external
LPC flash emulator (in contrast to coreboot flashed to the internal
LPC).

I urgently need to resolve that and would appreciate any more hints
about where to add more code for flushing caches and the like. I also
suspect the reset vector to not properly flush the hardware, but I'm
somewhat lost in this codebase I must admit, and Geode is also nothing
I'm terribly familiar with.

Thanks,
Daniel




Re: [coreboot] GeodeLX RAM initialisation issue

2009-12-02 Thread Peter Stuge
Daniel Mack wrote:
> the effect is harder to trigger when booting from an external LPC
> flash emulator (in contrast to coreboot flashed to the internal
> LPC).

Then you could experiment with a few different flash chips.

PC Engines makes a nice and neat Flash recovery board, which plugs
onto the LPC header, and comes with a PLCC chip in a socket.

http://pcengines.ch/lpc1a.htm

Order one or two of these, and order a few different flash chips
which are compatible with the board, and start collecting data
points.

Each LPC.1A comes with SST49LF040B. On the ALIX.2 and .3 there is an
AMIC flash chip. (Different from ALIX.1 which also has SST.)

You can also use Winbond W39V040APZ/080APZ. Note [0-9]A, it must not
be e.g. W39V040FAPZ (note [0-9]FA) which is a FWH chip.

Another compatible chip is PMC Pm49FL004T.
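
The Winbond naming rule above can be written down as a small check. This is a sketch based only on the two part-name patterns mentioned in this message; w39v_interface() and the enum are made-up names, and other Winbond suffixes are not covered.

```c
#include <ctype.h>
#include <string.h>

enum flash_if { FLASH_UNKNOWN, FLASH_LPC, FLASH_FWH };

/*
 * Hypothetical helper: classify a Winbond W39V part name by the
 * letters that follow the density digits.
 *   digits + "A"  -> LPC part (e.g. W39V040APZ, W39V080APZ)
 *   digits + "FA" -> FWH part (e.g. W39V040FAPZ)
 */
static enum flash_if w39v_interface(const char *part)
{
	const char *p;

	if (strncmp(part, "W39V", 4) != 0)
		return FLASH_UNKNOWN;

	p = part + 4;
	if (!isdigit((unsigned char)*p))
		return FLASH_UNKNOWN;

	/* Skip the density digits, e.g. "040" or "080". */
	while (isdigit((unsigned char)*p))
		p++;

	if (p[0] == 'F' && p[1] == 'A')
		return FLASH_FWH;
	if (p[0] == 'A')
		return FLASH_LPC;
	return FLASH_UNKNOWN;
}
```

For example, w39v_interface("W39V040APZ") yields FLASH_LPC, while w39v_interface("W39V040FAPZ") yields FLASH_FWH.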


//Peter



Re: [coreboot] GeodeLX RAM initialisation issue

2009-12-02 Thread Daniel Mack
On Wed, Dec 02, 2009 at 02:59:01PM +0100, Peter Stuge wrote:
> Daniel Mack wrote:
> > the effect is harder to trigger when booting from an external LPC
> > flash emulator (in contrast to coreboot flashed to the internal
> > LPC).
> 
> Then you could experiment with a few different flash chips.
> 
> PC Engines makes a nice and neat Flash recovery board, which plugs
> onto the LPC header, and comes with a PLCC chip in a socket.

I doubt the flash chip itself is the problem. Maybe I haven't been
totally clear about what I observed.

When using the Linux tools to flash an image to the internal LPC, the
system most likely won't come up immediately. I need that power-off
delay of some minutes to reanimate the board. After that, the bug is
very hard to trigger, even though it does happen, especially when
powering the device off (by unplugging the supply) and on again right
after that.

So my theory is that some state is left over in some part of the system
which makes coreboot fail in disable_car(). And the same (or maybe just
a similar) effect is triggered when the LPC is written.

Does that ring a bell? As I said, I'm pretty lost in debugging this, but
I'm sure we're not having a hardware issue.

Thanks,
Daniel




Re: [coreboot] GeodeLX RAM initialisation issue

2009-12-04 Thread Daniel Mack
No hint, anyone?

On Wed, Dec 02, 2009 at 03:44:15PM +0100, Daniel Mack wrote:
> On Wed, Dec 02, 2009 at 02:59:01PM +0100, Peter Stuge wrote:
> > Daniel Mack wrote:
> > > the effect is harder to trigger when booting from an external LPC
> > > flash emulator (in contrast to coreboot flashed to the internal
> > > LPC).
> > 
> > Then you could experiment with a few different flash chips.
> > 
> > PC Engines makes a nice and neat Flash recovery board, which plugs
> > onto the LPC header, and comes with a PLCC chip in a socket.
> 
> I doubt the flash chip itself is the problem. Might be I haven't been
> totally clear about what I observed.
> 
> When using the Linux tools to flash an image to the internal LPC, the
> system most likely won't come up immediately. I need that power-off
> delay of some minutes to reanimate the board. After that, the bug is
> very hard to trigger, even though it does happen, especially when
> powering the device off (by unplugging the supply) and on again right
> after that.
> 
> So my theory is that there is something left in any part of the system
> which makes coreboot fail in disable_car(). And the same (or maybe just
> a similar) effect is triggered when the LPC is written.
> 
> Does that ring a bell? As I said, I'm pretty lost in debugging this, but
> I'm sure we're not having a hardware issue.
> 
> Thanks,
> Daniel



Re: [coreboot] GeodeLX RAM initialisation issue

2009-12-04 Thread Myles Watson
On Fri, Dec 4, 2009 at 9:47 AM, Daniel Mack  wrote:
> No hint, anyone?
Maybe you could zero all the RAM.  If you have to power it down for a
specific amount of time, that could be the time for the RAM to lose
its state.  If that works, you could start finding uninitialized
variables or a bad pointer somewhere.
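
A minimal sketch of that experiment, assuming it runs after RAM init and that the caller passes a region that is safe to overwrite (not the stack or the running code). It is not coreboot code: zero_range() and first_nonzero() are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Write zero to every 32-bit word in the region. The volatile
 * qualifier keeps the compiler from dropping the stores. */
static void zero_range(volatile uint32_t *start, size_t words)
{
	for (size_t i = 0; i < words; i++)
		start[i] = 0;
}

/* Read the region back. Returns the index of the first word that
 * is not zero, or `words` if the whole region reads back as zero. */
static size_t first_nonzero(const volatile uint32_t *start, size_t words)
{
	for (size_t i = 0; i < words; i++)
		if (start[i] != 0)
			return i;
	return words;
}
```

If first_nonzero() returns anything other than the word count right after zero_range(), the writes are not sticking, which would point at DRAM setup rather than at stale contents.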

As long as you're grasping at straws :)

Thanks,
Myles



Re: [coreboot] GeodeLX RAM initialisation issue

2009-12-04 Thread ron minnich
On Fri, Dec 4, 2009 at 8:47 AM, Daniel Mack  wrote:
> No hint, anyone?

Just about every time I had this problem on my geodes it was a problem
with dram. Just about every time. It's quite weird how well DRAM can
work even if it has not been programmed correctly. The correspondence
with disable_car() might just be that there's lots of burst cache
traffic to ram when you do this operation and cache is suddenly
connected to dram again.

Also, over the years, we have frequently found that DRAM vendors are,
well, less-than-honest about their product. One experience was on
OLPC. We had three boards, all with nominally the same parts,
different vendors however.
Boards A&B worked with faster timing; Boards A&C worked with medium
timing; and boards B&C only worked with the slowest timing. (I believe
in this case it was ras to cas delay)

Yes, indeed, it's not always true that slowing down dram makes it work :-)

Rather than "power off for 10 minutes" -- I assume this is "at the
wall plug" -- I wonder if you'd see an improvement if you yanked the
DC power at the board. Which were you doing -- AC or DC power off?

Thanks

ron



Re: [coreboot] GeodeLX RAM initialisation issue

2009-12-04 Thread Daniel Mack
Hi Ron,

thanks for your answer.

On Fri, Dec 04, 2009 at 09:03:14AM -0800, ron minnich wrote:
> On Fri, Dec 4, 2009 at 8:47 AM, Daniel Mack  wrote:
> > No hint, anyone?
> 
> Just about every time I had this problem on my geodes it was a problem
> with dram. Just about every time. It's quite weird how well DRAM can
> work even if it has not been programmed correctly. The correspondence
> with disable_car() might just be that there's lots of burst cache
> traffic to ram when you do this operation and cache is suddenly
> connected to dram again.

Help me understand how the DRAM can be programmed correctly. Is it
about timing constraints?

> Also, over the years, we have frequently found that DRAM vendors are,
> well, less-than-honest about their product. One experience was on
> OLPC. We had three boards, all with nominally the same parts,
> different vendors however.
> Boards A&B worked with faster timing; Boards A&C worked with medium
> timing; and boards B&C only worked with the slowest timing. (I believe
> in this case it was ras to cas delay)

That could well be an explanation for what I'm seeing, however, I wonder
why all boards work totally stable once they booted. Wouldn't wrong DRAM
settings result in unpredictable behaviour such as sporadic fails? I
don't see anything like that.

> Rather than "power off for 10 minutes" -- I assume this is "at the
> wall plug" -- I wonder if you'd see an improvement if you yanked the
> DC power at the board. Which were you doing -- AC or DC power off?

I was unplugging the DC jack from the board. There are some blocking
capacitors on it, but I doubt they will cause any part of the system to
survive much longer than a couple of seconds. But even something like
10s doesn't solve it. Only sometimes though, and I haven't found a
reliable pattern yet. Damn, I really wish I could provide more specific
input :-/

Thanks,
Daniel



Re: [coreboot] GeodeLX RAM initialisation issue

2009-12-04 Thread ron minnich
On Fri, Dec 4, 2009 at 9:12 AM, Daniel Mack  wrote:

> Help me understand how the DRAM can be programmed correctly. Is it
> about timing constraints?

it's how you set the timing in the dram controller and how it matches
the DRAM, but it's also about the order in which you program things
and the timing of how you issue the commands. If you're doing v3 this
should all "just work", it certainly used to for me. But I have not
touched this code in 9 months or more.

> That could well be an explanation for what I'm seeing, however, I wonder
> why all boards work totally stable once they booted. Wouldn't wrong DRAM
> settings result in unpredictable behaviour such as sporadic fails? I
> don't see anything like that.

I wish I knew.

> I was unplugging the DC jack from the board. There are some blocking
> capacitors on it, but I doubt they will cause any part of the system to
> survive much longer than a couple of seconds. But even something like
> 10s doesn't solve it. Only sometimes though, and I haven't found a
> reliable pattern yet. Damn, I really wish I could provide more specific
> input :-/

This points more to what Myles was saying -- you might want to zero
all of memory and see if that helps. Are you using crosstool to build?
If not, you should.

ron



Re: [coreboot] GeodeLX RAM initialisation issue

2009-12-04 Thread Marc Jones
On Fri, Dec 4, 2009 at 10:12 AM, Daniel Mack  wrote:
> Hi Ron,
>
> thanks for your answer.
>
> On Fri, Dec 04, 2009 at 09:03:14AM -0800, ron minnich wrote:
>> On Fri, Dec 4, 2009 at 8:47 AM, Daniel Mack  wrote:
>> > No hint, anyone?
>>
>> Just about every time I had this problem on my geodes it was a problem
>> with dram. Just about every time. It's quite weird how well DRAM can
>> work even if it has not been programmed correctly. The correspondence
>> with disable_car() might just be that there's lots of burst cache
>> traffic to ram when you do this operation and cache is suddenly
>> connected to dram again.
>
> Help me understand how the DRAM can be programmed correctly. Is it
> about timing constraints?
>
>> Also, over the years, we have frequently found that DRAM vendors are,
>> well, less-than-honest about their product. One experience was on
>> OLPC. We had three boards, all with nominally the same parts,
>> different vendors however.
>> Boards A&B worked with faster timing; Boards A&C worked with medium
>> timing; and boards B&C only worked with the slowest timing. (I believe
>> in this case it was ras to cas delay)
>
> That could well be an explanation for what I'm seeing, however, I wonder
> why all boards work totally stable once they booted. Wouldn't wrong DRAM
> settings result in unpredictable behaviour such as sporadic fails? I
> don't see anything like that.
>
>> Rather than "power off for 10 minutes" -- I assume this is "at the
>> wall plug" -- I wonder if you'd see an improvement if you yanked the
>> DC power at the board. Which were you doing -- AC or DC power off?
>
> I was unplugging the DC jack from the board. There are some blocking
> capacitors on it, but I doubt they will cause any part of the system to
> survive much longer than a couple of seconds. But even something like
> 10s doesn't solve it. Only sometimes though, and I haven't found a
> reliable pattern yet. Damn, I really wish I could provide more specific
> input :-/

I'm a little confused. Is the failure always at disable_car when you
do the flash programming? What does "the system most likely won't come
up immediately" mean? This description sounds more like the 5536 being
in a bad state, which may or may not have to do with RAM. I have heard
of problems with the 5536 getting locked up if power sequencing is not
exactly right. Does it work if you unplug it, remove the CMOS battery,
press the power button to drain any remaining charge, and then plug it
back in?

If it always breaks at disable_car(), it could be a memory or cache
state problem that wouldn't be seen with the legacy BIOS because it
doesn't do CAR. It could still be hardware/power sequence related
since we don't see this on every platform.  As far as I know, the AMD
reference designs and the Artec mainboards don't exhibit this problem.

Marc

-- 
http://marcjonesconsulting.com
