Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-12 Thread Rudolf Marek



Knut Kujat wrote:

I finally know that my issue must be related to the SMBus registers,
because on a machine running the vendor BIOS, i2cdetect and i2cdump
show several values for the different I2C devices detected, and I get the
same values when I successfully start with coreboot. But when I start with
coreboot and fail with mct_d fatal exit, those registers are blank. I
know that because I found a nice piece of code dumping SMBus registers
on the h8dme board :D thx to the author!!

I also know that reading these registers out may cause them to get lost!
I'm not sure why?!

  


There is a multiplexer on the SMBus; this confirms my theory. Please check
the GPIO.

Imagine the multiplexer acts as some kind of rail switch. The
transactions on the SMBus never reach the memory chips (the SPD EEPROM).
You need to find a pin to control the multiplexer.

Rudolf




--
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot

Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-12 Thread Knut Kujat
Rudolf Marek wrote:

 I finally know that my issue must be related to the SMBus registers,
 because on a machine running the vendor BIOS, i2cdetect and i2cdump
 show several values for the different I2C devices detected, and I get the
 same values when I successfully start with coreboot. But when I start with
 coreboot and fail with mct_d fatal exit, those registers are blank. I
 know that because I found a nice piece of code dumping SMBus registers
 on the h8dme board :D thx to the author!!

 I also know that reading these registers out may cause them to get lost!
 I'm not sure why?!

   

 There is a multiplexer on the SMBus; this confirms my theory. Please check
 the GPIO.

 Imagine the multiplexer acts as some kind of rail switch. The
 transactions on the SMBus never reach the memory chips (the SPD EEPROM).
 You need to find a pin to control the multiplexer.

 Rudolf
Thanks! Because of your hints I was able to figure out that I needed to
set up the spd_rom in romstage.c. I also added the GPIO settings as read
from the vendor BIOS, and now the power-on LED works :).

thx,
Knut Kujat.





Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-11 Thread Ward Vandewege
On Wed, Mar 10, 2010 at 05:26:47PM +0100, Knut Kujat wrote:
 I finally know that my issue must be related to the SMBus registers,
 because on a machine running the vendor BIOS, i2cdetect and i2cdump
 show several values for the different I2C devices detected, and I get the
 same values when I successfully start with coreboot. But when I start with
 coreboot and fail with mct_d fatal exit, those registers are blank. I
 know that because I found a nice piece of code dumping SMBus registers
 on the h8dme board :D thx to the author!!

That would have been Marc Jones :)

Thanks,
Ward.

-- 
Ward Vandewege w...@fsf.org
Free Software Foundation - Senior Systems Administrator

Join us in Cambridge for LibrePlanet, March 19th-21st!
http://groups.fsf.org/wiki/LibrePlanet2010



Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-10 Thread Knut Kujat
Hi,

I finally know that my issue must be related to the SMBus registers,
because on a machine running the vendor BIOS, i2cdetect and i2cdump
show several values for the different I2C devices detected, and I get the
same values when I successfully start with coreboot. But when I start with
coreboot and fail with mct_d fatal exit, those registers are blank. I
know that because I found a nice piece of code dumping SMBus registers
on the h8dme board :D thx to the author!!

I also know that reading these registers out may cause them to get lost!
I'm not sure why?!

Now my question is: how do I initialize these registers with the values
known from the vendor BIOS? smb_write_byte doesn't seem to work, or
maybe I'm using it wrong.

THX,
Knut Kujat.




Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-08 Thread Knut Kujat
Hello,

thx all of you for your comments. Here a little update :)

I now know why the boards worked just fine up here in my lab. To check
whether a board would still work after being unplugged, I only ever
unplugged the power cable, never the monitor attached to the board. I
figured out that the monitor provides enough juice to keep something
alive on the board, so after plugging the power cable back in,
coreboot started fine. Another thing I figured out is that the front
LEDs of the board seem to be managed by GPIO as well, is this right? If
so, it seems that something is wrong with GPIO, because the power-on LED
never works with coreboot.

thx,
Knut Kujat.




Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-05 Thread Andrew Goodbody

Sorry, neglected to send original reply to list.

Knut Kujat wrote:

Andrew Goodbody wrote:

Knut Kujat wrote:

Any suggestions ?

The vendor BIOS is doing some initialisation that coreboot is not.
This init survives a short shutdown but is lost after a longer period
without power.

Yes, the vendor BIOS must be doing something different when initializing
RAM. But why is coreboot working just fine up here in the lab? Even if I
leave it unplugged the whole night, the next morning I plug it back in and it
works!


Don't focus on that too much. It's probably to do with the environment, 
or even just coincidence.



Is there a multiplexer on the SMBUS?

I honestly don't know, I have:


A multiplexer on the SMBUS was just something that occurred to me. To 
find it you would need to actually use the SMBUS controller to scan the 
SMBUS for devices. This is not a trivial task but I think there may be 
tools out there to help you.


A better approach would be to start by actually debugging what is going 
wrong in RAM init. That will tell you the area to investigate for 
differences.


Andrew



Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-05 Thread Rudolf Marek

Hi,

This points to something that is powered from the 5VSB voltage. It could be
some GPIO setting that sets the voltage for the RAM through some other chip. It could
be some power-sequencing pin connected as a GPIO, or it could be an I2C bus
multiplexer operated by some GPIO pin ;)


I would suggest dumping the SuperIO chip with isadump (all logical devices)
and all registers powered from the 5VSB well, if known. Check for changes on GPIO
pins or in the SuperIO global config.


Check whether the failure is caused by missing SPD EEPROMs (SMBus read errors) or just by
the RAM itself.


It could also be something in the SB itself, but try the SuperIO first.

Then compare the dumps with what you obtain from coreboot (you will need to
program that). You can check from Linux with the legacy BIOS, then boot with coreboot,
and then boot after the power has been unplugged.


Good luck,

Rudolf



Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-05 Thread Knut Kujat
Rudolf Marek wrote:
 Hi,

 This points to something that is powered from the 5VSB voltage. It
 could be some GPIO setting that sets the voltage for the RAM through some
 other chip. It could be some power-sequencing pin connected as a GPIO,
 or it could be an I2C bus multiplexer operated by some GPIO pin ;)

 I would suggest dumping the SuperIO chip with isadump (all logical
 devices) and all registers powered from the 5VSB well, if known. Check
 for changes on GPIO pins or in the SuperIO global config.

 Check whether the failure is caused by missing SPD EEPROMs (SMBus read
 errors) or just by the RAM itself.

 It could also be something in the SB itself, but try the SuperIO
 first.

 Then compare the dumps with what you obtain from coreboot (you will
 need to program that). You can check from Linux with the legacy BIOS, then
 boot with coreboot, and then boot after the power has been unplugged.

 Good luck,

 Rudolf

Hi,

I printed the status returned by status = mctRead_SPD(smbaddr, Index); in
mct_d.c, and it only ever returns -1, while on the working coreboot machine
it gives me several numbers up to index = 64 for those DIMMs where RAM is
installed. Is this the possible missing SPD EEPROM error you pointed out?
What would be my next steps if so?

Thanks for your effort,
Knut Kujat.


Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-05 Thread Stefan Reinauer
On 3/5/10 2:33 PM, Andrew Goodbody wrote:
 Sorry, neglected to send original reply to list.

 Knut Kujat wrote:
 Andrew Goodbody wrote:
 Knut Kujat wrote:
 Any suggestions ?
 The vendor BIOS is doing some initialisation that coreboot is not.
 This init survives a short shutdown but is lost after a longer period
 without power.
 Yes, the vendor BIOS must be doing something different when initializing
 RAM. But why is coreboot working just fine up here in the lab? Even if I
 leave it unplugged the whole night, the next morning I plug it back in and it
 works!

 Don't focus on that too much. It's probably to do with the
 environment, or even just coincidence.
I think so too.

Two more suggestions:
- compare coreboot and vendor bios with SerialICE
- try disabling all cores / cpus except the BSP to make sure the problem
is not caused by the PCI access race conditions in the Fam8 and K10 ports...

Stefan



Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-05 Thread Rudolf Marek

Hi,

I printed the status returned by status = mctRead_SPD(smbaddr, Index); in
mct_d.c, and it only ever returns -1, while on the working coreboot machine
it gives me several numbers up to index = 64 for those DIMMs where RAM is
installed. Is this the possible missing SPD EEPROM error you pointed out?



Yes, this points to some I2C multiplexer device. You need to find out how to
control the multiplexer. It might be some GPIO setup or even some I2C device.
Try running superiotool in verbose mode to see how the GPIO is set up. You will need
either to load the GPIO settings (from superiotool) in coreboot before RAM init,
or just dump them and check for the differences in the first place.


In Linux, the output of i2cdetect 0
might also help.

Try running sensors-detect; it might detect the bus multiplexers.

Rudolf



What would be my next steps if so?

Thanks for your effort,
Knut Kujat.





Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-05 Thread Rudolf Marek

Two more suggestions:
- compare coreboot and vendor bios with SerialICE
- try disabling all cores / cpus except the BSP to make sure the problem
is not caused by the PCI access race conditions in the Fam8 and K10 ports...


Yes good one also.

Rudolf



Stefan





Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-05 Thread ron minnich
Just FYI:

on our first system with Arima boards in 2002, everything worked well
until we started booting 64-bit kernels. I'm not kidding. We did not
find the SMBUS MUX on the boards until we had unreliable coreboot
boots of 64-bit kernels. For quite some time the boards worked fine.
Ollie found the SMBUS MUX by examining schematics.

So the SMBUS mux can appear in strange ways, at strange times. This
sounds like one of those times. SMBUS muxes are more common than you
might think and the default power-on state is not always very well
determined.

ron



Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-04 Thread Knut Kujat
Hello,

I'm still having trouble with the fatal exit, but now I can reproduce the error:

Let's say I have a board running the vendor BIOS and I flash
coreboot.rom onto it with flashrom; so far everything is good.
Now I shut the whole system down and turn it on again, and voilà,
coreboot boots without problems. I can shut the system down
like 100 times and boot again with no trouble. But if I unplug the
board for more than a minute and plug it back in, coreboot is unable to
find my installed memory and dies with "No Nodes?! mct_d: fatal exit."
In order to make it boot with coreboot again, I first have to flash the
vendor BIOS onto it and boot it; then I can flash and boot coreboot again.
That wouldn't be much trouble with 1 or 2 boards, but with more than 10...

I'm thinking that there may be some kind of electrical issue, because I
have a board that used to fatal exit down in the cluster, but up here
in the lab it works fine, without any of the unplug-and-then-fail
issues. Is there any way to solve this problem? Maybe the RAM needs more
time to stabilize before initialization?!

Any suggestions ?

Thanks,
Knut Kujat.



Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-01 Thread Knut Kujat
Peter Stuge wrote:
 Knut Kujat wrote:

 I haven't tried swapping CPUs or RAM. But this error appears during
 memory initialization, right? So it's most likely a RAM issue?
 

 The memory controller is built-in to the CPU.

 Try swapping components around and see if the problem follows some
 particular parts.


 //Peter

   
Hello,

switching memory from a working board to the failing board worked
partially: now it boots and even starts SeaBIOS, but SeaBIOS
can't find the hard drive!! It's as if there isn't one installed. I
already swapped the HD with the one from the working board, with no result. Of course
everything works fine with the vendor BIOS.

That's odd!

thx,
Knut Kujat.


Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-01 Thread Knut Kujat
Knut Kujat wrote:
 Peter Stuge wrote:

 Knut Kujat wrote:

 I haven't tried swapping CPUs or RAM. But this error appears during
 memory initialization, right? So it's most likely a RAM issue?
 
   
 The memory controller is built-in to the CPU.

 Try swapping components around and see if the problem follows some
 particular parts.


 //Peter

   
 
 Hello,

 switching memory from a working board to the failing board worked
 partially because now it boots and even starts seabios but seabios
 can't find the hard drive!! It's like there isn't one installed I
 already switched HD with the working board and no result, of course
 everything works fine with vendor bios.

 That's odd!

 thx,
 Knut Kujat.

   

I solved it. There are 3 SATA cables connected to the board, but only 1
actually has a hard drive attached. It seems that this cable has to
be connected to SATA 1, ahead of all the others. Is this right? Can someone
confirm that, please?

Bye and THX,
Knut Kujat.


[coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-02-26 Thread Knut Kujat
Hi,
I've got this ugly mct_d fatal exit error again on one of my H8QME-2+
boards. Even though every single board is absolutely identical (4 Opterons, 16G
RAM, etc.), several boards boot and work without any
problem with coreboot while others don't even start and fail with mct_d fatal exit :(.

Does anyone have an idea what the problem could be?

Thanks, any comment would be appreciated.

Knut Kujat





Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-02-26 Thread Rudolf Marek

Does it happen when you create the same configuration using SimNow?

Rudolf



Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-02-26 Thread Knut Kujat
Christian Leber wrote:
 On Friday 26 February 2010 14:41:49 you wrote:

 Hi Knut

   
 I've got this ugly mct_d fatal exit error again on one of my H8QME-2+
 boards. Even every single board is absolutely identical 4 Opterons 16G
 Ram, etc... there are several boards booting and working without any
 problem with coreboot and others don't even start and mct_d fatal exit :(.

 Has someone an idea what the problem could be ?
 

 AFAIK the boxes are using engineering samples, so who knows,
 have you tried swapping CPUs?
 Have you tried swapping the RAM?
 Does that happen with or without HTX board?

 Regards
 Christian
   
Hi,

it happens with and without the board :(. No, I haven't tried swapping CPUs
or RAM. But this error appears during memory initialization, right? So it's
most likely a RAM issue?

thx,
Knut Kujat.



Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-02-26 Thread Christian Leber
On Friday 26 February 2010 14:41:49 you wrote:

Hi Knut

 I've got this ugly mct_d fatal exit error again on one of my H8QME-2+
 boards. Even every single board is absolutely identical 4 Opterons 16G
 Ram, etc... there are several boards booting and working without any
 problem with coreboot and others don't even start and mct_d fatal exit :(.
 
 Has someone an idea what the problem could be ?

AFAIK the boxes are using engineering samples, so who knows,
have you tried swapping CPUs?
Have you tried swapping the RAM?
Does that happen with or without HTX board?

Regards
Christian
