[coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-02-26 Thread Knut Kujat
Hi,
I've got this ugly mct_d fatal exit error again on one of my H8QME-2+
boards. Even though every board is absolutely identical (4 Opterons, 16G
RAM, etc.), several boards boot and work without any problem with
coreboot, while others don't even start and die with mct_d fatal exit :(.

Does anyone have an idea what the problem could be?

Thanks, any comment would be appreciated.

Knut Kujat



-- 
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-02-26 Thread Rudolf Marek

Does it happen when you create the same configuration using SimNow?

Rudolf



Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-02-26 Thread Knut Kujat
Christian Leber wrote:
> On Friday 26 February 2010 14:41:49 you wrote:
>
> Hi Knut
>
>   
>> I've got this ugly mct_d fatal exit error again on one of my H8QME-2+
>> boards. Even every single board is absolutely identical 4 Opterons 16G
>> Ram, etc... there are several boards booting and working without any
>> problem with coreboot and others don't even start and mct_d fatal exit :(.
>>
>> Has someone an idea what the problem could be ?
>> 
>
> AFAIK the boxes are using engineering samples, so who knows,
> have you tried swapping CPUs?
> Have you tried swapping the RAM?
> Does that happen with or without HTX board?
>
> Regards
> Christian
>   
Hi,

It happens with and without the HTX board :(. No, I haven't tried
swapping CPUs or RAM. But this error appears during memory
initialization, right? So it's most likely a RAM issue?

thx,
Knut Kujat.



Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-02-26 Thread Christian Leber
On Friday 26 February 2010 14:41:49 you wrote:

Hi Knut

> I've got this ugly mct_d fatal exit error again on one of my H8QME-2+
> boards. Even every single board is absolutely identical 4 Opterons 16G
> Ram, etc... there are several boards booting and working without any
> problem with coreboot and others don't even start and mct_d fatal exit :(.
> 
> Has someone an idea what the problem could be ?

AFAIK the boxes are using engineering samples, so who knows,
have you tried swapping CPUs?
Have you tried swapping the RAM?
Does that happen with or without HTX board?

Regards
Christian



Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-02-26 Thread Peter Stuge
Knut Kujat wrote:
> I haven't tried swapping CPUs or RAM. But this errors appears on
> memory initialization, right? So its most likely a ram issue?

The memory controller is built into the CPU.

Try swapping components around and see if the problem follows some
particular parts.


//Peter



Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-01 Thread Knut Kujat
Peter Stuge wrote:
> Knut Kujat wrote:
>   
>> I haven't tried swapping CPUs or RAM. But this errors appears on
>> memory initialization, right? So its most likely a ram issue?
>> 
>
> The memory controller is built-in to the CPU.
>
> Try swapping components around and see if the problem follows some
> particular parts.
>
>
> //Peter
>
>   
Hello,

Switching memory from a working board to the failing board worked
"partially": it now boots and even starts SeaBIOS, but SeaBIOS
can't find the hard drive!! It's as if none were installed. I
already swapped the HD with the one from the working board, with no
result. Of course everything works fine with the vendor BIOS.

That's odd!

thx,
Knut Kujat.


Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-01 Thread Knut Kujat
Knut Kujat wrote:
> Peter Stuge wrote:
>   
>> Knut Kujat wrote:
>>   
>> 
>>> I haven't tried swapping CPUs or RAM. But this errors appears on
>>> memory initialization, right? So its most likely a ram issue?
>>> 
>>>   
>> The memory controller is built-in to the CPU.
>>
>> Try swapping components around and see if the problem follows some
>> particular parts.
>>
>>
>> //Peter
>>
>>   
>> 
> Hello,
>
> switching memory from a working board to the failing board worked
> "partially" because now it boots and even starts seabios but seabios
> can't find the hard drive!! It's like there isn't one installed I
> already switched HD with the working board and no result, of course
> everything works fine with vendor bios.
>
> That's odd!
>
> thx,
> Knut Kujat.
>
>   

I "solved" it. There are 3 SATA cables connected to the board, but only 1
actually has a hard drive attached. It seems this cable has to be
connected to SATA 1, ahead of all the others. Is this right? Can someone
confirm that, please?

Bye and THX,
Knut Kujat.


Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-04 Thread Knut Kujat
Hello,

I'm still having trouble with "fatal exit", but now I can reproduce the error:

Let's say I have a board running the vendor BIOS and I flash
coreboot.rom onto it with flashrom; so far so good.
Now I shut the whole system down and turn it on again, and voilà,
coreboot boots without problems. I can shut the system down
100 times and boot again with no trouble. But if I unplug the
board for more than a minute and plug it back in, coreboot is unable to
find my installed memory and dies with "No Nodes?!" "mct_d: fatal exit".
To make it boot with coreboot again I first have to flash the
vendor BIOS and boot it; then I can flash and boot coreboot again.
That wouldn't be much trouble with 1 or 2 boards, but with more than 10...

I'm thinking there may be some kind of electrical issue, because I
have a board that used to "fatal exit" down in the cluster, but up here
in the lab it works fine without any "unplugged and then not working"
issues. Is there any way to solve this problem? Maybe the RAM needs more
time to stabilize before initialization?!

Any suggestions ?

Thanks,
Knut Kujat.



Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-05 Thread Andrew Goodbody

Sorry, neglected to send original reply to list.

Knut Kujat wrote:
> Andrew Goodbody wrote:
>> Knut Kujat wrote:
>>> Any suggestions ?
>> The vendor BIOS is doing some initialisation that coreboot is not.
>> This init survives a short shutdown but is lost after a longer period
>> without power.
> Yes, vendor BIOS must be doing something different when initializing
> ram. But why is coreboot working just fine up here in the lab even if I
> let it unplugged the whole night next morning I plug it back on and it
> works!

Don't focus on that too much. It's probably to do with the environment,
or even just coincidence.

>> Is there a multiplexer on the SMBUS?
> I honestly don't know, I have:


A multiplexer on the SMBUS was just something that occurred to me. To 
find it you would need to actually use the SMBUS controller to scan the 
SMBUS for devices. This is not a trivial task but I think there may be 
tools out there to help you.


A better approach would be to start by actually debugging what is going 
wrong in RAM init. That will tell you the area to investigate for 
differences.


Andrew



Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-05 Thread Rudolf Marek

Hi,

This points to something powered from the 5VSB rail. It could be a GPIO
setting that sets the RAM voltage through some other chip. It could be a
power-sequencing pin connected as a GPIO, or it could be an I2C bus
multiplexer operated by a GPIO pin ;)

I would suggest dumping the Super I/O chip with "isadump" (all logical
devices) and all registers powered from the 5VSB well, if known. Check for
changes in the GPIO pins or the Super I/O global config.

Check whether the failure is caused by missing SPD EEPROMs (failing SMBus
reads) or by the RAM itself.

It could also be something in the SB itself, but try the Super I/O first.

Then compare the dumps with those you obtain from coreboot (you will need
to program that). You can check from Linux with the legacy BIOS, then boot
with coreboot, and then boot after the power was unplugged.
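
A minimal sketch of the dump comparison described above, assuming both
dumps were saved as text in the 16-column hex grid that isadump and
i2cdump print. The names parse_dump and diff_dumps and the sample values
are made up for illustration; they are not real register contents from
this board.

```python
import re

ROW = re.compile(r"^([0-9a-fA-F]{2}):\s+(.*)$")   # e.g. "10: ff 00 ..."
BYTE = re.compile(r"^[0-9a-fA-F]{2}$")


def parse_dump(text):
    """Parse a 16-column hex grid into {register: value}."""
    regs = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # column header or unrelated output
        base = int(m.group(1), 16)
        # Only the first 16 tokens are data; unreadable cells ("XX") are skipped.
        for col, tok in enumerate(m.group(2).split()[:16]):
            if BYTE.match(tok):
                regs[base + col] = int(tok, 16)
    return regs


def diff_dumps(good, bad):
    """Return sorted (register, good_value, bad_value) tuples that differ."""
    a, b = parse_dump(good), parse_dump(bad)
    return [(r, a.get(r), b.get(r))
            for r in sorted(set(a) | set(b)) if a.get(r) != b.get(r)]


def hx(value):
    return "--" if value is None else f"0x{value:02x}"


# Invented sample data: one dump taken under the vendor BIOS, one under coreboot.
VENDOR = """\
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00: 00 01 02 03 00 00 00 00 00 00 00 00 00 00 00 00
10: ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
"""
COREBOOT = """\
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00: 00 01 00 03 00 00 00 00 00 00 00 00 00 00 00 00
10: 7f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
"""

for reg, good, bad in diff_dumps(VENDOR, COREBOOT):
    print(f"reg {hx(reg)}: vendor={hx(good)} coreboot={hx(bad)}")
```

Registers that differ only between the vendor BIOS run and the run after
a long power cut are the first candidates to look at; registers that
change on every boot (counters, status bits) can probably be ignored.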


Good luck,

Rudolf



Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-05 Thread Knut Kujat
Rudolf Marek wrote:
> Hi,
>
> This is pointing to something which is powered from 5VSB voltage. It
> could be some GPIO settings which sets voltage for ram through some
> other chip. It could be some powersequencing pin connected as GPIO
> too, it could be a i2c bus multiplexer operated by some GPIO pin too ;)
>
> I would suggest to dump the superio chip with "isadump" (all logical
> devices) and all registers powered from the 5VSB well if known. Check
> for changes on GPIO pins or SuperIO global config.
>
> Check if the fail is caused by missing SPD EPROMS (error SMBus reads)
> or just by ram itself.
>
> It could be also something from the SB itself, but try with superio
> first.
>
> Then compare the dumps with that you obtained from coreboot (you will
> need to program that) You can check from linux with legacy bios, then
> boot with coreboot and then boot with power unplugged.
>
> Good luck,
>
> Rudolf
>
Hi,

I printed status from status = mctRead_SPD(smbaddr, Index); in
mct_d.c and it only spits out -1, while on the working coreboot machine
it gives me several numbers up to index = 64 for those DIMMs where RAM is
installed. Is this the possible missing SPD EEPROM error you pointed out?
What would be my next steps if so?

Thanks for your effort,
Knut Kujat.


Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-05 Thread Stefan Reinauer
On 3/5/10 2:33 PM, Andrew Goodbody wrote:
> Sorry, neglected to send original reply to list.
>
> Knut Kujat wrote:
>> Andrew Goodbody wrote:
>>> Knut Kujat wrote:
>>>> Any suggestions ?
>>> The vendor BIOS is doing some initialisation that coreboot is not.
>>> This init survives a short shutdown but is lost after a longer period
>>> without power.
>> Yes, vendor BIOS must be doing something different when initializing
>> ram. But why is coreboot working just fine up here in the lab even if I
>> let it unplugged the whole night next morning I plug it back on and it
>> works!
>
> Don't focus on that too much. It's probably to do with the
> environment, or even just coincidence.
I think so too.

Two more suggestions:
- compare coreboot and vendor bios with SerialICE
- try disabling all cores / cpus except the BSP to make sure the problem
is not caused by the PCI access race conditions in the Fam8 and K10 ports...

Stefan



Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-05 Thread Rudolf Marek

Hi,

> I did a output on status form status = mctRead_SPD(smbaddr, Index); in
> mct_d.c and it only spits -1 out while on the working coreboot machine
> it gives me several numbers until index = 64 on those dimms where ram is
> installed. Is this a possible SPD EPROMS missing error you pointed out?



Yes, this points to some I2C multiplexer device. You need to find out how
to control the multiplexer. It might be some GPIO setup or even some I2C
device. Try superiotool in verbose mode to see how the GPIO is set up. You
will need either to load the GPIO settings (from superiotool) in coreboot
before RAM init, or just dump them and check for the differences in the
first place.

In Linux, the output of i2cdetect 0 might also help.

Try running sensors-detect; it might detect the bus multiplexers.
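
As a rough sketch of how two saved i2cdetect scans could be compared:
the snippet below collects the addresses that answered in each scan and
reports which ones disappeared. The function name, the abbreviated grids
and the addresses in them are invented for illustration, and treating
0x50-0x57 as the SPD EEPROM range is the usual convention, not something
confirmed for this board.

```python
import re

SPD_RANGE = range(0x50, 0x58)  # conventional SPD addresses (assumption)


def responding(i2cdetect_text):
    """Return the set of 7-bit addresses that answered in an i2cdetect grid."""
    found = set()
    for line in i2cdetect_text.splitlines():
        label, sep, cells = line.partition(":")
        if not sep or not re.fullmatch(r"\s*[0-7]0", label):
            continue  # column header or other text
        # A responding device is printed as its own address;
        # "--" (no answer) and "UU" (claimed by a driver) are skipped here.
        found.update(int(tok, 16) for tok in cells.split()
                     if re.fullmatch(r"[0-9a-f]{2}", tok))
    return found


# Invented, abbreviated scans (only two rows each).
GOOD = """\
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 2f
50: 50 51 52 53 -- -- -- -- -- -- -- -- -- -- -- --
"""
BAD = """\
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 2f
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
"""

missing = sorted(responding(GOOD) - responding(BAD))
print("missing:", [hex(a) for a in missing])
print("all in SPD range:", all(a in SPD_RANGE for a in missing))
```

If every SPD address vanishes at once while other devices still answer,
that would fit a switched bus segment better than a set of bad DIMMs.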

Rudolf





Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-05 Thread Rudolf Marek

> Two more suggestions:
> - compare coreboot and vendor bios with SerialICE
> - try disabling all cores / cpus except the BSP to make sure the problem
> is not caused by the PCI access race conditions in the Fam8 and K10 ports...

Yes good one also.

Rudolf



Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-05 Thread ron minnich
Just FYI:

on our first system with Arima boards in 2002, everything worked well
until we started booting 64-bit kernels. I'm not kidding. We did not
find the SMBUS MUX on the boards until we had unreliable coreboot
boots of 64-bit kernels. For quite some time the boards worked fine.
Ollie found the SMBUS MUX by examining schematics.

So the SMBUS mux can appear in strange ways, at strange times. This
sounds like one of those times. SMBUS muxes are more common than you
might think and the default power-on state is not always very well
determined.

ron



Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-08 Thread Knut Kujat
Hello,

Thanks to all of you for your comments. Here is a little update :)

I now know why the boards worked just fine up here in my lab. To check
whether the board would work after being unplugged I always unplugged
"only" the power cable, but never the monitor attached to the board. I
figured out that the monitor provides enough juice to keep whatever it is
alive on the board, so after plugging the power cable back in coreboot
started fine. Another thing I figured out is that the front LEDs of the
board seem to be managed by GPIO as well, is this right? If so, something
seems to be wrong with the GPIO setup, because the power-on LED never
works with coreboot.

thx,
Knut Kujat.




Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-10 Thread Knut Kujat
Hi,

I finally know that my issue must be related to the SMBus registers,
because on a machine running the vendor BIOS, i2cdetect and i2cdump
show several values for the different I2C devices detected, and I get the
same values when I successfully start with coreboot. But when I start with
coreboot and fail with mct_d fatal exit, those registers are blank. I
know that because I found a nice piece of code that dumps the SMBus
registers on the h8dme board :D thx to the author!!

I also know that reading these registers out may cause them to get lost!
I'm not sure why?!

Now my question is: how do I initialize these registers with the values
known from the vendor BIOS? smb_write_byte doesn't seem to work, or
maybe I'm using it wrong.

THX,
Knut Kujat.




Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-11 Thread Ward Vandewege
On Wed, Mar 10, 2010 at 05:26:47PM +0100, Knut Kujat wrote:
> I finally know that my issue must be related with the smbus registers
> because on a vendor bios running machine and using i2cdetect and i2cdump
> I get several values for different i2c devices detected, I get the same
> values when I successfully start with coreboot. But when I start with
> coreboot and fail with mcr_d fatal exit those registers are blank, I
> know that because I found a nice piece of code dumping smbus registers
> on the h8dme board :D thx to the autor!!

That would have been Marc Jones :)

Thanks,
Ward.

-- 
Ward Vandewege 
Free Software Foundation - Senior Systems Administrator

Join us in Cambridge for LibrePlanet, March 19th-21st!
http://groups.fsf.org/wiki/LibrePlanet2010



Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-12 Thread Rudolf Marek



> I finally know that my issue must be related with the smbus registers
> because on a vendor bios running machine and using i2cdetect and i2cdump
> I get several values for different i2c devices detected, I get the same
> values when I successfully start with coreboot. But when I start with
> coreboot and fail with mcr_d fatal exit those registers are blank, I
> know that because I found a nice piece of code dumping smbus registers
> on the h8dme board :D thx to the autor!!
>
> I also know that reading these registers out may cause them to get lost!
> I'm not sure why?!

  


There is a multiplexer on SMBus, this confirms my theory. Please check
the GPIO.

Imagine the multiplexer acts as some kind of rail switch. The
transactions on SMBus never reach the memory chips (the SPD EEPROM).
You need to find a pin to control the multiplexer.
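
The rail-switch picture can be illustrated with a small toy model. This
is only a sketch under stated assumptions: the class SmbusMux, the
"selected" field standing in for a select GPIO, the two-segment layout
and the SPD bytes are all hypothetical, and nothing here is coreboot code
or the real wiring of this board.

```python
class SmbusMux:
    """Toy model: a GPIO-controlled switch between SMBus segments."""

    def __init__(self, segments, selected=0):
        self.segments = segments  # one {address: bytes} dict per segment
        self.selected = selected  # state of the (hypothetical) select GPIO

    def read_byte(self, addr, index):
        device = self.segments[self.selected].get(addr)
        if device is None or index >= len(device):
            # Nobody acknowledged: the same symptom as a missing EEPROM.
            return -1
        return device[index]


def probe_dimms(bus, addrs=range(0x50, 0x58)):
    """Return the SPD addresses whose byte 0 can be read."""
    return [a for a in addrs if bus.read_byte(a, 0) != -1]


spd = bytes([0x80, 0x08, 0x08])  # invented SPD contents
bus = SmbusMux(segments=[{}, {0x50: spd, 0x51: spd}], selected=0)

# Select line left in its power-on state: every SPD read fails.
print([hex(a) for a in probe_dimms(bus)])

# What firmware would have to do through the GPIO before RAM init.
bus.selected = 1
print([hex(a) for a in probe_dimms(bus)])
```

In the model the EEPROMs are present the whole time; only the switch
position decides whether a read returns data or -1, which is why the
failure looks like "no memory installed".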

Rudolf





Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.

2010-03-12 Thread Knut Kujat
Rudolf Marek wrote:
>
>> I finally know that my issue must be related with the smbus registers
>> because on a vendor bios running machine and using i2cdetect and i2cdump
>> I get several values for different i2c devices detected, I get the same
>> values when I successfully start with coreboot. But when I start with
>> coreboot and fail with mcr_d fatal exit those registers are blank, I
>> know that because I found a nice piece of code dumping smbus registers
>> on the h8dme board :D thx to the autor!!
>>
>> I also know that reading these registers out may cause them to get lost!
>> I'm not sure why?!
>>
>>   
>
> There is a multiplexer on SMBus, this confirms my theory. Please check
> the GPIO.
>
> Imagine the multiplexer acts as some kind of rail switch. The
> transactions on smbus never reach thhe memory chips (the SPD eeprom).
> You need to find a pin to control the multiplexer.
>
> Rudolf
Thanks, because of your hints I was able to figure out that I needed to
set up the spd_rom in romstage.c. I also added the GPIO settings as read
from the vendor BIOS, and now the power-on LED works :).

thx,
Knut Kujat.
