Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.
> I finally know that my issue must be related to the SMBus registers. On a machine running the vendor BIOS, i2cdetect and i2cdump show several values for the different I2C devices detected, and I get the same values when coreboot starts successfully. But when coreboot fails with mct_d fatal exit, those registers are blank. I know that because I found a nice piece of code dumping the SMBus registers on the h8dme board :D thx to the author!! I also know that reading these registers out may cause them to get lost! I'm not sure why?!

There is a multiplexer on the SMBus; this confirms my theory. Please check the GPIO. Imagine the multiplexer acts as some kind of rail switch. The transactions on the SMBus never reach the memory chips (the SPD EEPROM). You need to find a pin to control the multiplexer.

Rudolf

> Now my question is: how do I initialize these registers with the values known from the vendor BIOS? smb_write_byte doesn't seem to work, or maybe I'm using it wrong.
>
> THX,
> Knut Kujat.

--
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot
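The rail-switch picture can be made concrete with a toy model. The sketch below is a plain Python simulation, not coreboot code: the class names, the channel numbering, and the SPD byte values are all made up for illustration. It only shows why every read returns -1 until the multiplexer is pointed at the channel where the SPD EEPROM sits.

```python
class SpdEeprom:
    """Stand-in for a DIMM's SPD EEPROM (hypothetical, fixed contents)."""

    def __init__(self, data):
        self.data = bytes(data)

    def read_byte(self, index):
        return self.data[index]


class MuxedSmbus:
    """Toy SMBus with a GPIO-controlled multiplexer in front of the DIMMs."""

    def __init__(self, channels):
        # channels: {channel_number: {smbus_address: device}}
        self.channels = channels
        # Assumption: after a cold power-up the mux points at no useful channel.
        self.selected = None

    def set_mux_gpio(self, channel):
        """Model of driving the GPIO pin(s) that select a mux channel."""
        self.selected = channel

    def read_byte(self, address, index):
        device = self.channels.get(self.selected, {}).get(address)
        if device is None:
            return -1  # nobody answered, like the failing SPD reads
        return device.read_byte(index)


bus = MuxedSmbus({1: {0x50: SpdEeprom([0x80, 0x08, 0x08])}})
print(bus.read_byte(0x50, 2))  # mux not set: the read fails with -1
bus.set_mux_gpio(1)
print(bus.read_byte(0x50, 2))  # mux set: the SPD byte is returned
```

On real hardware the equivalent of set_mux_gpio() has to happen before RAM init, which is why the missing GPIO setup shows up as a memory failure.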
Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.
Rudolf Marek wrote:
> There is a multiplexer on the SMBus; this confirms my theory. Please check the GPIO. Imagine the multiplexer acts as some kind of rail switch. The transactions on the SMBus never reach the memory chips (the SPD EEPROM). You need to find a pin to control the multiplexer.
>
> Rudolf

Thanks, because of your hints I was able to figure out that I needed to set up the spd_rom in romstage.c. I also added the GPIO settings as read from the vendor BIOS, and now the power-on LED works :).

thx,
Knut Kujat.
Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.
On Wed, Mar 10, 2010 at 05:26:47PM +0100, Knut Kujat wrote:
> I finally know that my issue must be related to the SMBus registers. On a machine running the vendor BIOS, i2cdetect and i2cdump show several values for the different I2C devices detected, and I get the same values when coreboot starts successfully. But when coreboot fails with mct_d fatal exit, those registers are blank. I know that because I found a nice piece of code dumping the SMBus registers on the h8dme board :D thx to the author!!

That would have been Marc Jones :)

Thanks,
Ward.

--
Ward Vandewege w...@fsf.org
Free Software Foundation - Senior Systems Administrator
Join us in Cambridge for LibrePlanet, March 19th-21st! http://groups.fsf.org/wiki/LibrePlanet2010
Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.
Hi,

I finally know that my issue must be related to the SMBus registers. On a machine running the vendor BIOS, i2cdetect and i2cdump show several values for the different I2C devices detected, and I get the same values when coreboot starts successfully. But when coreboot fails with mct_d fatal exit, those registers are blank. I know that because I found a nice piece of code dumping the SMBus registers on the h8dme board :D thx to the author!! I also know that reading these registers out may cause them to get lost! I'm not sure why?!

Now my question is: how do I initialize these registers with the values known from the vendor BIOS? smb_write_byte doesn't seem to work, or maybe I'm using it wrong.

THX,
Knut Kujat.
Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.
Hello,

thx to all of you for your comments. Here is a little update :)

I now know why the boards worked just fine up here in my lab. To test whether a board would work after being unplugged, I always unplugged only the power cable, never the monitor attached to the board. I figured out that the monitor provides enough juice to keep whatever it is alive on the board, so after plugging the power cable back in, coreboot started fine.

Another thing I figured out is that the front LEDs of the board seem to be managed by GPIO as well. Is this right? If so, it seems that something is wrong with the GPIO setup, because the power-on LED never works with coreboot.

thx,
Knut Kujat.
Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.
Sorry, neglected to send original reply to list.

Knut Kujat wrote:
> Andrew Goodbody wrote:
>> Knut Kujat wrote:
>>> Any suggestions?
>>
>> The vendor BIOS is doing some initialisation that coreboot is not. This init survives a short shutdown but is lost after a longer period without power.
>
> Yes, the vendor BIOS must be doing something different when initializing RAM. But why is coreboot working just fine up here in the lab? Even if I leave it unplugged the whole night, the next morning I plug it back in and it works!

Don't focus on that too much. It's probably to do with the environment, or even just coincidence.

>> Is there a multiplexer on the SMBUS?
>
> I honestly don't know, I have:

A multiplexer on the SMBUS was just something that occurred to me. To find it you would need to actually use the SMBUS controller to scan the SMBUS for devices. This is not a trivial task, but I think there may be tools out there to help you. A better approach would be to start by actually debugging what is going wrong in RAM init. That will tell you the area to investigate for differences.

Andrew
Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.
Hi,

This points to something that is powered from the 5VSB (standby) voltage. It could be some GPIO setting that sets the RAM voltage through some other chip. It could be a power-sequencing pin connected as a GPIO, or it could be an I2C bus multiplexer operated by some GPIO pin ;)

I would suggest dumping the Super I/O chip with isadump (all logical devices), plus all registers powered from the 5VSB well, if known. Check for changes on the GPIO pins or in the Super I/O global config. Check whether the failure is caused by missing SPD EEPROMs (failing SMBus reads) or just by the RAM itself. It could also be something in the SB (southbridge) itself, but try the Super I/O first.

Then compare the dumps with what you obtain from coreboot (you will need to program that). You can check from Linux with the legacy BIOS, then boot with coreboot, and then boot with coreboot after the power has been unplugged.

Good luck,
Rudolf
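Comparing the dumps by eye is error-prone, so a small script can do it. The following is a minimal sketch in Python that assumes the dumps were saved as text in the usual row layout printed by isadump and i2cdump (a two-digit row offset, a colon, then up to 16 hex bytes, with XX for unreadable bytes). The function names parse_dump and diff_dumps are made up here, and the sample bytes are invented.

```python
import re

# One dump row: "10: 00 ff XX ..." (offset, colon, up to 16 byte tokens).
ROW = re.compile(r"^([0-9a-fA-F]{2}):((?:\s+(?:[0-9a-fA-F]{2}|XX)){1,16})")


def parse_dump(text):
    """Return {register_offset: value or None} from a saved hex dump."""
    regs = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # column header or unrelated output
        base = int(match.group(1), 16)
        for i, token in enumerate(match.group(2).split()):
            regs[base + i] = None if token == "XX" else int(token, 16)
    return regs


def diff_dumps(good, bad):
    """List (offset, good_value, bad_value) for registers that differ."""
    return [
        (offset, good.get(offset), bad.get(offset))
        for offset in sorted(set(good) | set(bad))
        if good.get(offset) != bad.get(offset)
    ]


# Invented sample data: one dump taken under the vendor BIOS, one under
# coreboot after a cold power-up.
vendor = parse_dump("     0  1  2  3\n00: 12 34 56 78\n")
coreboot = parse_dump("     0  1  2  3\n00: 12 30 56 78\n")
for offset, good, bad in diff_dumps(vendor, coreboot):
    print(f"reg 0x{offset:02x}: vendor={good:#04x} coreboot={bad:#04x}")
```

Each Super I/O logical device needs its own pair of dumps, since the same offsets mean different things in different logical devices.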
Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.
Rudolf Marek wrote:
> Check whether the failure is caused by missing SPD EEPROMs (failing SMBus reads) or just by the RAM itself.

Hi,

I printed the status from status = mctRead_SPD(smbaddr, Index); in mct_d.c, and it only spits out -1, while on the working coreboot machine it gives me several numbers up to index = 64 for those DIMMs where RAM is installed.

Is this the possible missing SPD EEPROM error you pointed out? What would be my next steps if so?

Thanks for your effort,
Knut Kujat.
Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.
On 3/5/10 2:33 PM, Andrew Goodbody wrote:
> Don't focus on that too much. It's probably to do with the environment, or even just coincidence.

I think so too. Two more suggestions:

- compare coreboot and vendor BIOS with SerialICE
- try disabling all cores / CPUs except the BSP to make sure the problem is not caused by the PCI access race conditions in the Fam8 and K10 ports...

Stefan
Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.
> Hi,
>
> I printed the status from status = mctRead_SPD(smbaddr, Index); in mct_d.c, and it only spits out -1, while on the working coreboot machine it gives me several numbers up to index = 64 for those DIMMs where RAM is installed.
>
> Is this the possible missing SPD EEPROM error you pointed out?

Yes, this points to some I2C multiplexer device. You need to find out how to control the multiplexer. It might be some GPIO setup or even some I2C device. Try superiotool in verbose mode to see how the GPIO is set up. You will need either to load the GPIO settings (as shown by superiotool) in coreboot before RAM init, or just dump them and check for the differences in the first place.

In Linux, the output of i2cdetect 0 might also help... Try running sensors-detect; it might detect the bus multiplexers.

Rudolf

> What would be my next steps if so?
>
> Thanks for your effort,
> Knut Kujat.
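The reasoning here (every SPD read returns -1, so the bus path is suspect rather than the DIMMs) can be written down as a small check. This is a hedged Python sketch, not the mct_d.c logic: scan_spd and diagnose are hypothetical names, read_byte stands for whatever routine performs the SMBus read and returns -1 on failure, and the 0x50 to 0x57 range is the conventional 7-bit SPD EEPROM address range, which a given board may not fully use.

```python
def scan_spd(read_byte, addresses=range(0x50, 0x58), index=2):
    """Return {address: value} for every address that answers a read.

    read_byte(address, index) is assumed to return the byte read, or a
    negative number when no device acknowledges.
    """
    found = {}
    for address in addresses:
        value = read_byte(address, index)
        if value >= 0:
            found[address] = value
    return found


def diagnose(found):
    """Turn a scan result into a one-line hint."""
    if not found:
        return ("no SPD EEPROM answered: suspect the bus path "
                "(mux/GPIO), not the DIMMs")
    return "SPD answered at: " + ", ".join(hex(a) for a in sorted(found))


# Simulated buses: one where two slots answer, one where nothing does.
populated = {0x50: 0x08, 0x52: 0x08}
print(diagnose(scan_spd(lambda addr, idx: populated.get(addr, -1))))
print(diagnose(scan_spd(lambda addr, idx: -1)))
```

An empty slot and an unreachable bus both produce -1 for a single address, so it is the pattern across all addresses that separates the two cases.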
Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.
> Two more suggestions:
>
> - compare coreboot and vendor BIOS with SerialICE
> - try disabling all cores / CPUs except the BSP to make sure the problem is not caused by the PCI access race conditions in the Fam8 and K10 ports...

Yes, good one also.

Rudolf

> Stefan
Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.
Just FYI: on our first system with Arima boards in 2002, everything worked well until we started booting 64-bit kernels. I'm not kidding. We did not find the SMBUS MUX on the boards until we had unreliable coreboot boots of 64-bit kernels. For quite some time the boards worked fine. Ollie found the SMBUS MUX by examining schematics.

So the SMBUS mux can appear in strange ways, at strange times. This sounds like one of those times. SMBUS muxes are more common than you might think and the default power-on state is not always very well determined.

ron
Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.
Hello,

I'm still having trouble with the fatal exit, but now I can reproduce the error.

Let's say I have a board running the vendor BIOS and I flash coreboot.rom onto it with flashrom; so far everything is good. Now I shut the whole system down and turn it on again, and voila, coreboot boots without problems. I can shut the system down like 100 times and boot again with no trouble.

Now I unplug the board for more than a minute and plug it back in, and coreboot is unable to find my installed memory and dies with No Nodes?! mct_d: fatal exit.

In order to make it boot with coreboot again, I first have to flash the vendor BIOS onto it and boot it; then I can flash and boot coreboot again. That wouldn't be much trouble with 1 or 2 boards, but with more than 10...

I'm thinking there may be some kind of electrical issue, because I have a board that used to fatal exit down in the cluster, but up here in the lab it works fine, without any of this unplug-and-then-fail behaviour.

Is there any way to solve this problem? Maybe the RAM needs more time to stabilize before initialization?! Any suggestions?

Thanks,
Knut Kujat.
Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.
Peter Stuge wrote:
> Knut Kujat wrote:
>> I haven't tried swapping CPUs or RAM. But this error appears during memory initialization, right? So it's most likely a RAM issue?
>
> The memory controller is built into the CPU. Try swapping components around and see if the problem follows some particular parts.
>
> //Peter

Hello,

switching memory from a working board to the failing board worked partially: it now boots and even starts SeaBIOS, but SeaBIOS can't find the hard drive!! It's as if there isn't one installed. I already swapped the HD with the one from the working board, with no result. Of course everything works fine with the vendor BIOS. That's odd!

thx,
Knut Kujat.
Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.
Knut Kujat wrote:
> switching memory from a working board to the failing board worked partially: it now boots and even starts SeaBIOS, but SeaBIOS can't find the hard drive!!

I solved it. There are 3 SATA cables connected to the board, but only 1 actually has a hard drive connected to it. It seems this cable has to be connected to SATA 1, before all the others. Is this right? Can someone confirm that, please?

Bye and THX,
Knut Kujat.
[coreboot] Supermicro H8QME-2+ mct_d fatal exit.
Hi,

I've got this ugly mct_d fatal exit error again on one of my H8QME-2+ boards. Even though every single board is absolutely identical (4 Opterons, 16G RAM, etc.), several boards boot and work without any problem with coreboot, while others don't even start and die with mct_d fatal exit :(.

Does anyone have an idea what the problem could be?

Thanks, any comment would be appreciated.
Knut Kujat
Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.
Does it happen when you create the same configuration using SimNow?

Rudolf
Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.
Christian Leber wrote:
> AFAIK the boxes are using engineering samples, so who knows. Have you tried swapping CPUs? Have you tried swapping the RAM? Does that happen with or without the HTX board?
>
> Regards
> Christian

Hi,

it happens with and without the board :(. No, I haven't tried swapping CPUs or RAM. But this error appears during memory initialization, right? So it's most likely a RAM issue?

thx,
Knut Kujat.
Re: [coreboot] Supermicro H8QME-2+ mct_d fatal exit.
On Friday 26 February 2010 14:41:49 you wrote:

Hi Knut

> I've got this ugly mct_d fatal exit error again on one of my H8QME-2+ boards. Even though every single board is absolutely identical (4 Opterons, 16G RAM, etc.), several boards boot and work without any problem with coreboot, while others don't even start and die with mct_d fatal exit :(.
>
> Does anyone have an idea what the problem could be?

AFAIK the boxes are using engineering samples, so who knows. Have you tried swapping CPUs? Have you tried swapping the RAM? Does that happen with or without the HTX board?

Regards
Christian