Re: [coreboot] GCC update broke AMD Fam10h boot
On Mon, Mar 16, 2015 at 4:33 PM, Stefan Reinauer wrote:
> * Aaron Durbin [150316 22:44]:
>> A quick hack is to add ALIGN(32) to the linker script before
>> _bs_init_begin: src/arch/x86/ramstage.ld
>>
>> But I think we'll need to store pointers to the structures in order to
>> properly handle the situation where the compiler is effectively making
>> alignment/size decisions for some reason.
>
> I suggest trying to enforce alignment / size instead of adding another
> layer of indirection.

I'm not sure that's possible w/o marking the struct packed just for the
sake of it. The other issue is that this section is sitting in RO while
also being written to. The other thing, which has been known for a while,
is that we can't take advantage of symbols being equal for an empty set
since C says no two symbols can be the same. I'm sure the language lawyers
will correct me where I got that wrong.

--
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot
Re: [coreboot] GCC update broke AMD Fam10h boot
* Aaron Durbin [150316 22:44]:
> A quick hack is to add ALIGN(32) to the linker script before
> _bs_init_begin: src/arch/x86/ramstage.ld
>
> But I think we'll need to store pointers to the structures in order to
> properly handle the situation where the compiler is effectively making
> alignment/size decisions for some reason.

I suggest trying to enforce alignment / size instead of adding another
layer of indirection.

Stefan
Re: [coreboot] GCC update broke AMD Fam10h boot
On Mon, Mar 16, 2015 at 4:49 PM, Timothy Pearson wrote:
> On 03/16/2015 04:44 PM, Aaron Durbin wrote:
>> On Mon, Mar 16, 2015 at 1:39 PM, Timothy Pearson wrote:
>>> On 03/16/2015 09:23 AM, Aaron Durbin wrote:
>>>> On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson wrote:
>>>>> All,
>>>>>
>>>>> Just a heads up as there is no bugtracker for this project. GIT commit
>>>>> 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2,
>>>>> breaks ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39
>>>>> POST code, but then goes into an infinite loop). Downgrading the GCC
>>>>> version repairs the boot failure.
>>>>>
>>>>> Not sure if you want to revert that commit until someone can figure
>>>>> out what changed to cause the problem.
>>>>
>>>> Could you post ramstage.elf from the two different builds somewhere?
>>>> I'd like to take a peek at what is in there.
>>>
>>> Sure:
>>> https://raptorengineeringinc.com/coreboot/built.tar.bz2
>>>
>>> Other oddities:
>>> GCC 4.8.3:
>>> normal/romstage 0x7ff80 stage 97345
>>> normal/ramstage 0x97c40 stage 154869
>>>
>>> GCC 4.9.2:
>>> normal/romstage 0x7ff80 stage 94773
>>> normal/ramstage 0x97240 stage 173942
>>>
>>> Note in particular, judging from the file sizes, that something seems to
>>> have been relocated from romstage to ramstage by the new gcc version.
>>
>> I noticed you had CONFIG_COVERAGE selected in both the builds. Could
>> you try not having that selected? I wonder if something changed in the
>> compiler on that front. But... I think I found a bigger issue.
>
> That shouldn't be a problem. For reference, should CONFIG_COVERAGE be on or
> off for board status report builds?
>
>> $ nm ./gcc4.8.3/ramstage.debug | sort | grep -C 4 _bs_init_
>> 00146fc4 r pch_intel_wifi
>> 00146fd0 R cpu_drivers
>> 00146fd0 R epci_drivers
>> 00146fd0 r model_10xxx
>> 00146fdc R _bs_init_begin
>> 00146fdc r cbmem_bscb
>> 00146fdc R ecpu_drivers
>> 00146ff0 r gcov_bscb
>> 0014702c R _bs_init_end
>> 0014702c R pnp_conf_mode_870155_aa
>> 00147034 R pnp_conf_mode_a0a0_aa
>> 0014703c R pnp_conf_mode_8787_aa
>> 00147044 R pnp_conf_mode__aa
>>
>> $ nm ./gcc4.9.2/ramstage.debug | sort | grep -C 4 _bs_init_
>> 001465c4 r pch_intel_wifi
>> 001465d0 R cpu_drivers
>> 001465d0 R epci_drivers
>> 001465d0 r model_10xxx
>> 001465dc R _bs_init_begin
>> 001465dc R ecpu_drivers
>> 001465e0 r cbmem_bscb
>> 00146600 r gcov_bscb
>> 0014663c R _bs_init_end
>> 00146640 R pnp_conf_mode_870155_aa
>> 00146648 R pnp_conf_mode_a0a0_aa
>> 00146650 R pnp_conf_mode_8787_aa
>> 00146658 R pnp_conf_mode__a
>>
>> The boot state callbacks place the whole structure for each entry
>> between _bs_init_begin and _bs_init_end. For both binaries the size of
>> each entry is 0x14.
>>
>> For the 4.8.3 compiled ramstage I see:
>> (gdb) p/x 0x0014702c - 0x00146fdc
>> $12 = 0x50
>> For the 4.9.2 compiled ramstage I see:
>> (gdb) p/x 0x014663c - 0x01465dc
>> $14 = 0x60
>>
>> 0x60 is not a multiple of 0x14 -- which means things aren't cool.
>
> This makes perfect sense--whenever coreboot didn't hang outright it started
> infinitely spewing some message regarding a boot state callback already
> being complete.
>
>> Looking at the symbols it appears the compiler is aligning those
>> structures to 32 bytes for some reason...
>>
>> A quick hack is to add ALIGN(32) to the linker script before
>> _bs_init_begin: src/arch/x86/ramstage.ld
>
> So I wonder if this is unique to AMD Fam10h or if a whole lot of other
> boards broke with the gcc update. I wouldn't have even caught this if I
> hadn't checked out a new coreboot tree instead of copying over the existing
> tree with the prebuilt crossgcc, so we might be looking at a ticking
> timebomb that will go off as people start upgrading their crossgcc
> versions...

It all sorta depends. But the issue that _bs_init_begin does not equal
the address of the first bscb structure is bad news all around.

>> But I think we'll need to store pointers to the structures in order to
>> properly handle the situation where the compiler is effectively making
>> alignment/size decisions for some reason.
>
> I am not at all familiar with the code in question, so all I can do is offer
> to test. Thanks for analysing the problem!

I might be able to whip up a patch, but it's harder than I first
thought because we were relying on arrays to be swept into those
regions. I'll have to think on this one or we'll just have to change
the API entirely for all the users.

>> -Aaron
>
> --
> Timothy Pearson
> Raptor Engineering
> +1 (415) 727-8645
> http://www.raptorengineeringinc.com
Re: [coreboot] GCC update broke AMD Fam10h boot
On 03/16/2015 04:44 PM, Aaron Durbin wrote:
> On Mon, Mar 16, 2015 at 1:39 PM, Timothy Pearson wrote:
>> On 03/16/2015 09:23 AM, Aaron Durbin wrote:
>>> On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson wrote:
>>>> All,
>>>>
>>>> Just a heads up as there is no bugtracker for this project. GIT commit
>>>> 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2,
>>>> breaks ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39
>>>> POST code, but then goes into an infinite loop). Downgrading the GCC
>>>> version repairs the boot failure.
>>>>
>>>> Not sure if you want to revert that commit until someone can figure
>>>> out what changed to cause the problem.
>>>
>>> Could you post ramstage.elf from the two different builds somewhere?
>>> I'd like to take a peek at what is in there.
>>
>> Sure:
>> https://raptorengineeringinc.com/coreboot/built.tar.bz2
>>
>> Other oddities:
>> GCC 4.8.3:
>> normal/romstage 0x7ff80 stage 97345
>> normal/ramstage 0x97c40 stage 154869
>>
>> GCC 4.9.2:
>> normal/romstage 0x7ff80 stage 94773
>> normal/ramstage 0x97240 stage 173942
>>
>> Note in particular, judging from the file sizes, that something seems to
>> have been relocated from romstage to ramstage by the new gcc version.
>
> I noticed you had CONFIG_COVERAGE selected in both the builds. Could
> you try not having that selected? I wonder if something changed in the
> compiler on that front. But... I think I found a bigger issue.

That shouldn't be a problem. For reference, should CONFIG_COVERAGE be on
or off for board status report builds?

> $ nm ./gcc4.8.3/ramstage.debug | sort | grep -C 4 _bs_init_
> 00146fc4 r pch_intel_wifi
> 00146fd0 R cpu_drivers
> 00146fd0 R epci_drivers
> 00146fd0 r model_10xxx
> 00146fdc R _bs_init_begin
> 00146fdc r cbmem_bscb
> 00146fdc R ecpu_drivers
> 00146ff0 r gcov_bscb
> 0014702c R _bs_init_end
> 0014702c R pnp_conf_mode_870155_aa
> 00147034 R pnp_conf_mode_a0a0_aa
> 0014703c R pnp_conf_mode_8787_aa
> 00147044 R pnp_conf_mode__aa
>
> $ nm ./gcc4.9.2/ramstage.debug | sort | grep -C 4 _bs_init_
> 001465c4 r pch_intel_wifi
> 001465d0 R cpu_drivers
> 001465d0 R epci_drivers
> 001465d0 r model_10xxx
> 001465dc R _bs_init_begin
> 001465dc R ecpu_drivers
> 001465e0 r cbmem_bscb
> 00146600 r gcov_bscb
> 0014663c R _bs_init_end
> 00146640 R pnp_conf_mode_870155_aa
> 00146648 R pnp_conf_mode_a0a0_aa
> 00146650 R pnp_conf_mode_8787_aa
> 00146658 R pnp_conf_mode__a
>
> The boot state callbacks place the whole structure for each entry
> between _bs_init_begin and _bs_init_end. For both binaries the size of
> each entry is 0x14.
>
> For the 4.8.3 compiled ramstage I see:
> (gdb) p/x 0x0014702c - 0x00146fdc
> $12 = 0x50
> For the 4.9.2 compiled ramstage I see:
> (gdb) p/x 0x014663c - 0x01465dc
> $14 = 0x60
>
> 0x60 is not a multiple of 0x14 -- which means things aren't cool.

This makes perfect sense--whenever coreboot didn't hang outright it
started infinitely spewing some message regarding a boot state callback
already being complete.

> Looking at the symbols it appears the compiler is aligning those
> structures to 32 bytes for some reason...
>
> A quick hack is to add ALIGN(32) to the linker script before
> _bs_init_begin: src/arch/x86/ramstage.ld

So I wonder if this is unique to AMD Fam10h or if a whole lot of other
boards broke with the gcc update. I wouldn't have even caught this if I
hadn't checked out a new coreboot tree instead of copying over the
existing tree with the prebuilt crossgcc, so we might be looking at a
ticking timebomb that will go off as people start upgrading their
crossgcc versions...

> But I think we'll need to store pointers to the structures in order to
> properly handle the situation where the compiler is effectively making
> alignment/size decisions for some reason.

I am not at all familiar with the code in question, so all I can do is
offer to test. Thanks for analysing the problem!

> -Aaron

--
Timothy Pearson
Raptor Engineering
+1 (415) 727-8645
http://www.raptorengineeringinc.com
Re: [coreboot] GCC update broke AMD Fam10h boot
On Mon, Mar 16, 2015 at 1:39 PM, Timothy Pearson wrote:
> On 03/16/2015 09:23 AM, Aaron Durbin wrote:
>> On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson wrote:
>>> All,
>>>
>>> Just a heads up as there is no bugtracker for this project. GIT commit
>>> 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2,
>>> breaks ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39
>>> POST code, but then goes into an infinite loop). Downgrading the GCC
>>> version repairs the boot failure.
>>>
>>> Not sure if you want to revert that commit until someone can figure out
>>> what changed to cause the problem.
>>
>> Could you post ramstage.elf from the two different builds somewhere?
>> I'd like to take a peek at what is in there.
>
> Sure:
> https://raptorengineeringinc.com/coreboot/built.tar.bz2
>
> Other oddities:
> GCC 4.8.3:
> normal/romstage 0x7ff80 stage 97345
> normal/ramstage 0x97c40 stage 154869
>
> GCC 4.9.2:
> normal/romstage 0x7ff80 stage 94773
> normal/ramstage 0x97240 stage 173942
>
> Note in particular, judging from the file sizes, that something seems to
> have been relocated from romstage to ramstage by the new gcc version.

I noticed you had CONFIG_COVERAGE selected in both the builds. Could
you try not having that selected? I wonder if something changed in the
compiler on that front. But... I think I found a bigger issue.

$ nm ./gcc4.8.3/ramstage.debug | sort | grep -C 4 _bs_init_
00146fc4 r pch_intel_wifi
00146fd0 R cpu_drivers
00146fd0 R epci_drivers
00146fd0 r model_10xxx
00146fdc R _bs_init_begin
00146fdc r cbmem_bscb
00146fdc R ecpu_drivers
00146ff0 r gcov_bscb
0014702c R _bs_init_end
0014702c R pnp_conf_mode_870155_aa
00147034 R pnp_conf_mode_a0a0_aa
0014703c R pnp_conf_mode_8787_aa
00147044 R pnp_conf_mode__aa

$ nm ./gcc4.9.2/ramstage.debug | sort | grep -C 4 _bs_init_
001465c4 r pch_intel_wifi
001465d0 R cpu_drivers
001465d0 R epci_drivers
001465d0 r model_10xxx
001465dc R _bs_init_begin
001465dc R ecpu_drivers
001465e0 r cbmem_bscb
00146600 r gcov_bscb
0014663c R _bs_init_end
00146640 R pnp_conf_mode_870155_aa
00146648 R pnp_conf_mode_a0a0_aa
00146650 R pnp_conf_mode_8787_aa
00146658 R pnp_conf_mode__a

The boot state callbacks place the whole structure for each entry
between _bs_init_begin and _bs_init_end. For both binaries the size of
each entry is 0x14.

For the 4.8.3 compiled ramstage I see:
(gdb) p/x 0x0014702c - 0x00146fdc
$12 = 0x50
For the 4.9.2 compiled ramstage I see:
(gdb) p/x 0x014663c - 0x01465dc
$14 = 0x60

0x60 is not a multiple of 0x14 -- which means things aren't cool.

Looking at the symbols it appears the compiler is aligning those
structures to 32 bytes for some reason...

A quick hack is to add ALIGN(32) to the linker script before
_bs_init_begin: src/arch/x86/ramstage.ld

But I think we'll need to store pointers to the structures in order to
properly handle the situation where the compiler is effectively making
alignment/size decisions for some reason.

-Aaron

> --
> Timothy Pearson
> Raptor Engineering
> +1 (415) 727-8645
> http://www.raptorengineeringinc.com
Re: [coreboot] GCC update broke AMD Fam10h boot
On 03/16/2015 09:23 AM, Aaron Durbin wrote:
> On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson wrote:
>> All,
>>
>> Just a heads up as there is no bugtracker for this project. GIT commit
>> 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2,
>> breaks ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39
>> POST code, but then goes into an infinite loop). Downgrading the GCC
>> version repairs the boot failure.
>>
>> Not sure if you want to revert that commit until someone can figure out
>> what changed to cause the problem.
>
> Could you post ramstage.elf from the two different builds somewhere?
> I'd like to take a peek at what is in there.

Sure:
https://raptorengineeringinc.com/coreboot/built.tar.bz2

Other oddities:
GCC 4.8.3:
normal/romstage 0x7ff80 stage 97345
normal/ramstage 0x97c40 stage 154869

GCC 4.9.2:
normal/romstage 0x7ff80 stage 94773
normal/ramstage 0x97240 stage 173942

Note in particular, judging from the file sizes, that something seems to
have been relocated from romstage to ramstage by the new gcc version.

--
Timothy Pearson
Raptor Engineering
+1 (415) 727-8645
http://www.raptorengineeringinc.com
Re: [coreboot] Automated test system: Nominations wanted
On 03/16/2015 08:17 AM, Alexander Couzens wrote:
> On Mon, 16 Mar 2015 01:20:17 -0500 Timothy Pearson wrote:
>> Just wanted to mention that Raptor Engineering now has an automated test
>> stand for the ASUS KFSN4-DRE board, run nightly with automatic bricking
>> recovery. It has two Opteron 2431 (AMD Family 10h, 6 core @ 2.4GHz) CPUs
>> and 6GB of DDR2-667 memory installed on Node 0.
>>
>> Each successful test result is recorded to the board-status repository,
>> and each failure is reported to this list.
>
> Hi Timothy,
>
> can you please write down (wiki?) how your system is set up?
> How do you do automatic bricking recovery?
>
> Best,
> lynxis

Bricking recovery uses the fallback mechanism. There is a supervisory
board attached to the target; it controls physical power on/power
off/CMOS reset and also sends build/flash/test commands to the target as
needed.

The exact code/details for the controller are not public at this time,
though I would be happy to provide additional information on the target
and/or entertain target hardware configuration requests (add-on cards,
etc.).

--
Timothy Pearson
Raptor Engineering
+1 (415) 727-8645
http://www.raptorengineeringinc.com
Re: [coreboot] GCC update broke AMD Fam10h boot
On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson wrote:
> All,
>
> Just a heads up as there is no bugtracker for this project. GIT commit
> 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2, breaks
> ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39 POST code,
> but then goes into an infinite loop). Downgrading the GCC version repairs
> the boot failure.
>
> Not sure if you want to revert that commit until someone can figure out what
> changed to cause the problem.

Could you post ramstage.elf from the two different builds somewhere?
I'd like to take a peek at what is in there.

> --
> Timothy Pearson
> Raptor Engineering
> +1 (415) 727-8645
> http://www.raptorengineeringinc.com
Re: [coreboot] Automated test system: Nominations wanted
On Mon, 16 Mar 2015 01:20:17 -0500 Timothy Pearson wrote:
> Just wanted to mention that Raptor Engineering now has an automated test
> stand for the ASUS KFSN4-DRE board, run nightly with automatic bricking
> recovery. It has two Opteron 2431 (AMD Family 10h, 6 core @ 2.4GHz) CPUs
> and 6GB of DDR2-667 memory installed on Node 0.
>
> Each successful test result is recorded to the board-status repository,
> and each failure is reported to this list.

Hi Timothy,

can you please write down (wiki?) how your system is set up?
How do you do automatic bricking recovery?

Best,
lynxis

--
Alexander Couzens
mail: lyn...@fe80.eu
jabber: lyn...@jabber.ccc.de
mobile: +4915123277221