Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-16 Thread Aaron Durbin
On Mon, Mar 16, 2015 at 4:33 PM, Stefan Reinauer
 wrote:
> * Aaron Durbin  [150316 22:44]:
>> A quick hack is to add ALIGN(32) to the linker script before
>> _bs_init_begin: src/arch/x86/ramstage.ld
>>
>> But I think we'll need to store pointers to the structures in order to
>> properly handle the situation where the compiler is effectively making
>> alignment/size decisions for some reason.
>
> I suggest trying to enforce alignment / size instead of adding another
> layer of indirection.

I'm not sure that's possible w/o marking the struct packed just for
the sake of it. Another issue is that this section is sitting in RO
while also being written to. A third problem, which has been known for
a while, is that we can't rely on the begin and end symbols being equal
for an empty set, since C says no two distinct objects can share an
address. I'm sure the language lawyers will correct me where I got that
wrong.

-- 
coreboot mailing list: coreboot@coreboot.org
http://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-16 Thread Stefan Reinauer
* Aaron Durbin  [150316 22:44]:
> A quick hack is to add ALIGN(32) to the linker script before
> _bs_init_begin: src/arch/x86/ramstage.ld
> 
> But I think we'll need to store pointers to the structures in order to
> properly handle the situation where the compiler is effectively making
> alignment/size decisions for some reason.

I suggest trying to enforce alignment / size instead of adding another
layer of indirection.

Stefan




Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-16 Thread Aaron Durbin
On Mon, Mar 16, 2015 at 4:49 PM, Timothy Pearson
 wrote:
> On 03/16/2015 04:44 PM, Aaron Durbin wrote:
>>
>> On Mon, Mar 16, 2015 at 1:39 PM, Timothy Pearson
>>   wrote:
>>>
>>> On 03/16/2015 09:23 AM, Aaron Durbin wrote:
>>>>
>>>> On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson
>>>> wrote:
>>>>>
>>>>> All,
>>>>>
>>>>> Just a heads up as there is no bugtracker for this project.  GIT commit
>>>>> 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2,
>>>>> breaks ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39
>>>>> POST code, but then goes into an infinite loop).  Downgrading the GCC
>>>>> version repairs the boot failure.
>>>>>
>>>>> Not sure if you want to revert that commit until someone can figure out
>>>>> what changed to cause the problem.
>>>>
>>>>
>>>> Could you post ramstage.elf from the two different builds somewhere? I'd
>>>> like to take a peek at what is in there.
>>>
>>>
>>>
>>> Sure:
>>> https://raptorengineeringinc.com/coreboot/built.tar.bz2
>>>
>>> Other oddities:
>>> GCC 4.8.3:
>>> normal/romstage  0x7ff80  stage   97345
>>> normal/ramstage  0x97c40  stage  154869
>>>
>>> GCC 4.9.2:
>>> normal/romstage  0x7ff80  stage   94773
>>> normal/ramstage  0x97240  stage  173942
>>>
>>> Note in particular, judging from the file sizes, that something seems to
>>> have been relocated from romstage to ramstage by the new gcc version.
>>>
>>
>> I noticed you had CONFIG_COVERAGE selected in both the builds. Could
>> you try not having that selected? I wonder if something changed in the
>> compiler on that front. But... I think I found a bigger issue.
>
>
> That shouldn't be a problem.  For reference, should CONFIG_COVERAGE be on or
> off for board status report builds?
>
>
>> $ nm ./gcc4.8.3/ramstage.debug | sort | grep -C 4 _bs_init_
>> 00146fc4 r pch_intel_wifi
>> 00146fd0 R cpu_drivers
>> 00146fd0 R epci_drivers
>> 00146fd0 r model_10xxx
>> 00146fdc R _bs_init_begin
>> 00146fdc r cbmem_bscb
>> 00146fdc R ecpu_drivers
>> 00146ff0 r gcov_bscb
>> 0014702c R _bs_init_end
>> 0014702c R pnp_conf_mode_870155_aa
>> 00147034 R pnp_conf_mode_a0a0_aa
>> 0014703c R pnp_conf_mode_8787_aa
>> 00147044 R pnp_conf_mode__aa
>>
>> $ nm ./gcc4.9.2/ramstage.debug | sort | grep -C 4 _bs_init_
>> 001465c4 r pch_intel_wifi
>> 001465d0 R cpu_drivers
>> 001465d0 R epci_drivers
>> 001465d0 r model_10xxx
>> 001465dc R _bs_init_begin
>> 001465dc R ecpu_drivers
>> 001465e0 r cbmem_bscb
>> 00146600 r gcov_bscb
>> 0014663c R _bs_init_end
>> 00146640 R pnp_conf_mode_870155_aa
>> 00146648 R pnp_conf_mode_a0a0_aa
>> 00146650 R pnp_conf_mode_8787_aa
>> 00146658 R pnp_conf_mode__a
>>
>> The boot state callbacks place the whole structure for each entry
>> between _bs_init_begin and _bs_init_end. For both binaries the size of
>> each structure is 0x14.
>
>
>> For the 4.8.3 compiled ramstage I see:
>> (gdb) p/x 0x0014702c - 0x00146fdc
>> $12 = 0x50
>> For the 4.9.2 compiled ramstage I see:
>> (gdb) p/x 0x014663c - 0x01465dc
>> $14 = 0x60
>>
>> 0x60 is not a multiple of 0x14 -- which means things aren't cool.
>
>
> This makes perfect sense--whenever coreboot didn't hang outright it started
> infinitely spewing some message regarding a boot state callback already
> being complete.
>
>> Looking at the symbols it appears the compiler is aligning those
>> structures to 32 bytes for some reason...
>>
>> A quick hack is to add ALIGN(32) to the linker script before
>> _bs_init_begin: src/arch/x86/ramstage.ld
>
>
> So I wonder if this is unique to AMD Fam10h or if a whole lot of other
> boards broke with the gcc update.  I wouldn't have even caught this if I
> hadn't checked out a new coreboot tree instead of copying over the existing
> tree with the prebuilt crossgcc, so we might be looking at a ticking
> timebomb that will go off as people start upgrading their crossgcc
> versions...

It all sorta depends. But the issue that _bs_init_begin does not equal
the address of the first bscb structure is bad news all around.
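
To illustrate why that mismatch matters, here is a minimal Python sketch
(not coreboot code) that simulates a fixed-stride walk over the region.
It assumes the walker starts at _bs_init_begin, advances 0x14 bytes per
entry, and stops only when it lands exactly on _bs_init_end; that loop
shape is an assumption, not something confirmed in this thread. The
offsets come from the GCC 4.9.2 nm listing, and treating the 0x3c bytes
of gcov_bscb as three entries is also an assumption.

```python
ENTRY_SIZE = 0x14  # bytes per boot state callback entry, per the thread

# Offsets relative to _bs_init_begin in the GCC 4.9.2 nm listing.
REGION_END = 0x60  # _bs_init_end
# cbmem_bscb sits at 0x4; gcov_bscb covers 0x24..0x60, assumed to be three entries.
REAL_ENTRIES = {0x04, 0x24, 0x38, 0x4C}


def walk(region_end, stride=ENTRY_SIZE, limit=8):
    """Offsets visited by a walk that only stops when it lands exactly on the end.

    `limit` caps the simulation, because the walk may never terminate.
    """
    visited = []
    cur = 0
    while cur != region_end and len(visited) < limit:
        visited.append(cur)
        cur += stride
    return visited


if __name__ == "__main__":
    print([hex(o) for o in walk(0x50)])        # GCC 4.8.3 region size
    print([hex(o) for o in walk(REGION_END)])  # GCC 4.9.2 region size
    print(sorted(set(walk(REGION_END)) & REAL_ENTRIES))
```

With the 4.8.3 size the walk visits four offsets and stops. With the
4.9.2 size it steps past 0x60 without ever equalling it, and none of the
visited offsets is the start of a real entry. If the real loop has this
shape, that would be consistent with the reported hang.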

>
>> But I think we'll need to store pointers to the structures in order to
>> properly handle the situation where the compiler is effectively making
>> alignment/size decisions for some reason.
>
>
> I am not at all familiar with the code in question, so all I can do is offer
> to test.  Thanks for analysing the problem!

I might be able to whip up a patch, but it's harder than I first
thought because we were relying on arrays to be swept into those
regions. I'll have to think on this one or we'll just have to change
the API entirely for all the users.

>
>
>> -Aaron
>
>
>
> --
> Timothy Pearson
> Raptor Engineering
> +1 (415) 727-8645
> http://www.raptorengineeringinc.com



Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-16 Thread Timothy Pearson

On 03/16/2015 04:44 PM, Aaron Durbin wrote:
>
> On Mon, Mar 16, 2015 at 1:39 PM, Timothy Pearson
> wrote:
>>
>> On 03/16/2015 09:23 AM, Aaron Durbin wrote:
>>>
>>> On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson
>>> wrote:
>>>>
>>>> All,
>>>>
>>>> Just a heads up as there is no bugtracker for this project.  GIT commit
>>>> 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2,
>>>> breaks ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39
>>>> POST code, but then goes into an infinite loop).  Downgrading the GCC
>>>> version repairs the boot failure.
>>>>
>>>> Not sure if you want to revert that commit until someone can figure out
>>>> what changed to cause the problem.
>>>
>>> Could you post ramstage.elf from the two different builds somewhere? I'd
>>> like to take a peek at what is in there.
>>
>> Sure:
>> https://raptorengineeringinc.com/coreboot/built.tar.bz2
>>
>> Other oddities:
>> GCC 4.8.3:
>> normal/romstage  0x7ff80  stage   97345
>> normal/ramstage  0x97c40  stage  154869
>>
>> GCC 4.9.2:
>> normal/romstage  0x7ff80  stage   94773
>> normal/ramstage  0x97240  stage  173942
>>
>> Note in particular, judging from the file sizes, that something seems to
>> have been relocated from romstage to ramstage by the new gcc version.
>
> I noticed you had CONFIG_COVERAGE selected in both the builds. Could
> you try not having that selected? I wonder if something changed in the
> compiler on that front. But... I think I found a bigger issue.

That shouldn't be a problem.  For reference, should CONFIG_COVERAGE be
on or off for board status report builds?

> $ nm ./gcc4.8.3/ramstage.debug | sort | grep -C 4 _bs_init_
> 00146fc4 r pch_intel_wifi
> 00146fd0 R cpu_drivers
> 00146fd0 R epci_drivers
> 00146fd0 r model_10xxx
> 00146fdc R _bs_init_begin
> 00146fdc r cbmem_bscb
> 00146fdc R ecpu_drivers
> 00146ff0 r gcov_bscb
> 0014702c R _bs_init_end
> 0014702c R pnp_conf_mode_870155_aa
> 00147034 R pnp_conf_mode_a0a0_aa
> 0014703c R pnp_conf_mode_8787_aa
> 00147044 R pnp_conf_mode__aa
>
> $ nm ./gcc4.9.2/ramstage.debug | sort | grep -C 4 _bs_init_
> 001465c4 r pch_intel_wifi
> 001465d0 R cpu_drivers
> 001465d0 R epci_drivers
> 001465d0 r model_10xxx
> 001465dc R _bs_init_begin
> 001465dc R ecpu_drivers
> 001465e0 r cbmem_bscb
> 00146600 r gcov_bscb
> 0014663c R _bs_init_end
> 00146640 R pnp_conf_mode_870155_aa
> 00146648 R pnp_conf_mode_a0a0_aa
> 00146650 R pnp_conf_mode_8787_aa
> 00146658 R pnp_conf_mode__a
>
> The boot state callbacks place the whole structure for each entry
> between _bs_init_begin and _bs_init_end. For both binaries the size of
> each structure is 0x14.
>
> For the 4.8.3 compiled ramstage I see:
> (gdb) p/x 0x0014702c - 0x00146fdc
> $12 = 0x50
> For the 4.9.2 compiled ramstage I see:
> (gdb) p/x 0x014663c - 0x01465dc
> $14 = 0x60
>
> 0x60 is not a multiple of 0x14 -- which means things aren't cool.

This makes perfect sense--whenever coreboot didn't hang outright it
started infinitely spewing some message regarding a boot state callback
already being complete.

> Looking at the symbols it appears the compiler is aligning those
> structures to 32 bytes for some reason...
>
> A quick hack is to add ALIGN(32) to the linker script before
> _bs_init_begin: src/arch/x86/ramstage.ld

So I wonder if this is unique to AMD Fam10h or if a whole lot of other
boards broke with the gcc update.  I wouldn't have even caught this if I
hadn't checked out a new coreboot tree instead of copying over the
existing tree with the prebuilt crossgcc, so we might be looking at a
ticking timebomb that will go off as people start upgrading their
crossgcc versions...

> But I think we'll need to store pointers to the structures in order to
> properly handle the situation where the compiler is effectively making
> alignment/size decisions for some reason.

I am not at all familiar with the code in question, so all I can do is
offer to test.  Thanks for analysing the problem!

> -Aaron



--
Timothy Pearson
Raptor Engineering
+1 (415) 727-8645
http://www.raptorengineeringinc.com



Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-16 Thread Aaron Durbin
On Mon, Mar 16, 2015 at 1:39 PM, Timothy Pearson
 wrote:
> On 03/16/2015 09:23 AM, Aaron Durbin wrote:
>>
>> On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson
>>   wrote:
>>>
>>> All,
>>>
>>> Just a heads up as there is no bugtracker for this project.  GIT commit
>>> 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2,
>>> breaks
>>> ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39 POST code,
>>> but then goes into an infinite loop).  Downgrading the GCC version
>>> repairs
>>> the boot failure.
>>>
>>> Not sure if you want to revert that commit until someone can figure out
>>> what
>>> changed to cause the problem.
>>
>>
>> Could you post ramstage.elf from the two different builds somewhere? I'd
>> like to take a peek at what is in there.
>
>
> Sure:
> https://raptorengineeringinc.com/coreboot/built.tar.bz2
>
> Other oddities:
> GCC 4.8.3:
> normal/romstage  0x7ff80  stage   97345
> normal/ramstage  0x97c40  stage  154869
>
> GCC 4.9.2:
> normal/romstage  0x7ff80  stage   94773
> normal/ramstage  0x97240  stage  173942
>
> Note in particular, judging from the file sizes, that something seems to
> have been relocated from romstage to ramstage by the new gcc version.
>

I noticed you had CONFIG_COVERAGE selected in both the builds. Could
you try not having that selected? I wonder if something changed in the
compiler on that front. But... I think I found a bigger issue.



$ nm ./gcc4.8.3/ramstage.debug | sort | grep -C 4 _bs_init_
00146fc4 r pch_intel_wifi
00146fd0 R cpu_drivers
00146fd0 R epci_drivers
00146fd0 r model_10xxx
00146fdc R _bs_init_begin
00146fdc r cbmem_bscb
00146fdc R ecpu_drivers
00146ff0 r gcov_bscb
0014702c R _bs_init_end
0014702c R pnp_conf_mode_870155_aa
00147034 R pnp_conf_mode_a0a0_aa
0014703c R pnp_conf_mode_8787_aa
00147044 R pnp_conf_mode__aa

$ nm ./gcc4.9.2/ramstage.debug | sort | grep -C 4 _bs_init_
001465c4 r pch_intel_wifi
001465d0 R cpu_drivers
001465d0 R epci_drivers
001465d0 r model_10xxx
001465dc R _bs_init_begin
001465dc R ecpu_drivers
001465e0 r cbmem_bscb
00146600 r gcov_bscb
0014663c R _bs_init_end
00146640 R pnp_conf_mode_870155_aa
00146648 R pnp_conf_mode_a0a0_aa
00146650 R pnp_conf_mode_8787_aa
00146658 R pnp_conf_mode__a

The boot state callbacks place the whole structure for each entry
between _bs_init_begin and _bs_init_end. For both binaries the size of
each structure is 0x14.

For the 4.8.3 compiled ramstage I see:
(gdb) p/x 0x0014702c - 0x00146fdc
$12 = 0x50
For the 4.9.2 compiled ramstage I see:
(gdb) p/x 0x014663c - 0x01465dc
$14 = 0x60

0x60 is not a multiple of 0x14 -- which means things aren't cool.
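
The same check can be scripted against nm output. The following is a
minimal Python sketch rather than an existing coreboot tool; the helper
names parse_nm and check_region are made up for illustration, and the
listings are trimmed to the four symbols that matter. It reports the
region size, whether that size is a whole number of 0x14-byte entries,
and the gap between _bs_init_begin and the first structure.

```python
ENTRY_SIZE = 0x14  # size of one boot state callback entry, per the thread


def parse_nm(text):
    """Map symbol name -> address from `nm` output lines ('addr type name')."""
    symbols = {}
    for line in text.strip().splitlines():
        addr, _type, name = line.split()
        symbols[name] = int(addr, 16)
    return symbols


def check_region(symbols, first_entry, entry_size=ENTRY_SIZE):
    """Return (region_size, is_whole_number_of_entries, leading_gap)."""
    begin = symbols["_bs_init_begin"]
    end = symbols["_bs_init_end"]
    size = end - begin
    return size, size % entry_size == 0, symbols[first_entry] - begin


GCC_483 = """
00146fdc R _bs_init_begin
00146fdc r cbmem_bscb
00146ff0 r gcov_bscb
0014702c R _bs_init_end
"""

GCC_492 = """
001465dc R _bs_init_begin
001465e0 r cbmem_bscb
00146600 r gcov_bscb
0014663c R _bs_init_end
"""

if __name__ == "__main__":
    for label, text in (("4.8.3", GCC_483), ("4.9.2", GCC_492)):
        size, whole, gap = check_region(parse_nm(text), "cbmem_bscb")
        print(f"GCC {label}: size={size:#x} whole_entries={whole} gap={gap:#x}")
```

For 4.8.3 this gives a size of 0x50 with no leading gap; for 4.9.2 it
gives 0x60 with a 4-byte gap before cbmem_bscb.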

Looking at the symbols it appears the compiler is aligning those
structures to 32 bytes for some reason...

A quick hack is to add ALIGN(32) to the linker script before
_bs_init_begin: src/arch/x86/ramstage.ld
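
As a rough illustration of what that alignment would do to the numbers
above, here is a small Python sketch. align_up mirrors the rounding that
ALIGN(32) applies to the linker's location counter; the addresses are
from the GCC 4.9.2 nm listing. It assumes the linker would otherwise
place the objects exactly where it does now, which is not verified.

```python
ENTRY_SIZE = 0x14  # bytes per boot state callback entry, per the thread


def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


# Addresses from the GCC 4.9.2 nm listing.
BS_INIT_BEGIN = 0x001465DC
CBMEM_BSCB = 0x001465E0
GCOV_BSCB = 0x00146600

if __name__ == "__main__":
    aligned_begin = align_up(BS_INIT_BEGIN, 32)
    # Does the aligned begin symbol coincide with the first structure?
    print(hex(aligned_begin), aligned_begin == CBMEM_BSCB)
    # Spacing between the two objects, and the padding beyond one entry.
    spacing = GCOV_BSCB - CBMEM_BSCB
    print(hex(spacing), hex(spacing - ENTRY_SIZE))
```

On these numbers the aligned _bs_init_begin would land on cbmem_bscb,
but the two objects would still be 0x20 apart rather than 0x14, leaving
0xc bytes of padding that a fixed-stride walk does not expect.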

But I think we'll need to store pointers to the structures in order to
properly handle the situation where the compiler is effectively making
alignment/size decisions for some reason.
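
A toy model of the pointer idea, in Python: the entries may live at
whatever addresses the compiler picks, while a separate, densely packed
table holds one fixed-size pointer per entry, so the walker never
depends on entry size or padding. This is only a sketch. The five-word
entry layout, the id field, and all function names are hypothetical and
not taken from coreboot.

```python
import struct

PTR = struct.Struct("<I")     # one 32-bit little-endian pointer slot
ENTRY = struct.Struct("<5I")  # hypothetical 0x14-byte entry: five 32-bit fields


def build_image(entry_offsets, size):
    """Place one entry at each offset; field 0 is a made-up id (index + 1)."""
    image = bytearray(size)
    for index, offset in enumerate(entry_offsets):
        ENTRY.pack_into(image, offset, index + 1, 0, 0, 0, 0)
    return image


def build_pointer_table(entry_offsets):
    """Pack the entry offsets back to back, with no padding between slots."""
    return b"".join(PTR.pack(offset) for offset in entry_offsets)


def walk_pointers(image, table):
    """Follow each pointer slot and return the id stored in the entry it names."""
    ids = []
    for (offset,) in PTR.iter_unpack(table):
        ids.append(ENTRY.unpack_from(image, offset)[0])
    return ids


if __name__ == "__main__":
    # Entry offsets mimic the padded GCC 4.9.2 layout discussed in the thread.
    offsets = [0x04, 0x24, 0x38, 0x4C]
    image = build_image(offsets, 0x60)
    table = build_pointer_table(offsets)
    print(len(table), walk_pointers(image, table))
```

The table is always a multiple of the pointer size, so the begin/end
arithmetic stays valid however the entries themselves are padded.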

-Aaron



>
> --
> Timothy Pearson
> Raptor Engineering
> +1 (415) 727-8645
> http://www.raptorengineeringinc.com



Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-16 Thread Timothy Pearson

On 03/16/2015 09:23 AM, Aaron Durbin wrote:
>
> On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson
> wrote:
>>
>> All,
>>
>> Just a heads up as there is no bugtracker for this project.  GIT commit
>> 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2, breaks
>> ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39 POST code,
>> but then goes into an infinite loop).  Downgrading the GCC version repairs
>> the boot failure.
>>
>> Not sure if you want to revert that commit until someone can figure out what
>> changed to cause the problem.
>
> Could you post ramstage.elf from the two different builds somewhere? I'd
> like to take a peek at what is in there.

Sure:
https://raptorengineeringinc.com/coreboot/built.tar.bz2

Other oddities:
GCC 4.8.3:
normal/romstage  0x7ff80  stage   97345
normal/ramstage  0x97c40  stage  154869

GCC 4.9.2:
normal/romstage  0x7ff80  stage   94773
normal/ramstage  0x97240  stage  173942

Note in particular, judging from the file sizes, that something seems to
have been relocated from romstage to ramstage by the new gcc version.


--
Timothy Pearson
Raptor Engineering
+1 (415) 727-8645
http://www.raptorengineeringinc.com



Re: [coreboot] Automated test system: Nominations wanted

2015-03-16 Thread Timothy Pearson

On 03/16/2015 08:17 AM, Alexander Couzens wrote:
>
> On Mon, 16 Mar 2015 01:20:17 -0500
> Timothy Pearson  wrote:
>
>> Just wanted to mention that Raptor Engineering now has an automated test
>> stand for the ASUS KFSN4-DRE board, run nightly with automatic bricking
>> recovery.  It has two Opteron 2431 (AMD Family 10h, 6 core @ 2.4GHz) CPUs
>> and 6GB of DDR2-667 memory installed on Node 0.
>>
>> Each successful test result is recorded to the board-status repository,
>> and each failure is reported to this list.
>
> Hi Timothy,
>
> can you please write down (wiki?) how your system is set up?
> How do you do automatic bricking recovery?
>
> Best,
> lynxis

Bricking recovery uses the fallback mechanism.  There is a supervisory
board attached to the target; it controls physical power on/power
off/CMOS reset and also sends build/flash/test commands to the target as
needed.

The exact code/details for the controller are not public at this time,
though I would be happy to provide additional information on the target
and/or entertain target hardware configuration requests (add-on cards, etc.)


--
Timothy Pearson
Raptor Engineering
+1 (415) 727-8645
http://www.raptorengineeringinc.com



Re: [coreboot] GCC update broke AMD Fam10h boot

2015-03-16 Thread Aaron Durbin
On Sun, Mar 15, 2015 at 2:04 PM, Timothy Pearson
 wrote:
> All,
>
> Just a heads up as there is no bugtracker for this project.  GIT commit
> 53c388fe, which updates the crossgcc GCC version from 4.8.3 to 4.9.2, breaks
> ramstage on AMD Fam10h systems (ramstage loads, sends its 0x39 POST code,
> but then goes into an infinite loop).  Downgrading the GCC version repairs
> the boot failure.
>
> Not sure if you want to revert that commit until someone can figure out what
> changed to cause the problem.

Could you post ramstage.elf from the two different builds somewhere? I'd
like to take a peek at what is in there.

>
> --
> Timothy Pearson
> Raptor Engineering
> +1 (415) 727-8645
> http://www.raptorengineeringinc.com
>
> --
> coreboot mailing list: coreboot@coreboot.org
> http://www.coreboot.org/mailman/listinfo/coreboot



Re: [coreboot] Automated test system: Nominations wanted

2015-03-16 Thread Alexander Couzens
On Mon, 16 Mar 2015 01:20:17 -0500
Timothy Pearson  wrote:

> Just wanted to mention that Raptor Engineering now has an automated test 
> stand for the ASUS KFSN4-DRE board, run nightly with automatic bricking 
> recovery.  It has two Opteron 2431 (AMD Family 10h, 6 core @ 2.4GHz) CPUs
> and 6GB of DDR2-667 memory installed on Node 0.
> 
> Each successful test result is recorded to the board-status repository, 
> and each failure is reported to this list.

Hi Timothy,

can you please write down (wiki?) how your system is set up?
How do you do automatic bricking recovery?

Best,
lynxis
-- 
Alexander Couzens

mail: lyn...@fe80.eu
jabber: lyn...@jabber.ccc.de
mobile: +4915123277221

