Re: [coreboot] Add coreboot storage driver

2017-02-17 Thread Julius Werner
>
> So what we can see is that everything is serial and there is a great deal
> of waiting. For that specific SDHCI case you can see "Storage device
> initialization" that is happening in depthcharge. That is CMD1, which you
> need to keep sending to the controller. As you can see, it completes in
> 130ms. Unfortunately you really can't just send CMD1 and go about your
> business. You need to poll readiness status and keep sending CMD1 again
> and again. Also, it is not always 130ms. It tends to vary, and the worst
> case we have seen was over 300ms.


Do you actually have an eMMC part that requires repeating CMD1 within a
certain bounded time interval? What happens if you violate that? Does it
just not progress initialization or does it actually fail in some way?

I can't find any official documentation suggesting that this is really
required. JESD84-B51 just says (6.4.3): "The busy bit in the CMD1 response
can be used by a device to tell the host that it is still working on its
power-up/reset procedure (e.g., downloading the register information from
memory field) and is not ready yet for communication. In this case the host
must repeat CMD1 until the busy bit is cleared." This suggests that the
only point of the command is polling for readiness.
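
In other words, the whole thing should boil down to a loop like the one
below. (Just a sketch: emmc_send_cmd1(), struct emmc_host and the exact
OCR bit name are placeholders, not the actual coreboot/depthcharge MMC
API.)

  /*
   * Poll CMD1 (SEND_OP_COND) until the device reports power-up done,
   * per JESD84-B51 6.4.3. Note that the spec doesn't seem to put an
   * upper bound on the repeat interval -- hence my question above.
   */
  static int emmc_wait_ready(struct emmc_host *host, int timeout_ms)
  {
          uint32_t ocr;

          while (timeout_ms-- > 0) {
                  if (emmc_send_cmd1(host, &ocr))
                          return -1;       /* command/transport error */
                  if (ocr & (1u << 31))    /* bit 31 set: power-up done */
                          return 0;
                  mdelay(1);
          }
          return -1;                       /* still busy after timeout */
  }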


> Another one is "kernel read", which is pure IO and takes 132ms. If you
> invest some 300ms in training the link to HS400 (this has to happen on
> every boot on every board) you can read it in just 10ms. Naturally you
> can't see HS400 in the picture because enabling it late in the boot flow
> would be counterproductive.
>

Have you considered implementing HS400-ES (enhanced strobe) support in your
host controllers? That feature allows you to run at HS400 speeds
immediately without any tuning (by essentially turning the clock master
around and having the device pulse its own clock when it's sending data
IIRC). We've had great success improving boot speed with that on a
different Chrome OS platform. This won't help you for your current
generation of SoCs yet, but at least it should resolve the tuning issue in
the long run as this feature becomes more standard (so this issue shouldn't
actually get worse and worse in the future... it should go away again).
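
For reference, once the controller can do it, the device-side switch is
tiny and has no tuning pass at all. Very roughly (the EXT_CSD indices and
values are from my reading of JESD84-B51, so please double-check them, and
mmc_switch()/host_set_clock() are placeholder helpers, not real API):

  #define EXT_CSD_STROBE_SUPPORT  184     /* device advertises HS400-ES */
  #define EXT_CSD_BUS_WIDTH       183
  #define EXT_CSD_HS_TIMING       185
  #define BUS_WIDTH_8BIT_DDR_ES   0x86    /* 8-bit DDR + enhanced strobe */
  #define HS_TIMING_HS400         0x03

  static int emmc_enter_hs400es(struct emmc_host *host,
                                const uint8_t *ext_csd)
  {
          if (!(ext_csd[EXT_CSD_STROBE_SUPPORT] & 1))
                  return -1;      /* pre-5.1 part: fall back to tuning */

          if (mmc_switch(host, EXT_CSD_HS_TIMING, 0x01) ||        /* HS */
              mmc_switch(host, EXT_CSD_BUS_WIDTH, BUS_WIDTH_8BIT_DDR_ES) ||
              mmc_switch(host, EXT_CSD_HS_TIMING, HS_TIMING_HS400))
                  return -1;

          /* Raise the clock to 200MHz and enable the strobe in the host
             controller -- and that's it, no tuning sweep. */
          host_set_clock(host, 200 * 1000 * 1000);
          return 0;
  }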
-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot

Re: [coreboot] Add coreboot storage driver

2017-02-14 Thread Andrey Petrov

Hi,

On 02/13/2017 11:16 AM, Nico Huber wrote:

On 13.02.2017 08:19, Andrey Petrov wrote:

For example Apollolake is struggling to finish firmware boot with all
the whistles and bells (vboot, tpm and our friendly, ever-vigilant TXE)
under one second.

Can you provide exhaustive figures showing which parts of this system's boot
process take how long? That would make it easier to reason about where
"parallelism" would provide a benefit.


Such data is available.  Here is a boot chart I drew a few months back:
http://imgur.com/a/huyPQ

I color-coded different work types (some blocks are coded incorrectly, 
please bear with me).


So what we can see is that everything is serial and there is a great deal 
of waiting. For that specific SDHCI case you can see "Storage device 
initialization" that is happening in depthcharge. That is CMD1, which you 
need to keep sending to the controller. As you can see, it completes in 
130ms. Unfortunately you really can't just send CMD1 and go about your 
business. You need to poll readiness status and keep sending CMD1 again 
and again. Also, it is not always 130ms. It tends to vary, and the worst 
case we have seen was over 300ms. Another one is "kernel read", which 
is pure IO and takes 132ms. If you invest some 300ms in training the 
link to HS400 (this has to happen on every boot on every board) you can 
read it in just 10ms. Naturally you can't see HS400 in the picture because 
enabling it late in the boot flow would be counterproductive.


That's essentially the motivation for why we are looking into starting 
this CMD1 polling and HS400 link training as early as possible. However, 
fixing this particular issue is just a "per-platform" fix. I was hoping we 
could come up with a model that adds parallelism as a generic, reusable 
feature, not just a quick hack.


Andrey

--
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-14 Thread Leahy, Leroy P
Hi All,

We started looking at doing things in parallel to speed up the boot process 
and meet the ChromeOS boot time requirements.  One of the larger portions of 
boot time is memory initialization, which is why we are considering doing 
parallelism early.

On Chromebooks, the Intel boot path uses bootblock/verstage to determine 
which version of romstage should run.  Memory initialization is done 
during romstage, which is now replaceable in the field.  One approach to 
parallelism is to use additional processors.  This approach requires that the 
cores start in the bootblock and transition to romstage to perform work in 
parallel with memory initialization.  While this approach has a number of 
issues, it is a path that might work with the existing FSP architecture.  Other 
single-thread approaches are also possible but most likely require changes to 
the FSP architecture to do work in parallel with memory initialization.

This thread has discussed multiple alternatives to achieve parallelism.  At 
this time we are not considering any type of preemptive mechanism.  Currently 
we are investigating alternatives and what benefits they bring.  If our 
investigation indicates that the parallelism significantly reduces the boot 
time and that the code is easy to develop and understand then we will share the 
patches with the coreboot community for further review and comment.

Until then we welcome constructive ways to enable coreboot to do things in 
parallel to reduce the boot time.

Thanks for your help,
Lee Leahy
(425) 881-4919
Intel Corporation
Suite 125
2700 - 156th Ave NE
Bellevue, WA 98007-6554


-Original Message-
From: coreboot [mailto:coreboot-boun...@coreboot.org] On Behalf Of Nico Huber
Sent: Tuesday, February 14, 2017 11:07 AM
To: ron minnich <rminn...@gmail.com>; Aaron Durbin <adur...@google.com>
Cc: Petrov, Andrey <andrey.pet...@intel.com>; Coreboot <coreboot@coreboot.org>
Subject: Re: [coreboot] Add coreboot storage driver

On 14.02.2017 18:56, ron minnich wrote:
> At what point is ramstage a kernel? I think at the point we add file 
> systems or preemptive scheduling. We're getting dangerously close. If 
> we really start to cross that boundary, it's time to rethink the 
> ramstage in my view. It's not a good foundation for a kernel.

Agreed. I wouldn't call it a kernel, but it really seems to be growing very 
ugly. Every time I think about this, I can scarcely find anything that needs 
to be done in ramstage. I believe even most payloads could live without it, 
with some more initialization done in romstage.

Some things that I recall ramstage doing:

  o MP init => maybe can be done earlier; does it generally need RAM???

  o PCI resource allocation => can be done offline
    Just add the resources to the devicetree. If you want to boot
    from a plugged-in card that isn't in the devicetree, the payload
    would have to handle it, though.

  o Those small PCI device "drivers" => I doubt they need RAM

  o Table generation => Not that dynamic after all
I suppose much is done with static (compile time) information.

  o Sometimes gfx modesetting => do it in the payload

Nico

--
coreboot mailing list: coreboot@coreboot.org 
https://www.coreboot.org/mailman/listinfo/coreboot

-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-14 Thread Aaron Durbin via coreboot
On Tue, Feb 14, 2017 at 1:07 PM, Patrick Georgi  wrote:
> 2017-02-14 17:12 GMT+01:00 Aaron Durbin via coreboot :
>> For an optimized bootflow
>> all pieces of work that need to be done pretty much need to be closely
>> coupled. One needs to globally optimize the full sequence.
> Like initializing slow hardware even before RAM init (as long as it's
> just an initial command)?
> How about using PIT/IRQ0 plus some tiny register-only interrupt
> routine to do trivial register wrangling (we do have register
> scripts)?

I don't think I properly understand your suggestion. For this
particular eMMC case are you suggesting taking the PIT interrupt and
doing the next piece of work in it?

>
>> that we seem to be absolutely needing to
>> maintain boot speeds. Is Chrome OS going against tide of coreboot
>> wanting to solve those sorts of issues?
> The problem is that two basic qualities collide here: speed and
> simplicity. The effect is that people ask to stop a second to
> reconsider the options.
> MPinit and parallelism are the "go to" solution for all performance
> related issues of the last 10 years, but they're not without cost.
> Questioning this approach doesn't mean that we shouldn't go there
> at all, just that the obvious answers might not lead to simple
> solutions.
>
> As Andrey stated elsewhere, we're far from CPU bound.

Agreed. But our work is currently chunked up very coarsely. I
think the other-CPU path is an attempt to work around the coarseness
of the work steps in the dependency chain.

>
> For his concrete example: does eMMC init fail if the interval between
> pings stretches beyond 10ms? It better not; you already stated that it's
> hard to guarantee those 10ms, so there needs to be some spare room. We
> could look at the largest chunk of init process that could be
> restructured to implement cooperative multithreading on a single core
> for as many tasks as possible, to cut down on all those udelays (or
> even mdelays). Maybe we could even build a compiler plugin to ensure
> at compile time that the resulting code is proper (loops either have
> low bounds or are yielding, yield()/sched()/... aren't called within
> critical sections)...

That's a possibility, but you have to solve the case for each
combination of hardware present and/or per platform. Building up the
dependency chain is the most important piece. And from there, the goal is
to ensure execution context is not lost for longer than a set amount of
time. We're miles away from that since we're run to completion serially
right now.
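
To make that a bit more concrete, something along these lines is what I
have in mind -- purely illustrative, nothing like it exists in the tree
today:

  /* One node in the boot dependency graph. run() does a bounded chunk
     of work and returns nonzero once the task is complete. */
  struct boot_task {
          const char *name;
          int (*run)(void *arg);
          void *arg;
          struct boot_task **deps;        /* NULL-terminated prerequisites */
          int done;
  };

  /* Called from every yield point: give each runnable task one bounded
     chunk of CPU so no execution context is lost for long. */
  static void boot_tasks_poll(struct boot_task **tasks)
  {
          for (struct boot_task **t = tasks; *t; t++) {
                  struct boot_task **d;

                  if ((*t)->done)
                          continue;
                  for (d = (*t)->deps; d && *d; d++)
                          if (!(*d)->done)
                                  break;
                  if (d && *d)
                          continue;       /* unmet dependency */
                  (*t)->done = (*t)->run((*t)->arg);
          }
  }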

>
> Once we leave that scheduling to physics (ie enabling multicore
> operation), all bets are off (or we have to synchronize the execution
> to a degree that we could just as well do it manually). A lot of
> complexity just to have 8 times the CPU power for the same amount of
> IO bound tasks.
>
>
> Patrick

-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-14 Thread Zoran Stojsavljevic
Listen, Timothy,

INTEL is a very crappy FW company (we can argue here, I do agree) and SW
company (no argument at all, it is an axiom). I know that INTEL CCG
directors ordered people to watch over me, and, personally, I do NOT care.
Really, I don't. I worked for 5 years for INTEL support in Bavaria.

But let me tell you one thing. However much I do NOT (somehow) trust
INTEL FW and especially (concretely) SW (mostly a piece of junk they
produce), I would put my life in the hands of IA (INTEL Architecture)
top-notch HW designers. Blindly. Now and forever!

INTEL has an IA HW group which is The Best of The Best. I (although I
should NOT know them) know a couple of guys there. And they... Are... !

They can make what I am proposing (and not only me) happen. It is just
about The (Crappy) Politics. OK?

Zoran [OUT]

On Tue, Feb 14, 2017 at 8:45 PM, Timothy Pearson <
tpear...@raptorengineering.com> wrote:

> On 02/14/2017 01:36 PM, Zoran Stojsavljevic wrote:
> >> Where do we go from here?
> >
> > As I said (and I'll repeat, many times, if required - I do NOT care what
> > all INTEL [all their 13000+ managers] think):
> >
> > /I have another idea for INTEL SoCs/CPUs, as HW architecture
> > improvement. Why your top-notch HW guys do NOT implement MRC as part of
> > MCU. Some HW thread inside CPU/SoC should execute MCU, shouldn't it?
> > MRCs should be few K in size, and they can perfectly fit in there, thus
> > MRC should be (my take on this) part of internal CPU architecture./
>
> I highly doubt this would ever happen, unless it's yet another signed
> blob with highly privileged access.  The main problem is that memory
> initialisation is very complex, and there is a definite need to be able
> to issue updates when / if a CPU / MB / DIMM combination fails to
> function correctly.
>
> Personally, having worked on RAM initialisation for many different
> systems (embedded to server), I find it ludicrous that this can be
> considered top secret IP worthy of a closed blob.  It takes time to get
> right, but the end result is inextricably tied to the hardware in
> question and is really not much more than a hardware-specific
> implementation of the bog-standard and widely known DDR init algorithms.
>
> Intel, why the blob?  What's hiding in there?  Asian companies I know
> tend to keep things closed to avoid patent lawsuits over stolen IP, but
> I highly doubt you have this problem?
>
> Just my *personal* $0.02 here. :-)
>
> --
> Timothy Pearson
> Raptor Engineering
> +1 (415) 727-8645 (direct line)
> +1 (512) 690-0200 (switchboard)
> https://www.raptorengineering.com
-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot

Re: [coreboot] Add coreboot storage driver

2017-02-14 Thread Aaron Durbin via coreboot
On Tue, Feb 14, 2017 at 1:06 PM, Nico Huber  wrote:
> On 14.02.2017 18:56, ron minnich wrote:
>> At what point is ramstage a kernel? I think at the point we add file
>> systems or preemptive scheduling. We're getting dangerously close. If we
>> really start to cross that boundary, it's time to rethink the ramstage in
>> my view. It's not a good foundation for a kernel.
>
> Agreed. I wouldn't call it a kernel, but it really seems to be growing very
> ugly. Every time I think about this, I can scarcely find anything that needs
> to be done in ramstage. I believe even most payloads could live without
> it, with some more initialization done in romstage.
>
> Some things that I recall ramstage doing:
>
>   o MP init => maybe can be done earlier; does it generally need RAM???

You need a stack and vector for the SIPI to be somewhere.
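
(For anyone not familiar with the constraint: the SIPI vector is an 8-bit
page number, so the AP entry point has to be a real-mode trampoline below
1MiB on a 4KiB boundary, and every AP needs its own stack -- all of which
would have to live in CAR/SRAM if DRAM isn't up yet. Roughly, and purely
as an illustration:)

  #define AP_TRAMPOLINE   0x30000                 /* < 1MiB, 4KiB aligned */
  #define SIPI_VECTOR     (AP_TRAMPOLINE >> 12)   /* what goes in the ICR */
  #define AP_STACK_SIZE   1024
  #define MAX_APS         8                       /* example value */

  /* Pre-memory, this array has to come out of cache-as-RAM. */
  static uint8_t ap_stacks[MAX_APS][AP_STACK_SIZE];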

>
>   o PCI resource allocation => can be done offline
>     Just add the resources to the devicetree. If you want to boot
>     from a plugged-in card that isn't in the devicetree, the payload
>     would have to handle it, though.

This largely works with static allocation, but the question is if you
want to handle different SKUs of devices that have different hardware
behind root bridges. You need to recalculate the IO windows. You could
produce a signature of devices and leverage that for picking the right
static allocation. Doable, but gets kinda funky needing to run the
allocation pass for each configuration and ensuring it's updated
properly.
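
The signature part at least would be cheap -- something as dumb as hashing
the vendor/device IDs you can see would probably do. Sketch only (bus 0,
function 0, to keep it short, using raw 0xcf8/0xcfc config accesses):

  static uint32_t pci_cfg_read32(int bus, int dev, int fn, int reg)
  {
          outl((1u << 31) | (bus << 16) | (dev << 11) | (fn << 8) |
               (reg & 0xfc), 0xcf8);
          return inl(0xcfc);
  }

  /* FNV-1a over the IDs of everything present; index a table of
     pre-computed resource maps with the result. */
  static uint32_t pci_topology_signature(void)
  {
          uint32_t sig = 2166136261u;

          for (int dev = 0; dev < 32; dev++) {
                  uint32_t id = pci_cfg_read32(0, dev, 0, 0);
                  if (id == 0xffffffff)
                          continue;       /* no device here */
                  sig = (sig ^ id) * 16777619u;
          }
          return sig;
  }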
>
>   o Those small PCI device "drivers" => I doubt they need RAM

That's how their initialization code is currently scheduled. May not
need RAM, but I'm not sure that's what makes them distinctive nor why
you bring this up? Just a regular pci device doesn't need anything in
practice. It's the workarounds and the things that need to be done for
power optimization, etc., where the complexity arises. Using pci
device "drivers" as a proxy for all pci devices isn't representative.

>
>   o Table generation => Not that dynamic after all
> I suppose much is done with static (compile time) information.

Sure, if you go and analyze devicetree.cb to know all the options.
Tables have quite a few things that change based on runtime attributes
aside from that. For example, a single firmware build can support a
different number of SoC models that have largely different numbers of
CPUs, etc., or support different feature sets that require different
table generation.

>
>   o Sometimes gfx modesetting => do it in the payload
>
> Nico

-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-14 Thread Timothy Pearson

On 02/14/2017 01:36 PM, Zoran Stojsavljevic wrote:
>> Where do we go from here?
> 
> As I said (and I'll repeat, many times, if required - I do NOT care what
> all INTEL [all their 13000+ managers] think):
> 
> /I have another idea for INTEL SoCs/CPUs, as HW architecture
> improvement. Why your top-notch HW guys do NOT implement MRC as part of
> MCU. Some HW thread inside CPU/SoC should execute MCU, shouldn't it?
> MRCs should be few K in size, and they can perfectly fit in there, thus
> MRC should be (my take on this) part of internal CPU architecture./

I highly doubt this would ever happen, unless it's yet another signed
blob with highly privileged access.  The main problem is that memory
initialisation is very complex, and there is a definite need to be able
to issue updates when / if a CPU / MB / DIMM combination fails to
function correctly.

Personally, having worked on RAM initialisation for many different
systems (embedded to server), I find it ludicrous that this can be
considered top secret IP worthy of a closed blob.  It takes time to get
right, but the end result is inextricably tied to the hardware in
question and is really not much more than a hardware-specific
implementation of the bog-standard and widely known DDR init algorithms.
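
To illustrate what I mean by bog-standard: the heart of, say, read
training on every PHY I've worked with is some variant of the sweep
below. The only per-platform "secret" is which registers
set_read_delay() and friends actually poke (those helpers are just
placeholders here):

  /* Sweep the delay taps, find the passing window, park in the middle. */
  static int train_read_delay(int byte_lane)
  {
          int first_pass = -1, last_pass = -1;

          for (int tap = 0; tap < NUM_DELAY_TAPS; tap++) {
                  set_read_delay(byte_lane, tap);
                  write_test_pattern(byte_lane);
                  if (read_back_and_compare(byte_lane) == 0) {
                          if (first_pass < 0)
                                  first_pass = tap;
                          last_pass = tap;
                  }
          }
          if (first_pass < 0)
                  return -1;              /* no data eye found */
          set_read_delay(byte_lane, (first_pass + last_pass) / 2);
          return 0;
  }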

Intel, why the blob?  What's hiding in there?  Asian companies I know
tend to keep things closed to avoid patent lawsuits over stolen IP, but
I highly doubt you have this problem?

Just my *personal* $0.02 here. :-)

-- 
Timothy Pearson
Raptor Engineering
+1 (415) 727-8645 (direct line)
+1 (512) 690-0200 (switchboard)
https://www.raptorengineering.com

-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-14 Thread Zoran Stojsavljevic
> Where do we go from here?

As I said (and I'll repeat, many times, if required - I do NOT care what
all INTEL [all their 13000+ managers] think):

*I have another idea for INTEL SoCs/CPUs, as a HW architecture improvement.
Why do your top-notch HW guys NOT implement MRC as part of the MCU? Some HW
thread inside the CPU/SoC should execute the MCU, shouldn't it? MRCs should
be a few K in size, and they can perfectly fit in there; thus MRC should be
(my take on this) part of the internal CPU architecture.*

*Today's INTEL COREs and ATOMs have at least 100M gates, so why not add a
couple of dozen K more? Lots of problems solved, aren't they? ;-)*
*[1] The BOOT stage would be much shorter (nothing such as a CAR phase);*
*[2] The ROM stage would not exist;*
*[3] IP would be preserved in HW, so the whole INTEL FSP could actually be
(imagine the Beauty) Open Source...*

With INTEL. Here we go! Where no one has gone before! ;-)

Zoran

On Tue, Feb 14, 2017 at 6:56 PM, ron minnich  wrote:

> Just a reminder about times past. This discussion has been ongoing since
> 2000. In my view the questions come down to how much the ramstage does, how
> that impacts code complexity and performance, and when the ramstage gets so
> much capability that it ought to be a kernel.
>
> In the earliest iteration, there was no ramstage per se. What we now call
> the ramstage was a Linux kernel.
>
> We had lots of discussions in the early days with LNXI and others about
> what would boot fastest, a dedicated boot loader like etherboot or a
> general purpose kernel like Linux. In all the cases we measured at Los
> Alamos, Linux always won, easily: yes, slower to load than etherboot, more
> startup overhead, but once started Linux support for concurrency and
> parallelism always won the day. Loaders like etherboot (and its descendant,
> iPXE) spend most of their time doing nothing (as measured at the time). It
> was fun to boot 1000 nodes in the time it took PXE on one node to find a
> connected NIC.
>
> The arguments over payload ended when the FLASH sockets changed to QFP and
> maxed at 256K and Linux could no longer fit.
>
> But if your goal is fast boot, in fact if your goal is 800 milliseconds, we
> know this can work on slow ARMs with Linux, as was shown in 2006.
>
> The very first ramstage was created because Linux could not correctly
> configure a PCI bus in 2000. The core of the ramstage as we know it was the
> PCI config.
>
> We wanted to have ramstage only do PCI setup. We initially put SMP startup
> in Linux, which worked on all but K7, at which point ramstage took on SMP
> startup too. And ramstage started to grow. The growth has never stopped.
>
> At what point is ramstage a kernel? I think at the point we add file
> systems or preemptive scheduling. We're getting dangerously close. If we
> really start to cross that boundary, it's time to rethink the ramstage in
> my view. It's not a good foundation for a kernel.
>
> I've experimented with kernel-as-ramstage with harvey on the riscv and it
> worked. In this case, I manually removed the ramstage from coreboot.rom and
> replaced it with a kernel. It would be interesting, to me at least, to have
> a Kconfig option whereby we can replace the ramstage with some other ELF
> file, to aid such exploration.
>
> I also wonder if we're not at a fork in the road in some ways. There are
> open systems, like RISCV, in which we have full control and can really get
> flexibility in how we boot. We can influence the RISCV vendors not to
> implement hardware designs that have negative impact on firmware and boot
> time performance. And then there are closed systems, like x86, in which
> many opportunities for optimization are lost, and we have little
> opportunity to impact hardware design. We also can't get very smart on x86
> because the FSP boulder blocks the road.
>
> Where do we go from here?
>
> ron
>
> --
> coreboot mailing list: coreboot@coreboot.org
> https://www.coreboot.org/mailman/listinfo/coreboot
>
-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot

Re: [coreboot] Add coreboot storage driver

2017-02-14 Thread Patrick Georgi via coreboot
2017-02-14 17:12 GMT+01:00 Aaron Durbin via coreboot :
> For an optimized bootflow
> all pieces of work that need to be done pretty much need to be closely
> coupled. One needs to globally optimize the full sequence.
Like initializing slow hardware even before RAM init (as long as it's
just an initial command)?
How about using PIT/IRQ0 plus some tiny register-only interrupt
routine to do trivial register wrangling (we do have register
scripts)?
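
Roughly like this, I mean -- the 8254 programming is the standard dance,
while the IDT/PIC hookup and run_next_register_script_step() are
hand-waved placeholders:

  /* Program legacy PIT channel 0 as a rate generator at ~1kHz
     (1.193182MHz / 1193). */
  static void pit_start_1ms_tick(void)
  {
          outb(0x34, 0x43);               /* ch 0, lo/hi byte, mode 2 */
          outb(1193 & 0xff, 0x40);
          outb(1193 >> 8, 0x40);
  }

  /* IRQ0 handler: keep it register-only, e.g. one CMD1 retry or one
     register-script step per tick, then EOI the master PIC. */
  static void irq0_handler(void)
  {
          run_next_register_script_step();
          outb(0x20, 0x20);
  }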

> that we seem to be absolutely needing to
> maintain boot speeds. Is Chrome OS going against tide of coreboot
> wanting to solve those sorts of issues?
The problem is that two basic qualities collide here: speed and
simplicity. The effect is that people ask to stop a second to
reconsider the options.
MPinit and parallelism are the "go to" solution for all performance
related issues of the last 10 years, but they're not without cost.
Questioning this approach doesn't mean that we shouldn't go there
at all, just that the obvious answers might not lead to simple
solutions.

As Andrey stated elsewhere, we're far from CPU bound.

For his concrete example: does eMMC init fail if the interval between
pings stretches beyond 10ms? It better not; you already stated that it's
hard to guarantee those 10ms, so there needs to be some spare room. We
could look at the largest chunk of init process that could be
restructured to implement cooperative multithreading on a single core
for as many tasks as possible, to cut down on all those udelays (or
even mdelays). Maybe we could even build a compiler plugin to ensure
at compile time that the resulting code is proper (loops either have
low bounds or are yielding, yield()/sched()/... aren't called within
critical sections)...
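
Concretely, turning the typical pattern

  while (!(read32(status_reg) & READY))
          mdelay(1);              /* CPU does nothing useful for 1ms */

into

  while (!(read32(status_reg) & READY))
          yield();                /* run some other driver's init step */

where yield() is whatever primitive the (cooperative, single-core)
scheduler provides, and status_reg/READY stand in for some device's
status bit.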

Once we leave that scheduling to physics (ie enabling multicore
operation), all bets are off (or we have to synchronize the execution
to a degree that we could just as well do it manually). A lot of
complexity just to have 8 times the CPU power for the same amount of
IO bound tasks.


Patrick

-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-14 Thread Nico Huber
On 14.02.2017 18:56, ron minnich wrote:
> At what point is ramstage a kernel? I think at the point we add file
> systems or preemptive scheduling. We're getting dangerously close. If we
> really start to cross that boundary, it's time to rethink the ramstage in
> my view. It's not a good foundation for a kernel.

Agreed. I wouldn't call it a kernel, but it really seems to be growing very
ugly. Every time I think about this, I can scarcely find anything that needs
to be done in ramstage. I believe even most payloads could live without
it, with some more initialization done in romstage.

Some things that I recall ramstage doing:

  o MP init => maybe can be done earlier; does it generally need RAM???

  o PCI resource allocation => can be done offline
    Just add the resources to the devicetree. If you want to boot
    from a plugged-in card that isn't in the devicetree, the payload
    would have to handle it, though.

  o Those small PCI device "drivers" => I doubt they need RAM

  o Table generation => Not that dynamic after all
I suppose much is done with static (compile time) information.

  o Sometimes gfx modesetting => do it in the payload

Nico

-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-14 Thread Aaron Durbin via coreboot
On Tue, Feb 14, 2017 at 11:56 AM, ron minnich  wrote:
> Just a reminder about times past. This discussion has been ongoing since
> 2000. In my view the questions come down to how much the ramstage does, how
> that impacts code complexity and performance, and when the ramstage gets so
> much capability that it ought to be a kernel.
>
> In the earliest iteration, there was no ramstage per se. What we now call
> the ramstage was a Linux kernel.
>
> We had lots of discussions in the early days with LNXI and others about what
> would boot fastest, a dedicated boot loader like etherboot or a general
> purpose kernel like Linux. In all the cases we measured at Los Alamos, Linux
> always won, easily: yes, slower to load than etherboot, more startup
> overhead, but once started Linux support for concurrency and parallelism
> always won the day. Loaders like etherboot (and its descendant, iPXE) spend
> most of their time doing nothing (as measured at the time). It was fun to
> boot 1000 nodes in the time it took PXE on one node to find a connected NIC.
>
> The arguments over payload ended when the FLASH sockets changed to QFP and
> maxed at 256K and Linux could no longer fit.
>
> But if your goal is fast boot, in fact if your goal is 800 milliseconds, we
> know this can work on slow ARMs with Linux, as was shown in 2006.
>
> The very first ramstage was created because Linux could not correctly
> configure a PCI bus in 2000. The core of the ramstage as we know it was the
> PCI config.
>
> We wanted to have ramstage only do PCI setup. We initially put SMP startup
> in Linux, which worked on all but K7, at which point ramstage took on SMP
> startup too. And ramstage started to grow. The growth has never stopped.
>
> At what point is ramstage a kernel? I think at the point we add file systems
> or preemptive scheduling. We're getting dangerously close. If we really
> start to cross that boundary, it's time to rethink the ramstage in my view.
> It's not a good foundation for a kernel.
>
> I've experimented with kernel-as-ramstage with harvey on the riscv and it
> worked. In this case, I manually removed the ramstage from coreboot.rom and
> replaced it with a kernel. It would be interesting, to me at least, to have
> a Kconfig option whereby we can replace the ramstage with some other ELF
> file, to aid such exploration.
>
> I also wonder if we're not at a fork in the road in some ways. There are
> open systems, like RISCV, in which we have full control and can really get
> flexibility in how we boot. We can influence the RISCV vendors not to
> implement hardware designs that have negative impact on firmware and boot
> time performance. And then there are closed systems, like x86, in which many
> opportunities for optimization are lost, and we have little opportunity to
> impact hardware design. We also can't get very smart on x86 because the FSP
> boulder blocks the road.
>
> Where do we go from here?

That I'm not sure. And it does very much depend on the goals of the
project. I will say this, though. Not all architectures are the same
so comparing them apples-to-apples is impossible. With ARM punting
almost all of its initialization to ATF or the kernel it's not
surprising that coreboot's current architecture is simple and easy for
it. The work has just been pushed into other places. For some reason
Intel continually decides to place a large amount of things into the
firmware to do, but I think that decision is usually taken because it
keeps the kernel simpler. The complexity just got moved to a different
place in the stack. Coupled with the decision to hide the SoC support
into a closed off blob just makes things worse. When comparing an
Intel solution to an ARM vendor the SoC bits for bring up are much
more open and thus easier to optimize, if needed. As noted before you
can't punt things out on x86, where device visibility needs to be
configured prior to resource allocation, so there's definitely
intertwining involved in bringing up the intel SoCs. Firmware is
inherently exposed to the micro-architecture of the underlying device.
There's not a good way around that. Acting like it's not doesn't solve
that problem.

>
> ron

-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-14 Thread ron minnich
Just a reminder about times past. This discussion has been ongoing since
2000. In my view the questions come down to how much the ramstage does, how
that impacts code complexity and performance, and when the ramstage gets so
much capability that it ought to be a kernel.

In the earliest iteration, there was no ramstage per se. What we now call
the ramstage was a Linux kernel.

We had lots of discussions in the early days with LNXI and others about
what would boot fastest, a dedicated boot loader like etherboot or a
general purpose kernel like Linux. In all the cases we measured at Los
Alamos, Linux always won, easily: yes, slower to load than etherboot, more
startup overhead, but once started Linux support for concurrency and
parallelism always won the day. Loaders like etherboot (and its descendant,
iPXE) spend most of their time doing nothing (as measured at the time). It
was fun to boot 1000 nodes in the time it took PXE on one node to find a
connected NIC.

The arguments over payload ended when the FLASH sockets changed to QFP and
maxed at 256K and Linux could no longer fit.

But if your goal is fast boot, in fact if your goal is 800 milliseconds, we
know this can work on slow ARMs with Linux, as was shown in 2006.

The very first ramstage was created because Linux could not correctly
configure a PCI bus in 2000. The core of the ramstage as we know it was the
PCI config.

We wanted to have ramstage only do PCI setup. We initially put SMP startup
in Linux, which worked on all but K7, at which point ramstage took on SMP
startup too. And ramstage started to grow. The growth has never stopped.

At what point is ramstage a kernel? I think at the point we add file
systems or preemptive scheduling. We're getting dangerously close. If we
really start to cross that boundary, it's time to rethink the ramstage in
my view. It's not a good foundation for a kernel.

I've experimented with kernel-as-ramstage with harvey on the riscv and it
worked. In this case, I manually removed the ramstage from coreboot.rom and
replaced it with a kernel. It would be interesting, to me at least, to have
a Kconfig option whereby we can replace the ramstage with some other ELF
file, to aid such exploration.

I also wonder if we're not at a fork in the road in some ways. There are
open systems, like RISCV, in which we have full control and can really get
flexibility in how we boot. We can influence the RISCV vendors not to
implement hardware designs that have negative impact on firmware and boot
time performance. And then there are closed systems, like x86, in which
many opportunities for optimization are lost, and we have little
opportunity to impact hardware design. We also can't get very smart on x86
because the FSP boulder blocks the road.

Where do we go from here?

ron
-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot

Re: [coreboot] Add coreboot storage driver

2017-02-14 Thread Aaron Durbin via coreboot
On Mon, Feb 13, 2017 at 1:16 PM, Nico Huber  wrote:
> On 13.02.2017 08:19, Andrey Petrov wrote:
>> For example Apollolake is struggling to finish firmware boot with all
>> the whistles and bells (vboot, tpm and our friendly, ever-vigilant TXE)
>> under one second.
> Can you provide exhaustive figures, which part of this system's boot
> process takes how long? That would make it easier to reason about where
> "parallelism" would provide a benefit.
>
>> In order to address this problem we can do following things:
>> 1. Add scheduler, early or not
>
> Yes, but really doesn't fit into the coreboot idea, IMHO.
>
>> 2. Add early MPinit code
>
> No? um, at best very limited (by the number of threads the hardware sup-
> ports).
>
>> For [2] we have been working on prototype for Apollolake that does
>> pre-memory MPinit. We've got to a stage where we can run C code on
>> another core before DRAM is up (please do not try that at home, because
>> you'd need custom experimental ucode). However, there are many questions
>> what model to use and how to create infrastructure to run code in
>> parallel in such early stage. Shall we just add "run this (mini) stage
>> on this core" concept? Or shall we add tasklet/worklet structures that
>> would allow code to live in run and when migration to DRAM happens have
>> infrastructure take care of managing context and potentially resume it?
>> One problem is that code running with CAR needs to stop by the time
>> system is ready to tear down CAR and migrate to DRAM. We don't want to
>> delay that by waiting on such task to complete. At the same time,
>> certain task may have largely fluctuating run times so you would want to
>> continue them. It is actually may be possible just to do that, if we use
>> same address space for CAR and DRAM. But come to think of it, this is
>> just a tip of iceberg and there are packs of other issues we would need
>> to deal with.
>
> Sounds very scary, as if it would never fit, no matter how hard you
> push. If you really think we should do something in parallel across
> coreboot stages, it might be time to redesign the whole thing across
> stages.
>
> As long as there is a concept involving romstage/ramstage, we should
> keep it to one thing in romstage: getting DRAM up. If this needs a
> clumsy blob, then accept its time penalty.
>
>>
>> Does any of that make sense? Perhaps somebody thought of this before?
>> Let's see what may be other ways to deal with this challenge.
>
> 3. Design a driver architecture that doesn't suffer from io-waiting
>
> This is something I kept in mind for payloads for some time now, but it
> could also apply to later coreboot stages: Instead of busy waiting for
> i/o, a driver could yield execution until it's called again. Obviously,
> this only helps if there is more than one driver running in "parallel".
> But it scales much better than one virtual core per driver...
>
> Another idea just popped up: Performing "background" tasks in udelay()
> / mdelay() implementations ;)
>
> I guess there are many more, maybe some viable, approaches to solve it
> with only one thread of execution.
>
> Anyway, I rather see this parallelism in payloads. Another thought: If
> there is something in coreboot that really slows booting down, maybe
> that could be moved into the payload?

I don't think things are as simple as that with the current solution
for these platforms. FSP very much complicates things because the
execution context is lost on the transfer. But it's actually worse
than that because resource allocation is dependent on the presence of
PCI devices. If those disappear or appear after resource allocation
then the IO map is not so hot. Things are definitely tightly coupled,
so it's not clear to me that the answer of punting everything to a
payload makes everything better.

FWIW, I've provided the feedback on FSP and its current deficiencies.
However, FSP does allow one to ship products without having to deal
with UEFI as the firmware solution, and to ensure all the correct
hardware tuning is done, since that's the only place Intel supports
documenting/maintaining the correct initialization sequences. It's
definitely a predicament if one wants to continue shipping products on
new hardware.

>
> Nico
>
> --
> coreboot mailing list: coreboot@coreboot.org
> https://www.coreboot.org/mailman/listinfo/coreboot

-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-14 Thread Aaron Durbin via coreboot
On Mon, Feb 13, 2017 at 9:32 PM, ron minnich  wrote:
> andrey, great questions. If you're really concerned about those issues, then
> yes, maybe a space sharing solution is the right one.
>
> I really would rather not see people implementing schedulers at this point.
> If we're going to go that route, let's get a reasonable
> operating system and use it instead. If we continue on coreboot's current
> trajectory we're going to end up like every other
> firmware project that became an OS, and that to me is the wrong direction.


It's quite the predicament if we don't want to give up on boot speed.
Being heavily invested in coreboot is where we currently are -- for
better or worse (I think for the better). For an optimized bootflow
all pieces of work that need to be done pretty much need to be closely
coupled. One needs to globally optimize the full sequence. Carving
that work into granular pieces across different code bases just leaves
on the floor the performance that we seem to absolutely need in order to
maintain boot speeds. Is Chrome OS going against the tide of coreboot
wanting to solve those sorts of issues?

>
> ron
>
> On Mon, Feb 13, 2017 at 6:43 PM Andrey Petrov 
> wrote:
>>
>> Hi,
>>
>> On 02/13/2017 12:31 PM, ron minnich wrote:
>> > Another idea just popped up: Performing "background" tasks in
>> > udelay()
>> > / mdelay() implementations ;)
>> >
>> >
>> > that is adurbin's threading model. I really like it.
>> >
>> > A lot of times, concurrency will get you just as far as ||ism without
>> > the nastiness.
>>
>> But how do you guarantee code will get a slice of execution time when it
>> needs it? For example for eMMC link training you need to issue certain
>> commands with certain time interval. Lets say every 10ms. How do you
>> make sure that happens? You can keep track of time and see when next
>> piece of work needs to be scheduled, but how do you guarantee you enter
>> this udelay code often enough?
>>
>> Andrey
>
>
> --
> coreboot mailing list: coreboot@coreboot.org
> https://www.coreboot.org/mailman/listinfo/coreboot

-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-14 Thread Aaron Durbin via coreboot
On Mon, Feb 13, 2017 at 5:28 PM, Julius Werner  wrote:
> +1 for preferring a single-core concurrency model. This would be much more
> likely to be reusable for other platforms, and much simpler to maintain in
> the long run (way less platform-specific details to keep track of and figure
> out again and again for every new chipset). Your CAR problems would become
> much simpler... just make sure the scheduler structures get migrated
> together with the rest of the globals and it should work fine out of the
> box.

FWIW, there's no coherency in CAR. It's per building block of the
hardware units -- much like multiple nodes in AMD K* systems.
Migrating CAR is not necessarily a simple solution, but I'm not convinced
we need multiple cores executing with CAR as a backing store.

>
> On Mon, Feb 13, 2017 at 12:31 PM, ron minnich  wrote:
>>
>>
>>
>> On Mon, Feb 13, 2017 at 11:17 AM Nico Huber  wrote:
>>>
>>>
>>>
>>> Another idea just popped up: Performing "background" tasks in udelay()
>>> / mdelay() implementations ;)
>>
>>
>> that is adurbin's threading model. I really like it.
>>
>> A lot of times, concurrency will get you just as far as ||ism without the
>> nastiness.
>>
>> But if we're going to make a full up kernel for rom, my suggestion is we
>> could start with a real kernel, perhaps linux. We could then rename coreboot
>> to, say, LinuxBIOS.
>>
>> ron
>>
>> --
>> coreboot mailing list: coreboot@coreboot.org
>> https://www.coreboot.org/mailman/listinfo/coreboot
>
>
>
> --
> coreboot mailing list: coreboot@coreboot.org
> https://www.coreboot.org/mailman/listinfo/coreboot

-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-14 Thread Aaron Durbin via coreboot
On Mon, Feb 13, 2017 at 8:43 PM, Andrey Petrov  wrote:
> Hi,
>
> On 02/13/2017 12:31 PM, ron minnich wrote:
>>
>> Another idea just popped up: Performing "background" tasks in udelay()
>> / mdelay() implementations ;)
>>
>>
>> that is adurbin's threading model. I really like it.
>>
>> A lot of times, concurrency will get you just as far as ||ism without
>> the nastiness.
>
>
> But how do you guarantee code will get a slice of execution time when it
> needs it? For example for eMMC link training you need to issue certain
> commands with certain time interval. Lets say every 10ms. How do you make
> sure that happens? You can keep track of time and see when next piece of
> work needs to be scheduled, but how do you guarantee you enter this udelay
> code often enough?
>

You can't guarantee anything like that. You'd need compiler help for
yielding at loop and function boundaries. Or you go the interrupt
route and reschedule. Or you use the other CPUs like you already
mentioned.

The coreboot code base is not currently sympathetic to multiple threads
with full pre-emption. The threads currently provided in coreboot do
yielding in udelay calls as a safe place to reschedule. That by no
means provides any latency guarantees and since we have no concept of
work we can't ensure there's a guaranteed latency between each chunk
of work.
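
Conceptually the yielding udelay() is nothing more than this (paraphrased
from memory, not the literal code; timer_now_us() and thread_yield() stand
in for the real primitives):

  void udelay(unsigned int usecs)
  {
          uint64_t deadline = timer_now_us() + usecs;

          /* Other cooperative threads run while we wait; this degrades
             to a plain spin when nothing else is runnable. Note that
             nothing here bounds how long another thread keeps the CPU,
             which is exactly the missing latency guarantee. */
          while (timer_now_us() < deadline)
                  thread_yield();
  }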

> Andrey
>
>
> --
> coreboot mailing list: coreboot@coreboot.org
> https://www.coreboot.org/mailman/listinfo/coreboot

-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-14 Thread Aaron Durbin via coreboot
On Mon, Feb 13, 2017 at 8:05 AM, Peter Stuge  wrote:
> Andrey Petrov wrote:
>> We are considering adding early parallel code execution in coreboot.
>> We need to discuss how this can be done.
>
> No - first we need to discuss *if* this should be done.
>
>
>> Nowadays we see firmware getting more complicated.
>
> Sorry, but that's nonsense. Indeed MSFT is pushing more and more
> complicated requirements into the EFI/UEFI ecosystem, but that's
> their problem, not a universal one.
>
>
> Your colleague wants to speed up boot time by moving storage driver
> code from the payload into coreboot proper, but in fact this goes
> directly against the design goals of coreboot, so here's a refresh:
>
> * coreboot has *minimal* platform (think buses, not peripherals)
>   initialization code
>
> * A payload does everything further.


There's an inherent sequence point between coreboot and the payload.
All of coreboot needs to complete prior to handing off execution to
the payload. Everyone's boot up process differs, but boot speed is
something Chrome OS cares about very much. That's one of the reasons
coreboot has been enlisted for Chrome OS products. By maintaining that
delineation, boot speed can very much suffer. Pushing work out to
another piece of software doesn't inherently reduce the total amount
of work to be done. That's the current dilemma. Do we just throw our
hands up and say things will continue to be slower? Or do we come up
with solutions to the current problems we're seeing?

>
>
>> For example Apollolake is struggling to finish firmware boot with all
>> the whistles and bells (vboot, tpm and our friendly, ever-vigilant TXE)
>> under one second. Interestingly, great deal of tasks that needs to be
>> done are not even computation-bound. They are IO bound.
>
> How much of that time is spent in the FSP?
>
>
>> scheduler
> ..
>> how to create infrastructure to run code in parallel in such early stage
>
> I think you are going in completely the wrong direction.
>
> You want a scheduler, but that very clearly does not belong in coreboot.
>
>
>> Shall we just add "run this (mini) stage on this core" concept?
>> Or shall we add tasklet/worklet structures
>
> Neither. The correct engineering solution is very simple - adapt FSP
> to fit into coreboot, instead of trying to do things the other way
> around.
>
> This means that your scheduler lives in the payload. There is already
> precedent - SeaBIOS also already implements multitasking.
>
>
>> this is just a tip of iceberg
>
> That's exactly why it has no place within coreboot, but belongs in
> a payload.
>
>
> //Peter
>
> --
> coreboot mailing list: coreboot@coreboot.org
> https://www.coreboot.org/mailman/listinfo/coreboot

-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-14 Thread Zoran Stojsavljevic
Hello Andrey,

I found that Coreboot really implements atomic and semaphore operations?!
What for? I did not expect to find these... At all???

The ONLY reason I can see is that in some SoCs there are independent
(invisible) HW threads running that use the same resources as the BSP core
(all other cores should be waiting on some SIPI event). I do NOT see any
other reason(s) for them to be used. There is no multi-threading in
Coreboot, just a single thread executing continuously/sequentially, correct?

Why not BIOS? There are 100s of millions of PCs, notebooks etc. out there,
and these are slow with BIOS. You can argue and say: IOTG will soon have
billions of smart devices using SoCs. Valid point.

In that sense, I have another idea for INTEL SoCs/CPUs, as a HW architecture
improvement. Why do your top-notch HW guys NOT implement MRC as part of the
MCU? Some HW thread inside the CPU/SoC should execute the MCU, shouldn't it?
MRCs should be a few K in size, and they can perfectly fit in there; thus
MRC should be (my take on this) part of the internal CPU architecture.

Today's INTEL COREs and ATOMs have at least 100M gates, so why not add a
couple of dozen K more? Lots of problems solved, aren't they? ;-)
[1] The BOOT stage would be much shorter (nothing such as a CAR phase);
[2] The ROM stage would not exist;
[3] IP would be preserved in HW, so the whole INTEL FSP could actually be
(imagine the Beauty) Open Source...

Just $.02 in addition to the original $.02 (makes it a nickel - $.01). :-)

Zoran

On Mon, Feb 13, 2017 at 7:08 PM, Andrey Petrov 
wrote:

> Hi,
>
> On 02/13/2017 12:21 AM, Zoran Stojsavljevic wrote:
>
> IBVs can work on this proposal, and see how BIOS boot-up time will improve
>> (by this parallelism)
>>
>
> There is no need to wait for anybody to see real-world benefits.
>
> The original patch, where you train the eMMC link, already saves some 50ms.
> However, MP init kicks in very late. That is a limitation of the current
> approach, where MP init depends on DRAM being available. If you move MP init
> earlier, you can already get an approximately 200ms saving. On Apollolake we
> have a prototype where MP init happens in the bootblock. That already reduces
> boot time by some 200ms.
>
> Since, very soon, you'll run to shared HW resource, and then you'll need
>> to implement semaphores, atomic operations and God knows what!?
>>
>
> Fortunately, divine powers have nothing to do with it. Atomic operations
> are already implemented and spinlocks are in as well.
>
> What other major issues you see, Zoran?
>
> thanks
> Andrey
>
-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot

Re: [coreboot] Add coreboot storage driver

2017-02-13 Thread ron minnich
andrey, great questions. If you're really concerned about those issues,
then yes, maybe a space sharing solution is the right one.

I really would rather not see people implementing schedulers at this point.
If we're going to go that route, let's get a reasonable
operating system and use it instead. If we continue on coreboot's current
trajectory we're going to end up like every other
firmware project that became an OS, and that to me is the wrong direction.

ron

On Mon, Feb 13, 2017 at 6:43 PM Andrey Petrov 
wrote:

> Hi,
>
> On 02/13/2017 12:31 PM, ron minnich wrote:
> > Another idea just popped up: Performing "background" tasks in
> udelay()
> > / mdelay() implementations ;)
> >
> >
> > that is adurbin's threading model. I really like it.
> >
> > A lot of times, concurrency will get you just as far as ||ism without
> > the nastiness.
>
> But how do you guarantee code will get a slice of execution time when it
> needs it? For example for eMMC link training you need to issue certain
> commands with certain time interval. Lets say every 10ms. How do you
> make sure that happens? You can keep track of time and see when next
> piece of work needs to be scheduled, but how do you guarantee you enter
> this udelay code often enough?
>
> Andrey
>
-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot

Re: [coreboot] Add coreboot storage driver

2017-02-13 Thread Andrey Petrov

Hi,

On 02/13/2017 12:31 PM, ron minnich wrote:

Another idea just popped up: Performing "background" tasks in udelay()
/ mdelay() implementations ;)


that is adurbin's threading model. I really like it.

A lot of times, concurrency will get you just as far as ||ism without
the nastiness.


But how do you guarantee code will get a slice of execution time when it 
needs it? For example, for eMMC link training you need to issue certain 
commands at a certain time interval. Let's say every 10ms. How do you 
make sure that happens? You can keep track of time and see when the next 
piece of work needs to be scheduled, but how do you guarantee you enter 
this udelay code often enough?


Andrey

--
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-13 Thread ron minnich
I don't see the big deal here, actually. We've had a nice concurrency in
coreboot for years, it works, I've used it, what else do we need to do?

On Mon, Feb 13, 2017 at 3:39 PM Vadim Bendebury 
wrote:

> Incidentally, a few years ago Chirantan and Simon (cced) implemented
> u-boot concurrency support for an ARM SOC. I don't remember how much gain
> it was bringing, and it did not go into production as it was quite late in
> the project cycle.
>
> But they might have some experience to share.
>
> -v
>
>
> On Tue, Feb 14, 2017 at 7:28 AM, Julius Werner 
> wrote:
>
> +1 for preferring a single-core concurrency model. This would be much more
> likely to be reusable for other platforms, and much simpler to maintain in
> the long run (way less platform-specific details to keep track of and
> figure out again and again for every new chipset). You CAR problems would
> become much more simple... just make sure the scheduler structures get
> migrated together with the rest of the globals and it should work fine out
> of the box.
>
> On Mon, Feb 13, 2017 at 12:31 PM, ron minnich  wrote:
>
>
>
> On Mon, Feb 13, 2017 at 11:17 AM Nico Huber  wrote:
>
>
>
> Another idea just popped up: Performing "background" tasks in udelay()
> / mdelay() implementations ;)
>
>
> that is adurbin's threading model. I really like it.
>
> A lot of times, concurrency will get you just as far as ||ism without the
> nastiness.
>
> But if we're going to make a full up kernel for rom, my suggestion is we
> could start with a real kernel, perhaps linux. We could then rename
> coreboot to, say, LinuxBIOS.
>
> ron
>
> --
> coreboot mailing list: coreboot@coreboot.org
> https://www.coreboot.org/mailman/listinfo/coreboot
>
>
>
> --
> coreboot mailing list: coreboot@coreboot.org
> https://www.coreboot.org/mailman/listinfo/coreboot
>
>
> --
> coreboot mailing list: coreboot@coreboot.org
> https://www.coreboot.org/mailman/listinfo/coreboot
-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot

Re: [coreboot] Add coreboot storage driver

2017-02-13 Thread Vadim Bendebury
Incidentally, a few years ago Chirantan and Simon (cced) implemented u-boot
concurrency support for an ARM SOC. I don't remember how much gain it was
bringing, and it did not go into production as it was quite late in the
project cycle.

But they might have some experience to share.

-v


On Tue, Feb 14, 2017 at 7:28 AM, Julius Werner  wrote:

> +1 for preferring a single-core concurrency model. This would be much more
> likely to be reusable for other platforms, and much simpler to maintain in
> the long run (way less platform-specific details to keep track of and
> figure out again and again for every new chipset). You CAR problems would
> become much more simple... just make sure the scheduler structures get
> migrated together with the rest of the globals and it should work fine out
> of the box.
>
> On Mon, Feb 13, 2017 at 12:31 PM, ron minnich  wrote:
>
>>
>>
>> On Mon, Feb 13, 2017 at 11:17 AM Nico Huber  wrote:
>>
>>>
>>>
>>> Another idea just popped up: Performing "background" tasks in udelay()
>>> / mdelay() implementations ;)
>>>
>>
>> that is adurbin's threading model. I really like it.
>>
>> A lot of times, concurrency will get you just as far as ||ism without the
>> nastiness.
>>
>> But if we're going to make a full up kernel for rom, my suggestion is we
>> could start with a real kernel, perhaps linux. We could then rename
>> coreboot to, say, LinuxBIOS.
>>
>> ron
>>
>> --
>> coreboot mailing list: coreboot@coreboot.org
>> https://www.coreboot.org/mailman/listinfo/coreboot
>>
>
>
> --
> coreboot mailing list: coreboot@coreboot.org
> https://www.coreboot.org/mailman/listinfo/coreboot
>
-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot

Re: [coreboot] Add coreboot storage driver

2017-02-13 Thread Julius Werner
+1 for preferring a single-core concurrency model. This would be much more
likely to be reusable for other platforms, and much simpler to maintain in
the long run (way less platform-specific details to keep track of and
figure out again and again for every new chipset). Your CAR problems would
become much simpler... just make sure the scheduler structures get
migrated together with the rest of the globals and it should work fine out
of the box.

On Mon, Feb 13, 2017 at 12:31 PM, ron minnich  wrote:

>
>
> On Mon, Feb 13, 2017 at 11:17 AM Nico Huber  wrote:
>
>>
>>
>> Another idea just popped up: Performing "background" tasks in udelay()
>> / mdelay() implementations ;)
>>
>
> that is adurbin's threading model. I really like it.
>
> A lot of times, concurrency will get you just as far as ||ism without the
> nastiness.
>
> But if we're going to make a full-up kernel for ROM, my suggestion is we
> could start with a real kernel, perhaps Linux. We could then rename
> coreboot to, say, LinuxBIOS.
>
> ron
>
-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot

Re: [coreboot] Add coreboot storage driver

2017-02-13 Thread ron minnich
On Mon, Feb 13, 2017 at 11:17 AM Nico Huber  wrote:

>
>
> Another idea just popped up: Performing "background" tasks in udelay()
> / mdelay() implementations ;)
>

that is adurbin's threading model. I really like it.

A lot of times, concurrency will get you just as far as ||ism without the
nastiness.

But if we're going to make a full-up kernel for ROM, my suggestion is we
could start with a real kernel, perhaps Linux. We could then rename
coreboot to, say, LinuxBIOS.

ron
-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot

Re: [coreboot] Add coreboot storage driver

2017-02-13 Thread Nico Huber
On 13.02.2017 08:19, Andrey Petrov wrote:
> For example Apollolake is struggling to finish firmware boot with all
> the whistles and bells (vboot, tpm and our friendly, ever-vigilant TXE)
> under one second.
Can you provide exhaustive figures, which part of this system's boot
process takes how long? That would make it easier to reason about where
"parallelism" would provide a benefit.

> In order to address this problem we can do the following things:
> 1. Add scheduler, early or not

Yes, but it really doesn't fit into the coreboot idea, IMHO.

> 2. Add early MPinit code

No? um, at best very limited (by the number of threads the hardware
supports).

> For [2] we have been working on a prototype for Apollolake that does
> pre-memory MPinit. We've got to a stage where we can run C code on
> another core before DRAM is up (please do not try that at home, because
> you'd need custom experimental ucode). However, there are many questions
> about what model to use and how to create infrastructure to run code in
> parallel in such an early stage. Shall we just add a "run this (mini)
> stage on this core" concept? Or shall we add tasklet/worklet structures
> that would allow code to live on and run, and when migration to DRAM
> happens, have the infrastructure take care of managing the context and
> potentially resuming it? One problem is that code running with CAR needs
> to stop by the time the system is ready to tear down CAR and migrate to
> DRAM. We don't want to delay that by waiting on such a task to complete.
> At the same time, certain tasks may have largely fluctuating run times,
> so you would want to continue them. It may actually be possible to do
> just that, if we use the same address space for CAR and DRAM. But come
> to think of it, this is just the tip of the iceberg and there are packs
> of other issues we would need to deal with.

Sounds very scary, as if it would never fit, no matter how hard you
push. If you really think we should do something in parallel across
coreboot stages, it might be time to redesign the whole thing across
stages.

As long as there is a concept involving romstage/ramstage, we should
keep it to one thing in romstage: getting DRAM up. If this needs a
clumsy blob, then accept its time penalty.

> 
> Does any of that make sense? Perhaps somebody thought of this before?
> Let's see what may be other ways to deal with this challenge.

3. Design a driver architecture that doesn't suffer from I/O waiting

This is something I have kept in mind for payloads for some time now, but it
could also apply to later coreboot stages: instead of busy-waiting on I/O,
a driver could yield execution until it's called again. Obviously, this
only helps if there is more than one driver running in "parallel". But it
scales much better than one virtual core per driver...
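
To make that concrete, here is a minimal sketch of what I mean (all names
are made up for illustration; this is not an existing coreboot API): a
driver registers a small poll callback and returns instead of spinning,
and whoever owns the boot flow pumps all registered pollers whenever it
would otherwise sit idle.

  #include <stdbool.h>
  #include <stddef.h>

  /* Sketch: a driver turns its wait into a poll callback instead of
   * busy-waiting, so other drivers can make progress in between. */
  struct poller {
      bool (*poll)(void *ctx);  /* returns true once the device is ready */
      void *ctx;
      bool done;
  };

  #define MAX_POLLERS 8
  static struct poller pollers[MAX_POLLERS];
  static size_t num_pollers;

  static void poller_register(bool (*poll)(void *), void *ctx)
  {
      if (num_pollers < MAX_POLLERS)
          pollers[num_pollers++] = (struct poller){ poll, ctx, false };
  }

  /* Called from the boot flow wherever we would otherwise spin. */
  static void pollers_run_once(void)
  {
      for (size_t i = 0; i < num_pollers; i++)
          if (!pollers[i].done)
              pollers[i].done = pollers[i].poll(pollers[i].ctx);
  }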

Another idea just popped up: Performing "background" tasks in udelay()
/ mdelay() implementations ;)
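
A sketch of that udelay() variant too, reusing the poller list from above
(timer_us() is an assumed monotonic microsecond timer, not a specific
coreboot function; real code would also have to guard against re-entrancy,
i.e. a poller itself calling udelay()):

  #include <stdint.h>

  extern uint64_t timer_us(void);  /* assumed monotonic microsecond timer */

  /* Sketch: burn the requested delay, but donate the otherwise wasted
   * cycles to whatever background pollers are registered. */
  void udelay(unsigned int usecs)
  {
      uint64_t start = timer_us();

      while (timer_us() - start < usecs)
          pollers_run_once();
  }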

I guess there are many more, maybe some viable, approaches to solve it
with only one thread of execution.

Anyway, I'd rather see this parallelism in payloads. Another thought: If
there is something in coreboot that really slows booting down, maybe
that could be moved into the payload?

Nico

-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-13 Thread Patrick Georgi via coreboot
2017-02-13 8:19 GMT+01:00 Andrey Petrov :
> tl;dr:
> We are considering adding early parallel code execution in coreboot. We need
> to discuss how this can be done.
It's reasonable to discuss the "if" first.

> Nowadays we see firmware getting more complicated.
The coreboot mantra isn't just "boot fast", but also "boot simple".

On your "scheduler or MPinit" question, _if_ we have to go down that route:
I'd prefer a cooperative threaded single core scheduler, for one
simple reason: it's easier to reason about the correctness of code
that only ever ceases control at well-defined yield points. As you
said, those tasks are not CPU bound.
We also don't need experimental ucode for that even when running
threads in CAR ;-)
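
A toy example of why that reasoning is easier (yield() and device_ready()
are placeholders for whatever the cooperative scheduler and the driver
would provide, not existing coreboot calls):

  extern void yield(void);        /* hand control to the cooperative scheduler */
  extern int device_ready(void);  /* poll some device status register */

  static int devices_initialized; /* state shared between cooperative threads */

  static void init_one_device(void)
  {
      while (!device_ready())
          yield();                /* let other threads run while we wait */

      /* Nothing else can run between here and the next yield(), so this
       * read-modify-write needs no lock and no atomics. */
      devices_initialized++;
  }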


Patrick

-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-13 Thread Andrey Petrov

Hi,

On 02/13/2017 10:22 AM, Timothy Pearson wrote:




For [2] we have been working on a prototype for Apollolake that does
pre-memory MPinit. We've got to a stage where we can run C code on
another core before DRAM is up (please do not try that at home, because
you'd need custom experimental ucode).


In addition to the very valid points raised by others on this list, this
note in particular is concerning.  Whenever we start talking about
microcode, we're talking about yet another magic black box that coreboot
has no control over and cannot maintain.  Adding global functionality
that is so system specific in practice as to rely on microcode feature
support is not something I ever want to see, unless perhaps the relevant
portions of the microcode are open and maintainable by the coreboot project.


I am just talking about BIOS shadowing. This is a pretty standard
feature, it's just that not every SoC implements it by default. Naturally, we
would only be adding new code if it became publicly available. I believe
shadowing works on many existing CPUs, so no, it is not "use this custom
NDA-only ucode" to get the stuff working.


Andrey

--
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-13 Thread ron minnich
What you're asking for is a parallelized or multicore coreboot IIUC.

We've done this before. I believe it was yhlu who implemented the multicore
DRAM startup on K8 ca. 2005 or so. I implemented a proof-of-concept
multi-core capability in coreboot in 2012. It was dead simple and based on
work we did in the NIX kernel, a very basic fork/join model. Instead of
halting after SMP startup, APs entered a state where they waited for
work. It worked. It was not well received at the time. Maybe it's time to
take a look at it again.

For your CAR case, all cores would have to finish before you moved into the
DRAM stage. Is that really a problem? Based on your note, I don't think
you need such a complex model as found in Linux with tasklets and
schedulers and such. A simple space-shared model ought to be sufficient.
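
Roughly, the fork/join model looked like this (purely illustrative sketch,
not the actual 2012 code; real code also needs proper memory barriers):

  #include <stdbool.h>

  /* Each AP spins on a per-CPU mailbox instead of halting after SMP
   * startup; the BSP "forks" by posting a function pointer and "joins"
   * by waiting for the done flag. */
  struct ap_mailbox {
      void (*volatile work)(void *);
      void *volatile arg;
      volatile bool done;
  };

  #define MAX_APS 4
  static struct ap_mailbox mailbox[MAX_APS];

  static void ap_main(unsigned int cpu)  /* AP entry point after SMP startup */
  {
      struct ap_mailbox *mb = &mailbox[cpu];

      for (;;) {
          while (!mb->work)
              ;                          /* idle until the BSP posts work */
          mb->work(mb->arg);
          mb->work = NULL;
          mb->done = true;               /* signal completion to the BSP */
      }
  }

  static void run_on_ap(unsigned int cpu, void (*fn)(void *), void *arg)
  {
      mailbox[cpu].done = false;
      mailbox[cpu].arg = arg;
      mailbox[cpu].work = fn;            /* posting the pointer is the "fork" */
  }

  static void wait_for_ap(unsigned int cpu)
  {
      while (!mailbox[cpu].done)
          ;                              /* the "join", e.g. before CAR teardown */
  }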

Further, adurbin's concurrency (thread) model is a very nice API.

ron
-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot

Re: [coreboot] Add coreboot storage driver

2017-02-13 Thread Timothy Pearson

On 02/13/2017 01:19 AM, Andrey Petrov wrote:

> For [2] we have been working on a prototype for Apollolake that does
> pre-memory MPinit. We've got to a stage where we can run C code on
> another core before DRAM is up (please do not try that at home, because
> you'd need custom experimental ucode).

In addition to the very valid points raised by others on this list, this
note in particular is concerning.  Whenever we start talking about
microcode, we're talking about yet another magic black box that coreboot
has no control over and cannot maintain.  Adding global functionality
that is so system specific in practice as to rely on microcode feature
support is not something I ever want to see, unless perhaps the relevant
portions of the microcode are open and maintainable by the coreboot project.

In a nutshell, this proposal would make it even harder for any low-level
coreboot development on these systems to take place outside of Intel,
and as one of the main coreboot contractors this "soft lockdown" is
something we are strongly opposed to.  Furthermore, I suggest looking at
the AMD K8 memory init code -- some basic parallelism was introduced for
memory clear, but in the end the improved boot speed was not a "killer
feature" and had the side effect of making the code difficult to
maintain, leaving the K8 support permanently broken as of this writing.

-- 
Timothy Pearson
Raptor Engineering
+1 (415) 727-8645 (direct line)
+1 (512) 690-0200 (switchboard)
https://www.raptorengineering.com

-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-13 Thread Andrey Petrov

Hi,

On 02/13/2017 12:21 AM, Zoran Stojsavljevic wrote:


IBVs can work on this proposal, and see how BIOS boot-up time will improve (by 
this parallelism)


There is no need to wait for anybody to see real-world benefits.

The original patch, where you train the eMMC link, already saves some 50ms.
However, MPinit kicks in very late. That is a limitation of the current
approach, where MPinit depends on DRAM being available. If you move
MPinit earlier, you can already get approximately 200ms of savings. On
Apollolake we have a prototype where MPinit happens in the bootblock. That
already reduces boot time by some 200ms.



Since, very soon, you'll run into shared HW resources, and then you'll need
to implement semaphores, atomic operations and God knows what!?


Fortunately, divine powers have nothing to do with it. Atomic operations 
are already implemented and spinlocks are in as well.
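
For the shared-resource concern specifically, the pattern is the usual one.
A rough sketch (the hand-rolled lock is for illustration only; real code
would use coreboot's existing spinlock primitives; printk() is coreboot's
console logger):

  #include <console/console.h>

  static int console_lock;  /* 0 = free, 1 = held */

  static void lock(int *l)
  {
      while (__sync_lock_test_and_set(l, 1))
          ;                 /* spin until the other core releases it */
  }

  static void unlock(int *l)
  {
      __sync_lock_release(l);
  }

  static void ap_log(const char *msg)
  {
      lock(&console_lock);  /* serialize console access with the BSP */
      printk(BIOS_DEBUG, "%s\n", msg);
      unlock(&console_lock);
  }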


What other major issues do you see, Zoran?

thanks
Andrey

--
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-13 Thread Andrey Petrov

Hi,

On 02/13/2017 06:05 AM, Peter Stuge wrote:

Andrey Petrov wrote:



Nowadays we see firmware getting more complicated.


Sorry, but that's nonsense. Indeed MSFT is pushing more and more
complicated requirements into the EFI/UEFI ecosystem, but that's
their problem, not a universal one.


I wish it were only MSFT. Chrome systems do a lot of work early on that
is CPU intensive, and there is waiting on secure hardware as well. Then
there is the IO problem that the original patch tries to address.



Your colleague wants to speed up boot time by moving storage driver
code from the payload into coreboot proper, but in fact this goes
directly against the design goals of coreboot, so here's a refresh:

* coreboot has *minimal* platform (think buses, not peripherals)
  initialization code

* A payload does everything further.


This is a nice and clean design, no doubt about it. However, it is serial.

Another design goal of coreboot is to be fast. Do "be fast" and "be 
parallel" conflict?



For example Apollolake is struggling to finish firmware boot with all
the whistles and bells (vboot, tpm and our friendly, ever-vigilant TXE)
under one second. Interestingly, a great deal of the tasks that need to
be done are not even computation-bound. They are IO bound.


How much of that time is spent in the FSP?


FSP is about 250ms grand total. However, that is not all that great if
you compare it to the IO needed to load the kernel over SDHCI (130ms) and
to initialize the eMMC device itself (100-300ms). Not to mention other
IO-bound tasks that could very well be started in parallel early.



how to create infrastructure to run code in parallel in such an early stage


I think you are going in completely the wrong direction.

You want a scheduler, but that very clearly does not belong in coreboot.


Actually I am just interested in getting things to boot faster. It can 
be scheduling or parallel execution on secondary HW threads.



Shall we just add a "run this (mini) stage on this core" concept?
Or shall we add tasklet/worklet structures


Neither. The correct engineering solution is very simple - adapt FSP
to fit into coreboot, instead of trying to do things the other way
around.


FSP definitely needs a lot of love to be more usable, I couldn't agree
more. But if hardware needs to be waited on and your initialization process
is serial, you will end up wasting time on polling while you could be
doing something else.



This means that your scheduler lives in the payload. There is already
precedent - SeaBIOS also already implements multitasking.


Unfortunately it is way too late to even make a dent in overall boot time.

Andrey

--
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-13 Thread Peter Stuge
Andrey Petrov wrote:
> We are considering adding early parallel code execution in coreboot.
> We need to discuss how this can be done.

No - first we need to discuss *if* this should be done.


> Nowadays we see firmware getting more complicated.

Sorry, but that's nonsense. Indeed MSFT is pushing more and more
complicated requirements into the EFI/UEFI ecosystem, but that's
their problem, not a universal one.


Your colleague wants to speed up boot time by moving storage driver
code from the payload into coreboot proper, but in fact this goes
directly against the design goals of coreboot, so here's a refresh:

* coreboot has *minimal* platform (think buses, not peripherals)
  initialization code

* A payload does everything further.


> For example Apollolake is struggling to finish firmware boot with all
> the whistles and bells (vboot, tpm and our friendly, ever-vigilant TXE)
> under one second. Interestingly, a great deal of the tasks that need to
> be done are not even computation-bound. They are IO bound.

How much of that time is spent in the FSP?


> scheduler
..
> how to create infrastructure to run code in parallel in such an early stage

I think you are going in completely the wrong direction.

You want a scheduler, but that very clearly does not belong in coreboot.


> Shall we just add a "run this (mini) stage on this core" concept?
> Or shall we add tasklet/worklet structures 

Neither. The correct engineering solution is very simple - adapt FSP
to fit into coreboot, instead of trying to do things the other way
around.

This means that your scheduler lives in the payload. There is already
precedent - SeaBIOS also already implements multitasking.


> this is just the tip of the iceberg

That's exactly why it has no place within coreboot, but belongs in
a payload.


//Peter

-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot


Re: [coreboot] Add coreboot storage driver

2017-02-13 Thread Zoran Stojsavljevic
Hello Andrey,

> Does any of that make sense? Perhaps somebody thought of this before?
> Let's see what may be other ways to deal with this challenge.

No, it does not. What you are proposing, in fact, is to turn the boot loader
into a quasi HW-multithreading OS (by adding a scheduler) sans MMU (actually
dealing with two HW threads, for now). And you have chosen coreboot to
implement this.

I will suggest that what you are proposing first be done in a true BIOS, so
IBVs can work on this proposal, and see how BIOS boot-up time will improve
(by this parallelism). Besides, BIOS is much slower (UEFI BIOSes boot in
the range of 30 seconds) and should be faster. And... BIOS is closed
source, thus there is a major business task which would go with it: project
management and a few million USD to be spent on this project, paid for by
Intel and Intel BIOS vendors. ;-)

Besides, one never knows what the next challenge will be (repeating your
words: *"...this is just the tip of the iceberg and there are packs of
other issues we would need to deal with."*).

Since, very soon, you'll run into shared HW resources, and then you'll need
to implement semaphores, atomic operations and God knows what!?

My two cents of thinking (after all, this is only me, Zoran, an independent
self-contributor),
Zoran

On Mon, Feb 13, 2017 at 8:19 AM, Andrey Petrov 
wrote:

> Hi there,
>
> tl;dr:
> We are considering adding early parallel code execution in coreboot. We
> need to discuss how this can be done.
>
> Nowadays we see firmware getting more complicated. At the same time CPUs
> do not necessarily catch up. Furthermore, recent increases in performance
> can be largely attributed to parallelism and stuffing more cores on die
> rather than sheer core computing power. However, firmware typically runs on
> just one CPU and is effectively barred from all the parallelism goodies
> available to OS software.
>
> For example Apollolake is struggling to finish firmware boot with all the
> whistles and bells (vboot, tpm and our friendly, ever-vigilant TXE) under
> one second. Interestingly, a great deal of the tasks that need to be done
> are not even computation-bound. They are IO bound. In the SDHCI case below,
> it is possible to train the eMMC link to switch from the default
> low-frequency single data rate (sdr50) mode to the high-frequency dual data
> rate (hs400) mode. This link training increases eMMC throughput by a factor
> of 15-20. As a result, the time it takes to load the kernel in depthcharge
> goes down from 130ms to 10ms. However, the training sequence requires
> constant and frequent CPU attention. As a result, it doesn't make any sense
> to try to turn on higher-frequency modes, because you don't get any net
> win. We also experimented with starting this work in the current MPinit
> code. Unfortunately it starts pretty late in the game and we do not have
> enough parallel time to reap a meaningful benefit.
>
> In order to address this problem we can do the following things:
> 1. Add scheduler, early or not
> 2. Add early MPinit code
>
> For [1] I am aware of one scheduler discussion in 2013, but that was a long
> time ago and things may have moved a bit. I do not want to be a necromancer
> and reanimate an old discussion, but does anybody see it as a useful/viable
> thing to do?
>
> For [2] we have been working on a prototype for Apollolake that does
> pre-memory MPinit. We've got to a stage where we can run C code on
> another core before DRAM is up (please do not try that at home, because
> you'd need custom experimental ucode). However, there are many questions
> about what model to use and how to create infrastructure to run code in
> parallel in such an early stage. Shall we just add a "run this (mini)
> stage on this core" concept? Or shall we add tasklet/worklet structures
> that would allow code to live on and run, and when migration to DRAM
> happens, have the infrastructure take care of managing the context and
> potentially resuming it? One problem is that code running with CAR needs
> to stop by the time the system is ready to tear down CAR and migrate to
> DRAM. We don't want to delay that by waiting on such a task to complete.
> At the same time, certain tasks may have largely fluctuating run times,
> so you would want to continue them. It may actually be possible to do
> just that, if we use the same address space for CAR and DRAM. But come
> to think of it, this is just the tip of the iceberg and there are packs
> of other issues we would need to deal with.
>
> Does any of that make sense? Perhaps somebody thought of this before?
> Let's see what may be other ways to deal with this challenge.
>
> thanks
> Andrey
>
>
>
> On 01/25/2017 03:16 PM, Guvendik, Bora wrote:
>
>> Port sdhci and mmc driver from depthcharge to coreboot. The purpose is
>> to speed up boot time by starting storage initialization on another CPU
>> in parallel. On the Apollolake systems we checked, we found that the CPU
>> can take up to 300ms sending CMD1s to HW, so we can avoid this delay by
>> parallelizing.
>>
>> - Why not add this 

Re: [coreboot] Add coreboot storage driver

2017-02-12 Thread Andrey Petrov

Hi there,

tl;dr:
We are considering adding early parallel code execution in coreboot. We 
need to discuss how this can be done.


Nowadays we see firmware getting more complicated. At the same time CPUs 
do not necessarily catch up. Furthermore, recent increases in 
performance can be largely attributed to parallelism and stuffing more 
cores on die rather than sheer core computing power. However, firmware 
typically runs on just one CPU and is effectively barred from all the 
parallelism goodies available to OS software.


For example Apollolake is struggling to finish firmware boot with all
the whistles and bells (vboot, tpm and our friendly, ever-vigilant TXE)
under one second. Interestingly, a great deal of the tasks that need to
be done are not even computation-bound. They are IO bound. In the SDHCI
case below, it is possible to train the eMMC link to switch from the
default low-frequency single data rate (sdr50) mode to the high-frequency
dual data rate (hs400) mode. This link training increases eMMC throughput
by a factor of 15-20. As a result, the time it takes to load the kernel
in depthcharge goes down from 130ms to 10ms. However, the training
sequence requires constant and frequent CPU attention. As a result, it
doesn't make any sense to try to turn on higher-frequency modes, because
you don't get any net win. We also experimented with starting this work
in the current MPinit code. Unfortunately it starts pretty late in the
game and we do not have enough parallel time to reap a meaningful benefit.
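
As a concrete example of the kind of IO-bound wait involved, eMMC bring-up
is essentially the CMD1 loop below (sketch only; emmc_send_cmd1() is a
stand-in for the actual controller access, and the exact OCR bit
definitions should be taken from JESD84-B51):

  #include <stdbool.h>
  #include <stdint.h>

  #define OCR_POWER_UP_DONE (1u << 31)  /* OCR bit reporting power-up complete */

  extern uint32_t emmc_send_cmd1(uint32_t ocr_arg);  /* stand-in */
  extern void udelay(unsigned int usecs);

  /* JESD84-B51 6.4.3: keep repeating CMD1 until the device reports that
   * its power-up/reset procedure has finished. This is where the
   * 100-300ms goes, and the CPU has to babysit it the whole time. */
  static bool emmc_wait_ready(uint32_t ocr_arg, unsigned int timeout_ms)
  {
      for (unsigned int elapsed = 0; elapsed < timeout_ms; elapsed++) {
          if (emmc_send_cmd1(ocr_arg) & OCR_POWER_UP_DONE)
              return true;
          udelay(1000);                  /* poll roughly once per millisecond */
      }
      return false;                      /* device never became ready */
  }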


In order to address this problem we can do the following things:
1. Add scheduler, early or not
2. Add early MPinit code

For [1] I am aware of one scheduler discussion in 2013, but that was a
long time ago and things may have moved a bit. I do not want to be a
necromancer and reanimate an old discussion, but does anybody see it as a
useful/viable thing to do?


For [2] we have been working on a prototype for Apollolake that does
pre-memory MPinit. We've got to a stage where we can run C code on
another core before DRAM is up (please do not try that at home, because
you'd need custom experimental ucode). However, there are many questions
about what model to use and how to create infrastructure to run code in
parallel in such an early stage. Shall we just add a "run this (mini)
stage on this core" concept? Or shall we add tasklet/worklet structures
that would allow code to live on and run, and when migration to DRAM
happens, have the infrastructure take care of managing the context and
potentially resuming it? One problem is that code running with CAR needs
to stop by the time the system is ready to tear down CAR and migrate to
DRAM. We don't want to delay that by waiting on such a task to complete.
At the same time, certain tasks may have largely fluctuating run times,
so you would want to continue them. It may actually be possible to do
just that, if we use the same address space for CAR and DRAM. But come
to think of it, this is just the tip of the iceberg and there are packs
of other issues we would need to deal with.


Does any of that make sense? Perhaps somebody thought of this before? 
Let's see what may be other ways to deal with this challenge.


thanks
Andrey


On 01/25/2017 03:16 PM, Guvendik, Bora wrote:

Port sdhci and mmc driver from depthcharge to coreboot. The purpose is
to speed up boot time by starting storage initialization on another CPU
in parallel. On the Apollolake systems we checked, we found that the CPU
can take up to 300ms sending CMD1s to HW, so we can avoid this delay by
parallelizing.

- Why not add this parallelization in the payload instead?

  There is potentially more time to parallelize things in coreboot.
  Payload execution is much faster, so we don't get much parallel
  execution time.

- Why not send CMD1 once in coreboot to trigger power-up and let HW
  initialize using only 1 cpu?

  Jedec spec requires the CPU to keep sending CMD1s when the hardware is
  busy (section 6.4.3). We tested with real-world hardware and it indeed
  didn't work with a single CMD1.

- Why did you port the driver from depthcharge?

  I wanted to use a driver that is proven to avoid bugs. It is also
  easier to apply patches back and forth.



https://review.coreboot.org/#/c/18105



Thanks

Bora







--
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot