Re: [coreboot] Add coreboot storage driver
> So what we can see is that everything is serial and there is a great deal of
> waiting. For that specific SDHCI case you can see "Storage device
> initialization" that is happening in depthcharge. That is CMD1 that you need
> to keep on sending to the controller. As you can see, it completes in 130ms.
> Unfortunately you really can't just send CMD1 and go about your business.
> You need to poll readiness status and keep on sending CMD1 again and again.
> Also, it is not always 130ms. It tends to vary, and the worst case we've
> seen was over 300ms.

Do you actually have an eMMC part that requires repeating CMD1 within a certain bounded time interval? What happens if you violate that? Does it just not progress initialization, or does it actually fail in some way? I can't find any official documentation suggesting that this is really required. JESD84-B51 just says (6.4.3):

"The busy bit in the CMD1 response can be used by a device to tell the host that it is still working on its power-up/reset procedure (e.g., downloading the register information from memory field) and is not ready yet for communication. In this case the host must repeat CMD1 until the busy bit is cleared."

This suggests that the only point of the command is polling for readiness.

> Another one is "kernel read", which is pure IO and takes 132ms. If you
> invest some 300ms in training the link (has to happen on every boot on
> every board) to HS400 you can read it in just 10ms. Naturally you can't see
> HS400 in the picture because enabling it late in the boot flow would be
> counterproductive.

Have you considered implementing HS400-ES (enhanced strobe) support in your host controllers? That feature allows you to run at HS400 speeds immediately without any tuning (by essentially turning the clock master around and having the device pulse its own clock when it's sending data, IIRC). We've had great success improving boot speed with that on a different Chrome OS platform.

This won't help you for your current generation of SoCs yet, but at least it should resolve the tuning issue in the long run as this feature becomes more standard (so this issue shouldn't actually get worse and worse in the future... it should go away again).

-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot
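[Editorial note: the JESD84-B51 polling described above boils down to a loop like the following sketch. The device side is simulated here (`polls_until_ready`); a real driver would issue CMD1 through its SDHCI controller and use a millisecond timer rather than a bare iteration count. `emmc_send_cmd1` and the constants are hypothetical names, not actual coreboot/depthcharge API.]

```c
#include <stdbool.h>
#include <stdint.h>

#define OCR_POWER_UP_DONE (1u << 31) /* OCR[31]: 0 while busy, 1 once ready */
#define CMD1_MAX_POLLS    1000       /* real code bounds this by time, not count */

/* Simulated device: reports busy for the first few polls. A real driver
 * would send CMD1 over the host controller and parse the OCR response. */
static int polls_until_ready = 13;

static uint32_t emmc_send_cmd1(void)
{
	if (polls_until_ready > 0) {
		polls_until_ready--;
		return 0;                /* still working on power-up/reset */
	}
	return OCR_POWER_UP_DONE;        /* device ready */
}

/* Repeat CMD1 until the device reports ready, per JESD84-B51 6.4.3. */
static bool emmc_wait_ready(void)
{
	for (int i = 0; i < CMD1_MAX_POLLS; i++) {
		if (emmc_send_cmd1() & OCR_POWER_UP_DONE)
			return true;
		/* real code: udelay(...) here, or yield to other init work */
	}
	return false;
}
```

Nothing in the loop depends on a bounded inter-poll interval; the host just keeps asking, which matches the reading that CMD1 is purely a readiness poll.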
Re: [coreboot] Add coreboot storage driver
Hi,

On 02/13/2017 11:16 AM, Nico Huber wrote:
> On 13.02.2017 08:19, Andrey Petrov wrote:
>> For example Apollolake is struggling to finish firmware boot with all the
>> whistles and bells (vboot, tpm and our friendly, ever-vigilant TXE) under
>> one second.
> Can you provide exhaustive figures, which part of this system's boot
> process takes how long? That would make it easier to reason about where
> "parallelism" would provide a benefit.

Such data is available. Here is a boot chart I drew a few months back: http://imgur.com/a/huyPQ I color-coded different work types (some blocks are coded incorrectly, please bear with me).

So what we can see is that everything is serial and there is a great deal of waiting. For that specific SDHCI case you can see "Storage device initialization" that is happening in depthcharge. That is CMD1 that you need to keep on sending to the controller. As you can see, it completes in 130ms. Unfortunately you really can't just send CMD1 and go about your business. You need to poll readiness status and keep on sending CMD1 again and again. Also, it is not always 130ms. It tends to vary, and the worst case we've seen was over 300ms.

Another one is "kernel read", which is pure IO and takes 132ms. If you invest some 300ms in training the link (has to happen on every boot on every board) to HS400 you can read it in just 10ms. Naturally you can't see HS400 in the picture because enabling it late in the boot flow would be counterproductive.

That's essentially the motivation for why we are looking into starting this CMD1 and HS400 link training as early as possible. However, fixing this particular issue is just a "per-platform" fix. I was hoping we could come up with a model that adds parallelism as a generic reusable feature, not just a quick hack.

Andrey
Re: [coreboot] Add coreboot storage driver
Hi All,

We started looking at doing things in parallel to speed up the boot process and meet the ChromeOS boot time requirements. One of the larger portions of boot time is memory initialization, which is why we are considering doing parallelism early.

On Chromebooks, the Intel boot path is using bootblock/verstage to determine which version of romstage should run. Memory initialization is being done during romstage, which is now replaceable in the field.

One approach to parallelism is to use additional processors. This approach requires that the cores start in the bootblock and transition to romstage to perform work in parallel with memory initialization. While this approach has a number of issues, it is a path that might work with the existing FSP architecture. Other single-thread approaches are also possible but most likely require changes to the FSP architecture to do work in parallel with memory initialization.

This thread has discussed multiple alternatives to achieve parallelism. At this time we are not considering any type of preemptive mechanism. Currently we are investigating alternatives and what benefits they bring. If our investigation indicates that the parallelism significantly reduces the boot time and that the code is easy to develop and understand, then we will share the patches with the coreboot community for further review and comment. Until then we welcome constructive ways to enable coreboot to do things in parallel to reduce the boot time.

Thanks for your help,

Lee Leahy
(425) 881-4919
Intel Corporation
Suite 125
2700 - 156th Ave NE
Bellevue, WA 98007-6554

-----Original Message-----
From: coreboot [mailto:coreboot-boun...@coreboot.org] On Behalf Of Nico Huber
Sent: Tuesday, February 14, 2017 11:07 AM
To: ron minnich <rminn...@gmail.com>; Aaron Durbin <adur...@google.com>
Cc: Petrov, Andrey <andrey.pet...@intel.com>; Coreboot <coreboot@coreboot.org>
Subject: Re: [coreboot] Add coreboot storage driver

On 14.02.2017 18:56, ron minnich wrote:
> At what point is ramstage a kernel? I think at the point we add file
> systems or preemptive scheduling. We're getting dangerously close. If
> we really start to cross that boundary, it's time to rethink the
> ramstage in my view. It's not a good foundation for a kernel.

Agreed. I wouldn't call it a kernel, but it really seems to grow very ugly. Every time I think about this, I scarcely find anything that needs to be done in ramstage. I believe even most payloads could live without it with some more initialization done in romstage.

Some things that I recall, what ramstage does:

o MP init => maybe can be done earlier, does it need RAM generally???
o PCI resource allocation => can be done offline
  Just add the resources to the devicetree. If you want to boot from a
  plugged card that isn't in the devicetree, the payload would have to
  handle it though.
o Those small PCI device "drivers" => I doubt they need RAM
o Table generation => Not that dynamic after all
  I suppose much is done with static (compile time) information.
o Sometimes gfx modesetting => do it in the payload

Nico
Re: [coreboot] Add coreboot storage driver
On Tue, Feb 14, 2017 at 1:07 PM, Patrick Georgi wrote:
> 2017-02-14 17:12 GMT+01:00 Aaron Durbin via coreboot:
>> For an optimized bootflow all pieces of work that need to be done pretty
>> much need to be closely coupled. One needs to globally optimize the full
>> sequence.
> Like initializing slow hardware even before RAM init (as long as it's
> just an initial command)? How about using PIT/IRQ0 plus some tiny
> register-only interrupt routine to do trivial register wrangling (we do
> have register scripts)?

I don't think I properly understand your suggestion. For this particular eMMC case, are you suggesting taking the PIT interrupt and doing the next piece of work in it?

>> that we seem to be absolutely needing to maintain boot speeds. Is Chrome
>> OS going against the tide of coreboot wanting to solve those sorts of
>> issues?
> The problem is that two basic qualities collide here: speed and
> simplicity. The effect is that people ask to stop a second to reconsider
> the options. MPinit and parallelism are the "go to" solution for all
> performance-related issues of the last 10 years, but they're not without
> cost. Questioning this approach doesn't mean that we shouldn't go there
> at all, just that the obvious answers might not lead to simple solutions.
>
> As Andrey stated elsewhere, we're far from CPU bound.

Agreed. But our chunking of work is very coarsely sectioned up. I think the other-CPU path is an attempt to work around the coarseness of the work steps in the dependency chain.

> For his concrete example: does eMMC init fail if you ping it more often
> than every 10ms? It better not, you already stated that it's hard to
> guarantee those 10ms, so there needs to be some spare room. We could look
> at the largest chunk of the init process that could be restructured to
> implement cooperative multithreading on a single core for as many tasks
> as possible, to cut down on all those udelays (or even mdelays). Maybe we
> could even build a compiler plugin to ensure at compile time that the
> resulting code is proper (loops either have low bounds or are yielding,
> yield()/sched()/... aren't called within critical sections)...

That's a possibility, but you have to solve the case for each combination of hardware present and/or per platform. Building up the dependency chain is the most important piece. And from there, to ensure execution context is not lost for longer than a set amount of time. We're miles away from that since we run to completion serially right now.

> Once we leave that scheduling to physics (i.e. enabling multicore
> operation), all bets are off (or we have to synchronize the execution to
> a degree that we could just as well do it manually). A lot of complexity
> just to have 8 times the CPU power for the same amount of IO-bound tasks.
>
> Patrick
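[Editorial note: the cooperative-multithreading idea debated above — turning `udelay()` spins into yields on a single core — can be sketched as a round-robin over stackless tasks. This is a minimal illustration, not coreboot's actual threading code; every name here is made up.]

```c
#include <stdbool.h>
#include <stddef.h>

/* A task returns true when it still has work left (i.e. it "yielded"),
 * false once finished. State lives in the task struct, not on a stack. */
typedef bool (*task_fn)(void *state);

struct task {
	task_fn fn;
	void *state;
	bool done;
};

/* Round-robin over tasks until all are done: each udelay() in a
 * traditional serial driver becomes a "return true" (yield) here,
 * so other init work runs instead of busy-waiting. */
static void run_all(struct task *tasks, size_t n)
{
	bool pending = true;
	while (pending) {
		pending = false;
		for (size_t i = 0; i < n; i++) {
			if (tasks[i].done)
				continue;
			if (tasks[i].fn(tasks[i].state))
				pending = true;  /* yielded; revisit next pass */
			else
				tasks[i].done = true;
		}
	}
}

/* Example task: counts down, yielding between steps — a stand-in for
 * repeatedly polling a not-yet-ready device. */
static bool countdown(void *state)
{
	int *remaining = state;
	if (*remaining == 0)
		return false;
	(*remaining)--;
	return true;
}
```

Because the scheduler only switches at yield points, there are no preemption or locking concerns — which is exactly the simplicity argument being made against going multicore.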
Re: [coreboot] Add coreboot storage driver
Listen Timothy, INTEL is a FW (we can argue here, I do agree) and SW (no argument there at all, it is an axiom) very crappy company. I know that INTEL CCG directors ordered people to watch over me, and, personally, I do NOT care. Really, I don't. I worked for 5 years for INTEL support in Bavaria.

But let me tell you one thing. However much I do NOT (somehow) trust INTEL FW and especially SW (mostly the pieces of junk they produce), I will put my life blindly in the hands of IA (INTEL Architecture) top-notch HW designers. Now. Ever!

INTEL has an IA HW group, which is The Best of The Best. I (opposing that I should NOT know them) know a couple of guys there. And they... Are... ! They can make this that I am proposing (not only me) happen. It is just about The (Crappy) Politics. OK?

Zoran
[OUT]

On Tue, Feb 14, 2017 at 8:45 PM, Timothy Pearson <tpear...@raptorengineering.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 02/14/2017 01:36 PM, Zoran Stojsavljevic wrote:
>>> Where do we go from here?
>>
>> As I said (and I'll repeat, many times, if required - I do NOT care what
>> all INTEL [all their 13000+ managers] think):
>>
>> /I have another idea for INTEL SoCs/CPUs, as HW architecture
>> improvement. Why your top-notch HW guys do NOT implement MRC as part of
>> MCU. Some HW thread inside CPU/SoC should execute MCU, shouldn't it?
>> MRCs should be few K in size, and they can perfectly fit in there, thus
>> MRC should be (my take on this) part of internal CPU architecture./
>
> I highly doubt this would ever happen, unless it's yet another signed
> blob with highly privileged access. The main problem is that memory
> initialisation is very complex, and there is a definite need to be able
> to issue updates when / if a CPU / MB / DIMM combination fails to
> function correctly.
>
> Personally, having worked on RAM initialisation for many different
> systems (embedded to server), I find it ludicrous that this can be
> considered top secret IP worthy of a closed blob. It takes time to get
> right, but the end result is inextricably tied to the hardware in
> question and is really not much more than a hardware-specific
> implementation of the bog-standard and widely known DDR init algorithms.
>
> Intel, why the blob? What's hiding in there? Asian companies I know
> tend to keep things closed to avoid patent lawsuits over stolen IP, but
> I highly doubt you have this problem?
>
> Just my *personal* $0.02 here. :-)
>
> - --
> Timothy Pearson
> Raptor Engineering
> +1 (415) 727-8645 (direct line)
> +1 (512) 690-0200 (switchboard)
> https://www.raptorengineering.com
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iQEcBAEBAgAGBQJYo15LAAoJEK+E3vEXDOFbCCMH/RazQCmj3rW8a+4hWG0Wt2zR
> eGrSnsoEgfnZOCiPO6lLQX3w799kr3lZUu02HtS0CvsHbIpzUPQ7cdNzcd6kgtMN
> /AzYNonmU6PM2dPjAMyrtk7oVInN7VsakfE3RaAvkqh9SdBH7z35AWbsgjXk1iQA
> +6n7EXFy1cfJKqk2OdNVcWCCf4b8tEZ5n9WcWufhLie0z/r7Fll4Jsk2UF39G+P4
> lsSp757bO2Y2juoEPmPT+lm6akSrV8h37FirnmIFvzbPhpGfJ2IrwJkjRpgK4Ypn
> 9hrvDMctmvq6X+FtW6y7Eer94i1jzXvWSjXJlOSnHPGJKAh946EhztmRijceMf4=
> =4Oil
> -----END PGP SIGNATURE-----
Re: [coreboot] Add coreboot storage driver
On Tue, Feb 14, 2017 at 1:06 PM, Nico Huber wrote:
> On 14.02.2017 18:56, ron minnich wrote:
>> At what point is ramstage a kernel? I think at the point we add file
>> systems or preemptive scheduling. We're getting dangerously close. If we
>> really start to cross that boundary, it's time to rethink the ramstage in
>> my view. It's not a good foundation for a kernel.
>
> Agreed. I wouldn't call it a kernel, but it really seems to grow very
> ugly. Every time I think about this, I scarcely find anything that needs
> to be done in ramstage. I believe even most payloads could live without
> it with some more initialization done in romstage.
>
> Some things that I recall, what ramstage does:
>
> o MP init => maybe can be done earlier, does it need RAM generally???

You need a stack and a vector for the SIPI to be somewhere.

> o PCI resource allocation => can be done offline
>   Just add the resources to the devicetree. If you want to boot from a
>   plugged card that isn't in the devicetree, the payload would have to
>   handle it though.

This largely works with static allocation, but the question is if you want to handle different SKUs of devices that have different hardware behind root bridges. You need to recalculate the IO windows. You could produce a signature of devices and leverage that for picking the right static allocation. Doable, but it gets kind of funky needing to run the allocation pass for each configuration and ensuring it's updated properly.

> o Those small PCI device "drivers" => I doubt they need RAM

That's how their initialization code is currently scheduled. It may not need RAM, but I'm not sure that's what makes them distinctive, nor why you bring this up. Just a regular PCI device doesn't need anything in practice. It's the workarounds and things that need to be done for power optimization, etc., where the complexity arises. Using PCI device "drivers" as a proxy for all PCI devices isn't representative.

> o Table generation => Not that dynamic after all
>   I suppose much is done with static (compile time) information.

Sure, if you go and analyze devicetree.cb to know all the options. Tables have quite a few things that change based on runtime attributes aside from that. For example, a single firmware build can support a different number of SoC models that have largely different numbers of CPUs, etc., or support different feature sets that require different table generation.

> o Sometimes gfx modesetting => do it in the payload
>
> Nico
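[Editorial note: the "signature of devices" idea above could look something like this sketch — hash the (vendor, device) IDs seen during enumeration and use the result to select a precomputed resource allocation, falling back to the full allocator for unknown configurations. The hash choice (FNV-1a) and all structure and function names here are illustrative assumptions, not coreboot code.]

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a over the (vendor, device) IDs found during enumeration.
 * Any stable hash works; this one just keeps the sketch short. */
static uint32_t device_signature(const uint16_t *ids, size_t n)
{
	uint32_t h = 2166136261u;
	for (size_t i = 0; i < n; i++) {
		h ^= ids[i];
		h *= 16777619u;
	}
	return h;
}

/* Hypothetical precomputed allocation, one entry per known SKU,
 * generated offline by running the allocator for each configuration. */
struct static_alloc {
	uint32_t signature;
	uint32_t mmio_base;	/* start of the precomputed BAR window */
};

static const struct static_alloc *pick_alloc(const struct static_alloc *table,
					     size_t n, uint32_t sig)
{
	for (size_t i = 0; i < n; i++)
		if (table[i].signature == sig)
			return &table[i];
	return NULL;	/* unknown SKU: fall back to the full allocator */
}
```

The "funky" part Aaron mentions is visible even here: the offline tool must regenerate the table for every supported configuration, and a stale table silently falls through to the slow path.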
Re: [coreboot] Add coreboot storage driver
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 02/14/2017 01:36 PM, Zoran Stojsavljevic wrote:
>> Where do we go from here?
>
> As I said (and I'll repeat, many times, if required - I do NOT care what
> all INTEL [all their 13000+ managers] think):
>
> /I have another idea for INTEL SoCs/CPUs, as HW architecture
> improvement. Why your top-notch HW guys do NOT implement MRC as part of
> MCU. Some HW thread inside CPU/SoC should execute MCU, shouldn't it?
> MRCs should be few K in size, and they can perfectly fit in there, thus
> MRC should be (my take on this) part of internal CPU architecture./

I highly doubt this would ever happen, unless it's yet another signed blob with highly privileged access. The main problem is that memory initialisation is very complex, and there is a definite need to be able to issue updates when / if a CPU / MB / DIMM combination fails to function correctly.

Personally, having worked on RAM initialisation for many different systems (embedded to server), I find it ludicrous that this can be considered top secret IP worthy of a closed blob. It takes time to get right, but the end result is inextricably tied to the hardware in question and is really not much more than a hardware-specific implementation of the bog-standard and widely known DDR init algorithms.

Intel, why the blob? What's hiding in there? Asian companies I know tend to keep things closed to avoid patent lawsuits over stolen IP, but I highly doubt you have this problem?

Just my *personal* $0.02 here. :-)

- -- 
Timothy Pearson
Raptor Engineering
+1 (415) 727-8645 (direct line)
+1 (512) 690-0200 (switchboard)
https://www.raptorengineering.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJYo15LAAoJEK+E3vEXDOFbCCMH/RazQCmj3rW8a+4hWG0Wt2zR
eGrSnsoEgfnZOCiPO6lLQX3w799kr3lZUu02HtS0CvsHbIpzUPQ7cdNzcd6kgtMN
/AzYNonmU6PM2dPjAMyrtk7oVInN7VsakfE3RaAvkqh9SdBH7z35AWbsgjXk1iQA
+6n7EXFy1cfJKqk2OdNVcWCCf4b8tEZ5n9WcWufhLie0z/r7Fll4Jsk2UF39G+P4
lsSp757bO2Y2juoEPmPT+lm6akSrV8h37FirnmIFvzbPhpGfJ2IrwJkjRpgK4Ypn
9hrvDMctmvq6X+FtW6y7Eer94i1jzXvWSjXJlOSnHPGJKAh946EhztmRijceMf4=
=4Oil
-----END PGP SIGNATURE-----
Re: [coreboot] Add coreboot storage driver
> Where do we go from here?

As I said (and I'll repeat, many times, if required - I do NOT care what all INTEL [all their 13000+ managers] think):

*I have another idea for INTEL SoCs/CPUs, as a HW architecture improvement. Why do your top-notch HW guys NOT implement MRC as part of the MCU? Some HW thread inside the CPU/SoC should execute the MCU, shouldn't it? MRCs should be a few K in size, and they can perfectly fit in there; thus MRC should be (my take on this) part of the internal CPU architecture.*

*Today's INTEL COREs and ATOMs have at least/minimum 100M gates, why not add a couple of dozen K more? Lots of problems solved, aren't they? ;-)*

*[1] BOOT stage to be much shorter (no such thing as a CAR phase);*
*[2] ROM stage does not exist;*
*[3] IP preserved in HW, so the whole INTEL FSP is actually (imagine the Beauty) Open Source...*

With INTEL. Here we go! Where no one has gone before! ;-)

Zoran

On Tue, Feb 14, 2017 at 6:56 PM, ron minnich wrote:
> Just a reminder about times past. This discussion has been ongoing since
> 2000. In my view the questions come down to how much the ramstage does,
> how that impacts code complexity and performance, and when the ramstage
> gets so much capability that it ought to be a kernel.
>
> In the earliest iteration, there was no ramstage per se. What we now call
> the ramstage was a Linux kernel.
>
> We had lots of discussions in the early days with LNXI and others about
> what would boot fastest, a dedicated boot loader like etherboot or a
> general purpose kernel like Linux. In all the cases we measured at Los
> Alamos, Linux always won, easily: yes, slower to load than etherboot, more
> startup overhead, but once started Linux support for concurrency and
> parallelism always won the day. Loaders like etherboot (and its
> descendant, iPXE) spend most of their time doing nothing (as measured at
> the time). It was fun to boot 1000 nodes in the time it took PXE on one
> node to find a connected NIC.
>
> The arguments over payload ended when the FLASH sockets changed to QFP and
> maxed at 256K and Linux could no longer fit.
>
> But if your goal is fast boot, in fact if your goal is 800 milliseconds,
> we know this can work on slow ARMs with Linux, as was shown in 2006.
>
> The very first ramstage was created because Linux could not correctly
> configure a PCI bus in 2000. The core of the ramstage as we know it was
> the PCI config.
>
> We wanted to have ramstage only do PCI setup. We initially put SMP startup
> in Linux, which worked on all but K7, at which point ramstage took on SMP
> startup too. And ramstage started to grow. The growth has never stopped.
>
> At what point is ramstage a kernel? I think at the point we add file
> systems or preemptive scheduling. We're getting dangerously close. If we
> really start to cross that boundary, it's time to rethink the ramstage in
> my view. It's not a good foundation for a kernel.
>
> I've experimented with kernel-as-ramstage with harvey on the riscv and it
> worked. In this case, I manually removed the ramstage from coreboot.rom
> and replaced it with a kernel. It would be interesting, to me at least, to
> have a Kconfig option whereby we can replace the ramstage with some other
> ELF file, to aid such exploration.
>
> I also wonder if we're not at a fork in the road in some ways. There are
> open systems, like RISCV, in which we have full control and can really get
> flexibility in how we boot. We can influence the RISCV vendors not to
> implement hardware designs that have negative impact on firmware and boot
> time performance. And then there are closed systems, like x86, in which
> many opportunities for optimization are lost, and we have little
> opportunity to impact hardware design. We also can't get very smart on x86
> because the FSP boulder blocks the road.
>
> Where do we go from here?
>
> ron
Re: [coreboot] Add coreboot storage driver
2017-02-14 17:12 GMT+01:00 Aaron Durbin via coreboot:
> For an optimized bootflow all pieces of work that need to be done pretty
> much need to be closely coupled. One needs to globally optimize the full
> sequence.

Like initializing slow hardware even before RAM init (as long as it's just an initial command)? How about using PIT/IRQ0 plus some tiny register-only interrupt routine to do trivial register wrangling (we do have register scripts)?

> that we seem to be absolutely needing to maintain boot speeds. Is Chrome
> OS going against the tide of coreboot wanting to solve those sorts of
> issues?

The problem is that two basic qualities collide here: speed and simplicity. The effect is that people ask to stop a second to reconsider the options. MPinit and parallelism are the "go to" solution for all performance-related issues of the last 10 years, but they're not without cost. Questioning this approach doesn't mean that we shouldn't go there at all, just that the obvious answers might not lead to simple solutions.

As Andrey stated elsewhere, we're far from CPU bound.

For his concrete example: does eMMC init fail if you ping it more often than every 10ms? It better not, you already stated that it's hard to guarantee those 10ms, so there needs to be some spare room. We could look at the largest chunk of the init process that could be restructured to implement cooperative multithreading on a single core for as many tasks as possible, to cut down on all those udelays (or even mdelays). Maybe we could even build a compiler plugin to ensure at compile time that the resulting code is proper (loops either have low bounds or are yielding, yield()/sched()/... aren't called within critical sections)...

Once we leave that scheduling to physics (i.e. enabling multicore operation), all bets are off (or we have to synchronize the execution to a degree that we could just as well do it manually). A lot of complexity just to have 8 times the CPU power for the same amount of IO-bound tasks.

Patrick
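[Editorial note: the PIT/IRQ0 suggestion above, as read here, amounts to stepping a register script one write per timer tick so the main init path never blocks on a slow device. A rough sketch, with MMIO replaced by a simulated register array and all names hypothetical — real code would hook this to the IRQ0 vector and do actual MMIO writes:]

```c
#include <stdbool.h>
#include <stdint.h>

/* One entry of a register script: write `value` to register `reg`. */
struct reg_write {
	uint32_t reg;
	uint32_t value;
};

/* Simulated register space standing in for device MMIO. */
static uint32_t fake_regs[16];

/* Hypothetical init script for some slow device. */
static const struct reg_write init_script[] = {
	{ 0, 0x1 },	/* e.g. enable clock */
	{ 1, 0xff },	/* e.g. set divider */
	{ 2, 0x1 },	/* e.g. kick off init */
};

static unsigned int script_pos;

/* Tiny handler: one register write per timer tick, so the main thread
 * is never stalled by this device. Returns false once the script is
 * exhausted (at which point the handler would be unhooked). */
static bool pit_tick_handler(void)
{
	if (script_pos >= sizeof(init_script) / sizeof(init_script[0]))
		return false;
	const struct reg_write *w = &init_script[script_pos++];
	fake_regs[w->reg] = w->value;
	return true;
}
```

This keeps the handler "register-only" as suggested — no memory allocation, no waiting — which matters if it has to run before DRAM is up.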
Re: [coreboot] Add coreboot storage driver
On 14.02.2017 18:56, ron minnich wrote:
> At what point is ramstage a kernel? I think at the point we add file
> systems or preemptive scheduling. We're getting dangerously close. If we
> really start to cross that boundary, it's time to rethink the ramstage in
> my view. It's not a good foundation for a kernel.

Agreed. I wouldn't call it a kernel, but it really seems to grow very ugly. Every time I think about this, I scarcely find anything that needs to be done in ramstage. I believe even most payloads could live without it with some more initialization done in romstage.

Some things that I recall, what ramstage does:

o MP init => maybe can be done earlier, does it need RAM generally???
o PCI resource allocation => can be done offline
  Just add the resources to the devicetree. If you want to boot from a
  plugged card that isn't in the devicetree, the payload would have to
  handle it though.
o Those small PCI device "drivers" => I doubt they need RAM
o Table generation => Not that dynamic after all
  I suppose much is done with static (compile time) information.
o Sometimes gfx modesetting => do it in the payload

Nico
Re: [coreboot] Add coreboot storage driver
On Tue, Feb 14, 2017 at 11:56 AM, ron minnich wrote:
> Just a reminder about times past. This discussion has been ongoing since
> 2000. In my view the questions come down to how much the ramstage does,
> how that impacts code complexity and performance, and when the ramstage
> gets so much capability that it ought to be a kernel.
>
> In the earliest iteration, there was no ramstage per se. What we now call
> the ramstage was a Linux kernel.
>
> We had lots of discussions in the early days with LNXI and others about
> what would boot fastest, a dedicated boot loader like etherboot or a
> general purpose kernel like Linux. In all the cases we measured at Los
> Alamos, Linux always won, easily: yes, slower to load than etherboot, more
> startup overhead, but once started Linux support for concurrency and
> parallelism always won the day. Loaders like etherboot (and its
> descendant, iPXE) spend most of their time doing nothing (as measured at
> the time). It was fun to boot 1000 nodes in the time it took PXE on one
> node to find a connected NIC.
>
> The arguments over payload ended when the FLASH sockets changed to QFP and
> maxed at 256K and Linux could no longer fit.
>
> But if your goal is fast boot, in fact if your goal is 800 milliseconds,
> we know this can work on slow ARMs with Linux, as was shown in 2006.
>
> The very first ramstage was created because Linux could not correctly
> configure a PCI bus in 2000. The core of the ramstage as we know it was
> the PCI config.
>
> We wanted to have ramstage only do PCI setup. We initially put SMP startup
> in Linux, which worked on all but K7, at which point ramstage took on SMP
> startup too. And ramstage started to grow. The growth has never stopped.
>
> At what point is ramstage a kernel? I think at the point we add file
> systems or preemptive scheduling. We're getting dangerously close. If we
> really start to cross that boundary, it's time to rethink the ramstage in
> my view. It's not a good foundation for a kernel.
>
> I've experimented with kernel-as-ramstage with harvey on the riscv and it
> worked. In this case, I manually removed the ramstage from coreboot.rom
> and replaced it with a kernel. It would be interesting, to me at least, to
> have a Kconfig option whereby we can replace the ramstage with some other
> ELF file, to aid such exploration.
>
> I also wonder if we're not at a fork in the road in some ways. There are
> open systems, like RISCV, in which we have full control and can really get
> flexibility in how we boot. We can influence the RISCV vendors not to
> implement hardware designs that have negative impact on firmware and boot
> time performance. And then there are closed systems, like x86, in which
> many opportunities for optimization are lost, and we have little
> opportunity to impact hardware design. We also can't get very smart on x86
> because the FSP boulder blocks the road.
>
> Where do we go from here?

That I'm not sure. And it does very much depend on the goals of the project.

I will say this, though. Not all architectures are the same, so comparing them both as apples is impossible. With ARM punting almost all of its initialization to ATF or the kernel, it's not surprising that coreboot's current architecture is simple and easy for it. The work has just been pushed into other places. For some reason Intel continually decides to place a large amount of things into the firmware to do, but I think that decision is usually taken because it keeps the kernel simpler. The complexity just got moved to a different place in the stack. Coupled with the decision to hide the SoC support in a closed-off blob, that just makes things worse. When comparing an Intel solution to an ARM vendor, the SoC bits for bring-up are much more open and thus easier to optimize, if needed.

As noted before, you can't punt things out on x86, where device visibility needs to be configured prior to resource allocation, so there's definitely intertwining involved in bringing up the Intel SoCs. Firmware is inherently exposed to the micro-architecture of the underlying device. There's not a good way around that. Acting like it's not doesn't solve that problem.

> ron
Re: [coreboot] Add coreboot storage driver
Just a reminder about times past. This discussion has been ongoing since 2000. In my view the questions come down to how much the ramstage does, how that impacts code complexity and performance, and when the ramstage gets so much capability that it ought to be a kernel. In the earliest iteration, there was no ramstage per se. What we now call the ramstage was a Linux kernel. We had lots of discussions in the early days with LNXI and others about what would boot fastest, a dedicated boot loader like etherboot or a general purpose kernel like Linux. In all the cases we measured at Los Alamos, Linux always won, easily: yes, slower to load than etherboot, more startup overhead, but once started Linux support for concurrency and parallelism always won the day. Loaders like etherboot (and its descendant, iPXE) spend most of their time doing nothing (as measured at the time). It was fun to boot 1000 nodes in the time it took PXE on one node to find a connected NIC. The arguments over payload ended when the FLASH sockets changed to QFP and maxed at 256K and Linux could no longer fit. But if your goal is fast boot, in fact if your goal is 800 miliseconds, we know this can work on slow ARMs with Linux, as was shown in 2006. The very first ramstage was created because Linux could not correctly configure a PCI bus in 2000. The core of the ramstage as we know it was the PCI config. We wanted to have ramstage only do PCI setup. We initially put SMP startup in Linux, which worked on all but K7, at which point ramstage took on SMP startup too. And ramstage started to grow. The growth has never stopped. At what point is ramstage a kernel? I think at the point we add file systems or preemptive scheduling. We're getting dangerously close. If we really start to cross that boundary, it's time to rethink the ramstage in my view. It's not a good foundation for a kernel. I've experimented with kernel-as-ramstage with harvey on the riscv and it worked. 
In this case, I manually removed the ramstage from coreboot.rom and replaced it with a kernel. It would be interesting, to me at least, to have a Kconfig option whereby we can replace the ramstage with some other ELF file, to aid such exploration.

I also wonder if we're not at a fork in the road in some ways. There are open systems, like RISCV, in which we have full control and can really get flexibility in how we boot. We can influence the RISCV vendors not to implement hardware designs that have negative impact on firmware and boot time performance. And then there are closed systems, like x86, in which many opportunities for optimization are lost, and we have little opportunity to impact hardware design. We also can't get very smart on x86 because the FSP boulder blocks the road. Where do we go from here?

ron -- coreboot mailing list: coreboot@coreboot.org https://www.coreboot.org/mailman/listinfo/coreboot
Re: [coreboot] Add coreboot storage driver
On Mon, Feb 13, 2017 at 1:16 PM, Nico Huber wrote: > On 13.02.2017 08:19, Andrey Petrov wrote: >> For example Apollolake is struggling to finish firmware boot with all >> the whistles and bells (vboot, tpm and our friendly, ever-vigilant TXE) >> under one second. > Can you provide exhaustive figures, which part of this system's boot > process takes how long? That would make it easier to reason about where > "parallelism" would provide a benefit. > >> In order to address this problem we can do following things: >> 1. Add scheduler, early or not > > Yes, but it really doesn't fit into the coreboot idea, IMHO. > >> 2. Add early MPinit code > > No? um, at best very limited (by the number of threads the hardware > supports). > >> For [2] we have been working on a prototype for Apollolake that does >> pre-memory MPinit. We've got to a stage where we can run C code on >> another core before DRAM is up (please do not try that at home, because >> you'd need custom experimental ucode). However, there are many questions: >> what model to use, and how to create infrastructure to run code in >> parallel in such an early stage. Shall we just add a "run this (mini) stage >> on this core" concept? Or shall we add tasklet/worklet structures that >> would allow code to live and run, and when migration to DRAM happens have >> infrastructure take care of managing context and potentially resume it? >> One problem is that code running with CAR needs to stop by the time the >> system is ready to tear down CAR and migrate to DRAM. We don't want to >> delay that by waiting on such a task to complete. At the same time, a >> certain task may have largely fluctuating run times, so you would want to >> continue it. It actually may be possible to do just that, if we use the >> same address space for CAR and DRAM. But come to think of it, this is >> just the tip of the iceberg and there are packs of other issues we would need >> to deal with.
> > Sounds very scary, as if it would never fit, no matter how hard you > push. If you really think we should do something in parallel across > coreboot stages, it might be time to redesign the whole thing across > stages. > > As long as there is a concept involving romstage/ramstage, we should > keep it to one thing in romstage: getting DRAM up. If this needs a > clumsy blob, then accept its time penalty. > >> >> Does any of that make sense? Perhaps somebody thought of this before? >> Let's see what may be other ways to deal with this challenge. > > 3. Design a driver architecture that doesn't suffer from io-waiting > > This is something I have kept in mind for payloads for some time now, but it > could also apply to later coreboot stages: Instead of busy waiting for > i/o, a driver could yield execution until it's called again. Obviously, > this only helps if there is more than one driver running in "parallel". > But it scales much better than one virtual core per driver... > > Another idea just popped up: Performing "background" tasks in udelay() > / mdelay() implementations ;) > > I guess there are many more, maybe some viable, approaches to solve it > with only one thread of execution. > > Anyway, I'd rather see this parallelism in payloads. Another thought: If > there is something in coreboot that really slows booting down, maybe > that could be moved into the payload? I don't think things are as simple as that with the current solution for these platforms. FSP very much complicates things because the execution context is lost on the transfer. But it's actually worse than that, because resource allocation is dependent on the presence of PCI devices. If those disappear or appear after resource allocation, then the IO map is not so hot. Things are definitely tightly coupled, so it's not clear to me that the answer is to punt everything to a payload and expect everything to get better. FWIW, I've provided feedback on FSP and its current deficiencies.
However, FSP does allow one to ship products without having to deal with UEFI as a firmware solution, and it ensures all the correct hardware tuning is done, since that's the only place Intel supports documenting/maintaining correct initialization sequences. It's definitely a predicament if one wants to continue shipping products on new hardware. > > Nico
Re: [coreboot] Add coreboot storage driver
On Mon, Feb 13, 2017 at 9:32 PM, ron minnich wrote: > andrey, great questions. If you're really concerned about those issues, then > yes, maybe a space sharing solution is the right one. > > I really would rather not see people implementing schedulers at this point. > If we're going to go that route, let's get a reasonable > operating system and use it instead. If we continue on coreboot's current > trajectory we're going to end up like every other > firmware project that became an OS, and that to me is the wrong direction. It's quite the predicament if we don't want to give up on boot speed. Being heavily invested in coreboot is where we currently are -- for better or worse (I think for the better). For an optimized boot flow, all the pieces of work that need to be done pretty much need to be closely coupled. One needs to globally optimize the full sequence. Carving that work into granular pieces across different code bases just leaves performance on the floor -- performance we absolutely need to maintain boot speeds. Is Chrome OS going against the tide of coreboot by wanting to solve those sorts of issues? > > ron > > On Mon, Feb 13, 2017 at 6:43 PM Andrey Petrov > wrote: >> >> Hi, >> >> On 02/13/2017 12:31 PM, ron minnich wrote: >> > Another idea just popped up: Performing "background" tasks in >> > udelay() >> > / mdelay() implementations ;) >> > >> > >> > that is adurbin's threading model. I really like it. >> > >> > A lot of times, concurrency will get you just as far as ||ism without >> > the nastiness. >> >> But how do you guarantee code will get a slice of execution time when it >> needs it? For example for eMMC link training you need to issue certain >> commands with a certain time interval. Let's say every 10ms. How do you >> make sure that happens? You can keep track of time and see when the next >> piece of work needs to be scheduled, but how do you guarantee you enter >> this udelay code often enough?
>> >> Andrey
Re: [coreboot] Add coreboot storage driver
On Mon, Feb 13, 2017 at 5:28 PM, Julius Werner wrote: > +1 for preferring a single-core concurrency model. This would be much more > likely to be reusable for other platforms, and much simpler to maintain in > the long run (way less platform-specific details to keep track of and figure > out again and again for every new chipset). Your CAR problems would become > much simpler... just make sure the scheduler structures get migrated > together with the rest of the globals and it should work fine out of the > box. FWIW, there's no coherency in CAR. It's per building block of the hardware units -- much like multiple nodes in AMD K* systems. Migrating CAR is not necessarily a simple solution, but I'm not convinced we need multiple cores executing with CAR as a backing store. > > On Mon, Feb 13, 2017 at 12:31 PM, ron minnich wrote: >> >> >> >> On Mon, Feb 13, 2017 at 11:17 AM Nico Huber wrote: >>> >>> >>> >>> Another idea just popped up: Performing "background" tasks in udelay() >>> / mdelay() implementations ;) >> >> >> that is adurbin's threading model. I really like it. >> >> A lot of times, concurrency will get you just as far as ||ism without the >> nastiness. >> >> But if we're going to make a full up kernel for rom, my suggestion is we >> could start with a real kernel, perhaps linux. We could then rename coreboot >> to, say, LinuxBIOS. >> >> ron
Re: [coreboot] Add coreboot storage driver
On Mon, Feb 13, 2017 at 8:43 PM, Andrey Petrov wrote: > Hi, > > On 02/13/2017 12:31 PM, ron minnich wrote: >> >> Another idea just popped up: Performing "background" tasks in udelay() >> / mdelay() implementations ;) >> >> >> that is adurbin's threading model. I really like it. >> >> A lot of times, concurrency will get you just as far as ||ism without >> the nastiness. > > > But how do you guarantee code will get a slice of execution time when it > needs it? For example for eMMC link training you need to issue certain > commands with a certain time interval. Let's say every 10ms. How do you make > sure that happens? You can keep track of time and see when the next piece of > work needs to be scheduled, but how do you guarantee you enter this udelay > code often enough? > You can't guarantee anything like that. You'd need compiler help for yielding at loop and function boundaries. Or you go the interrupt route and reschedule. Or you use the other CPUs like you already mentioned. The coreboot code base is not currently sympathetic to multiple threads with full pre-emption. The threads currently provided in coreboot yield in udelay() calls as a safe place to reschedule. That by no means provides any latency guarantees, and since we have no concept of work we can't ensure a bounded latency between chunks of work. > Andrey
Re: [coreboot] Add coreboot storage driver
On Mon, Feb 13, 2017 at 8:05 AM, Peter Stuge wrote: > Andrey Petrov wrote: >> We are considering adding early parallel code execution in coreboot. >> We need to discuss how this can be done. > > No - first we need to discuss *if* this should be done. > > >> Nowadays we see firmware getting more complicated. > > Sorry, but that's nonsense. Indeed MSFT is pushing more and more > complicated requirements into the EFI/UEFI ecosystem, but that's > their problem, not a universal one. > > > Your colleague wants to speed up boot time by moving storage driver > code from the payload into coreboot proper, but in fact this goes > directly against the design goals of coreboot, so here's a refresh: > > * coreboot has *minimal* platform (think buses, not peripherals) > initialization code > > * A payload does everything further. There's an inherent sequence point between coreboot and the payload. All of coreboot needs to complete prior to handing off execution to the payload. Everyone's boot-up process differs, but boot speed is something Chrome OS cares about very much. That's one of the reasons coreboot has been enlisted for Chrome OS products. By maintaining that delineation, boot speed can very much suffer. Pushing work out to another piece of software doesn't inherently reduce the total amount of work to be done. That's the current dilemma. Do we just throw our hands up and say things will continue to be slower? Or do we come up with solutions to the problems we're seeing? > > >> For example Apollolake is struggling to finish firmware boot with all >> the whistles and bells (vboot, tpm and our friendly, ever-vigilant TXE) >> under one second. Interestingly, a great deal of the tasks that need to be >> done are not even computation-bound. They are IO bound. > > How much of that time is spent in the FSP? > > >> scheduler > .. >> how to create infrastructure to run code in parallel in such early stage > > I think you are going in completely the wrong direction.
> > You want a scheduler, but that very clearly does not belong in coreboot. > > >> Shall we just add "run this (mini) stage on this core" concept? >> Or shall we add tasklet/worklet structures > > Neither. The correct engineering solution is very simple - adapt FSP > to fit into coreboot, instead of trying to do things the other way > around. > > This means that your scheduler lives in the payload. There is already > precedent - SeaBIOS also already implements multitasking. > > >> this is just the tip of the iceberg > > That's exactly why it has no place within coreboot, but belongs in > a payload. > > > //Peter
Re: [coreboot] Add coreboot storage driver
Hello Andrey, I found that Coreboot really implements atomic and semaphore operations?! What for? Did not expect to find these... at all??? The ONLY reason I can see is that in some SoCs there are independent (invisible) HW threads running, using the same resources as the BSP core (all other cores should be waiting on some SIPI event). I do NOT see any other reason(s) for them to be used. No multi-threading in Coreboot, just a single thread continuously/sequentially executing, correct? Why not BIOS? There are 100s of millions of PCs, notebooks etc. out there, and these are slow with BIOS. You can argue and tell me: IOTG will soon have billions of smart devices using SoCs. Valid point. In that sense, I have another idea for INTEL SoCs/CPUs, as a HW architecture improvement. Why do your top-notch HW guys NOT implement MRC as part of the MCU? Some HW thread inside the CPU/SoC should execute the MCU, shouldn't it? MRCs should be a few K in size, and they can perfectly fit in there; thus MRC should be (my take on this) part of the internal CPU architecture. Today's INTEL COREs and ATOMs have at least/minimum 100M gates, why not add a couple of dozen K more? Lots of problems solved, aren't they? ;-) [1] BOOT stage to be much shorter (no such thing as a CAR phase); [2] ROM stage does not exist; [3] IP preserved in HW, so the whole INTEL FSP is actually (imagine the Beauty) Open Source... Just $.02 in addition to the original $.02 (makes it a nickel - $.01). :-) Zoran On Mon, Feb 13, 2017 at 7:08 PM, Andrey Petrov wrote: > Hi, > > On 02/13/2017 12:21 AM, Zoran Stojsavljevic wrote: > > IBVs can work on this proposal, and see how BIOS boot-up time will improve >> (by this parallelism) >> > > There is no need to wait for anybody to see real-world benefits. > > The original patch where you train the eMMC link already saves some 50ms. > However, MP init kicks in very late. That is a limitation of the current > approach, where MPinit depends on DRAM being available.
If you move MPinit > earlier, you can already get approx 200ms saving. On Apollolake we have a > prototype where MPinit happens in the bootblock. That already reduces boot time > by some 200ms. > > Since, very soon, you'll run into shared HW resources, and then you'll need >> to implement semaphores, atomic operations and God knows what!? >> > > Fortunately, divine powers have nothing to do with it. Atomic operations > are already implemented, and spinlocks are in as well. > > What other major issues do you see, Zoran? > > thanks > Andrey >
Re: [coreboot] Add coreboot storage driver
andrey, great questions. If you're really concerned about those issues, then yes, maybe a space sharing solution is the right one. I really would rather not see people implementing schedulers at this point. If we're going to go that route, let's get a reasonable operating system and use it instead. If we continue on coreboot's current trajectory we're going to end up like every other firmware project that became an OS, and that to me is the wrong direction. ron On Mon, Feb 13, 2017 at 6:43 PM Andrey Petrov wrote: > Hi, > > On 02/13/2017 12:31 PM, ron minnich wrote: > > Another idea just popped up: Performing "background" tasks in > udelay() > > / mdelay() implementations ;) > > > > > > that is adurbin's threading model. I really like it. > > > > A lot of times, concurrency will get you just as far as ||ism without > > the nastiness. > > But how do you guarantee code will get a slice of execution time when it > needs it? For example for eMMC link training you need to issue certain > commands with a certain time interval. Let's say every 10ms. How do you > make sure that happens? You can keep track of time and see when the next > piece of work needs to be scheduled, but how do you guarantee you enter > this udelay code often enough? > > Andrey >
Re: [coreboot] Add coreboot storage driver
Hi, On 02/13/2017 12:31 PM, ron minnich wrote: Another idea just popped up: Performing "background" tasks in udelay() / mdelay() implementations ;) that is adurbin's threading model. I really like it. A lot of times, concurrency will get you just as far as ||ism without the nastiness. But how do you guarantee code will get a slice of execution time when it needs it? For example for eMMC link training you need to issue certain commands with a certain time interval. Let's say every 10ms. How do you make sure that happens? You can keep track of time and see when the next piece of work needs to be scheduled, but how do you guarantee you enter this udelay code often enough? Andrey
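[Editor's note: one common answer to the question above is to stop requiring a fixed entry cadence and instead track deadlines: whenever the single thread of execution gets control, check whether the next command is due and issue it, possibly a little late. A minimal sketch; emmc_send_cmd1() and the caller-supplied microsecond timestamps are hypothetical stand-ins, not real coreboot APIs:]

```c
#include <stdbool.h>

#define CMD1_INTERVAL_US 10000UL /* nominal 10 ms between CMD1 retries */

struct emmc_poll {
	unsigned long next_due_us;
	unsigned int cmds_sent;
	bool ready;
};

/* Hypothetical hardware hook, stubbed out here: would return true once
 * the device clears the busy bit in the CMD1 response. */
static bool emmc_send_cmd1(void)
{
	return false;
}

void emmc_poll_init(struct emmc_poll *p, unsigned long now_us)
{
	p->next_due_us = now_us;
	p->cmds_sent = 0;
	p->ready = false;
}

/* Called opportunistically from wherever the single thread of execution
 * happens to yield; issues CMD1 only when it is actually due. */
void emmc_poll_step(struct emmc_poll *p, unsigned long now_us)
{
	if (p->ready || (long)(now_us - p->next_due_us) < 0)
		return;
	p->ready = emmc_send_cmd1();
	p->cmds_sent++;
	p->next_due_us = now_us + CMD1_INTERVAL_US;
}
```

This doesn't guarantee a 10ms cadence either -- nothing single-threaded can -- but it makes the worst case "CMD1 sent late" rather than "CMD1 sent early and wastefully".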
Re: [coreboot] Add coreboot storage driver
I don't see the big deal here, actually. We've had nice concurrency in coreboot for years; it works, I've used it. What else do we need to do? On Mon, Feb 13, 2017 at 3:39 PM Vadim Bendebury wrote: > Incidentally, a few years ago Chirantan and Simon (cced) implemented > u-boot concurrency support for an ARM SOC. I don't remember how much gain > it brought, and it did not go into production as it was quite late in > the project cycle. > > But they might have some experience to share. > > -v > > > On Tue, Feb 14, 2017 at 7:28 AM, Julius Werner > wrote: > > +1 for preferring a single-core concurrency model. This would be much more > likely to be reusable for other platforms, and much simpler to maintain in > the long run (way less platform-specific details to keep track of and > figure out again and again for every new chipset). Your CAR problems would > become much simpler... just make sure the scheduler structures get > migrated together with the rest of the globals and it should work fine out > of the box. > > On Mon, Feb 13, 2017 at 12:31 PM, ron minnich wrote: > > > > On Mon, Feb 13, 2017 at 11:17 AM Nico Huber wrote: > > > > Another idea just popped up: Performing "background" tasks in udelay() > / mdelay() implementations ;) > > > that is adurbin's threading model. I really like it. > > A lot of times, concurrency will get you just as far as ||ism without the > nastiness. > > But if we're going to make a full up kernel for rom, my suggestion is we > could start with a real kernel, perhaps linux. We could then rename > coreboot to, say, LinuxBIOS.
> > ron
Re: [coreboot] Add coreboot storage driver
Incidentally, a few years ago Chirantan and Simon (cced) implemented u-boot concurrency support for an ARM SOC. I don't remember how much gain it brought, and it did not go into production as it was quite late in the project cycle. But they might have some experience to share. -v On Tue, Feb 14, 2017 at 7:28 AM, Julius Werner wrote: > +1 for preferring a single-core concurrency model. This would be much more > likely to be reusable for other platforms, and much simpler to maintain in > the long run (way less platform-specific details to keep track of and > figure out again and again for every new chipset). Your CAR problems would > become much simpler... just make sure the scheduler structures get > migrated together with the rest of the globals and it should work fine out > of the box. > > On Mon, Feb 13, 2017 at 12:31 PM, ron minnich wrote: > >> >> >> On Mon, Feb 13, 2017 at 11:17 AM Nico Huber wrote: >> >>> >>> >>> Another idea just popped up: Performing "background" tasks in udelay() >>> / mdelay() implementations ;) >>> >> >> that is adurbin's threading model. I really like it. >> >> A lot of times, concurrency will get you just as far as ||ism without the >> nastiness. >> >> But if we're going to make a full up kernel for rom, my suggestion is we >> could start with a real kernel, perhaps linux. We could then rename >> coreboot to, say, LinuxBIOS. >> >> ron
Re: [coreboot] Add coreboot storage driver
+1 for preferring a single-core concurrency model. This would be much more likely to be reusable for other platforms, and much simpler to maintain in the long run (way less platform-specific details to keep track of and figure out again and again for every new chipset). Your CAR problems would become much simpler... just make sure the scheduler structures get migrated together with the rest of the globals and it should work fine out of the box. On Mon, Feb 13, 2017 at 12:31 PM, ron minnich wrote: > > > On Mon, Feb 13, 2017 at 11:17 AM Nico Huber wrote: > >> >> >> Another idea just popped up: Performing "background" tasks in udelay() >> / mdelay() implementations ;) >> > > that is adurbin's threading model. I really like it. > > A lot of times, concurrency will get you just as far as ||ism without the > nastiness. > > But if we're going to make a full up kernel for rom, my suggestion is we > could start with a real kernel, perhaps linux. We could then rename > coreboot to, say, LinuxBIOS. > > ron
Re: [coreboot] Add coreboot storage driver
On Mon, Feb 13, 2017 at 11:17 AM Nico Huber wrote: > > > Another idea just popped up: Performing "background" tasks in udelay() > / mdelay() implementations ;) > that is adurbin's threading model. I really like it. A lot of times, concurrency will get you just as far as ||ism without the nastiness. But if we're going to make a full up kernel for rom, my suggestion is we could start with a real kernel, perhaps linux. We could then rename coreboot to, say, LinuxBIOS. ron
Re: [coreboot] Add coreboot storage driver
On 13.02.2017 08:19, Andrey Petrov wrote: > For example Apollolake is struggling to finish firmware boot with all > the whistles and bells (vboot, tpm and our friendly, ever-vigilant TXE) > under one second. Can you provide exhaustive figures, which part of this system's boot process takes how long? That would make it easier to reason about where "parallelism" would provide a benefit. > In order to address this problem we can do following things: > 1. Add scheduler, early or not Yes, but it really doesn't fit into the coreboot idea, IMHO. > 2. Add early MPinit code No? um, at best very limited (by the number of threads the hardware supports). > For [2] we have been working on a prototype for Apollolake that does > pre-memory MPinit. We've got to a stage where we can run C code on > another core before DRAM is up (please do not try that at home, because > you'd need custom experimental ucode). However, there are many questions: > what model to use, and how to create infrastructure to run code in > parallel in such an early stage. Shall we just add a "run this (mini) stage > on this core" concept? Or shall we add tasklet/worklet structures that > would allow code to live and run, and when migration to DRAM happens have > infrastructure take care of managing context and potentially resume it? > One problem is that code running with CAR needs to stop by the time the > system is ready to tear down CAR and migrate to DRAM. We don't want to > delay that by waiting on such a task to complete. At the same time, a > certain task may have largely fluctuating run times, so you would want to > continue it. It actually may be possible to do just that, if we use the > same address space for CAR and DRAM. But come to think of it, this is > just the tip of the iceberg and there are packs of other issues we would need > to deal with. Sounds very scary, as if it would never fit, no matter how hard you push.
If you really think we should do something in parallel across coreboot stages, it might be time to redesign the whole thing across stages. As long as there is a concept involving romstage/ramstage, we should keep it to one thing in romstage: getting DRAM up. If this needs a clumsy blob, then accept its time penalty. > > Does any of that make sense? Perhaps somebody thought of this before? > Let's see what may be other ways to deal with this challenge. 3. Design a driver architecture that doesn't suffer from io-waiting This is something I have kept in mind for payloads for some time now, but it could also apply to later coreboot stages: Instead of busy waiting for i/o, a driver could yield execution until it's called again. Obviously, this only helps if there is more than one driver running in "parallel". But it scales much better than one virtual core per driver... Another idea just popped up: Performing "background" tasks in udelay() / mdelay() implementations ;) I guess there are many more, maybe some viable, approaches to solve it with only one thread of execution. Anyway, I'd rather see this parallelism in payloads. Another thought: If there is something in coreboot that really slows booting down, maybe that could be moved into the payload? Nico
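[Editor's note: the udelay() idea above can be sketched very compactly: register background callbacks, and have the delay loop run them while it burns time. The names below (bg_task_add, the timer stub) are made up for illustration and are not the actual coreboot API:]

```c
#include <stddef.h>

#define MAX_BG_TASKS 4

struct bg_task {
	void (*fn)(void *);
	void *arg;
};

static struct bg_task bg_tasks[MAX_BG_TASKS];
static size_t num_bg_tasks;

/* Register a callback to be polled whenever the CPU would otherwise idle. */
int bg_task_add(void (*fn)(void *), void *arg)
{
	if (num_bg_tasks >= MAX_BG_TASKS)
		return -1;
	bg_tasks[num_bg_tasks].fn = fn;
	bg_tasks[num_bg_tasks].arg = arg;
	num_bg_tasks++;
	return 0;
}

/* Stand-in for a monotonic microsecond timer; advances 10 "us" per read
 * so the example is deterministic without real hardware. */
static unsigned long fake_us;
static unsigned long timer_us(void)
{
	return fake_us += 10;
}

/* A udelay() that spends its wait running queued background work instead
 * of spinning uselessly. */
void udelay_with_bg(unsigned long usecs)
{
	unsigned long start = timer_us();

	while (timer_us() - start < usecs) {
		for (size_t i = 0; i < num_bg_tasks; i++)
			bg_tasks[i].fn(bg_tasks[i].arg);
	}
}
```

The catch, as discussed elsewhere in the thread, is that background work only runs as often as someone calls udelay(); there is no latency bound.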
Re: [coreboot] Add coreboot storage driver
2017-02-13 8:19 GMT+01:00 Andrey Petrov: > tl;dr: > We are considering adding early parallel code execution in coreboot. We need > to discuss how this can be done. It's reasonable to discuss the "if" first. > Nowadays we see firmware getting more complicated. The coreboot mantra isn't just "boot fast", but also "boot simple". On your "scheduler or MPinit" question, _if_ we have to go down that route: I'd prefer a cooperative threaded single-core scheduler, for one simple reason: it's easier to reason about the correctness of code that only ever cedes control at well-defined yield points. As you said, those tasks are not CPU bound. We also don't need experimental ucode for that, even when running threads in CAR ;-) Patrick
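[Editor's note: the cooperative model described above can be reduced to its essence without any context switching at all: each "thread" is a step function that does a bounded chunk of work and returns whether it wants to run again; the scheduler is a round-robin loop, and returning from step() is the well-defined yield point. Illustrative only, not coreboot's actual thread API:]

```c
#include <stdbool.h>
#include <stddef.h>

struct task {
	bool (*step)(void *state); /* returns true while more work remains */
	void *state;
	bool done;
};

/* Round-robin over all tasks; each step() call boundary is a yield point,
 * so no preemption and no per-task stacks are needed. */
void run_tasks(struct task *tasks, size_t n)
{
	bool any_left = true;

	while (any_left) {
		any_left = false;
		for (size_t i = 0; i < n; i++) {
			if (tasks[i].done)
				continue;
			if (tasks[i].step(tasks[i].state))
				any_left = true;
			else
				tasks[i].done = true;
		}
	}
}
```

The price of this simplicity is that a step function which blocks or runs too long stalls every other task -- which is exactly why it is easy to reason about.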
Re: [coreboot] Add coreboot storage driver
Hi, On 02/13/2017 10:22 AM, Timothy Pearson wrote: For [2] we have been working on prototype for Apollolake that does pre-memory MPinit. We've got to a stage where we can run C code on another core before DRAM is up (please do not try that at home, because you'd need custom experimental ucode). In addition to the very valid points raised by others on this list, this note in particular is concerning. Whenever we start talking about microcode, we're talking about yet another magic black box that coreboot has no control over and cannot maintain. Adding global functionality that is so system specific in practice as to rely on microcode feature support is not something I ever want to see, unless perhaps the relevant portions of the microcode are open and maintainable by the coreboot project. I am just talking about BIOS shadowing. This is a pretty standard feature, just that not every SoC implements it by default. Naturally, we would only be adding new code if it became publicly available. I believe shadowing works on many existing CPUs, so no, it is not "use this custom NDA-only ucode" to get the stuff working. Andrey
Re: [coreboot] Add coreboot storage driver
What you're asking for is a parallelized or multicore coreboot IIUC. We've done this before. I believe it was yhlu who implemented the multicore DRAM startup on K8 ca. 2005 or so. I implemented a proof-of-concept multi-core capability in coreboot in 2012. It was dead simple and based on work we did in the NIX kernel, a very basic fork/join model. Instead of halting after SMP startup, APs entered a state where they waited for work. It worked. It was not well received at the time. Maybe it's time to take a look at it again. For your CAR case, all cores would have to finish before you moved into the DRAM stage. Is that really a problem? I don't think, based on your note, that you need such a complex model as found in linux with tasklets and schedulers and such. A simple space-shared model ought to be sufficient. Further, adurbin's concurrency (thread) model is a very nice API. ron
Re: [coreboot] Add coreboot storage driver
On 02/13/2017 01:19 AM, Andrey Petrov wrote: > For [2] we have been working on prototype for Apollolake that does > pre-memory MPinit. We've got to a stage where we can run C code on > another core before DRAM is up (please do not try that at home, because > you'd need custom experimental ucode). In addition to the very valid points raised by others on this list, this note in particular is concerning. Whenever we start talking about microcode, we're talking about yet another magic black box that coreboot has no control over and cannot maintain. Adding global functionality that is so system specific in practice as to rely on microcode feature support is not something I ever want to see, unless perhaps the relevant portions of the microcode are open and maintainable by the coreboot project. In a nutshell, this proposal would make it even harder for any low-level coreboot development on these systems to take place outside of Intel, and as one of the main coreboot contractors this "soft lockdown" is something we are strongly opposed to. Furthermore, I suggest looking at the AMD K8 memory init code -- some basic parallelism was introduced for memory clear, but in the end the improved boot speed was not a "killer feature" and had the side effect of making the code difficult to maintain, leaving the K8 support permanently broken as of this writing.
--
Timothy Pearson
Raptor Engineering
+1 (415) 727-8645 (direct line)
+1 (512) 690-0200 (switchboard)
https://www.raptorengineering.com
Re: [coreboot] Add coreboot storage driver
Hi,

On 02/13/2017 12:21 AM, Zoran Stojsavljevic wrote:
> IBVs can work on this proposal, and see how BIOS boot-up time will
> improve (by this parallelism)

There is no need to wait for anybody to see real-world benefits. The original patch where you train the eMMC link already saves some 50ms. However, MP init kicks in very late; that is a limitation of the current approach, where MP init depends on DRAM being available. If you move MP init earlier, you can get approximately 200ms of savings: on Apollolake we have a prototype where MP init happens in the bootblock, and that already reduces boot time by some 200ms.

> Since, very soon, you'll run into shared HW resources, and then you'll
> need to implement semaphores, atomic operations and God knows what!?

Fortunately, divine powers have nothing to do with it. Atomic operations are already implemented, and spinlocks are in as well. What other major issues do you see, Zoran?

thanks
Andrey
Re: [coreboot] Add coreboot storage driver
Hi,

On 02/13/2017 06:05 AM, Peter Stuge wrote:
> Andrey Petrov wrote:
>> Nowadays we see firmware getting more complicated.
>
> Sorry, but that's nonsense. Indeed MSFT is pushing more and more
> complicated requirements into the EFI/UEFI ecosystem, but that's their
> problem, not a universal one.

I wish it were only MSFT. Chrome systems do a lot of work early on that is CPU intensive, and there is waiting on secure hardware as well. Then there is the IO problem that the original patch tries to address.

> Your colleague wants to speed up boot time by moving storage driver code
> from the payload into coreboot proper, but in fact this goes directly
> against the design goals of coreboot, so here's a refresh:
>
> * coreboot has *minimal* platform (think buses, not peripherals)
>   initialization code
> * A payload does everything further.

This is a nice and clean design, no doubt about it. However, it is serial. Another design goal of coreboot is to be fast. Do "be fast" and "be parallel" conflict?

>> For example Apollolake is struggling to finish firmware boot with all
>> the whistles and bells (vboot, tpm and our friendly, ever-vigilant TXE)
>> under one second. Interestingly, a great deal of the tasks that need to
>> be done are not even computation-bound. They are IO bound.
>
> How much of that time is spent in the FSP?

FSP is about 250ms grand total. However, that is not all that great if you compare it to the IO needed to load the kernel over SDHCI (130ms) and to initialize the eMMC device itself (100-300ms). Not to mention other IO-bound tasks that could very well be started in parallel early.

>> how to create infrastructure to run code in parallel in such early stage
>
> I think you are going in completely the wrong direction. You want a
> scheduler, but that very clearly does not belong in coreboot.

Actually, I am just interested in getting things to boot faster. It can be scheduling or parallel execution on secondary HW threads.

>> Shall we just add "run this (mini) stage on this core" concept? Or
>> shall we add tasklet/worklet structures
>
> Neither.
> The correct engineering solution is very simple - adapt FSP to fit into
> coreboot, instead of trying to do things the other way around.

FSP definitely needs a lot of love to be more usable; I couldn't agree more. But if hardware needs to be waited on and your initialization process is serial, you will end up wasting time on polling while you could be doing something else.

> This means that your scheduler lives in the payload. There is already
> precedent - SeaBIOS also already implements multitasking.

Unfortunately, that is way too late to even make a dent in overall boot time.

Andrey
Re: [coreboot] Add coreboot storage driver
Andrey Petrov wrote:
> We are considering adding early parallel code execution in coreboot.
> We need to discuss how this can be done.

No - first we need to discuss *if* this should be done.

> Nowadays we see firmware getting more complicated.

Sorry, but that's nonsense. Indeed MSFT is pushing more and more complicated requirements into the EFI/UEFI ecosystem, but that's their problem, not a universal one.

Your colleague wants to speed up boot time by moving storage driver code from the payload into coreboot proper, but in fact this goes directly against the design goals of coreboot, so here's a refresh:

* coreboot has *minimal* platform (think buses, not peripherals) initialization code
* A payload does everything further.

> For example Apollolake is struggling to finish firmware boot with all
> the whistles and bells (vboot, tpm and our friendly, ever-vigilant TXE)
> under one second. Interestingly, a great deal of the tasks that need to
> be done are not even computation-bound. They are IO bound.

How much of that time is spent in the FSP?

> scheduler
..
> how to create infrastructure to run code in parallel in such early stage

I think you are going in completely the wrong direction. You want a scheduler, but that very clearly does not belong in coreboot.

> Shall we just add "run this (mini) stage on this core" concept?
> Or shall we add tasklet/worklet structures

Neither.

The correct engineering solution is very simple - adapt FSP to fit into coreboot, instead of trying to do things the other way around. This means that your scheduler lives in the payload. There is already precedent - SeaBIOS also already implements multitasking.

> this is just a tip of iceberg

That's exactly why it has no place within coreboot, but belongs in a payload.

//Peter
Re: [coreboot] Add coreboot storage driver
Hello Andrey,

> Does any of that make sense? Perhaps somebody thought of this before?
> Let's see what may be other ways to deal with this challenge.

No, it does not. What you are proposing, in fact, is to turn the boot loader into a quasi-OS with HW multithreading and a scheduler, sans MMU (actually dealing with two, for now, HW threads). And you have chosen coreboot to implement this.

I suggest that what you are proposing first be done in a true BIOS, so IBVs can work on this proposal and see how BIOS boot-up time will improve (by this parallelism). Besides, BIOS is much slower (UEFI BIOSes boot in the range of 30 seconds) and should be faster. And... BIOS is closed source, thus there is a major business task which should go there: project management and some few millions of $$ USD to be spent on this project, paid for by INTEL and INTEL BIOS vendors. ;-)

Besides, one only knows what the next challenge is (repeating your words: *"...this is just the tip of the iceberg and there are packs of other issues we would need to deal with."*). Since, very soon, you'll run into shared HW resources, and then you'll need to implement semaphores, atomic operations and God knows what!?

My two cents of thinking (after all, this is only me, Zoran, an independent self-contributor),
Zoran

On Mon, Feb 13, 2017 at 8:19 AM, Andrey Petrov wrote:
> Hi there,
>
> tl;dr:
> We are considering adding early parallel code execution in coreboot. We
> need to discuss how this can be done.
>
> Nowadays we see firmware getting more complicated. At the same time, CPUs
> do not necessarily catch up. Furthermore, recent increases in performance
> can be largely attributed to parallelism and stuffing more cores on the
> die rather than to sheer core computing power. However, firmware typically
> runs on just one CPU and is effectively barred from all the parallelism
> goodies available to OS software.
> For example Apollolake is struggling to finish firmware boot with all the
> whistles and bells (vboot, tpm and our friendly, ever-vigilant TXE) under
> one second. Interestingly, a great deal of the tasks that need to be done
> are not even computation-bound. They are IO bound. In the case of SDHCI
> below, it is possible to train the eMMC link to switch from the default
> low-frequency single data rate (sdr50) mode to a high-frequency dual data
> rate (hs400) mode. This link training increases eMMC throughput by a
> factor of 15-20. As a result, the time it takes to load the kernel in
> depthcharge goes down from 130ms to 10ms. However, the training sequence
> requires constant and frequent CPU attention. As a result, it doesn't make
> any sense to try to turn on the higher-frequency modes, because you don't
> get any net win. We also experimented with starting the work in the
> current MPinit code. Unfortunately it starts pretty late in the game and
> we do not have enough parallel time to reap a meaningful benefit.
>
> In order to address this problem we can do the following things:
> 1. Add a scheduler, early or not
> 2. Add early MPinit code
>
> For [1] I am aware of one scheduler discussion in 2013, but that was a
> long time ago and things may have moved a bit. I do not want to be a
> necromancer and reanimate an old discussion, but does anybody see it as a
> useful/viable thing to do?
>
> For [2] we have been working on a prototype for Apollolake that does
> pre-memory MPinit. We've got to a stage where we can run C code on another
> core before DRAM is up (please do not try that at home, because you'd need
> custom experimental ucode). However, there are many questions about what
> model to use and how to create the infrastructure to run code in parallel
> at such an early stage. Shall we just add a "run this (mini) stage on this
> core" concept? Or shall we add tasklet/worklet structures that would allow
> code to keep running and, when the migration to DRAM happens, have the
> infrastructure take care of managing context and potentially resuming it?
> One problem is that code running with CAR needs to stop by the time the
> system is ready to tear down CAR and migrate to DRAM. We don't want to
> delay that by waiting on such a task to complete. At the same time, a
> certain task may have largely fluctuating run times, so you would want to
> continue it. It may actually be possible to do just that, if we use the
> same address space for CAR and DRAM. But come to think of it, this is just
> the tip of the iceberg and there are packs of other issues we would need
> to deal with.
>
> Does any of that make sense? Perhaps somebody thought of this before?
> Let's see what may be other ways to deal with this challenge.
>
> thanks
> Andrey
>
> On 01/25/2017 03:16 PM, Guvendik, Bora wrote:
>> Port the sdhci and mmc driver from depthcharge to coreboot. The purpose
>> is to speed up boot time by starting storage initialization on another
>> CPU in parallel. On the Apollolake systems we checked, we found that the
>> CPU can take up to 300ms sending CMD1s to the HW, so we can avoid this
>> delay by parallelizing.
>>
>> - Why not add this
Re: [coreboot] Add coreboot storage driver
Hi there,

tl;dr: We are considering adding early parallel code execution in coreboot. We need to discuss how this can be done.

Nowadays we see firmware getting more complicated. At the same time, CPUs do not necessarily catch up. Furthermore, recent increases in performance can be largely attributed to parallelism and stuffing more cores on the die rather than to sheer core computing power. However, firmware typically runs on just one CPU and is effectively barred from all the parallelism goodies available to OS software.

For example, Apollolake is struggling to finish firmware boot with all the whistles and bells (vboot, tpm and our friendly, ever-vigilant TXE) under one second. Interestingly, a great deal of the tasks that need to be done are not even computation-bound. They are IO bound. In the case of SDHCI below, it is possible to train the eMMC link to switch from the default low-frequency single data rate (sdr50) mode to a high-frequency dual data rate (hs400) mode. This link training increases eMMC throughput by a factor of 15-20. As a result, the time it takes to load the kernel in depthcharge goes down from 130ms to 10ms. However, the training sequence requires constant and frequent CPU attention. As a result, it doesn't make any sense to try to turn on the higher-frequency modes, because you don't get any net win. We also experimented with starting the work in the current MPinit code. Unfortunately it starts pretty late in the game and we do not have enough parallel time to reap a meaningful benefit.

In order to address this problem we can do the following things:
1. Add a scheduler, early or not
2. Add early MPinit code

For [1] I am aware of one scheduler discussion in 2013, but that was a long time ago and things may have moved a bit. I do not want to be a necromancer and reanimate an old discussion, but does anybody see it as a useful/viable thing to do?

For [2] we have been working on a prototype for Apollolake that does pre-memory MPinit.
We've got to a stage where we can run C code on another core before DRAM is up (please do not try that at home, because you'd need custom experimental ucode). However, there are many questions about what model to use and how to create the infrastructure to run code in parallel at such an early stage. Shall we just add a "run this (mini) stage on this core" concept? Or shall we add tasklet/worklet structures that would allow code to keep running and, when the migration to DRAM happens, have the infrastructure take care of managing context and potentially resuming it? One problem is that code running with CAR needs to stop by the time the system is ready to tear down CAR and migrate to DRAM. We don't want to delay that by waiting on such a task to complete. At the same time, a certain task may have largely fluctuating run times, so you would want to continue it. It may actually be possible to do just that, if we use the same address space for CAR and DRAM. But come to think of it, this is just the tip of the iceberg and there are packs of other issues we would need to deal with.

Does any of that make sense? Perhaps somebody thought of this before? Let's see what may be other ways to deal with this challenge.

thanks
Andrey

On 01/25/2017 03:16 PM, Guvendik, Bora wrote:
> Port the sdhci and mmc driver from depthcharge to coreboot. The purpose is
> to speed up boot time by starting storage initialization on another CPU in
> parallel. On the Apollolake systems we checked, we found that the CPU can
> take up to 300ms sending CMD1s to the HW, so we can avoid this delay by
> parallelizing.
>
> - Why not add this parallelization in the payload instead?
> There is potentially more time to parallelize things in coreboot; payload
> execution is much faster, so we don't get much parallel execution time.
>
> - Why not send CMD1 once in coreboot to trigger power-up and let the HW
> initialize using only one CPU?
> The JEDEC spec requires the CPU to keep sending CMD1s while the hardware
> is busy (section 6.4.3).
> We tested on real-world hardware, and it indeed didn't work with a single
> CMD1.
>
> - Why did you port the driver from depthcharge?
> I wanted to use a driver that is proven, to avoid bugs. It is also easier
> to apply patches back and forth.
>
> https://review.coreboot.org/#/c/18105
>
> Thanks
> Bora