Of all the gin joints in all the towns in all the world, Laszlo Ersek had to 
walk into mine at 15:10:12 on Tuesday 28 January 2014 and say:

> On 01/28/14 21:55, Bill Paul wrote:
> > I think part of my problem is that I don't quite understand how the
> > firmware image volumes work either. You start out with one OVMF.fd
> > image, which contains all of the firmware in compressed form. I'm
> > assuming this image is mapped by QEMU into the address space such
> > that there's some initial bootstrap code placed at the reset vector
> > so that the CPU hits it at power-up/reset, and from there it extracts
> > the contents into RAM.
> 
> Correct.
> 
> Regarding the on-disk format of the flash device, please see the commit
> message
> 
>   https://github.com/tianocore/edk2/commit/b36f701d
> 
> (the "OVMF.fd after" part).
> 
> It is mapped just below 4GB (*) by qemu. See pc_system_firmware_init()
> in file "hw/i386/pc_sysfw.c". We mostly care about
> pc_system_flash_init() there.
> 
> (*) The size of OVMF.fd is normally 2MB for debug builds, and 1MB for
> release builds. You can ask for the other size in both cases with -D
> FD_SIZE_1MB and -D FD_SIZE_2MB. (See
> <https://github.com/tianocore/edk2/commit/8184a764>.)

The reason I use 2MB when I build is that I discovered, back when I first 
began experimenting with OVMF last year, that if I included secure boot 
support, the IA32 image was too large to fit in 1MB and the build would fail 
because of this. (I thought this was odd: I would have expected the X64 image 
to be the culprit, but there you go.) I used FD_SIZE_2MB as a workaround for 
this.
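
For reference, the "mapped just below 4GB" placement quoted above works out as 
follows. This is only an arithmetic sketch: it assumes the flash image ends 
exactly at the 4GB boundary, and the helper name flash_base() is made up for 
illustration (it is not a QEMU or edk2 function).

```python
# Sketch: where a flash image that ends exactly at 4GB would start.
# Assumption: the image is mapped so that its last byte sits just below 4GB.
FOUR_GB = 1 << 32
MB = 1 << 20


def flash_base(image_size):
    """Return the guest-physical base of an image mapped to end at 4GB."""
    return FOUR_GB - image_size


for size_mb in (1, 2):
    base = flash_base(size_mb * MB)
    print(f"{size_mb}MB image: {base:#010x}..{FOUR_GB - 1:#010x}")
```

With these assumptions a 2MB image would start at 0xffe00000 and a 1MB image 
at 0xfff00000.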

> The reset vector code and the SEC code are uncompressed.
> 
> OVMF's reset vector is located in OvmfPkg/ResetVector. It reuses the
> "generic" edk2 reset vector when SEC+PEI are 32-bit (Ia32). When SEC+PEI
> are 64-bit (X64), then the reset vector sets up initial page tables too.
> (We used to keep the prebuilt page tables too in read-only flash, but
> KVM didn't really like to have them there, because it wanted to write
> the Accessed bits in the page table entries, even if they were all
> pre-set to 1. I can't recall the exact circumstances, but I believe it
> was only a problem when nested paging was supported and enabled on the
> host. See <https://github.com/tianocore/edk2/commit/c90e37b5>.)
> 
> The SEC code is entered at SecCoreStartupWithStack(), called from
> "OvmfPkg/Sec/X64/SecEntry.S". The C code is in "SecMain.c".
> 
> It sets up some temporary stack and heap near SEC_TOP_OF_STACK,
> decompresses the one FV FFS (= Firmware Volume / Firmware File System)
> file to a temporary RAM buffer (starting at 9MB) from the flash (located
> below 4GB).
> 
> It finds firmware volume headers in the decompressed output. One chunk
> corresponds to PEIFV, and the other corresponds to DXEFV. These are then
> copied to their final places.
> 
> Later on control is transferred to PEI. The last phase of PEI will key
> off the S3 status (cold boot or resume). In the former case, it will
> start DXE. In the latter case, it will jump to the OS's resume vector.
> 
> At S3 resume the reset vector and the SEC code run just the same from
> the flash below 4GB. The SEC code will determine if we're cold booting
> or resuming from S3 sleep. In the former case, see above. In the latter
> case, we won't decompress anything. First, we won't need DXE at all.
> Second, we'll need PEI, but that's been decompressed before, and
> protected from the OS as ACPI NVS, so we'll just jump to it.

Okay... I guess I see now. This is starting to sound familiar.
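
To restate the quoted flow in one place, here is a small sketch of the two 
paths. The step names are descriptive labels taken from the description above, 
not edk2 function names, and boot_steps() is a hypothetical helper.

```python
# Sketch of the cold-boot vs. S3-resume flow described in the quoted text.
# The strings are descriptive labels only, not edk2 identifiers.
def boot_steps(s3_resume):
    """Return the ordered list of phases for a cold boot or an S3 resume."""
    steps = ["reset vector (from flash)", "SEC (from flash)"]
    if not s3_resume:
        # Cold boot: SEC decompresses the FV file and places PEIFV and DXEFV.
        steps.append("decompress FV file to temporary RAM buffer")
        steps.append("copy PEIFV and DXEFV to their final places")
    # On resume nothing is decompressed: PEIFV is still in RAM, having been
    # protected from the OS as ACPI NVS, so SEC jumps straight to it.
    steps.append("PEI")
    # The last phase of PEI keys off the S3 status.
    steps.append("jump to OS resume vector" if s3_resume else "DXE")
    return steps


print(boot_steps(s3_resume=False))
print(boot_steps(s3_resume=True))
```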
 
> > What I don't know is just where everything ends up in RAM.
> 
> Well in my versions of the patchset :), ie. up to v3, the series used to
> start with a text file documenting the final RAM layout. Of course
> that's completely obsolete now, so I'm not giving you any link lest it
> confuse you.
> 
> We're between v4 and v5 now (an initial sequence from v4 has been
> pushed, and AFAIK Jordan is about to post v5 of the rest). You *can*
> glean the final layout from the FDF files (at the end of Jordan's
> series), precisely from the spot where you've been looking anyway:
> 
>   https://github.com/jljusten/edk2/blob/ovmf-s3/OvmfPkg/OvmfPkgX64.fdf
> 
> > [FD.MEMFD]
> > BaseAddress = 0x800000
> > Size = 0x800000
> > ErasePolarity = 1
> > BlockSize = 0x10000
> > NumBlocks = 0x80
> 
> So, we're basing it at 8MB, and the size is also 8MB. Within that range,
> with relative start addresses:
> > 0x000000|0x006000
> > gUefiOvmfPkgTokenSpaceGuid.PcdOvmfSecPageTablesBase|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfSecPageTablesSize
> 
> These are the initial page tables built by the reset vector code,
> identity-mapping the first 4GB. It comprises six 4K pages. The first two
> pages host the page directories, the four other pages host the page
> tables. The PTEs in there map 4GB with 2MB pages. (If I recall
> correctly... 4GB/2MB == 2048 PTEs needed, 4*4KB=16384 bytes available
> for PTEs, 16384/2048==8 bytes per PTE.)
> 
> So this is at 0x800000 + 0x000000 == 8MB.
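
The back-of-the-envelope numbers in the quoted paragraph can be checked with a 
few lines. This sketch only reproduces that arithmetic (six 4K pages, 4GB 
mapped with 2MB pages); the variable names are made up for illustration.

```python
# Sketch: re-run the page-table arithmetic from the quoted paragraph.
KB, MB, GB = 1 << 10, 1 << 20, 1 << 30
PAGE = 4 * KB

mapped = 4 * GB           # identity-mapped range
large_page = 2 * MB       # each entry maps one 2MB page
entry_pages = 4           # the four pages that hold the entries
upper_pages = 2           # the first two pages, per the description

entries_needed = mapped // large_page              # 2048
bytes_available = entry_pages * PAGE               # 16384
bytes_per_entry = bytes_available // entries_needed
total_size = (upper_pages + entry_pages) * PAGE    # should match 0x006000

print(entries_needed, bytes_available, bytes_per_entry, hex(total_size))
```

The result is 8 bytes per entry, and the six pages add up to the 0x006000 
size given in the FDF.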
> 
> > 0x006000|0x001000
> > gUefiOvmfPkgTokenSpaceGuid.PcdOvmfLockBoxStorageBase|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfLockBoxStorageSize
> 
> This chunk (1 page) will be needed for internal purposes. Some data to
> save across S3 sleep are prepared during DXE, before booting the OS.
> Those data are separately allocated and saved in ACPI NVS regions (as
> high as possible below the end of 32-bit RAM, ie. below the 32-bit PCI
> hole), and they are linked into this small administrative range (which
> hosts basically a linked list of pointers and sizes).
> 
> Range: 8MB+24KB to 8MB+28KB.
> 
> > 0x010000|0x008000
> > gUefiOvmfPkgTokenSpaceGuid.PcdOvmfSecPeiTempRamBase|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfSecPeiTempRamSize
> 
> This area hosts the initial (temporary) heap and stack for SEC and PEI
> that I mentioned above. After PEI detects the size of available RAM
> later on, it informs the PEI core about it ("installs permanent system
> memory"), and then this heap and stack are dynamically relocated higher.
> 
> Range: 8MB+64KB to 8MB+96KB.
> 
> > 0x018000|0x008000
> > gUefiOvmfPkgTokenSpaceGuid.PcdS3AcpiReservedMemoryBase|gEfiIntelFrameworkModulePkgTokenSpaceGuid.PcdS3AcpiReservedMemorySize
> 
> This range is not used for anything (other than reserving it from the
> OS) during cold boot. During S3 resume, the temporary stack and heap are
> *not* migrated to some dynamic place in the full system memory (because
> that's already used by the OS). Instead, the "permanent" PEI stack and
> heap are relocated to this region (which has been kept away from the
> OS).
> 
> Range: 8MB+96KB to 8MB+128KB.
> 
> > 0x020000|0x0E0000
> > gUefiOvmfPkgTokenSpaceGuid.PcdOvmfPeiMemFvBase|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfPeiMemFvSize
> > FV = PEIFV
> 
> This region hosts the PEI modules (after decompression), ie. it's the
> final place for PEIFV.
> 
> Range: 8MB+128KB to 9MB.

Yes, that part I figured out. BTW, I misstated the size of the ACPI NVS 
region. I had said it was 2MB, but really it ends up being about 900KB. (Maybe 
I read it wrong when I looked at the memory map the first time.)
 
> > 0x100000|0x700000
> > gUefiOvmfPkgTokenSpaceGuid.PcdOvmfDxeMemFvBase|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfDxeMemFvSize
> > FV = DXEFV
> 
> This region hosts the DXE modules (after decompression), ie. it's the
> final place for DXEFV.
> 
> Range: 9MB to 16MB.
> 
> For S3 purposes, we must reserve all of these as ACPI NVS, except the
> last one (ie. DXE modules), because DXE is not run/reached during S3
> resume.
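
The whole MEMFD layout walked through above can be tabulated mechanically. The 
sketch below copies the offset|size pairs from the quoted FDF excerpt and 
turns them into absolute ranges; the short region names, the REGIONS table and 
absolute_ranges() are illustrative only, and the last column just encodes the 
"everything except DXEFV is reserved as ACPI NVS" statement.

```python
# Sketch: absolute address ranges for the [FD.MEMFD] regions quoted above.
MEMFD_BASE = 0x800000
MEMFD_SIZE = 0x800000

# (label, offset, size, reserved as ACPI NVS for S3?)
REGIONS = [
    ("SEC page tables",    0x000000, 0x006000, True),
    ("LockBox storage",    0x006000, 0x001000, True),
    ("SEC/PEI temp RAM",   0x010000, 0x008000, True),
    ("S3 reserved memory", 0x018000, 0x008000, True),
    ("PEIFV",              0x020000, 0x0E0000, True),
    ("DXEFV",              0x100000, 0x700000, False),
]


def absolute_ranges(base=MEMFD_BASE, size=MEMFD_SIZE, regions=REGIONS):
    """Return (label, start, end, nvs) tuples and sanity-check the layout."""
    result = []
    previous_end = base
    for label, offset, length, nvs in regions:
        start = base + offset
        end = start + length
        # Regions must be in order, must not overlap, and must fit in MEMFD.
        assert start >= previous_end, f"{label} overlaps the previous region"
        assert end <= base + size, f"{label} runs past the end of MEMFD"
        result.append((label, start, end, nvs))
        previous_end = end
    return result


for label, start, end, nvs in absolute_ranges():
    kind = "ACPI NVS" if nvs else "not reserved for S3"
    print(f"{label:<20} {start:#010x}..{end:#010x}  {kind}")
```

Note that the offsets between 0x007000 and 0x010000 are not assigned to any 
region in the quoted excerpt, so the table has a gap there.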
> 
> > You have sections marked BS_Code, BS_Data, RT_Code, RT_Data and
> > LoaderCode. Is LoaderCode the guts of the firmware?
> 
> Hmmm I don't think so. Type "EfiLoaderCode" normally stands for "The
> code portions of a loaded application. (Note that UEFI OS loaders are
> UEFI applications.)" -- see Table 25 in the UEFI spec.

Yes, I'm painfully familiar with that part. :)
 
> For example, the "grub2-efi" binary qualifies.
> 
> The OS can release/repurpose ranges of this type (see table 26).
> 
> > Are you saying the PEIFV area contains yet more guts?
> 
> Internally, yes (it contains a bunch of PEI drivers), but the OS doesn't
> need to know.
> 
> Same for the DXEFV range.
> 
> > And that at the time you have to decide where to put it, you don't
> > know how much RAM is available yet and/or the code isn't relocatable?
> 
> Correct. When we decompress the "nameless" FV FFS file in SEC, and copy
> PEIFV and DXEFV to their "final" places from the decompressed output, we
> don't yet know how much RAM is available. We only determine that in one
> of the PEI modules (OvmfPkg/PlatformPei/), which is code located inside
> PEIFV.
> 
> (At which point we (will) also install the "permanent PEI memory",
> triggering the temporary-to-permanent stack/heap migration.)
> 
> In theory, we could perhaps fetch the amount of RAM from the CMOS in SEC
> too, and use an 8MB range somewhere below the PCI hole rather than at
> fixed 8MB..16MB.
> 
> We certainly need Jordan to chime in here. The base address @ 8MB dates
> back to a time when I wasn't around yet. Moving it to the other end of
> guest RAM could regress stuff that I'm not aware of.
> 
> >> How large a contiguous range would you need from 1MB upwards?
> >> (Because the address that we'd shift this up to would likely directly
> >> impact the minimum qemu guest memory requirements.)
> > 
> > Unfortunately I'm not sure I have a good answer to that question. We
> > typically load the VxWorks image at 0x408000, and I think out of the
> > box the 32-bit build needs about 300MB. (Yes, I know: that doesn't
> > sound very embedded, does it.)
> 
> Ouch!
> 
> > But I don't think this is the right way to approach the issue either.
> > Something tells me there's a better way to do what you're trying to
> > do, but I don't understand enough about the problem yet to offer an
> > alternate solution.
> 
> I can't of course *prove* that what OVMF does is the best way, but I'll
> note that you load the VxWorks kernel at a fixed address, with a fixed
> size requirement (same as we do in OVMF, basically), even though the
> VxWorks kernel is higher up on the abstraction ladder.

The VxWorks kernel image is always linked for a user-defined address. You can 
change that address, but it must be done at build time. It's not runtime-
relocatable. The problem is that changing the link address is not all that's 
required to move the image: there are other dependencies in the code itself 
which don't automatically get updated when you change the entry point address. 
So while you can customize it, it's not exactly trivial. Again, the startup 
code itself has this hard-coded notion of memory starting at 0x100000, which 
doesn't change if you shift the load address. There's probably more I can't 
think of right now.
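
To make the collision concrete, here is a rough overlap check. It treats the 
"about 300MB" figure as the extent of memory the image expects to own starting 
at its load address, which is an assumption made only for this illustration, 
and overlaps() is a hypothetical helper.

```python
# Sketch: does a fixed-address image collide with OVMF's fixed 8MB..16MB range?
MB = 1 << 20


def overlaps(a_start, a_end, b_start, b_end):
    """True if the half-open ranges [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


image_start = 0x408000                # typical VxWorks load address
image_end = image_start + 300 * MB    # approximate, see the assumption above
fw_start, fw_end = 8 * MB, 16 * MB    # MEMFD range from the FDF

print(overlaps(image_start, image_end, fw_start, fw_end))
```

Under that assumption the two ranges intersect, which is the hole in the 
middle of the image's expected memory that this thread is about.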

> I don't think we should even ask the question "who's right" here.
> 
> For example, I sometimes glance at #linaro-enterprise on FreeNode. The
> Aarch64 Linux kernel being discussed there seems to put other
> (different) address restrictions on the UEFI firmware that loads it
> (<http://irclogs.linaro.org/2014/01/28/%23linaro-enterprise.html>).
> 
> This suggests that firmware, OS boot loader, and OS should find some
> understanding, and that this understanding will be arbitrary (because it
> can't be really justified by anything else than "well this is how our OS
> works").
> 
> I assume that you boot, from under OVMF, a VxWorks-specific boot loader
> (which is a UEFI application), which in turn pulls in the 300MB kernel
> image at 0x408000. Is that correct?

Yes, I wrote a loader specifically for VxWorks (both IA32 and X64). VxWorks 
images are typically ELF files, though you can generate flat binary files too. 
Right now the loader expects a flat binary file, so it has to know the load 
address ahead of time. (You can change this by recompiling both VxWorks and 
the loader. I do plan to add support for directly loading an ELF image 
eventually so that the load address can be determined from the ELF header, but 
as I said before, that alone isn't enough to move it.)

> Maybe the boot loader could "simply" call gBS->AllocatePages()  with the
> appropriate address hints instead.

I don't understand what this buys me. It won't get rid of the memory hole 
caused by PEIFV, which is the real problem.

> Or, if loading occurs after ExitBootServices(), then the initial runtime
> code could iterate over the UEFI memory map, and find a sufficiently
> large contiguous range that consists of EfiConventionalMemory only (plus
> whatever types Table 26 allows to be freed), and load the kernel there.

Again, the load address alone is not the only issue. Just trust me when I say 
it's not as simple as that. If VxWorks worked like other OSes, it could just 
gather up a list of all available pages, put them into the memory pool, and go 
to town. But that's not how it works.
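
For concreteness, the memory-map scan suggested in the quoted paragraph could 
look roughly like the sketch below. It is not loader code: the memory map is 
modeled as plain (type, start, pages) tuples, the sample map is invented, and 
largest_free_range() is a hypothetical helper. It also only addresses the load 
address, not the other dependencies mentioned above.

```python
# Sketch: find the largest contiguous run of usable memory in a memory map.
# Descriptors are modeled as (type_name, physical_start, number_of_pages).
PAGE_SIZE = 4096


def largest_free_range(memory_map, usable_types=("EfiConventionalMemory",)):
    """Return (start, length_in_bytes) of the largest contiguous usable run."""
    best = (0, 0)
    run_start = run_end = None
    for mem_type, start, pages in sorted(memory_map, key=lambda d: d[1]):
        end = start + pages * PAGE_SIZE
        if mem_type not in usable_types:
            run_start = run_end = None        # an unusable range ends the run
            continue
        if run_end == start:
            run_end = end                     # adjacent: extend the current run
        else:
            run_start, run_end = start, end   # gap or first entry: new run
        if run_end - run_start > best[1]:
            best = (run_start, run_end - run_start)
    return best


# Invented sample map, for illustration only.
sample_map = [
    ("EfiConventionalMemory", 0x00100000, 0x700),   # 1MB..8MB
    ("EfiACPIMemoryNVS",      0x00800000, 0x100),   # 8MB..9MB
    ("EfiConventionalMemory", 0x00900000, 0x3700),  # 9MB..64MB
    ("EfiConventionalMemory", 0x04000000, 0x4000),  # 64MB..128MB
]

start, length = largest_free_range(sample_map)
print(f"largest run starts at {start:#x}, {length // (1 << 20)}MB long")
```

Additional reclaimable types could be passed through usable_types if the 
loader is allowed to free them.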

> >>> It is possible to tweak things in VxWorks to avoid this problem, but
> >>> it's a pain. It's also not something we typically encounter on real
> >>> hardware.
> >> 
> >> I don't think we'd like to hard-wire a *very* different base address
> >> statically. Maybe we could add a build option, but that only moves
> >> the pain around.
> >> 
> >> Re it being different from real hardware, the explanation is that
> >> most of OVMF's modules are stored compressed in the flash, and are
> >> decompressed to (and then run from) RAM at startup. I assume on real
> >> hardware the firmware simply runs from flash. (Hm, I guess it could
> >> be shadowed into RAM too, but I have no data about what addresses.)
> > 
> > I think it equally likely that you'd have compressed flash images on
> > real hardware too. (We actually offer a romCompressed option with
> > VxWorks, where there's no firmware on the system: there's just
> > VxWorks, and it disgorges itself into RAM to execute. There is also a
> > romResident option if you have enough flash/ROM to hold the whole
> > image and don't mind the performance hit.)
> > 
> > But if it's a question of just having the executable code still around
> > somewhere and you can't manage that with compressed images,
> 
> (we can)
> 
> > why not create an uncompressed build option too? Yes I know it would
> > take up some more address space, but that may be the only way to make
> > it work.
> 
> If we kept the PEI and DXE modules uncompressed in flash, then:
> - we could indeed execute them directly from below 4GB, probably,
> - but we'd still need to reserve other areas,

I don't have a problem with the other areas being reserved currently.

> - and the flash size would grow significantly. In the past, concerns
>   were raised on the mailing list about raising the default flash size
>   from 1MB to 2MB (I wasn't (and am not) aware why), but I do think such
>   a jump in size would be concerning again.

I guess it depends on the use case. I see a lot of boards with much more than 
1 or 2MB of NVRAM. There are people who design custom hardware with such small 
parts, but they're not the sort of systems that you would expect to have UEFI 
on. The smallest system I know of that has UEFI firmware is the Galileo/Quark 
X1000, and I'm pretty sure it has more than 2MB of flash.

> >>>> Additionally, after the full S3 support series is committed, further
> >>>> code will be added to honor the case when the user disables S3 on
> >>>> the qemu command line ("-global PIIX4_PM.disable_s3=1"). Then the
> >>>> memory allocation in question will be qualified as Boot Services
> >>>> Data (rather than ACPI NVS), and the OS will be able to drop it
> >>>> after transitioning to runtime.
> >>> 
> >>> It appears I need a newer version of QEMU for that option:
> >>> 
> >>> root@core:/home/wpaul/ovmf # qemu-system-x86_64 -global PIIX4_PM.disable_s3=1
> >>> qemu-system-x86_64: Property '.disable_s3' not found
> >> 
> >> Correct. This property was added in
> >> 
> >> commit 459ae5ea5ad682c2b3220beb244d4102c1a4e332
> >> Author: Gleb Natapov <g...@redhat.com>
> >> Date:   Mon Jun 4 14:31:55 2012 +0300
> >> 
> >>     Add PIIX4 properties to control PM system states.
> >> 
> >> first released in v1.2.0.
> >> 
> >> I searched the FreeBSD ports repo for qemu, and it seems that the
> >> "qemu-devel" package is at 1.7.0. (Not sure if you can easily get it
> >> in 9.1-RELEASE.)
> > 
> > I'm sure I can shoehorn it in somehow. :)
> 
> Please note though that OVMF code to actually honor this setting will
> only be written/added once the "basic" S3 functionality is complete.
> (Which it is not, for the time being.)
> 
> Naturally, you can try to convince Jordan to implement that ASAP :)
> 
> (My v3 contained those patches at the end of the series. Jordan has
> taken over for v4 and v5, among other things changing the memory map
> significantly (for the better, I have no problems admitting that), but
> we've also split off / postponed honoring the disable_s3 property for a
> future, separate series.)
> 
> >>> That aside, this would be an acceptable compromise, at least until
> >>> VxWorks supports S3 resume on the Intel architecture. :)
> >>> 
> >>> I still think the placement of the PEIFV block is much less than
> >>> ideal, but for the time being I can deal with it.
> >> 
> >> Alternatively, please propose the lowest address that would work out
> >> of the box for your use case, and then Jordan could decide if it was
> >> reasonable to re-wire the FDFs with that address.
> > 
> > I don't think assuming 300MB of RAM is reasonable
> 
> Fully agreed :)
> 
> > so I don't think that will work. Maybe once I've read more of the code
> > I can suggest a better idea.
> 
> The in-tree code that fetches the amount of RAM from CMOS is in
> "OvmfPkg/PlatformPei/MemDetect.c", and as I said it runs in PEI.
> 
> The decompression code is in "OvmfPkg/Sec/SecMain.c". It runs in SEC,
> before PEI.
> 
> I think any proposed changes should be synchronized with Jordan's S3
> series (v5 is coming soon; see the "ovmf-s3" branch reference above),
> because I believe it's going to rework some of the MemDetect /
> reservation bits.
> 
> Thanks,
> Laszlo

I'll have to go over more of this in more detail later. I greatly appreciate 
the information you've provided though. I'll let you know if I have a 
brainstorm or something. :)

-Bill

-- 
=============================================================================
-Bill Paul            (510) 749-2329 | Senior Member of Technical Staff,
                 wp...@windriver.com | Master of Unix-Fu - Wind River Systems
=============================================================================
   "I put a dollar in a change machine. Nothing changed." - George Carlin
=============================================================================

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/edk2-devel
