On 03/16/15 15:15, Gabriel L. Somlo wrote: > This document covers the generic portions of fw_cfg as well as > the x86/x86-64 architecture specific components. > > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com> > Signed-off-by: Gabriel Somlo <so...@cmu.edu> > --- > > This is a resubmission of Jordan's patch from back when: > http://lists.gnu.org/archive/html/qemu-devel/2011-04/msg00238.html > My own signed-off-by may or may not belong in the commit log, not > quite 100% sure what the etiquette is. Please apply with or without it :) > > Thanks, > Gabriel > > docs/specs/fw_cfg.txt | 167 > ++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 167 insertions(+) > create mode 100644 docs/specs/fw_cfg.txt > > diff --git a/docs/specs/fw_cfg.txt b/docs/specs/fw_cfg.txt > new file mode 100644 > index 0000000..7d156b7 > --- /dev/null > +++ b/docs/specs/fw_cfg.txt > @@ -0,0 +1,167 @@ > + > += Hardware Interface = > + > +== Index Port == > +* Two bytes in width (guest native endianness)
Hmmm. I'm not sure if Jordan's patch should be taken separately, then updated in a separate patch, or if an updated version should be posted as one squashed patch. In any case, I'd like to see the term "selector register" mentioned here (as a synonym, or even replacing "index port"). We have "Documentation/devicetree/bindings/arm/fw-cfg.txt" in the kernel tree now, and wording should be consistent. Second, the selector register's endianness is not guest native *in general*. The commit message says "generic portions of fw_cfg as well as the x86/x86-64 architecture specific components", so it's not easy to decide which this line belongs to. Namely, on x86 the selector register is mapped as a 16-bit wide, little endian ioport (see fw_cfg_comb_mem_ops and fw_cfg_io_realize()), and in that sense "guest native endianness" is correct. However, on arm and ppc (and whatever else uses fw_cfg_init_mem*()), the selector register is memory mapped, and it takes the keys in big endian representation (see fw_cfg_ctl_mem_ops and fw_cfg_mem_realize()). For the arm target, I think big endian cannot be called "guest native endianness". > +* Write only > +* Can be an I/O port and/or Memory-Mapped I/O Okay, if you mention MMIO, then the endianness claim is definitely wrong. > +* Location is platform dependent > + > +A write to this port sets the index of a firmware configuration item > +which can subsequently be accessed at the data port. > + > +Setting the index port will cause the data offset to be set to zero. > +The data offset impacts which data is accessed via the data port, and > +is explained below. > + > +Bit15 of the index value indicates if the configuration setting is > +architecture specific. If bit15 of the index is 0, then the item is > +a generic configuration item. If bit15 of the index is 1, then the > +item is specific to a particular architecture. (In other words, > +generic configuration items are accessed when the index is between > +0x0000-0x7fff, and architecture specific configuration items are > +accessed when the index is between 0x8000-0xffff.) > + > +Bit14 of the index value indicates if the configuration setting is > +being written. If bit14 of the index is 0, then the item is only > +being read, and all write access to the data port will be completely > +ignored. If bit14 of the index is 1, then the item's data can be > +written to by writing to the data port. (In other words, > +configuration write mode is enabled when the index is between > +0x4000-0x7fff or 0xc000-0xffff.) > + > +== Data Port == > +* One byte in width Not necessarily so, any longer. The memory mapped register can be 1, 2, 4 or 8 bytes wide (see fw_cfg_init_mem_wide(), data_width), with the choice depending on board code. Then the guest can access the data register with all of the above widths that do not exceed the maximum width, but always only at offset #0. And, somewhat non-intuitively, the data register has no endianness per se -- the DEVICE_BIG_ENDIAN you see in fw_cfg_data_mem_ops is a pure technicality -- because it does not transfer numerical values; it is *string*-preserving. Once the guest has read an fw_cfg blob, with a sequence of data register accesses of various allowed widths, *then* it can interpret the blob as necessary. (For example, (a) the "initrd blob" is an opaque blob, (b) the "initrd size blob" is a number in little endian encoding, (c) the "fw_cfg file directory blob" is an array of structures prefixed with the number of entries, and both that entry count and the numerical fields in the entry struct have big endian encoding.) Blob transfer and blob interpretation are separate. > +* Read + Write > +* Can be an I/O port and/or Memory-Mapped I/O > +* Location is platform dependent I think we should spell out that in the ioport mapping, the two ports overlap. (They cannot be mapped separately.) > + > +The data port allows for access to an array of bytes for each firmware > +configuration data item. This item is selected by a write to the > +index port. > + > +Initially following a write to the index port, the data offset will > +be set to zero. Each successful read or write to the data port will > +cause the data offset to increment by one byte. (In the mmio case, the offset is incremented by the access width, 1 / 2 / 4 / 8.) > There is only one > +data offset value, and it will be incremented by accesses to any of > +the I/O or MMIO data ports. I don't understand "any of". There is one data register only. Hm, yes, I do understand the text; it was originally written when the API that board code could use had not yet strictly separated the ioport mapping from the mmio mapping. > A write access will not increment the > +data offset if the selected index did not have bit14 set. > + > +Each firmware configuration item has a maximum length of data > +associated with the item. After the data offset has passed the > +end of this maximum data length, then any reads will return a data > +value of 0x00, and all writes will be ignored. > + > +A read of the data port will return the current byte of the firmware > +configuration item. In the mmio case, an N-byte wide read of the data register will return the next N bytes (as a substring of the blob being read) in increasing address order, like with memcpy(). > + > +A write of the data port will set the current byte of the firmware > +configuration item. A write access will not impact the firmware > +configuration data if the selected index did not have bit14 set. > + > +== x86, x86-64 Ports == > + > +I/O Index Port: 0x510 > +I/O Data Port: 0x511 > +MMIO Index Port: N/A > +MMIO Data Port: N/A Again, these are now mutually exclusive, as provided by the qemu-internal API. (Of course insane board code could circumvent that, but it won't.) > + > += Firmware Configuration Items = > + > +== Ranges == > + > +There are up to 0x4000 generic firmware configuration items, and up to > +0x4000 architecturally specific firmware configuration items. > + > +Index Port Range Range Usage > +---------------- ----------- > +0x0000 - 0x3fff Generic Items (0x0000 - 0x3fff) (Read-only) > +0x4000 - 0x7fff Generic Items (0x0000 - 0x3fff) (Read+Write) > +0x8000 - 0xbfff Architecture Specific Items (0x0000 - 0x3fff) (Read-only) > +0xc000 - 0xffff Architecture Specific Items (0x0000 - 0x3fff) (Read+Write) > + > +== Data Items Format == > + > +uint8_t : 8-bit unsigned integer > +uint16_t: 16-bit unsigned integer > +uint32_t: 32-bit unsigned integer > +uint64_t: 64-bit unsigned integer > +n bytes : byte array of length n > +array : byte array of a format specific size > +string : null byte terminated ascii string > + > +All integer data is accessed with the least significant byte first and > +then proceeding to more significant bytes on subsequent accesses. > + This section needs to be reworked. As I indicated above, blob interpretation is specific to the individual selector key. It *is* true that the following internal helper functions: - fw_cfg_add_i16 - fw_cfg_add_i32 - fw_cfg_add_i64 imbue the blob in question with little endian encoding (they call cpu_to_leXX internally, before calling fw_cfg_add_bytes()). However, the term "All integer data" is simply too generic. Pedantically, it should say All fw_cfg blobs added with the fw_cfg_add_iXX() functions carry integer values of the indicated size, with little-endian internal encoding. A bunch of other "integer data" exists, as part of different kinds of blobs, and their encodings are arbitrary. For example, the FWCfgFiles structure (the fw_cfg file directory blob) that I mentioned above is exposed to the guest directly, and the following fields are integers, and have big endian encoding (see fw_cfg_add_file_callback()): - FWCfgFiles.count - FWCfgFiles.f[*].size - FWCfgFiles.f[*].select > +== Generic Items == > + > +Index Data Type Data Meaning > +----- --------- ------------ > +0x00 4 bytes Signature - 'Q', 'E', 'M', 'U' > +0x01 uint32_t ID > +0x02 16 bytes System UUID > +0x03 uint64_t RAM Size > +0x04 uint16_t 0 indicates graphics mode, otherwise non-graphics mode > +0x05 uint16_t The number of SMP CPUs > +0x06 uint16_t Machine ID > +0x07 uint32_t Kernel Address > +0x08 uint32_t Kernel Size > +0x09 string Kernel command line > +0x0a uint32_t Initrd Address > +0x0b uint32_t Initrd Size > +0x0c uint16_t Boot Device > +0x0d array NUMA Data > +0x0e uint16_t Boot Menu > +0x0f uint16_t The maximum number of CPUs (hotpluggable) > +0x10 uint32_t Kernel Entry > +0x11 array Kernel Data > +0x12 array Initrd Data > +0x13 uint32_t Command Line Address > +0x14 uint32_t Command Line Size > +0x15 string Command Line Data > +0x16 uint32_t Setup Address > +0x17 uint32_t Setup Size > +0x18 array Setup Data > +0x19 array File Directory For all of the uintXX_t blobs, their encoding should be pointed out. (Assuming they are all added with the fw_cfg_add_iXX() functions, they can be advertised as little endian.) "string" should be explained as \0-terminated array. Also, selector key 0x01 ("ID") is now considered an "interface revision / feature bitmap". (With value 0 ATM.) > + > +=== File Directory Structure === > + > +Note: Integers in the file directory structure (index 0x19) are stored > +in big-endian format regardless of the host or guest endianness. This > +means that the first byte read of the integer is its most significant > +byte. I like to separate the transfer of a blob from the interpretation of a blob more than the above. It's okay with the 1-byte wide ioport, but with the wide memory-mapped data register, it very easily confuses people (including me, obviously). So, when talking about a blob's interpretation, please avoid words like "read" and "write". Transfer is complete at that point. > + > +The structure begins with a uint32_t in big-endian format. > +This number indicates the number of files that are available. > + > +Offset Data Type Data Meaning > +------ ------------ ------------ > +0x00 uint32_t(be) File count > +0x04 array Array of file instance structures. The total length > + of this array is 64-bytes * file_count. > + > +As shown above, following the initial file count uint32_t, > +there is an array of structures. Each structure is 64-bytes > +in size. The number of instances of this structure in the > +array is given by the initial uint32_t data value read. > +Each structure within the array has this format: > + > +Offset Data Type Data Meaning > +------ ------------ ------------ > +0x00 uint32_t(be) File size > +0x04 uint16_t(be) Firmware configuration entry index for file data (selector value or selector key would be the preferred term -- they are not inherently superior, it's just what I used in the kernel documentation.) > +0x06 2 bytes 2 reserved bytes > +0x08 56 bytes File name as a null terminated ascii string "NUL-terminated". > + > +== x86, x86-64 Architecture Items == > + > +As architecture specific items, these items should be accessed > +starting at 0x8000 for reading or 0xc000 for reading and writing. > + > +Index Data Type Data Meaning > +----- --------- ------------ > +0x00 array ACPI Tables Data > +0x01 array SMBIOS Data > +0x02 uint8_t IRQ0 Override > +0x03 array E820 Table > +0x04 array HPET Data Urgh. No clue about the last three, but I certainly wouldn't advertise the first two. I think we have much better interfaces now for ACPI and SMBIOS than the legacy ones above. (And they are exposed as fw_cfg files, not with fixed selector keys.) Thanks Laszlo