On Mon, Sep 26, 2011 at 2:51 AM, David Gibson
<da...@gibson.dropbear.id.au> wrote:
> On Fri, Sep 09, 2011 at 08:11:54AM -0500, Stuart Yoder wrote:
>> Based on the discussions over the last couple of weeks
>> I have updated the device fd file layout proposal and
>> tried to specify it a bit more formally.
>>
>> ===============================================================
>>
>> 1.  Overview
>>
>>   This specification describes the layout of device files
>>   used in the context of vfio, which gives user space
>>   direct access to I/O devices that have been bound to
>>   vfio.
>>
>>   When a device fd is opened and read, offset 0x0 contains
>>   a fixed sized header followed by a number of variable length
>>   records that describe different characteristics
>>   of the device-- addressable regions, interrupts, etc.
>>
>>   0x0  +-------------+-------------+
>>        |           magic           | u32  // identifies this as a vfio
>>        |                           |         device file
>>        +---------------------------+         and identifies the type of bus
>>        |          version          | u32  // specifies the version of this
>>        +---------------------------+
>>        |           flags           | u32  // encodes any flags
>>        +---------------------------+
>>        |  dev info record 0        |
>>        |    type                   | u32  // type of record
>>        |    rec_len                | u32  // length in bytes of record
>>        |                           |         (including record header)
>>        |    flags                  | u32  // type specific flags
>>        |    ...content...          |      // record content, which could
>>        +---------------------------+      // include sub-records
>>        |  dev info record 1        |
>>        +---------------------------+
>>        |  dev info record N        |
>>        +---------------------------+
>
> I really should have chimed in on this earlier, but I've been very
> busy.
>
> Um, not to put too fine a point on it, this is madness.
>
> Yes, it's very flexible and can thereby cover a very wide range of
> cases.  But it's much, much too complex.  Userspace has to parse a
> complex, multilayered data structure, with variable length elements
> just to get an address at which to do IO.  I can pretty much guarantee
> that if we went with this, most userspace programs using this
> interface would just ignore this metadata and directly map the
> offsets at which they happen to know the kernel will put things for
> the type of device they care about.
>
> _At least_ for PCI, I think the original VFIO layout of each BAR at a
> fixed, well known offset is much better.  Despite its limitations,
> just advertising a "device type" ID which describes one of a few fixed
> layouts would be preferable to this.  I'm still hoping, that we can do
> a bit better than that.  But we should try really hard to at the very
> least force the metadata into a simple array of resources each with a
> fixed size record describing it, even if it means some space wastage
> with occasionally-used fields.  Anything more complex than that and
> userspace is just never going to use it properly.
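
For reference, here is roughly what walking the records in the layout quoted above involves on the userspace side. This is only a sketch: the field order comes from the diagram, but the struct names, the count_records() helper and the assumption of host-endian fields are made up for illustration and are not part of any proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field order follows the quoted diagram; struct names are hypothetical. */
struct vfio_file_hdr {
    uint32_t magic;
    uint32_t version;
    uint32_t flags;
};

struct vfio_rec_hdr {
    uint32_t type;
    uint32_t rec_len;   /* bytes, including this record header */
    uint32_t flags;
};

/*
 * Walk the records in buf (len bytes, as read from offset 0x0) and
 * return how many have the given type, or -1 if the data is malformed.
 * Assumes host-endian fields, which the proposal does not specify.
 */
static int count_records(const uint8_t *buf, size_t len, uint32_t type)
{
    size_t off = sizeof(struct vfio_file_hdr);
    int n = 0;

    assert(buf != NULL);
    if (len < off)
        return -1;

    while (off < len) {
        struct vfio_rec_hdr rec;

        if (len - off < sizeof(rec))
            return -1;                      /* truncated record header */
        memcpy(&rec, buf + off, sizeof(rec)); /* avoids unaligned loads */

        /* rec_len must cover the header and stay inside the buffer. */
        if (rec.rec_len < sizeof(rec) || rec.rec_len > len - off)
            return -1;
        if (rec.type == type)
            n++;
        off += rec.rec_len;                 /* variable stride per record */
    }
    return n;
}
```

Every consumer needs the bounds checks and the variable stride, and that is before any sub-records are considered.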

So, is your issue really the variable-length nature of what was
proposed?

I don't think it would be that hard to make the different resources
fixed length.  I think we have 2 types of resources now-- address
regions and interrupts.  The only thing that gets a bit tricky is
device tree paths, which are obviously variable length.

We could put a description of all the resources in an array, with
each element being something like 4KB??

Stuart
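
To make the fixed-size array idea concrete, a rough sketch of one possible element layout follows. None of this is settled: the field names, the type codes, and the choice to pad a device tree path out to fill a 4KB slot are all assumptions for discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RES_SLOT_SIZE 4096u     /* the "something like 4KB" element size */

enum res_type {                 /* hypothetical type codes */
    RES_NONE   = 0,
    RES_REGION = 1,             /* address region */
    RES_IRQ    = 2,             /* interrupt */
};

/* One fixed-size slot; fields unused by a given type are left zero. */
struct res_fixed {
    uint32_t type;              /* enum res_type */
    uint32_t flags;
    uint64_t offset;            /* regions: offset within the device fd */
    uint64_t size;              /* regions: length in bytes */
    uint32_t irq;               /* interrupts: interrupt number */
    uint32_t path_len;          /* bytes used in path[], 0 if none */
    char     path[RES_SLOT_SIZE - 32]; /* device tree path, zero padded */
};

_Static_assert(sizeof(struct res_fixed) == RES_SLOT_SIZE,
               "each resource slot must be exactly RES_SLOT_SIZE bytes");

/* Return the first slot of the given type, or NULL if there is none. */
static const struct res_fixed *
find_first(const struct res_fixed *table, size_t nres, uint32_t type)
{
    size_t i;

    assert(table != NULL);
    for (i = 0; i < nres; i++)  /* constant stride, no length parsing */
        if (table[i].type == type)
            return &table[i];
    return NULL;
}
```

With this shape, resource i always sits at byte offset i * RES_SLOT_SIZE, so userspace can index directly. The cost is the wasted space in slots that carry no path.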