[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device
On 3/30/2023 6:58 PM, Parav Pandit wrote:

Overview:
---------
The Transitional MMR device is a variant of the transitional PCI device.
It has its own small Device ID range. It does not have an I/O region
BAR; instead it exposes legacy configuration and device specific
registers at an offset in the memory region BAR.

Such transitional MMR devices will be used at the scale of thousands of
devices using PCI SR-IOV and/or future scalable virtualization
technology to provide backward compatibility (for legacy devices) and
also future compatibility with new features.

Usecase:
--------
1. A hypervisor/system needs to provide transitional virtio devices to
   the guest VM at a scale of thousands, typically one to eight devices
   per VM.

2. A hypervisor/system needs to provide such devices using a vendor
   agnostic driver in the hypervisor system.

3. A hypervisor system prefers to have a single stack regardless of
   virtio device type (net/blk) and to be future compatible with a
   single vfio stack using SR-IOV or other scalable device
   virtualization technology to map PCI devices to the guest VM
   (as transitional or otherwise).

Motivation/Background:
----------------------
The existing transitional PCI device is missing support for PCI SR-IOV
based devices. Currently it does not work beyond a PCI PF, or only as a
software emulated device in reality. It currently has the system level
limitations cited below:

[a] PCIe spec citation:
    VFs do not support I/O Space and thus VF BARs shall not indicate
    I/O Space.

[b] CPU arch citation, Intel 64 and IA-32 Architectures Software
    Developer's Manual:
    The processor's I/O address space is separate and distinct from the
    physical-memory address space. The I/O address space consists of
    64K individually addressable 8-bit I/O ports, numbered 0 through
    FFFFH.

[c] PCIe spec citation:
    If a bridge implements an I/O address range, ...I/O address range
    will be aligned to a 4 KB boundary.

[d] I/O region accesses at the PCI system level are slow, as they are
    non-posted operations in the PCIe fabric.
After our last several discussions and feedback from Michael and Jason,
to support the above use case requirements I would like to update v1
with the below proposal.

1. Use the existing non transitional device to extend legacy register
   access.

2. An AQ of the parent PF is the optimal choice to access VF legacy
   registers (as opposed to an MMR of the VF). This is because:
   a. it avoids a complex reset flow at scale for the VFs.
   b. it enables using the existing driver notification which is
      already present in the notification section of the 1.x and
      transitional device.

3. New AQ command opcode for legacy register access read/write.
   Input fields:
   a. opcode 0x8000
   b. group and VF member identifiers
   c. register offset
   d. register size (1 to 64B)
   e. register content (on write)
   Output fields:
   a. cmd status
   b. register content (on read)

4. New AQ command to return the queue notify address for legacy access.
   Input fields:
   a. opcode 0x8001
   b. group and VF member identifier, or this can be just constant for
      all VFs?
   Output fields:
   a. BAR index
   b. byte offset within the BAR

5. PCI Extended capabilities for all the existing capabilities located
   in the legacy section. Why?
   a. This is for the new driver (such as vfio) to always rely on the
      new capabilities.
   b. The legacy PCI config space region is close to its full capacity.

A few open questions:
1. Should the queue notification query command be per VF, or should it
   be one for all group members (VFs)?

Any further comments to address in v1?
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device
On Wed, Apr 12, 2023 at 04:52:09AM +0000, Parav Pandit wrote:
> > From: Michael S. Tsirkin
> > Sent: Wednesday, April 12, 2023 12:48 AM
> >
> > Here is a counter proposal:
> >
> > #define VIRTIO_NET_F_LEGACY_HEADER 52 /* Use the legacy 10 byte
> > header for all packets */
> >
> > Yes, sorry to say, you need to emulate legacy pci in software.
> >
> > With notification hacks, and reset hacks, and legacy interrupt
> > hacks, and writeable mac ... this thing best belongs in vdpa anyway.
>
> What? I don't follow.
> Suddenly you attribute everything as a hack with the least explanation.

Again, hacks are not a bad thing, but this is an attempt at reusing
things in unexpected ways.

New issue I found today:
- if guest disables MSI-X host can not disable MSI-X. need some other
  channel to notify device about this.

Old issues we discussed before today:
- reset needs some special handling because real hardware can not
  guarantee returning 0 on the 1st read
- if guest writes into mac, reusing host mac (which is RO) will not
  work, need extra registers
- something about notification makes you want to poke at the modern
  notification register? which of course is its own can of worms with
  VIRTIO_F_NOTIFICATION_DATA changing the format completely.

--
MST
[virtio-dev] RE: [PATCH 00/11] Introduce transitional mmr pci device
> From: Michael S. Tsirkin
> Sent: Wednesday, April 12, 2023 12:48 AM
>
> Here is a counter proposal:
>
> #define VIRTIO_NET_F_LEGACY_HEADER 52 /* Use the legacy 10 byte
> header for all packets */
>
> Yes, sorry to say, you need to emulate legacy pci in software.
>
> With notification hacks, and reset hacks, and legacy interrupt hacks,
> and writeable mac ... this thing best belongs in vdpa anyway.

What? I don't follow.
Suddenly you attribute everything as a hack with the least explanation.
[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device
On Fri, Mar 31, 2023 at 01:58:23AM +0300, Parav Pandit wrote:
> Overview:
> ---------
> The Transitional MMR device is a variant of the transitional PCI
> device. It has its own small Device ID range. It does not have I/O
> region BAR; instead it exposes legacy configuration and device
> specific registers at an offset in the memory region BAR.
>
> Such transitional MMR devices will be used at the scale of thousands
> of devices using PCI SR-IOV and/or future scalable virtualization
> technology to provide backward compatibility (for legacy devices) and
> also future compatibility with new features.
>
> Usecase:
> --------
> 1. A hypervisor/system needs to provide transitional virtio devices
>    to the guest VM at scale of thousands, typically, one to eight
>    devices per VM.
>
> 2. A hypervisor/system needs to provide such devices using a vendor
>    agnostic driver in the hypervisor system.
>
> 3. A hypervisor system prefers to have single stack regardless of
>    virtio device type (net/blk) and be future compatible with a
>    single vfio stack using SR-IOV or other scalable device
>    virtualization technology to map PCI devices to the guest VM.
>    (as transitional or otherwise)

The more I look at it the more issues I see. Here is a counter
proposal:

#define VIRTIO_NET_F_LEGACY_HEADER 52 /* Use the legacy 10 byte
header for all packets */

Yes, sorry to say, you need to emulate legacy pci in software.

With notification hacks, and reset hacks, and legacy interrupt hacks,
and writeable mac ... this thing best belongs in vdpa anyway.

--
MST
[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device
On Mon, Apr 03, 2023 at 01:35:01PM -0400, Parav Pandit wrote:
> > So something like a vq would be a step up. I would like to
> > understand the performance angle though. What you describe
> > is pretty bad.
>
> Do you mean latency is bad or the description?

I don't know. We need admin vq and transport vq to work. You describe
latency numbers that make both unworkable. I am interested in fixing
that somehow, since it's a blocker.

--
MST
[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device
On 4/3/2023 1:28 PM, Michael S. Tsirkin wrote:
> On Mon, Apr 03, 2023 at 03:47:56PM +0000, Parav Pandit wrote:
> > > From: Michael S. Tsirkin
> > > Sent: Monday, April 3, 2023 11:34 AM
> > >
> > > Another is that we can actually work around legacy bugs in the
> > > hypervisor. For example, atomicity and alignment bugs do not exist
> > > under DMA. Consider MAC field, writeable in legacy. Problem this
> > > write is not atomic, so there is a window where MAC is corrupted.
> > > If you do MMIO then you just have to copy this bug. If you do DMA
> > > then hypervisor can buffer all of MAC and send to device in one go.
> >
> > I am familiar with this bug.
> > User feedback that we have received so far indicates kernels with
> > driver support that use CVQ for setting the mac address on a legacy
> > device. So, it may help, but it is not super important.
> >
> > Also, if I recollect correctly, the mac address is configured a bit
> > early in the if-scripts sequence, before bringing up the interface.
> > So, we haven't seen a real issue around it.
>
> It's an example, there are other bugs in legacy interfaces.

The intent is to provide backward compatibility to the legacy
interface, and not really to fix the legacy interface itself, as that
may break legacy itself.

> Take inability to decline feature negotiation as an example.

A legacy driver would do this anyway. It would expect certain flows to
work that have been working for it over a previous sw-hypervisor. A
hypervisor attempting to fail what was working before will not help.

> With transport vq we can fail at transport level and hypervisor can
> decide what to do, such as stopping guest or unplugging device, etc.
> So something like a vq would be a step up. I would like to understand
> the performance angle though. What you describe is pretty bad.

Do you mean latency is bad or the description?
[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device
On Mon, Apr 03, 2023 at 03:47:56PM +0000, Parav Pandit wrote:
> > From: Michael S. Tsirkin
> > Sent: Monday, April 3, 2023 11:34 AM
> >
> > Another is that we can actually work around legacy bugs in the
> > hypervisor. For example, atomicity and alignment bugs do not exist
> > under DMA. Consider MAC field, writeable in legacy. Problem this
> > write is not atomic, so there is a window where MAC is corrupted.
> > If you do MMIO then you just have to copy this bug. If you do DMA
> > then hypervisor can buffer all of MAC and send to device in one go.
>
> I am familiar with this bug.
> Users feedback that we received so far has kernels with driver support
> that uses CVQ for setting the mac address on legacy device.
> So, it may help but not super important.
>
> Also, if I recollect correctly, the mac address is configured a bit
> early in the if-scripts sequence before bringing up the interface.
> So, haven't seen real issue around it.

It's an example, there are other bugs in legacy interfaces. Take
inability to decline feature negotiation as an example. With transport
vq we can fail at transport level and hypervisor can decide what to do,
such as stopping guest or unplugging device, etc.

So something like a vq would be a step up. I would like to understand
the performance angle though. What you describe is pretty bad.

--
MST
[virtio-dev] RE: [PATCH 00/11] Introduce transitional mmr pci device
> From: Michael S. Tsirkin
> Sent: Monday, April 3, 2023 11:34 AM
>
> Another is that we can actually work around legacy bugs in the
> hypervisor. For example, atomicity and alignment bugs do not exist
> under DMA. Consider MAC field, writeable in legacy. Problem this write
> is not atomic, so there is a window where MAC is corrupted. If you do
> MMIO then you just have to copy this bug. If you do DMA then
> hypervisor can buffer all of MAC and send to device in one go.

I am familiar with this bug.
User feedback that we have received so far indicates kernels with
driver support that use CVQ for setting the mac address on a legacy
device. So, it may help, but it is not super important.

Also, if I recollect correctly, the mac address is configured a bit
early in the if-scripts sequence, before bringing up the interface.
So, we haven't seen a real issue around it.
[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device
On Mon, Apr 03, 2023 at 11:23:11AM -0400, Michael S. Tsirkin wrote:
> On Mon, Apr 03, 2023 at 03:16:53PM +0000, Parav Pandit wrote:
> > > From: Michael S. Tsirkin
> > > Sent: Monday, April 3, 2023 11:07 AM
> > >
> > > > OTOH it is presumably required for scalability anyway, no?
> > > No.
> > > Most new generation SIOV and SR-IOV devices operate without any
> > > para-virtualization.
> >
> > Don't see the connection to PV. You need an emulation layer in the
> > host if you want to run legacy guests. Looks like it could do
> > transport vq just as well.
> >
> > Transport vq for legacy MMR purpose seems fine with its latency and
> > DMA overheads.
> > Your question was about "scalability".
> > After your latest response, I am unclear what "scalability" means.
> > Do you mean saving the register space in the PCI device?
>
> yes that's how you used scalability in the past.
>
> > If yes, then no, for legacy guests scalability is not required,
> > because the legacy registers are a subset of 1.x.
>
> Weird. what does guest being legacy have to do with a wish to save
> registers on the host hardware? You don't have so many legacy guests
> as modern guests? Why?
>
> > > > And presumably it can all be done in firmware ...
> > > > Is there actual hardware that can't implement transport vq but
> > > > is going to implement the mmr spec?
> > >
> > Nvidia and Marvell DPUs implement MMR spec.
>
> > > Hmm implement it in what sense exactly?
> > >
> > Do not follow the question.
> > The proposed series will be implemented as PCI SR-IOV devices using
> > MMR spec.
>
> > > Transport VQ has very high latency and DMA overheads for 2 to 4
> > > bytes read/write.
> > >
> > > How many of these 2 byte accesses trigger from a typical guest?
> > >
> > Mostly during the VM boot time. 20 to 40 registers read write
> > access.
>
> That is not a lot! How long does a DMA operation take then?
>
> > > > And before discussing "why not that approach", lets finish
> > > > reviewing "this approach" first.
> > >
> > > That's a weird way to put it. We don't want so many ways to do
> > > legacy if we can help it.
> >
> > Sure, so lets finish the review of current proposal details.
> > At the moment
> > a. I don't see any visible gain of transport VQ other than device
> > reset part I explained.
>
> For example, we do not need a new range of device IDs and existing
> drivers can bind on the host.

Another is that we can actually work around legacy bugs in the
hypervisor. For example, atomicity and alignment bugs do not exist
under DMA. Consider MAC field, writeable in legacy. Problem this write
is not atomic, so there is a window where MAC is corrupted. If you do
MMIO then you just have to copy this bug. If you do DMA then hypervisor
can buffer all of MAC and send to device in one go.

> > b. it can be a way with high latency, DMA overheads on the virtqueue
> > for read/writes for small access.
>
> numbers?
>
> --
> MST
[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device
On Mon, Apr 03, 2023 at 03:16:53PM +0000, Parav Pandit wrote:
> > From: Michael S. Tsirkin
> > Sent: Monday, April 3, 2023 11:07 AM
> >
> > > OTOH it is presumably required for scalability anyway, no?
> > No.
> > Most new generation SIOV and SR-IOV devices operate without any
> > para-virtualization.
>
> Don't see the connection to PV. You need an emulation layer in the
> host if you want to run legacy guests. Looks like it could do
> transport vq just as well.
>
> Transport vq for legacy MMR purpose seems fine with its latency and
> DMA overheads.
> Your question was about "scalability".
> After your latest response, I am unclear what "scalability" means.
> Do you mean saving the register space in the PCI device?

yes that's how you used scalability in the past.

> If yes, then no, for legacy guests scalability is not required,
> because the legacy registers are a subset of 1.x.

Weird. what does guest being legacy have to do with a wish to save
registers on the host hardware? You don't have so many legacy guests as
modern guests? Why?

> > > And presumably it can all be done in firmware ...
> > > Is there actual hardware that can't implement transport vq but is
> > > going to implement the mmr spec?
> > >
> > Nvidia and Marvell DPUs implement MMR spec.
>
> > Hmm implement it in what sense exactly?
> >
> Do not follow the question.
> The proposed series will be implemented as PCI SR-IOV devices using
> MMR spec.
>
> > Transport VQ has very high latency and DMA overheads for 2 to 4
> > bytes read/write.
> >
> > How many of these 2 byte accesses trigger from a typical guest?
> >
> Mostly during the VM boot time. 20 to 40 registers read write access.

That is not a lot! How long does a DMA operation take then?

> > > And before discussing "why not that approach", lets finish
> > > reviewing "this approach" first.
> >
> > That's a weird way to put it. We don't want so many ways to do
> > legacy if we can help it.
>
> Sure, so lets finish the review of current proposal details.
> At the moment
> a. I don't see any visible gain of transport VQ other than device
> reset part I explained.

For example, we do not need a new range of device IDs and existing
drivers can bind on the host.

> b. it can be a way with high latency, DMA overheads on the virtqueue
> for read/writes for small access.

numbers?

--
MST
[virtio-dev] RE: [PATCH 00/11] Introduce transitional mmr pci device
> From: Michael S. Tsirkin
> Sent: Monday, April 3, 2023 11:07 AM
>
> > > OTOH it is presumably required for scalability anyway, no?
> > No.
> > Most new generation SIOV and SR-IOV devices operate without any
> > para-virtualization.
>
> Don't see the connection to PV. You need an emulation layer in the
> host if you want to run legacy guests. Looks like it could do
> transport vq just as well.

Transport vq for legacy MMR purpose seems fine with its latency and
DMA overheads.
Your question was about "scalability".
After your latest response, I am unclear what "scalability" means.
Do you mean saving the register space in the PCI device?
If yes, then no, for legacy guests scalability is not required, because
the legacy registers are a subset of 1.x.

> > > And presumably it can all be done in firmware ...
> > > Is there actual hardware that can't implement transport vq but is
> > > going to implement the mmr spec?
> > >
> > Nvidia and Marvell DPUs implement MMR spec.
>
> Hmm implement it in what sense exactly?

Do not follow the question.
The proposed series will be implemented as PCI SR-IOV devices using
MMR spec.

> > Transport VQ has very high latency and DMA overheads for 2 to 4
> > bytes read/write.
>
> How many of these 2 byte accesses trigger from a typical guest?

Mostly during the VM boot time. 20 to 40 register read/write accesses.

> > And before discussing "why not that approach", lets finish reviewing
> > "this approach" first.
>
> That's a weird way to put it. We don't want so many ways to do legacy
> if we can help it.

Sure, so lets finish the review of the current proposal details.
At the moment:
a. I don't see any visible gain of transport VQ other than the device
   reset part I explained.
b. it can be a way with high latency and DMA overheads on the virtqueue
   for small reads/writes.
[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device
On Mon, Apr 03, 2023 at 02:57:26PM +0000, Parav Pandit wrote:
> > From: Michael S. Tsirkin
> > Sent: Monday, April 3, 2023 10:53 AM
> >
> > On Fri, Mar 31, 2023 at 05:43:11PM -0400, Parav Pandit wrote:
> > > > I can not say I thought about this deeply so maybe there's some
> > > > problem, or maybe it's a worse approach - could you comment on
> > > > this? It looks like this could be a smaller change, but maybe it
> > > > isn't? Did you consider this option?
> > >
> > > We can possibly let both the options open for device vendors to
> > > implement.
> > >
> > > Change wise transport VQ is fairly big addition for both
> > > hypervisor driver and also for the device.
> >
> > OTOH it is presumably required for scalability anyway, no?
>
> No.
> Most new generation SIOV and SR-IOV devices operate without any
> para-virtualization.

Don't see the connection to PV. You need an emulation layer in the host
if you want to run legacy guests. Looks like it could do transport vq
just as well.

> > And presumably it can all be done in firmware ...
> > Is there actual hardware that can't implement transport vq but is
> > going to implement the mmr spec?
> >
> Nvidia and Marvell DPUs implement MMR spec.

Hmm implement it in what sense exactly?

> Transport VQ has very high latency and DMA overheads for 2 to 4 bytes
> read/write.

How many of these 2 byte accesses trigger from a typical guest?

> And before discussing "why not that approach", lets finish reviewing
> "this approach" first.

That's a weird way to put it. We don't want so many ways to do legacy
if we can help it.

--
MST
[virtio-dev] RE: [PATCH 00/11] Introduce transitional mmr pci device
> From: Michael S. Tsirkin
> Sent: Monday, April 3, 2023 10:53 AM
>
> On Fri, Mar 31, 2023 at 05:43:11PM -0400, Parav Pandit wrote:
> > > I can not say I thought about this deeply so maybe there's some
> > > problem, or maybe it's a worse approach - could you comment on
> > > this? It looks like this could be a smaller change, but maybe it
> > > isn't? Did you consider this option?
> >
> > We can possibly let both the options open for device vendors to
> > implement.
> >
> > Change wise transport VQ is fairly big addition for both hypervisor
> > driver and also for the device.
>
> OTOH it is presumably required for scalability anyway, no?

No.
Most new generation SIOV and SR-IOV devices operate without any
para-virtualization.

> And presumably it can all be done in firmware ...
> Is there actual hardware that can't implement transport vq but is
> going to implement the mmr spec?

Nvidia and Marvell DPUs implement MMR spec.

Transport VQ has very high latency and DMA overheads for 2 to 4 byte
reads/writes.

And before discussing "why not that approach", lets finish reviewing
"this approach" first.
[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device
On Fri, Mar 31, 2023 at 05:43:11PM -0400, Parav Pandit wrote:
> > I can not say I thought about this deeply so maybe there's some
> > problem, or maybe it's a worse approach - could you comment on this?
> > It looks like this could be a smaller change, but maybe it isn't?
> > Did you consider this option?
>
> We can possibly let both the options open for device vendors to
> implement.
>
> Change wise transport VQ is fairly big addition for both hypervisor
> driver and also for the device.

OTOH it is presumably required for scalability anyway, no?
And presumably it can all be done in firmware ...
Is there actual hardware that can't implement transport vq but is going
to implement the mmr spec?

--
MST
[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device
On 3/31/2023 3:03 AM, Michael S. Tsirkin wrote:
> OK but this does not answer the following question: since a legacy
> driver can not bind to this type of MMR device, a new driver is
> needed anyway so why not implement a modern driver?

Not sure I follow "implement a modern driver".
If you mean a hypervisor driver over a modern device, then yes, you
captured those two problems below. More reply below.

> I think we discussed this at some call and it made some kind of
> sense.

Yep.

> Unfortunately it has been a while and I am not sure I remember the
> detail, so I can no longer say for sure whether this proposal is fit
> for the purpose. Here is what I vaguely remember:
> A valid use-case is an emulation layer (e.g. a hypervisor)
> translating a legacy driver's I/O accesses to MMIO.

Yes.

> Ideally layering this emulation on top of a modern device would work
> ok but there are several things making this approach problematic.

Right.

> One is a different virtio net header size between legacy and modern
> driver. Another is use of control VQ by modern where legacy used IO
> writes. In both cases the difference would require the emulation
> getting involved on the DMA path, in particular somehow finding
> private addresses for communication between emulation and modern
> device.

Both of these issues are resolved by this proposal.

> Does above summarize it reasonably? And if yes, would an alternative
> approach of adding legacy config support to transport vq work well?

The VF is supplying the legacy config region (a subset of 1.x) in a
memory mapped area.

A transport vq on the parent PF is yet another option for legacy
register emulation. I think latency wise it will be a lot higher,
though that is not of great importance.
The good part of transport vq is that device reset is better, as it
can act as a slow operation.

Given that the device already implements part of these registers in the
1.x memory mapped area, it is reasonable for the device to provide
similar registers via memory map (legacy is a subset, no new
additions).

> I can not say I thought about this deeply so maybe there's some
> problem, or maybe it's a worse approach - could you comment on this?
> It looks like this could be a smaller change, but maybe it isn't?
> Did you consider this option?

We can possibly let both the options open for device vendors to
implement.

Change wise, transport VQ is a fairly big addition for both the
hypervisor driver and also for the device.

> More review later.

ok.
[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device
On Fri, Mar 31, 2023 at 01:58:23AM +0300, Parav Pandit wrote:
> Overview:
> ---------
> The Transitional MMR device is a variant of the transitional PCI
> device. It has its own small Device ID range. It does not have an I/O
> region BAR; instead it exposes legacy configuration and device
> specific registers at an offset in the memory region BAR.
>
> Such transitional MMR devices will be used at the scale of thousands
> of devices using PCI SR-IOV and/or future scalable virtualization
> technology to provide backward compatibility (for legacy devices) and
> also future compatibility with new features.
>
> Usecase:
> --------
> 1. A hypervisor/system needs to provide transitional virtio devices
>    to the guest VM at a scale of thousands, typically one to eight
>    devices per VM.
>
> 2. A hypervisor/system needs to provide such devices using a vendor
>    agnostic driver in the hypervisor system.
>
> 3. A hypervisor system prefers to have a single stack regardless of
>    virtio device type (net/blk) and be future compatible with a
>    single vfio stack using SR-IOV or other scalable device
>    virtualization technology to map PCI devices to the guest VM
>    (as transitional or otherwise).
>
> Motivation/Background:
> ----------------------
> The existing transitional PCI device is missing support for PCI
> SR-IOV based devices. Currently it does not work beyond a PCI PF, or
> only as a software emulated device in reality. It currently has the
> system level limitations cited below:
>
> [a] PCIe spec citation:
>     VFs do not support I/O Space and thus VF BARs shall not indicate
>     I/O Space.
>
> [b] CPU arch citation, Intel 64 and IA-32 Architectures Software
>     Developer's Manual:
>     The processor's I/O address space is separate and distinct from
>     the physical-memory address space. The I/O address space consists
>     of 64K individually addressable 8-bit I/O ports, numbered 0
>     through FFFFH.
>
> [c] PCIe spec citation:
>     If a bridge implements an I/O address range, ...I/O address range
>     will be aligned to a 4 KB boundary.
>
> [d] I/O region accesses at the PCI system level are slow, as they are
>     non-posted operations in the PCIe fabric.
>
> The usecase requirements and limitations above can be solved by
> extending the transitional device, mapping legacy and device specific
> configuration registers in a memory PCI BAR instead of using a non
> composable I/O region.
>
> Please review.

So as you explain in a lot of detail above, IO support is going away,
so the transitional device can no longer be used through the legacy
interface. OK, but this does not answer the following question: since a
legacy driver can not bind to this type of MMR device, a new driver is
needed anyway, so why not implement a modern driver?

I think we discussed this at some call and it made some kind of sense.
Unfortunately it has been a while and I am not sure I remember the
detail, so I can no longer say for sure whether this proposal is fit
for the purpose. Here is what I vaguely remember:

A valid use-case is an emulation layer (e.g. a hypervisor) translating
a legacy driver's I/O accesses to MMIO. Ideally layering this emulation
on top of a modern device would work ok, but there are several things
making this approach problematic. One is a different virtio net header
size between legacy and modern driver. Another is use of control VQ by
modern where legacy used IO writes. In both cases the difference would
require the emulation getting involved on the DMA path, in particular
somehow finding private addresses for communication between emulation
and modern device.

Does above summarize it reasonably? And if yes, would an alternative
approach of adding legacy config support to transport vq work well? I
can not say I thought about this deeply so maybe there's some problem,
or maybe it's a worse approach - could you comment on this? It looks
like this could be a smaller change, but maybe it isn't? Did you
consider this option?

More review later.

> Patch summary:
> --------------
> patch 1 to 5 prepares the spec
> patch 6 to 11 defines transitional mmr device
>
> patch-1 uses lower case alphabets to name device id
> patch-2 moves transitional device id to the legacy section along with
>         revision id
> patch-3 splits legacy feature bits description from device id
> patch-4 renames and moves virtio config registers next to the 1.x
>         registers section
> patch-5 adds missing helper verb in terminology definitions
> patch-6 introduces transitional mmr device
> patch-7 introduces transitional mmr device pci device ids
> patch-8 introduces virtio extended pci capability
> patch-9 describes new pci capability to locate legacy mmr registers
> patch-10 extended usage of driver notification capability for the
>          transitional mmr device
> patch-11 adds conformance section of the transitional mmr device
>
> This design and details further described below.
>
> Design:
> -------
> Below picture captures the main small difference betwe