[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-24 Thread Parav Pandit




On 3/30/2023 6:58 PM, Parav Pandit wrote:

Overview:
-
The Transitional MMR device is a variant of the transitional PCI device.
It has its own small Device ID range. It does not have I/O
region BAR; instead it exposes legacy configuration and device
specific registers at an offset in the memory region BAR.

Such transitional MMR devices will be used at the scale of
thousands of devices using PCI SR-IOV and/or future scalable
virtualization technology to provide backward
compatibility (for legacy devices) and also future
compatibility with new features.

Usecase:

1. A hypervisor/system needs to provide transitional
virtio devices to the guest VM at scale of thousands,
typically, one to eight devices per VM.

2. A hypervisor/system needs to provide such devices using a
vendor agnostic driver in the hypervisor system.

3. A hypervisor system prefers to have single stack regardless of
virtio device type (net/blk) and be future compatible with a
single vfio stack using SR-IOV or other scalable device
virtualization technology to map PCI devices to the guest VM.
(as transitional or otherwise)

Motivation/Background:
--
The existing transitional PCI device is missing support for
PCI SR-IOV based devices. Currently it does not work beyond the
PCI PF, other than as a software-emulated device. It has the
system-level limitations cited below:

[a] PCIe spec citation:
VFs do not support I/O Space and thus VF BARs shall not
indicate I/O Space.

[b] CPU arch citation:
Intel 64 and IA-32 Architectures Software Developer’s Manual:
The processor’s I/O address space is separate and distinct from
the physical-memory address space. The I/O address space consists
of 64K individually addressable 8-bit I/O ports, numbered 0 through FFFFH.

[c] PCIe spec citation:
If a bridge implements an I/O address range,...I/O address range
will be aligned to a 4 KB boundary.

[d] I/O region accesses at the PCI system level are slow, as they are non-posted
operations in the PCIe fabric.


After our last several discussions and the feedback from Michael and Jason,
to support the above use-case requirements I would like to update v1 with
the proposal below.


1. Use the existing non-transitional device to extend legacy register
access.


2. The AQ of the parent PF is the optimal choice for accessing VF legacy
registers (as opposed to MMR of the VF).

This is because:
a. it avoids a complex reset flow at scale for the VFs.

b. it enables reuse of the existing driver notification, which is already
present in the notification section of the 1.x and transitional devices.


3. New AQ command opcode for legacy register read/write access.
Input fields:
a. opcode 0x8000
b. group and VF member identifiers
c. register offset
d. register size (1 to 64 B)
e. register content (on write)

Output fields:
a. command status
b. register content (on read)

4. New AQ command to return the queue notify address for legacy access.
Input fields:
a. opcode 0x8001
b. group and VF member identifier, or can this be just a constant for all VFs?

Output fields:
a. BAR index
b. byte offset within the BAR
(A rough struct sketch of both commands follows below.)
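
The sketch below is a minimal C rendering of items 3 and 4. Only the opcodes
(0x8000, 0x8001) and the field lists come from the items above; the struct
names, field widths, ordering and the write/status handling are illustrative
assumptions, not a proposed wire format.

#include <stdint.h>

/* Item 3: legacy register access over the parent PF's admin queue.
 * Command status is assumed to be reported via the generic AQ
 * completion rather than a dedicated field here. */
struct virtio_admin_cmd_legacy_reg_access {     /* hypothetical name */
        uint16_t opcode;        /* 0x8000 */
        uint16_t group_id;      /* group identifier */
        uint64_t vf_member_id;  /* VF member identifier within the group */
        uint8_t  write;         /* assumed: 0 = read, 1 = write */
        uint16_t reg_offset;    /* legacy register offset */
        uint8_t  reg_size;      /* 1 to 64 bytes */
        uint8_t  reg_data[64];  /* input on write, output on read */
};

/* Item 4: query the queue notify address to use for legacy access. */
struct virtio_admin_cmd_legacy_notify_query {   /* hypothetical name */
        uint16_t opcode;        /* 0x8001 */
        uint16_t group_id;
        uint64_t vf_member_id;  /* or one constant for all VFs (open question) */
        /* output */
        uint8_t  bar;           /* BAR index */
        uint64_t bar_offset;    /* byte offset within that BAR */
};

A vfio-like hypervisor driver would presumably turn each trapped legacy I/O
access into one 0x8000 command, and map the notify address returned by 0x8001
into the guest where possible.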

5. PCI extended capabilities mirroring all the existing capabilities located
in the legacy capability section (see the sketch below).

Why?
a. This lets a new driver (such as vfio) always rely on the new
capabilities.

b. The legacy PCI capability region is close to its full capacity.
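
As a rough illustration of item 5 (not the layout defined by patch-8), one way
to picture a virtio PCI extended capability is the information of the existing
struct virtio_pci_cap carried behind a standard PCI Express extended capability
header; all names and field choices below are assumptions.

#include <stdint.h>

/* Hypothetical sketch only; the real layout is whatever the
 * "virtio extended pci capability" patch defines. */
struct virtio_pci_ext_cap {
        /* PCI Express extended capability header */
        uint16_t cap_id;        /* extended capability ID */
        uint16_t cap_ver_next;  /* [3:0] version, [15:4] next capability offset */

        /* Roughly the same information as struct virtio_pci_cap */
        uint8_t  cfg_type;      /* common/notify/isr/device cfg, etc. */
        uint8_t  bar;           /* BAR holding the structure */
        uint8_t  padding[2];
        uint32_t offset;        /* offset of the structure within the BAR */
        uint32_t length;        /* length of the structure */
};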

A few open questions:
1. Should the queue notification query command be per VF, or should there be one
for all group members (VFs)?


Any further comments to address in v1?




[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-11 Thread Michael S. Tsirkin
On Wed, Apr 12, 2023 at 04:52:09AM +, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin 
> > Sent: Wednesday, April 12, 2023 12:48 AM
> 
> > Here is a counter proposal:
> > 
> > #define VIRTIO_NET_F_LEGACY_HEADER  52  /* Use the legacy 10 byte
> > header for all packets */
> > 
> > 
> > Yes, sorry to say, you need to emulate legacy pci in software.
> > 
> > With notification hacks, and reset hacks, and legacy interrupt hacks, and
> > writeable mac ...  this thing best belongs in vdpa anyway.
> 
> What? I don't follow.
> Suddenly you attribute everything as hack with least explanation.
> 

Again, hacks are not a bad thing, but this is an attempt at reusing things in
unexpected ways.

New issue I found today:
- if the guest disables MSI-X, the host can not disable MSI-X;
  some other channel is needed to notify the device about this.

Old issues we discussed before today:
- reset needs some special handling because real hardware
  can not guarantee returning 0 on the 1st read
- if the guest writes into the MAC, reusing the host MAC (which is RO)
  will not work; extra registers are needed
- something about notification makes you want to poke
  at the modern notification register? which of course
  is its own can of worms, with VIRTIO_F_NOTIFICATION_DATA
  changing the format completely.


-- 
MST





[virtio-dev] RE: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-11 Thread Parav Pandit


> From: Michael S. Tsirkin 
> Sent: Wednesday, April 12, 2023 12:48 AM

> Here is a counter proposal:
> 
> #define VIRTIO_NET_F_LEGACY_HEADER  52  /* Use the legacy 10 byte
> header for all packets */
> 
> 
> Yes, sorry to say, you need to emulate legacy pci in software.
> 
> With notification hacks, and reset hacks, and legacy interrupt hacks, and
> writeable mac ...  this thing best belongs in vdpa anyway.

What? I don't follow.
Suddenly you attribute everything as a hack with hardly any explanation.






[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-11 Thread Michael S. Tsirkin
On Fri, Mar 31, 2023 at 01:58:23AM +0300, Parav Pandit wrote:
> Overview:
> -
> The Transitional MMR device is a variant of the transitional PCI device.
> It has its own small Device ID range. It does not have I/O
> region BAR; instead it exposes legacy configuration and device
> specific registers at an offset in the memory region BAR.
> 
> Such transitional MMR devices will be used at the scale of
> thousands of devices using PCI SR-IOV and/or future scalable
> virtualization technology to provide backward
> compatibility (for legacy devices) and also future
> compatibility with new features.
> 
> Usecase:
> 
> 1. A hypervisor/system needs to provide transitional
>virtio devices to the guest VM at scale of thousands,
>typically, one to eight devices per VM.
> 
> 2. A hypervisor/system needs to provide such devices using a
>vendor agnostic driver in the hypervisor system.
> 
> 3. A hypervisor system prefers to have single stack regardless of
>virtio device type (net/blk) and be future compatible with a
>single vfio stack using SR-IOV or other scalable device
>virtualization technology to map PCI devices to the guest VM.
>(as transitional or otherwise)


The more I look at it the more issues I see.

Here is a counter proposal:

#define VIRTIO_NET_F_LEGACY_HEADER  52  /* Use the legacy 10 byte header 
for all packets */
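
As a side note on what this feature bit would imply for an implementation,
here is a tiny sketch (the helper name is made up): once
VIRTIO_NET_F_LEGACY_HEADER is negotiated, every packet uses the 10-byte legacy
header even though the device is otherwise modern, whose header is 12 bytes
because it always carries num_buffers.

#include <stddef.h>
#include <stdint.h>

#define VIRTIO_NET_F_LEGACY_HEADER 52   /* proposed bit from this thread */

/* Hypothetical helper: pick the per-packet header size from the
 * negotiated feature bits. */
static inline size_t virtio_net_hdr_size(uint64_t features)
{
        if (features & (1ULL << VIRTIO_NET_F_LEGACY_HEADER))
                return 10;      /* legacy header for all packets */
        return 12;              /* virtio 1.x header with num_buffers */
}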


Yes, sorry to say, you need to emulate legacy pci in software.

With notification hacks, and reset hacks, and legacy interrupt hacks,
and writeable mac ...  this thing best belongs in vdpa anyway.


-- 
MST





[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 01:35:01PM -0400, Parav Pandit wrote:
> > So something like a vq would be a step up. I would like to
> > understand the performance angle though. What you describe
> > is pretty bad.
> > 
> Do you mean latency is bad or the description?

I don't know. We need admin vq and transport vq to work.
You describe latency numbers that make both unworkable.
I am interested in fixing that somehow, since it's a blocker.

-- 
MST





[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Parav Pandit




On 4/3/2023 1:28 PM, Michael S. Tsirkin wrote:

> On Mon, Apr 03, 2023 at 03:47:56PM +, Parav Pandit wrote:
> > > From: Michael S. Tsirkin 
> > > Sent: Monday, April 3, 2023 11:34 AM
> > 
> > > Another is that we can actually work around legacy bugs in the hypervisor. For
> > > example, atomicity and alignment bugs do not exist under DMA. Consider MAC
> > > field, writeable in legacy.  Problem this write is not atomic, so there is a window
> > > where MAC is corrupted.  If you do MMIO then you just have to copy this bug.
> > > If you do DMA then hypervisor can buffer all of MAC and send to device in one
> > > go.
> > 
> > I am familiar with this bug.
> > Users feedback that we received so far has kernels with driver support that
> > uses CVQ for setting the mac address on legacy device.
> > So, it may help but not super important.
> > 
> > Also, if I recollect correctly, the mac address is configuring bit early in
> > if-scripts sequence before bringing up the interface.
> > So, haven't seen real issue around it.
> 
> It's an example, there are other bugs in legacy interfaces.

The intent is to provide backward compatibility to the legacy interface,
not to really fix the legacy interface itself, as that may break legacy
behavior.

> Take inability to decline feature negotiation as an example.

A legacy driver would do this anyway. It would expect certain flows to
work that worked for it over a previous software hypervisor.

A hypervisor attempting to fail what was working before will not help.

> With transport vq we can fail at transport level and
> hypervisor can decide what to do, such as stopping guest or
> unplugging device, etc.
> 
> So something like a vq would be a step up. I would like to
> understand the performance angle though. What you describe
> is pretty bad.

Do you mean latency is bad or the description?




[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 03:47:56PM +, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin 
> > Sent: Monday, April 3, 2023 11:34 AM
> 
> > Another is that we can actually work around legacy bugs in the hypervisor. 
> > For
> > example, atomicity and alignment bugs do not exist under DMA. Consider MAC
> > field, writeable in legacy.  Problem this write is not atomic, so there is 
> > a window
> > where MAC is corrupted.  If you do MMIO then you just have to copy this bug.
> > If you do DMA then hypervisor can buffer all of MAC and send to device in 
> > one
> > go.
> I am familiar with this bug.
> Users feedback that we received so far has kernels with driver support that 
> uses CVQ for setting the mac address on legacy device.
> So, it may help but not super important.
> 
> Also, if I recollect correctly, the mac address is configuring bit early in 
> if-scripts sequence before bringing up the interface.
> So, haven't seen real issue around it.

It's an example, there are other bugs in legacy interfaces.

Take inability to decline feature negotiation as an example.
With transport vq we can fail at transport level and
hypervisor can decide what to do, such as stopping guest or
unplugging device, etc.

So something like a vq would be a step up. I would like to
understand the performance angle though. What you describe
is pretty bad.

-- 
MST





[virtio-dev] RE: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Parav Pandit



> From: Michael S. Tsirkin 
> Sent: Monday, April 3, 2023 11:34 AM

> Another is that we can actually work around legacy bugs in the hypervisor. For
> example, atomicity and alignment bugs do not exist under DMA. Consider MAC
> field, writeable in legacy.  Problem this write is not atomic, so there is a 
> window
> where MAC is corrupted.  If you do MMIO then you just have to copy this bug.
> If you do DMA then hypervisor can buffer all of MAC and send to device in one
> go.
I am familiar with this bug.
The user feedback we have received so far is from kernels whose driver
uses the CVQ for setting the MAC address on a legacy device.
So, it may help, but it is not super important.

Also, if I recollect correctly, the MAC address is configured fairly early in
the if-scripts sequence, before bringing up the interface.
So, I haven't seen a real issue around it.




[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 11:23:11AM -0400, Michael S. Tsirkin wrote:
> On Mon, Apr 03, 2023 at 03:16:53PM +, Parav Pandit wrote:
> > 
> > 
> > > From: Michael S. Tsirkin 
> > > Sent: Monday, April 3, 2023 11:07 AM
> > 
> > > > > OTOH it is presumably required for scalability anyway, no?
> > > > No.
> > > > Most new generation SIOV and SR-IOV devices operate without any para-
> > > virtualization.
> > > 
> > > Don't see the connection to PV. You need an emulation layer in the host 
> > > if you
> > > want to run legacy guests. Looks like it could do transport vq just as 
> > > well.
> > >
> > Transport vq for legacy MMR purpose seems fine with its latency and DMA 
> > overheads.
> > Your question was about "scalability".
> > After your latest response, I am unclear what "scalability" means.
> > Do you mean saving the register space in the PCI device?
> 
> yes that's how you used scalability in the past.
> 
> > If yes, than, no for legacy guests for scalability it is not required, 
> > because the legacy register is subset of 1.x.
> 
> Weird.  what does guest being legacy have to do with a wish to save
> registers on the host hardware? You don't have so many legacy guests as
> modern guests? Why?
> 
> 
> 
> >  
> > > > > And presumably it can all be done in firmware ...
> > > > > Is there actual hardware that can't implement transport vq but is
> > > > > going to implement the mmr spec?
> > > > >
> > > > Nvidia and Marvell DPUs implement MMR spec.
> > > 
> > > Hmm implement it in what sense exactly?
> > >
> > Do not follow the question.
> > The proposed series will be implemented as PCI SR-IOV devices using MMR 
> > spec.
> >  
> > > > Transport VQ has very high latency and DMA overheads for 2 to 4 bytes
> > > read/write.
> > > 
> > > How many of these 2 byte accesses trigger from a typical guest?
> > > 
> > Mostly during the VM boot time. 20 to 40 registers read write access.
> 
> That is not a lot! How long does a DMA operation take then?
> 
> > > > And before discussing "why not that approach", lets finish reviewing 
> > > > "this
> > > approach" first.
> > > 
> > > That's a weird way to put it. We don't want so many ways to do legacy if 
> > > we can
> > > help it.
> > Sure, so lets finish the review of current proposal details.
> > At the moment 
> > a. I don't see any visible gain of transport VQ other than device reset 
> > part I explained.
> 
> For example, we do not need a new range of device IDs and existing
> drivers can bind on the host.

Another is that we can actually work around legacy bugs in the
hypervisor. For example, atomicity and alignment bugs do not exist under
DMA. Consider MAC field, writeable in legacy.  Problem this write is not
atomic, so there is a window where MAC is corrupted.  If you do MMIO
then you just have to copy this bug. If you do DMA then hypervisor can
buffer all of MAC and send to device in one go.
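
To illustrate the buffering idea in the last sentence, here is a minimal
sketch (not from the series): the hypervisor shadows the guest's byte-wise
writes to the legacy MAC field and pushes the complete MAC to the device only
once, via whatever single-shot channel it has; send_mac_to_device() below is a
made-up placeholder for that channel.

#include <stdint.h>

#define MAC_LEN 6

struct legacy_mac_shadow {
        uint8_t mac[MAC_LEN];
        uint8_t written;        /* bitmap of bytes the guest has written */
};

/* Placeholder for the hypervisor's single-shot path to the device
 * (e.g. one admin-queue command or one CVQ request). */
void send_mac_to_device(const uint8_t mac[MAC_LEN]);

/* Called when the guest writes one byte of the legacy MAC register. */
static void legacy_mac_write(struct legacy_mac_shadow *s,
                             unsigned int idx, uint8_t val)
{
        if (idx >= MAC_LEN)
                return;
        s->mac[idx] = val;
        s->written |= 1u << idx;
        if (s->written == (1u << MAC_LEN) - 1) {
                /* All six bytes seen: send the MAC in one go. */
                send_mac_to_device(s->mac);
                s->written = 0;
        }
}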

> > b. it can be a way with high latency, DMA overheads on the virtqueue for 
> > read/writes for small access.
> 
> numbers?
> 
> -- 
> MST





[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 03:16:53PM +, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin 
> > Sent: Monday, April 3, 2023 11:07 AM
> 
> > > > OTOH it is presumably required for scalability anyway, no?
> > > No.
> > > Most new generation SIOV and SR-IOV devices operate without any para-
> > virtualization.
> > 
> > Don't see the connection to PV. You need an emulation layer in the host if 
> > you
> > want to run legacy guests. Looks like it could do transport vq just as well.
> >
> Transport vq for legacy MMR purpose seems fine with its latency and DMA 
> overheads.
> Your question was about "scalability".
> After your latest response, I am unclear what "scalability" means.
> Do you mean saving the register space in the PCI device?

yes that's how you used scalability in the past.

> If yes, than, no for legacy guests for scalability it is not required, 
> because the legacy register is subset of 1.x.

Weird.  what does guest being legacy have to do with a wish to save
registers on the host hardware? You don't have so many legacy guests as
modern guests? Why?



>  
> > > > And presumably it can all be done in firmware ...
> > > > Is there actual hardware that can't implement transport vq but is
> > > > going to implement the mmr spec?
> > > >
> > > Nvidia and Marvell DPUs implement MMR spec.
> > 
> > Hmm implement it in what sense exactly?
> >
> Do not follow the question.
> The proposed series will be implemented as PCI SR-IOV devices using MMR spec.
>  
> > > Transport VQ has very high latency and DMA overheads for 2 to 4 bytes
> > read/write.
> > 
> > How many of these 2 byte accesses trigger from a typical guest?
> > 
> Mostly during the VM boot time. 20 to 40 registers read write access.

That is not a lot! How long does a DMA operation take then?

> > > And before discussing "why not that approach", lets finish reviewing "this
> > approach" first.
> > 
> > That's a weird way to put it. We don't want so many ways to do legacy if we 
> > can
> > help it.
> Sure, so lets finish the review of current proposal details.
> At the moment 
> a. I don't see any visible gain of transport VQ other than device reset part 
> I explained.

For example, we do not need a new range of device IDs and existing
drivers can bind on the host.

> b. it can be a way with high latency, DMA overheads on the virtqueue for 
> read/writes for small access.

numbers?

-- 
MST





[virtio-dev] RE: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Parav Pandit



> From: Michael S. Tsirkin 
> Sent: Monday, April 3, 2023 11:07 AM

> > > OTOH it is presumably required for scalability anyway, no?
> > No.
> > Most new generation SIOV and SR-IOV devices operate without any para-
> virtualization.
> 
> Don't see the connection to PV. You need an emulation layer in the host if you
> want to run legacy guests. Looks like it could do transport vq just as well.
>
Transport vq for legacy MMR purpose seems fine with its latency and DMA 
overheads.
Your question was about "scalability".
After your latest response, I am unclear what "scalability" means.
Do you mean saving the register space in the PCI device?
If yes, then no; for legacy guests it is not required for scalability, because
the legacy registers are a subset of the 1.x registers.

 
> > > And presumably it can all be done in firmware ...
> > > Is there actual hardware that can't implement transport vq but is
> > > going to implement the mmr spec?
> > >
> > Nvidia and Marvell DPUs implement MMR spec.
> 
> Hmm implement it in what sense exactly?
>
I do not follow the question.
The proposed series will be implemented as PCI SR-IOV devices using MMR spec.
 
> > Transport VQ has very high latency and DMA overheads for 2 to 4 bytes
> read/write.
> 
> How many of these 2 byte accesses trigger from a typical guest?
> 
Mostly during VM boot time: 20 to 40 register read/write accesses.

> > And before discussing "why not that approach", lets finish reviewing "this
> approach" first.
> 
> That's a weird way to put it. We don't want so many ways to do legacy if we 
> can
> help it.
Sure, so let's finish the review of the current proposal details.
At the moment:
a. I don't see any visible gain from a transport VQ other than the device reset
part I explained.
b. it can be a path with high latency and DMA overheads on the virtqueue for
small register reads/writes.





[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 02:57:26PM +, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin 
> > Sent: Monday, April 3, 2023 10:53 AM
> > 
> > On Fri, Mar 31, 2023 at 05:43:11PM -0400, Parav Pandit wrote:
> > > > I can not say I thought about this
> > > > deeply so maybe there's some problem, or maybe it's a worse approach
> > > > - could you comment on this? It looks like this could be a smaller
> > > > change, but maybe it isn't? Did you consider this option?
> > >
> > > We can possibly let both the options open for device vendors to implement.
> > >
> > > Change wise transport VQ is fairly big addition for both hypervisor
> > > driver and also for the device.
> > 
> > OTOH it is presumably required for scalability anyway, no?
> No.
> Most new generation SIOV and SR-IOV devices operate without any 
> para-virtualization.

Don't see the connection to PV. You need an emulation layer in the host
if you want to run legacy guests. Looks like it could do transport vq
just as well.

> > And presumably it can all be done in firmware ...
> > Is there actual hardware that can't implement transport vq but is going to
> > implement the mmr spec?
> > 
> Nvidia and Marvell DPUs implement MMR spec.

Hmm implement it in what sense exactly?

> Transport VQ has very high latency and DMA overheads for 2 to 4 bytes 
> read/write.

How many of these 2 byte accesses trigger from a typical guest?

> And before discussing "why not that approach", lets finish reviewing "this 
> approach" first.

That's a weird way to put it. We don't want so many ways to do legacy
if we can help it.

-- 
MST





[virtio-dev] RE: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Parav Pandit



> From: Michael S. Tsirkin 
> Sent: Monday, April 3, 2023 10:53 AM
> 
> On Fri, Mar 31, 2023 at 05:43:11PM -0400, Parav Pandit wrote:
> > > I can not say I thought about this
> > > deeply so maybe there's some problem, or maybe it's a worse approach
> > > - could you comment on this? It looks like this could be a smaller
> > > change, but maybe it isn't? Did you consider this option?
> >
> > We can possibly let both the options open for device vendors to implement.
> >
> > Change wise transport VQ is fairly big addition for both hypervisor
> > driver and also for the device.
> 
> OTOH it is presumably required for scalability anyway, no?
No.
Most new generation SIOV and SR-IOV devices operate without any 
para-virtualization.

> And presumably it can all be done in firmware ...
> Is there actual hardware that can't implement transport vq but is going to
> implement the mmr spec?
> 
Nvidia and Marvell DPUs implement the MMR spec.
A transport VQ has very high latency and DMA overheads for 2 to 4 byte
reads/writes.

And before discussing "why not that approach", let's finish reviewing "this
approach" first.





[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Michael S. Tsirkin
On Fri, Mar 31, 2023 at 05:43:11PM -0400, Parav Pandit wrote:
> > I can not say I thought about this
> > deeply so maybe there's some problem, or maybe it's a worse approach -
> > could you comment on this? It looks like this could be a smaller change,
> > but maybe it isn't? Did you consider this option?
> 
> We can possibly let both the options open for device vendors to implement.
> 
> Change wise transport VQ is fairly big addition for both hypervisor driver
> and also for the device.

OTOH it is presumably required for scalability anyway, no?
And presumably it can all be done in firmware ...
Is there actual hardware that can't implement transport vq
but is going to implement the mmr spec?

-- 
MST





[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-03-31 Thread Parav Pandit




On 3/31/2023 3:03 AM, Michael S. Tsirkin wrote:


> OK but this does not answer the following question:
> since a legacy driver can not bind to this type of MMR device,
> a new driver is needed anyway so
> why not implement a modern driver?

Not sure I follow "implement a modern driver".
If you mean a hypervisor driver over a modern driver, then yes, you
captured those two problems below.

More reply below.

> I think we discussed this at some call and it made some kind of sense.

Yep.

> Unfortunately it has been a while and I am not sure I remember the
> detail, so I can no longer say for sure whether this proposal is fit for
> the purpose.  Here is what I vaguely remember:
> 
> A valid use-case is an emulation layer (e.g. a hypervisor) translating
> a legacy driver I/O accesses to MMIO.

Yes.

> Ideally layering this emulation
> on top of a modern device would work ok
> but there are several things making this approach problematic.

Right.

> One is a different virtio net header size between legacy and modern
> driver. Another is use of control VQ by modern where legacy used
> IO writes. In both cases the different would require the
> emulation getting involved on the DMA path, in particular
> somehow finding private addresses for communication between
> emulation and modern device.

Both of these issues are resolved by this proposal.

> Does above summarize it reasonably?
> 
> And if yes, would an alternative approach of adding legacy config
> support to transport vq work well?

The VF supplies the legacy config region (a subset of 1.x) in a
memory-mapped area.

A transport vq on the parent PF is yet another option for legacy
register emulation. I think latency-wise it will be a lot higher,
though that is not of great importance.

The good part of a transport vq is that device reset is handled better,
since it can act as a slow operation.

Given that the device already implements part of these registers in the
1.x memory-mapped area, it is reasonable for the device to provide
similar registers via memory map (legacy is a subset, no new addition).

> I can not say I thought about this
> deeply so maybe there's some problem, or maybe it's a worse approach -
> could you comment on this? It looks like this could be a smaller change,
> but maybe it isn't? Did you consider this option?

We can possibly leave both options open for device vendors to implement.

Change-wise, a transport VQ is a fairly big addition for both the
hypervisor driver and the device.

> More review later.

ok.




[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-03-31 Thread Michael S. Tsirkin
On Fri, Mar 31, 2023 at 01:58:23AM +0300, Parav Pandit wrote:
> Overview:
> -
> The Transitional MMR device is a variant of the transitional PCI device.
> It has its own small Device ID range. It does not have I/O
> region BAR; instead it exposes legacy configuration and device
> specific registers at an offset in the memory region BAR.
> 
> Such transitional MMR devices will be used at the scale of
> thousands of devices using PCI SR-IOV and/or future scalable
> virtualization technology to provide backward
> compatibility (for legacy devices) and also future
> compatibility with new features.
> 
> Usecase:
> 
> 1. A hypervisor/system needs to provide transitional
>virtio devices to the guest VM at scale of thousands,
>typically, one to eight devices per VM.
> 
> 2. A hypervisor/system needs to provide such devices using a
>vendor agnostic driver in the hypervisor system.
> 
> 3. A hypervisor system prefers to have single stack regardless of
>virtio device type (net/blk) and be future compatible with a
>single vfio stack using SR-IOV or other scalable device
>virtualization technology to map PCI devices to the guest VM.
>(as transitional or otherwise)
> 
> Motivation/Background:
> --
> The existing transitional PCI device is missing support for
> PCI SR-IOV based devices. Currently it does not work beyond
> PCI PF, or as software emulated device in reality. It currently
> has below cited system level limitations:
> 
> [a] PCIe spec citation:
> VFs do not support I/O Space and thus VF BARs shall not
> indicate I/O Space.
> 
> [b] cpu arch citation:
> Intel 64 and IA-32 Architectures Software Developer’s Manual:
> The processor’s I/O address space is separate and distinct from
> the physical-memory address space. The I/O address space consists
> of 64K individually addressable 8-bit I/O ports, numbered 0 through FFFFH.
> 
> [c] PCIe spec citation:
> If a bridge implements an I/O address range,...I/O address range
> will be aligned to a 4 KB boundary.
> 
> [d] I/O region accesses at PCI system level is slow as they are non-posted
> operations in PCIe fabric.
> 
> The usecase requirements and limitations above can be solved by
> extending the transitional device, mapping legacy and device
> specific configuration registers in a memory PCI BAR instead
> of using non composable I/O region.
> 
> Please review.

So as you explain in a lot of detail above, IO support is going away,
so the transitional device can no longer be used through the
legacy interface.

OK but this does not answer the following question:
since a legacy driver can not bind to this type of MMR device,
a new driver is needed anyway so
why not implement a modern driver?


I think we discussed this at some call and it made some kind of sense.
Unfortunately it has been a while and I am not sure I remember the
detail, so I can no longer say for sure whether this proposal is fit for
the purpose.  Here is what I vaguely remember:

A valid use-case is an emulation layer (e.g. a hypervisor) translating
a legacy driver's I/O accesses to MMIO. Ideally, layering this emulation
on top of a modern device would work OK,
but there are several things making this approach problematic.
One is the different virtio net header size between the legacy and modern
drivers. Another is the use of the control VQ by modern where legacy used
I/O writes. In both cases the difference would require the
emulation getting involved on the DMA path, in particular
somehow finding private addresses for communication between the
emulation and the modern device.
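
For reference, the header-size difference mentioned above, with field widths
as in the virtio spec (the struct names here are illustrative; fields are
little-endian on the wire): the legacy virtio-net header is 10 bytes, while
the modern (1.x) header always carries num_buffers and is 12 bytes.

#include <stdint.h>

struct virtio_net_hdr_legacy {          /* 10 bytes */
        uint8_t  flags;
        uint8_t  gso_type;
        uint16_t hdr_len;
        uint16_t gso_size;
        uint16_t csum_start;
        uint16_t csum_offset;
};

struct virtio_net_hdr_modern {          /* 12 bytes (virtio 1.x) */
        uint8_t  flags;
        uint8_t  gso_type;
        uint16_t hdr_len;
        uint16_t gso_size;
        uint16_t csum_start;
        uint16_t csum_offset;
        uint16_t num_buffers;
};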


Does above summarize it reasonably?


And if yes, would an alternative approach of adding legacy config
support to transport vq work well?  I can not say I thought about this
deeply so maybe there's some problem, or maybe it's a worse approach -
could you comment on this? It looks like this could be a smaller change,
but maybe it isn't? Did you consider this option?


More review later.



> Patch summary:
> --
> patch 1 to 5 prepares the spec
> patch 6 to 11 defines transitional mmr device
> 
> patch-1 uses lowercase letters to name the device id
> patch-2 moves the transitional device id into the legacy section along with
> the revision id
> patch-3 splits the legacy feature bits description from the device id
> patch-4 renames and moves the virtio config registers next to the 1.x
> registers section
> patch-5 adds a missing helper verb in the terminology definitions
> patch-6 introduces transitional mmr device
> patch-7 introduces transitional mmr device pci device ids
> patch-8 introduces virtio extended pci capability
> patch-9 describes new pci capability to locate legacy mmr
> registers
> patch-10 extended usage of driver notification capability for
>  the transitional mmr device
> patch-11 adds conformance section of the transitional mmr device
> 
> This design and details further described below.
> 
> Design:
> ---
> Below picture captures the main small difference