Re: [virtio-dev] [PATCH v10 0/8] Rename queue index to queue number

2023-04-03 Thread Parav Pandit



On 4/3/2023 9:36 PM, Halil Pasic wrote:

On Thu, 30 Mar 2023 19:13:47 +
Parav Pandit  wrote:


From: Halil Pasic 
Sent: Thursday, March 30, 2023 1:11 PM



Currently virtqueue number is only used in the parts that describe
notifications (Guest->Host); the rest of the spec uses virtqueue index.

I argue that using a different term in that context than in the rest of the
specification makes sense, because in the context of notifications the virtqueue
isn't always identified by its index.

Using a single term, number, is possible, so let's use a single term.



I don't understand. In my opinion we should use a single term for
a particular thing, but we should avoid using the same term for
two different things.


Your description below clarifies that we are both talking about the same
thing, so I will hold my comment on the point above.


One thing is the zero based index of the virtqueue, for that
unfortunately we currently have multiple terms: virtqueue index
and virtqueue number. To remove ourselves from the index vs number
discussion let us call it "X".

There is no such thing as a zero-based index. A VQ can be any u16 _number_ 
in the range of 0 to 65534.

A number can also start from zero, so zero-based index = zero-based number.


Another thing is what you propose below to call vq_identifier, but
for the same reason as above let us briefly call it Y. "Y" may hold
an "X" or a "vq_notification_data" to use your term -- we both agree
on that, so I believe we should both agree that "X" and "Y" are distinct
things and need distinct and easy to distinguish names.


Yes. I proposed Y = vq_notify_identifier.
(just as we did when naming rss_rq_id).
Y = vqn when NOTIF_DATA is not negotiated (usually default).
Y = device supplied vq_notify_identifier when NOTIF_DATA is negotiated.

We should rename NOTIF_DATA to NOTIF_ID too, but I will park that
proposal for later.
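
For illustration, a minimal C sketch of the "X" vs "Y" split, assuming the
names proposed in this thread (vq_notify_identifier is a proposed term, not
current spec wording):

#include <stdbool.h>
#include <stdint.h>

/* "X" is the vq number (vqn); "Y" is the value a driver places in an
 * available buffer notification. A minimal sketch, assuming the names
 * proposed in this thread. */
static uint16_t vq_notify_identifier(uint16_t vqn,
                                     bool notif_config_data_negotiated,
                                     uint16_t queue_notify_data)
{
    /* Without VIRTIO_F_NOTIF_CONFIG_DATA (the usual default), Y is
     * simply the vq number; with it, Y is the device-supplied value. */
    return notif_config_data_negotiated ? queue_notify_data : vqn;
}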



What we currently have is:
* when we mean "Y" we write "virtqueue number" or "vqn",
* and when we mean "X" we write either "virtqueue index" or "virtqueue
   number"

Now let me pull the historic argument for consolidation on "virtqueue
index" rather than "virtqueue number".

Most of the occurrences of "virtqueue number" were introduced by
commit 4ca1311 ("VIRTIO_F_NOTIFICATION_DATA: extra data to devices")
and commit 2ff0d5c ("virtio-net: Add support for the flexible driver
notification structure.")

And I argue most of them stand where we mean "Y". Some of them
don't and that is bad.

Please have a look at the v1.0 version of the specification:
http://docs.oasis-open.org/virtio/virtio/v1.0/cs04/virtio-v1.0-cs04.html#x1-370007
and search for "queue number" and "queue index". For "queue number"
you should get 1 hit and for "queue index" 8. All of them mean "X",
because "Y" was introduced later.

I argue it is more sane to assume that the single "queue number"
occurrence was a back-then quite harmless mistake (all we had was "X", so
"queue number" could not yet mean "Y"), than to say the 8
occurrences of "queue index" are all mistakes.

Your approach to resolving the problem is:
* change all occurrences of "queue index" (all stand for "X") to
   "queue number"
* change the occurrences of "queue number" that stand for "Y" (that
   is most of them) to "vq_identifier"
* keep the occurrences of "queue number" that stand for "X" as is



Once we rename the field vqn to vq_notify_id in the notification 
structure, we are out of the mess.



Yes, that way we can reach an internally consistent state.

I argue that the following approach is better:
* keep all occurrences of "queue index" as is (all stand for "X")
* change the occurrences of "queue number" that stand for "X"
   to "virtqueue index"
* if we can find a name for "Y" better than "queue number"
   replace all remaining occurrences of "queue number" with
   that new name (I agree "queue number" is not a very good
   name for "Y")


vq_notify_id is a good name that takes us to a sane place.


Both solutions are equally consistent in themselves, but I
argue the latter is better because:
* it is more consistent with historic usage

Well, index/vqn are mixed up anyway now.
For historical reasons, I agree that index is right,
but it is too late now.
The comments did not come in time.


* in my subjective opinion "queue index" is a better
name for "X" compared to "queue number"
* it requires fewer changes.


Unfortunately not.
Based on past agreements, more patches have progressed to use vqn.
That includes this patch and another patch where you voted yes for vqn
(interrupt moderation), plus more work that Michael did with AQ in the 
v11 series.


If you sum it up, changing all of it requires more changes.


@Michael: Do you agree? Disagree?

Parav, your work is very much appreciated. I know, the messy
current state is not your fault at all. In fact it is to a certain
degree my fault.

I was really fine either way with index and number a month back, when I 
asked to reach consensus.


Most agreed with vqn, and engineers wrote v11 of this series, and 

Re: [virtio-dev] [PATCH v10 0/8] Rename queue index to queue number

2023-04-03 Thread Halil Pasic
On Thu, 30 Mar 2023 19:13:47 +
Parav Pandit  wrote:

> > From: Halil Pasic 
> > Sent: Thursday, March 30, 2023 1:11 PM  
> 
> > Currently virtqueue number is only used in the parts that describe
> > notifications (Guest->Host), the rest of the spec uses virtqueue index.
> >
> > I argue that using a different term in that context than in the rest of the
> > specification makes sense, because in the context of notifications the 
> > virtqueue
> > isn't always identified by its index.  
> Using a single term, number, is possible, so let's use a single term. 
> 

I don't understand. In my opinion we should use a single term for
a particular thing, but we should avoid using the same term for
two different things.

One thing is the zero based index of the virtqueue, for that
unfortunately we currently have multiple terms: virtqueue index
and virtqueue number. To remove ourselves from the index vs number
discussion let us call it "X".

Another thing is what you propose below to call vq_identifier, but
for the same reason as above let us briefly call it Y. "Y" may hold
an "X" or a "vq_notification_data" to use your term -- we both agree
on that, so I believe we should both agree that "X" and "Y" are distinct
things and need distinct and easy to distinguish names.

What we currently have is:
* when we mean "Y" we write "virtqueue number" or "vqn", 
* and when we mean "X" we write either "virtqueue index" or "virtqueue
  number"

Now let me pull the historic argument for consolidation on "virtqueue
index" rather than "virtqueue number".

Most of the occurrences of "virtqueue number" were introduced by
commit 4ca1311 ("VIRTIO_F_NOTIFICATION_DATA: extra data to devices")
and commit 2ff0d5c ("virtio-net: Add support for the flexible driver
notification structure.")

And I argue most of them stand where we mean "Y". Some of them
don't and that is bad.

Please have a look at the v1.0 version of the specification:
http://docs.oasis-open.org/virtio/virtio/v1.0/cs04/virtio-v1.0-cs04.html#x1-370007
and search for "queue number" and "queue index". For "queue number"
you should get 1 hit and for "queue index" 8. All of them mean "X",
because "Y" was introduced later.

I argue it is more sane to assume that the single "queue number"
occurrence was a back-then quite harmless mistake (all we had was "X", so
"queue number" could not yet mean "Y"), than to say the 8
occurrences of "queue index" are all mistakes.

Your approach to resolving the problem is: 
* change all occurrences of "queue index" (all stand for "X") to
  "queue number" 
* change the occurrences of "queue number" that stand for "Y" (that
  is most of them) to "vq_identifier"
* keep the occurrences of "queue number" that stand for "X" as is

Yes, that way we can reach an internally consistent state.

I argue that the following approach is better:
* keep all occurrences of "queue index" as is (all stand for "X")
* change the occurrences of "queue number" that stand for "X"
  to "virtqueue index"
* if we can find a name for "Y" better than "queue number"
  replace all remaining occurrences of "queue number" with
  that new name (I agree "queue number" is not a very good
  name for "Y")

Both solutions are equally consistent in themselves, but I
argue the latter is better because:
* it is more consistent with historic usage
* in my subjective opinion "queue index" is a better
name for "X" compared to "queue number"
* it requires fewer changes.

@Michael: Do you agree? Disagree?

Parav, your work is very much appreciated. I know, the messy
current state is not your fault at all. In fact it is to a certain
degree my fault. 

> > 
> > More precisely: if VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated in the
> > context of notifications the virtqueue is identified by the so called
> > "queue_notify_data"; 
> > if VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated in  
> 
> You missed "not" before negotiation. :)

Right.

Regards,
Halil

> 
> > the context of notifications the virtqueue is identified by the virtqueue 
> > index (as
> > usual, for example in queue_select, or in the ccws).
> > 
> > As I've pointed out in my comment to patch 2, I believe replacing virtqueue
> > index with virtqueue number is detrimental to clarity.

[..]

> > @@ -413,7 +416,9 @@ \section{Driver Notifications} \label{sec:Basic 
> > Facilities
> > of a Virtio Device /  the following information:
> > 
> >  \begin{description}
> > -\item [vqn] VQ number to be notified.
> > +\item [vqn] Identifies the virtqueue to be notified. If
> > VIRTIO_F_NOTIF_CONFIG_DATA has been
> > +  negotiated, the value of \field{vqn} is the notify data supplied by
> > the device
> There is still vqn here, so it is no better than vq number.
> To be truly clear, it should be renamed to vq_identifier, which either 
> contains vqn or vq_notification_data.
> 
> As I replied, I can take up that cleanup at a later point if we find some 
> motivation for CONFIG_DATA use.



[virtio-dev] [PATCH v11 8/8] virtio-net: Describe RSS using rss rq id

2023-04-03 Thread Parav Pandit
The content of the indirection table and unclassified_queue were
originally described based on mathematical operations. In order to
make it easier to understand and to avoid intermixing the array
index with the virtqueue number, introduce a structure
rss_rq_id (RSS receive queue
ID) and use it to describe the unclassified_queue and
indirection_table fields.

As part of it, have the example use a non-zero virtqueue
number, which helps to better map the receiveX
object to its virtqueue number and the actual value in the
indirection table.

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/163
Signed-off-by: Parav Pandit 

---
changelog:
v10->v11:
- commit log updated to drop old reference to rq_handle, replaced with
  rss_rq_id detail
v8->v9:
- reword rss_rq_id and unclassified_packets description
- use article
- use 'vq number' instead of 'virtqueue number'
v4->v5:
- change subject to reflect rss_rq_id
- fixed accidental removal of "unclassified packets".
- simplified text around indirection_table mask to use the term
  destination receive virtqueue. This aligns with next line about queue
  reset.
- removed rss_rq_id references as indirection table and
  unclassified_queue data type is self explanatory
v3->v4:
- renamed rq_handle to rss_rq_id
- moved rss_rq_id definition close to its usage in rss_config struct
v2->v3:
- moved rq_handle definition before using it
- removed duplicate example as rq_handle is now described first
v0->v1:
- new patch suggested by Michael Tsirkin
---
 device-types/net/description.tex | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/device-types/net/description.tex b/device-types/net/description.tex
index 435c1fc..c64335d 100644
--- a/device-types/net/description.tex
+++ b/device-types/net/description.tex
@@ -1423,11 +1423,16 @@ \subsubsection{Control Virtqueue}\label{sec:Device 
Types / Network Device / Devi
 
 Driver sends a VIRTIO_NET_CTRL_MQ_RSS_CONFIG command using the following 
format for \field{command-specific-data}:
 \begin{lstlisting}
+struct rss_rq_id {
+   le16 vqn_1_16: 15; /* Bits 1 to 16 of the vq number */
+   le16 reserved: 1; /* Set to zero */
+};
+
 struct virtio_net_rss_config {
 le32 hash_types;
 le16 indirection_table_mask;
-le16 unclassified_queue;
-le16 indirection_table[indirection_table_length];
+struct rss_rq_id unclassified_queue;
+struct rss_rq_id indirection_table[indirection_table_length];
 le16 max_tx_vq;
 u8 hash_key_length;
 u8 hash_key_data[hash_key_length];
@@ -1442,10 +1447,15 @@ \subsubsection{Control Virtqueue}\label{sec:Device 
Types / Network Device / Devi
 \field{indirection_table} array.
 Number of entries in \field{indirection_table} is 
(\field{indirection_table_mask} + 1).
 
-Field \field{unclassified_queue} contains the 0-based index of
-the receive virtqueue to place unclassified packets in. Index 0 corresponds to 
receiveq1.
+\field{rss_rq_id} is a receive virtqueue id. \field{vqn_1_16}
+consists of bits 1 to 16 of a vq number. For example, a
+\field{vqn_1_16} value of 3 corresponds to vq number 6,
+which maps to receiveq4.
+
+Field \field{unclassified_queue} contains the receive virtqueue
+in which to place unclassified packets.
 
-Field \field{indirection_table} contains an array of 0-based indices of 
receive virtqueues. Index 0 corresponds to receiveq1.
+Field \field{indirection_table} is an array of receive virtqueues.
 
 A driver sets \field{max_tx_vq} to inform a device how many transmit 
virtqueues it may use (transmitq1\ldots transmitq \field{max_tx_vq}).
 
@@ -1455,7 +1465,8 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types 
/ Network Device / Devi
 
 A driver MUST NOT send the VIRTIO_NET_CTRL_MQ_RSS_CONFIG command if the 
feature VIRTIO_NET_F_RSS has not been negotiated.
 
-A driver MUST fill the \field{indirection_table} array only with indices of 
enabled queues. Index 0 corresponds to receiveq1.
+A driver MUST fill the \field{indirection_table} array only with
+enabled receive virtqueues.
 
 The number of entries in \field{indirection_table} 
(\field{indirection_table_mask} + 1) MUST be a power of two.
 
@@ -1468,7 +1479,9 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types 
/ Network Device / Devi
 \begin{itemize}
 \item Calculate the hash of the packet as defined in \ref{sec:Device Types / 
Network Device / Device Operation / Processing of Incoming Packets / Hash 
calculation for incoming packets}.
 \item If the device did not calculate the hash for the specific packet, the 
device directs the packet to the receiveq specified by 
\field{unclassified_queue} of virtio_net_rss_config structure.
-\item Apply \field{indirection_table_mask} to the calculated hash and use the 
result as the index in the indirection table to get 0-based number of 
destination receiveq.
+\item Apply \field{indirection_table_mask} to the calculated hash
+and use the result as the index in the indirection table to get
+the destination receive virtqueue.
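
For illustration, a hedged C sketch of the rss_rq_id encoding and
indirection table lookup described above; the helper names are illustrative
and the little-endian bitfield handling is simplified to host order:

#include <stdint.h>

struct rss_rq_id_c { uint16_t vqn_1_16; }; /* low 15 bits used */

/* vqn_1_16 holds the upper bits of the receive vq number, so the vq
 * number is vqn_1_16 << 1; e.g. a value of 3 yields vq number 6,
 * which maps to receiveq4. */
static uint16_t rss_rq_id_to_vqn(struct rss_rq_id_c id)
{
    return (uint16_t)((id.vqn_1_16 & 0x7fff) << 1);
}

/* Destination receive vq for a computed hash: mask the hash
 * (indirection_table_mask + 1 is a power of two) and index the
 * indirection table. */
static uint16_t rss_destination_vqn(uint32_t hash, uint16_t mask,
                                    const struct rss_rq_id_c *table)
{
    return rss_rq_id_to_vqn(table[hash & mask]);
}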

[virtio-dev] [PATCH v11 6/8] transport-ccw: Refer to the vq by its number

2023-04-03 Thread Parav Pandit
Currently the specification uses virtqueue index and
number interchangeably to refer to the virtqueue.

Instead refer to it by its number.

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/163
Reviewed-by: Cornelia Huck 
Signed-off-by: Parav Pandit 

---
changelog:
v9->v10:
- replaced queue_select with vqn
- used lower case letter for first word in comment
v8->v9:
- replaced 'named' with 'known'
- replaced 'queue number' with 'vq number'
v3->v4:
- moved note to comment
v2->v3:
- added comment note for queue_select similar to max_queue_size
v0->v1:
- new patch
---
 transport-ccw.tex | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/transport-ccw.tex b/transport-ccw.tex
index 3ce3f9f..23a6483 100644
--- a/transport-ccw.tex
+++ b/transport-ccw.tex
@@ -237,11 +237,12 @@ \subsubsection{Configuring a Virtqueue}\label{sec:Virtio 
Transport Options / Vir
 \begin{lstlisting}
 struct vq_config_block {
-be16 index;
+be16 vqn; /* previously known as index */
 be16 max_queue_size; /* previously known as max_num */
 };
 \end{lstlisting}
 
-The requested number of buffers for queue \field{index} is returned in
+The requested number of buffers for queue \field{vqn} is returned in
 \field{max_queue_size}.
 
 Afterwards, CCW_CMD_SET_VQ is issued by the driver to inform the
@@ -252,7 +253,7 @@ \subsubsection{Configuring a Virtqueue}\label{sec:Virtio 
Transport Options / Vir
 struct vq_info_block {
 be64 desc;
 be32 res0;
-be16 index;
+be16 vqn; /* previously known as index */
 be16 size; /* previously known as num */
 be64 driver;
 be64 device;
@@ -261,7 +262,7 @@ \subsubsection{Configuring a Virtqueue}\label{sec:Virtio 
Transport Options / Vir
 
 \field{desc}, \field{driver} and \field{device} contain the guest
 addresses for the descriptor area,
-available area and used area for queue \field{index}, respectively. The actual
+available area and used area for vq number \field{vqn}, respectively. The 
actual
 virtqueue size (number of allocated buffers) is transmitted in
 \field{size}.
 
@@ -278,12 +279,13 @@ \subsubsection{Configuring a Virtqueue}\label{sec:Virtio 
Transport Options / Vir
 struct vq_info_block_legacy {
 be64 queue;
 be32 align;
-be16 index;
+be16 vqn; /* previously known as index */
 be16 size; /* previously known as num */
 };
 \end{lstlisting}
 
-\field{queue} contains the guest address for queue \field{index},
+\field{queue} contains the guest address for vq number
+\field{vqn},
 \field{size} the number of buffers
 and \field{align} the alignment. The queue layout follows \ref{sec:Basic 
Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on 
Virtqueue Layout}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues 
/ Legacy Interfaces: A Note on Virtqueue Layout}.
 
@@ -571,7 +573,7 @@ \subsubsection{Guest->Host Notification}\label{sec:Virtio 
Transport Options / Vi
 For example:
 \begin{lstlisting}
 info->cookie = do_notify(schid,
- virtqueue_get_queue_index(vq),
+ virtqueue_get_queue_number(vq),
  info->cookie);
 \end{lstlisting}
 \end{note}
-- 
2.26.2
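
For illustration, a hedged C sketch of the vq_config_block query after this
rename; ccw_do() is a hypothetical stand-in for the channel I/O machinery
(stubbed so the sketch compiles), the opcode value is illustrative, and be16
wire endianness is simplified to host order:

#include <stdint.h>

struct vq_config_block_c {
    uint16_t vqn;            /* be16 on the wire; previously "index" */
    uint16_t max_queue_size; /* be16 on the wire; previously "max_num" */
};

/* Hypothetical channel-I/O helper, stubbed so this compiles. */
static int ccw_do(unsigned int cmd, void *data, unsigned int len)
{
    (void)cmd; (void)data; (void)len;
    return 0;
}

#define CCW_CMD_READ_VQ_CONF 0x32 /* illustrative opcode value */

/* The requested number of buffers for queue vqn is returned in
 * max_queue_size. */
static uint16_t get_max_queue_size(uint16_t vqn)
{
    struct vq_config_block_c blk = { .vqn = vqn, .max_queue_size = 0 };
    ccw_do(CCW_CMD_READ_VQ_CONF, &blk, sizeof(blk));
    return blk.max_queue_size;
}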





[virtio-dev] [PATCH v11 5/8] transport-ccw: Rename queue depth/size to other transports

2023-04-03 Thread Parav Pandit
The max_num field reflects the maximum queue size/depth. Hence, align the
name of this field with the similar field in the PCI and MMIO transports by
renaming it to max_queue_size.
Similarly, rename 'num' to 'size'.

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/163
Signed-off-by: Parav Pandit 

---
changelog:
v9->v10:
- used lower case letter for comment first word
v8->v9:
- replaced 'named' with 'known'
v3->v4:
- moved note to comment
---
 transport-ccw.tex | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/transport-ccw.tex b/transport-ccw.tex
index c492cb9..3ce3f9f 100644
--- a/transport-ccw.tex
+++ b/transport-ccw.tex
@@ -237,12 +237,12 @@ \subsubsection{Configuring a Virtqueue}\label{sec:Virtio 
Transport Options / Vir
 \begin{lstlisting}
 struct vq_config_block {
 be16 index;
-be16 max_num;
+be16 max_queue_size; /* previously known as max_num */
 };
 \end{lstlisting}
 
 The requested number of buffers for queue \field{index} is returned in
-\field{max_num}.
+\field{max_queue_size}.
 
 Afterwards, CCW_CMD_SET_VQ is issued by the driver to inform the
 device about the location used for its queue. The transmitted
@@ -253,7 +253,7 @@ \subsubsection{Configuring a Virtqueue}\label{sec:Virtio 
Transport Options / Vir
 be64 desc;
 be32 res0;
 be16 index;
-be16 num;
+be16 size; /* previously known as num */
 be64 driver;
 be64 device;
 };
@@ -262,7 +262,8 @@ \subsubsection{Configuring a Virtqueue}\label{sec:Virtio 
Transport Options / Vir
 \field{desc}, \field{driver} and \field{device} contain the guest
 addresses for the descriptor area,
 available area and used area for queue \field{index}, respectively. The actual
-virtqueue size (number of allocated buffers) is transmitted in \field{num}.
+virtqueue size (number of allocated buffers) is transmitted in
+\field{size}.
 
 \devicenormative{\paragraph}{Configuring a Virtqueue}{Virtio Transport Options 
/ Virtio over channel I/O / Device Initialization / Configuring a Virtqueue}
 
@@ -278,11 +279,12 @@ \subsubsection{Configuring a Virtqueue}\label{sec:Virtio 
Transport Options / Vir
 be64 queue;
 be32 align;
 be16 index;
-be16 num;
+be16 size; /* previously known as num */
 };
 \end{lstlisting}
 
-\field{queue} contains the guest address for queue \field{index}, \field{num} 
the number of buffers
+\field{queue} contains the guest address for queue \field{index},
+\field{size} the number of buffers
 and \field{align} the alignment. The queue layout follows \ref{sec:Basic 
Facilities of a Virtio Device / Virtqueues / Legacy Interfaces: A Note on 
Virtqueue Layout}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues 
/ Legacy Interfaces: A Note on Virtqueue Layout}.
 
 \subsubsection{Communicating Status Information}\label{sec:Virtio Transport 
Options / Virtio over channel I/O / Device Initialization / Communicating 
Status Information}
-- 
2.26.2





[virtio-dev] [PATCH v11 4/8] transport-mmio: Refer to the vq by its number

2023-04-03 Thread Parav Pandit
Currently the specification uses virtqueue index and
number interchangeably to refer to the virtqueue.

Instead refer to it by its number.

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/163
Reviewed-by: Cornelia Huck 
Reviewed-by: Jiri Pirko 
Signed-off-by: Parav Pandit 

---
changelog:
v8->v9:
- added 'by' at two places
- replaced 'queue number' with 'vq number'

v6->v7:
- remove text around first vq as it is already covered in the basic
  virtqueues facility section
---
 transport-mmio.tex | 23 +--
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/transport-mmio.tex b/transport-mmio.tex
index 164e640..1350b02 100644
--- a/transport-mmio.tex
+++ b/transport-mmio.tex
@@ -108,13 +108,12 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio 
Transport Options / Vi
 bits accessible by writing to \field{DriverFeatures}.
   }
   \hline
-  \mmioreg{QueueSel}{Virtual queue index}{0x030}{W}{%
+  \mmioreg{QueueSel}{Virtual queue number}{0x030}{W}{%
 Writing to this register selects the virtual queue that the
 following operations on \field{QueueSizeMax},
 \field{QueueSize}, \field{QueueReady},
 \field{QueueDescLow}, \field{QueueDescHigh}, \field{QueueDriverlLow}, 
\field{QueueDriverHigh},
-\field{QueueDeviceLow}, \field{QueueDeviceHigh} and \field{QueueReset} 
apply to. The index
-number of the first queue is zero (0x0).
+\field{QueueDeviceLow}, \field{QueueDeviceHigh} and \field{QueueReset} 
apply to.
   }
   \hline
   \mmioreg{QueueSizeMax}{Maximum virtual queue size}{0x034}{R}{%
@@ -149,7 +148,7 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio 
Transport Options / Vi
 there are new buffers to process in a queue.
 
 When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
-the value written is the queue index.
+the value written is the vq number.
 
 When VIRTIO_F_NOTIFICATION_DATA has been negotiated,
 the \field{Notification data} value has the following format:
@@ -363,8 +362,7 @@ \subsubsection{Virtqueue Configuration}\label{sec:Virtio 
Transport Options / Vir
 The driver will typically initialize the virtual queue in the following way:
 
 \begin{enumerate}
-\item Select the queue writing its index (first queue is 0) to
-   \field{QueueSel}.
+\item Select the queue by writing its number to \field{QueueSel}.
 
 \item Check if the queue is not already in use: read \field{QueueReady},
and expect a returned value of zero (0x0).
@@ -392,8 +390,8 @@ \subsubsection{Available Buffer 
Notifications}\label{sec:Virtio Transport Option
 
 When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
 the driver sends an available buffer notification to the device by writing
-the 16-bit virtqueue index
-of the queue to be notified to \field{QueueNotify}.
+the 16-bit vq number of the queue to be notified to
+\field{QueueNotify}.
 
 When VIRTIO_F_NOTIFICATION_DATA has been negotiated,
 the driver sends an available buffer notification to the device by writing
@@ -470,13 +468,11 @@ \subsection{Legacy interface}\label{sec:Virtio Transport 
Options / Virtio Over M
 (see QueuePFN).
   }
   \hline
-  \mmioreg{QueueSel}{Virtual queue index}{0x030}{W}{%
+  \mmioreg{QueueSel}{Virtual queue number}{0x030}{W}{%
 Writing to this register selects the virtual queue that the
 following operations on the \field{QueueSizeMax},
 \field{QueueSize}, \field{QueueAlign}
-and \field{QueuePFN} registers apply to. The index
-number of the first queue is zero (0x0).
-.
+and \field{QueuePFN} registers apply to.
   }
   \hline
   \mmioreg{QueueSizeMax}{Maximum virtual queue size}{0x034}{R}{%
@@ -550,8 +546,7 @@ \subsection{Legacy interface}\label{sec:Virtio Transport 
Options / Virtio Over M
 
 The virtual queue is configured as follows:
 \begin{enumerate}
-\item Select the queue writing its index (first queue is 0) to
-   \field{QueueSel}.
+\item Select the queue by writing its number to \field{QueueSel}.
 
 \item Check if the queue is not already in use: read \field{QueuePFN},
expecting a returned value of zero (0x0).
-- 
2.26.2
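
For illustration, a hedged driver-side C sketch of the flow this patch
describes: select a queue by its vq number, then notify with the vq number
(when VIRTIO_F_NOTIFICATION_DATA has not been negotiated). QueueSel (0x030)
appears in the layout above; QueueNotify (0x050) is taken from the spec's
MMIO register layout; the helper names are illustrative:

#include <stdint.h>

#define VIRTIO_MMIO_QUEUE_SEL    0x030
#define VIRTIO_MMIO_QUEUE_NOTIFY 0x050

static void reg_write32(volatile uint8_t *base, uint32_t off, uint32_t val)
{
    *(volatile uint32_t *)(base + off) = val;
}

/* Select the queue by writing its number to QueueSel. */
static void select_vq(volatile uint8_t *base, uint16_t vqn)
{
    reg_write32(base, VIRTIO_MMIO_QUEUE_SEL, vqn);
}

/* Notify: write the 16-bit vq number of the queue to QueueNotify. */
static void notify_vq(volatile uint8_t *base, uint16_t vqn)
{
    reg_write32(base, VIRTIO_MMIO_QUEUE_NOTIFY, vqn);
}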





[virtio-dev] [PATCH v11 2/8] transport-pci: Refer to the vq by its number

2023-04-03 Thread Parav Pandit
Currently the specification uses virtqueue index and
number interchangeably to refer to the virtqueue.

Instead refer to it by its number.

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/163
Reviewed-by: Cornelia Huck 
Reviewed-by: Max Gurtovoy 
Reviewed-by: Jiri Pirko 
Signed-off-by: Parav Pandit 
---
changelog:
v9->v10:
- updated commit log to drop reference to old patch
v8->v9:
- reword the sentence to avoid future tense, like rest of the other
  fields description
- reword the sentence to avoid multiple verbs use and put -> uses
- use shorter name 'vq number' instead of 'virtqueue number'
v7->v8:
- remove note about first virtqueue number
- skipped Max's comment to put word 'structure' in same line as its
  crosses 65 chars limit per line
- reworded queue_notification data set line, as '=' and vq number
  wording was odd
v2->v3:
- addressed comments from Michael
- changed vqn to virtqueue number in the Note
- refer to vqn field instead of virtqueue number
---
 transport-pci.tex | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/transport-pci.tex b/transport-pci.tex
index b07a822..0f3a48b 100644
--- a/transport-pci.tex
+++ b/transport-pci.tex
@@ -390,13 +390,14 @@ \subsubsection{Common configuration structure 
layout}\label{sec:Virtio Transport
 
 \item[\field{queue_notify_data}]
 This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been 
negotiated.
-The driver will use this value to put it in the 'virtqueue number' 
field
+The driver uses this value in the field \field{vqn}
 in the available buffer notification structure.
 See section \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / 
PCI-specific Initialization And Device Operation / Available Buffer 
Notifications}.
 \begin{note}
 This field provides the device with flexibility to determine how 
virtqueues
 will be referred to in available buffer notifications.
-In a trivial case the device can set \field{queue_notify_data}=vqn. 
Some devices
+In a trivial case the device can set
+\field{queue_notify_data} to the vq number. Some devices
 may benefit from providing another value, for example an internal 
virtqueue
 identifier, or an internal offset related to the virtqueue number.
 \end{note}
@@ -1005,7 +1006,7 @@ \subsubsection{Device Initialization}\label{sec:Virtio 
Transport Options / Virti
 The driver typically does this as follows, for each virtqueue a device has:
 
 \begin{enumerate}
-\item Write the virtqueue index (first queue is 0) to \field{queue_select}.
+\item Write the vq number to \field{queue_select}.
 
 \item Read the virtqueue size from \field{queue_size}. This controls how big 
the virtqueue is
   (see \ref{sec:Basic Facilities of a Virtio Device / 
Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Virtqueues}). If 
this field is 0, the virtqueue does not exist.
@@ -1035,8 +1036,8 @@ \subsubsection{Available Buffer 
Notifications}\label{sec:Virtio Transport Option
 
 When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
 the driver sends an available buffer notification to the device by writing
-the 16-bit virtqueue index
-of this virtqueue to the Queue Notify address.
+the 16-bit vq number of this virtqueue to the Queue Notify
+address.
 
 When VIRTIO_F_NOTIFICATION_DATA has been negotiated,
 the driver sends an available buffer notification to the device by writing
@@ -1053,7 +1054,7 @@ \subsubsection{Available Buffer 
Notifications}\label{sec:Virtio Transport Option
 If VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated:
 \begin{itemize}
 \item If VIRTIO_F_NOTIFICATION_DATA has not been negotiated, the driver MUST 
use the
-\field{queue_notify_data} value instead of the virtqueue index.
+\field{queue_notify_data} value instead of the vq number.
 \item If VIRTIO_F_NOTIFICATION_DATA has been negotiated, the driver MUST set 
the
 \field{vqn} field to the \field{queue_notify_data} value.
 \end{itemize}
-- 
2.26.2
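
For illustration, a hedged C sketch of the two notification rules this patch
touches; the 32-bit layout (vqn in the low 16 bits, next_off, next_wrap)
follows the spec's notification data format, while the helper names are
illustrative:

#include <stdbool.h>
#include <stdint.h>

/* Value of the vqn field: with VIRTIO_F_NOTIF_CONFIG_DATA the driver
 * MUST use queue_notify_data, otherwise the vq number. */
static uint16_t vqn_field(bool notif_config_data, uint16_t vqn,
                          uint16_t queue_notify_data)
{
    return notif_config_data ? queue_notify_data : vqn;
}

/* With VIRTIO_F_NOTIFICATION_DATA, the driver writes this 32-bit value
 * to the Queue Notify address instead of the bare 16-bit vq number. */
static uint32_t notification_data(uint16_t vqn_field_value,
                                  uint16_t next_off, bool next_wrap)
{
    return (uint32_t)vqn_field_value |
           ((uint32_t)(next_off & 0x7fff) << 16) |
           ((uint32_t)(next_wrap ? 1u : 0u) << 31);
}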





[virtio-dev] [PATCH v11 1/8] content: Add vq number text

2023-04-03 Thread Parav Pandit
Introduce vq number and its range so that subsequent patches can refer
to it.

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/163
Reviewed-by: Cornelia Huck 
Signed-off-by: Parav Pandit 
---
changelog:
v9->v10:
- added braces around vq number wording
- added vqn as another term for vq number
v8->v9:
- added inclusive when describing the vq number range
- skipped comment to put virtqueue number wording first because we
  prefer to use shorter vq number as much as possible
v5->v6:
- moved description close to introduction, it was in middle of
  queue data transfer description
v2->v3:
- new patch
---
 content.tex | 4 
 1 file changed, 4 insertions(+)

diff --git a/content.tex b/content.tex
index cff548a..7bac167 100644
--- a/content.tex
+++ b/content.tex
@@ -298,6 +298,10 @@ \section{Virtqueues}\label{sec:Basic Facilities of a 
Virtio Device / Virtqueues}
 virtqueues\footnote{For example, the simplest network device has one virtqueue 
for
 transmit and one for receive.}.
 
+Each virtqueue is identified by a vq number (also referred
+to as a virtqueue number or vqn); vq number range is from 0 to 65535
+inclusive.
+
 Driver makes requests available to device by adding
 an available buffer to the queue, i.e., adding a buffer
 describing the request to a virtqueue, and optionally triggering
-- 
2.26.2





[virtio-dev] [PATCH v11 0/8] Rename queue index to queue number

2023-04-03 Thread Parav Pandit
1. Currently, a virtqueue is identified between the driver and the device
interchangeably, using either number or index terminology.

2. Between the PCI and MMIO transports, the queue size (depth) is
defined as queue_size and QueueNum, respectively.

To avoid confusion and for consistency, unify them to use Number.

Solution:
a. Use the virtqueue number description, and rename the MMIO register to QueueSize.
b. Replace virtqueue index with virtqueue number.
c. The RSS area of virtio-net has inherited some logic; describe it
using an abstract rss_rq_id.

Patch summary:
patch-1 introduces vq number as a generic term
patch-2 renames index to number for pci transport
patch-3 renames mmio register from Num to Size
patch-4 renames index to number for mmio transport
patch-5 renames num field to size for ccw transport
patch-6 renames index field to vqn for ccw transport
patch-7 for virtio-net removes duplicate example from requirements
patch-8 for virtio-net updates rss description to use vq number

This series only improves the documentation; it does not change any
transport or device functionality.

Please review.
This series fixes the issue [1].

[1] https://github.com/oasis-tcs/virtio-spec/issues/163

---
changelog:
v10->v11:
- added Reviewed-by for all the reviewed patches
- updated commit log of patch-8 to drop rq_handle reference
- skipped comment to further use rss_rq_id, as rss_rq_id usage
  and structure are self describing
v9->v10:
- added the virtqueue number part in content in braces
- replaced queue_select with vqn in ccw
- avoided aggressive alignment at 65 chars
- updated commit log to drop reference to already merged patches
- added Reviewed-by tag for already reviewed patches
v8->v9:
- addressed comments from David
- few corrections with article
- renaming 'virtqueue number' to 'vq number'
- improving text and wording for rss_rq_id, avail notification
- commit log of specific change in individual patches
v7->v8:
- remove note about first virtqueue number
- skipped Max's comment to put word 'structure' in same line as its
  crosses 65 chars limit per line
- reworded queue_notification data set line, as '=' and vq number
  wording was odd
v6->v7:
- remove text around first vq as it is already covered in the basic
  virtqueues facility section
v5->v6:
- moved the vq number description from middle of vq operation
  to beginning of vq introduction
v4->v5:
- fixed accidental removal of "unclassified packets".
- simplified text around indirection_table mask
- removed rss_rq_id references as indirection table and
  unclassified_queue data type is self explanatory
v3->v4:
- moved note to comment for ccw
- renamed rq_handle to rss_rq_id
- moved rss_rq_id next to rss_config structure
- define rss_config structure using rss_rq_id
v2->v3:
- addressed comments from Michael
- added previous definitions for ccw fields
- moved rq_handle definition before using it
- added first patch to describe vq number
- updated pci for available buffer notification section
v1->v2:
- added patches for virtio net for rss area
- added patches for covering ccw transport
- added missing entries to refer in mmio transport

Parav Pandit (8):
  content: Add vq number text
  transport-pci: Refer to the vq by its number
  transport-mmio: Rename QueueNum register
  transport-mmio: Refer to the vq by its number
  transport-ccw: Rename queue depth/size to other transports
  transport-ccw: Refer to the vq by its number
  virtio-net: Avoid duplicate receive queue example
  virtio-net: Describe RSS using rss rq id

 content.tex  |  4 ++
 device-types/net/description.tex | 29 ++
 transport-ccw.tex| 26 +++--
 transport-mmio.tex   | 65 ++--
 transport-pci.tex| 13 ---
 5 files changed, 84 insertions(+), 53 deletions(-)

-- 
2.26.2





[virtio-dev] [PATCH v11 7/8] virtio-net: Avoid duplicate receive queue example

2023-04-03 Thread Parav Pandit
The receive queue number/index example is a duplicate of what is already
defined in the Setting RSS parameters section.

Hence, avoid such a duplicate example and prepare for the subsequent
patch that describes it using a receive queue handle.

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/163
Reviewed-by: Cornelia Huck 
Signed-off-by: Parav Pandit 
---
 device-types/net/description.tex | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/device-types/net/description.tex b/device-types/net/description.tex
index d7c8b1b..435c1fc 100644
--- a/device-types/net/description.tex
+++ b/device-types/net/description.tex
@@ -1467,8 +1467,8 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types 
/ Network Device / Devi
 The device MUST determine the destination queue for a network packet as 
follows:
 \begin{itemize}
 \item Calculate the hash of the packet as defined in \ref{sec:Device Types / 
Network Device / Device Operation / Processing of Incoming Packets / Hash 
calculation for incoming packets}.
-\item If the device did not calculate the hash for the specific packet, the 
device directs the packet to the receiveq specified by 
\field{unclassified_queue} of virtio_net_rss_config structure (value of 0 
corresponds to receiveq1).
-\item Apply \field{indirection_table_mask} to the calculated hash and use the 
result as the index in the indirection table to get 0-based number of 
destination receiveq (value of 0 corresponds to receiveq1).
+\item If the device did not calculate the hash for the specific packet, the 
device directs the packet to the receiveq specified by 
\field{unclassified_queue} of virtio_net_rss_config structure.
+\item Apply \field{indirection_table_mask} to the calculated hash and use the 
result as the index in the indirection table to get 0-based number of 
destination receiveq.
 \item If the destination receive queue is being reset (See \ref{sec:Basic 
Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}), the device MUST 
drop the packet.
 \end{itemize}
 
-- 
2.26.2





[virtio-dev] [PATCH v11 3/8] transport-mmio: Rename QueueNum register

2023-04-03 Thread Parav Pandit
Currently the specification uses virtqueue index and number
interchangeably to refer to the virtqueue.

It is better to always refer to the virtqueue in a consistent manner.

The two registers QueueNumMax and QueueNum actually reflect the queue size
or queue depth, indicating the maximum and actual number of entries in the queue.

These are further named differently between the pci and mmio transports.
The PCI transport indicates the queue size as queue_size.

To bring consistency between pci and mmio transport, and to avoid
confusion between number and index, rename the QueueNumMax and QueueNum
registers to QueueSizeMax and QueueSize respectively.

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/163
Reviewed-by: Cornelia Huck 
Reviewed-by: Jiri Pirko 
Signed-off-by: Parav Pandit 

---
changelog:
v8->v9:
- added field tag to indicate field name instead of English word
v0->v1:
- replaced references of QueueNumMax to QueueSizeMax
- replaced references of QueueNum to QueueSize
- added note for renamed fields old name suggested by @Michael Tsirkin
---
 transport-mmio.tex | 42 --
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/transport-mmio.tex b/transport-mmio.tex
index f884a2c..164e640 100644
--- a/transport-mmio.tex
+++ b/transport-mmio.tex
@@ -110,24 +110,31 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio 
Transport Options / Vi
   \hline
   \mmioreg{QueueSel}{Virtual queue index}{0x030}{W}{%
 Writing to this register selects the virtual queue that the
-following operations on \field{QueueNumMax}, \field{QueueNum}, 
\field{QueueReady},
+following operations on \field{QueueSizeMax},
+\field{QueueSize}, \field{QueueReady},
 \field{QueueDescLow}, \field{QueueDescHigh}, \field{QueueDriverlLow}, 
\field{QueueDriverHigh},
 \field{QueueDeviceLow}, \field{QueueDeviceHigh} and \field{QueueReset} 
apply to. The index
 number of the first queue is zero (0x0).
   }
   \hline
-  \mmioreg{QueueNumMax}{Maximum virtual queue size}{0x034}{R}{%
+  \mmioreg{QueueSizeMax}{Maximum virtual queue size}{0x034}{R}{%
 Reading from the register returns the maximum size (number of
 elements) of the queue the device is ready to process or
 zero (0x0) if the queue is not available. This applies to the
 queue selected by writing to \field{QueueSel}.
+\begin{note}
+\field{QueueSizeMax} was previously known as \field{QueueNumMax}.
+\end{note}
   }
   \hline
-  \mmioreg{QueueNum}{Virtual queue size}{0x038}{W}{%
+  \mmioreg{QueueSize}{Virtual queue size}{0x038}{W}{%
 Queue size is the number of elements in the queue.
 Writing to this register notifies the device what size of the
 queue the driver will use. This applies to the queue selected by
 writing to \field{QueueSel}.
+\begin{note}
+\field{QueueSize} was previously known as \field{QueueNum}.
+\end{note}
   }
   \hline
   \mmioreg{QueueReady}{Virtual queue ready bit}{0x044}{RW}{%
@@ -308,11 +315,11 @@ \subsection{MMIO Device Register Layout}\label{sec:Virtio 
Transport Options / Vi
 
 Before writing to the \field{DriverFeatures} register, the driver MUST write a 
value to the \field{DriverFeaturesSel} register.
 
-The driver MUST write a value to \field{QueueNum} which is less than
-or equal to the value presented by the device in \field{QueueNumMax}.
+The driver MUST write a value to \field{QueueSize} which is less than
+or equal to the value presented by the device in \field{QueueSizeMax}.
 
 When \field{QueueReady} is not zero, the driver MUST NOT access
-\field{QueueNum}, \field{QueueDescLow}, \field{QueueDescHigh},
+\field{QueueSize}, \field{QueueDescLow}, \field{QueueDescHigh},
 \field{QueueDriverLow}, \field{QueueDriverHigh}, \field{QueueDeviceLow}, 
\field{QueueDeviceHigh}.
 
 To stop using the queue the driver MUST write zero (0x0) to this
@@ -363,14 +370,14 @@ \subsubsection{Virtqueue Configuration}\label{sec:Virtio 
Transport Options / Vir
and expect a returned value of zero (0x0).
 
 \item Read maximum queue size (number of elements) from
-   \field{QueueNumMax}. If the returned value is zero (0x0) the
+   \field{QueueSizeMax}. If the returned value is zero (0x0) the
queue is not available.
 
 \item Allocate and zero the queue memory, making sure the memory
is physically contiguous.
 
 \item Notify the device about the queue size by writing the size to
-   \field{QueueNum}.
+   \field{QueueSize}.
 
 \item Write physical addresses of the queue's Descriptor Area,
Driver Area and Device Area to (respectively) the
@@ -465,25 +472,32 @@ \subsection{Legacy interface}\label{sec:Virtio Transport 
Options / Virtio Over M
   \hline
   \mmioreg{QueueSel}{Virtual queue index}{0x030}{W}{%
 Writing to this register selects the virtual queue that the
-following operations on the \field{QueueNumMax}, \field{QueueNum}, 
\field{QueueAlign}
+following operations on the \field{QueueSizeMax},
+\field{QueueSize}, \field{QueueAlign}
  

[virtio-dev] RE: [virtio-comment] [PATCH 0/3] transport-pci: msix summary update

2023-04-03 Thread Parav Pandit
Hi Cornelia,

> From: Xuan Zhuo 
> Sent: Monday, March 27, 2023 2:09 AM
> Series:
> 
> Reviewed-by: Xuan Zhuo 
> 

It is an editorial change rewording the text.
Can you please take this short series forward?

> On Fri, 24 Mar 2023 04:35:31 +0300, Parav Pandit  wrote:
> > Summary for config_msix_vector and queue_msix_vector registers have a
> > confusing text of "for MSI-X".
> >
> > Rewrite the register summary line.
> >
> > Patch summary:
> > patch 1 removes empty line at the end of the file patch 2 updates
> > configuration vector summary patch 3 updates queue vector summary
> >
> > These are only editorial changes to rewrite the summary.
> > Hence it should not require a github issue.
> >
> > Parav Pandit (3):
> >   transport-pci: Remove empty line at end of file
> >   transport-pci: Improve config msix vector description
> >   transport-pci: Improve queue msix vector register desc
> >
> >  transport-pci.tex | 7 ---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > --
> > 2.26.2
> >



[virtio-dev] RE: [PATCH v10 8/8] virtio-net: Describe RSS using rss rq id

2023-04-03 Thread Parav Pandit



> From: Cornelia Huck 
> Sent: Thursday, March 30, 2023 5:17 AM
> 
> On Thu, Mar 30 2023, Parav Pandit  wrote:
> 
> > The content of indirection table and unclassified_queue which are
> > based on math calculation historically. To better describe this, to
> > avoid intermixing array index with virtqueue index and to use
> > virtqueue number
> >
> > introduce a field rq_handle (receive queue handle) and refer them to
> > describe unclassified_queue and indirection_table fields.
> 
> This description is a bit confusing (and you renamed rq_handle as well.) Does
> 
> "The content of the indirection table and unclassified_queue were originally
> described based on mathematical operations. In order to make it easier to
> understand and to avoid intermixing the array index with the virtqueue
> number, introduce a structure rss_rq_id (RSS receive queue
> ID) and use it to describe the unclassified_queue and indirection_table 
> fields."
> 
> capture the intent correctly?
> 
Yep. Will use it.
I missed updating the commit log.
Will roll v11.

> >
> > As part of it, have the example that uses non zero virtqueue
> 
> "have the example use a non-zero..." ?
:)
Ok.

> > -Field \field{unclassified_queue} contains the 0-based index of -the
> > receive virtqueue to place unclassified packets in. Index 0 corresponds to
> receiveq1.
> > +\field{rss_rq_id} is a receive virtqueue id. \field{vqn_1_16}
> > +consists of bits 1 to 16 of a vq number. For example, a
> > +\field{vqn_1_16} value of 3 corresponds to vq number 6, which maps to
> > +receiveq4.
> > +
> > +Field \field{unclassified_queue} contains the receive virtqueue in
> > +which to place unclassified packets.
> >
> > -Field \field{indirection_table} contains an array of 0-based indices of 
> > receive
> virtqueues. Index 0 corresponds to receiveq1.
> > +Field \field{indirection_table} is an array of receive virtqueues.
> 
> "an array of receive virtqueues identified via their rss_rq_id" ?
No need to make it this verbose. It is evident from the definition itself.

> 
> >
> >  A driver sets \field{max_tx_vq} to inform a device how many transmit
> virtqueues it may use (transmitq1\ldots transmitq \field{max_tx_vq}).
> >
> > @@ -1455,7 +1465,8 @@ \subsubsection{Control
> > Virtqueue}\label{sec:Device Types / Network Device / Devi
> >
> >  A driver MUST NOT send the VIRTIO_NET_CTRL_MQ_RSS_CONFIG command
> if the feature VIRTIO_NET_F_RSS has not been negotiated.
> >
> > -A driver MUST fill the \field{indirection_table} array only with indices of
> enabled queues. Index 0 corresponds to receiveq1.
> > +A driver MUST fill the \field{indirection_table} array only with
> > +enabled receive virtqueues.
> 
> "only with rss_rq_id references to enabled receive virtqueues" ?
> 
In this field and other fields, we just refer to the receive virtqueues, as
the rss_rq_id parent structure itself describes what it is.
Hence, we omit re-iterating rss_rq_id in multiple places.

> >
> >  The number of entries in \field{indirection_table}
> (\field{indirection_table_mask} + 1) MUST be a power of two.
> >
> > @@ -1468,7 +1479,9 @@ \subsubsection{Control
> > Virtqueue}\label{sec:Device Types / Network Device / Devi
> > \begin{itemize}  \item Calculate the hash of the packet as defined in
> \ref{sec:Device Types / Network Device / Device Operation / Processing of
> Incoming Packets / Hash calculation for incoming packets}.
> >  \item If the device did not calculate the hash for the specific packet, the
> device directs the packet to the receiveq specified by 
> \field{unclassified_queue}
> of virtio_net_rss_config structure.
> > -\item Apply \field{indirection_table_mask} to the calculated hash and use
> the result as the index in the indirection table to get 0-based number of
> destination receiveq.
> > +\item Apply \field{indirection_table_mask} to the calculated hash and
> > +use the result as the index in the indirection table to get the
> > +destination receive virtqueue.
> 
> Ok, you remove the line here anyway, so please just ignore my suggestion for
> the previous patch.
> 
Ok.
Will send v11 with updated commit log.

> >  \item If the destination receive queue is being reset (See \ref{sec:Basic
> Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}), the device 
> MUST
> drop the packet.
> >  \end{itemize}





[virtio-dev] Re: [virtio-comment] [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Parav Pandit




On 4/3/2023 5:14 PM, Michael S. Tsirkin wrote:

On Mon, Apr 03, 2023 at 08:42:52PM +, Parav Pandit wrote:



From: Michael S. Tsirkin 
Sent: Monday, April 3, 2023 4:03 PM

On Mon, Apr 03, 2023 at 07:48:56PM +, Parav Pandit wrote:

OK but supporting them with a passthrough driver such as vfio does
not seem that important.

Not sure on what basis you assert it.
I clarified in the cover letter that these are the user-level requirements to
support both transitional and non-transitional devices via a single vfio
subsystem.

And what is so wrong with vdpa?  Really I don't see how the virtio spec needs to
accommodate specific partitioning between linux modules, be it vdpa or vfio.
Way beyond the scope of the driver.


vdpa has its own value.

Here the requirements are different as listed, so let's focus on them.


I'm not sure how convincing that is. Yes simpler software is good,
it's nice to have, but it's not such a hard requirement to use vfio
when vdpa is there. And again, the cost is reduced robustness.


It is not a hard requirement to use vdpa or vfio one way or the other.

vdpa users can use vdpa.
vfio users can use vfio.


But anyway, my main
point is about DMA. On the one hand you are asking for a VQ based
management interface because it saves money. On the other you are saying
DMA operations take extremely long to the point where they are unusable in
the boot sequence.


I think you missed the point I described a few emails back.
The legacy registers are a subset of the 1.x registers, so a device that 
implements the existing 1.x registers gets the legacy registers for free.
Hence, there is no _real_ saving.


First, not 100%.  E.g. the MAC is writeable, so that's a R/W register as
opposed to RO.  But generally, why implement the 1.0 registers at all? Do it
all in the transport vq.

1.x non-transitional VFs and SIOV devices will use their own 
register_transport_vq to access their registers once that infrastructure 
is in place. Such a VQ is directly accessible in the guest VM without 
hypervisor intervention.


It is orthogonal to this use case (and such a future VQ still works with 
this design).


In this approach a hypervisor needs to use the PF's AQ because the VF device 
(its registers, features, etc.) is not owned by the hypervisor VF driver.





[virtio-dev] Re: [virtio-comment] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Parav Pandit




On 4/3/2023 5:04 PM, Michael S. Tsirkin wrote:

On Mon, Apr 03, 2023 at 08:25:02PM +, Parav Pandit wrote:



From: Michael S. Tsirkin 
Sent: Monday, April 3, 2023 2:02 PM



Because vqs involve DMA operations.
It is left to the device implementation, but the generic wisdom
is to not implement such slow work in the data-path engines.
So such register-access vqs can/may go through firmware.
Hence it can involve much higher latency.


Then that wisdom is wrong? Tens of microseconds is not workable even for
ethtool operations; you are killing boot time.


Huh.
What ethtool latencies have you experienced? Number?


I know that on the order of tens of eth calls happen during boot.
If, as you said, each takes tens of ms, then we are talking close to a second.
That is measurable.

I said it can take that long; it doesn't always have to be the same for all
the commands. Better to work with real numbers. :)

Let me take an example to walk through.

If a cvq or aq command takes 0.5 msec, a total of 100 such commands will 
take 50 msec.


Once in a while, if two of the commands take, say, 5 msec, that will result 
in 50 -> 60 msec.
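
A back-of-envelope C sketch of that arithmetic; the figures are the assumed
ones from this discussion, not measurements:

/* Total time for n_cmds commands when n_slow of them are slow. */
static double total_cmd_time_ms(unsigned int n_cmds, double typical_ms,
                                unsigned int n_slow, double slow_ms)
{
    return (n_cmds - n_slow) * typical_ms + n_slow * slow_ms;
}

/* total_cmd_time_ms(100, 0.5, 0, 0.0) == 50.0 msec
 * total_cmd_time_ms(100, 0.5, 2, 5.0) == 59.0 msec (the 50 -> ~60 case) */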




OK then. If it is a dead end, then it looks weird to add a whole new
config space as memory mapped.


I am aligned with you on not adding any new register as memory mapped for 1.x.
Access through the device's own tvq is fine if such a q can be initialized 
during the device reset (init) phase.


I explained that the legacy registers are a subset of the existing 1.x
registers. They should not consume extra memory.

Let's walk through the merits and negatives of both to conclude.


Let me try again.



If hardware vendors do not want to bear the costs of registers then they
will not implement devices with registers, and then the whole thing will
become yet another legacy thing we need to support. If legacy emulation
without IO is useful, then can we not find a way to do it that will
survive the test of time?
legacy_register_transport_vq for a VF can be an option, but not for PF 
emulation.

More below.




Again, I want to emphasize that register read/write over tvq has merits with 
trade-offs, and so MMR has merits with trade-offs too.
And so the mmr has merits with trade-off too.

Better to list them and proceed forward.

Method-1: VF's register read/write via a PF-based transport VQ
Pros:
a. Light-weight register implementation in the device compared to a new memory region window


Is that all? I mentioned more.


b. device reset is more optimal with a transport VQ
c. a hypervisor may want to check register content (though it is not necessary)
d. some unknown guest VM driver which modifies the mac address and still 
expects atomicity can benefit if the hypervisor wants to do extra checks



Cons:
a. Higher DMA read/write latency
b. the device requires synchronization between non-legacy memory-mapped 
registers and legacy register access via tvq


Same as a separate memory bar really.  Just don't do it. Either access
legacy or non-legacy.

It is really not the same to treat them equally, as tvq encapsulation is 
different, and hw would prefer not to treat them like regular 
memory writes.


The transitional device exposed by the hypervisor contains both the legacy 
I/O bar and the memory-mapped registers, so a guest vm can access both.



c. Can only work with a VF. It cannot work for a thin hypervisor, which can map 
a transitional PF to a bare-metal OS
(also listed in the cover letter)


Is that a significant limitation? Why?

It is a functional limitation for the PF, as the PF has no parent,
and the PF can also utilize a memory BAR.




Method-2: VF's register read/write via MMR (current proposal)
Pros:
a. Device utilizes the same legacy and non-legacy registers.



b. an order of magnitude lower latency due to avoidance of DMA on register 
accesses
(Important but not critical)


And no cons? Even if you could not see them yourself, did I fail to express
myself to such an extent?

The Method-1 pros covered its advantages over Method-2, but yes, it is worth 
listing them here for completeness.


Cons:
requires creating a new memory region window in the device for 
configuration access



No. Interrupt latency is in the usec range.
The major latency contributors, in the msec range, can arise from the device side.


So you are saying there are devices out there already with this MMR hack
baked in, and in hardware not firmware, so it works reasonably?

It is better to not label a solution a "hack".


Sorry if that sounded offensive.  A hack is not necessarily a bad thing.
It's a quick solution to a very local problem, though.

It is a solution because the device can implement it with near-zero extra 
memory over the existing registers.

Anyway, we have better technical details to resolve. :)
Let's focus on those.


Yes, motivation is one of the things I'm trying to work out here.
It does, however, not help that it's an 11-patch-strong patchset
adding 500 lines of text for what is supposedly a small change.

Many of the patches are rework, and it is incorrect to attribute them to the 
specific feature.


Like others, it could have been one giant patch, but we see value in
smaller patches.


Using tvq is even 

[virtio-dev] Re: [virtio-comment] [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 08:42:52PM +, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin 
> > Sent: Monday, April 3, 2023 4:03 PM
> > 
> > On Mon, Apr 03, 2023 at 07:48:56PM +, Parav Pandit wrote:
> > > > OK but supporting them with a passthrough driver such as vfio does
> > > > not seem that important.
> > > Not sure on what basis you assert it.
> > > I clarified in the cover letter that these are the user level 
> > > requirements to
> > support transitional and non-transitional devices both via single vfio 
> > subsystem.
> > 
> > And what is so wrong with vdpa?  Really I don't see how the virtio spec 
> > needs to
> > accommodate specific partitioning between Linux modules, be it vdpa or vfio.
> > Way beyond the scope of the driver.
> >
> vdpa has its own value.
> 
> Here requirements are different as listed so let's focus on it.

I'm not sure how convincing that is. Yes simpler software is good,
it's nice to have, but it's not such a hard requirement to use vfio
when vdpa is there. And again, the cost is reduced robustness.

> > But anyway, my main
> > point is about DMA. On the one hand you are asking for a VQ based
> > management interface because it saves money. On the other you are saying
> > DMA operations take extremely long to the point where they are unusable in
> > the boot sequence.
> 
> I think you missed the point I described few emails back.
> The legacy registers are a subset of the 1.x registers, so a device that
> implements the existing 1.x registers gets the legacy registers for free.
> Hence, there is no _real_ saving.

First, not 100%.  E.g. the MAC is writeable, so that's an R/W register as
opposed to RO.  But generally, why implement 1.0 registers at all? Do it
all in the transport vq.

> > So what is it? Was admin vq a mistake and we should do memory mapped?
> No. Certainly not.
> AQ is needed for LM, SR-IOV (SR-PCIM management), SIOV device life cycle.
> 
> > Or is
> > legacy emulation doable over a vq and latency is not a concern, and the real
> > reason is because it makes you push out a host driver with a bit less 
> > effort?
> > 
> Legacy register emulation is doable over a VQ and has its merits (I listed
> them in the previous email).
> I forgot to mention in the previous email that device reset is also better
> via tvq.
> It is just that legacy_registers_transport_vq (LRT_VQ) requires a more complex
> hypervisor driver and only works for the VFs.
> 
> At the spec level, MMR has value on the PF as well; hence I proposed last
> week, on your first email, that the spec should allow both.
> 
> Hypervisor effort is not really a big concern.
> Once we converge that LRT_VQ is good, it is a viable option too.
> I will shortly send out a little more verbose command description for lrt_vq
> so that it is optimal enough.
> 
> > I just do not see how these claims do not contradict each other.
> 
> An AQ for queuing, parallelism, memory saving.

Ok that all sounds good.

-- 
MST





[virtio-dev] Re: [virtio-comment] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 08:25:02PM +, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin 
> > Sent: Monday, April 3, 2023 2:02 PM
> 
> > > Because vqs involve DMA operations.
> > > It is left to the device implementation to do it, but a generic wisdom
> > > is to not implement such slow work in the data-path engines.
> > > So such register access vqs can/may be through firmware.
> > > Hence it can involve a lot higher latency.
> > 
> > Then that wisdom is wrong? tens of microseconds is not workable even for
> > ethtool operations, you are killing boot time.
> > 
> Huh.
> What ethtool latencies have you experienced? Number?

I know on the order of tens of ethtool calls happen during boot.
If, as you said, each takes tens of ms, then we are talking close to a second.
That is measurable.

> > I frankly don't know, if device vendors are going to interpret "DMA" as "can
> > take insane time" then maybe we need to scrap the whole admin vq idea and
> > make it all memory mapped like Jason wanted, so as not to lead them into
> > temptation?
> 
> DMA happens for all types of devices for control and data path.
> Can you point to any existing industry specification and real implementation
> that highlights such timing requirements?
> This will be useful to understand where these requirements come from.
> 
> Multiple device implementors do not see memory-mapped registers as the way
> forward.
> Discussed many times.
> There is no point in going down that dead end.

OK then. If it is a dead end, it looks weird to add a whole new
config space as memory mapped.

> > Let me try again.
> > 
> > Modern host binds to modern interface. It can use the PF normally.
> > Legacy guest IOBAR accesses to VF are translated to transport vq accesses.
> > 
> I understand this part.
> Transport VQ is on the PF, right? (Nothing but AQ, right?)
> 
> It can work in the VF case with trade-offs compared to memory-mapped registers.
> A lightweight hypervisor that wants to utilize this for a transitional PF too
> cannot benefit from it.
> So providing both options is useful.

If hardware vendors do not want to bear the costs of registers then they
will not implement devices with registers, and then the whole thing will
become yet another legacy thing we need to support. If legacy emulation
without IO is useful, then can we not find a way to do it that will
survive the test of time?

> Again, I want to emphasize that register read/write over tvq has merits with
> trade-offs.
> And so the MMR approach has merits with trade-offs too.
> 
> Better to list them and proceed forward.
> 
> Method-1: VF's register read/write via PF-based transport VQ
> Pros:
> a. Lightweight register implementation in the device; no new memory region
> window is needed

Is that all? I mentioned more.

> Cons:
> a. Higher DMA read/write latency
> b. The device requires synchronization between non-legacy memory-mapped
> registers and legacy register accesses via tvq

Same as a separate memory bar really.  Just don't do it. Either access
legacy or non-legacy.

> c. Can only work with the VF. Cannot work for a thin hypervisor, which can map
> a transitional PF to a bare-metal OS
> (also listed in the cover letter)

Is that a significant limitation? Why?

> Method-2: VF's register read/write via MMR (current proposal)
> Pros:
> a. Device utilizes the same legacy and non-legacy registers.

Not really. For starters, endianness can be different. Maybe for some
devices and systems.

> b. an order of magnitude lower latency due to avoidance of DMA on register 
> accesses
> (Important but not critical)

And no cons? Even if you could not see them yourself, did I fail to express
myself to such
an extent?

> > > No. Interrupt latency is in usec range.
> > > The major latency contributors in msec range can arise from the device 
> > > side.
> > 
> > So you are saying there are devices out there already with this MMR hack
> > baked in, and in hardware not firmware, so it works reasonably?
> It is better not to assert that a solution is a "hack",

Sorry if that sounded offensive.  A hack is not necessarily a bad thing.
It's a quick solution to a very local problem, though.

> when you are still
> trying to understand the trade-offs of multiple solutions and when you
> have yet to fully review all requirements
> (and when the solution is also based on offline feedback from you!).

Sorry if I set you on a wrong path; I frankly didn't realize how
big a change the result would be. I was thinking more along the lines
of a capability describing a portion of a memory bar, saying access
there is equivalent to access to the io bar. That's it.
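
Something along the lines of the existing virtio_pci_cap from the 1.x spec
could describe such a window. A minimal sketch, assuming a hypothetical new
cfg_type value (the constant below is made up and not assigned anywhere):

#include <stdint.h>

/* Hypothetical: "accesses to this window behave like the legacy IO BAR". */
#define VIRTIO_PCI_CAP_LEGACY_MMR_CFG 0x42 /* illustrative value only */

struct virtio_pci_cap {
        uint8_t  cap_vndr;   /* generic PCI field: PCI_CAP_ID_VNDR */
        uint8_t  cap_next;   /* generic PCI field: next capability offset */
        uint8_t  cap_len;    /* capability length */
        uint8_t  cfg_type;   /* identifies the structure, see above */
        uint8_t  bar;        /* which BAR holds the window */
        uint8_t  padding[3];
        uint32_t offset;     /* little-endian: offset of the window in the BAR */
        uint32_t length;     /* little-endian: length of the window */
};

The legacy register layout inside the window would stay exactly as in the
legacy spec; only the discovery mechanism would be new.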

> No. I didn't say that such a device is out there.
> However, a large part of the proposed changes is based on real devices
> (and not limited to virtio).

Yes, motivation is one of the things I'm trying to work out here.
It does, however, not help that it's an 11-patch-strong patchset
adding 500 lines of text for what is supposedly a small change.

> Regarding tvq, I have some ideas on how to improve the register read/writes
> so that they are optimal for devices to implement.

[virtio-dev] RE: [virtio-comment] [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Parav Pandit


> From: Michael S. Tsirkin 
> Sent: Monday, April 3, 2023 4:03 PM
> 
> On Mon, Apr 03, 2023 at 07:48:56PM +, Parav Pandit wrote:
> > > OK but supporting them with a passthrough driver such as vfio does
> > > not seem that important.
> > Not sure on what basis you assert that.
> > I clarified in the cover letter that these are the user-level requirements
> > to
> > support both transitional and non-transitional devices via a single vfio
> > subsystem.
> 
> And what is so wrong with vdpa?  Really I don't see how the virtio spec needs 
> to
> accommodate specific partitioning between Linux modules, be it vdpa or vfio.
> Way beyond the scope of the driver.
>
vdpa has its own value.

Here requirements are different as listed so let's focus on it.
 
> But anyway, my main
> point is about DMA. On the one hand you are asking for a VQ based
> management interface because it saves money. On the other you are saying
> DMA operations take extremely long to the point where they are unusable in
> the boot sequence.

I think you missed the point I described a few emails back.
The legacy registers are a subset of the 1.x registers, so a device that
implements the existing 1.x registers gets the legacy registers for free.
Hence, there is no _real_ saving.

> So what is it? Was admin vq a mistake and we should do memory mapped?
No. Certainly not.
AQ is needed for LM, SR-IOV (SR-PCIM management), SIOV device life cycle.

> Or is
> legacy emulation doable over a vq and latency is not a concern, and the real
> reason is because it makes you push out a host driver with a bit less effort?
> 
Legacy register emulation is doable over a VQ and has its merits (I listed
them in the previous email).
I forgot to mention in the previous email that device reset is also better via
tvq.
It is just that legacy_registers_transport_vq (LRT_VQ) requires a more complex
hypervisor driver and only works for the VFs.

At the spec level, MMR has value on the PF as well; hence I proposed last week,
on your first email, that the spec should allow both.

Hypervisor effort is not really a big concern.
Once we converge that LRT_VQ is good, it is a viable option too.
I will shortly send out a little more verbose command description for lrt_vq so
that it is optimal enough.
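
To give a rough idea of the direction (purely illustrative; the command names,
opcodes and layout below are assumptions, not the actual proposal):

#include <stdint.h>

/* Hypothetical LRT_VQ command to access one legacy register of a VF.
 * All multi-byte fields are little-endian. */
#define LRTVQ_OP_REG_READ  0
#define LRTVQ_OP_REG_WRITE 1

struct lrtvq_legacy_reg_cmd {
        uint16_t opcode;    /* LRTVQ_OP_REG_READ or LRTVQ_OP_REG_WRITE */
        uint16_t vf_number; /* which VF's legacy registers to access */
        uint16_t offset;    /* offset into the legacy register layout */
        uint16_t size;      /* access width: 1, 2 or 4 bytes */
        uint32_t data;      /* write payload; ignored for reads */
};

struct lrtvq_legacy_reg_resp {
        uint16_t status;    /* 0 on success */
        uint16_t reserved;
        uint32_t data;      /* read result; undefined for writes */
};

The hypervisor would place one command per trapped IOBAR access on the PF's
transport VQ; batching multiple register accesses per command is an obvious
optimization.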

> I just do not see how these claims do not contradict each other.

An AQ for queuing, parallelism, memory saving.





[virtio-dev] RE: [virtio-comment] [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Parav Pandit


> From: Stefan Hajnoczi 
> Sent: Monday, April 3, 2023 3:10 PM


> Maybe call it "Memory-mapped Transitional"? That name would be easier to
> understand.
>
Sounds fine to me.
 
> > > Modern devices were added to Linux in 2014 and support SR-IOV.
> >
> > > Why is it
> > > important to support Transitional (which really means Legacy
> > > devices, otherwise Modern devices would be sufficient)?
> > >
> > To support guest VMs which only understand legacy devices and
> > unfortunately they are still in much wider use by the users.
> 
> I wonder which guest software without Modern VIRTIO support will still be
> supported by the time Transitional MMR software and hardware becomes
> available. Are you aiming for particular guest software versions?

Transitional MMR hardware is available almost now.
Transitional MMR software is also WIP, to be available as soon as we ratify the
spec, via tvq or via mmr or both.
This will support guest software versions such as the 2.6.32.754 kernel.




[virtio-dev] RE: [virtio-comment] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Parav Pandit


> From: Michael S. Tsirkin 
> Sent: Monday, April 3, 2023 2:02 PM

> > Because vqs involve DMA operations.
> > It is left to the device implementation to do it, but a generic wisdom
> > is to not implement such slow work in the data-path engines.
> > So such register-access vqs can/may be handled through firmware.
> > Hence it can involve a lot higher latency.
> 
> Then that wisdom is wrong? tens of microseconds is not workable even for
> ethtool operations, you are killing boot time.
> 
Huh.
What ethtool latencies have you experienced? Number?

> I frankly don't know, if device vendors are going to interpret "DMA" as "can
> take insane time" then maybe we need to scrap the whole admin vq idea and
> make it all memory mapped like Jason wanted, so as not to lead them into
> temptation?

DMA happens for all types of devices for control and data path.
Can you point to any existing industry specification and real implementation
that highlights such timing requirements?
This will be useful to understand where these requirements come from.

Multiple device implementors do not see memory-mapped registers as the way forward.
Discussed many times.
There is no point in going down that dead end.

> Let me try again.
> 
> Modern host binds to modern interface. It can use the PF normally.
> Legacy guest IOBAR accesses to VF are translated to transport vq accesses.
> 
I understand this part.
The transport VQ is on the PF, right? (Nothing but an AQ, right?)

It can work in the VF case with trade-offs compared to memory-mapped registers.
A lightweight hypervisor that wants to utilize this for a transitional PF too
cannot benefit from it.
So providing both options is useful.

Again, I want to emphasize that register read/write over tvq has merits with
trade-offs.
And so the MMR approach has merits with trade-offs too.

Better to list them and proceed forward.

Method-1: VF's register read/write via PF-based transport VQ
Pros:
a. Lightweight register implementation in the device; no new memory region window is needed

Cons:
a. Higher DMA read/write latency
b. The device requires synchronization between non-legacy memory-mapped
registers and legacy register accesses via tvq
c. Can only work with the VF. Cannot work for a thin hypervisor, which can map
a transitional PF to a bare-metal OS
(also listed in the cover letter)

Method-2: VF's register read/write via MMR (current proposal)
Pros:
a. Device utilizes the same legacy and non-legacy registers.
b. an order of magnitude lower latency due to avoidance of DMA on register 
accesses
(Important but not critical)

> > No. Interrupt latency is in usec range.
> > The major latency contributors in msec range can arise from the device side.
> 
> So you are saying there are devices out there already with this MMR hack
> baked in, and in hardware not firmware, so it works reasonably?
It is better not to assert that a solution is a "hack" when you are still
trying to understand the trade-offs of multiple solutions and when you have
yet to fully review all requirements
(and when the solution is also based on offline feedback from you!).

No. I didn't say that such a device is out there.
However, a large part of the proposed changes is based on real devices (and
not limited to virtio).

Regarding tvq, I have some ideas on how to improve the register read/writes so
that they are optimal for devices to implement.




[virtio-dev] Re: [virtio-comment] [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 03:11:52PM -0400, Stefan Hajnoczi wrote:
> On Mon, Apr 03, 2023 at 01:48:46PM -0400, Michael S. Tsirkin wrote:
> > On Mon, Apr 03, 2023 at 10:53:29AM -0400, Parav Pandit wrote:
> > > On 4/3/2023 10:45 AM, Stefan Hajnoczi wrote:
> > > > On Fri, Mar 31, 2023 at 01:58:23AM +0300, Parav Pandit wrote:
> > > > > 3. A hypervisor system prefers to have single stack regardless of
> > > > > virtio device type (net/blk) and be future compatible with a
> > > > > single vfio stack using SR-IOV or other scalable device
> > > > > virtualization technology to map PCI devices to the guest VM.
> > > > > (as transitional or otherwise)
> > > > 
> > > > What does this paragraph mean?
> > > > 
> > > It means regardless of a VF being transitional MMR VF or 1.x VF without 
> > > any
> > > MMR extensions, there is a single vfio virtio driver handling both types of
> > > devices to map to the guest VM.
> > 
> > I don't think this can be vfio. You need a host layer translating
> > things such as device ID etc.
> 
> An mdev driver?

vdpa seems more likely. We can extend it to support legacy if we want
to.

-- 
MST





[virtio-dev] RE: [virtio-comment] [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Parav Pandit



> From: Michael S. Tsirkin 
> Sent: Monday, April 3, 2023 1:49 PM

> > Yes, but the hypervisor is not involved in any configuration parsing or
> > anything of that nature.
> > It is only a passthrough forwarder from the emulated IOBAR to the
> > memory-mapped legacy registers.
> > In other words, the hypervisor does not care about the register content at all.
> 
> This part I do not see as important. legacy is frozen in time. Implement it 
> once
> and you are done. Datapath differences are more important.
>
 
> > > In other words, the guest doesn't know about Transitional MMR and
> > > does not need any code changes.
> > >
> > > > 3. A hypervisor system prefers to have single stack regardless of
> > > > virtio device type (net/blk) and be future compatible with a
> > > > single vfio stack using SR-IOV or other scalable device
> > > > virtualization technology to map PCI devices to the guest VM.
> > > > (as transitional or otherwise)
> > >
> > > What does this paragraph mean?
> > >
> > It means regardless of a VF being transitional MMR VF or 1.x VF
> > without any MMR extensions, there is a single vfio virtio driver
> > handling both types of devices to map to the guest VM.
> 
> I don't think this can be vfio. You need a host layer translating things such 
> as
> device ID etc.
>
The vfio layer does it.

 
> > >
> > > Modern devices were added to Linux in 2014 and support SR-IOV.
> >
> > > Why is it
> > > important to support Transitional (which really means Legacy
> > > devices, otherwise Modern devices would be sufficient)?
> > >
> > To support guest VMs which only understand legacy devices and
> > unfortunately they are still in much wider use by the users.
> 
> OK but supporting them with a passthrough driver such as vfio does not seem
> that important.
Not sure on what basis you assert that.
I clarified in the cover letter that these are the user-level requirements to
support both transitional and non-transitional devices via a single vfio
subsystem.





[virtio-dev] Re: [virtio-comment] [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Stefan Hajnoczi
On Mon, Apr 03, 2023 at 01:48:46PM -0400, Michael S. Tsirkin wrote:
> On Mon, Apr 03, 2023 at 10:53:29AM -0400, Parav Pandit wrote:
> > On 4/3/2023 10:45 AM, Stefan Hajnoczi wrote:
> > > On Fri, Mar 31, 2023 at 01:58:23AM +0300, Parav Pandit wrote:
> > > > 3. A hypervisor system prefers to have single stack regardless of
> > > > virtio device type (net/blk) and be future compatible with a
> > > > single vfio stack using SR-IOV or other scalable device
> > > > virtualization technology to map PCI devices to the guest VM.
> > > > (as transitional or otherwise)
> > > 
> > > What does this paragraph mean?
> > > 
> > It means regardless of a VF being transitional MMR VF or 1.x VF without any
> > MMR extensions, there is a single vfio virtio driver handling both types of
> > devices to map to the guest VM.
> 
> I don't think this can be vfio. You need a host layer translating
> things such as device ID etc.

An mdev driver?




[virtio-dev] Re: [virtio-comment] [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Stefan Hajnoczi
On Mon, Apr 03, 2023 at 10:53:29AM -0400, Parav Pandit wrote:
> 
> 
> On 4/3/2023 10:45 AM, Stefan Hajnoczi wrote:
> > On Fri, Mar 31, 2023 at 01:58:23AM +0300, Parav Pandit wrote:
> > > Overview:
> > > -
> > > The Transitional MMR device is a variant of the transitional PCI device.
> > 
> > What does "MMR" mean?
> > 
> memory mapped registers.
> Explained below in the design section and also in relevant patches 6 to 11.

Maybe call it "Memory-mapped Transitional"? That name would be easier to
understand.

> > Modern devices were added to Linux in 2014 and support SR-IOV.
> 
> > Why is it
> > important to support Transitional (which really means Legacy devices,
> > otherwise Modern devices would be sufficient)?
> > 
> To support guest VMs which only understand legacy devices and unfortunately
> they are still in much wider use by the users.

I wonder which guest software without Modern VIRTIO support will still
be supported by the time Transitional MMR software and hardware becomes
available. Are you aiming for particular guest software versions?

Stefan




[virtio-dev] Re: [virtio-comment] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 01:29:32PM -0400, Parav Pandit wrote:
> 
> 
> On 4/3/2023 1:16 PM, Michael S. Tsirkin wrote:
> > On Mon, Apr 03, 2023 at 03:36:25PM +, Parav Pandit wrote:
> > > 
> > > 
> > > > From: virtio-comm...@lists.oasis-open.org On Behalf Of Michael S. Tsirkin
> > > 
> > > > > Transport vq for legacy MMR purpose seems fine with its latency and 
> > > > > DMA
> > > > overheads.
> > > > > Your question was about "scalability".
> > > > > After your latest response, I am unclear what "scalability" means.
> > > > > Do you mean saving the register space in the PCI device?
> > > > 
> > > > yes that's how you used scalability in the past.
> > > > 
> > > Ok. I am aligned.
> > > > > If yes, then no; for legacy guests, for scalability it is not
> > > > > required, because the
> > > > legacy register set is a subset of 1.x.
> > > > 
> > > > Weird.  what does guest being legacy have to do with a wish to save 
> > > > registers
> > > > on the host hardware?
> > > Because legacy has a subset of the registers of 1.x, no additional new
> > > registers are expected on the legacy side.
> > > 
> > > > You don't have so many legacy guests as modern
> > > > guests? Why?
> > > > 
> > > This isn't true.
> > > 
> > > There is a trade-off: up to a certain N, MMR-based register access is fine.
> > > This is because 1.x exposes a superset of the legacy registers.
> > > Beyond a certain point the device will have difficulty doing MMR for
> > > legacy and 1.x.
> > > At that point, legacy over tvq can scale better, but with much higher
> > > latency, an order of magnitude higher compared to MMR.
> > > If tvq were the only transport for these register accesses, it would hurt
> > > at lower scale too, because the tvq's primary nature is non-register access.
> > > And scale is relative from device to device.
> > 
> > Wow! Why an order of magnitude?
> > 
> Because vqs involve DMA operations.
> It is left to the device implementation to do it, but a generic wisdom is
> to not implement such slow work in the data-path engines.
> So such register-access vqs can/may be handled through firmware.
> Hence it can involve a lot higher latency.

Then that wisdom is wrong? tens of microseconds is not workable even for
ethtool operations, you are killing boot time.

I frankly don't know, if device vendors are going to interpret
"DMA" as "can take insane time" then maybe we need to scrap
the whole admin vq idea and make it all memory mapped like
Jason wanted, so as not to lead them into temptation?


> > > > > 
> > > > > > > > And presumably it can all be done in firmware ...
> > > > > > > > Is there actual hardware that can't implement transport vq but
> > > > > > > > is going to implement the mmr spec?
> > > > > > > > 
> > > > > > > Nvidia and Marvell DPUs implement MMR spec.
> > > > > > 
> > > > > > Hmm implement it in what sense exactly?
> > > > > > 
> > > > > Do not follow the question.
> > > > > The proposed series will be implemented as PCI SR-IOV devices using 
> > > > > MMR
> > > > spec.
> > > > > 
> > > > > > > Transport VQ has very high latency and DMA overheads for 2 to 4
> > > > > > > bytes
> > > > > > read/write.
> > > > > > 
> > > > > > How many of these 2 byte accesses trigger from a typical guest?
> > > > > > 
> > > > > Mostly during the VM boot time. 20 to 40 registers read write access.
> > > > 
> > > > That is not a lot! How long does a DMA operation take then?
> > > > 
> > > > > > > And before discussing "why not that approach", lets finish
> > > > > > > reviewing "this
> > > > > > approach" first.
> > > > > > 
> > > > > > That's a weird way to put it. We don't want so many ways to do
> > > > > > legacy if we can help it.
> > > > > Sure, so lets finish the review of current proposal details.
> > > > > At the moment
> > > > > a. I don't see any visible gain of transport VQ other than device 
> > > > > reset part I
> > > > explained.
> > > > 
> > > > For example, we do not need a new range of device IDs and existing 
> > > > drivers can
> > > > bind on the host.
> > > > 
> > > So, unlikely, due to the already discussed limitation of feature negotiation.
> > > An existing transitional driver would also look for an IOBAR, which is the
> > > second limitation.
> > 
> > Some confusion here.
> Yes.
> > If you have a transitional driver you do not need a legacy device.
> > 
> If I understood your thoughts split across two emails,
> 
> your point was "we don't need a new range of device IDs for a transitional TVQ
> device because TVQ is new and optional".
> 
> But this transitional TVQ device does not expose the IOBAR expected by an
> existing transitional driver, failing the driver load.
> 
> Your idea is not very clear.

Let me try again.

Modern host binds to modern interface. It can use the PF normally.
Legacy guest IOBAR accesses to VF are translated to transport vq
accesses.






> > 
> > 
> > > > > b. it can be a way with high latency, DMA overheads on the virtqueue 
> > > > > for
> > > > read/writes for small access.
> > > > 
> > > > numbers?
> > > It 

Re: [virtio-dev] [PATCH v2 1/2] virtio-blk: migrate to the latest patchset version

2023-04-03 Thread Michael S. Tsirkin
On Thu, Mar 30, 2023 at 05:49:52PM -0400, Dmitry Fomichev wrote:
> The merged patch series to support zoned block devices in virtio-blk
> is not the most up to date version. The merged patch can be found at
> 
> https://lore.kernel.org/linux-block/20221016034127.330942-3-dmitry.fomic...@wdc.com/
> 
> , but the latest and reviewed version is
> 
> https://lore.kernel.org/linux-block/20221110053952.3378990-3-dmitry.fomic...@wdc.com/

What happened here is that it was not sent to the correct people or lists.

v2 happened to draw my attention by chance; I missed the
interface change, did not see later versions, and merged v2.

To: Jens Axboe ,
linux-bl...@vger.kernel.org,
Damien Le Moal ,
Stefan Hajnoczi ,
Hannes Reinecke , Sam Li 
Cc: virtio-dev@lists.oasis-open.org,
Dmitry Fomichev 

while:

$ ./scripts/get_maintainer.pl -f drivers/block/virtio_blk.c
"Michael S. Tsirkin"  (maintainer:VIRTIO CORE AND NET DRIVERS)
Jason Wang  (maintainer:VIRTIO CORE AND NET DRIVERS)
Paolo Bonzini  (reviewer:VIRTIO BLOCK AND SCSI DRIVERS)
Stefan Hajnoczi  (reviewer:VIRTIO BLOCK AND SCSI DRIVERS)
Jens Axboe  (maintainer:BLOCK LAYER)
virtualizat...@lists.linux-foundation.org (open list:VIRTIO CORE AND NET 
DRIVERS)
linux-bl...@vger.kernel.org (open list:BLOCK LAYER)
linux-ker...@vger.kernel.org (open list)




> The differences between the two are mostly cleanups, but there is one
> change that is very important in terms of compatibility with the
> approved virtio-zbd specification.
> 
> Before it was approved, the OASIS virtio spec had a change in
> VIRTIO_BLK_T_ZONE_APPEND request layout that is not reflected in the
> current virtio-blk driver code. In the running code, the status is
> the first byte of the in-header; it is followed by some pad bytes
> and the u64 that carries back to the driver the sector at which the data has
> been written to the zone, aka the append sector.
> 
> This layout turned out to be problematic for implementing in QEMU and
> the request status byte has been eventually made the last byte of the
> in-header. The current code doesn't expect that, and this causes the
> append sector value to always come back as zero to the block layer. This needs
> to be fixed ASAP.
> 
> Fixes: 95bfec41bd3d ("virtio-blk: add support for zoned block devices")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Dmitry Fomichev 
> Reviewed-by: Stefan Hajnoczi 
> Reviewed-by: Damien Le Moal 
> ---
>  drivers/block/virtio_blk.c  | 238 +---
>  include/uapi/linux/virtio_blk.h |  18 +--
>  2 files changed, 166 insertions(+), 90 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 2723eede6f21..4f0dbbb3d4a5 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -96,16 +96,14 @@ struct virtblk_req {
>  
>   /*
>* The zone append command has an extended in header.
> -  * The status field in zone_append_in_hdr must have
> -  * the same offset in virtblk_req as the non-zoned
> -  * status field above.
> +  * The status field in zone_append_in_hdr must always
> +  * be the last byte.
>*/
>   struct {
> + __virtio64 sector;
>   u8 status;
> - u8 reserved[7];
> - __le64 append_sector;
> - } zone_append_in_hdr;
> - };
> + } zone_append;
> + } in_hdr;
>  
>   size_t in_hdr_len;
>  
> @@ -154,7 +152,7 @@ static int virtblk_add_req(struct virtqueue *vq, struct 
> virtblk_req *vbr)
>   sgs[num_out + num_in++] = vbr->sg_table.sgl;
>   }
>  
> - sg_init_one(&in_hdr, &vbr->status, vbr->in_hdr_len);
> + sg_init_one(&in_hdr, &vbr->in_hdr.status, vbr->in_hdr_len);
>   sgs[num_out + num_in++] = &in_hdr;
>  
>   return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC);
> @@ -242,11 +240,14 @@ static blk_status_t virtblk_setup_cmd(struct 
> virtio_device *vdev,
> struct request *req,
> struct virtblk_req *vbr)
>  {
> - size_t in_hdr_len = sizeof(vbr->status);
> + size_t in_hdr_len = sizeof(vbr->in_hdr.status);
>   bool unmap = false;
>   u32 type;
>   u64 sector = 0;
>  
> + if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED) && op_is_zone_mgmt(req_op(req)))
> + return BLK_STS_NOTSUPP;
> +
>   /* Set fields for all request types */
>   vbr->out_hdr.ioprio = cpu_to_virtio32(vdev, req_get_ioprio(req));
>  
> @@ -287,7 +288,7 @@ static blk_status_t virtblk_setup_cmd(struct 
> virtio_device *vdev,
>   case REQ_OP_ZONE_APPEND:
>   type = VIRTIO_BLK_T_ZONE_APPEND;
>   sector = blk_rq_pos(req);
> - in_hdr_len = sizeof(vbr->zone_append_in_hdr);
> + in_hdr_len = sizeof(vbr->in_hdr.zone_append);
>   break;
>   
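
For clarity, a side-by-side sketch of the two in-header layouts the commit
message above describes, with field names taken from the diff; this is an
illustration, not the driver code itself:

#include <stdint.h>

/* Merged (v2) layout: status is the FIRST byte, append sector last. */
struct zone_append_in_hdr_old {
        uint8_t  status;
        uint8_t  reserved[7];
        uint64_t append_sector; /* __le64 in the driver */
};

/* Ratified layout: sector first, status is always the LAST byte. */
struct zone_append_in_hdr_new {
        uint64_t sector;        /* __virtio64 in the driver */
        uint8_t  status;
};

With the old struct, a device following the ratified layout writes the sector
where the driver expects padding, so the driver reads back zero.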

[virtio-dev] Re: [virtio-comment] [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 10:53:29AM -0400, Parav Pandit wrote:
> 
> 
> On 4/3/2023 10:45 AM, Stefan Hajnoczi wrote:
> > On Fri, Mar 31, 2023 at 01:58:23AM +0300, Parav Pandit wrote:
> > > Overview:
> > > -
> > > The Transitional MMR device is a variant of the transitional PCI device.
> > 
> > What does "MMR" mean?
> > 
> memory mapped registers.
> Explained below in the design section and also in relevant patches 6 to 11.
> 
> > > It has its own small Device ID range. It does not have I/O
> > > region BAR; instead it exposes legacy configuration and device
> > > specific registers at an offset in the memory region BAR.
> > > 
> > > Such transitional MMR devices will be used at the scale of
> > > thousands of devices using PCI SR-IOV and/or future scalable
> > > virtualization technology to provide backward
> > > compatibility (for legacy devices) and also future
> > > compatibility with new features.
> > > 
> > > Usecase:
> > > 
> > > 1. A hypervisor/system needs to provide transitional
> > > virtio devices to the guest VM at scale of thousands,
> > > typically, one to eight devices per VM.
> > > 
> > > 2. A hypervisor/system needs to provide such devices using a
> > > vendor agnostic driver in the hypervisor system.
> > 
> > Is the idea that the hypervisor configures the new Transitional MMR
> > devices and makes them appear like virtio-pci Transitional devices?
> > 
> Yes, but the hypervisor is not involved in any configuration parsing or anything
> of that nature.
> It is only a passthrough forwarder from the emulated IOBAR to the memory-mapped
> legacy registers.
> In other words, the hypervisor does not care about the register content at all.

This part I do not see as important. Legacy is frozen in time. Implement
it once and you are done. Datapath differences are more important.
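
The forwarding itself is indeed small. A sketch of the IOBAR-to-MMR path, with
illustrative names, assuming the legacy register window in the memory BAR is
already mapped:

#include <stdint.h>

/* Trapped legacy IOBAR read: replay it against the MMR window. */
static uint32_t ioport_read_fwd(volatile uint8_t *legacy_mmr,
                                uint16_t off, unsigned int size)
{
        switch (size) {
        case 1: return *(volatile uint8_t  *)(legacy_mmr + off);
        case 2: return *(volatile uint16_t *)(legacy_mmr + off);
        default: return *(volatile uint32_t *)(legacy_mmr + off);
        }
}

/* Trapped legacy IOBAR write: replay it against the MMR window. */
static void ioport_write_fwd(volatile uint8_t *legacy_mmr, uint16_t off,
                             unsigned int size, uint32_t val)
{
        switch (size) {
        case 1: *(volatile uint8_t  *)(legacy_mmr + off) = (uint8_t)val;  break;
        case 2: *(volatile uint16_t *)(legacy_mmr + off) = (uint16_t)val; break;
        default: *(volatile uint32_t *)(legacy_mmr + off) = val;          break;
        }
}

No register content is interpreted, which is the claimed "passthrough
forwarder" property.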

> > In other words, the guest doesn't know about Transitional MMR and does
> > not need any code changes.
> > 
> > > 3. A hypervisor system prefers to have single stack regardless of
> > > virtio device type (net/blk) and be future compatible with a
> > > single vfio stack using SR-IOV or other scalable device
> > > virtualization technology to map PCI devices to the guest VM.
> > > (as transitional or otherwise)
> > 
> > What does this paragraph mean?
> > 
> It means that regardless of a VF being a transitional MMR VF or a 1.x VF without
> any MMR extensions, there is a single vfio virtio driver handling both types of
> devices to map to the guest VM.

I don't think this can be vfio. You need a host layer translating
things such as device ID etc.

> > 
> > Modern devices were added to Linux in 2014 and support SR-IOV.
> 
> > Why is it
> > important to support Transitional (which really means Legacy devices,
> > otherwise Modern devices would be sufficient)?
> > 
> To support guest VMs which only understand legacy devices and unfortunately
> they are still in much wider use by the users.

OK but supporting them with a passthrough driver such as vfio does
not seem that important.

-- 
MST





[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 01:35:01PM -0400, Parav Pandit wrote:
> > So something like a vq would be a step up. I would like to
> > understand the performance angle though. What you describe
> > is pretty bad.
> > 
> Do you mean latency is bad or the description?

I don't know. We need admin vq and transport vq to work.
You describe latency numbers that make both unworkable.
I am interested in fixing that somehow, since it's a blocker.

-- 
MST





[virtio-dev] Re: [PATCH v2 1/2] virtio-blk: migrate to the latest patchset version

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 08:15:26AM -0700, Christoph Hellwig wrote:
> What is "migrate to the latest patchset version" even supposed to mean?
> 
> I don't think that belongs into a kernel commit description.

I think this should be something like "update to latest interface version".

-- 
MST





[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Parav Pandit




On 4/3/2023 1:28 PM, Michael S. Tsirkin wrote:

On Mon, Apr 03, 2023 at 03:47:56PM +, Parav Pandit wrote:




From: Michael S. Tsirkin 
Sent: Monday, April 3, 2023 11:34 AM



Another is that we can actually work around legacy bugs in the hypervisor. For
example, atomicity and alignment bugs do not exist under DMA. Consider the MAC
field, writeable in legacy.  The problem is that this write is not atomic, so
there is a window
where the MAC is corrupted.  If you do MMIO then you just have to copy this bug.
If you do DMA then the hypervisor can buffer all of the MAC and send it to the
device in one go.

I am familiar with this bug.
User feedback that we have received so far involves kernels whose drivers use
the CVQ for setting the MAC address on a legacy device.
So, it may help, but it is not super important.

Also, if I recollect correctly, the MAC address is configured a bit early in
the if-scripts sequence, before bringing up the interface.
So, I haven't seen a real issue around it.


It's an example, there are other bugs in legacy interfaces.

The intent is to provide backward compatibility to the legacy interface,
not really to fix the legacy interface itself, as that may break legacy
behavior.



Take the inability to decline feature negotiation as an example.
A legacy driver would do this anyway. It would expect a certain flow to work
that worked for it over a previous software hypervisor.

A hypervisor attempting to fail what was working before will not help.


With transport vq we can fail at transport level and
hypervisor can decide what to do, such as stopping guest or
unplugging device, etc.




So something like a vq would be a step up. I would like to
understand the performance angle though. What you describe
is pretty bad.


Do you mean latency is bad or the description?




[virtio-dev] Re: [PATCH v11 00/10] Introduce device group and device management

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 06:39:12PM +0200, Cornelia Huck wrote:
> On Mon, Apr 03 2023, "Michael S. Tsirkin"  wrote:
> 
> > Change log:
> >
> > since v10:
> > addressed lots of comments by Jiri, Stefan, Cornelia, Lingshan, Parav,
> > Max
> > Cornelia, do you want to pick up 10/10 separately?
> 
> I'm not sure that makes sense -- if we refactor to introduce a patch
> that deals with all of the currently unimplemented features (and respin
> this series with the patch that lists the admin vq as unsupported on
> top) we could do that, but I don't think it's worth the effort.

My concern is that people are already kind of familiar with the
patchset, reshuffling for no end result will cause everyone to re-read.

-- 
MST





[virtio-dev] Re: [virtio-comment] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Parav Pandit




On 4/3/2023 1:16 PM, Michael S. Tsirkin wrote:

On Mon, Apr 03, 2023 at 03:36:25PM +, Parav Pandit wrote:




From: virtio-comm...@lists.oasis-open.org  On Behalf Of Michael S. Tsirkin



Transport vq for legacy MMR purpose seems fine with its latency and DMA

overheads.

Your question was about "scalability".
After your latest response, I am unclear what "scalability" means.
Do you mean saving the register space in the PCI device?


yes that's how you used scalability in the past.


Ok. I am aligned.
  

If yes, then no; for legacy guests, for scalability it is not required, because
the legacy register set is a subset of 1.x.

Weird.  what does guest being legacy have to do with a wish to save registers
on the host hardware?

Because legacy has a subset of the registers of 1.x, no additional new
registers are expected on the legacy side.


You don't have so many legacy guests as modern
guests? Why?


This isn't true.

There is a trade-off: up to a certain N, MMR-based register access is fine.
This is because 1.x exposes a superset of the legacy registers.
Beyond a certain point the device will have difficulty doing MMR for legacy and
1.x.
At that point, legacy over tvq can scale better, but with much higher latency,
an order of magnitude higher compared to MMR.
If tvq were the only transport for these register accesses, it would hurt at
lower scale too, because the tvq's primary nature is non-register access.
And scale is relative from device to device.


Wow! Why an order of magnitude?


Because vqs involve DMA operations.
It is left to the device implementation to do it, but a generic wisdom
is to not implement such slow work in the data-path engines.

So such register-access vqs can/may be handled through firmware.
Hence it can involve a lot higher latency.




And presumably it can all be done in firmware ...
Is there actual hardware that can't implement transport vq but
is going to implement the mmr spec?


Nvidia and Marvell DPUs implement MMR spec.


Hmm implement it in what sense exactly?


Do not follow the question.
The proposed series will be implemented as PCI SR-IOV devices using MMR

spec.



Transport VQ has very high latency and DMA overheads for 2 to 4
bytes

read/write.

How many of these 2 byte accesses trigger from a typical guest?


Mostly during the VM boot time: 20 to 40 register read/write accesses.


That is not a lot! How long does a DMA operation take then?


And before discussing "why not that approach", lets finish
reviewing "this

approach" first.

That's a weird way to put it. We don't want so many ways to do
legacy if we can help it.

Sure, so let's finish the review of the current proposal details.
At the moment
a. I don't see any visible gain of transport VQ other than device reset part I

explained.

For example, we do not need a new range of device IDs and existing drivers can
bind on the host.


So, unlikely, due to the already discussed limitation of feature negotiation.
An existing transitional driver would also look for an IOBAR, which is the
second limitation.


Some confusion here.

Yes.

If you have a transitional driver you do not need a legacy device.


If I understood your thoughts split across two emails,

your point was "we don't need a new range of device IDs for a transitional
TVQ device because TVQ is new and optional".


But this transitional TVQ device does not expose the IOBAR expected by an
existing transitional driver, failing the driver load.


Your idea is not very clear.





b. it can be a way with high latency, DMA overheads on the virtqueue for

read/writes for small access.

numbers?

It depends on the implementation, but at minimum, writes and reads can pay an
order of magnitude more, in the 10 msec range.


A single VQ roundtrip takes a minimum of 10 milliseconds? This is indeed
completely unworkable for transport vq. Points:
- even for memory mapped, does an access take 1 millisecond?
   Extremely slow. Why?
- Why is DMA 10x more expensive? I expect it to be 2x more expensive:
   Normal read goes cpu -> device -> cpu, DMA does cpu -> device -> memory -> 
device -> cpu
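
To make the 2x expectation concrete, a toy latency model (placeholder numbers,
not measurements):

#include <stdio.h>

int main(void)
{
        double pcie_hop_us = 0.5;         /* assumed one-way PCIe traversal */
        double mmio_us = 2 * pcie_hop_us; /* cpu -> device -> cpu */
        double dma_us  = 4 * pcie_hop_us; /* cpu -> device -> memory -> device -> cpu */

        printf("mmio ~%.1f us, dma ~%.1f us, ratio %.1fx\n",
               mmio_us, dma_us, dma_us / mmio_us);
        return 0;
}

Anything far beyond that ratio has to come from the device-side implementation
(e.g. firmware handling), not from the DMA itself.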

Reason I am asking is because it is important for transport vq to have
a workable design.


But let me guess. Is there a chance that you are talking about an
interrupt driven design? *That* is going to be slow though I don't think
10msec, more like 10usec. But I expect transport vq to typically
work by (adaptive?) polling mostly avoiding interrupts.


No. Interrupt latency is in usec range.
The major latency contributors in msec range can arise from the device side.




[virtio-dev] Re: [PATCH v11 07/10] ccw: document ADMIN_VQ as reserved

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 11:55:29AM -0400, Parav Pandit wrote:
> 
> 
> On 4/3/2023 11:03 AM, Michael S. Tsirkin wrote:
> > Adding relevant registers needs more work and it's not
> > clear what the use-case will be as currently only
> > the PCI transport is supported. But let's keep the
> > door open on this.
> > We already say it's reserved in a central place, but it
> > does not hurt to remind implementers to mask it.
> > 
> s/implementers/implementation
> 
> > Note: there are more features to add to this list.
> > Will be done later with a patch on top.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> > Reviewed-by: Stefan Hajnoczi 
> > ---
> >   content.tex | 12 
> >   1 file changed, 12 insertions(+)
> > 
> > diff --git a/content.tex b/content.tex
> > index f7446bf..1213d48 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -2978,6 +2978,18 @@ \subsubsection{Resetting Devices}\label{sec:Virtio 
> > Transport Options / Virtio ov
> >   MAY also choose to verify reset completion by reading \field{device 
> > status} via
> >   CCW_CMD_READ_STATUS and checking whether it is 0 afterwards.
> > +\subsection{Features reserved for future use}\label{sec:Virtio Transport 
> > Options / Virtio over channel I/O / Features reserved for future use}
> > +
> > +At this time, devices and drivers utilizing Virtio over channel I/O
> > +do not support the following features:
> > +\begin{itemize}
> > +
> > +\item VIRTIO_F_ADMIN_VQ
> > +
> > +\end{itemize}
> > +
> > +These features are reserved for future use.
> > +
> >   \chapter{Device Types}\label{sec:Device Types}
> >   On top of the queues, config space and feature negotiation facilities
> 
> Same nit as that of mmio.

Sorry I don't get what this refers to.

> Reviewed-by: Parav Pandit 





[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 03:47:56PM +, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin 
> > Sent: Monday, April 3, 2023 11:34 AM
> 
> > Another is that we can actually work around legacy bugs in the hypervisor. 
> > For
> > example, atomicity and alignment bugs do not exist under DMA. Consider MAC
> > field, writeable in legacy.  The problem is that this write is not atomic,
> > so there is a window where the MAC is corrupted.  If you do MMIO then you
> > just have to copy this bug.  If you do DMA then the hypervisor can buffer
> > all of the MAC and send it to the device in one
> > go.
> I am familiar with this bug.
> User feedback that we have received so far involves kernels whose drivers use
> the CVQ for setting the MAC address on a legacy device.
> So, it may help, but it is not super important.
> 
> Also, if I recollect correctly, the MAC address is configured a bit early in
> the if-scripts sequence, before bringing up the interface.
> So, I haven't seen a real issue around it.

It's an example, there are other bugs in legacy interfaces.

Take inability to decline feature negotiation as an example.
With transport vq we can fail at transport level and
hypervisor can decide what to do, such as stopping guest or
unplugging device, etc.

So something like a vq would be a step up. I would like to
understand the performance angle though. What you describe
is pretty bad.

-- 
MST





[virtio-dev] Re: [PATCH v11 00/10] Introduce device group and device management

2023-04-03 Thread Parav Pandit




On 4/3/2023 11:02 AM, Michael S. Tsirkin wrote:


Change log:

since v10:
addressed lots of comments by Jiri, Stefan, Cornelia, Lingshan, Parav,
Max
Cornelia, do you want to pick up 10/10 separately?

Structuring the prep patches 7/10 and 10/10 as the first patches, followed by
6/10, is clearer, as it builds the framework first for existing items.


But not a must item for me.

I am ok with the current patch sequence, as the change in 10/10 is really small
and the issue number is already listed in its commit log.





[virtio-dev] Re: [PATCH v11 07/10] ccw: document ADMIN_VQ as reserved

2023-04-03 Thread Parav Pandit




On 4/3/2023 11:03 AM, Michael S. Tsirkin wrote:

Adding relevant registers needs more work and it's not
clear what the use-case will be as currently only
the PCI transport is supported. But let's keep the
door open on this.
We already say it's reserved in a central place, but it
does not hurt to remind implementers to mask it.


s/implementers/implementation


Note: there are more features to add to this list.
Will be done later with a patch on top.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Stefan Hajnoczi 
---
  content.tex | 12 
  1 file changed, 12 insertions(+)

diff --git a/content.tex b/content.tex
index f7446bf..1213d48 100644
--- a/content.tex
+++ b/content.tex
@@ -2978,6 +2978,18 @@ \subsubsection{Resetting Devices}\label{sec:Virtio 
Transport Options / Virtio ov
  MAY also choose to verify reset completion by reading \field{device status} 
via
  CCW_CMD_READ_STATUS and checking whether it is 0 afterwards.
  
+\subsection{Features reserved for future use}\label{sec:Virtio Transport Options / Virtio over channel I/O / Features reserved for future use}

+
+At this time, devices and drivers utilizing Virtio over channel I/O
+do not support the following features:
+\begin{itemize}
+
+\item VIRTIO_F_ADMIN_VQ
+
+\end{itemize}
+
+These features are reserved for future use.
+
  \chapter{Device Types}\label{sec:Device Types}
  
  On top of the queues, config space and feature negotiation facilities


Same nit as that of mmio.

Reviewed-by: Parav Pandit 




[virtio-dev] Re: [PATCH v11 06/10] mmio: document ADMIN_VQ as reserved

2023-04-03 Thread Parav Pandit




On 4/3/2023 11:03 AM, Michael S. Tsirkin wrote:

Adding relevant registers needs more work and it's not
clear what the use-case will be as currently only
the PCI transport is supported. But let's keep the
door open on this.
We already say it's reserved in a central place, but it
does not hurt to remind implementers to mask it.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Parav Pandit 
Reviewed-by: Stefan Hajnoczi 
---
  content.tex | 12 
  1 file changed, 12 insertions(+)

diff --git a/content.tex b/content.tex
index 5057df2..f7446bf 100644
--- a/content.tex
+++ b/content.tex
@@ -2364,6 +2364,18 @@ \subsection{Legacy interface}\label{sec:Virtio Transport 
Options / Virtio Over M
  
  Notification mechanisms did not change.
  
+\subsection{Features reserved for future use}\label{sec:Virtio Transport Options / Virtio Over MMIO / Features reserved for future use}

+
+At this time, devices and drivers utilizing Virtio Over MMIO

No need to mention "At this time".
It always applies spec version to spec version,
and the "current time" of the spec is visible in its version.
Please remove "At this time".


+do not support the following features:
+\begin{itemize}
+
+\item VIRTIO_F_ADMIN_VQ
+
+\end{itemize}
+
+These features are reserved for future use.
+

The above line should be moved up before listing the features,
or should be written as
s/These/Above

like: "Above listed features are reserved for future use."


  \section{Virtio Over Channel I/O}\label{sec:Virtio Transport Options / Virtio 
Over Channel I/O}
  
  S/390 based virtual machines support neither PCI nor MMIO, so a


Reviewed-by: Parav Pandit 




[virtio-dev] RE: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Parav Pandit



> From: Michael S. Tsirkin 
> Sent: Monday, April 3, 2023 11:34 AM

> Another is that we can actually work around legacy bugs in the hypervisor. For
> example, atomicity and alignment bugs do not exist under DMA. Consider the MAC
> field, writeable in legacy.  The problem is that this write is not atomic, so
> there is a window where the MAC is corrupted.  If you do MMIO then you just
> have to copy this bug.  If you do DMA then the hypervisor can buffer all of
> the MAC and send it to the device in one go.
I am familiar with this bug.
User feedback that we have received so far involves kernels whose drivers use
the CVQ for setting the MAC address on a legacy device.
So, it may help, but it is not super important.

Also, if I recollect correctly, the MAC address is configured a bit early in
the if-scripts sequence, before bringing up the interface.
So, I haven't seen a real issue around it.




[virtio-dev] RE: [virtio-comment] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Parav Pandit



> From: virtio-comm...@lists.oasis-open.org On Behalf Of Michael S. Tsirkin

> > Transport vq for legacy MMR purpose seems fine with its latency and DMA
> overheads.
> > Your question was about "scalability".
> > After your latest response, I am unclear what "scalability" means.
> > Do you mean saving the register space in the PCI device?
> 
> yes that's how you used scalability in the past.
>
Ok. I am aligned.
 
> > If yes, then no; for legacy guests, for scalability it is not required,
> > because the
> legacy register set is a subset of 1.x.
> 
> Weird.  what does guest being legacy have to do with a wish to save registers
> on the host hardware? 
Because legacy has a subset of the registers of 1.x, no additional new
registers are expected on the legacy side.

> You don't have so many legacy guests as modern
> guests? Why?
> 
This isn't true.

There is a trade-off: up to a certain N, MMR-based register access is fine.
This is because 1.x exposes a superset of the legacy registers.
Beyond a certain point the device will have difficulty doing MMR for legacy and
1.x.
At that point, legacy over tvq can scale better, but with much higher latency,
an order of magnitude higher compared to MMR.
If tvq were the only transport for these register accesses, it would hurt at
lower scale too, because the tvq's primary nature is non-register access.
And scale is relative from device to device.

> >
> > > > > And presumably it can all be done in firmware ...
> > > > > Is there actual hardware that can't implement transport vq but
> > > > > is going to implement the mmr spec?
> > > > >
> > > > Nvidia and Marvell DPUs implement MMR spec.
> > >
> > > Hmm implement it in what sense exactly?
> > >
> > Do not follow the question.
> > The proposed series will be implemented as PCI SR-IOV devices using MMR
> spec.
> >
> > > > Transport VQ has very high latency and DMA overheads for 2 to 4
> > > > bytes
> > > read/write.
> > >
> > > How many of these 2 byte accesses trigger from a typical guest?
> > >
> > Mostly during the VM boot time. 20 to 40 registers read write access.
> 
> That is not a lot! How long does a DMA operation take then?
> 
> > > > And before discussing "why not that approach", lets finish
> > > > reviewing "this
> > > approach" first.
> > >
> > > That's a weird way to put it. We don't want so many ways to do
> > > legacy if we can help it.
> > Sure, so lets finish the review of current proposal details.
> > At the moment
> > a. I don't see any visible gain of transport VQ other than device reset 
> > part I
> explained.
> 
> For example, we do not need a new range of device IDs and existing drivers can
> bind on the host.
>
So, unlikely, due to the already discussed limitation of feature negotiation.
An existing transitional driver would also look for an IOBAR, which is the
second limitation.

> > b. it can be a way with high latency, DMA overheads on the virtqueue for
> read/writes for small access.
> 
> numbers?
It depends on the implementation, but at minimum, writes and reads can pay an
order of magnitude more, in the 10 msec range.




[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 11:23:11AM -0400, Michael S. Tsirkin wrote:
> On Mon, Apr 03, 2023 at 03:16:53PM +, Parav Pandit wrote:
> > 
> > 
> > > From: Michael S. Tsirkin 
> > > Sent: Monday, April 3, 2023 11:07 AM
> > 
> > > > > OTOH it is presumably required for scalability anyway, no?
> > > > No.
> > > > Most new generation SIOV and SR-IOV devices operate without any para-
> > > virtualization.
> > > 
> > > Don't see the connection to PV. You need an emulation layer in the host 
> > > if you
> > > want to run legacy guests. Looks like it could do transport vq just as 
> > > well.
> > >
> > Transport vq for legacy MMR purpose seems fine with its latency and DMA 
> > overheads.
> > Your question was about "scalability".
> > After your latest response, I am unclear what "scalability" means.
> > Do you mean saving the register space in the PCI device?
> 
> yes that's how you used scalability in the past.
> 
> > If yes, then, no, for legacy guests it is not required for scalability, 
> > because the legacy register is a subset of 1.x.
> 
> Weird.  What does the guest being legacy have to do with a wish to save
> registers on the host hardware? You don't have so many legacy guests as
> modern guests? Why?
> 
> 
> 
> >  
> > > > > And presumably it can all be done in firmware ...
> > > > > Is there actual hardware that can't implement transport vq but is
> > > > > going to implement the mmr spec?
> > > > >
> > > > Nvidia and Marvell DPUs implement MMR spec.
> > > 
> > > Hmm implement it in what sense exactly?
> > >
> > I do not follow the question.
> > The proposed series will be implemented as PCI SR-IOV devices using MMR 
> > spec.
> >  
> > > > Transport VQ has very high latency and DMA overheads for 2 to 4 bytes
> > > read/write.
> > > 
> > > How many of these 2 byte accesses trigger from a typical guest?
> > > 
> > Mostly during VM boot time: 20 to 40 register read/write accesses.
> 
> That is not a lot! How long does a DMA operation take then?
> 
> > > > And before discussing "why not that approach", let's finish reviewing 
> > > > "this
> > > approach" first.
> > > 
> > > That's a weird way to put it. We don't want so many ways to do legacy if 
> > > we can
> > > help it.
> > Sure, so let's finish the review of the current proposal details.
> > At the moment:
> > a. I don't see any visible gain from transport VQ other than the device
> > reset part I explained.
> 
> For example, we do not need a new range of device IDs and existing
> drivers can bind on the host.

Another is that we can actually work around legacy bugs in the
hypervisor. For example, atomicity and alignment bugs do not exist under
DMA. Consider the MAC field, writeable in legacy.  The problem is that
this write is not atomic, so there is a window where the MAC is corrupted.
If you do MMIO then you just have to copy this bug. If you do DMA then the
hypervisor can buffer all of the MAC and send it to the device in one go.
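
A minimal sketch of that buffering idea, with hypothetical hypervisor-side
names (legacy_cfg_write(), device_set_mac()); this illustrates the
technique, it is not an implementation from any codebase:

    #define MAC_OFF 0   /* offset of the MAC field in the legacy config */
    #define MAC_LEN 6

    struct legacy_emul {
        unsigned char mac_shadow[MAC_LEN];
        unsigned char mac_dirty;       /* bitmap of MAC bytes written */
    };

    /* stub: in a real backend this would be one DMA/admin command */
    static void device_set_mac(const unsigned char mac[MAC_LEN]) { (void)mac; }

    static void legacy_cfg_write(struct legacy_emul *e,
                                 unsigned off, unsigned char val)
    {
        if (off < MAC_OFF + MAC_LEN) {
            unsigned i = off - MAC_OFF;

            /* Do not forward each narrow write: shadow it instead. */
            e->mac_shadow[i] = val;
            e->mac_dirty |= 1u << i;
            /* Only when all six bytes have been written, push the whole
             * MAC to the device in a single operation - no window where
             * the device sees a half-updated MAC. */
            if (e->mac_dirty == (1u << MAC_LEN) - 1) {
                device_set_mac(e->mac_shadow);
                e->mac_dirty = 0;
            }
            return;
        }
        /* other legacy registers would be forwarded as-is */
    }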

> > b. it can be a path with high latency and DMA overheads on the virtqueue
> > for small reads/writes.
> 
> numbers?
> 
> -- 
> MST





[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 03:16:53PM +, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin 
> > Sent: Monday, April 3, 2023 11:07 AM
> 
> > > > OTOH it is presumably required for scalability anyway, no?
> > > No.
> > > Most new generation SIOV and SR-IOV devices operate without any para-
> > virtualization.
> > 
> > Don't see the connection to PV. You need an emulation layer in the host if 
> > you
> > want to run legacy guests. Looks like it could do transport vq just as well.
> >
> Transport vq for legacy MMR purpose seems fine with its latency and DMA 
> overheads.
> Your question was about "scalability".
> After your latest response, I am unclear what "scalability" means.
> Do you mean saving the register space in the PCI device?

yes that's how you used scalability in the past.

> If yes, then, no, for legacy guests it is not required for scalability, 
> because the legacy register is a subset of 1.x.

Weird.  What does the guest being legacy have to do with a wish to save
registers on the host hardware? You don't have so many legacy guests as
modern guests? Why?



>  
> > > > And presumably it can all be done in firmware ...
> > > > Is there actual hardware that can't implement transport vq but is
> > > > going to implement the mmr spec?
> > > >
> > > Nvidia and Marvell DPUs implement MMR spec.
> > 
> > Hmm implement it in what sense exactly?
> >
> I do not follow the question.
> The proposed series will be implemented as PCI SR-IOV devices using MMR spec.
>  
> > > Transport VQ has very high latency and DMA overheads for 2 to 4 bytes
> > read/write.
> > 
> > How many of these 2 byte accesses trigger from a typical guest?
> > 
> Mostly during VM boot time: 20 to 40 register read/write accesses.

That is not a lot! How long does a DMA operation take then?

> > > And before discussing "why not that approach", let's finish reviewing "this
> > approach" first.
> > 
> > That's a weird way to put it. We don't want so many ways to do legacy if we 
> > can
> > help it.
> Sure, so let's finish the review of the current proposal details.
> At the moment:
> a. I don't see any visible gain from transport VQ other than the device
> reset part I explained.

For example, we do not need a new range of device IDs and existing
drivers can bind on the host.

> b. it can be a path with high latency and DMA overheads on the virtqueue
> for small reads/writes.

numbers?

-- 
MST





[virtio-dev] RE: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Parav Pandit



> From: Michael S. Tsirkin 
> Sent: Monday, April 3, 2023 11:07 AM

> > > OTOH it is presumably required for scalability anyway, no?
> > No.
> > Most new generation SIOV and SR-IOV devices operate without any para-
> virtualization.
> 
> Don't see the connection to PV. You need an emulation layer in the host if you
> want to run legacy guests. Looks like it could do transport vq just as well.
>
Transport vq for legacy MMR purpose seems fine with its latency and DMA 
overheads.
Your question was about "scalability".
After your latest response, I am unclear what "scalability" means.
Do you mean saving the register space in the PCI device?
If yes, then, no, for legacy guests it is not required for scalability,
because the legacy register is a subset of 1.x.

 
> > > And presumably it can all be done in firmware ...
> > > Is there actual hardware that can't implement transport vq but is
> > > going to implement the mmr spec?
> > >
> > Nvidia and Marvell DPUs implement MMR spec.
> 
> Hmm implement it in what sense exactly?
>
I do not follow the question.
The proposed series will be implemented as PCI SR-IOV devices using MMR spec.
 
> > Transport VQ has very high latency and DMA overheads for 2 to 4 bytes
> read/write.
> 
> How many of these 2 byte accesses trigger from a typical guest?
> 
Mostly during VM boot time: 20 to 40 register read/write accesses.

> > And before discussing "why not that approach", let's finish reviewing "this
> approach" first.
> 
> That's a weird way to put it. We don't want so many ways to do legacy if we 
> can
> help it.
Sure, so let's finish the review of the current proposal details.
At the moment:
a. I don't see any visible gain from transport VQ other than the device reset
part I explained.
b. it can be a path with high latency and DMA overheads on the virtqueue for
small reads/writes.





[virtio-dev] Re: [PATCH 02/11] transport-pci: Move transitional device id to legacy section

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 02:58:52PM +, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin 
> > Sent: Monday, April 3, 2023 10:50 AM
> 
> > 
> > No idea what all this means, sorry.  Please do not move text that affects 
> > modern
> > drivers to a legacy section. And we've spilled way too much ink on this 
> > already.
> 
> I disagree, because the spec does not describe a "modern driver", and what
> you are describing is not aligned with the way the current spec is written.
> I prefer to avoid mentioning it again in the same feature bits section that
> talks about the transitional interface.

Sorry, I don't understand what you are trying to say here.  This is all
cosmetics, a matter of personal preference.  But I did my best to try to
explain the reason this is not a cleanup but a breaking change. Was I
misunderstood, or do you just not agree? No idea.
The reason for current placement is this:

A conformant implementation MUST be either transitional or
non-transitional, see \ref{intro:Legacy
Interface: Terminology}.

An implementation MAY choose to implement OPTIONAL support for the
legacy interface, including support for legacy drivers
or devices, by conforming to all of the MUST or
REQUIRED level requirements for the legacy interface
for the transitional devices and drivers.

The requirements for the legacy interface for transitional 
implementations
are located in sections named ``Legacy Interface'' listed below:


Binding to a transitional ID is mandatory for modern drivers.
*This* is why this ID cannot go to a legacy section - all legacy sections
are, and must stay, optional.

What is true (and unfortunate) is that legacy sections are not as formal
as modern ones - originally we wanted them to be informational only. For
example, there is no clear separation between driver and device
conformance sections.  Work on this if you like, that is welcome.
But please stop moving mandatory text to legacy sections.

-- 
MST





[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 02:57:26PM +, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin 
> > Sent: Monday, April 3, 2023 10:53 AM
> > 
> > On Fri, Mar 31, 2023 at 05:43:11PM -0400, Parav Pandit wrote:
> > > > I can not say I thought about this
> > > > deeply so maybe there's some problem, or maybe it's a worse approach
> > > > - could you comment on this? It looks like this could be a smaller
> > > > change, but maybe it isn't? Did you consider this option?
> > >
> > > We can possibly let both the options open for device vendors to implement.
> > >
> > > Change wise transport VQ is fairly big addition for both hypervisor
> > > driver and also for the device.
> > 
> > OTOH it is presumably required for scalability anyway, no?
> No.
> Most new generation SIOV and SR-IOV devices operate without any 
> para-virtualization.

Don't see the connection to PV. You need an emulation layer in the host
if you want to run legacy guests. Looks like it could do transport vq
just as well.

> > And presumably it can all be done in firmware ...
> > Is there actual hardware that can't implement transport vq but is going to
> > implement the mmr spec?
> > 
> Nvidia and Marvell DPUs implement MMR spec.

Hmm implement it in what sense exactly?

> Transport VQ has very high latency and DMA overheads for 2 to 4 bytes 
> read/write.

How many of these 2 byte accesses trigger from a typical guest?

> And before discussing "why not that approach", let's finish reviewing "this 
> approach" first.

That's a weird way to put it. We don't want so many ways to do legacy
if we can help it.

-- 
MST





[virtio-dev] [PATCH v11 08/10] admin: command list discovery

2023-04-03 Thread Michael S. Tsirkin
Add commands to find out which commands each group supports,
as well as to enable their use by the driver.
This will be especially useful once we have multiple group types.

An alternative is per-type VQs. This is possible but will
require more per-transport work. Discovery through the vq
helps keep things contained.

e.g. lack of support for some command can trigger a fallback to a legacy mode

Note that commands are expected to be evolved by adding new
fields to the command-specific data at the tail, so
we generally do not need feature bits for compatibility.

Signed-off-by: Michael S. Tsirkin 

---
dropped S.O.B by Max
explain in commit log how commands will evolve, comment by Stefan
replace will use with is capable of use, comment by Stefan
typo fix reported by David and Lingshan
rename cmds to cmd_opcodes suggested by Jiri
list group_type explicitly in command description suggested by Jiri
clarify how can device not support some commands
always say group administration commands, comment by Jiri
---
 admin.tex | 102 +-
 1 file changed, 101 insertions(+), 1 deletion(-)

diff --git a/admin.tex b/admin.tex
index f201bcd..9552797 100644
--- a/admin.tex
+++ b/admin.tex
@@ -113,7 +113,9 @@ \subsection{Group administration commands}\label{sec:Basic 
Facilities of a Virti
 \hline
 opcode & Name & Command Description \\
 \hline \hline
-0x0000 - 0x7FFF & - & Commands using \field{struct virtio_admin_cmd}\\
+0x0000 & VIRTIO_ADMIN_CMD_LIST_QUERY & Provides to driver list of commands 
supported for this group type\\
+0x0001 & VIRTIO_ADMIN_CMD_LIST_USE & Provides to device list of commands used 
for this group type \\
+0x0002 - 0x7FFF & - & Commands using \field{struct virtio_admin_cmd}\\
 \hline
 0x8000 - 0xFFFF & - & Reserved for future commands (possibly using a different 
structure)\\
 \hline
@@ -183,6 +185,104 @@ \subsection{Group administration 
commands}\label{sec:Basic Facilities of a Virti
 depends on these structures and is described separately or is
 implicit in the structure description.
 
+Before sending any group administration commands to the device, the driver
+needs to communicate to the device which commands it is going to
+use. Initially (after reset), only two commands are assumed to be used:
+VIRTIO_ADMIN_CMD_LIST_QUERY and VIRTIO_ADMIN_CMD_LIST_USE.
+
+Before sending any other commands for any member of a specific group to
+the device, the driver queries the supported commands via
+VIRTIO_ADMIN_CMD_LIST_QUERY and sends the commands it is
+capable of using via VIRTIO_ADMIN_CMD_LIST_USE.
+
+Commands VIRTIO_ADMIN_CMD_LIST_QUERY and
+VIRTIO_ADMIN_CMD_LIST_USE
+both use the following structure describing the
+command opcodes:
+
+\begin{lstlisting}
+struct virtio_admin_cmd_list {
+   /* Bitmap of supported command opcodes, 64 bits per array entry */
+   le64 device_admin_cmd_opcodes[];
+};
+\end{lstlisting}
+
+This structure is an array of 64 bit values in little-endian byte
+order, in which a bit is set if the specific command opcode
+is supported. Thus, \field{device_admin_cmd_opcodes[0]} refers to the
+first 64-bit value in this array corresponding to opcodes 0 to
+63, \field{device_admin_cmd_opcodes[1]} is the second 64-bit value
+corresponding to opcodes 64 to 127, etc.
+For example, the array of size 2 including
+the values 0x3 in \field{device_admin_cmd_opcodes[0]}
+and 0x1 in \field{device_admin_cmd_opcodes[1]} indicates that only
+opcodes 0, 1 and 64 are supported.
+The length of the array depends on the supported opcodes - it is
+large enough to include bits set for all supported opcodes,
+that is, the length can be calculated by taking the largest
+supported opcode, adding one, dividing by 64 and rounding up.
+In other words, for
+VIRTIO_ADMIN_CMD_LIST_QUERY and VIRTIO_ADMIN_CMD_LIST_USE the
+length of \field{command_specific_result} and
+\field{command_specific_data} respectively will be
+$DIV_ROUND_UP(max_cmd, 64) * 8$ where DIV_ROUND_UP is integer division
+with round up and \field{max_cmd} is the largest available command opcode.
+
+The array is also allowed to be larger and to additionally
+include an arbitrary number of all-zero entries.
+
+Accordingly, bits 0 and 1 corresponding to opcode 0
+(VIRTIO_ADMIN_CMD_LIST_QUERY) and 1
+(VIRTIO_ADMIN_CMD_LIST_USE) are
+always set in \field{device_admin_cmd_opcodes[0]} returned by 
VIRTIO_ADMIN_CMD_LIST_QUERY.
+
+For the command VIRTIO_ADMIN_CMD_LIST_QUERY, \field{opcode} is set to 0x0.
+The \field{group_member_id} is unused. It is set to zero by driver.
+This command has no command specific data.
+The device, upon success, returns a result in
+\field{command_specific_result} in the format
+\field{struct virtio_admin_cmd_list} describing the
+list of group administration commands supported for the group type
+specified by \field{group_type}.
+
+For the command VIRTIO_ADMIN_CMD_LIST_USE, \field{opcode}
+is set to 0x1.
+The \field{group_member_id} is unused. It is set to zero by driver.
+The 
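
A minimal C sketch of the bitmap convention above (the helper names are
mine, not from the spec; le64 values are assumed already converted to
host byte order):

    #include <stdint.h>

    /* Entries needed to cover opcodes 0..max_cmd, following the prose
     * description: DIV_ROUND_UP(max_cmd + 1, 64). */
    static inline unsigned cmd_list_entries(unsigned max_cmd)
    {
        return (max_cmd + 1 + 63) / 64;
    }

    /* Test whether opcode 'op' is marked in the array; opcodes beyond
     * the array length count as "not supported". */
    static inline int cmd_supported(const uint64_t *opcodes,
                                    unsigned len, unsigned op)
    {
        if (op / 64 >= len)
            return 0;
        return (opcodes[op / 64] >> (op % 64)) & 1;
    }

    /* Per the example above: {0x3, 0x1} => opcodes 0, 1 and 64. */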

[virtio-dev] [PATCH v11 09/10] admin: conformance clauses

2023-04-03 Thread Michael S. Tsirkin
Add conformance clauses for admin commands and admin virtqueues.

Signed-off-by: Michael S. Tsirkin 

---
typo and wording fixes reported by David
clarify cmd list use can be repeated but not in parallel with other
commands, and returning same value across uses - comment by Lingshan
add conformance clauses for new error codes
---
 admin.tex | 227 ++
 1 file changed, 227 insertions(+)

diff --git a/admin.tex b/admin.tex
index 9552797..a398c51 100644
--- a/admin.tex
+++ b/admin.tex
@@ -283,6 +283,150 @@ \subsection{Group administration 
commands}\label{sec:Basic Facilities of a Virti
 supporting multiple group types, the list of supported commands
 might differ between different group types.
 
+\devicenormative{\paragraph}{Group administration commands}{Basic Facilities 
of a Virtio Device / Device groups / Group administration commands}
+
+The device MUST validate \field{opcode}, \field{group_type} and
+\field{group_member_id}, and if any of these has an invalid or
+unsupported value, set \field{status} to
+VIRTIO_ADMIN_STATUS_EINVAL and set \field{status_qualifier}
+accordingly:
+\begin{itemize}
+\item if \field{group_type} is invalid, \field{status_qualifier}
+   MUST be set to VIRTIO_ADMIN_STATUS_Q_INVALID_GROUP;
+\item otherwise, if \field{opcode} is invalid,
+   \field{status_qualifier} MUST be set to
+   VIRTIO_ADMIN_STATUS_Q_INVALID_OPCODE;
+\item otherwise, if \field{group_member_id} is used by the
+   specific command and is invalid, \field{status_qualifier} MUST be
+   set to VIRTIO_ADMIN_STATUS_Q_INVALID_MEMBER.
+\end{itemize}
+
+If a command completes successfully, the device MUST set
+\field{status} to VIRTIO_ADMIN_STATUS_OK.
+
+If a command fails, the device MUST set
+\field{status} to a value different from VIRTIO_ADMIN_STATUS_OK.
+
+If \field{status} is set to VIRTIO_ADMIN_STATUS_EINVAL, the
+device state MUST NOT change, that is the command MUST NOT have
+any side effects on the device, in particular the device MUST NOT
+enter an error state as a result of this command.
+
+If a command fails, the device state generally SHOULD NOT change,
+as far as possible.
+
+The device MAY enforce additional restrictions and dependencies on
+opcodes used by the driver and MAY fail the command
+VIRTIO_ADMIN_CMD_LIST_USE with \field{status} set to VIRTIO_ADMIN_STATUS_EINVAL
+and \field{status_qualifier} set to VIRTIO_ADMIN_STATUS_Q_INVALID_FIELD
+if the list of commands used violates internal device dependencies.
+
+If the device supports multiple group types, commands for each group
+type MUST operate independently of each other, in particular,
+the device MAY return different results for VIRTIO_ADMIN_CMD_LIST_QUERY
+for different group types.
+
+After reset, if the device supports a given group type
+and before receiving VIRTIO_ADMIN_CMD_LIST_USE for this group type
+the device MUST assume
+that the list of legal commands used by the driver consists of
+the two commands VIRTIO_ADMIN_CMD_LIST_QUERY and VIRTIO_ADMIN_CMD_LIST_USE.
+
+After completing VIRTIO_ADMIN_CMD_LIST_USE successfully,
+the device MUST set the list of legal commands used by the driver
+to the one supplied in \field{command_specific_data}.
+
+The device MUST validate commands against the list used by
+the driver and MUST fail any commands not in the list with
+\field{status} set to VIRTIO_ADMIN_STATUS_EINVAL
+and \field{status_qualifier} set to
+VIRTIO_ADMIN_STATUS_Q_INVALID_OPCODE.
+
+The list of supported commands reported by the device MUST NOT
+shrink (but MAY expand): after reporting a given command as
+supported through VIRTIO_ADMIN_CMD_LIST_QUERY the device MUST NOT
+later report it as unsupported.  Further, after a given set of
+commands has been used (via a successful
+VIRTIO_ADMIN_CMD_LIST_USE), then after a device or system reset
+the device SHOULD complete successfully any following calls to
+VIRTIO_ADMIN_CMD_LIST_USE with the same list of commands; if this
+command VIRTIO_ADMIN_CMD_LIST_USE fails after a device or system
+reset, the device MUST NOT fail it solely because of the command
+list used.  Failure to do so would interfere with resuming from
+suspend and error recovery. Exceptions MAY apply if the system
+configuration assures, in some way, that the driver does not
+cache the previous value of VIRTIO_ADMIN_CMD_LIST_USE,
+such as in the case of a firmware upgrade or downgrade.
+
+When processing a command with the SR-IOV group type,
+if the device does not have an SR-IOV Extended Capability or
+if \field{VF Enable} is clear
+then the device MUST fail all commands with
+\field{status} set to VIRTIO_ADMIN_STATUS_EINVAL and
+\field{status_qualifier} set to
+VIRTIO_ADMIN_STATUS_Q_INVALID_GROUP;
+otherwise, if \field{group_member_id} is not
+between $1$ and \field{NumVFs} inclusive,
+the device MUST fail all commands with
+\field{status} set to VIRTIO_ADMIN_STATUS_EINVAL and
+\field{status_qualifier} set to
+VIRTIO_ADMIN_STATUS_Q_INVALID_MEMBER;
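
A sketch of the check ordering these clauses mandate; everything except
the STATUS/qualifier names is a made-up illustration, not spec-defined:

    /* Hypothetical device-side validation, in the order required above:
     * group_type first, then opcode, then group_member_id. */
    static void admin_cmd_validate(struct virtio_admin_cmd *cmd)
    {
        if (!group_type_supported(cmd->group_type))
            return fail(cmd, VIRTIO_ADMIN_STATUS_EINVAL,
                        VIRTIO_ADMIN_STATUS_Q_INVALID_GROUP);
        if (!opcode_valid(cmd->group_type, cmd->opcode))
            return fail(cmd, VIRTIO_ADMIN_STATUS_EINVAL,
                        VIRTIO_ADMIN_STATUS_Q_INVALID_OPCODE);
        if (uses_member_id(cmd->opcode) &&
            !member_id_valid(cmd->group_type, cmd->group_member_id))
            return fail(cmd, VIRTIO_ADMIN_STATUS_EINVAL,
                        VIRTIO_ADMIN_STATUS_Q_INVALID_MEMBER);
        execute(cmd);  /* sets VIRTIO_ADMIN_STATUS_OK on success; EINVAL
                          failures must leave device state unchanged */
    }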

[virtio-dev] [PATCH v11 10/10] ccw: document more reserved features

2023-04-03 Thread Michael S. Tsirkin
vq reset and shared memory are unsupported, too.

Signed-off-by: Michael S. Tsirkin 
Fixes: https://github.com/oasis-tcs/virtio-spec/issues/160
Reviewed-by: Stefan Hajnoczi 
---
 content.tex | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/content.tex b/content.tex
index 1213d48..fca3827 100644
--- a/content.tex
+++ b/content.tex
@@ -2985,6 +2985,8 @@ \subsection{Features reserved for future 
use}\label{sec:Virtio Transport Options
 \begin{itemize}
 
 \item VIRTIO_F_ADMIN_VQ
+\item VIRTIO_F_RING_RESET
+\item Shared memory regions including VIRTIO_PMEM_F_SHMEM_REGION
 
 \end{itemize}
 
-- 
MST





[virtio-dev] [PATCH v11 07/10] ccw: document ADMIN_VQ as reserved

2023-04-03 Thread Michael S. Tsirkin
Adding relevant registers needs more work and it's not
clear what the use-case will be as currently only
the PCI transport is supported. But let's keep the
door open on this.
We already say it's reserved in a central place, but it
does not hurt to remind implementers to mask it.

Note: there are more features to add to this list.
Will be done later with a patch on top.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Stefan Hajnoczi 
---
 content.tex | 12 
 1 file changed, 12 insertions(+)

diff --git a/content.tex b/content.tex
index f7446bf..1213d48 100644
--- a/content.tex
+++ b/content.tex
@@ -2978,6 +2978,18 @@ \subsubsection{Resetting Devices}\label{sec:Virtio 
Transport Options / Virtio ov
 MAY also choose to verify reset completion by reading \field{device status} via
 CCW_CMD_READ_STATUS and checking whether it is 0 afterwards.
 
+\subsection{Features reserved for future use}\label{sec:Virtio Transport 
Options / Virtio over channel I/O / Features reserved for future use}
+
+At this time, devices and drivers utilizing Virtio over channel I/O
+do not support the following features:
+\begin{itemize}
+
+\item VIRTIO_F_ADMIN_VQ
+
+\end{itemize}
+
+These features are reserved for future use.
+
 \chapter{Device Types}\label{sec:Device Types}
 
 On top of the queues, config space and feature negotiation facilities
-- 
MST





[virtio-dev] [PATCH v11 04/10] admin: introduce virtio admin virtqueues

2023-04-03 Thread Michael S. Tsirkin
The admin virtqueues will be the first interface used to issue admin commands.

Currently the virtio specification defines control virtqueue to manipulate
features and configuration of the device it operates on:
virtio-net, virtio-scsi, etc all have existing control virtqueues. However,
control virtqueue commands are device type specific, which makes it very
difficult to extend for device agnostic commands.

Keeping the device-specific virtqueue separate from the admin virtqueue
is simpler and has fewer potential problems. I don't think creating
common infrastructure for device-specific control virtqueues across
device types is worthwhile or within the scope of this patch series.

To support this requirement in a more generic way, this patch introduces
a new admin virtqueue interface.
The admin virtqueue can be seen as the virtqueue analog to a transport.
The admin queue thus does nothing device type-specific (net, scsi, etc)
and instead focuses on transporting the admin commands.

We also support more than one admin virtqueue, for QoS and
scalability requirements.

Signed-off-by: Michael S. Tsirkin 

---
explain ordering of commands as suggested by Stefan
dropped Max's S.O.B
reword commit log as suggested by David
minor wording fixes suggested by David
---
 admin.tex   | 75 +
 content.tex |  7 +++--
 2 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/admin.tex b/admin.tex
index c869a1a..f201bcd 100644
--- a/admin.tex
+++ b/admin.tex
@@ -182,3 +182,78 @@ \subsection{Group administration commands}\label{sec:Basic 
Facilities of a Virti
 \field{command_specific_data} and \field{command_specific_result}
 depends on these structures and is described separately or is
 implicit in the structure description.
+
+\section{Administration Virtqueues}\label{sec:Basic Facilities of a Virtio 
Device / Administration Virtqueues}
+
+An administration virtqueue of an owner device is used to submit
+group administration commands. An owner device can have more
+than one administration virtqueue.
+
+If VIRTIO_F_ADMIN_VQ has been negotiated, an owner device exposes one
+or more administration virtqueues. The number and locations of the
+administration virtqueues are exposed by the owner device in a transport
+specific manner.
+
+The driver queues requests to an arbitrary administration
+virtqueue, and they are used by the device on that same
+virtqueue. It is the responsibility of the driver to ensure
+strict request ordering for commands, because they will be
+consumed with no order constraints.  For example, if consistency
+is required then the driver can wait for the processing of a
+first command by the device to be completed before submitting
+another command depending on the first one.
+
+Administration virtqueues are used as follows:
+\begin{itemize}
+\item The driver submits the command using the \field{struct virtio_admin_cmd}
+structure using a buffer consisting of two parts: a device-readable one 
followed by a
+device-writable one.
+\item the device-readable part includes fields from \field{opcode}
+through \field{command_specific_data}.
+\item the device-writeable buffer includes fields from \field{status}
+through \field{command_specific_result} inclusive.
+\end{itemize}
+
+For each command, this specification describes a distinct
+format structure used for \field{command_specific_data} and
+\field{command_specific_result}; the length of these fields
+depends on the command.
+
+However, to ensure forward compatibility
+\begin{itemize}
+\item drivers are allowed to submit buffers that are longer
+than the device expects
+(that is, longer than the length of
+\field{opcode} through \field{command_specific_data}).
+This allows the driver to maintain
+a single format structure even if some structure fields are
+unused by the device.
+\item drivers are allowed to submit buffers that are shorter
+than what the device expects
+(that is, shorter than the length of \field{status} through
+\field{command_specific_result}). This allows the device to maintain
+a single format structure even if some structure fields are
+unused by the driver.
+\end{itemize}
+
+The device compares the length of each part (device-readable and
+device-writeable) of the buffer as submitted by driver to what it
+expects and then silently truncates the structures to either the
+length submitted by the driver, or the length described in this
+specification, whichever is shorter.  The device silently ignores
+any data falling outside the shorter of the two lengths. Any
+missing fields are interpreted as set to zero.
+
+Similarly, the driver compares the used buffer length
+of the buffer to what it expects and then silently
+truncates the structure to the used buffer length.
+The driver silently ignores any data falling outside
+the used buffer length reported by the device.  Any missing
+fields are interpreted as set to zero.
+
+This simplifies driver and device implementations since the
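
A minimal sketch of the truncation rule, device side (the names are mine,
not from the spec):

    #include <string.h>

    /* Parse the device-readable part: use min(driver length, spec length),
     * silently ignore extra driver bytes, read missing fields as zero. */
    static void read_cmd_struct(void *parsed, size_t spec_len,
                                const void *driver_buf, size_t driver_len)
    {
        size_t n = driver_len < spec_len ? driver_len : spec_len;

        memset(parsed, 0, spec_len);   /* missing fields read as zero */
        memcpy(parsed, driver_buf, n); /* excess is silently ignored */
    }

The driver applies the same rule in reverse to the device-writable part,
using the used buffer length reported by the device.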

[virtio-dev] [PATCH v11 06/10] mmio: document ADMIN_VQ as reserved

2023-04-03 Thread Michael S. Tsirkin
Adding relevant registers needs more work and it's not
clear what the use-case will be as currently only
the PCI transport is supported. But let's keep the
door open on this.
We already say it's reserved in a central place, but it
does not hurt to remind implementers to mask it.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Parav Pandit 
Reviewed-by: Stefan Hajnoczi 
---
 content.tex | 12 
 1 file changed, 12 insertions(+)

diff --git a/content.tex b/content.tex
index 5057df2..f7446bf 100644
--- a/content.tex
+++ b/content.tex
@@ -2364,6 +2364,18 @@ \subsection{Legacy interface}\label{sec:Virtio Transport 
Options / Virtio Over M
 
 Notification mechanisms did not change.
 
+\subsection{Features reserved for future use}\label{sec:Virtio Transport 
Options / Virtio Over MMIO / Features reserved for future use}
+
+At this time, devices and drivers utilizing Virtio Over MMIO
+do not support the following features:
+\begin{itemize}
+
+\item VIRTIO_F_ADMIN_VQ
+
+\end{itemize}
+
+These features are reserved for future use.
+
 \section{Virtio Over Channel I/O}\label{sec:Virtio Transport Options / Virtio 
Over Channel I/O}
 
 S/390 based virtual machines support neither PCI nor MMIO, so a
-- 
MST





[virtio-dev] [PATCH v11 05/10] pci: add admin vq registers to virtio over pci

2023-04-03 Thread Michael S. Tsirkin
Add new registers to the PCI common configuration structure.

These registers will be used for querying the indices of the admin
virtqueues of the owner device. To configure, reset or enable the admin
virtqueues, the driver should follow existing queue configuration/setup
sequence.

Signed-off-by: Michael S. Tsirkin 

---
dropped Max's S.O.B
make queue_num not 0 based
---
 content.tex | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/content.tex b/content.tex
index 2eb15fa..5057df2 100644
--- a/content.tex
+++ b/content.tex
@@ -948,6 +948,10 @@ \subsubsection{Common configuration structure 
layout}\label{sec:Virtio Transport
 le64 queue_device;  /* read-write */
 le16 queue_notify_data; /* read-only for driver */
 le16 queue_reset;   /* read-write */
+
+/* About the administration virtqueue. */
+le16 admin_queue_index; /* read-only for driver */
+le16 admin_queue_num; /* read-only for driver */
 };
 \end{lstlisting}
 
@@ -1033,6 +1037,19 @@ \subsubsection{Common configuration structure 
layout}\label{sec:Virtio Transport
 This field exists only if VIRTIO_F_RING_RESET has been
 negotiated. (see \ref{sec:Basic Facilities of a Virtio Device / 
Virtqueues / Virtqueue Reset}).
 
+\item[\field{admin_queue_index}]
+The device uses this to report the index of the first administration 
virtqueue.
+This field is valid only if VIRTIO_F_ADMIN_VQ has been negotiated.
+\item[\field{admin_queue_num}]
+   The device uses this to report the number of
+   supported administration virtqueues.
+   Virtqueues with index
+   between \field{admin_queue_index} and (\field{admin_queue_index} +
+   \field{admin_queue_num} - 1) inclusive serve as administration
+   virtqueues.
+   The value 0 indicates no supported administration virtqueues.
+   This field is valid only if VIRTIO_F_ADMIN_VQ has been
+   negotiated.
 \end{description}
 
 \devicenormative{\paragraph}{Common configuration structure layout}{Virtio 
Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common 
configuration structure layout}
@@ -1119,6 +1136,14 @@ \subsubsection{Common configuration structure 
layout}\label{sec:Virtio Transport
 were used before the queue reset.
 (see \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue 
Reset}).
 
+If VIRTIO_F_ADMIN_VQ has been negotiated, and if the driver
+configures any administration virtqueues, the driver MUST
+configure the administration virtqueues using the index
+in the range \field{admin_queue_index} to
+\field{admin_queue_index} + \field{admin_queue_num} - 1 inclusive.
+The driver MAY configure fewer administration virtqueues than
+supported by the device.
+
 \subsubsection{Notification structure layout}\label{sec:Virtio Transport 
Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}
 
 The notification location is found using the VIRTIO_PCI_CAP_NOTIFY_CFG
@@ -7686,6 +7711,15 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved 
Feature Bits}
 
   \item[VIRTIO_F_ADMIN_VQ(41)] This feature indicates that the device exposes 
one or more
   administration virtqueues.
+  At the moment this feature is only supported for devices using
+  \ref{sec:Virtio Transport Options / Virtio Over PCI
+ Bus}~\nameref{sec:Virtio Transport Options / Virtio Over PCI Bus}
+ as the transport and is reserved for future use for
+ devices using other transports (see
+ \ref{drivernormative:Basic Facilities of a Virtio Device / Feature 
Bits}
+   and
+   \ref{devicenormative:Basic Facilities of a Virtio Device / Feature 
Bits} for
+   handling features reserved for future use).
 
 \end{description}
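
A driver-side sketch of how the two new fields carve out the admin
virtqueue range (assuming both fields were already read from the common
configuration structure):

    #include <stdbool.h>
    #include <stdint.h>

    /* Virtqueue 'qi' is an administration virtqueue iff it lies in
     * [admin_queue_index, admin_queue_index + admin_queue_num - 1];
     * admin_queue_num == 0 means no admin virtqueues at all. */
    static bool is_admin_vq(uint16_t qi, uint16_t admin_queue_index,
                            uint16_t admin_queue_num)
    {
        return admin_queue_num != 0 &&
               qi >= admin_queue_index &&
               (uint32_t)qi < (uint32_t)admin_queue_index + admin_queue_num;
    }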
 
-- 
MST





[virtio-dev] [PATCH v11 03/10] admin: introduce group administration commands

2023-04-03 Thread Michael S. Tsirkin
This introduces a general structure for group administration commands,
used to control device groups through their owner.

Following patches will introduce specific commands and an interface for
submitting these commands to the owner.

Note that the commands are focused on controlling device groups:
this is why group related fields are in the generic part of
the structure.
Without this the admin vq would become a "whatever" vq which does not do
anything specific at all, just a general transport like thing.
I feel going this way opens the design space to the point where
we no longer know what belongs in e.g. config space,
what in the control q, and what in the admin q.
As it is, whatever deals with groups is in the admin q; other
things not in the admin q.

There are specific exceptions such as query but that's an exception that
proves the rule ;)

Signed-off-by: Michael S. Tsirkin 
---

changes since v10:
explain the role of status vs status_qualifier, addresses
multiple comments
add more status values and appropriate qualifiers,
as suggested by Parav
clarify reserved command opcodes as suggested by Stefan
better formatting for ranges, comment by Jiri
make sure command-specific-data is a multiple of qword,
so things are aligned, suggested by Jiri
add Q_OK qualifier for success
---
 admin.tex| 121 +++
 introduction.tex |   3 ++
 2 files changed, 124 insertions(+)

diff --git a/admin.tex b/admin.tex
index 04d5498..c869a1a 100644
--- a/admin.tex
+++ b/admin.tex
@@ -60,4 +60,125 @@ \section{Device groups}\label{sec:Basic Facilities of a 
Virtio Device / Device g
 PCI transport (see \ref{sec:Virtio Transport Options / Virtio Over PCI Bus}).
 \end{description}
 
+\subsection{Group administration commands}\label{sec:Basic Facilities of a 
Virtio Device / Device groups / Group administration commands}
 
+The driver sends group administration commands to the owner device of
+a group to control member devices of the group.
+This mechanism can
+be used, for example, to configure a member device before it is
+initialized by its driver.
+\footnote{The term "administration" is intended to be interpreted
+widely to include any kind of control. See specific commands
+for detail.}
+
+All the group administration commands are of the following form:
+
+\begin{lstlisting}
+struct virtio_admin_cmd {
+/* Device-readable part */
+le16 opcode;
+/*
+ * 1   - SR-IOV
+ * 2-65535 - reserved
+ */
+le16 group_type;
+/* unused, reserved for future extensions */
+u8 reserved1[12];
+le64 group_member_id;
+le64 command_specific_data[];
+
+/* Device-writable part */
+le16 status;
+le16 status_qualifier;
+/* unused, reserved for future extensions */
+u8 reserved2[4];
+u8 command_specific_result[];
+};
+\end{lstlisting}
+
+For all commands, \field{opcode}, \field{group_type} and if
+necessary \field{group_member_id} and \field{command_specific_data} are
+set by the driver, and the owner device sets \field{status} and if
+needed \field{status_qualifier} and
+\field{command_specific_result}.
+
+Generally, any unused device-readable fields are set to zero by the driver
+and ignored by the device.  Any unused device-writeable fields are set to zero
+by the device and ignored by the driver.
+
+\field{opcode} specifies the command. The valid
+values for \field{opcode} can be found in the following table:
+
+\begin{tabular}{|l|l|l|}
+\hline
+opcode & Name & Command Description \\
+\hline \hline
+0x0000 - 0x7FFF & - & Commands using \field{struct virtio_admin_cmd}\\
+\hline
+0x8000 - 0xFFFF & - & Reserved for future commands (possibly using a different 
structure)\\
+\hline
+\end{tabular}
+
+The \field{group_type} specifies the group type identifier.
+The \field{group_member_id} specifies the member identifier within the group.
+See section \ref{sec:Introduction / Terminology / Device group}
+for the definition of the group type identifier and group member
+identifier.
+
+The \field{status} describes the command result and possibly
+failure reason at an abstract level; this is appropriate for
+forwarding to applications. The \field{status_qualifier} describes
+failures at a low, virtio-specific level, as appropriate for debugging.
+The following table describes possible \field{status} values;
+to simplify common implementations, they are intentionally
+matching common \hyperref[intro:errno]{Linux error names and numbers}:
+
+\begin{tabular}{|l|l|l|}
+\hline
+Status (decimal) & Name & Description \\
+\hline \hline
+00   & VIRTIO_ADMIN_STATUS_OK & successful completion  \\
+\hline
+11   & VIRTIO_ADMIN_STATUS_EAGAIN & try again \\
+\hline
+12   & VIRTIO_ADMIN_STATUS_ENOMEM & insufficient resources \\
+\hline
+22   & VIRTIO_ADMIN_STATUS_EINVAL & invalid command \\

[virtio-dev] [PATCH v11 02/10] admin: introduce device group and related concepts

2023-04-03 Thread Michael S. Tsirkin
Each device group has a type. For now, define one initial group type:

SR-IOV type - PCI SR-IOV virtual functions (VFs) of a given
PCI SR-IOV physical function (PF). This group may contain zero or more
virtio devices according to NumVFs configured.

Each device within a group has a unique identifier. This identifier
is the group member identifier.

Note: one can argue both ways whether the new device group handling
functionality (this and following patches) is closer
to a new device type or a new transport type.

However, I expect that we will add more features in the near future. To
facilitate this as much as possible of the text is located in the new
admin chapter.

I did my best to minimize transport-specific text.

There's a bit of duplication with 0x1 repeated twice and
no special section for group type identifiers.
I think it's ok to defer adding these until we have more group
types.

Signed-off-by: Michael S. Tsirkin 
---
changes in v11:
dropped Max's S.O.B.
dropped an unmatched ) as reported by Parav
document that member id is 1 to numvfs
document that vf enable must be set (will also be reflected in
the conformance section)
document that we deferred generalizing text as requested by Parav -
I think we can do it later
minor wording fixes to address comments by David
add concepts of owner and member driver. unused currently
but will come in handy later, as suggested by Stefan
---
 admin.tex   | 63 +
 content.tex |  2 ++
 2 files changed, 65 insertions(+)
 create mode 100644 admin.tex

diff --git a/admin.tex b/admin.tex
new file mode 100644
index 000..04d5498
--- /dev/null
+++ b/admin.tex
@@ -0,0 +1,63 @@
+\section{Device groups}\label{sec:Basic Facilities of a Virtio Device / Device 
groups}
+
+It is occasionally useful to have a device control a group of
+other devices. Terminology used in such cases:
+
+\begin{description}
+\item[Device group]
+or just group, includes zero or more devices.
+\item[Owner device]
+or owner, the device controlling the group.
+\item[Member device]
+a device within a group. The owner device itself is not
+   a member of the group.
+\item[Member identifier]
+each member has this identifier, unique within the group
+   and used to address it through the owner device.
+\item[Group type identifier]
+   specifies what kind of member devices there are in a
+   group, how the member identifier is interpreted
+   and what kind of control the owner has.
+   A given owner can control multiple groups
+   of different types but only a single group of a given type,
+   thus the type and the owner together identify the group.
+   \footnote{Even though some group types only support
+   specific transports, group type identifiers
+   are global rather than transport-specific -
+   we don't expect a flood of new group types.}
+\end{description}
+
+\begin{note}
+Each device only has a single driver, thus for the purposes of
+this section, "the driver" is usually unambiguous and refers to
+the driver of the owner device.  When there's ambiguity, "owner
+driver" refers to the driver of the owner device, while "member
+driver" refers to the driver of a member device.
+\end{note}
+
+The following group types, and their identifiers, are currently specified:
+\begin{description}
+\item[SR-IOV group type (0x1)]
+This device group has a PCI Single Root I/O Virtualization
+(SR-IOV) physical function (PF) device as the owner and includes
+all its SR-IOV virtual functions (VFs) as members (see
+\hyperref[intro:PCIe]{[PCIe]}).
+
+The PF device itself is not a member of the group.
+
+The group type identifier for this group is 0x1.
+
+A member identifier for this group can have a value from 0x1 to
+\field{NumVFs} as specified in the
+SR-IOV Extended Capability of the owner device
+and equals the SR-IOV VF number of the member device;
+the group only exists when the \field{VF Enable} bit
+in the SR-IOV Control Register within the
+SR-IOV Extended Capability of the owner device is set
+(see \hyperref[intro:PCIe]{[PCIe]}).
+
+Both owner and member devices for this group type use the Virtio
+PCI transport (see \ref{sec:Virtio Transport Options / Virtio Over PCI Bus}).
+\end{description}
+
+
diff --git a/content.tex b/content.tex
index 8098988..04592fb 100644
--- a/content.tex
+++ b/content.tex
@@ -493,6 +493,8 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a 
Virtio Device / Expo
 types. It is RECOMMENDED that devices generate version 4
 UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
 
+\input{admin.tex}
+
 \chapter{General Initialization And Device Operation}\label{sec:General 
Initialization And Device Operation}
 
 We start with an overview of device initialization, then expand on the
-- 
MST



[virtio-dev] [PATCH v11 01/10] virtio: document forward compatibility guarantees

2023-04-03 Thread Michael S. Tsirkin
Feature negotiation forms the basis of forward compatibility
guarantees of virtio but has never been properly documented.
Do it now.

Suggested-by: Halil Pasic 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Parav Pandit 
---

Changes in v11:
make explanation about not using any unspecified bits
more detailed as suggested by Cornelia and David
---
 content.tex | 44 
 1 file changed, 44 insertions(+)

diff --git a/content.tex b/content.tex
index 0e474dd..8098988 100644
--- a/content.tex
+++ b/content.tex
@@ -114,21 +114,65 @@ \section{Feature Bits}\label{sec:Basic Facilities of a 
Virtio Device / Feature B
 In particular, new fields in the device configuration space are
 indicated by offering a new feature bit.
 
+To keep the feature negotiation mechanism extensible, it is
+important that devices \em{do not} offer any feature bits that
+they would not be able to handle if the driver accepted them
+(even though drivers are not supposed to accept any unspecified,
+reserved, or unsupported features even if offered, according to
+the specification.) Likewise, it is important that drivers \em{do
+not} accept feature bits they do not know how to handle (even
+though devices are not supposed to offer any unspecified,
+reserved, or unsupported features in the first place,
+according to the specification.) The preferred
+way for handling reserved and unexpected features is that the
+driver ignores them.
+
+In particular, this is
+especially important for features limited to specific transports,
+as enabling these for more transports in future versions of the
+specification is highly likely to require changing the behaviour
+from drivers and devices.  Drivers and devices supporting
+multiple transports need to carefully maintain per-transport
+lists of allowed features.
+
 \drivernormative{\subsection}{Feature Bits}{Basic Facilities of a Virtio 
Device / Feature Bits}
 The driver MUST NOT accept a feature which the device did not offer,
 and MUST NOT accept a feature which requires another feature which was
 not accepted.
 
+The driver MUST validate the feature bits offered by the device.
+The driver MUST ignore and MUST NOT accept any feature bit that is
+\begin{itemize}
+\item not described in this specification,
+\item marked as reserved,
+\item not supported for the specific transport,
+\item not defined for the device type.
+\end{itemize}
+
 The driver SHOULD go into backwards compatibility mode
 if the device does not offer a feature it understands, otherwise MUST
 set the FAILED \field{device status} bit and cease initialization.
 
+By contrast, the driver MUST NOT fail solely because a feature
+it does not understand has been offered by the device.
+
 \devicenormative{\subsection}{Feature Bits}{Basic Facilities of a Virtio 
Device / Feature Bits}
 The device MUST NOT offer a feature which requires another feature
 which was not offered.  The device SHOULD accept any valid subset
 of features the driver accepts, otherwise it MUST fail to set the
 FEATURES_OK \field{device status} bit when the driver writes it.
 
+The device MUST NOT offer feature bits corresponding to features
+it would not support if accepted by the driver (even if the
+driver is prohibited from accepting the feature bits by the
+specification); for the sake of clarity, this refers to feature
+bits not described in this specification, reserved feature bits
+and feature bits reserved or not supported for the specific
+transport or the specific device type, but this does not preclude
+devices written to a future version of this specification from
+offering such feature bits should such a specification have a
+provision for devices to support the corresponding features.
+
 If a device has successfully negotiated a set of features
 at least once (by accepting the FEATURES_OK \field{device
 status} bit during device initialization), then it SHOULD
-- 
MST
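
A sketch of the driver-side filtering these clauses ask for; the
per-device allow-list is an assumption of the example, not a spec field:

    #include <stdint.h>

    /* Accept only features that are both offered by the device and on
     * the driver's own allow-list for this device type and transport;
     * any unknown, reserved or out-of-scope bit is silently ignored,
     * and its mere presence is no reason to fail initialization. */
    static uint64_t features_to_accept(uint64_t offered,
                                       uint64_t driver_allow_list)
    {
        return offered & driver_allow_list;
    }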





[virtio-dev] [PATCH v11 00/10] Introduce device group and device management

2023-04-03 Thread Michael S. Tsirkin


Change log:

since v10:
addressed lots of comments by Jiri, Stefan, Cornelia, Lingshan, Parav, 
Max
Cornelia, do you want to pick up 10/10 separately?

since v9:
addressed comments by Parav, Max, Cornelia, David and Zhu Lingshan:
added link to errno header from Linux
rename _MEM to _MEMBER
admin vq num is zero based
clarify who sends commands where
minor english tweaks
clarify command length
specify interaction with sriov capability
correct commit log - NumVFs can be 0

I could not decide what should happen when VFs are
disabled. For now, I did not specify it.

since v8:
addressed comments by Cornelia - as we agreed on list

since v7:
make high level error codes match linux, with virtio specific codes
in a separate field
renamed _ACCEPT to _USE since that's what it does
clarified forward compatibility and non pci transports
support multiple admin vqs
conformance statements
lots of changes all over the place, so I changed the author from Max
to myself. Don't need to take credit but also don't want
to blame Max for my mistakes.

since v6:

- removed some extensions intended for future use.
  We'll do them when we get there.

- brought back command list query from v5 in a simplified form -
  it's here to address the case where a single parent
  can address multiple groups, such as PF addressing
  transport vq and sriov vfs.

- attempt to make terminology more formal.
In particular a term for whoever controls the group.
I am still going back
and forth between "parent" and "owner" - owner might
be better after all since it will work if we ever
have a self group. For now it's parent.

TODO (maybe?) - probably ok to defer until this part is upstream:

Add "all members" member id.

Add commands for MSI, feature discovery.

Add commands for transport vq.


My intent is to try and support both SR-IOV and SIOV
usecases with the same structure and maybe even the same
VQ.

For example, it might make sense to split creating/destroying
SIOV devices from the transport passing data from the guest - the
driver would then not negotiate VIRTIO_F_SR_IOV (which
then means auto-provisioning).

More ideas for use-cases:
virtio VF features query and configuration space provisioning
virtio VF resource (queues, msix vectors count) provisioning


Future directions (shouldn't block this patch)
- aborting commands - left for later. or is vq reset enough?
- should we rename structures from admin to group admin?
  left as is for now just in case we soon get more admin things which are
  not group administration. easy to change later.


Michael S. Tsirkin (10):
  virtio: document forward compatibility guarantees
  admin: introduce device group and related concepts
  admin: introduce group administration commands
  admin: introduce virtio admin virtqueues
  pci: add admin vq registers to virtio over pci
  mmio: document ADMIN_VQ as reserved
  ccw: document ADMIN_VQ as reserved
  admin: command list discovery
  admin: conformance clauses
  ccw: document more reserved features

 admin.tex| 586 +++
 content.tex  | 113 -
 introduction.tex |   3 +
 3 files changed, 700 insertions(+), 2 deletions(-)
 create mode 100644 admin.tex

-- 
MST





[virtio-dev] RE: [PATCH 02/11] transport-pci: Move transitional device id to legacy section

2023-04-03 Thread Parav Pandit


> From: Michael S. Tsirkin 
> Sent: Monday, April 3, 2023 10:50 AM

> 
> No idea what all this means, sorry.  Please do not move text that affects 
> modern
> drivers to a legacy section. And we've spilled way too much ink on this 
> already.

I disagree, because the spec does not describe a "modern driver", and what you
are describing is not aligned with the way the current spec is written.
I prefer to avoid mentioning it again in the same feature bits section that
talks about the transitional interface.




[virtio-dev] RE: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Parav Pandit



> From: Michael S. Tsirkin 
> Sent: Monday, April 3, 2023 10:53 AM
> 
> On Fri, Mar 31, 2023 at 05:43:11PM -0400, Parav Pandit wrote:
> > > I can not say I thought about this
> > > deeply so maybe there's some problem, or maybe it's a worse approach
> > > - could you comment on this? It looks like this could be a smaller
> > > change, but maybe it isn't? Did you consider this option?
> >
> > We can possibly let both the options open for device vendors to implement.
> >
> > Change wise transport VQ is fairly big addition for both hypervisor
> > driver and also for the device.
> 
> OTOH it is presumably required for scalability anyway, no?
No.
Most new generation SIOV and SR-IOV devices operate without any 
para-virtualization.

> And presumably it can all be done in firmware ...
> Is there actual hardware that can't implement transport vq but is going to
> implement the mmr spec?
> 
Nvidia and Marvell DPUs implement MMR spec.
Transport VQ has very high latency and DMA overheads for 2 to 4 bytes 
read/write.

And before discussing "why not that approach", let's finish reviewing "this 
approach" first.





[virtio-dev] Re: [virtio-comment] [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Parav Pandit




On 4/3/2023 10:45 AM, Stefan Hajnoczi wrote:

On Fri, Mar 31, 2023 at 01:58:23AM +0300, Parav Pandit wrote:

Overview:
-
The Transitional MMR device is a variant of the transitional PCI device.


What does "MMR" mean?


memory mapped registers.
Explained below in the design section and also in relevant patches 6 to 11.


It has its own small Device ID range. It does not have I/O
region BAR; instead it exposes legacy configuration and device
specific registers at an offset in the memory region BAR.

Such transitional MMR devices will be used at the scale of
thousands of devices using PCI SR-IOV and/or future scalable
virtualization technology to provide backward
compatibility (for legacy devices) and also future
compatibility with new features.

Usecase:

1. A hypervisor/system needs to provide transitional
virtio devices to the guest VM at scale of thousands,
typically, one to eight devices per VM.

2. A hypervisor/system needs to provide such devices using a
vendor agnostic driver in the hypervisor system.


Is the idea that the hypervisor configures the new Transitional MMR
devices and makes them appear like virtio-pci Transitional devices?

Yes, but the hypervisor is not involved in any configuration parsing or 
anything of that nature.
It is only a passthrough forwarder from the emulated I/O BAR to the 
memory-mapped legacy registers.

In other words, the hypervisor does not care about the register contents at all.
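
A sketch of that pass-through forwarding, with hypothetical names; the
only point is that the hypervisor relays accesses without interpreting
them:

    #include <stdint.h>

    /* Trap handler for the guest's emulated legacy I/O BAR: relay the
     * access verbatim to the VF's memory-mapped legacy register window
     * ('mmr_base' is the assumed mapping of that window). */
    static uint32_t legacy_io_read(volatile uint8_t *mmr_base,
                                   uint32_t off, unsigned size)
    {
        switch (size) {
        case 1:  return *(volatile uint8_t  *)(mmr_base + off);
        case 2:  return *(volatile uint16_t *)(mmr_base + off);
        default: return *(volatile uint32_t *)(mmr_base + off);
        }
    }

    static void legacy_io_write(volatile uint8_t *mmr_base, uint32_t off,
                                unsigned size, uint32_t val)
    {
        switch (size) {
        case 1:  *(volatile uint8_t  *)(mmr_base + off) = val; break;
        case 2:  *(volatile uint16_t *)(mmr_base + off) = val; break;
        default: *(volatile uint32_t *)(mmr_base + off) = val; break;
        }
    }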


In other words, the guest doesn't know about Transitional MMR and does
not need any code changes.


3. A hypervisor system prefers to have single stack regardless of
virtio device type (net/blk) and be future compatible with a
single vfio stack using SR-IOV or other scalable device
virtualization technology to map PCI devices to the guest VM.
(as transitional or otherwise)


What does this paragraph mean?

It means that regardless of whether a VF is a transitional MMR VF or a
1.x VF without any MMR extensions, there is a single vfio virtio driver
handling both types of devices to map to the guest VM.




Modern devices were added to Linux in 2014 and support SR-IOV. 



Why is it
important to support Transitional (which really means Legacy devices,
otherwise Modern devices would be sufficient)?

To support guest VMs which only understand legacy devices; unfortunately,
such guests are still in wide use.






[virtio-dev] Re: [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Michael S. Tsirkin
On Fri, Mar 31, 2023 at 05:43:11PM -0400, Parav Pandit wrote:
> > I can not say I thought about this
> > deeply so maybe there's some problem, or maybe it's a worse approach -
> > could you comment on this? It looks like this could be a smaller change,
> > but maybe it isn't? Did you consider this option?
> 
> We can possibly let both the options open for device vendors to implement.
> 
> Change wise transport VQ is fairly big addition for both hypervisor driver
> and also for the device.

OTOH it is presumably required for scalability anyway, no?
And presumably it can all be done in firmware ...
Is there actual hardware that can't implement transport vq
but is going to implement the mmr spec?

-- 
MST





[virtio-dev] Re: [PATCH 02/11] transport-pci: Move transitional device id to legacy section

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 02:42:15PM +, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin 
> > Sent: Sunday, April 2, 2023 3:55 AM
> > 
> > On Fri, Mar 31, 2023 at 09:24:21PM +, Parav Pandit wrote:
> > >
> > >
> > > > From: Michael S. Tsirkin 
> > > > Sent: Friday, March 31, 2023 2:44 AM
> > > >
> > > > On Fri, Mar 31, 2023 at 01:58:25AM +0300, Parav Pandit wrote:
> > > > > Currently PCI device discovery details for the transitional device
> > > > > are documented in two different sections.
> > > > >
> > > > > For example, PCI device and vendor ID registers are documented in
> > > > > 'Device Requirements: PCI Device Discovery' section, while PCI
> > > > > revision id is documented in 'Legacy Interfaces: A Note on PCI
> > > > > Device Discovery' section.
> > > > >
> > > > > Transitional devices requirements should be documented in "legacy
> > > > > interfaces" section as clearly mentioned in 'Legacy Interface: A
> > > > > Note on Feature Bits'.
> > > >
> > > > I already commented on this, I disagree.
> > > > Modern drivers must be able
> > > > to completely ignore legacy interface sections, but they do bind to
> > > > transitional device IDs.
> > > > This change breaks this assumption.
> > > >
> > > Legacy interface section holds the detail about transitional devices.
> > > We do not have a
> > > "Legacy only" section.
> > >
> > > It doesn't make sense to put partial information in the legacy section
> > > and partial in another place.
> > > Modern drivers are not mentioned in the spec terminology section.
> > >
> > > Can you please explain, how can a modern driver ignore the text
> > > "Transitional devices MUST have a PCI Revision ID of 0." written in the
> > > legacy interface section?
> > 
> > Modern drivers ignore revision ID. It is 0 to accommodate legacy drivers.
> For a transitional device the revision ID must be zero, and "a driver" must
> check that it is zero.
> The section refers to the legacy interface covering transitional devices (not
> just legacy devices).
> So it cannot be written from the POV of a "modern driver", which the spec
> does not define.
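> 
> For concreteness, the check in question amounts to no more than this
> (a sketch against the Linux PCI config accessors; illustrative only,
> not spec text):
> 
>     #include <linux/pci.h>
> 
>     /* A legacy/transitional driver enforcing "Transitional devices
>      * MUST have a PCI Revision ID of 0". A modern driver simply
>      * never performs this read. */
>     static int check_transitional_rev(struct pci_dev *pdev)
>     {
>             u8 rev;
> 
>             pci_read_config_byte(pdev, PCI_REVISION_ID, &rev);
>             return rev == 0 ? 0 : -ENODEV;
>     }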

No idea what all this means, sorry.  Please do not move text that
affects modern drivers to a legacy section. And we've spilled way too
much ink on this already.

-- 
MST





[virtio-dev] Re: [virtio-comment] [PATCH 00/11] Introduce transitional mmr pci device

2023-04-03 Thread Stefan Hajnoczi
On Fri, Mar 31, 2023 at 01:58:23AM +0300, Parav Pandit wrote:
> Overview:
> -
> The Transitional MMR device is a variant of the transitional PCI device.

What does "MMR" mean?

> It has its own small Device ID range. It does not have an I/O
> region BAR; instead it exposes the legacy configuration and
> device-specific registers at an offset in the memory region BAR.
> 
> Such transitional MMR devices will be used at the scale of
> thousands of devices using PCI SR-IOV and/or future scalable
> virtualization technology to provide backward
> compatibility (for legacy devices) and also future
> compatibility with new features.
> 
> Usecase:
> 
> 1. A hypervisor/system needs to provide transitional
>virtio devices to the guest VM at scale of thousands,
>typically, one to eight devices per VM.
> 
> 2. A hypervisor/system needs to provide such devices using a
>vendor agnostic driver in the hypervisor system.

Is the idea that the hypervisor configures the new Transitional MMR
devices and makes them appear like virtio-pci Transitional devices?

In other words, the guest doesn't know about Transitional MMR and does
not need any code changes.

> 3. A hypervisor system prefers to have single stack regardless of
>virtio device type (net/blk) and be future compatible with a
>single vfio stack using SR-IOV or other scalable device
>virtualization technology to map PCI devices to the guest VM.
>(as transitional or otherwise)

What does this paragraph mean?

> 
> Motivation/Background:
> --
> The existing transitional PCI device is missing support for
> PCI SR-IOV based devices. Currently it works only at the PCI PF
> level, or in reality as a software-emulated device. It has the
> system-level limitations cited below:
> 
> [a] PCIe spec citation:
> VFs do not support I/O Space and thus VF BARs shall not
> indicate I/O Space.
> 
> [b] CPU arch citation:
> Intel 64 and IA-32 Architectures Software Developer’s Manual:
> The processor’s I/O address space is separate and distinct from
> the physical-memory address space. The I/O address space consists
> of 64K individually addressable 8-bit I/O ports, numbered 0 through FFFFH.
> 
> [c] PCIe spec citation:
> If a bridge implements an I/O address range,...I/O address range
> will be aligned to a 4 KB boundary.
> 
> [d] I/O region accesses at the PCI system level are slow, as they are
> non-posted operations in the PCIe fabric.
> 
> The use case requirements and limitations above can be solved by
> extending the transitional device, mapping the legacy and
> device-specific configuration registers in a memory PCI BAR instead
> of using a non-composable I/O region.
> 
> Please review.

Modern devices were added to Linux in 2014 and support SR-IOV. Why is it
important to support Transitional (which really means Legacy devices,
otherwise Modern devices would be sufficient)?

> 
> Patch summary:
> --
> patches 1 to 5 prepare the spec
> patches 6 to 11 define the transitional mmr device
> 
> patch-1 uses lowercase letters to name device ids
> patch-2 moves the transitional device id to the legacy section along
> with the revision id
> patch-3 splits the legacy feature bits description from the device id
> patch-4 renames and moves the virtio config registers next to the 1.x
> registers section
> patch-5 adds a missing helper verb in the terminology definitions
> patch-6 introduces the transitional mmr device
> patch-7 introduces the transitional mmr device pci device ids
> patch-8 introduces the virtio extended pci capability
> patch-9 describes the new pci capability to locate the legacy mmr
> registers
> patch-10 extends usage of the driver notification capability for
>  the transitional mmr device
> patch-11 adds the conformance section of the transitional mmr device
> 
> This design and its details are further described below.
> 
> Design:
> ---
> The picture below captures the main difference between the current
> transitional PCI SR-IOV VF and the transitional MMR SR-IOV VF.
> 
> +--------------------+ +--------------------+ +--------------------+
> |virtio 1.x          | |Transitional        | |Transitional        |
> |SRIOV VF            | |SRIOV VF            | |MMR SRIOV VF        |
> |                    | |                    | |                    |
> | +----------------+ | | +----------------+ | | +----------------+ |
> | |dev_id =        | | | |dev_id =        | | | |dev_id =        | |
> | |{0x1040-0x106C} | | | |{0x1000-0x103f} | | | |{0x10f9-0x10ff} | |
> | +----------------+ | | +----------------+ | | +----------------+ |
> |                    | |                    | |                    |
> | +----------------+ | | +----------------+ | | +----------------+ |
> | |Memory BAR      | | | |Memory BAR      | | | |Memory BAR      | |
> | +----------------+ | | +----------------+ | | | +------------+ | |
> |                    | |                    | | | |legacy      | | |
> |                    | | +----------------+ | | | |virtio      | | |
> |                    | | |IOBAR impossible| | | | |registers   | | |
> |                    | | +----------------+ | | | +------------+ | |
> |                    | |                    | | +----------------+ | |
> +--------------------+ +--------------------+ +--------------------+
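
To make the rightmost box concrete: the registers that a legacy guest 
reaches with port I/O live at an offset inside the memory BAR of the 
MMR VF, so the access becomes a plain MMIO operation (a sketch; the 
offset discovery via the capability of patches 8-9 is assumed here, 
and the register offset follows the legacy virtio-pci layout):

    #include <stdint.h>

    #define VIRTIO_PCI_LEGACY_STATUS 18  /* device status, legacy layout */

    /* 'bar' is the mapped memory BAR; 'legacy_off' is the offset of
     * the legacy register window, assumed to come from the proposed
     * extended PCI capability. */
    static inline void legacy_mmr_write8(volatile uint8_t *bar,
                                         uint32_t legacy_off,
                                         uint32_t reg, uint8_t val)
    {
            /* a memory write is posted on PCIe, unlike any I/O write,
             * and VFs may implement memory BARs (limitation [a]) */
            bar[legacy_off + reg] = val;
    }

For example, legacy_mmr_write8(bar, legacy_off, VIRTIO_PCI_LEGACY_STATUS, 0) 
would perform a legacy device reset through the memory BAR.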
