[virtio-dev] [PATCH v8] virtio-net: support device stats

2022-01-10 Thread Xuan Zhuo
This patch allows the driver to obtain some statistics from the device.

In the back-end implementation, we can collect a lot of such information,
which is useful for debugging and for judging the running status of the
back-end. We hope to display it directly to the user through ethtool.

Signed-off-by: Xuan Zhuo 
Reviewed-by: Jason Wang 
---

v8:
1. Modified based on comments by Cornelia Huck

v7:
1. add rx_reset, tx_reset
2. add device normative and driver normative
3. add comments for *_packets, *_bytes

v6:
1. correct the names and descriptions of some stats items

v5:
1. add VIRTIO_NET_CTRL_STATS_GET_CTRL_VQ
2. more item for virtio_net_ctrl_reply_stats_queue_pair

v4:
1. remove dev_stats_num, {rx|tx}_stats_num
2. Use two commands to get the stats of queue pair and dev respectively

v3 changes:
1. add dev_version
2. use queue_pair_index to replace rx_num, tx_num
3. Explain the processing when the device and driver support different numbers
of stats

 conformance.tex |   2 +
 content.tex | 133 +++-
 2 files changed, 134 insertions(+), 1 deletion(-)

diff --git a/conformance.tex b/conformance.tex
index 42f8537..950924e 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -142,6 +142,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Automatic receive steering in multiqueue mode}
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Offloads State Configuration / Setting Offloads State}
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) }
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Device stats}
 \end{itemize}

 \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / 
Driver Conformance / Block Driver Conformance}
@@ -401,6 +402,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Gratuitous Packet Sending}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Automatic receive steering in multiqueue mode}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Device stats}
 \end{itemize}

 \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / 
Device Conformance / Block Device Conformance}
diff --git a/content.tex b/content.tex
index cf20570..fa385e2 100644
--- a/content.tex
+++ b/content.tex
@@ -3092,6 +3092,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
Network Device / Feature bits
 \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
 channel.

+\item[VIRTIO_NET_F_CTRL_STATS(55)] Device can provide device-level statistics
+to the driver through the control channel.
+
 \item[VIRTIO_NET_F_HOST_USO (56)] Device can receive USO packets. Unlike UFO
  (fragmenting the packet) the USO splits large UDP packet
  to several segments when each of these smaller packets has UDP header.
@@ -3137,6 +3140,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device 
Types / Network Device
 \item[VIRTIO_NET_F_GUEST_ANNOUNCE] Requires VIRTIO_NET_F_CTRL_VQ.
 \item[VIRTIO_NET_F_MQ] Requires VIRTIO_NET_F_CTRL_VQ.
 \item[VIRTIO_NET_F_CTRL_MAC_ADDR] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_CTRL_STATS] Requires VIRTIO_NET_F_CTRL_VQ.
 \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.
 \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
 \end{description}
@@ -4015,6 +4019,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types 
/ Network Device / Devi
 u8 command;
 u8 command-specific-data[];
 u8 ack;
+u8 command-specific-data-reply[];
 };

 /* ack values */
@@ -4023,7 +4028,8 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types 
/ Network Device / Devi
 \end{lstlisting}

 The \field{class}, \field{command} and command-specific-data are set by the
-driver, and the device sets the \field{ack} byte. There is little it can
+driver, and the device sets the \field{ack} byte and optionally
+\field{command-specific-data-reply}. There is little it can
 do except issue a diagnostic if \field{ack} is not
 VIRTIO_NET_OK.

@@ -4471,6 +4477,131 @@ \subsubsection{Control Virtqueue}\label{sec:Device 
Types / Network Device / Devi
 according to the native endian of the guest rather than
 (necessarily when not using the legacy interface) little-endian.

+\paragraph{Device stats}\label{sec:Device Types / Network Device / Device 
Operation / Control Virtqueue / Device 
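
As a concrete illustration of the command layout added above, here is a small
standalone C sketch of the buffers a driver would place on the control
virtqueue for VIRTIO_NET_CTRL_STATS_GET_QUEUE_PAIR, using the constants and
structures proposed in this patch. The descriptor-chain submission itself is
omitted, and the header struct and buffer names are illustrative only, not
taken from any existing driver.

    /*
     * Illustration only: buffer layout for a
     * VIRTIO_NET_CTRL_STATS_GET_QUEUE_PAIR request.  A real driver adds
     * hdr and req as device-readable buffers and ack plus reply as
     * device-writable buffers on the control virtqueue.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define VIRTIO_NET_CTRL_STATS                 6
    #define VIRTIO_NET_CTRL_STATS_GET_DEV         0
    #define VIRTIO_NET_CTRL_STATS_GET_CTRL_VQ     1
    #define VIRTIO_NET_CTRL_STATS_GET_QUEUE_PAIR  2

    #define VIRTIO_NET_OK   0
    #define VIRTIO_NET_ERR  1

    struct ctrl_hdr {                  /* \field{class} and \field{command} */
        uint8_t class;
        uint8_t command;
    };

    struct stats_queue_pair_req {      /* command-specific-data */
        uint64_t queue_pair_index;     /* le64 on the wire */
    };

    int main(void)
    {
        struct ctrl_hdr hdr = {
            .class   = VIRTIO_NET_CTRL_STATS,
            .command = VIRTIO_NET_CTRL_STATS_GET_QUEUE_PAIR,
        };
        struct stats_queue_pair_req req = { .queue_pair_index = 0 };
        uint8_t ack = VIRTIO_NET_ERR;  /* written by the device */
        uint8_t reply[256];            /* command-specific-data-reply,
                                          written by the device */

        printf("class=%u command=%u queue_pair=%llu ack=%u reply=%zu bytes\n",
               (unsigned)hdr.class, (unsigned)hdr.command,
               (unsigned long long)req.queue_pair_index,
               (unsigned)ack, sizeof(reply));
        return 0;   /* a driver would check ack == VIRTIO_NET_OK */
    }
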

Re: [virtio-dev] [PATCH v7] virtio-net: support device stats

2022-01-10 Thread Cornelia Huck
On Wed, Jan 05 2022, Xuan Zhuo  wrote:

> This patch allows the driver to obtain some statistics from the device.
>
> In the back-end implementation, we can count a lot of such information,
> which can be used for debugging and judging the running status of the
> back-end. We hope to directly display it to the user through ethtool.
>
> Signed-off-by: Xuan Zhuo 
> ---
> v7:
> 1. add rx_reset, tx_reset
> 2. add device normative and dirver normative
> 3. add comments for *_packets, *_bytres
>
> v6:
> 1. correct the names and descriptions of some stats items
>
> v5:
> 1. add VIRTIO_NET_CTRL_STATS_GET_CTRL_VQ
> 2. more item for virtio_net_ctrl_reply_stats_queue_pair
>
> v4:
> 1. remove dev_stats_num, {rx|tx}_stats_num
> 2. Use two commands to get the stats of queue pair and dev respectively
>
> v3 changes:
> 1. add dev_version
> 2. use queue_pair_index replace rx_num, tx_num
> 3. Explain the processing when the device and driver support different numbers
> of stats
>
>  content.tex | 123 +++-
>  1 file changed, 122 insertions(+), 1 deletion(-)
>
> diff --git a/content.tex b/content.tex
> index cf20570..f99d663 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -3092,6 +3092,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
> Network Device / Feature bits
>  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>  channel.
>
> +\item[VIRTIO_NET_F_CTRL_STATS(55)] Device can provide device-level statistics
> +to the driver through the control channel.
> +

[More of a curiosity question. Is there any pattern to the feature bit
numbering for virtio-net? It seems that it is counting down from 63, but
bit 58 has been omitted? 55 is fine, though, I'm not asking to change it
:)]

>  \item[VIRTIO_NET_F_HOST_USO (56)] Device can receive USO packets. Unlike UFO
>   (fragmenting the packet) the USO splits large UDP packet
>   to several segments when each of these smaller packets has UDP header.

> @@ -4504,6 +4510,121 @@ \subsubsection{Legacy Interface: Framing 
> Requirements}\label{sec:Device
>  See \ref{sec:Basic
>  Facilities of a Virtio Device / Virtqueues / Message Framing}.
>
> +\paragraph{Device stats}\label{sec:Device Types / Network Device / Device 
> Operation / Control Virtqueue / Device stats}
> +
> +If the VIRTIO_NET_F_CTRL_STATS feature is negotiated, the driver can
> +get device stats from the device by \field{command-specific-data-reply}.

s/by/in/ ?

> +
> +To get the stats, the following definitions are used:
> +\begin{lstlisting}
> +#define VIRTIO_NET_CTRL_STATS  6
> +#define VIRTIO_NET_CTRL_STATS_GET_DEV  0
> +#define VIRTIO_NET_CTRL_STATS_GET_CTRL_VQ  1
> +#define VIRTIO_NET_CTRL_STATS_GET_QUEUE_PAIR   2
> +\end{lstlisting}
> +
> +The following layout structure are used:

s/structure/structures/

> +
> +\field{command-specific-data}
> +\begin{lstlisting}
> +struct virtio_net_ctrl_stats_queue_pair {
> +le64 queue_pair_index;
> +}
> +\end{lstlisting}
> +
> +\field{command-specific-data-reply}
> +\begin{lstlisting}
> +struct virtio_net_ctrl_reply_stats_dev {
> +le64 dev_reset; // The number of device reset.

s/device reset/device resets/

> +}
> +
> +struct virtio_net_ctrl_reply_stats_cvq {
> +le64 request_num; // The number of requests.
> +le64 ok_num;  // The number of ok ack.
> +le64 err_num; // The number of err ack.

s/ack/acks/ (x2)

> +
> +le64 req_rx_promisc; // The number of requests with command 
> VIRTIO_NET_CTRL_RX_PROMISC.
> +le64 req_rx_allmulti;// The number of requests with command 
> VIRTIO_NET_CTRL_RX_ALLMULTI.
> +le64 req_rx_alluni;  // The number of requests with command 
> VIRTIO_NET_CTRL_RX_ALLUNI.
> +le64 req_rx_nomulti; // The number of requests with command 
> VIRTIO_NET_CTRL_RX_NOMULTI.
> +le64 req_rx_nouni;   // The number of requests with command 
> VIRTIO_NET_CTRL_RX_NOUNI.
> +le64 req_rx_nobcast; // The number of requests with command 
> VIRTIO_NET_CTRL_RX_NOBCAST.
> +le64 req_mac_table_set;  // The number of requests with command 
> VIRTIO_NET_CTRL_MAC_TABLE_SET.
> +le64 req_mac_addr_set;   // The number of requests with command 
> VIRTIO_NET_CTRL_MAC_ADDR_SET.
> +le64 req_vlan_add;   // The number of requests with command 
> VIRTIO_NET_CTRL_VLAN_ADD.
> +le64 req_vlan_del;   // The number of requests with command 
> VIRTIO_NET_CTRL_VLAN_DEL.
> +le64 req_announce_ack;   // The number of requests with command 
> VIRTIO_NET_CTRL_ANNOUNCE_ACK.
> +le64 req_mq_vq_pairs_set;// The number of requests with command 
> VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET.
> +le64 req_mq_rss_config;  // The number of requests with command 
> VIRTIO_NET_CTRL_MQ_RSS_CONFIG.
> +le64 req_mq_hash_config; // The number of requests with command 
> VIRTIO_NET_CTRL_MQ_HASH_CONFIG.
> +le64 req_guest_offloads_set; // The number 
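
The commit message says the goal is to display these counters to the user
through ethtool. As a hedged sketch of what that could look like on the Linux
side, the usual ethtool_ops hooks would be wired up roughly as below, assuming
a hypothetical my_query_cvq_stats() helper standing in for the actual
control-virtqueue request; none of this is part of the patch or of the
existing virtio_net driver.

    #include <linux/ethtool.h>
    #include <linux/kernel.h>
    #include <linux/netdevice.h>
    #include <linux/string.h>

    /* Mirrors the first three fields of virtio_net_ctrl_reply_stats_cvq. */
    struct my_cvq_stats {
        u64 request_num;
        u64 ok_num;
        u64 err_num;
    };

    static const char my_stat_strings[][ETH_GSTRING_LEN] = {
        "cvq_requests", "cvq_ok", "cvq_err",
    };

    /* Hypothetical helper: a real driver would send
     * VIRTIO_NET_CTRL_STATS_GET_CTRL_VQ on the control virtqueue and copy
     * the device-written reply into *s.  Stubbed out here. */
    static int my_query_cvq_stats(struct net_device *dev,
                                  struct my_cvq_stats *s)
    {
        memset(s, 0, sizeof(*s));
        return 0;
    }

    static int my_get_sset_count(struct net_device *dev, int sset)
    {
        return sset == ETH_SS_STATS ? ARRAY_SIZE(my_stat_strings)
                                    : -EOPNOTSUPP;
    }

    static void my_get_strings(struct net_device *dev, u32 sset, u8 *data)
    {
        if (sset == ETH_SS_STATS)
            memcpy(data, my_stat_strings, sizeof(my_stat_strings));
    }

    static void my_get_ethtool_stats(struct net_device *dev,
                                     struct ethtool_stats *stats, u64 *data)
    {
        struct my_cvq_stats s;

        my_query_cvq_stats(dev, &s);   /* shown to the user by "ethtool -S" */
        data[0] = s.request_num;
        data[1] = s.ok_num;
        data[2] = s.err_num;
    }

    static const struct ethtool_ops my_ethtool_ops = {
        .get_sset_count    = my_get_sset_count,
        .get_strings       = my_get_strings,
        .get_ethtool_stats = my_get_ethtool_stats,
    };

With the counters coming from the device rather than the driver, the same
"ethtool -S" output could expose back-end behaviour (resets, control-command
counts) that the guest could not previously observe.
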

Re: [virtio-dev] Re: [PATCH v7] virtio-net: support device stats

2022-01-10 Thread Cornelia Huck
On Fri, Jan 07 2022, Jason Wang  wrote:

> On Fri, Jan 7, 2022 at 11:03 AM Xuan Zhuo  wrote:
>>
>> On Thu, 6 Jan 2022 12:00:14 +0800, Jason Wang  wrote:
>> > On Wed, Jan 5, 2022 at 10:49 AM Xuan Zhuo  
>> > wrote:
>> > >
>> > > This patch allows the driver to obtain some statistics from the device.
>> > >
>> > > In the back-end implementation, we can count a lot of such information,
>> > > which can be used for debugging and judging the running status of the
>> > > back-end. We hope to directly display it to the user through ethtool.
>> > >
>> > > Signed-off-by: Xuan Zhuo 
>> >
>> > Reviewed-by: Jason Wang 
>>
>> hi,
>>
>> When do you think we can start voting?
>
> I think we need to wait for 1-2 weeks to make sure there are no further
> comments, then we may start the voting.

Not strictly needed; I assume that this patch has already seen good
review from the networking side; I'll take a look at it as well.

>
>> Can this patch enter virtio spec v1.2?
>
> I'm not sure. Adding Cornelia for answering this question.

We need to start voting this week for this to make 1.2 (but no worries,
there will be more releases after that, and it is usually sufficient for
a change to make it into the git repository before implementations can
proceed).

I did not find a github issue for this, so please follow the process
outlined in https://github.com/oasis-tcs/virtio-spec#use-of-github-issues
to request inclusion.





Re: [virtio-dev] Some thoughts on zero-copy between VM domains for discussion

2022-01-10 Thread Stefan Hajnoczi
On Thu, Jan 06, 2022 at 05:03:38PM +, Alex Bennée wrote:
> 
> Hi,
> 
> To start the new year I thought I would dump some of my thoughts on
> zero-copy between VM domains. For project Stratos we've gamely avoided
> thinking too hard about this while we've been concentrating on solving
> more tractable problems. However, we can't put it off forever so let's
> work through the problem.
> 
> Memory Sharing
> ==
> 
> For any zero-copy to work there has to be memory sharing between the
> domains. For traditional KVM this isn't a problem as the host kernel
> already has access to the whole address space of all its guests.
> However type-1 setups (and now pKVM) are less promiscuous about sharing
> their address space across the domains.
> 
> We've discussed options like dynamically sharing individual regions in
> the past (maybe via iommu hooks). However given the performance
> requirements I think that is ruled out in favour of sharing of
> appropriately sized blocks of memory. Either one of the two domains has
> to explicitly share a chunk of its memory with the other or the
> hypervisor has to allocate the memory and make it visible to both. What
> considerations do we have to take into account to do this?
> 
>  * the actual HW device may only have the ability to DMA to certain
>areas of the physical address space.
>  * there may be alignment requirements for HW to access structures (e.g.
>GPU buffers/blocks)
> 
> Which domain should do the sharing? The hypervisor itself likely doesn't
> have all the information to make the choice but in a distributed driver
> world it won't always be the Dom0/Host equivalent. While the domain with
> the HW driver in it will know what the HW needs, it might not know if the
> GPAs being used are actually visible at the real PAs they are mapped to.
> 
> I think this means for useful memory sharing we need the allocation to
> be done by the HW domain but with support from the hypervisor to
> validate the region meets all the physical bus requirements.
> 
> Buffer Allocation
> =
> 
> Ultimately I think the majority of the work that will be needed comes
> down to how buffer allocation is handled in the kernels. This is also
> the area I'm least familiar with so I look forward to feedback from
> those with deeper kernel knowledge.
> 
> For Linux there already exists the concept of DMA reachable regions that
> take into account the potentially restricted set of addresses that HW
> can DMA to. However we are now adding a second constraint which is where
> the data is eventually going to end up.
> 
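
For reference, the "DMA reachable regions" mentioned above correspond to the
kernel's DMA API. A minimal sketch, assuming a struct device obtained from the
usual probe path, looks like this; it only expresses the existing
address-reachability constraint, not the proposed destination-domain
constraint.

    #include <linux/dma-mapping.h>
    #include <linux/gfp.h>

    static void *alloc_dma_reachable(struct device *dev, size_t size,
                                     dma_addr_t *dma_handle)
    {
        /* Declare which addresses the hardware can actually reach. */
        if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)))
            return NULL;

        /* The allocation is then constrained to that reachable range. */
        return dma_alloc_coherent(dev, size, dma_handle, GFP_KERNEL);
    }
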
> For example the HW domain may be talking to a network device but the
> packet data from that device might be going to two other domains. We
> wouldn't want to share a region for received network packets between
> both domains because that would leak information so the network driver
> needs knowledge of which shared region to allocate from and hope the HW
> allows us to filter the packets appropriately (maybe via VLAN tag). I
> suspect the pure HW solution of just splitting into two HW virtual
> functions directly into each domain is going to remain the preserve of
> expensive enterprise kit for some time.
> 
> Should the work be divided up between sub-systems? Both the network and
> block device sub-systems have their own allocation strategies and would
> need some knowledge about the final destination for their data. What
> other driver sub-systems are going to need support for this sort of
> zero-copy forwarding? While it would be nice for every VM transaction to
> be zero-copy we don't really need to solve it for low speed transports.
> 
> Transparent fallback and scaling
> 
> 
> As we know memory is always a precious resource that we never have
> enough of. The more we start carving up memory regions for particular
> tasks the less flexibility the system has as a whole to make efficient
> use of it. We can almost guarantee whatever number we pick for a given
> VM-to-VM conduit will be wrong. Any memory allocation system based on
> regions will have to be able to fall back gracefully to using other
> memory in the HW domain and rely on traditional bounce buffering
> approaches while under heavy load. This will be a problem for VirtIO
> backends to understand when some data that needs to go to the FE domain
> needs this bounce buffer treatment. This will involve tracking
> destination domain metadata somewhere in the system so it can be queried
> quickly.
> 
> Is there a cross-over here with the kernel's existing support for NUMA
> architectures? It seems to me there are similar questions about the best
> place to put memory; perhaps we could treat multi-VM domains as
> different NUMA zones?
> 
> Finally there is the question of scaling. While mapping individual
> transactions would be painfully slow we need to think about how dynamic
> a modern system is. For example do you size your shared network region
> to cope with a