Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
On 05/09/2016, 09:40, "Alexander Graf" wrote: >On 09/04/2016 05:47 PM, Ola Liljedahl wrote: >> >> On 02/09/2016, 16:05, "Alexander Graf" wrote: >> > There is a big problem that the control handle logic is >synchronization, > but the data queue > handling logic is asynchronization. We can't combine them into one > queue. > It will decrease the performance because you need indentify each >packet > if we do this forcedly. Are you saying that control and data operations are handled by separate "blocks??? If you combined control and data queues, there would have to be a (SW) demultiplexer that would add overhead (and potentially decrease throughout) especially for the data operations? >>> Uh, the multiplexer is as simple as a switch() statement on the opcode, >>> no? >> You are assuming the backend will (always) be implemented in software. > >If you implement it in something that is not software, multiplexing >suddenly becomes a lot harder. What if you want to run 20 VMs on a >single host? Would you spawn SR-IOV devices with separate control queues >each? Or would you trap the control queue into the host and let the >guest freely access data queues which then means one guest could >interfere with another guest's data? For a backend implementation in hardware, it would of course also have to support separation and protection between clients. I haven??t tried to understand how virtio could be made to support hardware implementation of some interesting backends. I just want us to avoid making interface definitions and specification that make alternative backend implementations difficult or less efficient. In OPNFV DPACC project, there was some prototyping of virtio-crypto with HW offload and one conclusion was that the SW overhead was so high that you had to pass packets of size >1000 bytes for the HW acceleration to be worth it. (I think the comparison was with AES). >If you manage to give each queue its own stream ID, you could just pass >as many real hardware queues as you like into guests, no? > > >Alex > IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
> > > On 09/03/2016 05:18 AM, Gonglei (Arei) wrote: > > Hi, > > > >> -Original Message- > >> From: Alexander Graf [mailto:ag...@suse.de] > >> Sent: Friday, September 02, 2016 10:05 PM > >> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device > specification > >> > >> > >> > >> On 02.09.16 14:16, Ola Liljedahl wrote: > >>> > >>> On 02/09/2016, 12:26, "Gonglei (Arei)" > wrote: > >>> > > -Original Message- > > From: Alexander Graf [mailto:ag...@suse.de] > > Sent: Friday, September 02, 2016 4:07 PM > > Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device > > specification > > > > > > > > On 02.09.16 05:08, Gonglei (Arei) wrote: > >> Hi Alex, > >> > >> > >>> -Original Message- > >>> From: Alexander Graf [mailto:ag...@suse.de] > >>> Sent: Thursday, September 01, 2016 9:37 PM > >>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device > > specification > >>> On 08/30/2016 02:12 PM, Gonglei wrote: > The virtio crypto device is a virtual crypto device (ie. hardware > crypto accelerator card). The virtio crypto device can provide > five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, > >> PRIMITIVE. > In this patch, CIPHER, MAC, HASH, AEAD services are introduced. > >>> I have mostly a few high level comments. > >>> > >>> For starters, a lot of the structs rely on the compiler to pad them > > to > >>> natural alignment. That may get us into trouble when trying to > > emulate a > >>> virtio device on a host with different guest architecture (like arm > > on > >>> x86). It'd be a good idea to manually pad everything to be 64bit > > aligned > >>> - then all fields are always at the same spot. > >>> > >> Good point! I'll do this in the next version. Thanks! > >> > >>> I also have a hard time getting my head around the final flow of > >>> everything. Do I always have to create a session to be able to emit a > >>> command? In that case, doesn't that slow down everything, since a > >>> request would then need to wait for the host reply to receive its > >>> session id? There should be some way to fire off a simple non-iv > >>> operation without any session set up imho. > >>> > >> For symmetric algorithms, we'd better create a session before > > executing > > encryption > >> Or decryption, because the session usually be kept for a specific > >> algorithm with specific key in the production environment. And if we > > only > > change the iv, > >> we don't need to re-create the session. > > I think we have a slight misunderstanding here :) > > > > The current workflow is > > > >-> create session > ><- session key > >-> data in > ><- data out > >... > ><- close session > >-> ack > > > > That means that at least for the first packet you have at least one full > > round-trip cost from guest to host to guest to be able to send any data. > > > Yes, that's true... > > > That sounds pretty expensive to me on the latency side. There are > > multiple ways to mitigate that. One idea was to have a separate path in > > parallel to the create session + data + close session dance that would > > combine them all into a single command. You would still have the session > > based version, but accelerate the one-blob case. > > > > Another idea would be to make the guest be the session id janitor. Then > > you could just do > > > >-> create session with key X > >-> data in > ><- data out > >... > > > > so you save the round trip, if you combine command and data queues, as > > then the create and data bits are serialized by their position in the > > queue. > > > ... But for lots of crypto requests, the exhaust is low: > > -> create session with key X > <- session id > //do something in the guest if you like > -> data in with session_id > -> data in with session_id > -> data in with session_id > ... ...(fill out data virtqueue) > <-data out > <-data out > <-data out > > And this is what the production environment do currently. > > >> For the asymmetric algorithms, we don't need create a session IIRC. > >> > >> So, I don't think this is a performance degradation, but a performance > > enhancement. > >>> Also, I don't fully understand the split between control and data > >>> queues. As far as I read things, the control queue is used to create > >>> session ids and the data queues can then be used to push data. Is > > there > >>> any particular reason for the split? One thing that seems natural to > > me > >>> would be to have sessions be per-
Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
On 09/04/2016 05:47 PM, Ola Liljedahl wrote: > > On 02/09/2016, 16:05, "Alexander Graf" wrote: > There is a big problem that the control handle logic is synchronization, but the data queue handling logic is asynchronization. We can't combine them into one queue. It will decrease the performance because you need indentify each packet if we do this forcedly. >>> Are you saying that control and data operations are handled by separate >>> "blocksĀ²? >>> If you combined control and data queues, there would have to be a (SW) >>> demultiplexer >>> that would add overhead (and potentially decrease throughout) especially >>> for the data >>> operations? >> Uh, the multiplexer is as simple as a switch() statement on the opcode, >> no? > You are assuming the backend will (always) be implemented in software. If you implement it in something that is not software, multiplexing suddenly becomes a lot harder. What if you want to run 20 VMs on a single host? Would you spawn SR-IOV devices with separate control queues each? Or would you trap the control queue into the host and let the guest freely access data queues which then means one guest could interfere with another guest's data? If you manage to give each queue its own stream ID, you could just pass as many real hardware queues as you like into guests, no? Alex
Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
On 09/03/2016 05:18 AM, Gonglei (Arei) wrote: Hi, -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Friday, September 02, 2016 10:05 PM Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification On 02.09.16 14:16, Ola Liljedahl wrote: On 02/09/2016, 12:26, "Gonglei (Arei)" wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Friday, September 02, 2016 4:07 PM Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification On 02.09.16 05:08, Gonglei (Arei) wrote: Hi Alex, -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, September 01, 2016 9:37 PM Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification On 08/30/2016 02:12 PM, Gonglei wrote: The virtio crypto device is a virtual crypto device (ie. hardware crypto accelerator card). The virtio crypto device can provide five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE. In this patch, CIPHER, MAC, HASH, AEAD services are introduced. I have mostly a few high level comments. For starters, a lot of the structs rely on the compiler to pad them to natural alignment. That may get us into trouble when trying to emulate a virtio device on a host with different guest architecture (like arm on x86). It'd be a good idea to manually pad everything to be 64bit aligned - then all fields are always at the same spot. Good point! I'll do this in the next version. Thanks! I also have a hard time getting my head around the final flow of everything. Do I always have to create a session to be able to emit a command? In that case, doesn't that slow down everything, since a request would then need to wait for the host reply to receive its session id? There should be some way to fire off a simple non-iv operation without any session set up imho. For symmetric algorithms, we'd better create a session before executing encryption Or decryption, because the session usually be kept for a specific algorithm with specific key in the production environment. And if we only change the iv, we don't need to re-create the session. I think we have a slight misunderstanding here :) The current workflow is -> create session <- session key -> data in <- data out ... <- close session -> ack That means that at least for the first packet you have at least one full round-trip cost from guest to host to guest to be able to send any data. Yes, that's true... That sounds pretty expensive to me on the latency side. There are multiple ways to mitigate that. One idea was to have a separate path in parallel to the create session + data + close session dance that would combine them all into a single command. You would still have the session based version, but accelerate the one-blob case. Another idea would be to make the guest be the session id janitor. Then you could just do -> create session with key X -> data in <- data out ... so you save the round trip, if you combine command and data queues, as then the create and data bits are serialized by their position in the queue. ... But for lots of crypto requests, the exhaust is low: -> create session with key X <- session id //do something in the guest if you like -> data in with session_id -> data in with session_id -> data in with session_id ... ...(fill out data virtqueue) <-data out <-data out <-data out And this is what the production environment do currently. For the asymmetric algorithms, we don't need create a session IIRC. So, I don't think this is a performance degradation, but a performance enhancement. Also, I don't fully understand the split between control and data queues. As far as I read things, the control queue is used to create session ids and the data queues can then be used to push data. Is there any particular reason for the split? One thing that seems natural to me would be to have sessions be per-queue, so you would create a session on a particular queue and only have it ever be available there. That way you get rid of any locking for sessions. We want to keep a unify request type (structure) for data queue, so we can keep the session operation in the control queue. Of course the control queue only used to create sessions currently, but we can extend its functions if needed in the future. I still don't understand. With separate control+data queues you just get yourself into synchronization troubles. Both struct virtio_crypto_ctrl_header and struct virtio_crypto_op_header already have an opcode as first le32 field. You can easily use that to determine both length of the payload as well as command (control vs data). There is a big problem that the control handle logic is synchronization, but the data queue handling logic is asynchronization. We can't combine them into one queue. It will decrease the performance because you need indenti
Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
On 02/09/2016, 16:05, "Alexander Graf" wrote: >>>There is a big problem that the control handle logic is synchronization, >>>but the data queue >>>handling logic is asynchronization. We can't combine them into one >>>queue. >>>It will decrease the performance because you need indentify each packet >>>if we do this forcedly. >>Are you saying that control and data operations are handled by separate >>"blocks??? >>If you combined control and data queues, there would have to be a (SW) >>demultiplexer >>that would add overhead (and potentially decrease throughout) especially >>for the data >>operations? > >Uh, the multiplexer is as simple as a switch() statement on the opcode, >no? You are assuming the backend will (always) be implemented in software. IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
Hi Stefan, > -Original Message- > From: Stefan Hajnoczi [mailto:stefa...@redhat.com] > Sent: Friday, September 02, 2016 8:07 PM > Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device > specification > > On Tue, Aug 30, 2016 at 08:12:15PM +0800, Gonglei wrote: > > Hi, > I have read through part of the spec and suggest mostly grammar fixes > below. > Thank you very much! Forgive me for my poor English please :) I'll fix them in the following version. > > The virtio crypto device is a virtual crypto device (ie. hardware > > crypto accelerator card). The virtio crypto device can provide > > five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE. > > > > In this patch, CIPHER, MAC, HASH, AEAD services are introduced. > > > > Signed-off-by: Gonglei > > CC: Michael S. Tsirkin > > CC: Cornelia Huck > > CC: Stefan Hajnoczi > > CC: Lingli Deng > > CC: Jani Kokkonen > > CC: Ola Liljedahl > > CC: Varun Sethi > > CC: Zeng Xin > > CC: Keating Brian > > CC: Ma Liang J > > CC: Griffin John > > CC: Hanweidong > > CC: Mihai Claudiu Caraman > > --- > > content.tex | 2 + > > virtio-crypto.tex | 835 > ++ > > 2 files changed, 837 insertions(+) > > create mode 100644 virtio-crypto.tex > > > > diff --git a/content.tex b/content.tex > > index 4b45678..ab75f78 100644 > > --- a/content.tex > > +++ b/content.tex > > @@ -5750,6 +5750,8 @@ descriptor for the \field{sense_len}, > \field{residual}, > > \field{status_qualifier}, \field{status}, \field{response} and > > \field{sense} fields. > > > > +\input{virtio-crypto.tex} > > + > > \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits} > > > > Currently there are three device-independent feature bits defined: > > diff --git a/virtio-crypto.tex b/virtio-crypto.tex > > new file mode 100644 > > index 000..491fc25 > > --- /dev/null > > +++ b/virtio-crypto.tex > > @@ -0,0 +1,835 @@ > > +\section{Crypto Device}\label{sec:Device Types / Crypto Device} > > + > > +The virtio crypto device is a virtual crypto device, and is a kind of > > +virtual hardware accelerators for virtual machine. The encryption and > > s/accelerators/accelerator/ > s/virtual machine/virtual machines/ > > > +decryption requests of are placed in the data queue, and handled by the > > s/of// > > > +real crypto accelerators finally. The second queue is the control queue, > > +which is used to create or destroy session for symmetric algorithms, and > > s/session/sessions/ > > > +control some advanced features in the future. The virtio crypto > > +device can provide seven crypto services: CIPHER, MAC, HASH, AEAD, > > +KDF, ASYM, PRIMITIVE. > > + > > +\subsection{Device ID}\label{sec:Device Types / Crypto Device / Device ID} > > + > > +20 > > + > > +\subsection{Virtqueues}\label{sec:Device Types / Crypto Device / > Virtqueues} > > + > > +\begin{description} > > +\item[0] dataq1 > > +\item[\ldots] > > +\item[N-1] dataqN > > +\item[N] controlq > > +\end{description} > > + > > +N is set by \field{max_dataqueues} (\field{max_dataqueues} >= 1). > > I suggest dropping (\field{max_dataqueues} >= 1) since this constraint > already is expressed in the device normative section below. Things > can go out of sync if they are duplicated. > > > + > > +\subsection{Feature bits}\label{sec:Device Types / Crypto Device / Feature > bits} > > + None currently defined > > + > > +\subsection{Device configuration layout}\label{sec:Device Types / Crypto > Device / Device configuration layout} > > + > > +Thirteen driver-read-only configuration fields are currently defined. > > I count only 12 fields. I suggest saying "The following > driver-read-only configuration fields are currently defined:" instead. > > > +\begin{lstlisting} > > +struct virtio_crypto_config { > > +le16 status; > > +le16 max_dataqueues; > > +le32 crypto_services; > > +/* detailed algorithms mask */ > > +le32 cipher_algo_l; > > +le32 cipher_algo_h; > > +le32 hash_algo; > > +le32 mac_algo_l; > > +le32 mac_algo_h; > > +le32 asym_algo; > > +le32 kdf_algo; > > +le32 aead_algo; > > +le32 primitive_algo; > > +}; > > +\end{lstlisting} > > + > > +The first field, \field{status} is currently defined: > VIRTIO_CRYPTO_S_HW_READY. > > + > > +\begin{lstlisting} > > +#define VIRTIO_CRYPTO_S_HW_READY (1 << 0) > > +\end{lstlisting} > > + > > +The following driver-read-only field, \field{max_dataqueuess} specifies the > > +maximum number of data virtqueues (dataq1\ldots dataqN). The > crypto_services > > +shows the crypto services the virtio crypto supports. The service currently > > s/service/services/ > > > +defined are: > > + > > +\begin{lstlisting} > > +#define VIRTIO_CRYPTO_NO_SERVICE (0) /* cipher services */ > > +#define VIRTIO_CRYPTO_SERVICE_CIPHER (1) /* cipher services */ > > How are these constants used: 1 << VIRTIO_CRYPTO_NO_SERVICE? > > I suggest eliminating the 0 bit constants
Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
Hi, > -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Friday, September 02, 2016 10:05 PM > Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device > specification > > > > On 02.09.16 14:16, Ola Liljedahl wrote: > > > > > > On 02/09/2016, 12:26, "Gonglei (Arei)" wrote: > > > >> > >>> -Original Message- > >>> From: Alexander Graf [mailto:ag...@suse.de] > >>> Sent: Friday, September 02, 2016 4:07 PM > >>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device > >>> specification > >>> > >>> > >>> > >>> On 02.09.16 05:08, Gonglei (Arei) wrote: > Hi Alex, > > > > -Original Message- > > From: Alexander Graf [mailto:ag...@suse.de] > > Sent: Thursday, September 01, 2016 9:37 PM > > Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device > >>> specification > > > > On 08/30/2016 02:12 PM, Gonglei wrote: > >> The virtio crypto device is a virtual crypto device (ie. hardware > >> crypto accelerator card). The virtio crypto device can provide > >> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, > PRIMITIVE. > >> > >> In this patch, CIPHER, MAC, HASH, AEAD services are introduced. > > > > I have mostly a few high level comments. > > > > For starters, a lot of the structs rely on the compiler to pad them > >>> to > > natural alignment. That may get us into trouble when trying to > >>> emulate a > > virtio device on a host with different guest architecture (like arm > >>> on > > x86). It'd be a good idea to manually pad everything to be 64bit > >>> aligned > > - then all fields are always at the same spot. > > > Good point! I'll do this in the next version. Thanks! > > > I also have a hard time getting my head around the final flow of > > everything. Do I always have to create a session to be able to emit a > > command? In that case, doesn't that slow down everything, since a > > request would then need to wait for the host reply to receive its > > session id? There should be some way to fire off a simple non-iv > > operation without any session set up imho. > > > For symmetric algorithms, we'd better create a session before > >>> executing > >>> encryption > Or decryption, because the session usually be kept for a specific > algorithm with specific key in the production environment. And if we > >>> only > >>> change the iv, > we don't need to re-create the session. > >>> > >>> I think we have a slight misunderstanding here :) > >>> > >>> The current workflow is > >>> > >>> -> create session > >>> <- session key > >>> -> data in > >>> <- data out > >>> ... > >>> <- close session > >>> -> ack > >>> > >>> That means that at least for the first packet you have at least one full > >>> round-trip cost from guest to host to guest to be able to send any data. > >>> > >> Yes, that's true... > >> > >>> That sounds pretty expensive to me on the latency side. There are > >>> multiple ways to mitigate that. One idea was to have a separate path in > >>> parallel to the create session + data + close session dance that would > >>> combine them all into a single command. You would still have the session > >>> based version, but accelerate the one-blob case. > >>> > >>> Another idea would be to make the guest be the session id janitor. Then > >>> you could just do > >>> > >>> -> create session with key X > >>> -> data in > >>> <- data out > >>> ... > >>> > >>> so you save the round trip, if you combine command and data queues, as > >>> then the create and data bits are serialized by their position in the > >>> queue. > >>> > >> ... But for lots of crypto requests, the exhaust is low: > >> > >> -> create session with key X > >> <- session id > >>//do something in the guest if you like > >> -> data in with session_id > >> -> data in with session_id > >> -> data in with session_id > >>... ...(fill out data virtqueue) > >> <-data out > >> <-data out > >><-data out > >> > >> And this is what the production environment do currently. > >> > > For the asymmetric algorithms, we don't need create a session IIRC. > > So, I don't think this is a performance degradation, but a performance > >>> enhancement. > > > Also, I don't fully understand the split between control and data > > queues. As far as I read things, the control queue is used to create > > session ids and the data queues can then be used to push data. Is > >>> there > > any particular reason for the split? One thing that seems natural to > >>> me > > would be to have sessions be per-queue, so you would create a > >>> session on > > a particular queue and only have it ever be available there. That way > > you get rid of any locking for sessions. > > > We want to keep a unify request type (structure) for data queue, so > >>> we can >
Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
Hi, > -Original Message- > From: Ma, Liang J [mailto:liang.j...@intel.com] > Sent: Friday, September 02, 2016 9:47 PM > Subject: RE: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device > specification > > Hi Lei, > For wireless algorithm > +#define VIRTIO_CRYPTO_MAC_CMAC_KASUMI_F9 > 27 > +#define VIRTIO_CRYPTO_MAC_CMAC_SNOW3G_UIA228 > I suggest rename as > +#define VIRTIO_CRYPTO_MAC_KASUMI_F9 27 > +#define VIRTIO_CRYPTO_MAC_SNOW3G_UIA228 > Kasumi/snow3g auth mode is not belongs to CMAC family (which is > used for auth based on general block cipher.e.g AES). OK, thanks. > Regards > Liang > Regards, -Gonglei
Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
Hi Alex, > -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Friday, September 02, 2016 9:53 PM > Euler) > Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device > specification > > > > On 02.09.16 12:26, Gonglei (Arei) wrote: > > > >> -Original Message- > >> From: Alexander Graf [mailto:ag...@suse.de] > >> Sent: Friday, September 02, 2016 4:07 PM > >> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device > specification > >> > >> > >> > >> On 02.09.16 05:08, Gonglei (Arei) wrote: > >>> Hi Alex, > >>> > >>> > -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Thursday, September 01, 2016 9:37 PM > Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device > >> specification > > On 08/30/2016 02:12 PM, Gonglei wrote: > > The virtio crypto device is a virtual crypto device (ie. hardware > > crypto accelerator card). The virtio crypto device can provide > > five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE. > > > > In this patch, CIPHER, MAC, HASH, AEAD services are introduced. > > I have mostly a few high level comments. > > For starters, a lot of the structs rely on the compiler to pad them to > natural alignment. That may get us into trouble when trying to emulate a > virtio device on a host with different guest architecture (like arm on > x86). It'd be a good idea to manually pad everything to be 64bit aligned > - then all fields are always at the same spot. > > >>> Good point! I'll do this in the next version. Thanks! > >>> > I also have a hard time getting my head around the final flow of > everything. Do I always have to create a session to be able to emit a > command? In that case, doesn't that slow down everything, since a > request would then need to wait for the host reply to receive its > session id? There should be some way to fire off a simple non-iv > operation without any session set up imho. > > >>> For symmetric algorithms, we'd better create a session before executing > >> encryption > >>> Or decryption, because the session usually be kept for a specific > >>> algorithm with specific key in the production environment. And if we only > >> change the iv, > >>> we don't need to re-create the session. > >> > >> I think we have a slight misunderstanding here :) > >> > >> The current workflow is > >> > >> -> create session > >> <- session key > >> -> data in > >> <- data out > >> ... > >> <- close session > >> -> ack > >> > >> That means that at least for the first packet you have at least one full > >> round-trip cost from guest to host to guest to be able to send any data. > >> > > Yes, that's true... > > > >> That sounds pretty expensive to me on the latency side. There are > >> multiple ways to mitigate that. One idea was to have a separate path in > >> parallel to the create session + data + close session dance that would > >> combine them all into a single command. You would still have the session > >> based version, but accelerate the one-blob case. > >> > >> Another idea would be to make the guest be the session id janitor. Then > >> you could just do > >> > >> -> create session with key X > >> -> data in > >> <- data out > >> ... > >> > >> so you save the round trip, if you combine command and data queues, as > >> then the create and data bits are serialized by their position in the > >> queue. > >> > > ... But for lots of crypto requests, the exhaust is low: > > > > -> create session with key X > > <- session id > > //do something in the guest if you like > > -> data in with session_id > > -> data in with session_id > > -> data in with session_id > > ... ...(fill out data virtqueue) > > <-data out > > <-data out > > <-data out > > > > And this is what the production environment do currently. > > Does this also apply to a web server handling lots of small requests? > Does crypto acceleration even make sense in such an environment or would > it be cheaper to simply to the crypto on the CPU for such small payloads? > > I'm seriously asking :). > I think I get your points now. You mean for one-blob request (different packets have different algorithms or key), then the exhaust is too higher under current approach? That's true, because for each packet we do have the below workflow: Guest -> Host -> create session <- session id -> data in <- data out <- close session This is a big problem indeed. > > > >>> > >>> For the asymmetric algorithms, we don't need create a session IIRC. > >>> > >>> So, I don't think this is a performance degradation, but a performance > >> enhancement. > >>> > Also, I don't fully understand the split between control and data > queues. As far as I read things, the control queue is used to create > session ids and the da
Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
Hi Lei, For wireless algorithm +#define VIRTIO_CRYPTO_MAC_CMAC_KASUMI_F9 27 +#define VIRTIO_CRYPTO_MAC_CMAC_SNOW3G_UIA228 I suggest rename as +#define VIRTIO_CRYPTO_MAC_KASUMI_F9 27 +#define VIRTIO_CRYPTO_MAC_SNOW3G_UIA228 Kasumi/snow3g auth mode is not belongs to CMAC family (which is used for auth based on general block cipher.e.g AES). Regards Liang -Original Message- From: Gonglei [mailto:arei.gong...@huawei.com] Sent: Tuesday, August 30, 2016 1:12 PM To: qemu-devel@nongnu.org; virtio-...@lists.oasis-open.org Cc: peter.huangp...@huawei.com; luoneng...@huawei.com; m...@redhat.com; cornelia.h...@de.ibm.com; stefa...@redhat.com; denglin...@chinamobile.com; jani.kokko...@huawei.com; ola.liljed...@arm.com; varun.se...@freescale.com; Zeng, Xin ; Keating, Brian A ; Ma, Liang J ; Griffin, John ; hanweid...@huawei.com; weidong.hu...@huawei.com; mike.cara...@nxp.com; ag...@suse.de; claudio.font...@huawei.com; Gonglei Subject: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification The virtio crypto device is a virtual crypto device (ie. hardware crypto accelerator card). The virtio crypto device can provide five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE. In this patch, CIPHER, MAC, HASH, AEAD services are introduced. Signed-off-by: Gonglei CC: Michael S. Tsirkin CC: Cornelia Huck CC: Stefan Hajnoczi CC: Lingli Deng CC: Jani Kokkonen CC: Ola Liljedahl CC: Varun Sethi CC: Zeng Xin CC: Keating Brian CC: Ma Liang J CC: Griffin John CC: Hanweidong CC: Mihai Claudiu Caraman --- content.tex | 2 + virtio-crypto.tex | 835 ++ 2 files changed, 837 insertions(+) create mode 100644 virtio-crypto.tex diff --git a/content.tex b/content.tex index 4b45678..ab75f78 100644 --- a/content.tex +++ b/content.tex @@ -5750,6 +5750,8 @@ descriptor for the \field{sense_len}, \field{residual}, \field{status_qualifier}, \field{status}, \field{response} and \field{sense} fields. +\input{virtio-crypto.tex} + \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits} Currently there are three device-independent feature bits defined: diff --git a/virtio-crypto.tex b/virtio-crypto.tex new file mode 100644 index 000..491fc25 --- /dev/null +++ b/virtio-crypto.tex @@ -0,0 +1,835 @@ +\section{Crypto Device}\label{sec:Device Types / Crypto Device} + +The virtio crypto device is a virtual crypto device, and is a kind of +virtual hardware accelerators for virtual machine. The encryption and +decryption requests of are placed in the data queue, and handled by the +real crypto accelerators finally. The second queue is the control +queue, which is used to create or destroy session for symmetric +algorithms, and control some advanced features in the future. The +virtio crypto device can provide seven crypto services: CIPHER, MAC, +HASH, AEAD, KDF, ASYM, PRIMITIVE. + +\subsection{Device ID}\label{sec:Device Types / Crypto Device / Device +ID} + +20 + +\subsection{Virtqueues}\label{sec:Device Types / Crypto Device / +Virtqueues} + +\begin{description} +\item[0] dataq1 +\item[\ldots] +\item[N-1] dataqN +\item[N] controlq +\end{description} + +N is set by \field{max_dataqueues} (\field{max_dataqueues} >= 1). + +\subsection{Feature bits}\label{sec:Device Types / Crypto Device / +Feature bits} + None currently defined + +\subsection{Device configuration layout}\label{sec:Device Types / +Crypto Device / Device configuration layout} + +Thirteen driver-read-only configuration fields are currently defined. + +\begin{lstlisting} +struct virtio_crypto_config { +le16 status; +le16 max_dataqueues; +le32 crypto_services; +/* detailed algorithms mask */ +le32 cipher_algo_l; +le32 cipher_algo_h; +le32 hash_algo; +le32 mac_algo_l; +le32 mac_algo_h; +le32 asym_algo; +le32 kdf_algo; +le32 aead_algo; +le32 primitive_algo; +}; +\end{lstlisting} + +The first field, \field{status} is currently defined: VIRTIO_CRYPTO_S_HW_READY. + +\begin{lstlisting} +#define VIRTIO_CRYPTO_S_HW_READY (1 << 0) \end{lstlisting} + +The following driver-read-only field, \field{max_dataqueuess} specifies +the maximum number of data virtqueues (dataq1\ldots dataqN). The +crypto_services shows the crypto services the virtio crypto supports. +The service currently defined are: + +\begin{lstlisting} +#define VIRTIO_CRYPTO_NO_SERVICE (0) /* cipher services */ #define +VIRTIO_CRYPTO_SERVICE_CIPHER (1) /* cipher services */ #define +VIRTIO_CRYPTO_SERVICE_HASH (2) /* hash service */ +#define VIRTIO_CRYPTO_SERVICE_MAC (3) /* MAC (Message Authentication Codes) service */ +#define VIRTIO_CRYPTO_SERVICE_AEAD (4) /* AEAD(Authenticated +Encryption with Associated Data) service */ \end{lstlisting} + +The last driver-read-only fields specify detailed algorithms mask which +the device offered for corresponding service
Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
On 02.09.16 14:16, Ola Liljedahl wrote: > > > On 02/09/2016, 12:26, "Gonglei (Arei)" wrote: > >> >>> -Original Message- >>> From: Alexander Graf [mailto:ag...@suse.de] >>> Sent: Friday, September 02, 2016 4:07 PM >>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device >>> specification >>> >>> >>> >>> On 02.09.16 05:08, Gonglei (Arei) wrote: Hi Alex, > -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Thursday, September 01, 2016 9:37 PM > Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device >>> specification > > On 08/30/2016 02:12 PM, Gonglei wrote: >> The virtio crypto device is a virtual crypto device (ie. hardware >> crypto accelerator card). The virtio crypto device can provide >> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE. >> >> In this patch, CIPHER, MAC, HASH, AEAD services are introduced. > > I have mostly a few high level comments. > > For starters, a lot of the structs rely on the compiler to pad them >>> to > natural alignment. That may get us into trouble when trying to >>> emulate a > virtio device on a host with different guest architecture (like arm >>> on > x86). It'd be a good idea to manually pad everything to be 64bit >>> aligned > - then all fields are always at the same spot. > Good point! I'll do this in the next version. Thanks! > I also have a hard time getting my head around the final flow of > everything. Do I always have to create a session to be able to emit a > command? In that case, doesn't that slow down everything, since a > request would then need to wait for the host reply to receive its > session id? There should be some way to fire off a simple non-iv > operation without any session set up imho. > For symmetric algorithms, we'd better create a session before >>> executing >>> encryption Or decryption, because the session usually be kept for a specific algorithm with specific key in the production environment. And if we >>> only >>> change the iv, we don't need to re-create the session. >>> >>> I think we have a slight misunderstanding here :) >>> >>> The current workflow is >>> >>> -> create session >>> <- session key >>> -> data in >>> <- data out >>> ... >>> <- close session >>> -> ack >>> >>> That means that at least for the first packet you have at least one full >>> round-trip cost from guest to host to guest to be able to send any data. >>> >> Yes, that's true... >> >>> That sounds pretty expensive to me on the latency side. There are >>> multiple ways to mitigate that. One idea was to have a separate path in >>> parallel to the create session + data + close session dance that would >>> combine them all into a single command. You would still have the session >>> based version, but accelerate the one-blob case. >>> >>> Another idea would be to make the guest be the session id janitor. Then >>> you could just do >>> >>> -> create session with key X >>> -> data in >>> <- data out >>> ... >>> >>> so you save the round trip, if you combine command and data queues, as >>> then the create and data bits are serialized by their position in the >>> queue. >>> >> ... But for lots of crypto requests, the exhaust is low: >> >> -> create session with key X >> <- session id >>//do something in the guest if you like >> -> data in with session_id >> -> data in with session_id >> -> data in with session_id >>... ...(fill out data virtqueue) >> <-data out >> <-data out >><-data out >> >> And this is what the production environment do currently. >> For the asymmetric algorithms, we don't need create a session IIRC. So, I don't think this is a performance degradation, but a performance >>> enhancement. > Also, I don't fully understand the split between control and data > queues. As far as I read things, the control queue is used to create > session ids and the data queues can then be used to push data. Is >>> there > any particular reason for the split? One thing that seems natural to >>> me > would be to have sessions be per-queue, so you would create a >>> session on > a particular queue and only have it ever be available there. That way > you get rid of any locking for sessions. > We want to keep a unify request type (structure) for data queue, so >>> we can keep the session operation in the control queue. Of course the >>> control queue only used to create sessions currently, but we can extend its >>> functions if >>> needed in the future. >>> >>> I still don't understand. With separate control+data queues you just get >>> yourself into synchronization troubles. Both struct >>> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already >>> have an opcode as first le32 field. You can easily use that to determ
Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
On 02.09.16 12:26, Gonglei (Arei) wrote: > >> -Original Message- >> From: Alexander Graf [mailto:ag...@suse.de] >> Sent: Friday, September 02, 2016 4:07 PM >> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device >> specification >> >> >> >> On 02.09.16 05:08, Gonglei (Arei) wrote: >>> Hi Alex, >>> >>> -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, September 01, 2016 9:37 PM Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device >> specification On 08/30/2016 02:12 PM, Gonglei wrote: > The virtio crypto device is a virtual crypto device (ie. hardware > crypto accelerator card). The virtio crypto device can provide > five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE. > > In this patch, CIPHER, MAC, HASH, AEAD services are introduced. I have mostly a few high level comments. For starters, a lot of the structs rely on the compiler to pad them to natural alignment. That may get us into trouble when trying to emulate a virtio device on a host with different guest architecture (like arm on x86). It'd be a good idea to manually pad everything to be 64bit aligned - then all fields are always at the same spot. >>> Good point! I'll do this in the next version. Thanks! >>> I also have a hard time getting my head around the final flow of everything. Do I always have to create a session to be able to emit a command? In that case, doesn't that slow down everything, since a request would then need to wait for the host reply to receive its session id? There should be some way to fire off a simple non-iv operation without any session set up imho. >>> For symmetric algorithms, we'd better create a session before executing >> encryption >>> Or decryption, because the session usually be kept for a specific >>> algorithm with specific key in the production environment. And if we only >> change the iv, >>> we don't need to re-create the session. >> >> I think we have a slight misunderstanding here :) >> >> The current workflow is >> >> -> create session >> <- session key >> -> data in >> <- data out >> ... >> <- close session >> -> ack >> >> That means that at least for the first packet you have at least one full >> round-trip cost from guest to host to guest to be able to send any data. >> > Yes, that's true... > >> That sounds pretty expensive to me on the latency side. There are >> multiple ways to mitigate that. One idea was to have a separate path in >> parallel to the create session + data + close session dance that would >> combine them all into a single command. You would still have the session >> based version, but accelerate the one-blob case. >> >> Another idea would be to make the guest be the session id janitor. Then >> you could just do >> >> -> create session with key X >> -> data in >> <- data out >> ... >> >> so you save the round trip, if you combine command and data queues, as >> then the create and data bits are serialized by their position in the queue. >> > ... But for lots of crypto requests, the exhaust is low: > > -> create session with key X > <- session id > //do something in the guest if you like > -> data in with session_id > -> data in with session_id > -> data in with session_id > ... ...(fill out data virtqueue) > <-data out > <-data out > <-data out > > And this is what the production environment do currently. Does this also apply to a web server handling lots of small requests? Does crypto acceleration even make sense in such an environment or would it be cheaper to simply to the crypto on the CPU for such small payloads? I'm seriously asking :). > >>> >>> For the asymmetric algorithms, we don't need create a session IIRC. >>> >>> So, I don't think this is a performance degradation, but a performance >> enhancement. >>> Also, I don't fully understand the split between control and data queues. As far as I read things, the control queue is used to create session ids and the data queues can then be used to push data. Is there any particular reason for the split? One thing that seems natural to me would be to have sessions be per-queue, so you would create a session on a particular queue and only have it ever be available there. That way you get rid of any locking for sessions. >>> We want to keep a unify request type (structure) for data queue, so we can >>> keep the session operation in the control queue. Of course the control queue >>> only used to create sessions currently, but we can extend its functions if >> needed >>> in the future. >> >> I still don't understand. With separate control+data queues you just get >> yourself into synchronization troubles. Both struct >> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already >> have
Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
On Tue, Aug 30, 2016 at 08:12:15PM +0800, Gonglei wrote: Hi, I have read through part of the spec and suggest mostly grammar fixes below. > The virtio crypto device is a virtual crypto device (ie. hardware > crypto accelerator card). The virtio crypto device can provide > five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE. > > In this patch, CIPHER, MAC, HASH, AEAD services are introduced. > > Signed-off-by: Gonglei > CC: Michael S. Tsirkin > CC: Cornelia Huck > CC: Stefan Hajnoczi > CC: Lingli Deng > CC: Jani Kokkonen > CC: Ola Liljedahl > CC: Varun Sethi > CC: Zeng Xin > CC: Keating Brian > CC: Ma Liang J > CC: Griffin John > CC: Hanweidong > CC: Mihai Claudiu Caraman > --- > content.tex | 2 + > virtio-crypto.tex | 835 > ++ > 2 files changed, 837 insertions(+) > create mode 100644 virtio-crypto.tex > > diff --git a/content.tex b/content.tex > index 4b45678..ab75f78 100644 > --- a/content.tex > +++ b/content.tex > @@ -5750,6 +5750,8 @@ descriptor for the \field{sense_len}, \field{residual}, > \field{status_qualifier}, \field{status}, \field{response} and > \field{sense} fields. > > +\input{virtio-crypto.tex} > + > \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits} > > Currently there are three device-independent feature bits defined: > diff --git a/virtio-crypto.tex b/virtio-crypto.tex > new file mode 100644 > index 000..491fc25 > --- /dev/null > +++ b/virtio-crypto.tex > @@ -0,0 +1,835 @@ > +\section{Crypto Device}\label{sec:Device Types / Crypto Device} > + > +The virtio crypto device is a virtual crypto device, and is a kind of > +virtual hardware accelerators for virtual machine. The encryption and s/accelerators/accelerator/ s/virtual machine/virtual machines/ > +decryption requests of are placed in the data queue, and handled by the s/of// > +real crypto accelerators finally. The second queue is the control queue, > +which is used to create or destroy session for symmetric algorithms, and s/session/sessions/ > +control some advanced features in the future. The virtio crypto > +device can provide seven crypto services: CIPHER, MAC, HASH, AEAD, > +KDF, ASYM, PRIMITIVE. > + > +\subsection{Device ID}\label{sec:Device Types / Crypto Device / Device ID} > + > +20 > + > +\subsection{Virtqueues}\label{sec:Device Types / Crypto Device / Virtqueues} > + > +\begin{description} > +\item[0] dataq1 > +\item[\ldots] > +\item[N-1] dataqN > +\item[N] controlq > +\end{description} > + > +N is set by \field{max_dataqueues} (\field{max_dataqueues} >= 1). I suggest dropping (\field{max_dataqueues} >= 1) since this constraint already is expressed in the device normative section below. Things can go out of sync if they are duplicated. > + > +\subsection{Feature bits}\label{sec:Device Types / Crypto Device / Feature > bits} > + None currently defined > + > +\subsection{Device configuration layout}\label{sec:Device Types / Crypto > Device / Device configuration layout} > + > +Thirteen driver-read-only configuration fields are currently defined. I count only 12 fields. I suggest saying "The following driver-read-only configuration fields are currently defined:" instead. > +\begin{lstlisting} > +struct virtio_crypto_config { > +le16 status; > +le16 max_dataqueues; > +le32 crypto_services; > +/* detailed algorithms mask */ > +le32 cipher_algo_l; > +le32 cipher_algo_h; > +le32 hash_algo; > +le32 mac_algo_l; > +le32 mac_algo_h; > +le32 asym_algo; > +le32 kdf_algo; > +le32 aead_algo; > +le32 primitive_algo; > +}; > +\end{lstlisting} > + > +The first field, \field{status} is currently defined: > VIRTIO_CRYPTO_S_HW_READY. > + > +\begin{lstlisting} > +#define VIRTIO_CRYPTO_S_HW_READY (1 << 0) > +\end{lstlisting} > + > +The following driver-read-only field, \field{max_dataqueuess} specifies the > +maximum number of data virtqueues (dataq1\ldots dataqN). The crypto_services > +shows the crypto services the virtio crypto supports. The service currently s/service/services/ > +defined are: > + > +\begin{lstlisting} > +#define VIRTIO_CRYPTO_NO_SERVICE (0) /* cipher services */ > +#define VIRTIO_CRYPTO_SERVICE_CIPHER (1) /* cipher services */ How are these constants used: 1 << VIRTIO_CRYPTO_NO_SERVICE? I suggest eliminating the 0 bit constants since they can be deduced from the fact that all other bits are zeroed. There is no need for a dedicated "NO_SERVICE" constant. > +#define VIRTIO_CRYPTO_SERVICE_HASH (2) /* hash service */ > +#define VIRTIO_CRYPTO_SERVICE_MAC (3) /* MAC (Message Authentication > Codes) service */ > +#define VIRTIO_CRYPTO_SERVICE_AEAD (4) /* AEAD(Authenticated Encryption > with Associated Data) service */ > +\end{lstlisting} > + > +The last driver-read-only fields specify detailed algorithms mask which s/mask/masks/ > +the device offered for corresponding service. The below CIPHER algorithms s/the device of
Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
On 02/09/2016, 12:26, "Gonglei (Arei)" wrote: > >> -Original Message- >> From: Alexander Graf [mailto:ag...@suse.de] >> Sent: Friday, September 02, 2016 4:07 PM >> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device >>specification >> >> >> >> On 02.09.16 05:08, Gonglei (Arei) wrote: >> > Hi Alex, >> > >> > >> >> -Original Message- >> >> From: Alexander Graf [mailto:ag...@suse.de] >> >> Sent: Thursday, September 01, 2016 9:37 PM >> >> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device >> specification >> >> >> >> On 08/30/2016 02:12 PM, Gonglei wrote: >> >>> The virtio crypto device is a virtual crypto device (ie. hardware >> >>> crypto accelerator card). The virtio crypto device can provide >> >>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE. >> >>> >> >>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced. >> >> >> >> I have mostly a few high level comments. >> >> >> >> For starters, a lot of the structs rely on the compiler to pad them >>to >> >> natural alignment. That may get us into trouble when trying to >>emulate a >> >> virtio device on a host with different guest architecture (like arm >>on >> >> x86). It'd be a good idea to manually pad everything to be 64bit >>aligned >> >> - then all fields are always at the same spot. >> >> >> > Good point! I'll do this in the next version. Thanks! >> > >> >> I also have a hard time getting my head around the final flow of >> >> everything. Do I always have to create a session to be able to emit a >> >> command? In that case, doesn't that slow down everything, since a >> >> request would then need to wait for the host reply to receive its >> >> session id? There should be some way to fire off a simple non-iv >> >> operation without any session set up imho. >> >> >> > For symmetric algorithms, we'd better create a session before >>executing >> encryption >> > Or decryption, because the session usually be kept for a specific >> > algorithm with specific key in the production environment. And if we >>only >> change the iv, >> > we don't need to re-create the session. >> >> I think we have a slight misunderstanding here :) >> >> The current workflow is >> >> -> create session >> <- session key >> -> data in >> <- data out >> ... >> <- close session >> -> ack >> >> That means that at least for the first packet you have at least one full >> round-trip cost from guest to host to guest to be able to send any data. >> >Yes, that's true... > >> That sounds pretty expensive to me on the latency side. There are >> multiple ways to mitigate that. One idea was to have a separate path in >> parallel to the create session + data + close session dance that would >> combine them all into a single command. You would still have the session >> based version, but accelerate the one-blob case. >> >> Another idea would be to make the guest be the session id janitor. Then >> you could just do >> >> -> create session with key X >> -> data in >> <- data out >> ... >> >> so you save the round trip, if you combine command and data queues, as >> then the create and data bits are serialized by their position in the >>queue. >> >... But for lots of crypto requests, the exhaust is low: > >-> create session with key X ><- session id >//do something in the guest if you like > -> data in with session_id >-> data in with session_id >-> data in with session_id >... ...(fill out data virtqueue) ><-data out ><-data out ><-data out > >And this is what the production environment do currently. > >> > >> > For the asymmetric algorithms, we don't need create a session IIRC. >> > >> > So, I don't think this is a performance degradation, but a performance >> enhancement. >> > >> >> Also, I don't fully understand the split between control and data >> >> queues. As far as I read things, the control queue is used to create >> >> session ids and the data queues can then be used to push data. Is >>there >> >> any particular reason for the split? One thing that seems natural to >>me >> >> would be to have sessions be per-queue, so you would create a >>session on >> >> a particular queue and only have it ever be available there. That way >> >> you get rid of any locking for sessions. >> >> >> > We want to keep a unify request type (structure) for data queue, so >>we can >> > keep the session operation in the control queue. Of course the >>control queue >> > only used to create sessions currently, but we can extend its >>functions if >> needed >> > in the future. >> >> I still don't understand. With separate control+data queues you just get >> yourself into synchronization troubles. Both struct >> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already >> have an opcode as first le32 field. You can easily use that to determine >> both length of the payload as well as command (control vs data). >> > >There is a big problem that the control handle logic is synchronization, >but
Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
> -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Friday, September 02, 2016 4:07 PM > Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device > specification > > > > On 02.09.16 05:08, Gonglei (Arei) wrote: > > Hi Alex, > > > > > >> -Original Message- > >> From: Alexander Graf [mailto:ag...@suse.de] > >> Sent: Thursday, September 01, 2016 9:37 PM > >> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device > specification > >> > >> On 08/30/2016 02:12 PM, Gonglei wrote: > >>> The virtio crypto device is a virtual crypto device (ie. hardware > >>> crypto accelerator card). The virtio crypto device can provide > >>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE. > >>> > >>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced. > >> > >> I have mostly a few high level comments. > >> > >> For starters, a lot of the structs rely on the compiler to pad them to > >> natural alignment. That may get us into trouble when trying to emulate a > >> virtio device on a host with different guest architecture (like arm on > >> x86). It'd be a good idea to manually pad everything to be 64bit aligned > >> - then all fields are always at the same spot. > >> > > Good point! I'll do this in the next version. Thanks! > > > >> I also have a hard time getting my head around the final flow of > >> everything. Do I always have to create a session to be able to emit a > >> command? In that case, doesn't that slow down everything, since a > >> request would then need to wait for the host reply to receive its > >> session id? There should be some way to fire off a simple non-iv > >> operation without any session set up imho. > >> > > For symmetric algorithms, we'd better create a session before executing > encryption > > Or decryption, because the session usually be kept for a specific > > algorithm with specific key in the production environment. And if we only > change the iv, > > we don't need to re-create the session. > > I think we have a slight misunderstanding here :) > > The current workflow is > > -> create session > <- session key > -> data in > <- data out > ... > <- close session > -> ack > > That means that at least for the first packet you have at least one full > round-trip cost from guest to host to guest to be able to send any data. > Yes, that's true... > That sounds pretty expensive to me on the latency side. There are > multiple ways to mitigate that. One idea was to have a separate path in > parallel to the create session + data + close session dance that would > combine them all into a single command. You would still have the session > based version, but accelerate the one-blob case. > > Another idea would be to make the guest be the session id janitor. Then > you could just do > > -> create session with key X > -> data in > <- data out > ... > > so you save the round trip, if you combine command and data queues, as > then the create and data bits are serialized by their position in the queue. > ... But for lots of crypto requests, the exhaust is low: -> create session with key X <- session id //do something in the guest if you like -> data in with session_id -> data in with session_id -> data in with session_id ... ...(fill out data virtqueue) <-data out <-data out <-data out And this is what the production environment do currently. > > > > For the asymmetric algorithms, we don't need create a session IIRC. > > > > So, I don't think this is a performance degradation, but a performance > enhancement. > > > >> Also, I don't fully understand the split between control and data > >> queues. As far as I read things, the control queue is used to create > >> session ids and the data queues can then be used to push data. Is there > >> any particular reason for the split? One thing that seems natural to me > >> would be to have sessions be per-queue, so you would create a session on > >> a particular queue and only have it ever be available there. That way > >> you get rid of any locking for sessions. > >> > > We want to keep a unify request type (structure) for data queue, so we can > > keep the session operation in the control queue. Of course the control queue > > only used to create sessions currently, but we can extend its functions if > needed > > in the future. > > I still don't understand. With separate control+data queues you just get > yourself into synchronization troubles. Both struct > virtio_crypto_ctrl_header and struct virtio_crypto_op_header already > have an opcode as first le32 field. You can easily use that to determine > both length of the payload as well as command (control vs data). > There is a big problem that the control handle logic is synchronization, but the data queue handling logic is asynchronization. We can't combine them into one queue. It will decrease the performance because y
Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
On 02.09.16 05:08, Gonglei (Arei) wrote: > Hi Alex, > > >> -Original Message- >> From: Alexander Graf [mailto:ag...@suse.de] >> Sent: Thursday, September 01, 2016 9:37 PM >> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device >> specification >> >> On 08/30/2016 02:12 PM, Gonglei wrote: >>> The virtio crypto device is a virtual crypto device (ie. hardware >>> crypto accelerator card). The virtio crypto device can provide >>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE. >>> >>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced. >> >> I have mostly a few high level comments. >> >> For starters, a lot of the structs rely on the compiler to pad them to >> natural alignment. That may get us into trouble when trying to emulate a >> virtio device on a host with different guest architecture (like arm on >> x86). It'd be a good idea to manually pad everything to be 64bit aligned >> - then all fields are always at the same spot. >> > Good point! I'll do this in the next version. Thanks! > >> I also have a hard time getting my head around the final flow of >> everything. Do I always have to create a session to be able to emit a >> command? In that case, doesn't that slow down everything, since a >> request would then need to wait for the host reply to receive its >> session id? There should be some way to fire off a simple non-iv >> operation without any session set up imho. >> > For symmetric algorithms, we'd better create a session before executing > encryption > Or decryption, because the session usually be kept for a specific > algorithm with specific key in the production environment. And if we only > change the iv, > we don't need to re-create the session. I think we have a slight misunderstanding here :) The current workflow is -> create session <- session key -> data in <- data out ... <- close session -> ack That means that at least for the first packet you have at least one full round-trip cost from guest to host to guest to be able to send any data. That sounds pretty expensive to me on the latency side. There are multiple ways to mitigate that. One idea was to have a separate path in parallel to the create session + data + close session dance that would combine them all into a single command. You would still have the session based version, but accelerate the one-blob case. Another idea would be to make the guest be the session id janitor. Then you could just do -> create session with key X -> data in <- data out ... so you save the round trip, if you combine command and data queues, as then the create and data bits are serialized by their position in the queue. > > For the asymmetric algorithms, we don't need create a session IIRC. > > So, I don't think this is a performance degradation, but a performance > enhancement. > >> Also, I don't fully understand the split between control and data >> queues. As far as I read things, the control queue is used to create >> session ids and the data queues can then be used to push data. Is there >> any particular reason for the split? One thing that seems natural to me >> would be to have sessions be per-queue, so you would create a session on >> a particular queue and only have it ever be available there. That way >> you get rid of any locking for sessions. >> > We want to keep a unify request type (structure) for data queue, so we can > keep the session operation in the control queue. Of course the control queue > only used to create sessions currently, but we can extend its functions if > needed > in the future. I still don't understand. With separate control+data queues you just get yourself into synchronization troubles. Both struct virtio_crypto_ctrl_header and struct virtio_crypto_op_header already have an opcode as first le32 field. You can easily use that to determine both length of the payload as well as command (control vs data). You could then also completely get rid of the "queue_id" fields, as any operation would only ever operate on the queue it's running on. Alex
Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
Hi Alex, > -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Thursday, September 01, 2016 9:37 PM > Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device > specification > > On 08/30/2016 02:12 PM, Gonglei wrote: > > The virtio crypto device is a virtual crypto device (ie. hardware > > crypto accelerator card). The virtio crypto device can provide > > five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE. > > > > In this patch, CIPHER, MAC, HASH, AEAD services are introduced. > > I have mostly a few high level comments. > > For starters, a lot of the structs rely on the compiler to pad them to > natural alignment. That may get us into trouble when trying to emulate a > virtio device on a host with different guest architecture (like arm on > x86). It'd be a good idea to manually pad everything to be 64bit aligned > - then all fields are always at the same spot. > Good point! I'll do this in the next version. Thanks! > I also have a hard time getting my head around the final flow of > everything. Do I always have to create a session to be able to emit a > command? In that case, doesn't that slow down everything, since a > request would then need to wait for the host reply to receive its > session id? There should be some way to fire off a simple non-iv > operation without any session set up imho. > For symmetric algorithms, we'd better create a session before executing encryption Or decryption, because the session usually be kept for a specific algorithm with specific key in the production environment. And if we only change the iv, we don't need to re-create the session. For the asymmetric algorithms, we don't need create a session IIRC. So, I don't think this is a performance degradation, but a performance enhancement. > Also, I don't fully understand the split between control and data > queues. As far as I read things, the control queue is used to create > session ids and the data queues can then be used to push data. Is there > any particular reason for the split? One thing that seems natural to me > would be to have sessions be per-queue, so you would create a session on > a particular queue and only have it ever be available there. That way > you get rid of any locking for sessions. > We want to keep a unify request type (structure) for data queue, so we can keep the session operation in the control queue. Of course the control queue only used to create sessions currently, but we can extend its functions if needed in the future. > > Alex Regards, -Gonglei
Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
On 08/30/2016 02:12 PM, Gonglei wrote: The virtio crypto device is a virtual crypto device (ie. hardware crypto accelerator card). The virtio crypto device can provide five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE. In this patch, CIPHER, MAC, HASH, AEAD services are introduced. I have mostly a few high level comments. For starters, a lot of the structs rely on the compiler to pad them to natural alignment. That may get us into trouble when trying to emulate a virtio device on a host with different guest architecture (like arm on x86). It'd be a good idea to manually pad everything to be 64bit aligned - then all fields are always at the same spot. I also have a hard time getting my head around the final flow of everything. Do I always have to create a session to be able to emit a command? In that case, doesn't that slow down everything, since a request would then need to wait for the host reply to receive its session id? There should be some way to fire off a simple non-iv operation without any session set up imho. Also, I don't fully understand the split between control and data queues. As far as I read things, the control queue is used to create session ids and the data queues can then be used to push data. Is there any particular reason for the split? One thing that seems natural to me would be to have sessions be per-queue, so you would create a session on a particular queue and only have it ever be available there. That way you get rid of any locking for sessions. Alex
[Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
The virtio crypto device is a virtual crypto device (ie. hardware crypto accelerator card). The virtio crypto device can provide five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE. In this patch, CIPHER, MAC, HASH, AEAD services are introduced. Signed-off-by: Gonglei CC: Michael S. Tsirkin CC: Cornelia Huck CC: Stefan Hajnoczi CC: Lingli Deng CC: Jani Kokkonen CC: Ola Liljedahl CC: Varun Sethi CC: Zeng Xin CC: Keating Brian CC: Ma Liang J CC: Griffin John CC: Hanweidong CC: Mihai Claudiu Caraman --- content.tex | 2 + virtio-crypto.tex | 835 ++ 2 files changed, 837 insertions(+) create mode 100644 virtio-crypto.tex diff --git a/content.tex b/content.tex index 4b45678..ab75f78 100644 --- a/content.tex +++ b/content.tex @@ -5750,6 +5750,8 @@ descriptor for the \field{sense_len}, \field{residual}, \field{status_qualifier}, \field{status}, \field{response} and \field{sense} fields. +\input{virtio-crypto.tex} + \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits} Currently there are three device-independent feature bits defined: diff --git a/virtio-crypto.tex b/virtio-crypto.tex new file mode 100644 index 000..491fc25 --- /dev/null +++ b/virtio-crypto.tex @@ -0,0 +1,835 @@ +\section{Crypto Device}\label{sec:Device Types / Crypto Device} + +The virtio crypto device is a virtual crypto device, and is a kind of +virtual hardware accelerators for virtual machine. The encryption and +decryption requests of are placed in the data queue, and handled by the +real crypto accelerators finally. The second queue is the control queue, +which is used to create or destroy session for symmetric algorithms, and +control some advanced features in the future. The virtio crypto +device can provide seven crypto services: CIPHER, MAC, HASH, AEAD, +KDF, ASYM, PRIMITIVE. + +\subsection{Device ID}\label{sec:Device Types / Crypto Device / Device ID} + +20 + +\subsection{Virtqueues}\label{sec:Device Types / Crypto Device / Virtqueues} + +\begin{description} +\item[0] dataq1 +\item[\ldots] +\item[N-1] dataqN +\item[N] controlq +\end{description} + +N is set by \field{max_dataqueues} (\field{max_dataqueues} >= 1). + +\subsection{Feature bits}\label{sec:Device Types / Crypto Device / Feature bits} + None currently defined + +\subsection{Device configuration layout}\label{sec:Device Types / Crypto Device / Device configuration layout} + +Thirteen driver-read-only configuration fields are currently defined. + +\begin{lstlisting} +struct virtio_crypto_config { +le16 status; +le16 max_dataqueues; +le32 crypto_services; +/* detailed algorithms mask */ +le32 cipher_algo_l; +le32 cipher_algo_h; +le32 hash_algo; +le32 mac_algo_l; +le32 mac_algo_h; +le32 asym_algo; +le32 kdf_algo; +le32 aead_algo; +le32 primitive_algo; +}; +\end{lstlisting} + +The first field, \field{status} is currently defined: VIRTIO_CRYPTO_S_HW_READY. + +\begin{lstlisting} +#define VIRTIO_CRYPTO_S_HW_READY (1 << 0) +\end{lstlisting} + +The following driver-read-only field, \field{max_dataqueuess} specifies the +maximum number of data virtqueues (dataq1\ldots dataqN). The crypto_services +shows the crypto services the virtio crypto supports. The service currently +defined are: + +\begin{lstlisting} +#define VIRTIO_CRYPTO_NO_SERVICE (0) /* cipher services */ +#define VIRTIO_CRYPTO_SERVICE_CIPHER (1) /* cipher services */ +#define VIRTIO_CRYPTO_SERVICE_HASH (2) /* hash service */ +#define VIRTIO_CRYPTO_SERVICE_MAC (3) /* MAC (Message Authentication Codes) service */ +#define VIRTIO_CRYPTO_SERVICE_AEAD (4) /* AEAD(Authenticated Encryption with Associated Data) service */ +\end{lstlisting} + +The last driver-read-only fields specify detailed algorithms mask which +the device offered for corresponding service. The below CIPHER algorithms +are defined currently: + +\begin{lstlisting} +#define VIRTIO_CRYPTO_NO_CIPHER 0 +#define VIRTIO_CRYPTO_CIPHER_ARC4 1 +#define VIRTIO_CRYPTO_CIPHER_AES_ECB2 +#define VIRTIO_CRYPTO_CIPHER_AES_CBC3 +#define VIRTIO_CRYPTO_CIPHER_AES_CTR4 +#define VIRTIO_CRYPTO_CIPHER_DES_ECB5 +#define VIRTIO_CRYPTO_CIPHER_DES_CBC6 +#define VIRTIO_CRYPTO_CIPHER_3DES_ECB 7 +#define VIRTIO_CRYPTO_CIPHER_3DES_CBC 8 +#define VIRTIO_CRYPTO_CIPHER_3DES_CTR 9 +#define VIRTIO_CRYPTO_CIPHER_KASUMI_F8 10 +#define VIRTIO_CRYPTO_CIPHER_SNOW3G_UEA211 +#define VIRTIO_CRYPTO_CIPHER_AES_F8 12 +#define VIRTIO_CRYPTO_CIPHER_AES_XTS13 +#define VIRTIO_CRYPTO_CIPHER_ZUC_EEA3 14 +\end{lstlisting} + +The below HASH algorithms are defined currently: + +\begin{lstlisting} +#define VIRTIO_CRYPTO_NO_HASH0 +#define VIRTIO_CRYPTO_HASH_MD5 1 +#define VIRTIO_CRYPTO_HASH_SHA1 2 +#define VIRTIO_CRYPTO_HASH_SHA_224 3