Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-09-05 Thread Ola Liljedahl


On 05/09/2016, 09:40, "Alexander Graf"  wrote:

>On 09/04/2016 05:47 PM, Ola Liljedahl wrote:
>>
>> On 02/09/2016, 16:05, "Alexander Graf"  wrote:
>>
> There is a big problem that the control handle logic is
>synchronization,
> but the data queue
> handling logic is asynchronization. We can't combine them into one
> queue.
> It will decrease the performance because you need indentify each
>packet
> if we do this forcedly.
 Are you saying that control and data operations are handled by
separate
 "blocks???
 If you combined control and data queues, there would have to be a (SW)
 demultiplexer
 that would add overhead (and potentially decrease throughout)
especially
 for the data
 operations?
>>> Uh, the multiplexer is as simple as a switch() statement on the opcode,
>>> no?
>> You are assuming the backend will (always) be implemented in software.
>
>If you implement it in something that is not software, multiplexing
>suddenly becomes a lot harder. What if you want to run 20 VMs on a
>single host? Would you spawn SR-IOV devices with separate control queues
>each? Or would you trap the control queue into the host and let the
>guest freely access data queues which then means one guest could
>interfere with another guest's data?
For a backend implementation in hardware, it would of course also have to
support separation and protection between clients.

I haven??t tried to understand how virtio could be made to support hardware
implementation of some interesting backends. I just want us to avoid making
interface definitions and specification that make alternative backend
implementations difficult or less efficient.

In OPNFV DPACC project, there was some prototyping of virtio-crypto with HW
offload and one conclusion was that the SW overhead was so high that you
had
to pass packets of size >1000 bytes for the HW acceleration to be worth it.
(I think the comparison was with AES).

>If you manage to give each queue its own stream ID, you could just pass
>as many real hardware queues as you like into guests, no?
>
>
>Alex
>

IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.


Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-09-05 Thread Gonglei (Arei)

>
> 
> On 09/03/2016 05:18 AM, Gonglei (Arei) wrote:
> > Hi,
> >
> >> -Original Message-
> >> From: Alexander Graf [mailto:ag...@suse.de]
> >> Sent: Friday, September 02, 2016 10:05 PM
> >> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> specification
> >>
> >>
> >>
> >> On 02.09.16 14:16, Ola Liljedahl wrote:
> >>>
> >>> On 02/09/2016, 12:26, "Gonglei (Arei)" 
> wrote:
> >>>
> > -Original Message-
> > From: Alexander Graf [mailto:ag...@suse.de]
> > Sent: Friday, September 02, 2016 4:07 PM
> > Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> > specification
> >
> >
> >
> > On 02.09.16 05:08, Gonglei (Arei) wrote:
> >> Hi Alex,
> >>
> >>
> >>> -Original Message-
> >>> From: Alexander Graf [mailto:ag...@suse.de]
> >>> Sent: Thursday, September 01, 2016 9:37 PM
> >>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> > specification
> >>> On 08/30/2016 02:12 PM, Gonglei wrote:
>  The virtio crypto device is a virtual crypto device (ie. hardware
>  crypto accelerator card). The virtio crypto device can provide
>  five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM,
> >> PRIMITIVE.
>  In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
> >>> I have mostly a few high level comments.
> >>>
> >>> For starters, a lot of the structs rely on the compiler to pad them
> > to
> >>> natural alignment. That may get us into trouble when trying to
> > emulate a
> >>> virtio device on a host with different guest architecture (like arm
> > on
> >>> x86). It'd be a good idea to manually pad everything to be 64bit
> > aligned
> >>> - then all fields are always at the same spot.
> >>>
> >> Good point! I'll do this in the next version. Thanks!
> >>
> >>> I also have a hard time getting my head around the final flow of
> >>> everything. Do I always have to create a session to be able to emit a
> >>> command? In that case, doesn't that slow down everything, since a
> >>> request would then need to wait for the host reply to receive its
> >>> session id? There should be some way to fire off a simple non-iv
> >>> operation without any session set up imho.
> >>>
> >> For symmetric algorithms, we'd better create a session before
> > executing
> > encryption
> >> Or decryption, because the session usually be kept for a specific
> >> algorithm with specific key in the production environment. And if we
> > only
> > change the iv,
> >> we don't need to re-create the session.
> > I think we have a slight misunderstanding here :)
> >
> > The current workflow is
> >
> >-> create session
> ><- session key
> >-> data in
> ><- data out
> >...
> ><- close session
> >-> ack
> >
> > That means that at least for the first packet you have at least one full
> > round-trip cost from guest to host to guest to be able to send any data.
> >
>  Yes, that's true...
> 
> > That sounds pretty expensive to me on the latency side. There are
> > multiple ways to mitigate that. One idea was to have a separate path in
> > parallel to the create session + data + close session dance that would
> > combine them all into a single command. You would still have the session
> > based version, but accelerate the one-blob case.
> >
> > Another idea would be to make the guest be the session id janitor. Then
> > you could just do
> >
> >-> create session with key X
> >-> data in
> ><- data out
> >...
> >
> > so you save the round trip, if you combine command and data queues, as
> > then the create and data bits are serialized by their position in the
> > queue.
> >
>  ... But for lots of crypto requests, the exhaust is low:
> 
>  -> create session with key X
>  <- session id
>  //do something in the guest if you like
>  -> data in with session_id
>  -> data in with session_id
>  -> data in with session_id
>  ... ...(fill out data virtqueue)
>  <-data out
>  <-data out
>  <-data out
> 
>  And this is what the production environment do currently.
> 
> >> For the asymmetric algorithms, we don't need create a session IIRC.
> >>
> >> So, I don't think this is a performance degradation, but a performance
> > enhancement.
> >>> Also, I don't fully understand the split between control and data
> >>> queues. As far as I read things, the control queue is used to create
> >>> session ids and the data queues can then be used to push data. Is
> > there
> >>> any particular reason for the split? One thing that seems natural to
> > me
> >>> would be to have sessions be per-

Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-09-05 Thread Alexander Graf
On 09/04/2016 05:47 PM, Ola Liljedahl wrote:
>
> On 02/09/2016, 16:05, "Alexander Graf"  wrote:
>
 There is a big problem that the control handle logic is synchronization,
 but the data queue
 handling logic is asynchronization. We can't combine them into one
 queue.
 It will decrease the performance because you need indentify each packet
 if we do this forcedly.
>>> Are you saying that control and data operations are handled by separate
>>> "blocksĀ²?
>>> If you combined control and data queues, there would have to be a (SW)
>>> demultiplexer
>>> that would add overhead (and potentially decrease throughout) especially
>>> for the data
>>> operations?
>> Uh, the multiplexer is as simple as a switch() statement on the opcode,
>> no?
> You are assuming the backend will (always) be implemented in software.

If you implement it in something that is not software, multiplexing
suddenly becomes a lot harder. What if you want to run 20 VMs on a
single host? Would you spawn SR-IOV devices with separate control queues
each? Or would you trap the control queue into the host and let the
guest freely access data queues which then means one guest could
interfere with another guest's data?

If you manage to give each queue its own stream ID, you could just pass
as many real hardware queues as you like into guests, no?


Alex




Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-09-05 Thread Alexander Graf

On 09/03/2016 05:18 AM, Gonglei (Arei) wrote:

Hi,


-Original Message-
From: Alexander Graf [mailto:ag...@suse.de]
Sent: Friday, September 02, 2016 10:05 PM
Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device 
specification



On 02.09.16 14:16, Ola Liljedahl wrote:


On 02/09/2016, 12:26, "Gonglei (Arei)"  wrote:


-Original Message-
From: Alexander Graf [mailto:ag...@suse.de]
Sent: Friday, September 02, 2016 4:07 PM
Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
specification



On 02.09.16 05:08, Gonglei (Arei) wrote:

Hi Alex,



-Original Message-
From: Alexander Graf [mailto:ag...@suse.de]
Sent: Thursday, September 01, 2016 9:37 PM
Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device

specification

On 08/30/2016 02:12 PM, Gonglei wrote:

The virtio crypto device is a virtual crypto device (ie. hardware
crypto accelerator card). The virtio crypto device can provide
five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM,

PRIMITIVE.

In this patch, CIPHER, MAC, HASH, AEAD services are introduced.

I have mostly a few high level comments.

For starters, a lot of the structs rely on the compiler to pad them

to

natural alignment. That may get us into trouble when trying to

emulate a

virtio device on a host with different guest architecture (like arm

on

x86). It'd be a good idea to manually pad everything to be 64bit

aligned

- then all fields are always at the same spot.


Good point! I'll do this in the next version. Thanks!


I also have a hard time getting my head around the final flow of
everything. Do I always have to create a session to be able to emit a
command? In that case, doesn't that slow down everything, since a
request would then need to wait for the host reply to receive its
session id? There should be some way to fire off a simple non-iv
operation without any session set up imho.


For symmetric algorithms, we'd better create a session before

executing
encryption

Or decryption, because the session usually be kept for a specific
algorithm with specific key in the production environment. And if we

only
change the iv,

we don't need to re-create the session.

I think we have a slight misunderstanding here :)

The current workflow is

   -> create session
   <- session key
   -> data in
   <- data out
   ...
   <- close session
   -> ack

That means that at least for the first packet you have at least one full
round-trip cost from guest to host to guest to be able to send any data.


Yes, that's true...


That sounds pretty expensive to me on the latency side. There are
multiple ways to mitigate that. One idea was to have a separate path in
parallel to the create session + data + close session dance that would
combine them all into a single command. You would still have the session
based version, but accelerate the one-blob case.

Another idea would be to make the guest be the session id janitor. Then
you could just do

   -> create session with key X
   -> data in
   <- data out
   ...

so you save the round trip, if you combine command and data queues, as
then the create and data bits are serialized by their position in the
queue.


... But for lots of crypto requests, the exhaust is low:

-> create session with key X
<- session id
//do something in the guest if you like
-> data in with session_id
-> data in with session_id
-> data in with session_id
... ...(fill out data virtqueue)
<-data out
<-data out
<-data out

And this is what the production environment do currently.


For the asymmetric algorithms, we don't need create a session IIRC.

So, I don't think this is a performance degradation, but a performance

enhancement.

Also, I don't fully understand the split between control and data
queues. As far as I read things, the control queue is used to create
session ids and the data queues can then be used to push data. Is

there

any particular reason for the split? One thing that seems natural to

me

would be to have sessions be per-queue, so you would create a

session on

a particular queue and only have it ever be available there. That way
you get rid of any locking for sessions.


We want to keep a unify request type (structure) for data queue, so

we can

keep the session operation in the control queue. Of course the

control queue

only used to create sessions currently, but we can extend its

functions if
needed

in the future.

I still don't understand. With separate control+data queues you just get
yourself into synchronization troubles. Both struct
virtio_crypto_ctrl_header and struct virtio_crypto_op_header already
have an opcode as first le32 field. You can easily use that to determine
both length of the payload as well as command (control vs data).


There is a big problem that the control handle logic is synchronization,
but the data queue
handling logic is asynchronization. We can't combine them into one queue.
It will decrease the performance because you need indenti

Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-09-04 Thread Ola Liljedahl


On 02/09/2016, 16:05, "Alexander Graf"  wrote:

>>>There is a big problem that the control handle logic is synchronization,
>>>but the data queue
>>>handling logic is asynchronization. We can't combine them into one
>>>queue.
>>>It will decrease the performance because you need indentify each packet
>>>if we do this forcedly.
>>Are you saying that control and data operations are handled by separate
>>"blocks???
>>If you combined control and data queues, there would have to be a (SW)
>>demultiplexer
>>that would add overhead (and potentially decrease throughout) especially
>>for the data
>>operations?
>
>Uh, the multiplexer is as simple as a switch() statement on the opcode,
>no?
You are assuming the backend will (always) be implemented in software.

IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.


Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-09-02 Thread Gonglei (Arei)
Hi Stefan,


> -Original Message-
> From: Stefan Hajnoczi [mailto:stefa...@redhat.com]
> Sent: Friday, September 02, 2016 8:07 PM
> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device 
> specification
> 
> On Tue, Aug 30, 2016 at 08:12:15PM +0800, Gonglei wrote:
> 
> Hi,
> I have read through part of the spec and suggest mostly grammar fixes
> below.
> 
Thank you very much! Forgive me for my poor English please :)
I'll fix them in the following version.

> > The virtio crypto device is a virtual crypto device (ie. hardware
> > crypto accelerator card). The virtio crypto device can provide
> > five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
> >
> > In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
> >
> > Signed-off-by: Gonglei 
> > CC: Michael S. Tsirkin 
> > CC: Cornelia Huck 
> > CC: Stefan Hajnoczi 
> > CC: Lingli Deng 
> > CC: Jani Kokkonen 
> > CC: Ola Liljedahl 
> > CC: Varun Sethi 
> > CC: Zeng Xin 
> > CC: Keating Brian 
> > CC: Ma Liang J 
> > CC: Griffin John 
> > CC: Hanweidong 
> > CC: Mihai Claudiu Caraman 
> > ---
> >  content.tex   |   2 +
> >  virtio-crypto.tex | 835
> ++
> >  2 files changed, 837 insertions(+)
> >  create mode 100644 virtio-crypto.tex
> >
> > diff --git a/content.tex b/content.tex
> > index 4b45678..ab75f78 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -5750,6 +5750,8 @@ descriptor for the \field{sense_len},
> \field{residual},
> >  \field{status_qualifier}, \field{status}, \field{response} and
> >  \field{sense} fields.
> >
> > +\input{virtio-crypto.tex}
> > +
> >  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >
> >  Currently there are three device-independent feature bits defined:
> > diff --git a/virtio-crypto.tex b/virtio-crypto.tex
> > new file mode 100644
> > index 000..491fc25
> > --- /dev/null
> > +++ b/virtio-crypto.tex
> > @@ -0,0 +1,835 @@
> > +\section{Crypto Device}\label{sec:Device Types / Crypto Device}
> > +
> > +The virtio crypto device is a virtual crypto device, and is a kind of
> > +virtual hardware accelerators for virtual machine.  The encryption and
> 
> s/accelerators/accelerator/
> s/virtual machine/virtual machines/
> 
> > +decryption requests of are placed in the data queue, and handled by the
> 
> s/of//
> 
> > +real crypto accelerators finally. The second queue is the control queue,
> > +which is used to create or destroy session for symmetric algorithms, and
> 
> s/session/sessions/
> 
> > +control some advanced features in the future. The virtio crypto
> > +device can provide seven crypto services: CIPHER, MAC, HASH, AEAD,
> > +KDF, ASYM, PRIMITIVE.
> > +
> > +\subsection{Device ID}\label{sec:Device Types / Crypto Device / Device ID}
> > +
> > +20
> > +
> > +\subsection{Virtqueues}\label{sec:Device Types / Crypto Device /
> Virtqueues}
> > +
> > +\begin{description}
> > +\item[0] dataq1
> > +\item[\ldots]
> > +\item[N-1] dataqN
> > +\item[N] controlq
> > +\end{description}
> > +
> > +N is set by \field{max_dataqueues} (\field{max_dataqueues} >= 1).
> 
> I suggest dropping (\field{max_dataqueues} >= 1) since this constraint
> already is expressed in the device normative section below.  Things
> can go out of sync if they are duplicated.
> 
> > +
> > +\subsection{Feature bits}\label{sec:Device Types / Crypto Device / Feature
> bits}
> > +  None currently defined
> > +
> > +\subsection{Device configuration layout}\label{sec:Device Types / Crypto
> Device / Device configuration layout}
> > +
> > +Thirteen driver-read-only configuration fields are currently defined.
> 
> I count only 12 fields.  I suggest saying "The following
> driver-read-only configuration fields are currently defined:" instead.
> 
> > +\begin{lstlisting}
> > +struct virtio_crypto_config {
> > +le16  status;
> > +le16  max_dataqueues;
> > +le32  crypto_services;
> > +/* detailed algorithms mask */
> > +le32 cipher_algo_l;
> > +le32 cipher_algo_h;
> > +le32 hash_algo;
> > +le32 mac_algo_l;
> > +le32 mac_algo_h;
> > +le32 asym_algo;
> > +le32 kdf_algo;
> > +le32 aead_algo;
> > +le32 primitive_algo;
> > +};
> > +\end{lstlisting}
> > +
> > +The first field, \field{status} is currently defined:
> VIRTIO_CRYPTO_S_HW_READY.
> > +
> > +\begin{lstlisting}
> > +#define VIRTIO_CRYPTO_S_HW_READY  (1 << 0)
> > +\end{lstlisting}
> > +
> > +The following driver-read-only field, \field{max_dataqueuess} specifies the
> > +maximum number of data virtqueues (dataq1\ldots dataqN). The
> crypto_services
> > +shows the crypto services the virtio crypto supports. The service currently
> 
> s/service/services/
> 
> > +defined are:
> > +
> > +\begin{lstlisting}
> > +#define VIRTIO_CRYPTO_NO_SERVICE (0) /* cipher services */
> > +#define VIRTIO_CRYPTO_SERVICE_CIPHER (1) /* cipher services */
> 
> How are these constants used: 1 << VIRTIO_CRYPTO_NO_SERVICE?
> 
> I suggest eliminating the 0 bit constants

Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-09-02 Thread Gonglei (Arei)
Hi,

> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Friday, September 02, 2016 10:05 PM
> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device 
> specification
> 
> 
> 
> On 02.09.16 14:16, Ola Liljedahl wrote:
> >
> >
> > On 02/09/2016, 12:26, "Gonglei (Arei)"  wrote:
> >
> >>
> >>> -Original Message-
> >>> From: Alexander Graf [mailto:ag...@suse.de]
> >>> Sent: Friday, September 02, 2016 4:07 PM
> >>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> >>> specification
> >>>
> >>>
> >>>
> >>> On 02.09.16 05:08, Gonglei (Arei) wrote:
>  Hi Alex,
> 
> 
> > -Original Message-
> > From: Alexander Graf [mailto:ag...@suse.de]
> > Sent: Thursday, September 01, 2016 9:37 PM
> > Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> >>> specification
> >
> > On 08/30/2016 02:12 PM, Gonglei wrote:
> >> The virtio crypto device is a virtual crypto device (ie. hardware
> >> crypto accelerator card). The virtio crypto device can provide
> >> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM,
> PRIMITIVE.
> >>
> >> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
> >
> > I have mostly a few high level comments.
> >
> > For starters, a lot of the structs rely on the compiler to pad them
> >>> to
> > natural alignment. That may get us into trouble when trying to
> >>> emulate a
> > virtio device on a host with different guest architecture (like arm
> >>> on
> > x86). It'd be a good idea to manually pad everything to be 64bit
> >>> aligned
> > - then all fields are always at the same spot.
> >
>  Good point! I'll do this in the next version. Thanks!
> 
> > I also have a hard time getting my head around the final flow of
> > everything. Do I always have to create a session to be able to emit a
> > command? In that case, doesn't that slow down everything, since a
> > request would then need to wait for the host reply to receive its
> > session id? There should be some way to fire off a simple non-iv
> > operation without any session set up imho.
> >
>  For symmetric algorithms, we'd better create a session before
> >>> executing
> >>> encryption
>  Or decryption, because the session usually be kept for a specific
>  algorithm with specific key in the production environment. And if we
> >>> only
> >>> change the iv,
>  we don't need to re-create the session.
> >>>
> >>> I think we have a slight misunderstanding here :)
> >>>
> >>> The current workflow is
> >>>
> >>>   -> create session
> >>>   <- session key
> >>>   -> data in
> >>>   <- data out
> >>>   ...
> >>>   <- close session
> >>>   -> ack
> >>>
> >>> That means that at least for the first packet you have at least one full
> >>> round-trip cost from guest to host to guest to be able to send any data.
> >>>
> >> Yes, that's true...
> >>
> >>> That sounds pretty expensive to me on the latency side. There are
> >>> multiple ways to mitigate that. One idea was to have a separate path in
> >>> parallel to the create session + data + close session dance that would
> >>> combine them all into a single command. You would still have the session
> >>> based version, but accelerate the one-blob case.
> >>>
> >>> Another idea would be to make the guest be the session id janitor. Then
> >>> you could just do
> >>>
> >>>   -> create session with key X
> >>>   -> data in
> >>>   <- data out
> >>>   ...
> >>>
> >>> so you save the round trip, if you combine command and data queues, as
> >>> then the create and data bits are serialized by their position in the
> >>> queue.
> >>>
> >> ... But for lots of crypto requests, the exhaust is low:
> >>
> >> -> create session with key X
> >> <- session id
> >>//do something in the guest if you like
> >> -> data in with session_id
> >> -> data in with session_id
> >> -> data in with session_id
> >>... ...(fill out data virtqueue)
> >> <-data out
> >> <-data out
> >><-data out
> >>
> >> And this is what the production environment do currently.
> >>
> 
>  For the asymmetric algorithms, we don't need create a session IIRC.
> 
>  So, I don't think this is a performance degradation, but a performance
> >>> enhancement.
> 
> > Also, I don't fully understand the split between control and data
> > queues. As far as I read things, the control queue is used to create
> > session ids and the data queues can then be used to push data. Is
> >>> there
> > any particular reason for the split? One thing that seems natural to
> >>> me
> > would be to have sessions be per-queue, so you would create a
> >>> session on
> > a particular queue and only have it ever be available there. That way
> > you get rid of any locking for sessions.
> >
>  We want to keep a unify request type (structure) for data queue, so
> >>> we can
>  

Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-09-02 Thread Gonglei (Arei)
Hi,

> -Original Message-
> From: Ma, Liang J [mailto:liang.j...@intel.com]
> Sent: Friday, September 02, 2016 9:47 PM
> Subject: RE: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device 
> specification
> 
> Hi  Lei,
>  For wireless algorithm
>  +#define VIRTIO_CRYPTO_MAC_CMAC_KASUMI_F9
> 27
>  +#define VIRTIO_CRYPTO_MAC_CMAC_SNOW3G_UIA228
>   I suggest rename as
>  +#define VIRTIO_CRYPTO_MAC_KASUMI_F9  27
>  +#define VIRTIO_CRYPTO_MAC_SNOW3G_UIA228
>  Kasumi/snow3g  auth mode is not belongs to CMAC family (which is
> used for auth based on general block cipher.e.g AES).

OK, thanks.

> Regards
> Liang
> 

Regards,
-Gonglei



Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-09-02 Thread Gonglei (Arei)
Hi Alex,

> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Friday, September 02, 2016 9:53 PM
> Euler)
> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device 
> specification
> 
> 
> 
> On 02.09.16 12:26, Gonglei (Arei) wrote:
> >
> >> -Original Message-
> >> From: Alexander Graf [mailto:ag...@suse.de]
> >> Sent: Friday, September 02, 2016 4:07 PM
> >> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> specification
> >>
> >>
> >>
> >> On 02.09.16 05:08, Gonglei (Arei) wrote:
> >>> Hi Alex,
> >>>
> >>>
>  -Original Message-
>  From: Alexander Graf [mailto:ag...@suse.de]
>  Sent: Thursday, September 01, 2016 9:37 PM
>  Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> >> specification
> 
>  On 08/30/2016 02:12 PM, Gonglei wrote:
> > The virtio crypto device is a virtual crypto device (ie. hardware
> > crypto accelerator card). The virtio crypto device can provide
> > five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
> >
> > In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
> 
>  I have mostly a few high level comments.
> 
>  For starters, a lot of the structs rely on the compiler to pad them to
>  natural alignment. That may get us into trouble when trying to emulate a
>  virtio device on a host with different guest architecture (like arm on
>  x86). It'd be a good idea to manually pad everything to be 64bit aligned
>  - then all fields are always at the same spot.
> 
> >>> Good point! I'll do this in the next version. Thanks!
> >>>
>  I also have a hard time getting my head around the final flow of
>  everything. Do I always have to create a session to be able to emit a
>  command? In that case, doesn't that slow down everything, since a
>  request would then need to wait for the host reply to receive its
>  session id? There should be some way to fire off a simple non-iv
>  operation without any session set up imho.
> 
> >>> For symmetric algorithms, we'd better create a session before executing
> >> encryption
> >>> Or decryption, because the session usually be kept for a specific
> >>> algorithm with specific key in the production environment. And if we only
> >> change the iv,
> >>> we don't need to re-create the session.
> >>
> >> I think we have a slight misunderstanding here :)
> >>
> >> The current workflow is
> >>
> >>   -> create session
> >>   <- session key
> >>   -> data in
> >>   <- data out
> >>   ...
> >>   <- close session
> >>   -> ack
> >>
> >> That means that at least for the first packet you have at least one full
> >> round-trip cost from guest to host to guest to be able to send any data.
> >>
> > Yes, that's true...
> >
> >> That sounds pretty expensive to me on the latency side. There are
> >> multiple ways to mitigate that. One idea was to have a separate path in
> >> parallel to the create session + data + close session dance that would
> >> combine them all into a single command. You would still have the session
> >> based version, but accelerate the one-blob case.
> >>
> >> Another idea would be to make the guest be the session id janitor. Then
> >> you could just do
> >>
> >>   -> create session with key X
> >>   -> data in
> >>   <- data out
> >>   ...
> >>
> >> so you save the round trip, if you combine command and data queues, as
> >> then the create and data bits are serialized by their position in the 
> >> queue.
> >>
> > ... But for lots of crypto requests, the exhaust is low:
> >
> > -> create session with key X
> > <- session id
> > //do something in the guest if you like
> > -> data in with session_id
> > -> data in with session_id
> > -> data in with session_id
> > ... ...(fill out data virtqueue)
> > <-data out
> > <-data out
> > <-data out
> >
> > And this is what the production environment do currently.
> 
> Does this also apply to a web server handling lots of small requests?
> Does crypto acceleration even make sense in such an environment or would
> it be cheaper to simply to the crypto on the CPU for such small payloads?
> 
> I'm seriously asking :).
> 
I think I get your points now. You mean for one-blob request (different packets 
have different algorithms or key), 
then the exhaust is too higher under current approach? That's true, because for 
each packet
we do have the below workflow:

 Guest -> Host

 -> create session
 <- session id
 -> data in
 <- data out
 <- close session

This is a big problem indeed.

> >
> >>>
> >>> For the asymmetric algorithms, we don't need create a session IIRC.
> >>>
> >>> So, I don't think this is a performance degradation, but a performance
> >> enhancement.
> >>>
>  Also, I don't fully understand the split between control and data
>  queues. As far as I read things, the control queue is used to create
>  session ids and the da

Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-09-02 Thread Ma, Liang J
Hi  Lei,
 For wireless algorithm 
 +#define VIRTIO_CRYPTO_MAC_CMAC_KASUMI_F9   27
 +#define VIRTIO_CRYPTO_MAC_CMAC_SNOW3G_UIA228
  I suggest rename as
 +#define VIRTIO_CRYPTO_MAC_KASUMI_F9  27
 +#define VIRTIO_CRYPTO_MAC_SNOW3G_UIA228
 Kasumi/snow3g  auth mode is not belongs to CMAC family (which is used
for auth based on general block cipher.e.g AES).
Regards
Liang

-Original Message-
From: Gonglei [mailto:arei.gong...@huawei.com] 
Sent: Tuesday, August 30, 2016 1:12 PM
To: qemu-devel@nongnu.org; virtio-...@lists.oasis-open.org
Cc: peter.huangp...@huawei.com; luoneng...@huawei.com; m...@redhat.com;
cornelia.h...@de.ibm.com; stefa...@redhat.com; denglin...@chinamobile.com;
jani.kokko...@huawei.com; ola.liljed...@arm.com; varun.se...@freescale.com;
Zeng, Xin ; Keating, Brian A
; Ma, Liang J ; Griffin,
John ; hanweid...@huawei.com;
weidong.hu...@huawei.com; mike.cara...@nxp.com; ag...@suse.de;
claudio.font...@huawei.com; Gonglei 
Subject: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
specification

The virtio crypto device is a virtual crypto device (ie. hardware crypto
accelerator card). The virtio crypto device can provide five crypto
services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.

In this patch, CIPHER, MAC, HASH, AEAD services are introduced.

Signed-off-by: Gonglei 
CC: Michael S. Tsirkin 
CC: Cornelia Huck 
CC: Stefan Hajnoczi 
CC: Lingli Deng 
CC: Jani Kokkonen 
CC: Ola Liljedahl 
CC: Varun Sethi 
CC: Zeng Xin 
CC: Keating Brian 
CC: Ma Liang J 
CC: Griffin John 
CC: Hanweidong 
CC: Mihai Claudiu Caraman 
---
 content.tex   |   2 +
 virtio-crypto.tex | 835
++
 2 files changed, 837 insertions(+)
 create mode 100644 virtio-crypto.tex

diff --git a/content.tex b/content.tex
index 4b45678..ab75f78 100644
--- a/content.tex
+++ b/content.tex
@@ -5750,6 +5750,8 @@ descriptor for the \field{sense_len},
\field{residual},  \field{status_qualifier}, \field{status},
\field{response} and  \field{sense} fields.
 
+\input{virtio-crypto.tex}
+
 \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 
 Currently there are three device-independent feature bits defined:
diff --git a/virtio-crypto.tex b/virtio-crypto.tex new file mode 100644
index 000..491fc25
--- /dev/null
+++ b/virtio-crypto.tex
@@ -0,0 +1,835 @@
+\section{Crypto Device}\label{sec:Device Types / Crypto Device}
+
+The virtio crypto device is a virtual crypto device, and is a kind of 
+virtual hardware accelerators for virtual machine.  The encryption and 
+decryption requests of are placed in the data queue, and handled by the 
+real crypto accelerators finally. The second queue is the control 
+queue, which is used to create or destroy session for symmetric 
+algorithms, and control some advanced features in the future. The 
+virtio crypto device can provide seven crypto services: CIPHER, MAC, 
+HASH, AEAD, KDF, ASYM, PRIMITIVE.
+
+\subsection{Device ID}\label{sec:Device Types / Crypto Device / Device 
+ID}
+
+20
+
+\subsection{Virtqueues}\label{sec:Device Types / Crypto Device / 
+Virtqueues}
+
+\begin{description}
+\item[0] dataq1
+\item[\ldots]
+\item[N-1] dataqN
+\item[N] controlq
+\end{description}
+
+N is set by \field{max_dataqueues} (\field{max_dataqueues} >= 1).
+
+\subsection{Feature bits}\label{sec:Device Types / Crypto Device / 
+Feature bits}
+  None currently defined
+
+\subsection{Device configuration layout}\label{sec:Device Types / 
+Crypto Device / Device configuration layout}
+
+Thirteen driver-read-only configuration fields are currently defined.
+
+\begin{lstlisting}
+struct virtio_crypto_config {
+le16  status;
+le16  max_dataqueues;
+le32  crypto_services;
+/* detailed algorithms mask */
+le32 cipher_algo_l;
+le32 cipher_algo_h;
+le32 hash_algo;
+le32 mac_algo_l;
+le32 mac_algo_h;
+le32 asym_algo;
+le32 kdf_algo;
+le32 aead_algo;
+le32 primitive_algo;
+};
+\end{lstlisting}
+
+The first field, \field{status} is currently defined:
VIRTIO_CRYPTO_S_HW_READY.
+
+\begin{lstlisting}
+#define VIRTIO_CRYPTO_S_HW_READY  (1 << 0) \end{lstlisting}
+
+The following driver-read-only field, \field{max_dataqueuess} specifies 
+the maximum number of data virtqueues (dataq1\ldots dataqN). The 
+crypto_services shows the crypto services the virtio crypto supports. 
+The service currently defined are:
+
+\begin{lstlisting}
+#define VIRTIO_CRYPTO_NO_SERVICE (0) /* cipher services */ #define 
+VIRTIO_CRYPTO_SERVICE_CIPHER (1) /* cipher services */ #define 
+VIRTIO_CRYPTO_SERVICE_HASH  (2) /* hash service */
+#define VIRTIO_CRYPTO_SERVICE_MAC   (3) /* MAC (Message Authentication
Codes) service */
+#define VIRTIO_CRYPTO_SERVICE_AEAD  (4) /* AEAD(Authenticated 
+Encryption with Associated Data) service */ \end{lstlisting}
+
+The last driver-read-only fields specify detailed algorithms mask which 
+the device offered for corresponding service

Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-09-02 Thread Alexander Graf


On 02.09.16 14:16, Ola Liljedahl wrote:
> 
> 
> On 02/09/2016, 12:26, "Gonglei (Arei)"  wrote:
> 
>>
>>> -Original Message-
>>> From: Alexander Graf [mailto:ag...@suse.de]
>>> Sent: Friday, September 02, 2016 4:07 PM
>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
>>> specification
>>>
>>>
>>>
>>> On 02.09.16 05:08, Gonglei (Arei) wrote:
 Hi Alex,


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, September 01, 2016 9:37 PM
> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
>>> specification
>
> On 08/30/2016 02:12 PM, Gonglei wrote:
>> The virtio crypto device is a virtual crypto device (ie. hardware
>> crypto accelerator card). The virtio crypto device can provide
>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
>>
>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
>
> I have mostly a few high level comments.
>
> For starters, a lot of the structs rely on the compiler to pad them
>>> to
> natural alignment. That may get us into trouble when trying to
>>> emulate a
> virtio device on a host with different guest architecture (like arm
>>> on
> x86). It'd be a good idea to manually pad everything to be 64bit
>>> aligned
> - then all fields are always at the same spot.
>
 Good point! I'll do this in the next version. Thanks!

> I also have a hard time getting my head around the final flow of
> everything. Do I always have to create a session to be able to emit a
> command? In that case, doesn't that slow down everything, since a
> request would then need to wait for the host reply to receive its
> session id? There should be some way to fire off a simple non-iv
> operation without any session set up imho.
>
 For symmetric algorithms, we'd better create a session before
>>> executing
>>> encryption
 Or decryption, because the session usually be kept for a specific
 algorithm with specific key in the production environment. And if we
>>> only
>>> change the iv,
 we don't need to re-create the session.
>>>
>>> I think we have a slight misunderstanding here :)
>>>
>>> The current workflow is
>>>
>>>   -> create session
>>>   <- session key
>>>   -> data in
>>>   <- data out
>>>   ...
>>>   <- close session
>>>   -> ack
>>>
>>> That means that at least for the first packet you have at least one full
>>> round-trip cost from guest to host to guest to be able to send any data.
>>>
>> Yes, that's true...
>>
>>> That sounds pretty expensive to me on the latency side. There are
>>> multiple ways to mitigate that. One idea was to have a separate path in
>>> parallel to the create session + data + close session dance that would
>>> combine them all into a single command. You would still have the session
>>> based version, but accelerate the one-blob case.
>>>
>>> Another idea would be to make the guest be the session id janitor. Then
>>> you could just do
>>>
>>>   -> create session with key X
>>>   -> data in
>>>   <- data out
>>>   ...
>>>
>>> so you save the round trip, if you combine command and data queues, as
>>> then the create and data bits are serialized by their position in the
>>> queue.
>>>
>> ... But for lots of crypto requests, the exhaust is low:
>>
>> -> create session with key X
>> <- session id
>>//do something in the guest if you like
>> -> data in with session_id
>> -> data in with session_id
>> -> data in with session_id
>>... ...(fill out data virtqueue)
>> <-data out
>> <-data out
>><-data out
>>
>> And this is what the production environment do currently.
>>

 For the asymmetric algorithms, we don't need create a session IIRC.

 So, I don't think this is a performance degradation, but a performance
>>> enhancement.

> Also, I don't fully understand the split between control and data
> queues. As far as I read things, the control queue is used to create
> session ids and the data queues can then be used to push data. Is
>>> there
> any particular reason for the split? One thing that seems natural to
>>> me
> would be to have sessions be per-queue, so you would create a
>>> session on
> a particular queue and only have it ever be available there. That way
> you get rid of any locking for sessions.
>
 We want to keep a unify request type (structure) for data queue, so
>>> we can
 keep the session operation in the control queue. Of course the
>>> control queue
 only used to create sessions currently, but we can extend its
>>> functions if
>>> needed
 in the future.
>>>
>>> I still don't understand. With separate control+data queues you just get
>>> yourself into synchronization troubles. Both struct
>>> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already
>>> have an opcode as first le32 field. You can easily use that to determ

Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-09-02 Thread Alexander Graf


On 02.09.16 12:26, Gonglei (Arei) wrote:
> 
>> -Original Message-
>> From: Alexander Graf [mailto:ag...@suse.de]
>> Sent: Friday, September 02, 2016 4:07 PM
>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device 
>> specification
>>
>>
>>
>> On 02.09.16 05:08, Gonglei (Arei) wrote:
>>> Hi Alex,
>>>
>>>
 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Thursday, September 01, 2016 9:37 PM
 Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
>> specification

 On 08/30/2016 02:12 PM, Gonglei wrote:
> The virtio crypto device is a virtual crypto device (ie. hardware
> crypto accelerator card). The virtio crypto device can provide
> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
>
> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.

 I have mostly a few high level comments.

 For starters, a lot of the structs rely on the compiler to pad them to
 natural alignment. That may get us into trouble when trying to emulate a
 virtio device on a host with different guest architecture (like arm on
 x86). It'd be a good idea to manually pad everything to be 64bit aligned
 - then all fields are always at the same spot.

>>> Good point! I'll do this in the next version. Thanks!
>>>
 I also have a hard time getting my head around the final flow of
 everything. Do I always have to create a session to be able to emit a
 command? In that case, doesn't that slow down everything, since a
 request would then need to wait for the host reply to receive its
 session id? There should be some way to fire off a simple non-iv
 operation without any session set up imho.

>>> For symmetric algorithms, we'd better create a session before executing
>> encryption
>>> Or decryption, because the session usually be kept for a specific
>>> algorithm with specific key in the production environment. And if we only
>> change the iv,
>>> we don't need to re-create the session.
>>
>> I think we have a slight misunderstanding here :)
>>
>> The current workflow is
>>
>>   -> create session
>>   <- session key
>>   -> data in
>>   <- data out
>>   ...
>>   <- close session
>>   -> ack
>>
>> That means that at least for the first packet you have at least one full
>> round-trip cost from guest to host to guest to be able to send any data.
>>
> Yes, that's true... 
> 
>> That sounds pretty expensive to me on the latency side. There are
>> multiple ways to mitigate that. One idea was to have a separate path in
>> parallel to the create session + data + close session dance that would
>> combine them all into a single command. You would still have the session
>> based version, but accelerate the one-blob case.
>>
>> Another idea would be to make the guest be the session id janitor. Then
>> you could just do
>>
>>   -> create session with key X
>>   -> data in
>>   <- data out
>>   ...
>>
>> so you save the round trip, if you combine command and data queues, as
>> then the create and data bits are serialized by their position in the queue.
>>
> ... But for lots of crypto requests, the exhaust is low:
> 
>   -> create session with key X
>   <- session id
> //do something in the guest if you like
>   -> data in with session_id
>   -> data in with session_id
>   -> data in with session_id
> ... ...(fill out data virtqueue)
>   <-data out
>   <-data out
> <-data out
> 
> And this is what the production environment do currently.

Does this also apply to a web server handling lots of small requests?
Does crypto acceleration even make sense in such an environment or would
it be cheaper to simply to the crypto on the CPU for such small payloads?

I'm seriously asking :).

> 
>>>
>>> For the asymmetric algorithms, we don't need create a session IIRC.
>>>
>>> So, I don't think this is a performance degradation, but a performance
>> enhancement.
>>>
 Also, I don't fully understand the split between control and data
 queues. As far as I read things, the control queue is used to create
 session ids and the data queues can then be used to push data. Is there
 any particular reason for the split? One thing that seems natural to me
 would be to have sessions be per-queue, so you would create a session on
 a particular queue and only have it ever be available there. That way
 you get rid of any locking for sessions.

>>> We want to keep a unify request type (structure) for data queue, so we can
>>> keep the session operation in the control queue. Of course the control queue
>>> only used to create sessions currently, but we can extend its functions if
>> needed
>>> in the future.
>>
>> I still don't understand. With separate control+data queues you just get
>> yourself into synchronization troubles. Both struct
>> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already
>> have

Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-09-02 Thread Stefan Hajnoczi
On Tue, Aug 30, 2016 at 08:12:15PM +0800, Gonglei wrote:

Hi,
I have read through part of the spec and suggest mostly grammar fixes
below.

> The virtio crypto device is a virtual crypto device (ie. hardware
> crypto accelerator card). The virtio crypto device can provide
> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
> 
> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
> 
> Signed-off-by: Gonglei 
> CC: Michael S. Tsirkin 
> CC: Cornelia Huck 
> CC: Stefan Hajnoczi 
> CC: Lingli Deng 
> CC: Jani Kokkonen 
> CC: Ola Liljedahl 
> CC: Varun Sethi 
> CC: Zeng Xin 
> CC: Keating Brian 
> CC: Ma Liang J 
> CC: Griffin John 
> CC: Hanweidong 
> CC: Mihai Claudiu Caraman 
> ---
>  content.tex   |   2 +
>  virtio-crypto.tex | 835 
> ++
>  2 files changed, 837 insertions(+)
>  create mode 100644 virtio-crypto.tex
> 
> diff --git a/content.tex b/content.tex
> index 4b45678..ab75f78 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -5750,6 +5750,8 @@ descriptor for the \field{sense_len}, \field{residual},
>  \field{status_qualifier}, \field{status}, \field{response} and
>  \field{sense} fields.
>  
> +\input{virtio-crypto.tex}
> +
>  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  
>  Currently there are three device-independent feature bits defined:
> diff --git a/virtio-crypto.tex b/virtio-crypto.tex
> new file mode 100644
> index 000..491fc25
> --- /dev/null
> +++ b/virtio-crypto.tex
> @@ -0,0 +1,835 @@
> +\section{Crypto Device}\label{sec:Device Types / Crypto Device}
> +
> +The virtio crypto device is a virtual crypto device, and is a kind of
> +virtual hardware accelerators for virtual machine.  The encryption and

s/accelerators/accelerator/
s/virtual machine/virtual machines/

> +decryption requests of are placed in the data queue, and handled by the

s/of//

> +real crypto accelerators finally. The second queue is the control queue,
> +which is used to create or destroy session for symmetric algorithms, and

s/session/sessions/

> +control some advanced features in the future. The virtio crypto
> +device can provide seven crypto services: CIPHER, MAC, HASH, AEAD,
> +KDF, ASYM, PRIMITIVE.
> +
> +\subsection{Device ID}\label{sec:Device Types / Crypto Device / Device ID}
> +
> +20
> +
> +\subsection{Virtqueues}\label{sec:Device Types / Crypto Device / Virtqueues}
> +
> +\begin{description}
> +\item[0] dataq1
> +\item[\ldots]
> +\item[N-1] dataqN
> +\item[N] controlq
> +\end{description}
> +
> +N is set by \field{max_dataqueues} (\field{max_dataqueues} >= 1).

I suggest dropping (\field{max_dataqueues} >= 1) since this constraint
already is expressed in the device normative section below.  Things
can go out of sync if they are duplicated.

> +
> +\subsection{Feature bits}\label{sec:Device Types / Crypto Device / Feature 
> bits}
> +  None currently defined
> +
> +\subsection{Device configuration layout}\label{sec:Device Types / Crypto 
> Device / Device configuration layout}
> +
> +Thirteen driver-read-only configuration fields are currently defined.

I count only 12 fields.  I suggest saying "The following
driver-read-only configuration fields are currently defined:" instead.

> +\begin{lstlisting}
> +struct virtio_crypto_config {
> +le16  status;
> +le16  max_dataqueues;
> +le32  crypto_services;
> +/* detailed algorithms mask */
> +le32 cipher_algo_l;
> +le32 cipher_algo_h;
> +le32 hash_algo;
> +le32 mac_algo_l;
> +le32 mac_algo_h;
> +le32 asym_algo;
> +le32 kdf_algo;
> +le32 aead_algo;
> +le32 primitive_algo;
> +};
> +\end{lstlisting}
> +
> +The first field, \field{status} is currently defined: 
> VIRTIO_CRYPTO_S_HW_READY.
> +
> +\begin{lstlisting}
> +#define VIRTIO_CRYPTO_S_HW_READY  (1 << 0)
> +\end{lstlisting}
> +
> +The following driver-read-only field, \field{max_dataqueuess} specifies the
> +maximum number of data virtqueues (dataq1\ldots dataqN). The crypto_services
> +shows the crypto services the virtio crypto supports. The service currently

s/service/services/

> +defined are:
> +
> +\begin{lstlisting}
> +#define VIRTIO_CRYPTO_NO_SERVICE (0) /* cipher services */
> +#define VIRTIO_CRYPTO_SERVICE_CIPHER (1) /* cipher services */

How are these constants used: 1 << VIRTIO_CRYPTO_NO_SERVICE?

I suggest eliminating the 0 bit constants since they can be deduced from
the fact that all other bits are zeroed.  There is no need for a
dedicated "NO_SERVICE" constant.

> +#define VIRTIO_CRYPTO_SERVICE_HASH  (2) /* hash service */
> +#define VIRTIO_CRYPTO_SERVICE_MAC   (3) /* MAC (Message Authentication 
> Codes) service */
> +#define VIRTIO_CRYPTO_SERVICE_AEAD  (4) /* AEAD(Authenticated Encryption 
> with Associated Data) service */
> +\end{lstlisting}
> +
> +The last driver-read-only fields specify detailed algorithms mask which

s/mask/masks/

> +the device offered for corresponding service. The below CIPHER algorithms

s/the device of

Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-09-02 Thread Ola Liljedahl


On 02/09/2016, 12:26, "Gonglei (Arei)"  wrote:

>
>> -Original Message-
>> From: Alexander Graf [mailto:ag...@suse.de]
>> Sent: Friday, September 02, 2016 4:07 PM
>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
>>specification
>>
>>
>>
>> On 02.09.16 05:08, Gonglei (Arei) wrote:
>> > Hi Alex,
>> >
>> >
>> >> -Original Message-
>> >> From: Alexander Graf [mailto:ag...@suse.de]
>> >> Sent: Thursday, September 01, 2016 9:37 PM
>> >> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
>> specification
>> >>
>> >> On 08/30/2016 02:12 PM, Gonglei wrote:
>> >>> The virtio crypto device is a virtual crypto device (ie. hardware
>> >>> crypto accelerator card). The virtio crypto device can provide
>> >>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
>> >>>
>> >>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
>> >>
>> >> I have mostly a few high level comments.
>> >>
>> >> For starters, a lot of the structs rely on the compiler to pad them
>>to
>> >> natural alignment. That may get us into trouble when trying to
>>emulate a
>> >> virtio device on a host with different guest architecture (like arm
>>on
>> >> x86). It'd be a good idea to manually pad everything to be 64bit
>>aligned
>> >> - then all fields are always at the same spot.
>> >>
>> > Good point! I'll do this in the next version. Thanks!
>> >
>> >> I also have a hard time getting my head around the final flow of
>> >> everything. Do I always have to create a session to be able to emit a
>> >> command? In that case, doesn't that slow down everything, since a
>> >> request would then need to wait for the host reply to receive its
>> >> session id? There should be some way to fire off a simple non-iv
>> >> operation without any session set up imho.
>> >>
>> > For symmetric algorithms, we'd better create a session before
>>executing
>> encryption
>> > Or decryption, because the session usually be kept for a specific
>> > algorithm with specific key in the production environment. And if we
>>only
>> change the iv,
>> > we don't need to re-create the session.
>>
>> I think we have a slight misunderstanding here :)
>>
>> The current workflow is
>>
>>   -> create session
>>   <- session key
>>   -> data in
>>   <- data out
>>   ...
>>   <- close session
>>   -> ack
>>
>> That means that at least for the first packet you have at least one full
>> round-trip cost from guest to host to guest to be able to send any data.
>>
>Yes, that's true...
>
>> That sounds pretty expensive to me on the latency side. There are
>> multiple ways to mitigate that. One idea was to have a separate path in
>> parallel to the create session + data + close session dance that would
>> combine them all into a single command. You would still have the session
>> based version, but accelerate the one-blob case.
>>
>> Another idea would be to make the guest be the session id janitor. Then
>> you could just do
>>
>>   -> create session with key X
>>   -> data in
>>   <- data out
>>   ...
>>
>> so you save the round trip, if you combine command and data queues, as
>> then the create and data bits are serialized by their position in the
>>queue.
>>
>... But for lots of crypto requests, the exhaust is low:
>
>-> create session with key X
><- session id
>//do something in the guest if you like
> -> data in with session_id
>-> data in with session_id
>-> data in with session_id
>... ...(fill out data virtqueue)
><-data out
><-data out
><-data out
>
>And this is what the production environment do currently.
>
>> >
>> > For the asymmetric algorithms, we don't need create a session IIRC.
>> >
>> > So, I don't think this is a performance degradation, but a performance
>> enhancement.
>> >
>> >> Also, I don't fully understand the split between control and data
>> >> queues. As far as I read things, the control queue is used to create
>> >> session ids and the data queues can then be used to push data. Is
>>there
>> >> any particular reason for the split? One thing that seems natural to
>>me
>> >> would be to have sessions be per-queue, so you would create a
>>session on
>> >> a particular queue and only have it ever be available there. That way
>> >> you get rid of any locking for sessions.
>> >>
>> > We want to keep a unify request type (structure) for data queue, so
>>we can
>> > keep the session operation in the control queue. Of course the
>>control queue
>> > only used to create sessions currently, but we can extend its
>>functions if
>> needed
>> > in the future.
>>
>> I still don't understand. With separate control+data queues you just get
>> yourself into synchronization troubles. Both struct
>> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already
>> have an opcode as first le32 field. You can easily use that to determine
>> both length of the payload as well as command (control vs data).
>>
>
>There is a big problem that the control handle logic is synchronization,
>but 

Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-09-02 Thread Gonglei (Arei)

> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Friday, September 02, 2016 4:07 PM
> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device 
> specification
> 
> 
> 
> On 02.09.16 05:08, Gonglei (Arei) wrote:
> > Hi Alex,
> >
> >
> >> -Original Message-
> >> From: Alexander Graf [mailto:ag...@suse.de]
> >> Sent: Thursday, September 01, 2016 9:37 PM
> >> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> specification
> >>
> >> On 08/30/2016 02:12 PM, Gonglei wrote:
> >>> The virtio crypto device is a virtual crypto device (ie. hardware
> >>> crypto accelerator card). The virtio crypto device can provide
> >>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
> >>>
> >>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
> >>
> >> I have mostly a few high level comments.
> >>
> >> For starters, a lot of the structs rely on the compiler to pad them to
> >> natural alignment. That may get us into trouble when trying to emulate a
> >> virtio device on a host with different guest architecture (like arm on
> >> x86). It'd be a good idea to manually pad everything to be 64bit aligned
> >> - then all fields are always at the same spot.
> >>
> > Good point! I'll do this in the next version. Thanks!
> >
> >> I also have a hard time getting my head around the final flow of
> >> everything. Do I always have to create a session to be able to emit a
> >> command? In that case, doesn't that slow down everything, since a
> >> request would then need to wait for the host reply to receive its
> >> session id? There should be some way to fire off a simple non-iv
> >> operation without any session set up imho.
> >>
> > For symmetric algorithms, we'd better create a session before executing
> encryption
> > Or decryption, because the session usually be kept for a specific
> > algorithm with specific key in the production environment. And if we only
> change the iv,
> > we don't need to re-create the session.
> 
> I think we have a slight misunderstanding here :)
> 
> The current workflow is
> 
>   -> create session
>   <- session key
>   -> data in
>   <- data out
>   ...
>   <- close session
>   -> ack
> 
> That means that at least for the first packet you have at least one full
> round-trip cost from guest to host to guest to be able to send any data.
> 
Yes, that's true... 

> That sounds pretty expensive to me on the latency side. There are
> multiple ways to mitigate that. One idea was to have a separate path in
> parallel to the create session + data + close session dance that would
> combine them all into a single command. You would still have the session
> based version, but accelerate the one-blob case.
> 
> Another idea would be to make the guest be the session id janitor. Then
> you could just do
> 
>   -> create session with key X
>   -> data in
>   <- data out
>   ...
> 
> so you save the round trip, if you combine command and data queues, as
> then the create and data bits are serialized by their position in the queue.
> 
... But for lots of crypto requests, the exhaust is low:

-> create session with key X
<- session id
//do something in the guest if you like
-> data in with session_id
-> data in with session_id
-> data in with session_id
... ...(fill out data virtqueue)
<-data out
<-data out
<-data out

And this is what the production environment do currently.

> >
> > For the asymmetric algorithms, we don't need create a session IIRC.
> >
> > So, I don't think this is a performance degradation, but a performance
> enhancement.
> >
> >> Also, I don't fully understand the split between control and data
> >> queues. As far as I read things, the control queue is used to create
> >> session ids and the data queues can then be used to push data. Is there
> >> any particular reason for the split? One thing that seems natural to me
> >> would be to have sessions be per-queue, so you would create a session on
> >> a particular queue and only have it ever be available there. That way
> >> you get rid of any locking for sessions.
> >>
> > We want to keep a unify request type (structure) for data queue, so we can
> > keep the session operation in the control queue. Of course the control queue
> > only used to create sessions currently, but we can extend its functions if
> needed
> > in the future.
> 
> I still don't understand. With separate control+data queues you just get
> yourself into synchronization troubles. Both struct
> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already
> have an opcode as first le32 field. You can easily use that to determine
> both length of the payload as well as command (control vs data).
> 

There is a big problem that the control handle logic is synchronization, but 
the data queue
handling logic is asynchronization. We can't combine them into one queue.
It will decrease the performance because y

Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-09-02 Thread Alexander Graf


On 02.09.16 05:08, Gonglei (Arei) wrote:
> Hi Alex,
> 
> 
>> -Original Message-
>> From: Alexander Graf [mailto:ag...@suse.de]
>> Sent: Thursday, September 01, 2016 9:37 PM
>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device 
>> specification
>>
>> On 08/30/2016 02:12 PM, Gonglei wrote:
>>> The virtio crypto device is a virtual crypto device (ie. hardware
>>> crypto accelerator card). The virtio crypto device can provide
>>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
>>>
>>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
>>
>> I have mostly a few high level comments.
>>
>> For starters, a lot of the structs rely on the compiler to pad them to
>> natural alignment. That may get us into trouble when trying to emulate a
>> virtio device on a host with different guest architecture (like arm on
>> x86). It'd be a good idea to manually pad everything to be 64bit aligned
>> - then all fields are always at the same spot.
>>
> Good point! I'll do this in the next version. Thanks!
> 
>> I also have a hard time getting my head around the final flow of
>> everything. Do I always have to create a session to be able to emit a
>> command? In that case, doesn't that slow down everything, since a
>> request would then need to wait for the host reply to receive its
>> session id? There should be some way to fire off a simple non-iv
>> operation without any session set up imho.
>>
> For symmetric algorithms, we'd better create a session before executing 
> encryption
> Or decryption, because the session usually be kept for a specific
> algorithm with specific key in the production environment. And if we only 
> change the iv, 
> we don't need to re-create the session. 

I think we have a slight misunderstanding here :)

The current workflow is

  -> create session
  <- session key
  -> data in
  <- data out
  ...
  <- close session
  -> ack

That means that at least for the first packet you have at least one full
round-trip cost from guest to host to guest to be able to send any data.

That sounds pretty expensive to me on the latency side. There are
multiple ways to mitigate that. One idea was to have a separate path in
parallel to the create session + data + close session dance that would
combine them all into a single command. You would still have the session
based version, but accelerate the one-blob case.

Another idea would be to make the guest be the session id janitor. Then
you could just do

  -> create session with key X
  -> data in
  <- data out
  ...

so you save the round trip, if you combine command and data queues, as
then the create and data bits are serialized by their position in the queue.

> 
> For the asymmetric algorithms, we don't need create a session IIRC.
> 
> So, I don't think this is a performance degradation, but a performance 
> enhancement.
> 
>> Also, I don't fully understand the split between control and data
>> queues. As far as I read things, the control queue is used to create
>> session ids and the data queues can then be used to push data. Is there
>> any particular reason for the split? One thing that seems natural to me
>> would be to have sessions be per-queue, so you would create a session on
>> a particular queue and only have it ever be available there. That way
>> you get rid of any locking for sessions.
>>
> We want to keep a unify request type (structure) for data queue, so we can
> keep the session operation in the control queue. Of course the control queue
> only used to create sessions currently, but we can extend its functions if 
> needed
> in the future.

I still don't understand. With separate control+data queues you just get
yourself into synchronization troubles. Both struct
virtio_crypto_ctrl_header and struct virtio_crypto_op_header already
have an opcode as first le32 field. You can easily use that to determine
both length of the payload as well as command (control vs data).

You could then also completely get rid of the "queue_id" fields, as any
operation would only ever operate on the queue it's running on.


Alex



Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-09-01 Thread Gonglei (Arei)
Hi Alex,


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, September 01, 2016 9:37 PM
> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device 
> specification
> 
> On 08/30/2016 02:12 PM, Gonglei wrote:
> > The virtio crypto device is a virtual crypto device (ie. hardware
> > crypto accelerator card). The virtio crypto device can provide
> > five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
> >
> > In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
> 
> I have mostly a few high level comments.
> 
> For starters, a lot of the structs rely on the compiler to pad them to
> natural alignment. That may get us into trouble when trying to emulate a
> virtio device on a host with different guest architecture (like arm on
> x86). It'd be a good idea to manually pad everything to be 64bit aligned
> - then all fields are always at the same spot.
> 
Good point! I'll do this in the next version. Thanks!

> I also have a hard time getting my head around the final flow of
> everything. Do I always have to create a session to be able to emit a
> command? In that case, doesn't that slow down everything, since a
> request would then need to wait for the host reply to receive its
> session id? There should be some way to fire off a simple non-iv
> operation without any session set up imho.
> 
For symmetric algorithms, we'd better create a session before executing 
encryption
Or decryption, because the session usually be kept for a specific
algorithm with specific key in the production environment. And if we only 
change the iv, 
we don't need to re-create the session. 

For the asymmetric algorithms, we don't need create a session IIRC.

So, I don't think this is a performance degradation, but a performance 
enhancement.

> Also, I don't fully understand the split between control and data
> queues. As far as I read things, the control queue is used to create
> session ids and the data queues can then be used to push data. Is there
> any particular reason for the split? One thing that seems natural to me
> would be to have sessions be per-queue, so you would create a session on
> a particular queue and only have it ever be available there. That way
> you get rid of any locking for sessions.
> 
We want to keep a unify request type (structure) for data queue, so we can
keep the session operation in the control queue. Of course the control queue
only used to create sessions currently, but we can extend its functions if 
needed
in the future.

> 
> Alex

Regards,
-Gonglei



Re: [Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-09-01 Thread Alexander Graf

On 08/30/2016 02:12 PM, Gonglei wrote:

The virtio crypto device is a virtual crypto device (ie. hardware
crypto accelerator card). The virtio crypto device can provide
five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.

In this patch, CIPHER, MAC, HASH, AEAD services are introduced.


I have mostly a few high level comments.

For starters, a lot of the structs rely on the compiler to pad them to 
natural alignment. That may get us into trouble when trying to emulate a 
virtio device on a host with different guest architecture (like arm on 
x86). It'd be a good idea to manually pad everything to be 64bit aligned 
- then all fields are always at the same spot.


I also have a hard time getting my head around the final flow of 
everything. Do I always have to create a session to be able to emit a 
command? In that case, doesn't that slow down everything, since a 
request would then need to wait for the host reply to receive its 
session id? There should be some way to fire off a simple non-iv 
operation without any session set up imho.


Also, I don't fully understand the split between control and data 
queues. As far as I read things, the control queue is used to create 
session ids and the data queues can then be used to push data. Is there 
any particular reason for the split? One thing that seems natural to me 
would be to have sessions be per-queue, so you would create a session on 
a particular queue and only have it ever be available there. That way 
you get rid of any locking for sessions.



Alex




[Qemu-devel] [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification

2016-08-30 Thread Gonglei
The virtio crypto device is a virtual crypto device (ie. hardware
crypto accelerator card). The virtio crypto device can provide
five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.

In this patch, CIPHER, MAC, HASH, AEAD services are introduced.

Signed-off-by: Gonglei 
CC: Michael S. Tsirkin 
CC: Cornelia Huck 
CC: Stefan Hajnoczi 
CC: Lingli Deng 
CC: Jani Kokkonen 
CC: Ola Liljedahl 
CC: Varun Sethi 
CC: Zeng Xin 
CC: Keating Brian 
CC: Ma Liang J 
CC: Griffin John 
CC: Hanweidong 
CC: Mihai Claudiu Caraman 
---
 content.tex   |   2 +
 virtio-crypto.tex | 835 ++
 2 files changed, 837 insertions(+)
 create mode 100644 virtio-crypto.tex

diff --git a/content.tex b/content.tex
index 4b45678..ab75f78 100644
--- a/content.tex
+++ b/content.tex
@@ -5750,6 +5750,8 @@ descriptor for the \field{sense_len}, \field{residual},
 \field{status_qualifier}, \field{status}, \field{response} and
 \field{sense} fields.
 
+\input{virtio-crypto.tex}
+
 \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 
 Currently there are three device-independent feature bits defined:
diff --git a/virtio-crypto.tex b/virtio-crypto.tex
new file mode 100644
index 000..491fc25
--- /dev/null
+++ b/virtio-crypto.tex
@@ -0,0 +1,835 @@
+\section{Crypto Device}\label{sec:Device Types / Crypto Device}
+
+The virtio crypto device is a virtual crypto device, and is a kind of
+virtual hardware accelerators for virtual machine.  The encryption and
+decryption requests of are placed in the data queue, and handled by the
+real crypto accelerators finally. The second queue is the control queue,
+which is used to create or destroy session for symmetric algorithms, and
+control some advanced features in the future. The virtio crypto
+device can provide seven crypto services: CIPHER, MAC, HASH, AEAD,
+KDF, ASYM, PRIMITIVE.
+
+\subsection{Device ID}\label{sec:Device Types / Crypto Device / Device ID}
+
+20
+
+\subsection{Virtqueues}\label{sec:Device Types / Crypto Device / Virtqueues}
+
+\begin{description}
+\item[0] dataq1
+\item[\ldots]
+\item[N-1] dataqN
+\item[N] controlq
+\end{description}
+
+N is set by \field{max_dataqueues} (\field{max_dataqueues} >= 1).
+
+\subsection{Feature bits}\label{sec:Device Types / Crypto Device / Feature 
bits}
+  None currently defined
+
+\subsection{Device configuration layout}\label{sec:Device Types / Crypto 
Device / Device configuration layout}
+
+Thirteen driver-read-only configuration fields are currently defined.
+
+\begin{lstlisting}
+struct virtio_crypto_config {
+le16  status;
+le16  max_dataqueues;
+le32  crypto_services;
+/* detailed algorithms mask */
+le32 cipher_algo_l;
+le32 cipher_algo_h;
+le32 hash_algo;
+le32 mac_algo_l;
+le32 mac_algo_h;
+le32 asym_algo;
+le32 kdf_algo;
+le32 aead_algo;
+le32 primitive_algo;
+};
+\end{lstlisting}
+
+The first field, \field{status} is currently defined: VIRTIO_CRYPTO_S_HW_READY.
+
+\begin{lstlisting}
+#define VIRTIO_CRYPTO_S_HW_READY  (1 << 0)
+\end{lstlisting}
+
+The following driver-read-only field, \field{max_dataqueuess} specifies the
+maximum number of data virtqueues (dataq1\ldots dataqN). The crypto_services
+shows the crypto services the virtio crypto supports. The service currently
+defined are:
+
+\begin{lstlisting}
+#define VIRTIO_CRYPTO_NO_SERVICE (0) /* cipher services */
+#define VIRTIO_CRYPTO_SERVICE_CIPHER (1) /* cipher services */
+#define VIRTIO_CRYPTO_SERVICE_HASH  (2) /* hash service */
+#define VIRTIO_CRYPTO_SERVICE_MAC   (3) /* MAC (Message Authentication Codes) 
service */
+#define VIRTIO_CRYPTO_SERVICE_AEAD  (4) /* AEAD(Authenticated Encryption with 
Associated Data) service */
+\end{lstlisting}
+
+The last driver-read-only fields specify detailed algorithms mask which
+the device offered for corresponding service. The below CIPHER algorithms
+are defined currently:
+
+\begin{lstlisting}
+#define VIRTIO_CRYPTO_NO_CIPHER 0
+#define VIRTIO_CRYPTO_CIPHER_ARC4   1
+#define VIRTIO_CRYPTO_CIPHER_AES_ECB2
+#define VIRTIO_CRYPTO_CIPHER_AES_CBC3
+#define VIRTIO_CRYPTO_CIPHER_AES_CTR4
+#define VIRTIO_CRYPTO_CIPHER_DES_ECB5
+#define VIRTIO_CRYPTO_CIPHER_DES_CBC6
+#define VIRTIO_CRYPTO_CIPHER_3DES_ECB   7
+#define VIRTIO_CRYPTO_CIPHER_3DES_CBC   8
+#define VIRTIO_CRYPTO_CIPHER_3DES_CTR   9
+#define VIRTIO_CRYPTO_CIPHER_KASUMI_F8  10
+#define VIRTIO_CRYPTO_CIPHER_SNOW3G_UEA211
+#define VIRTIO_CRYPTO_CIPHER_AES_F8 12
+#define VIRTIO_CRYPTO_CIPHER_AES_XTS13
+#define VIRTIO_CRYPTO_CIPHER_ZUC_EEA3   14
+\end{lstlisting}
+
+The below HASH algorithms are defined currently:
+
+\begin{lstlisting}
+#define VIRTIO_CRYPTO_NO_HASH0
+#define VIRTIO_CRYPTO_HASH_MD5   1
+#define VIRTIO_CRYPTO_HASH_SHA1  2
+#define VIRTIO_CRYPTO_HASH_SHA_224   3