>
> 
> On 09/03/2016 05:18 AM, Gonglei (Arei) wrote:
> > Hi,
> >
> >> -----Original Message-----
> >> From: Alexander Graf [mailto:ag...@suse.de]
> >> Sent: Friday, September 02, 2016 10:05 PM
> >> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> specification
> >>
> >>
> >>
> >> On 02.09.16 14:16, Ola Liljedahl wrote:
> >>>
> >>> On 02/09/2016, 12:26, "Gonglei (Arei)" <arei.gong...@huawei.com>
> wrote:
> >>>
> >>>>> -----Original Message-----
> >>>>> From: Alexander Graf [mailto:ag...@suse.de]
> >>>>> Sent: Friday, September 02, 2016 4:07 PM
> >>>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> >>>>> specification
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 02.09.16 05:08, Gonglei (Arei) wrote:
> >>>>>> Hi Alex,
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Alexander Graf [mailto:ag...@suse.de]
> >>>>>>> Sent: Thursday, September 01, 2016 9:37 PM
> >>>>>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> >>>>> specification
> >>>>>>> On 08/30/2016 02:12 PM, Gonglei wrote:
> >>>>>>>> The virtio crypto device is a virtual crypto device (ie. hardware
> >>>>>>>> crypto accelerator card). The virtio crypto device can provide
> >>>>>>>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM,
> >> PRIMITIVE.
> >>>>>>>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
> >>>>>>> I have mostly a few high level comments.
> >>>>>>>
> >>>>>>> For starters, a lot of the structs rely on the compiler to pad them
> >>>>> to
> >>>>>>> natural alignment. That may get us into trouble when trying to
> >>>>> emulate a
> >>>>>>> virtio device on a host with different guest architecture (like arm
> >>>>> on
> >>>>>>> x86). It'd be a good idea to manually pad everything to be 64bit
> >>>>> aligned
> >>>>>>> - then all fields are always at the same spot.
> >>>>>>>
> >>>>>> Good point! I'll do this in the next version. Thanks!
> >>>>>>
> >>>>>>> I also have a hard time getting my head around the final flow of
> >>>>>>> everything. Do I always have to create a session to be able to emit a
> >>>>>>> command? In that case, doesn't that slow down everything, since a
> >>>>>>> request would then need to wait for the host reply to receive its
> >>>>>>> session id? There should be some way to fire off a simple non-iv
> >>>>>>> operation without any session set up imho.
> >>>>>>>
> >>>>>> For symmetric algorithms, we'd better create a session before
> >>>>> executing
> >>>>> encryption
> >>>>>> Or decryption, because the session usually be kept for a specific
> >>>>>> algorithm with specific key in the production environment. And if we
> >>>>> only
> >>>>> change the iv,
> >>>>>> we don't need to re-create the session.
> >>>>> I think we have a slight misunderstanding here :)
> >>>>>
> >>>>> The current workflow is
> >>>>>
> >>>>>    -> create session
> >>>>>    <- session key
> >>>>>    -> data in
> >>>>>    <- data out
> >>>>>    ...
> >>>>>    <- close session
> >>>>>    -> ack
> >>>>>
> >>>>> That means that at least for the first packet you have at least one full
> >>>>> round-trip cost from guest to host to guest to be able to send any data.
> >>>>>
> >>>> Yes, that's true...
> >>>>
> >>>>> That sounds pretty expensive to me on the latency side. There are
> >>>>> multiple ways to mitigate that. One idea was to have a separate path in
> >>>>> parallel to the create session + data + close session dance that would
> >>>>> combine them all into a single command. You would still have the session
> >>>>> based version, but accelerate the one-blob case.
> >>>>>
> >>>>> Another idea would be to make the guest be the session id janitor. Then
> >>>>> you could just do
> >>>>>
> >>>>>    -> create session with key X
> >>>>>    -> data in
> >>>>>    <- data out
> >>>>>    ...
> >>>>>
> >>>>> so you save the round trip, if you combine command and data queues, as
> >>>>> then the create and data bits are serialized by their position in the
> >>>>> queue.
> >>>>>
> >>>> ... But for lots of crypto requests, the exhaust is low:
> >>>>
> >>>> -> create session with key X
> >>>> <- session id
> >>>>     //do something in the guest if you like
> >>>> -> data in with session_id
> >>>> -> data in with session_id
> >>>> -> data in with session_id
> >>>>     ... ...    (fill out data virtqueue)
> >>>> <-data out
> >>>> <-data out
> >>>>     <-data out
> >>>>
> >>>> And this is what the production environment do currently.
> >>>>
> >>>>>> For the asymmetric algorithms, we don't need create a session IIRC.
> >>>>>>
> >>>>>> So, I don't think this is a performance degradation, but a performance
> >>>>> enhancement.
> >>>>>>> Also, I don't fully understand the split between control and data
> >>>>>>> queues. As far as I read things, the control queue is used to create
> >>>>>>> session ids and the data queues can then be used to push data. Is
> >>>>> there
> >>>>>>> any particular reason for the split? One thing that seems natural to
> >>>>> me
> >>>>>>> would be to have sessions be per-queue, so you would create a
> >>>>> session on
> >>>>>>> a particular queue and only have it ever be available there. That way
> >>>>>>> you get rid of any locking for sessions.
> >>>>>>>
> >>>>>> We want to keep a unify request type (structure) for data queue, so
> >>>>> we can
> >>>>>> keep the session operation in the control queue. Of course the
> >>>>> control queue
> >>>>>> only used to create sessions currently, but we can extend its
> >>>>> functions if
> >>>>> needed
> >>>>>> in the future.
> >>>>> I still don't understand. With separate control+data queues you just get
> >>>>> yourself into synchronization troubles. Both struct
> >>>>> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already
> >>>>> have an opcode as first le32 field. You can easily use that to determine
> >>>>> both length of the payload as well as command (control vs data).
> >>>>>
> >>>> There is a big problem that the control handle logic is synchronization,
> >>>> but the data queue
> >>>> handling logic is asynchronization. We can't combine them into one
> queue.
> >>>> It will decrease the performance because you need indentify each packet
> >>>> if we do this forcedly.
> >>> Are you saying that control and data operations are handled by separate
> >>> "blocksĀ²?
> >>> If you combined control and data queues, there would have to be a (SW)
> >>> demultiplexer
> >>> that would add overhead (and potentially decrease throughout) especially
> >>> for the data
> >>> operations?
> >> Uh, the multiplexer is as simple as a switch() statement on the opcode,
> >> no? It might stall that one particular queue, but that sounds like
> >> something you can improve by increasing the number of queues (and invent
> >> something smart to ease polling of activity on more queues).
> >>
> > Actually, the virtio multiple queue's capacity is based on the backend 
> > crypto
> device,
> > like multiple queue tap device work with virtio-net.
> 
> So what happens when you have 5 crypto accelerator cards in your system? :)
> 
That means you can support much more virito-crypto devices,
not one virtio-crypto device' queues.

Regards,
-Gonglei

Reply via email to