> > > On 09/03/2016 05:18 AM, Gonglei (Arei) wrote: > > Hi, > > > >> -----Original Message----- > >> From: Alexander Graf [mailto:ag...@suse.de] > >> Sent: Friday, September 02, 2016 10:05 PM > >> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device > specification > >> > >> > >> > >> On 02.09.16 14:16, Ola Liljedahl wrote: > >>> > >>> On 02/09/2016, 12:26, "Gonglei (Arei)" <arei.gong...@huawei.com> > wrote: > >>> > >>>>> -----Original Message----- > >>>>> From: Alexander Graf [mailto:ag...@suse.de] > >>>>> Sent: Friday, September 02, 2016 4:07 PM > >>>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device > >>>>> specification > >>>>> > >>>>> > >>>>> > >>>>> On 02.09.16 05:08, Gonglei (Arei) wrote: > >>>>>> Hi Alex, > >>>>>> > >>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: Alexander Graf [mailto:ag...@suse.de] > >>>>>>> Sent: Thursday, September 01, 2016 9:37 PM > >>>>>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device > >>>>> specification > >>>>>>> On 08/30/2016 02:12 PM, Gonglei wrote: > >>>>>>>> The virtio crypto device is a virtual crypto device (ie. hardware > >>>>>>>> crypto accelerator card). The virtio crypto device can provide > >>>>>>>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, > >> PRIMITIVE. > >>>>>>>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced. > >>>>>>> I have mostly a few high level comments. > >>>>>>> > >>>>>>> For starters, a lot of the structs rely on the compiler to pad them > >>>>> to > >>>>>>> natural alignment. That may get us into trouble when trying to > >>>>> emulate a > >>>>>>> virtio device on a host with different guest architecture (like arm > >>>>> on > >>>>>>> x86). It'd be a good idea to manually pad everything to be 64bit > >>>>> aligned > >>>>>>> - then all fields are always at the same spot. > >>>>>>> > >>>>>> Good point! I'll do this in the next version. Thanks! > >>>>>> > >>>>>>> I also have a hard time getting my head around the final flow of > >>>>>>> everything. Do I always have to create a session to be able to emit a > >>>>>>> command? In that case, doesn't that slow down everything, since a > >>>>>>> request would then need to wait for the host reply to receive its > >>>>>>> session id? There should be some way to fire off a simple non-iv > >>>>>>> operation without any session set up imho. > >>>>>>> > >>>>>> For symmetric algorithms, we'd better create a session before > >>>>> executing > >>>>> encryption > >>>>>> Or decryption, because the session usually be kept for a specific > >>>>>> algorithm with specific key in the production environment. And if we > >>>>> only > >>>>> change the iv, > >>>>>> we don't need to re-create the session. > >>>>> I think we have a slight misunderstanding here :) > >>>>> > >>>>> The current workflow is > >>>>> > >>>>> -> create session > >>>>> <- session key > >>>>> -> data in > >>>>> <- data out > >>>>> ... > >>>>> <- close session > >>>>> -> ack > >>>>> > >>>>> That means that at least for the first packet you have at least one full > >>>>> round-trip cost from guest to host to guest to be able to send any data. > >>>>> > >>>> Yes, that's true... > >>>> > >>>>> That sounds pretty expensive to me on the latency side. There are > >>>>> multiple ways to mitigate that. One idea was to have a separate path in > >>>>> parallel to the create session + data + close session dance that would > >>>>> combine them all into a single command. You would still have the session > >>>>> based version, but accelerate the one-blob case. > >>>>> > >>>>> Another idea would be to make the guest be the session id janitor. Then > >>>>> you could just do > >>>>> > >>>>> -> create session with key X > >>>>> -> data in > >>>>> <- data out > >>>>> ... > >>>>> > >>>>> so you save the round trip, if you combine command and data queues, as > >>>>> then the create and data bits are serialized by their position in the > >>>>> queue. > >>>>> > >>>> ... But for lots of crypto requests, the exhaust is low: > >>>> > >>>> -> create session with key X > >>>> <- session id > >>>> //do something in the guest if you like > >>>> -> data in with session_id > >>>> -> data in with session_id > >>>> -> data in with session_id > >>>> ... ... (fill out data virtqueue) > >>>> <-data out > >>>> <-data out > >>>> <-data out > >>>> > >>>> And this is what the production environment do currently. > >>>> > >>>>>> For the asymmetric algorithms, we don't need create a session IIRC. > >>>>>> > >>>>>> So, I don't think this is a performance degradation, but a performance > >>>>> enhancement. > >>>>>>> Also, I don't fully understand the split between control and data > >>>>>>> queues. As far as I read things, the control queue is used to create > >>>>>>> session ids and the data queues can then be used to push data. Is > >>>>> there > >>>>>>> any particular reason for the split? One thing that seems natural to > >>>>> me > >>>>>>> would be to have sessions be per-queue, so you would create a > >>>>> session on > >>>>>>> a particular queue and only have it ever be available there. That way > >>>>>>> you get rid of any locking for sessions. > >>>>>>> > >>>>>> We want to keep a unify request type (structure) for data queue, so > >>>>> we can > >>>>>> keep the session operation in the control queue. Of course the > >>>>> control queue > >>>>>> only used to create sessions currently, but we can extend its > >>>>> functions if > >>>>> needed > >>>>>> in the future. > >>>>> I still don't understand. With separate control+data queues you just get > >>>>> yourself into synchronization troubles. Both struct > >>>>> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already > >>>>> have an opcode as first le32 field. You can easily use that to determine > >>>>> both length of the payload as well as command (control vs data). > >>>>> > >>>> There is a big problem that the control handle logic is synchronization, > >>>> but the data queue > >>>> handling logic is asynchronization. We can't combine them into one > queue. > >>>> It will decrease the performance because you need indentify each packet > >>>> if we do this forcedly. > >>> Are you saying that control and data operations are handled by separate > >>> "blocksĀ²? > >>> If you combined control and data queues, there would have to be a (SW) > >>> demultiplexer > >>> that would add overhead (and potentially decrease throughout) especially > >>> for the data > >>> operations? > >> Uh, the multiplexer is as simple as a switch() statement on the opcode, > >> no? It might stall that one particular queue, but that sounds like > >> something you can improve by increasing the number of queues (and invent > >> something smart to ease polling of activity on more queues). > >> > > Actually, the virtio multiple queue's capacity is based on the backend > > crypto > device, > > like multiple queue tap device work with virtio-net. > > So what happens when you have 5 crypto accelerator cards in your system? :) > That means you can support much more virito-crypto devices, not one virtio-crypto device' queues.
Regards, -Gonglei