Hi Alex, > -----Original Message----- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Friday, September 02, 2016 9:53 PM > Euler) > Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device > specification > > > > On 02.09.16 12:26, Gonglei (Arei) wrote: > > > >> -----Original Message----- > >> From: Alexander Graf [mailto:ag...@suse.de] > >> Sent: Friday, September 02, 2016 4:07 PM > >> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device > specification > >> > >> > >> > >> On 02.09.16 05:08, Gonglei (Arei) wrote: > >>> Hi Alex, > >>> > >>> > >>>> -----Original Message----- > >>>> From: Alexander Graf [mailto:ag...@suse.de] > >>>> Sent: Thursday, September 01, 2016 9:37 PM > >>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device > >> specification > >>>> > >>>> On 08/30/2016 02:12 PM, Gonglei wrote: > >>>>> The virtio crypto device is a virtual crypto device (ie. hardware > >>>>> crypto accelerator card). The virtio crypto device can provide > >>>>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE. > >>>>> > >>>>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced. > >>>> > >>>> I have mostly a few high level comments. > >>>> > >>>> For starters, a lot of the structs rely on the compiler to pad them to > >>>> natural alignment. That may get us into trouble when trying to emulate a > >>>> virtio device on a host with different guest architecture (like arm on > >>>> x86). It'd be a good idea to manually pad everything to be 64bit aligned > >>>> - then all fields are always at the same spot. > >>>> > >>> Good point! I'll do this in the next version. Thanks! > >>> > >>>> I also have a hard time getting my head around the final flow of > >>>> everything. Do I always have to create a session to be able to emit a > >>>> command? In that case, doesn't that slow down everything, since a > >>>> request would then need to wait for the host reply to receive its > >>>> session id? There should be some way to fire off a simple non-iv > >>>> operation without any session set up imho. > >>>> > >>> For symmetric algorithms, we'd better create a session before executing > >> encryption > >>> Or decryption, because the session usually be kept for a specific > >>> algorithm with specific key in the production environment. And if we only > >> change the iv, > >>> we don't need to re-create the session. > >> > >> I think we have a slight misunderstanding here :) > >> > >> The current workflow is > >> > >> -> create session > >> <- session key > >> -> data in > >> <- data out > >> ... > >> <- close session > >> -> ack > >> > >> That means that at least for the first packet you have at least one full > >> round-trip cost from guest to host to guest to be able to send any data. > >> > > Yes, that's true... > > > >> That sounds pretty expensive to me on the latency side. There are > >> multiple ways to mitigate that. One idea was to have a separate path in > >> parallel to the create session + data + close session dance that would > >> combine them all into a single command. You would still have the session > >> based version, but accelerate the one-blob case. > >> > >> Another idea would be to make the guest be the session id janitor. Then > >> you could just do > >> > >> -> create session with key X > >> -> data in > >> <- data out > >> ... > >> > >> so you save the round trip, if you combine command and data queues, as > >> then the create and data bits are serialized by their position in the > >> queue. > >> > > ... But for lots of crypto requests, the exhaust is low: > > > > -> create session with key X > > <- session id > > //do something in the guest if you like > > -> data in with session_id > > -> data in with session_id > > -> data in with session_id > > ... ... (fill out data virtqueue) > > <-data out > > <-data out > > <-data out > > > > And this is what the production environment do currently. > > Does this also apply to a web server handling lots of small requests? > Does crypto acceleration even make sense in such an environment or would > it be cheaper to simply to the crypto on the CPU for such small payloads? > > I'm seriously asking :). > I think I get your points now. You mean for one-blob request (different packets have different algorithms or key), then the exhaust is too higher under current approach? That's true, because for each packet we do have the below workflow:
Guest -> Host -> create session <- session id -> data in <- data out <- close session This is a big problem indeed. > > > >>> > >>> For the asymmetric algorithms, we don't need create a session IIRC. > >>> > >>> So, I don't think this is a performance degradation, but a performance > >> enhancement. > >>> > >>>> Also, I don't fully understand the split between control and data > >>>> queues. As far as I read things, the control queue is used to create > >>>> session ids and the data queues can then be used to push data. Is there > >>>> any particular reason for the split? One thing that seems natural to me > >>>> would be to have sessions be per-queue, so you would create a session on > >>>> a particular queue and only have it ever be available there. That way > >>>> you get rid of any locking for sessions. > >>>> > >>> We want to keep a unify request type (structure) for data queue, so we can > >>> keep the session operation in the control queue. Of course the control > queue > >>> only used to create sessions currently, but we can extend its functions if > >> needed > >>> in the future. > >> > >> I still don't understand. With separate control+data queues you just get > >> yourself into synchronization troubles. Both struct > >> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already > >> have an opcode as first le32 field. You can easily use that to determine > >> both length of the payload as well as command (control vs data). > >> > > > > There is a big problem that the control handle logic is synchronization, > > but the > data queue > > handling logic is asynchronization. We can't combine them into one queue. > > It will decrease the performance because you need indentify each packet > > if we do this forcedly. > > I might again just not grasp the full picture here. Is the session key > something global you get from hardware? > The specific algorithm and key, key length etc. can exclusively determine a session, which is a software concept IMO, in order to improve the performance for lots of packets with the same algo and key while executing crypto operation. A session is a handle which describes the cryptographic parameters to be applied to a number of buffers. > In that case, it might be a good idea to take a look at all existing > crypto accelerator implementations we know of today and see which model > works best across the board. > Mihai, what about your crypto accelerators for SOC? > I also think it's probably a good idea to not pass hardware session ids > into the guest, so you'd want to create a separate lookup table that > converts from virtio session to hardware session. That way hardware that > does not have global session tokens can actually leverage multiple CPUs > hammering it. > The retuned session_id is an index based on software realization, not hardware session ids. We truly need to create a separate lookup table. > > Alex Regards, -Gonglei