Re: [dm-crypt] [RFC PATCH 1/1] Add DM_CRYPT_FORCE_INLINE flag to dm-crypt target
On Wed, Jun 24, 2020 at 5:24 PM Eric Biggers wrote: > > On Wed, Jun 24, 2020 at 09:24:07AM +0100, Ignat Korchagin wrote: > > On Wed, Jun 24, 2020 at 6:04 AM Eric Biggers wrote: > > > > > > On Fri, Jun 19, 2020 at 05:41:32PM +0100, Ignat Korchagin wrote: > > > > Sometimes extra thread offloading imposed by dm-crypt hurts IO latency. > > > > This is > > > > especially visible on busy systems with many processes/threads. > > > > Moreover, most > > > > Crypto API implementaions are async, that is they offload crypto > > > > operations on > > > > their own, so this dm-crypt offloading is excessive. > > > > > > This really should say "some Crypto API implementations are async" > > > instead of > > > "most Crypto API implementations are async". > > > > The most accurate would probably be: most hardware-accelerated Crypto > > API implementations are async > > > > > Notably, the AES-NI implementation of AES-XTS is synchronous if you call > > > it in a > > > context where SIMD instructions are usable. It's only asynchronous when > > > SIMD is > > > not usable. (This seems to have been missed in your blog post.) > > > > No, it was not. This is exactly why we made xts-proxy Crypto API > > module as a second patch. But it seems now it does not make a big > > difference if a used Crypto API implementation is synchronous as well > > (based on some benchmarks outlined in the cover letter to this patch). > > I think the v2 of this patch will not require a synchronous Crypto > > API. This is probably a right thing to do, as the "inline" flag should > > control the way how dm-crypt itself handles requests, not how Crypto > > API handles requests. If a user wants to ensure a particular > > synchronous Crypto API implementation, they can already reconfigure > > dm-crypt and specify the implementation with a "capi:" prefix in the > > the dm table description. > > I think you're missing the point. Although xts-aes-aesni has the > CRYPTO_ALG_ASYNC bit set, the actual implementation processes the request > synchronously if SIMD instructions are currently usable. That's always the > case > in dm-crypt, as far as I can tell. This algorithm has the ASYNC flag only > because it's not synchronous when called in hardIRQ context. > > That's why your "xts-proxy" doesn't make a difference, and why it's misleading > to suggest that the crypto API is doing its own queueing when you're primarily > talking about xts-aes-aesni. The crypto API definitely can do its own > queueing, > mainly with hardware drivers. But it doesn't in this common and relevant > case. I think we're talking about the same things but from different points of view. I would like to clarify that the whole post and this change does not have the intention to focus on aesni (or any x86-specific crypto optimizations). In fact it is quite the opposite: we want to optimize the generic dm-crypt regardless of which crypto is used (that's why I just used a NULL cipher in the cover letter). We also have some arm64 machines [1] and I bet they would benefit here as well. The important point my post tries to make is that the original workqueue offloading in dm-crypt was added because the Crypto API was synchronous back in the day and, exactly as you say, you may not be able to use some hw-accelerated crypto in hard IRQ context as well as doing non-hw crypto in hard IRQ context is a bad idea. Now, most Crypto API are smart enough to figure out on their own if they should process the request inline or offload it to a workqueue, so the workarounds in the dm-crypt itself most likely are not needed. Though, the generic Crypto API "cipher walk" function does refuse to "walk" the buffers in hard IRQ context, so the "tasklet" functionality is still required. But from the dm-crypt perspective - it should not take into account if a particular xts-aes-aesni implementation is MOSTLY synchronous - those are details of the implementation of this particular cipher dm-crypt has no visibility into. So it would be right to say in my opinion if the cipher has the CRYPTO_ALG_ASYNC flag set - it can offload the crypto request to a workqueue at any time. How often does it do it - that's another story and probably should be reviewed elsewhere, if it does it too often. Ignat [1]: https://blog.cloudflare.com/arm-takes-wing/ > - Eric
Re: [dm-crypt] [RFC PATCH 1/1] Add DM_CRYPT_FORCE_INLINE flag to dm-crypt target
On Wed, Jun 24, 2020 at 09:24:07AM +0100, Ignat Korchagin wrote: > On Wed, Jun 24, 2020 at 6:04 AM Eric Biggers wrote: > > > > On Fri, Jun 19, 2020 at 05:41:32PM +0100, Ignat Korchagin wrote: > > > Sometimes extra thread offloading imposed by dm-crypt hurts IO latency. > > > This is > > > especially visible on busy systems with many processes/threads. Moreover, > > > most > > > Crypto API implementaions are async, that is they offload crypto > > > operations on > > > their own, so this dm-crypt offloading is excessive. > > > > This really should say "some Crypto API implementations are async" instead > > of > > "most Crypto API implementations are async". > > The most accurate would probably be: most hardware-accelerated Crypto > API implementations are async > > > Notably, the AES-NI implementation of AES-XTS is synchronous if you call it > > in a > > context where SIMD instructions are usable. It's only asynchronous when > > SIMD is > > not usable. (This seems to have been missed in your blog post.) > > No, it was not. This is exactly why we made xts-proxy Crypto API > module as a second patch. But it seems now it does not make a big > difference if a used Crypto API implementation is synchronous as well > (based on some benchmarks outlined in the cover letter to this patch). > I think the v2 of this patch will not require a synchronous Crypto > API. This is probably a right thing to do, as the "inline" flag should > control the way how dm-crypt itself handles requests, not how Crypto > API handles requests. If a user wants to ensure a particular > synchronous Crypto API implementation, they can already reconfigure > dm-crypt and specify the implementation with a "capi:" prefix in the > the dm table description. I think you're missing the point. Although xts-aes-aesni has the CRYPTO_ALG_ASYNC bit set, the actual implementation processes the request synchronously if SIMD instructions are currently usable. That's always the case in dm-crypt, as far as I can tell. This algorithm has the ASYNC flag only because it's not synchronous when called in hardIRQ context. That's why your "xts-proxy" doesn't make a difference, and why it's misleading to suggest that the crypto API is doing its own queueing when you're primarily talking about xts-aes-aesni. The crypto API definitely can do its own queueing, mainly with hardware drivers. But it doesn't in this common and relevant case. - Eric
Re: [dm-crypt] [RFC PATCH 1/1] Add DM_CRYPT_FORCE_INLINE flag to dm-crypt target
On Wed, Jun 24, 2020 at 6:04 AM Eric Biggers wrote: > > On Fri, Jun 19, 2020 at 05:41:32PM +0100, Ignat Korchagin wrote: > > Sometimes extra thread offloading imposed by dm-crypt hurts IO latency. > > This is > > especially visible on busy systems with many processes/threads. Moreover, > > most > > Crypto API implementaions are async, that is they offload crypto operations > > on > > their own, so this dm-crypt offloading is excessive. > > This really should say "some Crypto API implementations are async" instead of > "most Crypto API implementations are async". The most accurate would probably be: most hardware-accelerated Crypto API implementations are async > Notably, the AES-NI implementation of AES-XTS is synchronous if you call it > in a > context where SIMD instructions are usable. It's only asynchronous when SIMD > is > not usable. (This seems to have been missed in your blog post.) No, it was not. This is exactly why we made xts-proxy Crypto API module as a second patch. But it seems now it does not make a big difference if a used Crypto API implementation is synchronous as well (based on some benchmarks outlined in the cover letter to this patch). I think the v2 of this patch will not require a synchronous Crypto API. This is probably a right thing to do, as the "inline" flag should control the way how dm-crypt itself handles requests, not how Crypto API handles requests. If a user wants to ensure a particular synchronous Crypto API implementation, they can already reconfigure dm-crypt and specify the implementation with a "capi:" prefix in the the dm table description. > > This adds a new flag, which directs dm-crypt not to offload crypto > > operations > > and process everything inline. For cases, where crypto operations cannot > > happen > > inline (hard interrupt context, for example the read path of the NVME > > driver), > > we offload the work to a tasklet rather than a workqueue. > > This patch both removes some dm-crypt specific queueing, and changes > decryption > to use softIRQ context instead of a workqueue. It would be useful to know how > much of a difference the workqueue => softIRQ change makes by itself. Such a > change could be useful for fscrypt as well. (fscrypt uses a workqueue for > decryption, but besides that doesn't use any other queueing.) > > > @@ -127,7 +128,7 @@ struct iv_elephant_private { > > * and encrypts / decrypts at the same time. > > */ > > enum flags { DM_CRYPT_SUSPENDED, DM_CRYPT_KEY_VALID, > > - DM_CRYPT_SAME_CPU, DM_CRYPT_NO_OFFLOAD }; > > + DM_CRYPT_SAME_CPU, DM_CRYPT_NO_OFFLOAD, DM_CRYPT_FORCE_INLINE = > > (sizeof(unsigned long) * 8 - 1) }; > > Assigning a specific enum value isn't necessary. Yes, this is a leftover from our "internal" patch which I wanted to make "future proof" in case future iterations of dm-crypt will add some flags to avoid flag collisions. Will remove in v2. > > > @@ -1458,13 +1459,18 @@ static void crypt_alloc_req_skcipher(struct > > crypt_config *cc, > > > > skcipher_request_set_tfm(ctx->r.req, cc->cipher_tfm.tfms[key_index]); > > > > - /* > > - * Use REQ_MAY_BACKLOG so a cipher driver internally backlogs > > - * requests if driver request queue is full. > > - */ > > - skcipher_request_set_callback(ctx->r.req, > > - CRYPTO_TFM_REQ_MAY_BACKLOG, > > - kcryptd_async_done, dmreq_of_req(cc, ctx->r.req)); > > + if (test_bit(DM_CRYPT_FORCE_INLINE, >flags)) > > + /* make sure we zero important fields of the request */ > > + skcipher_request_set_callback(ctx->r.req, > > + 0, NULL, NULL); > > + else > > + /* > > + * Use REQ_MAY_BACKLOG so a cipher driver internally backlogs > > + * requests if driver request queue is full. > > + */ > > + skcipher_request_set_callback(ctx->r.req, > > + CRYPTO_TFM_REQ_MAY_BACKLOG, > > + kcryptd_async_done, dmreq_of_req(cc, ctx->r.req)); > > } > > This looks wrong. Unless type=0 and mask=CRYPTO_ALG_ASYNC are passed to > crypto_alloc_skcipher(), the skcipher implementation can still be > asynchronous, > in which case providing a callback is required. > > Do you intend that the "force_inline" option forces the use of a synchronous > skcipher (alongside the other things it does)? Or should it still allow > asynchronous ones? As mentioned above, I don't think we should require synchronous crypto with the "force_inline" flag anymore. Although we may remove CRYPTO_TFM_REQ_MAY_BACKLOG with the inline flag. > We may not actually have a choice in that matter, since xts-aes-aesni has the > CRYPTO_ALG_ASYNC bit set (as I mentioned) despite being synchronous in most > cases; thus, the crypto API won't give you it if you ask for a synchronous > cipher. So I think you still need to allow async skciphers? That means a > callback is still always required. > > - Eric
Re: [dm-crypt] [RFC PATCH 1/1] Add DM_CRYPT_FORCE_INLINE flag to dm-crypt target
On Fri, Jun 19, 2020 at 05:41:32PM +0100, Ignat Korchagin wrote: > Sometimes extra thread offloading imposed by dm-crypt hurts IO latency. This > is > especially visible on busy systems with many processes/threads. Moreover, most > Crypto API implementaions are async, that is they offload crypto operations on > their own, so this dm-crypt offloading is excessive. This really should say "some Crypto API implementations are async" instead of "most Crypto API implementations are async". Notably, the AES-NI implementation of AES-XTS is synchronous if you call it in a context where SIMD instructions are usable. It's only asynchronous when SIMD is not usable. (This seems to have been missed in your blog post.) > This adds a new flag, which directs dm-crypt not to offload crypto operations > and process everything inline. For cases, where crypto operations cannot > happen > inline (hard interrupt context, for example the read path of the NVME driver), > we offload the work to a tasklet rather than a workqueue. This patch both removes some dm-crypt specific queueing, and changes decryption to use softIRQ context instead of a workqueue. It would be useful to know how much of a difference the workqueue => softIRQ change makes by itself. Such a change could be useful for fscrypt as well. (fscrypt uses a workqueue for decryption, but besides that doesn't use any other queueing.) > @@ -127,7 +128,7 @@ struct iv_elephant_private { > * and encrypts / decrypts at the same time. > */ > enum flags { DM_CRYPT_SUSPENDED, DM_CRYPT_KEY_VALID, > - DM_CRYPT_SAME_CPU, DM_CRYPT_NO_OFFLOAD }; > + DM_CRYPT_SAME_CPU, DM_CRYPT_NO_OFFLOAD, DM_CRYPT_FORCE_INLINE = > (sizeof(unsigned long) * 8 - 1) }; Assigning a specific enum value isn't necessary. > @@ -1458,13 +1459,18 @@ static void crypt_alloc_req_skcipher(struct > crypt_config *cc, > > skcipher_request_set_tfm(ctx->r.req, cc->cipher_tfm.tfms[key_index]); > > - /* > - * Use REQ_MAY_BACKLOG so a cipher driver internally backlogs > - * requests if driver request queue is full. > - */ > - skcipher_request_set_callback(ctx->r.req, > - CRYPTO_TFM_REQ_MAY_BACKLOG, > - kcryptd_async_done, dmreq_of_req(cc, ctx->r.req)); > + if (test_bit(DM_CRYPT_FORCE_INLINE, >flags)) > + /* make sure we zero important fields of the request */ > + skcipher_request_set_callback(ctx->r.req, > + 0, NULL, NULL); > + else > + /* > + * Use REQ_MAY_BACKLOG so a cipher driver internally backlogs > + * requests if driver request queue is full. > + */ > + skcipher_request_set_callback(ctx->r.req, > + CRYPTO_TFM_REQ_MAY_BACKLOG, > + kcryptd_async_done, dmreq_of_req(cc, ctx->r.req)); > } This looks wrong. Unless type=0 and mask=CRYPTO_ALG_ASYNC are passed to crypto_alloc_skcipher(), the skcipher implementation can still be asynchronous, in which case providing a callback is required. Do you intend that the "force_inline" option forces the use of a synchronous skcipher (alongside the other things it does)? Or should it still allow asynchronous ones? We may not actually have a choice in that matter, since xts-aes-aesni has the CRYPTO_ALG_ASYNC bit set (as I mentioned) despite being synchronous in most cases; thus, the crypto API won't give you it if you ask for a synchronous cipher. So I think you still need to allow async skciphers? That means a callback is still always required. - Eric