On Fri, 4 Dec 2020 at 18:19, David Howells <dhowe...@redhat.com> wrote:
>
> Ard Biesheuvel <a...@kernel.org> wrote:
>
> > The tricky thing with CTS is that you have to ensure that the final
> > full and partial blocks are presented to the crypto driver as one
> > chunk, or it won't be able to perform the ciphertext stealing. This
> > might be the reason for the current approach. If the sunrpc code has
> > multiple disjoint chunks of data to encrypt, it is always better to
> > wrap it in a single scatterlist and call into the skcipher only once.
>
> Yeah - the problem with that is that for sunrpc, we might be dealing with 1MB
> plus bits of non-contiguous pages, requiring >8K of scatterlist elements
> (admittedly, we can chain them, but we may have to do one or more large
> allocations).
>
> > However, I would recommend against it:
>
> Sorry, recommend against what?
>

Recommend against the current approach of manipulating the input like
this and feeding it into the skcipher piecemeal. Herbert recently made
some changes for MSG_MORE support in the AF_ALG code, which permits a
skcipher encryption to be split into several invocations of the
skcipher layer without the need for this complexity on the side of the
caller. Maybe there is a way to reuse that here. Herbert?

> > at least for ARM and arm64, I
> > have already contributed SIMD based implementations that use SIMD
> > permutation instructions and overlapping loads and stores to perform
> > the ciphertext stealing, which means that there is only a single layer
> > which implements CTS+CBC+AES, and this layer can consume the entire
> > scatterlist in one go. We could easily do something similar in the
> > AES-NI driver as well.
>
> Can you point me at that in the sources?
>

arm64 has

arch/arm64/crypto/aes-glue.c
arch/arm64/crypto/aes-modes.S

where the former implements the skcipher wrapper for an implementation
of "cts(cbc(aes))"

static int cts_cbc_encrypt(struct skcipher_request *req)

walks over the src/dst scatterlist and feeds the data into the asm
helpers, one for the bulk of the input, and one for the final full and
partial blocks (or two final full blocks).

The SIMD asm helpers are

aes_cbc_encrypt
aes_cbc_decrypt
aes_cbc_cts_encrypt
aes_cbc_cts_decrypt

> Can you also do SHA at the same time in the same loop?
>

SHA-1 or HMAC-SHA1? The latter could probably be modeled as an AEAD.
The former doesn't really fit the current API so we'd have to invent
something for it.

> Note that the rfc3962 AES does the checksum over the plaintext, but rfc8009
> does it over the ciphertext.
>
> David
>
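
To illustrate why the final full and partial blocks have to reach the
driver as one chunk, here is a minimal userspace sketch of CBC with
ciphertext stealing. It is not the kernel code. toy_encrypt() and
toy_decrypt() are hypothetical stand-ins for AES (an insecure byte-wise
permutation), and the toy_cts_cbc_* names are made up for illustration.
The sketch assumes the variant in which the last two blocks are always
swapped, which is how I read rfc3962.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16

/* Toy invertible 16-byte "block cipher" standing in for AES. NOT secure. */
static void toy_encrypt(const uint8_t key[BLK], const uint8_t *in, uint8_t *out)
{
	uint8_t tmp[BLK];

	for (int i = 0; i < BLK; i++) {
		uint8_t v = in[i] ^ key[i];
		tmp[(i + 5) % BLK] = (uint8_t)((v << 3) | (v >> 5));
	}
	memcpy(out, tmp, BLK);
}

static void toy_decrypt(const uint8_t key[BLK], const uint8_t *in, uint8_t *out)
{
	uint8_t tmp[BLK];

	for (int i = 0; i < BLK; i++) {
		uint8_t v = in[(i + 5) % BLK];
		tmp[i] = (uint8_t)(((v >> 3) | (v << 5)) ^ key[i]);
	}
	memcpy(out, tmp, BLK);
}

static void xor_into(uint8_t *dst, const uint8_t *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] ^= src[i];
}

/* Encrypt len > BLK bytes; output has the same length. src != dst. */
static int toy_cts_cbc_encrypt(const uint8_t key[BLK], const uint8_t iv[BLK],
			       const uint8_t *src, uint8_t *dst, size_t len)
{
	uint8_t chain[BLK], pen[BLK], last[BLK];
	size_t tail, bulk, off;

	if (len <= BLK)
		return -1;
	assert(src != dst);
	tail = len % BLK ? len % BLK : BLK;	/* size of the final block */
	bulk = len - BLK - tail;		/* plain CBC part */

	memcpy(chain, iv, BLK);
	for (off = 0; off < bulk; off += BLK) {
		xor_into(chain, src + off, BLK);
		toy_encrypt(key, chain, chain);
		memcpy(dst + off, chain, BLK);
	}

	/* The last two blocks are needed together from here on. */
	xor_into(chain, src + bulk, BLK);
	toy_encrypt(key, chain, pen);		/* penultimate ciphertext */

	memset(last, 0, BLK);			/* zero-pad the final block */
	memcpy(last, src + bulk + BLK, tail);
	xor_into(last, pen, BLK);
	toy_encrypt(key, last, last);

	memcpy(dst + bulk, last, BLK);		/* swapped: full block first */
	memcpy(dst + bulk + BLK, pen, tail);	/* truncated ("stolen") block */
	return 0;
}

/* Inverse of the above. src != dst. */
static int toy_cts_cbc_decrypt(const uint8_t key[BLK], const uint8_t iv[BLK],
			       const uint8_t *src, uint8_t *dst, size_t len)
{
	uint8_t chain[BLK], pen[BLK], x[BLK];
	size_t tail, bulk, off;

	if (len <= BLK)
		return -1;
	assert(src != dst);
	tail = len % BLK ? len % BLK : BLK;
	bulk = len - BLK - tail;

	memcpy(chain, iv, BLK);
	for (off = 0; off < bulk; off += BLK) {
		toy_decrypt(key, src + off, x);
		xor_into(x, chain, BLK);
		memcpy(dst + off, x, BLK);
		memcpy(chain, src + off, BLK);
	}

	toy_decrypt(key, src + bulk, x);	/* x = padded final ^ pen */
	memcpy(pen, x, BLK);			/* tail of pen hides in x */
	memcpy(pen, src + bulk + BLK, tail);	/* head of pen was sent */
	xor_into(x, pen, tail);			/* final partial plaintext */
	memcpy(dst + bulk + BLK, x, tail);

	toy_decrypt(key, pen, pen);		/* penultimate plaintext */
	xor_into(pen, chain, BLK);
	memcpy(dst + bulk, pen, BLK);
	return 0;
}
```

Note that both the encrypt and decrypt paths touch the penultimate
block and the trailing partial block in the same step. If a caller
hands those over in separate calls, the implementation has no way of
knowing which block to hold back, which is the constraint described
above.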