Re: [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt

2017-01-17 Thread Herbert Xu
On Tue, Jan 17, 2017 at 12:20:02PM +0100, Ondrej Mosnáček wrote:
> 2017-01-13 15:29 GMT+01:00 Herbert Xu :
> > What if the driver had hardware support for generating these IVs?
> > With your scheme this cannot be supported at all.
> 
> That's true... I'm starting to think that this isn't really a good
> idea. I was mainly trying to keep the door open for the random IV
> support and also to keep the multi-key stuff (which was really only
> intended for loop-AES partition support) out of the crypto API, but
> both of these can probably be solved in a better way...

As you said, the multi-key stuff is legacy-only, so I too would like
to see a way to keep that complexity out of the common path.

> > With such a definition you could either generate the IVs in dm-crypt
> > or have them generated in the IV generator.
> 
> That seems kind of hacky to me... but if that's what you prefer, then so be it.

I'm open to other proposals.  The basic requirement is to be able to
process multiple blocks as one entity at the driver level, potentially
generating the IVs there too.

It's essentially the equivalent of full IPsec offload.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/8] random: remove unused branch in hot code path

2017-01-17 Thread Theodore Ts'o
On Tue, Dec 27, 2016 at 11:40:23PM +0100, Stephan Müller wrote:
> The variable ip is defined to be a __u64 which is always 8 bytes on any
> architecture. Thus, the check for sizeof(ip) > 4 will always be true.
> 
> As the check happens in a hot code path, remove the branch.

The fact that it's a constant expression means the compiler will
optimize the branch out, so the fact that it's on the hot code path is
irrelevant.  The main issue is that on platforms with 32-bit
instruction pointers, ip >> 32 will always be zero.  It might be that
we can just do this via

#if BITS_PER_LONG == 32
  ...
#else
  ...  
#endif

I'm not sure that works for all platforms, though.  More research is
needed...

- Ted


Re: [PATCH 3/8] random: trigger random_ready callback upon crng_init == 1

2017-01-17 Thread Theodore Ts'o
On Tue, Dec 27, 2016 at 11:39:57PM +0100, Stephan Müller wrote:
> The random_ready callback mechanism is intended to replicate the
> getrandom system call behavior to in-kernel users. As the getrandom
> system call unblocks with crng_init == 1, trigger the random_ready
> wakeup call at the same time.

It was deliberate that random_ready would only get triggered with
crng_init==2.

In general I'm assuming kernel callers really want real randomness (as
opposed to using prandom), whereas there are a lot of b.s. userspace
users of kernel randomness (for things that really don't require
cryptographic randomness, e.g., salting Python dictionaries, or
systemd/udev using /dev/urandom for non-cryptographic, non-security
applications, etc.).

- Ted


[PATCH 00/10] crypto - AES for ARM/arm64 updates for v4.11 (round #2)

2017-01-17 Thread Ard Biesheuvel
Patch #1 is a fix for the CBC chaining issue that was discussed on the
mailing list. The driver itself is queued for v4.11, so this fix can go
right on top.

Patches #2 - #6 clear the cra_alignmask of various drivers: all
NEON-capable CPUs can perform unaligned accesses, and the advantage of
using the slightly faster aligned accessors (which exist only on ARM,
not arm64) is certainly outweighed by the cost of copying data to
suitably aligned buffers.

NOTE: patch #5 won't apply unless 'crypto: arm64/aes-blk - honour iv_out
requirement in CBC and CTR modes' is applied first, which was sent out
separately as a bugfix for v3.16 - v4.9. If this is a problem, this patch
can wait.

Patch #7 and #8 are minor tweaks to the new scalar AES code.

Patch #9 improves the performance of the plain NEON AES code, to make it
more suitable as a fallback for the new bitsliced NEON code, which can
only operate on 8 blocks in parallel, and needs another driver to perform
CBC encryption or XTS tweak generation.

Patch #10 updates the new bitsliced AES NEON code to switch to the plain
NEON driver as a fallback.

Patches #9 and #10 improve the performance of CBC encryption by ~35% on
low end cores such as the Cortex-A53 found in the Raspberry Pi 3.

Ard Biesheuvel (10):
  crypto: arm64/aes-neon-bs - honour iv_out requirement in CTR mode
  crypto: arm/aes-ce - remove cra_alignmask
  crypto: arm/chacha20 - remove cra_alignmask
  crypto: arm64/aes-ce-ccm - remove cra_alignmask
  crypto: arm64/aes-blk - remove cra_alignmask
  crypto: arm64/chacha20 - remove cra_alignmask
  crypto: arm64/aes - avoid literals for cross-module symbol references
  crypto: arm64/aes - performance tweak
  crypto: arm64/aes-neon-blk - tweak performance for low end cores
  crypto: arm64/aes - replace scalar fallback with plain NEON fallback

 arch/arm/crypto/aes-ce-core.S  |  84 -
 arch/arm/crypto/aes-ce-glue.c  |  15 +-
 arch/arm/crypto/chacha20-neon-glue.c   |   1 -
 arch/arm64/crypto/Kconfig  |   2 +-
 arch/arm64/crypto/aes-ce-ccm-glue.c|   1 -
 arch/arm64/crypto/aes-cipher-core.S|  59 +++---
 arch/arm64/crypto/aes-glue.c   |  18 +-
 arch/arm64/crypto/aes-modes.S  |   8 +-
 arch/arm64/crypto/aes-neon.S   | 199 
 arch/arm64/crypto/aes-neonbs-core.S|  25 ++-
 arch/arm64/crypto/aes-neonbs-glue.c|  38 +++-
 arch/arm64/crypto/chacha20-neon-glue.c |   1 -
 12 files changed, 199 insertions(+), 252 deletions(-)

-- 
2.7.4



[PATCH 03/10] crypto: arm/chacha20 - remove cra_alignmask

2017-01-17 Thread Ard Biesheuvel
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm/crypto/chacha20-neon-glue.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm/crypto/chacha20-neon-glue.c b/arch/arm/crypto/chacha20-neon-glue.c
index 592f75ae4fa1..59a7be08e80c 100644
--- a/arch/arm/crypto/chacha20-neon-glue.c
+++ b/arch/arm/crypto/chacha20-neon-glue.c
@@ -94,7 +94,6 @@ static struct skcipher_alg alg = {
.base.cra_priority  = 300,
.base.cra_blocksize = 1,
.base.cra_ctxsize   = sizeof(struct chacha20_ctx),
-   .base.cra_alignmask = 1,
.base.cra_module= THIS_MODULE,
 
.min_keysize= CHACHA20_KEY_SIZE,
-- 
2.7.4



[PATCH 01/10] crypto: arm64/aes-neon-bs - honour iv_out requirement in CTR mode

2017-01-17 Thread Ard Biesheuvel
Update the new bitsliced NEON AES implementation in CTR mode to return
the next IV back to the skcipher API client. This is necessary for
chaining to work correctly.

Note that this is only done if the request is a round multiple of the
block size, since otherwise, chaining is impossible anyway.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/aes-neonbs-core.S | 25 +---
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/crypto/aes-neonbs-core.S b/arch/arm64/crypto/aes-neonbs-core.S
index 8d0cdaa2768d..2ada12dd768e 100644
--- a/arch/arm64/crypto/aes-neonbs-core.S
+++ b/arch/arm64/crypto/aes-neonbs-core.S
@@ -874,12 +874,19 @@ CPU_LE(   rev x8, x8  )
	csel	x4, x4, xzr, pl
	csel	x9, x9, xzr, le

+	tbnz	x9, #1, 0f
	next_ctr	v1
+	tbnz	x9, #2, 0f
	next_ctr	v2
+	tbnz	x9, #3, 0f
	next_ctr	v3
+	tbnz	x9, #4, 0f
	next_ctr	v4
+	tbnz	x9, #5, 0f
	next_ctr	v5
+	tbnz	x9, #6, 0f
	next_ctr	v6
+	tbnz	x9, #7, 0f
	next_ctr	v7

0:	mov	bskey, x2
@@ -928,11 +935,11 @@ CPU_LE(   rev x8, x8  )
eor v5.16b, v5.16b, v15.16b
st1 {v5.16b}, [x0], #16
 
-	next_ctr	v0
+8:	next_ctr	v0
	cbnz	x4, 99b
 
 0: st1 {v0.16b}, [x5]
-8: ldp x29, x30, [sp], #16
+9: ldp x29, x30, [sp], #16
ret
 
/*
@@ -941,23 +948,23 @@ CPU_LE(   rev x8, x8  )
 */
 1: cbz x6, 8b
st1 {v1.16b}, [x5]
-   b   8b
+   b   9b
 2: cbz x6, 8b
st1 {v4.16b}, [x5]
-   b   8b
+   b   9b
 3: cbz x6, 8b
st1 {v6.16b}, [x5]
-   b   8b
+   b   9b
 4: cbz x6, 8b
st1 {v3.16b}, [x5]
-   b   8b
+   b   9b
 5: cbz x6, 8b
st1 {v7.16b}, [x5]
-   b   8b
+   b   9b
 6: cbz x6, 8b
st1 {v2.16b}, [x5]
-   b   8b
+   b   9b
 7: cbz x6, 8b
st1 {v5.16b}, [x5]
-   b   8b
+   b   9b
 ENDPROC(aesbs_ctr_encrypt)
-- 
2.7.4



[PATCH 07/10] crypto: arm64/aes - avoid literals for cross-module symbol references

2017-01-17 Thread Ard Biesheuvel
Using simple adrp/add pairs to refer to the AES lookup tables exposed by
the generic AES driver (which could be loaded far away from this driver
when KASLR is in effect) was unreliable at module load time before commit
41c066f2c4d4 ("arm64: assembler: make adr_l work in modules under KASLR"),
which is why the AES code used literals instead.

So now we can get rid of the literals, and switch to the adr_l macro.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/aes-cipher-core.S | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/crypto/aes-cipher-core.S b/arch/arm64/crypto/aes-cipher-core.S
index 37590ab8121a..cd58c61e6677 100644
--- a/arch/arm64/crypto/aes-cipher-core.S
+++ b/arch/arm64/crypto/aes-cipher-core.S
@@ -89,8 +89,8 @@ CPU_BE(   rev w8, w8  )
eor w7, w7, w11
eor w8, w8, w12
 
-   ldr tt, =\ttab
-   ldr lt, =\ltab
+   adr_l   tt, \ttab
+   adr_l   lt, \ltab
 
	tbnz	rounds, #1, 1f
 
@@ -111,9 +111,6 @@ CPU_BE( rev w8, w8  )
stp w5, w6, [out]
stp w7, w8, [out, #8]
ret
-
-   .align  4
-   .ltorg
.endm
 
.align  5
-- 
2.7.4



[PATCH 10/10] crypto: arm64/aes - replace scalar fallback with plain NEON fallback

2017-01-17 Thread Ard Biesheuvel
The new bitsliced NEON implementation of AES uses a fallback in two
places: CBC encryption (which is strictly sequential, whereas this
driver can only operate efficiently on 8 blocks at a time), and the
XTS tweak generation, which involves encrypting a single AES block
with a different key schedule.

The plain (i.e., non-bitsliced) NEON code is more suitable as a fallback,
given that it is faster than scalar on low end cores (which is what
the NEON implementations target, since high end cores have dedicated
instructions for AES), and shows similar behavior in terms of D-cache
footprint and sensitivity to cache timing attacks. So switch the fallback
handling to the plain NEON driver.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/Kconfig   |  2 +-
 arch/arm64/crypto/aes-neonbs-glue.c | 38 ++--
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 5de75c3dcbd4..bed7feddfeed 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -86,7 +86,7 @@ config CRYPTO_AES_ARM64_BS
tristate "AES in ECB/CBC/CTR/XTS modes using bit-sliced NEON algorithm"
depends on KERNEL_MODE_NEON
select CRYPTO_BLKCIPHER
-   select CRYPTO_AES_ARM64
+   select CRYPTO_AES_ARM64_NEON_BLK
select CRYPTO_SIMD
 
 endif
diff --git a/arch/arm64/crypto/aes-neonbs-glue.c b/arch/arm64/crypto/aes-neonbs-glue.c
index 323dd76ae5f0..863e436ecf89 100644
--- a/arch/arm64/crypto/aes-neonbs-glue.c
+++ b/arch/arm64/crypto/aes-neonbs-glue.c
@@ -10,7 +10,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -42,7 +41,12 @@ asmlinkage void aesbs_xts_encrypt(u8 out[], u8 const in[], u8 const rk[],
 asmlinkage void aesbs_xts_decrypt(u8 out[], u8 const in[], u8 const rk[],
  int rounds, int blocks, u8 iv[]);
 
-asmlinkage void __aes_arm64_encrypt(u32 *rk, u8 *out, const u8 *in, int rounds);
+/* borrowed from aes-neon-blk.ko */
+asmlinkage void neon_aes_ecb_encrypt(u8 out[], u8 const in[], u32 const rk[],
+int rounds, int blocks, int first);
+asmlinkage void neon_aes_cbc_encrypt(u8 out[], u8 const in[], u32 const rk[],
+int rounds, int blocks, u8 iv[],
+int first);
 
 struct aesbs_ctx {
u8  rk[13 * (8 * AES_BLOCK_SIZE) + 32];
@@ -140,16 +144,28 @@ static int aesbs_cbc_setkey(struct crypto_skcipher *tfm, const u8 *in_key,
return 0;
 }
 
-static void cbc_encrypt_one(struct crypto_skcipher *tfm, const u8 *src, u8 *dst)
+static int cbc_encrypt(struct skcipher_request *req)
 {
+   struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct aesbs_cbc_ctx *ctx = crypto_skcipher_ctx(tfm);
+   struct skcipher_walk walk;
+   int err, first = 1;
 
-   __aes_arm64_encrypt(ctx->enc, dst, src, ctx->key.rounds);
-}
+	err = skcipher_walk_virt(&walk, req, true);
 
-static int cbc_encrypt(struct skcipher_request *req)
-{
-   return crypto_cbc_encrypt_walk(req, cbc_encrypt_one);
+   kernel_neon_begin();
+   while (walk.nbytes >= AES_BLOCK_SIZE) {
+   unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
+
+   /* fall back to the non-bitsliced NEON implementation */
+   neon_aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
+ctx->enc, ctx->key.rounds, blocks, walk.iv,
+first);
+	err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
+   first = 0;
+   }
+   kernel_neon_end();
+   return err;
 }
 
 static int cbc_decrypt(struct skcipher_request *req)
@@ -254,9 +270,11 @@ static int __xts_crypt(struct skcipher_request *req,
 
	err = skcipher_walk_virt(&walk, req, true);
 
-   __aes_arm64_encrypt(ctx->twkey, walk.iv, walk.iv, ctx->key.rounds);
-
kernel_neon_begin();
+
+   neon_aes_ecb_encrypt(walk.iv, walk.iv, ctx->twkey,
+ctx->key.rounds, 1, 1);
+
while (walk.nbytes >= AES_BLOCK_SIZE) {
unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
 
-- 
2.7.4



[PATCH 05/10] crypto: arm64/aes-blk - remove cra_alignmask

2017-01-17 Thread Ard Biesheuvel
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.

Signed-off-by: Ard Biesheuvel 
---
NOTE: this won't apply unless 'crypto: arm64/aes-blk - honour iv_out
requirement in CBC and CTR modes' is applied first, which was sent out
separately as a bugfix for v3.16 - v4.9

 arch/arm64/crypto/aes-glue.c  | 16 ++--
 arch/arm64/crypto/aes-modes.S |  8 +++-
 2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c
index 5164aaf82c6a..8ee1fb7aaa4f 100644
--- a/arch/arm64/crypto/aes-glue.c
+++ b/arch/arm64/crypto/aes-glue.c
@@ -215,14 +215,15 @@ static int ctr_encrypt(struct skcipher_request *req)
u8 *tsrc = walk.src.virt.addr;
 
/*
-* Minimum alignment is 8 bytes, so if nbytes is <= 8, we need
-* to tell aes_ctr_encrypt() to only read half a block.
+* Tell aes_ctr_encrypt() to process a tail block.
 */
-   blocks = (nbytes <= 8) ? -1 : 1;
+   blocks = -1;
 
-   aes_ctr_encrypt(tail, tsrc, (u8 *)ctx->key_enc, rounds,
+   aes_ctr_encrypt(tail, NULL, (u8 *)ctx->key_enc, rounds,
blocks, walk.iv, first);
-   memcpy(tdst, tail, nbytes);
+   if (tdst != tsrc)
+   memcpy(tdst, tsrc, nbytes);
+   crypto_xor(tdst, tail, nbytes);
	err = skcipher_walk_done(&walk, 0);
}
kernel_neon_end();
@@ -282,7 +283,6 @@ static struct skcipher_alg aes_algs[] = { {
.cra_flags  = CRYPTO_ALG_INTERNAL,
.cra_blocksize  = AES_BLOCK_SIZE,
.cra_ctxsize= sizeof(struct crypto_aes_ctx),
-   .cra_alignmask  = 7,
.cra_module = THIS_MODULE,
},
.min_keysize= AES_MIN_KEY_SIZE,
@@ -298,7 +298,6 @@ static struct skcipher_alg aes_algs[] = { {
.cra_flags  = CRYPTO_ALG_INTERNAL,
.cra_blocksize  = AES_BLOCK_SIZE,
.cra_ctxsize= sizeof(struct crypto_aes_ctx),
-   .cra_alignmask  = 7,
.cra_module = THIS_MODULE,
},
.min_keysize= AES_MIN_KEY_SIZE,
@@ -315,7 +314,6 @@ static struct skcipher_alg aes_algs[] = { {
.cra_flags  = CRYPTO_ALG_INTERNAL,
.cra_blocksize  = 1,
.cra_ctxsize= sizeof(struct crypto_aes_ctx),
-   .cra_alignmask  = 7,
.cra_module = THIS_MODULE,
},
.min_keysize= AES_MIN_KEY_SIZE,
@@ -332,7 +330,6 @@ static struct skcipher_alg aes_algs[] = { {
.cra_priority   = PRIO - 1,
.cra_blocksize  = 1,
.cra_ctxsize= sizeof(struct crypto_aes_ctx),
-   .cra_alignmask  = 7,
.cra_module = THIS_MODULE,
},
.min_keysize= AES_MIN_KEY_SIZE,
@@ -350,7 +347,6 @@ static struct skcipher_alg aes_algs[] = { {
.cra_flags  = CRYPTO_ALG_INTERNAL,
.cra_blocksize  = AES_BLOCK_SIZE,
.cra_ctxsize= sizeof(struct crypto_aes_xts_ctx),
-   .cra_alignmask  = 7,
.cra_module = THIS_MODULE,
},
.min_keysize= 2 * AES_MIN_KEY_SIZE,
diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S
index 838dad5c209f..92b982a8b112 100644
--- a/arch/arm64/crypto/aes-modes.S
+++ b/arch/arm64/crypto/aes-modes.S
@@ -337,7 +337,7 @@ AES_ENTRY(aes_ctr_encrypt)
 
 .Lctrcarrydone:
	subs	w4, w4, #1
-	bmi	.Lctrhalfblock		/* blocks < 0 means 1/2 block */
+	bmi	.Lctrtailblock		/* blocks < 0 means tail block */
ld1 {v3.16b}, [x1], #16
eor v3.16b, v0.16b, v3.16b
st1 {v3.16b}, [x0], #16
@@ -348,10 +348,8 @@ AES_ENTRY(aes_ctr_encrypt)
FRAME_POP
ret
 
-.Lctrhalfblock:
-   ld1 {v3.8b}, [x1]
-   eor v3.8b, v0.8b, v3.8b
-   st1 {v3.8b}, [x0]
+.Lctrtailblock:
+   st1 {v0.16b}, [x0]
FRAME_POP
ret
 
-- 
2.7.4



[PATCH 08/10] crypto: arm64/aes - performance tweak

2017-01-17 Thread Ard Biesheuvel
Shuffle some instructions around in the __hround macro to shave off
0.1 cycles per byte on Cortex-A57.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/aes-cipher-core.S | 52 +++-
 1 file changed, 19 insertions(+), 33 deletions(-)

diff --git a/arch/arm64/crypto/aes-cipher-core.S b/arch/arm64/crypto/aes-cipher-core.S
index cd58c61e6677..f2f9cc519309 100644
--- a/arch/arm64/crypto/aes-cipher-core.S
+++ b/arch/arm64/crypto/aes-cipher-core.S
@@ -20,46 +20,32 @@
	tt	.req	x4
	lt	.req	x2

-	.macro	__hround, out0, out1, in0, in1, in2, in3, t0, t1, enc
-	ldp	\out0, \out1, [rk], #8
-
-	ubfx	w13, \in0, #0, #8
-	ubfx	w14, \in1, #8, #8
-	ldr	w13, [tt, w13, uxtw #2]
-	ldr	w14, [tt, w14, uxtw #2]
-
+	.macro	__pair, enc, reg0, reg1, in0, in1e, in1d, shift
+	ubfx	\reg0, \in0, #\shift, #8
	.if \enc
-	ubfx	w17, \in1, #0, #8
-	ubfx	w18, \in2, #8, #8
+	ubfx	\reg1, \in1e, #\shift, #8
	.else
-	ubfx	w17, \in3, #0, #8
-	ubfx	w18, \in0, #8, #8
+	ubfx	\reg1, \in1d, #\shift, #8
	.endif
-	ldr	w17, [tt, w17, uxtw #2]
-	ldr	w18, [tt, w18, uxtw #2]
+	ldr	\reg0, [tt, \reg0, uxtw #2]
+	ldr	\reg1, [tt, \reg1, uxtw #2]
+	.endm

-	ubfx	w15, \in2, #16, #8
-	ubfx	w16, \in3, #24, #8
-	ldr	w15, [tt, w15, uxtw #2]
-	ldr	w16, [tt, w16, uxtw #2]
+	.macro	__hround, out0, out1, in0, in1, in2, in3, t0, t1, enc
+	ldp	\out0, \out1, [rk], #8

-	.if \enc
-	ubfx	\t0, \in3, #16, #8
-	ubfx	\t1, \in0, #24, #8
-	.else
-	ubfx	\t0, \in1, #16, #8
-	ubfx	\t1, \in2, #24, #8
-	.endif
-	ldr	\t0, [tt, \t0, uxtw #2]
-	ldr	\t1, [tt, \t1, uxtw #2]
+	__pair	\enc, w13, w14, \in0, \in1, \in3, 0
+	__pair	\enc, w15, w16, \in1, \in2, \in0, 8
+	__pair	\enc, w17, w18, \in2, \in3, \in1, 16
+	__pair	\enc, \t0, \t1, \in3, \in0, \in2, 24

	eor	\out0, \out0, w13
-	eor	\out1, \out1, w17
-	eor	\out0, \out0, w14, ror #24
-	eor	\out1, \out1, w18, ror #24
-	eor	\out0, \out0, w15, ror #16
-	eor	\out1, \out1, \t0, ror #16
-	eor	\out0, \out0, w16, ror #8
+	eor	\out1, \out1, w14
+	eor	\out0, \out0, w15, ror #24
+	eor	\out1, \out1, w16, ror #24
+	eor	\out0, \out0, w17, ror #16
+	eor	\out1, \out1, w18, ror #16
+	eor	\out0, \out0, \t0, ror #8
	eor	\out1, \out1, \t1, ror #8
	.endm
 
-- 
2.7.4



[PATCH 09/10] crypto: arm64/aes-neon-blk - tweak performance for low end cores

2017-01-17 Thread Ard Biesheuvel
The non-bitsliced AES implementation using the NEON is highly sensitive
to micro-architectural details, and, as it turns out, the Cortex-A53 on
the Raspberry Pi 3 is a core that can benefit from this code, given that
its scalar AES performance is abysmal (32.9 cycles per byte).

The new bitsliced AES code manages 19.8 cycles per byte on this core,
but can only operate on 8 blocks at a time, which is not supported by
all chaining modes. With a bit of tweaking, we can get the plain NEON
code to run at 24.0 cycles per byte, making it useful for sequential
modes like CBC encryption. (Like bitsliced NEON, the plain NEON
implementation does not use any lookup tables, which makes it easy on
the D-cache, and invulnerable to cache timing attacks)

So tweak the plain NEON AES code to use tbl instructions rather than
shl/sri pairs, and to avoid the need to reload permutation vectors or
other constants from memory in every round.

To allow the ECB and CBC encrypt routines to be reused by the bitsliced
NEON code in a subsequent patch, export them from the module.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/aes-glue.c |   2 +
 arch/arm64/crypto/aes-neon.S | 199 
 2 files changed, 77 insertions(+), 124 deletions(-)

diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c
index 8ee1fb7aaa4f..055bc3f61138 100644
--- a/arch/arm64/crypto/aes-glue.c
+++ b/arch/arm64/crypto/aes-glue.c
@@ -409,5 +409,7 @@ static int __init aes_init(void)
 module_cpu_feature_match(AES, aes_init);
 #else
 module_init(aes_init);
+EXPORT_SYMBOL(neon_aes_ecb_encrypt);
+EXPORT_SYMBOL(neon_aes_cbc_encrypt);
 #endif
 module_exit(aes_exit);
diff --git a/arch/arm64/crypto/aes-neon.S b/arch/arm64/crypto/aes-neon.S
index 85f07ead7c5c..67c68462bc20 100644
--- a/arch/arm64/crypto/aes-neon.S
+++ b/arch/arm64/crypto/aes-neon.S
@@ -1,7 +1,7 @@
 /*
  * linux/arch/arm64/crypto/aes-neon.S - AES cipher for ARMv8 NEON
  *
- * Copyright (C) 2013 Linaro Ltd 
+ * Copyright (C) 2013 - 2017 Linaro Ltd. 
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -25,9 +25,9 @@
/* preload the entire Sbox */
.macro  prepare, sbox, shiftrows, temp
adr \temp, \sbox
-   moviv12.16b, #0x40
+   moviv12.16b, #0x1b
ldr q13, \shiftrows
-   moviv14.16b, #0x1b
+   ldr q14, .Lror32by8
ld1 {v16.16b-v19.16b}, [\temp], #64
ld1 {v20.16b-v23.16b}, [\temp], #64
ld1 {v24.16b-v27.16b}, [\temp], #64
@@ -50,37 +50,33 @@
 
	/* apply SubBytes transformation using the preloaded Sbox */
.macro  sub_bytes, in
-   sub v9.16b, \in\().16b, v12.16b
+   sub v9.16b, \in\().16b, v15.16b
tbl \in\().16b, {v16.16b-v19.16b}, \in\().16b
-   sub v10.16b, v9.16b, v12.16b
+   sub v10.16b, v9.16b, v15.16b
tbx \in\().16b, {v20.16b-v23.16b}, v9.16b
-   sub v11.16b, v10.16b, v12.16b
+   sub v11.16b, v10.16b, v15.16b
tbx \in\().16b, {v24.16b-v27.16b}, v10.16b
tbx \in\().16b, {v28.16b-v31.16b}, v11.16b
.endm
 
/* apply MixColumns transformation */
-   .macro  mix_columns, in
-	mul_by_x	v10.16b, \in\().16b, v9.16b, v14.16b
-   rev32   v8.8h, \in\().8h
-   eor \in\().16b, v10.16b, \in\().16b
-   shl v9.4s, v8.4s, #24
-   shl v11.4s, \in\().4s, #24
-   sri v9.4s, v8.4s, #8
-   sri v11.4s, \in\().4s, #8
-   eor v9.16b, v9.16b, v8.16b
-   eor v10.16b, v10.16b, v9.16b
-   eor \in\().16b, v10.16b, v11.16b
-   .endm
-
+   .macro  mix_columns, in, enc
+   .if \enc == 0
/* Inverse MixColumns: pre-multiply by { 5, 0, 4, 0 } */
-   .macro  inv_mix_columns, in
-	mul_by_x	v11.16b, \in\().16b, v10.16b, v14.16b
-	mul_by_x	v11.16b, v11.16b, v10.16b, v14.16b
+	mul_by_x	v11.16b, \in\().16b, v10.16b, v12.16b
+	mul_by_x	v11.16b, v11.16b, v10.16b, v12.16b
eor \in\().16b, \in\().16b, v11.16b
rev32   v11.8h, v11.8h
eor \in\().16b, \in\().16b, v11.16b
-   mix_columns \in
+   .endif
+
+	mul_by_x	v10.16b, \in\().16b, v9.16b, v12.16b
+   rev32   v8.8h, \in\().8h
+   eor \in\().16b, \in\().16b, v10.16b
+   eor v10.16b, v10.16b, v8.16b
+   eor v11.16b, \in\().16b, v8.16b
+   tbl v11.16b, {v11.16b}, v14.16b
+   

[PATCH 04/10] crypto: arm64/aes-ce-ccm - remove cra_alignmask

2017-01-17 Thread Ard Biesheuvel
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/aes-ce-ccm-glue.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
index cc5515dac74a..6a7dbc7c83a6 100644
--- a/arch/arm64/crypto/aes-ce-ccm-glue.c
+++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
@@ -258,7 +258,6 @@ static struct aead_alg ccm_aes_alg = {
.cra_priority   = 300,
.cra_blocksize  = 1,
.cra_ctxsize= sizeof(struct crypto_aes_ctx),
-   .cra_alignmask  = 7,
.cra_module = THIS_MODULE,
},
.ivsize = AES_BLOCK_SIZE,
-- 
2.7.4



[PATCH 06/10] crypto: arm64/chacha20 - remove cra_alignmask

2017-01-17 Thread Ard Biesheuvel
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/chacha20-neon-glue.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm64/crypto/chacha20-neon-glue.c b/arch/arm64/crypto/chacha20-neon-glue.c
index a7f2337d46cf..a7cd575ea223 100644
--- a/arch/arm64/crypto/chacha20-neon-glue.c
+++ b/arch/arm64/crypto/chacha20-neon-glue.c
@@ -93,7 +93,6 @@ static struct skcipher_alg alg = {
.base.cra_priority  = 300,
.base.cra_blocksize = 1,
.base.cra_ctxsize   = sizeof(struct chacha20_ctx),
-   .base.cra_alignmask = 1,
.base.cra_module= THIS_MODULE,
 
.min_keysize= CHACHA20_KEY_SIZE,
-- 
2.7.4



[PATCH] crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modes

2017-01-17 Thread Ard Biesheuvel
Update the ARMv8 Crypto Extensions and the plain NEON AES implementations
in CBC and CTR modes to return the next IV back to the skcipher API client.
This is necessary for chaining to work correctly.

Note that for CTR, this is only done if the request is a round multiple of
the block size, since otherwise, chaining is impossible anyway.

Cc:  # v3.16+
Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/aes-modes.S | 88 ++--
 1 file changed, 42 insertions(+), 46 deletions(-)

diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S
index c53dbeae79f2..838dad5c209f 100644
--- a/arch/arm64/crypto/aes-modes.S
+++ b/arch/arm64/crypto/aes-modes.S
@@ -193,15 +193,16 @@ AES_ENTRY(aes_cbc_encrypt)
cbz w6, .Lcbcencloop
 
ld1 {v0.16b}, [x5]  /* get iv */
-   enc_prepare w3, x2, x5
+   enc_prepare w3, x2, x6
 
 .Lcbcencloop:
ld1 {v1.16b}, [x1], #16 /* get next pt block */
eor v0.16b, v0.16b, v1.16b  /* ..and xor with iv */
-   encrypt_block   v0, w3, x2, x5, w6
+   encrypt_block   v0, w3, x2, x6, w7
st1 {v0.16b}, [x0], #16
	subs	w4, w4, #1
bne .Lcbcencloop
+   st1 {v0.16b}, [x5]  /* return iv */
ret
 AES_ENDPROC(aes_cbc_encrypt)
 
@@ -211,7 +212,7 @@ AES_ENTRY(aes_cbc_decrypt)
cbz w6, .LcbcdecloopNx
 
ld1 {v7.16b}, [x5]  /* get iv */
-   dec_prepare w3, x2, x5
+   dec_prepare w3, x2, x6
 
 .LcbcdecloopNx:
 #if INTERLEAVE >= 2
@@ -248,7 +249,7 @@ AES_ENTRY(aes_cbc_decrypt)
 .Lcbcdecloop:
ld1 {v1.16b}, [x1], #16 /* get next ct block */
mov v0.16b, v1.16b  /* ...and copy to v0 */
-   decrypt_block   v0, w3, x2, x5, w6
+   decrypt_block   v0, w3, x2, x6, w7
eor v0.16b, v0.16b, v7.16b  /* xor with iv => pt */
mov v7.16b, v1.16b  /* ct is next iv */
st1 {v0.16b}, [x0], #16
@@ -256,6 +257,7 @@ AES_ENTRY(aes_cbc_decrypt)
bne .Lcbcdecloop
 .Lcbcdecout:
FRAME_POP
+   st1 {v7.16b}, [x5]  /* return iv */
ret
 AES_ENDPROC(aes_cbc_decrypt)
 
@@ -267,24 +269,15 @@ AES_ENDPROC(aes_cbc_decrypt)
 
 AES_ENTRY(aes_ctr_encrypt)
FRAME_PUSH
-	cbnz	w6, .Lctrfirst		/* 1st time around? */
-	umov	x5, v4.d[1]		/* keep swabbed ctr in reg */
-	rev	x5, x5
-#if INTERLEAVE >= 2
-   cmn w5, w4  /* 32 bit overflow? */
-   bcs .Lctrinc
-   add x5, x5, #1  /* increment BE ctr */
-   b   .LctrincNx
-#else
-   b   .Lctrinc
-#endif
-.Lctrfirst:
+   cbz w6, .Lctrnotfirst   /* 1st time around? */
enc_prepare w3, x2, x6
ld1 {v4.16b}, [x5]
-	umov	x5, v4.d[1]		/* keep swabbed ctr in reg */
-	rev	x5, x5
+
+.Lctrnotfirst:
+	umov	x8, v4.d[1]		/* keep swabbed ctr in reg */
+	rev	x8, x8
 #if INTERLEAVE >= 2
-   cmn w5, w4  /* 32 bit overflow? */
+   cmn w8, w4  /* 32 bit overflow? */
bcs .Lctrloop
 .LctrloopNx:
	subs	w4, w4, #INTERLEAVE
@@ -292,11 +285,11 @@ AES_ENTRY(aes_ctr_encrypt)
 #if INTERLEAVE == 2
mov v0.8b, v4.8b
mov v1.8b, v4.8b
-   rev x7, x5
-   add x5, x5, #1
+   rev x7, x8
+   add x8, x8, #1
ins v0.d[1], x7
-   rev x7, x5
-   add x5, x5, #1
+   rev x7, x8
+   add x8, x8, #1
ins v1.d[1], x7
ld1 {v2.16b-v3.16b}, [x1], #32  /* get 2 input blocks */
do_encrypt_block2x
@@ -305,7 +298,7 @@ AES_ENTRY(aes_ctr_encrypt)
st1 {v0.16b-v1.16b}, [x0], #32
 #else
ldr q8, =0x300020001/* addends 1,2,3[,0] */
-   dup v7.4s, w5
+   dup v7.4s, w8
mov v0.16b, v4.16b
add v7.4s, v7.4s, v8.4s
mov v1.16b, v4.16b
@@ -323,18 +316,12 @@ AES_ENTRY(aes_ctr_encrypt)
eor v2.16b, v7.16b, v2.16b
eor v3.16b, v5.16b, v3.16b
st1 {v0.16b-v3.16b}, [x0], #64
-   add x5, x5, #INTERLEAVE
+   add x8, x8, #INTERLEAVE
 #endif
-   cbz w4, .LctroutNx
-.LctrincNx:
-   rev 

Re: [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt

2017-01-17 Thread Ondrej Mosnáček
2017-01-13 15:29 GMT+01:00 Herbert Xu :
> What if the driver had hardware support for generating these IVs?
> With your scheme this cannot be supported at all.

That's true... I'm starting to think that this isn't really a good
idea. I was mainly trying to keep the door open for the random IV
support and also to keep the multi-key stuff (which was really only
intended for loop-AES partition support) out of the crypto API, but
both of these can probably be solved in a better way...

> Getting the IVs back is not actually that hard.  We could simply
> change the algorithm definition for the IV generator so that
> the IVs are embedded in the plaintext and ciphertext.  For
> example, you could declare it so that the for n sectors the
> first n*ivsize bytes would be the IV, and the actual plaintext
> or ciphertext would follow.
>
> With such a definition you could either generate the IVs in dm-crypt
> or have them generated in the IV generator.

That seems kind of hacky to me... but if that's what you prefer, then so be it.

Cheers,
Ondrej

>
> Cheers,
> --
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 6/6] dm-crypt: Add bulk crypto processing support

2017-01-17 Thread Ondrej Mosnáček
Hi Binoy,

2017-01-16 9:37 GMT+01:00 Binoy Jayan :
> The initial goal of our proposal was to process the encryption requests
> with the maximum possible block sizes on hardware which has automated IV
> generation capabilities. But when it is done in software, and if the bulk
> requests are processed sequentially, one block at a time, the memory
> footprint could be reduced even if the bulk request exceeds a page. While
> your patch looks good, there are a couple of drawbacks, one of which is
> that the maximum size of a bulk request is a page. This could limit the
> capability of the crypto hardware. If the whole bio is processed at once,
> which is what Qualcomm's version of dm-req-crypt does, it achieves an
> even better performance.

I see... well, I added the limit only so that the async fallback
implementation can allocate multiple requests, so they can be
processed in parallel, as they would be in the current dm-crypt code.
I'm not really sure if that brings any benefit, but I guess if some HW
accelerator has multiple engines, then this allows distributing the
work among them. (I wonder how switching to the crypto API's IV
generation will affect the situation for drivers that can process
requests in parallel, but do not support the IV generators...)

I could remove the limit and switch the fallback to sequential
processing (or maybe even allocate the requests from a mempool, the
way dm-crypt does it now...), but after Herbert's feedback I'm
probably going to scrap this patchset anyway...

>> Note that if the 'keycount' parameter of the cipher specification is set
>> to a value other than 1, dm-crypt still sends only one sector in each
>> request, since in that case the neighboring sectors are encrypted with
>> different keys.
>
> This could be avoided if the key management is done at the crypto layer.

Yes, but remember that the only reasonable use-case for using keycount
!= 1 is mounting loop-AES partitions (which is kind of a legacy
format, so there is not much point in making HW drivers for it). It is
an unfortunate consequence of Milan's decision to make keycount an
independent part of the cipher specification (instead of making it
specific to the LMK mode) that all the other IV modes are now
'polluted' with the requirement to support it.

I discussed with Milan the possibility of deprecating the keycount
parameter (i.e. allowing only a value of 64 for LMK and 1 for all the
other IV modes) and then converting the IV modes to skciphers (or IV
generators, or some combination of both). This would significantly
simplify the key management and allow for better optimization
strategies. However, I don't know if such change would be accepted by
device-mapper maintainers, since it may break someone's unusual
dm-crypt configuration...

Cheers,
Ondrej


Re: [PATCH] crypto: generic/cts - fix regression in iv handling

2017-01-17 Thread Herbert Xu
On Tue, Jan 17, 2017 at 09:20:11AM +, Ard Biesheuvel wrote:
> 
> So to be clear, it is part of the API that after calling
> crypto_skcipher_encrypt(req), and completing the request, req->iv
> should contain a value that could potentially be used to encrypt
> additional data? That sounds highly specific to CBC (e.g., this could
> never work with XTS, since the tweak generation is only performed
> once), so it does not make sense for skciphers in general. For
> instance, drivers for h/w peripherals that never need to map the data
> to begin with (since they only pass the physical addresses to the
> hardware) will need to explicitly map the destination buffer to
> retrieve those bytes, on the off chance that the transform may be
> wrapped by CTS.

Yes this is part of the API.  There was a patch to test this in
testmgr but I wanted to give the drivers some more time before
adding it.

It isn't just CBC that uses chaining.  Other modes such as CTR
use it too.  Disk encryption in general doesn't use chaining, but
that's because it is sector-oriented.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: generic/cts - fix regression in iv handling

2017-01-17 Thread Herbert Xu
On Tue, Jan 17, 2017 at 09:30:30AM +, Ard Biesheuvel wrote:
>
> Got a link?

http://lkml.iu.edu/hypermail/linux/kernel/1506.2/00346.html

> OK, so that means chaining skcipher_request_set_crypt() calls, where req->iv
> is passed on between requests? Are there chaining modes beyond
> cts(cbc) encryption that rely on this?

I think algif_skcipher relies on this too.

> It is easily fixed in the chaining mode code, so I'm perfectly happy
> to fix it there instead, but I'd like to understand the requirements
> exactly before doing so.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: generic/cts - fix regression in iv handling

2017-01-17 Thread Herbert Xu
On Mon, Jan 16, 2017 at 09:16:35AM +, Ard Biesheuvel wrote:
> Since the skcipher conversion in commit 0605c41cc53c ("crypto:
> cts - Convert to skcipher"), the cts code tacitly assumes that
> the underlying CBC encryption transform performed on the first
> part of the plaintext returns an IV in req->iv that is suitable
> for encrypting the final bit.
> 
> While this is usually the case, it is not mandated by the API, and
> given that the CTS code already accesses the ciphertext scatterlist
> to retrieve those bytes, we can simply copy them into req->iv before
> proceeding.

Ugh, while there are some legacy drivers that break this, it is certainly
part of the API.

Which underlying CBC implementation is breaking this?

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: generic/cts - fix regression in iv handling

2017-01-17 Thread Ard Biesheuvel
On 17 January 2017 at 09:25, Herbert Xu  wrote:
> On Tue, Jan 17, 2017 at 09:20:11AM +, Ard Biesheuvel wrote:
>>
>> So to be clear, it is part of the API that after calling
>> crypto_skcipher_encrypt(req), and completing the request, req->iv
>> should contain a value that could potentially be used to encrypt
>> additional data? That sounds highly specific to CBC (e.g., this could
>> never work with XTS, since the tweak generation is only performed
>> once), so it does not make sense for skciphers in general. For
>> instance, drivers for h/w peripherals that never need to map the data
>> to begin with (since they only pass the physical addresses to the
>> hardware) will need to explicitly map the destination buffer to
>> retrieve those bytes, on the off chance that the transform may be
>> wrapped by CTS.
>
> Yes this is part of the API.  There was a patch to test this in
> testmgr but I wanted to give the drivers some more time before
> adding it.
>

Got a link?

> It isn't just CBC that uses chaining.  Other modes such as CTR
> use it too.  Disk encryption in general doesn't use chaining, but
> that's because it is sector-oriented.
>

OK, so that means chaining skcipher_request_set_crypt() calls, where req->iv
is passed on between requests? Are there chaining modes beyond
cts(cbc) encryption that rely on this?

It is easily fixed in the chaining mode code, so I'm perfectly happy
to fix it there instead, but I'd like to understand the requirements
exactly before doing so.


Re: [PATCH] crypto: generic/cts - fix regression in iv handling

2017-01-17 Thread Ard Biesheuvel
On 17 January 2017 at 09:11, Herbert Xu  wrote:
> On Mon, Jan 16, 2017 at 09:16:35AM +, Ard Biesheuvel wrote:
>> Since the skcipher conversion in commit 0605c41cc53c ("crypto:
>> cts - Convert to skcipher"), the cts code tacitly assumes that
>> the underlying CBC encryption transform performed on the first
>> part of the plaintext returns an IV in req->iv that is suitable
>> for encrypting the final bit.
>>
>> While this is usually the case, it is not mandated by the API, and
>> given that the CTS code already accesses the ciphertext scatterlist
>> to retrieve those bytes, we can simply copy them into req->iv before
>> proceeding.
>
> Ugh while there are some legacy drivers that break this is certainly
> part of the API.
>
> Which underlying CBC implementation is breaking this?
>

arch/arm64/crypto/aes-modes.S does not return the IV back to the
caller, so cts(cbc-aes-ce) is currently broken.

So to be clear, it is part of the API that after calling
crypto_skcipher_encrypt(req), and completing the request, req->iv
should contain a value that could potentially be used to encrypt
additional data? That sounds highly specific to CBC (e.g., this could
never work with XTS, since the tweak generation is only performed
once), so it does not make sense for skciphers in general. For
instance, drivers for h/w peripherals that never need to map the data
to begin with (since they only pass the physical addresses to the
hardware) will need to explicitly map the destination buffer to
retrieve those bytes, on the off chance that the transform may be
wrapped by CTS.