Re: BUG: libkcapi tests trigger sleep-in-atomic bug in VMX code (ppc64)

2018-08-21 Thread Ondrej Mosnáček
On Tue, 21 Aug 2018 at 16:18, Stephan Mueller  wrote:
> On Tuesday, 21 August 2018, 14:48:11 CEST, Ondrej Mosnáček wrote:
>
> Hi Ondrej, Marcelo,
>
> (+Marcelo)
>
> > Looking at crypto/algif_skcipher.c, I can see that skcipher_recvmsg()
> > holds the socket lock the whole time and yet passes
> > CRYPTO_TFM_REQ_MAY_SLEEP to the cipher implementation. Isn't that
> > wrong?
>
> I think you are referring to lock_sock(sk)?
>
> If so, this should not be the culprit: the socket lock is in essence a mutex-
> like operation with its own wait queue that is allowed to sleep. In
> lock_sock_nested, which is called by lock_sock, there is even a call to
> might_sleep, which indicates that the caller may be put to sleep.
>
> Looking into the code (without too much debugging), I see that
> p8_aes_cbc_encrypt, which is part of the stack trace, calls
> preempt_disable(), which starts an atomic context. The corresponding
> preempt_enable() is invoked only after the walk operation.
>
> preempt_disable() increments the preempt_count. That counter is used by
> in_atomic() to check whether we are in atomic context.
>
> The issue is that blkcipher_walk_done() may call crypto_yield(), which in
> turn invokes cond_resched() if the implementation is allowed to sleep.

Indeed, you're right, the issue is actually in the vmx_crypto code. I
remember looking at the 'ctr(aes)' implementation in there a few days
ago (I think I was trying to debug this very issue, but for some
reason I only looked at ctr(aes)...) and I didn't find any bug, which
is why I jumped to suspecting the algif_skcipher code... I should
have double-checked :)

It turns out the 'cbc(aes)' (and actually also 'xts(aes)')
implementations are coded a bit differently, and both *do* contain
the sleep-in-atomic bug. I will try to fix them to match the correct
CTR implementation and send a patch.

Thanks,
Ondrej

> @Marcelo: shouldn't the sleep flag be cleared when entering the
> preempt_disable section?
>
> Ciao
> Stephan
>
>


BUG: libkcapi tests trigger sleep-in-atomic bug in VMX code (ppc64)

2018-08-21 Thread Ondrej Mosnáček
Hi,

I hit the following BUG when running the kcapi-enc-test.sh test from
libkcapi [1] on ppc64/ppc64le with recent kernels:
[  891.863680] BUG: sleeping function called from invalid context at
include/crypto/algapi.h:424
[  891.864622] in_atomic(): 1, irqs_disabled(): 0, pid: 12347, name: kcapi-enc
[  891.864739] 1 lock held by kcapi-enc/12347:
[  891.864811]  #0: f5d42c46 (sk_lock-AF_ALG){+.+.}, at:
skcipher_recvmsg+0x50/0x530
[  891.865076] CPU: 5 PID: 12347 Comm: kcapi-enc Not tainted
4.19.0-0.rc0.git3.1.fc30.ppc64le #1
[  891.865251] Call Trace:
[  891.865340] [c003387578c0] [c0d67ea4]
dump_stack+0xe8/0x164 (unreliable)
[  891.865511] [c00338757910] [c0172a58] ___might_sleep+0x2f8/0x310
[  891.865679] [c00338757990] [c06bff74]
blkcipher_walk_done+0x374/0x4a0
[  891.865825] [c003387579e0] [d7e73e70]
p8_aes_cbc_encrypt+0x1c8/0x260 [vmx_crypto]
[  891.865993] [c00338757ad0] [c06c0ee0]
skcipher_encrypt_blkcipher+0x60/0x80
[  891.866128] [c00338757b10] [c06ec504]
skcipher_recvmsg+0x424/0x530
[  891.866283] [c00338757bd0] [c0b00654] sock_recvmsg+0x74/0xa0
[  891.866403] [c00338757c10] [c0b00f64] ___sys_recvmsg+0xf4/0x2f0
[  891.866515] [c00338757d90] [c0b02bb8] __sys_recvmsg+0x68/0xe0
[  891.866631] [c00338757e30] [c000bbe4] system_call+0x5c/0x70

This is on the 4.19.0-0.rc0.git3.1.fc30.ppc64le kernel from current Fedora
Rawhide, but the same happens on the Koji builders (while building
libkcapi and running its tests), which run 4.17.* kernels. The BUG
triggers more reliably as the message length goes up (usually it
starts at 65535 bytes, but sometimes even earlier).

Looking at crypto/algif_skcipher.c, I can see that skcipher_recvmsg()
holds the socket lock the whole time and yet passes
CRYPTO_TFM_REQ_MAY_SLEEP to the cipher implementation. Isn't that
wrong?

I don't have much knowledge about the atomic context stuff in the
Linux kernel, but the dmesg output seems to imply that holding the
socket lock is what makes the context atomic and is why the cipher
implementation shouldn't be allowed to sleep here. Perhaps
_skcipher_recvmsg() could actually release the lock before invoking
the cipher operation? AFAIK it only needs to access the allocated
data, which shouldn't be accessed by other tasks anyway.

[1] https://github.com/smuellerDD/libkcapi/tree/master/test

Thanks,
Ondrej


[PATCH v2] crypto: Mark MORUS SIMD glue as x86-specific

2018-05-21 Thread Ondrej Mosnáček
From: Ondrej Mosnacek 

Commit 56e8e57fc3a7 ("crypto: morus - Add common SIMD glue code for
MORUS") accidentally considered the glue code to be usable by different
architectures, but it seems to be usable only on x86.

This patch moves it under arch/x86/crypto and adds 'depends on X86' to
the Kconfig options; it also removes the prompts to hide these internal
options from the user.

Reported-by: kbuild test robot 
Signed-off-by: Ondrej Mosnacek 
---
 arch/x86/crypto/Makefile | 3 +++
 {crypto => arch/x86/crypto}/morus1280_glue.c | 4 ++--
 {crypto => arch/x86/crypto}/morus640_glue.c  | 4 ++--
 crypto/Kconfig   | 6 --
 crypto/Makefile  | 2 --
 5 files changed, 11 insertions(+), 8 deletions(-)
 rename {crypto => arch/x86/crypto}/morus1280_glue.c (98%)
 rename {crypto => arch/x86/crypto}/morus640_glue.c (98%)

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 3813e7cdaada..48e731d782e9 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -42,6 +42,9 @@ obj-$(CONFIG_CRYPTO_AEGIS128_AESNI_SSE2) += aegis128-aesni.o
 obj-$(CONFIG_CRYPTO_AEGIS128L_AESNI_SSE2) += aegis128l-aesni.o
 obj-$(CONFIG_CRYPTO_AEGIS256_AESNI_SSE2) += aegis256-aesni.o
 
+obj-$(CONFIG_CRYPTO_MORUS640_GLUE) += morus640_glue.o
+obj-$(CONFIG_CRYPTO_MORUS1280_GLUE) += morus1280_glue.o
+
 obj-$(CONFIG_CRYPTO_MORUS640_SSE2) += morus640-sse2.o
 obj-$(CONFIG_CRYPTO_MORUS1280_SSE2) += morus1280-sse2.o
 
diff --git a/crypto/morus1280_glue.c b/arch/x86/crypto/morus1280_glue.c
similarity index 98%
rename from crypto/morus1280_glue.c
rename to arch/x86/crypto/morus1280_glue.c
index ce1e5c34b09d..0dccdda1eb3a 100644
--- a/crypto/morus1280_glue.c
+++ b/arch/x86/crypto/morus1280_glue.c
@@ -1,6 +1,6 @@
 /*
  * The MORUS-1280 Authenticated-Encryption Algorithm
- *   Common glue skeleton
+ *   Common x86 SIMD glue skeleton
  *
  * Copyright (c) 2016-2018 Ondrej Mosnacek 
  * Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
@@ -299,4 +299,4 @@ EXPORT_SYMBOL_GPL(cryptd_morus1280_glue_exit_tfm);
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Ondrej Mosnacek ");
-MODULE_DESCRIPTION("MORUS-1280 AEAD mode -- glue for optimizations");
+MODULE_DESCRIPTION("MORUS-1280 AEAD mode -- glue for x86 optimizations");
diff --git a/crypto/morus640_glue.c b/arch/x86/crypto/morus640_glue.c
similarity index 98%
rename from crypto/morus640_glue.c
rename to arch/x86/crypto/morus640_glue.c
index c7e788cfaa29..7b58fe4d9bd1 100644
--- a/crypto/morus640_glue.c
+++ b/arch/x86/crypto/morus640_glue.c
@@ -1,6 +1,6 @@
 /*
  * The MORUS-640 Authenticated-Encryption Algorithm
- *   Common glue skeleton
+ *   Common x86 SIMD glue skeleton
  *
  * Copyright (c) 2016-2018 Ondrej Mosnacek 
  * Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
@@ -295,4 +295,4 @@ EXPORT_SYMBOL_GPL(cryptd_morus640_glue_exit_tfm);
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Ondrej Mosnacek ");
-MODULE_DESCRIPTION("MORUS-640 AEAD mode -- glue for optimizations");
+MODULE_DESCRIPTION("MORUS-640 AEAD mode -- glue for x86 optimizations");
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 75f5efde9aa3..30d54a56e64a 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -341,7 +341,8 @@ config CRYPTO_MORUS640
  Support for the MORUS-640 dedicated AEAD algorithm.
 
 config CRYPTO_MORUS640_GLUE
-   tristate "MORUS-640 AEAD algorithm (glue for SIMD optimizations)"
+   tristate
+   depends on X86
select CRYPTO_AEAD
select CRYPTO_CRYPTD
help
@@ -363,7 +364,8 @@ config CRYPTO_MORUS1280
  Support for the MORUS-1280 dedicated AEAD algorithm.
 
 config CRYPTO_MORUS1280_GLUE
-   tristate "MORUS-1280 AEAD algorithm (glue for SIMD optimizations)"
+   tristate
+   depends on X86
select CRYPTO_AEAD
select CRYPTO_CRYPTD
help
diff --git a/crypto/Makefile b/crypto/Makefile
index 68a7c546460a..6d1d40eeb964 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -91,8 +91,6 @@ obj-$(CONFIG_CRYPTO_AEGIS128L) += aegis128l.o
 obj-$(CONFIG_CRYPTO_AEGIS256) += aegis256.o
 obj-$(CONFIG_CRYPTO_MORUS640) += morus640.o
 obj-$(CONFIG_CRYPTO_MORUS1280) += morus1280.o
-obj-$(CONFIG_CRYPTO_MORUS640_GLUE) += morus640_glue.o
-obj-$(CONFIG_CRYPTO_MORUS1280_GLUE) += morus1280_glue.o
 obj-$(CONFIG_CRYPTO_PCRYPT) += pcrypt.o
 obj-$(CONFIG_CRYPTO_CRYPTD) += cryptd.o
 obj-$(CONFIG_CRYPTO_MCRYPTD) += mcryptd.o
-- 
2.17.0



Re: [PATCH] crypto: Mark MORUS SIMD glue as x86-specific

2018-05-21 Thread Ondrej Mosnáček
2018-05-18 23:01 GMT+02:00 Ondrej Mosnáček <omosna...@gmail.com>:
> From: Ondrej Mosnacek <omosna...@gmail.com>
>
> Commit 56e8e57fc3a7 ("crypto: morus - Add common SIMD glue code for
> MORUS") accidentally considered the glue code to be usable by different
> architectures, but it seems to be only usable on x86.
>
> This patch moves it under arch/x86/crypto and adds 'depends on X86' to
> the Kconfig options.
>
> Reported-by: kbuild test robot <l...@intel.com>
> Signed-off-by: Ondrej Mosnacek <omosna...@gmail.com>
> ---
>  arch/x86/crypto/Makefile | 3 +++
>  {crypto => arch/x86/crypto}/morus1280_glue.c | 4 ++--
>  {crypto => arch/x86/crypto}/morus640_glue.c  | 4 ++--
>  crypto/Kconfig   | 6 --
>  crypto/Makefile  | 2 --
>  5 files changed, 11 insertions(+), 8 deletions(-)
>  rename {crypto => arch/x86/crypto}/morus1280_glue.c (98%)
>  rename {crypto => arch/x86/crypto}/morus640_glue.c (98%)
>
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index 3813e7cdaada..48e731d782e9 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -42,6 +42,9 @@ obj-$(CONFIG_CRYPTO_AEGIS128_AESNI_SSE2) += aegis128-aesni.o
>  obj-$(CONFIG_CRYPTO_AEGIS128L_AESNI_SSE2) += aegis128l-aesni.o
>  obj-$(CONFIG_CRYPTO_AEGIS256_AESNI_SSE2) += aegis256-aesni.o
>
> +obj-$(CONFIG_CRYPTO_MORUS640_GLUE) += morus640_glue.o
> +obj-$(CONFIG_CRYPTO_MORUS1280_GLUE) += morus1280_glue.o
> +
>  obj-$(CONFIG_CRYPTO_MORUS640_SSE2) += morus640-sse2.o
>  obj-$(CONFIG_CRYPTO_MORUS1280_SSE2) += morus1280-sse2.o
>
> diff --git a/crypto/morus1280_glue.c b/arch/x86/crypto/morus1280_glue.c
> similarity index 98%
> rename from crypto/morus1280_glue.c
> rename to arch/x86/crypto/morus1280_glue.c
> index ce1e5c34b09d..0dccdda1eb3a 100644
> --- a/crypto/morus1280_glue.c
> +++ b/arch/x86/crypto/morus1280_glue.c
> @@ -1,6 +1,6 @@
>  /*
>   * The MORUS-1280 Authenticated-Encryption Algorithm
> - *   Common glue skeleton
> + *   Common x86 SIMD glue skeleton
>   *
>   * Copyright (c) 2016-2018 Ondrej Mosnacek <omosna...@gmail.com>
>   * Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
> @@ -299,4 +299,4 @@ EXPORT_SYMBOL_GPL(cryptd_morus1280_glue_exit_tfm);
>
>  MODULE_LICENSE("GPL");
>  MODULE_AUTHOR("Ondrej Mosnacek <omosna...@gmail.com>");
> -MODULE_DESCRIPTION("MORUS-1280 AEAD mode -- glue for optimizations");
> +MODULE_DESCRIPTION("MORUS-1280 AEAD mode -- glue for x86 optimizations");
> diff --git a/crypto/morus640_glue.c b/arch/x86/crypto/morus640_glue.c
> similarity index 98%
> rename from crypto/morus640_glue.c
> rename to arch/x86/crypto/morus640_glue.c
> index c7e788cfaa29..7b58fe4d9bd1 100644
> --- a/crypto/morus640_glue.c
> +++ b/arch/x86/crypto/morus640_glue.c
> @@ -1,6 +1,6 @@
>  /*
>   * The MORUS-640 Authenticated-Encryption Algorithm
> - *   Common glue skeleton
> + *   Common x86 SIMD glue skeleton
>   *
>   * Copyright (c) 2016-2018 Ondrej Mosnacek <omosna...@gmail.com>
>   * Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
> @@ -295,4 +295,4 @@ EXPORT_SYMBOL_GPL(cryptd_morus640_glue_exit_tfm);
>
>  MODULE_LICENSE("GPL");
>  MODULE_AUTHOR("Ondrej Mosnacek <omosna...@gmail.com>");
> -MODULE_DESCRIPTION("MORUS-640 AEAD mode -- glue for optimizations");
> +MODULE_DESCRIPTION("MORUS-640 AEAD mode -- glue for x86 optimizations");
> diff --git a/crypto/Kconfig b/crypto/Kconfig
> index 75f5efde9aa3..0c9883d60a51 100644
> --- a/crypto/Kconfig
> +++ b/crypto/Kconfig
> @@ -341,7 +341,8 @@ config CRYPTO_MORUS640
>   Support for the MORUS-640 dedicated AEAD algorithm.
>
>  config CRYPTO_MORUS640_GLUE
> -   tristate "MORUS-640 AEAD algorithm (glue for SIMD optimizations)"
> +   tristate "MORUS-640 AEAD algorithm (glue for x86 SIMD optimizations)"
> +   depends on X86
> select CRYPTO_AEAD
> select CRYPTO_CRYPTD
> help
> @@ -363,7 +364,8 @@ config CRYPTO_MORUS1280
>   Support for the MORUS-1280 dedicated AEAD algorithm.
>
>  config CRYPTO_MORUS1280_GLUE
> -   tristate "MORUS-1280 AEAD algorithm (glue for SIMD optimizations)"
> +   tristate "MORUS-1280 AEAD algorithm (glue for x86 SIMD optimizations)"
> +   depends on X86
> select CRYPTO_AEAD
> select CRYPTO_CRYPTD
> help

I realized these options shouldn't be shown to the user and thus
should have no prompt text set. I will send a v2 that also removes the
prompts.

Regards,

Ondrej

> diff --git a/cr

[PATCH] crypto: x86/aegis256 - Fix wrong key buffer size

2018-05-20 Thread Ondrej Mosnáček
From: Ondrej Mosnacek 

AEGIS-256 key is two blocks, not one.

Fixes: 1d373d4e8e15 ("crypto: x86 - Add optimized AEGIS implementations")
Reported-by: Eric Biggers 
Signed-off-by: Ondrej Mosnacek 
---
 arch/x86/crypto/aegis256-aesni-glue.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/crypto/aegis256-aesni-glue.c 
b/arch/x86/crypto/aegis256-aesni-glue.c
index 3181655dd862..2b5dd3af8f4d 100644
--- a/arch/x86/crypto/aegis256-aesni-glue.c
+++ b/arch/x86/crypto/aegis256-aesni-glue.c
@@ -57,7 +57,7 @@ struct aegis_state {
 };
 
 struct aegis_ctx {
-   struct aegis_block key;
+   struct aegis_block key[AEGIS256_KEY_SIZE / AEGIS256_BLOCK_SIZE];
 };
 
 struct aegis_crypt_ops {
@@ -164,7 +164,7 @@ static int crypto_aegis256_aesni_setkey(struct crypto_aead 
*aead, const u8 *key,
return -EINVAL;
}
 
-   memcpy(ctx->key.bytes, key, AEGIS256_KEY_SIZE);
+   memcpy(ctx->key, key, AEGIS256_KEY_SIZE);
 
return 0;
 }
@@ -190,7 +190,7 @@ static void crypto_aegis256_aesni_crypt(struct aead_request 
*req,
 
kernel_fpu_begin();
 
-   crypto_aegis256_aesni_init(&state, ctx->key.bytes, req->iv);
+   crypto_aegis256_aesni_init(&state, ctx->key, req->iv);
crypto_aegis256_aesni_process_ad(&state, req->src, req->assoclen);
crypto_aegis256_aesni_process_crypt(&state, req, ops);
crypto_aegis256_aesni_final(&state, tag_xor, req->assoclen, cryptlen);
-- 
2.17.0



Re: [PATCH 3/3] crypto: x86 - Add optimized AEGIS implementations

2018-05-20 Thread Ondrej Mosnáček
2018-05-20 4:41 GMT+02:00 Eric Biggers <ebigge...@gmail.com>:
> Hi Ondrej,
>
> On Fri, May 11, 2018 at 02:12:51PM +0200, Ondrej Mosnáček wrote:
>> From: Ondrej Mosnacek <omosna...@gmail.com>
>>
>> This patch adds optimized implementations of AEGIS-128, AEGIS-128L,
>> and AEGIS-256, utilizing the AES-NI and SSE2 x86 extensions.
>>
>> Signed-off-by: Ondrej Mosnacek <omosna...@gmail.com>
> [...]
>> +static int crypto_aegis256_aesni_setkey(struct crypto_aead *aead, const u8 
>> *key,
>> + unsigned int keylen)
>> +{
>> + struct aegis_ctx *ctx = crypto_aegis256_aesni_ctx(aead);
>> +
>> + if (keylen != AEGIS256_KEY_SIZE) {
>> + crypto_aead_set_flags(aead, CRYPTO_TFM_RES_BAD_KEY_LEN);
>> + return -EINVAL;
>> + }
>> +
>> + memcpy(ctx->key.bytes, key, AEGIS256_KEY_SIZE);
>> +
>> + return 0;
>> +}
>
> This code is copying 32 bytes into a 16-byte buffer.

Indeed, I must have overlooked that while copy-pasting and editing the
boilerplate...

I will send a follow-up patch soon.

Thanks for the report!

>
> ==
> BUG: KASAN: slab-out-of-bounds in memcpy include/linux/string.h:345 [inline]
> BUG: KASAN: slab-out-of-bounds in crypto_aegis256_aesni_setkey+0x23/0x60 
> arch/x86/crypto/aegis256-aesni-glue.c:167
> Write of size 32 at addr 88006c16b650 by task cryptomgr_test/120
> CPU: 2 PID: 120 Comm: cryptomgr_test Not tainted 
> 4.17.0-rc1-00069-g6ecc9d9ff91f #31
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.11.0-20171110_100015-anatol 04/01/2014
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x86/0xca lib/dump_stack.c:113
>  print_address_description+0x65/0x204 mm/kasan/report.c:256
>  kasan_report_error mm/kasan/report.c:354 [inline]
>  kasan_report.cold.6+0x242/0x304 mm/kasan/report.c:412
>  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
>  check_memory_region+0x13c/0x1b0 mm/kasan/kasan.c:267
>  memcpy+0x37/0x50 mm/kasan/kasan.c:303
>  memcpy include/linux/string.h:345 [inline]
>  crypto_aegis256_aesni_setkey+0x23/0x60 
> arch/x86/crypto/aegis256-aesni-glue.c:167
>  crypto_aead_setkey+0xa4/0x1e0 crypto/aead.c:62
>  cryptd_aead_setkey+0x30/0x50 crypto/cryptd.c:938
>  crypto_aead_setkey+0xa4/0x1e0 crypto/aead.c:62
>  cryptd_aegis256_aesni_setkey+0x30/0x50 
> arch/x86/crypto/aegis256-aesni-glue.c:259
>  crypto_aead_setkey+0xa4/0x1e0 crypto/aead.c:62
>  __test_aead+0x8bf/0x3770 crypto/testmgr.c:675
>  test_aead+0x28/0x110 crypto/testmgr.c:957
>  alg_test_aead+0x8b/0x140 crypto/testmgr.c:1690
>  alg_test.part.5+0x1bb/0x4d0 crypto/testmgr.c:3845
>  alg_test+0x23/0x25 crypto/testmgr.c:3865
>  cryptomgr_test+0x56/0x80 crypto/algboss.c:223
>  kthread+0x329/0x3f0 kernel/kthread.c:238
>  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:412
> Allocated by task 120:
>  save_stack mm/kasan/kasan.c:448 [inline]
>  set_track mm/kasan/kasan.c:460 [inline]
>  kasan_kmalloc.part.1+0x5f/0xf0 mm/kasan/kasan.c:553
>  kasan_kmalloc+0xaf/0xc0 mm/kasan/kasan.c:538
>  __do_kmalloc mm/slab.c:3718 [inline]
>  __kmalloc+0x114/0x1d0 mm/slab.c:3727
>  kmalloc include/linux/slab.h:517 [inline]
>  kzalloc include/linux/slab.h:701 [inline]
>  crypto_create_tfm+0x80/0x2c0 crypto/api.c:464
>  crypto_spawn_tfm2+0x57/0x90 crypto/algapi.c:717
>  crypto_spawn_aead include/crypto/internal/aead.h:112 [inline]
>  cryptd_aead_init_tfm+0x3d/0x110 crypto/cryptd.c:1033
>  crypto_aead_init_tfm+0x130/0x190 crypto/aead.c:111
>  crypto_create_tfm+0xda/0x2c0 crypto/api.c:471
>  crypto_alloc_tfm+0xcf/0x1d0 crypto/api.c:543
>  crypto_alloc_aead+0x14/0x20 crypto/aead.c:351
>  cryptd_alloc_aead+0xeb/0x1c0 crypto/cryptd.c:1334
>  cryptd_aegis256_aesni_init_tfm+0x24/0xf0 
> arch/x86/crypto/aegis256-aesni-glue.c:308
>  crypto_aead_init_tfm+0x130/0x190 crypto/aead.c:111
>  crypto_create_tfm+0xda/0x2c0 crypto/api.c:471
>  crypto_alloc_tfm+0xcf/0x1d0 crypto/api.c:543
>  crypto_alloc_aead+0x14/0x20 crypto/aead.c:351
>  alg_test_aead+0x1f/0x140 crypto/testmgr.c:1682
>  alg_test.part.5+0x1bb/0x4d0 crypto/testmgr.c:3845
>  alg_test+0x23/0x25 crypto/testmgr.c:3865
>  cryptomgr_test+0x56/0x80 crypto/algboss.c:223
>  kthread+0x329/0x3f0 kernel/kthread.c:238
>  ret_from_[   16.453502] serial8250: too much work for irq4
> Freed by task 0:
> (stack is not available)
> The buggy address belongs to the object at 88006c16b600
> The buggy address is located 80 bytes inside of
> The buggy address belongs to the page:
> page:ea00017a4f68 count:1 mapcount:0 mapping:88006c16b000 index:0x0
> flags: 0

[PATCH] crypto: Mark MORUS SIMD glue as x86-specific

2018-05-18 Thread Ondrej Mosnáček
From: Ondrej Mosnacek 

Commit 56e8e57fc3a7 ("crypto: morus - Add common SIMD glue code for
MORUS") accidentally considered the glue code to be usable by different
architectures, but it seems to be only usable on x86.

This patch moves it under arch/x86/crypto and adds 'depends on X86' to
the Kconfig options.

Reported-by: kbuild test robot 
Signed-off-by: Ondrej Mosnacek 
---
 arch/x86/crypto/Makefile | 3 +++
 {crypto => arch/x86/crypto}/morus1280_glue.c | 4 ++--
 {crypto => arch/x86/crypto}/morus640_glue.c  | 4 ++--
 crypto/Kconfig   | 6 --
 crypto/Makefile  | 2 --
 5 files changed, 11 insertions(+), 8 deletions(-)
 rename {crypto => arch/x86/crypto}/morus1280_glue.c (98%)
 rename {crypto => arch/x86/crypto}/morus640_glue.c (98%)

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 3813e7cdaada..48e731d782e9 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -42,6 +42,9 @@ obj-$(CONFIG_CRYPTO_AEGIS128_AESNI_SSE2) += aegis128-aesni.o
 obj-$(CONFIG_CRYPTO_AEGIS128L_AESNI_SSE2) += aegis128l-aesni.o
 obj-$(CONFIG_CRYPTO_AEGIS256_AESNI_SSE2) += aegis256-aesni.o
 
+obj-$(CONFIG_CRYPTO_MORUS640_GLUE) += morus640_glue.o
+obj-$(CONFIG_CRYPTO_MORUS1280_GLUE) += morus1280_glue.o
+
 obj-$(CONFIG_CRYPTO_MORUS640_SSE2) += morus640-sse2.o
 obj-$(CONFIG_CRYPTO_MORUS1280_SSE2) += morus1280-sse2.o
 
diff --git a/crypto/morus1280_glue.c b/arch/x86/crypto/morus1280_glue.c
similarity index 98%
rename from crypto/morus1280_glue.c
rename to arch/x86/crypto/morus1280_glue.c
index ce1e5c34b09d..0dccdda1eb3a 100644
--- a/crypto/morus1280_glue.c
+++ b/arch/x86/crypto/morus1280_glue.c
@@ -1,6 +1,6 @@
 /*
  * The MORUS-1280 Authenticated-Encryption Algorithm
- *   Common glue skeleton
+ *   Common x86 SIMD glue skeleton
  *
  * Copyright (c) 2016-2018 Ondrej Mosnacek 
  * Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
@@ -299,4 +299,4 @@ EXPORT_SYMBOL_GPL(cryptd_morus1280_glue_exit_tfm);
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Ondrej Mosnacek ");
-MODULE_DESCRIPTION("MORUS-1280 AEAD mode -- glue for optimizations");
+MODULE_DESCRIPTION("MORUS-1280 AEAD mode -- glue for x86 optimizations");
diff --git a/crypto/morus640_glue.c b/arch/x86/crypto/morus640_glue.c
similarity index 98%
rename from crypto/morus640_glue.c
rename to arch/x86/crypto/morus640_glue.c
index c7e788cfaa29..7b58fe4d9bd1 100644
--- a/crypto/morus640_glue.c
+++ b/arch/x86/crypto/morus640_glue.c
@@ -1,6 +1,6 @@
 /*
  * The MORUS-640 Authenticated-Encryption Algorithm
- *   Common glue skeleton
+ *   Common x86 SIMD glue skeleton
  *
  * Copyright (c) 2016-2018 Ondrej Mosnacek 
  * Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
@@ -295,4 +295,4 @@ EXPORT_SYMBOL_GPL(cryptd_morus640_glue_exit_tfm);
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Ondrej Mosnacek ");
-MODULE_DESCRIPTION("MORUS-640 AEAD mode -- glue for optimizations");
+MODULE_DESCRIPTION("MORUS-640 AEAD mode -- glue for x86 optimizations");
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 75f5efde9aa3..0c9883d60a51 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -341,7 +341,8 @@ config CRYPTO_MORUS640
  Support for the MORUS-640 dedicated AEAD algorithm.
 
 config CRYPTO_MORUS640_GLUE
-   tristate "MORUS-640 AEAD algorithm (glue for SIMD optimizations)"
+   tristate "MORUS-640 AEAD algorithm (glue for x86 SIMD optimizations)"
+   depends on X86
select CRYPTO_AEAD
select CRYPTO_CRYPTD
help
@@ -363,7 +364,8 @@ config CRYPTO_MORUS1280
  Support for the MORUS-1280 dedicated AEAD algorithm.
 
 config CRYPTO_MORUS1280_GLUE
-   tristate "MORUS-1280 AEAD algorithm (glue for SIMD optimizations)"
+   tristate "MORUS-1280 AEAD algorithm (glue for x86 SIMD optimizations)"
+   depends on X86
select CRYPTO_AEAD
select CRYPTO_CRYPTD
help
diff --git a/crypto/Makefile b/crypto/Makefile
index 68a7c546460a..6d1d40eeb964 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -91,8 +91,6 @@ obj-$(CONFIG_CRYPTO_AEGIS128L) += aegis128l.o
 obj-$(CONFIG_CRYPTO_AEGIS256) += aegis256.o
 obj-$(CONFIG_CRYPTO_MORUS640) += morus640.o
 obj-$(CONFIG_CRYPTO_MORUS1280) += morus1280.o
-obj-$(CONFIG_CRYPTO_MORUS640_GLUE) += morus640_glue.o
-obj-$(CONFIG_CRYPTO_MORUS1280_GLUE) += morus1280_glue.o
 obj-$(CONFIG_CRYPTO_PCRYPT) += pcrypt.o
 obj-$(CONFIG_CRYPTO_CRYPTD) += cryptd.o
 obj-$(CONFIG_CRYPTO_MCRYPTD) += mcryptd.o
-- 
2.17.0



[PATCH 2/4] crypto: testmgr - Add test vectors for MORUS

2018-05-11 Thread Ondrej Mosnáček
From: Ondrej Mosnacek 

This patch adds test vectors for MORUS-640 and MORUS-1280. The test
vectors were generated using the reference implementation from
SUPERCOP (see code comments for more details).

Signed-off-by: Ondrej Mosnacek 
---
 crypto/testmgr.c |   18 +
 crypto/testmgr.h | 3400 ++
 2 files changed, 3418 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index c31da0f3f680..79d4e97f2434 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -3335,6 +3335,24 @@ static const struct alg_test_desc alg_test_descs[] = {
.suite = {
.hash = __VECS(michael_mic_tv_template)
}
+   }, {
+   .alg = "morus1280",
+   .test = alg_test_aead,
+   .suite = {
+   .aead = {
+   .enc = __VECS(morus1280_enc_tv_template),
+   .dec = __VECS(morus1280_dec_tv_template),
+   }
+   }
+   }, {
+   .alg = "morus640",
+   .test = alg_test_aead,
+   .suite = {
+   .aead = {
+   .enc = __VECS(morus640_enc_tv_template),
+   .dec = __VECS(morus640_dec_tv_template),
+   }
+   }
}, {
.alg = "ofb(aes)",
.test = alg_test_skcipher,
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index a20231f53024..f6b4193445bc 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -27377,6 +27377,3406 @@ static const struct aead_testvec 
rfc7539esp_dec_tv_template[] = {
},
 };
 
+/*
+ * MORUS-640 test vectors - generated via reference implementation from
+ * SUPERCOP (https://bench.cr.yp.to/supercop.html):
+ *
+ *   https://bench.cr.yp.to/supercop/supercop-20170228.tar.xz
+ *   (see crypto_aead/morus640128v2/)
+ */
+static const struct aead_testvec morus640_enc_tv_template[] = {
+   {
+   .key= "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .klen   = 16,
+   .iv = "\x0f\xc9\x8e\x67\x44\x9e\xaa\x86"
+ "\x20\x36\x2c\x24\xfe\xc9\x30\x81",
+   .assoc  = "",
+   .alen   = 0,
+   .input  = "",
+   .ilen   = 0,
+   .result = "\x89\x62\x7d\xf3\x07\x9d\x52\x05"
+ "\x53\xc3\x04\x60\x93\xb4\x37\x9a",
+   .rlen   = 16,
+   }, {
+   .key= "\x3c\x24\x39\x9f\x10\x7b\xa8\x1b"
+ "\x80\xda\xb2\x91\xf9\x24\xc2\x06",
+   .klen   = 16,
+   .iv = "\x4b\xed\xc8\x07\x54\x1a\x52\xa2"
+ "\xa1\x10\xde\xb5\xf8\xed\xf3\x87",
+   .assoc  = "",
+   .alen   = 0,
+   .input  = "\x69",
+   .ilen   = 1,
+   .result = "\xa8\x8d\xe4\x90\xb5\x50\x8f\x78"
+ "\xb6\x10\x9a\x59\x5f\x61\x37\x70"
+ "\x09",
+   .rlen   = 17,
+   }, {
+   .key= "\x79\x49\x73\x3e\x20\xf7\x51\x37"
+ "\x01\xb4\x64\x22\xf3\x48\x85\x0c",
+   .klen   = 16,
+   .iv = "\x88\x12\x01\xa6\x64\x96\xfb\xbe"
+ "\x22\xea\x90\x47\xf2\x11\xb5\x8e",
+   .assoc  = "",
+   .alen   = 0,
+   .input  = "\xa6\xa4\x1e\x76\xec\xd4\x50\xcc"
+ "\x62\x58\xe9\x8f\xef\xa4\x17",
+   .ilen   = 15,
+   .result = "\x76\xdd\xb9\x05\x3d\xce\x61\x38"
+ "\xf3\xef\xf7\xe5\xd7\xfd\x70\xa5"
+ "\xcf\x9d\x64\xb8\x0a\x9f\xfd\x8b"
+ "\xd4\x6e\xfe\xd9\xc8\x63\x4b",
+   .rlen   = 31,
+   }, {
+   .key= "\xb5\x6e\xad\xdd\x30\x72\xfa\x53"
+ "\x82\x8e\x16\xb4\xed\x6d\x47\x12",
+   .klen   = 16,
+   .iv = "\xc4\x37\x3b\x45\x74\x11\xa4\xda"
+ "\xa2\xc5\x42\xd8\xec\x36\x78\x94",
+   .assoc  = "",
+   .alen   = 0,
+   .input  = "\xe2\xc9\x58\x15\xfc\x4f\xf8\xe8"
+ "\xe3\x32\x9b\x21\xe9\xc8\xd9\x97",
+   .ilen   = 16,
+   .result = "\xdc\x72\xe8\x14\xfb\x63\xad\x72"
+ "\x1f\x57\x9a\x1f\x88\x81\xdb\xd6"
+ "\xc1\x91\x9d\xb9\x25\xc4\x99\x4c"
+ "\x97\xcd\x8a\x0c\x9d\x68\x00\x1c",
+   .rlen   = 32,
+   }, {
+   .key= "\xf2\x92\xe6\x7d\x40\xee\xa3\x6f"
+ "\x03\x68\xc8\x45\xe7\x91\x0a\x18",
+   .klen   = 16,
+   .iv = "\x01\x5c\x75\xe5\x84\x8d\x4d\xf6"
+   

[PATCH 3/4] crypto: Add common SIMD glue code for MORUS

2018-05-11 Thread Ondrej Mosnáček
From: Ondrej Mosnacek 

This patch adds a common glue code for optimized implementations of
MORUS AEAD algorithms.

Signed-off-by: Ondrej Mosnacek 
---
 crypto/Kconfig  |  16 ++
 crypto/Makefile |   2 +
 crypto/morus1280_glue.c | 302 
 crypto/morus640_glue.c  | 298 +++
 include/crypto/morus1280_glue.h | 137 +++
 include/crypto/morus640_glue.h  | 137 +++
 6 files changed, 892 insertions(+)
 create mode 100644 crypto/morus1280_glue.c
 create mode 100644 crypto/morus640_glue.c
 create mode 100644 include/crypto/morus1280_glue.h
 create mode 100644 include/crypto/morus640_glue.h

diff --git a/crypto/Kconfig b/crypto/Kconfig
index fdf2b0958b43..34c18e242e31 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -295,12 +295,28 @@ config CRYPTO_MORUS640
help
  Support for the MORUS-640 dedicated AEAD algorithm.
 
+config CRYPTO_MORUS640_GLUE
+   tristate "MORUS-640 AEAD algorithm (glue for SIMD optimizations)"
+   select CRYPTO_AEAD
+   select CRYPTO_CRYPTD
+   help
+ Common glue for SIMD optimizations of the MORUS-640 dedicated AEAD
+ algorithm.
+
 config CRYPTO_MORUS1280
tristate "MORUS-1280 AEAD algorithm"
select CRYPTO_AEAD
help
  Support for the MORUS-1280 dedicated AEAD algorithm.
 
+config CRYPTO_MORUS1280_GLUE
+   tristate "MORUS-1280 AEAD algorithm (glue for SIMD optimizations)"
+   select CRYPTO_AEAD
+   select CRYPTO_CRYPTD
+   help
+ Common glue for SIMD optimizations of the MORUS-1280 dedicated AEAD
+ algorithm.
+
 config CRYPTO_SEQIV
tristate "Sequence Number IV Generator"
select CRYPTO_AEAD
diff --git a/crypto/Makefile b/crypto/Makefile
index 3073145c460d..77f6d36cd7a7 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -88,6 +88,8 @@ obj-$(CONFIG_CRYPTO_CCM) += ccm.o
 obj-$(CONFIG_CRYPTO_CHACHA20POLY1305) += chacha20poly1305.o
 obj-$(CONFIG_CRYPTO_MORUS640) += morus640.o
 obj-$(CONFIG_CRYPTO_MORUS1280) += morus1280.o
+obj-$(CONFIG_CRYPTO_MORUS640_GLUE) += morus640_glue.o
+obj-$(CONFIG_CRYPTO_MORUS1280_GLUE) += morus1280_glue.o
 obj-$(CONFIG_CRYPTO_PCRYPT) += pcrypt.o
 obj-$(CONFIG_CRYPTO_CRYPTD) += cryptd.o
 obj-$(CONFIG_CRYPTO_MCRYPTD) += mcryptd.o
diff --git a/crypto/morus1280_glue.c b/crypto/morus1280_glue.c
new file mode 100644
index ..ce1e5c34b09d
--- /dev/null
+++ b/crypto/morus1280_glue.c
@@ -0,0 +1,302 @@
+/*
+ * The MORUS-1280 Authenticated-Encryption Algorithm
+ *   Common glue skeleton
+ *
+ * Copyright (c) 2016-2018 Ondrej Mosnacek 
+ * Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct morus1280_state {
+   struct morus1280_block s[MORUS_STATE_BLOCKS];
+};
+
+struct morus1280_ops {
+   int (*skcipher_walk_init)(struct skcipher_walk *walk,
+ struct aead_request *req, bool atomic);
+
+   void (*crypt_blocks)(void *state, const void *src, void *dst,
+unsigned int length);
+   void (*crypt_tail)(void *state, const void *src, void *dst,
+  unsigned int length);
+};
+
+static void crypto_morus1280_glue_process_ad(
+   struct morus1280_state *state,
+   const struct morus1280_glue_ops *ops,
+   struct scatterlist *sg_src, unsigned int assoclen)
+{
+   struct scatter_walk walk;
+   struct morus1280_block buf;
+   unsigned int pos = 0;
+
+   scatterwalk_start(&walk, sg_src);
+   while (assoclen != 0) {
+   unsigned int size = scatterwalk_clamp(&walk, assoclen);
+   unsigned int left = size;
+   void *mapped = scatterwalk_map(&walk);
+   const u8 *src = (const u8 *)mapped;
+
+   if (pos + size >= MORUS1280_BLOCK_SIZE) {
+   if (pos > 0) {
+   unsigned int fill = MORUS1280_BLOCK_SIZE - pos;
+   memcpy(buf.bytes + pos, src, fill);
+   ops->ad(state, buf.bytes, MORUS1280_BLOCK_SIZE);
+   pos = 0;
+   left -= fill;
+   src += fill;
+   }
+
+   ops->ad(state, src, left);
+   src += left & ~(MORUS1280_BLOCK_SIZE - 1);
+   left &= MORUS1280_BLOCK_SIZE - 1;
+   }
+
+   memcpy(buf.bytes + 

[PATCH 0/4] Add support for MORUS AEAD algorithm

2018-05-11 Thread Ondrej Mosnáček
From: Ondrej Mosnacek 

This patchset adds the MORUS AEAD algorithm implementation to the Linux Crypto 
API.

MORUS [1] is a dedicated AEAD algorithm focused on SIMD instructions and 
designed for high throughput both on modern processors and in hardware. It is 
designed by Hongjun Wu and Tao Huang and has been submitted to the CAESAR 
competition [2], where it is currently one of the finalists [3]. MORUS uses only 
logical bitwise operations and bitwise rotations as primitives.
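For illustration only (not part of the patchset): the row update can be modelled in stand-alone C, mirroring the crypto_morus1280_round() loop from patch 1. rol64() is a local re-declaration of the kernel helper, and morus1280_row_update()/morus1280_row_update_demo() are names invented here for the sketch; the word permutation step between rows is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define MORUS_BLOCK_WORDS 4

/* Local stand-in for the kernel's rol64(). */
static uint64_t rol64(uint64_t x, unsigned int n)
{
	return (x << (n & 63)) | (x >> ((64 - n) & 63));
}

/* One MORUS-1280 row update: only XOR, AND and a rotation by b bits,
 * as in crypto_morus1280_round() (word permutation step omitted). */
static void morus1280_row_update(uint64_t *b0, const uint64_t *b1,
				 const uint64_t *b2, const uint64_t *b3,
				 const uint64_t *m, unsigned int b)
{
	unsigned int i;

	for (i = 0; i < MORUS_BLOCK_WORDS; i++) {
		b0[i] ^= b1[i] & b2[i];
		b0[i] ^= b3[i];
		b0[i] ^= m[i];
		b0[i] = rol64(b0[i], b);
	}
}

/* Tiny self-check: with b1 = b2 = all-ones and b3 = m = 0, the update
 * reduces to b0 = rol64(~b0, b), so a zero row becomes all-ones. */
static uint64_t morus1280_row_update_demo(void)
{
	uint64_t b0[MORUS_BLOCK_WORDS] = { 0, 0, 0, 0 };
	uint64_t ones[MORUS_BLOCK_WORDS] = { ~0ULL, ~0ULL, ~0ULL, ~0ULL };
	uint64_t zero[MORUS_BLOCK_WORDS] = { 0, 0, 0, 0 };

	morus1280_row_update(b0, ones, ones, zero, zero, 13);
	return b0[0];
}
```

The point of the restriction to these primitives is that each of them maps directly onto one SIMD instruction per 128/256-bit lane, which is what the SSE2/AVX2 implementations in patch 4 exploit.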

MORUS has two variants:
* MORUS-640 operating on 128-bit blocks and accepting a 128-bit key.
* MORUS-1280 operating on 256-bit blocks and accepting a 128- or 256-bit key.
Both variants accept a 128-bit IV and produce an up to 128-bit tag.

The patchset contains four patches, adding:
* generic implementations
* test vectors to testmgr
* common glue code for x86_64 optimizations
* x86_64 SSE2/AVX2 optimized implementations

Since there are no official test vectors currently available, the test vectors 
in patch 2 were generated using a reference implementation from public CAESAR 
benchmarks [4]. They should be replaced/complemented with official test vectors 
if/when they become available.

The implementations have been developed in cooperation with Milan Broz (the 
maintainer of dm-crypt and cryptsetup) and there is a plan to use them for 
authenticated disk encryption in cryptsetup. They are a result of my Master's 
thesis at the Faculty of Informatics, Masaryk University, Brno [5].

[1] https://competitions.cr.yp.to/round3/morusv2.pdf
[2] https://competitions.cr.yp.to/caesar-call.html
[3] https://competitions.cr.yp.to/caesar-submissions.html
[4] https://bench.cr.yp.to/ebaead.html
[5] https://is.muni.cz/th/409879/fi_m/?lang=en

Ondrej Mosnacek (4):
  crypto: Add generic MORUS AEAD implementations
  crypto: testmgr - Add test vectors for MORUS
  crypto: Add common SIMD glue code for MORUS
  crypto: x86 - Add optimized MORUS implementations

 arch/x86/crypto/Makefile  |   10 +
 arch/x86/crypto/morus1280-avx2-asm.S  |  621 +
 arch/x86/crypto/morus1280-avx2-glue.c |   68 +
 arch/x86/crypto/morus1280-sse2-asm.S  |  895 +++
 arch/x86/crypto/morus1280-sse2-glue.c |   68 +
 arch/x86/crypto/morus640-sse2-asm.S   |  614 +
 arch/x86/crypto/morus640-sse2-glue.c  |   68 +
 crypto/Kconfig|   54 +
 crypto/Makefile   |4 +
 crypto/morus1280.c|  549 
 crypto/morus1280_glue.c   |  302 +++
 crypto/morus640.c |  544 
 crypto/morus640_glue.c|  298 +++
 crypto/testmgr.c  |   18 +
 crypto/testmgr.h  | 3400 +
 include/crypto/morus1280_glue.h   |  137 +
 include/crypto/morus640_glue.h|  137 +
 include/crypto/morus_common.h |   23 +
 18 files changed, 7810 insertions(+)
 create mode 100644 arch/x86/crypto/morus1280-avx2-asm.S
 create mode 100644 arch/x86/crypto/morus1280-avx2-glue.c
 create mode 100644 arch/x86/crypto/morus1280-sse2-asm.S
 create mode 100644 arch/x86/crypto/morus1280-sse2-glue.c
 create mode 100644 arch/x86/crypto/morus640-sse2-asm.S
 create mode 100644 arch/x86/crypto/morus640-sse2-glue.c
 create mode 100644 crypto/morus1280.c
 create mode 100644 crypto/morus1280_glue.c
 create mode 100644 crypto/morus640.c
 create mode 100644 crypto/morus640_glue.c
 create mode 100644 include/crypto/morus1280_glue.h
 create mode 100644 include/crypto/morus640_glue.h
 create mode 100644 include/crypto/morus_common.h

-- 
2.17.0



[PATCH 1/4] crypto: Add generic MORUS AEAD implementations

2018-05-11 Thread Ondrej Mosnáček
From: Ondrej Mosnacek 

This patch adds the generic implementation of the MORUS family of AEAD
algorithms (MORUS-640 and MORUS-1280). The original authors of MORUS
are Hongjun Wu and Tao Huang.

At the time of writing, MORUS is one of the finalists in CAESAR, an
open competition intended to select a portfolio of alternatives to
the problematic AES-GCM:

https://competitions.cr.yp.to/caesar-submissions.html
https://competitions.cr.yp.to/round3/morusv2.pdf

Signed-off-by: Ondrej Mosnacek 
---
 crypto/Kconfig|  12 +
 crypto/Makefile   |   2 +
 crypto/morus1280.c| 549 ++
 crypto/morus640.c | 544 +
 include/crypto/morus_common.h |  23 ++
 5 files changed, 1130 insertions(+)
 create mode 100644 crypto/morus1280.c
 create mode 100644 crypto/morus640.c
 create mode 100644 include/crypto/morus_common.h

diff --git a/crypto/Kconfig b/crypto/Kconfig
index a5c5f7bbec98..fdf2b0958b43 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -289,6 +289,18 @@ config CRYPTO_CHACHA20POLY1305
  with the Poly1305 authenticator. It is defined in RFC7539 for use in
  IETF protocols.
 
+config CRYPTO_MORUS640
+   tristate "MORUS-640 AEAD algorithm"
+   select CRYPTO_AEAD
+   help
+ Support for the MORUS-640 dedicated AEAD algorithm.
+
+config CRYPTO_MORUS1280
+   tristate "MORUS-1280 AEAD algorithm"
+   select CRYPTO_AEAD
+   help
+ Support for the MORUS-1280 dedicated AEAD algorithm.
+
 config CRYPTO_SEQIV
tristate "Sequence Number IV Generator"
select CRYPTO_AEAD
diff --git a/crypto/Makefile b/crypto/Makefile
index 065423d67488..3073145c460d 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -86,6 +86,8 @@ obj-$(CONFIG_CRYPTO_KEYWRAP) += keywrap.o
 obj-$(CONFIG_CRYPTO_GCM) += gcm.o
 obj-$(CONFIG_CRYPTO_CCM) += ccm.o
 obj-$(CONFIG_CRYPTO_CHACHA20POLY1305) += chacha20poly1305.o
+obj-$(CONFIG_CRYPTO_MORUS640) += morus640.o
+obj-$(CONFIG_CRYPTO_MORUS1280) += morus1280.o
 obj-$(CONFIG_CRYPTO_PCRYPT) += pcrypt.o
 obj-$(CONFIG_CRYPTO_CRYPTD) += cryptd.o
 obj-$(CONFIG_CRYPTO_MCRYPTD) += mcryptd.o
diff --git a/crypto/morus1280.c b/crypto/morus1280.c
new file mode 100644
index ..6180b2557836
--- /dev/null
+++ b/crypto/morus1280.c
@@ -0,0 +1,549 @@
+/*
+ * The MORUS-1280 Authenticated-Encryption Algorithm
+ *
+ * Copyright (c) 2016-2018 Ondrej Mosnacek 
+ * Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define MORUS1280_WORD_SIZE 8
+#define MORUS1280_BLOCK_SIZE (MORUS_BLOCK_WORDS * MORUS1280_WORD_SIZE)
+#define MORUS1280_BLOCK_ALIGN (__alignof__(__le64))
+#define MORUS1280_ALIGNED(p) IS_ALIGNED((uintptr_t)p, MORUS1280_BLOCK_ALIGN)
+
+struct morus1280_block {
+   u64 words[MORUS_BLOCK_WORDS];
+};
+
+union morus1280_block_in {
+   __le64 words[MORUS_BLOCK_WORDS];
+   u8 bytes[MORUS1280_BLOCK_SIZE];
+};
+
+struct morus1280_state {
+   struct morus1280_block s[MORUS_STATE_BLOCKS];
+};
+
+struct morus1280_ctx {
+   struct morus1280_block key;
+};
+
+struct morus1280_ops {
+   int (*skcipher_walk_init)(struct skcipher_walk *walk,
+ struct aead_request *req, bool atomic);
+
+   void (*crypt_chunk)(struct morus1280_state *state,
+   u8 *dst, const u8 *src, unsigned int size);
+};
+
+static const struct morus1280_block crypto_morus1280_const[1] = {
+   { .words = {
+   U64_C(0x0d08050302010100),
+   U64_C(0x6279e99059372215),
+   U64_C(0xf12fc26d55183ddb),
+   U64_C(0xdd28b57342311120),
+   } },
+};
+
+static void crypto_morus1280_round(struct morus1280_block *b0,
+  struct morus1280_block *b1,
+  struct morus1280_block *b2,
+  struct morus1280_block *b3,
+  struct morus1280_block *b4,
+  const struct morus1280_block *m,
+  unsigned int b, unsigned int w)
+{
+   unsigned int i;
+   struct morus1280_block tmp;
+
+   for (i = 0; i < MORUS_BLOCK_WORDS; i++) {
+   b0->words[i] ^= b1->words[i] & b2->words[i];
+   b0->words[i] ^= b3->words[i];
+   b0->words[i] ^= m->words[i];
+   b0->words[i] = rol64(b0->words[i], b);
+   }
+
+   tmp = *b3;
+   for (i = 0; i < MORUS_BLOCK_WORDS; i++)
+   

[PATCH 1/3] crypto: Add generic AEGIS AEAD implementations

2018-05-11 Thread Ondrej Mosnáček
From: Ondrej Mosnacek 

This patch adds the generic implementation of the AEGIS family of AEAD
algorithms (AEGIS-128, AEGIS-128L, and AEGIS-256). The original
authors of AEGIS are Hongjun Wu and Bart Preneel.

At the time of writing, AEGIS is one of the finalists in CAESAR, an
open competition intended to select a portfolio of alternatives to
the problematic AES-GCM:

https://competitions.cr.yp.to/caesar-submissions.html
https://competitions.cr.yp.to/round3/aegisv11.pdf

Signed-off-by: Ondrej Mosnacek 
---
 crypto/Kconfig |  21 ++
 crypto/Makefile|   3 +
 crypto/aegis.h |  80 +++
 crypto/aegis128.c  | 463 +++
 crypto/aegis128l.c | 527 +
 crypto/aegis256.c  | 478 
 6 files changed, 1572 insertions(+)
 create mode 100644 crypto/aegis.h
 create mode 100644 crypto/aegis128.c
 create mode 100644 crypto/aegis128l.c
 create mode 100644 crypto/aegis256.c

diff --git a/crypto/Kconfig b/crypto/Kconfig
index a5c5f7bbec98..48856238a490 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -289,6 +289,27 @@ config CRYPTO_CHACHA20POLY1305
  with the Poly1305 authenticator. It is defined in RFC7539 for use in
  IETF protocols.
 
+config CRYPTO_AEGIS128
+   tristate "AEGIS-128 AEAD algorithm"
+   select CRYPTO_AEAD
+   select CRYPTO_AES  # for AES S-box tables
+   help
+ Support for the AEGIS-128 dedicated AEAD algorithm.
+
+config CRYPTO_AEGIS128L
+   tristate "AEGIS-128L AEAD algorithm"
+   select CRYPTO_AEAD
+   select CRYPTO_AES  # for AES S-box tables
+   help
+ Support for the AEGIS-128L dedicated AEAD algorithm.
+
+config CRYPTO_AEGIS256
+   tristate "AEGIS-256 AEAD algorithm"
+   select CRYPTO_AEAD
+   select CRYPTO_AES  # for AES S-box tables
+   help
+ Support for the AEGIS-256 dedicated AEAD algorithm.
+
 config CRYPTO_SEQIV
tristate "Sequence Number IV Generator"
select CRYPTO_AEAD
diff --git a/crypto/Makefile b/crypto/Makefile
index 065423d67488..f2008d493a28 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -86,6 +86,9 @@ obj-$(CONFIG_CRYPTO_KEYWRAP) += keywrap.o
 obj-$(CONFIG_CRYPTO_GCM) += gcm.o
 obj-$(CONFIG_CRYPTO_CCM) += ccm.o
 obj-$(CONFIG_CRYPTO_CHACHA20POLY1305) += chacha20poly1305.o
+obj-$(CONFIG_CRYPTO_AEGIS128) += aegis128.o
+obj-$(CONFIG_CRYPTO_AEGIS128L) += aegis128l.o
+obj-$(CONFIG_CRYPTO_AEGIS256) += aegis256.o
 obj-$(CONFIG_CRYPTO_PCRYPT) += pcrypt.o
 obj-$(CONFIG_CRYPTO_CRYPTD) += cryptd.o
 obj-$(CONFIG_CRYPTO_MCRYPTD) += mcryptd.o
diff --git a/crypto/aegis.h b/crypto/aegis.h
new file mode 100644
index ..f1c6900ddb80
--- /dev/null
+++ b/crypto/aegis.h
@@ -0,0 +1,80 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * AEGIS common definitions
+ *
+ * Copyright (c) 2018 Ondrej Mosnacek 
+ * Copyright (c) 2018 Red Hat, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#ifndef _CRYPTO_AEGIS_H
+#define _CRYPTO_AEGIS_H
+
+#include 
+#include 
+
+#define AEGIS_BLOCK_SIZE 16
+
+union aegis_block {
+   __le64 words64[AEGIS_BLOCK_SIZE / sizeof(__le64)];
+   u32 words32[AEGIS_BLOCK_SIZE / sizeof(u32)];
+   u8 bytes[AEGIS_BLOCK_SIZE];
+};
+
+#define AEGIS_BLOCK_ALIGN (__alignof__(union aegis_block))
+#define AEGIS_ALIGNED(p) IS_ALIGNED((uintptr_t)p, AEGIS_BLOCK_ALIGN)
+
+static const union aegis_block crypto_aegis_const[2] = {
+   { .words64 = {
+   cpu_to_le64(U64_C(0x0d08050302010100)),
+   cpu_to_le64(U64_C(0x6279e99059372215)),
+   } },
+   { .words64 = {
+   cpu_to_le64(U64_C(0xf12fc26d55183ddb)),
+   cpu_to_le64(U64_C(0xdd28b57342311120)),
+   } },
+};
+
+static void crypto_aegis_block_xor(union aegis_block *dst,
+  const union aegis_block *src)
+{
+   dst->words64[0] ^= src->words64[0];
+   dst->words64[1] ^= src->words64[1];
+}
+
+static void crypto_aegis_block_and(union aegis_block *dst,
+  const union aegis_block *src)
+{
+   dst->words64[0] &= src->words64[0];
+   dst->words64[1] &= src->words64[1];
+}
+
+static void crypto_aegis_aesenc(union aegis_block *dst,
+   const union aegis_block *src,
+   const union aegis_block *key)
+{
+   u32 *d = dst->words32;
+   const u8  *s  = src->bytes;
+   const u32 *k  = key->words32;
+   const u32 *t0 = crypto_ft_tab[0];
+   const u32 *t1 = crypto_ft_tab[1];
+   const u32 *t2 = crypto_ft_tab[2];
+   const u32 *t3 = crypto_ft_tab[3];
+   u32 d0, d1, d2, d3;
+
+   d0 

[PATCH 0/3] Add support for AEGIS AEAD algorithm

2018-05-11 Thread Ondrej Mosnáček
From: Ondrej Mosnacek 

This patchset adds the AEGIS AEAD algorithm implementation to the Linux Crypto 
API.

AEGIS [1] is a dedicated AEAD algorithm based on the AES round function and 
designed for high throughput both on modern processors and in hardware. It is 
designed by Hongjun Wu and Bart Preneel and has been submitted to the CAESAR 
competition [2], where it is currently one of the finalists [3].

AEGIS uses the AES round function and logical bitwise operations as primitives. 
It achieves extremely good performance in software (on platforms with 
HW-accelerated AES round function) and in hardware.
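For illustration only (not part of the patchset): the state-update structure can be modelled in stand-alone C, mirroring the register rotation of the aegis128_update asm macro in patch 3. The AES round itself is abstracted behind a function pointer; xor_round_stub() below is a stub used only to check the rotation pattern, NOT a real AES round, and all names here are invented for the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define AEGIS_BLOCK_SIZE 16
#define AEGIS128_STATE_BLOCKS 5

struct aegis_block {
	uint8_t bytes[AEGIS_BLOCK_SIZE];
};

/* dst = Round(src) ^ key; the real implementations use AES-NI (aesenc)
 * or the AES S-box tables for this. */
typedef void (*aes_round_fn)(struct aegis_block *dst,
			     const struct aegis_block *src,
			     const struct aegis_block *key);

/* AEGIS-128 state update: S'[i] = AESRound(S[(i + 4) % 5], S[i]),
 * mirroring the shifted register positions of the aegis128_update asm
 * macro (message injection into S'[0] is omitted in this sketch). */
static void aegis128_update_sketch(struct aegis_block s[AEGIS128_STATE_BLOCKS],
				   aes_round_fn round)
{
	struct aegis_block out[AEGIS128_STATE_BLOCKS];
	int i;

	for (i = 0; i < AEGIS128_STATE_BLOCKS; i++)
		round(&out[i], &s[(i + 4) % AEGIS128_STATE_BLOCKS], &s[i]);
	memcpy(s, out, sizeof(out));
}

/* Stub "round" (identity + key XOR), only to make the rotation
 * observable; it is NOT an AES round. */
static void xor_round_stub(struct aegis_block *dst,
			   const struct aegis_block *src,
			   const struct aegis_block *key)
{
	int i;

	for (i = 0; i < AEGIS_BLOCK_SIZE; i++)
		dst->bytes[i] = src->bytes[i] ^ key->bytes[i];
}

/* With s[i].bytes[0] = 1 << i, new S[0] should combine old S[4] and
 * old S[0]: 16 ^ 1 == 17. */
static unsigned int aegis128_update_demo(void)
{
	struct aegis_block s[AEGIS128_STATE_BLOCKS];
	int i;

	memset(s, 0, sizeof(s));
	for (i = 0; i < AEGIS128_STATE_BLOCKS; i++)
		s[i].bytes[0] = (uint8_t)(1u << i);

	aegis128_update_sketch(s, xor_round_stub);
	return s[0].bytes[0];
}
```

Since each step is a single aesenc instruction on AES-NI hardware, the whole update is five instructions per 16-byte block of input, which is where the performance comes from.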

AEGIS has three variants:
* AEGIS-128 operating on 128-bit blocks and accepting a 128-bit IV and key.
* AEGIS-128L operating on pairs of 128-bit blocks and accepting a 128-bit IV 
and key.
* AEGIS-256 operating on 128-bit blocks and accepting a 256-bit IV and key.
All three variants produce an up to 128-bit tag.

The patchset contains three patches, adding:
* generic implementations
* test vectors to testmgr
* x86_64 AES-NI+SSE2 optimized implementations

Since there are no official test vectors currently available, the test vectors 
in patch 2 were generated using a reference implementation from public CAESAR 
benchmarks [4]. They should be replaced/complemented with official test vectors 
if/when they become available.

The implementations have been developed in cooperation with Milan Broz (the 
maintainer of dm-crypt and cryptsetup) and there is a plan to use them for 
authenticated disk encryption in cryptsetup. They are a result of my Master's 
thesis at the Faculty of Informatics, Masaryk University, Brno [5].

[1] https://competitions.cr.yp.to/round3/aegisv11.pdf
[2] https://competitions.cr.yp.to/caesar-call.html
[3] https://competitions.cr.yp.to/caesar-submissions.html
[4] https://bench.cr.yp.to/ebaead.html
[5] https://is.muni.cz/th/409879/fi_m/?lang=en

Ondrej Mosnacek (3):
  crypto: Add generic AEGIS AEAD implementations
  crypto: testmgr - Add test vectors for AEGIS
  crypto: x86 - Add optimized AEGIS implementations

 arch/x86/crypto/Makefile   |8 +
 arch/x86/crypto/aegis128-aesni-asm.S   |  749 +++
 arch/x86/crypto/aegis128-aesni-glue.c  |  407 
 arch/x86/crypto/aegis128l-aesni-asm.S  |  825 +++
 arch/x86/crypto/aegis128l-aesni-glue.c |  407 
 arch/x86/crypto/aegis256-aesni-asm.S   |  702 ++
 arch/x86/crypto/aegis256-aesni-glue.c  |  407 
 crypto/Kconfig |   45 +
 crypto/Makefile|3 +
 crypto/aegis.h |   80 +
 crypto/aegis128.c  |  463 
 crypto/aegis128l.c |  527 +
 crypto/aegis256.c  |  478 
 crypto/testmgr.c   |   27 +
 crypto/testmgr.h   | 2835 
 15 files changed, 7963 insertions(+)
 create mode 100644 arch/x86/crypto/aegis128-aesni-asm.S
 create mode 100644 arch/x86/crypto/aegis128-aesni-glue.c
 create mode 100644 arch/x86/crypto/aegis128l-aesni-asm.S
 create mode 100644 arch/x86/crypto/aegis128l-aesni-glue.c
 create mode 100644 arch/x86/crypto/aegis256-aesni-asm.S
 create mode 100644 arch/x86/crypto/aegis256-aesni-glue.c
 create mode 100644 crypto/aegis.h
 create mode 100644 crypto/aegis128.c
 create mode 100644 crypto/aegis128l.c
 create mode 100644 crypto/aegis256.c

-- 
2.17.0



[PATCH 2/3] crypto: testmgr - Add test vectors for AEGIS

2018-05-11 Thread Ondrej Mosnáček
From: Ondrej Mosnacek 

This patch adds test vectors for the AEGIS family of AEAD algorithms
(AEGIS-128, AEGIS-128L, and AEGIS-256). The test vectors were
generated using the reference implementation from SUPERCOP (see code
comments for more details).

Signed-off-by: Ondrej Mosnacek 
---
 crypto/testmgr.c |   27 +
 crypto/testmgr.h | 2835 ++
 2 files changed, 2862 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index c31da0f3f680..c854b6d5faaa 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -2340,6 +2340,33 @@ static int alg_test_null(const struct alg_test_desc 
*desc,
 /* Please keep this list sorted by algorithm name. */
 static const struct alg_test_desc alg_test_descs[] = {
{
+   .alg = "aegis128",
+   .test = alg_test_aead,
+   .suite = {
+   .aead = {
+   .enc = __VECS(aegis128_enc_tv_template),
+   .dec = __VECS(aegis128_dec_tv_template),
+   }
+   }
+   }, {
+   .alg = "aegis128l",
+   .test = alg_test_aead,
+   .suite = {
+   .aead = {
+   .enc = __VECS(aegis128l_enc_tv_template),
+   .dec = __VECS(aegis128l_dec_tv_template),
+   }
+   }
+   }, {
+   .alg = "aegis256",
+   .test = alg_test_aead,
+   .suite = {
+   .aead = {
+   .enc = __VECS(aegis256_enc_tv_template),
+   .dec = __VECS(aegis256_dec_tv_template),
+   }
+   }
+   }, {
.alg = "ansi_cprng",
.test = alg_test_cprng,
.suite = {
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index a20231f53024..18acdca3c3f8 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -27377,6 +27377,2841 @@ static const struct aead_testvec 
rfc7539esp_dec_tv_template[] = {
},
 };
 
+static const struct aead_testvec aegis128_enc_tv_template[] = {
+   {
+   .key= "\x0f\xc9\x8e\x67\x44\x9e\xaa\x86"
+ "\x20\x36\x2c\x24\xfe\xc9\x30\x81",
+   .klen   = 16,
+   .iv = "\x1e\x92\x1c\xcf\x88\x3d\x54\x0d"
+ "\x40\x6d\x59\x48\xfc\x92\x61\x03",
+   .assoc  = "",
+   .alen   = 0,
+   .input  = "",
+   .ilen   = 0,
+   .result = "\x07\xa5\x11\xf2\x9d\x40\xb8\x6d"
+ "\xda\xb8\x12\x34\x4c\x53\xd9\x72",
+   .rlen   = 16,
+   }, {
+   .key= "\x4b\xed\xc8\x07\x54\x1a\x52\xa2"
+ "\xa1\x10\xde\xb5\xf8\xed\xf3\x87",
+   .klen   = 16,
+   .iv = "\x5a\xb7\x56\x6e\x98\xb9\xfd\x29"
+ "\xc1\x47\x0b\xda\xf6\xb6\x23\x09",
+   .assoc  = "",
+   .alen   = 0,
+   .input  = "\x79",
+   .ilen   = 1,
+   .result = "\x9e\x78\x52\xae\xcb\x9e\xe4\xd3"
+ "\x9a\xd7\x5d\xd7\xaa\x9a\xe9\x5a"
+ "\xcc",
+   .rlen   = 17,
+   }, {
+   .key= "\x88\x12\x01\xa6\x64\x96\xfb\xbe"
+ "\x22\xea\x90\x47\xf2\x11\xb5\x8e",
+   .klen   = 16,
+   .iv = "\x97\xdb\x90\x0e\xa8\x35\xa5\x45"
+ "\x42\x21\xbd\x6b\xf0\xda\xe6\x0f",
+   .assoc  = "",
+   .alen   = 0,
+   .input  = "\xb5\x6e\xad\xdd\x30\x72\xfa\x53"
+ "\x82\x8e\x16\xb4\xed\x6d\x47",
+   .ilen   = 15,
+   .result = "\xc3\x80\x83\x04\x5f\xaa\x61\xc7"
+ "\xca\xdd\x6f\xac\x85\x08\xb5\x35"
+ "\x2b\xc2\x3e\x0b\x1b\x39\x37\x2b"
+ "\x7a\x21\x16\xb3\xe6\x67\x66",
+   .rlen   = 31,
+   }, {
+   .key= "\xc4\x37\x3b\x45\x74\x11\xa4\xda"
+ "\xa2\xc5\x42\xd8\xec\x36\x78\x94",
+   .klen   = 16,
+   .iv = "\xd3\x00\xc9\xad\xb8\xb0\x4e\x61"
+ "\xc3\xfb\x6f\xfd\xea\xff\xa9\x15",
+   .assoc  = "",
+   .alen   = 0,
+   .input  = "\xf2\x92\xe6\x7d\x40\xee\xa3\x6f"
+ "\x03\x68\xc8\x45\xe7\x91\x0a\x18",
+   .ilen   = 16,
+   .result = "\x23\x25\x30\xe5\x6a\xb6\x36\x7d"
+ "\x38\xfd\x3a\xd2\xc2\x58\xa9\x11"
+ "\x1e\xa8\x30\x9c\x16\xa4\xdb\x65"
+ "\x51\x10\x16\x27\x70\x9b\x64\x29",
+   .rlen   = 32,
+   }, {
+   .key= 

[PATCH 3/3] crypto: x86 - Add optimized AEGIS implementations

2018-05-11 Thread Ondrej Mosnáček
From: Ondrej Mosnacek 

This patch adds optimized implementations of AEGIS-128, AEGIS-128L,
and AEGIS-256, utilizing the AES-NI and SSE2 x86 extensions.

Signed-off-by: Ondrej Mosnacek 
---
 arch/x86/crypto/Makefile   |   8 +
 arch/x86/crypto/aegis128-aesni-asm.S   | 749 ++
 arch/x86/crypto/aegis128-aesni-glue.c  | 407 
 arch/x86/crypto/aegis128l-aesni-asm.S  | 825 +
 arch/x86/crypto/aegis128l-aesni-glue.c | 407 
 arch/x86/crypto/aegis256-aesni-asm.S   | 702 +
 arch/x86/crypto/aegis256-aesni-glue.c  | 407 
 crypto/Kconfig |  24 +
 8 files changed, 3529 insertions(+)
 create mode 100644 arch/x86/crypto/aegis128-aesni-asm.S
 create mode 100644 arch/x86/crypto/aegis128-aesni-glue.c
 create mode 100644 arch/x86/crypto/aegis128l-aesni-asm.S
 create mode 100644 arch/x86/crypto/aegis128l-aesni-glue.c
 create mode 100644 arch/x86/crypto/aegis256-aesni-asm.S
 create mode 100644 arch/x86/crypto/aegis256-aesni-glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 5f07333bb224..c183553a4bd6 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -38,6 +38,10 @@ obj-$(CONFIG_CRYPTO_SHA512_SSSE3) += sha512-ssse3.o
 obj-$(CONFIG_CRYPTO_CRCT10DIF_PCLMUL) += crct10dif-pclmul.o
 obj-$(CONFIG_CRYPTO_POLY1305_X86_64) += poly1305-x86_64.o
 
+obj-$(CONFIG_CRYPTO_AEGIS128_AESNI_SSE2) += aegis128-aesni.o
+obj-$(CONFIG_CRYPTO_AEGIS128L_AESNI_SSE2) += aegis128l-aesni.o
+obj-$(CONFIG_CRYPTO_AEGIS256_AESNI_SSE2) += aegis256-aesni.o
+
 # These modules require assembler to support AVX.
 ifeq ($(avx_supported),yes)
obj-$(CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64) += \
@@ -72,6 +76,10 @@ salsa20-x86_64-y := salsa20-x86_64-asm_64.o salsa20_glue.o
 chacha20-x86_64-y := chacha20-ssse3-x86_64.o chacha20_glue.o
 serpent-sse2-x86_64-y := serpent-sse2-x86_64-asm_64.o serpent_sse2_glue.o
 
+aegis128-aesni-y := aegis128-aesni-asm.o aegis128-aesni-glue.o
+aegis128l-aesni-y := aegis128l-aesni-asm.o aegis128l-aesni-glue.o
+aegis256-aesni-y := aegis256-aesni-asm.o aegis256-aesni-glue.o
+
 ifeq ($(avx_supported),yes)
camellia-aesni-avx-x86_64-y := camellia-aesni-avx-asm_64.o \
camellia_aesni_avx_glue.o
diff --git a/arch/x86/crypto/aegis128-aesni-asm.S 
b/arch/x86/crypto/aegis128-aesni-asm.S
new file mode 100644
index ..9254e0b6cc06
--- /dev/null
+++ b/arch/x86/crypto/aegis128-aesni-asm.S
@@ -0,0 +1,749 @@
+/*
+ * AES-NI + SSE2 implementation of AEGIS-128
+ *
+ * Copyright (c) 2017-2018 Ondrej Mosnacek 
+ * Copyright (C) 2017-2018 Red Hat, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include 
+#include 
+
+#define STATE0 %xmm0
+#define STATE1 %xmm1
+#define STATE2 %xmm2
+#define STATE3 %xmm3
+#define STATE4 %xmm4
+#define KEY %xmm5
+#define MSG %xmm5
+#define T0 %xmm6
+#define T1 %xmm7
+
+#define STATEP %rdi
+#define LEN %rsi
+#define SRC %rdx
+#define DST %rcx
+
+.section .rodata.cst16.aegis128_const, "aM", @progbits, 32
+.align 16
+.Laegis128_const_0:
+   .byte 0x00, 0x01, 0x01, 0x02, 0x03, 0x05, 0x08, 0x0d
+   .byte 0x15, 0x22, 0x37, 0x59, 0x90, 0xe9, 0x79, 0x62
+.Laegis128_const_1:
+   .byte 0xdb, 0x3d, 0x18, 0x55, 0x6d, 0xc2, 0x2f, 0xf1
+   .byte 0x20, 0x11, 0x31, 0x42, 0x73, 0xb5, 0x28, 0xdd
+
+.section .rodata.cst16.aegis128_counter, "aM", @progbits, 16
+.align 16
+.Laegis128_counter:
+   .byte 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
+   .byte 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+
+.text
+
+/*
+ * aegis128_update
+ * input:
+ *   STATE[0-4] - input state
+ * output:
+ *   STATE[0-4] - output state (shifted positions)
+ * changed:
+ *   T0
+ */
+.macro aegis128_update
+   movdqa STATE4, T0
+   aesenc STATE0, STATE4
+   aesenc STATE1, STATE0
+   aesenc STATE2, STATE1
+   aesenc STATE3, STATE2
+   aesenc T0, STATE3
+.endm
+
+/*
+ * __load_partial: internal ABI
+ * input:
+ *   LEN - bytes
+ *   SRC - src
+ * output:
+ *   MSG  - message block
+ * changed:
+ *   T0
+ *   %r8
+ *   %r9
+ */
+__load_partial:
+   xor %r9, %r9
+   pxor MSG, MSG
+
+   mov LEN, %r8
+   and $0x1, %r8
+   jz .Lld_partial_1
+
+   mov LEN, %r8
+   and $0x1E, %r8
+   add SRC, %r8
+   mov (%r8), %r9b
+
+.Lld_partial_1:
+   mov LEN, %r8
+   and $0x2, %r8
+   jz .Lld_partial_2
+
+   mov LEN, %r8
+   and $0x1C, %r8
+   add SRC, %r8
+   shl $0x10, %r9
+   mov (%r8), %r9w
+
+.Lld_partial_2:
+   mov LEN, %r8
+   and $0x4, %r8
+   jz .Lld_partial_4
+
+   mov LEN, %r8
+   and $0x18, %r8
+   add SRC, %r8
+   shl $32, 

Re: [PATCH] crypto: skcipher - Fix skcipher_walk_aead_common

2017-11-24 Thread Ondrej Mosnáček
(I accidentally hit "reply" instead of "reply all", so resending)

2017-11-24 6:07 GMT+01:00 Herbert Xu :
> On Thu, Nov 23, 2017 at 01:49:06PM +0100, Ondrej Mosnacek wrote:
>> diff --git a/crypto/skcipher.c b/crypto/skcipher.c
>> index 4faa0fd53b0c..6c45ed536664 100644
>> --- a/crypto/skcipher.c
>> +++ b/crypto/skcipher.c
>> @@ -517,6 +517,9 @@ static int skcipher_walk_aead_common(struct 
>> skcipher_walk *walk,
>>   scatterwalk_copychunks(NULL, >in, req->assoclen, 2);
>>   scatterwalk_copychunks(NULL, >out, req->assoclen, 2);
>>
>> + scatterwalk_done(>in, 0, walk->total);
>> + scatterwalk_done(>out, 0, walk->total);
>
> That should be 1 instead of 0 for walk->out.
>
> Could you please fix and resubmit?

Since the associated data is not written, just skipped, I believe 0 is
more appropriate. scatterwalk_copychunks(..., 2) also calls
scatterwalk_pagedone() with out=0 internally.

O.M.


Re: [PATCH v4] crypto: gf128mul - define gf128mul_x_* in gf128mul.h

2017-04-01 Thread Ondrej Mosnáček
Never mind, Gmail is confusing me... there is indeed "v4" in the subject :)

O.M.

2017-04-01 17:19 GMT+02:00 Ondrej Mosnáček <omosna...@gmail.com>:
> Oops, sorry, wrong prefix...
>
> 2017-04-01 17:17 GMT+02:00 Ondrej Mosnacek <omosna...@gmail.com>:
>> The gf128mul_x_ble function is currently defined in gf128mul.c, because
>> it depends on the gf128mul_table_be multiplication table.
>>
>> However, since the function is very small and only uses two values from
>> the table, it is better for it to be defined as inline function in
>> gf128mul.h. That way, the function can be inlined by the compiler for
>> better performance.
>>
>> For consistency, the other gf128mul_x_* functions are also moved to the
>> header file. In addition, the code is rewritten to be constant-time.
>>
>> After this change, the speed of the generic 'xts(aes)' implementation
>> increased from ~225 MiB/s to ~235 MiB/s (measured using 'cryptsetup
>> benchmark -c aes-xts-plain64' on an Intel system with CRYPTO_AES_X86_64
>> and CRYPTO_AES_NI_INTEL disabled).
>>
>> Signed-off-by: Ondrej Mosnacek <omosna...@gmail.com>
>> Cc: Eric Biggers <ebigg...@google.com>
>> ---
>> v3 -> v4: a faster version of gf128mul_x_lle
>> v2 -> v3: constant-time implementation
>> v1 -> v2: move all _x_ functions to the header, not just gf128mul_x_ble
>>
>>  crypto/gf128mul.c | 33 +---
>>  include/crypto/gf128mul.h | 55 
>> +--
>>  2 files changed, 54 insertions(+), 34 deletions(-)
>>
>> diff --git a/crypto/gf128mul.c b/crypto/gf128mul.c
>> index 04facc0..dc01212 100644
>> --- a/crypto/gf128mul.c
>> +++ b/crypto/gf128mul.c
>> @@ -130,43 +130,12 @@ static const u16 gf128mul_table_le[256] = 
>> gf128mul_dat(xda_le);
>>  static const u16 gf128mul_table_be[256] = gf128mul_dat(xda_be);
>>
>>  /*
>> - * The following functions multiply a field element by x or by x^8 in
>> + * The following functions multiply a field element by x^8 in
>>   * the polynomial field representation.  They use 64-bit word operations
>>   * to gain speed but compensate for machine endianness and hence work
>>   * correctly on both styles of machine.
>>   */
>>
>> -static void gf128mul_x_lle(be128 *r, const be128 *x)
>> -{
>> -   u64 a = be64_to_cpu(x->a);
>> -   u64 b = be64_to_cpu(x->b);
>> -   u64 _tt = gf128mul_table_le[(b << 7) & 0xff];
>> -
>> -   r->b = cpu_to_be64((b >> 1) | (a << 63));
>> -   r->a = cpu_to_be64((a >> 1) ^ (_tt << 48));
>> -}
>> -
>> -static void gf128mul_x_bbe(be128 *r, const be128 *x)
>> -{
>> -   u64 a = be64_to_cpu(x->a);
>> -   u64 b = be64_to_cpu(x->b);
>> -   u64 _tt = gf128mul_table_be[a >> 63];
>> -
>> -   r->a = cpu_to_be64((a << 1) | (b >> 63));
>> -   r->b = cpu_to_be64((b << 1) ^ _tt);
>> -}
>> -
>> -void gf128mul_x_ble(be128 *r, const be128 *x)
>> -{
>> -   u64 a = le64_to_cpu(x->a);
>> -   u64 b = le64_to_cpu(x->b);
>> -   u64 _tt = gf128mul_table_be[b >> 63];
>> -
>> -   r->a = cpu_to_le64((a << 1) ^ _tt);
>> -   r->b = cpu_to_le64((b << 1) | (a >> 63));
>> -}
>> -EXPORT_SYMBOL(gf128mul_x_ble);
>> -
>>  static void gf128mul_x8_lle(be128 *x)
>>  {
>> u64 a = be64_to_cpu(x->a);
>> diff --git a/include/crypto/gf128mul.h b/include/crypto/gf128mul.h
>> index 0bc9b5f..35ced9d 100644
>> --- a/include/crypto/gf128mul.h
>> +++ b/include/crypto/gf128mul.h
>> @@ -49,6 +49,7 @@
>>  #ifndef _CRYPTO_GF128MUL_H
>>  #define _CRYPTO_GF128MUL_H
>>
>> +#include 
>>  #include 
>>  #include 
>>
>> @@ -163,8 +164,58 @@ void gf128mul_lle(be128 *a, const be128 *b);
>>
>>  void gf128mul_bbe(be128 *a, const be128 *b);
>>
>> -/* multiply by x in ble format, needed by XTS */
>> -void gf128mul_x_ble(be128 *a, const be128 *b);
>> +/*
>> + * The following functions multiply a field element by x in
>> + * the polynomial field representation.  They use 64-bit word operations
>> + * to gain speed but compensate for machine endianness and hence work
>> + * correctly on both styles of machine.
>> + *
>> + * They are defined here for performance.
>> + */
>> +
>> +static inline u64 gf128mul_mask_from_bit(u64 x, int which)
>> +{
>> + 

Re: [PATCH v4] crypto: gf128mul - define gf128mul_x_* in gf128mul.h

2017-04-01 Thread Ondrej Mosnáček
Oops, sorry, wrong prefix...

2017-04-01 17:17 GMT+02:00 Ondrej Mosnacek :
> The gf128mul_x_ble function is currently defined in gf128mul.c, because
> it depends on the gf128mul_table_be multiplication table.
>
> However, since the function is very small and only uses two values from
> the table, it is better for it to be defined as inline function in
> gf128mul.h. That way, the function can be inlined by the compiler for
> better performance.
>
> For consistency, the other gf128mul_x_* functions are also moved to the
> header file. In addition, the code is rewritten to be constant-time.
>
> After this change, the speed of the generic 'xts(aes)' implementation
> increased from ~225 MiB/s to ~235 MiB/s (measured using 'cryptsetup
> benchmark -c aes-xts-plain64' on an Intel system with CRYPTO_AES_X86_64
> and CRYPTO_AES_NI_INTEL disabled).
>
> Signed-off-by: Ondrej Mosnacek 
> Cc: Eric Biggers 
> ---
> v3 -> v4: a faster version of gf128mul_x_lle
> v2 -> v3: constant-time implementation
> v1 -> v2: move all _x_ functions to the header, not just gf128mul_x_ble
>
>  crypto/gf128mul.c | 33 +---
>  include/crypto/gf128mul.h | 55 
> +--
>  2 files changed, 54 insertions(+), 34 deletions(-)
>
> diff --git a/crypto/gf128mul.c b/crypto/gf128mul.c
> index 04facc0..dc01212 100644
> --- a/crypto/gf128mul.c
> +++ b/crypto/gf128mul.c
> @@ -130,43 +130,12 @@ static const u16 gf128mul_table_le[256] = 
> gf128mul_dat(xda_le);
>  static const u16 gf128mul_table_be[256] = gf128mul_dat(xda_be);
>
>  /*
> - * The following functions multiply a field element by x or by x^8 in
> + * The following functions multiply a field element by x^8 in
>   * the polynomial field representation.  They use 64-bit word operations
>   * to gain speed but compensate for machine endianness and hence work
>   * correctly on both styles of machine.
>   */
>
> -static void gf128mul_x_lle(be128 *r, const be128 *x)
> -{
> -   u64 a = be64_to_cpu(x->a);
> -   u64 b = be64_to_cpu(x->b);
> -   u64 _tt = gf128mul_table_le[(b << 7) & 0xff];
> -
> -   r->b = cpu_to_be64((b >> 1) | (a << 63));
> -   r->a = cpu_to_be64((a >> 1) ^ (_tt << 48));
> -}
> -
> -static void gf128mul_x_bbe(be128 *r, const be128 *x)
> -{
> -   u64 a = be64_to_cpu(x->a);
> -   u64 b = be64_to_cpu(x->b);
> -   u64 _tt = gf128mul_table_be[a >> 63];
> -
> -   r->a = cpu_to_be64((a << 1) | (b >> 63));
> -   r->b = cpu_to_be64((b << 1) ^ _tt);
> -}
> -
> -void gf128mul_x_ble(be128 *r, const be128 *x)
> -{
> -   u64 a = le64_to_cpu(x->a);
> -   u64 b = le64_to_cpu(x->b);
> -   u64 _tt = gf128mul_table_be[b >> 63];
> -
> -   r->a = cpu_to_le64((a << 1) ^ _tt);
> -   r->b = cpu_to_le64((b << 1) | (a >> 63));
> -}
> -EXPORT_SYMBOL(gf128mul_x_ble);
> -
>  static void gf128mul_x8_lle(be128 *x)
>  {
> u64 a = be64_to_cpu(x->a);
> diff --git a/include/crypto/gf128mul.h b/include/crypto/gf128mul.h
> index 0bc9b5f..35ced9d 100644
> --- a/include/crypto/gf128mul.h
> +++ b/include/crypto/gf128mul.h
> @@ -49,6 +49,7 @@
>  #ifndef _CRYPTO_GF128MUL_H
>  #define _CRYPTO_GF128MUL_H
>
> +#include 
>  #include 
>  #include 
>
> @@ -163,8 +164,58 @@ void gf128mul_lle(be128 *a, const be128 *b);
>
>  void gf128mul_bbe(be128 *a, const be128 *b);
>
> -/* multiply by x in ble format, needed by XTS */
> -void gf128mul_x_ble(be128 *a, const be128 *b);
> +/*
> + * The following functions multiply a field element by x in
> + * the polynomial field representation.  They use 64-bit word operations
> + * to gain speed but compensate for machine endianness and hence work
> + * correctly on both styles of machine.
> + *
> + * They are defined here for performance.
> + */
> +
> +static inline u64 gf128mul_mask_from_bit(u64 x, int which)
> +{
> +   /* a constant-time version of 'x & ((u64)1 << which) ? (u64)-1 : 0' */
> +   return ((s64)(x << (63 - which)) >> 63);
> +}
> +
> +static inline void gf128mul_x_lle(be128 *r, const be128 *x)
> +{
> +   u64 a = be64_to_cpu(x->a);
> +   u64 b = be64_to_cpu(x->b);
> +
> +   /* equivalent to gf128mul_table_le[(b << 7) & 0xff] << 48
> +* (see crypto/gf128mul.c): */
> +   u64 _tt = gf128mul_mask_from_bit(b, 0) & ((u64)0xe1 << 56);
> +
> +   r->b = cpu_to_be64((b >> 1) | (a << 63));
> +   r->a = cpu_to_be64((a >> 1) ^ _tt);
> +}
> +
> +static inline void gf128mul_x_bbe(be128 *r, const be128 *x)
> +{
> +   u64 a = be64_to_cpu(x->a);
> +   u64 b = be64_to_cpu(x->b);
> +
> +   /* equivalent to gf128mul_table_be[a >> 63] (see crypto/gf128mul.c): */
> +   u64 _tt = gf128mul_mask_from_bit(a, 63) & 0x87;
> +
> +   r->a = cpu_to_be64((a << 1) | (b >> 63));
> +   r->b = cpu_to_be64((b << 1) ^ _tt);
> +}
> +
> +/* needed by XTS */
> +static inline void gf128mul_x_ble(be128 *r, const be128 *x)
> +{
> +   u64 a = 

Re: [PATCH v3] crypto: gf128mul - define gf128mul_x_* in gf128mul.h

2017-04-01 Thread Ondrej Mosnáček
2017-04-01 5:44 GMT+02:00 Eric Biggers :
> Also, I realized that for gf128mul_x_lle(), now that we aren't using the
> table, we don't need to shift '_tt' but rather can use the constant 0xe100:
>
> /* equivalent to (u64)gf128mul_table_le[(b << 7) & 0xff] << 48
>  * (see crypto/gf128mul.c): */
> u64 _tt = gf128mul_mask_from_bit(b, 0) & 0xe100;
>
> r->b = cpu_to_be64((b >> 1) | (a << 63));
> r->a = cpu_to_be64((a >> 1) ^ _tt);
>
> I think that would be better and you could send a v4 to do it that way if you
> want.  It's not a huge deal though.

Yes, I was hoping the compiler would be wise enough to fold the shift
into the constant, but I didn't actually check the assembly output...
I took the time to write a quick benchmark and the version without
shift is indeed notably faster.

That said, I'll go the extra mile and send a v4.

Thanks for the review!

O.M.


Re: [PATCH] crypto: gf128mul - define gf128mul_x_ble in gf128mul.h

2017-03-31 Thread Ondrej Mosnáček
Hi Jeff,

2017-03-31 8:05 GMT+02:00 Jeffrey Walton :
>>> Also note that '(b & ((u64)1 << 63)) ? 0x87 : 0x00;' is actually getting
>>> compiled as '((s64)b >> 63) & 0x87', which is branchless and therefore
>>> makes the new version more efficient than one might expect:
>>>
>>> sar $0x3f,%rax
>>> and $0x87,%eax
>>>
>>> It could even be written the branchless way explicitly, but it
>>> shouldn't matter.
>>
>> I think the definition using unsigned operations is more intuitive...
>> Let's just leave the clever tricks up to the compiler :)
>
> It may be a good idea to use the one that provides constant time-ness
> to help avoid leaking information.

That's a good point... I played around with various ways to write the
expression in Compiler Explorer [1] and indeed GCC fails to produce
constant-time code from my version on some architectures (e.g. the
32-bit ARM). The version with an explicit arithmetic right shift seems
to produce the most efficient code across platforms, so I'll rewrite
it like that for v3.

Thanks,
O.M.

[1] https://gcc.godbolt.org/


Re: [PATCH] crypto: gf128mul - define gf128mul_x_ble in gf128mul.h

2017-03-30 Thread Ondrej Mosnáček
Hi Eric,

2017-03-30 21:55 GMT+02:00 Eric Biggers :
> This is an improvement; I'm just thinking that maybe this should be done
> for all the gf128mul_x_*() functions, if only so that they use a
> consistent style and are all defined next to each other.

Right, that doesn't seem to be a bad idea... I was confused for a
while by the '& 0xff' in the _lle one, but now I see it also uses just
two values of the table, so it can be re-written in a similar way. In
fact, the OCB mode from RFC 7253 (that I'm currently trying to port to
kernel crypto API) uses gf128mul_x_bbe, so it would be useful to have
that one accessible, too.

I will move them all in v2, then.

> Also note that '(b & ((u64)1 << 63)) ? 0x87 : 0x00;' is actually getting
> compiled as '((s64)b >> 63) & 0x87', which is branchless and therefore
> makes the new version more efficient than one might expect:
>
> sar $0x3f,%rax
> and $0x87,%eax
>
> It could even be written the branchless way explicitly, but it
> shouldn't matter.

I think the definition using unsigned operations is more intuitive...
Let's just leave the clever tricks up to the compiler :)

Thanks,
O.M.

>
> - Eric


Re: [PATCH] dm: switch dm-verity to async hash crypto API

2017-01-26 Thread Ondrej Mosnáček
Hi Gilad,

2017-01-24 15:38 GMT+01:00 Gilad Ben-Yossef :
> -   v->tfm = crypto_alloc_shash(v->alg_name, 0, 0);
> +   v->tfm = crypto_alloc_ahash(v->alg_name, 0, CRYPTO_ALG_ASYNC);

I believe you should pass zero as the mask here. When flags == 0 and
mask == CRYPTO_ALG_ASYNC, you are basically saying "I want only algs
that have flags & CRYPTO_ALG_ASYNC == 0", which means you should only
get ahash tfms that are always synchronous (see [1]). However, since
you set a non-NULL callback in verity_hash_init, I don't think this
was your intention. By setting the mask to zero, you should be able to
get also an actual async tfm.

Thanks,
Ondrej

[1] https://lkml.org/lkml/2016/12/13/904
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt

2017-01-19 Thread Ondrej Mosnáček
2017-01-18 5:48 GMT+01:00 Herbert Xu :
> I'm open to other proposals.  The basic requirement is to be able to
> process multiple blocks as one entity at the driver level, potentially
> generating the IVs there too.
>
> It's essentially the equivalent to full IPsec offload.

Hm, I just looked at what the IPsec IV generation is actually doing
and it seems to me that it's basically a crypto template that just
somehow transforms the IV before it is passed to the child cipher... I
thought for a while that you were implying that there already is some
facility in the crypto API that allows submitting multiple messages +
some initial sequence number that is auto-incremented and IVs are
generated from the numbers. However, I could not find anything like
that in the code, so now I think what you meant was just that I should
somehow pull the actual IV generators into the crypto layer so that
the IVs can be generated inside the hardware.

If all you had in mind is just an equivalent of the current IPsec IV
generation (as I understood it), then my bulk request scheme can in
fact support it (you'd just pass sector numbers as the IVs). Of
course, it would require additional changes over my patchset,
specifically the creation of crypto templates for the dm-crypt IV
modes, so they can be implemented by drivers. However, I wanted to
avoid this until the key management in dm-crypt is simplified...

If we also want to let the drivers process an offset+count chunk of
sectors while auto-incrementing the sector number, then something like
Binoy's approach would indeed be necessary, where the IV generators
would be just regular skciphers, taking the initial sector number as
the IV (although a disadvantage would be hard-coded sector/message
size). Note, though, that the generic implementation of such transform
could still use bulk requests on the underlying cipher so that
encryption/decryption is performed efficiently even if there are no
optimized/HW drivers for the specific IV generator templates.

I will now try to focus on the key management simplification and when
it is accepted/rejected we can discuss further about the best
approach.

Cheers,
Ondrej

>
> Thanks,
> --
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt

2017-01-17 Thread Ondrej Mosnáček
2017-01-13 15:29 GMT+01:00 Herbert Xu :
> What if the driver had hardware support for generating these IVs?
> With your scheme this cannot be supported at all.

That's true... I'm starting to think that this isn't really a good
idea. I was mainly trying to keep the door open for the random IV
support and also to keep the multi-key stuff (which was really only
intended for loop-AES partition support) out of the crypto API, but
both of these can be probably solved in a better way...

> Getting the IVs back is not actually that hard.  We could simply
> change the algorithm definition for the IV generator so that
> the IVs are embedded in the plaintext and ciphertext.  For
> example, you could declare it so that the for n sectors the
> first n*ivsize bytes would be the IV, and the actual plaintext
> or ciphertext would follow.
>
> With such a definition you could either generate the IVs in dm-crypt
> or have them generated in the IV generator.

That seems kind of hacky to me... but if that's what you prefer, then so be it.

Cheers,
Ondrej

>
> Cheers,
> --
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFC PATCH 6/6] dm-crypt: Add bulk crypto processing support

2017-01-17 Thread Ondrej Mosnáček
Hi Binoy,

2017-01-16 9:37 GMT+01:00 Binoy Jayan :
> The initial goal of our proposal was to process the encryption requests
> with the maximum possible block sizes on hardware which has automated IV
> generation capabilities. But when it is done in software, and the bulk
> requests are processed sequentially, one block at a time, the memory
> footprint could be reduced even if the bulk request exceeds a page. While
> your patch looks good, there are a couple of drawbacks, one of which is
> that the maximum size of a bulk request is a page. This could limit the
> capability of the crypto hardware. If the whole bio is processed at once,
> which is what Qualcomm's version of dm-req-crypt does, it achieves even
> better performance.

I see... well, I added the limit only so that the async fallback
implementation can allocate multiple requests, so they can be
processed in parallel, as they would be in the current dm-crypt code.
I'm not really sure if that brings any benefit, but I guess if some HW
accelerator has multiple engines, then this allows distributing the
work among them. (I wonder how switching to the crypto API's IV
generation will affect the situation for drivers that can process
requests in parallel, but do not support the IV generators...)

I could remove the limit and switch the fallback to sequential
processing (or maybe even allocate the requests from a mempool, the
way dm-crypt does it now...), but after Herbert's feedback I'm
probably going to scrap this patchset anyway...

>> Note that if the 'keycount' parameter of the cipher specification is set
>> to a value other than 1, dm-crypt still sends only one sector in each
>> request, since in such a case the neighboring sectors are encrypted with
>> different keys.
>
> This could be avoided if the key management is done at the crypto layer.

Yes, but remember that the only reasonable use-case for using keycount
!= 1 is mounting loop-AES partitions (which is kind of a legacy
format, so there is not much point in making HW drivers for it). It is
an unfortunate consequence of Milan's decision to make keycount an
independent part of the cipher specification (instead of making it
specific for the LMK mode), that all the other IV modes are now
'polluted' with the requirement to support it.

I discussed with Milan the possibility of deprecating the keycount
parameter (i.e. allowing only value of 64 for LMK and 1 for all the
other IV modes) and then converting the IV modes to skciphers (or IV
generators, or some combination of both). This would significantly
simplify the key management and allow for better optimization
strategies. However, I don't know if such change would be accepted by
device-mapper maintainers, since it may break someone's unusual
dm-crypt configuration...

Cheers,
Ondrej


Re: [RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt

2017-01-13 Thread Ondrej Mosnáček
2017-01-13 11:41 GMT+01:00 Herbert Xu :
> On Thu, Jan 12, 2017 at 01:59:52PM +0100, Ondrej Mosnacek wrote:
>> the goal of this patchset is to allow those skcipher API users that need to
>> process batches of small messages (especially dm-crypt) to do so efficiently.
>
> Please explain why this can't be done with the existing framework
> using IV generators similar to the ones used for IPsec.

As I already mentioned in another thread, there are basically two reasons:

1) Milan would like to add authenticated encryption support to
dm-crypt (see [1]) and as part of this change, a new random IV mode
would be introduced. This mode generates a random IV for each sector
write, includes it in the authenticated data and stores it in the
sector's metadata (in a separate part of the disk). In this case
dm-crypt will need to have control over the IV generation (or at least
be able to somehow retrieve it after the crypto operation... but
passing RNG responsibility to drivers doesn't seem to be a good idea
anyway).

2) With this API, drivers wouldn't have to provide implementations for
specific IV generation modes, and could just implement bulk requests for the
common modes/algorithms (XTS, CBC, ...) while still getting the
performance benefit.

Regards,
Ondrej

[1] https://www.redhat.com/archives/dm-devel/2017-January/msg00028.html

>
> Thanks,
> --
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFC PATCH 5/6] crypto: aesni-intel - Add bulk request support

2017-01-13 Thread Ondrej Mosnáček
Hi Eric,

2017-01-13 4:19 GMT+01:00 Eric Biggers :
> To what extent does the performance benefit of this patchset result from just
> the reduced numbers of calls to kernel_fpu_begin() and kernel_fpu_end()?
>
> If it's most of the benefit, would it make any sense to optimize
> kernel_fpu_begin() and kernel_fpu_end() instead?
>
> And if there are other examples besides kernel_fpu_begin/kernel_fpu_end where
> the bulk API would provide a significant performance boost, can you mention
> them?

In the case of AES-NI ciphers, this is the only benefit. However, this
change is not intended solely (or primarily) for AES-NI ciphers, but
also for other drivers that have a high per-request overhead.

This patchset is in fact a reaction to Binoy Jayan's efforts (see
[1]). The problem with small requests to HW crypto drivers comes up
for example in Qualcomm's Android [2], where they actually hacked
together their own version of dm-crypt (called 'dm-req-crypt'), which
in turn used a driver-specific crypto mode, which does the IV
generation on its own, and thereby is able to process several sectors
at once. The goal is to extend the crypto API so that vendors don't
have to roll out their own workarounds to have efficient disk
encryption.

> Interestingly, the arm64 equivalent to kernel_fpu_begin()
> (kernel_neon_begin_partial() in arch/arm64/kernel/fpsimd.c) appears to have an
> optimization where the SIMD registers aren't saved if they were already saved.
> I wonder why something similar isn't done on x86.

AFAIK, not much can be done about the kernel_fpu_* functions; see e.g. [3].

Regards,
Ondrej

[1] https://lkml.org/lkml/2016/12/20/111
[2] https://nelenkov.blogspot.com/2015/05/hardware-accelerated-disk-encryption-in.html
[3] https://lkml.org/lkml/2016/12/21/354

>
> Eric


Re: [RFC PATCH v2] crypto: Add IV generation algorithms

2017-01-11 Thread Ondrej Mosnáček
Hi Binoy,

2016-12-13 9:49 GMT+01:00 Binoy Jayan :
> Currently, the iv generation algorithms are implemented in dm-crypt.c.
> The goal is to move these algorithms from the dm layer to the kernel
> crypto layer by implementing them as template ciphers so they can be
> implemented in hardware for performance. As part of this patchset, the
> iv-generation code is moved from the dm layer to the crypto layer and
> adapt the dm-layer to send a whole 'bio' (as defined in the block layer)
> at a time. Each bio contains the in memory representation of physically
> contiguous disk blocks. The dm layer sets up a chained scatterlist of
> these blocks split into physically contiguous segments in memory so that
> DMA can be performed. The iv generation algorithms implemented in geniv.c
> include plain, plain64, essiv, benbi, null, lmk and tcw.

I like what you are trying to achieve, however I don't think the
solution you are heading towards (passing sector number to a special
crypto template) would be the best approach here. Milan is currently
trying to add authenticated encryption support to dm-crypt (see [1])
and as part of this change, a new random IV mode would be introduced.
This mode generates a random IV for each sector write, includes it in
the authenticated data and stores it in the sector's metadata (in a
separate part of the disk). In this case dm-crypt will need to have
control over the IV generation (or at least be able to somehow
retrieve it after the crypto operation).

That said, I believe a different approach would be preferable here. I
would suggest, instead of moving the IV generation to the crypto
layer, to add a new type of request to skcipher API (let's call it
'skcipher_bulk_request'), which could be used to submit several
messages at once (together in a single sg list), each with their own
IV, to a skcipher. This would allow drivers to optimize handling of
such requests (e.g. the SIMD ciphers could call kernel_fpu_begin/end
just once for the whole request). It could be done in such a way, that
implementing this type of requests would be optional and a fallback
implementation, which would just split the request into regular
skcipher_requests, would be automatically set for the ciphers that do
not set it themselves. That way this would require no changes to
crypto drivers in the beginning and optimizations could be added
incrementally.

The advantage of this approach to handling such "bulk" requests is
that crypto drivers could just optimize regular algorithms (xts(aes),
cbc(aes), etc.) and wouldn't need to mess with dm-crypt-specific IV
generation. This also means that other users that could potentially
benefit from bulking requests (perhaps network stack?) could use the
same functionality.

I have been playing with this idea for some time now and I should have
an RFC patchset ready soon...

Binoy, Herbert, what do you think about such approach?

[1] https://www.redhat.com/archives/dm-devel/2017-January/msg00028.html

> When using multiple keys with the original dm-crypt, the key selection is
> made based on the sector number as:
>
> key_index = sector & (key_count - 1)
>
> This restricts the usage of the same key for encrypting/decrypting a
> single bio. One way to solve this is to move the key management code from
> dm-crypt to the crypto layer. But this seems tricky when using template
> ciphers because, when multiple ciphers are instantiated from the dm layer,
> each cipher instance is set with a unique subkey (part of the bigger
> master key) and these instances themselves do not have access to each
> other's instances or contexts. This way, a single instance cannot
> encrypt/decrypt a whole bio.
> This has to be fixed.

Please note that the "keycount" parameter was added to dm-crypt solely
for the purpose of implementing the loop-AES partition format. In
general, the security benefit gained by using keycount > 1 is
debatable, so it does not really make sense to use it for anything
else than accessing legacy loopAES partitions. Since Milan decided to
add it as a generic parameter, instead of hard-coding the
functionality for the LMK mode, it can be technically used also in
other combinations, but IMHO it is perfectly reasonable to just give
up on optimizing the cases when keycount > 1. I believe the loop-AES
partition support is just not that important :)

Thanks,
Ondrej


[PATCH] crypto: gcm - Fix IV buffer size in crypto_gcm_setkey

2016-09-16 Thread Ondrej Mosnáček
The cipher block size for GCM is 16 bytes, and thus the CTR transform
used in crypto_gcm_setkey() will also expect a 16-byte IV. However,
the code currently reserves only 8 bytes for the IV, causing
an out-of-bounds access in the CTR transform. This patch fixes
the issue by setting the size of the IV buffer to 16 bytes.

Fixes: 84c911523020 ("[CRYPTO] gcm: Add support for async ciphers")
Signed-off-by: Ondrej Mosnacek 
---
I randomly noticed this while going over gcm.c for an unrelated
reason. It seems the wrong buffer size never caused any noticeable
problems (it's been there since 2007), but it should be corrected
nonetheless...

 crypto/gcm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/gcm.c b/crypto/gcm.c
index 70a892e8..f624ac9 100644
--- a/crypto/gcm.c
+++ b/crypto/gcm.c
@@ -117,7 +117,7 @@ static int crypto_gcm_setkey(struct crypto_aead *aead, const u8 *key,
struct crypto_skcipher *ctr = ctx->ctr;
struct {
be128 hash;
-   u8 iv[8];
+   u8 iv[16];

struct crypto_gcm_setkey_result result;

-- 
2.7.4


Re: AEAD: Having separate underlying cipher handle for each request

2016-07-06 Thread Ondrej Mosnáček
2016-07-06 8:31 GMT+02:00, Herbert Xu :
> Well you're pretty much screwed as far as performance is concerned.
> So just postpone all processing to process context and allocate a new
> tfm for each request.

Yeah, I guess that's the only way then...

Thanks,
Ondrej

>
> Cheers,
> --
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>


Re: AEAD: Having separate underlying cipher handle for each request

2016-07-06 Thread Ondrej Mosnáček
Hi Stephan,

2016-07-05 18:11 GMT+02:00, Stephan Mueller <smuel...@chronox.de>:
> On Tuesday, 5 July 2016, 13:44:05, Ondrej Mosnáček wrote:
>
> Hi Ondrej,
>
>> Hi,
>>
>> I'm trying to experimentally implement the GCM-SIV AEAD algorithm from
>> [1] for the Linux crypto API and I've run into a problem...
>>
>> Basically, the encryption/decryption process starts by deriving a
>> so-called "record-encryption key" from the nonce (by encrypting it
>> using another key) and this key is then used to encrypt the plaintext
>> in CTR mode and to encrypt the final authentication tag (otherwise it
>> works similarly to GCM).
>
> I have not yet looked into [1], but it sounds like a specific GCM case,
> just like RFC4106 formatting.
>
> Did you consider the structure discussion in [4] and add a specific
> handler like the rfc4106() handler on top of GCM?
>
> [4] https://www.kernel.org/doc/htmldocs/crypto-API/ch02s07.html

Yes, if it were possible, I would certainly do it that way :)
Unfortunately, this wouldn't work, since there are some significant
differences. For example, in GCM the initial counter block for CTR
encryption is derived directly from the nonce, while in GCM-SIV the
authentication tag is used as the ICB (with MSB set to 1).

Actually, it seems the authors tried to be clever and changed the bit
order to big endian (in gf128mul's terms it uses ble ordering instead
of lle), so even GHASH (here called POLYVAL) may need to be
reimplemented :/

Cheers,
Ondrej

>>
>> Since the API is asynchronous and multiple requests can be executed in
>> parallel over a single cipher handle (according to [2]), I need to
>> have a separate underlying cipher handle for each AEAD request.
>>
>> Now this is a problem, because aead_request has no init/destroy
>> mechanism where I could allocate/free the cipher handle, which means I
>> would have to do this inside the encrypt/decrypt function. AFAIK,
>> allocating with GFP_KERNEL inside encrypt/decrypt functions is
>> problematic, as they may be called from an atomic context.
>>
>> Besides, it seems that also the crypto_*_setkey functions are not
>> guaranteed to be atomic [3], and I will need to call such function
>> either way... OTOH, the CTR mode/AES driver should not really need to
>> allocate any memory there, so this may be tolerable...
>>
>> Does anyone have any ideas how to deal with this?
>>
>> BTW, for justification of deriving the key from the nonce see section
>> 9 of [1]. I don't really like the design decision, but there seems to
>> be no better way to achieve the same property...
>>
>> Thanks,
>> Ondrej Mosnáček
>>
>> [1] https://tools.ietf.org/html/draft-irtf-cfrg-gcmsiv-01
>> [2] https://www.kernel.org/doc/htmldocs/crypto-API/ch05s03.html
>> [3] https://www.spinics.net/lists/linux-crypto/msg17733.html
>
>
> Ciao
> Stephan
>


AEAD: Having separate underlying cipher handle for each request

2016-07-05 Thread Ondrej Mosnáček
Hi,

I'm trying to experimentally implement the GCM-SIV AEAD algorithm from
[1] for the Linux crypto API and I've run into a problem...

Basically, the encryption/decryption process starts by deriving a
so-called "record-encryption key" from the nonce (by encrypting it
using another key) and this key is then used to encrypt the plaintext
in CTR mode and to encrypt the final authentication tag (otherwise it
works similarly to GCM).

Since the API is asynchronous and multiple requests can be executed in
parallel over a single cipher handle (according to [2]), I need to
have a separate underlying cipher handle for each AEAD request.

Now this is a problem, because aead_request has no init/destroy
mechanism where I could allocate/free the cipher handle, which means I
would have to do this inside the encrypt/decrypt function. AFAIK,
allocating with GFP_KERNEL inside encrypt/decrypt functions is
problematic, as they may be called from an atomic context.

Besides, it seems that also the crypto_*_setkey functions are not
guaranteed to be atomic [3], and I will need to call such function
either way... OTOH, the CTR mode/AES driver should not really need to
allocate any memory there, so this may be tolerable...

Does anyone have any ideas how to deal with this?

BTW, for justification of deriving the key from the nonce see section
9 of [1]. I don't really like the design decision, but there seems to
be no better way to achieve the same property...

Thanks,
Ondrej Mosnáček

[1] https://tools.ietf.org/html/draft-irtf-cfrg-gcmsiv-01
[2] https://www.kernel.org/doc/htmldocs/crypto-API/ch05s03.html
[3] https://www.spinics.net/lists/linux-crypto/msg17733.html