[RFC PATCH cryptodev] crypto: sec_send_request() can be static

2018-08-03 Thread kbuild test robot


Fixes: 915e4e8413da ("crypto: hisilicon - SEC security accelerator driver")
Signed-off-by: kbuild test robot 
---
 sec_algs.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/hisilicon/sec/sec_algs.c b/drivers/crypto/hisilicon/sec/sec_algs.c
index d69d3ce..f7d6d69 100644
--- a/drivers/crypto/hisilicon/sec/sec_algs.c
+++ b/drivers/crypto/hisilicon/sec/sec_algs.c
@@ -393,7 +393,7 @@ static void sec_alg_free_el(struct sec_request_el *el,
 }
 
 /* queuelock must be held */
-int sec_send_request(struct sec_request *sec_req, struct sec_queue *queue)
+static int sec_send_request(struct sec_request *sec_req, struct sec_queue *queue)
 {
struct sec_request_el *el, *temp;
int ret = 0;


[cryptodev:master 99/119] drivers/crypto/hisilicon/sec/sec_algs.c:396:5: sparse: symbol 'sec_send_request' was not declared. Should it be static?

2018-08-03 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
head:   a94bfd5f50ddf114594f5c7f99686f1cfa2b7a76
commit: 915e4e8413dacc086efcef4de04fdfdca57e8b1c [99/119] crypto: hisilicon - SEC security accelerator driver
reproduce:
# apt-get install sparse
git checkout 915e4e8413dacc086efcef4de04fdfdca57e8b1c
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> drivers/crypto/hisilicon/sec/sec_algs.c:396:5: sparse: symbol 'sec_send_request' was not declared. Should it be static?
   include/linux/slab.h:631:13: sparse: undefined identifier '__builtin_mul_overflow'
   include/linux/slab.h:631:13: sparse: not a function
   include/linux/slab.h:631:13: sparse: not a function
   include/linux/slab.h:631:13: sparse: call with no type!

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: [PATCH v2 0/3] crypto/arm64: aes-ce-gcm - switch to 2-way aggregation

2018-08-03 Thread Ard Biesheuvel
On 3 August 2018 at 17:47, Herbert Xu  wrote:
> On Mon, Jul 30, 2018 at 11:06:39PM +0200, Ard Biesheuvel wrote:
>> Update the combined AES-GCM AEAD implementation to process two blocks
>> at a time, allowing us to switch to a faster version of the GHASH
>> implementation.
>>
>> Note that this does not update the core GHASH transform, only the
>> combined AES-GCM AEAD mode. GHASH is mostly used with AES anyway, and
>> the ARMv8 architecture mandates support for AES instructions if
>> 64-bit polynomial multiplication instructions are implemented. This
>> means that most users of the pmull.p64 based GHASH routines are better
>> off using the combined AES-GCM code anyway. Users of the pmull.p8 based
>> GHASH implementation are unlikely to benefit substantially from aggregation,
>> given that the multiplication phase is much more dominant in this case
>> (and it is only the reduction phase that is amortized over multiple
>> blocks)
>>
>> Performance numbers for Cortex-A53 can be found after patches #2 and #3.
>>
>> Changes since v1:
>> - rebase to take the changes in patch 'crypto: arm64 - revert NEON yield for
>>   fast AEAD implementations' which I sent out on July 29th
>> - add a patch to reduce the number of invocations of kernel_neon_begin()
>>   and kernel_neon_end() on the common path
>>
>> Ard Biesheuvel (3):
>>   crypto/arm64: aes-ce-gcm - operate on two input blocks at a time
>>   crypto/arm64: aes-ce-gcm - implement 2-way aggregation
>>   crypto: arm64/aes-ce-gcm - don't reload key schedule if avoidable
>>
>>  arch/arm64/crypto/ghash-ce-core.S | 136 +--
>>  arch/arm64/crypto/ghash-ce-glue.c | 176 
>>  2 files changed, 198 insertions(+), 114 deletions(-)
>
> All applied.  Thanks.

Thanks Herbert.


Re: [PATCH] crypto: arm64 - revert NEON yield for fast AEAD implementations

2018-08-03 Thread Herbert Xu
On Sun, Jul 29, 2018 at 04:52:30PM +0200, Ard Biesheuvel wrote:
> As it turns out, checking the TIF_NEED_RESCHED flag after each
> iteration results in a significant performance regression (~10%)
> when running fast algorithms (i.e., ones that use special instructions
> and operate in the < 4 cycles per byte range) on in-order cores with
> comparatively slow memory accesses such as the Cortex-A53.
> 
> Given the speed of these ciphers, and the fact that the page based
> nature of the AEAD scatterwalk API guarantees that the core NEON
> transform is never invoked with more than a single page's worth of
> input, we can estimate the worst case duration of any resulting
> scheduling blackout: on a 1 GHz Cortex-A53 running with 64k pages,
> processing a page's worth of input at 4 cycles per byte results in
> a delay of ~250 us, which is a reasonable upper bound.
> 
> So let's remove the yield checks from the fused AES-CCM and AES-GCM
> routines entirely.
> 
> This reverts commit 7b67ae4d5ce8e2f912377f5fbccb95811a92097f and
> partially reverts commit 7c50136a8aba8784f07fb66a950cc61a7f3d2ee3.
> 
> Fixes: 7c50136a8aba ("crypto: arm64/aes-ghash - yield NEON after every ...")
> Fixes: 7b67ae4d5ce8 ("crypto: arm64/aes-ccm - yield NEON after every ...")
> Signed-off-by: Ard Biesheuvel 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
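
The ~250 us upper bound quoted in the commit message above is straightforward arithmetic; a small sanity-check sketch (hypothetical helper, not kernel code):

```c
#include <assert.h>

/* Back-of-the-envelope check of the scheduling-blackout bound quoted
 * above: one page of input processed at a given cost in cycles/byte,
 * on a core running at cpu_mhz MHz (1 MHz = 1 cycle per microsecond). */
static unsigned long blackout_us(unsigned long page_bytes,
                                 unsigned long cycles_per_byte,
                                 unsigned long cpu_mhz)
{
        return page_bytes * cycles_per_byte / cpu_mhz;
}
```

A 64 KiB page at 4 cycles per byte on a 1 GHz (1000 MHz) core is 262144 cycles, i.e. ~262 us, consistent with the ~250 us estimate above.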


Re: [PATCH v2 0/3] crypto/arm64: aes-ce-gcm - switch to 2-way aggregation

2018-08-03 Thread Herbert Xu
On Mon, Jul 30, 2018 at 11:06:39PM +0200, Ard Biesheuvel wrote:
> Update the combined AES-GCM AEAD implementation to process two blocks
> at a time, allowing us to switch to a faster version of the GHASH
> implementation.
> 
> Note that this does not update the core GHASH transform, only the
> combined AES-GCM AEAD mode. GHASH is mostly used with AES anyway, and
> the ARMv8 architecture mandates support for AES instructions if
> 64-bit polynomial multiplication instructions are implemented. This
> means that most users of the pmull.p64 based GHASH routines are better
> off using the combined AES-GCM code anyway. Users of the pmull.p8 based
> GHASH implementation are unlikely to benefit substantially from aggregation,
> given that the multiplication phase is much more dominant in this case
> (and it is only the reduction phase that is amortized over multiple
> blocks)
> 
> Performance numbers for Cortex-A53 can be found after patches #2 and #3.
> 
> Changes since v1:
> - rebase to take the changes in patch 'crypto: arm64 - revert NEON yield for
>   fast AEAD implementations' which I sent out on July 29th
> - add a patch to reduce the number of invocations of kernel_neon_begin()
>   and kernel_neon_end() on the common path
> 
> Ard Biesheuvel (3):
>   crypto/arm64: aes-ce-gcm - operate on two input blocks at a time
>   crypto/arm64: aes-ce-gcm - implement 2-way aggregation
>   crypto: arm64/aes-ce-gcm - don't reload key schedule if avoidable
> 
>  arch/arm64/crypto/ghash-ce-core.S | 136 +--
>  arch/arm64/crypto/ghash-ce-glue.c | 176 
>  2 files changed, 198 insertions(+), 114 deletions(-)

All applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v2 1/2] crypto: dh - fix calculating encoded key size

2018-08-03 Thread Herbert Xu
On Fri, Jul 27, 2018 at 03:36:10PM -0700, Eric Biggers wrote:
> From: Eric Biggers 
> 
> It was forgotten to increase DH_KPP_SECRET_MIN_SIZE to include 'q_size',
> causing an out-of-bounds write of 4 bytes in crypto_dh_encode_key(), and
> an out-of-bounds read of 4 bytes in crypto_dh_decode_key().  Fix it, and
> fix the lengths of the test vectors to match this.
> 
> Reported-by: syzbot+6d38d558c25b53b8f...@syzkaller.appspotmail.com
> Fixes: e3fe0ae12962 ("crypto: dh - add public key verification test")
> Signed-off-by: Eric Biggers 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
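
The size-accounting bug described above can be sketched as follows (a simplified model: the encoded blob carries four length fields plus payloads, and the real minimum-size constant also includes a `struct kpp_secret` header, omitted here):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of the bug: the encoded DH blob has four length
 * fields (key, p, q, g) plus their payloads, but the old minimum-size
 * constant only budgeted for three of them, so buffers sized from it
 * were sizeof(int) (4 bytes) too small. */
struct dh_params {
        size_t key_size, p_size, q_size, g_size;
};

static size_t encoded_len_buggy(const struct dh_params *p)
{
        return 3 * sizeof(int) + p->key_size + p->p_size + p->q_size + p->g_size;
}

static size_t encoded_len_fixed(const struct dh_params *p)
{
        return 4 * sizeof(int) + p->key_size + p->p_size + p->q_size + p->g_size;
}
```

The difference is exactly sizeof(int), matching the reported 4-byte out-of-bounds write and read.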


Re: [PATCH] crypto: ccp: Check for NULL PSP pointer at module unload

2018-08-03 Thread Herbert Xu
Tom Lendacky  wrote:
> Should the PSP initialization fail, the PSP data structure will be
> freed and the value contained in the sp_device struct set to NULL.
> At module unload, psp_dev_destroy() does not check if the pointer
> value is NULL and will end up dereferencing a NULL pointer.
> 
> Add a pointer check of the psp_data field in the sp_device struct
> in psp_dev_destroy() and return immediately if it is NULL.
> 
> Cc:  # 4.16.x-
> Fixes: 2a6170dfe755 ("crypto: ccp: Add Platform Security Processor (PSP) 
> device support")
> Signed-off-by: Tom Lendacky 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
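
The guard described above amounts to an early-return NULL check in the destroy path; a minimal sketch with stand-in types (not the real ccp driver structures):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types modeling the patch: if PSP init failed, sp->psp_data
 * is NULL, and the destroy path must bail out instead of dereferencing. */
struct psp_device { int torn_down; };
struct sp_device { struct psp_device *psp_data; };

static void psp_dev_destroy(struct sp_device *sp)
{
        struct psp_device *psp = sp->psp_data;

        if (!psp)               /* init failed: nothing to tear down */
                return;

        psp->torn_down = 1;     /* stands in for the real teardown work */
}
```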


Re: [PATCH v2 2/2] crypto: dh - make crypto_dh_encode_key() more robust

2018-08-03 Thread Herbert Xu
On Fri, Jul 27, 2018 at 03:36:11PM -0700, Eric Biggers wrote:
> From: Eric Biggers 
> 
> Make it return -EINVAL if crypto_dh_key_len() is incorrect rather than
> overflowing the buffer.
> 
> Signed-off-by: Eric Biggers 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: arm/chacha20 - always use vrev for 16-bit rotates

2018-08-03 Thread Herbert Xu
On Tue, Jul 24, 2018 at 06:29:07PM -0700, Eric Biggers wrote:
> From: Eric Biggers 
> 
> The 4-way ChaCha20 NEON code implements 16-bit rotates with vrev32.16,
> but the one-way code (used on remainder blocks) implements it with
> vshl + vsri, which is slower.  Switch the one-way code to vrev32.16 too.
> 
> Signed-off-by: Eric Biggers 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
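
The reason vrev32.16 can replace the shift pair: rotating a 32-bit word by exactly 16 bits is the same as swapping its two 16-bit halves, which is what NEON's vrev32.16 does in one instruction. A plain-C sketch of the equivalence:

```c
#include <assert.h>
#include <stdint.h>

/* Generic rotate-left (valid for 0 < n < 32). */
static uint32_t rol32(uint32_t x, unsigned int n)
{
        return (x << n) | (x >> (32 - n));
}

/* Swap the two 16-bit halves of a 32-bit word, i.e. what NEON's
 * vrev32.16 does to each 32-bit lane. */
static uint32_t swap16_halves(uint32_t x)
{
        return (x << 16) | (x >> 16);
}
```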


Re: [PATCH 0/3] crypto: fix crash in scatterwalk_pagedone()

2018-08-03 Thread Herbert Xu
On Mon, Jul 23, 2018 at 10:54:55AM -0700, Eric Biggers wrote:
> From: Eric Biggers 
> 
> This series fixes the bug reported by Liu Chao (found using syzkaller)
> where a crash occurs in scatterwalk_pagedone() on architectures such as
> arm and arm64 that implement flush_dcache_page(), due to an invalid page
> pointer when walk->offset == 0.  This series attempts to address the
> underlying problem which is that scatterwalk_pagedone() shouldn't have
> been called at all in that case.
> 
> Eric Biggers (3):
>   crypto: skcipher - fix crash flushing dcache in error path
>   crypto: blkcipher - fix crash flushing dcache in error path
>   crypto: ablkcipher - fix crash flushing dcache in error path
> 
>  crypto/ablkcipher.c | 57 +
>  crypto/blkcipher.c  | 54 +-
>  crypto/skcipher.c   | 53 -
>  3 files changed, 79 insertions(+), 85 deletions(-)

All applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: skcipher - remove unnecessary setting of walk->nbytes

2018-08-03 Thread Herbert Xu
On Mon, Jul 23, 2018 at 10:21:29AM -0700, Eric Biggers wrote:
> From: Eric Biggers 
> 
> Setting 'walk->nbytes = walk->total' in skcipher_walk_first() doesn't
> make sense because actually walk->nbytes needs to be set to the length
> of the first step in the walk, which may be less than walk->total.  This
> is done by skcipher_walk_next() which is called immediately afterwards.
> Also walk->nbytes was already set to 0 in skcipher_walk_skcipher(),
> which is a better default value in case it's forgotten to be set later.
> 
> Therefore, remove the unnecessary assignment to walk->nbytes.
> 
> Signed-off-by: Eric Biggers 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: scatterwalk - remove scatterwalk_samebuf()

2018-08-03 Thread Herbert Xu
On Mon, Jul 23, 2018 at 10:04:28AM -0700, Eric Biggers wrote:
> From: Eric Biggers 
> 
> scatterwalk_samebuf() is never used.  Remove it.
> 
> Signed-off-by: Eric Biggers 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: skcipher - fix aligning block size in skcipher_copy_iv()

2018-08-03 Thread Herbert Xu
On Mon, Jul 23, 2018 at 09:57:50AM -0700, Eric Biggers wrote:
> From: Eric Biggers 
> 
> The ALIGN() macro needs to be passed the alignment, not the alignmask
> (which is the alignment minus 1).
> 
> Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface")
> Cc:  # v4.10+
> Signed-off-by: Eric Biggers 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
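
The alignment-versus-alignmask confusion fixed above is easy to demonstrate; a sketch with illustrative function names (not the real skcipher code):

```c
#include <assert.h>

/* The kernel's ALIGN(x, a) rounds x up to the next multiple of a,
 * where a must be a power of two.  The bug: the alignmask (a - 1)
 * was passed where the alignment a was expected. */
#define ALIGN(x, a)     (((x) + (a) - 1) & ~((a) - 1))

static unsigned long iv_offset_fixed(unsigned long bs, unsigned long alignmask)
{
        return ALIGN(bs, alignmask + 1);        /* alignment = mask + 1 */
}

static unsigned long iv_offset_buggy(unsigned long bs, unsigned long alignmask)
{
        return ALIGN(bs, alignmask);            /* mask passed as alignment */
}
```

With a 16-byte alignment (mask 15), the buggy form fails to round small sizes up to an aligned offset at all.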


Re: [PATCH] crypto: scatterwalk - remove 'chain' argument from scatterwalk_crypto_chain()

2018-08-03 Thread Herbert Xu
On Mon, Jul 23, 2018 at 10:01:33AM -0700, Eric Biggers wrote:
> From: Eric Biggers 
> 
> All callers pass chain=0 to scatterwalk_crypto_chain().
> 
> Remove this unneeded parameter.
> 
> Signed-off-by: Eric Biggers 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: tcrypt - reschedule during speed tests

2018-08-03 Thread Herbert Xu
On Mon, Jul 23, 2018 at 05:18:48PM +0300, Horia Geantă wrote:
> Avoid RCU stalls in the case of non-preemptible kernel and lengthy
> speed tests by rescheduling when advancing from one block size
> to another.
> 
> Signed-off-by: Horia Geantă 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH V2 0/3] Hisilicon SEC crypto driver (hip06 / hip07)

2018-08-03 Thread Herbert Xu
On Mon, Jul 23, 2018 at 04:49:52PM +0100, Jonathan Cameron wrote:
> The driver provides in kernel support for the Hisilicon SEC accelerator
> found in the hip06 and hip07 SoCs.  There are 4 such units on the D05
> board for which an appropriate DT binding has been provided.  ACPI also
> works with an appropriate UEFI build.
> 
> The hardware does not update the IV in chaining or counting modes.
> This is done in the driver on completion of the cipher operation.
> 
> The driver supports AES, DES and 3DES block ciphers in a range of
> modes (others to follow). Hash and AEAD support to follow.
> 
> Sorry for the delay on this one, other priorities and all that...
> 
> Changes since V1.
> 1) DT binding fixes suggested by Rob Herring in patches 1 and 3.
> 2) Added XTS key check as suggested by Stephan Muller.
> 3) A trivial use after free found during testing of the above.
> 
> Changes since RFC.
> 1) Addition of backlog queuing as needed to support dm-crypt usecases.
> 2) iommu presence tests now done as Robin Murphy suggested.
> 3) Hardware limitation to 32MB requests worked around in the driver so it will
>now support very large requests (512*32MB).  Larger request handling
>than this would require a longer queue with the associated overheads and
>is considered unlikely to be necessary.
> 4) The specific handling related to the inline IV patch set from Stephan
>has been dropped for now.
> 5) Interrupt handler was previous more complex than necessary so has been
>reworked.
> 6) Use of the bounce buffer for small packets is dropped for now.  This is a
>performance optimization that made the code harder to review and can be
>reintroduced as necessary at a later date.
> 7) Restructuring of some code to simplify hash and AEAD (hash implemented
>but not ready for upstream at this time)
> 8) Various minor fixes and reworks of the code
>* several off by one errors in the cleanup paths
>* single template for enc and dec
>* drop dec_key as not used (enc_key was used in both cases)
>* drop dma pool for IVs as it breaks chaining.
>* lots of spinlocks changed to mutexes as not taken in atomic context.
>* nasty memory leak cleaned up.
> 
> Jonathan Cameron (3):
>   dt-bindings: Add bindings for Hisilicon SEC crypto accelerators.
>   crypto: hisilicon SEC security accelerator driver
>   arm64: dts: hisi: add SEC crypto accelerator nodes for hip07 SoC
> 
>  .../bindings/crypto/hisilicon,hip07-sec.txt|   67 +
>  arch/arm64/boot/dts/hisilicon/hip07.dtsi   |  284 +
>  drivers/crypto/Kconfig |2 +
>  drivers/crypto/Makefile|1 +
>  drivers/crypto/hisilicon/Kconfig   |   14 +
>  drivers/crypto/hisilicon/Makefile  |2 +
>  drivers/crypto/hisilicon/sec/Makefile  |3 +
>  drivers/crypto/hisilicon/sec/sec_algs.c| 1122 +
>  drivers/crypto/hisilicon/sec/sec_drv.c | 1323 
>  drivers/crypto/hisilicon/sec/sec_drv.h |  428 +++
>  10 files changed, 3246 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/crypto/hisilicon,hip07-sec.txt
>  create mode 100644 drivers/crypto/hisilicon/Kconfig
>  create mode 100644 drivers/crypto/hisilicon/Makefile
>  create mode 100644 drivers/crypto/hisilicon/sec/Makefile
>  create mode 100644 drivers/crypto/hisilicon/sec/sec_algs.c
>  create mode 100644 drivers/crypto/hisilicon/sec/sec_drv.c
>  create mode 100644 drivers/crypto/hisilicon/sec/sec_drv.h

All applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: CTR DRBG - in-place cipher operation

2018-08-03 Thread Herbert Xu
On Fri, Jul 20, 2018 at 07:42:01PM +0200, Stephan Müller wrote:
> The cipher implementations of the kernel crypto API favor in-place
> cipher operations. Thus, switch the CTR cipher operation in the DRBG to
> perform in-place operations. This is implemented by using the output
> buffer as input buffer and zeroizing it before the cipher operation to
> implement a CTR encryption of a NULL buffer.
> 
> The speed improvement is quite visible with the following comparison
> using the LRNG implementation.
> 
> Without the patch set:
> 
>   16 bytes|   12.267661 MB/s|61338304 bytes |  500213 ns
>   32 bytes|   23.603770 MB/s|   118018848 bytes |  500073 ns
>   64 bytes|   46.732262 MB/s|   233661312 bytes |  500241 ns
>  128 bytes|   90.038042 MB/s|   450190208 bytes |  500244 ns
>  256 bytes|  160.399616 MB/s|   801998080 bytes |  500393 ns
>  512 bytes|  259.878400 MB/s|  1299392000 bytes |  501675 ns
> 1024 bytes|  386.050662 MB/s|  1930253312 bytes |  501661 ns
> 2048 bytes|  493.641728 MB/s|  2468208640 bytes |  501598 ns
> 4096 bytes|  581.835981 MB/s|  2909179904 bytes |  503426 ns
> 
> With the patch set:
> 
>   16 bytes | 17.051142 MB/s | 85255712 bytes |  500854 ns
>   32 bytes | 32.695898 MB/s |163479488 bytes |  500544 ns
>   64 bytes | 64.490739 MB/s |322453696 bytes |  500954 ns
>  128 bytes |123.285043 MB/s |616425216 bytes |  500201 ns
>  256 bytes |233.434573 MB/s |   1167172864 bytes |  500573 ns
>  512 bytes |384.405197 MB/s |   1922025984 bytes |  500671 ns
> 1024 bytes |566.313370 MB/s |   2831566848 bytes |  501080 ns
> 2048 bytes |744.518042 MB/s |   3722590208 bytes |  500926 ns
> 4096 bytes |867.501670 MB/s |   4337508352 bytes |  502181 ns
> 
> Signed-off-by: Stephan Mueller 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
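
The trick described above relies on a property of every stream-cipher mode: ciphertext = plaintext XOR keystream, so encrypting a zeroized buffer in place leaves exactly the raw keystream in that buffer. A toy illustration (deliberately NOT AES-CTR; the keystream generator is an arbitrary stand-in):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Arbitrary PRF stand-in for a real block cipher run in counter mode. */
static uint8_t toy_keystream_byte(uint32_t ctr)
{
        return (uint8_t)((ctr * 2654435761u) >> 24);
}

/* Stream-cipher encryption in place: buf[i] ^= keystream[i].  If buf
 * was zeroized first, it ends up holding the keystream itself, which
 * is what the DRBG wants as output -- one buffer, no NULL input. */
static void toy_ctr_xor_inplace(uint8_t *buf, size_t len)
{
        for (size_t i = 0; i < len; i++)
                buf[i] ^= toy_keystream_byte((uint32_t)i);
}
```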


[PATCH v2] crypto: x86/aegis,morus - Fix and simplify CPUID checks

2018-08-03 Thread Ondrej Mosnacek
It turns out I had misunderstood how the x86_match_cpu() function works.
It evaluates a logical OR of the matching conditions, not logical AND.
This caused the CPU feature checks for AEGIS to pass even if only SSE2
(but not AES-NI) was supported (or vice versa), leading to potential
crashes if something tried to use the registered algs.

This patch switches the checks to a simpler method that is used e.g. in
the Camellia x86 code.

The patch also removes the MODULE_DEVICE_TABLE declarations which
actually seem to cause the modules to be auto-loaded at boot, which is
not desired. The crypto API on-demand module loading is sufficient.

Fixes: 1d373d4e8e15 ("crypto: x86 - Add optimized AEGIS implementations")
Fixes: 6ecc9d9ff91f ("crypto: x86 - Add optimized MORUS implementations")
Signed-off-by: Ondrej Mosnacek 
---
 arch/x86/crypto/aegis128-aesni-glue.c  | 12 
 arch/x86/crypto/aegis128l-aesni-glue.c | 12 
 arch/x86/crypto/aegis256-aesni-glue.c  | 12 
 arch/x86/crypto/morus1280-avx2-glue.c  | 10 +++---
 arch/x86/crypto/morus1280-sse2-glue.c  | 10 +++---
 arch/x86/crypto/morus640-sse2-glue.c   | 10 +++---
 6 files changed, 21 insertions(+), 45 deletions(-)

diff --git a/arch/x86/crypto/aegis128-aesni-glue.c b/arch/x86/crypto/aegis128-aesni-glue.c
index 5de7c0d46edf..acd11b3bf639 100644
--- a/arch/x86/crypto/aegis128-aesni-glue.c
+++ b/arch/x86/crypto/aegis128-aesni-glue.c
@@ -375,16 +375,12 @@ static struct aead_alg crypto_aegis128_aesni_alg[] = {
}
 };
 
-static const struct x86_cpu_id aesni_cpu_id[] = {
-   X86_FEATURE_MATCH(X86_FEATURE_AES),
-   X86_FEATURE_MATCH(X86_FEATURE_XMM2),
-   {}
-};
-MODULE_DEVICE_TABLE(x86cpu, aesni_cpu_id);
-
 static int __init crypto_aegis128_aesni_module_init(void)
 {
-   if (!x86_match_cpu(aesni_cpu_id))
+   if (!boot_cpu_has(X86_FEATURE_XMM2) ||
+   !boot_cpu_has(X86_FEATURE_AES) ||
+   !boot_cpu_has(X86_FEATURE_OSXSAVE) ||
+   !cpu_has_xfeatures(XFEATURE_MASK_SSE, NULL))
return -ENODEV;
 
return crypto_register_aeads(crypto_aegis128_aesni_alg,
diff --git a/arch/x86/crypto/aegis128l-aesni-glue.c b/arch/x86/crypto/aegis128l-aesni-glue.c
index 876e4866e633..2071c3d1ae07 100644
--- a/arch/x86/crypto/aegis128l-aesni-glue.c
+++ b/arch/x86/crypto/aegis128l-aesni-glue.c
@@ -375,16 +375,12 @@ static struct aead_alg crypto_aegis128l_aesni_alg[] = {
}
 };
 
-static const struct x86_cpu_id aesni_cpu_id[] = {
-   X86_FEATURE_MATCH(X86_FEATURE_AES),
-   X86_FEATURE_MATCH(X86_FEATURE_XMM2),
-   {}
-};
-MODULE_DEVICE_TABLE(x86cpu, aesni_cpu_id);
-
 static int __init crypto_aegis128l_aesni_module_init(void)
 {
-   if (!x86_match_cpu(aesni_cpu_id))
+   if (!boot_cpu_has(X86_FEATURE_XMM2) ||
+   !boot_cpu_has(X86_FEATURE_AES) ||
+   !boot_cpu_has(X86_FEATURE_OSXSAVE) ||
+   !cpu_has_xfeatures(XFEATURE_MASK_SSE, NULL))
return -ENODEV;
 
return crypto_register_aeads(crypto_aegis128l_aesni_alg,
diff --git a/arch/x86/crypto/aegis256-aesni-glue.c b/arch/x86/crypto/aegis256-aesni-glue.c
index 2b5dd3af8f4d..b5f2a8fd5a71 100644
--- a/arch/x86/crypto/aegis256-aesni-glue.c
+++ b/arch/x86/crypto/aegis256-aesni-glue.c
@@ -375,16 +375,12 @@ static struct aead_alg crypto_aegis256_aesni_alg[] = {
}
 };
 
-static const struct x86_cpu_id aesni_cpu_id[] = {
-   X86_FEATURE_MATCH(X86_FEATURE_AES),
-   X86_FEATURE_MATCH(X86_FEATURE_XMM2),
-   {}
-};
-MODULE_DEVICE_TABLE(x86cpu, aesni_cpu_id);
-
 static int __init crypto_aegis256_aesni_module_init(void)
 {
-   if (!x86_match_cpu(aesni_cpu_id))
+   if (!boot_cpu_has(X86_FEATURE_XMM2) ||
+   !boot_cpu_has(X86_FEATURE_AES) ||
+   !boot_cpu_has(X86_FEATURE_OSXSAVE) ||
+   !cpu_has_xfeatures(XFEATURE_MASK_SSE, NULL))
return -ENODEV;
 
return crypto_register_aeads(crypto_aegis256_aesni_alg,
diff --git a/arch/x86/crypto/morus1280-avx2-glue.c b/arch/x86/crypto/morus1280-avx2-glue.c
index f111f36d26dc..6634907d6ccd 100644
--- a/arch/x86/crypto/morus1280-avx2-glue.c
+++ b/arch/x86/crypto/morus1280-avx2-glue.c
@@ -37,15 +37,11 @@ asmlinkage void crypto_morus1280_avx2_final(void *state, void *tag_xor,
 
 MORUS1280_DECLARE_ALGS(avx2, "morus1280-avx2", 400);
 
-static const struct x86_cpu_id avx2_cpu_id[] = {
-X86_FEATURE_MATCH(X86_FEATURE_AVX2),
-{}
-};
-MODULE_DEVICE_TABLE(x86cpu, avx2_cpu_id);
-
 static int __init crypto_morus1280_avx2_module_init(void)
 {
-   if (!x86_match_cpu(avx2_cpu_id))
+   if (!boot_cpu_has(X86_FEATURE_AVX2) ||
+   !boot_cpu_has(X86_FEATURE_OSXSAVE) ||
+   !cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL))
return -ENODEV;
 
return crypto_register_aeads(crypto_morus1280_avx2_algs,
diff --git a/arch/x86/crypto/morus1280-sse2-glue.c b/arch/x86/crypto/morus1280-sse2-glue.c
index 839270aa713c..95cf857d2cbb 
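
The pitfall behind both patches in this thread can be modeled in a few lines: an x86_cpu_id table is matched entry by entry, i.e. as a logical OR over its entries, so requiring several features means checking each one individually. A simplified stand-in (hypothetical types, not the real x86 headers):

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the x86_match_cpu() pitfall: the table is a logical OR over
 * its entries, so listing AES and XMM2 as two entries accepts a CPU
 * with either feature.  Requiring both means testing each feature
 * individually, as boot_cpu_has() does in the v2 patch above. */
#define FEAT_AES   (1u << 0)
#define FEAT_XMM2  (1u << 1)

struct cpu_id { unsigned int feature; };        /* feature == 0 ends the table */

static bool match_cpu(const struct cpu_id *table, unsigned int cpu_feats)
{
        for (; table->feature; table++)
                if (cpu_feats & table->feature)
                        return true;            /* first hit wins: OR semantics */
        return false;
}

static bool has_all(unsigned int cpu_feats, unsigned int required)
{
        return (cpu_feats & required) == required;      /* AND semantics */
}
```

A CPU with only SSE2 passes the OR-style table match (the bug) but fails the per-feature AND check (the fix).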

Re: [PATCH] crypto: x86/aegis - Fix CPUID checks

2018-08-03 Thread Ondrej Mosnacek
On Thu, Aug 2, 2018 at 11:17 AM Ondrej Mosnacek  wrote:
> It turns out I had misunderstood how the x86_match_cpu() function works.
> It evaluates a logical OR of the matching conditions, not logical AND.
> This caused the CPU feature checks to pass even if only SSE2 (but not
> AES-NI) was supported (or vice versa), leading to potential crashes if
> something tried to use the registered algs.
>
> This patch fixes the checks to ensure that both required CPU features
> are supported. The MODULE_DEVICE_TABLE declaration is specified only for
> the AES-NI feature array, because I'm not sure what having multiple such
> declarations would cause and I believe it should suffice to match on the
> more important feature at the alias level and let the init routine do
> the full check.
>
> Signed-off-by: Ondrej Mosnacek 
> ---
> Hi Herbert,
>
> this patch fixes the CPU checks of the AEGIS AES-NI/SSE2 implementations
> that have been introduced in 4.18. Once reviewed, it should go to Linus'
> tree since it may cause crashes on some systems if the corresponding
> configs are enabled.
>
> @x86 folks, please take a look if I use the MODULE_DEVICE_TABLE macro
> correctly here (I'm not sure how to properly declare that the module
> needs two CPU features to be both supported; all other modules I saw had
> either only single match rule or required just one of the rules to
> match).

Never mind, I realized MODULE_DEVICE_TABLE isn't actually needed at
all here (most modules in arch/x86/crypto don't even use it) and IIUC
it will cause the modules to load automatically on boot (which isn't
desirable for these new algorithms).

I'll send a v2 of the patch that converts both AEGIS and MORUS checks
to what most other modules use (e.g. the Camellia x86 optimizations).

>
> Thanks,
>
> Ondrej Mosnacek
>
>  arch/x86/crypto/aegis128-aesni-glue.c  | 8 ++--
>  arch/x86/crypto/aegis128l-aesni-glue.c | 8 ++--
>  arch/x86/crypto/aegis256-aesni-glue.c  | 8 ++--
>  3 files changed, 18 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/crypto/aegis128-aesni-glue.c b/arch/x86/crypto/aegis128-aesni-glue.c
> index 5de7c0d46edf..6a5abed59593 100644
> --- a/arch/x86/crypto/aegis128-aesni-glue.c
> +++ b/arch/x86/crypto/aegis128-aesni-glue.c
> @@ -377,14 +377,18 @@ static struct aead_alg crypto_aegis128_aesni_alg[] = {
>
>  static const struct x86_cpu_id aesni_cpu_id[] = {
> X86_FEATURE_MATCH(X86_FEATURE_AES),
> -   X86_FEATURE_MATCH(X86_FEATURE_XMM2),
> {}
>  };
>  MODULE_DEVICE_TABLE(x86cpu, aesni_cpu_id);
>
> +static const struct x86_cpu_id sse2_cpu_id[] = {
> +   X86_FEATURE_MATCH(X86_FEATURE_XMM2),
> +   {}
> +};
> +
>  static int __init crypto_aegis128_aesni_module_init(void)
>  {
> -   if (!x86_match_cpu(aesni_cpu_id))
> +   if (!x86_match_cpu(aesni_cpu_id) || !x86_match_cpu(sse2_cpu_id))
> return -ENODEV;
>
> return crypto_register_aeads(crypto_aegis128_aesni_alg,
> diff --git a/arch/x86/crypto/aegis128l-aesni-glue.c b/arch/x86/crypto/aegis128l-aesni-glue.c
> index 876e4866e633..691c52701c2d 100644
> --- a/arch/x86/crypto/aegis128l-aesni-glue.c
> +++ b/arch/x86/crypto/aegis128l-aesni-glue.c
> @@ -377,14 +377,18 @@ static struct aead_alg crypto_aegis128l_aesni_alg[] = {
>
>  static const struct x86_cpu_id aesni_cpu_id[] = {
> X86_FEATURE_MATCH(X86_FEATURE_AES),
> -   X86_FEATURE_MATCH(X86_FEATURE_XMM2),
> {}
>  };
>  MODULE_DEVICE_TABLE(x86cpu, aesni_cpu_id);
>
> +static const struct x86_cpu_id sse2_cpu_id[] = {
> +   X86_FEATURE_MATCH(X86_FEATURE_XMM2),
> +   {}
> +};
> +
>  static int __init crypto_aegis128l_aesni_module_init(void)
>  {
> -   if (!x86_match_cpu(aesni_cpu_id))
> +   if (!x86_match_cpu(aesni_cpu_id) || !x86_match_cpu(sse2_cpu_id))
> return -ENODEV;
>
> return crypto_register_aeads(crypto_aegis128l_aesni_alg,
> diff --git a/arch/x86/crypto/aegis256-aesni-glue.c b/arch/x86/crypto/aegis256-aesni-glue.c
> index 2b5dd3af8f4d..481b5e4f4fd0 100644
> --- a/arch/x86/crypto/aegis256-aesni-glue.c
> +++ b/arch/x86/crypto/aegis256-aesni-glue.c
> @@ -377,14 +377,18 @@ static struct aead_alg crypto_aegis256_aesni_alg[] = {
>
>  static const struct x86_cpu_id aesni_cpu_id[] = {
> X86_FEATURE_MATCH(X86_FEATURE_AES),
> -   X86_FEATURE_MATCH(X86_FEATURE_XMM2),
> {}
>  };
>  MODULE_DEVICE_TABLE(x86cpu, aesni_cpu_id);
>
> +static const struct x86_cpu_id sse2_cpu_id[] = {
> +   X86_FEATURE_MATCH(X86_FEATURE_XMM2),
> +   {}
> +};
> +
>  static int __init crypto_aegis256_aesni_module_init(void)
>  {
> -   if (!x86_match_cpu(aesni_cpu_id))
> +   if (!x86_match_cpu(aesni_cpu_id) || !x86_match_cpu(sse2_cpu_id))
> return -ENODEV;
>
> return crypto_register_aeads(crypto_aegis256_aesni_alg,
> --
> 2.17.1
>

-- 
Ondrej Mosnacek 
Associate Software Engineer, Security Technologies
Red Hat, Inc.


Re: [PATCH] crypto: arm64 - revert NEON yield for fast AEAD implementations

2018-08-03 Thread Ard Biesheuvel
On 3 August 2018 at 10:17, Herbert Xu  wrote:
> On Fri, Aug 03, 2018 at 09:10:08AM +0200, Ard Biesheuvel wrote:
>> But I think it's too late now to take this into v4.18. Could you
>> please queue this (and my other two pending arm64/aes-gcm patches, if
>> possible) for v4.19 instead?
>
> OK I'll do that.
>

Thanks.

But, actually, those two pending patches are a 3-piece series now ...
(v2 of 'crypto/arm64: aes-ce-gcm - switch to 2-way aggregation')

Thanks,
Ard.


Re: [PATCH] crypto: arm64 - revert NEON yield for fast AEAD implementations

2018-08-03 Thread Herbert Xu
On Fri, Aug 03, 2018 at 09:10:08AM +0200, Ard Biesheuvel wrote:
> But I think it's too late now to take this into v4.18. Could you
> please queue this (and my other two pending arm64/aes-gcm patches, if
> possible) for v4.19 instead?

OK I'll do that.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: arm64 - revert NEON yield for fast AEAD implementations

2018-08-03 Thread Ard Biesheuvel
On 3 August 2018 at 08:14, Herbert Xu  wrote:
> On Sun, Jul 29, 2018 at 04:52:30PM +0200, Ard Biesheuvel wrote:
>> As it turns out, checking the TIF_NEED_RESCHED flag after each
>> iteration results in a significant performance regression (~10%)
>> when running fast algorithms (i.e., ones that use special instructions
>> and operate in the < 4 cycles per byte range) on in-order cores with
>> comparatively slow memory accesses such as the Cortex-A53.
>>
>> Given the speed of these ciphers, and the fact that the page based
>> nature of the AEAD scatterwalk API guarantees that the core NEON
>> transform is never invoked with more than a single page's worth of
>> input, we can estimate the worst case duration of any resulting
>> scheduling blackout: on a 1 GHz Cortex-A53 running with 64k pages,
>> processing a page's worth of input at 4 cycles per byte results in
>> a delay of ~250 us, which is a reasonable upper bound.
>>
>> So let's remove the yield checks from the fused AES-CCM and AES-GCM
>> routines entirely.
>>
>> This reverts commit 7b67ae4d5ce8e2f912377f5fbccb95811a92097f and
>> partially reverts commit 7c50136a8aba8784f07fb66a950cc61a7f3d2ee3.
>>
>> Fixes: 7c50136a8aba ("crypto: arm64/aes-ghash - yield NEON after every ...")
>> Fixes: 7b67ae4d5ce8 ("crypto: arm64/aes-ccm - yield NEON after every ...")
>> Signed-off-by: Ard Biesheuvel 
>> ---
>> This supersedes my series 'crypto/arm64: reduce impact of NEON yield checks'
>> sent out on the 24th of July.
>>
>> Given the significant performance regression, we may want to treat this as
>> a fix (the patches in question went into v4.18)
>>
>> This patch applies onto my patch 'crypto/arm64: aes-ce-gcm - add missing
>> kernel_neon_begin/end pair' which I sent out on the 27th of July, which
>> fixes a data corruption bug, which should also be applied as a fix.
>
> Acked-by: Herbert Xu 
>

Thanks Herbert.

But I think it's too late now to take this into v4.18. Could you
please queue this (and my other two pending arm64/aes-gcm patches, if
possible) for v4.19 instead?

Thanks,


Re: [PATCH] crypto: arm64 - revert NEON yield for fast AEAD implementations

2018-08-03 Thread Herbert Xu
On Sun, Jul 29, 2018 at 04:52:30PM +0200, Ard Biesheuvel wrote:
> As it turns out, checking the TIF_NEED_RESCHED flag after each
> iteration results in a significant performance regression (~10%)
> when running fast algorithms (i.e., ones that use special instructions
> and operate in the < 4 cycles per byte range) on in-order cores with
> comparatively slow memory accesses such as the Cortex-A53.
> 
> Given the speed of these ciphers, and the fact that the page based
> nature of the AEAD scatterwalk API guarantees that the core NEON
> transform is never invoked with more than a single page's worth of
> input, we can estimate the worst case duration of any resulting
> scheduling blackout: on a 1 GHz Cortex-A53 running with 64k pages,
> processing a page's worth of input at 4 cycles per byte results in
> a delay of ~250 us, which is a reasonable upper bound.
> 
> So let's remove the yield checks from the fused AES-CCM and AES-GCM
> routines entirely.
> 
> This reverts commit 7b67ae4d5ce8e2f912377f5fbccb95811a92097f and
> partially reverts commit 7c50136a8aba8784f07fb66a950cc61a7f3d2ee3.
> 
> Fixes: 7c50136a8aba ("crypto: arm64/aes-ghash - yield NEON after every ...")
> Fixes: 7b67ae4d5ce8 ("crypto: arm64/aes-ccm - yield NEON after every ...")
> Signed-off-by: Ard Biesheuvel 
> ---
> This supersedes my series 'crypto/arm64: reduce impact of NEON yield checks'
> sent out on the 24th of July.
> 
> Given the significant performance regression, we may want to treat this as
> a fix (the patches in question went into v4.18)
> 
> This patch applies onto my patch 'crypto/arm64: aes-ce-gcm - add missing
> kernel_neon_begin/end pair' which I sent out on the 27th of July, which
> fixes a data corruption bug, which should also be applied as a fix.

Acked-by: Herbert Xu 

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt