Re: x86-64: Maintain 16-byte stack alignment

2017-01-11 Thread Andy Lutomirski
On Wed, Jan 11, 2017 at 11:05 PM, Herbert Xu
 wrote:
> On Tue, Jan 10, 2017 at 09:05:28AM -0800, Linus Torvalds wrote:
>>
>> I'm pretty sure we have random asm code that may not maintain a
>> 16-byte stack alignment when it calls other code (including, in some
>> cases, calling C code).
>>
>> So I'm not at all convinced that this is a good idea. We shouldn't
>> expect 16-byte alignment to be something trustworthy.
>
> So what if we audited all the x86 assembly code to fix this? Would
> it then be acceptable to do a 16-byte aligned stack?
>
> On the face of it it doesn't seem to be a huge amount of code
> assuming they mostly live under arch/x86.

The problem is that we have nasties like TRACE_IRQS_OFF.  Performance
doesn't really matter for these macros, so we could probably rig up a
helper to forcibly align the stack there.  Maybe
FRAME_BEGIN_FORCE_ALIGN?  I also think I'd rather not modify pt_regs.
We should just fix the small number of code paths that create a
pt_regs and then call into C code, so that they align the stack
themselves.

But if we can't do this with automatic verification, then I'm not sure
I want to do it at all.  The asm is already more precarious than I'd
like, and having a code path that is misaligned is asking for obscure
bugs down the road.

--Andy


Re: [PATCH v2 8/8] crypto/testmgr: Allocate only the required output size for hash tests

2017-01-11 Thread Andy Lutomirski
On Wed, Jan 11, 2017 at 11:47 PM, Herbert Xu
 wrote:
> Andy Lutomirski  wrote:
>> There are some hashes (e.g. sha224) that have some internal trickery
>> to make sure that only the correct number of output bytes are
>> generated.  If something goes wrong, they could potentially overrun
>> the output buffer.
>>
>> Make the test more robust by allocating only enough space for the
>> correct output size so that memory debugging will catch the error if
>> the output is overrun.
>>
>> Tested by intentionally breaking sha224 to output all 256
>> internally-generated bits while running on KASAN.
>>
>> Cc: Ard Biesheuvel 
>> Cc: Herbert Xu 
>> Signed-off-by: Andy Lutomirski 
>
> This patch doesn't seem to depend on anything else in the series.
> Do you want me to take it separately?

Yes, please.  Its only relation to the rest of the series is that I
wanted to make sure that I didn't mess up sha224's finalization code.

--Andy


Re: arm64 broken

2017-01-11 Thread Herbert Xu
Rob Rice  wrote:
> I’m working on updating a patchset. The master branch in crypto-2.6 doesn’t 
> compile for ARM64. The first couple errors are listed below. A colleague 
> believes that the following commit in rc2 fixes the problem.

I presume you mean cryptodev and not crypto.

> commit b4b8664d291ac1998e0f0bcdc96b6397f0fe68b3
> Author: Al Viro mailto:v...@zeniv.linux.org.uk>>
> Date:   Mon Dec 26 04:10:19 2016 -0500
> 
>arm64: don't pull uaccess.h into *.S
>
>Split asm-only parts of arm64 uaccess.h into a new header and use that
>from *.S.
>
>Signed-off-by: Al Viro  >
> 
> Any chance we could either pull in the fix or move to rc2?

Sure I'll pull it in.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v2 8/8] crypto/testmgr: Allocate only the required output size for hash tests

2017-01-11 Thread Herbert Xu
Andy Lutomirski  wrote:
> There are some hashes (e.g. sha224) that have some internal trickery
> to make sure that only the correct number of output bytes are
> generated.  If something goes wrong, they could potentially overrun
> the output buffer.
> 
> Make the test more robust by allocating only enough space for the
> correct output size so that memory debugging will catch the error if
> the output is overrun.
> 
> Tested by intentionally breaking sha224 to output all 256
> internally-generated bits while running on KASAN.
> 
> Cc: Ard Biesheuvel 
> Cc: Herbert Xu 
> Signed-off-by: Andy Lutomirski 

This patch doesn't seem to depend on anything else in the series.
Do you want me to take it separately?

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: x86-64: Maintain 16-byte stack alignment

2017-01-11 Thread Ingo Molnar

* Herbert Xu  wrote:

> On Tue, Jan 10, 2017 at 09:05:28AM -0800, Linus Torvalds wrote:
> >
> > I'm pretty sure we have random asm code that may not maintain a
> > 16-byte stack alignment when it calls other code (including, in some
> > cases, calling C code).
> > 
> > So I'm not at all convinced that this is a good idea. We shouldn't
> > expect 16-byte alignment to be something trustworthy.
> 
> So what if we audited all the x86 assembly code to fix this? Would
> it then be acceptable to do a 16-byte aligned stack?

Audits for small but deadly details that aren't checked automatically by
tooling would inevitably bitrot again - and in this particular case
there's a 50% chance that a new, buggy change would test out to be 'fine'
on a kernel developer's own box - and break on different configs,
different hw or with unrelated (and innocent) kernel changes, sometime
later - spreading the pain unnecessarily.

So my feeling is that we really need improved tooling for this (and yes,
the GCC toolchain should have handled this correctly).

But fortunately we have related tooling in the kernel: could objtool
handle this?  My secret hope was always that objtool would grow into a
kind of life insurance against toolchain bogosities (which is a must for
things like livepatching or a DWARF unwinder - but I digress).

Thanks,

Ingo


Re: x86-64: Maintain 16-byte stack alignment

2017-01-11 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> I find it rather annoying that gcc before 4.8 malfunctions when it
> sees __aligned__(16) on x86_64 kernels.  Sigh.

Ran into this when writing silly FPU in-kernel testcases a couple of months 
ago...

Thanks,

Ingo


Re: x86-64: Maintain 16-byte stack alignment

2017-01-11 Thread Herbert Xu
On Tue, Jan 10, 2017 at 09:05:28AM -0800, Linus Torvalds wrote:
>
> I'm pretty sure we have random asm code that may not maintain a
> 16-byte stack alignment when it calls other code (including, in some
> cases, calling C code).
> 
> So I'm not at all convinced that this is a good idea. We shouldn't
> expect 16-byte alignment to be something trustworthy.

So what if we audited all the x86 assembly code to fix this? Would
it then be acceptable to do a 16-byte aligned stack?

On the face of it it doesn't seem to be a huge amount of code
assuming they mostly live under arch/x86.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: x86-64: Maintain 16-byte stack alignment

2017-01-11 Thread Andy Lutomirski
On Tue, Jan 10, 2017 at 10:01 PM, Andy Lutomirski  wrote:
> On Tue, Jan 10, 2017 at 8:35 PM, Herbert Xu  
> wrote:
>> On Tue, Jan 10, 2017 at 08:17:17PM -0800, Linus Torvalds wrote:
>>>
>>> That said, I do think that the "don't assume stack alignment, do it by
>>> hand" may be the safer thing. Because who knows what the random rules
>>> will be on other architectures.
>>
>> Sure we can ban the use of attribute aligned on stacks.  But
>> what about indirect uses through structures?  For example, if
>> someone does
>>
>> struct foo {
>> } __attribute__ ((__aligned__(16)));
>>
>> int bar(...)
>> {
>>         struct foo f;
>>
>>         return baz(&f);
>> }
>>
>> then baz will end up with an unaligned argument.  The worst part
>> is that it is not at all obvious to the person writing the function
>> bar.
>
> Linus, I'm starting to lean toward agreeing with Herbert here, except
> that we should consider making it conditional on having a silly GCC
> version.  After all, the silly GCC versions are wasting space and time
> with alignment instructions no matter what we do, so this would just
> mean tweaking the asm and adding some kind of check_stack_alignment()
> helper to throw out a WARN_ONCE() if we miss one.  The problem with
> making it conditional is that making pt_regs effectively live at a
> variable offset from %rsp is just nasty.

So actually doing this is gross because we have calls from asm to C
all over the place.  But... maybe we can automate all the testing.
Josh, how hard would it be to teach objtool to (if requested by an
option) check that stack frames with statically known size preserve
16-byte stack alignment?
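
For reference, the check_stack_alignment() helper I mentioned above
could be as trivial as the following rough sketch (reading %rsp via
inline asm so it doesn't depend on compiler behavior):

static inline void check_stack_alignment(void)
{
        unsigned long sp;

        asm volatile("mov %%rsp, %0" : "=r" (sp));
        WARN_ONCE(sp & 15, "misaligned stack: sp=%#lx\n", sp);
}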

I find it rather annoying that gcc before 4.8 malfunctions when it
sees __aligned__(16) on x86_64 kernels.  Sigh.

--Andy


Re: [PATCH 00/13] crypto: copy AAD during encrypt for AEAD ciphers

2017-01-11 Thread Herbert Xu
On Tue, Jan 10, 2017 at 02:36:21AM +0100, Stephan Müller wrote:
> 
> to all driver maintainers: the patches I added are compile tested, but
> I do not have the hardware to verify the code. May I ask the respective
> hardware maintainers to verify that the code is appropriate and works
> as intended? Thanks a lot.
> 
> Herbert, this is my proposal for our discussion around copying the
> AAD for algif_aead. Instead of adding the code to algif_aead and wait
> until it transpires to all cipher implementations, I thought it would
> be more helpful to fix all cipher implementations.

I think it's too much churn.  Let's get the algif_aead code fixed
up first, and then pursue this later.

BTW, why are you only doing the copy for encryption?

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: x86-64: Maintain 16-byte stack alignment

2017-01-11 Thread Herbert Xu
On Tue, Jan 10, 2017 at 05:30:48PM +, Ard Biesheuvel wrote:
>
> Apologies for introducing this breakage. It seemed like an obvious and
> simple cleanup, so I didn't even bother to mention it in the commit
> log, but if the kernel does not guarantee 16 byte alignment, I guess
> we should revert to the old method. If SSE instructions are the only
> ones that require this alignment, then I suppose not having an
> ABI-conforming stack pointer should not be an issue in general.

BTW Ard, what is the stack alignment on ARM64?

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


arm64 broken

2017-01-11 Thread Rob Rice
I’m working on updating a patchset. The master branch in crypto-2.6 doesn’t 
compile for ARM64. The first couple errors are listed below. A colleague 
believes that the following commit in rc2 fixes the problem.

commit b4b8664d291ac1998e0f0bcdc96b6397f0fe68b3
Author: Al Viro mailto:v...@zeniv.linux.org.uk>>
Date:   Mon Dec 26 04:10:19 2016 -0500

arm64: don't pull uaccess.h into *.S

Split asm-only parts of arm64 uaccess.h into a new header and use that
from *.S.

Signed-off-by: Al Viro mailto:v...@zeniv.linux.org.uk>>

Any chance we could either pull in the fix or move to rc2?

Thanks,

Rob
 

  AS  arch/arm64/kernel/entry.o
In file included from ./include/linux/sched.h:17:0,
 from ./include/linux/uaccess.h:4,
 from arch/arm64/kernel/entry.S:34:
./include/linux/kernel.h:49:0: warning: "ALIGN" redefined
 #define ALIGN(x, a)  __ALIGN_KERNEL((x), (a))
 ^
In file included from arch/arm64/kernel/entry.S:22:0:
./include/linux/linkage.h:78:0: note: this is the location of the previous 
definition
 #define ALIGN __ALIGN
 ^
In file included from ./include/linux/time.h:7:0,
 from ./include/uapi/linux/timex.h:56,
 from ./include/linux/timex.h:56,
 from ./include/linux/sched.h:19,
 from ./include/linux/uaccess.h:4,
 from arch/arm64/kernel/entry.S:34:
./include/linux/time64.h:36:0: warning: "NSEC_PER_SEC" redefined
 #define NSEC_PER_SEC 1000000000L


Re: x86-64: Maintain 16-byte stack alignment

2017-01-11 Thread Andy Lutomirski
On Wed, Jan 11, 2017 at 12:09 AM, Herbert Xu
 wrote:
> On Wed, Jan 11, 2017 at 08:06:54AM +, Ard Biesheuvel wrote:
>>
>> Couldn't we update the __aligned(x) macro to emit 32 if arch == x86
>> and x == 16? All other cases should work just fine afaict
>
> Not everyone uses that macro.  You'd also need to add some checks
> to stop people from using the gcc __attribute__ directly.
>

You'd also have to audit things to make sure that __aligned__(16)
isn't being used for non-stack purposes.  After all, __aligned__(16)
in static data is fine, and it's also fine as a promise to GCC that
some object is 16-byte aligned.
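
A contrived example of the distinction (not real kernel code, purely
illustrative):

#include <string.h>

/* fine: static data, no dependency on stack alignment */
static unsigned char tbl[16] __attribute__((__aligned__(16)));

int f(void)
{
        /* suspect: on-stack, relies on old gcc's 16-byte stack assumption */
        unsigned char buf[16] __attribute__((__aligned__(16)));

        memcpy(buf, tbl, sizeof(buf));
        return buf[0];
}

Bumping the macro to emit 32 would also change the first case, even
though 16 was all that was ever wanted there.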

--Andy


Re: [PATCH v2 7/8] net: Rename TCA*BPF_DIGEST to ..._SHA256

2017-01-11 Thread Andy Lutomirski
On Wed, Jan 11, 2017 at 1:09 AM, Daniel Borkmann  wrote:
> Hi Andy,
>
> On 01/11/2017 04:11 AM, Andy Lutomirski wrote:
>>
>> On Tue, Jan 10, 2017 at 4:50 PM, Daniel Borkmann 
>> wrote:
>>>
>>> On 01/11/2017 12:24 AM, Andy Lutomirski wrote:


 This makes it easier to add another digest algorithm down the road if
 needed.  It also serves to force any programs that might have been
 written against a kernel that had the old field name to notice the
 change and make any necessary changes.

 This shouldn't violate any stable API policies, as no released kernel
 has ever had TCA*BPF_DIGEST.
>>>
>>>
>>> Imho, this and patch 6/8 is not really needed. Should there ever
>>> another digest alg be used (doubt it), then you'd need a new nl
>>> attribute and fdinfo line anyway to keep existing stuff intact.
>>> Nobody made the claim that you can just change this underneath
>>> and not respecting abi for existing applications when I read from
>>> above that such apps now will get "forced" to notice a change.
>>
>>
>> Fair enough.  I was more concerned about prerelease iproute2 versions,
>> but maybe that's a nonissue.  I'll drop these two patches.
>
>
> Ok. Sleeping over this a bit, how about a general rename into
> "prog_tag" for fdinfo, and TCA_BPF_TAG resp. TCA_ACT_BPF_TAG for
> the netlink attributes? Fwiw, it might reduce any assumptions
> being made about this. If this would be preferable, I could cook
> up that patch against -net for renaming it?

That would be fine with me.

I think there are two reasonable approaches to computing the actual tag.

1. Use a standard, modern cryptographic hash.  SHA-256, SHA-512,
Blake2b, whatever.  SHA-1 is a bad choice in part because it's partly
broken and in part because the implementation in lib/ is a real mess
to use (as you noticed while writing the code).

2. Use whatever algorithm you like but make the tag so short that it's
obviously not collision-free.  48 or 64 bits is probably reasonable.

The intermediate versions are just asking for trouble.  Alexei wants
to make the tag shorter, but I admit I still don't understand why he
prefers that over using a better crypto hash and letting user code
truncate the tag if it wants.
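
(By "letting user code truncate" I mean nothing fancier than the sketch
below; the helper name is made up, and it simply keeps the first 64 bits
of the full digest:)

#include <stdint.h>
#include <string.h>

static uint64_t bpf_prog_short_tag(const unsigned char digest[32])
{
        uint64_t tag;

        memcpy(&tag, digest, sizeof(tag));
        return tag;
}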


Re: [PATCH v2 8/8] crypto/testmgr: Allocate only the required output size for hash tests

2017-01-11 Thread Andy Lutomirski
On Wed, Jan 11, 2017 at 7:13 AM, David Laight  wrote:
> From: Andy Lutomirski
>> Sent: 10 January 2017 23:25
>> There are some hashes (e.g. sha224) that have some internal trickery
>> to make sure that only the correct number of output bytes are
>> generated.  If something goes wrong, they could potentially overrun
>> the output buffer.
>>
>> Make the test more robust by allocating only enough space for the
>> correct output size so that memory debugging will catch the error if
>> the output is overrun.
>
> Might be better to test this by allocating an overlong buffer
> and then explicitly checking that the output hasn't overrun
> the allowed space.
>
> If nothing else the error message will be clearer.

I thought about that, but then I'd have to figure out what poison
value to use.  Both KASAN and the usual slab debugging are quite good
at this kind of checking, and KASAN will even point you to the
problematic code directly.
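
The change itself boils down to something like the sketch below
(paraphrased, not the literal testmgr diff): size the result buffer to
the exact digest size instead of a shared worst-case buffer, and let
KASAN or slab debugging flag any overrun.

#include <crypto/hash.h>
#include <linux/slab.h>

static void *alloc_hash_result(struct crypto_ahash *tfm)
{
        /* exactly digest-size bytes: a single overrun byte trips the debugging */
        return kzalloc(crypto_ahash_digestsize(tfm), GFP_KERNEL);
}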

--Andy


-- 
Andy Lutomirski
AMA Capital Management, LLC


[PATCH v2 7/7] crypto: arm64/aes - reimplement bit-sliced ARM/NEON implementation for arm64

2017-01-11 Thread Ard Biesheuvel
This is a reimplementation of the NEON version of the bit-sliced AES
algorithm. This code is heavily based on Andy Polyakov's OpenSSL version
for ARM, which is also available in the kernel. This is an alternative to
the existing NEON implementation for arm64 authored by me, which suffers
from poor performance due to its reliance on the pathologically slow four
register variant of the tbl/tbx NEON instruction.

This version is about 30% (*) faster than the generic C code, but only in
cases where the input can be 8x interleaved (this is a fundamental property
of bit slicing). For this reason, only the chaining modes ECB, XTS and CTR
are implemented. (The significance of ECB is that it could potentially be
used by other chaining modes)

* Measured on Cortex-A57. Note that this is still an order of magnitude
  slower than the implementations that use the dedicated AES instructions
  introduced in ARMv8, but those are part of an optional extension, and so
  it is good to have a fallback.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/Kconfig   |   7 +
 arch/arm64/crypto/Makefile  |   3 +
 arch/arm64/crypto/aes-neonbs-core.S | 963 
 arch/arm64/crypto/aes-neonbs-glue.c | 420 +
 4 files changed, 1393 insertions(+)

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 0826f8e599a6..5de75c3dcbd4 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -82,4 +82,11 @@ config CRYPTO_CHACHA20_NEON
select CRYPTO_BLKCIPHER
select CRYPTO_CHACHA20
 
+config CRYPTO_AES_ARM64_BS
+   tristate "AES in ECB/CBC/CTR/XTS modes using bit-sliced NEON algorithm"
+   depends on KERNEL_MODE_NEON
+   select CRYPTO_BLKCIPHER
+   select CRYPTO_AES_ARM64
+   select CRYPTO_SIMD
+
 endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index a893507629eb..d1ae1b9cbe70 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -47,6 +47,9 @@ chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
 obj-$(CONFIG_CRYPTO_AES_ARM64) += aes-arm64.o
 aes-arm64-y := aes-cipher-core.o aes-cipher-glue.o
 
+obj-$(CONFIG_CRYPTO_AES_ARM64_BS) += aes-neon-bs.o
+aes-neon-bs-y := aes-neonbs-core.o aes-neonbs-glue.o
+
 AFLAGS_aes-ce.o:= -DINTERLEAVE=4
 AFLAGS_aes-neon.o  := -DINTERLEAVE=4
 
diff --git a/arch/arm64/crypto/aes-neonbs-core.S 
b/arch/arm64/crypto/aes-neonbs-core.S
new file mode 100644
index ..8d0cdaa2768d
--- /dev/null
+++ b/arch/arm64/crypto/aes-neonbs-core.S
@@ -0,0 +1,963 @@
+/*
+ * Bit sliced AES using NEON instructions
+ *
+ * Copyright (C) 2016 Linaro Ltd 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * The algorithm implemented here is described in detail by the paper
+ * 'Faster and Timing-Attack Resistant AES-GCM' by Emilia Kaesper and
+ * Peter Schwabe (https://eprint.iacr.org/2009/129.pdf)
+ *
+ * This implementation is based primarily on the OpenSSL implementation
+ * for 32-bit ARM written by Andy Polyakov 
+ */
+
+#include 
+#include 
+
+   .text
+
+   rounds  .reqx11
+   bskey   .reqx12
+
+   .macro  in_bs_ch, b0, b1, b2, b3, b4, b5, b6, b7
+   eor \b2, \b2, \b1
+   eor \b5, \b5, \b6
+   eor \b3, \b3, \b0
+   eor \b6, \b6, \b2
+   eor \b5, \b5, \b0
+   eor \b6, \b6, \b3
+   eor \b3, \b3, \b7
+   eor \b7, \b7, \b5
+   eor \b3, \b3, \b4
+   eor \b4, \b4, \b5
+   eor \b2, \b2, \b7
+   eor \b3, \b3, \b1
+   eor \b1, \b1, \b5
+   .endm
+
+   .macro  out_bs_ch, b0, b1, b2, b3, b4, b5, b6, b7
+   eor \b0, \b0, \b6
+   eor \b1, \b1, \b4
+   eor \b4, \b4, \b6
+   eor \b2, \b2, \b0
+   eor \b6, \b6, \b1
+   eor \b1, \b1, \b5
+   eor \b5, \b5, \b3
+   eor \b3, \b3, \b7
+   eor \b7, \b7, \b5
+   eor \b2, \b2, \b5
+   eor \b4, \b4, \b7
+   .endm
+
+   .macro  inv_in_bs_ch, b6, b1, b2, b4, b7, b0, b3, b5
+   eor \b1, \b1, \b7
+   eor \b4, \b4, \b7
+   eor \b7, \b7, \b5
+   eor \b1, \b1, \b3
+   eor \b2, \b2, \b5
+   eor \b3, \b3, \b7
+   eor \b6, \b6, \b1
+   eor \b2, \b2, \b0
+   eor \b5, \b5, \b3
+   eor \b4, \b4, \b6
+   eor \b0, \b0, \b6
+   eor \b1, \b1, \b4
+   .endm
+
+   .macro  inv_out_bs_ch, b6, b5, b0, b3, b7, b1, b4, b2

[PATCH v2 4/7] crypto: arm64/aes - add scalar implementation

2017-01-11 Thread Ard Biesheuvel
This adds a scalar implementation of AES, based on the precomputed tables
that are exposed by the generic AES code. Since rotates are cheap on arm64,
this implementation only uses the 4 core tables (of 1 KB each), and avoids
the prerotated ones, reducing the D-cache footprint by 75%.

On Cortex-A57, this code manages 13.0 cycles per byte, which is ~34% faster
than the generic C code. (Note that this is still >13x slower than the code
that uses the optional ARMv8 Crypto Extensions, which manages <1 cycles per
byte.)

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/Kconfig   |   4 +
 arch/arm64/crypto/Makefile  |   3 +
 arch/arm64/crypto/aes-cipher-core.S | 127 
 arch/arm64/crypto/aes-cipher-glue.c |  69 +++
 4 files changed, 203 insertions(+)

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 0bf0f531f539..0826f8e599a6 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -41,6 +41,10 @@ config CRYPTO_CRC32_ARM64_CE
depends on KERNEL_MODE_NEON && CRC32
select CRYPTO_HASH
 
+config CRYPTO_AES_ARM64
+   tristate "AES core cipher using scalar instructions"
+   select CRYPTO_AES
+
 config CRYPTO_AES_ARM64_CE
tristate "AES core cipher using ARMv8 Crypto Extensions"
depends on ARM64 && KERNEL_MODE_NEON
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index 9d2826c5fccf..a893507629eb 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -44,6 +44,9 @@ sha512-arm64-y := sha512-glue.o sha512-core.o
 obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
 chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
 
+obj-$(CONFIG_CRYPTO_AES_ARM64) += aes-arm64.o
+aes-arm64-y := aes-cipher-core.o aes-cipher-glue.o
+
 AFLAGS_aes-ce.o:= -DINTERLEAVE=4
 AFLAGS_aes-neon.o  := -DINTERLEAVE=4
 
diff --git a/arch/arm64/crypto/aes-cipher-core.S 
b/arch/arm64/crypto/aes-cipher-core.S
new file mode 100644
index ..37590ab8121a
--- /dev/null
+++ b/arch/arm64/crypto/aes-cipher-core.S
@@ -0,0 +1,127 @@
+/*
+ * Scalar AES core transform
+ *
+ * Copyright (C) 2017 Linaro Ltd 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+
+   .text
+
+   rk  .reqx0
+   out .reqx1
+   in  .reqx2
+   rounds  .reqx3
+   tt  .reqx4
+   lt  .reqx2
+
+   .macro  __hround, out0, out1, in0, in1, in2, in3, t0, t1, enc
+   ldp \out0, \out1, [rk], #8
+
+   ubfxw13, \in0, #0, #8
+   ubfxw14, \in1, #8, #8
+   ldr w13, [tt, w13, uxtw #2]
+   ldr w14, [tt, w14, uxtw #2]
+
+   .if \enc
+   ubfxw17, \in1, #0, #8
+   ubfxw18, \in2, #8, #8
+   .else
+   ubfxw17, \in3, #0, #8
+   ubfxw18, \in0, #8, #8
+   .endif
+   ldr w17, [tt, w17, uxtw #2]
+   ldr w18, [tt, w18, uxtw #2]
+
+   ubfxw15, \in2, #16, #8
+   ubfxw16, \in3, #24, #8
+   ldr w15, [tt, w15, uxtw #2]
+   ldr w16, [tt, w16, uxtw #2]
+
+   .if \enc
+   ubfx\t0, \in3, #16, #8
+   ubfx\t1, \in0, #24, #8
+   .else
+   ubfx\t0, \in1, #16, #8
+   ubfx\t1, \in2, #24, #8
+   .endif
+   ldr \t0, [tt, \t0, uxtw #2]
+   ldr \t1, [tt, \t1, uxtw #2]
+
+   eor \out0, \out0, w13
+   eor \out1, \out1, w17
+   eor \out0, \out0, w14, ror #24
+   eor \out1, \out1, w18, ror #24
+   eor \out0, \out0, w15, ror #16
+   eor \out1, \out1, \t0, ror #16
+   eor \out0, \out0, w16, ror #8
+   eor \out1, \out1, \t1, ror #8
+   .endm
+
+   .macro  fround, out0, out1, out2, out3, in0, in1, in2, in3
+   __hround\out0, \out1, \in0, \in1, \in2, \in3, \out2, \out3, 1
+   __hround\out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1
+   .endm
+
+   .macro  iround, out0, out1, out2, out3, in0, in1, in2, in3
+   __hround\out0, \out1, \in0, \in3, \in2, \in1, \out2, \out3, 0
+   __hround\out2, \out3, \in2, \in1, \in0, \in3, \in1, \in0, 0
+   .endm
+
+   .macro  do_crypt, round, ttab, ltab
+   ldp w5, w6, [in]
+   ldp w7, w8, [in, #8]
+   ldp w9, w10, [rk], #16
+   ldp w11, w12, [rk, #-8]
+
+CPU_BE(rev w5, w5  )
+CPU_BE(rev w6, w6  

[PATCH v2 5/7] crypto: arm/aes - replace scalar AES cipher

2017-01-11 Thread Ard Biesheuvel
This replaces the scalar AES cipher that originates in the OpenSSL project
with a new implementation that is ~15% (*) faster (on modern cores), and
reuses the lookup tables and the key schedule generation routines from the
generic C implementation (which is usually compiled in anyway due to
networking and other subsystems depending on it).

Note that the bit sliced NEON code for AES still depends on the scalar cipher
that this patch replaces, so it is not removed entirely yet.

* On Cortex-A57, the performance increases from 17.0 to 14.9 cycles per byte
  for 128-bit keys.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm/crypto/Kconfig   |  20 +--
 arch/arm/crypto/Makefile  |   4 +-
 arch/arm/crypto/aes-cipher-core.S | 179 
 arch/arm/crypto/aes-cipher-glue.c |  74 
 arch/arm/crypto/aes_glue.c|  98 ---
 5 files changed, 256 insertions(+), 119 deletions(-)

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 2f3339f015d3..f1de658c3c8f 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -62,33 +62,15 @@ config CRYPTO_SHA512_ARM
  using optimized ARM assembler and NEON, when available.
 
 config CRYPTO_AES_ARM
-   tristate "AES cipher algorithms (ARM-asm)"
-   depends on ARM
+   tristate "Scalar AES cipher for ARM"
select CRYPTO_ALGAPI
select CRYPTO_AES
help
  Use optimized AES assembler routines for ARM platforms.
 
- AES cipher algorithms (FIPS-197). AES uses the Rijndael
- algorithm.
-
- Rijndael appears to be consistently a very good performer in
- both hardware and software across a wide range of computing
- environments regardless of its use in feedback or non-feedback
- modes. Its key setup time is excellent, and its key agility is
- good. Rijndael's very low memory requirements make it very well
- suited for restricted-space environments, in which it also
- demonstrates excellent performance. Rijndael's operations are
- among the easiest to defend against power and timing attacks.
-
- The AES specifies three key sizes: 128, 192 and 256 bits
-
- See  for more information.
-
 config CRYPTO_AES_ARM_BS
tristate "Bit sliced AES using NEON instructions"
depends on KERNEL_MODE_NEON
-   select CRYPTO_AES_ARM
select CRYPTO_BLKCIPHER
select CRYPTO_SIMD
help
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 8d74e55eacd4..8f5de2db701c 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -27,8 +27,8 @@ $(warning $(ce-obj-y) $(ce-obj-m))
 endif
 endif
 
-aes-arm-y  := aes-armv4.o aes_glue.o
-aes-arm-bs-y   := aesbs-core.o aesbs-glue.o
+aes-arm-y  := aes-cipher-core.o aes-cipher-glue.o
+aes-arm-bs-y   := aes-armv4.o aesbs-core.o aesbs-glue.o
 sha1-arm-y := sha1-armv4-large.o sha1_glue.o
 sha1-arm-neon-y:= sha1-armv7-neon.o sha1_neon_glue.o
 sha256-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha256_neon_glue.o
diff --git a/arch/arm/crypto/aes-cipher-core.S 
b/arch/arm/crypto/aes-cipher-core.S
new file mode 100644
index ..b04261e1e068
--- /dev/null
+++ b/arch/arm/crypto/aes-cipher-core.S
@@ -0,0 +1,179 @@
+/*
+ * Scalar AES core transform
+ *
+ * Copyright (C) 2017 Linaro Ltd.
+ * Author: Ard Biesheuvel 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+
+   .text
+   .align  5
+
+   rk  .reqr0
+   rounds  .reqr1
+   in  .reqr2
+   out .reqr3
+   tt  .reqip
+
+   t0  .reqlr
+   t1  .reqr2
+   t2  .reqr3
+
+   .macro  __select, out, in, idx
+   .if __LINUX_ARM_ARCH__ < 7
+   and \out, \in, #0xff << (8 * \idx)
+   .else
+   ubfx\out, \in, #(8 * \idx), #8
+   .endif
+   .endm
+
+   .macro  __load, out, in, idx
+   .if __LINUX_ARM_ARCH__ < 7 && \idx > 0
+   ldr \out, [tt, \in, lsr #(8 * \idx) - 2]
+   .else
+   ldr \out, [tt, \in, lsl #2]
+   .endif
+   .endm
+
+   .macro  __hround, out0, out1, in0, in1, in2, in3, t3, t4, enc
+   __select\out0, \in0, 0
+   __selectt0, \in1, 1
+   __load  \out0, \out0, 0
+   __load  t0, t0, 1
+
+   .if \enc
+   __select\out1, \in1, 0
+   __selectt1, \in2, 1
+   .else
+   __select\out1, \in3, 0
+   __selectt1, \in0, 1
+   .endif
+   __load  \out1, \out1, 0
+   __selectt2, \in2, 2
+  

[PATCH v2 2/7] crypto: arm/chacha20 - implement NEON version based on SSE3 code

2017-01-11 Thread Ard Biesheuvel
This is a straight port to ARM/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher. It uses the new skcipher walksize
attribute to process the input in strides of 4x the block size.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm/crypto/Kconfig  |   6 +
 arch/arm/crypto/Makefile |   2 +
 arch/arm/crypto/chacha20-neon-core.S | 524 
 arch/arm/crypto/chacha20-neon-glue.c | 128 +
 4 files changed, 660 insertions(+)

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 13f1b4c289d4..2f3339f015d3 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -130,4 +130,10 @@ config CRYPTO_CRC32_ARM_CE
depends on KERNEL_MODE_NEON && CRC32
select CRYPTO_HASH
 
+config CRYPTO_CHACHA20_NEON
+   tristate "NEON accelerated ChaCha20 symmetric cipher"
+   depends on KERNEL_MODE_NEON
+   select CRYPTO_BLKCIPHER
+   select CRYPTO_CHACHA20
+
 endif
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index b578a1820ab1..8d74e55eacd4 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
 obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
 obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
+obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
 
 ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
 ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
@@ -40,6 +41,7 @@ aes-arm-ce-y  := aes-ce-core.o aes-ce-glue.o
 ghash-arm-ce-y := ghash-ce-core.o ghash-ce-glue.o
 crct10dif-arm-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
 crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
+chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
 
 quiet_cmd_perl = PERL$@
   cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/chacha20-neon-core.S 
b/arch/arm/crypto/chacha20-neon-core.S
new file mode 100644
index ..ff1d337bdb4a
--- /dev/null
+++ b/arch/arm/crypto/chacha20-neon-core.S
@@ -0,0 +1,524 @@
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539, ARM NEON functions
+ *
+ * Copyright (C) 2016 Linaro, Ltd. 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Based on:
+ * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSE3 functions
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+
+   .text
+   .fpuneon
+   .align  5
+
+ENTRY(chacha20_block_xor_neon)
+   // r0: Input state matrix, s
+   // r1: 1 data block output, o
+   // r2: 1 data block input, i
+
+   //
+   // This function encrypts one ChaCha20 block by loading the state matrix
+   // in four NEON registers. It performs matrix operation on four words in
+   // parallel, but requireds shuffling to rearrange the words after each
+   // round.
+   //
+
+   // x0..3 = s0..3
+   add ip, r0, #0x20
+   vld1.32 {q0-q1}, [r0]
+   vld1.32 {q2-q3}, [ip]
+
+   vmovq8, q0
+   vmovq9, q1
+   vmovq10, q2
+   vmovq11, q3
+
+   mov r3, #10
+
+.Ldoubleround:
+   // x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+   vadd.i32q0, q0, q1
+   veorq4, q3, q0
+   vshl.u32q3, q4, #16
+   vsri.u32q3, q4, #16
+
+   // x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+   vadd.i32q2, q2, q3
+   veorq4, q1, q2
+   vshl.u32q1, q4, #12
+   vsri.u32q1, q4, #20
+
+   // x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+   vadd.i32q0, q0, q1
+   veorq4, q3, q0
+   vshl.u32q3, q4, #8
+   vsri.u32q3, q4, #24
+
+   // x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+   vadd.i32q2, q2, q3
+   veorq4, q1, q2
+   vshl.u32q1, q4, #7
+   vsri.u32q1, q4, #25
+
+   // x1 = shuffle32(x1, MASK(0, 3, 2, 1))
+   vext.8  q1, q1, q1, #4
+   // x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+   vext.8  q2, q2, q2, #8
+   // x3 = shuffle32(x3, MASK(2, 1, 0, 3))
+   vext.8  q3, q3, q3, #12
+
+   // x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+   vadd.i32q0, q0, q1
+   veorq4, q3, q0
+   vshl.u32q3, q4, #16
+   vsri.u32q3, q4, #16
+
+   // x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+   vadd.i32q2, q2, q3
+   veorq4, q1, q2
+   vshl.u32q1, q4, #12
+   vsri.u32   

[PATCH v2 1/7] crypto: arm64/chacha20 - implement NEON version based on SSE3 code

2017-01-11 Thread Ard Biesheuvel
This is a straight port to arm64/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher. It uses the new skcipher walksize
attribute to process the input in strides of 4x the block size.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/Kconfig  |   6 +
 arch/arm64/crypto/Makefile |   3 +
 arch/arm64/crypto/chacha20-neon-core.S | 450 
 arch/arm64/crypto/chacha20-neon-glue.c | 127 ++
 4 files changed, 586 insertions(+)

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 450a85df041a..0bf0f531f539 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -72,4 +72,10 @@ config CRYPTO_CRC32_ARM64
depends on ARM64
select CRYPTO_HASH
 
+config CRYPTO_CHACHA20_NEON
+   tristate "NEON accelerated ChaCha20 symmetric cipher"
+   depends on KERNEL_MODE_NEON
+   select CRYPTO_BLKCIPHER
+   select CRYPTO_CHACHA20
+
 endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index aad7b744..9d2826c5fccf 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -41,6 +41,9 @@ sha256-arm64-y := sha256-glue.o sha256-core.o
 obj-$(CONFIG_CRYPTO_SHA512_ARM64) += sha512-arm64.o
 sha512-arm64-y := sha512-glue.o sha512-core.o
 
+obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
+chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
+
 AFLAGS_aes-ce.o:= -DINTERLEAVE=4
 AFLAGS_aes-neon.o  := -DINTERLEAVE=4
 
diff --git a/arch/arm64/crypto/chacha20-neon-core.S 
b/arch/arm64/crypto/chacha20-neon-core.S
new file mode 100644
index ..13c85e272c2a
--- /dev/null
+++ b/arch/arm64/crypto/chacha20-neon-core.S
@@ -0,0 +1,450 @@
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539, arm64 NEON functions
+ *
+ * Copyright (C) 2016 Linaro, Ltd. 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Based on:
+ * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSSE3 functions
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+
+   .text
+   .align  6
+
+ENTRY(chacha20_block_xor_neon)
+   // x0: Input state matrix, s
+   // x1: 1 data block output, o
+   // x2: 1 data block input, i
+
+   //
+   // This function encrypts one ChaCha20 block by loading the state matrix
+   // in four NEON registers. It performs matrix operation on four words in
+   // parallel, but requires shuffling to rearrange the words after each
+   // round.
+   //
+
+   // x0..3 = s0..3
+   adr x3, ROT8
+   ld1 {v0.4s-v3.4s}, [x0]
+   ld1 {v8.4s-v11.4s}, [x0]
+   ld1 {v12.4s}, [x3]
+
+   mov x3, #10
+
+.Ldoubleround:
+   // x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+   add v0.4s, v0.4s, v1.4s
+   eor v3.16b, v3.16b, v0.16b
+   rev32   v3.8h, v3.8h
+
+   // x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+   add v2.4s, v2.4s, v3.4s
+   eor v4.16b, v1.16b, v2.16b
+   shl v1.4s, v4.4s, #12
+   sri v1.4s, v4.4s, #20
+
+   // x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+   add v0.4s, v0.4s, v1.4s
+   eor v3.16b, v3.16b, v0.16b
+   tbl v3.16b, {v3.16b}, v12.16b
+
+   // x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+   add v2.4s, v2.4s, v3.4s
+   eor v4.16b, v1.16b, v2.16b
+   shl v1.4s, v4.4s, #7
+   sri v1.4s, v4.4s, #25
+
+   // x1 = shuffle32(x1, MASK(0, 3, 2, 1))
+   ext v1.16b, v1.16b, v1.16b, #4
+   // x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+   ext v2.16b, v2.16b, v2.16b, #8
+   // x3 = shuffle32(x3, MASK(2, 1, 0, 3))
+   ext v3.16b, v3.16b, v3.16b, #12
+
+   // x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+   add v0.4s, v0.4s, v1.4s
+   eor v3.16b, v3.16b, v0.16b
+   rev32   v3.8h, v3.8h
+
+   // x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+   add v2.4s, v2.4s, v3.4s
+   eor v4.16b, v1.16b, v2.16b
+   shl v1.4s, v4.4s, #12
+   sri v1.4s, v4.4s, #20
+
+   // x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+   add v0.4s, v0.4s, v1.4s
+   eor v3.16b, v3.16b, v0.16b
+   tbl v3.16b, {v3.16b}, v12.16b
+
+   // x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+   add v2.4s, v2.4s, v3.4s
+   eor v4.16b, v1.16b, v2.16b
+   shl

[PATCH v2 3/7] crypto: arm64/aes-blk - expose AES-CTR as synchronous cipher as well

2017-01-11 Thread Ard Biesheuvel
In addition to wrapping the AES-CTR cipher into the async SIMD wrapper,
which exposes it as an async skcipher that defers processing to process
context, expose our AES-CTR implementation directly as a synchronous cipher
as well, but with a lower priority.

This makes the AES-CTR transform usable in places where synchronous
transforms are required, such as the MAC802.11 encryption code, which
executes in softirq context, where SIMD processing is allowed on arm64.
Users of the async transform will keep the existing behavior.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/aes-glue.c | 25 ++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c
index 4e3f8adb1793..5164aaf82c6a 100644
--- a/arch/arm64/crypto/aes-glue.c
+++ b/arch/arm64/crypto/aes-glue.c
@@ -327,6 +327,23 @@ static struct skcipher_alg aes_algs[] = { {
.decrypt= ctr_encrypt,
 }, {
.base = {
+   .cra_name   = "ctr(aes)",
+   .cra_driver_name= "ctr-aes-" MODE,
+   .cra_priority   = PRIO - 1,
+   .cra_blocksize  = 1,
+   .cra_ctxsize= sizeof(struct crypto_aes_ctx),
+   .cra_alignmask  = 7,
+   .cra_module = THIS_MODULE,
+   },
+   .min_keysize= AES_MIN_KEY_SIZE,
+   .max_keysize= AES_MAX_KEY_SIZE,
+   .ivsize = AES_BLOCK_SIZE,
+   .chunksize  = AES_BLOCK_SIZE,
+   .setkey = skcipher_aes_setkey,
+   .encrypt= ctr_encrypt,
+   .decrypt= ctr_encrypt,
+}, {
+   .base = {
.cra_name   = "__xts(aes)",
.cra_driver_name= "__xts-aes-" MODE,
.cra_priority   = PRIO,
@@ -350,8 +367,9 @@ static void aes_exit(void)
 {
int i;
 
-   for (i = 0; i < ARRAY_SIZE(aes_simd_algs) && aes_simd_algs[i]; i++)
-   simd_skcipher_free(aes_simd_algs[i]);
+   for (i = 0; i < ARRAY_SIZE(aes_simd_algs); i++)
+   if (aes_simd_algs[i])
+   simd_skcipher_free(aes_simd_algs[i]);
 
crypto_unregister_skciphers(aes_algs, ARRAY_SIZE(aes_algs));
 }
@@ -370,6 +388,9 @@ static int __init aes_init(void)
return err;
 
for (i = 0; i < ARRAY_SIZE(aes_algs); i++) {
+   if (!(aes_algs[i].base.cra_flags & CRYPTO_ALG_INTERNAL))
+   continue;
+
algname = aes_algs[i].base.cra_name + 2;
drvname = aes_algs[i].base.cra_driver_name + 2;
basename = aes_algs[i].base.cra_driver_name;
-- 
2.7.4



[PATCH v2 0/7] crypto: ARM/arm64 - AES and ChaCha20 updates for v4.11

2017-01-11 Thread Ard Biesheuvel
This adds ARM and arm64 implementations of ChaCha20, scalar AES and SIMD
AES (using bit slicing). The SIMD algorithms in this series take advantage
of the new skcipher walksize attribute to iterate over the input in the most
efficient manner possible.

Patch #1 adds a NEON implementation of ChaCha20 for arm64.

Patch #2 adds a NEON implementation of ChaCha20 for ARM.

Patch #3 modifies the existing NEON and ARMv8 Crypto Extensions implementations
of AES-CTR to be available as a synchronous skcipher as well. This is intended
for the mac80211 code, which uses synchronous encapsulations of ctr(aes)
[ccm, gcm] in softirq context, during which arm64 supports use of SIMD code.

Patch #4 adds a scalar implementation of AES for arm64, using the key schedule
generation routines and lookup tables of the generic code in crypto/aes_generic.

Patch #5 does the same for ARM, replacing existing scalar code that originated
in the OpenSSL project, and contains redundant key schedule generation routines
and lookup tables (and is slightly slower on modern cores)

Patch #6 replaces the ARM bit sliced NEON code with a new implementation that
has a number of advantages over the original code (which also originated in the
OpenSSL project). The performance should be identical.

Patch #7 adds a port of the ARM bit-sliced AES code to arm64, in ECB, CBC, CTR
and XTS modes.

Due to the size of patch #7, it may be difficult to apply these patches from
patchwork, so I pushed them here as well:

  git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git crypto-arm-v4.11
  
https://git.kernel.org/cgit/linux/kernel/git/ardb/linux.git/log/?h=crypto-arm-v4.11

Ard Biesheuvel (7):
  crypto: arm64/chacha20 - implement NEON version based on SSE3 code
  crypto: arm/chacha20 - implement NEON version based on SSE3 code
  crypto: arm64/aes-blk - expose AES-CTR as synchronous cipher as well
  crypto: arm64/aes - add scalar implementation
  crypto: arm/aes - replace scalar AES cipher
  crypto: arm/aes - replace bit-sliced OpenSSL NEON code
  crypto: arm64/aes - reimplement bit-sliced ARM/NEON implementation for
arm64

 arch/arm/crypto/Kconfig|   27 +-
 arch/arm/crypto/Makefile   |   11 +-
 arch/arm/crypto/aes-armv4.S| 1089 -
 arch/arm/crypto/aes-cipher-core.S  |  179 ++
 arch/arm/crypto/aes-cipher-glue.c  |   74 +
 arch/arm/crypto/aes-neonbs-core.S  | 1021 
 arch/arm/crypto/aes-neonbs-glue.c  |  405 
 arch/arm/crypto/aes_glue.c |   98 -
 arch/arm/crypto/aes_glue.h |   19 -
 arch/arm/crypto/aesbs-core.S_shipped   | 2548 
 arch/arm/crypto/aesbs-glue.c   |  367 ---
 arch/arm/crypto/bsaes-armv7.pl | 2471 ---
 arch/arm/crypto/chacha20-neon-core.S   |  524 
 arch/arm/crypto/chacha20-neon-glue.c   |  128 +
 arch/arm64/crypto/Kconfig  |   17 +
 arch/arm64/crypto/Makefile |9 +
 arch/arm64/crypto/aes-cipher-core.S|  127 +
 arch/arm64/crypto/aes-cipher-glue.c|   69 +
 arch/arm64/crypto/aes-glue.c   |   25 +-
 arch/arm64/crypto/aes-neonbs-core.S|  963 
 arch/arm64/crypto/aes-neonbs-glue.c|  420 
 arch/arm64/crypto/chacha20-neon-core.S |  450 
 arch/arm64/crypto/chacha20-neon-glue.c |  127 +
 23 files changed, 4549 insertions(+), 6619 deletions(-)
 delete mode 100644 arch/arm/crypto/aes-armv4.S
 create mode 100644 arch/arm/crypto/aes-cipher-core.S
 create mode 100644 arch/arm/crypto/aes-cipher-glue.c
 create mode 100644 arch/arm/crypto/aes-neonbs-core.S
 create mode 100644 arch/arm/crypto/aes-neonbs-glue.c
 delete mode 100644 arch/arm/crypto/aes_glue.c
 delete mode 100644 arch/arm/crypto/aes_glue.h
 delete mode 100644 arch/arm/crypto/aesbs-core.S_shipped
 delete mode 100644 arch/arm/crypto/aesbs-glue.c
 delete mode 100644 arch/arm/crypto/bsaes-armv7.pl
 create mode 100644 arch/arm/crypto/chacha20-neon-core.S
 create mode 100644 arch/arm/crypto/chacha20-neon-glue.c
 create mode 100644 arch/arm64/crypto/aes-cipher-core.S
 create mode 100644 arch/arm64/crypto/aes-cipher-glue.c
 create mode 100644 arch/arm64/crypto/aes-neonbs-core.S
 create mode 100644 arch/arm64/crypto/aes-neonbs-glue.c
 create mode 100644 arch/arm64/crypto/chacha20-neon-core.S
 create mode 100644 arch/arm64/crypto/chacha20-neon-glue.c

-- 
2.7.4



RE: [PATCH v2 8/8] crypto/testmgr: Allocate only the required output size for hash tests

2017-01-11 Thread David Laight
From: Andy Lutomirski
> Sent: 10 January 2017 23:25
> There are some hashes (e.g. sha224) that have some internal trickery
> to make sure that only the correct number of output bytes are
> generated.  If something goes wrong, they could potentially overrun
> the output buffer.
> 
> Make the test more robust by allocating only enough space for the
> correct output size so that memory debugging will catch the error if
> the output is overrun.

Might be better to test this by allocating an overlong buffer
and then explicitly checking that the output hasn't overrun
the allowed space.

If nothing else the error message will be clearer.

David



Re: [RFC PATCH v2] crypto: Add IV generation algorithms

2017-01-11 Thread Ondrej Mosnáček
Hi Binoy,

2016-12-13 9:49 GMT+01:00 Binoy Jayan :
> Currently, the iv generation algorithms are implemented in dm-crypt.c.
> The goal is to move these algorithms from the dm layer to the kernel
> crypto layer by implementing them as template ciphers so they can be
> implemented in hardware for performance. As part of this patchset, the
> iv-generation code is moved from the dm layer to the crypto layer and
> adapt the dm-layer to send a whole 'bio' (as defined in the block layer)
> at a time. Each bio contains the in memory representation of physically
> contiguous disk blocks. The dm layer sets up a chained scatterlist of
> these blocks split into physically contiguous segments in memory so that
> DMA can be performed. The iv generation algorithms implemented in geniv.c
> include plain, plain64, essiv, benbi, null, lmk and tcw.

I like what you are trying to achieve; however, I don't think the
solution you are heading towards (passing sector number to a special
crypto template) would be the best approach here. Milan is currently
trying to add authenticated encryption support to dm-crypt (see [1])
and as part of this change, a new random IV mode would be introduced.
This mode generates a random IV for each sector write, includes it in
the authenticated data and stores it in the sector's metadata (in a
separate part of the disk). In this case dm-crypt will need to have
control over the IV generation (or at least be able to somehow
retrieve it after the crypto operation).

That said, I believe a different approach would be preferable here. I
would suggest, instead of moving the IV generation to the crypto
layer, to add a new type of request to skcipher API (let's call it
'skcipher_bulk_request'), which could be used to submit several
messages at once (together in a single sg list), each with their own
IV, to a skcipher. This would allow drivers to optimize handling of
such requests (e.g. the SIMD ciphers could call kernel_fpu_begin/end
just once for the whole request). It could be done in such a way, that
implementing this type of requests would be optional and a fallback
implementation, which would just split the request into regular
skcipher_requests, would be automatically set for the ciphers that do
not set it themselves. That way this would require no changes to
crypto drivers in the beginning and optimizations could be added
incrementally.
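
To make this more concrete, the bulk request could look roughly like the
sketch below. All of these names are invented for the sake of discussion;
nothing like this exists in the tree today:

#include <crypto/skcipher.h>
#include <linux/scatterlist.h>

struct skcipher_bulk_request {
        unsigned int nmsgs;             /* number of messages in src/dst */
        unsigned int msgsize;           /* length of each message, in bytes */
        u8 *ivs;                        /* nmsgs IVs, ivsize bytes each */
        struct scatterlist *src;
        struct scatterlist *dst;
        struct skcipher_request base;   /* tfm, callback, flags as usual */
};

A cipher that does not implement the bulk variant would transparently get
the fallback described above, which just splits it into ordinary
skcipher_requests.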

The advantage of this approach to handling such "bulk" requests is
that crypto drivers could just optimize regular algorithms (xts(aes),
cbc(aes), etc.) and wouldn't need to mess with dm-crypt-specific IV
generation. This also means that other users that could potentially
benefit from bulking requests (perhaps network stack?) could use the
same functionality.

I have been playing with this idea for some time now and I should have
an RFC patchset ready soon...

Binoy, Herbert, what do you think about such approach?

[1] https://www.redhat.com/archives/dm-devel/2017-January/msg00028.html

> When using multiple keys with the original dm-crypt, the key selection is
> made based on the sector number as:
>
> key_index = sector & (key_count - 1)
>
> This restricts the usage of the same key for encrypting/decrypting a
> single bio. One way to solve this is to move the key management code from
> dm-crypt to the crypto layer. But this seems tricky when using template ciphers
> because, when multiple ciphers are instantiated from the dm layer, each
> cipher instance is set with a unique subkey (part of the bigger master
> key) and these instances themselves do not have access to each other's
> instances or contexts. This way, a single instance cannot
> encrypt/decrypt a whole bio.
> This has to be fixed.

Please note that the "keycount" parameter was added to dm-crypt solely
for the purpose of implementing the loop-AES partition format. In
general, the security benefit gained by using keycount > 1 is
debatable, so it does not really make sense to use it for anything
else than accessing legacy loopAES partitions. Since Milan decided to
add it as a generic parameter, instead of hard-coding the
functionality for the LMK mode, it can be technically used also in
other combinations, but IMHO it is perfectly reasonable to just give
up on optimizing the cases when keycount > 1. I believe the loop-AES
partition support is just not that important :)

Thanks,
Ondrej


[PATCH 2/2] crypto: mediatek - fix format string for 64-bit builds

2017-01-11 Thread Arnd Bergmann
After I enabled COMPILE_TEST for non-ARM targets, I ran into these
warnings:

crypto/mediatek/mtk-aes.c: In function 'mtk_aes_info_map':
crypto/mediatek/mtk-aes.c:224:28: error: format '%d' expects argument of type 
'int', but argument 3 has type 'long unsigned int' [-Werror=format=]
   dev_err(cryp->dev, "dma %d bytes error\n", sizeof(*info));
crypto/mediatek/mtk-sha.c:344:28: error: format '%d' expects argument of type 
'int', but argument 3 has type 'long unsigned int' [-Werror=format=]
crypto/mediatek/mtk-sha.c:550:21: error: format '%u' expects argument of type 
'unsigned int', but argument 4 has type 'size_t {aka long unsigned int}' 
[-Werror=format=]

The correct format for size_t is %zu, so use that in all three
cases.

Fixes: 785e5c616c84 ("crypto: mediatek - Add crypto driver support for some 
MediaTek chips")
Signed-off-by: Arnd Bergmann 
---
 drivers/crypto/mediatek/mtk-aes.c | 2 +-
 drivers/crypto/mediatek/mtk-sha.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/mediatek/mtk-aes.c 
b/drivers/crypto/mediatek/mtk-aes.c
index 3271471060d9..1370cabeeb5b 100644
--- a/drivers/crypto/mediatek/mtk-aes.c
+++ b/drivers/crypto/mediatek/mtk-aes.c
@@ -221,7 +221,7 @@ static int mtk_aes_info_map(struct mtk_cryp *cryp,
aes->ct_dma = dma_map_single(cryp->dev, info, sizeof(*info),
DMA_TO_DEVICE);
if (unlikely(dma_mapping_error(cryp->dev, aes->ct_dma))) {
-   dev_err(cryp->dev, "dma %d bytes error\n", sizeof(*info));
+   dev_err(cryp->dev, "dma %zu bytes error\n", sizeof(*info));
return -EINVAL;
}
aes->tfm_dma = aes->ct_dma + sizeof(*ct);
diff --git a/drivers/crypto/mediatek/mtk-sha.c 
b/drivers/crypto/mediatek/mtk-sha.c
index 89513632c8ed..98b3d74ae23d 100644
--- a/drivers/crypto/mediatek/mtk-sha.c
+++ b/drivers/crypto/mediatek/mtk-sha.c
@@ -341,7 +341,7 @@ static int mtk_sha_info_map(struct mtk_cryp *cryp,
sha->ct_dma = dma_map_single(cryp->dev, info, sizeof(*info),
  DMA_BIDIRECTIONAL);
if (unlikely(dma_mapping_error(cryp->dev, sha->ct_dma))) {
-   dev_err(cryp->dev, "dma %d bytes error\n", sizeof(*info));
+   dev_err(cryp->dev, "dma %zu bytes error\n", sizeof(*info));
return -EINVAL;
}
sha->tfm_dma = sha->ct_dma + sizeof(*ct);
@@ -547,7 +547,7 @@ static int mtk_sha_update_slow(struct mtk_cryp *cryp,
 
final = (ctx->flags & SHA_FLAGS_FINUP) && !ctx->total;
 
-   dev_dbg(cryp->dev, "slow: bufcnt: %u\n", ctx->bufcnt);
+   dev_dbg(cryp->dev, "slow: bufcnt: %zu\n", ctx->bufcnt);
 
if (final) {
sha->flags |= SHA_FLAGS_FINAL;
-- 
2.9.0



[PATCH 1/2] crypto: mediatek - remove ARM dependencies

2017-01-11 Thread Arnd Bergmann
Building the mediatek driver on an older ARM architecture results in a
harmless warning:

warning: (ARCH_OMAP2PLUS_TYPICAL && CRYPTO_DEV_MEDIATEK) selects NEON which has 
unmet direct dependencies (VFPv3 && CPU_V7)

We could add an explicit dependency on CPU_V7, but it seems nicer to
open up the build to additional configurations. This replaces the ARM
optimized algorithm selection with the normal one that all other drivers
use, and that in turn lets us relax the dependency on ARM and drop
a number of the unrelated 'select' statements.

Obviously a real user would still select those other optimized drivers
as a fallback, but as there is no strict dependency, we can leave that
up to the user.

Fixes: 785e5c616c84 ("crypto: mediatek - Add crypto driver support for some 
MediaTek chips")
Signed-off-by: Arnd Bergmann 
---
 drivers/crypto/Kconfig | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 8ded3af88b16..9d37ae07b4ce 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -555,15 +555,12 @@ config CRYPTO_DEV_ROCKCHIP
 
 config CRYPTO_DEV_MEDIATEK
tristate "MediaTek's EIP97 Cryptographic Engine driver"
-   depends on ARM && (ARCH_MEDIATEK || COMPILE_TEST)
-   select NEON
-   select KERNEL_MODE_NEON
-   select ARM_CRYPTO
+   depends on (ARM && ARCH_MEDIATEK) || COMPILE_TEST
select CRYPTO_AES
select CRYPTO_BLKCIPHER
-   select CRYPTO_SHA1_ARM_NEON
-   select CRYPTO_SHA256_ARM
-   select CRYPTO_SHA512_ARM
+   select CRYPTO_SHA1
+   select CRYPTO_SHA256
+   select CRYPTO_SHA512
select CRYPTO_HMAC
help
  This driver allows you to utilize the hardware crypto accelerator
-- 
2.9.0



Re: [PATCH v4 2/3] drivers: crypto: Add the Virtual Function driver for CPT

2017-01-11 Thread Stephan Müller
On Wednesday, 11 January 2017, 16:58:17 CET, George Cherian wrote:

Hi George,

> I will add a separate function for xts setkey and make changes as follows.
> > ...
> > 
> >> +
> >> +struct crypto_alg algs[] = { {
> >> +  .cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
> >> +  .cra_blocksize = AES_BLOCK_SIZE,
> >> +  .cra_ctxsize = sizeof(struct cvm_enc_ctx),
> >> +  .cra_alignmask = 7,
> >> +  .cra_priority = 4001,
> >> +  .cra_name = "xts(aes)",
> >> +  .cra_driver_name = "cavium-xts-aes",
> >> +  .cra_type = &crypto_ablkcipher_type,
> >> +  .cra_u = {
> >> +  .ablkcipher = {
> >> +  .ivsize = AES_BLOCK_SIZE,
> >> +  .min_keysize = AES_MIN_KEY_SIZE,
> >> +  .max_keysize = AES_MAX_KEY_SIZE,
> >> +  .setkey = cvm_enc_dec_setkey,
> > 
> > May I ask how the setkey for XTS is intended to work? XTS keys are
> > double the size of "normal" keys.
> 
>   .ablkcipher = {
>   .ivsize = AES_BLOCK_SIZE,
>   .min_keysize = 2 * AES_MIN_KEY_SIZE,
>   .max_keysize = 2 * AES_MAX_KEY_SIZE,
>   .setkey = cvm_xts_setkey,
> 
> Hope this is fine?
> 
Sure, please do not forget to invoke xts_verify_key.
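
For illustration only, a rough sketch of what such a cvm_xts_setkey() could
look like, reusing the cvm_enc_ctx fields quoted above and assuming
ctx->enc_key has room for the 64-byte maximum XTS key; xts_check_key() is the
crypto_tfm-based flavour of the check (xts_verify_key() is its skcipher-based
counterpart):

#include <crypto/xts.h>

static int cvm_xts_setkey(struct crypto_ablkcipher *cipher, const u8 *key,
                          u32 keylen)
{
        struct crypto_tfm *tfm = crypto_ablkcipher_tfm(cipher);
        struct cvm_enc_ctx *ctx = crypto_tfm_ctx(tfm);
        int err;

        /* rejects odd-length keys and, in FIPS mode, identical key halves */
        err = xts_check_key(tfm, key, keylen);
        if (err)
                return err;

        /*
         * An XTS key is two AES keys back to back; the ablkcipher core has
         * already checked keylen against min_keysize/max_keysize (32..64).
         */
        ctx->key_len = keylen;
        memcpy(ctx->enc_key, key, keylen);
        return 0;
}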

Ciao
Stephan


Re: [PATCH v2] crypto: x86/chacha20 - Manually align stack buffer

2017-01-11 Thread Ard Biesheuvel
On 11 January 2017 at 12:28, Herbert Xu  wrote:
> On Wed, Jan 11, 2017 at 12:14:24PM +, Ard Biesheuvel wrote:
>>
>> I think the old code was fine, actually:
>>
>> u32 *state, state_buf[16 + (CHACHA20_STATE_ALIGN / sizeof(u32)) - 1];
>>
>> ends up allocating 16 + 3 *words* == 64 + 12 bytes , which given the
>> guaranteed 4 byte alignment is sufficient for ensuring the pointer can
>> be 16 byte aligned.
>
> Ah yes you're right, it's a u32.
>
>> So [16 + 2] should be sufficient here
>
> Here's an updated version.
>
> ---8<---
> The kernel on x86-64 cannot use gcc attribute align to align to
> a 16-byte boundary.  This patch reverts to the old way of aligning
> it by hand.
>
> Fixes: 9ae433bc79f9 ("crypto: chacha20 - convert generic and...")
> Signed-off-by: Herbert Xu 
>
> diff --git a/arch/x86/crypto/chacha20_glue.c b/arch/x86/crypto/chacha20_glue.c
> index 78f75b0..1e6af1b 100644
> --- a/arch/x86/crypto/chacha20_glue.c
> +++ b/arch/x86/crypto/chacha20_glue.c
> @@ -67,10 +67,13 @@ static int chacha20_simd(struct skcipher_request *req)
>  {
> struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
> -   u32 state[16] __aligned(CHACHA20_STATE_ALIGN);
> +   u32 *state, state_buf[16 + 2] __aligned(8);
> struct skcipher_walk walk;
> int err;
>
> +   BUILD_BUG_ON(CHACHA20_STATE_ALIGN != 16);
> +   state = PTR_ALIGN(state_buf + 0, CHACHA20_STATE_ALIGN);
> +
> if (req->cryptlen <= CHACHA20_BLOCK_SIZE || !may_use_simd())
> return crypto_chacha20_crypt(req);
>

Reviewed-by: Ard Biesheuvel 


[PATCH v2] crypto: x86/chacha20 - Manually align stack buffer

2017-01-11 Thread Herbert Xu
On Wed, Jan 11, 2017 at 12:14:24PM +, Ard Biesheuvel wrote:
> 
> I think the old code was fine, actually:
> 
> u32 *state, state_buf[16 + (CHACHA20_STATE_ALIGN / sizeof(u32)) - 1];
> 
> ends up allocating 16 + 3 *words* == 64 + 12 bytes , which given the
> guaranteed 4 byte alignment is sufficient for ensuring the pointer can
> be 16 byte aligned.

Ah yes you're right, it's a u32.

> So [16 + 2] should be sufficient here

Here's an updated version.

---8<---
The kernel on x86-64 cannot use gcc attribute align to align to
a 16-byte boundary.  This patch reverts to the old way of aligning
it by hand.

Fixes: 9ae433bc79f9 ("crypto: chacha20 - convert generic and...")
Signed-off-by: Herbert Xu 

diff --git a/arch/x86/crypto/chacha20_glue.c b/arch/x86/crypto/chacha20_glue.c
index 78f75b0..1e6af1b 100644
--- a/arch/x86/crypto/chacha20_glue.c
+++ b/arch/x86/crypto/chacha20_glue.c
@@ -67,10 +67,13 @@ static int chacha20_simd(struct skcipher_request *req)
 {
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
-   u32 state[16] __aligned(CHACHA20_STATE_ALIGN);
+   u32 *state, state_buf[16 + 2] __aligned(8);
struct skcipher_walk walk;
int err;
 
+   BUILD_BUG_ON(CHACHA20_STATE_ALIGN != 16);
+   state = PTR_ALIGN(state_buf + 0, CHACHA20_STATE_ALIGN);
+
if (req->cryptlen <= CHACHA20_BLOCK_SIZE || !may_use_simd())
return crypto_chacha20_crypt(req);
 
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: crypto: x86/chacha20 - Manually align stack buffer

2017-01-11 Thread Ard Biesheuvel
On 11 January 2017 at 12:08, Herbert Xu  wrote:
> The kernel on x86-64 cannot use gcc attribute align to align to
> a 16-byte boundary.  This patch reverts to the old way of aligning
> it by hand.
>
> Incidentally the old way was actually broken in not allocating
> enough space and would silently corrupt the stack.  This patch
> fixes it by allocating an extra 8 bytes.
>

I think the old code was fine, actually:

u32 *state, state_buf[16 + (CHACHA20_STATE_ALIGN / sizeof(u32)) - 1];

ends up allocating 16 + 3 *words* == 64 + 12 bytes, which, given the
guaranteed 4-byte alignment, is sufficient to ensure the pointer can
be 16-byte aligned.

So [16 + 2] should be sufficient here
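
To make the arithmetic concrete, a small user-space sketch (PTR_ALIGN is
re-implemented locally here, so this is not the kernel macro itself): a
4-byte-aligned buffer start can be at most 12 bytes below the next 16-byte
boundary, so 3 spare words always leave room for 16 aligned state words.
Note the '+ 0', which decays the array to a pointer so typeof() inside
PTR_ALIGN yields a pointer type:

#include <stdint.h>
#include <stdio.h>

#define CHACHA20_STATE_ALIGN 16

/* local stand-in for the kernel's PTR_ALIGN() */
#define PTR_ALIGN(p, a) \
        ((typeof(p))(((uintptr_t)(p) + ((a) - 1)) & ~(uintptr_t)((a) - 1)))

int main(void)
{
        /* 16 words of state + (16 / 4 - 1) = 3 words of alignment slack */
        uint32_t state_buf[16 + (CHACHA20_STATE_ALIGN / sizeof(uint32_t)) - 1];
        uint32_t *state = PTR_ALIGN(state_buf + 0, CHACHA20_STATE_ALIGN);
        size_t skipped = (char *)state - (char *)state_buf;

        /* skipped is 0, 4, 8 or 12 bytes; 16 aligned words always fit */
        printf("skipped %zu bytes, %zu bytes left\n",
               skipped, sizeof(state_buf) - skipped);
        return 0;
}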

> Fixes: 9ae433bc79f9 ("crypto: chacha20 - convert generic and...")
> Signed-off-by: Herbert Xu 
>
> diff --git a/arch/x86/crypto/chacha20_glue.c b/arch/x86/crypto/chacha20_glue.c
> index 78f75b0..054306d 100644
> --- a/arch/x86/crypto/chacha20_glue.c
> +++ b/arch/x86/crypto/chacha20_glue.c
> @@ -67,10 +67,13 @@ static int chacha20_simd(struct skcipher_request *req)
>  {
> struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
> -   u32 state[16] __aligned(CHACHA20_STATE_ALIGN);
> +   u32 *state, state_buf[16 + 8] __aligned(8);
> struct skcipher_walk walk;
> int err;
>
> +   BUILD_BUG_ON(CHACHA20_STATE_ALIGN != 16);
> +   state = PTR_ALIGN(state_buf + 0, CHACHA20_STATE_ALIGN);
> +
> if (req->cryptlen <= CHACHA20_BLOCK_SIZE || !may_use_simd())
> return crypto_chacha20_crypt(req);
>
> --
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


crypto: x86/chacha20 - Manually align stack buffer

2017-01-11 Thread Herbert Xu
The kernel on x86-64 cannot use gcc attribute align to align to
a 16-byte boundary.  This patch reverts to the old way of aligning
it by hand.

Incidentally the old way was actually broken in not allocating
enough space and would silently corrupt the stack.  This patch
fixes it by allocating an extra 8 bytes.

Fixes: 9ae433bc79f9 ("crypto: chacha20 - convert generic and...")
Signed-off-by: Herbert Xu 

diff --git a/arch/x86/crypto/chacha20_glue.c b/arch/x86/crypto/chacha20_glue.c
index 78f75b0..054306d 100644
--- a/arch/x86/crypto/chacha20_glue.c
+++ b/arch/x86/crypto/chacha20_glue.c
@@ -67,10 +67,13 @@ static int chacha20_simd(struct skcipher_request *req)
 {
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
-   u32 state[16] __aligned(CHACHA20_STATE_ALIGN);
+   u32 *state, state_buf[16 + 8] __aligned(8);
struct skcipher_walk walk;
int err;
 
+   BUILD_BUG_ON(CHACHA20_STATE_ALIGN != 16);
+   state = PTR_ALIGN(state_buf + 0, CHACHA20_STATE_ALIGN);
+
if (req->cryptlen <= CHACHA20_BLOCK_SIZE || !may_use_simd())
return crypto_chacha20_crypt(req);
 
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Crypto Fixes for 4.10

2017-01-11 Thread Herbert Xu
Hi Linus:

This push fixes a regression in aesni that renders it useless
if it's built-in with a modular pcbc configuration.


Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git linus


Herbert Xu (1):
  crypto: aesni - Fix failure when built-in with modular pcbc

 arch/x86/crypto/aesni-intel_glue.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
 
Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v4 2/3] drivers: crypto: Add the Virtual Function driver for CPT

2017-01-11 Thread George Cherian

Hi Stephan,

Thanks for pointing it out!!


On 01/11/2017 04:42 PM, Stephan Müller wrote:

On Wednesday, 11 January 2017, 10:56:50 CET, George Cherian wrote:

Hi George,


+int cvm_enc_dec_setkey(struct crypto_ablkcipher *cipher, const u8 *key,
+  u32 keylen)
+{
+   struct crypto_tfm *tfm = crypto_ablkcipher_tfm(cipher);
+   struct cvm_enc_ctx *ctx = crypto_tfm_ctx(tfm);
+
+   if ((keylen == 16) || (keylen == 24) || (keylen == 32)) {
+   ctx->key_len = keylen;
+   memcpy(ctx->enc_key, key, keylen);
+   return 0;
+   }
+   crypto_ablkcipher_set_flags(cipher, CRYPTO_TFM_RES_BAD_KEY_LEN);
+
+   return -EINVAL;
+}



I will add a separate function for xts setkey and make changes as follows.

...

+
+struct crypto_alg algs[] = { {
+   .cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
+   .cra_blocksize = AES_BLOCK_SIZE,
+   .cra_ctxsize = sizeof(struct cvm_enc_ctx),
+   .cra_alignmask = 7,
+   .cra_priority = 4001,
+   .cra_name = "xts(aes)",
+   .cra_driver_name = "cavium-xts-aes",
+   .cra_type = &crypto_ablkcipher_type,
+   .cra_u = {
+   .ablkcipher = {
+   .ivsize = AES_BLOCK_SIZE,
+   .min_keysize = AES_MIN_KEY_SIZE,
+   .max_keysize = AES_MAX_KEY_SIZE,
+   .setkey = cvm_enc_dec_setkey,


May I ask how the setkey for XTS is intended to work? XTS keys are double
the size of "normal" keys.

.ablkcipher = {
.ivsize = AES_BLOCK_SIZE,
.min_keysize = 2 * AES_MIN_KEY_SIZE,
.max_keysize = 2 * AES_MAX_KEY_SIZE,
.setkey = cvm_xts_setkey,

Hope this is fine?




+   .encrypt = cvm_aes_encrypt_xts,
+   .decrypt = cvm_aes_decrypt_xts,
+   },



Ciao
Stephan



Regards,
-George


Re: [PATCH v4 2/3] drivers: crypto: Add the Virtual Function driver for CPT

2017-01-11 Thread Stephan Müller
On Wednesday, 11 January 2017, 10:56:50 CET, George Cherian wrote:

Hi George,

> +int cvm_enc_dec_setkey(struct crypto_ablkcipher *cipher, const u8 *key,
> +u32 keylen)
> +{
> + struct crypto_tfm *tfm = crypto_ablkcipher_tfm(cipher);
> + struct cvm_enc_ctx *ctx = crypto_tfm_ctx(tfm);
> +
> + if ((keylen == 16) || (keylen == 24) || (keylen == 32)) {
> + ctx->key_len = keylen;
> + memcpy(ctx->enc_key, key, keylen);
> + return 0;
> + }
> + crypto_ablkcipher_set_flags(cipher, CRYPTO_TFM_RES_BAD_KEY_LEN);
> +
> + return -EINVAL;
> +}

...
> +
> +struct crypto_alg algs[] = { {
> + .cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
> + .cra_blocksize = AES_BLOCK_SIZE,
> + .cra_ctxsize = sizeof(struct cvm_enc_ctx),
> + .cra_alignmask = 7,
> + .cra_priority = 4001,
> + .cra_name = "xts(aes)",
> + .cra_driver_name = "cavium-xts-aes",
> + .cra_type = &crypto_ablkcipher_type,
> + .cra_u = {
> + .ablkcipher = {
> + .ivsize = AES_BLOCK_SIZE,
> + .min_keysize = AES_MIN_KEY_SIZE,
> + .max_keysize = AES_MAX_KEY_SIZE,
> + .setkey = cvm_enc_dec_setkey,

May I ask how the setkey for XTS is intended to work? XTS keys are double
the size of "normal" keys.

> + .encrypt = cvm_aes_encrypt_xts,
> + .decrypt = cvm_aes_decrypt_xts,
> + },


Ciao
Stephan


[PATCH v4 1/3] drivers: crypto: Add Support for Octeon-tx CPT Engine

2017-01-11 Thread George Cherian
Enable the Physical Function driver for the Cavium Crypto Engine (CPT)
found in the Octeon-tx series of SoCs. CPT is the Cryptographic Acceleration
Unit. CPT includes microcoded GigaCypher symmetric engines (SEs) and
asymmetric engines (AEs).

Signed-off-by: George Cherian 
Reviewed-by: David Daney 
---
 drivers/crypto/cavium/cpt/Kconfig|  16 +
 drivers/crypto/cavium/cpt/Makefile   |   2 +
 drivers/crypto/cavium/cpt/cpt_common.h   | 158 +++
 drivers/crypto/cavium/cpt/cpt_hw_types.h | 658 
 drivers/crypto/cavium/cpt/cptpf.h|  69 +++
 drivers/crypto/cavium/cpt/cptpf_main.c   | 708 +++
 drivers/crypto/cavium/cpt/cptpf_mbox.c   | 163 +++
 7 files changed, 1774 insertions(+)
 create mode 100644 drivers/crypto/cavium/cpt/Kconfig
 create mode 100644 drivers/crypto/cavium/cpt/Makefile
 create mode 100644 drivers/crypto/cavium/cpt/cpt_common.h
 create mode 100644 drivers/crypto/cavium/cpt/cpt_hw_types.h
 create mode 100644 drivers/crypto/cavium/cpt/cptpf.h
 create mode 100644 drivers/crypto/cavium/cpt/cptpf_main.c
 create mode 100644 drivers/crypto/cavium/cpt/cptpf_mbox.c

diff --git a/drivers/crypto/cavium/cpt/Kconfig 
b/drivers/crypto/cavium/cpt/Kconfig
new file mode 100644
index 000..1f6ace3
--- /dev/null
+++ b/drivers/crypto/cavium/cpt/Kconfig
@@ -0,0 +1,16 @@
+#
+# Cavium crypto device configuration
+#
+
+config CRYPTO_DEV_CPT
+   tristate
+
+config CAVIUM_CPT
+   tristate "Cavium Cryptographic Accelerator driver"
+   depends on ARCH_THUNDER || COMPILE_TEST
+   select CRYPTO_DEV_CPT
+   help
+ Support for Cavium CPT block found in octeon-tx series of
+ processors.
+
+ To compile this as a module, choose M here.
diff --git a/drivers/crypto/cavium/cpt/Makefile 
b/drivers/crypto/cavium/cpt/Makefile
new file mode 100644
index 000..fe3d454
--- /dev/null
+++ b/drivers/crypto/cavium/cpt/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_CAVIUM_CPT) += cptpf.o
+cptpf-objs := cptpf_main.o cptpf_mbox.o
diff --git a/drivers/crypto/cavium/cpt/cpt_common.h 
b/drivers/crypto/cavium/cpt/cpt_common.h
new file mode 100644
index 000..ede612f
--- /dev/null
+++ b/drivers/crypto/cavium/cpt/cpt_common.h
@@ -0,0 +1,158 @@
+/*
+ * Copyright (C) 2016 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ */
+
+#ifndef __CPT_COMMON_H
+#define __CPT_COMMON_H
+
+#include 
+#include 
+#include 
+
+#include "cpt_hw_types.h"
+
+/* Device ID */
+#define CPT_81XX_PCI_PF_DEVICE_ID 0xa040
+#define CPT_81XX_PCI_VF_DEVICE_ID 0xa041
+
+/* flags to indicate the features supported */
+#define CPT_FLAG_MSIX_ENABLED BIT(0)
+#define CPT_FLAG_SRIOV_ENABLED BIT(1)
+#define CPT_FLAG_VF_DRIVER BIT(2)
+#define CPT_FLAG_DEVICE_READY BIT(3)
+
+#define cpt_msix_enabled(cpt) ((cpt)->flags & CPT_FLAG_MSIX_ENABLED)
+#define cpt_sriov_enabled(cpt) ((cpt)->flags & CPT_FLAG_SRIOV_ENABLED)
+#define cpt_vf_driver(cpt) ((cpt)->flags & CPT_FLAG_VF_DRIVER)
+#define cpt_device_ready(cpt) ((cpt)->flags & CPT_FLAG_DEVICE_READY)
+
+#define CPT_MBOX_MSG_TYPE_ACK 1
+#define CPT_MBOX_MSG_TYPE_NACK 2
+#define CPT_MBOX_MSG_TIMEOUT 2000
+#define VF_STATE_DOWN 0
+#define VF_STATE_UP 1
+
+/*
+ * CPT Registers map for 81xx
+ */
+
+/* PF registers */
+#define CPTX_PF_CONSTANTS(a) (0x0ll + ((u64)(a) << 36))
+#define CPTX_PF_RESET(a) (0x100ll + ((u64)(a) << 36))
+#define CPTX_PF_DIAG(a) (0x120ll + ((u64)(a) << 36))
+#define CPTX_PF_BIST_STATUS(a) (0x160ll + ((u64)(a) << 36))
+#define CPTX_PF_ECC0_CTL(a) (0x200ll + ((u64)(a) << 36))
+#define CPTX_PF_ECC0_FLIP(a) (0x210ll + ((u64)(a) << 36))
+#define CPTX_PF_ECC0_INT(a) (0x220ll + ((u64)(a) << 36))
+#define CPTX_PF_ECC0_INT_W1S(a) (0x230ll + ((u64)(a) << 36))
+#define CPTX_PF_ECC0_ENA_W1S(a) (0x240ll + ((u64)(a) << 36))
+#define CPTX_PF_ECC0_ENA_W1C(a) (0x250ll + ((u64)(a) << 36))
+#define CPTX_PF_MBOX_INTX(a, b)\
+   (0x400ll + ((u64)(a) << 36) + ((b) << 3))
+#define CPTX_PF_MBOX_INT_W1SX(a, b) \
+   (0x420ll + ((u64)(a) << 36) + ((b) << 3))
+#define CPTX_PF_MBOX_ENA_W1CX(a, b) \
+   (0x440ll + ((u64)(a) << 36) + ((b) << 3))
+#define CPTX_PF_MBOX_ENA_W1SX(a, b) \
+   (0x460ll + ((u64)(a) << 36) + ((b) << 3))
+#define CPTX_PF_EXEC_INT(a) (0x500ll + 0x10ll * ((a) & 0x1))
+#define CPTX_PF_EXEC_INT_W1S(a) (0x520ll + ((u64)(a) << 36))
+#define CPTX_PF_EXEC_ENA_W1C(a) (0x540ll + ((u64)(a) << 36))
+#define CPTX_PF_EXEC_ENA_W1S(a) (0x560ll + ((u64)(a) << 36))
+#define CPTX_PF_GX_EN(a, b) \
+   (0x600ll + ((u64)(a) << 36) + ((b) << 3))
+#define CPTX_PF_EXEC_INFO(a) (0x700ll + ((u64)(a) << 36))
+#define CPTX_PF_EXEC_BUSY(a) (0x800ll + ((u64)(a) << 36))
+#define CPTX_PF_EXEC_INFO0(a) (0x900ll + ((u64)(a) << 36))
+#define CPTX_PF_EXEC_INFO1(a) (0x910ll + ((u64)(a) << 36))
+#define CPTX_PF_INST_REQ_PC(a)

[PATCH v4 2/3] drivers: crypto: Add the Virtual Function driver for CPT

2017-01-11 Thread George Cherian
Enable the CPT VF driver. CPT is the Cryptographic Acceleration Unit
in the Octeon-tx series of processors.

Signed-off-by: George Cherian 
Reviewed-by: David Daney 
---
 drivers/crypto/cavium/cpt/Makefile   |   3 +-
 drivers/crypto/cavium/cpt/cptvf.h| 135 
 drivers/crypto/cavium/cpt/cptvf_algs.c   | 413 
 drivers/crypto/cavium/cpt/cptvf_algs.h   | 112 
 drivers/crypto/cavium/cpt/cptvf_main.c   | 948 +++
 drivers/crypto/cavium/cpt/cptvf_mbox.c   | 211 ++
 drivers/crypto/cavium/cpt/cptvf_reqmanager.c | 591 +
 drivers/crypto/cavium/cpt/request_manager.h  | 147 +
 8 files changed, 2559 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/cavium/cpt/cptvf.h
 create mode 100644 drivers/crypto/cavium/cpt/cptvf_algs.c
 create mode 100644 drivers/crypto/cavium/cpt/cptvf_algs.h
 create mode 100644 drivers/crypto/cavium/cpt/cptvf_main.c
 create mode 100644 drivers/crypto/cavium/cpt/cptvf_mbox.c
 create mode 100644 drivers/crypto/cavium/cpt/cptvf_reqmanager.c
 create mode 100644 drivers/crypto/cavium/cpt/request_manager.h

diff --git a/drivers/crypto/cavium/cpt/Makefile 
b/drivers/crypto/cavium/cpt/Makefile
index fe3d454..dbf055e 100644
--- a/drivers/crypto/cavium/cpt/Makefile
+++ b/drivers/crypto/cavium/cpt/Makefile
@@ -1,2 +1,3 @@
-obj-$(CONFIG_CAVIUM_CPT) += cptpf.o
+obj-$(CONFIG_CAVIUM_CPT) += cptpf.o cptvf.o
 cptpf-objs := cptpf_main.o cptpf_mbox.o
+cptvf-objs := cptvf_main.o cptvf_reqmanager.o cptvf_mbox.o cptvf_algs.o
diff --git a/drivers/crypto/cavium/cpt/cptvf.h 
b/drivers/crypto/cavium/cpt/cptvf.h
new file mode 100644
index 000..1cc04aa
--- /dev/null
+++ b/drivers/crypto/cavium/cpt/cptvf.h
@@ -0,0 +1,135 @@
+/*
+ * Copyright (C) 2016 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ */
+
+#ifndef __CPTVF_H
+#define __CPTVF_H
+
+#include 
+#include "cpt_common.h"
+
+/* Default command queue length */
+#define CPT_CMD_QLEN 2046
+#define CPT_CMD_QCHUNK_SIZE 1023
+
+/* Default command timeout in seconds */
+#define CPT_COMMAND_TIMEOUT 4
+#define CPT_TIMER_THOLD0x
+#define CPT_NUM_QS_PER_VF 1
+#define CPT_INST_SIZE 64
+#define CPT_NEXT_CHUNK_PTR_SIZE 8
+
+#define CPT_VF_MSIX_VECTORS 2
+#define CPT_VF_INTR_MBOX_MASK BIT(0)
+#define CPT_VF_INTR_DOVF_MASK BIT(1)
+#define CPT_VF_INTR_IRDE_MASK BIT(2)
+#define CPT_VF_INTR_NWRP_MASK BIT(3)
+#define CPT_VF_INTR_SERR_MASK BIT(4)
+#define DMA_DIRECT_DIRECT 0 /* Input DIRECT, Output DIRECT */
+#define DMA_GATHER_SCATTER 1
+#define FROM_DPTR 1
+
+/**
+ * Enumeration cpt_vf_int_vec_e
+ *
+ * CPT VF MSI-X Vector Enumeration
+ * Enumerates the MSI-X interrupt vectors.
+ */
+enum cpt_vf_int_vec_e {
+   CPT_VF_INT_VEC_E_MISC = 0x00,
+   CPT_VF_INT_VEC_E_DONE = 0x01
+};
+
+struct command_chunk {
+   u8 *head;
+   dma_addr_t dma_addr;
+   u32 size; /* Chunk size, max CPT_INST_CHUNK_MAX_SIZE */
+   struct hlist_node nextchunk;
+};
+
+struct command_queue {
+   spinlock_t lock; /* command queue lock */
+   u32 idx; /* Command queue host write idx */
+   u32 nchunks; /* Number of command chunks */
+   struct command_chunk *qhead;/* Command queue head, instructions
+* are inserted here
+*/
+   struct hlist_head chead;
+};
+
+struct command_qinfo {
+   u32 cmd_size;
+   u32 qchunksize; /* Command queue chunk size */
+   struct command_queue queue[CPT_NUM_QS_PER_VF];
+};
+
+struct pending_entry {
+   u8 busy; /* Entry status (free/busy) */
+
+   volatile u64 *completion_addr; /* Completion address */
+   void *post_arg;
+   void (*callback)(int, void *); /* Kernel ASYNC request callback */
+   void *callback_arg; /* Kernel ASYNC request callback arg */
+};
+
+struct pending_queue {
+   struct pending_entry *head; /* head of the queue */
+   u32 front; /* Process work from here */
+   u32 rear; /* Append new work here */
+   atomic64_t pending_count;
+   spinlock_t lock; /* Queue lock */
+};
+
+struct pending_qinfo {
+   u32 nr_queues;  /* Number of queues supported */
+   u32 qlen; /* Queue length */
+   struct pending_queue queue[CPT_NUM_QS_PER_VF];
+};
+
+#define for_each_pending_queue(qinfo, q, i)\
+   for (i = 0, q = &qinfo->queue[i]; i < qinfo->nr_queues; i++, \
+q = &qinfo->queue[i])
+
+struct cpt_vf {
+   u16 flags; /* Flags to hold device status bits */
+   u8 vfid; /* Device Index 0...CPT_MAX_VF_NUM */
+   u8 vftype; /* VF type of SE_TYPE(1) or AE_TYPE(1) */
+   u8 vfgrp; /* VF group (0 - 8) */
+   u8 node; /* Operating node: Bits (46:44) in BAR0 address */
+   u8 priority; /* VF priority ring: 1-High priority round
+ * robin ring;

[PATCH v4 3/3] drivers: crypto: Enable CPT options crypto for build

2017-01-11 Thread George Cherian
Add the CPT options in crypto Kconfig and update the
crypto Makefile

Signed-off-by: George Cherian 
Reviewed-by: David Daney 
---
 drivers/crypto/Kconfig  | 1 +
 drivers/crypto/Makefile | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 4d2b81f..15f9040 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -484,6 +484,7 @@ config CRYPTO_DEV_MXS_DCP
  will be called mxs-dcp.
 
 source "drivers/crypto/qat/Kconfig"
+source "drivers/crypto/cavium/cpt/Kconfig"
 
 config CRYPTO_DEV_QCE
tristate "Qualcomm crypto engine accelerator"
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index ad7250f..dd33290 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -32,3 +32,4 @@ obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/
 obj-$(CONFIG_CRYPTO_DEV_SUN4I_SS) += sunxi-ss/
 obj-$(CONFIG_CRYPTO_DEV_ROCKCHIP) += rockchip/
 obj-$(CONFIG_CRYPTO_DEV_CHELSIO) += chelsio/
+obj-$(CONFIG_CRYPTO_DEV_CPT) += cavium/cpt/
-- 
2.1.4



[PATCH v4 0/3] Add Support for Cavium Cryptographic Acceleration Unit

2017-01-11 Thread George Cherian
This series adds support for the Cavium Cryptographic Acceleration Unit (CPT).
CPT is available in Cavium's Octeon-Tx SoC series.

The series was tested with ecryptfs and dm-crypt for in-kernel cryptographic
offload operations. This driver needs firmware to work; I will send the
firmware to linux-firmware once the driver is accepted.

Changes v3 -> v4
-- Addressed Corentin Labbe's comments
- Convert all pr_x to dev_x.
- Fix typos.
- Fix the double unlock.
- Use sg_virt.
Changes v2 -> v3
-- Addressed David Daney's comments
- There is not much difference in performance between readq/writeq and
  readq_relaxed/writeq_relaxed, so switch to the readq/writeq variants.
- Removed the useless bitfield definitions.
- Use GENMASK and dev_to_node() instead of custom functions.
- Use module_pci_driver instead of module_init/exit.
Changes v1 -> v2
-- Addressed a crash issue when more gather components are passed.
-- Redo the cptvf request manager.
- Get rid of the unnecessary buffer copies.
-- s/uint*_t/u*
-- Remove unwanted macro definitions.
-- Remove the redundant ROUNDUP* macros and use the kernel function.
-- Select the proper config option in the Kconfig file.
-- Removed some of the unwanted header file inclusions.
-- Miscellaneous cleanup.

George Cherian (3):
  drivers: crypto: Add Support for Octeon-tx CPT Engine
  drivers: crypto: Add the Virtual Function driver for CPT
  drivers: crypto: Enable CPT options crypto for build

 drivers/crypto/Kconfig   |   1 +
 drivers/crypto/Makefile  |   1 +
 drivers/crypto/cavium/cpt/Kconfig|  16 +
 drivers/crypto/cavium/cpt/Makefile   |   3 +
 drivers/crypto/cavium/cpt/cpt_common.h   | 158 +
 drivers/crypto/cavium/cpt/cpt_hw_types.h | 658 +++
 drivers/crypto/cavium/cpt/cptpf.h|  69 ++
 drivers/crypto/cavium/cpt/cptpf_main.c   | 708 
 drivers/crypto/cavium/cpt/cptpf_mbox.c   | 163 +
 drivers/crypto/cavium/cpt/cptvf.h| 135 
 drivers/crypto/cavium/cpt/cptvf_algs.c   | 415 
 drivers/crypto/cavium/cpt/cptvf_algs.h   | 110 
 drivers/crypto/cavium/cpt/cptvf_main.c   | 945 +++
 drivers/crypto/cavium/cpt/cptvf_mbox.c   | 205 ++
 drivers/crypto/cavium/cpt/cptvf_reqmanager.c | 586 +
 drivers/crypto/cavium/cpt/request_manager.h  | 147 +
 16 files changed, 4320 insertions(+)
 create mode 100644 drivers/crypto/cavium/cpt/Kconfig
 create mode 100644 drivers/crypto/cavium/cpt/Makefile
 create mode 100644 drivers/crypto/cavium/cpt/cpt_common.h
 create mode 100644 drivers/crypto/cavium/cpt/cpt_hw_types.h
 create mode 100644 drivers/crypto/cavium/cpt/cptpf.h
 create mode 100644 drivers/crypto/cavium/cpt/cptpf_main.c
 create mode 100644 drivers/crypto/cavium/cpt/cptpf_mbox.c
 create mode 100644 drivers/crypto/cavium/cpt/cptvf.h
 create mode 100644 drivers/crypto/cavium/cpt/cptvf_algs.c
 create mode 100644 drivers/crypto/cavium/cpt/cptvf_algs.h
 create mode 100644 drivers/crypto/cavium/cpt/cptvf_main.c
 create mode 100644 drivers/crypto/cavium/cpt/cptvf_mbox.c
 create mode 100644 drivers/crypto/cavium/cpt/cptvf_reqmanager.c
 create mode 100644 drivers/crypto/cavium/cpt/request_manager.h

-- 
2.1.4



Re: [PATCH v2 7/8] net: Rename TCA*BPF_DIGEST to ..._SHA256

2017-01-11 Thread Daniel Borkmann

Hi Andy,

On 01/11/2017 04:11 AM, Andy Lutomirski wrote:

On Tue, Jan 10, 2017 at 4:50 PM, Daniel Borkmann  wrote:

On 01/11/2017 12:24 AM, Andy Lutomirski wrote:


This makes it easier to add another digest algorithm down the road if
needed.  It also serves to force any programs that might have been
written against a kernel that had the old field name to notice the
change and make any necessary changes.

This shouldn't violate any stable API policies, as no released kernel
has ever had TCA*BPF_DIGEST.


Imho, this and patch 6/8 are not really needed. Should another
digest alg ever be used (I doubt it), you'd need a new nl
attribute and fdinfo line anyway to keep existing stuff intact.
Nobody made the claim that you can just change this underneath
without respecting the ABI for existing applications, yet I read
from the above that such apps will now get "forced" to notice a change.


Fair enough.  I was more concerned about prerelease iproute2 versions,
but maybe that's a nonissue.  I'll drop these two patches.


Ok. After sleeping on this a bit, how about a general rename to
"prog_tag" for fdinfo, and TCA_BPF_TAG resp. TCA_ACT_BPF_TAG for
the netlink attributes? Fwiw, it might reduce any assumptions
being made about this. If that would be preferable, I could cook
up a patch against -net for the rename.

Thanks,
Daniel


Re: x86-64: Maintain 16-byte stack alignment

2017-01-11 Thread Herbert Xu
On Wed, Jan 11, 2017 at 08:06:54AM +, Ard Biesheuvel wrote:
>
> Couldn't we update the __aligned(x) macro to emit 32 if arch == x86
> and x == 16? All other cases should work just fine afaict

Not everyone uses that macro.  You'd also need to add some checks
to stop people from using the gcc __attribute__ directly.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: x86-64: Maintain 16-byte stack alignment

2017-01-11 Thread Ard Biesheuvel
On 11 January 2017 at 06:53, Linus Torvalds
 wrote:
>
>
> On Jan 10, 2017 8:36 PM, "Herbert Xu"  wrote:
>
>
> Sure we can ban the use of attribute aligned on stacks.  But
> what about indirect uses through structures?
>
>
> It should be pretty trivial to add a sparse warning for that, though.
>

Couldn't we update the __aligned(x) macro to emit 32 if arch == x86
and x == 16? All other cases should work just fine afaict
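
Purely as an illustration of that idea (a hypothetical sketch, never merged
and not code from any kernel release), the override could look roughly like
this:

/*
 * Sketch only: on x86, where gcc assumes a 16-byte-aligned incoming stack
 * that the kernel does not guarantee, bump any 16-byte alignment request
 * to 32 so the compiler emits explicit stack realignment.
 */
#ifdef CONFIG_X86
#define __aligned(x)    __attribute__((__aligned__((x) == 16 ? 32 : (x))))
#else
#define __aligned(x)    __attribute__((__aligned__(x)))
#endif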