[PATCH] crypto: testmgr - don't DMA map IV from stack in test_skcipher()
Fix the "DMA-API: device driver maps memory from stack" warning generated
when crypto accelerators map the IV.

Signed-off-by: Horia Geantă
---
 crypto/testmgr.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 500a5277cc22..64245aeef634 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -1081,12 +1081,16 @@ static int __test_skcipher(struct crypto_skcipher *tfm, int enc,
 	const char *e, *d;
 	struct tcrypt_result result;
 	void *data;
-	char iv[MAX_IVLEN];
+	char *iv;
 	char *xbuf[XBUFSIZE];
 	char *xoutbuf[XBUFSIZE];
 	int ret = -ENOMEM;
 	unsigned int ivsize = crypto_skcipher_ivsize(tfm);
 
+	iv = kmalloc(MAX_IVLEN, GFP_KERNEL);
+	if (!iv)
+		return ret;
+
 	if (testmgr_alloc_buf(xbuf))
 		goto out_nobuf;
 
@@ -1328,6 +1332,7 @@ static int __test_skcipher(struct crypto_skcipher *tfm, int enc,
 out_nooutbuf:
 	testmgr_free_buf(xbuf);
 out_nobuf:
+	kfree(iv);
 	return ret;
 }
--
2.4.4
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
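A userspace sketch of the pattern the patch switches to: the IV has to live on the heap rather than the stack, because DMA engines cannot safely map stack memory. The `MAX_IVLEN` value and the `alloc_iv` helper name below are illustrative, not the kernel's.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative value; the kernel defines MAX_IVLEN in crypto/testmgr.c. */
#define MAX_IVLEN 32

/* Heap-allocate and zero the IV buffer, mirroring the kmalloc() change
 * in the patch (userspace analogue of kmalloc(MAX_IVLEN, GFP_KERNEL)). */
unsigned char *alloc_iv(void)
{
	unsigned char *iv = malloc(MAX_IVLEN);

	if (iv)
		memset(iv, 0, MAX_IVLEN);
	return iv;
}
```

As in the patch, the matching `free()` (the kernel's `kfree()`) must run on every exit path, which is why the patch adds it at the common `out_nobuf` label.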
[PATCH] virtio-crypto: adjust priority of algorithm
Some hardware accelerators (like Intel AES-NI or the s390 CPACF functions)
have lower priorities than virtio-crypto, yet those drivers are faster than
the same algorithms executed in the host via virtio. So let's lower the
priority of virtio-crypto's algorithm, making it higher than software
implementations but lower than the hardware ones.

Suggested-by: Christian Borntraeger
Signed-off-by: Gonglei
---
 drivers/crypto/virtio/virtio_crypto_algs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c b/drivers/crypto/virtio/virtio_crypto_algs.c
index 6f40a42..4de4740 100644
--- a/drivers/crypto/virtio/virtio_crypto_algs.c
+++ b/drivers/crypto/virtio/virtio_crypto_algs.c
@@ -498,7 +498,7 @@ void virtio_crypto_ablkcipher_finalize_req(
 static struct crypto_alg virtio_crypto_algs[] = { {
 	.cra_name = "cbc(aes)",
 	.cra_driver_name = "virtio_crypto_aes_cbc",
-	.cra_priority = 501,
+	.cra_priority = 150,
 	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
 	.cra_blocksize = AES_BLOCK_SIZE,
 	.cra_ctxsize = sizeof(struct virtio_crypto_ablkcipher_ctx),
--
1.8.3.1
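A toy model of why the number matters: when several implementations register the same cra_name (here "cbc(aes)"), the crypto API binds the one with the highest cra_priority. The entries and numbers below are illustrative stand-ins, not the kernel's actual tables.

```c
#include <stddef.h>

/* Minimal stand-in for struct crypto_alg's name/priority pair. */
struct alg {
	const char *driver;
	int priority;
};

/* Return the driver with the highest priority, as the crypto API does
 * when resolving an algorithm name to a concrete implementation. */
const char *pick_impl(const struct alg *algs, size_t n)
{
	const struct alg *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (!best || algs[i].priority > best->priority)
			best = &algs[i];
	return best ? best->driver : NULL;
}
```

With the patch, virtio-crypto at 150 still beats the ~100-priority software fallbacks but loses to guest hardware drivers in the 200-400 range, which is exactly the ordering Christian asked for.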
Re: x86-64: Maintain 16-byte stack alignment
On Thu, Jan 12, 2017 at 08:37:18PM -0800, Linus Torvalds wrote:
> On Jan 12, 2017 8:28 PM, "Josh Poimboeuf" wrote:
> >
> > The stack frame was always 16-byte aligned regardless of whether the
> > buf array size was even or odd.
>
> Including with -fomit-frame-pointer?
>
> With frame pointers, stack frames really are naturally 16 bytes, and then
> keeping the frame 16-byte aligned is just a matter of making any extra
> frame allocations or push/pop sequences that you do also be a multiple of
> 16 bytes.
>
> But *without* frame pointers, the "native" frame size is just 8 bytes, and a
> function that doesn't need any other local storage and then calls another
> function (think various trivial wrapper functions that just add an argument
> and then munge the return value) would thus naturally cause the frame to
> become misaligned.
>
> So then the compiler actually needs to start adding useless instructions
> just to keep the stack 16-byte aligned.

Disabling frame pointers didn't seem to help, but I finally got it to
misalign with a different test case.  I think it had been aligning the
array, so instead I made it push a register.

void otherfunc(void);

static inline void bar(int f)
{
	register void *__sp asm(_ASM_SP);

	asm volatile("call otherfunc" : "+r" (__sp) : "b" (f));
}

void foo(void)
{
	bar(5);
}

20f0 <foo>:
    20f0:	55                      push   %rbp
    20f1:	48 89 e5                mov    %rsp,%rbp
    20f4:	53                      push   %rbx
    20f5:	bb 05 00 00 00          mov    $0x5,%ebx
    20fa:	e8 00 00 00 00          callq  20ff
			20fb: R_X86_64_PC32	otherfunc-0x4
    20ff:	5b                      pop    %rbx
    2100:	5d                      pop    %rbp
    2101:	c3                      retq
    2102:	0f 1f 40 00             nopl   0x0(%rax)
    2106:	66 2e 0f 1f 84 00 00	nopw   %cs:0x0(%rax,%rax,1)
    210d:	00 00 00

--
Josh
Re: x86-64: Maintain 16-byte stack alignment
On Thu, Jan 12, 2017 at 07:23:18PM -0800, Andy Lutomirski wrote: > On Thu, Jan 12, 2017 at 7:11 PM, Josh Poimboeuf wrote: > > On Thu, Jan 12, 2017 at 05:46:55PM -0800, Andy Lutomirski wrote: > >> On Thu, Jan 12, 2017 at 12:15 PM, Josh Poimboeuf > >> wrote: > >> > On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote: > >> >> On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds > >> >> wrote: > >> >> > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf > >> >> > wrote: > >> >> >> > >> >> >> Just to clarify, I think you're asking if, for versions of gcc which > >> >> >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C > >> >> >> functions to ensure their stacks are 16-byte aligned. > >> >> >> > >> >> >> It's certainly possible, but I don't see how that solves the problem. > >> >> >> The stack will still be misaligned by entry code. Or am I missing > >> >> >> something? > >> >> > > >> >> > I think the argument is that we *could* try to align things, if we > >> >> > just had some tool that actually then verified that we aren't missing > >> >> > anything. > >> >> > > >> >> > I'm not entirely happy with checking the generated code, though, > >> >> > because as Ingo says, you have a 50:50 chance of just getting it right > >> >> > by mistake. So I'd much rather have some static tool that checks > >> >> > things at a code level (ie coccinelle or sparse). > >> >> > >> >> What I meant was checking the entry code to see if it aligns stack > >> >> frames, and good luck getting sparse to do that. Hmm, getting 16-byte > >> >> alignment for real may actually be entirely a lost cause. After all, > >> >> I think we have some inline functions that do asm volatile ("call > >> >> ..."), and I don't see any credible way of forcing alignment short of > >> >> generating an entirely new stack frame and aligning that. > >> > > >> > Actually we already found all such cases and fixed them by forcing a new > >> > stack frame, thanks to objtool. 
For example, see 55a76b59b5fe.
> >>
> >> What I mean is: what guarantees that the stack is properly aligned for
> >> the subroutine call?  gcc promises to set up a stack frame, but does
> >> it promise that rsp will be properly aligned to call a C function?
> >
> > Yes, I did an experiment and you're right.  I had naively assumed that
> > all stack frames would be aligned.
>
> Just to check: did you do your experiment with -mpreferred-stack-boundary=4?

Yes, but it's too late for me to be doing hard stuff and I think my first
experiment was bogus.  I didn't use all the other kernel-specific gcc
options.

I tried again with all the kernel gcc options, except with
-mpreferred-stack-boundary=4 instead of 3, and actually came up with the
opposite conclusion.  I used the following code:

void otherfunc(void);

static inline void bar(long *f)
{
	asm volatile("call otherfunc" : : "m" (f) : );
}

void foo(void)
{
	long buf[3] = {0, 0, 0};

	bar(buf);
}

The stack frame was always 16-byte aligned regardless of whether the buf
array size was even or odd.  So my half-asleep brain is telling me that my
original assumption was right.

--
Josh
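A userspace probe in the spirit of the experiment above: check whether the compiler actually honors a 16-byte alignment request for a stack object. Under the default SysV x86-64 ABI (16-byte stack boundary) this must hold; the open question in the thread is what happens once the kernel builds with -mpreferred-stack-boundary=3 and rsp is only 8-byte aligned. The function name is made up for illustration.

```c
#include <stdint.h>

/* Return 1 if a local requesting 16-byte alignment really lands on a
 * 16-byte boundary. The C11 _Alignas keyword forces the compiler to
 * realign the object if the incoming stack pointer is misaligned. */
int local_is_16byte_aligned(void)
{
	_Alignas(16) unsigned char buf[16];

	return ((uintptr_t)buf % 16) == 0;
}
```

Compiling a probe like this with the kernel's exact CFLAGS (as Josh did) is what distinguishes "gcc aligned it by luck" from "gcc emitted realignment code".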
Re: x86-64: Maintain 16-byte stack alignment
On Thu, Jan 12, 2017 at 7:11 PM, Josh Poimboeuf wrote: > On Thu, Jan 12, 2017 at 05:46:55PM -0800, Andy Lutomirski wrote: >> On Thu, Jan 12, 2017 at 12:15 PM, Josh Poimboeuf wrote: >> > On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote: >> >> On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds >> >> wrote: >> >> > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf >> >> > wrote: >> >> >> >> >> >> Just to clarify, I think you're asking if, for versions of gcc which >> >> >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C >> >> >> functions to ensure their stacks are 16-byte aligned. >> >> >> >> >> >> It's certainly possible, but I don't see how that solves the problem. >> >> >> The stack will still be misaligned by entry code. Or am I missing >> >> >> something? >> >> > >> >> > I think the argument is that we *could* try to align things, if we >> >> > just had some tool that actually then verified that we aren't missing >> >> > anything. >> >> > >> >> > I'm not entirely happy with checking the generated code, though, >> >> > because as Ingo says, you have a 50:50 chance of just getting it right >> >> > by mistake. So I'd much rather have some static tool that checks >> >> > things at a code level (ie coccinelle or sparse). >> >> >> >> What I meant was checking the entry code to see if it aligns stack >> >> frames, and good luck getting sparse to do that. Hmm, getting 16-byte >> >> alignment for real may actually be entirely a lost cause. After all, >> >> I think we have some inline functions that do asm volatile ("call >> >> ..."), and I don't see any credible way of forcing alignment short of >> >> generating an entirely new stack frame and aligning that. >> > >> > Actually we already found all such cases and fixed them by forcing a new >> > stack frame, thanks to objtool. For example, see 55a76b59b5fe. >> >> What I mean is: what guarantees that the stack is properly aligned for >> the subroutine call? 
gcc promises to set up a stack frame, but does
>> it promise that rsp will be properly aligned to call a C function?
>
> Yes, I did an experiment and you're right.  I had naively assumed that
> all stack frames would be aligned.

Just to check: did you do your experiment with -mpreferred-stack-boundary=4?

--Andy
Re: [RFC PATCH 5/6] crypto: aesni-intel - Add bulk request support
On Thu, Jan 12, 2017 at 01:59:57PM +0100, Ondrej Mosnacek wrote:
> This patch implements bulk request handling in the AES-NI crypto drivers.
> The major advantage of this is that with bulk requests, the kernel_fpu_*
> functions (which are usually quite slow) are now called only once for the
> whole request.

Hi Ondrej,

To what extent does the performance benefit of this patchset result from
just the reduced number of calls to kernel_fpu_begin() and
kernel_fpu_end()?

If it's most of the benefit, would it make any sense to optimize
kernel_fpu_begin() and kernel_fpu_end() instead?

And if there are other examples besides kernel_fpu_begin/kernel_fpu_end
where the bulk API would provide a significant performance boost, can you
mention them?

Interestingly, the arm64 equivalent to kernel_fpu_begin()
(kernel_neon_begin_partial() in arch/arm64/kernel/fpsimd.c) appears to
have an optimization where the SIMD registers aren't saved if they were
already saved.  I wonder why something similar isn't done on x86.

Eric
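A toy model of the amortization being discussed: count how many times the expensive begin/end pair runs when each request gets its own FPU section versus when a bulk request shares one. The fake_fpu_* functions are stand-ins for kernel_fpu_begin()/kernel_fpu_end(); the names and structure are illustrative only.

```c
#include <stddef.h>

/* Counts how many times the (expensive) FPU section was opened. */
static int fpu_sections;

static void fake_fpu_begin(void) { fpu_sections++; }
static void fake_fpu_end(void)   { }

/* One begin/end pair per request: nreqs expensive sections. */
int encrypt_per_request(size_t nreqs)
{
	fpu_sections = 0;
	for (size_t i = 0; i < nreqs; i++) {
		fake_fpu_begin();
		/* ... SIMD work for one request ... */
		fake_fpu_end();
	}
	return fpu_sections;
}

/* Bulk request: one begin/end pair amortized over all requests. */
int encrypt_bulk(size_t nreqs)
{
	fpu_sections = 0;
	fake_fpu_begin();
	for (size_t i = 0; i < nreqs; i++) {
		/* ... SIMD work for one request ... */
	}
	fake_fpu_end();
	return fpu_sections;
}
```

Eric's question is essentially whether shrinking the constant cost of each section (optimizing begin/end themselves, as arm64's lazy save does) would capture most of the win without a new bulk API.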
Re: x86-64: Maintain 16-byte stack alignment
On Thu, Jan 12, 2017 at 05:46:55PM -0800, Andy Lutomirski wrote: > On Thu, Jan 12, 2017 at 12:15 PM, Josh Poimboeuf wrote: > > On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote: > >> On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds > >> wrote: > >> > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf > >> > wrote: > >> >> > >> >> Just to clarify, I think you're asking if, for versions of gcc which > >> >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C > >> >> functions to ensure their stacks are 16-byte aligned. > >> >> > >> >> It's certainly possible, but I don't see how that solves the problem. > >> >> The stack will still be misaligned by entry code. Or am I missing > >> >> something? > >> > > >> > I think the argument is that we *could* try to align things, if we > >> > just had some tool that actually then verified that we aren't missing > >> > anything. > >> > > >> > I'm not entirely happy with checking the generated code, though, > >> > because as Ingo says, you have a 50:50 chance of just getting it right > >> > by mistake. So I'd much rather have some static tool that checks > >> > things at a code level (ie coccinelle or sparse). > >> > >> What I meant was checking the entry code to see if it aligns stack > >> frames, and good luck getting sparse to do that. Hmm, getting 16-byte > >> alignment for real may actually be entirely a lost cause. After all, > >> I think we have some inline functions that do asm volatile ("call > >> ..."), and I don't see any credible way of forcing alignment short of > >> generating an entirely new stack frame and aligning that. > > > > Actually we already found all such cases and fixed them by forcing a new > > stack frame, thanks to objtool. For example, see 55a76b59b5fe. > > What I mean is: what guarantees that the stack is properly aligned for > the subroutine call? gcc promises to set up a stack frame, but does > it promise that rsp will be properly aligned to call a C function? 
Yes, I did an experiment and you're right.  I had naively assumed that all
stack frames would be aligned.

--
Josh
RE: [PATCH v8 1/1] crypto: add virtio-crypto driver
> > On Thu, Jan 12, 2017 at 03:10:25PM +0100, Christian Borntraeger wrote: > > On 01/10/2017 01:56 PM, Christian Borntraeger wrote: > > > On 01/10/2017 01:36 PM, Gonglei (Arei) wrote: > > >> Hi, > > >> > > >>> > > >>> On 12/15/2016 03:03 AM, Gonglei wrote: > > >>> [...] > > + > > +static struct crypto_alg virtio_crypto_algs[] = { { > > + .cra_name = "cbc(aes)", > > + .cra_driver_name = "virtio_crypto_aes_cbc", > > + .cra_priority = 501, > > >>> > > >>> > > >>> This is still higher than the hardware-accelerators (like intel aesni > > >>> or the > > >>> s390 cpacf functions or the arm hw). aesni and s390/cpacf are supported > by the > > >>> hardware virtualization and available to the guests. I do not see a way > how > > >>> virtio > > >>> crypto can be faster than that (in the end it might be cpacf/aesni + > overhead) > > >>> instead it will very likely be slower. > > >>> So we should use a number that is higher than software implementations > but > > >>> lower than the hw ones. > > >>> > > >>> Just grepping around, the software ones seem be be around 100 and the > > >>> hardware > > >>> ones around 200-400. So why was 150 not enough? > > >>> > > >> I didn't find a documentation about how we use the priority, and I > > >> assumed > > >> people use virtio-crypto will configure hardware accelerators in the > > >> host. So I choosed the number which bigger than aesni's priority. > > > > > > Yes, but the aesni driver will only bind if there is HW support in the > > > guest. > > > And if aesni is available in the guest (or the s390 aes function from > > > cpacf) > > > it will always be faster than the same in the host via virtio.So your > > > priority > > > should be smaller. > > > > > > any opinion on this? > > Going forward, we might add an emulated aesni device and that might > become slower than virtio. OTOH if or when this happens, we can solve it > by adding a priority or a feature flag to virtio to raise its priority. 
> So I think I agree with Christian here, let's lower the priority.
> Gonglei, could you send a patch like this?

OK, will do.

Thanks,
-Gonglei
Re: x86-64: Maintain 16-byte stack alignment
On Thu, Jan 12, 2017 at 12:15 PM, Josh Poimboeuf wrote: > On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote: >> On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds >> wrote: >> > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf >> > wrote: >> >> >> >> Just to clarify, I think you're asking if, for versions of gcc which >> >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C >> >> functions to ensure their stacks are 16-byte aligned. >> >> >> >> It's certainly possible, but I don't see how that solves the problem. >> >> The stack will still be misaligned by entry code. Or am I missing >> >> something? >> > >> > I think the argument is that we *could* try to align things, if we >> > just had some tool that actually then verified that we aren't missing >> > anything. >> > >> > I'm not entirely happy with checking the generated code, though, >> > because as Ingo says, you have a 50:50 chance of just getting it right >> > by mistake. So I'd much rather have some static tool that checks >> > things at a code level (ie coccinelle or sparse). >> >> What I meant was checking the entry code to see if it aligns stack >> frames, and good luck getting sparse to do that. Hmm, getting 16-byte >> alignment for real may actually be entirely a lost cause. After all, >> I think we have some inline functions that do asm volatile ("call >> ..."), and I don't see any credible way of forcing alignment short of >> generating an entirely new stack frame and aligning that. > > Actually we already found all such cases and fixed them by forcing a new > stack frame, thanks to objtool. For example, see 55a76b59b5fe. What I mean is: what guarantees that the stack is properly aligned for the subroutine call? gcc promises to set up a stack frame, but does it promise that rsp will be properly aligned to call a C function? 
[cryptodev:master 43/44] arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr tt,=crypto_ft_tab'
tree: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
head: 1abee99eafab67fb1c98f9ecfc43cd5735384a86
commit: 81edb42629758bacdf813dd5e4542ae26e3ad73a [43/44] crypto: arm/aes - replace scalar AES cipher
config: arm-allmodconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git checkout 81edb42629758bacdf813dd5e4542ae26e3ad73a
        # save the attached .config to linux build tree
        make.cross ARCH=arm

All errors (new ones prefixed by >>):

   arch/arm/crypto/aes-cipher-core.S: Assembler messages:
   arch/arm/crypto/aes-cipher-core.S:21: Error: selected processor does not support `tt .req ip' in ARM mode
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr tt,=crypto_ft_tab'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r8,[tt,r8,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsr#(8*1)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r9,[tt,r9,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t1,[tt,t1,lsr#(8*1)-2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t2,[tt,t2,lsr#(8*2)-2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r10,[tt,r10,lsr#(8*2)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsr#(8*3)-2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r11,[tt,r11,lsr#(8*3)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r10,[tt,r10,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsr#(8*1)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r11,[tt,r11,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t1,[tt,t1,lsr#(8*1)-2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t2,[tt,t2,lsr#(8*2)-2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r5,[tt,r5,lsr#(8*2)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsr#(8*3)-2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r6,[tt,r6,lsr#(8*3)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r4,[tt,r4,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsr#(8*1)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r5,[tt,r5,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t1,[tt,t1,lsr#(8*1)-2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t2,[tt,t2,lsr#(8*2)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r6,[tt,r6,lsr#(8*2)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsr#(8*3)-2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r7,[tt,r7,lsr#(8*3)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r6,[tt,r6,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsr#(8*1)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r7,[tt,r7,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t1,[tt,t1,lsr#(8*1)-2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t2,[tt,t2,lsr#(8*2)-2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r9,[tt,r9,lsr#(8*2)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsr#(8*3)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r10,[tt,r10,lsr#(8*3)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r8,[tt,r8,lsl#2]'
>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsr#(8*1)-2]'
   arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r9,[tt,r9,lsl#2]'

vim +174 arch/arm/crypto/aes-cipher-core.S

    15	.align		5
    16
    17	rk		.req	r0
    18	rounds		.req	r1
    19	in		.req	r2
    20	out		.req	r3
  > 21	tt		.req	ip
    22
    23	t0		.req	lr
    24	t1
Re: x86-64: Maintain 16-byte stack alignment
On Thu, Jan 12, 2017 at 12:55 PM, Josh Poimboeuf wrote:
>
> - Who's going to run sparse all the time to catch unauthorized users of
>   __aligned__(16)?

Well, considering that we apparently only have a small handful of existing
users without anybody having ever run any tool at all, I don't think this
is necessarily a huge problem.

One of the build servers could easily add the "make C=2" case to a build
test, and just grep the error reports for the 'excessive alignment'
string.  The zero-day build bot already does much fancier things.

So I don't think it would necessarily be all that hard to get a clean
build, and just say "if you need aligned stack space, you have to do it
yourself by hand".

That said, if we now always enable frame pointers on x86 (and it has
gotten more and more difficult to avoid it), then the 16-byte alignment
would be fairly natural.  The 8-byte alignment mainly makes sense when the
basic call sequence just adds 8 bytes, and you have functions without
frames (that still call other functions).

Linus
Re: x86-64: Maintain 16-byte stack alignment
On Thu, Jan 12, 2017 at 02:15:11PM -0600, Josh Poimboeuf wrote: > On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote: > > On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds > > wrote: > > > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf > > > wrote: > > >> > > >> Just to clarify, I think you're asking if, for versions of gcc which > > >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C > > >> functions to ensure their stacks are 16-byte aligned. > > >> > > >> It's certainly possible, but I don't see how that solves the problem. > > >> The stack will still be misaligned by entry code. Or am I missing > > >> something? > > > > > > I think the argument is that we *could* try to align things, if we > > > just had some tool that actually then verified that we aren't missing > > > anything. > > > > > > I'm not entirely happy with checking the generated code, though, > > > because as Ingo says, you have a 50:50 chance of just getting it right > > > by mistake. So I'd much rather have some static tool that checks > > > things at a code level (ie coccinelle or sparse). > > > > What I meant was checking the entry code to see if it aligns stack > > frames, and good luck getting sparse to do that. Hmm, getting 16-byte > > alignment for real may actually be entirely a lost cause. After all, > > I think we have some inline functions that do asm volatile ("call > > ..."), and I don't see any credible way of forcing alignment short of > > generating an entirely new stack frame and aligning that. > > Actually we already found all such cases and fixed them by forcing a new > stack frame, thanks to objtool. For example, see 55a76b59b5fe. > > > Ick. This > > whole situation stinks, and I wish that the gcc developers had been > > less daft here in the first place or that we'd noticed and gotten it > > fixed much longer ago. 
> > Can we come up with a macro like STACK_ALIGN_16 that turns into
> > __aligned__(32) on bad gcc versions and combine that with your sparse
> > patch?

This could work.  Only concerns I'd have are:

- Are there (or will there be in the future) any asm functions which
  assume a 16-byte aligned stack?  (Seems unlikely.  Stack alignment is
  common in the crypto code but they do the alignment manually.)

- Who's going to run sparse all the time to catch unauthorized users of
  __aligned__(16)?

--
Josh
Re: [cryptodev:master 43/44] arch/arm/crypto/aes-cipher-core.S:21: Error: selected processor does not support `tt .req ip' in ARM mode
Hi Arnd,

On 12 January 2017 at 19:04, kbuild test robot wrote:
> tree: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
> head: 1abee99eafab67fb1c98f9ecfc43cd5735384a86
> commit: 81edb42629758bacdf813dd5e4542ae26e3ad73a [43/44] crypto: arm/aes - replace scalar AES cipher
> config: arm-multi_v7_defconfig (attached as .config)
> compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
>         wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         git checkout 81edb42629758bacdf813dd5e4542ae26e3ad73a
>         # save the attached .config to linux build tree
>         make.cross ARCH=arm
>
> All errors (new ones prefixed by >>):
>
>    arch/arm/crypto/aes-cipher-core.S: Assembler messages:
>>> arch/arm/crypto/aes-cipher-core.S:21: Error: selected processor does not support `tt .req ip' in ARM mode

Did you ever see this error? This is very odd: .req simply declares an
alias for a register name, and this works fine locally.

>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `movw tt,#:lower16:crypto_ft_tab'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `movt tt,#:upper16:crypto_ft_tab'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r8,[tt,r8,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r9,[tt,r9,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t1,[tt,t1,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t2,[tt,t2,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r10,[tt,r10,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r11,[tt,r11,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r10,[tt,r10,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r11,[tt,r11,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t1,[tt,t1,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t2,[tt,t2,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r5,[tt,r5,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr t0,[tt,t0,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r6,[tt,r6,lsl#2]'
>>> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr r4,[tt,r4,lsl#2]'
>
> vim +21 arch/arm/crypto/aes-cipher-core.S
>
>     15	.align		5
>     16
>     17	rk		.req	r0
>     18	rounds		.req	r1
>     19	in		.req	r2
>     20	out		.req	r3
>   > 21	tt		.req	ip
>     22
>     23	t0		.req	lr
>     24	t1		.req	r2
>     25	t2		.req	r3
>     26
>     27	.macro		__select, out, in, idx
>     28	.if		__LINUX_ARM_ARCH__ < 7
>     29	and		\out, \in, #0xff << (8 * \idx)
>     30	.else
>     31	ubfx		\out, \in, #(8 * \idx), #8
>     32	.endif
>     33	.endm
>     34
>     35	.macro		__load, out, in, idx
>     36	.if		__LINUX_ARM_ARCH__ < 7 && \idx > 0
>     37	ldr		\out, [tt, \in, lsr #(8 * \idx) - 2]
>     38	.else
>     39	ldr		\out, [tt, \in, lsl #2]
>     40	.endif
>     41	.endm
>     42
>     43	.macro		__hround, out0, out1, in0, in1, in2, in3, t3, t4, enc
>     44	__select	\out0, \in0, 0
>     45	__select	t0, \in1, 1
>     46	__load		\out0, \out0, 0
>     47	__load		t0, t0, 1
>     48
>     49	.if		\enc
>     50	__select	\out1, \in1, 0
>     51	__select	t1, \in2, 1
>     52	.else
>     53	__select	\out1, \in3, 0
>     54	__select	t1, \in0, 1
>     55	.endif
>     56	__load		\out1, \out1, 0
>     57	__select	t2, \in2, 2
>     58	__load		t1, t1, 1
>     59	__load		t2, t2, 2
>     60
>     61
Re: x86-64: Maintain 16-byte stack alignment
On Thu, Jan 12, 2017 at 12:08:07PM -0800, Andy Lutomirski wrote: > On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds > wrote: > > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf wrote: > >> > >> Just to clarify, I think you're asking if, for versions of gcc which > >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C > >> functions to ensure their stacks are 16-byte aligned. > >> > >> It's certainly possible, but I don't see how that solves the problem. > >> The stack will still be misaligned by entry code. Or am I missing > >> something? > > > > I think the argument is that we *could* try to align things, if we > > just had some tool that actually then verified that we aren't missing > > anything. > > > > I'm not entirely happy with checking the generated code, though, > > because as Ingo says, you have a 50:50 chance of just getting it right > > by mistake. So I'd much rather have some static tool that checks > > things at a code level (ie coccinelle or sparse). > > What I meant was checking the entry code to see if it aligns stack > frames, and good luck getting sparse to do that. Hmm, getting 16-byte > alignment for real may actually be entirely a lost cause. After all, > I think we have some inline functions that do asm volatile ("call > ..."), and I don't see any credible way of forcing alignment short of > generating an entirely new stack frame and aligning that. Actually we already found all such cases and fixed them by forcing a new stack frame, thanks to objtool. For example, see 55a76b59b5fe. > Ick. This > whole situation stinks, and I wish that the gcc developers had been > less daft here in the first place or that we'd noticed and gotten it > fixed much longer ago. > > Can we come up with a macro like STACK_ALIGN_16 that turns into > __aligned__(32) on bad gcc versions and combine that with your sparse > patch? 
-- Josh -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: x86-64: Maintain 16-byte stack alignment
On Thu, Jan 12, 2017 at 11:51 AM, Linus Torvalds wrote: > On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf wrote: >> >> Just to clarify, I think you're asking if, for versions of gcc which >> don't support -mpreferred-stack-boundary=3, objtool can analyze all C >> functions to ensure their stacks are 16-byte aligned. >> >> It's certainly possible, but I don't see how that solves the problem. >> The stack will still be misaligned by entry code. Or am I missing >> something? > > I think the argument is that we *could* try to align things, if we > just had some tool that actually then verified that we aren't missing > anything. > > I'm not entirely happy with checking the generated code, though, > because as Ingo says, you have a 50:50 chance of just getting it right > by mistake. So I'd much rather have some static tool that checks > things at a code level (ie coccinelle or sparse). What I meant was checking the entry code to see if it aligns stack frames, and good luck getting sparse to do that. Hmm, getting 16-byte alignment for real may actually be entirely a lost cause. After all, I think we have some inline functions that do asm volatile ("call ..."), and I don't see any credible way of forcing alignment short of generating an entirely new stack frame and aligning that. Ick. This whole situation stinks, and I wish that the gcc developers had been less daft here in the first place or that we'd noticed and gotten it fixed much longer ago. Can we come up with a macro like STACK_ALIGN_16 that turns into __aligned__(32) on bad gcc versions and combine that with your sparse patch? -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: x86-64: Maintain 16-byte stack alignment
On Thu, Jan 12, 2017 at 6:02 AM, Josh Poimboeuf wrote: > > Just to clarify, I think you're asking if, for versions of gcc which > don't support -mpreferred-stack-boundary=3, objtool can analyze all C > functions to ensure their stacks are 16-byte aligned. > > It's certainly possible, but I don't see how that solves the problem. > The stack will still be misaligned by entry code. Or am I missing > something? I think the argument is that we *could* try to align things, if we just had some tool that actually then verified that we aren't missing anything. I'm not entirely happy with checking the generated code, though, because as Ingo says, you have a 50:50 chance of just getting it right by mistake. So I'd much rather have some static tool that checks things at a code level (ie coccinelle or sparse). Almost totally untested "sparse" patch appended. The problem with sparse, obviously, is that few enough people run it, and it gives a lot of other warnings. But maybe Herbert can test whether this would actually have caught his situation, doing something like an allmodconfig build with "C=2" to force a sparse run on everything, and redirecting the warnings to stderr. But this patch does seem to give a warning for the patch that Herbert had, and that caused problems. And in fact it seems to find a few other possible problems (most, but not all, in crypto). 
This run was with the broken chacha20 patch applied, to verify that I get a warning for that case: arch/x86/crypto/chacha20_glue.c:70:13: warning: symbol 'state' has excessive alignment (16) arch/x86/crypto/aesni-intel_glue.c:724:12: warning: symbol 'iv' has excessive alignment (16) arch/x86/crypto/aesni-intel_glue.c:803:12: warning: symbol 'iv' has excessive alignment (16) crypto/shash.c:82:12: warning: symbol 'ubuf' has excessive alignment (16) crypto/shash.c:118:12: warning: symbol 'ubuf' has excessive alignment (16) drivers/char/hw_random/via-rng.c:89:14: warning: symbol 'buf' has excessive alignment (16) net/bridge/netfilter/ebtables.c:1809:31: warning: symbol 'tinfo' has excessive alignment (64) drivers/crypto/padlock-sha.c:85:14: warning: symbol 'buf' has excessive alignment (16) drivers/crypto/padlock-sha.c:147:14: warning: symbol 'buf' has excessive alignment (16) drivers/crypto/padlock-sha.c:304:12: warning: symbol 'buf' has excessive alignment (16) drivers/crypto/padlock-sha.c:388:12: warning: symbol 'buf' has excessive alignment (16) net/openvswitch/actions.c:797:33: warning: symbol 'ovs_rt' has excessive alignment (64) drivers/net/ethernet/neterion/vxge/vxge-config.c:1006:38: warning: symbol 'vpath' has excessive alignment (64) although I think at least some of these happen to be ok. There are a few places that clearly don't care about exact alignment, and use "__attribute__((aligned))" without any specific alignment value. It's just sparse that thinks that implies 16-byte alignment (it doesn't, really - it's unspecified, and is telling gcc to use "maximum useful alignment", so who knows _what_ gcc will assume). But some of them may well be real issues - if the alignment is about correctness rather than anything else. Anyway, the advantage of this kind of source-level check is that it should really catch things regardless of "luck" wrt alignment. 
Linus flow.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/flow.c b/flow.c index 7db9548..c876869 100644 --- a/flow.c +++ b/flow.c @@ -601,6 +601,20 @@ static void simplify_one_symbol(struct entrypoint *ep, struct symbol *sym) unsigned long mod; int all, stores, complex; + /* +* Warn about excessive local variable alignment. +* +* This needs to be linked up with some flag to enable +* it, and specify the alignment. The 'max_int_alignment' +* just happens to be what we want for the kernel for x86-64. +*/ + mod = sym->ctype.modifiers; + if (!(mod & (MOD_NONLOCAL | MOD_STATIC))) { + unsigned int alignment = sym->ctype.alignment; + if (alignment > max_int_alignment) + warning(sym->pos, "symbol '%s' has excessive alignment (%u)", show_ident(sym->ident), alignment); + } + /* Never used as a symbol? */ pseudo = sym->pseudo; if (!pseudo)
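The class of local variable that sparse check flags can be reproduced in a small userspace program. This is an illustrative sketch, not kernel code: `alignment_slack` is an invented name, and in userspace the psABI already guarantees a 16-byte-aligned stack, so the compiler can honor the request. Under the kernel's `-mpreferred-stack-boundary=3` only 8-byte alignment is guaranteed, which is why such variables are suspect there.

```c
#include <assert.h>
#include <stdint.h>

/*
 * The pattern the sparse patch above would warn about: a local variable
 * requesting 16-byte alignment. In userspace the compiler can realign
 * the stack, so the request is honored and the slack is 0; inside the
 * kernel, with an 8-byte-aligned stack, gcc silently assumes alignment
 * it does not actually have.
 */
static int alignment_slack(void)
{
	char state[64] __attribute__((aligned(16)));

	/* Address modulo 16: 0 if the alignment request was honored. */
	return (int)((uintptr_t)state % 16);
}
```

In a kernel context the interesting case is exactly the one this userspace test cannot show: the same code with a misaligned stack, where the variable ends up unaligned despite the attribute.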
[cryptodev:master 43/44] arch/arm/crypto/aes-cipher-core.S:21: Error: selected processor does not support `tt .req ip' in ARM mode
tree: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master head: 1abee99eafab67fb1c98f9ecfc43cd5735384a86 commit: 81edb42629758bacdf813dd5e4542ae26e3ad73a [43/44] crypto: arm/aes - replace scalar AES cipher config: arm-multi_v7_defconfig (attached as .config) compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705 reproduce: wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross git checkout 81edb42629758bacdf813dd5e4542ae26e3ad73a # save the attached .config to linux build tree make.cross ARCH=arm All errors (new ones prefixed by >>): arch/arm/crypto/aes-cipher-core.S: Assembler messages: >> arch/arm/crypto/aes-cipher-core.S:21: Error: selected processor does not >> support `tt .req ip' in ARM mode >> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `movw >> tt,#:lower16:crypto_ft_tab' >> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `movt >> tt,#:upper16:crypto_ft_tab' >> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr >> r8,[tt,r8,lsl#2]' >> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr >> t0,[tt,t0,lsl#2]' >> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr >> r9,[tt,r9,lsl#2]' >> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr >> t1,[tt,t1,lsl#2]' >> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr >> t2,[tt,t2,lsl#2]' >> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr >> r10,[tt,r10,lsl#2]' >> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr >> t0,[tt,t0,lsl#2]' >> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr >> r11,[tt,r11,lsl#2]' >> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr >> r10,[tt,r10,lsl#2]' >> arch/arm/crypto/aes-cipher-core.S:174: 
Error: ARM register expected -- `ldr >> t0,[tt,t0,lsl#2]' >> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr >> r11,[tt,r11,lsl#2]' >> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr >> t1,[tt,t1,lsl#2]' >> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr >> t2,[tt,t2,lsl#2]' >> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr >> r5,[tt,r5,lsl#2]' >> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr >> t0,[tt,t0,lsl#2]' >> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr >> r6,[tt,r6,lsl#2]' >> arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- `ldr >> r4,[tt,r4,lsl#2]' vim +21 arch/arm/crypto/aes-cipher-core.S 15 .align 5 16 17 rk .reqr0 18 rounds .reqr1 19 in .reqr2 20 out .reqr3 > 21 tt .reqip 22 23 t0 .reqlr 24 t1 .reqr2 25 t2 .reqr3 26 27 .macro __select, out, in, idx 28 .if __LINUX_ARM_ARCH__ < 7 29 and \out, \in, #0xff << (8 * \idx) 30 .else 31 ubfx\out, \in, #(8 * \idx), #8 32 .endif 33 .endm 34 35 .macro __load, out, in, idx 36 .if __LINUX_ARM_ARCH__ < 7 && \idx > 0 37 ldr \out, [tt, \in, lsr #(8 * \idx) - 2] 38 .else 39 ldr \out, [tt, \in, lsl #2] 40 .endif 41 .endm 42 43 .macro __hround, out0, out1, in0, in1, in2, in3, t3, t4, enc 44 __select\out0, \in0, 0 45 __selectt0, \in1, 1 46 __load \out0, \out0, 0 47 __load t0, t0, 1 48 49 .if \enc 50 __select\out1, \in1, 0 51 __selectt1, \in2, 1 52 .else 53 __select\out1, \in3, 0 54 __selectt1, \in0, 1 55 .endif 56 __load \out1, \out1, 0 57 __selectt2, \in2, 2 58 __load t1, t1, 1 59 __load t2, t2, 2 60 61 eor \out0, \out0, t0, ror #24 62 63 __selectt0, \in3, 3 64 .if \enc 65 __select\t3, \in3, 2 66 __select\t4, \in0, 3 67 .else 68 __select\t3, \in1, 2 69 __select\t4, \in2, 3 70
[PATCH 0/4] n2rng: add support for m5/m7 rng register layout
Commit c1e9b3b0eea1 ("hwrng: n2 - Attach on T5/M5, T7/M7 SPARC CPUs") added config strings to enable the random number generator in the sparc m5 and m7 platforms. This worked fine for client LDoms, but not for the primary LDom, or when running on bare metal, because the actual rng hardware layout changed and self-test would now fail, continually spewing error messages on the console. This patch series adds correct support for the new rng register layout, and adds a limiter to the spewing of error messages. Orabug: 25127795 Shannon Nelson (4): n2rng: limit error spewage when self-test fails n2rng: add device data descriptions n2rng: support new hardware register layout n2rng: update version info drivers/char/hw_random/n2-drv.c | 204 +-- drivers/char/hw_random/n2rng.h | 51 -- 2 files changed, 196 insertions(+), 59 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] n2rng: update version info
Signed-off-by: Shannon Nelson --- drivers/char/hw_random/n2-drv.c |8 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/char/hw_random/n2-drv.c b/drivers/char/hw_random/n2-drv.c index f0bd5ee..31cbdbb 100644 --- a/drivers/char/hw_random/n2-drv.c +++ b/drivers/char/hw_random/n2-drv.c @@ -21,11 +21,11 @@ #define DRV_MODULE_NAME"n2rng" #define PFX DRV_MODULE_NAME": " -#define DRV_MODULE_VERSION "0.2" -#define DRV_MODULE_RELDATE "July 27, 2011" +#define DRV_MODULE_VERSION "0.3" +#define DRV_MODULE_RELDATE "Jan 7, 2017" static char version[] = - DRV_MODULE_NAME ".c:v" DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")\n"; + DRV_MODULE_NAME " v" DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")\n"; MODULE_AUTHOR("David S. Miller (da...@davemloft.net)"); MODULE_DESCRIPTION("Niagara2 RNG driver"); @@ -765,7 +765,7 @@ static int n2rng_probe(struct platform_device *op) "multi-unit-capable" : "single-unit"), np->num_units); - np->hwrng.name = "n2rng"; + np->hwrng.name = DRV_MODULE_NAME; np->hwrng.data_read = n2rng_data_read; np->hwrng.priv = (unsigned long) np; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] n2rng: limit error spewage when self-test fails
If the self-test fails, it probably won't actually suddenly start working. Currently, this causes an endless spew of error messages on the console and in the logs, so this patch adds a limiter to the test. Reported-by: Sowmini Varadhan Signed-off-by: Shannon Nelson --- drivers/char/hw_random/n2-drv.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/drivers/char/hw_random/n2-drv.c b/drivers/char/hw_random/n2-drv.c index 3b06c1d..102560f 100644 --- a/drivers/char/hw_random/n2-drv.c +++ b/drivers/char/hw_random/n2-drv.c @@ -589,6 +589,7 @@ static void n2rng_work(struct work_struct *work) { struct n2rng *np = container_of(work, struct n2rng, work.work); int err = 0; + static int retries = 4; if (!(np->flags & N2RNG_FLAG_CONTROL)) { err = n2rng_guest_check(np); @@ -606,7 +607,9 @@ static void n2rng_work(struct work_struct *work) dev_info(&np->op->dev, "RNG ready\n"); } - if (err && !(np->flags & N2RNG_FLAG_SHUTDOWN)) + if (--retries == 0) + dev_err(&np->op->dev, "Self-test retries failed, RNG not ready\n"); + else if (err && !(np->flags & N2RNG_FLAG_SHUTDOWN)) schedule_delayed_work(&np->work, HZ * 2); } -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
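The bounded-retry idea in the patch above can be sketched in userspace C. All names here (`check_ok`, `run_with_retries`, `MAX_RETRIES`) are invented for illustration; `check_ok()` stands in for the hardware self-test, and the delayed-work rescheduling is collapsed into a plain loop.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch: keep retrying a failing readiness check, but give up after a
 * fixed number of attempts instead of rescheduling (and logging an
 * error) forever.
 */
#define MAX_RETRIES 4

static int attempts_used;

static bool check_ok(void)
{
	return false;	/* simulate a self-test that never passes */
}

static bool run_with_retries(void)
{
	int retries = MAX_RETRIES;

	attempts_used = 0;
	while (retries-- > 0) {
		attempts_used++;
		if (check_ok())
			return true;	/* device ready */
	}
	return false;	/* give up: avoids the endless error spew */
}
```

One design note on the actual patch: because `retries` there is `static` and only ever decremented, the counter is shared across all invocations of the work function for the device's lifetime, which is fine for a one-shot probe-time self-test.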
[PATCH 2/4] n2rng: add device data descriptions
Since we're going to need to keep track of more than just one attribute of the hardware, we'll change the use of the data field from the match struct from a single flag to a struct pointer. This patch adds the struct template and initial descriptions. Signed-off-by: Shannon Nelson --- drivers/char/hw_random/n2-drv.c | 47 -- drivers/char/hw_random/n2rng.h | 15 2 files changed, 54 insertions(+), 8 deletions(-) diff --git a/drivers/char/hw_random/n2-drv.c b/drivers/char/hw_random/n2-drv.c index 102560f..74c26c7 100644 --- a/drivers/char/hw_random/n2-drv.c +++ b/drivers/char/hw_random/n2-drv.c @@ -625,24 +625,23 @@ static void n2rng_driver_version(void) static int n2rng_probe(struct platform_device *op) { const struct of_device_id *match; - int multi_capable; int err = -ENOMEM; struct n2rng *np; match = of_match_device(n2rng_match, &op->dev); if (!match) return -EINVAL; - multi_capable = (match->data != NULL); n2rng_driver_version(); np = devm_kzalloc(&op->dev, sizeof(*np), GFP_KERNEL); if (!np) goto out; np->op = op; + np->data = (struct n2rng_template *)match->data; INIT_DELAYED_WORK(&np->work, n2rng_work); - if (multi_capable) + if (np->data->multi_capable) np->flags |= N2RNG_FLAG_MULTI; err = -ENODEV; @@ -673,8 +672,9 @@ static int n2rng_probe(struct platform_device *op) dev_err(&op->dev, "VF RNG lacks rng-#units property\n"); goto out_hvapi_unregister; } - } else + } else { np->num_units = 1; + } dev_info(&op->dev, "Registered RNG HVAPI major %lu minor %lu\n", np->hvapi_major, np->hvapi_minor); @@ -731,30 +731,61 @@ static int n2rng_remove(struct platform_device *op) return 0; } +static struct n2rng_template n2_template = { + .id = N2_n2_rng, + .multi_capable = 0, + .chip_version = 1, +}; + +static struct n2rng_template vf_template = { + .id = N2_vf_rng, + .multi_capable = 1, + .chip_version = 1, +}; + +static struct n2rng_template kt_template = { + .id = N2_kt_rng, + .multi_capable = 1, + .chip_version = 1, +}; + +static struct n2rng_template m4_template = { + 
.id = N2_m4_rng, + .multi_capable = 1, + .chip_version = 2, +}; + +static struct n2rng_template m7_template = { + .id = N2_m7_rng, + .multi_capable = 1, + .chip_version = 2, +}; + static const struct of_device_id n2rng_match[] = { { .name = "random-number-generator", .compatible = "SUNW,n2-rng", + .data = &n2_template, }, { .name = "random-number-generator", .compatible = "SUNW,vf-rng", - .data = (void *) 1, + .data = &vf_template, }, { .name = "random-number-generator", .compatible = "SUNW,kt-rng", - .data = (void *) 1, + .data = &kt_template, }, { .name = "random-number-generator", .compatible = "ORCL,m4-rng", - .data = (void *) 1, + .data = &m4_template, }, { .name = "random-number-generator", .compatible = "ORCL,m7-rng", - .data = (void *) 1, + .data = &m7_template, }, {}, }; diff --git a/drivers/char/hw_random/n2rng.h b/drivers/char/hw_random/n2rng.h index f244ac8..e41e55a 100644 --- a/drivers/char/hw_random/n2rng.h +++ b/drivers/char/hw_random/n2rng.h @@ -60,6 +60,20 @@ extern unsigned long sun4v_rng_data_read_diag_v2(unsigned long data_ra, extern unsigned long sun4v_rng_data_read(unsigned long data_ra, unsigned long *tick_delta); +enum n2rng_compat_id { + N2_n2_rng, + N2_vf_rng, + N2_kt_rng, + N2_m4_rng, + N2_m7_rng, +}; + +struct n2rng_template { + enum n2rng_compat_id id; + int multi_capable; + int chip_version; +}; + struct n2rng_unit { u64 control[HV_RNG_NUM_CONTROL]; }; @@ -74,6 +88,7 @@ struct n2rng { #define N2RNG_FLAG_SHUTDOWN0x0010 /* Driver unregistering*/ #define N2RNG_FLAG_BUFFER_VALID0x0020 /* u32 buffer holds valid data */ + struct n2rng_template *data; int num_units; struct n2rng_unit *units; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordo
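The match-data pattern this patch moves to can be sketched outside the kernel. This is not the kernel's `of_device_id` machinery; the struct and function names below are invented stand-ins that loosely mirror the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch of the pattern: instead of hiding a single flag in the match
 * table's data pointer, point it at a per-device template struct so
 * further attributes (here chip_version) can be added without changing
 * the lookup code.
 */
struct rng_template {
	int multi_capable;
	int chip_version;
};

struct match_entry {
	const char *compatible;
	const void *data;
};

static const struct rng_template vf_template = { .multi_capable = 1, .chip_version = 1 };
static const struct rng_template m7_template = { .multi_capable = 1, .chip_version = 2 };

static const struct match_entry match_table[] = {
	{ "SUNW,vf-rng", &vf_template },
	{ "ORCL,m7-rng", &m7_template },
	{ NULL, NULL },	/* sentinel, as in of_device_id tables */
};

static const struct rng_template *match_device(const char *compatible)
{
	const struct match_entry *m;

	for (m = match_table; m->compatible; m++)
		if (strcmp(m->compatible, compatible) == 0)
			return m->data;
	return NULL;
}
```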
[PATCH 3/4] n2rng: support new hardware register layout
Add the new register layout constants and the requisite logic for using them. Signed-off-by: Shannon Nelson --- drivers/char/hw_random/n2-drv.c | 144 +-- drivers/char/hw_random/n2rng.h | 36 +++--- 2 files changed, 134 insertions(+), 46 deletions(-) diff --git a/drivers/char/hw_random/n2-drv.c b/drivers/char/hw_random/n2-drv.c index 74c26c7..f0bd5ee 100644 --- a/drivers/char/hw_random/n2-drv.c +++ b/drivers/char/hw_random/n2-drv.c @@ -302,26 +302,57 @@ static int n2rng_try_read_ctl(struct n2rng *np) return n2rng_hv_err_trans(hv_err); } -#define CONTROL_DEFAULT_BASE \ - ((2 << RNG_CTL_ASEL_SHIFT) |\ -(N2RNG_ACCUM_CYCLES_DEFAULT << RNG_CTL_WAIT_SHIFT) | \ -RNG_CTL_LFSR) - -#define CONTROL_DEFAULT_0 \ - (CONTROL_DEFAULT_BASE | \ -(1 << RNG_CTL_VCO_SHIFT) | \ -RNG_CTL_ES1) -#define CONTROL_DEFAULT_1 \ - (CONTROL_DEFAULT_BASE | \ -(2 << RNG_CTL_VCO_SHIFT) | \ -RNG_CTL_ES2) -#define CONTROL_DEFAULT_2 \ - (CONTROL_DEFAULT_BASE | \ -(3 << RNG_CTL_VCO_SHIFT) | \ -RNG_CTL_ES3) -#define CONTROL_DEFAULT_3 \ - (CONTROL_DEFAULT_BASE | \ -RNG_CTL_ES1 | RNG_CTL_ES2 | RNG_CTL_ES3) +static u64 n2rng_control_default(struct n2rng *np, int ctl) +{ + u64 val = 0; + + if (np->data->chip_version == 1) { + val = ((2 << RNG_v1_CTL_ASEL_SHIFT) | + (N2RNG_ACCUM_CYCLES_DEFAULT << RNG_v1_CTL_WAIT_SHIFT) | +RNG_CTL_LFSR); + + switch (ctl) { + case 0: + val |= (1 << RNG_v1_CTL_VCO_SHIFT) | RNG_CTL_ES1; + break; + case 1: + val |= (2 << RNG_v1_CTL_VCO_SHIFT) | RNG_CTL_ES2; + break; + case 2: + val |= (3 << RNG_v1_CTL_VCO_SHIFT) | RNG_CTL_ES3; + break; + case 3: + val |= RNG_CTL_ES1 | RNG_CTL_ES2 | RNG_CTL_ES3; + break; + default: + break; + } + + } else { + val = ((2 << RNG_v2_CTL_ASEL_SHIFT) | + (N2RNG_ACCUM_CYCLES_DEFAULT << RNG_v2_CTL_WAIT_SHIFT) | +RNG_CTL_LFSR); + + switch (ctl) { + case 0: + val |= (1 << RNG_v2_CTL_VCO_SHIFT) | RNG_CTL_ES1; + break; + case 1: + val |= (2 << RNG_v2_CTL_VCO_SHIFT) | RNG_CTL_ES2; + break; + case 2: + val |= (3 << RNG_v2_CTL_VCO_SHIFT) | RNG_CTL_ES3; + break; + 
case 3: + val |= RNG_CTL_ES1 | RNG_CTL_ES2 | RNG_CTL_ES3; + break; + default: + break; + } + } + + return val; +} static void n2rng_control_swstate_init(struct n2rng *np) { @@ -336,10 +367,10 @@ static void n2rng_control_swstate_init(struct n2rng *np) for (i = 0; i < np->num_units; i++) { struct n2rng_unit *up = &np->units[i]; - up->control[0] = CONTROL_DEFAULT_0; - up->control[1] = CONTROL_DEFAULT_1; - up->control[2] = CONTROL_DEFAULT_2; - up->control[3] = CONTROL_DEFAULT_3; + up->control[0] = n2rng_control_default(np, 0); + up->control[1] = n2rng_control_default(np, 1); + up->control[2] = n2rng_control_default(np, 2); + up->control[3] = n2rng_control_default(np, 3); } np->hv_state = HV_RNG_STATE_UNCONFIGURED; @@ -399,6 +430,7 @@ static int n2rng_data_read(struct hwrng *rng, u32 *data) } else { int err = n2rng_generic_read_data(ra); if (!err) { + np->flags |= N2RNG_FLAG_BUFFER_VALID; np->buffer = np->test_data >> 32; *data = np->test_data & 0x; len = 4; @@ -487,9 +519,21 @@ static void n2rng_dump_test_buffer(struct n2rng *np) static int n2rng_check_selftest_buffer(struct n2rng *np, unsigned long unit) { - u64 val = SELFTEST_VAL; + u64 val; int err, matches, limit; + switch (np->data->id) { + case N2_n2_rng: + case N2_vf_rng: + case N2_kt_rng: + case N2_m4_rng: /* yes, m4 uses the old value */ + val = RNG_v1_SELFTEST_VAL; + break; + default: + val = RNG_v2_SELFTEST_VAL; + break; + } + matches = 0; for (limit = 0; limit < SELFTEST_LOOPS_MAX; limit++) { matches += n2rng_test_buffer_find(np, val); @@ -5
Re: [PATCH v1 3/8] crypto:chcr- Fix key length for RFC4106
On Thu, Jan 12, 2017 at 10:08:46PM +0530, Harsh Jain wrote: > > That case is already handled in next if condition. It will error out with > -EINVAL in next condition. > > if (keylen == AES_KEYSIZE_128) { Good point. Please split the patches according to whether they should go into 4.10/4.11 and then resubmit. Thanks, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
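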
Re: [PATCH v2 0/7] crypto: ARM/arm64 - AES and ChaCha20 updates for v4.11
On 12 January 2017 at 16:45, Herbert Xu wrote: > On Wed, Jan 11, 2017 at 04:41:48PM +, Ard Biesheuvel wrote: >> This adds ARM and arm64 implementations of ChaCha20, scalar AES and SIMD >> AES (using bit slicing). The SIMD algorithms in this series take advantage >> of the new skcipher walksize attribute to iterate over the input in the most >> efficient manner possible. >> >> Patch #1 adds a NEON implementation of ChaCha20 for ARM. >> >> Patch #2 adds a NEON implementation of ChaCha20 for arm64. >> >> Patch #3 modifies the existing NEON and ARMv8 Crypto Extensions >> implementations >> of AES-CTR to be available as a synchronous skcipher as well. This is >> intended >> for the mac80211 code, which uses synchronous encapsulations of ctr(aes) >> [ccm, gcm] in softirq context, during which arm64 supports use of SIMD code. >> >> Patch #4 adds a scalar implementation of AES for arm64, using the key >> schedule >> generation routines and lookup tables of the generic code in >> crypto/aes_generic. >> >> Patch #5 does the same for ARM, replacing existing scalar code that >> originated >> in the OpenSSL project, and contains redundant key schedule generation >> routines >> and lookup tables (and is slightly slower on modern cores) >> >> Patch #6 replaces the ARM bit sliced NEON code with a new implementation that >> has a number of advantages over the original code (which also originated in >> the >> OpenSSL project.) The performance should be identical. >> >> Patch #7 adds a port of the ARM bit-sliced AES code to arm64, in ECB, CBC, >> CTR >> and XTS modes. >> >> Due to the size of patch #7, it may be difficult to apply these patches from >> patchwork, so I pushed them here as well: > > It seems to have made it. > > All applied. Thanks. Actually, patch #6 was the huge one not #7, and I don't see it in your tree yet. 
https://git.kernel.org/cgit/linux/kernel/git/ardb/linux.git/commit/?h=crypto-arm-v4.11&id=cbf03b255f7c The order does not matter, though, so could you please put it on top? Thanks. -- Ard. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] crypto: mediatek - fix format string for 64-bit builds
On Wed, Jan 11, 2017 at 02:55:20PM +0100, Arnd Bergmann wrote: > After I enabled COMPILE_TEST for non-ARM targets, I ran into these > warnings: > > crypto/mediatek/mtk-aes.c: In function 'mtk_aes_info_map': > crypto/mediatek/mtk-aes.c:224:28: error: format '%d' expects argument of type > 'int', but argument 3 has type 'long unsigned int' [-Werror=format=] >dev_err(cryp->dev, "dma %d bytes error\n", sizeof(*info)); > crypto/mediatek/mtk-sha.c:344:28: error: format '%d' expects argument of type > 'int', but argument 3 has type 'long unsigned int' [-Werror=format=] > crypto/mediatek/mtk-sha.c:550:21: error: format '%u' expects argument of type > 'unsigned int', but argument 4 has type 'size_t {aka long unsigned int}' > [-Werror=format=] > > The correct format for size_t is %zu, so use that in all three > cases. > > Fixes: 785e5c616c84 ("crypto: mediatek - Add crypto driver support for some > MediaTek chips") > Signed-off-by: Arnd Bergmann Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
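The warning class fixed above comes down to one rule: `sizeof` yields a `size_t`, whose portable printf conversion is `%zu`; with `%d`, a 64-bit build passes a `long unsigned int` where an `int` is read, and the compiler rightly warns. A minimal sketch (the struct is just an invented stand-in for the driver's descriptor):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the driver descriptor whose size is being printed. */
struct dma_info {
	char payload[24];
};

/*
 * Format the message the way the fixed driver does: %zu matches size_t
 * on both 32-bit and 64-bit builds.
 */
static void format_err(char *buf, size_t buflen)
{
	snprintf(buf, buflen, "dma %zu bytes error", sizeof(struct dma_info));
}
```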
Re: [PATCH v2 0/7] crypto: ARM/arm64 - AES and ChaCha20 updates for v4.11
On Wed, Jan 11, 2017 at 04:41:48PM +, Ard Biesheuvel wrote: > This adds ARM and arm64 implementations of ChaCha20, scalar AES and SIMD > AES (using bit slicing). The SIMD algorithms in this series take advantage > of the new skcipher walksize attribute to iterate over the input in the most > efficient manner possible. > > Patch #1 adds a NEON implementation of ChaCha20 for ARM. > > Patch #2 adds a NEON implementation of ChaCha20 for arm64. > > Patch #3 modifies the existing NEON and ARMv8 Crypto Extensions > implementations > of AES-CTR to be available as a synchronous skcipher as well. This is intended > for the mac80211 code, which uses synchronous encapsulations of ctr(aes) > [ccm, gcm] in softirq context, during which arm64 supports use of SIMD code. > > Patch #4 adds a scalar implementation of AES for arm64, using the key schedule > generation routines and lookup tables of the generic code in > crypto/aes_generic. > > Patch #5 does the same for ARM, replacing existing scalar code that originated > in the OpenSSL project, and contains redundant key schedule generation > routines > and lookup tables (and is slightly slower on modern cores) > > Patch #6 replaces the ARM bit sliced NEON code with a new implementation that > has a number of advantages over the original code (which also originated in > the > OpenSSL project.) The performance should be identical. > > Patch #7 adds a port of the ARM bit-sliced AES code to arm64, in ECB, CBC, CTR > and XTS modes. > > Due to the size of patch #7, it may be difficult to apply these patches from > patchwork, so I pushed them here as well: It seems to have made it. All applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] crypto: mediatek - remove ARM dependencies
On Wed, Jan 11, 2017 at 02:50:19PM +0100, Arnd Bergmann wrote: > Building the mediatek driver on an older ARM architecture results in a > harmless warning: > > warning: (ARCH_OMAP2PLUS_TYPICAL && CRYPTO_DEV_MEDIATEK) selects NEON which > has unmet direct dependencies (VFPv3 && CPU_V7) > > We could add an explicit dependency on CPU_V7, but it seems nicer to > open up the build to additional configurations. This replaces the ARM > optimized algorithm selection with the normal one that all other drivers > use, and that in turn lets us relax the dependency on ARM and drop > a number of the unrelated 'select' statements. > > Obviously a real user would still select those other optimized drivers > as a fallback, but as there is no strict dependency, we can leave that > up to the user. > > Fixes: 785e5c616c84 ("crypto: mediatek - Add crypto driver support for some > MediaTek chips") > Signed-off-by: Arnd Bergmann Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 8/8] crypto/testmgr: Allocate only the required output size for hash tests
On Tue, Jan 10, 2017 at 03:24:46PM -0800, Andy Lutomirski wrote: > There are some hashes (e.g. sha224) that have some internal trickery > to make sure that only the correct number of output bytes are > generated. If something goes wrong, they could potentially overrun > the output buffer. > > Make the test more robust by allocating only enough space for the > correct output size so that memory debugging will catch the error if > the output is overrun. > > Tested by intentionally breaking sha224 to output all 256 > internally-generated bits while running on KASAN. > > Cc: Ard Biesheuvel > Cc: Herbert Xu > Signed-off-by: Andy Lutomirski Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
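The testing idea above — allocate exactly the advertised size so any overrun lands outside the buffer where a memory debugger can trap it — can be sketched in userspace C. `fake_hash` and `check_hash_output` are invented names; in the real testmgr change the buffer holds a digest and the trap is sprung by KASAN (ASan plays the same role in userspace).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a digest call: a correct hash writes exactly digestsize. */
static void fake_hash(unsigned char *out, size_t digestsize)
{
	memset(out, 0xab, digestsize);
}

/*
 * Allocate with no slack on purpose: if the hash ever wrote more than
 * it claims (e.g. a broken sha224 emitting all 256 internal bits), the
 * extra bytes would run past the allocation and be caught.
 */
static int check_hash_output(size_t digestsize)
{
	unsigned char *result = malloc(digestsize);

	if (!result)
		return -1;
	fake_hash(result, digestsize);
	free(result);
	return 0;
}
```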
Re: [PATCH] crypto: mediatek: don't return garbage err on successful return
On Tue, Jan 03, 2017 at 01:21:22PM +, Colin King wrote: > From: Colin Ian King > > In the case where keylen <= bs mtk_sha_setkey returns an uninitialized > return value in err. Fix this by returning 0 instead of err. > > Issue detected by static analysis with cppcheck. > > Signed-off-by: Colin Ian King Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
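The bug class fixed here is easy to reproduce in miniature. This is a reduced sketch, not the mediatek driver's actual code: `err` is only assigned on one branch, so the short-key path would return stack garbage, and the fix is to return an explicit 0 (or initialize `err`). `keylen` and `bs` are placeholders for the driver's real parameters.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch of the fixed setkey logic: only the keylen > bs path computes
 * err; the other path must return a literal 0, not the (previously
 * uninitialized) err variable.
 */
static int setkey_sketch(size_t keylen, size_t bs)
{
	int err;

	if (keylen > bs) {
		err = -1;	/* stand-in for the "hash the key down" path failing */
		if (err)
			return err;
	}
	return 0;	/* was "return err": garbage when keylen <= bs */
}
```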
Re: [PATCH 0/3] crypto: picoxcell - Cleanups removing non-DT code
On Mon, Jan 02, 2017 at 02:06:56PM -0300, Javier Martinez Canillas wrote: > Hello, > > This small series contains a couple of cleanups that removes some driver's > code > that isn't needed due the driver being for a DT-only platform. > > The changes were suggested by Arnd Bergmann as a response to a previous patch: > https://lkml.org/lkml/2017/1/2/342 > > Patch #1 allows the driver to be built when the COMPILE_TEST option is > enabled. > Patch #2 removes the platform ID table since isn't needed for DT-only drivers. > Patch #3 removes a wrapper function that's also not needed if driver is > DT-only. All applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] crypto: Replaced gcc specific attributes with macros from compiler.h
On Sat, Dec 31, 2016 at 09:26:23PM +0530, gidisr...@gmail.com wrote: > From: Gideon Israel Dsouza > > Continuing from this commit: 52f5684c8e1e > ("kernel: use macros from compiler.h instead of __attribute__((...))") > > I submitted 4 total patches. They are part of task I've taken up to > increase compiler portability in the kernel. I've cleaned up the > subsystems under /kernel /mm /block and /security, this patch targets > /crypto. > > There is which provides macros for various gcc specific > constructs. Eg: __weak for __attribute__((weak)). I've cleaned all > instances of gcc specific attributes with the right macros for the crypto > subsystem. > > I had to make one additional change into compiler-gcc.h for the case when > one wants to use this: __attribute__((aligned) and not specify an alignment > factor. From the gcc docs, this will result in the largest alignment for > that data type on the target machine so I've named the macro > __aligned_largest. Please advise if another name is more appropriate. > > Signed-off-by: Gideon Israel Dsouza Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] crypto: testmgr - use kmemdup instead of kmalloc+memcpy
On Fri, Dec 30, 2016 at 02:12:00PM -0600, Eric Biggers wrote: > From: Eric Biggers > > It's recommended to use kmemdup instead of kmalloc followed by memcpy. > > Signed-off-by: Eric Biggers Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
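For readers outside the kernel, kmemdup() simply collapses the allocate-then-copy idiom into one call. A userspace analogue (the kernel version additionally takes a gfp_t flags argument such as GFP_KERNEL) is a few lines:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace analogue of the kernel's kmemdup(): allocate a buffer and copy
 * "len" bytes into it in one call, instead of open-coding malloc+memcpy at
 * every call site. */
static void *memdup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}
```

Besides being shorter, the combined helper makes it impossible to copy a different length than was allocated, which is the usual bug in hand-rolled versions.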
Re: [PATCH v1 3/8] crypto:chcr- Fix key length for RFC4106
On 12-01-2017 21:39, Herbert Xu wrote:
> On Fri, Jan 06, 2017 at 02:01:34PM +0530, Harsh Jain wrote:
>> Check keylen before copying salt to avoid wrap around of integer.
>>
>> Signed-off-by: Harsh Jain
>> ---
>>  drivers/crypto/chelsio/chcr_algo.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/crypto/chelsio/chcr_algo.c b/drivers/crypto/chelsio/chcr_algo.c
>> index deec7c0..6c2dea3 100644
>> --- a/drivers/crypto/chelsio/chcr_algo.c
>> +++ b/drivers/crypto/chelsio/chcr_algo.c
>> @@ -2194,8 +2194,8 @@ static int chcr_gcm_setkey(struct crypto_aead *aead, const u8 *key,
>>  	unsigned int ck_size;
>>  	int ret = 0, key_ctx_size = 0;
>>
>> -	if (get_aead_subtype(aead) ==
>> -	    CRYPTO_ALG_SUB_TYPE_AEAD_RFC4106) {
>> +	if (get_aead_subtype(aead) == CRYPTO_ALG_SUB_TYPE_AEAD_RFC4106 &&
>> +	    keylen > 3) {
>>  		keylen -= 4;  /* nonce/salt is present in the last 4 bytes */
>>  		memcpy(aeadctx->salt, key + keylen, 4);
>>  	}
>
> We should return an error in this case.

That case is already handled by the next if condition. It will error out
with -EINVAL there:

	if (keylen == AES_KEYSIZE_128) {

> Cheers,
Re: [PATCH 00/13] crypto: copy AAD during encrypt for AEAD ciphers
Am Freitag, 13. Januar 2017, 00:06:29 CET schrieb Herbert Xu:

Hi Herbert,

> On Thu, Jan 12, 2017 at 05:01:13PM +0100, Stephan Müller wrote:
> > I fully agree. Therefore, I was under the impression that disregarding
> > the AAD in recvmsg entirely would be most appropriate, as offered with
> > the patch "crypto: AF_ALG - disregard AAD buffer space for output". In
> > this case we would be fully POSIX compliant, the kernel would not copy
> > the AAD (and thus would avoid multiple memcpy operations due to
> > copy_from_user and copy_to_user round trips), and the AAD copy
> > operation would be left entirely to user space.
>
> Yes but then you'd have to play nasty games to fit this through
> the kernel API.

I do not understand that statement. With the patch mentioned above that I
provided some weeks ago, we have the following scenario for an encryption
(for decryption it is almost identical, just the tag location is reversed):

user calls sendmsg with data buffer/IOVEC: AAD || PT
-> algif_aead turns this into the src SGL

user calls recvmsg with data buffer/IOVEC: CT || Tag
-> algif_aead creates the first SG entry in the dst SGL pointing to the AAD
   from the src SGL
-> algif_aead appends the user buffers to the dst SGL
-> algif_aead performs its operation, and during that operation only the CT
   and Tag parts are changed

I.e. by prepending the SG that points to the AAD from the src SGL to the
dst SGL, we have a clean invocation of the kernel API. Yet we avoid copy
round trips of the AAD from src to dst in the kernel.

> Besides, we could still do in-place crypto even
> though you suggested that it's complicated.

Is that crypto-in-place operation really a goal we want to achieve if we
"buy" it with a memcpy of the data from the src SGL to the dst SGL before
the crypto operation? Can we really call such an implementation a
crypto-in-place operation?

> It's not really.

I concur, for encryption the suggested memcpy is not difficult. Only for
decryption is the memcpy more challenging.

> All we have to do is walk through the SG list and compare each
> page/offset. For the common case it's going to be a single-entry
> list.

Ciao
Stephan
Re: [PATCH 00/13] crypto: copy AAD during encrypt for AEAD ciphers
On Thu, Jan 12, 2017 at 04:50:46PM +0100, Stephan Müller wrote: > > So you say that we could remove it from authenc() entirely (this is currently > the only location where such copy operation is found for the encryption > direction)? > > I would concur that the kernel does not need that. authenc needs it for performance reasons. The kernel API as it stands basically says that it may or may not copy the AAD. If you choose to have a distinct AAD in the dst SG list then either you throw away the result or you copy it yourself. > If we only want to solve that for algif_aead, wouldn't it make more sense if > the user space caller takes care of that (such as libkcapi)? By tinkering > with > the SGLs and copying the data to the dst buffer before the cipher operation > takes place, I guess we will add performance degradation and more complexity > in the kernel. > > Having such logic in user space would keep the algif_aead cleaner IMHO. We need to have a sane kernel API that respects POSIX. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix itnull.cocci warnings
On Sat, Jan 07, 2017 at 10:46:17AM +0100, Julia Lawall wrote: > The first argument to list_for_each_entry cannot be NULL. > > Generated by: scripts/coccinelle/iterators/itnull.cocci > > CC: Harsh Jain > Signed-off-by: Julia Lawall > Signed-off-by: Fengguang Wu > --- > > This code comes from the following git tree: > > url: > https://github.com/0day-ci/linux/commits/Harsh-Jain/crypto-chcr-Bug-fixes/20170107-093356 > base: > https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git > master > In-Reply-To: > <8e0086b56d8fb61637d179c32a09a1bca03c4186.1483599449.git.ha...@chelsio.com> Harsh, please fold this patch into your series when you resubmit. Thanks, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] crypto: aead AF_ALG - overhaul memory management
Am Freitag, 13. Januar 2017, 00:17:59 CET schrieb Herbert Xu: Hi Herbert, > On Thu, Jan 12, 2017 at 05:10:14PM +0100, Stephan Müller wrote: > > Each IOCB would transpire into an independent, separate recvmsg invocation > > without an additional sendmsg/sendpage operation. Thus, in order to > > support > > multiple IOCBs, all data the multiple recvmsg invocations will operate on > > must be injected into the kernel beforehand. > > I don't understand, what's wrong with: > > sendmsg(fd, ...) > aio_read(iocb1) > sendmsg(fd, ...) > aio_read(iocb2) Sure, that works. But here you limit yourself to one IOCB per aio_read. But aio_read supports multiple IOCBs in one invocation. And this is the issue I am considering. Ciao Stephan -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] crypto: aead AF_ALG - overhaul memory management
On Thu, Jan 12, 2017 at 05:10:14PM +0100, Stephan Müller wrote: > > Each IOCB would transpire into an independent, separate recvmsg invocation > without an additional sendmsg/sendpage operation. Thus, in order to support > multiple IOCBs, all data the multiple recvmsg invocations will operate on > must > be injected into the kernel beforehand. I don't understand, what's wrong with: sendmsg(fd, ...) aio_read(iocb1) sendmsg(fd, ...) aio_read(iocb2) Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1 0/8] crypto:chcr- Bug fixes
On Fri, Jan 06, 2017 at 02:01:31PM +0530, Harsh Jain wrote: > The patch series is based on Herbert's cryptodev-2.6 tree. > It include bug fixes. > > Atul Gupta (4): > crypto:chcr-Change flow IDs > crypto:chcr- Fix panic on dma_unmap_sg > crypto:chcr- Check device is allocated before use > crypto:chcr- Fix wrong typecasting > Harsh Jain (4): > crypto:chcr- Fix key length for RFC4106 > crypto:chcr- Use cipher instead of Block Cipher in gcm setkey > crypto:chcr: Change cra_flags for cipher algos > crypto:chcr- Change algo priority When you resubmit this please split it into two series. Please send the critical bug fixes (panic + key length + alloc check) in one series separate from the others. This way I can push them easily to the 4.10 tree. Thanks, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] crypto: aead AF_ALG - overhaul memory management
Am Freitag, 13. Januar 2017, 00:07:39 CET schrieb Herbert Xu: Hi Herbert, > On Thu, Jan 12, 2017 at 05:05:03PM +0100, Stephan Müller wrote: > > Am Donnerstag, 12. Januar 2017, 16:56:04 CET schrieb Stephan Müller: > > > > Hi Herbert, > > > > > That would mean that we would only support one IOCB. > > > > As we also need to be standards compliant, would it be appropriate to only > > support one IOCB? I think this is a significant difference to other AIO > > operations like for networking. > > Why would we be limited to one IOCB? Each IOCB would transpire into an independent, separate recvmsg invocation without an additional sendmsg/sendpage operation. Thus, in order to support multiple IOCBs, all data the multiple recvmsg invocations will operate on must be injected into the kernel beforehand. Ciao Stephan -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1 3/8] crypto:chcr- Fix key length for RFC4106
On Fri, Jan 06, 2017 at 02:01:34PM +0530, Harsh Jain wrote:
> Check keylen before copying salt to avoid wrap around of integer.
>
> Signed-off-by: Harsh Jain
> ---
>  drivers/crypto/chelsio/chcr_algo.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/crypto/chelsio/chcr_algo.c b/drivers/crypto/chelsio/chcr_algo.c
> index deec7c0..6c2dea3 100644
> --- a/drivers/crypto/chelsio/chcr_algo.c
> +++ b/drivers/crypto/chelsio/chcr_algo.c
> @@ -2194,8 +2194,8 @@ static int chcr_gcm_setkey(struct crypto_aead *aead, const u8 *key,
>  	unsigned int ck_size;
>  	int ret = 0, key_ctx_size = 0;
>
> -	if (get_aead_subtype(aead) ==
> -	    CRYPTO_ALG_SUB_TYPE_AEAD_RFC4106) {
> +	if (get_aead_subtype(aead) == CRYPTO_ALG_SUB_TYPE_AEAD_RFC4106 &&
> +	    keylen > 3) {
>  		keylen -= 4;  /* nonce/salt is present in the last 4 bytes */
>  		memcpy(aeadctx->salt, key + keylen, 4);
>  	}

We should return an error in this case.

Cheers,
-- 
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
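The `keylen > 3` guard matters because `keylen` is `unsigned int`: subtracting the 4-byte salt from a shorter key wraps around to a huge value instead of going negative, and that value would then be used as an offset into the key buffer. A minimal sketch (the helper name `strip_salt_len` is illustrative, not driver code):

```c
#include <assert.h>

/* Demonstrates the unsigned wraparound the chcr patch guards against:
 * without the check, keylen -= 4 on a keylen < 4 wraps around, and the
 * resulting huge keylen would be used as an offset into the key buffer. */
static unsigned int strip_salt_len(unsigned int keylen)
{
	if (keylen > 3)	/* the guard added by the patch */
		keylen -= 4;
	return keylen;
}
```

As discussed in the thread, a too-short key that skips the subtraction is then rejected by the subsequent key-size check, so the guard only needs to prevent the wraparound itself.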
Re: [PATCH 1/2] crypto: aead AF_ALG - overhaul memory management
On Thu, Jan 12, 2017 at 05:05:03PM +0100, Stephan Müller wrote: > Am Donnerstag, 12. Januar 2017, 16:56:04 CET schrieb Stephan Müller: > > Hi Herbert, > > > > > That would mean that we would only support one IOCB. > > As we also need to be standards compliant, would it be appropriate to only > support one IOCB? I think this is a significant difference to other AIO > operations like for networking. Why would we be limited to one IOCB? -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/13] crypto: copy AAD during encrypt for AEAD ciphers
On Thu, Jan 12, 2017 at 05:01:13PM +0100, Stephan Müller wrote: > > I fully agree. Therefore, I was under the impression that disregarding the > AAD > in recvmsg entirely would be most appropriate as offered with the patch > "crypto: AF_ALG - disregard AAD buffer space for output". In this case we > would be fully POSIX compliant, the kernel would not copy the AAD (and thus > perform multiple memcpy operations due to copy_from_user and copy_to_user > round trips) and leave the AAD copy operation entirely to user space. Yes but then you'd have to play nasty games to fit this through the kernel API. Besides, we could still do in-place crypto even though you suggested that it's complicated. It's not really. All we have to do is walk through the SG list and compare each page/offset. For the common case it's going to be a single-entry list. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] crypto: aead AF_ALG - overhaul memory management
Am Donnerstag, 12. Januar 2017, 16:56:04 CET schrieb Stephan Müller: Hi Herbert, > > That would mean that we would only support one IOCB. As we also need to be standards compliant, would it be appropriate to only support one IOCB? I think this is a significant difference to other AIO operations like for networking. Ciao Stephan -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/13] crypto: copy AAD during encrypt for AEAD ciphers
Am Donnerstag, 12. Januar 2017, 23:53:44 CET schrieb Herbert Xu: Hi Herbert, > > > If we only want to solve that for algif_aead, wouldn't it make more sense > > if the user space caller takes care of that (such as libkcapi)? By > > tinkering with the SGLs and copying the data to the dst buffer before the > > cipher operation takes place, I guess we will add performance degradation > > and more complexity in the kernel. > > > > Having such logic in user space would keep the algif_aead cleaner IMHO. > > We need to have a sane kernel API that respects POSIX. I fully agree. Therefore, I was under the impression that disregarding the AAD in recvmsg entirely would be most appropriate as offered with the patch "crypto: AF_ALG - disregard AAD buffer space for output". In this case we would be fully POSIX compliant, the kernel would not copy the AAD (and thus perform multiple memcpy operations due to copy_from_user and copy_to_user round trips) and leave the AAD copy operation entirely to user space. Ciao Stephan -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] crypto: aead AF_ALG - overhaul memory management
Am Donnerstag, 12. Januar 2017, 23:51:28 CET schrieb Herbert Xu:

Hi Herbert,

> On Sun, Dec 25, 2016 at 06:15:06PM +0100, Stephan Müller wrote:
> > + * The following concept of the memory management is used:
> > + *
> > + * The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL
> > + * is filled by user space with the data submitted via sendpage/sendmsg.
> > + * Filling up the TX SGL does not cause a crypto operation -- the data
> > + * will only be tracked by the kernel. Upon receipt of one recvmsg call,
> > + * the caller must provide a buffer which is tracked with the RX SGL.
> > + *
> > + * During the processing of the recvmsg operation, the cipher request is
> > + * allocated and prepared. To support multiple recvmsg operations
> > + * operating on one TX SGL, an offset pointer into the TX SGL is
> > + * maintained. The TX SGL that is used for the crypto request is
> > + * scatterwalk_ffwd by the offset pointer to obtain the start address
> > + * the crypto operation shall use for the input data.
>
> I think this is overcomplicating things. The async interface
> should be really simple. It should be exactly the same as the
> sync interface, except that completion is out-of-line.
>
> So there should be no mixing of SGLs from different requests.
> Just start with a clean slate after each recvmsg regardless of
> whether it's sync or async.
>
> The only difference in the async case is that you need to keep a
> reference to the old pages and free them upon completion. But this
> should in no way interfere with subsequent requests.

That would mean that we would only support one IOCB. At least with
algif_skcipher, having multiple IOCBs would reduce the number of system
calls user space needs to make for multiple plaintext/ciphertext blocks.
But then, with the use of IOVECs, user space could provide all input data
with one system call anyway.

Ok, I will update the patch as suggested.

> Cheers,

Ciao
Stephan
Re: [PATCH 1/2] crypto: aead AF_ALG - overhaul memory management
On Sun, Dec 25, 2016 at 06:15:06PM +0100, Stephan Müller wrote: > > + * The following concept of the memory management is used: > + * > + * The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL is > + * filled by user space with the data submitted via sendpage/sendmsg. Filling > + * up the TX SGL does not cause a crypto operation -- the data will only be > + * tracked by the kernel. Upon receipt of one recvmsg call, the caller must > + * provide a buffer which is tracked with the RX SGL. > + * > + * During the processing of the recvmsg operation, the cipher request is > + * allocated and prepared. To support multiple recvmsg operations operating > + * on one TX SGL, an offset pointer into the TX SGL is maintained. The TX SGL > + * that is used for the crypto request is scatterwalk_ffwd by the offset > + * pointer to obtain the start address the crypto operation shall use for > + * the input data. I think this is overcomplicating things. The async interface should be really simple. It should be exactly the same as the sync interface, except that completion is out-of-line. So there should be no mixing of SGLs from different requests. Just start with a clean slate after each recvmsg regardless of whether it's sync or async. The only difference in the async case is that you need to keep a reference to the old pages and free them upon completion. But this should in no way interfere with subsequent requests. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/13] crypto: copy AAD during encrypt for AEAD ciphers
Am Donnerstag, 12. Januar 2017, 23:39:24 CET schrieb Herbert Xu: Hi Herbert, > On Thu, Jan 12, 2017 at 04:34:39PM +0100, Stephan Müller wrote: > > We would only be able to remove it if all AEAD implementations are > > converted. But for the conversion time, we do face that issue. > > It doesn't matter. Nobody in the kernel uses that. In fact I > wonder whether we should even do it for the kernel API. We only > need it for the user-space API because it goes through read/write. So you say that we could remove it from authenc() entirely (this is currently the only location where such copy operation is found for the encryption direction)? I would concur that the kernel does not need that. > > > Are you suggesting that the entire data in the src SGL is first copied to > > the dst SGL by algif_aead? If yes, that still requires significant > > src/dst SGL tinkering as we have the tag -- the src SGL for encrypt does > > not have the tag space where the dst SGL for encrypt is required to have > > the tag size. This is vice versa for the decryption operation. > > It's really only a problem for decryption. In that case you can > extend the dst SG list to include the tag. If we only want to solve that for algif_aead, wouldn't it make more sense if the user space caller takes care of that (such as libkcapi)? By tinkering with the SGLs and copying the data to the dst buffer before the cipher operation takes place, I guess we will add performance degradation and more complexity in the kernel. Having such logic in user space would keep the algif_aead cleaner IMHO. Ciao Stephan -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] crypto: kpp - clear CRYPTO_ALG_DEAD bit in prepare_alg
On Mon, Jan 02, 2017 at 01:33:41PM +, Salvatore Benedetto wrote: > Make sure CRYPTO_ALG_DEAD is not set when preparing for > alg registration. This fixes qat-dh registration that occurs > when reloading qat_c62x module. > > Signed-off-by: Salvatore Benedetto > --- > crypto/kpp.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/crypto/kpp.c b/crypto/kpp.c > index d36ce05..d1adef8e 100644 > --- a/crypto/kpp.c > +++ b/crypto/kpp.c > @@ -101,6 +101,7 @@ static void kpp_prepare_alg(struct kpp_alg *alg) > > base->cra_type = &crypto_kpp_type; > base->cra_flags &= ~CRYPTO_ALG_TYPE_MASK; > + base->cra_flags &= ~CRYPTO_ALG_DEAD; > base->cra_flags |= CRYPTO_ALG_TYPE_KPP; > } Same comment as in https://patchwork.kernel.org/patch/9485115/ Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/13] crypto: copy AAD during encrypt for AEAD ciphers
On Thu, Jan 12, 2017 at 04:34:39PM +0100, Stephan Müller wrote: > > We would only be able to remove it if all AEAD implementations are converted. > But for the conversion time, we do face that issue. It doesn't matter. Nobody in the kernel uses that. In fact I wonder whether we should even do it for the kernel API. We only need it for the user-space API because it goes through read/write. > Are you suggesting that the entire data in the src SGL is first copied to the > dst SGL by algif_aead? If yes, that still requires significant src/dst SGL > tinkering as we have the tag -- the src SGL for encrypt does not have the tag > space where the dst SGL for encrypt is required to have the tag size. This is > vice versa for the decryption operation. It's really only a problem for decryption. In that case you can extend the dst SG list to include the tag. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/13] crypto: copy AAD during encrypt for AEAD ciphers
Am Donnerstag, 12. Januar 2017, 23:27:10 CET schrieb Herbert Xu: Hi Herbert, > On Thu, Jan 12, 2017 at 04:23:50PM +0100, Stephan Müller wrote: > > As far as I understand, we have the following AAD copy operations that we > > discuss: > > > > - algif_aead: you suggested to add the AAD copy operation here > > > > - AEAD implementations: over time, the AEAD implementations shall receive > > the AAD copy operation. The AAD copy operation shall only take place if > > the src SGL != dst SGL > > If and when that happens we'd simply need to remove the copy from > algif_aead code. We would only be able to remove it if all AEAD implementations are converted. But for the conversion time, we do face that issue. > But even if we didn't you wouldn't have two copies > because algif_aead should invoke the in-place code-path after doing > a full copy of src to dst. Are you suggesting that the entire data in the src SGL is first copied to the dst SGL by algif_aead? If yes, that still requires significant src/dst SGL tinkering as we have the tag -- the src SGL for encrypt does not have the tag space where the dst SGL for encrypt is required to have the tag size. This is vice versa for the decryption operation. And to me the most elegant solution seems adding the copy operation to crypto_aead_[en|de]crypt. Ciao Stephan -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/13] crypto: copy AAD during encrypt for AEAD ciphers
On Thu, Jan 12, 2017 at 04:23:50PM +0100, Stephan Müller wrote: > > As far as I understand, we have the following AAD copy operations that we > discuss: > > - algif_aead: you suggested to add the AAD copy operation here > > - AEAD implementations: over time, the AEAD implementations shall receive the > AAD copy operation. The AAD copy operation shall only take place if the src > SGL != dst SGL If and when that happens we'd simply need to remove the copy from algif_aead code. But even if we didn't you wouldn't have two copies because algif_aead should invoke the in-place code-path after doing a full copy of src to dst. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
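The in-place detection Herbert alludes to here and later in the thread (walking both SG lists and comparing each page/offset) can be sketched with a simplified stand-in for struct scatterlist; the struct and function names below are illustrative, not kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the check described in the thread: src and dst are
 * "in place" iff the two lists line up pairwise on page and offset. The
 * struct is a stand-in for struct scatterlist. */
struct sg_ent {
	const void *page;	/* backing page */
	size_t offset;		/* offset within the page */
};

static bool sgl_in_place(const struct sg_ent *src, const struct sg_ent *dst,
			 size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (src[i].page != dst[i].page ||
		    src[i].offset != dst[i].offset)
			return false;
	/* common case: n == 1, so this is a single comparison */
	return true;
}
```

As the thread notes, for the common single-entry list this degenerates to one page/offset comparison, so the cost of the check is negligible.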
Re: [PATCH 00/13] crypto: copy AAD during encrypt for AEAD ciphers
Am Donnerstag, 12. Januar 2017, 22:57:28 CET schrieb Herbert Xu:

Hi Herbert,

> On Thu, Jan 12, 2017 at 12:22:09PM +0100, Stephan Müller wrote:
> > When addressing the issue in the algif_aead code, and expecting that
> > over time the AEAD implementations will gain the copy operation,
> > eventually we will copy the AAD twice. Of course, this could be
> > prevented if the algif_aead code somehow uses the same SGL for the src
> > and dst AAD.
>
> Why would you copy it twice? You copy everything before you start
> and then just do in-place crypto.

As far as I understand, we are discussing the following AAD copy
operations:

- algif_aead: you suggested adding the AAD copy operation here

- AEAD implementations: over time, the AEAD implementations shall receive
  the AAD copy operation. The AAD copy operation shall only take place if
  the src SGL != dst SGL

The algif_aead code will *always* have the src SGL differ from the dst SGL.
Thus, any existing AEAD implementation copy will always be performed, in
addition to the algif_aead AAD copy operation. We can only avoid the AEAD
implementation copy if we add code to algif_aead to ensure that the AAD
data is pointed to by identical src/dst SGL entries. However, such logic to
make the src/dst SGLs identical is quite complex compared to simply using
one callback as suggested in the current patch set.

In the followup email, I suggested adding the AAD copy function invocation
to crypto_aead_encrypt. This way, we can drop the numerous patches to the
AEAD implementations and yet avoid adding such copy operation and src/dst
SGL unification logic to algif_aead.

> > > BTW, why are you only doing the copy for encryption?
> >
> > I was looking at the only AEAD implementation that does the copy
> > operation: authenc. There, the copy operation is only performed for
> > encryption. I was thinking a bit about why decryption was not covered.
> > I think the answer is the following: for encryption, the AAD is
> > definitely needed in the dst buffer, as the dst buffer with the AAD
> > must be sent to the recipient for decryption. The decryption and the
> > associated authentication only work with the AAD. However, after
> > decrypting, all the caller wants is the decrypted plaintext. There is
> > no further use for the AAD after the decryption step. Hence, copying
> > the AAD to the dst buffer in the decryption step would not serve the
> > caller.
>
> That's just the current implementation. If we're going to make
> this an API then we should do it for both directions.

Considering the suggestion above to add the AAD copy call to
crypto_aead_encrypt, we can add such a copy call to crypto_aead_decrypt as
well.

Ciao
Stephan
Re: x86-64: Maintain 16-byte stack alignment
On Thu, Jan 12, 2017 at 11:06:50PM +0800, Herbert Xu wrote: > On Thu, Jan 12, 2017 at 09:03:18AM -0600, Josh Poimboeuf wrote: > > > > For the entry code, could we just replace all calls with CALL_ALIGNED? > > That might be less intrusive than trying to adjust all the pt_regs > > accesses. > > > > Then to ensure that nobody ever uses 'call' directly: > > > > '#define call please-use-CALL-ALIGNED-instead-of-call' > > > > I think that would be possible if CALL_ALIGNED were a ".macro". > > The trouble with that is that you have got things like TRACE_IRQS_OFF > which are also used outside of the entry code. Where? As far as I can tell, TRACE_IRQS_OFF is used exclusively by entry code. -- Josh -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: x86-64: Maintain 16-byte stack alignment
On Thu, Jan 12, 2017 at 09:03:18AM -0600, Josh Poimboeuf wrote: > On Wed, Jan 11, 2017 at 11:51:10PM -0800, Andy Lutomirski wrote: > > On Wed, Jan 11, 2017 at 11:05 PM, Herbert Xu > > wrote: > > > On Tue, Jan 10, 2017 at 09:05:28AM -0800, Linus Torvalds wrote: > > >> > > >> I'm pretty sure we have random asm code that may not maintain a > > >> 16-byte stack alignment when it calls other code (including, in some > > >> cases, calling C code). > > >> > > >> So I'm not at all convinced that this is a good idea. We shouldn't > > >> expect 16-byte alignment to be something trustworthy. > > > > > > So what if we audited all the x86 assembly code to fix this? Would > > > it then be acceptable to do a 16-byte aligned stack? > > > > > > On the face of it it doesn't seem to be a huge amount of code > > > assuming they mostly live under arch/x86. > > > > The problem is that we have nasties like TRACE_IRQS_OFF. Performance > > doesn't really matter for these macros, so we could probably rig up a > > helper for forcibly align the stack there. Maybe > > FRAME_BEGIN_FORCE_ALIGN? I also think I'd rather not to modify > > pt_regs. We should just fix the small number of code paths that > > create a pt_regs and then call into C code to align the stack. > > > > But if we can't do this with automatic verification, then I'm not sure > > I want to do it at all. The asm is already more precarious than I'd > > like, and having a code path that is misaligned is asking for obscure > > bugs down the road. > > For the entry code, could we just replace all calls with CALL_ALIGNED? > That might be less intrusive than trying to adjust all the pt_regs > accesses. > > Then to ensure that nobody ever uses 'call' directly: > > '#define call please-use-CALL-ALIGNED-instead-of-call' > > I think that would be possible if CALL_ALIGNED were a ".macro". 
To clarify, CALL_ALIGNED could be (completely untested):

.macro CALL_ALIGNED \func
	push	%rbp
	movq	%rsp, %rbp
	and	$0xfffffffffffffff0, %rsp
	call	\func
	movq	%rbp, %rsp
	pop	%rbp
.endm

--
Josh
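The `and` in the macro above is just rounding the stack pointer down to a 16-byte boundary. The same masking, expressed in C for illustration:

```c
#include <stdint.h>

/* Round an address down to the next 16-byte boundary by masking off
 * the low four bits -- the C equivalent of the macro's
 * 'and $0xfffffffffffffff0, %rsp'. */
static uintptr_t align_down_16(uintptr_t sp)
{
        return sp & ~(uintptr_t)0xf;
}
```

Saving %rsp in %rbp first is what lets the macro undo the rounding after the call, since the number of bytes dropped varies.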
Re: x86-64: Maintain 16-byte stack alignment
On Wed, Jan 11, 2017 at 11:51:10PM -0800, Andy Lutomirski wrote: > On Wed, Jan 11, 2017 at 11:05 PM, Herbert Xu > wrote: > > On Tue, Jan 10, 2017 at 09:05:28AM -0800, Linus Torvalds wrote: > >> > >> I'm pretty sure we have random asm code that may not maintain a > >> 16-byte stack alignment when it calls other code (including, in some > >> cases, calling C code). > >> > >> So I'm not at all convinced that this is a good idea. We shouldn't > >> expect 16-byte alignment to be something trustworthy. > > > > So what if we audited all the x86 assembly code to fix this? Would > > it then be acceptable to do a 16-byte aligned stack? > > > > On the face of it it doesn't seem to be a huge amount of code > > assuming they mostly live under arch/x86. > > The problem is that we have nasties like TRACE_IRQS_OFF. Performance > doesn't really matter for these macros, so we could probably rig up a > helper for forcibly align the stack there. Maybe > FRAME_BEGIN_FORCE_ALIGN? I also think I'd rather not to modify > pt_regs. We should just fix the small number of code paths that > create a pt_regs and then call into C code to align the stack. > > But if we can't do this with automatic verification, then I'm not sure > I want to do it at all. The asm is already more precarious than I'd > like, and having a code path that is misaligned is asking for obscure > bugs down the road. For the entry code, could we just replace all calls with CALL_ALIGNED? That might be less intrusive than trying to adjust all the pt_regs accesses. Then to ensure that nobody ever uses 'call' directly: '#define call please-use-CALL-ALIGNED-instead-of-call' I think that would be possible if CALL_ALIGNED were a ".macro". -- Josh -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH -next] crypto: mediatek - make symbol of_crypto_id static
From: Wei Yongjun Fixes the following sparse warning: drivers/crypto/mediatek/mtk-platform.c:585:27: warning: symbol 'of_crypto_id' was not declared. Should it be static? Signed-off-by: Wei Yongjun --- drivers/crypto/mediatek/mtk-platform.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/crypto/mediatek/mtk-platform.c b/drivers/crypto/mediatek/mtk-platform.c index 286296f..a9c713d 100644 --- a/drivers/crypto/mediatek/mtk-platform.c +++ b/drivers/crypto/mediatek/mtk-platform.c @@ -582,7 +582,7 @@ static int mtk_crypto_remove(struct platform_device *pdev) return 0; } -const struct of_device_id of_crypto_id[] = { +static const struct of_device_id of_crypto_id[] = { { .compatible = "mediatek,eip97-crypto" }, {}, }; -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
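The reason sparse flags this is linkage: a file-scope object without `static` has external linkage, so sparse expects a declaration in a header. A small illustration (stand-in names, not the MediaTek driver code):

```c
#include <stddef.h>

/* A file-scope table used only in this translation unit gets internal
 * linkage via 'static': it cannot clash with a same-named symbol in
 * another file, and sparse stops asking for an extern declaration. */
static const char *const compat_table[] = {
        "mediatek,eip97-crypto",
        NULL,                  /* sentinel, like the empty {} entry */
};

static size_t table_len(void)
{
        size_t n = 0;

        while (compat_table[n])
                n++;
        return n;
}
```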
Re: x86-64: Maintain 16-byte stack alignment
On Thu, Jan 12, 2017 at 09:03:18AM -0600, Josh Poimboeuf wrote:
>
> For the entry code, could we just replace all calls with CALL_ALIGNED?
> That might be less intrusive than trying to adjust all the pt_regs
> accesses.
>
> Then to ensure that nobody ever uses 'call' directly:
>
> '#define call please-use-CALL-ALIGNED-instead-of-call'
>
> I think that would be possible if CALL_ALIGNED were a ".macro".

The trouble with that is that you have got things like TRACE_IRQS_OFF
which are also used outside of the entry code.

Cheers,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 00/13] crypto: copy AAD during encrypt for AEAD ciphers
On Thu, Jan 12, 2017 at 12:22:09PM +0100, Stephan Müller wrote: > > When addressing the issue in the algif_aead code, and expect that over time > the AEAD implementations will gain the copy operation, eventually we will > copy > the AAD twice. Of course, this could be prevented, if the algif_aead code > somehow uses the same SGL for the src and dst AAD. Why would you copy it twice? You copy everything before you start and then just do in-place crypto. > > BTW, why are you only doing the copy for encryption? > > I was looking at the only AEAD implementation that does the copy operation: > authenc. There, the copy operation is only performed for encryption. I was > thinking a bit about why decryption was not covered. I think the answer is > the > following: for encryption, the AAD is definitely needed in the dst buffer as > the dst buffer with the AAD must be sent to the recipient for decryption. The > decryption and the associated authentication only works with the AAD. > However, > after decrypting, all the caller wants is the decrypted plaintext only. There > is no further use of the AAD after the decryption step. Hence, copying the > AAD > to the dst buffer in the decryption step would not serve the caller. That's just the current implementation. If we're going to make this an API then we should do it for both directions. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: x86-64: Maintain 16-byte stack alignment
On Thu, Jan 12, 2017 at 08:46:01AM +0100, Ingo Molnar wrote: > > * Herbert Xu wrote: > > > On Tue, Jan 10, 2017 at 09:05:28AM -0800, Linus Torvalds wrote: > > > > > > I'm pretty sure we have random asm code that may not maintain a > > > 16-byte stack alignment when it calls other code (including, in some > > > cases, calling C code). > > > > > > So I'm not at all convinced that this is a good idea. We shouldn't > > > expect 16-byte alignment to be something trustworthy. > > > > So what if we audited all the x86 assembly code to fix this? Would > > it then be acceptable to do a 16-byte aligned stack? > > Audits for small but deadly details that isn't checked automatically by > tooling > would inevitably bitrot again - and in this particular case there's a 50% > chance > that a new, buggy change would test out to be 'fine' on a kernel developer's > own > box - and break on different configs, different hw or with unrelated (and > innocent) kernel changes, sometime later - spreading the pain unnecessarily. > > So my feeling is that we really need improved tooling for this (and yes, the > GCC > toolchain should have handled this correctly). > > But fortunately we have related tooling in the kernel: could objtool handle > this? > My secret hope was always that objtool would grow into a kind of life > insurance > against toolchain bogosities (which is a must for things like livepatching or > a > DWARF unwinder - but I digress). Are we talking about entry code, or other asm code? Because objtool audits *most* asm code, but entry code is way too specialized for objtool to understand. (I do have a pending objtool rewrite which would make it very easy to ensure 16-byte stack alignment. But again, objtool can only understand callable C or asm functions, not entry code.) Another approach would be to solve this problem with unwinder warnings, *if* there's enough test coverage. 
I recently made some changes to try to standardize the "end" of the stack, so that the stack pointer is always a certain value before calling into C code. I also added some warnings to the unwinder to ensure that it always reaches that point on the stack. So if the "end" of the stack were adjusted by a word by adding padding to pt_regs, the unwinder warnings could help preserve that. We could take that a step further by adding an unwinder check to ensure that *every* frame is 16-byte aligned if -mpreferred-stack-boundary=3 isn't used. Yet another step would be to add a debug feature which does stack sanity checking from a periodic NMI, to flush out these unwinder warnings. (Though I've found that current 0-day and fuzzing efforts, combined with lockdep and perf's frequent unwinder usage, are already doing a great job at flushing out unwinder warnings.) The only question is if there would be enough test coverage, particularly with those versions of gcc which don't have -mpreferred-stack-boundary=3. -- Josh -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
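The invariant the unwinder would be checking is simple arithmetic. At every call site %rsp must be 16-byte aligned; the `call` pushes an 8-byte return address, so on function entry %rsp is 8 mod 16, and the callee's total frame allocation (pushes plus `sub`) must itself be 8 mod 16 to realign the stack before its own calls. A sketch of that check:

```c
/* Check whether a statically known frame size preserves 16-byte
 * stack alignment: entry %rsp is 8 mod 16 (because 'call' pushed an
 * 8-byte return address), so the allocation must add another
 * 8 mod 16 bytes to reach a 16-byte boundary. */
static int frame_keeps_alignment(unsigned long bytes_allocated)
{
        return (8 + bytes_allocated) % 16 == 0;
}
```

This is the kind of per-frame rule objtool (or an unwinder warning) could verify mechanically for functions whose frame size is statically known.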
Re: [PATCH v8 1/1] crypto: add virtio-crypto driver
On Thu, Jan 12, 2017 at 03:10:25PM +0100, Christian Borntraeger wrote: > On 01/10/2017 01:56 PM, Christian Borntraeger wrote: > > On 01/10/2017 01:36 PM, Gonglei (Arei) wrote: > >> Hi, > >> > >>> > >>> On 12/15/2016 03:03 AM, Gonglei wrote: > >>> [...] > + > +static struct crypto_alg virtio_crypto_algs[] = { { > +.cra_name = "cbc(aes)", > +.cra_driver_name = "virtio_crypto_aes_cbc", > +.cra_priority = 501, > >>> > >>> > >>> This is still higher than the hardware-accelerators (like intel aesni or > >>> the > >>> s390 cpacf functions or the arm hw). aesni and s390/cpacf are supported > >>> by the > >>> hardware virtualization and available to the guests. I do not see a way > >>> how > >>> virtio > >>> crypto can be faster than that (in the end it might be cpacf/aesni + > >>> overhead) > >>> instead it will very likely be slower. > >>> So we should use a number that is higher than software implementations but > >>> lower than the hw ones. > >>> > >>> Just grepping around, the software ones seem be be around 100 and the > >>> hardware > >>> ones around 200-400. So why was 150 not enough? > >>> > >> I didn't find a documentation about how we use the priority, and I assumed > >> people use virtio-crypto will configure hardware accelerators in the > >> host. So I choosed the number which bigger than aesni's priority. > > > > Yes, but the aesni driver will only bind if there is HW support in the > > guest. > > And if aesni is available in the guest (or the s390 aes function from cpacf) > > it will always be faster than the same in the host via virtio.So your > > priority > > should be smaller. > > > any opinion on this? Going forward, we might add an emulated aesni device and that might become slower than virtio. OTOH if or when this happens, we can solve it by adding a priority or a feature flag to virtio to raise its priority. So I think I agree with Christian here, let's lower the priority. Gonglei, could you send a patch like this? 
--
MST
Re: [PATCH v8 1/1] crypto: add virtio-crypto driver
On 01/10/2017 01:56 PM, Christian Borntraeger wrote: > On 01/10/2017 01:36 PM, Gonglei (Arei) wrote: >> Hi, >> >>> >>> On 12/15/2016 03:03 AM, Gonglei wrote: >>> [...] + +static struct crypto_alg virtio_crypto_algs[] = { { + .cra_name = "cbc(aes)", + .cra_driver_name = "virtio_crypto_aes_cbc", + .cra_priority = 501, >>> >>> >>> This is still higher than the hardware-accelerators (like intel aesni or the >>> s390 cpacf functions or the arm hw). aesni and s390/cpacf are supported by >>> the >>> hardware virtualization and available to the guests. I do not see a way how >>> virtio >>> crypto can be faster than that (in the end it might be cpacf/aesni + >>> overhead) >>> instead it will very likely be slower. >>> So we should use a number that is higher than software implementations but >>> lower than the hw ones. >>> >>> Just grepping around, the software ones seem be be around 100 and the >>> hardware >>> ones around 200-400. So why was 150 not enough? >>> >> I didn't find a documentation about how we use the priority, and I assumed >> people use virtio-crypto will configure hardware accelerators in the >> host. So I choosed the number which bigger than aesni's priority. > > Yes, but the aesni driver will only bind if there is HW support in the guest. > And if aesni is available in the guest (or the s390 aes function from cpacf) > it will always be faster than the same in the host via virtio.So your priority > should be smaller. any opinion on this? -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
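The selection behind this whole discussion: when several implementations register the same cra_name, the crypto API prefers the highest cra_priority. A toy model (the numbers are illustrative — software around 100, virtio lowered to 150, hardware around 200-400 — not values from any real driver table):

```c
#include <stddef.h>

struct alg {
        const char *driver;
        int priority;
};

/* Toy model of priority-based selection: among implementations of
 * the same algorithm name, the highest cra_priority wins. */
static const struct alg *pick(const struct alg *algs, size_t n)
{
        const struct alg *best = NULL;
        size_t i;

        for (i = 0; i < n; i++)
                if (!best || algs[i].priority > best->priority)
                        best = &algs[i];
        return best;
}
```

With virtio at 150, a guest that also sees aesni (~300) keeps using aesni, while a guest with only software implementations (~100) falls back to virtio — exactly the ordering Christian is asking for.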
Re: [PATCH v4 3/3] drivers: crypto: Enable CPT options crypto for build
Hi George, [auto build test WARNING on v4.9-rc8] [cannot apply to cryptodev/master crypto/master sparc-next/master next-20170111] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/George-Cherian/Add-Support-for-Cavium-Cryptographic-Acceleration-Unit/20170112-192240 config: parisc-allyesconfig (attached as .config) compiler: hppa-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705 reproduce: wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=parisc All warnings (new ones prefixed by >>): drivers/crypto/cavium/cpt/cptvf_reqmanager.c: In function 'process_request': >> drivers/crypto/cavium/cpt/cptvf_reqmanager.c:470:49: warning: format '%lu' >> expects argument of type 'long unsigned int', but argument 3 has type >> 'unsigned int' [-Wformat=] dev_err(&pdev->dev, "mapping compptr Failed %lu\n", ^ vim +470 drivers/crypto/cavium/cpt/cptvf_reqmanager.c 8413476d George Cherian 2017-01-11 454 goto request_cleanup; 8413476d George Cherian 2017-01-11 455 } 8413476d George Cherian 2017-01-11 456 8413476d George Cherian 2017-01-11 457 cpt_req->dlen = info->dlen; 8413476d George Cherian 2017-01-11 458 /* 8413476d George Cherian 2017-01-11 459 * Get buffer for union cpt_res_s response 8413476d George Cherian 2017-01-11 460 * structure and its physical address 8413476d George Cherian 2017-01-11 461 */ 8413476d George Cherian 2017-01-11 462 info->completion_addr = kzalloc(sizeof(union cpt_res_s), 8413476d George Cherian 2017-01-11 463 GFP_KERNEL | GFP_ATOMIC); 8413476d George Cherian 2017-01-11 464 *((u8 *)(info->completion_addr)) = COMPLETION_CODE_INIT; 8413476d George Cherian 2017-01-11 465 info->comp_baddr = dma_map_single(&pdev->dev, 8413476d George Cherian 2017-01-11 466 (void *)info->completion_addr, 8413476d George Cherian 2017-01-11 467 
sizeof(union cpt_res_s),
8413476d George Cherian 2017-01-11  468				  DMA_BIDIRECTIONAL);
8413476d George Cherian 2017-01-11  469	if (dma_mapping_error(&pdev->dev, info->comp_baddr)) {
8413476d George Cherian 2017-01-11 @470		dev_err(&pdev->dev, "mapping compptr Failed %lu\n",
8413476d George Cherian 2017-01-11  471			sizeof(union cpt_res_s));
8413476d George Cherian 2017-01-11  472		ret = -EFAULT;
8413476d George Cherian 2017-01-11  473		goto request_cleanup;
8413476d George Cherian 2017-01-11  474	}
8413476d George Cherian 2017-01-11  475
8413476d George Cherian 2017-01-11  476	/* Fill the VQ command */
8413476d George Cherian 2017-01-11  477	vq_cmd.cmd.u64 = 0;
8413476d George Cherian 2017-01-11  478	vq_cmd.cmd.s.opcode = cpu_to_be16(cpt_req->opcode.flags);

:: The code at line 470 was first introduced by commit
:: 8413476deed83359518ea36cc316f4669a8c521c drivers: crypto: Add the Virtual Function driver for CPT

:: TO: George Cherian
:: CC: 0day robot

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation
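The robot's warning is the classic size_t format mismatch: `sizeof` has type `size_t`, whose portable printf length modifier is `z`, while `%lu` is only correct where `long` happens to match `size_t` (it does not on 32-bit parisc, hence the warning). A minimal userspace illustration of the fix — the union here is a stand-in with a known size, not the real cpt_res_s:

```c
#include <stdio.h>

/* Stand-in for a response structure with a known 32-byte size. */
union res_demo {
        unsigned long long u64;
        unsigned char raw[32];
};

/* Print a size_t with the 'z' length modifier instead of %lu, which
 * is what silences -Wformat on all architectures. */
static int format_msg(char *out, size_t outsz)
{
        return snprintf(out, outsz, "mapping compptr Failed %zu\n",
                        sizeof(union res_demo));
}
```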
Re: x86-64: Maintain 16-byte stack alignment
On Wed, Jan 11, 2017 at 10:21:07PM -0800, Andy Lutomirski wrote: > On Tue, Jan 10, 2017 at 10:01 PM, Andy Lutomirski wrote: > > On Tue, Jan 10, 2017 at 8:35 PM, Herbert Xu > > wrote: > >> On Tue, Jan 10, 2017 at 08:17:17PM -0800, Linus Torvalds wrote: > >>> > >>> That said, I do think that the "don't assume stack alignment, do it by > >>> hand" may be the safer thing. Because who knows what the random rules > >>> will be on other architectures. > >> > >> Sure we can ban the use of attribute aligned on stacks. But > >> what about indirect uses through structures? For example, if > >> someone does > >> > >> struct foo { > >> } __attribute__ ((__aligned__(16))); > >> > >> int bar(...) > >> { > >> struct foo f; > >> > >> return baz(&f); > >> } > >> > >> then baz will end up with an unaligned argument. The worst part > >> is that it is not at all obvious to the person writing the function > >> bar. > > > > Linus, I'm starting to lean toward agreeing with Herbert here, except > > that we should consider making it conditional on having a silly GCC > > version. After all, the silly GCC versions are wasting space and time > > with alignment instructions no matter what we do, so this would just > > mean tweaking the asm and adding some kind of check_stack_alignment() > > helper to throw out a WARN_ONCE() if we miss one. The problem with > > making it conditional is that making pt_regs effectively live at a > > variable offset from %rsp is just nasty. > > So actually doing this is gross because we have calls from asm to C > all over the place. But... maybe we can automate all the testing. > Josh, how hard would it be to teach objtool to (if requested by an > option) check that stack frames with statically known size preserve > 16-byte stack alignment? > > I find it rather annoying that gcc before 4.8 malfunctions when it > sees __aligned__(16) on x86_64 kernels. Sigh. 
Just to clarify, I think you're asking if, for versions of gcc which
don't support -mpreferred-stack-boundary=3, objtool can analyze all C
functions to ensure their stacks are 16-byte aligned.

It's certainly possible, but I don't see how that solves the problem.
The stack will still be misaligned by entry code.

Or am I missing something?

--
Josh
[PATCH] crypto: testmgr - use calculated count for number of test vectors
When working on AES in CCM mode for ARM, my code passed the internal tcrypt test before I had even bothered to implement the AES-192 and AES-256 code paths, which is strange because the tcrypt does contain AES-192 and AES-256 test vectors for CCM. As it turned out, the define AES_CCM_ENC_TEST_VECTORS was out of sync with the actual number of test vectors, causing only the AES-128 ones to be executed. So get rid of the defines, and wrap the test vector references in a macro that calculates the number of vectors automatically. The following test vector counts were out of sync with the respective defines: BF_CTR_ENC_TEST_VECTORS 2 -> 3 BF_CTR_DEC_TEST_VECTORS 2 -> 3 TF_CTR_ENC_TEST_VECTORS 2 -> 3 TF_CTR_DEC_TEST_VECTORS 2 -> 3 SERPENT_CTR_ENC_TEST_VECTORS 2 -> 3 SERPENT_CTR_DEC_TEST_VECTORS 2 -> 3 AES_CCM_ENC_TEST_VECTORS 8 -> 14 AES_CCM_DEC_TEST_VECTORS 7 -> 17 AES_CCM_4309_ENC_TEST_VECTORS7 -> 23 AES_CCM_4309_DEC_TEST_VECTORS 10 -> 23 CAMELLIA_CTR_ENC_TEST_VECTORS2 -> 3 CAMELLIA_CTR_DEC_TEST_VECTORS2 -> 3 Signed-off-by: Ard Biesheuvel --- crypto/testmgr.c | 1033 +++--- crypto/testmgr.h | 272 +- 2 files changed, 204 insertions(+), 1101 deletions(-) diff --git a/crypto/testmgr.c b/crypto/testmgr.c index 44e888b0b041..b49d57826f4a 100644 --- a/crypto/testmgr.c +++ b/crypto/testmgr.c @@ -2251,30 +2251,23 @@ static int alg_test_null(const struct alg_test_desc *desc, return 0; } +#define __VECS(tv) { .vecs = tv, .count = ARRAY_SIZE(tv) } + /* Please keep this list sorted by algorithm name. 
*/ static const struct alg_test_desc alg_test_descs[] = { { .alg = "ansi_cprng", .test = alg_test_cprng, .suite = { - .cprng = { - .vecs = ansi_cprng_aes_tv_template, - .count = ANSI_CPRNG_AES_TEST_VECTORS - } + .cprng = __VECS(ansi_cprng_aes_tv_template) } }, { .alg = "authenc(hmac(md5),ecb(cipher_null))", .test = alg_test_aead, .suite = { .aead = { - .enc = { - .vecs = hmac_md5_ecb_cipher_null_enc_tv_template, - .count = HMAC_MD5_ECB_CIPHER_NULL_ENC_TEST_VECTORS - }, - .dec = { - .vecs = hmac_md5_ecb_cipher_null_dec_tv_template, - .count = HMAC_MD5_ECB_CIPHER_NULL_DEC_TEST_VECTORS - } + .enc = __VECS(hmac_md5_ecb_cipher_null_enc_tv_template), + .dec = __VECS(hmac_md5_ecb_cipher_null_dec_tv_template) } } }, { @@ -2282,12 +2275,7 @@ static const struct alg_test_desc alg_test_descs[] = { .test = alg_test_aead, .suite = { .aead = { - .enc = { - .vecs = - hmac_sha1_aes_cbc_enc_tv_temp, - .count = - HMAC_SHA1_AES_CBC_ENC_TEST_VEC - } + .enc = __VECS(hmac_sha1_aes_cbc_enc_tv_temp) } } }, { @@ -2295,12 +2283,7 @@ static const struct alg_test_desc alg_test_descs[] = { .test = alg_test_aead, .suite = { .aead = { - .enc = { - .vecs = - hmac_sha1_des_cbc_enc_tv_temp, - .count = - HMAC_SHA1_DES_CBC_ENC_TEST_VEC - } + .enc = __VECS(hmac_sha1_des_cbc_enc_tv_temp) } } }, { @@ -2309,12 +2292,7 @@ static const struct alg_test_desc alg_test_descs[] = { .fips_allowed = 1, .suite = { .aead = { - .enc = { - .vecs = - hmac_sha1_des3_ede_cbc_enc_tv_temp, - .count = - HMAC_SHA1_DES3_EDE_CBC_ENC_TEST_VEC - } + .enc = __VECS(hmac_sha1_des3_ede_cbc_enc_tv_temp) } } },
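The `__VECS` trick above is a general C pattern: derive the element count from the array itself with `ARRAY_SIZE`, so the count can never drift out of sync with the vectors the way the hand-maintained defines did. A compact userspace illustration (the names and vector type are stand-ins):

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct vec_set {
        const int *vecs;
        size_t count;
};

/* Same shape as the testmgr macro: initialize both the pointer and
 * the count from a single array argument. */
#define __VECS(tv) { .vecs = tv, .count = ARRAY_SIZE(tv) }

static const int demo_tv[] = { 10, 20, 30 };   /* stand-in vectors */
static const struct vec_set demo_set = __VECS(demo_tv);
```

Adding or removing a test vector now updates `count` automatically at compile time, which is exactly the bug class the patch eliminates.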
Re: [PATCH v4 3/3] drivers: crypto: Enable CPT options crypto for build
Hi George, [auto build test WARNING on v4.9-rc8] [cannot apply to cryptodev/master crypto/master sparc-next/master next-20170111] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/George-Cherian/Add-Support-for-Cavium-Cryptographic-Acceleration-Unit/20170112-192240 coccinelle warnings: (new ones prefixed by >>) >> drivers/crypto/cavium/cpt/cptvf_reqmanager.c:312:2-8: WARNING: NULL check >> before freeing functions like kfree, debugfs_remove, >> debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider >> reorganizing relevant code to avoid passing NULL values. drivers/crypto/cavium/cpt/cptvf_reqmanager.c:315:2-8: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values. drivers/crypto/cavium/cpt/cptvf_reqmanager.c:318:2-8: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values. drivers/crypto/cavium/cpt/cptvf_reqmanager.c:321:2-8: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values. Please review and possibly fold the followup patch. --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] drivers: crypto: fix ifnullfree.cocci warnings
drivers/crypto/cavium/cpt/cptvf_reqmanager.c:312:2-8: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values. drivers/crypto/cavium/cpt/cptvf_reqmanager.c:315:2-8: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values. drivers/crypto/cavium/cpt/cptvf_reqmanager.c:318:2-8: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values. drivers/crypto/cavium/cpt/cptvf_reqmanager.c:321:2-8: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values. NULL check before some freeing functions is not needed. Based on checkpatch warning "kfree(NULL) is safe this check is probably not required" and kfreeaddr.cocci by Julia Lawall. 
Generated by: scripts/coccinelle/free/ifnullfree.cocci CC: George Cherian Signed-off-by: Fengguang Wu --- cptvf_reqmanager.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) --- a/drivers/crypto/cavium/cpt/cptvf_reqmanager.c +++ b/drivers/crypto/cavium/cpt/cptvf_reqmanager.c @@ -308,17 +308,13 @@ void do_request_cleanup(struct cpt_vf *c } } - if (info->scatter_components) - kzfree(info->scatter_components); + kzfree(info->scatter_components); - if (info->gather_components) - kzfree(info->gather_components); + kzfree(info->gather_components); - if (info->out_buffer) - kzfree(info->out_buffer); + kzfree(info->out_buffer); - if (info->in_buffer) - kzfree(info->in_buffer); + kzfree(info->in_buffer); if (info->completion_addr) kzfree((void *)info->completion_addr); -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
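The userspace analogue of the coccinelle finding: `free(NULL)` is defined by the C standard to be a no-op, just as `kfree(NULL)` and `kzfree(NULL)` are in the kernel, so the `if (ptr)` guard before each call adds nothing. A small sketch:

```c
#include <stdlib.h>

/* Free a pointer without a redundant NULL check, then clear it so a
 * second call is harmless (no double free). */
static void release(void **p)
{
        free(*p);      /* safe even when *p is NULL */
        *p = NULL;
}
```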
[RFC PATCH 6/6] dm-crypt: Add bulk crypto processing support
This patch converts dm-crypt to use bulk requests when invoking skcipher
operations, allowing the crypto drivers to process multiple sectors at once,
while reducing the overhead caused by the small sector size.

The new code detects if multiple sectors from a bio are contiguously stored
within a single page (which should almost always be the case), and in such a
case processes all these sectors via a single bulk request.

Note that the bio can also consist of several (likely consecutive) pages,
which could all be bundled in a single request. However, since we need to
specify an upper bound on how many sectors we are going to send at once (and
this bound may affect the amount of memory allocated per single request), it
is best to just limit the request bundling to a single page.

Note that if the 'keycount' parameter of the cipher specification is set to a
value other than 1, dm-crypt still sends only one sector in each request,
since in such a case the neighboring sectors are encrypted with different
keys.

This change causes a detectable read/write speedup (about 5-10%) on a ramdisk
when AES-NI accelerated ciphers are used.
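The single-page bound works out to a small constant: with the usual 4 KiB pages and 512-byte sectors, at most eight consecutive sectors fit in one page. The shifts are hard-coded below for illustration; the kernel takes PAGE_SHIFT and SECTOR_SHIFT from its own headers.

```c
/* Upper bound on sectors bundled into one bulk request when bundling
 * is limited to a single page: page_size / sector_size. */
#define DEMO_PAGE_SHIFT   12   /* 4096-byte page */
#define DEMO_SECTOR_SHIFT  9   /* 512-byte sector */
#define DEMO_MAX_CONSEC_SECTORS (1 << (DEMO_PAGE_SHIFT - DEMO_SECTOR_SHIFT))
```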
Signed-off-by: Ondrej Mosnacek --- drivers/md/dm-crypt.c | 254 -- 1 file changed, 165 insertions(+), 89 deletions(-) diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c index 7c6c572..d3f69e1 100644 --- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -37,6 +37,9 @@ #define DM_MSG_PREFIX "crypt" +/* for now, we only bundle consecutve sectors within a single page */ +#define MAX_CONSEC_SECTORS (1 << (PAGE_SHIFT - SECTOR_SHIFT)) + /* * context holding the current state of a multi-part conversion */ @@ -48,7 +51,7 @@ struct convert_context { struct bvec_iter iter_out; sector_t cc_sector; atomic_t cc_pending; - struct skcipher_request *req; + struct skcipher_bulk_request *req; }; /* @@ -73,6 +76,7 @@ struct dm_crypt_request { struct scatterlist sg_in; struct scatterlist sg_out; sector_t iv_sector; + sector_t sector_count; }; struct crypt_config; @@ -83,9 +87,9 @@ struct crypt_iv_operations { void (*dtr)(struct crypt_config *cc); int (*init)(struct crypt_config *cc); int (*wipe)(struct crypt_config *cc); - int (*generator)(struct crypt_config *cc, u8 *iv, + int (*generator)(struct crypt_config *cc, u8 *iv, unsigned int sector, struct dm_crypt_request *dmreq); - int (*post)(struct crypt_config *cc, u8 *iv, + int (*post)(struct crypt_config *cc, u8 *iv, unsigned int sector, struct dm_crypt_request *dmreq); }; @@ -163,14 +167,14 @@ struct crypt_config { /* * Layout of each crypto request: * -* struct skcipher_request +* struct skcipher_bulk_request * context * padding * struct dm_crypt_request * padding -* IV +* IVs * -* The padding is added so that dm_crypt_request and the IV are +* The padding is added so that dm_crypt_request and the IVs are * correctly aligned. 
*/ unsigned int dmreq_start; @@ -245,20 +249,24 @@ static struct crypto_skcipher *any_tfm(struct crypt_config *cc) * http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/454 */ -static int crypt_iv_plain_gen(struct crypt_config *cc, u8 *iv, +static int crypt_iv_plain_gen(struct crypt_config *cc, u8 *ivs, unsigned int i, struct dm_crypt_request *dmreq) { + u8 *iv = ivs + i * cc->iv_size; + memset(iv, 0, cc->iv_size); - *(__le32 *)iv = cpu_to_le32(dmreq->iv_sector & 0x); + *(__le32 *)iv = cpu_to_le32((dmreq->iv_sector + i) & 0x); return 0; } -static int crypt_iv_plain64_gen(struct crypt_config *cc, u8 *iv, - struct dm_crypt_request *dmreq) +static int crypt_iv_plain64_gen(struct crypt_config *cc, u8 *ivs, + unsigned int i, struct dm_crypt_request *dmreq) { + u8 *iv = ivs + i * cc->iv_size; + memset(iv, 0, cc->iv_size); - *(__le64 *)iv = cpu_to_le64(dmreq->iv_sector); + *(__le64 *)iv = cpu_to_le64(dmreq->iv_sector + i); return 0; } @@ -410,13 +418,14 @@ static int crypt_iv_essiv_ctr(struct crypt_config *cc, struct dm_target *ti, return err; } -static int crypt_iv_essiv_gen(struct crypt_config *cc, u8 *iv, +static int crypt_iv_essiv_gen(struct crypt_config *cc, u8 *ivs, unsigned int i, struct dm_crypt_request *dmreq) { struct crypto_cipher *essiv_tfm = cc->iv_private; + u8 *iv = ivs + i * cc->iv_size; memset(iv, 0, cc->iv_size); - *(__le64 *)iv = cpu_to_le64(dmreq->iv_sector); + *(__le64 *)iv = cpu_to_le64(dmreq->iv_sector + i); crypto_cipher_encr
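The IV generators in the diff follow one scheme: the IV is the 64-bit sector number stored little-endian in the leading bytes, rest zeroed, and the bulk variant adds the message index `i` to the base sector so each bundled sector gets its own IV. A self-contained sketch of the plain64 case (byte-wise stores instead of the kernel's cpu_to_le64, same result):

```c
#include <stdint.h>
#include <string.h>

/* plain64-style IV: zero the IV, then store (base_sector + i) as a
 * little-endian 64-bit value in the first eight bytes. */
static void plain64_iv(uint8_t *iv, size_t ivsize,
                       uint64_t base_sector, unsigned int i)
{
        uint64_t s = base_sector + i;
        size_t b;

        memset(iv, 0, ivsize);
        for (b = 0; b < 8 && b < ivsize; b++)
                iv[b] = (uint8_t)(s >> (8 * b));
}
```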
[RFC PATCH 1/6] crypto: skcipher - Add bulk request processing API
This patch adds bulk request processing to the skcipher interface. Specifically, it adds a new type of request ('skcipher_bulk_request'), which allows passing multiple independent messages to the skcipher driver. The buffers for the message data are passed via just two sg lists (one for src buffer, one for dst buffer). The IVs are passed via a single buffer, where they are stored sequentially. The interface allows specifying either a fixed length for all messages or a pointer to an array of message lengths. A skcipher implementation that wants to provide support for bulk requests may set the appropriate fields of its skcipher_alg struct. If these fields are not provided (or the skcipher is created from an (a)blkcipher), the crypto API automatically sets these fields to a fallback implementation, which just splits the bulk request into a series of regular skcipher requests on the same tfm. This means that the new type of request can be used with all skciphers, even if they do not support bulk requests natively. Note that when allocating a skcipher_bulk_request, the user must specify the maximum number of messages that they are going to submit via the request. This is necessary for the fallback implementation, which has to allocate space for the appropriate number of subrequests so that they can be processed in parallel. If the skcipher is synchronous, then the fallback implementation only allocates space for one subrequest and processes the partial requests sequentially. 
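The fixed-size-or-array message-length convention described above can be sketched in a few lines. This is a hedged userspace illustration of how the fallback might resolve each message's length; `bulk_msg_len` and `bulk_total_len` are illustrative names, not part of the proposed API:

```c
#include <assert.h>

/* Either all messages share one fixed size (msgsizes == NULL), or each
 * message has its own entry in msgsizes[]. */
static unsigned int bulk_msg_len(unsigned int fixed,
                                 const unsigned int *msgsizes,
                                 unsigned int i)
{
    return msgsizes ? msgsizes[i] : fixed;
}

/* Total payload of the bulk request, as the fallback would accumulate it
 * while issuing one regular skcipher request per message. */
static unsigned int bulk_total_len(unsigned int nmsgs, unsigned int fixed,
                                   const unsigned int *msgsizes)
{
    unsigned int i, total = 0;

    for (i = 0; i < nmsgs; i++)
        total += bulk_msg_len(fixed, msgsizes, i);
    return total;
}
```

The same two-case lookup is what lets dm-crypt submit equally sized sectors without allocating a length array at all.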
Signed-off-by: Ondrej Mosnacek --- crypto/Makefile| 1 + crypto/skcipher.c | 15 ++ crypto/skcipher_bulk.c | 312 + include/crypto/internal/skcipher.h | 32 include/crypto/skcipher.h | 299 ++- 5 files changed, 658 insertions(+), 1 deletion(-) create mode 100644 crypto/skcipher_bulk.c diff --git a/crypto/Makefile b/crypto/Makefile index b8f0e3e..cd1cf57 100644 --- a/crypto/Makefile +++ b/crypto/Makefile @@ -19,6 +19,7 @@ obj-$(CONFIG_CRYPTO_AEAD2) += aead.o crypto_blkcipher-y := ablkcipher.o crypto_blkcipher-y += blkcipher.o crypto_blkcipher-y += skcipher.o +crypto_blkcipher-y += skcipher_bulk.o obj-$(CONFIG_CRYPTO_BLKCIPHER2) += crypto_blkcipher.o obj-$(CONFIG_CRYPTO_SEQIV) += seqiv.o obj-$(CONFIG_CRYPTO_ECHAINIV) += echainiv.o diff --git a/crypto/skcipher.c b/crypto/skcipher.c index 6ee6a15..8b6d684 100644 --- a/crypto/skcipher.c +++ b/crypto/skcipher.c @@ -667,6 +667,8 @@ static int crypto_init_skcipher_ops_blkcipher(struct crypto_tfm *tfm) skcipher->ivsize = crypto_blkcipher_ivsize(blkcipher); skcipher->keysize = calg->cra_blkcipher.max_keysize; + crypto_skcipher_bulk_set_fallback(skcipher); + return 0; } @@ -760,6 +762,8 @@ static int crypto_init_skcipher_ops_ablkcipher(struct crypto_tfm *tfm) sizeof(struct ablkcipher_request); skcipher->keysize = calg->cra_ablkcipher.max_keysize; + crypto_skcipher_bulk_set_fallback(skcipher); + return 0; } @@ -789,6 +793,14 @@ static int crypto_skcipher_init_tfm(struct crypto_tfm *tfm) skcipher->ivsize = alg->ivsize; skcipher->keysize = alg->max_keysize; + if (!alg->encrypt_bulk || !alg->decrypt_bulk || !alg->reqsize_bulk) + crypto_skcipher_bulk_set_fallback(skcipher); + else { + skcipher->encrypt_bulk = alg->encrypt_bulk; + skcipher->decrypt_bulk = alg->decrypt_bulk; + skcipher->reqsize_bulk = alg->reqsize_bulk; + } + if (alg->exit) skcipher->base.exit = crypto_skcipher_exit_tfm; @@ -822,6 +834,9 @@ static void crypto_skcipher_show(struct seq_file *m, struct crypto_alg *alg) seq_printf(m, "ivsize : %u\n", skcipher->ivsize); 
seq_printf(m, "chunksize: %u\n", skcipher->chunksize); seq_printf(m, "walksize : %u\n", skcipher->walksize); + seq_printf(m, "bulk : %s\n", + (skcipher->encrypt_bulk && skcipher->decrypt_bulk && + skcipher->reqsize_bulk) ? "yes" : "no"); } #ifdef CONFIG_NET diff --git a/crypto/skcipher_bulk.c b/crypto/skcipher_bulk.c new file mode 100644 index 000..9630122 --- /dev/null +++ b/crypto/skcipher_bulk.c @@ -0,0 +1,312 @@ +/* + * Bulk IV fallback for skcipher. + * + * Copyright (C) 2016-2017 Red Hat, Inc. All rights reserved. + * Copyright (c) 2016-2017 Ondrej Mosnacek + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; either version 2 of the License, or (at your option) + * any later version. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +struct skcipher_bulk_subreqctx { + struct scatterlist sg_src[2]; + struct scatterlist sg_dst[2];
[RFC PATCH 4/6] crypto: simd - Add bulk request support
This patch adds proper support for the new bulk requests to the SIMD helpers. Signed-off-by: Ondrej Mosnacek --- crypto/simd.c | 61 +++ 1 file changed, 61 insertions(+) diff --git a/crypto/simd.c b/crypto/simd.c index 8820337..2ae5930 100644 --- a/crypto/simd.c +++ b/crypto/simd.c @@ -100,6 +100,64 @@ static int simd_skcipher_decrypt(struct skcipher_request *req) return crypto_skcipher_decrypt(subreq); } +static int simd_skcipher_encrypt_bulk(struct skcipher_bulk_request *req) +{ + struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req); + struct simd_skcipher_ctx *ctx = crypto_skcipher_ctx(tfm); + struct skcipher_bulk_request *subreq; + struct crypto_skcipher *child; + + subreq = skcipher_bulk_request_ctx(req); + *subreq = *req; + + if (!may_use_simd() || + (in_atomic() && cryptd_skcipher_queued(ctx->cryptd_tfm))) + child = &ctx->cryptd_tfm->base; + else + child = cryptd_skcipher_child(ctx->cryptd_tfm); + + skcipher_bulk_request_set_tfm(subreq, child); + + return crypto_skcipher_encrypt_bulk(subreq); +} + +static int simd_skcipher_decrypt_bulk(struct skcipher_bulk_request *req) +{ + struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req); + struct simd_skcipher_ctx *ctx = crypto_skcipher_ctx(tfm); + struct skcipher_bulk_request *subreq; + struct crypto_skcipher *child; + + subreq = skcipher_bulk_request_ctx(req); + *subreq = *req; + + if (!may_use_simd() || + (in_atomic() && cryptd_skcipher_queued(ctx->cryptd_tfm))) + child = &ctx->cryptd_tfm->base; + else + child = cryptd_skcipher_child(ctx->cryptd_tfm); + + skcipher_bulk_request_set_tfm(subreq, child); + + return crypto_skcipher_decrypt_bulk(subreq); +} + +static unsigned int simd_skcipher_reqsize_bulk(struct crypto_skcipher *tfm, + unsigned int maxmsgs) +{ + struct simd_skcipher_ctx *ctx = crypto_skcipher_ctx(tfm); + struct crypto_skcipher *tfm_cryptd, *tfm_child; + unsigned int reqsize_cryptd, reqsize_child; + + tfm_cryptd = &ctx->cryptd_tfm->base; + tfm_child = 
cryptd_skcipher_child(ctx->cryptd_tfm); + + reqsize_cryptd = crypto_skcipher_bulk_reqsize(tfm_cryptd, maxmsgs); + reqsize_child = crypto_skcipher_bulk_reqsize(tfm_child, maxmsgs); + return sizeof(struct skcipher_bulk_request) + + max(reqsize_cryptd, reqsize_child); +} + static void simd_skcipher_exit(struct crypto_skcipher *tfm) { struct simd_skcipher_ctx *ctx = crypto_skcipher_ctx(tfm); @@ -187,6 +245,9 @@ struct simd_skcipher_alg *simd_skcipher_create_compat(const char *algname, alg->setkey = simd_skcipher_setkey; alg->encrypt = simd_skcipher_encrypt; alg->decrypt = simd_skcipher_decrypt; + alg->encrypt_bulk = simd_skcipher_encrypt_bulk; + alg->decrypt_bulk = simd_skcipher_decrypt_bulk; + alg->reqsize_bulk = simd_skcipher_reqsize_bulk; err = crypto_register_skcipher(alg); if (err) -- 2.9.3
[RFC PATCH 3/6] crypto: cryptd - Add skcipher bulk request support
This patch adds proper support for the new bulk requests to cryptd. Signed-off-by: Ondrej Mosnacek --- crypto/cryptd.c | 111 1 file changed, 111 insertions(+) diff --git a/crypto/cryptd.c b/crypto/cryptd.c index 0508c48..b7d6e13 100644 --- a/crypto/cryptd.c +++ b/crypto/cryptd.c @@ -555,6 +555,114 @@ static int cryptd_skcipher_decrypt_enqueue(struct skcipher_request *req) return cryptd_skcipher_enqueue(req, cryptd_skcipher_decrypt); } +static void cryptd_skcipher_bulk_complete(struct skcipher_bulk_request *req, + int err) +{ + struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req); + struct cryptd_skcipher_ctx *ctx = crypto_skcipher_ctx(tfm); + struct cryptd_skcipher_request_ctx *rctx = + skcipher_bulk_request_ctx(req); + int refcnt = atomic_read(&ctx->refcnt); + + local_bh_disable(); + rctx->complete(&req->base, err); + local_bh_enable(); + + if (err != -EINPROGRESS && refcnt && atomic_dec_and_test(&ctx->refcnt)) + crypto_free_skcipher(tfm); +} + +static void cryptd_skcipher_bulk_encrypt(struct crypto_async_request *base, +int err) +{ + struct skcipher_bulk_request *req = skcipher_bulk_request_cast(base); + struct cryptd_skcipher_request_ctx *rctx = + skcipher_bulk_request_ctx(req); + struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req); + struct cryptd_skcipher_ctx *ctx = crypto_skcipher_ctx(tfm); + struct crypto_skcipher *child = ctx->child; + SKCIPHER_BULK_REQUEST_ON_STACK(subreq, req->maxmsgs, child); + + if (unlikely(err == -EINPROGRESS)) + goto out; + + skcipher_bulk_request_set_tfm(subreq, child); + skcipher_bulk_request_set_callback(subreq, CRYPTO_TFM_REQ_MAY_SLEEP, + NULL, NULL); + skcipher_bulk_request_set_crypt(subreq, req->src, req->dst, req->nmsgs, + req->msgsize, req->msgsizes, req->ivs); + + err = crypto_skcipher_encrypt_bulk(subreq); + skcipher_bulk_request_zero(subreq); + + req->base.complete = rctx->complete; + +out: + cryptd_skcipher_bulk_complete(req, err); +} + +static void cryptd_skcipher_bulk_decrypt(struct 
crypto_async_request *base, +int err) +{ + struct skcipher_bulk_request *req = skcipher_bulk_request_cast(base); + struct cryptd_skcipher_request_ctx *rctx = + skcipher_bulk_request_ctx(req); + struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req); + struct cryptd_skcipher_ctx *ctx = crypto_skcipher_ctx(tfm); + struct crypto_skcipher *child = ctx->child; + SKCIPHER_BULK_REQUEST_ON_STACK(subreq, req->maxmsgs, child); + + if (unlikely(err == -EINPROGRESS)) + goto out; + + skcipher_bulk_request_set_tfm(subreq, child); + skcipher_bulk_request_set_callback(subreq, CRYPTO_TFM_REQ_MAY_SLEEP, + NULL, NULL); + skcipher_bulk_request_set_crypt(subreq, req->src, req->dst, req->nmsgs, + req->msgsize, req->msgsizes, req->ivs); + + err = crypto_skcipher_decrypt_bulk(subreq); + skcipher_bulk_request_zero(subreq); + + req->base.complete = rctx->complete; + +out: + cryptd_skcipher_bulk_complete(req, err); +} + +static int cryptd_skcipher_bulk_enqueue(struct skcipher_bulk_request *req, + crypto_completion_t compl) +{ + struct cryptd_skcipher_request_ctx *rctx = + skcipher_bulk_request_ctx(req); + struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req); + struct cryptd_queue *queue; + + queue = cryptd_get_queue(crypto_skcipher_tfm(tfm)); + rctx->complete = req->base.complete; + req->base.complete = compl; + + return cryptd_enqueue_request(queue, &req->base); +} + +static int cryptd_skcipher_bulk_encrypt_enqueue( + struct skcipher_bulk_request *req) +{ + return cryptd_skcipher_bulk_enqueue(req, cryptd_skcipher_bulk_encrypt); +} + +static int cryptd_skcipher_bulk_decrypt_enqueue( + struct skcipher_bulk_request *req) +{ + return cryptd_skcipher_bulk_enqueue(req, cryptd_skcipher_bulk_decrypt); +} + +static unsigned int cryptd_skcipher_bulk_reqsize(struct crypto_skcipher *tfm, +unsigned int maxmsgs) +{ + return sizeof(struct cryptd_skcipher_request_ctx); +} + static int cryptd_skcipher_init_tfm(struct crypto_skcipher *tfm) { struct skcipher_instance *inst = 
skcipher_alg_instance(tfm); @@ -641,6 +749,9 @@ static int cryptd_create_skcipher(struct crypto_template *tmpl, inst->alg.setkey = cryptd_skci
[RFC PATCH 0/6] Add bulk skcipher requests to crypto API and dm-crypt
Hi, the goal of this patchset is to allow those skcipher API users that need to process batches of small messages (especially dm-crypt) to do so efficiently. The first patch introduces a new request type (and corresponding encrypt/decrypt functions) to the skcipher API. The new API can be used to submit multiple messages at once, thus enabling the drivers to reduce overhead as opposed to processing each message separately. The skcipher drivers can provide support for the new request type by setting the corresponding fields of their skcipher_alg structure. If 'native' support is not provided by a driver (i.e. the fields are left NULL), the crypto API transparently provides a generic fallback implementation, which simply processes the bulk request as a set of standard requests on the same tfm. The second patch extends skcipher_walk so it can be used for processing the new bulk requests, while preserving equivalent functionality when used with standard requests. The third and fourth patches add native bulk request support to the cryptd and SIMD helper wrappers, respectively. The fifth patch adds bulk request support to the AES-NI skcipher drivers, in order to provide an example both of implementing the bulk request processing and of using the extended skcipher_walk in such an implementation. Also, this patch provides a slight optimization, since the kernel_fpu_* functions are called just once for the whole bulk request. Note that both the standard and bulk implementations mostly use the same code under the hood. The last patch converts dm-crypt to use bulk requests and makes it submit multiple sectors at once, whenever they are stored sequentially within a single page. With all the patches applied, I was able to measure a small speedup (~5-10%) with AES-NI ciphers and dm-crypt device mapped over a ramdisk. 
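The bundling criterion mentioned for the dm-crypt conversion (only consecutive sectors stored within a single page) reduces to simple shift arithmetic. A small runnable sketch, where `bundle_sectors()` is an illustrative helper and not code taken verbatim from the patch:

```c
#include <assert.h>

#define PAGE_SHIFT   12  /* 4 KiB pages, as on x86 */
#define SECTOR_SHIFT 9   /* 512-byte sectors */
/* Mirrors the patch: at most one page's worth of sectors per bundle. */
#define MAX_CONSEC_SECTORS (1u << (PAGE_SHIFT - SECTOR_SHIFT))

/* How many of `remaining` consecutive sectors can go into one bulk
 * request when the data starts at byte `offset` inside its page. */
static unsigned int bundle_sectors(unsigned int offset, unsigned int remaining)
{
    unsigned int left_in_page = MAX_CONSEC_SECTORS - (offset >> SECTOR_SHIFT);

    return remaining < left_in_page ? remaining : left_in_page;
}
```

With 4 KiB pages this caps a bundle at eight 512-byte sectors, which matches the MAX_CONSEC_SECTORS definition added to dm-crypt.c.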
To-be-done: testing the bulk API in testmgr.c; documentation update Ondrej Mosnacek (6): crypto: skcipher - Add bulk request processing API crypto: skcipher - Add bulk request support to walk crypto: cryptd - Add skcipher bulk request support crypto: simd - Add bulk request support crypto: aesni-intel - Add bulk request support dm-crypt: Add bulk crypto processing support arch/x86/crypto/aesni-intel_glue.c| 267 +++-- arch/x86/crypto/glue_helper.c | 23 +-- arch/x86/include/asm/crypto/glue_helper.h | 2 +- crypto/Makefile | 1 + crypto/cryptd.c | 111 +++ crypto/simd.c | 61 ++ crypto/skcipher.c | 207 +++- crypto/skcipher_bulk.c| 312 ++ drivers/md/dm-crypt.c | 254 +++- include/crypto/internal/skcipher.h| 42 +++- include/crypto/skcipher.h | 299 +++- 11 files changed, 1369 insertions(+), 210 deletions(-) create mode 100644 crypto/skcipher_bulk.c -- 2.9.3
[RFC PATCH 5/6] crypto: aesni-intel - Add bulk request support
This patch implements bulk request handling in the AES-NI crypto drivers. The major advantage of this is that with bulk requests, the kernel_fpu_* functions (which are usually quite slow) are now called only once for the whole request. Signed-off-by: Ondrej Mosnacek --- arch/x86/crypto/aesni-intel_glue.c| 267 +++--- arch/x86/crypto/glue_helper.c | 23 ++- arch/x86/include/asm/crypto/glue_helper.h | 2 +- 3 files changed, 221 insertions(+), 71 deletions(-) diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c index 36ca150..5f67afc 100644 --- a/arch/x86/crypto/aesni-intel_glue.c +++ b/arch/x86/crypto/aesni-intel_glue.c @@ -364,70 +364,116 @@ static int aesni_skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key, crypto_skcipher_ctx(tfm), key, len); } -static int ecb_encrypt(struct skcipher_request *req) +typedef void (*aesni_crypt_t)(struct crypto_aes_ctx *ctx, + u8 *out, const u8 *in, unsigned int len); + +typedef void (*aesni_ivcrypt_t)(struct crypto_aes_ctx *ctx, + u8 *out, const u8 *in, unsigned int len, + u8 *iv); + +static int ecb_crypt(struct crypto_aes_ctx *ctx, struct skcipher_walk *walk, +aesni_crypt_t crypt) { - struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); - struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm)); - struct skcipher_walk walk; unsigned int nbytes; int err; - err = skcipher_walk_virt(&walk, req, true); - kernel_fpu_begin(); - while ((nbytes = walk.nbytes)) { - aesni_ecb_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr, - nbytes & AES_BLOCK_MASK); + while ((nbytes = walk->nbytes)) { + crypt(ctx, walk->dst.virt.addr, walk->src.virt.addr, + nbytes & AES_BLOCK_MASK); nbytes &= AES_BLOCK_SIZE - 1; - err = skcipher_walk_done(&walk, nbytes); + err = skcipher_walk_done(walk, nbytes); } kernel_fpu_end(); return err; } -static int ecb_decrypt(struct skcipher_request *req) +static int cbc_crypt(struct crypto_aes_ctx *ctx, struct skcipher_walk *walk, +aesni_ivcrypt_t crypt) { - struct 
crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); - struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm)); - struct skcipher_walk walk; unsigned int nbytes; int err; - err = skcipher_walk_virt(&walk, req, true); - kernel_fpu_begin(); - while ((nbytes = walk.nbytes)) { - aesni_ecb_dec(ctx, walk.dst.virt.addr, walk.src.virt.addr, - nbytes & AES_BLOCK_MASK); + while ((nbytes = walk->nbytes)) { + crypt(ctx, walk->dst.virt.addr, walk->src.virt.addr, + nbytes & AES_BLOCK_MASK, walk->iv); nbytes &= AES_BLOCK_SIZE - 1; - err = skcipher_walk_done(&walk, nbytes); + err = skcipher_walk_done(walk, nbytes); } kernel_fpu_end(); return err; } -static int cbc_encrypt(struct skcipher_request *req) +static int ecb_encrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm)); struct skcipher_walk walk; - unsigned int nbytes; int err; err = skcipher_walk_virt(&walk, req, true); + if (err) + return err; - kernel_fpu_begin(); - while ((nbytes = walk.nbytes)) { - aesni_cbc_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr, - nbytes & AES_BLOCK_MASK, walk.iv); - nbytes &= AES_BLOCK_SIZE - 1; - err = skcipher_walk_done(&walk, nbytes); - } - kernel_fpu_end(); + return ecb_crypt(ctx, &walk, aesni_ecb_enc); +} - return err; +static int ecb_decrypt(struct skcipher_request *req) +{ + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); + struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm)); + struct skcipher_walk walk; + int err; + + err = skcipher_walk_virt(&walk, req, true); + if (err) + return err; + + return ecb_crypt(ctx, &walk, aesni_ecb_dec); +} + +static int ecb_encrypt_bulk(struct skcipher_bulk_request *req) +{ + struct crypto_skcipher *tfm = crypto_skcipher_bulk_reqtfm(req); + struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm)); + struct skcipher_walk walk; + int err; + + err = skcipher_walk_virt_bulk(&walk, req, true); + if (err) + return err; + + 
return ecb_crypt(ctx, &walk, aesni_ecb_enc); +} + +static int ecb_decrypt_bulk(s
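The refactoring pattern visible in the diff above (one shared loop parameterized by a function pointer, so the regular and bulk entry points reuse the same body and the expensive kernel_fpu_begin()/kernel_fpu_end() bracket runs exactly once per request) can be shown with a runnable toy version. The XOR "cipher" and all names here are placeholders, not the AES-NI code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for aesni_crypt_t: process one contiguous chunk. */
typedef void (*crypt_fn)(uint8_t *dst, const uint8_t *src, size_t len);

/* Trivial placeholder transform so the sketch is testable. */
static void xor_with_0xaa(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        dst[i] = src[i] ^ 0xaa;
}

/* Shared loop, analogous to the patch's ecb_crypt(): expensive setup
 * (kernel_fpu_begin() in the real driver) would bracket this whole loop
 * once, no matter how many messages the request carries. */
static int generic_crypt(uint8_t *dst, const uint8_t *src,
                         const size_t *chunks, size_t nchunks, crypt_fn fn)
{
    size_t i, off = 0;

    /* setup_once();  -- kernel_fpu_begin() in the real driver */
    for (i = 0; i < nchunks; i++) {
        fn(dst + off, src + off, chunks[i]);
        off += chunks[i];
    }
    /* teardown_once();  -- kernel_fpu_end() */
    return 0;
}
```

Amortizing the setup over all chunks is exactly the source of the speedup the patch description claims for the bulk path.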
[RFC PATCH 2/6] crypto: skcipher - Add bulk request support to walk
This patch tweaks skcipher_walk so it can be used with the new bulk requests. The new skcipher_walk can be initialized either from a skcipher_request (in which case its behavior is equivalent to the old code) or from a skcipher_bulk_request, in which case the usage is almost identical, the most significant exception being that skciphers which somehow tweak the IV (e.g. XTS) must check the new nextmsg flag before processing each chunk and re-tweak the IV if it is set. For other ciphers skcipher_walk automatically switches to the next IV at message boundaries. Signed-off-by: Ondrej Mosnacek --- crypto/skcipher.c | 192 +++-- include/crypto/internal/skcipher.h | 10 +- 2 files changed, 153 insertions(+), 49 deletions(-) diff --git a/crypto/skcipher.c b/crypto/skcipher.c index 8b6d684..b810e90 100644 --- a/crypto/skcipher.c +++ b/crypto/skcipher.c @@ -33,6 +33,7 @@ enum { SKCIPHER_WALK_COPY = 1 << 2, SKCIPHER_WALK_DIFF = 1 << 3, SKCIPHER_WALK_SLEEP = 1 << 4, + SKCIPHER_WALK_HETEROGENOUS = 1 << 5, }; struct skcipher_walk_buffer { @@ -94,6 +95,41 @@ static inline u8 *skcipher_get_spot(u8 *start, unsigned int len) return max(start, end_page); } +static int skcipher_copy_iv(struct skcipher_walk *walk) +{ + unsigned a = crypto_tfm_ctx_alignment() - 1; + unsigned alignmask = walk->alignmask; + unsigned ivsize = walk->ivsize; + unsigned bs = walk->stride; + unsigned aligned_bs; + unsigned size; + u8 *iv; + + aligned_bs = ALIGN(bs, alignmask); + + /* Minimum size to align buffer by alignmask. */ + size = alignmask & ~a; + + if (walk->flags & SKCIPHER_WALK_PHYS) + size += ivsize; + else { + size += aligned_bs + ivsize; + + /* Minimum size to ensure buffer does not straddle a page. 
*/ + size += (bs - 1) & ~(alignmask | a); + } + + walk->buffer = kmalloc(size, skcipher_walk_gfp(walk)); + if (!walk->buffer) + return -ENOMEM; + + iv = PTR_ALIGN(walk->buffer, alignmask + 1); + iv = skcipher_get_spot(iv, bs) + aligned_bs; + + walk->iv = memcpy(iv, walk->iv, walk->ivsize); + return 0; +} + static int skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize) { u8 *addr; @@ -108,9 +144,12 @@ static int skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize) int skcipher_walk_done(struct skcipher_walk *walk, int err) { unsigned int n = walk->nbytes - err; - unsigned int nbytes; + unsigned int nbytes, nbytes_msg; + + walk->nextmsg = false; /* reset the nextmsg flag */ nbytes = walk->total - n; + nbytes_msg = walk->total_msg - n; if (unlikely(err < 0)) { nbytes = 0; @@ -139,8 +178,31 @@ int skcipher_walk_done(struct skcipher_walk *walk, int err) if (err > 0) err = 0; + if (nbytes && !nbytes_msg) { + walk->nextmsg = true; + + /* write the output IV: */ + if (walk->iv != walk->oiv) + memcpy(walk->oiv, walk->iv, walk->ivsize); + + /* advance to the IV of next message: */ + walk->oiv += walk->ivsize; + walk->iv = walk->oiv; + + if (unlikely(((unsigned long)walk->iv & walk->alignmask))) { + err = skcipher_copy_iv(walk); + if (err) + return err; + } + + nbytes_msg = *walk->nextmsgsize; + if (walk->flags & SKCIPHER_WALK_HETEROGENOUS) + ++walk->nextmsgsize; + } + + walk->nbytes = nbytes_msg; + walk->total_msg = nbytes_msg; walk->total = nbytes; - walk->nbytes = nbytes; scatterwalk_advance(&walk->in, n); scatterwalk_advance(&walk->out, n); @@ -343,13 +405,13 @@ static int skcipher_walk_next(struct skcipher_walk *walk) walk->flags &= ~(SKCIPHER_WALK_SLOW | SKCIPHER_WALK_COPY | SKCIPHER_WALK_DIFF); - n = walk->total; + n = walk->total_msg; bsize = min(walk->stride, max(n, walk->blocksize)); n = scatterwalk_clamp(&walk->in, n); n = scatterwalk_clamp(&walk->out, n); if (unlikely(n < bsize)) { - if (unlikely(walk->total < walk->blocksize)) + if 
(unlikely(walk->total_msg < walk->blocksize)) return skcipher_walk_done(walk, -EINVAL); slow_path: @@ -388,41 +450,6 @@ static int skcipher_walk_next(struct skcipher_walk *walk) } EXPORT_SYMBOL_GPL(skcipher_walk_next); -static int skcipher_copy_iv(struct skcipher_walk *walk) -{ - unsigned a = crypto_tfm_ctx_alignment() - 1; - unsigned alignmask = walk->alignmask; - unsigned ivsize = walk->ivsize; - unsigned bs = walk->stride; -
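The message-boundary bookkeeping added above (write back the output IV, advance to the next message's slot in the packed IV buffer, raise nextmsg so IV-tweaking modes like XTS know to re-tweak) can be modeled in a few lines of plain C. This is a toy model with illustrative names, not the kernel implementation:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IVSIZE 16

/* Minimal analogue of the walk state touched at a message boundary. */
struct toy_walk {
    uint8_t *oiv;  /* current slot in the packed output-IV buffer */
    int nextmsg;   /* raised exactly once per message boundary */
};

/* Analogue of the nextmsg branch in skcipher_walk_done(): called when
 * one message's bytes are exhausted and another message follows. */
static void toy_next_message(struct toy_walk *w, const uint8_t *final_iv)
{
    memcpy(w->oiv, final_iv, IVSIZE);  /* write the output IV */
    w->oiv += IVSIZE;                  /* move to the next message's IV */
    w->nextmsg = 1;                    /* consumer must re-tweak if needed */
}
```

Plain CBC-style ciphers can ignore `nextmsg` because the walk already points `iv` at the fresh slot; only ciphers that derive per-message state from the IV need to act on it.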
Re: [PATCH 00/13] crypto: copy AAD during encrypt for AEAD ciphers
Am Donnerstag, 12. Januar 2017, 14:19:36 CET schrieb Herbert Xu: Hi Herbert, > > I think it's too much churn. Let's get the algif_aead code fixed > up first, and then pursue this later. To eliminate the extra churn of having all AEAD implementations changed to invoke the copy operation, what about adding the callback to crypto_aead_encrypt? This way, all AEAD implementations benefit from it without having an extra call added to each of them? Ciao Stephan
Re: [PATCH v4 3/3] drivers: crypto: Enable CPT options crypto for build
Hi George, [auto build test ERROR on v4.9-rc8] [cannot apply to cryptodev/master crypto/master sparc-next/master next-20170111] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/George-Cherian/Add-Support-for-Cavium-Cryptographic-Acceleration-Unit/20170112-192240 config: blackfin-allmodconfig (attached as .config) compiler: bfin-uclinux-gcc (GCC) 6.2.0 reproduce: wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=blackfin All error/warnings (new ones prefixed by >>): In file included from drivers/crypto/cavium/cpt/cptvf.h:13:0, from drivers/crypto/cavium/cpt/cptvf_main.c:12: drivers/crypto/cavium/cpt/cpt_common.h: In function 'cpt_write_csr64': >> drivers/crypto/cavium/cpt/cpt_common.h:151:2: error: implicit declaration of >> function 'writeq' [-Werror=implicit-function-declaration] writeq(val, hw_addr + offset); ^~ drivers/crypto/cavium/cpt/cpt_common.h: In function 'cpt_read_csr64': >> drivers/crypto/cavium/cpt/cpt_common.h:156:9: error: implicit declaration of >> function 'readq' [-Werror=implicit-function-declaration] return readq(hw_addr + offset); ^ In file included from drivers/crypto/cavium/cpt/cptvf_main.c:12:0: drivers/crypto/cavium/cpt/cptvf.h: At top level: >> drivers/crypto/cavium/cpt/cptvf.h:111:20: error: array type has incomplete >> element type 'struct msix_entry' struct msix_entry msix_entries[CPT_VF_MSIX_VECTORS]; ^~~~ drivers/crypto/cavium/cpt/cptvf_main.c: In function 'init_worker_threads': >> drivers/crypto/cavium/cpt/cptvf_main.c:52:9: warning: cast from pointer to >> integer of different size [-Wpointer-to-int-cast] (u64)cwqe_info); ^ drivers/crypto/cavium/cpt/cptvf_main.c: In function 'cptvf_disable_msix': >> drivers/crypto/cavium/cpt/cptvf_main.c:375:3: error: implicit declaration of >> function 
'pci_disable_msix' [-Werror=implicit-function-declaration] pci_disable_msix(cptvf->pdev); ^~~~ drivers/crypto/cavium/cpt/cptvf_main.c: In function 'cptvf_enable_msix': >> drivers/crypto/cavium/cpt/cptvf_main.c:387:8: error: implicit declaration of >> function 'pci_enable_msix' [-Werror=implicit-function-declaration] ret = pci_enable_msix(cptvf->pdev, cptvf->msix_entries, ^~~ drivers/crypto/cavium/cpt/cptvf_main.c: At top level: >> drivers/crypto/cavium/cpt/cptvf_main.c:942:1: warning: data definition has >> no type or storage class module_pci_driver(cptvf_pci_driver); ^ >> drivers/crypto/cavium/cpt/cptvf_main.c:942:1: error: type defaults to 'int' >> in declaration of 'module_pci_driver' [-Werror=implicit-int] >> drivers/crypto/cavium/cpt/cptvf_main.c:942:1: warning: parameter names >> (without types) in function declaration drivers/crypto/cavium/cpt/cptvf_main.c:934:26: warning: 'cptvf_pci_driver' defined but not used [-Wunused-variable] static struct pci_driver cptvf_pci_driver = { ^~~~ cc1: some warnings being treated as errors -- In file included from drivers/crypto/cavium/cpt/cptvf.h:13:0, from drivers/crypto/cavium/cpt/cptvf_reqmanager.c:9: drivers/crypto/cavium/cpt/cpt_common.h: In function 'cpt_write_csr64': >> drivers/crypto/cavium/cpt/cpt_common.h:151:2: error: implicit declaration of >> function 'writeq' [-Werror=implicit-function-declaration] writeq(val, hw_addr + offset); ^~ drivers/crypto/cavium/cpt/cpt_common.h: In function 'cpt_read_csr64': >> drivers/crypto/cavium/cpt/cpt_common.h:156:9: error: implicit declaration of >> function 'readq' [-Werror=implicit-function-declaration] return readq(hw_addr + offset); ^ In file included from drivers/crypto/cavium/cpt/cptvf_reqmanager.c:9:0: drivers/crypto/cavium/cpt/cptvf.h: At top level: >> drivers/crypto/cavium/cpt/cptvf.h:111:20: error: array type has incomplete >> element type 'struct msix_entry' struct msix_entry msix_entries[CPT_VF_MSIX_VECTORS]; ^~~~ cc1: some warnings being treated as errors -- In 
file included from drivers/crypto/cavium/cpt/cptpf.h:12:0, from drivers/crypto/cavium/cpt/cptpf_main.c:18: drivers/crypto/cavium/cpt/cpt_common.h: In function 'cpt_write_csr64': >> driv
Re: [PATCH 00/13] crypto: copy AAD during encrypt for AEAD ciphers
Am Donnerstag, 12. Januar 2017, 14:19:36 CET schrieb Herbert Xu: Hi Herbert, > On Tue, Jan 10, 2017 at 02:36:21AM +0100, Stephan Müller wrote: > > to all driver maintainers: the patches I added are compile tested, but > > I do not have the hardware to verify the code. May I ask the respective > > hardware maintainers to verify that the code is appropriate and works > > as intended? Thanks a lot. > > > > Herbert, this is my proposal for our discussion around copying the > > AAD for algif_aead. Instead of adding the code to algif_aead and wait > > until it transpires to all cipher implementations, I thought it would > > be more helpful to fix all cipher implementations. > > I think it's too much churn. Let's get the algif_aead code fixed > up first, and then pursue this later. My idea with this patch set was to have only a minimal change to any AEAD implementation, i.e. one callback to address this issue. When addressing the issue in the algif_aead code, and expect that over time the AEAD implementations will gain the copy operation, eventually we will copy the AAD twice. Of course, this could be prevented, if the algif_aead code somehow uses the same SGL for the src and dst AAD. Some time back I published the patch "crypto: AF_ALG - disregard AAD buffer space for output". This patch tried changing the src and dst SGL to remove the AAD. Considering this patch trying to change the src and dst SGL structure, I expect that the patch to algif_aead to have a common src/dst SGL for the AAD to prevent the AAD copy from the AEAD implementations is similar in complexity. Weighing the complexity of such a temporary band-aid for algif_aead against the addition of one callback to each AEAD implementation (which would need to be added at some point anyway), I thought it is less complex to add the one callback to the AEAD implementations. > > BTW, why are you only doing the copy for encryption? I was looking at the only AEAD implementation that does the copy operation: authenc. 
There, the copy operation is only performed for encryption. I was thinking a bit about why decryption was not covered. I think the answer is the following: for encryption, the AAD is definitely needed in the dst buffer as the dst buffer with the AAD must be sent to the recipient for decryption. The decryption and the associated authentication only work with the AAD. However, after decrypting, all the caller wants is the decrypted plaintext only. There is no further use of the AAD after the decryption step. Hence, copying the AAD to the dst buffer in the decryption step would not serve the caller. Ciao Stephan
Re: [PATCH v4 2/3] drivers: crypto: Add the Virtual Function driver for CPT
Hi Stephan, Thank you for the clarification. Regards, -George On 01/12/2017 04:49 PM, Stephan Müller wrote: Am Donnerstag, 12. Januar 2017, 16:40:32 CET schrieb George Cherian: Hi George, Sure, please do not forget to invoke xts_verify_key. Should I be using xts_check_key or xts_verify_key? Both are identical except for the input parameter -- the one requires crypto_skcipher, the other crypto_tfm. Depending what pointer you have handy in your setkey function, you would use the most appropriate one. Ciao Stephan
Re: [PATCH v4 2/3] drivers: crypto: Add the Virtual Function driver for CPT
Am Donnerstag, 12. Januar 2017, 16:40:32 CET schrieb George Cherian: Hi George, > > > > Sure, please do not forget to invoke xts_verify_key. > > Should I be using xts_check_key or xts_verify_key? Both are identical except for the input parameter -- the one requires crypto_skcipher, the other crypto_tfm. Depending what pointer you have handy in your setkey function, you would use the most appropriate one. Ciao Stephan
Re: [PATCH v4 2/3] drivers: crypto: Add the Virtual Function driver for CPT
Hi Stephan,

On 01/11/2017 06:09 PM, Stephan Müller wrote:
> On Wednesday, 11 January 2017, 16:58:17 CET, George Cherian wrote:
>
> Hi George,
>
>> I will add a separate function for xts setkey and make changes as
>> following.
>
> ...
>
>>>> +struct crypto_alg algs[] = { {
>>>> +	.cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
>>>> +	.cra_blocksize = AES_BLOCK_SIZE,
>>>> +	.cra_ctxsize = sizeof(struct cvm_enc_ctx),
>>>> +	.cra_alignmask = 7,
>>>> +	.cra_priority = 4001,
>>>> +	.cra_name = "xts(aes)",
>>>> +	.cra_driver_name = "cavium-xts-aes",
>>>> +	.cra_type = &crypto_ablkcipher_type,
>>>> +	.cra_u = {
>>>> +		.ablkcipher = {
>>>> +			.ivsize = AES_BLOCK_SIZE,
>>>> +			.min_keysize = AES_MIN_KEY_SIZE,
>>>> +			.max_keysize = AES_MAX_KEY_SIZE,
>>>> +			.setkey = cvm_enc_dec_setkey,
>>>
>>> May I ask how the setkey for XTS is intended to work? The XTS keys are
>>> double the size of "normal" keys.
>>
>> 	.ablkcipher = {
>> 		.ivsize = AES_BLOCK_SIZE,
>> 		.min_keysize = 2 * AES_MIN_KEY_SIZE,
>> 		.max_keysize = 2 * AES_MAX_KEY_SIZE,
>> 		.setkey = cvm_xts_setkey,
>>
>> Hope this is fine?
>
> Sure, please do not forget to invoke xts_verify_key.

Should I be using xts_check_key or xts_verify_key?

> Ciao
> Stephan

Regards,
-George
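For reference, the check that xts_check_key()/xts_verify_key() perform can be mimicked in a standalone user-space sketch. This is modeled on the helpers in include/crypto/xts.h but simplified: `toy_xts_check_key` and the `fips_enabled` parameter are illustrative stand-ins, and the real kernel helpers additionally set tfm result flags and return -EINVAL. An XTS key is two concatenated keys of equal size, so the length must be even, and in FIPS mode the two halves must not be identical.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of the XTS key sanity checks (toy version, not the kernel API):
 * - keylen must split into two equal halves (cipher key + tweak key);
 * - in FIPS mode the two halves must differ (identical halves would
 *   degrade XTS, so such keys are rejected as weak). */
static int toy_xts_check_key(const unsigned char *key, size_t keylen,
                             int fips_enabled)
{
	if (keylen % 2)
		return -1;	/* stand-in for -EINVAL */

	if (fips_enabled && !memcmp(key, key + keylen / 2, keylen / 2))
		return -1;	/* cipher key and tweak key are identical */

	return 0;
}
```

In a setkey like the `cvm_xts_setkey` sketched above, this check would run first, and only then would the key be split into its two halves for programming into the hardware.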
Re: x86-64: Maintain 16-byte stack alignment
* Herbert Xu wrote:

>> But if we can't do this with automatic verification, then I'm not sure
>> I want to do it at all. The asm is already more precarious than I'd
>> like, and having a code path that is misaligned is asking for obscure
>> bugs down the road.
>
> I understand the need for automated checks at this point in time.
> But longer term this is just part of the calling ABI. After all,
> we don't add checks everywhere to ensure people preserve rbx.

The intelligent and responsible way to introduce such post facto ABI changes is via a smarter assembler: one that would detect the obvious cases where assembly code generates a misaligned stack, at build time. Assembly code can obviously still mess up in a hard-to-detect fashion if it tries -- but that's OK, as most assembly code doesn't try to go outside regular stack allocation patterns.

Such a static check is relatively straightforward to do in assembly tooling -- and perhaps objtool could do this too, as it already tracks the instructions that change the stack offset.

( And yes, this is what the GCC guys should have done, instead of sloppily introducing such silent breakages and making the whole application landscape less robust ... )

Thanks,

	Ingo
Re: x86-64: Maintain 16-byte stack alignment
On Thu, Jan 12, 2017 at 08:01:51AM +, Ard Biesheuvel wrote:
>
> [From memory] the arm64 ELF psABI mandates a 16 byte stack alignment
> at function entry, and 8 byte alignment at all other times. This means
> compiled code will typically preserve 16 byte alignment, and
> __aligned(16) on a stack variable will likely not result in an
> explicit alignment of the stack pointer *. But the arm64 ISA does not
> have any load/store instructions that would trigger a misalignment
> fault on an address that is 8 byte aligned but not 16 byte aligned, so
> the situation is very different from x86 (assuming I am correct in
> asserting that there are no failure modes resulting from a misaligned
> stack other than this one and a potential performance hit)

OK, sounds like we're already using 16-byte aligned stacks on ARM64. So unless x86-64 stacks are much smaller, I don't see the need to use 8-byte aligned stacks, at least from a stack space point of view.

Thanks,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: x86-64: Maintain 16-byte stack alignment
On Wed, Jan 11, 2017 at 11:51:10PM -0800, Andy Lutomirski wrote:
>
> The problem is that we have nasties like TRACE_IRQS_OFF. Performance

I don't understand. What's the issue with TRACE_IRQS_OFF? It should be treated as any other function call. That is, enter it with an aligned stack, and the TRACE_IRQS_OFF code itself should ensure the stack stays aligned before it calls down into C.

> But if we can't do this with automatic verification, then I'm not sure
> I want to do it at all. The asm is already more precarious than I'd
> like, and having a code path that is misaligned is asking for obscure
> bugs down the road.

I understand the need for automated checks at this point in time. But longer term this is just part of the calling ABI. After all, we don't add checks everywhere to ensure people preserve rbx.

Cheers,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: x86-64: Maintain 16-byte stack alignment
On 12 January 2017 at 06:12, Herbert Xu wrote:
> On Tue, Jan 10, 2017 at 05:30:48PM +, Ard Biesheuvel wrote:
>>
>> Apologies for introducing this breakage. It seemed like an obvious and
>> simple cleanup, so I didn't even bother to mention it in the commit
>> log, but if the kernel does not guarantee 16 byte alignment, I guess
>> we should revert to the old method. If SSE instructions are the only
>> ones that require this alignment, then I suppose not having an
>> ABI-conforming stack pointer should not be an issue in general.
>
> BTW Ard, what is the stack alignment on ARM64?

[From memory] the arm64 ELF psABI mandates a 16 byte stack alignment at function entry, and 8 byte alignment at all other times. This means compiled code will typically preserve 16 byte alignment, and __aligned(16) on a stack variable will likely not result in an explicit alignment of the stack pointer *. But the arm64 ISA does not have any load/store instructions that would trigger a misalignment fault on an address that is 8 byte aligned but not 16 byte aligned, so the situation is very different from x86 (assuming I am correct in asserting that there are no failure modes resulting from a misaligned stack other than this one and a potential performance hit).

* I didn't check whether the exception handling realigns the stack pointer on nested exceptions (arm64 has separate IRQ stacks)