Re: questions of crypto async api

2013-04-08 Thread Kim Phillips
On Mon, 8 Apr 2013 13:49:58 +
"Hsieh, Che-Min"  wrote:

> Thanks for the answer.
> 
> I have a further question on the same subject.
> With regard to the commit in talitos.c (attached at the end of this mail),
> the driver submits requests of the same tfm to the same channel to ensure
> ordering.
> 
> Is it because the tfm context needs to be maintained from one operation to
> the next? E.g., aead_givencrypt() generates an IV based on the previous IV
> result stored in the tfm.

is that what the commit text says?

> If requests are sent to different channels dynamically, and the driver, at
> the completion of a request from the HW, reorders the request completion
> callbacks, what would happen?

about the same thing as wrapping the driver with pcrypt?  why not
use the h/w to maintain ordering?

Kim

> Thanks in advance.
> 
> Chemin
> 
> 
> 
> commit 5228f0f79e983c2b39c202c75af901ceb0003fc1
> Author: Kim Phillips 
> Date:   Fri Jul 15 11:21:38 2011 +0800
> 
> crypto: talitos - ensure request ordering within a single tfm
> 
> Assign single target channel per tfm in talitos_cra_init instead of
> performing channel scheduling dynamically during the encryption request.
> This changes the talitos_submit interface to accept a new channel
> number argument.  Without this, rapid bursts of misc. sized requests
> could make it possible for IPsec packets to be encrypted out-of-order,
> which would result in packet drops due to sequence numbers falling
> outside the anti-replay window on a peer gateway.
> 
> Signed-off-by: Kim Phillips 
> Signed-off-by: Herbert Xu 
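
For illustration only, here is a minimal, self-contained C sketch of the policy
this commit describes: bind each transform (tfm) to one channel with a
round-robin counter at init time, so that all of a tfm's requests land on the
same channel and complete in submission order. The names below (tfm_ctx,
NUM_CHANNELS, submit) are made up for the sketch and are not the driver's
identifiers.

#include <stdatomic.h>
#include <stdio.h>

#define NUM_CHANNELS 4                  /* assume a power of two, as the driver does */

static atomic_uint last_chan;           /* global round-robin counter */

struct tfm_ctx {
        unsigned int ch;                /* channel fixed at init for this tfm */
};

static void tfm_init(struct tfm_ctx *ctx)
{
        /* pick a channel once; every request of this tfm reuses it */
        ctx->ch = atomic_fetch_add(&last_chan, 1) & (NUM_CHANNELS - 1);
}

static void submit(const struct tfm_ctx *ctx, int req_id)
{
        /* a real driver would enqueue to hardware channel ctx->ch here */
        printf("request %d -> channel %u\n", req_id, ctx->ch);
}

int main(void)
{
        struct tfm_ctx a, b;

        tfm_init(&a);
        tfm_init(&b);
        for (int i = 0; i < 3; i++) {
                submit(&a, i);          /* all of tfm a's requests share one channel */
                submit(&b, i);          /* and so do tfm b's, possibly a different one */
        }
        return 0;
}

Because one tfm's requests never race across channels, the per-channel FIFO in
the hardware preserves their order without any software reordering.
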
> 
> -Original Message-
> From: Kim Phillips [mailto:kim.phill...@freescale.com] 
> Sent: Friday, April 05, 2013 6:33 PM
> To: Hsieh, Che-Min
> Cc: linux-crypto@vger.kernel.org
> Subject: Re: questions of crypto async api
> 
> On Thu, 4 Apr 2013 14:38:41 +
> "Hsieh, Che-Min"  wrote:
> 
> > If a driver supports multiple instances of HW crypto engines, the order of
> > request completion from the HW can differ from the order in which requests
> > were submitted to the different HW instances.  The 2nd request, sent to the
> > 2nd HW instance, may take less time to complete than the first request on a
> > different HW instance.  Is the driver responsible for re-ordering the
> > completion callbacks? Or are the agents (such as the IP protocol stack)
> > responsible for reordering? How does pcrypt do it?
> > 
> >  Does it make sense for a transform to have multiple requests outstanding
> > to the async crypto API?
> 
> see:
> 
> http://comments.gmane.org/gmane.linux.kernel.cryptoapi/5350
> 
> >  Is scatterwalk_sg_next() the preferred method over sg_next()?  Why?
> 
> scatterwalk_* is the crypto subsystem's version of the function, so yes.
> 
> >  sg_copy_to_buffer() and sg_copy_from_buffer() -> sg_copy_buffer() ->
> > sg_miter_next() -> sg_next().  Sometimes sg_copy_to_buffer() and
> > sg_copy_from_buffer() in our driver do not copy the whole list. We have to
> > rewrite those functions by using scatterwalk_sg_next() to walk down the
> > list. Is this the correct behavior?
> 
> sounds like you're on the right track, although buffers shouldn't be
> copied that often, if at all.
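
To make the distinction concrete, here is a hedged sketch (not the driver's
actual code) of copying the first nbytes of a request's scatterlist into a
flat buffer, advancing with scatterwalk_sg_next(); highmem mapping and error
handling are omitted. In many cases the existing helper
scatterwalk_map_and_copy() already does this kind of copy, so hand-rolling it
should rarely be necessary.

#include <crypto/scatterwalk.h>
#include <linux/kernel.h>
#include <linux/scatterlist.h>
#include <linux/string.h>
#include <linux/types.h>

/* Copy the first 'nbytes' of a crypto-API scatterlist into a flat buffer. */
static void sg_copy_out(struct scatterlist *sg, u8 *buf, unsigned int nbytes)
{
        while (sg && nbytes) {
                unsigned int n = min(nbytes, sg->length);

                memcpy(buf, sg_virt(sg), n);    /* assumes lowmem pages */
                buf += n;
                nbytes -= n;
                sg = scatterwalk_sg_next(sg);   /* the crypto API's way to advance */
        }
}
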
> 
> Kim
> 
> 



Re: [PATCH 1/2] crypto: add CMAC support to CryptoAPI

2013-04-08 Thread David Miller
From: Herbert Xu 
Date: Mon, 8 Apr 2013 17:33:40 +0800

> On Mon, Apr 08, 2013 at 10:24:16AM +0200, Steffen Klassert wrote:
>> On Mon, Apr 08, 2013 at 10:48:44AM +0300, Jussi Kivilinna wrote:
>> > Patch adds support for NIST recommended block cipher mode CMAC to 
>> > CryptoAPI.
>> > 
>> > This work is based on Tom St Denis' earlier patch,
>> >  http://marc.info/?l=linux-crypto-vger&m=135877306305466&w=2
>> > 
>> > Cc: Tom St Denis 
>> > Signed-off-by: Jussi Kivilinna 
>> 
>> This patch does not apply cleanly to the ipsec-next tree
>> because of some crypto changes I don't have in ipsec-next.
>> The IPsec part should apply to the cryptodev tree,
>> so it's probably best if we route this patchset
>> through the cryptodev tree.
>> 
>> Herbert,
>> 
>> are you going to take these patches?
> 
> Sure I can do that.

I'm fine with this:

Acked-by: David S. Miller 


[PATCH 5/5] crypto: aesni_intel - add more optimized XTS mode for x86-64

2013-04-08 Thread Jussi Kivilinna
Add more optimized XTS code for aesni_intel in 64-bit mode, for smaller stack
usage and a boost in speed.

tcrypt results, with Intel i5-2450M:
256-bit key
        enc     dec
16B     0.98x   0.99x
64B     0.64x   0.63x
256B    1.29x   1.32x
1024B   1.54x   1.58x
8192B   1.57x   1.60x

512-bit key
        enc     dec
16B     0.98x   0.99x
64B     0.60x   0.59x
256B    1.24x   1.25x
1024B   1.39x   1.42x
8192B   1.38x   1.42x

I chose not to optimize for blocks smaller than 256 bytes, since XTS is
practically always used with data blocks of 512 bytes or more. This is why
tcrypt shows reduced performance for 64-byte blocks.
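
For reference, the GF(2^128) tweak update performed by the new
_aesni_gf128mul_x_ble() assembler macro below can be sketched in plain C as
follows (an illustration of the arithmetic, not the kernel's own helper): the
16-byte XTS tweak is read as a little-endian 128-bit value, doubled, and the
bit shifted out of the top is folded back in as 0x87.

#include <stdint.h>

struct xts_tweak {
        uint64_t lo;    /* bytes 0..7 of the 16-byte tweak */
        uint64_t hi;    /* bytes 8..15 */
};

/* Multiply the tweak by x in GF(2^128), XTS ("ble") bit ordering. */
static void gf128mul_x_ble(struct xts_tweak *t)
{
        uint64_t carry = t->hi >> 63;   /* bit 127, shifted out by the doubling */

        t->hi = (t->hi << 1) | (t->lo >> 63);
        t->lo = (t->lo << 1) ^ (carry ? 0x87 : 0);      /* reduce by x^128 + x^7 + x^2 + x + 1 */
}

The assembler version does the same with SSE: paddq doubles both 64-bit
halves, and the shuffled sign bits ANDed with the 0x87/0x01 mask re-inject the
two carries (into bit 0 and bit 64).
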

Cc: Huang Ying 
Signed-off-by: Jussi Kivilinna 
---
 arch/x86/crypto/aesni-intel_asm.S  |  117 
 arch/x86/crypto/aesni-intel_glue.c |   80 +
 crypto/Kconfig |1 
 3 files changed, 198 insertions(+)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index 04b7977..62fe22c 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -34,6 +34,10 @@
 
 #ifdef __x86_64__
 .data
+.align 16
+.Lgf128mul_x_ble_mask:
+   .octa 0x00010087
+
 POLY:   .octa 0xC201
 TWOONE: .octa 0x00010001
 
@@ -105,6 +109,8 @@ enc:.octa 0x2
 #define CTR%xmm11
 #define INC%xmm12
 
+#define GF128MUL_MASK %xmm10
+
 #ifdef __x86_64__
 #define AREG   %rax
 #define KEYP   %rdi
@@ -2636,4 +2642,115 @@ ENTRY(aesni_ctr_enc)
 .Lctr_enc_just_ret:
ret
 ENDPROC(aesni_ctr_enc)
+
+/*
+ * _aesni_gf128mul_x_ble:  internal ABI
+ * Multiply in GF(2^128) for XTS IVs
+ * input:
+ * IV: current IV
+ * GF128MUL_MASK == mask with 0x87 and 0x01
+ * output:
+ * IV: next IV
+ * changed:
+ * CTR:== temporary value
+ */
+#define _aesni_gf128mul_x_ble() \
+   pshufd $0x13, IV, CTR; \
+   paddq IV, IV; \
+   psrad $31, CTR; \
+   pand GF128MUL_MASK, CTR; \
+   pxor CTR, IV;
+
+/*
+ * void aesni_xts_crypt8(struct crypto_aes_ctx *ctx, const u8 *dst, u8 *src,
+ *  bool enc, u8 *iv)
+ */
+ENTRY(aesni_xts_crypt8)
+   cmpb $0, %cl
+   movl $0, %ecx
+   movl $240, %r10d
+   leaq _aesni_enc4, %r11
+   leaq _aesni_dec4, %rax
+   cmovel %r10d, %ecx
+   cmoveq %rax, %r11
+
+   movdqa .Lgf128mul_x_ble_mask, GF128MUL_MASK
+   movups (IVP), IV
+
+   mov 480(KEYP), KLEN
+   addq %rcx, KEYP
+
+   movdqa IV, STATE1
+   pxor 0x00(INP), STATE1
+   movdqu IV, 0x00(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE2
+   pxor 0x10(INP), STATE2
+   movdqu IV, 0x10(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE3
+   pxor 0x20(INP), STATE3
+   movdqu IV, 0x20(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE4
+   pxor 0x30(INP), STATE4
+   movdqu IV, 0x30(OUTP)
+
+   call *%r11
+
+   pxor 0x00(OUTP), STATE1
+   movdqu STATE1, 0x00(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE1
+   pxor 0x40(INP), STATE1
+   movdqu IV, 0x40(OUTP)
+
+   pxor 0x10(OUTP), STATE2
+   movdqu STATE2, 0x10(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE2
+   pxor 0x50(INP), STATE2
+   movdqu IV, 0x50(OUTP)
+
+   pxor 0x20(OUTP), STATE3
+   movdqu STATE3, 0x20(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE3
+   pxor 0x60(INP), STATE3
+   movdqu IV, 0x60(OUTP)
+
+   pxor 0x30(OUTP), STATE4
+   movdqu STATE4, 0x30(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE4
+   pxor 0x70(INP), STATE4
+   movdqu IV, 0x70(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movups IV, (IVP)
+
+   call *%r11
+
+   pxor 0x40(OUTP), STATE1
+   movdqu STATE1, 0x40(OUTP)
+
+   pxor 0x50(OUTP), STATE2
+   movdqu STATE2, 0x50(OUTP)
+
+   pxor 0x60(OUTP), STATE3
+   movdqu STATE3, 0x60(OUTP)
+
+   pxor 0x70(OUTP), STATE4
+   movdqu STATE4, 0x70(OUTP)
+
+   ret
+ENDPROC(aesni_xts_crypt8)
+
 #endif
diff --git a/arch/x86/crypto/aesni-intel_glue.c 
b/arch/x86/crypto/aesni-intel_glue.c
index a0795da..f80e668 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -39,6 +39,9 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_X86_64
+#include 
+#endif
 
 #if defined(CONFIG_CRYPTO_PCBC) || defined(CONFIG_CRYPTO_PCBC_MODULE)
 #define HAS_PCBC
@@ -102,6 +105,9 @@ void crypto_fpu_exit(void);
 asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out,
  const u8 *in, unsigned int len, u8 *iv);
 
+asmlinkage void aesni_xts_crypt8(struct crypto_aes_ctx *ctx, u8 *out,
+const u8 *in, bool enc, u8 *iv);
+
 /* asmlinkage void aesni_gcm_enc()
  * void *ctx,  AES Key schedule. Starts on a 16 byte boundary.
  

[PATCH 4/5] crypto: x86/camellia-aesni-avx - add more optimized XTS code

2013-04-08 Thread Jussi Kivilinna
Add more optimized XTS code for camellia-aesni-avx, for smaller stack usage
and a small boost in speed.

tcrypt results, with Intel i5-2450M:
        enc     dec
16B     1.10x   1.01x
64B     0.82x   0.77x
256B    1.14x   1.10x
1024B   1.17x   1.16x
8192B   1.10x   1.11x

Since XTS is practically always used with data blocks of 512 bytes or more, I
chose not to use camellia-2way for blocks smaller than 256 bytes. This causes
slower results in tcrypt for 64-byte blocks.

Signed-off-by: Jussi Kivilinna 
---
 arch/x86/crypto/camellia-aesni-avx-asm_64.S |  180 +++
 arch/x86/crypto/camellia_aesni_avx_glue.c   |   91 --
 2 files changed, 229 insertions(+), 42 deletions(-)

diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S 
b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
index cfc1634..ce71f92 100644
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -1,7 +1,7 @@
 /*
  * x86_64/AVX/AES-NI assembler implementation of Camellia
  *
- * Copyright © 2012 Jussi Kivilinna 
+ * Copyright © 2012-2013 Jussi Kivilinna 
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -589,6 +589,10 @@ 
ENDPROC(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 .Lbswap128_mask:
.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
 
+/* For XTS mode IV generation */
+.Lxts_gf128mul_and_shl1_mask:
+   .byte 0x87, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0
+
 /*
  * pre-SubByte transform
  *
@@ -1090,3 +1094,177 @@ ENTRY(camellia_ctr_16way)
 
ret;
 ENDPROC(camellia_ctr_16way)
+
+#define gf128mul_x_ble(iv, mask, tmp) \
+   vpsrad $31, iv, tmp; \
+   vpaddq iv, iv, iv; \
+   vpshufd $0x13, tmp, tmp; \
+   vpand mask, tmp, tmp; \
+   vpxor tmp, iv, iv;
+
+.align 8
+camellia_xts_crypt_16way:
+   /* input:
+*  %rdi: ctx, CTX
+*  %rsi: dst (16 blocks)
+*  %rdx: src (16 blocks)
+*  %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸))
+*  %r8: index for input whitening key
+*  %r9: pointer to  __camellia_enc_blk16 or __camellia_dec_blk16
+*/
+
+   subq $(16 * 16), %rsp;
+   movq %rsp, %rax;
+
+   vmovdqa .Lxts_gf128mul_and_shl1_mask, %xmm14;
+
+   /* load IV */
+   vmovdqu (%rcx), %xmm0;
+   vpxor 0 * 16(%rdx), %xmm0, %xmm15;
+   vmovdqu %xmm15, 15 * 16(%rax);
+   vmovdqu %xmm0, 0 * 16(%rsi);
+
+   /* construct IVs */
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 1 * 16(%rdx), %xmm0, %xmm15;
+   vmovdqu %xmm15, 14 * 16(%rax);
+   vmovdqu %xmm0, 1 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 2 * 16(%rdx), %xmm0, %xmm13;
+   vmovdqu %xmm0, 2 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 3 * 16(%rdx), %xmm0, %xmm12;
+   vmovdqu %xmm0, 3 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 4 * 16(%rdx), %xmm0, %xmm11;
+   vmovdqu %xmm0, 4 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 5 * 16(%rdx), %xmm0, %xmm10;
+   vmovdqu %xmm0, 5 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 6 * 16(%rdx), %xmm0, %xmm9;
+   vmovdqu %xmm0, 6 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 7 * 16(%rdx), %xmm0, %xmm8;
+   vmovdqu %xmm0, 7 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 8 * 16(%rdx), %xmm0, %xmm7;
+   vmovdqu %xmm0, 8 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 9 * 16(%rdx), %xmm0, %xmm6;
+   vmovdqu %xmm0, 9 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 10 * 16(%rdx), %xmm0, %xmm5;
+   vmovdqu %xmm0, 10 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 11 * 16(%rdx), %xmm0, %xmm4;
+   vmovdqu %xmm0, 11 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 12 * 16(%rdx), %xmm0, %xmm3;
+   vmovdqu %xmm0, 12 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 13 * 16(%rdx), %xmm0, %xmm2;
+   vmovdqu %xmm0, 13 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 14 * 16(%rdx), %xmm0, %xmm1;
+   vmovdqu %xmm0, 14 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 15 * 16(%rdx), %xmm0, %xmm15;
+   vmovdqu %xmm15, 0 * 16(%rax);
+   vmovdqu %xmm0, 15 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vmovdqu %xmm0, (%rcx);
+
+   /* inpack16_pre: */
+   vmovq (key_table)(CTX, %r8, 8), %xmm15;
+   vpshufb .Lpack_bswap, %xmm15, %xmm15;
+   vpxor 0 * 16(%rax), %xmm15, %xmm0;
+   vpxor %xmm1, %xmm15, %xmm1;
+   vpxor %xmm2, %xmm15, %xmm2;
+   vpxor %xmm3, %xmm15, %xmm3;
+   vpxor %xmm4, %xmm15, %xmm4;
+   vpxor %xmm5, %

[PATCH 3/5] crypto: cast6-avx: use new optimized XTS code

2013-04-08 Thread Jussi Kivilinna
Change cast6-avx to use the new XTS code, for smaller stack usage and a small
boost to performance.

tcrypt results, with Intel i5-2450M:
        enc     dec
16B     1.01x   1.01x
64B     1.01x   1.00x
256B    1.09x   1.02x
1024B   1.08x   1.06x
8192B   1.08x   1.07x
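
For orientation, the single-block fallback used by the new glue code
(glue_xts_crypt_128bit_one() with __cast6_encrypt in this patch) performs the
standard XTS step C = E_K(P xor T) xor T. The sketch below illustrates that
step in plain, self-contained C; it is not the kernel implementation.

#include <stdint.h>

typedef void (*block_fn)(void *ctx, uint8_t dst[16], const uint8_t src[16]);

/* One XTS block: whiten with the tweak, run the block cipher, whiten again. */
static void xts_one_block(void *ctx, block_fn encrypt,
                          uint8_t dst[16], const uint8_t src[16],
                          const uint8_t tweak[16])
{
        uint8_t buf[16];
        int i;

        for (i = 0; i < 16; i++)
                buf[i] = src[i] ^ tweak[i];     /* pre-whitening */

        encrypt(ctx, buf, buf);                 /* single ECB call on the block */

        for (i = 0; i < 16; i++)
                dst[i] = buf[i] ^ tweak[i];     /* post-whitening with the same tweak */
}

After each block the tweak T is multiplied by x in GF(2^128), which is what
the .Lxts_gf128mul_and_shl1_mask constant added to the assembler supports.
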

Signed-off-by: Jussi Kivilinna 
---
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S |   48 +++
 arch/x86/crypto/cast6_avx_glue.c  |   91 -
 2 files changed, 98 insertions(+), 41 deletions(-)

diff --git a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S 
b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
index f93b610..e3531f8 100644
--- a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
@@ -4,7 +4,7 @@
  * Copyright (C) 2012 Johannes Goetzfried
  * 
  *
- * Copyright © 2012 Jussi Kivilinna 
+ * Copyright © 2012-2013 Jussi Kivilinna 
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -227,6 +227,8 @@
 .data
 
 .align 16
+.Lxts_gf128mul_and_shl1_mask:
+   .byte 0x87, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0
 .Lbswap_mask:
.byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
 .Lbswap128_mask:
@@ -424,3 +426,47 @@ ENTRY(cast6_ctr_8way)
 
ret;
 ENDPROC(cast6_ctr_8way)
+
+ENTRY(cast6_xts_enc_8way)
+   /* input:
+*  %rdi: ctx, CTX
+*  %rsi: dst
+*  %rdx: src
+*  %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸))
+*/
+
+   movq %rsi, %r11;
+
+   /* regs <= src, dst <= IVs, regs <= regs xor IVs */
+   load_xts_8way(%rcx, %rdx, %rsi, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2,
+ RX, RKR, RKM, .Lxts_gf128mul_and_shl1_mask);
+
+   call __cast6_enc_blk8;
+
+   /* dst <= regs xor IVs(in dst) */
+   store_xts_8way(%r11, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2);
+
+   ret;
+ENDPROC(cast6_xts_enc_8way)
+
+ENTRY(cast6_xts_dec_8way)
+   /* input:
+*  %rdi: ctx, CTX
+*  %rsi: dst
+*  %rdx: src
+*  %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸))
+*/
+
+   movq %rsi, %r11;
+
+   /* regs <= src, dst <= IVs, regs <= regs xor IVs */
+   load_xts_8way(%rcx, %rdx, %rsi, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2,
+ RX, RKR, RKM, .Lxts_gf128mul_and_shl1_mask);
+
+   call __cast6_dec_blk8;
+
+   /* dst <= regs xor IVs(in dst) */
+   store_xts_8way(%r11, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2);
+
+   ret;
+ENDPROC(cast6_xts_dec_8way)
diff --git a/arch/x86/crypto/cast6_avx_glue.c b/arch/x86/crypto/cast6_avx_glue.c
index 92f7ca2..8d0dfb8 100644
--- a/arch/x86/crypto/cast6_avx_glue.c
+++ b/arch/x86/crypto/cast6_avx_glue.c
@@ -4,6 +4,8 @@
  * Copyright (C) 2012 Johannes Goetzfried
  * 
  *
+ * Copyright © 2013 Jussi Kivilinna 
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
@@ -50,6 +52,23 @@ asmlinkage void cast6_cbc_dec_8way(struct cast6_ctx *ctx, u8 
*dst,
 asmlinkage void cast6_ctr_8way(struct cast6_ctx *ctx, u8 *dst, const u8 *src,
   le128 *iv);
 
+asmlinkage void cast6_xts_enc_8way(struct cast6_ctx *ctx, u8 *dst,
+  const u8 *src, le128 *iv);
+asmlinkage void cast6_xts_dec_8way(struct cast6_ctx *ctx, u8 *dst,
+  const u8 *src, le128 *iv);
+
+static void cast6_xts_enc(void *ctx, u128 *dst, const u128 *src, le128 *iv)
+{
+   glue_xts_crypt_128bit_one(ctx, dst, src, iv,
+ GLUE_FUNC_CAST(__cast6_encrypt));
+}
+
+static void cast6_xts_dec(void *ctx, u128 *dst, const u128 *src, le128 *iv)
+{
+   glue_xts_crypt_128bit_one(ctx, dst, src, iv,
+ GLUE_FUNC_CAST(__cast6_decrypt));
+}
+
 static void cast6_crypt_ctr(void *ctx, u128 *dst, const u128 *src, le128 *iv)
 {
be128 ctrblk;
@@ -87,6 +106,19 @@ static const struct common_glue_ctx cast6_ctr = {
} }
 };
 
+static const struct common_glue_ctx cast6_enc_xts = {
+   .num_funcs = 2,
+   .fpu_blocks_limit = CAST6_PARALLEL_BLOCKS,
+
+   .funcs = { {
+   .num_blocks = CAST6_PARALLEL_BLOCKS,
+   .fn_u = { .xts = GLUE_XTS_FUNC_CAST(cast6_xts_enc_8way) }
+   }, {
+   .num_blocks = 1,
+   .fn_u = { .xts = GLUE_XTS_FUNC_CAST(cast6_xts_enc) }
+   } }
+};
+
 static const struct common_glue_ctx cast6_dec = {
.num_funcs = 2,
.fpu_blocks_limit = CAST6_PARALLEL_BLOCKS,
@@ -113,6 +145,19 @@ static const struct common_glue_ctx cast6_dec_cbc = {
} }
 };
 
+static const struct common_glue_ctx cast6_dec_xts = {
+   .num_funcs = 2,
+   .fpu_blocks_limit = CAST6_PARALLEL_BLOCKS,
+
+   .funcs = { {
+   .num_blocks = CAST6_PARAL

[PATCH 2/5] crypto: x86/twofish-avx - use optimized XTS code

2013-04-08 Thread Jussi Kivilinna
Change twofish-avx to use the new XTS code, for smaller stack usage and a
small boost to performance.

tcrypt results, with Intel i5-2450M:
        enc     dec
16B     1.03x   1.02x
64B     0.91x   0.91x
256B    1.10x   1.09x
1024B   1.12x   1.11x
8192B   1.12x   1.11x

Since XTS is practically always used with data blocks of 512 bytes or more, I
chose not to use twofish-3way for blocks smaller than 128 bytes. This causes
slower results in tcrypt for 64-byte blocks.

Signed-off-by: Jussi Kivilinna 
---
 arch/x86/crypto/twofish-avx-x86_64-asm_64.S |   48 ++
 arch/x86/crypto/twofish_avx_glue.c  |   91 +++
 2 files changed, 98 insertions(+), 41 deletions(-)

diff --git a/arch/x86/crypto/twofish-avx-x86_64-asm_64.S 
b/arch/x86/crypto/twofish-avx-x86_64-asm_64.S
index 8d3e113..0505813 100644
--- a/arch/x86/crypto/twofish-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/twofish-avx-x86_64-asm_64.S
@@ -4,7 +4,7 @@
  * Copyright (C) 2012 Johannes Goetzfried
  * 
  *
- * Copyright © 2012 Jussi Kivilinna 
+ * Copyright © 2012-2013 Jussi Kivilinna 
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -33,6 +33,8 @@
 
 .Lbswap128_mask:
.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
+.Lxts_gf128mul_and_shl1_mask:
+   .byte 0x87, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0
 
 .text
 
@@ -408,3 +410,47 @@ ENTRY(twofish_ctr_8way)
 
ret;
 ENDPROC(twofish_ctr_8way)
+
+ENTRY(twofish_xts_enc_8way)
+   /* input:
+*  %rdi: ctx, CTX
+*  %rsi: dst
+*  %rdx: src
+*  %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸))
+*/
+
+   movq %rsi, %r11;
+
+   /* regs <= src, dst <= IVs, regs <= regs xor IVs */
+   load_xts_8way(%rcx, %rdx, %rsi, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2,
+ RX0, RX1, RY0, .Lxts_gf128mul_and_shl1_mask);
+
+   call __twofish_enc_blk8;
+
+   /* dst <= regs xor IVs(in dst) */
+   store_xts_8way(%r11, RC1, RD1, RA1, RB1, RC2, RD2, RA2, RB2);
+
+   ret;
+ENDPROC(twofish_xts_enc_8way)
+
+ENTRY(twofish_xts_dec_8way)
+   /* input:
+*  %rdi: ctx, CTX
+*  %rsi: dst
+*  %rdx: src
+*  %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸))
+*/
+
+   movq %rsi, %r11;
+
+   /* regs <= src, dst <= IVs, regs <= regs xor IVs */
+   load_xts_8way(%rcx, %rdx, %rsi, RC1, RD1, RA1, RB1, RC2, RD2, RA2, RB2,
+ RX0, RX1, RY0, .Lxts_gf128mul_and_shl1_mask);
+
+   call __twofish_dec_blk8;
+
+   /* dst <= regs xor IVs(in dst) */
+   store_xts_8way(%r11, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2);
+
+   ret;
+ENDPROC(twofish_xts_dec_8way)
diff --git a/arch/x86/crypto/twofish_avx_glue.c 
b/arch/x86/crypto/twofish_avx_glue.c
index 94ac91d..a62ba54 100644
--- a/arch/x86/crypto/twofish_avx_glue.c
+++ b/arch/x86/crypto/twofish_avx_glue.c
@@ -4,6 +4,8 @@
  * Copyright (C) 2012 Johannes Goetzfried
  * 
  *
+ * Copyright © 2013 Jussi Kivilinna 
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
@@ -56,12 +58,29 @@ asmlinkage void twofish_cbc_dec_8way(struct twofish_ctx 
*ctx, u8 *dst,
 asmlinkage void twofish_ctr_8way(struct twofish_ctx *ctx, u8 *dst,
 const u8 *src, le128 *iv);
 
+asmlinkage void twofish_xts_enc_8way(struct twofish_ctx *ctx, u8 *dst,
+const u8 *src, le128 *iv);
+asmlinkage void twofish_xts_dec_8way(struct twofish_ctx *ctx, u8 *dst,
+const u8 *src, le128 *iv);
+
 static inline void twofish_enc_blk_3way(struct twofish_ctx *ctx, u8 *dst,
const u8 *src)
 {
__twofish_enc_blk_3way(ctx, dst, src, false);
 }
 
+static void twofish_xts_enc(void *ctx, u128 *dst, const u128 *src, le128 *iv)
+{
+   glue_xts_crypt_128bit_one(ctx, dst, src, iv,
+ GLUE_FUNC_CAST(twofish_enc_blk));
+}
+
+static void twofish_xts_dec(void *ctx, u128 *dst, const u128 *src, le128 *iv)
+{
+   glue_xts_crypt_128bit_one(ctx, dst, src, iv,
+ GLUE_FUNC_CAST(twofish_dec_blk));
+}
+
 
 static const struct common_glue_ctx twofish_enc = {
.num_funcs = 3,
@@ -95,6 +114,19 @@ static const struct common_glue_ctx twofish_ctr = {
} }
 };
 
+static const struct common_glue_ctx twofish_enc_xts = {
+   .num_funcs = 2,
+   .fpu_blocks_limit = TWOFISH_PARALLEL_BLOCKS,
+
+   .funcs = { {
+   .num_blocks = TWOFISH_PARALLEL_BLOCKS,
+   .fn_u = { .xts = GLUE_XTS_FUNC_CAST(twofish_xts_enc_8way) }
+   }, {
+   .num_blocks = 1,
+   .fn_u = { .xts = GLUE_XTS_FUNC_CAST(twofish_xts_enc) }

[PATCH 1/5] crypto: x86 - add more optimized XTS-mode for serpent-avx

2013-04-08 Thread Jussi Kivilinna
This patch adds AVX optimized XTS-mode helper functions/macros and converts
serpent-avx to use the new facilities. Benefits are slightly improved speed
and reduced stack usage, as the use of a temporary IV array is avoided.

tcrypt results, with Intel i5-2450M:
        enc     dec
16B     1.00x   1.00x
64B     1.00x   1.00x
256B    1.04x   1.06x
1024B   1.09x   1.09x
8192B   1.10x   1.09x
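
The __glue_xts_crypt_128bit() routine added to glue_helper.c (shown further
down, truncated) dispatches between the wide parallel path and a one-block
fallback. The following is a simplified, self-contained sketch of that
dispatch idea; the names are illustrative, not the kernel's exact structures.

#define XTS_BLOCK_SIZE 16

struct xts_func {
        unsigned int num_blocks;        /* blocks handled per call, widest first */
        void (*fn)(void *ctx, unsigned char *dst, const unsigned char *src,
                   unsigned char *tweak);
};

/* Process as many wide chunks as possible, then fall back to narrower funcs. */
static unsigned int xts_dispatch(const struct xts_func *funcs,
                                 unsigned int nfuncs, void *ctx,
                                 unsigned char *dst, const unsigned char *src,
                                 unsigned int nbytes, unsigned char *tweak)
{
        unsigned int i;

        for (i = 0; i < nfuncs; i++) {
                unsigned int chunk = funcs[i].num_blocks * XTS_BLOCK_SIZE;

                while (nbytes >= chunk) {
                        funcs[i].fn(ctx, dst, src, tweak);
                        dst += chunk;
                        src += chunk;
                        nbytes -= chunk;
                }
        }
        return nbytes;          /* whatever could not fill a full block */
}

The common_glue_ctx tables added by the glue patches in this series (for
example cast6_enc_xts later in the thread) play the role of funcs[] here, with
the 8-way AVX routine listed first and the single-block helper last.
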

Signed-off-by: Jussi Kivilinna 
---
 arch/x86/crypto/glue_helper-asm-avx.S   |   61 +
 arch/x86/crypto/glue_helper.c   |   97 +++
 arch/x86/crypto/serpent-avx-x86_64-asm_64.S |   45 -
 arch/x86/crypto/serpent_avx_glue.c  |   87 +---
 arch/x86/include/asm/crypto/glue_helper.h   |   24 +++
 arch/x86/include/asm/crypto/serpent-avx.h   |5 +
 6 files changed, 273 insertions(+), 46 deletions(-)

diff --git a/arch/x86/crypto/glue_helper-asm-avx.S 
b/arch/x86/crypto/glue_helper-asm-avx.S
index f7b6ea2..02ee230 100644
--- a/arch/x86/crypto/glue_helper-asm-avx.S
+++ b/arch/x86/crypto/glue_helper-asm-avx.S
@@ -1,7 +1,7 @@
 /*
  * Shared glue code for 128bit block ciphers, AVX assembler macros
  *
- * Copyright (c) 2012 Jussi Kivilinna 
+ * Copyright © 2012-2013 Jussi Kivilinna 
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -89,3 +89,62 @@
vpxor (6*16)(src), x6, x6; \
vpxor (7*16)(src), x7, x7; \
store_8way(dst, x0, x1, x2, x3, x4, x5, x6, x7);
+
+#define gf128mul_x_ble(iv, mask, tmp) \
+   vpsrad $31, iv, tmp; \
+   vpaddq iv, iv, iv; \
+   vpshufd $0x13, tmp, tmp; \
+   vpand mask, tmp, tmp; \
+   vpxor tmp, iv, iv;
+
+#define load_xts_8way(iv, src, dst, x0, x1, x2, x3, x4, x5, x6, x7, tiv, t0, \
+ t1, xts_gf128mul_and_shl1_mask) \
+   vmovdqa xts_gf128mul_and_shl1_mask, t0; \
+   \
+   /* load IV */ \
+   vmovdqu (iv), tiv; \
+   vpxor (0*16)(src), tiv, x0; \
+   vmovdqu tiv, (0*16)(dst); \
+   \
+   /* construct and store IVs, also xor with source */ \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (1*16)(src), tiv, x1; \
+   vmovdqu tiv, (1*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (2*16)(src), tiv, x2; \
+   vmovdqu tiv, (2*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (3*16)(src), tiv, x3; \
+   vmovdqu tiv, (3*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (4*16)(src), tiv, x4; \
+   vmovdqu tiv, (4*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (5*16)(src), tiv, x5; \
+   vmovdqu tiv, (5*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (6*16)(src), tiv, x6; \
+   vmovdqu tiv, (6*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (7*16)(src), tiv, x7; \
+   vmovdqu tiv, (7*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vmovdqu tiv, (iv);
+
+#define store_xts_8way(dst, x0, x1, x2, x3, x4, x5, x6, x7) \
+   vpxor (0*16)(dst), x0, x0; \
+   vpxor (1*16)(dst), x1, x1; \
+   vpxor (2*16)(dst), x2, x2; \
+   vpxor (3*16)(dst), x3, x3; \
+   vpxor (4*16)(dst), x4, x4; \
+   vpxor (5*16)(dst), x5, x5; \
+   vpxor (6*16)(dst), x6, x6; \
+   vpxor (7*16)(dst), x7, x7; \
+   store_8way(dst, x0, x1, x2, x3, x4, x5, x6, x7);
diff --git a/arch/x86/crypto/glue_helper.c b/arch/x86/crypto/glue_helper.c
index 22ce4f6..432f1d76 100644
--- a/arch/x86/crypto/glue_helper.c
+++ b/arch/x86/crypto/glue_helper.c
@@ -1,7 +1,7 @@
 /*
  * Shared glue code for 128bit block ciphers
  *
- * Copyright (c) 2012 Jussi Kivilinna 
+ * Copyright © 2012-2013 Jussi Kivilinna 
  *
  * CBC & ECB parts based on code (crypto/cbc.c,ecb.c) by:
  *   Copyright (c) 2006 Herbert Xu 
@@ -304,4 +304,99 @@ int glue_ctr_crypt_128bit(const struct common_glue_ctx 
*gctx,
 }
 EXPORT_SYMBOL_GPL(glue_ctr_crypt_128bit);
 
+static unsigned int __glue_xts_crypt_128bit(const struct common_glue_ctx *gctx,
+   void *ctx,
+   struct blkcipher_desc *desc,
+   struct blkcipher_walk *walk)
+{
+   const unsigned int bsize = 128 / 8;
+   unsigned int nbytes = walk->nbytes;
+   u128 *src = (u128 *)walk->src.virt.addr;
+   u128 *dst = (u128 *)walk->dst.virt.addr;
+   unsigned int num_blocks, func_bytes;
+   unsigned int i;
+
+   /* Process multi-block batch */
+   for (i = 0; i < gctx->num_funcs; i++) {
+   num_blocks = gctx->funcs[i].num_blocks;
+   func_bytes = bsize * num_blocks;
+
+   if (nbytes >= func_bytes) {
+   do {
+   gctx->funcs[i].fn_u.xts(ctx, dst, src,
+   

RE: questions of crypto async api

2013-04-08 Thread Hsieh, Che-Min
Thanks for the answer.

I have a further question on the same subject.
With regard to the commit in talitos.c (attached at the end of this mail),
the driver submits requests of the same tfm to the same channel to ensure
ordering.

Is it because the tfm context needs to be maintained from one operation to the
next? E.g., aead_givencrypt() generates an IV based on the previous IV result
stored in the tfm.

If requests are sent to different channels dynamically, and the driver, at the
completion of a request from the HW, reorders the request completion callbacks,
what would happen?

Thanks in advance.

Chemin



commit 5228f0f79e983c2b39c202c75af901ceb0003fc1
Author: Kim Phillips 
Date:   Fri Jul 15 11:21:38 2011 +0800

crypto: talitos - ensure request ordering within a single tfm

Assign single target channel per tfm in talitos_cra_init instead of
performing channel scheduling dynamically during the encryption request.
This changes the talitos_submit interface to accept a new channel
number argument.  Without this, rapid bursts of misc. sized requests
could make it possible for IPsec packets to be encrypted out-of-order,
which would result in packet drops due to sequence numbers falling
outside the anti-replay window on a peer gateway.

Signed-off-by: Kim Phillips 
Signed-off-by: Herbert Xu 

-Original Message-
From: Kim Phillips [mailto:kim.phill...@freescale.com] 
Sent: Friday, April 05, 2013 6:33 PM
To: Hsieh, Che-Min
Cc: linux-crypto@vger.kernel.org
Subject: Re: questions of crypto async api

On Thu, 4 Apr 2013 14:38:41 +
"Hsieh, Che-Min"  wrote:

> If a driver supports multiple instances of HW crypto engines, the order of
> request completion from the HW can differ from the order in which requests
> were submitted to the different HW instances.  The 2nd request, sent to the
> 2nd HW instance, may take less time to complete than the first request on a
> different HW instance.  Is the driver responsible for re-ordering the
> completion callbacks? Or are the agents (such as the IP protocol stack)
> responsible for reordering? How does pcrypt do it?
> 
>  Does it make sense for a transform to have multiple requests outstanding to
> the async crypto API?

see:

http://comments.gmane.org/gmane.linux.kernel.cryptoapi/5350

>  Is scatterwalk_sg_next() the preferred method over sg_next()?  Why?

scatterwalk_* is the crypto subsystem's version of the function, so yes.

>  sg_copy_to_buffer() and sg_copy_from_buffer() -> sg_copy_buffer() ->
> sg_miter_next() -> sg_next().  Sometimes sg_copy_to_buffer() and
> sg_copy_from_buffer() in our driver do not copy the whole list. We have to
> rewrite those functions by using scatterwalk_sg_next() to walk down the
> list. Is this the correct behavior?

sounds like you're on the right track, although buffers shouldn't be
copied that often, if at all.

Kim



Re: [PATCH 1/2] crypto: add CMAC support to CryptoAPI

2013-04-08 Thread Herbert Xu
On Mon, Apr 08, 2013 at 10:24:16AM +0200, Steffen Klassert wrote:
> On Mon, Apr 08, 2013 at 10:48:44AM +0300, Jussi Kivilinna wrote:
> > Patch adds support for NIST recommended block cipher mode CMAC to CryptoAPI.
> > 
> > This work is based on Tom St Denis' earlier patch,
> >  http://marc.info/?l=linux-crypto-vger&m=135877306305466&w=2
> > 
> > Cc: Tom St Denis 
> > Signed-off-by: Jussi Kivilinna 
> 
> This patch does not apply cleanly to the ipsec-next tree
> because of some crypto changes I don't have in ipsec-next.
> The IPsec part should apply to the cryptodev tree,
> so it's probably best if we route this patchset
> through the cryptodev tree.
> 
> Herbert,
> 
> are you going to take these patches?

Sure I can do that.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 1/2] crypto: add CMAC support to CryptoAPI

2013-04-08 Thread Jussi Kivilinna
On 08.04.2013 11:24, Steffen Klassert wrote:
> On Mon, Apr 08, 2013 at 10:48:44AM +0300, Jussi Kivilinna wrote:
>> Patch adds support for NIST recommended block cipher mode CMAC to CryptoAPI.
>>
>> This work is based on Tom St Denis' earlier patch,
>>  http://marc.info/?l=linux-crypto-vger&m=135877306305466&w=2
>>
>> Cc: Tom St Denis 
>> Signed-off-by: Jussi Kivilinna 
> 
> This patch does not apply cleanly to the ipsec-next tree
> because of some crypto changes I don't have in ipsec-next.
> The IPsec part should apply to the cryptodev tree,
> so it's probably best if we route this patchset
> through the cryptodev tree.

I should have mentioned that the patchset is on top of the cryptodev tree and
the previous crypto patches that I sent yesterday; otherwise it is likely to
cause problems, at least in tcrypt.c:

http://marc.info/?l=linux-crypto-vger&m=136534223503368&w=2

-Jussi

> 
> Herbert,
> 
> are you going to take these patches?
> 



Re: [PATCH 1/2] crypto: add CMAC support to CryptoAPI

2013-04-08 Thread Steffen Klassert
On Mon, Apr 08, 2013 at 10:48:44AM +0300, Jussi Kivilinna wrote:
> Patch adds support for NIST recommended block cipher mode CMAC to CryptoAPI.
> 
> This work is based on Tom St Denis' earlier patch,
>  http://marc.info/?l=linux-crypto-vger&m=135877306305466&w=2
> 
> Cc: Tom St Denis 
> Signed-off-by: Jussi Kivilinna 

This patch does not apply cleanly to the ipsec-next tree
because of some crypto changes I don't have in ipsec-next.
The IPsec part should apply to the cryptodev tree,
so it's probably best if we route this patchset
through the cryptodev tree.

Herbert,

are you going to take these patches?


[PATCH 2/2] xfrm: add rfc4494 AES-CMAC-96 support

2013-04-08 Thread Jussi Kivilinna
Now that CryptoAPI has support for CMAC, we can add support for AES-CMAC-96
(rfc4494).

Cc: Tom St Denis 
Signed-off-by: Jussi Kivilinna 
---
 net/xfrm/xfrm_algo.c |   13 +
 1 file changed, 13 insertions(+)

diff --git a/net/xfrm/xfrm_algo.c b/net/xfrm/xfrm_algo.c
index 6fb9d00..ab4ef72 100644
--- a/net/xfrm/xfrm_algo.c
+++ b/net/xfrm/xfrm_algo.c
@@ -311,6 +311,19 @@ static struct xfrm_algo_desc aalg_list[] = {
.sadb_alg_maxbits = 128
}
 },
+{
+   /* rfc4494 */
+   .name = "cmac(aes)",
+
+   .uinfo = {
+   .auth = {
+   .icv_truncbits = 96,
+   .icv_fullbits = 128,
+   }
+   },
+
+   .pfkey_supported = 0,
+},
 };
 
 static struct xfrm_algo_desc ealg_list[] = {



[PATCH 1/2] crypto: add CMAC support to CryptoAPI

2013-04-08 Thread Jussi Kivilinna
Patch adds support for NIST recommended block cipher mode CMAC to CryptoAPI.

This work is based on Tom St Denis' earlier patch,
 http://marc.info/?l=linux-crypto-vger&m=135877306305466&w=2

Cc: Tom St Denis 
Signed-off-by: Jussi Kivilinna 
---
 crypto/Kconfig   |   11 ++
 crypto/Makefile  |1 
 crypto/cmac.c|  315 ++
 crypto/tcrypt.c  |   11 ++
 crypto/testmgr.c |   18 +++
 crypto/testmgr.h |  125 +
 6 files changed, 480 insertions(+), 1 deletion(-)
 create mode 100644 crypto/cmac.c
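
Once the template is registered, kernel code can use it through the regular
shash interface. The following is a hedged usage sketch, not part of the
patch; error handling is abbreviated and the descriptor is heap-allocated
since its size depends on the underlying algorithm (crypto_shash_descsize()).

#include <crypto/hash.h>
#include <linux/err.h>
#include <linux/slab.h>
#include <linux/types.h>

/* Compute a 16-byte AES-CMAC tag over 'data' with the "cmac(aes)" template. */
static int cmac_aes_digest(const u8 *key, unsigned int keylen,
                           const u8 *data, unsigned int len, u8 *out)
{
        struct crypto_shash *tfm;
        struct shash_desc *desc;
        int err;

        tfm = crypto_alloc_shash("cmac(aes)", 0, 0);
        if (IS_ERR(tfm))
                return PTR_ERR(tfm);

        err = crypto_shash_setkey(tfm, key, keylen);
        if (err)
                goto out_free_tfm;

        desc = kzalloc(sizeof(*desc) + crypto_shash_descsize(tfm), GFP_KERNEL);
        if (!desc) {
                err = -ENOMEM;
                goto out_free_tfm;
        }
        desc->tfm = tfm;

        /* 'out' must have room for crypto_shash_digestsize(tfm) bytes (16 here) */
        err = crypto_shash_digest(desc, data, len, out);

        kzfree(desc);
out_free_tfm:
        crypto_free_shash(tfm);
        return err;
}
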

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 6cc27f1..c1142f3 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -283,6 +283,17 @@ config CRYPTO_XTS
 
 comment "Hash modes"
 
+config CRYPTO_CMAC
+   tristate "CMAC support"
+   select CRYPTO_HASH
+   select CRYPTO_MANAGER
+   help
+ Cipher-based Message Authentication Code (CMAC) specified by
+ The National Institute of Standards and Technology (NIST).
+
+ https://tools.ietf.org/html/rfc4493
+ http://csrc.nist.gov/publications/nistpubs/800-38B/SP_800-38B.pdf
+
 config CRYPTO_HMAC
tristate "HMAC support"
select CRYPTO_HASH
diff --git a/crypto/Makefile b/crypto/Makefile
index be1a1be..a8e9b0f 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -32,6 +32,7 @@ cryptomgr-y := algboss.o testmgr.o
 
 obj-$(CONFIG_CRYPTO_MANAGER2) += cryptomgr.o
 obj-$(CONFIG_CRYPTO_USER) += crypto_user.o
+obj-$(CONFIG_CRYPTO_CMAC) += cmac.o
 obj-$(CONFIG_CRYPTO_HMAC) += hmac.o
 obj-$(CONFIG_CRYPTO_VMAC) += vmac.o
 obj-$(CONFIG_CRYPTO_XCBC) += xcbc.o
diff --git a/crypto/cmac.c b/crypto/cmac.c
new file mode 100644
index 000..50880cf
--- /dev/null
+++ b/crypto/cmac.c
@@ -0,0 +1,315 @@
+/*
+ * CMAC: Cipher Block Mode for Authentication
+ *
+ * Copyright © 2013 Jussi Kivilinna 
+ *
+ * Based on work by:
+ *  Copyright © 2013 Tom St Denis 
+ * Based on crypto/xcbc.c:
+ *  Copyright © 2006 USAGI/WIDE Project,
+ *   Author: Kazunori Miyazawa 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * +------------------------
+ * | <parent tfm>
+ * +------------------------
+ * | cmac_tfm_ctx
+ * +------------------------
+ * | consts (block size * 2)
+ * +------------------------
+ */
+struct cmac_tfm_ctx {
+   struct crypto_cipher *child;
+   u8 ctx[];
+};
+
+/*
+ * +------------------------
+ * | <shash desc>
+ * +------------------------
+ * | cmac_desc_ctx
+ * +------------------------
+ * | odds (block size)
+ * +------------------------
+ * | prev (block size)
+ * +------------------------
+ */
+struct cmac_desc_ctx {
+   unsigned int len;
+   u8 ctx[];
+};
+
+static int crypto_cmac_digest_setkey(struct crypto_shash *parent,
+const u8 *inkey, unsigned int keylen)
+{
+   unsigned long alignmask = crypto_shash_alignmask(parent);
+   struct cmac_tfm_ctx *ctx = crypto_shash_ctx(parent);
+   unsigned int bs = crypto_shash_blocksize(parent);
+   __be64 *consts = PTR_ALIGN((void *)ctx->ctx, alignmask + 1);
+   u64 _const[2];
+   int i, err = 0;
+   u8 msb_mask, gfmask;
+
+   err = crypto_cipher_setkey(ctx->child, inkey, keylen);
+   if (err)
+   return err;
+
+   /* encrypt the zero block */
+   memset(consts, 0, bs);
+   crypto_cipher_encrypt_one(ctx->child, (u8 *)consts, (u8 *)consts);
+
+   switch (bs) {
+   case 16:
+   gfmask = 0x87;
+   _const[0] = be64_to_cpu(consts[1]);
+   _const[1] = be64_to_cpu(consts[0]);
+
+   /* gf(2^128) multiply zero-ciphertext with u and u^2 */
+   for (i = 0; i < 4; i += 2) {
+   msb_mask = ((s64)_const[1] >> 63) & gfmask;
+   _const[1] = (_const[1] << 1) | (_const[0] >> 63);
+   _const[0] = (_const[0] << 1) ^ msb_mask;
+
+   consts[i + 0] = cpu_to_be64(_const[1]);
+   consts[i + 1] = cpu_to_be64(_const[0]);
+   }
+
+   break;
+   case 8:
+   gfmask = 0x1B;
+   _const[0] = be64_to_cpu(consts[0]);
+
+   /* gf(2^64) multiply zero-ciphertext with u and u^2 */
+   for (i = 0; i < 2; i++) {
+   msb_mask = ((s64)_const[0] >> 63) & gfmask;
+   _const[0] = (_const[0] << 1) ^ msb_mask;
+
+   consts[i] = cpu_to_be64(_const[0]);
+   }
+
+   break;
+   }
+
+   return 0;
+}
+
+static int crypto_cmac_digest_init(struct shash_desc *pdesc)
+{
+   unsigned long alignmask = crypto_shash_alignmask(pdesc->tfm);
+   struct cmac_desc_ctx *ctx = shash_desc_ct