Re: [PATCH v2 0/5] crypto: add algif_akcipher user space API

2015-10-28 Thread Stephan Mueller
Am Mittwoch, 28. Oktober 2015, 11:56:59 schrieb Marcel Holtmann:

Hi Marcel,
>> 
>> With all due respect, I would object here. When we say yes to TLS (even if
>> it is parts of TLS up to the point where the KDF happens), we invite all
>> higher level crypto implementations: IKE, SNMP, SSH -- I would not want to
>> go down that path that started by simply supporting accelerated asymmetric
>> ciphers.
>Reality is that TLS in the kernel is happening. Reality is also that we do

This is simply wrong IMHO for anything beyond the key handling. But that is 
not the topic here, I guess.


Ciao
Stephan
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel tainted while exporting shash context using af_alg interface

2015-10-28 Thread Harsh Jain
Hi Stephan,

I tried your patch on my machine. Kernel is not crashing. The openssl
break with this. Can you share HMAC program which you are suspecting
it will not work or do you already have some test written in
libkcapi/test.sh which will fail.


Regards
Harsh Jain

On Wed, Oct 28, 2015 at 6:25 AM, Stephan Mueller  wrote:
> Am Mittwoch, 28. Oktober 2015, 01:09:58 schrieb Stephan Mueller:
>
> Hi Harsh,
>
>>
>>
>> However, any error in user space should not crash the kernel. So, a fix
>> should be done. But I think your code is not correct as it solidifies a
>> broken user space code.
>
> After thinking a bit again, I think your approach is correct after all. I was
> able to reproduce the crash by simply adding more accept calls to my test
> code. And I can confirm that your patch works, for hashes.
>
> *BUT* it does NOT work for HMAC as the key is set on the TFM and the
> subsequent accepts do not transport the key. Albeit your code prevents the
> kernel from crashing, the HMAC calculation will be done with an empty key as
> the setkey operation does not reach the TFM handle in the subordinate accept()
> call.
>
> So, I would think that the second accept is simply broken, for HMAC at least.
>
> Herbert, what is the purpose of that subordinate accept that is implemented
> with hash_accept? As this is broken for HMACs, should it be removed entirely?
>
> --
> Ciao
> Stephan
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel tainted while exporting shash context using af_alg interface

2015-10-28 Thread Stephan Mueller
Am Mittwoch, 28. Oktober 2015, 16:24:34 schrieb Harsh Jain:

Hi Harsh,

>Hi Stephan,
>
>I tried your patch on my machine. Kernel is not crashing. The openssl
>break with this. Can you share HMAC program which you are suspecting
>it will not work or do you already have some test written in
>libkcapi/test.sh which will fail.

See comments above test/kcapi-main.c:cavs_hash

 * HMAC command line invocation:
 * $ ./kcapi -x 3 -c "hmac(sha1)" -k 6e77ebd479da794707bc6cde3694f552ea892dab 
-p  
31b62a797adbff6b8a358d2b5206e01fee079de8cdfc4695138bba163b4efbf30127343e7fd4fbc696c3d38d8f27f57c024b5056f726ceeb4c31d98e57751ec8cbe8904ee0f9b031ae6a0c55da5e062475b3d7832191d4057643ef5fa446801d59a04693e573a8159cd2416b7bd39c7f0fe63c599365e04d596c05736beaab58
 * 7f204ea665666f5bd2b370e546d1b408005e4d85

To do that, apply your patch and then

1. open lib/kcapi-kernel-if.c and change line 567 from

handle->opfd = accept(handle->tfmfd, NULL, 0);


to

handle->opfd = accept(handle->tfmfd, NULL, 0);
handle->opfd = accept(handle->opfd, NULL, 0);
handle->opfd = accept(handle->opfd, NULL, 0);
handle->opfd = accept(handle->opfd, NULL, 0);
handle->opfd = accept(handle->opfd, NULL, 0);

You will see that the hash commands will pass, the HMAC fails

Without your patch, the kernel crashes (same as with your OpenSSL code).

The reason is that setkey is applied on the TFM that is not conveyed to the 
subsequent TFMs generated with new accepts.
>
>
>Regards
>Harsh Jain
>
>On Wed, Oct 28, 2015 at 6:25 AM, Stephan Mueller  wrote:
>> Am Mittwoch, 28. Oktober 2015, 01:09:58 schrieb Stephan Mueller:
>> 
>> Hi Harsh,
>> 
>>> However, any error in user space should not crash the kernel. So, a fix
>>> should be done. But I think your code is not correct as it solidifies a
>>> broken user space code.
>> 
>> After thinking a bit again, I think your approach is correct after all. I
>> was able to reproduce the crash by simply adding more accept calls to my
>> test code. And I can confirm that your patch works, for hashes.
>> 
>> *BUT* it does NOT work for HMAC as the key is set on the TFM and the
>> subsequent accepts do not transport the key. Albeit your code prevents the
>> kernel from crashing, the HMAC calculation will be done with an empty key
>> as
>> the setkey operation does not reach the TFM handle in the subordinate
>> accept() call.
>> 
>> So, I would think that the second accept is simply broken, for HMAC at
>> least.
>> 
>> Herbert, what is the purpose of that subordinate accept that is implemented
>> with hash_accept? As this is broken for HMACs, should it be removed
>> entirely?
>> 
>> --
>> Ciao
>> Stephan
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
>the body of a message to majord...@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html


Ciao
Stephan
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/5] x86 AES-CBC encryption with AVX2 multi-buffer

2015-10-28 Thread Tim Chen

In this patch series, we introduce AES CBC encryption that is parallelized
on x86_64 cpu with AVX2. The multi-buffer technique takes advantage
of wide AVX2 register and encrypt 8 data streams in parallel with SIMD
instructions. Decryption is handled as in the existing AESNI Intel CBC
implementation which can already parallelize decryption even for a single
data stream.

Please see the multi-buffer whitepaper for details of the technique:
http://www.intel.com/content/www/us/en/communications/communications-ia-multi-buffer-paper.html

It is important that any driver uses this algorithm properly for scenarios
where we have many data streams that can fill up the data lanes most of
the time.  It shouldn't be used when only a single data stream is expected
mostly. Otherwise we may incurr extra delays when we have frequent gaps in
data lanes, causing us to wait till data come in to fill the data lanes
before initiating encryption.  We may have to wait for flush operations
to commence when no new data come in after some wait time. However we
keep this extra delay to a minimum by opportunistically flushing the
unfinished jobs if crypto daemon is the only active task running on a cpu.

By using this technique, we saw a throughput increase of up to
5.8x under optimal conditions when we have fully loaded encryption jobs
filling up all the data lanes.

Tim Chen (5):
  crypto: Multi-buffer encryptioin infrastructure support
  crypto: AES CBC multi-buffer data structures
  crypto: AES CBC multi-buffer scheduler
  crypto: AES CBC by8 encryption
  crypto: AES CBC multi-buffer glue code

 arch/x86/crypto/Makefile   |   1 +
 arch/x86/crypto/aes-cbc-mb/Makefile|  22 +
 arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S| 774 +++
 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c| 815 +
 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h|  96 +++
 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_mgr.h| 131 
 arch/x86/crypto/aes-cbc-mb/aes_mb_mgr_init.c   | 145 
 arch/x86/crypto/aes-cbc-mb/mb_mgr_datastruct.S | 270 +++
 arch/x86/crypto/aes-cbc-mb/mb_mgr_inorder_x8_asm.S | 222 ++
 arch/x86/crypto/aes-cbc-mb/mb_mgr_ooo_x8_asm.S | 416 +++
 arch/x86/crypto/aes-cbc-mb/reg_sizes.S | 125 
 crypto/Kconfig |  16 +
 crypto/blkcipher.c |  29 +-
 crypto/mcryptd.c   | 256 ++-
 crypto/scatterwalk.c   |   7 +
 include/crypto/algapi.h|   1 +
 include/crypto/mcryptd.h   |  36 +
 include/crypto/scatterwalk.h   |   6 +
 include/linux/crypto.h |   1 +
 19 files changed, 3364 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/crypto/aes-cbc-mb/Makefile
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_mgr.h
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_mb_mgr_init.c
 create mode 100644 arch/x86/crypto/aes-cbc-mb/mb_mgr_datastruct.S
 create mode 100644 arch/x86/crypto/aes-cbc-mb/mb_mgr_inorder_x8_asm.S
 create mode 100644 arch/x86/crypto/aes-cbc-mb/mb_mgr_ooo_x8_asm.S
 create mode 100644 arch/x86/crypto/aes-cbc-mb/reg_sizes.S

-- 
1.7.11.7


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] crypto: Multi-buffer encryptioin infrastructure support

2015-10-28 Thread Tim Chen

In this patch, the infrastructure needed in support of multibuffer
encryption implementation is added:

a) Enhace mcryptd daemon to support blkcipher requests.

b) Update configuration to include multi-buffer encryption build support.

c) Add support to crypto scatterwalk that can sleep during encryption
operation, as we may have buffers for jobs in data lanes that are
half-finished, waiting for additional jobs to come to fill empty lanes
before we start the encryption again.  Therefore, we need to enhance
crypto walk with the option to map data buffers non-atomically.  This is
done by algorithms run from crypto daemon who knows it is safe to do so
as it can save and restore FPU state in correct context.

For an introduction to the multi-buffer implementation, please see
http://www.intel.com/content/www/us/en/communications/communications-ia-multi-buffer-paper.html

Originally-by: Chandramouli Narayanan 
Signed-off-by: Tim Chen 
---
 crypto/Kconfig   |  16 +++
 crypto/blkcipher.c   |  29 -
 crypto/mcryptd.c | 256 ++-
 crypto/scatterwalk.c |   7 ++
 include/crypto/algapi.h  |   1 +
 include/crypto/mcryptd.h |  36 ++
 include/crypto/scatterwalk.h |   6 +
 include/linux/crypto.h   |   1 +
 8 files changed, 347 insertions(+), 5 deletions(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 7240821..6b51084 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -888,6 +888,22 @@ config CRYPTO_AES_NI_INTEL
  ECB, CBC, LRW, PCBC, XTS. The 64 bit version has additional
  acceleration for CTR.
 
+config CRYPTO_AES_CBC_MB
+   tristate "AES CBC algorithm (x86_64 Multi-Buffer, Experimental)"
+   depends on X86 && 64BIT
+   select CRYPTO_ABLK_HELPER
+   select CRYPTO_MCRYPTD
+   help
+ AES CBC encryption implemented using multi-buffer technique.
+ This algorithm computes on multiple data lanes concurrently with
+ SIMD instructions for better throughput.  It should only be
+ used when there is significant work to generate many separate
+ crypto requests that keep all the data lanes filled to get
+ the performance benefit.  If the data lanes are unfilled, a
+ flush operation will be initiated after some delay to process
+ the exisiting crypto jobs, adding some extra latency at low
+ load case.
+
 config CRYPTO_AES_SPARC64
tristate "AES cipher algorithms (SPARC64)"
depends on SPARC64
diff --git a/crypto/blkcipher.c b/crypto/blkcipher.c
index 11b9814..9fd4028 100644
--- a/crypto/blkcipher.c
+++ b/crypto/blkcipher.c
@@ -35,6 +35,9 @@ enum {
BLKCIPHER_WALK_SLOW = 1 << 1,
BLKCIPHER_WALK_COPY = 1 << 2,
BLKCIPHER_WALK_DIFF = 1 << 3,
+   /* deal with scenarios where we can sleep during sg walk */
+   /* when we process part of a request */
+   BLKCIPHER_WALK_MAY_SLEEP = 1 << 4,
 };
 
 static int blkcipher_walk_next(struct blkcipher_desc *desc,
@@ -44,22 +47,38 @@ static int blkcipher_walk_first(struct blkcipher_desc *desc,
 
 static inline void blkcipher_map_src(struct blkcipher_walk *walk)
 {
-   walk->src.virt.addr = scatterwalk_map(&walk->in);
+   /* add support for asynchronous requests which need no atomic map */
+   if (walk->flags & BLKCIPHER_WALK_MAY_SLEEP)
+   walk->src.virt.addr = scatterwalk_map_nonatomic(&walk->in);
+   else
+   walk->src.virt.addr = scatterwalk_map(&walk->in);
 }
 
 static inline void blkcipher_map_dst(struct blkcipher_walk *walk)
 {
-   walk->dst.virt.addr = scatterwalk_map(&walk->out);
+   /* add support for asynchronous requests which need no atomic map */
+   if (walk->flags & BLKCIPHER_WALK_MAY_SLEEP)
+   walk->dst.virt.addr = scatterwalk_map_nonatomic(&walk->out);
+   else
+   walk->dst.virt.addr = scatterwalk_map(&walk->out);
 }
 
 static inline void blkcipher_unmap_src(struct blkcipher_walk *walk)
 {
-   scatterwalk_unmap(walk->src.virt.addr);
+   /* add support for asynchronous requests which need no atomic map */
+   if (walk->flags & BLKCIPHER_WALK_MAY_SLEEP)
+   scatterwalk_unmap_nonatomic(walk->src.virt.addr);
+   else
+   scatterwalk_unmap(walk->src.virt.addr);
 }
 
 static inline void blkcipher_unmap_dst(struct blkcipher_walk *walk)
 {
-   scatterwalk_unmap(walk->dst.virt.addr);
+   /* add support for asynchronous requests which need no atomic map */
+   if (walk->flags & BLKCIPHER_WALK_MAY_SLEEP)
+   scatterwalk_unmap_nonatomic(walk->dst.virt.addr);
+   else
+   scatterwalk_unmap(walk->dst.virt.addr);
 }
 
 /* Get a spot of the specified length that does not straddle a page.
@@ -299,6 +318,8 @@ static inline int blkcipher_copy_iv(struct blkcipher_walk 
*walk)
 int blkcipher_walk_virt(struct blkcipher_desc *desc,
struct blkcipher_walk *walk)
 {
+   

[PATCH 2/5] crypto: AES CBC multi-buffer data structures

2015-10-28 Thread Tim Chen

This patch introduces the data structures and prototypes of functions
needed for doing AES CBC encryption using multi-buffer. Included are
the structures of the multi-buffer AES CBC job, job scheduler in C and
data structure defines in x86 assembly code.

Originally-by: Chandramouli Narayanan 
Signed-off-by: Tim Chen 
---
 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h|  96 +
 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_mgr.h| 131 
 arch/x86/crypto/aes-cbc-mb/mb_mgr_datastruct.S | 270 +
 arch/x86/crypto/aes-cbc-mb/reg_sizes.S | 125 
 4 files changed, 622 insertions(+)
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_mgr.h
 create mode 100644 arch/x86/crypto/aes-cbc-mb/mb_mgr_datastruct.S
 create mode 100644 arch/x86/crypto/aes-cbc-mb/reg_sizes.S

diff --git a/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h 
b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h
new file mode 100644
index 000..5493f83
--- /dev/null
+++ b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_ctx.h
@@ -0,0 +1,96 @@
+/*
+ * Header file for multi buffer AES CBC algorithm manager
+ * that deals with 8 buffers at a time
+ *
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * Contact Information:
+ * James Guilford 
+ * Sean Gulley 
+ * Tim Chen 
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#ifndef __AES_CBC_MB_CTX_H
+#define __AES_CBC_MB_CTX_H
+
+
+#include 
+
+#include "aes_cbc_mb_mgr.h"
+
+#define CBC_ENCRYPT0x01
+#define CBC_DECRYPT0x02
+#define CBC_START  0x04
+#define CBC_DONE   0x08
+
+#define CBC_CTX_STS_IDLE   0x00
+#define CBC_CTX_STS_PROCESSING 0x01
+#define CBC_CTX_STS_LAST   0x02
+#define CBC_CTX_STS_COMPLETE   0x04
+
+enum cbc_ctx_error {
+   CBC_CTX_ERROR_NONE   =  0,
+   CBC_CTX_ERROR_INVALID_FLAGS  = -1,
+   CBC_CTX_ERROR_ALREADY_PROCESSING = -2,
+   CBC_CTX_ERROR_ALREADY_COMPLETED  = -3,
+};
+
+#define cbc_ctx_init(ctx, nbytes, op) \
+   do { \
+   (ctx)->flag = (op) | CBC_START; \
+   (ctx)->nbytes = nbytes; \
+   } while (0)
+
+/* AESNI routines to perform cbc decrypt and key expansion */
+
+asmlinkage void aesni_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
+ const u8 *in, unsigned int len, u8 *iv);
+asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+unsigned int key_len);
+
+#endif /* __AES_CBC_MB_CTX_H */
diff --git a/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_mgr.h 
b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_mgr.h
new file mode 100644
index 000..0def82e
--- /dev/null
+++ b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb_mgr.h
@@ -0,0 +1,131 @@
+/*
+ * Header file for multi buffer AES CBC algorithm manager
+ *

[PATCH 5/5] crypto: AES CBC multi-buffer glue code

2015-10-28 Thread Tim Chen

This patch introduces the multi-buffer job manager which is responsible
for submitting scatter-gather buffers from several AES CBC jobs
to the multi-buffer algorithm. The glue code interfaces with the
underlying algorithm that handles 8 data streams of AES CBC encryption
in parallel. AES key expansion and CBC decryption requests are performed
in a manner similar to the existing AESNI Intel glue driver.

The outline of the algorithm for AES CBC encryption requests is
sketched below:

Any driver requesting the crypto service will place an async crypto
request on the workqueue.  The multi-buffer crypto daemon will pull an
AES CBC encryption request from work queue and put each request in an
empty data lane for multi-buffer crypto computation.  When all the empty
lanes are filled, computation will commence on the jobs in parallel and
the job with the shortest remaining buffer will get completed and be
returned. To prevent prolonged stall, when no new jobs arrive, we will
flush workqueue of jobs after a maximum allowable delay has elapsed.

To accommodate the fragmented nature of scatter-gather, we will keep
submitting the next scatter-buffer fragment for a job for multi-buffer
computation until a job is completed and no more buffer fragments remain.
At that time we will pull a new job to fill the now empty data slot.
We check with the multibuffer scheduler to see if there are other
completed jobs to prevent extraneous delay in returning any completed
jobs.

This multi-buffer algorithm should be used for cases where we get at
least 8 streams of crypto jobs submitted at a reasonably high rate.
For low crypto job submission rate and low number of data streams, this
algorithm will not be beneficial. The reason is at low rate, we do not
fill out the data lanes before flushing the jobs instead of processing
them with all the data lanes full.  We will miss the benefit of parallel
computation, and adding delay to the processing of the crypto job at the
same time.  Some tuning of the maximum latency parameter may be needed
to get the best performance.

Originally-by: Chandramouli Narayanan 
Signed-off-by: Tim Chen 
---
 arch/x86/crypto/Makefile|   1 +
 arch/x86/crypto/aes-cbc-mb/Makefile |  22 +
 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c | 815 
 3 files changed, 838 insertions(+)
 create mode 100644 arch/x86/crypto/aes-cbc-mb/Makefile
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index b9b912a..000db49 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -33,6 +33,7 @@ obj-$(CONFIG_CRYPTO_CRC32_PCLMUL) += crc32-pclmul.o
 obj-$(CONFIG_CRYPTO_SHA256_SSSE3) += sha256-ssse3.o
 obj-$(CONFIG_CRYPTO_SHA512_SSSE3) += sha512-ssse3.o
 obj-$(CONFIG_CRYPTO_CRCT10DIF_PCLMUL) += crct10dif-pclmul.o
+obj-$(CONFIG_CRYPTO_AES_CBC_MB) += aes-cbc-mb/
 obj-$(CONFIG_CRYPTO_POLY1305_X86_64) += poly1305-x86_64.o
 
 # These modules require assembler to support AVX.
diff --git a/arch/x86/crypto/aes-cbc-mb/Makefile 
b/arch/x86/crypto/aes-cbc-mb/Makefile
new file mode 100644
index 000..b642bd8
--- /dev/null
+++ b/arch/x86/crypto/aes-cbc-mb/Makefile
@@ -0,0 +1,22 @@
+#
+# Arch-specific CryptoAPI modules.
+#
+
+avx_supported := $(call as-instr,vpxor %xmm0$(comma)%xmm0$(comma)%xmm0,yes,no)
+
+# we need decryption and key expansion routine symbols
+# if either AESNI_NI_INTEL or AES_CBC_MB is a module
+
+ifeq ($(CONFIG_CRYPTO_AES_NI_INTEL),m)
+   dec_support := ../aesni-intel_asm.o
+endif
+ifeq ($(CONFIG_CRYPTO_AES_CBC_MB),m)
+   dec_support := ../aesni-intel_asm.o
+endif
+
+ifeq ($(avx_supported),yes)
+   obj-$(CONFIG_CRYPTO_AES_CBC_MB) += aes-cbc-mb.o
+   aes-cbc-mb-y := $(dec_support) aes_cbc_mb.o aes_mb_mgr_init.o \
+   mb_mgr_inorder_x8_asm.o mb_mgr_ooo_x8_asm.o \
+   aes_cbc_enc_x8.o
+endif
diff --git a/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c 
b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c
new file mode 100644
index 000..037d4e8
--- /dev/null
+++ b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c
@@ -0,0 +1,815 @@
+/*
+ * Multi buffer AES CBC algorithm glue code
+ *
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * Contact Information:
+ * James Guilford 
+ * Sean Gulley 
+ * Tim Chen 
+ */
+
+#define pr_fmt(fmt)KBUILD_MODNAME "

[PATCH 4/5] crypto: AES CBC by8 encryption

2015-10-28 Thread Tim Chen

This patch introduces the assembly routine to do a by8 AES CBC encryption
in support of the AES CBC multi-buffer implementation.

Encryption of 8 data streams of a key size are done simultaneously.

Originally-by: Chandramouli Narayanan 
Signed-off-by: Tim Chen 
---
 arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S | 774 
 1 file changed, 774 insertions(+)
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S

diff --git a/arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S 
b/arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S
new file mode 100644
index 000..eaffc28
--- /dev/null
+++ b/arch/x86/crypto/aes-cbc-mb/aes_cbc_enc_x8.S
@@ -0,0 +1,774 @@
+/*
+ * AES CBC by8 multibuffer optimization (x86_64)
+ * This file implements 128/192/256 bit AES CBC encryption
+ *
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * Contact Information:
+ * James Guilford 
+ * Sean Gulley 
+ * Tim Chen 
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#include 
+
+/* stack size needs to be an odd multiple of 8 for alignment */
+
+#define AES_KEYSIZE_12816
+#define AES_KEYSIZE_19224
+#define AES_KEYSIZE_25632
+
+#define XMM_SAVE_SIZE  16*10
+#define GPR_SAVE_SIZE  8*9
+#define STACK_SIZE (XMM_SAVE_SIZE + GPR_SAVE_SIZE)
+
+#define GPR_SAVE_REG   %rsp
+#define GPR_SAVE_AREA  %rsp + XMM_SAVE_SIZE
+#define LEN_AREA_OFFSETXMM_SAVE_SIZE + 8*8
+#define LEN_AREA_REG   %rsp
+#define LEN_AREA   %rsp + XMM_SAVE_SIZE + 8*8
+
+#define IN_OFFSET  0
+#define OUT_OFFSET 8*8
+#define KEYS_OFFSET16*8
+#define IV_OFFSET  24*8
+
+
+#define IDX%rax
+#define TMP%rbx
+#define ARG%rdi
+#define LEN%rsi
+
+#define KEYS0  %r14
+#define KEYS1  %r15
+#define KEYS2  %rbp
+#define KEYS3  %rdx
+#define KEYS4  %rcx
+#define KEYS5  %r8
+#define KEYS6  %r9
+#define KEYS7  %r10
+
+#define IN0%r11
+#define IN2%r12
+#define IN4%r13
+#define IN6LEN
+
+#define XDATA0 %xmm0
+#define XDATA1 %xmm1
+#define XDATA2 %xmm2
+#define XDATA3 %xmm3
+#define XDATA4 %xmm4
+#define XDATA5 %xmm5
+#define XDATA6 %xmm6
+#define XDATA7 %xmm7
+
+#define XKEY0_3%xmm8
+#define XKEY1_4%xmm9
+#define XKEY2_5%xmm10
+#define XKEY3_6%xmm11
+#define XKEY4_7%xmm12
+#define XKEY5_8%xmm13
+#define XKEY6_9%xmm14
+#define XTMP   %xmm15
+
+#defineMOVDQ movdqu /* assume buffers not aligned */
+#define CONCAT(a, b)   a##b
+#define INPUT_REG_SUFX 1   /* IN */
+#define XDATA_REG_SUFX 2   /* XDAT */
+#define KEY_REG_SUFX   3   /* KEY */
+#define XMM_REG_SUFX   4   /* XMM */
+
+/*
+ * To avoid positional parameter errors while compiling
+ * three registers need to be passed
+ */
+.text
+
+.macro pxor2 x, y, z
+   MOVDQ   (\x,\y), XTMP
+   pxorXTMP, \z
+.endm
+
+.macro inreg n

[PATCH 3/5] crypto: AES CBC multi-buffer scheduler

2015-10-28 Thread Tim Chen

This patch implements in-order scheduler for encrypting multiple buffers
in parallel supporting AES CBC encryption with key sizes of
128, 192 and 256 bits. It uses 8 data lanes by taking advantage of the
SIMD instructions with AVX2 registers.

The multibuffer manager and scheduler is mostly written in assembly and
the initialization support is written C. The AES CBC multibuffer crypto
driver support interfaces with the multibuffer manager and scheduler
to support AES CBC encryption in parallel. The scheduler supports
job submissions, job flushing and and job retrievals after completion.

The basic flow of usage of the CBC multibuffer scheduler is as follows:

- The caller allocates an aes_cbc_mb_mgr_inorder_x8 object
and initializes it once by calling aes_cbc_init_mb_mgr_inorder_x8().

- The aes_cbc_mb_mgr_inorder_x8 structure has an array of JOB_AES
objects. Allocation and scheduling of JOB_AES objects are managed
by the multibuffer scheduler support routines. The caller allocates
a JOB_AES using aes_cbc_get_next_job_inorder_x8().

- The returned JOB_AES must be filled in with parameters for CBC
encryption (eg: plaintext buffer, ciphertext buffer, key, iv, etc) and
submitted to the manager object using aes_cbc_submit_job_inorder_xx().

- If the oldest JOB_AES is completed during a call to
aes_cbc_submit_job_inorder_x8(), it is returned. Otherwise,
NULL is returned.

- A call to aes_cbc_flush_job_inorder_x8() always returns the
oldest job, unless the multibuffer manager is empty of jobs.

- A call to aes_cbc_get_completed_job_inorder_x8() returns
a completed job. This routine is useful to process completed
jobs instead of waiting for the flusher to engage.

- When a job is returned from submit or flush, the caller extracts
the useful data and returns it to the multibuffer manager implicitly
by the next call to aes_cbc_get_next_job_xx().

Jobs are always returned from submit or flush routines in the order they
were submitted (hence "inorder").A job allocated using
aes_cbc_get_next_job_inorder_x8() must be filled in and submitted before
another call. A job returned by aes_cbc_submit_job_inorder_x8() or
aes_cbc_flush_job_inorder_x8() is 'deallocated' upon the next call to
get a job structure. Calls to get_next_job() cannot fail. If all jobs are
allocated after a call to get_next_job(), the subsequent call to submit
always returns the oldest job in a completed state.

Originally-by: Chandramouli Narayanan 
Signed-off-by: Tim Chen 
---
 arch/x86/crypto/aes-cbc-mb/aes_mb_mgr_init.c   | 145 +++
 arch/x86/crypto/aes-cbc-mb/mb_mgr_inorder_x8_asm.S | 222 +++
 arch/x86/crypto/aes-cbc-mb/mb_mgr_ooo_x8_asm.S | 416 +
 3 files changed, 783 insertions(+)
 create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_mb_mgr_init.c
 create mode 100644 arch/x86/crypto/aes-cbc-mb/mb_mgr_inorder_x8_asm.S
 create mode 100644 arch/x86/crypto/aes-cbc-mb/mb_mgr_ooo_x8_asm.S

diff --git a/arch/x86/crypto/aes-cbc-mb/aes_mb_mgr_init.c 
b/arch/x86/crypto/aes-cbc-mb/aes_mb_mgr_init.c
new file mode 100644
index 000..7a7f8a1
--- /dev/null
+++ b/arch/x86/crypto/aes-cbc-mb/aes_mb_mgr_init.c
@@ -0,0 +1,145 @@
+/*
+ * Initialization code for multi buffer AES CBC algorithm
+ *
+ *
+ * This file is provided under a dual BSD/GPLv2 license.  When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * Contact Information:
+ * James Guilford 
+ * Sean Gulley 
+ * Tim Chen 
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOS

Re: [PATCH] crypto: add asynchronous compression support

2015-10-28 Thread Dan Streetman
On Wed, Oct 21, 2015 at 3:59 AM, Li, Weigang  wrote:
>> -Original Message-
>> From: Herbert Xu [mailto:herb...@gondor.apana.org.au]
>> Sent: Wednesday, October 21, 2015 3:34 PM
>> To: Sergey Senozhatsky
>> Cc: Minchan Kim; Joonsoo Kim; Dan Streetman; Seth Jennings; Li, Weigang;
>> Struk, Tadeusz
>> Subject: Re: [PATCH] crypto: add asynchronous compression support
>>
>> On Wed, Oct 21, 2015 at 04:33:22PM +0900, Sergey Senozhatsky wrote:
>> >
>> > the thing is -- I still want to have/use SW compressors; and they [the
>> > existing S/W algorithms] want virtual addresses. so we need to
>> > kmap_atomic pages in every SW algorithm? isn't it simpler to keep
>> > kmap_atomic_to_page around?
>>
>> The Crypto API will do it for you.  Have a look at how we handle it for aead
>> and ahash for example.
>>
>> Cheers,
>> --
>> Email: Herbert Xu  Home Page:
>> http://gondor.apana.org.au/~herbert/
>> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>
> Adding the "linux-crypto" list back to the latest discussion.

re: zswap, I suspect it won't be able to use async compression,
because when its store function returns the page must be available for
reallocation.  Of course, async compression can be useful elsewhere.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] crypto: AES CBC multi-buffer glue code

2015-10-28 Thread Stephan Mueller
Am Mittwoch, 28. Oktober 2015, 14:19:29 schrieb Tim Chen:

Hi Tim,

>+
>+  /* check for dependent cpu features */
>+  if (!cpu_has_aes) {
>+  pr_err("aes_cbc_mb_mod_init: no aes support\n");
>+  err = -ENODEV;
>+  goto err1;
>+  }

In your post 0/5, you say that this mechanism needs AVX2. In the existing 
AESNI glue code I find

#ifdef CONFIG_X86_64
#ifdef CONFIG_AS_AVX2
if (boot_cpu_has(X86_FEATURE_AVX2)) {

...

Why would that CPU check not be needed here?

Ciao
Stephan
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] ARM: dts: Enable Exynos RNG module

2015-10-28 Thread Krzysztof Kozlowski
On 25.10.2015 08:58, Tobias Jakobi wrote:
> Hello Krzysztof,
> 
> 
> Krzysztof Kozlowski wrote:
>> On 20.10.2015 01:11, Tobias Jakobi wrote:
>>> Hello Krzysztof,
>>>
>>> I can confirm that this also works on a Odroid-X2, so I guess it's safe
>>> to enable the PRNG for all Exynos4412-based Odroid devices.
>>
>> Sure, I can send a patch for that. I can test it later also on Odroid-U3.
> Thanks already!
> 
> 
> 
>>> Any chance that you might also take a look at the other hwcrypto stuff
>>> on the SoC ('samsung,exynos4210-secss' compatible)?
>>
>> What do you mean? The s5p-sss driver already supports Device Tree.
> The driver supports DT, but it doesn't really work.
> 
> I'm using the following DT entry to let the driver probe correctly:
> https://github.com/tobiasjakobi/linux-odroid/commit/82c00cddb5cbf89fad994784c28c8125beae8e13
> 
> But the crypto self-test fails on boot:
> alg: skcipher: encryption failed on test 1 for ecb-aes-s5p: ret=22
> 
> 
> Another problems is that SSS and PRNG can't be used at the same time,
> since they both use common hardware resources (I think it was IO).
> 

Thanks for explaining this. I added the issue to the long TODO list but
I don't know when I will be able to dig into this.

If anyone wants to look into this, please go ahead...

Best regards,
Krzysztof

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html