Re: [PATCH 1/2] crypto: caam: Delete an error message for a failed memory allocation in seven functions

2018-02-14 Thread Horia Geantă
On 2/14/2018 8:31 PM, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Wed, 14 Feb 2018 18:22:38 +0100
> 
> Omit an extra message for a memory allocation failure in these functions.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 
Reviewed-by: Horia Geantă 

though please consider the following

> @@ -689,10 +687,8 @@ static struct ahash_edesc *ahash_edesc_alloc(struct 
> caam_hash_ctx *ctx,
>   unsigned int sg_size = sg_num * sizeof(struct sec4_sg_entry);
>  
>   edesc = kzalloc(sizeof(*edesc) + sg_size, GFP_DMA | flags);
> - if (!edesc) {
> - dev_err(ctx->jrdev, "could not allocate extended descriptor\n");
> + if (!edesc)
>   return NULL;
> - }
>  
With this change, the ctx parameter is no longer used in ahash_edesc_alloc().
Either here or in a different patch the function should be updated.
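
For illustration, a minimal sketch of that follow-up (assuming only the unused
ctx argument is dropped; the remaining parameters and the callers are a guess,
not the actual driver code):

static struct ahash_edesc *ahash_edesc_alloc(int sg_num, gfp_t flags)
{
	unsigned int sg_size = sg_num * sizeof(struct sec4_sg_entry);
	struct ahash_edesc *edesc;

	edesc = kzalloc(sizeof(*edesc) + sg_size, GFP_DMA | flags);
	if (!edesc)
		return NULL;

	return edesc;
}

/* callers would then become e.g.: edesc = ahash_edesc_alloc(sg_num, flags); */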

Thanks,
Horia


Re: [PATCH v3 1/4] crypto: AF_ALG AIO - lock context IV

2018-02-14 Thread Stephan Mueller
On Thursday, 15 February 2018, 08:03:20 CET, Harsh Jain wrote:

Hi Harsh,

> Even with serialization guaranteed, we will in the end get a wrong result,
> as mentioned above, which the destination side cannot decrypt. What I feel
> is that the scenario of sending 2 or more IOCBs for AEAD is itself wrong.

Without the inline IV handling, I would concur.

> We
> should not allow this type of request for AEAD.

"Not allow" as in "technically block"? As a user would only shoot itself when 
he does that not knowing the consequences, I am not in favor of such an 
artificial block.

> Can you think of any use
> case it is going to solve?

Well, I could fathom a use case for this. In FIPS 140-2 (yes, a term not well 
received by some here), NIST insists for GCM that the IV is handled by the 
cryptographic implementation.

So, when using GCM for TLS, for example, the GCM implementation would know a 
bit about how the IV is updated over the course of a session. I.e. after one 
AEAD operation completes, the IV is written back, but modified so as to comply 
with the rules of some higher-level protocol. Thus, if such a scenario is 
implemented by a driver here, multiple IOCBs could be used with such a 
"TLSified" GCM, for example.

And such "TLSification" could be as simple as implementing an IV generator 
that can be used with every (AEAD) cipher implementation.
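
As a purely illustrative sketch (not from any posted patch), the per-request IV
update such a generator would perform for TLS 1.2 AES-GCM could look like the
following, where the 4-byte salt comes from the key material and the 8-byte
explicit part is the record sequence number:

#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the 12-byte GCM nonce for the next record. */
static void tls12_gcm_next_iv(uint8_t iv[12], const uint8_t salt[4],
			      uint64_t *record_seq)
{
	uint64_t seq = (*record_seq)++;
	int i;

	memcpy(iv, salt, 4);              /* implicit part, fixed per session */
	for (i = 0; i < 8; i++)           /* explicit part, big-endian seq    */
		iv[4 + i] = (uint8_t)(seq >> (56 - 8 * i));
}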

> Can the receiver decrypt (with 2 IOCBs) the same request successfully without
> knowing the sender has done the operation in 2 requests of size "x" each?
> > Ciao
> > Stephan



Ciao
Stephan




Re: [PATCH v3 1/4] crypto: AF_ALG AIO - lock context IV

2018-02-14 Thread Harsh Jain


On 15-02-2018 11:58, Stephan Mueller wrote:
> On Thursday, 15 February 2018, 06:30:36 CET, Harsh Jain wrote:
>
> Hi Harsh,
>
>> On 14-02-2018 18:22, Stephan Mueller wrote:
>>> On Wednesday, 14 February 2018, 06:43:53 CET, Harsh Jain wrote:
>>>
>>> Hi Harsh,
>>>
 Patch set is working fine with chelsio Driver.
>>> Thank you.
>>>
 Do we really need the IV locking mechanism for AEAD algos? AEAD algos
 don't support partial mode operation, and drivers (at least
 Chelsio) are not updating IVs on AEAD request completions.
>>> Yes, I think we would need it. It is technically possible to have multiple
>>> IOCBs for AEAD ciphers. Even though your implementation may not write the
>>> IV back, others may do that. At least I do not see a guarantee that the
>>> IV is *not* written back by a driver.
>> There is no use in writing the IV back for AEAD algos till the framework
>> starts supporting partial mode.
> I agree.
>
>> Even if a driver starts updating the IV for AEAD,
>> multiple IOCBs will in both cases yield wrong results.
> This would only be the case if the driver would not implicitly or explicitly 
> serialize the requests.
>> Case 1: If we have AEAD IV serialization applied, encryption will be
>> wrong if the same IV gets used.
> Agreed.
>
>> Case 2: If we do not have IV serialization for
>> AEAD, encryption will be fine, but the user will get multiple authentication
>> tags (each with the final block processed). It is as if the 2nd block's
>> encryption is based on the IV received from the 1st block, while the
>> authentication tag value is based on the 2nd block's content only.
> Agreed.
>
> But are we sure that all drivers behave correctly? Before you notified us of
> the issue, I was not even aware of the fact that this serialization may not
> be done in the driver. And we have only seen that issue with AF_ALG, where we
> test for multiple concurrent AIO operations.
I am sure other hardware will have a similar problem; it's just that we tested
it first.

>
> Besides, when we do not have the locking for AEAD, what would we gain: one
> less lock to take vs. the guarantee that the AEAD operation is always
> properly serialized.
Even with serialization guaranteed, we will in the end get a wrong result, as
mentioned above, which the destination side cannot decrypt.
What I feel is that the scenario of sending 2 or more IOCBs for AEAD is itself
wrong.  We should not allow this type of request for AEAD.
Can you think of any use case it is going to solve?
Can the receiver decrypt (with 2 IOCBs) the same request successfully without
knowing the sender has done the operation in 2 requests of size "x" each?
>
> Ciao
> Stephan
>
>



[Crypto v5 05/12] cxgb4: Inline TLS FW Interface

2018-02-14 Thread Atul Gupta
Key area size in hw-config file. CPL struct for TLS request
and response. Work request for Inline TLS.

Signed-off-by: Atul Gupta 
---
 drivers/net/ethernet/chelsio/cxgb4/t4_msg.h   | 121 ++-
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h  |   2 +
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h | 165 +-
 3 files changed, 283 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h 
b/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
index 7e12f24..9a56e0d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
@@ -81,6 +81,7 @@ enum {
CPL_RX_ISCSI_CMP  = 0x45,
CPL_TRACE_PKT_T5  = 0x48,
CPL_RX_ISCSI_DDP  = 0x49,
+   CPL_RX_TLS_CMP= 0x4E,
 
CPL_RDMA_READ_REQ = 0x60,
 
@@ -88,6 +89,7 @@ enum {
CPL_ACT_OPEN_REQ6 = 0x83,
 
CPL_TX_TLS_PDU =0x88,
+   CPL_TX_TLS_SFO= 0x89,
CPL_TX_SEC_PDU= 0x8A,
CPL_TX_TLS_ACK= 0x8B,
 
@@ -97,6 +99,7 @@ enum {
CPL_RX_MPS_PKT= 0xAF,
 
CPL_TRACE_PKT = 0xB0,
+   CPL_TLS_DATA  = 0xB1,
CPL_ISCSI_DATA= 0xB2,
 
CPL_FW4_MSG   = 0xC0,
@@ -150,6 +153,7 @@ enum {
ULP_MODE_RDMA  = 4,
ULP_MODE_TCPDDP= 5,
ULP_MODE_FCOE  = 6,
+   ULP_MODE_TLS   = 8,
 };
 
 enum {
@@ -1414,6 +1418,14 @@ struct cpl_tx_data {
 #define TX_FORCE_S 13
 #define TX_FORCE_V(x)  ((x) << TX_FORCE_S)
 
+#define TX_SHOVE_S 14
+#define TX_SHOVE_V(x) ((x) << TX_SHOVE_S)
+
+#define TX_ULP_MODE_S 10
+#define TX_ULP_MODE_M 0x7
+#define TX_ULP_MODE_V(x) ((x) << TX_ULP_MODE_S)
+#define TX_ULP_MODE_G(x) (((x) >> TX_ULP_MODE_S) & TX_ULP_MODE_M)
+
 #define T6_TX_FORCE_S  20
 #define T6_TX_FORCE_V(x)   ((x) << T6_TX_FORCE_S)
 #define T6_TX_FORCE_F  T6_TX_FORCE_V(1U)
@@ -1428,12 +1440,21 @@ enum {
ULP_TX_SC_NOOP = 0x80,
ULP_TX_SC_IMM  = 0x81,
ULP_TX_SC_DSGL = 0x82,
-   ULP_TX_SC_ISGL = 0x83
+   ULP_TX_SC_ISGL = 0x83,
+   ULP_TX_SC_MEMRD = 0x86
 };
 
 #define ULPTX_CMD_S 24
 #define ULPTX_CMD_V(x) ((x) << ULPTX_CMD_S)
 
+#define ULPTX_LEN16_S 0
+#define ULPTX_LEN16_M 0xFF
+#define ULPTX_LEN16_V(x) ((x) << ULPTX_LEN16_S)
+
+#define ULP_TX_SC_MORE_S 23
+#define ULP_TX_SC_MORE_V(x) ((x) << ULP_TX_SC_MORE_S)
+#define ULP_TX_SC_MORE_F  ULP_TX_SC_MORE_V(1U)
+
 struct ulptx_sge_pair {
__be32 len[2];
__be64 addr[2];
@@ -1948,4 +1969,102 @@ enum {
X_CPL_RX_MPS_PKT_TYPE_QFC   = 1 << 2,
X_CPL_RX_MPS_PKT_TYPE_PTP   = 1 << 3
 };
+
+struct cpl_tx_tls_sfo {
+   __be32 op_to_seg_len;
+   __be32 pld_len;
+   __be32 type_protover;
+   __be32 r1_lo;
+   __be32 seqno_numivs;
+   __be32 ivgen_hdrlen;
+   __be64 scmd1;
+};
+
+/* cpl_tx_tls_sfo macros */
+#define CPL_TX_TLS_SFO_OPCODE_S 24
+#define CPL_TX_TLS_SFO_OPCODE_V(x)  ((x) << CPL_TX_TLS_SFO_OPCODE_S)
+
+#define CPL_TX_TLS_SFO_DATA_TYPE_S  20
+#define CPL_TX_TLS_SFO_DATA_TYPE_V(x)   ((x) << CPL_TX_TLS_SFO_DATA_TYPE_S)
+
+#define CPL_TX_TLS_SFO_CPL_LEN_S 16
+#define CPL_TX_TLS_SFO_CPL_LEN_V(x) ((x) << CPL_TX_TLS_SFO_CPL_LEN_S)
+
+#define CPL_TX_TLS_SFO_SEG_LEN_S 0
+#define CPL_TX_TLS_SFO_SEG_LEN_M 0x
+#define CPL_TX_TLS_SFO_SEG_LEN_V(x) ((x) << CPL_TX_TLS_SFO_SEG_LEN_S)
+#define CPL_TX_TLS_SFO_SEG_LEN_G(x) \
+   (((x) >> CPL_TX_TLS_SFO_SEG_LEN_S) & CPL_TX_TLS_SFO_SEG_LEN_M)
+
+#define CPL_TX_TLS_SFO_TYPE_S   24
+#define CPL_TX_TLS_SFO_TYPE_M   0xff
+#define CPL_TX_TLS_SFO_TYPE_V(x)((x) << CPL_TX_TLS_SFO_TYPE_S)
+#define CPL_TX_TLS_SFO_TYPE_G(x)\
+   (((x) >> CPL_TX_TLS_SFO_TYPE_S) & CPL_TX_TLS_SFO_TYPE_M)
+
+#define CPL_TX_TLS_SFO_PROTOVER_S   8
+#define CPL_TX_TLS_SFO_PROTOVER_M   0x
+#define CPL_TX_TLS_SFO_PROTOVER_V(x)((x) << CPL_TX_TLS_SFO_PROTOVER_S)
+#define CPL_TX_TLS_SFO_PROTOVER_G(x)\
+   (((x) >> CPL_TX_TLS_SFO_PROTOVER_S) & CPL_TX_TLS_SFO_PROTOVER_M)
+
+struct cpl_tls_data {
+   struct rss_header rsshdr;
+   union opcode_tid ot;
+   __be32 length_pkd;
+   __be32 seq;
+   __be32 r1;
+};
+
+#define CPL_TLS_DATA_OPCODE_S   24
+#define CPL_TLS_DATA_OPCODE_M   0xff
+#define CPL_TLS_DATA_OPCODE_V(x)((x) << CPL_TLS_DATA_OPCODE_S)
+#define CPL_TLS_DATA_OPCODE_G(x)\
+   (((x) >> CPL_TLS_DATA_OPCODE_S) & CPL_TLS_DATA_OPCODE_M)
+
+#define CPL_TLS_DATA_TID_S  0
+#define CPL_TLS_DATA_TID_M  0xff
+#define CPL_TLS_DATA_TID_V(x)   ((x) << CPL_TLS_DATA_TID_S)
+#define CPL_TLS_DATA_TID_G(x)   \
+   (((x) >> CPL_TLS_DATA_TID_S) & CPL_TLS_DATA_TID_M)
+
+#define CPL_TLS_DATA_LENGTH_S   0
+#define CPL_TLS_DATA_LENGTH_M   0x
+#define CPL_TLS_DATA_LENGTH_V(x)((x) 

[Crypto v5 12/12] Makefile Kconfig

2018-02-14 Thread Atul Gupta
Entry for Inline TLS as another driver dependent on cxgb4 and chcr

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/Kconfig| 11 +++
 drivers/crypto/chelsio/Makefile   |  1 +
 drivers/crypto/chelsio/chtls/Makefile |  4 
 3 files changed, 16 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/Makefile

diff --git a/drivers/crypto/chelsio/Kconfig b/drivers/crypto/chelsio/Kconfig
index 5ae9f87..930d82d 100644
--- a/drivers/crypto/chelsio/Kconfig
+++ b/drivers/crypto/chelsio/Kconfig
@@ -29,3 +29,14 @@ config CHELSIO_IPSEC_INLINE
 default n
 ---help---
   Enable support for IPSec Tx Inline.
+
+config CRYPTO_DEV_CHELSIO_TLS
+tristate "Chelsio Crypto Inline TLS Driver"
+depends on CHELSIO_T4
+depends on TLS
+select CRYPTO_DEV_CHELSIO
+---help---
+  Support Chelsio Inline TLS with Chelsio crypto accelerator.
+
+  To compile this driver as a module, choose M here: the module
+  will be called chtls.
diff --git a/drivers/crypto/chelsio/Makefile b/drivers/crypto/chelsio/Makefile
index eaecaf1..639e571 100644
--- a/drivers/crypto/chelsio/Makefile
+++ b/drivers/crypto/chelsio/Makefile
@@ -3,3 +3,4 @@ ccflags-y := -Idrivers/net/ethernet/chelsio/cxgb4
 obj-$(CONFIG_CRYPTO_DEV_CHELSIO) += chcr.o
 chcr-objs :=  chcr_core.o chcr_algo.o
 chcr-$(CONFIG_CHELSIO_IPSEC_INLINE) += chcr_ipsec.o
+obj-$(CONFIG_CRYPTO_DEV_CHELSIO_TLS) += chtls/
diff --git a/drivers/crypto/chelsio/chtls/Makefile 
b/drivers/crypto/chelsio/chtls/Makefile
new file mode 100644
index 000..df13795
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/Makefile
@@ -0,0 +1,4 @@
+ccflags-y := -Idrivers/net/ethernet/chelsio/cxgb4 -Idrivers/crypto/chelsio/
+
+obj-$(CONFIG_CRYPTO_DEV_CHELSIO_TLS) += chtls.o
+chtls-objs := chtls_main.o chtls_cm.o chtls_io.o chtls_hw.o
-- 
1.8.3.1



[Crypto v5 08/12] chtls: Key program

2018-02-14 Thread Atul Gupta
Program the tx and rx key on chip.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls_hw.c | 394 
 1 file changed, 394 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_hw.c

diff --git a/drivers/crypto/chelsio/chtls/chtls_hw.c 
b/drivers/crypto/chelsio/chtls/chtls_hw.c
new file mode 100644
index 000..c3e17159
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/chtls_hw.c
@@ -0,0 +1,394 @@
+/*
+ * Copyright (c) 2017 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Written by: Atul Gupta (atul.gu...@chelsio.com)
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "chtls.h"
+#include "chtls_cm.h"
+
+static void __set_tcb_field_direct(struct chtls_sock *csk,
+  struct cpl_set_tcb_field *req, u16 word,
+  u64 mask, u64 val, u8 cookie, int no_reply)
+{
+   struct ulptx_idata *sc;
+
+   INIT_TP_WR_CPL(req, CPL_SET_TCB_FIELD, csk->tid);
+   req->wr.wr_mid |= htonl(FW_WR_FLOWID_V(csk->tid));
+   req->reply_ctrl = htons(NO_REPLY_V(no_reply) |
+   QUEUENO_V(csk->rss_qid));
+   req->word_cookie = htons(TCB_WORD_V(word) | TCB_COOKIE_V(cookie));
+   req->mask = cpu_to_be64(mask);
+   req->val = cpu_to_be64(val);
+   sc = (struct ulptx_idata *)(req + 1);
+   sc->cmd_more = htonl(ULPTX_CMD_V(ULP_TX_SC_NOOP));
+   sc->len = htonl(0);
+}
+
+static void __set_tcb_field(struct sock *sk, struct sk_buff *skb, u16 word,
+   u64 mask, u64 val, u8 cookie, int no_reply)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   struct cpl_set_tcb_field *req;
+   struct ulptx_idata *sc;
+   unsigned int wrlen = roundup(sizeof(*req) + sizeof(*sc), 16);
+
+   req = (struct cpl_set_tcb_field *)__skb_put(skb, wrlen);
+   __set_tcb_field_direct(csk, req, word, mask, val, cookie, no_reply);
+   set_wr_txq(skb, CPL_PRIORITY_CONTROL, csk->port_id);
+}
+
+static int chtls_set_tcb_field(struct sock *sk, u16 word, u64 mask, u64 val)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   struct sk_buff *skb;
+   struct cpl_set_tcb_field *req;
+   struct ulptx_idata *sc;
+   unsigned int wrlen = roundup(sizeof(*req) + sizeof(*sc), 16);
+   unsigned int credits_needed = DIV_ROUND_UP(wrlen, 16);
+
+   skb = alloc_skb(wrlen, GFP_ATOMIC);
+   if (!skb)
+   return -ENOMEM;
+
+   __set_tcb_field(sk, skb, word, mask, val, 0, 1);
+   set_queue(skb, (csk->txq_idx << 1) | CPL_PRIORITY_DATA, sk);
+   csk->wr_credits -= credits_needed;
+   csk->wr_unacked += credits_needed;
+   enqueue_wr(csk, skb);
+   cxgb4_ofld_send(csk->egress_dev, skb);
+   return 0;
+}
+
+/*
+ * Set one of the t_flags bits in the TCB.
+ */
+int chtls_set_tcb_tflag(struct sock *sk, unsigned int bit_pos, int val)
+{
+   return chtls_set_tcb_field(sk, 1, 1ULL << bit_pos,
+   val << bit_pos);
+}
+
+static int chtls_set_tcb_keyid(struct sock *sk, int keyid)
+{
+   return chtls_set_tcb_field(sk, 31, 0xULL, keyid);
+}
+
+static int chtls_set_tcb_seqno(struct sock *sk)
+{
+   return chtls_set_tcb_field(sk, 28, ~0ULL, 0);
+}
+
+static int chtls_set_tcb_quiesce(struct sock *sk, int val)
+{
+   return chtls_set_tcb_field(sk, 1, (1ULL << TF_RX_QUIESCE_S),
+  TF_RX_QUIESCE_V(val));
+}
+
+static void *chtls_alloc_mem(unsigned long size)
+{
+   void *p = kmalloc(size, GFP_KERNEL);
+
+   if (!p)
+   p = vmalloc(size);
+   if (p)
+   memset(p, 0, size);
+   return p;
+}
+
+static void chtls_free_mem(void *addr)
+{
+   unsigned long p = (unsigned long)addr;
+
+   if (p >= VMALLOC_START && p < VMALLOC_END)
+   vfree(addr);
+   else
+   kfree(addr);
+}
+
+/* TLS Key bitmap processing */
+int chtls_init_kmap(struct chtls_dev *cdev, struct cxgb4_lld_info *lldi)
+{
+   unsigned int num_key_ctx, bsize;
+
+   num_key_ctx = (lldi->vr->key.size / TLS_KEY_CONTEXT_SZ);
+   bsize = BITS_TO_LONGS(num_key_ctx);
+
+   cdev->kmap.size = num_key_ctx;
+   cdev->kmap.available = bsize;
+   cdev->kmap.addr = chtls_alloc_mem(sizeof(*cdev->kmap.addr) *
+ bsize);
+   if (!cdev->kmap.addr)
+   return -1;
+
+   cdev->kmap.start = lldi->vr->key.start;
+   spin_lock_init(&cdev->kmap.lock);
+   return 0;
+}
+
+void chtls_free_kmap(struct chtls_dev *cdev)
+{
+   if (cdev->kmap.addr)
+   chtls_free_mem(cdev->kmap.addr);
+}
+
+static int get_new_keyid(struct chtls
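
An aside on the chtls_alloc_mem()/chtls_free_mem() helpers above: they open-code
the kmalloc-with-vmalloc-fallback pattern that kvzalloc()/kvfree() already
provide, so the pair could shrink to the following sketch (assuming plain
GFP_KERNEL allocations, as in the current code):

static void *chtls_alloc_mem(unsigned long size)
{
	return kvzalloc(size, GFP_KERNEL);  /* kmalloc first, vmalloc fallback */
}

static void chtls_free_mem(void *addr)
{
	kvfree(addr);                       /* handles both allocation paths */
}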

[Crypto v5 07/12] chcr: Key Macro

2018-02-14 Thread Atul Gupta
Define macro for TLS Key context

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chcr_algo.h | 42 +
 drivers/crypto/chelsio/chcr_core.h | 55 +-
 2 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/chelsio/chcr_algo.h 
b/drivers/crypto/chelsio/chcr_algo.h
index d1673a5..f263cd4 100644
--- a/drivers/crypto/chelsio/chcr_algo.h
+++ b/drivers/crypto/chelsio/chcr_algo.h
@@ -86,6 +86,39 @@
 KEY_CONTEXT_OPAD_PRESENT_M)
 #define KEY_CONTEXT_OPAD_PRESENT_F  KEY_CONTEXT_OPAD_PRESENT_V(1U)
 
+#define TLS_KEYCTX_RXFLIT_CNT_S 24
+#define TLS_KEYCTX_RXFLIT_CNT_V(x) ((x) << TLS_KEYCTX_RXFLIT_CNT_S)
+
+#define TLS_KEYCTX_RXPROT_VER_S 20
+#define TLS_KEYCTX_RXPROT_VER_M 0xf
+#define TLS_KEYCTX_RXPROT_VER_V(x) ((x) << TLS_KEYCTX_RXPROT_VER_S)
+
+#define TLS_KEYCTX_RXCIPH_MODE_S 16
+#define TLS_KEYCTX_RXCIPH_MODE_M 0xf
+#define TLS_KEYCTX_RXCIPH_MODE_V(x) ((x) << TLS_KEYCTX_RXCIPH_MODE_S)
+
+#define TLS_KEYCTX_RXAUTH_MODE_S 12
+#define TLS_KEYCTX_RXAUTH_MODE_M 0xf
+#define TLS_KEYCTX_RXAUTH_MODE_V(x) ((x) << TLS_KEYCTX_RXAUTH_MODE_S)
+
+#define TLS_KEYCTX_RXCIAU_CTRL_S 11
+#define TLS_KEYCTX_RXCIAU_CTRL_V(x) ((x) << TLS_KEYCTX_RXCIAU_CTRL_S)
+
+#define TLS_KEYCTX_RX_SEQCTR_S 9
+#define TLS_KEYCTX_RX_SEQCTR_M 0x3
+#define TLS_KEYCTX_RX_SEQCTR_V(x) ((x) << TLS_KEYCTX_RX_SEQCTR_S)
+
+#define TLS_KEYCTX_RX_VALID_S 8
+#define TLS_KEYCTX_RX_VALID_V(x) ((x) << TLS_KEYCTX_RX_VALID_S)
+
+#define TLS_KEYCTX_RXCK_SIZE_S 3
+#define TLS_KEYCTX_RXCK_SIZE_M 0x7
+#define TLS_KEYCTX_RXCK_SIZE_V(x) ((x) << TLS_KEYCTX_RXCK_SIZE_S)
+
+#define TLS_KEYCTX_RXMK_SIZE_S 0
+#define TLS_KEYCTX_RXMK_SIZE_M 0x7
+#define TLS_KEYCTX_RXMK_SIZE_V(x) ((x) << TLS_KEYCTX_RXMK_SIZE_S)
+
 #define CHCR_HASH_MAX_DIGEST_SIZE 64
 #define CHCR_MAX_SHA_DIGEST_SIZE 64
 
@@ -176,6 +209,15 @@
  KEY_CONTEXT_SALT_PRESENT_V(1) | \
  KEY_CONTEXT_CTX_LEN_V((ctx_len)))
 
+#define  FILL_KEY_CRX_HDR(ck_size, mk_size, d_ck, opad, ctx_len) \
+   htonl(TLS_KEYCTX_RXMK_SIZE_V(mk_size) | \
+ TLS_KEYCTX_RXCK_SIZE_V(ck_size) | \
+ TLS_KEYCTX_RX_VALID_V(1) | \
+ TLS_KEYCTX_RX_SEQCTR_V(3) | \
+ TLS_KEYCTX_RXAUTH_MODE_V(4) | \
+ TLS_KEYCTX_RXCIPH_MODE_V(2) | \
+ TLS_KEYCTX_RXFLIT_CNT_V((ctx_len)))
+
 #define FILL_WR_OP_CCTX_SIZE \
htonl( \
FW_CRYPTO_LOOKASIDE_WR_OPCODE_V( \
diff --git a/drivers/crypto/chelsio/chcr_core.h 
b/drivers/crypto/chelsio/chcr_core.h
index 3c29ee0..77056a9 100644
--- a/drivers/crypto/chelsio/chcr_core.h
+++ b/drivers/crypto/chelsio/chcr_core.h
@@ -65,10 +65,58 @@
 struct _key_ctx {
__be32 ctx_hdr;
u8 salt[MAX_SALT];
-   __be64 reserverd;
+   __be64 iv_to_auth;
unsigned char key[0];
 };
 
+#define KEYCTX_TX_WR_IV_S  55
+#define KEYCTX_TX_WR_IV_M  0x1ffULL
+#define KEYCTX_TX_WR_IV_V(x) ((x) << KEYCTX_TX_WR_IV_S)
+#define KEYCTX_TX_WR_IV_G(x) \
+   (((x) >> KEYCTX_TX_WR_IV_S) & KEYCTX_TX_WR_IV_M)
+
+#define KEYCTX_TX_WR_AAD_S 47
+#define KEYCTX_TX_WR_AAD_M 0xffULL
+#define KEYCTX_TX_WR_AAD_V(x) ((x) << KEYCTX_TX_WR_AAD_S)
+#define KEYCTX_TX_WR_AAD_G(x) (((x) >> KEYCTX_TX_WR_AAD_S) & \
+   KEYCTX_TX_WR_AAD_M)
+
+#define KEYCTX_TX_WR_AADST_S 39
+#define KEYCTX_TX_WR_AADST_M 0xffULL
+#define KEYCTX_TX_WR_AADST_V(x) ((x) << KEYCTX_TX_WR_AADST_S)
+#define KEYCTX_TX_WR_AADST_G(x) \
+   (((x) >> KEYCTX_TX_WR_AADST_S) & KEYCTX_TX_WR_AADST_M)
+
+#define KEYCTX_TX_WR_CIPHER_S 30
+#define KEYCTX_TX_WR_CIPHER_M 0x1ffULL
+#define KEYCTX_TX_WR_CIPHER_V(x) ((x) << KEYCTX_TX_WR_CIPHER_S)
+#define KEYCTX_TX_WR_CIPHER_G(x) \
+   (((x) >> KEYCTX_TX_WR_CIPHER_S) & KEYCTX_TX_WR_CIPHER_M)
+
+#define KEYCTX_TX_WR_CIPHERST_S 23
+#define KEYCTX_TX_WR_CIPHERST_M 0x7f
+#define KEYCTX_TX_WR_CIPHERST_V(x) ((x) << KEYCTX_TX_WR_CIPHERST_S)
+#define KEYCTX_TX_WR_CIPHERST_G(x) \
+   (((x) >> KEYCTX_TX_WR_CIPHERST_S) & KEYCTX_TX_WR_CIPHERST_M)
+
+#define KEYCTX_TX_WR_AUTH_S 14
+#define KEYCTX_TX_WR_AUTH_M 0x1ff
+#define KEYCTX_TX_WR_AUTH_V(x) ((x) << KEYCTX_TX_WR_AUTH_S)
+#define KEYCTX_TX_WR_AUTH_G(x) \
+   (((x) >> KEYCTX_TX_WR_AUTH_S) & KEYCTX_TX_WR_AUTH_M)
+
+#define KEYCTX_TX_WR_AUTHST_S 7
+#define KEYCTX_TX_WR_AUTHST_M 0x7f
+#define KEYCTX_TX_WR_AUTHST_V(x) ((x) << KEYCTX_TX_WR_AUTHST_S)
+#define KEYCTX_TX_WR_AUTHST_G(x) \
+   (((x) >> KEYCTX_TX_WR_AUTHST_S) & KEYCTX_TX_WR_AUTHST_M)
+
+#define KEYCTX_TX_WR_AUTHIN_S 0
+#define KEYCTX_TX_WR_AUTHIN_M 0x7f
+#define KEYCTX_TX_WR_AUTHIN_V(x) ((x) << KEYCTX_TX_WR_AUTHIN_S)
+#define KEYCTX_TX_WR_AUTHIN_G(x) \
+   (((x) >> KEYCTX_TX_WR_AUTHIN_S) & KEYCTX_TX_WR_AUTHIN_M)
+
 struct chcr_wr {
struct fw_crypto_lookaside_wr wreq;
struct ulp_txpkt ulptx;
@@ -90,6 +138,11 @@ struct uld_ctx {
struct chcr_dev *dev;
 };
 
+struct sge_opaq

[Crypto v5 10/12] chtls: Inline crypto request Tx/Rx

2018-02-14 Thread Atul Gupta
TLS handler for record transmit and receive.
Create Inline TLS work request and post to FW.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls_io.c | 1867 +++
 1 file changed, 1867 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_io.c

diff --git a/drivers/crypto/chelsio/chtls/chtls_io.c 
b/drivers/crypto/chelsio/chtls/chtls_io.c
new file mode 100644
index 000..a0f03fb
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/chtls_io.c
@@ -0,0 +1,1867 @@
+/*
+ * Copyright (c) 2017 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Written by: Atul Gupta (atul.gu...@chelsio.com)
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "chtls.h"
+#include "chtls_cm.h"
+
+static bool is_tls_hw(struct chtls_sock *csk)
+{
+   return csk->tlshws.ofld;
+}
+
+static bool is_tls_rx(struct chtls_sock *csk)
+{
+   return (csk->tlshws.rxkey >= 0);
+}
+
+static bool is_tls_tx(struct chtls_sock *csk)
+{
+   return (csk->tlshws.txkey >= 0);
+}
+
+static bool is_tls_skb(struct chtls_sock *csk, const struct sk_buff *skb)
+{
+   return (is_tls_hw(csk) && skb_ulp_tls_skb_flags(skb));
+}
+
+static int key_size(void *sk)
+{
+   return 16; /* Key on DDR */
+}
+
+#define ceil(x, y) \
+   ({ unsigned long __x = (x), __y = (y); (__x + __y - 1) / __y; })
+
+static int data_sgl_len(const struct sk_buff *skb)
+{
+   unsigned int cnt;
+
+   cnt = skb_shinfo(skb)->nr_frags;
+   return (sgl_len(cnt) * 8);
+}
+
+static int nos_ivs(struct sock *sk, unsigned int size)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+
+   return ceil(size, csk->tlshws.mfs);
+}
+
+#define TLS_WR_CPL_LEN \
+   (sizeof(struct fw_tlstx_data_wr) + \
+   sizeof(struct cpl_tx_tls_sfo))
+
+static int is_ivs_imm(struct sock *sk, const struct sk_buff *skb)
+{
+   int ivs_size = nos_ivs(sk, skb->len) * CIPHER_BLOCK_SIZE;
+   int hlen = TLS_WR_CPL_LEN + data_sgl_len(skb);
+
+   if ((hlen + key_size(sk) + ivs_size) <
+   MAX_IMM_OFLD_TX_DATA_WR_LEN) {
+   ULP_SKB_CB(skb)->ulp.tls.iv = 1;
+   return 1;
+   }
+   ULP_SKB_CB(skb)->ulp.tls.iv = 0;
+   return 0;
+}
+
+static int max_ivs_size(struct sock *sk, int size)
+{
+   return (nos_ivs(sk, size) * CIPHER_BLOCK_SIZE);
+}
+
+static int ivs_size(struct sock *sk, const struct sk_buff *skb)
+{
+   return (is_ivs_imm(sk, skb) ? (nos_ivs(sk, skb->len) *
+CIPHER_BLOCK_SIZE) : 0);
+}
+
+static int flowc_wr_credits(int nparams, int *flowclenp)
+{
+   int flowclen16, flowclen;
+
+   flowclen = offsetof(struct fw_flowc_wr, mnemval[nparams]);
+   flowclen16 = DIV_ROUND_UP(flowclen, 16);
+   flowclen = flowclen16 * 16;
+
+   if (flowclenp)
+   *flowclenp = flowclen;
+
+   return flowclen16;
+}
+
+static struct sk_buff *create_flowc_wr_skb(struct sock *sk,
+  struct fw_flowc_wr *flowc,
+  int flowclen)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   struct sk_buff *skb;
+
+   skb = alloc_skb(flowclen, GFP_ATOMIC);
+   if (!skb)
+   return NULL;
+
+   memcpy(__skb_put(skb, flowclen), flowc, flowclen);
+   set_queue(skb, (csk->txq_idx << 1) | CPL_PRIORITY_DATA, sk);
+
+   return skb;
+}
+
+static int send_flowc_wr(struct sock *sk, struct fw_flowc_wr *flowc,
+int flowclen)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   bool syn_sent = (sk->sk_state == TCP_SYN_SENT);
+   struct tcp_sock *tp = tcp_sk(sk);
+   int flowclen16 = flowclen / 16;
+   struct sk_buff *skb;
+
+   if (csk_flag(sk, CSK_TX_DATA_SENT)) {
+   skb = create_flowc_wr_skb(sk, flowc, flowclen);
+   if (!skb)
+   return -ENOMEM;
+
+   if (syn_sent)
+   __skb_queue_tail(&csk->ooo_queue, skb);
+   else
+   skb_entail(sk, skb,
+  ULPCB_FLAG_NO_HDR | ULPCB_FLAG_NO_APPEND);
+   return 0;
+   }
+
+   if (!syn_sent) {
+   int ret;
+
+   ret = cxgb4_immdata_send(csk->egress_dev,
+csk->txq_idx,
+flowc, flowclen);
+   if (!ret)
+   return flowclen16;
+   }
+   skb = create_flowc_wr_skb(sk, flowc, flowclen);
+   if (!skb)
+   return -ENOMEM;
+   send_or_defer(sk, tp, skb, 0);
+   return flowclen16;
+}
+
+static u8 tcp_state_to_flowc_s
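
One aside for this file: the local ceil() macro above duplicates the kernel's
DIV_ROUND_UP(), so callers such as nos_ivs() could use the existing helper
directly; a sketch (same semantics for the unsigned values used here):

static int nos_ivs(struct sock *sk, unsigned int size)
{
	struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);

	return DIV_ROUND_UP(size, csk->tlshws.mfs);
}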

[Crypto v5 06/12] cxgb4: LLD driver changes to enable TLS

2018-02-14 Thread Atul Gupta
Read FW capability. Read key area size. Dump the TLS record count.

Signed-off-by: Atul Gupta 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c | 18 +++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c| 32 +--
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h |  7 ++
 drivers/net/ethernet/chelsio/cxgb4/sge.c   | 98 +-
 4 files changed, 142 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index cf47183..cfc9210 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -2826,8 +2826,8 @@ static int meminfo_show(struct seq_file *seq, void *v)
"Tx payload:", "Rx payload:", "LE hash:", "iSCSI region:",
"TDDP region:", "TPT region:", "STAG region:", "RQ region:",
"RQUDP region:", "PBL region:", "TXPBL region:",
-   "DBVFIFO region:", "ULPRX state:", "ULPTX state:",
-   "On-chip queues:"
+   "TLSKey region:", "DBVFIFO region:", "ULPRX state:",
+   "ULPTX state:", "On-chip queues:"
};
 
int i, n;
@@ -2943,6 +2943,12 @@ static int meminfo_show(struct seq_file *seq, void *v)
ulp_region(RX_RQUDP);
ulp_region(RX_PBL);
ulp_region(TX_PBL);
+   if (adap->params.crypto & FW_CAPS_CONFIG_TLS_INLINE) {
+   ulp_region(RX_TLS_KEY);
+   } else {
+   md->base = 0;
+   md->idx = ARRAY_SIZE(region);
+   }
 #undef ulp_region
md->base = 0;
md->idx = ARRAY_SIZE(region);
@@ -3098,6 +3104,14 @@ static int chcr_show(struct seq_file *seq, void *v)
   atomic_read(&adap->chcr_stats.fallback));
seq_printf(seq, "IPSec PDU: %10u\n",
   atomic_read(&adap->chcr_stats.ipsec_cnt));
+
+   seq_puts(seq, "\nChelsio Inline TLS Stats\n");
+   seq_printf(seq, "TLS PDU Tx: %u\n",
+  atomic_read(&adap->chcr_stats.tls_pdu_tx));
+   seq_printf(seq, "TLS PDU Rx: %u\n",
+  atomic_read(&adap->chcr_stats.tls_pdu_rx));
+   seq_printf(seq, "TLS Keys (DDR) Count: %u\n",
+  atomic_read(&adap->chcr_stats.tls_key));
return 0;
 }
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 05a4abf..60eb18b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4086,18 +4086,32 @@ static int adap_init0(struct adapter *adap)
adap->num_ofld_uld += 2;
}
if (caps_cmd.cryptocaps) {
-   /* Should query params here...TODO */
-   params[0] = FW_PARAM_PFVF(NCRYPTO_LOOKASIDE);
-   ret = t4_query_params(adap, adap->mbox, adap->pf, 0, 2,
- params, val);
-   if (ret < 0) {
-   if (ret != -EINVAL)
+   if (ntohs(caps_cmd.cryptocaps) &
+   FW_CAPS_CONFIG_CRYPTO_LOOKASIDE) {
+   params[0] = FW_PARAM_PFVF(NCRYPTO_LOOKASIDE);
+   ret = t4_query_params(adap, adap->mbox, adap->pf, 0,
+ 2, params, val);
+   if (ret < 0) {
+   if (ret != -EINVAL)
+   goto bye;
+   } else {
+   adap->vres.ncrypto_fc = val[0];
+   }
+   adap->num_ofld_uld += 1;
+   }
+   if (ntohs(caps_cmd.cryptocaps) &
+   FW_CAPS_CONFIG_TLS_INLINE) {
+   params[0] = FW_PARAM_PFVF(TLS_START);
+   params[1] = FW_PARAM_PFVF(TLS_END);
+   ret = t4_query_params(adap, adap->mbox, adap->pf, 0,
+ 2, params, val);
+   if (ret < 0)
goto bye;
-   } else {
-   adap->vres.ncrypto_fc = val[0];
+   adap->vres.key.start = val[0];
+   adap->vres.key.size = val[1] - val[0] + 1;
+   adap->num_uld += 1;
}
adap->params.crypto = ntohs(caps_cmd.cryptocaps);
-   adap->num_uld += 1;
}
 #undef FW_PARAM_PFVF
 #undef FW_PARAM_DEV
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
index 1d37672..55863f6 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
@@ -237,6 +237,7 @@ enum cxgb4_uld {
CXGB4_ULD_ISCSI,
CXGB4_ULD_ISCSIT,
CXGB4_ULD_CRYPTO,
+   CXGB4_ULD_TLS,
CXGB4_ULD_MAX
 };
 
@@ -287,6 +288,7 @@ struct cxgb4_virt_

[Crypto v5 11/12] chtls: Register the chtls Inline TLS with net tls

2018-02-14 Thread Atul Gupta
Add a new ULD driver for Inline TLS support. Register the ULP for chtls.
Setsockopt to program the key on chip. Supports AES-GCM with 128-bit key size.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls_main.c | 574 ++
 include/uapi/linux/tls.h  |   1 +
 2 files changed, 575 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_main.c

diff --git a/drivers/crypto/chelsio/chtls/chtls_main.c 
b/drivers/crypto/chelsio/chtls/chtls_main.c
new file mode 100644
index 000..66d4ce9
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/chtls_main.c
@@ -0,0 +1,574 @@
+/*
+ * Copyright (c) 2017 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Written by: Atul Gupta (atul.gu...@chelsio.com)
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "chtls.h"
+#include "chtls_cm.h"
+
+#define DRV_NAME "chtls"
+
+/*
+ * chtls device management
+ * maintains a list of the chtls devices
+ */
+static LIST_HEAD(cdev_list);
+static DEFINE_MUTEX(cdev_mutex);
+static DEFINE_MUTEX(cdev_list_lock);
+
+static struct proto chtls_cpl_prot;
+static DEFINE_MUTEX(notify_mutex);
+static RAW_NOTIFIER_HEAD(listen_notify_list);
+struct request_sock_ops chtls_rsk_ops;
+static uint send_page_order = (14 - PAGE_SHIFT < 0) ? 0 : 14 - PAGE_SHIFT;
+
+static int register_listen_notifier(struct notifier_block *nb)
+{
+   int err;
+
+   mutex_lock(&notify_mutex);
+   err = raw_notifier_chain_register(&listen_notify_list, nb);
+   mutex_unlock(&notify_mutex);
+   return err;
+}
+
+static int unregister_listen_notifier(struct notifier_block *nb)
+{
+   int err;
+
+   mutex_lock(&notify_mutex);
+   err = raw_notifier_chain_unregister(&listen_notify_list, nb);
+   mutex_unlock(&notify_mutex);
+   return err;
+}
+
+static int listen_notify_handler(struct notifier_block *this,
+unsigned long event, void *data)
+{
+   struct sock *sk = data;
+   struct chtls_dev *cdev;
+   int ret =  NOTIFY_DONE;
+
+   switch (event) {
+   case CHTLS_LISTEN_START:
+   case CHTLS_LISTEN_STOP:
+   mutex_lock(&cdev_list_lock);
+   list_for_each_entry(cdev, &cdev_list, list) {
+   if (event == CHTLS_LISTEN_START)
+   ret = chtls_listen_start(cdev, sk);
+   else
+   chtls_listen_stop(cdev, sk);
+   }
+   mutex_unlock(&cdev_list_lock);
+   break;
+   }
+   return ret;
+}
+
+static struct notifier_block listen_notifier = {
+   .notifier_call = listen_notify_handler
+};
+
+static int listen_backlog_rcv(struct sock *sk, struct sk_buff *skb)
+{
+   if (likely(skb_transport_header(skb) != skb_network_header(skb)))
+   return tcp_v4_do_rcv(sk, skb);
+   BLOG_SKB_CB(skb)->backlog_rcv(sk, skb);
+   return 0;
+}
+
+static int chtls_start_listen(struct sock *sk)
+{
+   int err;
+
+   if (sk->sk_protocol != IPPROTO_TCP)
+   return -EPROTONOSUPPORT;
+
+   if (sk->sk_family == PF_INET &&
+   LOOPBACK(inet_sk(sk)->inet_rcv_saddr))
+   return -EADDRNOTAVAIL;
+
+   sk->sk_backlog_rcv = listen_backlog_rcv;
+   mutex_lock(&notify_mutex);
+   err = raw_notifier_call_chain(&listen_notify_list, 0, sk);
+   mutex_unlock(&notify_mutex);
+   return err;
+}
+
+static int chtls_stop_listen(struct sock *sk)
+{
+   if (sk->sk_protocol != IPPROTO_TCP)
+   return -EPROTONOSUPPORT;
+
+   mutex_lock(&notify_mutex);
+   raw_notifier_call_chain(&listen_notify_list, 1, sk);
+   mutex_unlock(&notify_mutex);
+   return 0;
+}
+
+int chtls_netdev(struct tls_device *dev,
+struct net_device *netdev)
+{
+   struct chtls_dev *cdev = to_chtls_dev(dev);
+   int i;
+
+   for (i = 0; i < cdev->lldi->nports; i++)
+   if (cdev->ports[i] == netdev)
+   return 1;
+
+   return 0;
+}
+
+int chtls_inline_feature(struct tls_device *dev)
+{
+   struct chtls_dev *cdev = to_chtls_dev(dev);
+   struct net_device *netdev;
+   int i;
+
+   for (i = 0; i < cdev->lldi->nports; i++) {
+   netdev = cdev->ports[i];
+   if (netdev->features & NETIF_F_HW_TLS_INLINE)
+   return 1;
+   }
+   return 1;
+}
+
+int chtls_create_hash(struct tls_device *dev, struct sock *sk)
+{
+   if (sk->sk_state == TCP_LISTEN)
+   return chtls_start_listen(sk);
+   return 0;
+}
+
+void chtls_destroy_hash(struct tls_device *dev, struct sock *sk)
+{
+   if (sk->sk_state == TCP_LISTEN)
+   chtls_stop_listen(sk);
+}
+
+static void chtls_regis

[Crypto v5 00/12] Chelsio Inline TLS

2018-02-14 Thread Atul Gupta
Series for Chelsio Inline TLS driver (chtls.ko)

The driver uses the ULP infrastructure to register chtls as an Inline TLS ULP.
Chtls uses TCP sockets to transmit and receive TLS records. TCP proto_ops is
extended to offload TLS records.

T6 adapter provides the following features:
-TLS record offload, TLS header, encrypt, digest and transmit
-TLS record receive and decrypt
-TLS keys store
-TCP/IP engine
-TLS engine
-GCM crypto engine [support CBC also]

TLS provides security at the transport layer. It uses TCP to provide
reliable end-to-end transport of application data. It relies on TCP
for any retransmission. A TLS session comprises three parts:
a. TCP/IP connection
b. TLS handshake
c. Record layer processing

The TLS handshake state machine is executed on the host (refer to a standard
implementation, e.g. OpenSSL).  Setsockopt [SOL_TCP, TCP_ULP] initializes the
TCP proto-ops for Chelsio inline TLS support: setsockopt(sock, SOL_TCP,
TCP_ULP, "tls", sizeof("tls"));

Tx and Rx keys are decided during the handshake and programmed onto the chip
after the CCS is exchanged:
struct tls12_crypto_info_aes_gcm_128 crypto_info
setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info))
The Finished message is the first message encrypted/decrypted inline on Tx/Rx.
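
Putting the two setsockopt calls together, a minimal user-space sketch
(illustrative only; the key, salt, IV and record-sequence fields of crypto_info
must be filled in from the completed handshake, and the SOL_TLS/TCP_ULP
constants are defined locally from the kernel uapi values in case the libc
headers lack them):

#include <string.h>
#include <sys/socket.h>
#include <netinet/tcp.h>
#include <linux/tls.h>

#ifndef TCP_ULP
#define TCP_ULP 31
#endif
#ifndef SOL_TLS
#define SOL_TLS 282
#endif

static int enable_ktls_tx(int sock,
			  const struct tls12_crypto_info_aes_gcm_128 *ci)
{
	/* Attach the "tls" ULP; with an Inline TLS capable netdev the
	 * connection moves to TLS_FULL_HW, otherwise TLS_SW_TX is used. */
	if (setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")))
		return -1;

	/* Program the Tx key material derived from the handshake. */
	return setsockopt(sock, SOL_TLS, TLS_TX, ci, sizeof(*ci));
}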

On the Tx path the TLS engine receives plain text from OpenSSL, inserts the IV,
fetches the Tx key, creates cipher-text records and generates the MAC. The TLS
header is added to the cipher text and forwarded to the TCP/IP engine for
transport-layer processing and transmission on the wire.
TX:
Application--openssl--chtls---TLS engine---encrypt/auth---TCP/IP
engine---wire.

On the Rx side, received data is PDU-aligned at record
boundaries. TLS processes only complete records. If the Rx key is programmed
on CCS receive, data is decrypted and the plain text is posted to the host.
RX:
Wire--cipher-text--TCP/IP engine [PDU align]---TLS engine---
decrypt/auth---plain-text--chtls--openssl--application

v5: set TLS_FULL_HW for registered inline tls drivers
   -set TLS_FULL_HW prot for offload connection else move
to TLS_SW_TX
   -Case handled for interface with same IP [David Miller]
   -Removed Specific IP and INADDR_ANY handling [v4]

v4: removed chtls ULP type, retained tls ULP
   -registered chtls with net tls
   -defined struct tls_device to register the Inline drivers
   -ethtool interface tls-inline to enable Inline TLS for interface
   -prot update to support inline TLS

v3: fixed the kbuild test issues
   -made a few functions static
   -initialized few variables

v2: fixed the following based on the review comments of Stephan Mueller,
Stefano Brivio and Hannes Frederic
-Added more details in cover letter
-Fixed indentation and formatting issues
-Using aes instead of aes-generic
-memset key info after programming the key on chip
-reordered the patch sequence

Atul Gupta (12):
  tls: tls_device struct to register TLS drivers
  ethtool: feature for Inline TLS in HW
  support for inline tls
  chtls: structure and macro definiton
  cxgb4: Inline TLS FW Interface
  cxgb4: LLD driver changes to enable TLS
  chcr: Key Macro
  chtls: Key program
  chtls: CPL handler definition
  chtls: Inline crypto request Tx/Rx
  chtls: Register the chtls Inline TLS with net tls
  Makefile Kconfig

 drivers/crypto/chelsio/Kconfig |   11 +
 drivers/crypto/chelsio/Makefile|1 +
 drivers/crypto/chelsio/chcr_algo.h |   42 +
 drivers/crypto/chelsio/chcr_core.h |   55 +-
 drivers/crypto/chelsio/chtls/Makefile  |4 +
 drivers/crypto/chelsio/chtls/chtls.h   |  487 +
 drivers/crypto/chelsio/chtls/chtls_cm.c| 2046 
 drivers/crypto/chelsio/chtls/chtls_cm.h|  203 ++
 drivers/crypto/chelsio/chtls/chtls_hw.c|  394 
 drivers/crypto/chelsio/chtls/chtls_io.c| 1867 ++
 drivers/crypto/chelsio/chtls/chtls_main.c  |  574 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |   18 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c|   32 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h |7 +
 drivers/net/ethernet/chelsio/cxgb4/sge.c   |   98 +-
 drivers/net/ethernet/chelsio/cxgb4/t4_msg.h|  121 +-
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h   |2 +
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h  |  165 +-
 include/linux/netdev_features.h|2 +
 include/net/tls.h  |   24 +
 include/uapi/linux/tls.h   |1 +
 net/core/ethtool.c |1 +
 net/ipv4/tcp_minisocks.c   |1 +
 net/tls/tls_main.c |  124 +-
 24 files changed, 6254 insertions(+), 26 deletions(-)
 create mode 100644 drivers/crypto/chelsio/chtls/Makefile
 create mode 100644 drivers/crypto/chelsio/chtls/chtls.h
 create mode 100644 drivers/crypto/chel

[Crypto v5 01/12] tls: tls_device struct to register TLS drivers

2018-02-14 Thread Atul Gupta
tls_device structure to register Inline TLS drivers
with net/tls

Signed-off-by: Atul Gupta 
---
 include/net/tls.h | 24 
 1 file changed, 24 insertions(+)

diff --git a/include/net/tls.h b/include/net/tls.h
index 936cfc5..6b64510 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -55,6 +55,28 @@
 
 #define TLS_AAD_SPACE_SIZE 13
 
+#define TLS_DEVICE_NAME_MAX 32
+
+enum {
+   TLS_BASE_TX,
+   TLS_SW_TX,
+   TLS_FULL_HW, /* TLS record processed Inline */
+   TLS_NUM_CONFIG,
+};
+extern struct proto tls_prots[TLS_NUM_CONFIG];
+
+struct tls_device {
+   char name[TLS_DEVICE_NAME_MAX];
+   struct list_head dev_list;
+
+   /* netdev present in registered inline tls driver */
+   int (*netdev)(struct tls_device *device,
+ struct net_device *netdev);
+   int (*feature)(struct tls_device *device);
+   int (*hash)(struct tls_device *device, struct sock *sk);
+   void (*unhash)(struct tls_device *device, struct sock *sk);
+};
+
 struct tls_sw_context {
struct crypto_aead *aead_send;
 
@@ -254,5 +276,7 @@ static inline struct tls_offload_context *tls_offload_ctx(
 
 int tls_proccess_cmsg(struct sock *sk, struct msghdr *msg,
  unsigned char *record_type);
+void tls_register_device(struct tls_device *device);
+void tls_unregister_device(struct tls_device *device);
 
 #endif /* _TLS_OFFLOAD_H */
-- 
1.8.3.1
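
For context, a hypothetical driver would fill in this struct and register it
roughly as follows (sketch only; the "exampledrv" name and the callback
implementations are invented here, the real usage is in the chtls patches
later in this series):

static struct tls_device example_tlsdev = {
	.name    = "exampledrv",
	.netdev  = example_netdev_match,  /* does this device own the netdev?   */
	.feature = example_feature,       /* is inline TLS usable on any port?  */
	.hash    = example_create_hash,   /* offloaded socket starts listening  */
	.unhash  = example_destroy_hash,  /* offloaded socket stops listening   */
};

static int __init example_init(void)
{
	tls_register_device(&example_tlsdev);
	return 0;
}
module_init(example_init);

static void __exit example_exit(void)
{
	tls_unregister_device(&example_tlsdev);
}
module_exit(example_exit);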



[Crypto v5 09/12] chtls: CPL handler definition

2018-02-14 Thread Atul Gupta
CPL handlers for TLS session, record transmit and receive.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls_cm.c | 2046 +++
 net/ipv4/tcp_minisocks.c|1 +
 2 files changed, 2047 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_cm.c

diff --git a/drivers/crypto/chelsio/chtls/chtls_cm.c 
b/drivers/crypto/chelsio/chtls/chtls_cm.c
new file mode 100644
index 000..670bac6
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/chtls_cm.c
@@ -0,0 +1,2046 @@
+/*
+ * Copyright (c) 2017 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Written by: Atul Gupta (atul.gu...@chelsio.com)
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "chtls.h"
+#include "chtls_cm.h"
+
+extern struct request_sock_ops chtls_rsk_ops;
+
+/*
+ * State transitions and actions for close.  Note that if we are in SYN_SENT
+ * we remain in that state as we cannot control a connection while it's in
+ * SYN_SENT; such connections are allowed to establish and are then aborted.
+ */
+static unsigned char new_state[16] = {
+   /* current state: new state:  action: */
+   /* (Invalid)   */ TCP_CLOSE,
+   /* TCP_ESTABLISHED */ TCP_FIN_WAIT1 | TCP_ACTION_FIN,
+   /* TCP_SYN_SENT*/ TCP_SYN_SENT,
+   /* TCP_SYN_RECV*/ TCP_FIN_WAIT1 | TCP_ACTION_FIN,
+   /* TCP_FIN_WAIT1   */ TCP_FIN_WAIT1,
+   /* TCP_FIN_WAIT2   */ TCP_FIN_WAIT2,
+   /* TCP_TIME_WAIT   */ TCP_CLOSE,
+   /* TCP_CLOSE   */ TCP_CLOSE,
+   /* TCP_CLOSE_WAIT  */ TCP_LAST_ACK | TCP_ACTION_FIN,
+   /* TCP_LAST_ACK*/ TCP_LAST_ACK,
+   /* TCP_LISTEN  */ TCP_CLOSE,
+   /* TCP_CLOSING */ TCP_CLOSING,
+};
+
+static struct chtls_sock *chtls_sock_create(struct chtls_dev *cdev)
+{
+   struct chtls_sock *csk = kzalloc(sizeof(*csk), GFP_ATOMIC);
+
+   if (!csk)
+   return NULL;
+
+   csk->txdata_skb_cache = alloc_skb(TXDATA_SKB_LEN, GFP_ATOMIC);
+   if (!csk->txdata_skb_cache) {
+   kfree(csk);
+   return NULL;
+   }
+
+   kref_init(&csk->kref);
+   csk->cdev = cdev;
+   skb_queue_head_init(&csk->txq);
+   csk->wr_skb_head = NULL;
+   csk->wr_skb_tail = NULL;
+   csk->mss = MAX_MSS;
+   csk->tlshws.ofld = 1;
+   csk->tlshws.txkey = -1;
+   csk->tlshws.rxkey = -1;
+   csk->tlshws.mfs = TLS_MFS;
+   skb_queue_head_init(&csk->tlshws.sk_recv_queue);
+   return csk;
+}
+
+static void chtls_sock_release(struct kref *ref)
+{
+   struct chtls_sock *csk =
+   container_of(ref, struct chtls_sock, kref);
+
+   kfree(csk);
+}
+
+static struct net_device *chtls_ipv4_netdev(struct chtls_dev *cdev,
+   struct sock *sk)
+{
+   struct net_device *ndev = cdev->ports[0];
+
+   if (likely(!inet_sk(sk)->inet_rcv_saddr))
+   return ndev;
+
+   ndev = ip_dev_find(&init_net, inet_sk(sk)->inet_rcv_saddr);
+   if (!ndev)
+   return NULL;
+
+   if (is_vlan_dev(ndev))
+   return vlan_dev_real_dev(ndev);
+   return ndev;
+}
+
+static void assign_rxopt(struct sock *sk, unsigned int opt)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   struct tcp_sock *tp = tcp_sk(sk);
+   const struct chtls_dev *cdev;
+
+   cdev = csk->cdev;
+   tp->tcp_header_len   = sizeof(struct tcphdr);
+   tp->rx_opt.mss_clamp = cdev->mtus[TCPOPT_MSS_G(opt)] - 40;
+   tp->mss_cache= tp->rx_opt.mss_clamp;
+   tp->rx_opt.tstamp_ok = TCPOPT_TSTAMP_G(opt);
+   tp->rx_opt.snd_wscale= TCPOPT_SACK_G(opt);
+   tp->rx_opt.wscale_ok = TCPOPT_WSCALE_OK_G(opt);
+   SND_WSCALE(tp)   = TCPOPT_SND_WSCALE_G(opt);
+   if (!tp->rx_opt.wscale_ok)
+   tp->rx_opt.rcv_wscale = 0;
+   if (tp->rx_opt.tstamp_ok) {
+   tp->tcp_header_len += TCPOLEN_TSTAMP_ALIGNED;
+   tp->rx_opt.mss_clamp -= TCPOLEN_TSTAMP_ALIGNED;
+   } else if (csk->opt2 & TSTAMPS_EN_F) {
+   csk->opt2 &= ~TSTAMPS_EN_F;
+   csk->mtu_idx = TCPOPT_MSS_G(opt);
+   }
+}
+
+static void chtls_purge_rcv_queue(struct sock *sk)
+{
+   struct sk_buff *skb;
+
+   while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) {
+   skb_dst_set(skb, (void *)NULL);
+   kfree_skb(skb);
+   }
+}
+
+static void chtls_purge_write_queue(struct sock *sk)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   struct sk_buff *skb;
+
+   while ((skb = __skb_dequeue(&csk->txq))) {

[Crypto v5 04/12] chtls: structure and macro definiton

2018-02-14 Thread Atul Gupta
Inline TLS state and connection management. Supporting macro definitions.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls.h| 487 
 drivers/crypto/chelsio/chtls/chtls_cm.h | 203 +
 2 files changed, 690 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/chtls.h
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_cm.h

diff --git a/drivers/crypto/chelsio/chtls/chtls.h 
b/drivers/crypto/chelsio/chtls/chtls.h
new file mode 100644
index 000..c7b8d59
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/chtls.h
@@ -0,0 +1,487 @@
+/*
+ * Copyright (c) 2016 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __CHTLS_H__
+#define __CHTLS_H__
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "t4fw_api.h"
+#include "t4_msg.h"
+#include "cxgb4.h"
+#include "cxgb4_uld.h"
+#include "l2t.h"
+#include "chcr_algo.h"
+#include "chcr_core.h"
+#include "chcr_crypto.h"
+
+#define CIPHER_BLOCK_SIZE   16
+#define MAX_IVS_PAGE 256
+#define TLS_KEY_CONTEXT_SZ 64
+#define TLS_HEADER_LENGTH  5
+#define SCMD_CIPH_MODE_AES_GCM  2
+#define GCM_TAG_SIZE 16
+#define AEAD_EXPLICIT_DATA_SIZE 8
+/* Any MFS size should work and come from openssl */
+#define TLS_MFS 16384
+
+#define SOCK_INLINE (31)
+#define RSS_HDR sizeof(struct rss_header)
+
+enum {
+   CHTLS_KEY_CONTEXT_DSGL,
+   CHTLS_KEY_CONTEXT_IMM,
+   CHTLS_KEY_CONTEXT_DDR,
+};
+
+enum {
+   CHTLS_LISTEN_START,
+   CHTLS_LISTEN_STOP,
+};
+
+/* Flags for return value of CPL message handlers */
+enum {
+   CPL_RET_BUF_DONE = 1,   /* buffer processing done */
+   CPL_RET_BAD_MSG = 2,/* bad CPL message */
+   CPL_RET_UNKNOWN_TID = 4 /* unexpected unknown TID */
+};
+
+#define TLS_RCV_ST_READ_HEADER  0xF0
+#define TLS_RCV_ST_READ_BODY0xF1
+#define TLS_RCV_ST_READ_DONE0xF2
+#define TLS_RCV_ST_READ_NB  0xF3
+
+#define RSPQ_HASH_BITS 5
+#define LISTEN_INFO_HASH_SIZE 32
+struct listen_info {
+   struct listen_info *next;  /* Link to next entry */
+   struct sock *sk;   /* The listening socket */
+   unsigned int stid; /* The server TID */
+};
+
+enum {
+   T4_LISTEN_START_PENDING,
+   T4_LISTEN_STARTED
+};
+
+enum csk_flags {
+   CSK_CALLBACKS_CHKD, /* socket callbacks have been sanitized */
+   CSK_ABORT_REQ_RCVD, /* received one ABORT_REQ_RSS message */
+   CSK_TX_MORE_DATA,   /* sending ULP data; don't set SHOVE bit */
+   CSK_TX_WAIT_IDLE,   /* suspend Tx until in-flight data is ACKed */
+   CSK_ABORT_SHUTDOWN, /* shouldn't send more abort requests */
+   CSK_ABORT_RPL_PENDING,  /* expecting an abort reply */
+   CSK_CLOSE_CON_REQUESTED,/* we've sent a close_conn_req */
+   CSK_TX_DATA_SENT,   /* sent a TX_DATA WR on this connection */
+   CSK_TX_FAILOVER,/* Tx traffic failing over */
+   CSK_UPDATE_RCV_WND, /* Need to update rcv window */
+   CSK_RST_ABORTED,/* outgoing RST was aborted */
+   CSK_TLS_HANDSHK,/* TLS Handshake */
+};
+
+struct listen_ctx {
+   struct sock *lsk;
+   struct chtls_dev *cdev;
+   u32 state;
+};
+
+struct key_map {
+   unsigned long *addr;
+   unsigned int start;
+   unsigned int available;
+   unsigned int size;
+   spinlock_t lock; /* lock for key id request from map */
+} __packed;
+
+struct tls_scmd {
+   __be32 seqno_numivs;
+   __be32 ivgen_hdrlen;
+};
+
+struct chtls_dev {
+   struct tls_device tlsdev;
+   struct list_head list;
+   struct cxgb4_lld_info *lldi;
+   struct pci_dev *pdev;
+   struct listen_info *listen_hash_tab[LISTEN_INFO_HASH_SIZE];
+   spinlock_t listen_lock; /* lock for listen list */
+   struct net_device **ports;
+   struct tid_info *tids;
+   unsigned int pfvf;
+   const unsigned short *mtus;
+
+   spinlock_t aidr_lock cacheline_aligned_in_smp;
+   struct idr aidr; /* ATID id space */
+   struct idr hwtid_idr;
+   struct idr stid_idr;
+
+   spinlock_t idr_lock cacheline_aligned_in_smp;
+
+   struct net_device *egr_dev[NCHAN * 2];
+   struct sk_buff *rspq_skb_cache[1 << RSPQ_HASH_BITS];
+   struct sk_buff *askb;
+
+   struct sk_buff_head deferq;
+   struct work_struct deferq_task;
+
+   struct list_head list_node;
+   struct list_head rcu_node;
+   struct list_head na_node;
+   unsigned int send_page_order;
+   struct key_map kmap;
+};
+
+struct chtls_hws {
+   struct 

[Crypto v5 02/12] ethtool: feature for Inline TLS in HW

2018-02-14 Thread Atul Gupta
Signed-off-by: Atul Gupta 
---
 include/linux/netdev_features.h | 2 ++
 net/core/ethtool.c  | 1 +
 2 files changed, 3 insertions(+)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index b1b0ca7..e1a33b7 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -77,6 +77,7 @@ enum {
NETIF_F_HW_ESP_BIT, /* Hardware ESP transformation offload 
*/
NETIF_F_HW_ESP_TX_CSUM_BIT, /* ESP with TX checksum offload */
NETIF_F_RX_UDP_TUNNEL_PORT_BIT, /* Offload of RX port for UDP tunnels */
+   NETIF_F_HW_TLS_INLINE_BIT,  /* Offload TLS record */
 
/*
 * Add your fresh new feature above and remember to update
@@ -142,6 +143,7 @@ enum {
 #define NETIF_F_HW_ESP __NETIF_F(HW_ESP)
 #define NETIF_F_HW_ESP_TX_CSUM __NETIF_F(HW_ESP_TX_CSUM)
 #define NETIF_F_RX_UDP_TUNNEL_PORT  __NETIF_F(RX_UDP_TUNNEL_PORT)
+#define NETIF_F_HW_TLS_INLINE   __NETIF_F(HW_TLS_INLINE)
 
 #define for_each_netdev_feature(mask_addr, bit)\
for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index f8fcf45..cac1c77 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -106,6 +106,7 @@ int ethtool_op_get_ts_info(struct net_device *dev, struct 
ethtool_ts_info *info)
[NETIF_F_HW_ESP_BIT] =   "esp-hw-offload",
[NETIF_F_HW_ESP_TX_CSUM_BIT] =   "esp-tx-csum-hw-offload",
[NETIF_F_RX_UDP_TUNNEL_PORT_BIT] =   "rx-udp_tunnel-port-offload",
+   [NETIF_F_HW_TLS_INLINE_BIT] ="tls-inline",
 };
 
 static const char
-- 
1.8.3.1



[Crypto v5 03/12] support for inline tls

2018-02-14 Thread Atul Gupta
Facility to register Inline TLS drivers with net/tls. Set up the
TLS_FULL_HW prot to listen on an offload device.

Cases handled
1. An Inline TLS device exists: set up prot for TLS_FULL_HW
2. At least one Inline TLS device exists, so TLS_FULL_HW is set; if the
connection is established on a non-inline-capable device, move to TLS_SW_TX
3. Default mode TLS_SW_TX continues

Signed-off-by: Atul Gupta 
---
 net/tls/tls_main.c | 124 ++---
 1 file changed, 117 insertions(+), 7 deletions(-)

diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index e07ee3a..81c61e9 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -45,13 +46,9 @@
 MODULE_DESCRIPTION("Transport Layer Security Support");
 MODULE_LICENSE("Dual BSD/GPL");
 
-enum {
-   TLS_BASE_TX,
-   TLS_SW_TX,
-   TLS_NUM_CONFIG,
-};
-
-static struct proto tls_prots[TLS_NUM_CONFIG];
+static LIST_HEAD(device_list);
+static DEFINE_MUTEX(device_mutex);
+struct proto tls_prots[TLS_NUM_CONFIG];
 
 static inline void update_sk_prot(struct sock *sk, struct tls_context *ctx)
 {
@@ -260,6 +257,38 @@ static void tls_sk_proto_close(struct sock *sk, long 
timeout)
sk_proto_close(sk, timeout);
 }
 
+static struct net_device *get_netdev(struct sock *sk)
+{
+   struct inet_sock *inet = inet_sk(sk);
+   struct net_device *netdev = NULL;
+
+   netdev = dev_get_by_index(sock_net(sk), inet->cork.fl.flowi_oif);
+   return netdev;
+}
+
+static int get_tls_offload_dev(struct sock *sk)
+{
+   struct net_device *netdev;
+   struct tls_device *dev;
+   int rc = -EINVAL;
+
+   netdev = get_netdev(sk);
+   if (!netdev)
+   goto out;
+
+   mutex_lock(&device_mutex);
+   list_for_each_entry(dev, &device_list, dev_list) {
+   if (dev->netdev && dev->netdev(dev, netdev)) {
+   rc = 0;
+   break;
+   }
+   }
+   mutex_unlock(&device_mutex);
+   dev_put(netdev);
+out:
+   return rc;
+}
+
 static int do_tls_getsockopt_tx(struct sock *sk, char __user *optval,
int __user *optlen)
 {
@@ -401,6 +430,15 @@ static int do_tls_setsockopt_tx(struct sock *sk, char 
__user *optval,
goto out;
}
 
+   rc = get_tls_offload_dev(sk);
+   if (rc) {
+   goto out;
+   } else {
+   /* Retain HW unhash for cleanup and move to SW Tx */
+   sk->sk_prot[TLS_BASE_TX].unhash =
+   sk->sk_prot[TLS_FULL_HW].unhash;
+   }
+
/* currently SW is default, we will have ethtool in future */
rc = tls_set_sw_offload(sk, ctx);
tx_conf = TLS_SW_TX;
@@ -448,6 +486,54 @@ static int tls_setsockopt(struct sock *sk, int level, int 
optname,
return do_tls_setsockopt(sk, optname, optval, optlen);
 }
 
+static int tls_hw_prot(struct sock *sk)
+{
+   struct tls_context *ctx = tls_get_ctx(sk);
+   struct tls_device *dev;
+
+   mutex_lock(&device_mutex);
+   list_for_each_entry(dev, &device_list, dev_list) {
+   if (dev->feature && dev->feature(dev)) {
+   ctx->tx_conf = TLS_FULL_HW;
+   update_sk_prot(sk, ctx);
+   break;
+   }
+   }
+   mutex_unlock(&device_mutex);
+   return ctx->tx_conf;
+}
+
+static void tls_hw_unhash(struct sock *sk)
+{
+   struct tls_device *dev;
+
+   mutex_lock(&device_mutex);
+   list_for_each_entry(dev, &device_list, dev_list) {
+   if (dev->unhash)
+   dev->unhash(dev, sk);
+   }
+   mutex_unlock(&device_mutex);
+   tcp_prot.unhash(sk);
+}
+
+static int tls_hw_hash(struct sock *sk)
+{
+   struct tls_device *dev;
+   int err;
+
+   err = tcp_prot.hash(sk);
+   mutex_lock(&device_mutex);
+   list_for_each_entry(dev, &device_list, dev_list) {
+   if (dev->hash)
+   err |= dev->hash(dev, sk);
+   }
+   mutex_unlock(&device_mutex);
+
+   if (err)
+   tls_hw_unhash(sk);
+   return err;
+}
+
 static int tls_init(struct sock *sk)
 {
struct inet_connection_sock *icsk = inet_csk(sk);
@@ -466,6 +552,9 @@ static int tls_init(struct sock *sk)
ctx->sk_proto_close = sk->sk_prot->close;
 
ctx->tx_conf = TLS_BASE_TX;
+   if (tls_hw_prot(sk) == TLS_FULL_HW)
+   goto out;
+
update_sk_prot(sk, ctx);
 out:
return rc;
@@ -487,7 +576,27 @@ static void build_protos(struct proto *prot, struct proto 
*base)
prot[TLS_SW_TX] = prot[TLS_BASE_TX];
prot[TLS_SW_TX].sendmsg = tls_sw_sendmsg;
prot[TLS_SW_TX].sendpage= tls_sw_sendpage;
+
+   prot[TLS_FULL_HW] = prot[TLS_BASE_TX];
+   prot[TLS_FULL_HW].hash  = tls_hw_hash;
+   prot[TLS_FULL_HW].unhash= tls_hw_unhash;
+}
+
+void tls_regi

Re: [PATCH v3 1/4] crypto: AF_ALG AIO - lock context IV

2018-02-14 Thread Stephan Mueller
On Thursday, 15 February 2018, 06:30:36 CET, Harsh Jain wrote:

Hi Harsh,

> On 14-02-2018 18:22, Stephan Mueller wrote:
> > On Wednesday, 14 February 2018, 06:43:53 CET, Harsh Jain wrote:
> > 
> > Hi Harsh,
> > 
> >> Patch set is working fine with chelsio Driver.
> > 
> > Thank you.
> > 
> >> Do we really need the IV locking mechanism for AEAD algos? AEAD algos
> >> don't support partial mode operation, and drivers (at least
> >> Chelsio) are not updating IVs on AEAD request completions.
> > 
> > Yes, I think we would need it. It is technically possible to have multiple
> > IOCBs for AEAD ciphers. Even though your implementation may not write the
> > IV back, others may do that. At least I do not see a guarantee that the
> > IV is *not* written back by a driver.
> 
> There is no use in writing the IV back for AEAD algos till the framework
> starts supporting partial mode.

I agree.

> Even if a driver starts updating the IV for AEAD,
> multiple IOCBs will in both cases yield wrong results.

This would only be the case if the driver would not implicitly or explicitly 
serialize the requests.
> 
> Case 1: If we have AEAD IV serialization applied, encryption will be
> wrong if the same IV gets used.

Agreed.

> Case 2: If we do not have IV serialization for
> AEAD, encryption will be fine, but the user will get multiple authentication
> tags (each with the final block processed). It is as if the 2nd block's
> encryption is based on the IV received from the 1st block, while the
> authentication tag value is based on the 2nd block's content only.

Agreed.

But are we sure that all drivers behave correctly? Before you notified us of
the issue, I was not even aware of the fact that this serialization may not be
done in the driver. And we have only seen that issue with AF_ALG, where we test
for multiple concurrent AIO operations.

Besides, when we do not have the locking for AEAD, what would we gain: one
less lock to take vs. the guarantee that the AEAD operation is always properly
serialized.
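
To make that trade-off concrete, the serialization in question boils down to
the following pattern (a synchronous simplification for illustration only; the
struct, lock and helper names are invented and do not claim to match the
actual patch):

struct iv_ctx {
	struct mutex iv_lock;
	u8 iv[16];            /* large enough for AES block-size IVs */
};

static int encrypt_one_iocb(struct iv_ctx *ctx, struct skcipher_request *req,
			    u8 *req_iv, unsigned int ivsize)
{
	DECLARE_CRYPTO_WAIT(wait);
	int err;

	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
				      crypto_req_done, &wait);
	/* All IOCBs of one socket share ctx->iv, so a request owns the lock
	 * from reading the IV until the chained IV has been written back. */
	mutex_lock(&ctx->iv_lock);
	memcpy(req_iv, ctx->iv, ivsize);   /* take the current context IV */
	err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
	memcpy(ctx->iv, req_iv, ivsize);   /* store the chained IV back   */
	mutex_unlock(&ctx->iv_lock);
	return err;
}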

Ciao
Stephan




Re: [PATCH v3 1/4] crypto: AF_ALG AIO - lock context IV

2018-02-14 Thread Harsh Jain


On 14-02-2018 18:22, Stephan Mueller wrote:
> On Wednesday, 14 February 2018, 06:43:53 CET, Harsh Jain wrote:
>
> Hi Harsh,
>
>> Patch set is working fine with chelsio Driver.
> Thank you.
>
>> Do we really need IV locking mechanism for AEAD algo because AEAD algo's
>> don't support Partial mode operation and Driver are not updating(atleast
>> Chelsio) IV's on AEAD request completions.
> Yes, I think we would need it. It is technically possible to have multiple 
> IOCBs for AEAD ciphers. Even though your implementation may not write the IV 
> back, others may do that. At least I do not see a guarantee that the IV is 
> *not* written back by a driver.
There is no use in writing the IV back for AEAD algorithms until the framework 
starts supporting partial mode.
Even if a driver starts updating the IV for AEAD, multiple IOCBs will yield 
wrong results in both cases.

Case 1: If we have AEAD IV serialization applied, encryption will be wrong if 
the same IV gets used.
Case 2: If we do not have IV serialization for AEAD, encryption will be fine, 
but the user ends up with multiple authentication tags (each computed as if its 
block were the final one). It is as if the 2nd block's encryption is based on 
the IV received from the 1st block, while its authentication tag covers the 
2nd block's content only.

>
> In case your driver does not write the IV back and thus does not need to 
> serialize, the driver can report CRYPTO_ALG_SERIALIZES_IV_ACCESS. In this 
> case, the higher level functions would not serialize as the driver serializes 
> the requests (or the driver deems it appropriate that no serialization is 
> needed as is the case with your driver).
>
> Ciao
> Stephan
>
>



Re: [Crypto v4 12/12] Makefile Kconfig

2018-02-14 Thread kbuild test robot
Hi Atul,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on cryptodev/master]
[cannot apply to net/master net-next/master v4.16-rc1 next-20180214]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Atul-Gupta/Chelsio-Inline-TLS/20180215-072600
base:   
https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=ia64 

All warnings (new ones prefixed by >>):

   drivers/crypto/chelsio/chtls/chtls_cm.c: In function 'chtls_rx_ack':
>> drivers/crypto/chelsio/chtls/chtls_cm.c:1979:4: warning: this 'if' clause 
>> does not guard... [-Wmisleading-indentation]
   if (csk->wr_nondata)
   ^~
   drivers/crypto/chelsio/chtls/chtls_cm.c:1981:5: note: ...this statement, but 
the latter is misleadingly indented as if it were guarded by the 'if'
break;
^
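
For reference, a standalone sketch (made-up names, not the driver code) of the
construct the warning points at, with the indentation matching what the code
actually does: the break runs unconditionally once there is no skb. Whether
that is the intended behaviour is for the author to confirm; if the break is
meant to be guarded, braces around the inner if body would make that explicit.

/* build with: gcc -Wall -Wmisleading-indentation -c indent_sketch.c */
struct csk_stub { unsigned int wr_nondata; };

unsigned int drain_credits(struct csk_stub *csk, unsigned int credits,
			   int have_skb)
{
	while (credits) {
		if (!have_skb) {
			if (csk->wr_nondata)
				csk->wr_nondata -= credits;
			break;	/* unconditional, no longer indented under the inner if */
		}
		credits--;	/* stand-in for the per-skb accounting */
	}
	return credits;
}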

vim +/if +1979 drivers/crypto/chelsio/chtls/chtls_cm.c

763a7f5f Atul Gupta 2018-02-12  1961  
763a7f5f Atul Gupta 2018-02-12  1962  static void chtls_rx_ack(struct sock *sk, 
struct sk_buff *skb)
763a7f5f Atul Gupta 2018-02-12  1963  {
763a7f5f Atul Gupta 2018-02-12  1964struct cpl_fw4_ack *hdr = cplhdr(skb) + 
RSS_HDR;
763a7f5f Atul Gupta 2018-02-12  1965struct chtls_sock *csk = 
sk->sk_user_data;
763a7f5f Atul Gupta 2018-02-12  1966struct tcp_sock *tp = tcp_sk(sk);
763a7f5f Atul Gupta 2018-02-12  1967u32 snd_una = ntohl(hdr->snd_una);
763a7f5f Atul Gupta 2018-02-12  1968u8 credits = hdr->credits;
763a7f5f Atul Gupta 2018-02-12  1969  
763a7f5f Atul Gupta 2018-02-12  1970csk->wr_credits += credits;
763a7f5f Atul Gupta 2018-02-12  1971  
763a7f5f Atul Gupta 2018-02-12  1972if (csk->wr_unacked > 
csk->wr_max_credits - csk->wr_credits)
763a7f5f Atul Gupta 2018-02-12  1973csk->wr_unacked = 
csk->wr_max_credits - csk->wr_credits;
763a7f5f Atul Gupta 2018-02-12  1974  
763a7f5f Atul Gupta 2018-02-12  1975while (credits) {
763a7f5f Atul Gupta 2018-02-12  1976struct sk_buff *pskb = 
csk->wr_skb_head;
763a7f5f Atul Gupta 2018-02-12  1977  
763a7f5f Atul Gupta 2018-02-12  1978if (unlikely(!pskb)) {
763a7f5f Atul Gupta 2018-02-12 @1979if (csk->wr_nondata)
763a7f5f Atul Gupta 2018-02-12  1980csk->wr_nondata 
-= credits;
763a7f5f Atul Gupta 2018-02-12  1981break;
763a7f5f Atul Gupta 2018-02-12  1982}
763a7f5f Atul Gupta 2018-02-12  1983if (unlikely(credits < 
pskb->csum)) {
763a7f5f Atul Gupta 2018-02-12  1984pskb->csum -= credits;
763a7f5f Atul Gupta 2018-02-12  1985break;
763a7f5f Atul Gupta 2018-02-12  1986}
763a7f5f Atul Gupta 2018-02-12  1987dequeue_wr(sk);
763a7f5f Atul Gupta 2018-02-12  1988credits -= pskb->csum;
763a7f5f Atul Gupta 2018-02-12  1989kfree_skb(pskb);
763a7f5f Atul Gupta 2018-02-12  1990}
763a7f5f Atul Gupta 2018-02-12  1991if (hdr->seq_vld & 
CPL_FW4_ACK_FLAGS_SEQVAL) {
763a7f5f Atul Gupta 2018-02-12  1992if (unlikely(before(snd_una, 
tp->snd_una))) {
763a7f5f Atul Gupta 2018-02-12  1993kfree_skb(skb);
763a7f5f Atul Gupta 2018-02-12  1994return;
763a7f5f Atul Gupta 2018-02-12  1995}
763a7f5f Atul Gupta 2018-02-12  1996  
763a7f5f Atul Gupta 2018-02-12  1997if (tp->snd_una != snd_una) {
763a7f5f Atul Gupta 2018-02-12  1998tp->snd_una = snd_una;
763a7f5f Atul Gupta 2018-02-12  1999
dst_confirm(sk->sk_dst_cache);
763a7f5f Atul Gupta 2018-02-12  2000tp->rcv_tstamp = 
tcp_time_stamp(tp);
763a7f5f Atul Gupta 2018-02-12  2001if (tp->snd_una == 
tp->snd_nxt &&
763a7f5f Atul Gupta 2018-02-12  2002
!csk_flag_nochk(csk, CSK_TX_FAILOVER))
763a7f5f Atul Gupta 2018-02-12  2003
csk_reset_flag(csk, CSK_TX_WAIT_IDLE);
763a7f5f Atul Gupta 2018-02-12  2004}
763a7f5f Atul Gupta 2018-02-12  2005}
763a7f5f Atul Gupta 2018-02-12  2006  
763a7f5f Atul Gupta 2018-02-12  2007if (hdr->seq_vld & 
CPL_FW4_ACK_FLAGS_CH) {
763a7f5f Atul Gupta 2018-02-12  2008unsigned int fclen16 = 
roundup(failover_flowc_wr_len, 16);
763a7f5f Atul Gupta 2018-02-12  2009  
763a7f5f Atul Gupta 2018-02-12  2010csk->wr_credits -= fclen16;
763a7f5f Atul Gupta 2

[PATCH 2/2] crypto: bcm: One function call less in do_shash() after error detection

2018-02-14 Thread SF Markus Elfring
From: Markus Elfring 
Date: Wed, 14 Feb 2018 22:22:20 +0100

In one error path of the do_shash() function, kfree() was called even though
the passed variable still contained a null pointer. Adjust the exit paths
accordingly (a generic sketch of the resulting cleanup order follows the
list below):

* Reorder two function calls at the end.

* Add a jump target.
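
As a generic, self-contained illustration (plain C with malloc/free stand-ins,
not the driver code) of the cleanup order this results in: each failure jumps
past the release of the resource that was never acquired, and the success path
falls through the releases in reverse order of acquisition.

#include <stdlib.h>

struct hash_stub  { int x; };	/* stand-in for the crypto_shash handle */
struct sdesc_stub { int x; };	/* stand-in for the descriptor buffer   */

int do_shash_sketch(void)
{
	struct hash_stub *hash;
	struct sdesc_stub *sdesc;
	int rc = 0;

	hash = malloc(sizeof(*hash));		/* crypto_alloc_shash() in the driver */
	if (!hash)
		return -1;

	sdesc = malloc(sizeof(*sdesc));		/* kmalloc() in the driver */
	if (!sdesc) {
		rc = -1;
		goto free_hash;			/* sdesc was never allocated: skip its free */
	}

	/* ... the hash computation would happen here ... */

	free(sdesc);				/* kfree(sdesc) in the driver */
free_hash:
	free(hash);				/* crypto_free_shash(hash) in the driver */
	return rc;
}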

Signed-off-by: Markus Elfring 
---
 drivers/crypto/bcm/util.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/bcm/util.c b/drivers/crypto/bcm/util.c
index a912c6ad3e85..c141a0242223 100644
--- a/drivers/crypto/bcm/util.c
+++ b/drivers/crypto/bcm/util.c
@@ -279,7 +279,7 @@ int do_shash(unsigned char *name, unsigned char *result,
sdesc = kmalloc(size, GFP_KERNEL);
if (!sdesc) {
rc = -ENOMEM;
-   goto do_shash_err;
+   goto free_shash;
}
sdesc->shash.tfm = hash;
sdesc->shash.flags = 0x0;
@@ -314,9 +314,9 @@ int do_shash(unsigned char *name, unsigned char *result,
pr_err("%s: Could not generate %s hash\n", __func__, name);
 
 do_shash_err:
-   crypto_free_shash(hash);
kfree(sdesc);
-
+free_shash:
+   crypto_free_shash(hash);
return rc;
 }
 
-- 
2.16.1



[PATCH 1/2] crypto: bcm: Delete an error message for a failed memory allocation in do_shash()

2018-02-14 Thread SF Markus Elfring
From: Markus Elfring 
Date: Wed, 14 Feb 2018 22:05:11 +0100

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/crypto/bcm/util.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/crypto/bcm/util.c b/drivers/crypto/bcm/util.c
index d543c010ccd9..a912c6ad3e85 100644
--- a/drivers/crypto/bcm/util.c
+++ b/drivers/crypto/bcm/util.c
@@ -279,7 +279,6 @@ int do_shash(unsigned char *name, unsigned char *result,
sdesc = kmalloc(size, GFP_KERNEL);
if (!sdesc) {
rc = -ENOMEM;
-   pr_err("%s: Memory allocation failure\n", __func__);
goto do_shash_err;
}
sdesc->shash.tfm = hash;
-- 
2.16.1



[PATCH 0/2] crypto/bcm: Adjustments for do_shash()

2018-02-14 Thread SF Markus Elfring
From: Markus Elfring 
Date: Wed, 14 Feb 2018 22:30:07 +0100

Two update suggestions were taken into account
from static source code analysis.

Markus Elfring (2):
  Delete an error message for a failed memory allocation
  One function call less after error detection

 drivers/crypto/bcm/util.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

-- 
2.16.1



[PATCH] crypto: bfin_crc: Delete an error message for a failed memory allocation in bfin_crypto_crc_probe()

2018-02-14 Thread SF Markus Elfring
From: Markus Elfring 
Date: Wed, 14 Feb 2018 21:34:54 +0100

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/crypto/bfin_crc.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/crypto/bfin_crc.c b/drivers/crypto/bfin_crc.c
index a118b9bed669..7a1d6ed6d0b7 100644
--- a/drivers/crypto/bfin_crc.c
+++ b/drivers/crypto/bfin_crc.c
@@ -575,10 +575,8 @@ static int bfin_crypto_crc_probe(struct platform_device 
*pdev)
int ret;
 
crc = devm_kzalloc(dev, sizeof(*crc), GFP_KERNEL);
-   if (!crc) {
-   dev_err(&pdev->dev, "fail to malloc bfin_crypto_crc\n");
+   if (!crc)
return -ENOMEM;
-   }
 
crc->dev = dev;
 
-- 
2.16.1



[PATCH v3 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS

2018-02-14 Thread Eric Biggers
Add an ARM NEON-accelerated implementation of Speck-XTS.  It operates on
128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
Speck64.  Each 128-byte chunk goes through XTS preprocessing, then is
encrypted/decrypted (doing one cipher round for all the blocks, then the
next round, etc.), then goes through XTS postprocessing.

The performance depends on the processor but can be about 3 times faster
than the generic code.  For example, on an ARMv7 processor we observe
the following performance with Speck128/256-XTS:

xts-speck128-neon: Encryption 107.9 MB/s, Decryption 108.1 MB/s
xts(speck128-generic): Encryption  32.1 MB/s, Decryption  36.6 MB/s

In comparison to AES-256-XTS without the Cryptography Extensions:

xts-aes-neonbs:Encryption  41.2 MB/s, Decryption  36.7 MB/s
xts(aes-asm):  Encryption  31.7 MB/s, Decryption  30.8 MB/s
xts(aes-generic):  Encryption  21.2 MB/s, Decryption  20.9 MB/s

Speck64/128-XTS is even faster:

xts-speck64-neon:  Encryption 138.6 MB/s, Decryption 139.1 MB/s

Note that as with the generic code, only the Speck128 and Speck64
variants are supported.  Also, for now only the XTS mode of operation is
supported, to target the disk and file encryption use cases.  The NEON
code also only handles the portion of the data that is evenly divisible
into 128-byte chunks, with any remainder handled by a C fallback.  Of
course, other modes of operation could be added later if needed, and/or
the NEON code could be updated to handle other buffer sizes.

The XTS specification is only defined for AES which has a 128-bit block
size, so for the GF(2^64) math needed for Speck64-XTS we use the
reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
paper.  Of course, when possible users should use Speck128-XTS, but even
that may be too slow on some processors; Speck64-XTS can be faster.
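
As a minimal illustration of that tweak update (plain C, not the kernel code;
the function name is made up), multiplying the 64-bit XTS tweak by x under the
reducing polynomial x^64 + x^4 + x^3 + x + 1 comes down to a shift and a
conditional XOR with 0x1B:

#include <stdint.h>

uint64_t speck64_xts_mul_x(uint64_t tweak)
{
	uint64_t carry = tweak >> 63;	/* coefficient of x^63 about to fall off */

	tweak <<= 1;
	if (carry)
		tweak ^= 0x1B;		/* x^64 == x^4 + x^3 + x + 1 (mod the polynomial) */
	return tweak;
}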

Signed-off-by: Eric Biggers 
---
 arch/arm/crypto/Kconfig   |   6 +
 arch/arm/crypto/Makefile  |   2 +
 arch/arm/crypto/speck-neon-core.S | 432 ++
 arch/arm/crypto/speck-neon-glue.c | 288 
 4 files changed, 728 insertions(+)
 create mode 100644 arch/arm/crypto/speck-neon-core.S
 create mode 100644 arch/arm/crypto/speck-neon-glue.c

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index b8e69fe282b8..925d1364727a 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -121,4 +121,10 @@ config CRYPTO_CHACHA20_NEON
select CRYPTO_BLKCIPHER
select CRYPTO_CHACHA20
 
+config CRYPTO_SPECK_NEON
+   tristate "NEON accelerated Speck cipher algorithms"
+   depends on KERNEL_MODE_NEON
+   select CRYPTO_BLKCIPHER
+   select CRYPTO_SPECK
+
 endif
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 30ef8e291271..a758107c5525 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
 obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
 obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
 obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
+obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
 
 ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
 ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
@@ -53,6 +54,7 @@ ghash-arm-ce-y:= ghash-ce-core.o ghash-ce-glue.o
 crct10dif-arm-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
 crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
 chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
+speck-neon-y := speck-neon-core.o speck-neon-glue.o
 
 quiet_cmd_perl = PERL$@
   cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/speck-neon-core.S 
b/arch/arm/crypto/speck-neon-core.S
new file mode 100644
index ..3c1e203e53b9
--- /dev/null
+++ b/arch/arm/crypto/speck-neon-core.S
@@ -0,0 +1,432 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
+ *
+ * Copyright (c) 2018 Google, Inc
+ *
+ * Author: Eric Biggers 
+ */
+
+#include 
+
+   .text
+   .fpuneon
+
+   // arguments
+   ROUND_KEYS  .reqr0  // const {u64,u32} *round_keys
+   NROUNDS .reqr1  // int nrounds
+   DST .reqr2  // void *dst
+   SRC .reqr3  // const void *src
+   NBYTES  .reqr4  // unsigned int nbytes
+   TWEAK   .reqr5  // void *tweak
+
+   // registers which hold the data being encrypted/decrypted
+   X0  .reqq0
+   X0_L.reqd0
+   X0_H.reqd1
+   Y0  .reqq1
+   Y0_H.reqd3
+   X1  .reqq2
+   X1_L.reqd4
+   X1_H.reqd5
+   Y1  .reqq3
+   Y1_H.reqd7
+   X2  .reqq4
+   X2_L 

[PATCH v3 4/5] crypto: speck - add test vectors for Speck128-XTS

2018-02-14 Thread Eric Biggers
Add test vectors for Speck128-XTS, generated in userspace using C code.
The inputs were borrowed from the AES-XTS test vectors.

Both xts(speck128-generic) and xts-speck128-neon pass these tests.

Signed-off-by: Eric Biggers 
---
 crypto/testmgr.c |   9 +
 crypto/testmgr.h | 687 +++
 2 files changed, 696 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 058ed5eb6620..e011a347d51b 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -3575,6 +3575,15 @@ static const struct alg_test_desc alg_test_descs[] = {
.dec = __VECS(serpent_xts_dec_tv_template)
}
}
+   }, {
+   .alg = "xts(speck128)",
+   .test = alg_test_skcipher,
+   .suite = {
+   .cipher = {
+   .enc = __VECS(speck128_xts_enc_tv_template),
+   .dec = __VECS(speck128_xts_dec_tv_template)
+   }
+   }
}, {
.alg = "xts(twofish)",
.test = alg_test_skcipher,
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 3818210f77cf..0212e0ebcd0c 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -14411,6 +14411,693 @@ static const struct cipher_testvec 
speck128_dec_tv_template[] = {
},
 };
 
+/*
+ * Speck128-XTS test vectors, taken from the AES-XTS test vectors with the
+ * result recomputed with Speck128 as the cipher
+ */
+
+static const struct cipher_testvec speck128_xts_enc_tv_template[] = {
+   {
+   .key= "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .klen   = 32,
+   .iv = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .input  = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .ilen   = 32,
+   .result = "\xbe\xa0\xe7\x03\xd7\xfe\xab\x62"
+ "\x3b\x99\x4a\x64\x74\x77\xac\xed"
+ "\xd8\xf4\xa6\xcf\xae\xb9\x07\x42"
+ "\x51\xd9\xb6\x1d\xe0\x5e\xbc\x54",
+   .rlen   = 32,
+   }, {
+   .key= "\x11\x11\x11\x11\x11\x11\x11\x11"
+ "\x11\x11\x11\x11\x11\x11\x11\x11"
+ "\x22\x22\x22\x22\x22\x22\x22\x22"
+ "\x22\x22\x22\x22\x22\x22\x22\x22",
+   .klen   = 32,
+   .iv = "\x33\x33\x33\x33\x33\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .input  = "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44",
+   .ilen   = 32,
+   .result = "\xfb\x53\x81\x75\x6f\x9f\x34\xad"
+ "\x7e\x01\xed\x7b\xcc\xda\x4e\x4a"
+ "\xd4\x84\xa4\x53\xd5\x88\x73\x1b"
+ "\xfd\xcb\xae\x0d\xf3\x04\xee\xe6",
+   .rlen   = 32,
+   }, {
+   .key= "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8"
+ "\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0"
+ "\x22\x22\x22\x22\x22\x22\x22\x22"
+ "\x22\x22\x22\x22\x22\x22\x22\x22",
+   .klen   = 32,
+   .iv = "\x33\x33\x33\x33\x33\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .input  = "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44",
+   .ilen   = 32,
+   .result = "\x21\x52\x84\x15\xd1\xf7\x21\x55"
+ "\xd9\x75\x4a\xd3\xc5\xdb\x9f\x7d"
+ "\xda\x63\xb2\xf1\x82\xb0\x89\x59"
+ "\x86\xd4\xaa\xaa\xdd\xff\x4f\x92",
+   .rlen   = 32,
+   }, {
+   .key= "\x27\x18\x28\x18\x28\x45\x90\x45"
+ "\x23\x53\x60\x28\x74\x71\x35\x26"
+ "\x31\x41\x59\x26\x53\x58\x97\x93"
+ "\x23\x84\x62\x64\x33\x83\x27\x95",
+   .klen   = 32,
+   .iv = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .input  = "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0

[PATCH v3 5/5] crypto: speck - add test vectors for Speck64-XTS

2018-02-14 Thread Eric Biggers
Add test vectors for Speck64-XTS, generated in userspace using C code.
The inputs were borrowed from the AES-XTS test vectors, with key lengths
adjusted.

xts-speck64-neon passes these tests.  However, they aren't currently
applicable for the generic XTS template, as that only supports a 128-bit
block size.

Signed-off-by: Eric Biggers 
---
 crypto/testmgr.c |   9 +
 crypto/testmgr.h | 671 +++
 2 files changed, 680 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index e011a347d51b..9f82e7bc9c56 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -3584,6 +3584,15 @@ static const struct alg_test_desc alg_test_descs[] = {
.dec = __VECS(speck128_xts_dec_tv_template)
}
}
+   }, {
+   .alg = "xts(speck64)",
+   .test = alg_test_skcipher,
+   .suite = {
+   .cipher = {
+   .enc = __VECS(speck64_xts_enc_tv_template),
+   .dec = __VECS(speck64_xts_dec_tv_template)
+   }
+   }
}, {
.alg = "xts(twofish)",
.test = alg_test_skcipher,
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 0212e0ebcd0c..da72fd394f35 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -15138,6 +15138,677 @@ static const struct cipher_testvec 
speck64_dec_tv_template[] = {
},
 };
 
+/*
+ * Speck64-XTS test vectors, taken from the AES-XTS test vectors with the 
result
+ * recomputed with Speck64 as the cipher, and key lengths adjusted
+ */
+
+static const struct cipher_testvec speck64_xts_enc_tv_template[] = {
+   {
+   .key= "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .klen   = 24,
+   .iv = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .input  = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .ilen   = 32,
+   .result = "\x84\xaf\x54\x07\x19\xd4\x7c\xa6"
+ "\xe4\xfe\xdf\xc4\x1f\x34\xc3\xc2"
+ "\x80\xf5\x72\xe7\xcd\xf0\x99\x22"
+ "\x35\xa7\x2f\x06\xef\xdc\x51\xaa",
+   .rlen   = 32,
+   }, {
+   .key= "\x11\x11\x11\x11\x11\x11\x11\x11"
+ "\x11\x11\x11\x11\x11\x11\x11\x11"
+ "\x22\x22\x22\x22\x22\x22\x22\x22",
+   .klen   = 24,
+   .iv = "\x33\x33\x33\x33\x33\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .input  = "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44",
+   .ilen   = 32,
+   .result = "\x12\x56\x73\xcd\x15\x87\xa8\x59"
+ "\xcf\x84\xae\xd9\x1c\x66\xd6\x9f"
+ "\xb3\x12\x69\x7e\x36\xeb\x52\xff"
+ "\x62\xdd\xba\x90\xb3\xe1\xee\x99",
+   .rlen   = 32,
+   }, {
+   .key= "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8"
+ "\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0"
+ "\x22\x22\x22\x22\x22\x22\x22\x22",
+   .klen   = 24,
+   .iv = "\x33\x33\x33\x33\x33\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .input  = "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44",
+   .ilen   = 32,
+   .result = "\x15\x1b\xe4\x2c\xa2\x5a\x2d\x2c"
+ "\x27\x36\xc0\xbf\x5d\xea\x36\x37"
+ "\x2d\x1a\x88\xbc\x66\xb5\xd0\x0b"
+ "\xa1\xbc\x19\xb2\x0f\x3b\x75\x34",
+   .rlen   = 32,
+   }, {
+   .key= "\x27\x18\x28\x18\x28\x45\x90\x45"
+ "\x23\x53\x60\x28\x74\x71\x35\x26"
+ "\x31\x41\x59\x26\x53\x58\x97\x93",
+   .klen   = 24,
+   .iv = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .input  = "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17"
+ "\x18\x1

[PATCH v3 0/5] crypto: Speck support

2018-02-14 Thread Eric Biggers
Hello,

This series adds Speck support to the crypto API, including the Speck128
and Speck64 variants.  Speck is a lightweight block cipher that can be
much faster than AES on processors that don't have AES instructions.

We are planning to offer Speck-XTS (probably Speck128/256-XTS) as an
option for dm-crypt and fscrypt on Android, for low-end mobile devices
with older CPUs such as ARMv7 which don't have the Cryptography
Extensions.  Currently, such devices are unencrypted because AES is not
fast enough, even when the NEON bit-sliced implementation of AES is
used.  Other AES alternatives such as Twofish, Threefish, Camellia,
CAST6, and Serpent aren't fast enough either; it seems that only a
modern ARX cipher can provide sufficient performance on these devices.

This is a replacement for our original proposal
(https://patchwork.kernel.org/patch/10101451/) which was to offer
ChaCha20 for these devices.  However, the use of a stream cipher for
disk/file encryption with no space to store nonces would have been much
more insecure than we thought initially, given that it would be used on
top of flash storage as well as potentially on top of F2FS, neither of
which is guaranteed to overwrite data in-place.

Speck has been somewhat controversial due to its origin.  Nevertheless,
it has a straightforward design (it's an ARX cipher), and it appears to
be the leading software-optimized lightweight block cipher currently,
with the most cryptanalysis.  It's also easy to implement without side
channels, unlike AES.  Moreover, we only intend Speck to be used when
the status quo is no encryption, due to AES not being fast enough.

We've also considered a novel length-preserving encryption mode based on
ChaCha20 and Poly1305.  While theoretically attractive, such a mode
would be a brand new crypto construction and would be more complicated
and difficult to implement efficiently in comparison to Speck-XTS.

Thus, patch 1 adds a generic implementation of Speck, and the following
patches add a 32-bit ARM NEON implementation of Speck-XTS.  The
NEON-accelerated implementation is much faster than the generic
implementation and therefore is the implementation that would primarily
be used in practice on the devices we are targeting.

There is no AArch64 implementation included, since most such CPUs have
the Cryptography Extensions, allowing the use of AES.  An AArch64
implementation can be added later if there is interest though.

Changed since v2:

  - Fix __speck64_xts_crypt() to work on big endian CPUs.

Changed since v1:

  - Use the word order recommended by the Speck authors.  All test
vectors were updated.

Eric Biggers (5):
  crypto: add support for the Speck block cipher
  crypto: speck - export common helpers
  crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
  crypto: speck - add test vectors for Speck128-XTS
  crypto: speck - add test vectors for Speck64-XTS

 arch/arm/crypto/Kconfig   |6 +
 arch/arm/crypto/Makefile  |2 +
 arch/arm/crypto/speck-neon-core.S |  432 +
 arch/arm/crypto/speck-neon-glue.c |  288 ++
 crypto/Kconfig|   14 +
 crypto/Makefile   |1 +
 crypto/speck.c|  307 ++
 crypto/testmgr.c  |   36 +
 crypto/testmgr.h  | 1486 +
 include/crypto/speck.h|   62 ++
 10 files changed, 2634 insertions(+)
 create mode 100644 arch/arm/crypto/speck-neon-core.S
 create mode 100644 arch/arm/crypto/speck-neon-glue.c
 create mode 100644 crypto/speck.c
 create mode 100644 include/crypto/speck.h

-- 
2.16.1.291.g4437f3f132-goog



[PATCH v3 1/5] crypto: add support for the Speck block cipher

2018-02-14 Thread Eric Biggers
Add a generic implementation of Speck, including the Speck128 and
Speck64 variants.  Speck is a lightweight block cipher that can be much
faster than AES on processors that don't have AES instructions.

We are planning to offer Speck-XTS (probably Speck128/256-XTS) as an
option for dm-crypt and fscrypt on Android, for low-end mobile devices
with older CPUs such as ARMv7 which don't have the Cryptography
Extensions.  Currently, such devices are unencrypted because AES is not
fast enough, even when the NEON bit-sliced implementation of AES is
used.  Other AES alternatives such as Twofish, Threefish, Camellia,
CAST6, and Serpent aren't fast enough either; it seems that only a
modern ARX cipher can provide sufficient performance on these devices.

This is a replacement for our original proposal
(https://patchwork.kernel.org/patch/10101451/) which was to offer
ChaCha20 for these devices.  However, the use of a stream cipher for
disk/file encryption with no space to store nonces would have been much
more insecure than we thought initially, given that it would be used on
top of flash storage as well as potentially on top of F2FS, neither of
which is guaranteed to overwrite data in-place.

Speck has been somewhat controversial due to its origin.  Nevertheless,
it has a straightforward design (it's an ARX cipher), and it appears to
be the leading software-optimized lightweight block cipher currently,
with the most cryptanalysis.  It's also easy to implement without side
channels, unlike AES.  Moreover, we only intend Speck to be used when
the status quo is no encryption, due to AES not being fast enough.

We've also considered a novel length-preserving encryption mode based on
ChaCha20 and Poly1305.  While theoretically attractive, such a mode
would be a brand new crypto construction and would be more complicated
and difficult to implement efficiently in comparison to Speck-XTS.

There is confusion about the byte and word orders of Speck, since the
original paper doesn't specify them.  But we have implemented it using
the orders the authors recommended in a correspondence with them.  The
test vectors are taken from the original paper but were mapped to byte
arrays using the recommended byte and word orders.
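
For reference, one Speck128 round mixes the two 64-bit words (x, y) with the
round key k using rotation amounts 8 and 3, as defined in the Speck paper.  A
standalone C sketch of that sequence (helper names made up) follows; it is the
same per-round operation the generic implementation performs.

#include <stdint.h>

static inline uint64_t ror64_sk(uint64_t v, unsigned int s)
{
	return (v >> s) | (v << (64 - s));
}

static inline uint64_t rol64_sk(uint64_t v, unsigned int s)
{
	return (v << s) | (v >> (64 - s));
}

void speck128_round_sketch(uint64_t *x, uint64_t *y, uint64_t k)
{
	*x = ror64_sk(*x, 8);	/* x = (ROR(x, 8) + y) ^ k */
	*x += *y;
	*x ^= k;
	*y = rol64_sk(*y, 3);	/* y = ROL(y, 3) ^ x */
	*y ^= *x;
}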

Signed-off-by: Eric Biggers 
---
 crypto/Kconfig   |  14 +++
 crypto/Makefile  |   1 +
 crypto/speck.c   | 299 +++
 crypto/testmgr.c |  18 +++
 crypto/testmgr.h | 128 
 5 files changed, 460 insertions(+)
 create mode 100644 crypto/speck.c

diff --git a/crypto/Kconfig b/crypto/Kconfig
index b75264b09a46..558eff07b799 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1508,6 +1508,20 @@ config CRYPTO_SERPENT_AVX2_X86_64
  See also:
  
 
+config CRYPTO_SPECK
+   tristate "Speck cipher algorithm"
+   select CRYPTO_ALGAPI
+   help
+ Speck is a lightweight block cipher that is tuned for optimal
+ performance in software (rather than hardware).
+
+ Speck may not be as secure as AES, and should only be used on systems
+ where AES is not fast enough.
+
+ See also: 
+
+ If unsure, say N.
+
 config CRYPTO_TEA
tristate "TEA, XTEA and XETA cipher algorithms"
select CRYPTO_ALGAPI
diff --git a/crypto/Makefile b/crypto/Makefile
index cdbc03b35510..ba6019471447 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -110,6 +110,7 @@ obj-$(CONFIG_CRYPTO_TEA) += tea.o
 obj-$(CONFIG_CRYPTO_KHAZAD) += khazad.o
 obj-$(CONFIG_CRYPTO_ANUBIS) += anubis.o
 obj-$(CONFIG_CRYPTO_SEED) += seed.o
+obj-$(CONFIG_CRYPTO_SPECK) += speck.o
 obj-$(CONFIG_CRYPTO_SALSA20) += salsa20_generic.o
 obj-$(CONFIG_CRYPTO_CHACHA20) += chacha20_generic.o
 obj-$(CONFIG_CRYPTO_POLY1305) += poly1305_generic.o
diff --git a/crypto/speck.c b/crypto/speck.c
new file mode 100644
index ..4e80ad76bcd7
--- /dev/null
+++ b/crypto/speck.c
@@ -0,0 +1,299 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Speck: a lightweight block cipher
+ *
+ * Copyright (c) 2018 Google, Inc
+ *
+ * Speck has 10 variants, including 5 block sizes.  For now we only implement
+ * the variants Speck128/128, Speck128/192, Speck128/256, Speck64/96, and
+ * Speck64/128.   Speck${B}/${K} denotes the variant with a block size of B 
bits
+ * and a key size of K bits.  The Speck128 variants are believed to be the most
+ * secure variants, and they use the same block size and key sizes as AES.  The
+ * Speck64 variants are less secure, but on 32-bit processors are usually
+ * faster.  The remaining variants (Speck32, Speck48, and Speck96) are even 
less
+ * secure and/or not as well suited for implementation on either 32-bit or
+ * 64-bit processors, so are omitted.
+ *
+ * Reference: "The Simon and Speck Families of Lightweight Block Ciphers"
+ * https://eprint.iacr.org/2013/404.pdf
+ *
+ * In a correspondence, the Speck designers have also clarified that 

[PATCH v3 2/5] crypto: speck - export common helpers

2018-02-14 Thread Eric Biggers
Export the Speck constants and transform context and the ->setkey(),
->encrypt(), and ->decrypt() functions so that they can be reused by the
ARM NEON implementation of Speck-XTS.  The generic key expansion code
will be reused because it is not performance-critical and is not
vectorizable, while the generic encryption and decryption functions are
needed as fallbacks and for the XTS tweak encryption.

Signed-off-by: Eric Biggers 
---
 crypto/speck.c | 90 +++---
 include/crypto/speck.h | 62 +
 2 files changed, 111 insertions(+), 41 deletions(-)
 create mode 100644 include/crypto/speck.h

diff --git a/crypto/speck.c b/crypto/speck.c
index 4e80ad76bcd7..58aa9f7f91f7 100644
--- a/crypto/speck.c
+++ b/crypto/speck.c
@@ -24,6 +24,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -31,22 +32,6 @@
 
 /* Speck128 */
 
-#define SPECK128_BLOCK_SIZE16
-
-#define SPECK128_128_KEY_SIZE  16
-#define SPECK128_128_NROUNDS   32
-
-#define SPECK128_192_KEY_SIZE  24
-#define SPECK128_192_NROUNDS   33
-
-#define SPECK128_256_KEY_SIZE  32
-#define SPECK128_256_NROUNDS   34
-
-struct speck128_tfm_ctx {
-   u64 round_keys[SPECK128_256_NROUNDS];
-   int nrounds;
-};
-
 static __always_inline void speck128_round(u64 *x, u64 *y, u64 k)
 {
*x = ror64(*x, 8);
@@ -65,9 +50,9 @@ static __always_inline void speck128_unround(u64 *x, u64 *y, 
u64 k)
*x = rol64(*x, 8);
 }
 
-static void speck128_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+void crypto_speck128_encrypt(const struct speck128_tfm_ctx *ctx,
+u8 *out, const u8 *in)
 {
-   const struct speck128_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
u64 y = get_unaligned_le64(in);
u64 x = get_unaligned_le64(in + 8);
int i;
@@ -78,10 +63,16 @@ static void speck128_encrypt(struct crypto_tfm *tfm, u8 
*out, const u8 *in)
put_unaligned_le64(y, out);
put_unaligned_le64(x, out + 8);
 }
+EXPORT_SYMBOL_GPL(crypto_speck128_encrypt);
 
-static void speck128_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+static void speck128_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+   crypto_speck128_encrypt(crypto_tfm_ctx(tfm), out, in);
+}
+
+void crypto_speck128_decrypt(const struct speck128_tfm_ctx *ctx,
+u8 *out, const u8 *in)
 {
-   const struct speck128_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
u64 y = get_unaligned_le64(in);
u64 x = get_unaligned_le64(in + 8);
int i;
@@ -92,11 +83,16 @@ static void speck128_decrypt(struct crypto_tfm *tfm, u8 
*out, const u8 *in)
put_unaligned_le64(y, out);
put_unaligned_le64(x, out + 8);
 }
+EXPORT_SYMBOL_GPL(crypto_speck128_decrypt);
 
-static int speck128_setkey(struct crypto_tfm *tfm, const u8 *key,
+static void speck128_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+   crypto_speck128_decrypt(crypto_tfm_ctx(tfm), out, in);
+}
+
+int crypto_speck128_setkey(struct speck128_tfm_ctx *ctx, const u8 *key,
   unsigned int keylen)
 {
-   struct speck128_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
u64 l[3];
u64 k;
int i;
@@ -138,21 +134,15 @@ static int speck128_setkey(struct crypto_tfm *tfm, const 
u8 *key,
 
return 0;
 }
+EXPORT_SYMBOL_GPL(crypto_speck128_setkey);
 
-/* Speck64 */
-
-#define SPECK64_BLOCK_SIZE 8
-
-#define SPECK64_96_KEY_SIZE12
-#define SPECK64_96_NROUNDS 26
-
-#define SPECK64_128_KEY_SIZE   16
-#define SPECK64_128_NROUNDS27
+static int speck128_setkey(struct crypto_tfm *tfm, const u8 *key,
+  unsigned int keylen)
+{
+   return crypto_speck128_setkey(crypto_tfm_ctx(tfm), key, keylen);
+}
 
-struct speck64_tfm_ctx {
-   u32 round_keys[SPECK64_128_NROUNDS];
-   int nrounds;
-};
+/* Speck64 */
 
 static __always_inline void speck64_round(u32 *x, u32 *y, u32 k)
 {
@@ -172,9 +162,9 @@ static __always_inline void speck64_unround(u32 *x, u32 *y, 
u32 k)
*x = rol32(*x, 8);
 }
 
-static void speck64_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+void crypto_speck64_encrypt(const struct speck64_tfm_ctx *ctx,
+   u8 *out, const u8 *in)
 {
-   const struct speck64_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
u32 y = get_unaligned_le32(in);
u32 x = get_unaligned_le32(in + 4);
int i;
@@ -185,10 +175,16 @@ static void speck64_encrypt(struct crypto_tfm *tfm, u8 
*out, const u8 *in)
put_unaligned_le32(y, out);
put_unaligned_le32(x, out + 4);
 }
+EXPORT_SYMBOL_GPL(crypto_speck64_encrypt);
 
-static void speck64_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+static void speck64_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+   crypto_speck64_encrypt(crypto_tfm_ctx(tfm), out, in);
+}
+
+void crypto_speck64_decrypt(const struct speck64_tfm_ctx *ctx,
+   u8 *ou

[PATCH 2/2] crypto: caam: Use common error handling code in four functions

2018-02-14 Thread SF Markus Elfring
From: Markus Elfring 
Date: Wed, 14 Feb 2018 19:14:49 +0100

Add jump targets so that the common exception handling code can be reused
at the end of these functions instead of being duplicated.

Signed-off-by: Markus Elfring 
---
 drivers/crypto/caam/caamalg.c  | 32 
 drivers/crypto/caam/caamhash.c | 23 ++-
 2 files changed, 26 insertions(+), 29 deletions(-)

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index d1f25a90552a..3d26c44040c7 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -1560,11 +1560,8 @@ static struct ablkcipher_edesc 
*ablkcipher_edesc_alloc(struct ablkcipher_request
/* allocate space for base edesc and hw desc commands, link tables */
edesc = kzalloc(sizeof(*edesc) + desc_bytes + sec4_sg_bytes,
GFP_DMA | flags);
-   if (!edesc) {
-   caam_unmap(jrdev, req->src, req->dst, src_nents, dst_nents,
-  iv_dma, ivsize, 0, 0);
-   return ERR_PTR(-ENOMEM);
-   }
+   if (!edesc)
+   goto unmap_caam;
 
edesc->src_nents = src_nents;
edesc->dst_nents = dst_nents;
@@ -1587,10 +1584,8 @@ static struct ablkcipher_edesc 
*ablkcipher_edesc_alloc(struct ablkcipher_request
sec4_sg_bytes, DMA_TO_DEVICE);
if (dma_mapping_error(jrdev, edesc->sec4_sg_dma)) {
dev_err(jrdev, "unable to map S/G table\n");
-   caam_unmap(jrdev, req->src, req->dst, src_nents, dst_nents,
-  iv_dma, ivsize, 0, 0);
kfree(edesc);
-   return ERR_PTR(-ENOMEM);
+   goto unmap_caam;
}
 
edesc->iv_dma = iv_dma;
@@ -1603,6 +1598,11 @@ static struct ablkcipher_edesc 
*ablkcipher_edesc_alloc(struct ablkcipher_request
 
*iv_contig_out = in_contig;
return edesc;
+
+unmap_caam:
+   caam_unmap(jrdev, req->src, req->dst, src_nents, dst_nents,
+  iv_dma, ivsize, 0, 0);
+   return ERR_PTR(-ENOMEM);
 }
 
 static int ablkcipher_encrypt(struct ablkcipher_request *req)
@@ -1768,11 +1768,8 @@ static struct ablkcipher_edesc 
*ablkcipher_giv_edesc_alloc(
sec4_sg_bytes = sec4_sg_ents * sizeof(struct sec4_sg_entry);
edesc = kzalloc(sizeof(*edesc) + desc_bytes + sec4_sg_bytes,
GFP_DMA | flags);
-   if (!edesc) {
-   caam_unmap(jrdev, req->src, req->dst, src_nents, dst_nents,
-  iv_dma, ivsize, 0, 0);
-   return ERR_PTR(-ENOMEM);
-   }
+   if (!edesc)
+   goto unmap_caam;
 
edesc->src_nents = src_nents;
edesc->dst_nents = dst_nents;
@@ -1795,10 +1792,8 @@ static struct ablkcipher_edesc 
*ablkcipher_giv_edesc_alloc(
sec4_sg_bytes, DMA_TO_DEVICE);
if (dma_mapping_error(jrdev, edesc->sec4_sg_dma)) {
dev_err(jrdev, "unable to map S/G table\n");
-   caam_unmap(jrdev, req->src, req->dst, src_nents, dst_nents,
-  iv_dma, ivsize, 0, 0);
kfree(edesc);
-   return ERR_PTR(-ENOMEM);
+   goto unmap_caam;
}
edesc->iv_dma = iv_dma;
 
@@ -1811,6 +1806,11 @@ static struct ablkcipher_edesc 
*ablkcipher_giv_edesc_alloc(
 
*iv_contig_out = out_contig;
return edesc;
+
+unmap_caam:
+   caam_unmap(jrdev, req->src, req->dst, src_nents, dst_nents,
+  iv_dma, ivsize, 0, 0);
+   return ERR_PTR(-ENOMEM);
 }
 
 static int ablkcipher_givencrypt(struct skcipher_givcrypt_request *creq)
diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index dc269eba08ad..b5e43a1f38f0 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -371,16 +371,16 @@ static int hash_digest_key(struct caam_hash_ctx *ctx, 
const u8 *key_in,
 DMA_TO_DEVICE);
if (dma_mapping_error(jrdev, src_dma)) {
dev_err(jrdev, "unable to map key input memory\n");
-   kfree(desc);
-   return -ENOMEM;
+   ret = -ENOMEM;
+   goto free_desc;
}
dst_dma = dma_map_single(jrdev, (void *)key_out, digestsize,
 DMA_FROM_DEVICE);
if (dma_mapping_error(jrdev, dst_dma)) {
dev_err(jrdev, "unable to map key output memory\n");
dma_unmap_single(jrdev, src_dma, *keylen, DMA_TO_DEVICE);
-   kfree(desc);
-   return -ENOMEM;
+   ret = -ENOMEM;
+   goto free_desc;
}
 
/* Job descriptor to perform unkeyed hash on key_in */
@@ -419,7 +419,7 @@ static int hash_digest_key(struct caam_hash_ctx *ctx, const 
u8 *key_in,
dma_unmap_single(jrdev, dst_dma, digestsize, DMA_FROM_DEVICE);
 
*keylen = digestsize;
-
+free_desc:

[PATCH 1/2] crypto: caam: Delete an error message for a failed memory allocation in seven functions

2018-02-14 Thread SF Markus Elfring
From: Markus Elfring 
Date: Wed, 14 Feb 2018 18:22:38 +0100

Omit an extra message for a memory allocation failure in these functions.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/crypto/caam/caamalg.c  |  6 +-
 drivers/crypto/caam/caamhash.c | 12 +++-
 drivers/crypto/caam/key_gen.c  |  4 +---
 3 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 2188235be02d..d1f25a90552a 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -1561,7 +1561,6 @@ static struct ablkcipher_edesc 
*ablkcipher_edesc_alloc(struct ablkcipher_request
edesc = kzalloc(sizeof(*edesc) + desc_bytes + sec4_sg_bytes,
GFP_DMA | flags);
if (!edesc) {
-   dev_err(jrdev, "could not allocate extended descriptor\n");
caam_unmap(jrdev, req->src, req->dst, src_nents, dst_nents,
   iv_dma, ivsize, 0, 0);
return ERR_PTR(-ENOMEM);
@@ -1770,7 +1769,6 @@ static struct ablkcipher_edesc 
*ablkcipher_giv_edesc_alloc(
edesc = kzalloc(sizeof(*edesc) + desc_bytes + sec4_sg_bytes,
GFP_DMA | flags);
if (!edesc) {
-   dev_err(jrdev, "could not allocate extended descriptor\n");
caam_unmap(jrdev, req->src, req->dst, src_nents, dst_nents,
   iv_dma, ivsize, 0, 0);
return ERR_PTR(-ENOMEM);
@@ -3372,10 +3370,8 @@ static struct caam_crypto_alg *caam_alg_alloc(struct 
caam_alg_template
struct crypto_alg *alg;
 
t_alg = kzalloc(sizeof(*t_alg), GFP_KERNEL);
-   if (!t_alg) {
-   pr_err("failed to allocate t_alg\n");
+   if (!t_alg)
return ERR_PTR(-ENOMEM);
-   }
 
alg = &t_alg->crypto_alg;
 
diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 0beb28196e20..dc269eba08ad 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -362,10 +362,8 @@ static int hash_digest_key(struct caam_hash_ctx *ctx, 
const u8 *key_in,
int ret;
 
desc = kmalloc(CAAM_CMD_SZ * 8 + CAAM_PTR_SZ * 2, GFP_KERNEL | GFP_DMA);
-   if (!desc) {
-   dev_err(jrdev, "unable to allocate key input memory\n");
+   if (!desc)
return -ENOMEM;
-   }
 
init_job_desc(desc, 0);
 
@@ -689,10 +687,8 @@ static struct ahash_edesc *ahash_edesc_alloc(struct 
caam_hash_ctx *ctx,
unsigned int sg_size = sg_num * sizeof(struct sec4_sg_entry);
 
edesc = kzalloc(sizeof(*edesc) + sg_size, GFP_DMA | flags);
-   if (!edesc) {
-   dev_err(ctx->jrdev, "could not allocate extended descriptor\n");
+   if (!edesc)
return NULL;
-   }
 
init_job_desc_shared(edesc->hw_desc, sh_desc_dma, desc_len(sh_desc),
 HDR_SHARE_DEFER | HDR_REVERSE);
@@ -1818,10 +1814,8 @@ caam_hash_alloc(struct caam_hash_template *template,
struct crypto_alg *alg;
 
t_alg = kzalloc(sizeof(*t_alg), GFP_KERNEL);
-   if (!t_alg) {
-   pr_err("failed to allocate t_alg\n");
+   if (!t_alg)
return ERR_PTR(-ENOMEM);
-   }
 
t_alg->ahash_alg = template->template_ahash;
halg = &t_alg->ahash_alg;
diff --git a/drivers/crypto/caam/key_gen.c b/drivers/crypto/caam/key_gen.c
index 312b5f042f31..dd077ac8c41e 100644
--- a/drivers/crypto/caam/key_gen.c
+++ b/drivers/crypto/caam/key_gen.c
@@ -66,10 +66,8 @@ int gen_split_key(struct device *jrdev, u8 *key_out,
return -EINVAL;
 
desc = kmalloc(CAAM_CMD_SZ * 6 + CAAM_PTR_SZ * 2, GFP_KERNEL | GFP_DMA);
-   if (!desc) {
-   dev_err(jrdev, "unable to allocate key input memory\n");
+   if (!desc)
return ret;
-   }
 
dma_addr_in = dma_map_single(jrdev, (void *)key_in, keylen,
 DMA_TO_DEVICE);
-- 
2.16.1



[PATCH 0/2] crypto/caam: Adjustments for eight function implementations

2018-02-14 Thread SF Markus Elfring
From: Markus Elfring 
Date: Wed, 14 Feb 2018 19:23:45 +0100

Two update suggestions were taken into account
from static source code analysis.

Markus Elfring (2):
  Delete an error message for a failed memory allocation in seven functions
  Use common error handling code in four functions

 drivers/crypto/caam/caamalg.c  | 38 +-
 drivers/crypto/caam/caamhash.c | 35 +--
 drivers/crypto/caam/key_gen.c  |  4 +---
 3 files changed, 31 insertions(+), 46 deletions(-)

-- 
2.16.1



[PATCH v2 03/14] x86/crypto: aesni: Add GCM_INIT macro

2018-02-14 Thread Dave Watson
Reduce code duplication by introducing the GCM_INIT macro.  This macro
will also be exposed as a function for implementing scatter/gather
support, since INIT only needs to be called once for the full
operation.

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S | 84 +++
 1 file changed, 33 insertions(+), 51 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index 39b42b1..b9fe2ab 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -191,6 +191,37 @@ ALL_F:  .octa 0x
pop %r12
 .endm
 
+
+# GCM_INIT initializes a gcm_context struct to prepare for encoding/decoding.
+# Clobbers rax, r10-r13 and xmm0-xmm6, %xmm13
+.macro GCM_INIT
+   mov %arg6, %r12
+   movdqu  (%r12), %xmm13
+   movdqa  SHUF_MASK(%rip), %xmm2
+   PSHUFB_XMM %xmm2, %xmm13
+
+   # precompute HashKey<<1 mod poly from the HashKey (required for GHASH)
+
+   movdqa  %xmm13, %xmm2
+   psllq   $1, %xmm13
+   psrlq   $63, %xmm2
+   movdqa  %xmm2, %xmm1
+   pslldq  $8, %xmm2
+   psrldq  $8, %xmm1
+   por %xmm2, %xmm13
+
+   # reduce HashKey<<1
+
+   pshufd  $0x24, %xmm1, %xmm2
+   pcmpeqd TWOONE(%rip), %xmm2
+   pandPOLY(%rip), %xmm2
+   pxor%xmm2, %xmm13
+   movdqa  %xmm13, HashKey(%rsp)
+   mov %arg4, %r13 # %xmm13 holds HashKey<<1 (mod 
poly)
+   and $-16, %r13
+   mov %r13, %r12
+.endm
+
 #ifdef __x86_64__
 /* GHASH_MUL MACRO to implement: Data*HashKey mod (128,127,126,121,0)
 *
@@ -1151,36 +1182,11 @@ _esb_loop_\@:
 */
 ENTRY(aesni_gcm_dec)
FUNC_SAVE
-   mov %arg6, %r12
-   movdqu  (%r12), %xmm13# %xmm13 = HashKey
-movdqa  SHUF_MASK(%rip), %xmm2
-   PSHUFB_XMM %xmm2, %xmm13
-
-
-# Precompute HashKey<<1 (mod poly) from the hash key (required for GHASH)
-
-   movdqa  %xmm13, %xmm2
-   psllq   $1, %xmm13
-   psrlq   $63, %xmm2
-   movdqa  %xmm2, %xmm1
-   pslldq  $8, %xmm2
-   psrldq  $8, %xmm1
-   por %xmm2, %xmm13
-
-# Reduction
-
-   pshufd  $0x24, %xmm1, %xmm2
-   pcmpeqd TWOONE(%rip), %xmm2
-   pandPOLY(%rip), %xmm2
-   pxor%xmm2, %xmm13 # %xmm13 holds the HashKey<<1 (mod poly)
 
+   GCM_INIT
 
 # Decrypt first few blocks
 
-   movdqa %xmm13, HashKey(%rsp)   # store HashKey<<1 (mod poly)
-   mov %arg4, %r13# save the number of bytes of plaintext/ciphertext
-   and $-16, %r13  # %r13 = %r13 - (%r13 mod 16)
-   mov %r13, %r12
and $(3<<4), %r12
jz _initial_num_blocks_is_0_decrypt
cmp $(2<<4), %r12
@@ -1402,32 +1408,8 @@ ENDPROC(aesni_gcm_dec)
 ***/
 ENTRY(aesni_gcm_enc)
FUNC_SAVE
-   mov %arg6, %r12
-   movdqu  (%r12), %xmm13
-movdqa  SHUF_MASK(%rip), %xmm2
-   PSHUFB_XMM %xmm2, %xmm13
-
-# precompute HashKey<<1 mod poly from the HashKey (required for GHASH)
-
-   movdqa  %xmm13, %xmm2
-   psllq   $1, %xmm13
-   psrlq   $63, %xmm2
-   movdqa  %xmm2, %xmm1
-   pslldq  $8, %xmm2
-   psrldq  $8, %xmm1
-   por %xmm2, %xmm13
-
-# reduce HashKey<<1
-
-   pshufd  $0x24, %xmm1, %xmm2
-   pcmpeqd TWOONE(%rip), %xmm2
-   pandPOLY(%rip), %xmm2
-   pxor%xmm2, %xmm13
-   movdqa  %xmm13, HashKey(%rsp)
-   mov %arg4, %r13# %xmm13 holds HashKey<<1 (mod poly)
-   and $-16, %r13
-   mov %r13, %r12
 
+   GCM_INIT
 # Encrypt first few blocks
 
and $(3<<4), %r12
-- 
2.9.5



[PATCH v2 05/14] x86/crypto: aesni: Merge encode and decode to GCM_ENC_DEC macro

2018-02-14 Thread Dave Watson
Make a macro for the main encode/decode routine.  Only a small handful
of lines differ for enc and dec.   This will also become the main
scatter/gather update routine.

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S | 293 +++---
 1 file changed, 114 insertions(+), 179 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index 529c542..8021fd1 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -222,6 +222,118 @@ ALL_F:  .octa 0x
mov %r13, %r12
 .endm
 
+# GCM_ENC_DEC Encodes/Decodes given data. Assumes that the passed gcm_context
+# struct has been initialized by GCM_INIT.
+# Requires the input data be at least 1 byte long because of READ_PARTIAL_BLOCK
+# Clobbers rax, r10-r13, and xmm0-xmm15
+.macro GCM_ENC_DEC operation
+   # Encrypt/Decrypt first few blocks
+
+   and $(3<<4), %r12
+   jz  _initial_num_blocks_is_0_\@
+   cmp $(2<<4), %r12
+   jb  _initial_num_blocks_is_1_\@
+   je  _initial_num_blocks_is_2_\@
+_initial_num_blocks_is_3_\@:
+   INITIAL_BLOCKS_ENC_DEC  %xmm9, %xmm10, %xmm13, %xmm11, %xmm12, %xmm0, \
+%xmm1, %xmm2, %xmm3, %xmm4, %xmm8, %xmm5, %xmm6, 5, 678, \operation
+   sub $48, %r13
+   jmp _initial_blocks_\@
+_initial_num_blocks_is_2_\@:
+   INITIAL_BLOCKS_ENC_DEC  %xmm9, %xmm10, %xmm13, %xmm11, %xmm12, %xmm0, \
+%xmm1, %xmm2, %xmm3, %xmm4, %xmm8, %xmm5, %xmm6, 6, 78, \operation
+   sub $32, %r13
+   jmp _initial_blocks_\@
+_initial_num_blocks_is_1_\@:
+   INITIAL_BLOCKS_ENC_DEC  %xmm9, %xmm10, %xmm13, %xmm11, %xmm12, %xmm0, \
+%xmm1, %xmm2, %xmm3, %xmm4, %xmm8, %xmm5, %xmm6, 7, 8, \operation
+   sub $16, %r13
+   jmp _initial_blocks_\@
+_initial_num_blocks_is_0_\@:
+   INITIAL_BLOCKS_ENC_DEC  %xmm9, %xmm10, %xmm13, %xmm11, %xmm12, %xmm0, \
+%xmm1, %xmm2, %xmm3, %xmm4, %xmm8, %xmm5, %xmm6, 8, 0, \operation
+_initial_blocks_\@:
+
+   # Main loop - Encrypt/Decrypt remaining blocks
+
+   cmp $0, %r13
+   je  _zero_cipher_left_\@
+   sub $64, %r13
+   je  _four_cipher_left_\@
+_crypt_by_4_\@:
+   GHASH_4_ENCRYPT_4_PARALLEL_\operation   %xmm9, %xmm10, %xmm11, %xmm12, \
+   %xmm13, %xmm14, %xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, \
+   %xmm7, %xmm8, enc
+   add $64, %r11
+   sub $64, %r13
+   jne _crypt_by_4_\@
+_four_cipher_left_\@:
+   GHASH_LAST_4%xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, \
+%xmm15, %xmm1, %xmm2, %xmm3, %xmm4, %xmm8
+_zero_cipher_left_\@:
+   mov %arg4, %r13
+   and $15, %r13   # %r13 = arg4 (mod 16)
+   je  _multiple_of_16_bytes_\@
+
+   # Handle the last <16 Byte block separately
+   paddd ONE(%rip), %xmm0# INCR CNT to get Yn
+movdqa SHUF_MASK(%rip), %xmm10
+   PSHUFB_XMM %xmm10, %xmm0
+
+   ENCRYPT_SINGLE_BLOCK%xmm0, %xmm1# Encrypt(K, Yn)
+
+   lea (%arg3,%r11,1), %r10
+   mov %r13, %r12
+   READ_PARTIAL_BLOCK %r10 %r12 %xmm2 %xmm1
+
+   lea ALL_F+16(%rip), %r12
+   sub %r13, %r12
+.ifc \operation, dec
+   movdqa  %xmm1, %xmm2
+.endif
+   pxor%xmm1, %xmm0# XOR Encrypt(K, Yn)
+   movdqu  (%r12), %xmm1
+   # get the appropriate mask to mask out top 16-r13 bytes of xmm0
+   pand%xmm1, %xmm0# mask out top 16-r13 bytes of xmm0
+.ifc \operation, dec
+   pand%xmm1, %xmm2
+   movdqa SHUF_MASK(%rip), %xmm10
+   PSHUFB_XMM %xmm10 ,%xmm2
+
+   pxor %xmm2, %xmm8
+.else
+   movdqa SHUF_MASK(%rip), %xmm10
+   PSHUFB_XMM %xmm10,%xmm0
+
+   pxor%xmm0, %xmm8
+.endif
+
+   GHASH_MUL %xmm8, %xmm13, %xmm9, %xmm10, %xmm11, %xmm5, %xmm6
+.ifc \operation, enc
+   # GHASH computation for the last <16 byte block
+   movdqa SHUF_MASK(%rip), %xmm10
+   # shuffle xmm0 back to output as ciphertext
+   PSHUFB_XMM %xmm10, %xmm0
+.endif
+
+   # Output %r13 bytes
+   MOVQ_R64_XMM %xmm0, %rax
+   cmp $8, %r13
+   jle _less_than_8_bytes_left_\@
+   mov %rax, (%arg2 , %r11, 1)
+   add $8, %r11
+   psrldq $8, %xmm0
+   MOVQ_R64_XMM %xmm0, %rax
+   sub $8, %r13
+_less_than_8_bytes_left_\@:
+   mov %al,  (%arg2, %r11, 1)
+   add $1, %r11
+   shr $8, %rax
+   sub $1, %r13
+   jne _less_than_8_bytes_left_\@
+_multiple_of_16_bytes_\@:
+.endm
+
 # GCM_COMPLETE Finishes update of tag of last partial block
 # Output: Authorization Tag (AUTH_TAG)
 # Clobbers rax, r10-r12, and xmm0, xmm1, xmm5-xmm15
@@ -1245,93 +1357,7 @@ ENTRY(aesni_gcm_dec)
FUNC_SAVE
 
GCM_INIT
-
-# Decrypt first few blocks
-
-   and $(3<<4), %r12
-   jz _initial_num_blocks_is_0_decrypt
-   cmp $(2<<4), %r12
-   jb _initial_num_blocks_is_1_decrypt
-   je 

[PATCH v2 04/14] x86/crypto: aesni: Add GCM_COMPLETE macro

2018-02-14 Thread Dave Watson
Merge encode and decode tag calculations in GCM_COMPLETE macro.
Scatter/gather routines will call this once at the end of encryption
or decryption.
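
For context, this is the standard GCM tag computation (not specific to this
implementation): the tag depends only on the hash subkey H, the AAD A, the
ciphertext C and their lengths, which is why a single GCM_COMPLETE path can
serve both encryption and decryption.

	S = GHASH_H( A || 0-pad || C || 0-pad || [len(A)]_64 || [len(C)]_64 )
	T = MSB_taglen( E(K, Y0) XOR S )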

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S | 172 ++
 1 file changed, 63 insertions(+), 109 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index b9fe2ab..529c542 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -222,6 +222,67 @@ ALL_F:  .octa 0x
mov %r13, %r12
 .endm
 
+# GCM_COMPLETE Finishes update of tag of last partial block
+# Output: Authorization Tag (AUTH_TAG)
+# Clobbers rax, r10-r12, and xmm0, xmm1, xmm5-xmm15
+.macro GCM_COMPLETE
+   mov arg8, %r12# %r13 = aadLen (number of bytes)
+   shl $3, %r12  # convert into number of bits
+   movd%r12d, %xmm15 # len(A) in %xmm15
+   shl $3, %arg4 # len(C) in bits (*128)
+   MOVQ_R64_XMM%arg4, %xmm1
+   pslldq  $8, %xmm15# %xmm15 = len(A)||0x
+   pxor%xmm1, %xmm15 # %xmm15 = len(A)||len(C)
+   pxor%xmm15, %xmm8
+   GHASH_MUL   %xmm8, %xmm13, %xmm9, %xmm10, %xmm11, %xmm5, %xmm6
+   # final GHASH computation
+   movdqa SHUF_MASK(%rip), %xmm10
+   PSHUFB_XMM %xmm10, %xmm8
+
+   mov %arg5, %rax   # %rax = *Y0
+   movdqu  (%rax), %xmm0 # %xmm0 = Y0
+   ENCRYPT_SINGLE_BLOCK%xmm0,  %xmm1 # E(K, Y0)
+   pxor%xmm8, %xmm0
+_return_T_\@:
+   mov arg9, %r10 # %r10 = authTag
+   mov arg10, %r11# %r11 = auth_tag_len
+   cmp $16, %r11
+   je  _T_16_\@
+   cmp $8, %r11
+   jl  _T_4_\@
+_T_8_\@:
+   MOVQ_R64_XMM%xmm0, %rax
+   mov %rax, (%r10)
+   add $8, %r10
+   sub $8, %r11
+   psrldq  $8, %xmm0
+   cmp $0, %r11
+   je  _return_T_done_\@
+_T_4_\@:
+   movd%xmm0, %eax
+   mov %eax, (%r10)
+   add $4, %r10
+   sub $4, %r11
+   psrldq  $4, %xmm0
+   cmp $0, %r11
+   je  _return_T_done_\@
+_T_123_\@:
+   movd%xmm0, %eax
+   cmp $2, %r11
+   jl  _T_1_\@
+   mov %ax, (%r10)
+   cmp $2, %r11
+   je  _return_T_done_\@
+   add $2, %r10
+   sar $16, %eax
+_T_1_\@:
+   mov %al, (%r10)
+   jmp _return_T_done_\@
+_T_16_\@:
+   movdqu  %xmm0, (%r10)
+_return_T_done_\@:
+.endm
+
 #ifdef __x86_64__
 /* GHASH_MUL MACRO to implement: Data*HashKey mod (128,127,126,121,0)
 *
@@ -1271,61 +1332,7 @@ _less_than_8_bytes_left_decrypt:
sub $1, %r13
jne _less_than_8_bytes_left_decrypt
 _multiple_of_16_bytes_decrypt:
-   mov arg8, %r12# %r13 = aadLen (number of bytes)
-   shl $3, %r12  # convert into number of bits
-   movd%r12d, %xmm15 # len(A) in %xmm15
-   shl $3, %arg4 # len(C) in bits (*128)
-   MOVQ_R64_XMM%arg4, %xmm1
-   pslldq  $8, %xmm15# %xmm15 = len(A)||0x
-   pxor%xmm1, %xmm15 # %xmm15 = len(A)||len(C)
-   pxor%xmm15, %xmm8
-   GHASH_MUL   %xmm8, %xmm13, %xmm9, %xmm10, %xmm11, %xmm5, %xmm6
-# final GHASH computation
-movdqa SHUF_MASK(%rip), %xmm10
-   PSHUFB_XMM %xmm10, %xmm8
-
-   mov %arg5, %rax   # %rax = *Y0
-   movdqu  (%rax), %xmm0 # %xmm0 = Y0
-   ENCRYPT_SINGLE_BLOCK%xmm0,  %xmm1 # E(K, Y0)
-   pxor%xmm8, %xmm0
-_return_T_decrypt:
-   mov arg9, %r10# %r10 = authTag
-   mov arg10, %r11   # %r11 = auth_tag_len
-   cmp $16, %r11
-   je  _T_16_decrypt
-   cmp $8, %r11
-   jl  _T_4_decrypt
-_T_8_decrypt:
-   MOVQ_R64_XMM%xmm0, %rax
-   mov %rax, (%r10)
-   add $8, %r10
-   sub $8, %r11
-   psrldq  $8, %xmm0
-   cmp $0, %r11
-   je  _return_T_done_decrypt
-_T_4_decrypt:
-   movd%xmm0, %eax
-   mov %eax, (%r10)
-   add $4, %r10
-   sub $4, %r11
-   psrldq  $4, %xmm0
-   cmp $0, %r11
-   je  _return_T_done_decrypt
-_T_123_decrypt:
-   movd%xmm0, %eax
-   cmp $2, %r11
-   jl  _T_1_decrypt
-   mov %ax, (%r10)
-   cmp $2, %r11
-   je  _return_T_done_decrypt
-   add $2, %r10
-   sar $16, %eax
-_T_1_decrypt:
-   mov %al, (%r10)
-   jmp _return_T_done_decrypt
-_T_16_decrypt:
-   movdqu  %xmm0, (%r10)
-_return_T_done_decrypt:
+   GCM_COMPLETE
FUNC_RESTORE
ret
 ENDPROC(aesni_gcm_dec)
@@ -1501,

[PATCH v2 07/14] x86/crypto: aesni: Split AAD hash calculation to separate macro

2018-02-14 Thread Dave Watson
AAD hash only needs to be calculated once for each scatter/gather operation.
Move it to its own macro, and call it from GCM_INIT instead of
INITIAL_BLOCKS.
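
For context, the AAD contribution to GHASH is a running product that does not
depend on any ciphertext (standard GHASH definition, not implementation
specific), which is why it can be computed once up front and stored in the
context for the later GCM_ENC_DEC/GCM_COMPLETE calls:

	X_0 = 0
	X_i = (X_{i-1} XOR A_i) * H	for each 16-byte AAD block A_i
					(the last block zero-padded)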

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S | 71 ---
 1 file changed, 43 insertions(+), 28 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index 6c5a80d..58bbfac 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -229,6 +229,10 @@ ALL_F:  .octa 0x
mov %arg5, %r13 # %xmm13 holds HashKey<<1 (mod poly)
and $-16, %r13
mov %r13, %r12
+
+   CALC_AAD_HASH %xmm13 %xmm0 %xmm1 %xmm2 %xmm3 %xmm4 \
+   %xmm5 %xmm6
+   mov %r13, %r12
 .endm
 
 # GCM_ENC_DEC Encodes/Decodes given data. Assumes that the passed gcm_context
@@ -496,51 +500,62 @@ _read_next_byte_lt8_\@:
 _done_read_partial_block_\@:
 .endm
 
-/*
-* if a = number of total plaintext bytes
-* b = floor(a/16)
-* num_initial_blocks = b mod 4
-* encrypt the initial num_initial_blocks blocks and apply ghash on
-* the ciphertext
-* %r10, %r11, %r12, %rax, %xmm5, %xmm6, %xmm7, %xmm8, %xmm9 registers
-* are clobbered
-* arg1, %arg3, %arg4, %r14 are used as a pointer only, not modified
-*/
-
-
-.macro INITIAL_BLOCKS_ENC_DEC TMP1 TMP2 TMP3 TMP4 TMP5 XMM0 XMM1 \
-XMM2 XMM3 XMM4 XMMDst TMP6 TMP7 i i_seq operation
-MOVADQ SHUF_MASK(%rip), %xmm14
-   movarg8, %r10   # %r10 = AAD
-   movarg9, %r11   # %r11 = aadLen
-   pxor   %xmm\i, %xmm\i
-   pxor   \XMM2, \XMM2
+# CALC_AAD_HASH: Calculates the hash of the data which will not be encrypted.
+# clobbers r10-11, xmm14
+.macro CALC_AAD_HASH HASHKEY TMP1 TMP2 TMP3 TMP4 TMP5 \
+   TMP6 TMP7
+   MOVADQ SHUF_MASK(%rip), %xmm14
+   movarg8, %r10   # %r10 = AAD
+   movarg9, %r11   # %r11 = aadLen
+   pxor   \TMP7, \TMP7
+   pxor   \TMP6, \TMP6
 
cmp$16, %r11
jl _get_AAD_rest\@
 _get_AAD_blocks\@:
-   movdqu (%r10), %xmm\i
-   PSHUFB_XMM   %xmm14, %xmm\i # byte-reflect the AAD data
-   pxor   %xmm\i, \XMM2
-   GHASH_MUL  \XMM2, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
+   movdqu (%r10), \TMP7
+   PSHUFB_XMM   %xmm14, \TMP7 # byte-reflect the AAD data
+   pxor   \TMP7, \TMP6
+   GHASH_MUL  \TMP6, \HASHKEY, \TMP1, \TMP2, \TMP3, \TMP4, \TMP5
add$16, %r10
sub$16, %r11
cmp$16, %r11
jge_get_AAD_blocks\@
 
-   movdqu \XMM2, %xmm\i
+   movdqu \TMP6, \TMP7
 
/* read the last <16B of AAD */
 _get_AAD_rest\@:
cmp$0, %r11
je _get_AAD_done\@
 
-   READ_PARTIAL_BLOCK %r10, %r11, \TMP1, %xmm\i
-   PSHUFB_XMM   %xmm14, %xmm\i # byte-reflect the AAD data
-   pxor   \XMM2, %xmm\i
-   GHASH_MUL  %xmm\i, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
+   READ_PARTIAL_BLOCK %r10, %r11, \TMP1, \TMP7
+   PSHUFB_XMM   %xmm14, \TMP7 # byte-reflect the AAD data
+   pxor   \TMP6, \TMP7
+   GHASH_MUL  \TMP7, \HASHKEY, \TMP1, \TMP2, \TMP3, \TMP4, \TMP5
+   movdqu \TMP7, \TMP6
 
 _get_AAD_done\@:
+   movdqu \TMP6, AadHash(%arg2)
+.endm
+
+/*
+* if a = number of total plaintext bytes
+* b = floor(a/16)
+* num_initial_blocks = b mod 4
+* encrypt the initial num_initial_blocks blocks and apply ghash on
+* the ciphertext
+* %r10, %r11, %r12, %rax, %xmm5, %xmm6, %xmm7, %xmm8, %xmm9 registers
+* are clobbered
+* arg1, %arg2, %arg3, %r14 are used as a pointer only, not modified
+*/
+
+
+.macro INITIAL_BLOCKS_ENC_DEC TMP1 TMP2 TMP3 TMP4 TMP5 XMM0 XMM1 \
+   XMM2 XMM3 XMM4 XMMDst TMP6 TMP7 i i_seq operation
+
+   movdqu AadHash(%arg2), %xmm\i   # XMM0 = Y0
+
xor%r11, %r11 # initialise the data pointer offset as zero
# start AES for num_initial_blocks blocks
 
-- 
2.9.5
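
A note on what the new macro computes: CALC_AAD_HASH is the standard GHASH
absorption of the AAD, hoisted into GCM_INIT so it runs exactly once per
scatter/gather operation.  In rough C terms it does the following; this is
only a sketch, ghash_mul() is an assumed helper standing in for the
GF(2^128) multiply performed by GHASH_MUL, and the byte-reflection done via
SHUF_MASK/PSHUFB is omitted:

#include <stdint.h>
#include <string.h>

/* assumed helper: acc = acc * hash_key in GF(2^128), as GHASH_MUL does */
void ghash_mul(uint8_t acc[16], const uint8_t hash_key[16]);

static void calc_aad_hash(uint8_t aad_hash[16], const uint8_t hash_key[16],
			  const uint8_t *aad, size_t aad_len)
{
	uint8_t block[16];
	size_t i, n;

	memset(aad_hash, 0, 16);
	while (aad_len) {
		n = aad_len < 16 ? aad_len : 16;
		memset(block, 0, 16);		/* zero-pad a short final block */
		memcpy(block, aad, n);
		for (i = 0; i < 16; i++)	/* pxor \TMP7, \TMP6 */
			aad_hash[i] ^= block[i];
		ghash_mul(aad_hash, hash_key);	/* GHASH_MUL */
		aad += n;
		aad_len -= n;
	}
	/* this result is what _get_AAD_done now stores in AadHash(%arg2) */
}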



[PATCH v2 08/14] x86/crypto: aesni: Fill in new context data structures

2018-02-14 Thread Dave Watson
Fill in aadhash, aadlen, pblocklen and curcount with appropriate values.
pblocklen, aadhash, and pblockenckey are also updated at the end
of each scatter/gather operation, so they can be carried over to the next
operation.

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S | 51 ++-
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index 58bbfac..aa82493 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -204,6 +204,21 @@ ALL_F:  .octa 0x
 # GCM_INIT initializes a gcm_context struct to prepare for encoding/decoding.
 # Clobbers rax, r10-r13 and xmm0-xmm6, %xmm13
 .macro GCM_INIT
+
+   mov arg9, %r11
+   mov %r11, AadLen(%arg2) # ctx_data.aad_length = aad_length
+   xor %r11, %r11
+   mov %r11, InLen(%arg2) # ctx_data.in_length = 0
+   mov %r11, PBlockLen(%arg2) # ctx_data.partial_block_length = 0
+   mov %r11, PBlockEncKey(%arg2) # ctx_data.partial_block_enc_key = 0
+   mov %arg6, %rax
+   movdqu (%rax), %xmm0
+   movdqu %xmm0, OrigIV(%arg2) # ctx_data.orig_IV = iv
+
+   movdqa  SHUF_MASK(%rip), %xmm2
+   PSHUFB_XMM %xmm2, %xmm0
+   movdqu %xmm0, CurCount(%arg2) # ctx_data.current_counter = iv
+
mov arg7, %r12
movdqu  (%r12), %xmm13
movdqa  SHUF_MASK(%rip), %xmm2
@@ -226,13 +241,9 @@ ALL_F:  .octa 0x
pandPOLY(%rip), %xmm2
pxor%xmm2, %xmm13
movdqa  %xmm13, HashKey(%rsp)
-   mov %arg5, %r13 # %xmm13 holds HashKey<<1 (mod poly)
-   and $-16, %r13
-   mov %r13, %r12
 
CALC_AAD_HASH %xmm13 %xmm0 %xmm1 %xmm2 %xmm3 %xmm4 \
%xmm5 %xmm6
-   mov %r13, %r12
 .endm
 
 # GCM_ENC_DEC Encodes/Decodes given data. Assumes that the passed gcm_context
@@ -240,6 +251,12 @@ ALL_F:  .octa 0x
 # Requires the input data be at least 1 byte long because of READ_PARTIAL_BLOCK
 # Clobbers rax, r10-r13, and xmm0-xmm15
 .macro GCM_ENC_DEC operation
+   movdqu AadHash(%arg2), %xmm8
+   movdqu HashKey(%rsp), %xmm13
+   add %arg5, InLen(%arg2)
+   mov %arg5, %r13 # save the number of bytes
+   and $-16, %r13  # %r13 = %r13 - (%r13 mod 16)
+   mov %r13, %r12
# Encrypt/Decrypt first few blocks
 
and $(3<<4), %r12
@@ -284,16 +301,23 @@ _four_cipher_left_\@:
GHASH_LAST_4%xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, \
 %xmm15, %xmm1, %xmm2, %xmm3, %xmm4, %xmm8
 _zero_cipher_left_\@:
+   movdqu %xmm8, AadHash(%arg2)
+   movdqu %xmm0, CurCount(%arg2)
+
mov %arg5, %r13
and $15, %r13   # %r13 = arg5 (mod 16)
je  _multiple_of_16_bytes_\@
 
+   mov %r13, PBlockLen(%arg2)
+
# Handle the last <16 Byte block separately
paddd ONE(%rip), %xmm0# INCR CNT to get Yn
+   movdqu %xmm0, CurCount(%arg2)
movdqa SHUF_MASK(%rip), %xmm10
PSHUFB_XMM %xmm10, %xmm0
 
ENCRYPT_SINGLE_BLOCK%xmm0, %xmm1# Encrypt(K, Yn)
+   movdqu %xmm0, PBlockEncKey(%arg2)
 
lea (%arg4,%r11,1), %r10
mov %r13, %r12
@@ -322,6 +346,7 @@ _zero_cipher_left_\@:
 .endif
 
GHASH_MUL %xmm8, %xmm13, %xmm9, %xmm10, %xmm11, %xmm5, %xmm6
+   movdqu %xmm8, AadHash(%arg2)
 .ifc \operation, enc
# GHASH computation for the last <16 byte block
movdqa SHUF_MASK(%rip), %xmm10
@@ -351,11 +376,15 @@ _multiple_of_16_bytes_\@:
 # Output: Authorization Tag (AUTH_TAG)
 # Clobbers rax, r10-r12, and xmm0, xmm1, xmm5-xmm15
 .macro GCM_COMPLETE
-   mov arg9, %r12# %r13 = aadLen (number of bytes)
+   movdqu AadHash(%arg2), %xmm8
+   movdqu HashKey(%rsp), %xmm13
+   mov AadLen(%arg2), %r12  # %r13 = aadLen (number of bytes)
shl $3, %r12  # convert into number of bits
movd%r12d, %xmm15 # len(A) in %xmm15
-   shl $3, %arg5 # len(C) in bits (*128)
-   MOVQ_R64_XMM%arg5, %xmm1
+   mov InLen(%arg2), %r12
+   shl $3, %r12  # len(C) in bits (*128)
+   MOVQ_R64_XMM%r12, %xmm1
+
pslldq  $8, %xmm15# %xmm15 = len(A)||0x
pxor%xmm1, %xmm15 # %xmm15 = len(A)||len(C)
pxor%xmm15, %xmm8
@@ -364,8 +393,7 @@ _multiple_of_16_bytes_\@:
movdqa SHUF_MASK(%rip), %xmm10
PSHUFB_XMM %xmm10, %xmm8
 
-   mov %arg6, %rax   # %rax = *Y0
-   movdqu  (%rax), %xmm0 # %xmm0 = Y0
+   movdqu OrigIV(%arg2), %xmm0   # %xmm0 = Y0
ENCRYPT_SINGLE_BLOCK%xmm0,  %xmm1 # E(K, Y0)
pxor%xmm8, %xmm0
 _return_T_\@:
@@ -553,15 +581,14 @@ _get_AAD_done\@:
 
 .macro INITIAL
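
For readers reconstructing the math: the tag that GCM_COMPLETE assembles
here is the standard GCM tag.  With A the AAD, C the ciphertext, H the hash
subkey and Y0 the original counter block, it is

    S = GHASH_H( A padded, C padded, [len(A)]_64 || [len(C)]_64 )
    T = E(K, Y0) XOR S, truncated to the requested tag length

which is why the macro XORs len(A)||len(C) into the running hash, performs
one final GHASH_MUL, and then XORs in the encrypted OrigIV block before
writing the tag out.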

[PATCH v2 06/14] x86/crypto: aesni: Introduce gcm_context_data

2018-02-14 Thread Dave Watson
Introduce a gcm_context_data struct that will be used to pass
context data between scatter/gather update calls.  It is passed
as the second argument (after the crypto keys); the other args are
renumbered.

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S  | 115 +
 arch/x86/crypto/aesni-intel_glue.c |  81 ++
 2 files changed, 121 insertions(+), 75 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index 8021fd1..6c5a80d 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -111,6 +111,14 @@ ALL_F:  .octa 0x
// (for Karatsuba purposes)
 #defineVARIABLE_OFFSET 16*8
 
+#define AadHash 16*0
+#define AadLen 16*1
+#define InLen (16*1)+8
+#define PBlockEncKey 16*2
+#define OrigIV 16*3
+#define CurCount 16*4
+#define PBlockLen 16*5
+
 #define arg1 rdi
 #define arg2 rsi
 #define arg3 rdx
@@ -121,6 +129,7 @@ ALL_F:  .octa 0x
 #define arg8 STACK_OFFSET+16(%r14)
 #define arg9 STACK_OFFSET+24(%r14)
 #define arg10 STACK_OFFSET+32(%r14)
+#define arg11 STACK_OFFSET+40(%r14)
 #define keysize 2*15*16(%arg1)
 #endif
 
@@ -195,9 +204,9 @@ ALL_F:  .octa 0x
 # GCM_INIT initializes a gcm_context struct to prepare for encoding/decoding.
 # Clobbers rax, r10-r13 and xmm0-xmm6, %xmm13
 .macro GCM_INIT
-   mov %arg6, %r12
+   mov arg7, %r12
movdqu  (%r12), %xmm13
-   movdqa  SHUF_MASK(%rip), %xmm2
+   movdqa  SHUF_MASK(%rip), %xmm2
PSHUFB_XMM %xmm2, %xmm13
 
# precompute HashKey<<1 mod poly from the HashKey (required for GHASH)
@@ -217,7 +226,7 @@ ALL_F:  .octa 0x
pandPOLY(%rip), %xmm2
pxor%xmm2, %xmm13
movdqa  %xmm13, HashKey(%rsp)
-   mov %arg4, %r13 # %xmm13 holds HashKey<<1 (mod 
poly)
+   mov %arg5, %r13 # %xmm13 holds HashKey<<1 (mod poly)
and $-16, %r13
mov %r13, %r12
 .endm
@@ -271,18 +280,18 @@ _four_cipher_left_\@:
GHASH_LAST_4%xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, \
 %xmm15, %xmm1, %xmm2, %xmm3, %xmm4, %xmm8
 _zero_cipher_left_\@:
-   mov %arg4, %r13
-   and $15, %r13   # %r13 = arg4 (mod 16)
+   mov %arg5, %r13
+   and $15, %r13   # %r13 = arg5 (mod 16)
je  _multiple_of_16_bytes_\@
 
# Handle the last <16 Byte block separately
paddd ONE(%rip), %xmm0# INCR CNT to get Yn
-movdqa SHUF_MASK(%rip), %xmm10
+   movdqa SHUF_MASK(%rip), %xmm10
PSHUFB_XMM %xmm10, %xmm0
 
ENCRYPT_SINGLE_BLOCK%xmm0, %xmm1# Encrypt(K, Yn)
 
-   lea (%arg3,%r11,1), %r10
+   lea (%arg4,%r11,1), %r10
mov %r13, %r12
READ_PARTIAL_BLOCK %r10 %r12 %xmm2 %xmm1
 
@@ -320,13 +329,13 @@ _zero_cipher_left_\@:
MOVQ_R64_XMM %xmm0, %rax
cmp $8, %r13
jle _less_than_8_bytes_left_\@
-   mov %rax, (%arg2 , %r11, 1)
+   mov %rax, (%arg3 , %r11, 1)
add $8, %r11
psrldq $8, %xmm0
MOVQ_R64_XMM %xmm0, %rax
sub $8, %r13
 _less_than_8_bytes_left_\@:
-   mov %al,  (%arg2, %r11, 1)
+   mov %al,  (%arg3, %r11, 1)
add $1, %r11
shr $8, %rax
sub $1, %r13
@@ -338,11 +347,11 @@ _multiple_of_16_bytes_\@:
 # Output: Authorization Tag (AUTH_TAG)
 # Clobbers rax, r10-r12, and xmm0, xmm1, xmm5-xmm15
 .macro GCM_COMPLETE
-   mov arg8, %r12# %r13 = aadLen (number of bytes)
+   mov arg9, %r12# %r13 = aadLen (number of bytes)
shl $3, %r12  # convert into number of bits
movd%r12d, %xmm15 # len(A) in %xmm15
-   shl $3, %arg4 # len(C) in bits (*128)
-   MOVQ_R64_XMM%arg4, %xmm1
+   shl $3, %arg5 # len(C) in bits (*128)
+   MOVQ_R64_XMM%arg5, %xmm1
pslldq  $8, %xmm15# %xmm15 = len(A)||0x
pxor%xmm1, %xmm15 # %xmm15 = len(A)||len(C)
pxor%xmm15, %xmm8
@@ -351,13 +360,13 @@ _multiple_of_16_bytes_\@:
movdqa SHUF_MASK(%rip), %xmm10
PSHUFB_XMM %xmm10, %xmm8
 
-   mov %arg5, %rax   # %rax = *Y0
+   mov %arg6, %rax   # %rax = *Y0
movdqu  (%rax), %xmm0 # %xmm0 = Y0
ENCRYPT_SINGLE_BLOCK%xmm0,  %xmm1 # E(K, Y0)
pxor%xmm8, %xmm0
 _return_T_\@:
-   mov arg9, %r10 # %r10 = authTag
-   mov arg10, %r11# %r11 = auth_tag_len
+   mov arg10, %r10 # %r10 = authTag
+   mov arg11, %r11# %r11 = auth_tag_le
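
For reference, the offsets defined at the top of this patch correspond to a
context layout roughly like the C struct below.  This is only a sketch
reconstructed from the asm #defines and the comments in GCM_INIT (kernel
u8/u64 types assumed); the actual struct in aesni-intel_glue.c is not shown
in this excerpt and may differ in names and padding:

struct gcm_context_data {
	u8  aad_hash[16];              /* AadHash      = 16*0     */
	u64 aad_length;                /* AadLen       = 16*1     */
	u64 in_length;                 /* InLen        = 16*1 + 8 */
	u8  partial_block_enc_key[16]; /* PBlockEncKey = 16*2     */
	u8  orig_iv[16];               /* OrigIV       = 16*3     */
	u8  current_counter[16];       /* CurCount     = 16*4     */
	u64 partial_block_length;      /* PBlockLen    = 16*5     */
	/* patch 10 of the series additionally stores HashKey..HashKey_4_k
	 * in here from offset 16*6 onward */
};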

[PATCH v2 09/14] x86/crypto: aesni: Move ghash_mul to GCM_COMPLETE

2018-02-14 Thread Dave Watson
Prepare to handle partial blocks between scatter/gather calls.
For the last partial block, we only want to calculate the aadhash
in GCM_COMPLETE, and a new partial block macro will handle both
aadhash update and encrypting partial blocks between calls.

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index aa82493..37b1cee 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -345,7 +345,6 @@ _zero_cipher_left_\@:
pxor%xmm0, %xmm8
 .endif
 
-   GHASH_MUL %xmm8, %xmm13, %xmm9, %xmm10, %xmm11, %xmm5, %xmm6
movdqu %xmm8, AadHash(%arg2)
 .ifc \operation, enc
# GHASH computation for the last <16 byte block
@@ -378,6 +377,15 @@ _multiple_of_16_bytes_\@:
 .macro GCM_COMPLETE
movdqu AadHash(%arg2), %xmm8
movdqu HashKey(%rsp), %xmm13
+
+   mov PBlockLen(%arg2), %r12
+
+   cmp $0, %r12
+   je _partial_done\@
+
+   GHASH_MUL %xmm8, %xmm13, %xmm9, %xmm10, %xmm11, %xmm5, %xmm6
+
+_partial_done\@:
mov AadLen(%arg2), %r12  # %r13 = aadLen (number of bytes)
shl $3, %r12  # convert into number of bits
movd%r12d, %xmm15 # len(A) in %xmm15
-- 
2.9.5



[PATCH v2 14/14] x86/crypto: aesni: Update aesni-intel_glue to use scatter/gather

2018-02-14 Thread Dave Watson
Add a gcmaes_crypt_by_sg routine that does the GCM operation by walking
the scatterlists.  Either src or dst may contain multiple buffers, so
iterate over both at the same time if they are different.
If the input is the same as the output, iterate only over one.

Currently both the AAD and TAG must be linear, so copy them out
with scatterwalk_map_and_copy.  If the first buffer contains the
entire AAD, we can optimize and avoid the copy.  Since the AAD
can be any size, it must be allocated on the heap if copied.  The TAG
can stay on the stack since it is always < 16 bytes.

Only the SSE routines are updated so far, so keep the previous
gcmaes_en/decrypt routines and branch to the scatter/gather ones if the
keysize is inappropriate for AVX, or if we are SSE only.

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_glue.c | 133 +
 1 file changed, 133 insertions(+)

diff --git a/arch/x86/crypto/aesni-intel_glue.c 
b/arch/x86/crypto/aesni-intel_glue.c
index de986f9..acbe7e8 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -791,6 +791,127 @@ static int generic_gcmaes_set_authsize(struct crypto_aead 
*tfm,
return 0;
 }
 
+static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
+ unsigned int assoclen, u8 *hash_subkey,
+ u8 *iv, void *aes_ctx)
+{
+   struct crypto_aead *tfm = crypto_aead_reqtfm(req);
+   unsigned long auth_tag_len = crypto_aead_authsize(tfm);
+   struct gcm_context_data data AESNI_ALIGN_ATTR;
+   struct scatter_walk dst_sg_walk = {};
+   unsigned long left = req->cryptlen;
+   unsigned long len, srclen, dstlen;
+   struct scatter_walk assoc_sg_walk;
+   struct scatter_walk src_sg_walk;
+   struct scatterlist src_start[2];
+   struct scatterlist dst_start[2];
+   struct scatterlist *src_sg;
+   struct scatterlist *dst_sg;
+   u8 *src, *dst, *assoc;
+   u8 *assocmem = NULL;
+   u8 authTag[16];
+
+   if (!enc)
+   left -= auth_tag_len;
+
+   /* Linearize assoc, if not already linear */
+   if (req->src->length >= assoclen && req->src->length &&
+   (!PageHighMem(sg_page(req->src)) ||
+   req->src->offset + req->src->length < PAGE_SIZE)) {
+   scatterwalk_start(&assoc_sg_walk, req->src);
+   assoc = scatterwalk_map(&assoc_sg_walk);
+   } else {
+   /* assoc can be any length, so must be on heap */
+   assocmem = kmalloc(assoclen, GFP_ATOMIC);
+   if (unlikely(!assocmem))
+   return -ENOMEM;
+   assoc = assocmem;
+
+   scatterwalk_map_and_copy(assoc, req->src, 0, assoclen, 0);
+   }
+
+   src_sg = scatterwalk_ffwd(src_start, req->src, req->assoclen);
+   scatterwalk_start(&src_sg_walk, src_sg);
+   if (req->src != req->dst) {
+   dst_sg = scatterwalk_ffwd(dst_start, req->dst, req->assoclen);
+   scatterwalk_start(&dst_sg_walk, dst_sg);
+   }
+
+   kernel_fpu_begin();
+   aesni_gcm_init(aes_ctx, &data, iv,
+   hash_subkey, assoc, assoclen);
+   if (req->src != req->dst) {
+   while (left) {
+   src = scatterwalk_map(&src_sg_walk);
+   dst = scatterwalk_map(&dst_sg_walk);
+   srclen = scatterwalk_clamp(&src_sg_walk, left);
+   dstlen = scatterwalk_clamp(&dst_sg_walk, left);
+   len = min(srclen, dstlen);
+   if (len) {
+   if (enc)
+   aesni_gcm_enc_update(aes_ctx, &data,
+dst, src, len);
+   else
+   aesni_gcm_dec_update(aes_ctx, &data,
+dst, src, len);
+   }
+   left -= len;
+
+   scatterwalk_unmap(src);
+   scatterwalk_unmap(dst);
+   scatterwalk_advance(&src_sg_walk, len);
+   scatterwalk_advance(&dst_sg_walk, len);
+   scatterwalk_done(&src_sg_walk, 0, left);
+   scatterwalk_done(&dst_sg_walk, 1, left);
+   }
+   } else {
+   while (left) {
+   dst = src = scatterwalk_map(&src_sg_walk);
+   len = scatterwalk_clamp(&src_sg_walk, left);
+   if (len) {
+   if (enc)
+   aesni_gcm_enc_update(aes_ctx, &data,
+src, src, len);
+   else
+   aesni_gcm_dec_update(aes_ctx, &data,
+ 

[PATCH v2 13/14] x86/crypto: aesni: Introduce scatter/gather asm function stubs

2018-02-14 Thread Dave Watson
The asm macros are all set up now; introduce the entry points.

GCM_INIT and GCM_COMPLETE now have their arguments supplied explicitly, so
that the new scatter/gather entry points don't have to take all the
arguments, only the ones they need.

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S  | 116 -
 arch/x86/crypto/aesni-intel_glue.c |  16 +
 2 files changed, 106 insertions(+), 26 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index b941952..311b2de 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -200,8 +200,8 @@ ALL_F:  .octa 0x
 # Output: HashKeys stored in gcm_context_data.  Only needs to be called
 # once per key.
 # clobbers r12, and tmp xmm registers.
-.macro PRECOMPUTE TMP1 TMP2 TMP3 TMP4 TMP5 TMP6 TMP7
-   mov arg7, %r12
+.macro PRECOMPUTE SUBKEY TMP1 TMP2 TMP3 TMP4 TMP5 TMP6 TMP7
+   mov \SUBKEY, %r12
movdqu  (%r12), \TMP3
movdqa  SHUF_MASK(%rip), \TMP2
PSHUFB_XMM \TMP2, \TMP3
@@ -254,14 +254,14 @@ ALL_F:  .octa 0x
 
 # GCM_INIT initializes a gcm_context struct to prepare for encoding/decoding.
 # Clobbers rax, r10-r13 and xmm0-xmm6, %xmm13
-.macro GCM_INIT
-   mov arg9, %r11
+.macro GCM_INIT Iv SUBKEY AAD AADLEN
+   mov \AADLEN, %r11
mov %r11, AadLen(%arg2) # ctx_data.aad_length = aad_length
xor %r11, %r11
mov %r11, InLen(%arg2) # ctx_data.in_length = 0
mov %r11, PBlockLen(%arg2) # ctx_data.partial_block_length = 0
mov %r11, PBlockEncKey(%arg2) # ctx_data.partial_block_enc_key = 0
-   mov %arg6, %rax
+   mov \Iv, %rax
movdqu (%rax), %xmm0
movdqu %xmm0, OrigIV(%arg2) # ctx_data.orig_IV = iv
 
@@ -269,11 +269,11 @@ ALL_F:  .octa 0x
PSHUFB_XMM %xmm2, %xmm0
movdqu %xmm0, CurCount(%arg2) # ctx_data.current_counter = iv
 
-   PRECOMPUTE %xmm1 %xmm2 %xmm3 %xmm4 %xmm5 %xmm6 %xmm7
+   PRECOMPUTE \SUBKEY, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7,
movdqa HashKey(%arg2), %xmm13
 
-   CALC_AAD_HASH %xmm13 %xmm0 %xmm1 %xmm2 %xmm3 %xmm4 \
-   %xmm5 %xmm6
+   CALC_AAD_HASH %xmm13, \AAD, \AADLEN, %xmm0, %xmm1, %xmm2, %xmm3, \
+   %xmm4, %xmm5, %xmm6
 .endm
 
 # GCM_ENC_DEC Encodes/Decodes given data. Assumes that the passed gcm_context
@@ -435,7 +435,7 @@ _multiple_of_16_bytes_\@:
 # GCM_COMPLETE Finishes update of tag of last partial block
 # Output: Authorization Tag (AUTH_TAG)
 # Clobbers rax, r10-r12, and xmm0, xmm1, xmm5-xmm15
-.macro GCM_COMPLETE
+.macro GCM_COMPLETE AUTHTAG AUTHTAGLEN
movdqu AadHash(%arg2), %xmm8
movdqu HashKey(%arg2), %xmm13
 
@@ -466,8 +466,8 @@ _partial_done\@:
ENCRYPT_SINGLE_BLOCK%xmm0,  %xmm1 # E(K, Y0)
pxor%xmm8, %xmm0
 _return_T_\@:
-   mov arg10, %r10 # %r10 = authTag
-   mov arg11, %r11# %r11 = auth_tag_len
+   mov \AUTHTAG, %r10 # %r10 = authTag
+   mov \AUTHTAGLEN, %r11# %r11 = auth_tag_len
cmp $16, %r11
je  _T_16_\@
cmp $8, %r11
@@ -599,11 +599,11 @@ _done_read_partial_block_\@:
 
 # CALC_AAD_HASH: Calculates the hash of the data which will not be encrypted.
 # clobbers r10-11, xmm14
-.macro CALC_AAD_HASH HASHKEY TMP1 TMP2 TMP3 TMP4 TMP5 \
+.macro CALC_AAD_HASH HASHKEY AAD AADLEN TMP1 TMP2 TMP3 TMP4 TMP5 \
TMP6 TMP7
MOVADQ SHUF_MASK(%rip), %xmm14
-   movarg8, %r10   # %r10 = AAD
-   movarg9, %r11   # %r11 = aadLen
+   mov\AAD, %r10   # %r10 = AAD
+   mov\AADLEN, %r11# %r11 = aadLen
pxor   \TMP7, \TMP7
pxor   \TMP6, \TMP6
 
@@ -1103,18 +1103,18 @@ TMP6 XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 
operation
mov   keysize,%eax
shr   $2,%eax   # 128->4, 192->6, 256->8
sub   $4,%eax   # 128->0, 192->2, 256->4
-   jzaes_loop_par_enc_done
+   jzaes_loop_par_enc_done\@
 
-aes_loop_par_enc:
+aes_loop_par_enc\@:
MOVADQ(%r10),\TMP3
 .irpc  index, 1234
AESENC\TMP3, %xmm\index
 .endr
add   $16,%r10
sub   $1,%eax
-   jnz   aes_loop_par_enc
+   jnz   aes_loop_par_enc\@
 
-aes_loop_par_enc_done:
+aes_loop_par_enc_done\@:
MOVADQ(%r10), \TMP3
AESENCLAST \TMP3, \XMM1   # Round 10
AESENCLAST \TMP3, \XMM2
@@ -1311,18 +1311,18 @@ TMP6 XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 
operation
mov   keysize,%eax
shr   $2,%eax   # 128->4, 192->6, 256->8
sub   $4,%eax   # 128->0, 192->2, 256->4
-   jzaes_
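
The C-side declarations for these new entry points are only partially
visible in this excerpt; judging from how they are invoked in patch 14,
they look roughly like the sketch below.  Parameter names and types are
assumptions inferred from the call sites, not copied from the real header:

/* sketch only; the declarations in aesni-intel_glue.c may differ */
asmlinkage void aesni_gcm_init(void *ctx, struct gcm_context_data *gdata,
			       u8 *iv, u8 *hash_subkey,
			       const u8 *aad, unsigned long aad_len);
asmlinkage void aesni_gcm_enc_update(void *ctx, struct gcm_context_data *gdata,
				     u8 *out, const u8 *in,
				     unsigned long plaintext_len);
asmlinkage void aesni_gcm_dec_update(void *ctx, struct gcm_context_data *gdata,
				     u8 *out, const u8 *in,
				     unsigned long ciphertext_len);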

[PATCH v2 12/14] x86/crypto: aesni: Add fast path for > 16 byte update

2018-02-14 Thread Dave Watson
We can fast-path any < 16 byte read if the full message is > 16 bytes
by loading the last 16 bytes of the data and shifting over by the
appropriate amount.  Usually we are reading > 16 bytes, so in the average
case this should be faster than the READ_PARTIAL_BLOCK macro introduced in
b20209c91e2.

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S | 25 +
 1 file changed, 25 insertions(+)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index 398bd2237f..b941952 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -355,12 +355,37 @@ _zero_cipher_left_\@:
ENCRYPT_SINGLE_BLOCK%xmm0, %xmm1# Encrypt(K, Yn)
movdqu %xmm0, PBlockEncKey(%arg2)
 
+   cmp $16, %arg5
+   jge _large_enough_update_\@
+
lea (%arg4,%r11,1), %r10
mov %r13, %r12
READ_PARTIAL_BLOCK %r10 %r12 %xmm2 %xmm1
+   jmp _data_read_\@
+
+_large_enough_update_\@:
+   sub $16, %r11
+   add %r13, %r11
+
+   # receive the last <16 Byte block
+   movdqu  (%arg4, %r11, 1), %xmm1
 
+   sub %r13, %r11
+   add $16, %r11
+
+   lea SHIFT_MASK+16(%rip), %r12
+   # adjust the shuffle mask pointer to be able to shift 16-r13 bytes
+   # (r13 is the number of bytes in plaintext mod 16)
+   sub %r13, %r12
+   # get the appropriate shuffle mask
+   movdqu  (%r12), %xmm2
+   # shift right 16-r13 bytes
+   PSHUFB_XMM  %xmm2, %xmm1
+
+_data_read_\@:
lea ALL_F+16(%rip), %r12
sub %r13, %r12
+
 .ifc \operation, dec
movdqa  %xmm1, %xmm2
 .endif
-- 
2.9.5
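
The idea in rough C terms: when the full message is at least 16 bytes, the
final partial block can be fetched with a single unaligned 16-byte load that
ends exactly at the end of the data, then shifted so the partial bytes land
at the bottom of the register.  A minimal sketch (memcpy stands in for the
movdqu/PSHUFB pair; this is illustrative, not the kernel's code):

#include <stdint.h>
#include <string.h>

/* assumes msg_len >= 16 and msg_len % 16 != 0 */
static void read_last_partial_block(uint8_t block[16],
				    const uint8_t *msg, size_t msg_len)
{
	size_t rem = msg_len % 16;	/* bytes in the final partial block (%r13) */
	uint8_t tmp[16];

	/* one 16-byte load ending exactly at msg_len (movdqu in the asm) */
	memcpy(tmp, msg + msg_len - 16, 16);

	/* drop the leading 16 - rem bytes so only the partial block remains
	 * (the PSHUFB with the adjusted shuffle mask) */
	memset(block, 0, 16);
	memcpy(block, tmp + (16 - rem), rem);
}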



[PATCH v2 11/14] x86/crypto: aesni: Introduce partial block macro

2018-02-14 Thread Dave Watson
Before this diff, multiple calls to GCM_ENC_DEC would
succeed, but only if every call operated on a multiple of 16 bytes.

Handle partial blocks at the start of GCM_ENC_DEC, and update
aadhash as appropriate.

The data offset %r11 is also updated after the partial block.

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S | 151 +-
 1 file changed, 150 insertions(+), 1 deletion(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index 3ada06b..398bd2237f 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -284,7 +284,13 @@ ALL_F:  .octa 0x
movdqu AadHash(%arg2), %xmm8
movdqu HashKey(%arg2), %xmm13
add %arg5, InLen(%arg2)
+
+   xor %r11, %r11 # initialise the data pointer offset as zero
+   PARTIAL_BLOCK %arg3 %arg4 %arg5 %r11 %xmm8 \operation
+
+   sub %r11, %arg5 # sub partial block data used
mov %arg5, %r13 # save the number of bytes
+
and $-16, %r13  # %r13 = %r13 - (%r13 mod 16)
mov %r13, %r12
# Encrypt/Decrypt first few blocks
@@ -605,6 +611,150 @@ _get_AAD_done\@:
movdqu \TMP6, AadHash(%arg2)
 .endm
 
+# PARTIAL_BLOCK: Handles encryption/decryption and the tag partial blocks
+# between update calls.
+# Requires the input data be at least 1 byte long due to READ_PARTIAL_BLOCK
+# Outputs encrypted bytes, and updates hash and partial info in 
gcm_data_context
+# Clobbers rax, r10, r12, r13, xmm0-6, xmm9-13
+.macro PARTIAL_BLOCK CYPH_PLAIN_OUT PLAIN_CYPH_IN PLAIN_CYPH_LEN DATA_OFFSET \
+   AAD_HASH operation
+   mov PBlockLen(%arg2), %r13
+   cmp $0, %r13
+   je  _partial_block_done_\@  # Leave Macro if no partial blocks
+   # Read in input data without over reading
+   cmp $16, \PLAIN_CYPH_LEN
+   jl  _fewer_than_16_bytes_\@
+   movups  (\PLAIN_CYPH_IN), %xmm1 # If more than 16 bytes, just fill xmm
+   jmp _data_read_\@
+
+_fewer_than_16_bytes_\@:
+   lea (\PLAIN_CYPH_IN, \DATA_OFFSET, 1), %r10
+   mov \PLAIN_CYPH_LEN, %r12
+   READ_PARTIAL_BLOCK %r10 %r12 %xmm0 %xmm1
+
+   mov PBlockLen(%arg2), %r13
+
+_data_read_\@: # Finished reading in data
+
+   movdqu  PBlockEncKey(%arg2), %xmm9
+   movdqu  HashKey(%arg2), %xmm13
+
+   lea SHIFT_MASK(%rip), %r12
+
+   # adjust the shuffle mask pointer to be able to shift r13 bytes
+   # r16-r13 is the number of bytes in plaintext mod 16)
+   add %r13, %r12
+   movdqu  (%r12), %xmm2   # get the appropriate shuffle mask
+   PSHUFB_XMM %xmm2, %xmm9 # shift right r13 bytes
+
+.ifc \operation, dec
+   movdqa  %xmm1, %xmm3
+   pxor%xmm1, %xmm9# Cyphertext XOR E(K, Yn)
+
+   mov \PLAIN_CYPH_LEN, %r10
+   add %r13, %r10
+   # Set r10 to be the amount of data left in CYPH_PLAIN_IN after filling
+   sub $16, %r10
+   # Determine if if partial block is not being filled and
+   # shift mask accordingly
+   jge _no_extra_mask_1_\@
+   sub %r10, %r12
+_no_extra_mask_1_\@:
+
+   movdqu  ALL_F-SHIFT_MASK(%r12), %xmm1
+   # get the appropriate mask to mask out bottom r13 bytes of xmm9
+   pand%xmm1, %xmm9# mask out bottom r13 bytes of xmm9
+
+   pand%xmm1, %xmm3
+   movdqa  SHUF_MASK(%rip), %xmm10
+   PSHUFB_XMM  %xmm10, %xmm3
+   PSHUFB_XMM  %xmm2, %xmm3
+   pxor%xmm3, \AAD_HASH
+
+   cmp $0, %r10
+   jl  _partial_incomplete_1_\@
+
+   # GHASH computation for the last <16 Byte block
+   GHASH_MUL \AAD_HASH, %xmm13, %xmm0, %xmm10, %xmm11, %xmm5, %xmm6
+   xor %rax,%rax
+
+   mov %rax, PBlockLen(%arg2)
+   jmp _dec_done_\@
+_partial_incomplete_1_\@:
+   add \PLAIN_CYPH_LEN, PBlockLen(%arg2)
+_dec_done_\@:
+   movdqu  \AAD_HASH, AadHash(%arg2)
+.else
+   pxor%xmm1, %xmm9# Plaintext XOR E(K, Yn)
+
+   mov \PLAIN_CYPH_LEN, %r10
+   add %r13, %r10
+   # Set r10 to be the amount of data left in CYPH_PLAIN_IN after filling
+   sub $16, %r10
+   # Determine if if partial block is not being filled and
+   # shift mask accordingly
+   jge _no_extra_mask_2_\@
+   sub %r10, %r12
+_no_extra_mask_2_\@:
+
+   movdqu  ALL_F-SHIFT_MASK(%r12), %xmm1
+   # get the appropriate mask to mask out bottom r13 bytes of xmm9
+   pand%xmm1, %xmm9
+
+   movdqa  SHUF_MASK(%rip), %xmm1
+   PSHUFB_XMM %xmm1, %xmm9
+   PSHUFB_XMM %xmm2, %xmm9
+   pxor%xmm9, \AAD_HASH
+
+   cmp $0, %r10
+   jl  _partial_incomplete_2_\@
+
+   # GHASH computation for the last <16 Byte block
+   GHASH_MUL \AAD_HASH, %xmm13, %xmm0, %xmm10, %xmm11, %xmm5, %xmm6
+   xor %rax,%rax
+
+ 
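
Stripped down to C-level pseudocode, the bookkeeping PARTIAL_BLOCK adds at
the start of every GCM_ENC_DEC call looks like the sketch below (encrypt
direction only, simplified state, ghash_absorb() is an assumed helper that
folds a completed block into the hash the way GHASH_MUL does; the real asm
keeps the partial bytes masked into AadHash rather than in a separate
buffer):

#include <stdint.h>
#include <stddef.h>

/* assumed helper: hash = (hash ^ block) * H in GF(2^128) */
void ghash_absorb(uint8_t hash[16], const uint8_t block[16]);

struct partial_state {
	uint8_t aad_hash[16];              /* AadHash                    */
	uint8_t partial_block_enc_key[16]; /* saved E(K, Yn) keystream   */
	uint8_t partial_block[16];         /* ciphertext gathered so far */
	size_t  partial_block_length;      /* PBlockLen                  */
};

/* returns how far to advance the data offset (%r11) */
static size_t handle_partial_block(struct partial_state *s, uint8_t *out,
				   const uint8_t *in, size_t len)
{
	size_t need, take, i;

	if (!s->partial_block_length)
		return 0;		/* nothing left over from the last call */

	need = 16 - s->partial_block_length;
	take = len < need ? len : need;

	for (i = 0; i < take; i++) {
		size_t pos = s->partial_block_length + i;

		out[i] = in[i] ^ s->partial_block_enc_key[pos];
		s->partial_block[pos] = out[i];
	}
	s->partial_block_length += take;

	if (s->partial_block_length == 16) {
		/* block complete: fold the ciphertext into the GHASH */
		ghash_absorb(s->aad_hash, s->partial_block);
		s->partial_block_length = 0;
	}
	return take;
}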

[PATCH v2 10/14] x86/crypto: aesni: Move HashKey computation from stack to gcm_context

2018-02-14 Thread Dave Watson
HashKey computation only needs to happen once per scatter/gather operation,
so save it between calls in the gcm_context struct instead of on the stack.
Since the asm no longer stores anything on the stack, we can use
%rsp directly, and clean up the frame save/restore macros a bit.

Hash keys actually only need to be calculated once per key and could
be moved to when set_key is called; however, the current glue code
falls back to the generic AES code if the FPU is disabled.

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S | 205 --
 1 file changed, 106 insertions(+), 99 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index 37b1cee..3ada06b 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -93,23 +93,6 @@ ALL_F:  .octa 0x
 
 
 #defineSTACK_OFFSET8*3
-#defineHashKey 16*0// store HashKey <<1 mod poly here
-#defineHashKey_2   16*1// store HashKey^2 <<1 mod poly here
-#defineHashKey_3   16*2// store HashKey^3 <<1 mod poly here
-#defineHashKey_4   16*3// store HashKey^4 <<1 mod poly here
-#defineHashKey_k   16*4// store XOR of High 64 bits and Low 64
-   // bits of  HashKey <<1 mod poly here
-   //(for Karatsuba purposes)
-#defineHashKey_2_k 16*5// store XOR of High 64 bits and Low 64
-   // bits of  HashKey^2 <<1 mod poly here
-   // (for Karatsuba purposes)
-#defineHashKey_3_k 16*6// store XOR of High 64 bits and Low 64
-   // bits of  HashKey^3 <<1 mod poly here
-   // (for Karatsuba purposes)
-#defineHashKey_4_k 16*7// store XOR of High 64 bits and Low 64
-   // bits of  HashKey^4 <<1 mod poly here
-   // (for Karatsuba purposes)
-#defineVARIABLE_OFFSET 16*8
 
 #define AadHash 16*0
 #define AadLen 16*1
@@ -118,6 +101,22 @@ ALL_F:  .octa 0x
 #define OrigIV 16*3
 #define CurCount 16*4
 #define PBlockLen 16*5
+#defineHashKey 16*6// store HashKey <<1 mod poly here
+#defineHashKey_2   16*7// store HashKey^2 <<1 mod poly here
+#defineHashKey_3   16*8// store HashKey^3 <<1 mod poly here
+#defineHashKey_4   16*9// store HashKey^4 <<1 mod poly here
+#defineHashKey_k   16*10   // store XOR of High 64 bits and Low 64
+   // bits of  HashKey <<1 mod poly here
+   //(for Karatsuba purposes)
+#defineHashKey_2_k 16*11   // store XOR of High 64 bits and Low 64
+   // bits of  HashKey^2 <<1 mod poly here
+   // (for Karatsuba purposes)
+#defineHashKey_3_k 16*12   // store XOR of High 64 bits and Low 64
+   // bits of  HashKey^3 <<1 mod poly here
+   // (for Karatsuba purposes)
+#defineHashKey_4_k 16*13   // store XOR of High 64 bits and Low 64
+   // bits of  HashKey^4 <<1 mod poly here
+   // (for Karatsuba purposes)
 
 #define arg1 rdi
 #define arg2 rsi
@@ -125,11 +124,11 @@ ALL_F:  .octa 0x
 #define arg4 rcx
 #define arg5 r8
 #define arg6 r9
-#define arg7 STACK_OFFSET+8(%r14)
-#define arg8 STACK_OFFSET+16(%r14)
-#define arg9 STACK_OFFSET+24(%r14)
-#define arg10 STACK_OFFSET+32(%r14)
-#define arg11 STACK_OFFSET+40(%r14)
+#define arg7 STACK_OFFSET+8(%rsp)
+#define arg8 STACK_OFFSET+16(%rsp)
+#define arg9 STACK_OFFSET+24(%rsp)
+#define arg10 STACK_OFFSET+32(%rsp)
+#define arg11 STACK_OFFSET+40(%rsp)
 #define keysize 2*15*16(%arg1)
 #endif
 
@@ -183,28 +182,79 @@ ALL_F:  .octa 0x
push%r12
push%r13
push%r14
-   mov %rsp, %r14
 #
 # states of %xmm registers %xmm6:%xmm15 not saved
 # all %xmm registers are clobbered
 #
-   sub $VARIABLE_OFFSET, %rsp
-   and $~63, %rsp
 .endm
 
 
 .macro FUNC_RESTORE
-   mov %r14, %rsp
pop %r14
pop %r13
pop %r12
 .endm
 
+# Precompute hashkeys.
+# Input: Hash subkey.
+# Output: HashKeys stored in gcm_context_data.  Only needs to be called
+# once per key.
+# clobbers r12, and tmp xmm registers.
+.macro PRECOMPUTE TMP1 TMP2 TMP3 TMP4 TMP5 TMP6 TMP7
+   mov arg7, %r12
+   movdqu  (%r12), \TMP3
+   movdqa  SHUF_MASK(%rip), \TMP2
+   PSHUFB_XMM \TMP2, \TMP3
+
+   # precompute HashKey<<1 mod poly from the HashKey (required for GHASH)
+
+   movdqa  \TMP3, \TMP2
+   psllq   $1, \TMP3
+   psrlq   $6

[PATCH v2 01/14] x86/crypto: aesni: Merge INITIAL_BLOCKS_ENC/DEC

2018-02-14 Thread Dave Watson
Use macro operations to merge the implementations of INITIAL_BLOCKS,
since they differ by only a small handful of lines.

Use the macro counter \@ to simplify the implementation.

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S | 298 ++
 1 file changed, 48 insertions(+), 250 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index 76d8cd4..48911fe 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -275,234 +275,7 @@ _done_read_partial_block_\@:
 */
 
 
-.macro INITIAL_BLOCKS_DEC num_initial_blocks TMP1 TMP2 TMP3 TMP4 TMP5 XMM0 
XMM1 \
-XMM2 XMM3 XMM4 XMMDst TMP6 TMP7 i i_seq operation
-MOVADQ SHUF_MASK(%rip), %xmm14
-   movarg7, %r10   # %r10 = AAD
-   movarg8, %r11   # %r11 = aadLen
-   pxor   %xmm\i, %xmm\i
-   pxor   \XMM2, \XMM2
-
-   cmp$16, %r11
-   jl _get_AAD_rest\num_initial_blocks\operation
-_get_AAD_blocks\num_initial_blocks\operation:
-   movdqu (%r10), %xmm\i
-   PSHUFB_XMM %xmm14, %xmm\i # byte-reflect the AAD data
-   pxor   %xmm\i, \XMM2
-   GHASH_MUL  \XMM2, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
-   add$16, %r10
-   sub$16, %r11
-   cmp$16, %r11
-   jge_get_AAD_blocks\num_initial_blocks\operation
-
-   movdqu \XMM2, %xmm\i
-
-   /* read the last <16B of AAD */
-_get_AAD_rest\num_initial_blocks\operation:
-   cmp$0, %r11
-   je _get_AAD_done\num_initial_blocks\operation
-
-   READ_PARTIAL_BLOCK %r10, %r11, \TMP1, %xmm\i
-   PSHUFB_XMM   %xmm14, %xmm\i # byte-reflect the AAD data
-   pxor   \XMM2, %xmm\i
-   GHASH_MUL  %xmm\i, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
-
-_get_AAD_done\num_initial_blocks\operation:
-   xor%r11, %r11 # initialise the data pointer offset as zero
-   # start AES for num_initial_blocks blocks
-
-   mov%arg5, %rax  # %rax = *Y0
-   movdqu (%rax), \XMM0# XMM0 = Y0
-   PSHUFB_XMM   %xmm14, \XMM0
-
-.if (\i == 5) || (\i == 6) || (\i == 7)
-   MOVADQ  ONE(%RIP),\TMP1
-   MOVADQ  (%arg1),\TMP2
-.irpc index, \i_seq
-   paddd  \TMP1, \XMM0 # INCR Y0
-   movdqa \XMM0, %xmm\index
-   PSHUFB_XMM   %xmm14, %xmm\index  # perform a 16 byte swap
-   pxor   \TMP2, %xmm\index
-.endr
-   lea 0x10(%arg1),%r10
-   mov keysize,%eax
-   shr $2,%eax # 128->4, 192->6, 256->8
-   add $5,%eax   # 128->9, 192->11, 256->13
-
-aes_loop_initial_dec\num_initial_blocks:
-   MOVADQ  (%r10),\TMP1
-.irpc  index, \i_seq
-   AESENC  \TMP1, %xmm\index
-.endr
-   add $16,%r10
-   sub $1,%eax
-   jnz aes_loop_initial_dec\num_initial_blocks
-
-   MOVADQ  (%r10), \TMP1
-.irpc index, \i_seq
-   AESENCLAST \TMP1, %xmm\index # Last Round
-.endr
-.irpc index, \i_seq
-   movdqu (%arg3 , %r11, 1), \TMP1
-   pxor   \TMP1, %xmm\index
-   movdqu %xmm\index, (%arg2 , %r11, 1)
-   # write back plaintext/ciphertext for num_initial_blocks
-   add$16, %r11
-
-   movdqa \TMP1, %xmm\index
-   PSHUFB_XMM %xmm14, %xmm\index
-# prepare plaintext/ciphertext for GHASH computation
-.endr
-.endif
-
-# apply GHASH on num_initial_blocks blocks
-
-.if \i == 5
-pxor   %xmm5, %xmm6
-   GHASH_MUL  %xmm6, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
-pxor   %xmm6, %xmm7
-   GHASH_MUL  %xmm7, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
-pxor   %xmm7, %xmm8
-   GHASH_MUL  %xmm8, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
-.elseif \i == 6
-pxor   %xmm6, %xmm7
-   GHASH_MUL  %xmm7, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
-pxor   %xmm7, %xmm8
-   GHASH_MUL  %xmm8, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
-.elseif \i == 7
-pxor   %xmm7, %xmm8
-   GHASH_MUL  %xmm8, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
-.endif
-   cmp$64, %r13
-   jl  _initial_blocks_done\num_initial_blocks\operation
-   # no need for precomputed values
-/*
-*
-* Precomputations for HashKey parallel with encryption of first 4 blocks.
-* Haskey_i_k holds XORed values of the low and high parts of the Haskey_i
-*/
-   MOVADQ ONE(%rip), \TMP1
-   paddd  \TMP1, \XMM0  # INCR Y0
-   MOVADQ \XMM0, \XMM1
-   PSHUFB_XMM  %xmm14, \XMM1# perform a 16 byte swap
-
-   paddd  \TMP1, \XMM0  # INCR Y0
-   MOVADQ \XMM0, \XMM2
-   PSHUFB_XMM  %xmm14, \XMM2# perform a 16 byte swap
-
-   paddd  \TMP1, \XMM0  # INCR Y0
-   MOVADQ \XMM0, \XMM3
-   PSHUFB_XMM %xmm14, \X

[PATCH v2 02/14] x86/crypto: aesni: Macro-ify func save/restore

2018-02-14 Thread Dave Watson
Macro-ify function save and restore.  These will be used in new functions
added for scatter/gather update operations.

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S | 53 ++-
 1 file changed, 24 insertions(+), 29 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index 48911fe..39b42b1 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -170,6 +170,26 @@ ALL_F:  .octa 0x
 #define TKEYP  T1
 #endif
 
+.macro FUNC_SAVE
+   push%r12
+   push%r13
+   push%r14
+   mov %rsp, %r14
+#
+# states of %xmm registers %xmm6:%xmm15 not saved
+# all %xmm registers are clobbered
+#
+   sub $VARIABLE_OFFSET, %rsp
+   and $~63, %rsp
+.endm
+
+
+.macro FUNC_RESTORE
+   mov %r14, %rsp
+   pop %r14
+   pop %r13
+   pop %r12
+.endm
 
 #ifdef __x86_64__
 /* GHASH_MUL MACRO to implement: Data*HashKey mod (128,127,126,121,0)
@@ -1130,16 +1150,7 @@ _esb_loop_\@:
 *
 */
 ENTRY(aesni_gcm_dec)
-   push%r12
-   push%r13
-   push%r14
-   mov %rsp, %r14
-/*
-* states of %xmm registers %xmm6:%xmm15 not saved
-* all %xmm registers are clobbered
-*/
-   sub $VARIABLE_OFFSET, %rsp
-   and $~63, %rsp# align rsp to 64 bytes
+   FUNC_SAVE
mov %arg6, %r12
movdqu  (%r12), %xmm13# %xmm13 = HashKey
 movdqa  SHUF_MASK(%rip), %xmm2
@@ -1309,10 +1320,7 @@ _T_1_decrypt:
 _T_16_decrypt:
movdqu  %xmm0, (%r10)
 _return_T_done_decrypt:
-   mov %r14, %rsp
-   pop %r14
-   pop %r13
-   pop %r12
+   FUNC_RESTORE
ret
 ENDPROC(aesni_gcm_dec)
 
@@ -1393,22 +1401,12 @@ ENDPROC(aesni_gcm_dec)
 * poly = x^128 + x^127 + x^126 + x^121 + 1
 ***/
 ENTRY(aesni_gcm_enc)
-   push%r12
-   push%r13
-   push%r14
-   mov %rsp, %r14
-#
-# states of %xmm registers %xmm6:%xmm15 not saved
-# all %xmm registers are clobbered
-#
-   sub $VARIABLE_OFFSET, %rsp
-   and $~63, %rsp
+   FUNC_SAVE
mov %arg6, %r12
movdqu  (%r12), %xmm13
 movdqa  SHUF_MASK(%rip), %xmm2
PSHUFB_XMM %xmm2, %xmm13
 
-
 # precompute HashKey<<1 mod poly from the HashKey (required for GHASH)
 
movdqa  %xmm13, %xmm2
@@ -1576,10 +1574,7 @@ _T_1_encrypt:
 _T_16_encrypt:
movdqu  %xmm0, (%r10)
 _return_T_done_encrypt:
-   mov %r14, %rsp
-   pop %r14
-   pop %r13
-   pop %r12
+   FUNC_RESTORE
ret
 ENDPROC(aesni_gcm_enc)
 
-- 
2.9.5



[PATCH v2 00/14] x86/crypto gcmaes SSE scatter/gather support

2018-02-14 Thread Dave Watson
This patch set refactors the x86 aes/gcm SSE crypto routines to
support true scatter/gather by adding gcm_enc/dec_update methods.

The layout is:

* The first 5 patches refactor the code to use macros, so changes only
  need to be applied once for encode and decode.  There should be no
  functional changes.

* The next 6 patches introduce a gcm_context structure to be passed
  between scatter/gather calls to maintain state.  The struct is also
  used as scratch space for the existing enc/dec routines.

* The last 2 patches set up the asm function entry points for scatter/gather
  support, and then call the new routines per buffer in the passed-in
  sglist in aesni-intel_glue; a condensed sketch of the resulting call flow
  follows below.
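
Put together, the per-request flow in the glue code ends up looking roughly
like this condensed sketch of patch 14 (the finalize step corresponds to
GCM_COMPLETE; unmapping/advancing of the scatter walks and the src == dst
case are elided):

	kernel_fpu_begin();
	aesni_gcm_init(aes_ctx, &data, iv, hash_subkey, assoc, assoclen);
	while (left) {			/* one update call per sg buffer */
		src = scatterwalk_map(&src_sg_walk);
		dst = scatterwalk_map(&dst_sg_walk);
		len = min(scatterwalk_clamp(&src_sg_walk, left),
			  scatterwalk_clamp(&dst_sg_walk, left));
		if (enc)
			aesni_gcm_enc_update(aes_ctx, &data, dst, src, len);
		else
			aesni_gcm_dec_update(aes_ctx, &data, dst, src, len);
		left -= len;
		/* ... unmap src/dst and advance both scatter walks ... */
	}
	/* finalize: produce (enc) or verify (dec) the auth tag from &data */
	kernel_fpu_end();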

Testing: 
asm itself fuzz tested vs. existing code and isa-l asm.
Ran libkcapi test suite, passes.

perf of large (16k message) TLS sends, sg vs. no sg:

no-sg

33287255597  cycles  
53702871176  instructions

43.47%   _crypt_by_4
17.83%   memcpy
16.36%   aes_loop_par_enc_done

sg

27568944591  cycles 
54580446678  instructions

49.87%   _crypt_by_4
17.40%   aes_loop_par_enc_done
1.79%aes_loop_initial_5416
1.52%aes_loop_initial_4974
1.27%gcmaes_encrypt_sg.constprop.15

V1 -> V2:

patch 14: merge enc/dec
  also use new routine if cryptlen < AVX_GEN2_OPTSIZE
  optimize case if assoc is already linear

Dave Watson (14):
  x86/crypto: aesni: Merge INITIAL_BLOCKS_ENC/DEC
  x86/crypto: aesni: Macro-ify func save/restore
  x86/crypto: aesni: Add GCM_INIT macro
  x86/crypto: aesni: Add GCM_COMPLETE macro
  x86/crypto: aesni: Merge encode and decode to GCM_ENC_DEC macro
  x86/crypto: aesni: Introduce gcm_context_data
  x86/crypto: aesni: Split AAD hash calculation to separate macro
  x86/crypto: aesni: Fill in new context data structures
  x86/crypto: aesni: Move ghash_mul to GCM_COMPLETE
  x86/crypto: aesni: Move HashKey computation from stack to gcm_context
  x86/crypto: aesni: Introduce partial block macro
  x86/crypto: aesni: Add fast path for > 16 byte update
  x86/crypto: aesni: Introduce scatter/gather asm function stubs
  x86/crypto: aesni: Update aesni-intel_glue to use scatter/gather

 arch/x86/crypto/aesni-intel_asm.S  | 1414 ++--
 arch/x86/crypto/aesni-intel_glue.c |  230 +-
 2 files changed, 899 insertions(+), 745 deletions(-)

-- 
2.9.5



Re: [PATCH] crypto: nx-842: Delete an error message for a failed memory allocation in nx842_pseries_init()

2018-02-14 Thread Dan Streetman
On Wed, Feb 14, 2018 at 11:17 AM, SF Markus Elfring
 wrote:
> From: Markus Elfring 
> Date: Wed, 14 Feb 2018 17:05:13 +0100
>
> Omit an extra message for a memory allocation failure in this function.
>
> This issue was detected by using the Coccinelle software.
>
> Signed-off-by: Markus Elfring 

Reviewed-by: Dan Streetman 

> ---
>  drivers/crypto/nx/nx-842-pseries.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/crypto/nx/nx-842-pseries.c 
> b/drivers/crypto/nx/nx-842-pseries.c
> index bf52cd1d7fca..66869976cfa2 100644
> --- a/drivers/crypto/nx/nx-842-pseries.c
> +++ b/drivers/crypto/nx/nx-842-pseries.c
> @@ -1105,10 +1105,9 @@ static int __init nx842_pseries_init(void)
>
> RCU_INIT_POINTER(devdata, NULL);
> new_devdata = kzalloc(sizeof(*new_devdata), GFP_KERNEL);
> -   if (!new_devdata) {
> -   pr_err("Could not allocate memory for device data\n");
> +   if (!new_devdata)
> return -ENOMEM;
> -   }
> +
> RCU_INIT_POINTER(devdata, new_devdata);
>
> ret = vio_register_driver(&nx842_vio_driver);
> --
> 2.16.1
>


[PATCH] crypto: nx-842: Delete an error message for a failed memory allocation in nx842_pseries_init()

2018-02-14 Thread SF Markus Elfring
From: Markus Elfring 
Date: Wed, 14 Feb 2018 17:05:13 +0100

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/crypto/nx/nx-842-pseries.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/nx/nx-842-pseries.c 
b/drivers/crypto/nx/nx-842-pseries.c
index bf52cd1d7fca..66869976cfa2 100644
--- a/drivers/crypto/nx/nx-842-pseries.c
+++ b/drivers/crypto/nx/nx-842-pseries.c
@@ -1105,10 +1105,9 @@ static int __init nx842_pseries_init(void)
 
RCU_INIT_POINTER(devdata, NULL);
new_devdata = kzalloc(sizeof(*new_devdata), GFP_KERNEL);
-   if (!new_devdata) {
-   pr_err("Could not allocate memory for device data\n");
+   if (!new_devdata)
return -ENOMEM;
-   }
+
RCU_INIT_POINTER(devdata, new_devdata);
 
ret = vio_register_driver(&nx842_vio_driver);
-- 
2.16.1



Re: [PATCH v2 4/6] crypto: virtio: convert to new crypto engine API

2018-02-14 Thread Michael S. Tsirkin
On Fri, Jan 26, 2018 at 08:15:32PM +0100, Corentin Labbe wrote:
> This patch convert the driver to the new crypto engine API.
> 
> Signed-off-by: Corentin Labbe 

Acked-by: Michael S. Tsirkin 

Pls queue when/if rest of changes go in.

> ---
>  drivers/crypto/virtio/virtio_crypto_algs.c   | 16 ++--
>  drivers/crypto/virtio/virtio_crypto_common.h |  3 +--
>  drivers/crypto/virtio/virtio_crypto_core.c   |  3 ---
>  3 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c 
> b/drivers/crypto/virtio/virtio_crypto_algs.c
> index abe8c15450df..ba190cfa7aa1 100644
> --- a/drivers/crypto/virtio/virtio_crypto_algs.c
> +++ b/drivers/crypto/virtio/virtio_crypto_algs.c
> @@ -29,6 +29,7 @@
>  
>  
>  struct virtio_crypto_ablkcipher_ctx {
> + struct crypto_engine_ctx enginectx;
>   struct virtio_crypto *vcrypto;
>   struct crypto_tfm *tfm;
>  
> @@ -491,7 +492,7 @@ static int virtio_crypto_ablkcipher_encrypt(struct 
> ablkcipher_request *req)
>   vc_sym_req->ablkcipher_req = req;
>   vc_sym_req->encrypt = true;
>  
> - return crypto_transfer_cipher_request_to_engine(data_vq->engine, req);
> + return crypto_transfer_ablkcipher_request_to_engine(data_vq->engine, 
> req);
>  }
>  
>  static int virtio_crypto_ablkcipher_decrypt(struct ablkcipher_request *req)
> @@ -511,7 +512,7 @@ static int virtio_crypto_ablkcipher_decrypt(struct 
> ablkcipher_request *req)
>   vc_sym_req->ablkcipher_req = req;
>   vc_sym_req->encrypt = false;
>  
> - return crypto_transfer_cipher_request_to_engine(data_vq->engine, req);
> + return crypto_transfer_ablkcipher_request_to_engine(data_vq->engine, 
> req);
>  }
>  
>  static int virtio_crypto_ablkcipher_init(struct crypto_tfm *tfm)
> @@ -521,6 +522,9 @@ static int virtio_crypto_ablkcipher_init(struct 
> crypto_tfm *tfm)
>   tfm->crt_ablkcipher.reqsize = sizeof(struct virtio_crypto_sym_request);
>   ctx->tfm = tfm;
>  
> + ctx->enginectx.op.do_one_request = virtio_crypto_ablkcipher_crypt_req;
> + ctx->enginectx.op.prepare_request = NULL;
> + ctx->enginectx.op.unprepare_request = NULL;
>   return 0;
>  }
>  
> @@ -538,9 +542,9 @@ static void virtio_crypto_ablkcipher_exit(struct 
> crypto_tfm *tfm)
>  }
>  
>  int virtio_crypto_ablkcipher_crypt_req(
> - struct crypto_engine *engine,
> - struct ablkcipher_request *req)
> + struct crypto_engine *engine, void *vreq)
>  {
> + struct ablkcipher_request *req = container_of(vreq, struct 
> ablkcipher_request, base);
>   struct virtio_crypto_sym_request *vc_sym_req =
>   ablkcipher_request_ctx(req);
>   struct virtio_crypto_request *vc_req = &vc_sym_req->base;
> @@ -561,8 +565,8 @@ static void virtio_crypto_ablkcipher_finalize_req(
>   struct ablkcipher_request *req,
>   int err)
>  {
> - crypto_finalize_cipher_request(vc_sym_req->base.dataq->engine,
> - req, err);
> + crypto_finalize_ablkcipher_request(vc_sym_req->base.dataq->engine,
> +req, err);
>   kzfree(vc_sym_req->iv);
>   virtcrypto_clear_request(&vc_sym_req->base);
>  }
> diff --git a/drivers/crypto/virtio/virtio_crypto_common.h 
> b/drivers/crypto/virtio/virtio_crypto_common.h
> index e976539a05d9..72621bd67211 100644
> --- a/drivers/crypto/virtio/virtio_crypto_common.h
> +++ b/drivers/crypto/virtio/virtio_crypto_common.h
> @@ -107,8 +107,7 @@ struct virtio_crypto *virtcrypto_get_dev_node(int node);
>  int virtcrypto_dev_start(struct virtio_crypto *vcrypto);
>  void virtcrypto_dev_stop(struct virtio_crypto *vcrypto);
>  int virtio_crypto_ablkcipher_crypt_req(
> - struct crypto_engine *engine,
> - struct ablkcipher_request *req);
> + struct crypto_engine *engine, void *vreq);
>  
>  void
>  virtcrypto_clear_request(struct virtio_crypto_request *vc_req);
> diff --git a/drivers/crypto/virtio/virtio_crypto_core.c 
> b/drivers/crypto/virtio/virtio_crypto_core.c
> index ff1410a32c2b..83326986c113 100644
> --- a/drivers/crypto/virtio/virtio_crypto_core.c
> +++ b/drivers/crypto/virtio/virtio_crypto_core.c
> @@ -111,9 +111,6 @@ static int virtcrypto_find_vqs(struct virtio_crypto *vi)
>   ret = -ENOMEM;
>   goto err_engine;
>   }
> -
> - vi->data_vq[i].engine->cipher_one_request =
> - virtio_crypto_ablkcipher_crypt_req;
>   }
>  
>   kfree(names);
> -- 
> 2.13.6


[PATCH 2/2] crypto: omap: Improve a size determination in three functions

2018-02-14 Thread SF Markus Elfring
From: Markus Elfring 
Date: Wed, 14 Feb 2018 16:12:05 +0100

Replace the specification of data structures by pointer dereferences
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/crypto/omap-aes.c  | 3 +--
 drivers/crypto/omap-des.c  | 3 +--
 drivers/crypto/omap-sham.c | 3 +--
 3 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c
index a2bac3b869b6..82282a5e9b3f 100644
--- a/drivers/crypto/omap-aes.c
+++ b/drivers/crypto/omap-aes.c
@@ -1032,14 +1032,13 @@ static int omap_aes_get_res_pdev(struct omap_aes_dev 
*dd,
 static int omap_aes_probe(struct platform_device *pdev)
 {
struct device *dev = &pdev->dev;
-   struct omap_aes_dev *dd;
struct crypto_alg *algp;
struct aead_alg *aalg;
struct resource res;
int err = -ENOMEM, i, j, irq = -1;
u32 reg;
+   struct omap_aes_dev *dd = devm_kzalloc(dev, sizeof(*dd), GFP_KERNEL);
 
-   dd = devm_kzalloc(dev, sizeof(struct omap_aes_dev), GFP_KERNEL);
if (!dd)
goto err_data;
 
diff --git a/drivers/crypto/omap-des.c b/drivers/crypto/omap-des.c
index f4199be783a9..09833709fbed 100644
--- a/drivers/crypto/omap-des.c
+++ b/drivers/crypto/omap-des.c
@@ -957,13 +957,12 @@ static int omap_des_get_pdev(struct omap_des_dev *dd,
 static int omap_des_probe(struct platform_device *pdev)
 {
struct device *dev = &pdev->dev;
-   struct omap_des_dev *dd;
struct crypto_alg *algp;
struct resource *res;
int err = -ENOMEM, i, j, irq = -1;
u32 reg;
+   struct omap_des_dev *dd = devm_kzalloc(dev, sizeof(*dd), GFP_KERNEL);
 
-   dd = devm_kzalloc(dev, sizeof(struct omap_des_dev), GFP_KERNEL);
if (!dd)
goto err_data;
 
diff --git a/drivers/crypto/omap-sham.c b/drivers/crypto/omap-sham.c
index 7aa4eb50ebc9..ffa3ac3bde55 100644
--- a/drivers/crypto/omap-sham.c
+++ b/drivers/crypto/omap-sham.c
@@ -2015,14 +2015,13 @@ static int omap_sham_get_res_pdev(struct omap_sham_dev 
*dd,
 
 static int omap_sham_probe(struct platform_device *pdev)
 {
-   struct omap_sham_dev *dd;
struct device *dev = &pdev->dev;
struct resource res;
dma_cap_mask_t mask;
int err, i, j;
u32 rev;
+   struct omap_sham_dev *dd = devm_kzalloc(dev, sizeof(*dd), GFP_KERNEL);
 
-   dd = devm_kzalloc(dev, sizeof(struct omap_sham_dev), GFP_KERNEL);
if (dd == NULL) {
err = -ENOMEM;
goto data_err;
-- 
2.16.1



[PATCH 1/2] crypto: omap: Delete an error message for a failed memory allocation in three functions

2018-02-14 Thread SF Markus Elfring
From: Markus Elfring 
Date: Wed, 14 Feb 2018 16:00:33 +0100

Omit an extra message for a memory allocation failure in these functions.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/crypto/omap-aes.c  | 5 ++---
 drivers/crypto/omap-des.c  | 5 ++---
 drivers/crypto/omap-sham.c | 1 -
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c
index fbec0a2e76dd..a2bac3b869b6 100644
--- a/drivers/crypto/omap-aes.c
+++ b/drivers/crypto/omap-aes.c
@@ -1040,10 +1040,9 @@ static int omap_aes_probe(struct platform_device *pdev)
u32 reg;
 
dd = devm_kzalloc(dev, sizeof(struct omap_aes_dev), GFP_KERNEL);
-   if (dd == NULL) {
-   dev_err(dev, "unable to alloc data struct.\n");
+   if (!dd)
goto err_data;
-   }
+
dd->dev = dev;
platform_set_drvdata(pdev, dd);
 
diff --git a/drivers/crypto/omap-des.c b/drivers/crypto/omap-des.c
index ebc5c0f11f03..f4199be783a9 100644
--- a/drivers/crypto/omap-des.c
+++ b/drivers/crypto/omap-des.c
@@ -964,10 +964,9 @@ static int omap_des_probe(struct platform_device *pdev)
u32 reg;
 
dd = devm_kzalloc(dev, sizeof(struct omap_des_dev), GFP_KERNEL);
-   if (dd == NULL) {
-   dev_err(dev, "unable to alloc data struct.\n");
+   if (!dd)
goto err_data;
-   }
+
dd->dev = dev;
platform_set_drvdata(pdev, dd);
 
diff --git a/drivers/crypto/omap-sham.c b/drivers/crypto/omap-sham.c
index 86b89ace836f..7aa4eb50ebc9 100644
--- a/drivers/crypto/omap-sham.c
+++ b/drivers/crypto/omap-sham.c
@@ -2024,7 +2024,6 @@ static int omap_sham_probe(struct platform_device *pdev)
 
dd = devm_kzalloc(dev, sizeof(struct omap_sham_dev), GFP_KERNEL);
if (dd == NULL) {
-   dev_err(dev, "unable to alloc data struct.\n");
err = -ENOMEM;
goto data_err;
}
-- 
2.16.1



[PATCH 0/2] crypto/omap: Adjustments for three function implementations

2018-02-14 Thread SF Markus Elfring
From: Markus Elfring 
Date: Wed, 14 Feb 2018 16:18:19 +0100

Two update suggestions were taken into account
from static source code analysis.

Markus Elfring (2):
  Delete error messages for a failed memory allocation
  Improve size determinations

 drivers/crypto/omap-aes.c  | 8 +++-
 drivers/crypto/omap-des.c  | 8 +++-
 drivers/crypto/omap-sham.c | 4 +---
 3 files changed, 7 insertions(+), 13 deletions(-)

-- 
2.16.1



Re: [PATCH v3 2/3] MIPS: crypto: Add crc32 and crc32c hw accelerated module

2018-02-14 Thread James Hogan
Hi crypto folk,

On Fri, Feb 09, 2018 at 10:11:06PM +, James Hogan wrote:
> From: Marcin Nowakowski 
> 
> This module registers crc32 and crc32c algorithms that use the
> optional CRC32[bhwd] and CRC32C[bhwd] instructions in MIPSr6 cores.
> 
> Signed-off-by: Marcin Nowakowski 
> Signed-off-by: James Hogan 
> Cc: Ralf Baechle 
> Cc: Herbert Xu 
> Cc: "David S. Miller" 
> Cc: linux-m...@linux-mips.org
> Cc: linux-crypto@vger.kernel.org

I don't think any version of this patch has had any feedback from the
crypto side. Can some review or an ack be expected?

Thanks
James

> ---
> Changes in v3:
>  - Convert to using assembler macros to support CRC instructions on
>older toolchains, using the helpers merged for 4.16. This removes the
>need to hardcode either rt or rs (i.e. as $v0 (CRC_REGISTER) and
>$at), and drops the C "register" keywords sprinkled everywhere.
>  - Minor whitespace rearrangement of _CRC32 macro.
>  - Add SPDX-License-Identifier to crc32-mips.c and the crypo Makefile.
>  - Update copyright from ImgTec to MIPS Tech, LLC.
>  - Update imgtec.com email addresses to mips.com.
> 
> Changes in v2:
>  - minor code refactoring as suggested by JamesH which produces
>a better assembly output for 32-bit builds
> ---
>  arch/mips/Kconfig |   4 +-
>  arch/mips/Makefile|   3 +-
>  arch/mips/crypto/Makefile |   6 +-
>  arch/mips/crypto/crc32-mips.c | 346 +++-
>  crypto/Kconfig|   9 +-
>  5 files changed, 368 insertions(+)
>  create mode 100644 arch/mips/crypto/Makefile
>  create mode 100644 arch/mips/crypto/crc32-mips.c
> 
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index ac0f5bb10f0b..cccd17c07bfc 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -2023,6 +2023,7 @@ config CPU_MIPSR6
>   select CPU_HAS_RIXI
>   select HAVE_ARCH_BITREVERSE
>   select MIPS_ASID_BITS_VARIABLE
> + select MIPS_CRC_SUPPORT
>   select MIPS_SPRAM
>  
>  config EVA
> @@ -2490,6 +2491,9 @@ config MIPS_ASID_BITS
>  config MIPS_ASID_BITS_VARIABLE
>   bool
>  
> +config MIPS_CRC_SUPPORT
> + bool
> +
>  #
>  # - Highmem only makes sense for the 32-bit kernel.
>  # - The current highmem code will only work properly on physically indexed
> diff --git a/arch/mips/Makefile b/arch/mips/Makefile
> index d1ca839c3981..44a6ed53d018 100644
> --- a/arch/mips/Makefile
> +++ b/arch/mips/Makefile
> @@ -222,6 +222,8 @@ xpa-cflags-y  := 
> $(mips-cflags)
>  xpa-cflags-$(micromips-ase)  += -mmicromips 
> -Wa$(comma)-fatal-warnings
>  toolchain-xpa:= $(call 
> cc-option-yn,$(xpa-cflags-y) -mxpa)
>  cflags-$(toolchain-xpa)  += -DTOOLCHAIN_SUPPORTS_XPA
> +toolchain-crc:= $(call 
> cc-option-yn,$(mips-cflags) -Wa$(comma)-mcrc)
> +cflags-$(toolchain-crc)  += -DTOOLCHAIN_SUPPORTS_CRC
>  
>  #
>  # Firmware support
> @@ -330,6 +332,7 @@ libs-y+= arch/mips/math-emu/
>  # See arch/mips/Kbuild for content of core part of the kernel
>  core-y += arch/mips/
>  
> +drivers-$(CONFIG_MIPS_CRC_SUPPORT) += arch/mips/crypto/
>  drivers-$(CONFIG_OPROFILE)   += arch/mips/oprofile/
>  
>  # suspend and hibernation support
> diff --git a/arch/mips/crypto/Makefile b/arch/mips/crypto/Makefile
> new file mode 100644
> index ..e07aca572c2e
> --- /dev/null
> +++ b/arch/mips/crypto/Makefile
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Makefile for MIPS crypto files..
> +#
> +
> +obj-$(CONFIG_CRYPTO_CRC32_MIPS) += crc32-mips.o
> diff --git a/arch/mips/crypto/crc32-mips.c b/arch/mips/crypto/crc32-mips.c
> new file mode 100644
> index ..8d4122f37fa5
> --- /dev/null
> +++ b/arch/mips/crypto/crc32-mips.c
> @@ -0,0 +1,346 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * crc32-mips.c - CRC32 and CRC32C using optional MIPSr6 instructions
> + *
> + * Module based on arm64/crypto/crc32-arm.c
> + *
> + * Copyright (C) 2014 Linaro Ltd 
> + * Copyright (C) 2018 MIPS Tech, LLC
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +enum crc_op_size {
> + b, h, w, d,
> +};
> +
> +enum crc_type {
> + crc32,
> + crc32c,
> +};
> +
> +#ifndef TOOLCHAIN_SUPPORTS_CRC
> +#define _ASM_MACRO_CRC32(OP, SZ, TYPE)   
>   \
> +_ASM_MACRO_3R(OP, rt, rs, rt2,   
>   \
> + ".ifnc  \\rt, \\rt2\n\t"  \
> + ".error \"invalid operands \\\"" #OP " \\rt,\\rs,\\rt2\\\"\"\n\t" \
> + ".endif\n\t"  \
> + _ASM_INSN_IF_MIPS(0x7c0f | (__rt << 16) | (__rs << 21) |  \
> +   ((SZ) <<  6) | ((TYPE) << 8))   \
> + _ASM_INSN32_IF_MM(0x0030 | (__rs << 16) | (__rt 

[PATCH 2/2] crypto: sahara: Improve a size determination in sahara_probe()

2018-02-14 Thread SF Markus Elfring
From: Markus Elfring 
Date: Wed, 14 Feb 2018 14:14:05 +0100

Replace the specification of a data structure by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/crypto/sahara.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/sahara.c b/drivers/crypto/sahara.c
index 9f3cdda59139..0f2245e1af2b 100644
--- a/drivers/crypto/sahara.c
+++ b/drivers/crypto/sahara.c
@@ -1397,7 +1397,7 @@ static int sahara_probe(struct platform_device *pdev)
int err;
int i;
 
-   dev = devm_kzalloc(&pdev->dev, sizeof(struct sahara_dev), GFP_KERNEL);
+   dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
if (!dev)
return -ENOMEM;
 
-- 
2.16.1



[PATCH 1/2] crypto: sahara: Delete an error message for a failed memory allocation in sahara_probe()

2018-02-14 Thread SF Markus Elfring
From: Markus Elfring 
Date: Wed, 14 Feb 2018 14:10:03 +0100

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/crypto/sahara.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/crypto/sahara.c b/drivers/crypto/sahara.c
index 08e7bdcaa6e3..9f3cdda59139 100644
--- a/drivers/crypto/sahara.c
+++ b/drivers/crypto/sahara.c
@@ -1398,10 +1398,8 @@ static int sahara_probe(struct platform_device *pdev)
int i;
 
dev = devm_kzalloc(&pdev->dev, sizeof(struct sahara_dev), GFP_KERNEL);
-   if (dev == NULL) {
-   dev_err(&pdev->dev, "unable to alloc data struct.\n");
+   if (!dev)
return -ENOMEM;
-   }
 
dev->device = &pdev->dev;
platform_set_drvdata(pdev, dev);
-- 
2.16.1



Re: [PATCH v2 2/6] crypto: engine - Permit to enqueue all async requests

2018-02-14 Thread Fabien DESSENNE
Adding my tested-by for the AEAD part, which is new in v2.


On 26/01/18 20:15, Corentin Labbe wrote:
> The crypto engine could actually only enqueue hash and ablkcipher request.
> This patch permit it to enqueue any type of crypto_async_request.
>
> Signed-off-by: Corentin Labbe 
> Tested-by: Fabien Dessenne 

Tested-by: Fabien Dessenne 


> ---
>   crypto/crypto_engine.c  | 301 
> ++--
>   include/crypto/engine.h |  68 ++-
>   2 files changed, 203 insertions(+), 166 deletions(-)
>
> diff --git a/crypto/crypto_engine.c b/crypto/crypto_engine.c
> index 61e7c4e02fd2..992e8d8dcdd9 100644
> --- a/crypto/crypto_engine.c
> +++ b/crypto/crypto_engine.c
> @@ -15,13 +15,50 @@
>   #include 
>   #include 
>   #include 
> -#include 
>   #include 
>   #include "internal.h"
>   
>   #define CRYPTO_ENGINE_MAX_QLEN 10
>   
>   /**
> + * crypto_finalize_request - finalize one request if the request is done
> + * @engine: the hardware engine
> + * @req: the request need to be finalized
> + * @err: error number
> + */
> +static void crypto_finalize_request(struct crypto_engine *engine,
> +  struct crypto_async_request *req, int err)
> +{
> + unsigned long flags;
> + bool finalize_cur_req = false;
> + int ret;
> + struct crypto_engine_ctx *enginectx;
> +
> + spin_lock_irqsave(&engine->queue_lock, flags);
> + if (engine->cur_req == req)
> + finalize_cur_req = true;
> + spin_unlock_irqrestore(&engine->queue_lock, flags);
> +
> + if (finalize_cur_req) {
> + enginectx = crypto_tfm_ctx(req->tfm);
> + if (engine->cur_req_prepared &&
> + enginectx->op.unprepare_request) {
> + ret = enginectx->op.unprepare_request(engine, req);
> + if (ret)
> + dev_err(engine->dev, "failed to unprepare 
> request\n");
> + }
> + spin_lock_irqsave(&engine->queue_lock, flags);
> + engine->cur_req = NULL;
> + engine->cur_req_prepared = false;
> + spin_unlock_irqrestore(&engine->queue_lock, flags);
> + }
> +
> + req->complete(req, err);
> +
> + kthread_queue_work(engine->kworker, &engine->pump_requests);
> +}
> +
> +/**
>* crypto_pump_requests - dequeue one request from engine queue to process
>* @engine: the hardware engine
>* @in_kthread: true if we are in the context of the request pump thread
> @@ -34,11 +71,10 @@ static void crypto_pump_requests(struct crypto_engine 
> *engine,
>bool in_kthread)
>   {
>   struct crypto_async_request *async_req, *backlog;
> - struct ahash_request *hreq;
> - struct ablkcipher_request *breq;
>   unsigned long flags;
>   bool was_busy = false;
> - int ret, rtype;
> + int ret;
> + struct crypto_engine_ctx *enginectx;
>   
>   spin_lock_irqsave(&engine->queue_lock, flags);
>   
> @@ -94,7 +130,6 @@ static void crypto_pump_requests(struct crypto_engine 
> *engine,
>   
>   spin_unlock_irqrestore(&engine->queue_lock, flags);
>   
> - rtype = crypto_tfm_alg_type(engine->cur_req->tfm);
>   /* Until here we get the request need to be encrypted successfully */
>   if (!was_busy && engine->prepare_crypt_hardware) {
>   ret = engine->prepare_crypt_hardware(engine);
> @@ -104,57 +139,31 @@ static void crypto_pump_requests(struct crypto_engine 
> *engine,
>   }
>   }
>   
> - switch (rtype) {
> - case CRYPTO_ALG_TYPE_AHASH:
> - hreq = ahash_request_cast(engine->cur_req);
> - if (engine->prepare_hash_request) {
> - ret = engine->prepare_hash_request(engine, hreq);
> - if (ret) {
> - dev_err(engine->dev, "failed to prepare 
> request: %d\n",
> - ret);
> - goto req_err;
> - }
> - engine->cur_req_prepared = true;
> - }
> - ret = engine->hash_one_request(engine, hreq);
> - if (ret) {
> - dev_err(engine->dev, "failed to hash one request from 
> queue\n");
> - goto req_err;
> - }
> - return;
> - case CRYPTO_ALG_TYPE_ABLKCIPHER:
> - breq = ablkcipher_request_cast(engine->cur_req);
> - if (engine->prepare_cipher_request) {
> - ret = engine->prepare_cipher_request(engine, breq);
> - if (ret) {
> - dev_err(engine->dev, "failed to prepare 
> request: %d\n",
> - ret);
> - goto req_err;
> - }
> - engine->cur_req_prepared = true;
> - }
> - ret = engine->cipher_one_request(engine, breq);
> + enginectx = crypto_t
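
For readers following the API change, a rough driver-side sketch may help. It
is inferred only from the hunks quoted above (the mail is truncated before the
header changes): struct crypto_engine_ctx is assumed to carry the op callbacks
and to sit at the start of the driver's tfm context, so that the
crypto_tfm_ctx() cast in crypto_finalize_request() lands on it. All foo_*
names are placeholders, not part of the patch.

#include <crypto/engine.h>
#include <crypto/internal/skcipher.h>

/* Sketch only -- layout and callback signatures inferred from the quoted
 * hunks, not taken from the complete series.
 */
struct foo_tfm_ctx {
        struct crypto_engine_ctx enginectx;     /* assumed to be first */
        /* ... driver-specific state ... */
};

static int foo_prepare_request(struct crypto_engine *engine,
                               struct crypto_async_request *areq)
{
        /* map buffers, build descriptors, ... */
        return 0;
}

static int foo_unprepare_request(struct crypto_engine *engine,
                                 struct crypto_async_request *areq)
{
        /* undo whatever foo_prepare_request() set up */
        return 0;
}

static int foo_init_tfm(struct crypto_skcipher *tfm)
{
        struct foo_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);

        ctx->enginectx.op.prepare_request = foo_prepare_request;
        ctx->enginectx.op.unprepare_request = foo_unprepare_request;
        return 0;
}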

[PATCH 0/2] crypto/sahara: Adjustments for sahara_probe()

2018-02-14 Thread SF Markus Elfring
From: Markus Elfring 
Date: Wed, 14 Feb 2018 14:30:28 +0100

Two update suggestions from static source code analysis were taken into
account.

Markus Elfring (2):
  Delete an error message for a failed memory allocation
  Improve a size determination

 drivers/crypto/sahara.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

-- 
2.16.1



Re: [PATCH v3 1/4] crypto: AF_ALG AIO - lock context IV

2018-02-14 Thread Stephan Mueller
On Wednesday, 14 February 2018, 06:43:53 CET, Harsh Jain wrote:

Hi Harsh,

> 
> Patch set is working fine with chelsio Driver.

Thank you.

> Do we really need IV locking mechanism for AEAD algo because AEAD algo's
> don't support Partial mode operation and Driver are not updating(atleast
> Chelsio) IV's on AEAD request completions.

Yes, I think we would need it. It is technically possible to have multiple 
IOCBs for AEAD ciphers. Even though your implementation may not write the IV 
back, others may do that. At least I do not see a guarantee that the IV is 
*not* written back by a driver.

In case your driver does not write the IV back and thus does not need to 
serialize, the driver can report CRYPTO_ALG_SERIALIZES_IV_ACCESS. In this 
case, the higher level functions would not serialize as the driver serializes 
the requests (or the driver deems it appropriate that no serialization is 
needed as is the case with your driver).

Ciao
Stephan
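
To make the serialization question concrete, a conceptual sketch with invented
names follows -- it is not the code of this patch set. The point is only that,
when several IOCBs share the context IV, each request has to read the IV and
let the driver's write-back finish under one lock. For brevity the sketch
waits synchronously; the real AIO path would instead drop the lock from the
request's completion callback.

#include <crypto/skcipher.h>
#include <linux/crypto.h>
#include <linux/mutex.h>
#include <linux/scatterlist.h>

/* Conceptual sketch, invented names -- not the AF_ALG patch itself. */
struct iv_ctx {
        struct mutex ivlock;    /* guards the shared, chained IV */
        u8 *iv;
};

static int process_one_iocb(struct iv_ctx *ctx, struct skcipher_request *req,
                            struct scatterlist *src, struct scatterlist *dst,
                            unsigned int len)
{
        DECLARE_CRYPTO_WAIT(wait);
        int err;

        mutex_lock(&ctx->ivlock);
        skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP,
                                      crypto_req_done, &wait);
        skcipher_request_set_crypt(req, src, dst, len, ctx->iv);
        /* The driver may update ctx->iv for chaining, so the lock is held
         * until the request has completed.
         */
        err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
        mutex_unlock(&ctx->ivlock);

        return err;
}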




Re: [PATCH v3 4/4] crypto: add CRYPTO_TFM_REQ_IV_SERIALIZE flag

2018-02-14 Thread Stephan Mueller
On Wednesday, 14 February 2018, 06:50:38 CET, Harsh Jain wrote:

Hi Harsh,

> On 10-02-2018 03:34, Stephan Müller wrote:
> > Crypto drivers may implement a streamlined serialization support for AIO
> > requests that is reported by the CRYPTO_ALG_SERIALIZES_IV_ACCESS flag to
> > the crypto user. When the user decides that he wants to send multiple
> > AIO requests concurrently and wants the crypto driver to handle the
> > serialization, the caller has to set CRYPTO_TFM_REQ_IV_SERIALIZE to notify
> > the crypto driver.
> 
> Will the crypto_alloc_* API take care of this flag? For a kernel crypto user,
> the IV synchronization logic depends on whether the allocated tfm supports IV
> serialisation or not.

The alloc API calls are not related to this flag. This flag is set in the 
request sent to the driver. If the driver sees this flag, it shall perform its 
serialization.

The idea is the following: the driver reports CRYPTO_ALG_SERIALIZES_IV_ACCESS 
if it can serialize requests. In this case, the higher level functions (like 
AF_ALG) would not serialize the request. Now, the higher level functions must 
inform the driver that its serialization function shall be performed which 
implemented with CRYPTO_TFM_REQ_IV_SERIALIZE.

Note, the higher level functions may decide that no serialization is necessary 
(e.g. in the case the inline IV handling is followed by AF_ALG). This implies 
that the CRYPTO_TFM_REQ_IV_SERIALIZE flag would not be set even though the 
driver is capable of serializing (and thus would report 
CRYPTO_ALG_SERIALIZES_IV_ACCESS).

Ciao
Stephan
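
The interplay described above can be condensed into a small decision helper.
This is illustrative pseudo-code only: the two flag names are taken from the
patch titles in this thread, while the helper name and its parameters are
invented, and the exact place where the request flag ends up may differ in
the real series.

/*
 * Illustrative only -- not code from the patch set. Returns true when the
 * caller (e.g. AF_ALG) must serialize access to the context IV itself.
 */
static bool caller_must_lock_iv(bool inline_iv, u32 alg_flags, u32 *req_flags)
{
        if (inline_iv)
                return false;   /* each IOCB carries its own IV */

        if (alg_flags & CRYPTO_ALG_SERIALIZES_IV_ACCESS) {
                /* The driver serializes on its own; just ask it to do so. */
                *req_flags |= CRYPTO_TFM_REQ_IV_SERIALIZE;
                return false;
        }

        /* Neither inline IVs nor a serializing driver: the caller must hold
         * the context IV lock around each request (patch 1/4 of the series).
         */
        return true;
}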




[PATCH 4/4] crypto: ux500: Delete two unnecessary variable initialisations in ux500_cryp_probe()

2018-02-14 Thread SF Markus Elfring
From: Markus Elfring 
Date: Wed, 14 Feb 2018 10:56:38 +0100

Two local variables are assigned their proper pointer values before they are
first used. Thus omit their explicit initialisation at the beginning of the
function.

Signed-off-by: Markus Elfring 
---
 drivers/crypto/ux500/cryp/cryp_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/ux500/cryp/cryp_core.c 
b/drivers/crypto/ux500/cryp/cryp_core.c
index 7c811d7eb274..cb31b59c9d53 100644
--- a/drivers/crypto/ux500/cryp/cryp_core.c
+++ b/drivers/crypto/ux500/cryp/cryp_core.c
@@ -1404,8 +1404,8 @@ static void cryp_algs_unregister_all(void)
 static int ux500_cryp_probe(struct platform_device *pdev)
 {
int ret;
-   struct resource *res = NULL;
-   struct resource *res_irq = NULL;
+   struct resource *res;
+   struct resource *res_irq;
struct cryp_device_data *device_data;
struct cryp_protection_config prot = {
.privilege_access = CRYP_STATE_ENABLE
-- 
2.16.1



[PATCH 3/4] crypto: ux500: Adjust an error message in ux500_cryp_probe()

2018-02-14 Thread SF Markus Elfring
From: Markus Elfring 
Date: Wed, 14 Feb 2018 10:47:31 +0100

Correct the function name in this error message so that it refers to the
function that was actually called, cryp_check() rather than cryp_init().

Signed-off-by: Markus Elfring 
---
 drivers/crypto/ux500/cryp/cryp_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/ux500/cryp/cryp_core.c 
b/drivers/crypto/ux500/cryp/cryp_core.c
index 07cc92f88933..7c811d7eb274 100644
--- a/drivers/crypto/ux500/cryp/cryp_core.c
+++ b/drivers/crypto/ux500/cryp/cryp_core.c
@@ -1478,7 +1478,7 @@ static int ux500_cryp_probe(struct platform_device *pdev)
}
 
if (cryp_check(device_data)) {
-   dev_err(dev, "[%s]: cryp_init() failed!", __func__);
+   dev_err(dev, "[%s]: cryp_check() failed!", __func__);
ret = -EINVAL;
goto out_power;
}
-- 
2.16.1



[PATCH 2/4] crypto: ux500: Adjust two condition checks in ux500_cryp_probe()

2018-02-14 Thread SF Markus Elfring
From: Markus Elfring 
Date: Wed, 14 Feb 2018 10:38:44 +0100

The local variable "cryp_error" was used only for two condition checks.

* Check the return values from these function calls directly instead.

* Delete this variable, which becomes unnecessary with this refactoring.

Signed-off-by: Markus Elfring 
---
 drivers/crypto/ux500/cryp/cryp_core.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/ux500/cryp/cryp_core.c 
b/drivers/crypto/ux500/cryp/cryp_core.c
index 50bfc7b4c641..07cc92f88933 100644
--- a/drivers/crypto/ux500/cryp/cryp_core.c
+++ b/drivers/crypto/ux500/cryp/cryp_core.c
@@ -1404,7 +1404,6 @@ static void cryp_algs_unregister_all(void)
 static int ux500_cryp_probe(struct platform_device *pdev)
 {
int ret;
-   int cryp_error = 0;
struct resource *res = NULL;
struct resource *res_irq = NULL;
struct cryp_device_data *device_data;
@@ -1478,15 +1477,13 @@ static int ux500_cryp_probe(struct platform_device 
*pdev)
goto out_clk_unprepare;
}
 
-   cryp_error = cryp_check(device_data);
-   if (cryp_error != 0) {
+   if (cryp_check(device_data)) {
dev_err(dev, "[%s]: cryp_init() failed!", __func__);
ret = -EINVAL;
goto out_power;
}
 
-   cryp_error = cryp_configure_protection(device_data, &prot);
-   if (cryp_error != 0) {
+   if (cryp_configure_protection(device_data, &prot)) {
dev_err(dev, "[%s]: cryp_configure_protection() failed!",
__func__);
ret = -EINVAL;
-- 
2.16.1



[PATCH 1/4] crypto: ux500: Delete an error message for a failed memory allocation in ux500_cryp_probe()

2018-02-14 Thread SF Markus Elfring
From: Markus Elfring 
Date: Wed, 14 Feb 2018 10:12:38 +0100

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/crypto/ux500/cryp/cryp_core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/crypto/ux500/cryp/cryp_core.c 
b/drivers/crypto/ux500/cryp/cryp_core.c
index 765f53e548ab..50bfc7b4c641 100644
--- a/drivers/crypto/ux500/cryp/cryp_core.c
+++ b/drivers/crypto/ux500/cryp/cryp_core.c
@@ -1416,7 +1416,6 @@ static int ux500_cryp_probe(struct platform_device *pdev)
dev_dbg(dev, "[%s]", __func__);
device_data = devm_kzalloc(dev, sizeof(*device_data), GFP_ATOMIC);
if (!device_data) {
-   dev_err(dev, "[%s]: kzalloc() failed!", __func__);
ret = -ENOMEM;
goto out;
}
-- 
2.16.1



[PATCH 0/4] Ux500 crypto: Adjustments for ux500_cryp_probe()

2018-02-14 Thread SF Markus Elfring
From: Markus Elfring 
Date: Wed, 14 Feb 2018 11:12:34 +0100

A few update suggestions from static source code analysis were taken into
account.

Markus Elfring (4):
  Delete an error message for a failed memory allocation
  Adjust two condition checks
  Adjust an error message
  Delete two unnecessary variable initialisations

 drivers/crypto/ux500/cryp/cryp_core.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

-- 
2.16.1