[PATCH v2] crypto: talitos: Extend max key length for SHA384/512-HMAC and AEAD
An updated patch that also handles the additional key length requirements for the AEAD algorithms. The max keysize is not 96. For SHA384/512 it's 128, and for the AEAD algorithms it's longer still. Extend the max keysize for the AEAD size for AES256 + HMAC(SHA512). Cc: <sta...@vger.kernel.org> # 3.6+ Fixes: 357fb60502ede ("crypto: talitos - add sha224, sha384 and sha512 to existing AEAD algorithms") Signed-off-by: Martin Hicks <m...@bork.org> --- drivers/crypto/talitos.c |7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index 0bba6a1..79791c6 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -816,7 +816,7 @@ static void talitos_unregister_rng(struct device *dev) * HMAC_SNOOP_NO_AFEA (HSNA) instead of type IPSEC_ESP */ #define TALITOS_CRA_PRIORITY_AEAD_HSNA (TALITOS_CRA_PRIORITY - 1) -#define TALITOS_MAX_KEY_SIZE 96 +#define TALITOS_MAX_KEY_SIZE (AES_MAX_KEY_SIZE + SHA512_BLOCK_SIZE) #define TALITOS_MAX_IV_LENGTH 16 /* max of AES_BLOCK_SIZE, DES3_EDE_BLOCK_SIZE */ struct talitos_ctx { @@ -1495,6 +1495,11 @@ static int ablkcipher_setkey(struct crypto_ablkcipher *cipher, { struct talitos_ctx *ctx = crypto_ablkcipher_ctx(cipher); + if (keylen > TALITOS_MAX_KEY_SIZE) { + crypto_ablkcipher_set_flags(cipher, CRYPTO_TFM_RES_BAD_KEY_LEN); + return -EINVAL; + } + memcpy(>key, key, keylen); ctx->keylen = keylen; -- 1.7.10.4 -- Martin Hicks P.Eng.| m...@bork.org Bork Consulting Inc. | +1 (613) 266-2296
[PATCH] crypto: talitos: Extend max key length for SHA384/512-HMAC
The max keysize for both of these is 128, not 96. Before, with keysizes over 96, the memcpy in ahash_setkey() would overwrite memory beyond the key field. Signed-off-by: Martin Hicks <m...@bork.org> --- drivers/crypto/talitos.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index 0bba6a1..97dc85e 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -816,7 +816,7 @@ static void talitos_unregister_rng(struct device *dev) * HMAC_SNOOP_NO_AFEA (HSNA) instead of type IPSEC_ESP */ #define TALITOS_CRA_PRIORITY_AEAD_HSNA (TALITOS_CRA_PRIORITY - 1) -#define TALITOS_MAX_KEY_SIZE 96 +#define TALITOS_MAX_KEY_SIZE SHA512_BLOCK_SIZE /* SHA512 has the largest keysize input */ #define TALITOS_MAX_IV_LENGTH 16 /* max of AES_BLOCK_SIZE, DES3_EDE_BLOCK_SIZE */ struct talitos_ctx { -- 1.7.10.4 -- Martin Hicks P.Eng.| m...@bork.org Bork Consulting Inc. | +1 (613) 266-2296
Re: [PATCH 4/4] crypto: talitos - add software backlog queue handling
On Sat, Mar 14, 2015 at 1:16 PM, Horia Geantă horia.gea...@freescale.com wrote: On 3/13/2015 8:37 PM, Tom Lendacky wrote: + +/* Try to backlog request (if allowed) */ +return crypto_enqueue_request(priv-chan[ch].queue, areq); I'd remembered something about how hardware drivers should use their own list element for queuing, searched back and found this: http://marc.info/?l=linux-crypto-vgerm=137609769605139w=2 The crypto_async_request list field is used only for queuing using the crypto API functions - crypto_{enqueue,dequeue}_request. I am not sure I understand what is the problem. Are crypto_{enqueue,dequeue}_request part of *internal* crypto API? I don't believe that the enqueue/dequeue functions are internal API. I based my code on the usage of other hardware offload drivers. Their use is somewhat different because most of the other hardware can only handle a single request at a time. mh -- Martin Hicks P.Eng. | m...@bork.org Bork Consulting Inc. | +1 (613) 266-2296 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode
Hi Horia, On Wed, Mar 11, 2015 at 11:48 AM, Horia Geantă horia.gea...@freescale.com wrote: While here: note that xts-talitos supports only two key lengths - 256 and 512 bits. There are tcrypt speed tests that check also for 384-bit keys (which is out-of-spec, but still...), leading to a Key Size Error - see below (KSE bit in AESU Interrupt Status Register is set) Ok. I've limited the keysize to 32 or 64 bytes for AES-XTS in the talitos driver. This was my first experiments with the tcrypt module. It also brought up another issue related to the IV limitations of this hardware. The latest patch that I have returns an error when there is a non-zero value in the second 8 bytes of the IV: + /* +* AES-XTS uses the first two AES Context registers for: +* +* Register 1: Sector Number (Little Endian) +* Register 2: Sector Size (Big Endian) +* +* Whereas AES-CBC uses registers 1/2 as a 16-byte IV. +*/ + if ((ctx-desc_hdr_template +(DESC_HDR_SEL0_MASK | DESC_HDR_MODE0_MASK)) == +(DESC_HDR_SEL0_AESU | DESC_HDR_MODE0_AESU_XTS)) { + u64 *aesctx2 = (u64 *)areq-info + 1; + + if (*aesctx2 != 0) { + dev_err(ctx-dev, + IV length limited to the first 8 bytes.); + return ERR_PTR(-EINVAL); + } + + /* Fixed sized sector */ + *aesctx2 = cpu_to_be64(1 SECTOR_SHIFT); + } This approach causes the tcrypt tests to fail because tcrypt sets all 16 bytes of the IV to 0xff. I think returning an error is the right approach for the talitos module, but it would be nice if tcrypt still worked. Should tcrypt just set the IV bytes to 0 instead of 0xff? Isn't one IV just as good as another? I think adding exceptions to the tcrypt code would be ugly, but maybe one should be made for XTS since the standard dictates that the IV should be plain or plain64? Thanks, mh -- Martin Hicks P.Eng. | m...@bork.org Bork Consulting Inc. | +1 (613) 266-2296 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode
On Mon, Mar 9, 2015 at 6:16 AM, Horia Geantă horia.gea...@freescale.com wrote: On 3/3/2015 7:44 PM, Martin Hicks wrote: On Tue, Mar 3, 2015 at 10:44 AM, Horia Geantă horia.gea...@freescale.com wrote: For talitos, there are two cases: 1. request data size is = data unit / sector size talitos can handle any IV / tweak scheme 2. request data size sector size since talitos internally generates the IV for the next sector by incrementing the previous IV, only IV schemes that allocate consecutive IV to consecutive sectors will function correctly. it's not clear to me that #1 is right. I guess it could be, but the IV length would be limited to 8 bytes. Yes, there's a limitation in talitos wrt. XTS IV / tweak size - it's up to 8 bytes. So I guess ESSIV won't work with talitos-xts, since the encrypted IV output is 16 bytes. But as previously said, ESSIV breaks the XTS standard requirement for having a consecutive IV for consecutive blocks. ESSIV should really be used only with disk-level encryption schemes that require an unpredictable IV. Ok. I'll verify that the second half of the IV is zeroed. One last thing that I'm not sure of is what string to place in cra_ablkcipher.geniv field. eseqiv seems wrong if plain/plain64 are the IVs that XTS is designed for. Thanks, mh -- Martin Hicks P.Eng. | m...@bork.org Bork Consulting Inc. | +1 (613) 266-2296 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] crypto: talitos: Add AES-XTS Support
On Thu, Mar 5, 2015 at 7:16 PM, Kim Phillips kim.phill...@freescale.com wrote: On Fri, 20 Feb 2015 12:00:10 -0500 Martin Hicks m...@bork.org wrote: The newer talitos hardware has support for AES in XTS mode. Assuming it's the same thing, AES-XCBC gets added with SEC v3.0 h/w. Assuming hw_supports() doesn't already support this algorithm AES-XCBC isn't the same thing as AES-XTS. combination (technically via the mode bit), this needs to be reflected in the patch so the driver doesn't think SEC 2.x devices can do XTS, too. Right. I hadn't looked into how exactly hw_supports() works. It only indicates which execution units are present (in this case the AES unit). I actually think XTS gets introduced in SEC v3.3.2. I also have an MPC8379 (sec3.3) and it does not have XTS. Can you look internally to find out in which hardware it was introduced? Is there a SEC 3.3.1 that also has XTS? I guess I just need a -features flag to conditionally register the XTS algorithm for 3.3.x and newer. Adding anything more complicated doesn't seem warranted at this time, although that could change if someone came along and needed to add a whole whack more of the AES modes that were introduced at various times in the different SEC revisions. + .desc_hdr_template = DESC_HDR_TYPE_COMMON_NONSNOOP_NO_AFEU | + DESC_HDR_SEL0_AESU | + DESC_HDR_MODE0_AESU_XTS, ... /* primary execution unit mode (MODE0) and derivatives */ #define DESC_HDR_MODE0_ENCRYPT cpu_to_be32(0x0010) #define DESC_HDR_MODE0_AESU_CBC cpu_to_be32(0x0020) +#define DESC_HDR_MODE0_AESU_XTS cpu_to_be32(0x0420) I'd prefer these defines be kept as single bit definitions, and keep their names from the manual. An XTS-specific definition can be composed from them either after this, or manually in the desc_hdr_template assignment above. It doesn't look like there are divisible single-bit definitions for the MODE0 bits. The AES Cipher modes are composed of 5 bits called CipherMode, Extended CipherMode and Aux2. Individually they don't seem to mean anything. Unless you want something like: #define AES_MODE(cm, ecm, aux2) cpu_to_be32(((ecm6) | (aux25) | (cm1)) 20) #define DESC_HDR_MODE0_AESU_CBC AES_MODE(1, 0, 0) #define DESC_HDR_MODE0_AESU_XTS AES_MODE(1, 1, 0) Thanks, mh -- Martin Hicks P.Eng. | m...@bork.org Bork Consulting Inc. | +1 (613) 266-2296 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/5] crypto: talitos: Fix off-by-one and use all hardware slots
Ok, I'm fine dropping this patch. I'm sure it doesn't affect performance in a measurable way. mh On Tue, Mar 3, 2015 at 7:35 PM, Kim Phillips kim.phill...@freescale.com wrote: On Tue, 3 Mar 2015 08:21:35 -0500 Martin Hicks m...@bork.org wrote: The submission count was off by one. Signed-off-by: Martin Hicks m...@bork.org --- sadly, this directly contradicts: commit 4b24ea971a93f5d0bec34bf7bfd0939f70cfaae6 Author: Vishnu Suresh vis...@freescale.com Date: Mon Oct 20 21:06:18 2008 +0800 crypto: talitos - Preempt overflow interrupts off-by-one fix My guess is your request submission pattern differs from that of Vishnu's (probably IPSec and/or tcrypt), or later h/w versions have gotten better about dealing with channel near-overflow conditions. Either way, I'd prefer we not do this: it might break others, and I'm guessing doesn't improve performance _that_ much? If it does, we could risk it and restrict it to SEC versions 3.3 and above maybe? Not sure what to do here exactly, barring digging up and old 2.x SEC and testing. Kim p.s. I checked, Vishnu isn't with Freescale anymore, so I can't cc him. -- Martin Hicks P.Eng. | m...@bork.org Bork Consulting Inc. | +1 (613) 266-2296 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/5] crypto: talitos: Remove MD5_BLOCK_SIZE
This is properly defined in the md5 header file. Signed-off-by: Martin Hicks m...@bork.org --- drivers/crypto/talitos.c |6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index c49d977..89cf4d5 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -637,8 +637,6 @@ static void talitos_unregister_rng(struct device *dev) #define TALITOS_MAX_KEY_SIZE 96 #define TALITOS_MAX_IV_LENGTH 16 /* max of AES_BLOCK_SIZE, DES3_EDE_BLOCK_SIZE */ -#define MD5_BLOCK_SIZE64 - struct talitos_ctx { struct device *dev; int ch; @@ -2195,7 +2193,7 @@ static struct talitos_alg_template driver_algs[] = { .halg.base = { .cra_name = md5, .cra_driver_name = md5-talitos, - .cra_blocksize = MD5_BLOCK_SIZE, + .cra_blocksize = MD5_HMAC_BLOCK_SIZE, .cra_flags = CRYPTO_ALG_TYPE_AHASH | CRYPTO_ALG_ASYNC, } @@ -2285,7 +2283,7 @@ static struct talitos_alg_template driver_algs[] = { .halg.base = { .cra_name = hmac(md5), .cra_driver_name = hmac-md5-talitos, - .cra_blocksize = MD5_BLOCK_SIZE, + .cra_blocksize = MD5_HMAC_BLOCK_SIZE, .cra_flags = CRYPTO_ALG_TYPE_AHASH | CRYPTO_ALG_ASYNC, } -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 0/5] crypto: talitos: Add crypto async queue handling
I was testing dm-crypt performance with a Freescale P1022 board with a recent kernel and was getting IO errors while doing testing with LUKS. Investigation showed that all hardware FIFO slots were filling and the driver was returning EAGAIN to the block layer, which is not an expected response for an async crypto implementation. The following patch series adds a few small fixes, and reworks the submission path to use the crypto_queue mechanism to handle the request backlog. Changes since v1: - Ran checkpatch.pl - Split the path for submitting new requests vs. issuing backlogged requests. - Avoid enqueuing a submitted request to the crypto queue unnecessarily. - Fix return paths where CRYPTO_TFM_REQ_MAY_BACKLOG is not set. Martin Hicks (5): crypto: talitos: Simplify per-channel initialization crypto: talitos: Remove MD5_BLOCK_SIZE crypto: talitos: Fix off-by-one and use all hardware slots crypto: talitos: Reorganize request submission data structures crypto: talitos: Add software backlog queue handling drivers/crypto/talitos.c | 240 +++--- drivers/crypto/talitos.h | 44 +++-- 2 files changed, 177 insertions(+), 107 deletions(-) -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 1/5] crypto: talitos: Simplify per-channel initialization
There were multiple loops in a row, for each separate step of the initialization of the channels. Simplify to a single loop. Signed-off-by: Martin Hicks m...@bork.org --- drivers/crypto/talitos.c | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index 067ec21..c49d977 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -2706,20 +2706,16 @@ static int talitos_probe(struct platform_device *ofdev) goto err_out; } + priv-fifo_len = roundup_pow_of_two(priv-chfifo_len); + for (i = 0; i priv-num_channels; i++) { priv-chan[i].reg = priv-reg + TALITOS_CH_STRIDE * (i + 1); if (!priv-irq[1] || !(i 1)) priv-chan[i].reg += TALITOS_CH_BASE_OFFSET; - } - for (i = 0; i priv-num_channels; i++) { spin_lock_init(priv-chan[i].head_lock); spin_lock_init(priv-chan[i].tail_lock); - } - priv-fifo_len = roundup_pow_of_two(priv-chfifo_len); - - for (i = 0; i priv-num_channels; i++) { priv-chan[i].fifo = kzalloc(sizeof(struct talitos_request) * priv-fifo_len, GFP_KERNEL); if (!priv-chan[i].fifo) { @@ -2727,11 +2723,10 @@ static int talitos_probe(struct platform_device *ofdev) err = -ENOMEM; goto err_out; } - } - for (i = 0; i priv-num_channels; i++) atomic_set(priv-chan[i].submit_count, -(priv-chfifo_len - 1)); + } dma_set_mask(dev, DMA_BIT_MASK(36)); -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 3/5] crypto: talitos: Fix off-by-one and use all hardware slots
The submission count was off by one. Signed-off-by: Martin Hicks m...@bork.org --- drivers/crypto/talitos.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index 89cf4d5..7709805 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -2722,8 +2722,7 @@ static int talitos_probe(struct platform_device *ofdev) goto err_out; } - atomic_set(priv-chan[i].submit_count, - -(priv-chfifo_len - 1)); + atomic_set(priv-chan[i].submit_count, -priv-chfifo_len); } dma_set_mask(dev, DMA_BIT_MASK(36)); -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 4/5] crypto: talitos: Reorganize request submission data structures
This is preparatory work for moving to using the crypto async queue handling code. A talitos_request structure is buried into each talitos_edesc so that when talitos_submit() is called, everything required to defer the submission to the hardware is contained within talitos_edesc. Signed-off-by: Martin Hicks m...@bork.org --- drivers/crypto/talitos.c | 95 +++--- drivers/crypto/talitos.h | 41 +--- 2 files changed, 66 insertions(+), 70 deletions(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index 7709805..b0c85ce 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -186,22 +186,16 @@ static int init_device(struct device *dev) * talitos_submit - submits a descriptor to the device for processing * @dev: the SEC device to be used * @ch:the SEC device channel to be used - * @desc: the descriptor to be processed by the device - * @callback: whom to call when processing is complete - * @context: a handle for use by caller (optional) + * @edesc: the descriptor to be processed by the device * * desc must contain valid dma-mapped (bus physical) address pointers. * callback must check err and feedback in descriptor header * for device processing status. */ -int talitos_submit(struct device *dev, int ch, struct talitos_desc *desc, - void (*callback)(struct device *dev, - struct talitos_desc *desc, - void *context, int error), - void *context) +int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc) { struct talitos_private *priv = dev_get_drvdata(dev); - struct talitos_request *request; + struct talitos_request *request = edesc-req; unsigned long flags; int head; @@ -214,19 +208,15 @@ int talitos_submit(struct device *dev, int ch, struct talitos_desc *desc, } head = priv-chan[ch].head; - request = priv-chan[ch].fifo[head]; - - /* map descriptor and save caller data */ - request-dma_desc = dma_map_single(dev, desc, sizeof(*desc), + request-dma_desc = dma_map_single(dev, request-desc, + sizeof(*request-desc), DMA_BIDIRECTIONAL); - request-callback = callback; - request-context = context; /* increment fifo head */ priv-chan[ch].head = (priv-chan[ch].head + 1) (priv-fifo_len - 1); smp_wmb(); - request-desc = desc; + priv-chan[ch].fifo[head] = request; /* GO! */ wmb(); @@ -247,15 +237,16 @@ EXPORT_SYMBOL(talitos_submit); static void flush_channel(struct device *dev, int ch, int error, int reset_ch) { struct talitos_private *priv = dev_get_drvdata(dev); - struct talitos_request *request, saved_req; + struct talitos_request *request; unsigned long flags; int tail, status; spin_lock_irqsave(priv-chan[ch].tail_lock, flags); tail = priv-chan[ch].tail; - while (priv-chan[ch].fifo[tail].desc) { - request = priv-chan[ch].fifo[tail]; + while (priv-chan[ch].fifo[tail]) { + request = priv-chan[ch].fifo[tail]; + status = 0; /* descriptors with their done bits set don't get the error */ rmb(); @@ -271,14 +262,9 @@ static void flush_channel(struct device *dev, int ch, int error, int reset_ch) sizeof(struct talitos_desc), DMA_BIDIRECTIONAL); - /* copy entries so we can call callback outside lock */ - saved_req.desc = request-desc; - saved_req.callback = request-callback; - saved_req.context = request-context; - /* release request entry in fifo */ smp_wmb(); - request-desc = NULL; + priv-chan[ch].fifo[tail] = NULL; /* increment fifo tail */ priv-chan[ch].tail = (tail + 1) (priv-fifo_len - 1); @@ -287,8 +273,8 @@ static void flush_channel(struct device *dev, int ch, int error, int reset_ch) atomic_dec(priv-chan[ch].submit_count); - saved_req.callback(dev, saved_req.desc, saved_req.context, - status); + request-callback(dev, request-desc, request-context, status); + /* channel may resume processing in single desc error case */ if (error !reset_ch status == error) return; @@ -352,7 +338,8 @@ static u32 current_desc_hdr(struct device *dev, int ch) tail = priv-chan[ch].tail; iter = tail; - while (priv-chan[ch].fifo[iter].dma_desc != cur_desc) { + while (priv-chan[ch].fifo[iter] + priv-chan[ch].fifo
[PATCH v2 5/5] crypto: talitos: Add software backlog queue handling
I was running into situations where the hardware FIFO was filling up, and the code was returning EAGAIN to dm-crypt and just dropping the submitted crypto request. This adds support in talitos for a software backlog queue. When requests can't be queued to the hardware immediately EBUSY is returned. The queued requests are dispatched to the hardware in received order as hardware FIFO slots become available. Signed-off-by: Martin Hicks m...@bork.org --- drivers/crypto/talitos.c | 135 -- drivers/crypto/talitos.h |3 ++ 2 files changed, 110 insertions(+), 28 deletions(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index b0c85ce..bb7fba0 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -182,55 +182,118 @@ static int init_device(struct device *dev) return 0; } -/** - * talitos_submit - submits a descriptor to the device for processing - * @dev: the SEC device to be used - * @ch:the SEC device channel to be used - * @edesc: the descriptor to be processed by the device - * - * desc must contain valid dma-mapped (bus physical) address pointers. - * callback must check err and feedback in descriptor header - * for device processing status. - */ -int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc) +/* Dispatch 'request' if provided, otherwise a backlogged request */ +static int __talitos_handle_queue(struct device *dev, int ch, + struct talitos_edesc *edesc, + unsigned long *irq_flags) { struct talitos_private *priv = dev_get_drvdata(dev); - struct talitos_request *request = edesc-req; - unsigned long flags; + struct talitos_request *request; + struct crypto_async_request *areq; int head; + int ret = -EINPROGRESS; - spin_lock_irqsave(priv-chan[ch].head_lock, flags); - - if (!atomic_inc_not_zero(priv-chan[ch].submit_count)) { + if (!atomic_inc_not_zero(priv-chan[ch].submit_count)) /* h/w fifo is full */ - spin_unlock_irqrestore(priv-chan[ch].head_lock, flags); - return -EAGAIN; + return -EBUSY; + + if (!edesc) { + /* Dequeue the oldest request */ + areq = crypto_dequeue_request(priv-chan[ch].queue); + request = container_of(areq, struct talitos_request, base); + } else { + request = edesc-req; } - head = priv-chan[ch].head; request-dma_desc = dma_map_single(dev, request-desc, sizeof(*request-desc), DMA_BIDIRECTIONAL); /* increment fifo head */ + head = priv-chan[ch].head; priv-chan[ch].head = (priv-chan[ch].head + 1) (priv-fifo_len - 1); - smp_wmb(); - priv-chan[ch].fifo[head] = request; + spin_unlock_irqrestore(priv-chan[ch].head_lock, *irq_flags); + + /* +* Mark a backlogged request as in-progress. +*/ + if (!edesc) { + areq = request-context; + areq-complete(areq, -EINPROGRESS); + } + + spin_lock_irqsave(priv-chan[ch].head_lock, *irq_flags); /* GO! */ + priv-chan[ch].fifo[head] = request; wmb(); out_be32(priv-chan[ch].reg + TALITOS_FF, upper_32_bits(request-dma_desc)); out_be32(priv-chan[ch].reg + TALITOS_FF_LO, lower_32_bits(request-dma_desc)); + return ret; +} + +/** + * talitos_submit - performs submissions of a new descriptors + * + * @dev: the SEC device to be used + * @ch:the SEC device channel to be used + * @edesc: the request to be processed by the device + * + * edesc-req must contain valid dma-mapped (bus physical) address pointers. + * callback must check err and feedback in descriptor header + * for device processing status upon completion. + */ +int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc) +{ + struct talitos_private *priv = dev_get_drvdata(dev); + struct talitos_request *request = edesc-req; + unsigned long flags; + int ret = -EINPROGRESS; + + spin_lock_irqsave(priv-chan[ch].head_lock, flags); + + if (priv-chan[ch].queue.qlen) { + /* +* There are backlogged requests. Just queue this new request +* and dispatch the oldest backlogged request to the hardware. +*/ + crypto_enqueue_request(priv-chan[ch].queue, + request-base); + __talitos_handle_queue(dev, ch, NULL, flags); + ret = -EBUSY; + } else { + ret = __talitos_handle_queue(dev, ch, edesc, flags); + if (ret == -EBUSY) + /* Hardware FIFO is full
Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode
On Tue, Mar 3, 2015 at 10:44 AM, Horia Geantă horia.gea...@freescale.com wrote: On 3/3/2015 12:09 AM, Martin Hicks wrote: On Mon, Mar 02, 2015 at 03:37:28PM +0100, Milan Broz wrote: If crypto API allows to encrypt more sectors in one run (handling IV internally) dmcrypt can be modified of course. But do not forget we can use another IV (not only sequential number) e.g. ESSIV with XTS as well (even if it doesn't make much sense, some people are using it). Interesting, I'd not considered using XTS with an IV other than plain/64. The talitos hardware would not support aes/xts in any mode other than plain/plain64 I don't think...Although perhaps you could push in an 8-byte IV and the hardware would interpret it as the sector #. For talitos, there are two cases: 1. request data size is = data unit / sector size talitos can handle any IV / tweak scheme 2. request data size sector size since talitos internally generates the IV for the next sector by incrementing the previous IV, only IV schemes that allocate consecutive IV to consecutive sectors will function correctly. it's not clear to me that #1 is right. I guess it could be, but the IV length would be limited to 8 bytes. This also points out that claiming that the XTS IV size is 16 bytes, as my current patch does, could be problematic. It's handy because the first 8 bytes should contain a plain64 sector #, and the second u64 can be used to encode the sector size but it would be a mistake for someone to use the second 8 bytes for the rest of a 16byte IV. mh -- Martin Hicks P.Eng. | m...@bork.org Bork Consulting Inc. | +1 (613) 266-2296 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode
On Mon, Mar 02, 2015 at 03:37:28PM +0100, Milan Broz wrote: If crypto API allows to encrypt more sectors in one run (handling IV internally) dmcrypt can be modified of course. But do not forget we can use another IV (not only sequential number) e.g. ESSIV with XTS as well (even if it doesn't make much sense, some people are using it). Interesting, I'd not considered using XTS with an IV other than plain/64. The talitos hardware would not support aes/xts in any mode other than plain/plain64 I don't think...Although perhaps you could push in an 8-byte IV and the hardware would interpret it as the sector #. Maybe the following question would be if the dmcrypt sector IV algorithms should moved into crypto API as well. (But because I misused dmcrypt IVs hooks for some additional operations for loopAES and old Truecrypt CBC mode, it is not so simple...) Speaking again with talitos in mind, there would be no advantage for this hardware. Although larger requests are possible only a single IV can be provided per request, so for algorithms like AES-CBC and dm-crypt 512byte IOs are the only option (short of switching to 4kB block size). mh -- Martin Hicks P.Eng.| m...@bork.org Bork Consulting Inc. | +1 (613) 266-2296 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode
On Mon, Mar 02, 2015 at 03:25:56PM +0200, Horia Geantă wrote: On 2/20/2015 7:00 PM, Martin Hicks wrote: This adds the AES-XTS mode, supported by the Freescale SEC 3.3.2. One of the nice things about this hardware is that it knows how to deal with encrypt/decrypt requests that are larger than sector size, but that also requires that that the sector size be passed into the crypto engine as an XTS cipher context parameter. When a request is larger than the sector size the sector number is incremented by the talitos engine and the tweak key is re-calculated for the new sector. I've tested this with 256bit and 512bit keys (tweak and data keys of 128bit and 256bit) to ensure interoperability with the software AES-XTS implementation. All testing was done using dm-crypt/LUKS with aes-xts-plain64. Is there a better solution that just hard coding the sector size to (1SECTOR_SHIFT)? Maybe dm-crypt should be modified to pass the sector size along with the plain/plain64 IV to an XTS algorithm? AFAICT, SW implementation of xts mode in kernel (crypto/xts.c) is not aware of a sector size (data unit size in IEEE P1619 terminology): There's a hidden assumption that all the data send to xts in one request belongs to a single sector. Even more, it's supposed that the first 16-byte block in the request is block 0 in the sector. These can be seen from the way the tweak (T) value is computed. (Side note: there's no support of ciphertext stealing in crypto/xts.c - i.e. sector sizes must be a multiple of underlying block cipher size - that is 16B.) If dm-crypt would be modified to pass sector size somehow, all in-kernel xts implementations would have to be made aware of the change. I have nothing against this, but let's see what crypto maintainers have to say... Right. Additionally, there may be some requirement for the encryption implementation to broadcast the maximum size that can be handled in a single request. For example Talitos could handle XTS encrypt/decrypt requests of up to 64kB (regardless of the block device's sector size). BTW, there were some discussions back in 2013 wrt. being able to configure / increase sector size, smth. crypto engines would benefit from: http://www.saout.de/pipermail/dm-crypt/2013-January/003125.html (experimental patch) http://www.saout.de/pipermail/dm-crypt/2013-March/003202.html The experimental patch sends sector size as the req-nbytes - hidden assumption: data size sent in an xts crypto request equals a sector. I found this last week, and used it as a starting point for some testing. I modified it to keep the underlying sector size of the dm-crypt mapping as 512byte, but allowed the code to combine requests in IOs up to 4kB. Doing greater request sizes would require allocating additional pages...I plan to implement that to see how much extra performance can be squeezed out. patch below... With regards to performance, with my low-powered Freescale P1022 board, I see performance numbers like this on ext4, as measured by bonnie++. Write (MB/s)Read (MB/s) Unencrypted 140 176 aes-xts-plain64 512b113 115 aes-xts-plain64 4kB 71 56 The more detailed bonnie++ output is here: http://www.bork.org/~mort/dm-crypt-enc-blksize.html The larger IO sizes is a huge win for this board. The patch I'm using to send IOs up to 4kB to talitos follows. Thanks, mh diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c index 08981be..88e95b5 100644 --- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -42,6 +42,7 @@ struct convert_context { struct bvec_iter iter_out; sector_t cc_sector; atomic_t cc_pending; + unsigned int block_size; struct ablkcipher_request *req; }; @@ -142,6 +143,8 @@ struct crypt_config { sector_t iv_offset; unsigned int iv_size; + unsigned int block_size; + /* ESSIV: struct crypto_cipher *essiv_tfm */ void *iv_private; struct crypto_ablkcipher **tfms; @@ -801,10 +804,17 @@ static void crypt_convert_init(struct crypt_config *cc, { ctx-bio_in = bio_in; ctx-bio_out = bio_out; - if (bio_in) + ctx-block_size = 0; + if (bio_in) { ctx-iter_in = bio_in-bi_iter; - if (bio_out) + ctx-block_size = max(ctx-block_size, bio_cur_bytes(bio_in)); + } + if (bio_out) { ctx-iter_out = bio_out-bi_iter; + ctx-block_size = max(ctx-block_size, bio_cur_bytes(bio_out)); + } + if (ctx-block_size cc-block_size) + ctx-block_size = cc-block_size; ctx-cc_sector = sector + cc-iv_offset; init_completion(ctx-restart); } @@ -844,15 +854,15 @@ static int crypt_convert_block(struct crypt_config *cc, dmreq-iv_sector = ctx-cc_sector; dmreq-ctx = ctx; sg_init_table(dmreq-sg_in, 1
Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode
On Mon, Mar 02, 2015 at 04:44:19PM -0500, Martin Hicks wrote: Write (MB/s)Read (MB/s) Unencrypted 140 176 aes-xts-plain64 512b 113 115 aes-xts-plain64 4kB 71 56 I got the two AES lines backwards. Sorry about that. mh -- Martin Hicks P.Eng.| m...@bork.org Bork Consulting Inc. | +1 (613) 266-2296 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] crypto: talitos: Add software backlog queue handling
Hi Horia, On Tue, Feb 24, 2015 at 1:21 PM, Horia Geantă horia.gea...@freescale.com wrote: On 2/20/2015 6:21 PM, Martin Hicks wrote: + int ret = -EINPROGRESS; spin_lock_irqsave(priv-chan[ch].head_lock, flags); + if (edesc) { + orig_request = edesc-req; + crypto_enqueue_request(priv-chan[ch].queue, orig_request-base); + } The request goes through the SW queue even if there are empty slots in the HW queue, doing unnecessary crypto_queue_encrypt() and crypto_queue_decrypt(). Trying to use the HW queue first would be better. Definitely a valid point. In trying to reorganize this it really complicates things. Splitting the submit from issuing requests from the backlog seems to make it much more straightforward. @@ -1170,6 +1211,8 @@ static struct talitos_edesc *talitos_edesc_alloc(struct device *dev, edesc-dma_len, DMA_BIDIRECTIONAL); edesc-req.desc = edesc-desc; + /* A copy of the crypto_async_request to use the crypto_queue backlog */ + memcpy(edesc-req.base, areq, sizeof(struct crypto_async_request)); Why not have a struct crypto_async_request *base; instead of struct crypto_async_request base; in talitos_request structure? This way: 1. memcopy above is avoided 2. talitos_request structure gets smaller - remember that talitos_request is now embedded in talitos_edesc structure, which gets kmalloc-ed for every request The trouble I ran into was that after a request is backlogged and put on the crypto queue, I couldn't see how else I could go from the crypto_async_request that is pulled from the queue back to the talitos_request that is needed in order put the pointer into the hardware FIFO. This is the one issue from you review that I didn't address in v2 of the patch. Thanks for the review. Not releasing / re-acquiring the spinlock while trying to flush the backlog really improved performance. mh -- Martin Hicks P.Eng. | m...@bork.org Bork Consulting Inc. | +1 (613) 266-2296 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/5] crypto: talitos: Add software backlog queue handling
I was running into situations where the hardware FIFO was filling up, and the code was returning EAGAIN to dm-crypt and just dropping the submitted crypto request. This adds support in talitos for a software backlog queue. When requests can't be queued to the hardware immediately EBUSY is returned. The queued requests are dispatched to the hardware in received order as hardware FIFO slots become available. Signed-off-by: Martin Hicks m...@bork.org --- drivers/crypto/talitos.c | 92 +++--- drivers/crypto/talitos.h |3 ++ 2 files changed, 74 insertions(+), 21 deletions(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index d3472be..226654c 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -183,43 +183,72 @@ static int init_device(struct device *dev) } /** - * talitos_submit - submits a descriptor to the device for processing + * talitos_handle_queue - performs submissions either of new descriptors + *or ones waiting in the queue backlog. * @dev: the SEC device to be used * @ch:the SEC device channel to be used - * @edesc: the descriptor to be processed by the device - * @context: a handle for use by caller (optional) + * @edesc: the descriptor to be processed by the device (optional) * * desc must contain valid dma-mapped (bus physical) address pointers. * callback must check err and feedback in descriptor header - * for device processing status. + * for device processing status upon completion. */ -int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc) +int talitos_handle_queue(struct device *dev, int ch, struct talitos_edesc *edesc) { struct talitos_private *priv = dev_get_drvdata(dev); - struct talitos_request *request = edesc-req; + struct talitos_request *request, *orig_request = NULL; + struct crypto_async_request *async_req; unsigned long flags; int head; + int ret = -EINPROGRESS; spin_lock_irqsave(priv-chan[ch].head_lock, flags); + if (edesc) { + orig_request = edesc-req; + crypto_enqueue_request(priv-chan[ch].queue, orig_request-base); + } + +flush_another: + if (priv-chan[ch].queue.qlen == 0) { + spin_unlock_irqrestore(priv-chan[ch].head_lock, flags); + return 0; + } + if (!atomic_inc_not_zero(priv-chan[ch].submit_count)) { /* h/w fifo is full */ spin_unlock_irqrestore(priv-chan[ch].head_lock, flags); - return -EAGAIN; + return -EBUSY; } - head = priv-chan[ch].head; + /* Dequeue the oldest request */ + async_req = crypto_dequeue_request(priv-chan[ch].queue); + + request = container_of(async_req, struct talitos_request, base); request-dma_desc = dma_map_single(dev, request-desc, sizeof(*request-desc), DMA_BIDIRECTIONAL); /* increment fifo head */ + head = priv-chan[ch].head; priv-chan[ch].head = (priv-chan[ch].head + 1) (priv-fifo_len - 1); - smp_wmb(); - priv-chan[ch].fifo[head] = request; + spin_unlock_irqrestore(priv-chan[ch].head_lock, flags); + + /* +* Mark a backlogged request as in-progress, return EBUSY because +* the original request that was submitted is backlogged. +*/ + if (request != orig_request) { + struct crypto_async_request *areq = request-context; + areq-complete(areq, -EINPROGRESS); + ret = -EBUSY; + } + + spin_lock_irqsave(priv-chan[ch].head_lock, flags); /* GO! */ + priv-chan[ch].fifo[head] = request; wmb(); out_be32(priv-chan[ch].reg + TALITOS_FF, upper_32_bits(request-dma_desc)); @@ -228,9 +257,18 @@ int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc) spin_unlock_irqrestore(priv-chan[ch].head_lock, flags); - return -EINPROGRESS; + /* +* When handling the queue via the completion path, queue more +* requests if the hardware has room. +*/ + if (!edesc) { + spin_lock_irqsave(priv-chan[ch].head_lock, flags); + goto flush_another; + } + + return ret; } -EXPORT_SYMBOL(talitos_submit); +EXPORT_SYMBOL(talitos_handle_queue); /* * process what was done, notify callback of error if not @@ -284,6 +322,8 @@ static void flush_channel(struct device *dev, int ch, int error, int reset_ch) } spin_unlock_irqrestore(priv-chan[ch].tail_lock, flags); + + talitos_handle_queue(dev, ch, NULL); } /* @@ -1038,8 +1078,8 @@ static int ipsec_esp(struct talitos_edesc *edesc, struct aead_request *areq, edesc-req.callback = callback; edesc
[PATCH 0/5] crypto: talitos: Add crypto async queue handling
I was testing dm-crypt performance with a Freescale P1022 board with a recent kernel and was getting IO errors while doing testing with LUKS. Investigation showed that all hardware FIFO slots were filling and the driver was returning EAGAIN to the block layer, which is not an expected response for an async crypto implementation. The following patch series adds a few small fixes, and reworks the submission path to use the crypto_queue mechanism to handle the request backlog. Martin Hicks (5): crypto: talitos: Simplify per-channel initialization crypto: talitos: Remove MD5_BLOCK_SIZE crypto: talitos: Fix off-by-one and use all hardware slots crypto: talitos: Reorganize request submission data structures crypto: talitos: Add software backlog queue handling drivers/crypto/talitos.c | 189 -- drivers/crypto/talitos.h | 44 +-- 2 files changed, 137 insertions(+), 96 deletions(-) -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/5] crypto: talitos: Simplify per-channel initialization
There were multiple loops in a row, for each separate step of the initialization of the channels. Simplify to a single loop. Signed-off-by: Martin Hicks m...@bork.org --- drivers/crypto/talitos.c | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index 067ec21..c49d977 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -2706,20 +2706,16 @@ static int talitos_probe(struct platform_device *ofdev) goto err_out; } + priv-fifo_len = roundup_pow_of_two(priv-chfifo_len); + for (i = 0; i priv-num_channels; i++) { priv-chan[i].reg = priv-reg + TALITOS_CH_STRIDE * (i + 1); if (!priv-irq[1] || !(i 1)) priv-chan[i].reg += TALITOS_CH_BASE_OFFSET; - } - for (i = 0; i priv-num_channels; i++) { spin_lock_init(priv-chan[i].head_lock); spin_lock_init(priv-chan[i].tail_lock); - } - priv-fifo_len = roundup_pow_of_two(priv-chfifo_len); - - for (i = 0; i priv-num_channels; i++) { priv-chan[i].fifo = kzalloc(sizeof(struct talitos_request) * priv-fifo_len, GFP_KERNEL); if (!priv-chan[i].fifo) { @@ -2727,11 +2723,10 @@ static int talitos_probe(struct platform_device *ofdev) err = -ENOMEM; goto err_out; } - } - for (i = 0; i priv-num_channels; i++) atomic_set(priv-chan[i].submit_count, -(priv-chfifo_len - 1)); + } dma_set_mask(dev, DMA_BIT_MASK(36)); -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/5] crypto: talitos: Remove MD5_BLOCK_SIZE
This is properly defined in the md5 header file. --- drivers/crypto/talitos.c |6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index c49d977..89cf4d5 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -637,8 +637,6 @@ static void talitos_unregister_rng(struct device *dev) #define TALITOS_MAX_KEY_SIZE 96 #define TALITOS_MAX_IV_LENGTH 16 /* max of AES_BLOCK_SIZE, DES3_EDE_BLOCK_SIZE */ -#define MD5_BLOCK_SIZE64 - struct talitos_ctx { struct device *dev; int ch; @@ -2195,7 +2193,7 @@ static struct talitos_alg_template driver_algs[] = { .halg.base = { .cra_name = md5, .cra_driver_name = md5-talitos, - .cra_blocksize = MD5_BLOCK_SIZE, + .cra_blocksize = MD5_HMAC_BLOCK_SIZE, .cra_flags = CRYPTO_ALG_TYPE_AHASH | CRYPTO_ALG_ASYNC, } @@ -2285,7 +2283,7 @@ static struct talitos_alg_template driver_algs[] = { .halg.base = { .cra_name = hmac(md5), .cra_driver_name = hmac-md5-talitos, - .cra_blocksize = MD5_BLOCK_SIZE, + .cra_blocksize = MD5_HMAC_BLOCK_SIZE, .cra_flags = CRYPTO_ALG_TYPE_AHASH | CRYPTO_ALG_ASYNC, } -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] crypto: talitos: Fix off-by-one and use all hardware slots
The submission count was off by one. Signed-off-by: Martin Hicks m...@bork.org --- drivers/crypto/talitos.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index 89cf4d5..7709805 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -2722,8 +2722,7 @@ static int talitos_probe(struct platform_device *ofdev) goto err_out; } - atomic_set(priv-chan[i].submit_count, - -(priv-chfifo_len - 1)); + atomic_set(priv-chan[i].submit_count, -priv-chfifo_len); } dma_set_mask(dev, DMA_BIT_MASK(36)); -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/5] crypto: talitos: Reorganize request submission data structures
This is preparatory work for moving to using the crypto async queue handling code. A talitos_request structure is buried into each talitos_edesc so that when talitos_submit() is called, everything required to defer the submission to the hardware is contained within talitos_edesc. Signed-off-by: Martin Hicks m...@bork.org --- drivers/crypto/talitos.c | 93 +++--- drivers/crypto/talitos.h | 41 +--- 2 files changed, 65 insertions(+), 69 deletions(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index 7709805..d3472be 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -186,22 +186,17 @@ static int init_device(struct device *dev) * talitos_submit - submits a descriptor to the device for processing * @dev: the SEC device to be used * @ch:the SEC device channel to be used - * @desc: the descriptor to be processed by the device - * @callback: whom to call when processing is complete + * @edesc: the descriptor to be processed by the device * @context: a handle for use by caller (optional) * * desc must contain valid dma-mapped (bus physical) address pointers. * callback must check err and feedback in descriptor header * for device processing status. */ -int talitos_submit(struct device *dev, int ch, struct talitos_desc *desc, - void (*callback)(struct device *dev, - struct talitos_desc *desc, - void *context, int error), - void *context) +int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edesc) { struct talitos_private *priv = dev_get_drvdata(dev); - struct talitos_request *request; + struct talitos_request *request = edesc-req; unsigned long flags; int head; @@ -214,19 +209,15 @@ int talitos_submit(struct device *dev, int ch, struct talitos_desc *desc, } head = priv-chan[ch].head; - request = priv-chan[ch].fifo[head]; - - /* map descriptor and save caller data */ - request-dma_desc = dma_map_single(dev, desc, sizeof(*desc), + request-dma_desc = dma_map_single(dev, request-desc, + sizeof(*request-desc), DMA_BIDIRECTIONAL); - request-callback = callback; - request-context = context; /* increment fifo head */ priv-chan[ch].head = (priv-chan[ch].head + 1) (priv-fifo_len - 1); smp_wmb(); - request-desc = desc; + priv-chan[ch].fifo[head] = request; /* GO! */ wmb(); @@ -247,15 +238,16 @@ EXPORT_SYMBOL(talitos_submit); static void flush_channel(struct device *dev, int ch, int error, int reset_ch) { struct talitos_private *priv = dev_get_drvdata(dev); - struct talitos_request *request, saved_req; + struct talitos_request *request; unsigned long flags; int tail, status; spin_lock_irqsave(priv-chan[ch].tail_lock, flags); tail = priv-chan[ch].tail; - while (priv-chan[ch].fifo[tail].desc) { - request = priv-chan[ch].fifo[tail]; + while (priv-chan[ch].fifo[tail]) { + request = priv-chan[ch].fifo[tail]; + status = 0; /* descriptors with their done bits set don't get the error */ rmb(); @@ -271,14 +263,9 @@ static void flush_channel(struct device *dev, int ch, int error, int reset_ch) sizeof(struct talitos_desc), DMA_BIDIRECTIONAL); - /* copy entries so we can call callback outside lock */ - saved_req.desc = request-desc; - saved_req.callback = request-callback; - saved_req.context = request-context; - /* release request entry in fifo */ smp_wmb(); - request-desc = NULL; + priv-chan[ch].fifo[tail] = NULL; /* increment fifo tail */ priv-chan[ch].tail = (tail + 1) (priv-fifo_len - 1); @@ -287,8 +274,8 @@ static void flush_channel(struct device *dev, int ch, int error, int reset_ch) atomic_dec(priv-chan[ch].submit_count); - saved_req.callback(dev, saved_req.desc, saved_req.context, - status); + request-callback(dev, request-desc, request-context, status); + /* channel may resume processing in single desc error case */ if (error !reset_ch status == error) return; @@ -352,7 +339,8 @@ static u32 current_desc_hdr(struct device *dev, int ch) tail = priv-chan[ch].tail; iter = tail; - while (priv-chan[ch].fifo[iter].dma_desc != cur_desc) { + while (priv-chan[ch].fifo[iter] + priv-chan
[PATCH 1/2] crypto: talitos: Clean ups and comment fixes for ablkcipher commands
This just cleans up some of the initializers, and improves the comments should any other ablkcipher modes be added in the future. The header words 1 and 5 have more possibilities than just passing an IV. These are pointers to the Cipher Context in/out registers. Signed-off-by: Martin Hicks m...@bork.org --- drivers/crypto/talitos.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index 226654c..6b2a19a 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -1377,11 +1377,9 @@ static int common_nonsnoop(struct talitos_edesc *edesc, int sg_count, ret; /* first DWORD empty */ - desc-ptr[0].len = 0; - to_talitos_ptr(desc-ptr[0], 0); - desc-ptr[0].j_extent = 0; + desc-ptr[0] = zero_entry; - /* cipher iv */ + /* cipher context */ to_talitos_ptr(desc-ptr[1], edesc-iv_dma); desc-ptr[1].len = cpu_to_be16(ivsize); desc-ptr[1].j_extent = 0; @@ -1444,14 +1442,12 @@ static int common_nonsnoop(struct talitos_edesc *edesc, edesc-dma_len, DMA_BIDIRECTIONAL); } - /* iv out */ + /* cipher context out */ map_single_talitos_ptr(dev, desc-ptr[5], ivsize, ctx-iv, 0, DMA_FROM_DEVICE); /* last DWORD empty */ - desc-ptr[6].len = 0; - to_talitos_ptr(desc-ptr[6], 0); - desc-ptr[6].j_extent = 0; + desc-ptr[6] = zero_entry; edesc-req.callback = callback; edesc-req.context = areq; -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/2] crypto: talitos: Add AES-XTS mode
This adds the AES-XTS mode, supported by the Freescale SEC 3.3.2. One of the nice things about this hardware is that it knows how to deal with encrypt/decrypt requests that are larger than sector size, but that also requires that that the sector size be passed into the crypto engine as an XTS cipher context parameter. When a request is larger than the sector size the sector number is incremented by the talitos engine and the tweak key is re-calculated for the new sector. I've tested this with 256bit and 512bit keys (tweak and data keys of 128bit and 256bit) to ensure interoperability with the software AES-XTS implementation. All testing was done using dm-crypt/LUKS with aes-xts-plain64. Is there a better solution that just hard coding the sector size to (1SECTOR_SHIFT)? Maybe dm-crypt should be modified to pass the sector size along with the plain/plain64 IV to an XTS algorithm? Martin Hicks (2): crypto: talitos: Clean ups and comment fixes for ablkcipher commands crypto: talitos: Add AES-XTS Support drivers/crypto/talitos.c | 45 + drivers/crypto/talitos.h |1 + 2 files changed, 38 insertions(+), 8 deletions(-) -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] crypto: talitos: Add AES-XTS Support
The newer talitos hardware has support for AES in XTS mode. Signed-off-by: Martin Hicks m...@bork.org --- drivers/crypto/talitos.c | 33 + drivers/crypto/talitos.h |1 + 2 files changed, 34 insertions(+) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index 6b2a19a..38cbde1 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -40,9 +40,11 @@ #include linux/spinlock.h #include linux/rtnetlink.h #include linux/slab.h +#include linux/device-mapper.h #include crypto/algapi.h #include crypto/aes.h +#include crypto/xts.h #include crypto/des.h #include crypto/sha.h #include crypto/md5.h @@ -1464,9 +1466,22 @@ static struct talitos_edesc *ablkcipher_edesc_alloc(struct ablkcipher_request * areq, bool encrypt) { struct crypto_ablkcipher *cipher = crypto_ablkcipher_reqtfm(areq); + struct crypto_tfm *tfm = (struct crypto_tfm *)cipher; struct talitos_ctx *ctx = crypto_ablkcipher_ctx(cipher); unsigned int ivsize = crypto_ablkcipher_ivsize(cipher); + /* +* AES-XTS uses the first two AES Context registers for: +* +* Register 1: Sector Number (Little Endian) +* Register 2: Sector Size (Big Endian) +* +* Whereas AES-CBC uses registers 1/2 as a 16-byte IV. +*/ + if (!strcmp(crypto_tfm_alg_name(tfm),xts(aes))) + /* Fixed sized sector */ + *((u64 *)areq-info + 1) = cpu_to_be64((1SECTOR_SHIFT)); + return talitos_edesc_alloc(ctx-dev, NULL, areq-src, areq-dst, areq-info, 0, areq-nbytes, 0, ivsize, 0, areq-base.flags, areq-base, encrypt); @@ -2192,6 +2207,24 @@ static struct talitos_alg_template driver_algs[] = { DESC_HDR_MODE0_DEU_CBC | DESC_HDR_MODE0_DEU_3DES, }, + { .type = CRYPTO_ALG_TYPE_ABLKCIPHER, + .alg.crypto = { + .cra_name = xts(aes), + .cra_driver_name = xts-aes-talitos, + .cra_blocksize = XTS_BLOCK_SIZE, + .cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | + CRYPTO_ALG_ASYNC, + .cra_ablkcipher = { + .min_keysize = AES_MIN_KEY_SIZE * 2, + .max_keysize = AES_MAX_KEY_SIZE * 2, + .ivsize = XTS_BLOCK_SIZE, + } + }, + .desc_hdr_template = DESC_HDR_TYPE_COMMON_NONSNOOP_NO_AFEU | + DESC_HDR_SEL0_AESU | + DESC_HDR_MODE0_AESU_XTS, + }, + /* AHASH algorithms. */ { .type = CRYPTO_ALG_TYPE_AHASH, .alg.hash = { diff --git a/drivers/crypto/talitos.h b/drivers/crypto/talitos.h index a6f73e2..735da82 100644 --- a/drivers/crypto/talitos.h +++ b/drivers/crypto/talitos.h @@ -316,6 +316,7 @@ extern int talitos_submit(struct device *dev, int ch, struct talitos_edesc *edes /* primary execution unit mode (MODE0) and derivatives */ #defineDESC_HDR_MODE0_ENCRYPT cpu_to_be32(0x0010) #defineDESC_HDR_MODE0_AESU_CBC cpu_to_be32(0x0020) +#defineDESC_HDR_MODE0_AESU_XTS cpu_to_be32(0x0420) #defineDESC_HDR_MODE0_DEU_CBC cpu_to_be32(0x0040) #defineDESC_HDR_MODE0_DEU_3DES cpu_to_be32(0x0020) #defineDESC_HDR_MODE0_MDEU_CONTcpu_to_be32(0x0800) -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] crypto: talitos: Add crypto async queue handling
Resending to linux-crypto in plain text. Sorry to everyone else for the duplication. mh On Fri, Feb 20, 2015 at 1:23 PM, Martin Hicks m...@bork.org wrote: I've just noticed that performance is pretty terrible with maxcpus=1, so I'll investigate that. mh On Fri, Feb 20, 2015 at 11:21 AM, Martin Hicks m...@bork.org wrote: I was testing dm-crypt performance with a Freescale P1022 board with a recent kernel and was getting IO errors while doing testing with LUKS. Investigation showed that all hardware FIFO slots were filling and the driver was returning EAGAIN to the block layer, which is not an expected response for an async crypto implementation. The following patch series adds a few small fixes, and reworks the submission path to use the crypto_queue mechanism to handle the request backlog. Martin Hicks (5): crypto: talitos: Simplify per-channel initialization crypto: talitos: Remove MD5_BLOCK_SIZE crypto: talitos: Fix off-by-one and use all hardware slots crypto: talitos: Reorganize request submission data structures crypto: talitos: Add software backlog queue handling drivers/crypto/talitos.c | 189 -- drivers/crypto/talitos.h | 44 +-- 2 files changed, 137 insertions(+), 96 deletions(-) -- 1.7.10.4 -- Martin Hicks P.Eng. | m...@bork.org Bork Consulting Inc. | +1 (613) 266-2296 -- Martin Hicks P.Eng. | m...@bork.org Bork Consulting Inc. | +1 (613) 266-2296 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html