Re: [RFC PATCH 6/6] dm-crypt: Add bulk crypto processing support

2017-01-17 Thread Ondrej Mosnáček
Hi Binoy,

2017-01-16 9:37 GMT+01:00 Binoy Jayan :
> The initial goal of our proposal was to process the encryption requests with 
> the
> maximum possible block sizes with a hardware which has automated iv generation
> capabilities. But when it is done in software, and if the bulk
> requests are processed
> sequentially, one block at a time, the memory foot print could be
> reduced even if
> the bulk request exceeds a page. While your patch looks good, there
> are couple of
> drawbacks one of which is the maximum size of a bulk request is a page. This
> could limit the capability of the crypto hardware. If the whole bio is
> processed at
> once, which is what qualcomm's version of dm-req-crypt does, it achieves an 
> even
> better performance.

I see... well, I added the limit only so that the async fallback
implementation can allocate multiple requests, so they can be
processed in parallel, as they would be in the current dm-crypt code.
I'm not really sure if that brings any benefit, but I guess if some HW
accelerator has multiple engines, then this allows distributing the
work among them. (I wonder how switching to the crypto API's IV
generation will affect the situation for drivers that can process
requests in parallel, but do not support the IV generators...)

I could remove the limit and switch the fallback to sequential
processing (or maybe even allocate the requests from a mempool, the
way dm-crypt does it now...), but after Herbert's feedback I'm
probably going to scrap this patchset anyway...

>> Note that if the 'keycount' parameter of the cipher specification is set to a
>> value other than 1, dm-crypt still sends only one sector in each request, 
>> since
>> in such case the neighboring sectors are encrypted with different keys.
>
> This could be avoided if the key management is done at the crypto layer.

Yes, but remember that the only reasonable use-case for using keycount
!= 1 is mounting loop-AES partitions (which is kind of a legacy
format, so there is not much point in making HW drivers for it). It is
an unfortunate consequence of Milan's decision to make keycount an
independent part of the cipher specification (instead of making it
specific for the LMK mode), that all the other IV modes are now
'polluted' with the requirement to support it.

I discussed with Milan the possibility of deprecating the keycount
parameter (i.e. allowing only value of 64 for LMK and 1 for all the
other IV modes) and then converting the IV modes to skciphers (or IV
generators, or some combination of both). This would significantly
simplify the key management and allow for better optimization
strategies. However, I don't know if such change would be accepted by
device-mapper maintainers, since it may break someone's unusual
dm-crypt configuration...

Cheers,
Ondrej
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 6/6] dm-crypt: Add bulk crypto processing support

2017-01-16 Thread Binoy Jayan
Hi Ondrej,

On 12 January 2017 at 18:29, Ondrej Mosnacek  wrote:
> This patch converts dm-crypt to use bulk requests when invoking skcipher
> operations, allowing the crypto drivers to process multiple sectors at once,
> while reducing the overhead caused by the small sector size.
>
> The new code detects if multiple sectors from a bio are contigously stored
> within a single page (which should almost always be the case), and in such 
> case
> processes all these sectors via a single bulk request.
>
> Note that the bio can also consist of several (likely consecutive) pages, 
> which
> could be all bundled in a single request. However, since we need to specify an
> upper bound on how many sectors we are going to send at once (and this bound
> may affect the amount of memory allocated per single request), it is best to
> just limit the request bundling to a single page.

The initial goal of our proposal was to process the encryption requests with the
maximum possible block sizes with a hardware which has automated iv generation
capabilities. But when it is done in software, and if the bulk
requests are processed
sequentially, one block at a time, the memory foot print could be
reduced even if
the bulk request exceeds a page. While your patch looks good, there
are couple of
drawbacks one of which is the maximum size of a bulk request is a page. This
could limit the capability of the crypto hardware. If the whole bio is
processed at
once, which is what qualcomm's version of dm-req-crypt does, it achieves an even
better performance.

> Note that if the 'keycount' parameter of the cipher specification is set to a
> value other than 1, dm-crypt still sends only one sector in each request, 
> since
> in such case the neighboring sectors are encrypted with different keys.

This could be avoided if the key management is done at the crypto layer.

Thanks,
Binoy
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 6/6] dm-crypt: Add bulk crypto processing support

2017-01-12 Thread Ondrej Mosnacek
This patch converts dm-crypt to use bulk requests when invoking skcipher
operations, allowing the crypto drivers to process multiple sectors at once,
while reducing the overhead caused by the small sector size.

The new code detects if multiple sectors from a bio are contigously stored
within a single page (which should almost always be the case), and in such case
processes all these sectors via a single bulk request.

Note that the bio can also consist of several (likely consecutive) pages, which
could be all bundled in a single request. However, since we need to specify an
upper bound on how many sectors we are going to send at once (and this bound
may affect the amount of memory allocated per single request), it is best to
just limit the request bundling to a single page.

Note that if the 'keycount' parameter of the cipher specification is set to a
value other than 1, dm-crypt still sends only one sector in each request, since
in such case the neighboring sectors are encrypted with different keys.

This change causes a detectable read/write speedup (about 5-10%) on a ramdisk
when AES-NI accelerated ciphers are used.

Signed-off-by: Ondrej Mosnacek 
---
 drivers/md/dm-crypt.c | 254 --
 1 file changed, 165 insertions(+), 89 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 7c6c572..d3f69e1 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -37,6 +37,9 @@
 
 #define DM_MSG_PREFIX "crypt"
 
+/* for now, we only bundle consecutve sectors within a single page */
+#define MAX_CONSEC_SECTORS (1 << (PAGE_SHIFT - SECTOR_SHIFT))
+
 /*
  * context holding the current state of a multi-part conversion
  */
@@ -48,7 +51,7 @@ struct convert_context {
struct bvec_iter iter_out;
sector_t cc_sector;
atomic_t cc_pending;
-   struct skcipher_request *req;
+   struct skcipher_bulk_request *req;
 };
 
 /*
@@ -73,6 +76,7 @@ struct dm_crypt_request {
struct scatterlist sg_in;
struct scatterlist sg_out;
sector_t iv_sector;
+   sector_t sector_count;
 };
 
 struct crypt_config;
@@ -83,9 +87,9 @@ struct crypt_iv_operations {
void (*dtr)(struct crypt_config *cc);
int (*init)(struct crypt_config *cc);
int (*wipe)(struct crypt_config *cc);
-   int (*generator)(struct crypt_config *cc, u8 *iv,
+   int (*generator)(struct crypt_config *cc, u8 *iv, unsigned int sector,
 struct dm_crypt_request *dmreq);
-   int (*post)(struct crypt_config *cc, u8 *iv,
+   int (*post)(struct crypt_config *cc, u8 *iv, unsigned int sector,
struct dm_crypt_request *dmreq);
 };
 
@@ -163,14 +167,14 @@ struct crypt_config {
/*
 * Layout of each crypto request:
 *
-*   struct skcipher_request
+*   struct skcipher_bulk_request
 *  context
 *  padding
 *   struct dm_crypt_request
 *  padding
-*   IV
+*   IVs
 *
-* The padding is added so that dm_crypt_request and the IV are
+* The padding is added so that dm_crypt_request and the IVs are
 * correctly aligned.
 */
unsigned int dmreq_start;
@@ -245,20 +249,24 @@ static struct crypto_skcipher *any_tfm(struct 
crypt_config *cc)
  * http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/454
  */
 
-static int crypt_iv_plain_gen(struct crypt_config *cc, u8 *iv,
+static int crypt_iv_plain_gen(struct crypt_config *cc, u8 *ivs, unsigned int i,
  struct dm_crypt_request *dmreq)
 {
+   u8 *iv = ivs + i * cc->iv_size;
+
memset(iv, 0, cc->iv_size);
-   *(__le32 *)iv = cpu_to_le32(dmreq->iv_sector & 0x);
+   *(__le32 *)iv = cpu_to_le32((dmreq->iv_sector + i) & 0x);
 
return 0;
 }
 
-static int crypt_iv_plain64_gen(struct crypt_config *cc, u8 *iv,
-   struct dm_crypt_request *dmreq)
+static int crypt_iv_plain64_gen(struct crypt_config *cc, u8 *ivs,
+   unsigned int i, struct dm_crypt_request *dmreq)
 {
+   u8 *iv = ivs + i * cc->iv_size;
+
memset(iv, 0, cc->iv_size);
-   *(__le64 *)iv = cpu_to_le64(dmreq->iv_sector);
+   *(__le64 *)iv = cpu_to_le64(dmreq->iv_sector + i);
 
return 0;
 }
@@ -410,13 +418,14 @@ static int crypt_iv_essiv_ctr(struct crypt_config *cc, 
struct dm_target *ti,
return err;
 }
 
-static int crypt_iv_essiv_gen(struct crypt_config *cc, u8 *iv,
+static int crypt_iv_essiv_gen(struct crypt_config *cc, u8 *ivs, unsigned int i,
  struct dm_crypt_request *dmreq)
 {
struct crypto_cipher *essiv_tfm = cc->iv_private;
+   u8 *iv = ivs + i * cc->iv_size;
 
memset(iv, 0, cc->iv_size);
-   *(__le64 *)iv = cpu_to_le64(dmreq->iv_sector);
+   *(__le64 *)iv = cpu_to_le64(dmreq->iv_sector + i);
crypto_cipher_encr