Re: [RFC PATCH 6/6] dm-crypt: Add bulk crypto processing support
Hi Binoy, 2017-01-16 9:37 GMT+01:00 Binoy Jayan : > The initial goal of our proposal was to process the encryption requests with > the > maximum possible block sizes with a hardware which has automated iv generation > capabilities. But when it is done in software, and if the bulk > requests are processed > sequentially, one block at a time, the memory foot print could be > reduced even if > the bulk request exceeds a page. While your patch looks good, there > are couple of > drawbacks one of which is the maximum size of a bulk request is a page. This > could limit the capability of the crypto hardware. If the whole bio is > processed at > once, which is what qualcomm's version of dm-req-crypt does, it achieves an > even > better performance. I see... well, I added the limit only so that the async fallback implementation can allocate multiple requests, so they can be processed in parallel, as they would be in the current dm-crypt code. I'm not really sure if that brings any benefit, but I guess if some HW accelerator has multiple engines, then this allows distributing the work among them. (I wonder how switching to the crypto API's IV generation will affect the situation for drivers that can process requests in parallel, but do not support the IV generators...) I could remove the limit and switch the fallback to sequential processing (or maybe even allocate the requests from a mempool, the way dm-crypt does it now...), but after Herbert's feedback I'm probably going to scrap this patchset anyway... >> Note that if the 'keycount' parameter of the cipher specification is set to a >> value other than 1, dm-crypt still sends only one sector in each request, >> since >> in such case the neighboring sectors are encrypted with different keys. > > This could be avoided if the key management is done at the crypto layer. Yes, but remember that the only reasonable use-case for using keycount != 1 is mounting loop-AES partitions (which is kind of a legacy format, so there is not much point in making HW drivers for it). It is an unfortunate consequence of Milan's decision to make keycount an independent part of the cipher specification (instead of making it specific for the LMK mode), that all the other IV modes are now 'polluted' with the requirement to support it. I discussed with Milan the possibility of deprecating the keycount parameter (i.e. allowing only value of 64 for LMK and 1 for all the other IV modes) and then converting the IV modes to skciphers (or IV generators, or some combination of both). This would significantly simplify the key management and allow for better optimization strategies. However, I don't know if such change would be accepted by device-mapper maintainers, since it may break someone's unusual dm-crypt configuration... Cheers, Ondrej -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 6/6] dm-crypt: Add bulk crypto processing support
Hi Ondrej, On 12 January 2017 at 18:29, Ondrej Mosnacek wrote: > This patch converts dm-crypt to use bulk requests when invoking skcipher > operations, allowing the crypto drivers to process multiple sectors at once, > while reducing the overhead caused by the small sector size. > > The new code detects if multiple sectors from a bio are contigously stored > within a single page (which should almost always be the case), and in such > case > processes all these sectors via a single bulk request. > > Note that the bio can also consist of several (likely consecutive) pages, > which > could be all bundled in a single request. However, since we need to specify an > upper bound on how many sectors we are going to send at once (and this bound > may affect the amount of memory allocated per single request), it is best to > just limit the request bundling to a single page. The initial goal of our proposal was to process the encryption requests with the maximum possible block sizes with a hardware which has automated iv generation capabilities. But when it is done in software, and if the bulk requests are processed sequentially, one block at a time, the memory foot print could be reduced even if the bulk request exceeds a page. While your patch looks good, there are couple of drawbacks one of which is the maximum size of a bulk request is a page. This could limit the capability of the crypto hardware. If the whole bio is processed at once, which is what qualcomm's version of dm-req-crypt does, it achieves an even better performance. > Note that if the 'keycount' parameter of the cipher specification is set to a > value other than 1, dm-crypt still sends only one sector in each request, > since > in such case the neighboring sectors are encrypted with different keys. This could be avoided if the key management is done at the crypto layer. Thanks, Binoy -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH 6/6] dm-crypt: Add bulk crypto processing support
This patch converts dm-crypt to use bulk requests when invoking skcipher operations, allowing the crypto drivers to process multiple sectors at once, while reducing the overhead caused by the small sector size. The new code detects if multiple sectors from a bio are contigously stored within a single page (which should almost always be the case), and in such case processes all these sectors via a single bulk request. Note that the bio can also consist of several (likely consecutive) pages, which could be all bundled in a single request. However, since we need to specify an upper bound on how many sectors we are going to send at once (and this bound may affect the amount of memory allocated per single request), it is best to just limit the request bundling to a single page. Note that if the 'keycount' parameter of the cipher specification is set to a value other than 1, dm-crypt still sends only one sector in each request, since in such case the neighboring sectors are encrypted with different keys. This change causes a detectable read/write speedup (about 5-10%) on a ramdisk when AES-NI accelerated ciphers are used. Signed-off-by: Ondrej Mosnacek --- drivers/md/dm-crypt.c | 254 -- 1 file changed, 165 insertions(+), 89 deletions(-) diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c index 7c6c572..d3f69e1 100644 --- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -37,6 +37,9 @@ #define DM_MSG_PREFIX "crypt" +/* for now, we only bundle consecutve sectors within a single page */ +#define MAX_CONSEC_SECTORS (1 << (PAGE_SHIFT - SECTOR_SHIFT)) + /* * context holding the current state of a multi-part conversion */ @@ -48,7 +51,7 @@ struct convert_context { struct bvec_iter iter_out; sector_t cc_sector; atomic_t cc_pending; - struct skcipher_request *req; + struct skcipher_bulk_request *req; }; /* @@ -73,6 +76,7 @@ struct dm_crypt_request { struct scatterlist sg_in; struct scatterlist sg_out; sector_t iv_sector; + sector_t sector_count; }; struct crypt_config; @@ -83,9 +87,9 @@ struct crypt_iv_operations { void (*dtr)(struct crypt_config *cc); int (*init)(struct crypt_config *cc); int (*wipe)(struct crypt_config *cc); - int (*generator)(struct crypt_config *cc, u8 *iv, + int (*generator)(struct crypt_config *cc, u8 *iv, unsigned int sector, struct dm_crypt_request *dmreq); - int (*post)(struct crypt_config *cc, u8 *iv, + int (*post)(struct crypt_config *cc, u8 *iv, unsigned int sector, struct dm_crypt_request *dmreq); }; @@ -163,14 +167,14 @@ struct crypt_config { /* * Layout of each crypto request: * -* struct skcipher_request +* struct skcipher_bulk_request * context * padding * struct dm_crypt_request * padding -* IV +* IVs * -* The padding is added so that dm_crypt_request and the IV are +* The padding is added so that dm_crypt_request and the IVs are * correctly aligned. */ unsigned int dmreq_start; @@ -245,20 +249,24 @@ static struct crypto_skcipher *any_tfm(struct crypt_config *cc) * http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/454 */ -static int crypt_iv_plain_gen(struct crypt_config *cc, u8 *iv, +static int crypt_iv_plain_gen(struct crypt_config *cc, u8 *ivs, unsigned int i, struct dm_crypt_request *dmreq) { + u8 *iv = ivs + i * cc->iv_size; + memset(iv, 0, cc->iv_size); - *(__le32 *)iv = cpu_to_le32(dmreq->iv_sector & 0x); + *(__le32 *)iv = cpu_to_le32((dmreq->iv_sector + i) & 0x); return 0; } -static int crypt_iv_plain64_gen(struct crypt_config *cc, u8 *iv, - struct dm_crypt_request *dmreq) +static int crypt_iv_plain64_gen(struct crypt_config *cc, u8 *ivs, + unsigned int i, struct dm_crypt_request *dmreq) { + u8 *iv = ivs + i * cc->iv_size; + memset(iv, 0, cc->iv_size); - *(__le64 *)iv = cpu_to_le64(dmreq->iv_sector); + *(__le64 *)iv = cpu_to_le64(dmreq->iv_sector + i); return 0; } @@ -410,13 +418,14 @@ static int crypt_iv_essiv_ctr(struct crypt_config *cc, struct dm_target *ti, return err; } -static int crypt_iv_essiv_gen(struct crypt_config *cc, u8 *iv, +static int crypt_iv_essiv_gen(struct crypt_config *cc, u8 *ivs, unsigned int i, struct dm_crypt_request *dmreq) { struct crypto_cipher *essiv_tfm = cc->iv_private; + u8 *iv = ivs + i * cc->iv_size; memset(iv, 0, cc->iv_size); - *(__le64 *)iv = cpu_to_le64(dmreq->iv_sector); + *(__le64 *)iv = cpu_to_le64(dmreq->iv_sector + i); crypto_cipher_encr