Re: [RFC PATCH v2 0/4] Exporting existing crypto API code through zinc
On Tue, Nov 20, 2018 at 05:24:41PM +0100, Jason A. Donenfeld wrote:
> On Tue, Nov 20, 2018 at 3:19 PM Herbert Xu wrote:
> > Yes. In fact it's used for FIPS certification testing.
> > Sure, nobody sane should be doing it. But when it comes to
> > government certification... :)
>
> The kernel does not aim toward any FIPS certification, and we're not
> going to start bloating our designs to fulfill this. It's never been a
> goal. Maybe ask Ted to add a FIPS mode to random.c and see what
> happens... When you start arguing "because FIPS!" as your
> justification, you really hit a head scratcher.

FIPS is not the issue. The issue is that we need to keep the existing
user-space ABI. We can't break that.

Cheers,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH 1/1] crypto: cavium/nitrox - crypto request format changes
nitrox_skcipher_crypt() will do the necessary formatting/ordering of
input and output sglists based on the algorithm requirements. It will
also accommodate the mandatory output buffers required for NITROX
hardware like Output request headers (ORH) and Completion headers.

Signed-off-by: Nagadheeraj Rottela
Reviewed-by: Srikanth Jampala
---
 drivers/crypto/cavium/nitrox/nitrox_algs.c   | 111 ++-
 drivers/crypto/cavium/nitrox/nitrox_req.h    |  94 ++
 drivers/crypto/cavium/nitrox/nitrox_reqmgr.c | 266 ++-
 3 files changed, 227 insertions(+), 244 deletions(-)

diff --git a/drivers/crypto/cavium/nitrox/nitrox_algs.c b/drivers/crypto/cavium/nitrox/nitrox_algs.c
index 5d54ebc20cb3..10075a97ff0d 100644
--- a/drivers/crypto/cavium/nitrox/nitrox_algs.c
+++ b/drivers/crypto/cavium/nitrox/nitrox_algs.c
@@ -155,13 +155,109 @@ static int nitrox_aes_setkey(struct crypto_skcipher *cipher, const u8 *key,
 	return nitrox_skcipher_setkey(cipher, aes_keylen, key, keylen);
 }
 
+static int alloc_src_sglist(struct skcipher_request *skreq, int ivsize)
+{
+	struct nitrox_kcrypt_request *nkreq = skcipher_request_ctx(skreq);
+	int nents = sg_nents(skreq->src) + 1;
+	struct se_crypto_request *creq = &nkreq->creq;
+	char *iv;
+	struct scatterlist *sg;
+
+	/* Allocate buffer to hold IV and input scatterlist array */
+	nkreq->src = alloc_req_buf(nents, ivsize, creq->gfp);
+	if (!nkreq->src)
+		return -ENOMEM;
+
+	/* copy iv */
+	iv = nkreq->src;
+	memcpy(iv, skreq->iv, ivsize);
+
+	sg = (struct scatterlist *)(iv + ivsize);
+	creq->src = sg;
+	sg_init_table(sg, nents);
+
+	/* Input format:
+	 * +----+----------------+
+	 * | IV | SRC sg entries |
+	 * +----+----------------+
+	 */
+
+	/* IV */
+	sg = create_single_sg(sg, iv, ivsize);
+	/* SRC entries */
+	create_multi_sg(sg, skreq->src);
+
+	return 0;
+}
+
+static int alloc_dst_sglist(struct skcipher_request *skreq, int ivsize)
+{
+	struct nitrox_kcrypt_request *nkreq = skcipher_request_ctx(skreq);
+	int nents = sg_nents(skreq->dst) + 3;
+	int extralen = ORH_HLEN + COMP_HLEN;
+	struct se_crypto_request *creq = &nkreq->creq;
+	struct scatterlist *sg;
+	char *iv = nkreq->src;
+
+	/* Allocate buffer to hold ORH, COMPLETION and output scatterlist
+	 * array
+	 */
+	nkreq->dst = alloc_req_buf(nents, extralen, creq->gfp);
+	if (!nkreq->dst)
+		return -ENOMEM;
+
+	creq->orh = (u64 *)(nkreq->dst);
+	set_orh_value(creq->orh);
+
+	creq->comp = (u64 *)(nkreq->dst + ORH_HLEN);
+	set_comp_value(creq->comp);
+
+	sg = (struct scatterlist *)(nkreq->dst + ORH_HLEN + COMP_HLEN);
+	creq->dst = sg;
+	sg_init_table(sg, nents);
+
+	/* Output format:
+	 * +-----+----+----------------+------------------+
+	 * | ORH | IV | DST sg entries | COMPLETION Bytes |
+	 * +-----+----+----------------+------------------+
+	 */
+
+	/* ORH */
+	sg = create_single_sg(sg, creq->orh, ORH_HLEN);
+	/* IV */
+	sg = create_single_sg(sg, iv, ivsize);
+	/* DST entries */
+	sg = create_multi_sg(sg, skreq->dst);
+	/* COMPLETION Bytes */
+	create_single_sg(sg, creq->comp, COMP_HLEN);
+
+	return 0;
+}
+
+static void free_src_sglist(struct skcipher_request *skreq)
+{
+	struct nitrox_kcrypt_request *nkreq = skcipher_request_ctx(skreq);
+
+	kfree(nkreq->src);
+}
+
+static void free_dst_sglist(struct skcipher_request *skreq)
+{
+	struct nitrox_kcrypt_request *nkreq = skcipher_request_ctx(skreq);
+
+	kfree(nkreq->dst);
+}
+
 static void nitrox_skcipher_callback(struct skcipher_request *skreq, int err)
 {
+	free_src_sglist(skreq);
+	free_dst_sglist(skreq);
 	if (err) {
 		pr_err_ratelimited("request failed status 0x%0x\n", err);
 		err = -EINVAL;
 	}
+
 	skcipher_request_complete(skreq, err);
 }
 
@@ -172,6 +268,7 @@ static int nitrox_skcipher_crypt(struct skcipher_request *skreq, bool enc)
 	struct nitrox_kcrypt_request *nkreq = skcipher_request_ctx(skreq);
 	int ivsize = crypto_skcipher_ivsize(cipher);
 	struct se_crypto_request *creq;
+	int ret;
 
 	creq = &nkreq->creq;
 	creq->flags = skreq->base.flags;
@@ -192,11 +289,15 @@ static int nitrox_skcipher_crypt(struct skcipher_request *skreq, bool enc)
 	creq->ctx_handle = nctx->u.ctx_handle;
 	creq->ctrl.s.ctxl = sizeof(struct flexi_crypto_context);
 
-	/* copy the iv */
-	memcpy(creq->iv, skreq->iv, ivsize);
-	creq->ivsize = ivsize;
-	creq->src = skreq->src;
-	creq->dst = skreq->dst;
+	ret = alloc_src_sglist(skreq, ivsize);
+	if (ret)
+		return ret;
+
+	ret = allo
Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
On Mon, Nov 19, 2018 at 08:29:39PM -0700, Jason Gunthorpe wrote:
> On Tue, Nov 20, 2018 at 11:07:02AM +0800, Kenneth Lee wrote:
> > On Mon, Nov 19, 2018 at 11:49:54AM -0700, Jason Gunthorpe wrote:
> > > On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> > > >
> > > > If the hardware cannot share a page table with the CPU, we then need to have
> > > > some way to change the device page table. This is what happens in ODP. It
> > > > invalidates the page table in the device upon mmu_notifier callback. But this cannot
> > > > solve the COW problem: if the user process A shares a page P with the device, and A
> > > > forks a new process B, and it continues to write to the page. By COW, the
> > > > process B will keep the page P, while A will get a new page P'. But you have
> > > > no way to let the device know it should use P' rather than P.
> > >
> > > Is this true? I thought mmu_notifiers covered all these cases.
> > >
> > > The mm_notifier for A should fire if B causes the physical address of
> > > A's pages to change via COW.
> > >
> > > And this causes the device page tables to re-synchronize.
> >
> > I don't see such code. The current do_cow_fault() implementation has nothing to
> > do with mmu_notifier.
>
> Well, that sure sounds like it would be a bug in mmu_notifiers..

Yes, it can be taken that way :) But it is going to be a tough bug.

> But considering Jean's SVA stuff seems based on mmu notifiers, I have
> a hard time believing that it has any different behavior from RDMA's
> ODP, and if it does have different behavior, then it is probably just
> a bug in the ODP implementation.

As Jean has explained, his solution is based on page table sharing. I
think ODP should also consider this new feature.

> > > > In WarpDrive/uacce, we make this simple. If you support IOMMU and it supports
> > > > SVM/SVA, everything will be fine, just like ODP implicit mode. And you don't need
> > > > to write any code for that. Because it has been done by the IOMMU
> > > > framework. If it
> > >
> > > Looks like the IOMMU code uses mmu_notifier, so it is identical to
> > > IB's ODP. The only difference is that IB tends to have the IOMMU page
> > > table in the device, not in the CPU.
> > >
> > > The only case I know of that is different is the new-fangled CAPI
> > > stuff where the IOMMU can directly use the CPU's page table and the
> > > IOMMU page table (in device or CPU) is eliminated.
> >
> > Yes. We are not focusing on the current implementation. As mentioned in the
> > cover letter, we are expecting Jean-Philippe's SVA patches:
> > git://linux-arm.org/linux-jpb.
>
> This SVA stuff does not look comparable to CAPI as it still requires
> maintaining separate IOMMU page tables.
>
> Also, those patches from Jean have a lot of references to
> mmu_notifiers (ie look at iommu_mmu_notifier).
>
> Are you really sure it is actually any different at all?
>
> > > Anyhow, I don't think a single instance of hardware should justify an
> > > entire new subsystem. Subsystems are hard to make and without multiple
> > > hardware
Re: [RFC PATCH v2 0/4] Exporting existing crypto API code through zinc
On Tue, Nov 20, 2018 at 05:18:24PM +0100, Jason A. Donenfeld wrote:
>
> N'ack. As I mentioned in the last email, this really isn't okay, and
> mostly defeats the purpose of Zinc in the first place, adding
> complexity in a discombobulated half-baked way. Your proposal seems to
> be the worst of all worlds. Having talked to lots of people about the
> design of Zinc at this point to very positive reception, I'm going to
> move forward with submitting v9, once I rebase with the required
> changes for Eric's latest merge.

Do you have any actual technical reasons for rejecting this apart from
your apparent hatred of the existing crypto API?

Cheers,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
On Tue, Nov 20, 2018 at 07:17:44AM +0200, Leon Romanovsky wrote:
> On Tue, Nov 20, 2018 at 11:07:02AM +0800, Kenneth Lee wrote:
> > On Mon, Nov 19, 2018 at 11:49:54AM -0700, Jason Gunthorpe wrote:
> > > On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> > > >
> > > > If the hardware cannot share a page table with the CPU, we then need to have
> > > > some way to change the device page table. This is what happens in ODP. It
> > > > invalidates the page table in the device upon mmu_notifier callback. But this cannot
> > > > solve the COW problem: if the user process A shares a page P with the device, and A
> > > > forks a new process B, and it continues to write to the page. By COW, the
> > > > process B will keep the page P, while A will get a new page P'. But you have
> > > > no way to let the device know it should use P' rather than P.
> > >
> > > Is this true? I thought mmu_notifiers covered all these cases.
> > >
> > > The mm_notifier for A should fire if B causes the physical address of
> > > A's pages to change via COW.
> > >
> > > And this causes the device page tables to re-synchronize.
> >
> > I don't see such code. The current do_cow_fault() implementation has nothing to
> > do with mmu_notifier.
> >
> > > > In WarpDrive/uacce, we make this simple. If you support IOMMU and it supports
> > > > SVM/SVA, everything will be fine, just like ODP implicit mode. And you don't need
> > > > to write any code for that. Because it has been done by the IOMMU
> > > > framework. If it
> > >
> > > Looks like the IOMMU code uses mmu_notifier, so it is identical to
> > > IB's ODP. The only difference is that IB tends to have the IOMMU page
> > > table in the device, not in the CPU.
> > >
> > > The only case I know of that is different is the new-fangled CAPI
> > > stuff where the IOMMU can directly use the CPU's page table and the
> > > IOMMU page table (in device or CPU) is eliminated.
> >
> > Yes. We are not focusing on the current implementation. As mentioned in the
> > cover letter, we are expecting Jean-Philippe's SVA patches:
> > git://linux-arm.org/linux-jpb.
> >
> > > Anyhow, I don't think a single instance of hardware should justify an
> > > entire new subsystem. Subsystems are hard to make and without multiple
> > > hardware examples there is no way to expect that it would cover any
> > > future use cases.
> >
> > Yes. That's our first expectation. We can keep it with our driver. But because
> > there is no user driver support for any accelerator in the mainline kernel. Even the
> > well-known QuickAssist has to be maintained out of tree. So we try to see if
> > people are interested in working together to solve the problem.
> >
> > > If all your driver needs is to mmap some PCI bar space, route
> > > interrupts and do DMA mapping then mediated VFIO is probably a good
> > > choice.
> >
> > Yes. That is what is done in our RFCv1/v2. But we accepted Jerome's opinion and
> > try not to add complexity to the mm subsystem.
> >
> > > If it needs to do a bunch of other stuff, not rel
Re: [RFC PATCH v2 0/4] Exporting existing crypto API code through zinc
On Tue, Nov 20, 2018 at 05:24:41PM +0100, Jason A. Donenfeld wrote:
> On Tue, Nov 20, 2018 at 3:19 PM Herbert Xu wrote:
> > Yes. In fact it's used for FIPS certification testing.
> > Sure, nobody sane should be doing it. But when it comes to
> > government certification... :)
>
> The kernel does not aim toward any FIPS certification, and we're not
> going to start bloating our designs to fulfill this. It's never been a
> goal. Maybe ask Ted to add a FIPS mode to random.c and see what
> happens... When you start arguing "because FIPS!" as your
> justification, you really hit a head scratcher.

There are crazy people who go for FIPS certification for the kernel.
That's why crypto/drbg.c exists. There is a crazy fips mode in
drivers/char/random.c which causes the kernel to panic with a 1 in 2**80
probability each time _extract_entropy is called. It's not the default,
and I have no idea if anyone uses it, or if it's like the NIST OSI
mandate, which forced everyone to buy an OSI networking stack --- and
then put it on the shelf and use TCP/IP instead.

Press release from March 2018:
https://www.redhat.com/en/about/press-releases/red-hat-completes-fips-140-2-re-certification-red-hat-enterprise-linux-7

- Ted
Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements
Hi Martin,

On Tue, Nov 20, 2018 at 5:29 PM Martin Willi wrote:
> Thanks for the offer, no need at this time. But I certainly would
> welcome it if you could do some (WireGuard) benching with that code to
> see if it works for you.

I certainly will test it in a few different network circumstances,
especially since real testing like this is sometimes more telling than
busy-loop benchmarks.

> > Actually, similarly here, a 10nm Cannon Lake machine should be
> > arriving at my house this week, which should make for some
> > interesting testing ground for non-throttled zmm, if you'd like to
> > play with it.
>
> Maybe in a future iteration, thanks. In fact, it would be interesting
> to know whether Cannon Lake handles that throttling better.

Everything I've read on the Internet seems to indicate that's the case,
so one of the first things I'll be doing is seeing if that's true.
There are also the AVX512 IFMA instructions to play with!

Jason
[PATCH 3/3] crypto: x86/chacha20 - Add a 4-block AVX-512VL variant
This version uses the same principle as the AVX2 version by scheduling
the operations for two block pairs in parallel. It benefits from the
AVX-512VL rotate instructions and the more efficient partial block
handling using "vmovdqu8", resulting in a speedup of the raw block
function of ~20%.

Signed-off-by: Martin Willi
---
 arch/x86/crypto/chacha20-avx512vl-x86_64.S | 272 +
 arch/x86/crypto/chacha20_glue.c            |   7 +
 2 files changed, 279 insertions(+)

diff --git a/arch/x86/crypto/chacha20-avx512vl-x86_64.S b/arch/x86/crypto/chacha20-avx512vl-x86_64.S
index 261097578715..55d34de29e3e 100644
--- a/arch/x86/crypto/chacha20-avx512vl-x86_64.S
+++ b/arch/x86/crypto/chacha20-avx512vl-x86_64.S
@@ -12,6 +12,11 @@
 CTR2BL:	.octa 0x
 	.octa 0x0001
 
+.section .rodata.cst32.CTR4BL, "aM", @progbits, 32
+.align 32
+CTR4BL:	.octa 0x0002
+	.octa 0x0003
+
 .section .rodata.cst32.CTR8BL, "aM", @progbits, 32
 .align 32
 CTR8BL:	.octa 0x000300020001
@@ -185,6 +190,273 @@ ENTRY(chacha20_2block_xor_avx512vl)
 ENDPROC(chacha20_2block_xor_avx512vl)
 
+ENTRY(chacha20_4block_xor_avx512vl)
+	# %rdi: Input state matrix, s
+	# %rsi: up to 4 data blocks output, o
+	# %rdx: up to 4 data blocks input, i
+	# %rcx: input/output length in bytes
+
+	# This function encrypts four ChaCha20 blocks by loading the state
+	# matrix four times across eight AVX registers. It performs matrix
+	# operations on four words in two matrices in parallel, sequentially
+	# to the operations on the four words of the other two matrices. The
+	# required word shuffling has a rather high latency, but we can do the
+	# arithmetic on two matrix-pairs without much slowdown.
+
+	vzeroupper
+
+	# x0..3[0-4] = s0..3
+	vbroadcasti128	0x00(%rdi),%ymm0
+	vbroadcasti128	0x10(%rdi),%ymm1
+	vbroadcasti128	0x20(%rdi),%ymm2
+	vbroadcasti128	0x30(%rdi),%ymm3
+
+	vmovdqa	%ymm0,%ymm4
+	vmovdqa	%ymm1,%ymm5
+	vmovdqa	%ymm2,%ymm6
+	vmovdqa	%ymm3,%ymm7
+
+	vpaddd	CTR2BL(%rip),%ymm3,%ymm3
+	vpaddd	CTR4BL(%rip),%ymm7,%ymm7
+
+	vmovdqa	%ymm0,%ymm11
+	vmovdqa	%ymm1,%ymm12
+	vmovdqa	%ymm2,%ymm13
+	vmovdqa	%ymm3,%ymm14
+	vmovdqa	%ymm7,%ymm15
+
+	mov	$10,%rax
+
+.Ldoubleround4:
+
+	# x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+	vpaddd	%ymm1,%ymm0,%ymm0
+	vpxord	%ymm0,%ymm3,%ymm3
+	vprold	$16,%ymm3,%ymm3
+
+	vpaddd	%ymm5,%ymm4,%ymm4
+	vpxord	%ymm4,%ymm7,%ymm7
+	vprold	$16,%ymm7,%ymm7
+
+	# x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+	vpaddd	%ymm3,%ymm2,%ymm2
+	vpxord	%ymm2,%ymm1,%ymm1
+	vprold	$12,%ymm1,%ymm1
+
+	vpaddd	%ymm7,%ymm6,%ymm6
+	vpxord	%ymm6,%ymm5,%ymm5
+	vprold	$12,%ymm5,%ymm5
+
+	# x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+	vpaddd	%ymm1,%ymm0,%ymm0
+	vpxord	%ymm0,%ymm3,%ymm3
+	vprold	$8,%ymm3,%ymm3
+
+	vpaddd	%ymm5,%ymm4,%ymm4
+	vpxord	%ymm4,%ymm7,%ymm7
+	vprold	$8,%ymm7,%ymm7
+
+	# x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+	vpaddd	%ymm3,%ymm2,%ymm2
+	vpxord	%ymm2,%ymm1,%ymm1
+	vprold	$7,%ymm1,%ymm1
+
+	vpaddd	%ymm7,%ymm6,%ymm6
+	vpxord	%ymm6,%ymm5,%ymm5
+	vprold	$7,%ymm5,%ymm5
+
+	# x1 = shuffle32(x1, MASK(0, 3, 2, 1))
+	vpshufd	$0x39,%ymm1,%ymm1
+	vpshufd	$0x39,%ymm5,%ymm5
+	# x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+	vpshufd	$0x4e,%ymm2,%ymm2
+	vpshufd	$0x4e,%ymm6,%ymm6
+	# x3 = shuffle32(x3, MASK(2, 1, 0, 3))
+	vpshufd	$0x93,%ymm3,%ymm3
+	vpshufd	$0x93,%ymm7,%ymm7
+
+	# x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+	vpaddd	%ymm1,%ymm0,%ymm0
+	vpxord	%ymm0,%ymm3,%ymm3
+	vprold	$16,%ymm3,%ymm3
+
+	vpaddd	%ymm5,%ymm4,%ymm4
+	vpxord	%ymm4,%ymm7,%ymm7
+	vprold	$16,%ymm7,%ymm7
+
+	# x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+	vpaddd	%ymm3,%ymm2,%ymm2
+	vpxord	%ymm2,%ymm1,%ymm1
+	vprold	$12,%ymm1,%ymm1
+
+	vpaddd	%ymm7,%ymm6,%ymm6
+	vpxord	%ymm6,%ymm5,%ymm5
+	vprold	$12,%ymm5,%ymm5
+
+	# x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+	vpaddd	%ymm1,%ymm0,%ymm0
+	vpxord	%ymm0,%ymm3,%ymm3
+	vprold	$8,%ymm3,%ymm3
+
+	vpaddd	%ymm5,%ymm4,%ymm4
[PATCH 0/3] crypto: x86/chacha20 - AVX-512VL block functions
In the quest for pushing the limits of ChaCha20 encryption for both
IPsec and WireGuard, this small series adds AVX-512VL block functions.
The VL variant works on 256-bit ymm registers, but compared to AVX2 can
benefit from the new instructions. Compared to the AVX2 version, these
block functions bring an overall speed improvement across encryption
lengths of ~20%.

Below are the tcrypt results for additional block sizes in kOps/s, for
the current AVX2 code path, the new AVX-512VL code path, and the
comparison to Zinc in AVX2 and AVX-512VL. All numbers are from a Xeon
Platinum 8168 (2.7GHz). These numbers result in a very nice chart,
available at:
https://download.strongswan.org/misc/chacha-avx-512vl.svg

                    zinc  zinc
  len  avx2 512vl  avx2 512vl
    8  5719  5672  5468  5612
   16  5675  5627  5355  5621
   24  5687  5601  5322  5633
   32  5667  5622  5244  5564
   40  5603  5582  5337  5578
   48  5638  5539  5400  5556
   56  5624  5566  5375  5482
   64  5590  5573  5352  5531
   72  4841  5467  3365  3457
   80  5316  5761  3310  3381
   88  4798  5470  3239  3343
   96  5324  5723  3197  3281
  104  4819  5460  3155  3232
  112  5266  5749  3020  3195
  120  4776  5391  2959  3145
  128  5291  5723  3398  3489
  136  4122  4837  3321  3423
  144  4507  5057  3247  3389
  152  4139  4815  3233  3329
  160  4482  5043  3159  3256
  168  4142  4766  3131  3224
  176  4506  5028  3073  3162
  184  4119  4772  3010  3109
  192  4499  5016  3402  3502
  200  4127  4766  3329  3448
  208  4452  5012  3276  3371
  216  4128  4744  3243  3334
  224  4484  5008  3203  3298
  232  4103  4772  3141  3237
  240  4458  4963  3115  3217
  248  4121  4751  3085  3177
  256  4461  4987  3364  4046
  264  3406  4282  3270  4006
  272  3408  4287  3207  3961
  280  3371  4271  3203  3825
  288  3625  4301  3129  3751
  296  3402  4283  3093  3688
  304  3401  4247  3062  3637
  312  3382  4282  2995  3614
  320  3611  4279  3305  4070
  328  3386  4260  3276  3968
  336  3369  4288  3171  3929
  344  3389  4289  3134  3847
  352  3609  4266  3127  3720
  360  3355  4252  3076  3692
  368  3387  4264  3048  3650
  376  3387  4238  2967  3553
  384  3568  4265  3277  4035
  392  3369  4262  3299  3973
  400  3362  4235  3239  3899
  408  3352  4269  3196  3843
  416  3585  4243  3127  3736
  424  3364  4216  3092  3672
  432  3341  4246  3067  3628
  440  3353  4235  3018  3593
  448  3538  4245  3327  4035
  456  3322  4244  3275  3900
  464  3340  4237  3212  3880
  472  3330  4242  3054  3802
  480  3530  4234  3078  3707
  488  3337  4228  3094  3664
  496  3330  4223  3015  3591
  504  3317  4214  3002  3517
  512  3531  4197  3339  4016
  520  2511  3101  2030  2682
  528  2627  3087  2027  2641
  536  2508  3102  2001  2601
  544  2638  3090  1964  2564
  552  2494  3077  1962  2516
  560  2625  3064  1941  2515
  568  2500  3086  1922  2493
  576  2611  3074  2050  2689
  584  2482  3062  2041  2680
  592  2595  3074  2026  2644
  600  2470  3060  1985  2595
  608  2581  3039  1961  2555
  616  2478  3062  1956  2521
  624  2587  3066  1930  2493
  632  2457  3053  1923  2486
  640  2581  3050  2059  2712
  648  2296  2839  2024  2655
  656  2389  2845  2019  2642
  664  2292  2842  2002  2610
  672  2404  2838  1959  2537
  680  2273  2827  1956  2527
  688  2389  2840  1938  2510
  696  2280  2837  1911  2463
  704  2370  2819  2055  2702
  712  2277  2834  2029  2663
  720  2369  2829  2020  2625
  728  2255  2820  2001  2600
  736  2373  2819  1958  2543
  744  2269  2827  1956  2524
  752  2364  2817  1937  2492
  760  2270  2805  1909  2483
  768  2378  2820  2050  2696
  776  2053  2700  2002  2643
  784  2066  2693  1922  2640
  792  2065  2703  1928  2602
  800  2138  2706  1962  2535
  808  2065  2679  1938  2528
  816  2063  2699  1929  2500
  824  2053  2676  1915  2468
  832  2149  2692  2036  2693
  840  2055  2689  2024  2659
  848  2049  2689  2006  2610
  856  2057  2702  1979  2585
  864  2144  2703  1960  2547
  872  2047  2685  1945  2501
  880  2055  2683  1902  2497
  888  2060  2689  1897  2478
  896  2139  2693  2023  2663
  904  2049  2686  1970  2644
  912  2055  2688  1925  2621
  920  2047  2685  1911  2572
  928  2114  2695  1907  2545
  936  2055  2681  1927  2492
  944  2055  2693  1930  2478
  952  2042  2688  1909  2471
  960  2136  2682  2014  2672
  968  2054  2687  1999  2626
  976  2040  2682  1982  2598
  984  2055  2687  1943  2569
  992  2138  2694  1884  2522
 1000  2036  2681  1929  2506
 1008  2052  2676  1926  2475
 1016  2050  2686  1889  2430
 1024  2125  2670  2039  2656
[PATCH 2/3] crypto: x86/chacha20 - Add a 2-block AVX-512VL variant
This version uses the same principle as the AVX2 version. It benefits
from the AVX-512VL rotate instructions and the more efficient partial
block handling using "vmovdqu8", resulting in a speedup of ~20%.

Unlike the AVX2 version, it is faster than the single block SSSE3
version to process a single block. Hence we engage that function for
(partial) single block lengths as well.

Signed-off-by: Martin Willi
---
 arch/x86/crypto/chacha20-avx512vl-x86_64.S | 171 +
 arch/x86/crypto/chacha20_glue.c            |   7 +
 2 files changed, 178 insertions(+)

diff --git a/arch/x86/crypto/chacha20-avx512vl-x86_64.S b/arch/x86/crypto/chacha20-avx512vl-x86_64.S
index e1877afcaa73..261097578715 100644
--- a/arch/x86/crypto/chacha20-avx512vl-x86_64.S
+++ b/arch/x86/crypto/chacha20-avx512vl-x86_64.S
@@ -7,6 +7,11 @@
 #include
 
+.section .rodata.cst32.CTR2BL, "aM", @progbits, 32
+.align 32
+CTR2BL:	.octa 0x
+	.octa 0x0001
+
 .section .rodata.cst32.CTR8BL, "aM", @progbits, 32
 .align 32
 CTR8BL:	.octa 0x000300020001
@@ -14,6 +19,172 @@ CTR8BL:	.octa 0x000300020001
 
 .text
 
+ENTRY(chacha20_2block_xor_avx512vl)
+	# %rdi: Input state matrix, s
+	# %rsi: up to 2 data blocks output, o
+	# %rdx: up to 2 data blocks input, i
+	# %rcx: input/output length in bytes
+
+	# This function encrypts two ChaCha20 blocks by loading the state
+	# matrix twice across four AVX registers. It performs matrix operations
+	# on four words in each matrix in parallel, but requires shuffling to
+	# rearrange the words after each round.
+
+	vzeroupper
+
+	# x0..3[0-2] = s0..3
+	vbroadcasti128	0x00(%rdi),%ymm0
+	vbroadcasti128	0x10(%rdi),%ymm1
+	vbroadcasti128	0x20(%rdi),%ymm2
+	vbroadcasti128	0x30(%rdi),%ymm3
+
+	vpaddd	CTR2BL(%rip),%ymm3,%ymm3
+
+	vmovdqa	%ymm0,%ymm8
+	vmovdqa	%ymm1,%ymm9
+	vmovdqa	%ymm2,%ymm10
+	vmovdqa	%ymm3,%ymm11
+
+	mov	$10,%rax
+
+.Ldoubleround:
+
+	# x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+	vpaddd	%ymm1,%ymm0,%ymm0
+	vpxord	%ymm0,%ymm3,%ymm3
+	vprold	$16,%ymm3,%ymm3
+
+	# x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+	vpaddd	%ymm3,%ymm2,%ymm2
+	vpxord	%ymm2,%ymm1,%ymm1
+	vprold	$12,%ymm1,%ymm1
+
+	# x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+	vpaddd	%ymm1,%ymm0,%ymm0
+	vpxord	%ymm0,%ymm3,%ymm3
+	vprold	$8,%ymm3,%ymm3
+
+	# x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+	vpaddd	%ymm3,%ymm2,%ymm2
+	vpxord	%ymm2,%ymm1,%ymm1
+	vprold	$7,%ymm1,%ymm1
+
+	# x1 = shuffle32(x1, MASK(0, 3, 2, 1))
+	vpshufd	$0x39,%ymm1,%ymm1
+	# x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+	vpshufd	$0x4e,%ymm2,%ymm2
+	# x3 = shuffle32(x3, MASK(2, 1, 0, 3))
+	vpshufd	$0x93,%ymm3,%ymm3
+
+	# x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+	vpaddd	%ymm1,%ymm0,%ymm0
+	vpxord	%ymm0,%ymm3,%ymm3
+	vprold	$16,%ymm3,%ymm3
+
+	# x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+	vpaddd	%ymm3,%ymm2,%ymm2
+	vpxord	%ymm2,%ymm1,%ymm1
+	vprold	$12,%ymm1,%ymm1
+
+	# x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+	vpaddd	%ymm1,%ymm0,%ymm0
+	vpxord	%ymm0,%ymm3,%ymm3
+	vprold	$8,%ymm3,%ymm3
+
+	# x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+	vpaddd	%ymm3,%ymm2,%ymm2
+	vpxord	%ymm2,%ymm1,%ymm1
+	vprold	$7,%ymm1,%ymm1
+
+	# x1 = shuffle32(x1, MASK(2, 1, 0, 3))
+	vpshufd	$0x93,%ymm1,%ymm1
+	# x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+	vpshufd	$0x4e,%ymm2,%ymm2
+	# x3 = shuffle32(x3, MASK(0, 3, 2, 1))
+	vpshufd	$0x39,%ymm3,%ymm3
+
+	dec	%rax
+	jnz	.Ldoubleround
+
+	# o0 = i0 ^ (x0 + s0)
+	vpaddd	%ymm8,%ymm0,%ymm7
+	cmp	$0x10,%rcx
+	jl	.Lxorpart2
+	vpxord	0x00(%rdx),%xmm7,%xmm6
+	vmovdqu	%xmm6,0x00(%rsi)
+	vextracti128	$1,%ymm7,%xmm0
+	# o1 = i1 ^ (x1 + s1)
+	vpaddd	%ymm9,%ymm1,%ymm7
+	cmp	$0x20,%rcx
+	jl	.Lxorpart2
+	vpxord	0x10(%rdx),%xmm7,%xmm6
+	vmovdqu	%xmm6,0x10(%rsi)
+	vextracti128	$1,%ymm7,%xmm1
+	# o2 = i2 ^ (x2 + s2)
+	vpaddd	%ymm10,%ymm2,%ymm7
+	cmp	$0x30,%rcx
+	jl	.Lxorpart2
+	vpxord	0x20(%rdx),%xmm7,%xmm6
+	vmovdqu	%xmm6,0x20(%rsi)
+	vextracti128	$1,%ymm7,%xmm2
+	# o3 = i3 ^ (x3 +
[PATCH 1/3] crypto: x86/chacha20 - Add a 8-block AVX-512VL variant
This variant is similar to the AVX2 version, but benefits from the
AVX-512 rotate instructions and the additional registers, so it can
operate without any data on the stack. It uses ymm registers only to
avoid the massive core throttling on Skylake-X platforms. Nonetheless
it brings a ~30% speed improvement compared to the AVX2 variant for
random encryption lengths.

The AVX2 version uses "rep movsb" for partial block XORing via the
stack. With AVX-512, the new "vmovdqu8" can do this much more
efficiently. The associated "kmov" instructions to work with dynamic
masks are not part of the AVX-512VL instruction set, hence we depend on
AVX-512BW as well. Given that the major AVX-512VL architectures provide
AVX-512BW and this extension does not affect core clocking, this seems
to be no problem at least for now.

Signed-off-by: Martin Willi
---
 arch/x86/crypto/Makefile                   |   5 +
 arch/x86/crypto/chacha20-avx512vl-x86_64.S | 396 +
 arch/x86/crypto/chacha20_glue.c            |  26 ++
 3 files changed, 427 insertions(+)
 create mode 100644 arch/x86/crypto/chacha20-avx512vl-x86_64.S

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index a4b0007a54e1..ce4e43642984 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -8,6 +8,7 @@ OBJECT_FILES_NON_STANDARD := y
 avx_supported := $(call as-instr,vpxor %xmm0$(comma)%xmm0$(comma)%xmm0,yes,no)
 avx2_supported := $(call as-instr,vpgatherdd %ymm0$(comma)(%eax$(comma)%ymm1\
 			$(comma)4)$(comma)%ymm2,yes,no)
+avx512_supported :=$(call as-instr,vpmovm2b %k1$(comma)%zmm5,yes,no)
 sha1_ni_supported :=$(call as-instr,sha1msg1 %xmm0$(comma)%xmm1,yes,no)
 sha256_ni_supported :=$(call as-instr,sha256msg1 %xmm0$(comma)%xmm1,yes,no)
@@ -103,6 +104,10 @@ ifeq ($(avx2_supported),yes)
 	morus1280-avx2-y := morus1280-avx2-asm.o morus1280-avx2-glue.o
 endif
 
+ifeq ($(avx512_supported),yes)
+	chacha20-x86_64-y += chacha20-avx512vl-x86_64.o
+endif
+
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
diff --git a/arch/x86/crypto/chacha20-avx512vl-x86_64.S b/arch/x86/crypto/chacha20-avx512vl-x86_64.S
new file mode 100644
index ..e1877afcaa73
--- /dev/null
+++ b/arch/x86/crypto/chacha20-avx512vl-x86_64.S
@@ -0,0 +1,396 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539, x64 AVX-512VL functions
+ *
+ * Copyright (C) 2018 Martin Willi
+ */
+
+#include
+
+.section .rodata.cst32.CTR8BL, "aM", @progbits, 32
+.align 32
+CTR8BL:	.octa 0x000300020001
+	.octa 0x0007000600050004
+
+.text
+
+ENTRY(chacha20_8block_xor_avx512vl)
+	# %rdi: Input state matrix, s
+	# %rsi: up to 8 data blocks output, o
+	# %rdx: up to 8 data blocks input, i
+	# %rcx: input/output length in bytes
+
+	# This function encrypts eight consecutive ChaCha20 blocks by loading
+	# the state matrix in AVX registers eight times. Compared to AVX2, this
+	# mostly benefits from the new rotate instructions in VL and the
+	# additional registers.
+ + vzeroupper + + # x0..15[0-7] = s[0..15] + vpbroadcastd 0x00(%rdi),%ymm0 + vpbroadcastd 0x04(%rdi),%ymm1 + vpbroadcastd 0x08(%rdi),%ymm2 + vpbroadcastd 0x0c(%rdi),%ymm3 + vpbroadcastd 0x10(%rdi),%ymm4 + vpbroadcastd 0x14(%rdi),%ymm5 + vpbroadcastd 0x18(%rdi),%ymm6 + vpbroadcastd 0x1c(%rdi),%ymm7 + vpbroadcastd 0x20(%rdi),%ymm8 + vpbroadcastd 0x24(%rdi),%ymm9 + vpbroadcastd 0x28(%rdi),%ymm10 + vpbroadcastd 0x2c(%rdi),%ymm11 + vpbroadcastd 0x30(%rdi),%ymm12 + vpbroadcastd 0x34(%rdi),%ymm13 + vpbroadcastd 0x38(%rdi),%ymm14 + vpbroadcastd 0x3c(%rdi),%ymm15 + + # x12 += counter values 0-3 + vpaddd CTR8BL(%rip),%ymm12,%ymm12 + + vmovdqa64 %ymm0,%ymm16 + vmovdqa64 %ymm1,%ymm17 + vmovdqa64 %ymm2,%ymm18 + vmovdqa64 %ymm3,%ymm19 + vmovdqa64 %ymm4,%ymm20 + vmovdqa64 %ymm5,%ymm21 + vmovdqa64 %ymm6,%ymm22 + vmovdqa64 %ymm7,%ymm23 + vmovdqa64 %ymm8,%ymm24 + vmovdqa64 %ymm9,%ymm25 + vmovdqa64 %ymm10,%ymm26 + vmovdqa64 %ymm11,%ymm27 + vmovdqa64 %ymm12,%ymm28 + vmovdqa64 %ymm13,%ymm29 + vmovdqa64 %ymm14,%ymm30 + vmovdqa64 %ymm15,%ymm31 + + mov $10,%eax + +.Ldoubleround8: + # x0 += x4, x12 = rotl32(x12 ^ x0, 16) + vpaddd %ymm0,%ymm4,%ymm0 + vpxord %ymm0,%ymm12,%ymm12 + vprold $16,%ymm12,%ymm12 + # x1 += x5, x13 = rotl32(x13 ^ x1, 16) +
Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements
Hi Jason, > [...] I have a massive Xeon Gold 5120 machine that I can give you > access to if you'd like to do some testing and benching. Thanks for the offer, no need at this time. But I would certainly welcome it if you could do some (WireGuard) benching with that code to see if it works for you. > Actually, similarly here, a 10nm Cannon Lake machine should be > arriving at my house this week, which should make for some > interesting testing ground for non-throttled zmm, if you'd like to > play with it. Maybe in a future iteration, thanks. In fact, it would be interesting to know whether Cannon Lake handles that throttling better. Regards Martin
Re: [RFC PATCH v2 0/4] Exporting existing crypto API code through zinc
On Tue, Nov 20, 2018 at 3:19 PM Herbert Xu wrote: > Yes. In fact it's used for FIPS certification testing. > Sure, nobody sane should be doing it. But when it comes to > government certification... :) The kernel does not aim toward any FIPS certification, and we're not going to start bloating our designs to fulfill this. It's never been a goal. Maybe ask Ted to add a FIPS mode to random.c and see what happens... When you start arguing "because FIPS!" as your justification, you really hit a head scratcher. > They've already paid for the indirect > function call so why make them go through yet another run-time > branch? The indirect function call in the crypto API is the performance hit. The branch in Zinc is not, as the predictor does the correct thing every single time. I'm not able to distinguish between the two looking at the performance measurements between it being there and the branch being commented out. Give me a few more days to finish v9's latest required changes for chacha12, and then I'll submit a revision that I think should address the remaining technical objections raised over the last several months we've been discussing this. Regards, Jason
Re: [RFC PATCH v2 0/4] Exporting existing crypto API code through zinc
Hi Herbert, On Tue, Nov 20, 2018 at 7:02 AM Herbert Xu wrote: > Here is an updated version demonstrating how we could access the > accelerated versions of chacha20. It also includes a final patch > to deposit the new zinc version of x86-64 chacha20 into > arch/x86/crypto where it can be used by both the crypto API as well > as zinc. N'ack. As I mentioned in the last email, this really isn't okay, and mostly defeats the purpose of Zinc in the first place, adding complexity in a discombobulated half-baked way. Your proposal seems to be the worst of all worlds. Having talked to lots of people about the design of Zinc at this point to very positive reception, I'm going to move forward with submitting v9, once I rebase with the required changes for Eric's latest merge. Jason
Re: [RFC PATCH v2 0/4] Exporting existing crypto API code through zinc
Hi Ard: On Tue, Nov 20, 2018 at 11:32:05AM +0100, Ard Biesheuvel wrote: > > > 1. The crypto API algorithms remain individually accessible, this > > is crucial as these algorithm names are exported to user-space so > > changing the names to foo-zinc is not going to work. > > Are you saying user space may use names like "ctr-aes-neon" directly > rather than "ctr(aes)" for any implementation of the mode? Yes. In fact it's used for FIPS certification testing. > If so, that is highly unfortunate, since it means we'd be breaking > user space by wrapping a crypto library function with its own arch > specific specializations into a generic crypto API wrapper. You can never break things by introducing new algorithms. The problem is only when you remove existing ones. > Note that I think that using AF_ALG to access software crypto routines > (as opposed to accelerators) is rather pointless to begin with, but if > it permits that today than we're stuck with it. Sure, nobody sane should be doing it. But when it comes to government certification... :) > > 2. The arch-specific algorithm code lives in their own module rather > > than in a monolithic module. > > This is one of the remaining issues I have with Zinc. However, modulo > the above concerns, I think it does make sense to have a separate > function API for synchronous software routines below the current > crypto API. What I don't want is a separate Zinc kingdom with a > different name, different maintainers and going through a different > tree, and with its own approach to test vectors, arch specific code, > etc etc Even without the issue of replacing chacha20-generic with chacha20-zinc which breaks point 1 above, we simply don't want or need to go through zinc's run-time implementation choice for the crypto API algorithms. They've already paid for the indirect function call so why make them go through yet another run-time branch? 
If the optics of having the code in crypto is the issue, we could move the actual algorithm code into a third location, perhaps arch/x86/crypto/lib or arch/x86/lib/crypto. Then both zinc and the crypto API can sit on top of that. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH v3 01/10] crypto: crypto_user_stat: made crypto_user_stat optional
On Tue, Nov 20, 2018 at 12:32:17PM +, Corentin Labbe wrote: > Even if CRYPTO_STATS is set to n, some part of CRYPTO_STATS are > compiled. > This patch made all part of crypto_user_stat uncompiled in that case. > > Signed-off-by: Corentin Labbe > --- > crypto/Makefile | 3 ++- > crypto/algapi.c | 2 ++ > include/crypto/internal/cryptouser.h | 17 + > include/linux/crypto.h | 2 ++ > 4 files changed, 23 insertions(+), 1 deletion(-) > > diff --git a/crypto/Makefile b/crypto/Makefile > index abbd86fdbad2..22f9f84f961d 100644 > --- a/crypto/Makefile > +++ b/crypto/Makefile > @@ -54,7 +54,8 @@ cryptomgr-y := algboss.o testmgr.o > > obj-$(CONFIG_CRYPTO_MANAGER2) += cryptomgr.o > obj-$(CONFIG_CRYPTO_USER) += crypto_user.o > -crypto_user-y := crypto_user_base.o crypto_user_stat.o > +crypto_user-y := crypto_user_base.o > +crypto_user-$(CONFIG_CRYPTO_STATS) += crypto_user_stat.o > obj-$(CONFIG_CRYPTO_CMAC) += cmac.o > obj-$(CONFIG_CRYPTO_HMAC) += hmac.o > obj-$(CONFIG_CRYPTO_VMAC) += vmac.o > diff --git a/crypto/algapi.c b/crypto/algapi.c > index 2545c5f89c4c..f5396c88e8cd 100644 > --- a/crypto/algapi.c > +++ b/crypto/algapi.c > @@ -258,6 +258,7 @@ static struct crypto_larval *__crypto_register_alg(struct > crypto_alg *alg) > list_add(&alg->cra_list, &crypto_alg_list); > list_add(&larval->alg.cra_list, &crypto_alg_list); > > +#ifdef CONFIG_CRYPTO_STATS > atomic_set(&alg->encrypt_cnt, 0); > atomic_set(&alg->decrypt_cnt, 0); > atomic64_set(&alg->encrypt_tlen, 0); > @@ -265,6 +266,7 @@ static struct crypto_larval *__crypto_register_alg(struct > crypto_alg *alg) > atomic_set(&alg->verify_cnt, 0); > atomic_set(&alg->cipher_err_cnt, 0); > atomic_set(&alg->sign_cnt, 0); > +#endif > If you created a helper function in crypto_user_stat.c to initialize all the stats, you could ifdef it in the cryptouser.h header if unconfigured, and avoid the ifdef leakage to this file above.
Neil > out: > return larval; > diff --git a/include/crypto/internal/cryptouser.h > b/include/crypto/internal/cryptouser.h > index 8db299c25566..3492ab42eefb 100644 > --- a/include/crypto/internal/cryptouser.h > +++ b/include/crypto/internal/cryptouser.h > @@ -3,6 +3,23 @@ > > struct crypto_alg *crypto_alg_match(struct crypto_user_alg *p, int exact); > > +#ifdef CONFIG_CRYPTO_STATS > int crypto_dump_reportstat(struct sk_buff *skb, struct netlink_callback *cb); > int crypto_reportstat(struct sk_buff *in_skb, struct nlmsghdr *in_nlh, > struct nlattr **attrs); > int crypto_dump_reportstat_done(struct netlink_callback *cb); > +#else > +static int crypto_dump_reportstat(struct sk_buff *skb, struct > netlink_callback *cb) > +{ > + return -ENOTSUPP; > +} > + > +static int crypto_reportstat(struct sk_buff *in_skb, struct nlmsghdr > *in_nlh, struct nlattr **attrs) > +{ > + return -ENOTSUPP; > +} > + > +static int crypto_dump_reportstat_done(struct netlink_callback *cb) > +{ > + return -ENOTSUPP; > +} > +#endif > diff --git a/include/linux/crypto.h b/include/linux/crypto.h > index 3634ad6fe202..3e05053b8d57 100644 > --- a/include/linux/crypto.h > +++ b/include/linux/crypto.h > @@ -515,6 +515,7 @@ struct crypto_alg { > > struct module *cra_module; > > +#ifdef CONFIG_CRYPTO_STATS > union { > atomic_t encrypt_cnt; > atomic_t compress_cnt; > @@ -552,6 +553,7 @@ struct crypto_alg { > atomic_t compute_shared_secret_cnt; > }; > atomic_t sign_cnt; > +#endif /* CONFIG_CRYPTO_STATS */ > > } CRYPTO_MINALIGN_ATTR; > > -- > 2.18.1 > >
[PATCH v3 03/10] crypto: crypto_user_stat: convert all stats from u32 to u64
All the 32-bit fields need to be 64-bit. In some cases, UINT32_MAX crypto operations can be done in seconds. Reported-by: Eric Biggers Signed-off-by: Corentin Labbe --- crypto/algapi.c | 10 +-- crypto/crypto_user_stat.c | 114 +++- include/crypto/acompress.h | 8 +-- include/crypto/aead.h | 8 +-- include/crypto/akcipher.h | 16 ++--- include/crypto/hash.h | 6 +- include/crypto/kpp.h| 12 ++-- include/crypto/rng.h| 8 +-- include/crypto/skcipher.h | 8 +-- include/linux/crypto.h | 46 ++--- include/uapi/linux/cryptouser.h | 38 +-- 11 files changed, 133 insertions(+), 141 deletions(-) diff --git a/crypto/algapi.c b/crypto/algapi.c index f5396c88e8cd..42fe316f80ee 100644 --- a/crypto/algapi.c +++ b/crypto/algapi.c @@ -259,13 +259,13 @@ static struct crypto_larval *__crypto_register_alg(struct crypto_alg *alg) list_add(&larval->alg.cra_list, &crypto_alg_list); #ifdef CONFIG_CRYPTO_STATS - atomic_set(&alg->encrypt_cnt, 0); - atomic_set(&alg->decrypt_cnt, 0); + atomic64_set(&alg->encrypt_cnt, 0); + atomic64_set(&alg->decrypt_cnt, 0); atomic64_set(&alg->encrypt_tlen, 0); atomic64_set(&alg->decrypt_tlen, 0); - atomic_set(&alg->verify_cnt, 0); - atomic_set(&alg->cipher_err_cnt, 0); - atomic_set(&alg->sign_cnt, 0); + atomic64_set(&alg->verify_cnt, 0); + atomic64_set(&alg->cipher_err_cnt, 0); + atomic64_set(&alg->sign_cnt, 0); #endif out: diff --git a/crypto/crypto_user_stat.c b/crypto/crypto_user_stat.c index a6fb2e6f618d..352569f378a0 100644 --- a/crypto/crypto_user_stat.c +++ b/crypto/crypto_user_stat.c @@ -35,22 +35,21 @@ static int crypto_report_aead(struct sk_buff *skb, struct crypto_alg *alg) { struct crypto_stat raead; u64 v64; - u32 v32; memset(&raead, 0, sizeof(raead)); strscpy(raead.type, "aead", sizeof(raead.type)); - v32 = atomic_read(&alg->encrypt_cnt); - raead.stat_encrypt_cnt = v32; + v64 = atomic64_read(&alg->encrypt_cnt); + raead.stat_encrypt_cnt = v64; v64 = atomic64_read(&alg->encrypt_tlen); raead.stat_encrypt_tlen = v64; - v32 = atomic_read(&alg->decrypt_cnt); - 
raead.stat_decrypt_cnt = v32; + v64 = atomic64_read(&alg->decrypt_cnt); + raead.stat_decrypt_cnt = v64; v64 = atomic64_read(&alg->decrypt_tlen); raead.stat_decrypt_tlen = v64; - v32 = atomic_read(&alg->aead_err_cnt); - raead.stat_aead_err_cnt = v32; + v64 = atomic64_read(&alg->aead_err_cnt); + raead.stat_aead_err_cnt = v64; return nla_put(skb, CRYPTOCFGA_STAT_AEAD, sizeof(raead), &raead); } @@ -59,22 +58,21 @@ static int crypto_report_cipher(struct sk_buff *skb, struct crypto_alg *alg) { struct crypto_stat rcipher; u64 v64; - u32 v32; memset(&rcipher, 0, sizeof(rcipher)); strscpy(rcipher.type, "cipher", sizeof(rcipher.type)); - v32 = atomic_read(&alg->encrypt_cnt); - rcipher.stat_encrypt_cnt = v32; + v64 = atomic64_read(&alg->encrypt_cnt); + rcipher.stat_encrypt_cnt = v64; v64 = atomic64_read(&alg->encrypt_tlen); rcipher.stat_encrypt_tlen = v64; - v32 = atomic_read(&alg->decrypt_cnt); - rcipher.stat_decrypt_cnt = v32; + v64 = atomic64_read(&alg->decrypt_cnt); + rcipher.stat_decrypt_cnt = v64; v64 = atomic64_read(&alg->decrypt_tlen); rcipher.stat_decrypt_tlen = v64; - v32 = atomic_read(&alg->cipher_err_cnt); - rcipher.stat_cipher_err_cnt = v32; + v64 = atomic64_read(&alg->cipher_err_cnt); + rcipher.stat_cipher_err_cnt = v64; return nla_put(skb, CRYPTOCFGA_STAT_CIPHER, sizeof(rcipher), &rcipher); } @@ -83,21 +81,20 @@ static int crypto_report_comp(struct sk_buff *skb, struct crypto_alg *alg) { struct crypto_stat rcomp; u64 v64; - u32 v32; memset(&rcomp, 0, sizeof(rcomp)); strscpy(rcomp.type, "compression", sizeof(rcomp.type)); - v32 = atomic_read(&alg->compress_cnt); - rcomp.stat_compress_cnt = v32; + v64 = atomic64_read(&alg->compress_cnt); + rcomp.stat_compress_cnt = v64; v64 = atomic64_read(&alg->compress_tlen); rcomp.stat_compress_tlen = v64; - v32 = atomic_read(&alg->decompress_cnt); - rcomp.stat_decompress_cnt = v32; + v64 = atomic64_read(&alg->decompress_cnt); + rcomp.stat_decompress_cnt = v64; v64 = atomic64_read(&alg->decompress_tlen); 
rcomp.stat_decompress_tlen = v64; - v32 = atomic_read(&alg->cipher_err_cnt); - rcomp.stat_compress_err_cnt = v32; + v64 = atomic64_read(&alg->cipher_err_cnt); + rcomp.stat_compress_err_cnt = v64; return nla_put(skb, CRYPTOCFGA_STAT_COMPRESS, sizeof(rcomp), &rcomp); } @@ -106,21 +1
[PATCH v3 02/10] crypto: CRYPTO_STATS should depend on CRYPTO_USER
CRYPTO_STATS uses CRYPTO_USER infrastructure, so it should depend on it. Signed-off-by: Corentin Labbe --- crypto/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/crypto/Kconfig b/crypto/Kconfig index 62dbd1a99fa3..a2f1b4a86b92 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -1829,6 +1829,7 @@ config CRYPTO_USER_API_AEAD config CRYPTO_STATS bool "Crypto usage statistics for User-space" + depends on CRYPTO_USER help This option enables the gathering of crypto stats. This will collect: -- 2.18.1
[PATCH v3 01/10] crypto: crypto_user_stat: made crypto_user_stat optional
Even if CRYPTO_STATS is set to n, some parts of crypto_user_stat are still compiled. This patch ensures that no part of crypto_user_stat is compiled in that case. Signed-off-by: Corentin Labbe --- crypto/Makefile | 3 ++- crypto/algapi.c | 2 ++ include/crypto/internal/cryptouser.h | 17 + include/linux/crypto.h | 2 ++ 4 files changed, 23 insertions(+), 1 deletion(-) diff --git a/crypto/Makefile b/crypto/Makefile index abbd86fdbad2..22f9f84f961d 100644 --- a/crypto/Makefile +++ b/crypto/Makefile @@ -54,7 +54,8 @@ cryptomgr-y := algboss.o testmgr.o obj-$(CONFIG_CRYPTO_MANAGER2) += cryptomgr.o obj-$(CONFIG_CRYPTO_USER) += crypto_user.o -crypto_user-y := crypto_user_base.o crypto_user_stat.o +crypto_user-y := crypto_user_base.o +crypto_user-$(CONFIG_CRYPTO_STATS) += crypto_user_stat.o obj-$(CONFIG_CRYPTO_CMAC) += cmac.o obj-$(CONFIG_CRYPTO_HMAC) += hmac.o obj-$(CONFIG_CRYPTO_VMAC) += vmac.o diff --git a/crypto/algapi.c b/crypto/algapi.c index 2545c5f89c4c..f5396c88e8cd 100644 --- a/crypto/algapi.c +++ b/crypto/algapi.c @@ -258,6 +258,7 @@ static struct crypto_larval *__crypto_register_alg(struct crypto_alg *alg) list_add(&alg->cra_list, &crypto_alg_list); list_add(&larval->alg.cra_list, &crypto_alg_list); +#ifdef CONFIG_CRYPTO_STATS atomic_set(&alg->encrypt_cnt, 0); atomic_set(&alg->decrypt_cnt, 0); atomic64_set(&alg->encrypt_tlen, 0); @@ -265,6 +266,7 @@ static struct crypto_larval *__crypto_register_alg(struct crypto_alg *alg) atomic_set(&alg->verify_cnt, 0); atomic_set(&alg->cipher_err_cnt, 0); atomic_set(&alg->sign_cnt, 0); +#endif out: return larval; diff --git a/include/crypto/internal/cryptouser.h b/include/crypto/internal/cryptouser.h index 8db299c25566..3492ab42eefb 100644 --- a/include/crypto/internal/cryptouser.h +++ b/include/crypto/internal/cryptouser.h @@ -3,6 +3,23 @@ struct crypto_alg *crypto_alg_match(struct crypto_user_alg *p, int exact); +#ifdef CONFIG_CRYPTO_STATS int crypto_dump_reportstat(struct sk_buff *skb, struct netlink_callback *cb); int crypto_reportstat(struct
sk_buff *in_skb, struct nlmsghdr *in_nlh, struct nlattr **attrs); int crypto_dump_reportstat_done(struct netlink_callback *cb); +#else +static int crypto_dump_reportstat(struct sk_buff *skb, struct netlink_callback *cb) +{ + return -ENOTSUPP; +} + +static int crypto_reportstat(struct sk_buff *in_skb, struct nlmsghdr *in_nlh, struct nlattr **attrs) +{ + return -ENOTSUPP; +} + +static int crypto_dump_reportstat_done(struct netlink_callback *cb) +{ + return -ENOTSUPP; +} +#endif diff --git a/include/linux/crypto.h b/include/linux/crypto.h index 3634ad6fe202..3e05053b8d57 100644 --- a/include/linux/crypto.h +++ b/include/linux/crypto.h @@ -515,6 +515,7 @@ struct crypto_alg { struct module *cra_module; +#ifdef CONFIG_CRYPTO_STATS union { atomic_t encrypt_cnt; atomic_t compress_cnt; @@ -552,6 +553,7 @@ struct crypto_alg { atomic_t compute_shared_secret_cnt; }; atomic_t sign_cnt; +#endif /* CONFIG_CRYPTO_STATS */ } CRYPTO_MINALIGN_ATTR; -- 2.18.1
[PATCH v3 04/10] crypto: crypto_user_stat: split user space crypto stat structures
It is cleaner to have each stat in its own structure. Signed-off-by: Corentin Labbe --- crypto/crypto_user_stat.c | 20 +++ include/uapi/linux/cryptouser.h | 100 2 files changed, 72 insertions(+), 48 deletions(-) diff --git a/crypto/crypto_user_stat.c b/crypto/crypto_user_stat.c index 352569f378a0..3c14be2f7a1b 100644 --- a/crypto/crypto_user_stat.c +++ b/crypto/crypto_user_stat.c @@ -33,7 +33,7 @@ struct crypto_dump_info { static int crypto_report_aead(struct sk_buff *skb, struct crypto_alg *alg) { - struct crypto_stat raead; + struct crypto_stat_aead raead; u64 v64; memset(&raead, 0, sizeof(raead)); @@ -56,7 +56,7 @@ static int crypto_report_aead(struct sk_buff *skb, struct crypto_alg *alg) static int crypto_report_cipher(struct sk_buff *skb, struct crypto_alg *alg) { - struct crypto_stat rcipher; + struct crypto_stat_cipher rcipher; u64 v64; memset(&rcipher, 0, sizeof(rcipher)); @@ -79,7 +79,7 @@ static int crypto_report_cipher(struct sk_buff *skb, struct crypto_alg *alg) static int crypto_report_comp(struct sk_buff *skb, struct crypto_alg *alg) { - struct crypto_stat rcomp; + struct crypto_stat_compress rcomp; u64 v64; memset(&rcomp, 0, sizeof(rcomp)); @@ -101,7 +101,7 @@ static int crypto_report_comp(struct sk_buff *skb, struct crypto_alg *alg) static int crypto_report_acomp(struct sk_buff *skb, struct crypto_alg *alg) { - struct crypto_stat racomp; + struct crypto_stat_compress racomp; u64 v64; memset(&racomp, 0, sizeof(racomp)); @@ -123,7 +123,7 @@ static int crypto_report_acomp(struct sk_buff *skb, struct crypto_alg *alg) static int crypto_report_akcipher(struct sk_buff *skb, struct crypto_alg *alg) { - struct crypto_stat rakcipher; + struct crypto_stat_akcipher rakcipher; u64 v64; memset(&rakcipher, 0, sizeof(rakcipher)); @@ -150,7 +150,7 @@ static int crypto_report_akcipher(struct sk_buff *skb, struct crypto_alg *alg) static int crypto_report_kpp(struct sk_buff *skb, struct crypto_alg *alg) { - struct crypto_stat rkpp; + struct crypto_stat_kpp rkpp; u64
v; memset(&rkpp, 0, sizeof(rkpp)); @@ -171,7 +171,7 @@ static int crypto_report_kpp(struct sk_buff *skb, struct crypto_alg *alg) static int crypto_report_ahash(struct sk_buff *skb, struct crypto_alg *alg) { - struct crypto_stat rhash; + struct crypto_stat_hash rhash; u64 v64; memset(&rhash, 0, sizeof(rhash)); @@ -190,7 +190,7 @@ static int crypto_report_ahash(struct sk_buff *skb, struct crypto_alg *alg) static int crypto_report_shash(struct sk_buff *skb, struct crypto_alg *alg) { - struct crypto_stat rhash; + struct crypto_stat_hash rhash; u64 v64; memset(&rhash, 0, sizeof(rhash)); @@ -209,7 +209,7 @@ static int crypto_report_shash(struct sk_buff *skb, struct crypto_alg *alg) static int crypto_report_rng(struct sk_buff *skb, struct crypto_alg *alg) { - struct crypto_stat rrng; + struct crypto_stat_rng rrng; u64 v64; memset(&rrng, 0, sizeof(rrng)); @@ -248,7 +248,7 @@ static int crypto_reportstat_one(struct crypto_alg *alg, if (nla_put_u32(skb, CRYPTOCFGA_PRIORITY_VAL, alg->cra_priority)) goto nla_put_failure; if (alg->cra_flags & CRYPTO_ALG_LARVAL) { - struct crypto_stat rl; + struct crypto_stat_larval rl; memset(&rl, 0, sizeof(rl)); strscpy(rl.type, "larval", sizeof(rl.type)); diff --git a/include/uapi/linux/cryptouser.h b/include/uapi/linux/cryptouser.h index 9f8187077ce4..3a70f025e27d 100644 --- a/include/uapi/linux/cryptouser.h +++ b/include/uapi/linux/cryptouser.h @@ -76,45 +76,69 @@ struct crypto_user_alg { __u32 cru_flags; }; -struct crypto_stat { - char type[CRYPTO_MAX_NAME]; - union { - __u64 stat_encrypt_cnt; - __u64 stat_compress_cnt; - __u64 stat_generate_cnt; - __u64 stat_hash_cnt; - __u64 stat_setsecret_cnt; - }; - union { - __u64 stat_encrypt_tlen; - __u64 stat_compress_tlen; - __u64 stat_generate_tlen; - __u64 stat_hash_tlen; - }; - union { - __u64 stat_akcipher_err_cnt; - __u64 stat_cipher_err_cnt; - __u64 stat_compress_err_cnt; - __u64 stat_aead_err_cnt; - __u64 stat_hash_err_cnt; - __u64 stat_rng_err_cnt; - __u64 stat_kpp_err_cnt; - }; - union { 
- __u64 stat_decrypt_cnt; - __u64 stat_decompress_cnt; - __u64 stat_seed_cnt; - __u64 stat_generate_public_key_cnt; - }; - union { - __u64 stat_decrypt_tlen; - __u
[PATCH v3 08/10] crypto: crypto_user_stat: remove intermediate variable
The v64 intermediate variable is useless; removing it makes the code much more readable. Signed-off-by: Corentin Labbe --- crypto/crypto_user_stat.c | 132 -- 1 file changed, 41 insertions(+), 91 deletions(-) diff --git a/crypto/crypto_user_stat.c b/crypto/crypto_user_stat.c index 838123758423..7b668c659122 100644 --- a/crypto/crypto_user_stat.c +++ b/crypto/crypto_user_stat.c @@ -34,22 +34,16 @@ struct crypto_dump_info { static int crypto_report_aead(struct sk_buff *skb, struct crypto_alg *alg) { struct crypto_stat_aead raead; - u64 v64; memset(&raead, 0, sizeof(raead)); strscpy(raead.type, "aead", sizeof(raead.type)); - v64 = atomic64_read(&alg->encrypt_cnt); - raead.stat_encrypt_cnt = v64; - v64 = atomic64_read(&alg->encrypt_tlen); - raead.stat_encrypt_tlen = v64; - v64 = atomic64_read(&alg->decrypt_cnt); - raead.stat_decrypt_cnt = v64; - v64 = atomic64_read(&alg->decrypt_tlen); - raead.stat_decrypt_tlen = v64; - v64 = atomic64_read(&alg->aead_err_cnt); - raead.stat_aead_err_cnt = v64; + raead.stat_encrypt_cnt = atomic64_read(&alg->encrypt_cnt); + raead.stat_encrypt_tlen = atomic64_read(&alg->encrypt_tlen); + raead.stat_decrypt_cnt = atomic64_read(&alg->decrypt_cnt); + raead.stat_decrypt_tlen = atomic64_read(&alg->decrypt_tlen); + raead.stat_aead_err_cnt = atomic64_read(&alg->aead_err_cnt); return nla_put(skb, CRYPTOCFGA_STAT_AEAD, sizeof(raead), &raead); } @@ -57,22 +51,16 @@ static int crypto_report_aead(struct sk_buff *skb, struct crypto_alg *alg) static int crypto_report_cipher(struct sk_buff *skb, struct crypto_alg *alg) { struct crypto_stat_cipher rcipher; - u64 v64; memset(&rcipher, 0, sizeof(rcipher)); strscpy(rcipher.type, "cipher", sizeof(rcipher.type)); - v64 = atomic64_read(&alg->encrypt_cnt); - rcipher.stat_encrypt_cnt = v64; - v64 = atomic64_read(&alg->encrypt_tlen); - rcipher.stat_encrypt_tlen = v64; - v64 = atomic64_read(&alg->decrypt_cnt); - rcipher.stat_decrypt_cnt = v64; - v64 = atomic64_read(&alg->decrypt_tlen); -
rcipher.stat_decrypt_tlen = v64; - v64 = atomic64_read(&alg->cipher_err_cnt); - rcipher.stat_cipher_err_cnt = v64; + rcipher.stat_encrypt_cnt = atomic64_read(&alg->encrypt_cnt); + rcipher.stat_encrypt_tlen = atomic64_read(&alg->encrypt_tlen); + rcipher.stat_decrypt_cnt = atomic64_read(&alg->decrypt_cnt); + rcipher.stat_decrypt_tlen = atomic64_read(&alg->decrypt_tlen); + rcipher.stat_cipher_err_cnt = atomic64_read(&alg->cipher_err_cnt); return nla_put(skb, CRYPTOCFGA_STAT_CIPHER, sizeof(rcipher), &rcipher); } @@ -80,21 +68,15 @@ static int crypto_report_cipher(struct sk_buff *skb, struct crypto_alg *alg) static int crypto_report_comp(struct sk_buff *skb, struct crypto_alg *alg) { struct crypto_stat_compress rcomp; - u64 v64; memset(&rcomp, 0, sizeof(rcomp)); strscpy(rcomp.type, "compression", sizeof(rcomp.type)); - v64 = atomic64_read(&alg->compress_cnt); - rcomp.stat_compress_cnt = v64; - v64 = atomic64_read(&alg->compress_tlen); - rcomp.stat_compress_tlen = v64; - v64 = atomic64_read(&alg->decompress_cnt); - rcomp.stat_decompress_cnt = v64; - v64 = atomic64_read(&alg->decompress_tlen); - rcomp.stat_decompress_tlen = v64; - v64 = atomic64_read(&alg->compress_err_cnt); - rcomp.stat_compress_err_cnt = v64; + rcomp.stat_compress_cnt = atomic64_read(&alg->compress_cnt); + rcomp.stat_compress_tlen = atomic64_read(&alg->compress_tlen); + rcomp.stat_decompress_cnt = atomic64_read(&alg->decompress_cnt); + rcomp.stat_decompress_tlen = atomic64_read(&alg->decompress_tlen); + rcomp.stat_compress_err_cnt = atomic64_read(&alg->compress_err_cnt); return nla_put(skb, CRYPTOCFGA_STAT_COMPRESS, sizeof(rcomp), &rcomp); } @@ -102,21 +84,15 @@ static int crypto_report_comp(struct sk_buff *skb, struct crypto_alg *alg) static int crypto_report_acomp(struct sk_buff *skb, struct crypto_alg *alg) { struct crypto_stat_compress racomp; - u64 v64; memset(&racomp, 0, sizeof(racomp)); strscpy(racomp.type, "acomp", sizeof(racomp.type)); - v64 = atomic64_read(&alg->compress_cnt); - 
racomp.stat_compress_cnt = v64; - v64 = atomic64_read(&alg->compress_tlen); - racomp.stat_compress_tlen = v64; - v64 = atomic64_read(&alg->decompress_cnt); - racomp.stat_decompress_cnt = v64; - v64 = atomic64_read(&alg->decompress_tlen); - racomp.stat_decompress_tlen = v64; - v64 = atomic64_read(&alg->compress_err_cnt); - racomp.stat_compress_err_cnt = v64; + racomp.stat_compress_cnt = atomic64_read(&alg->compress_cnt); + racomp.stat_compress_tlen = atomic64_rea
[PATCH v3 10/10] crypto: crypto_user_stat: rename err_cnt parameter
Now that all crypto stats live in their own structures, it is useless to keep the algorithm name in the err_cnt member. Signed-off-by: Corentin Labbe --- crypto/algapi.c | 38 - crypto/crypto_user_stat.c | 18 include/linux/crypto.h | 28 include/uapi/linux/cryptouser.h | 14 ++-- tools/crypto/getstat.c | 18 5 files changed, 58 insertions(+), 58 deletions(-) diff --git a/crypto/algapi.c b/crypto/algapi.c index 7ef812c4872f..ec3866d6ced0 100644 --- a/crypto/algapi.c +++ b/crypto/algapi.c @@ -1082,7 +1082,7 @@ void crypto_stats_ablkcipher_encrypt(unsigned int nbytes, int ret, struct crypto_alg *alg) { if (ret && ret != -EINPROGRESS && ret != -EBUSY) { - atomic64_inc(&alg->stats.cipher.cipher_err_cnt); + atomic64_inc(&alg->stats.cipher.err_cnt); } else { atomic64_inc(&alg->stats.cipher.encrypt_cnt); atomic64_add(nbytes, &alg->stats.cipher.encrypt_tlen); @@ -1094,7 +1094,7 @@ void crypto_stats_ablkcipher_decrypt(unsigned int nbytes, int ret, struct crypto_alg *alg) { if (ret && ret != -EINPROGRESS && ret != -EBUSY) { - atomic64_inc(&alg->stats.cipher.cipher_err_cnt); + atomic64_inc(&alg->stats.cipher.err_cnt); } else { atomic64_inc(&alg->stats.cipher.decrypt_cnt); atomic64_add(nbytes, &alg->stats.cipher.decrypt_tlen); @@ -1106,7 +1106,7 @@ void crypto_stats_aead_encrypt(unsigned int cryptlen, struct crypto_alg *alg, int ret) { if (ret && ret != -EINPROGRESS && ret != -EBUSY) { - atomic64_inc(&alg->stats.aead.aead_err_cnt); + atomic64_inc(&alg->stats.aead.err_cnt); } else { atomic64_inc(&alg->stats.aead.encrypt_cnt); atomic64_add(cryptlen, &alg->stats.aead.encrypt_tlen); @@ -1118,7 +1118,7 @@ void crypto_stats_aead_decrypt(unsigned int cryptlen, struct crypto_alg *alg, int ret) { if (ret && ret != -EINPROGRESS && ret != -EBUSY) { - atomic64_inc(&alg->stats.aead.aead_err_cnt); + atomic64_inc(&alg->stats.aead.err_cnt); } else { atomic64_inc(&alg->stats.aead.decrypt_cnt); atomic64_add(cryptlen, &alg->stats.aead.decrypt_tlen); @@ -1130,7 +1130,7 @@ void
crypto_stats_akcipher_encrypt(unsigned int src_len, int ret, struct crypto_alg *alg) { if (ret && ret != -EINPROGRESS && ret != -EBUSY) { - atomic64_inc(&alg->stats.akcipher.akcipher_err_cnt); + atomic64_inc(&alg->stats.akcipher.err_cnt); } else { atomic64_inc(&alg->stats.akcipher.encrypt_cnt); atomic64_add(src_len, &alg->stats.akcipher.encrypt_tlen); @@ -1142,7 +1142,7 @@ void crypto_stats_akcipher_decrypt(unsigned int src_len, int ret, struct crypto_alg *alg) { if (ret && ret != -EINPROGRESS && ret != -EBUSY) { - atomic64_inc(&alg->stats.akcipher.akcipher_err_cnt); + atomic64_inc(&alg->stats.akcipher.err_cnt); } else { atomic64_inc(&alg->stats.akcipher.decrypt_cnt); atomic64_add(src_len, &alg->stats.akcipher.decrypt_tlen); @@ -1153,7 +1153,7 @@ void crypto_stats_akcipher_decrypt(unsigned int src_len, int ret, void crypto_stats_akcipher_sign(int ret, struct crypto_alg *alg) { if (ret && ret != -EINPROGRESS && ret != -EBUSY) - atomic64_inc(&alg->stats.akcipher.akcipher_err_cnt); + atomic64_inc(&alg->stats.akcipher.err_cnt); else atomic64_inc(&alg->stats.akcipher.sign_cnt); crypto_alg_put(alg); @@ -1162,7 +1162,7 @@ void crypto_stats_akcipher_sign(int ret, struct crypto_alg *alg) void crypto_stats_akcipher_verify(int ret, struct crypto_alg *alg) { if (ret && ret != -EINPROGRESS && ret != -EBUSY) - atomic64_inc(&alg->stats.akcipher.akcipher_err_cnt); + atomic64_inc(&alg->stats.akcipher.err_cnt); else atomic64_inc(&alg->stats.akcipher.verify_cnt); crypto_alg_put(alg); @@ -1172,7 +1172,7 @@ void crypto_stats_compress(unsigned int slen, int ret, struct crypto_alg *alg) { if (ret && ret != -EINPROGRESS && ret != -EBUSY) { - atomic64_inc(&alg->stats.compress.compress_err_cnt); + atomic64_inc(&alg->stats.compress.err_cnt); } else { atomic64_inc(&alg->stats.compress.compress_cnt); atomic64_add(slen, &alg->stats.compress.compress_tlen); @@ -
[PATCH v3 06/10] crypto: crypto_user_stat: fix use_after_free of struct xxx_request
All crypto_stats functions use the struct xxx_request for feeding stats, but in some cases this structure may already have been freed. To fix this, the needed parameters (len and alg) are stored before the request is executed. Fixes: cac5818c25d0 ("crypto: user - Implement a generic crypto statistics") Reported-by: syzbot Signed-off-by: Corentin Labbe --- crypto/ahash.c | 17 ++- crypto/algapi.c| 283 + crypto/rng.c | 4 +- include/crypto/acompress.h | 38 ++--- include/crypto/aead.h | 38 ++--- include/crypto/akcipher.h | 74 ++ include/crypto/hash.h | 32 + include/crypto/kpp.h | 48 ++- include/crypto/rng.h | 27 +--- include/crypto/skcipher.h | 36 ++--- include/linux/crypto.h | 63 - 11 files changed, 384 insertions(+), 276 deletions(-) diff --git a/crypto/ahash.c b/crypto/ahash.c index 3a348fbcf8f9..5d320a811f75 100644 --- a/crypto/ahash.c +++ b/crypto/ahash.c @@ -364,20 +364,28 @@ static int crypto_ahash_op(struct ahash_request *req, int crypto_ahash_final(struct ahash_request *req) { + struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); + struct crypto_alg *alg = tfm->base.__crt_alg; + unsigned int nbytes = req->nbytes; int ret; + crypto_stats_get(alg); ret = crypto_ahash_op(req, crypto_ahash_reqtfm(req)->final); - crypto_stat_ahash_final(req, ret); + crypto_stats_ahash_final(nbytes, ret, alg); return ret; } EXPORT_SYMBOL_GPL(crypto_ahash_final); int crypto_ahash_finup(struct ahash_request *req) { + struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); + struct crypto_alg *alg = tfm->base.__crt_alg; + unsigned int nbytes = req->nbytes; int ret; + crypto_stats_get(alg); ret = crypto_ahash_op(req, crypto_ahash_reqtfm(req)->finup); - crypto_stat_ahash_final(req, ret); + crypto_stats_ahash_final(nbytes, ret, alg); return ret; } EXPORT_SYMBOL_GPL(crypto_ahash_finup); @@ -385,13 +393,16 @@ EXPORT_SYMBOL_GPL(crypto_ahash_finup); int crypto_ahash_digest(struct ahash_request *req) { struct crypto_ahash *tfm = crypto_ahash_reqtfm(req); + struct crypto_alg *alg =
tfm->base.__crt_alg; + unsigned int nbytes = req->nbytes; int ret; + crypto_stats_get(alg); if (crypto_ahash_get_flags(tfm) & CRYPTO_TFM_NEED_KEY) ret = -ENOKEY; else ret = crypto_ahash_op(req, tfm->digest); - crypto_stat_ahash_final(req, ret); + crypto_stats_ahash_final(nbytes, ret, alg); return ret; } EXPORT_SYMBOL_GPL(crypto_ahash_digest); diff --git a/crypto/algapi.c b/crypto/algapi.c index 42fe316f80ee..6c98851f3a28 100644 --- a/crypto/algapi.c +++ b/crypto/algapi.c @@ -1078,6 +1078,289 @@ int crypto_type_has_alg(const char *name, const struct crypto_type *frontend, } EXPORT_SYMBOL_GPL(crypto_type_has_alg); +#ifdef CONFIG_CRYPTO_STATS +void crypto_stats_get(struct crypto_alg *alg) +{ + crypto_alg_get(alg); +} + +void crypto_stats_ablkcipher_encrypt(unsigned int nbytes, int ret, + struct crypto_alg *alg) +{ + if (ret && ret != -EINPROGRESS && ret != -EBUSY) { + atomic64_inc(&alg->cipher_err_cnt); + } else { + atomic64_inc(&alg->encrypt_cnt); + atomic64_add(nbytes, &alg->encrypt_tlen); + } + crypto_alg_put(alg); +} + +void crypto_stats_ablkcipher_decrypt(unsigned int nbytes, int ret, + struct crypto_alg *alg) +{ + if (ret && ret != -EINPROGRESS && ret != -EBUSY) { + atomic64_inc(&alg->cipher_err_cnt); + } else { + atomic64_inc(&alg->decrypt_cnt); + atomic64_add(nbytes, &alg->decrypt_tlen); + } + crypto_alg_put(alg); +} + +void crypto_stats_aead_encrypt(unsigned int cryptlen, + struct crypto_alg *alg, int ret) +{ + if (ret && ret != -EINPROGRESS && ret != -EBUSY) { + atomic64_inc(&alg->aead_err_cnt); + } else { + atomic64_inc(&alg->encrypt_cnt); + atomic64_add(cryptlen, &alg->encrypt_tlen); + } + crypto_alg_put(alg); +} + +void crypto_stats_aead_decrypt(unsigned int cryptlen, + struct crypto_alg *alg, int ret) +{ + if (ret && ret != -EINPROGRESS && ret != -EBUSY) { + atomic64_inc(&alg->aead_err_cnt); + } else { + atomic64_inc(&alg->decrypt_cnt); + atomic64_add(cryptlen, &alg->decrypt_tlen); + } + crypto_alg_put(alg); +} + +void 
crypto_stats_akcipher_encrypt(unsigned int src_len, int ret, + struct crypto_alg *alg) +{ + if (ret && ret != -EINPROGRESS && ret != -EBUSY)
[PATCH v3 07/10] crypto: crypto_user_stat: Fix invalid stat reporting
Some error counters use the wrong name for getting their data. This had not caused any reporting problem, since all error counters are shared in the same union. Signed-off-by: Corentin Labbe --- crypto/crypto_user_stat.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/crypto/crypto_user_stat.c b/crypto/crypto_user_stat.c index 3c14be2f7a1b..838123758423 100644 --- a/crypto/crypto_user_stat.c +++ b/crypto/crypto_user_stat.c @@ -93,7 +93,7 @@ static int crypto_report_comp(struct sk_buff *skb, struct crypto_alg *alg) rcomp.stat_decompress_cnt = v64; v64 = atomic64_read(&alg->decompress_tlen); rcomp.stat_decompress_tlen = v64; - v64 = atomic64_read(&alg->cipher_err_cnt); + v64 = atomic64_read(&alg->compress_err_cnt); rcomp.stat_compress_err_cnt = v64; return nla_put(skb, CRYPTOCFGA_STAT_COMPRESS, sizeof(rcomp), &rcomp); @@ -115,7 +115,7 @@ static int crypto_report_acomp(struct sk_buff *skb, struct crypto_alg *alg) racomp.stat_decompress_cnt = v64; v64 = atomic64_read(&alg->decompress_tlen); racomp.stat_decompress_tlen = v64; - v64 = atomic64_read(&alg->cipher_err_cnt); + v64 = atomic64_read(&alg->compress_err_cnt); racomp.stat_compress_err_cnt = v64; return nla_put(skb, CRYPTOCFGA_STAT_ACOMP, sizeof(racomp), &racomp); @@ -222,7 +222,7 @@ static int crypto_report_rng(struct sk_buff *skb, struct crypto_alg *alg) rrng.stat_generate_tlen = v64; v64 = atomic64_read(&alg->seed_cnt); rrng.stat_seed_cnt = v64; - v64 = atomic64_read(&alg->hash_err_cnt); + v64 = atomic64_read(&alg->rng_err_cnt); rrng.stat_rng_err_cnt = v64; return nla_put(skb, CRYPTOCFGA_STAT_RNG, sizeof(rrng), &rrng); -- 2.18.1
[PATCH v3 09/10] crypto: crypto_user_stat: Split stats in multiple structures
Like for userspace, this patch splits stats into multiple structures, one for each algorithm class. Signed-off-by: Corentin Labbe --- crypto/algapi.c | 108 +++ crypto/crypto_user_stat.c | 82 - include/linux/crypto.h| 180 +- 3 files changed, 210 insertions(+), 160 deletions(-) diff --git a/crypto/algapi.c b/crypto/algapi.c index 6c98851f3a28..7ef812c4872f 100644 --- a/crypto/algapi.c +++ b/crypto/algapi.c @@ -259,13 +259,7 @@ static struct crypto_larval *__crypto_register_alg(struct crypto_alg *alg) list_add(&larval->alg.cra_list, &crypto_alg_list); #ifdef CONFIG_CRYPTO_STATS - atomic64_set(&alg->encrypt_cnt, 0); - atomic64_set(&alg->decrypt_cnt, 0); - atomic64_set(&alg->encrypt_tlen, 0); - atomic64_set(&alg->decrypt_tlen, 0); - atomic64_set(&alg->verify_cnt, 0); - atomic64_set(&alg->cipher_err_cnt, 0); - atomic64_set(&alg->sign_cnt, 0); + memset(&alg->stats, 0, sizeof(alg->stats)); #endif out: @@ -1088,10 +1082,10 @@ void crypto_stats_ablkcipher_encrypt(unsigned int nbytes, int ret, struct crypto_alg *alg) { if (ret && ret != -EINPROGRESS && ret != -EBUSY) { - atomic64_inc(&alg->cipher_err_cnt); + atomic64_inc(&alg->stats.cipher.cipher_err_cnt); } else { - atomic64_inc(&alg->encrypt_cnt); - atomic64_add(nbytes, &alg->encrypt_tlen); + atomic64_inc(&alg->stats.cipher.encrypt_cnt); + atomic64_add(nbytes, &alg->stats.cipher.encrypt_tlen); } crypto_alg_put(alg); } @@ -1100,10 +1094,10 @@ void crypto_stats_ablkcipher_decrypt(unsigned int nbytes, int ret, struct crypto_alg *alg) { if (ret && ret != -EINPROGRESS && ret != -EBUSY) { - atomic64_inc(&alg->cipher_err_cnt); + atomic64_inc(&alg->stats.cipher.cipher_err_cnt); } else { - atomic64_inc(&alg->decrypt_cnt); - atomic64_add(nbytes, &alg->decrypt_tlen); + atomic64_inc(&alg->stats.cipher.decrypt_cnt); + atomic64_add(nbytes, &alg->stats.cipher.decrypt_tlen); } crypto_alg_put(alg); } @@ -1112,10 +1106,10 @@ void crypto_stats_aead_encrypt(unsigned int cryptlen, struct crypto_alg *alg, int ret) { if (ret && ret != 
-EINPROGRESS && ret != -EBUSY) { - atomic64_inc(&alg->aead_err_cnt); + atomic64_inc(&alg->stats.aead.aead_err_cnt); } else { - atomic64_inc(&alg->encrypt_cnt); - atomic64_add(cryptlen, &alg->encrypt_tlen); + atomic64_inc(&alg->stats.aead.encrypt_cnt); + atomic64_add(cryptlen, &alg->stats.aead.encrypt_tlen); } crypto_alg_put(alg); } @@ -1124,10 +1118,10 @@ void crypto_stats_aead_decrypt(unsigned int cryptlen, struct crypto_alg *alg, int ret) { if (ret && ret != -EINPROGRESS && ret != -EBUSY) { - atomic64_inc(&alg->aead_err_cnt); + atomic64_inc(&alg->stats.aead.aead_err_cnt); } else { - atomic64_inc(&alg->decrypt_cnt); - atomic64_add(cryptlen, &alg->decrypt_tlen); + atomic64_inc(&alg->stats.aead.decrypt_cnt); + atomic64_add(cryptlen, &alg->stats.aead.decrypt_tlen); } crypto_alg_put(alg); } @@ -1136,10 +1130,10 @@ void crypto_stats_akcipher_encrypt(unsigned int src_len, int ret, struct crypto_alg *alg) { if (ret && ret != -EINPROGRESS && ret != -EBUSY) { - atomic64_inc(&alg->akcipher_err_cnt); + atomic64_inc(&alg->stats.akcipher.akcipher_err_cnt); } else { - atomic64_inc(&alg->encrypt_cnt); - atomic64_add(src_len, &alg->encrypt_tlen); + atomic64_inc(&alg->stats.akcipher.encrypt_cnt); + atomic64_add(src_len, &alg->stats.akcipher.encrypt_tlen); } crypto_alg_put(alg); } @@ -1148,10 +1142,10 @@ void crypto_stats_akcipher_decrypt(unsigned int src_len, int ret, struct crypto_alg *alg) { if (ret && ret != -EINPROGRESS && ret != -EBUSY) { - atomic64_inc(&alg->akcipher_err_cnt); + atomic64_inc(&alg->stats.akcipher.akcipher_err_cnt); } else { - atomic64_inc(&alg->decrypt_cnt); - atomic64_add(src_len, &alg->decrypt_tlen); + atomic64_inc(&alg->stats.akcipher.decrypt_cnt); + atomic64_add(src_len, &alg->stats.akcipher.decrypt_tlen); } crypto_alg_put(alg); } @@ -1159,18 +1153,18 @@ void crypto_stats_akcipher_decrypt(unsigned int src_len, int r
[PATCH v3 05/10] crypto: tool: getstat: convert user space example to the new crypto_user_stat uapi
This patch converts the getstat example tool to the recent changes done in crypto_user_stat - changed all stats to u64 - separated struct stats for each crypto alg Signed-off-by: Corentin Labbe --- tools/crypto/getstat.c | 54 +- 1 file changed, 27 insertions(+), 27 deletions(-) diff --git a/tools/crypto/getstat.c b/tools/crypto/getstat.c index 24115173a483..57fbb94608d4 100644 --- a/tools/crypto/getstat.c +++ b/tools/crypto/getstat.c @@ -152,53 +152,53 @@ static int get_stat(const char *drivername) if (tb[CRYPTOCFGA_STAT_HASH]) { struct rtattr *rta = tb[CRYPTOCFGA_STAT_HASH]; - struct crypto_stat *rhash = - (struct crypto_stat *)RTA_DATA(rta); - printf("%s\tHash\n\tHash: %u bytes: %llu\n\tErrors: %u\n", + struct crypto_stat_hash *rhash = + (struct crypto_stat_hash *)RTA_DATA(rta); + printf("%s\tHash\n\tHash: %llu bytes: %llu\n\tErrors: %llu\n", drivername, rhash->stat_hash_cnt, rhash->stat_hash_tlen, rhash->stat_hash_err_cnt); } else if (tb[CRYPTOCFGA_STAT_COMPRESS]) { struct rtattr *rta = tb[CRYPTOCFGA_STAT_COMPRESS]; - struct crypto_stat *rblk = - (struct crypto_stat *)RTA_DATA(rta); - printf("%s\tCompress\n\tCompress: %u bytes: %llu\n\tDecompress: %u bytes: %llu\n\tErrors: %u\n", + struct crypto_stat_compress *rblk = + (struct crypto_stat_compress *)RTA_DATA(rta); + printf("%s\tCompress\n\tCompress: %llu bytes: %llu\n\tDecompress: %llu bytes: %llu\n\tErrors: %llu\n", drivername, rblk->stat_compress_cnt, rblk->stat_compress_tlen, rblk->stat_decompress_cnt, rblk->stat_decompress_tlen, rblk->stat_compress_err_cnt); } else if (tb[CRYPTOCFGA_STAT_ACOMP]) { struct rtattr *rta = tb[CRYPTOCFGA_STAT_ACOMP]; - struct crypto_stat *rcomp = - (struct crypto_stat *)RTA_DATA(rta); - printf("%s\tACompress\n\tCompress: %u bytes: %llu\n\tDecompress: %u bytes: %llu\n\tErrors: %u\n", + struct crypto_stat_compress *rcomp = + (struct crypto_stat_compress *)RTA_DATA(rta); + printf("%s\tACompress\n\tCompress: %llu bytes: %llu\n\tDecompress: %llu bytes: %llu\n\tErrors: %llu\n", 
drivername, rcomp->stat_compress_cnt, rcomp->stat_compress_tlen, rcomp->stat_decompress_cnt, rcomp->stat_decompress_tlen, rcomp->stat_compress_err_cnt); } else if (tb[CRYPTOCFGA_STAT_AEAD]) { struct rtattr *rta = tb[CRYPTOCFGA_STAT_AEAD]; - struct crypto_stat *raead = - (struct crypto_stat *)RTA_DATA(rta); - printf("%s\tAEAD\n\tEncrypt: %u bytes: %llu\n\tDecrypt: %u bytes: %llu\n\tErrors: %u\n", + struct crypto_stat_aead *raead = + (struct crypto_stat_aead *)RTA_DATA(rta); + printf("%s\tAEAD\n\tEncrypt: %llu bytes: %llu\n\tDecrypt: %llu bytes: %llu\n\tErrors: %llu\n", drivername, raead->stat_encrypt_cnt, raead->stat_encrypt_tlen, raead->stat_decrypt_cnt, raead->stat_decrypt_tlen, raead->stat_aead_err_cnt); } else if (tb[CRYPTOCFGA_STAT_BLKCIPHER]) { struct rtattr *rta = tb[CRYPTOCFGA_STAT_BLKCIPHER]; - struct crypto_stat *rblk = - (struct crypto_stat *)RTA_DATA(rta); - printf("%s\tCipher\n\tEncrypt: %u bytes: %llu\n\tDecrypt: %u bytes: %llu\n\tErrors: %u\n", + struct crypto_stat_cipher *rblk = + (struct crypto_stat_cipher *)RTA_DATA(rta); + printf("%s\tCipher\n\tEncrypt: %llu bytes: %llu\n\tDecrypt: %llu bytes: %llu\n\tErrors: %llu\n", drivername, rblk->stat_encrypt_cnt, rblk->stat_encrypt_tlen, rblk->stat_decrypt_cnt, rblk->stat_decrypt_tlen, rblk->stat_cipher_err_cnt); } else if (tb[CRYPTOCFGA_STAT_AKCIPHER]) { struct rtattr *rta = tb[CRYPTOCFGA_STAT_AKCIPHER]; - struct crypto_stat *rblk = - (struct crypto_stat *)RTA_DATA(rta); - printf("%s\tAkcipher\n\tEncrypt: %u bytes: %llu\n\tDecrypt: %u bytes: %llu\n\tSign: %u\n\tVerify: %u\n\tErrors: %u\n", + struct crypto_stat_akcipher *rblk = + (struct crypto_stat_akcipher *)RTA_DATA(rta); + printf("%s\tAkcipher\n\tEncrypt: %llu bytes: %llu\n\tDecrypt: %llu bytes: %llu\n\tSign:
[PATCH v3 00/10] crypto: crypto_user_stat: misc enhancement
Hello This patchset fixes all problems reported by Eric Biggers. Regards Changes since v2: - moved all crypto_stats functions from header to algapi.c for using crypto_alg_get/put Changes since v1: - better locking of crypto_alg via crypto_alg_get/crypto_alg_put - removed all intermediate variables in crypto/crypto_user_stat.c - split all internal stats variables into different structures Corentin Labbe (10): crypto: crypto_user_stat: made crypto_user_stat optional crypto: CRYPTO_STATS should depend on CRYPTO_USER crypto: crypto_user_stat: convert all stats from u32 to u64 crypto: crypto_user_stat: split user space crypto stat structures crypto: tool: getstat: convert user space example to the new crypto_user_stat uapi crypto: crypto_user_stat: fix use_after_free of struct xxx_request crypto: crypto_user_stat: Fix invalid stat reporting crypto: crypto_user_stat: remove intermediate variable crypto: crypto_user_stat: Split stats in multiple structures crypto: crypto_user_stat: rename err_cnt parameter crypto/Kconfig | 1 + crypto/Makefile | 3 +- crypto/ahash.c | 17 +- crypto/algapi.c | 293 ++- crypto/crypto_user_stat.c| 160 +-- crypto/rng.c | 4 +- include/crypto/acompress.h | 38 +--- include/crypto/aead.h| 38 +--- include/crypto/akcipher.h| 74 ++- include/crypto/hash.h| 32 +-- include/crypto/internal/cryptouser.h | 17 ++ include/crypto/kpp.h | 48 + include/crypto/rng.h | 27 +-- include/crypto/skcipher.h| 36 +--- include/linux/crypto.h | 245 +- include/uapi/linux/cryptouser.h | 102 ++ tools/crypto/getstat.c | 72 +++ 17 files changed, 677 insertions(+), 530 deletions(-) -- 2.18.1
Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
On 20/11/2018 09:16, Jonathan Cameron wrote: > +CC Jean-Phillipe and iommu list. Thanks for the Cc, sorry I don't have enough bandwidth to follow this thread at the moment. > In WarpDrive/uacce, we make this simple. If you support IOMMU and it > support > SVM/SVA. Everything will be fine just like ODP implicit mode. And you > don't need > to write any code for that. Because it has been done by IOMMU framework. > If it Looks like the IOMMU code uses mmu_notifier, so it is identical to IB's ODP. The only difference is that IB tends to have the IOMMU page table in the device, not in the CPU. The only case I know if that is different is the new-fangled CAPI stuff where the IOMMU can directly use the CPU's page table and the IOMMU page table (in device or CPU) is eliminated. >>> >>> Yes. We are not focusing on the current implementation. As mentioned in the >>> cover letter. We are expecting Jean Philips' SVA patch: >>> git://linux-arm.org/linux-jpb. >> >> This SVA stuff does not look comparable to CAPI as it still requires >> maintaining seperate IOMMU page tables. With SVA, we use the same page tables in the IOMMU and CPU. It's the same pgd pointer, there is no mirroring of mappings. We bind the process page tables with the device using a PASID (Process Address Space ID). After fork(), the child's mm is different from the parent's one, and is not automatically bound to the device. The device driver will have to issue a new bind() request, and the child mm will be bound with a different PASID. There could be a problem if the child inherits the parent's device handle. Then depending on the device, the child could be able to program DMA and possibly access the parent's address space. The parent needs to be aware of that when using the bind() API, and close the device fd in the child after fork(). We use MMU notifiers for some address space changes: * The IOTLB needs to be invalidated after any unmap() to the process address space. 
On Arm systems the SMMU IOTLBs can be invalidated by the CPU TLBI instructions, but we still need to invalidate TLBs private to devices that are arch-agnostic (Address Translation Cache in PCI ATS). * When the process mm exits, we need to remove the associated PASID configuration in the IOMMU and invalidate the TLBs. Thanks, Jean
Re: Guidance required for a crypto requirement
On Tue, Nov 20, 2018 at 10:30:56AM +, Kalyani Akula wrote: > This email and any attachments are intended for the sole use of the named > recipient(s) and contain(s) confidential information that may be proprietary, > privileged or copyrighted under applicable law. If you are not the intended > recipient, do not read, copy, or forward this email message or any > attachments. Delete this email message and any attachments immediately. Now deleted, sorry.
Re: [RFC PATCH v2 0/4] Exporting existing crypto API code through zinc
On Tue, 20 Nov 2018 at 07:02, Herbert Xu wrote: > > On Mon, Nov 19, 2018 at 01:24:51PM +0800, Herbert Xu wrote: > > > > In response to Martin's patch-set which I merged last week, I think > > here is quick way out for the zinc interface. > > > > Going through the past zinc discussions it would appear that > > everybody is quite happy with the zinc interface per se. The > > most contentious areas are in fact the algorithm implementations > > under zinc, as well as the conversion of the crypto API algorithms > > over to using the zinc interface (e.g., you can no longer access > > specific implementations). > > > > My proposal is to merge the zinc interface as is, but to invert > > how we place the algorithm implementations. IOW the implementations > > should stay where they are now, with in the crypto API. However, > > we will provide direct access to them for zinc without going through > > the crypto API. IOW we'll just export the functions directly. > > > > Here is a proof of concept patch to do it for chacha20-generic. > > It basically replaces patch 3 in the October zinc patch series. > > Actually this patch also exports the arm/x86 chacha20 functions > > too so they are ready for use by zinc. > > > > If everyone is happy with this then we can immediately add the > > zinc interface and expose the existing crypto API algorithms > > through it. This would allow wireguard to be merged right away. > > > > In parallel, the discussions over replacing the implementations > > underneath can carry on without stalling the whole project. > > > > PS This patch is totally untested :) > > Here is an updated version demonstrating how we could access the > accelerated versions of chacha20. It also includes a final patch > to deposit the new zinc version of x86-64 chacha20 into > arch/x86/crypto where it can be used by both the crypto API as well > as zinc. > > The key differences between this and the last zinc patch series: > > 1. 
The crypto API algorithms remain individually accessible, this > is crucial as these algorithm names are exported to user-space so > changing the names to foo-zinc is not going to work. > Are you saying user space may use names like "ctr-aes-neon" directly rather than "ctr(aes)" for any implementation of the mode? If so, that is highly unfortunate, since it means we'd be breaking user space by wrapping a crypto library function with its own arch specific specializations into a generic crypto API wrapper. Note that I think that using AF_ALG to access software crypto routines (as opposed to accelerators) is rather pointless to begin with, but if it permits that today then we're stuck with it. > 2. The arch-specific algorithm code lives in their own module rather > than in a monolithic module. > This is one of the remaining issues I have with Zinc. However, modulo the above concerns, I think it does make sense to have a separate function API for synchronous software routines below the current crypto API. What I don't want is a separate Zinc kingdom with a different name, different maintainers and going through a different tree, and with its own approach to test vectors, arch specific code, etc etc
Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
+CC Jean-Phillipe and iommu list. On Mon, 19 Nov 2018 20:29:39 -0700 Jason Gunthorpe wrote: > On Tue, Nov 20, 2018 at 11:07:02AM +0800, Kenneth Lee wrote: > > On Mon, Nov 19, 2018 at 11:49:54AM -0700, Jason Gunthorpe wrote: > > > Date: Mon, 19 Nov 2018 11:49:54 -0700 > > > From: Jason Gunthorpe > > > To: Kenneth Lee > > > CC: Leon Romanovsky , Kenneth Lee , > > > Tim Sell , linux-...@vger.kernel.org, Alexander > > > Shishkin , Zaibo Xu > > > , zhangfei@foxmail.com, linux...@huawei.com, > > > haojian.zhu...@linaro.org, Christoph Lameter , Hao Fang > > > , Gavin Schenk , RDMA > > > mailing > > > list , Zhou Wang , > > > Doug Ledford , Uwe Kleine-König > > > , David Kershner > > > , Johan Hovold , Cyrille > > > Pitchen , Sagar Dharia > > > , Jens Axboe , > > > guodong...@linaro.org, linux-netdev , Randy > > > Dunlap > > > , linux-ker...@vger.kernel.org, Vinod Koul > > > , linux-crypto@vger.kernel.org, Philippe Ombredanne > > > , Sanyog Kale , "David S. > > > Miller" , linux-accelerat...@lists.ozlabs.org > > > Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce > > > User-Agent: Mutt/1.9.4 (2018-02-28) > > > Message-ID: <20181119184954.gb4...@ziepe.ca> > > > > > > On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote: > > > > > > > If the hardware cannot share page table with the CPU, we then need to > > > > have > > > > some way to change the device page table. This is what happen in ODP. It > > > > invalidates the page table in device upon mmu_notifier call back. But > > > > this cannot > > > > solve the COW problem: if the user process A share a page P with > > > > device, and A > > > > forks a new process B, and it continue to write to the page. By COW, the > > > > process B will keep the page P, while A will get a new page P'. But you > > > > have > > > > no way to let the device know it should use P' rather than P. > > > > > > Is this true? I thought mmu_notifiers covered all these cases. 
> > > > > > The mm_notifier for A should fire if B causes the physical address of > > > A's pages to change via COW. > > > > > > And this causes the device page tables to re-synchronize. > > > > I don't see such code. The current do_cow_fault() implemenation has nothing > > to > > do with mm_notifer. > > Well, that sure sounds like it would be a bug in mmu_notifiers.. > > But considering Jean's SVA stuff seems based on mmu notifiers, I have > a hard time believing that it has any different behavior from RDMA's > ODP, and if it does have different behavior, then it is probably just > a bug in the ODP implementation. > > > > > In WarpDrive/uacce, we make this simple. If you support IOMMU and it > > > > support > > > > SVM/SVA. Everything will be fine just like ODP implicit mode. And you > > > > don't need > > > > to write any code for that. Because it has been done by IOMMU > > > > framework. If it > > > > > > Looks like the IOMMU code uses mmu_notifier, so it is identical to > > > IB's ODP. The only difference is that IB tends to have the IOMMU page > > > table in the device, not in the CPU. > > > > > > The only case I know if that is different is the new-fangled CAPI > > > stuff where the IOMMU can directly use the CPU's page table and the > > > IOMMU page table (in device or CPU) is eliminated. > > > > Yes. We are not focusing on the current implementation. As mentioned in the > > cover letter. We are expecting Jean Philips' SVA patch: > > git://linux-arm.org/linux-jpb. > > This SVA stuff does not look comparable to CAPI as it still requires > maintaining seperate IOMMU page tables. > > Also, those patches from Jean have a lot of references to > mmu_notifiers (ie look at iommu_mmu_notifier). > > Are you really sure it is actually any different at all? > > > > Anyhow, I don't think a single instance of hardware should justify an > > > entire new subsystem. 
Subsystems are hard to make and without multiple > > > hardware examples there is no way to expect that it would cover any > > > future use cases. > > > > Yes. That's our first expectation. We can keep it with our driver. But > > because > > there is no user driver support for any accelerator in mainline kernel. > > Even the > > well known QuickAssit has to be maintained out of tree. So we try to see if > > people is interested in working together to solve the problem. > > Well, you should come with patches ack'ed by these other groups. > > > > If all your driver needs is to mmap some PCI bar space, route > > > interrupts and do DMA mapping then mediated VFIO is probably a good > > > choice. > > > > Yes. That is what is done in our RFCv1/v2. But we accepted Jerome's opinion > > and > > try not to add complexity to the mm subsystem. > > Why would a mediated VFIO driver touch the mm subsystem? Sounds like > you don't have a VFIO driver if it needs to do stuff like that... > > > > If it needs to do a bunch of other stuff, not related to PCI bar > > > space, i