Re: [PATCH v5 00/11] crypto: crypto_user_stat: misc enhancement
On Thu, Nov 29, 2018 at 02:42:15PM +, Corentin Labbe wrote:
> Hello
>
> This patchset fixes all problems reported by Eric Biggers.
>
> Regards
>
> Changes since v4:
> - Inlined functions when !CRYPTO_STATS
>
> Changes since v3:
> - Added a crypto_stats_init as asked by Neil Horman
> - Fixed some checkpatch complaints
>
> Changes since v2:
> - moved all crypto_stats functions from header to algapi.c for using
>   crypto_alg_get/put
>
> Changes since v1:
> - Better locking of crypto_alg via crypto_alg_get/crypto_alg_put
> - remove all intermediate variables in crypto/crypto_user_stat.c
> - split all internal stats variables into different structures
>
> Corentin Labbe (11):
>   crypto: crypto_user_stat: made crypto_user_stat optional
>   crypto: CRYPTO_STATS should depend on CRYPTO_USER
>   crypto: crypto_user_stat: convert all stats from u32 to u64
>   crypto: crypto_user_stat: split user space crypto stat structures
>   crypto: tool: getstat: convert user space example to the new
>     crypto_user_stat uapi
>   crypto: crypto_user_stat: fix use_after_free of struct xxx_request
>   crypto: crypto_user_stat: Fix invalid stat reporting
>   crypto: crypto_user_stat: remove intermediate variable
>   crypto: crypto_user_stat: Split stats in multiple structures
>   crypto: crypto_user_stat: rename err_cnt parameter
>   crypto: crypto_user_stat: Add crypto_stats_init
>
>  crypto/Kconfig                       |   1 +
>  crypto/Makefile                      |   3 +-
>  crypto/ahash.c                       |  17 +-
>  crypto/algapi.c                      | 247 ++-
>  crypto/crypto_user_stat.c            | 160 +--
>  crypto/rng.c                         |   4 +-
>  include/crypto/acompress.h           |  38 +---
>  include/crypto/aead.h                |  38 +---
>  include/crypto/akcipher.h            |  74 ++-
>  include/crypto/hash.h                |  32 +--
>  include/crypto/internal/cryptouser.h |  17 ++
>  include/crypto/kpp.h                 |  48 +
>  include/crypto/rng.h                 |  27 +--
>  include/crypto/skcipher.h            |  36 +---
>  include/linux/crypto.h               | 290 ++-
>  include/uapi/linux/cryptouser.h      | 102 ++
>  tools/crypto/getstat.c               |  72 +++
>  17 files changed, 676 insertions(+), 530 deletions(-)

All applied.  Thanks.
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
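The "Inlined functions when !CRYPTO_STATS" item in v5 refers to the usual kernel pattern of compiling the stats hooks down to empty static inline stubs when the feature is disabled, so that call sites need no #ifdefs. A minimal sketch of that pattern (the function name is modeled on the series, not quoted from it):

    /* include/linux/crypto.h -- illustrative sketch, not the exact patch */
    #ifdef CONFIG_CRYPTO_STATS
    void crypto_stats_ahash_update(unsigned int nbytes, int ret,
                                   struct crypto_alg *alg);
    #else
    static inline void crypto_stats_ahash_update(unsigned int nbytes, int ret,
                                                 struct crypto_alg *alg)
    {}
    #endif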
Re: [crypto chcr 1/2] small packet Tx stalls the queue
On Fri, Nov 30, 2018 at 02:31:48PM +0530, Atul Gupta wrote:
> Immediate packets sent to hardware should include the work
> request length in calculating the flits. The WR occupies one flit and,
> if not accounted for, results in an invalid request which stalls the HW
> queue.
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Atul Gupta
> ---
>  drivers/crypto/chelsio/chcr_ipsec.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)

All applied.  Thanks.
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
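For context on the flit accounting: a flit is an 8-byte unit on this hardware, and the firmware work request (WR) header itself consumes one. A hedged sketch of the corrected calculation (the helper and parameter names are illustrative, in the cxgb4/chcr style, not the exact driver code):

    /* Illustrative only: the immediate-data flit count must cover the
     * WR header as well as the packet bytes, or the request is invalid. */
    static unsigned int calc_tx_flits(const struct sk_buff *skb,
                                      unsigned int wr_len)
    {
            /* one flit is 8 bytes; the WR occupies the first flit */
            return DIV_ROUND_UP(skb->len + wr_len, 8);
    }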
Re: [PATCH] fscrypt: remove CRYPTO_CTR dependency
On Thu, Sep 06, 2018 at 12:43:41PM +0200, Ard Biesheuvel wrote:
> On 5 September 2018 at 21:24, Eric Biggers wrote:
> > From: Eric Biggers
> >
> > fscrypt doesn't use the CTR mode of operation for anything, so there's
> > no need to select CRYPTO_CTR. It was added by commit 71dea01ea2ed
> > ("ext4 crypto: require CONFIG_CRYPTO_CTR if ext4 encryption is
> > enabled"). But, I've been unable to identify the arm64 crypto bug it
> > was supposedly working around.
> >
> > I suspect the issue was seen only on some old Android device kernel
> > (circa 3.10?). So if the fix wasn't mistaken, the real bug is probably
> > already fixed. Or maybe it was actually a bug in a non-upstream crypto
> > driver.
> >
> > So, remove the dependency. If it turns out there's actually still a
> > bug, we'll fix it properly.
> >
> > Signed-off-by: Eric Biggers
>
> Acked-by: Ard Biesheuvel
>
> This may be related to
>
> 11e3b725cfc2 crypto: arm64/aes-blk - honour iv_out requirement in CBC
> and CTR modes
>
> given that the commit in question mentions CTS. How it actually works
> around the issue is unclear to me, though.
>
> > ---
> >  fs/crypto/Kconfig | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/fs/crypto/Kconfig b/fs/crypto/Kconfig
> > index 02b7d91c92310..284b589b4774d 100644
> > --- a/fs/crypto/Kconfig
> > +++ b/fs/crypto/Kconfig
> > @@ -6,7 +6,6 @@ config FS_ENCRYPTION
> >         select CRYPTO_ECB
> >         select CRYPTO_XTS
> >         select CRYPTO_CTS
> > -       select CRYPTO_CTR
> >         select CRYPTO_SHA256
> >         select KEYS
> >         help
> > --
> > 2.19.0.rc2.392.g5ba43deb5a-goog

Ping. Ted, can you consider applying this to the fscrypt tree for 4.21?

Thanks,

- Eric
Re: [PATCH 0/3] crypto: x86/chacha20 - AVX-512VL block functions
On Tue, Nov 20, 2018 at 05:30:47PM +0100, Martin Willi wrote:
> In the quest for pushing the limits of chacha20 encryption for both IPsec
> and Wireguard, this small series adds AVX-512VL block functions. The VL
> variant works on 256-bit ymm registers, but compared to AVX2 can benefit
> from the new instructions.
>
> Compared to the AVX2 version, these block functions bring an overall
> speed improvement across encryption lengths of ~20%. Below the tcrypt
> results for additional block sizes in kOps/s, for the current AVX2
> code path, the new AVX-512VL code path and the comparison to Zinc in
> AVX2 and AVX-512VL. All numbers from a Xeon Platinum 8168 (2.7GHz).
>
> These numbers result in a very nice chart, available at:
> https://download.strongswan.org/misc/chacha-avx-512vl.svg
>
>               zinc  zinc
>  len  avx2 512vl  avx2 512vl
>    8  5719  5672  5468  5612
>   16  5675  5627  5355  5621
>   24  5687  5601  5322  5633
>   32  5667  5622  5244  5564
>   40  5603  5582  5337  5578
>   48  5638  5539  5400  5556
>   56  5624  5566  5375  5482
>   64  5590  5573  5352  5531
>   72  4841  5467  3365  3457
>   80  5316  5761  3310  3381
>   88  4798  5470  3239  3343
>   96  5324  5723  3197  3281
>  104  4819  5460  3155  3232
>  112  5266  5749  3020  3195
>  120  4776  5391  2959  3145
>  128  5291  5723  3398  3489
>  136  4122  4837  3321  3423
>  144  4507  5057  3247  3389
>  152  4139  4815  3233  3329
>  160  4482  5043  3159  3256
>  168  4142  4766  3131  3224
>  176  4506  5028  3073  3162
>  184  4119  4772  3010  3109
>  192  4499  5016  3402  3502
>  200  4127  4766  3329  3448
>  208  4452  5012  3276  3371
>  216  4128  4744  3243  3334
>  224  4484  5008  3203  3298
>  232  4103  4772  3141  3237
>  240  4458  4963  3115  3217
>  248  4121  4751  3085  3177
>  256  4461  4987  3364  4046
>  264  3406  4282  3270  4006
>  272  3408  4287  3207  3961
>  280  3371  4271  3203  3825
>  288  3625  4301  3129  3751
>  296  3402  4283  3093  3688
>  304  3401  4247  3062  3637
>  312  3382  4282  2995  3614
>  320  3611  4279  3305  4070
>  328  3386  4260  3276  3968
>  336  3369  4288  3171  3929
>  344  3389  4289  3134  3847
>  352  3609  4266  3127  3720
>  360  3355  4252  3076  3692
>  368  3387  4264  3048  3650
>  376  3387  4238  2967  3553
>  384  3568  4265  3277  4035
>  392  3369  4262  3299  3973
>  400  3362  4235  3239  3899
>  408  3352  4269  3196  3843
>  416  3585  4243  3127  3736
>  424  3364  4216  3092  3672
>  432  3341  4246  3067  3628
>  440  3353  4235  3018  3593
>  448  3538  4245  3327  4035
>  456  3322  4244  3275  3900
>  464  3340  4237  3212  3880
>  472  3330  4242  3054  3802
>  480  3530  4234  3078  3707
>  488  3337  4228  3094  3664
>  496  3330  4223  3015  3591
>  504  3317  4214  3002  3517
>  512  3531  4197  3339  4016
>  520  2511  3101  2030  2682
>  528  2627  3087  2027  2641
>  536  2508  3102  2001  2601
>  544  2638  3090  1964  2564
>  552  2494  3077  1962  2516
>  560  2625  3064  1941  2515
>  568  2500  3086  1922  2493
>  576  2611  3074  2050  2689
>  584  2482  3062  2041  2680
>  592  2595  3074  2026  2644
>  600  2470  3060  1985  2595
>  608  2581  3039  1961  2555
>  616  2478  3062  1956  2521
>  624  2587  3066  1930  2493
>  632  2457  3053  1923  2486
>  640  2581  3050  2059  2712
>  648  2296  2839  2024  2655
>  656  2389  2845  2019  2642
>  664  2292  2842  2002  2610
>  672  2404  2838  1959  2537
>  680  2273  2827  1956  2527
>  688  2389  2840  1938  2510
>  696  2280  2837  1911  2463
>  704  2370  2819  2055  2702
>  712  2277  2834  2029  2663
>  720  2369  2829  2020  2625
>  728  2255  2820  2001  2600
>  736  2373  2819  1958  2543
>  744  2269  2827  1956  2524
>  752  2364  2817  1937  2492
>  760  2270  2805  1909  2483
>  768  2378  2820  2050  2696
>  776  2053  2700  2002  2643
>  784  2066  2693  1922  2640
>  792  2065  2703  1928  2602
>  800  2138  2706  1962  2535
>  808  2065  2679  1938  2528
>  816  2063  2699  1929  2500
>  824  2053  2676  1915  2468
>  832  2149  2692  2036  2693
>  840  2055  2689  2024  2659
>  848  2049  2689  2006  2610
>  856  2057  2702  1979  2585
>  864  2144  2703  1960  2547
>  872  2047  2685  1945  2501
>  880  2055  2683  1902  2497
>  888  2060  2689  1897  2478
>  896  2139  2693  2023  2663
>  904  2049  2686  1970  2644
>  912  2055  2688  1925  2621
>  920  2047  2685  1911  2572
>  928  2114  2695  1907  2545
>  936  2055  2681  1927  2492
>  944  2055  2693  1930  2478
Re: [Help] Null pointer exception in scatterwalk_start() in kernel-4.9
On Tue, Nov 20, 2018 at 07:09:53AM +, gongchen (E) wrote:
> Hi Dear Herbert,
>
> Sorry to bother you, but we've met a problem in the crypto module;
> would you please kindly help us look into it? Thank you very much.
>
> In the function chain below, scatterwalk_start() doesn't check
> the result of sg_next(), so the kernel will crash if sg_next() returns a
> null pointer, which is our case. (The full stack is at the end of the
> letter.)
>
> blkcipher_walk_done()->scatterwalk_done()->scatterwalk_pagedone()->scatterwalk_start(walk, sg_next(walk->sg));
>
> Should we add a null-pointer check in scatterwalk_start()? Or is
> there any process that can ensure there is a valid sg pointer if the
> condition (walk->offset >= walk->sg->offset + walk->sg->length) is true?
>
> We are really looking forward to your reply; any information will
> be appreciated. Thanks again.

Did you apply the following patch?

commit 0868def3e4100591e7a1fdbf3eed1439cc8f7ca3
Author: Eric Biggers
Date:   Mon Jul 23 10:54:57 2018 -0700

    crypto: blkcipher - fix crash flushing dcache in error path

Cheers,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
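For reference, scatterwalk_start() in that era is a trivial helper, so the null-pointer check being proposed would amount to something like the sketch below. This only illustrates the reporter's suggestion; the commit Herbert cites instead fixes the error path so the walk never runs past the end of the scatterlist in the first place.

    /* include/crypto/scatterwalk.h (4.9-era) with the proposed check added */
    static inline void scatterwalk_start(struct scatter_walk *walk,
                                         struct scatterlist *sg)
    {
            if (unlikely(!sg))      /* sg_next() returned NULL */
                    return;         /* caller must stop the walk */
            walk->sg = sg;
            walk->offset = sg->offset;
    }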
Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements
Hi Martin,

On Tue, Nov 20, 2018 at 5:29 PM Martin Willi wrote:
> Thanks for the offer; no need at this time. But I certainly would
> welcome it if you could do some (Wireguard) benching with that code to
> see if it works for you.

I certainly will test it in a few different network circumstances, especially since real testing like this is sometimes more telling than busy-loop benchmarks.

> > Actually, similarly here, a 10nm Cannon Lake machine should be
> > arriving at my house this week, which should make for some
> > interesting testing ground for non-throttled zmm, if you'd like to
> > play with it.
>
> Maybe in a future iteration, thanks. In fact it would be interesting
> to know if Cannon Lake can handle that throttling better.

Everything I've read on the Internet seems to indicate that's the case, so one of the first things I'll be doing is seeing if that's true. There are also the AVX512 IFMA instructions to play with!

Jason
Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements
Hi Jason,

> [...] I have a massive Xeon Gold 5120 machine that I can give you
> access to if you'd like to do some testing and benching.

Thanks for the offer; no need at this time. But I certainly would welcome it if you could do some (Wireguard) benching with that code to see if it works for you.

> Actually, similarly here, a 10nm Cannon Lake machine should be
> arriving at my house this week, which should make for some
> interesting testing ground for non-throttled zmm, if you'd like to
> play with it.

Maybe in a future iteration, thanks. In fact it would be interesting to know if Cannon Lake can handle that throttling better.

Regards
Martin
Re: [PATCH] crypto: drop mask=CRYPTO_ALG_ASYNC from 'shash' tfm allocations
On Wed, Nov 14, 2018 at 12:21:11PM -0800, Eric Biggers wrote: > From: Eric Biggers > > 'shash' algorithms are always synchronous, so passing CRYPTO_ALG_ASYNC > in the mask to crypto_alloc_shash() has no effect. Many users therefore > already don't pass it, but some still do. This inconsistency can cause > confusion, especially since the way the 'mask' argument works is > somewhat counterintuitive. > > Thus, just remove the unneeded CRYPTO_ALG_ASYNC flags. > > This patch shouldn't change any actual behavior. > > Signed-off-by: Eric Biggers > --- > drivers/block/drbd/drbd_receiver.c | 2 +- > drivers/md/dm-integrity.c | 2 +- > drivers/net/wireless/intersil/orinoco/mic.c | 6 ++ > fs/ubifs/auth.c | 5 ++--- > net/bluetooth/smp.c | 2 +- > security/apparmor/crypto.c | 2 +- > security/integrity/evm/evm_crypto.c | 3 +-- > security/keys/encrypted-keys/encrypted.c| 4 ++-- > security/keys/trusted.c | 4 ++-- > 9 files changed, 13 insertions(+), 17 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
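Concretely, the change at each call site is just dropping the flag from the mask argument, e.g. (algorithm name chosen for illustration):

    struct crypto_shash *tfm;

    /* Before: the ASYNC bit in the mask is a no-op, since every shash
     * algorithm is synchronous anyway. */
    tfm = crypto_alloc_shash("hmac(sha256)", 0, CRYPTO_ALG_ASYNC);

    /* After: identical behavior, less confusing. */
    tfm = crypto_alloc_shash("hmac(sha256)", 0, 0);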
Re: [PATCH] crypto: drop mask=CRYPTO_ALG_ASYNC from 'cipher' tfm allocations
On Wed, Nov 14, 2018 at 12:19:39PM -0800, Eric Biggers wrote: > From: Eric Biggers > > 'cipher' algorithms (single block ciphers) are always synchronous, so > passing CRYPTO_ALG_ASYNC in the mask to crypto_alloc_cipher() has no > effect. Many users therefore already don't pass it, but some still do. > This inconsistency can cause confusion, especially since the way the > 'mask' argument works is somewhat counterintuitive. > > Thus, just remove the unneeded CRYPTO_ALG_ASYNC flags. > > This patch shouldn't change any actual behavior. > > Signed-off-by: Eric Biggers > --- > arch/s390/crypto/aes_s390.c | 2 +- > drivers/crypto/amcc/crypto4xx_alg.c | 3 +-- > drivers/crypto/ccp/ccp-crypto-aes-cmac.c | 4 +--- > drivers/crypto/geode-aes.c| 2 +- > drivers/md/dm-crypt.c | 2 +- > drivers/net/wireless/cisco/airo.c | 2 +- > drivers/staging/rtl8192e/rtllib_crypt_ccmp.c | 2 +- > drivers/staging/rtl8192u/ieee80211/ieee80211_crypt_ccmp.c | 2 +- > drivers/usb/wusbcore/crypto.c | 2 +- > net/bluetooth/smp.c | 6 +++--- > net/mac80211/wep.c| 4 ++-- > net/wireless/lib80211_crypt_ccmp.c| 2 +- > net/wireless/lib80211_crypt_tkip.c| 4 ++-- > net/wireless/lib80211_crypt_wep.c | 4 ++-- > 14 files changed, 19 insertions(+), 22 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
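The counterintuitive part mentioned above is worth spelling out: the lookup keeps an algorithm only if (alg->cra_flags ^ type) & mask == 0, so passing CRYPTO_ALG_ASYNC in the mask with a type of 0 actually requests a *synchronous* algorithm, not an asynchronous one. Since every single-block cipher is synchronous, the two calls below (illustrative) select exactly the same implementation:

    struct crypto_cipher *a = crypto_alloc_cipher("aes", 0, CRYPTO_ALG_ASYNC);
    struct crypto_cipher *b = crypto_alloc_cipher("aes", 0, 0);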
Re: [PATCH] crypto: remove useless initializations of cra_list
On Wed, Nov 14, 2018 at 11:35:48AM -0800, Eric Biggers wrote: > From: Eric Biggers > > Some algorithms initialize their .cra_list prior to registration. > But this is unnecessary since crypto_register_alg() will overwrite > .cra_list when adding the algorithm to the 'crypto_alg_list'. > Apparently the useless assignment has just been copy+pasted around. > > So, remove the useless assignments. > > Exception: paes_s390.c uses cra_list to check whether the algorithm is > registered or not, so I left that as-is for now. > > This patch shouldn't change any actual behavior. > > Signed-off-by: Eric Biggers > --- > arch/sparc/crypto/aes_glue.c | 5 - > arch/sparc/crypto/camellia_glue.c | 5 - > arch/sparc/crypto/des_glue.c | 5 - > crypto/lz4.c | 1 - > crypto/lz4hc.c| 1 - > drivers/crypto/bcm/cipher.c | 2 -- > drivers/crypto/omap-aes.c | 2 -- > drivers/crypto/omap-des.c | 1 - > drivers/crypto/qce/ablkcipher.c | 1 - > drivers/crypto/qce/sha.c | 1 - > drivers/crypto/sahara.c | 1 - > 11 files changed, 25 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
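The pattern being deleted looks like the sketch below (field values representative of the sparc glue code, trimmed for brevity):

    static struct crypto_alg aes_alg = {
            .cra_name       = "aes",
            /* ... */
            /* Useless: per the commit message, crypto_register_alg()
             * overwrites .cra_list when adding the algorithm to
             * crypto_alg_list. */
            .cra_list       = LIST_HEAD_INIT(aes_alg.cra_list),
    };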
Re: [PATCH] crypto: inside-secure - remove useless setting of type flags
On Wed, Nov 14, 2018 at 11:10:53AM -0800, Eric Biggers wrote: > From: Eric Biggers > > Remove the unnecessary setting of CRYPTO_ALG_TYPE_SKCIPHER. > Commit 2c95e6d97892 ("crypto: skcipher - remove useless setting of type > flags") took care of this everywhere else, but a few more instances made > it into the tree at about the same time. Squash them before they get > copy+pasted around again. > > This patch shouldn't change any actual behavior. > > Signed-off-by: Eric Biggers > --- > drivers/crypto/inside-secure/safexcel_cipher.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements
Hi Martin, On Mon, Nov 19, 2018 at 8:52 AM Martin Willi wrote: > > Adding AVX-512VL support is relatively simple. I have a patchset mostly > ready that is more than competitive with the code from Zinc. I'll clean > that up and do more testing before posting it later this week. Terrific. Depending on how it turns out, it'll be nice to try integrating this into Zinc. I have a massive Xeon Gold 5120 machine that I can give you access to if you'd like to do some testing and benching. Poke me on IRC -- I'm zx2c4. > I don't think that having AVX-512F is that important until it is really > usable on CPUs in the market. Actually, similarly here, a 10nm Cannon Lake machine should be arriving at my house this week, which should make for some interesting testing ground for non-throttled zmm, if you'd like to play with it. Jason
Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
On Mon, Nov 19, 2018 at 05:19:10PM +0800, Kenneth Lee wrote:
> On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> > On Thu, Nov 15, 2018 at 04:54:55PM +0200, Leon Romanovsky wrote:
> > > On Thu, Nov 15, 2018 at 04:51:09PM +0800, Kenneth Lee wrote:
> > > > On Wed, Nov 14, 2018 at 06:00:17PM +0200, Leon Romanovsky wrote:
> > > > > On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote:
> > > > > >
> > > > > > On 2018/11/13 8:23 AM, Leon Romanovsky wrote:
> > > > > > > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:
> > > > > > > > From: Kenneth Lee
> > > > > > > >
> > > > > > > > WarpDrive is a general accelerator framework for the user
> > > > > > > > application to access the hardware without going through the ker
Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> On Thu, Nov 15, 2018 at 04:54:55PM +0200, Leon Romanovsky wrote:
> > On Thu, Nov 15, 2018 at 04:51:09PM +0800, Kenneth Lee wrote:
> > > On Wed, Nov 14, 2018 at 06:00:17PM +0200, Leon Romanovsky wrote:
> > > > On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote:
> > > > >
> > > > > On 2018/11/13 8:23 AM, Leon Romanovsky wrote:
> > > > > > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:
> > > > > > > From: Kenneth Lee
> > > > > > >
> > > > > > > WarpDrive is a general accelerator framework for the user
> > > > > > > application to access the hardware without going through the
> > > > > > > kernel in data path.
> > > > > > >
> > > > > > > The kernel component to provide kernel facility to driver for
> > > > > > > expose the user interface is called uacce. It a short name for
> > > > > > > "Unified/User-space-access-intended Accelerator Framework".
> > > > > > >
> > > > > > > This patch add document to explain how it works.
> > > > > > + RDMA and netdev folks
Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
On Thu, Nov 15, 2018 at 04:54:55PM +0200, Leon Romanovsky wrote:
> On Thu, Nov 15, 2018 at 04:51:09PM +0800, Kenneth Lee wrote:
> > On Wed, Nov 14, 2018 at 06:00:17PM +0200, Leon Romanovsky wrote:
> > > On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote:
> > > >
> > > > On 2018/11/13 8:23 AM, Leon Romanovsky wrote:
> > > > > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:
> > > > > > From: Kenneth Lee
> > > > > >
> > > > > > WarpDrive is a general accelerator framework for the user
> > > > > > application to access the hardware without going through the
> > > > > > kernel in data path.
> > > > > >
> > > > > > The kernel component to provide kernel facility to driver for
> > > > > > expose the user interface is called uacce. It a short name for
> > > > > > "Unified/User-space-access-intended Accelerator Framework".
> > > > > >
> > > > > > This patch add document to explain how it works.
> > > > > + RDMA and netdev folks
> > > > >
> > > > > Sorry, to be late in the game, I don't see other patches, but from
> > > > > the description below it seems like you are reinventing RDMA verbs
> > > > > model. I have hard time to see the differences in the proposed
> > > > > framework to already implemented in drivers/infiniband/* for the
> > > > > kernel space and for the https://github.com/linux-rdma/rdma-core/
> > > > > for the user space parts.
> > > >
> > > > Thanks Leon,
> > > >
> > > > Yes, we tried to solve similar problem in RDMA. We also learned a lot
> > > > from the exist code of RDMA. But we have to make a new one because we
> > > > cannot register accelerators such as AI operation, encryption or
> > > > compression to the RDMA framework:)
> > >
> > > Assuming that you did everything right and still failed to use RDMA
> > > framework, you was supposed to fix it and not to reinvent new exactly
> > > same one. It is how we develop kernel, by reusing existing code.
> >
> > Yes, but we don't force other system such as NIC or GPU into RDMA, do we?
>
> You don't introduce new NIC or GPU, but proposing another interfac
Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements
Hi Jason, > I'd be inclined to roll with your implementation if it can eventually > become competitive with Andy Polyakov's, [...] I think for the SSSE3/AVX2 code paths it is competitive; especially for small sizes it is faster, which is not that unimportant when implementing layer 3 VPNs. > there are still no AVX-512 paths, which means it's considerably > slower on all newer generation Intel chips. Andy's has the AVX-512VL > implementation for Skylake (using ymm, so as not to hit throttling) > and AVX-512F for Cannon Lake and beyond (using zmm). I don't think that having AVX-512F is that important until it is really usable on CPUs in the market. Adding AVX-512VL support is relatively simple. I have a patchset mostly ready that is more than competitive with the code from Zinc. I'll clean that up and do more testing before posting it later this week. Best regards Martin
Re: [PATCH 0/5] crypto: caam - add support for Era 10
On Thu, Nov 08, 2018 at 03:36:26PM +0200, Horia Geantă wrote: > This patch set adds support for CAAM Era 10, currently used in LX2160A SoC: > -new register mapping: some registers/fields are deprecated and moved > to different locations, mainly version registers > -algorithms > chacha20 (over DPSECI - Data Path SEC Interface on fsl-mc bus) > rfc7539(chacha20,poly1305) (over both DPSECI and Job Ring Interface) > rfc7539esp(chacha20,poly1305) (over both DPSECI and Job Ring Interface) > > Note: the patch set is generated on top of cryptodev-2.6, however testing > was performed based on linux-next (tag: next-20181108) - which includes > LX2160A platform support + manually updating LX2160A dts with: > -fsl-mc bus DT node > -missing dma-ranges property in soc DT node > > Cristian Stoica (1): > crypto: export CHACHAPOLY_IV_SIZE > > Horia Geantă (4): > crypto: caam - add register map changes cf. Era 10 > crypto: caam/qi2 - add support for ChaCha20 > crypto: caam/jr - add support for Chacha20 + Poly1305 > crypto: caam/qi2 - add support for Chacha20 + Poly1305 > > crypto/chacha20poly1305.c | 2 - > drivers/crypto/caam/caamalg.c | 266 > ++--- > drivers/crypto/caam/caamalg_desc.c | 139 ++- > drivers/crypto/caam/caamalg_desc.h | 5 + > drivers/crypto/caam/caamalg_qi.c | 37 -- > drivers/crypto/caam/caamalg_qi2.c | 156 +- > drivers/crypto/caam/caamhash.c | 20 ++- > drivers/crypto/caam/caampkc.c | 10 +- > drivers/crypto/caam/caamrng.c | 10 +- > drivers/crypto/caam/compat.h | 2 + > drivers/crypto/caam/ctrl.c | 28 +++- > drivers/crypto/caam/desc.h | 28 > drivers/crypto/caam/desc_constr.h | 7 +- > drivers/crypto/caam/regs.h | 74 +-- > include/crypto/chacha20.h | 1 + > 15 files changed, 724 insertions(+), 61 deletions(-) All applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
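Judging by the diffstat, the "crypto: export CHACHAPOLY_IV_SIZE" patch moves the constant out of crypto/chacha20poly1305.c into the shared header so the CAAM descriptors can use it; the value is the 96-bit RFC 7539 nonce size:

    /* include/crypto/chacha20.h -- sketch of the exported constant */
    #define CHACHAPOLY_IV_SIZE      12      /* 96-bit rfc7539 nonce */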
Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements
On Sun, Nov 11, 2018 at 10:36:24AM +0100, Martin Willi wrote:
> This patchset improves performance of the ChaCha20 SIMD implementations
> for x86_64. For some specific encryption lengths, performance is more
> than doubled. Two mechanisms are used to achieve this:
>
> * Instead of calculating the minimal number of required blocks for a
>   given encryption length, functions producing more blocks are used
>   more aggressively. Calculating a 4-block function can be faster than
>   calculating a 2-block and a 1-block function, even if only three
>   blocks are actually required.
>
> * In addition to the 8-block AVX2 function, a 4-block and a 2-block
>   function are introduced.
>
> Patches 1-3 add support for partial lengths to the existing 1-, 4- and
> 8-block functions. Patch 4 makes use of that by engaging the next higher
> level block functions more aggressively. Patch 5 and 6 add the new AVX2
> functions for 2 and 4 blocks. Patches are based on cryptodev and would
> need adjustments to apply on top of the Adiantum patchset.
>
> Note that the more aggressive use of larger block functions calculate
> blocks that may get discarded. This may have a negative impact on energy
> usage or the processors thermal budget. However, with the new block
> functions we can avoid this over-calculation for many lengths, so the
> performance win can be considered more important.
>
> Below are performance numbers measured with tcrypt using additional
> encryption lengths; numbers in kOps/s, on my i7-5557U. old is the
> existing, new the implementation with this patchset. As comparison
> the numbers for zinc in v6:
>
>  len   old   new  zinc
>    8  5908  5818  5818
>   16  5917  5828  5726
>   24  5916  5869  5757
>   32  5920  5789  5813
>   40  5868  5799  5710
>   48  5877  5761  5761
>   56  5869  5797  5742
>   64  5897  5862  5685
>   72  3381  4979  3520
>   80  3364  5541  3475
>   88  3350  4977  3424
>   96  3342  5530  3371
>  104  3328  4923  3313
>  112  3317  5528  3207
>  120  3313  4970  3150
>  128  3492  5535  3568
>  136  2487  4570  3690
>  144  2481  5047  3599
>  152  2473  4565  3566
>  160  2459  5022  3515
>  168  2461  4550  3437
>  176  2454  5020  3325
>  184  2449  4535  3279
>  192  2538  5011  3762
>  200  1962  4537  3702
>  208  1962  4971  3622
>  216  1954  4487  3518
>  224  1949  4936  3445
>  232  1948  4497  3422
>  240  1941  4947  3317
>  248  1940  4481  3279
>  256  3798  4964  3723
>  264  2638  3577  3639
>  272  2637  3567  3597
>  280  2628  3563  3565
>  288  2630  3795  3484
>  296  2621  3580  3422
>  304  2612  3569  3352
>  312  2602  3599  3308
>  320  2694  3821  3694
>  328  2060  3538  3681
>  336  2054  3565  3599
>  344  2054  3553  3523
>  352  2049  3809  3419
>  360  2045  3575  3403
>  368  2035  3560  3334
>  376  2036  3555  3257
>  384  2092  3785  3715
>  392  1691  3505  3612
>  400  1684  3527  3553
>  408  1686  3527  3496
>  416  1684  3804  3430
>  424  1681  3555  3402
>  432  1675  3559  3311
>  440  1672  3558  3275
>  448  1710  3780  3689
>  456  1431  3541  3618
>  464  1428  3538  3576
>  472  1430  3527  3509
>  480  1426  3788  3405
>  488  1423  3502  3397
>  496  1423  3519  3298
>  504  1418  3519  3277
>  512  3694  3736  3735
>  520  2601  2571  2209
>  528  2601  2677  2148
>  536  2587  2534  2164
>  544  2578  2659  2138
>  552  2570  2552  2126
>  560  2566  2661  2035
>  568  2567  2542  2041
>  576  2639  2674  2199
>  584  2031  2531  2183
>  592  2027  2660  2145
>  600  2016  2513  2155
>  608  2009  2638  2133
>  616  2006  2522  2115
>  624  2000  2649  2064
>  632  1996  2518  2045
>  640  2053  2651  2188
>  648  1666  2402  2182
>  656  1663  2517  2158
>  664  1659  2397  2147
>  672  1657  2510  2139
>  680  1656  2394  2114
>  688  1653  2497  2077
>  696  1646  2393  2043
>  704  1678  2510  2208
>  712  1414  2391  2189
>  720  1412  2506  2169
>  728  1411  2384  2145
>  736  1408  2494  2142
>  744  1408  2379  2081
>  752  1405  2485  2064
>  760  1403  2376  2043
>  768  2189  2498  2211
>  776  1756  2137  2192
>  784  1746  2145  2146
>  792  1744  2141  2141
>  800  1743  2094
>  808  1742  2140  2100
>  816  1735  2134  2061
>  824  1731  2135  2045
>  832  1778  2223
>  840  1480  2132  2184
>  848  1480  2134  2173
>  856  1476  2124  2145
>  864  1474  2210  2126
>  872  1472  2127  2105
>  880  1463  2123  2056
>  888  1468  2123  2043
>  896  1494  2208  2219
>  904  1278  2120  2192
>  912  1277  2121  2170
>  920  1273  2118  2149
>  928  1272  2207  2125
>  936  1267  2125  2098
>  944  1265  2127  2060
>  952  1267  2126  2049
>  960  1289  2213  2204
>  968  1125  2123  2187
>  976  1122  2127  2166
>  984  1120  2123  2136
>  992  1118  2207  2119
> 1000  1118  2120  2101
> 1008  1117  2122  2042
> 1016  1115  2121  2048
> 1024  2174  2191  2195
> 1032  1748  1724  1565
> 1040  1745  1782  1544
> 1048  1736  1737  1554
> 1056  1738  1802  1541
> 1064  1735  1728  1523
> 1072  1730  1780  1507
> 1080  1729  1724  1497
> 1088  1757  1783  1592
> 1096  1475  1723  1575
> 1104  1474  1778  1563
> 1112  1472  1708  1544
> 1120  1468  1774  1521
> 1128  1466  1718  1521
> 1136  1462  1780  1501
> 1144  1460  1719  1491
> 1152  1481  1782  1575
> 1160  1271  1647  1558
> 1168  1271  1706  1554
> 1176  1268  1645  1545
> 1184  1265  1711  1538
> 1192  1265  1648  1530
> 1200  1264  1705  1493
> 1208  1262  1647  1498
> 1216  1277  1695  1581
Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements
Hi Martin,

This is nice work, and given that it's quite clean -- and that it's usually hard to screw up chacha in subtle ways when test vectors pass (unlike, say, poly1305 or curve25519) -- I'd be inclined to roll with your implementation if it can eventually become competitive with Andy Polyakov's, which I'm currently working on for Zinc (which no longer has pre-generated code, addressing the biggest hurdle; v9 will be sent shortly).

Specifically, I'm not quite sure the improvements here tip the balance on all avx2 microarchitectures, and most importantly, there are still no AVX-512 paths, which means it's considerably slower on all newer generation Intel chips. Andy's has the AVX-512VL implementation for Skylake (using ymm, so as not to hit throttling) and AVX-512F for Cannon Lake and beyond (using zmm). I've attached some measurements below showing how stark the difference is.

The takeaway is that while Andy's implementation is still ahead in terms of performance today, I'd certainly encourage your efforts to gain parity with that, and I'd be happy to have that when the performance and fuzzing time is right for it. So please do keep chipping away at it; I think it's a potentially useful effort.

Regards,
Jason

size   old  zinc
   0    64    54
  16   386   372
  32   388   396
  48   388   420
  64   366   350
  80   708   666
  96   708   692
 112   706   736
 128   692   648
 144  1036   682
 160  1036   708
 176  1036   730
 192  1016   658
 208  1360   684
 224  1362   708
 240  1360   732
 256   644   500
 272   990   526
 288   988   556
 304   988   576
 320   972   500
 336  1314   532
 352  1316   558
 368  1318   578
 384  1308   506
 400  1644   532
 416  1644   556
 432  1644   594
 448  1624   508
 464  1970   534
 480  1970   556
 496  1968   582
 512   660   624
 528  1016   682
 544  1016   702
 560  1018   728
 576   998   654
 592  1344   680
 608  1344   708
 624  1344   730
 640  1326   654
 656  1670   686
 672  1670   708
 688  1670   732
 704  1652   658
 720  1998   682
 736  1998   710
 752  1996   734
 768  1256   662
 784  1606   688
 800  1606   714
 816  1606   736
 832  1584   660
 848  1948   688
 864  1950   714
 880  1948   736
 896  1912   688
 912  2258   718
 928  2258   744
 944  2256   768
 960  2238   692
 976  2584   718
 992  2584   744
1008  2584   770

On Thu, Nov 15, 2018 at 6:21 PM Herbert Xu wrote:
>
> On Sun, Nov 11, 2018 at 10:36:24AM +0100, Martin Willi wrote:
> > This patchset improves performance of the ChaCha20 SIMD implementations
> > for x86_64. For some specific encryption lengths, performance is more
> > than doubled. Two mechanisms are used to achieve this:
> >
> > * Instead of calculating the minimal number of required blocks for a
> >   given encryption length, functions producing more blocks are used
> >   more aggressively. Calculating a 4-block function can be faster than
> >   calculating a 2-block and a 1-block function, even if only three
> >   blocks are actually required.
> >
> > * In addition to the 8-block AVX2 function, a 4-block and a 2-block
> >   function are introduced.
> >
> > Patches 1-3 add support for partial lengths to the existing 1-, 4- and
> > 8-block functions. Patch 4 makes use of that by engaging the next higher
> > level block functions more aggressively. Patch 5 and 6 add the new AVX2
> > functions for 2 and 4 blocks. Patches are based on cryptodev and would
> > need adjustments to apply on top of the Adiantum patchset.
> >
> > Note that the more aggressive use of larger block functions calculate
> > blocks that may get discarded. This may have a negative impact on energy
> > usage or the processors thermal budget. However, with the new block
> > functions we can avoid this over-calculation for many lengths, so the
> > performance win can be considered more important.
> >
> > Below are performance numbers measured with tcrypt using additional
> > encryption lengths; numbers in kOps/s, on my i7-5557U. old is the
> > existing, new the implementation with this patchset. As comparison
> > the numbers for zinc in v6:
> >
> > [benchmark table snipped; it is quoted in full in the message above]
Re: [PATCH] crypto: inside-secure - remove useless setting of type flags
Hi Eric,

On Wed, Nov 14, 2018 at 11:10:53AM -0800, Eric Biggers wrote:
> From: Eric Biggers
>
> Remove the unnecessary setting of CRYPTO_ALG_TYPE_SKCIPHER.
> Commit 2c95e6d97892 ("crypto: skcipher - remove useless setting of type
> flags") took care of this everywhere else, but a few more instances made
> it into the tree at about the same time. Squash them before they get
> copy+pasted around again.
>
> This patch shouldn't change any actual behavior.
>
> Signed-off-by: Eric Biggers

Acked-by: Antoine Tenart

Thanks!
Antoine

> ---
>  drivers/crypto/inside-secure/safexcel_cipher.c | 8
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/crypto/inside-secure/safexcel_cipher.c b/drivers/crypto/inside-secure/safexcel_cipher.c
> index 3aef1d43e4351..d531c14020dcb 100644
> --- a/drivers/crypto/inside-secure/safexcel_cipher.c
> +++ b/drivers/crypto/inside-secure/safexcel_cipher.c
> @@ -970,7 +970,7 @@ struct safexcel_alg_template safexcel_alg_cbc_des = {
>  		.cra_name = "cbc(des)",
>  		.cra_driver_name = "safexcel-cbc-des",
>  		.cra_priority = 300,
> -		.cra_flags = CRYPTO_ALG_TYPE_SKCIPHER | CRYPTO_ALG_ASYNC |
> +		.cra_flags = CRYPTO_ALG_ASYNC |
>  			     CRYPTO_ALG_KERN_DRIVER_ONLY,
>  		.cra_blocksize = DES_BLOCK_SIZE,
>  		.cra_ctxsize = sizeof(struct safexcel_cipher_ctx),
> @@ -1010,7 +1010,7 @@ struct safexcel_alg_template safexcel_alg_ecb_des = {
>  		.cra_name = "ecb(des)",
>  		.cra_driver_name = "safexcel-ecb-des",
>  		.cra_priority = 300,
> -		.cra_flags = CRYPTO_ALG_TYPE_SKCIPHER | CRYPTO_ALG_ASYNC |
> +		.cra_flags = CRYPTO_ALG_ASYNC |
>  			     CRYPTO_ALG_KERN_DRIVER_ONLY,
>  		.cra_blocksize = DES_BLOCK_SIZE,
>  		.cra_ctxsize = sizeof(struct safexcel_cipher_ctx),
> @@ -1074,7 +1074,7 @@ struct safexcel_alg_template safexcel_alg_cbc_des3_ede = {
>  		.cra_name = "cbc(des3_ede)",
>  		.cra_driver_name = "safexcel-cbc-des3_ede",
>  		.cra_priority = 300,
> -		.cra_flags = CRYPTO_ALG_TYPE_SKCIPHER | CRYPTO_ALG_ASYNC |
> +		.cra_flags = CRYPTO_ALG_ASYNC |
>  			     CRYPTO_ALG_KERN_DRIVER_ONLY,
>  		.cra_blocksize = DES3_EDE_BLOCK_SIZE,
>  		.cra_ctxsize = sizeof(struct safexcel_cipher_ctx),
> @@ -1114,7 +1114,7 @@ struct safexcel_alg_template safexcel_alg_ecb_des3_ede = {
>  		.cra_name = "ecb(des3_ede)",
>  		.cra_driver_name = "safexcel-ecb-des3_ede",
>  		.cra_priority = 300,
> -		.cra_flags = CRYPTO_ALG_TYPE_SKCIPHER | CRYPTO_ALG_ASYNC |
> +		.cra_flags = CRYPTO_ALG_ASYNC |
>  			     CRYPTO_ALG_KERN_DRIVER_ONLY,
>  		.cra_blocksize = DES3_EDE_BLOCK_SIZE,
>  		.cra_ctxsize = sizeof(struct safexcel_cipher_ctx),
> --
> 2.19.1.930.g4563a0d9d0-goog

--
Antoine Ténart, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Re: Something wrong with cryptodev-2.6 tree?
On Mon, Nov 12, 2018 at 09:44:41AM +0200, Gilad Ben-Yossef wrote: > Hi, > > It seems that the cryptodev-2.6 tree at > https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git > has somehow rolled back 3 months ago. > > Not sure if it's a git.kernel.org issue or something else but probably > worth taking a look? Thanks Gilad. It should be fixed now. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 03/17] hw_random: bcm2835-rng: Switch to SPDX identifier
On Sat, 2018-11-10 at 15:51 +0100, Stefan Wahren wrote:
> Adopt the SPDX license identifier headers to ease license compliance
> management. While we are at this, fix the comment style, too.
>
> Cc: Lubomir Rintel
> Signed-off-by: Stefan Wahren
> ---
>  drivers/char/hw_random/bcm2835-rng.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/char/hw_random/bcm2835-rng.c b/drivers/char/hw_random/bcm2835-rng.c
> index 6767d96..256b0b1 100644
> --- a/drivers/char/hw_random/bcm2835-rng.c
> +++ b/drivers/char/hw_random/bcm2835-rng.c
> @@ -1,10 +1,7 @@
> -/**
> +// SPDX-License-Identifier: GPL-2.0
> +/*
>   * Copyright (c) 2010-2012 Broadcom. All rights reserved.
>   * Copyright (c) 2013 Lubomir Rintel
> - *
> - * This program is free software; you can redistribute it and/or
> - * modify it under the terms of the GNU General Public License ("GPL")
> - * version 2, as published by the Free Software Foundation.
>   */
>
>  #include

Acked-by: Lubomir Rintel
Re: [PATCH 03/17] hw_random: bcm2835-rng: Switch to SPDX identifier
Stefan Wahren writes:

> Adopt the SPDX license identifier headers to ease license compliance
> management. While we are at this, fix the comment style, too.

Reviewed-by: Eric Anholt
Re: [PATCH 03/17] hw_random: bcm2835-rng: Switch to SPDX identifier
On Sat, Nov 10, 2018 at 03:51:16PM +0100, Stefan Wahren wrote:
> Adopt the SPDX license identifier headers to ease license compliance
> management. While we are at this, fix the comment style, too.
>
> Cc: Lubomir Rintel
> Signed-off-by: Stefan Wahren
> ---
>  drivers/char/hw_random/bcm2835-rng.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)

Acked-by: Greg Kroah-Hartman
Re: [PATCH 1/2] crypto: fix cfb mode decryption
On Sat, Oct 20, 2018 at 02:01:52AM +0300, Dmitry Eremin-Solenikov wrote: > crypto_cfb_decrypt_segment() incorrectly XOR'ed generated keystream with > IV, rather than with data stream, resulting in incorrect decryption. > Test vectors will be added in the next patch. > > Signed-off-by: Dmitry Eremin-Solenikov > Cc: sta...@vger.kernel.org > --- > crypto/cfb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) All applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
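In CFB, decryption produces the keystream by block-encrypting the previous ciphertext block (the IV for the first block) and XORs that keystream with the current ciphertext block; the bug XORed it with the IV buffer instead. A sketch of the corrected inner loop, modeled on crypto/cfb.c (variable names assumed from context):

    do {
            crypto_cfb_encrypt_one(tfm, iv, dst);  /* dst = E_k(iv): keystream */
            crypto_xor(dst, src, bsize);    /* plaintext = keystream ^ ciphertext;
                                             * the bug used iv as the second operand */
            iv = src;       /* this ciphertext block feeds the next keystream */

            src += bsize;
            dst += bsize;
    } while ((nbytes -= bsize) >= bsize);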
Re: [PATCH v3 0/2] crypto: some hardening against AES cache-timing attacks
On Wed, Oct 17, 2018 at 09:37:57PM -0700, Eric Biggers wrote: > This series makes the "aes-fixed-time" and "aes-arm" implementations of > AES more resistant to cache-timing attacks. > > Note that even after these changes, the implementations still aren't > necessarily guaranteed to be constant-time; see > https://cr.yp.to/antiforgery/cachetiming-20050414.pdf for a discussion > of the many difficulties involved in writing truly constant-time AES > software. But it's valuable to make such attacks more difficult. > > Changed since v2: > - In aes-arm, move the IRQ disable/enable into the assembly file. > - Other aes-arm tweaks. > - Add Kconfig help text. > > Thanks to Ard Biesheuvel for the suggestions. > > Eric Biggers (2): > crypto: aes_ti - disable interrupts while accessing S-box > crypto: arm/aes - add some hardening against cache-timing attacks > > arch/arm/crypto/Kconfig | 9 + > arch/arm/crypto/aes-cipher-core.S | 62 ++- > crypto/Kconfig| 3 +- > crypto/aes_generic.c | 9 +++-- > crypto/aes_ti.c | 18 + > 5 files changed, 86 insertions(+), 15 deletions(-) All applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
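Both patches share one idea: keep the S-box resident in L1 for the whole block operation, so table lookups cannot leak key-dependent cache state to an attacker on the same core. A hedged sketch of that pattern (not the literal aes_ti.c or aes-cipher-core.S code):

    static void aes_encrypt_hardened(const u8 sbox[256], u8 *out, const u8 *in)
    {
            volatile const u8 *tbl = sbox;  /* volatile so the reads survive */
            unsigned long flags;
            int i;

            /* An interrupt could evict S-box cache lines mid-block and
             * reintroduce a timing signal, so keep IRQs off throughout. */
            local_irq_save(flags);

            /* Touch every cache line of the table up front, so later
             * lookups hit the cache no matter what the secret index is. */
            for (i = 0; i < 256; i += L1_CACHE_BYTES)
                    (void)tbl[i];

            /* ... AES rounds using sbox[] go here ... */

            local_irq_restore(flags);
    }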
Re: [PATCH] crypto/simd: correctly take reqsize of wrapped skcipher into account
On 9 November 2018 at 10:45, Herbert Xu wrote:
> On Fri, Nov 09, 2018 at 05:44:47PM +0800, Herbert Xu wrote:
>> On Fri, Nov 09, 2018 at 12:33:23AM +0100, Ard Biesheuvel wrote:
>> >
>> > This should be
>> >
>> > reqsize += max(crypto_skcipher_reqsize(&cryptd_tfm->base),
>> >                crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm)));
>> >
>> > since the cryptd path in simd still needs some space in the subreq for
>> > the completion.
>>
>> OK this is what I applied to the cryptodev tree, please double-check
>> to see if I did anything silly:
>
> I meant the crypto tree rather than cryptodev.

That looks fine. Thanks Herbert.
Re: [PATCH] crypto/simd: correctly take reqsize of wrapped skcipher into account
On Fri, Nov 09, 2018 at 05:44:47PM +0800, Herbert Xu wrote:
> On Fri, Nov 09, 2018 at 12:33:23AM +0100, Ard Biesheuvel wrote:
> >
> > This should be
> >
> > reqsize += max(crypto_skcipher_reqsize(&cryptd_tfm->base),
> >                crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm)));
> >
> > since the cryptd path in simd still needs some space in the subreq for
> > the completion.
>
> OK this is what I applied to the cryptodev tree, please double-check
> to see if I did anything silly:

I meant the crypto tree rather than cryptodev.

Cheers,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto/simd: correctly take reqsize of wrapped skcipher into account
On Fri, Nov 09, 2018 at 12:33:23AM +0100, Ard Biesheuvel wrote:
>
> This should be
>
> reqsize += max(crypto_skcipher_reqsize(&cryptd_tfm->base),
>                crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm)));
>
> since the cryptd path in simd still needs some space in the subreq for
> the completion.

OK this is what I applied to the cryptodev tree, please double-check to see if I did anything silly:

diff --git a/crypto/simd.c b/crypto/simd.c
index ea7240be3001..78e8d037ae2b 100644
--- a/crypto/simd.c
+++ b/crypto/simd.c
@@ -124,8 +124,9 @@ static int simd_skcipher_init(struct crypto_skcipher *tfm)
 
 	ctx->cryptd_tfm = cryptd_tfm;
 
-	reqsize = sizeof(struct skcipher_request);
-	reqsize += crypto_skcipher_reqsize(&cryptd_tfm->base);
+	reqsize = crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm));
+	reqsize = max(reqsize, crypto_skcipher_reqsize(&cryptd_tfm->base));
+	reqsize += sizeof(struct skcipher_request);
 
 	crypto_skcipher_set_reqsize(tfm, reqsize);

Thanks,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: .S_shipped unnecessary?
On Fri, Nov 9, 2018 at 8:42 AM Ard Biesheuvel wrote: > > (+ Masahiro, kbuild ml) > > On 8 November 2018 at 21:37, Jason A. Donenfeld wrote: > > Hi Ard, Eric, and others, > > > > As promised, the next Zinc patchset will have less generated code! After a > > bit of work with Andy and Samuel, I'll be bundling the perlasm. > > > > Wonderful! Any problems doing that for x86_64 ? > > > One thing I'm wondering about, though, is the wisdom behind the current > > .S_shipped pattern. Usually the _shipped is for big firmware blobs that are > > hard (or impossible) to build independently. But in this case, the .S is > > generated from the .pl significantly faster than gcc even compiles a basic > > C file. And, since perl is needed to build the kernel anyway, it's not like > > it will be impossible to find the right tools. Rather than clutter up > > commits > > with the .pl _and_ the .S_shipped, what would you think if I just generated > > the .S each time as an ordinary build artifact. AFAICT, this is fairly > > usual, > > and it's hard to see downsides. Hence, why I'm writing this email: are there > > any downsides to that? > > > > I agree 100%. When I added this the first time, it was at the request > of the ARM maintainer, who was reluctant to rely on Perl for some > reason. > > Recently, we have had to add a kludge to prevent spurious rebuilds of > the .S_shipped files as well. > > I'd be perfectly happy to get rid of this entirely, and always > generate the .S from the .pl, which to me is kind of the point of > carrying these files in the first place. > > Masahiro: do you see any problems with this? No problem. Documentation/process/changes.rst says the following: You will need perl 5 and the following modules: ``Getopt::Long``, ``Getopt::Std``, ``File::Basename``, and ``File::Find`` to build the kernel. We can assume perl is installed on the user's build machine. -- Best Regards Masahiro Yamada
Re: [PATCH] crypto/simd: correctly take reqsize of wrapped skcipher into account
> On Nov 8, 2018, at 6:33 PM, Ard Biesheuvel wrote:
>
> On 8 November 2018 at 23:55, Ard Biesheuvel wrote:
>> The simd wrapper's skcipher request context structure consists
>> of a single subrequest whose size is taken from the subordinate
>> skcipher. However, in simd_skcipher_init(), the reqsize that is
>> retrieved is not from the subordinate skcipher but from the
>> cryptd request structure, whose size is completely unrelated to
>> the actual wrapped skcipher.
>>
>> Reported-by: Qian Cai
>> Signed-off-by: Ard Biesheuvel
>> ---
>> crypto/simd.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/crypto/simd.c b/crypto/simd.c
>> index ea7240be3001..2f3d6e897afc 100644
>> --- a/crypto/simd.c
>> +++ b/crypto/simd.c
>> @@ -125,7 +125,7 @@ static int simd_skcipher_init(struct crypto_skcipher *tfm)
>>        ctx->cryptd_tfm = cryptd_tfm;
>>
>>        reqsize = sizeof(struct skcipher_request);
>> -      reqsize += crypto_skcipher_reqsize(&cryptd_tfm->base);
>> +      reqsize += crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm));
>>
>
> This should be
>
>   reqsize += max(crypto_skcipher_reqsize(&cryptd_tfm->base),
>                  crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm)));
>
> since the cryptd path in simd still needs some space in the subreq for
> the completion.

Tested-by: Qian Cai
Re: .S_shipped unnecessary?
Hey Ard,

On Fri, Nov 9, 2018 at 12:42 AM Ard Biesheuvel wrote:
> Wonderful! Any problems doing that for x86_64 ?

The x86_64 is still a WIP, but hopefully we'll succeed.

> I agree 100%. When I added this the first time, it was at the request
> of the ARM maintainer, who was reluctant to rely on Perl for some
> reason.
>
> Recently, we have had to add a kludge to prevent spurious rebuilds of
> the .S_shipped files as well.
>
> I'd be perfectly happy to get rid of this entirely, and always
> generate the .S from the .pl, which to me is kind of the point of
> carrying these files in the first place.

Terrific. I'll move ahead in that direction then. It makes things _so_
much cleaner, and doesn't introduce new build modes ("should the
generated .S_shipped go into the build directory or the source
directory? what kind of artifact is it? how to address $(srcdir) vs
$(src) in that context? bla bla") that really overcomplicate things.

Jason
Re: .S_shipped unnecessary?
(+ Masahiro, kbuild ml) On 8 November 2018 at 21:37, Jason A. Donenfeld wrote: > Hi Ard, Eric, and others, > > As promised, the next Zinc patchset will have less generated code! After a > bit of work with Andy and Samuel, I'll be bundling the perlasm. > Wonderful! Any problems doing that for x86_64 ? > One thing I'm wondering about, though, is the wisdom behind the current > .S_shipped pattern. Usually the _shipped is for big firmware blobs that are > hard (or impossible) to build independently. But in this case, the .S is > generated from the .pl significantly faster than gcc even compiles a basic > C file. And, since perl is needed to build the kernel anyway, it's not like > it will be impossible to find the right tools. Rather than clutter up commits > with the .pl _and_ the .S_shipped, what would you think if I just generated > the .S each time as an ordinary build artifact. AFAICT, this is fairly usual, > and it's hard to see downsides. Hence, why I'm writing this email: are there > any downsides to that? > I agree 100%. When I added this the first time, it was at the request of the ARM maintainer, who was reluctant to rely on Perl for some reason. Recently, we have had to add a kludge to prevent spurious rebuilds of the .S_shipped files as well. I'd be perfectly happy to get rid of this entirely, and always generate the .S from the .pl, which to me is kind of the point of carrying these files in the first place. Masahiro: do you see any problems with this?
Re: [PATCH] crypto/simd: correctly take reqsize of wrapped skcipher into account
On 8 November 2018 at 23:55, Ard Biesheuvel wrote:
> The simd wrapper's skcipher request context structure consists
> of a single subrequest whose size is taken from the subordinate
> skcipher. However, in simd_skcipher_init(), the reqsize that is
> retrieved is not from the subordinate skcipher but from the
> cryptd request structure, whose size is completely unrelated to
> the actual wrapped skcipher.
>
> Reported-by: Qian Cai
> Signed-off-by: Ard Biesheuvel
> ---
> crypto/simd.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/crypto/simd.c b/crypto/simd.c
> index ea7240be3001..2f3d6e897afc 100644
> --- a/crypto/simd.c
> +++ b/crypto/simd.c
> @@ -125,7 +125,7 @@ static int simd_skcipher_init(struct crypto_skcipher *tfm)
>         ctx->cryptd_tfm = cryptd_tfm;
>
>         reqsize = sizeof(struct skcipher_request);
> -       reqsize += crypto_skcipher_reqsize(&cryptd_tfm->base);
> +       reqsize += crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm));
>

This should be

  reqsize += max(crypto_skcipher_reqsize(&cryptd_tfm->base),
                 crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm)));

since the cryptd path in simd still needs some space in the subreq for
the completion.
Re: [RFC PATCH 1/4] kconfig: add as-instr macro to scripts/Kconfig.include
On 07/11/18 14:55, Will Deacon wrote:
> On Wed, Nov 07, 2018 at 09:40:05AM +0000, Vladimir Murzin wrote:
>> There are cases where a whole feature, for instance arm64/lse or
>> arm/crypto, can depend on the assembler. Current practice is to report
>> at build time that the selected feature is not supported, which can be
>> quite annoying...
>
> Why is it annoying? You still end up with a working kernel.

.config doesn't really tell you whether an option was actually built or
not; the annoying part is digging through build logs (if anyone saved
them at all!) or the relevant parts of dmesg (if the option logs
anything there at all, which is not always included in reports).

>> It'd be nicer if we could check the assembler first and opt in the
>> feature's visibility in Kconfig.
>>
>> Cc: Masahiro Yamada
>> Cc: Will Deacon
>> Cc: Marc Zyngier
>> Cc: Ard Biesheuvel
>> Signed-off-by: Vladimir Murzin
>> ---
>> scripts/Kconfig.include | 4
>> 1 file changed, 4 insertions(+)
>
> One issue I have with doing the check like this is that if somebody sends
> you a .config with e.g. ARM64_LSE_ATOMICS=y and you try to build a kernel
> using that .config and an old toolchain, the option is silently dropped.

I see... at least we have some tools like ./scripts/diffconfig

> I think the diagnostic is actually useful in this case.

I fully agree on the diagnostic side; any suggestions for how it can be
improved?

Cheers
Vladimir

> Will
Re: [RFC PATCH 1/4] kconfig: add as-instr macro to scripts/Kconfig.include
On Wed, Nov 07, 2018 at 09:40:05AM +0000, Vladimir Murzin wrote:
> There are cases where a whole feature, for instance arm64/lse or
> arm/crypto, can depend on the assembler. Current practice is to report
> at build time that the selected feature is not supported, which can be
> quite annoying...

Why is it annoying? You still end up with a working kernel.

> It'd be nicer if we could check the assembler first and opt in the
> feature's visibility in Kconfig.
>
> Cc: Masahiro Yamada
> Cc: Will Deacon
> Cc: Marc Zyngier
> Cc: Ard Biesheuvel
> Signed-off-by: Vladimir Murzin
> ---
> scripts/Kconfig.include | 4
> 1 file changed, 4 insertions(+)

One issue I have with doing the check like this is that if somebody sends
you a .config with e.g. ARM64_LSE_ATOMICS=y and you try to build a kernel
using that .config and an old toolchain, the option is silently dropped.

I think the diagnostic is actually useful in this case.

Will
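For comparison, the cc-option style macros already in scripts/Kconfig.include suggest what this could look like; a sketch of a possible as-instr helper (illustrative, not necessarily what the posted patch contains):

	# $(as-instr,<instr>): y if the assembler accepts <instr>, n otherwise
	as-instr = $(success,printf "%b\n" "$(1)" | $(CC) -c -x assembler -o /dev/null -)

A symbol can then gate its visibility with e.g. "depends on $(as-instr,.arch armv8-a+lse)", which is exactly the silent-drop behaviour Will is worried about when an old toolchain meets a pregenerated .config.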
Re: [PATCH 1/2] crypto: fix cfb mode decryption
On Thu, 1 Nov 2018 at 11:41, Herbert Xu wrote:
>
> On Thu, Nov 01, 2018 at 11:32:37AM +0300, Dmitry Eremin-Solenikov wrote:
> >
> > Since the 4.20 pull went into Linus's tree, any chance of getting
> > these two patches into the crypto tree?
>
> These aren't critical enough for the current mainline so they will
> go in at the next merge window.

Thank you.

-- With best wishes Dmitry
Re: [PATCH 1/2] crypto: fix cfb mode decryption
On Thu, Nov 01, 2018 at 11:32:37AM +0300, Dmitry Eremin-Solenikov wrote:
>
> Since the 4.20 pull went into Linus's tree, any chance of getting these
> two patches into the crypto tree?

These aren't critical enough for the current mainline so they will
go in at the next merge window.

Cheers,
-- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 1/2] crypto: fix cfb mode decryption
Hello,

On Sun, 21 Oct 2018 at 11:07, James Bottomley wrote:
>
> On Sun, 2018-10-21 at 09:05 +0200, Ard Biesheuvel wrote:
> > (+ James)
>
> Thanks!
>
> > On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov wrote:
> > > crypto_cfb_decrypt_segment() incorrectly XOR'ed the generated
> > > keystream with the IV, rather than with the data stream, resulting
> > > in incorrect decryption. Test vectors will be added in the next
> > > patch.
> > >
> > > Signed-off-by: Dmitry Eremin-Solenikov
> > > Cc: sta...@vger.kernel.org
> > > ---
> > > crypto/cfb.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/crypto/cfb.c b/crypto/cfb.c
> > > index a0d68c09e1b9..fd4e8500e121 100644
> > > --- a/crypto/cfb.c
> > > +++ b/crypto/cfb.c
> > > @@ -144,7 +144,7 @@ static int crypto_cfb_decrypt_segment(struct skcipher_walk *walk,
> > >
> > >         do {
> > >                 crypto_cfb_encrypt_one(tfm, iv, dst);
> > > -               crypto_xor(dst, iv, bsize);
> > > +               crypto_xor(dst, src, bsize);
>
> This does look right. I think the reason the TPM code works is that it
> always does encrypt/decrypt in-place, which is a separate piece of the
> code which appears to be correct.

Since the 4.20 pull went into Linus's tree, any chance of getting these
two patches into the crypto tree?

-- With best wishes Dmitry
Re: [PATCH v4 2/7] tpm2-sessions: Add full HMAC and encrypt/decrypt session handling
On Wed, 24 Oct 2018, James Bottomley wrote:

> > > +static void KDFa(u8 *key, int keylen, const char *label, u8 *u,
> > > +		 u8 *v, int bytes, u8 *out)
> >
> > Should this be in lower case? I would rename it as tpm_kdfa().
>
> This one is defined as KDFa() in the standards and it's not TPM
> specific (although some standards refer to it as KDFA). I'm not averse
> to making them tpm_kdfe() and tpm_kdfa() but I was hoping that one day
> the crypto subsystem would need them and we could move them in there
> because KDFs are the new shiny in crypto primitives (TLS 1.2 started
> using them, for instance).

I care more about tracing and debugging than naming, and having 'tpm_'
in front of every TPM function makes tracing a lean process. AFAIK,
using upper case letters is against kernel coding conventions; I'm not
sure why this would be an exception to that.

> > Why doesn't it matter here?
>
> Because, as the comment says, it eventually gets overwritten by
> running ecdh to derive the two co-ordinates. (pointers to these two
> uninitialized areas are passed into the ecdh destination sg list).

Oh, I just misunderstood the comment. Wouldn't it be easier to say that
the data is initialized later?

> > > +	buf_len = crypto_ecdh_key_len(&p);
> > > +	if (sizeof(encoded_key) < buf_len) {
> > > +		dev_err(&chip->dev, "salt buffer too small needs %d\n",
> > > +			buf_len);
> > > +		goto out;
> > > +	}
> >
> > In what situation can this happen? Can sizeof(encoded_key) >=
> > buf_len?
>
> Yes, but only if someone is trying to crack your ecdh. One of the
> security issues in ecdh is if someone makes a very specific point
> choice (usually in the cofactor space) that has a very short period,
> the attacker can guess the input to KDFe. In this case if TPM genie
> provided a specially crafted attack EC point, we'd detect it here
> because the resulting buffer would be too short.

Right. Thank you for the explanation. Here some kind of comment might
not be a bad idea...

> > In general this function should have a clear explanation of what it
> > does and maybe fewer of these one-character variables, instead using
> > variables with more self-documenting names. Explain at a high level
> > what algorithms are used and how the salt is calculated.
>
> I'll try, but this is a rather complex function.

Understood. I do not expect perfection here and we can improve the
documentation later on.

For anyone wanting to review James' patches without much experience on
EC, I recommend reading this article:

https://arstechnica.com/information-technology/2013/10/a-relatively-easy-to-understand-primer-on-elliptic-curve-cryptography/

I read it a few years ago and refreshed my memory a few days ago by
re-reading it.

> > > +
> > > +/**
> > > + * tpm_buf_append_hmac_session() - append a TPM session element
> > > + * @buf: The buffer to be appended
> > > + * @auth: the auth structure allocated by tpm2_start_auth_session()
> > > + * @attributes: The session attributes
> > > + * @passphrase: The session authority (NULL if none)
> > > + * @passphraselen: The length of the session authority (0 if none)
> >
> > The alignment.
>
> the alignment of what?

We generally have parameter descriptions tab-aligned.

> > Why would there be trailing zeros?
>
> Because TPM 1.2 mandated zero padded fixed size passphrases so the
> TPM 2.0 standard specifies a way of converting these to variable size
> strings by eliminating the zero padding.

Ok.

James, I'm also looking forward to the CONTEXT_GAP patch based on
yesterday's discussion. We do want it and I was stupid not to take it a
couple of years ago :-)

Thanks. /Jarkko
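For readers without the TPM specification at hand, KDFa is the SP 800-108 counter-mode KDF the thread keeps referring to: with HMAC-SHA256 as the PRF it emits one 256-bit block per counter value until L bits have been produced. In LaTeX notation:

	\mathrm{KDFa}(K,\text{label},U,V,L) \;=\;
	\big\Vert_{i=1}^{\lceil L/256\rceil}
	\mathrm{HMAC}_{K}\big([i]_{32} \Vert \text{label} \Vert \texttt{0x00}
	\Vert U \Vert V \Vert [L]_{32}\big)

where [x]_{32} is a 32-bit big-endian encoding, U and V are the two nonces, and the 0x00 byte is the label's terminating NUL, which is why the quoted code hashes strlen(label)+1 bytes.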
Re: [PATCH v4 0/7] add integrity and security to TPM2 transactions
On Wed, 24 Oct 2018, James Bottomley wrote:
> On Wed, 2018-10-24 at 02:51 +0300, Jarkko Sakkinen wrote:
> > I would consider sending first a patch set that would iterate the
> > existing session stuff to be ready for this i.e. merge in two
> > iterations (emphasis on the word "consider"). We can probably merge
> > the groundwork quite fast.
>
> I realise we're going to have merge conflicts on the later ones, so
> why don't we do this: I'll still send as one series, but you apply the
> ones you think are precursors and I'll rebase and resend the rest?
>
> James

Works for me, and after yesterday's discussions etc. I now think that
this should be merged as one series.

/Jarkko
Re: [PATCH v4 2/7] tpm2-sessions: Add full HMAC and encrypt/decrypt session handling
On Wed, 2018-10-24 at 02:48 +0300, Jarkko Sakkinen wrote:
> On Mon, 22 Oct 2018, James Bottomley wrote:
> [...]

I'll tidy up the descriptions.

> These all should be combined with the existing session stuff inside
> tpm2-cmd.c and not have duplicate infrastructures. The file name
> should be tpm2-session.c (we neither have tpm2-cmds.c).

You mean move tpm2_buf_append_auth() into the new sessions file as well
... sure, I can do that.

[...]

> > +
> > +/*
> > + * assume hash sha256 and nonces u, v of size SHA256_DIGEST_SIZE but
> > + * otherwise standard KDFa. Note output is in bytes not bits.
> > + */
> > +static void KDFa(u8 *key, int keylen, const char *label, u8 *u,
> > +		 u8 *v, int bytes, u8 *out)
>
> Should this be in lower case? I would rename it as tpm_kdfa().

This one is defined as KDFa() in the standards and it's not TPM
specific (although some standards refer to it as KDFA). I'm not averse
to making them tpm_kdfe() and tpm_kdfa() but I was hoping that one day
the crypto subsystem would need them and we could move them in there
because KDFs are the new shiny in crypto primitives (TLS 1.2 started
using them, for instance).

> > +{
> > +	u32 counter;
> > +	const __be32 bits = cpu_to_be32(bytes * 8);
> > +
> > +	for (counter = 1; bytes > 0; bytes -= SHA256_DIGEST_SIZE, counter++,
> > +	     out += SHA256_DIGEST_SIZE) {
>
> Only one counter is actually used for anything so this is overly
> complicated and IMHO it is ok to call the counter just 'i'. Maybe just:
>
> for (i = 1; (bytes - (i - 1) * SHA256_DIGEST_SIZE) > 0; i++) {
>
> > +		SHASH_DESC_ON_STACK(desc, sha256_hash);
> > +		__be32 c = cpu_to_be32(counter);
> > +
> > +		hmac_init(desc, key, keylen);
> > +		crypto_shash_update(desc, (u8 *)&c, sizeof(c));
> > +		crypto_shash_update(desc, label, strlen(label)+1);
> > +		crypto_shash_update(desc, u, SHA256_DIGEST_SIZE);
> > +		crypto_shash_update(desc, v, SHA256_DIGEST_SIZE);
> > +		crypto_shash_update(desc, (u8 *)&bits, sizeof(bits));
> > +		hmac_final(desc, key, keylen, out);
> > +	}
> > +}
> > +
> > +/*
> > + * Somewhat of a bastardization of the real KDFe. We're assuming
> > + * we're working with known point sizes for the input parameters and
> > + * the hash algorithm is fixed at sha256. Because we know that the
> > + * point size is 32 bytes like the hash size, there's no need to loop
> > + * in this KDF.
> > + */
> > +static void KDFe(u8 z[EC_PT_SZ], const char *str, u8 *pt_u, u8 *pt_v,
> > +		 u8 *keyout)
> > +{
> > +	SHASH_DESC_ON_STACK(desc, sha256_hash);
> > +	/*
> > +	 * this should be an iterative counter, but because we know
> > +	 * we're only taking 32 bytes for the point using a sha256
> > +	 * hash which is also 32 bytes, there's only one loop
> > +	 */
> > +	__be32 c = cpu_to_be32(1);
> > +
> > +	desc->tfm = sha256_hash;
> > +	desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP;
> > +
> > +	crypto_shash_init(desc);
> > +	/* counter (BE) */
> > +	crypto_shash_update(desc, (u8 *)&c, sizeof(c));
> > +	/* secret value */
> > +	crypto_shash_update(desc, z, EC_PT_SZ);
> > +	/* string including trailing zero */
> > +	crypto_shash_update(desc, str, strlen(str)+1);
> > +	crypto_shash_update(desc, pt_u, EC_PT_SZ);
> > +	crypto_shash_update(desc, pt_v, EC_PT_SZ);
> > +	crypto_shash_final(desc, keyout);
> > +}
> > +
> > +static void tpm_buf_append_salt(struct tpm_buf *buf, struct tpm_chip *chip,
> > +				struct tpm2_auth *auth)
>
> Given the complexity of this function and some not that obvious
> choices in the implementation (coordinates), it would make sense to
> document this function.

I'll try to beef up the salting description.

> > +{
> > +	struct crypto_kpp *kpp;
> > +	struct kpp_request *req;
> > +	struct scatterlist s[2], d[1];
> > +	struct ecdh p = {0};
> > +	u8 encoded_key[EC_PT_SZ], *x, *y;
>
> Why do you use a one-character variable name 'p' but the longer name
> 'encoded_key'?
>
> > +	unsigned int buf_len;
> > +	u8 *secret;
> > +
> > +	secret = kmalloc(EC_PT_SZ, GFP_KERNEL);
> > +	if (!secret)
> > +		return;
> > +
> > +	p.curve_id = ECC_CURVE_NIST_P256;
>
> Could this be set already in the initialization?

I'm never sure about designated initializers, but I think, after
looking them up again, it will zero fill unmentioned elements.

> > +
> > +	/* secret is two sized points */
> > +	tpm_buf_append_u16(buf, (EC_PT_SZ + 2)*2);
>
> White space missing: should be "(EC_PT_SZ + 2) * 2". The comment is
> also a bit obscure (maybe; I do not have any specific suggestion for
> how to make it less obscure).
>
> > +	/*
> > +	 * we cheat here and append uninitialized data to form
> > +	 * the points. All we care about is getting the two
> > +	 * co-ordinate pointers, which will be used to overwrite
> > +	 * the uninitialized data
> > +	 */
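On the designated-initializer question raised above: James's recollection is correct; C guarantees that members not named in a designated initializer are zero-initialized, so the curve id can be set at declaration with no behavioural change. A one-line sketch:

	/* the key and key_size members are implicitly zero-filled */
	struct ecdh p = {
		.curve_id = ECC_CURVE_NIST_P256,
	};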
Re: [PATCH v4 2/7] tpm2-sessions: Add full HMAC and encrypt/decrypt session handling
On Tue, 23 Oct 2018, Ard Biesheuvel wrote: On 23 October 2018 at 04:01, James Bottomley wrote: On Mon, 2018-10-22 at 19:19 -0300, Ard Biesheuvel wrote: [...] +static void hmac_init(struct shash_desc *desc, u8 *key, int keylen) +{ + u8 pad[SHA256_BLOCK_SIZE]; + int i; + + desc->tfm = sha256_hash; + desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP; I don't think this actually does anything in the shash API implementation, so you can drop this. OK, I find crypto somewhat hard to follow. There were bits I had to understand, like when I wrote the CFB implementation or when I fixed the ECDH scatterlist handling, but I've got to confess, in time honoured tradition I simply copied this from EVM crypto without actually digging into the code to understand why. Yeah, it is notoriously hard to use, and we should try to improve that. James, I would hope (already said in my review) to use longer than one character variable names for most of the stuff. I did not quite understand why you decided to use 'counter' for obvious counter variable and one character names for non-obvious stuff :-) I'm not sure where the 'encoded' exactly comes in the variable name 'encoded_key' especially in the context of these cryptic names. /Jarkko
Re: [PATCH v4 0/7] add integrity and security to TPM2 transactions
On Wed, 2018-10-24 at 02:51 +0300, Jarkko Sakkinen wrote: > I would consider sending first a patch set that would iterate the > existing session stuff to be ready for this i.e. merge in two > iterations (emphasis on the word "consider"). We can probably merge > the groundwork quite fast. I realise we're going to have merge conflicts on the later ones, so why don't we do this: I'll still send as one series, but you apply the ones you think are precursors and I'll rebase and resend the rest? James
Re: [PATCH v4 5/7] trusted keys: Add session encryption protection to the seal/unseal path
The tag in the short description does not look at all. Should be either "tpm:" or "keys, trusted:". On Mon, 22 Oct 2018, James Bottomley wrote: If some entity is snooping the TPM bus, the can see the data going in to be sealed and the data coming out as it is unsealed. Add parameter and response encryption to these cases to ensure that no secrets are leaked even if the bus is snooped. As part of doing this conversion it was discovered that policy sessions can't work with HMAC protected authority because of missing pieces (the tpm Nonce). I've added code to work the same way as before, which will result in potential authority exposure (while still adding security for the command and the returned blob), and a fixme to redo the API to get rid of this security hole. Signed-off-by: James Bottomley --- drivers/char/tpm/tpm2-cmd.c | 155 1 file changed, 98 insertions(+), 57 deletions(-) diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c index 22f1c7bee173..a8655cd535d1 100644 --- a/drivers/char/tpm/tpm2-cmd.c +++ b/drivers/char/tpm/tpm2-cmd.c @@ -425,7 +425,9 @@ int tpm2_seal_trusted(struct tpm_chip *chip, { unsigned int blob_len; struct tpm_buf buf; + struct tpm_buf t2b; u32 hash; + struct tpm2_auth *auth; int i; int rc; @@ -439,45 +441,56 @@ int tpm2_seal_trusted(struct tpm_chip *chip, if (i == ARRAY_SIZE(tpm2_hash_map)) return -EINVAL; - rc = tpm_buf_init(, TPM2_ST_SESSIONS, TPM2_CC_CREATE); + rc = tpm2_start_auth_session(chip, ); if (rc) return rc; - tpm_buf_append_u32(, options->keyhandle); - tpm2_buf_append_auth(, TPM2_RS_PW, -NULL /* nonce */, 0, -0 /* session_attributes */, -options->keyauth /* hmac */, -TPM_DIGEST_SIZE); + rc = tpm_buf_init(, TPM2_ST_SESSIONS, TPM2_CC_CREATE); + if (rc) { + tpm2_end_auth_session(auth); + return rc; + } + + rc = tpm_buf_init_2b(); + if (rc) { + tpm_buf_destroy(); + tpm2_end_auth_session(auth); + return rc; + } + tpm_buf_append_name(, auth, options->keyhandle, NULL); + tpm_buf_append_hmac_session(, auth, TPM2_SA_DECRYPT, + options->keyauth, TPM_DIGEST_SIZE); /* sensitive */ - tpm_buf_append_u16(, 4 + TPM_DIGEST_SIZE + payload->key_len + 1); + tpm_buf_append_u16(, TPM_DIGEST_SIZE); + tpm_buf_append(, options->blobauth, TPM_DIGEST_SIZE); + tpm_buf_append_u16(, payload->key_len + 1); + tpm_buf_append(, payload->key, payload->key_len); + tpm_buf_append_u8(, payload->migratable); - tpm_buf_append_u16(, TPM_DIGEST_SIZE); - tpm_buf_append(, options->blobauth, TPM_DIGEST_SIZE); - tpm_buf_append_u16(, payload->key_len + 1); - tpm_buf_append(, payload->key, payload->key_len); - tpm_buf_append_u8(, payload->migratable); + tpm_buf_append_2b(, ); /* public */ - tpm_buf_append_u16(, 14 + options->policydigest_len); - tpm_buf_append_u16(, TPM2_ALG_KEYEDHASH); - tpm_buf_append_u16(, hash); + tpm_buf_append_u16(, TPM2_ALG_KEYEDHASH); + tpm_buf_append_u16(, hash); /* policy */ if (options->policydigest_len) { - tpm_buf_append_u32(, 0); - tpm_buf_append_u16(, options->policydigest_len); - tpm_buf_append(, options->policydigest, + tpm_buf_append_u32(, 0); + tpm_buf_append_u16(, options->policydigest_len); + tpm_buf_append(, options->policydigest, options->policydigest_len); } else { - tpm_buf_append_u32(, TPM2_OA_USER_WITH_AUTH); - tpm_buf_append_u16(, 0); + tpm_buf_append_u32(, TPM2_OA_USER_WITH_AUTH); + tpm_buf_append_u16(, 0); } /* public parameters */ - tpm_buf_append_u16(, TPM2_ALG_NULL); - tpm_buf_append_u16(, 0); + tpm_buf_append_u16(, TPM2_ALG_NULL); + /* unique (zero) */ + tpm_buf_append_u16(, 0); + + tpm_buf_append_2b(, ); /* outside info */ 
tpm_buf_append_u16(, 0); @@ -490,8 +503,11 @@ int tpm2_seal_trusted(struct tpm_chip *chip, goto out; } - rc = tpm_transmit_cmd(chip, NULL, buf.data, PAGE_SIZE, 4, 0, - "sealing data"); + tpm_buf_fill_hmac_session(, auth); + + rc = tpm_transmit_cmd(chip, >kernel_space, buf.data, + PAGE_SIZE, 4, 0, "sealing data"); + rc = tpm_buf_check_hmac_response(, auth, rc); if (rc) goto out; @@ -509,6 +525,7 @@ int tpm2_seal_trusted(struct tpm_chip *chip, payload->blob_len = blob_len;
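For orientation, the calling convention this diff converts tpm2_seal_trusted() to is easier to see in outline. A condensed sketch of the pattern, using the calls visible in the diff above (error unwinding and parameter marshalling elided; the function name is illustrative):

	static int example_hmac_protected_cmd(struct tpm_chip *chip,
					      u32 keyhandle, const u8 *keyauth)
	{
		struct tpm2_auth *auth;
		struct tpm_buf buf;
		int rc;

		rc = tpm2_start_auth_session(chip, &auth);
		if (rc)
			return rc;

		rc = tpm_buf_init(&buf, TPM2_ST_SESSIONS, TPM2_CC_CREATE);
		if (rc) {
			tpm2_end_auth_session(auth);
			return rc;
		}

		tpm_buf_append_name(&buf, auth, keyhandle, NULL);
		tpm_buf_append_hmac_session(&buf, auth, TPM2_SA_DECRYPT,
					    keyauth, TPM_DIGEST_SIZE);
		/* ... command parameters appended here ... */

		tpm_buf_fill_hmac_session(&buf, auth);
		rc = tpm_transmit_cmd(chip, &chip->kernel_space, buf.data,
				      PAGE_SIZE, 4, 0, "sealing data");
		rc = tpm_buf_check_hmac_response(&buf, auth, rc);

		tpm_buf_destroy(&buf);
		return rc;
	}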
Re: [PATCH v4 0/7] add integrity and security to TPM2 transactions
I would consider sending first a patch set that would iterate the existing session stuff to be ready for this i.e. merge in two iterations (emphasis on the word "consider"). We can probably merge the groundwork quite fast. /Jarkko On Mon, 22 Oct 2018, James Bottomley wrote: By now, everybody knows we have a problem with the TPM2_RS_PW easy button on TPM2 in that transactions on the TPM bus can be intercepted and altered. The way to fix this is to use real sessions for HMAC capabilities to ensure integrity and to use parameter and response encryption to ensure confidentiality of the data flowing over the TPM bus. This patch series is about adding a simple API which can ensure the above properties as a layered addition to the existing TPM handling code. This series now includes protections for PCR extend, getting random numbers from the TPM and data sealing and unsealing. It therefore eliminates all uses of TPM2_RS_PW in the kernel and adds encryption protection to sensitive data flowing into and out of the TPM. In the third version I added data sealing and unsealing protection, apart from one API based problem which means that the way trusted keys were protected it's not currently possible to HMAC protect an authority that comes with a policy, so the API will have to be extended to fix that case In this fourth version, I tidy up some of the code and add more security features, the most notable is that we now calculate the NULL seed name and compare our calculation to the value returned in TPM2_ReadPublic, which means we now can't be spoofed. This version also gives a sysfs variable for the null seed which userspace can use to run a key certification operation to prove that the TPM was always secure when communicating with the kernel. I've verified this using the test suite in the last patch on a VM connected to a tpm2 emulator. I also instrumented the emulator to make sure the sensitive data was properly encrypted. James --- James Bottomley (7): tpm-buf: create new functions for handling TPM buffers tpm2-sessions: Add full HMAC and encrypt/decrypt session handling tpm2: add hmac checks to tpm2_pcr_extend() tpm2: add session encryption protection to tpm2_get_random() trusted keys: Add session encryption protection to the seal/unseal path tpm: add the null key name as a tpm2 sysfs variable tpm2-sessions: NOT FOR COMMITTING add sessions testing drivers/char/tpm/Kconfig |3 + drivers/char/tpm/Makefile |3 +- drivers/char/tpm/tpm-buf.c| 191 ++ drivers/char/tpm/tpm-chip.c |1 + drivers/char/tpm/tpm-sysfs.c | 27 +- drivers/char/tpm/tpm.h| 129 ++-- drivers/char/tpm/tpm2-cmd.c | 248 --- drivers/char/tpm/tpm2-sessions-test.c | 360 ++ drivers/char/tpm/tpm2-sessions.c | 1188 + drivers/char/tpm/tpm2-sessions.h | 57 ++ 10 files changed, 2027 insertions(+), 180 deletions(-) create mode 100644 drivers/char/tpm/tpm-buf.c create mode 100644 drivers/char/tpm/tpm2-sessions-test.c create mode 100644 drivers/char/tpm/tpm2-sessions.c create mode 100644 drivers/char/tpm/tpm2-sessions.h -- 2.16.4
Re: [PATCH v4 2/7] tpm2-sessions: Add full HMAC and encrypt/decrypt session handling
On Mon, 22 Oct 2018, James Bottomley wrote: This code adds true session based HMAC authentication plus parameter decryption and response encryption using AES. In order to reduce complexity it would make sense to split into two commits: authentication and parameter encryption. The basic design of this code is to segregate all the nasty crypto, hash and hmac code into tpm2-sessions.c and export a usable API. The API first of all starts off by gaining a session with tpm2_start_auth_session() Which initiates a session with the TPM and allocates an opaque tpm2_auth structure to handle the session parameters. Then the use is simply: * tpm_buf_append_name() in place of the tpm_buf_append_u32 for the handles Do not understand the description of this function. * tpm_buf_append_hmac_session() where tpm2_append_auth() would go Another fuzzy description. * tpm_buf_fill_hmac_session() called after the entire command buffer is finished but before tpm_transmit_cmd() is called which computes the correct HMAC and places it in the command at the correct location. I would call this simply tpm_buf_insert_hmac(). It is clear and precise name what it does. These all sould be combined with the existing session stuff inside tpm2-cmd.c and not have duplicate infrastructures. The file name should be tpm2-session.c (we neither have tpm2-cmds.c). Finally, after tpm_transmit_cmd() is called, tpm_buf_check_hmac_response() is called to check that the returned HMAC matched and collect the new state for the next use of the session, if any. The features of the session is controlled by the session attributes set in tpm_buf_append_hmac_session(). If TPM2_SA_CONTINUE_SESSION is not specified, the session will be flushed and the tpm2_auth structure freed in tpm_buf_check_hmac_response(); otherwise the session may be used again. Parameter encryption is specified by or'ing the flag TPM2_SA_DECRYPT and response encryption by or'ing the flag TPM2_SA_ENCRYPT. the various encryptions will be taken care of by tpm_buf_fill_hmac_session() and tpm_buf_check_hmac_response() respectively. To get all of this to work securely, the Kernel now needs a primary key to encrypt the session salt to, so we derive an EC key from the NULL seed and store it in the tpm_chip structure. We also make sure that this seed remains for the kernel by using a kernel space to take it out of the TPM when userspace wants to use it. Signed-off-by: James Bottomley --- v2: Added docbook and improved response check API v3: Add readpublic, fix hmac length, add API for close on error allow for the hmac session not being first in the sessions v4: Make NULL seed template exactly match the SRK ECC template. Also check the NULL primary key name is what getpublic returns to prevent spoofing. 
Also parametrise the name size for reuse --- drivers/char/tpm/Kconfig |3 + drivers/char/tpm/Makefile|2 +- drivers/char/tpm/tpm.h | 32 + drivers/char/tpm/tpm2-cmd.c | 34 +- drivers/char/tpm/tpm2-sessions.c | 1188 ++ drivers/char/tpm/tpm2-sessions.h | 57 ++ 6 files changed, 1300 insertions(+), 16 deletions(-) create mode 100644 drivers/char/tpm/tpm2-sessions.c create mode 100644 drivers/char/tpm/tpm2-sessions.h tpm2-cmd.c changes should be in their own commit e.g.: "tpm: add own space for in-kernel TPM communication" diff --git a/drivers/char/tpm/Kconfig b/drivers/char/tpm/Kconfig index 0aee88df98d1..8c714d8550c4 100644 --- a/drivers/char/tpm/Kconfig +++ b/drivers/char/tpm/Kconfig @@ -8,6 +8,9 @@ menuconfig TCG_TPM select SECURITYFS select CRYPTO select CRYPTO_HASH_INFO + select CRYPTO_ECDH + select CRYPTO_AES + select CRYPTO_CFB ---help--- If you have a TPM security chip in your system, which implements the Trusted Computing Group's specification, diff --git a/drivers/char/tpm/Makefile b/drivers/char/tpm/Makefile index 65d165cce530..1b5f6ccbb86d 100644 --- a/drivers/char/tpm/Makefile +++ b/drivers/char/tpm/Makefile @@ -5,7 +5,7 @@ obj-$(CONFIG_TCG_TPM) += tpm.o tpm-y := tpm-interface.o tpm-dev.o tpm-sysfs.o tpm-chip.o tpm2-cmd.o \ tpm-dev-common.o tpmrm-dev.o eventlog/common.o eventlog/tpm1.o \ -eventlog/tpm2.o tpm2-space.o tpm-buf.o +eventlog/tpm2.o tpm2-space.o tpm-buf.o tpm2-sessions.o tpm-$(CONFIG_ACPI) += tpm_ppi.o eventlog/acpi.o tpm-$(CONFIG_EFI) += eventlog/efi.o tpm-$(CONFIG_OF) += eventlog/of.o diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h index d73701e36eba..d39065d9995d 100644 --- a/drivers/char/tpm/tpm.h +++ b/drivers/char/tpm/tpm.h @@ -42,6 +42,15 @@ #include #endif +/* fixed define for the curve we use which is NIST_P256 */ +#define EC_PT_SZ 32 + +/* + * fixed define for the size of a name. This is actually HASHALG size + * plus 2, so 32 for SHA256 + */ +#define TPM2_NAME_SIZE 34 Please, remove this definition completely. It only
Re: [PATCH v4 1/7] tpm-buf: create new functions for handling TPM buffers
On Mon, 22 Oct 2018, James Bottomley wrote:
> This separates out the old tpm_buf_... handling functions from static
> inlines into tpm.h and makes them their own tpm-buf.c file. It also
> adds handling for tpm2b structures and also incremental pointer
> advancing parsers.
>
> Signed-off-by: James Bottomley

I would also prefer simply a "tpm:" tag, e.g. (in all of the commits in
this series actually, but I remark on it just here):

"tpm: create new tpm_buf functions for parsing and TPM2B"

/Jarkko
Re: [PATCH v4 1/7] tpm-buf: create new functions for handling TPM buffers
On Mon, 22 Oct 2018, James Bottomley wrote: This separates out the old tpm_buf_... handling functions from static inlines into tpm.h and makes them their own tpm-buf.c file. It also adds handling for tpm2b structures and also incremental pointer advancing parsers. Nitpicking: when my SGX patch set has been reviewed one of the comments has been that commit messages should be in imperative form e.g. "Separate out the old tpm_buf handling functions from static inlines into tpm.h and make them their..." Signed-off-by: James Bottomley --- v2: added this patch to separate out the API changes v3: added tpm_buf_reset_cmd() v4: renamed tpm_buf_reset_cmd() to tpm_buf_reset() --- drivers/char/tpm/Makefile | 2 +- drivers/char/tpm/tpm-buf.c | 191 + drivers/char/tpm/tpm.h | 96 --- 3 files changed, 208 insertions(+), 81 deletions(-) create mode 100644 drivers/char/tpm/tpm-buf.c diff --git a/drivers/char/tpm/Makefile b/drivers/char/tpm/Makefile index 3655258bee73..65d165cce530 100644 --- a/drivers/char/tpm/Makefile +++ b/drivers/char/tpm/Makefile @@ -5,7 +5,7 @@ obj-$(CONFIG_TCG_TPM) += tpm.o tpm-y := tpm-interface.o tpm-dev.o tpm-sysfs.o tpm-chip.o tpm2-cmd.o \ tpm-dev-common.o tpmrm-dev.o eventlog/common.o eventlog/tpm1.o \ -eventlog/tpm2.o tpm2-space.o +eventlog/tpm2.o tpm2-space.o tpm-buf.o tpm-$(CONFIG_ACPI) += tpm_ppi.o eventlog/acpi.o tpm-$(CONFIG_EFI) += eventlog/efi.o tpm-$(CONFIG_OF) += eventlog/of.o diff --git a/drivers/char/tpm/tpm-buf.c b/drivers/char/tpm/tpm-buf.c new file mode 100644 index ..faa22bb09d99 --- /dev/null +++ b/drivers/char/tpm/tpm-buf.c @@ -0,0 +1,191 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Handing for tpm2b structures to facilitate the building of commands + */ + +#include "tpm.h" + +#include + +#include + +static int __tpm_buf_init(struct tpm_buf *buf) +{ + buf->data_page = alloc_page(GFP_HIGHUSER); + if (!buf->data_page) + return -ENOMEM; + + buf->flags = 0; + buf->data = kmap(buf->data_page); + + return 0; +} + +void tpm_buf_reset(struct tpm_buf *buf, u16 tag, u32 ordinal) +{ + struct tpm_input_header *head; + + head = (struct tpm_input_header *) buf->data; + + head->tag = cpu_to_be16(tag); + head->length = cpu_to_be32(sizeof(*head)); + head->ordinal = cpu_to_be32(ordinal); +} +EXPORT_SYMBOL_GPL(tpm_buf_reset); + +int tpm_buf_init(struct tpm_buf *buf, u16 tag, u32 ordinal) +{ + int rc; + + rc = __tpm_buf_init(buf); + if (rc) + return rc; + + tpm_buf_reset(buf, tag, ordinal); + + return 0; +} +EXPORT_SYMBOL_GPL(tpm_buf_init); + +int tpm_buf_init_2b(struct tpm_buf *buf) +{ + struct tpm_input_header *head; + int rc; + + rc = __tpm_buf_init(buf); + if (rc) + return rc; + + head = (struct tpm_input_header *) buf->data; + + head->length = cpu_to_be32(sizeof(*head)); + + buf->flags = TPM_BUF_2B; + return 0; +} +EXPORT_SYMBOL_GPL(tpm_buf_init_2b); + +void tpm_buf_destroy(struct tpm_buf *buf) +{ + kunmap(buf->data_page); + __free_page(buf->data_page); +} +EXPORT_SYMBOL_GPL(tpm_buf_destroy); + +static void *tpm_buf_data(struct tpm_buf *buf) +{ + if (buf->flags & TPM_BUF_2B) + return buf->data + TPM_HEADER_SIZE; + return buf->data; +} + +u32 tpm_buf_length(struct tpm_buf *buf) +{ + struct tpm_input_header *head = (struct tpm_input_header *)buf->data; + u32 len; + + len = be32_to_cpu(head->length); + if (buf->flags & TPM_BUF_2B) + len -= sizeof(*head); + return len; +} +EXPORT_SYMBOL_GPL(tpm_buf_length); + +u16 tpm_buf_tag(struct tpm_buf *buf) +{ + struct tpm_input_header *head = (struct tpm_input_header *)buf->data; + + return be16_to_cpu(head->tag); +} 
+EXPORT_SYMBOL_GPL(tpm_buf_tag); + +void tpm_buf_append(struct tpm_buf *buf, + const unsigned char *new_data, + unsigned int new_len) +{ + struct tpm_input_header *head = (struct tpm_input_header *) buf->data; + u32 len = be32_to_cpu(head->length); + + /* Return silently if overflow has already happened. */ + if (buf->flags & TPM_BUF_OVERFLOW) + return; + + if ((len + new_len) > PAGE_SIZE) { + WARN(1, "tpm_buf: overflow\n"); + buf->flags |= TPM_BUF_OVERFLOW; + return; + } + + memcpy(>data[len], new_data, new_len); + head->length = cpu_to_be32(len + new_len); +} +EXPORT_SYMBOL_GPL(tpm_buf_append); + +void tpm_buf_append_u8(struct tpm_buf *buf, const u8 value) +{ + tpm_buf_append(buf, , 1); +} +EXPORT_SYMBOL_GPL(tpm_buf_append_u8); + +void tpm_buf_append_u16(struct tpm_buf *buf, const u16 value) +{ + __be16 value2 = cpu_to_be16(value); + + tpm_buf_append(buf, (u8 *) , 2); +}
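As a usage illustration of the API above: a command is built by initializing the buffer with a tag and ordinal, appending big-endian parameters, and destroying it afterwards. A minimal sketch (the command chosen is arbitrary; chip and num_bytes come from the surrounding context):

	struct tpm_buf buf;
	int rc;

	rc = tpm_buf_init(&buf, TPM2_ST_NO_SESSIONS, TPM2_CC_GET_RANDOM);
	if (rc)
		return rc;

	tpm_buf_append_u16(&buf, num_bytes);	/* bytesRequested */
	rc = tpm_transmit_cmd(chip, NULL, buf.data, PAGE_SIZE, 0, 0,
			      "attempting get random");
	tpm_buf_destroy(&buf);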
Re: [PATCH v4 2/7] tpm2-sessions: Add full HMAC and encrypt/decrypt session handling
On 23 October 2018 at 04:01, James Bottomley wrote: > On Mon, 2018-10-22 at 19:19 -0300, Ard Biesheuvel wrote: > [...] >> > +static void hmac_init(struct shash_desc *desc, u8 *key, int >> > keylen) >> > +{ >> > + u8 pad[SHA256_BLOCK_SIZE]; >> > + int i; >> > + >> > + desc->tfm = sha256_hash; >> > + desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP; >> >> I don't think this actually does anything in the shash API >> implementation, so you can drop this. > > OK, I find crypto somewhat hard to follow. There were bits I had to > understand, like when I wrote the CFB implementation or when I fixed > the ECDH scatterlist handling, but I've got to confess, in time > honoured tradition I simply copied this from EVM crypto without > actually digging into the code to understand why. > Yeah, it is notoriously hard to use, and we should try to improve that. >> However, I take it this means that hmac_init() is never called in >> contexts where sleeping is not allowed? For the relevance of that, >> please see below. > > Right, these routines are always called as an adjunct to a TPM > operation and TPM operations can sleep, so we must at least have kernel > thread context. > > [...] >> > + /* encrypt before HMAC */ >> > + if (auth->attrs & TPM2_SA_DECRYPT) { >> > + struct scatterlist sg[1]; >> > + u16 len; >> > + SKCIPHER_REQUEST_ON_STACK(req, auth->aes); >> > + >> > + skcipher_request_set_tfm(req, auth->aes); >> > + skcipher_request_set_callback(req, >> > CRYPTO_TFM_REQ_MAY_SLEEP, >> > + NULL, NULL); >> > + >> >> Your crypto_alloc_skcipher() call [further down] does not mask out >> CRYPTO_ALG_ASYNC, and so it permits asynchronous implementations to >> be selected. Your generic cfb template only permits synchronous >> implementations, since it wraps the cipher directly (which is always >> synchronous). However, we have support in the tree for some >> accelerators that implement cfb(aes), and those will return >> -EINPROGRESS when invoking crypto_skcipher_en/decrypt(req), which you >> are not set up to handle. >> >> So the simple solution is to call 'crypto_alloc_skcipher("cfb(aes)", >> 0, CRYPTO_ALG_ASYNC)' below instead. >> >> However, I would prefer it if we permit asynchronous skciphers here. >> The reason is that, on a given platform, the accelerator may be the >> only truly time invariant AES implementation that is available, and >> since we are dealing with the TPM, a bit of paranoia does not hurt. >> It also makes it easier to implement it as a [time invariant] SIMD >> based asynchronous skcipher, which are simpler than synchronous ones >> since they don't require a non-SIMD fallback path for calls from >> contexts where the SIMD unit may not be used. >> >> I have already implemented cfb(aes) for arm64/NEON. I will post the >> patch after the merge window closes. >> >> > + /* need key and IV */ >> > + KDFa(auth->session_key, SHA256_DIGEST_SIZE >> > ++ auth->passphraselen, "CFB", auth->our_nonce, >> > +auth->tpm_nonce, AES_KEYBYTES + >> > AES_BLOCK_SIZE, >> > +auth->scratch); >> > + crypto_skcipher_setkey(auth->aes, auth->scratch, >> > AES_KEYBYTES); >> > + len = tpm_get_inc_u16(); >> > + sg_init_one(sg, p, len); >> > + skcipher_request_set_crypt(req, sg, sg, len, >> > + auth->scratch + >> > AES_KEYBYTES); >> > + crypto_skcipher_encrypt(req); >> >> So please consider replacing this with something like. 
>> >> DECLARE_CRYPTO_WAIT(wait); [further up] >> skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP, >> crypto_req_done, ); >> crypto_wait_req(crypto_skcipher_encrypt(req), ); > > Sure, I can do this ... the crypto skcipher handling was also cut and > paste, but I forget where from this time. So what I think you're > asking for is below as the incremental diff? I've tested this out and > it all works fine in my session testing environment (and on my real > laptop) ... although since I'm fully sync, I won't really have tested > the -EINPROGRESS do the wait case. > Yes that looks fine. I will try to test this on one of my arm64 systems, but it may take me some time to get around to it. In any case, Reviewed-by: Ard Biesheuvel # crypto API parts > > diff --git a/drivers/char/tpm/tpm2-sessions.c > b/drivers/char/tpm/tpm2-sessions.c > index 422c3ec64f8c..bbd3e8580954 100644 > --- a/drivers/char/tpm/tpm2-sessions.c > +++ b/drivers/char/tpm/tpm2-sessions.c > @@ -165,7 +165,6 @@ static void hmac_init(struct shash_desc *desc, u8 *key, > int keylen) > int i; > > desc->tfm = sha256_hash; > - desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP; > crypto_shash_init(desc); > for (i = 0; i < sizeof(pad); i++) { >
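Spelled out, the synchronous-wait pattern Ard suggests (and that the incremental diff adopts) looks like this; the wait object supplies the completion callback, so an asynchronous implementation returning -EINPROGRESS is handled transparently. A sketch, with tfm, sg, len and iv as in the surrounding code:

	DECLARE_CRYPTO_WAIT(wait);
	struct skcipher_request *req;
	int rc;

	req = skcipher_request_alloc(tfm, GFP_KERNEL);
	if (!req)
		return -ENOMEM;

	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP,
				      crypto_req_done, &wait);
	skcipher_request_set_crypt(req, sg, sg, len, iv);
	rc = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
	skcipher_request_free(req);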
Re: [PATCH v4 2/7] tpm2-sessions: Add full HMAC and encrypt/decrypt session handling
On Mon, 2018-10-22 at 19:19 -0300, Ard Biesheuvel wrote: [...] > > +static void hmac_init(struct shash_desc *desc, u8 *key, int > > keylen) > > +{ > > + u8 pad[SHA256_BLOCK_SIZE]; > > + int i; > > + > > + desc->tfm = sha256_hash; > > + desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP; > > I don't think this actually does anything in the shash API > implementation, so you can drop this. OK, I find crypto somewhat hard to follow. There were bits I had to understand, like when I wrote the CFB implementation or when I fixed the ECDH scatterlist handling, but I've got to confess, in time honoured tradition I simply copied this from EVM crypto without actually digging into the code to understand why. > However, I take it this means that hmac_init() is never called in > contexts where sleeping is not allowed? For the relevance of that, > please see below. Right, these routines are always called as an adjunct to a TPM operation and TPM operations can sleep, so we must at least have kernel thread context. [...] > > + /* encrypt before HMAC */ > > + if (auth->attrs & TPM2_SA_DECRYPT) { > > + struct scatterlist sg[1]; > > + u16 len; > > + SKCIPHER_REQUEST_ON_STACK(req, auth->aes); > > + > > + skcipher_request_set_tfm(req, auth->aes); > > + skcipher_request_set_callback(req, > > CRYPTO_TFM_REQ_MAY_SLEEP, > > + NULL, NULL); > > + > > Your crypto_alloc_skcipher() call [further down] does not mask out > CRYPTO_ALG_ASYNC, and so it permits asynchronous implementations to > be selected. Your generic cfb template only permits synchronous > implementations, since it wraps the cipher directly (which is always > synchronous). However, we have support in the tree for some > accelerators that implement cfb(aes), and those will return > -EINPROGRESS when invoking crypto_skcipher_en/decrypt(req), which you > are not set up to handle. > > So the simple solution is to call 'crypto_alloc_skcipher("cfb(aes)", > 0, CRYPTO_ALG_ASYNC)' below instead. > > However, I would prefer it if we permit asynchronous skciphers here. > The reason is that, on a given platform, the accelerator may be the > only truly time invariant AES implementation that is available, and > since we are dealing with the TPM, a bit of paranoia does not hurt. > It also makes it easier to implement it as a [time invariant] SIMD > based asynchronous skcipher, which are simpler than synchronous ones > since they don't require a non-SIMD fallback path for calls from > contexts where the SIMD unit may not be used. > > I have already implemented cfb(aes) for arm64/NEON. I will post the > patch after the merge window closes. > > > + /* need key and IV */ > > + KDFa(auth->session_key, SHA256_DIGEST_SIZE > > ++ auth->passphraselen, "CFB", auth->our_nonce, > > +auth->tpm_nonce, AES_KEYBYTES + > > AES_BLOCK_SIZE, > > +auth->scratch); > > + crypto_skcipher_setkey(auth->aes, auth->scratch, > > AES_KEYBYTES); > > + len = tpm_get_inc_u16(); > > + sg_init_one(sg, p, len); > > + skcipher_request_set_crypt(req, sg, sg, len, > > + auth->scratch + > > AES_KEYBYTES); > > + crypto_skcipher_encrypt(req); > > So please consider replacing this with something like. > > DECLARE_CRYPTO_WAIT(wait); [further up] > skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP, > crypto_req_done, ); > crypto_wait_req(crypto_skcipher_encrypt(req), ); Sure, I can do this ... the crypto skcipher handling was also cut and paste, but I forget where from this time. So what I think you're asking for is below as the incremental diff? 
I've tested this out and it all works fine in my session testing environment (and on my real laptop) ... although since I'm fully sync, I won't really have tested the -EINPROGRESS do the wait case. James --- diff --git a/drivers/char/tpm/tpm2-sessions.c b/drivers/char/tpm/tpm2-sessions.c index 422c3ec64f8c..bbd3e8580954 100644 --- a/drivers/char/tpm/tpm2-sessions.c +++ b/drivers/char/tpm/tpm2-sessions.c @@ -165,7 +165,6 @@ static void hmac_init(struct shash_desc *desc, u8 *key, int keylen) int i; desc->tfm = sha256_hash; - desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP; crypto_shash_init(desc); for (i = 0; i < sizeof(pad); i++) { if (i < keylen) @@ -244,7 +243,6 @@ static void KDFe(u8 z[EC_PT_SZ], const char *str, u8 *pt_u, u8 *pt_v, __be32 c = cpu_to_be32(1); desc->tfm = sha256_hash; - desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP; crypto_shash_init(desc); /* counter (BE) */ @@ -445,7 +443,6 @@ void tpm_buf_fill_hmac_session(struct tpm_buf *buf, struct tpm2_auth *auth) auth->ordinal = head->ordinal;
Re: [PATCH v4 2/7] tpm2-sessions: Add full HMAC and encrypt/decrypt session handling
Hi James, Some comments below on how you are using the crypto API. On 22 October 2018 at 04:36, James Bottomley wrote: > This code adds true session based HMAC authentication plus parameter > decryption and response encryption using AES. > > The basic design of this code is to segregate all the nasty crypto, > hash and hmac code into tpm2-sessions.c and export a usable API. > > The API first of all starts off by gaining a session with > > tpm2_start_auth_session() > > Which initiates a session with the TPM and allocates an opaque > tpm2_auth structure to handle the session parameters. Then the use is > simply: > > * tpm_buf_append_name() in place of the tpm_buf_append_u32 for the > handles > > * tpm_buf_append_hmac_session() where tpm2_append_auth() would go > > * tpm_buf_fill_hmac_session() called after the entire command buffer > is finished but before tpm_transmit_cmd() is called which computes > the correct HMAC and places it in the command at the correct > location. > > Finally, after tpm_transmit_cmd() is called, > tpm_buf_check_hmac_response() is called to check that the returned > HMAC matched and collect the new state for the next use of the > session, if any. > > The features of the session is controlled by the session attributes > set in tpm_buf_append_hmac_session(). If TPM2_SA_CONTINUE_SESSION is > not specified, the session will be flushed and the tpm2_auth structure > freed in tpm_buf_check_hmac_response(); otherwise the session may be > used again. Parameter encryption is specified by or'ing the flag > TPM2_SA_DECRYPT and response encryption by or'ing the flag > TPM2_SA_ENCRYPT. the various encryptions will be taken care of by > tpm_buf_fill_hmac_session() and tpm_buf_check_hmac_response() > respectively. > > To get all of this to work securely, the Kernel now needs a primary > key to encrypt the session salt to, so we derive an EC key from the > NULL seed and store it in the tpm_chip structure. We also make sure > that this seed remains for the kernel by using a kernel space to take > it out of the TPM when userspace wants to use it. > > Signed-off-by: James Bottomley > > --- > > v2: Added docbook and improved response check API > v3: Add readpublic, fix hmac length, add API for close on error > allow for the hmac session not being first in the sessions > v4: Make NULL seed template exactly match the SRK ECC template. > Also check the NULL primary key name is what getpublic returns > to prevent spoofing. Also parametrise the name size for reuse > --- > drivers/char/tpm/Kconfig |3 + > drivers/char/tpm/Makefile|2 +- > drivers/char/tpm/tpm.h | 32 + > drivers/char/tpm/tpm2-cmd.c | 34 +- > drivers/char/tpm/tpm2-sessions.c | 1188 > ++ > drivers/char/tpm/tpm2-sessions.h | 57 ++ > 6 files changed, 1300 insertions(+), 16 deletions(-) > create mode 100644 drivers/char/tpm/tpm2-sessions.c > create mode 100644 drivers/char/tpm/tpm2-sessions.h > ... > diff --git a/drivers/char/tpm/tpm2-sessions.c > b/drivers/char/tpm/tpm2-sessions.c > new file mode 100644 > index ..422c3ec64f8c > --- /dev/null > +++ b/drivers/char/tpm/tpm2-sessions.c > @@ -0,0 +1,1188 @@ ... > +/* > + * this is our static crypto shash. 
This is possible because the hash > + * is multi-threaded and all the state stored in the desc > + */ > +static struct crypto_shash *sha256_hash; > + > +/* > + * It turns out the crypto hmac(sha256) is hard for us to consume > + * because it assumes a fixed key and the TPM seems to change the key > + * on every operation, so we weld the hmac init and final functions in > + * here to give it the same usage characteristics as a regular hash > + */ > +static void hmac_init(struct shash_desc *desc, u8 *key, int keylen) > +{ > + u8 pad[SHA256_BLOCK_SIZE]; > + int i; > + > + desc->tfm = sha256_hash; > + desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP; I don't think this actually does anything in the shash API implementation, so you can drop this. However, I take it this means that hmac_init() is never called in contexts where sleeping is not allowed? For the relevance of that, please see below. > + crypto_shash_init(desc); > + for (i = 0; i < sizeof(pad); i++) { > + if (i < keylen) > + pad[i] = key[i]; > + else > + pad[i] = 0; > + pad[i] ^= HMAC_IPAD_VALUE; > + } > + crypto_shash_update(desc, pad, sizeof(pad)); > +} > + > +static void hmac_final(struct shash_desc *desc, u8 *key, int keylen, u8 *out) > +{ > + u8 pad[SHA256_BLOCK_SIZE]; > + int i; > + > + for (i = 0; i < sizeof(pad); i++) { > + if (i < keylen) > + pad[i] = key[i]; > + else > + pad[i] = 0; > + pad[i] ^= HMAC_OPAD_VALUE; > + } > + > + /* collect the final hash; use out as
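For anyone puzzling over the pad handling being reviewed here: hmac_init()/hmac_final() open-code the standard HMAC construction around a plain sha256 shash so that the key can change on every operation. In the usual notation:

	\mathrm{HMAC}(K, m) = H\big((K' \oplus \mathrm{opad}) \,\Vert\,
	H\big((K' \oplus \mathrm{ipad}) \,\Vert\, m\big)\big)

where K' is the key zero-padded to the hash block size (64 bytes for SHA-256); hmac_init() absorbs K' xor ipad, the callers absorb m, and hmac_final() applies the outer hash with K' xor opad.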
Re: [PATCH 1/2] crypto: fix cfb mode decryption
On 21 October 2018 at 11:00, James Bottomley wrote: > On October 21, 2018 9:58:04 AM GMT, Ard Biesheuvel > wrote: >>On 21 October 2018 at 10:07, James Bottomley >> wrote: >>> On Sun, 2018-10-21 at 09:05 +0200, Ard Biesheuvel wrote: (+ James) >>> >>> Thanks! >>> On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov wrote: > crypto_cfb_decrypt_segment() incorrectly XOR'ed generated >>keystream > with > IV, rather than with data stream, resulting in incorrect > decryption. > Test vectors will be added in the next patch. > > Signed-off-by: Dmitry Eremin-Solenikov > Cc: sta...@vger.kernel.org > --- > crypto/cfb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/crypto/cfb.c b/crypto/cfb.c > index a0d68c09e1b9..fd4e8500e121 100644 > --- a/crypto/cfb.c > +++ b/crypto/cfb.c > @@ -144,7 +144,7 @@ static int crypto_cfb_decrypt_segment(struct > skcipher_walk *walk, > > do { > crypto_cfb_encrypt_one(tfm, iv, dst); > - crypto_xor(dst, iv, bsize); > + crypto_xor(dst, src, bsize); >>> >>> This does look right. I think the reason the TPM code works is that >>it >>> always does encrypt/decrypt in-place, which is a separate piece of >>the >>> code which appears to be correct. >>> >> >>Yeah I figured that. >> >>So where is the TPM code that actually uses this code? > > It was posted to the integrity list a while ago. I'm planning a repost > shortly. > OK, found it. Mind cc'ing me on that repost?
Re: [PATCH 1/2] crypto: fix cfb mode decryption
On October 21, 2018 9:58:04 AM GMT, Ard Biesheuvel wrote: >On 21 October 2018 at 10:07, James Bottomley > wrote: >> On Sun, 2018-10-21 at 09:05 +0200, Ard Biesheuvel wrote: >>> (+ James) >> >> Thanks! >> >>> On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov >>> wrote: >>> > crypto_cfb_decrypt_segment() incorrectly XOR'ed generated >keystream >>> > with >>> > IV, rather than with data stream, resulting in incorrect >>> > decryption. >>> > Test vectors will be added in the next patch. >>> > >>> > Signed-off-by: Dmitry Eremin-Solenikov >>> > Cc: sta...@vger.kernel.org >>> > --- >>> > crypto/cfb.c | 2 +- >>> > 1 file changed, 1 insertion(+), 1 deletion(-) >>> > >>> > diff --git a/crypto/cfb.c b/crypto/cfb.c >>> > index a0d68c09e1b9..fd4e8500e121 100644 >>> > --- a/crypto/cfb.c >>> > +++ b/crypto/cfb.c >>> > @@ -144,7 +144,7 @@ static int crypto_cfb_decrypt_segment(struct >>> > skcipher_walk *walk, >>> > >>> > do { >>> > crypto_cfb_encrypt_one(tfm, iv, dst); >>> > - crypto_xor(dst, iv, bsize); >>> > + crypto_xor(dst, src, bsize); >> >> This does look right. I think the reason the TPM code works is that >it >> always does encrypt/decrypt in-place, which is a separate piece of >the >> code which appears to be correct. >> > >Yeah I figured that. > >So where is the TPM code that actually uses this code? It was posted to the integrity list a while ago. I'm planning a repost shortly. James -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [PATCH 1/2] crypto: fix cfb mode decryption
On 21 October 2018 at 10:07, James Bottomley wrote: > On Sun, 2018-10-21 at 09:05 +0200, Ard Biesheuvel wrote: >> (+ James) > > Thanks! > >> On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov >> wrote: >> > crypto_cfb_decrypt_segment() incorrectly XOR'ed generated keystream >> > with >> > IV, rather than with data stream, resulting in incorrect >> > decryption. >> > Test vectors will be added in the next patch. >> > >> > Signed-off-by: Dmitry Eremin-Solenikov >> > Cc: sta...@vger.kernel.org >> > --- >> > crypto/cfb.c | 2 +- >> > 1 file changed, 1 insertion(+), 1 deletion(-) >> > >> > diff --git a/crypto/cfb.c b/crypto/cfb.c >> > index a0d68c09e1b9..fd4e8500e121 100644 >> > --- a/crypto/cfb.c >> > +++ b/crypto/cfb.c >> > @@ -144,7 +144,7 @@ static int crypto_cfb_decrypt_segment(struct >> > skcipher_walk *walk, >> > >> > do { >> > crypto_cfb_encrypt_one(tfm, iv, dst); >> > - crypto_xor(dst, iv, bsize); >> > + crypto_xor(dst, src, bsize); > > This does look right. I think the reason the TPM code works is that it > always does encrypt/decrypt in-place, which is a separate piece of the > code which appears to be correct. > Yeah I figured that. So where is the TPM code that actually uses this code?
Re: [PATCH 1/2] crypto: fix cfb mode decryption
On Sun, 2018-10-21 at 09:05 +0200, Ard Biesheuvel wrote: > (+ James) Thanks! > On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov > wrote: > > crypto_cfb_decrypt_segment() incorrectly XOR'ed generated keystream > > with > > IV, rather than with data stream, resulting in incorrect > > decryption. > > Test vectors will be added in the next patch. > > > > Signed-off-by: Dmitry Eremin-Solenikov > > Cc: sta...@vger.kernel.org > > --- > > crypto/cfb.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/crypto/cfb.c b/crypto/cfb.c > > index a0d68c09e1b9..fd4e8500e121 100644 > > --- a/crypto/cfb.c > > +++ b/crypto/cfb.c > > @@ -144,7 +144,7 @@ static int crypto_cfb_decrypt_segment(struct > > skcipher_walk *walk, > > > > do { > > crypto_cfb_encrypt_one(tfm, iv, dst); > > - crypto_xor(dst, iv, bsize); > > + crypto_xor(dst, src, bsize); This does look right. I think the reason the TPM code works is that it always does encrypt/decrypt in-place, which is a separate piece of the code which appears to be correct. James
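The arithmetic behind the fix, for anyone skimming: CFB runs the block cipher forward in both directions, so decryption is P_i = C_i xor E_k(C_{i-1}) with C_0 = IV; the XOR operand must be the ciphertext, and that same ciphertext block then becomes the next "IV". A simplified sketch of the corrected out-of-place loop:

	/* crypto_cfb_decrypt_segment(), full blocks, dst != src */
	do {
		crypto_cfb_encrypt_one(tfm, iv, dst);	/* dst = E_k(iv) */
		crypto_xor(dst, src, bsize);		/* dst ^= C_i (the fix) */
		iv = src;				/* chain on the ciphertext */

		src += bsize;
		dst += bsize;
	} while ((nbytes -= bsize) >= bsize);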
Re: [PATCH 2/2] crypto: testmgr: add AES-CFB tests
(+ James) On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov wrote: > Add AES128/192/256-CFB testvectors from NIST SP800-38A. > > Signed-off-by: Dmitry Eremin-Solenikov > Cc: sta...@vger.kernel.org > Signed-off-by: Dmitry Eremin-Solenikov > --- > crypto/tcrypt.c | 5 > crypto/testmgr.c | 7 + > crypto/testmgr.h | 76 > 3 files changed, 88 insertions(+) > > diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c > index bdde95e8d369..a6315827d240 100644 > --- a/crypto/tcrypt.c > +++ b/crypto/tcrypt.c > @@ -1733,6 +1733,7 @@ static int do_test(const char *alg, u32 type, u32 mask, > int m, u32 num_mb) > ret += tcrypt_test("xts(aes)"); > ret += tcrypt_test("ctr(aes)"); > ret += tcrypt_test("rfc3686(ctr(aes))"); > + ret += tcrypt_test("cfb(aes)"); > break; > > case 11: > @@ -2059,6 +2060,10 @@ static int do_test(const char *alg, u32 type, u32 > mask, int m, u32 num_mb) > speed_template_16_24_32); > test_cipher_speed("ctr(aes)", DECRYPT, sec, NULL, 0, > speed_template_16_24_32); > + test_cipher_speed("cfb(aes)", ENCRYPT, sec, NULL, 0, > + speed_template_16_24_32); > + test_cipher_speed("cfb(aes)", DECRYPT, sec, NULL, 0, > + speed_template_16_24_32); > break; > > case 201: > diff --git a/crypto/testmgr.c b/crypto/testmgr.c > index a1d42245082a..016d61c419fc 100644 > --- a/crypto/testmgr.c > +++ b/crypto/testmgr.c > @@ -2684,6 +2684,13 @@ static const struct alg_test_desc alg_test_descs[] = { > .dec = __VECS(aes_ccm_dec_tv_template) > } > } > + }, { > + .alg = "cfb(aes)", > + .test = alg_test_skcipher, > + .fips_allowed = 1, > + .suite = { > + .cipher = __VECS(aes_cfb_tv_template) > + }, > }, { > .alg = "chacha20", > .test = alg_test_skcipher, > diff --git a/crypto/testmgr.h b/crypto/testmgr.h > index 173111c70746..19b6d184c8fb 100644 > --- a/crypto/testmgr.h > +++ b/crypto/testmgr.h > @@ -12081,6 +12081,82 @@ static const struct cipher_testvec > aes_cbc_tv_template[] = { > }, > }; > > +static const struct cipher_testvec aes_cfb_tv_template[] = { > + { /* From NIST SP800-38A */ > + .key= "\x2b\x7e\x15\x16\x28\xae\xd2\xa6" > + "\xab\xf7\x15\x88\x09\xcf\x4f\x3c", > + .klen = 16, > + .iv = "\x00\x01\x02\x03\x04\x05\x06\x07" > + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", > + .ptext = "\x6b\xc1\xbe\xe2\x2e\x40\x9f\x96" > + "\xe9\x3d\x7e\x11\x73\x93\x17\x2a" > + "\xae\x2d\x8a\x57\x1e\x03\xac\x9c" > + "\x9e\xb7\x6f\xac\x45\xaf\x8e\x51" > + "\x30\xc8\x1c\x46\xa3\x5c\xe4\x11" > + "\xe5\xfb\xc1\x19\x1a\x0a\x52\xef" > + "\xf6\x9f\x24\x45\xdf\x4f\x9b\x17" > + "\xad\x2b\x41\x7b\xe6\x6c\x37\x10", > + .ctext = "\x3b\x3f\xd9\x2e\xb7\x2d\xad\x20" > + "\x33\x34\x49\xf8\xe8\x3c\xfb\x4a" > + "\xc8\xa6\x45\x37\xa0\xb3\xa9\x3f" > + "\xcd\xe3\xcd\xad\x9f\x1c\xe5\x8b" > + "\x26\x75\x1f\x67\xa3\xcb\xb1\x40" > + "\xb1\x80\x8c\xf1\x87\xa4\xf4\xdf" > + "\xc0\x4b\x05\x35\x7c\x5d\x1c\x0e" > + "\xea\xc4\xc6\x6f\x9f\xf7\xf2\xe6", > + .len= 64, > + }, { > + .key= "\x8e\x73\xb0\xf7\xda\x0e\x64\x52" > + "\xc8\x10\xf3\x2b\x80\x90\x79\xe5" > + "\x62\xf8\xea\xd2\x52\x2c\x6b\x7b", > + .klen = 24, > + .iv = "\x00\x01\x02\x03\x04\x05\x06\x07" > + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", > + .ptext = "\x6b\xc1\xbe\xe2\x2e\x40\x9f\x96" > + "\xe9\x3d\x7e\x11\x73\x93\x17\x2a" > + "\xae\x2d\x8a\x57\x1e\x03\xac\x9c" > + "\x9e\xb7\x6f\xac\x45\xaf\x8e\x51" > + "\x30\xc8\x1c\x46\xa3\x5c\xe4\x11" > + "\xe5\xfb\xc1\x19\x1a\x0a\x52\xef" > + "\xf6\x9f\x24\x45\xdf\x4f\x9b\x17" > + "\xad\x2b\x41\x7b\xe6\x6c\x37\x10", > + .ctext = "\xcd\xc8\x0d\x6f\xdd\xf1\x8c\xab" > + "\x34\xc2\x59\x09\xc9\x9a\x41\x74" > + "\x67\xce\x7f\x7f\x81\x17\x36\x21" > +
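For anyone who wants to sanity-check the first (AES-128) vector above outside the kernel, here is a minimal userspace sketch of the first block using OpenSSL's low-level AES block API (link with -lcrypto). It exercises exactly the decrypt data flow fixed by the companion patch, i.e. the keystream XORed with the ciphertext:

#include <stdio.h>
#include <openssl/aes.h>

int main(void)
{
	/* Key, IV and first ciphertext block from NIST SP800-38A (CFB128-AES128) */
	static const unsigned char key[16] = {
		0x2b,0x7e,0x15,0x16,0x28,0xae,0xd2,0xa6,
		0xab,0xf7,0x15,0x88,0x09,0xcf,0x4f,0x3c };
	static const unsigned char iv[16] = {
		0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07,
		0x08,0x09,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f };
	static const unsigned char ct[16] = {
		0x3b,0x3f,0xd9,0x2e,0xb7,0x2d,0xad,0x20,
		0x33,0x34,0x49,0xf8,0xe8,0x3c,0xfb,0x4a };
	unsigned char ks[16], pt[16];
	AES_KEY enc;
	int i;

	AES_set_encrypt_key(key, 128, &enc);
	AES_encrypt(iv, ks, &enc);		/* keystream = E_k(IV) */
	for (i = 0; i < 16; i++)
		pt[i] = ct[i] ^ ks[i];		/* XOR with ciphertext, not the IV */

	for (i = 0; i < 16; i++)		/* expect 6bc1bee22e409f96e93d7e117393172a */
		printf("%02x", pt[i]);
	printf("\n");
	return 0;
}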
RE: [PATCH 0/3] crypto: ccree: add SM3 support
Hi Herbert, I'm sorry for only notifying you now, but this patch set should be applied on top of the "crypto: ccree: add CryptoCell 713 baseline support" patch set by Gilad Ben-Yossef, hence the failures reported by the kbuild test robot. > -Original Message- > From: Gilad Ben-Yossef > Sent: Thursday, 18 October 2018 16:26 > To: yaeceh01 > Cc: Herbert Xu ; David Miller > ; Linux Crypto Mailing List cry...@vger.kernel.org>; Ofir Drang ; Linux kernel > mailing list > Subject: Re: [PATCH 0/3] crypto: ccree: add SM3 support > > On Thu, Oct 18, 2018 at 4:00 PM Yael Chemla > wrote: > > > > Add support for SM3 in CryptoCell 713. > > > > Yael Chemla (3): > > crypto: ccree: adjust hash length to suit certain context specifics > > crypto: ccree: modify set_cipher_mode usage from cc_hash > > crypto: ccree: add SM3 support > > > > drivers/crypto/Kconfig | 1 + > > drivers/crypto/ccree/cc_aead.c | 19 +++- > > drivers/crypto/ccree/cc_crypto_ctx.h| 4 +- > > drivers/crypto/ccree/cc_driver.c| 10 +- > > drivers/crypto/ccree/cc_driver.h| 2 +- > > drivers/crypto/ccree/cc_hash.c | 175 +++-- --- > > drivers/crypto/ccree/cc_hw_queue_defs.h | 27 + > > 7 files changed, 182 insertions(+), 56 deletions(-) > > > > -- > > 2.7.4 > > > > Herbert, these go on top of my own previous patch set that adds the > CryptoCell 713 and its SM4 cipher support. > > Acked-by: Gilad Ben-Yossef > > Thanks, > Gilad > > -- > Gilad Ben-Yossef > Chief Coffee Drinker > > values of β will give rise to dom!
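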
Re: [PATCH 1/2] crypto: fix cfb mode decryption
(+ James) On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov wrote: > crypto_cfb_decrypt_segment() incorrectly XOR'ed generated keystream with > IV, rather than with data stream, resulting in incorrect decryption. > Test vectors will be added in the next patch. > > Signed-off-by: Dmitry Eremin-Solenikov > Cc: sta...@vger.kernel.org > --- > crypto/cfb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/crypto/cfb.c b/crypto/cfb.c > index a0d68c09e1b9..fd4e8500e121 100644 > --- a/crypto/cfb.c > +++ b/crypto/cfb.c > @@ -144,7 +144,7 @@ static int crypto_cfb_decrypt_segment(struct > skcipher_walk *walk, > > do { > crypto_cfb_encrypt_one(tfm, iv, dst); > - crypto_xor(dst, iv, bsize); > + crypto_xor(dst, src, bsize); > iv = src; > > src += bsize; > -- > 2.19.1 >
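To make the corrected data flow explicit, here is the decrypt-segment loop in isolation, reusing the stand-ins from the in-place sketch above (a sketch of the logic, not the kernel function verbatim):

static void cfb_decrypt_segment_sketch(const void *key, const u8 *src,
				       u8 *dst, unsigned int nblocks, u8 *iv)
{
	unsigned int i;

	while (nblocks--) {
		encrypt_one(key, iv, dst);	/* dst = E_k(iv): the keystream */
		for (i = 0; i < BSIZE; i++)
			dst[i] ^= src[i];	/* XOR with the ciphertext (the fix) */
		memcpy(iv, src, BSIZE);		/* ciphertext chains as the next IV */
		src += BSIZE;
		dst += BSIZE;
	}
}

The buggy version XORed dst with iv instead of src, i.e. it computed E_k(IV) ^ IV rather than E_k(IV) ^ C, which is why every out-of-place decrypted block came out wrong while encryption and the in-place path remained correct.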
Re: [PATCH v3 2/2] crypto: arm/aes - add some hardening against cache-timing attacks
On 20 October 2018 at 04:39, Eric Biggers wrote: > On Fri, Oct 19, 2018 at 05:54:12PM +0800, Ard Biesheuvel wrote: >> On 19 October 2018 at 13:41, Ard Biesheuvel >> wrote: >> > On 18 October 2018 at 12:37, Eric Biggers wrote: >> >> From: Eric Biggers >> >> >> >> Make the ARM scalar AES implementation closer to constant-time by >> >> disabling interrupts and prefetching the tables into L1 cache. This is >> >> feasible because due to ARM's "free" rotations, the main tables are only >> >> 1024 bytes instead of the usual 4096 used by most AES implementations. >> >> >> >> On ARM Cortex-A7, the speed loss is only about 5%. The resulting code >> >> is still over twice as fast as aes_ti.c. Responsiveness is potentially >> >> a concern, but interrupts are only disabled for a single AES block. >> >> >> > >> > So that would be in the order of 700 cycles, based on the numbers you >> > shared in v1 of the aes_ti.c patch. Does that sound about right? So >> > that would be around 1 microsecond, which is really not a number to >> > obsess about imo. >> > >> > I considered another option, which is to detect whether an interrupt >> > has been taken (by writing some canary value below that stack pointer >> > in the location where the exception handler will preserve the value of >> > sp, and checking at the end whether it has been modified) and doing a >> > usleep_range(x, y) if that is the case. >> > >> > But this is much simpler so let's only go there if we must. >> > >> >> I played around a bit and implemented it for discussion purposes, but >> restarting the operation if it gets interrupted, as suggested in the >> paper (whitespace corruption courtesy of Gmail) >> >> >> diff --git a/arch/arm/crypto/aes-cipher-core.S >> b/arch/arm/crypto/aes-cipher-core.S >> index 184d6c2d15d5..2e8a84a47784 100644 >> --- a/arch/arm/crypto/aes-cipher-core.S >> +++ b/arch/arm/crypto/aes-cipher-core.S >> @@ -10,6 +10,7 @@ >> */ >> >> #include >> +#include >> #include >> >> .text >> @@ -139,6 +140,34 @@ >> >> __adrl ttab, \ttab >> >> + /* >> + * Set a canary that will allow us to tell whether any >> + * interrupts were taken while this function was executing. >> + * The zero value will be overwritten with the process counter >> + * value at the point where the IRQ exception is taken. >> + */ >> + mov t0, #0 >> + str t0, [sp, #-(SVC_REGS_SIZE - S_PC)] >> + >> + /* >> + * Prefetch the 1024-byte 'ft' or 'it' table into L1 cache, >> + * assuming cacheline size >= 32. This is a hardening measure >> + * intended to make cache-timing attacks more difficult. >> + * They may not be fully prevented, however; see the paper >> + * https://cr.yp.to/antiforgery/cachetiming-20050414.pdf >> + * ("Cache-timing attacks on AES") for a discussion of the many >> + * difficulties involved in writing truly constant-time AES >> + * software. 
>> + */ >> + .set i, 0 >> + .rept 1024 / 128 >> + ldr r8, [ttab, #i + 0] >> + ldr r9, [ttab, #i + 32] >> + ldr r10, [ttab, #i + 64] >> + ldr r11, [ttab, #i + 96] >> + .set i, i + 128 >> + .endr >> + >> tst rounds, #2 >> bne 1f >> >> @@ -154,6 +183,8 @@ >> 2: __adrl ttab, \ltab >> \round r4, r5, r6, r7, r8, r9, r10, r11, \bsz, b >> >> + ldr r0, [sp, #-(SVC_REGS_SIZE - S_PC)] // check canary >> + >> #ifdef CONFIG_CPU_BIG_ENDIAN >> __rev r4, r4 >> __rev r5, r5 >> diff --git a/arch/arm/crypto/aes-cipher-glue.c >> b/arch/arm/crypto/aes-cipher-glue.c >> index c222f6e072ad..de8f32121511 100644 >> --- a/arch/arm/crypto/aes-cipher-glue.c >> +++ b/arch/arm/crypto/aes-cipher-glue.c >> @@ -11,28 +11,39 @@ >> >> #include >> #include >> +#include >> #include >> >> -asmlinkage void __aes_arm_encrypt(u32 *rk, int rounds, const u8 *in, u8 >> *out); >> +asmlinkage int __aes_arm_encrypt(u32 *rk, int rounds, const u8 *in, u8 >> *out); >> EXPORT_SYMBOL(__aes_arm_encrypt); >> >> -asmlinkage void __aes_arm_decrypt(u32 *rk, int rounds, const u8 *in, u8 >> *out); >> +asmlinkage int __aes_arm_decrypt(u32 *rk, int rounds, const u8 *in, u8 >> *out); >> EXPORT_SYMBOL(__aes_arm_decrypt); >> >> static void aes_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) >> { >> struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); >> int rounds = 6 + ctx->key_length / 4; >> + u8 buf[AES_BLOCK_SIZE]; >> >> - __aes_arm_encrypt(ctx->key_enc, rounds, in, out); >> + if (out == in) >> + in = memcpy(buf, in, AES_BLOCK_SIZE); >> + >> + while (unlikely(__aes_arm_encrypt(ctx->key_enc, rounds, in, out))) >> + cpu_relax(); >> } >> >> static void aes_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) >> { >> struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); >> int rounds = 6 + ctx->key_length / 4; >> + u8 buf[AES_BLOCK_SIZE]; >> + >> + if (out == in) >> + in = memcpy(buf, in, AES_BLOCK_SIZE); >> >> - __aes_arm_decrypt(ctx->key_dec, rounds, in, out); >> + while (unlikely(__aes_arm_decrypt(ctx->key_dec, rounds, in, out))) >> + cpu_relax(); >> } >> >> static struct crypto_alg aes_alg = { > > It's an interesting
Re: [PATCH v3 2/2] crypto: arm/aes - add some hardening against cache-timing attacks
On Fri, Oct 19, 2018 at 05:54:12PM +0800, Ard Biesheuvel wrote: > On 19 October 2018 at 13:41, Ard Biesheuvel wrote: > > On 18 October 2018 at 12:37, Eric Biggers wrote: > >> From: Eric Biggers > >> > >> Make the ARM scalar AES implementation closer to constant-time by > >> disabling interrupts and prefetching the tables into L1 cache. This is > >> feasible because due to ARM's "free" rotations, the main tables are only > >> 1024 bytes instead of the usual 4096 used by most AES implementations. > >> > >> On ARM Cortex-A7, the speed loss is only about 5%. The resulting code > >> is still over twice as fast as aes_ti.c. Responsiveness is potentially > >> a concern, but interrupts are only disabled for a single AES block. > >> > > > > So that would be in the order of 700 cycles, based on the numbers you > > shared in v1 of the aes_ti.c patch. Does that sound about right? So > > that would be around 1 microsecond, which is really not a number to > > obsess about imo. > > > > I considered another option, which is to detect whether an interrupt > > has been taken (by writing some canary value below that stack pointer > > in the location where the exception handler will preserve the value of > > sp, and checking at the end whether it has been modified) and doing a > > usleep_range(x, y) if that is the case. > > > > But this is much simpler so let's only go there if we must. > > > > I played around a bit and implemented it for discussion purposes, but > restarting the operation if it gets interrupted, as suggested in the > paper (whitespace corruption courtesy of Gmail) > > > diff --git a/arch/arm/crypto/aes-cipher-core.S > b/arch/arm/crypto/aes-cipher-core.S > index 184d6c2d15d5..2e8a84a47784 100644 > --- a/arch/arm/crypto/aes-cipher-core.S > +++ b/arch/arm/crypto/aes-cipher-core.S > @@ -10,6 +10,7 @@ > */ > > #include > +#include > #include > > .text > @@ -139,6 +140,34 @@ > > __adrl ttab, \ttab > > + /* > + * Set a canary that will allow us to tell whether any > + * interrupts were taken while this function was executing. > + * The zero value will be overwritten with the process counter > + * value at the point where the IRQ exception is taken. > + */ > + mov t0, #0 > + str t0, [sp, #-(SVC_REGS_SIZE - S_PC)] > + > + /* > + * Prefetch the 1024-byte 'ft' or 'it' table into L1 cache, > + * assuming cacheline size >= 32. This is a hardening measure > + * intended to make cache-timing attacks more difficult. > + * They may not be fully prevented, however; see the paper > + * https://cr.yp.to/antiforgery/cachetiming-20050414.pdf > + * ("Cache-timing attacks on AES") for a discussion of the many > + * difficulties involved in writing truly constant-time AES > + * software. 
> + */ > + .set i, 0 > + .rept 1024 / 128 > + ldr r8, [ttab, #i + 0] > + ldr r9, [ttab, #i + 32] > + ldr r10, [ttab, #i + 64] > + ldr r11, [ttab, #i + 96] > + .set i, i + 128 > + .endr > + > tst rounds, #2 > bne 1f > > @@ -154,6 +183,8 @@ > 2: __adrl ttab, \ltab > \round r4, r5, r6, r7, r8, r9, r10, r11, \bsz, b > > + ldr r0, [sp, #-(SVC_REGS_SIZE - S_PC)] // check canary > + > #ifdef CONFIG_CPU_BIG_ENDIAN > __rev r4, r4 > __rev r5, r5 > diff --git a/arch/arm/crypto/aes-cipher-glue.c > b/arch/arm/crypto/aes-cipher-glue.c > index c222f6e072ad..de8f32121511 100644 > --- a/arch/arm/crypto/aes-cipher-glue.c > +++ b/arch/arm/crypto/aes-cipher-glue.c > @@ -11,28 +11,39 @@ > > #include > #include > +#include > #include > > -asmlinkage void __aes_arm_encrypt(u32 *rk, int rounds, const u8 *in, u8 > *out); > +asmlinkage int __aes_arm_encrypt(u32 *rk, int rounds, const u8 *in, u8 *out); > EXPORT_SYMBOL(__aes_arm_encrypt); > > -asmlinkage void __aes_arm_decrypt(u32 *rk, int rounds, const u8 *in, u8 > *out); > +asmlinkage int __aes_arm_decrypt(u32 *rk, int rounds, const u8 *in, u8 *out); > EXPORT_SYMBOL(__aes_arm_decrypt); > > static void aes_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) > { > struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); > int rounds = 6 + ctx->key_length / 4; > + u8 buf[AES_BLOCK_SIZE]; > > - __aes_arm_encrypt(ctx->key_enc, rounds, in, out); > + if (out == in) > + in = memcpy(buf, in, AES_BLOCK_SIZE); > + > + while (unlikely(__aes_arm_encrypt(ctx->key_enc, rounds, in, out))) > + cpu_relax(); > } > > static void aes_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) > { > struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); > int rounds = 6 + ctx->key_length / 4; > + u8 buf[AES_BLOCK_SIZE]; > + > + if (out == in) > + in = memcpy(buf, in, AES_BLOCK_SIZE); > > - __aes_arm_decrypt(ctx->key_dec, rounds, in, out); > + while (unlikely(__aes_arm_decrypt(ctx->key_dec, rounds, in, out))) > + cpu_relax(); > } > > static struct crypto_alg aes_alg = { It's an interesting idea, but the main thing I don't like about this is that the time it takes to do the encryption/decryption is unbounded, since it could get livelocked with a high rate of interrupts. To fix
Re: [PATCH v3 2/2] crypto: arm/aes - add some hardening against cache-timing attacks
On Fri, Oct 19, 2018 at 01:41:35PM +0800, Ard Biesheuvel wrote: > On 18 October 2018 at 12:37, Eric Biggers wrote: > > From: Eric Biggers > > > > Make the ARM scalar AES implementation closer to constant-time by > > disabling interrupts and prefetching the tables into L1 cache. This is > > feasible because due to ARM's "free" rotations, the main tables are only > > 1024 bytes instead of the usual 4096 used by most AES implementations. > > > > On ARM Cortex-A7, the speed loss is only about 5%. The resulting code > > is still over twice as fast as aes_ti.c. Responsiveness is potentially > > a concern, but interrupts are only disabled for a single AES block. > > > > So that would be in the order of 700 cycles, based on the numbers you > shared in v1 of the aes_ti.c patch. Does that sound about right? So > that would be around 1 microsecond, which is really not a number to > obsess about imo. > Correct, on ARM Cortex-A7 I'm seeing slightly over 700 cycles per block encrypted or decrypted, including the prefetching. - Eric
Re: [PATCH v3 2/2] crypto: arm/aes - add some hardening against cache-timing attacks
On 19 October 2018 at 13:41, Ard Biesheuvel wrote: > On 18 October 2018 at 12:37, Eric Biggers wrote: >> From: Eric Biggers >> >> Make the ARM scalar AES implementation closer to constant-time by >> disabling interrupts and prefetching the tables into L1 cache. This is >> feasible because due to ARM's "free" rotations, the main tables are only >> 1024 bytes instead of the usual 4096 used by most AES implementations. >> >> On ARM Cortex-A7, the speed loss is only about 5%. The resulting code >> is still over twice as fast as aes_ti.c. Responsiveness is potentially >> a concern, but interrupts are only disabled for a single AES block. >> > > So that would be in the order of 700 cycles, based on the numbers you > shared in v1 of the aes_ti.c patch. Does that sound about right? So > that would be around 1 microsecond, which is really not a number to > obsess about imo. > > I considered another option, which is to detect whether an interrupt > has been taken (by writing some canary value below that stack pointer > in the location where the exception handler will preserve the value of > sp, and checking at the end whether it has been modified) and doing a > usleep_range(x, y) if that is the case. > > But this is much simpler so let's only go there if we must. > I played around a bit and implemented it for discussion purposes, but restarting the operation if it gets interrupted, as suggested in the paper (whitespace corruption courtesy of Gmail) diff --git a/arch/arm/crypto/aes-cipher-core.S b/arch/arm/crypto/aes-cipher-core.S index 184d6c2d15d5..2e8a84a47784 100644 --- a/arch/arm/crypto/aes-cipher-core.S +++ b/arch/arm/crypto/aes-cipher-core.S @@ -10,6 +10,7 @@ */ #include +#include #include .text @@ -139,6 +140,34 @@ __adrl ttab, \ttab + /* + * Set a canary that will allow us to tell whether any + * interrupts were taken while this function was executing. + * The zero value will be overwritten with the process counter + * value at the point where the IRQ exception is taken. + */ + mov t0, #0 + str t0, [sp, #-(SVC_REGS_SIZE - S_PC)] + + /* + * Prefetch the 1024-byte 'ft' or 'it' table into L1 cache, + * assuming cacheline size >= 32. This is a hardening measure + * intended to make cache-timing attacks more difficult. + * They may not be fully prevented, however; see the paper + * https://cr.yp.to/antiforgery/cachetiming-20050414.pdf + * ("Cache-timing attacks on AES") for a discussion of the many + * difficulties involved in writing truly constant-time AES + * software. 
+ */ + .set i, 0 + .rept 1024 / 128 + ldr r8, [ttab, #i + 0] + ldr r9, [ttab, #i + 32] + ldr r10, [ttab, #i + 64] + ldr r11, [ttab, #i + 96] + .set i, i + 128 + .endr + tst rounds, #2 bne 1f @@ -154,6 +183,8 @@ 2: __adrl ttab, \ltab \round r4, r5, r6, r7, r8, r9, r10, r11, \bsz, b + ldr r0, [sp, #-(SVC_REGS_SIZE - S_PC)] // check canary + #ifdef CONFIG_CPU_BIG_ENDIAN __rev r4, r4 __rev r5, r5 diff --git a/arch/arm/crypto/aes-cipher-glue.c b/arch/arm/crypto/aes-cipher-glue.c index c222f6e072ad..de8f32121511 100644 --- a/arch/arm/crypto/aes-cipher-glue.c +++ b/arch/arm/crypto/aes-cipher-glue.c @@ -11,28 +11,39 @@ #include #include +#include #include -asmlinkage void __aes_arm_encrypt(u32 *rk, int rounds, const u8 *in, u8 *out); +asmlinkage int __aes_arm_encrypt(u32 *rk, int rounds, const u8 *in, u8 *out); EXPORT_SYMBOL(__aes_arm_encrypt); -asmlinkage void __aes_arm_decrypt(u32 *rk, int rounds, const u8 *in, u8 *out); +asmlinkage int __aes_arm_decrypt(u32 *rk, int rounds, const u8 *in, u8 *out); EXPORT_SYMBOL(__aes_arm_decrypt); static void aes_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) { struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); int rounds = 6 + ctx->key_length / 4; + u8 buf[AES_BLOCK_SIZE]; - __aes_arm_encrypt(ctx->key_enc, rounds, in, out); + if (out == in) + in = memcpy(buf, in, AES_BLOCK_SIZE); + + while (unlikely(__aes_arm_encrypt(ctx->key_enc, rounds, in, out))) + cpu_relax(); } static void aes_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) { struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); int rounds = 6 + ctx->key_length / 4; + u8 buf[AES_BLOCK_SIZE]; + + if (out == in) + in = memcpy(buf, in, AES_BLOCK_SIZE); - __aes_arm_decrypt(ctx->key_dec, rounds, in, out); + while (unlikely(__aes_arm_decrypt(ctx->key_dec, rounds, in, out))) + cpu_relax(); } static struct crypto_alg aes_alg = {
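One way to reconcile this with the unbounded-livelock concern raised elsewhere in the thread would be to cap the number of canary-triggered restarts and then fall back to a single IRQs-off attempt. A hypothetical sketch on top of the int-returning __aes_arm_encrypt() prototype above (the retry cap and the fallback are illustrative, not part of the posted diff):

static void aes_encrypt_bounded(struct crypto_aes_ctx *ctx, int rounds,
				const u8 *in, u8 *out)
{
	unsigned long flags;
	int tries;

	for (tries = 0; tries < 3; tries++)
		if (!__aes_arm_encrypt(ctx->key_enc, rounds, in, out))
			return;		/* canary untouched: no interrupt taken */

	/* Heavily interrupted: one final attempt that cannot be preempted. */
	local_irq_save(flags);
	__aes_arm_encrypt(ctx->key_enc, rounds, in, out);
	local_irq_restore(flags);
}

This keeps the common case interrupt-friendly while bounding the worst-case latency at roughly four blocks' worth of work.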
Re: [PATCH v3 2/2] crypto: arm/aes - add some hardening against cache-timing attacks
On 18 October 2018 at 12:37, Eric Biggers wrote: > From: Eric Biggers > > Make the ARM scalar AES implementation closer to constant-time by > disabling interrupts and prefetching the tables into L1 cache. This is > feasible because due to ARM's "free" rotations, the main tables are only > 1024 bytes instead of the usual 4096 used by most AES implementations. > > On ARM Cortex-A7, the speed loss is only about 5%. The resulting code > is still over twice as fast as aes_ti.c. Responsiveness is potentially > a concern, but interrupts are only disabled for a single AES block. > So that would be in the order of 700 cycles, based on the numbers you shared in v1 of the aes_ti.c patch. Does that sound about right? So that would be around 1 microsecond, which is really not a number to obsess about imo. I considered another option, which is to detect whether an interrupt has been taken (by writing some canary value below that stack pointer in the location where the exception handler will preserve the value of sp, and checking at the end whether it has been modified) and doing a usleep_range(x, y) if that is the case. But this is much simpler so let's only go there if we must. > Note that even after these changes, the implementation still isn't > necessarily guaranteed to be constant-time; see > https://cr.yp.to/antiforgery/cachetiming-20050414.pdf for a discussion > of the many difficulties involved in writing truly constant-time AES > software. But it's valuable to make such attacks more difficult. > > Much of this patch is based on patches suggested by Ard Biesheuvel. > > Suggested-by: Ard Biesheuvel > Signed-off-by: Eric Biggers Reviewed-by: Ard Biesheuvel > --- > arch/arm/crypto/Kconfig | 9 + > arch/arm/crypto/aes-cipher-core.S | 62 ++- > crypto/aes_generic.c | 9 +++-- > 3 files changed, 66 insertions(+), 14 deletions(-) > > diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig > index ef0c7feea6e29..0473a8f683896 100644 > --- a/arch/arm/crypto/Kconfig > +++ b/arch/arm/crypto/Kconfig > @@ -69,6 +69,15 @@ config CRYPTO_AES_ARM > help > Use optimized AES assembler routines for ARM platforms. > > + On ARM processors without the Crypto Extensions, this is the > + fastest AES implementation for single blocks. For multiple > + blocks, the NEON bit-sliced implementation is usually faster. > + > + This implementation may be vulnerable to cache timing attacks, > + since it uses lookup tables. However, as countermeasures it > + disables IRQs and preloads the tables; it is hoped this makes > + such attacks very difficult. > + > config CRYPTO_AES_ARM_BS > tristate "Bit sliced AES using NEON instructions" > depends on KERNEL_MODE_NEON > diff --git a/arch/arm/crypto/aes-cipher-core.S > b/arch/arm/crypto/aes-cipher-core.S > index 184d6c2d15d5e..f2d67c095e596 100644 > --- a/arch/arm/crypto/aes-cipher-core.S > +++ b/arch/arm/crypto/aes-cipher-core.S > @@ -10,6 +10,7 @@ > */ > > #include > +#include > #include > > .text > @@ -41,7 +42,7 @@ > .endif > .endm > > - .macro __hround, out0, out1, in0, in1, in2, in3, t3, t4, > enc, sz, op > + .macro __hround, out0, out1, in0, in1, in2, in3, t3, t4, > enc, sz, op, oldcpsr > __select\out0, \in0, 0 > __selectt0, \in1, 1 > __load \out0, \out0, 0, \sz, \op > @@ -73,6 +74,14 @@ > __load t0, t0, 3, \sz, \op > __load \t4, \t4, 3, \sz, \op > > + .ifnb \oldcpsr > + /* > +* This is the final round and we're done with all data-dependent > table > +* lookups, so we can safely re-enable interrupts. 
> +*/ > + restore_irqs\oldcpsr > + .endif > + > eor \out1, \out1, t1, ror #24 > eor \out0, \out0, t2, ror #16 > ldm rk!, {t1, t2} > @@ -83,14 +92,14 @@ > eor \out1, \out1, t2 > .endm > > - .macro fround, out0, out1, out2, out3, in0, in1, in2, in3, > sz=2, op > + .macro fround, out0, out1, out2, out3, in0, in1, in2, in3, > sz=2, op, oldcpsr > __hround\out0, \out1, \in0, \in1, \in2, \in3, \out2, \out3, > 1, \sz, \op > - __hround\out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1, > \sz, \op > + __hround\out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1, > \sz, \op, \oldcpsr > .endm > > - .macro iround, out0,
Re: [PATCH v2 1/2] crypto: aes_ti - disable interrupts while accessing S-box
Hi Eric, On 17 October 2018 at 14:18, Eric Biggers wrote: > From: Eric Biggers > > In the "aes-fixed-time" AES implementation, disable interrupts while > accessing the S-box, in order to make cache-timing attacks more > difficult. Previously it was possible for the CPU to be interrupted > while the S-box was loaded into L1 cache, potentially evicting the > cachelines and causing later table lookups to be time-variant. > > In tests I did on x86 and ARM, this doesn't affect performance > significantly. Responsiveness is potentially a concern, but interrupts > are only disabled for a single AES block. > > Note that even after this change, the implementation still isn't > necessarily guaranteed to be constant-time; see > https://cr.yp.to/antiforgery/cachetiming-20050414.pdf for a discussion > of the many difficulties involved in writing truly constant-time AES > software. But it's valuable to make such attacks more difficult. > > Signed-off-by: Eric Biggers Thanks for taking a look. Could we add something to the Kconfig blurb that mentions that it runs the algorithm witn interrupts disabled? In any case, Reviewed-by: Ard Biesheuvel > --- > crypto/aes_ti.c | 18 ++ > 1 file changed, 18 insertions(+) > > diff --git a/crypto/aes_ti.c b/crypto/aes_ti.c > index 03023b2290e8..1ff9785b30f5 100644 > --- a/crypto/aes_ti.c > +++ b/crypto/aes_ti.c > @@ -269,6 +269,7 @@ static void aesti_encrypt(struct crypto_tfm *tfm, u8 > *out, const u8 *in) > const u32 *rkp = ctx->key_enc + 4; > int rounds = 6 + ctx->key_length / 4; > u32 st0[4], st1[4]; > + unsigned long flags; > int round; > > st0[0] = ctx->key_enc[0] ^ get_unaligned_le32(in); > @@ -276,6 +277,12 @@ static void aesti_encrypt(struct crypto_tfm *tfm, u8 > *out, const u8 *in) > st0[2] = ctx->key_enc[2] ^ get_unaligned_le32(in + 8); > st0[3] = ctx->key_enc[3] ^ get_unaligned_le32(in + 12); > > + /* > +* Temporarily disable interrupts to avoid races where cachelines are > +* evicted when the CPU is interrupted to do something else. > +*/ > + local_irq_save(flags); > + > st0[0] ^= __aesti_sbox[ 0] ^ __aesti_sbox[128]; > st0[1] ^= __aesti_sbox[32] ^ __aesti_sbox[160]; > st0[2] ^= __aesti_sbox[64] ^ __aesti_sbox[192]; > @@ -300,6 +307,8 @@ static void aesti_encrypt(struct crypto_tfm *tfm, u8 > *out, const u8 *in) > put_unaligned_le32(subshift(st1, 1) ^ rkp[5], out + 4); > put_unaligned_le32(subshift(st1, 2) ^ rkp[6], out + 8); > put_unaligned_le32(subshift(st1, 3) ^ rkp[7], out + 12); > + > + local_irq_restore(flags); > } > > static void aesti_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) > @@ -308,6 +317,7 @@ static void aesti_decrypt(struct crypto_tfm *tfm, u8 > *out, const u8 *in) > const u32 *rkp = ctx->key_dec + 4; > int rounds = 6 + ctx->key_length / 4; > u32 st0[4], st1[4]; > + unsigned long flags; > int round; > > st0[0] = ctx->key_dec[0] ^ get_unaligned_le32(in); > @@ -315,6 +325,12 @@ static void aesti_decrypt(struct crypto_tfm *tfm, u8 > *out, const u8 *in) > st0[2] = ctx->key_dec[2] ^ get_unaligned_le32(in + 8); > st0[3] = ctx->key_dec[3] ^ get_unaligned_le32(in + 12); > > + /* > +* Temporarily disable interrupts to avoid races where cachelines are > +* evicted when the CPU is interrupted to do something else. 
> +*/ > + local_irq_save(flags); > + > st0[0] ^= __aesti_inv_sbox[ 0] ^ __aesti_inv_sbox[128]; > st0[1] ^= __aesti_inv_sbox[32] ^ __aesti_inv_sbox[160]; > st0[2] ^= __aesti_inv_sbox[64] ^ __aesti_inv_sbox[192]; > @@ -339,6 +355,8 @@ static void aesti_decrypt(struct crypto_tfm *tfm, u8 > *out, const u8 *in) > put_unaligned_le32(inv_subshift(st1, 1) ^ rkp[5], out + 4); > put_unaligned_le32(inv_subshift(st1, 2) ^ rkp[6], out + 8); > put_unaligned_le32(inv_subshift(st1, 3) ^ rkp[7], out + 12); > + > + local_irq_restore(flags); > } > > static struct crypto_alg aes_alg = { > -- > 2.19.1 >
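For reference, the kind of Kconfig wording being asked for might read roughly as follows (a sketch, not necessarily the text that ended up being merged):

config CRYPTO_AES_TI
	tristate "Fixed time AES cipher"
	select CRYPTO_ALGAPI
	select CRYPTO_AES
	help
	  This is a generic implementation of AES that attempts to
	  eliminate data-dependent latencies as much as possible.

	  As a countermeasure against cache-timing attacks, this
	  implementation runs with interrupts disabled while the lookup
	  tables are in use, so that they cannot be evicted from the L1
	  cache mid-operation. Interrupts are only disabled for the
	  duration of a single AES block.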
Re: [PATCH v2 2/2] crypto: arm/aes - add some hardening against cache-timing attacks
Hi Eric, Thanks for looking into this. On 17 October 2018 at 14:18, Eric Biggers wrote: > From: Eric Biggers > > Make the ARM scalar AES implementation closer to constant-time by > disabling interrupts and prefetching the tables into L1 cache. This is > feasible because due to ARM's "free" rotations, the main tables are only > 1024 bytes instead of the usual 4096 used by most AES implementations. > > On ARM Cortex-A7, the speed loss is only about 5%. The resulting > implementation is still over twice as fast as aes_ti.c. > > Note that even after these changes, the implementation still isn't > necessarily guaranteed to be constant-time; see > https://cr.yp.to/antiforgery/cachetiming-20050414.pdf for a discussion > of the many difficulties involved in writing truly constant-time AES > software. But it's valuable to make such attacks more difficult. > > Suggested-by: Ard Biesheuvel > Signed-off-by: Eric Biggers > --- > arch/arm/crypto/aes-cipher-core.S | 26 ++ > arch/arm/crypto/aes-cipher-glue.c | 13 + > crypto/aes_generic.c | 9 + > 3 files changed, 44 insertions(+), 4 deletions(-) > > diff --git a/arch/arm/crypto/aes-cipher-core.S > b/arch/arm/crypto/aes-cipher-core.S > index 184d6c2d15d5..ba9d4aefe585 100644 > --- a/arch/arm/crypto/aes-cipher-core.S > +++ b/arch/arm/crypto/aes-cipher-core.S > @@ -138,6 +138,23 @@ > eor r7, r7, r11 > > __adrl ttab, \ttab > + /* > +* Prefetch the 1024-byte 'ft' or 'it' table into L1 cache, assuming > +* cacheline size >= 32. This, along with the caller disabling > +* interrupts, is a hardening measure intended to make cache-timing > +* attacks more difficult. They may not be fully prevented, however; > +* see the paper https://cr.yp.to/antiforgery/cachetiming-20050414.pdf > +* ("Cache-timing attacks on AES") for a discussion of the many > +* difficulties involved in writing truly constant-time AES software. > +*/ > + .set i, 0 > +.rept 1024 / 128 > + ldr r8, [ttab, #i + 0] > + ldr r9, [ttab, #i + 32] > + ldr r10, [ttab, #i + 64] > + ldr r11, [ttab, #i + 96] > + .set i, i + 128 > +.endr > Mind sticking a bit more to the coding style of the file? I.e., indent the gas directives with the code, and putting two tabs after them > tst rounds, #2 > bne 1f > @@ -152,6 +169,15 @@ > b 0b > > 2: __adrl ttab, \ltab > +.if \bsz == 0 > + /* Prefetch the 256-byte inverse S-box; see explanation above */ > + .set i, 0 > +.rept 256 / 64 > + ldr t0, [ttab, #i + 0] > + ldr t1, [ttab, #i + 32] > + .set i, i + 64 > +.endr > +.endif > \round r4, r5, r6, r7, r8, r9, r10, r11, \bsz, b > > #ifdef CONFIG_CPU_BIG_ENDIAN > diff --git a/arch/arm/crypto/aes-cipher-glue.c > b/arch/arm/crypto/aes-cipher-glue.c > index c222f6e072ad..f40e35eb22e4 100644 > --- a/arch/arm/crypto/aes-cipher-glue.c > +++ b/arch/arm/crypto/aes-cipher-glue.c > @@ -23,16 +23,29 @@ static void aes_encrypt(struct crypto_tfm *tfm, u8 *out, > const u8 *in) > { > struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); > int rounds = 6 + ctx->key_length / 4; > + unsigned long flags; > > + /* > +* This AES implementation prefetches the lookup table into L1 cache > to > +* try to make timing attacks on the table lookups more difficult. > +* Temporarily disable interrupts to avoid races where cachelines are > +* evicted when the CPU is interrupted to do something else. > +*/ > + local_irq_save(flags); > __aes_arm_encrypt(ctx->key_enc, rounds, in, out); > + local_irq_restore(flags); > } > Disabling interrupts like that is going to raise some eyebrows, so I'd prefer it if we can reduce the scope of the IRQ blackout as much as we can. 
This means we should move the IRQ en/disable into the .S file, and only let it cover the parts of the code where we are actually doing table lookups that are indexed by the input. > static void aes_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) > { > struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); > int rounds = 6 + ctx->key_length / 4; > + unsigned long flags; > > + /* Disable interrupts to help mitigate timing attacks, see above */ > + local_irq_save(flags); > __aes_arm_decrypt(ctx->key_dec, rounds, in, out); > + local_irq_restore(flags); > } > > static struct crypto_alg aes_alg = { > diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c > index ca554d57d01e..13df33aca463 100644 > --- a/crypto/aes_generic.c > +++ b/crypto/aes_generic.c > @@ -63,7 +63,8 @@ static inline u8 byte(const u32 x, const unsigned n) > > static const u32 rco_tab[10] = { 1, 2, 4,
Re: [PATCH] crypto: caam - add SPDX license identifier to all files
On Wed, Oct 10, 2018 at 02:26:48PM +0300, Horia Geantă wrote: > Previously, a tree-wide change added SPDX license identifiers to > files lacking licensing information: > b24413180f56 ("License cleanup: add SPDX GPL-2.0 license identifier to files > with no license") > > To be consistent update the rest of the files: > -files with license specified by means of MODULE_LICENSE() > -files with complete license text > -Kconfig > > Signed-off-by: Horia Geantă > --- > drivers/crypto/caam/Kconfig| 1 + > drivers/crypto/caam/caamalg.c | 1 + > drivers/crypto/caam/caamalg_desc.c | 1 + > drivers/crypto/caam/caamalg_qi.c | 1 + > drivers/crypto/caam/caamhash.c | 1 + > drivers/crypto/caam/caampkc.c | 1 + > drivers/crypto/caam/caamrng.c | 1 + > drivers/crypto/caam/ctrl.c | 1 + > drivers/crypto/caam/jr.c | 1 + > drivers/crypto/caam/sg_sw_qm.h | 29 + > drivers/crypto/caam/sg_sw_qm2.h| 30 +- > 11 files changed, 11 insertions(+), 57 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: aes_ti - disable interrupts while accessing sbox
Hi Ard, On Thu, Oct 04, 2018 at 08:55:14AM +0200, Ard Biesheuvel wrote: > Hi Eric, > > On 4 October 2018 at 06:07, Eric Biggers wrote: > > From: Eric Biggers > > > > The generic constant-time AES implementation is supposed to preload the > > AES S-box into the CPU's L1 data cache. But, an interrupt handler can > > run on the CPU and muck with the cache. Worse, on preemptible kernels > > the process can even be preempted and moved to a different CPU. So the > > implementation may actually still be vulnerable to cache-timing attacks. > > > > Make it more robust by disabling interrupts while the sbox is used. > > > > In some quick tests on x86 and ARM, this doesn't affect performance > > significantly. Responsiveness is also a concern, but interrupts are > > only disabled for a single AES block which even on ARM Cortex-A7 is > > "only" ~1500 cycles to encrypt or ~2600 cycles to decrypt. > > > > I share your concern, but that is quite a big hammer. > > Also, does it really take ~100 cycles per byte? That is terrible :-) > > Given that the full lookup table is only 1024 bytes (or 1024+256 bytes > for decryption), I wonder if something like the below would be a > better option for A7 (apologies for the mangled whitespace) > > diff --git a/arch/arm/crypto/aes-cipher-core.S > b/arch/arm/crypto/aes-cipher-core.S > index 184d6c2d15d5..83e893f7e581 100644 > --- a/arch/arm/crypto/aes-cipher-core.S > +++ b/arch/arm/crypto/aes-cipher-core.S > @@ -139,6 +139,13 @@ > > __adrl ttab, \ttab > > + .irpc r, 01234567 > + ldr r8, [ttab, #(32 * \r)] > + ldr r9, [ttab, #(32 * \r) + 256] > + ldr r10, [ttab, #(32 * \r) + 512] > + ldr r11, [ttab, #(32 * \r) + 768] > + .endr > + > tst rounds, #2 > bne 1f > > @@ -180,6 +187,12 @@ ENDPROC(__aes_arm_encrypt) > > .align 5 > ENTRY(__aes_arm_decrypt) > + __adrl ttab, __aes_arm_inverse_sbox > + > + .irpc r, 01234567 > + ldr r8, [ttab, #(32 * \r)] > + .endr > + > do_crypt iround, crypto_it_tab, __aes_arm_inverse_sbox, 0 > ENDPROC(__aes_arm_decrypt) > > diff --git a/arch/arm/crypto/aes-cipher-glue.c > b/arch/arm/crypto/aes-cipher-glue.c > index c222f6e072ad..630e1a436f1d 100644 > --- a/arch/arm/crypto/aes-cipher-glue.c > +++ b/arch/arm/crypto/aes-cipher-glue.c > @@ -23,16 +23,22 @@ static void aes_encrypt(struct crypto_tfm *tfm, u8 > *out, const u8 *in) > { > struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); > int rounds = 6 + ctx->key_length / 4; > + unsigned long flags; > > + local_irq_save(flags); > __aes_arm_encrypt(ctx->key_enc, rounds, in, out); > + local_irq_restore(flags); > } > > static void aes_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) > { > struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); > int rounds = 6 + ctx->key_length / 4; > + unsigned long flags; > > + local_irq_save(flags); > __aes_arm_decrypt(ctx->key_dec, rounds, in, out); > + local_irq_restore(flags); > } > > static struct crypto_alg aes_alg = { > diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c > index ca554d57d01e..82fa860c9cb9 100644 > --- a/crypto/aes_generic.c > +++ b/crypto/aes_generic.c > @@ -63,7 +63,7 @@ static inline u8 byte(const u32 x, const unsigned n) > > static const u32 rco_tab[10] = { 1, 2, 4, 8, 16, 32, 64, 128, 27, 54 }; > > -__visible const u32 crypto_ft_tab[4][256] = { > +__visible const u32 crypto_ft_tab[4][256] __cacheline_aligned = { > { > 0xa56363c6, 0x847c7cf8, 0x99ee, 0x8d7b7bf6, > 0x0df2f2ff, 0xbd6b6bd6, 0xb16f6fde, 0x54c5c591, > @@ -327,7 +327,7 @@ __visible const u32 crypto_ft_tab[4][256] = { > } > }; > > -__visible const u32 crypto_fl_tab[4][256] = { > 
+__visible const u32 crypto_fl_tab[4][256] __cacheline_aligned = { > { > 0x0063, 0x007c, 0x0077, 0x007b, > 0x00f2, 0x006b, 0x006f, 0x00c5, > @@ -591,7 +591,7 @@ __visible const u32 crypto_fl_tab[4][256] = { > } > }; > > -__visible const u32 crypto_it_tab[4][256] = { > +__visible const u32 crypto_it_tab[4][256] __cacheline_aligned = { > { > 0x50a7f451, 0x5365417e, 0xc3a4171a, 0x965e273a, > 0xcb6bab3b, 0xf1459d1f, 0xab58faac, 0x9303e34b, > @@ -855,7 +855,7 @@ __visible const u32 crypto_it_tab[4][256] = { > } > }; > > -__visible const u32 crypto_il_tab[4][256] = { > +__visible const u32 crypto_il_tab[4][256] __cacheline_aligned = { > { > 0x0052, 0x0009, 0x006a, 0x00d5, > 0x0030, 0x0036, 0x00a5, 0x0038, > Thanks for the suggestion -- this turns out to work pretty well. At least in a microbenchmark, loading the larger table only slows it down by around 5%, so it's still around 2-3 times faster than aes_ti.c on ARM Cortex-A7. It also noticeably improves Adiantum performance (Adiantum-XChaCha12-AES): Before (cpb)After (cpb) Adiantum (enc), size=4096: 10.91 10.58 Adiantum (dec), size=4096: 11.04 10.62 Adiantum (enc), size=512:18.07 15.57 Adiantum (dec), size=512:19.10 15.83 (Those
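The same preloading trick can be written portably in C for the generic tables; a kernel-style sketch (the OPTIMIZER_HIDE_VAR() barrier keeps the compiler from discarding the otherwise-dead loads):

/* Pull a lookup table into L1 by touching one word per 32-byte line. */
static void preload_table(const u32 *tab, unsigned int bytes)
{
	u32 acc = 0;
	unsigned int i;

	for (i = 0; i < bytes / sizeof(u32); i += 8)	/* 8 x u32 = 32 bytes */
		acc |= tab[i];
	OPTIMIZER_HIDE_VAR(acc);			/* keep the loads alive */
}

As with the assembly version, this assumes a cacheline size of at least 32 bytes and only narrows the attack window; it is not a constant-time guarantee.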
Re: [bug report] crypto: user - Implement a generic crypto statistics
On Thu, Oct 11, 2018 at 02:10:42PM +0300, Dan Carpenter wrote: > Hello Corentin Labbe, > > The patch cac5818c25d0: "crypto: user - Implement a generic crypto > statistics" from Sep 19, 2018, leads to the following static checker > warning: > > crypto/crypto_user_stat.c:53 crypto_report_aead() > warn: check that 'raead' doesn't leak information (struct has holes) > > crypto/crypto_user_stat.c > 34 static int crypto_report_aead(struct sk_buff *skb, struct crypto_alg *alg) > 35 { > 36 struct crypto_stat raead; > ^^^^^ > 37 u64 v64; > 38 u32 v32; > 39 > 40 strncpy(raead.type, "aead", sizeof(raead.type)); > 41 > 42 v32 = atomic_read(&alg->encrypt_cnt); > 43 raead.stat_encrypt_cnt = v32; > 44 v64 = atomic64_read(&alg->encrypt_tlen); > 45 raead.stat_encrypt_tlen = v64; > 46 v32 = atomic_read(&alg->decrypt_cnt); > 47 raead.stat_decrypt_cnt = v32; > 48 v64 = atomic64_read(&alg->decrypt_tlen); > 49 raead.stat_decrypt_tlen = v64; > 50 v32 = atomic_read(&alg->aead_err_cnt); > 51 raead.stat_aead_err_cnt = v32; > 52 > 53 if (nla_put(skb, CRYPTOCFGA_STAT_AEAD, > 54 sizeof(struct crypto_stat), &raead)) > ^^^^^^ > We haven't totally initialized raead and also it apparently has struct > holes after the u64s. > > 55 goto nla_put_failure; > 56 return 0; > 57 > 58 nla_put_failure: > 59 return -EMSGSIZE; > 60 } > > See also: > > crypto/crypto_user_stat.c:53 crypto_report_aead() warn: check that 'raead' doesn't leak information (struct has holes) > crypto/crypto_user_stat.c:81 crypto_report_cipher() warn: check that 'rcipher' doesn't leak information (struct has holes) > crypto/crypto_user_stat.c:108 crypto_report_comp() warn: check that 'rcomp' doesn't leak information (struct has holes) > crypto/crypto_user_stat.c:135 crypto_report_acomp() warn: check that 'racomp' doesn't leak information (struct has holes) > crypto/crypto_user_stat.c:166 crypto_report_akcipher() warn: check that 'rakcipher' doesn't leak information (struct has holes) > crypto/crypto_user_stat.c:191 crypto_report_kpp() warn: check that 'rkpp' doesn't leak information (struct has holes) > crypto/crypto_user_stat.c:215 crypto_report_ahash() warn: check that 'rhash' doesn't leak information (struct has holes) > crypto/crypto_user_stat.c:239 crypto_report_shash() warn: check that 'rhash' doesn't leak information (struct has holes) > crypto/crypto_user_stat.c:265 crypto_report_rng() warn: check that 'rrng' doesn't leak information (struct has holes) > crypto/crypto_user_stat.c:295 crypto_reportstat_one() warn: check that 'rl' doesn't leak information (struct has holes) > > regards, > dan carpenter Thanks for the report, I will send a fix soon. Regards
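The usual remedy for this class of report is to zero the whole structure before filling it in, so that padding holes can never carry stack contents to user space. A sketch of what such a fix might look like for the aead case (the eventual upstream fix may differ in detail):

static int crypto_report_aead(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_stat raead;

	memset(&raead, 0, sizeof(raead));	/* clears the padding holes too */

	strncpy(raead.type, "aead", sizeof(raead.type));
	raead.stat_encrypt_cnt = atomic_read(&alg->encrypt_cnt);
	raead.stat_encrypt_tlen = atomic64_read(&alg->encrypt_tlen);
	raead.stat_decrypt_cnt = atomic_read(&alg->decrypt_cnt);
	raead.stat_decrypt_tlen = atomic64_read(&alg->decrypt_tlen);
	raead.stat_aead_err_cnt = atomic_read(&alg->aead_err_cnt);

	if (nla_put(skb, CRYPTOCFGA_STAT_AEAD, sizeof(raead), &raead))
		return -EMSGSIZE;
	return 0;
}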
Re: [PATCH] crypto: arm64/aes-blk - ensure XTS mask is always loaded
On Mon, Oct 08, 2018 at 01:16:59PM +0200, Ard Biesheuvel wrote: > Commit 2e5d2f33d1db ("crypto: arm64/aes-blk - improve XTS mask handling") > optimized away some reloads of the XTS mask vector, but failed to take > into account that calls into the XTS en/decrypt routines will take a > slightly different code path if a single block of input is split across > different buffers. So let's ensure that the first load occurs > unconditionally, and move the reload to the end so it doesn't occur > needlessly. > > Fixes: 2e5d2f33d1db ("crypto: arm64/aes-blk - improve XTS mask handling") > Signed-off-by: Ard Biesheuvel > --- > arch/arm64/crypto/aes-modes.S | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH -next] crypto: mxs-dcp: make symbols 'sha1_null_hash' and 'sha256_null_hash' static
On Thu, Oct 11, 2018 at 01:49:48AM +, Wei Yongjun wrote: > Fixes the following sparse warnings: > > drivers/crypto/mxs-dcp.c:39:15: warning: > symbol 'sha1_null_hash' was not declared. Should it be static? > drivers/crypto/mxs-dcp.c:43:15: warning: > symbol 'sha256_null_hash' was not declared. Should it be static? > > Fixes: c709eebaf5c5 ("crypto: mxs-dcp - Fix SHA null hashes and output > length") > Signed-off-by: Wei Yongjun > --- > drivers/crypto/mxs-dcp.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto/testmgr.c: fix sizeof() on COMP_BUF_SIZE
On Sun, Oct 07, 2018 at 01:58:10PM +0200, Michael Schupikov wrote: > After allocation, output and decomp_output both point to memory chunks of > size COMP_BUF_SIZE. Then, only the first bytes are zeroed out using > sizeof(COMP_BUF_SIZE) as parameter to memset(), because > sizeof(COMP_BUF_SIZE) provides the size of the constant and not the size of > allocated memory. > > Instead, the whole allocated memory is meant to be zeroed out. Use > COMP_BUF_SIZE as parameter to memset() directly in order to accomplish > this. > > Fixes: 336073840a872 ("crypto: testmgr - Allow different compression results") > > Signed-off-by: Michael Schupikov > --- > crypto/testmgr.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
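The bug class is worth a two-line illustration: sizeof applied to an integer constant yields the size of the constant's type, not its value, so with COMP_BUF_SIZE defined as 512 the original code cleared only sizeof(int) bytes:

#define COMP_BUF_SIZE 512

memset(output, 0, sizeof(COMP_BUF_SIZE));	/* zeroes only sizeof(int) == 4 bytes */
memset(output, 0, COMP_BUF_SIZE);		/* zeroes the full 512-byte buffer */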
Re: [PATCH -next] crypto: axis - fix platform_no_drv_owner.cocci warnings
On Fri, Oct 05, 2018 at 06:42:44AM +, YueHaibing wrote: > Remove .owner field if calls are used which set it automatically > Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci > > Signed-off-by: YueHaibing Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH -next] crypto: chtls - remove set but not used variable 'csk'
On Fri, Oct 05, 2018 at 06:43:27AM +, YueHaibing wrote: > Fixes gcc '-Wunused-but-set-variable' warning: > > drivers/crypto/chelsio/chtls/chtls_cm.c: In function 'chtls_disconnect': > drivers/crypto/chelsio/chtls/chtls_cm.c:408:21: warning: > variable 'csk' set but not used [-Wunused-but-set-variable] > > drivers/crypto/chelsio/chtls/chtls_cm.c: In function 'chtls_recv_sock': > drivers/crypto/chelsio/chtls/chtls_cm.c:1016:23: warning: > variable 'tcph' set but not used [-Wunused-but-set-variable] > > 'csk' and 'tcph' are never used since introduce > in commit cc35c88ae4db ("crypto : chtls - CPL handler definition") > > Signed-off-by: YueHaibing > --- > drivers/crypto/chelsio/chtls/chtls_cm.c | 4 > 1 file changed, 4 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 2/3] crypto: crypto_xor - use unaligned accessors for aligned fast path
On 9 October 2018 at 05:47, Eric Biggers wrote: > Hi Ard, > > On Mon, Oct 08, 2018 at 11:15:53PM +0200, Ard Biesheuvel wrote: >> On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS >> because the ordinary load/store instructions (ldr, ldrh, ldrb) can >> tolerate any misalignment of the memory address. However, load/store >> double and load/store multiple instructions (ldrd, ldm) may still only >> be used on memory addresses that are 32-bit aligned, and so we have to >> use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we >> may end up with a severe performance hit due to alignment traps that >> require fixups by the kernel. >> >> Fortunately, the get_unaligned() accessors do the right thing: when >> building for ARMv6 or later, the compiler will emit unaligned accesses >> using the ordinary load/store instructions (but avoid the ones that >> require 32-bit alignment). When building for older ARM, those accessors >> will emit the appropriate sequence of ldrb/mov/orr instructions. And on >> architectures that can truly tolerate any kind of misalignment, the >> get_unaligned() accessors resolve to the leXX_to_cpup accessors that >> operate on aligned addresses. >> >> So switch to the unaligned accessors for the aligned fast path. This >> will create the exact same code on architectures that can really >> tolerate any kind of misalignment, and generate code for ARMv6+ that >> avoids load/store instructions that trigger alignment faults. >> >> Signed-off-by: Ard Biesheuvel >> --- >> crypto/algapi.c | 7 +++ >> include/crypto/algapi.h | 11 +-- >> 2 files changed, 12 insertions(+), 6 deletions(-) >> >> diff --git a/crypto/algapi.c b/crypto/algapi.c >> index 2545c5f89c4c..52ce3c5a0499 100644 >> --- a/crypto/algapi.c >> +++ b/crypto/algapi.c >> @@ -988,11 +988,10 @@ void crypto_inc(u8 *a, unsigned int size) >> __be32 *b = (__be32 *)(a + size); >> u32 c; >> >> - if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) || >> - IS_ALIGNED((unsigned long)b, __alignof__(*b))) >> + if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) >> for (; size >= 4; size -= 4) { >> - c = be32_to_cpu(*--b) + 1; >> - *b = cpu_to_be32(c); >> + c = get_unaligned_be32(--b) + 1; >> + put_unaligned_be32(c, b); >> if (likely(c)) >> return; >> } >> diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h >> index 4a5ad10e75f0..86267c232f34 100644 >> --- a/include/crypto/algapi.h >> +++ b/include/crypto/algapi.h >> @@ -17,6 +17,8 @@ >> #include >> #include >> >> +#include >> + >> /* >> * Maximum values for blocksize and alignmask, used to allocate >> * static buffers that are big enough for any combination of >> @@ -212,7 +214,9 @@ static inline void crypto_xor(u8 *dst, const u8 *src, >> unsigned int size) >> unsigned long *s = (unsigned long *)src; >> >> while (size > 0) { >> - *d++ ^= *s++; >> + put_unaligned(get_unaligned(d) ^ get_unaligned(s), d); >> + d++; >> + s++; >> size -= sizeof(unsigned long); >> } >> } else { >> @@ -231,7 +235,10 @@ static inline void crypto_xor_cpy(u8 *dst, const u8 >> *src1, const u8 *src2, >> unsigned long *s2 = (unsigned long *)src2; >> >> while (size > 0) { >> - *d++ = *s1++ ^ *s2++; >> + put_unaligned(get_unaligned(s1) ^ get_unaligned(s2), >> d); >> + d++; >> + s1++; >> + s2++; >> size -= sizeof(unsigned long); >> } >> } else { >> -- >> 2.11.0 >> > > Doesn't __crypto_xor() have the same problem too? > More or less, and I was wondering what to do about it. 
To fix __crypto_xor() correctly, we'd have to duplicate the code path that operates on the u64[], u32[] and u16[] chunks, or we'll end up with suboptimal code that uses the accessors even if the alignment routine has executed first. This is the same issue Jason points out in siphash. Perhaps the answer is to add 'fast' unaligned accessors that may be used on unaligned quantities only if CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is set? E.g., #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS #define get_unaligned_fast get_unaligned #else #define get_unaligned_fast(x) (*(x)) #endif Arnd?
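To illustrate, the word-sized loop in __crypto_xor() could then be written once for both kinds of architecture. A sketch of the idea (assuming a matching put_unaligned_fast() on the store side, and relying on __crypto_xor()'s existing alignment preamble when CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not set):

while (len >= sizeof(unsigned long)) {
	unsigned long l = get_unaligned_fast((unsigned long *)src1) ^
			  get_unaligned_fast((unsigned long *)src2);

	put_unaligned_fast(l, (unsigned long *)dst);
	dst += sizeof(unsigned long);
	src1 += sizeof(unsigned long);
	src2 += sizeof(unsigned long);
	len -= sizeof(unsigned long);
}

On architectures with efficient unaligned access this compiles to the get_unaligned()/put_unaligned() form; elsewhere it degenerates to plain aligned dereferences, with no duplicated code paths.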
Re: [PATCH 1/3] crypto: memneq - use unaligned accessors for aligned fast path
On 9 October 2018 at 05:34, Eric Biggers wrote: > Hi Ard, > > On Mon, Oct 08, 2018 at 11:15:52PM +0200, Ard Biesheuvel wrote: >> On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS >> because the ordinary load/store instructions (ldr, ldrh, ldrb) can >> tolerate any misalignment of the memory address. However, load/store >> double and load/store multiple instructions (ldrd, ldm) may still only >> be used on memory addresses that are 32-bit aligned, and so we have to >> use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we >> may end up with a severe performance hit due to alignment traps that >> require fixups by the kernel. >> >> Fortunately, the get_unaligned() accessors do the right thing: when >> building for ARMv6 or later, the compiler will emit unaligned accesses >> using the ordinary load/store instructions (but avoid the ones that >> require 32-bit alignment). When building for older ARM, those accessors >> will emit the appropriate sequence of ldrb/mov/orr instructions. And on >> architectures that can truly tolerate any kind of misalignment, the >> get_unaligned() accessors resolve to the leXX_to_cpup accessors that >> operate on aligned addresses. >> >> So switch to the unaligned accessors for the aligned fast path. This >> will create the exact same code on architectures that can really >> tolerate any kind of misalignment, and generate code for ARMv6+ that >> avoids load/store instructions that trigger alignment faults. >> >> Signed-off-by: Ard Biesheuvel >> --- >> crypto/memneq.c | 24 ++-- >> 1 file changed, 17 insertions(+), 7 deletions(-) >> >> diff --git a/crypto/memneq.c b/crypto/memneq.c >> index afed1bd16aee..0f46a6150f22 100644 >> --- a/crypto/memneq.c >> +++ b/crypto/memneq.c >> @@ -60,6 +60,7 @@ >> */ >> >> #include >> +#include >> >> #ifndef __HAVE_ARCH_CRYPTO_MEMNEQ >> >> @@ -71,7 +72,10 @@ __crypto_memneq_generic(const void *a, const void *b, >> size_t size) >> >> #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) >> while (size >= sizeof(unsigned long)) { >> - neq |= *(unsigned long *)a ^ *(unsigned long *)b; >> + unsigned long const *p = a; >> + unsigned long const *q = b; >> + >> + neq |= get_unaligned(p) ^ get_unaligned(q); >> OPTIMIZER_HIDE_VAR(neq); >> a += sizeof(unsigned long); >> b += sizeof(unsigned long); >> @@ -95,18 +99,24 @@ static inline unsigned long __crypto_memneq_16(const >> void *a, const void *b) >> >> #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS >> if (sizeof(unsigned long) == 8) { >> - neq |= *(unsigned long *)(a) ^ *(unsigned long *)(b); >> + unsigned long const *p = a; >> + unsigned long const *q = b; >> + >> + neq |= get_unaligned(p++) ^ get_unaligned(q++); >> OPTIMIZER_HIDE_VAR(neq); >> - neq |= *(unsigned long *)(a+8) ^ *(unsigned long *)(b+8); >> + neq |= get_unaligned(p) ^ get_unaligned(q); >> OPTIMIZER_HIDE_VAR(neq); >> } else if (sizeof(unsigned int) == 4) { >> - neq |= *(unsigned int *)(a)^ *(unsigned int *)(b); >> + unsigned int const *p = a; >> + unsigned int const *q = b; >> + >> + neq |= get_unaligned(p++) ^ get_unaligned(q++); >> OPTIMIZER_HIDE_VAR(neq); >> - neq |= *(unsigned int *)(a+4) ^ *(unsigned int *)(b+4); >> + neq |= get_unaligned(p++) ^ get_unaligned(q++); >> OPTIMIZER_HIDE_VAR(neq); >> - neq |= *(unsigned int *)(a+8) ^ *(unsigned int *)(b+8); >> + neq |= get_unaligned(p++) ^ get_unaligned(q++); >> OPTIMIZER_HIDE_VAR(neq); >> - neq |= *(unsigned int *)(a+12) ^ *(unsigned int *)(b+12); >> + neq |= get_unaligned(p) ^ get_unaligned(q); >> OPTIMIZER_HIDE_VAR(neq); >> } else >> #endif 
/* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */ > > This looks good, but maybe now we should get rid of the > !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS path too? > At least for the 16-byte case: > > static inline unsigned long __crypto_memneq_16(const void *a, const void *b) > { > const unsigned long *p = a, *q = b; > unsigned long neq = 0; > > BUILD_BUG_ON(sizeof(*p) != 4 && sizeof(*p) != 8); > neq |= get_unaligned(p++) ^ get_unaligned(q++); > OPTIMIZER_HIDE_VAR(neq); > neq |= get_unaligned(p++) ^ get_unaligned(q++); > OPTIMIZER_HIDE_VAR(neq); > if (sizeof(*p) == 4) { > neq |= get_unaligned(p++) ^ get_unaligned(q++); > OPTIMIZER_HIDE_VAR(neq); > neq |= get_unaligned(p++) ^ get_unaligned(q++); > OPTIMIZER_HIDE_VAR(neq); > } > return neq; > } Yes that makes sense.
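For readers wondering about the OPTIMIZER_HIDE_VAR() calls being kept between the XORs: they prevent the compiler from collapsing the accumulation into a short-circuiting comparison, which would reintroduce the early-exit timing leak that memneq exists to avoid. The macro is essentially an empty asm that forces the variable through a register; roughly its definition in linux/compiler.h around this time (the exact constraints have varied between kernel versions):

#define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

Because the asm body is opaque to the compiler, it must materialize 'neq' after every step and cannot reason about its value across the barrier.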
Re: [PATCH 2/3] crypto: crypto_xor - use unaligned accessors for aligned fast path
Hi Ard, On Mon, Oct 08, 2018 at 11:15:53PM +0200, Ard Biesheuvel wrote: > On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS > because the ordinary load/store instructions (ldr, ldrh, ldrb) can > tolerate any misalignment of the memory address. However, load/store > double and load/store multiple instructions (ldrd, ldm) may still only > be used on memory addresses that are 32-bit aligned, and so we have to > use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we > may end up with a severe performance hit due to alignment traps that > require fixups by the kernel. > > Fortunately, the get_unaligned() accessors do the right thing: when > building for ARMv6 or later, the compiler will emit unaligned accesses > using the ordinary load/store instructions (but avoid the ones that > require 32-bit alignment). When building for older ARM, those accessors > will emit the appropriate sequence of ldrb/mov/orr instructions. And on > architectures that can truly tolerate any kind of misalignment, the > get_unaligned() accessors resolve to the leXX_to_cpup accessors that > operate on aligned addresses. > > So switch to the unaligned accessors for the aligned fast path. This > will create the exact same code on architectures that can really > tolerate any kind of misalignment, and generate code for ARMv6+ that > avoids load/store instructions that trigger alignment faults. > > Signed-off-by: Ard Biesheuvel > --- > crypto/algapi.c | 7 +++ > include/crypto/algapi.h | 11 +-- > 2 files changed, 12 insertions(+), 6 deletions(-) > > diff --git a/crypto/algapi.c b/crypto/algapi.c > index 2545c5f89c4c..52ce3c5a0499 100644 > --- a/crypto/algapi.c > +++ b/crypto/algapi.c > @@ -988,11 +988,10 @@ void crypto_inc(u8 *a, unsigned int size) > __be32 *b = (__be32 *)(a + size); > u32 c; > > - if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) || > - IS_ALIGNED((unsigned long)b, __alignof__(*b))) > + if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) > for (; size >= 4; size -= 4) { > - c = be32_to_cpu(*--b) + 1; > - *b = cpu_to_be32(c); > + c = get_unaligned_be32(--b) + 1; > + put_unaligned_be32(c, b); > if (likely(c)) > return; > } > diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h > index 4a5ad10e75f0..86267c232f34 100644 > --- a/include/crypto/algapi.h > +++ b/include/crypto/algapi.h > @@ -17,6 +17,8 @@ > #include > #include > > +#include > + > /* > * Maximum values for blocksize and alignmask, used to allocate > * static buffers that are big enough for any combination of > @@ -212,7 +214,9 @@ static inline void crypto_xor(u8 *dst, const u8 *src, > unsigned int size) > unsigned long *s = (unsigned long *)src; > > while (size > 0) { > - *d++ ^= *s++; > + put_unaligned(get_unaligned(d) ^ get_unaligned(s), d); > + d++; > + s++; > size -= sizeof(unsigned long); > } > } else { > @@ -231,7 +235,10 @@ static inline void crypto_xor_cpy(u8 *dst, const u8 > *src1, const u8 *src2, > unsigned long *s2 = (unsigned long *)src2; > > while (size > 0) { > - *d++ = *s1++ ^ *s2++; > + put_unaligned(get_unaligned(s1) ^ get_unaligned(s2), d); > + d++; > + s1++; > + s2++; > size -= sizeof(unsigned long); > } > } else { > -- > 2.11.0 > Doesn't __crypto_xor() have the same problem too? - Eric
Re: [PATCH 1/3] crypto: memneq - use unaligned accessors for aligned fast path
Hi Ard, On Mon, Oct 08, 2018 at 11:15:52PM +0200, Ard Biesheuvel wrote: > On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS > because the ordinary load/store instructions (ldr, ldrh, ldrb) can > tolerate any misalignment of the memory address. However, load/store > double and load/store multiple instructions (ldrd, ldm) may still only > be used on memory addresses that are 32-bit aligned, and so we have to > use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we > may end up with a severe performance hit due to alignment traps that > require fixups by the kernel. > > Fortunately, the get_unaligned() accessors do the right thing: when > building for ARMv6 or later, the compiler will emit unaligned accesses > using the ordinary load/store instructions (but avoid the ones that > require 32-bit alignment). When building for older ARM, those accessors > will emit the appropriate sequence of ldrb/mov/orr instructions. And on > architectures that can truly tolerate any kind of misalignment, the > get_unaligned() accessors resolve to the leXX_to_cpup accessors that > operate on aligned addresses. > > So switch to the unaligned accessors for the aligned fast path. This > will create the exact same code on architectures that can really > tolerate any kind of misalignment, and generate code for ARMv6+ that > avoids load/store instructions that trigger alignment faults. > > Signed-off-by: Ard Biesheuvel > --- > crypto/memneq.c | 24 ++-- > 1 file changed, 17 insertions(+), 7 deletions(-) > > diff --git a/crypto/memneq.c b/crypto/memneq.c > index afed1bd16aee..0f46a6150f22 100644 > --- a/crypto/memneq.c > +++ b/crypto/memneq.c > @@ -60,6 +60,7 @@ > */ > > #include > +#include > > #ifndef __HAVE_ARCH_CRYPTO_MEMNEQ > > @@ -71,7 +72,10 @@ __crypto_memneq_generic(const void *a, const void *b, > size_t size) > > #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) > while (size >= sizeof(unsigned long)) { > - neq |= *(unsigned long *)a ^ *(unsigned long *)b; > + unsigned long const *p = a; > + unsigned long const *q = b; > + > + neq |= get_unaligned(p) ^ get_unaligned(q); > OPTIMIZER_HIDE_VAR(neq); > a += sizeof(unsigned long); > b += sizeof(unsigned long); > @@ -95,18 +99,24 @@ static inline unsigned long __crypto_memneq_16(const void > *a, const void *b) > > #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS > if (sizeof(unsigned long) == 8) { > - neq |= *(unsigned long *)(a) ^ *(unsigned long *)(b); > + unsigned long const *p = a; > + unsigned long const *q = b; > + > + neq |= get_unaligned(p++) ^ get_unaligned(q++); > OPTIMIZER_HIDE_VAR(neq); > - neq |= *(unsigned long *)(a+8) ^ *(unsigned long *)(b+8); > + neq |= get_unaligned(p) ^ get_unaligned(q); > OPTIMIZER_HIDE_VAR(neq); > } else if (sizeof(unsigned int) == 4) { > - neq |= *(unsigned int *)(a)^ *(unsigned int *)(b); > + unsigned int const *p = a; > + unsigned int const *q = b; > + > + neq |= get_unaligned(p++) ^ get_unaligned(q++); > OPTIMIZER_HIDE_VAR(neq); > - neq |= *(unsigned int *)(a+4) ^ *(unsigned int *)(b+4); > + neq |= get_unaligned(p++) ^ get_unaligned(q++); > OPTIMIZER_HIDE_VAR(neq); > - neq |= *(unsigned int *)(a+8) ^ *(unsigned int *)(b+8); > + neq |= get_unaligned(p++) ^ get_unaligned(q++); > OPTIMIZER_HIDE_VAR(neq); > - neq |= *(unsigned int *)(a+12) ^ *(unsigned int *)(b+12); > + neq |= get_unaligned(p) ^ get_unaligned(q); > OPTIMIZER_HIDE_VAR(neq); > } else > #endif /* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */ This looks good, but maybe now we should get rid of the 
!CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS path too? At least for the 16-byte case: static inline unsigned long __crypto_memneq_16(const void *a, const void *b) { const unsigned long *p = a, *q = b; unsigned long neq = 0; BUILD_BUG_ON(sizeof(*p) != 4 && sizeof(*p) != 8); neq |= get_unaligned(p++) ^ get_unaligned(q++); OPTIMIZER_HIDE_VAR(neq); neq |= get_unaligned(p++) ^ get_unaligned(q++); OPTIMIZER_HIDE_VAR(neq); if (sizeof(*p) == 4) { neq |= get_unaligned(p++) ^ get_unaligned(q++); OPTIMIZER_HIDE_VAR(neq); neq |= get_unaligned(p++) ^ get_unaligned(q++); OPTIMIZER_HIDE_VAR(neq); } return neq; }
Re: [PATCH] crypto: x86/aes-ni - fix build error following fpu template removal
On Fri, Oct 05, 2018 at 10:13:06AM -0700, Eric Biggers wrote: > From: Eric Biggers > > aesni-intel_glue.c still calls crypto_fpu_init() and crypto_fpu_exit() > to register/unregister the "fpu" template. But these functions don't > exist anymore, causing a build error. Remove the calls to them. > > Fixes: 944585a64f5e ("crypto: x86/aes-ni - remove special handling of AES in > PCBC mode") > Signed-off-by: Eric Biggers > --- > arch/x86/crypto/aesni-intel_glue.c | 13 + > 1 file changed, 1 insertion(+), 12 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: arm64/aes - fix handling sub-block CTS-CBC inputs
On Tue, Oct 02, 2018 at 10:22:15PM -0700, Eric Biggers wrote: > From: Eric Biggers > > In the new arm64 CTS-CBC implementation, return an error code rather > than crashing on inputs shorter than AES_BLOCK_SIZE bytes. Also set > cra_blocksize to AES_BLOCK_SIZE (like is done in the cts template) to > indicate the minimum input size. > > Fixes: dd597fb33ff0 ("crypto: arm64/aes-blk - add support for CTS-CBC mode") > Signed-off-by: Eric Biggers Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 0/3] crypto: mxs-dcp - Fix tcrypt on imx6
On Tue, Oct 02, 2018 at 07:01:46PM +, Leonard Crestez wrote: > The mxs-dcp driver currently fails to probe on imx6. Fix the whole thing > by porting a cleaned/squashed version of fixes carried in the NXP vendor > tree. > > Tested with "modprobe tcrypt" and CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=n > on imx6sl imx6sll imx6ull: no failures. > > I'm not very familiar with crypto and did not write these fixes so > a skeptical review would be appreciated. > > Previously: > https://lore.kernel.org/patchwork/patch/989652/ > > Dan Douglass (1): > crypto: mxs-dcp - Implement sha import/export > > Radu Solea (2): > crypto: mxs-dcp - Fix SHA null hashes and output length > crypto: mxs-dcp - Fix AES issues > > drivers/crypto/mxs-dcp.c | 121 --- > 1 file changed, 101 insertions(+), 20 deletions(-) All applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
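For readers unfamiliar with the first patch's subject line: ahash drivers provide .export/.import hooks that snapshot and restore a partially computed hash, so a request can be suspended and later resumed. A minimal standalone model of that pattern follows; the state layout is purely illustrative and is not the mxs-dcp driver's actual structure:

    #include <stdint.h>
    #include <string.h>

    struct sha_state {                  /* hypothetical in-flight state */
            uint32_t digest[8];         /* intermediate chaining value */
            uint64_t count;             /* bytes hashed so far */
            uint8_t  buf[64];           /* unprocessed partial block */
    };

    /* .export: serialize the running state into an opaque caller buffer */
    static int sha_export(const struct sha_state *st, void *out)
    {
            memcpy(out, st, sizeof(*st));
            return 0;
    }

    /* .import: resume hashing from a previously exported snapshot */
    static int sha_import(struct sha_state *st, const void *in)
    {
            memcpy(st, in, sizeof(*st));
            return 0;
    }

    int main(void)
    {
            struct sha_state a = { .count = 64 }, b;
            uint8_t blob[sizeof(struct sha_state)];

            sha_export(&a, blob);
            sha_import(&b, blob);
            return b.count == 64 ? 0 : 1;   /* state round-trips */
    }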
Re: [PATCH v2 0/2] crypto - fix aegis/morus for big endian systems
On Mon, Oct 01, 2018 at 10:36:36AM +0200, Ard Biesheuvel wrote: > Some bug fixes for issues that I stumbled upon while working on other > stuff. > > Changes since v1: > - add Ondrej's ack to #1 > - simplify #2 and drop unrelated performance tweak > > Ard Biesheuvel (2): > crypto: morus/generic - fix for big endian systems > crypto: aegis/generic - fix for big endian systems > > crypto/aegis.h | 20 +--- > crypto/morus1280.c | 7 ++- > crypto/morus640.c | 16 > 3 files changed, 15 insertions(+), 28 deletions(-) All applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] drivers: crypto: caam: kconfig: create menu for CAAM
Franck LENORMAND wrote: > The CAAM driver has multiple configuration options and all are listed > in the crypto menu. > > This patch creates a menu dedicated to the Freescale CAAM driver. > > Signed-off-by: Franck LENORMAND > --- > drivers/crypto/caam/Kconfig | 4 > 1 file changed, 4 insertions(+) > > diff --git a/drivers/crypto/caam/Kconfig b/drivers/crypto/caam/Kconfig > index 1eb8527..fb87245 100644 > --- a/drivers/crypto/caam/Kconfig > +++ b/drivers/crypto/caam/Kconfig > @@ -1,3 +1,5 @@ > +menu "Freescale CAAM" > + > config CRYPTO_DEV_FSL_CAAM >tristate "Freescale CAAM-Multicore driver backend" >depends on FSL_SOC || ARCH_MXC || ARCH_LAYERSCAPE > @@ -152,3 +154,5 @@ config CRYPTO_DEV_FSL_CAAM_DEBUG > config CRYPTO_DEV_FSL_CAAM_CRYPTO_API_DESC >def_tristate (CRYPTO_DEV_FSL_CAAM_CRYPTO_API || \ > CRYPTO_DEV_FSL_CAAM_CRYPTO_API_QI) > + > +endmenu Please rebase this on the current cryptodev tree as it doesn't apply. Thanks, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: x86/aes-ni - fix build error following fpu template removal
On Fri, Oct 05, 2018 at 07:16:13PM +0200, Ard Biesheuvel wrote: > On 5 October 2018 at 19:13, Eric Biggers wrote: > > From: Eric Biggers > > > > aesni-intel_glue.c still calls crypto_fpu_init() and crypto_fpu_exit() > > to register/unregister the "fpu" template. But these functions don't > > exist anymore, causing a build error. Remove the calls to them. > > > > Fixes: 944585a64f5e ("crypto: x86/aes-ni - remove special handling of AES > > in PCBC mode") > > Signed-off-by: Eric Biggers > > Thanks for spotting that. > > I had actually noticed myself, but wasn't really expecting this RFC > patch to be picked up without discussion. > The patch seems reasonable to me -- we shouldn't maintain a special FPU template just for AES-PCBC when possibly no one is even using that algorithm. - Eric
Re: [PATCH] crypto: x86/aes-ni - fix build error following fpu template removal
On 5 October 2018 at 19:13, Eric Biggers wrote: > From: Eric Biggers > > aesni-intel_glue.c still calls crypto_fpu_init() and crypto_fpu_exit() > to register/unregister the "fpu" template. But these functions don't > exist anymore, causing a build error. Remove the calls to them. > > Fixes: 944585a64f5e ("crypto: x86/aes-ni - remove special handling of AES in > PCBC mode") > Signed-off-by: Eric Biggers Thanks for spotting that. I had actually noticed myself, but wasn't really expecting this RFC patch to be picked up without discussion. > --- > arch/x86/crypto/aesni-intel_glue.c | 13 + > 1 file changed, 1 insertion(+), 12 deletions(-) > > diff --git a/arch/x86/crypto/aesni-intel_glue.c > b/arch/x86/crypto/aesni-intel_glue.c > index 89bae64eef4f9..661f7daf43da9 100644 > --- a/arch/x86/crypto/aesni-intel_glue.c > +++ b/arch/x86/crypto/aesni-intel_glue.c > @@ -102,9 +102,6 @@ asmlinkage void aesni_cbc_enc(struct crypto_aes_ctx *ctx, > u8 *out, > asmlinkage void aesni_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out, > const u8 *in, unsigned int len, u8 *iv); > > -int crypto_fpu_init(void); > -void crypto_fpu_exit(void); > - > #define AVX_GEN2_OPTSIZE 640 > #define AVX_GEN4_OPTSIZE 4096 > > @@ -1449,13 +1446,9 @@ static int __init aesni_init(void) > #endif > #endif > > - err = crypto_fpu_init(); > - if (err) > - return err; > - > err = crypto_register_algs(aesni_algs, ARRAY_SIZE(aesni_algs)); > if (err) > - goto fpu_exit; > + return err; > > err = crypto_register_skciphers(aesni_skciphers, > ARRAY_SIZE(aesni_skciphers)); > @@ -1489,8 +1482,6 @@ static int __init aesni_init(void) > ARRAY_SIZE(aesni_skciphers)); > unregister_algs: > crypto_unregister_algs(aesni_algs, ARRAY_SIZE(aesni_algs)); > -fpu_exit: > - crypto_fpu_exit(); > return err; > } > > @@ -1501,8 +1492,6 @@ static void __exit aesni_exit(void) > crypto_unregister_skciphers(aesni_skciphers, > ARRAY_SIZE(aesni_skciphers)); > crypto_unregister_algs(aesni_algs, ARRAY_SIZE(aesni_algs)); > - > - crypto_fpu_exit(); > } > > late_initcall(aesni_init); > -- > 2.19.0.605.g01d371f741-goog >
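With the fpu template gone, pcbc(aes) is instantiated from the generic pcbc template wrapped around the AES-NI single-block cipher. The chaining rule that template implements is sketched below as a standalone toy; the per-block "cipher" is a placeholder xor rather than AES, so only the chaining logic is meaningful:

    #include <stdio.h>
    #include <string.h>

    #define BS 16                       /* AES block size */

    /* placeholder block cipher: a real implementation would call AES here */
    static void toy_encrypt(unsigned char *out, const unsigned char *in,
                            const unsigned char *key)
    {
            for (int i = 0; i < BS; i++)
                    out[i] = in[i] ^ key[i];
    }

    /* PCBC: C[n] = E(P[n] ^ P[n-1] ^ C[n-1]), seeded with the IV */
    static void pcbc_encrypt(unsigned char *dst, const unsigned char *src,
                             size_t nblocks, const unsigned char *key,
                             const unsigned char *iv)
    {
            unsigned char chain[BS], blk[BS];

            memcpy(chain, iv, BS);
            while (nblocks--) {
                    for (int i = 0; i < BS; i++)
                            blk[i] = src[i] ^ chain[i];
                    toy_encrypt(dst, blk, key);
                    for (int i = 0; i < BS; i++)
                            chain[i] = src[i] ^ dst[i]; /* feeds next block */
                    src += BS;
                    dst += BS;
            }
    }

    int main(void)
    {
            unsigned char key[BS] = { 0x2b }, iv[BS] = { 0 };
            unsigned char pt[2 * BS] = "two blocks of plaintext go here";
            unsigned char ct[2 * BS];

            pcbc_encrypt(ct, pt, 2, key, iv);
            for (int i = 0; i < 2 * BS; i++)
                    printf("%02x", ct[i]);
            printf("\n");
            return 0;
    }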
Re: [PATCH] crypto: qat - move temp buffers off the stack
On 5 October 2018 at 04:29, Herbert Xu wrote: > On Wed, Sep 26, 2018 at 11:51:59AM +0200, Ard Biesheuvel wrote: >> Arnd reports that with Kees's latest VLA patches applied, the HMAC >> handling in the QAT driver uses a worst case estimate of 160 bytes >> for the SHA blocksize, allowing the compiler to determine the size >> of the stack frame at compile time and throw a warning: >> >> drivers/crypto/qat/qat_common/qat_algs.c: In function >> 'qat_alg_do_precomputes': >> drivers/crypto/qat/qat_common/qat_algs.c:257:1: error: the frame size >> of 1112 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] >> >> Given that this worst case estimate is only 32 bytes larger than the >> actual block size of SHA-512, the use of a VLA here was hiding the >> excessive size of the stack frame from the compiler, and so we should >> try to move these buffers off the stack. >> >> So move the ipad/opad buffers and the various SHA state descriptors >> into the tfm context struct. Since qat_alg_do_precomputes() is only >> called in the context of a setkey() operation, this should be safe. >> Using SHA512_BLOCK_SIZE for the size of the ipad/opad buffers allows >> them to be used by SHA-1/SHA-256 as well. >> >> Reported-by: Arnd Bergmann >> Signed-off-by: Ard Biesheuvel >> --- >> This applies against v4.19-rc while Arnd's report was about -next. However, >> since Kees's VLA change results in a warning about a pre-existing condition, >> we may decide to apply it as a fix, and handle the conflict with Kees's >> patch in cryptodev. Otherwise, I can respin it to apply onto cryptodev >> directly. The patch was build tested only - I don't have the hardware. >> >> Thoughts anyone? > > I applied it against cryptodev only. I don't think it's bad enough > to warrant a backport to stable though. But if you guys disagree we > could always send the backport to stable after this goes upstream. > Works for me.
Re: [PATCH] crypto: lrw - fix rebase error after out of bounds fix
On Sun, Sep 30, 2018 at 09:51:16PM +0200, Ard Biesheuvel wrote: > Due to an unfortunate interaction between commit fbe1a850b3b1 > ("crypto: lrw - Fix out-of bounds access on counter overflow") and > commit c778f96bf347 ("crypto: lrw - Optimize tweak computation"), > we ended up with a version of next_index() that always returns 127. > > Fixes: c778f96bf347 ("crypto: lrw - Optimize tweak computation") > Signed-off-by: Ard Biesheuvel Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
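To see what the regression broke: next_index() is meant to return the position of the lowest clear bit of the 128-bit block counter (incrementing the counter as a side effect), which selects the precomputed tweak multiplier. Here is a standalone model, using __builtin_ctz() in place of the kernel's ffz(); the emitted values follow the classic ruler sequence, whereas the broken rebase returned 127 unconditionally:

    #include <stdio.h>
    #include <stdint.h>

    static int next_index(uint32_t counter[4])
    {
            int i, res = 0;

            for (i = 0; i < 4; i++) {
                    if (counter[i] + 1 != 0)
                            return res + __builtin_ctz(~counter[i]++);
                    counter[i] = 0;     /* word wrapped: carry onward */
                    res += 32;
            }
            /* all 128 bits were set: the counter wraps to zero */
            return 127;
    }

    int main(void)
    {
            uint32_t ctr[4] = { 0, 0, 0, 0 };

            for (int i = 0; i < 8; i++)
                    printf("%d ", next_index(ctr));
            printf("\n");               /* ruler sequence: 0 1 0 2 0 1 0 3 */
            return 0;
    }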
Re: [PATCH] crypto: qat - move temp buffers off the stack
On Wed, Sep 26, 2018 at 11:51:59AM +0200, Ard Biesheuvel wrote: > Arnd reports that with Kees's latest VLA patches applied, the HMAC > handling in the QAT driver uses a worst case estimate of 160 bytes > for the SHA blocksize, allowing the compiler to determine the size > of the stack frame at compile time and throw a warning: > > drivers/crypto/qat/qat_common/qat_algs.c: In function > 'qat_alg_do_precomputes': > drivers/crypto/qat/qat_common/qat_algs.c:257:1: error: the frame size > of 1112 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] > > Given that this worst case estimate is only 32 bytes larger than the > actual block size of SHA-512, the use of a VLA here was hiding the > excessive size of the stack frame from the compiler, and so we should > try to move these buffers off the stack. > > So move the ipad/opad buffers and the various SHA state descriptors > into the tfm context struct. Since qat_alg_do_precomputes() is only > called in the context of a setkey() operation, this should be safe. > Using SHA512_BLOCK_SIZE for the size of the ipad/opad buffers allows > them to be used by SHA-1/SHA-256 as well. > > Reported-by: Arnd Bergmann > Signed-off-by: Ard Biesheuvel > --- > This applies against v4.19-rc while Arnd's report was about -next. However, > since Kees's VLA change results in a warning about a pre-existing condition, > we may decide to apply it as a fix, and handle the conflict with Kees's > patch in cryptodev. Otherwise, I can respin it to apply onto cryptodev > directly. The patch was build tested only - I don't have the hardware. > > Thoughts anyone? I applied it against cryptodev only. I don't think it's bad enough to warrant a backport to stable though. But if you guys disagree we could always send the backport to stable after this goes upstream. > drivers/crypto/qat/qat_common/qat_algs.c | 60 ++-- > 1 file changed, 31 insertions(+), 29 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
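The pattern in this patch generalizes well, so a miniature version may be useful: fixed worst-case scratch buffers move out of the function's stack frame and into the long-lived context the function already receives, which is safe when, as with setkey(), the function cannot run concurrently on the same context. All names below are illustrative:

    #include <string.h>

    #define BLOCK_MAX 128               /* SHA-512 block size, the worst case */

    struct hmac_ctx {
            /* formerly: unsigned char ipad[...], opad[...] on the stack */
            unsigned char ipad[BLOCK_MAX];
            unsigned char opad[BLOCK_MAX];
    };

    static void hmac_precompute(struct hmac_ctx *ctx,
                                const unsigned char *key, size_t klen)
    {
            if (klen > BLOCK_MAX)
                    klen = BLOCK_MAX;   /* real code hashes long keys first */

            memset(ctx->ipad, 0, BLOCK_MAX);
            memcpy(ctx->ipad, key, klen);
            memcpy(ctx->opad, ctx->ipad, BLOCK_MAX);

            for (int i = 0; i < BLOCK_MAX; i++) {
                    ctx->ipad[i] ^= 0x36;   /* standard HMAC inner pad */
                    ctx->opad[i] ^= 0x5c;   /* standard HMAC outer pad */
            }
            /* the padded blocks would now be fed to the hash */
    }

    int main(void)
    {
            static struct hmac_ctx ctx; /* lives outside any stack frame */

            hmac_precompute(&ctx, (const unsigned char *)"key", 3);
            return 0;
    }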
Re: [PATCH -next v2] crypto: ccp - Make function sev_get_firmware() static
On Wed, Sep 26, 2018 at 02:09:23AM +, Wei Yongjun wrote: > Fixes the following sparse warning: > > drivers/crypto/ccp/psp-dev.c:444:5: warning: > symbol 'sev_get_firmware' was not declared. Should it be static? > > Fixes: e93720606efd ("crypto: ccp - Allow SEV firmware to be chosen based on > Family and Model") > Signed-off-by: Wei Yongjun > --- > v1 -> v2: add fixes and add cc Janakarajan > --- > drivers/crypto/ccp/psp-dev.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [RFC PATCH] crypto: x86/aes-ni - remove special handling of AES in PCBC mode
On Mon, Sep 24, 2018 at 02:48:16PM +0200, Ard Biesheuvel wrote: > For historical reasons, the AES-NI based implementation of the PCBC > chaining mode uses a special FPU chaining mode wrapper template to > amortize the FPU start/stop overhead over multiple blocks. > > When this FPU wrapper was introduced, it supported widely used > chaining modes such as XTS and CTR (as well as LRW), but currently, > PCBC is the only remaining user. > > Since there are no known users of pcbc(aes) in the kernel, let's remove > this special driver, and rely on the generic pcbc driver to encapsulate > the AES-NI core cipher. > > Signed-off-by: Ard Biesheuvel > --- > arch/x86/crypto/Makefile | 2 +- > arch/x86/crypto/aesni-intel_glue.c | 32 --- > arch/x86/crypto/fpu.c | 207 > crypto/Kconfig | 2 +- > 4 files changed, 2 insertions(+), 241 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: aes_ti - disable interrupts while accessing sbox
Hi Eric, On 4 October 2018 at 06:07, Eric Biggers wrote: > From: Eric Biggers > > The generic constant-time AES implementation is supposed to preload the > AES S-box into the CPU's L1 data cache. But, an interrupt handler can > run on the CPU and muck with the cache. Worse, on preemptible kernels > the process can even be preempted and moved to a different CPU. So the > implementation may actually still be vulnerable to cache-timing attacks. > > Make it more robust by disabling interrupts while the sbox is used. > > In some quick tests on x86 and ARM, this doesn't affect performance > significantly. Responsiveness is also a concern, but interrupts are > only disabled for a single AES block which even on ARM Cortex-A7 is > "only" ~1500 cycles to encrypt or ~2600 cycles to decrypt. > I share your concern, but that is quite a big hammer. Also, does it really take ~100 cycles per byte? That is terrible :-) Given that the full lookup table is only 1024 bytes (or 1024+256 bytes for decryption), I wonder if something like the below would be a better option for A7 (apologies for the mangled whitespace) diff --git a/arch/arm/crypto/aes-cipher-core.S b/arch/arm/crypto/aes-cipher-core.S index 184d6c2d15d5..83e893f7e581 100644 --- a/arch/arm/crypto/aes-cipher-core.S +++ b/arch/arm/crypto/aes-cipher-core.S @@ -139,6 +139,13 @@ __adrl ttab, \ttab + .irpc r, 01234567 + ldr r8, [ttab, #(32 * \r)] + ldr r9, [ttab, #(32 * \r) + 256] + ldr r10, [ttab, #(32 * \r) + 512] + ldr r11, [ttab, #(32 * \r) + 768] + .endr + tst rounds, #2 bne 1f @@ -180,6 +187,12 @@ ENDPROC(__aes_arm_encrypt) .align 5 ENTRY(__aes_arm_decrypt) + __adrl ttab, __aes_arm_inverse_sbox + + .irpc r, 01234567 + ldr r8, [ttab, #(32 * \r)] + .endr + do_crypt iround, crypto_it_tab, __aes_arm_inverse_sbox, 0 ENDPROC(__aes_arm_decrypt) diff --git a/arch/arm/crypto/aes-cipher-glue.c b/arch/arm/crypto/aes-cipher-glue.c index c222f6e072ad..630e1a436f1d 100644 --- a/arch/arm/crypto/aes-cipher-glue.c +++ b/arch/arm/crypto/aes-cipher-glue.c @@ -23,16 +23,22 @@ static void aes_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) { struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); int rounds = 6 + ctx->key_length / 4; + unsigned long flags; + local_irq_save(flags); __aes_arm_encrypt(ctx->key_enc, rounds, in, out); + local_irq_restore(flags); } static void aes_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) { struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); int rounds = 6 + ctx->key_length / 4; + unsigned long flags; + local_irq_save(flags); __aes_arm_decrypt(ctx->key_dec, rounds, in, out); + local_irq_restore(flags); } static struct crypto_alg aes_alg = { diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c index ca554d57d01e..82fa860c9cb9 100644 --- a/crypto/aes_generic.c +++ b/crypto/aes_generic.c @@ -63,7 +63,7 @@ static inline u8 byte(const u32 x, const unsigned n) static const u32 rco_tab[10] = { 1, 2, 4, 8, 16, 32, 64, 128, 27, 54 }; -__visible const u32 crypto_ft_tab[4][256] = { +__visible const u32 crypto_ft_tab[4][256] __cacheline_aligned = { { 0xa56363c6, 0x847c7cf8, 0x997777ee, 0x8d7b7bf6, 0x0df2f2ff, 0xbd6b6bd6, 0xb16f6fde, 0x54c5c591, @@ -327,7 +327,7 @@ __visible const u32 crypto_ft_tab[4][256] = { } }; -__visible const u32 crypto_fl_tab[4][256] = { +__visible const u32 crypto_fl_tab[4][256] __cacheline_aligned = { { 0x00000063, 0x0000007c, 0x00000077, 0x0000007b, 0x000000f2, 0x0000006b, 0x0000006f, 0x000000c5, @@ -591,7 +591,7 @@ __visible const u32 crypto_fl_tab[4][256] = { } }; -__visible const u32
crypto_it_tab[4][256] __cacheline_aligned = { { 0x50a7f451, 0x5365417e, 0xc3a4171a, 0x965e273a, 0xcb6bab3b, 0xf1459d1f, 0xab58faac, 0x9303e34b, @@ -855,7 +855,7 @@ __visible const u32 crypto_it_tab[4][256] = { } }; -__visible const u32 crypto_il_tab[4][256] = { +__visible const u32 crypto_il_tab[4][256] __cacheline_aligned = { { 0x00000052, 0x00000009, 0x0000006a, 0x000000d5, 0x00000030, 0x00000036, 0x000000a5, 0x00000038, > Fixes: b5e0b032b6c3 ("crypto: aes - add generic time invariant AES cipher") > Signed-off-by: Eric Biggers > --- > crypto/aes_ti.c | 18 ++ > 1 file changed, 18 insertions(+) > > diff --git a/crypto/aes_ti.c b/crypto/aes_ti.c > index 03023b2290e8e..81b604419ee0e 100644 > --- a/crypto/aes_ti.c > +++ b/crypto/aes_ti.c > @@ -269,6 +269,7 @@ static void aesti_encrypt(struct crypto_tfm *tfm, u8 > *out, const u8 *in) > const u32 *rkp = ctx->key_enc + 4; > int rounds = 6 + ctx->key_length / 4; > u32 st0[4], st1[4]; > + unsigned long flags; > int round; > > st0[0] = ctx->key_enc[0] ^ get_unaligned_le32(in); > @@ -276,6 +277,12 @@ static void aesti_encrypt(struct crypto_tfm *tfm, u8 > *out, const u8 *in) > st0[2] = ctx->key_enc[2] ^ get_unaligned_le32(in + 8); > st0[3] = ctx->key_enc[3] ^ get_unaligned_le32(in +
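The .irpc block in the reply above pulls the lookup table into L1 by touching one word per cache line before any data-dependent accesses happen. The same idea, as a portable standalone sketch assuming a 64-byte line size; the volatile accumulator keeps the compiler from discarding the loads:

    #include <stdint.h>
    #include <stddef.h>

    #define CACHELINE 64                /* assumed line size */

    static uint32_t table[256];         /* stand-in for a 1 KiB AES table */

    static void preload_table(const uint32_t *t, size_t bytes)
    {
            volatile uint32_t sink = 0;

            /* one load per cache line makes the whole table resident */
            for (size_t off = 0; off < bytes; off += CACHELINE)
                    sink += t[off / sizeof(*t)];
            (void)sink;
    }

    int main(void)
    {
            /* in the kernel this would sit inside the irq-disabled region */
            preload_table(table, sizeof(table));
            return 0;
    }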
Re: [PATCH] crypto: arm64/aes - fix handling sub-block CTS-CBC inputs
On 3 October 2018 at 07:22, Eric Biggers wrote: > From: Eric Biggers > > In the new arm64 CTS-CBC implementation, return an error code rather > than crashing on inputs shorter than AES_BLOCK_SIZE bytes. Also set > cra_blocksize to AES_BLOCK_SIZE (like is done in the cts template) to > indicate the minimum input size. > > Fixes: dd597fb33ff0 ("crypto: arm64/aes-blk - add support for CTS-CBC mode") > Signed-off-by: Eric Biggers Thanks Eric Reviewed-by: Ard Biesheuvel > --- > arch/arm64/crypto/aes-glue.c | 13 + > 1 file changed, 9 insertions(+), 4 deletions(-) > > diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c > index 26d2b0263ba63..1e676625ef33f 100644 > --- a/arch/arm64/crypto/aes-glue.c > +++ b/arch/arm64/crypto/aes-glue.c > @@ -243,8 +243,11 @@ static int cts_cbc_encrypt(struct skcipher_request *req) > > skcipher_request_set_tfm(>subreq, tfm); > > - if (req->cryptlen == AES_BLOCK_SIZE) > + if (req->cryptlen <= AES_BLOCK_SIZE) { > + if (req->cryptlen < AES_BLOCK_SIZE) > + return -EINVAL; > cbc_blocks = 1; > + } > > if (cbc_blocks > 0) { > unsigned int blocks; > @@ -305,8 +308,11 @@ static int cts_cbc_decrypt(struct skcipher_request *req) > > skcipher_request_set_tfm(>subreq, tfm); > > - if (req->cryptlen == AES_BLOCK_SIZE) > + if (req->cryptlen <= AES_BLOCK_SIZE) { > + if (req->cryptlen < AES_BLOCK_SIZE) > + return -EINVAL; > cbc_blocks = 1; > + } > > if (cbc_blocks > 0) { > unsigned int blocks; > @@ -486,14 +492,13 @@ static struct skcipher_alg aes_algs[] = { { > .cra_driver_name= "__cts-cbc-aes-" MODE, > .cra_priority = PRIO, > .cra_flags = CRYPTO_ALG_INTERNAL, > - .cra_blocksize = 1, > + .cra_blocksize = AES_BLOCK_SIZE, > .cra_ctxsize= sizeof(struct crypto_aes_ctx), > .cra_module = THIS_MODULE, > }, > .min_keysize= AES_MIN_KEY_SIZE, > .max_keysize= AES_MAX_KEY_SIZE, > .ivsize = AES_BLOCK_SIZE, > - .chunksize = AES_BLOCK_SIZE, > .walksize = 2 * AES_BLOCK_SIZE, > .setkey = skcipher_aes_setkey, > .encrypt= cts_cbc_encrypt, > -- > 2.19.0 >
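The rule this fix enforces, stated in isolation: CTS-CBC is undefined for inputs below one block, exactly one block degenerates to plain CBC, and longer inputs split into a CBC prefix plus a two-block (possibly partial) ciphertext-stealing tail. Below is a standalone sketch of that length dispatch; the arithmetic is illustrative and is not lifted from the arm64 driver:

    #include <stdio.h>

    #define AES_BLOCK_SIZE 16

    /* returns -1 (-EINVAL in the kernel) for sub-block inputs */
    static int cts_cbc_split(unsigned int cryptlen,
                             unsigned int *cbc_bytes, unsigned int *cts_bytes)
    {
            if (cryptlen < AES_BLOCK_SIZE)
                    return -1;          /* nothing to steal from */

            if (cryptlen == AES_BLOCK_SIZE) {
                    *cbc_bytes = AES_BLOCK_SIZE;    /* plain CBC, no stealing */
                    *cts_bytes = 0;
                    return 0;
            }

            /* the final two (possibly partial) blocks take the CTS path */
            *cts_bytes = AES_BLOCK_SIZE + 1 +
                         (cryptlen - AES_BLOCK_SIZE - 1) % AES_BLOCK_SIZE;
            *cbc_bytes = cryptlen - *cts_bytes;
            return 0;
    }

    int main(void)
    {
            unsigned int cbc, cts;

            for (unsigned int len = 15; len <= 48; len += 11) {
                    if (cts_cbc_split(len, &cbc, &cts) < 0)
                            printf("len %2u: rejected\n", len);
                    else
                            printf("len %2u: cbc %2u + cts %2u\n", len, cbc, cts);
            }
            return 0;
    }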
Re: [PATCH v2 2/2] crypto: aegis/generic - fix for big endian systems
On Mon, Oct 1, 2018 at 10:36 AM Ard Biesheuvel wrote: > Use the correct __le32 annotation and accessors to perform the > single round of AES encryption performed inside the AEGIS transform. > Otherwise, tcrypt reports: > > alg: aead: Test 1 failed on encryption for aegis128-generic > : 6c 25 25 4a 3c 10 1d 27 2b c1 d4 84 9a ef 7f 6e > alg: aead: Test 1 failed on encryption for aegis128l-generic > : cd c6 e3 b8 a0 70 9d 8e c2 4f 6f fe 71 42 df 28 > alg: aead: Test 1 failed on encryption for aegis256-generic > : aa ed 07 b1 96 1d e9 e6 f2 ed b5 8e 1c 5f dc 1c > > Fixes: f606a88e5823 ("crypto: aegis - Add generic AEGIS AEAD implementations") > Cc: # v4.18+ > Signed-off-by: Ard Biesheuvel LGTM now, thanks! Reviewed-by: Ondrej Mosnacek > --- > crypto/aegis.h | 20 +--- > 1 file changed, 9 insertions(+), 11 deletions(-) > > diff --git a/crypto/aegis.h b/crypto/aegis.h > index f1c6900ddb80..405e025fc906 100644 > --- a/crypto/aegis.h > +++ b/crypto/aegis.h > @@ -21,7 +21,7 @@ > > union aegis_block { > __le64 words64[AEGIS_BLOCK_SIZE / sizeof(__le64)]; > - u32 words32[AEGIS_BLOCK_SIZE / sizeof(u32)]; > + __le32 words32[AEGIS_BLOCK_SIZE / sizeof(__le32)]; > u8 bytes[AEGIS_BLOCK_SIZE]; > }; > > @@ -57,24 +57,22 @@ static void crypto_aegis_aesenc(union aegis_block *dst, > const union aegis_block *src, > const union aegis_block *key) > { > - u32 *d = dst->words32; > const u8 *s = src->bytes; > - const u32 *k = key->words32; > const u32 *t0 = crypto_ft_tab[0]; > const u32 *t1 = crypto_ft_tab[1]; > const u32 *t2 = crypto_ft_tab[2]; > const u32 *t3 = crypto_ft_tab[3]; > u32 d0, d1, d2, d3; > > - d0 = t0[s[ 0]] ^ t1[s[ 5]] ^ t2[s[10]] ^ t3[s[15]] ^ k[0]; > - d1 = t0[s[ 4]] ^ t1[s[ 9]] ^ t2[s[14]] ^ t3[s[ 3]] ^ k[1]; > - d2 = t0[s[ 8]] ^ t1[s[13]] ^ t2[s[ 2]] ^ t3[s[ 7]] ^ k[2]; > - d3 = t0[s[12]] ^ t1[s[ 1]] ^ t2[s[ 6]] ^ t3[s[11]] ^ k[3]; > + d0 = t0[s[ 0]] ^ t1[s[ 5]] ^ t2[s[10]] ^ t3[s[15]]; > + d1 = t0[s[ 4]] ^ t1[s[ 9]] ^ t2[s[14]] ^ t3[s[ 3]]; > + d2 = t0[s[ 8]] ^ t1[s[13]] ^ t2[s[ 2]] ^ t3[s[ 7]]; > + d3 = t0[s[12]] ^ t1[s[ 1]] ^ t2[s[ 6]] ^ t3[s[11]]; > > - d[0] = d0; > - d[1] = d1; > - d[2] = d2; > - d[3] = d3; > + dst->words32[0] = cpu_to_le32(d0) ^ key->words32[0]; > + dst->words32[1] = cpu_to_le32(d1) ^ key->words32[1]; > + dst->words32[2] = cpu_to_le32(d2) ^ key->words32[2]; > + dst->words32[3] = cpu_to_le32(d3) ^ key->words32[3]; > } > > #endif /* _CRYPTO_AEGIS_H */ > -- > 2.17.1 > -- Ondrej Mosnacek Associate Software Engineer, Security Technologies Red Hat, Inc.
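The reason the final hunk is safe to write as cpu_to_le32(d0) ^ key->words32[0]: byte swapping distributes over xor, so converting before or after the xor produces the same bytes. A standalone check, with GCC/Clang's __builtin_bswap32() standing in for swab32():

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
            uint32_t d = 0x01234567, k = 0x89abcdef;

            /* swab(a ^ b) == swab(a) ^ swab(b) for any byte permutation */
            assert(__builtin_bswap32(d ^ k) ==
                   (__builtin_bswap32(d) ^ __builtin_bswap32(k)));
            return 0;
    }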
Re: [PATCH 2/2] crypto: aegis/generic - fix for big endian systems
On Mon, Oct 1, 2018 at 10:01 AM Ard Biesheuvel wrote: > On 1 October 2018 at 10:00, Ondrej Mosnacek wrote: > > On Sun, Sep 30, 2018 at 1:14 PM Ard Biesheuvel > > wrote: > >> On 30 September 2018 at 10:58, Ard Biesheuvel > >> wrote: > >> > Use the correct __le32 annotation and accessors to perform the > >> > single round of AES encryption performed inside the AEGIS transform. > >> > Otherwise, tcrypt reports: > >> > > >> > alg: aead: Test 1 failed on encryption for aegis128-generic > >> > : 6c 25 25 4a 3c 10 1d 27 2b c1 d4 84 9a ef 7f 6e > >> > alg: aead: Test 1 failed on encryption for aegis128l-generic > >> > : cd c6 e3 b8 a0 70 9d 8e c2 4f 6f fe 71 42 df 28 > >> > alg: aead: Test 1 failed on encryption for aegis256-generic > >> > : aa ed 07 b1 96 1d e9 e6 f2 ed b5 8e 1c 5f dc 1c > >> > > >> > While at it, let's refer to the first precomputed table only, and > >> > derive the other ones by rotation. This reduces the D-cache footprint > >> > by 75%, and shouldn't be too costly or free on load/store architectures > >> > (and X86 has its own AES-NI based implementation) > >> > > >> > Fixes: f606a88e5823 ("crypto: aegis - Add generic AEGIS AEAD > >> > implementations") > >> > Cc: # v4.18+ > >> > Signed-off-by: Ard Biesheuvel > >> > --- > >> > crypto/aegis.h | 23 +--- > >> > 1 file changed, 10 insertions(+), 13 deletions(-) > >> > > >> > diff --git a/crypto/aegis.h b/crypto/aegis.h > >> > index f1c6900ddb80..84d3e07a3c33 100644 > >> > --- a/crypto/aegis.h > >> > +++ b/crypto/aegis.h > >> > @@ -21,7 +21,7 @@ > >> > > >> > union aegis_block { > >> > __le64 words64[AEGIS_BLOCK_SIZE / sizeof(__le64)]; > >> > - u32 words32[AEGIS_BLOCK_SIZE / sizeof(u32)]; > >> > + __le32 words32[AEGIS_BLOCK_SIZE / sizeof(__le32)]; > >> > u8 bytes[AEGIS_BLOCK_SIZE]; > >> > }; > >> > > >> > @@ -59,22 +59,19 @@ static void crypto_aegis_aesenc(union aegis_block > >> > *dst, > >> > { > >> > u32 *d = dst->words32; > >> > const u8 *s = src->bytes; > >> > - const u32 *k = key->words32; > >> > + const __le32 *k = key->words32; > >> > const u32 *t0 = crypto_ft_tab[0]; > >> > - const u32 *t1 = crypto_ft_tab[1]; > >> > - const u32 *t2 = crypto_ft_tab[2]; > >> > - const u32 *t3 = crypto_ft_tab[3]; > >> > u32 d0, d1, d2, d3; > >> > > >> > - d0 = t0[s[ 0]] ^ t1[s[ 5]] ^ t2[s[10]] ^ t3[s[15]] ^ k[0]; > >> > - d1 = t0[s[ 4]] ^ t1[s[ 9]] ^ t2[s[14]] ^ t3[s[ 3]] ^ k[1]; > >> > - d2 = t0[s[ 8]] ^ t1[s[13]] ^ t2[s[ 2]] ^ t3[s[ 7]] ^ k[2]; > >> > - d3 = t0[s[12]] ^ t1[s[ 1]] ^ t2[s[ 6]] ^ t3[s[11]] ^ k[3]; > >> > + d0 = t0[s[ 0]] ^ rol32(t0[s[ 5]], 8) ^ rol32(t0[s[10]], 16) ^ > >> > rol32(t0[s[15]], 24); > >> > + d1 = t0[s[ 4]] ^ rol32(t0[s[ 9]], 8) ^ rol32(t0[s[14]], 16) ^ > >> > rol32(t0[s[ 3]], 24); > >> > + d2 = t0[s[ 8]] ^ rol32(t0[s[13]], 8) ^ rol32(t0[s[ 2]], 16) ^ > >> > rol32(t0[s[ 7]], 24); > >> > + d3 = t0[s[12]] ^ rol32(t0[s[ 1]], 8) ^ rol32(t0[s[ 6]], 16) ^ > >> > rol32(t0[s[11]], 24); > >> > > >> > - d[0] = d0; > >> > - d[1] = d1; > >> > - d[2] = d2; > >> > - d[3] = d3; > >> > + d[0] = cpu_to_le32(d0 ^ le32_to_cpu(k[0])); > >> > + d[1] = cpu_to_le32(d1 ^ le32_to_cpu(k[1])); > >> > + d[2] = cpu_to_le32(d2 ^ le32_to_cpu(k[2])); > >> > + d[3] = cpu_to_le32(d3 ^ le32_to_cpu(k[3])); > >> > >> > >> I suppose this > >> > >> > + d[0] = cpu_to_le32(d0) ^ k[0]; > >> > + d[1] = cpu_to_le32(d1) ^ k[1]; > >> > + d[2] = cpu_to_le32(d2) ^ k[2]; > >> > + d[3] = cpu_to_le32(d3) ^ k[3]; > >> > >> should work fine as well > > > > Yeah, that looks nicer, but I'm not sure if it is completely OK to do > > bitwise/arithmetic operations directly 
on the __[lb]e* types... Maybe > > yes, but the code I've seen that used them usually seemed to treat > > them as opaque types. > > > > No, xor is fine with __le/__be types Ah, OK then. Good to know :) -- Ondrej Mosnacek Associate Software Engineer, Security Technologies Red Hat, Inc.
Re: [PATCH 2/2] crypto: aegis/generic - fix for big endian systems
On 1 October 2018 at 10:00, Ondrej Mosnacek wrote: > On Sun, Sep 30, 2018 at 1:14 PM Ard Biesheuvel > wrote: >> On 30 September 2018 at 10:58, Ard Biesheuvel >> wrote: >> > Use the correct __le32 annotation and accessors to perform the >> > single round of AES encryption performed inside the AEGIS transform. >> > Otherwise, tcrypt reports: >> > >> > alg: aead: Test 1 failed on encryption for aegis128-generic >> > : 6c 25 25 4a 3c 10 1d 27 2b c1 d4 84 9a ef 7f 6e >> > alg: aead: Test 1 failed on encryption for aegis128l-generic >> > : cd c6 e3 b8 a0 70 9d 8e c2 4f 6f fe 71 42 df 28 >> > alg: aead: Test 1 failed on encryption for aegis256-generic >> > : aa ed 07 b1 96 1d e9 e6 f2 ed b5 8e 1c 5f dc 1c >> > >> > While at it, let's refer to the first precomputed table only, and >> > derive the other ones by rotation. This reduces the D-cache footprint >> > by 75%, and shouldn't be too costly or free on load/store architectures >> > (and X86 has its own AES-NI based implementation) >> > >> > Fixes: f606a88e5823 ("crypto: aegis - Add generic AEGIS AEAD >> > implementations") >> > Cc: # v4.18+ >> > Signed-off-by: Ard Biesheuvel >> > --- >> > crypto/aegis.h | 23 +--- >> > 1 file changed, 10 insertions(+), 13 deletions(-) >> > >> > diff --git a/crypto/aegis.h b/crypto/aegis.h >> > index f1c6900ddb80..84d3e07a3c33 100644 >> > --- a/crypto/aegis.h >> > +++ b/crypto/aegis.h >> > @@ -21,7 +21,7 @@ >> > >> > union aegis_block { >> > __le64 words64[AEGIS_BLOCK_SIZE / sizeof(__le64)]; >> > - u32 words32[AEGIS_BLOCK_SIZE / sizeof(u32)]; >> > + __le32 words32[AEGIS_BLOCK_SIZE / sizeof(__le32)]; >> > u8 bytes[AEGIS_BLOCK_SIZE]; >> > }; >> > >> > @@ -59,22 +59,19 @@ static void crypto_aegis_aesenc(union aegis_block *dst, >> > { >> > u32 *d = dst->words32; >> > const u8 *s = src->bytes; >> > - const u32 *k = key->words32; >> > + const __le32 *k = key->words32; >> > const u32 *t0 = crypto_ft_tab[0]; >> > - const u32 *t1 = crypto_ft_tab[1]; >> > - const u32 *t2 = crypto_ft_tab[2]; >> > - const u32 *t3 = crypto_ft_tab[3]; >> > u32 d0, d1, d2, d3; >> > >> > - d0 = t0[s[ 0]] ^ t1[s[ 5]] ^ t2[s[10]] ^ t3[s[15]] ^ k[0]; >> > - d1 = t0[s[ 4]] ^ t1[s[ 9]] ^ t2[s[14]] ^ t3[s[ 3]] ^ k[1]; >> > - d2 = t0[s[ 8]] ^ t1[s[13]] ^ t2[s[ 2]] ^ t3[s[ 7]] ^ k[2]; >> > - d3 = t0[s[12]] ^ t1[s[ 1]] ^ t2[s[ 6]] ^ t3[s[11]] ^ k[3]; >> > + d0 = t0[s[ 0]] ^ rol32(t0[s[ 5]], 8) ^ rol32(t0[s[10]], 16) ^ >> > rol32(t0[s[15]], 24); >> > + d1 = t0[s[ 4]] ^ rol32(t0[s[ 9]], 8) ^ rol32(t0[s[14]], 16) ^ >> > rol32(t0[s[ 3]], 24); >> > + d2 = t0[s[ 8]] ^ rol32(t0[s[13]], 8) ^ rol32(t0[s[ 2]], 16) ^ >> > rol32(t0[s[ 7]], 24); >> > + d3 = t0[s[12]] ^ rol32(t0[s[ 1]], 8) ^ rol32(t0[s[ 6]], 16) ^ >> > rol32(t0[s[11]], 24); >> > >> > - d[0] = d0; >> > - d[1] = d1; >> > - d[2] = d2; >> > - d[3] = d3; >> > + d[0] = cpu_to_le32(d0 ^ le32_to_cpu(k[0])); >> > + d[1] = cpu_to_le32(d1 ^ le32_to_cpu(k[1])); >> > + d[2] = cpu_to_le32(d2 ^ le32_to_cpu(k[2])); >> > + d[3] = cpu_to_le32(d3 ^ le32_to_cpu(k[3])); >> >> >> I suppose this >> >> > + d[0] = cpu_to_le32(d0) ^ k[0]; >> > + d[1] = cpu_to_le32(d1) ^ k[1]; >> > + d[2] = cpu_to_le32(d2) ^ k[2]; >> > + d[3] = cpu_to_le32(d3) ^ k[3]; >> >> should work fine as well > > Yeah, that looks nicer, but I'm not sure if it is completely OK to do > bitwise/arithmetic operations directly on the __[lb]e* types... Maybe > yes, but the code I've seen that used them usually seemed to treat > them as opaque types. > No, xor is fine with __le/__be types
Re: [PATCH 2/2] crypto: aegis/generic - fix for big endian systems
On Sun, Sep 30, 2018 at 1:14 PM Ard Biesheuvel wrote: > On 30 September 2018 at 10:58, Ard Biesheuvel > wrote: > > Use the correct __le32 annotation and accessors to perform the > > single round of AES encryption performed inside the AEGIS transform. > > Otherwise, tcrypt reports: > > > > alg: aead: Test 1 failed on encryption for aegis128-generic > > : 6c 25 25 4a 3c 10 1d 27 2b c1 d4 84 9a ef 7f 6e > > alg: aead: Test 1 failed on encryption for aegis128l-generic > > : cd c6 e3 b8 a0 70 9d 8e c2 4f 6f fe 71 42 df 28 > > alg: aead: Test 1 failed on encryption for aegis256-generic > > : aa ed 07 b1 96 1d e9 e6 f2 ed b5 8e 1c 5f dc 1c > > > > While at it, let's refer to the first precomputed table only, and > > derive the other ones by rotation. This reduces the D-cache footprint > > by 75%, and shouldn't be too costly or free on load/store architectures > > (and X86 has its own AES-NI based implementation) > > > > Fixes: f606a88e5823 ("crypto: aegis - Add generic AEGIS AEAD > > implementations") > > Cc: # v4.18+ > > Signed-off-by: Ard Biesheuvel > > --- > > crypto/aegis.h | 23 +--- > > 1 file changed, 10 insertions(+), 13 deletions(-) > > > > diff --git a/crypto/aegis.h b/crypto/aegis.h > > index f1c6900ddb80..84d3e07a3c33 100644 > > --- a/crypto/aegis.h > > +++ b/crypto/aegis.h > > @@ -21,7 +21,7 @@ > > > > union aegis_block { > > __le64 words64[AEGIS_BLOCK_SIZE / sizeof(__le64)]; > > - u32 words32[AEGIS_BLOCK_SIZE / sizeof(u32)]; > > + __le32 words32[AEGIS_BLOCK_SIZE / sizeof(__le32)]; > > u8 bytes[AEGIS_BLOCK_SIZE]; > > }; > > > > @@ -59,22 +59,19 @@ static void crypto_aegis_aesenc(union aegis_block *dst, > > { > > u32 *d = dst->words32; > > const u8 *s = src->bytes; > > - const u32 *k = key->words32; > > + const __le32 *k = key->words32; > > const u32 *t0 = crypto_ft_tab[0]; > > - const u32 *t1 = crypto_ft_tab[1]; > > - const u32 *t2 = crypto_ft_tab[2]; > > - const u32 *t3 = crypto_ft_tab[3]; > > u32 d0, d1, d2, d3; > > > > - d0 = t0[s[ 0]] ^ t1[s[ 5]] ^ t2[s[10]] ^ t3[s[15]] ^ k[0]; > > - d1 = t0[s[ 4]] ^ t1[s[ 9]] ^ t2[s[14]] ^ t3[s[ 3]] ^ k[1]; > > - d2 = t0[s[ 8]] ^ t1[s[13]] ^ t2[s[ 2]] ^ t3[s[ 7]] ^ k[2]; > > - d3 = t0[s[12]] ^ t1[s[ 1]] ^ t2[s[ 6]] ^ t3[s[11]] ^ k[3]; > > + d0 = t0[s[ 0]] ^ rol32(t0[s[ 5]], 8) ^ rol32(t0[s[10]], 16) ^ > > rol32(t0[s[15]], 24); > > + d1 = t0[s[ 4]] ^ rol32(t0[s[ 9]], 8) ^ rol32(t0[s[14]], 16) ^ > > rol32(t0[s[ 3]], 24); > > + d2 = t0[s[ 8]] ^ rol32(t0[s[13]], 8) ^ rol32(t0[s[ 2]], 16) ^ > > rol32(t0[s[ 7]], 24); > > + d3 = t0[s[12]] ^ rol32(t0[s[ 1]], 8) ^ rol32(t0[s[ 6]], 16) ^ > > rol32(t0[s[11]], 24); > > > > - d[0] = d0; > > - d[1] = d1; > > - d[2] = d2; > > - d[3] = d3; > > + d[0] = cpu_to_le32(d0 ^ le32_to_cpu(k[0])); > > + d[1] = cpu_to_le32(d1 ^ le32_to_cpu(k[1])); > > + d[2] = cpu_to_le32(d2 ^ le32_to_cpu(k[2])); > > + d[3] = cpu_to_le32(d3 ^ le32_to_cpu(k[3])); > > > I suppose this > > > + d[0] = cpu_to_le32(d0) ^ k[0]; > > + d[1] = cpu_to_le32(d1) ^ k[1]; > > + d[2] = cpu_to_le32(d2) ^ k[2]; > > + d[3] = cpu_to_le32(d3) ^ k[3]; > > should work fine as well Yeah, that looks nicer, but I'm not sure if it is completely OK to do bitwise/arithmetic operations directly on the __[lb]e* types... Maybe yes, but the code I've seen that used them usually seemed to treat them as opaque types. > > > } > > > > #endif /* _CRYPTO_AEGIS_H */ > > -- > > 2.19.0 > > -- Ondrej Mosnacek Associate Software Engineer, Security Technologies Red Hat, Inc.
Re: [PATCH 2/2] crypto: aegis/generic - fix for big endian systems
On 1 October 2018 at 09:50, Ondrej Mosnacek wrote: > Hi Ard, > > On Sun, Sep 30, 2018 at 10:59 AM Ard Biesheuvel > wrote: >> Use the correct __le32 annotation and accessors to perform the >> single round of AES encryption performed inside the AEGIS transform. >> Otherwise, tcrypt reports: >> >> alg: aead: Test 1 failed on encryption for aegis128-generic >> : 6c 25 25 4a 3c 10 1d 27 2b c1 d4 84 9a ef 7f 6e >> alg: aead: Test 1 failed on encryption for aegis128l-generic >> : cd c6 e3 b8 a0 70 9d 8e c2 4f 6f fe 71 42 df 28 >> alg: aead: Test 1 failed on encryption for aegis256-generic >> : aa ed 07 b1 96 1d e9 e6 f2 ed b5 8e 1c 5f dc 1c > > Hm... I think the reason I made a mistake here is that I first had a > version with the AES table hard-coded and I had an #ifdef <endian> #else #endif there with values for little-endian and > big-endian variants. Then I realized the aes_generic module exports > the crypto_ft_table and rewrote the code to use that. Somewhere along > the way I forgot to check if the aes_generic table uses the same trick > and correct the code... > > It would be nice to apply the same optimization to aes_generic.c, but > unfortunately the current tables are exported so changing the > convention would break external modules that use them :/ > Indeed. I am doing some refactoring work on the AES code, which is how I ran into this in the first place. https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=for-kernelci >> >> While at it, let's refer to the first precomputed table only, and >> derive the other ones by rotation. This reduces the D-cache footprint >> by 75%, and shouldn't be too costly or free on load/store architectures >> (and X86 has its own AES-NI based implementation) > > Could you maybe extract this into a separate patch? I don't think we > should mix functional and performance fixes together. > Yeah, good point. I will do that and fold in the simplification.
>> >> Fixes: f606a88e5823 ("crypto: aegis - Add generic AEGIS AEAD >> implementations") >> Cc: # v4.18+ >> Signed-off-by: Ard Biesheuvel >> --- >> crypto/aegis.h | 23 +--- >> 1 file changed, 10 insertions(+), 13 deletions(-) >> >> diff --git a/crypto/aegis.h b/crypto/aegis.h >> index f1c6900ddb80..84d3e07a3c33 100644 >> --- a/crypto/aegis.h >> +++ b/crypto/aegis.h >> @@ -21,7 +21,7 @@ >> >> union aegis_block { >> __le64 words64[AEGIS_BLOCK_SIZE / sizeof(__le64)]; >> - u32 words32[AEGIS_BLOCK_SIZE / sizeof(u32)]; >> + __le32 words32[AEGIS_BLOCK_SIZE / sizeof(__le32)]; >> u8 bytes[AEGIS_BLOCK_SIZE]; >> }; >> >> @@ -59,22 +59,19 @@ static void crypto_aegis_aesenc(union aegis_block *dst, >> { >> u32 *d = dst->words32; >> const u8 *s = src->bytes; >> - const u32 *k = key->words32; >> + const __le32 *k = key->words32; >> const u32 *t0 = crypto_ft_tab[0]; >> - const u32 *t1 = crypto_ft_tab[1]; >> - const u32 *t2 = crypto_ft_tab[2]; >> - const u32 *t3 = crypto_ft_tab[3]; >> u32 d0, d1, d2, d3; >> >> - d0 = t0[s[ 0]] ^ t1[s[ 5]] ^ t2[s[10]] ^ t3[s[15]] ^ k[0]; >> - d1 = t0[s[ 4]] ^ t1[s[ 9]] ^ t2[s[14]] ^ t3[s[ 3]] ^ k[1]; >> - d2 = t0[s[ 8]] ^ t1[s[13]] ^ t2[s[ 2]] ^ t3[s[ 7]] ^ k[2]; >> - d3 = t0[s[12]] ^ t1[s[ 1]] ^ t2[s[ 6]] ^ t3[s[11]] ^ k[3]; >> + d0 = t0[s[ 0]] ^ rol32(t0[s[ 5]], 8) ^ rol32(t0[s[10]], 16) ^ >> rol32(t0[s[15]], 24); >> + d1 = t0[s[ 4]] ^ rol32(t0[s[ 9]], 8) ^ rol32(t0[s[14]], 16) ^ >> rol32(t0[s[ 3]], 24); >> + d2 = t0[s[ 8]] ^ rol32(t0[s[13]], 8) ^ rol32(t0[s[ 2]], 16) ^ >> rol32(t0[s[ 7]], 24); >> + d3 = t0[s[12]] ^ rol32(t0[s[ 1]], 8) ^ rol32(t0[s[ 6]], 16) ^ >> rol32(t0[s[11]], 24); >> >> - d[0] = d0; >> - d[1] = d1; >> - d[2] = d2; >> - d[3] = d3; >> + d[0] = cpu_to_le32(d0 ^ le32_to_cpu(k[0])); >> + d[1] = cpu_to_le32(d1 ^ le32_to_cpu(k[1])); >> + d[2] = cpu_to_le32(d2 ^ le32_to_cpu(k[2])); >> + d[3] = cpu_to_le32(d3 ^ le32_to_cpu(k[3])); >> } >> >> #endif /* _CRYPTO_AEGIS_H */ >> -- >> 2.19.0 >> > > Thanks, > > -- > Ondrej Mosnacek > Associate Software Engineer, Security Technologies > Red Hat, Inc.
Re: [PATCH 2/2] crypto: aegis/generic - fix for big endian systems
Hi Ard, On Sun, Sep 30, 2018 at 10:59 AM Ard Biesheuvel wrote: > Use the correct __le32 annotation and accessors to perform the > single round of AES encryption performed inside the AEGIS transform. > Otherwise, tcrypt reports: > > alg: aead: Test 1 failed on encryption for aegis128-generic > : 6c 25 25 4a 3c 10 1d 27 2b c1 d4 84 9a ef 7f 6e > alg: aead: Test 1 failed on encryption for aegis128l-generic > : cd c6 e3 b8 a0 70 9d 8e c2 4f 6f fe 71 42 df 28 > alg: aead: Test 1 failed on encryption for aegis256-generic > : aa ed 07 b1 96 1d e9 e6 f2 ed b5 8e 1c 5f dc 1c Hm... I think the reason I made a mistake here is that I first had a version with the AES table hard-coded and I had an #ifdef <endian> #else #endif there with values for little-endian and big-endian variants. Then I realized the aes_generic module exports the crypto_ft_table and rewrote the code to use that. Somewhere along the way I forgot to check if the aes_generic table uses the same trick and correct the code... It would be nice to apply the same optimization to aes_generic.c, but unfortunately the current tables are exported so changing the convention would break external modules that use them :/ > > While at it, let's refer to the first precomputed table only, and > derive the other ones by rotation. This reduces the D-cache footprint > by 75%, and shouldn't be too costly or free on load/store architectures > (and X86 has its own AES-NI based implementation) Could you maybe extract this into a separate patch? I don't think we should mix functional and performance fixes together. > > Fixes: f606a88e5823 ("crypto: aegis - Add generic AEGIS AEAD implementations") > Cc: # v4.18+ > Signed-off-by: Ard Biesheuvel > --- > crypto/aegis.h | 23 +--- > 1 file changed, 10 insertions(+), 13 deletions(-) > > diff --git a/crypto/aegis.h b/crypto/aegis.h > index f1c6900ddb80..84d3e07a3c33 100644 > --- a/crypto/aegis.h > +++ b/crypto/aegis.h > @@ -21,7 +21,7 @@ > > union aegis_block { > __le64 words64[AEGIS_BLOCK_SIZE / sizeof(__le64)]; > - u32 words32[AEGIS_BLOCK_SIZE / sizeof(u32)]; > + __le32 words32[AEGIS_BLOCK_SIZE / sizeof(__le32)]; > u8 bytes[AEGIS_BLOCK_SIZE]; > }; > > @@ -59,22 +59,19 @@ static void crypto_aegis_aesenc(union aegis_block *dst, > { > u32 *d = dst->words32; > const u8 *s = src->bytes; > - const u32 *k = key->words32; > + const __le32 *k = key->words32; > const u32 *t0 = crypto_ft_tab[0]; > - const u32 *t1 = crypto_ft_tab[1]; > - const u32 *t2 = crypto_ft_tab[2]; > - const u32 *t3 = crypto_ft_tab[3]; > u32 d0, d1, d2, d3; > > - d0 = t0[s[ 0]] ^ t1[s[ 5]] ^ t2[s[10]] ^ t3[s[15]] ^ k[0]; > - d1 = t0[s[ 4]] ^ t1[s[ 9]] ^ t2[s[14]] ^ t3[s[ 3]] ^ k[1]; > - d2 = t0[s[ 8]] ^ t1[s[13]] ^ t2[s[ 2]] ^ t3[s[ 7]] ^ k[2]; > - d3 = t0[s[12]] ^ t1[s[ 1]] ^ t2[s[ 6]] ^ t3[s[11]] ^ k[3]; > + d0 = t0[s[ 0]] ^ rol32(t0[s[ 5]], 8) ^ rol32(t0[s[10]], 16) ^ > rol32(t0[s[15]], 24); > + d1 = t0[s[ 4]] ^ rol32(t0[s[ 9]], 8) ^ rol32(t0[s[14]], 16) ^ > rol32(t0[s[ 3]], 24); > + d2 = t0[s[ 8]] ^ rol32(t0[s[13]], 8) ^ rol32(t0[s[ 2]], 16) ^ > rol32(t0[s[ 7]], 24); > + d3 = t0[s[12]] ^ rol32(t0[s[ 1]], 8) ^ rol32(t0[s[ 6]], 16) ^ > rol32(t0[s[11]], 24); > > - d[0] = d0; > - d[1] = d1; > - d[2] = d2; > - d[3] = d3; > + d[0] = cpu_to_le32(d0 ^ le32_to_cpu(k[0])); > + d[1] = cpu_to_le32(d1 ^ le32_to_cpu(k[1])); > + d[2] = cpu_to_le32(d2 ^ le32_to_cpu(k[2])); > + d[3] = cpu_to_le32(d3 ^ le32_to_cpu(k[3])); > } > > #endif /* _CRYPTO_AEGIS_H */ > -- > 2.19.0 > Thanks, -- Ondrej Mosnacek Associate Software Engineer, Security Technologies Red Hat, Inc.
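The optimization being discussed leans on a classical property of the AES T-tables: each table is a byte rotation of the first, so t1/t2/t3 can be recomputed on the fly as rol32() of t0. A standalone check of that identity on the first table entry (0xa56363c6, visible in the aes_generic.c context quoted in the aes_ti thread above); the expected values are what the rotation trick assumes crypto_ft_tab[1..3][0] to hold:

    #include <assert.h>
    #include <stdint.h>

    static uint32_t rol32(uint32_t v, unsigned int s)
    {
            return (v << s) | (v >> (32 - s));
    }

    int main(void)
    {
            uint32_t t0 = 0xa56363c6;           /* crypto_ft_tab[0][0] */

            assert(rol32(t0,  8) == 0x6363c6a5);    /* crypto_ft_tab[1][0] */
            assert(rol32(t0, 16) == 0x63c6a563);    /* crypto_ft_tab[2][0] */
            assert(rol32(t0, 24) == 0xc6a56363);    /* crypto_ft_tab[3][0] */
            return 0;
    }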
Re: [PATCH 1/2] crypto: morus/generic - fix for big endian systems
On 1 October 2018 at 09:26, Ondrej Mosnacek wrote: > On Sun, Sep 30, 2018 at 10:59 AM Ard Biesheuvel > wrote: >> Omit the endian swabbing when folding the lengths of the assoc and >> crypt input buffers into the state to finalize the tag. This is not >> necessary given that the memory representation of the state is in >> machine native endianness already. >> >> This fixes an error reported by tcrypt running on a big endian system: >> >> alg: aead: Test 2 failed on encryption for morus640-generic >> : a8 30 ef fb e6 26 eb 23 b0 87 dd 98 57 f3 e1 4b >> 0010: 21 >> alg: aead: Test 2 failed on encryption for morus1280-generic >> : 88 19 1b fb 1c 29 49 0e ee 82 2f cb 97 a6 a5 ee >> 0010: 5f > > Yikes, I never really got around to test MORUS and AEGIS on a BE > machine... My mistake, sorry :/ > No worries - this is brand new code so this is not entirely unexpected. >> >> Fixes: 396be41f16fd ("crypto: morus - Add generic MORUS AEAD >> implementations") >> Cc: # v4.18+ >> Signed-off-by: Ard Biesheuvel > > Reviewed-by: Ondrej Mosnacek > Thanks! >> --- >> crypto/morus1280.c | 7 ++- >> crypto/morus640.c | 16 >> 2 files changed, 6 insertions(+), 17 deletions(-) >> >> diff --git a/crypto/morus1280.c b/crypto/morus1280.c >> index d057cf5ac4a8..3889c188f266 100644 >> --- a/crypto/morus1280.c >> +++ b/crypto/morus1280.c >> @@ -385,14 +385,11 @@ static void crypto_morus1280_final(struct >> morus1280_state *state, >>struct morus1280_block *tag_xor, >>u64 assoclen, u64 cryptlen) >> { >> - u64 assocbits = assoclen * 8; >> - u64 cryptbits = cryptlen * 8; >> - >> struct morus1280_block tmp; >> unsigned int i; >> >> - tmp.words[0] = cpu_to_le64(assocbits); >> - tmp.words[1] = cpu_to_le64(cryptbits); >> + tmp.words[0] = assoclen * 8; >> + tmp.words[1] = cryptlen * 8; >> tmp.words[2] = 0; >> tmp.words[3] = 0; >> >> diff --git a/crypto/morus640.c b/crypto/morus640.c >> index 1ca76e54281b..da06ec2f6a80 100644 >> --- a/crypto/morus640.c >> +++ b/crypto/morus640.c >> @@ -384,21 +384,13 @@ static void crypto_morus640_final(struct >> morus640_state *state, >> struct morus640_block *tag_xor, >> u64 assoclen, u64 cryptlen) >> { >> - u64 assocbits = assoclen * 8; >> - u64 cryptbits = cryptlen * 8; >> - >> - u32 assocbits_lo = (u32)assocbits; >> - u32 assocbits_hi = (u32)(assocbits >> 32); >> - u32 cryptbits_lo = (u32)cryptbits; >> - u32 cryptbits_hi = (u32)(cryptbits >> 32); >> - >> struct morus640_block tmp; >> unsigned int i; >> >> - tmp.words[0] = cpu_to_le32(assocbits_lo); >> - tmp.words[1] = cpu_to_le32(assocbits_hi); >> - tmp.words[2] = cpu_to_le32(cryptbits_lo); >> - tmp.words[3] = cpu_to_le32(cryptbits_hi); >> + tmp.words[0] = lower_32_bits(assoclen * 8); >> + tmp.words[1] = upper_32_bits(assoclen * 8); >> + tmp.words[2] = lower_32_bits(cryptlen * 8); >> + tmp.words[3] = upper_32_bits(cryptlen * 8); >> >> for (i = 0; i < MORUS_BLOCK_WORDS; i++) >> state->s[4].words[i] ^= state->s[0].words[i]; >> -- >> 2.19.0 >> > > Thanks, > > -- > Ondrej Mosnacek > Associate Software Engineer, Security Technologies > Red Hat, Inc.
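To round the thread off: lower_32_bits()/upper_32_bits() in the applied fix are plain 64-to-32 splits, and because the MORUS state words stay in native endianness, folding the bit lengths into the state needs no byte swapping at all. A standalone model of the helpers:

    #include <assert.h>
    #include <stdint.h>

    static uint32_t lower_32_bits(uint64_t v)
    {
            return (uint32_t)v;
    }

    static uint32_t upper_32_bits(uint64_t v)
    {
            return (uint32_t)(v >> 32);
    }

    int main(void)
    {
            uint64_t assocbits = 21 * 8;    /* e.g. 21 bytes of associated data */
            uint64_t v = 0x0123456789abcdefULL;

            assert(lower_32_bits(assocbits) == 168);
            assert(upper_32_bits(assocbits) == 0);

            /* the split round-trips */
            assert(((uint64_t)upper_32_bits(v) << 32 | lower_32_bits(v)) == v);
            return 0;
    }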