Re: [PATCH v5 00/11] crypto: crypto_user_stat: misc enhancement
On Thu, Nov 29, 2018 at 02:42:15PM +, Corentin Labbe wrote:
> Hello
>
> This patchset fixes all problems reported by Eric Biggers.
>
> Regards
>
> Changes since v4:
> - Inlined functions when !CRYPTO_STATS
>
> Changes since v3:
> - Added a crypto_stats_init as asked by Neil Horman
> - Fixed some checkpatch complaints
>
> Changes since v2:
> - moved all crypto_stats functions from header to algapi.c for using
>   crypto_alg_get/put
>
> Changes since v1:
> - Better locking of crypto_alg via crypto_alg_get/crypto_alg_put
> - remove all intermediate variables in crypto/crypto_user_stat.c
> - split all internal stats variables into different structures
>
> Corentin Labbe (11):
>   crypto: crypto_user_stat: made crypto_user_stat optional
>   crypto: CRYPTO_STATS should depend on CRYPTO_USER
>   crypto: crypto_user_stat: convert all stats from u32 to u64
>   crypto: crypto_user_stat: split user space crypto stat structures
>   crypto: tool: getstat: convert user space example to the new
>     crypto_user_stat uapi
>   crypto: crypto_user_stat: fix use_after_free of struct xxx_request
>   crypto: crypto_user_stat: Fix invalid stat reporting
>   crypto: crypto_user_stat: remove intermediate variable
>   crypto: crypto_user_stat: Split stats in multiple structures
>   crypto: crypto_user_stat: rename err_cnt parameter
>   crypto: crypto_user_stat: Add crypto_stats_init
>
>  crypto/Kconfig                       |   1 +
>  crypto/Makefile                      |   3 +-
>  crypto/ahash.c                       |  17 +-
>  crypto/algapi.c                      | 247 ++-
>  crypto/crypto_user_stat.c            | 160 +--
>  crypto/rng.c                         |   4 +-
>  include/crypto/acompress.h           |  38 +---
>  include/crypto/aead.h                |  38 +---
>  include/crypto/akcipher.h            |  74 ++-
>  include/crypto/hash.h                |  32 +--
>  include/crypto/internal/cryptouser.h |  17 ++
>  include/crypto/kpp.h                 |  48 +
>  include/crypto/rng.h                 |  27 +--
>  include/crypto/skcipher.h            |  36 +---
>  include/linux/crypto.h               | 290 ++-
>  include/uapi/linux/cryptouser.h      | 102 ++
>  tools/crypto/getstat.c               |  72 +++
>  17 files changed, 676 insertions(+), 530 deletions(-)

All applied.  Thanks.
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
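The "Inlined functions when !CRYPTO_STATS" item in v5 refers to the usual kernel pattern of compiling the stats hooks down to empty static inline stubs when the feature is disabled, so that call sites need no #ifdefs. A minimal sketch of that pattern (the function name is modeled on the series, not quoted from it):

    /* include/linux/crypto.h -- illustrative sketch, not the exact patch */
    #ifdef CONFIG_CRYPTO_STATS
    void crypto_stats_ahash_update(unsigned int nbytes, int ret,
                                   struct crypto_alg *alg);
    #else
    static inline void crypto_stats_ahash_update(unsigned int nbytes, int ret,
                                                 struct crypto_alg *alg)
    {}
    #endif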
Re: [crypto chcr 1/2] small packet Tx stalls the queue
On Fri, Nov 30, 2018 at 02:31:48PM +0530, Atul Gupta wrote:
> Immediate packets sent to hardware should include the work
> request length in calculating the flits. The WR occupies one flit and,
> if not accounted for, results in an invalid request which stalls the HW
> queue.
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Atul Gupta
> ---
>  drivers/crypto/chelsio/chcr_ipsec.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)

All applied.  Thanks.
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
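For context on the flit accounting: a flit is an 8-byte unit on this hardware, and the firmware work request (WR) header itself consumes one. A hedged sketch of the corrected calculation (the helper and parameter names are illustrative, in the cxgb4/chcr style, not the exact driver code):

    /* Illustrative only: the immediate-data flit count must cover the
     * WR header as well as the packet bytes, or the request is invalid. */
    static unsigned int calc_tx_flits(const struct sk_buff *skb,
                                      unsigned int wr_len)
    {
            /* one flit is 8 bytes; the WR occupies the first flit */
            return DIV_ROUND_UP(skb->len + wr_len, 8);
    }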
Re: [PATCH] fscrypt: remove CRYPTO_CTR dependency
On Thu, Sep 06, 2018 at 12:43:41PM +0200, Ard Biesheuvel wrote:
> On 5 September 2018 at 21:24, Eric Biggers wrote:
> > From: Eric Biggers
> >
> > fscrypt doesn't use the CTR mode of operation for anything, so there's
> > no need to select CRYPTO_CTR. It was added by commit 71dea01ea2ed
> > ("ext4 crypto: require CONFIG_CRYPTO_CTR if ext4 encryption is
> > enabled"). But, I've been unable to identify the arm64 crypto bug it
> > was supposedly working around.
> >
> > I suspect the issue was seen only on some old Android device kernel
> > (circa 3.10?). So if the fix wasn't mistaken, the real bug is probably
> > already fixed. Or maybe it was actually a bug in a non-upstream crypto
> > driver.
> >
> > So, remove the dependency. If it turns out there's actually still a
> > bug, we'll fix it properly.
> >
> > Signed-off-by: Eric Biggers
>
> Acked-by: Ard Biesheuvel
>
> This may be related to
>
> 11e3b725cfc2 crypto: arm64/aes-blk - honour iv_out requirement in CBC
> and CTR modes
>
> given that the commit in question mentions CTS. How it actually works
> around the issue is unclear to me, though.
>
> > ---
> >  fs/crypto/Kconfig | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/fs/crypto/Kconfig b/fs/crypto/Kconfig
> > index 02b7d91c92310..284b589b4774d 100644
> > --- a/fs/crypto/Kconfig
> > +++ b/fs/crypto/Kconfig
> > @@ -6,7 +6,6 @@ config FS_ENCRYPTION
> >         select CRYPTO_ECB
> >         select CRYPTO_XTS
> >         select CRYPTO_CTS
> > -       select CRYPTO_CTR
> >         select CRYPTO_SHA256
> >         select KEYS
> >         help
> > --
> > 2.19.0.rc2.392.g5ba43deb5a-goog

Ping. Ted, can you consider applying this to the fscrypt tree for 4.21?

Thanks,

- Eric
Re: [PATCH 0/3] crypto: x86/chacha20 - AVX-512VL block functions
On Tue, Nov 20, 2018 at 05:30:47PM +0100, Martin Willi wrote:
> In the quest for pushing the limits of chacha20 encryption for both IPsec
> and Wireguard, this small series adds AVX-512VL block functions. The VL
> variant works on 256-bit ymm registers, but compared to AVX2 can benefit
> from the new instructions.
>
> Compared to the AVX2 version, these block functions bring an overall
> speed improvement across encryption lengths of ~20%. Below the tcrypt
> results for additional block sizes in kOps/s, for the current AVX2
> code path, the new AVX-512VL code path and the comparison to Zinc in
> AVX2 and AVX-512VL. All numbers from a Xeon Platinum 8168 (2.7GHz).
>
> These numbers result in a very nice chart, available at:
> https://download.strongswan.org/misc/chacha-avx-512vl.svg
>
>               zinc  zinc
>  len  avx2 512vl  avx2 512vl
>    8  5719  5672  5468  5612
>   16  5675  5627  5355  5621
>   24  5687  5601  5322  5633
>   32  5667  5622  5244  5564
>   40  5603  5582  5337  5578
>   48  5638  5539  5400  5556
>   56  5624  5566  5375  5482
>   64  5590  5573  5352  5531
>   72  4841  5467  3365  3457
>   80  5316  5761  3310  3381
>   88  4798  5470  3239  3343
>   96  5324  5723  3197  3281
>  104  4819  5460  3155  3232
>  112  5266  5749  3020  3195
>  120  4776  5391  2959  3145
>  128  5291  5723  3398  3489
>  136  4122  4837  3321  3423
>  144  4507  5057  3247  3389
>  152  4139  4815  3233  3329
>  160  4482  5043  3159  3256
>  168  4142  4766  3131  3224
>  176  4506  5028  3073  3162
>  184  4119  4772  3010  3109
>  192  4499  5016  3402  3502
>  200  4127  4766  3329  3448
>  208  4452  5012  3276  3371
>  216  4128  4744  3243  3334
>  224  4484  5008  3203  3298
>  232  4103  4772  3141  3237
>  240  4458  4963  3115  3217
>  248  4121  4751  3085  3177
>  256  4461  4987  3364  4046
>  264  3406  4282  3270  4006
>  272  3408  4287  3207  3961
>  280  3371  4271  3203  3825
>  288  3625  4301  3129  3751
>  296  3402  4283  3093  3688
>  304  3401  4247  3062  3637
>  312  3382  4282  2995  3614
>  320  3611  4279  3305  4070
>  328  3386  4260  3276  3968
>  336  3369  4288  3171  3929
>  344  3389  4289  3134  3847
>  352  3609  4266  3127  3720
>  360  3355  4252  3076  3692
>  368  3387  4264  3048  3650
>  376  3387  4238  2967  3553
>  384  3568  4265  3277  4035
>  392  3369  4262  3299  3973
>  400  3362  4235  3239  3899
>  408  3352  4269  3196  3843
>  416  3585  4243  3127  3736
>  424  3364  4216  3092  3672
>  432  3341  4246  3067  3628
>  440  3353  4235  3018  3593
>  448  3538  4245  3327  4035
>  456  3322  4244  3275  3900
>  464  3340  4237  3212  3880
>  472  3330  4242  3054  3802
>  480  3530  4234  3078  3707
>  488  3337  4228  3094  3664
>  496  3330  4223  3015  3591
>  504  3317  4214  3002  3517
>  512  3531  4197  3339  4016
>  520  2511  3101  2030  2682
>  528  2627  3087  2027  2641
>  536  2508  3102  2001  2601
>  544  2638  3090  1964  2564
>  552  2494  3077  1962  2516
>  560  2625  3064  1941  2515
>  568  2500  3086  1922  2493
>  576  2611  3074  2050  2689
>  584  2482  3062  2041  2680
>  592  2595  3074  2026  2644
>  600  2470  3060  1985  2595
>  608  2581  3039  1961  2555
>  616  2478  3062  1956  2521
>  624  2587  3066  1930  2493
>  632  2457  3053  1923  2486
>  640  2581  3050  2059  2712
>  648  2296  2839  2024  2655
>  656  2389  2845  2019  2642
>  664  2292  2842  2002  2610
>  672  2404  2838  1959  2537
>  680  2273  2827  1956  2527
>  688  2389  2840  1938  2510
>  696  2280  2837  1911  2463
>  704  2370  2819  2055  2702
>  712  2277  2834  2029  2663
>  720  2369  2829  2020  2625
>  728  2255  2820  2001  2600
>  736  2373  2819  1958  2543
>  744  2269  2827  1956  2524
>  752  2364  2817  1937  2492
>  760  2270  2805  1909  2483
>  768  2378  2820  2050  2696
>  776  2053  2700  2002  2643
>  784  2066  2693  1922  2640
>  792  2065  2703  1928  2602
>  800  2138  2706  1962  2535
>  808  2065  2679  1938  2528
>  816  2063  2699  1929  2500
>  824  2053  2676  1915  2468
>  832  2149  2692  2036  2693
>  840  2055  2689  2024  2659
>  848  2049  2689  2006  2610
>  856  2057  2702  1979  2585
>  864  2144  2703  1960  2547
>  872  2047  2685  1945  2501
>  880  2055  2683  1902  2497
>  888  2060  2689  1897  2478
>  896  2139  2693  2023  2663
>  904  2049  2686  1970  2644
>  912  2055  2688  1925  2621
>  920  2047  2685  1911  2572
>  928  2114  2695  1907  2545
>  936  2055  2681  1927  2492
>  944  2055  2693  1930  2478
Re: [Help] Null pointer exception in scatterwalk_start() in kernel-4.9
On Tue, Nov 20, 2018 at 07:09:53AM +, gongchen (E) wrote:
> Hi Dear Herbert,
>
> Sorry to bother you, but we've met a problem in the crypto module;
> would you please kindly help us look into it? Thank you very much.
>
> In the function chain below, scatterwalk_start() doesn't check
> the result of sg_next(), so the kernel will crash if sg_next() returns a
> null pointer, which is our case. (The full stack is at the end of the
> letter.)
>
> blkcipher_walk_done()->scatterwalk_done()->scatterwalk_pagedone()->scatterwalk_start(walk, sg_next(walk->sg));
>
> Should we add a null-pointer check in scatterwalk_start()? Or is
> there any process that can ensure there is a valid sg pointer if the
> condition (walk->offset >= walk->sg->offset + walk->sg->length) is true?
>
> We are really looking forward to your reply; any information will
> be appreciated. Thanks again.

Did you apply the following patch?

commit 0868def3e4100591e7a1fdbf3eed1439cc8f7ca3
Author: Eric Biggers
Date:   Mon Jul 23 10:54:57 2018 -0700

    crypto: blkcipher - fix crash flushing dcache in error path

Cheers,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
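For reference, scatterwalk_start() in that era is a trivial helper, so the null-pointer check being proposed would amount to something like the sketch below. This only illustrates the reporter's suggestion; the commit Herbert cites instead fixes the error path so the walk never runs past the end of the scatterlist in the first place.

    /* include/crypto/scatterwalk.h (4.9-era) with the proposed check added */
    static inline void scatterwalk_start(struct scatter_walk *walk,
                                         struct scatterlist *sg)
    {
            if (unlikely(!sg))      /* sg_next() returned NULL */
                    return;         /* caller must stop the walk */
            walk->sg = sg;
            walk->offset = sg->offset;
    }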
Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements
Hi Martin,

On Tue, Nov 20, 2018 at 5:29 PM Martin Willi wrote:
> Thanks for the offer; no need at this time. But I certainly would
> welcome it if you could do some (Wireguard) benching with that code to
> see if it works for you.

I certainly will test it in a few different network circumstances, especially since real testing like this is sometimes more telling than busy-loop benchmarks.

> > Actually, similarly here, a 10nm Cannon Lake machine should be
> > arriving at my house this week, which should make for some
> > interesting testing ground for non-throttled zmm, if you'd like to
> > play with it.
>
> Maybe in a future iteration, thanks. In fact it would be interesting
> to know if Cannon Lake can handle that throttling better.

Everything I've read on the Internet seems to indicate that's the case, so one of the first things I'll be doing is seeing if that's true. There are also the AVX512 IFMA instructions to play with!

Jason
Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements
Hi Jason,

> [...] I have a massive Xeon Gold 5120 machine that I can give you
> access to if you'd like to do some testing and benching.

Thanks for the offer; no need at this time. But I certainly would welcome it if you could do some (Wireguard) benching with that code to see if it works for you.

> Actually, similarly here, a 10nm Cannon Lake machine should be
> arriving at my house this week, which should make for some
> interesting testing ground for non-throttled zmm, if you'd like to
> play with it.

Maybe in a future iteration, thanks. In fact it would be interesting to know if Cannon Lake can handle that throttling better.

Regards
Martin
Re: [PATCH] crypto: drop mask=CRYPTO_ALG_ASYNC from 'shash' tfm allocations
On Wed, Nov 14, 2018 at 12:21:11PM -0800, Eric Biggers wrote: > From: Eric Biggers > > 'shash' algorithms are always synchronous, so passing CRYPTO_ALG_ASYNC > in the mask to crypto_alloc_shash() has no effect. Many users therefore > already don't pass it, but some still do. This inconsistency can cause > confusion, especially since the way the 'mask' argument works is > somewhat counterintuitive. > > Thus, just remove the unneeded CRYPTO_ALG_ASYNC flags. > > This patch shouldn't change any actual behavior. > > Signed-off-by: Eric Biggers > --- > drivers/block/drbd/drbd_receiver.c | 2 +- > drivers/md/dm-integrity.c | 2 +- > drivers/net/wireless/intersil/orinoco/mic.c | 6 ++ > fs/ubifs/auth.c | 5 ++--- > net/bluetooth/smp.c | 2 +- > security/apparmor/crypto.c | 2 +- > security/integrity/evm/evm_crypto.c | 3 +-- > security/keys/encrypted-keys/encrypted.c| 4 ++-- > security/keys/trusted.c | 4 ++-- > 9 files changed, 13 insertions(+), 17 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
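Concretely, the change at each call site is just dropping the flag from the mask argument, e.g. (algorithm name chosen for illustration):

    struct crypto_shash *tfm;

    /* Before: the ASYNC bit in the mask is a no-op, since every shash
     * algorithm is synchronous anyway. */
    tfm = crypto_alloc_shash("hmac(sha256)", 0, CRYPTO_ALG_ASYNC);

    /* After: identical behavior, less confusing. */
    tfm = crypto_alloc_shash("hmac(sha256)", 0, 0);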
Re: [PATCH] crypto: drop mask=CRYPTO_ALG_ASYNC from 'cipher' tfm allocations
On Wed, Nov 14, 2018 at 12:19:39PM -0800, Eric Biggers wrote: > From: Eric Biggers > > 'cipher' algorithms (single block ciphers) are always synchronous, so > passing CRYPTO_ALG_ASYNC in the mask to crypto_alloc_cipher() has no > effect. Many users therefore already don't pass it, but some still do. > This inconsistency can cause confusion, especially since the way the > 'mask' argument works is somewhat counterintuitive. > > Thus, just remove the unneeded CRYPTO_ALG_ASYNC flags. > > This patch shouldn't change any actual behavior. > > Signed-off-by: Eric Biggers > --- > arch/s390/crypto/aes_s390.c | 2 +- > drivers/crypto/amcc/crypto4xx_alg.c | 3 +-- > drivers/crypto/ccp/ccp-crypto-aes-cmac.c | 4 +--- > drivers/crypto/geode-aes.c| 2 +- > drivers/md/dm-crypt.c | 2 +- > drivers/net/wireless/cisco/airo.c | 2 +- > drivers/staging/rtl8192e/rtllib_crypt_ccmp.c | 2 +- > drivers/staging/rtl8192u/ieee80211/ieee80211_crypt_ccmp.c | 2 +- > drivers/usb/wusbcore/crypto.c | 2 +- > net/bluetooth/smp.c | 6 +++--- > net/mac80211/wep.c| 4 ++-- > net/wireless/lib80211_crypt_ccmp.c| 2 +- > net/wireless/lib80211_crypt_tkip.c| 4 ++-- > net/wireless/lib80211_crypt_wep.c | 4 ++-- > 14 files changed, 19 insertions(+), 22 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
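The counterintuitive part mentioned above is worth spelling out: the lookup keeps an algorithm only if (alg->cra_flags ^ type) & mask == 0, so passing CRYPTO_ALG_ASYNC in the mask with a type of 0 actually requests a *synchronous* algorithm, not an asynchronous one. Since every single-block cipher is synchronous, the two calls below (illustrative) select exactly the same implementation:

    struct crypto_cipher *a = crypto_alloc_cipher("aes", 0, CRYPTO_ALG_ASYNC);
    struct crypto_cipher *b = crypto_alloc_cipher("aes", 0, 0);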
Re: [PATCH] crypto: remove useless initializations of cra_list
On Wed, Nov 14, 2018 at 11:35:48AM -0800, Eric Biggers wrote: > From: Eric Biggers > > Some algorithms initialize their .cra_list prior to registration. > But this is unnecessary since crypto_register_alg() will overwrite > .cra_list when adding the algorithm to the 'crypto_alg_list'. > Apparently the useless assignment has just been copy+pasted around. > > So, remove the useless assignments. > > Exception: paes_s390.c uses cra_list to check whether the algorithm is > registered or not, so I left that as-is for now. > > This patch shouldn't change any actual behavior. > > Signed-off-by: Eric Biggers > --- > arch/sparc/crypto/aes_glue.c | 5 - > arch/sparc/crypto/camellia_glue.c | 5 - > arch/sparc/crypto/des_glue.c | 5 - > crypto/lz4.c | 1 - > crypto/lz4hc.c| 1 - > drivers/crypto/bcm/cipher.c | 2 -- > drivers/crypto/omap-aes.c | 2 -- > drivers/crypto/omap-des.c | 1 - > drivers/crypto/qce/ablkcipher.c | 1 - > drivers/crypto/qce/sha.c | 1 - > drivers/crypto/sahara.c | 1 - > 11 files changed, 25 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
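The pattern being deleted looks like the sketch below (field values representative of the sparc glue code, trimmed for brevity):

    static struct crypto_alg aes_alg = {
            .cra_name       = "aes",
            /* ... */
            /* Useless: per the commit message, crypto_register_alg()
             * overwrites .cra_list when adding the algorithm to
             * crypto_alg_list. */
            .cra_list       = LIST_HEAD_INIT(aes_alg.cra_list),
    };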
Re: [PATCH] crypto: inside-secure - remove useless setting of type flags
On Wed, Nov 14, 2018 at 11:10:53AM -0800, Eric Biggers wrote: > From: Eric Biggers > > Remove the unnecessary setting of CRYPTO_ALG_TYPE_SKCIPHER. > Commit 2c95e6d97892 ("crypto: skcipher - remove useless setting of type > flags") took care of this everywhere else, but a few more instances made > it into the tree at about the same time. Squash them before they get > copy+pasted around again. > > This patch shouldn't change any actual behavior. > > Signed-off-by: Eric Biggers > --- > drivers/crypto/inside-secure/safexcel_cipher.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements
Hi Martin, On Mon, Nov 19, 2018 at 8:52 AM Martin Willi wrote: > > Adding AVX-512VL support is relatively simple. I have a patchset mostly > ready that is more than competitive with the code from Zinc. I'll clean > that up and do more testing before posting it later this week. Terrific. Depending on how it turns out, it'll be nice to try integrating this into Zinc. I have a massive Xeon Gold 5120 machine that I can give you access to if you'd like to do some testing and benching. Poke me on IRC -- I'm zx2c4. > I don't think that having AVX-512F is that important until it is really > usable on CPUs in the market. Actually, similarly here, a 10nm Cannon Lake machine should be arriving at my house this week, which should make for some interesting testing ground for non-throttled zmm, if you'd like to play with it. Jason
Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
On Mon, Nov 19, 2018 at 05:19:10PM +0800, Kenneth Lee wrote:
> On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> > On Thu, Nov 15, 2018 at 04:54:55PM +0200, Leon Romanovsky wrote:
> > > On Thu, Nov 15, 2018 at 04:51:09PM +0800, Kenneth Lee wrote:
> > > > On Wed, Nov 14, 2018 at 06:00:17PM +0200, Leon Romanovsky wrote:
> > > > > On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote:
> > > > > >
> > > > > > On 2018/11/13 8:23 AM, Leon Romanovsky wrote:
> > > > > > > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:
> > > > > > > > From: Kenneth Lee
> > > > > > > >
> > > > > > > > WarpDrive is a general accelerator framework for the user
> > > > > > > > application to access the hardware without going through the ker
Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> On Thu, Nov 15, 2018 at 04:54:55PM +0200, Leon Romanovsky wrote:
> > On Thu, Nov 15, 2018 at 04:51:09PM +0800, Kenneth Lee wrote:
> > > On Wed, Nov 14, 2018 at 06:00:17PM +0200, Leon Romanovsky wrote:
> > > > On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote:
> > > > >
> > > > > On 2018/11/13 8:23 AM, Leon Romanovsky wrote:
> > > > > > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:
> > > > > > > From: Kenneth Lee
> > > > > > >
> > > > > > > WarpDrive is a general accelerator framework for the user
> > > > > > > application to access the hardware without going through the
> > > > > > > kernel in data path.
> > > > > > >
> > > > > > > The kernel component to provide kernel facility to driver for
> > > > > > > expose the user interface is called uacce. It a short name for
> > > > > > > "Unified/User-space-access-intended Accelerator Framework".
> > > > > > >
> > > > > > > This patch add document to explain how it works.
> > > > > > + RDMA and netdev folks
Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
On Thu, Nov 15, 2018 at 04:54:55PM +0200, Leon Romanovsky wrote:
> On Thu, Nov 15, 2018 at 04:51:09PM +0800, Kenneth Lee wrote:
> > On Wed, Nov 14, 2018 at 06:00:17PM +0200, Leon Romanovsky wrote:
> > > On Wed, Nov 14, 2018 at 10:58:09AM +0800, Kenneth Lee wrote:
> > > >
> > > > On 2018/11/13 8:23 AM, Leon Romanovsky wrote:
> > > > > On Mon, Nov 12, 2018 at 03:58:02PM +0800, Kenneth Lee wrote:
> > > > > > From: Kenneth Lee
> > > > > >
> > > > > > WarpDrive is a general accelerator framework for the user
> > > > > > application to access the hardware without going through the
> > > > > > kernel in data path.
> > > > > >
> > > > > > The kernel component to provide kernel facility to driver for
> > > > > > expose the user interface is called uacce. It a short name for
> > > > > > "Unified/User-space-access-intended Accelerator Framework".
> > > > > >
> > > > > > This patch add document to explain how it works.
> > > > > + RDMA and netdev folks
> > > > >
> > > > > Sorry, to be late in the game, I don't see other patches, but from
> > > > > the description below it seems like you are reinventing RDMA verbs
> > > > > model. I have hard time to see the differences in the proposed
> > > > > framework to already implemented in drivers/infiniband/* for the
> > > > > kernel space and for the https://github.com/linux-rdma/rdma-core/
> > > > > for the user space parts.
> > > >
> > > > Thanks Leon,
> > > >
> > > > Yes, we tried to solve similar problem in RDMA. We also learned a lot
> > > > from the exist code of RDMA. But we have to make a new one because we
> > > > cannot register accelerators such as AI operation, encryption or
> > > > compression to the RDMA framework:)
> > >
> > > Assuming that you did everything right and still failed to use RDMA
> > > framework, you was supposed to fix it and not to reinvent new exactly
> > > same one. It is how we develop kernel, by reusing existing code.
> >
> > Yes, but we don't force other system such as NIC or GPU into RDMA, do we?
>
> You don't introduce new NIC or GPU, but proposing another interfac
Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements
Hi Jason, > I'd be inclined to roll with your implementation if it can eventually > become competitive with Andy Polyakov's, [...] I think for the SSSE3/AVX2 code paths it is competitive; especially for small sizes it is faster, which is not that unimportant when implementing layer 3 VPNs. > there are still no AVX-512 paths, which means it's considerably > slower on all newer generation Intel chips. Andy's has the AVX-512VL > implementation for Skylake (using ymm, so as not to hit throttling) > and AVX-512F for Cannon Lake and beyond (using zmm). I don't think that having AVX-512F is that important until it is really usable on CPUs in the market. Adding AVX-512VL support is relatively simple. I have a patchset mostly ready that is more than competitive with the code from Zinc. I'll clean that up and do more testing before posting it later this week. Best regards Martin
Re: [PATCH 0/5] crypto: caam - add support for Era 10
On Thu, Nov 08, 2018 at 03:36:26PM +0200, Horia Geantă wrote: > This patch set adds support for CAAM Era 10, currently used in LX2160A SoC: > -new register mapping: some registers/fields are deprecated and moved > to different locations, mainly version registers > -algorithms > chacha20 (over DPSECI - Data Path SEC Interface on fsl-mc bus) > rfc7539(chacha20,poly1305) (over both DPSECI and Job Ring Interface) > rfc7539esp(chacha20,poly1305) (over both DPSECI and Job Ring Interface) > > Note: the patch set is generated on top of cryptodev-2.6, however testing > was performed based on linux-next (tag: next-20181108) - which includes > LX2160A platform support + manually updating LX2160A dts with: > -fsl-mc bus DT node > -missing dma-ranges property in soc DT node > > Cristian Stoica (1): > crypto: export CHACHAPOLY_IV_SIZE > > Horia Geantă (4): > crypto: caam - add register map changes cf. Era 10 > crypto: caam/qi2 - add support for ChaCha20 > crypto: caam/jr - add support for Chacha20 + Poly1305 > crypto: caam/qi2 - add support for Chacha20 + Poly1305 > > crypto/chacha20poly1305.c | 2 - > drivers/crypto/caam/caamalg.c | 266 > ++--- > drivers/crypto/caam/caamalg_desc.c | 139 ++- > drivers/crypto/caam/caamalg_desc.h | 5 + > drivers/crypto/caam/caamalg_qi.c | 37 -- > drivers/crypto/caam/caamalg_qi2.c | 156 +- > drivers/crypto/caam/caamhash.c | 20 ++- > drivers/crypto/caam/caampkc.c | 10 +- > drivers/crypto/caam/caamrng.c | 10 +- > drivers/crypto/caam/compat.h | 2 + > drivers/crypto/caam/ctrl.c | 28 +++- > drivers/crypto/caam/desc.h | 28 > drivers/crypto/caam/desc_constr.h | 7 +- > drivers/crypto/caam/regs.h | 74 +-- > include/crypto/chacha20.h | 1 + > 15 files changed, 724 insertions(+), 61 deletions(-) All applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
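Judging by the diffstat, the "crypto: export CHACHAPOLY_IV_SIZE" patch moves the constant out of crypto/chacha20poly1305.c into the shared header so the CAAM descriptors can use it; the value is the 96-bit RFC 7539 nonce size:

    /* include/crypto/chacha20.h -- sketch of the exported constant */
    #define CHACHAPOLY_IV_SIZE      12      /* 96-bit rfc7539 nonce */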
Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements
On Sun, Nov 11, 2018 at 10:36:24AM +0100, Martin Willi wrote:
> This patchset improves performance of the ChaCha20 SIMD implementations
> for x86_64. For some specific encryption lengths, performance is more
> than doubled. Two mechanisms are used to achieve this:
>
> * Instead of calculating the minimal number of required blocks for a
>   given encryption length, functions producing more blocks are used
>   more aggressively. Calculating a 4-block function can be faster than
>   calculating a 2-block and a 1-block function, even if only three
>   blocks are actually required.
>
> * In addition to the 8-block AVX2 function, a 4-block and a 2-block
>   function are introduced.
>
> Patches 1-3 add support for partial lengths to the existing 1-, 4- and
> 8-block functions. Patch 4 makes use of that by engaging the next higher
> level block functions more aggressively. Patch 5 and 6 add the new AVX2
> functions for 2 and 4 blocks. Patches are based on cryptodev and would
> need adjustments to apply on top of the Adiantum patchset.
>
> Note that the more aggressive use of larger block functions calculate
> blocks that may get discarded. This may have a negative impact on energy
> usage or the processors thermal budget. However, with the new block
> functions we can avoid this over-calculation for many lengths, so the
> performance win can be considered more important.
>
> Below are performance numbers measured with tcrypt using additional
> encryption lengths; numbers in kOps/s, on my i7-5557U. old is the
> existing, new the implementation with this patchset. As comparison
> the numbers for zinc in v6:
>
>  len   old   new  zinc
>    8  5908  5818  5818
>   16  5917  5828  5726
>   24  5916  5869  5757
>   32  5920  5789  5813
>   40  5868  5799  5710
>   48  5877  5761  5761
>   56  5869  5797  5742
>   64  5897  5862  5685
>   72  3381  4979  3520
>   80  3364  5541  3475
>   88  3350  4977  3424
>   96  3342  5530  3371
>  104  3328  4923  3313
>  112  3317  5528  3207
>  120  3313  4970  3150
>  128  3492  5535  3568
>  136  2487  4570  3690
>  144  2481  5047  3599
>  152  2473  4565  3566
>  160  2459  5022  3515
>  168  2461  4550  3437
>  176  2454  5020  3325
>  184  2449  4535  3279
>  192  2538  5011  3762
>  200  1962  4537  3702
>  208  1962  4971  3622
>  216  1954  4487  3518
>  224  1949  4936  3445
>  232  1948  4497  3422
>  240  1941  4947  3317
>  248  1940  4481  3279
>  256  3798  4964  3723
>  264  2638  3577  3639
>  272  2637  3567  3597
>  280  2628  3563  3565
>  288  2630  3795  3484
>  296  2621  3580  3422
>  304  2612  3569  3352
>  312  2602  3599  3308
>  320  2694  3821  3694
>  328  2060  3538  3681
>  336  2054  3565  3599
>  344  2054  3553  3523
>  352  2049  3809  3419
>  360  2045  3575  3403
>  368  2035  3560  3334
>  376  2036  3555  3257
>  384  2092  3785  3715
>  392  1691  3505  3612
>  400  1684  3527  3553
>  408  1686  3527  3496
>  416  1684  3804  3430
>  424  1681  3555  3402
>  432  1675  3559  3311
>  440  1672  3558  3275
>  448  1710  3780  3689
>  456  1431  3541  3618
>  464  1428  3538  3576
>  472  1430  3527  3509
>  480  1426  3788  3405
>  488  1423  3502  3397
>  496  1423  3519  3298
>  504  1418  3519  3277
>  512  3694  3736  3735
>  520  2601  2571  2209
>  528  2601  2677  2148
>  536  2587  2534  2164
>  544  2578  2659  2138
>  552  2570  2552  2126
>  560  2566  2661  2035
>  568  2567  2542  2041
>  576  2639  2674  2199
>  584  2031  2531  2183
>  592  2027  2660  2145
>  600  2016  2513  2155
>  608  2009  2638  2133
>  616  2006  2522  2115
>  624  2000  2649  2064
>  632  1996  2518  2045
>  640  2053  2651  2188
>  648  1666  2402  2182
>  656  1663  2517  2158
>  664  1659  2397  2147
>  672  1657  2510  2139
>  680  1656  2394  2114
>  688  1653  2497  2077
>  696  1646  2393  2043
>  704  1678  2510  2208
>  712  1414  2391  2189
>  720  1412  2506  2169
>  728  1411  2384  2145
>  736  1408  2494  2142
>  744  1408  2379  2081
>  752  1405  2485  2064
>  760  1403  2376  2043
>  768  2189  2498  2211
>  776  1756  2137  2192
>  784  1746  2145  2146
>  792  1744  2141  2141
>  800  1743  2094
>  808  1742  2140  2100
>  816  1735  2134  2061
>  824  1731  2135  2045
>  832  1778  2223
>  840  1480  2132  2184
>  848  1480  2134  2173
>  856  1476  2124  2145
>  864  1474  2210  2126
>  872  1472  2127  2105
>  880  1463  2123  2056
>  888  1468  2123  2043
>  896  1494  2208  2219
>  904  1278  2120  2192
>  912  1277  2121  2170
>  920  1273  2118  2149
>  928  1272  2207  2125
>  936  1267  2125  2098
>  944  1265  2127  2060
>  952  1267  2126  2049
>  960  1289  2213  2204
>  968  1125  2123  2187
>  976  1122  2127  2166
>  984  1120  2123  2136
>  992  1118  2207  2119
> 1000  1118  2120  2101
> 1008  1117  2122  2042
> 1016  1115  2121  2048
> 1024  2174  2191  2195
> 1032  1748  1724  1565
> 1040  1745  1782  1544
> 1048  1736  1737  1554
> 1056  1738  1802  1541
> 1064  1735  1728  1523
> 1072  1730  1780  1507
> 1080  1729  1724  1497
> 1088  1757  1783  1592
> 1096  1475  1723  1575
> 1104  1474  1778  1563
> 1112  1472  1708  1544
> 1120  1468  1774  1521
> 1128  1466  1718  1521
> 1136  1462  1780  1501
> 1144  1460  1719  1491
> 1152  1481  1782  1575
> 1160  1271  1647  1558
> 1168  1271  1706  1554
> 1176  1268  1645  1545
> 1184  1265  1711  1538
> 1192  1265  1648  1530
> 1200  1264  1705  1493
> 1208  1262  1647  1498
> 1216  1277  1695  1581
Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements
Hi Martin,

This is nice work, and given that it's quite clean -- and that it's usually hard to screw up chacha in subtle ways when test vectors pass (unlike, say, poly1305 or curve25519) -- I'd be inclined to roll with your implementation if it can eventually become competitive with Andy Polyakov's, which I'm currently working on for Zinc (which no longer has pre-generated code, addressing the biggest hurdle; v9 will be sent shortly).

Specifically, I'm not quite sure the improvements here tip the balance on all avx2 microarchitectures, and most importantly, there are still no AVX-512 paths, which means it's considerably slower on all newer generation Intel chips. Andy's has the AVX-512VL implementation for Skylake (using ymm, so as not to hit throttling) and AVX-512F for Cannon Lake and beyond (using zmm). I've attached some measurements below showing how stark the difference is.

The takeaway is that while Andy's implementation is still ahead in terms of performance today, I'd certainly encourage your efforts to gain parity with that, and I'd be happy to have that when the performance and fuzzing time is right for it. So please do keep chipping away at it; I think it's a potentially useful effort.

Regards,
Jason

size   old  zinc
   0    64    54
  16   386   372
  32   388   396
  48   388   420
  64   366   350
  80   708   666
  96   708   692
 112   706   736
 128   692   648
 144  1036   682
 160  1036   708
 176  1036   730
 192  1016   658
 208  1360   684
 224  1362   708
 240  1360   732
 256   644   500
 272   990   526
 288   988   556
 304   988   576
 320   972   500
 336  1314   532
 352  1316   558
 368  1318   578
 384  1308   506
 400  1644   532
 416  1644   556
 432  1644   594
 448  1624   508
 464  1970   534
 480  1970   556
 496  1968   582
 512   660   624
 528  1016   682
 544  1016   702
 560  1018   728
 576   998   654
 592  1344   680
 608  1344   708
 624  1344   730
 640  1326   654
 656  1670   686
 672  1670   708
 688  1670   732
 704  1652   658
 720  1998   682
 736  1998   710
 752  1996   734
 768  1256   662
 784  1606   688
 800  1606   714
 816  1606   736
 832  1584   660
 848  1948   688
 864  1950   714
 880  1948   736
 896  1912   688
 912  2258   718
 928  2258   744
 944  2256   768
 960  2238   692
 976  2584   718
 992  2584   744
1008  2584   770

On Thu, Nov 15, 2018 at 6:21 PM Herbert Xu wrote:
>
> On Sun, Nov 11, 2018 at 10:36:24AM +0100, Martin Willi wrote:
> > This patchset improves performance of the ChaCha20 SIMD implementations
> > for x86_64. For some specific encryption lengths, performance is more
> > than doubled. Two mechanisms are used to achieve this:
> >
> > * Instead of calculating the minimal number of required blocks for a
> >   given encryption length, functions producing more blocks are used
> >   more aggressively. Calculating a 4-block function can be faster than
> >   calculating a 2-block and a 1-block function, even if only three
> >   blocks are actually required.
> >
> > * In addition to the 8-block AVX2 function, a 4-block and a 2-block
> >   function are introduced.
> >
> > Patches 1-3 add support for partial lengths to the existing 1-, 4- and
> > 8-block functions. Patch 4 makes use of that by engaging the next higher
> > level block functions more aggressively. Patch 5 and 6 add the new AVX2
> > functions for 2 and 4 blocks. Patches are based on cryptodev and would
> > need adjustments to apply on top of the Adiantum patchset.
> >
> > Note that the more aggressive use of larger block functions calculate
> > blocks that may get discarded. This may have a negative impact on energy
> > usage or the processors thermal budget. However, with the new block
> > functions we can avoid this over-calculation for many lengths, so the
> > performance win can be considered more important.
> >
> > Below are performance numbers measured with tcrypt using additional
> > encryption lengths; numbers in kOps/s, on my i7-5557U. old is the
> > existing, new the implementation with this patchset. As comparison
> > the numbers for zinc in v6:
> >
> > [benchmark table snipped; it is quoted in full in the message above]
Re: [PATCH] crypto: inside-secure - remove useless setting of type flags
Hi Eric,

On Wed, Nov 14, 2018 at 11:10:53AM -0800, Eric Biggers wrote:
> From: Eric Biggers
>
> Remove the unnecessary setting of CRYPTO_ALG_TYPE_SKCIPHER.
> Commit 2c95e6d97892 ("crypto: skcipher - remove useless setting of type
> flags") took care of this everywhere else, but a few more instances made
> it into the tree at about the same time. Squash them before they get
> copy+pasted around again.
>
> This patch shouldn't change any actual behavior.
>
> Signed-off-by: Eric Biggers

Acked-by: Antoine Tenart

Thanks!
Antoine

> ---
>  drivers/crypto/inside-secure/safexcel_cipher.c | 8
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/crypto/inside-secure/safexcel_cipher.c b/drivers/crypto/inside-secure/safexcel_cipher.c
> index 3aef1d43e4351..d531c14020dcb 100644
> --- a/drivers/crypto/inside-secure/safexcel_cipher.c
> +++ b/drivers/crypto/inside-secure/safexcel_cipher.c
> @@ -970,7 +970,7 @@ struct safexcel_alg_template safexcel_alg_cbc_des = {
>  		.cra_name = "cbc(des)",
>  		.cra_driver_name = "safexcel-cbc-des",
>  		.cra_priority = 300,
> -		.cra_flags = CRYPTO_ALG_TYPE_SKCIPHER | CRYPTO_ALG_ASYNC |
> +		.cra_flags = CRYPTO_ALG_ASYNC |
>  			     CRYPTO_ALG_KERN_DRIVER_ONLY,
>  		.cra_blocksize = DES_BLOCK_SIZE,
>  		.cra_ctxsize = sizeof(struct safexcel_cipher_ctx),
> @@ -1010,7 +1010,7 @@ struct safexcel_alg_template safexcel_alg_ecb_des = {
>  		.cra_name = "ecb(des)",
>  		.cra_driver_name = "safexcel-ecb-des",
>  		.cra_priority = 300,
> -		.cra_flags = CRYPTO_ALG_TYPE_SKCIPHER | CRYPTO_ALG_ASYNC |
> +		.cra_flags = CRYPTO_ALG_ASYNC |
>  			     CRYPTO_ALG_KERN_DRIVER_ONLY,
>  		.cra_blocksize = DES_BLOCK_SIZE,
>  		.cra_ctxsize = sizeof(struct safexcel_cipher_ctx),
> @@ -1074,7 +1074,7 @@ struct safexcel_alg_template safexcel_alg_cbc_des3_ede = {
>  		.cra_name = "cbc(des3_ede)",
>  		.cra_driver_name = "safexcel-cbc-des3_ede",
>  		.cra_priority = 300,
> -		.cra_flags = CRYPTO_ALG_TYPE_SKCIPHER | CRYPTO_ALG_ASYNC |
> +		.cra_flags = CRYPTO_ALG_ASYNC |
>  			     CRYPTO_ALG_KERN_DRIVER_ONLY,
>  		.cra_blocksize = DES3_EDE_BLOCK_SIZE,
>  		.cra_ctxsize = sizeof(struct safexcel_cipher_ctx),
> @@ -1114,7 +1114,7 @@ struct safexcel_alg_template safexcel_alg_ecb_des3_ede = {
>  		.cra_name = "ecb(des3_ede)",
>  		.cra_driver_name = "safexcel-ecb-des3_ede",
>  		.cra_priority = 300,
> -		.cra_flags = CRYPTO_ALG_TYPE_SKCIPHER | CRYPTO_ALG_ASYNC |
> +		.cra_flags = CRYPTO_ALG_ASYNC |
>  			     CRYPTO_ALG_KERN_DRIVER_ONLY,
>  		.cra_blocksize = DES3_EDE_BLOCK_SIZE,
>  		.cra_ctxsize = sizeof(struct safexcel_cipher_ctx),
> --
> 2.19.1.930.g4563a0d9d0-goog

--
Antoine Ténart, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Re: Something wrong with cryptodev-2.6 tree?
On Mon, Nov 12, 2018 at 09:44:41AM +0200, Gilad Ben-Yossef wrote: > Hi, > > It seems that the cryptodev-2.6 tree at > https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git > has somehow rolled back 3 months ago. > > Not sure if it's a git.kernel.org issue or something else but probably > worth taking a look? Thanks Gilad. It should be fixed now. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 03/17] hw_random: bcm2835-rng: Switch to SPDX identifier
On Sat, 2018-11-10 at 15:51 +0100, Stefan Wahren wrote:
> Adopt the SPDX license identifier headers to ease license compliance
> management. While we are at this, fix the comment style, too.
>
> Cc: Lubomir Rintel
> Signed-off-by: Stefan Wahren
> ---
>  drivers/char/hw_random/bcm2835-rng.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/char/hw_random/bcm2835-rng.c b/drivers/char/hw_random/bcm2835-rng.c
> index 6767d96..256b0b1 100644
> --- a/drivers/char/hw_random/bcm2835-rng.c
> +++ b/drivers/char/hw_random/bcm2835-rng.c
> @@ -1,10 +1,7 @@
> -/**
> +// SPDX-License-Identifier: GPL-2.0
> +/*
>   * Copyright (c) 2010-2012 Broadcom. All rights reserved.
>   * Copyright (c) 2013 Lubomir Rintel
> - *
> - * This program is free software; you can redistribute it and/or
> - * modify it under the terms of the GNU General Public License ("GPL")
> - * version 2, as published by the Free Software Foundation.
>   */
>
>  #include

Acked-by: Lubomir Rintel
Re: [PATCH 03/17] hw_random: bcm2835-rng: Switch to SPDX identifier
Stefan Wahren writes:

> Adopt the SPDX license identifier headers to ease license compliance
> management. While we are at this, fix the comment style, too.

Reviewed-by: Eric Anholt
Re: [PATCH 03/17] hw_random: bcm2835-rng: Switch to SPDX identifier
On Sat, Nov 10, 2018 at 03:51:16PM +0100, Stefan Wahren wrote:
> Adopt the SPDX license identifier headers to ease license compliance
> management. While we are at this, fix the comment style, too.
>
> Cc: Lubomir Rintel
> Signed-off-by: Stefan Wahren
> ---
>  drivers/char/hw_random/bcm2835-rng.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)

Acked-by: Greg Kroah-Hartman
Re: [PATCH 1/2] crypto: fix cfb mode decryption
On Sat, Oct 20, 2018 at 02:01:52AM +0300, Dmitry Eremin-Solenikov wrote: > crypto_cfb_decrypt_segment() incorrectly XOR'ed generated keystream with > IV, rather than with data stream, resulting in incorrect decryption. > Test vectors will be added in the next patch. > > Signed-off-by: Dmitry Eremin-Solenikov > Cc: sta...@vger.kernel.org > --- > crypto/cfb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) All applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
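In CFB, decryption produces the keystream by block-encrypting the previous ciphertext block (the IV for the first block) and XORs that keystream with the current ciphertext block; the bug XORed it with the IV buffer instead. A sketch of the corrected inner loop, modeled on crypto/cfb.c (variable names assumed from context):

    do {
            crypto_cfb_encrypt_one(tfm, iv, dst);  /* dst = E_k(iv): keystream */
            crypto_xor(dst, src, bsize);    /* plaintext = keystream ^ ciphertext;
                                             * the bug used iv as the second operand */
            iv = src;       /* this ciphertext block feeds the next keystream */

            src += bsize;
            dst += bsize;
    } while ((nbytes -= bsize) >= bsize);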
Re: [PATCH v3 0/2] crypto: some hardening against AES cache-timing attacks
On Wed, Oct 17, 2018 at 09:37:57PM -0700, Eric Biggers wrote: > This series makes the "aes-fixed-time" and "aes-arm" implementations of > AES more resistant to cache-timing attacks. > > Note that even after these changes, the implementations still aren't > necessarily guaranteed to be constant-time; see > https://cr.yp.to/antiforgery/cachetiming-20050414.pdf for a discussion > of the many difficulties involved in writing truly constant-time AES > software. But it's valuable to make such attacks more difficult. > > Changed since v2: > - In aes-arm, move the IRQ disable/enable into the assembly file. > - Other aes-arm tweaks. > - Add Kconfig help text. > > Thanks to Ard Biesheuvel for the suggestions. > > Eric Biggers (2): > crypto: aes_ti - disable interrupts while accessing S-box > crypto: arm/aes - add some hardening against cache-timing attacks > > arch/arm/crypto/Kconfig | 9 + > arch/arm/crypto/aes-cipher-core.S | 62 ++- > crypto/Kconfig| 3 +- > crypto/aes_generic.c | 9 +++-- > crypto/aes_ti.c | 18 + > 5 files changed, 86 insertions(+), 15 deletions(-) All applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
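Both patches share one idea: keep the S-box resident in L1 for the whole block operation, so table lookups cannot leak key-dependent cache state to an attacker on the same core. A hedged sketch of that pattern (not the literal aes_ti.c or aes-cipher-core.S code):

    static void aes_encrypt_hardened(const u8 sbox[256], u8 *out, const u8 *in)
    {
            volatile const u8 *tbl = sbox;  /* volatile so the reads survive */
            unsigned long flags;
            int i;

            /* An interrupt could evict S-box cache lines mid-block and
             * reintroduce a timing signal, so keep IRQs off throughout. */
            local_irq_save(flags);

            /* Touch every cache line of the table up front, so later
             * lookups hit the cache no matter what the secret index is. */
            for (i = 0; i < 256; i += L1_CACHE_BYTES)
                    (void)tbl[i];

            /* ... AES rounds using sbox[] go here ... */

            local_irq_restore(flags);
    }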
Re: [PATCH] crypto/simd: correctly take reqsize of wrapped skcipher into account
On 9 November 2018 at 10:45, Herbert Xu wrote:
> On Fri, Nov 09, 2018 at 05:44:47PM +0800, Herbert Xu wrote:
>> On Fri, Nov 09, 2018 at 12:33:23AM +0100, Ard Biesheuvel wrote:
>> >
>> > This should be
>> >
>> > reqsize += max(crypto_skcipher_reqsize(&cryptd_tfm->base),
>> >                crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm)));
>> >
>> > since the cryptd path in simd still needs some space in the subreq for
>> > the completion.
>>
>> OK this is what I applied to the cryptodev tree, please double-check
>> to see if I did anything silly:
>
> I meant the crypto tree rather than cryptodev.

That looks fine. Thanks Herbert.
Re: [PATCH] crypto/simd: correctly take reqsize of wrapped skcipher into account
On Fri, Nov 09, 2018 at 05:44:47PM +0800, Herbert Xu wrote:
> On Fri, Nov 09, 2018 at 12:33:23AM +0100, Ard Biesheuvel wrote:
> >
> > This should be
> >
> > reqsize += max(crypto_skcipher_reqsize(&cryptd_tfm->base),
> >                crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm)));
> >
> > since the cryptd path in simd still needs some space in the subreq for
> > the completion.
>
> OK this is what I applied to the cryptodev tree, please double-check
> to see if I did anything silly:

I meant the crypto tree rather than cryptodev.

Cheers,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto/simd: correctly take reqsize of wrapped skcipher into account
On Fri, Nov 09, 2018 at 12:33:23AM +0100, Ard Biesheuvel wrote:
>
> This should be
>
> reqsize += max(crypto_skcipher_reqsize(&cryptd_tfm->base),
>                crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm)));
>
> since the cryptd path in simd still needs some space in the subreq for
> the completion.

OK this is what I applied to the cryptodev tree, please double-check to see if I did anything silly:

diff --git a/crypto/simd.c b/crypto/simd.c
index ea7240be3001..78e8d037ae2b 100644
--- a/crypto/simd.c
+++ b/crypto/simd.c
@@ -124,8 +124,9 @@ static int simd_skcipher_init(struct crypto_skcipher *tfm)
 
 	ctx->cryptd_tfm = cryptd_tfm;
 
-	reqsize = sizeof(struct skcipher_request);
-	reqsize += crypto_skcipher_reqsize(&cryptd_tfm->base);
+	reqsize = crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm));
+	reqsize = max(reqsize, crypto_skcipher_reqsize(&cryptd_tfm->base));
+	reqsize += sizeof(struct skcipher_request);
 
 	crypto_skcipher_set_reqsize(tfm, reqsize);

Thanks,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: .S_shipped unnecessary?
On Fri, Nov 9, 2018 at 8:42 AM Ard Biesheuvel wrote: > > (+ Masahiro, kbuild ml) > > On 8 November 2018 at 21:37, Jason A. Donenfeld wrote: > > Hi Ard, Eric, and others, > > > > As promised, the next Zinc patchset will have less generated code! After a > > bit of work with Andy and Samuel, I'll be bundling the perlasm. > > > > Wonderful! Any problems doing that for x86_64 ? > > > One thing I'm wondering about, though, is the wisdom behind the current > > .S_shipped pattern. Usually the _shipped is for big firmware blobs that are > > hard (or impossible) to build independently. But in this case, the .S is > > generated from the .pl significantly faster than gcc even compiles a basic > > C file. And, since perl is needed to build the kernel anyway, it's not like > > it will be impossible to find the right tools. Rather than clutter up > > commits > > with the .pl _and_ the .S_shipped, what would you think if I just generated > > the .S each time as an ordinary build artifact. AFAICT, this is fairly > > usual, > > and it's hard to see downsides. Hence, why I'm writing this email: are there > > any downsides to that? > > > > I agree 100%. When I added this the first time, it was at the request > of the ARM maintainer, who was reluctant to rely on Perl for some > reason. > > Recently, we have had to add a kludge to prevent spurious rebuilds of > the .S_shipped files as well. > > I'd be perfectly happy to get rid of this entirely, and always > generate the .S from the .pl, which to me is kind of the point of > carrying these files in the first place. > > Masahiro: do you see any problems with this? No problem. Documentation/process/changes.rst says the following: You will need perl 5 and the following modules: ``Getopt::Long``, ``Getopt::Std``, ``File::Basename``, and ``File::Find`` to build the kernel. We can assume perl is installed on the user's build machine. -- Best Regards Masahiro Yamada
Re: [PATCH] crypto/simd: correctly take reqsize of wrapped skcipher into account
> On Nov 8, 2018, at 6:33 PM, Ard Biesheuvel wrote:
>
> On 8 November 2018 at 23:55, Ard Biesheuvel wrote:
>> The simd wrapper's skcipher request context structure consists
>> of a single subrequest whose size is taken from the subordinate
>> skcipher. However, in simd_skcipher_init(), the reqsize that is
>> retrieved is not from the subordinate skcipher but from the
>> cryptd request structure, whose size is completely unrelated to
>> the actual wrapped skcipher.
>>
>> Reported-by: Qian Cai
>> Signed-off-by: Ard Biesheuvel
>> ---
>> crypto/simd.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/crypto/simd.c b/crypto/simd.c
>> index ea7240be3001..2f3d6e897afc 100644
>> --- a/crypto/simd.c
>> +++ b/crypto/simd.c
>> @@ -125,7 +125,7 @@ static int simd_skcipher_init(struct crypto_skcipher *tfm)
>>        ctx->cryptd_tfm = cryptd_tfm;
>>
>>        reqsize = sizeof(struct skcipher_request);
>> -      reqsize += crypto_skcipher_reqsize(&cryptd_tfm->base);
>> +      reqsize += crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm));
>>
>
> This should be
>
>   reqsize += max(crypto_skcipher_reqsize(&cryptd_tfm->base),
>                  crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm)));
>
> since the cryptd path in simd still needs some space in the subreq for
> the completion.

Tested-by: Qian Cai
Re: .S_shipped unnecessary?
Hey Ard,

On Fri, Nov 9, 2018 at 12:42 AM Ard Biesheuvel wrote:
> Wonderful! Any problems doing that for x86_64 ?

The x86_64 is still a WIP, but hopefully we'll succeed.

> I agree 100%. When I added this the first time, it was at the request
> of the ARM maintainer, who was reluctant to rely on Perl for some
> reason.
>
> Recently, we have had to add a kludge to prevent spurious rebuilds of
> the .S_shipped files as well.
>
> I'd be perfectly happy to get rid of this entirely, and always
> generate the .S from the .pl, which to me is kind of the point of
> carrying these files in the first place.

Terrific. I'll move ahead in that direction then. It makes things _so_
much cleaner, and doesn't introduce new build modes ("should the
generated .S_shipped go into the build directory or the source
directory? what kind of artifact is it? how to address $(srcdir) vs
$(src) in that context? bla bla") that really overcomplicate things.

Jason
Re: .S_shipped unnecessary?
(+ Masahiro, kbuild ml) On 8 November 2018 at 21:37, Jason A. Donenfeld wrote: > Hi Ard, Eric, and others, > > As promised, the next Zinc patchset will have less generated code! After a > bit of work with Andy and Samuel, I'll be bundling the perlasm. > Wonderful! Any problems doing that for x86_64 ? > One thing I'm wondering about, though, is the wisdom behind the current > .S_shipped pattern. Usually the _shipped is for big firmware blobs that are > hard (or impossible) to build independently. But in this case, the .S is > generated from the .pl significantly faster than gcc even compiles a basic > C file. And, since perl is needed to build the kernel anyway, it's not like > it will be impossible to find the right tools. Rather than clutter up commits > with the .pl _and_ the .S_shipped, what would you think if I just generated > the .S each time as an ordinary build artifact. AFAICT, this is fairly usual, > and it's hard to see downsides. Hence, why I'm writing this email: are there > any downsides to that? > I agree 100%. When I added this the first time, it was at the request of the ARM maintainer, who was reluctant to rely on Perl for some reason. Recently, we have had to add a kludge to prevent spurious rebuilds of the .S_shipped files as well. I'd be perfectly happy to get rid of this entirely, and always generate the .S from the .pl, which to me is kind of the point of carrying these files in the first place. Masahiro: do you see any problems with this?
Re: [PATCH] crypto/simd: correctly take reqsize of wrapped skcipher into account
On 8 November 2018 at 23:55, Ard Biesheuvel wrote:
> The simd wrapper's skcipher request context structure consists
> of a single subrequest whose size is taken from the subordinate
> skcipher. However, in simd_skcipher_init(), the reqsize that is
> retrieved is not from the subordinate skcipher but from the
> cryptd request structure, whose size is completely unrelated to
> the actual wrapped skcipher.
>
> Reported-by: Qian Cai
> Signed-off-by: Ard Biesheuvel
> ---
> crypto/simd.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/crypto/simd.c b/crypto/simd.c
> index ea7240be3001..2f3d6e897afc 100644
> --- a/crypto/simd.c
> +++ b/crypto/simd.c
> @@ -125,7 +125,7 @@ static int simd_skcipher_init(struct crypto_skcipher *tfm)
>         ctx->cryptd_tfm = cryptd_tfm;
>
>         reqsize = sizeof(struct skcipher_request);
> -       reqsize += crypto_skcipher_reqsize(&cryptd_tfm->base);
> +       reqsize += crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm));
>

This should be

  reqsize += max(crypto_skcipher_reqsize(&cryptd_tfm->base),
                 crypto_skcipher_reqsize(cryptd_skcipher_child(cryptd_tfm)));

since the cryptd path in simd still needs some space in the subreq for
the completion.
Re: [RFC PATCH 1/4] kconfig: add as-instr macro to scripts/Kconfig.include
On 07/11/18 14:55, Will Deacon wrote:
> On Wed, Nov 07, 2018 at 09:40:05AM +0000, Vladimir Murzin wrote:
>> There are cases where a whole feature, for instance arm64/lse or
>> arm/crypto, can depend on the assembler. Current practice is to report
>> at build time that the selected feature is not supported, which can be
>> quite annoying...
>
> Why is it annoying? You still end up with a working kernel.

.config doesn't really tell you whether an option was actually built or
not; the annoying part is digging through build logs (if anyone saved
them at all!) or the relevant parts of dmesg (if the option logs
anything there at all, which is not always included in reports).

>> It'd be nicer if we could check the assembler first and opt in the
>> feature's visibility in Kconfig.
>>
>> Cc: Masahiro Yamada
>> Cc: Will Deacon
>> Cc: Marc Zyngier
>> Cc: Ard Biesheuvel
>> Signed-off-by: Vladimir Murzin
>> ---
>> scripts/Kconfig.include | 4
>> 1 file changed, 4 insertions(+)
>
> One issue I have with doing the check like this is that if somebody sends
> you a .config with e.g. ARM64_LSE_ATOMICS=y and you try to build a kernel
> using that .config and an old toolchain, the option is silently dropped.

I see... at least we have some tools like ./scripts/diffconfig

> I think the diagnostic is actually useful in this case.

I fully agree on the diagnostic side; any suggestions for how it can be
improved?

Cheers
Vladimir

> Will
Re: [RFC PATCH 1/4] kconfig: add as-instr macro to scripts/Kconfig.include
On Wed, Nov 07, 2018 at 09:40:05AM +0000, Vladimir Murzin wrote:
> There are cases where a whole feature, for instance arm64/lse or
> arm/crypto, can depend on the assembler. Current practice is to report
> at build time that the selected feature is not supported, which can be
> quite annoying...

Why is it annoying? You still end up with a working kernel.

> It'd be nicer if we could check the assembler first and opt in the
> feature's visibility in Kconfig.
>
> Cc: Masahiro Yamada
> Cc: Will Deacon
> Cc: Marc Zyngier
> Cc: Ard Biesheuvel
> Signed-off-by: Vladimir Murzin
> ---
> scripts/Kconfig.include | 4
> 1 file changed, 4 insertions(+)

One issue I have with doing the check like this is that if somebody sends
you a .config with e.g. ARM64_LSE_ATOMICS=y and you try to build a kernel
using that .config and an old toolchain, the option is silently dropped.

I think the diagnostic is actually useful in this case.

Will
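For comparison, the cc-option style macros already in scripts/Kconfig.include suggest what this could look like; a sketch of a possible as-instr helper (illustrative, not necessarily what the posted patch contains):

	# $(as-instr,<instr>): y if the assembler accepts <instr>, n otherwise
	as-instr = $(success,printf "%b\n" "$(1)" | $(CC) -c -x assembler -o /dev/null -)

A symbol can then gate its visibility with e.g. "depends on $(as-instr,.arch armv8-a+lse)", which is exactly the silent-drop behaviour Will is worried about when an old toolchain meets a pregenerated .config.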
Re: [PATCH 1/2] crypto: fix cfb mode decryption
On Thu, 1 Nov 2018 at 11:41, Herbert Xu wrote:
>
> On Thu, Nov 01, 2018 at 11:32:37AM +0300, Dmitry Eremin-Solenikov wrote:
> >
> > Since the 4.20 pull went into Linus's tree, any chance of getting
> > these two patches into the crypto tree?
>
> These aren't critical enough for the current mainline so they will
> go in at the next merge window.

Thank you.

-- With best wishes Dmitry
Re: [PATCH 1/2] crypto: fix cfb mode decryption
On Thu, Nov 01, 2018 at 11:32:37AM +0300, Dmitry Eremin-Solenikov wrote:
>
> Since the 4.20 pull went into Linus's tree, any chance of getting these
> two patches into the crypto tree?

These aren't critical enough for the current mainline so they will
go in at the next merge window.

Cheers,
-- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 1/2] crypto: fix cfb mode decryption
Hello,

On Sun, 21 Oct 2018 at 11:07, James Bottomley wrote:
>
> On Sun, 2018-10-21 at 09:05 +0200, Ard Biesheuvel wrote:
> > (+ James)
>
> Thanks!
>
> > On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov wrote:
> > > crypto_cfb_decrypt_segment() incorrectly XOR'ed the generated
> > > keystream with the IV, rather than with the data stream, resulting
> > > in incorrect decryption. Test vectors will be added in the next
> > > patch.
> > >
> > > Signed-off-by: Dmitry Eremin-Solenikov
> > > Cc: sta...@vger.kernel.org
> > > ---
> > > crypto/cfb.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/crypto/cfb.c b/crypto/cfb.c
> > > index a0d68c09e1b9..fd4e8500e121 100644
> > > --- a/crypto/cfb.c
> > > +++ b/crypto/cfb.c
> > > @@ -144,7 +144,7 @@ static int crypto_cfb_decrypt_segment(struct skcipher_walk *walk,
> > >
> > >         do {
> > >                 crypto_cfb_encrypt_one(tfm, iv, dst);
> > > -               crypto_xor(dst, iv, bsize);
> > > +               crypto_xor(dst, src, bsize);
>
> This does look right. I think the reason the TPM code works is that it
> always does encrypt/decrypt in-place, which is a separate piece of the
> code which appears to be correct.

Since the 4.20 pull went into Linus's tree, any chance of getting these
two patches into the crypto tree?

-- With best wishes Dmitry
Re: [PATCH v4 2/7] tpm2-sessions: Add full HMAC and encrypt/decrypt session handling
On Wed, 24 Oct 2018, James Bottomley wrote:

> > > +static void KDFa(u8 *key, int keylen, const char *label, u8 *u,
> > > +		 u8 *v, int bytes, u8 *out)
> >
> > Should this be in lower case? I would rename it as tpm_kdfa().
>
> This one is defined as KDFa() in the standards and it's not TPM
> specific (although some standards refer to it as KDFA). I'm not averse
> to making them tpm_kdfe() and tpm_kdfa() but I was hoping that one day
> the crypto subsystem would need them and we could move them in there
> because KDFs are the new shiny in crypto primitives (TLS 1.2 started
> using them, for instance).

I care more about tracing and debugging than naming, and having 'tpm_'
in front of every TPM function makes tracing a lean process. AFAIK,
using upper case letters is against kernel coding conventions; I'm not
sure why this would be an exception to that.

> > Why doesn't it matter here?
>
> Because, as the comment says, it eventually gets overwritten by
> running ecdh to derive the two co-ordinates. (pointers to these two
> uninitialized areas are passed into the ecdh destination sg list).

Oh, I just misunderstood the comment. Wouldn't it be easier to say that
the data is initialized later?

> > > +	buf_len = crypto_ecdh_key_len(&p);
> > > +	if (sizeof(encoded_key) < buf_len) {
> > > +		dev_err(&chip->dev, "salt buffer too small needs %d\n",
> > > +			buf_len);
> > > +		goto out;
> > > +	}
> >
> > In what situation can this happen? Can sizeof(encoded_key) >=
> > buf_len?
>
> Yes, but only if someone is trying to crack your ecdh. One of the
> security issues in ecdh is if someone makes a very specific point
> choice (usually in the cofactor space) that has a very short period,
> the attacker can guess the input to KDFe. In this case if TPM genie
> provided a specially crafted attack EC point, we'd detect it here
> because the resulting buffer would be too short.

Right. Thank you for the explanation. Here some kind of comment might
not be a bad idea...

> > In general this function should have a clear explanation of what it
> > does and maybe fewer of these one-character variables, instead using
> > variables with more self-documenting names. Explain at a high level
> > what algorithms are used and how the salt is calculated.
>
> I'll try, but this is a rather complex function.

Understood. I do not expect perfection here and we can improve the
documentation later on.

For anyone wanting to review James' patches without much experience on
EC, I recommend reading this article:

https://arstechnica.com/information-technology/2013/10/a-relatively-easy-to-understand-primer-on-elliptic-curve-cryptography/

I read it a few years ago and refreshed my memory a few days ago by
re-reading it.

> > > +
> > > +/**
> > > + * tpm_buf_append_hmac_session() - append a TPM session element
> > > + * @buf: The buffer to be appended
> > > + * @auth: the auth structure allocated by tpm2_start_auth_session()
> > > + * @attributes: The session attributes
> > > + * @passphrase: The session authority (NULL if none)
> > > + * @passphraselen: The length of the session authority (0 if none)
> >
> > The alignment.
>
> the alignment of what?

We generally have parameter descriptions tab-aligned.

> > Why would there be trailing zeros?
>
> Because TPM 1.2 mandated zero padded fixed size passphrases so the
> TPM 2.0 standard specifies a way of converting these to variable size
> strings by eliminating the zero padding.

Ok.

James, I'm also looking forward to the CONTEXT_GAP patch based on
yesterday's discussion. We do want it and I was stupid not to take it a
couple of years ago :-)

Thanks. /Jarkko
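For readers without the TPM specification at hand, KDFa is the SP 800-108 counter-mode KDF the thread keeps referring to: with HMAC-SHA256 as the PRF it emits one 256-bit block per counter value until L bits have been produced. In LaTeX notation:

	\mathrm{KDFa}(K,\text{label},U,V,L) \;=\;
	\big\Vert_{i=1}^{\lceil L/256\rceil}
	\mathrm{HMAC}_{K}\big([i]_{32} \Vert \text{label} \Vert \texttt{0x00}
	\Vert U \Vert V \Vert [L]_{32}\big)

where [x]_{32} is a 32-bit big-endian encoding, U and V are the two nonces, and the 0x00 byte is the label's terminating NUL, which is why the quoted code hashes strlen(label)+1 bytes.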
Re: [PATCH v4 0/7] add integrity and security to TPM2 transactions
On Wed, 24 Oct 2018, James Bottomley wrote:
> On Wed, 2018-10-24 at 02:51 +0300, Jarkko Sakkinen wrote:
> > I would consider sending first a patch set that would iterate the
> > existing session stuff to be ready for this i.e. merge in two
> > iterations (emphasis on the word "consider"). We can probably merge
> > the groundwork quite fast.
>
> I realise we're going to have merge conflicts on the later ones, so
> why don't we do this: I'll still send as one series, but you apply the
> ones you think are precursors and I'll rebase and resend the rest?
>
> James

Works for me, and after yesterday's discussions etc. I now think that
this should be merged as one series.

/Jarkko
Re: [PATCH v4 2/7] tpm2-sessions: Add full HMAC and encrypt/decrypt session handling
On Wed, 2018-10-24 at 02:48 +0300, Jarkko Sakkinen wrote:
> On Mon, 22 Oct 2018, James Bottomley wrote:
> [...]

I'll tidy up the descriptions.

> These all should be combined with the existing session stuff inside
> tpm2-cmd.c and not have duplicate infrastructures. The file name
> should be tpm2-session.c (we neither have tpm2-cmds.c).

You mean move tpm2_buf_append_auth() into the new sessions file as well
... sure, I can do that.

[...]

> > +
> > +/*
> > + * assume hash sha256 and nonces u, v of size SHA256_DIGEST_SIZE but
> > + * otherwise standard KDFa. Note output is in bytes not bits.
> > + */
> > +static void KDFa(u8 *key, int keylen, const char *label, u8 *u,
> > +		 u8 *v, int bytes, u8 *out)
>
> Should this be in lower case? I would rename it as tpm_kdfa().

This one is defined as KDFa() in the standards and it's not TPM
specific (although some standards refer to it as KDFA). I'm not averse
to making them tpm_kdfe() and tpm_kdfa() but I was hoping that one day
the crypto subsystem would need them and we could move them in there
because KDFs are the new shiny in crypto primitives (TLS 1.2 started
using them, for instance).

> > +{
> > +	u32 counter;
> > +	const __be32 bits = cpu_to_be32(bytes * 8);
> > +
> > +	for (counter = 1; bytes > 0; bytes -= SHA256_DIGEST_SIZE, counter++,
> > +	     out += SHA256_DIGEST_SIZE) {
>
> Only one counter is actually used for anything so this is overly
> complicated and IMHO it is ok to call the counter just 'i'. Maybe just:
>
> for (i = 1; (bytes - (i - 1) * SHA256_DIGEST_SIZE) > 0; i++) {
>
> > +		SHASH_DESC_ON_STACK(desc, sha256_hash);
> > +		__be32 c = cpu_to_be32(counter);
> > +
> > +		hmac_init(desc, key, keylen);
> > +		crypto_shash_update(desc, (u8 *)&c, sizeof(c));
> > +		crypto_shash_update(desc, label, strlen(label)+1);
> > +		crypto_shash_update(desc, u, SHA256_DIGEST_SIZE);
> > +		crypto_shash_update(desc, v, SHA256_DIGEST_SIZE);
> > +		crypto_shash_update(desc, (u8 *)&bits, sizeof(bits));
> > +		hmac_final(desc, key, keylen, out);
> > +	}
> > +}
> > +
> > +/*
> > + * Somewhat of a bastardization of the real KDFe. We're assuming
> > + * we're working with known point sizes for the input parameters and
> > + * the hash algorithm is fixed at sha256. Because we know that the
> > + * point size is 32 bytes like the hash size, there's no need to loop
> > + * in this KDF.
> > + */
> > +static void KDFe(u8 z[EC_PT_SZ], const char *str, u8 *pt_u, u8 *pt_v,
> > +		 u8 *keyout)
> > +{
> > +	SHASH_DESC_ON_STACK(desc, sha256_hash);
> > +	/*
> > +	 * this should be an iterative counter, but because we know
> > +	 * we're only taking 32 bytes for the point using a sha256
> > +	 * hash which is also 32 bytes, there's only one loop
> > +	 */
> > +	__be32 c = cpu_to_be32(1);
> > +
> > +	desc->tfm = sha256_hash;
> > +	desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP;
> > +
> > +	crypto_shash_init(desc);
> > +	/* counter (BE) */
> > +	crypto_shash_update(desc, (u8 *)&c, sizeof(c));
> > +	/* secret value */
> > +	crypto_shash_update(desc, z, EC_PT_SZ);
> > +	/* string including trailing zero */
> > +	crypto_shash_update(desc, str, strlen(str)+1);
> > +	crypto_shash_update(desc, pt_u, EC_PT_SZ);
> > +	crypto_shash_update(desc, pt_v, EC_PT_SZ);
> > +	crypto_shash_final(desc, keyout);
> > +}
> > +
> > +static void tpm_buf_append_salt(struct tpm_buf *buf, struct tpm_chip *chip,
> > +				struct tpm2_auth *auth)
>
> Given the complexity of this function and some not that obvious
> choices in the implementation (coordinates), it would make sense to
> document this function.

I'll try to beef up the salting description.

> > +{
> > +	struct crypto_kpp *kpp;
> > +	struct kpp_request *req;
> > +	struct scatterlist s[2], d[1];
> > +	struct ecdh p = {0};
> > +	u8 encoded_key[EC_PT_SZ], *x, *y;
>
> Why do you use a one-character variable name 'p' but the longer name
> 'encoded_key'?
>
> > +	unsigned int buf_len;
> > +	u8 *secret;
> > +
> > +	secret = kmalloc(EC_PT_SZ, GFP_KERNEL);
> > +	if (!secret)
> > +		return;
> > +
> > +	p.curve_id = ECC_CURVE_NIST_P256;
>
> Could this be set already in the initialization?

I'm never sure about designated initializers, but I think, after
looking them up again, it will zero fill unmentioned elements.

> > +
> > +	/* secret is two sized points */
> > +	tpm_buf_append_u16(buf, (EC_PT_SZ + 2)*2);
>
> White space missing: should be "(EC_PT_SZ + 2) * 2". The comment is
> also a bit obscure (maybe; I do not have any specific suggestion for
> how to make it less obscure).
>
> > +	/*
> > +	 * we cheat here and append uninitialized data to form
> > +	 * the points. All we care about is getting the two
> > +	 * co-ordinate pointers, which will be used to overwrite
> > +	 * the uninitialized data
> > +	 */
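On the designated-initializer question raised above: James's recollection is correct; C guarantees that members not named in a designated initializer are zero-initialized, so the curve id can be set at declaration with no behavioural change. A one-line sketch:

	/* the key and key_size members are implicitly zero-filled */
	struct ecdh p = {
		.curve_id = ECC_CURVE_NIST_P256,
	};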
Re: [PATCH v4 2/7] tpm2-sessions: Add full HMAC and encrypt/decrypt session handling
On Tue, 23 Oct 2018, Ard Biesheuvel wrote: On 23 October 2018 at 04:01, James Bottomley wrote: On Mon, 2018-10-22 at 19:19 -0300, Ard Biesheuvel wrote: [...] +static void hmac_init(struct shash_desc *desc, u8 *key, int keylen) +{ + u8 pad[SHA256_BLOCK_SIZE]; + int i; + + desc->tfm = sha256_hash; + desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP; I don't think this actually does anything in the shash API implementation, so you can drop this. OK, I find crypto somewhat hard to follow. There were bits I had to understand, like when I wrote the CFB implementation or when I fixed the ECDH scatterlist handling, but I've got to confess, in time honoured tradition I simply copied this from EVM crypto without actually digging into the code to understand why. Yeah, it is notoriously hard to use, and we should try to improve that. James, I would hope (already said in my review) to use longer than one character variable names for most of the stuff. I did not quite understand why you decided to use 'counter' for obvious counter variable and one character names for non-obvious stuff :-) I'm not sure where the 'encoded' exactly comes in the variable name 'encoded_key' especially in the context of these cryptic names. /Jarkko
Re: [PATCH v4 0/7] add integrity and security to TPM2 transactions
On Wed, 2018-10-24 at 02:51 +0300, Jarkko Sakkinen wrote: > I would consider sending first a patch set that would iterate the > existing session stuff to be ready for this i.e. merge in two > iterations (emphasis on the word "consider"). We can probably merge > the groundwork quite fast. I realise we're going to have merge conflicts on the later ones, so why don't we do this: I'll still send as one series, but you apply the ones you think are precursors and I'll rebase and resend the rest? James
Re: [PATCH v4 5/7] trusted keys: Add session encryption protection to the seal/unseal path
The tag in the short description does not look at all. Should be either "tpm:" or "keys, trusted:". On Mon, 22 Oct 2018, James Bottomley wrote: If some entity is snooping the TPM bus, the can see the data going in to be sealed and the data coming out as it is unsealed. Add parameter and response encryption to these cases to ensure that no secrets are leaked even if the bus is snooped. As part of doing this conversion it was discovered that policy sessions can't work with HMAC protected authority because of missing pieces (the tpm Nonce). I've added code to work the same way as before, which will result in potential authority exposure (while still adding security for the command and the returned blob), and a fixme to redo the API to get rid of this security hole. Signed-off-by: James Bottomley --- drivers/char/tpm/tpm2-cmd.c | 155 1 file changed, 98 insertions(+), 57 deletions(-) diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c index 22f1c7bee173..a8655cd535d1 100644 --- a/drivers/char/tpm/tpm2-cmd.c +++ b/drivers/char/tpm/tpm2-cmd.c @@ -425,7 +425,9 @@ int tpm2_seal_trusted(struct tpm_chip *chip, { unsigned int blob_len; struct tpm_buf buf; + struct tpm_buf t2b; u32 hash; + struct tpm2_auth *auth; int i; int rc; @@ -439,45 +441,56 @@ int tpm2_seal_trusted(struct tpm_chip *chip, if (i == ARRAY_SIZE(tpm2_hash_map)) return -EINVAL; - rc = tpm_buf_init(, TPM2_ST_SESSIONS, TPM2_CC_CREATE); + rc = tpm2_start_auth_session(chip, ); if (rc) return rc; - tpm_buf_append_u32(, options->keyhandle); - tpm2_buf_append_auth(, TPM2_RS_PW, -NULL /* nonce */, 0, -0 /* session_attributes */, -options->keyauth /* hmac */, -TPM_DIGEST_SIZE); + rc = tpm_buf_init(, TPM2_ST_SESSIONS, TPM2_CC_CREATE); + if (rc) { + tpm2_end_auth_session(auth); + return rc; + } + + rc = tpm_buf_init_2b(); + if (rc) { + tpm_buf_destroy(); + tpm2_end_auth_session(auth); + return rc; + } + tpm_buf_append_name(, auth, options->keyhandle, NULL); + tpm_buf_append_hmac_session(, auth, TPM2_SA_DECRYPT, + options->keyauth, TPM_DIGEST_SIZE); /* sensitive */ - tpm_buf_append_u16(, 4 + TPM_DIGEST_SIZE + payload->key_len + 1); + tpm_buf_append_u16(, TPM_DIGEST_SIZE); + tpm_buf_append(, options->blobauth, TPM_DIGEST_SIZE); + tpm_buf_append_u16(, payload->key_len + 1); + tpm_buf_append(, payload->key, payload->key_len); + tpm_buf_append_u8(, payload->migratable); - tpm_buf_append_u16(, TPM_DIGEST_SIZE); - tpm_buf_append(, options->blobauth, TPM_DIGEST_SIZE); - tpm_buf_append_u16(, payload->key_len + 1); - tpm_buf_append(, payload->key, payload->key_len); - tpm_buf_append_u8(, payload->migratable); + tpm_buf_append_2b(, ); /* public */ - tpm_buf_append_u16(, 14 + options->policydigest_len); - tpm_buf_append_u16(, TPM2_ALG_KEYEDHASH); - tpm_buf_append_u16(, hash); + tpm_buf_append_u16(, TPM2_ALG_KEYEDHASH); + tpm_buf_append_u16(, hash); /* policy */ if (options->policydigest_len) { - tpm_buf_append_u32(, 0); - tpm_buf_append_u16(, options->policydigest_len); - tpm_buf_append(, options->policydigest, + tpm_buf_append_u32(, 0); + tpm_buf_append_u16(, options->policydigest_len); + tpm_buf_append(, options->policydigest, options->policydigest_len); } else { - tpm_buf_append_u32(, TPM2_OA_USER_WITH_AUTH); - tpm_buf_append_u16(, 0); + tpm_buf_append_u32(, TPM2_OA_USER_WITH_AUTH); + tpm_buf_append_u16(, 0); } /* public parameters */ - tpm_buf_append_u16(, TPM2_ALG_NULL); - tpm_buf_append_u16(, 0); + tpm_buf_append_u16(, TPM2_ALG_NULL); + /* unique (zero) */ + tpm_buf_append_u16(, 0); + + tpm_buf_append_2b(, ); /* outside info */ 
tpm_buf_append_u16(, 0); @@ -490,8 +503,11 @@ int tpm2_seal_trusted(struct tpm_chip *chip, goto out; } - rc = tpm_transmit_cmd(chip, NULL, buf.data, PAGE_SIZE, 4, 0, - "sealing data"); + tpm_buf_fill_hmac_session(, auth); + + rc = tpm_transmit_cmd(chip, >kernel_space, buf.data, + PAGE_SIZE, 4, 0, "sealing data"); + rc = tpm_buf_check_hmac_response(, auth, rc); if (rc) goto out; @@ -509,6 +525,7 @@ int tpm2_seal_trusted(struct tpm_chip *chip, payload->blob_len = blob_len;
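For orientation, the calling convention this diff converts tpm2_seal_trusted() to is easier to see in outline. A condensed sketch of the pattern, using the calls visible in the diff above (error unwinding and parameter marshalling elided; the function name is illustrative):

	static int example_hmac_protected_cmd(struct tpm_chip *chip,
					      u32 keyhandle, const u8 *keyauth)
	{
		struct tpm2_auth *auth;
		struct tpm_buf buf;
		int rc;

		rc = tpm2_start_auth_session(chip, &auth);
		if (rc)
			return rc;

		rc = tpm_buf_init(&buf, TPM2_ST_SESSIONS, TPM2_CC_CREATE);
		if (rc) {
			tpm2_end_auth_session(auth);
			return rc;
		}

		tpm_buf_append_name(&buf, auth, keyhandle, NULL);
		tpm_buf_append_hmac_session(&buf, auth, TPM2_SA_DECRYPT,
					    keyauth, TPM_DIGEST_SIZE);
		/* ... command parameters appended here ... */

		tpm_buf_fill_hmac_session(&buf, auth);
		rc = tpm_transmit_cmd(chip, &chip->kernel_space, buf.data,
				      PAGE_SIZE, 4, 0, "sealing data");
		rc = tpm_buf_check_hmac_response(&buf, auth, rc);

		tpm_buf_destroy(&buf);
		return rc;
	}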
Re: [PATCH v4 0/7] add integrity and security to TPM2 transactions
I would consider sending first a patch set that would iterate the existing session stuff to be ready for this i.e. merge in two iterations (emphasis on the word "consider"). We can probably merge the groundwork quite fast. /Jarkko On Mon, 22 Oct 2018, James Bottomley wrote: By now, everybody knows we have a problem with the TPM2_RS_PW easy button on TPM2 in that transactions on the TPM bus can be intercepted and altered. The way to fix this is to use real sessions for HMAC capabilities to ensure integrity and to use parameter and response encryption to ensure confidentiality of the data flowing over the TPM bus. This patch series is about adding a simple API which can ensure the above properties as a layered addition to the existing TPM handling code. This series now includes protections for PCR extend, getting random numbers from the TPM and data sealing and unsealing. It therefore eliminates all uses of TPM2_RS_PW in the kernel and adds encryption protection to sensitive data flowing into and out of the TPM. In the third version I added data sealing and unsealing protection, apart from one API based problem which means that the way trusted keys were protected it's not currently possible to HMAC protect an authority that comes with a policy, so the API will have to be extended to fix that case In this fourth version, I tidy up some of the code and add more security features, the most notable is that we now calculate the NULL seed name and compare our calculation to the value returned in TPM2_ReadPublic, which means we now can't be spoofed. This version also gives a sysfs variable for the null seed which userspace can use to run a key certification operation to prove that the TPM was always secure when communicating with the kernel. I've verified this using the test suite in the last patch on a VM connected to a tpm2 emulator. I also instrumented the emulator to make sure the sensitive data was properly encrypted. James --- James Bottomley (7): tpm-buf: create new functions for handling TPM buffers tpm2-sessions: Add full HMAC and encrypt/decrypt session handling tpm2: add hmac checks to tpm2_pcr_extend() tpm2: add session encryption protection to tpm2_get_random() trusted keys: Add session encryption protection to the seal/unseal path tpm: add the null key name as a tpm2 sysfs variable tpm2-sessions: NOT FOR COMMITTING add sessions testing drivers/char/tpm/Kconfig |3 + drivers/char/tpm/Makefile |3 +- drivers/char/tpm/tpm-buf.c| 191 ++ drivers/char/tpm/tpm-chip.c |1 + drivers/char/tpm/tpm-sysfs.c | 27 +- drivers/char/tpm/tpm.h| 129 ++-- drivers/char/tpm/tpm2-cmd.c | 248 --- drivers/char/tpm/tpm2-sessions-test.c | 360 ++ drivers/char/tpm/tpm2-sessions.c | 1188 + drivers/char/tpm/tpm2-sessions.h | 57 ++ 10 files changed, 2027 insertions(+), 180 deletions(-) create mode 100644 drivers/char/tpm/tpm-buf.c create mode 100644 drivers/char/tpm/tpm2-sessions-test.c create mode 100644 drivers/char/tpm/tpm2-sessions.c create mode 100644 drivers/char/tpm/tpm2-sessions.h -- 2.16.4
Re: [PATCH v4 2/7] tpm2-sessions: Add full HMAC and encrypt/decrypt session handling
On Mon, 22 Oct 2018, James Bottomley wrote: This code adds true session based HMAC authentication plus parameter decryption and response encryption using AES. In order to reduce complexity it would make sense to split into two commits: authentication and parameter encryption. The basic design of this code is to segregate all the nasty crypto, hash and hmac code into tpm2-sessions.c and export a usable API. The API first of all starts off by gaining a session with tpm2_start_auth_session() Which initiates a session with the TPM and allocates an opaque tpm2_auth structure to handle the session parameters. Then the use is simply: * tpm_buf_append_name() in place of the tpm_buf_append_u32 for the handles Do not understand the description of this function. * tpm_buf_append_hmac_session() where tpm2_append_auth() would go Another fuzzy description. * tpm_buf_fill_hmac_session() called after the entire command buffer is finished but before tpm_transmit_cmd() is called which computes the correct HMAC and places it in the command at the correct location. I would call this simply tpm_buf_insert_hmac(). It is clear and precise name what it does. These all sould be combined with the existing session stuff inside tpm2-cmd.c and not have duplicate infrastructures. The file name should be tpm2-session.c (we neither have tpm2-cmds.c). Finally, after tpm_transmit_cmd() is called, tpm_buf_check_hmac_response() is called to check that the returned HMAC matched and collect the new state for the next use of the session, if any. The features of the session is controlled by the session attributes set in tpm_buf_append_hmac_session(). If TPM2_SA_CONTINUE_SESSION is not specified, the session will be flushed and the tpm2_auth structure freed in tpm_buf_check_hmac_response(); otherwise the session may be used again. Parameter encryption is specified by or'ing the flag TPM2_SA_DECRYPT and response encryption by or'ing the flag TPM2_SA_ENCRYPT. the various encryptions will be taken care of by tpm_buf_fill_hmac_session() and tpm_buf_check_hmac_response() respectively. To get all of this to work securely, the Kernel now needs a primary key to encrypt the session salt to, so we derive an EC key from the NULL seed and store it in the tpm_chip structure. We also make sure that this seed remains for the kernel by using a kernel space to take it out of the TPM when userspace wants to use it. Signed-off-by: James Bottomley --- v2: Added docbook and improved response check API v3: Add readpublic, fix hmac length, add API for close on error allow for the hmac session not being first in the sessions v4: Make NULL seed template exactly match the SRK ECC template. Also check the NULL primary key name is what getpublic returns to prevent spoofing. 
Also parametrise the name size for reuse --- drivers/char/tpm/Kconfig |3 + drivers/char/tpm/Makefile|2 +- drivers/char/tpm/tpm.h | 32 + drivers/char/tpm/tpm2-cmd.c | 34 +- drivers/char/tpm/tpm2-sessions.c | 1188 ++ drivers/char/tpm/tpm2-sessions.h | 57 ++ 6 files changed, 1300 insertions(+), 16 deletions(-) create mode 100644 drivers/char/tpm/tpm2-sessions.c create mode 100644 drivers/char/tpm/tpm2-sessions.h tpm2-cmd.c changes should be in their own commit e.g.: "tpm: add own space for in-kernel TPM communication" diff --git a/drivers/char/tpm/Kconfig b/drivers/char/tpm/Kconfig index 0aee88df98d1..8c714d8550c4 100644 --- a/drivers/char/tpm/Kconfig +++ b/drivers/char/tpm/Kconfig @@ -8,6 +8,9 @@ menuconfig TCG_TPM select SECURITYFS select CRYPTO select CRYPTO_HASH_INFO + select CRYPTO_ECDH + select CRYPTO_AES + select CRYPTO_CFB ---help--- If you have a TPM security chip in your system, which implements the Trusted Computing Group's specification, diff --git a/drivers/char/tpm/Makefile b/drivers/char/tpm/Makefile index 65d165cce530..1b5f6ccbb86d 100644 --- a/drivers/char/tpm/Makefile +++ b/drivers/char/tpm/Makefile @@ -5,7 +5,7 @@ obj-$(CONFIG_TCG_TPM) += tpm.o tpm-y := tpm-interface.o tpm-dev.o tpm-sysfs.o tpm-chip.o tpm2-cmd.o \ tpm-dev-common.o tpmrm-dev.o eventlog/common.o eventlog/tpm1.o \ -eventlog/tpm2.o tpm2-space.o tpm-buf.o +eventlog/tpm2.o tpm2-space.o tpm-buf.o tpm2-sessions.o tpm-$(CONFIG_ACPI) += tpm_ppi.o eventlog/acpi.o tpm-$(CONFIG_EFI) += eventlog/efi.o tpm-$(CONFIG_OF) += eventlog/of.o diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h index d73701e36eba..d39065d9995d 100644 --- a/drivers/char/tpm/tpm.h +++ b/drivers/char/tpm/tpm.h @@ -42,6 +42,15 @@ #include #endif +/* fixed define for the curve we use which is NIST_P256 */ +#define EC_PT_SZ 32 + +/* + * fixed define for the size of a name. This is actually HASHALG size + * plus 2, so 32 for SHA256 + */ +#define TPM2_NAME_SIZE 34 Please, remove this definition completely. It only
Re: [PATCH v4 1/7] tpm-buf: create new functions for handling TPM buffers
On Mon, 22 Oct 2018, James Bottomley wrote:
> This separates out the old tpm_buf_... handling functions from static
> inlines into tpm.h and makes them their own tpm-buf.c file. It also
> adds handling for tpm2b structures and also incremental pointer
> advancing parsers.
>
> Signed-off-by: James Bottomley

I would also prefer simply a "tpm:" tag, e.g. (in all of the commits in
this series actually, but I remark on it just here):

"tpm: create new tpm_buf functions for parsing and TPM2B"

/Jarkko
Re: [PATCH v4 1/7] tpm-buf: create new functions for handling TPM buffers
On Mon, 22 Oct 2018, James Bottomley wrote: This separates out the old tpm_buf_... handling functions from static inlines into tpm.h and makes them their own tpm-buf.c file. It also adds handling for tpm2b structures and also incremental pointer advancing parsers. Nitpicking: when my SGX patch set has been reviewed one of the comments has been that commit messages should be in imperative form e.g. "Separate out the old tpm_buf handling functions from static inlines into tpm.h and make them their..." Signed-off-by: James Bottomley --- v2: added this patch to separate out the API changes v3: added tpm_buf_reset_cmd() v4: renamed tpm_buf_reset_cmd() to tpm_buf_reset() --- drivers/char/tpm/Makefile | 2 +- drivers/char/tpm/tpm-buf.c | 191 + drivers/char/tpm/tpm.h | 96 --- 3 files changed, 208 insertions(+), 81 deletions(-) create mode 100644 drivers/char/tpm/tpm-buf.c diff --git a/drivers/char/tpm/Makefile b/drivers/char/tpm/Makefile index 3655258bee73..65d165cce530 100644 --- a/drivers/char/tpm/Makefile +++ b/drivers/char/tpm/Makefile @@ -5,7 +5,7 @@ obj-$(CONFIG_TCG_TPM) += tpm.o tpm-y := tpm-interface.o tpm-dev.o tpm-sysfs.o tpm-chip.o tpm2-cmd.o \ tpm-dev-common.o tpmrm-dev.o eventlog/common.o eventlog/tpm1.o \ -eventlog/tpm2.o tpm2-space.o +eventlog/tpm2.o tpm2-space.o tpm-buf.o tpm-$(CONFIG_ACPI) += tpm_ppi.o eventlog/acpi.o tpm-$(CONFIG_EFI) += eventlog/efi.o tpm-$(CONFIG_OF) += eventlog/of.o diff --git a/drivers/char/tpm/tpm-buf.c b/drivers/char/tpm/tpm-buf.c new file mode 100644 index ..faa22bb09d99 --- /dev/null +++ b/drivers/char/tpm/tpm-buf.c @@ -0,0 +1,191 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Handing for tpm2b structures to facilitate the building of commands + */ + +#include "tpm.h" + +#include + +#include + +static int __tpm_buf_init(struct tpm_buf *buf) +{ + buf->data_page = alloc_page(GFP_HIGHUSER); + if (!buf->data_page) + return -ENOMEM; + + buf->flags = 0; + buf->data = kmap(buf->data_page); + + return 0; +} + +void tpm_buf_reset(struct tpm_buf *buf, u16 tag, u32 ordinal) +{ + struct tpm_input_header *head; + + head = (struct tpm_input_header *) buf->data; + + head->tag = cpu_to_be16(tag); + head->length = cpu_to_be32(sizeof(*head)); + head->ordinal = cpu_to_be32(ordinal); +} +EXPORT_SYMBOL_GPL(tpm_buf_reset); + +int tpm_buf_init(struct tpm_buf *buf, u16 tag, u32 ordinal) +{ + int rc; + + rc = __tpm_buf_init(buf); + if (rc) + return rc; + + tpm_buf_reset(buf, tag, ordinal); + + return 0; +} +EXPORT_SYMBOL_GPL(tpm_buf_init); + +int tpm_buf_init_2b(struct tpm_buf *buf) +{ + struct tpm_input_header *head; + int rc; + + rc = __tpm_buf_init(buf); + if (rc) + return rc; + + head = (struct tpm_input_header *) buf->data; + + head->length = cpu_to_be32(sizeof(*head)); + + buf->flags = TPM_BUF_2B; + return 0; +} +EXPORT_SYMBOL_GPL(tpm_buf_init_2b); + +void tpm_buf_destroy(struct tpm_buf *buf) +{ + kunmap(buf->data_page); + __free_page(buf->data_page); +} +EXPORT_SYMBOL_GPL(tpm_buf_destroy); + +static void *tpm_buf_data(struct tpm_buf *buf) +{ + if (buf->flags & TPM_BUF_2B) + return buf->data + TPM_HEADER_SIZE; + return buf->data; +} + +u32 tpm_buf_length(struct tpm_buf *buf) +{ + struct tpm_input_header *head = (struct tpm_input_header *)buf->data; + u32 len; + + len = be32_to_cpu(head->length); + if (buf->flags & TPM_BUF_2B) + len -= sizeof(*head); + return len; +} +EXPORT_SYMBOL_GPL(tpm_buf_length); + +u16 tpm_buf_tag(struct tpm_buf *buf) +{ + struct tpm_input_header *head = (struct tpm_input_header *)buf->data; + + return be16_to_cpu(head->tag); +} 
+EXPORT_SYMBOL_GPL(tpm_buf_tag); + +void tpm_buf_append(struct tpm_buf *buf, + const unsigned char *new_data, + unsigned int new_len) +{ + struct tpm_input_header *head = (struct tpm_input_header *) buf->data; + u32 len = be32_to_cpu(head->length); + + /* Return silently if overflow has already happened. */ + if (buf->flags & TPM_BUF_OVERFLOW) + return; + + if ((len + new_len) > PAGE_SIZE) { + WARN(1, "tpm_buf: overflow\n"); + buf->flags |= TPM_BUF_OVERFLOW; + return; + } + + memcpy(>data[len], new_data, new_len); + head->length = cpu_to_be32(len + new_len); +} +EXPORT_SYMBOL_GPL(tpm_buf_append); + +void tpm_buf_append_u8(struct tpm_buf *buf, const u8 value) +{ + tpm_buf_append(buf, , 1); +} +EXPORT_SYMBOL_GPL(tpm_buf_append_u8); + +void tpm_buf_append_u16(struct tpm_buf *buf, const u16 value) +{ + __be16 value2 = cpu_to_be16(value); + + tpm_buf_append(buf, (u8 *) , 2); +}
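As a usage illustration of the API above: a command is built by initializing the buffer with a tag and ordinal, appending big-endian parameters, and destroying it afterwards. A minimal sketch (the command chosen is arbitrary; chip and num_bytes come from the surrounding context):

	struct tpm_buf buf;
	int rc;

	rc = tpm_buf_init(&buf, TPM2_ST_NO_SESSIONS, TPM2_CC_GET_RANDOM);
	if (rc)
		return rc;

	tpm_buf_append_u16(&buf, num_bytes);	/* bytesRequested */
	rc = tpm_transmit_cmd(chip, NULL, buf.data, PAGE_SIZE, 0, 0,
			      "attempting get random");
	tpm_buf_destroy(&buf);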
Re: [PATCH v4 2/7] tpm2-sessions: Add full HMAC and encrypt/decrypt session handling
On 23 October 2018 at 04:01, James Bottomley wrote: > On Mon, 2018-10-22 at 19:19 -0300, Ard Biesheuvel wrote: > [...] >> > +static void hmac_init(struct shash_desc *desc, u8 *key, int >> > keylen) >> > +{ >> > + u8 pad[SHA256_BLOCK_SIZE]; >> > + int i; >> > + >> > + desc->tfm = sha256_hash; >> > + desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP; >> >> I don't think this actually does anything in the shash API >> implementation, so you can drop this. > > OK, I find crypto somewhat hard to follow. There were bits I had to > understand, like when I wrote the CFB implementation or when I fixed > the ECDH scatterlist handling, but I've got to confess, in time > honoured tradition I simply copied this from EVM crypto without > actually digging into the code to understand why. > Yeah, it is notoriously hard to use, and we should try to improve that. >> However, I take it this means that hmac_init() is never called in >> contexts where sleeping is not allowed? For the relevance of that, >> please see below. > > Right, these routines are always called as an adjunct to a TPM > operation and TPM operations can sleep, so we must at least have kernel > thread context. > > [...] >> > + /* encrypt before HMAC */ >> > + if (auth->attrs & TPM2_SA_DECRYPT) { >> > + struct scatterlist sg[1]; >> > + u16 len; >> > + SKCIPHER_REQUEST_ON_STACK(req, auth->aes); >> > + >> > + skcipher_request_set_tfm(req, auth->aes); >> > + skcipher_request_set_callback(req, >> > CRYPTO_TFM_REQ_MAY_SLEEP, >> > + NULL, NULL); >> > + >> >> Your crypto_alloc_skcipher() call [further down] does not mask out >> CRYPTO_ALG_ASYNC, and so it permits asynchronous implementations to >> be selected. Your generic cfb template only permits synchronous >> implementations, since it wraps the cipher directly (which is always >> synchronous). However, we have support in the tree for some >> accelerators that implement cfb(aes), and those will return >> -EINPROGRESS when invoking crypto_skcipher_en/decrypt(req), which you >> are not set up to handle. >> >> So the simple solution is to call 'crypto_alloc_skcipher("cfb(aes)", >> 0, CRYPTO_ALG_ASYNC)' below instead. >> >> However, I would prefer it if we permit asynchronous skciphers here. >> The reason is that, on a given platform, the accelerator may be the >> only truly time invariant AES implementation that is available, and >> since we are dealing with the TPM, a bit of paranoia does not hurt. >> It also makes it easier to implement it as a [time invariant] SIMD >> based asynchronous skcipher, which are simpler than synchronous ones >> since they don't require a non-SIMD fallback path for calls from >> contexts where the SIMD unit may not be used. >> >> I have already implemented cfb(aes) for arm64/NEON. I will post the >> patch after the merge window closes. >> >> > + /* need key and IV */ >> > + KDFa(auth->session_key, SHA256_DIGEST_SIZE >> > ++ auth->passphraselen, "CFB", auth->our_nonce, >> > +auth->tpm_nonce, AES_KEYBYTES + >> > AES_BLOCK_SIZE, >> > +auth->scratch); >> > + crypto_skcipher_setkey(auth->aes, auth->scratch, >> > AES_KEYBYTES); >> > + len = tpm_get_inc_u16(); >> > + sg_init_one(sg, p, len); >> > + skcipher_request_set_crypt(req, sg, sg, len, >> > + auth->scratch + >> > AES_KEYBYTES); >> > + crypto_skcipher_encrypt(req); >> >> So please consider replacing this with something like. 
>> >> DECLARE_CRYPTO_WAIT(wait); [further up] >> skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP, >> crypto_req_done, ); >> crypto_wait_req(crypto_skcipher_encrypt(req), ); > > Sure, I can do this ... the crypto skcipher handling was also cut and > paste, but I forget where from this time. So what I think you're > asking for is below as the incremental diff? I've tested this out and > it all works fine in my session testing environment (and on my real > laptop) ... although since I'm fully sync, I won't really have tested > the -EINPROGRESS do the wait case. > Yes that looks fine. I will try to test this on one of my arm64 systems, but it may take me some time to get around to it. In any case, Reviewed-by: Ard Biesheuvel # crypto API parts > > diff --git a/drivers/char/tpm/tpm2-sessions.c > b/drivers/char/tpm/tpm2-sessions.c > index 422c3ec64f8c..bbd3e8580954 100644 > --- a/drivers/char/tpm/tpm2-sessions.c > +++ b/drivers/char/tpm/tpm2-sessions.c > @@ -165,7 +165,6 @@ static void hmac_init(struct shash_desc *desc, u8 *key, > int keylen) > int i; > > desc->tfm = sha256_hash; > - desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP; > crypto_shash_init(desc); > for (i = 0; i < sizeof(pad); i++) { >
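Spelled out, the synchronous-wait pattern Ard suggests (and that the incremental diff adopts) looks like this; the wait object supplies the completion callback, so an asynchronous implementation returning -EINPROGRESS is handled transparently. A sketch, with tfm, sg, len and iv as in the surrounding code:

	DECLARE_CRYPTO_WAIT(wait);
	struct skcipher_request *req;
	int rc;

	req = skcipher_request_alloc(tfm, GFP_KERNEL);
	if (!req)
		return -ENOMEM;

	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP,
				      crypto_req_done, &wait);
	skcipher_request_set_crypt(req, sg, sg, len, iv);
	rc = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
	skcipher_request_free(req);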
Re: [PATCH v4 2/7] tpm2-sessions: Add full HMAC and encrypt/decrypt session handling
On Mon, 2018-10-22 at 19:19 -0300, Ard Biesheuvel wrote: [...] > > +static void hmac_init(struct shash_desc *desc, u8 *key, int > > keylen) > > +{ > > + u8 pad[SHA256_BLOCK_SIZE]; > > + int i; > > + > > + desc->tfm = sha256_hash; > > + desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP; > > I don't think this actually does anything in the shash API > implementation, so you can drop this. OK, I find crypto somewhat hard to follow. There were bits I had to understand, like when I wrote the CFB implementation or when I fixed the ECDH scatterlist handling, but I've got to confess, in time honoured tradition I simply copied this from EVM crypto without actually digging into the code to understand why. > However, I take it this means that hmac_init() is never called in > contexts where sleeping is not allowed? For the relevance of that, > please see below. Right, these routines are always called as an adjunct to a TPM operation and TPM operations can sleep, so we must at least have kernel thread context. [...] > > + /* encrypt before HMAC */ > > + if (auth->attrs & TPM2_SA_DECRYPT) { > > + struct scatterlist sg[1]; > > + u16 len; > > + SKCIPHER_REQUEST_ON_STACK(req, auth->aes); > > + > > + skcipher_request_set_tfm(req, auth->aes); > > + skcipher_request_set_callback(req, > > CRYPTO_TFM_REQ_MAY_SLEEP, > > + NULL, NULL); > > + > > Your crypto_alloc_skcipher() call [further down] does not mask out > CRYPTO_ALG_ASYNC, and so it permits asynchronous implementations to > be selected. Your generic cfb template only permits synchronous > implementations, since it wraps the cipher directly (which is always > synchronous). However, we have support in the tree for some > accelerators that implement cfb(aes), and those will return > -EINPROGRESS when invoking crypto_skcipher_en/decrypt(req), which you > are not set up to handle. > > So the simple solution is to call 'crypto_alloc_skcipher("cfb(aes)", > 0, CRYPTO_ALG_ASYNC)' below instead. > > However, I would prefer it if we permit asynchronous skciphers here. > The reason is that, on a given platform, the accelerator may be the > only truly time invariant AES implementation that is available, and > since we are dealing with the TPM, a bit of paranoia does not hurt. > It also makes it easier to implement it as a [time invariant] SIMD > based asynchronous skcipher, which are simpler than synchronous ones > since they don't require a non-SIMD fallback path for calls from > contexts where the SIMD unit may not be used. > > I have already implemented cfb(aes) for arm64/NEON. I will post the > patch after the merge window closes. > > > + /* need key and IV */ > > + KDFa(auth->session_key, SHA256_DIGEST_SIZE > > ++ auth->passphraselen, "CFB", auth->our_nonce, > > +auth->tpm_nonce, AES_KEYBYTES + > > AES_BLOCK_SIZE, > > +auth->scratch); > > + crypto_skcipher_setkey(auth->aes, auth->scratch, > > AES_KEYBYTES); > > + len = tpm_get_inc_u16(); > > + sg_init_one(sg, p, len); > > + skcipher_request_set_crypt(req, sg, sg, len, > > + auth->scratch + > > AES_KEYBYTES); > > + crypto_skcipher_encrypt(req); > > So please consider replacing this with something like. > > DECLARE_CRYPTO_WAIT(wait); [further up] > skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP, > crypto_req_done, ); > crypto_wait_req(crypto_skcipher_encrypt(req), ); Sure, I can do this ... the crypto skcipher handling was also cut and paste, but I forget where from this time. So what I think you're asking for is below as the incremental diff? 
I've tested this out and it all works fine in my session testing environment (and on my real laptop) ... although since I'm fully sync, I won't really have tested the -EINPROGRESS do the wait case. James --- diff --git a/drivers/char/tpm/tpm2-sessions.c b/drivers/char/tpm/tpm2-sessions.c index 422c3ec64f8c..bbd3e8580954 100644 --- a/drivers/char/tpm/tpm2-sessions.c +++ b/drivers/char/tpm/tpm2-sessions.c @@ -165,7 +165,6 @@ static void hmac_init(struct shash_desc *desc, u8 *key, int keylen) int i; desc->tfm = sha256_hash; - desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP; crypto_shash_init(desc); for (i = 0; i < sizeof(pad); i++) { if (i < keylen) @@ -244,7 +243,6 @@ static void KDFe(u8 z[EC_PT_SZ], const char *str, u8 *pt_u, u8 *pt_v, __be32 c = cpu_to_be32(1); desc->tfm = sha256_hash; - desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP; crypto_shash_init(desc); /* counter (BE) */ @@ -445,7 +443,6 @@ void tpm_buf_fill_hmac_session(struct tpm_buf *buf, struct tpm2_auth *auth) auth->ordinal = head->ordinal;
Re: [PATCH v4 2/7] tpm2-sessions: Add full HMAC and encrypt/decrypt session handling
Hi James, Some comments below on how you are using the crypto API. On 22 October 2018 at 04:36, James Bottomley wrote: > This code adds true session based HMAC authentication plus parameter > decryption and response encryption using AES. > > The basic design of this code is to segregate all the nasty crypto, > hash and hmac code into tpm2-sessions.c and export a usable API. > > The API first of all starts off by gaining a session with > > tpm2_start_auth_session() > > Which initiates a session with the TPM and allocates an opaque > tpm2_auth structure to handle the session parameters. Then the use is > simply: > > * tpm_buf_append_name() in place of the tpm_buf_append_u32 for the > handles > > * tpm_buf_append_hmac_session() where tpm2_append_auth() would go > > * tpm_buf_fill_hmac_session() called after the entire command buffer > is finished but before tpm_transmit_cmd() is called which computes > the correct HMAC and places it in the command at the correct > location. > > Finally, after tpm_transmit_cmd() is called, > tpm_buf_check_hmac_response() is called to check that the returned > HMAC matched and collect the new state for the next use of the > session, if any. > > The features of the session is controlled by the session attributes > set in tpm_buf_append_hmac_session(). If TPM2_SA_CONTINUE_SESSION is > not specified, the session will be flushed and the tpm2_auth structure > freed in tpm_buf_check_hmac_response(); otherwise the session may be > used again. Parameter encryption is specified by or'ing the flag > TPM2_SA_DECRYPT and response encryption by or'ing the flag > TPM2_SA_ENCRYPT. the various encryptions will be taken care of by > tpm_buf_fill_hmac_session() and tpm_buf_check_hmac_response() > respectively. > > To get all of this to work securely, the Kernel now needs a primary > key to encrypt the session salt to, so we derive an EC key from the > NULL seed and store it in the tpm_chip structure. We also make sure > that this seed remains for the kernel by using a kernel space to take > it out of the TPM when userspace wants to use it. > > Signed-off-by: James Bottomley > > --- > > v2: Added docbook and improved response check API > v3: Add readpublic, fix hmac length, add API for close on error > allow for the hmac session not being first in the sessions > v4: Make NULL seed template exactly match the SRK ECC template. > Also check the NULL primary key name is what getpublic returns > to prevent spoofing. Also parametrise the name size for reuse > --- > drivers/char/tpm/Kconfig |3 + > drivers/char/tpm/Makefile|2 +- > drivers/char/tpm/tpm.h | 32 + > drivers/char/tpm/tpm2-cmd.c | 34 +- > drivers/char/tpm/tpm2-sessions.c | 1188 > ++ > drivers/char/tpm/tpm2-sessions.h | 57 ++ > 6 files changed, 1300 insertions(+), 16 deletions(-) > create mode 100644 drivers/char/tpm/tpm2-sessions.c > create mode 100644 drivers/char/tpm/tpm2-sessions.h > ... > diff --git a/drivers/char/tpm/tpm2-sessions.c > b/drivers/char/tpm/tpm2-sessions.c > new file mode 100644 > index ..422c3ec64f8c > --- /dev/null > +++ b/drivers/char/tpm/tpm2-sessions.c > @@ -0,0 +1,1188 @@ ... > +/* > + * this is our static crypto shash. 
This is possible because the hash > + * is multi-threaded and all the state stored in the desc > + */ > +static struct crypto_shash *sha256_hash; > + > +/* > + * It turns out the crypto hmac(sha256) is hard for us to consume > + * because it assumes a fixed key and the TPM seems to change the key > + * on every operation, so we weld the hmac init and final functions in > + * here to give it the same usage characteristics as a regular hash > + */ > +static void hmac_init(struct shash_desc *desc, u8 *key, int keylen) > +{ > + u8 pad[SHA256_BLOCK_SIZE]; > + int i; > + > + desc->tfm = sha256_hash; > + desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP; I don't think this actually does anything in the shash API implementation, so you can drop this. However, I take it this means that hmac_init() is never called in contexts where sleeping is not allowed? For the relevance of that, please see below. > + crypto_shash_init(desc); > + for (i = 0; i < sizeof(pad); i++) { > + if (i < keylen) > + pad[i] = key[i]; > + else > + pad[i] = 0; > + pad[i] ^= HMAC_IPAD_VALUE; > + } > + crypto_shash_update(desc, pad, sizeof(pad)); > +} > + > +static void hmac_final(struct shash_desc *desc, u8 *key, int keylen, u8 *out) > +{ > + u8 pad[SHA256_BLOCK_SIZE]; > + int i; > + > + for (i = 0; i < sizeof(pad); i++) { > + if (i < keylen) > + pad[i] = key[i]; > + else > + pad[i] = 0; > + pad[i] ^= HMAC_OPAD_VALUE; > + } > + > + /* collect the final hash; use out as
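For anyone puzzling over the pad handling being reviewed here: hmac_init()/hmac_final() open-code the standard HMAC construction around a plain sha256 shash so that the key can change on every operation. In the usual notation:

	\mathrm{HMAC}(K, m) = H\big((K' \oplus \mathrm{opad}) \,\Vert\,
	H\big((K' \oplus \mathrm{ipad}) \,\Vert\, m\big)\big)

where K' is the key zero-padded to the hash block size (64 bytes for SHA-256); hmac_init() absorbs K' xor ipad, the callers absorb m, and hmac_final() applies the outer hash with K' xor opad.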
Re: [PATCH 1/2] crypto: fix cfb mode decryption
On 21 October 2018 at 11:00, James Bottomley wrote: > On October 21, 2018 9:58:04 AM GMT, Ard Biesheuvel > wrote: >>On 21 October 2018 at 10:07, James Bottomley >> wrote: >>> On Sun, 2018-10-21 at 09:05 +0200, Ard Biesheuvel wrote: (+ James) >>> >>> Thanks! >>> On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov wrote: > crypto_cfb_decrypt_segment() incorrectly XOR'ed generated >>keystream > with > IV, rather than with data stream, resulting in incorrect > decryption. > Test vectors will be added in the next patch. > > Signed-off-by: Dmitry Eremin-Solenikov > Cc: sta...@vger.kernel.org > --- > crypto/cfb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/crypto/cfb.c b/crypto/cfb.c > index a0d68c09e1b9..fd4e8500e121 100644 > --- a/crypto/cfb.c > +++ b/crypto/cfb.c > @@ -144,7 +144,7 @@ static int crypto_cfb_decrypt_segment(struct > skcipher_walk *walk, > > do { > crypto_cfb_encrypt_one(tfm, iv, dst); > - crypto_xor(dst, iv, bsize); > + crypto_xor(dst, src, bsize); >>> >>> This does look right. I think the reason the TPM code works is that >>it >>> always does encrypt/decrypt in-place, which is a separate piece of >>the >>> code which appears to be correct. >>> >> >>Yeah I figured that. >> >>So where is the TPM code that actually uses this code? > > It was posted to the integrity list a while ago. I'm planning a repost > shortly. > OK, found it. Mind cc'ing me on that repost?
Re: [PATCH 1/2] crypto: fix cfb mode decryption
On October 21, 2018 9:58:04 AM GMT, Ard Biesheuvel wrote: >On 21 October 2018 at 10:07, James Bottomley > wrote: >> On Sun, 2018-10-21 at 09:05 +0200, Ard Biesheuvel wrote: >>> (+ James) >> >> Thanks! >> >>> On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov >>> wrote: >>> > crypto_cfb_decrypt_segment() incorrectly XOR'ed generated >keystream >>> > with >>> > IV, rather than with data stream, resulting in incorrect >>> > decryption. >>> > Test vectors will be added in the next patch. >>> > >>> > Signed-off-by: Dmitry Eremin-Solenikov >>> > Cc: sta...@vger.kernel.org >>> > --- >>> > crypto/cfb.c | 2 +- >>> > 1 file changed, 1 insertion(+), 1 deletion(-) >>> > >>> > diff --git a/crypto/cfb.c b/crypto/cfb.c >>> > index a0d68c09e1b9..fd4e8500e121 100644 >>> > --- a/crypto/cfb.c >>> > +++ b/crypto/cfb.c >>> > @@ -144,7 +144,7 @@ static int crypto_cfb_decrypt_segment(struct >>> > skcipher_walk *walk, >>> > >>> > do { >>> > crypto_cfb_encrypt_one(tfm, iv, dst); >>> > - crypto_xor(dst, iv, bsize); >>> > + crypto_xor(dst, src, bsize); >> >> This does look right. I think the reason the TPM code works is that >it >> always does encrypt/decrypt in-place, which is a separate piece of >the >> code which appears to be correct. >> > >Yeah I figured that. > >So where is the TPM code that actually uses this code? It was posted to the integrity list a while ago. I'm planning a repost shortly. James -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [PATCH 1/2] crypto: fix cfb mode decryption
On 21 October 2018 at 10:07, James Bottomley wrote: > On Sun, 2018-10-21 at 09:05 +0200, Ard Biesheuvel wrote: >> (+ James) > > Thanks! > >> On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov >> wrote: >> > crypto_cfb_decrypt_segment() incorrectly XOR'ed generated keystream >> > with >> > IV, rather than with data stream, resulting in incorrect >> > decryption. >> > Test vectors will be added in the next patch. >> > >> > Signed-off-by: Dmitry Eremin-Solenikov >> > Cc: sta...@vger.kernel.org >> > --- >> > crypto/cfb.c | 2 +- >> > 1 file changed, 1 insertion(+), 1 deletion(-) >> > >> > diff --git a/crypto/cfb.c b/crypto/cfb.c >> > index a0d68c09e1b9..fd4e8500e121 100644 >> > --- a/crypto/cfb.c >> > +++ b/crypto/cfb.c >> > @@ -144,7 +144,7 @@ static int crypto_cfb_decrypt_segment(struct >> > skcipher_walk *walk, >> > >> > do { >> > crypto_cfb_encrypt_one(tfm, iv, dst); >> > - crypto_xor(dst, iv, bsize); >> > + crypto_xor(dst, src, bsize); > > This does look right. I think the reason the TPM code works is that it > always does encrypt/decrypt in-place, which is a separate piece of the > code which appears to be correct. > Yeah I figured that. So where is the TPM code that actually uses this code?
Re: [PATCH 1/2] crypto: fix cfb mode decryption
On Sun, 2018-10-21 at 09:05 +0200, Ard Biesheuvel wrote: > (+ James) Thanks! > On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov > wrote: > > crypto_cfb_decrypt_segment() incorrectly XOR'ed generated keystream > > with > > IV, rather than with data stream, resulting in incorrect > > decryption. > > Test vectors will be added in the next patch. > > > > Signed-off-by: Dmitry Eremin-Solenikov > > Cc: sta...@vger.kernel.org > > --- > > crypto/cfb.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/crypto/cfb.c b/crypto/cfb.c > > index a0d68c09e1b9..fd4e8500e121 100644 > > --- a/crypto/cfb.c > > +++ b/crypto/cfb.c > > @@ -144,7 +144,7 @@ static int crypto_cfb_decrypt_segment(struct > > skcipher_walk *walk, > > > > do { > > crypto_cfb_encrypt_one(tfm, iv, dst); > > - crypto_xor(dst, iv, bsize); > > + crypto_xor(dst, src, bsize); This does look right. I think the reason the TPM code works is that it always does encrypt/decrypt in-place, which is a separate piece of the code which appears to be correct. James
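The arithmetic behind the fix, for anyone skimming: CFB runs the block cipher forward in both directions, so decryption is P_i = C_i xor E_k(C_{i-1}) with C_0 = IV; the XOR operand must be the ciphertext, and that same ciphertext block then becomes the next "IV". A simplified sketch of the corrected out-of-place loop:

	/* crypto_cfb_decrypt_segment(), full blocks, dst != src */
	do {
		crypto_cfb_encrypt_one(tfm, iv, dst);	/* dst = E_k(iv) */
		crypto_xor(dst, src, bsize);		/* dst ^= C_i (the fix) */
		iv = src;				/* chain on the ciphertext */

		src += bsize;
		dst += bsize;
	} while ((nbytes -= bsize) >= bsize);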
Re: [PATCH 2/2] crypto: testmgr: add AES-CFB tests
(+ James) On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov wrote: > Add AES128/192/256-CFB testvectors from NIST SP800-38A. > > Signed-off-by: Dmitry Eremin-Solenikov > Cc: sta...@vger.kernel.org > Signed-off-by: Dmitry Eremin-Solenikov > --- > crypto/tcrypt.c | 5 > crypto/testmgr.c | 7 + > crypto/testmgr.h | 76 > 3 files changed, 88 insertions(+) > > diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c > index bdde95e8d369..a6315827d240 100644 > --- a/crypto/tcrypt.c > +++ b/crypto/tcrypt.c > @@ -1733,6 +1733,7 @@ static int do_test(const char *alg, u32 type, u32 mask, > int m, u32 num_mb) > ret += tcrypt_test("xts(aes)"); > ret += tcrypt_test("ctr(aes)"); > ret += tcrypt_test("rfc3686(ctr(aes))"); > + ret += tcrypt_test("cfb(aes)"); > break; > > case 11: > @@ -2059,6 +2060,10 @@ static int do_test(const char *alg, u32 type, u32 > mask, int m, u32 num_mb) > speed_template_16_24_32); > test_cipher_speed("ctr(aes)", DECRYPT, sec, NULL, 0, > speed_template_16_24_32); > + test_cipher_speed("cfb(aes)", ENCRYPT, sec, NULL, 0, > + speed_template_16_24_32); > + test_cipher_speed("cfb(aes)", DECRYPT, sec, NULL, 0, > + speed_template_16_24_32); > break; > > case 201: > diff --git a/crypto/testmgr.c b/crypto/testmgr.c > index a1d42245082a..016d61c419fc 100644 > --- a/crypto/testmgr.c > +++ b/crypto/testmgr.c > @@ -2684,6 +2684,13 @@ static const struct alg_test_desc alg_test_descs[] = { > .dec = __VECS(aes_ccm_dec_tv_template) > } > } > + }, { > + .alg = "cfb(aes)", > + .test = alg_test_skcipher, > + .fips_allowed = 1, > + .suite = { > + .cipher = __VECS(aes_cfb_tv_template) > + }, > }, { > .alg = "chacha20", > .test = alg_test_skcipher, > diff --git a/crypto/testmgr.h b/crypto/testmgr.h > index 173111c70746..19b6d184c8fb 100644 > --- a/crypto/testmgr.h > +++ b/crypto/testmgr.h > @@ -12081,6 +12081,82 @@ static const struct cipher_testvec > aes_cbc_tv_template[] = { > }, > }; > > +static const struct cipher_testvec aes_cfb_tv_template[] = { > + { /* From NIST SP800-38A */ > + .key= "\x2b\x7e\x15\x16\x28\xae\xd2\xa6" > + "\xab\xf7\x15\x88\x09\xcf\x4f\x3c", > + .klen = 16, > + .iv = "\x00\x01\x02\x03\x04\x05\x06\x07" > + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", > + .ptext = "\x6b\xc1\xbe\xe2\x2e\x40\x9f\x96" > + "\xe9\x3d\x7e\x11\x73\x93\x17\x2a" > + "\xae\x2d\x8a\x57\x1e\x03\xac\x9c" > + "\x9e\xb7\x6f\xac\x45\xaf\x8e\x51" > + "\x30\xc8\x1c\x46\xa3\x5c\xe4\x11" > + "\xe5\xfb\xc1\x19\x1a\x0a\x52\xef" > + "\xf6\x9f\x24\x45\xdf\x4f\x9b\x17" > + "\xad\x2b\x41\x7b\xe6\x6c\x37\x10", > + .ctext = "\x3b\x3f\xd9\x2e\xb7\x2d\xad\x20" > + "\x33\x34\x49\xf8\xe8\x3c\xfb\x4a" > + "\xc8\xa6\x45\x37\xa0\xb3\xa9\x3f" > + "\xcd\xe3\xcd\xad\x9f\x1c\xe5\x8b" > + "\x26\x75\x1f\x67\xa3\xcb\xb1\x40" > + "\xb1\x80\x8c\xf1\x87\xa4\xf4\xdf" > + "\xc0\x4b\x05\x35\x7c\x5d\x1c\x0e" > + "\xea\xc4\xc6\x6f\x9f\xf7\xf2\xe6", > + .len= 64, > + }, { > + .key= "\x8e\x73\xb0\xf7\xda\x0e\x64\x52" > + "\xc8\x10\xf3\x2b\x80\x90\x79\xe5" > + "\x62\xf8\xea\xd2\x52\x2c\x6b\x7b", > + .klen = 24, > + .iv = "\x00\x01\x02\x03\x04\x05\x06\x07" > + "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", > + .ptext = "\x6b\xc1\xbe\xe2\x2e\x40\x9f\x96" > + "\xe9\x3d\x7e\x11\x73\x93\x17\x2a" > + "\xae\x2d\x8a\x57\x1e\x03\xac\x9c" > + "\x9e\xb7\x6f\xac\x45\xaf\x8e\x51" > + "\x30\xc8\x1c\x46\xa3\x5c\xe4\x11" > + "\xe5\xfb\xc1\x19\x1a\x0a\x52\xef" > + "\xf6\x9f\x24\x45\xdf\x4f\x9b\x17" > + "\xad\x2b\x41\x7b\xe6\x6c\x37\x10", > + .ctext = "\xcd\xc8\x0d\x6f\xdd\xf1\x8c\xab" > + "\x34\xc2\x59\x09\xc9\x9a\x41\x74" > + "\x67\xce\x7f\x7f\x81\x17\x36\x21" > +
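For anyone who wants to sanity-check the first (AES-128) vector above outside the kernel, here is a minimal userspace sketch of the first block using OpenSSL's low-level AES block API (link with -lcrypto). It exercises exactly the decrypt data flow fixed by the companion patch, i.e. the keystream XORed with the ciphertext:

#include <stdio.h>
#include <openssl/aes.h>

int main(void)
{
	/* Key, IV and first ciphertext block from NIST SP800-38A (CFB128-AES128) */
	static const unsigned char key[16] = {
		0x2b,0x7e,0x15,0x16,0x28,0xae,0xd2,0xa6,
		0xab,0xf7,0x15,0x88,0x09,0xcf,0x4f,0x3c };
	static const unsigned char iv[16] = {
		0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07,
		0x08,0x09,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f };
	static const unsigned char ct[16] = {
		0x3b,0x3f,0xd9,0x2e,0xb7,0x2d,0xad,0x20,
		0x33,0x34,0x49,0xf8,0xe8,0x3c,0xfb,0x4a };
	unsigned char ks[16], pt[16];
	AES_KEY enc;
	int i;

	AES_set_encrypt_key(key, 128, &enc);
	AES_encrypt(iv, ks, &enc);		/* keystream = E_k(IV) */
	for (i = 0; i < 16; i++)
		pt[i] = ct[i] ^ ks[i];		/* XOR with ciphertext, not the IV */

	for (i = 0; i < 16; i++)		/* expect 6bc1bee22e409f96e93d7e117393172a */
		printf("%02x", pt[i]);
	printf("\n");
	return 0;
}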
RE: [PATCH 0/3] crypto: ccree: add SM3 support
Hi Herbert, I'm sorry for only notifying you now, but this patch set should be applied on top of the "crypto: ccree: add CryptoCell 713 baseline support" patch set by Gilad Ben-Yossef, hence the failures reported by the kbuild test robot. > -Original Message- > From: Gilad Ben-Yossef > Sent: Thursday, 18 October 2018 16:26 > To: yaeceh01 > Cc: Herbert Xu ; David Miller > ; Linux Crypto Mailing List cry...@vger.kernel.org>; Ofir Drang ; Linux kernel > mailing list > Subject: Re: [PATCH 0/3] crypto: ccree: add SM3 support > > On Thu, Oct 18, 2018 at 4:00 PM Yael Chemla > wrote: > > > > Add support for SM3 in CryptoCell 713. > > > > Yael Chemla (3): > > crypto: ccree: adjust hash length to suit certain context specifics > > crypto: ccree: modify set_cipher_mode usage from cc_hash > > crypto: ccree: add SM3 support > > > > drivers/crypto/Kconfig | 1 + > > drivers/crypto/ccree/cc_aead.c | 19 +++- > > drivers/crypto/ccree/cc_crypto_ctx.h| 4 +- > > drivers/crypto/ccree/cc_driver.c| 10 +- > > drivers/crypto/ccree/cc_driver.h| 2 +- > > drivers/crypto/ccree/cc_hash.c | 175 +++-- --- > > drivers/crypto/ccree/cc_hw_queue_defs.h | 27 + > > 7 files changed, 182 insertions(+), 56 deletions(-) > > > > -- > > 2.7.4 > > > > Herbert, these go on top of my own previous patch set that adds the > CryptoCell 713 and its SM4 cipher support. > > Acked-by: Gilad Ben-Yossef > > Thanks, > Gilad > > -- > Gilad Ben-Yossef > Chief Coffee Drinker > > values of β will give rise to dom!
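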
Re: [PATCH 1/2] crypto: fix cfb mode decryption
(+ James) On 20 October 2018 at 01:01, Dmitry Eremin-Solenikov wrote: > crypto_cfb_decrypt_segment() incorrectly XOR'ed generated keystream with > IV, rather than with data stream, resulting in incorrect decryption. > Test vectors will be added in the next patch. > > Signed-off-by: Dmitry Eremin-Solenikov > Cc: sta...@vger.kernel.org > --- > crypto/cfb.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/crypto/cfb.c b/crypto/cfb.c > index a0d68c09e1b9..fd4e8500e121 100644 > --- a/crypto/cfb.c > +++ b/crypto/cfb.c > @@ -144,7 +144,7 @@ static int crypto_cfb_decrypt_segment(struct > skcipher_walk *walk, > > do { > crypto_cfb_encrypt_one(tfm, iv, dst); > - crypto_xor(dst, iv, bsize); > + crypto_xor(dst, src, bsize); > iv = src; > > src += bsize; > -- > 2.19.1 >
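To make the corrected data flow explicit, here is the decrypt-segment loop in isolation, reusing the stand-ins from the in-place sketch above (a sketch of the logic, not the kernel function verbatim):

static void cfb_decrypt_segment_sketch(const void *key, const u8 *src,
				       u8 *dst, unsigned int nblocks, u8 *iv)
{
	unsigned int i;

	while (nblocks--) {
		encrypt_one(key, iv, dst);	/* dst = E_k(iv): the keystream */
		for (i = 0; i < BSIZE; i++)
			dst[i] ^= src[i];	/* XOR with the ciphertext (the fix) */
		memcpy(iv, src, BSIZE);		/* ciphertext chains as the next IV */
		src += BSIZE;
		dst += BSIZE;
	}
}

The buggy version XORed dst with iv instead of src, i.e. it computed E_k(IV) ^ IV rather than E_k(IV) ^ C, which is why every out-of-place decrypted block came out wrong while encryption and the in-place path remained correct.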
Re: [PATCH v3 2/2] crypto: arm/aes - add some hardening against cache-timing attacks
On 20 October 2018 at 04:39, Eric Biggers wrote: > On Fri, Oct 19, 2018 at 05:54:12PM +0800, Ard Biesheuvel wrote: >> On 19 October 2018 at 13:41, Ard Biesheuvel >> wrote: >> > On 18 October 2018 at 12:37, Eric Biggers wrote: >> >> From: Eric Biggers >> >> >> >> Make the ARM scalar AES implementation closer to constant-time by >> >> disabling interrupts and prefetching the tables into L1 cache. This is >> >> feasible because due to ARM's "free" rotations, the main tables are only >> >> 1024 bytes instead of the usual 4096 used by most AES implementations. >> >> >> >> On ARM Cortex-A7, the speed loss is only about 5%. The resulting code >> >> is still over twice as fast as aes_ti.c. Responsiveness is potentially >> >> a concern, but interrupts are only disabled for a single AES block. >> >> >> > >> > So that would be in the order of 700 cycles, based on the numbers you >> > shared in v1 of the aes_ti.c patch. Does that sound about right? So >> > that would be around 1 microsecond, which is really not a number to >> > obsess about imo. >> > >> > I considered another option, which is to detect whether an interrupt >> > has been taken (by writing some canary value below that stack pointer >> > in the location where the exception handler will preserve the value of >> > sp, and checking at the end whether it has been modified) and doing a >> > usleep_range(x, y) if that is the case. >> > >> > But this is much simpler so let's only go there if we must. >> > >> >> I played around a bit and implemented it for discussion purposes, but >> restarting the operation if it gets interrupted, as suggested in the >> paper (whitespace corruption courtesy of Gmail) >> >> >> diff --git a/arch/arm/crypto/aes-cipher-core.S >> b/arch/arm/crypto/aes-cipher-core.S >> index 184d6c2d15d5..2e8a84a47784 100644 >> --- a/arch/arm/crypto/aes-cipher-core.S >> +++ b/arch/arm/crypto/aes-cipher-core.S >> @@ -10,6 +10,7 @@ >> */ >> >> #include >> +#include >> #include >> >> .text >> @@ -139,6 +140,34 @@ >> >> __adrl ttab, \ttab >> >> + /* >> + * Set a canary that will allow us to tell whether any >> + * interrupts were taken while this function was executing. >> + * The zero value will be overwritten with the process counter >> + * value at the point where the IRQ exception is taken. >> + */ >> + mov t0, #0 >> + str t0, [sp, #-(SVC_REGS_SIZE - S_PC)] >> + >> + /* >> + * Prefetch the 1024-byte 'ft' or 'it' table into L1 cache, >> + * assuming cacheline size >= 32. This is a hardening measure >> + * intended to make cache-timing attacks more difficult. >> + * They may not be fully prevented, however; see the paper >> + * https://cr.yp.to/antiforgery/cachetiming-20050414.pdf >> + * ("Cache-timing attacks on AES") for a discussion of the many >> + * difficulties involved in writing truly constant-time AES >> + * software. 
>> + */ >> + .set i, 0 >> + .rept 1024 / 128 >> + ldr r8, [ttab, #i + 0] >> + ldr r9, [ttab, #i + 32] >> + ldr r10, [ttab, #i + 64] >> + ldr r11, [ttab, #i + 96] >> + .set i, i + 128 >> + .endr >> + >> tst rounds, #2 >> bne 1f >> >> @@ -154,6 +183,8 @@ >> 2: __adrl ttab, \ltab >> \round r4, r5, r6, r7, r8, r9, r10, r11, \bsz, b >> >> + ldr r0, [sp, #-(SVC_REGS_SIZE - S_PC)] // check canary >> + >> #ifdef CONFIG_CPU_BIG_ENDIAN >> __rev r4, r4 >> __rev r5, r5 >> diff --git a/arch/arm/crypto/aes-cipher-glue.c >> b/arch/arm/crypto/aes-cipher-glue.c >> index c222f6e072ad..de8f32121511 100644 >> --- a/arch/arm/crypto/aes-cipher-glue.c >> +++ b/arch/arm/crypto/aes-cipher-glue.c >> @@ -11,28 +11,39 @@ >> >> #include >> #include >> +#include >> #include >> >> -asmlinkage void __aes_arm_encrypt(u32 *rk, int rounds, const u8 *in, u8 >> *out); >> +asmlinkage int __aes_arm_encrypt(u32 *rk, int rounds, const u8 *in, u8 >> *out); >> EXPORT_SYMBOL(__aes_arm_encrypt); >> >> -asmlinkage void __aes_arm_decrypt(u32 *rk, int rounds, const u8 *in, u8 >> *out); >> +asmlinkage int __aes_arm_decrypt(u32 *rk, int rounds, const u8 *in, u8 >> *out); >> EXPORT_SYMBOL(__aes_arm_decrypt); >> >> static void aes_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) >> { >> struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); >> int rounds = 6 + ctx->key_length / 4; >> + u8 buf[AES_BLOCK_SIZE]; >> >> - __aes_arm_encrypt(ctx->key_enc, rounds, in, out); >> + if (out == in) >> + in = memcpy(buf, in, AES_BLOCK_SIZE); >> + >> + while (unlikely(__aes_arm_encrypt(ctx->key_enc, rounds, in, out))) >> + cpu_relax(); >> } >> >> static void aes_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) >> { >> struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); >> int rounds = 6 + ctx->key_length / 4; >> + u8 buf[AES_BLOCK_SIZE]; >> + >> + if (out == in) >> + in = memcpy(buf, in, AES_BLOCK_SIZE); >> >> - __aes_arm_decrypt(ctx->key_dec, rounds, in, out); >> + while (unlikely(__aes_arm_decrypt(ctx->key_dec, rounds, in, out))) >> + cpu_relax(); >> } >> >> static struct crypto_alg aes_alg = { > > It's an interesting
Re: [PATCH v3 2/2] crypto: arm/aes - add some hardening against cache-timing attacks
On Fri, Oct 19, 2018 at 05:54:12PM +0800, Ard Biesheuvel wrote: > On 19 October 2018 at 13:41, Ard Biesheuvel wrote: > > On 18 October 2018 at 12:37, Eric Biggers wrote: > >> From: Eric Biggers > >> > >> Make the ARM scalar AES implementation closer to constant-time by > >> disabling interrupts and prefetching the tables into L1 cache. This is > >> feasible because due to ARM's "free" rotations, the main tables are only > >> 1024 bytes instead of the usual 4096 used by most AES implementations. > >> > >> On ARM Cortex-A7, the speed loss is only about 5%. The resulting code > >> is still over twice as fast as aes_ti.c. Responsiveness is potentially > >> a concern, but interrupts are only disabled for a single AES block. > >> > > > > So that would be in the order of 700 cycles, based on the numbers you > > shared in v1 of the aes_ti.c patch. Does that sound about right? So > > that would be around 1 microsecond, which is really not a number to > > obsess about imo. > > > > I considered another option, which is to detect whether an interrupt > > has been taken (by writing some canary value below that stack pointer > > in the location where the exception handler will preserve the value of > > sp, and checking at the end whether it has been modified) and doing a > > usleep_range(x, y) if that is the case. > > > > But this is much simpler so let's only go there if we must. > > > > I played around a bit and implemented it for discussion purposes, but > restarting the operation if it gets interrupted, as suggested in the > paper (whitespace corruption courtesy of Gmail) > > > diff --git a/arch/arm/crypto/aes-cipher-core.S > b/arch/arm/crypto/aes-cipher-core.S > index 184d6c2d15d5..2e8a84a47784 100644 > --- a/arch/arm/crypto/aes-cipher-core.S > +++ b/arch/arm/crypto/aes-cipher-core.S > @@ -10,6 +10,7 @@ > */ > > #include > +#include > #include > > .text > @@ -139,6 +140,34 @@ > > __adrl ttab, \ttab > > + /* > + * Set a canary that will allow us to tell whether any > + * interrupts were taken while this function was executing. > + * The zero value will be overwritten with the process counter > + * value at the point where the IRQ exception is taken. > + */ > + mov t0, #0 > + str t0, [sp, #-(SVC_REGS_SIZE - S_PC)] > + > + /* > + * Prefetch the 1024-byte 'ft' or 'it' table into L1 cache, > + * assuming cacheline size >= 32. This is a hardening measure > + * intended to make cache-timing attacks more difficult. > + * They may not be fully prevented, however; see the paper > + * https://cr.yp.to/antiforgery/cachetiming-20050414.pdf > + * ("Cache-timing attacks on AES") for a discussion of the many > + * difficulties involved in writing truly constant-time AES > + * software. 
> + */ > + .set i, 0 > + .rept 1024 / 128 > + ldr r8, [ttab, #i + 0] > + ldr r9, [ttab, #i + 32] > + ldr r10, [ttab, #i + 64] > + ldr r11, [ttab, #i + 96] > + .set i, i + 128 > + .endr > + > tst rounds, #2 > bne 1f > > @@ -154,6 +183,8 @@ > 2: __adrl ttab, \ltab > \round r4, r5, r6, r7, r8, r9, r10, r11, \bsz, b > > + ldr r0, [sp, #-(SVC_REGS_SIZE - S_PC)] // check canary > + > #ifdef CONFIG_CPU_BIG_ENDIAN > __rev r4, r4 > __rev r5, r5 > diff --git a/arch/arm/crypto/aes-cipher-glue.c > b/arch/arm/crypto/aes-cipher-glue.c > index c222f6e072ad..de8f32121511 100644 > --- a/arch/arm/crypto/aes-cipher-glue.c > +++ b/arch/arm/crypto/aes-cipher-glue.c > @@ -11,28 +11,39 @@ > > #include > #include > +#include > #include > > -asmlinkage void __aes_arm_encrypt(u32 *rk, int rounds, const u8 *in, u8 > *out); > +asmlinkage int __aes_arm_encrypt(u32 *rk, int rounds, const u8 *in, u8 *out); > EXPORT_SYMBOL(__aes_arm_encrypt); > > -asmlinkage void __aes_arm_decrypt(u32 *rk, int rounds, const u8 *in, u8 > *out); > +asmlinkage int __aes_arm_decrypt(u32 *rk, int rounds, const u8 *in, u8 *out); > EXPORT_SYMBOL(__aes_arm_decrypt); > > static void aes_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) > { > struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); > int rounds = 6 + ctx->key_length / 4; > + u8 buf[AES_BLOCK_SIZE]; > > - __aes_arm_encrypt(ctx->key_enc, rounds, in, out); > + if (out == in) > + in = memcpy(buf, in, AES_BLOCK_SIZE); > + > + while (unlikely(__aes_arm_encrypt(ctx->key_enc, rounds, in, out))) > + cpu_relax(); > } > > static void aes_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) > { > struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); > int rounds = 6 + ctx->key_length / 4; > + u8 buf[AES_BLOCK_SIZE]; > + > + if (out == in) > + in = memcpy(buf, in, AES_BLOCK_SIZE); > > - __aes_arm_decrypt(ctx->key_dec, rounds, in, out); > + while (unlikely(__aes_arm_decrypt(ctx->key_dec, rounds, in, out))) > + cpu_relax(); > } > > static struct crypto_alg aes_alg = { It's an interesting idea, but the main thing I don't like about this is that the time it takes to do the encryption/decryption is unbounded, since it could get livelocked with a high rate of interrupts. To fix
Re: [PATCH v3 2/2] crypto: arm/aes - add some hardening against cache-timing attacks
On Fri, Oct 19, 2018 at 01:41:35PM +0800, Ard Biesheuvel wrote: > On 18 October 2018 at 12:37, Eric Biggers wrote: > > From: Eric Biggers > > > > Make the ARM scalar AES implementation closer to constant-time by > > disabling interrupts and prefetching the tables into L1 cache. This is > > feasible because due to ARM's "free" rotations, the main tables are only > > 1024 bytes instead of the usual 4096 used by most AES implementations. > > > > On ARM Cortex-A7, the speed loss is only about 5%. The resulting code > > is still over twice as fast as aes_ti.c. Responsiveness is potentially > > a concern, but interrupts are only disabled for a single AES block. > > > > So that would be in the order of 700 cycles, based on the numbers you > shared in v1 of the aes_ti.c patch. Does that sound about right? So > that would be around 1 microsecond, which is really not a number to > obsess about imo. > Correct, on ARM Cortex-A7 I'm seeing slightly over 700 cycles per block encrypted or decrypted, including the prefetching. - Eric
Re: [PATCH v3 2/2] crypto: arm/aes - add some hardening against cache-timing attacks
On 19 October 2018 at 13:41, Ard Biesheuvel wrote: > On 18 October 2018 at 12:37, Eric Biggers wrote: >> From: Eric Biggers >> >> Make the ARM scalar AES implementation closer to constant-time by >> disabling interrupts and prefetching the tables into L1 cache. This is >> feasible because due to ARM's "free" rotations, the main tables are only >> 1024 bytes instead of the usual 4096 used by most AES implementations. >> >> On ARM Cortex-A7, the speed loss is only about 5%. The resulting code >> is still over twice as fast as aes_ti.c. Responsiveness is potentially >> a concern, but interrupts are only disabled for a single AES block. >> > > So that would be in the order of 700 cycles, based on the numbers you > shared in v1 of the aes_ti.c patch. Does that sound about right? So > that would be around 1 microsecond, which is really not a number to > obsess about imo. > > I considered another option, which is to detect whether an interrupt > has been taken (by writing some canary value below that stack pointer > in the location where the exception handler will preserve the value of > sp, and checking at the end whether it has been modified) and doing a > usleep_range(x, y) if that is the case. > > But this is much simpler so let's only go there if we must. > I played around a bit and implemented it for discussion purposes, but restarting the operation if it gets interrupted, as suggested in the paper (whitespace corruption courtesy of Gmail) diff --git a/arch/arm/crypto/aes-cipher-core.S b/arch/arm/crypto/aes-cipher-core.S index 184d6c2d15d5..2e8a84a47784 100644 --- a/arch/arm/crypto/aes-cipher-core.S +++ b/arch/arm/crypto/aes-cipher-core.S @@ -10,6 +10,7 @@ */ #include +#include #include .text @@ -139,6 +140,34 @@ __adrl ttab, \ttab + /* + * Set a canary that will allow us to tell whether any + * interrupts were taken while this function was executing. + * The zero value will be overwritten with the process counter + * value at the point where the IRQ exception is taken. + */ + mov t0, #0 + str t0, [sp, #-(SVC_REGS_SIZE - S_PC)] + + /* + * Prefetch the 1024-byte 'ft' or 'it' table into L1 cache, + * assuming cacheline size >= 32. This is a hardening measure + * intended to make cache-timing attacks more difficult. + * They may not be fully prevented, however; see the paper + * https://cr.yp.to/antiforgery/cachetiming-20050414.pdf + * ("Cache-timing attacks on AES") for a discussion of the many + * difficulties involved in writing truly constant-time AES + * software. 
+ */ + .set i, 0 + .rept 1024 / 128 + ldr r8, [ttab, #i + 0] + ldr r9, [ttab, #i + 32] + ldr r10, [ttab, #i + 64] + ldr r11, [ttab, #i + 96] + .set i, i + 128 + .endr + tst rounds, #2 bne 1f @@ -154,6 +183,8 @@ 2: __adrl ttab, \ltab \round r4, r5, r6, r7, r8, r9, r10, r11, \bsz, b + ldr r0, [sp, #-(SVC_REGS_SIZE - S_PC)] // check canary + #ifdef CONFIG_CPU_BIG_ENDIAN __rev r4, r4 __rev r5, r5 diff --git a/arch/arm/crypto/aes-cipher-glue.c b/arch/arm/crypto/aes-cipher-glue.c index c222f6e072ad..de8f32121511 100644 --- a/arch/arm/crypto/aes-cipher-glue.c +++ b/arch/arm/crypto/aes-cipher-glue.c @@ -11,28 +11,39 @@ #include #include +#include #include -asmlinkage void __aes_arm_encrypt(u32 *rk, int rounds, const u8 *in, u8 *out); +asmlinkage int __aes_arm_encrypt(u32 *rk, int rounds, const u8 *in, u8 *out); EXPORT_SYMBOL(__aes_arm_encrypt); -asmlinkage void __aes_arm_decrypt(u32 *rk, int rounds, const u8 *in, u8 *out); +asmlinkage int __aes_arm_decrypt(u32 *rk, int rounds, const u8 *in, u8 *out); EXPORT_SYMBOL(__aes_arm_decrypt); static void aes_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) { struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); int rounds = 6 + ctx->key_length / 4; + u8 buf[AES_BLOCK_SIZE]; - __aes_arm_encrypt(ctx->key_enc, rounds, in, out); + if (out == in) + in = memcpy(buf, in, AES_BLOCK_SIZE); + + while (unlikely(__aes_arm_encrypt(ctx->key_enc, rounds, in, out))) + cpu_relax(); } static void aes_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) { struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); int rounds = 6 + ctx->key_length / 4; + u8 buf[AES_BLOCK_SIZE]; + + if (out == in) + in = memcpy(buf, in, AES_BLOCK_SIZE); - __aes_arm_decrypt(ctx->key_dec, rounds, in, out); + while (unlikely(__aes_arm_decrypt(ctx->key_dec, rounds, in, out))) + cpu_relax(); } static struct crypto_alg aes_alg = {
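One way to reconcile this with the unbounded-livelock concern raised elsewhere in the thread would be to cap the number of canary-triggered restarts and then fall back to a single IRQs-off attempt. A hypothetical sketch on top of the int-returning __aes_arm_encrypt() prototype above (the retry cap and the fallback are illustrative, not part of the posted diff):

static void aes_encrypt_bounded(struct crypto_aes_ctx *ctx, int rounds,
				const u8 *in, u8 *out)
{
	unsigned long flags;
	int tries;

	for (tries = 0; tries < 3; tries++)
		if (!__aes_arm_encrypt(ctx->key_enc, rounds, in, out))
			return;		/* canary untouched: no interrupt taken */

	/* Heavily interrupted: one final attempt that cannot be preempted. */
	local_irq_save(flags);
	__aes_arm_encrypt(ctx->key_enc, rounds, in, out);
	local_irq_restore(flags);
}

This keeps the common case interrupt-friendly while bounding the worst-case latency at roughly four blocks' worth of work.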
Re: [PATCH v3 2/2] crypto: arm/aes - add some hardening against cache-timing attacks
On 18 October 2018 at 12:37, Eric Biggers wrote: > From: Eric Biggers > > Make the ARM scalar AES implementation closer to constant-time by > disabling interrupts and prefetching the tables into L1 cache. This is > feasible because due to ARM's "free" rotations, the main tables are only > 1024 bytes instead of the usual 4096 used by most AES implementations. > > On ARM Cortex-A7, the speed loss is only about 5%. The resulting code > is still over twice as fast as aes_ti.c. Responsiveness is potentially > a concern, but interrupts are only disabled for a single AES block. > So that would be in the order of 700 cycles, based on the numbers you shared in v1 of the aes_ti.c patch. Does that sound about right? So that would be around 1 microsecond, which is really not a number to obsess about imo. I considered another option, which is to detect whether an interrupt has been taken (by writing some canary value below that stack pointer in the location where the exception handler will preserve the value of sp, and checking at the end whether it has been modified) and doing a usleep_range(x, y) if that is the case. But this is much simpler so let's only go there if we must. > Note that even after these changes, the implementation still isn't > necessarily guaranteed to be constant-time; see > https://cr.yp.to/antiforgery/cachetiming-20050414.pdf for a discussion > of the many difficulties involved in writing truly constant-time AES > software. But it's valuable to make such attacks more difficult. > > Much of this patch is based on patches suggested by Ard Biesheuvel. > > Suggested-by: Ard Biesheuvel > Signed-off-by: Eric Biggers Reviewed-by: Ard Biesheuvel > --- > arch/arm/crypto/Kconfig | 9 + > arch/arm/crypto/aes-cipher-core.S | 62 ++- > crypto/aes_generic.c | 9 +++-- > 3 files changed, 66 insertions(+), 14 deletions(-) > > diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig > index ef0c7feea6e29..0473a8f683896 100644 > --- a/arch/arm/crypto/Kconfig > +++ b/arch/arm/crypto/Kconfig > @@ -69,6 +69,15 @@ config CRYPTO_AES_ARM > help > Use optimized AES assembler routines for ARM platforms. > > + On ARM processors without the Crypto Extensions, this is the > + fastest AES implementation for single blocks. For multiple > + blocks, the NEON bit-sliced implementation is usually faster. > + > + This implementation may be vulnerable to cache timing attacks, > + since it uses lookup tables. However, as countermeasures it > + disables IRQs and preloads the tables; it is hoped this makes > + such attacks very difficult. > + > config CRYPTO_AES_ARM_BS > tristate "Bit sliced AES using NEON instructions" > depends on KERNEL_MODE_NEON > diff --git a/arch/arm/crypto/aes-cipher-core.S > b/arch/arm/crypto/aes-cipher-core.S > index 184d6c2d15d5e..f2d67c095e596 100644 > --- a/arch/arm/crypto/aes-cipher-core.S > +++ b/arch/arm/crypto/aes-cipher-core.S > @@ -10,6 +10,7 @@ > */ > > #include > +#include > #include > > .text > @@ -41,7 +42,7 @@ > .endif > .endm > > - .macro __hround, out0, out1, in0, in1, in2, in3, t3, t4, > enc, sz, op > + .macro __hround, out0, out1, in0, in1, in2, in3, t3, t4, > enc, sz, op, oldcpsr > __select\out0, \in0, 0 > __selectt0, \in1, 1 > __load \out0, \out0, 0, \sz, \op > @@ -73,6 +74,14 @@ > __load t0, t0, 3, \sz, \op > __load \t4, \t4, 3, \sz, \op > > + .ifnb \oldcpsr > + /* > +* This is the final round and we're done with all data-dependent > table > +* lookups, so we can safely re-enable interrupts. 
> +*/ > + restore_irqs\oldcpsr > + .endif > + > eor \out1, \out1, t1, ror #24 > eor \out0, \out0, t2, ror #16 > ldm rk!, {t1, t2} > @@ -83,14 +92,14 @@ > eor \out1, \out1, t2 > .endm > > - .macro fround, out0, out1, out2, out3, in0, in1, in2, in3, > sz=2, op > + .macro fround, out0, out1, out2, out3, in0, in1, in2, in3, > sz=2, op, oldcpsr > __hround\out0, \out1, \in0, \in1, \in2, \in3, \out2, \out3, > 1, \sz, \op > - __hround\out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1, > \sz, \op > + __hround\out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1, > \sz, \op, \oldcpsr > .endm > > - .macro iround, out0,
Re: [PATCH v2 1/2] crypto: aes_ti - disable interrupts while accessing S-box
Hi Eric, On 17 October 2018 at 14:18, Eric Biggers wrote: > From: Eric Biggers > > In the "aes-fixed-time" AES implementation, disable interrupts while > accessing the S-box, in order to make cache-timing attacks more > difficult. Previously it was possible for the CPU to be interrupted > while the S-box was loaded into L1 cache, potentially evicting the > cachelines and causing later table lookups to be time-variant. > > In tests I did on x86 and ARM, this doesn't affect performance > significantly. Responsiveness is potentially a concern, but interrupts > are only disabled for a single AES block. > > Note that even after this change, the implementation still isn't > necessarily guaranteed to be constant-time; see > https://cr.yp.to/antiforgery/cachetiming-20050414.pdf for a discussion > of the many difficulties involved in writing truly constant-time AES > software. But it's valuable to make such attacks more difficult. > > Signed-off-by: Eric Biggers Thanks for taking a look. Could we add something to the Kconfig blurb that mentions that it runs the algorithm witn interrupts disabled? In any case, Reviewed-by: Ard Biesheuvel > --- > crypto/aes_ti.c | 18 ++ > 1 file changed, 18 insertions(+) > > diff --git a/crypto/aes_ti.c b/crypto/aes_ti.c > index 03023b2290e8..1ff9785b30f5 100644 > --- a/crypto/aes_ti.c > +++ b/crypto/aes_ti.c > @@ -269,6 +269,7 @@ static void aesti_encrypt(struct crypto_tfm *tfm, u8 > *out, const u8 *in) > const u32 *rkp = ctx->key_enc + 4; > int rounds = 6 + ctx->key_length / 4; > u32 st0[4], st1[4]; > + unsigned long flags; > int round; > > st0[0] = ctx->key_enc[0] ^ get_unaligned_le32(in); > @@ -276,6 +277,12 @@ static void aesti_encrypt(struct crypto_tfm *tfm, u8 > *out, const u8 *in) > st0[2] = ctx->key_enc[2] ^ get_unaligned_le32(in + 8); > st0[3] = ctx->key_enc[3] ^ get_unaligned_le32(in + 12); > > + /* > +* Temporarily disable interrupts to avoid races where cachelines are > +* evicted when the CPU is interrupted to do something else. > +*/ > + local_irq_save(flags); > + > st0[0] ^= __aesti_sbox[ 0] ^ __aesti_sbox[128]; > st0[1] ^= __aesti_sbox[32] ^ __aesti_sbox[160]; > st0[2] ^= __aesti_sbox[64] ^ __aesti_sbox[192]; > @@ -300,6 +307,8 @@ static void aesti_encrypt(struct crypto_tfm *tfm, u8 > *out, const u8 *in) > put_unaligned_le32(subshift(st1, 1) ^ rkp[5], out + 4); > put_unaligned_le32(subshift(st1, 2) ^ rkp[6], out + 8); > put_unaligned_le32(subshift(st1, 3) ^ rkp[7], out + 12); > + > + local_irq_restore(flags); > } > > static void aesti_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) > @@ -308,6 +317,7 @@ static void aesti_decrypt(struct crypto_tfm *tfm, u8 > *out, const u8 *in) > const u32 *rkp = ctx->key_dec + 4; > int rounds = 6 + ctx->key_length / 4; > u32 st0[4], st1[4]; > + unsigned long flags; > int round; > > st0[0] = ctx->key_dec[0] ^ get_unaligned_le32(in); > @@ -315,6 +325,12 @@ static void aesti_decrypt(struct crypto_tfm *tfm, u8 > *out, const u8 *in) > st0[2] = ctx->key_dec[2] ^ get_unaligned_le32(in + 8); > st0[3] = ctx->key_dec[3] ^ get_unaligned_le32(in + 12); > > + /* > +* Temporarily disable interrupts to avoid races where cachelines are > +* evicted when the CPU is interrupted to do something else. 
> +*/ > + local_irq_save(flags); > + > st0[0] ^= __aesti_inv_sbox[ 0] ^ __aesti_inv_sbox[128]; > st0[1] ^= __aesti_inv_sbox[32] ^ __aesti_inv_sbox[160]; > st0[2] ^= __aesti_inv_sbox[64] ^ __aesti_inv_sbox[192]; > @@ -339,6 +355,8 @@ static void aesti_decrypt(struct crypto_tfm *tfm, u8 > *out, const u8 *in) > put_unaligned_le32(inv_subshift(st1, 1) ^ rkp[5], out + 4); > put_unaligned_le32(inv_subshift(st1, 2) ^ rkp[6], out + 8); > put_unaligned_le32(inv_subshift(st1, 3) ^ rkp[7], out + 12); > + > + local_irq_restore(flags); > } > > static struct crypto_alg aes_alg = { > -- > 2.19.1 >
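For reference, the kind of Kconfig wording being asked for might read roughly as follows (a sketch, not necessarily the text that ended up being merged):

config CRYPTO_AES_TI
	tristate "Fixed time AES cipher"
	select CRYPTO_ALGAPI
	select CRYPTO_AES
	help
	  This is a generic implementation of AES that attempts to
	  eliminate data-dependent latencies as much as possible.

	  As a countermeasure against cache-timing attacks, this
	  implementation runs with interrupts disabled while the lookup
	  tables are in use, so that they cannot be evicted from the L1
	  cache mid-operation. Interrupts are only disabled for the
	  duration of a single AES block.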
Re: [PATCH v2 2/2] crypto: arm/aes - add some hardening against cache-timing attacks
Hi Eric, Thanks for looking into this. On 17 October 2018 at 14:18, Eric Biggers wrote: > From: Eric Biggers > > Make the ARM scalar AES implementation closer to constant-time by > disabling interrupts and prefetching the tables into L1 cache. This is > feasible because due to ARM's "free" rotations, the main tables are only > 1024 bytes instead of the usual 4096 used by most AES implementations. > > On ARM Cortex-A7, the speed loss is only about 5%. The resulting > implementation is still over twice as fast as aes_ti.c. > > Note that even after these changes, the implementation still isn't > necessarily guaranteed to be constant-time; see > https://cr.yp.to/antiforgery/cachetiming-20050414.pdf for a discussion > of the many difficulties involved in writing truly constant-time AES > software. But it's valuable to make such attacks more difficult. > > Suggested-by: Ard Biesheuvel > Signed-off-by: Eric Biggers > --- > arch/arm/crypto/aes-cipher-core.S | 26 ++ > arch/arm/crypto/aes-cipher-glue.c | 13 + > crypto/aes_generic.c | 9 + > 3 files changed, 44 insertions(+), 4 deletions(-) > > diff --git a/arch/arm/crypto/aes-cipher-core.S > b/arch/arm/crypto/aes-cipher-core.S > index 184d6c2d15d5..ba9d4aefe585 100644 > --- a/arch/arm/crypto/aes-cipher-core.S > +++ b/arch/arm/crypto/aes-cipher-core.S > @@ -138,6 +138,23 @@ > eor r7, r7, r11 > > __adrl ttab, \ttab > + /* > +* Prefetch the 1024-byte 'ft' or 'it' table into L1 cache, assuming > +* cacheline size >= 32. This, along with the caller disabling > +* interrupts, is a hardening measure intended to make cache-timing > +* attacks more difficult. They may not be fully prevented, however; > +* see the paper https://cr.yp.to/antiforgery/cachetiming-20050414.pdf > +* ("Cache-timing attacks on AES") for a discussion of the many > +* difficulties involved in writing truly constant-time AES software. > +*/ > + .set i, 0 > +.rept 1024 / 128 > + ldr r8, [ttab, #i + 0] > + ldr r9, [ttab, #i + 32] > + ldr r10, [ttab, #i + 64] > + ldr r11, [ttab, #i + 96] > + .set i, i + 128 > +.endr > Mind sticking a bit more to the coding style of the file? I.e., indent the gas directives with the code, and putting two tabs after them > tst rounds, #2 > bne 1f > @@ -152,6 +169,15 @@ > b 0b > > 2: __adrl ttab, \ltab > +.if \bsz == 0 > + /* Prefetch the 256-byte inverse S-box; see explanation above */ > + .set i, 0 > +.rept 256 / 64 > + ldr t0, [ttab, #i + 0] > + ldr t1, [ttab, #i + 32] > + .set i, i + 64 > +.endr > +.endif > \round r4, r5, r6, r7, r8, r9, r10, r11, \bsz, b > > #ifdef CONFIG_CPU_BIG_ENDIAN > diff --git a/arch/arm/crypto/aes-cipher-glue.c > b/arch/arm/crypto/aes-cipher-glue.c > index c222f6e072ad..f40e35eb22e4 100644 > --- a/arch/arm/crypto/aes-cipher-glue.c > +++ b/arch/arm/crypto/aes-cipher-glue.c > @@ -23,16 +23,29 @@ static void aes_encrypt(struct crypto_tfm *tfm, u8 *out, > const u8 *in) > { > struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); > int rounds = 6 + ctx->key_length / 4; > + unsigned long flags; > > + /* > +* This AES implementation prefetches the lookup table into L1 cache > to > +* try to make timing attacks on the table lookups more difficult. > +* Temporarily disable interrupts to avoid races where cachelines are > +* evicted when the CPU is interrupted to do something else. > +*/ > + local_irq_save(flags); > __aes_arm_encrypt(ctx->key_enc, rounds, in, out); > + local_irq_restore(flags); > } > Disabling interrupts like that is going to raise some eyebrows, so I'd prefer it if we can reduce the scope of the IRQ blackout as much as we can. 
This means we should move the IRQ en/disable into the .S file, and only let it cover the parts of the code where we are actually doing table lookups that are indexed by the input. > static void aes_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) > { > struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); > int rounds = 6 + ctx->key_length / 4; > + unsigned long flags; > > + /* Disable interrupts to help mitigate timing attacks, see above */ > + local_irq_save(flags); > __aes_arm_decrypt(ctx->key_dec, rounds, in, out); > + local_irq_restore(flags); > } > > static struct crypto_alg aes_alg = { > diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c > index ca554d57d01e..13df33aca463 100644 > --- a/crypto/aes_generic.c > +++ b/crypto/aes_generic.c > @@ -63,7 +63,8 @@ static inline u8 byte(const u32 x, const unsigned n) > > static const u32 rco_tab[10] = { 1, 2, 4,
Re: [PATCH] crypto: caam - add SPDX license identifier to all files
On Wed, Oct 10, 2018 at 02:26:48PM +0300, Horia Geantă wrote: > Previously, a tree-wide change added SPDX license identifiers to > files lacking licensing information: > b24413180f56 ("License cleanup: add SPDX GPL-2.0 license identifier to files > with no license") > > To be consistent update the rest of the files: > -files with license specified by means of MODULE_LICENSE() > -files with complete license text > -Kconfig > > Signed-off-by: Horia Geantă > --- > drivers/crypto/caam/Kconfig| 1 + > drivers/crypto/caam/caamalg.c | 1 + > drivers/crypto/caam/caamalg_desc.c | 1 + > drivers/crypto/caam/caamalg_qi.c | 1 + > drivers/crypto/caam/caamhash.c | 1 + > drivers/crypto/caam/caampkc.c | 1 + > drivers/crypto/caam/caamrng.c | 1 + > drivers/crypto/caam/ctrl.c | 1 + > drivers/crypto/caam/jr.c | 1 + > drivers/crypto/caam/sg_sw_qm.h | 29 + > drivers/crypto/caam/sg_sw_qm2.h| 30 +- > 11 files changed, 11 insertions(+), 57 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: aes_ti - disable interrupts while accessing sbox
Hi Ard, On Thu, Oct 04, 2018 at 08:55:14AM +0200, Ard Biesheuvel wrote: > Hi Eric, > > On 4 October 2018 at 06:07, Eric Biggers wrote: > > From: Eric Biggers > > > > The generic constant-time AES implementation is supposed to preload the > > AES S-box into the CPU's L1 data cache. But, an interrupt handler can > > run on the CPU and muck with the cache. Worse, on preemptible kernels > > the process can even be preempted and moved to a different CPU. So the > > implementation may actually still be vulnerable to cache-timing attacks. > > > > Make it more robust by disabling interrupts while the sbox is used. > > > > In some quick tests on x86 and ARM, this doesn't affect performance > > significantly. Responsiveness is also a concern, but interrupts are > > only disabled for a single AES block which even on ARM Cortex-A7 is > > "only" ~1500 cycles to encrypt or ~2600 cycles to decrypt. > > > > I share your concern, but that is quite a big hammer. > > Also, does it really take ~100 cycles per byte? That is terrible :-) > > Given that the full lookup table is only 1024 bytes (or 1024+256 bytes > for decryption), I wonder if something like the below would be a > better option for A7 (apologies for the mangled whitespace) > > diff --git a/arch/arm/crypto/aes-cipher-core.S > b/arch/arm/crypto/aes-cipher-core.S > index 184d6c2d15d5..83e893f7e581 100644 > --- a/arch/arm/crypto/aes-cipher-core.S > +++ b/arch/arm/crypto/aes-cipher-core.S > @@ -139,6 +139,13 @@ > > __adrl ttab, \ttab > > + .irpc r, 01234567 > + ldr r8, [ttab, #(32 * \r)] > + ldr r9, [ttab, #(32 * \r) + 256] > + ldr r10, [ttab, #(32 * \r) + 512] > + ldr r11, [ttab, #(32 * \r) + 768] > + .endr > + > tst rounds, #2 > bne 1f > > @@ -180,6 +187,12 @@ ENDPROC(__aes_arm_encrypt) > > .align 5 > ENTRY(__aes_arm_decrypt) > + __adrl ttab, __aes_arm_inverse_sbox > + > + .irpc r, 01234567 > + ldr r8, [ttab, #(32 * \r)] > + .endr > + > do_crypt iround, crypto_it_tab, __aes_arm_inverse_sbox, 0 > ENDPROC(__aes_arm_decrypt) > > diff --git a/arch/arm/crypto/aes-cipher-glue.c > b/arch/arm/crypto/aes-cipher-glue.c > index c222f6e072ad..630e1a436f1d 100644 > --- a/arch/arm/crypto/aes-cipher-glue.c > +++ b/arch/arm/crypto/aes-cipher-glue.c > @@ -23,16 +23,22 @@ static void aes_encrypt(struct crypto_tfm *tfm, u8 > *out, const u8 *in) > { > struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); > int rounds = 6 + ctx->key_length / 4; > + unsigned long flags; > > + local_irq_save(flags); > __aes_arm_encrypt(ctx->key_enc, rounds, in, out); > + local_irq_restore(flags); > } > > static void aes_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) > { > struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); > int rounds = 6 + ctx->key_length / 4; > + unsigned long flags; > > + local_irq_save(flags); > __aes_arm_decrypt(ctx->key_dec, rounds, in, out); > + local_irq_restore(flags); > } > > static struct crypto_alg aes_alg = { > diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c > index ca554d57d01e..82fa860c9cb9 100644 > --- a/crypto/aes_generic.c > +++ b/crypto/aes_generic.c > @@ -63,7 +63,7 @@ static inline u8 byte(const u32 x, const unsigned n) > > static const u32 rco_tab[10] = { 1, 2, 4, 8, 16, 32, 64, 128, 27, 54 }; > > -__visible const u32 crypto_ft_tab[4][256] = { > +__visible const u32 crypto_ft_tab[4][256] __cacheline_aligned = { > { > 0xa56363c6, 0x847c7cf8, 0x99ee, 0x8d7b7bf6, > 0x0df2f2ff, 0xbd6b6bd6, 0xb16f6fde, 0x54c5c591, > @@ -327,7 +327,7 @@ __visible const u32 crypto_ft_tab[4][256] = { > } > }; > > -__visible const u32 crypto_fl_tab[4][256] = { > 
+__visible const u32 crypto_fl_tab[4][256] __cacheline_aligned = { > { > 0x0063, 0x007c, 0x0077, 0x007b, > 0x00f2, 0x006b, 0x006f, 0x00c5, > @@ -591,7 +591,7 @@ __visible const u32 crypto_fl_tab[4][256] = { > } > }; > > -__visible const u32 crypto_it_tab[4][256] = { > +__visible const u32 crypto_it_tab[4][256] __cacheline_aligned = { > { > 0x50a7f451, 0x5365417e, 0xc3a4171a, 0x965e273a, > 0xcb6bab3b, 0xf1459d1f, 0xab58faac, 0x9303e34b, > @@ -855,7 +855,7 @@ __visible const u32 crypto_it_tab[4][256] = { > } > }; > > -__visible const u32 crypto_il_tab[4][256] = { > +__visible const u32 crypto_il_tab[4][256] __cacheline_aligned = { > { > 0x0052, 0x0009, 0x006a, 0x00d5, > 0x0030, 0x0036, 0x00a5, 0x0038, > Thanks for the suggestion -- this turns out to work pretty well. At least in a microbenchmark, loading the larger table only slows it down by around 5%, so it's still around 2-3 times faster than aes_ti.c on ARM Cortex-A7. It also noticeably improves Adiantum performance (Adiantum-XChaCha12-AES): Before (cpb)After (cpb) Adiantum (enc), size=4096: 10.91 10.58 Adiantum (dec), size=4096: 11.04 10.62 Adiantum (enc), size=512:18.07 15.57 Adiantum (dec), size=512:19.10 15.83 (Those
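The same preloading trick can be written portably in C for the generic tables; a kernel-style sketch (the OPTIMIZER_HIDE_VAR() barrier keeps the compiler from discarding the otherwise-dead loads):

/* Pull a lookup table into L1 by touching one word per 32-byte line. */
static void preload_table(const u32 *tab, unsigned int bytes)
{
	u32 acc = 0;
	unsigned int i;

	for (i = 0; i < bytes / sizeof(u32); i += 8)	/* 8 x u32 = 32 bytes */
		acc |= tab[i];
	OPTIMIZER_HIDE_VAR(acc);			/* keep the loads alive */
}

As with the assembly version, this assumes a cacheline size of at least 32 bytes and only narrows the attack window; it is not a constant-time guarantee.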
Re: [bug report] crypto: user - Implement a generic crypto statistics
On Thu, Oct 11, 2018 at 02:10:42PM +0300, Dan Carpenter wrote: > Hello Corentin Labbe, > > The patch cac5818c25d0: "crypto: user - Implement a generic crypto > statistics" from Sep 19, 2018, leads to the following static checker > warning: > > crypto/crypto_user_stat.c:53 crypto_report_aead() > warn: check that 'raead' doesn't leak information (struct has holes) > > crypto/crypto_user_stat.c > 34 static int crypto_report_aead(struct sk_buff *skb, struct crypto_alg *alg) > 35 { > 36 struct crypto_stat raead; > ^^^^^ > 37 u64 v64; > 38 u32 v32; > 39 > 40 strncpy(raead.type, "aead", sizeof(raead.type)); > 41 > 42 v32 = atomic_read(&alg->encrypt_cnt); > 43 raead.stat_encrypt_cnt = v32; > 44 v64 = atomic64_read(&alg->encrypt_tlen); > 45 raead.stat_encrypt_tlen = v64; > 46 v32 = atomic_read(&alg->decrypt_cnt); > 47 raead.stat_decrypt_cnt = v32; > 48 v64 = atomic64_read(&alg->decrypt_tlen); > 49 raead.stat_decrypt_tlen = v64; > 50 v32 = atomic_read(&alg->aead_err_cnt); > 51 raead.stat_aead_err_cnt = v32; > 52 > 53 if (nla_put(skb, CRYPTOCFGA_STAT_AEAD, > 54 sizeof(struct crypto_stat), &raead)) > ^^^^^^ > We haven't totally initialized raead and also it apparently has struct > holes after the u64s. > > 55 goto nla_put_failure; > 56 return 0; > 57 > 58 nla_put_failure: > 59 return -EMSGSIZE; > 60 } > > See also: > > crypto/crypto_user_stat.c:53 crypto_report_aead() warn: check that 'raead' doesn't leak information (struct has holes) > crypto/crypto_user_stat.c:81 crypto_report_cipher() warn: check that 'rcipher' doesn't leak information (struct has holes) > crypto/crypto_user_stat.c:108 crypto_report_comp() warn: check that 'rcomp' doesn't leak information (struct has holes) > crypto/crypto_user_stat.c:135 crypto_report_acomp() warn: check that 'racomp' doesn't leak information (struct has holes) > crypto/crypto_user_stat.c:166 crypto_report_akcipher() warn: check that 'rakcipher' doesn't leak information (struct has holes) > crypto/crypto_user_stat.c:191 crypto_report_kpp() warn: check that 'rkpp' doesn't leak information (struct has holes) > crypto/crypto_user_stat.c:215 crypto_report_ahash() warn: check that 'rhash' doesn't leak information (struct has holes) > crypto/crypto_user_stat.c:239 crypto_report_shash() warn: check that 'rhash' doesn't leak information (struct has holes) > crypto/crypto_user_stat.c:265 crypto_report_rng() warn: check that 'rrng' doesn't leak information (struct has holes) > crypto/crypto_user_stat.c:295 crypto_reportstat_one() warn: check that 'rl' doesn't leak information (struct has holes) > > regards, > dan carpenter Thanks for the report, I will send a fix soon. Regards
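The usual remedy for this class of report is to zero the whole structure before filling it in, so that padding holes can never carry stack contents to user space. A sketch of what such a fix might look like for the aead case (the eventual upstream fix may differ in detail):

static int crypto_report_aead(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_stat raead;

	memset(&raead, 0, sizeof(raead));	/* clears the padding holes too */

	strncpy(raead.type, "aead", sizeof(raead.type));
	raead.stat_encrypt_cnt = atomic_read(&alg->encrypt_cnt);
	raead.stat_encrypt_tlen = atomic64_read(&alg->encrypt_tlen);
	raead.stat_decrypt_cnt = atomic_read(&alg->decrypt_cnt);
	raead.stat_decrypt_tlen = atomic64_read(&alg->decrypt_tlen);
	raead.stat_aead_err_cnt = atomic_read(&alg->aead_err_cnt);

	if (nla_put(skb, CRYPTOCFGA_STAT_AEAD, sizeof(raead), &raead))
		return -EMSGSIZE;
	return 0;
}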
Re: [PATCH] crypto: arm64/aes-blk - ensure XTS mask is always loaded
On Mon, Oct 08, 2018 at 01:16:59PM +0200, Ard Biesheuvel wrote: > Commit 2e5d2f33d1db ("crypto: arm64/aes-blk - improve XTS mask handling") > optimized away some reloads of the XTS mask vector, but failed to take > into account that calls into the XTS en/decrypt routines will take a > slightly different code path if a single block of input is split across > different buffers. So let's ensure that the first load occurs > unconditionally, and move the reload to the end so it doesn't occur > needlessly. > > Fixes: 2e5d2f33d1db ("crypto: arm64/aes-blk - improve XTS mask handling") > Signed-off-by: Ard Biesheuvel > --- > arch/arm64/crypto/aes-modes.S | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH -next] crypto: mxs-dcp: make symbols 'sha1_null_hash' and 'sha256_null_hash' static
On Thu, Oct 11, 2018 at 01:49:48AM +, Wei Yongjun wrote: > Fixes the following sparse warnings: > > drivers/crypto/mxs-dcp.c:39:15: warning: > symbol 'sha1_null_hash' was not declared. Should it be static? > drivers/crypto/mxs-dcp.c:43:15: warning: > symbol 'sha256_null_hash' was not declared. Should it be static? > > Fixes: c709eebaf5c5 ("crypto: mxs-dcp - Fix SHA null hashes and output > length") > Signed-off-by: Wei Yongjun > --- > drivers/crypto/mxs-dcp.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto/testmgr.c: fix sizeof() on COMP_BUF_SIZE
On Sun, Oct 07, 2018 at 01:58:10PM +0200, Michael Schupikov wrote: > After allocation, output and decomp_output both point to memory chunks of > size COMP_BUF_SIZE. Then, only the first bytes are zeroed out using > sizeof(COMP_BUF_SIZE) as parameter to memset(), because > sizeof(COMP_BUF_SIZE) provides the size of the constant and not the size of > allocated memory. > > Instead, the whole allocated memory is meant to be zeroed out. Use > COMP_BUF_SIZE as parameter to memset() directly in order to accomplish > this. > > Fixes: 336073840a872 ("crypto: testmgr - Allow different compression results") > > Signed-off-by: Michael Schupikov > --- > crypto/testmgr.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
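The bug class is worth a two-line illustration: sizeof applied to an integer constant yields the size of the constant's type, not its value, so with COMP_BUF_SIZE defined as 512 the original code cleared only sizeof(int) bytes:

#define COMP_BUF_SIZE 512

memset(output, 0, sizeof(COMP_BUF_SIZE));	/* zeroes only sizeof(int) == 4 bytes */
memset(output, 0, COMP_BUF_SIZE);		/* zeroes the full 512-byte buffer */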
Re: [PATCH -next] crypto: axis - fix platform_no_drv_owner.cocci warnings
On Fri, Oct 05, 2018 at 06:42:44AM +, YueHaibing wrote: > Remove .owner field if calls are used which set it automatically > Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci > > Signed-off-by: YueHaibing Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH -next] crypto: chtls - remove set but not used variable 'csk'
On Fri, Oct 05, 2018 at 06:43:27AM +, YueHaibing wrote: > Fixes gcc '-Wunused-but-set-variable' warning: > > drivers/crypto/chelsio/chtls/chtls_cm.c: In function 'chtls_disconnect': > drivers/crypto/chelsio/chtls/chtls_cm.c:408:21: warning: > variable 'csk' set but not used [-Wunused-but-set-variable] > > drivers/crypto/chelsio/chtls/chtls_cm.c: In function 'chtls_recv_sock': > drivers/crypto/chelsio/chtls/chtls_cm.c:1016:23: warning: > variable 'tcph' set but not used [-Wunused-but-set-variable] > > 'csk' and 'tcph' are never used since introduce > in commit cc35c88ae4db ("crypto : chtls - CPL handler definition") > > Signed-off-by: YueHaibing > --- > drivers/crypto/chelsio/chtls/chtls_cm.c | 4 > 1 file changed, 4 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 2/3] crypto: crypto_xor - use unaligned accessors for aligned fast path
On 9 October 2018 at 05:47, Eric Biggers wrote: > Hi Ard, > > On Mon, Oct 08, 2018 at 11:15:53PM +0200, Ard Biesheuvel wrote: >> On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS >> because the ordinary load/store instructions (ldr, ldrh, ldrb) can >> tolerate any misalignment of the memory address. However, load/store >> double and load/store multiple instructions (ldrd, ldm) may still only >> be used on memory addresses that are 32-bit aligned, and so we have to >> use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we >> may end up with a severe performance hit due to alignment traps that >> require fixups by the kernel. >> >> Fortunately, the get_unaligned() accessors do the right thing: when >> building for ARMv6 or later, the compiler will emit unaligned accesses >> using the ordinary load/store instructions (but avoid the ones that >> require 32-bit alignment). When building for older ARM, those accessors >> will emit the appropriate sequence of ldrb/mov/orr instructions. And on >> architectures that can truly tolerate any kind of misalignment, the >> get_unaligned() accessors resolve to the leXX_to_cpup accessors that >> operate on aligned addresses. >> >> So switch to the unaligned accessors for the aligned fast path. This >> will create the exact same code on architectures that can really >> tolerate any kind of misalignment, and generate code for ARMv6+ that >> avoids load/store instructions that trigger alignment faults. >> >> Signed-off-by: Ard Biesheuvel >> --- >> crypto/algapi.c | 7 +++ >> include/crypto/algapi.h | 11 +-- >> 2 files changed, 12 insertions(+), 6 deletions(-) >> >> diff --git a/crypto/algapi.c b/crypto/algapi.c >> index 2545c5f89c4c..52ce3c5a0499 100644 >> --- a/crypto/algapi.c >> +++ b/crypto/algapi.c >> @@ -988,11 +988,10 @@ void crypto_inc(u8 *a, unsigned int size) >> __be32 *b = (__be32 *)(a + size); >> u32 c; >> >> - if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) || >> - IS_ALIGNED((unsigned long)b, __alignof__(*b))) >> + if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) >> for (; size >= 4; size -= 4) { >> - c = be32_to_cpu(*--b) + 1; >> - *b = cpu_to_be32(c); >> + c = get_unaligned_be32(--b) + 1; >> + put_unaligned_be32(c, b); >> if (likely(c)) >> return; >> } >> diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h >> index 4a5ad10e75f0..86267c232f34 100644 >> --- a/include/crypto/algapi.h >> +++ b/include/crypto/algapi.h >> @@ -17,6 +17,8 @@ >> #include >> #include >> >> +#include >> + >> /* >> * Maximum values for blocksize and alignmask, used to allocate >> * static buffers that are big enough for any combination of >> @@ -212,7 +214,9 @@ static inline void crypto_xor(u8 *dst, const u8 *src, >> unsigned int size) >> unsigned long *s = (unsigned long *)src; >> >> while (size > 0) { >> - *d++ ^= *s++; >> + put_unaligned(get_unaligned(d) ^ get_unaligned(s), d); >> + d++; >> + s++; >> size -= sizeof(unsigned long); >> } >> } else { >> @@ -231,7 +235,10 @@ static inline void crypto_xor_cpy(u8 *dst, const u8 >> *src1, const u8 *src2, >> unsigned long *s2 = (unsigned long *)src2; >> >> while (size > 0) { >> - *d++ = *s1++ ^ *s2++; >> + put_unaligned(get_unaligned(s1) ^ get_unaligned(s2), >> d); >> + d++; >> + s1++; >> + s2++; >> size -= sizeof(unsigned long); >> } >> } else { >> -- >> 2.11.0 >> > > Doesn't __crypto_xor() have the same problem too? > More or less, and I was wondering what to do about it. 
To fix __crypto_xor() correctly, we'd have to duplicate the code path that operates on the u64[], u32[] and u16[] chunks, or we'll end up with suboptimal code that uses the accessors even if the alignment routine has executed first. This is the same issue Jason points out in siphash. Perhaps the answer is to add 'fast' unaligned accessors that may be used on unaligned quantities only if CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is set? E.g., #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS #define get_unaligned_fast get_unaligned #else #define get_unaligned_fast(x) (*(x)) #endif Arnd?
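To illustrate, the word-sized loop in __crypto_xor() could then be written once for both kinds of architecture. A sketch of the idea (assuming a matching put_unaligned_fast() on the store side, and relying on __crypto_xor()'s existing alignment preamble when CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not set):

while (len >= sizeof(unsigned long)) {
	unsigned long l = get_unaligned_fast((unsigned long *)src1) ^
			  get_unaligned_fast((unsigned long *)src2);

	put_unaligned_fast(l, (unsigned long *)dst);
	dst += sizeof(unsigned long);
	src1 += sizeof(unsigned long);
	src2 += sizeof(unsigned long);
	len -= sizeof(unsigned long);
}

On architectures with efficient unaligned access this compiles to the get_unaligned()/put_unaligned() form; elsewhere it degenerates to plain aligned dereferences, with no duplicated code paths.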
Re: [PATCH 1/3] crypto: memneq - use unaligned accessors for aligned fast path
On 9 October 2018 at 05:34, Eric Biggers wrote: > Hi Ard, > > On Mon, Oct 08, 2018 at 11:15:52PM +0200, Ard Biesheuvel wrote: >> On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS >> because the ordinary load/store instructions (ldr, ldrh, ldrb) can >> tolerate any misalignment of the memory address. However, load/store >> double and load/store multiple instructions (ldrd, ldm) may still only >> be used on memory addresses that are 32-bit aligned, and so we have to >> use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we >> may end up with a severe performance hit due to alignment traps that >> require fixups by the kernel. >> >> Fortunately, the get_unaligned() accessors do the right thing: when >> building for ARMv6 or later, the compiler will emit unaligned accesses >> using the ordinary load/store instructions (but avoid the ones that >> require 32-bit alignment). When building for older ARM, those accessors >> will emit the appropriate sequence of ldrb/mov/orr instructions. And on >> architectures that can truly tolerate any kind of misalignment, the >> get_unaligned() accessors resolve to the leXX_to_cpup accessors that >> operate on aligned addresses. >> >> So switch to the unaligned accessors for the aligned fast path. This >> will create the exact same code on architectures that can really >> tolerate any kind of misalignment, and generate code for ARMv6+ that >> avoids load/store instructions that trigger alignment faults. >> >> Signed-off-by: Ard Biesheuvel >> --- >> crypto/memneq.c | 24 ++-- >> 1 file changed, 17 insertions(+), 7 deletions(-) >> >> diff --git a/crypto/memneq.c b/crypto/memneq.c >> index afed1bd16aee..0f46a6150f22 100644 >> --- a/crypto/memneq.c >> +++ b/crypto/memneq.c >> @@ -60,6 +60,7 @@ >> */ >> >> #include >> +#include >> >> #ifndef __HAVE_ARCH_CRYPTO_MEMNEQ >> >> @@ -71,7 +72,10 @@ __crypto_memneq_generic(const void *a, const void *b, >> size_t size) >> >> #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) >> while (size >= sizeof(unsigned long)) { >> - neq |= *(unsigned long *)a ^ *(unsigned long *)b; >> + unsigned long const *p = a; >> + unsigned long const *q = b; >> + >> + neq |= get_unaligned(p) ^ get_unaligned(q); >> OPTIMIZER_HIDE_VAR(neq); >> a += sizeof(unsigned long); >> b += sizeof(unsigned long); >> @@ -95,18 +99,24 @@ static inline unsigned long __crypto_memneq_16(const >> void *a, const void *b) >> >> #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS >> if (sizeof(unsigned long) == 8) { >> - neq |= *(unsigned long *)(a) ^ *(unsigned long *)(b); >> + unsigned long const *p = a; >> + unsigned long const *q = b; >> + >> + neq |= get_unaligned(p++) ^ get_unaligned(q++); >> OPTIMIZER_HIDE_VAR(neq); >> - neq |= *(unsigned long *)(a+8) ^ *(unsigned long *)(b+8); >> + neq |= get_unaligned(p) ^ get_unaligned(q); >> OPTIMIZER_HIDE_VAR(neq); >> } else if (sizeof(unsigned int) == 4) { >> - neq |= *(unsigned int *)(a)^ *(unsigned int *)(b); >> + unsigned int const *p = a; >> + unsigned int const *q = b; >> + >> + neq |= get_unaligned(p++) ^ get_unaligned(q++); >> OPTIMIZER_HIDE_VAR(neq); >> - neq |= *(unsigned int *)(a+4) ^ *(unsigned int *)(b+4); >> + neq |= get_unaligned(p++) ^ get_unaligned(q++); >> OPTIMIZER_HIDE_VAR(neq); >> - neq |= *(unsigned int *)(a+8) ^ *(unsigned int *)(b+8); >> + neq |= get_unaligned(p++) ^ get_unaligned(q++); >> OPTIMIZER_HIDE_VAR(neq); >> - neq |= *(unsigned int *)(a+12) ^ *(unsigned int *)(b+12); >> + neq |= get_unaligned(p) ^ get_unaligned(q); >> OPTIMIZER_HIDE_VAR(neq); >> } else >> #endif 
/* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */ > > This looks good, but maybe now we should get rid of the > !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS path too? > At least for the 16-byte case: > > static inline unsigned long __crypto_memneq_16(const void *a, const void *b) > { > const unsigned long *p = a, *q = b; > unsigned long neq = 0; > > BUILD_BUG_ON(sizeof(*p) != 4 && sizeof(*p) != 8); > neq |= get_unaligned(p++) ^ get_unaligned(q++); > OPTIMIZER_HIDE_VAR(neq); > neq |= get_unaligned(p++) ^ get_unaligned(q++); > OPTIMIZER_HIDE_VAR(neq); > if (sizeof(*p) == 4) { > neq |= get_unaligned(p++) ^ get_unaligned(q++); > OPTIMIZER_HIDE_VAR(neq); > neq |= get_unaligned(p++) ^ get_unaligned(q++); > OPTIMIZER_HIDE_VAR(neq); > } > return neq; > } Yes that makes sense.
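For readers wondering about the OPTIMIZER_HIDE_VAR() calls being kept between the XORs: they prevent the compiler from collapsing the accumulation into a short-circuiting comparison, which would reintroduce the early-exit timing leak that memneq exists to avoid. The macro is essentially an empty asm that forces the variable through a register; roughly its definition in linux/compiler.h around this time (the exact constraints have varied between kernel versions):

#define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

Because the asm body is opaque to the compiler, it must materialize 'neq' after every step and cannot reason about its value across the barrier.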
Re: [PATCH 2/3] crypto: crypto_xor - use unaligned accessors for aligned fast path
Hi Ard, On Mon, Oct 08, 2018 at 11:15:53PM +0200, Ard Biesheuvel wrote: > On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS > because the ordinary load/store instructions (ldr, ldrh, ldrb) can > tolerate any misalignment of the memory address. However, load/store > double and load/store multiple instructions (ldrd, ldm) may still only > be used on memory addresses that are 32-bit aligned, and so we have to > use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we > may end up with a severe performance hit due to alignment traps that > require fixups by the kernel. > > Fortunately, the get_unaligned() accessors do the right thing: when > building for ARMv6 or later, the compiler will emit unaligned accesses > using the ordinary load/store instructions (but avoid the ones that > require 32-bit alignment). When building for older ARM, those accessors > will emit the appropriate sequence of ldrb/mov/orr instructions. And on > architectures that can truly tolerate any kind of misalignment, the > get_unaligned() accessors resolve to the leXX_to_cpup accessors that > operate on aligned addresses. > > So switch to the unaligned accessors for the aligned fast path. This > will create the exact same code on architectures that can really > tolerate any kind of misalignment, and generate code for ARMv6+ that > avoids load/store instructions that trigger alignment faults. > > Signed-off-by: Ard Biesheuvel > --- > crypto/algapi.c | 7 +++ > include/crypto/algapi.h | 11 +-- > 2 files changed, 12 insertions(+), 6 deletions(-) > > diff --git a/crypto/algapi.c b/crypto/algapi.c > index 2545c5f89c4c..52ce3c5a0499 100644 > --- a/crypto/algapi.c > +++ b/crypto/algapi.c > @@ -988,11 +988,10 @@ void crypto_inc(u8 *a, unsigned int size) > __be32 *b = (__be32 *)(a + size); > u32 c; > > - if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) || > - IS_ALIGNED((unsigned long)b, __alignof__(*b))) > + if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) > for (; size >= 4; size -= 4) { > - c = be32_to_cpu(*--b) + 1; > - *b = cpu_to_be32(c); > + c = get_unaligned_be32(--b) + 1; > + put_unaligned_be32(c, b); > if (likely(c)) > return; > } > diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h > index 4a5ad10e75f0..86267c232f34 100644 > --- a/include/crypto/algapi.h > +++ b/include/crypto/algapi.h > @@ -17,6 +17,8 @@ > #include > #include > > +#include > + > /* > * Maximum values for blocksize and alignmask, used to allocate > * static buffers that are big enough for any combination of > @@ -212,7 +214,9 @@ static inline void crypto_xor(u8 *dst, const u8 *src, > unsigned int size) > unsigned long *s = (unsigned long *)src; > > while (size > 0) { > - *d++ ^= *s++; > + put_unaligned(get_unaligned(d) ^ get_unaligned(s), d); > + d++; > + s++; > size -= sizeof(unsigned long); > } > } else { > @@ -231,7 +235,10 @@ static inline void crypto_xor_cpy(u8 *dst, const u8 > *src1, const u8 *src2, > unsigned long *s2 = (unsigned long *)src2; > > while (size > 0) { > - *d++ = *s1++ ^ *s2++; > + put_unaligned(get_unaligned(s1) ^ get_unaligned(s2), d); > + d++; > + s1++; > + s2++; > size -= sizeof(unsigned long); > } > } else { > -- > 2.11.0 > Doesn't __crypto_xor() have the same problem too? - Eric
Re: [PATCH 1/3] crypto: memneq - use unaligned accessors for aligned fast path
Hi Ard, On Mon, Oct 08, 2018 at 11:15:52PM +0200, Ard Biesheuvel wrote: > On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS > because the ordinary load/store instructions (ldr, ldrh, ldrb) can > tolerate any misalignment of the memory address. However, load/store > double and load/store multiple instructions (ldrd, ldm) may still only > be used on memory addresses that are 32-bit aligned, and so we have to > use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we > may end up with a severe performance hit due to alignment traps that > require fixups by the kernel. > > Fortunately, the get_unaligned() accessors do the right thing: when > building for ARMv6 or later, the compiler will emit unaligned accesses > using the ordinary load/store instructions (but avoid the ones that > require 32-bit alignment). When building for older ARM, those accessors > will emit the appropriate sequence of ldrb/mov/orr instructions. And on > architectures that can truly tolerate any kind of misalignment, the > get_unaligned() accessors resolve to the leXX_to_cpup accessors that > operate on aligned addresses. > > So switch to the unaligned accessors for the aligned fast path. This > will create the exact same code on architectures that can really > tolerate any kind of misalignment, and generate code for ARMv6+ that > avoids load/store instructions that trigger alignment faults. > > Signed-off-by: Ard Biesheuvel > --- > crypto/memneq.c | 24 ++-- > 1 file changed, 17 insertions(+), 7 deletions(-) > > diff --git a/crypto/memneq.c b/crypto/memneq.c > index afed1bd16aee..0f46a6150f22 100644 > --- a/crypto/memneq.c > +++ b/crypto/memneq.c > @@ -60,6 +60,7 @@ > */ > > #include > +#include > > #ifndef __HAVE_ARCH_CRYPTO_MEMNEQ > > @@ -71,7 +72,10 @@ __crypto_memneq_generic(const void *a, const void *b, > size_t size) > > #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) > while (size >= sizeof(unsigned long)) { > - neq |= *(unsigned long *)a ^ *(unsigned long *)b; > + unsigned long const *p = a; > + unsigned long const *q = b; > + > + neq |= get_unaligned(p) ^ get_unaligned(q); > OPTIMIZER_HIDE_VAR(neq); > a += sizeof(unsigned long); > b += sizeof(unsigned long); > @@ -95,18 +99,24 @@ static inline unsigned long __crypto_memneq_16(const void > *a, const void *b) > > #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS > if (sizeof(unsigned long) == 8) { > - neq |= *(unsigned long *)(a) ^ *(unsigned long *)(b); > + unsigned long const *p = a; > + unsigned long const *q = b; > + > + neq |= get_unaligned(p++) ^ get_unaligned(q++); > OPTIMIZER_HIDE_VAR(neq); > - neq |= *(unsigned long *)(a+8) ^ *(unsigned long *)(b+8); > + neq |= get_unaligned(p) ^ get_unaligned(q); > OPTIMIZER_HIDE_VAR(neq); > } else if (sizeof(unsigned int) == 4) { > - neq |= *(unsigned int *)(a)^ *(unsigned int *)(b); > + unsigned int const *p = a; > + unsigned int const *q = b; > + > + neq |= get_unaligned(p++) ^ get_unaligned(q++); > OPTIMIZER_HIDE_VAR(neq); > - neq |= *(unsigned int *)(a+4) ^ *(unsigned int *)(b+4); > + neq |= get_unaligned(p++) ^ get_unaligned(q++); > OPTIMIZER_HIDE_VAR(neq); > - neq |= *(unsigned int *)(a+8) ^ *(unsigned int *)(b+8); > + neq |= get_unaligned(p++) ^ get_unaligned(q++); > OPTIMIZER_HIDE_VAR(neq); > - neq |= *(unsigned int *)(a+12) ^ *(unsigned int *)(b+12); > + neq |= get_unaligned(p) ^ get_unaligned(q); > OPTIMIZER_HIDE_VAR(neq); > } else > #endif /* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */ This looks good, but maybe now we should get rid of the 
!CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS path too? At least for the 16-byte case: static inline unsigned long __crypto_memneq_16(const void *a, const void *b) { const unsigned long *p = a, *q = b; unsigned long neq = 0; BUILD_BUG_ON(sizeof(*p) != 4 && sizeof(*p) != 8); neq |= get_unaligned(p++) ^ get_unaligned(q++); OPTIMIZER_HIDE_VAR(neq); neq |= get_unaligned(p++) ^ get_unaligned(q++); OPTIMIZER_HIDE_VAR(neq); if (sizeof(*p) == 4) { neq |= get_unaligned(p++) ^ get_unaligned(q++); OPTIMIZER_HIDE_VAR(neq); neq |= get_unaligned(p++) ^ get_unaligned(q++); OPTIMIZER_HIDE_VAR(neq); } return neq; }
Re: [PATCH] crypto: x86/aes-ni - fix build error following fpu template removal
On Fri, Oct 05, 2018 at 10:13:06AM -0700, Eric Biggers wrote: > From: Eric Biggers > > aesni-intel_glue.c still calls crypto_fpu_init() and crypto_fpu_exit() > to register/unregister the "fpu" template. But these functions don't > exist anymore, causing a build error. Remove the calls to them. > > Fixes: 944585a64f5e ("crypto: x86/aes-ni - remove special handling of AES in > PCBC mode") > Signed-off-by: Eric Biggers > --- > arch/x86/crypto/aesni-intel_glue.c | 13 + > 1 file changed, 1 insertion(+), 12 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: arm64/aes - fix handling sub-block CTS-CBC inputs
On Tue, Oct 02, 2018 at 10:22:15PM -0700, Eric Biggers wrote: > From: Eric Biggers > > In the new arm64 CTS-CBC implementation, return an error code rather > than crashing on inputs shorter than AES_BLOCK_SIZE bytes. Also set > cra_blocksize to AES_BLOCK_SIZE (like is done in the cts template) to > indicate the minimum input size. > > Fixes: dd597fb33ff0 ("crypto: arm64/aes-blk - add support for CTS-CBC mode") > Signed-off-by: Eric Biggers Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 0/3] crypto: mxs-dcp - Fix tcrypt on imx6
On Tue, Oct 02, 2018 at 07:01:46PM +, Leonard Crestez wrote: > The mxs-dcp driver currently fails to probe on imx6. Fix the whole thing > by porting a cleaned/squashed version of fixes carried in the NXP vendor > tree. > > Tested with "modprobe tcrypt" and CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=n > on imx6sl imx6sll imx6ull: no failures. > > I'm not very familiar with crypto and did not write these fixes so > a skeptical review would be appreciated. > > Previously: > https://lore.kernel.org/patchwork/patch/989652/ > > Dan Douglass (1): > crypto: mxs-dcp - Implement sha import/export > > Radu Solea (2): > crypto: mxs-dcp - Fix SHA null hashes and output length > crypto: mxs-dcp - Fix AES issues > > drivers/crypto/mxs-dcp.c | 121 --- > 1 file changed, 101 insertions(+), 20 deletions(-) All applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
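For readers unfamiliar with the first patch's subject line: ahash drivers provide .export/.import hooks that snapshot and restore a partially computed hash, so a request can be suspended and later resumed. A minimal standalone model of that pattern follows; the state layout is purely illustrative and is not the mxs-dcp driver's actual structure:

    #include <stdint.h>
    #include <string.h>

    struct sha_state {                  /* hypothetical in-flight state */
            uint32_t digest[8];         /* intermediate chaining value */
            uint64_t count;             /* bytes hashed so far */
            uint8_t  buf[64];           /* unprocessed partial block */
    };

    /* .export: serialize the running state into an opaque caller buffer */
    static int sha_export(const struct sha_state *st, void *out)
    {
            memcpy(out, st, sizeof(*st));
            return 0;
    }

    /* .import: resume hashing from a previously exported snapshot */
    static int sha_import(struct sha_state *st, const void *in)
    {
            memcpy(st, in, sizeof(*st));
            return 0;
    }

    int main(void)
    {
            struct sha_state a = { .count = 64 }, b;
            uint8_t blob[sizeof(struct sha_state)];

            sha_export(&a, blob);
            sha_import(&b, blob);
            return b.count == 64 ? 0 : 1;   /* state round-trips */
    }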
Re: [PATCH v2 0/2] crypto - fix aegis/morus for big endian systems
On Mon, Oct 01, 2018 at 10:36:36AM +0200, Ard Biesheuvel wrote: > Some bug fixes for issues that I stumbled upon while working on other > stuff. > > Changes since v1: > - add Ondrej's ack to #1 > - simplify #2 and drop unrelated performance tweak > > Ard Biesheuvel (2): > crypto: morus/generic - fix for big endian systems > crypto: aegis/generic - fix for big endian systems > > crypto/aegis.h | 20 +--- > crypto/morus1280.c | 7 ++- > crypto/morus640.c | 16 > 3 files changed, 15 insertions(+), 28 deletions(-) All applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] drivers: crypto: caam: kconfig: create menu for CAAM
Franck LENORMAND wrote: > The CAAM driver has multiple configuration options and all are listed > in the crypto menu. > > This patch creates a menu dedicated to the Freescale CAAM driver. > > Signed-off-by: Franck LENORMAND > --- > drivers/crypto/caam/Kconfig | 4 > 1 file changed, 4 insertions(+) > > diff --git a/drivers/crypto/caam/Kconfig b/drivers/crypto/caam/Kconfig > index 1eb8527..fb87245 100644 > --- a/drivers/crypto/caam/Kconfig > +++ b/drivers/crypto/caam/Kconfig > @@ -1,3 +1,5 @@ > +menu "Freescale CAAM" > + > config CRYPTO_DEV_FSL_CAAM >tristate "Freescale CAAM-Multicore driver backend" >depends on FSL_SOC || ARCH_MXC || ARCH_LAYERSCAPE > @@ -152,3 +154,5 @@ config CRYPTO_DEV_FSL_CAAM_DEBUG > config CRYPTO_DEV_FSL_CAAM_CRYPTO_API_DESC >def_tristate (CRYPTO_DEV_FSL_CAAM_CRYPTO_API || \ > CRYPTO_DEV_FSL_CAAM_CRYPTO_API_QI) > + > +endmenu Please rebase this on the current cryptodev tree as it doesn't apply. Thanks, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: x86/aes-ni - fix build error following fpu template removal
On Fri, Oct 05, 2018 at 07:16:13PM +0200, Ard Biesheuvel wrote: > On 5 October 2018 at 19:13, Eric Biggers wrote: > > From: Eric Biggers > > > > aesni-intel_glue.c still calls crypto_fpu_init() and crypto_fpu_exit() > > to register/unregister the "fpu" template. But these functions don't > > exist anymore, causing a build error. Remove the calls to them. > > > > Fixes: 944585a64f5e ("crypto: x86/aes-ni - remove special handling of AES > > in PCBC mode") > > Signed-off-by: Eric Biggers > > Thanks for spotting that. > > I had actually noticed myself, but wasn't really expecting this RFC > patch to be picked up without discussion. > The patch seems reasonable to me -- we shouldn't maintain a special FPU template just for AES-PCBC when possibly no one is even using that algorithm. - Eric
Re: [PATCH] crypto: x86/aes-ni - fix build error following fpu template removal
On 5 October 2018 at 19:13, Eric Biggers wrote: > From: Eric Biggers > > aesni-intel_glue.c still calls crypto_fpu_init() and crypto_fpu_exit() > to register/unregister the "fpu" template. But these functions don't > exist anymore, causing a build error. Remove the calls to them. > > Fixes: 944585a64f5e ("crypto: x86/aes-ni - remove special handling of AES in > PCBC mode") > Signed-off-by: Eric Biggers Thanks for spotting that. I had actually noticed myself, but wasn't really expecting this RFC patch to be picked up without discussion. > --- > arch/x86/crypto/aesni-intel_glue.c | 13 + > 1 file changed, 1 insertion(+), 12 deletions(-) > > diff --git a/arch/x86/crypto/aesni-intel_glue.c > b/arch/x86/crypto/aesni-intel_glue.c > index 89bae64eef4f9..661f7daf43da9 100644 > --- a/arch/x86/crypto/aesni-intel_glue.c > +++ b/arch/x86/crypto/aesni-intel_glue.c > @@ -102,9 +102,6 @@ asmlinkage void aesni_cbc_enc(struct crypto_aes_ctx *ctx, > u8 *out, > asmlinkage void aesni_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out, > const u8 *in, unsigned int len, u8 *iv); > > -int crypto_fpu_init(void); > -void crypto_fpu_exit(void); > - > #define AVX_GEN2_OPTSIZE 640 > #define AVX_GEN4_OPTSIZE 4096 > > @@ -1449,13 +1446,9 @@ static int __init aesni_init(void) > #endif > #endif > > - err = crypto_fpu_init(); > - if (err) > - return err; > - > err = crypto_register_algs(aesni_algs, ARRAY_SIZE(aesni_algs)); > if (err) > - goto fpu_exit; > + return err; > > err = crypto_register_skciphers(aesni_skciphers, > ARRAY_SIZE(aesni_skciphers)); > @@ -1489,8 +1482,6 @@ static int __init aesni_init(void) > ARRAY_SIZE(aesni_skciphers)); > unregister_algs: > crypto_unregister_algs(aesni_algs, ARRAY_SIZE(aesni_algs)); > -fpu_exit: > - crypto_fpu_exit(); > return err; > } > > @@ -1501,8 +1492,6 @@ static void __exit aesni_exit(void) > crypto_unregister_skciphers(aesni_skciphers, > ARRAY_SIZE(aesni_skciphers)); > crypto_unregister_algs(aesni_algs, ARRAY_SIZE(aesni_algs)); > - > - crypto_fpu_exit(); > } > > late_initcall(aesni_init); > -- > 2.19.0.605.g01d371f741-goog >
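With the fpu template gone, pcbc(aes) is instantiated from the generic pcbc template wrapped around the AES-NI single-block cipher. The chaining rule that template implements is sketched below as a standalone toy; the per-block "cipher" is a placeholder xor rather than AES, so only the chaining logic is meaningful:

    #include <stdio.h>
    #include <string.h>

    #define BS 16                       /* AES block size */

    /* placeholder block cipher: a real implementation would call AES here */
    static void toy_encrypt(unsigned char *out, const unsigned char *in,
                            const unsigned char *key)
    {
            for (int i = 0; i < BS; i++)
                    out[i] = in[i] ^ key[i];
    }

    /* PCBC: C[n] = E(P[n] ^ P[n-1] ^ C[n-1]), seeded with the IV */
    static void pcbc_encrypt(unsigned char *dst, const unsigned char *src,
                             size_t nblocks, const unsigned char *key,
                             const unsigned char *iv)
    {
            unsigned char chain[BS], blk[BS];

            memcpy(chain, iv, BS);
            while (nblocks--) {
                    for (int i = 0; i < BS; i++)
                            blk[i] = src[i] ^ chain[i];
                    toy_encrypt(dst, blk, key);
                    for (int i = 0; i < BS; i++)
                            chain[i] = src[i] ^ dst[i]; /* feeds next block */
                    src += BS;
                    dst += BS;
            }
    }

    int main(void)
    {
            unsigned char key[BS] = { 0x2b }, iv[BS] = { 0 };
            unsigned char pt[2 * BS] = "two blocks of plaintext go here";
            unsigned char ct[2 * BS];

            pcbc_encrypt(ct, pt, 2, key, iv);
            for (int i = 0; i < 2 * BS; i++)
                    printf("%02x", ct[i]);
            printf("\n");
            return 0;
    }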
Re: [PATCH] crypto: qat - move temp buffers off the stack
On 5 October 2018 at 04:29, Herbert Xu wrote: > On Wed, Sep 26, 2018 at 11:51:59AM +0200, Ard Biesheuvel wrote: >> Arnd reports that with Kees's latest VLA patches applied, the HMAC >> handling in the QAT driver uses a worst case estimate of 160 bytes >> for the SHA blocksize, allowing the compiler to determine the size >> of the stack frame at compile time and throw a warning: >> >> drivers/crypto/qat/qat_common/qat_algs.c: In function >> 'qat_alg_do_precomputes': >> drivers/crypto/qat/qat_common/qat_algs.c:257:1: error: the frame size >> of 1112 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] >> >> Given that this worst case estimate is only 32 bytes larger than the >> actual block size of SHA-512, the use of a VLA here was hiding the >> excessive size of the stack frame from the compiler, and so we should >> try to move these buffers off the stack. >> >> So move the ipad/opad buffers and the various SHA state descriptors >> into the tfm context struct. Since qat_alg_do_precomputes() is only >> called in the context of a setkey() operation, this should be safe. >> Using SHA512_BLOCK_SIZE for the size of the ipad/opad buffers allows >> them to be used by SHA-1/SHA-256 as well. >> >> Reported-by: Arnd Bergmann >> Signed-off-by: Ard Biesheuvel >> --- >> This applies against v4.19-rc while Arnd's report was about -next. However, >> since Kees's VLA change results in a warning about a pre-existing condition, >> we may decide to apply it as a fix, and handle the conflict with Kees's >> patch in cryptodev. Otherwise, I can respin it to apply onto cryptodev >> directly. The patch was build tested only - I don't have the hardware. >> >> Thoughts anyone? > > I applied it against cryptodev only. I don't think it's bad enough > to warrant a backport to stable though. But if you guys disagree we > could always send the backport to stable after this goes upstream. > Works for me.
Re: [PATCH] crypto: lrw - fix rebase error after out of bounds fix
On Sun, Sep 30, 2018 at 09:51:16PM +0200, Ard Biesheuvel wrote: > Due to an unfortunate interaction between commit fbe1a850b3b1 > ("crypto: lrw - Fix out-of bounds access on counter overflow") and > commit c778f96bf347 ("crypto: lrw - Optimize tweak computation"), > we ended up with a version of next_index() that always returns 127. > > Fixes: c778f96bf347 ("crypto: lrw - Optimize tweak computation") > Signed-off-by: Ard Biesheuvel Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
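To see what the regression broke: next_index() is meant to return the position of the lowest clear bit of the 128-bit block counter (incrementing the counter as a side effect), which selects the precomputed tweak multiplier. Here is a standalone model, using __builtin_ctz() in place of the kernel's ffz(); the emitted values follow the classic ruler sequence, whereas the broken rebase returned 127 unconditionally:

    #include <stdio.h>
    #include <stdint.h>

    static int next_index(uint32_t counter[4])
    {
            int i, res = 0;

            for (i = 0; i < 4; i++) {
                    if (counter[i] + 1 != 0)
                            return res + __builtin_ctz(~counter[i]++);
                    counter[i] = 0;     /* word wrapped: carry onward */
                    res += 32;
            }
            /* all 128 bits were set: the counter wraps to zero */
            return 127;
    }

    int main(void)
    {
            uint32_t ctr[4] = { 0, 0, 0, 0 };

            for (int i = 0; i < 8; i++)
                    printf("%d ", next_index(ctr));
            printf("\n");               /* ruler sequence: 0 1 0 2 0 1 0 3 */
            return 0;
    }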
Re: [PATCH] crypto: qat - move temp buffers off the stack
On Wed, Sep 26, 2018 at 11:51:59AM +0200, Ard Biesheuvel wrote: > Arnd reports that with Kees's latest VLA patches applied, the HMAC > handling in the QAT driver uses a worst case estimate of 160 bytes > for the SHA blocksize, allowing the compiler to determine the size > of the stack frame at compile time and throw a warning: > > drivers/crypto/qat/qat_common/qat_algs.c: In function > 'qat_alg_do_precomputes': > drivers/crypto/qat/qat_common/qat_algs.c:257:1: error: the frame size > of 1112 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] > > Given that this worst case estimate is only 32 bytes larger than the > actual block size of SHA-512, the use of a VLA here was hiding the > excessive size of the stack frame from the compiler, and so we should > try to move these buffers off the stack. > > So move the ipad/opad buffers and the various SHA state descriptors > into the tfm context struct. Since qat_alg_do_precomputes() is only > called in the context of a setkey() operation, this should be safe. > Using SHA512_BLOCK_SIZE for the size of the ipad/opad buffers allows > them to be used by SHA-1/SHA-256 as well. > > Reported-by: Arnd Bergmann > Signed-off-by: Ard Biesheuvel > --- > This applies against v4.19-rc while Arnd's report was about -next. However, > since Kees's VLA change results in a warning about a pre-existing condition, > we may decide to apply it as a fix, and handle the conflict with Kees's > patch in cryptodev. Otherwise, I can respin it to apply onto cryptodev > directly. The patch was build tested only - I don't have the hardware. > > Thoughts anyone? I applied it against cryptodev only. I don't think it's bad enough to warrant a backport to stable though. But if you guys disagree we could always send the backport to stable after this goes upstream. > drivers/crypto/qat/qat_common/qat_algs.c | 60 ++-- > 1 file changed, 31 insertions(+), 29 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
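The pattern in this patch generalizes well, so a miniature version may be useful: fixed worst-case scratch buffers move out of the function's stack frame and into the long-lived context the function already receives, which is safe when, as with setkey(), the function cannot run concurrently on the same context. All names below are illustrative:

    #include <string.h>

    #define BLOCK_MAX 128               /* SHA-512 block size, the worst case */

    struct hmac_ctx {
            /* formerly: unsigned char ipad[...], opad[...] on the stack */
            unsigned char ipad[BLOCK_MAX];
            unsigned char opad[BLOCK_MAX];
    };

    static void hmac_precompute(struct hmac_ctx *ctx,
                                const unsigned char *key, size_t klen)
    {
            if (klen > BLOCK_MAX)
                    klen = BLOCK_MAX;   /* real code hashes long keys first */

            memset(ctx->ipad, 0, BLOCK_MAX);
            memcpy(ctx->ipad, key, klen);
            memcpy(ctx->opad, ctx->ipad, BLOCK_MAX);

            for (int i = 0; i < BLOCK_MAX; i++) {
                    ctx->ipad[i] ^= 0x36;   /* standard HMAC inner pad */
                    ctx->opad[i] ^= 0x5c;   /* standard HMAC outer pad */
            }
            /* the padded blocks would now be fed to the hash */
    }

    int main(void)
    {
            static struct hmac_ctx ctx; /* lives outside any stack frame */

            hmac_precompute(&ctx, (const unsigned char *)"key", 3);
            return 0;
    }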
Re: [PATCH -next v2] crypto: ccp - Make function sev_get_firmware() static
On Wed, Sep 26, 2018 at 02:09:23AM +, Wei Yongjun wrote: > Fixes the following sparse warning: > > drivers/crypto/ccp/psp-dev.c:444:5: warning: > symbol 'sev_get_firmware' was not declared. Should it be static? > > Fixes: e93720606efd ("crypto: ccp - Allow SEV firmware to be chosen based on > Family and Model") > Signed-off-by: Wei Yongjun > --- > v1 -> v2: add fixes and add cc Janakarajan > --- > drivers/crypto/ccp/psp-dev.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [RFC PATCH] crypto: x86/aes-ni - remove special handling of AES in PCBC mode
On Mon, Sep 24, 2018 at 02:48:16PM +0200, Ard Biesheuvel wrote: > For historical reasons, the AES-NI based implementation of the PCBC > chaining mode uses a special FPU chaining mode wrapper template to > amortize the FPU start/stop overhead over multiple blocks. > > When this FPU wrapper was introduced, it supported widely used > chaining modes such as XTS and CTR (as well as LRW), but currently, > PCBC is the only remaining user. > > Since there are no known users of pcbc(aes) in the kernel, let's remove > this special driver, and rely on the generic pcbc driver to encapsulate > the AES-NI core cipher. > > Signed-off-by: Ard Biesheuvel > --- > arch/x86/crypto/Makefile | 2 +- > arch/x86/crypto/aesni-intel_glue.c | 32 --- > arch/x86/crypto/fpu.c | 207 > crypto/Kconfig | 2 +- > 4 files changed, 2 insertions(+), 241 deletions(-) Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: aes_ti - disable interrupts while accessing sbox
Hi Eric, On 4 October 2018 at 06:07, Eric Biggers wrote: > From: Eric Biggers > > The generic constant-time AES implementation is supposed to preload the > AES S-box into the CPU's L1 data cache. But, an interrupt handler can > run on the CPU and muck with the cache. Worse, on preemptible kernels > the process can even be preempted and moved to a different CPU. So the > implementation may actually still be vulnerable to cache-timing attacks. > > Make it more robust by disabling interrupts while the sbox is used. > > In some quick tests on x86 and ARM, this doesn't affect performance > significantly. Responsiveness is also a concern, but interrupts are > only disabled for a single AES block which even on ARM Cortex-A7 is > "only" ~1500 cycles to encrypt or ~2600 cycles to decrypt. > I share your concern, but that is quite a big hammer. Also, does it really take ~100 cycles per byte? That is terrible :-) Given that the full lookup table is only 1024 bytes (or 1024+256 bytes for decryption), I wonder if something like the below would be a better option for A7 (apologies for the mangled whitespace) diff --git a/arch/arm/crypto/aes-cipher-core.S b/arch/arm/crypto/aes-cipher-core.S index 184d6c2d15d5..83e893f7e581 100644 --- a/arch/arm/crypto/aes-cipher-core.S +++ b/arch/arm/crypto/aes-cipher-core.S @@ -139,6 +139,13 @@ __adrl ttab, \ttab + .irpc r, 01234567 + ldr r8, [ttab, #(32 * \r)] + ldr r9, [ttab, #(32 * \r) + 256] + ldr r10, [ttab, #(32 * \r) + 512] + ldr r11, [ttab, #(32 * \r) + 768] + .endr + tst rounds, #2 bne 1f @@ -180,6 +187,12 @@ ENDPROC(__aes_arm_encrypt) .align 5 ENTRY(__aes_arm_decrypt) + __adrl ttab, __aes_arm_inverse_sbox + + .irpc r, 01234567 + ldr r8, [ttab, #(32 * \r)] + .endr + do_crypt iround, crypto_it_tab, __aes_arm_inverse_sbox, 0 ENDPROC(__aes_arm_decrypt) diff --git a/arch/arm/crypto/aes-cipher-glue.c b/arch/arm/crypto/aes-cipher-glue.c index c222f6e072ad..630e1a436f1d 100644 --- a/arch/arm/crypto/aes-cipher-glue.c +++ b/arch/arm/crypto/aes-cipher-glue.c @@ -23,16 +23,22 @@ static void aes_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) { struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); int rounds = 6 + ctx->key_length / 4; + unsigned long flags; + local_irq_save(flags); __aes_arm_encrypt(ctx->key_enc, rounds, in, out); + local_irq_restore(flags); } static void aes_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in) { struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm); int rounds = 6 + ctx->key_length / 4; + unsigned long flags; + local_irq_save(flags); __aes_arm_decrypt(ctx->key_dec, rounds, in, out); + local_irq_restore(flags); } static struct crypto_alg aes_alg = { diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c index ca554d57d01e..82fa860c9cb9 100644 --- a/crypto/aes_generic.c +++ b/crypto/aes_generic.c @@ -63,7 +63,7 @@ static inline u8 byte(const u32 x, const unsigned n) static const u32 rco_tab[10] = { 1, 2, 4, 8, 16, 32, 64, 128, 27, 54 }; -__visible const u32 crypto_ft_tab[4][256] = { +__visible const u32 crypto_ft_tab[4][256] __cacheline_aligned = { { 0xa56363c6, 0x847c7cf8, 0x997777ee, 0x8d7b7bf6, 0x0df2f2ff, 0xbd6b6bd6, 0xb16f6fde, 0x54c5c591, @@ -327,7 +327,7 @@ __visible const u32 crypto_ft_tab[4][256] = { } }; -__visible const u32 crypto_fl_tab[4][256] = { +__visible const u32 crypto_fl_tab[4][256] __cacheline_aligned = { { 0x00000063, 0x0000007c, 0x00000077, 0x0000007b, 0x000000f2, 0x0000006b, 0x0000006f, 0x000000c5, @@ -591,7 +591,7 @@ __visible const u32 crypto_fl_tab[4][256] = { } }; -__visible const u32
crypto_it_tab[4][256] __cacheline_aligned = { { 0x50a7f451, 0x5365417e, 0xc3a4171a, 0x965e273a, 0xcb6bab3b, 0xf1459d1f, 0xab58faac, 0x9303e34b, @@ -855,7 +855,7 @@ __visible const u32 crypto_it_tab[4][256] = { } }; -__visible const u32 crypto_il_tab[4][256] = { +__visible const u32 crypto_il_tab[4][256] __cacheline_aligned = { { 0x00000052, 0x00000009, 0x0000006a, 0x000000d5, 0x00000030, 0x00000036, 0x000000a5, 0x00000038, > Fixes: b5e0b032b6c3 ("crypto: aes - add generic time invariant AES cipher") > Signed-off-by: Eric Biggers > --- > crypto/aes_ti.c | 18 ++ > 1 file changed, 18 insertions(+) > > diff --git a/crypto/aes_ti.c b/crypto/aes_ti.c > index 03023b2290e8e..81b604419ee0e 100644 > --- a/crypto/aes_ti.c > +++ b/crypto/aes_ti.c > @@ -269,6 +269,7 @@ static void aesti_encrypt(struct crypto_tfm *tfm, u8 > *out, const u8 *in) > const u32 *rkp = ctx->key_enc + 4; > int rounds = 6 + ctx->key_length / 4; > u32 st0[4], st1[4]; > + unsigned long flags; > int round; > > st0[0] = ctx->key_enc[0] ^ get_unaligned_le32(in); > @@ -276,6 +277,12 @@ static void aesti_encrypt(struct crypto_tfm *tfm, u8 > *out, const u8 *in) > st0[2] = ctx->key_enc[2] ^ get_unaligned_le32(in + 8); > st0[3] = ctx->key_enc[3] ^ get_unaligned_le32(in +
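The .irpc block in the reply above pulls the lookup table into L1 by touching one word per cache line before any data-dependent accesses happen. The same idea, as a portable standalone sketch assuming a 64-byte line size; the volatile accumulator keeps the compiler from discarding the loads:

    #include <stdint.h>
    #include <stddef.h>

    #define CACHELINE 64                /* assumed line size */

    static uint32_t table[256];         /* stand-in for a 1 KiB AES table */

    static void preload_table(const uint32_t *t, size_t bytes)
    {
            volatile uint32_t sink = 0;

            /* one load per cache line makes the whole table resident */
            for (size_t off = 0; off < bytes; off += CACHELINE)
                    sink += t[off / sizeof(*t)];
            (void)sink;
    }

    int main(void)
    {
            /* in the kernel this would sit inside the irq-disabled region */
            preload_table(table, sizeof(table));
            return 0;
    }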
Re: [PATCH] crypto: arm64/aes - fix handling sub-block CTS-CBC inputs
On 3 October 2018 at 07:22, Eric Biggers wrote: > From: Eric Biggers > > In the new arm64 CTS-CBC implementation, return an error code rather > than crashing on inputs shorter than AES_BLOCK_SIZE bytes. Also set > cra_blocksize to AES_BLOCK_SIZE (like is done in the cts template) to > indicate the minimum input size. > > Fixes: dd597fb33ff0 ("crypto: arm64/aes-blk - add support for CTS-CBC mode") > Signed-off-by: Eric Biggers Thanks Eric Reviewed-by: Ard Biesheuvel > --- > arch/arm64/crypto/aes-glue.c | 13 + > 1 file changed, 9 insertions(+), 4 deletions(-) > > diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c > index 26d2b0263ba63..1e676625ef33f 100644 > --- a/arch/arm64/crypto/aes-glue.c > +++ b/arch/arm64/crypto/aes-glue.c > @@ -243,8 +243,11 @@ static int cts_cbc_encrypt(struct skcipher_request *req) > > skcipher_request_set_tfm(>subreq, tfm); > > - if (req->cryptlen == AES_BLOCK_SIZE) > + if (req->cryptlen <= AES_BLOCK_SIZE) { > + if (req->cryptlen < AES_BLOCK_SIZE) > + return -EINVAL; > cbc_blocks = 1; > + } > > if (cbc_blocks > 0) { > unsigned int blocks; > @@ -305,8 +308,11 @@ static int cts_cbc_decrypt(struct skcipher_request *req) > > skcipher_request_set_tfm(>subreq, tfm); > > - if (req->cryptlen == AES_BLOCK_SIZE) > + if (req->cryptlen <= AES_BLOCK_SIZE) { > + if (req->cryptlen < AES_BLOCK_SIZE) > + return -EINVAL; > cbc_blocks = 1; > + } > > if (cbc_blocks > 0) { > unsigned int blocks; > @@ -486,14 +492,13 @@ static struct skcipher_alg aes_algs[] = { { > .cra_driver_name= "__cts-cbc-aes-" MODE, > .cra_priority = PRIO, > .cra_flags = CRYPTO_ALG_INTERNAL, > - .cra_blocksize = 1, > + .cra_blocksize = AES_BLOCK_SIZE, > .cra_ctxsize= sizeof(struct crypto_aes_ctx), > .cra_module = THIS_MODULE, > }, > .min_keysize= AES_MIN_KEY_SIZE, > .max_keysize= AES_MAX_KEY_SIZE, > .ivsize = AES_BLOCK_SIZE, > - .chunksize = AES_BLOCK_SIZE, > .walksize = 2 * AES_BLOCK_SIZE, > .setkey = skcipher_aes_setkey, > .encrypt= cts_cbc_encrypt, > -- > 2.19.0 >
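The rule this fix enforces, stated in isolation: CTS-CBC is undefined for inputs below one block, exactly one block degenerates to plain CBC, and longer inputs split into a CBC prefix plus a two-block (possibly partial) ciphertext-stealing tail. Below is a standalone sketch of that length dispatch; the arithmetic is illustrative and is not lifted from the arm64 driver:

    #include <stdio.h>

    #define AES_BLOCK_SIZE 16

    /* returns -1 (-EINVAL in the kernel) for sub-block inputs */
    static int cts_cbc_split(unsigned int cryptlen,
                             unsigned int *cbc_bytes, unsigned int *cts_bytes)
    {
            if (cryptlen < AES_BLOCK_SIZE)
                    return -1;          /* nothing to steal from */

            if (cryptlen == AES_BLOCK_SIZE) {
                    *cbc_bytes = AES_BLOCK_SIZE;    /* plain CBC, no stealing */
                    *cts_bytes = 0;
                    return 0;
            }

            /* the final two (possibly partial) blocks take the CTS path */
            *cts_bytes = AES_BLOCK_SIZE + 1 +
                         (cryptlen - AES_BLOCK_SIZE - 1) % AES_BLOCK_SIZE;
            *cbc_bytes = cryptlen - *cts_bytes;
            return 0;
    }

    int main(void)
    {
            unsigned int cbc, cts;

            for (unsigned int len = 15; len <= 48; len += 11) {
                    if (cts_cbc_split(len, &cbc, &cts) < 0)
                            printf("len %2u: rejected\n", len);
                    else
                            printf("len %2u: cbc %2u + cts %2u\n", len, cbc, cts);
            }
            return 0;
    }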
Re: [PATCH v2 2/2] crypto: aegis/generic - fix for big endian systems
On Mon, Oct 1, 2018 at 10:36 AM Ard Biesheuvel wrote: > Use the correct __le32 annotation and accessors to perform the > single round of AES encryption performed inside the AEGIS transform. > Otherwise, tcrypt reports: > > alg: aead: Test 1 failed on encryption for aegis128-generic > : 6c 25 25 4a 3c 10 1d 27 2b c1 d4 84 9a ef 7f 6e > alg: aead: Test 1 failed on encryption for aegis128l-generic > : cd c6 e3 b8 a0 70 9d 8e c2 4f 6f fe 71 42 df 28 > alg: aead: Test 1 failed on encryption for aegis256-generic > : aa ed 07 b1 96 1d e9 e6 f2 ed b5 8e 1c 5f dc 1c > > Fixes: f606a88e5823 ("crypto: aegis - Add generic AEGIS AEAD implementations") > Cc: # v4.18+ > Signed-off-by: Ard Biesheuvel LGTM now, thanks! Reviewed-by: Ondrej Mosnacek > --- > crypto/aegis.h | 20 +--- > 1 file changed, 9 insertions(+), 11 deletions(-) > > diff --git a/crypto/aegis.h b/crypto/aegis.h > index f1c6900ddb80..405e025fc906 100644 > --- a/crypto/aegis.h > +++ b/crypto/aegis.h > @@ -21,7 +21,7 @@ > > union aegis_block { > __le64 words64[AEGIS_BLOCK_SIZE / sizeof(__le64)]; > - u32 words32[AEGIS_BLOCK_SIZE / sizeof(u32)]; > + __le32 words32[AEGIS_BLOCK_SIZE / sizeof(__le32)]; > u8 bytes[AEGIS_BLOCK_SIZE]; > }; > > @@ -57,24 +57,22 @@ static void crypto_aegis_aesenc(union aegis_block *dst, > const union aegis_block *src, > const union aegis_block *key) > { > - u32 *d = dst->words32; > const u8 *s = src->bytes; > - const u32 *k = key->words32; > const u32 *t0 = crypto_ft_tab[0]; > const u32 *t1 = crypto_ft_tab[1]; > const u32 *t2 = crypto_ft_tab[2]; > const u32 *t3 = crypto_ft_tab[3]; > u32 d0, d1, d2, d3; > > - d0 = t0[s[ 0]] ^ t1[s[ 5]] ^ t2[s[10]] ^ t3[s[15]] ^ k[0]; > - d1 = t0[s[ 4]] ^ t1[s[ 9]] ^ t2[s[14]] ^ t3[s[ 3]] ^ k[1]; > - d2 = t0[s[ 8]] ^ t1[s[13]] ^ t2[s[ 2]] ^ t3[s[ 7]] ^ k[2]; > - d3 = t0[s[12]] ^ t1[s[ 1]] ^ t2[s[ 6]] ^ t3[s[11]] ^ k[3]; > + d0 = t0[s[ 0]] ^ t1[s[ 5]] ^ t2[s[10]] ^ t3[s[15]]; > + d1 = t0[s[ 4]] ^ t1[s[ 9]] ^ t2[s[14]] ^ t3[s[ 3]]; > + d2 = t0[s[ 8]] ^ t1[s[13]] ^ t2[s[ 2]] ^ t3[s[ 7]]; > + d3 = t0[s[12]] ^ t1[s[ 1]] ^ t2[s[ 6]] ^ t3[s[11]]; > > - d[0] = d0; > - d[1] = d1; > - d[2] = d2; > - d[3] = d3; > + dst->words32[0] = cpu_to_le32(d0) ^ key->words32[0]; > + dst->words32[1] = cpu_to_le32(d1) ^ key->words32[1]; > + dst->words32[2] = cpu_to_le32(d2) ^ key->words32[2]; > + dst->words32[3] = cpu_to_le32(d3) ^ key->words32[3]; > } > > #endif /* _CRYPTO_AEGIS_H */ > -- > 2.17.1 > -- Ondrej Mosnacek Associate Software Engineer, Security Technologies Red Hat, Inc.
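The reason the final hunk is safe to write as cpu_to_le32(d0) ^ key->words32[0]: byte swapping distributes over xor, so converting before or after the xor produces the same bytes. A standalone check, with GCC/Clang's __builtin_bswap32() standing in for swab32():

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
            uint32_t d = 0x01234567, k = 0x89abcdef;

            /* swab(a ^ b) == swab(a) ^ swab(b) for any byte permutation */
            assert(__builtin_bswap32(d ^ k) ==
                   (__builtin_bswap32(d) ^ __builtin_bswap32(k)));
            return 0;
    }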
Re: [PATCH 2/2] crypto: aegis/generic - fix for big endian systems
On Mon, Oct 1, 2018 at 10:01 AM Ard Biesheuvel wrote: > On 1 October 2018 at 10:00, Ondrej Mosnacek wrote: > > On Sun, Sep 30, 2018 at 1:14 PM Ard Biesheuvel > > wrote: > >> On 30 September 2018 at 10:58, Ard Biesheuvel > >> wrote: > >> > Use the correct __le32 annotation and accessors to perform the > >> > single round of AES encryption performed inside the AEGIS transform. > >> > Otherwise, tcrypt reports: > >> > > >> > alg: aead: Test 1 failed on encryption for aegis128-generic > >> > : 6c 25 25 4a 3c 10 1d 27 2b c1 d4 84 9a ef 7f 6e > >> > alg: aead: Test 1 failed on encryption for aegis128l-generic > >> > : cd c6 e3 b8 a0 70 9d 8e c2 4f 6f fe 71 42 df 28 > >> > alg: aead: Test 1 failed on encryption for aegis256-generic > >> > : aa ed 07 b1 96 1d e9 e6 f2 ed b5 8e 1c 5f dc 1c > >> > > >> > While at it, let's refer to the first precomputed table only, and > >> > derive the other ones by rotation. This reduces the D-cache footprint > >> > by 75%, and shouldn't be too costly or free on load/store architectures > >> > (and X86 has its own AES-NI based implementation) > >> > > >> > Fixes: f606a88e5823 ("crypto: aegis - Add generic AEGIS AEAD > >> > implementations") > >> > Cc: # v4.18+ > >> > Signed-off-by: Ard Biesheuvel > >> > --- > >> > crypto/aegis.h | 23 +--- > >> > 1 file changed, 10 insertions(+), 13 deletions(-) > >> > > >> > diff --git a/crypto/aegis.h b/crypto/aegis.h > >> > index f1c6900ddb80..84d3e07a3c33 100644 > >> > --- a/crypto/aegis.h > >> > +++ b/crypto/aegis.h > >> > @@ -21,7 +21,7 @@ > >> > > >> > union aegis_block { > >> > __le64 words64[AEGIS_BLOCK_SIZE / sizeof(__le64)]; > >> > - u32 words32[AEGIS_BLOCK_SIZE / sizeof(u32)]; > >> > + __le32 words32[AEGIS_BLOCK_SIZE / sizeof(__le32)]; > >> > u8 bytes[AEGIS_BLOCK_SIZE]; > >> > }; > >> > > >> > @@ -59,22 +59,19 @@ static void crypto_aegis_aesenc(union aegis_block > >> > *dst, > >> > { > >> > u32 *d = dst->words32; > >> > const u8 *s = src->bytes; > >> > - const u32 *k = key->words32; > >> > + const __le32 *k = key->words32; > >> > const u32 *t0 = crypto_ft_tab[0]; > >> > - const u32 *t1 = crypto_ft_tab[1]; > >> > - const u32 *t2 = crypto_ft_tab[2]; > >> > - const u32 *t3 = crypto_ft_tab[3]; > >> > u32 d0, d1, d2, d3; > >> > > >> > - d0 = t0[s[ 0]] ^ t1[s[ 5]] ^ t2[s[10]] ^ t3[s[15]] ^ k[0]; > >> > - d1 = t0[s[ 4]] ^ t1[s[ 9]] ^ t2[s[14]] ^ t3[s[ 3]] ^ k[1]; > >> > - d2 = t0[s[ 8]] ^ t1[s[13]] ^ t2[s[ 2]] ^ t3[s[ 7]] ^ k[2]; > >> > - d3 = t0[s[12]] ^ t1[s[ 1]] ^ t2[s[ 6]] ^ t3[s[11]] ^ k[3]; > >> > + d0 = t0[s[ 0]] ^ rol32(t0[s[ 5]], 8) ^ rol32(t0[s[10]], 16) ^ > >> > rol32(t0[s[15]], 24); > >> > + d1 = t0[s[ 4]] ^ rol32(t0[s[ 9]], 8) ^ rol32(t0[s[14]], 16) ^ > >> > rol32(t0[s[ 3]], 24); > >> > + d2 = t0[s[ 8]] ^ rol32(t0[s[13]], 8) ^ rol32(t0[s[ 2]], 16) ^ > >> > rol32(t0[s[ 7]], 24); > >> > + d3 = t0[s[12]] ^ rol32(t0[s[ 1]], 8) ^ rol32(t0[s[ 6]], 16) ^ > >> > rol32(t0[s[11]], 24); > >> > > >> > - d[0] = d0; > >> > - d[1] = d1; > >> > - d[2] = d2; > >> > - d[3] = d3; > >> > + d[0] = cpu_to_le32(d0 ^ le32_to_cpu(k[0])); > >> > + d[1] = cpu_to_le32(d1 ^ le32_to_cpu(k[1])); > >> > + d[2] = cpu_to_le32(d2 ^ le32_to_cpu(k[2])); > >> > + d[3] = cpu_to_le32(d3 ^ le32_to_cpu(k[3])); > >> > >> > >> I suppose this > >> > >> > + d[0] = cpu_to_le32(d0) ^ k[0]; > >> > + d[1] = cpu_to_le32(d1) ^ k[1]; > >> > + d[2] = cpu_to_le32(d2) ^ k[2]; > >> > + d[3] = cpu_to_le32(d3) ^ k[3]; > >> > >> should work fine as well > > > > Yeah, that looks nicer, but I'm not sure if it is completely OK to do > > bitwise/arithmetic operations directly 
on the __[lb]e* types... Maybe > > yes, but the code I've seen that used them usually seemed to treat > > them as opaque types. > > > > No, xor is fine with __le/__be types Ah, OK then. Good to know :) -- Ondrej Mosnacek Associate Software Engineer, Security Technologies Red Hat, Inc.
Re: [PATCH 2/2] crypto: aegis/generic - fix for big endian systems
On 1 October 2018 at 10:00, Ondrej Mosnacek wrote: > On Sun, Sep 30, 2018 at 1:14 PM Ard Biesheuvel > wrote: >> On 30 September 2018 at 10:58, Ard Biesheuvel >> wrote: >> > Use the correct __le32 annotation and accessors to perform the >> > single round of AES encryption performed inside the AEGIS transform. >> > Otherwise, tcrypt reports: >> > >> > alg: aead: Test 1 failed on encryption for aegis128-generic >> > : 6c 25 25 4a 3c 10 1d 27 2b c1 d4 84 9a ef 7f 6e >> > alg: aead: Test 1 failed on encryption for aegis128l-generic >> > : cd c6 e3 b8 a0 70 9d 8e c2 4f 6f fe 71 42 df 28 >> > alg: aead: Test 1 failed on encryption for aegis256-generic >> > : aa ed 07 b1 96 1d e9 e6 f2 ed b5 8e 1c 5f dc 1c >> > >> > While at it, let's refer to the first precomputed table only, and >> > derive the other ones by rotation. This reduces the D-cache footprint >> > by 75%, and shouldn't be too costly or free on load/store architectures >> > (and X86 has its own AES-NI based implementation) >> > >> > Fixes: f606a88e5823 ("crypto: aegis - Add generic AEGIS AEAD >> > implementations") >> > Cc: # v4.18+ >> > Signed-off-by: Ard Biesheuvel >> > --- >> > crypto/aegis.h | 23 +--- >> > 1 file changed, 10 insertions(+), 13 deletions(-) >> > >> > diff --git a/crypto/aegis.h b/crypto/aegis.h >> > index f1c6900ddb80..84d3e07a3c33 100644 >> > --- a/crypto/aegis.h >> > +++ b/crypto/aegis.h >> > @@ -21,7 +21,7 @@ >> > >> > union aegis_block { >> > __le64 words64[AEGIS_BLOCK_SIZE / sizeof(__le64)]; >> > - u32 words32[AEGIS_BLOCK_SIZE / sizeof(u32)]; >> > + __le32 words32[AEGIS_BLOCK_SIZE / sizeof(__le32)]; >> > u8 bytes[AEGIS_BLOCK_SIZE]; >> > }; >> > >> > @@ -59,22 +59,19 @@ static void crypto_aegis_aesenc(union aegis_block *dst, >> > { >> > u32 *d = dst->words32; >> > const u8 *s = src->bytes; >> > - const u32 *k = key->words32; >> > + const __le32 *k = key->words32; >> > const u32 *t0 = crypto_ft_tab[0]; >> > - const u32 *t1 = crypto_ft_tab[1]; >> > - const u32 *t2 = crypto_ft_tab[2]; >> > - const u32 *t3 = crypto_ft_tab[3]; >> > u32 d0, d1, d2, d3; >> > >> > - d0 = t0[s[ 0]] ^ t1[s[ 5]] ^ t2[s[10]] ^ t3[s[15]] ^ k[0]; >> > - d1 = t0[s[ 4]] ^ t1[s[ 9]] ^ t2[s[14]] ^ t3[s[ 3]] ^ k[1]; >> > - d2 = t0[s[ 8]] ^ t1[s[13]] ^ t2[s[ 2]] ^ t3[s[ 7]] ^ k[2]; >> > - d3 = t0[s[12]] ^ t1[s[ 1]] ^ t2[s[ 6]] ^ t3[s[11]] ^ k[3]; >> > + d0 = t0[s[ 0]] ^ rol32(t0[s[ 5]], 8) ^ rol32(t0[s[10]], 16) ^ >> > rol32(t0[s[15]], 24); >> > + d1 = t0[s[ 4]] ^ rol32(t0[s[ 9]], 8) ^ rol32(t0[s[14]], 16) ^ >> > rol32(t0[s[ 3]], 24); >> > + d2 = t0[s[ 8]] ^ rol32(t0[s[13]], 8) ^ rol32(t0[s[ 2]], 16) ^ >> > rol32(t0[s[ 7]], 24); >> > + d3 = t0[s[12]] ^ rol32(t0[s[ 1]], 8) ^ rol32(t0[s[ 6]], 16) ^ >> > rol32(t0[s[11]], 24); >> > >> > - d[0] = d0; >> > - d[1] = d1; >> > - d[2] = d2; >> > - d[3] = d3; >> > + d[0] = cpu_to_le32(d0 ^ le32_to_cpu(k[0])); >> > + d[1] = cpu_to_le32(d1 ^ le32_to_cpu(k[1])); >> > + d[2] = cpu_to_le32(d2 ^ le32_to_cpu(k[2])); >> > + d[3] = cpu_to_le32(d3 ^ le32_to_cpu(k[3])); >> >> >> I suppose this >> >> > + d[0] = cpu_to_le32(d0) ^ k[0]; >> > + d[1] = cpu_to_le32(d1) ^ k[1]; >> > + d[2] = cpu_to_le32(d2) ^ k[2]; >> > + d[3] = cpu_to_le32(d3) ^ k[3]; >> >> should work fine as well > > Yeah, that looks nicer, but I'm not sure if it is completely OK to do > bitwise/arithmetic operations directly on the __[lb]e* types... Maybe > yes, but the code I've seen that used them usually seemed to treat > them as opaque types. > No, xor is fine with __le/__be types
Re: [PATCH 2/2] crypto: aegis/generic - fix for big endian systems
On Sun, Sep 30, 2018 at 1:14 PM Ard Biesheuvel wrote: > On 30 September 2018 at 10:58, Ard Biesheuvel > wrote: > > Use the correct __le32 annotation and accessors to perform the > > single round of AES encryption performed inside the AEGIS transform. > > Otherwise, tcrypt reports: > > > > alg: aead: Test 1 failed on encryption for aegis128-generic > > : 6c 25 25 4a 3c 10 1d 27 2b c1 d4 84 9a ef 7f 6e > > alg: aead: Test 1 failed on encryption for aegis128l-generic > > : cd c6 e3 b8 a0 70 9d 8e c2 4f 6f fe 71 42 df 28 > > alg: aead: Test 1 failed on encryption for aegis256-generic > > : aa ed 07 b1 96 1d e9 e6 f2 ed b5 8e 1c 5f dc 1c > > > > While at it, let's refer to the first precomputed table only, and > > derive the other ones by rotation. This reduces the D-cache footprint > > by 75%, and shouldn't be too costly or free on load/store architectures > > (and X86 has its own AES-NI based implementation) > > > > Fixes: f606a88e5823 ("crypto: aegis - Add generic AEGIS AEAD > > implementations") > > Cc: # v4.18+ > > Signed-off-by: Ard Biesheuvel > > --- > > crypto/aegis.h | 23 +--- > > 1 file changed, 10 insertions(+), 13 deletions(-) > > > > diff --git a/crypto/aegis.h b/crypto/aegis.h > > index f1c6900ddb80..84d3e07a3c33 100644 > > --- a/crypto/aegis.h > > +++ b/crypto/aegis.h > > @@ -21,7 +21,7 @@ > > > > union aegis_block { > > __le64 words64[AEGIS_BLOCK_SIZE / sizeof(__le64)]; > > - u32 words32[AEGIS_BLOCK_SIZE / sizeof(u32)]; > > + __le32 words32[AEGIS_BLOCK_SIZE / sizeof(__le32)]; > > u8 bytes[AEGIS_BLOCK_SIZE]; > > }; > > > > @@ -59,22 +59,19 @@ static void crypto_aegis_aesenc(union aegis_block *dst, > > { > > u32 *d = dst->words32; > > const u8 *s = src->bytes; > > - const u32 *k = key->words32; > > + const __le32 *k = key->words32; > > const u32 *t0 = crypto_ft_tab[0]; > > - const u32 *t1 = crypto_ft_tab[1]; > > - const u32 *t2 = crypto_ft_tab[2]; > > - const u32 *t3 = crypto_ft_tab[3]; > > u32 d0, d1, d2, d3; > > > > - d0 = t0[s[ 0]] ^ t1[s[ 5]] ^ t2[s[10]] ^ t3[s[15]] ^ k[0]; > > - d1 = t0[s[ 4]] ^ t1[s[ 9]] ^ t2[s[14]] ^ t3[s[ 3]] ^ k[1]; > > - d2 = t0[s[ 8]] ^ t1[s[13]] ^ t2[s[ 2]] ^ t3[s[ 7]] ^ k[2]; > > - d3 = t0[s[12]] ^ t1[s[ 1]] ^ t2[s[ 6]] ^ t3[s[11]] ^ k[3]; > > + d0 = t0[s[ 0]] ^ rol32(t0[s[ 5]], 8) ^ rol32(t0[s[10]], 16) ^ > > rol32(t0[s[15]], 24); > > + d1 = t0[s[ 4]] ^ rol32(t0[s[ 9]], 8) ^ rol32(t0[s[14]], 16) ^ > > rol32(t0[s[ 3]], 24); > > + d2 = t0[s[ 8]] ^ rol32(t0[s[13]], 8) ^ rol32(t0[s[ 2]], 16) ^ > > rol32(t0[s[ 7]], 24); > > + d3 = t0[s[12]] ^ rol32(t0[s[ 1]], 8) ^ rol32(t0[s[ 6]], 16) ^ > > rol32(t0[s[11]], 24); > > > > - d[0] = d0; > > - d[1] = d1; > > - d[2] = d2; > > - d[3] = d3; > > + d[0] = cpu_to_le32(d0 ^ le32_to_cpu(k[0])); > > + d[1] = cpu_to_le32(d1 ^ le32_to_cpu(k[1])); > > + d[2] = cpu_to_le32(d2 ^ le32_to_cpu(k[2])); > > + d[3] = cpu_to_le32(d3 ^ le32_to_cpu(k[3])); > > > I suppose this > > > + d[0] = cpu_to_le32(d0) ^ k[0]; > > + d[1] = cpu_to_le32(d1) ^ k[1]; > > + d[2] = cpu_to_le32(d2) ^ k[2]; > > + d[3] = cpu_to_le32(d3) ^ k[3]; > > should work fine as well Yeah, that looks nicer, but I'm not sure if it is completely OK to do bitwise/arithmetic operations directly on the __[lb]e* types... Maybe yes, but the code I've seen that used them usually seemed to treat them as opaque types. > > > } > > > > #endif /* _CRYPTO_AEGIS_H */ > > -- > > 2.19.0 > > -- Ondrej Mosnacek Associate Software Engineer, Security Technologies Red Hat, Inc.
Re: [PATCH 2/2] crypto: aegis/generic - fix for big endian systems
On 1 October 2018 at 09:50, Ondrej Mosnacek wrote: > Hi Ard, > > On Sun, Sep 30, 2018 at 10:59 AM Ard Biesheuvel > wrote: >> Use the correct __le32 annotation and accessors to perform the >> single round of AES encryption performed inside the AEGIS transform. >> Otherwise, tcrypt reports: >> >> alg: aead: Test 1 failed on encryption for aegis128-generic >> : 6c 25 25 4a 3c 10 1d 27 2b c1 d4 84 9a ef 7f 6e >> alg: aead: Test 1 failed on encryption for aegis128l-generic >> : cd c6 e3 b8 a0 70 9d 8e c2 4f 6f fe 71 42 df 28 >> alg: aead: Test 1 failed on encryption for aegis256-generic >> : aa ed 07 b1 96 1d e9 e6 f2 ed b5 8e 1c 5f dc 1c > > Hm... I think the reason I made a mistake here is that I first had a > version with the AES table hard-coded and I had an #ifdef <endian> #else #endif there with values for little-endian and > big-endian variants. Then I realized the aes_generic module exports > the crypto_ft_table and rewrote the code to use that. Somewhere along > the way I forgot to check if the aes_generic table uses the same trick > and correct the code... > > It would be nice to apply the same optimization to aes_generic.c, but > unfortunately the current tables are exported so changing the > convention would break external modules that use them :/ > Indeed. I am doing some refactoring work on the AES code, which is how I ran into this in the first place. https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=for-kernelci >> >> While at it, let's refer to the first precomputed table only, and >> derive the other ones by rotation. This reduces the D-cache footprint >> by 75%, and shouldn't be too costly or free on load/store architectures >> (and X86 has its own AES-NI based implementation) > > Could you maybe extract this into a separate patch? I don't think we > should mix functional and performance fixes together. > Yeah, good point. I will do that and fold in the simplification.
>> >> Fixes: f606a88e5823 ("crypto: aegis - Add generic AEGIS AEAD >> implementations") >> Cc: # v4.18+ >> Signed-off-by: Ard Biesheuvel >> --- >> crypto/aegis.h | 23 +--- >> 1 file changed, 10 insertions(+), 13 deletions(-) >> >> diff --git a/crypto/aegis.h b/crypto/aegis.h >> index f1c6900ddb80..84d3e07a3c33 100644 >> --- a/crypto/aegis.h >> +++ b/crypto/aegis.h >> @@ -21,7 +21,7 @@ >> >> union aegis_block { >> __le64 words64[AEGIS_BLOCK_SIZE / sizeof(__le64)]; >> - u32 words32[AEGIS_BLOCK_SIZE / sizeof(u32)]; >> + __le32 words32[AEGIS_BLOCK_SIZE / sizeof(__le32)]; >> u8 bytes[AEGIS_BLOCK_SIZE]; >> }; >> >> @@ -59,22 +59,19 @@ static void crypto_aegis_aesenc(union aegis_block *dst, >> { >> u32 *d = dst->words32; >> const u8 *s = src->bytes; >> - const u32 *k = key->words32; >> + const __le32 *k = key->words32; >> const u32 *t0 = crypto_ft_tab[0]; >> - const u32 *t1 = crypto_ft_tab[1]; >> - const u32 *t2 = crypto_ft_tab[2]; >> - const u32 *t3 = crypto_ft_tab[3]; >> u32 d0, d1, d2, d3; >> >> - d0 = t0[s[ 0]] ^ t1[s[ 5]] ^ t2[s[10]] ^ t3[s[15]] ^ k[0]; >> - d1 = t0[s[ 4]] ^ t1[s[ 9]] ^ t2[s[14]] ^ t3[s[ 3]] ^ k[1]; >> - d2 = t0[s[ 8]] ^ t1[s[13]] ^ t2[s[ 2]] ^ t3[s[ 7]] ^ k[2]; >> - d3 = t0[s[12]] ^ t1[s[ 1]] ^ t2[s[ 6]] ^ t3[s[11]] ^ k[3]; >> + d0 = t0[s[ 0]] ^ rol32(t0[s[ 5]], 8) ^ rol32(t0[s[10]], 16) ^ >> rol32(t0[s[15]], 24); >> + d1 = t0[s[ 4]] ^ rol32(t0[s[ 9]], 8) ^ rol32(t0[s[14]], 16) ^ >> rol32(t0[s[ 3]], 24); >> + d2 = t0[s[ 8]] ^ rol32(t0[s[13]], 8) ^ rol32(t0[s[ 2]], 16) ^ >> rol32(t0[s[ 7]], 24); >> + d3 = t0[s[12]] ^ rol32(t0[s[ 1]], 8) ^ rol32(t0[s[ 6]], 16) ^ >> rol32(t0[s[11]], 24); >> >> - d[0] = d0; >> - d[1] = d1; >> - d[2] = d2; >> - d[3] = d3; >> + d[0] = cpu_to_le32(d0 ^ le32_to_cpu(k[0])); >> + d[1] = cpu_to_le32(d1 ^ le32_to_cpu(k[1])); >> + d[2] = cpu_to_le32(d2 ^ le32_to_cpu(k[2])); >> + d[3] = cpu_to_le32(d3 ^ le32_to_cpu(k[3])); >> } >> >> #endif /* _CRYPTO_AEGIS_H */ >> -- >> 2.19.0 >> > > Thanks, > > -- > Ondrej Mosnacek > Associate Software Engineer, Security Technologies > Red Hat, Inc.
Re: [PATCH 2/2] crypto: aegis/generic - fix for big endian systems
Hi Ard, On Sun, Sep 30, 2018 at 10:59 AM Ard Biesheuvel wrote: > Use the correct __le32 annotation and accessors to perform the > single round of AES encryption performed inside the AEGIS transform. > Otherwise, tcrypt reports: > > alg: aead: Test 1 failed on encryption for aegis128-generic > : 6c 25 25 4a 3c 10 1d 27 2b c1 d4 84 9a ef 7f 6e > alg: aead: Test 1 failed on encryption for aegis128l-generic > : cd c6 e3 b8 a0 70 9d 8e c2 4f 6f fe 71 42 df 28 > alg: aead: Test 1 failed on encryption for aegis256-generic > : aa ed 07 b1 96 1d e9 e6 f2 ed b5 8e 1c 5f dc 1c Hm... I think the reason I made a mistake here is that I first had a version with the AES table hard-coded and I had an #ifdef <endian> #else #endif there with values for little-endian and big-endian variants. Then I realized the aes_generic module exports the crypto_ft_table and rewrote the code to use that. Somewhere along the way I forgot to check if the aes_generic table uses the same trick and correct the code... It would be nice to apply the same optimization to aes_generic.c, but unfortunately the current tables are exported so changing the convention would break external modules that use them :/ > > While at it, let's refer to the first precomputed table only, and > derive the other ones by rotation. This reduces the D-cache footprint > by 75%, and shouldn't be too costly or free on load/store architectures > (and X86 has its own AES-NI based implementation) Could you maybe extract this into a separate patch? I don't think we should mix functional and performance fixes together. > > Fixes: f606a88e5823 ("crypto: aegis - Add generic AEGIS AEAD implementations") > Cc: # v4.18+ > Signed-off-by: Ard Biesheuvel > --- > crypto/aegis.h | 23 +--- > 1 file changed, 10 insertions(+), 13 deletions(-) > > diff --git a/crypto/aegis.h b/crypto/aegis.h > index f1c6900ddb80..84d3e07a3c33 100644 > --- a/crypto/aegis.h > +++ b/crypto/aegis.h > @@ -21,7 +21,7 @@ > > union aegis_block { > __le64 words64[AEGIS_BLOCK_SIZE / sizeof(__le64)]; > - u32 words32[AEGIS_BLOCK_SIZE / sizeof(u32)]; > + __le32 words32[AEGIS_BLOCK_SIZE / sizeof(__le32)]; > u8 bytes[AEGIS_BLOCK_SIZE]; > }; > > @@ -59,22 +59,19 @@ static void crypto_aegis_aesenc(union aegis_block *dst, > { > u32 *d = dst->words32; > const u8 *s = src->bytes; > - const u32 *k = key->words32; > + const __le32 *k = key->words32; > const u32 *t0 = crypto_ft_tab[0]; > - const u32 *t1 = crypto_ft_tab[1]; > - const u32 *t2 = crypto_ft_tab[2]; > - const u32 *t3 = crypto_ft_tab[3]; > u32 d0, d1, d2, d3; > > - d0 = t0[s[ 0]] ^ t1[s[ 5]] ^ t2[s[10]] ^ t3[s[15]] ^ k[0]; > - d1 = t0[s[ 4]] ^ t1[s[ 9]] ^ t2[s[14]] ^ t3[s[ 3]] ^ k[1]; > - d2 = t0[s[ 8]] ^ t1[s[13]] ^ t2[s[ 2]] ^ t3[s[ 7]] ^ k[2]; > - d3 = t0[s[12]] ^ t1[s[ 1]] ^ t2[s[ 6]] ^ t3[s[11]] ^ k[3]; > + d0 = t0[s[ 0]] ^ rol32(t0[s[ 5]], 8) ^ rol32(t0[s[10]], 16) ^ > rol32(t0[s[15]], 24); > + d1 = t0[s[ 4]] ^ rol32(t0[s[ 9]], 8) ^ rol32(t0[s[14]], 16) ^ > rol32(t0[s[ 3]], 24); > + d2 = t0[s[ 8]] ^ rol32(t0[s[13]], 8) ^ rol32(t0[s[ 2]], 16) ^ > rol32(t0[s[ 7]], 24); > + d3 = t0[s[12]] ^ rol32(t0[s[ 1]], 8) ^ rol32(t0[s[ 6]], 16) ^ > rol32(t0[s[11]], 24); > > - d[0] = d0; > - d[1] = d1; > - d[2] = d2; > - d[3] = d3; > + d[0] = cpu_to_le32(d0 ^ le32_to_cpu(k[0])); > + d[1] = cpu_to_le32(d1 ^ le32_to_cpu(k[1])); > + d[2] = cpu_to_le32(d2 ^ le32_to_cpu(k[2])); > + d[3] = cpu_to_le32(d3 ^ le32_to_cpu(k[3])); > } > > #endif /* _CRYPTO_AEGIS_H */ > -- > 2.19.0 > Thanks, -- Ondrej Mosnacek Associate Software Engineer, Security Technologies Red Hat, Inc.
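The optimization being discussed leans on a classical property of the AES T-tables: each table is a byte rotation of the first, so t1/t2/t3 can be recomputed on the fly as rol32() of t0. A standalone check of that identity on the first table entry (0xa56363c6, visible in the aes_generic.c context quoted in the aes_ti thread above); the expected values are what the rotation trick assumes crypto_ft_tab[1..3][0] to hold:

    #include <assert.h>
    #include <stdint.h>

    static uint32_t rol32(uint32_t v, unsigned int s)
    {
            return (v << s) | (v >> (32 - s));
    }

    int main(void)
    {
            uint32_t t0 = 0xa56363c6;           /* crypto_ft_tab[0][0] */

            assert(rol32(t0,  8) == 0x6363c6a5);    /* crypto_ft_tab[1][0] */
            assert(rol32(t0, 16) == 0x63c6a563);    /* crypto_ft_tab[2][0] */
            assert(rol32(t0, 24) == 0xc6a56363);    /* crypto_ft_tab[3][0] */
            return 0;
    }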
Re: [PATCH 1/2] crypto: morus/generic - fix for big endian systems
On 1 October 2018 at 09:26, Ondrej Mosnacek wrote: > On Sun, Sep 30, 2018 at 10:59 AM Ard Biesheuvel > wrote: >> Omit the endian swabbing when folding the lengths of the assoc and >> crypt input buffers into the state to finalize the tag. This is not >> necessary given that the memory representation of the state is in >> machine native endianness already. >> >> This fixes an error reported by tcrypt running on a big endian system: >> >> alg: aead: Test 2 failed on encryption for morus640-generic >> : a8 30 ef fb e6 26 eb 23 b0 87 dd 98 57 f3 e1 4b >> 0010: 21 >> alg: aead: Test 2 failed on encryption for morus1280-generic >> : 88 19 1b fb 1c 29 49 0e ee 82 2f cb 97 a6 a5 ee >> 0010: 5f > > Yikes, I never really got around to test MORUS and AEGIS on a BE > machine... My mistake, sorry :/ > No worries - this is brand new code so this is not entirely unexpected. >> >> Fixes: 396be41f16fd ("crypto: morus - Add generic MORUS AEAD >> implementations") >> Cc: # v4.18+ >> Signed-off-by: Ard Biesheuvel > > Reviewed-by: Ondrej Mosnacek > Thanks! >> --- >> crypto/morus1280.c | 7 ++- >> crypto/morus640.c | 16 >> 2 files changed, 6 insertions(+), 17 deletions(-) >> >> diff --git a/crypto/morus1280.c b/crypto/morus1280.c >> index d057cf5ac4a8..3889c188f266 100644 >> --- a/crypto/morus1280.c >> +++ b/crypto/morus1280.c >> @@ -385,14 +385,11 @@ static void crypto_morus1280_final(struct >> morus1280_state *state, >>struct morus1280_block *tag_xor, >>u64 assoclen, u64 cryptlen) >> { >> - u64 assocbits = assoclen * 8; >> - u64 cryptbits = cryptlen * 8; >> - >> struct morus1280_block tmp; >> unsigned int i; >> >> - tmp.words[0] = cpu_to_le64(assocbits); >> - tmp.words[1] = cpu_to_le64(cryptbits); >> + tmp.words[0] = assoclen * 8; >> + tmp.words[1] = cryptlen * 8; >> tmp.words[2] = 0; >> tmp.words[3] = 0; >> >> diff --git a/crypto/morus640.c b/crypto/morus640.c >> index 1ca76e54281b..da06ec2f6a80 100644 >> --- a/crypto/morus640.c >> +++ b/crypto/morus640.c >> @@ -384,21 +384,13 @@ static void crypto_morus640_final(struct >> morus640_state *state, >> struct morus640_block *tag_xor, >> u64 assoclen, u64 cryptlen) >> { >> - u64 assocbits = assoclen * 8; >> - u64 cryptbits = cryptlen * 8; >> - >> - u32 assocbits_lo = (u32)assocbits; >> - u32 assocbits_hi = (u32)(assocbits >> 32); >> - u32 cryptbits_lo = (u32)cryptbits; >> - u32 cryptbits_hi = (u32)(cryptbits >> 32); >> - >> struct morus640_block tmp; >> unsigned int i; >> >> - tmp.words[0] = cpu_to_le32(assocbits_lo); >> - tmp.words[1] = cpu_to_le32(assocbits_hi); >> - tmp.words[2] = cpu_to_le32(cryptbits_lo); >> - tmp.words[3] = cpu_to_le32(cryptbits_hi); >> + tmp.words[0] = lower_32_bits(assoclen * 8); >> + tmp.words[1] = upper_32_bits(assoclen * 8); >> + tmp.words[2] = lower_32_bits(cryptlen * 8); >> + tmp.words[3] = upper_32_bits(cryptlen * 8); >> >> for (i = 0; i < MORUS_BLOCK_WORDS; i++) >> state->s[4].words[i] ^= state->s[0].words[i]; >> -- >> 2.19.0 >> > > Thanks, > > -- > Ondrej Mosnacek > Associate Software Engineer, Security Technologies > Red Hat, Inc.
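To round the thread off: lower_32_bits()/upper_32_bits() in the applied fix are plain 64-to-32 splits, and because the MORUS state words stay in native endianness, folding the bit lengths into the state needs no byte swapping at all. A standalone model of the helpers:

    #include <assert.h>
    #include <stdint.h>

    static uint32_t lower_32_bits(uint64_t v)
    {
            return (uint32_t)v;
    }

    static uint32_t upper_32_bits(uint64_t v)
    {
            return (uint32_t)(v >> 32);
    }

    int main(void)
    {
            uint64_t assocbits = 21 * 8;    /* e.g. 21 bytes of associated data */
            uint64_t v = 0x0123456789abcdefULL;

            assert(lower_32_bits(assocbits) == 168);
            assert(upper_32_bits(assocbits) == 0);

            /* the split round-trips */
            assert(((uint64_t)upper_32_bits(v) << 32 | lower_32_bits(v)) == v);
            return 0;
    }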