RE: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive
> From: Jerome Glisse
> Sent: Thursday, September 13, 2018 10:52 PM
> [...]
> AFAIK, on x86 and PPC at least, all PCIE devices are in the same group
> by default at boot or at least all devices behind the same bridge.

The grouping reflects a physical hierarchy limitation; it does not change
across boots. Please note that an IOMMU group defines the minimal isolation
boundary: all devices within the same group must be attached to the same
IOMMU domain, i.e. the same address space, because the IOMMU physically
cannot differentiate DMAs coming from those devices. Devices behind a
legacy PCI-X bridge are one example. Other examples include devices behind
a PCIe switch port which doesn't support ACS and thus cannot route p2p
transactions to the IOMMU.

If we are talking about a typical PCIe endpoint (with all upstream ports
supporting ACS), you'll get one device per group.

Today one IOMMU group is attached to only one IOMMU domain. In the future
one group may attach to multiple domains, per the aux-domain concept being
discussed in another thread.

> Maybe there are kernel options to avoid that, and a userspace init program
> can definitely re-arrange that based on sysadmin policy.

I don't think there is such an option, as it would break the isolation
model enforced by the IOMMU.

[...]

> > That is why i am being pedantic :) on making sure there is good reasons
> > to do what you do inside VFIO. I do believe that we want a common
> > framework like the one you are proposing but i do not believe it should
> > be part of VFIO given the baggages it comes with and that are not
> > relevant to the use cases for this kind of devices.

The purpose of VFIO is clear: it is the kernel portal for granting generic
device resources (MMIO, IRQ, etc.) to user space. VFIO doesn't care what
exactly a resource is used for (queue, command register, etc.). If pursuing
the VFIO path really is necessary, maybe such a common framework should
live in user space instead, obtaining all granted resources from the kernel
driver through VFIO and then providing accelerator services to other
processes?

Thanks
Kevin
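For readers who haven't used it, the "kernel portal" flow Kevin describes is
the standard VFIO type1 sequence. A minimal user-space sketch (the group
number, device address, and mapping size below are made up for illustration;
error handling omitted):

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/vfio.h>

    int main(void)
    {
            struct vfio_group_status status = { .argsz = sizeof(status) };
            struct vfio_iommu_type1_dma_map map = { .argsz = sizeof(map) };
            int container, group, device;

            /* One container == one IOMMU address space. */
            container = open("/dev/vfio/vfio", O_RDWR);

            /* Group 26 is hypothetical; all devices in it come along. */
            group = open("/dev/vfio/26", O_RDWR);
            ioctl(group, VFIO_GROUP_GET_STATUS, &status);
            ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
            ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

            /* Grant the device DMA access to 1 MiB of process memory. */
            map.vaddr = (__u64)(unsigned long)mmap(NULL, 1 << 20,
                                    PROT_READ | PROT_WRITE,
                                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            map.size  = 1 << 20;
            map.iova  = 0;
            map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
            ioctl(container, VFIO_IOMMU_MAP_DMA, &map);

            /* Finally, a device fd for MMIO regions and interrupts. */
            device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");
            return device < 0;
    }

This mirrors the example in Documentation/vfio.txt; the WarpDrive question is
what, beyond this flow, belongs in the kernel.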
Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
On 12 September 2018 at 20:34, Eric Biggers wrote: > On Wed, Sep 12, 2018 at 08:19:21PM +0200, Ard Biesheuvel wrote: >> On 12 September 2018 at 20:16, Jason A. Donenfeld wrote: >> > Hi Eric, >> > >> > On Wed, Sep 12, 2018 at 12:08 AM Eric Biggers wrote: >> >> I'd strongly prefer the assembly to be readable too. Jason, I'm not sure >> >> if >> >> you've actually read through the asm from the OpenSSL implementations, >> >> but the >> >> generated .S files actually do lose a lot of semantic information that >> >> was in >> >> the original .pl scripts. >> > >> > The thing to keep in mind is that the .S was not directly and blindly >> > generated from the .pl. We started with the output of the .pl, and >> > then, particularly in the case of x86_64, worked with it a lot, and >> > now it's something a bit different. We've definitely spent a lot of >> > time reading that assembly. >> > >> >> Can we please have those changes as a separate patch? Preferably to >> the .pl file rather than the .S file, so we can easily distinguish the >> code from upstream from the code that you modified. >> >> > I'll see if I can improve the readability with some register name >> > remapping on ARM. No guarantees, but I'll play a bit and see if I can >> > make it a bit better. >> > >> > Jason > > FWIW, yesterday I made a modified version of poly1305-armv4.pl that generates > an > asm file that works in kernel mode. The changes are actually pretty small, > and > I think we can get them upstream into OpenSSL like they were for > sha256-armv4.pl > and sha512-armv4.pl. I'll start a thread with Andy Polyakov and you two. > > But I don't have time to help with all the many OpenSSL asm files Jason is > proposing, just maybe poly1305-armv4 and chacha-armv4 for now. > Thanks Eric. I reached out to Andy Polyakov off line, and he is happy to work with us again on this, although he did point out that our experiences on ARM may not extrapolate to x86_64, given the fact that the perl sources there also contain parameterization for the calling convention differences between Windows and SysV.
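For context on the .pl -> .S pipeline being debated: OpenSSL's perlasm
scripts emit the final assembly, so carrying only the .pl in-tree means
regenerating the .S at build time. A sketch of what such a kbuild hook might
look like (the rule below is an assumption for illustration; the actual
integration, including any flavour arguments the scripts take, would have to
be settled in this thread):

    # Hypothetical arch/arm/crypto/Makefile fragment: generate the
    # assembly from the perlasm source instead of committing a .S file.
    quiet_cmd_perlasm = PERLASM $@
          cmd_perlasm = $(PERL) $(<) > $(@)

    $(obj)/poly1305-core.S: $(src)/poly1305-armv4.pl
            $(call cmd,perlasm)

    targets += poly1305-core.S

The benefit is exactly what Eric describes: kernel-specific changes stay
visible as diffs against the upstream .pl rather than being buried in
generated output.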
Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
On 13 September 2018 at 17:58, Jason A. Donenfeld wrote: > On Thu, Sep 13, 2018 at 5:43 PM Ard Biesheuvel > wrote: >> I'd prefer it if all the accelerated software implementations live in >> the same place. But I do strongly prefer arch code to live in >> arch/$arch > > Zinc follows the scheme of the raid6 code, as well as of most other > crypto libraries: code is grouped by cipher, making it easy for people > to work with and understand differing implementations. It also allows > us to trivially link these together at compile time rather than at > link time, which makes cipher selection much more efficient. It's > really much more maintainable this way. > >> I think AES-GCM is a useful example here. I really like the SIMD token >> abstraction a lot, but I would like to understand how this would work >> in Zinc if you have >> a) a generic implementation >> b) perhaps an arch specific scalar implementation >> c) a pure NEON implementation >> d) an implementation using AES instructions but not the PMULL instructions >> e) an implementation that uses AES and PMULL instructions. > > The same way that Zinc currently chooses between the five different > implementations for, say, x86_64 ChaCha20: > > - Generic C scalar > - SSSE3 > - AVX2 > - AVX512F > - AVX512VL > > We make a decision based on CPU capabilities, SIMD context, and input > length, and then choose the right function. > OK, so given random.c's future dependency on Zinc (for ChaCha20), and the fact that Zinc is one monolithic piece of code, all versions of all algorithms will always be statically linked into the kernel proper. I'm not sure that is acceptable. >> You know what? If you're up for it, let's not wait until Plumbers, but >> instead, let's collaborate off list to get this into shape. > > Sure, sounds good. > BTW you haven't answered my question yet about what happens when the WireGuard protocol version changes: will we need a flag day and switch all deployments over at the same time?
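To make the quoted selection logic concrete, the dispatch Jason describes
boils down to something like the following sketch (kernel context assumed;
the per-ISA function names and the length threshold are illustrative, not
Zinc's actual internals):

    /* Sketch of capability-based dispatch for x86_64 ChaCha20. The
     * chacha20_*() implementations are assumed to be defined elsewhere
     * and linked in at compile time. */
    static void chacha20(u8 *dst, const u8 *src, size_t len,
                         const u32 key[8], const u32 counter[4],
                         bool have_simd)
    {
            if (!IS_ENABLED(CONFIG_X86_64) || !have_simd ||
                len <= 128 /* illustrative threshold */)
                    chacha20_generic(dst, src, len, key, counter);
            else if (boot_cpu_has(X86_FEATURE_AVX512VL))
                    chacha20_avx512vl(dst, src, len, key, counter);
            else if (boot_cpu_has(X86_FEATURE_AVX512F))
                    chacha20_avx512f(dst, src, len, key, counter);
            else if (boot_cpu_has(X86_FEATURE_AVX2))
                    chacha20_avx2(dst, src, len, key, counter);
            else if (boot_cpu_has(X86_FEATURE_SSSE3))
                    chacha20_ssse3(dst, src, len, key, counter);
            else
                    chacha20_generic(dst, src, len, key, counter);
    }

Because the implementations are linked together at compile time, selection is
a handful of predictable branches rather than an indirect call through a
dynamically registered algorithm, which is also why Ard notes that every
variant ends up statically linked into the kernel proper.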
Re: [PATCH] gcmaes_crypt_by_sg: don't use GFP_ATOMIC allocation if the request doesn't cross a page
On Wed, Sep 05, 2018 at 09:18:43AM -0400, Mikulas Patocka wrote:
> This patch fixes gcmaes_crypt_by_sg so that it won't use memory
> allocation if the data doesn't cross a page boundary.
>
> Authenticated encryption may be used by dm-crypt. If the encryption or
> decryption fails, it would result in I/O error and filesystem corruption.
> The function gcmaes_crypt_by_sg is using GFP_ATOMIC allocation that can
> fail anytime. This patch fixes the logic so that it won't attempt the
> failing allocation if the data doesn't cross a page boundary.
>
> Signed-off-by: Mikulas Patocka
> Cc: sta...@vger.kernel.org

Patch applied. Thanks.
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 1/1] crypto: cavium/nitrox - Added support for SR-IOV configuration.
On Fri, Sep 07, 2018 at 12:31:18PM +0530, Srikanth Jampala wrote:
> Added support to configure SR-IOV using sysfs interface.
> Supported VF modes are 16, 32, 64 and 128. Grouped the
> hardware configuration functions to "nitrox_hal.h" file.
> Changed driver version to "1.1".
>
> Signed-off-by: Srikanth Jampala
> Reviewed-by: Gadam Sreerama

Patch applied. Thanks.
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [RFC PATCH cryptodev] crc-t10dif: crc_t10dif_mutex can be static
On Wed, Sep 05, 2018 at 01:52:44AM +0800, kbuild test robot wrote:
>
> Fixes: b76377543b73 ("crc-t10dif: Pick better transform if one becomes available")
> Signed-off-by: kbuild test robot

Patch applied. Thanks.
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: x86/aegis,morus - Do not require OSXSAVE for SSE2
On Wed, Sep 05, 2018 at 09:26:41AM +0200, Ondrej Mosnacek wrote:
> It turns out OSXSAVE needs to be checked only for AVX, not for SSE.
> Without this patch the affected modules refuse to load on CPUs with SSE2
> but without AVX support.
>
> Fixes: 877ccce7cbe8 ("crypto: x86/aegis,morus - Fix and simplify CPUID checks")
> Cc: # 4.18
> Reported-by: Zdenek Kaspar
> Signed-off-by: Ondrej Mosnacek

Patch applied. Thanks.
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH v8 5/9] dm: Remove VLA usage from hashes
On Thu, Sep 13, 2018 at 01:54:39PM -0400, Mike Snitzer wrote:
>
> Acked-by: Mike Snitzer

Patch applied. Thanks.
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: Adds user space interface for ALG_SET_KEY_TYPE
Kalyani Akula wrote:
> ALG_SET_KEY_TYPE requires the caller to pass the key_type to be used
> for AES encryption/decryption.
>
> Sometimes the cipher key will be stored in the device's hardware
> (eFuse, BBRAM etc.). So, there is a need to specify information about
> the key type to use for encrypt or decrypt operations.
>
> This patch implements the above requirement.
>
> Signed-off-by: Kalyani Akula

This patch does not make sense by itself. Please post it together with
the patches that will make use of this.

Thanks,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: padlock-aes: Add ecx to outputs for rep instructions
On Fri, Sep 07, 2018 at 02:57:42AM +0100, Ben Hutchings wrote:
> The current constraints for inline "rep xcrypt*" instructions mark ecx
> as an input only. The compiler can therefore wrongly assume that ecx
> holds the same value afterward, when in reality it will contain 0.
>
> This previously led to data corruption, which was worked around by
> commit 46d8c4b28652 ("crypto: padlock-aes - Fix Nano workaround data
> corruption"). But a future compiler or different optimisation
> configuration could reintroduce the problem. Update the constraints
> to avoid this.
>
> Fixes: a76c1c23d0c3 ("crypto: padlock-aes - work around Nano CPU ...")
> Fixes: 46d8c4b28652 ("crypto: padlock-aes - Fix Nano workaround data ...")
> Signed-off-by: Ben Hutchings
> ---
> This is totally untested, so don't assume I know what I'm talking
> about. :-)

Could you please test it by combining this patch with a revert of my
fix to confirm that it actually works?

Thanks!
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
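To visualize the fix being discussed, here is a sketch of the constraint
change (simplified from the driver; the opcode bytes are the real
"rep xcryptecb", but the function below is illustrative, not the literal
patch):

    /*
     * "rep xcryptecb" decrements ecx to zero as it runs, so ecx must be
     * declared input/output ("+c") rather than input-only ("c"), or the
     * compiler may assume ecx still holds the original count afterward.
     */
    static inline void rep_xcrypt_ecb_sketch(const u8 *input, u8 *output,
                                             void *key, void *control_word,
                                             unsigned long count)
    {
            asm volatile (".byte 0xf3,0x0f,0xa7,0xc8"  /* rep xcryptecb */
                          : "+S" (input), "+D" (output), "+c" (count)
                          : "d" (control_word), "b" (key)
                          : "memory");
    }

With ecx in the output list, caching the count across the asm statement is
no longer a legal optimization, which is exactly the class of miscompilation
Ben is guarding against.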
Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive
On Thu, Sep 13, 2018 at 10:51:50AM -0400, Jerome Glisse wrote: > Date: Thu, 13 Sep 2018 10:51:50 -0400 > From: Jerome Glisse > To: Kenneth Lee > CC: Kenneth Lee , Alex Williamson > , Herbert Xu , > k...@vger.kernel.org, Jonathan Corbet , Greg Kroah-Hartman > , Zaibo Xu , > linux-...@vger.kernel.org, Sanjay Kumar , Hao > Fang , linux-ker...@vger.kernel.org, > linux...@huawei.com, io...@lists.linux-foundation.org, "David S . Miller" > , linux-crypto@vger.kernel.org, Zhou Wang > , Philippe Ombredanne , > Thomas Gleixner , Joerg Roedel , > linux-accelerat...@lists.ozlabs.org, Lu Baolu > Subject: Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive > User-Agent: Mutt/1.10.1 (2018-07-13) > Message-ID: <20180913145149.gb3...@redhat.com> > > On Thu, Sep 13, 2018 at 04:32:32PM +0800, Kenneth Lee wrote: > > On Tue, Sep 11, 2018 at 09:40:14AM -0400, Jerome Glisse wrote: > > > On Tue, Sep 11, 2018 at 02:40:43PM +0800, Kenneth Lee wrote: > > > > On Mon, Sep 10, 2018 at 11:33:59PM -0400, Jerome Glisse wrote: > > > > > On Tue, Sep 11, 2018 at 10:42:09AM +0800, Kenneth Lee wrote: > > > > > > On Mon, Sep 10, 2018 at 10:54:23AM -0400, Jerome Glisse wrote: > > > > > > > On Mon, Sep 10, 2018 at 11:28:09AM +0800, Kenneth Lee wrote: > > > > > > > > On Fri, Sep 07, 2018 at 12:53:06PM -0400, Jerome Glisse wrote: > > > > > > > > > On Fri, Sep 07, 2018 at 12:01:38PM +0800, Kenneth Lee wrote: > > > > > > > > > > On Thu, Sep 06, 2018 at 09:31:33AM -0400, Jerome Glisse > > > > > > > > > > wrote: > > > > > > > > > > > On Thu, Sep 06, 2018 at 05:45:32PM +0800, Kenneth Lee > > > > > > > > > > > wrote: > > > > > > > > > > > > On Tue, Sep 04, 2018 at 10:15:09AM -0600, Alex > > > > > > > > > > > > Williamson wrote: > > > > > > > > > > > > > On Tue, 4 Sep 2018 11:00:19 -0400 Jerome Glisse > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > On Mon, Sep 03, 2018 at 08:51:57AM +0800, Kenneth > > > > > > > > > > > > > > Lee wrote: > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > > > I took a look at i915_gem_execbuffer_ioctl(). It seems it > > > > > > > > > > "copy_from_user" the > > > > > > > > > > user memory to the kernel. That is not what we need. What > > > > > > > > > > we try to get is: the > > > > > > > > > > user application do something on its data, and push it away > > > > > > > > > > to the accelerator, > > > > > > > > > > and says: "I'm tied, it is your turn to do the job...". > > > > > > > > > > Then the accelerator has > > > > > > > > > > the memory, referring any portion of it with the same VAs > > > > > > > > > > of the application, > > > > > > > > > > even the VAs are stored inside the memory itself. > > > > > > > > > > > > > > > > > > You were not looking at right place see > > > > > > > > > drivers/gpu/drm/i915/i915_gem_userptr.c > > > > > > > > > It does GUP and create GEM object AFAICR you can wrap that > > > > > > > > > GEM object into a > > > > > > > > > dma buffer object. > > > > > > > > > > > > > > > > > > > > > > > > > Thank you for directing me to this implementation. It is > > > > > > > > interesting:). > > > > > > > > > > > > > > > > But it is not yet solve my problem. If I understand it right, > > > > > > > > the userptr in > > > > > > > > i915 do the following: > > > > > > > > > > > > > > > > 1. The user process sets a user pointer with size to the kernel > > > > > > > > via ioctl. > > > > > > > > 2. The kernel wraps it as a dma-buf and keeps the process's mm > > > > > > > > for further > > > > > > > >reference. > > > > > > > > 3. 
The user pages are allocated, GUPed or DMA mapped to the > > > > > > > > device. So the data > > > > > > > >can be shared between the user space and the hardware. > > > > > > > > > > > > > > > > But my scenario is: > > > > > > > > > > > > > > > > 1. The user process has some data in the user space, pointed by > > > > > > > > a pointer, say > > > > > > > >ptr1. And within the memory, there may be some other > > > > > > > > pointers, let's say one > > > > > > > >of them is ptr2. > > > > > > > > 2. Now I need to assign ptr1 *directly* to the hardware MMIO > > > > > > > > space. And the > > > > > > > >hardware must refer ptr1 and ptr2 *directly* for data. > > > > > > > > > > > > > > > > Userptr lets the hardware and process share the same memory > > > > > > > > space. But I need > > > > > > > > them to share the same *address space*. So IOMMU is a MUST for > > > > > > > > WarpDrive, > > > > > > > > NOIOMMU mode, as Jean said, is just for verifying some of the > > > > > > > > procedure is OK. > > > > > > > > > > > > > > So to be 100% clear should we _ignore_ the non SVA/SVM case ? > > > > > > > If so then wait for necessary SVA/SVM to land and do warp drive > > > > > > > without non SVA/SVM path. > > > > > > > > > > > > > > > > > > > I think we should clear the concept of SVA/SVM here. As my > > > > > > understanding, Share > > > > > > Virtual Address/Memory means: any virtual address
[PATCH][RFC] crypto: skcipher: Remove VLA usage
RFC follow-up to https://lkml.kernel.org/r/CAGXu5j+bpLK=EQ9LHkO8V=sdaQwt==6fbghgn2vi1e9_wxs...@mail.gmail.com

The core API changes:

	struct crypto_sync_skcipher
	crypto_alloc_sync_skcipher()
	crypto_free_sync_skcipher()
	crypto_sync_skcipher_setkey()
	skcipher_request_set_sync_tfm()
	SKCIPHER_REQUEST_ON_STACK type check

and a single user's refactoring as an example:

	drivers/crypto/ccp/ccp-crypto.h
	drivers/crypto/ccp/ccp-crypto-aes-xts.c

Does this look correct? If so, I can continue and do the other 60
instances of SKCIPHER_REQUEST_ON_STACK().

Signed-off-by: Kees Cook
---
 crypto/skcipher.c                       | 24 +
 drivers/crypto/ccp/ccp-crypto-aes-xts.c | 10
 drivers/crypto/ccp/ccp-crypto.h         |  2 +-
 include/crypto/skcipher.h               | 34 -
 4 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 0bd8c6caa498..4caab81d2d02 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -949,6 +949,30 @@ struct crypto_skcipher *crypto_alloc_skcipher(const char *alg_name,
 }
 EXPORT_SYMBOL_GPL(crypto_alloc_skcipher);

+struct crypto_sync_skcipher *crypto_alloc_sync_skcipher(
+				const char *alg_name, u32 type, u32 mask)
+{
+	struct crypto_skcipher *tfm;
+
+	/* Only sync algorithms allowed. */
+	mask |= CRYPTO_ALG_ASYNC;
+
+	tfm = crypto_alloc_tfm(alg_name, &crypto_skcipher_type2, type, mask);
+
+	/*
+	 * Make sure we do not allocate something that might get used with
+	 * an on-stack request: check the request size.
+	 */
+	if (!IS_ERR(tfm) && WARN_ON(crypto_skcipher_reqsize(tfm) >
+				    MAX_SYNC_SKCIPHER_REQSIZE)) {
+		crypto_free_skcipher(tfm);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return (struct crypto_sync_skcipher *)tfm;
+}
+EXPORT_SYMBOL_GPL(crypto_alloc_sync_skcipher);
+
 int crypto_has_skcipher2(const char *alg_name, u32 type, u32 mask)
 {
 	return crypto_type_has_alg(alg_name, &crypto_skcipher_type2,
diff --git a/drivers/crypto/ccp/ccp-crypto-aes-xts.c b/drivers/crypto/ccp/ccp-crypto-aes-xts.c
index 94b5bcf5b628..983c921736b4 100644
--- a/drivers/crypto/ccp/ccp-crypto-aes-xts.c
+++ b/drivers/crypto/ccp/ccp-crypto-aes-xts.c
@@ -102,7 +102,7 @@ static int ccp_aes_xts_setkey(struct crypto_ablkcipher *tfm, const u8 *key,
 	ctx->u.aes.key_len = key_len / 2;
 	sg_init_one(&ctx->u.aes.key_sg, ctx->u.aes.key, key_len);

-	return crypto_skcipher_setkey(ctx->u.aes.tfm_skcipher, key, key_len);
+	return crypto_sync_skcipher_setkey(ctx->u.aes.tfm_skcipher, key, key_len);
 }

 static int ccp_aes_xts_crypt(struct ablkcipher_request *req,
@@ -156,7 +156,7 @@ static int ccp_aes_xts_crypt(struct ablkcipher_request *req,
 		/* Use the fallback to process the request for any
 		 * unsupported unit sizes or key sizes
 		 */
-		skcipher_request_set_tfm(subreq, ctx->u.aes.tfm_skcipher);
+		skcipher_request_set_sync_tfm(subreq, ctx->u.aes.tfm_skcipher);
 		skcipher_request_set_callback(subreq, req->base.flags,
 					      NULL, NULL);
 		skcipher_request_set_crypt(subreq, req->src, req->dst,
@@ -203,12 +203,12 @@ static int ccp_aes_xts_decrypt(struct ablkcipher_request *req)
 static int ccp_aes_xts_cra_init(struct crypto_tfm *tfm)
 {
 	struct ccp_ctx *ctx = crypto_tfm_ctx(tfm);
-	struct crypto_skcipher *fallback_tfm;
+	struct crypto_sync_skcipher *fallback_tfm;

 	ctx->complete = ccp_aes_xts_complete;
 	ctx->u.aes.key_len = 0;

-	fallback_tfm = crypto_alloc_skcipher("xts(aes)", 0,
+	fallback_tfm = crypto_alloc_sync_skcipher("xts(aes)", 0,
 					     CRYPTO_ALG_ASYNC |
 					     CRYPTO_ALG_NEED_FALLBACK);
 	if (IS_ERR(fallback_tfm)) {
@@ -226,7 +226,7 @@ static void ccp_aes_xts_cra_exit(struct crypto_tfm *tfm)
 {
 	struct ccp_ctx *ctx = crypto_tfm_ctx(tfm);

-	crypto_free_skcipher(ctx->u.aes.tfm_skcipher);
+	crypto_free_sync_skcipher(ctx->u.aes.tfm_skcipher);
 }

 static int ccp_register_aes_xts_alg(struct list_head *head,
diff --git a/drivers/crypto/ccp/ccp-crypto.h b/drivers/crypto/ccp/ccp-crypto.h
index b9fd090c46c2..28819e11db96 100644
--- a/drivers/crypto/ccp/ccp-crypto.h
+++ b/drivers/crypto/ccp/ccp-crypto.h
@@ -88,7 +88,7 @@ static inline struct ccp_crypto_ahash_alg *
 /* AES related defines */
 struct ccp_aes_ctx {
 	/* Fallback cipher for XTS with unsupported unit sizes */
-	struct crypto_skcipher *tfm_skcipher;
+	struct crypto_sync_skcipher *tfm_skcipher;

 	/* Cipher used to generate CMAC K1/K2 keys */
 	struct crypto_cipher *tfm_cipher;

diff --git a/include
Re: [PATCH v8 5/9] dm: Remove VLA usage from hashes
On Tue, Aug 07 2018 at 5:18pm -0400, Kees Cook wrote:

> In the quest to remove all stack VLA usage from the kernel[1], this uses
> the new HASH_MAX_DIGESTSIZE from the crypto layer to allocate the upper
> bounds on stack usage.
>
> [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com
>
> Signed-off-by: Kees Cook
> ---
>  drivers/md/dm-integrity.c  | 23 +--
>  drivers/md/dm-verity-fec.c |  5 -
>  2 files changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
> index 86438b2f10dd..884edd7cf1d0 100644
> --- a/drivers/md/dm-integrity.c
> +++ b/drivers/md/dm-integrity.c
> @@ -521,7 +521,12 @@ static void section_mac(struct dm_integrity_c *ic, unsigned section, __u8 result
>  		}
>  		memset(result + size, 0, JOURNAL_MAC_SIZE - size);
>  	} else {
> -		__u8 digest[size];
> +		__u8 digest[HASH_MAX_DIGESTSIZE];
> +
> +		if (WARN_ON(size > sizeof(digest))) {
> +			dm_integrity_io_error(ic, "digest_size", -EINVAL);
> +			goto err;
> +		}
>  		r = crypto_shash_final(desc, digest);
>  		if (unlikely(r)) {
>  			dm_integrity_io_error(ic, "crypto_shash_final", r);
> @@ -1244,7 +1249,7 @@ static void integrity_metadata(struct work_struct *w)
>  	struct bio *bio = dm_bio_from_per_bio_data(dio, sizeof(struct dm_integrity_io));
>  	char *checksums;
>  	unsigned extra_space = unlikely(digest_size > ic->tag_size) ? digest_size - ic->tag_size : 0;
> -	char checksums_onstack[ic->tag_size + extra_space];
> +	char checksums_onstack[HASH_MAX_DIGESTSIZE];
>  	unsigned sectors_to_process = dio->range.n_sectors;
>  	sector_t sector = dio->range.logical_sector;
>
> @@ -1253,8 +1258,14 @@ static void integrity_metadata(struct work_struct *w)
>
>  	checksums = kmalloc((PAGE_SIZE >> SECTOR_SHIFT >> ic->sb->log2_sectors_per_block) * ic->tag_size + extra_space,
> 			    GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN);
> -	if (!checksums)
> +	if (!checksums) {
>  		checksums = checksums_onstack;
> +		if (WARN_ON(extra_space &&
> +			    digest_size > sizeof(checksums_onstack))) {
> +			r = -EINVAL;
> +			goto error;
> +		}
> +	}
>
>  	__bio_for_each_segment(bv, bio, iter, dio->orig_bi_iter) {
>  		unsigned pos;

Given the length of the kmalloc() just prior to this new WARN_ON() line
I'm not seeing why you've elected to split the WARN_ON across multiple
lines. But that style nit aside:

Acked-by: Mike Snitzer
Re: [PATCH v2 2/4] crypto: skcipher - Enforce non-ASYNC for on-stack requests
On Thu, Sep 13, 2018 at 9:46 AM, Kees Cook wrote: > On Mon, Sep 10, 2018 at 10:52 PM, Herbert Xu > wrote: >> On Fri, Sep 07, 2018 at 08:56:23AM +0200, Ard Biesheuvel wrote: >>> >>> OK, so given that all SKCIPHER_REQUEST_ON_STACK occurrences are >>> updated in this series anyway, perhaps we should add >>> skcipher_[en|de]crypt_onstack() flavors that encapsulate the >>> additional check? Only question is how to enforce at compile time that >>> those are used instead of the ordinary ones when using a stack >>> allocated request. Would you mind using some macro foo here involving >>> __builtin_types_compatible_p() ? >> >> Something like a completely new type which in reality is just a >> wrapper around skcipher: >> >> struct crypto_sync_skcipher { >> struct crypto_skcipher base; >> } tfm; >> >> tfm = crypto_alloc_sync_skcipher(...); >> >> crypto_sync_skcipher_encrypt(...) >> crypto_sync_skcipher_decrypt(...) >> >> These functions would just be trivial inline functions around their >> crypto_skcipher counterparts. > > This means new wrappers for the other helpers too, yes? For example: > > SKCIPHER_REQUEST_ON_STACK(nreq, ctx->null); > > skcipher_request_set_tfm(nreq, ctx->null); > skcipher_request_set_callback(nreq, req->base.flags, NULL, NULL); > skcipher_request_set_crypt(nreq, req->src, req->dst, nbytes, NULL); > > return crypto_skcipher_encrypt(nreq); > > For the above, we'd also need: > > sync_skcipher_request_set_tfm() > sync_skcipher_request_set_callback() > sync_skcipher_request_set_crypt() Wait, I think I misunderstood you. Did you mean a new top-level thing (tfm?) not a new request type? That would mean at least replacing skcipher_request_set_tfm() with a wrapper (since the tfm argument is different), but _not_ encrypt/decrypt like you mention. I could perform a type test in SKCIPHER_REQUEST_ON_STACK(). Globally: - add struct crypto_sync_skcipher wrapper - add crypto_alloc_sync_skcipher() to check non-ASYNC and request size of actual tfm - add skcipher_request_set_sync_tfm() to attach the wrapped tfm to the request - SKCIPHER_REQUEST_ON_STACK() would verify the tfm was a struct crypto_sync_skcipher. Two changes per user: - change allocation to use new crypto_alloc_sync_skcipher() which does the runtime checking - change skcipher_request_set_tfm() to skcipher_request_set_sync_tfm() This means struct skcipher_request is unchanged, along with _set_callback, _set_crypt, _zero, and en/decrypt. API misuse would be caught at build-time (via SKCIPHER_REQUEST_ON_STACK type checking) and any request size problems would be caught at allocation time. Does this sound like what you had in mind? -Kees -- Kees Cook Pixel Security
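Summarizing the above with code, a converted user under this scheme would
look roughly like the following (a sketch against the API being proposed in
this thread; the algorithm name and function context are illustrative):

    /* At init time; the allocation performs the runtime checks
     * (non-ASYNC, reqsize small enough for on-stack requests):
     *
     *     tfm = crypto_alloc_sync_skcipher("cbc(aes)", 0, 0);
     */

    static int encrypt_one(struct crypto_sync_skcipher *tfm,
                           struct scatterlist *src, struct scatterlist *dst,
                           unsigned int nbytes, u8 *iv)
    {
            SKCIPHER_REQUEST_ON_STACK(req, tfm);    /* now type-checked */
            int err;

            /* Only the tfm-attaching helper changes... */
            skcipher_request_set_sync_tfm(req, tfm);
            /* ...while callback/crypt setup and encrypt stay as-is. */
            skcipher_request_set_callback(req, 0, NULL, NULL);
            skcipher_request_set_crypt(req, src, dst, nbytes, iv);
            err = crypto_skcipher_encrypt(req);
            skcipher_request_zero(req);
            return err;
    }

Note that _set_callback, _set_crypt, _zero and crypto_skcipher_encrypt()
are untouched, matching the "two changes per user" summary above.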
Re: [PATCH v2 2/4] crypto: skcipher - Enforce non-ASYNC for on-stack requests
On Mon, Sep 10, 2018 at 10:52 PM, Herbert Xu wrote: > On Fri, Sep 07, 2018 at 08:56:23AM +0200, Ard Biesheuvel wrote: >> >> OK, so given that all SKCIPHER_REQUEST_ON_STACK occurrences are >> updated in this series anyway, perhaps we should add >> skcipher_[en|de]crypt_onstack() flavors that encapsulate the >> additional check? Only question is how to enforce at compile time that >> those are used instead of the ordinary ones when using a stack >> allocated request. Would you mind using some macro foo here involving >> __builtin_types_compatible_p() ? > > Something like a completely new type which in reality is just a > wrapper around skcipher: > > struct crypto_sync_skcipher { > struct crypto_skcipher base; > } tfm; > > tfm = crypto_alloc_sync_skcipher(...); > > crypto_sync_skcipher_encrypt(...) > crypto_sync_skcipher_decrypt(...) > > These functions would just be trivial inline functions around their > crypto_skcipher counterparts. This means new wrappers for the other helpers too, yes? For example: SKCIPHER_REQUEST_ON_STACK(nreq, ctx->null); skcipher_request_set_tfm(nreq, ctx->null); skcipher_request_set_callback(nreq, req->base.flags, NULL, NULL); skcipher_request_set_crypt(nreq, req->src, req->dst, nbytes, NULL); return crypto_skcipher_encrypt(nreq); For the above, we'd also need: sync_skcipher_request_set_tfm() sync_skcipher_request_set_callback() sync_skcipher_request_set_crypt() -Kees -- Kees Cook Pixel Security
Re: [PATCH v8 5/9] dm: Remove VLA usage from hashes
On Mon, Sep 3, 2018 at 8:13 PM, Herbert Xu wrote: > On Tue, Aug 07, 2018 at 02:18:39PM -0700, Kees Cook wrote: >> In the quest to remove all stack VLA usage from the kernel[1], this uses >> the new HASH_MAX_DIGESTSIZE from the crypto layer to allocate the upper >> bounds on stack usage. >> >> [1] >> https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com >> >> Signed-off-by: Kees Cook > > Can the dm folks please review this patch? Mike or Alasdair, can you Ack this patch so Herbert can include it in the crypto tree? This is blocking some VLA removals[1]... Thanks! -Kees [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com -- Kees Cook Pixel Security
Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
On Thu, Sep 13, 2018 at 5:43 PM Ard Biesheuvel wrote: > I'd prefer it if all the accelerated software implementations live in > the same place. But I do strongly prefer arch code to live in > arch/$arch Zinc follows the scheme of the raid6 code, as well as of most other crypto libraries: code is grouped by cipher, making it easy for people to work with and understand differing implementations. It also allows us to trivially link these together at compile time rather than at link time, which makes cipher selection much more efficient. It's really much more maintainable this way. > I think AES-GCM is a useful example here. I really like the SIMD token > abstraction a lot, but I would like to understand how this would work > in Zinc if you have > a) a generic implementation > b) perhaps an arch specific scalar implementation > c) a pure NEON implementation > d) an implementation using AES instructions but not the PMULL instructions > e) an implementation that uses AES and PMULL instructions. The same way that Zinc currently chooses between the five different implementations for, say, x86_64 ChaCha20: - Generic C scalar - SSSE3 - AVX2 - AVX512F - AVX512VL We make a decision based on CPU capabilities, SIMD context, and input length, and then choose the right function. > You know what? If you're up for it, let's not wait until Plumbers, but > instead, let's collaborate off list to get this into shape. Sure, sounds good. Jason
Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
On Thu, Sep 13, 2018 at 5:04 PM Ard Biesheuvel wrote: > > The code still benefits from the review that's gone into OpenSSL. It's > > not modified in ways that would affect the cryptographic operations > > being done. It's modified to be suitable for kernel space. > > So could we please at least have those changes as a separate patch then? I'll experiment with a couple ways of trying to communicate with precision what's been changed from OpenSSL for the next round of patches. > > That's interesting. I'll bring this up with AndyP. FWIW, if you think > > you have a real and compelling claim here, I'd be much more likely to > > accept a different ChaCha20 implementation than I would be to accept a > > different Poly1305 implementation. (It's a *lot* harder to screw up > > ChaCha20 than it is to screw up Poly1305.) > > The question is really whether we want different implementations in > the crypto API and in zinc. Per earlier in this discussion, I've already authored patches that replaces the crypto API's implementations with simple calls to Zinc, so that code isn't duplicated. These will be in v4 and you can comment on the approach then. > You are completely missing my point. I am not particularly invested in > the crypto API, and I share the concerns about its usability. That is > why I want to make sure that your solution actually results in a net > improvement for everybody, not just for WireGuard, in a maintainable > way. Right, likewise. I've put quite a bit of effort into separating Zinc into Zinc and not into something part of WireGuard. The motivation for doing so is a decent amount of call sites all around the kernel that I'd like to gradually fix up.
Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
On 13 September 2018 at 16:32, Jason A. Donenfeld wrote: > On Thu, Sep 13, 2018 at 7:41 AM Ard Biesheuvel > wrote: >> But one of the supposed selling points of this crypto library is that >> it gives engineers who are frightened of crypto in general and the >> crypto API in particular simple and easy to use crypto primitives >> rather than having to jump through the crypto API's hoops. > > The goal is for engineers who want to specifically use algorithm X > from within the kernel in a non-dynamic way to be able to then use > algorithm X with a simple function call. The goal is not to open it up > to people who have no idea what they're doing; for that a NaCL-like > library with functions like "crypto_box_open" or something would fit > the bill; but that's also not what we're trying to do here. Please > don't confuse the design goals. The rest of your email is therefore a > bit of a straw man; cut the rhetoric out. > >> A crypto library whose only encryption algorithm is a stream cipher >> does *not* deliver on that promise, since it is only suitable for >> cases where IVs are guaranteed not to be reused. > > False. We also offer XChaCha20Poly1305, which takes a massive nonce, > suitable for random generation. > > If there became a useful case for AES-PMAC-SIV or even AES-GCM or > something to that extent, then Zinc would add that as required. But > we're not going to start adding random ciphers unless they're needed. > I'd prefer it if all the accelerated software implementations live in the same place. But I do strongly prefer arch code to live in arch/$arch, simply because of the maintenance scope for arch developers and maintainers. I think AES-GCM is a useful example here. I really like the SIMD token abstraction a lot, but I would like to understand how this would work in Zinc if you have a) a generic implementation b) perhaps an arch specific scalar implementation c) a pure NEON implementation d) an implementation using AES instructions but not the PMULL instructions e) an implementation that uses AES and PMULL instructions. On arch/arm64 we support code patching, on other arches it may be different. We also support udev loading of modules depending on CPU capabilities, which means only the implementation you are likely to use will be loaded, and other modules are disregarded. I am not asking you to implement this now. But I do want to understand how these use cases fit into Zinc. And note that this has nothing to do with the crypto API: this could simply be a synchronous aes_gcm_encrypt() library function that maps to any of the above at runtime depending on the platform's capabilities. >> You yourself were >> bitten by the clunkiness of the crypto API when attempting to use the >> SHA26 code, right? So shouldn't we move that into this crypto library >> as well? > > As stated in the initial commit, and in numerous other emails > stretching back a year, yes, sha256 and other things in lib/ are going > to be put into Zinc following the initial merge of Zinc. These changes > will happen incrementally, like everything else that happens in the > kernel. Sha256, in particular, is probably the first thing I'll port > post-merge. > Excellent. 
>> I think it is reasonable for WireGuard to standardize on >> ChaCha20/Poly1305 only, although I have my concerns about the flag day >> that will be required if this 'one true cipher' ever does turn out to >> be compromised (either that, or we will have to go back in time and >> add some kind of protocol versioning to existing deployments of >> WireGuard) > > Those concerns are not valid and have already been addressed (to you, > I believe) on this mailing list and elsewhere. WireGuard is versioned, > hence there's no need to "add" versioning, and it is prepared to roll > out new cryptography in a subsequent version should there be any > issues. In other words, your concern is based on a misunderstanding of > the protocol. If you have issues, however, with the design decisions > of WireGuard, something that's been heavily discussed with members of > the linux kernel community, networking community, cryptography > community, and so forth, for the last 3 years, I invite you to bring > them up on . > I'd prefer to have the discussion here, if you don't mind. IIUC clients and servers need to run the same version of the protocol. So does it require a flag day to switch the world over to the new version when the old one is deprecated? >> And frankly, if the code were as good as the prose, we wouldn't be >> having this discussion. > > Please cut out this rhetoric. That's an obviously unprovable > statement, but it probably isn't true anyway. I wish you'd stick to > technical concerns only, rather than what appears to be a desire to > derail this by any means necessary. > I apologize if I hit a nerve there, that was not my intention. I am not an 'entrenched crypto API guy that is out to get you'. I am a core ARM developer that takes an interest in crypto, shares
Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
> On Sep 12, 2018, at 11:39 PM, Milan Broz wrote: > >> On 13/09/18 01:45, Andy Lutomirski wrote: >> On Wed, Sep 12, 2018 at 3:56 PM, Ard Biesheuvel > ... >> b) Crypto that is used dynamically. This includes dm-crypt >> (aes-xts-plain64, aes-cbc-essiv, etc), all the ALG_IF interfaces, a >> lot of IPSEC stuff, possibly KCM, and probably many more. These will >> get comparatively little benefit from being converted to a zinc-like >> interface. For some of these cases, it wouldn't make any sense at all >> to convert them. Certainly the ones that do async hardware crypto >> using DMA engines will never look at all like zinc, even under the >> hood. > > Please note, that dm-crypt now uses not only block ciphers and modes, > but also authenticated encryption and hashes (for ESSIV and HMAC > in authenticated composed modes) and RNG (for random IV). > We use crypto API, including async variants (I hope correctly :) Right. And all this is why I don’t think dm-crypt should use zinc, at least not any time soon.
Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
On 13 September 2018 at 16:18, Jason A. Donenfeld wrote: > On Thu, Sep 13, 2018 at 1:45 AM Andy Lutomirski wrote: >> I'm not convinced that there's any real need for *all* crypto >> algorithms to move into lib/zinc or to move at all. As I see it, >> there are two classes of crypto algorithms in the kernel: >> >> a) Crypto that is used by code that chooses its algorithm statically >> and wants synchronous operations. These include everything in >> drivers/char/random.c, but also a bunch of various networking things >> that are hardcoded and basically everything that uses stack buffers. >> (This means it includes all the code that I broke when I did >> VMAP_STACK. Sign.) > > Right, exactly. This is what will wind up using Zinc. I'm working on > an example usage of this for v4 of the patch submission, which you can > ogle in a preview here if you're curious: > > https://git.zx2c4.com/linux-dev/commit/?h=big_key_rewrite > > 28 insertions, 206 deletions :-D > I must say, that actually looks pretty good. >> b) Crypto that is used dynamically. This includes dm-crypt >> (aes-xts-plain64, aes-cbc-essiv, etc), all the ALG_IF interfaces, a >> lot of IPSEC stuff, possibly KCM, and probably many more. These will >> get comparatively little benefit from being converted to a zinc-like >> interface. For some of these cases, it wouldn't make any sense at all >> to convert them. Certainly the ones that do async hardware crypto >> using DMA engines will never look at all like zinc, even under the >> hood. > > Right, this is what the crypto API will continue to be used for. > > >> I think that, as a short-term goal, it makes a lot of sense to have >> implementations of the crypto that *new* kernel code (like Wireguard) >> wants to use in style (a) that live in /lib, and it obviously makes >> sense to consolidate their implementations with the crypto/ >> implementations in a timely manner. As a medium-term goal, adding >> more algorithms as needed for things that could use the simpler APIs >> (Bluetooth, perhaps) would make sense. > > Agreed 100%. With regards to "consolidate their implementations" -- > I've actually already done this after your urging yesterday, and so > that will be a part of v4. > >> But I see no reason at all that /lib should ever contain a grab-bag of >> crypto implementations just for the heck of it. They should have real >> in-kernel users IMO. And this means that there will probably always >> be some crypto implementations in crypto/ for things like aes-xts. > > Right, precisely. > > Jason
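For a flavor of what that big_key diff looks like on the calling side, a
one-shot seal with the Zinc interface reduces to roughly this (a sketch
based on the signatures proposed in this series; v4 may differ, and
big_key's real nonce and key management is more involved):

    /* dst must have room for the ciphertext plus the 16-byte tag. */
    static void toy_seal(u8 *dst, const u8 *src, size_t len,
                         const u8 key[CHACHA20POLY1305_KEY_SIZE])
    {
            /* A fixed nonce is only safe because this sketch assumes a
             * freshly generated, single-use key. */
            chacha20poly1305_encrypt(dst, src, len, NULL, 0,
                                     0 /* nonce */, key);
    }

Compare that with the dozens of lines of tfm allocation, scatterlist setup,
and request management the crypto API version needs for the same
fixed-algorithm job, which is where the 206 deleted lines come from.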
Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
On 13 September 2018 at 16:15, Jason A. Donenfeld wrote: > Hi Ard, > > On Thu, Sep 13, 2018 at 12:56 AM Ard Biesheuvel > wrote: >> In this series, you are dumping a huge volume of unannotated, >> generated asm into the kernel which has been modified [by you] to >> [among other things?] adhere to the kernel API (without documenting >> what the changes are exactly). How does that live up to the promise of >> better, peer reviewed code? > > The code still benefits from the review that's gone into OpenSSL. It's > not modified in ways that would affect the cryptographic operations > being done. It's modified to be suitable for kernel space. > So could we please at least have those changes as a separate patch then? >> Then there is the performance claim. We know for instance that the >> OpenSSL ARM NEON code for ChaCha20 is faster on cores that happen to >> possess a micro-architectural property that ALU instructions are >> essentially free when they are interleaved with SIMD instructions. But >> we also know that a) Cortex-A7, which is a relevant target, is not one >> of those cores, and b) that chip designers are not likely to optimize >> for that particular usage pattern so relying on it in generic code is >> unwise in general. > > That's interesting. I'll bring this up with AndyP. FWIW, if you think > you have a real and compelling claim here, I'd be much more likely to > accept a different ChaCha20 implementation than I would be to accept a > different Poly1305 implementation. (It's a *lot* harder to screw up > ChaCha20 than it is to screw up Poly1305.) > The question is really whether we want different implementations in the crypto API and in zinc. >> I am also concerned about your claim that all software algorithms will >> be moved into this crypto library. > > I'll defer to Andy's response here, which I think is a correct one: > https://lkml.org/lkml/2018/9/13/27 > > The short answer is that Zinc is going to be adding the ciphers that > people want to use for normal reasons from normal code. For example, > after this merges, we'll next be working on moving the remaining > non-optimized C code out of lib/ that's called by places (such as > SHA2). > Excellent. >> You are not specific about whose >> responsibility it will be that this is going to happen in a timely >> fashion. > > I thought I laid out the roadmap for this in the commit message. In > case I wasn't clear: my plan is to tackle lib/ after merging, and I > plan to do so in a timely manner. It's a pretty common tactic to keep > layering on tasks, "what about X?", "what about Y?", "I won't agree > unless Z!" -- when in reality kernel development and refactorings are > done incrementally. I've been around on this list contributing code > for long enough that you should have a decent amount of confidence > that I'm not just going to disappear working on this or something > insane like that. And neither are the two academic cryptographers CC'd > on this thread. So, as Andy said, we're going to be porting to Zinc > the primitives that are useful for the various applications of Zinc. > This means yes, we'll have SHA2 in there. > >> chaining modes >> What are the APIs >> going to look like for block ciphers, taking chaining modes into >> account? > > As mentioned in the commit message and numerous times, we're not > trying to make a win32-like crypto API here or to remake the existing > Linux crypto API. Rather we're providing libraries of specific > functions that are useful for various circumstances. 
For example, if > AES-GCM is desired at some point, then we'll have a similar API for > that as we do for ChaPoly -- one that takes buffers and one that takes > sg. Likewise, hash functions use the familiar init/update/final. > "Generic" chaining modes aren't really part of the equation or design > goals. > > Again, I realize you've spent a long time working on the existing > crypto API, and so your questions and concerns are in the line of, > "how are we going to make Zinc look like the existing crypto API in > functionality?" You are completely missing my point. I am not particularly invested in the crypto API, and I share the concerns about its usability. That is why I want to make sure that your solution actually results in a net improvement for everybody, not just for WireGuard, in a maintainable way. > But that's not what we're up to here. We have a > different and complementary design goal. I understand why you're > squirming, but please recognize we're working on different things. > >> I'm sure it is rather simple to port the crypto API implementation of >> ChaCha20 to use your library. I am more concerned about how your >> library is going to expand to cover all other software algorithms that >> we currently use in the kernel. > > The subset of algorithms we add will be developed with the same > methodology as the present ones. There is nothing making this > particularly difficult or even more difficult for other primitives > than it wa
Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive
On Thu, Sep 13, 2018 at 04:32:32PM +0800, Kenneth Lee wrote: > On Tue, Sep 11, 2018 at 09:40:14AM -0400, Jerome Glisse wrote: > > On Tue, Sep 11, 2018 at 02:40:43PM +0800, Kenneth Lee wrote: > > > On Mon, Sep 10, 2018 at 11:33:59PM -0400, Jerome Glisse wrote: > > > > On Tue, Sep 11, 2018 at 10:42:09AM +0800, Kenneth Lee wrote: > > > > > On Mon, Sep 10, 2018 at 10:54:23AM -0400, Jerome Glisse wrote: > > > > > > On Mon, Sep 10, 2018 at 11:28:09AM +0800, Kenneth Lee wrote: > > > > > > > On Fri, Sep 07, 2018 at 12:53:06PM -0400, Jerome Glisse wrote: > > > > > > > > On Fri, Sep 07, 2018 at 12:01:38PM +0800, Kenneth Lee wrote: > > > > > > > > > On Thu, Sep 06, 2018 at 09:31:33AM -0400, Jerome Glisse wrote: > > > > > > > > > > On Thu, Sep 06, 2018 at 05:45:32PM +0800, Kenneth Lee wrote: > > > > > > > > > > > On Tue, Sep 04, 2018 at 10:15:09AM -0600, Alex Williamson > > > > > > > > > > > wrote: > > > > > > > > > > > > On Tue, 4 Sep 2018 11:00:19 -0400 Jerome Glisse > > > > > > > > > > > > wrote: > > > > > > > > > > > > > On Mon, Sep 03, 2018 at 08:51:57AM +0800, Kenneth Lee > > > > > > > > > > > > > wrote: > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > I took a look at i915_gem_execbuffer_ioctl(). It seems it > > > > > > > > > "copy_from_user" the > > > > > > > > > user memory to the kernel. That is not what we need. What we > > > > > > > > > try to get is: the > > > > > > > > > user application do something on its data, and push it away > > > > > > > > > to the accelerator, > > > > > > > > > and says: "I'm tied, it is your turn to do the job...". Then > > > > > > > > > the accelerator has > > > > > > > > > the memory, referring any portion of it with the same VAs of > > > > > > > > > the application, > > > > > > > > > even the VAs are stored inside the memory itself. > > > > > > > > > > > > > > > > You were not looking at right place see > > > > > > > > drivers/gpu/drm/i915/i915_gem_userptr.c > > > > > > > > It does GUP and create GEM object AFAICR you can wrap that GEM > > > > > > > > object into a > > > > > > > > dma buffer object. > > > > > > > > > > > > > > > > > > > > > > Thank you for directing me to this implementation. It is > > > > > > > interesting:). > > > > > > > > > > > > > > But it is not yet solve my problem. If I understand it right, the > > > > > > > userptr in > > > > > > > i915 do the following: > > > > > > > > > > > > > > 1. The user process sets a user pointer with size to the kernel > > > > > > > via ioctl. > > > > > > > 2. The kernel wraps it as a dma-buf and keeps the process's mm > > > > > > > for further > > > > > > >reference. > > > > > > > 3. The user pages are allocated, GUPed or DMA mapped to the > > > > > > > device. So the data > > > > > > >can be shared between the user space and the hardware. > > > > > > > > > > > > > > But my scenario is: > > > > > > > > > > > > > > 1. The user process has some data in the user space, pointed by a > > > > > > > pointer, say > > > > > > >ptr1. And within the memory, there may be some other pointers, > > > > > > > let's say one > > > > > > >of them is ptr2. > > > > > > > 2. Now I need to assign ptr1 *directly* to the hardware MMIO > > > > > > > space. And the > > > > > > >hardware must refer ptr1 and ptr2 *directly* for data. > > > > > > > > > > > > > > Userptr lets the hardware and process share the same memory > > > > > > > space. But I need > > > > > > > them to share the same *address space*. 
So IOMMU is a MUST for > > > > > > > WarpDrive, > > > > > > > NOIOMMU mode, as Jean said, is just for verifying some of the > > > > > > > procedure is OK. > > > > > > > > > > > > So to be 100% clear should we _ignore_ the non SVA/SVM case ? > > > > > > If so then wait for necessary SVA/SVM to land and do warp drive > > > > > > without non SVA/SVM path. > > > > > > > > > > > > > > > > I think we should clear the concept of SVA/SVM here. As my > > > > > understanding, Share > > > > > Virtual Address/Memory means: any virtual address in a process can be > > > > > used by > > > > > device at the same time. This requires IOMMU device to support PASID. > > > > > And > > > > > optionally, it requires the feature of page-fault-from-device. > > > > > > > > Yes we agree on what SVA/SVM is. There is a one gotcha thought, access > > > > to range that are MMIO map ie CPU page table pointing to IO memory, IIRC > > > > it is undefined what happens on some platform for a device trying to > > > > access those using SVA/SVM. > > > > > > > > > > > > > But before the feature is settled down, IOMMU can be used immediately > > > > > in the > > > > > current kernel. That make it possible to assign ONE process's virtual > > > > > addresses > > > > > to the device's IOMMU page table with GUP. This make WarpDrive work > > > > > well for one > > > > > process. > > > > > > > > UH ? How ? You want to GUP _every_ single valid address in the process > > > > and map it to the device ? How do you handle new vm
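For readers keeping score, the pre-SVA approach under debate amounts to
pinning a VA range and mirroring it into the device's IOMMU domain so that
iova == va. A heavily simplified in-kernel sketch (caller assumed to hold
mmap_sem; cleanup, partial-pin and unmap paths omitted; it deliberately
ignores the objections raised above about new mmaps and unpopulated ranges):

    static int mirror_range(struct iommu_domain *domain,
                            unsigned long va, unsigned long npages)
    {
            struct page **pages;
            long i, got;

            pages = kvmalloc_array(npages, sizeof(*pages), GFP_KERNEL);
            if (!pages)
                    return -ENOMEM;

            /* Pin the user pages backing [va, va + npages * PAGE_SIZE). */
            got = get_user_pages(va, npages, FOLL_WRITE, pages, NULL);
            if (got < 0) {
                    kvfree(pages);
                    return got;
            }

            /* Identity-map VA to IOVA so in-buffer pointers stay valid. */
            for (i = 0; i < got; i++)
                    iommu_map(domain, va + i * PAGE_SIZE,
                              page_to_phys(pages[i]), PAGE_SIZE,
                              IOMMU_READ | IOMMU_WRITE);

            kvfree(pages);  /* page refs are still held */
            return 0;
    }

Jerome's objections are precisely about what this sketch leaves out: every
new mmap needs the same treatment, anonymous memory must be populated before
pinning, and nothing here handles munmap or process teardown.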
Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
Hi Milan, On Thu, Sep 13, 2018 at 8:40 AM Milan Broz wrote: > Please note, that dm-crypt now uses not only block ciphers and modes, > but also authenticated encryption and hashes (for ESSIV and HMAC > in authenticated composed modes) and RNG (for random IV). > We use crypto API, including async variants (I hope correctly :) > > There is a long time battle to move initialization vectors generators > from dm-crypt to crypto API. If there are any plans to use a new library, > this issue should be discussed as well. > (Some dm-crypt IV generators are disk encryption specific, some do more > that just IV so porting is not straightforward etc). > > Related problem here is an optimization of chain of sectors encryption - > if we have new crypto API, it would be nice if can take chain of sectors > so possible implementation can process this chain in one batch > (every sector need to be tweaked by differently generated IV - and we > are back in problem above). > I think filesystem encryption uses the same pattern. > > And btw, we use the same algorithms through AF_ALG in userspace (cryptsetup). > > So please, if you mention dm-crypt, note that it is very complex > crypto API consumer :) And everything is dynamic, configurable through > dm-crypt options. > > That said, I would be more than happy to help in experiments to porting > dm-crypt > to any other crypto library, but if it doesn't not help with problems > mentioned above, I do not see any compelling reason for the new library for > dm-crypt... dm-crypt is probably a good consumer of the existing crypto API and won't be impacted by the introduction of Zinc, which is really just the exposure of a couple low level simple crypto functions, and not a fancy API like the crypto API which dm-crypt happily uses. Jason
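As background on the IV generators Milan mentions: ESSIV, for instance, is
more than a counter; it encrypts the sector number under a hash of the data
key. The construction itself is small (a sketch of the scheme only, not
dm-crypt's actual generator code, which manages its tfms and key lifetimes
quite differently):

    /*
     * ESSIV: iv = E_{H(K)}(sector), zero-padded to the cipher block size.
     * essiv_tfm is assumed to be already keyed with H(K).
     */
    static void essiv_iv_sketch(struct crypto_cipher *essiv_tfm,
                                u8 *iv, unsigned int ivsize, u64 sector)
    {
            memset(iv, 0, ivsize);
            *(__le64 *)iv = cpu_to_le64(sector);
            crypto_cipher_encrypt_one(essiv_tfm, iv, iv);
    }

Moving generators like this out of dm-crypt means whatever library or API
hosts them must also own the derived-key handling, which is Milan's point
about the porting not being straightforward.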
Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
On Thu, Sep 13, 2018 at 7:41 AM Ard Biesheuvel wrote: > But one of the supposed selling points of this crypto library is that > it gives engineers who are frightened of crypto in general and the > crypto API in particular simple and easy to use crypto primitives > rather than having to jump through the crypto API's hoops. The goal is for engineers who want to specifically use algorithm X from within the kernel in a non-dynamic way to be able to then use algorithm X with a simple function call. The goal is not to open it up to people who have no idea what they're doing; for that a NaCL-like library with functions like "crypto_box_open" or something would fit the bill; but that's also not what we're trying to do here. Please don't confuse the design goals. The rest of your email is therefore a bit of a straw man; cut the rhetoric out. > A crypto library whose only encryption algorithm is a stream cipher > does *not* deliver on that promise, since it is only suitable for > cases where IVs are guaranteed not to be reused. False. We also offer XChaCha20Poly1305, which takes a massive nonce, suitable for random generation. If there became a useful case for AES-PMAC-SIV or even AES-GCM or something to that extent, then Zinc would add that as required. But we're not going to start adding random ciphers unless they're needed. > You yourself were > bitten by the clunkiness of the crypto API when attempting to use the > SHA26 code, right? So shouldn't we move that into this crypto library > as well? As stated in the initial commit, and in numerous other emails stretching back a year, yes, sha256 and other things in lib/ are going to be put into Zinc following the initial merge of Zinc. These changes will happen incrementally, like everything else that happens in the kernel. Sha256, in particular, is probably the first thing I'll port post-merge. > I think it is reasonable for WireGuard to standardize on > ChaCha20/Poly1305 only, although I have my concerns about the flag day > that will be required if this 'one true cipher' ever does turn out to > be compromised (either that, or we will have to go back in time and > add some kind of protocol versioning to existing deployments of > WireGuard) Those concerns are not valid and have already been addressed (to you, I believe) on this mailing list and elsewhere. WireGuard is versioned, hence there's no need to "add" versioning, and it is prepared to roll out new cryptography in a subsequent version should there be any issues. In other words, your concern is based on a misunderstanding of the protocol. If you have issues, however, with the design decisions of WireGuard, something that's been heavily discussed with members of the linux kernel community, networking community, cryptography community, and so forth, for the last 3 years, I invite you to bring them up on . > And frankly, if the code were as good as the prose, we wouldn't be > having this discussion. Please cut out this rhetoric. That's an obviously unprovable statement, but it probably isn't true anyway. I wish you'd stick to technical concerns only, rather than what appears to be a desire to derail this by any means necessary. > Zinc adds its own clunky ways to mix arch and > generic code, involving GCC -include command line arguments and > #ifdefs everywhere. My review comments on this were completely ignored > by Jason. No, they were not ignored. v2 cleaned up the #ifdefs. v4 has already cleaned up the makefile stuff and will be even cleaner. Good things await, don't worry. Jason
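Concretely, the random-nonce pattern Jason refers to looks like this (a
sketch using the names from this series, with constants as assumed from the
Zinc headers):

    /* A 192-bit nonce makes random generation safe: the collision
     * probability is negligible, unlike with the short nonce of plain
     * ChaCha20Poly1305. dst must have room for len plus the tag. */
    static void seal_random_nonce(u8 *dst, const u8 *src, size_t len,
                                  const u8 key[CHACHA20POLY1305_KEY_SIZE],
                                  u8 nonce[XCHACHA20POLY1305_NONCE_SIZE])
    {
            get_random_bytes(nonce, XCHACHA20POLY1305_NONCE_SIZE);
            xchacha20poly1305_encrypt(dst, src, len, NULL, 0, nonce, key);
    }

The nonce travels alongside the ciphertext, so the caller never has to
maintain counter state across messages.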
Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
On Thu, Sep 13, 2018 at 1:45 AM Andy Lutomirski wrote: > I'm not convinced that there's any real need for *all* crypto > algorithms to move into lib/zinc or to move at all. As I see it, > there are two classes of crypto algorithms in the kernel: > > a) Crypto that is used by code that chooses its algorithm statically > and wants synchronous operations. These include everything in > drivers/char/random.c, but also a bunch of various networking things > that are hardcoded and basically everything that uses stack buffers. > (This means it includes all the code that I broke when I did > VMAP_STACK. Sign.) Right, exactly. This is what will wind up using Zinc. I'm working on an example usage of this for v4 of the patch submission, which you can ogle in a preview here if you're curious: https://git.zx2c4.com/linux-dev/commit/?h=big_key_rewrite 28 insertions, 206 deletions :-D > b) Crypto that is used dynamically. This includes dm-crypt > (aes-xts-plain64, aes-cbc-essiv, etc), all the ALG_IF interfaces, a > lot of IPSEC stuff, possibly KCM, and probably many more. These will > get comparatively little benefit from being converted to a zinc-like > interface. For some of these cases, it wouldn't make any sense at all > to convert them. Certainly the ones that do async hardware crypto > using DMA engines will never look at all like zinc, even under the > hood. Right, this is what the crypto API will continue to be used for. > I think that, as a short-term goal, it makes a lot of sense to have > implementations of the crypto that *new* kernel code (like Wireguard) > wants to use in style (a) that live in /lib, and it obviously makes > sense to consolidate their implementations with the crypto/ > implementations in a timely manner. As a medium-term goal, adding > more algorithms as needed for things that could use the simpler APIs > (Bluetooth, perhaps) would make sense. Agreed 100%. With regards to "consolidate their implementations" -- I've actually already done this after your urging yesterday, and so that will be a part of v4. > But I see no reason at all that /lib should ever contain a grab-bag of > crypto implementations just for the heck of it. They should have real > in-kernel users IMO. And this means that there will probably always > be some crypto implementations in crypto/ for things like aes-xts. Right, precisely. Jason
Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
Hi Ard,

On Thu, Sep 13, 2018 at 12:56 AM Ard Biesheuvel wrote:
> In this series, you are dumping a huge volume of unannotated,
> generated asm into the kernel which has been modified [by you] to
> [among other things?] adhere to the kernel API (without documenting
> what the changes are exactly). How does that live up to the promise of
> better, peer reviewed code?

The code still benefits from the review that's gone into OpenSSL. It's
not modified in ways that would affect the cryptographic operations
being done. It's modified to be suitable for kernel space.

> Then there is the performance claim. We know for instance that the
> OpenSSL ARM NEON code for ChaCha20 is faster on cores that happen to
> possess a micro-architectural property that ALU instructions are
> essentially free when they are interleaved with SIMD instructions. But
> we also know that a) Cortex-A7, which is a relevant target, is not one
> of those cores, and b) that chip designers are not likely to optimize
> for that particular usage pattern so relying on it in generic code is
> unwise in general.

That's interesting. I'll bring this up with AndyP. FWIW, if you think
you have a real and compelling claim here, I'd be much more likely to
accept a different ChaCha20 implementation than I would be to accept a
different Poly1305 implementation. (It's a *lot* harder to screw up
ChaCha20 than it is to screw up Poly1305.)

> I am also concerned about your claim that all software algorithms will
> be moved into this crypto library.

I'll defer to Andy's response here, which I think is a correct one:
https://lkml.org/lkml/2018/9/13/27

The short answer is that Zinc is going to be adding the ciphers that
people want to use for normal reasons from normal code. For example,
after this merges, we'll next be working on moving the remaining
non-optimized C code out of lib/ that's called by places (such as
SHA2).

> You are not specific about whose
> responsibility it will be that this is going to happen in a timely
> fashion.

I thought I laid out the roadmap for this in the commit message. In
case I wasn't clear: my plan is to tackle lib/ after merging, and I
plan to do so in a timely manner. It's a pretty common tactic to keep
layering on tasks, "what about X?", "what about Y?", "I won't agree
unless Z!" -- when in reality kernel development and refactorings are
done incrementally. I've been around on this list contributing code for
long enough that you should have a decent amount of confidence that I'm
not just going to disappear working on this or something insane like
that. And neither are the two academic cryptographers CC'd on this
thread. So, as Andy said, we're going to be porting to Zinc the
primitives that are useful for the various applications of Zinc. This
means yes, we'll have SHA2 in there.

> What are the APIs
> going to look like for block ciphers, taking chaining modes into
> account?

As mentioned in the commit message and numerous times, we're not trying
to make a win32-like crypto API here or to remake the existing Linux
crypto API. Rather, we're providing libraries of specific functions
that are useful for various circumstances. For example, if AES-GCM is
desired at some point, then we'll have a similar API for that as we do
for ChaPoly -- one that takes buffers and one that takes sg. Likewise,
hash functions use the familiar init/update/final. "Generic" chaining
modes aren't really part of the equation or design goals.
Again, I realize you've spent a long time working on the existing
crypto API, and so your questions and concerns are in the line of, "how
are we going to make Zinc look like the existing crypto API in
functionality?" But that's not what we're up to here. We have a
different and complementary design goal. I understand why you're
squirming, but please recognize we're working on different things.

> I'm sure it is rather simple to port the crypto API implementation of
> ChaCha20 to use your library. I am more concerned about how your
> library is going to expand to cover all other software algorithms that
> we currently use in the kernel.

The subset of algorithms we add will be developed with the same
methodology as the present ones. There is nothing making this
particularly difficult or even more difficult for other primitives than
it was for ChaCha20. It's especially easy, in fact, since we're
following similar design methodologies as the vast majority of other
cryptography libraries that have been developed. Namely, we're creating
simple things called "functions".

> Of course. But please respond to all the concerns. You have not
> responded to that concern yet.

Sorry, it's certainly not my intention. I've been on vacation with my
family for the last several weeks, and only returned home sleep-deprived
last night after 4 days of plane delays. I've now rested and will resume
working on this full-time, and I'll try my best to address concerns, and
also go back through emails to find things I might have missed.
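As an illustration of the init/update/final shape mentioned above, here
is a sketch using the BLAKE2s functions from the posted series; the
exact signatures and constant names are assumptions and would need to
be checked against the patches:

#include <zinc/blake2s.h>	/* header path assumed from the series */

/* Sketch: hash two discontiguous buffers through one state object. */
static void hash_two_parts(u8 out[BLAKE2S_HASH_SIZE],
			   const u8 *a, size_t alen,
			   const u8 *b, size_t blen)
{
	struct blake2s_state state;

	blake2s_init(&state, BLAKE2S_HASH_SIZE);
	blake2s_update(&state, a, alen);
	blake2s_update(&state, b, blen);
	blake2s_final(&state, out, BLAKE2S_HASH_SIZE);
}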
[PATCH v4 0/4] crypto: lrw - Fixes and improvements
This patchset contains a corner-case fix and several improvements for
the LRW template.

The first patch fixes an out-of-bounds array access (and subsequently
incorrect cipher output) when the LRW counter goes from all ones to all
zeros. This patch should be applied to the crypto-2.6 tree and also go
to stable.

The second patch adds a test vector for lrw(aes) that covers the above
bug.

The third patch is a small optimization of the LRW tweak computation.

The fourth patch is a follow-up to a similar patch for XTS (it
simplifies away the use of a dynamically allocated auxiliary buffer to
cache the computed tweak values):
https://patchwork.kernel.org/patch/10588775/

Patches 2-4 should be applied only to cryptodev-2.6, but they all
depend on the first patch.

Changes in v4:
 - applied various corrections/suggestions from Eric Biggers
 - added a fix for buggy behavior on counter wrap-around (+ test vector)

v3: https://www.spinics.net/lists/linux-crypto/msg34946.html

Changes in v3:
 - fix a copy-paste error

v2: https://www.spinics.net/lists/linux-crypto/msg34890.html

Changes in v2:
 - small cleanup suggested by Eric Biggers

v1: https://www.spinics.net/lists/linux-crypto/msg34871.html

Ondrej Mosnacek (4):
  crypto: lrw - Fix out-of bounds access on counter overflow
  crypto: testmgr - Add test for LRW counter wrap-around
  crypto: lrw - Optimize tweak computation
  crypto: lrw - Do not use auxiliary buffer

 crypto/lrw.c     | 342 +--
 crypto/testmgr.h |  21 +++
 2 files changed, 112 insertions(+), 251 deletions(-)

--
2.17.1
[PATCH v4 3/4] crypto: lrw - Optimize tweak computation
This patch rewrites the tweak computation to a slightly simpler method
that performs fewer bswaps. Based on performance measurements, the new
code seems to provide slightly better performance than the old one.

PERFORMANCE MEASUREMENTS (x86_64)
Performed using: https://gitlab.com/omos/linux-crypto-bench
Crypto driver used: lrw(ecb-aes-aesni)

Before:
       ALGORITHM  KEY (b)  DATA (B)  TIME ENC (ns)  TIME DEC (ns)
        lrw(aes)      256        64            204            286
        lrw(aes)      320        64            227            203
        lrw(aes)      384        64            208            204
        lrw(aes)      256       512            441            439
        lrw(aes)      320       512            456            455
        lrw(aes)      384       512            469            483
        lrw(aes)      256      4096           2136           2190
        lrw(aes)      320      4096           2161           2213
        lrw(aes)      384      4096           2295           2369
        lrw(aes)      256     16384           7692           7868
        lrw(aes)      320     16384           8230           8691
        lrw(aes)      384     16384           8971           8813
        lrw(aes)      256     32768          15336          15560
        lrw(aes)      320     32768          16410          16346
        lrw(aes)      384     32768          18023          17465

After:
       ALGORITHM  KEY (b)  DATA (B)  TIME ENC (ns)  TIME DEC (ns)
        lrw(aes)      256        64            200            203
        lrw(aes)      320        64            202            204
        lrw(aes)      384        64            204            205
        lrw(aes)      256       512            415            415
        lrw(aes)      320       512            432            440
        lrw(aes)      384       512            449            451
        lrw(aes)      256      4096           1838           1995
        lrw(aes)      320      4096           2123           1980
        lrw(aes)      384      4096           2100           2119
        lrw(aes)      256     16384           7183           6954
        lrw(aes)      320     16384           7844           7631
        lrw(aes)      384     16384           8256           8126
        lrw(aes)      256     32768          14772          14484
        lrw(aes)      320     32768          15281          15431
        lrw(aes)      384     32768          16469          16293

Signed-off-by: Ondrej Mosnacek
---
 crypto/lrw.c | 61 +++-
 1 file changed, 37 insertions(+), 24 deletions(-)

diff --git a/crypto/lrw.c b/crypto/lrw.c
index 5504d1325a56..7377b5b486fd 100644
--- a/crypto/lrw.c
+++ b/crypto/lrw.c
@@ -120,27 +120,28 @@ static int setkey(struct crypto_skcipher *parent, const u8 *key,
 	return 0;
 }

-static inline void inc(be128 *iv)
-{
-	be64_add_cpu(&iv->b, 1);
-	if (!iv->b)
-		be64_add_cpu(&iv->a, 1);
-}
-
-/* this returns the number of consequative 1 bits starting
- * from the right, get_index128(00 00 00 00 00 00 ... 00 00 10 FB) = 2 */
-static inline int get_index128(be128 *block)
+/*
+ * Returns the number of trailing '1' bits in the words of the counter,
+ * which is represented by 4 32-bit words, arranged from least to most
+ * significant. At the same time, increments the counter by one.
+ *
+ * For example:
+ *
+ * u32 counter[4] = { 0xFFFFFFFF, 0x1, 0x0, 0x0 };
+ * int i = next_index(&counter);
+ * // i == 33, counter == { 0x0, 0x2, 0x0, 0x0 }
+ */
+static int next_index(u32 *counter)
 {
-	int x;
-	__be32 *p = (__be32 *) block;
+	int i, res = 0;

-	for (p += 3, x = 0; x < 128; p--, x += 32) {
-		u32 val = be32_to_cpup(p);
-
-		if (!~val)
-			continue;
-
-		return x + ffz(val);
+	for (i = 0; i < 4; i++) {
+		if (counter[i] + 1 != 0) {
+			res += ffz(counter[i]++);
+			break;
+		}
+		counter[i] = 0;
+		res += 32;
 	}

 	/*
@@ -214,8 +215,9 @@ static int pre_crypt(struct skcipher_request *req)
 	struct scatterlist *sg;
 	unsigned cryptlen;
 	unsigned offset;
-	be128 *iv;
 	bool more;
+	__be32 *iv;
+	u32 counter[4];
 	int err;

 	subreq = &rctx->subreq;
@@ -230,7 +232,12 @@ static int pre_crypt(struct skcipher_request *req)
 				      cryptlen, req->iv);

 	err = skcipher_walk_virt(&w, subreq, false);
-	iv = w.iv;
+	iv = (__be32 *)w.iv;
+
+	counter[0] = be32_to_cpu(iv[3]);
+	counter[1] = be32_to_cpu(iv[2]);
+	counter[2] = be32_to_cpu(iv[
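The payoff of next_index() is that advancing the tweak becomes one
table lookup and one XOR per block. The following fragment is distilled
from pre_crypt() in the patch above (mulinc is the table of key2
multiples that the LRW context already precomputes); treat it as an
illustration, not a literal excerpt:

/* Apply the current tweak t to each 16-byte block, then step t
 * forward by XORing in the precomputed multiple selected by the
 * number of trailing one bits of the block counter. */
static void xor_tweak_blocks(be128 *wdst, const be128 *wsrc,
			     unsigned int nblocks, be128 t,
			     const be128 *mulinc, u32 counter[4])
{
	while (nblocks--) {
		be128_xor(wdst, &t, wsrc);
		wdst++;
		wsrc++;
		be128_xor(&t, &t, &mulinc[next_index(counter)]);
	}
}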
[PATCH v4 4/4] crypto: lrw - Do not use auxiliary buffer
This patch simplifies the LRW template to recompute the LRW tweaks from
scratch in the second pass and thus also removes the need to allocate a
dynamic buffer using kmalloc(). As discussed at [1], the use of kmalloc
causes deadlocks with dm-crypt.

PERFORMANCE MEASUREMENTS (x86_64)
Performed using: https://gitlab.com/omos/linux-crypto-bench
Crypto driver used: lrw(ecb-aes-aesni)

The results show that the new code has about the same performance as
the old code. For 512-byte messages it seems to be even slightly
faster, but that might be just noise.

Before:
       ALGORITHM  KEY (b)  DATA (B)  TIME ENC (ns)  TIME DEC (ns)
        lrw(aes)      256        64            200            203
        lrw(aes)      320        64            202            204
        lrw(aes)      384        64            204            205
        lrw(aes)      256       512            415            415
        lrw(aes)      320       512            432            440
        lrw(aes)      384       512            449            451
        lrw(aes)      256      4096           1838           1995
        lrw(aes)      320      4096           2123           1980
        lrw(aes)      384      4096           2100           2119
        lrw(aes)      256     16384           7183           6954
        lrw(aes)      320     16384           7844           7631
        lrw(aes)      384     16384           8256           8126
        lrw(aes)      256     32768          14772          14484
        lrw(aes)      320     32768          15281          15431
        lrw(aes)      384     32768          16469          16293

After:
       ALGORITHM  KEY (b)  DATA (B)  TIME ENC (ns)  TIME DEC (ns)
        lrw(aes)      256        64            197            196
        lrw(aes)      320        64            200            197
        lrw(aes)      384        64            203            199
        lrw(aes)      256       512            385            380
        lrw(aes)      320       512            401            395
        lrw(aes)      384       512            415            415
        lrw(aes)      256      4096           1869           1846
        lrw(aes)      320      4096           2080           1981
        lrw(aes)      384      4096           2160           2109
        lrw(aes)      256     16384           7077           7127
        lrw(aes)      320     16384           7807           7766
        lrw(aes)      384     16384           8108           8357
        lrw(aes)      256     32768          14111          14454
        lrw(aes)      320     32768          15268          15082
        lrw(aes)      384     32768          16581          16250

[1] https://lkml.org/lkml/2018/8/23/1315

Signed-off-by: Ondrej Mosnacek
---
 crypto/lrw.c | 280 ++-
 1 file changed, 51 insertions(+), 229 deletions(-)

diff --git a/crypto/lrw.c b/crypto/lrw.c
index 7377b5b486fd..6fcf0d431185 100644
--- a/crypto/lrw.c
+++ b/crypto/lrw.c
@@ -29,8 +29,6 @@
 #include
 #include

-#define LRW_BUFFER_SIZE 128u
-
 #define LRW_BLOCK_SIZE 16

 struct priv {
@@ -56,19 +54,7 @@ struct priv {
 };

 struct rctx {
-	be128 buf[LRW_BUFFER_SIZE / sizeof(be128)];
-
 	be128 t;
-
-	be128 *ext;
-
-	struct scatterlist srcbuf[2];
-	struct scatterlist dstbuf[2];
-	struct scatterlist *src;
-	struct scatterlist *dst;
-
-	unsigned int left;
-
 	struct skcipher_request subreq;
 };

@@ -152,86 +138,31 @@ static int next_index(u32 *counter)
 	return 127;
 }

-static int post_crypt(struct skcipher_request *req)
+/*
+ * We compute the tweak masks twice (both before and after the ECB encryption
+ * or decryption) to avoid having to allocate a temporary buffer and/or make
+ * multiple calls to the 'ecb(..)' instance, which usually would be slower
+ * than just doing the next_index() calls again.
+ */
+static int xor_tweak(struct skcipher_request *req, bool second_pass)
 {
-	struct rctx *rctx = skcipher_request_ctx(req);
-	be128 *buf = rctx->ext ?: rctx->buf;
-	struct skcipher_request *subreq;
 	const int bs = LRW_BLOCK_SIZE;
-	struct skcipher_walk w;
-	struct scatterlist *sg;
-	unsigned offset;
-	int err;
-
-	subreq = &rctx->subreq;
-	err = skcipher_walk_virt(&w, subreq, false);
-
-	while (w.nbytes) {
-		unsigned int avail = w.nbytes;
-		be128 *wdst;
-
-		wdst = w.dst.virt.addr;
-
-		do {
-			be128_xor(wdst, buf++, wdst);
-			wdst++;
-		} while ((avail -= bs) >= bs);
-
-		err = skcipher_walk_done(&w, avail);
-	}
-
-	rctx->left -= subreq->cryptle
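The resulting control flow is easy to state in isolation: the tweak XOR
pass runs once before and once after a single ecb(aes) pass, instead of
caching the tweaks in a kmalloc'd buffer between passes. A sketch under
that reading follows; do_ecb() is a hypothetical stand-in for the
subrequest plumbing:

/* Sketch of the two-pass structure this patch arrives at. */
static int lrw_crypt_sketch(struct skcipher_request *req, bool enc)
{
	int err;

	err = xor_tweak(req, false);	/* first pass: XOR tweaks into dst */
	if (err)
		return err;

	err = do_ecb(req, enc);		/* one ecb(aes) pass over the data */
	if (err)
		return err;

	return xor_tweak(req, true);	/* second pass: recompute, XOR again */
}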
[PATCH v4 1/4] crypto: lrw - Fix out-of bounds access on counter overflow
When the LRW block counter overflows, the current implementation
returns 128 as the index to the precomputed multiplication table, which
has 128 entries. This patch fixes it to return the correct value (127).

Fixes: 64470f1b8510 ("[CRYPTO] lrw: Liskov Rivest Wagner, a tweakable narrow block cipher mode")
Cc: # 2.6.20+
Reported-by: Eric Biggers
Signed-off-by: Ondrej Mosnacek
---
 crypto/lrw.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/crypto/lrw.c b/crypto/lrw.c
index 393a782679c7..5504d1325a56 100644
--- a/crypto/lrw.c
+++ b/crypto/lrw.c
@@ -143,7 +143,12 @@ static inline int get_index128(be128 *block)
 		return x + ffz(val);
 	}

-	return x;
+	/*
+	 * If we get here, then x == 128 and we are incrementing the counter
+	 * from all ones to all zeros. This means we must return index 127,
+	 * i.e. the one corresponding to key2*{ 1,...,1 }.
+	 */
+	return 127;
 }

 static int post_crypt(struct skcipher_request *req)
--
2.17.1
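The boundary condition is easy to check outside the kernel. Below is a
small userspace demonstration of the fixed semantics, using the same
least-significant-word-first representation the follow-up patches adopt
(builds with gcc or clang):

#include <assert.h>
#include <stdint.h>

/* Count trailing one bits of a 128-bit counter; all ones must map to
 * index 127 (the key2*{ 1,...,1 } table entry), never 128. */
static int index_of(const uint32_t c[4])
{
	int i, idx = 0;

	for (i = 0; i < 4; i++) {
		if (c[i] != UINT32_MAX)
			return idx + __builtin_ctz(~c[i]);	/* == ffz() */
		idx += 32;
	}
	return 127;	/* wrap-around case: clamp, matching this fix */
}

int main(void)
{
	const uint32_t wrap[4] = { UINT32_MAX, UINT32_MAX,
				   UINT32_MAX, UINT32_MAX };
	const uint32_t one[4] = { 1, 0, 0, 0 };

	assert(index_of(wrap) == 127);	/* the buggy code computed 128 */
	assert(index_of(one) == 1);
	return 0;
}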
[PATCH v4 2/4] crypto: testmgr - Add test for LRW counter wrap-around
This patch adds a test vector for lrw(aes) that triggers wrap-around of
the counter, which is a tricky corner case.

Suggested-by: Eric Biggers
Signed-off-by: Ondrej Mosnacek
---
 crypto/testmgr.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 0b3d7cadbe93..47629cb1efd3 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -13145,6 +13145,27 @@ static const struct cipher_testvec aes_lrw_tv_template[] = {
 		.ctext	= "\x5b\x90\x8e\xc1\xab\xdd\x67\x5f"
 			  "\x3d\x69\x8a\x95\x53\xc8\x9c\xe5",
 		.len	= 16,
+	}, { /* Test counter wrap-around, modified from LRW-32-AES 1 */
+		.key	= "\x45\x62\xac\x25\xf8\x28\x17\x6d"
+			  "\x4c\x26\x84\x14\xb5\x68\x01\x85"
+			  "\x25\x8e\x2a\x05\xe7\x3e\x9d\x03"
+			  "\xee\x5a\x83\x0c\xcc\x09\x4c\x87",
+		.klen	= 32,
+		.iv	= "\xff\xff\xff\xff\xff\xff\xff\xff"
+			  "\xff\xff\xff\xff\xff\xff\xff\xff",
+		.ptext	= "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x41\x42\x43\x44\x45\x46"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x41\x42\x43\x44\x45\x46"
+			  "\x30\x31\x32\x33\x34\x35\x36\x37"
+			  "\x38\x39\x41\x42\x43\x44\x45\x46",
+		.ctext	= "\x47\x90\x50\xf6\xf4\x8d\x5c\x7f"
+			  "\x84\xc7\x83\x95\x2d\xa2\x02\xc0"
+			  "\xda\x7f\xa3\xc0\x88\x2a\x0a\x50"
+			  "\xfb\xc1\x78\x03\x39\xfe\x1d\xe5"
+			  "\xf1\xb2\x73\xcd\x65\xa3\xdf\x5f"
+			  "\xe9\x5d\x48\x92\x54\x63\x4e\xb8",
+		.len	= 48,
 	}, { /* http://www.mail-archive.com/stds-p1619@listserv.ieee.org/msg00173.html */
 		.key	= "\xf8\xd4\x76\xff\xd6\x46\xee\x6c"
--
2.17.1
Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive
On Tue, Sep 11, 2018 at 09:40:14AM -0400, Jerome Glisse wrote:
> Date: Tue, 11 Sep 2018 09:40:14 -0400
> From: Jerome Glisse
> To: Kenneth Lee
> CC: Kenneth Lee , Alex Williamson , Herbert Xu ,
> k...@vger.kernel.org, Jonathan Corbet , Greg Kroah-Hartman ,
> Zaibo Xu , linux-...@vger.kernel.org, Sanjay Kumar , Hao Fang ,
> linux-ker...@vger.kernel.org, linux...@huawei.com,
> io...@lists.linux-foundation.org, "David S . Miller" ,
> linux-crypto@vger.kernel.org, Zhou Wang , Philippe Ombredanne ,
> Thomas Gleixner , Joerg Roedel ,
> linux-accelerat...@lists.ozlabs.org, Lu Baolu
> Subject: Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive
> User-Agent: Mutt/1.10.1 (2018-07-13)
> Message-ID: <20180911134013.ga3...@redhat.com>
>
> On Tue, Sep 11, 2018 at 02:40:43PM +0800, Kenneth Lee wrote:
> > On Mon, Sep 10, 2018 at 11:33:59PM -0400, Jerome Glisse wrote:
> > > On Tue, Sep 11, 2018 at 10:42:09AM +0800, Kenneth Lee wrote:
> > > > On Mon, Sep 10, 2018 at 10:54:23AM -0400, Jerome Glisse wrote:
> > > > > On Mon, Sep 10, 2018 at 11:28:09AM +0800, Kenneth Lee wrote:
> > > > > > On Fri, Sep 07, 2018 at 12:53:06PM -0400, Jerome Glisse wrote:
> > > > > > > On Fri, Sep 07, 2018 at 12:01:38PM +0800, Kenneth Lee wrote:
> > > > > > > > On Thu, Sep 06, 2018 at 09:31:33AM -0400, Jerome Glisse wrote:
> > > > > > > > > On Thu, Sep 06, 2018 at 05:45:32PM +0800, Kenneth Lee wrote:
> > > > > > > > > > On Tue, Sep 04, 2018 at 10:15:09AM -0600, Alex Williamson wrote:
> > > > > > > > > > > On Tue, 4 Sep 2018 11:00:19 -0400 Jerome Glisse wrote:
> > > > > > > > > > > > On Mon, Sep 03, 2018 at 08:51:57AM +0800, Kenneth Lee wrote:
> > > > > > > > > > [...]
> > > > > > > >
> > > > > > > > I took a look at i915_gem_execbuffer_ioctl(). It seems it does
> > > > > > > > "copy_from_user" on the user memory to bring it into the kernel.
> > > > > > > > That is not what we need. What we try to get is: the user
> > > > > > > > application does something on its data, and pushes it away to the
> > > > > > > > accelerator, and says: "I'm tired, it is your turn to do the
> > > > > > > > job...". Then the accelerator has the memory, referring to any
> > > > > > > > portion of it with the same VAs as the application, even when the
> > > > > > > > VAs are stored inside the memory itself.
> > > > > > >
> > > > > > > You were not looking at the right place, see
> > > > > > > drivers/gpu/drm/i915/i915_gem_userptr.c
> > > > > > > It does GUP and creates a GEM object, and AFAICR you can wrap that
> > > > > > > GEM object into a dma buffer object.
> > > > > >
> > > > > > Thank you for directing me to this implementation. It is
> > > > > > interesting:).
> > > > > >
> > > > > > But it does not yet solve my problem. If I understand it right, the
> > > > > > userptr in i915 does the following:
> > > > > >
> > > > > > 1. The user process sets a user pointer with size to the kernel via
> > > > > >    ioctl.
> > > > > > 2. The kernel wraps it as a dma-buf and keeps the process's mm for
> > > > > >    further reference.
> > > > > > 3. The user pages are allocated, GUPed or DMA mapped to the device.
> > > > > >    So the data can be shared between the user space and the hardware.
> > > > > >
> > > > > > But my scenario is:
> > > > > >
> > > > > > 1. The user process has some data in the user space, pointed to by a
> > > > > >    pointer, say ptr1.
> > > > > >    And within the memory, there may be some other pointers; let's
> > > > > >    say one of them is ptr2.
> > > > > > 2. Now I need to assign ptr1 *directly* to the hardware MMIO space.
> > > > > >    And the hardware must refer to ptr1 and ptr2 *directly* for data.
> > > > > >
> > > > > > Userptr lets the hardware and the process share the same memory.
> > > > > > But I need them to share the same *address space*. So IOMMU is a
> > > > > > MUST for WarpDrive; NOIOMMU mode, as Jean said, is just for
> > > > > > verifying that some of the procedure is OK.
> > > > >
> > > > > So to be 100% clear, should we _ignore_ the non SVA/SVM case ?
> > > > > If so then wait for the necessary SVA/SVM to land and do warp drive
> > > > > without a non SVA/SVM path.
> > > >
> > > > I think we should clarify the concept of SVA/SVM here. As I
> > > > understand it, Shared Virtual Address/Memory means: any virtual
> > > > address in a process can be used by the device at the same time.
> > > > This requires the IOMMU device to support PASID. And optionally, it
> > > > requires the feature of page-fault-from-device.
> > >
> > > Yes, we agree on what SVA/SVM is. There is one gotcha, though: access
> > > to ranges that are MMIO-mapped, i.e. CPU page tables pointing to IO
> > > memory. IIRC it is undefined what happens on some platforms when a
> > > device tries to access those using SVA/SVM.
[PATCH 2/4] crypto: s5p-sss: Fix whitespace issues
Fix two whitespace issues that occurred at line breakup.

Signed-off-by: Christoph Manszewski
---
 drivers/crypto/s5p-sss.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/s5p-sss.c b/drivers/crypto/s5p-sss.c
index 0cf3f12d8f74..ce9bd13ea166 100644
--- a/drivers/crypto/s5p-sss.c
+++ b/drivers/crypto/s5p-sss.c
@@ -491,7 +491,7 @@ static void s5p_unset_indata(struct s5p_aes_dev *dev)
 }

 static int s5p_make_sg_cpy(struct s5p_aes_dev *dev, struct scatterlist *src,
-			    struct scatterlist **dst)
+			   struct scatterlist **dst)
 {
 	void *pages;
 	int len;
@@ -1890,7 +1890,7 @@ static int s5p_set_indata_start(struct s5p_aes_dev *dev,
 }

 static int s5p_set_outdata_start(struct s5p_aes_dev *dev,
-				struct ablkcipher_request *req)
+				 struct ablkcipher_request *req)
 {
 	struct scatterlist *sg;
 	int err;
--
2.7.4
[PATCH 4/4] crypto: s5p-sss: Add aes-ctr support
Add support for the aes counter (ctr) block cipher mode of operation
for Exynos hardware. In contrast to the ecb and cbc modes, aes-ctr
allows encryption/decryption for request sizes that are not a multiple
of 16 bytes.

The hardware requires block sizes that are a multiple of 16 bytes. In
order to achieve this, copy the request source and destination memory
and align its size to 16. That way the hardware processes additional
bytes, which are omitted when copying the result back to its original
destination.

Tested on Odroid-U3 with an Exynos 4412 CPU, kernel 4.19-rc2, with the
crypto run-time self-tests (testmgr) and with the tcrypt module:
insmod tcrypt.ko sec=1 mode=500.

Signed-off-by: Christoph Manszewski
---
 drivers/crypto/s5p-sss.c | 45 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 40 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/s5p-sss.c b/drivers/crypto/s5p-sss.c
index 817a4b2b71e6..cfa104f6e468 100644
--- a/drivers/crypto/s5p-sss.c
+++ b/drivers/crypto/s5p-sss.c
@@ -1814,7 +1814,7 @@ static struct ahash_alg algs_sha1_md5_sha256[] = {
 };

 static void s5p_set_aes(struct s5p_aes_dev *dev,
-			const u8 *key, const u8 *iv,
+			const u8 *key, const u8 *iv, const u8 *ctr,
 			unsigned int keylen)
 {
 	void __iomem *keystart;
@@ -1822,6 +1822,9 @@ static void s5p_set_aes(struct s5p_aes_dev *dev,
 	if (iv)
 		memcpy_toio(dev->aes_ioaddr + SSS_REG_AES_IV_DATA(0), iv, 0x10);

+	if (ctr)
+		memcpy_toio(dev->aes_ioaddr + SSS_REG_AES_CNT_DATA(0), ctr, 0x10);
+
 	if (keylen == AES_KEYSIZE_256)
 		keystart = dev->aes_ioaddr + SSS_REG_AES_KEY_DATA(0);
 	else if (keylen == AES_KEYSIZE_192)
@@ -1903,8 +1906,9 @@ static void s5p_aes_crypt_start(struct s5p_aes_dev *dev, unsigned long mode)
 	u32 aes_control;
 	unsigned long flags;
 	int err;
-	u8 *iv;
+	u8 *iv, *ctr;

+	/* This sets bit [13:12] to 00, which selects 128-bit counter */
 	aes_control = SSS_AES_KEY_CHANGE_MODE;
 	if (mode & FLAGS_AES_DECRYPT)
 		aes_control |= SSS_AES_MODE_DECRYPT;
@@ -1912,11 +1916,14 @@ static void s5p_aes_crypt_start(struct s5p_aes_dev *dev, unsigned long mode)
 	if ((mode & FLAGS_AES_MODE_MASK) == FLAGS_AES_CBC) {
 		aes_control |= SSS_AES_CHAIN_MODE_CBC;
 		iv = req->info;
+		ctr = NULL;
 	} else if ((mode & FLAGS_AES_MODE_MASK) == FLAGS_AES_CTR) {
 		aes_control |= SSS_AES_CHAIN_MODE_CTR;
-		iv = req->info;
+		iv = NULL;
+		ctr = req->info;
 	} else {
 		iv = NULL; /* AES_ECB */
+		ctr = NULL;
 	}

 	if (dev->ctx->keylen == AES_KEYSIZE_192)
@@ -1948,7 +1955,7 @@ static void s5p_aes_crypt_start(struct s5p_aes_dev *dev, unsigned long mode)
 		goto outdata_error;

 	SSS_AES_WRITE(dev, AES_CONTROL, aes_control);
-	s5p_set_aes(dev, dev->ctx->aes_key, iv, dev->ctx->keylen);
+	s5p_set_aes(dev, dev->ctx->aes_key, iv, ctr, dev->ctx->keylen);

 	s5p_set_dma_indata(dev, dev->sg_src);
 	s5p_set_dma_outdata(dev, dev->sg_dst);
@@ -2026,7 +2033,8 @@ static int s5p_aes_crypt(struct ablkcipher_request *req, unsigned long mode)
 	struct s5p_aes_ctx *ctx = crypto_ablkcipher_ctx(tfm);
 	struct s5p_aes_dev *dev = ctx->dev;

-	if (!IS_ALIGNED(req->nbytes, AES_BLOCK_SIZE)) {
+	if (!IS_ALIGNED(req->nbytes, AES_BLOCK_SIZE) &&
+	    ((mode & FLAGS_AES_MODE_MASK) != FLAGS_AES_CTR)) {
 		dev_err(dev->dev,
 			"request size is not exact amount of AES blocks\n");
 		return -EINVAL;
 	}
@@ -2073,6 +2081,11 @@ static int s5p_aes_cbc_decrypt(struct ablkcipher_request *req)
 	return s5p_aes_crypt(req, FLAGS_AES_DECRYPT | FLAGS_AES_CBC);
 }

+static int s5p_aes_ctr_crypt(struct ablkcipher_request *req)
+{
+	return s5p_aes_crypt(req, FLAGS_AES_CTR);
+}
+
 static int s5p_aes_cra_init(struct crypto_tfm *tfm)
 {
 	struct s5p_aes_ctx *ctx = crypto_tfm_ctx(tfm);
@@ -2127,6 +2140,28 @@ static struct crypto_alg algs[] =
 {
 			.decrypt	= s5p_aes_cbc_decrypt,
 		}
 	},
+	{
+		.cra_name		= "ctr(aes)",
+		.cra_driver_name	= "ctr-aes-s5p",
+		.cra_priority		= 100,
+		.cra_flags		= CRYPTO_ALG_TYPE_ABLKCIPHER |
+					  CRYPTO_ALG_ASYNC |
+					  CRYPTO_ALG_KERN_DRIVER_ONLY,
+		.cra_blocksize		= AES_BLOCK_SIZE,
+		.cra_ctxsize		= sizeof(struct s5p_aes_ctx),
+		.cra_alignmask		= 0x0f,
+		.cra_type		= &crypto_ablkcipher_type,
+		.cra_module		= THIS_MODULE,
+		.cra_init
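The reason the padding trick in this patch is safe is that CTR is a
pure keystream XOR: processing ALIGN(nbytes, 16) bytes and copying back
only nbytes yields exactly the ciphertext of the first nbytes, and the
keystream spent on the zeroed tail is simply discarded. A userspace
sketch of the length bookkeeping (a hypothetical helper, not the driver
code, which does its copying via s5p_make_sg_cpy()):

#include <stdlib.h>
#include <string.h>

#define AES_BLOCK_SIZE	16
#define ALIGN16(x)	(((x) + AES_BLOCK_SIZE - 1) & \
			 ~(size_t)(AES_BLOCK_SIZE - 1))

/* Copy src into a freshly allocated buffer whose size is rounded up
 * to a whole number of AES blocks; the zeroed tail is what the engine
 * processes and the caller later throws away. */
static unsigned char *pad_for_engine(const unsigned char *src,
				     size_t nbytes, size_t *padded)
{
	unsigned char *buf;

	*padded = ALIGN16(nbytes);
	buf = calloc(1, *padded);
	if (buf)
		memcpy(buf, src, nbytes);
	return buf;
}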
[PATCH 1/4] crypto: s5p-sss: Fix race in error handling
Remove a race condition introduced by the error path in the functions
s5p_aes_interrupt() and s5p_aes_crypt_start(). Setting the busy field
of struct s5p_aes_dev to false made it possible for s5p_tasklet_cb() to
change the req field before s5p_aes_complete() was called.

Change the first parameter of s5p_aes_complete() to struct
ablkcipher_request. Before spin_unlock, make a copy of the currently
handled request, to ensure that s5p_aes_complete() is called with the
correct request.

Signed-off-by: Christoph Manszewski
---
 drivers/crypto/s5p-sss.c | 13 +++++++++----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/s5p-sss.c b/drivers/crypto/s5p-sss.c
index faa282074e5a..0cf3f12d8f74 100644
--- a/drivers/crypto/s5p-sss.c
+++ b/drivers/crypto/s5p-sss.c
@@ -475,9 +475,9 @@ static void s5p_sg_done(struct s5p_aes_dev *dev)
 }

 /* Calls the completion. Cannot be called with dev->lock hold. */
-static void s5p_aes_complete(struct s5p_aes_dev *dev, int err)
+static void s5p_aes_complete(struct ablkcipher_request *req, int err)
 {
-	dev->req->base.complete(&dev->req->base, err);
+	req->base.complete(&req->base, err);
 }

 static void s5p_unset_outdata(struct s5p_aes_dev *dev)
@@ -655,6 +655,7 @@ static irqreturn_t s5p_aes_interrupt(int irq, void *dev_id)
 {
 	struct platform_device *pdev = dev_id;
 	struct s5p_aes_dev *dev = platform_get_drvdata(pdev);
+	struct ablkcipher_request *req;
 	int err_dma_tx = 0;
 	int err_dma_rx = 0;
 	int err_dma_hx = 0;
@@ -725,9 +726,10 @@ static irqreturn_t s5p_aes_interrupt(int irq, void *dev_id)
 		if (err_dma_hx == 1)
 			s5p_set_dma_hashdata(dev, dev->hash_sg_iter);

+		req = dev->req;
 		spin_unlock_irqrestore(&dev->lock, flags);

-		s5p_aes_complete(dev, 0);
+		s5p_aes_complete(req, 0);
 		/* Device is still busy */
 		tasklet_schedule(&dev->tasklet);
 	} else {
@@ -755,8 +757,9 @@ static irqreturn_t s5p_aes_interrupt(int irq, void *dev_id)
 	if (err_dma_hx == 1)
 		s5p_set_dma_hashdata(dev, dev->hash_sg_iter);

+	req = dev->req;
 	spin_unlock_irqrestore(&dev->lock, flags);
-	s5p_aes_complete(dev, err);
+	s5p_aes_complete(req, err);

 hash_irq_end:
 	/*
@@ -1983,7 +1986,7 @@ static void s5p_aes_crypt_start(struct s5p_aes_dev *dev, unsigned long mode)
 	s5p_sg_done(dev);
 	dev->busy = false;
 	spin_unlock_irqrestore(&dev->lock, flags);
-	s5p_aes_complete(dev, err);
+	s5p_aes_complete(req, err);
 }

 static void s5p_tasklet_cb(unsigned long data)
--
2.7.4
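The pattern of the fix, stated in isolation: snapshot dev->req while
the lock is still held, then complete that snapshot after unlocking, so
a concurrent s5p_tasklet_cb() that reassigns dev->req cannot redirect
the completion. A condensed sketch of that pattern (not a literal
excerpt from the diff above):

/* Sketch: complete the request that was current at snapshot time. */
static void complete_current_req(struct s5p_aes_dev *dev, int err)
{
	struct ablkcipher_request *req;
	unsigned long flags;

	spin_lock_irqsave(&dev->lock, flags);
	req = dev->req;				/* snapshot under the lock */
	spin_unlock_irqrestore(&dev->lock, flags);

	req->base.complete(&req->base, err);	/* safe: our own copy */
}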
[PATCH 0/4] crypto: s5p-sss: Fix and add aes-ctr support
Hello,

This patch series adds aes-ctr support to the s5p-sss.c driver.
Additionally, it provides a fix and a minor code cleanup.

Patch 1 contains a simple fix of a possible race condition.

Patches 2-3 are code cleanup patches.

Patch 4 adds support for the aes-ctr block cipher mode of operation.

Regards,
Chris

Christoph Manszewski (4):
  crypto: s5p-sss: Fix race in error handling
  crypto: s5p-sss: Fix whitespace issues
  crypto: s5p-sss: Minor code cleanup
  crypto: s5p-sss: Add aes-ctr support

 drivers/crypto/s5p-sss.c | 114 +++
 1 file changed, 66 insertions(+), 48 deletions(-)

--
2.7.4
[PATCH 3/4] crypto: s5p-sss: Minor code cleanup
Modifications in s5p-sss.c:
 - remove unnecessary 'goto' statements,
 - change uint8_t and uint32_t to u8 and u32 types.

Signed-off-by: Christoph Manszewski
---
 drivers/crypto/s5p-sss.c | 54 +++++++++++++-----------------------------
 1 file changed, 17 insertions(+), 37 deletions(-)

diff --git a/drivers/crypto/s5p-sss.c b/drivers/crypto/s5p-sss.c
index ce9bd13ea166..817a4b2b71e6 100644
--- a/drivers/crypto/s5p-sss.c
+++ b/drivers/crypto/s5p-sss.c
@@ -249,8 +249,8 @@ struct s5p_aes_reqctx {
 struct s5p_aes_ctx {
 	struct s5p_aes_dev	*dev;

-	uint8_t			aes_key[AES_MAX_KEY_SIZE];
-	uint8_t			nonce[CTR_RFC3686_NONCE_SIZE];
+	u8			aes_key[AES_MAX_KEY_SIZE];
+	u8			nonce[CTR_RFC3686_NONCE_SIZE];
 	int			keylen;
 };

@@ -518,46 +518,28 @@ static int s5p_make_sg_cpy(struct s5p_aes_dev *dev, struct scatterlist *src,

 static int s5p_set_outdata(struct s5p_aes_dev *dev, struct scatterlist *sg)
 {
-	int err;
-
-	if (!sg->length) {
-		err = -EINVAL;
-		goto exit;
-	}
+	if (!sg->length)
+		return -EINVAL;

-	err = dma_map_sg(dev->dev, sg, 1, DMA_FROM_DEVICE);
-	if (!err) {
-		err = -ENOMEM;
-		goto exit;
-	}
+	if (!dma_map_sg(dev->dev, sg, 1, DMA_FROM_DEVICE))
+		return -ENOMEM;

 	dev->sg_dst = sg;
-	err = 0;

-exit:
-	return err;
+	return 0;
 }

 static int s5p_set_indata(struct s5p_aes_dev *dev, struct scatterlist *sg)
 {
-	int err;
-
-	if (!sg->length) {
-		err = -EINVAL;
-		goto exit;
-	}
+	if (!sg->length)
+		return -EINVAL;

-	err = dma_map_sg(dev->dev, sg, 1, DMA_TO_DEVICE);
-	if (!err) {
-		err = -ENOMEM;
-		goto exit;
-	}
+	if (!dma_map_sg(dev->dev, sg, 1, DMA_TO_DEVICE))
+		return -ENOMEM;

 	dev->sg_src = sg;
-	err = 0;

-exit:
-	return err;
+	return 0;
 }

 /*
@@ -662,8 +644,7 @@ static irqreturn_t s5p_aes_interrupt(int irq, void *dev_id)
 	bool tx_end = false;
 	bool hx_end = false;
 	unsigned long flags;
-	uint32_t status;
-	u32 st_bits;
+	u32 status, st_bits;
 	int err;

 	spin_lock_irqsave(&dev->lock, flags);
@@ -1833,7 +1814,7 @@ static struct ahash_alg algs_sha1_md5_sha256[] = {
 };

 static void s5p_set_aes(struct s5p_aes_dev *dev,
-			const uint8_t *key, const uint8_t *iv,
+			const u8 *key, const u8 *iv,
 			unsigned int keylen)
 {
 	void __iomem *keystart;
@@ -1919,7 +1900,7 @@ static int s5p_set_outdata_start(struct s5p_aes_dev *dev,
 static void s5p_aes_crypt_start(struct s5p_aes_dev *dev, unsigned long mode)
 {
 	struct ablkcipher_request *req = dev->req;
-	uint32_t aes_control;
+	u32 aes_control;
 	unsigned long flags;
 	int err;
 	u8 *iv;
@@ -2027,7 +2008,7 @@ static int s5p_aes_handle_req(struct s5p_aes_dev *dev,
 	err = ablkcipher_enqueue_request(&dev->queue, req);
 	if (dev->busy) {
 		spin_unlock_irqrestore(&dev->lock, flags);
-		goto exit;
+		return err;
 	}
 	dev->busy = true;

@@ -2035,7 +2016,6 @@ static int s5p_aes_handle_req(struct s5p_aes_dev *dev,

 	tasklet_schedule(&dev->tasklet);

-exit:
 	return err;
 }

@@ -2057,7 +2037,7 @@ static int s5p_aes_crypt(struct ablkcipher_request *req, unsigned long mode)
 }

 static int s5p_aes_setkey(struct crypto_ablkcipher *cipher,
-			  const uint8_t *key, unsigned int keylen)
+			  const u8 *key, unsigned int keylen)
 {
 	struct crypto_tfm *tfm = crypto_ablkcipher_tfm(cipher);
 	struct s5p_aes_ctx *ctx = crypto_tfm_ctx(tfm);
--
2.7.4
Re: [PATCH 1/5] crypto: arm/aes-ce - enable module autoloading based on CPU feature bits
On 13 September 2018 at 08:24, Stefan Agner wrote:
> On 10.09.2018 00:01, Ard Biesheuvel wrote:
>> On 10 September 2018 at 08:21, Stefan Agner wrote:
>>> Hi Ard,
>>>
>>> On 21.05.2017 03:23, Ard Biesheuvel wrote:
>>>> Make the module autoloadable by tying it to the CPU feature bit that
>>>> describes whether the optional instructions it relies on are
>>>> implemented by the current CPU.
>>>
>>> This leads to a compiler warning when compiling multi_v7_defconfig/ARM32
>>> using Clang 6.0.1:
>>>
>>> arch/arm/crypto/aes-ce-glue.c:450:1: warning: variable
>>> 'cpu_feature_match_AES' is not needed and will not
>>> be emitted [-Wunneeded-internal-declaration]
>>> module_cpu_feature_match(AES, aes_init);
>>>
>>> ./include/linux/cpufeature.h:48:33: note: expanded from macro
>>> 'module_cpu_feature_match'
>>> static struct cpu_feature const cpu_feature_match_ ## x[] = \
>>>
>>> :83:1: note: expanded from here
>>> cpu_feature_match_AES
>>> ^
>>> 1 warning generated.
>>>
>>> Do you happen to have an idea how to alleviate this?
>>>
>>
>> I guess this only happens for modules that are selected as builtin,
>> and so MODULE_DEVICE_TABLE() resolves to nothing?
>> Does this only occur for CPU features?
>
> So in the above case CONFIG_ARM_CRYPTO=y, CONFIG_CRYPTO_AES_ARM_CE=m...
>
> Right now I only saw it with CPU features... I remember seeing similar
> issues, which got resolved. Digging in the git history I found
> 1f318a8bafcf ("modules: mark __inittest/__exittest as __maybe_unused").
>
> This seems to resolve it:
>
> --- a/include/linux/cpufeature.h
> +++ b/include/linux/cpufeature.h
> @@ -45,7 +45,7 @@
>   * 'asm/cpufeature.h' of your favorite architecture.
>   */
>  #define module_cpu_feature_match(x, __initfunc)			\
> -static struct cpu_feature const cpu_feature_match_ ## x[] =	\
> +static struct cpu_feature const __maybe_unused cpu_feature_match_ ## x[] = \
>  	{ { .feature = cpu_feature(x) }, { } };			\
>  MODULE_DEVICE_TABLE(cpu, cpu_feature_match_ ## x);		\
>  								\
>
> Also arch/arm/crypto/crc32-ce-glue.c needs an extra __maybe_unused.
>

Yes, that looks like the right approach to me.

The difference from other uses of MODULE_DEVICE_TABLE() is that the
second argument usually gets referenced in some way in the driver
struct. In the CPU feature case, that does not happen, and so the
struct ends up being unreferenced when building the code into the
kernel.
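For context, here is the macro's typical call site, abbreviated from
the shape of arch/arm/crypto/aes-ce-glue.c (the aes_algs array and
registration details are elided in this sketch). The expanded
cpu_feature_match_AES[] table is referenced only by
MODULE_DEVICE_TABLE(), which compiles away in the built-in case, hence
the warning and the __maybe_unused fix:

#include <linux/cpufeature.h>
#include <linux/module.h>
#include <crypto/algapi.h>

static int __init aes_init(void)
{
	/* register the AES-CE algorithms; definitions elided here */
	return crypto_register_algs(aes_algs, ARRAY_SIZE(aes_algs));
}

static void __exit aes_exit(void)
{
	crypto_unregister_algs(aes_algs, ARRAY_SIZE(aes_algs));
}

/* Ties module autoloading to the AES CPU feature bit; when built in,
 * the generated cpu_feature_match_AES[] is otherwise unreferenced. */
module_cpu_feature_match(AES, aes_init);
module_exit(aes_exit);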