RE: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive

2018-09-13 Thread Tian, Kevin
> From: Jerome Glisse
> Sent: Thursday, September 13, 2018 10:52 PM
>
[...]
 > AFAIK, on x86 and PPC at least, all PCIE devices are in the same group
> by default at boot or at least all devices behind the same bridge.

the group thing reflects physical hierarchy limitation, not changed
cross boot. Please note iommu group defines the minimal isolation
boundary - all devices within same group must be attached to the
same iommu domain or address space, because physically IOMMU
cannot differentiate DMAs out of those devices. devices behind
legacy PCI-X bridge is one example. other examples include devices
behind a PCIe switch port which doesn't support ACS thus cannot
route p2p transaction to IOMMU. If talking about typical PCIe 
endpoint (with upstreaming ports all supporting ACS), you'll get
one device per group.

One iommu group today is attached to only one iommu domain.
In the future one group may attach to multiple domains, as the
aux domain concept being discussed in another thread.

> 
> Maybe they are kernel option to avoid that and userspace init program
> can definitly re-arrange that base on sysadmin policy).

I don't think there is such option, as it may break isolation model
enabled by IOMMU.

[...]
> > > That is why i am being pedantic :) on making sure there is good reasons
> > > to do what you do inside VFIO. I do believe that we want a common
> frame-
> > > work like the one you are proposing but i do not believe it should be
> > > part of VFIO given the baggages it comes with and that are not relevant
> > > to the use cases for this kind of devices.
> >

The purpose of VFIO is clear - the kernel portal for granting generic 
device resource (mmio, irq, etc.) to user space. VFIO doesn't care
what exactly a resource is used for (queue, cmd reg, etc.). If really
pursuing VFIO path is necessary, maybe such common framework
should lay down in user space, which gets all granted resource from
kernel driver thru VFIO and then provides accelerator services to 
other processes?

Thanks
Kevin


Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library

2018-09-13 Thread Ard Biesheuvel
On 12 September 2018 at 20:34, Eric Biggers  wrote:
> On Wed, Sep 12, 2018 at 08:19:21PM +0200, Ard Biesheuvel wrote:
>> On 12 September 2018 at 20:16, Jason A. Donenfeld  wrote:
>> > Hi Eric,
>> >
>> > On Wed, Sep 12, 2018 at 12:08 AM Eric Biggers  wrote:
>> >> I'd strongly prefer the assembly to be readable too.  Jason, I'm not sure 
>> >> if
>> >> you've actually read through the asm from the OpenSSL implementations, 
>> >> but the
>> >> generated .S files actually do lose a lot of semantic information that 
>> >> was in
>> >> the original .pl scripts.
>> >
>> > The thing to keep in mind is that the .S was not directly and blindly
>> > generated from the .pl. We started with the output of the .pl, and
>> > then, particularly in the case of x86_64, worked with it a lot, and
>> > now it's something a bit different. We've definitely spent a lot of
>> > time reading that assembly.
>> >
>>
>> Can we please have those changes as a separate patch? Preferably to
>> the .pl file rather than the .S file, so we can easily distinguish the
>> code from upstream from the code that you modified.
>>
>> > I'll see if I can improve the readability with some register name
>> > remapping on ARM. No guarantees, but I'll play a bit and see if I can
>> > make it a bit better.
>> >
>> > Jason
>
> FWIW, yesterday I made a modified version of poly1305-armv4.pl that generates 
> an
> asm file that works in kernel mode.  The changes are actually pretty small, 
> and
> I think we can get them upstream into OpenSSL like they were for 
> sha256-armv4.pl
> and sha512-armv4.pl.  I'll start a thread with Andy Polyakov and you two.
>
> But I don't have time to help with all the many OpenSSL asm files Jason is
> proposing, just maybe poly1305-armv4 and chacha-armv4 for now.
>

Thanks Eric. I reached out to Andy Polyakov off line, and he is happy
to work with us again on this, although he did point out that our
experiences on ARM may not extrapolate to x86_64, given the fact that
the perl sources there also contain parameterization for the calling
convention differences between Windows and SysV.


Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library

2018-09-13 Thread Ard Biesheuvel
On 13 September 2018 at 17:58, Jason A. Donenfeld  wrote:
> On Thu, Sep 13, 2018 at 5:43 PM Ard Biesheuvel
>  wrote:
>> I'd prefer it if all the accelerated software implementations live in
>> the same place. But I do strongly prefer arch code to live in
>> arch/$arch
>
> Zinc follows the scheme of the raid6 code, as well as of most other
> crypto libraries: code is grouped by cipher, making it easy for people
> to work with and understand differing implementations. It also allows
> us to trivially link these together at compile time rather than at
> link time, which makes cipher selection much more efficient. It's
> really much more maintainable this way.
>
>> I think AES-GCM is a useful example here. I really like the SIMD token
>> abstraction a lot, but I would like to understand how this would work
>> in Zinc if you have
>> a) a generic implementation
>> b) perhaps an arch specific scalar implementation
>> c) a pure NEON implementation
>> d) an implementation using AES instructions but not the PMULL instructions
>> e) an implementation that uses AES and PMULL instructions.
>
> The same way that Zinc currently chooses between the five different
> implementations for, say, x86_64 ChaCha20:
>
> - Generic C scalar
> - SSSE3
> - AVX2
> - AVX512F
> - AVX512VL
>
> We make a decision based on CPU capabilities, SIMD context, and input
> length, and then choose the right function.
>

OK, so given random.c's future dependency on Zinc (for ChaCha20), and
the fact that Zinc is one monolithic piece of code, all versions of
all algorithms will always be statically linked into the kernel
proper. I'm not sure that is acceptable.

>> You know what? If you're up for it, let's not wait until Plumbers, but
>> instead, let's collaborate off list to get this into shape.
>
> Sure, sounds good.
>

BTW you haven't answered my question yet about what happens when the
WireGuard protocol version changes: will we need a flag day and switch
all deployments over at the same time?


Re: [PATCH] gcmaes_crypt_by_sg: don't use GFP_ATOMIC allocation if the request doesn't cross a page

2018-09-13 Thread Herbert Xu
On Wed, Sep 05, 2018 at 09:18:43AM -0400, Mikulas Patocka wrote:
> This patch fixes gcmaes_crypt_by_sg so that it won't use memory
> allocation if the data doesn't cross a page boundary.
> 
> Authenticated encryption may be used by dm-crypt. If the encryption or
> decryption fails, it would result in I/O error and filesystem corruption.
> The function gcmaes_crypt_by_sg is using GFP_ATOMIC allocation that can
> fail anytime. This patch fixes the logic so that it won't attempt the
> failing allocation if the data doesn't cross a page boundary.
> 
> Signed-off-by: Mikulas Patocka 
> Cc: sta...@vger.kernel.org

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 1/1] crypto: cavium/nitrox - Added support for SR-IOV configuration.

2018-09-13 Thread Herbert Xu
On Fri, Sep 07, 2018 at 12:31:18PM +0530, Srikanth Jampala wrote:
> Added support to configure SR-IOV using sysfs interface.
> Supported VF modes are 16, 32, 64 and 128. Grouped the
> hardware configuration functions to "nitrox_hal.h" file.
> Changed driver version to "1.1".
> 
> Signed-off-by: Srikanth Jampala 
> Reviewed-by: Gadam Sreerama 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFC PATCH cryptodev] crc-t10dif: crc_t10dif_mutex can be static

2018-09-13 Thread Herbert Xu
On Wed, Sep 05, 2018 at 01:52:44AM +0800, kbuild test robot wrote:
> 
> Fixes: b76377543b73 ("crc-t10dif: Pick better transform if one becomes 
> available")
> Signed-off-by: kbuild test robot 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: x86/aegis,morus - Do not require OSXSAVE for SSE2

2018-09-13 Thread Herbert Xu
On Wed, Sep 05, 2018 at 09:26:41AM +0200, Ondrej Mosnacek wrote:
> It turns out OSXSAVE needs to be checked only for AVX, not for SSE.
> Without this patch the affected modules refuse to load on CPUs with SSE2
> but without AVX support.
> 
> Fixes: 877ccce7cbe8 ("crypto: x86/aegis,morus - Fix and simplify CPUID 
> checks")
> Cc:  # 4.18
> Reported-by: Zdenek Kaspar 
> Signed-off-by: Ondrej Mosnacek 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v8 5/9] dm: Remove VLA usage from hashes

2018-09-13 Thread Herbert Xu
On Thu, Sep 13, 2018 at 01:54:39PM -0400, Mike Snitzer wrote:
>
> Acked-by: Mike Snitzer 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: Adds user space interface for ALG_SET_KEY_TYPE

2018-09-13 Thread Herbert Xu
Kalyani Akula  wrote:
> ALG_SET_KEY_TYPE requires caller to pass the key_type to be used
> for AES encryption/decryption.
> 
> Sometimes the cipher key will be stored in the device's
> hardware (eFuse, BBRAM etc).So,there is a need to specify the information
> about the key-type to use it for Encrypt or Decrypt operations.
> 
> This patch implements the above requirement.
> 
> 
> Signed-off-by: Kalyani Akula 

This patch does not make sense by itself.  Please post it together
with the patches that will make use of this.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: padlock-aes: Add ecx to outputs for rep instructions

2018-09-13 Thread Herbert Xu
On Fri, Sep 07, 2018 at 02:57:42AM +0100, Ben Hutchings wrote:
> The current constraints for inline "rep xcrypt*" instructions mark ecx
> as an input only.  The compiler can therefore assume wrongly that ecx
> holds the same value afterward, but in reality it will contain 0.
> 
> This previously led to data corruption, which was fixed around by
> commit 46d8c4b28652 ("crypto: padlock-aes - Fix Nano workaround data
> corruption").  But a future compiler or different optimisation
> configuration could reintroduce the problem.  Update the constraints
> to avoid this.
> 
> Fixes: a76c1c23d0c3 ("crypto: padlock-aes - work around Nano CPU ...")
> Fixes: 46d8c4b28652 ("crypto: padlock-aes - Fix Nano workaround data ...")
> Signed-off-by: Ben Hutchings 
> ---
> This is totally untested, so don't assume I know what I'm talking
> about. :-)

Could you please test it by combining this patch with a revert
of my fix to confirm that it actually works?

Thanks!
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive

2018-09-13 Thread Kenneth Lee
On Thu, Sep 13, 2018 at 10:51:50AM -0400, Jerome Glisse wrote:
> Date: Thu, 13 Sep 2018 10:51:50 -0400
> From: Jerome Glisse 
> To: Kenneth Lee 
> CC: Kenneth Lee , Alex Williamson
>  , Herbert Xu ,
>  k...@vger.kernel.org, Jonathan Corbet , Greg Kroah-Hartman
>  , Zaibo Xu ,
>  linux-...@vger.kernel.org, Sanjay Kumar , Hao
>  Fang , linux-ker...@vger.kernel.org,
>  linux...@huawei.com, io...@lists.linux-foundation.org, "David S . Miller"
>  , linux-crypto@vger.kernel.org, Zhou Wang
>  , Philippe Ombredanne ,
>  Thomas Gleixner , Joerg Roedel ,
>  linux-accelerat...@lists.ozlabs.org, Lu Baolu 
> Subject: Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive
> User-Agent: Mutt/1.10.1 (2018-07-13)
> Message-ID: <20180913145149.gb3...@redhat.com>
> 
> On Thu, Sep 13, 2018 at 04:32:32PM +0800, Kenneth Lee wrote:
> > On Tue, Sep 11, 2018 at 09:40:14AM -0400, Jerome Glisse wrote:
> > > On Tue, Sep 11, 2018 at 02:40:43PM +0800, Kenneth Lee wrote:
> > > > On Mon, Sep 10, 2018 at 11:33:59PM -0400, Jerome Glisse wrote:
> > > > > On Tue, Sep 11, 2018 at 10:42:09AM +0800, Kenneth Lee wrote:
> > > > > > On Mon, Sep 10, 2018 at 10:54:23AM -0400, Jerome Glisse wrote:
> > > > > > > On Mon, Sep 10, 2018 at 11:28:09AM +0800, Kenneth Lee wrote:
> > > > > > > > On Fri, Sep 07, 2018 at 12:53:06PM -0400, Jerome Glisse wrote:
> > > > > > > > > On Fri, Sep 07, 2018 at 12:01:38PM +0800, Kenneth Lee wrote:
> > > > > > > > > > On Thu, Sep 06, 2018 at 09:31:33AM -0400, Jerome Glisse 
> > > > > > > > > > wrote:
> > > > > > > > > > > On Thu, Sep 06, 2018 at 05:45:32PM +0800, Kenneth Lee 
> > > > > > > > > > > wrote:
> > > > > > > > > > > > On Tue, Sep 04, 2018 at 10:15:09AM -0600, Alex 
> > > > > > > > > > > > Williamson wrote:
> > > > > > > > > > > > > On Tue, 4 Sep 2018 11:00:19 -0400 Jerome Glisse 
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > On Mon, Sep 03, 2018 at 08:51:57AM +0800, Kenneth 
> > > > > > > > > > > > > > Lee wrote:
> > > > > > > 
> > > > > > > [...]
> > > > > > > 
> > > > > > > > > > I took a look at i915_gem_execbuffer_ioctl(). It seems it 
> > > > > > > > > > "copy_from_user" the
> > > > > > > > > > user memory to the kernel. That is not what we need. What 
> > > > > > > > > > we try to get is: the
> > > > > > > > > > user application do something on its data, and push it away 
> > > > > > > > > > to the accelerator,
> > > > > > > > > > and says: "I'm tied, it is your turn to do the job...". 
> > > > > > > > > > Then the accelerator has
> > > > > > > > > > the memory, referring any portion of it with the same VAs 
> > > > > > > > > > of the application,
> > > > > > > > > > even the VAs are stored inside the memory itself.
> > > > > > > > > 
> > > > > > > > > You were not looking at right place see 
> > > > > > > > > drivers/gpu/drm/i915/i915_gem_userptr.c
> > > > > > > > > It does GUP and create GEM object AFAICR you can wrap that 
> > > > > > > > > GEM object into a
> > > > > > > > > dma buffer object.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Thank you for directing me to this implementation. It is 
> > > > > > > > interesting:).
> > > > > > > > 
> > > > > > > > But it is not yet solve my problem. If I understand it right, 
> > > > > > > > the userptr in
> > > > > > > > i915 do the following:
> > > > > > > > 
> > > > > > > > 1. The user process sets a user pointer with size to the kernel 
> > > > > > > > via ioctl.
> > > > > > > > 2. The kernel wraps it as a dma-buf and keeps the process's mm 
> > > > > > > > for further
> > > > > > > >reference.
> > > > > > > > 3. The user pages are allocated, GUPed or DMA mapped to the 
> > > > > > > > device. So the data
> > > > > > > >can be shared between the user space and the hardware.
> > > > > > > > 
> > > > > > > > But my scenario is: 
> > > > > > > > 
> > > > > > > > 1. The user process has some data in the user space, pointed by 
> > > > > > > > a pointer, say
> > > > > > > >ptr1. And within the memory, there may be some other 
> > > > > > > > pointers, let's say one
> > > > > > > >of them is ptr2.
> > > > > > > > 2. Now I need to assign ptr1 *directly* to the hardware MMIO 
> > > > > > > > space. And the
> > > > > > > >hardware must refer ptr1 and ptr2 *directly* for data.
> > > > > > > > 
> > > > > > > > Userptr lets the hardware and process share the same memory 
> > > > > > > > space. But I need
> > > > > > > > them to share the same *address space*. So IOMMU is a MUST for 
> > > > > > > > WarpDrive,
> > > > > > > > NOIOMMU mode, as Jean said, is just for verifying some of the 
> > > > > > > > procedure is OK.
> > > > > > > 
> > > > > > > So to be 100% clear should we _ignore_ the non SVA/SVM case ?
> > > > > > > If so then wait for necessary SVA/SVM to land and do warp drive
> > > > > > > without non SVA/SVM path.
> > > > > > > 
> > > > > > 
> > > > > > I think we should clear the concept of SVA/SVM here. As my 
> > > > > > understanding, Share
> > > > > > Virtual Address/Memory means: any virtual address 

[PATCH][RFC] crypto: skcipher: Remove VLA usage

2018-09-13 Thread Kees Cook
RFC follow-up to 
https://lkml.kernel.org/r/CAGXu5j+bpLK=EQ9LHkO8V=sdaQwt==6fbghgn2vi1e9_wxs...@mail.gmail.com

The core API changes:

struct crypto_sync_skcipher
crypto_alloc_sync_skcipher()
crypto_free_sync_skcipher()
crypto_sync_skcipher_setkey()
skcipher_request_set_sync_tfm()
SKCIPHER_REQUEST_ON_STACK type check

and a single user's refactoring as an example:

drivers/crypto/ccp/ccp-crypto.h
drivers/crypto/ccp/ccp-crypto-aes-xts.c

Does this look correct? If so, I can continue and do the other 60
instances of SKCIPHER_REQUEST_ON_STACK().

Signed-off-by: Kees Cook 
---
 crypto/skcipher.c   | 24 +
 drivers/crypto/ccp/ccp-crypto-aes-xts.c | 10 
 drivers/crypto/ccp/ccp-crypto.h |  2 +-
 include/crypto/skcipher.h   | 34 -
 4 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 0bd8c6caa498..4caab81d2d02 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -949,6 +949,30 @@ struct crypto_skcipher *crypto_alloc_skcipher(const char 
*alg_name,
 }
 EXPORT_SYMBOL_GPL(crypto_alloc_skcipher);
 
+struct crypto_sync_skcipher *crypto_alloc_sync_skcipher(
+   const char *alg_name, u32 type, u32 mask)
+{
+   struct crypto_skcipher *tfm;
+
+   /* Only sync algorithms allowed. */
+   mask |= CRYPTO_ALG_ASYNC;
+
+   tfm = crypto_alloc_tfm(alg_name, &crypto_skcipher_type2, type, mask);
+
+   /*
+* Make sure we do not allocate something that might get used with
+* an on-stack request: check the request size.
+*/
+   if (!IS_ERR(tfm) && WARN_ON(crypto_skcipher_reqsize(tfm) >
+   MAX_SYNC_SKCIPHER_REQSIZE)) {
+   crypto_free_skcipher(tfm);
+   return ERR_PTR(-EINVAL);
+   }
+
+   return (struct crypto_sync_skcipher *)tfm;
+}
+EXPORT_SYMBOL_GPL(crypto_alloc_sync_skcipher);
+
 int crypto_has_skcipher2(const char *alg_name, u32 type, u32 mask)
 {
return crypto_type_has_alg(alg_name, &crypto_skcipher_type2,
diff --git a/drivers/crypto/ccp/ccp-crypto-aes-xts.c 
b/drivers/crypto/ccp/ccp-crypto-aes-xts.c
index 94b5bcf5b628..983c921736b4 100644
--- a/drivers/crypto/ccp/ccp-crypto-aes-xts.c
+++ b/drivers/crypto/ccp/ccp-crypto-aes-xts.c
@@ -102,7 +102,7 @@ static int ccp_aes_xts_setkey(struct crypto_ablkcipher 
*tfm, const u8 *key,
ctx->u.aes.key_len = key_len / 2;
sg_init_one(&ctx->u.aes.key_sg, ctx->u.aes.key, key_len);
 
-   return crypto_skcipher_setkey(ctx->u.aes.tfm_skcipher, key, key_len);
+   return crypto_sync_skcipher_setkey(ctx->u.aes.tfm_skcipher, key, 
key_len);
 }
 
 static int ccp_aes_xts_crypt(struct ablkcipher_request *req,
@@ -156,7 +156,7 @@ static int ccp_aes_xts_crypt(struct ablkcipher_request *req,
/* Use the fallback to process the request for any
 * unsupported unit sizes or key sizes
 */
-   skcipher_request_set_tfm(subreq, ctx->u.aes.tfm_skcipher);
+   skcipher_request_set_sync_tfm(subreq, ctx->u.aes.tfm_skcipher);
skcipher_request_set_callback(subreq, req->base.flags,
  NULL, NULL);
skcipher_request_set_crypt(subreq, req->src, req->dst,
@@ -203,12 +203,12 @@ static int ccp_aes_xts_decrypt(struct ablkcipher_request 
*req)
 static int ccp_aes_xts_cra_init(struct crypto_tfm *tfm)
 {
struct ccp_ctx *ctx = crypto_tfm_ctx(tfm);
-   struct crypto_skcipher *fallback_tfm;
+   struct crypto_sync_skcipher *fallback_tfm;
 
ctx->complete = ccp_aes_xts_complete;
ctx->u.aes.key_len = 0;
 
-   fallback_tfm = crypto_alloc_skcipher("xts(aes)", 0,
+   fallback_tfm = crypto_alloc_sync_skcipher("xts(aes)", 0,
 CRYPTO_ALG_ASYNC |
 CRYPTO_ALG_NEED_FALLBACK);
if (IS_ERR(fallback_tfm)) {
@@ -226,7 +226,7 @@ static void ccp_aes_xts_cra_exit(struct crypto_tfm *tfm)
 {
struct ccp_ctx *ctx = crypto_tfm_ctx(tfm);
 
-   crypto_free_skcipher(ctx->u.aes.tfm_skcipher);
+   crypto_free_sync_skcipher(ctx->u.aes.tfm_skcipher);
 }
 
 static int ccp_register_aes_xts_alg(struct list_head *head,
diff --git a/drivers/crypto/ccp/ccp-crypto.h b/drivers/crypto/ccp/ccp-crypto.h
index b9fd090c46c2..28819e11db96 100644
--- a/drivers/crypto/ccp/ccp-crypto.h
+++ b/drivers/crypto/ccp/ccp-crypto.h
@@ -88,7 +88,7 @@ static inline struct ccp_crypto_ahash_alg *
 /* AES related defines */
 struct ccp_aes_ctx {
/* Fallback cipher for XTS with unsupported unit sizes */
-   struct crypto_skcipher *tfm_skcipher;
+   struct crypto_sync_skcipher *tfm_skcipher;
 
/* Cipher used to generate CMAC K1/K2 keys */
struct crypto_cipher *tfm_cipher;
diff --git a/include

Re: [PATCH v8 5/9] dm: Remove VLA usage from hashes

2018-09-13 Thread Mike Snitzer
On Tue, Aug 07 2018 at  5:18pm -0400,
Kees Cook  wrote:

> In the quest to remove all stack VLA usage from the kernel[1], this uses
> the new HASH_MAX_DIGESTSIZE from the crypto layer to allocate the upper
> bounds on stack usage.
> 
> [1] 
> https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com
> 
> Signed-off-by: Kees Cook 
> ---
>  drivers/md/dm-integrity.c  | 23 +--
>  drivers/md/dm-verity-fec.c |  5 -
>  2 files changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
> index 86438b2f10dd..884edd7cf1d0 100644
> --- a/drivers/md/dm-integrity.c
> +++ b/drivers/md/dm-integrity.c
> @@ -521,7 +521,12 @@ static void section_mac(struct dm_integrity_c *ic, 
> unsigned section, __u8 result
>   }
>   memset(result + size, 0, JOURNAL_MAC_SIZE - size);
>   } else {
> - __u8 digest[size];
> + __u8 digest[HASH_MAX_DIGESTSIZE];
> +
> + if (WARN_ON(size > sizeof(digest))) {
> + dm_integrity_io_error(ic, "digest_size", -EINVAL);
> + goto err;
> + }
>   r = crypto_shash_final(desc, digest);
>   if (unlikely(r)) {
>   dm_integrity_io_error(ic, "crypto_shash_final", r);
> @@ -1244,7 +1249,7 @@ static void integrity_metadata(struct work_struct *w)
>   struct bio *bio = dm_bio_from_per_bio_data(dio, sizeof(struct 
> dm_integrity_io));
>   char *checksums;
>   unsigned extra_space = unlikely(digest_size > ic->tag_size) ? 
> digest_size - ic->tag_size : 0;
> - char checksums_onstack[ic->tag_size + extra_space];
> + char checksums_onstack[HASH_MAX_DIGESTSIZE];
>   unsigned sectors_to_process = dio->range.n_sectors;
>   sector_t sector = dio->range.logical_sector;
>  
> @@ -1253,8 +1258,14 @@ static void integrity_metadata(struct work_struct *w)
>  
>   checksums = kmalloc((PAGE_SIZE >> SECTOR_SHIFT >> 
> ic->sb->log2_sectors_per_block) * ic->tag_size + extra_space,
>   GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN);
> - if (!checksums)
> + if (!checksums) {
>   checksums = checksums_onstack;
> + if (WARN_ON(extra_space &&
> + digest_size > sizeof(checksums_onstack))) {
> + r = -EINVAL;
> + goto error;
> + }
> + }
>  
>   __bio_for_each_segment(bv, bio, iter, dio->orig_bi_iter) {
>   unsigned pos;

Given the length of the kmalloc() just prior to this new WARN_ON() line
I'm not seeing why you've elected to split the WARN_ON across multiple
lines.

But that style nit aside:

Acked-by: Mike Snitzer 


Re: [PATCH v2 2/4] crypto: skcipher - Enforce non-ASYNC for on-stack requests

2018-09-13 Thread Kees Cook
On Thu, Sep 13, 2018 at 9:46 AM, Kees Cook  wrote:
> On Mon, Sep 10, 2018 at 10:52 PM, Herbert Xu
>  wrote:
>> On Fri, Sep 07, 2018 at 08:56:23AM +0200, Ard Biesheuvel wrote:
>>>
>>> OK, so given that all SKCIPHER_REQUEST_ON_STACK occurrences are
>>> updated in this series anyway, perhaps we should add
>>> skcipher_[en|de]crypt_onstack() flavors that encapsulate the
>>> additional check? Only question is how to enforce at compile time that
>>> those are used instead of the ordinary ones when using a stack
>>> allocated request. Would you mind using some macro foo here involving
>>> __builtin_types_compatible_p() ?
>>
>> Something like a completely new type which in reality is just a
>> wrapper around skcipher:
>>
>> struct crypto_sync_skcipher {
>> struct crypto_skcipher base;
>> } tfm;
>>
>> tfm = crypto_alloc_sync_skcipher(...);
>>
>> crypto_sync_skcipher_encrypt(...)
>> crypto_sync_skcipher_decrypt(...)
>>
>> These functions would just be trivial inline functions around their
>> crypto_skcipher counterparts.
>
> This means new wrappers for the other helpers too, yes? For example:
>
> SKCIPHER_REQUEST_ON_STACK(nreq, ctx->null);
>
> skcipher_request_set_tfm(nreq, ctx->null);
> skcipher_request_set_callback(nreq, req->base.flags, NULL, NULL);
> skcipher_request_set_crypt(nreq, req->src, req->dst, nbytes, NULL);
>
> return crypto_skcipher_encrypt(nreq);
>
> For the above, we'd also need:
>
> sync_skcipher_request_set_tfm()
> sync_skcipher_request_set_callback()
> sync_skcipher_request_set_crypt()

Wait, I think I misunderstood you. Did you mean a new top-level thing
(tfm?) not a new request type?

That would mean at least replacing skcipher_request_set_tfm() with a
wrapper (since the tfm argument is different), but _not_
encrypt/decrypt like you mention. I could perform a type test in
SKCIPHER_REQUEST_ON_STACK().

Globally:
- add struct crypto_sync_skcipher wrapper
- add crypto_alloc_sync_skcipher() to check non-ASYNC and request size
of actual tfm
- add skcipher_request_set_sync_tfm() to attach the wrapped tfm to the request
- SKCIPHER_REQUEST_ON_STACK() would verify the tfm was a struct
crypto_sync_skcipher.

Two changes per user:
- change allocation to use new crypto_alloc_sync_skcipher() which does
the runtime checking
- change skcipher_request_set_tfm() to skcipher_request_set_sync_tfm()

This means struct skcipher_request is unchanged, along with
_set_callback, _set_crypt, _zero, and en/decrypt.

API misuse would be caught at build-time (via
SKCIPHER_REQUEST_ON_STACK type checking) and any request size problems
would be caught at allocation time.

Does this sound like what you had in mind?

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH v2 2/4] crypto: skcipher - Enforce non-ASYNC for on-stack requests

2018-09-13 Thread Kees Cook
On Mon, Sep 10, 2018 at 10:52 PM, Herbert Xu
 wrote:
> On Fri, Sep 07, 2018 at 08:56:23AM +0200, Ard Biesheuvel wrote:
>>
>> OK, so given that all SKCIPHER_REQUEST_ON_STACK occurrences are
>> updated in this series anyway, perhaps we should add
>> skcipher_[en|de]crypt_onstack() flavors that encapsulate the
>> additional check? Only question is how to enforce at compile time that
>> those are used instead of the ordinary ones when using a stack
>> allocated request. Would you mind using some macro foo here involving
>> __builtin_types_compatible_p() ?
>
> Something like a completely new type which in reality is just a
> wrapper around skcipher:
>
> struct crypto_sync_skcipher {
> struct crypto_skcipher base;
> } tfm;
>
> tfm = crypto_alloc_sync_skcipher(...);
>
> crypto_sync_skcipher_encrypt(...)
> crypto_sync_skcipher_decrypt(...)
>
> These functions would just be trivial inline functions around their
> crypto_skcipher counterparts.

This means new wrappers for the other helpers too, yes? For example:

SKCIPHER_REQUEST_ON_STACK(nreq, ctx->null);

skcipher_request_set_tfm(nreq, ctx->null);
skcipher_request_set_callback(nreq, req->base.flags, NULL, NULL);
skcipher_request_set_crypt(nreq, req->src, req->dst, nbytes, NULL);

return crypto_skcipher_encrypt(nreq);

For the above, we'd also need:

sync_skcipher_request_set_tfm()
sync_skcipher_request_set_callback()
sync_skcipher_request_set_crypt()

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH v8 5/9] dm: Remove VLA usage from hashes

2018-09-13 Thread Kees Cook
On Mon, Sep 3, 2018 at 8:13 PM, Herbert Xu  wrote:
> On Tue, Aug 07, 2018 at 02:18:39PM -0700, Kees Cook wrote:
>> In the quest to remove all stack VLA usage from the kernel[1], this uses
>> the new HASH_MAX_DIGESTSIZE from the crypto layer to allocate the upper
>> bounds on stack usage.
>>
>> [1] 
>> https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com
>>
>> Signed-off-by: Kees Cook 
>
> Can the dm folks please review this patch?

Mike or Alasdair, can you Ack this patch so Herbert can include it in
the crypto tree? This is blocking some VLA removals[1]...

Thanks!

-Kees

[1] 
https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com

-- 
Kees Cook
Pixel Security


Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library

2018-09-13 Thread Jason A. Donenfeld
On Thu, Sep 13, 2018 at 5:43 PM Ard Biesheuvel
 wrote:
> I'd prefer it if all the accelerated software implementations live in
> the same place. But I do strongly prefer arch code to live in
> arch/$arch

Zinc follows the scheme of the raid6 code, as well as of most other
crypto libraries: code is grouped by cipher, making it easy for people
to work with and understand differing implementations. It also allows
us to trivially link these together at compile time rather than at
link time, which makes cipher selection much more efficient. It's
really much more maintainable this way.

> I think AES-GCM is a useful example here. I really like the SIMD token
> abstraction a lot, but I would like to understand how this would work
> in Zinc if you have
> a) a generic implementation
> b) perhaps an arch specific scalar implementation
> c) a pure NEON implementation
> d) an implementation using AES instructions but not the PMULL instructions
> e) an implementation that uses AES and PMULL instructions.

The same way that Zinc currently chooses between the five different
implementations for, say, x86_64 ChaCha20:

- Generic C scalar
- SSSE3
- AVX2
- AVX512F
- AVX512VL

We make a decision based on CPU capabilities, SIMD context, and input
length, and then choose the right function.

> You know what? If you're up for it, let's not wait until Plumbers, but
> instead, let's collaborate off list to get this into shape.

Sure, sounds good.

Jason


Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library

2018-09-13 Thread Jason A. Donenfeld
On Thu, Sep 13, 2018 at 5:04 PM Ard Biesheuvel
 wrote:
> > The code still benefits from the review that's gone into OpenSSL. It's
> > not modified in ways that would affect the cryptographic operations
> > being done. It's modified to be suitable for kernel space.
>
> So could we please at least have those changes as a separate patch then?

I'll experiment with a couple ways of trying to communicate with
precision what's been changed from OpenSSL for the next round of
patches.

> > That's interesting. I'll bring this up with AndyP. FWIW, if you think
> > you have a real and compelling claim here, I'd be much more likely to
> > accept a different ChaCha20 implementation than I would be to accept a
> > different Poly1305 implementation. (It's a *lot* harder to screw up
> > ChaCha20 than it is to screw up Poly1305.)
>
> The question is really whether we want different implementations in
> the crypto API and in zinc.

Per earlier in this discussion, I've already authored patches that
replaces the crypto API's implementations with simple calls to Zinc,
so that code isn't duplicated. These will be in v4 and you can comment
on the approach then.

> You are completely missing my point. I am not particularly invested in
> the crypto API, and I share the concerns about its usability. That is
> why I want to make sure that your solution actually results in a net
> improvement for everybody, not just for WireGuard, in a maintainable
> way.

Right, likewise. I've put quite a bit of effort into separating Zinc
into Zinc and not into something part of WireGuard. The motivation for
doing so is a decent amount of call sites all around the kernel that
I'd like to gradually fix up.


Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library

2018-09-13 Thread Ard Biesheuvel
On 13 September 2018 at 16:32, Jason A. Donenfeld  wrote:
> On Thu, Sep 13, 2018 at 7:41 AM Ard Biesheuvel
>  wrote:
>> But one of the supposed selling points of this crypto library is that
>> it gives engineers who are frightened of crypto in general and the
>> crypto API in particular simple and easy to use crypto primitives
>> rather than having to jump through the crypto API's hoops.
>
> The goal is for engineers who want to specifically use algorithm X
> from within the kernel in a non-dynamic way to be able to then use
> algorithm X with a simple function call. The goal is not to open it up
> to people who have no idea what they're doing; for that a NaCL-like
> library with functions like "crypto_box_open" or something would fit
> the bill; but that's also not what we're trying to do here. Please
> don't confuse the design goals. The rest of your email is therefore a
> bit of a straw man; cut the rhetoric out.
>
>> A crypto library whose only encryption algorithm is a stream cipher
>> does *not* deliver on that promise, since it is only suitable for
>> cases where IVs are guaranteed not to be reused.
>
> False. We also offer XChaCha20Poly1305, which takes a massive nonce,
> suitable for random generation.
>
> If there became a useful case for AES-PMAC-SIV or even AES-GCM or
> something to that extent, then Zinc would add that as required. But
> we're not going to start adding random ciphers unless they're needed.
>

I'd prefer it if all the accelerated software implementations live in
the same place. But I do strongly prefer arch code to live in
arch/$arch, simply because of the maintenance scope for arch
developers and maintainers.

I think AES-GCM is a useful example here. I really like the SIMD token
abstraction a lot, but I would like to understand how this would work
in Zinc if you have
a) a generic implementation
b) perhaps an arch specific scalar implementation
c) a pure NEON implementation
d) an implementation using AES instructions but not the PMULL instructions
e) an implementation that uses AES and PMULL instructions.

On arch/arm64 we support code patching, on other arches it may be
different. We also support udev loading of modules depending on CPU
capabilities, which means only the implementation you are likely to
use will be loaded, and other modules are disregarded.

I am not asking you to implement this now. But I do want to understand
how these use cases fit into Zinc. And note that this has nothing to
do with the crypto API: this could simply be a synchronous
aes_gcm_encrypt() library function that maps to any of the above at
runtime depending on the platform's capabilities.

>> You yourself were
>> bitten by the clunkiness of the crypto API when attempting to use the
>> SHA26 code, right? So shouldn't we move that into this crypto library
>> as well?
>
> As stated in the initial commit, and in numerous other emails
> stretching back a year, yes, sha256 and other things in lib/ are going
> to be put into Zinc following the initial merge of Zinc. These changes
> will happen incrementally, like everything else that happens in the
> kernel. Sha256, in particular, is probably the first thing I'll port
> post-merge.
>

Excellent.

>> I think it is reasonable for WireGuard to standardize on
>> ChaCha20/Poly1305 only, although I have my concerns about the flag day
>> that will be required if this 'one true cipher' ever does turn out to
>> be compromised (either that, or we will have to go back in time and
>> add some kind of protocol versioning to existing deployments of
>> WireGuard)
>
> Those concerns are not valid and have already been addressed (to you,
> I believe) on this mailing list and elsewhere. WireGuard is versioned,
> hence there's no need to "add" versioning, and it is prepared to roll
> out new cryptography in a subsequent version should there be any
> issues. In other words, your concern is based on a misunderstanding of
> the protocol. If you have issues, however, with the design decisions
> of WireGuard, something that's been heavily discussed with members of
> the linux kernel community, networking community, cryptography
> community, and so forth, for the last 3 years, I invite you to bring
> them up on .
>

I'd prefer to have the discussion here, if you don't mind.

IIUC clients and servers need to run the same version of the protocol.
So does it require a flag day to switch the world over to the new
version when the old one is deprecated?

>> And frankly, if the code were as good as the prose, we wouldn't be
>> having this discussion.
>
> Please cut out this rhetoric. That's an obviously unprovable
> statement, but it probably isn't true anyway. I wish you'd stick to
> technical concerns only, rather than what appears to be a desire to
> derail this by any means necessary.
>

I apologize if I hit a nerve there, that was not my intention.

I am not an 'entrenched crypto API guy that is out to get you'. I am a
core ARM developer that takes an interest in crypto, shares

Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library

2018-09-13 Thread Andy Lutomirski


> On Sep 12, 2018, at 11:39 PM, Milan Broz  wrote:
> 
>> On 13/09/18 01:45, Andy Lutomirski wrote:
>> On Wed, Sep 12, 2018 at 3:56 PM, Ard Biesheuvel
> ... 
>> b) Crypto that is used dynamically.  This includes dm-crypt
>> (aes-xts-plain64, aes-cbc-essiv, etc), all the ALG_IF interfaces, a
>> lot of IPSEC stuff, possibly KCM, and probably many more.  These will
>> get comparatively little benefit from being converted to a zinc-like
>> interface.  For some of these cases, it wouldn't make any sense at all
>> to convert them.  Certainly the ones that do async hardware crypto
>> using DMA engines will never look at all like zinc, even under the
>> hood.
> 
> Please note, that dm-crypt now uses not only block ciphers and modes,
> but also authenticated encryption and hashes (for ESSIV and HMAC
> in authenticated composed modes) and RNG (for random IV).
> We use crypto API, including async variants (I hope correctly :)

Right. And all this is why I don’t think dm-crypt should use zinc, at least not 
any time soon.


Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library

2018-09-13 Thread Ard Biesheuvel
On 13 September 2018 at 16:18, Jason A. Donenfeld  wrote:
> On Thu, Sep 13, 2018 at 1:45 AM Andy Lutomirski  wrote:
>> I'm not convinced that there's any real need for *all* crypto
>> algorithms to move into lib/zinc or to move at all.  As I see it,
>> there are two classes of crypto algorithms in the kernel:
>>
>> a) Crypto that is used by code that chooses its algorithm statically
>> and wants synchronous operations.  These include everything in
>> drivers/char/random.c, but also a bunch of various networking things
>> that are hardcoded and basically everything that uses stack buffers.
>> (This means it includes all the code that I broke when I did
>> VMAP_STACK.  Sign.)
>
> Right, exactly. This is what will wind up using Zinc. I'm working on
> an example usage of this for v4 of the patch submission, which you can
> ogle in a preview here if you're curious:
>
> https://git.zx2c4.com/linux-dev/commit/?h=big_key_rewrite
>
> 28 insertions, 206 deletions :-D
>

I must say, that actually looks pretty good.

>> b) Crypto that is used dynamically.  This includes dm-crypt
>> (aes-xts-plain64, aes-cbc-essiv, etc), all the ALG_IF interfaces, a
>> lot of IPSEC stuff, possibly KCM, and probably many more.  These will
>> get comparatively little benefit from being converted to a zinc-like
>> interface.  For some of these cases, it wouldn't make any sense at all
>> to convert them.  Certainly the ones that do async hardware crypto
>> using DMA engines will never look at all like zinc, even under the
>> hood.
>
> Right, this is what the crypto API will continue to be used for.
>
>
>> I think that, as a short-term goal, it makes a lot of sense to have
>> implementations of the crypto that *new* kernel code (like Wireguard)
>> wants to use in style (a) that live in /lib, and it obviously makes
>> sense to consolidate their implementations with the crypto/
>> implementations in a timely manner.  As a medium-term goal, adding
>> more algorithms as needed for things that could use the simpler APIs
>> (Bluetooth, perhaps) would make sense.
>
> Agreed 100%. With regards to "consolidate their implementations" --
> I've actually already done this after your urging yesterday, and so
> that will be a part of v4.
>
>> But I see no reason at all that /lib should ever contain a grab-bag of
>> crypto implementations just for the heck of it.  They should have real
>> in-kernel users IMO.  And this means that there will probably always
>> be some crypto implementations in crypto/ for things like aes-xts.
>
> Right, precisely.
>
> Jason


Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library

2018-09-13 Thread Ard Biesheuvel
On 13 September 2018 at 16:15, Jason A. Donenfeld  wrote:
> Hi Ard,
>
> On Thu, Sep 13, 2018 at 12:56 AM Ard Biesheuvel
>  wrote:
>> In this series, you are dumping a huge volume of unannotated,
>> generated asm into the kernel which has been modified [by you] to
>> [among other things?] adhere to the kernel API (without documenting
>> what the changes are exactly). How does that live up to the promise of
>> better, peer reviewed code?
>
> The code still benefits from the review that's gone into OpenSSL. It's
> not modified in ways that would affect the cryptographic operations
> being done. It's modified to be suitable for kernel space.
>

So could we please at least have those changes as a separate patch then?

>> Then there is the performance claim. We know for instance that the
>> OpenSSL ARM NEON code for ChaCha20 is faster on cores that happen to
>> possess a micro-architectural property that ALU instructions are
>> essentially free when they are interleaved with SIMD instructions. But
>> we also know that a) Cortex-A7, which is a relevant target, is not one
>> of those cores, and b) that chip designers are not likely to optimize
>> for that particular usage pattern so relying on it in generic code is
>> unwise in general.
>
> That's interesting. I'll bring this up with AndyP. FWIW, if you think
> you have a real and compelling claim here, I'd be much more likely to
> accept a different ChaCha20 implementation than I would be to accept a
> different Poly1305 implementation. (It's a *lot* harder to screw up
> ChaCha20 than it is to screw up Poly1305.)
>

The question is really whether we want different implementations in
the crypto API and in zinc.

>> I am also concerned about your claim that all software algorithms will
>> be moved into this crypto library.
>
> I'll defer to Andy's response here, which I think is a correct one:
> https://lkml.org/lkml/2018/9/13/27
>
> The short answer is that Zinc is going to be adding the ciphers that
> people want to use for normal reasons from normal code. For example,
> after this merges, we'll next be working on moving the remaining
> non-optimized C code out of lib/ that's called by places (such as
> SHA2).
>

Excellent.

>> You are not specific about whose
>> responsibility it will be that this is going to happen in a timely
>> fashion.
>
> I thought I laid out the roadmap for this in the commit message. In
> case I wasn't clear: my plan is to tackle lib/ after merging, and I
> plan to do so in a timely manner. It's a pretty common tactic to keep
> layering on tasks, "what about X?", "what about Y?", "I won't agree
> unless Z!" -- when in reality kernel development and refactorings are
> done incrementally. I've been around on this list contributing code
> for long enough that you should have a decent amount of confidence
> that I'm not just going to disappear working on this or something
> insane like that. And neither are the two academic cryptographers CC'd
> on this thread. So, as Andy said, we're going to be porting to Zinc
> the primitives that are useful for the various applications of Zinc.
> This means yes, we'll have SHA2 in there.
>
>> chaining modes
>> What are the APIs
>> going to look like for block ciphers, taking chaining modes into
>> account?
>
> As mentioned in the commit message and numerous times, we're not
> trying to make a win32-like crypto API here or to remake the existing
> Linux crypto API. Rather we're providing libraries of specific
> functions that are useful for various circumstances. For example, if
> AES-GCM is desired at some point, then we'll have a similar API for
> that as we do for ChaPoly -- one that takes buffers and one that takes
> sg. Likewise, hash functions use the familiar init/update/final.
> "Generic" chaining modes aren't really part of the equation or design
> goals.
>
> Again, I realize you've spent a long time working on the existing
> crypto API, and so your questions and concerns are in the line of,
> "how are we going to make Zinc look like the existing crypto API in
> functionality?"

You are completely missing my point. I am not particularly invested in
the crypto API, and I share the concerns about its usability. That is
why I want to make sure that your solution actually results in a net
improvement for everybody, not just for WireGuard, in a maintainable
way.

> But that's not what we're up to here. We have a
> different and complementary design goal. I understand why you're
> squirming, but please recognize we're working on different things.
>
>> I'm sure it is rather simple to port the crypto API implementation of
>> ChaCha20 to use your library. I am more concerned about how your
>> library is going to expand to cover all other software algorithms that
>> we currently use in the kernel.
>
> The subset of algorithms we add will be developed with the same
> methodology as the present ones. There is nothing making this
> particularly difficult or even more difficult for other primitives
> than it wa

Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive

2018-09-13 Thread Jerome Glisse
On Thu, Sep 13, 2018 at 04:32:32PM +0800, Kenneth Lee wrote:
> On Tue, Sep 11, 2018 at 09:40:14AM -0400, Jerome Glisse wrote:
> > On Tue, Sep 11, 2018 at 02:40:43PM +0800, Kenneth Lee wrote:
> > > On Mon, Sep 10, 2018 at 11:33:59PM -0400, Jerome Glisse wrote:
> > > > On Tue, Sep 11, 2018 at 10:42:09AM +0800, Kenneth Lee wrote:
> > > > > On Mon, Sep 10, 2018 at 10:54:23AM -0400, Jerome Glisse wrote:
> > > > > > On Mon, Sep 10, 2018 at 11:28:09AM +0800, Kenneth Lee wrote:
> > > > > > > On Fri, Sep 07, 2018 at 12:53:06PM -0400, Jerome Glisse wrote:
> > > > > > > > On Fri, Sep 07, 2018 at 12:01:38PM +0800, Kenneth Lee wrote:
> > > > > > > > > On Thu, Sep 06, 2018 at 09:31:33AM -0400, Jerome Glisse wrote:
> > > > > > > > > > On Thu, Sep 06, 2018 at 05:45:32PM +0800, Kenneth Lee wrote:
> > > > > > > > > > > On Tue, Sep 04, 2018 at 10:15:09AM -0600, Alex Williamson 
> > > > > > > > > > > wrote:
> > > > > > > > > > > > On Tue, 4 Sep 2018 11:00:19 -0400 Jerome Glisse 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > On Mon, Sep 03, 2018 at 08:51:57AM +0800, Kenneth Lee 
> > > > > > > > > > > > > wrote:
> > > > > > 
> > > > > > [...]
> > > > > > 
> > > > > > > > > I took a look at i915_gem_execbuffer_ioctl(). It seems it 
> > > > > > > > > "copy_from_user" the
> > > > > > > > > user memory to the kernel. That is not what we need. What we 
> > > > > > > > > try to get is: the
> > > > > > > > > user application do something on its data, and push it away 
> > > > > > > > > to the accelerator,
> > > > > > > > > and says: "I'm tied, it is your turn to do the job...". Then 
> > > > > > > > > the accelerator has
> > > > > > > > > the memory, referring any portion of it with the same VAs of 
> > > > > > > > > the application,
> > > > > > > > > even the VAs are stored inside the memory itself.
> > > > > > > > 
> > > > > > > > You were not looking at right place see 
> > > > > > > > drivers/gpu/drm/i915/i915_gem_userptr.c
> > > > > > > > It does GUP and create GEM object AFAICR you can wrap that GEM 
> > > > > > > > object into a
> > > > > > > > dma buffer object.
> > > > > > > > 
> > > > > > > 
> > > > > > > Thank you for directing me to this implementation. It is 
> > > > > > > interesting:).
> > > > > > > 
> > > > > > > But it is not yet solve my problem. If I understand it right, the 
> > > > > > > userptr in
> > > > > > > i915 do the following:
> > > > > > > 
> > > > > > > 1. The user process sets a user pointer with size to the kernel 
> > > > > > > via ioctl.
> > > > > > > 2. The kernel wraps it as a dma-buf and keeps the process's mm 
> > > > > > > for further
> > > > > > >reference.
> > > > > > > 3. The user pages are allocated, GUPed or DMA mapped to the 
> > > > > > > device. So the data
> > > > > > >can be shared between the user space and the hardware.
> > > > > > > 
> > > > > > > But my scenario is: 
> > > > > > > 
> > > > > > > 1. The user process has some data in the user space, pointed by a 
> > > > > > > pointer, say
> > > > > > >ptr1. And within the memory, there may be some other pointers, 
> > > > > > > let's say one
> > > > > > >of them is ptr2.
> > > > > > > 2. Now I need to assign ptr1 *directly* to the hardware MMIO 
> > > > > > > space. And the
> > > > > > >hardware must refer ptr1 and ptr2 *directly* for data.
> > > > > > > 
> > > > > > > Userptr lets the hardware and process share the same memory 
> > > > > > > space. But I need
> > > > > > > them to share the same *address space*. So IOMMU is a MUST for 
> > > > > > > WarpDrive,
> > > > > > > NOIOMMU mode, as Jean said, is just for verifying some of the 
> > > > > > > procedure is OK.
> > > > > > 
> > > > > > So to be 100% clear should we _ignore_ the non SVA/SVM case ?
> > > > > > If so then wait for necessary SVA/SVM to land and do warp drive
> > > > > > without non SVA/SVM path.
> > > > > > 
> > > > > 
> > > > > I think we should clear the concept of SVA/SVM here. As my 
> > > > > understanding, Share
> > > > > Virtual Address/Memory means: any virtual address in a process can be 
> > > > > used by
> > > > > device at the same time. This requires IOMMU device to support PASID. 
> > > > > And
> > > > > optionally, it requires the feature of page-fault-from-device.
> > > > 
> > > > Yes we agree on what SVA/SVM is. There is a one gotcha thought, access
> > > > to range that are MMIO map ie CPU page table pointing to IO memory, IIRC
> > > > it is undefined what happens on some platform for a device trying to
> > > > access those using SVA/SVM.
> > > > 
> > > > 
> > > > > But before the feature is settled down, IOMMU can be used immediately 
> > > > > in the
> > > > > current kernel. That make it possible to assign ONE process's virtual 
> > > > > addresses
> > > > > to the device's IOMMU page table with GUP. This make WarpDrive work 
> > > > > well for one
> > > > > process.
> > > > 
> > > > UH ? How ? You want to GUP _every_ single valid address in the process
> > > > and map it to the device ? How do you handle new vm

Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library

2018-09-13 Thread Jason A. Donenfeld
Hi Milan,

On Thu, Sep 13, 2018 at 8:40 AM Milan Broz  wrote:
> Please note, that dm-crypt now uses not only block ciphers and modes,
> but also authenticated encryption and hashes (for ESSIV and HMAC
> in authenticated composed modes) and RNG (for random IV).
> We use crypto API, including async variants (I hope correctly :)
>
> There is a long time battle to move initialization vectors generators
> from dm-crypt to crypto API. If there are any plans to use a new library,
> this issue should be discussed as well.
> (Some dm-crypt IV generators are disk encryption specific, some do more
> that just IV so porting is not straightforward etc).
>
> Related problem here is an optimization of chain of sectors encryption -
> if we have new crypto API, it would be nice if can take chain of sectors
> so possible implementation can process this chain in one batch
> (every sector need to be tweaked by differently generated IV - and we
> are back in problem above).
> I think filesystem encryption uses the same pattern.
>
> And btw, we use the same algorithms through AF_ALG in userspace (cryptsetup).
>
> So please, if you mention dm-crypt, note that it is very complex
> crypto API consumer :) And everything is dynamic, configurable through
> dm-crypt options.
>
> That said, I would be more than happy to help in experiments to porting 
> dm-crypt
> to any other crypto library, but if it doesn't not help with problems
> mentioned above, I do not see any compelling reason for the new library for 
> dm-crypt...

dm-crypt is probably a good consumer of the existing crypto API and
won't be impacted by the introduction of Zinc, which is really just
the exposure of a couple low level simple crypto functions, and not a
fancy API like the crypto API which dm-crypt happily uses.

Jason


Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library

2018-09-13 Thread Jason A. Donenfeld
On Thu, Sep 13, 2018 at 7:41 AM Ard Biesheuvel
 wrote:
> But one of the supposed selling points of this crypto library is that
> it gives engineers who are frightened of crypto in general and the
> crypto API in particular simple and easy to use crypto primitives
> rather than having to jump through the crypto API's hoops.

The goal is for engineers who want to specifically use algorithm X
from within the kernel in a non-dynamic way to be able to then use
algorithm X with a simple function call. The goal is not to open it up
to people who have no idea what they're doing; for that a NaCL-like
library with functions like "crypto_box_open" or something would fit
the bill; but that's also not what we're trying to do here. Please
don't confuse the design goals. The rest of your email is therefore a
bit of a straw man; cut the rhetoric out.

> A crypto library whose only encryption algorithm is a stream cipher
> does *not* deliver on that promise, since it is only suitable for
> cases where IVs are guaranteed not to be reused.

False. We also offer XChaCha20Poly1305, which takes a massive nonce,
suitable for random generation.

If there became a useful case for AES-PMAC-SIV or even AES-GCM or
something to that extent, then Zinc would add that as required. But
we're not going to start adding random ciphers unless they're needed.

> You yourself were
> bitten by the clunkiness of the crypto API when attempting to use the
> SHA26 code, right? So shouldn't we move that into this crypto library
> as well?

As stated in the initial commit, and in numerous other emails
stretching back a year, yes, sha256 and other things in lib/ are going
to be put into Zinc following the initial merge of Zinc. These changes
will happen incrementally, like everything else that happens in the
kernel. Sha256, in particular, is probably the first thing I'll port
post-merge.

> I think it is reasonable for WireGuard to standardize on
> ChaCha20/Poly1305 only, although I have my concerns about the flag day
> that will be required if this 'one true cipher' ever does turn out to
> be compromised (either that, or we will have to go back in time and
> add some kind of protocol versioning to existing deployments of
> WireGuard)

Those concerns are not valid and have already been addressed (to you,
I believe) on this mailing list and elsewhere. WireGuard is versioned,
hence there's no need to "add" versioning, and it is prepared to roll
out new cryptography in a subsequent version should there be any
issues. In other words, your concern is based on a misunderstanding of
the protocol. If you have issues, however, with the design decisions
of WireGuard, something that's been heavily discussed with members of
the linux kernel community, networking community, cryptography
community, and so forth, for the last 3 years, I invite you to bring
them up on .

> And frankly, if the code were as good as the prose, we wouldn't be
> having this discussion.

Please cut out this rhetoric. That's an obviously unprovable
statement, but it probably isn't true anyway. I wish you'd stick to
technical concerns only, rather than what appears to be a desire to
derail this by any means necessary.

> Zinc adds its own clunky ways to mix arch and
> generic code, involving GCC -include command line arguments and
> #ifdefs everywhere. My review comments on this were completely ignored
> by Jason.

No, they were not ignored. v2 cleaned up the #ifdefs. v4 has already
cleaned up the makefile stuff and will be even cleaner. Good things
await, don't worry.

Jason


Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library

2018-09-13 Thread Jason A. Donenfeld
On Thu, Sep 13, 2018 at 1:45 AM Andy Lutomirski  wrote:
> I'm not convinced that there's any real need for *all* crypto
> algorithms to move into lib/zinc or to move at all.  As I see it,
> there are two classes of crypto algorithms in the kernel:
>
> a) Crypto that is used by code that chooses its algorithm statically
> and wants synchronous operations.  These include everything in
> drivers/char/random.c, but also a bunch of various networking things
> that are hardcoded and basically everything that uses stack buffers.
> (This means it includes all the code that I broke when I did
> VMAP_STACK.  Sign.)

Right, exactly. This is what will wind up using Zinc. I'm working on
an example usage of this for v4 of the patch submission, which you can
ogle in a preview here if you're curious:

https://git.zx2c4.com/linux-dev/commit/?h=big_key_rewrite

28 insertions, 206 deletions :-D

> b) Crypto that is used dynamically.  This includes dm-crypt
> (aes-xts-plain64, aes-cbc-essiv, etc), all the ALG_IF interfaces, a
> lot of IPSEC stuff, possibly KCM, and probably many more.  These will
> get comparatively little benefit from being converted to a zinc-like
> interface.  For some of these cases, it wouldn't make any sense at all
> to convert them.  Certainly the ones that do async hardware crypto
> using DMA engines will never look at all like zinc, even under the
> hood.

Right, this is what the crypto API will continue to be used for.


> I think that, as a short-term goal, it makes a lot of sense to have
> implementations of the crypto that *new* kernel code (like Wireguard)
> wants to use in style (a) that live in /lib, and it obviously makes
> sense to consolidate their implementations with the crypto/
> implementations in a timely manner.  As a medium-term goal, adding
> more algorithms as needed for things that could use the simpler APIs
> (Bluetooth, perhaps) would make sense.

Agreed 100%. With regards to "consolidate their implementations" --
I've actually already done this after your urging yesterday, and so
that will be a part of v4.

> But I see no reason at all that /lib should ever contain a grab-bag of
> crypto implementations just for the heck of it.  They should have real
> in-kernel users IMO.  And this means that there will probably always
> be some crypto implementations in crypto/ for things like aes-xts.

Right, precisely.

Jason


Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library

2018-09-13 Thread Jason A. Donenfeld
Hi Ard,

On Thu, Sep 13, 2018 at 12:56 AM Ard Biesheuvel
 wrote:
> In this series, you are dumping a huge volume of unannotated,
> generated asm into the kernel which has been modified [by you] to
> [among other things?] adhere to the kernel API (without documenting
> what the changes are exactly). How does that live up to the promise of
> better, peer reviewed code?

The code still benefits from the review that's gone into OpenSSL. It's
not modified in ways that would affect the cryptographic operations
being done. It's modified to be suitable for kernel space.

> Then there is the performance claim. We know for instance that the
> OpenSSL ARM NEON code for ChaCha20 is faster on cores that happen to
> possess a micro-architectural property that ALU instructions are
> essentially free when they are interleaved with SIMD instructions. But
> we also know that a) Cortex-A7, which is a relevant target, is not one
> of those cores, and b) that chip designers are not likely to optimize
> for that particular usage pattern so relying on it in generic code is
> unwise in general.

That's interesting. I'll bring this up with AndyP. FWIW, if you think
you have a real and compelling claim here, I'd be much more likely to
accept a different ChaCha20 implementation than I would be to accept a
different Poly1305 implementation. (It's a *lot* harder to screw up
ChaCha20 than it is to screw up Poly1305.)

> I am also concerned about your claim that all software algorithms will
> be moved into this crypto library.

I'll defer to Andy's response here, which I think is a correct one:
https://lkml.org/lkml/2018/9/13/27

The short answer is that Zinc is going to be adding the ciphers that
people want to use for normal reasons from normal code. For example,
after this merges, we'll next be working on moving the remaining
non-optimized C code out of lib/ that's called by places (such as
SHA2).

> You are not specific about whose
> responsibility it will be that this is going to happen in a timely
> fashion.

I thought I laid out the roadmap for this in the commit message. In
case I wasn't clear: my plan is to tackle lib/ after merging, and I
plan to do so in a timely manner. It's a pretty common tactic to keep
layering on tasks, "what about X?", "what about Y?", "I won't agree
unless Z!" -- when in reality kernel development and refactorings are
done incrementally. I've been around on this list contributing code
for long enough that you should have a decent amount of confidence
that I'm not just going to disappear working on this or something
insane like that. And neither are the two academic cryptographers CC'd
on this thread. So, as Andy said, we're going to be porting to Zinc
the primitives that are useful for the various applications of Zinc.
This means yes, we'll have SHA2 in there.

> chaining modes
> What are the APIs
> going to look like for block ciphers, taking chaining modes into
> account?

As mentioned in the commit message and numerous times, we're not
trying to make a win32-like crypto API here or to remake the existing
Linux crypto API. Rather we're providing libraries of specific
functions that are useful for various circumstances. For example, if
AES-GCM is desired at some point, then we'll have a similar API for
that as we do for ChaPoly -- one that takes buffers and one that takes
sg. Likewise, hash functions use the familiar init/update/final.
"Generic" chaining modes aren't really part of the equation or design
goals.

Again, I realize you've spent a long time working on the existing
crypto API, and so your questions and concerns are in the line of,
"how are we going to make Zinc look like the existing crypto API in
functionality?" But that's not what we're up to here. We have a
different and complementary design goal. I understand why you're
squirming, but please recognize we're working on different things.

> I'm sure it is rather simple to port the crypto API implementation of
> ChaCha20 to use your library. I am more concerned about how your
> library is going to expand to cover all other software algorithms that
> we currently use in the kernel.

The subset of algorithms we add will be developed with the same
methodology as the present ones. There is nothing making this
particularly difficult or even more difficult for other primitives
than it was for ChaCha20. It's especially easy, in fact, since we're
following similar design methodologies as the vast majority of other
cryptography libraries that have been developed. Namely, we're
creating simple things called "functions".

> Of course. But please respond to all the concerns,
> You have not
> responded to that concern yet.

Sorry, it's certainly not my intention. I've been on vacation with my
family for the last several weeks, and only returned home
sleep-deprived last night after 4 days of plane delays. I've now
rested and will resume working on this full-time and I'll try my best
to address concerns, and also go back through emails to find things I

[PATCH v4 0/4] crypto: lrw - Fixes and improvements

2018-09-13 Thread Ondrej Mosnacek
This patchset contains a corner-case fix and several improvements  for
the LRW template.

The first patch fixes an out-of-bounds array access (and subsequently
incorrect cipher output) when the LRW counter goes from all ones to all
zeros. This patch should be applied to the crypto-2.6 tree and also go
to stable.

The second patch adds a test vector for lrw(aes) that covers the above
bug.

The third patch is a small optimization of the LRW tweak computation.

The fourth patch is a follow-up to a similar patch for XTS (it
simplifies away the use of dynamically allocated auxiliary buffer to
cache the computed tweak values):
https://patchwork.kernel.org/patch/10588775/

Patches 2-4 should be applied only to cryptodev-2.6, but they all depend
on the first patch.

Changes in v4:
  - applied various corrections/suggestions from Eric Biggers
  - added a fix for buggy behavior on counter wrap-around (+ test vector)

v3: https://www.spinics.net/lists/linux-crypto/msg34946.html
Changes in v3:
  - fix a copy-paste error

v2: https://www.spinics.net/lists/linux-crypto/msg34890.html
Changes in v2:
  - small cleanup suggested by Eric Biggers

v1: https://www.spinics.net/lists/linux-crypto/msg34871.html

Ondrej Mosnacek (4):
  crypto: lrw - Fix out-of bounds access on counter overflow
  crypto: testmgr - Add test for LRW counter wrap-around
  crypto: lrw - Optimize tweak computation
  crypto: lrw - Do not use auxiliary buffer

 crypto/lrw.c | 342 +--
 crypto/testmgr.h |  21 +++
 2 files changed, 112 insertions(+), 251 deletions(-)

-- 
2.17.1



[PATCH v4 3/4] crypto: lrw - Optimize tweak computation

2018-09-13 Thread Ondrej Mosnacek
This patch rewrites the tweak computation to a slightly simpler method
that performs less bswaps. Based on performance measurements the new
code seems to provide slightly better performance than the old one.

PERFORMANCE MEASUREMENTS (x86_64)
Performed using: https://gitlab.com/omos/linux-crypto-bench
Crypto driver used: lrw(ecb-aes-aesni)

Before:
   ALGORITHM KEY (b)DATA (B)   TIME ENC (ns)   TIME DEC (ns)
lrw(aes) 256  64 204 286
lrw(aes) 320  64 227 203
lrw(aes) 384  64 208 204
lrw(aes) 256 512 441 439
lrw(aes) 320 512 456 455
lrw(aes) 384 512 469 483
lrw(aes) 256409621362190
lrw(aes) 320409621612213
lrw(aes) 384409622952369
lrw(aes) 256   1638476927868
lrw(aes) 320   1638482308691
lrw(aes) 384   1638489718813
lrw(aes) 256   32768   15336   15560
lrw(aes) 320   32768   16410   16346
lrw(aes) 384   32768   18023   17465

After:
   ALGORITHM KEY (b)DATA (B)   TIME ENC (ns)   TIME DEC (ns)
lrw(aes) 256  64 200 203
lrw(aes) 320  64 202 204
lrw(aes) 384  64 204 205
lrw(aes) 256 512 415 415
lrw(aes) 320 512 432 440
lrw(aes) 384 512 449 451
lrw(aes) 256409618381995
lrw(aes) 320409621231980
lrw(aes) 384409621002119
lrw(aes) 256   1638471836954
lrw(aes) 320   1638478447631
lrw(aes) 384   1638482568126
lrw(aes) 256   32768   14772   14484
lrw(aes) 320   32768   15281   15431
lrw(aes) 384   32768   16469   16293

Signed-off-by: Ondrej Mosnacek 
---
 crypto/lrw.c | 61 +++-
 1 file changed, 37 insertions(+), 24 deletions(-)

diff --git a/crypto/lrw.c b/crypto/lrw.c
index 5504d1325a56..7377b5b486fd 100644
--- a/crypto/lrw.c
+++ b/crypto/lrw.c
@@ -120,27 +120,28 @@ static int setkey(struct crypto_skcipher *parent, const 
u8 *key,
return 0;
 }
 
-static inline void inc(be128 *iv)
-{
-   be64_add_cpu(&iv->b, 1);
-   if (!iv->b)
-   be64_add_cpu(&iv->a, 1);
-}
-
-/* this returns the number of consequative 1 bits starting
- * from the right, get_index128(00 00 00 00 00 00 ... 00 00 10 FB) = 2 */
-static inline int get_index128(be128 *block)
+/*
+ * Returns the number of trailing '1' bits in the words of the counter, which 
is
+ * represented by 4 32-bit words, arranged from least to most significant.
+ * At the same time, increments the counter by one.
+ *
+ * For example:
+ *
+ * u32 counter[4] = { 0x, 0x1, 0x0, 0x0 };
+ * int i = next_index(&counter);
+ * // i == 33, counter == { 0x0, 0x2, 0x0, 0x0 }
+ */
+static int next_index(u32 *counter)
 {
-   int x;
-   __be32 *p = (__be32 *) block;
+   int i, res = 0;
 
-   for (p += 3, x = 0; x < 128; p--, x += 32) {
-   u32 val = be32_to_cpup(p);
-
-   if (!~val)
-   continue;
-
-   return x + ffz(val);
+   for (i = 0; i < 4; i++) {
+   if (counter[i] + 1 != 0) {
+   res += ffz(counter[i]++);
+   break;
+   }
+   counter[i] = 0;
+   res += 32;
}
 
/*
@@ -214,8 +215,9 @@ static int pre_crypt(struct skcipher_request *req)
struct scatterlist *sg;
unsigned cryptlen;
unsigned offset;
-   be128 *iv;
bool more;
+   __be32 *iv;
+   u32 counter[4];
int err;
 
subreq = &rctx->subreq;
@@ -230,7 +232,12 @@ static int pre_crypt(struct skcipher_request *req)
   cryptlen, req->iv);
 
err = skcipher_walk_virt(&w, subreq, false);
-   iv = w.iv;
+   iv = (__be32 *)w.iv;
+
+   counter[0] = be32_to_cpu(iv[3]);
+   counter[1] = be32_to_cpu(iv[2]);
+   counter[2] = be32_to_cpu(iv[

[PATCH v4 4/4] crypto: lrw - Do not use auxiliary buffer

2018-09-13 Thread Ondrej Mosnacek
This patch simplifies the LRW template to recompute the LRW tweaks from
scratch in the second pass and thus also removes the need to allocate a
dynamic buffer using kmalloc().

As discussed at [1], the use of kmalloc causes deadlocks with dm-crypt.

PERFORMANCE MEASUREMENTS (x86_64)
Performed using: https://gitlab.com/omos/linux-crypto-bench
Crypto driver used: lrw(ecb-aes-aesni)

The results show that the new code has about the same performance as the
old code. For 512-byte message it seems to be even slightly faster, but
that might be just noise.

Before:
   ALGORITHM KEY (b)DATA (B)   TIME ENC (ns)   TIME DEC (ns)
lrw(aes) 256  64 200 203
lrw(aes) 320  64 202 204
lrw(aes) 384  64 204 205
lrw(aes) 256 512 415 415
lrw(aes) 320 512 432 440
lrw(aes) 384 512 449 451
lrw(aes) 256409618381995
lrw(aes) 320409621231980
lrw(aes) 384409621002119
lrw(aes) 256   1638471836954
lrw(aes) 320   1638478447631
lrw(aes) 384   1638482568126
lrw(aes) 256   32768   14772   14484
lrw(aes) 320   32768   15281   15431
lrw(aes) 384   32768   16469   16293

After:
   ALGORITHM KEY (b)DATA (B)   TIME ENC (ns)   TIME DEC (ns)
lrw(aes) 256  64 197 196
lrw(aes) 320  64 200 197
lrw(aes) 384  64 203 199
lrw(aes) 256 512 385 380
lrw(aes) 320 512 401 395
lrw(aes) 384 512 415 415
lrw(aes) 256409618691846
lrw(aes) 320409620801981
lrw(aes) 384409621602109
lrw(aes) 256   1638470777127
lrw(aes) 320   1638478077766
lrw(aes) 384   1638481088357
lrw(aes) 256   32768   14111   14454
lrw(aes) 320   32768   15268   15082
lrw(aes) 384   32768   16581   16250

[1] https://lkml.org/lkml/2018/8/23/1315

Signed-off-by: Ondrej Mosnacek 
---
 crypto/lrw.c | 280 ++-
 1 file changed, 51 insertions(+), 229 deletions(-)

diff --git a/crypto/lrw.c b/crypto/lrw.c
index 7377b5b486fd..6fcf0d431185 100644
--- a/crypto/lrw.c
+++ b/crypto/lrw.c
@@ -29,8 +29,6 @@
 #include 
 #include 
 
-#define LRW_BUFFER_SIZE 128u
-
 #define LRW_BLOCK_SIZE 16
 
 struct priv {
@@ -56,19 +54,7 @@ struct priv {
 };
 
 struct rctx {
-   be128 buf[LRW_BUFFER_SIZE / sizeof(be128)];
-
be128 t;
-
-   be128 *ext;
-
-   struct scatterlist srcbuf[2];
-   struct scatterlist dstbuf[2];
-   struct scatterlist *src;
-   struct scatterlist *dst;
-
-   unsigned int left;
-
struct skcipher_request subreq;
 };
 
@@ -152,86 +138,31 @@ static int next_index(u32 *counter)
return 127;
 }
 
-static int post_crypt(struct skcipher_request *req)
+/*
+ * We compute the tweak masks twice (both before and after the ECB encryption 
or
+ * decryption) to avoid having to allocate a temporary buffer and/or make
+ * mutliple calls to the 'ecb(..)' instance, which usually would be slower than
+ * just doing the next_index() calls again.
+ */
+static int xor_tweak(struct skcipher_request *req, bool second_pass)
 {
-   struct rctx *rctx = skcipher_request_ctx(req);
-   be128 *buf = rctx->ext ?: rctx->buf;
-   struct skcipher_request *subreq;
const int bs = LRW_BLOCK_SIZE;
-   struct skcipher_walk w;
-   struct scatterlist *sg;
-   unsigned offset;
-   int err;
-
-   subreq = &rctx->subreq;
-   err = skcipher_walk_virt(&w, subreq, false);
-
-   while (w.nbytes) {
-   unsigned int avail = w.nbytes;
-   be128 *wdst;
-
-   wdst = w.dst.virt.addr;
-
-   do {
-   be128_xor(wdst, buf++, wdst);
-   wdst++;
-   } while ((avail -= bs) >= bs);
-
-   err = skcipher_walk_done(&w, avail);
-   }
-
-   rctx->left -= subreq->cryptle

[PATCH v4 1/4] crypto: lrw - Fix out-of bounds access on counter overflow

2018-09-13 Thread Ondrej Mosnacek
When the LRW block counter overflows, the current implementation returns
128 as the index to the precomputed multiplication table, which has 128
entries. This patch fixes it to return the correct value (127).

Fixes: 64470f1b8510 ("[CRYPTO] lrw: Liskov Rivest Wagner, a tweakable narrow 
block cipher mode")
Cc:  # 2.6.20+
Reported-by: Eric Biggers 
Signed-off-by: Ondrej Mosnacek 
---
 crypto/lrw.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/crypto/lrw.c b/crypto/lrw.c
index 393a782679c7..5504d1325a56 100644
--- a/crypto/lrw.c
+++ b/crypto/lrw.c
@@ -143,7 +143,12 @@ static inline int get_index128(be128 *block)
return x + ffz(val);
}
 
-   return x;
+   /*
+* If we get here, then x == 128 and we are incrementing the counter
+* from all ones to all zeros. This means we must return index 127, i.e.
+* the one corresponding to key2*{ 1,...,1 }.
+*/
+   return 127;
 }
 
 static int post_crypt(struct skcipher_request *req)
-- 
2.17.1



[PATCH v4 2/4] crypto: testmgr - Add test for LRW counter wrap-around

2018-09-13 Thread Ondrej Mosnacek
This patch adds a test vector for lrw(aes) that triggers wrap-around of
the counter, which is a tricky corner case.

Suggested-by: Eric Biggers 
Signed-off-by: Ondrej Mosnacek 
---
 crypto/testmgr.h | 21 +
 1 file changed, 21 insertions(+)

diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 0b3d7cadbe93..47629cb1efd3 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -13145,6 +13145,27 @@ static const struct cipher_testvec 
aes_lrw_tv_template[] = {
.ctext  = "\x5b\x90\x8e\xc1\xab\xdd\x67\x5f"
  "\x3d\x69\x8a\x95\x53\xc8\x9c\xe5",
.len= 16,
+   }, { /* Test counter wrap-around, modified from LRW-32-AES 1 */
+   .key= "\x45\x62\xac\x25\xf8\x28\x17\x6d"
+ "\x4c\x26\x84\x14\xb5\x68\x01\x85"
+ "\x25\x8e\x2a\x05\xe7\x3e\x9d\x03"
+ "\xee\x5a\x83\x0c\xcc\x09\x4c\x87",
+   .klen   = 32,
+   .iv = "\xff\xff\xff\xff\xff\xff\xff\xff"
+ "\xff\xff\xff\xff\xff\xff\xff\xff",
+   .ptext  = "\x30\x31\x32\x33\x34\x35\x36\x37"
+ "\x38\x39\x41\x42\x43\x44\x45\x46"
+ "\x30\x31\x32\x33\x34\x35\x36\x37"
+ "\x38\x39\x41\x42\x43\x44\x45\x46"
+ "\x30\x31\x32\x33\x34\x35\x36\x37"
+ "\x38\x39\x41\x42\x43\x44\x45\x46",
+   .ctext  = "\x47\x90\x50\xf6\xf4\x8d\x5c\x7f"
+ "\x84\xc7\x83\x95\x2d\xa2\x02\xc0"
+ "\xda\x7f\xa3\xc0\x88\x2a\x0a\x50"
+ "\xfb\xc1\x78\x03\x39\xfe\x1d\xe5"
+ "\xf1\xb2\x73\xcd\x65\xa3\xdf\x5f"
+ "\xe9\x5d\x48\x92\x54\x63\x4e\xb8",
+   .len= 48,
}, {
 /* http://www.mail-archive.com/stds-p1619@listserv.ieee.org/msg00173.html */
.key= "\xf8\xd4\x76\xff\xd6\x46\xee\x6c"
-- 
2.17.1



Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive

2018-09-13 Thread Kenneth Lee
On Tue, Sep 11, 2018 at 09:40:14AM -0400, Jerome Glisse wrote:
> Date: Tue, 11 Sep 2018 09:40:14 -0400
> From: Jerome Glisse 
> To: Kenneth Lee 
> CC: Kenneth Lee , Alex Williamson
>  , Herbert Xu ,
>  k...@vger.kernel.org, Jonathan Corbet , Greg Kroah-Hartman
>  , Zaibo Xu ,
>  linux-...@vger.kernel.org, Sanjay Kumar , Hao
>  Fang , linux-ker...@vger.kernel.org,
>  linux...@huawei.com, io...@lists.linux-foundation.org, "David S . Miller"
>  , linux-crypto@vger.kernel.org, Zhou Wang
>  , Philippe Ombredanne ,
>  Thomas Gleixner , Joerg Roedel ,
>  linux-accelerat...@lists.ozlabs.org, Lu Baolu 
> Subject: Re: [RFCv2 PATCH 0/7] A General Accelerator Framework, WarpDrive
> User-Agent: Mutt/1.10.1 (2018-07-13)
> Message-ID: <20180911134013.ga3...@redhat.com>
> 
> On Tue, Sep 11, 2018 at 02:40:43PM +0800, Kenneth Lee wrote:
> > On Mon, Sep 10, 2018 at 11:33:59PM -0400, Jerome Glisse wrote:
> > > On Tue, Sep 11, 2018 at 10:42:09AM +0800, Kenneth Lee wrote:
> > > > On Mon, Sep 10, 2018 at 10:54:23AM -0400, Jerome Glisse wrote:
> > > > > On Mon, Sep 10, 2018 at 11:28:09AM +0800, Kenneth Lee wrote:
> > > > > > On Fri, Sep 07, 2018 at 12:53:06PM -0400, Jerome Glisse wrote:
> > > > > > > On Fri, Sep 07, 2018 at 12:01:38PM +0800, Kenneth Lee wrote:
> > > > > > > > On Thu, Sep 06, 2018 at 09:31:33AM -0400, Jerome Glisse wrote:
> > > > > > > > > On Thu, Sep 06, 2018 at 05:45:32PM +0800, Kenneth Lee wrote:
> > > > > > > > > > On Tue, Sep 04, 2018 at 10:15:09AM -0600, Alex Williamson 
> > > > > > > > > > wrote:
> > > > > > > > > > > On Tue, 4 Sep 2018 11:00:19 -0400 Jerome Glisse 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > > On Mon, Sep 03, 2018 at 08:51:57AM +0800, Kenneth Lee 
> > > > > > > > > > > > wrote:
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > > > I took a look at i915_gem_execbuffer_ioctl(). It seems it 
> > > > > > > > "copy_from_user" the
> > > > > > > > user memory to the kernel. That is not what we need. What we 
> > > > > > > > try to get is: the
> > > > > > > > user application do something on its data, and push it away to 
> > > > > > > > the accelerator,
> > > > > > > > and says: "I'm tied, it is your turn to do the job...". Then 
> > > > > > > > the accelerator has
> > > > > > > > the memory, referring any portion of it with the same VAs of 
> > > > > > > > the application,
> > > > > > > > even the VAs are stored inside the memory itself.
> > > > > > > 
> > > > > > > You were not looking at right place see 
> > > > > > > drivers/gpu/drm/i915/i915_gem_userptr.c
> > > > > > > It does GUP and create GEM object AFAICR you can wrap that GEM 
> > > > > > > object into a
> > > > > > > dma buffer object.
> > > > > > > 
> > > > > > 
> > > > > > Thank you for directing me to this implementation. It is 
> > > > > > interesting:).
> > > > > > 
> > > > > > But it is not yet solve my problem. If I understand it right, the 
> > > > > > userptr in
> > > > > > i915 do the following:
> > > > > > 
> > > > > > 1. The user process sets a user pointer with size to the kernel via 
> > > > > > ioctl.
> > > > > > 2. The kernel wraps it as a dma-buf and keeps the process's mm for 
> > > > > > further
> > > > > >reference.
> > > > > > 3. The user pages are allocated, GUPed or DMA mapped to the device. 
> > > > > > So the data
> > > > > >can be shared between the user space and the hardware.
> > > > > > 
> > > > > > But my scenario is: 
> > > > > > 
> > > > > > 1. The user process has some data in the user space, pointed by a 
> > > > > > pointer, say
> > > > > >ptr1. And within the memory, there may be some other pointers, 
> > > > > > let's say one
> > > > > >of them is ptr2.
> > > > > > 2. Now I need to assign ptr1 *directly* to the hardware MMIO space. 
> > > > > > And the
> > > > > >hardware must refer ptr1 and ptr2 *directly* for data.
> > > > > > 
> > > > > > Userptr lets the hardware and process share the same memory space. 
> > > > > > But I need
> > > > > > them to share the same *address space*. So IOMMU is a MUST for 
> > > > > > WarpDrive,
> > > > > > NOIOMMU mode, as Jean said, is just for verifying some of the 
> > > > > > procedure is OK.
> > > > > 
> > > > > So to be 100% clear should we _ignore_ the non SVA/SVM case ?
> > > > > If so then wait for necessary SVA/SVM to land and do warp drive
> > > > > without non SVA/SVM path.
> > > > > 
> > > > 
> > > > I think we should clear the concept of SVA/SVM here. As my 
> > > > understanding, Share
> > > > Virtual Address/Memory means: any virtual address in a process can be 
> > > > used by
> > > > device at the same time. This requires IOMMU device to support PASID. 
> > > > And
> > > > optionally, it requires the feature of page-fault-from-device.
> > > 
> > > Yes we agree on what SVA/SVM is. There is a one gotcha thought, access
> > > to range that are MMIO map ie CPU page table pointing to IO memory, IIRC
> > > it is undefined what happens on some platform for a device trying to
> > > access those using SVA/SVM.
> > > 
> > > 
> > >

[PATCH 2/4] crypto: s5p-sss: Fix whitespace issues

2018-09-13 Thread Christoph Manszewski
Fix two whitespace issues that occurred at line breakup.

Signed-off-by: Christoph Manszewski 
---
 drivers/crypto/s5p-sss.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/s5p-sss.c b/drivers/crypto/s5p-sss.c
index 0cf3f12d8f74..ce9bd13ea166 100644
--- a/drivers/crypto/s5p-sss.c
+++ b/drivers/crypto/s5p-sss.c
@@ -491,7 +491,7 @@ static void s5p_unset_indata(struct s5p_aes_dev *dev)
 }
 
 static int s5p_make_sg_cpy(struct s5p_aes_dev *dev, struct scatterlist *src,
-   struct scatterlist **dst)
+  struct scatterlist **dst)
 {
void *pages;
int len;
@@ -1890,7 +1890,7 @@ static int s5p_set_indata_start(struct s5p_aes_dev *dev,
 }
 
 static int s5p_set_outdata_start(struct s5p_aes_dev *dev,
-   struct ablkcipher_request *req)
+struct ablkcipher_request *req)
 {
struct scatterlist *sg;
int err;
-- 
2.7.4



[PATCH 4/4] crypto: s5p-sss: Add aes-ctr support

2018-09-13 Thread Christoph Manszewski
Add support for aes counter(ctr) block cipher mode of operation for
Exynos Hardware. In contrast to ecb and cbc modes, aes-ctr allows
encyption/decryption for request sizes not being a multiple of 16(bytes).

Hardware requires block sizes being a multiple of 16(bytes). In order to
achieve this, copy request source and destination memory, and align it's size
to 16. That way hardware processes additional bytes, that are omitted
when copying the result back to its original destination.

Tested on Odroid-U3 with Exynos 4412 CPU, kernel 4.19-rc2 with crypto
run-time self test testmgr and with tcrypt module:
insmod tcrypt.ko sec=1 mode=500.

Signed-off-by: Christoph Manszewski 
---
 drivers/crypto/s5p-sss.c | 45 -
 1 file changed, 40 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/s5p-sss.c b/drivers/crypto/s5p-sss.c
index 817a4b2b71e6..cfa104f6e468 100644
--- a/drivers/crypto/s5p-sss.c
+++ b/drivers/crypto/s5p-sss.c
@@ -1814,7 +1814,7 @@ static struct ahash_alg algs_sha1_md5_sha256[] = {
 };
 
 static void s5p_set_aes(struct s5p_aes_dev *dev,
-   const u8 *key, const u8 *iv,
+   const u8 *key, const u8 *iv, const u8 *ctr,
unsigned int keylen)
 {
void __iomem *keystart;
@@ -1822,6 +1822,9 @@ static void s5p_set_aes(struct s5p_aes_dev *dev,
if (iv)
memcpy_toio(dev->aes_ioaddr + SSS_REG_AES_IV_DATA(0), iv, 0x10);
 
+   if (ctr)
+   memcpy_toio(dev->aes_ioaddr + SSS_REG_AES_CNT_DATA(0), ctr, 
0x10);
+
if (keylen == AES_KEYSIZE_256)
keystart = dev->aes_ioaddr + SSS_REG_AES_KEY_DATA(0);
else if (keylen == AES_KEYSIZE_192)
@@ -1903,8 +1906,9 @@ static void s5p_aes_crypt_start(struct s5p_aes_dev *dev, 
unsigned long mode)
u32 aes_control;
unsigned long flags;
int err;
-   u8 *iv;
+   u8 *iv, *ctr;
 
+   /* This sets bit [13:12] to 00, which selects 128-bit counter */
aes_control = SSS_AES_KEY_CHANGE_MODE;
if (mode & FLAGS_AES_DECRYPT)
aes_control |= SSS_AES_MODE_DECRYPT;
@@ -1912,11 +1916,14 @@ static void s5p_aes_crypt_start(struct s5p_aes_dev 
*dev, unsigned long mode)
if ((mode & FLAGS_AES_MODE_MASK) == FLAGS_AES_CBC) {
aes_control |= SSS_AES_CHAIN_MODE_CBC;
iv = req->info;
+   ctr = NULL;
} else if ((mode & FLAGS_AES_MODE_MASK) == FLAGS_AES_CTR) {
aes_control |= SSS_AES_CHAIN_MODE_CTR;
-   iv = req->info;
+   iv = NULL;
+   ctr = req->info;
} else {
iv = NULL; /* AES_ECB */
+   ctr = NULL;
}
 
if (dev->ctx->keylen == AES_KEYSIZE_192)
@@ -1948,7 +1955,7 @@ static void s5p_aes_crypt_start(struct s5p_aes_dev *dev, 
unsigned long mode)
goto outdata_error;
 
SSS_AES_WRITE(dev, AES_CONTROL, aes_control);
-   s5p_set_aes(dev, dev->ctx->aes_key, iv, dev->ctx->keylen);
+   s5p_set_aes(dev, dev->ctx->aes_key, iv, ctr, dev->ctx->keylen);
 
s5p_set_dma_indata(dev,  dev->sg_src);
s5p_set_dma_outdata(dev, dev->sg_dst);
@@ -2026,7 +2033,8 @@ static int s5p_aes_crypt(struct ablkcipher_request *req, 
unsigned long mode)
struct s5p_aes_ctx *ctx = crypto_ablkcipher_ctx(tfm);
struct s5p_aes_dev *dev = ctx->dev;
 
-   if (!IS_ALIGNED(req->nbytes, AES_BLOCK_SIZE)) {
+   if (!IS_ALIGNED(req->nbytes, AES_BLOCK_SIZE) &&
+   ((mode & FLAGS_AES_MODE_MASK) != FLAGS_AES_CTR)) {
dev_err(dev->dev, "request size is not exact amount of AES 
blocks\n");
return -EINVAL;
}
@@ -2073,6 +2081,11 @@ static int s5p_aes_cbc_decrypt(struct ablkcipher_request 
*req)
return s5p_aes_crypt(req, FLAGS_AES_DECRYPT | FLAGS_AES_CBC);
 }
 
+static int s5p_aes_ctr_crypt(struct ablkcipher_request *req)
+{
+   return s5p_aes_crypt(req, FLAGS_AES_CTR);
+}
+
 static int s5p_aes_cra_init(struct crypto_tfm *tfm)
 {
struct s5p_aes_ctx *ctx = crypto_tfm_ctx(tfm);
@@ -2127,6 +2140,28 @@ static struct crypto_alg algs[] = {
.decrypt= s5p_aes_cbc_decrypt,
}
},
+   {
+   .cra_name   = "ctr(aes)",
+   .cra_driver_name= "ctr-aes-s5p",
+   .cra_priority   = 100,
+   .cra_flags  = CRYPTO_ALG_TYPE_ABLKCIPHER |
+ CRYPTO_ALG_ASYNC |
+ CRYPTO_ALG_KERN_DRIVER_ONLY,
+   .cra_blocksize  = AES_BLOCK_SIZE,
+   .cra_ctxsize= sizeof(struct s5p_aes_ctx),
+   .cra_alignmask  = 0x0f,
+   .cra_type   = &crypto_ablkcipher_type,
+   .cra_module = THIS_MODULE,
+   .cra_init   

[PATCH 1/4] crypto: s5p-sss: Fix race in error handling

2018-09-13 Thread Christoph Manszewski
Remove a race condition introduced by error path in functions:
s5p_aes_interrupt and s5p_aes_crypt_start. Setting the busy field of
struct s5p_aes_dev to false made it possible for s5p_tasklet_cb to
change the req field, before s5p_aes_complete was called.

Change the first parameter of s5p_aes_complete to struct
ablkcipher_request. Before spin_unlock, make a copy of the currently
handled request, to ensure s5p_aes_complete function call with the
correct request.

Signed-off-by: Christoph Manszewski 
---
 drivers/crypto/s5p-sss.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/s5p-sss.c b/drivers/crypto/s5p-sss.c
index faa282074e5a..0cf3f12d8f74 100644
--- a/drivers/crypto/s5p-sss.c
+++ b/drivers/crypto/s5p-sss.c
@@ -475,9 +475,9 @@ static void s5p_sg_done(struct s5p_aes_dev *dev)
 }
 
 /* Calls the completion. Cannot be called with dev->lock hold. */
-static void s5p_aes_complete(struct s5p_aes_dev *dev, int err)
+static void s5p_aes_complete(struct ablkcipher_request *req, int err)
 {
-   dev->req->base.complete(&dev->req->base, err);
+   req->base.complete(&req->base, err);
 }
 
 static void s5p_unset_outdata(struct s5p_aes_dev *dev)
@@ -655,6 +655,7 @@ static irqreturn_t s5p_aes_interrupt(int irq, void *dev_id)
 {
struct platform_device *pdev = dev_id;
struct s5p_aes_dev *dev = platform_get_drvdata(pdev);
+   struct ablkcipher_request *req;
int err_dma_tx = 0;
int err_dma_rx = 0;
int err_dma_hx = 0;
@@ -725,9 +726,10 @@ static irqreturn_t s5p_aes_interrupt(int irq, void *dev_id)
if (err_dma_hx == 1)
s5p_set_dma_hashdata(dev, dev->hash_sg_iter);
 
+   req = dev->req;
spin_unlock_irqrestore(&dev->lock, flags);
 
-   s5p_aes_complete(dev, 0);
+   s5p_aes_complete(req, 0);
/* Device is still busy */
tasklet_schedule(&dev->tasklet);
} else {
@@ -755,8 +757,9 @@ static irqreturn_t s5p_aes_interrupt(int irq, void *dev_id)
if (err_dma_hx == 1)
s5p_set_dma_hashdata(dev, dev->hash_sg_iter);
 
+   req = dev->req;
spin_unlock_irqrestore(&dev->lock, flags);
-   s5p_aes_complete(dev, err);
+   s5p_aes_complete(req, err);
 
 hash_irq_end:
/*
@@ -1983,7 +1986,7 @@ static void s5p_aes_crypt_start(struct s5p_aes_dev *dev, 
unsigned long mode)
s5p_sg_done(dev);
dev->busy = false;
spin_unlock_irqrestore(&dev->lock, flags);
-   s5p_aes_complete(dev, err);
+   s5p_aes_complete(req, err);
 }
 
 static void s5p_tasklet_cb(unsigned long data)
-- 
2.7.4



[PATCH 0/4] crypto: s5p-sss: Fix and add aes-ctr support

2018-09-13 Thread Christoph Manszewski
Hello,

This patch series adds aes-ctr support in s5p-sss.c driver. Additionally it
it provides a fix, and a minor code cleanup.

Patch 1 contains a simple fix, of a possible race condition.
Patches 2-3 are code cleanup patches.
Patch 4 adds support for aes-ctr block cipher mode of operation.

Regards,
Chris

Christoph Manszewski (4):
  crypto: s5p-sss: Fix race in error handling
  crypto: s5p-sss: Fix whitespace issues
  crypto: s5p-sss: Minor code cleanup
  crypto: s5p-sss: Add aes-ctr support

 drivers/crypto/s5p-sss.c | 114 +++
 1 file changed, 66 insertions(+), 48 deletions(-)

-- 
2.7.4



[PATCH 3/4] crypto: s5p-sss: Minor code cleanup

2018-09-13 Thread Christoph Manszewski
Modifications in s5p-sss.c:

- remove unnecessary 'goto' statements,

- change uint_8 and uint_32 to u8 and u32 types,

Signed-off-by: Christoph Manszewski 
---
 drivers/crypto/s5p-sss.c | 54 +++-
 1 file changed, 17 insertions(+), 37 deletions(-)

diff --git a/drivers/crypto/s5p-sss.c b/drivers/crypto/s5p-sss.c
index ce9bd13ea166..817a4b2b71e6 100644
--- a/drivers/crypto/s5p-sss.c
+++ b/drivers/crypto/s5p-sss.c
@@ -249,8 +249,8 @@ struct s5p_aes_reqctx {
 struct s5p_aes_ctx {
struct s5p_aes_dev  *dev;
 
-   uint8_t aes_key[AES_MAX_KEY_SIZE];
-   uint8_t nonce[CTR_RFC3686_NONCE_SIZE];
+   u8  aes_key[AES_MAX_KEY_SIZE];
+   u8  nonce[CTR_RFC3686_NONCE_SIZE];
int keylen;
 };
 
@@ -518,46 +518,28 @@ static int s5p_make_sg_cpy(struct s5p_aes_dev *dev, 
struct scatterlist *src,
 
 static int s5p_set_outdata(struct s5p_aes_dev *dev, struct scatterlist *sg)
 {
-   int err;
-
-   if (!sg->length) {
-   err = -EINVAL;
-   goto exit;
-   }
+   if (!sg->length)
+   return -EINVAL;
 
-   err = dma_map_sg(dev->dev, sg, 1, DMA_FROM_DEVICE);
-   if (!err) {
-   err = -ENOMEM;
-   goto exit;
-   }
+   if (!dma_map_sg(dev->dev, sg, 1, DMA_FROM_DEVICE))
+   return -ENOMEM;
 
dev->sg_dst = sg;
-   err = 0;
 
-exit:
-   return err;
+   return 0;
 }
 
 static int s5p_set_indata(struct s5p_aes_dev *dev, struct scatterlist *sg)
 {
-   int err;
-
-   if (!sg->length) {
-   err = -EINVAL;
-   goto exit;
-   }
+   if (!sg->length)
+   return -EINVAL;
 
-   err = dma_map_sg(dev->dev, sg, 1, DMA_TO_DEVICE);
-   if (!err) {
-   err = -ENOMEM;
-   goto exit;
-   }
+   if (!dma_map_sg(dev->dev, sg, 1, DMA_TO_DEVICE))
+   return -ENOMEM;
 
dev->sg_src = sg;
-   err = 0;
 
-exit:
-   return err;
+   return 0;
 }
 
 /*
@@ -662,8 +644,7 @@ static irqreturn_t s5p_aes_interrupt(int irq, void *dev_id)
bool tx_end = false;
bool hx_end = false;
unsigned long flags;
-   uint32_t status;
-   u32 st_bits;
+   u32 status, st_bits;
int err;
 
spin_lock_irqsave(&dev->lock, flags);
@@ -1833,7 +1814,7 @@ static struct ahash_alg algs_sha1_md5_sha256[] = {
 };
 
 static void s5p_set_aes(struct s5p_aes_dev *dev,
-   const uint8_t *key, const uint8_t *iv,
+   const u8 *key, const u8 *iv,
unsigned int keylen)
 {
void __iomem *keystart;
@@ -1919,7 +1900,7 @@ static int s5p_set_outdata_start(struct s5p_aes_dev *dev,
 static void s5p_aes_crypt_start(struct s5p_aes_dev *dev, unsigned long mode)
 {
struct ablkcipher_request *req = dev->req;
-   uint32_t aes_control;
+   u32 aes_control;
unsigned long flags;
int err;
u8 *iv;
@@ -2027,7 +2008,7 @@ static int s5p_aes_handle_req(struct s5p_aes_dev *dev,
err = ablkcipher_enqueue_request(&dev->queue, req);
if (dev->busy) {
spin_unlock_irqrestore(&dev->lock, flags);
-   goto exit;
+   return err;
}
dev->busy = true;
 
@@ -2035,7 +2016,6 @@ static int s5p_aes_handle_req(struct s5p_aes_dev *dev,
 
tasklet_schedule(&dev->tasklet);
 
-exit:
return err;
 }
 
@@ -2057,7 +2037,7 @@ static int s5p_aes_crypt(struct ablkcipher_request *req, 
unsigned long mode)
 }
 
 static int s5p_aes_setkey(struct crypto_ablkcipher *cipher,
- const uint8_t *key, unsigned int keylen)
+ const u8 *key, unsigned int keylen)
 {
struct crypto_tfm *tfm = crypto_ablkcipher_tfm(cipher);
struct s5p_aes_ctx *ctx = crypto_tfm_ctx(tfm);
-- 
2.7.4



Re: [PATCH 1/5] crypto: arm/aes-ce - enable module autoloading based on CPU feature bits

2018-09-13 Thread Ard Biesheuvel
On 13 September 2018 at 08:24, Stefan Agner  wrote:
> On 10.09.2018 00:01, Ard Biesheuvel wrote:
>> On 10 September 2018 at 08:21, Stefan Agner  wrote:
>>> Hi Ard,
>>>
>>> On 21.05.2017 03:23, Ard Biesheuvel wrote:
 Make the module autoloadable by tying it to the CPU feature bit that
 describes whether the optional instructions it relies on are implemented
 by the current CPU.

>>>
>>> This leads to a compiler warning when compiling multi_v7_defconfig/ARM32
>>> using Clang 6.0.1:
>>>
>>> arch/arm/crypto/aes-ce-glue.c:450:1: warning: variable
>>> 'cpu_feature_match_AES' is not needed and will not
>>>   be emitted [-Wunneeded-internal-declaration]
>>> module_cpu_feature_match(AES, aes_init);
>>>
>>> ./include/linux/cpufeature.h:48:33: note: expanded from macro
>>> 'module_cpu_feature_match'
>>> static struct cpu_feature const cpu_feature_match_ ## x[] = \
>>>
>>> :83:1: note: expanded from here
>>> cpu_feature_match_AES
>>> ^
>>> 1 warning generated.
>>>
>>> Do you happen to have an idea how to alleviate?
>>>
>>
>> I guess this only happens for modules that are selected as builtin,
>> and so MODULE_DEVICE_TABLE() resolves to nothing?
>> Does this only occur for CPU features?
>
> So in the above case CONFIG_ARM_CRYPTO=y, CONFIG_CRYPTO_AES_ARM_CE=m...
>
> Right now I only saw it with CPU features... I remember seen similar issues, 
> which got resolved. Digging in the git history I found 1f318a8bafcf 
> ("modules: mark __inittest/__exittest as __maybe_unused"),
>
> This seems to resolve it:
>
> --- a/include/linux/cpufeature.h
> +++ b/include/linux/cpufeature.h
> @@ -45,7 +45,7 @@
>   * 'asm/cpufeature.h' of your favorite architecture.
>   */
>  #define module_cpu_feature_match(x, __initfunc)\
> -static struct cpu_feature const cpu_feature_match_ ## x[] =\
> +static struct cpu_feature const __maybe_unused cpu_feature_match_ ## x[] = \
> { { .feature = cpu_feature(x) }, { } }; \
>  MODULE_DEVICE_TABLE(cpu, cpu_feature_match_ ## x); \
> \
>
> Also arch/arm/crypto/crc32-ce-glue.c needs an extra __maybe_unused.
>

Yes, that looks like the right approach to me. The difference between
other uses of MODULE_DEVICE_TABLE() is that the second argument
usually gets referenced in some way in the driver struct. It the CPU
feature case, that does not happen and so the struct ends up being
unreferenced when building the code into the kernel.