date:20180807

On Fri, Aug 03, 2018 at 01:37:50PM +0200, Ondrej Mosnacek wrote:
> It turns out I had misunderstood how the x86_match_cpu() function works.
> It evaluates a logical OR of the matching conditions, not logical AND.
> This caused the CPU feature checks for AEGIS to pass even if only SSE2
> (but not AES-NI) was supported (or vice versa), leading to potential
> crashes if something tried to use the registered algs.
> 
> This patch switches the checks to a simpler method that is used e.g. in
> the Camellia x86 code.
> 
> The patch also removes the MODULE_DEVICE_TABLE declarations which
> actually seem to cause the modules to be auto-loaded at boot, which is
> not desired. The crypto API on-demand module loading is sufficient.
> 
> Fixes: 1d373d4e8e15 ("crypto: x86 - Add optimized AEGIS implementations")
> Fixes: 6ecc9d9ff91f ("crypto: x86 - Add optimized MORUS implementations")
> Signed-off-by: Ondrej Mosnacek 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [RFC PATCH cryptodev] crypto: sec_send_request() can be static

On Sat, Aug 04, 2018 at 06:21:01AM +0800, kbuild test robot wrote:
> 
> Fixes: 915e4e8413da ("crypto: hisilicon - SEC security accelerator driver")
> Signed-off-by: kbuild test robot 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH 0/2] crypto: arm64/ghash-ce - performance improvements

On Sat, Aug 04, 2018 at 08:46:23PM +0200, Ard Biesheuvel wrote:
> Another bit of performance work on the GHASH driver: this time it is not
> the combined AES/GCM algorithm but the bare GHASH driver that gets updated.
> 
> Even though ARM cores that implement the polynomical multiplication
> instructions that these routines depend on are guaranteed to also support
> the AES instructions, and can thus use the AES/GCM driver, there could
> be reasons to use the accelerated GHASH in isolation, e.g., with another
> symmetric blockcipher, with a faster h/w accelerator, or potentially with
> an accelerator that does not expose the AES key to the OS.
> 
> The resulting code runs at 1.1 cycles per byte on Cortex-A53 (down from
> 2.4 cycles per byte)
> 
> Ard Biesheuvel (2):
>   crypto: arm64/ghash-ce - replace NEON yield check with block limit
>   crypto: arm64/ghash-ce - implement 4-way aggregation
> 
>  arch/arm64/crypto/ghash-ce-core.S | 153 ++--
>  arch/arm64/crypto/ghash-ce-glue.c |  87 ++-
>  2 files changed, 161 insertions(+), 79 deletions(-)

All applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [RESEND PATCH 2/2] crypto: caam - Support deferred probing in JR dependent drivers

2018-08-07 Thread Horia Geanta

On 8/7/2018 11:00 AM, Marcin Niestroj wrote:
> It is possible, that caam_jr_alloc() is called before JR devices are
> probed. Return -EPROBE_DEFER in drivers that rely on JR devices, so
> they are probed at later stage.
> 
These drivers don't have a probe() callback.
Returning -EPROBE_DEFER in module's init() (caamrng) or crypto algorithm's
init() (caamalg etc.) callbacks is not going to help, they won't be called 
later.

Does adding request_module("caam_jr") in module's init() solve the issue?

Regards,
Horia

[PATCH] crypto: arm64/sm4-ce - check for the right CPU feature bit

2018-08-07 Thread Ard Biesheuvel

ARMv8.2 specifies special instructions for the SM3 cryptographic hash
and the SM4 symmetric cipher. While it is unlikely that a core would
implement one and not the other, we should only use SM4 instructions
if the SM4 CPU feature bit is set, and we currently check the SM3
feature bit instead. So fix that.

Signed-off-by: Ard Biesheuvel 
---
It would be good to get this backported to -stable but there is no
need to merge this as a fix at -rc8

 arch/arm64/crypto/sm4-ce-glue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/crypto/sm4-ce-glue.c b/arch/arm64/crypto/sm4-ce-glue.c
index b7fb5274b250..0c4fc223f225 100644
--- a/arch/arm64/crypto/sm4-ce-glue.c
+++ b/arch/arm64/crypto/sm4-ce-glue.c
@@ -69,5 +69,5 @@ static void __exit sm4_ce_mod_fini(void)
crypto_unregister_alg(&sm4_ce_alg);
 }
 
-module_cpu_feature_match(SM3, sm4_ce_mod_init);
+module_cpu_feature_match(SM4, sm4_ce_mod_init);
 module_exit(sm4_ce_mod_fini);
-- 
2.18.0

[PATCH -next] crypto: hisilicon - Make function sec_send_request() static

2018-08-07 Thread Wei Yongjun

Fixes the following sparse warning:

drivers/crypto/hisilicon/sec/sec_algs.c:396:5: warning:
 symbol 'sec_send_request' was not declared. Should it be static?

Fixes: 915e4e8413da ("crypto: hisilicon - SEC security accelerator driver")
Signed-off-by: Wei Yongjun 
---
 drivers/crypto/hisilicon/sec/sec_algs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/hisilicon/sec/sec_algs.c 
b/drivers/crypto/hisilicon/sec/sec_algs.c
index d69d3ce..03ba1df 100644
--- a/drivers/crypto/hisilicon/sec/sec_algs.c
+++ b/drivers/crypto/hisilicon/sec/sec_algs.c
@@ -393,7 +393,8 @@ static void sec_alg_free_el(struct sec_request_el *el,
 }
 
 /* queuelock must be held */
-int sec_send_request(struct sec_request *sec_req, struct sec_queue *queue)
+static int sec_send_request(struct sec_request *sec_req,
+   struct sec_queue *queue)
 {
struct sec_request_el *el, *temp;
int ret = 0;

Re: [PATCH 0/4][RFC v2] Introduce the in-kernel hibernation encryption

2018-08-07 Thread Yu Chen

On Mon, Aug 06, 2018 at 12:20:20PM +0200, Oliver Neukum wrote:
> On Mo, 2018-08-06 at 15:57 +0800, Yu Chen wrote:
> > Hi Oliver,
> > On Thu, Jul 26, 2018 at 09:30:46AM +0200, Oliver Neukum wrote:
> > > On Di, 2018-07-24 at 00:23 +0800, Yu Chen wrote:
> > > > 
> > > > Good point, we once tried to generate key in kernel, but people
> > > > suggest to generate key in userspace and provide it to the
> > > > kernel, which is what ecryptfs do currently, so it seems this
> > > > should also be safe for encryption in kernel.
> > > > https://www.spinics.net/lists/linux-crypto/msg33145.html
> > > > Thus Chun-Yi's signature can use EFI key and both the key from
> > > > user space.
> > > 
> > > Hi,
> > > 
> > > ecryptfs can trust user space. It is supposed to keep data
> > > safe while the system is inoperative.
> > 
> > Humm, I did not quite get the point here, let's take fscrypt
> 
> While the system is running and the fs is mounted, your data
> is as secure as root access to your machine, right? You encrypt
> a disk primarily so data cannot be recovered (and altered) while
> the system is not running.
>
> Secure Boot does not trust root fully. There is a cryptographic
> chain of trust and user space is not part of it.
>
Okay, I see. So if we want to use secure boot mechanism for
hibernation encryption, user space is trusted.
> > for example, the kernel gets user generated key from user space,
> > and uses per-inode nonce(random bytes) as the master key to
> > do a KDF(key derivation function) on user provided key, and uses
> > that key for encryption. We can also added similar mechanism
> > to generate the key in kernel space but the key should be
> > original from user's provided key(password derived), because
> > the security boot/signature mechanism could not cover the case
> > that, two different users could resume to each other's context
> > because there isn't any certification during resume if it is
> > on the same physical hardware.
> 
> Please explain. You will always have to suspend the whole machine
> with all tasks of all users. And STD with Secure Boot need not
> imply that you encrypt your discs. You need to encrypt only
> kernel memory to meet the requirements.
> 
> As STD affects the whole machine it must require root rights.
> So I cannot see how you can talk about a session belonging
> to a user. Please explain.
> 
The case is for physical access, not the 'user' in OS.
> It seems to me that you can in theory encrypt the password
> by a key coming from user space, so that you need to know
> an additional key to resume the system, but that seems to me
> above and beyond what Secure Boot requires.
> 
Understand.

Best,
Yu
>   Regards
>   Oliver
>

Re: [PATCH 0/4][RFC v2] Introduce the in-kernel hibernation encryption

2018-08-07 Thread Yu Chen

On Mon, Aug 06, 2018 at 06:39:58PM +0800, joeyli wrote:
> On Mon, Aug 06, 2018 at 04:45:34PM +0800, Yu Chen wrote:
> > Hi Pavel,
> > On Sun, Aug 05, 2018 at 12:02:00PM +0200, Pavel Machek wrote:
> > > Hi!
> > > 
> > > > > User space doesn't need to involve. The EFI root key is generated by
> > > > > EFI boot stub and be transfer to kernel. It's stored in EFI boot 
> > > > > service
> > > > > variable that it can only be accessed by trusted EFI binary when
> > > > > secure boot is enabled.
> > > > >
> > > > Okay, this apply to the 'suspend' phase, right?
> > > > I'm still a little confused about the 'resume' phase.
> > > > Taking encryption as example(not signature),
> > > > the purpose of doing hibernation encryption is to prevent other users
> > > > from stealing ram content. Say, user A uses a  passphrase to generate 
> > > > the
> > > 
> > > No, I don't think that's purpose here.
> > > 
> > > Purpose here is to prevent user from reading/modifying kernel memory
> > > content on machine he owns.
> > >
> > Say, A puts his laptop into hibernation and walks away,
> > and B walks by, and opens A's laptop and wakes up the system and he
> > can do what he wants. Although EFI key/TPM trusted key is enabled,
> > currently there's no certification during resume, which sounds
> > unsafe to me. Afterall, the original requirement is to probe
> > user for password during resume, which sounds more natural.
> 
> OK, I saw your case. This is a physical accessing.
> 
> I have a question: The suspend to memory also has the same behavior
> and more people are using suspend. Should we think a common solution
> to cover S3 and S4? 
>
Since STD behaves more likely a boot up, STR does not have solid
requirement for certification.
> > > Strange as it may sound, that is what "secure" boot requires (and what
> > > Disney wants).
> > > 
> > Ok, I understand this requirement, and I'm also concerning how to
> > distinguish different users from seeing data of each other.
> > 
> > Joey,
> > I'm thinking of a possible direction which could take advantage
> > of the password.  It leverages either EFI key or TPM
> > trusted key to get it done. Does it make sense?
> > 
> > 1. The user space generates a symetric key key_user using
> >the password, and passes the key_user to the kernel as the master
> >key.
> > 2. The kernel uses the EFI key or TPM trusted key to encrypt
> >the key_user thus gets a encrypt_key.
> > 3. Uses the encrypt_key to do snapshot encryption
> > 4. During resume, the same encrypt_key is generated following
> >the same steps(I assume the same EFI key or TPM key could be fetched
> >during resumed, right?) and do the snapshot decryption.
> >
> 
> Yes, we can use TPM key to protect the user key. But I suggest that we
> should give user a function to disable the user key because not everyone
> want to key-in a password for hibernate/resume and also snapshot image
> encryption.
> 
> Two policies:
> - When user key-in user key, the snapshot image must be encryption.
> - Without key-in user key, I still want the snapshot image can be encryption.
> 
> No matter that the user key be key-in or not, the snapshot image must be
> encrypted by a kernel key. So I suggest that we treat the user key as a salt
> for snapshot image encryption and authentication. If the user key
> be disabled, then kernel just generates a random number as a salt.
> 
> Actually, the kernel must compares the user key before snapshot decryption.
> If the user key doesn't match but user space still triggers furture resume
> process. Then kernel direct drops the snapshot image. 
>  
Anyway I'm ok with using TPM for major 'security', please feel free
to send a second version out, and for certification implementation
we can have further discussion on that later.

Best,
Yu
> > And this is what fscrypt is doing:
> > Documentation/filesystems/fscrypt.rst
> >
> The use case is different. We have two key for two purposes. And the two
> functions can be separated.
> 
> Thanks
> Joey Lee

Re: [PATCH 0/4][RFC v2] Introduce the in-kernel hibernation encryption

2018-08-07 Thread Ryan Chen

On Tue, Aug 7, 2018 at 3:33 PM Yu Chen  wrote:
>
> On Mon, Aug 06, 2018 at 12:20:20PM +0200, Oliver Neukum wrote:
> > On Mo, 2018-08-06 at 15:57 +0800, Yu Chen wrote:
> > > Hi Oliver,
> > > On Thu, Jul 26, 2018 at 09:30:46AM +0200, Oliver Neukum wrote:
> > > > On Di, 2018-07-24 at 00:23 +0800, Yu Chen wrote:
> > > > >
> > > > > Good point, we once tried to generate key in kernel, but people
> > > > > suggest to generate key in userspace and provide it to the
> > > > > kernel, which is what ecryptfs do currently, so it seems this
> > > > > should also be safe for encryption in kernel.
> > > > > https://www.spinics.net/lists/linux-crypto/msg33145.html
> > > > > Thus Chun-Yi's signature can use EFI key and both the key from
> > > > > user space.
> > > >
> > > > Hi,
> > > >
> > > > ecryptfs can trust user space. It is supposed to keep data
> > > > safe while the system is inoperative.
> > >
> > > Humm, I did not quite get the point here, let's take fscrypt
> >
> > While the system is running and the fs is mounted, your data
> > is as secure as root access to your machine, right? You encrypt
> > a disk primarily so data cannot be recovered (and altered) while
> > the system is not running.
> >
> > Secure Boot does not trust root fully. There is a cryptographic
> > chain of trust and user space is not part of it.
> >
> Okay, I see. So if we want to use secure boot mechanism for
> hibernation encryption, user space is trusted.
s/ is trusted/is not trusted/

Re: [PATCH 1/9] crypto: add zbufsize() interface

On Thu, Aug 02, 2018 at 02:51:10PM -0700, Kees Cook wrote:
> When pstore was refactored to use the crypto compress API in:
> 
>   commit cb3bee0369bc ("pstore: Use crypto compress API")
> 
> nearly all the pstore-specific compression routines were replaced with
> the existing crypto compression API. One case remained: calculating the
> "worst case" compression sizes ahead of time so it could have a buffer
> preallocated for doing compression (which was called "zbufsize").
> 
> To make pstore fully algorithm-agnostic, the compression API needs to
> grow this functionality. This adds the interface to support querying the
> "worst case" estimate, with a new "zbufsize" routine that each compressor
> can implement. The per-compressor implementations come in later commits.
> 
> Signed-off-by: Kees Cook 
> ---
>  crypto/compress.c   |  9 +
>  include/crypto/internal/scompress.h | 11 +++
>  include/linux/crypto.h  | 12 
>  3 files changed, 32 insertions(+)
> 
> diff --git a/crypto/compress.c b/crypto/compress.c
> index f2d522924a07..29a80bb3b9d3 100644
> --- a/crypto/compress.c
> +++ b/crypto/compress.c
> @@ -33,12 +33,21 @@ static int crypto_decompress(struct crypto_tfm *tfm,
>  dlen);
>  }
>  
> +static int crypto_zbufsize(struct crypto_tfm *tfm,
> +unsigned int slen, unsigned int *dlen)
> +{
> + if (!tfm->__crt_alg->cra_compress.coa_zbufsize)
> + return -ENOTSUPP;
> + return tfm->__crt_alg->cra_compress.coa_zbufsize(tfm, slen, dlen);
> +}

Please don't add new features to the old compress interface.  Any
new improvements should be added to scomp/acomp only.  Users who
need new features should be converted.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH v7 2/9] crypto: cbc: Remove VLA usage

On Thu, Aug 02, 2018 at 03:51:45PM -0700, Kees Cook wrote:
> In the quest to remove all stack VLA usage from the kernel[1], this
> uses the upper bounds on blocksize. Since this is always a cipher
> blocksize, use the existing cipher max blocksize.
> 
> [1] 
> https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com
> 
> Signed-off-by: Kees Cook 
> ---
>  include/crypto/cbc.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/crypto/cbc.h b/include/crypto/cbc.h
> index f5b8bfc22e6d..47db0aac2ab9 100644
> --- a/include/crypto/cbc.h
> +++ b/include/crypto/cbc.h
> @@ -113,7 +113,9 @@ static inline int crypto_cbc_decrypt_inplace(
>   unsigned int bsize = crypto_skcipher_blocksize(tfm);
>   unsigned int nbytes = walk->nbytes;
>   u8 *src = walk->src.virt.addr;
> - u8 last_iv[bsize];
> + u8 last_iv[MAX_CIPHER_BLOCKSIZE];
> +
> + BUG_ON(bsize > sizeof(last_iv));

Ugh, please don't add these BUG_ONs.  Add them to the places where
the algorithm is created (if they aren't checking that already).

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH][RESEND] lib/mpi: remove redundant variable esign

On Mon, Jul 30, 2018 at 09:59:36AM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> Variable esign is being assigned but is never used hence it is
> redundant and can be removed.
> 
> Cleans up clang warning:
> warning: variable 'esign' set but not used [-Wunused-but-set-variable]
> 
> Signed-off-by: Colin Ian King 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH 0/4][RFC v2] Introduce the in-kernel hibernation encryption

2018-08-07 Thread Oliver Neukum

On Di, 2018-08-07 at 15:38 +0800, Yu Chen wrote:
> > As STD affects the whole machine it must require root rights.
> > So I cannot see how you can talk about a session belonging
> > to a user. Please explain.
> > 
> 
> The case is for physical access, not the 'user' in OS.

Well, yes, but Secure Boot does not guard against anybody
booting or halting the machine. It limits what you can
boot by a chain of trust.

I think you are trying to add a feature to Secure Boot.

Regards
Oliver

Re: [RFC PATCH 3/9] crypto: chacha20-generic - refactor to allow varying number of rounds

2018-08-07 Thread Samuel Neves

> The best attack on ChaCha breaks 7 rounds, and that attack requires 2^248 
> operations.

This number, as far as I can tell, comes from the "New features of
Latin dances" paper. There have been some minor improvements in the
intervening 10 years, e.g., [1, 2, 3, 4], which pull back the
complexity of breaking ChaCha7 down to 2^235. In any case, every
attack so far appears to hit a wall at 8 rounds, with 12 rounds---the
recommended eSTREAM round number for Salsa20---seeming to offer a
reasonable security margin, still somewhat better than that of the
AES.

Best regards,
Samuel Neves

[1] https://eprint.iacr.org/2015/698
[2] https://eprint.iacr.org/2015/217
[3] https://eprint.iacr.org/2016/1034
[4] https://doi.org/10.1016/j.dam.2017.04.034

Re: [RFC PATCH 8/9] crypto: arm/poly1305 - add NEON accelerated Poly1305 implementation

2018-08-07 Thread Ard Biesheuvel

On 7 August 2018 at 00:32, Eric Biggers  wrote:
> From: Eric Biggers 
>
> Add the Poly1305 code from OpenSSL, which was written by Andy Polyakov.
> I took the .S file from WireGuard, whose author has made the needed
> tweaks for Linux kernel integration and verified that Andy had given
> permission for GPLv2 distribution.  I didn't make any additional changes
> to the .S file.
>
> Note, for HPolyC I'd eventually like a Poly1305 implementation that
> allows precomputing powers of the key.  But for now this implementation
> just provides the existing semantics where the key and nonce are treated
> as a "one-time key" that must be provided for every message.
>
> Signed-off-by: Eric Biggers 

Hi Eric,

In the past, I worked with Andy on several occasions to get my kernel
changes incorporated into the upstream OpenSSL version of the
'perlasm' .pl file.

This achieves a number of things:
- we get a readable version of the code in the kernel tree,
- our changes are reviewed upstream
- upgrading involves grabbing the latest .pl file rather than merging
generated code (which requires careful review)
- GPLv2 permission is made explicit, rather than something someone
claims to have reached agreement on,
- no legal ambiguity whether the output of the perl script is covered
by the license (which is what we incorporate here)

Note that the 'available under GPL depending on where you obtained the
code' in the CRYPTOGAMS license likely conflicts with the GPL itself,
but I am not a lawyer so I'd much prefer having the upstream copy
mention this explicitly.

Re: [PATCH] crypto: remove speck

2018-08-07 Thread Ard Biesheuvel

On 7 August 2018 at 05:15, Theodore Y. Ts'o  wrote:
> On Mon, Aug 06, 2018 at 08:12:38PM -0700, Eric Biggers wrote:
>> I mention this because people are naturally going to be curious about that, 
>> e.g.
>> speculating that Google found a "backdoor" -- remember that we do have some 
>> good
>> cryptographers!  I'm just stating what we know, out of honesty and openness; 
>> I
>> don't really intend to be arguing for Speck with this statement, and in any 
>> case
>> we already made the decision to not use Speck.
>
> Let's be clear --- the arguments about whether or not to use Speck,
> and whether or not to remove Speck from the kernel, are purely
> political --- not techinical.
>

Whether or not to use it may be a political rather than a technical
motivation. But the reason I acked this patch is not because removing
it aligns with my political conviction regarding Speck, but simply
because its contributor, primary intended user and therefore de facto
maintainer stated publicly that it no longer had any intention to use
it going forward.

Re: [PATCH 0/4][RFC v2] Introduce the in-kernel hibernation encryption

2018-08-07 Thread joeyli

On Tue, Aug 07, 2018 at 03:43:12PM +0800, Yu Chen wrote:
> On Mon, Aug 06, 2018 at 06:39:58PM +0800, joeyli wrote:
> > On Mon, Aug 06, 2018 at 04:45:34PM +0800, Yu Chen wrote:
> > > Hi Pavel,
> > > On Sun, Aug 05, 2018 at 12:02:00PM +0200, Pavel Machek wrote:
> > > > Hi!
> > > > 
> > > > > > User space doesn't need to involve. The EFI root key is generated by
> > > > > > EFI boot stub and be transfer to kernel. It's stored in EFI boot 
> > > > > > service
> > > > > > variable that it can only be accessed by trusted EFI binary when
> > > > > > secure boot is enabled.
> > > > > >
> > > > > Okay, this apply to the 'suspend' phase, right?
> > > > > I'm still a little confused about the 'resume' phase.
> > > > > Taking encryption as example(not signature),
> > > > > the purpose of doing hibernation encryption is to prevent other users
> > > > > from stealing ram content. Say, user A uses a  passphrase to generate 
> > > > > the
> > > > 
> > > > No, I don't think that's purpose here.
> > > > 
> > > > Purpose here is to prevent user from reading/modifying kernel memory
> > > > content on machine he owns.
> > > >
> > > Say, A puts his laptop into hibernation and walks away,
> > > and B walks by, and opens A's laptop and wakes up the system and he
> > > can do what he wants. Although EFI key/TPM trusted key is enabled,
> > > currently there's no certification during resume, which sounds
> > > unsafe to me. Afterall, the original requirement is to probe
> > > user for password during resume, which sounds more natural.
> > 
> > OK, I saw your case. This is a physical accessing.
> > 
> > I have a question: The suspend to memory also has the same behavior
> > and more people are using suspend. Should we think a common solution
> > to cover S3 and S4? 
> >
> Since STD behaves more likely a boot up, STR does not have solid
> requirement for certification.

In your A/B user case, when STR, B user can still open A's laptop
and wakes up the system and do what he wants because he can get
the console like resume from STD. I didn't see difference.

> > > > Strange as it may sound, that is what "secure" boot requires (and what
> > > > Disney wants).
> > > > 
> > > Ok, I understand this requirement, and I'm also concerning how to
> > > distinguish different users from seeing data of each other.
> > > 
> > > Joey,
> > > I'm thinking of a possible direction which could take advantage
> > > of the password.  It leverages either EFI key or TPM
> > > trusted key to get it done. Does it make sense?
> > > 
> > > 1. The user space generates a symetric key key_user using
> > >the password, and passes the key_user to the kernel as the master
> > >key.
> > > 2. The kernel uses the EFI key or TPM trusted key to encrypt
> > >the key_user thus gets a encrypt_key.
> > > 3. Uses the encrypt_key to do snapshot encryption
> > > 4. During resume, the same encrypt_key is generated following
> > >the same steps(I assume the same EFI key or TPM key could be fetched
> > >during resumed, right?) and do the snapshot decryption.
> > >
> > 
> > Yes, we can use TPM key to protect the user key. But I suggest that we
> > should give user a function to disable the user key because not everyone
> > want to key-in a password for hibernate/resume and also snapshot image
> > encryption.
> > 
> > Two policies:
> > - When user key-in user key, the snapshot image must be encryption.
> > - Without key-in user key, I still want the snapshot image can be 
> > encryption.
> > 
> > No matter that the user key be key-in or not, the snapshot image must be
> > encrypted by a kernel key. So I suggest that we treat the user key as a salt
> > for snapshot image encryption and authentication. If the user key
> > be disabled, then kernel just generates a random number as a salt.
> > 
> > Actually, the kernel must compares the user key before snapshot decryption.
> > If the user key doesn't match but user space still triggers furture resume
> > process. Then kernel direct drops the snapshot image. 
> >  
> Anyway I'm ok with using TPM for major 'security', please feel free
> to send a second version out, and for certification implementation
> we can have further discussion on that later.
> 

Regards
Joey Lee

Re: [PATCH 1/9] crypto: add zbufsize() interface

On Tue, Aug 7, 2018 at 2:45 AM, Herbert Xu  wrote:
> On Thu, Aug 02, 2018 at 02:51:10PM -0700, Kees Cook wrote:
>> When pstore was refactored to use the crypto compress API in:
>>
>>   commit cb3bee0369bc ("pstore: Use crypto compress API")
>>
>> nearly all the pstore-specific compression routines were replaced with
>> the existing crypto compression API. One case remained: calculating the
>> "worst case" compression sizes ahead of time so it could have a buffer
>> preallocated for doing compression (which was called "zbufsize").
>>
>> To make pstore fully algorithm-agnostic, the compression API needs to
>> grow this functionality. This adds the interface to support querying the
>> "worst case" estimate, with a new "zbufsize" routine that each compressor
>> can implement. The per-compressor implementations come in later commits.
>>
>> Signed-off-by: Kees Cook 
>> ---
>>  crypto/compress.c   |  9 +
>>  include/crypto/internal/scompress.h | 11 +++
>>  include/linux/crypto.h  | 12 
>>  3 files changed, 32 insertions(+)
>>
>> diff --git a/crypto/compress.c b/crypto/compress.c
>> index f2d522924a07..29a80bb3b9d3 100644
>> --- a/crypto/compress.c
>> +++ b/crypto/compress.c
>> @@ -33,12 +33,21 @@ static int crypto_decompress(struct crypto_tfm *tfm,
>>  dlen);
>>  }
>>
>> +static int crypto_zbufsize(struct crypto_tfm *tfm,
>> +unsigned int slen, unsigned int *dlen)
>> +{
>> + if (!tfm->__crt_alg->cra_compress.coa_zbufsize)
>> + return -ENOTSUPP;
>> + return tfm->__crt_alg->cra_compress.coa_zbufsize(tfm, slen, dlen);
>> +}
>
> Please don't add new features to the old compress interface.  Any
> new improvements should be added to scomp/acomp only.  Users who
> need new features should be converted.

So, keep crypto_scomp_zbufsize() and drop crypto_comp_zbufsize() and
crypto_zbufsize()? Should I add crypto_acomp_zbufsize()?

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v1 2/3] zinc: Introduce minimal cryptography library

2018-08-07 Thread Jason A. Donenfeld

Hey Andy, Eric, & all,

I've started the work of separating this out into 16 individual
commits, have addressed numerous other things brought up like the
ifdef maze, and have begun rewriting (parts of) the original commit
message. It's still a work in progress, and I still have some work to
do, but if you want to follow along, things are happening here:

https://git.zx2c4.com/linux-dev/log/?h=jd/wireguard

I'll be rebasing and reworking this continuously, but that's how it's
shaping up.

As I'm still working on it, I won't be submitting v2 today, but if you
have suggestions or concerns or want to poke me while I'm working on
v2, don't hesitate to send me private email or ping me in IRC (I'm
"zx2c4" there) to chat.

Regards,
Jason

Re: [PATCH v1 2/3] zinc: Introduce minimal cryptography library

2018-08-07 Thread Andy Lutomirski

On Tue, Aug 7, 2018 at 11:54 AM, Jason A. Donenfeld  wrote:
> Hey Andy, Eric, & all,
>
> I've started the work of separating this out into 16 individual
> commits, have addressed numerous other things brought up like the
> ifdef maze, and have begun rewriting (parts of) the original commit
> message. It's still a work in progress, and I still have some work to
> do, but if you want to follow along, things are happening here:
>
> https://git.zx2c4.com/linux-dev/log/?h=jd/wireguard
>
> I'll be rebasing and reworking this continuously, but that's how it's
> shaping up.
>
> As I'm still working on it, I won't be submitting v2 today, but if you
> have suggestions or concerns or want to poke me while I'm working on
> v2, don't hesitate to send me private email or ping me in IRC (I'm
> "zx2c4" there) to chat.
>
> Regards,
> Jason

For "zinc: add simd helper", I think it should be in include/linux,
and include/linux/simd.h should (immediately or maybe in the future)
include  to pick up arch-specific stuff.  And the patch
should get sent to linux-a...@vger.kernel.org.

In your blake2s_arch() implementation, you're not passing in a
simd_context_t.  Is that still a work in progress?  I thought the plan
was to pass it in rather than doing the check in the _arch()
functions.

Re: [PATCH] crypto: remove speck

Hi Jeffrey,

On Mon, Aug 06, 2018 at 09:03:27PM -0400, Jeffrey Walton wrote:
> On Mon, Aug 6, 2018 at 7:04 PM, Jason A. Donenfeld  wrote:
> > These are unused, undesired, and have never actually been used by
> > anybody. The original authors of this code have changed their mind about
> > its inclusion. Therefore, this patch removes it.
> 
> I think it may be unwise to completely discard Speck for several
> reasons. The two biggest pain points for me are:
> 
>   - political concerns addressed by other ciphers
>   - high quality lightweight block cipher implementation
>   - some regulated industries will need it for their problem domains
> 
> It seems to me the political concerns were addressed by not using
> Speck for Android. I don't believe HPolyC and Speck are orthogonal.
> Instead they provide the user with a choice which is usually a good
> thing.
> 
> I also think allowing politics a heavy hand endangers other ciphers
> like SM3 and SM4. I would advise against removing them just because
> they are Chinese ciphers. I suppose the same could be argued for North
> Korea and Jipsam and Pilsung (if North Korea ever offers their
> ciphers).
> 
> I think Eric, Ard and other contributions lead to a high quality
> implementation of Speck. High quality implementations that "just
> works" everywhere on multiple platforms are rather hard to come by.
> The kernel's unified implementation ensures lots of folks don't go
> making lots of mistakes when rolling their own.
> 
> There are verticals that will need a choice or alternative like Speck.
> US Aerospace, US Automotive and US Hoteliers come to mind. US
> Financial my use them too (they having some trading platforms with
> absurd requirements that make Simon and Speck appear bloated and
> overweight).  Some of the verticals are going to need an alternative
> that meets technical and security goals and pass the audits.
> 
> Choice is a good thing. Users need choices for technical, regulatory
> and legal reasons.
> 

This is about the Linux kernel, though.  The purpose of the Linux kernel's
crypto API is to allow kernel code to do crypto, and also sometimes to allow
access to crypto accelerator hardware.  It's *not* to provide a comprehensive
collection of algorithms for userspace programs to use, or to provide reference
implementations for crypto algorithms.  Before our change in plans, we needed
Speck-XTS in the kernel so that it could be used in dm-crypt and fscrypt, which
are kernel features and therefore require in-kernel implementations.  And of
course, our proposed new solution, HPolyC, will need to be supported in the
kernel too for the same reason.  It's just the way transparent disk and file
encryption works; the crypto needs to be done in the kernel.  But do your other
mentioned use cases actually need Speck to be in the Linux kernel?

I doubt it, in general.  Userspace applications of Speck seem likely to use
their own implementation.  Remember, Speck is extraordinarily simple, and the
original paper has example C code.  So if someone really wanted to use Speck in
a userspace application, they're likely to just add an implementation directly
to their application, rather than coding up a Linux-specific thing using AF_ALG.
Or they'd use a Speck implementation from a library like Crypto++.

And while I think it's clear that the reasons why a significant number of people
don't want to *use* Speck (even in legitimate use cases) are heavily political,
that doesn't necessarily mean that the reason for removing Speck from Linux
needs to be as political.  If there's code in the kernel with no known users
anymore, whether it's a crypto algorithm or something like an obsolete CPU
architecture, then it's eligible to be removed so that it doesn't need to be
maintained -- potentially even if it breaks the userspace ABI (since it's not
really broken if no one cares)...  Note that Skein is another example of a
crypto algorithm that was recently removed from the kernel because no one was
using it, though to be fair it was in the staging directory too.  It's still a
bit of a double standard as there are likely other ciphers in the kernel that
actually no one is using either, but it's a valid argument...

Thanks,

- Eric

Re: [PATCH v7 2/9] crypto: cbc: Remove VLA usage

On Tue, Aug 7, 2018 at 2:47 AM, Herbert Xu  wrote:
> On Thu, Aug 02, 2018 at 03:51:45PM -0700, Kees Cook wrote:
>> In the quest to remove all stack VLA usage from the kernel[1], this
>> uses the upper bounds on blocksize. Since this is always a cipher
>> blocksize, use the existing cipher max blocksize.
>>
>> [1] 
>> https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com
>>
>> Signed-off-by: Kees Cook 
>> ---
>>  include/crypto/cbc.h | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/crypto/cbc.h b/include/crypto/cbc.h
>> index f5b8bfc22e6d..47db0aac2ab9 100644
>> --- a/include/crypto/cbc.h
>> +++ b/include/crypto/cbc.h
>> @@ -113,7 +113,9 @@ static inline int crypto_cbc_decrypt_inplace(
>>   unsigned int bsize = crypto_skcipher_blocksize(tfm);
>>   unsigned int nbytes = walk->nbytes;
>>   u8 *src = walk->src.virt.addr;
>> - u8 last_iv[bsize];
>> + u8 last_iv[MAX_CIPHER_BLOCKSIZE];
>> +
>> + BUG_ON(bsize > sizeof(last_iv));
>
> Ugh, please don't add these BUG_ONs.  Add them to the places where
> the algorithm is created (if they aren't checking that already).

It's already being checked (cra_blocksize vs MAX_CIPHER_BLOCKSIZE) so
I was just adding the BUG_ON to catch "impossible" behavior. I'll
leave it out in the next revision.

-Kees

-- 
Kees Cook
Pixel Security

[PATCH v8 3/9] crypto: ccm: Remove VLA usage

From: Ard Biesheuvel 

In the quest to remove all stack VLA usage from the kernel[1], this drops
AHASH_REQUEST_ON_STACK by preallocating the ahash request area combined
with the skcipher area (which are not used at the same time).

[1] 
https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com

Signed-off-by: Ard Biesheuvel 
Signed-off-by: Kees Cook 
---
 crypto/ccm.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/crypto/ccm.c b/crypto/ccm.c
index 0a083342ec8c..b242fd0d3262 100644
--- a/crypto/ccm.c
+++ b/crypto/ccm.c
@@ -50,7 +50,10 @@ struct crypto_ccm_req_priv_ctx {
u32 flags;
struct scatterlist src[3];
struct scatterlist dst[3];
-   struct skcipher_request skreq;
+   union {
+   struct ahash_request ahreq;
+   struct skcipher_request skreq;
+   };
 };
 
 struct cbcmac_tfm_ctx {
@@ -181,7 +184,7 @@ static int crypto_ccm_auth(struct aead_request *req, struct 
scatterlist *plain,
struct crypto_ccm_req_priv_ctx *pctx = crypto_ccm_reqctx(req);
struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct crypto_ccm_ctx *ctx = crypto_aead_ctx(aead);
-   AHASH_REQUEST_ON_STACK(ahreq, ctx->mac);
+   struct ahash_request *ahreq = &pctx->ahreq;
unsigned int assoclen = req->assoclen;
struct scatterlist sg[3];
u8 *odata = pctx->odata;
@@ -427,7 +430,7 @@ static int crypto_ccm_init_tfm(struct crypto_aead *tfm)
crypto_aead_set_reqsize(
tfm,
align + sizeof(struct crypto_ccm_req_priv_ctx) +
-   crypto_skcipher_reqsize(ctr));
+   max(crypto_ahash_reqsize(mac), crypto_skcipher_reqsize(ctr)));
 
return 0;
 
-- 
2.17.1

[PATCH v8 5/9] dm: Remove VLA usage from hashes

In the quest to remove all stack VLA usage from the kernel[1], this uses
the new HASH_MAX_DIGESTSIZE from the crypto layer to allocate the upper
bounds on stack usage.

[1] 
https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com

Signed-off-by: Kees Cook 
---
 drivers/md/dm-integrity.c  | 23 +--
 drivers/md/dm-verity-fec.c |  5 -
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
index 86438b2f10dd..884edd7cf1d0 100644
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -521,7 +521,12 @@ static void section_mac(struct dm_integrity_c *ic, 
unsigned section, __u8 result
}
memset(result + size, 0, JOURNAL_MAC_SIZE - size);
} else {
-   __u8 digest[size];
+   __u8 digest[HASH_MAX_DIGESTSIZE];
+
+   if (WARN_ON(size > sizeof(digest))) {
+   dm_integrity_io_error(ic, "digest_size", -EINVAL);
+   goto err;
+   }
r = crypto_shash_final(desc, digest);
if (unlikely(r)) {
dm_integrity_io_error(ic, "crypto_shash_final", r);
@@ -1244,7 +1249,7 @@ static void integrity_metadata(struct work_struct *w)
struct bio *bio = dm_bio_from_per_bio_data(dio, sizeof(struct 
dm_integrity_io));
char *checksums;
unsigned extra_space = unlikely(digest_size > ic->tag_size) ? 
digest_size - ic->tag_size : 0;
-   char checksums_onstack[ic->tag_size + extra_space];
+   char checksums_onstack[HASH_MAX_DIGESTSIZE];
unsigned sectors_to_process = dio->range.n_sectors;
sector_t sector = dio->range.logical_sector;
 
@@ -1253,8 +1258,14 @@ static void integrity_metadata(struct work_struct *w)
 
checksums = kmalloc((PAGE_SIZE >> SECTOR_SHIFT >> 
ic->sb->log2_sectors_per_block) * ic->tag_size + extra_space,
GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN);
-   if (!checksums)
+   if (!checksums) {
checksums = checksums_onstack;
+   if (WARN_ON(extra_space &&
+   digest_size > sizeof(checksums_onstack))) {
+   r = -EINVAL;
+   goto error;
+   }
+   }
 
__bio_for_each_segment(bv, bio, iter, dio->orig_bi_iter) {
unsigned pos;
@@ -1466,7 +1477,7 @@ static bool __journal_read_write(struct dm_integrity_io 
*dio, struct bio *bio,
} while (++s < ic->sectors_per_block);
 #ifdef INTERNAL_VERIFY
if (ic->internal_hash) {
-   char 
checksums_onstack[max(crypto_shash_digestsize(ic->internal_hash), 
ic->tag_size)];
+   char 
checksums_onstack[max(HASH_MAX_DIGESTSIZE, MAX_TAG_SIZE)];
 
integrity_sector_checksum(ic, 
logical_sector, mem + bv.bv_offset, checksums_onstack);
if (unlikely(memcmp(checksums_onstack, 
journal_entry_tag(ic, je), ic->tag_size))) {
@@ -1516,7 +1527,7 @@ static bool __journal_read_write(struct dm_integrity_io 
*dio, struct bio *bio,
if (ic->internal_hash) {
unsigned digest_size = 
crypto_shash_digestsize(ic->internal_hash);
if (unlikely(digest_size > 
ic->tag_size)) {
-   char 
checksums_onstack[digest_size];
+   char 
checksums_onstack[HASH_MAX_DIGESTSIZE];
integrity_sector_checksum(ic, 
logical_sector, (char *)js, checksums_onstack);
memcpy(journal_entry_tag(ic, 
je), checksums_onstack, ic->tag_size);
} else
@@ -1937,7 +1948,7 @@ static void do_journal_write(struct dm_integrity_c *ic, 
unsigned write_start,
unlikely(from_replay) &&
 #endif
ic->internal_hash) {
-   char 
test_tag[max(crypto_shash_digestsize(ic->internal_hash), ic->tag_size)];
+   char test_tag[max_t(size_t, 
HASH_MAX_DIGESTSIZE, MAX_TAG_SIZE)];
 
integrity_sector_checksum(ic, sec + ((l 
- j) << ic->sb->log2_sectors_per_block),
  (char 
*)access_journal_data(ic, i, l), test_tag);
diff --git a/drivers/md/dm-verity-fec.c b/drivers/md/dm-verity-fec.c
index 684af08d0747..0ce04e5b4afb 100644
--- a/drivers

[PATCH v8 6/9] crypto alg: Introduce generic max blocksize and alignmask

In the quest to remove all stack VLA usage from the kernel[1], this
exposes a new general upper bound on crypto blocksize and alignmask
(higher than for the existing cipher limits) for VLA removal,
and introduces new checks.

At present, the highest cra_alignmask in the kernel is 63. The highest
cra_blocksize is 144 (SHA3_224_BLOCK_SIZE, 18 8-byte words). For the
new blocksize limit, I went with 160 (20 8-byte words).

[1] 
https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com

Signed-off-by: Kees Cook 
---
 crypto/algapi.c | 7 ++-
 include/crypto/algapi.h | 4 +++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/crypto/algapi.c b/crypto/algapi.c
index c0755cf4f53f..496fc51bf215 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -57,9 +57,14 @@ static int crypto_check_alg(struct crypto_alg *alg)
if (alg->cra_alignmask & (alg->cra_alignmask + 1))
return -EINVAL;
 
-   if (alg->cra_blocksize > PAGE_SIZE / 8)
+   /* General maximums for all algs. */
+   if (alg->cra_alignmask > MAX_ALGAPI_ALIGNMASK)
return -EINVAL;
 
+   if (alg->cra_blocksize > MAX_ALGAPI_BLOCKSIZE)
+   return -EINVAL;
+
+   /* Lower maximums for specific alg types. */
if (!alg->cra_type && (alg->cra_flags & CRYPTO_ALG_TYPE_MASK) ==
   CRYPTO_ALG_TYPE_CIPHER) {
if (alg->cra_alignmask > MAX_CIPHER_ALIGNMASK)
diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h
index bd5e8ccf1687..21371ac8f355 100644
--- a/include/crypto/algapi.h
+++ b/include/crypto/algapi.h
@@ -20,8 +20,10 @@
 /*
  * Maximum values for blocksize and alignmask, used to allocate
  * static buffers that are big enough for any combination of
- * ciphers and architectures.
+ * algs and architectures. Ciphers have a lower maximum size.
  */
+#define MAX_ALGAPI_BLOCKSIZE   160
+#define MAX_ALGAPI_ALIGNMASK   63
 #define MAX_CIPHER_BLOCKSIZE   16
 #define MAX_CIPHER_ALIGNMASK   15
 
-- 
2.17.1

[PATCH v8 9/9] crypto: skcipher: Remove VLA usage for SKCIPHER_REQUEST_ON_STACK

In the quest to remove all stack VLA usage from the kernel[1], this
caps the skcipher request size similar to other limits and adds a sanity
check at registration. Looking at instrumented tcrypt output, the largest
is for lrw:

crypt: testing lrw(aes)
crypto_skcipher_set_reqsize: 8
crypto_skcipher_set_reqsize: 88
crypto_skcipher_set_reqsize: 472

[1] 
https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com

Signed-off-by: Kees Cook 
---
 include/crypto/internal/skcipher.h | 1 +
 include/crypto/skcipher.h  | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/crypto/internal/skcipher.h 
b/include/crypto/internal/skcipher.h
index e42f7063f245..5035482cbe68 100644
--- a/include/crypto/internal/skcipher.h
+++ b/include/crypto/internal/skcipher.h
@@ -130,6 +130,7 @@ static inline struct crypto_skcipher *crypto_spawn_skcipher(
 static inline void crypto_skcipher_set_reqsize(
struct crypto_skcipher *skcipher, unsigned int reqsize)
 {
+   BUG_ON(reqsize > SKCIPHER_MAX_REQSIZE);
skcipher->reqsize = reqsize;
 }
 
diff --git a/include/crypto/skcipher.h b/include/crypto/skcipher.h
index 2f327f090c3e..c48e194438cf 100644
--- a/include/crypto/skcipher.h
+++ b/include/crypto/skcipher.h
@@ -139,9 +139,11 @@ struct skcipher_alg {
struct crypto_alg base;
 };
 
+#define SKCIPHER_MAX_REQSIZE   472
+
 #define SKCIPHER_REQUEST_ON_STACK(name, tfm) \
char __##name##_desc[sizeof(struct skcipher_request) + \
-   crypto_skcipher_reqsize(tfm)] CRYPTO_MINALIGN_ATTR; \
+   SKCIPHER_MAX_REQSIZE] CRYPTO_MINALIGN_ATTR; \
struct skcipher_request *name = (void *)__##name##_desc
 
 /**
-- 
2.17.1

[PATCH v8 4/9] crypto: hash: Remove VLA usage

In the quest to remove all stack VLA usage from the kernel[1], this
removes the VLAs in SHASH_DESC_ON_STACK (via crypto_shash_descsize())
by using the maximum allowable size (which is now more clearly captured
in a macro), along with a few other cases. Similar limits are turned into
macros as well.

A review of existing sizes shows that SHA512_DIGEST_SIZE (64) is the
largest digest size and that sizeof(struct sha3_state) (360) is the
largest descriptor size. The corresponding maximums are reduced.

[1] 
https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com

Signed-off-by: Kees Cook 
---
 crypto/ahash.c| 4 ++--
 crypto/algif_hash.c   | 2 +-
 crypto/shash.c| 6 +++---
 include/crypto/hash.h | 6 +-
 4 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/crypto/ahash.c b/crypto/ahash.c
index a64c143165b1..78aaf2158c43 100644
--- a/crypto/ahash.c
+++ b/crypto/ahash.c
@@ -550,8 +550,8 @@ static int ahash_prepare_alg(struct ahash_alg *alg)
 {
struct crypto_alg *base = &alg->halg.base;
 
-   if (alg->halg.digestsize > PAGE_SIZE / 8 ||
-   alg->halg.statesize > PAGE_SIZE / 8 ||
+   if (alg->halg.digestsize > HASH_MAX_DIGESTSIZE ||
+   alg->halg.statesize > HASH_MAX_STATESIZE ||
alg->halg.statesize == 0)
return -EINVAL;
 
diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
index bfcf595fd8f9..d0cde541beb6 100644
--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -239,7 +239,7 @@ static int hash_accept(struct socket *sock, struct socket 
*newsock, int flags,
struct alg_sock *ask = alg_sk(sk);
struct hash_ctx *ctx = ask->private;
struct ahash_request *req = &ctx->req;
-   char state[crypto_ahash_statesize(crypto_ahash_reqtfm(req)) ? : 1];
+   char state[HASH_MAX_STATESIZE];
struct sock *sk2;
struct alg_sock *ask2;
struct hash_ctx *ctx2;
diff --git a/crypto/shash.c b/crypto/shash.c
index 5d732c6bb4b2..86d76b5c626c 100644
--- a/crypto/shash.c
+++ b/crypto/shash.c
@@ -458,9 +458,9 @@ static int shash_prepare_alg(struct shash_alg *alg)
 {
struct crypto_alg *base = &alg->base;
 
-   if (alg->digestsize > PAGE_SIZE / 8 ||
-   alg->descsize > PAGE_SIZE / 8 ||
-   alg->statesize > PAGE_SIZE / 8)
+   if (alg->digestsize > HASH_MAX_DIGESTSIZE ||
+   alg->descsize > HASH_MAX_DESCSIZE ||
+   alg->statesize > HASH_MAX_STATESIZE)
return -EINVAL;
 
base->cra_type = &crypto_shash_type;
diff --git a/include/crypto/hash.h b/include/crypto/hash.h
index 76e432cab75d..21587011ab0f 100644
--- a/include/crypto/hash.h
+++ b/include/crypto/hash.h
@@ -151,9 +151,13 @@ struct shash_desc {
void *__ctx[] CRYPTO_MINALIGN_ATTR;
 };
 
+#define HASH_MAX_DIGESTSIZE 64
+#define HASH_MAX_DESCSIZE  360
+#define HASH_MAX_STATESIZE 512
+
 #define SHASH_DESC_ON_STACK(shash, ctx)  \
char __##shash##_desc[sizeof(struct shash_desc) + \
-   crypto_shash_descsize(ctx)] CRYPTO_MINALIGN_ATTR; \
+   HASH_MAX_DESCSIZE] CRYPTO_MINALIGN_ATTR; \
struct shash_desc *shash = (struct shash_desc *)__##shash##_desc
 
 /**
-- 
2.17.1

[PATCH v8 7/9] crypto: qat: Remove VLA usage

In the quest to remove all stack VLA usage from the kernel[1], this uses
the new upper bound for the stack buffer. Also adds a sanity check.

[1] 
https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com

Signed-off-by: Kees Cook 
---
 drivers/crypto/qat/qat_common/qat_algs.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/qat/qat_common/qat_algs.c 
b/drivers/crypto/qat/qat_common/qat_algs.c
index 1138e41d6805..a28edf7b792f 100644
--- a/drivers/crypto/qat/qat_common/qat_algs.c
+++ b/drivers/crypto/qat/qat_common/qat_algs.c
@@ -153,8 +153,8 @@ static int qat_alg_do_precomputes(struct 
icp_qat_hw_auth_algo_blk *hash,
struct sha512_state sha512;
int block_size = crypto_shash_blocksize(ctx->hash_tfm);
int digest_size = crypto_shash_digestsize(ctx->hash_tfm);
-   char ipad[block_size];
-   char opad[block_size];
+   char ipad[MAX_ALGAPI_BLOCKSIZE];
+   char opad[MAX_ALGAPI_BLOCKSIZE];
__be32 *hash_state_out;
__be64 *hash512_state_out;
int i, offset;
@@ -164,6 +164,10 @@ static int qat_alg_do_precomputes(struct 
icp_qat_hw_auth_algo_blk *hash,
shash->tfm = ctx->hash_tfm;
shash->flags = 0x0;
 
+   if (WARN_ON(block_size > sizeof(ipad) ||
+   sizeof(ipad) != sizeof(opad)))
+   return -EINVAL;
+
if (auth_keylen > block_size) {
int ret = crypto_shash_digest(shash, auth_key,
  auth_keylen, ipad);
-- 
2.17.1

[PATCH v8 2/9] crypto: cbc: Remove VLA usage

In the quest to remove all stack VLA usage from the kernel[1], this
uses the upper bounds on blocksize. Since this is always a cipher
blocksize, use the existing cipher max blocksize.

[1] 
https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com

Signed-off-by: Kees Cook 
---
 include/crypto/cbc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/crypto/cbc.h b/include/crypto/cbc.h
index f5b8bfc22e6d..3bf28beefa33 100644
--- a/include/crypto/cbc.h
+++ b/include/crypto/cbc.h
@@ -113,7 +113,7 @@ static inline int crypto_cbc_decrypt_inplace(
unsigned int bsize = crypto_skcipher_blocksize(tfm);
unsigned int nbytes = walk->nbytes;
u8 *src = walk->src.virt.addr;
-   u8 last_iv[bsize];
+   u8 last_iv[MAX_CIPHER_BLOCKSIZE];
 
/* Start of the last block. */
src += nbytes - (nbytes & (bsize - 1)) - bsize;
-- 
2.17.1

[PATCH v8 1/9] crypto: xcbc: Remove VLA usage

In the quest to remove all stack VLA usage from the kernel[1], this uses
the maximum blocksize and adds a sanity check. For xcbc, the blocksize
must always be 16, so use that, since it's already being enforced during
instantiation.

[1] 
https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com

Signed-off-by: Kees Cook 
---
 crypto/xcbc.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/crypto/xcbc.c b/crypto/xcbc.c
index 25c75af50d3f..c055f57fab11 100644
--- a/crypto/xcbc.c
+++ b/crypto/xcbc.c
@@ -57,15 +57,17 @@ struct xcbc_desc_ctx {
u8 ctx[];
 };
 
+#define XCBC_BLOCKSIZE 16
+
 static int crypto_xcbc_digest_setkey(struct crypto_shash *parent,
 const u8 *inkey, unsigned int keylen)
 {
unsigned long alignmask = crypto_shash_alignmask(parent);
struct xcbc_tfm_ctx *ctx = crypto_shash_ctx(parent);
-   int bs = crypto_shash_blocksize(parent);
u8 *consts = PTR_ALIGN(&ctx->ctx[0], alignmask + 1);
int err = 0;
-   u8 key1[bs];
+   u8 key1[XCBC_BLOCKSIZE];
+   int bs = sizeof(key1);
 
if ((err = crypto_cipher_setkey(ctx->child, inkey, keylen)))
return err;
@@ -212,7 +214,7 @@ static int xcbc_create(struct crypto_template *tmpl, struct 
rtattr **tb)
return PTR_ERR(alg);
 
switch(alg->cra_blocksize) {
-   case 16:
+   case XCBC_BLOCKSIZE:
break;
default:
goto out_put_alg;
-- 
2.17.1

[PATCH v8 0/9] crypto: Remove VLA usage

v8 cover letter:

I continue to hope this can land in v4.19, but I realize that's unlikely.
It would be nice, though, if some of the "trivial" patches could get taken
(e.g. cbc, xcbc, ccm VLA removals) so I don't have to keep repeating them.
*fingers crossed*

Series cover letter:

This is nearly the last of the VLA removals[1], but it's one of the
largest because crypto gets used in lots of places. After looking
through code, usage, reading the threads Gustavo started, and comparing
the use-cases to the other VLA removals that have landed in the kernel,
I think this series is likely the best way forward to shut the door on
VLAs forever.

For background, the crypto stack usage is for callers to do an immediate
bit of work that doesn't allocate new memory. This means that other VLA
removal techniques (like just using kmalloc) aren't workable, and the
next common technique is needed: examination of maximum stack usage and
the addition of sanity checks. This series does that, and in several
cases, these maximums were already implicit in the code.

This series is intended to land via the crypto tree for 4.19, though it
touches dm, networking, and a few other things as well, since there are
dependent patches (new crypto #defines being used, etc).

Thanks!

-Kees

[1]
https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com

Changelog:

v8:
- remove "impossible condition" BUG-check in cbc.

v7:
- refresh and reorganize without ahash->shash conversions
- merge hash maximums for both shash and ahash

v6:
- make xcbc blocksize unconditional
- add ahash-to-shash conversion patch series to entirely remove
AHASH_REQUEST_ON_STACK from the kernel

v5:
- limit AHASH_REQUEST_ON_STACK size only to non-async hash wrapping.
- sanity-check ahash reqsize only when doing shash wrapping.
- remove frame_warn changes in favor of shash conversions and other fixes.
- send ahash to shash conversion patches and other fixes separately.

v4:
- add back *_REQUEST_ON_STACK patches.
- programmatically find stack sizes for *_REQUEST_ON_STACK patches.
- whitelist some code that trips FRAME_WARN on 32-bit builds.
- fix alignment patches.

v3:
- drop *_REQUEST_ON_STACK patches. The rest of this series is pretty
straight-forward, and I'd like to get them tested in -next while
we continue to chip away at the *_REQUEST_ON_STACK VLA removal patches
separately. "Part 2" will continue with those.

v2:
- use 512 instead of PAGE_SIZE / 8 to avoid bloat on large-page archs.
- swtich xcbc to "16" max universally.
- fix typo in bounds check for dm buffer size.
- examine actual reqsizes for skcipher and ahash instead of guessing.
- improve names and comments for alg maxes

Ard Biesheuvel (1):
crypto: ccm: Remove VLA usage

Kees Cook (8):
crypto: xcbc: Remove VLA usage
crypto: cbc: Remove VLA usage
crypto: hash: Remove VLA usage
dm: Remove VLA usage from hashes
crypto alg: Introduce generic max blocksize and alignmask
crypto: qat: Remove VLA usage
crypto: shash: Remove VLA usage in unaligned hashing
crypto: skcipher: Remove VLA usage for SKCIPHER_REQUEST_ON_STACK

--
2.17.1

[PATCH v8 8/9] crypto: shash: Remove VLA usage in unaligned hashing

In the quest to remove all stack VLA usage from the kernel[1], this uses
the newly defined max alignment to perform unaligned hashing to avoid
VLAs, and drops the helper function while adding sanity checks on the
resulting buffer sizes. Additionally, the __aligned_largest macro is
removed since this helper was the only user.

[1] 
https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com

Signed-off-by: Kees Cook 
---
 crypto/shash.c   | 27 ---
 include/linux/compiler-gcc.h |  1 -
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/crypto/shash.c b/crypto/shash.c
index 86d76b5c626c..d21f04d70dce 100644
--- a/crypto/shash.c
+++ b/crypto/shash.c
@@ -73,13 +73,6 @@ int crypto_shash_setkey(struct crypto_shash *tfm, const u8 
*key,
 }
 EXPORT_SYMBOL_GPL(crypto_shash_setkey);
 
-static inline unsigned int shash_align_buffer_size(unsigned len,
-  unsigned long mask)
-{
-   typedef u8 __aligned_largest u8_aligned;
-   return len + (mask & ~(__alignof__(u8_aligned) - 1));
-}
-
 static int shash_update_unaligned(struct shash_desc *desc, const u8 *data,
  unsigned int len)
 {
@@ -88,11 +81,17 @@ static int shash_update_unaligned(struct shash_desc *desc, 
const u8 *data,
unsigned long alignmask = crypto_shash_alignmask(tfm);
unsigned int unaligned_len = alignmask + 1 -
 ((unsigned long)data & alignmask);
-   u8 ubuf[shash_align_buffer_size(unaligned_len, alignmask)]
-   __aligned_largest;
+   /*
+* We cannot count on __aligned() working for large values:
+* https://patchwork.kernel.org/patch/9507697/
+*/
+   u8 ubuf[MAX_ALGAPI_ALIGNMASK * 2];
u8 *buf = PTR_ALIGN(&ubuf[0], alignmask + 1);
int err;
 
+   if (WARN_ON(buf + unaligned_len > ubuf + sizeof(ubuf)))
+   return -EINVAL;
+
if (unaligned_len > len)
unaligned_len = len;
 
@@ -124,11 +123,17 @@ static int shash_final_unaligned(struct shash_desc *desc, 
u8 *out)
unsigned long alignmask = crypto_shash_alignmask(tfm);
struct shash_alg *shash = crypto_shash_alg(tfm);
unsigned int ds = crypto_shash_digestsize(tfm);
-   u8 ubuf[shash_align_buffer_size(ds, alignmask)]
-   __aligned_largest;
+   /*
+* We cannot count on __aligned() working for large values:
+* https://patchwork.kernel.org/patch/9507697/
+*/
+   u8 ubuf[MAX_ALGAPI_ALIGNMASK + HASH_MAX_DIGESTSIZE];
u8 *buf = PTR_ALIGN(&ubuf[0], alignmask + 1);
int err;
 
+   if (WARN_ON(buf + ds > ubuf + sizeof(ubuf)))
+   return -EINVAL;
+
err = shash->final(desc, buf);
if (err)
goto out;
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index f1a7492a5cc8..1f1cdef36a82 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -125,7 +125,6 @@
  */
 #define __pure __attribute__((pure))
 #define __aligned(x)   __attribute__((aligned(x)))
-#define __aligned_largest  __attribute__((aligned))
 #define __printf(a, b) __attribute__((format(printf, a, b)))
 #define __scanf(a, b)  __attribute__((format(scanf, a, b)))
 #define __attribute_const____attribute__((__const__))
-- 
2.17.1

Re: [RFC PATCH 3/9] crypto: chacha20-generic - refactor to allow varying number of rounds

On Tue, Aug 07, 2018 at 11:21:04AM +0100, Samuel Neves wrote:
> > The best attack on ChaCha breaks 7 rounds, and that attack requires 2^248 
> > operations.
> 
> This number, as far as I can tell, comes from the "New features of
> Latin dances" paper. There have been some minor improvements in the
> intervening 10 years, e.g., [1, 2, 3, 4], which pull back the
> complexity of breaking ChaCha7 down to 2^235. In any case, every
> attack so far appears to hit a wall at 8 rounds, with 12 rounds---the
> recommended eSTREAM round number for Salsa20---seeming to offer a
> reasonable security margin, still somewhat better than that of the
> AES.
> 
> Best regards,
> Samuel Neves
> 
> [1] https://eprint.iacr.org/2015/698
> [2] https://eprint.iacr.org/2015/217
> [3] https://eprint.iacr.org/2016/1034
> [4] https://doi.org/10.1016/j.dam.2017.04.034

Thanks Samuel, I'll fix that number in the next iteration of the patchset.

- Eric

Re: [RFC PATCH 8/9] crypto: arm/poly1305 - add NEON accelerated Poly1305 implementation

Hi Ard,

On Tue, Aug 07, 2018 at 02:09:05PM +0200, Ard Biesheuvel wrote:
> On 7 August 2018 at 00:32, Eric Biggers  wrote:
> > From: Eric Biggers 
> >
> > Add the Poly1305 code from OpenSSL, which was written by Andy Polyakov.
> > I took the .S file from WireGuard, whose author has made the needed
> > tweaks for Linux kernel integration and verified that Andy had given
> > permission for GPLv2 distribution.  I didn't make any additional changes
> > to the .S file.
> >
> > Note, for HPolyC I'd eventually like a Poly1305 implementation that
> > allows precomputing powers of the key.  But for now this implementation
> > just provides the existing semantics where the key and nonce are treated
> > as a "one-time key" that must be provided for every message.
> >
> > Signed-off-by: Eric Biggers 
> 
> Hi Eric,
> 
> In the past, I worked with Andy on several occasions to get my kernel
> changes incorporated into the upstream OpenSSL version of the
> 'perlasm' .pl file.
> 
> This achieves a number of things:
> - we get a readable version of the code in the kernel tree,
> - our changes are reviewed upstream
> - upgrading involves grabbing the latest .pl file rather than merging
> generated code (which requires careful review)
> - GPLv2 permission is made explicit, rather than something someone
> claims to have reached agreement on,
> - no legal ambiguity whether the output of the perl script is covered
> by the license (which is what we incorporate here)
> 
> Note that the 'available under GPL depending on where you obtained the
> code' in the CRYPTOGAMS license likely conflicts with the GPL itself,
> but I am not a lawyer so I'd much prefer having the upstream copy
> mention this explicitly.

First, note that Jason is proposing adding this exact same .S file as part of
his new "zinc" cryptography library, along with 7 other OpenSSL .S files.  So it
may really be him you need to convince.

But yes, I don't really like the approach of just including the .S output of the
.pl script either, as it loses semantic information that was in the .pl script.
Ideally the source should either be the .pl script, or else a real hand written
.S file with the proper comments and macros to make it readable -- not something
in-between.

Getting the license clarification and possibly other changes upstream is a good
idea too.  I noticed, though, that the actual wording used in some files
upstream ("Permission to use under GPLv2 terms is granted") apparently still
isn't considered sufficient by some, so a separate clarification from Andy was
apparently still needed: see kernel commit c2e415fe75bbc83c1...

- Eric

Re: [PATCH v1 2/3] zinc: Introduce minimal cryptography library

2018-08-07 Thread Jason A. Donenfeld

Hey Andy,

On Tue, Aug 7, 2018 at 12:43 PM Andy Lutomirski  wrote:
> For "zinc: add simd helper", I think it should be in include/linux,
> and include/linux/simd.h should (immediately or maybe in the future)
> include  to pick up arch-specific stuff.  And the patch
> should get sent to linux-a...@vger.kernel.org.

I guess you saw my prompt about that in the previous commit message?
Based on your encouragement, I implemented it:
https://git.zx2c4.com/linux-dev/commit/?h=simd This is _far_ more
invasive than I wanted to be, as I don't want this patch submission to
grow unwieldy and never be merged, but I guess we can roll with this
for now...

> In your blake2s_arch() implementation, you're not passing in a
> simd_context_t.  Is that still a work in progress?  I thought the plan
> was to pass it in rather than doing the check in the _arch()
> functions.

I'm inclined to do the explicit context passing only when a function
is likely to be used in that kind of environment, and adjust as
needed. Long term, anyway, that API will be removed once the x86 guys
figure out lazy FPU restoration and the amortization doesn't add
anything.

Jason

Re: [RFC PATCH 3/9] crypto: chacha20-generic - refactor to allow varying number of rounds

On Tue, Aug 07, 2018 at 02:51:21PM -0700, Eric Biggers wrote:
> On Tue, Aug 07, 2018 at 11:21:04AM +0100, Samuel Neves wrote:
> > > The best attack on ChaCha breaks 7 rounds, and that attack requires 2^248 
> > > operations.
> > 
> > This number, as far as I can tell, comes from the "New features of
> > Latin dances" paper. There have been some minor improvements in the
> > intervening 10 years, e.g., [1, 2, 3, 4], which pull back the
> > complexity of breaking ChaCha7 down to 2^235. In any case, every
> > attack so far appears to hit a wall at 8 rounds, with 12 rounds---the
> > recommended eSTREAM round number for Salsa20---seeming to offer a
> > reasonable security margin, still somewhat better than that of the
> > AES.
> > 
> > Best regards,
> > Samuel Neves
> > 
> > [1] https://eprint.iacr.org/2015/698
> > [2] https://eprint.iacr.org/2015/217
> > [3] https://eprint.iacr.org/2016/1034
> > [4] https://doi.org/10.1016/j.dam.2017.04.034
> 
> Thanks Samuel, I'll fix that number in the next iteration of the patchset.
> 

Oops, sorry, for some reason I thought you had quoted one of my commit messages,
but it was actually Paul's email.  I did mention in "crypto: chacha - add
XChaCha12 support" that "the best known attack on ChaCha makes it through only 7
rounds", but I didn't specify the complexity.

- Eric

Re: [RFC PATCH 0/7] A General Accelerator Framework, WarpDrive

2018-08-07 Thread Kenneth Lee





在 2018年08月06日 星期一 11:32 下午, Jerome Glisse 写道:

On Mon, Aug 06, 2018 at 11:12:52AM +0800, Kenneth Lee wrote:

On Fri, Aug 03, 2018 at 10:39:44AM -0400, Jerome Glisse wrote:

On Fri, Aug 03, 2018 at 11:47:21AM +0800, Kenneth Lee wrote:

On Thu, Aug 02, 2018 at 10:22:43AM -0400, Jerome Glisse wrote:

On Thu, Aug 02, 2018 at 12:05:57PM +0800, Kenneth Lee wrote:

On Thu, Aug 02, 2018 at 02:33:12AM +, Tian, Kevin wrote:

On Wed, Aug 01, 2018 at 06:22:14PM +0800, Kenneth Lee wrote:

[...]


But doorbell is just a notification. Except for DOS (to make hardware busy) it
cannot actually take or change anything from the kernel space. And the DOS
problem can be always taken as the problem that a group of processes share the
same kernel entity.

In the coming HIP09 hardware, the doorbell will come with a random number so
only the process who allocated the queue can knock it correctly.

When doorbell is ring the hardware start fetching commands from
the queue and execute them ? If so than a rogue process B might
ring the doorbell of process A which would starts execution of
random commands (ie whatever random memory value there is left
inside the command buffer memory, could be old commands i guess).

If this is not how this doorbell works then, yes it can only do
a denial of service i guess. Issue i have with doorbell is that
i have seen 10 differents implementations in 10 differents hw
and each are different as to what ringing or value written to the
doorbell does. It is painfull to track what is what for each hw.


In our implementation, doorbell is simply a notification, just like an interrupt
to the accelerator. The command is all about what's in the queue.

I agree that there is no simple and standard way to track the shared IO space.
But I think we have to trust the driver in some way. If the driver is malicious,
even a simple ioctl can become an attack.

Trusting kernel space driver is fine, trusting user space driver is
not in my view. AFAICT every driver developer so far always made
sure that someone could not abuse its device to do harmfull thing to
other process.


Fully agree. That is why this driver shares only the doorbell space. There is
only the doorbell is shared in the whole page, nothing else.

Maybe you are concerning the user driver will give malicious command to the
hardware? But these commands cannot influence the other process. If we can trust
the hardware design, the process cannot do any harm.

My questions was what happens if a process B ring the doorbell of
process A.

On some hardware the value written in the doorbell is use as an
index in command buffer. On other it just wakeup the hardware to go
look at a structure private to the process. They are other variations
of those themes.

If it is the former ie the value is use to advance in the command
buffer then a rogue process can force another process to advance its
command buffer and what is in the command buffer can be some random
old memory values which can be more harmfull than just Denial Of
Service.


Yes. We have considered that. There is no other information in the 
doorbell. The indexes, such as head and tail pointers, are all in the 
shared memory between the hardware and the user process. The other 
process cannot touch it.



My more general question is do we want to grow VFIO to become
a more generic device driver API. This patchset adds a command
queue concept to it (i don't think it exist today but i have
not follow VFIO closely).


The thing is, VFIO is the only place to support DMA from user land. If we don't
put it here, we have to create another similar facility to support the same.

No it is not, network device, GPU, block device, ... they all do
support DMA. The point i am trying to make here is that even in

Sorry, wait a minute, are we talking the same thing? I meant "DMA from user
land", not "DMA from kernel driver". To do that we have to manipulate the
IOMMU(Unit). I think it can only be done by default_domain or vfio domain. Or
the user space have to directly access the IOMMU.

GPU do DMA in the sense that you pass to the kernel a valid
virtual address (kernel driver do all the proper check) and
then you can use the GPU to copy from or to that range of
virtual address. Exactly how you want to use this compression
engine. It does not rely on SVM but SVM going forward would
still be the prefered option.


No, SVM is not the reason why we rely on Jean's SVM(SVA) series. We rely on
Jean's series because of multi-process (PASID or substream ID) support.

But of couse, WarpDrive can still benefit from the SVM feature.

We are getting side tracked here. PASID/ID do not require VFIO.


Yes, PASID itself do not require VFIO. But what if:

1. Support DMA from user space.
2. The hardware makes use of standard IOMMU/SMMU for IO address translation.
3. The IOMMU facility is shared by both kernel and user drivers.
4. Support PASID with the current IOMMU facility

your mechanisms the userspace must have a specific userspa

Re: [RFC PATCH 3/7] vfio: add spimdev support

2018-08-07 Thread Kenneth Lee





在 2018年08月07日 星期二 01:05 上午, Alex Williamson 写道:

On Mon, 6 Aug 2018 09:34:28 -0700
"Raj, Ashok"  wrote:


On Mon, Aug 06, 2018 at 09:49:40AM -0600, Alex Williamson wrote:

On Mon, 6 Aug 2018 09:40:04 +0800
Kenneth Lee  wrote:

1. It supports thousands of processes. Take zip accelerator as an example, any
application need data compression/decompression will need to interact with the
accelerator. To support that, you have to create tens of thousands of mdev for
their usage. I don't think it is a good idea to have so many devices in the
system.

Each mdev is a device, regardless of whether there are hardware
resources committed to the device, so I don't understand this argument.


2. The application does not want to own the mdev for long. It just need an
access point for the hardware service. If it has to interact with an management
agent for allocation and release, this makes the problem complex.

I don't see how the length of the usage plays a role here either.  Are
you concerned that the time it takes to create and remove an mdev is
significant compared to the usage time?  Userspace is certainly welcome
to create a pool of devices, but why should it be the kernel's
responsibility to dynamically assign resources to an mdev?  What's the
usage model when resources are unavailable?  It seems there's
complexity in either case, but it's generally userspace's responsibility
to impose a policy.
   

Can vfio dev's created representing an mdev be shared between several
processes?  It doesn't need to be exclusive.

The path to hardware is established by the processes binding to SVM and
IOMMU ensuring that the PASID is plummed properly.  One can think the
same hardware is shared between several processes, hardware knows the
isolation is via the PASID.

For these cases it isn't required to create a dev per process.

The iommu group is the unit of ownership, a vfio group mirrors an iommu
group, therefore a vfio group only allows a single open(2).  A group
also represents the minimum isolation set of devices, therefore devices
within a group are not considered isolated and must share the same
address space represented by the vfio container.  Beyond that, it is
possible to share devices among processes, but (I think) it generally
implies a hierarchical rather than peer relationship between
processes.  Thanks,
Actually, this is the key problem we concerned. Our logic was: The PASID 
refer to the connection between the device and the process. So the 
resource should be allocated only when the process "make use of" the 
device. This strategy also bring another advantage that the kernel 
driver can also make use of the resource if no user application open it.


We do have another branch that allocate resource to mdev directly. It 
looks not so nice (many mdevs and user agent is required for resource 
management). If the conclusion here is to keep the mdev's original 
semantics, we will send that branch for discussion in next RFC.


Cheers
Kenneth


Alex

Re: [RFC PATCH 1/7] vfio/spimdev: Add documents for WarpDrive framework

2018-08-07 Thread Kenneth Lee





在 2018年08月06日 星期一 08:27 下午, Pavel Machek 写道:

Hi!


WarpDrive is a common user space accelerator framework.  Its main component
in Kernel is called spimdev, Share Parent IOMMU Mediated Device. It exposes

spimdev is really unfortunate name. It looks like it has something to do with 
SPI, but
it does not.


Yes. Let me change it to Share (IOMMU) Domain MDev, SDMdev:)

+++ b/Documentation/warpdrive/warpdrive.rst
@@ -0,0 +1,153 @@
+Introduction of WarpDrive
+=
+
+*WarpDrive* is a general accelerator framework built on top of vfio.
+It can be taken as a light weight virtual function, which you can use without
+*SR-IOV* like facility and can be shared among multiple processes.
+
+It can be used as the quick channel for accelerators, network adaptors or
+other hardware in user space. It can make some implementation simpler.  E.g.
+you can reuse most of the *netdev* driver and just share some ring buffer to
+the user space driver for *DPDK* or *ODP*. Or you can combine the RSA
+accelerator with the *netdev* in the user space as a Web reversed proxy, etc.

What is DPDK? ODP?

DPDK：https://www.dpdk.org/about/
ODP: https://www.opendataplane.org/

will add the reference in the next RFC



+How does it work
+
+
+*WarpDrive* takes the Hardware Accelerator as a heterogeneous processor which
+can share some load for the CPU:
+
+.. image:: wd.svg
+:alt: This is a .svg image, if your browser cannot show it,
+try to download and view it locally
+
+So it provides the capability to the user application to:
+
+1. Send request to the hardware
+2. Share memory with the application and other accelerators
+
+These requirements can be fulfilled by VFIO if the accelerator can serve each
+application with a separated Virtual Function. But a *SR-IOV* like VF (we will
+call it *HVF* hereinafter) design is too heavy for the accelerator which
+service thousands of processes.

VFIO? VF? HVF?

Also "gup" might be worth spelling out.

But I think the reference [1] has explained this.



+References
+==
+.. [1] Accroding to the comment in in mm/gup.c, The *gup* is only safe within
+   a syscall.  Because it can only keep the physical memory in place
+   without making sure the VMA will always point to it. Maybe we should
+   raise the VM_PINNED patchset (see
+   https://lists.gt.net/linux/kernel/1931993) again to solve this probl


I went through the docs, but I still don't know what it does.

Will refine the doc in next RFC, hope it will help.


Pavel

Re: [PATCH v1 2/3] zinc: Introduce minimal cryptography library

2018-08-07 Thread Andy Lutomirski




> On Aug 7, 2018, at 4:48 PM, Jason A. Donenfeld  wrote:
> 
> Hey Andy,
> 
>> On Tue, Aug 7, 2018 at 12:43 PM Andy Lutomirski  wrote:
>> For "zinc: add simd helper", I think it should be in include/linux,
>> and include/linux/simd.h should (immediately or maybe in the future)
>> include  to pick up arch-specific stuff.  And the patch
>> should get sent to linux-a...@vger.kernel.org.
> 
> I guess you saw my prompt about that in the previous commit message?
> Based on your encouragement, I implemented it:
> https://git.zx2c4.com/linux-dev/commit/?h=simd This is _far_ more
> invasive than I wanted to be, as I don't want this patch submission to
> grow unwieldy and never be merged, but I guess we can roll with this
> for now...
> 

I really wish we had a way to see that we use asm-generic’s copy of a header in 
all cases except where an arch opts out.

>> In your blake2s_arch() implementation, you're not passing in a
>> simd_context_t.  Is that still a work in progress?  I thought the plan
>> was to pass it in rather than doing the check in the _arch()
>> functions.
> 
> I'm inclined to do the explicit context passing only when a function
> is likely to be used in that kind of environment, and adjust as
> needed. Long term, anyway, that API will be removed once the x86 guys
> figure out lazy FPU restoration and the amortization doesn't add
> anything.

Fair enough.

> 
> Jason

Re: [PATCH v1 2/3] zinc: Introduce minimal cryptography library

2018-08-07 Thread Jason A. Donenfeld

On Tue, Aug 7, 2018 at 6:49 PM Andy Lutomirski  wrote:
> I really wish we had a way to see that we use asm-generic’s copy of a header 
> in all cases except where an arch opts out.

It's really not that hard to do -- symlink asm-generic to a target
called "asm" inside an otherwise empty directory, and add that
otherwise empty directory to the -I paths just after arch/include.
Since it's searched second, it's only used if the first fails. Maybe
I'm missing something though, as this seems a bit too obvious. Perhaps
a project for another day.

Re: [PATCH 1/9] crypto: add zbufsize() interface