Re: [PATCH 5/7] random: replace non-blocking pool with a Chacha20-based CRNG

2016-06-26 Thread Pavel Machek
Hi!

> Yes, I understand the argument that the networking stack is now
> requiring the crypto layer --- but not all IOT devices may necessarily
> require the IP stack (they might be using some alternate wireless
> communications stack) and I'd much rather not make things worse.
> 
> 
> The final thing is that it's not at all clear that the accelerated
> implementation is all that important anyway.  Consider the following
> two results using the unaccelerated ChaCha20:
> 
> % dd if=/dev/urandom bs=4M count=32 of=/dev/null
> 32+0 records in
> 32+0 records out
> 134217728 bytes (134 MB, 128 MiB) copied, 1.18647 s, 113 MB/s
> 
> % dd if=/dev/urandom bs=32 count=4194304 of=/dev/null
> 4194304+0 records in
> 4194304+0 records out
> 134217728 bytes (134 MB, 128 MiB) copied, 7.08294 s, 18.9 MB/s
> 
> So in both cases, we are reading 128M from the CRNG.  In the first
> case, we see the sort of speed we would get if we were using the CRNG
> for some illegitimate, such as "dd if=/dev/urandom of=/dev/sdX bs=4M"
> (because they were too lazy to type "apt-get install nwipe").
> 
> In the second case, we see the use of /dev/urandom in a much more
> reasonable, proper, real-world use case for /de/urandom, which is some
> userspace process needing a 256 bit session key for a TLS connection,
> or some such.  In this case, we see that the other overheads of
> providing the anti-backtracking protection, system call overhead,
> etc., completely dominate the speed of the core crypto primitive.
> 
> So even if the AVX optimized is 100% faster than the generic version,
> it would change the time needed to create a 256 byte session key from
> 1.68 microseconds to 1.55 microseconds.  And this is ignoring the

Ok, so lets say I'm writing some TLS server, and I know that traffic
is currently heavy because it was heavy in last 5 minutes. Would it
make sense for me to request 128M of randomness from /dev/urandom, and
then use that internally, to avoid the syscall overhead?

Ok, maybe 128M is a bit much because by requesting that much in single
request i'd turn urandom into PRNG, but perhaps 1MB block makes sense?

And I guess even requesting 128M would make sense, as kernel can
select best crypto implementation for CRNG, and I'd prefer to avoid
that code in my application as it is hardware-specific...

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] random: add backtracking protection to the CRNG

2016-06-26 Thread Pavel Machek
On Mon 2016-06-13 11:48:39, Theodore Ts'o wrote:
> Signed-off-by: Theodore Ts'o 
> ---
>  drivers/char/random.c | 54 
> ++-
>  1 file changed, 49 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index d640865..963a6a9 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -436,7 +436,8 @@ static int crng_init = 0;
>  #define crng_ready() (likely(crng_init > 0))
>  static void _extract_crng(struct crng_state *crng,
> __u8 out[CHACHA20_BLOCK_SIZE]);
> -static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE]);
> +static void _crng_backtrack_protect(struct crng_state *crng,
> + __u8 tmp[CHACHA20_BLOCK_SIZE], int used);
>  static void process_random_ready_list(void);

You can do u8 and u32 in the code, we know we are in kernel.

> +/*
> + * Use the leftover bytes from the CRNG block output (if there is
> + * enough) to mutate the CRNG key to provide backtracking protection.
> + */
> +static void _crng_backtrack_protect(struct crng_state *crng,
> + __u8 tmp[CHACHA20_BLOCK_SIZE], int used)
> +{
> + unsigned long   flags;
> + __u32   *s, *d;
> + int i;
> +
> + used = round_up(used, sizeof(__u32));
> + if (used + CHACHA20_KEY_SIZE > CHACHA20_BLOCK_SIZE) {
> + extract_crng(tmp);
> + used = 0;
> + }
> + spin_lock_irqsave(&crng->lock, flags);
> + s = (__u32 *) &tmp[used];
> + d = &crng->state[4];
> + for (i=0; i < 8; i++)
> + *d++ ^= *s++;
> + spin_unlock_irqrestore(&crng->lock, flags);
> +}

You are basically trying to turn CRNG into one way hash function here,
right? Do you have any explanation that it has the required
properties?

Thanks,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/7] random: replace non-blocking pool with a Chacha20-based CRNG

2016-06-26 Thread Stephan Mueller
Am Sonntag, 26. Juni 2016, 20:47:43 schrieb Pavel Machek:

Hi Pavel,

> Hi!
> 
> > Yes, I understand the argument that the networking stack is now
> > requiring the crypto layer --- but not all IOT devices may necessarily
> > require the IP stack (they might be using some alternate wireless
> > communications stack) and I'd much rather not make things worse.
> > 
> > 
> > The final thing is that it's not at all clear that the accelerated
> > implementation is all that important anyway.  Consider the following
> > two results using the unaccelerated ChaCha20:
> > 
> > % dd if=/dev/urandom bs=4M count=32 of=/dev/null
> > 32+0 records in
> > 32+0 records out
> > 134217728 bytes (134 MB, 128 MiB) copied, 1.18647 s, 113 MB/s
> > 
> > % dd if=/dev/urandom bs=32 count=4194304 of=/dev/null
> > 4194304+0 records in
> > 4194304+0 records out
> > 134217728 bytes (134 MB, 128 MiB) copied, 7.08294 s, 18.9 MB/s
> > 
> > So in both cases, we are reading 128M from the CRNG.  In the first
> > case, we see the sort of speed we would get if we were using the CRNG
> > for some illegitimate, such as "dd if=/dev/urandom of=/dev/sdX bs=4M"
> > (because they were too lazy to type "apt-get install nwipe").
> > 
> > In the second case, we see the use of /dev/urandom in a much more
> > reasonable, proper, real-world use case for /de/urandom, which is some
> > userspace process needing a 256 bit session key for a TLS connection,
> > or some such.  In this case, we see that the other overheads of
> > providing the anti-backtracking protection, system call overhead,
> > etc., completely dominate the speed of the core crypto primitive.
> > 
> > So even if the AVX optimized is 100% faster than the generic version,
> > it would change the time needed to create a 256 byte session key from
> > 1.68 microseconds to 1.55 microseconds.  And this is ignoring the
> 
> Ok, so lets say I'm writing some TLS server, and I know that traffic
> is currently heavy because it was heavy in last 5 minutes. Would it
> make sense for me to request 128M of randomness from /dev/urandom, and
> then use that internally, to avoid the syscall overhead?
> 
> Ok, maybe 128M is a bit much because by requesting that much in single
> request i'd turn urandom into PRNG, but perhaps 1MB block makes sense?

I would definitely say that this is appropriate when you use the data stream 
from urandom directly as keying material.
> 
> And I guess even requesting 128M would make sense, as kernel can
> select best crypto implementation for CRNG, and I'd prefer to avoid
> that code in my application as it is hardware-specific...

The code in the server should only make sure it memset(0) any of the random 
data immediately after use. This way even when the server dumps core, all you 
see is *unused* random data which does not hurt.

Ciao
Stephan
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/7] random: replace non-blocking pool with a Chacha20-based CRNG

2016-06-26 Thread Theodore Ts'o
On Sun, Jun 26, 2016 at 08:47:43PM +0200, Pavel Machek wrote:
> Ok, so lets say I'm writing some TLS server, and I know that traffic
> is currently heavy because it was heavy in last 5 minutes. Would it
> make sense for me to request 128M of randomness from /dev/urandom, and
> then use that internally, to avoid the syscall overhead?

Probably not.  Keep in mind that even requesting a 256 bit key one at
a time, it's still only taking 1.7 microseconds.  The time to do the
Diffie-Hellman will vary depending on whether you are doing DH in a
log field or a ECC field, but whether it's around 18-20ms or 3-4ms,
it's over *three* orders of magnitude more than the time it takes to
generate a random session key.

So trying to grab 1M of randomness and managing it in your TLS server
is almost certainly optimizing for The Wrong Thing.  (Not to mention
the potential for disaster if that randomness gets stolen via some
kind of wild pointer bug, etc., etc.)

This is why I think it's a waste of time talknig about trying to use
AVX or SIMD optimized version of ChaCha20.  Even if the cost to do the
XSAVE / XRESTORE doesn't crush the optimization of advantages of
ChaCha20, and if you ignore the costs of adding backtracking defenses,
etc., bloating the kernel by adding support for the crypto layer
doesn't make sense when using the generic ChaCha20 already gets you
down into the microsecond arena.  You might be able to get the session
key generation down by another .3 or .4 microseconds, but even if you
cut it down by half to .9 microseconds, the public key operations and
the diffie-hellman operations are going to make such savings be almost
completely unnoticeable.

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] random: add backtracking protection to the CRNG

2016-06-26 Thread Theodore Ts'o
On Sun, Jun 26, 2016 at 08:47:53PM +0200, Pavel Machek wrote:
> 
> You are basically trying to turn CRNG into one way hash function here,
> right? Do you have any explanation that it has the required
> properties?

Well, not really.  A CRNG has the property that if you generate a
series of outputs: O_N-1, O_N, O_N+1, etc., knowledge of O_N does not
give you any special knowledge with respect to O_N+1 or O_N-1.

The anti-backtracking protection means that when we generate O_N, we
use O_N+1 to mutate the state used for the CRNG; specifically, we are
XOR'ing O_N+1 into the state.  Now let's suppose that state gets
exposed.  Even if you know O_N, that's not going to let you know
O_N+1, so knowledge of the exposed state post XOR with O_N+1 isn't
going to help you get back the original state.

More generally, if we assume ChaCha20 is secure, that means that you
can't derive the key even if you have known plaintext.  The output of
the CRNG is basically the keystream --- what you have after you XOR
the ciphertext with the plaintext.  If ChaCha20 is secure, knowledge
of large portions of the keystream should not help you determine the
key, which means is why knowledge of O_N-1, O_N, won't help you derive
either (a) the state of CRNG, aka the ChaCha20 key, or (b) O_N+1.

Cheers,

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 06/27] crypto: ahash: increase the maximum allowed statesize

2016-06-26 Thread Tero Kristo

On 24/06/16 13:32, Herbert Xu wrote:

On Wed, Jun 22, 2016 at 04:23:39PM +0300, Tero Kristo wrote:

The statesize is used to determine the maximum size for saved ahash
context. In some cases, this can be much larger than what is currently
allocated for it, for example omap-sham driver uses a buffer size of
PAGE_SIZE. Increase the statesize to accommodate this.

Signed-off-by: Tero Kristo 


Nack.  The exported state is supposed to consist of the actual
hash state, plus at most one block worth of unhashed data.  It's
limited so that we can store it on the stack.

So no I'm not taking this patch.



Ok, I think I need to allocate the storage space locally then within the 
driver. Would it be ok to call kmalloc / free in the export / import 
implementation of the driver? The size of the unhashed buffer in 
omap-sham is unfortunately rather large.


-Tero
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 06/27] crypto: ahash: increase the maximum allowed statesize

2016-06-26 Thread Herbert Xu
On Mon, Jun 27, 2016 at 07:58:43AM +0300, Tero Kristo wrote:
>
> Ok, I think I need to allocate the storage space locally then within
> the driver. Would it be ok to call kmalloc / free in the export /
> import implementation of the driver? The size of the unhashed buffer
> in omap-sham is unfortunately rather large.

The allocation should usually be done from the request_alloc
function, i.e., you set the reqsize and the user does the allocation
for you.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 05/27] crypto: omap-sham: avoid executing tasklet where not needed

2016-06-26 Thread Tero Kristo

On 24/06/16 13:30, Herbert Xu wrote:

On Wed, Jun 22, 2016 at 04:23:38PM +0300, Tero Kristo wrote:

Some of the call paths of OMAP SHA driver can avoid executing the next
step of the crypto queue under tasklet; instead, execute the next step
directly via function call. This avoids a costly round-trip via the
scheduler giving a slight performance boost.

Signed-off-by: Tero Kristo 
---
  drivers/crypto/omap-sham.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/omap-sham.c b/drivers/crypto/omap-sham.c
index 6247887..84a0027 100644
--- a/drivers/crypto/omap-sham.c
+++ b/drivers/crypto/omap-sham.c
@@ -242,6 +242,8 @@ static struct omap_sham_drv sham = {
.lock = __SPIN_LOCK_UNLOCKED(sham.lock),
  };

+static void omap_sham_done_task(unsigned long data);
+
  static inline u32 omap_sham_read(struct omap_sham_dev *dd, u32 offset)
  {
return __raw_readl(dd->io_base + offset);
@@ -1007,7 +1009,7 @@ static void omap_sham_finish_req(struct ahash_request 
*req, int err)
req->base.complete(&req->base, err);

/* handle new request */
-   tasklet_schedule(&dd->done_task);
+   omap_sham_done_task((unsigned long)dd);
  }


Hmm, what guarnatees that you won't run out of stack doing this?


Good question, I had a deeper look at the driver and it seems you are 
right... It may result in a recursion loop within the driver. I will dig 
this further and break the recursion if it indeed can call itself 
indefinitely.


My knowledge of the crypto stack (+ these drivers unfortunately) is 
still pretty limited, I appreciate your reviews a lot.


-Tero
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Crypto Fixes for 4.7

2016-06-26 Thread Herbert Xu
Hi Linus:

This push fixes the following issue:

- Missing length check for user-space GETALG request.
- Bogus memmove length in ux500 driver.
- Incorrect priority setting for vmx driver.
- Incorrect ABI selection for vmx driver.


Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git linus


Anton Blanchard (2):
  crypto: vmx - Fix ABI detection
  crypto: vmx - Increase priority of aes-cbc cipher

Linus Walleij (1):
  crypto: ux500 - memmove the right size

Mathias Krause (1):
  crypto: user - re-add size check for CRYPTO_MSG_GETALG

 crypto/crypto_user.c  |1 +
 drivers/crypto/ux500/hash/hash_core.c |4 ++--
 drivers/crypto/vmx/aes_cbc.c  |2 +-
 drivers/crypto/vmx/aes_ctr.c  |2 +-
 drivers/crypto/vmx/ppc-xlate.pl   |2 +-
 5 files changed, 6 insertions(+), 5 deletions(-)

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html