* H. Peter Anvin <h...@zytor.com> wrote:

> For those who have read the Google+ thread[1] it is pretty 
> clear that there are varying opinions on the idea of removing 
> the RDRAND bypass.
> 
> I have gathered some performance numbers to make the debate 
> more concrete: RDRAND is between 12 and 15 times faster than 
> the current random pool system (for large and small blocks, 
> respectively).  Both the pool system and RDRAND scale 
> perfectly with frequency, so the ratio is independent of 
> P-states.
> 
> Given the discrepancy in performance (and presumably in power) 
> I still very much believe it is a mistake to unconditionally 
> deny users the option of using RDRAND directly, but what I 
> *do* believe we can all agree on is that security is 
> paramount.  Dropping RDRAND is not just a performance loss but 
> likely a security loss as well, since the system will end up 
> producing substantially less entropy.
> 
> As a compromise I offer the following patch; in terms of 
> performance it is "the worst of both worlds", but it should 
> provide the combined security of both: even if RDRAND is 
> completely compromised by the NSA, Microsoft and the 
> Illuminati all at once, it will do no worse than the existing 
> code, [...]

I should mention that since there are documented historical 
examples of our software random pool in drivers/char/random.c 
having bugs that no-one noticed, XOR-ing the RDRAND hardware 
stream into the software entropy pool's output stream is the 
sensible thing to do from a security point of view.
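
To make that concrete, here is a minimal userspace sketch of the 
XOR construction (illustration only, not the actual 
drivers/char/random.c code; the function name is mine, and it 
assumes an x86 CPU with RDRAND, a buffer already filled from a 
software source, and -mrdrnd to build):

  #include <immintrin.h>
  #include <stddef.h>
  #include <stdint.h>

  /*
   * XOR RDRAND output into a buffer that has already been filled
   * from a software RNG: the result is no weaker than the stronger
   * of the two streams, so an attacker has to break both of them.
   */
  static void mix_in_rdrand(uint64_t *buf, size_t nwords)
  {
          size_t i;

          for (i = 0; i < nwords; i++) {
                  unsigned long long hw;

                  if (!_rdrand64_step(&hw))  /* RDRAND can transiently fail */
                          break;
                  buf[i] ^= hw;
          }
  }

If RDRAND is absent or fails, the buffer simply keeps the pure 
software output, which is also the fallback behaviour of your patch.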

So your patch recovers the security lost via this recent patch 
in random.git:

  c2557a303ab6 random: add new get_random_bytes_arch() function

I don't think random.git should be sent upstream without 
addressing this Ivy Bridge security regression first.

Btw., the commit above has a very misleading title: it does not 
just 'add' a function; the far more important change is that it 
removes RDRAND from the output stream.

The commit also made random number generation an order of 
magnitude slower.

> [...] and (since RDRAND is so much faster than the existing 
> code) it has only a modest performance cost.  More 
> realistically, it will let many more users take advantage of a 
> high entropy quick-reseeding random number generator, thus 
> ending up with a major gain in security.
>
> It is worth noting that although RDRAND by itself is adequate 
> for any in-kernel users (and the 3.4-3.5 kernels use it as 
> such unless you specify "nordrand"), this is not true for 
> /dev/random, nor, due to abuse, for /dev/urandom.  The 
> recently disclosed[2] RDSEED instruction, on the other hand, 
> is defined to be fully entropic and can be used for any 
> purpose; that one will be introduced in a later processor.
> 
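(As an aside, for anyone who wants to experiment with the two 
instructions from user space: below is a rough sketch of the RDSEED 
usage pattern.  It is my illustration, not part of the patch; it 
assumes a toolchain with the RDRAND/RDSEED intrinsics and, once such 
parts ship, a CPU that implements RDSEED.  Because RDSEED hands out 
conditioned full-entropy samples, it can fail more often than RDRAND 
when the entropy source is drained, so the usual pattern is to retry 
with a small backoff:)

  #include <immintrin.h>

  /* Fetch one full-entropy 64-bit value via RDSEED, retrying on
   * failure (CF=0).  Build with -mrdseed. */
  static int get_seed64(unsigned long long *out)
  {
          int retries = 100;

          while (retries--) {
                  if (_rdseed64_step(out))   /* CF=1: *out is valid */
                          return 1;
                  _mm_pause();               /* back off briefly, retry */
          }
          return 0;                          /* persistent underflow */
  }

RDRAND, by contrast, rarely fails in practice because it draws from 
a DRBG that is continuously reseeded, which is why a simple "try 
once, fall back to the software pool" policy like the one in the 
patch below is reasonable.
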
> Note that the attached patch is way more conservative than it needs
> to be: every byte is mixed with RDRAND data twice on its way through
> (and an additional 1.2 bytes are lost), as I moved the mixing to
> extract_buf(), but even so the overhead is modest, and mixing in
> extract_buf() makes the code quite a bit simpler.
> 
> This patch is on top of random.git.
> 
> 
> [1] https://plus.google.com/115124063126128475540/posts/KbAEJKMsAfq
> 
> [2] http://software.intel.com/file/45207
> 
> -- 
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel.  I don't speak on their behalf.
> 

> From b36c22b00c6bf8e91a758d3167e912b0ac4f0d0c Mon Sep 17 00:00:00 2001
> From: "H. Peter Anvin" <h...@linux.intel.com>
> Date: Tue, 24 Jul 2012 14:48:56 -0700
> Subject: [PATCH] random: mix in architectural randomness in extract_buf()
> 
> RDRAND is so much faster than the Linux pool system that we can
> always just mix in architectural randomness.
> 
> Doing it in extract_buf() lets us handle everything in one
> convenient place; unfortunately, the output size (10 bytes) is
> maximally awkward.  That, plus the fact that every output byte
> will have passed through extract_buf() twice, means we are not
> being very efficient with the RDRAND use.
> 
> Measurements show that RDRAND is 12-15 times faster than the Linux
> pool system.  Doing the math shows this corresponds to about an
> 11.5% slowdown, which is confirmed by measurements.
> 
> Users who are very performance- or power-sensitive could definitely
> still benefit from being allowed to use RDRAND directly, but I
> believe this version should satisfy even the most hyper-paranoid
> crowd.
> 
> Signed-off-by: H. Peter Anvin <h...@linux.intel.com>
> Cc: DJ Johnson <dj.john...@intel.com>
> ---
>  drivers/char/random.c |   56 ++++++++++++++++++++++++++++---------------------
>  1 file changed, 32 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 9793b40..a4a24e4 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -277,6 +277,8 @@
>  #define SEC_XFER_SIZE 512
>  #define EXTRACT_SIZE 10
>  
> +#define LONGS(x) (((x) + sizeof(unsigned long) - 1)/sizeof(unsigned long))
> +
>  /*
>   * The minimum number of bits of entropy before we wake up a read on
>   * /dev/random.  Should be enough to do a significant reseed.
> @@ -813,11 +815,7 @@ static ssize_t extract_entropy(struct entropy_store *r, void *buf,
>   */
>  static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
>  {
> -     union {
> -             __u32   tmp[OUTPUT_POOL_WORDS];
> -             long    hwrand[4];
> -     } u;
> -     int     i;
> +     __u32   tmp[OUTPUT_POOL_WORDS];
>  
>       if (r->pull && r->entropy_count < nbytes * 8 &&
>           r->entropy_count < r->poolinfo->POOLBITS) {
> @@ -828,23 +826,17 @@ static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
>               /* pull at least as many as BYTES as wakeup BITS */
>               bytes = max_t(int, bytes, random_read_wakeup_thresh / 8);
>               /* but never more than the buffer size */
> -             bytes = min_t(int, bytes, sizeof(u.tmp));
> +             bytes = min_t(int, bytes, sizeof(tmp));
>  
>               DEBUG_ENT("going to reseed %s with %d bits "
>                         "(%d of %d requested)\n",
>                         r->name, bytes * 8, nbytes * 8, r->entropy_count);
>  
> -             bytes = extract_entropy(r->pull, u.tmp, bytes,
> +             bytes = extract_entropy(r->pull, tmp, bytes,
>                                       random_read_wakeup_thresh / 8, rsvd);
> -             mix_pool_bytes(r, u.tmp, bytes, NULL);
> +             mix_pool_bytes(r, tmp, bytes, NULL);
>               credit_entropy_bits(r, bytes*8);
>       }
> -     kmemcheck_mark_initialized(&u.hwrand, sizeof(u.hwrand));
> -     for (i = 0; i < 4; i++)
> -             if (arch_get_random_long(&u.hwrand[i]))
> -                     break;
> -     if (i)
> -             mix_pool_bytes(r, &u.hwrand, sizeof(u.hwrand), 0);
>  }
>  
>  /*
> @@ -901,15 +893,19 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
>  static void extract_buf(struct entropy_store *r, __u8 *out)
>  {
>       int i;
> -     __u32 hash[5], workspace[SHA_WORKSPACE_WORDS];
> +     union {
> +             __u32 w[5];
> +             unsigned long l[LONGS(EXTRACT_SIZE)];
> +     } hash;
> +     __u32 workspace[SHA_WORKSPACE_WORDS];
>       __u8 extract[64];
>       unsigned long flags;
>  
>       /* Generate a hash across the pool, 16 words (512 bits) at a time */
> -     sha_init(hash);
> +     sha_init(hash.w);
>       spin_lock_irqsave(&r->lock, flags);
>       for (i = 0; i < r->poolinfo->poolwords; i += 16)
> -             sha_transform(hash, (__u8 *)(r->pool + i), workspace);
> +             sha_transform(hash.w, (__u8 *)(r->pool + i), workspace);
>  
>       /*
>        * We mix the hash back into the pool to prevent backtracking
> @@ -920,14 +916,14 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
>        * brute-forcing the feedback as hard as brute-forcing the
>        * hash.
>        */
> -     __mix_pool_bytes(r, hash, sizeof(hash), extract);
> +     __mix_pool_bytes(r, hash.w, sizeof(hash.w), extract);
>       spin_unlock_irqrestore(&r->lock, flags);
>  
>       /*
>        * To avoid duplicates, we atomically extract a portion of the
>        * pool while mixing, and hash one final time.
>        */
> -     sha_transform(hash, extract, workspace);
> +     sha_transform(hash.w, extract, workspace);
>       memset(extract, 0, sizeof(extract));
>       memset(workspace, 0, sizeof(workspace));
>  
> @@ -936,11 +932,23 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
>        * pattern, we fold it in half. Thus, we always feed back
>        * twice as much data as we output.
>        */
> -     hash[0] ^= hash[3];
> -     hash[1] ^= hash[4];
> -     hash[2] ^= rol32(hash[2], 16);
> -     memcpy(out, hash, EXTRACT_SIZE);
> -     memset(hash, 0, sizeof(hash));
> +     hash.w[0] ^= hash.w[3];
> +     hash.w[1] ^= hash.w[4];
> +     hash.w[2] ^= rol32(hash.w[2], 16);
> +
> +     /*
> +      * If we have an architectural hardware random number
> +      * generator, mix that in, too.
> +      */
> +     for (i = 0; i < LONGS(EXTRACT_SIZE); i++) {
> +             unsigned long v;
> +             if (!arch_get_random_long(&v))
> +                     break;
> +             hash.l[i] ^= v;
> +     }
> +
> +     memcpy(out, hash.w, EXTRACT_SIZE);
> +     memset(&hash, 0, sizeof(hash));
>  }

Acked-by: Ingo Molnar <mi...@kernel.org>

Thanks,

        Ingo