On Thu, Aug 09, 2018 at 08:38:56PM +0200, Stephan Müller wrote:
> The function extract_crng invokes the ChaCha20 block operation directly
> on the user-provided buffer. The block operation operates on u32 words.
> Thus the extract_crng function expects the buffer to be aligned to u32
> as it is visible with the parameter type of extract_crng. However,
> get_random_bytes uses a void pointer which may or may not be aligned.
> Thus, an alignment check is necessary and the temporary buffer must be
> used if the alignment to u32 is not ensured.
> 
> Cc: <sta...@vger.kernel.org> # v4.16+
> Cc: Ted Tso <ty...@mit.edu>
> Signed-off-by: Stephan Mueller <smuel...@chronox.de>
> ---
>  drivers/char/random.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index bd449ad52442..23f336872426 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1617,8 +1617,14 @@ static void _get_random_bytes(void *buf, int nbytes)
>       trace_get_random_bytes(nbytes, _RET_IP_);
>  
>       while (nbytes >= CHACHA20_BLOCK_SIZE) {
> -             extract_crng(buf);
> -             buf += CHACHA20_BLOCK_SIZE;
> +             if (likely((unsigned long)buf & (sizeof(tmp[0]) - 1))) {
> +                     extract_crng(buf);
> +                     buf += CHACHA20_BLOCK_SIZE;
> +             } else {
> +                     extract_crng(tmp);
> +                     memcpy(buf, tmp, CHACHA20_BLOCK_SIZE);
> +             }
> +
>               nbytes -= CHACHA20_BLOCK_SIZE;
>       }
>  
> -- 
> 2.17.1

This patch is backwards: the temporary buffer is used when the buffer is
*aligned*, not misaligned.  And more problematically, 'buf' is never incremented
in one of the cases...

Note that I had tried to fix the chacha20_block() alignment bugs in commit
9f480faec58cd6197a ("crypto: chacha20 - Fix keystream alignment for
chacha20_block()"), but I had missed this case.  I don't like seeing the
alignment requirement being worked around with a temporary buffer; it's
error-prone, and inefficient on common platforms.  How about we instead make the
output of chacha20_block() a u8 array and output the 16 32-bit words using
put_unaligned_le32()?  In retrospect I probably should have just done that, but
at the time I didn't know of any case where the alignment would be a problem.

- Eric

Reply via email to