Renaud Allard <[email protected]> wrote:
> The readhash() function in diff(1) uses a signed int for the
> hash accumulator. The computation `sum = sum * 127 + t` overflows
> after about 5 characters on any line containing high byte values.
>
> Trace with input "3\xd9100\n" compared against /dev/null:
>
> sum = 1
> t = '3' (51): sum = 1 * 127 + 51 = 178
> t = 0xd9 (217): sum = 178 * 127 + 217 = 22,823
> t = '1' (49): sum = 22823 * 127 + 49 = 2,898,570
> t = '0' (48): sum = 2898570 * 127 + 48 = 368,118,438
> t = '0' (48): sum = 368118438 * 127 + 48 -> overflow (> INT_MAX)
>
> Signed integer overflow is undefined behavior in C. Since sum is
> only used as a hash value for equality comparison and sorting, the
> fix is to make it unsigned int so the multiplication wraps
> deterministically.
>
> Found by AFL++ fuzzing with UBSan.
makes sense to me; okay op@
> Index: usr.bin/diff/diffreg.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/diff/diffreg.c,v
> retrieving revision 1.95
> diff -u -p -r1.95 diffreg.c
> --- usr.bin/diff/diffreg.c 24 Oct 2021 21:24:16 -0000 1.95
> +++ usr.bin/diff/diffreg.c
> @@ -1177,7 +1177,7 @@ static int
> readhash(FILE *f, int flags)
> {
> int i, t, space;
> - int sum;
> + unsigned int sum;
>
> sum = 1;
> space = 0;