The readhash() function in diff(1) uses a signed int for the
hash accumulator.  The computation `sum = sum * 127 + t` overflows
after about 5 characters on any line containing high byte values.

Trace with input "3\xd9100\n" compared against /dev/null:

  sum = 1
  t = '3' (51):   sum = 1 * 127 + 51       =        178
  t = 0xd9 (217): sum = 178 * 127 + 217     =     22,823
  t = '1' (49):   sum = 22823 * 127 + 49    =  2,898,570
  t = '0' (48):   sum = 2898570 * 127 + 48  = 368,118,438
  t = '0' (48):   sum = 368118438 * 127 + 48  -> overflow (> INT_MAX)

Signed integer overflow is undefined behavior in C.  Since sum is
only used as a hash value for equality comparison and sorting, the
fix is to make it unsigned int so the multiplication wraps
deterministically.

Found by AFL++ fuzzing with UBSan.

Index: usr.bin/diff/diffreg.c
===================================================================
RCS file: /cvs/src/usr.bin/diff/diffreg.c,v
retrieving revision 1.95
diff -u -p -r1.95 diffreg.c
--- usr.bin/diff/diffreg.c      24 Oct 2021 21:24:16 -0000      1.95
+++ usr.bin/diff/diffreg.c
@@ -1177,7 +1177,7 @@ static int
 readhash(FILE *f, int flags)
 {
        int i, t, space;
-       int sum;
+       unsigned int sum;

        sum = 1;
        space = 0;

Reply via email to