The readhash() function in diff(1) uses a signed int for the
hash accumulator. The computation `sum = sum * 127 + t` overflows
after about 5 characters on any line containing high byte values.
Trace with input "3\xd9100\n" compared against /dev/null:
sum = 1
t = '3' (51): sum = 1 * 127 + 51 = 178
t = 0xd9 (217): sum = 178 * 127 + 217 = 22,823
t = '1' (49): sum = 22823 * 127 + 49 = 2,898,570
t = '0' (48): sum = 2898570 * 127 + 48 = 368,118,438
t = '0' (48): sum = 368118438 * 127 + 48 -> overflow (> INT_MAX)
Signed integer overflow is undefined behavior in C. Since sum is
only used as a hash value for equality comparison and sorting, the
fix is to make it unsigned int so the multiplication wraps
deterministically.
Found by AFL++ fuzzing with UBSan.
Index: usr.bin/diff/diffreg.c
===================================================================
RCS file: /cvs/src/usr.bin/diff/diffreg.c,v
retrieving revision 1.95
diff -u -p -r1.95 diffreg.c
--- usr.bin/diff/diffreg.c 24 Oct 2021 21:24:16 -0000 1.95
+++ usr.bin/diff/diffreg.c
@@ -1177,7 +1177,7 @@ static int
readhash(FILE *f, int flags)
{
int i, t, space;
- int sum;
+ unsigned int sum;
sum = 1;
space = 0;