Before I forget: While I believe this patch is okay, I make no
guarantees, promises, etc. If using it causes World War III, then
tough.
Shortly after our DHCP server was upgraded from pl18 to pl25, several
VxWorks clients (based on the WIDE implementation) started going nuts.
Each client would repeat its request as soon as it received an ACK. The
problem is believed to be an interaction between some bad checksum code
in pl25/26 and some major brain damage on the part of the WIDE client
(it shouldn't retransmit its request just because it receives an
incoming packet with a bad checksum).
A TSR has been opened with WRS (Wind River Systems) regarding their
client's behavior in this situation.
At any rate, the included patch appears to fix the checksum problem with
the server. It was made against 2.0b1pl26.
The patch fixes the overflow problem and also normalizes the result, so
that a one's complement "negative zero" (0xffff) is converted to 0.
Otherwise, the "if (wrapsum(...))" checks would treat a valid checksum
as wrong.
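The patch itself is only in the attachment, so what follows is NOT the
patch, just a minimal C sketch of the behavior described above. It
assumes the normalization is applied to the complemented value that
wrapsum() returns, uses the carry fold suggested in the diagnosis quoted
below, and returns host byte order instead of calling htons(). The name
wrapsum_sketch() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not the actual patch: fold the carries, take the
   one's complement, then map 0xffff ("negative zero") to 0.
   Returns host byte order; the real wrapsum() returns htons(sum). */
uint32_t wrapsum_sketch(uint32_t sum)
{
    /* Fold every carry back into the low 16 bits, once each. */
    while (sum >> 16)
        sum = (sum >> 16) + (sum & 0xFFFF);
    assert(sum <= 0xFFFF);

    /* One's complement of the 16-bit sum. */
    sum ^= 0xFFFF;

    /* Assumed normalization: report 0xffff as 0 so that callers testing
       "if (wrapsum(...))" see a valid checksum as zero. */
    if (sum == 0xFFFF)
        sum = 0;
    return sum;
}
```

On the receive side, summing a packet together with its stored checksum
field and passing the total through this function should give 0 for an
intact packet.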
The original diagnosis from a previous email:
> I think I've found a problem... Consider the function "wrapsum()"
> (in common/packet.c--I removed the debug #if's for clarity):
>
> 1: u_int32_t wrapsum (sum)
> 2: u_int32_t sum;
> 3: {
> 4: while (sum > 0x10000) {
> 5: sum = (sum >> 16) + (sum & 0xFFFF);
> 6: sum += (sum >> 16);
> 7: }
> 8: sum = sum ^ 0xFFFF;
> 9:
> 10: return htons(sum);
> 11: }
>
> If one calls this function with 0x4fffc, the test on line 4 will be
> true. After line 5, sum=0x10000. Line 6 wraps the carry, resulting in
> sum=0x10001. The while() on line 4 is again true, so the same carry is
> wrapped a second time (after which sum=0x0002, not the correct 0x0001).
>
> This 15-byte packet should reproduce the above example:
> E1 9B 08 0E 3C 8C F5 96 96 39 65 19 FE DF EA
>
> Removing line 6 fixes this problem. Also, the test on line 4 should
> probably be "while(sum >= 0x10000) {" (better yet, "while(sum >> 16) {"
> would give the compiler's CSE optimizer something to play with).
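To reproduce the quoted example end to end, here is a small C sketch.
raw_sum() is a hypothetical stand-in for the summing step in
common/packet.c: it adds the bytes as big-endian 16-bit words, adds an
odd trailing byte as the high-order byte of a zero-padded word, and does
no intermediate carry folding (an assumption, though it matches the
0x4fffc figure above). fold_original() copies lines 4-7 of the quoted
wrapsum(); fold_fixed() applies the suggested change. Both leave out the
complement and htons() so only the folding is compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum big-endian 16-bit words with no carry folding.
   An odd trailing byte is added as the high-order byte of a padded word. */
uint32_t raw_sum(const unsigned char *buf, size_t nbytes)
{
    uint32_t sum = 0;
    size_t i;

    assert(buf != NULL || nbytes == 0);
    for (i = 0; i + 1 < nbytes; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (nbytes & 1)
        sum += (uint32_t)buf[nbytes - 1] << 8;
    return sum;
}

/* Lines 4-7 of the quoted wrapsum(): note the ">" test and line 6. */
uint32_t fold_original(uint32_t sum)
{
    while (sum > 0x10000) {
        sum = (sum >> 16) + (sum & 0xFFFF);
        sum += (sum >> 16);   /* wraps the same carry a second time */
    }
    return sum;
}

/* Suggested fix: drop line 6 and loop while any carry bits remain. */
uint32_t fold_fixed(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum >> 16) + (sum & 0xFFFF);
    return sum;
}
```

For the 15-byte packet, raw_sum() gives 0x4fffc; fold_original() turns
that into 0x0002, while fold_fixed() gives the correct 0x0001. Note also
that fold_original(0x10000) returns 0x10000 unchanged, which is the case
the ">" test on line 4 misses.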
packet.c.diff.gz