Excerpts from Alan Modra's message of September 4, 2020 2:59 am: > On Thu, Sep 03, 2020 at 11:02:50PM +0200, Iain Buclaw wrote: >> Excerpts from Alan Modra's message of September 3, 2020 3:01 pm: >> > Running the libiberty testsuite >> > ./test-demangle < libiberty/testsuite/d-demangle-expected >> > libiberty/d-demangle.c:214:14: runtime error: signed integer overflow: >> > 922337203 * 10 cannot be represented in type 'long int' >> > >> > On looking at silencing ubsan, I found a real bug in dlang_number. >> > For a 32-bit long, some overflows won't be detected. For example, >> > 21474836480. Why? Well 214748364 * 10 is 0x7FFFFFF8 (no overflow so >> > far). Adding 8 gives 0x80000000 (which does overflow but there is no >> > test for that overflow in the code). Then multiplying 0x80000000 * 10 >> > = 0x500000000 = 0 won't be caught by the multiplication overflow test. >> > The same holds for a 64-bit long using similarly crafted digit >> > sequences. >> > >> > This patch replaces the mod 10 test with a simpler limit test, and >> > similarly the mod 26 test in dlang_decode_backref. >> > >> > About the limit test: >> > val * 10 + digit > ULONG_MAX is the condition for overflow >> > ie. >> > val * 10 > ULONG_MAX - digit >> > or >> > val > (ULONG_MAX - digit) / 10 >> > or assuming the largest digit >> > val > (ULONG_MAX - 9) / 10 >> > >> > I resisted the aesthetic appeal of simplifying this further to >> > val > -10UL / 10 >> > since -1UL for ULONG_MAX is only correct for 2's complement numbers. >> > >> > Passes all the libiberty tests, on both 32-bit and 64-bit hosts. OK >> > to apply? >> > >> >> Thanks Alan, change seems reasonable, however on giving it a mull over, >> I see that the largest number that dlang_number would need to be able to >> handle is UINT_MAX. These two tests which decode a wchar value are >> representative of that (first is valid, second invalid). >> >> # >> --format=dlang >> _D4test21__T3funVwi4294967295Z3funFNaNbNiNfZv >> test.fun!('\Uffffffff').fun() >> # >> --format=dlang >> _D4test21__T3funVwi4294967296Z3funFNaNbNiNfZv >> _D4test21__T3funVwi4294967296Z3funFNaNbNiNfZv > > Hmm, OK, on a 32-bit host those both won't be handled due to the > "(long) val < 0" test. > > Do you really want the last one to fail on a 64-bit host, ie. not > produce "test.fun!('\U100000000').fun()"? I'm guessing you do, but > I'd like that confirmed before posting a followup patch. >
Actually I meant dchar (char32_t in C++11), and yes, it should fail on a 64-bit host, as there is no such thing as a 64-bit character type. Though arguably, that could be handled by the caller in dlang_parse_integer. Looking at the places where dlang_number is used, I see: - Get length of next symbol (unlikely to be bigger than UCHAR_MAX). - Get length of embedded template symbol (can get very long, but rarely to the point where USHRT_MAX is exceeded). - Get number of elements in an encoded tuple, string, or array literal (anything bigger than INT_MAX not allowed at compile-time). - Get an encoded boolean value (0 or 1). - Get an encoded character value (max valid value is 0x0010FFFF, but as dchar's an unsigned 32-bit type, might as well allow up to UINT_MAX). So I think UINT_MAX is a reasonable artificial limit. >> I'm fine with creating a new PR and dealing with the above in a separate >> change though, as it will require a few more replacements to adjust the >> result parameter type to 'unsigned' or 'long long'. > > Ha! When I wrote the original patch I changed most of the "long" > parameters and variables to "unsigned long", because it was clear from > the code (and my understanding of name mangling) that the values were > in fact always unsigned. I also changed "int" to "size_t" where > appropriate, not that you need to handle strings larger than 2G in > length but simply that size_t is the correct type to use with string > functions, malloc, memcpy and so on. I decided to remove all those > changes before posting as they were just tidies, I thought. > Yeah, signedness is handled outside of dlang_number, so only need to treat all numbers as unsigned for the purpose of decoding. Apologies that you seem to have opened a can of worms here. :-) Iain.