Uri and I were talking about this offline. The list might find it useful, too.
I recall Wei and I worked on this a few years ago. Crypto++ had self tests failing under Intel's ICC (but passed under GCC, Clang, MSVC and Comeau). It turned out that ICC was ruthless and removed Undefined Behavior. So ICC was aggressively enforcing C/C++ language rules, while the other compiler were more tolerant in accepting illegal programs due to the shifts. So if you look at misc.h and say WTF???, this is why.... On Wed, Jul 22, 2015 at 6:52 PM, Jeffrey <[email protected]> wrote: >> We should check whether y==0, because y may well be unsigned. Right? > > I should also mention.... There's a FIX 0 built into the solution at > the design level: "shift by" unsigned values because negative shifts > are UB, too. > > The result is callers must be perform: > > int y = ... > x = rotl<word32>(x, static_cast<word32>(y)); > > So FIX 0 sidesteps the UB due to negative shifts and pushes the issue > into the caller. But it should not be a problem in practice because > the caller is the most capable to determine the strategy when dealing > with signed values. > > Jeff > > On Wed, Jul 22, 2015 at 6:37 PM, Jeffrey <[email protected]> wrote: >> On Wed, Jul 22, 2015 at 5:43 PM, Uri <[email protected]> wrote: >>> >>> CAST-128 validation suite running... >>> >>> passed 0123456712345678234567893456789A 0123456789ABCDEF >>> 238B4FE5847E44B2 >>> >>> Assertion failed: (y > 0), function rotlVariable, file misc.h, line 714. >>> >>> Abort trap: 6 >>> >> OK, so the issues with rotate is an exercise in why airplanes crash. >> Its not just one thing - its 2 or 3 subtle things interacting >> together. The full problem layers constant time requirements on top of >> C/C++ UB. >> >> Below is the pedigree... >> >> Because FIX 4 and 5 are the most efficient ones, and we need to figure >> out a way *always* use it, if possible. We should use it even when >> CRYPTOPP_DISABLE_ASM is defined because its so efficient and it avoids >> so many C/C++ UB land mines. >> >> Generally speaking, CRYPTOPP_DISABLE_ASM is intended to fix another >> problem. The problem is that of an ASM code block gone awry, like GCM >> mode. I don't believe its intended to deny a program the most >> efficient implementation of a rotate. >> >> If GCC provided an intrinsic for rotate, then we would enjoy the >> peaceful coexistence of an ASM rotate with CRYPTOPP_DISABLE_ASM. >> >> ***** >> >> Original: >> >> return (x << y) | (x >> (y - 32)); >> >> Its near constant time because all the instructions always execute. >> But its UB when Y > 31 (due to (x << y) ) and Y = 0 (due to (x >> (y - >> 32)). >> >> FIX 1: >> >> // Or y &= 31 (mask low 5-bits) >> y %= 32; >> return (x << y) | (x >> (y - 32)); >> >> Its near constant time because all the instructions always execute. >> But its UB when Y = 0 (due to (x >> (y - 32)). And we added an >> additional statement to the function, so we just lost 3 cycles. >> >> FIX 2: >> >> if y = 0 return x >> >> // Or y &= 31 (mask low 5-bits) >> y %= 32; >> (x << y) | (x >> (y - 32)); >> >> The function is no longer near constant time because it exits early. >> Its also well defined for all y. But it adds two instructions, so we >> lost at least 3 cycles and introduced a branch. >> >> FIX 3: >> >> // Or y &= 31 (mask low 5-bits) >> y %= 32; >> (x << y) | (x >> ((y - 32) % 32)); >> >> The function was restore to near constant time. We introduced two >> additional statements (the two reductions), so we lost at least 6 >> cycles. But we don't have UB, and we don't have branches. >> >> FIX 4 (Microsoft): >> >> #ifdef _MSC_VER >> return return _rotl(x, static_cast<word32>(y)); >> #endif >> >> We got out of C/C++ because of that damn UB. We are back to 1 >> statement that executes in 3 cycles. >> >> FIX 5 (GNU Linx): >> >> #ifdef __GNUC__ >> // rot<32> variant >> __asm__ ("roll %1, %0" : "+mq" (x) : "I" ((unsigned char)(y%32))); >> return x; >> #endif >> >> We got out of C/C++ because of that damn UB. We are back to 1 >> statement that executes in 3 cycles. >> >> We rely on the ***preprocessor*** to perform the modular reduction >> 'y%32' when its a constant. That means the compiler has to propagate >> the constant from the C/C++ code down into our call to ASM. -- -- You received this message because you are subscribed to the "Crypto++ Users" Google Group. To unsubscribe, send an email to [email protected]. More information about Crypto++ and this group is available at http://www.cryptopp.com. --- You received this message because you are subscribed to the Google Groups "Crypto++ Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
