On 12/14/2012 04:20 AM, Richard Biener wrote:
> Exposing known rounding modes as new operation codes may sound like
> a good idea (well, I went a similar way with trying to make operations with
> undefined overflow explicit ... but the fallout was quite large even though
> there is only one kind of undefined overflow and not many operation codes
> that are affected ... so the work stalled - see no-undefined-overflow branch).
> But don't under-estimate the fallout - both in wrong-code and
> missed-optimizations.

Yes, there will be problems adding new operation codes, but if you separate
out the subcode somewhere, how can you be sure that the existing optimizations
are looking at it and honoring it?  It seems to me that's just as much a
source of wrong-code as new operation codes.

> Not sure if we want to start allocating sub-spaces of codes to a group
> to allow flag-like composition (say, PLUS_EXPR gets 0x10 and the lower
> nibble specifies the rounding mode).  It looks more appealing for the
> rounding mode case (more cases) than for the binary (un-)defined overflow
> case.

The largest problem here is that we're constrained on space:

  ENUM_BITFIELD(rtx_code) code: 16;
  unsigned int subcode : 16;

we can't afford to allocate an entire nibble to rounding.

We could allocate the codes in some sort of pattern that would make it easy
to extract the rounding mode algorithmically.  Something like

  (code - BASE) % 5

since there are 4 directed rounding modes plus "unknown" or "dynamic".

> You'd want to expose the rounding mode libc functions as builtins to be
> able to detect them.  That's good anyway and can be done independently
> (they currently act as memory optimization barrier which avoids most of
> the issues with -frounding-math support).

Yep.

> Insertion of rounding mode changes has to be done after 2nd scheduling
> (and you probably want to have even 1st scheduling optimize the schedule
> for rounding mode changes ...).  Machine-reorg is one natural place to do
> it (or where we currently insert vzeroupper).

Flogging the 387 fpcr or the sse mxcsr is just complicated enough to require
a free register, and thus it probably has to be done before register
allocation.  E.g. during the optimize-mode-switching pass, where we currently
handle 387 rounding modes coming from other builtins and casts.


r~
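
For illustration only, the "(code - BASE) % 5" layout mentioned above could be sketched roughly as follows.  Every name here (op_code, round_mode, OP_FP_BASE, the helper functions) is hypothetical and made up for the example; none of them are existing GCC tree or rtx codes.  The sketch assumes each rounding-sensitive operation owns five consecutive codes, one per mode, with "dynamic" first.

```c
#include <assert.h>

/* Hypothetical rounding-mode tags: "dynamic"/unknown plus four static
   modes, five values in total.  Not GCC's actual enumerators.  */
enum round_mode {
  RM_DYNAMIC, RM_NEAREST, RM_UPWARD, RM_DOWNWARD, RM_TOWARDZERO,
  RM_COUNT   /* == 5 */
};

/* Hypothetical operation codes.  Each rounding-sensitive operation is
   given RM_COUNT consecutive codes starting at OP_FP_BASE, so no bits
   are reserved for the mode.  */
enum op_code {
  OP_NOP,
  OP_FP_BASE,
  OP_PLUS  = OP_FP_BASE,
  OP_MINUS = OP_PLUS  + RM_COUNT,
  OP_MULT  = OP_MINUS + RM_COUNT,
  OP_DIV   = OP_MULT  + RM_COUNT,
  OP_FP_END = OP_DIV  + RM_COUNT   /* one past the last code in the block */
};

/* Compose: the code for BASE_OP under rounding mode M.  BASE_OP must be
   the first (dynamic) code of its group.  */
static int
op_with_mode (int base_op, enum round_mode m)
{
  assert (base_op >= OP_FP_BASE && base_op < OP_FP_END);
  assert ((base_op - OP_FP_BASE) % RM_COUNT == 0);
  return base_op + (int) m;
}

/* Extract the rounding mode from a code inside the block.  */
static enum round_mode
op_round_mode (int code)
{
  return (enum round_mode) ((code - OP_FP_BASE) % RM_COUNT);
}

/* Strip the rounding mode, recovering the group's first code.  */
static int
op_base (int code)
{
  return code - (code - OP_FP_BASE) % RM_COUNT;
}
```

With this layout, op_with_mode (OP_MULT, RM_UPWARD) gives a code from which op_round_mode recovers RM_UPWARD and op_base recovers OP_MULT.  Four operations cost 20 codes here, where a nibble-per-operation scheme would reserve 64.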