Hi Richard,
> "Roger Sayle" <ro...@nextmovesoftware.com> writes:
> > This patch addresses PR rtl-optimization/106594, a significant
> > performance regression affecting aarch64 recently introduced (exposed)
> > by one of my recent RTL simplification improvements.  Firstly many
> > thanks to Tamar Christina for confirming that the core of this patch
> > provides ~5% performance improvement on some on his benchmarks.
> >
> > GCC's combine pass uses the function expand_compound_operation to
> > conceptually simplify/canonicalize SIGN_EXTEND and ZERO_EXTEND as a
> > pair of shift operations, as not all targets support extension
> > instructions [technically ZERO_EXTEND may potentially be simplified/
> > canonicalized to an AND operation, but the theory remains the same].
>
> Are you sure the reason is that not all targets support extension
> instructions?
> I thought in practice this routine would only tend to see ZERO_EXTEND etc. if
> those codes appeared in the original rtl insns.

Excellent question.  My current understanding is that this subroutine is
used both for SIGN_EXTEND/ZERO_EXTEND (for which many processors have
instructions and even addressing mode support) and for
SIGN_EXTRACT/ZERO_EXTRACT, for which many platforms really do need a pair
of shift instructions (or an AND).

The bit that's suspicious (to me) is the exact wording of the comment...

> >    /* Convert sign extension to zero extension, if we know that the high
> >       bit is not set, as this is easier to optimize.  It will be converted
> >       back to cheaper alternative in make_extraction.  */

Notice that things are converted back in "make_extraction", and (based on
empirical evidence) may not be getting converted back for non-extractions,
i.e. extensions.  This code is being called for ZERO_EXTEND on a path that
doesn't subsequently call make_compound.  As shown in the PR, there are
code generation differences (that impact performance), but I agree there's
some ambiguity around the intent of the original code.
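
To make the equivalences above concrete, here is a small Python model (not
GCC code) of an 8-bit value extended to 32 bits.  The helper names and the
two widths are assumptions chosen purely for illustration.  It is only a
sketch of the arithmetic: ZERO_EXTEND as an AND or as a shift pair,
SIGN_EXTEND as a shift pair, and the two extensions agreeing when the high
bit of the narrow value is clear.

```python
# Illustrative model only: hypothetical helpers, not functions from GCC.
WIDE = 32                      # assumed width of the outer mode
NARROW = 8                     # assumed width of the inner mode
WIDE_MASK = (1 << WIDE) - 1
SHIFT = WIDE - NARROW


def zero_extend_and(x):
    """ZERO_EXTEND written as an AND with the narrow-mode mask."""
    return x & ((1 << NARROW) - 1)


def zero_extend_shifts(x):
    """ZERO_EXTEND written as shift left, then logical shift right."""
    return ((x << SHIFT) & WIDE_MASK) >> SHIFT


def sign_extend_shifts(x):
    """SIGN_EXTEND written as shift left, then arithmetic shift right."""
    shifted = (x << SHIFT) & WIDE_MASK
    if shifted & (1 << (WIDE - 1)):
        shifted -= 1 << WIDE   # reinterpret the wide value as signed
    return (shifted >> SHIFT) & WIDE_MASK


def sign_extend_direct(x):
    """Reference SIGN_EXTEND: copy the narrow sign bit into the high bits."""
    x &= (1 << NARROW) - 1
    if x & (1 << (NARROW - 1)):
        x -= 1 << NARROW
    return x & WIDE_MASK


def high_bit_clear(x):
    """True when the narrow value's sign bit is known to be zero."""
    return (x & (1 << (NARROW - 1))) == 0
```

When high_bit_clear(x) holds, sign_extend_shifts(x) and zero_extend_and(x)
return the same value, which is the condition the quoted comment relies on
when it rewrites a sign extension as a zero extension.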

My personal preference is to write backend patterns that contain
ZERO_EXTEND (or SIGN_EXTEND) rather than a pair of shifts, or an AND of a
paradoxical SUBREG.  For example, the new patterns added to i386.md look
(to me) "cleaner" than the forms that they complement/replace.  The burden
is then on the middle-end to simplify {ZERO,SIGN}_EXTEND forms as
efficiently as it would shifts or ANDs.  Interestingly, this is sometimes
already the case, for example, we simplify (ffs (zero_extend ...)) in cases
where we wouldn't simplify the equivalent (ffs (and ... 255)) [but these
are perhaps also just missed optimizations].

Thoughts?  I'll also dig more into the history of these (combine)
functions.

Cheers,
Roger