https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69678
Bug ID: 69678
Summary: Missed function specialization + partial devirtualization opportunity
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: lto
Assignee: unassigned at gcc dot gnu.org
Reporter: wschmidt at gcc dot gnu.org
CC: dje at gcc dot gnu.org, hubicka at gcc dot gnu.org
Target Milestone: ---
Target: powerpc64le-*

Created attachment 37583
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37583&action=edit
Tarball with source, object, executable

This is a missed optimization that requires a combination of LTO, function specialization, profiling, partial devirtualization, and inlining, so there are plenty of places where we can go wrong. However, another compiler manages it, and GCC is over 2x slower on this simple test as a result.

The attachment contains two source files, disp.c and dispf.c. dispf.c contains two functions, "one" and "two". disp.c contains one direct call to "one" and an indirect call (within a loop) that can call either "one" or "two". Both functions are always called with the value 3 as input. "two" ignores this parameter, while "one" has an early exit for the value 3.

The desired behavior with options -O3 -flto -fprofile-use would be:
(1) Specialize each of "one" and "two" for a parameter value of 3;
(2) Perform partial devirtualization of the indirect call based on profile data, resulting in conditional calls to "one-prime" and "two-prime", in that order, before falling back to the indirect call; and
(3) Inline the specialized functions at the three resulting direct call sites.

Currently, GCC appears to perform neither step (1) nor step (2), which makes step (3) impossible.
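To make the three steps concrete, here is a minimal source-level sketch of the transformation. It is a hypothetical reconstruction, not the attached test case: the bodies of one() and two() are invented, keeping only the behavior described above (one() exits early for 3, two() ignores its argument), and the names run_indirect, run_devirt, one_prime, two_prime, and check_same are illustrative. run_indirect mirrors the original shape; run_devirt shows what the compiler would ideally produce after specialization, speculative (partial) devirtualization guarded by function-pointer comparisons, and inlining.

```c
#include <assert.h>

/* Hypothetical stand-ins for dispf.c; real bodies are in the attachment. */
int one(int x)
{
    if (x == 3)                 /* early exit for the value 3 */
        return 0;
    int s = 0;
    for (int i = 0; i < x; i++)
        s += i * i;
    return s;
}

int two(int x)
{
    (void)x;                    /* parameter is ignored */
    return 7;
}

typedef int (*fn_t)(int);

/* Original shape: one direct call plus an indirect call in a loop. */
long run_indirect(fn_t *table, int n)
{
    long sum = one(3);
    for (int i = 0; i < n; i++)
        sum += table[i](3);
    return sum;
}

/* Step (1): clones specialized for x == 3 ("one-prime", "two-prime"). */
static int one_prime(void) { return 0; }   /* early exit folded away */
static int two_prime(void) { return 7; }

/* Steps (2) and (3): profile-guided guards, checking "one" first and then
   "two", with the specialized clones called directly (trivially inlinable)
   and the original indirect call kept as the fallback. */
long run_devirt(fn_t *table, int n)
{
    long sum = one_prime();
    for (int i = 0; i < n; i++) {
        fn_t f = table[i];
        if (f == one)
            sum += one_prime();
        else if (f == two)
            sum += two_prime();
        else
            sum += f(3);        /* fallback: unchanged indirect call */
    }
    return sum;
}

/* Sanity check: the transformed code must compute the same result. */
void check_same(fn_t *table, int n)
{
    assert(run_indirect(table, n) == run_devirt(table, n));
}
```

For example, with a table of { one, two, two, one }, both versions return 14. The guards preserve semantics for any other target through the fallback call, which is why speculative devirtualization is safe even when the profile is incomplete.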