https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99320
Bug ID: 99320
Summary: constexpr defined arrays within constexpr functions would benefit from lookup-tables
Product: gcc
Version: 10.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gcc-bugs at marehr dot dialup.fu-berlin.de
Target Milestone: ---

Hi gcc-team,

first of all, sorry if this is the wrong component, but I guess that this is a "missed-optimization" issue rather than a regular C++ issue, so I wasn't sure which component fits best.

I have the following code (which can be further reduced, but I kept it as original as possible to reflect my use case):

```c++
#include <array>

struct foo
{
    static constexpr char bar(unsigned idx)
    {
        constexpr std::array<char, 256> lookup_table
        {
            [] () constexpr
            {
                std::array<char, 256> ret{};

                // reverse mapping for characters and their lowercase
                for (unsigned rnk = 0u; rnk < 15; ++rnk)
                {
                    ret[rnk + 'A'] = rnk;
                }

                // set U equal to T
                ret['U'] = ret['T']; ret['u'] = ret['t'];

                // iupac characters get special treatment, because there is no N
                ret['R'] = ret['A']; ret['r'] = ret['A']; // A or G
                ret['Y'] = ret['C']; ret['y'] = ret['C']; // C or T
                ret['S'] = ret['C']; ret['s'] = ret['C']; // C or G
                ret['W'] = ret['A']; ret['w'] = ret['A']; // A or T
                ret['K'] = ret['G']; ret['k'] = ret['G']; // G or T
                ret['M'] = ret['A']; ret['m'] = ret['A']; // A or T
                ret['B'] = ret['C']; ret['b'] = ret['C']; // C or G or T
                ret['D'] = ret['A']; ret['d'] = ret['A']; // A or G or T
                ret['H'] = ret['A']; ret['h'] = ret['A']; // A or C or T
                ret['V'] = ret['A']; ret['v'] = ret['A']; // A or C or G

                return ret;
            }()
        };

        return lookup_table[idx];
    }
};

int main(int argc, char const ** argv)
{
    return foo::bar(argc);
}
```

I wanted to switch from defining that lookup-table within the class (e.g. `static constexpr ... lookup_table = ...`) to defining the lookup-table within the function directly, and I noticed a performance regression in my benchmarks.
Some micro benchmarks went from ~80ns to ~3000ns, but I also saw an impact on more "realistic" macro benchmarks.

After looking at the assembly (https://godbolt.org/z/n9bo7W), I noticed that the table is "constructed" on each function call, instead of the function compiling down to a single lookup instruction. So I compared it to what clang does, and it seems that clang actually generates a static lookup table.

I know that this use case is quite niche, but it would be cool to have it nevertheless :)

Thank you!
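
For comparison, the class-scope variant mentioned above (the form that did not show the regression in my benchmarks) could look roughly like the following. This is only a minimal sketch, not the original code: the names `detail::make_table` and `foo_static` are hypothetical, the table is reduced to the first loop of the reproducer, and C++17 is assumed.

```cpp
#include <array>

namespace detail
{
// Hypothetical helper (not part of the original report): builds a reduced
// version of the table, containing only the 'A'..'O' -> 0..14 mapping.
constexpr std::array<char, 256> make_table()
{
    std::array<char, 256> ret{};

    for (unsigned rnk = 0u; rnk < 15; ++rnk)
    {
        ret[rnk + 'A'] = static_cast<char>(rnk);
    }

    return ret;
}
} // namespace detail

// Hypothetical name, chosen to avoid clashing with `foo` from the reproducer.
struct foo_static
{
    // Class-scope table: a static data member has static storage duration,
    // so there is a single object instead of a local array per call.
    static constexpr std::array<char, 256> lookup_table = detail::make_table();

    static constexpr char bar(unsigned idx)
    {
        return lookup_table[idx];
    }
};

// The lookup is still usable in constant expressions.
static_assert(foo_static::bar('C') == 2, "table is evaluated at compile time");
```

The helper lives at namespace scope because a static member function cannot be called in a constant expression before its class is complete. Whether gcc emits a single load for `foo_static::bar` should still be checked in the generated assembly.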