> The attached patch catches C constructs:
>   (A << 8) | (A >> 8)
> where A is unsigned 16 bits
> and maps them to builtin_bswap16(A) which can provide more efficient
> implementations on some targets.
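
For reference, a minimal sketch of the idiom under discussion next to the builtin. The function names swap16_idiom and swap16_builtin are only illustrative and are not taken from the patch; the sketch assumes a target where int is wider than 16 bits.

```c
#include <stdint.h>

/* Hand-written 16-bit byte swap, i.e. the construct the patch wants to catch. */
static inline uint16_t swap16_idiom(uint16_t a)
{
    /* 'a' is promoted to int before the shifts, so the cast back to
       uint16_t is what discards the bits shifted above bit 15. */
    return (uint16_t)((a << 8) | (a >> 8));
}

/* The same operation expressed directly with the GCC builtin. */
static inline uint16_t swap16_builtin(uint16_t a)
{
    return __builtin_bswap16(a);
}
```

Both functions should return the same value for every 16-bit input; the open question is only which pass ought to recognize the first form.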

This belongs in tree-ssa-math-opts.c:execute_optimize_bswap instead.

When I implemented __builtin_bswap16, I didn't add this because I thought it would be overkill, since the RTL combiner should be able to catch the pattern. Have you investigated that approach? I don't have a strong opinion, though.

-- 
Eric Botcazou