https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78821
--- Comment #33 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 20 Nov 2017, ubizjak at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78821
>
> --- Comment #32 from Uroš Bizjak <ubizjak at gmail dot com> ---
> (In reply to rguent...@suse.de from comment #20)
>
> > Index: gcc/tree-ssa-sccvn.c
> > ===================================================================
> > --- gcc/tree-ssa-sccvn.c (revision 254945)
> > +++ gcc/tree-ssa-sccvn.c (working copy)
> > @@ -3632,6 +3632,38 @@ visit_nary_op (tree lhs, gassign *stmt)
>
> The patched compiler works for this particular case (a "break" is needed
> at the top of the patch, though), but there are many other cases where a
> temporary should be calculated and its value stored, e.g.:
>
> --cut here--
> void foo_OK (char *buf, unsigned int data)
> {
>   buf[0] = ~data >> 8;
>   buf[1] = ~data;
> }
>
> void foo (char *buf, unsigned int data)
> {
>   buf[0] = ~data;
>   buf[1] = ~data >> 8;
> }
>
> void bar (char *buf, unsigned int data)
> {
>   buf[0] = -data >> 8;
>   buf[1] = -data;
> }
>
> void baz (char *buf, unsigned int data)
> {
>   buf[0] = (data+3) >> 8;
>   buf[1] = (data+3);
> }
> --cut here--
>
> Only foo_OK compiles (-O2 -march=haswell) to the expected asm:
>
>         notl    %esi
>         movbe   %si, (%rdi)
>
> all others generate a duplicated operation:
>
>         movb    %sil, %al
>         notl    %esi
>         notl    %eax
>         movb    %al, (%rdi)
>         movl    %esi, %eax
>         movb    %ah, 1(%rdi)
>
> Should I open a new PR for the above enhancement?

Ok, I fear we're missing some canonicalization here.  For foo we see

  <bb 2> :
  _1 = (char) data_7(D);
  _2 = ~_1;
  *buf_9(D) = _2;
  _3 = ~data_7(D);
  _4 = _3 >> 8;
  _6 = (char) _4;
  MEM[(char *)buf_9(D) + 1B] = _6;

for bar (the patch didn't handle negates, which I'm not sure are as easily
handled correctness-wise?)

  <bb 2> :
  _1 = -data_8(D);
  _2 = _1 >> 8;
  _3 = (char) _2;
  *buf_10(D) = _3;
  _4 = (unsigned char) data_8(D);
  _5 = -_4;
  _7 = (char) _5;
  MEM[(char *)buf_10(D) + 1B] = _7;

and for baz

  <bb 2> :
  _1 = data_8(D) + 3;
  _2 = _1 >> 8;
  _3 = (char) _2;
  *buf_10(D) = _3;
  _4 = (unsigned char) data_8(D);
  _5 = _4 + 3;
  _7 = (char) _5;
  MEM[(char *)buf_10(D) + 1B] = _7;

so it's already the FE's shortening optimization or folding that makes the
operations non-canonical.  The .original dumps show

  *buf = (char) (~data >> 8);
  *(buf + 1) = ~(char) data;

  *buf = ~(char) data;
  *(buf + 1) = (char) (~data >> 8);

  *buf = (char) (-data >> 8);
  *(buf + 1) = (char) -(unsigned char) data;

  *buf = (char) (data + 3 >> 8);
  *(buf + 1) = (char) ((unsigned char) data + 3);

Note there is a conflicting interest from targets like AVR that prefer
truncation to be moved innermost (which would argue for a CSE-only solution
together with aggressive shortening of operations).  In the above case that
probably means rewriting (char) (~data >> 8) -> ~(char)(data >> 8), or
alternatively deciding we're interested in short only, and thus
(char)~(short)data and (char)(~(short)data >> 8), so we can CSE
~(short)data.

Note that within CSE we have the difficulty that the first stmt is basically
unchangeable with the trick we're doing right now.  This means CSE requires
the first stmt to be in useful canonical form.

That said, bar is easily handled by the proposed patch (though I have to
think about correctness).  The other two can't be done easily - what would
fix them is not narrowing the ~/- operation in the FE.  In this case it is
convert_to_integer_1 doing the narrowing, and fold_unary has a related
transform.
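As a sanity check on the ~ rewrite above (a stand-alone test for illustration
only, not part of any patch): truncation commutes with bitwise NOT, so
truncating before or after the wide ~ and the shift yields the same byte.

--cut here--
#include <assert.h>

/* Byte 1 computed the "wide" way, truncating after the full-width ~.  */
static unsigned char
byte1_wide (unsigned int data)
{
  return (unsigned char) (~data >> 8);
}

/* Byte 1 computed the "narrow" way, truncating before the ~.  */
static unsigned char
byte1_narrow (unsigned int data)
{
  return (unsigned char) ~(unsigned char) (data >> 8);
}

int
main (void)
{
  unsigned int tests[] = { 0u, 1u, 0xffu, 0x100u, 0x1234u, 0xdeadbeefu,
                           0xffffffffu };
  for (unsigned int i = 0; i < sizeof tests / sizeof tests[0]; i++)
    assert (byte1_wide (tests[i]) == byte1_narrow (tests[i]));
  return 0;
}
--cut here--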
Removing the convert_to_integer_1 code fixes the missed transforms, but in baz
we have similar narrowing being performed for the add...  I suppose we do want
to narrow at some point (and in fact the user might already write ~(char)data
...).  As usual we have conflicting interests here.  Note that undoing
narrowing isn't easily done, so narrowing very early is certainly unwanted.
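For illustration, a source-level analogue of the "CSE ~(short)data" form
sketched above - a hypothetical rewrite of foo, only meant to show the shape
of the canonical code; whether store merging then emits a single 16-bit store
is a separate question:

--cut here--
void
foo_narrowed (char *buf, unsigned int data)
{
  /* ~(short)data computed once; the integer promotion of the ~ operand is
     harmless because only the low 16 bits are kept.  */
  unsigned short t = ~(unsigned short) data;
  buf[0] = (char) t;          /* low byte, as in foo's buf[0] = ~data       */
  buf[1] = (char) (t >> 8);   /* high byte, as in foo's buf[1] = ~data >> 8 */
}
--cut here--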