On Friday 03 July 2015 07:15 PM, Richard Biener wrote:
> On Fri, Jul 3, 2015 at 3:10 PM, Vineet Gupta <vineet.gup...@synopsys.com> wrote:
>> Hi,
>>
>> I have the following test case (reduced from Linux kernel sources) and it seems
>> gcc is optimizing away the first loop iteration.
>>
>> arc-linux-gcc -c -O2 star-9000857057.c -fno-branch-count-reg --save-temps -mA7
>>
>> ----------->8-------------
>> static inline int __test_bit(unsigned int nr, const volatile unsigned long *addr)
>> {
>>  unsigned long mask;
>>
>>  addr += nr >> 5;
>> #if 0
>>     nr &= 0x1f;
>> #endif
>>  mask = 1UL << nr;
>>  return ((mask & *addr) != 0);
>> }
>>
>> int foo (int a, unsigned long *p)
>> {
>>   int i;
>>   for (i = 63; i>=0; i--)
>>   {
>>       if (!(__test_bit(i, p)))
>>            continue;
>>       a += i;
>>   }
>>   return a;
>> }
>> ----------->8-------------
>>
>> gcc generates following
>>
>> ----------->8-------------
>>         .global foo
>>         .type   foo, @function
>> foo:
>>         ld_s r2,[r1,4]  <---- dead code
>>         mov_s r2,63
>>         .align 4
>> .L2:
>>         sub r2,r2,1    <-----SUB first
>>         cmp r2,-1
>>         jeq.d [blink]
>>         lsr r3,r2,5   <----- BUG: first @mask is (1 << 62) NOT (1 << 63)
>>         .align 2
>> .L4:
>>         ld.as r3,[r1,r3]
>>         bbit0.nd r3,r2,@.L2
>>         add_s r0,r0,r2
>>         sub r2,r2,1
>>         cmp r2,-1
>>         bne.d @.L4
>>         lsr r3,r2,5
>>         j_s [blink]
>>         .size   foo, .-foo
>>         .ident  "GCC: (ARCv2 ISA Linux uClibc toolchain arc-2015.06-rc1-21-g21b2c4b83dfa) 4.8.4"
>> ----------->8-------------
>>
>> For the initial 32 loop iterations, this test is effectively doing a 64-bit
>> operation, e.g. (1 << 63), in a 32-bit regime. Is this supposed to be
>> undefined, truncated to zero, or port specific?
>>
>> If it is truncated to zero, then the generated code above is not correct, as
>> it needs to elide not just the first iteration (corresponding to i = 63) but
>> all of 63..32.
>>
>> Further, the ARCompact ISA provides that instructions involving bitpos
>> operands (BSET, BCLR, LSL) can take any number whatsoever, but the core will
>> only use the lower 5 bits, so the bitpos is clamped to 0..31 without any need
>> to do that in code.
>>
>> So is this a gcc bug, or some spec misinterpretation?
> It is the C language standard that says that shifts like this invoke
> undefined behavior.

Right, but the compiler is a program nevertheless, and it knows what to do when
it sees 1 << 62. It's not as if there is an uninitialized variable or something
else that would produce unexpected behaviour.
More importantly, the question is whether a port can define a specific behaviour
for such cases, and whether that would be sufficient to guarantee the semantics.

The point is that the ARC ISA provides a neat feature: the core only considers
the lower 5 bits of bitpos operands. Thus we can make such behaviour not only
deterministic in the context of ARC but also optimal, eliding the need for
explicit masking/clamping to 5 bits.
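
For comparison, here is a minimal sketch of the portable alternative, in which
the shift count is masked explicitly so the C code never shifts by 32 or more.
The names test_bit_masked and foo_masked are made up for illustration and do not
come from the kernel sources; uint32_t is an assumption standing in for the
32-bit unsigned long of the ARC target.

```c
#include <stdint.h>

/* Hypothetical variant of __test_bit from the test case above.
 * Assumes 32-bit words; uint32_t makes that explicit on any host. */
static inline int test_bit_masked(unsigned int nr, const volatile uint32_t *addr)
{
    uint32_t mask;

    addr += nr >> 5;                      /* select the word holding bit nr */
    mask = UINT32_C(1) << (nr & 0x1f);    /* count is always 0..31, so defined */
    return (mask & *addr) != 0;
}

/* Same loop as foo() in the test case, using the masked helper. */
int foo_masked(int a, const uint32_t *p)
{
    int i;

    for (i = 63; i >= 0; i--) {
        if (!test_bit_masked((unsigned int)i, p))
            continue;
        a += i;
    }
    return a;
}
```

With the explicit mask the result is well defined for every i in 63..0. On a
core that already ignores the upper bits of the shift count, the "& 0x1f" is
redundant at the instruction level, so a port could in principle drop it;
whether gcc actually does so for ARC is not shown by this sketch.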

-Vineet
