Re: Spurious register spill with volatile function argument

2016-03-28 Thread Florian Weimer
* Paul Koning:

>> On Mar 28, 2016, at 8:11 AM, Florian Weimer wrote:
>> 
>> ...
>> The problem is that “reading” is either not defined, or the existing
>> definition flatly contradicts existing practice.
>> 
>> For example, if p is a pointer to a struct, will the expression p->m
>> read *p?
>
> Presumably the offset of m is substantially larger than 0?  If so, my
> answer would be "it had better not".  Does any compiler treat that
> statement as an access to *p ?

As I tried to explain, GCC does, for aliasing purposes.  For the
expression p->m, there is an implicit read of *p, asserting that the
static and dynamic types match.


Re: Spurious register spill with volatile function argument

2016-03-28 Thread Paul_Koning

> On Mar 28, 2016, at 8:11 AM, Florian Weimer wrote:
> 
> ...
> The problem is that “reading” is either not defined, or the existing
> definition flatly contradicts existing practice.
> 
> For example, if p is a pointer to a struct, will the expression p->m
> read *p?

Presumably the offset of m is substantially larger than 0?  If so, my answer 
would be "it had better not".  Does any compiler treat that statement as an 
access to *p ?

paul


Re: Spurious register spill with volatile function argument

2016-03-28 Thread Florian Weimer
* Andrew Haley:

> "volatile" doesn't really mean very much, formally speaking.  Sure, the
> standard says "accesses to volatile objects are evaluated
> strictly according to the rules of the abstract machine," but nowhere
> is it specified exactly what constitutes an access.

Reading or modifying an object is defined as “access”.

The problem is that “reading” is either not defined, or the existing
definition flatly contradicts existing practice.

For example, if p is a pointer to a struct, will the expression p->m
read *p?

Previously, this wasn't very interesting.  But under the memory model,
it's suddenly quite relevant.  If reading p->m implies a read of the
entire object *p, you cannot use a member to synchronize access to
other members of the struct.  For example, if m is a mutex, and you
carefully acquire the mutex before you read or write other members,
you still have a data race between a write to some other member and
the acquisition of the mutex because the mutex acquisition reads the
entire struct (including the member written to).

One possible cure is to take the address of the mutex and keep track
of it separately.  Or you could construct a pointer using offsetof.
But no one is doing that, obviously.

This is not entirely hypothetical.  Even today, GCC's aliasing
analysis requires that those implicit whole-object reads take place,
to make certain forms of type-punning invalid which would otherwise be
well-defined (and for which GCC would generate invalid code).
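
A minimal sketch of the offsetof workaround mentioned above, assuming a
hypothetical struct counter whose lock member stands in for the mutex m.
An atomic_flag spinlock is used only to keep the example free of any
threading library; all names here are illustrative. Whether this really
sidesteps the implicit whole-object read is exactly the open question
raised in this message, so the code only shows the shape of the idiom.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical struct: a lock member guarding the other members. */
struct counter {
    int value;          /* protected by `lock` */
    atomic_flag lock;   /* stand-in for the mutex m in the message */
};

/* Form the address of the lock member with offsetof, so that no
   p->lock expression is ever written. */
atomic_flag *counter_lock_addr(struct counter *p)
{
    return (atomic_flag *)((char *)p + offsetof(struct counter, lock));
}

void counter_increment(struct counter *p)
{
    atomic_flag *lock = counter_lock_addr(p);

    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;               /* spin until the flag was previously clear */
    p->value++;         /* other members touched only while locked */
    atomic_flag_clear_explicit(lock, memory_order_release);
}
```

The alternative from the message, keeping the lock's address in a
separate variable, would amount to calling counter_lock_addr once and
storing the result next to the struct pointer.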


Re: Spurious register spill with volatile function argument

2016-03-28 Thread Andrew Haley
On 27/03/16 06:57, Michael Clark wrote:

> GCC, Clang folk, any ideas on why there is a stack spill for a
> volatile register argument passed in esi? Does volatile force the
> argument to have storage allocated on the stack? Is this a corner
> case in the C standard? This argument in the x86_64 calling
> convention only has a register, so technically it can’t change
> outside the control of the C "virtual machine" so volatile has a
> vague meaning here.

"volatile" doesn't really mean very much, formally speaking.  Sure, the
standard says "accesses to volatile objects are evaluated
strictly according to the rules of the abstract machine," but nowhere
is it specified exactly what constitutes an access.  (To be precise,
"what constitutes an access to an object that has volatile-qualified
type is implementation-defined.")

So, we have to fall back to tradition.  Traditionally, all volatile
objects are allocated stack slots and all accesses to them are memory
accesses.  This is consistent behaviour, and has been for a long time.
It is also extremely useful when debugging optimized code.

> volatile for scalar function arguments seems to mean: “make this
> volatile and subject to change outside of the compiler” rather than
> being a qualifier for its storage (which is a register).

No, arguments are not necessarily stored in registers: they're passed
in registers, but after function entry they're just auto variables and
are stored wherever the compiler likes.

Andrew.
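
A small illustration of the point that a parameter is just an automatic
variable once the function has been entered: its address can be taken
and it can be modified like any other local. The function name is
hypothetical and the sketch says nothing about where a particular
compiler places n.

```c
/* Hypothetical example: the parameter n behaves like any local variable. */
int add_one_locally(int n)
{
    int *q = &n;   /* taking the address is allowed: n is an object */

    *q += 1;       /* modifies the callee's copy, not the caller's argument */
    return n;
}
```

Called as add_one_locally(x), the caller's x is left unchanged, because
the parameter is a separate object initialised from the argument value.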


Spurious register spill with volatile function argument

2016-03-26 Thread Michael Clark
Seems I had misused volatile. I removed ‘volatile’ from the function argument 
on test_0 and it prevented the spill through the stack.

I added volatile because I was trying to avoid the compiler optimising away the 
call to test_0 (as it has no side effects) but it appeared that volatile was 
unnecessary and was a misuse of volatile (intended to indicate storage may 
change outside of the control of the compiler). However it is an interesting 
case… as register arguments don’t have storage.
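
For the original goal of keeping the call from being optimised away, one
common approach is to leave the parameter unqualified and make the
result observable instead. The following is only a sketch under that
assumption: sink, rem_plain and run_control are hypothetical names, and
noinline is the same GCC/Clang attribute used in the test program below.

```c
/* Hypothetical names; a sketch, not the test program from this thread. */
volatile unsigned sink;          /* stores to a volatile object must happen */

__attribute__((noinline))
unsigned rem_plain(unsigned k, int p)
{
    return k % (unsigned)p;      /* p is an ordinary, non-volatile parameter */
}

void run_control(void)
{
    sink = rem_plain(1, 8191);   /* the result is used, so it has to be computed */
}
```

Here volatile qualifies the object that receives the result rather than
the argument, so no volatile access to the parameter is requested.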

GCC, Clang folk, any ideas on why there is a stack spill for a volatile 
register argument passed in esi? Does volatile force the argument to have 
storage allocated on the stack? Is this a corner case in the C standard? This 
argument in the x86_64 calling convention only has a register, so technically 
it can’t change outside the control of the C "virtual machine" so volatile has 
a vague meaning here. This seems to be a case of interpreting the C standard in 
such a way as to make sure that a volatile argument “can be changed” outside 
the control of the C "virtual machine" by explicitly giving it a storage 
location on the stack. I think volatile scalar arguments are a special case and 
that the volatile type label shouldn’t widen the scope beyond the register 
unless it actually *needs* storage to spill. This is not a volatile stack 
scoped variable unless the C standard interprets ABI register parameters as 
actually having ‘storage’ so this is questionable… Maybe I should have gotten a 
warning… or the volatile type qualifier on a scalar register argument should 
have been ignored…

volatile for scalar function arguments seems to mean: “make this volatile and 
subject to change outside of the compiler” rather than being a qualifier for 
its storage (which is a register).

# gcc
test_0:
 mov DWORD PTR [rsp-4], esi
 mov ecx, DWORD PTR [rsp-4]
 mov eax, edi
 cdq
 idiv ecx
 mov eax, edx
 ret

# clang
test_0:
mov dword ptr [rsp - 4], esi
xor edx, edx
mov eax, edi
div dword ptr [rsp - 4]
mov eax, edx
ret

/* Test program compiled on x86_64 with: cc -O3 -fomit-frame-pointer 
-masm=intel -S test.c -o test.S  */

#include <stdio.h>
#include <limits.h>

static const int p = 8191;
static const int s = 13;

int __attribute__ ((noinline)) test_0(unsigned int k, volatile int p)
{
 return k % p;
}

int __attribute__ ((noinline)) test_1(unsigned int k)
{
 return k % p;
}

int __attribute__ ((noinline)) test_2(unsigned int k)
{
 int i = (k) + (k>>s);
 i = (i) + (i>>s);
 if (i>=p) i -= p;
 return i;
}

int main()
{
 test_0(1, 8191); /* control */
 for (int i = INT_MIN; i < INT_MAX; i++) {
  int r1 = test_1(i), r2 = test_2(i);
  if (r1 != r2) printf("%d %d %d\n", i, r1, r2);
 }
}
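
For reference, the shift-and-add idea behind reducing modulo a Mersenne
number 2^s - 1 rests on 2^s being congruent to 1, so the high bits can
be folded onto the low bits. A hedged sketch for 32-bit unsigned input
and s = 13 follows; mod_mersenne13 is a hypothetical name, and this is
neither the test program above nor a claim about what GCC or Clang emit.

```c
#include <stdint.h>

/* Hypothetical helper: x mod (2^13 - 1) without a divide instruction. */
uint32_t mod_mersenne13(uint32_t x)
{
    const uint32_t p = 8191;   /* 2^13 - 1 */
    const unsigned s = 13;

    /* Since 2^13 == 1 (mod p), hi * 2^13 + lo == hi + lo (mod p). */
    x = (x & p) + (x >> s);    /* first fold: fits in 20 bits */
    x = (x & p) + (x >> s);    /* second fold: now less than 2 * p */
    if (x >= p)
        x -= p;                /* final correction into [0, p) */
    return x;
}
```

Two folds plus one conditional subtraction are enough for 32-bit
inputs; wider inputs would need additional folds.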

> On 27 Mar 2016, at 2:32 PM, Andrew Waterman wrote:
> 
> It would be good to figure out how to get rid of the spurious register spills.
> 
> The strength reduction optimization isn't always profitable on Rocket,
> as it increases instruction count and code size.  The divider has an
> early out and for small numbers is quite fast.
> 
> On Fri, Mar 25, 2016 at 5:43 PM, Michael Clark wrote:
>> Now considering I have no idea how many cycles it takes for an integer 
>> divide on the Rocket so the optimisation may not be a win.
>> 
>> Trying to read MuDiv in multiplier.scala, and will at some point run some 
>> timings in the cycle-accurate simulator.
>> 
>> In either case, the spurious stack moves emitted by GCC are curious...
>> 
>>> On 26 Mar 2016, at 9:42 AM, Michael Clark wrote:
>>> 
>>> Hi All,
>>> 
>>> I have found an interesting case where an optimisation is not being applied 
>>> by GCC on RISC-V. And also some strange assembly output from GCC on RISC-V.
>>> 
>>> Both GCC and Clang appear to optimise division by a constant Mersenne prime 
>>> on x86_64 however GCC on RISC-V is not applying this optimisation.
>>> 
>>> See test program and assembly output for these platforms:
>>> 
>>> * GCC -O3 on RISC-V
>>> * GCC -O3 on x86_64
>>> * LLVM/Clang -O3 on x86_64
>>> 
>>> Another strange observation is GCC on RISC-V is moving a1 to a5 via a stack 
>>> store followed by a stack load. Odd? GCC 5 also seems to be doing odd stuff 
>>> with stack ‘moves’ on x86_64, moving esi to ecx via the stack (I think 
>>> recent x86 micro-architecture treats tip of the stack like an extended 
>>> register file so this may only have a small penalty on x86).
>>> 
>>> See GCC on RISC-V is emitting this:
>>> 
>>> test_0:
>>>  add sp,sp,-16
>>>  sw  a1,12(sp)
>>>  lw  a5,12(sp)
>>>  add sp,sp,16
>>>  remuw   a0,a0,a5
>>>  jr  ra
>>> 
>>> instead of this:
>>> 
>>> test_0:
>>>  remuw   a0,a0,a1
>>>  jr  ra
>>> 
>>> Compiler devs, please read Test program and assembly output. I have not yet 
>>> tested