On Thu, Jun 5, 2008 at 7:49 AM, Richard Guenther
<[EMAIL PROTECTED]> wrote:
> On Thu, Jun 5, 2008 at 4:31 PM, H.J. Lu <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> x86-64 psABI defines
>>
>> typedef struct
>> {
>>  unsigned int gp_offset;
>>  unsigned int fp_offset;
>>  void *overflow_arg_area;
>>  void *reg_save_area;
>> } va_list[1];
>>
>> for variable argument list. "va_list" is used to access variable argument
>> list:
>>
>> void
>> bar (const char *format, va_list ap)
>> {
>>  if (va_arg (ap, int) != 0)
>>    abort ();
>> }
>>
>> void
>> foo(char *fmt, ...)
>> {
>>  va_list ap;
>>  va_start (fmt, ap);
>>  bar (fmt, ap);
>>  va_end (ap);
>> }
>>
>> foo and bar may be compiled with different compilers. We have to keep
>> the current layout for va_list so that we can mix va_list codes compiled
>> with AVX and non-AVX compilers. We need to extend the variable argument
>> handling in the x86-64 psABI to support passing __m256/__m256d/__m256i
>> on the variable argument list. We propose 2 ways to extend the register
>> save area to add 256bit AVX registers support:
>>
>> 1. Extend the register save area to put upper 128bit at the end.
>>  Pros:
>>    Aligned access.
>>    Save stack space if 256bit registers are used.
>>  Cons
>>    Split access. Require more split access beyond 256bit.
>>
>> 2. Extend the register save area to put full 265bit YMMs at the end.
>> The first DWORD after the register save area has the offset of
>> the extended array for YMM registers. The next DWORD has the
>> element size of the extended array. Unaligned access will be used.
>>  Pros:
>>    No split access.
>>    Easily extendable beyond 256bit.
>>    Limited unaligned access penalty if stack is aligned at 32byte.
>>  Cons:
>>    May require store both the lower 128bit and full 256bit register
>>    content. We may avoid saving the lower 128bit if correct type
>>    is required when accessing variable argument list, similar to int
>>    vs. double.
>>    Waste 272 byte on stack when 256bit registers are used.
>>    Unaligned load and store.
>>
>> We should agree on one approach to ensure compatibility between
>> different compilers.
>>
>> Personally, I prefer #2 for its simplicity. Does anyone else have a
>> preference?
>
> If you want to mix AVX and non-AVX code then you need a way to
> detect if AVX information was saved at runtime.  What is it in those
> both cases?
>
> If you don't want to mix AVX and non-AVX code then basically you
> can declare the ABIs incompatible anyway?

We want to extend the psABI in such a way that we can link
AVX enabled code to call vfprintf in glibc which is compiled
with the older compiler and doesn't use YMM registers.
That is if bar, in the example above, doesn't use YMM
registers, it can be compiled by any compilers. bar doesn't
need to know if YMM  registers are used in caller at all.
All necessary information for YMM registers are specified
in the psABI. If  a compiler doesn't use YMM registers,
it  doesn't have to do anything.

>
> There is also a third option of passing AVX values by reference.
>
> For simplicity I would also prefer 2) - after all we don't need to fill
> in the XMM area / the AVX area if the value is unused.
>

That is what I believe.

Thanks.


-- 
H.J.

Reply via email to