Re: i386 __atomic_compare_exchange_n not found

2013-08-09 Thread Jonathan Wakely
On 9 August 2013 17:59, Joe Buck wrote:
> The i386 architecture lacks atomic compare instructions, to the point
> where libstdc++ can't be built with that architecture (correct and
> efficient atomic operations are vitally important for libstdc++, and on i386
> it can't be done).

I think libstdc++ can be built for i386, but uses a global spinlock
for all atomic ops, instead of actual atomic instructions.

See libstdc++-v3/include/ext/atomicity.h and
libstdc++-v3/config/cpu/i386/atomicity.h
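
As a rough illustration of the idea (this is not the libstdc++ code, just a sketch), a compare-and-exchange can be emulated by taking one global lock around a plain load/compare/store. The names global_lock and locked_compare_exchange are made up for this example, and it assumes a C11 compiler that provides atomic_flag on the target:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global lock shared by every emulated atomic operation. */
static atomic_flag global_lock = ATOMIC_FLAG_INIT;

/* Emulated compare-and-exchange on a plain int.
   If *ptr equals *expected, store desired and return true;
   otherwise copy the current value into *expected and return false. */
bool locked_compare_exchange(int *ptr, int *expected, int desired)
{
    bool swapped;

    assert(ptr != NULL && expected != NULL);

    /* Spin until the global lock is acquired. */
    while (atomic_flag_test_and_set_explicit(&global_lock,
                                             memory_order_acquire))
        ;

    if (*ptr == *expected) {
        *ptr = desired;
        swapped = true;
    } else {
        *expected = *ptr;
        swapped = false;
    }

    /* Release the lock so other emulated operations can proceed. */
    atomic_flag_clear_explicit(&global_lock, memory_order_release);
    return swapped;
}
```

This is only correct if every access to the shared variable goes through the same lock, which is why a single global lock is the simplest choice.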

I look forward to the day when (not if) GCC drops support for i386 :-)


Re: [RFC] vector subscripts/BIT_FIELD_REF in Big Endian.

2013-08-09 Thread Bill Schmidt
On Mon, 2013-08-05 at 11:47 +0100, Tejas Belagod wrote:
> Hi,
> 
> I'm looking for some help understanding how BIT_FIELD_REFs work with 
> big-endian.
> 
> Vector subscripts in this example:
> 
> #define vector __attribute__((vector_size(sizeof(int)*4) ))
> 
> typedef int vec vector;
> 
> int foo(vec a)
> {
>    return a[0];
> }
> 
> gets lowered into array accesses by c-typeck.c
> 
> ;; Function foo (null)
> {
>    return *(int *) &a;
> }
> 
> and gets gimplified into BIT_FIELD_REFs a bit later.
> 
> foo (vec a)
> {
>int _2;
> 
>:
>_2 = BIT_FIELD_REF ;
>return _2;
> 
> }
> 
> What's interesting to me here is the bitpos - does this not need
> BYTES_BIG_ENDIAN correction? This seems to be inconsistent with what happens
> with reduction operations in the autovectorizer, where the scalar result in
> the reduction epilogue gets extracted with a BIT_FIELD_REF but the bitpos
> there is corrected for BIG_ENDIAN.

a[0] is at the left end of the array in BIG_ENDIAN, and big-endian
machines number bits from the left, so bit position 0 is correct.

> 
> ... from tree-vect-loop.c:vect_create_epilog_for_reduction ()
> 
>   /* 2.4  Extract the final scalar result.  Create:
>      s_out3 = extract_field   */
>
>   if (extract_scalar_result)
>     {
>       tree rhs;
>
>       if (dump_enabled_p ())
>         dump_printf_loc (MSG_NOTE, vect_location,
>                          "extract scalar result");
>
>       if (BYTES_BIG_ENDIAN)
>         bitpos = size_binop (MULT_EXPR,
>                              bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1),
>                              TYPE_SIZE (scalar_type));
>       else
>         bitpos = bitsize_zero_node;
> 
> 
> For example:
>
> int foo(int * a)
> {
>    int i, sum = 0;
>
>    for (i=0;i<16;i++)
>      sum += a[i];
>
>    return sum;
> }
> 
> gets autovectorized into:
> 
> ...
>vect_sum_9.17_74 = [reduc_plus_expr] vect_sum_9.15_73;
>stmp_sum_9.16_75 = BIT_FIELD_REF ;
>sum_76 = stmp_sum_9.16_75 + sum_47;
> 
> the BIT_FIELD_REF here seems to have been corrected for BYTES_BIG_ENDIAN

Yes, because something else is going on here.  This is a reduction
operation where the sum ends up in the rightmost element of a vector
register that contains four 32-bit integers.  This is at position 96
from the left end of the register according to big-endian numbering.

> 
> If vec_extract is defined in the back-end, how does one figure out if the 
> BIT_FIELD_REF is a product of the gimplifier's indirect ref folding or the 
> vectorizer's bit-field extraction and apply the appropriate correction in 
> vec_extract's expansion? Or am I missing something that corrects 
> BIT_FIELD_REFs 
> between the gimplifier and the RTL expander?

There is no inconsistency here.

Hope this helps!
Bill

> 
> Thanks,
> Tejas.
> 



Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX

2013-08-09 Thread H.J. Lu
On Fri, Aug 9, 2013 at 12:08 AM, Jan Beulich  wrote:
>>>> On 08.08.13 at 18:01, "H.J. Lu"  wrote:
>> On Thu, Aug 8, 2013 at 12:19 AM, Jan Beulich  wrote:
>>>>>> On 08.08.13 at 02:33, "H.J. Lu"  wrote:
>>>> We use the .gnu_attribute directive to record an object attribute:
>>>>
>>>> enum
>>>> {
>>>>   Tag_GNU_X86_EXTERN_BRANCH = 4,
>>>> };
>>>>
>>>> for the types of external branch instructions in relocatable files.
>>>>
>>>> enum
>>>> {
>>>>   /* All external branch instructions are legacy.  */
>>>>   Val_GNU_X86_EXTERN_BRANCH_LEGACY = 0,
>>>>   /* There is at least one external branch instruction with BND prefix.  */
>>>>   Val_GNU_X86_EXTERN_BRANCH_BND = 1,
>>>> };
>>>>
>>>> An x86 feature note section, .note.x86-feature, is used to indicate
>>>> features in executables and shared library. The contents of this note
>>>> section are:
>>>>
>>>> .section .note.x86-feature
>>>> .align  4
>>>> .long   .L1 - .L0
>>>> .long   .L3 - .L2
>>>> .long   1
>>>> .L0:
>>>> .asciz "x86 feature"
>>>> .L1:
>>>> .align  4
>>>> .L2:
>>>> .long   FeatureFlag (Feature flag)
>>>> .L3:
>>>>
>>>> The current valid bits in FeatureFlag are
>>>>
>>>> #define NT_X86_FEATURE_PLT_BND (0x1 << 0)
>>>>
>>>> It should be set if PLT entry has BND prefix to preserve bound registers.
>>>>
>>>> The remaining bits in FeatureFlag are reserved.
>>>>
>>>> When merging Tag_GNU_X86_EXTERN_BRANCH, if any input relocatable
>>>> file has Tag_GNU_X86_EXTERN_BRANCH set to Val_GNU_X86_EXTERN_BRANCH_BND,
>>>> the resulting Tag_GNU_X86_EXTERN_BRANCH value should be
>>>> Val_GNU_X86_EXTERN_BRANCH_BND.
>>>>
>>>> When generating executable or shared library, if PLT is needed and
>>>> Tag_GNU_X86_EXTERN_BRANCH value is Val_GNU_X86_EXTERN_BRANCH_BND,
>>>> the 32-byte PLT entry should be used and the feature note section should
>>>> be generated with the NT_X86_FEATURE_PLT_BND bit set to 1 and the feature
>>>> note section should be included in PT_NOTE segment. The benefit of the
>>>> note section is it is backward compatible with existing run-time and tools.
>>>
>>> While I can see the purpose of the attribute section, I don't see
>>> what the note section is for: You don't mention at all what it's
>>> consumed for, and I also can't see how it validly would be for
>>> anything. That's because iirc note section contents, if not
>>> understood by the consumer, is required to not have any effect
>>> on the correctness of the program. Hence if loaded on a system
>>> that is MPX capable, has an MPX aware kernel, but no MPX aware
>>> user space (apart from this one executable or shared library, or
>>> a set thereof), it ought to still work correctly. Which - afaict - it
>>> won't if the dynamic loader itself isn't MPX aware.
>>>
>>
>> The note section isn't required for correctness.  But it can be used
>> by ld.so to select an alternate MPX aware shared library in a different
>> directory, instead of a legacy one.
>
> Okay, that clarifies your intentions with the note section. However,
> then you need something else to make sure an MPX aware app can't
> load on an MPX enabled kernel without MPX-enabled ld.so.

The MPX enabled app will still run correctly.  ld.so will clear the bound
registers (which makes the bounds unlimited) on the first call with lazy
binding.

>> There is another way to encode this information in the first entry
>> of PLT:
>>
>>    0: ff 35 00 00 00 00       pushq  GOT+8(%rip)
>>    6: f2 ff 25 00 00 00 00    bnd jmpq *GOT+16(%rip)
>>    d: 0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
>>   12: 0f 1f 80 00 00 00 00    nopl   0x0(%rax)
>>   19: 0f 1f 80 00 00 00 01    nopl   0x1000000(%rax)
>>
>> We can encode PLT property in 10 (4 + 4 + 2) bytes of
>> displacements of 3 nops.  In this example, the first bit
>> of the last byte of PLT0 is 1.
>
> While a nice idea, I think that's worse, because it is much harder to
> determine from simply dumping information for a given binary.
>

I agree.  That is why a note section is better.
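
For reference, the merge rule and the feature bit from the proposal quoted earlier in this message can be sketched as follows. The enum values and NT_X86_FEATURE_PLT_BND come from the proposal; the function names merge_extern_branch and feature_flag are hypothetical and exist only for this illustration, not in any linker:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum
{
  Val_GNU_X86_EXTERN_BRANCH_LEGACY = 0,
  Val_GNU_X86_EXTERN_BRANCH_BND = 1
};

#define NT_X86_FEATURE_PLT_BND (0x1 << 0)

/* Merge the Tag_GNU_X86_EXTERN_BRANCH values of all input files:
   the result is BND if any single input is BND.  */
int merge_extern_branch(const int *inputs, size_t n)
{
    assert(inputs != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
        if (inputs[i] == Val_GNU_X86_EXTERN_BRANCH_BND)
            return Val_GNU_X86_EXTERN_BRANCH_BND;
    return Val_GNU_X86_EXTERN_BRANCH_LEGACY;
}

/* FeatureFlag for the note section: the PLT_BND bit is set only when
   a PLT is needed and the merged tag value is BND.  */
uint32_t feature_flag(int merged_tag, int plt_needed)
{
    if (plt_needed && merged_tag == Val_GNU_X86_EXTERN_BRANCH_BND)
        return NT_X86_FEATURE_PLT_BND;
    return 0;
}
```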


-- 
H.J.


Re: i386 __atomic_compare_exchange_n not found

2013-08-09 Thread Joe Buck
On Fri, Aug 09, 2013 at 11:23:51AM -0500, Joel Sherrill wrote:
> On 8/9/2013 11:05 AM, Deng Hengyi wrote:
> > Hi Joel,
> >
> > I have done a test, it seems that '-march=i386' does not provide 
> > "__atomic_compare_exchange_n" libs. And '-march=i486' or '-march=pentium' 
> > can find the '__atomic_compare_exchange_n' function.
> Look in the source for that method on x86 and see what instruction
> it uses. If it only got added in i486, then we have to figure out
> something for i386. If it was an oversight and the instruction is
> on an i386, we fix the code.

The i386 architecture lacks atomic compare instructions, to the point
where libstdc++ can't be built with that architecture (correct and
efficient atomic operations are vitally important for libstdc++, and on i386
it can't be done).

The worry is that if you add "atomic" operations that don't lock for the
i386 architecture, you've screwed anyone who decides to build their
application for i386 hoping for maximum portability, but winds up with
locks that don't lock.

You could perhaps handle that for RTEMS by providing these functions in a
library, but users need to understand this issue, because improper locks
are tough to debug.



Re: conflict between scheduler and register allocator

2013-08-09 Thread Vladimir Makarov

On 13-08-09 7:25 AM, shmeel gutl wrote:
> I am having trouble meeting the constraints of the scheduler and the
> register allocator for my back end. The relevant features are:
>
> 1) VLIW - up to 4 instructions can be issued each cycle
> 2) If a vliw bundle has both a set and a use, the use will use the old
>    values.
> 3) A call instruction will push r30 and r31 to the stack making them
>    natural candidates for callee saved.
>
> The problem is that the scheduler might include an instruction that
> sets r30 in the same vliw as a call. This would result in a stale
> value being saved to the stack. (Note: the call instruction is not
> truly dependent on r30, just that r30 can't be set in the vliw that
> contains the call). On the other hand, if I declare that the call uses
> r30, the register allocator will refuse to use r30 since it thinks
> that the register is live.
> I know that I can use a hook to fix-up the first problem by breaking a
> single vliw into two bundles, but that has a performance penalty. Is
> there a way to tell the scheduler to avoid issuing an instruction that
> sets r30 or r31 in the same bundle that contains a call instruction?
>
> Thank you for any pointers.

You should look at haifa-sched.c::schedule_block.  A lot of hooks are
called at different stages of the list scheduling algorithm.  Depending
on the stage of the algorithm at which you want to do this, you can use
a specific hook.  I'd pay attention to targetm.sched.reorder[2].


You can also look at the hooks implemented for IA64, as it is the most
widely used VLIW architecture for now.  But the implementations of some
IA64 hooks are pretty big.
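
To show the kind of constraint being discussed, here is a toy, stand-alone model of bundle formation. It does not use GCC's hook interface or its insn data structures at all; struct toy_insn, can_join_bundle and pack_bundles are hypothetical names, and the packing is a plain in-order greedy pass rather than a real list scheduler:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUNDLE_WIDTH 4   /* up to 4 instructions per VLIW bundle */

/* Toy instruction: only the two properties relevant to the constraint. */
struct toy_insn {
    bool is_call;
    bool sets_saved_reg;   /* writes r30 or r31 */
};

/* May insn be added to a bundle with the given contents?  A call and a
   write to r30/r31 must never share a bundle. */
bool can_join_bundle(const struct toy_insn *insn, size_t bundle_len,
                     bool bundle_has_call, bool bundle_sets_saved_reg)
{
    assert(insn != NULL);

    if (bundle_len >= BUNDLE_WIDTH)
        return false;
    if (insn->is_call && bundle_sets_saved_reg)
        return false;
    if (insn->sets_saved_reg && bundle_has_call)
        return false;
    return true;
}

/* Pack instructions in order, starting a new bundle whenever the next
   instruction cannot join the current one.  bundle_of[i] receives the
   bundle index of instruction i; the return value is the bundle count. */
size_t pack_bundles(const struct toy_insn *insns, size_t n,
                    size_t *bundle_of)
{
    size_t bundles = 0, len = 0;
    bool has_call = false, sets_saved = false;

    for (size_t i = 0; i < n; i++) {
        if (bundles == 0
            || !can_join_bundle(&insns[i], len, has_call, sets_saved)) {
            bundles++;
            len = 0;
            has_call = false;
            sets_saved = false;
        }
        bundle_of[i] = bundles - 1;
        len++;
        has_call = has_call || insns[i].is_call;
        sets_saved = sets_saved || insns[i].sets_saved_reg;
    }
    return bundles;
}
```

A real scheduler hook would normally prefer to pick a different ready instruction instead of closing the bundle early, which is what avoids the performance penalty mentioned in the question.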




Re: i386 __atomic_compare_exchange_n not found

2013-08-09 Thread Joel Sherrill
On 8/9/2013 11:05 AM, Deng Hengyi wrote:
> Hi Joel,
>
> I have done a test, it seems that '-march=i386' does not provide 
> "__atomic_compare_exchange_n" libs. And '-march=i486' or '-march=pentium' can 
> find the '__atomic_compare_exchange_n' function.
Look in the source for that method on x86 and see what instruction
it uses. If it only got added in i486, then we have to figure out
something for i386. If it was an oversight and the instruction is
on an i386, we fix the code.
> weiy@ubuntu:~/project/gsoc/gsoc2013/rtems-build/arm-build$ i386-rtems4.11-gcc 
> -m32 -march=i386 -o test test.c 
> /home/weiy/project/gsoc/gsoc2013/rtems-toolchain/4.11/lib/gcc/i386-rtems4.11/4.8.1/../../../../i386-rtems4.11/bin/ld:
>  warning: cannot find entry symbol _start; defaulting to 08048074
> /tmp/ccTmf1pa.o: In function `main':
> test.c:(.text+0xaa): undefined reference to `__atomic_compare_exchange_4'
> test.c:(.text+0xfa): undefined reference to `__atomic_compare_exchange_4'
> collect2: error: ld returned 1 exit status
>
> weiy@ubuntu:~/project/gsoc/gsoc2013/rtems-build/arm-build$ i386-rtems4.11-gcc 
> -m32 -march=i486 -o test test.c 
> /home/weiy/project/gsoc/gsoc2013/rtems-toolchain/4.11/lib/gcc/i386-rtems4.11/4.8.1/../../../../i386-rtems4.11/bin/ld:
>  warning: cannot find entry symbol _start; defaulting to 08048074
>
> weiy@ubuntu:~/project/gsoc/gsoc2013/rtems-build/arm-build$ i386-rtems4.11-gcc 
> -m32 -march=i686 -o test test.c 
> /home/weiy/project/gsoc/gsoc2013/rtems-toolchain/4.11/lib/gcc/i386-rtems4.11/4.8.1/../../../../i386-rtems4.11/bin/ld:
>  warning: cannot find entry symbol _start; defaulting to 08048074
>
> weiy@ubuntu:~/project/gsoc/gsoc2013/rtems-build/arm-build$ i386-rtems4.11-gcc 
> -m32 -march=i586 -o test test.c 
> /home/weiy/project/gsoc/gsoc2013/rtems-toolchain/4.11/lib/gcc/i386-rtems4.11/4.8.1/../../../../i386-rtems4.11/bin/ld:
>  warning: cannot find entry symbol _start; defaulting to 08048074
>
> weiy@ubuntu:~/project/gsoc/gsoc2013/rtems-build/arm-build$ i386-rtems4.11-gcc 
> -m32 -march=pentium -o test test.c 
> /home/weiy/project/gsoc/gsoc2013/rtems-toolchain/4.11/lib/gcc/i386-rtems4.11/4.8.1/../../../../i386-rtems4.11/bin/ld:
>  warning: cannot find entry symbol _start; defaulting to 08048074
>
> weiy@ubuntu:~/project/gsoc/gsoc2013/rtems-build/arm-build$ i386-rtems4.11-gcc 
> -m32 -o test test.c 
> /home/weiy/project/gsoc/gsoc2013/rtems-toolchain/4.11/lib/gcc/i386-rtems4.11/4.8.1/../../../../i386-rtems4.11/bin/ld:
>  warning: cannot find entry symbol _start; defaulting to 08048074
> /tmp/cctQ68SN.o: In function `main':
> test.c:(.text+0xaa): undefined reference to `__atomic_compare_exchange_4'
> test.c:(.text+0xfa): undefined reference to `__atomic_compare_exchange_4'
> collect2: error: ld returned 1 exit status
>
> WeiY
> Best Regards
> On 2013-8-10, at 12:02 AM, Joel Sherrill  wrote:
>
>> On 8/9/2013 10:15 AM, Deng Hengyi wrote:
>>> Hi all,
>>>
>>> does anyone know how to configure gcc to build with 
>>> "__atomic_compare_exchange_n" support for i386 target?
>> I recall that one issue with *-rtems* targets is that
>> we support CPU models which are lower than typically
>> used on Linux and BSD systems. I recall that CPU
>> models like the mc68000 and i386 don't necessarily
>> have the atomic instructions available in later models
>> like the mc68040 or i686.
>>
>> My suggestion is to see specifically how that is
>> implemented on the other CPU models and which models
>> don't have implementations. Then we can figure out
>> how to implement it.
>>
>> For lower model CPUs, it should be safe to assume they
>> will never be seen in SMP systems and using a
>> generic RTEMS-provided implementation that disables
>> interrupts and does the operation is OK.
>>
>> --joel
>>> WeiY
>>> Best Regards
>>> On 2013-8-6, at 11:37 PM, Jonathan Wakely  wrote:
>>>
>>>> On 6 August 2013 16:30, Deng Hengyi wrote:
>>>>> Hi Jonathan,
>>>>>
>>>>> Thank you for your reply.
>>>>> And about the error i encounter, do you have any advice? maybe it is
>>>>> caused by my toolchain not install rightly?
>>>>> In the standard pc686 architecture(not cross compile on RTEMS) will it
>>>>> encounter the similar error?
>>>> I don't know anything about the RTEMS port. You might need to build
>>>> and link to libatomic, but I don't know.
>>
>> -- 
>> Joel Sherrill, Ph.D.             Director of Research & Development
>> joel.sherr...@oarcorp.com        On-Line Applications Research
>> Ask me about RTEMS: a free RTOS  Huntsville AL 35805
>> Support Available                (256) 722-9985
>>


-- 
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherr...@oarcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985



Re: i386 __atomic_compare_exchange_n not found

2013-08-09 Thread Deng Hengyi
Hi Joel,

I have done a test, it seems that '-march=i386' does not provide 
"__atomic_compare_exchange_n" libs. And '-march=i486' or '-march=pentium' can 
find the '__atomic_compare_exchange_n' function.

weiy@ubuntu:~/project/gsoc/gsoc2013/rtems-build/arm-build$ i386-rtems4.11-gcc 
-m32 -march=i386 -o test test.c 
/home/weiy/project/gsoc/gsoc2013/rtems-toolchain/4.11/lib/gcc/i386-rtems4.11/4.8.1/../../../../i386-rtems4.11/bin/ld:
 warning: cannot find entry symbol _start; defaulting to 08048074
/tmp/ccTmf1pa.o: In function `main':
test.c:(.text+0xaa): undefined reference to `__atomic_compare_exchange_4'
test.c:(.text+0xfa): undefined reference to `__atomic_compare_exchange_4'
collect2: error: ld returned 1 exit status

weiy@ubuntu:~/project/gsoc/gsoc2013/rtems-build/arm-build$ i386-rtems4.11-gcc 
-m32 -march=i486 -o test test.c 
/home/weiy/project/gsoc/gsoc2013/rtems-toolchain/4.11/lib/gcc/i386-rtems4.11/4.8.1/../../../../i386-rtems4.11/bin/ld:
 warning: cannot find entry symbol _start; defaulting to 08048074

weiy@ubuntu:~/project/gsoc/gsoc2013/rtems-build/arm-build$ i386-rtems4.11-gcc 
-m32 -march=i686 -o test test.c 
/home/weiy/project/gsoc/gsoc2013/rtems-toolchain/4.11/lib/gcc/i386-rtems4.11/4.8.1/../../../../i386-rtems4.11/bin/ld:
 warning: cannot find entry symbol _start; defaulting to 08048074

weiy@ubuntu:~/project/gsoc/gsoc2013/rtems-build/arm-build$ i386-rtems4.11-gcc 
-m32 -march=i586 -o test test.c 
/home/weiy/project/gsoc/gsoc2013/rtems-toolchain/4.11/lib/gcc/i386-rtems4.11/4.8.1/../../../../i386-rtems4.11/bin/ld:
 warning: cannot find entry symbol _start; defaulting to 08048074

weiy@ubuntu:~/project/gsoc/gsoc2013/rtems-build/arm-build$ i386-rtems4.11-gcc 
-m32 -march=pentium -o test test.c 
/home/weiy/project/gsoc/gsoc2013/rtems-toolchain/4.11/lib/gcc/i386-rtems4.11/4.8.1/../../../../i386-rtems4.11/bin/ld:
 warning: cannot find entry symbol _start; defaulting to 08048074

weiy@ubuntu:~/project/gsoc/gsoc2013/rtems-build/arm-build$ i386-rtems4.11-gcc 
-m32 -o test test.c 
/home/weiy/project/gsoc/gsoc2013/rtems-toolchain/4.11/lib/gcc/i386-rtems4.11/4.8.1/../../../../i386-rtems4.11/bin/ld:
 warning: cannot find entry symbol _start; defaulting to 08048074
/tmp/cctQ68SN.o: In function `main':
test.c:(.text+0xaa): undefined reference to `__atomic_compare_exchange_4'
test.c:(.text+0xfa): undefined reference to `__atomic_compare_exchange_4'
collect2: error: ld returned 1 exit status
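
The test.c used above is not shown, so the following is only a guess at a minimal function that exercises the builtin; try_claim is a made-up name. For a 4-byte object, when the compiler cannot expand the builtin inline for the selected -march, it emits a call to __atomic_compare_exchange_4, which matches the undefined reference in the link errors above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Try to claim *slot for owner: succeeds only if *slot is currently 0. */
bool try_claim(int *slot, int owner)
{
    int expected = 0;

    assert(slot != NULL);

    /* Strong compare-and-exchange with sequentially consistent ordering
       on both the success and the failure path. */
    return __atomic_compare_exchange_n(slot, &expected, owner, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```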

WeiY
Best Regards
On 2013-8-10, at 12:02 AM, Joel Sherrill  wrote:

> On 8/9/2013 10:15 AM, Deng Hengyi wrote:
>> Hi all,
>> 
>> does anyone know how to configure gcc to build with 
>> "__atomic_compare_exchange_n" support for i386 target?
> 
> I recall that one issue with *-rtems* targets is that
> we support CPU models which are lower than typically
> used on Linux and BSD systems. I recall that CPU
> models like the mc68000 and i386 don't necessarily
> have the atomic instructions available in later models
> like the mc68040 or i686.
> 
> My suggestion is to see specifically how that is
> implemented on the other CPU models and which models
> don't have implementations. Then we can figure out
> how to implement it.
> 
> For lower model CPUs, it should be safe to assume they
> will never be seen in SMP systems and using a
> generic RTEMS-provided implementation that disables
> interrupts and does the operation is OK.
> 
> --joel
>> WeiY
>> Best Regards
>> On 2013-8-6, at 11:37 PM, Jonathan Wakely  wrote:
>> 
>>> On 6 August 2013 16:30, Deng Hengyi wrote:
>>>> Hi Jonathan,
>>>>
>>>> Thank you for your reply.
>>>> And about the error i encounter, do you have any advice? maybe it is
>>>> caused by my toolchain not install rightly?
>>>> In the standard pc686 architecture(not cross compile on RTEMS) will it
>>>> encounter the similar error?
>>> I don't know anything about the RTEMS port. You might need to build
>>> and link to libatomic, but I don't know.
> 
> 
> -- 
> Joel Sherrill, Ph.D.             Director of Research & Development
> joel.sherr...@oarcorp.com        On-Line Applications Research
> Ask me about RTEMS: a free RTOS  Huntsville AL 35805
> Support Available                (256) 722-9985
> 



Re: i386 __atomic_compare_exchange_n not found

2013-08-09 Thread Joel Sherrill
On 8/9/2013 10:15 AM, Deng Hengyi wrote:
> Hi all,
>
> does anyone know how to configure gcc to build with 
> "__atomic_compare_exchange_n" support for i386 target?

I recall that one issue with *-rtems* targets is that
we support CPU models which are lower than typically
used on Linux and BSD systems. I recall that CPU
models like the mc68000 and i386 don't necessarily
have the atomic instructions available in later models
like the mc68040 or i686.

My suggestion is to see specifically how that is
implemented on the other CPU models and which models
don't have implementations. Then we can figure out
how to implement it.

For lower model CPUs, it should be safe to assume they
will never be seen in SMP systems and using a
generic RTEMS-provided implementation that disables
interrupts and does the operation is OK.
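
A rough sketch of that approach, for a uniprocessor target only. The interrupt control functions here are hypothetical stand-ins that merely count nesting so the example can run on a host; a real port would use its own interrupt disable/enable primitives, and uniprocessor_compare_exchange_u32 is a made-up name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the port's interrupt control.  They only
   track nesting depth; they do not touch any hardware. */
static int irq_disable_depth;

static void hypothetical_irq_disable(void) { irq_disable_depth++; }
static void hypothetical_irq_enable(void)  { irq_disable_depth--; }

/* Compare-and-exchange made atomic by masking interrupts around it.
   This is only valid when no other processor can touch *ptr. */
bool uniprocessor_compare_exchange_u32(uint32_t *ptr, uint32_t *expected,
                                       uint32_t desired)
{
    bool swapped;

    assert(ptr != NULL && expected != NULL);

    hypothetical_irq_disable();
    if (*ptr == *expected) {
        *ptr = desired;
        swapped = true;
    } else {
        *expected = *ptr;   /* report the value actually found */
        swapped = false;
    }
    hypothetical_irq_enable();

    return swapped;
}
```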

--joel
> WeiY
> Best Regards
> On 2013-8-6, at 11:37 PM, Jonathan Wakely  wrote:
>
>> On 6 August 2013 16:30, Deng Hengyi wrote:
>>> Hi Jonathan,
>>>
>>> Thank you for your reply.
>>> And about the error i encounter, do you have any advice? maybe it is caused 
>>> by my toolchain not install rightly?
>>> In the standard pc686 architecture(not cross compile on RTEMS) will it 
>>> encounter the similar error?
>> I don't know anything about the RTEMS port. You might need to build
>> and link to libatomic, but I don't know.


-- 
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherr...@oarcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985



Re: i386 __atomic_compare_exchange_n not found

2013-08-09 Thread Deng Hengyi
Hi all,

does anyone know how to configure gcc to build with 
"__atomic_compare_exchange_n" support for i386 target?

WeiY
Best Regards
On 2013-8-6, at 11:37 PM, Jonathan Wakely  wrote:

> On 6 August 2013 16:30, Deng Hengyi wrote:
>> Hi Jonathan,
>> 
>> Thank you for your reply.
>> And about the error i encounter, do you have any advice? maybe it is caused 
>> by my toolchain not install rightly?
>> In the standard pc686 architecture(not cross compile on RTEMS) will it 
>> encounter the similar error?
> 
> I don't know anything about the RTEMS port. You might need to build
> and link to libatomic, but I don't know.



How to specify multiple OSDIRNAME suffixes for multilib (Multilib usage with MPX)?

2013-08-09 Thread Ilya Enkovich
Hi,

I'm currently trying to create multilib libraries compiled with MPX.
The main difference from the existing multilib variants on the i386 target
is that the new targets (32/mpx, 64/mpx) are compatible with the old
variants (32, 64). Also, we should not prevent a user from using MPX if
they do not have MPX variants of some libraries; legacy versions should be
used instead. Thus we need to check several suffixes instead of one.
E.g. for a 64-bit MPX binary we should first check ../lib64/mpx, then
check ../lib64, and finally the default one.
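
The desired lookup order can be written down as a small stand-alone sketch. This is not how gcc.c handles multilibs; candidate_libdirs and MAX_DIR are hypothetical, and the default directory is passed in by the caller because the text above does not name it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_DIR 64   /* arbitrary buffer size for this sketch */

/* Fill out[] with the directories to try, most specific first:
   <osdir>/mpx (only for MPX builds), then <osdir>, then default_dir.
   Returns the number of entries written. */
size_t candidate_libdirs(const char *osdir, const char *default_dir,
                         int use_mpx, char out[][MAX_DIR], size_t max_out)
{
    size_t n = 0;

    assert(osdir != NULL && default_dir != NULL && out != NULL);
    assert(strlen(osdir) + sizeof "/mpx" <= MAX_DIR);
    assert(strlen(default_dir) < MAX_DIR);

    if (use_mpx && n < max_out)
        snprintf(out[n++], MAX_DIR, "%s/mpx", osdir);
    if (n < max_out)
        snprintf(out[n++], MAX_DIR, "%s", osdir);
    if (n < max_out)
        snprintf(out[n++], MAX_DIR, "%s", default_dir);
    return n;
}
```

With osdir "../lib64" and use_mpx set, the first two entries are "../lib64/mpx" and "../lib64", followed by whatever default directory the caller supplies.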

I looked at MULTILIB_REUSE and thought it might solve my problem
according to the documentation: "And for some targets it is better to
reuse an existing multilib than to fall back to default multilib when
there is no corresponding multilib." [1]. So I tried the following
declarations:

MULTILIB_OSDIRNAMES+= m64/fmpx=../lib64/mpx
MULTILIB_REUSE = m64=m64/fmpx

But it appears that only the first entry for a given option set counts
when multilibs are parsed in gcc.c, and my reuse here is just ignored.

Is this a wrong implementation of MULTILIB_REUSE, or am I
misunderstanding this option? Is there a way to implement MPX
multilibs while still allowing legacy ones when some MPX libs are missing?

[1] http://gcc.gnu.org/onlinedocs/gccint/Target-Fragment.html#Target-Fragment

Thanks,
Ilya


conflict between scheduler and register allocator

2013-08-09 Thread shmeel gutl
I am having trouble meeting the constraints of the scheduler and the 
register allocator for my back end. The relevant features are:


1) VLIW - up to 4 instructions can be issued each cycle
2) If a vliw bundle has both a set and a use, the use will use the old 
values.
3) A call instruction will push r30 and r31 to the stack making them 
natural candidates for callee saved.


The problem is that the scheduler might include an instruction that sets 
r30 in the same vliw as a call. This would result in a stale value being 
saved to the stack. (Note: the call instruction is not truly dependent 
on r30, just that r30 can't be set in the vliw that contains the call). 
On the other hand, if I declare that the call uses r30, the register 
allocator will refuse to use r30 since it thinks that the register is live.
I know that I can use a hook to fix-up the first problem by breaking a 
single vliw into two bundles, but that has a performance penalty. Is 
there a way to tell the scheduler to avoid issuing an instruction that 
sets r30 or r31 in the same bundle that contains a call instruction?


Thank you for any pointers.



Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX

2013-08-09 Thread Jan Beulich
>>> On 08.08.13 at 18:01, "H.J. Lu"  wrote:
> On Thu, Aug 8, 2013 at 12:19 AM, Jan Beulich  wrote:
>>>>> On 08.08.13 at 02:33, "H.J. Lu"  wrote:
>>> We use the .gnu_attribute directive to record an object attribute:
>>>
>>> enum
>>> {
>>>   Tag_GNU_X86_EXTERN_BRANCH = 4,
>>> };
>>>
>>> for the types of external branch instructions in relocatable files.
>>>
>>> enum
>>> {
>>>   /* All external branch instructions are legacy.  */
>>>   Val_GNU_X86_EXTERN_BRANCH_LEGACY = 0,
>>>   /* There is at least one external branch instruction with BND prefix.  */
>>>   Val_GNU_X86_EXTERN_BRANCH_BND = 1,
>>> };
>>>
>>> An x86 feature note section, .note.x86-feature, is used to indicate
>>> features in executables and shared library. The contents of this note
>>> section are:
>>>
>>> .section .note.x86-feature
>>> .align  4
>>> .long   .L1 - .L0
>>> .long   .L3 - .L2
>>> .long   1
>>> .L0:
>>> .asciz "x86 feature"
>>> .L1:
>>> .align  4
>>> .L2:
>>> .long   FeatureFlag (Feature flag)
>>> .L3:
>>>
>>> The current valid bits in FeatureFlag are
>>>
>>> #define NT_X86_FEATURE_PLT_BND (0x1 << 0)
>>>
>>> It should be set if PLT entry has BND prefix to preserve bound registers.
>>>
>>> The remaining bits in FeatureFlag are reserved.
>>>
>>> When merging Tag_GNU_X86_EXTERN_BRANCH, if any input relocatable
>>> file has Tag_GNU_X86_EXTERN_BRANCH set to Val_GNU_X86_EXTERN_BRANCH_BND,
>>> the resulting Tag_GNU_X86_EXTERN_BRANCH value should be
>>> Val_GNU_X86_EXTERN_BRANCH_BND.
>>>
>>> When generating executable or shared library, if PLT is needed and
>>> Tag_GNU_X86_EXTERN_BRANCH value is Val_GNU_X86_EXTERN_BRANCH_BND,
>>> the 32-byte PLT entry should be used and the feature note section should
>>> be generated with the NT_X86_FEATURE_PLT_BND bit set to 1 and the feature
>>> note section should be included in PT_NOTE segment. The benefit of the
>>> note section is it is backward compatible with existing run-time and tools.
>>
>> While I can see the purpose of the attribute section, I don't see
>> what the note section is for: You don't mention at all what it's
>> consumed for, and I also can't see how it validly would be for
>> anything. That's because iirc note section contents, if not
>> understood by the consumer, is required to not have any effect
>> on the correctness of the program. Hence if loaded on a system
>> that is MPX capable, has an MPX aware kernel, but no MPX aware
>> user space (apart from this one executable or shared library, or
>> a set thereof), it ought to still work correctly. Which - afaict - it
>> won't if the dynamic loader itself isn't MPX aware.
>>
> 
> The note section isn't required for correctness.  But it can be used
> by ld.so to select an alternate MPX aware shared library in a different
> directory, instead of a legacy one.

Okay, that clarifies your intentions with the note section. However,
then you need something else to make sure an MPX aware app can't
load on an MPX enabled kernel without MPX-enabled ld.so.

> There is another way to encode this information in the first entry
> of PLT:
> 
>    0: ff 35 00 00 00 00       pushq  GOT+8(%rip)
>    6: f2 ff 25 00 00 00 00    bnd jmpq *GOT+16(%rip)
>    d: 0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
>   12: 0f 1f 80 00 00 00 00    nopl   0x0(%rax)
>   19: 0f 1f 80 00 00 00 01    nopl   0x1000000(%rax)
> 
> We can encode PLT property in 10 (4 + 4 + 2) bytes of
> displacements of 3 nops.  In this example, the first bit
> of the last byte of PLT0 is 1.

While a nice idea, I think that's worse, because it is much harder to
determine from simply dumping information for a given binary.
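
To illustrate what a tool would have to do under that alternative: it needs the raw PLT0 bytes and has to know where the property bits live. A minimal sketch, assuming "the first bit of the last byte" means the least significant bit of byte 31; plt0_property_bit and example_plt0 are names invented here, and the byte values are the ones from the listing above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PLT0_SIZE 32   /* size of the first PLT entry in the example */

/* The 32 bytes of the example PLT0, taken from the quoted listing. */
const uint8_t example_plt0[PLT0_SIZE] = {
    0xff, 0x35, 0x00, 0x00, 0x00, 0x00,         /* pushq GOT+8(%rip)     */
    0xf2, 0xff, 0x25, 0x00, 0x00, 0x00, 0x00,   /* bnd jmpq *GOT+16(%rip) */
    0x0f, 0x1f, 0x44, 0x00, 0x00,               /* 5-byte nopl           */
    0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00,   /* 7-byte nopl           */
    0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x01    /* 7-byte nopl, bit set  */
};

/* Return the lowest bit of the last byte of PLT0 (0 or 1),
   or -1 if fewer than PLT0_SIZE bytes are available. */
int plt0_property_bit(const uint8_t *plt0, size_t len)
{
    assert(plt0 != NULL);

    if (len < PLT0_SIZE)
        return -1;
    return plt0[PLT0_SIZE - 1] & 0x1;
}
```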

Jan