Re: [PATCH] [mingw] fix typo: s/_REENTRANCE/_REENTRANT/

2011-10-03 Thread Ozkan Sezer
PING?

On Thu, Sep 22, 2011 at 2:28 PM, Ozkan Sezer  wrote:
> Hi:
>
> Unless I'm missing something, the mingw CPP_SPEC changes introduced in
> r171833 have a typo: -D_REENTRANCE should read -D_REENTRANT . Patchlet
> below.  Please review, and apply if it's OK.
>
>
> config/i386/mingw-w64.h (CPP_SPEC): Rename _REENTRANCE to _REENTRANT.
> config/i386/mingw32.h (CPP_SPEC): Likewise.
>
> Index: config/i386/mingw-w64.h
> ===
> --- config/i386/mingw-w64.h     (revision 171833)
> +++ config/i386/mingw-w64.h     (working copy)
> @@ -25,8 +25,8 @@
>  #undef CPP_SPEC
>  #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{mthreads:-D_MT} " \
>                 "%{municode:-DUNICODE} " \
> -                "%{" SPEC_PTHREAD1 ":-D_REENTRANCE} " \
> -                "%{" SPEC_PTHREAD2 ":-U_REENTRANCE} "
> +                "%{" SPEC_PTHREAD1 ":-D_REENTRANT} " \
> +                "%{" SPEC_PTHREAD2 ":-U_REENTRANT} "
>
>  #undef STARTFILE_SPEC
>  #define STARTFILE_SPEC "%{shared|mdll:dllcrt2%O%s} \
> Index: config/i386/mingw32.h
> ===
> --- config/i386/mingw32.h       (revision 177789)
> +++ config/i386/mingw32.h       (working copy)
> @@ -87,7 +87,7 @@
>
>  #undef CPP_SPEC
>  #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{mthreads:-D_MT} " \
> -                "%{" SPEC_PTHREAD1 ":-D_REENTRANCE} " \
> +                "%{" SPEC_PTHREAD1 ":-D_REENTRANT} " \
>                 "%{" SPEC_PTHREAD2 ": } "
>
>  /* For Windows applications, include more libraries, but always include
>
>
> --
> O.S.
>


Re: [RFC] Context sensitive inline analysis

2011-10-03 Thread Richard Sandiford
Richard Sandiford  writes:
> Jan Hubicka  writes:
>> the problem is sign overflow in time computation. Time should be
>> capped by MAX_TIME and we compute MAX_TIME * INLINE_SIZE_SCALE *
>> 2. This happens to be >2^31 & <2^32 so we overflow here because of use
>> of signed arithmetics.
>>
>> Index: ipa-inline-analysis.c
>> ===
>> --- ipa-inline-analysis.c(revision 179266)
>> +++ ipa-inline-analysis.c(working copy)
>> @@ -92,7 +92,7 @@ along with GCC; see the file COPYING3.
>>  /* Estimate runtime of function can easilly run into huge numbers with many
>> nested loops.  Be sure we can compute time * INLINE_SIZE_SCALE in 
>> integer.
>> For anything larger we use gcov_type.  */
>> -#define MAX_TIME 100
>> +#define MAX_TIME 50
>>  
>>  /* Number of bits in integer, but we really want to be stable across 
>> different
>> hosts.  */
>
> Could you update the comment too?  ("time * INLINE_SIZE_SCALE * 2")

OK, I did it myself.  Tested on x86_64-linux-gnu and applied as obvious.

Richard


gcc/
* ipa-inline-analysis.c (MAX_TIME): Update comment.

Index: gcc/ipa-inline-analysis.c
===
--- gcc/ipa-inline-analysis.c   2011-10-03 09:10:21.0 +0100
+++ gcc/ipa-inline-analysis.c   2011-10-03 09:10:55.633044417 +0100
@@ -90,8 +90,8 @@ Software Foundation; either version 3, o
 #include "alloc-pool.h"
 
 /* Estimate runtime of function can easilly run into huge numbers with many
-   nested loops.  Be sure we can compute time * INLINE_SIZE_SCALE in integer.
-   For anything larger we use gcov_type.  */
+   nested loops.  Be sure we can compute time * INLINE_SIZE_SCALE * 2 in an
+   integer.  For anything larger we use gcov_type.  */
 #define MAX_TIME 50
 
 /* Number of bits in integer, but we really want to be stable across different
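
To make the arithmetic concrete, here is a minimal sketch of the overflow Jan describes. The constants are assumptions chosen only to match his description (a product above 2^31 but below 2^32); the MAX_TIME values in the quoted diff look truncated and are not reconstructed here. The helper name is hypothetical, and the product is formed in 64 bits so that the check itself cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values, not taken from the patch.  */
#define SKETCH_MAX_TIME 1000000000
#define SKETCH_INLINE_SIZE_SCALE 2

/* Return nonzero if time * scale * 2 fits in a signed 32-bit integer.
   The product is formed in 64 bits to avoid signed overflow here.  */
static int
product_fits_int32 (int64_t time, int64_t scale)
{
  int64_t product = time * scale * 2;
  return product <= INT32_MAX && product >= INT32_MIN;
}
```

With these assumed values the full cap does not fit in a signed 32-bit integer, while half of it does, which is the effect the MAX_TIME change is after.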


Re: [0/4] Modulo scheduling with haifa-sched for C6X

2011-10-03 Thread Richard Sandiford
Bernd Schmidt  writes:
> On 09/14/11 11:03, Richard Sandiford wrote:
>> ...I didn't see from an admittedly quick read of the patch how you
>> handle memory disambiguation between iterations.  If a loop includes:
>> 
>>  lb $3,($4)
>>  sb $5,1($4)
>> 
>> then the two instructions can be reordered by normal ebb scheduling,
>> but the inter-iteration conflict is important for modulo scheduling.
>
> There's nothing special to handle, I think. sched-deps should see that
> the ld in iteration 1 is DEP_ANTI against the sb in iteration 0
> (assuming there's also an increment).

For the record, I don't agree that we should rely on register
dependencies to handle memory dependencies.  It's possible for MEMs in
different iterations to alias without there being a register dependence
between them.  It might not happen often in modulo-schedulable loops,
but it is possible...

I realise the patch has been approved though.  Like I say,
it's just for the record.

Richard





Re: Initial shrink-wrapping patch

2011-10-03 Thread Richard Sandiford
Just a suggestion, but...

Bernd Schmidt  writes:
> Index: gcc/cfgcleanup.c
> ===
> --- gcc/cfgcleanup.c  (revision 178734)
> +++ gcc/cfgcleanup.c  (working copy)
> @@ -1488,6 +1488,16 @@ outgoing_edges_match (int mode, basic_bl
>edge e1, e2;
>edge_iterator ei;
>  
> +  /* If we performed shrink-wrapping, edges to the EXIT_BLOCK_PTR can
> + only be distinguished for JUMP_INSNs.  The two paths may differ in
> + whether they went through the prologue.  Sibcalls are fine, we know
> + that we either didn't need or inserted an epilogue before them.  */
> +  if (flag_shrink_wrap
> +  && single_succ_p (bb1) && single_succ (bb1) == EXIT_BLOCK_PTR
> +  && !JUMP_P (BB_END (bb1))
> +  && !(CALL_P (BB_END (bb1)) && SIBLING_CALL_P (BB_END (bb1))))
> +return false;

...how about adding a bit to crtl to say whether shrink-wrap occurred,
and check that instead of flag_shrink_wrap?

(Leaving the full review to Richard.)

Richard


Re: [Patch 2/4] ARM 64 bit sync atomic operations [V2]

2011-10-03 Thread Andrew Haley
On 09/30/2011 08:54 PM, Joseph S. Myers wrote:
> On Fri, 30 Sep 2011, Ramana Radhakrishnan wrote:
> 
>> On 26 July 2011 10:01, Dr. David Alan Gilbert  
>> wrote:
>>>
>>> +
>>> +extern unsigned int __write(int fd, const void *buf, unsigned int count);
>>
>> Why are we using __write instead of write?
> 
> Because plain write is in the user's namespace in ISO C.  See what I said 
> in  - the 
> alternative is hardcoding the syscall number and using the syscall 
> directly.

That would be better, no?  Unless __write is part of the glibc API,
which AFAIK it isn't.

Andrew.


[committed] Fix ICE in init_range_entry (PR tree-optimization/50587)

2011-10-03 Thread Jakub Jelinek
Hi!

The operand can be not just SSA_NAME, but also gimple_is_min_invariant
and in that case we can ICE on the assert later on.
Example testcase is:
extern int c[64];

int
foo (int a)
{
  int x = a > 1;
  int y = &c[60] < (int *) 0x12345678UL;
  return x | y;
}
which is undefined behavior, so I'm not checking that in.

Fixed thusly, committed to trunk as obvious.

2011-10-03  Jakub Jelinek  

PR tree-optimization/50587
* tree-ssa-reassoc.c (init_range_entry): Stop iterating when
arg0 is not a SSA_NAME.

--- gcc/tree-ssa-reassoc.c.jj   2011-09-30 15:18:39.0 +0200
+++ gcc/tree-ssa-reassoc.c  2011-10-03 09:46:46.0 +0200
@@ -1648,6 +1648,8 @@ init_range_entry (struct range_entry *r,
 
   code = gimple_assign_rhs_code (stmt);
   arg0 = gimple_assign_rhs1 (stmt);
+  if (TREE_CODE (arg0) != SSA_NAME)
+   break;
   arg1 = gimple_assign_rhs2 (stmt);
   exp_type = TREE_TYPE (exp);
   loc = gimple_location (stmt);

Jakub


Ping^2: PR middle-end/48660: Assigning to BLKmode RESULT_DECL

2011-10-03 Thread Richard Sandiford
Ping for:

http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00558.html

which fixes an expand-time bug in thunks that return BLKmode structures
in registers.  Tested on x86_64-linux-gnu and arm-linux-gnueabi.

Thanks,
Richard


Re: Initial shrink-wrapping patch

2011-10-03 Thread Basile Starynkevitch

Hello,

Regarding this shrink-wrapping patch, I would suggest describing, in a
comment of one or two sentences, what shrink-wrapping means in the context
of GCC.

http://en.wikipedia.org/wiki/Shrink_wrap does not help much in understanding
that.

Cheers.
-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


Stream cgraph order

2011-10-03 Thread Jan Hubicka
Hi,
this patch makes us stream the order fields of cgraph out and in
correctly, so -fno-toplevel-reorder works within a single compilation unit
with -flto-partition=none.

This is currently needed to build the kernel with LTO, and it is useful otherwise
(e.g. I originally made the patch for some experiments with Mozilla load times).

Andi has a patch that fixes the stream-in code to stream in linker order, which
fixes the order with -flto-partition=none completely, and I will follow up with
a patch to make partitioning honor the streaming order.

Bootstrapped/regtested x86_64-linux, OK?

Honza
* lto-streamer.h (lto_input_toplevel_asms): Add order_base parameter.
* lto-streamer-in.c (lto_input_toplevel_asms): Stream in order.
* lto-streamer-out.c (lto_output_toplevel_asms): Stream out order.
* lto-cgraph.c (order_base): New static var.
(lto_output_node): Stream out order.
(lto_output_varpool_node): Stream out order.
(input_node): Stream in order.
(input_varpool_node): Stream in order.
(input_cgraph_1): Initialize order base; update call of
lto_input_toplevel_asms.
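
The rebasing done on the input side can be sketched in isolation as follows. This is only an illustration modelled on the lto_input_toplevel_asms hunk: next_order is a hypothetical stand-in for cgraph_order, and setting the base to the current counter before reading each unit is an assumption, since the part of the patch that initializes order_base is not shown.

```c
#include <assert.h>

/* Hypothetical stand-in for the global order counter (cgraph_order
   in the patch).  */
static int next_order;

/* Rebase ORDER as read from one unit by ORDER_BASE and bump the
   global counter past it, mirroring the loop in the patch.  */
static int
rebase_order (int order, int order_base)
{
  int rebased = order + order_base;
  if (rebased >= next_order)
    next_order = rebased + 1;
  return rebased;
}
```

Under that assumption, nodes from a unit read later always receive orders above those of units read earlier, while the relative order within each unit is kept.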

Index: lto-streamer.h
===
--- lto-streamer.h  (revision 179413)
+++ lto-streamer.h  (working copy)
@@ -807,7 +807,7 @@ extern void lto_input_function_body (str
 const char *);
 extern void lto_input_constructors_and_inits (struct lto_file_decl_data *,
  const char *);
-extern void lto_input_toplevel_asms (struct lto_file_decl_data *);
+extern void lto_input_toplevel_asms (struct lto_file_decl_data *, int);
 extern struct data_in *lto_data_in_create (struct lto_file_decl_data *,
const char *, unsigned,
VEC(ld_plugin_symbol_resolution_t,heap) *);
Index: lto-streamer-in.c
===
--- lto-streamer-in.c   (revision 179413)
+++ lto-streamer-in.c   (working copy)
@@ -1144,7 +1144,7 @@ lto_input_tree (struct lto_input_block *
 /* Input toplevel asms.  */
 
 void
-lto_input_toplevel_asms (struct lto_file_decl_data *file_data)
+lto_input_toplevel_asms (struct lto_file_decl_data *file_data, int order_base)
 {
   size_t len;
   const char *data = lto_get_section_data (file_data, LTO_section_asm,
@@ -1173,7 +1173,12 @@ lto_input_toplevel_asms (struct lto_file
 header->lto_header.minor_version);
 
   while ((str = streamer_read_string_cst (data_in, &ib)))
-cgraph_add_asm_node (str);
+{
+  struct cgraph_asm_node *node = cgraph_add_asm_node (str);
+  node->order = streamer_read_hwi (&ib) + order_base;
+  if (node->order >= cgraph_order)
+   cgraph_order = node->order + 1;
+}
 
   clear_line_info (data_in);
   lto_data_in_delete (data_in);
Index: lto-streamer-out.c
===
--- lto-streamer-out.c  (revision 179413)
+++ lto-streamer-out.c  (working copy)
@@ -954,7 +954,10 @@ lto_output_toplevel_asms (void)
   streamer_write_char_stream (ob->string_stream, 0);
 
   for (can = cgraph_asm_nodes; can; can = can->next)
-streamer_write_string_cst (ob, ob->main_stream, can->asm_str);
+{
+  streamer_write_string_cst (ob, ob->main_stream, can->asm_str);
+  streamer_write_hwi (ob, can->order);
+}
 
   streamer_write_string_cst (ob, ob->main_stream, NULL_TREE);
 
Index: lto-cgraph.c
===
--- lto-cgraph.c(revision 179413)
+++ lto-cgraph.c(working copy)
@@ -54,6 +54,9 @@ static void input_cgraph_opt_summary (VE
 /* Number of LDPR values known to GCC.  */
 #define LDPR_NUM_KNOWN (LDPR_RESOLVED_DYN + 1)
 
+/* All node orders are offset by ORDER_BASE.  */
+static int order_base;
+
 /* Cgraph streaming is organized as set of record whose type
is indicated by a tag.  */
 enum LTO_cgraph_tags
@@ -425,6 +428,7 @@ lto_output_node (struct lto_simple_outpu
 
   streamer_write_enum (ob->main_stream, LTO_cgraph_tags, LTO_cgraph_last_tag,
   tag);
+  streamer_write_hwi_stream (ob->main_stream, node->order);
 
   /* In WPA mode, we only output part of the call-graph.  Also, we
  fake cgraph node attributes.  There are two cases that we care.
@@ -548,6 +552,7 @@ lto_output_varpool_node (struct lto_simp
   struct bitpack_d bp;
   int ref;
 
+  streamer_write_hwi_stream (ob->main_stream, node->order);
   lto_output_var_decl_index (ob->decl_state, ob->main_stream, node->decl);
   bp = bitpack_create (ob->main_stream);
   bp_pack_value (&bp, node->externally_visible, 1);
@@ -960,7 +965,9 @@ input_node (struct lto_file_decl_data *f
   unsigned decl_index;
   int ref = LCC_NOT_FOUND, ref2 = LCC_NOT_FOUND;
   int clone_ref;
+  int order;
 
+  order = streamer_read_hwi (ib) + order_base;
   clone_ref = streamer_read_hwi (i

Re: Vector shuffling

2011-10-03 Thread Artem Shinkarov
Hi, can anyone commit it please?

Richard?
Or maybe Richard?


Thanks,
Artem.



On Sat, Oct 1, 2011 at 12:21 AM, Artem Shinkarov
 wrote:
> Sorry for that, the vector comparison was submitted earlier. In the
> attachment there is a new version of the patch against the latest
> checkout.
>
> Richard, can you have a look at the genopinit.c, I am using
> set_direct_optab_handler, is it correct?
>
> All the rest seems to be the same.
>
>
> Thanks,
> Artem.
>
>
> On Fri, Sep 30, 2011 at 10:24 PM, Richard Henderson  wrote:
>> On 09/30/2011 12:14 PM, Artem Shinkarov wrote:
>>> Ok, in the attachment there is a patch which fixes mentioned errors.
>>
>> The changes are ok.  I would have committed it for you, only the patch
>> isn't against mainline.  There are 4 rejects.
>>
>>
>> r~
>>
>


Re: [Patch 1/4] ARM 64 bit sync atomic operations [V2]

2011-10-03 Thread David Gilbert
On 30 September 2011 14:21, Ramana Radhakrishnan
 wrote:
> Hi Dave,
>
>
> The nit-picky bit - There are still a number of formatting issues with
> your patch . Could you run your patch through
> contrib/check_GNU_style.sh and correct these. These are typically
> around problems with the number of spaces between a full stop and the
> end of comment, lines with trailing whitespaces and a few lines with
> number of characters > 80.  Thanks.

Oops - sorry about those; I'll run it through the check script and nail them.

>>@@ -23590,82 +23637,142 @@ arm_output_sync_loop (emit_f emit,
>>
>>+      else
>>+      {
>>+        /* Silence false potentially unused warning */
>>+        required_value_lo = NULL;
>>+        required_value_hi = NULL;
>>+      }
>>
>
> s/NULL/NULL_RTX in a number of places in arm.c

OK.

>>+      /* The restrictions on target registers in ARM mode are that the two
>>+       registers are consecutive and the first one is even; Thumb is
>>+       actually more flexible, but DI should give us this anyway.
>>+       Note that the 1st register always gets the lowest word in memory.  */
>>+      gcc_assert ((REGNO (value) & 1) == 0);
>>+      operands[2] = gen_rtx_REG (SImode, REGNO (value) + 1);
>>+      operands[3] = memory;
>>+      arm_output_asm_insn (emit, 0, operands, "strexd%s\t%%0, %%1, %%2, %%C3",
>>+                         cc);
>>+    }
>>
>
> The restriction is actually mandatory for ARM state only and thus I'm fine
> with this assertion being true only in ARM state.

OK, I can make the assert only for thumb mode; but I thought the simpler
logic was better and should hold true anyway because of DI mode allocation.

> I don't like duplicating the tests from gcc.dg into gcc.target/arm.
> If you wanted to check for assembler output specific to a target you could
> add your own conditions to the test in gcc.dg and conditionalize that on
> target arm_eabi
>
> Something like :
>
> { dg-final { scan-assembler "ldrexd\t" { target arm_eabi } } }
>
> I would like a testsuite maintainer to comment on the testsuite infrastructure
> bits as well but I have a few comments below .

As discussed, I don't like the dupes either - the problem is that we
have 3 tests
with identical code but different dg annotation:

   1) Build & run and check that the sync behaves correctly - using whatever
   compile flags you happen to have. (gcc.dg version)
   2) Build and check assembler for use of ldrexd - compiled with armv6k flags
   3) Build and check assembler doesn't use ldrexd - compiled with armv5 flags

Because (2) and (3) include different dg-add-options lines I don't see
how I can combine them.

The suggestion that I'm OK with is to #include the gcc.dg one in the
gcc.arm one.

>>+# Return 1 if the target supports atomic operations on "long long" and can actually
>>+# execute them
>>+# So far only put checks in for ARM, others may want to add their own
>>+proc check_effective_target_sync_longlong { } {
>>+    return [check_runtime sync_longlong_runtime {
>>+      #include 
>>+      int main()
>>+      {
>>+      long long l1;
>>+
>>+      if (sizeof(long long)!=8)
>
> Space between ')' and ! as well as '=' and 8
>
>>+        exit(1);
>>+
>>+      #ifdef __arm__
>
> Why is this checking only for ARM state ? We could have ldrexd in T2 as
> well ?

Because __arm__ gets defined for either thumb or arm mode; in thumb mode
we just get __thumb__  (and __thumb2__) defined as well.

> Otherwise the functionality looks good to me. Can you confirm that
> this has survived a testrun for v7-a thumb2 and v7-a arm state ?

Yes it did.  I'll give it another whirl later today after I go and fix
the formatting niggles and move the test.

Thanks for the review.

Dave


Re: Intrinsics for N2965: Type traits and base classes

2011-10-03 Thread Jason Merrill
The code looks good, though you are still missing some spaces before 
'('.  The main thing left is some testcases.


Jason


Re: [Patch] Support DEC-C extensions

2011-10-03 Thread Tristan Gingold

On Sep 30, 2011, at 5:19 PM, Joseph S. Myers wrote:

> On Fri, 30 Sep 2011, Tristan Gingold wrote:
> 
>> If you prefer a target hook, I'm fine with that.  I will write such a patch.
>> 
>> I don't think it must be restricted to system headers, as it is possible 
>> that the user 'imports' such a function (and define it in one of VMS 
>> favorite languages such as macro-32 or bliss).
> 
> If it's not restricted to system headers, then probably the option is 
> better than the target hook.

Is it ok with this option name (-fdecc-extensions) or do you prefer a more 
generic option name,
such as -fallow-unnamed-variadic-functions ?

Tristan.



Re: [Patch 2/4] ARM 64 bit sync atomic operations [V2]

2011-10-03 Thread David Gilbert
On 30 September 2011 18:01, H.J. Lu  wrote:
> You may want to take a look at:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50583
>
> ARM may have the same problem.

OK - although to be honest this patch only stretches the same
structures to 64bit - any major changes in semantics are a separate issue - but
thanks for pointing it out.

Hmm - I think what's produced is correct; however the manual
description is inconsistent:

 These builtins perform the operation suggested by the name, and
 returns the value that had previously been in memory.  That is,

  { tmp = *ptr; *ptr OP= value; return tmp; }

The ARM code (see below) does a single load inside a loop with a guarded
store.  This guarantees that the value returned is the value that had
'previously been in memory' directly prior to the atomic operation - however
that does mean it doesn't do the pair of accesses implied by
'tmp = *ptr; *ptr OP= value'.

On ARM, for fetch_and_add we get the following
(this is pre-my-patch and 32bit; my patch doesn't change the structure
except for the position of that last label):

mov r3, r0
dmb sy
.LSYT6:
ldrex   r0, [r3]
add r2, r0, r1
strex   r0, r2, [r3]
teq r0, #0
bne .LSYT6
sub r0, r2, r1
dmb sy

That seems the correct semantics to me - if not what am I missing? Was
the intention of the example
really to cause two loads - if so why?

for sync_and_fetch we get:


dmb sy
.LSYT6:
ldrex   r0, [r3]
add r0, r0, r1
strex   r2, r0, [r3]
teq r2, #0
bne .LSYT6
dmb sy

i.e. the value returned is always the value that goes into the guarded
store - and is hence
always the value that's stored.
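
The difference between the two return values can be checked from C with the GCC __sync builtins. This is a small sketch, assuming the second listing corresponds to __sync_add_and_fetch; the wrapper names are hypothetical and exist only to keep the two cases apart.

```c
#include <assert.h>

/* Return what __sync_fetch_and_add reports: the old contents.  */
static int
old_value_after_add (int *ptr, int value)
{
  return __sync_fetch_and_add (ptr, value);
}

/* Return what __sync_add_and_fetch reports: the new contents.  */
static int
new_value_after_add (int *ptr, int value)
{
  return __sync_add_and_fetch (ptr, value);
}
```

Starting from 5 and adding 3 twice, the first wrapper returns 5 and the second returns 11, and in both cases memory holds the updated value afterwards.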

Dave


Re: [Patch] Support DEC-C extensions

2011-10-03 Thread Basile Starynkevitch
On Mon, Oct 03, 2011 at 03:16:11PM +0200, Tristan Gingold wrote:
> 
> On Sep 30, 2011, at 5:19 PM, Joseph S. Myers wrote:
> 
> Is it ok with this option name (-fdecc-extensions) or do you prefer a more 
> generic option name,
> such as -fallow-unnamed-variadic-functions ?


My preference is to avoid -fdecc-extensions and use a more
explanatory & generic option name.

But my (non-native English speaker) reading of
-fallow-unnamed-variadic-functions is misleading: I read it as allowing
anonymous functions (think of lambda) which happen to be variadic, which
is not what your patch gives.

What about -fallow-fully-variadic-functions or
-fallow-very-variadic-functions ?


And we could also imagine having a GCC #pragma which changes the acceptance
of variadic functions.

By the way, I also regret the name of the -fms-extensions option;
http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Unnamed-Fields.html#Unnamed-Fields
because accepting an unnamed field should not be perceived as Microsoft
specific, but as a useful language extension.

[My point is that language extensions should not be enabled by options
containing a brand or a trademark which suggested them, they should be named
by options which are evocative of what the extension provides.]

Regards.
-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


Re: [0/4] Modulo scheduling with haifa-sched for C6X

2011-10-03 Thread Bernd Schmidt
On 10/03/11 10:23, Richard Sandiford wrote:
> Bernd Schmidt  writes:
>> On 09/14/11 11:03, Richard Sandiford wrote:
>>> ...I didn't see from an admittedly quick read of the patch how you
>>> handle memory disambiguation between iterations.  If a loop includes:
>>>
>>>  lb $3,($4)
>>>  sb $5,1($4)
>>>
>>> then the two instructions can be reordered by normal ebb scheduling,
>>> but the inter-iteration conflict is important for modulo scheduling.
>>
>> There's nothing special to handle, I think. sched-deps should see that
>> the ld in iteration 1 is DEP_ANTI against the sb in iteration 0
>> (assuming there's also an increment).
> 
> For the record, I don't agree that we should rely on register
> dependencies to handle memory dependencies.  It's possible for MEMs in
> different iterations to alias without there being a register dependence
> between them.

I don't know what you mean by "register dependence" here. sched-deps
analyzes MEMs for whether they depend on each other, but the term
"register dependence" suggests you aren't thinking about this.

If there was a problem, then rtl loop unrolling would also cause it
(since the modulo scheduling patch effectively does nothing else). Are
you sure there really is a problem?


Bernd


Re: [Patch 2/4] ARM 64 bit sync atomic operations [V2]

2011-10-03 Thread David Gilbert
On 3 October 2011 09:35, Andrew Haley  wrote:
> On 09/30/2011 08:54 PM, Joseph S. Myers wrote:
>> On Fri, 30 Sep 2011, Ramana Radhakrishnan wrote:
>>
>>> On 26 July 2011 10:01, Dr. David Alan Gilbert  
>>> wrote:

>>>> +
>>>> +extern unsigned int __write(int fd, const void *buf, unsigned int count);
>>>
>>> Why are we using __write instead of write?
>>
>> Because plain write is in the user's namespace in ISO C.  See what I said
>> in  - the
>> alternative is hardcoding the syscall number and using the syscall
>> directly.
>
> That would be better, no?  Unless __write is part of the glibc API,
> which AFAIK it isn't.

I could change it to calling the syscall directly - although it gets
a little messy having to deal with both ARM and Thumb syscalls;
I was trying to avoid further complicating an already complicated corner
case.

Dave


Re: [0/4] Modulo scheduling with haifa-sched for C6X

2011-10-03 Thread Richard Sandiford
Bernd Schmidt  writes:
> On 10/03/11 10:23, Richard Sandiford wrote:
>> Bernd Schmidt  writes:
>>> On 09/14/11 11:03, Richard Sandiford wrote:
>>>> ...I didn't see from an admittedly quick read of the patch how you
>>>> handle memory disambiguation between iterations.  If a loop includes:
>>>>
>>>>  lb $3,($4)
>>>>  sb $5,1($4)
>>>>
>>>> then the two instructions can be reordered by normal ebb scheduling,
>>>> but the inter-iteration conflict is important for modulo scheduling.
>>>
>>> There's nothing special to handle, I think. sched-deps should see that
>>> the ld in iteration 1 is DEP_ANTI against the sb in iteration 0
>>> (assuming there's also an increment).
>> 
>> For the record, I don't agree that we should rely on register
>> dependencies to handle memory dependencies.  It's possible for MEMs in
>> different iterations to alias without there being a register dependence
>> between them.
>
> I don't know what you mean by "register dependence" here. sched-deps
> analyzes MEMs for whether they depend on each other, but the term
> "register dependence" suggests you aren't thinking about this.

Well, as you said, sched-deps uses more exact memory disambiguation
than SMS.  But that's for a reason: if we're scheduling a loop body
using haifa-sched, we only care about intra-iteration memory
dependencies.  But modulo scheduling allows movement between
iterations as well.

So my original point was that it looked like you were adding support
for inter-iteration scheduling while still using intra-iteration memory
dependencies.  I (probably wrongly, sorry) took your response to mean
that inter-iteration memory dependencies would be accompanied by some
sort of register dependency, so that doesn't matter.

> If there was a problem, then rtl loop unrolling would also cause it
> (since the modulo scheduling patch effectively does nothing else). Are
> you sure there really is a problem?

I'm not sure I follow.  Unrolling a loop {A, B, C, D} gives:

  A1
  B1
  C1
  D1
      A2
      B2
      C2
      D2
          A3
          B3
          C3
          D3

so inter-iteration dependencies aren't a problem.  Whereas I thought your
modulo instruction did:

  A1
  B1  A2
  C1  B2  A3
  D1  C2  B3
      D2  C3
          D3

so if D1 writes to memory that A2 (but not A1) _might_ load, then the
loop doesn't behave the same way.
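
A small C sketch of the hazard, modelled on the lb/sb example quoted earlier in the thread. The function names and the array layout are assumptions made for illustration: each iteration loads a[i] and stores to a[i + 1], and the second function issues the load for the next iteration before the current store, as an overlapped schedule that ignored the cross-iteration dependence might.

```c
#include <assert.h>

/* Reference loop: each iteration loads a[i] and then stores to a[i + 1],
   so the store in iteration i feeds the load in iteration i + 1.
   A must have at least N + 1 elements.  */
static int
sum_in_order (unsigned char *a, int n, unsigned char v)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    {
      sum += a[i];      /* load  */
      a[i + 1] = v;     /* store */
    }
  return sum;
}

/* Overlapped loop: the load for iteration i + 1 is issued before the
   store of iteration i.  A must have at least N + 1 elements.  */
static int
sum_overlapped (unsigned char *a, int n, unsigned char v)
{
  int sum = 0;
  unsigned char next = a[0];          /* prologue: load for iteration 0 */
  for (int i = 0; i < n; i++)
    {
      unsigned char cur = next;
      next = a[i + 1];                /* early load for iteration i + 1 */
      a[i + 1] = v;                   /* store of iteration i */
      sum += cur;
    }
  return sum;
}
```

On an array of ones with v = 7 the two functions return different sums, so the overlapped form is only valid if the scheduler knows the store and the later load cannot alias.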

Richard


Re: [0/4] Modulo scheduling with haifa-sched for C6X

2011-10-03 Thread Bernd Schmidt
On 10/03/11 16:21, Richard Sandiford wrote:
> so inter-iteration dependencies aren't a problem.  Whereas I thought your
> modulo instruction did:
> 
>   A1
>   B1  A2
>   C1  B2  A3
>   D1  C2  B3
>       D2  C3
>           D3
> 
> so if D1 writes to memory that A2 (but not A1) _might_ load, then the
> loop doesn't behave the same way.

But sched-deps will have found a dependence between D1 and A2 so the
schedule won't look like this.


Bernd


Re: [Patch] Support DEC-C extensions

2011-10-03 Thread Andreas Schwab
Basile Starynkevitch  writes:

> What about -fallow-fully-variadic-functions or
> -fallow-very-variadic-functions ?

-fallow-parameterless-variadic-functions

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [0/4] Modulo scheduling with haifa-sched for C6X

2011-10-03 Thread Bernd Schmidt
On 10/03/11 16:21, Richard Sandiford wrote:
> I'm not sure I follow.  Unrolling a loop {A, B, C, D} gives:
> 
>   A1
>   B1
>   C1
>   D1
>       A2
>       B2
>       C2
>       D2
>           A3
>           B3
>           C3
>           D3
> 
> so inter-iteration dependencies aren't a problem.

Expanding on the previous answer, yes they are if this basic block is
later scheduled by haifa sched. Modulo scheduling using the algorithm in
my patch is exactly equivalent to scheduling an unrolled loop, with
nothing more than the additional constraint that for any insn X,
  t(Xn+1) = t(Xn) + II

So, the above would be a valid schedule, and if there are no
inter-iteration dependencies, so would the following:

>   A1
>   B1  A2
>   C1  B2  A3
>   D1  C2  B3
>       D2  C3
>           D3
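
The constraint t(Xn+1) = t(Xn) + II can be written down as a small check. This is a sketch only: the function names are hypothetical, and reading each row of the diagrams as one cycle is an assumption.

```c
#include <assert.h>

/* Issue time of an insn in iteration N, given its time T0 in
   iteration 0 and the initiation interval II.  */
static int
issue_time (int t0, int n, int ii)
{
  return t0 + n * ii;
}

/* Return nonzero if TIMES[0..COUNT-1], the issue times of one insn in
   successive iterations, satisfy t(X_{n+1}) = t(X_n) + II.  */
static int
obeys_modulo_constraint (const int *times, int count, int ii)
{
  for (int n = 0; n + 1 < count; n++)
    if (times[n + 1] != times[n] + ii)
      return 0;
  return 1;
}
```

Read that way, insn A sits at rows 0, 4, 8 in the unrolled layout (II = 4) and at rows 0, 1, 2 in the overlapped one (II = 1); both satisfy the constraint for their own II.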


Bernd


Re: [PATCH] [mingw] fix typo: s/_REENTRANCE/_REENTRANT/

2011-10-03 Thread Kai Tietz
2011/10/3 Ozkan Sezer :
> PING?
>
> On Thu, Sep 22, 2011 at 2:28 PM, Ozkan Sezer  wrote:
>> Hi:
>>
>> Unless I'm missing something, the mingw CPP_SPEC changes introduced in
>> r171833 have a typo: -D_REENTRANCE should read -D_REENTRANT . Patchlet
>> below.  Please review, and apply if it's OK.
>>
>>
>> config/i386/mingw-w64.h (CPP_SPEC): Rename _REENTRANCE to _REENTRANT.
>> config/i386/mingw32.h (CPP_SPEC): Likewise.
>>
>> Index: config/i386/mingw-w64.h
>> ===
>> --- config/i386/mingw-w64.h     (revision 171833)
>> +++ config/i386/mingw-w64.h     (working copy)
>> @@ -25,8 +25,8 @@
>>  #undef CPP_SPEC
>>  #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{mthreads:-D_MT} " \
>>                 "%{municode:-DUNICODE} " \
>> -                "%{" SPEC_PTHREAD1 ":-D_REENTRANCE} " \
>> -                "%{" SPEC_PTHREAD2 ":-U_REENTRANCE} "
>> +                "%{" SPEC_PTHREAD1 ":-D_REENTRANT} " \
>> +                "%{" SPEC_PTHREAD2 ":-U_REENTRANT} "
>>
>>  #undef STARTFILE_SPEC
>>  #define STARTFILE_SPEC "%{shared|mdll:dllcrt2%O%s} \
>> Index: config/i386/mingw32.h
>> ===
>> --- config/i386/mingw32.h       (revision 177789)
>> +++ config/i386/mingw32.h       (working copy)
>> @@ -87,7 +87,7 @@
>>
>>  #undef CPP_SPEC
>>  #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{mthreads:-D_MT} " \
>> -                "%{" SPEC_PTHREAD1 ":-D_REENTRANCE} " \
>> +                "%{" SPEC_PTHREAD1 ":-D_REENTRANT} " \
>>                 "%{" SPEC_PTHREAD2 ": } "
>>
>>  /* For Windows applications, include more libraries, but always include
>>
>>
>> --
>> O.S.
>>

Patch is ok together with a ChangeLog.

Thanks,
Kai


Re: [PATCH] [mingw] fix typo: s/_REENTRANCE/_REENTRANT/

2011-10-03 Thread Ozkan Sezer
On Mon, Oct 3, 2011 at 5:56 PM, Kai Tietz  wrote:
> 2011/10/3 Ozkan Sezer :
>> PING?
>>
>> On Thu, Sep 22, 2011 at 2:28 PM, Ozkan Sezer  wrote:
>>> Hi:
>>>
>>> Unless I'm missing something, the mingw CPP_SPEC changes introduced in
>>> r171833 have a typo: -D_REENTRANCE should read -D_REENTRANT . Patchlet
>>> below.  Please review, and apply if it's OK.
>>>
>>>
>>> config/i386/mingw-w64.h (CPP_SPEC): Rename _REENTRANCE to _REENTRANT.
>>> config/i386/mingw32.h (CPP_SPEC): Likewise.
>>>
>>> Index: config/i386/mingw-w64.h
>>> ===
>>> --- config/i386/mingw-w64.h     (revision 171833)
>>> +++ config/i386/mingw-w64.h     (working copy)
>>> @@ -25,8 +25,8 @@
>>>  #undef CPP_SPEC
>>>  #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{mthreads:-D_MT} " \
>>>                 "%{municode:-DUNICODE} " \
>>> -                "%{" SPEC_PTHREAD1 ":-D_REENTRANCE} " \
>>> -                "%{" SPEC_PTHREAD2 ":-U_REENTRANCE} "
>>> +                "%{" SPEC_PTHREAD1 ":-D_REENTRANT} " \
>>> +                "%{" SPEC_PTHREAD2 ":-U_REENTRANT} "
>>>
>>>  #undef STARTFILE_SPEC
>>>  #define STARTFILE_SPEC "%{shared|mdll:dllcrt2%O%s} \
>>> Index: config/i386/mingw32.h
>>> ===
>>> --- config/i386/mingw32.h       (revision 177789)
>>> +++ config/i386/mingw32.h       (working copy)
>>> @@ -87,7 +87,7 @@
>>>
>>>  #undef CPP_SPEC
>>>  #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{mthreads:-D_MT} " \
>>> -                "%{" SPEC_PTHREAD1 ":-D_REENTRANCE} " \
>>> +                "%{" SPEC_PTHREAD1 ":-D_REENTRANT} " \
>>>                 "%{" SPEC_PTHREAD2 ": } "
>>>
>>>  /* For Windows applications, include more libraries, but always include
>>>
>>>
>>> --
>>> O.S.
>>>
>
> Patch is ok together with a ChangeLog.
>
> Thanks,
> Kai
>

Here it is again with a changelog entry (Thought that I included one
the first time..)  Thanks.

* config/i386/mingw-w64.h (CPP_SPEC): Rename _REENTRANCE to _REENTRANT.
* config/i386/mingw32.h (CPP_SPEC): Likewise.

Index: config/i386/mingw-w64.h
===
--- config/i386/mingw-w64.h (revision 171833)
+++ config/i386/mingw-w64.h (working copy)
@@ -25,8 +25,8 @@
 #undef CPP_SPEC
 #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{mthreads:-D_MT} " \
 "%{municode:-DUNICODE} " \
-"%{" SPEC_PTHREAD1 ":-D_REENTRANCE} " \
-"%{" SPEC_PTHREAD2 ":-U_REENTRANCE} "
+"%{" SPEC_PTHREAD1 ":-D_REENTRANT} " \
+"%{" SPEC_PTHREAD2 ":-U_REENTRANT} "

 #undef STARTFILE_SPEC
 #define STARTFILE_SPEC "%{shared|mdll:dllcrt2%O%s} \
Index: config/i386/mingw32.h
===
--- config/i386/mingw32.h   (revision 177789)
+++ config/i386/mingw32.h   (working copy)
@@ -87,7 +87,7 @@

 #undef CPP_SPEC
 #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{mthreads:-D_MT} " \
-"%{" SPEC_PTHREAD1 ":-D_REENTRANCE} " \
+"%{" SPEC_PTHREAD1 ":-D_REENTRANT} " \
 "%{" SPEC_PTHREAD2 ": } "

 /* For Windows applications, include more libraries, but always include

--
O.S.


[pph] Detect state mutation in DECLs/TYPEs [1/8] (issue5180042)

2011-10-03 Thread Diego Novillo

This series of patches changes the streamer cache so we can:

1- Detect when certain tree nodes have changed during parsing.  This
   is useful when generating a PPH image after reading a set of
   images included by it.  During parsing, a DECL may change from a
   forward declaration to a full definition, or a FUNCTION_DECL may
   get its body filled-in, etc.
   
   In these cases, instead of emitting an external reference to the
   declaration, we start a new (mutated) record that fills in the
   new data in the mutated object.  Currently, this means that we
   overwrite ALL the fields in the mutated object.

2- Tag cache entries with the data type of the pointed-to objects.
   This is needed to make sure we are reading the data type we expect
   to be reading.  Again, this happens when multiple PPH images are
   being read to generate another one.

   After the children PPH images have been read, the parser may
   invalidate some/most of the data in those images (e.g., when
   merging declarations).  In those cases, the memory used by the
   original object is re-used, so when we get a cache hit by
   pointer-matching, we make sure that the cache entry is for the same
   data type that we expect.  If not, we don't consider that a cache
   hit and re-pickle the pointer.

This first patch introduces signatures.  It uses libiberty's crc32
computation to sign the tree.  We only care to sign certain trees, and
only when generating a PPH from other PPHs, so we do not always need
to sign trees.

In particular, we never need to sign trees when compiling translation
units (i.e. "pure" readers).
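
As a rough illustration of the signature idea (not the PPH code itself),
here is a minimal standalone sketch.  All names (toy_crc32, struct
toy_decl, toy_sign, toy_mutated_p) are hypothetical, and the checksum is
a plain bitwise CRC-32 written out so the example stands alone; it is
not necessarily the same variant as the libiberty routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320).  Hypothetical
   stand-in for whatever checksum routine the real code uses.  */
static uint32_t
toy_crc32 (const void *data, size_t nbytes)
{
  const unsigned char *p = (const unsigned char *) data;
  uint32_t crc = 0xFFFFFFFFu;
  size_t i;
  int bit;

  assert (data != NULL || nbytes == 0);
  for (i = 0; i < nbytes; i++)
    {
      crc ^= p[i];
      for (bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  return ~crc;
}

/* Toy "declaration": a forward declaration has HAS_BODY == 0.  Two ints,
   so there are no padding bytes to disturb the checksum.  */
struct toy_decl
{
  int uid;
  int has_body;
};

/* Signature recorded when the object is first materialized.  */
struct toy_signature
{
  uint32_t crc;
  size_t nbytes;
};

static struct toy_signature
toy_sign (const struct toy_decl *d)
{
  struct toy_signature sig;
  sig.nbytes = sizeof (*d);
  sig.crc = toy_crc32 (d, sig.nbytes);
  return sig;
}

/* Return nonzero if D no longer matches the signature taken earlier.  */
static int
toy_mutated_p (const struct toy_decl *d, const struct toy_signature *sig)
{
  return toy_crc32 (d, sig->nbytes) != sig->crc;
}
```

Signing a toy_decl with has_body == 0 and later setting has_body to 1
makes toy_mutated_p return nonzero, which is the situation where a
mutated record would be wanted instead of an external reference.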

Tested on x86_64.  Committed to branch.


Diego.

* pph-streamer-in.c (pph_is_reference_marker): Move to
pph-streamer.h.
(pph_read_namespace_tree): Call tree_needs_signature to
determine if EXPR should be signed.
Call pph_get_signature.
* pph-streamer.c (pph_cache_sign): Add argument CRC.
Change return value to void.  Update all users.
(pph_get_signature): New.
* pph-streamer.h (pph_cache_sign): Declare.
(pph_get_signature): Declare.
(pph_is_reference_marker): Move from pph-streamer-in.c.
(tree_needs_signature): New.

diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
index 8e7c772..1fd810f 100644
--- a/gcc/cp/pph-streamer-in.c
+++ b/gcc/cp/pph-streamer-in.c
@@ -145,18 +145,6 @@ pph_init_read (pph_stream *stream)
 }
 
 
-/* Return true if MARKER is PPH_RECORD_IREF, PPH_RECORD_XREF,
-   or PPH_RECORD_PREF.  */
-
-static inline bool
-pph_is_reference_marker (enum pph_record_marker marker)
-{
-  return marker == PPH_RECORD_IREF
- || marker == PPH_RECORD_XREF
- || marker == PPH_RECORD_PREF;
-}
-
-
 /* Read and return a record header from STREAM.  When a PPH_RECORD_START
marker is read, the next word read is an index into the streamer
cache where the rematerialized data structure should be stored.
@@ -2128,8 +2116,7 @@ pph_read_namespace_tree (pph_stream *stream, tree 
enclosing_namespace)
   if (tag == LTO_builtin_decl)
 {
   /* If we are going to read a built-in function, all we need is
-the code and class.  Note that builtins are never stored in
- the pickle cache.  */
+the code and class.  */
   expr = streamer_get_builtin_tree (ib, data_in);
 }
   else if (tag == lto_tree_code_to_tag (INTEGER_CST))
@@ -2155,12 +2142,26 @@ pph_read_namespace_tree (pph_stream *stream, tree 
enclosing_namespace)
   expr = expr;
 }
 }
+
+  /* Add the new tree to the cache and read its body.  The tree
+ is added to the cache before we read its body to handle
+ circular references and references from children nodes.  */
   pph_cache_insert_at (&stream->cache, expr, ix);
   pph_read_tree_body (stream, expr);
 
-  /* If needed, sign the recently materialized tree to detect mutations.  
*/
-  if (DECL_P (expr) || TYPE_P (expr))
-pph_cache_sign (&stream->cache, ix, tree_size (expr));
+  /* If needed, sign the recently materialized tree to detect
+ mutations.  Note that we only need to compute signatures
+ if we are generating a PPH image.  That is the only time
+ where we need to determine whether a tree read from PPH
+ was updated while parsing the header file that we are
+ currently generating.  */
+  if (pph_writer_enabled_p () && tree_needs_signature (expr))
+{
+  unsigned crc;
+  size_t nbytes;
+  crc = pph_get_signature (expr, &nbytes);
+  pph_cache_sign (&stream->cache, ix, crc, nbytes);
+}
 }
 
   return expr;
diff --git a/gcc/cp/pph-streamer.c b/gcc/cp/pph-streamer.c
index 668f96c..d0fac57 100644
--- a/gcc/cp/pph-streamer.c
+++ b/gcc/cp/pph-streamer.c
@@ -533,20 +533,58 @@ pph_cache_add (pph_cache *cache, void *data, unsigned 
*ix_p)
 }
 
 
-/* Generate a CRC32 signature for the first NBYTES of

Re: Vector shuffling

2011-10-03 Thread Richard Henderson
On 10/03/2011 05:14 AM, Artem Shinkarov wrote:
> Hi, can anyone commit it please?
> 
> Richard?
> Or may be Richard?

Committed.


r~


[pph] Detect state mutation in DECLs/TYPEs [2/8] (issue5179042)

2011-10-03 Thread Diego Novillo

This patch re-factors pph_start_record so we can separate the decision
of what marker to use from the emission of the marker.

This is used by pph_out_start_tree_record to decide if a new marker
will be needed.  In this patch, it will only display debugging info.
The actual change comes in a subsequent patch.
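
The shape of the re-factoring can be sketched with a toy cache, assuming
a much simpler setting than the real streamer.  The names below
(toy_get_marker_for, toy_emit, struct toy_cache, struct toy_out) are
made up for the illustration: one function only decides which marker
applies, the other only writes it, so a caller can look at the decision
before anything is emitted.

```c
#include <assert.h>
#include <stddef.h>

enum toy_marker { TOY_END, TOY_IREF, TOY_START };

#define TOY_CACHE_MAX 8

struct toy_cache
{
  const void *slot[TOY_CACHE_MAX];
  unsigned n;
};

struct toy_out
{
  unsigned word[16];
  unsigned n;
};

/* Decide which marker describes DATA.  Nothing is written here.
   *IX_P receives the cache slot on a hit and -1u otherwise.  */
static enum toy_marker
toy_get_marker_for (const struct toy_cache *cache, const void *data,
                    unsigned *ix_p)
{
  unsigned i;

  *ix_p = (unsigned) -1;
  if (data == NULL)
    return TOY_END;

  for (i = 0; i < cache->n; i++)
    if (cache->slot[i] == data)
      {
        *ix_p = i;
        return TOY_IREF;
      }

  /* Not cached: the caller has to write the object out in full.  */
  return TOY_START;
}

/* Emit MARKER, followed by the slot index for reference markers.  */
static void
toy_emit (struct toy_out *out, enum toy_marker marker, unsigned ix)
{
  assert (out->n + 2 <= 16);
  out->word[out->n++] = (unsigned) marker;
  if (marker == TOY_IREF)
    {
      assert (ix != (unsigned) -1);
      out->word[out->n++] = ix;
    }
}
```

With the two steps separated, a caller can inspect the marker (for
instance to print debugging output or to pick a different marker) and
only then call toy_emit.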

Tested on x86_64.  Committed to branch.

* pph-streamer-out.c (pph_get_marker_for): Factor out of
pph_start_record.
(pph_out_reference_record): Rename from pph_out_start_ref_record.
(pph_out_start_tree_record): Detect when a tree's signature
should be re-checked.

diff --git a/gcc/cp/pph-streamer-out.c b/gcc/cp/pph-streamer-out.c
index 2f9fcae..4bff1b6 100644
--- a/gcc/cp/pph-streamer-out.c
+++ b/gcc/cp/pph-streamer-out.c
@@ -226,59 +226,75 @@ pph_cache_should_handle (tree t)
 }
 
 
-/* If DATA is NULL or it existed in one of the pickle caches associated
-   with STREAM, write a reference marker and return true.  Otherwise,
-   do nothing and return false.  */
+/* Return a PPH record marker according to whether DATA is NULL or
+   it can be found in one of the caches associated with STREAM.
 
-static bool
-pph_out_start_ref_record (pph_stream *stream, void *data)
+   If DATA is in any of the caches, return the corresponding slot in
+   *IX_P.  If DATA is in the cache of an image included by STREAM,
+   return the image's index in *INCLUDE_IX_P.
+
+   In all other cases, *IX_P and *INCLUDE_IX_P will be set to -1.  */
+
+static enum pph_record_marker
+pph_get_marker_for (pph_stream *stream, void *data, unsigned *include_ix_p,
+unsigned *ix_p)
 {
-  unsigned ix, include_ix;
+  *ix_p = -1u;
+  *include_ix_p = -1u;
 
-  /* Represent NULL pointers with a single PPH_RECORD_END.  */
+  /* We represent NULL pointers with PPH_RECORD_END.  */
   if (data == NULL)
-{
-  pph_out_record_marker (stream, PPH_RECORD_END);
-  return true;
-}
+return PPH_RECORD_END;
 
-  /* See if we have data in STREAM's cache.  If so, write an internal
- reference to it and inform the caller that it should not write a
- physical representation for DATA.  */
-  if (pph_cache_lookup (&stream->cache, data, &ix))
-{
-  pph_out_record_marker (stream, PPH_RECORD_IREF);
-  pph_out_uint (stream, ix);
-  return true;
-}
+  /* If DATA is in STREAM's cache, return an internal reference marker.  */
+  if (pph_cache_lookup (&stream->cache, data, ix_p))
+return PPH_RECORD_IREF;
 
-  /* DATA is not in STREAM's cache.  See if it is in any of the
- included images.  If it is, write an external reference to it
- and inform the caller that it should not write a physical
- representation for DATA.  */
-  if (pph_cache_lookup_in_includes (data, &include_ix, &ix))
-{
-  pph_out_record_marker (stream, PPH_RECORD_XREF);
-  pph_out_uint (stream, include_ix);
-  pph_out_uint (stream, ix);
-  return true;
-}
+  /* If DATA is in the cache of an included image, return an external
+ reference marker.  */
+  if (pph_cache_lookup_in_includes (data, include_ix_p, ix_p))
+return PPH_RECORD_XREF;
+
+  /* If DATA is a pre-loaded tree node, return a pre-loaded reference
+ marker.  */
+  if (pph_cache_lookup (NULL, data, ix_p))
+return PPH_RECORD_PREF;
+
+  /* DATA is in none of the caches.  It should be pickled out.  */
+  return PPH_RECORD_START;
+}
+
+
+/* Write a reference record on STREAM.  MARKER is the tag indicating what
+   kind of reference to write.  IX is the cache slot index to write.
+   INCLUDE_IX is used for PPH_RECORD_XREF records.  */
+
+static inline void
+pph_out_reference_record (pph_stream *stream, enum pph_record_marker marker,
+  unsigned include_ix, unsigned ix)
+{
+  gcc_assert (marker == PPH_RECORD_END || pph_is_reference_marker (marker));
+
+  pph_out_record_marker (stream, marker);
 
-  /* DATA is not in any stream's cache. See if it is a preloaded node.  */
-  if (pph_cache_lookup (NULL, data, &ix))
+  if (pph_is_reference_marker (marker))
 {
-  pph_out_record_marker (stream, PPH_RECORD_PREF);
+  if (marker == PPH_RECORD_XREF)
+{
+  gcc_assert (include_ix != -1u);
+  pph_out_uint (stream, include_ix);
+}
+
+  gcc_assert (ix != -1u);
   pph_out_uint (stream, ix);
-  return true;
 }
-
-  /* Could not write a reference record.  DATA must be pickled.  */
-  return false;
+  else
+gcc_assert (marker == PPH_RECORD_END);
 }
 
 
 /* Start a new record in STREAM for DATA.  If DATA is NULL
-   write an end-of-record marker and return false.
+   write an end-of-record marker and return true.
 
If DATA is not NULL and did not exist in the pickle cache, add it,
write a start-of-record marker and return true.  This means that we
@@ -292,15 +308,19 @@ pph_out_start_ref_record (pph_stream *stream, void *data)
 static inline bool
 pph_out_start_record (pph_stream *stream, void *data)
 {
-  unsigned ix;

Re: [Patch 1/4] ARM 64 bit sync atomic operations [V2]

2011-10-03 Thread David Gilbert
(Sorry, repost - I'd meant to cc Mike and Rainer into the conversation,
but forgot to add them).

On 3 October 2011 13:53, David Gilbert  wrote:
> On 30 September 2011 14:21, Ramana Radhakrishnan
>  wrote:
>> Hi Dave,
>>
>>
>> The nit-picky bit - There are still a number of formatting issues with
>> your patch . Could you run your patch through
>> contrib/check_GNU_style.sh and correct these. These are typically
>> around problems with the number of spaces between a full stop and the
>> end of comment, lines with trailing whitespaces and a few lines with
>> number of characters > 80.  Thanks.
>
> Oops - sorry about those; I'll run it through the check script and nail them.
>
>>>@@ -23590,82 +23637,142 @@ arm_output_sync_loop (emit_f emit,
>>>
>>>+      else
>>>+      {
>>>+        /* Silence false potentially unused warning */
>>>+        required_value_lo = NULL;
>>>+        required_value_hi = NULL;
>>>+      }
>>>
>>
>> s/NULL/NULL_RTX in a number of places in arm.c
>
> OK.
>
>>>+      /* The restrictions on target registers in ARM mode are that the two
>>>+       registers are consecutive and the first one is even; Thumb is
>>>+       actually more flexible, but DI should give us this anyway.
>>>+       Note that the 1st register always gets the lowest word in memory.  */
>>>+      gcc_assert ((REGNO (value) & 1) == 0);
>>>+      operands[2] = gen_rtx_REG (SImode, REGNO (value) + 1);
>>>+      operands[3] = memory;
>>>+      arm_output_asm_insn (emit, 0, operands, "strexd%s\t%%0, %%1, %%2, %%C3",
>>>+                         cc);
>>>+    }
>>>
>>
>> The restriction is actually mandatory for ARM state only and thus I'm fine
>> with this assertion being true only in ARM state.
>
> OK, I can make the assert only for ARM mode; but I thought the simpler
> logic was better and should hold true anyway because of DI mode allocation.
>
>> I don't like duplicating the tests from gcc.dg into gcc.target/arm.
>> If you wanted to check for assembler output specific to a target you could
>> add your own conditions to the test in gcc.dg and conditionalize that on
>> target arm_eabi
>>
>> Something like :
>>
>> { dg-final { scan-assembler "ldrexd\t"} {target arm_eabi}} } .
>>
>> I would like a testsuite maintainer to comment on the testsuite 
>> infrastructure
>> bits as well but I have a few comments below .
>
> As discussed, I don't like the dupes either - the problem is that we have
> 3 tests with identical code but different dg annotation:
>
>   1) Build & run and check that the sync behaves correctly - using whatever
>       compile flags you happen to have. (gcc.dg version)
>   2) Build and check assembler for use of ldrexd - compiled with armv6k flags
>   3) Build and check assembler doesn't use ldrexd - compiled with armv5 flags
>
> Because (2) and (3) include different dg-add-options lines I don't see
> how I can combine them.
>
> The suggestion that I'm OK with is to #include the gcc.dg one in the
> gcc.arm one.
>
>>>+# Return 1 if the target supports atomic operations on "long long" and can actually
>>>+# execute them
>>>+# So far only put checks in for ARM, others may want to add their own
>>>+proc check_effective_target_sync_longlong { } {
>>>+    return [check_runtime sync_longlong_runtime {
>>>+      #include 
>>>+      int main()
>>>+      {
>>>+      long long l1;
>>>+
>>>+      if (sizeof(long long)!=8)
>>
>> Space between ')' and ! as well as '=' and 8
>>
>>>+        exit(1);
>>>+
>>>+      #ifdef __arm__
>>
>> Why is this checking only for ARM state ? We could have ldrexd in T2 as
>> well ?
>
> Because __arm__ gets defined for either thumb or arm mode; in thumb mode
> we just get __thumb__  (and __thumb2__) defined as well.
>
>> Otherwise the functionality looks good to me. Can you confirm that
>> this has survived a testrun for v7-a thumb2 and v7-a arm state ?
>
> Yes it did.  I'll give it another whirl later today after I go and fix
> the formatting niggles and move the test.
>
> Thanks for the review.
>
> Dave
>
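
For reference, the point about __arm__ and __thumb__ in the quoted reply
can be checked with a small preprocessor test.  This is only a sketch;
toy_arm_state is a hypothetical helper name, and it relies solely on the
__arm__ and __thumb__ macros mentioned in the thread.

```c
/* Report which instruction-set state this file was compiled for.
   __arm__ alone means ARM state; __arm__ together with __thumb__
   means a Thumb state.  On other targets neither is defined.  */
static const char *
toy_arm_state (void)
{
#if defined (__arm__) && defined (__thumb__)
  return "thumb";
#elif defined (__arm__)
  return "arm";
#else
  return "not-arm";
#endif
}
```

A test guarded only by "#ifdef __arm__" is therefore compiled in both
ARM and Thumb builds, since the first two branches above both have
__arm__ defined.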


[pph] Detect state mutation in DECLs/TYPEs [3/8] (issue5167053)

2011-10-03 Thread Diego Novillo
This patch introduces actual handling of mutated records.

When the reader finds a PPH_RECORD_START_MUTATED record, it knows that
the tree it is about to read does not need to be allocated.  It can be
found in the cache for an external PPH file.  It reads the external
location of the tree, grabs it from the external cache and then reads
on top of that tree.

This could be improved at some point. Technically we only need to read
the fields that changed.
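
A toy version of "reading on top of" an existing object might look like
the following.  It is a sketch under simplifying assumptions: one cache
instead of per-image caches, caller-provided storage instead of a real
allocator, and hypothetical names throughout (toy_read_record, struct
toy_node, struct toy_record).

```c
#include <assert.h>
#include <stddef.h>

enum toy_marker { TOY_START, TOY_START_MUTATED };

struct toy_node
{
  int code;
  int value;
};

/* One record as it would come off the stream.  */
struct toy_record
{
  enum toy_marker marker;
  unsigned ix;
  struct toy_node payload;
};

#define TOY_SLOTS 4

/* Materialize REC.  For TOY_START, FRESH is used as the new object and
   registered in CACHE.  For TOY_START_MUTATED nothing is allocated: the
   object already sitting in CACHE is overwritten in place, so every
   existing pointer to it sees the new contents.  */
static struct toy_node *
toy_read_record (struct toy_node *cache[TOY_SLOTS],
                 const struct toy_record *rec, struct toy_node *fresh)
{
  struct toy_node *node;

  assert (rec->ix < TOY_SLOTS);
  if (rec->marker == TOY_START_MUTATED)
    {
      node = cache[rec->ix];
      assert (node != NULL);
    }
  else
    {
      assert (cache[rec->ix] == NULL && fresh != NULL);
      node = fresh;
      cache[rec->ix] = node;
    }

  /* All fields are overwritten, including the ones that did not change.  */
  *node = rec->payload;
  return node;
}
```

Reading a TOY_START record and then a TOY_START_MUTATED record for the
same slot returns the same pointer both times, with the fields from the
second record.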

There is some re-factoring in pph_read_namespace_tree and
pph_write_namespace_tree needed to handle the new record type.

Tested on x86_64.  Committed to branch.


Diego.

* pph-streamer-in.c (pph_in_start_record): Handle
PPH_RECORD_START_MUTATED.
(pph_read_namespace_tree): Re-organize to handle
PPH_RECORD_START_MUTATED.
* pph-streamer-out.c (pph_out_start_tree_record): Change
return value to enum pph_record_marker.  Update all users.
If the signature of the tree node changed, change the marker
to be PPH_RECORD_START_MUTATED (disabled for now).
Handle PPH_RECORD_START_MUTATED.
(pph_write_namespace_tree): Re-organize to handle
PPH_RECORD_START_MUTATED.
* gcc/cp/pph-streamer.h (enum pph_record_marker): Rename
PPH_RECORD_MREF to PPH_RECORD_START_MUTATED.  Update all
users.
(pph_in_record_marker): Handle PPH_RECORD_START_MUTATED.

diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
index 1fd810f..3cbe168 100644
--- a/gcc/cp/pph-streamer-in.c
+++ b/gcc/cp/pph-streamer-in.c
@@ -186,11 +186,16 @@ pph_in_start_record (pph_stream *stream, unsigned 
*include_ix_p,
   || marker == PPH_RECORD_IREF
   || marker == PPH_RECORD_PREF)
 *cache_ix_p = pph_in_uint (stream);
-  else if (marker == PPH_RECORD_XREF)
+  else if (marker == PPH_RECORD_XREF
+   || marker == PPH_RECORD_START_MUTATED)
 {
   *include_ix_p = pph_in_uint (stream);
   *cache_ix_p = pph_in_uint (stream);
 }
+  else if (marker == PPH_RECORD_END || marker == PPH_RECORD_START_NO_CACHE)
+; /* Nothing to do.  This record will not need cache updates.  */
+  else
+gcc_unreachable ();
 
   return marker;
 }
@@ -2084,6 +2089,7 @@ pph_read_tree (struct lto_input_block *ib_unused 
ATTRIBUTE_UNUSED,
   return pph_read_namespace_tree (stream, NULL);
 }
 
+
 /* Read a tree from the STREAM.  It ENCLOSING_NAMESPACE is not null,
the tree may be unified with an existing tree in that namespace.  */
 
@@ -2092,7 +2098,6 @@ pph_read_namespace_tree (pph_stream *stream, tree 
enclosing_namespace)
 {
   struct lto_input_block *ib = stream->encoder.r.ib;
   struct data_in *data_in = stream->encoder.r.data_in;
-
   tree expr;
   enum pph_record_marker marker;
   unsigned image_ix, ix;
@@ -2107,28 +2112,32 @@ pph_read_namespace_tree (pph_stream *stream, tree 
enclosing_namespace)
   pph_cache *cache = pph_cache_select (stream, marker, image_ix);
   return (tree) pph_cache_get (cache, ix);
 }
-
-  /* We did not find the tree in the pickle cache, allocate the tree by
- reading the header fields (different tree nodes need to be
- allocated in different ways).  */
-  tag = streamer_read_record_start (ib);
-  gcc_assert ((unsigned) tag < (unsigned) LTO_NUM_TAGS);
-  if (tag == LTO_builtin_decl)
-{
-  /* If we are going to read a built-in function, all we need is
-the code and class.  */
-  expr = streamer_get_builtin_tree (ib, data_in);
-}
-  else if (tag == lto_tree_code_to_tag (INTEGER_CST))
+  else if (marker == PPH_RECORD_START
+   || marker == PPH_RECORD_START_NO_CACHE)
 {
-  /* For integer constants we only need the type and its hi/low
-words.  */
-  expr = streamer_read_integer_cst (ib, data_in);
-}
-  else
-{
-  /* Otherwise, materialize a new node from IB.  This will also read
- all the language-independent bitfields for the new tree.  */
+  /* This is a new tree that we need to allocate.  Start by
+ reading the header fields, so we know how to allocate it
+ (different tree nodes need to be allocated in different
+ ways).  */
+  tag = streamer_read_record_start (ib);
+  gcc_assert ((unsigned) tag < (unsigned) LTO_NUM_TAGS);
+  if (tag == LTO_builtin_decl)
+{
+  /* If we are going to read a built-in function, all we need is
+ the code and class.  */
+  gcc_assert (marker == PPH_RECORD_START_NO_CACHE);
+  return streamer_get_builtin_tree (ib, data_in);
+}
+  else if (tag == lto_tree_code_to_tag (INTEGER_CST))
+{
+  /* For integer constants we only need the type and its hi/low
+ words.  */
+  gcc_assert (marker == PPH_RECORD_START_NO_CACHE);
+  return streamer_read_integer_cst (ib, data_in);
+}
+
+  /* Materialize a new node from IB.  This will also read all the
+ language-independent bitfields for the new tr

[pph] Detect state mutation in DECLs/TYPEs [4/8] (issue5172046)

2011-10-03 Thread Diego Novillo

Somewhat unrelated to state mutation, but I needed these changes while
debugging the code.  This adds more debugging information to
cp_debug_parser.

I will be sending these debugging changes for review for trunk.


Tested on x86_64.  Committed to branch.


Diego.
* parser.c (cp_debug_parser): Add location information on
the about-to-be-parsed token.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 1a0ed89..5b87275 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -466,6 +466,8 @@ void
 cp_debug_parser (FILE *file, cp_parser *parser)
 {
   const size_t window_size = 20;
+  cp_token *token;
+  expanded_location eloc;
 
   if (file == NULL)
 file = stderr;
@@ -531,6 +533,15 @@ cp_debug_parser (FILE *file, cp_parser *parser)
   fprintf (file, "Number of template parameter lists for the current "
   "declaration: %u\n", parser->num_template_parameter_lists);
   cp_debug_parser_tokens (file, parser, window_size);
+  token = parser->lexer->next_token;
+  fprintf (file, "Next token to parse:\n");
+  fprintf (file, "\tToken:  ");
+  cp_lexer_print_token (file, token);
+  eloc = expand_location (token->location);
+  fprintf (file, "\n\tFile:   %s\n", eloc.file);
+  fprintf (file, "\tLine:   %d\n", eloc.line);
+  fprintf (file, "\tColumn: %d\n", eloc.column);
+
 }
 
 
-- 
1.7.3.1


--
This patch is available for review at http://codereview.appspot.com/5172046


[pph] Detect state mutation in DECLs/TYPEs [5/8] (issue5178044)

2011-10-03 Thread Diego Novillo
Fixlet to update documentation for pph_cache_insert_at.


Diego.


* pph-streamer.c (pph_cache_insert_at): Fix documentation.

diff --git a/gcc/cp/pph-streamer.c b/gcc/cp/pph-streamer.c
index d0fac57..26bc6cd 100644
--- a/gcc/cp/pph-streamer.c
+++ b/gcc/cp/pph-streamer.c
@@ -402,11 +402,9 @@ pph_trace_bitpack (pph_stream *stream, struct bitpack_d 
*bp)
 }
 
 
-/* Insert DATA in CACHE at slot IX.  We support inserting the same
-   DATA at different locations of the array (FIXME pph, this happens
-   when reading builtins, which may have been converted into builtins
-   after they were read originally.  This should be detected and
-   converted into mutated references).  */
+/* Insert DATA in CACHE at slot IX.  As a restriction to prevent
+   stomping on cache entries, this will not allow inserting
+   into the same slot more than once.  */
 
 void
 pph_cache_insert_at (pph_cache *cache, void *data, unsigned ix)
-- 
1.7.3.1


--
This patch is available for review at http://codereview.appspot.com/5178044


Re: [0/4] Modulo scheduling with haifa-sched for C6X

2011-10-03 Thread Richard Sandiford
Bernd Schmidt  writes:
> On 10/03/11 16:21, Richard Sandiford wrote:
>> so inter-iteration dependencies aren't a problem.  Whereas I thought your
>> modulo instruction did:
>> 
>>   A1
>>   B1  A2
>>   C1  B2  A3
>>   D1  C2  B3
>>   D2  C3
>>   D3
>> 
>> so if D1 writes to memory that A2 (but not A1) _might_ load, then the
>> loop doesn't behave the same way.
>
> But sched-deps will have found a dependence between D1 and A2 so the
> schedule won't look like this.

OK, bad example, sorry.  So the fundamental assumption is that
if you have a loop:

Loop 1:
  A
  B
  C
  D

that you can unroll 4 times and schedule as:

Loop 2:
  A
  B A
  C B A
  D C B A
D C B
  D C
D

then 2 iterations of that loop:

  A
  B A
  C B A
  D C B A
D C B
  D C
D
  A
  B A
  C B A
  D C B A
D C B
  D C
D

are necessarily equivalent to:

Loop 3:
  A
  B A
  C B A
  D C B A
  A D C B
  B A D C
  C B A D
  D C B A
D C B
  D C
D

Is that right?  So if D from iteration N*4 of loop 1 doesn't alias A
from iteration N*4+1 of loop 1 (meaning loop 2 is correct) then it
follows that the D from any iteration M doesn't alias A from M+1
(meaning loop 3 is correct)?  I'm still not convinced that follows
for sufficiently clever alias analysis.

Reason for asking is that (AIUI) SMS used to use stronger memory
disambiguation, but had to pull back to something more conservative
for similar reasons.
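
To make the worry concrete, here is a deliberately contrived sketch.
It is not a real modulo schedule: the "unrolled" variant simply hoists
the four A loads of a group above that group's D stores, and the
"pipelined" variant hoists every A(i+1) above D(i).  The access pattern
is made up so that D of iteration M aliases A of iteration M+1 only when
M+1 is a multiple of 4.  All names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define TOY_N 8

/* A: iteration I loads its input.  Iterations 4, 8, ... read the slot
   that iteration I-1 stored in D; all others read SRC.  */
static int
toy_load_a (const int *buf, const int *src, int i)
{
  return (i > 0 && i % 4 == 0) ? buf[i - 1] : src[i];
}

/* D: iteration I stores its result.  */
static void
toy_store_d (int *buf, int i, int t)
{
  assert (i >= 0 && i < TOY_N);
  buf[i] = t + 1;
}

/* Loop 1: A then D, one iteration at a time.  */
static void
toy_run_sequential (int *buf, const int *src)
{
  int i;
  for (i = 0; i < TOY_N; i++)
    toy_store_d (buf, i, toy_load_a (buf, src, i));
}

/* Loop 2 (simplified): within each group of 4 iterations all loads run
   before any store.  Loads never cross a group boundary.  */
static void
toy_run_unrolled (int *buf, const int *src)
{
  int base, k, t[4];
  for (base = 0; base < TOY_N; base += 4)
    {
      for (k = 0; k < 4; k++)
        t[k] = toy_load_a (buf, src, base + k);
      for (k = 0; k < 4; k++)
        toy_store_d (buf, base + k, t[k]);
    }
}

/* Loop 3 (simplified): the load of iteration I+1 always runs before the
   store of iteration I, including across multiples of 4.  */
static void
toy_run_pipelined (int *buf, const int *src)
{
  int i, t = toy_load_a (buf, src, 0);
  for (i = 0; i < TOY_N; i++)
    {
      int t_next = (i + 1 < TOY_N) ? toy_load_a (buf, src, i + 1) : 0;
      toy_store_d (buf, i, t);
      t = t_next;
    }
}

static int
toy_same (const int *a, const int *b)
{
  return memcmp (a, b, TOY_N * sizeof (int)) == 0;
}
```

Starting from zeroed buffers, the unrolled variant matches the
sequential loop, while the pipelined variant reads slot 3 before
iteration 3 has stored to it and so produces a different slot 4.
Whether a real alias analysis could ever justify the first overlap but
not the second is exactly the open question here.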

Richard


[pph] Detect state mutation in DECLs/TYPEs [6/8] (issue5175049)

2011-10-03 Thread Diego Novillo
This test just needed -fpermissive (it's a C program originally).

* g++.dg/pph/p4mean.cc: Remove XFAILs.  Add -w -fpermissive flags.

diff --git a/gcc/testsuite/g++.dg/pph/p4mean.cc 
b/gcc/testsuite/g++.dg/pph/p4mean.cc
index aa08239..e832ce5 100644
--- a/gcc/testsuite/g++.dg/pph/p4mean.cc
+++ b/gcc/testsuite/g++.dg/pph/p4mean.cc
@@ -1,6 +1,4 @@
-// { dg-timeout 2 { target *-*-* } }
-// { dg-xfail-if "INFINITE" { "*-*-*" } { "-fpph-map=pph.map" } }
-
+/* { dg-options "-w -fpermissive" }  */
 #include 
 #include 
 #include 
-- 
1.7.3.1


--
This patch is available for review at http://codereview.appspot.com/5175049


[pph] Detect state mutation in DECLs/TYPEs [7/8] (issue5177042)

2011-10-03 Thread Diego Novillo
Another fixlet.  All the pph cache lookup routines accept NULL index
pointers when the caller does not care for them.  So should
pph_get_marker_for.
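
The convention is the usual optional out-parameter pattern.  A minimal
sketch, with a hypothetical toy_lookup over a plain int table rather
than the PPH caches:

```c
#include <assert.h>
#include <stddef.h>

/* Return nonzero if KEY is in TABLE.  If IX_P is not NULL, *IX_P is set
   to the matching index, or to -1u when KEY is not found.  Callers that
   only want the yes/no answer may pass NULL.  */
static int
toy_lookup (const int *table, unsigned n, int key, unsigned *ix_p)
{
  unsigned i;

  assert (table != NULL || n == 0);

  if (ix_p)
    *ix_p = (unsigned) -1;

  for (i = 0; i < n; i++)
    if (table[i] == key)
      {
        if (ix_p)
          *ix_p = i;
        return 1;
      }

  return 0;
}
```

Every write through the pointer is guarded, so toy_lookup (table, n,
key, NULL) is as valid as the form that asks for the index.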

* pph-streamer-in.c (pph_read_namespace_tree): Initialize EXPR.
* pph-streamer-out.c (pph_get_marker_for): Handle NULL values
for INCLUDE_IX_P and IX_P.

diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
index 3cbe168..a2bad3d 100644
--- a/gcc/cp/pph-streamer-in.c
+++ b/gcc/cp/pph-streamer-in.c
@@ -2098,7 +2098,7 @@ pph_read_namespace_tree (pph_stream *stream, tree 
enclosing_namespace)
 {
   struct lto_input_block *ib = stream->encoder.r.ib;
   struct data_in *data_in = stream->encoder.r.data_in;
-  tree expr;
+  tree expr = NULL_TREE;
   enum pph_record_marker marker;
   unsigned image_ix, ix;
   enum LTO_tags tag;
diff --git a/gcc/cp/pph-streamer-out.c b/gcc/cp/pph-streamer-out.c
index f46c849..516eabb 100644
--- a/gcc/cp/pph-streamer-out.c
+++ b/gcc/cp/pph-streamer-out.c
@@ -239,8 +239,11 @@ static enum pph_record_marker
 pph_get_marker_for (pph_stream *stream, void *data, unsigned *include_ix_p,
 unsigned *ix_p)
 {
-  *ix_p = -1u;
-  *include_ix_p = -1u;
+  if (ix_p)
+*ix_p = -1u;
+
+  if (include_ix_p)
+*include_ix_p = -1u;
 
   /* We represent NULL pointers with PPH_RECORD_END.  */
   if (data == NULL)
-- 
1.7.3.1


--
This patch is available for review at http://codereview.appspot.com/5177042


[v3] Don't declare insert(&&) members in _Hashtable

2011-10-03 Thread Paolo Carlini

Hi,

noticed while working on finally defaulting pair's move constructor (the 
way to go, proposed by Daniel, seems to be using 
std::is_constructible instead of std::is_convertible 
for constraining).  Testing that uncovered a number of interesting latent 
issues at various levels ;)


Tested x86_64-linux, committed.

Paolo.

2011-10-03  Paolo Carlini  

* include/bits/hashtable.h (_Hashtable<>::insert(value_type&&),
insert(const_iterator, value_type&&)): Don't define here...
* include/bits/unordered_set.h (__unordered_set<>,
__unordered_multiset<>): ... define here instead.
Index: include/bits/hashtable.h
===
--- include/bits/hashtable.h(revision 179456)
+++ include/bits/hashtable.h(working copy)
@@ -374,14 +374,6 @@
_M_insert_bucket(_Arg&&, size_type,
 typename _Hashtable::_Hash_code_type);
 
-  template
-   std::pair
-   _M_insert(_Arg&&, std::true_type);
-
-  template
-   iterator
-   _M_insert(_Arg&&, std::false_type);
-
   typedef typename std::conditional<__unique_keys,
std::pair,
iterator>::type
@@ -393,38 +385,38 @@
   >::type
_Insert_Conv_Type;
 
+protected:
+  template
+   std::pair
+   _M_insert(_Arg&&, std::true_type);
+
+  template
+   iterator
+   _M_insert(_Arg&&, std::false_type);
+
 public:
   // Insert and erase
   _Insert_Return_Type
   insert(const value_type& __v)
-  { return _M_insert(__v, std::integral_constant()); }
+  { return _M_insert(__v, integral_constant()); }
 
   iterator
   insert(const_iterator, const value_type& __v)
   { return _Insert_Conv_Type()(insert(__v)); }
 
-  _Insert_Return_Type
-  insert(value_type&& __v)
-  { return _M_insert(std::move(__v),
-std::integral_constant()); }
-
-  iterator
-  insert(const_iterator, value_type&& __v)
-  { return _Insert_Conv_Type()(insert(std::move(__v))); }
-
   template::value>::type>
+   std::enable_if<__and_,
+ std::is_convertible<_Pair,
+ value_type>>::value>::type>
_Insert_Return_Type
insert(_Pair&& __v)
{ return _M_insert(std::forward<_Pair>(__v),
-  std::integral_constant()); }
+  integral_constant()); }
 
   template::value>::type>
+std::enable_if<__and_,
+ std::is_convertible<_Pair,
+ value_type>>::value>::type>
iterator
insert(const_iterator, _Pair&& __v)
{ return _Insert_Conv_Type()(insert(std::forward<_Pair>(__v))); }
Index: include/bits/unordered_set.h
===
--- include/bits/unordered_set.h(revision 179456)
+++ include/bits/unordered_set.h(working copy)
@@ -63,7 +63,9 @@
   typedef typename _Base::hasher  hasher;
   typedef typename _Base::key_equal   key_equal;
   typedef typename _Base::allocator_type  allocator_type;
-  
+  typedef typename _Base::iteratoriterator;
+  typedef typename _Base::const_iterator  const_iterator;
+
   explicit
   __unordered_set(size_type __n = 10,
  const hasher& __hf = hasher(),
@@ -103,6 +105,16 @@
this->insert(__l.begin(), __l.end());
return *this;
   }
+
+  using _Base::insert;
+
+  std::pair
+  insert(value_type&& __v)
+  { return this->_M_insert(std::move(__v), std::true_type()); }
+
+  iterator
+  insert(const_iterator, value_type&& __v)
+  { return insert(std::move(__v)).first; }
 };
 
   templateinsert(__l.begin(), __l.end());
return *this;
   }
+
+  using _Base::insert;
+
+  iterator
+  insert(value_type&& __v)
+  { return this->_M_insert(std::move(__v), std::false_type()); }
+
+  iterator
+  insert(const_iterator, value_type&& __v)
+  { return insert(std::move(__v)); }
 };
 
   template

Re: [Patch] Support DEC-C extensions

2011-10-03 Thread Gabriel Dos Reis
On Mon, Oct 3, 2011 at 8:16 AM, Tristan Gingold  wrote:
>
> On Sep 30, 2011, at 5:19 PM, Joseph S. Myers wrote:
>
>> On Fri, 30 Sep 2011, Tristan Gingold wrote:
>>
>>> If you prefer a target hook, I'm fine with that.  I will write such a patch.
>>>
>>> I don't think it must be restricted to system headers, as it is possible
>>> that the user 'imports' such a function (and define it in one of VMS
>>> favorite languages such as macro-32 or bliss).
>>
>> If it's not restricted to system headers, then probably the option is
>> better than the target hook.
>
> Is it ok with this option name (-fdecc-extensions) or do you prefer a more 
> generic option name,
> such as -fallow-unnamed-variadic-functions ?
>

As observed earlier, there is nothing DEC-C specific about this, so
-fdecc-extensions isn't appropriate.

"unnamed variadic functions" sounds as if the function itself is
unnamed, so not good.


-funnamed-variadic-parameter
-fpointless-variadic-functions


[pph] Detect state mutation in DECLs/TYPEs [8/8] (issue5164048)

2011-10-03 Thread Diego Novillo
This final patch adds support for type tagging cache entries (and file
records as well).

This resolves the ICEs we were getting in x6dynarray4.cc and
x7dynarray5.cc.  The parser was replacing a LANG_DECL pointer (that
had been freed during decl merging) with a CALL_EXPR.

The patch is somewhat large because it adds a new TAG argument to
the cache functions.  However, it is straightforward.

It uncovered a potential bug, too.  struct c_language_function is
always allocated at the same address as struct language_function
(since that field is the same embedded field).  I was getting cache
data type conflicts because of that, which means that we were getting
different objects from the cache stored at the same location.  I fixed
this by simply embedding the streaming of c_language_function inside
language_function.
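
The address clash is easy to reproduce outside the compiler.  In the
sketch below the struct and function names are hypothetical stand-ins;
the only thing taken from C itself is that a pointer to a struct and a
pointer to its first member compare equal, so a cache keyed on the
pointer alone cannot tell the two objects apart, while a cache that also
checks a tag can.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for an inner struct embedded as the first field of an
   outer one.  */
struct toy_c_lang_fn
{
  int x;
};

struct toy_lang_fn
{
  struct toy_c_lang_fn base;   /* First member: same address as the whole.  */
  int y;
};

enum toy_tag { TOY_TAG_C_LANG_FN, TOY_TAG_LANG_FN };

struct toy_entry
{
  const void *data;
  enum toy_tag tag;
};

/* Return nonzero only if some entry matches both DATA and TAG.  A
   pointer match with the wrong tag is treated as a miss.  */
static int
toy_cache_lookup (const struct toy_entry *entries, unsigned n,
                  const void *data, enum toy_tag tag, unsigned *ix_p)
{
  unsigned i;

  assert (entries != NULL || n == 0);
  for (i = 0; i < n; i++)
    if (entries[i].data == data && entries[i].tag == tag)
      {
        if (ix_p)
          *ix_p = i;
        return 1;
      }
  return 0;
}
```

With an entry registered for the outer struct, a lookup for the embedded
member at the same address misses because the tags differ.  Streaming
the inner struct as part of the outer one, as this patch does, avoids
needing two entries for one address in the first place.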

Tested on x86_64.  Committed to branch.


Diego.

* pph-streamer.h (enum pph_tag): Define.
(struct pph_cache_entry): Add field TAG.
(pph_cache_insert_at): Add new argument TAG.
Update all users.
(pph_cache_lookup): Likewise.
(pph_cache_lookup_in_includes): Likewise.
(pph_cache_add): Likewise.
(pph_out_record_marker): Likewise.
(pph_in_record_marker): Add new argument *TAG_P.
Read *TAG_P.  Assert that it is a valid tag.
(pph_cache_get): Call pph_cache_get_entry.
(pph_tag_is_tree_code): New.
(pph_tree_code_to_tag): New.
(pph_tag_to_tree_code): New.
* pph-streamer-in.c (ALLOC_AND_REGISTER): Add new argument
TAG.  Update all users.
(ALLOC_AND_REGISTER_ALTERNATE): Likewise.
(pph_in_start_record): Add new argument EXPECTED_TAG.  Update
all users.
(pph_in_c_language_function): Inline into ...
(pph_in_language_function): ... here.
* pph-streamer-out.c (pph_get_marker_for): Add argument TAG.
Update all users.
(pph_out_reference_record): Likewise.
(pph_out_start_record): Likewise.
(pph_out_start_tree_record): Re-enable state mutation
detection.
(pph_out_c_language_function): Inline into ...
(pph_out_language_function): ... here.

testsuite/ChangeLog.pph

* g++.dg/pph/x6dynarray4.cc: Mark partially fixed.
* g++.dg/pph/x7dynarray5.cc: Mark partially fixed.

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 2cf2edb..dd63715 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -6019,12 +6019,12 @@ pph_out_binding_table (pph_stream *stream, 
binding_table bt)
 {
   if (bt->chain[i])
{
- pph_out_record_marker (stream, PPH_RECORD_START);
+ pph_out_record_marker (stream, PPH_RECORD_START, PPH_binding_entry);
  pph_out_tree (stream, bt->chain[i]->name);
  pph_out_tree (stream, bt->chain[i]->type);
}
   else
-   pph_out_record_marker (stream, PPH_RECORD_END);
+   pph_out_record_marker (stream, PPH_RECORD_END, PPH_binding_entry);
 }
   pph_out_uint (stream, bt->entry_count);
 }
@@ -6042,7 +6042,9 @@ pph_in_binding_table (pph_stream *stream)
   bt = binding_table_new (chain_count);
   for (i = 0; i < chain_count; i++)
 {
-  enum pph_record_marker marker = pph_in_record_marker (stream);
+  enum pph_tag tag;
+  enum pph_record_marker marker = pph_in_record_marker (stream, &tag);
+  gcc_assert (tag == PPH_binding_entry);
   if (marker == PPH_RECORD_START)
{
  tree name = pph_in_tree (stream);
diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
index a2bad3d..197937c 100644
--- a/gcc/cp/pph-streamer-in.c
+++ b/gcc/cp/pph-streamer-in.c
@@ -56,10 +56,10 @@ static int pph_reading_includes = 0;
registered in the PPH streamer cache.  DATA is the pointer returned
by the memory allocation call in ALLOC_EXPR.  IX is the cache slot 
in CACHE where the newly allocated DATA should be registered at.  */
-#define ALLOC_AND_REGISTER(CACHE, IX, DATA, ALLOC_EXPR)\
+#define ALLOC_AND_REGISTER(CACHE, IX, TAG, DATA, ALLOC_EXPR)   \
 do {   \
   (DATA) = (ALLOC_EXPR);   \
-  pph_cache_insert_at (CACHE, DATA, IX);   \
+  pph_cache_insert_at (CACHE, DATA, IX, TAG);  \
 } while (0)
 
 /* Same as ALLOC_AND_REGISTER, but instead of registering DATA into the
@@ -68,10 +68,11 @@ static int pph_reading_includes = 0;
to a different instance when aggregating individual PPH files into
the current translation unit (see pph_in_binding_level for an
example).  */
-#define ALLOC_AND_REGISTER_ALTERNATE(CACHE, IX, DATA, ALLOC_EXPR, ALT_DATA)\
+#define ALLOC_AND_REGISTER_ALTERNATE(CACHE, IX, TAG, DATA, ALLOC_EXPR,  \
+ ALT_DATA)  \
 do {   \
   (DATA) = (ALLOC_EXPR);   \

Re: Intrinsics for N2965: Type traits and base classes

2011-10-03 Thread Jonathan Wakely
On 3 October 2011 02:55, Michael Spertus wrote:
> Index: gcc/c-family/c-common.h
> ===
> --- gcc/c-family/c-common.h     (revision 178892)
> +++ gcc/c-family/c-common.h     (working copy)
> @@ -139,7 +139,8 @@
>   RID_IS_LITERAL_TYPE,         RID_IS_POD,
>   RID_IS_POLYMORPHIC,          RID_IS_STD_LAYOUT,
>   RID_IS_TRIVIAL,              RID_IS_UNION,
> -  RID_UNDERLYING_TYPE,
> +  RID_UNDERLYING_TYPE,         RID_BASES,
> +  RID_DIRECT_BASES,


Should that be kept in alphabetical order?


Fix C6x 24-bit unwinding opcodes

2011-10-03 Thread Paul Brook
The C6XABI-defined personality routines ID 3 and 4 use a single 24-bit block 
word of unwinding data.  Patch below makes sure this is preserved, rather than 
treating it as a set of unwinding opcode bytes.

I seem to have lost this bit of code when I merged the ARM and c6x 
implementations.

Applied to svn trunk.

Paul

2011-10-03  Paul Brook  

libgcc/
* unwind-arm-common.inc: Handle ID3/4 unwinding data.

Index: libgcc/unwind-arm-common.inc
===
--- libgcc/unwind-arm-common.inc(revision 179178)
+++ libgcc/unwind-arm-common.inc(working copy)
@@ -583,7 +583,7 @@ __gnu_unwind_pr_common (_Unwind_State st
   uws.words_left = 0;
   uws.bytes_left = 3;
 }
-  else
+  else if (id < 3)
 {
   uws.words_left = (uws.data >> 16) & 0xff;
   uws.data <<= 16;


Re: [Patch 2/4] ARM 64 bit sync atomic operations [V2]

2011-10-03 Thread Joseph S. Myers
On Mon, 3 Oct 2011, Andrew Haley wrote:

> On 09/30/2011 08:54 PM, Joseph S. Myers wrote:
> > On Fri, 30 Sep 2011, Ramana Radhakrishnan wrote:
> > 
> >> On 26 July 2011 10:01, Dr. David Alan Gilbert  
> >> wrote:
> >>>
> >>> +
> >>> +extern unsigned int __write(int fd, const void *buf, unsigned int count);
> >>
> >> Why are we using __write instead of write?
> > 
> > Because plain write is in the user's namespace in ISO C.  See what I said 
> > in  - the 
> > alternative is hardcoding the syscall number and using the syscall 
> > directly.
> 
> That would be better, no?  Unless __write is part of the glibc API,
> which AFAIK it isn't.

It's exported at version GLIBC_2.0 (not GLIBC_PRIVATE) under the comment 
"functions used in inline functions or macros", although I don't actually 
see any such functions or macros in current glibc headers.  I think being 
under a public version means you can rely on it staying there.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Fix c6x unwinding info

2011-10-03 Thread Paul Brook
Patch below makes gcc emit the required assembly directives for c6x unwinding 
tables, same as ARM and IA64.  This fixes most of the unwinding related 
failures.

Tested on c6x-elf

Ok?

Paul

2011-10-03  Paul Brook  

* config/c6x/c6x.c (c6x_asm_emit_except_personality,
c6x_asm_init_sections): New functions.
(TARGET_ASM_EMIT_EXCEPT_PERSONALITY, TARGET_ASM_INIT_SECTIONS):
Define.

Index: gcc/config/c6x/c6x.c
===
--- gcc/config/c6x/c6x.c(revision 179178)
+++ gcc/config/c6x/c6x.c(working copy)
@@ -5625,6 +5625,26 @@ c6x_debug_unwind_info (void)
 
   return default_debug_unwind_info ();
 }
+
+/* Implement TARGET_ASM_EMIT_EXCEPT_PERSONALITY.  */
+
+static void
+c6x_asm_emit_except_personality (rtx personality)
+{
+  fputs ("\t.personality\t", asm_out_file);
+  output_addr_const (asm_out_file, personality);
+  fputc ('\n', asm_out_file);
+}
+
+/* Use a special assembly directive rather than a regular section for
+   unwind table data.  */
+
+static void
+c6x_asm_init_sections (void)
+{
+  exception_section = get_unnamed_section (0, output_section_asm_op,
+  "\t.handlerdata");
+}
 
 /* Target Structure.  */
 
@@ -5769,6 +5789,12 @@ c6x_debug_unwind_info (void)
 #undef TARGET_ARM_EABI_UNWINDER
 #define TARGET_ARM_EABI_UNWINDER true
 
+#undef TARGET_ASM_EMIT_EXCEPT_PERSONALITY
+#define TARGET_ASM_EMIT_EXCEPT_PERSONALITY c6x_asm_emit_except_personality
+
+#undef TARGET_ASM_INIT_SECTIONS
+#define TARGET_ASM_INIT_SECTIONS c6x_asm_init_sections
+
 #undef TARGET_DEBUG_UNWIND_INFO
 #define TARGET_DEBUG_UNWIND_INFO  c6x_debug_unwind_info
 


Re: [Patch 2/4] ARM 64 bit sync atomic operations [V2]

2011-10-03 Thread Jakub Jelinek
On Mon, Oct 03, 2011 at 03:55:58PM +, Joseph S. Myers wrote:
> > That would be better, no?  Unless __write is part of the glibc API,
> > which AFAIK it isn't.
> 
> It's exported at version GLIBC_2.0 (not GLIBC_PRIVATE) under the comment 
> "functions used in inline functions or macros", although I don't actually 
> see any such functions or macros in current glibc headers.  I think being 
> under a public version means you can rely on it staying there.

On the architectures where it has been exported, yes.  New architectures
might prune those.  But we are talking here about a particular architecture
which had these exported, therefore they will continue to be exported
in the future as well, unless glibc SONAME changes (very unlikely).

Jakub


Re: [0/4] Modulo scheduling with haifa-sched for C6X

2011-10-03 Thread Bernd Schmidt
On 10/03/11 17:26, Richard Sandiford wrote:
> are necessarily equivalent to:
> 
> Loop 3:
>   A
>   B A
>   C B A
>   D C B A
>   A D C B
>   B A D C
>   C B A D
>   D C B A
> D C B
>   D C
> D

Sort of. The insns wouldn't rotate like this in a modulo-scheduled loop.

> Is that right?  So if D from iteration N*4 of loop 1 doesn't alias A
> from iteration N*4+1 of loop 1 (meaning loop 2 is correct) then it
> follows that the D from any iteration M doesn't alias A from M+1
> (meaning loop 3 is correct)?  I'm still not convinced that follows
> for sufficiently clever alias analysis.

Is there any reason to believe that gcc does sufficiently clever alias
analysis? Again, think RTL loop unrolling, if you unroll 3 times

A1
B1
  A2
  B2
A3
B3

then you get identical insns An as well as Bn, and I don't believe
you'll ever get rtl alias analysis to answer differently for pairs
(B1/A2) and (B2/A3). So if there is a way for any of these pairs to
alias in any iteration, it must always detect the conflict.

Also consider that the code in sched-deps doesn't know which loop
iteration it is. Imagine a fully unrolled loop; insns Xn are identical
for a given X and all n. It doesn't matter where you put your
N-iteration window that you pass to sched-deps; since it doesn't know
about the initial conditions, it must act as if it could be from any
start iteration.

Put another way: unless I see a testcase that demonstrates otherwise, I
don't believe there is a problem. If there were a problem, it would be
with the alias analysis rather than the scheduling code.

> Reason for asking is that (AIUI) SMS used to use stronger memory
> disambiguation, but had to pull back to something more conservative
> for similar reasons.

Pointers? All I could find is a thread where rth seems to be of the same
opinion as me:

  http://gcc.gnu.org/ml/gcc/2004-09/msg01648.html


Bernd


New German PO file for 'gcc' (version 4.6.1)

2011-10-03 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators.  The file is available at:

http://translationproject.org/latest/gcc/de.po

(This file, 'gcc-4.6.1.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: Initial shrink-wrapping patch

2011-10-03 Thread Bernd Schmidt
On 10/03/11 13:28, Basile Starynkevitch wrote:
> Regarding this shrink-wrapping patch, I would suggest describing, in a
> comment of one or two sentences, what shrink-wrapping means in the context
> of GCC.

See the documentation part of the patch.


Bernd



Re: Support for V2 plugin API

2011-10-03 Thread H.J. Lu
On Wed, Sep 28, 2011 at 5:29 AM, Jan Hubicka  wrote:
> Hi,
> this patch adds support for V2 plugin API (thanks, Cary) that adds
> LDPR_PREVAILING_DEF_IRONLY_EXP.
> The resolution is like LDPR_PREVAILING_DEF_IRONLY but the symbol is exported
> out of DSO.  It is up to the compiler to optimize it out or keep it based on
> the knowledge whether the symbol can be optimized out at all (i.e. most
> COMDATs can, other types can't).
>
> This solves quite a few problems with building C++ apps, see the PR log.
>
> I was originally wrong about gold implementation being buggy. The problem
> turned out to be a subtle lto-symtab bug that was mostly latent because of
> the COMDAT hack we use. lto_symtab_resolve_symbols is supposed to honor the
> plugin decision when it is available but it doesn't when the resolution of
> the very first entry in the list is UNKNOWN.  This can happen because we
> also add into symtab declarations that are not in varpool (i.e. they are
> neither defined nor used by the object file), but they are used otherwise,
> i.e. referred from stuff used for debug info or TB devirtualization.
>
> To ensure backward compatibility I am keeping the COMDAT hack in place.  It
> won't help letting compiler know the plugin API version, since we decide on
> that at a time we output object files and thus we are not called from
> plugin.  I suppose we could keep the hack in place for next release and
> remove it afterwards penalizing builds with old binutils? Or perhaps even in
> GCC 4.7 if GNU LD gets updated in time.
>
> Bootstrapped/regtested x86_64-linux, built Mozilla and lto-bootstrap in
> progress.
>
> Honza
>
>        PR lto/47247
>        * lto-plugin.c (get_symbols_v2): New variable.
>        (write_resolution): Use V2 API when available.
>        (onload): Handle LDPT_GET_SYMBOLS_V2.
>
>        * lto-symtab.c (lto_symtab_resolve_symbols): Do not resolve
>        when resolution is already available from plugin.
>        (lto_symtab_merge_decls_1): Handle LDPR_PREVAILING_DEF_IRONLY_EXP.
>        * cgraph.c (ld_plugin_symbol_resolution): Add 
> prevailing_def_ironly_exp.
>        * lto-cgraph.c (LDPR_NUM_KNOWN): Update.
>        * ipa.c (varpool_externally_visible_p): IRONLY variables are never
>        externally visible.
>        * varasm.c (resolution_to_local_definition_p): Add
>        LDPR_PREVAILING_DEF_IRONLY_EXP.
>        (resolution_local_p): Likewise.
>
>        * common.c (lto_resolution_str): Add new resolution.
>        * common.h (lto_resolution_str): Likewise.

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50601

-- 
H.J.


[PATCH] Minor fixups to the sparc bmask/bshuffle patterns.

2011-10-03 Thread David Miller

Committed to trunk.

gcc/

* config/sparc/sparc.md (bmask_vis): Split into explicit 'di'
and 'si' patterns which describe the GSR changes explicitly in the
RTL using zero_extract.
(bshuffle_vis): Put the GSR use inside of the unspec.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@179465 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog |7 +++
 gcc/config/sparc/sparc.md |   27 +++
 2 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 028ce8e..9fcee40 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2011-10-03  David S. Miller  
+
+   * config/sparc/sparc.md (bmask_vis): Split into explicit 'di'
+   and 'si' patterns which describe the GSR changes explicitly in the
+   RTL using zero_extract.
+   (bshuffle_vis): Put the GSR use inside of the unspec.
+
 2011-10-03  Artem Shinkarov  
 
* optabs.c (expand_vec_shuffle_expr_p): New function. Checks
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index 6990746..c48c979 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -8228,11 +8228,22 @@
   "array32\t%r1, %r2, %0"
   [(set_attr "type" "array")])
 
-(define_insn "bmask_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-(plus:P (match_operand:P 1 "register_operand" "rJ")
-(match_operand:P 2 "register_operand" "rJ")))
-   (clobber (reg:SI GSR_REG))]
+(define_insn "bmaskdi_vis"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(plus:DI (match_operand:DI 1 "register_operand" "rJ")
+ (match_operand:DI 2 "register_operand" "rJ")))
+   (set (zero_extract:DI (reg:DI GSR_REG) (const_int 32) (const_int 32))
+(plus:DI (match_dup 1) (match_dup 2)))]
+  "TARGET_VIS2"
+  "bmask\t%r1, %r2, %0"
+  [(set_attr "type" "array")])
+
+(define_insn "bmasksi_vis"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+(plus:SI (match_operand:SI 1 "register_operand" "rJ")
+ (match_operand:SI 2 "register_operand" "rJ")))
+   (set (zero_extract:DI (reg:DI GSR_REG) (const_int 32) (const_int 32))
+(zero_extend:DI (plus:SI (match_dup 1) (match_dup 2))))]
   "TARGET_VIS2"
   "bmask\t%r1, %r2, %0"
   [(set_attr "type" "array")])
@@ -8240,9 +8251,9 @@
 (define_insn "bshuffle_vis"
   [(set (match_operand:V64I 0 "register_operand" "=e")
 (unspec:V64I [(match_operand:V64I 1 "register_operand" "e")
- (match_operand:V64I 2 "register_operand" "e")]
- UNSPEC_BSHUFFLE))
-   (use (reg:SI GSR_REG))]
+ (match_operand:V64I 2 "register_operand" "e")
+ (use (reg:SI GSR_REG))]
+ UNSPEC_BSHUFFLE))]
   "TARGET_VIS2"
   "bshuffle\t%1, %2, %0"
   [(set_attr "type" "fga")
-- 
1.7.6.401.g6a319



Re: [PATCH] Fix c6x unwinding info

2011-10-03 Thread Bernd Schmidt
On 10/03/11 17:57, Paul Brook wrote:
> Patch below makes gcc emit the required assembly directives for c6x unwinding 
> tables, same as ARM and IA64.  This fixes most of the unwinding related 
> failures.

Most?

>   * config/c6x/c6x.c (c6x_asm_emit_except_personality,
>   c6x_asm_init_sections): New functions.
>   (TARGET_ASM_EMIT_EXCEPT_PERSONALITY, TARGET_ASM_INIT_SECTIONS):
>   Define.

Ok.


Bernd


Re: Vector shuffling

2011-10-03 Thread Artem Shinkarov
Hi, Richard

There is a problem with the testcases of the patch you have committed
for me. The code in every test-case is doubled. Could you please
apply the following patch? Otherwise all the tests from
the vector-shuffle-patch would fail.

Also, if it is possible, could you change my name in the
ChangeLog from "Artem Shinkarov" to "Artjoms Sinkarovs"? The latter
version is the way my name is spelled in my passport, and the name I use in
the ChangeLog.



Thanks,
Artem.


On Mon, Oct 3, 2011 at 4:13 PM, Richard Henderson  wrote:
> On 10/03/2011 05:14 AM, Artem Shinkarov wrote:
>> Hi, can anyone commit it please?
>>
>> Richard?
>> Or may be Richard?
>
> Committed.
>
>
> r~
>
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
===
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c(revision 
179464)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c(working copy)
@@ -17,55 +17,9 @@ int main (int argc, char *argv[]) {
 /*vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
 vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
 vector (8, short) v2;
-   
-vector (8, short) smask = {0,0,1,2,3,4,5,6};
-
-v2 = __builtin_shuffle (v0,  smask);
-shufcompare (short, 8, v2, v0, smask);
-v2 = __builtin_shuffle (v0, v1);
-shufcompare (short, 8, v2, v0, v1);
-v2 = __builtin_shuffle (smask, v0);
-shufcompare (short, 8, v2, smask, v0);*/
-
-vector (4, int) i0 = {argc, 1,2,3};
-vector (4, int) i1 = {argc, 1, argc, 3};
-vector (4, int) i2;
-
-vector (4, int) imask = {0,3,2,1};
-
-/*i2 = __builtin_shuffle (i0, imask);
-shufcompare (int, 4, i2, i0, imask);*/
-i2 = __builtin_shuffle (i0, i1);
-shufcompare (int, 4, i2, i0, i1);
-
-i2 = __builtin_shuffle (imask, i0);
-shufcompare (int, 4, i2, imask, i0);
-
-return 0;
-}
-
-#define vector(elcount, type)  \
-__attribute__((vector_size((elcount)*sizeof(type)))) type
-
-#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
 
-#define shufcompare(type, count, vres, v0, mask) \
-do { \
-int __i; \
-for (__i = 0; __i < count; __i++) { \
-if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
-__builtin_abort (); \
-} \
-} while (0)
-
-
-int main (int argc, char *argv[]) {
-/*vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
-vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
-vector (8, short) v2;
-   
 vector (8, short) smask = {0,0,1,2,3,4,5,6};
-
+
 v2 = __builtin_shuffle (v0,  smask);
 shufcompare (short, 8, v2, v0, smask);
 v2 = __builtin_shuffle (v0, v1);
@@ -83,10 +37,10 @@ int main (int argc, char *argv[]) {
 shufcompare (int, 4, i2, i0, imask);*/
 i2 = __builtin_shuffle (i0, i1);
 shufcompare (int, 4, i2, i0, i1);
-
+
 i2 = __builtin_shuffle (imask, i0);
 shufcompare (int, 4, i2, imask, i0);
-
+
 return 0;
 }
 
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c
===
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c(revision 
179464)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c(working copy)
@@ -42,47 +42,3 @@ int main (int argc, char *argv[]) {
 return 0;
 }
 
-#define vector(elcount, type)  \
-__attribute__((vector_size((elcount)*sizeof(type)))) type
-
-#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
-
-#define shuf2compare(type, count, vres, v0, v1, mask) \
-do { \
-int __i; \
-for (__i = 0; __i < count; __i++) { \
-if (vidx(type, vres, __i) != ((vidx(type, mask, __i) < count) ? \
-  vidx(type, v0, vidx(type, mask, __i)) :  \
-  vidx(type, v1, (vidx(type, mask, __i) - count)))) \
-__builtin_abort (); \
-} \
-} while (0)
-
-
-int main (int argc, char *argv[]) {
-vector (8, short) v0 = {5, 5,5,5,5,5,argc,7};
-vector (8, short) v1 = {argc, 1,8,8,4,9,argc,4};
-vector (8, short) v2;
-
-//vector (8, short) mask = {1,2,5,4,3,6,7};
-
-vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
-vector (8, short) mask1 = {0,12,3,4,3,0,10,9};
-
-vector (8, short) mask2 = {0,8,1,9,2,10,3,11};
-
-v2 = __builtin_shuffle (v0, v1,  mask0);
-shuf2compare (short, 8, v2, v0, v1, mask0);
-
-v2 = __builtin_shuffle (v0, v1,  mask1);
-shuf2compare (short, 8, v2, v0, v1, mask1);
-
-v2 = __builtin_shuffle (v0, v1,  mask2);
-shuf2compare (short, 8, v2, v0, v1, mask2);
-
-v2 = __builtin_shuffle (mask0, mask0,  v0);
-shuf2compare (short, 8, v2, mask0, mask0, v0);
-
-return 0;
-}
-
Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c
===
--- gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c(revision 
179464)
+++ gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c   

Re: [PATCH] Add sparc VIS 2.0 builtins, intrinsics, and option to control them.

2011-10-03 Thread Richard Henderson
On 10/02/2011 10:28 PM, David Miller wrote:
>> (set (reg:DI GSR_REG)
>>   (unspec:DI [(match_dup 1) (match_dup 2) (reg:DI GSR_REG)]
>>  UNSPEC_BMASK))
> 
> Actually, can't we just use a (zero_extend:DI (plus:SI ...)) for the
> 32-bit case?  It seems to work fine.

Sure.

>>> +(define_insn "bshuffle_vis"
>>> +  [(set (match_operand:V64I 0 "register_operand" "=e")
>>> +(unspec:V64I [(match_operand:V64I 1 "register_operand" "e")
>>> + (match_operand:V64I 2 "register_operand" "e")]
>>> + UNSPEC_BSHUFFLE))
>>> +   (use (reg:SI GSR_REG))]
>>
>> Better to push the use of the GSR_REG into the unspec, and not leave
>> it separate in the parallel.
> 
> This is actually just a non-constant vec_merge, and even though the internals
> documentation says that the 'items' operand has to be a const_int, the 
> compiler
> actually doesn't care.

Um, no it isn't.

The VEC_MERGE pattern uses N bits to select N elements from op0 and op1:

op0= A B C D
op1= W X Y Z
bmask  = 0 1 0 1 = 3
result = A X C D

Your insn doesn't use single bits for the select.  It uses nibbles to
select from the 16 input bytes.  It's akin to the VEC_SELECT pattern,
except that VEC_SELECT requires a constant input parallel.
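
The distinction drawn above can be modelled in plain C.  This is only a
sketch under stated assumptions, not the SPARC or GCC definition: the
demo_* names are hypothetical, the merge polarity (a set bit picks from
op1) is a choice made for this sketch, and the byte and nibble ordering
(op0 supplies source bytes 0-7, op1 bytes 8-15, the most significant
nibble controls result byte 0) is assumed for illustration.

```c
#include <stdint.h>

/* Merge-style select: one control bit per element.  Bit I of BITS
   chooses between op0[I] and op1[I]; an element never moves to a
   different position.  */
static void
demo_merge (const uint8_t *op0, const uint8_t *op1, unsigned n,
            unsigned bits, uint8_t *res)
{
  for (unsigned i = 0; i < n; i++)
    res[i] = ((bits >> i) & 1) ? op1[i] : op0[i];
}

/* Shuffle-style select: one control nibble per result byte.  Each
   nibble is an index into the 16 source bytes formed by op0 followed
   by op1, so any source byte can land in any result position.  */
static void
demo_nibble_shuffle (const uint8_t op0[8], const uint8_t op1[8],
                     uint32_t mask, uint8_t res[8])
{
  uint8_t src[16];

  for (unsigned i = 0; i < 8; i++)
    {
      src[i] = op0[i];
      src[i + 8] = op1[i];
    }

  for (unsigned i = 0; i < 8; i++)
    {
      /* Assumed ordering: nibble 0 is the most significant one.  */
      unsigned sel = (mask >> (28 - 4 * i)) & 0xf;
      res[i] = src[sel];
    }
}
```

With these definitions a mask of 0x01234567 reproduces op0 unchanged,
while a merge can only ever choose between the two elements that
already sit at a given position.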

---

You might have a look at the "Vector Shuffle" thread, where we've been
trying to provide builtin-level access to this feature.  We've not added
an rtx-level code for this because so far there isn't *that* much in
common between the various cpus.  They all seem to differ in niggling
details...

You'll have a somewhat harder time than i386 for this feature, given
that you've got to pack bytes into nibbles.  But it can certainly be done.


r~


Re: Support for V2 plugin API

2011-10-03 Thread Jan Hubicka
> This caused:
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50601
Hi,
do you use linker with V2 API?

Honza
> 
> -- 
> H.J.


Re: [PATCH] Minor fixups to the sparc bmask/bshuffle patterns.

2011-10-03 Thread Richard Henderson
On 10/03/2011 09:43 AM, David Miller wrote:
>  (define_insn "bshuffle_vis"
>[(set (match_operand:V64I 0 "register_operand" "=e")
>  (unspec:V64I [(match_operand:V64I 1 "register_operand" "e")
> -   (match_operand:V64I 2 "register_operand" "e")]
> - UNSPEC_BSHUFFLE))
> -   (use (reg:SI GSR_REG))]
> +   (match_operand:V64I 2 "register_operand" "e")
> +   (use (reg:SI GSR_REG))]
> + UNSPEC_BSHUFFLE))]

I think I was less than clear here.  You don't need the USE either.
The GSR register is simply a normal (third) input to the unspec.


r~


Re: Vector shuffling

2011-10-03 Thread Richard Henderson
On 10/03/2011 09:43 AM, Artem Shinkarov wrote:
> Hi, Richard
> 
> There is a problem with the testcases of the patch you have committed
> for me. The code in every test-case is doubled. Could you please
> apply the following patch? Otherwise all the tests from
> the vector-shuffle-patch would fail.

Huh.  Dunno what happened there.  Fixed.

> Also, if it is possible, could you change my name in the
> ChangeLog from "Artem Shinkarov" to "Artjoms Sinkarovs"? The latter
> version is the way my name is spelled in my passport, and the name I use in
> the ChangeLog.

Fixed.


r~


Re: [wwwdocs] IA-32/x86-64 Changes for upcoming 4.7.0 series

2011-10-03 Thread Kirill Yukhin
Hi,
I did

cvs update
cvs diff > ~/changes.html.www.patch

It is attached. Does it apply?

Thanks, K


changes.html.www.patch
Description: Binary data


Re: Vector shuffling

2011-10-03 Thread Artem Shinkarov
On Mon, Oct 3, 2011 at 6:12 PM, Richard Henderson  wrote:
> On 10/03/2011 09:43 AM, Artem Shinkarov wrote:
>> Hi, Richard
>>
>> There is a problem with the testcases of the patch you have committed
>> for me. The code in every test-case is doubled. Could you please
>> apply the following patch? Otherwise all the tests from
>> the vector-shuffle-patch would fail.
>
> Huh.  Dunno what happened there.  Fixed.
>

This is a common pattern: when a patch adds new files and you apply
the same patch to the code base a second time, the content
of the files is doubled. This is an annoying feature of svn. Maybe
there is a solution to the problem, but I never managed to find one.

>> Also, if it is possible, could you change my name in the
>> ChangeLog from "Artem Shinkarov" to "Artjoms Sinkarovs"? The latter
>> version is the way my name is spelled in my passport, and the name I use in
>> the ChangeLog.
>
> Fixed.

Thank you very much.


Artem.
>
>
> r~
>


Re: [0/4] Modulo scheduling with haifa-sched for C6X

2011-10-03 Thread Richard Sandiford
Bernd Schmidt  writes:
>> Reason for asking is that (AIUI) SMS used to use stronger memory
>> disambiguation, but had to pull back to something more conservative
>> for similar reasons.
>
> Pointers? All I could find is a thread where rth seems to be of the same
> opinion as me:
>
>   http://gcc.gnu.org/ml/gcc/2004-09/msg01648.html

I was thinking of:

http://gcc.gnu.org/ml/gcc-patches/2010-08/msg00294.html

Richard


Re: [PATCH] Fix c6x unwinding info

2011-10-03 Thread Paul Brook
> On 10/03/11 17:57, Paul Brook wrote:
> > Patch below makes gcc emit the required assembly directives for c6x
> > unwinding tables, same as ARM and IA64.  This fixes most of the
> > unwinding related failures.
> 
> Most?

There are a number of failures in g++.dg/torture/stackalign which appear to be 
EH related.  I'm still investigating the exact cause.

Paul


Re: [PATCH] Minor fixups to the sparc bmask/bshuffle patterns.

2011-10-03 Thread David Miller
From: Richard Henderson 
Date: Mon, 03 Oct 2011 10:07:26 -0700

> On 10/03/2011 09:43 AM, David Miller wrote:
>>  (define_insn "bshuffle_vis"
>>[(set (match_operand:V64I 0 "register_operand" "=e")
>>  (unspec:V64I [(match_operand:V64I 1 "register_operand" "e")
>> -  (match_operand:V64I 2 "register_operand" "e")]
>> - UNSPEC_BSHUFFLE))
>> -   (use (reg:SI GSR_REG))]
>> +  (match_operand:V64I 2 "register_operand" "e")
>> +  (use (reg:SI GSR_REG))]
>> + UNSPEC_BSHUFFLE))]
> 
> I think I was less than clear here.  You don't need the USE either.
> The GSR register is simply a normal (third) input to the unspec.

I see, I'll fix this up, thanks Richard.


Re: [PATCH] Add sparc VIS 2.0 builtins, intrinsics, and option to control them.

2011-10-03 Thread David Miller
From: Richard Henderson 
Date: Mon, 03 Oct 2011 09:49:37 -0700

> You might have a look at the "Vector Shuffle" thread, where we've been
> trying to provide builtin-level access to this feature.  We've not added
> an rtx-level code for this because so far there isn't *that* much in
> common between the various cpus.  They all seem to differ in niggling
> details...
> 
> You'll have a somewhat harder time than i386 for this feature, given
> that you've got to pack bytes into nibbles.  But it can certainly be done.

Ok, I'll take a look.


Re: [0/4] Modulo scheduling with haifa-sched for C6X

2011-10-03 Thread Bernd Schmidt
On 10/03/11 19:23, Richard Sandiford wrote:
> Bernd Schmidt  writes:
>>> Reason for asking is that (AIUI) SMS used to use stronger memory
>>> disambiguation, but had to pull back to something more conservative
>>> for similar reasons.
>>
>> Pointers? All I could find is a thread where rth seems to be of the same
>> opinion as me:
>>
>>   http://gcc.gnu.org/ml/gcc/2004-09/msg01648.html
> 
> I was thinking of:
> 
> http://gcc.gnu.org/ml/gcc-patches/2010-08/msg00294.html

What I see in this thread is suggestions that people use
{true,anti,output}_dependence, which are exactly the ones used by
sched-deps. We know that using these is (or rather, must be) safe
because RTL loop unrolling followed by scheduling works. So, anything I
am missing here?


Bernd


Vector Shuffle plans

2011-10-03 Thread Richard Henderson
On 10/03/2011 10:42 AM, David Miller wrote:
>> You might have a look at the "Vector Shuffle" thread, where we've been
>> trying to provide builtin-level access to this feature.  We've not added
>> an rtx-level code for this because so far there isn't *that* much in
>> common between the various cpus.  They all seem to differ in niggling
>> details...
>>
>> You'll have a somewhat harder time than i386 for this feature, given
>> that you've got to pack bytes into nibbles.  But it can certainly be done.
> 
> Ok, I'll take a look.

Oh, you should know that, at present, our generic shuffle support assumes
that shuffles with a constant control (which are also generated by the
vectorizer) get expanded to builtins.  And as builtins we wind up with
lots of them -- one per type.

I'm going to start fixing that in the coming week.

The vectorizer will be changed to emit VEC_SHUFFLE_EXPR.  It will still use
the target hook to see if the constant shuffle is supported.

The lower-vector pass currently tests the target hook and swaps the 
VEC_SHUFFLE_EXPRs that are valid into builtins.  That will be changed
to simply leave them unchanged if the other target hook returns NULL.
As the targets are updated to use vshuffle, the builtins get deleted
to return NULL.  After all targets are updated, we can remove this check
and the target hook itself.  This should preserve bisection on each of
the affected targets.

The rtl expander won't have to change.

The target backends will need to accept an immediate for vshuffle op3,
if anything special ought to be done for constant shuffles.  In addition,
the builtins should be removed, as previously noted.
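
For reference, the element-wise behaviour that the vect-shuffle tests
in the "Vector shuffling" thread check with their shuf2compare macro
can be modelled in plain C.  This is a sketch, not the expander: the
name demo_shuffle2 is hypothetical, and it assumes, as those tests do,
that every mask element lies in the range 0 .. 2*count-1.

```c
#include <assert.h>
#include <stddef.h>

/* Two-input shuffle over COUNT elements.  A mask element below COUNT
   selects from V0; any other element selects from V1 after COUNT has
   been subtracted.  The mask may be a run-time value; a constant mask
   is simply the special case that a target may expand more cheaply.  */
static void
demo_shuffle2 (const short *v0, const short *v1, const short *mask,
               size_t count, short *res)
{
  for (size_t i = 0; i < count; i++)
    {
      /* Precondition assumed by this sketch.  */
      assert (mask[i] >= 0 && (size_t) mask[i] < 2 * count);

      size_t sel = (size_t) mask[i];
      res[i] = sel < count ? v0[sel] : v1[sel - count];
    }
}
```

A mask such as {0,8,1,9,2,10,3,11} therefore interleaves the low halves
of the two inputs.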


r~


Re: [wwwdocs] IA-32/x86-64 Changes for upcoming 4.7.0 series

2011-10-03 Thread H.J. Lu
On Mon, Oct 3, 2011 at 10:19 AM, Kirill Yukhin  wrote:
> Hi,
> I did
>
> cvs update
> cvs diff > ~/changes.html.www.patch
>
> It is attached. Is it applying?
>
> Thanks, K
>

Please use "cvs diff -up" to generate the patch.


-- 
H.J.


Re: Support for V2 plugin API

2011-10-03 Thread H.J. Lu
On Mon, Oct 3, 2011 at 9:52 AM, Jan Hubicka  wrote:
>> This caused:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50601
> Hi,
> do you use linker with V2 API?
>

No.


-- 
H.J.


Re: [0/4] Modulo scheduling with haifa-sched for C6X

2011-10-03 Thread Richard Sandiford
Bernd Schmidt  writes:
> On 10/03/11 19:23, Richard Sandiford wrote:
>> Bernd Schmidt  writes:
>>>> Reason for asking is that (AIUI) SMS used to use stronger memory
>>>> disambiguation, but had to pull back to something more conservative
>>>> for similar reasons.
>>>
>>> Pointers? All I could find is a thread where rth seems to be of the same
>>> opinion as me:
>>>
>>>   http://gcc.gnu.org/ml/gcc/2004-09/msg01648.html
>> 
>> I was thinking of:
>> 
>> http://gcc.gnu.org/ml/gcc-patches/2010-08/msg00294.html
>
> What I see in this thread is suggestions that people use
> {true,anti,output}_dependence, which are exactly the ones used by
> sched-deps. We know that using these is (or rather, must be) safe
> because RTL loop unrolling followed by scheduling works.

But what I'm trying to say is that you're not just doing loop
unrolling followed by scheduling.  You're doing loop unrolling,
followed by scheduling, followed by an overlapping of the unrolled loop
iterations.  It just felt strange that the overlapping was being done
without any additional alias analysis.

But I admit it's a theoretical objection at best, and I'm certainly
not going to be able to come up with an example, so never mind me :-)

Richard


Re: [0/4] Modulo scheduling with haifa-sched for C6X

2011-10-03 Thread Bernd Schmidt
On 10/03/11 20:12, Richard Sandiford wrote:
> But what I'm trying to say is that you're not just doing loop
> unrolling followed by scheduling.  You're doing loop unrolling,
> followed by scheduling, followed by an overlapping of the unrolled loop
> iterations.  It just felt strange that the overlapping was being done
> without any additional alias analysis.

I wouldn't say this is completely accurate either. If we overlap N
iterations of the loop, then we are analyzing and scheduling N
iterations together, so there isn't really additional overlap besides
the loop kernel we find.

The only assumption is that it does not matter whether you analyze
iterations (X .. X + N - 1) or iterations (Y .. Y + N - 1), since they
are indistinguishable at the RTL level. Hence, any schedule we find for
overlapping N iterations must be valid for all starting points.


Bernd


Re: [wwwdocs] IA-32/x86-64 Changes for upcoming 4.7.0 series

2011-10-03 Thread Kirill Yukhin
Done

K

On Mon, Oct 3, 2011 at 10:09 PM, H.J. Lu  wrote:
> On Mon, Oct 3, 2011 at 10:19 AM, Kirill Yukhin  
> wrote:
>> Hi,
>> I did
>>
>> cvs update
>> cvs diff > ~/changes.html.www.patch
>>
>> It is attached. Is it applying?
>>
>> Thanks, K
>>
>
> Please use "cvs diff -up" to generate the patch.
>
>
> --
> H.J.
>


changes.html.www.patch
Description: Binary data


Re: Vector Shuffle plans

2011-10-03 Thread Artem Shinkarov
On Mon, Oct 3, 2011 at 7:07 PM, Richard Henderson  wrote:
> On 10/03/2011 10:42 AM, David Miller wrote:
>>> You might have a look at the "Vector Shuffle" thread, where we've been
>>> trying to provide builtin-level access to this feature.  We've not added
>>> an rtx-level code for this because so far there isn't *that* much in
>>> common between the various cpus.  They all seem to differ in niggling
>>> details...
>>>
>>> You'll have a somewhat harder time than i386 for this feature, given
>>> that you've got to pack bytes into nibbles.  But it can certainly be done.
>>
>> Ok, I'll take a look.
>
> Oh, you should know that, at present, our generic shuffle support assumes
> that shuffles with a constant control (which are also generated by the
> vectorizer) get expanded to builtins.  And as builtins we wind up with
> lots of them -- one per type.
>
> I'm going to start fixing that in the coming week.
>
> The vectorizer will be changed to emit VEC_SHUFFLE_EXPR.  It will still use
> the target hook to see if the constant shuffle is supported.
>
> The lower-vector pass currently tests the target hook and swaps the
> VEC_SHUFFLE_EXPRs that are valid into builtins.  That will be changed
> to simply leave them unchanged if the other target hook returns NULL.
> As the targets are updated to use vshuffle, the builtins get deleted
> to return NULL.  After all targets are updated, we can remove this check
> and the target hook itself.  This should preserve bisection on each of
> the affected targets.
>
> The rtl expander won't have to change.
>
> The target backends will need to accept an immediate for vshuffle op3,
> if anything special ought to be done for constant shuffles.  In addition,
> the builtins should be removed, as previously noted.
>
>
> r~
>

Several orthogonal vector-shuffling issues.

Currently if vec_perm_ok returns false, we do not try to use a new
vshuffle routine. Would it make sense to implement that? The only
potential problem I can see is a possible performance degradation.
This leads us to the second issue.

When we perform vshuffle, we need to know whether it makes sense to use
pshufb (in case of x86) or to perform data movement via standard
non-simd registers. Do we have this information in the current
cost-model? Also, in certain cases, when the mask is constant, I would
assume the memory movement is also faster. For example if the mask is
{4,5,6,7,0,1,2,3...}, then two integer moves should do a better job.
Were there any attempts to perform such an analysis, and if not,
should we formalise the cases in which a substitution of this sort
would make sense?
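
As a rough illustration of the constant-mask case above, here is a minimal sketch in portable C (not GCC code). It assumes a 16-byte vector of eight 16-bit elements and that the mask is exactly {4,5,6,7,0,1,2,3}; both function names are made up for the example. Under those assumptions the shuffle amounts to swapping the two 64-bit halves, which is the kind of substitution being asked about.

```c
#include <stdint.h>
#include <string.h>

/* Generic one-operand shuffle on eight 16-bit elements:
   out[i] = in[mask[i] mod 8].  IN and OUT must not overlap.  */
void shuffle8_generic(const uint16_t in[8], const uint16_t mask[8],
                      uint16_t out[8])
{
  for (int i = 0; i < 8; i++)
    out[i] = in[mask[i] % 8];
}

/* Special case for the constant mask {4,5,6,7,0,1,2,3} (an assumption
   of this sketch): the result is the input with its two 64-bit halves
   swapped, so two 64-bit integer moves are enough.  */
void shuffle8_swap_halves(const uint16_t in[8], uint16_t out[8])
{
  uint64_t lo, hi;

  memcpy(&lo, in, sizeof lo);      /* elements 0..3 */
  memcpy(&hi, in + 4, sizeof hi);  /* elements 4..7 */
  memcpy(out, &hi, sizeof hi);
  memcpy(out + 4, &lo, sizeof lo);
}
```

Whether the two-move form is actually faster than a single shuffle insn is exactly the cost-model question raised here; the sketch only shows that the two forms compute the same result.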


Thanks,
Artem.


[RFC PATCH] restrict_based_on_field attribute

2011-10-03 Thread Jakub Jelinek
Hi!

std::vector acts roughly as something having a restrict pointer field,
i.e. two different std::vector objects will have their pointers pointing
to different arrays.  Unfortunately, unlike e.g. std::valarray, we have 3
different pointers pointing into the array instead of 1, and we can't change
that without breaking the ABI.  This patch adds an extension where several
pointer fields in the same structure can be marked as being a group for
restrict purposes (so for them the ISO C99 restrict wording would not
mandate that all accesses are made through pointers based on, say, _M_start,
but through pointers based on any of the fields in the group: _M_start,
_M_finish or _M_end_of_storage in the std::vector case).

With the patch (on top of the
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01605.html
patch) e.g.
void
f2 (std::vector &__restrict a, std::vector &__restrict b,
    std::vector &__restrict c, int z)
{
  int i;
  for (i = 0; i < z; i++)
    {
      a[i] = b[i] + c[i];
      a[i] += b[i] * c[i];
    }
}
can be vectorized without any runtime overlap tests.
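
For comparison, a minimal sketch of the plain C99 analogue of this loop on raw pointers, where each parameter is itself restrict-qualified. It is only meant to show the aliasing guarantee the attribute tries to recover for std::vector; the function name is hypothetical and nothing here is part of the patch.

```c
#include <stddef.h>

/* Because a, b and c are restrict-qualified, the compiler may assume
   that the stores to a[] never modify b[] or c[], so b[i] and c[i]
   need not be reloaded between the two statements.  */
void fused_update(int *restrict a, const int *restrict b,
                  const int *restrict c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      a[i] = b[i] + c[i];
      a[i] += b[i] * c[i];
    }
}
```

Whether a given compiler actually vectorizes this without runtime overlap checks depends on the target and options; the caller must of course pass non-overlapping arrays.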

The patch is still incomplete, I haven't added docs, nor testcases
and haven't tweaked expected line numbers in 4
23_containers/vector/requirements/dr438 libstdc++-v3 tests, but
has been otherwise bootstrapped/regtested on x86_64-linux and i686-linux.

If you have ideas for a better attribute name, etc., please let me know.

2011-10-03  Jakub Jelinek  

* c-decl.c (finish_struct): Call finish_restrict_based_on_field.
* tree-ssa-structalias.c (NO_RESTRICT_ID): Define.
(struct variable_info): Add restrict_id field.
(new_var_info): Initialize it.
(make_constraint_from_restrict): Change return type to varinfo_t,
return vi.
(struct restrict_id_struct): New type.
(make_constraint_from_restrict_id): New function.
(struct fieldoff): Add restrict_id field.
(push_fields_onto_fieldstack): Add restrictstack argument,
initialize restrict_id fields, handle fields with
restrict_based_on_field attribute.
(create_variable_info_for_1): Adjust caller, initialize
restrict_id fields.
(create_variable_info_for, intra_create_variable_infos): Call
make_constraint_from_restrict_id instead of
make_constraint_from_restrict where appropriate.
gcc/cp/
* class.c (finish_struct_1): Call finish_restrict_based_on_field.
gcc/c-family/
* c-common.h (finish_restrict_based_on_field): New prototype.
* c-common.c (handle_restrict_based_on_field_attribute,
finish_restrict_based_on_field): New functions.
(c_common_attribute_table): Add restrict_based_on_field attribute.
(attribute_takes_identifier_p): Return true for
restrict_based_on_field attribute.
libstdc++-v3/
* include/bits/stl_vector.h (struct _Vector_impl): Add
__restrict_based_on_field__ attributes to _M_start, _M_finish
and _M_end_of_storage.

--- gcc/c-decl.c.jj 2011-10-03 14:27:47.0 +0200
+++ gcc/c-decl.c2011-10-03 15:20:53.0 +0200
@@ -7198,6 +7198,8 @@ finish_struct (location_t loc, tree t, t
   }
   }
 
+  finish_restrict_based_on_field (t);
+
   for (x = TYPE_MAIN_VARIANT (t); x; x = TYPE_NEXT_VARIANT (x))
 {
   TYPE_FIELDS (x) = TYPE_FIELDS (t);
--- gcc/cp/class.c.jj   2011-10-03 14:27:49.0 +0200
+++ gcc/cp/class.c  2011-10-03 15:20:53.0 +0200
@@ -5811,6 +5811,8 @@ finish_struct_1 (tree t)
   /* Build the VTT for T.  */
   build_vtt (t);
 
+  finish_restrict_based_on_field (t);
+
   /* This warning does not make sense for Java classes, since they
  cannot have destructors.  */
   if (!TYPE_FOR_JAVA (t) && warn_nonvdtor && TYPE_POLYMORPHIC_P (t))
--- gcc/c-family/c-common.h.jj  2011-10-03 14:27:48.0 +0200
+++ gcc/c-family/c-common.h 2011-10-03 15:20:53.0 +0200
@@ -544,6 +544,7 @@ extern tree build_indirect_ref (location
 
 extern int c_expand_decl (tree);
 
+extern void finish_restrict_based_on_field (tree);
 extern int field_decl_cmp (const void *, const void *);
 extern void resort_sorted_fields (void *, void *, gt_pointer_operator,
  void *);
--- gcc/c-family/c-common.c.jj  2011-10-03 14:27:47.0 +0200
+++ gcc/c-family/c-common.c 2011-10-03 15:20:53.0 +0200
@@ -373,6 +373,8 @@ static tree handle_alloc_size_attribute 
 static tree handle_target_attribute (tree *, tree, tree, int, bool *);
 static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
+static tree handle_restrict_based_on_field_attribute (tree *, tree, tree,
+ int, bool *);
 static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
 
 static void check_function_nonnull (tree, int, tree *);
@@ -704,6 +706,9 @@ const struct attribute_spec c_common_att
 

Re: [wwwdocs] IA-32/x86-64 Changes for upcoming 4.7.0 series

2011-10-03 Thread H.J. Lu
On Mon, Oct 3, 2011 at 11:23 AM, Kirill Yukhin  wrote:
> Done
>
> K
>
> On Mon, Oct 3, 2011 at 10:09 PM, H.J. Lu  wrote:
>> On Mon, Oct 3, 2011 at 10:19 AM, Kirill Yukhin  
>> wrote:
>>> Hi,
>>> I did
>>>
>>> cvs update
>>> cvs diff > ~/changes.html.www.patch
>>>
>>> It is attached. Is it applying?
>>>
>>> Thanks, K
>>>
>>
>> Please use "cvs diff -up" to generate the patch.
>>
>>

The new patch looks very strange.  I don't think it can apply.

-- 
H.J.


Re: Vector Shuffle plans

2011-10-03 Thread Richard Henderson
On 10/03/2011 11:40 AM, Artem Shinkarov wrote:
> Currently if vec_perm_ok returns false, we do not try to use a new
> vshuffle routine. Would it make sense to implement that? The only
> potential problem I can see is a possible performance degradation.
> This leads us to the second issue.

Implement that where?  In the vectorizer?  No, I don't think so.
The _ok routine, while also indicating what the backend expander
supports, could also be thought of as a cost cutoff predicate.
Unless the vectorization folk request some more exact cost metric
I don't see any reason to change this.

> When we perform vshuffle, we need to know whether it makes sense to use
> pshufb (in case of x86) or to perform data movement via standard
> non-simd registers. Do we have this information in the current
> cost-model?

Not really.  Again, if you're talking about the vectorizer, it
gets even more complicated than this because...

> Also, in certain cases, when the mask is constant, I would
> assume the memory movement is also faster. For example if the mask is
> {4,5,6,7,0,1,2,3...}, then two integer moves should do a better job.

... even before SSSE3 PSHUFB, we have all sorts of insns that can
perform a constant shuffle without having to resort to either
general-purpose registers or memory.  E.g. PSHUFD.  For specific
data types, we can handle arbitrary constant shuffle with 1 or 2
insns, even when arbitrary variable shuffles aren't.

It's certainly something that we could add to tree-vect-generic.c.
I have no plans to do anything of the sort, however.


r~


[v3] testsuite tweak

2011-10-03 Thread Jonathan Wakely
2011-10-03  Jonathan Wakely  

* testsuite/20_util/pointer_traits/pointer_to.cc: Define equality
operator and use.

Tested x86_64-linux, committed to trunk.
Index: testsuite/20_util/pointer_traits/pointer_to.cc
===
--- testsuite/20_util/pointer_traits/pointer_to.cc	(revision 179472)
+++ testsuite/20_util/pointer_traits/pointer_to.cc	(working copy)
@@ -29,12 +29,14 @@ struct Ptr
   static Ptr pointer_to(bool& b) { return Ptr{&b}; }
 };
 
+bool operator==(const Ptr& l, const Ptr& r) { return l.value == r.value; }
+
 void test01()
 {
   bool test = true;
-  Ptr p __attribute__((unused)) {&test};
+  Ptr p{ &test };
 
-  VERIFY( std::pointer_traits::pointer_to(test).value == &test );
+  VERIFY( std::pointer_traits::pointer_to(test) == p );
 }
 
 void test02()


Re: [Patch] Support DEC-C extensions

2011-10-03 Thread Douglas Rupp

On 9/30/2011 8:19 AM, Joseph S. Myers wrote:
> On Fri, 30 Sep 2011, Tristan Gingold wrote:
>
>> If you prefer a target hook, I'm fine with that.  I will write such a patch.
>>
>> I don't think it must be restricted to system headers, as it is possible
>> that the user 'imports' such a function (and define it in one of VMS
>> favorite languages such as macro-32 or bliss).
>
> If it's not restricted to system headers, then probably the option is
> better than the target hook.

I'm not sure I understand the reasoning here.  This seems fairly VMS
specific so what is the downside for a target hook and user written headers?


Re: Vector Shuffle plans

2011-10-03 Thread Artem Shinkarov
On Mon, Oct 3, 2011 at 8:02 PM, Richard Henderson  wrote:
> On 10/03/2011 11:40 AM, Artem Shinkarov wrote:
>> Currently if vec_perm_ok returns false, we do not try to use a new
>> vshuffle routine. Would it make sense to implement that? The only
>> potential problem I can see is a possible performance degradation.
>> This leads us to the second issue.
>
> Implement that where?  In the vectorizer?  No, I don't think so.
> The _ok routine, while also indicating what the backend expander
> supports, could also be thought of as a cost cutoff predicate.
> Unless the vectorization folk request some more exact cost metric
> I don't see any reason to change this.

I was thinking more about the expander of the backend itself. When we
throw sorry () in the ix86_expand_vec_perm_builtin, we can fall back
to the vshuffle routine, unless that would lead to a performance
degradation.

>> When we perform vshuffle, we need to know whether it makes sense to use
>> pshufb (in case of x86) or to perform data movement via standard
>> non-simd registers. Do we have this information in the current
>> cost-model?
>
> Not really.  Again, if you're talking about the vectorizer, it
> gets even more complicated than this because...
>
>> Also, in certain cases, when the mask is constant, I would
>> assume the memory movement is also faster. For example if the mask is
>> {4,5,6,7,0,1,2,3...}, then two integer moves should do a better job.
>
> ... even before SSSE3 PSHUFB, we have all sorts of insns that can
> perform a constant shuffle without having to resort to either
> general-purpose registers or memory.  E.g. PSHUFD.  For specific
> data types, we can handle arbitrary constant shuffle with 1 or 2
> insns, even when arbitrary variable shuffles aren't.

But these cases are more or less covered. I am thinking about the
cases where vec_perm_ok returns false, but the actual permutation
could be done faster with memory/register transfers rather than with
PSHUFB & Co.

> It's certainly something that we could add to tree-vect-generic.c.
> I have no plans to do anything of the sort, however.

I didn't quite understand what you think could be added to
tree-vect-generic.c. I thought that we were talking about more or less
backend issues.

In any case I am investigating these problems, and I would appreciate
any help or advice.


Thanks,
Artem.


Re: [wwwdocs] IA-32/x86-64 Changes for upcoming 4.7.0 series

2011-10-03 Thread H.J. Lu
On Mon, Oct 3, 2011 at 12:01 PM, H.J. Lu  wrote:
> On Mon, Oct 3, 2011 at 11:23 AM, Kirill Yukhin  
> wrote:
>> Done
>>
>> K
>>
>> On Mon, Oct 3, 2011 at 10:09 PM, H.J. Lu  wrote:
>>> On Mon, Oct 3, 2011 at 10:19 AM, Kirill Yukhin  
>>> wrote:
>>>> Hi,
>>>> I did
>>>>
>>>> cvs update
>>>> cvs diff > ~/changes.html.www.patch
>>>>
>>>> It is attached. Is it applying?
>>>>
>>>> Thanks, K
>>>>
>>>
>>> Please use "cvs diff -up" to generate the patch.
>>>
>>>
>

I checked it in.

Thanks.

-- 
H.J.


Patch committed: Fix -fdump-go-spec with large enum values

2011-10-03 Thread Ian Lance Taylor
Jakub discovered that -fdump-go-spec crashes when it tries to print an
enum value which does not fit in a signed HOST_WIDE_INT.  This patch
fixes the problem.  Bootstrapped and tested on x86_64-unknown-linux-gnu.
Committed to mainline.

Ian


2011-10-03  Jakub Jelinek  
Ian Lance Taylor  

* godump.c (go_output_typedef): Support printing enum values that
don't fit in a signed HOST_WIDE_INT.


Index: godump.c
===
--- godump.c	(revision 179472)
+++ godump.c	(working copy)
@@ -920,9 +920,20 @@ go_output_typedef (struct godump_contain
 	  if (*slot == NULL)
 	{
 	  *slot = CONST_CAST (char *, name);
-	  fprintf (go_dump_file,
-		   "const _%s = " HOST_WIDE_INT_PRINT_DEC "\n",
-		   name, tree_low_cst (TREE_VALUE (element), 0));
+	  fprintf (go_dump_file, "const _%s = ", name);
+	  if (host_integerp (TREE_VALUE (element), 0))
+		fprintf (go_dump_file, HOST_WIDE_INT_PRINT_DEC,
+			 tree_low_cst (TREE_VALUE (element), 0));
+	  else if (host_integerp (TREE_VALUE (element), 1))
+		fprintf (go_dump_file, HOST_WIDE_INT_PRINT_UNSIGNED,
+			 ((unsigned HOST_WIDE_INT)
+			  tree_low_cst (TREE_VALUE (element), 1)));
+	  else
+		fprintf (go_dump_file, HOST_WIDE_INT_PRINT_DOUBLE_HEX,
+			 ((unsigned HOST_WIDE_INT)
+			  TREE_INT_CST_HIGH (TREE_VALUE (element))),
+			 TREE_INT_CST_LOW (TREE_VALUE (element)));
+	  fprintf (go_dump_file, "\n");
 	}
 	}
   pointer_set_insert (container->decls_seen, TREE_TYPE (decl));
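
The three-way choice made in the hunk above can be sketched in standalone C. This is only an illustration under stated assumptions: a 64-bit HOST_WIDE_INT, a hypothetical two-word `struct wide_value` standing in for the tree constant, and a hex format picked for the example rather than the actual HOST_WIDE_INT_PRINT_DOUBLE_HEX definition.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a double-word integer constant.  */
struct wide_value
{
  int64_t high;   /* upper word, sign-extended */
  uint64_t low;   /* lower word */
};

/* Nonzero if V is representable as a signed 64-bit integer.  */
static int fits_signed(const struct wide_value *v)
{
  return (v->high == 0 && v->low <= (uint64_t) INT64_MAX)
         || (v->high == -1 && v->low > (uint64_t) INT64_MAX);
}

/* Nonzero if V is representable as an unsigned 64-bit integer.  */
static int fits_unsigned(const struct wide_value *v)
{
  return v->high == 0;
}

/* Reinterpret LOW as a two's-complement signed value without relying
   on implementation-defined conversions.  */
static int64_t to_signed(uint64_t low)
{
  return low <= (uint64_t) INT64_MAX ? (int64_t) low : -(int64_t) (~low) - 1;
}

/* Format V into BUF: signed decimal if it fits, else unsigned decimal,
   else both words in hex.  Returns what snprintf returns.  */
int format_wide_value(char *buf, size_t size, const struct wide_value *v)
{
  if (fits_signed(v))
    return snprintf(buf, size, "%" PRId64, to_signed(v->low));
  if (fits_unsigned(v))
    return snprintf(buf, size, "%" PRIu64, v->low);
  return snprintf(buf, size, "0x%" PRIx64 "%016" PRIx64,
                  (uint64_t) v->high, v->low);
}
```

The order of the checks matters: trying the signed form first keeps small and negative values in decimal, and only values that overflow both single-word forms fall through to hex.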


Re: [Patch] Support DEC-C extensions

2011-10-03 Thread Joseph S. Myers
On Mon, 3 Oct 2011, Douglas Rupp wrote:

> On 9/30/2011 8:19 AM, Joseph S. Myers wrote:
> > On Fri, 30 Sep 2011, Tristan Gingold wrote:
> > 
> > > If you prefer a target hook, I'm fine with that.  I will write such a
> > > patch.
> > > 
> > > I don't think it must be restricted to system headers, as it is possible
> > > that the user 'imports' such a function (and define it in one of VMS
> > > favorite languages such as macro-32 or bliss).
> > If it's not restricted to system headers, then probably the option is
> > better than the target hook.
> > 
> I'm not sure I understand the reasoning here.  This seems fairly VMS specific
> so what is the downside for a target hook and user written headers?

The language accepted by the compiler in the user's source code (as 
opposed to in system headers) shouldn't depend on the target except for 
certain well-defined areas such as target attributes and built-in 
functions; behaving the same across different systems is an important 
feature of GCC.  This isn't one of those areas of target-dependence; it's 
generic syntax rather than e.g. exploiting a particular processor feature.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [3/4] SMS: Record moves in the partial schedule

2011-10-03 Thread Ayal Zaks
On Wed, Sep 28, 2011 at 4:53 PM, Richard Sandiford
 wrote:
> Ayal Zaks  writes:
>>>> Only request is to document that the register moves are
>>>> placed/assigned-id's in a specific order.
>>>
>>>I suppose this is the downside of splitting the patches up, sorry,
>>>but the ids are only ordered for the throw-away loop:
>>>
>>> FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps->reg_moves, i, move)
>>>   add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);
>>>
>>>and for the prologue/epilogue code.  Both are replaced in patch 4
>>>with code that doesn't rely on the ordering of ids.
>> Ok then, very well. I was mostly referring to the following in
>> prologue/epilogue code, which merely relies on assigning all regmoves
>> of a node consecutive id's:
>
> FWIW, the 4/4 that I just posted did finally get rid of the first_reg_move
> & nreg_moves fields.
>
> Here's a slightly updated patch in line with your 4/4 comments.
> The move->def is now always the id of the predecessor, rather than
> the id of the original ddg node producer.  I've adapted the throw-away
> loop accordingly, but there are no other changes.
>

This is ok.

Just to make sure I follow, the changes were (for this patch):

1. setting up a different move->def for each move

> + move->def = i_reg_move > 0 ? first_move + i_reg_move - 1 : i;

instead of the same original def for all

> +  move->def = i;

2. inserting each move right before its own def, bottom-up:

> +  FOR_EACH_VEC_ELT (ps_reg_move_info, ps->reg_moves, i, move)
> +add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);

instead of inserting each move right before the original def, top-down:

>>> FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps->reg_moves, i, move)
>>>   add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);
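
To make change 1 concrete, here is a tiny standalone sketch of how the quoted expression chains the moves of one node. The helper name and the example ids are made up; only the conditional expression itself comes from the patch.

```c
/* Hypothetical helper mirroring the quoted expression: the first move
   of a node (i_reg_move == 0) reads from the node itself, and every
   later move reads from the move just before it.  */
int reg_move_def(int node_id, int first_move, int i_reg_move)
{
  return i_reg_move > 0 ? first_move + i_reg_move - 1 : node_id;
}
```

With node id 3 and first_move 10, the defs of moves 0, 1 and 2 come out as 3, 10 and 11, i.e. each move is fed by its predecessor rather than by the original producer.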

Thanks,
Ayal.


> Tested on powerpc64-linux-gnu and with ARM benchmarks.
>
> Richard
>
> gcc/
>        * modulo-sched.c (ps_insn): Adjust comment.
>        (ps_reg_move_info): New structure.
>        (partial_schedule): Add reg_moves field.
>        (SCHED_PARAMS): Use node_sched_param_vec instead of node_sched_params.
>        (node_sched_params): Turn first_reg_move into an identifier.
>        (ps_reg_move): New function.
>        (ps_rtl_insn): Cope with register moves.
>        (ps_first_note): Adjust comment and assert that the instruction
>        isn't a register move.
>        (node_sched_params): Replace with...
>        (node_sched_param_vec): ...this vector.
>        (set_node_sched_params): Adjust accordingly.
>        (print_node_sched_params): Take a partial schedule instead of a ddg.
>        Use ps_rtl_insn and ps_reg_move.
>        (generate_reg_moves): Rename to...
>        (schedule_reg_moves): ...this.  Remove rescan parameter.  Record each
>        move in the partial schedule, but don't emit it here.  Don't perform
>        register substitutions here either.
>        (apply_reg_moves): New function.
>        (duplicate_insns_of_cycles): Use register indices directly,
>        rather than finding instructions using PREV_INSN.  Use ps_reg_move.
>        (sms_schedule): Call schedule_reg_moves before committing to
>        a partial schedule.   Try the next ii if the schedule fails.
>        Use apply_reg_moves instead of generate_reg_moves.  Adjust
>        call to print_node_sched_params.  Free node_sched_param_vec
>        instead of node_sched_params.
>        (create_partial_schedule): Initialize reg_moves.
>        (free_partial_schedule): Free reg_moves.
>
> Index: gcc/modulo-sched.c
> ===
> --- gcc/modulo-sched.c  2011-09-28 09:03:10.334301485 +0100
> +++ gcc/modulo-sched.c  2011-09-28 11:24:26.530300781 +0100
> @@ -124,7 +124,9 @@ #define PS_STAGE_COUNT(ps) (((partial_sc
>  /* A single instruction in the partial schedule.  */
>  struct ps_insn
>  {
> -  /* The number of the ddg node whose instruction is being scheduled.  */
> +  /* Identifies the instruction to be scheduled.  Values smaller than
> +     the ddg's num_nodes refer directly to ddg nodes.  A value of
> +     X - num_nodes refers to register move X.  */
>   int id;
>
>   /* The (absolute) cycle in which the PS instruction is scheduled.
> @@ -137,6 +139,30 @@ struct ps_insn
>
>  };
>
> +/* Information about a register move that has been added to a partial
> +   schedule.  */
> +struct ps_reg_move_info
> +{
> +  /* The source of the move is defined by the ps_insn with id DEF.
> +     The destination is used by the ps_insns with the ids in USES.  */
> +  int def;
> +  sbitmap uses;
> +
> +  /* The original form of USES' instructions used OLD_REG, but they
> +     should now use NEW_REG.  */
> +  rtx old_reg;
> +  rtx new_reg;
> +
> +  /* An instruction that sets NEW_REG to the correct value.  The first
> +     move associated with DEF will have an rhs of OLD_REG; later moves
> +     use the result of the previous move.  */
> +  rtx insn;
> +};
> +
> +typedef struct ps

Re: [Patch] Support DEC-C extensions

2011-10-03 Thread Douglas Rupp

On 10/3/2011 1:23 PM, Joseph S. Myers wrote:
> The language accepted by the compiler in the user's source code (as
> opposed to in system headers) shouldn't depend on the target except for
> certain well-defined areas such as target attributes and built-in
> functions; behaving the same across different systems is an important
> feature of GCC.  This isn't one of those areas of target-dependence; it's
> generic syntax rather than e.g. exploiting a particular processor feature.

I understand now, thanks for explaining.


Re: Patch committed: Fix -fdump-go-spec with large enum values

2011-10-03 Thread Jakub Jelinek
On Mon, Oct 03, 2011 at 01:09:17PM -0700, Ian Lance Taylor wrote:
> Jakub discovered that -fdump-go-spec crashes when it tries to print an
> enum value which does not fit in a signed HOST_WIDE_INT.  This patch
> fixes the problem.  Bootstrapped and tested on x86_64-unknown-linux-gnu.
> Committed to mainline.

And here is 4.6 backport that I've committed to 4.6 branch.

2011-10-03  Jakub Jelinek  
Ian Lance Taylor  

* godump.c (go_output_typedef): Support printing enum values that
don't fit in a signed HOST_WIDE_INT.

--- gcc/godump.c(revision 179479)
+++ gcc/godump.c(working copy)
@@ -844,9 +844,24 @@ go_output_typedef (struct godump_contain
   for (element = TYPE_VALUES (TREE_TYPE (decl));
   element != NULL_TREE;
   element = TREE_CHAIN (element))
-   fprintf (go_dump_file, "const _%s = " HOST_WIDE_INT_PRINT_DEC "\n",
-IDENTIFIER_POINTER (TREE_PURPOSE (element)),
-tree_low_cst (TREE_VALUE (element), 0));
+   {
+ fprintf (go_dump_file, "const _%s = ",
+  IDENTIFIER_POINTER (TREE_PURPOSE (element)));
+ if (host_integerp (TREE_VALUE (element), 0))
+   fprintf (go_dump_file, HOST_WIDE_INT_PRINT_DEC,
+tree_low_cst (TREE_VALUE (element), 0));
+ else if (host_integerp (TREE_VALUE (element), 1))
+   fprintf (go_dump_file, HOST_WIDE_INT_PRINT_UNSIGNED,
+((unsigned HOST_WIDE_INT)
+ tree_low_cst (TREE_VALUE (element), 1)));
+ else
+   fprintf (go_dump_file, HOST_WIDE_INT_PRINT_DOUBLE_HEX,
+((unsigned HOST_WIDE_INT)
+ TREE_INT_CST_HIGH (TREE_VALUE (element))),
+TREE_INT_CST_LOW (TREE_VALUE (element)));
+ fprintf (go_dump_file, "\n");
+   }
+
   pointer_set_insert (container->decls_seen, TREE_TYPE (decl));
   if (TYPE_CANONICAL (TREE_TYPE (decl)) != NULL_TREE)
pointer_set_insert (container->decls_seen,

Jakub


[Patch, Fortran] PR 35831: [F95] Shape mismatch check missing for dummy procedure argument

2011-10-03 Thread Janus Weil
Hi all,

here is a patch for a rather long-standing PR. It continues my ongoing
campaign of improving the checks for "procedure characteristics" (cf.
F08 chapter 12.3), which are relevant for dummy procedures, procedure
pointer assignments, overriding of type-bound procedures, etc.

This particular patch checks for the correct shape of array arguments,
in a manner similar to the recently added check for the string length
(PR 49638), namely via 'gfc_dep_compare_expr'.

The hardest thing about this PR was to find out what exactly the
standard requires (cf. c.l.f. thread linked in comment #12): Only the
shape of the argument has to match (i.e. upper minus lower bound), not
the bounds themselves (no matter if the bounds are constant or not).
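
For the constant-bound case, the rule boils down to comparing extents rather than bounds. A minimal standalone sketch (not gfortran code; the function name is hypothetical, and non-constant bounds, which the patch handles via gfc_dep_compare_expr, are out of scope here):

```c
#include <stddef.h>

/* Returns 1 if both arrays have the same extent (upper - lower) in
   every dimension, 0 otherwise.  The bounds themselves may differ.  */
int shapes_match(const long *lower1, const long *upper1,
                 const long *lower2, const long *upper2, size_t rank)
{
  for (size_t i = 0; i < rank; i++)
    if (upper1[i] - lower1[i] != upper2[i] - lower2[i])
      return 0;
  return 1;
}
```

So a(1:2) and a(2:3) are compatible, while a(1:2) and a(2:4) are not, which matches the s1/s2/s3 cases in the testcase.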

I also added a FIXME, in order to remind myself of adding the same
check for function results soon.

The patch was regtested on x86_64-unknown-linux-gnu. Ok for trunk?

Cheers,
Janus


2011-10-03  Janus Weil  

PR fortran/35831
* interface.c (check_dummy_characteristics): Check the array shape.


2011-10-03  Janus Weil  

PR fortran/35831
* gfortran.dg/dummy_procedure_6.f90: New.
Index: gcc/fortran/interface.c
===
--- gcc/fortran/interface.c	(revision 179468)
+++ gcc/fortran/interface.c	(working copy)
@@ -69,6 +69,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "gfortran.h"
 #include "match.h"
+#include "arith.h"
 
 /* The current_interface structure holds information about the
interface currently being parsed.  This structure is saved and
@@ -1071,13 +1072,51 @@ check_dummy_characteristics (gfc_symbol *s1, gfc_s
   /* Check array shape.  */
   if (s1->as && s2->as)
 {
+  int i, compval;
+  gfc_expr *shape1, *shape2;
+
   if (s1->as->type != s2->as->type)
 	{
 	  snprintf (errmsg, err_len, "Shape mismatch in argument '%s'",
 		s1->name);
 	  return FAILURE;
 	}
-  /* FIXME: Check exact shape.  */
+
+  if (s1->as->type == AS_EXPLICIT)
+	for (i = 0; i < s1->as->rank + s1->as->corank; i++)
+	  {
+	shape1 = gfc_subtract (gfc_copy_expr (s1->as->upper[i]),
+  gfc_copy_expr (s1->as->lower[i]));
+	shape2 = gfc_subtract (gfc_copy_expr (s2->as->upper[i]),
+  gfc_copy_expr (s2->as->lower[i]));
+	compval = gfc_dep_compare_expr (shape1, shape2);
+	gfc_free_expr (shape1);
+	gfc_free_expr (shape2);
+	switch (compval)
+	{
+	  case -1:
+	  case  1:
+	  case -3:
+		snprintf (errmsg, err_len, "Shape mismatch in dimension %i of "
+			  "argument '%s'", i, s1->name);
+		return FAILURE;
+
+	  case -2:
+		/* FIXME: Implement a warning for this case.
+		gfc_warning ("Possible shape mismatch in argument '%s'",
+			s1->name);*/
+		break;
+
+	  case 0:
+		break;
+
+	  default:
+		gfc_internal_error ("check_dummy_characteristics: Unexpected "
+"result %i of gfc_dep_compare_expr",
+compval);
+		break;
+	}
+	  }
 }
 
   return SUCCESS;
@@ -1131,6 +1170,8 @@ gfc_compare_interfaces (gfc_symbol *s1, gfc_symbol
 			  "of '%s'", name2);
 	  return 0;
 	}
+
+	  /* FIXME: Check array bounds and string length of result.  */
 	}
 
   if (s1->attr.pure && !s2->attr.pure)
! { dg-do compile }
!
! PR 35831: [F95] Shape mismatch check missing for dummy procedure argument
!
! Contributed by Janus Weil 

module m

  implicit none

contains

  ! constant array bounds

  subroutine s1(a)
integer :: a(1:2)
  end subroutine

  subroutine s2(a)
integer :: a(2:3)
  end subroutine

  subroutine s3(a)
integer :: a(2:4)
  end subroutine

  ! non-constant array bounds

  subroutine t1(a,b)
integer :: b
integer :: a(1:b,1:b)
  end subroutine

  subroutine t2(a,b)
integer :: b
integer :: a(1:b,2:b+1)
  end subroutine

  subroutine t3(a,b)
integer :: b
integer :: a(1:b,1:b+1)
  end subroutine

end module


program test
  use m
  implicit none

  call foo(s1)  ! legal
  call foo(s2)  ! legal
  call foo(s3)  ! { dg-error "Shape mismatch in dimension" }

  call bar(t1)  ! legal
  call bar(t2)  ! legal
  call bar(t3)  ! { dg-error "Shape mismatch in dimension" }

contains

  subroutine foo(f)
procedure(s1) :: f
  end subroutine

  subroutine bar(f)
procedure(t1) :: f
  end subroutine

end program

! { dg-final { cleanup-modules "m" } }


Re: Vector shuffling

2011-10-03 Thread Artem Shinkarov
On Mon, Oct 3, 2011 at 6:12 PM, Richard Henderson  wrote:
> On 10/03/2011 09:43 AM, Artem Shinkarov wrote:
>> Hi, Richard
>>
>> There is a problem with the testcases of the patch you have committed
>> for me. The code in every test-case is doubled. Could you please
>> apply the following patch? Otherwise all the tests from
>> the vector-shuffle-patch would fail.
>
> Huh.  Dunno what happened there.  Fixed.
>
>> Also, if it is possible, could you change my name in the
>> ChangeLog from "Artem Shinkarov" to "Artjoms Sinkarovs". The latter
>> version is the way my name is spelled in my passport, and the name I use in
>> the ChangeLog.
>
> Fixed.
>
>
> r~
>

Richard, there was a problem causing segfault in ix86_expand_vshuffle
which I have fixed with the patch attached.

Another thing I cannot figure out is the following case:
#define vector(elcount, type)  \
__attribute__((vector_size((elcount)*sizeof(type)))) type

vector (8, short) __attribute__ ((noinline))
f (vector (8, short) x, vector (8, short) y, vector (8, short) mask) {
    return  __builtin_shuffle (x, y, mask);
}

int main (int argc, char *argv[]) {
    vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
    vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
    vector (8, short) mask0 = {0,2,3,1,4,5,6,7};
    vector (8, short) v2;
    int i;

    v2 = f (v0, v1,  mask0);
    /* v2 =  __builtin_shuffle (v0, v1, mask0); */
    for (i = 0; i < 8; i ++)
      __builtin_printf ("%i, ", v2[i]);

    return 0;
}

I am compiling with support of ssse3, in my case it is ./xgcc -B. b.c
-O3 -mtune=core2 -march=core2

And I get 1, 1, 1, 3, 4, 5, 1, 7, on the output, which is wrong.

But if I call __builtin_shuffle directly, then the answer is correct.
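
For reference, a scalar sketch of what the two-operand shuffle is expected to compute, assuming (as I understand the documented semantics) that mask elements are taken modulo 2N, with indices below N selecting from the first operand and the rest from the second. The function name is made up and this is not the expander's algorithm.

```c
#include <stddef.h>

/* Scalar reference for a two-operand shuffle of N-element vectors:
   out[i] = concat(x, y)[mask[i] mod 2N].  OUT must not overlap the
   inputs.  */
void shuffle2_reference(const short *x, const short *y, const short *mask,
                        short *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      size_t idx = (size_t) mask[i] % (2 * n);
      out[i] = idx < n ? x[idx] : y[idx - n];
    }
}
```

With argc == 1, v0 = {1,1,2,3,4,5,6,7} and mask0 = {0,2,3,1,4,5,6,7}, this gives 1, 2, 3, 1, 4, 5, 6, 7, whereas the reported output 1, 1, 1, 3, 4, 5, 1, 7 is simply v1.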

Any ideas?


Thanks,
Artem.
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 179464)
+++ gcc/config/i386/i386.c  (working copy)
@@ -19312,14 +19312,17 @@ ix86_expand_vshuffle (rtx operands[])
   xops[1] = operands[1];
   xops[2] = operands[2];
   xops[3] = gen_rtx_EQ (mode, mask, w_vector);
-  xops[4] = t1;
-  xops[5] = t2;
+  xops[4] = t2;
+  xops[5] = t1;
 
   return ix86_expand_int_vcond (xops);
 }
 
-  /* mask = mask * {w, w, ...}  */
-  new_mask = expand_simple_binop (maskmode, MULT, new_mask, w_vector,
+  /* mask = mask * {16/w, 16/w, ...}  */
+  for (i = 0; i < w; i++)
+vec[i] = GEN_INT (16/w);
+  vt = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
+  new_mask = expand_simple_binop (maskmode, MULT, new_mask, vt,
  NULL_RTX, 0, OPTAB_DIRECT);
 
   /* Convert mask to vector of chars.  */
@@ -19332,7 +19335,7 @@ ix86_expand_vshuffle (rtx operands[])
  ...  */
   for (i = 0; i < w; i++)
 for (j = 0; j < 16/w; j++)
-  vec[i*w+j] = GEN_INT (i*16/w);
+  vec[i*(16/w)+j] = GEN_INT (i*16/w);
   vt = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
   vt = force_reg (V16QImode, vt);
 
@@ -19344,7 +19347,7 @@ ix86_expand_vshuffle (rtx operands[])
  new_mask = new_mask + {0,1,..,16/w, 0,1,..,16/w, ...}  */
   for (i = 0; i < w; i++)
 for (j = 0; j < 16/w; j++)
-  vec[i*w+j] = GEN_INT (j);
+  vec[i*(16/w)+j] = GEN_INT (j);
 
   vt = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
   new_mask = expand_simple_binop (V16QImode, PLUS, new_mask, vt,
@@ -19386,8 +19389,8 @@ ix86_expand_vshuffle (rtx operands[])
   xops[1] = operands[1];
   xops[2] = operands[2];
   xops[3] = gen_rtx_EQ (mode, mask, w_vector);
-  xops[4] = t1;
-  xops[5] = t2;
+  xops[4] = t2;
+  xops[5] = t1;
 
   return ix86_expand_int_vcond (xops);
 }
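
The indexing change in the patch above can be illustrated with a small
standalone sketch. expand_byte_mask is a hypothetical helper (not the GCC
implementation) that expands a single-operand mask of w elements into the
16-byte selector a byte shuffle consumes; the names and the plain-array
representation are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical helper, not the GCC implementation: expand a shuffle mask of
   w elements (each 16/w bytes wide) into a 16-byte selector.  */
static void
expand_byte_mask (const unsigned *mask, unsigned w, unsigned char out[16])
{
  unsigned bytes, i, j;

  assert (w > 0 && 16 % w == 0);
  bytes = 16 / w;
  for (i = 0; i < w; i++)
    for (j = 0; j < bytes; j++)
      /* Element i owns bytes [i*bytes, (i+1)*bytes).  Indexing with i*w+j
         instead coincides with this only when w == 16/w, i.e. w == 4.  */
      out[i * bytes + j] = (unsigned char) ((mask[i] % w) * bytes + j);
}
```

For w = 8 the old index i*w+j reaches 7*8+1 = 57, which is outside a
16-entry array; presumably that is where the segfault came from.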


Use i386/t-crtstuff for i?86-elf and x86_64-elf

2011-10-03 Thread Joseph S. Myers
i?86-elf and x86_64-elf targets default to
-fasynchronous-unwind-tables, like most other x86 targets, and so
since they build crtend.o they need to build it with
-fno-asynchronous-unwind-tables, like other targets.  This patch
accordingly makes them use i386/t-crtstuff.

I'm not set up for full testing of these targets with FSF sources, but
did a sanity check on this patch by building cc1 for i686-elf, and in
a 4.6-based tree it fixes "ld: error in crtend.o(.eh_frame); no
.eh_frame_hdr table will be created.".  OK to commit?

2011-10-03  Joseph Myers  

* config.gcc (i[34567]86-*-elf*, x86_64-*-elf*): Use
i386/t-crtstuff.

Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 179480)
+++ gcc/config.gcc  (working copy)
@@ -1229,11 +1229,11 @@ x86_64-*-darwin*)
;;
 i[34567]86-*-elf*)
tm_file="${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h newlib-stdint.h i386/i386elf.h"
-   tmake_file="${tmake_file} i386/t-i386elf t-svr4"
+   tmake_file="${tmake_file} i386/t-i386elf i386/t-crtstuff t-svr4"
;;
 x86_64-*-elf*)
tm_file="${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h newlib-stdint.h i386/i386elf.h i386/x86-64.h"
-   tmake_file="${tmake_file} i386/t-i386elf t-svr4"
+   tmake_file="${tmake_file} i386/t-i386elf i386/t-crtstuff t-svr4"
;;
 i[34567]86-*-freebsd*)
tm_file="${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h ${fbsd_tm_file} i386/freebsd.h"

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Start adding support for VIS 3.0 instructions.

2011-10-03 Thread Richard Henderson
On 10/01/2011 11:40 AM, David Miller wrote:
> +;; Conditional moves are possible via fcmpX --> cmaskX -> bshuffle

Does this comment mean you can fairly efficiently implement the
vcond patterns?


r~


Re: Vector shuffling

2011-10-03 Thread H.J. Lu
On Fri, Sep 30, 2011 at 4:21 PM, Artem Shinkarov
 wrote:
> Sorry for that, the vector comparison was submitted earlier. In the
> attachment there is a new version of the patch against the latest
> checkout.
>
> Richard, can you have a look at the genopinit.c, I am using
> set_direct_optab_handler, is it correct?
>
> All the rest seems to be the same.

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50607

-- 
H.J.


PATCH: PR target/50603: [x32] Unnecessary lea

2011-10-03 Thread H.J. Lu
This patch improves address combine for x32 by forcing the memory
operand of a PLUS operation into a register.  Tested on Linux/x86-64 with
-mx32.  OK for trunk?

Thanks.


H.J.
---
2011-10-03  H.J. Lu  

PR target/50603
* config/i386/i386.c (ix86_fixup_binary_operands): Force the
memory operand of PLUS operation into register for x32.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 9b079af..922f691 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -15713,6 +15713,16 @@ ix86_fixup_binary_operands (enum rtx_code code, enum machine_mode mode,
   else
src2 = force_reg (mode, src2);
 }
+  else
+{
+  /* Improve address combine in x32 mode.  */
+  if (TARGET_X32
+ && code == PLUS
+ && !MEM_P (dst)
+ && !MEM_P (src1)
+ && MEM_P (src2) )
+   src2 = force_reg (mode, src2);
+}
 
   /* If the destination is memory, and we do not have matching source
  operands, do things in registers.  */


Re: [RFC] Builtin infrastructure change

2011-10-03 Thread Michael Meissner
On Sat, Oct 01, 2011 at 02:11:27PM +, Joseph S. Myers wrote:
> On Fri, 30 Sep 2011, Michael Meissner wrote:
> 
> > Is this enough of a savings to continue on?  I'm of two minds about it, one is
> 
> The thing to measure is not so much memory as startup time (how long it 
> takes to compile an empty source file), which is important for libraries 
> and programs using a coding style with lots of small source files.

With my current changes, which make the standard builtins lazy but do not yet
cover the machine-dependent builtins, I saw a difference of 0.0022 seconds
(0.0170 vs. 0.0148) on my 3-year-old Intel Core 2 laptop.  I did 14 runs in
total, skipped the fastest 2 runs and the slowest 2 runs, and averaged the 10
runs in the middle.  I built bootstrap builds with release checking from the
top of Subversion head, once without my changes and once with them.
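
The averaging described above is a trimmed mean. A minimal sketch of it
follows; trimmed_mean and cmp_double are hypothetical helper names, and the
actual timings were not necessarily processed this way.

```c
#include <assert.h>
#include <stdlib.h>

/* Three-way comparison for qsort on doubles.  */
static int
cmp_double (const void *a, const void *b)
{
  double x = *(const double *) a, y = *(const double *) b;
  return (x > y) - (x < y);
}

/* Sort the samples in place, drop `trim` samples from each end, and return
   the mean of the rest.  Hypothetical helper for illustration only.  */
static double
trimmed_mean (double *samples, size_t n, size_t trim)
{
  double sum = 0.0;
  size_t i;

  assert (n > 2 * trim);
  qsort (samples, n, sizeof *samples, cmp_double);
  for (i = trim; i < n - trim; i++)
    sum += samples[i];
  return sum / (double) (n - 2 * trim);
}
```

For the measurement above this would be called as trimmed_mean (runs, 14, 2),
once per compiler build, and the two results subtracted.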

So at this point, I'm wondering whether it is worth it to finish the
optimization for lazy builtins.

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899


Re: [PATCH] Start adding support for VIS 3.0 instructions.

2011-10-03 Thread David Miller
From: Richard Henderson 
Date: Mon, 03 Oct 2011 15:19:00 -0700

> On 10/01/2011 11:40 AM, David Miller wrote:
>> +;; Conditional moves are possible via fcmpX --> cmaskX -> bshuffle
> 
> Does this comment mean you can fairly efficiently implement the
> vcond patterns?

That seems to be the case.  So such an expander would emit something
like:

	fcmple32	%f0, %f2, %g1
	cmask32		%g1
	bshuffle	%f0, %f2, %f4

for an "le" conditional move.
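
As a scalar model of the element-wise semantics such an expander has to
provide, consider the sketch below. It models only the intended result of an
"le" conditional move on 32-bit elements; vcond_le32 is a hypothetical name,
and nothing here models the VIS instructions or the GSR themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of an element-wise "le" conditional move: for each element,
   pick if_true when a <= b and if_false otherwise.  Hypothetical helper,
   not part of GCC or of the SPARC backend.  */
static void
vcond_le32 (const int32_t *a, const int32_t *b,
            const int32_t *if_true, const int32_t *if_false,
            int32_t *out, size_t n)
{
  size_t i;

  assert (a && b && if_true && if_false && out);
  for (i = 0; i < n; i++)
    out[i] = a[i] <= b[i] ? if_true[i] : if_false[i];
}
```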

When modes N and M use different vector element types, I'll have
to frob the bitmask produced by the fcmp and adjust the cmask
variant used.

What exactly is supposed to happen when, for example, the comparison
is between two v4hi values and the conditional move is done on
v2si values?  It seems the only requirement is that modes N and M
be vector modes of the same size.


Re: [PATCH] Start adding support for VIS 3.0 instructions.

2011-10-03 Thread Richard Henderson
On 10/03/2011 06:43 PM, David Miller wrote:
> What exactly is supposed to happen when, for example, the comparison
> is between two v4hi values and the conditional move is done on
> v2si values?  It seems the only requirement is that modes N and M
> be vector modes of the same size.

It's supposed to be vector modes of the same size and element count.
I.e. only V4SI and V4SF.  I guess that's not 100% clear from the
documentation...


r~


Re: [PATCH] Minor fixups to the sparc bmask/bshuffle patterns.

2011-10-03 Thread Hans-Peter Nilsson
On Mon, 3 Oct 2011, David Miller wrote:
>   * config/sparc/sparc.md (bmask_vis): Split into explicit 'di'
>   and 'si' patterns which describe the GSR changes explicitly in the
>   RTL using zero_extract.
>   (bshuffle_vis): Put the GSR use inside of the unspec.

(Heh, so I guess USE wasn't really that canonical. ;-)

Beware, ISTM you've now created an exception to your SPARC VIS
programming model that GSR belongs to the programmer (by analogy
with floating-point rounding) as you're letting gcc modify it.

Maybe have a command-line option controlling this and other
possible GSR-setting SIMD patterns from which vectorization
could benefit?

brgds, H-P


Re: Vector shuffling

2011-10-03 Thread Hans-Peter Nilsson
On Fri, 30 Sep 2011, Artem Shinkarov wrote:
>   gcc/doc
>   * extend.texi: Adjust.

Pretty please document the new pattern names in doc/md.texi as
well.  Thanks in advance.

brgds, H-P


Re: Intrinsics for N2965: Type traits and base classes

2011-10-03 Thread Benjamin Kosnik

> OK. Here is a new diff that hopefully takes into account all of
> Jason's and Benjamin's comments. Benjamin's TR2 build patch is not
> repeated (or tested!) here. Benjamin, I'd really appreciate if you
> wouldn't mind confirming I handled that correctly in tr2/type_traits
> (Including the inclusion of std/type_traits).

Hey! Here is a preliminary test suite. Just the basics on this one.
There's a bit of an issue with fundamental types, ICEs, but seems
fixable.

From here on in, just populate the testsuite/tr2/* directories with .cc
files. They will be tested by the testsuite machinery.

Your typelist interface looks pretty good. We should start here for the
interface, and can embellish it after it goes in.

-benjamin
diff --git a/libstdc++-v3/include/tr2/type_traits b/libstdc++-v3/include/tr2/type_traits
new file mode 100644
index 000..94aebf0
--- /dev/null
+++ b/libstdc++-v3/include/tr2/type_traits
@@ -0,0 +1,102 @@
+// TR2 type_traits -*- C++ -*-
+
+// Copyright (C) 2011 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+/** @file tr2/type_traits
+ *  This is a TR2 C++ Library header.
+ */
+
+#ifndef _GLIBCXX_TR2_TYPE_TRAITS
+#define _GLIBCXX_TR2_TYPE_TRAITS 1
+
+#pragma GCC system_header
+#include 
+#include 
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+namespace tr2
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  /**
+   * @defgroup metaprogramming Type Traits
+   * @ingroup utilities
+   *
+   * Compile time type transformation and information.
+   * @{
+   */
+
+  template<typename...>
+struct typelist;
+
+  template<>
+struct typelist<>
+{
+  typedef std::true_type 			empty;
+};
+
+  template<typename _First, typename... _Rest>
+struct typelist<_First, _Rest...>
+{
+  typedef std::false_type 			empty;
+
+  struct first
+  {
+	typedef _First type;
+  };
+
+  struct rest
+  {
+	typedef typelist<_Rest...> 		type;
+  };
+};
+
+  // Sequence abstraction metafunctions default to looking in the type
+  template<typename _Tp>
+struct first : public _Tp::first { };
+
+  template<typename _Tp>
+struct rest : public _Tp::rest { };
+
+  template<typename _Tp>
+struct empty : public _Tp::empty { };
+
+
+  template<typename _Tp>
+struct bases
+{
+  typedef typelist<__bases(_Tp)...> 	type;
+};
+
+  template<typename _Tp>
+struct direct_bases
+{
+  typedef typelist<__direct_bases(_Tp)...> 	type;
+};
+
+_GLIBCXX_END_NAMESPACE_VERSION
+}
+}
+
+#endif // _GLIBCXX_TR2_TYPE_TRAITS
diff --git a/libstdc++-v3/scripts/create_testsuite_files b/libstdc++-v3/scripts/create_testsuite_files
index f4a0bcd..a427eef 100755
--- a/libstdc++-v3/scripts/create_testsuite_files
+++ b/libstdc++-v3/scripts/create_testsuite_files
@@ -32,7 +32,7 @@ cd $srcdir
 # This is the ugly version of "everything but the current directory".  It's
 # what has to happen when find(1) doesn't support -mindepth, or -xtype.
 dlist=`echo [0-9][0-9]*`
-dlist="$dlist abi backward ext performance tr1 decimal"
+dlist="$dlist abi backward ext performance tr1 tr2 decimal"
 find $dlist "(" -type f -o -type l ")" -name "*.cc" -print > $tmp.01
 find $dlist "(" -type f -o -type l ")" -name "*.c" -print > $tmp.02
 cat  $tmp.01 $tmp.02 | sort > $tmp.1
diff --git a/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp b/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
index 8642eb7..19fa0e2 100644
--- a/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
+++ b/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp
@@ -58,6 +58,7 @@ if {[info exists tests_file] && [file exists $tests_file]} {
 lappend subdirs "$srcdir/ext"
 lappend subdirs "$srcdir/performance"
 lappend subdirs "$srcdir/tr1"
+lappend subdirs "$srcdir/tr2"
 lappend subdirs "$srcdir/decimal"
 verbose "subdirs are $subdirs"
 
diff --git a/libstdc++-v3/testsuite/tr2/bases/requirements/explicit_instantiation.cc b/libstdc++-v3/testsuite/tr2/bases/requirements/explicit_instantiation.cc
new file mode 100644
index 000..ddd6d6f
--- /dev/null
+++ b/libstdc++-v3/testsuite/tr2/bases/requirements/explicit_instantiation.cc
@@ -0,0 +1,3

Re: Intrinsics for N2965: Type traits and base classes

2011-10-03 Thread Benjamin Kosnik

> Hey! Here is a preliminary test suite. Just the basics on this one.
> There's a bit of an issue with fundamental types, ICEs, but seems
> fixable.

Here are the preprocessed sources for the ICE.

-benjamin

ice-reflection-1.ii.bz2
Description: application/bzip


Re: [PATCH] Start adding support for VIS 3.0 instructions.

2011-10-03 Thread Jakub Jelinek
On Mon, Oct 03, 2011 at 03:19:00PM -0700, Richard Henderson wrote:
> On 10/01/2011 11:40 AM, David Miller wrote:
> > +;; Conditional moves are possible via fcmpX --> cmaskX -> bshuffle
> 
> Does this comment mean you can fairly efficiently implement the
> vcond patterns?

vcond is actually vcond

Jakub


Re: PATCH: PR target/50603: [x32] Unnecessary lea

2011-10-03 Thread Uros Bizjak
On Tue, Oct 4, 2011 at 1:00 AM, H.J. Lu  wrote:
> This patch improves address combine for x32 by forcing the memory
> operand of a PLUS operation into a register.  Tested on Linux/x86-64 with
> -mx32.  OK for trunk?

Does the patch fix

FAIL: gcc.target/i386/pr45670.c scan-assembler-not lea[lq]

on x32 ?

Uros.


[PATCH] More consistent access to sparc %gsr register.

2011-10-03 Thread David Miller

Committed to trunk.

gcc/

* config/sparc/sparc.md (fpack16_vis, fpackfix_vis, fpack32_vis): Make
GSR_REG an input operand to UNSPEC instead of a parallel USE.
(faligndata_vis): Likewise and use DI mode.
(alignaddrsi_vis, alignaddrdi_vis, alignaddrlsi_vis, alignaddrldi_vis):
Reference GSR_REG in DI mode, simplify convoluted expressions by using
zero_extract.
(bshuffle_vis): Reference GSR_REG in DI mode.
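
To illustrate the zero_extract simplification: the old alignaddr patterns
spelled out the update of the low three GSR bits with and/ior, while the new
ones describe it as a 3-bit field store. A rough C model of both forms is
sketched below. The helper names are hypothetical, the 64-bit type is a
modelling choice, and the sketch assumes bit position 0 means the
least-significant bit.

```c
#include <assert.h>
#include <stdint.h>

/* Mask form, as in the old RTL: (gsr & -8) | (sum & 7).  */
static uint64_t
gsr_align_mask_form (uint64_t gsr, uint64_t sum)
{
  return (gsr & ~(uint64_t) 7) | (sum & 7);
}

/* Bit-field form: store the low `width` bits of value into word at bit
   position pos, counting from the least-significant bit (an assumption of
   this sketch).  Hypothetical helper, not GCC code.  */
static uint64_t
insert_bits (uint64_t word, unsigned width, unsigned pos, uint64_t value)
{
  uint64_t field;

  assert (width > 0 && width < 64 && pos + width <= 64);
  field = (((uint64_t) 1 << width) - 1) << pos;
  return (word & ~field) | ((value << pos) & field);
}
```

Under those assumptions, insert_bits (gsr, 3, 0, sum) and
gsr_align_mask_form (gsr, sum) give the same result for any inputs.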

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@179489 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog |   10 
 gcc/config/sparc/sparc.md |   55 ++--
 2 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3fa190b..bdbe2a3 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,13 @@
+2011-10-03  David S. Miller  
+
+   * config/sparc/sparc.md (fpack16_vis, fpackfix_vis, fpack32_vis): Make
+   GSR_REG an input operand to UNSPEC instead of a parallel USE.
+   (faligndata_vis): Likewise and use DI mode.
+   (alignaddrsi_vis, alignaddrdi_vis, alignaddrlsi_vis, alignaddrldi_vis):
+   Reference GSR_REG in DI mode, simplify convoluted expressions by using
+   zero_extract.
+   (bshuffle_vis): Reference GSR_REG in DI mode.
+
 2011-10-03  Maxim Kuvyrkov  
 
* tree-eh.c (remove_unreachable_handlers): Obvious cleanup.
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index 85d140e..92ec3a6 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -7867,9 +7867,9 @@
 
 (define_insn "fpack16_vis"
   [(set (match_operand:V4QI 0 "register_operand" "=f")
-(unspec:V4QI [(match_operand:V4HI 1 "register_operand" "e")]
- UNSPEC_FPACK16))
-   (use (reg:DI GSR_REG))]
+(unspec:V4QI [(match_operand:V4HI 1 "register_operand" "e")
+  (reg:DI GSR_REG)]
+ UNSPEC_FPACK16))]
   "TARGET_VIS"
   "fpack16\t%1, %0"
   [(set_attr "type" "fga")
@@ -7877,9 +7877,9 @@
 
 (define_insn "fpackfix_vis"
   [(set (match_operand:V2HI 0 "register_operand" "=f")
-(unspec:V2HI [(match_operand:V2SI 1 "register_operand" "e")]
- UNSPEC_FPACKFIX))
-   (use (reg:DI GSR_REG))]
+(unspec:V2HI [(match_operand:V2SI 1 "register_operand" "e")
+  (reg:DI GSR_REG)]
+ UNSPEC_FPACKFIX))]
   "TARGET_VIS"
   "fpackfix\t%1, %0"
   [(set_attr "type" "fga")
@@ -7888,9 +7888,9 @@
 (define_insn "fpack32_vis"
   [(set (match_operand:V8QI 0 "register_operand" "=e")
 (unspec:V8QI [(match_operand:V2SI 1 "register_operand" "e")
- (match_operand:V8QI 2 "register_operand" "e")]
- UNSPEC_FPACK32))
-   (use (reg:DI GSR_REG))]
+ (match_operand:V8QI 2 "register_operand" "e")
+  (reg:DI GSR_REG)]
+ UNSPEC_FPACK32))]
   "TARGET_VIS"
   "fpack32\t%1, %2, %0"
   [(set_attr "type" "fga")
@@ -8053,9 +8053,9 @@
 (define_insn "faligndata_vis"
   [(set (match_operand:V64I 0 "register_operand" "=e")
 (unspec:V64I [(match_operand:V64I 1 "register_operand" "e")
-  (match_operand:V64I 2 "register_operand" "e")]
- UNSPEC_ALIGNDATA))
-   (use (reg:SI GSR_REG))]
+  (match_operand:V64I 2 "register_operand" "e")
+  (reg:DI GSR_REG)]
+ UNSPEC_ALIGNDATA))]
   "TARGET_VIS"
   "faligndata\t%1, %2, %0"
   [(set_attr "type" "fga")
@@ -8065,10 +8065,8 @@
   [(set (match_operand:SI 0 "register_operand" "=r")
 (plus:SI (match_operand:SI 1 "register_or_zero_operand" "rJ")
  (match_operand:SI 2 "register_or_zero_operand" "rJ")))
-   (set (reg:SI GSR_REG)
-(ior:SI (and:SI (reg:SI GSR_REG) (const_int -8))
-(and:SI (plus:SI (match_dup 1) (match_dup 2))
-(const_int 7))))]
+   (set (zero_extract:DI (reg:DI GSR_REG) (const_int 3) (const_int 0))
+(zero_extend:DI (plus:SI (match_dup 1) (match_dup 2))))]
   "TARGET_VIS"
   "alignaddr\t%r1, %r2, %0")
 
@@ -8076,10 +8074,8 @@
   [(set (match_operand:DI 0 "register_operand" "=r")
 (plus:DI (match_operand:DI 1 "register_or_zero_operand" "rJ")
  (match_operand:DI 2 "register_or_zero_operand" "rJ")))
-   (set (reg:SI GSR_REG)
-(ior:SI (and:SI (reg:SI GSR_REG) (const_int -8))
-(and:SI (truncate:SI (plus:DI (match_dup 1) (match_dup 2)))
-(const_int 7))))]
+   (set (zero_extract:DI (reg:DI GSR_REG) (const_int 3) (const_int 0))
+(plus:DI (match_dup 1) (match_dup 2)))]
   "TARGET_VIS"
   "alignaddr\t%r1, %r2, %0")
 
@@ -8087,11 +8083,9 @@
   [(set (match_operand:SI 0 "register_operand" "=r")
 (plus:SI (match_operand:SI 1 "register_or_zero_operand" "rJ")
  (match_operand:SI 2 "register_or_zero_operand" "rJ")))
-   (set (reg:SI G

Re: Use i386/t-crtstuff for i?86-elf and x86_64-elf

2011-10-03 Thread Paolo Bonzini

On 10/04/2011 12:14 AM, Joseph S. Myers wrote:

i?86-elf and x86_64-elf targets default to
-fasynchronous-unwind-tables, like most other x86 targets, and so
since they build crtend.o they need to build it with
-fno-asynchronous-unwind-tables, like other targets.  This patch
accordingly makes them use i386/t-crtstuff.


Ok.

Paolo