Re: LLVM as a gcc plugin?

2009-06-03 Thread Miles Bader
Chris Lattner  writes:
>> Some time ago, there was a discussion about integrating LLVM and GCC
>> [1]. However, with plugin infrastructure in place, could LLVM be
>> plugged into GCC as an additional optimization plugin?
>
> I'd love to see this, but I can't contribute to it directly.  I think
> the plugin interfaces would need small extensions, but there are no
> specific technical issues preventing it from happening.  LLVM has
> certainly progressed a lot since that (really old) email went out :)

Is there a description somewhere of areas where llvm is thought to do
well compared to gcc, and maybe future plans for improvement?

In the (limited) tests I've done, gcc [4.4, but 4.2 yields similar
results] seems to do a lot better than llvm [2.5], but those were C++
code and I wonder if llvm is currently concentrating on C?

-Miles

-- 
Quotation, n. The act of repeating erroneously the words of another. The words
erroneously repeated.



Re: LLVM as a gcc plugin?

2009-06-03 Thread Chris Lattner


On Jun 3, 2009, at 11:30 PM, Uros Bizjak wrote:


Hello!

Some time ago, there was a discussion about integrating LLVM and GCC
[1]. However, with plugin infrastructure in place, could LLVM be
plugged into GCC as an additional optimization plugin?

[1] http://gcc.gnu.org/ml/gcc/2005-11/msg00888.html


Hi Uros,

I'd love to see this, but I can't contribute to it directly.  I think  
the plugin interfaces would need small extensions, but there are no  
specific technical issues preventing it from happening.  LLVM has  
certainly progressed a lot since that (really old) email went out :)


-Chris


LLVM as a gcc plugin?

2009-06-03 Thread Uros Bizjak
Hello!

Some time ago, there was a discussion about integrating LLVM and GCC
[1]. However, with plugin infrastructure in place, could LLVM be
plugged into GCC as an additional optimization plugin?

[1] http://gcc.gnu.org/ml/gcc/2005-11/msg00888.html

Uros.


Help with BLOCK vs BIND_EXPR trees?

2009-06-03 Thread Jerry Quinn
Hi, all.  I have a basic question about GENERIC trees.

I'm playing with writing a front end, and find the distinction between
BLOCK and BIND_EXPR to be somewhat confusing.  In particular, I'm trying
to get a handle on how to represent a function in GENERIC form.

On the surface the texi docs and code comments don't seem to agree:

Section on function trees say FUNCTION_DECL represents a function, and
that DECL_INITIAL is not empty.  But it doesn't say what should be
contained there.  It says DECL_SAVED_TREE should contain the body of the
function.

The comments in tree.h say that DECL_INITIAL holds the body of a
function, with a BLOCK tree at the root.

BLOCK nodes are described under TREE_SSA->GIMPLE, though these nodes are
part of GENERIC if I understand correctly.  In this section, it says
that block scopes and variables are declared in BIND_EXPR nodes.

Can someone please clarify how these things are supposed to relate in
GENERIC form, assuming the default conversion to GIMPLE will be used?

Thanks,
Jerry Quinn



[4.3] Invalid code or invalid optimisation?

2009-06-03 Thread Dave Korn

Good morning everyone,

  I have an interesting situation.  In this bit of code below, extracted from
a simple testcase, I have a singly-linked list:


template <class list_node> class List
{
 public:
  List() : head(__null)
  {
  }
  void insert (list_node *node)
  {
    List_insert (head, node);
  }
  list_node *head;
};

class pthread_mutex: public verifyable_object
{
public:
[ ... data members ... ]
  pthread_mutex (pthread_mutexattr * = __null);
  ~pthread_mutex ();

  class pthread_mutex * next;

private:
  static List<pthread_mutex> mutexes;
};

List<pthread_mutex> pthread_mutex::mutexes;

pthread_mutex::pthread_mutex (pthread_mutexattr *attr) :
[ ... member initialisers ... ]
{
[ ... code ... ]
  mutexes.insert (this);
}


  I am getting unexpected results in the inlined List.insert operation in the
pthread_mutex constructor.  The critical part of the code uses an inlined
interlocked compare-and-exchange asm, that looks like this:


extern __inline__ long
ilockcmpexch (volatile long *t, long v, long c)
{
  return ({
    __typeof (*t) ret;
    __asm __volatile ("lock cmpxchgl %2, %1"
                      : "=a" (ret), "=m" (*t)
                      : "r" (v), "m" (*t), "0" (c));
    ret;
  });
}

template <class list_node> inline void
List_insert (list_node *&head, list_node *node)
{
  if (!node)
    return;
  do
    node->next = head;
  while ((PVOID)ilockcmpexch((LONG volatile *)(&head),
                             (LONG)(node),(LONG)(node->next)) != node->next);
}


  To my surprise, GCCs 4.3.2 and 4.3.3 at -O2 both sink the store to
node->next after the call to ilockcmpexch:


__ZN13pthread_mutexC1EP17pthread_mutexattr:
[ ... code ... ]
L15:
        movl    __ZN13pthread_mutex7mutexesE, %edx   # mutexes.head, D.1991
        movl    %edx, %eax   # D.1991, tmp69
/APP
 # 35 "mxfull.cpp" 1
        lock cmpxchgl %esi, __ZN13pthread_mutex7mutexesE # this,
 # 0 "" 2
/NO_APP
        movl    %eax, -12(%ebp)  # tmp69, ret
        movl    -12(%ebp), %eax  # ret, D.1988
        cmpl    %eax, %edx   # D.1988, D.1991
        jne     L15  #,
        movl    %edx, 36(%esi)   # D.1991, .next


  This is obviously bad news for the consistency of the list; the value of
'head' is cached in %edx and not written to node->next until after the
ilockcmpexch inline, meaning an incompletely-constructed node gets linked on
the front of the chain for a window of several instructions.  By contrast, a
recent build from head does what I want: it writes to node->next in front of
the ilockcmpexch, and only tests its value afterward:


__ZN13pthread_mutexC2EP17pthread_mutexattr:
[ ... code ... ]
L9:
        movl    __ZN13pthread_mutex7mutexesE, %eax   # mutexes.head, D.2119
        movl    %eax, 36(%ebx)   # D.2119, .next
/APP
 # 35 "mxfull.cpp" 1
        lock cmpxchgl %ebx, __ZN13pthread_mutex7mutexesE # this,
 # 0 "" 2
/NO_APP
        movl    %eax, -12(%ebp)  # tmp79, ret
        movl    -12(%ebp), %eax  # ret, D.2120
        cmpl    %eax, 36(%ebx)   # D.2120, .next
        jne     L9   #,


  Adding a "memory" clobber to the inline asm works around the problem,
causing 4.3 series to generate the same assembly as head, but I think it's a
sledgehammer approach.  Am I asking too much of GCC to not sink the store, or
is 4.3 doing something wrong?  I /think/ that the fact that there's a volatile
store in ilockcmpexch means the earlier store shouldn't be moved past it, and
that GCC is perhaps missing that the asm's output operand effectively
represents a volatile write through *t, but I could be misunderstanding the
rules about volatile.  Anyone got their language lawyer's hat on at the moment?

cheers,
  DaveK
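
For readers who want to experiment with the push-to-front loop discussed
above, here is a minimal sketch in C.  It is not the Cygwin code: instead of
the hand-written ilockcmpexch asm (with or without a "memory" clobber) it
uses GCC's __sync_val_compare_and_swap builtin, which the GCC manual
documents as a full barrier, so the store to node->next cannot be moved past
the exchange.  The struct node type and the list_push name are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type; it stands in for pthread_mutex in the mail.  */
struct node
{
  int value;
  struct node *next;
};

/* Push NODE on the front of the list at *HEAD.
   __sync_val_compare_and_swap (ptr, oldval, newval) stores NEWVAL only
   if *PTR still equals OLDVAL, and returns the value *PTR held before.
   The builtin is a full barrier, so the store to node->next is
   completed before the node can become visible through *HEAD.  */
static void
list_push (struct node **head, struct node *node)
{
  struct node *old;

  assert (node != NULL);
  do
    {
      old = *head;
      node->next = old;   /* Complete the node before publishing it.  */
    }
  while (__sync_val_compare_and_swap (head, old, node) != old);
}
```

The loop retries whenever another thread changed *head between the read and
the exchange, which is the same retry condition as in List_insert above.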


// g++ -c mxfull.cpp -o mxfull.o --save-temps -O2 -fverbose-asm

typedef long LONG;
typedef void *HANDLE;
typedef void *PVOID;
typedef char *LPCSTR;

typedef class pthread_mutex *pthread_mutex_t;
typedef class pthread_mutexattr *pthread_mutexattr_t;
typedef class pthread *pthread_t;

struct SECURITY_ATTRIBUTES;
typedef struct SECURITY_ATTRIBUTES *LPSECURITY_ATTRIBUTES;
extern struct SECURITY_ATTRIBUTES sec_none_nih;

HANDLE __attribute__((__stdcall__)) 
CreateSemaphoreA(LPSECURITY_ATTRIBUTES,LONG,LONG,LPCSTR);

class verifyable_object
{
public:
  long magic;

  verifyable_object (long);
  virtual ~verifyable_object ();
};

extern __inline__ long
ilockcmpexch (volatile long *t, long v, long c)
{
  return ({
__typeof (*t) ret;
__asm __volatile ("lock cmpxchgl %2, %1"
 

Re: Problem with init of structure bit fields

2009-06-03 Thread Stelian Pop
On Wed, Jun 03, 2009 at 01:07:08PM -0700, Ian Lance Taylor wrote:

> > - unsigned int wordnum = (backwards ? nwords - i - 1 : i);
> > + unsigned int wordnum = (backwards
> > + ? GET_MODE_SIZE(fieldmode) / UNITS_PER_WORD
> > +   - i - 1
> > + : i);
> >   unsigned int bit_offset = (backwards
> >  ? MAX ((int) bitsize - ((int) i + 1)
> > * BITS_PER_WORD,
> 
> Your patch looks correct.  However, it makes me wonder how the test case
> passes on existing big-endian platforms, such as the PowerPC.  Can you
> explain how that works?

I'm not sure, but it is related to the smaller word size on my platform
and the size of the field (a 64 bit long long).

The 991118-1.c test case passes ok on my platform when the word
size is 32 or 16 bits (my processor has configurable word sizes). The only
configuration that fails is when the word size is 8 bits...

When I tried to simplify the testcase I obtained the simple assignment
problem I posted in the original mail. And that problem was visible in
both 8 and 16 bits configurations.

So I suspect the problem won't be seen on PowerPC unless someone does
TI or OI bit fields...

-- 
Stelian Pop 


Re: Problem with init of structure bit fields

2009-06-03 Thread Ian Lance Taylor
Stelian Pop  writes:

> On Wed, Jun 03, 2009 at 07:49:29PM +0200, Stelian Pop wrote:
>
>> I'm doing a port of gcc 4.3.3 on a custom architecture and I'm having trouble
>> when initializing the bit fields of  a structure.
>
> Ok, after further analysis, it looks like a genuine bug in gcc,
> in store_bit_field_1(): when a field is bigger than a word, the
> rvalue is split into several words, after being placed in a
> smallest_mode_for_size() operand.
>
> But the logic for addressing the words of this operand is buggy
> in the WORDS_BIG_ENDIAN case, the most significant words being
> used instead of the least significant ones.
>
> The following patch corrects this, and makes the
> gcc.c-torture/execute/991118-1.c testcase work correctly on my platform.
>
> Stelian.
>
> diff --git a/gcc/expmed.c b/gcc/expmed.c
> index dc61de7..03c60a8 100644
> --- a/gcc/expmed.c
> +++ b/gcc/expmed.c
> @@ -582,7 +582,10 @@ store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT 
> bitsize,
>   {
> /* If I is 0, use the low-order word in both field and target;
>if I is 1, use the next to lowest word; and so on.  */
> -   unsigned int wordnum = (backwards ? nwords - i - 1 : i);
> +   unsigned int wordnum = (backwards
> +   ? GET_MODE_SIZE(fieldmode) / UNITS_PER_WORD
> + - i - 1
> +   : i);
> unsigned int bit_offset = (backwards
>? MAX ((int) bitsize - ((int) i + 1)
>   * BITS_PER_WORD,

Your patch looks correct.  However, it makes me wonder how the test case
passes on existing big-endian platforms, such as the PowerPC.  Can you
explain how that works?

Ian


Re: Problem with init of structure bit fields

2009-06-03 Thread Stelian Pop
On Wed, Jun 03, 2009 at 07:49:29PM +0200, Stelian Pop wrote:

> I'm doing a port of gcc 4.3.3 on a custom architecture and I'm having trouble
> when initializing the bit fields of  a structure.

Ok, after further analysis, it looks like a genuine bug in gcc,
in store_bit_field_1(): when a field is bigger than a word, the
rvalue is split into several words, after being placed in a
smallest_mode_for_size() operand.

But the logic for addressing the words of this operand is buggy
in the WORDS_BIG_ENDIAN case, the most significant words being
used instead of the least significant ones.

The following patch corrects this, and makes the
gcc.c-torture/execute/991118-1.c testcase work correctly on my platform.

Stelian.

diff --git a/gcc/expmed.c b/gcc/expmed.c
index dc61de7..03c60a8 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -582,7 +582,10 @@ store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT 
bitsize,
{
  /* If I is 0, use the low-order word in both field and target;
 if I is 1, use the next to lowest word; and so on.  */
- unsigned int wordnum = (backwards ? nwords - i - 1 : i);
+ unsigned int wordnum = (backwards
+ ? GET_MODE_SIZE(fieldmode) / UNITS_PER_WORD
+   - i - 1
+ : i);
  unsigned int bit_offset = (backwards
 ? MAX ((int) bitsize - ((int) i + 1)
* BITS_PER_WORD,
-- 
Stelian Pop 
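
The index arithmetic changed by the patch can be checked by hand.  The sketch
below is a standalone model, not GCC code: the parameter names mode_size and
units_per_word mirror GET_MODE_SIZE and UNITS_PER_WORD in the diff, and the
function names are hypothetical.  The concrete numbers used afterwards are
assumptions based on the first mail in the thread: 16-bit words and a 34-bit
field whose value is assumed to be held in a 64-bit (8-byte) operand.

```c
#include <assert.h>

/* Number of target words needed to hold BITSIZE bits.  */
static unsigned int
words_for_bits (unsigned int bitsize, unsigned int bits_per_word)
{
  return (bitsize + bits_per_word - 1) / bits_per_word;
}

/* Word index used before the patch in the "backwards" case: it counts
   back from the number of words covered by the field.  */
static unsigned int
wordnum_before (unsigned int nwords, unsigned int i)
{
  assert (i < nwords);
  return nwords - i - 1;
}

/* Word index used after the patch: it counts back from the number of
   words in the operand that holds the value.  */
static unsigned int
wordnum_after (unsigned int mode_size, unsigned int units_per_word,
               unsigned int i)
{
  assert (i < mode_size / units_per_word);
  return mode_size / units_per_word - i - 1;
}
```

Under these assumptions the field needs 3 words while the operand has 4.  With
big-endian word order the least significant word of the operand is word 3.
For i = 0, 1, 2 the old formula selects words 2, 1, 0 and the patched formula
selects words 3, 2, 1.  The old code would therefore never read the word that
holds 0x42, which is consistent with the constant 66 being absent from the
RTL in the original report.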


Problem with init of structure bit fields

2009-06-03 Thread Stelian Pop
Hi,

I'm doing a port of gcc 4.3.3 on a custom architecture and I'm having trouble
when initializing the bit fields of  a structure.

The testcase is based on a modified gcc torture testcase, the natural
registers are 16 bits, and long long is defined to be 64 bit wide:

struct itmp
{
  long long int pad :   30; 
  long long int field : 34; 
};

struct itmp itmp = {0x123LL, 0x123456LL};

int main(void)
{
  itmp.field = 0x42;
  return 1;
}

Running the above example gives (compiled with -O0...):

12        itmp.field = 0x42;
(gdb) p/x itmp
$1 = {pad = 0x123, field = 0x123456}
(gdb) n
13        return 1;
(gdb) p/x itmp
$2 = {pad = 0x123, field = 0x0}

If I use 32 bits for pad and 32 bits for field, the result is correct.
Also, if I use 'long' instead of 'long long' (and change the bit lengths
of course), it works too.
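
A self-checking variant of the testcase may be easier to drop into a test
run than a gdb session.  The sketch below is an assumption about how such a
check could look, not part of the torture suite; check_field_store is a
hypothetical name.  It relies on the compiler accepting long long bit-fields,
as GCC does.

```c
#include <assert.h>

struct itmp
{
  long long int pad : 30;
  long long int field : 34;
};

static struct itmp itmp = { 0x123LL, 0x123456LL };

/* Return 1 if assigning to the 34-bit field stores the value and leaves
   the neighbouring field alone, 0 otherwise.  */
static int
check_field_store (void)
{
  itmp.field = 0x42;
  return itmp.field == 0x42 && itmp.pad == 0x123;
}
```

On a port showing the behaviour described here, the function would be
expected to return 0 because field reads back as 0.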

Looking at the RTL shows the problem right from the beginning, in the
expand pass (there is no reference to the constant 66 = 0x42 in the RTL
below):

;; itmp.field = 66
(insn 5 4 6 991118-1.c:12 (set (reg/f:HI 25) 
(symbol_ref:HI ("itmp") [flags 0x2] )) -1
(nil))

(insn 6 5 7 991118-1.c:12 (set (reg:HI 26) 
(reg/f:HI 25)) -1 (nil))

(insn 7 6 8 991118-1.c:12 (set (reg/f:HI 27) 
(plus:HI (reg/f:HI 25) 
(const_int 6 [0x6]))) -1 (nil))

(insn 8 7 9 991118-1.c:12 (set (reg:HI 28) 
(const_int 0 [0x0])) -1 (nil))

(insn 9 8 10 991118-1.c:12 (set (mem/s/j/c:HI (reg/f:HI 27) [0+6 S2 A16])
(reg:HI 28)) -1 (nil))

(insn 10 9 11 991118-1.c:12 (set (reg:HI 29)
(reg/f:HI 25)) -1 (nil))

(insn 11 10 12 991118-1.c:12 (set (reg/f:HI 30)
(plus:HI (reg/f:HI 25)
(const_int 4 [0x4]))) -1 (nil))

(insn 12 11 13 991118-1.c:12 (set (reg:HI 31)
(const_int 0 [0x0])) -1 (nil))

(insn 13 12 14 991118-1.c:12 (set (mem/s/j/c:HI (reg/f:HI 30) [0+4 S2 A16])
(reg:HI 31)) -1 (nil))

(insn 14 13 15 991118-1.c:12 (set (reg:HI 32)
(reg/f:HI 25)) -1 (nil))

(insn 15 14 16 991118-1.c:12 (set (reg/f:HI 33)
(plus:HI (reg/f:HI 25)
(const_int 2 [0x2]))) -1 (nil))

(insn 16 15 17 991118-1.c:12 (set (reg:HI 34)
(mem/s/j/c:HI (reg/f:HI 33) [0+2 S2 A16])) -1 (nil))

(insn 17 16 18 991118-1.c:12 (set (reg:HI 36)
(const_int -4 [0xfffc])) -1 (nil))

(insn 18 17 19 991118-1.c:12 (set (reg:HI 35)
(and:HI (reg:HI 34)
(reg:HI 36))) -1 (nil))

(insn 19 18 0 991118-1.c:12 (set (mem/s/j/c:HI (reg/f:HI 33) [0+2 S2 A16])
(reg:HI 35)) -1 (nil))


Any idea what is happening here? Am I missing some standard
patterns in my .md file?

Thanks !

-- 
Stelian Pop 


Re: From regno to pseudo?

2009-06-03 Thread Adam Nemet
Steven Bosscher  writes:
> Is there a way to get the REG for a given regno?  I am building a
> register renumbering map that is just a pair of unsigned ints,
> but I can't figure out how to get the REG for
> new_regno without remembering a pointer to it myself. Is there an
> easier/better way?

regno_reg_rtx in emit-rtl.c?

Adam


From regno to pseudo?

2009-06-03 Thread Steven Bosscher
Hello,

Is there a way to get the REG for a given regno?  I am building a
register renumbering map that is just a pair of unsigned ints,
but I can't figure out how to get the REG for
new_regno without remembering a pointer to it myself. Is there an
easier/better way?

Ciao!
Steven


Re: Restrict keyword doesn't work correctly in GCC 4.4

2009-06-03 Thread Richard Guenther
On Wed, Jun 3, 2009 at 5:34 PM, Bingfeng Mei
 wrote:
> Richard,
> Thanks. I tried your patch and the -fno-tree-ter, and none works. The problem 
> is that
>
>       decl = find_base_decl (TREE_OPERAND (inner, 0));  <--- Cannot find the 
> base declaration, so decl = 0
>
>       if (decl                                  <-- won't be checked
>          && POINTER_TYPE_P (TREE_TYPE (decl))
>          && TYPE_RESTRICT (TREE_TYPE (decl)))
>
>
> The TREE_OPERAND (inner, 0) is:
>
>      type         type             size 
>            unit size 
>            align 32 symtab 0 alias set 2 canonical type 0xf7f122f4 precision 
> 32 min  max  2147483647>
>            pointer_to_this >
>        sizes-gimplified public unsigned restrict SI size  0xf7f0f9d8 32> unit size 
>        align 32 symtab 0 alias set -1 canonical type 0xf7fa6870>
>
>    arg 0         type  size  unit size 
>            align 32 symtab 0 alias set -1 canonical type 0xf7f12438 precision 
> 32 min  max >
>
>        arg 0  unsigned int>
>            used unsigned ignored SI file tst.c line 1 col 6 size  0xf7f0f9d8 32> unit size 
>            align 32 context 
>            (reg:SI 104 [ D.1768 ])>
>        arg 1 
>        tst.c:7:5>
>    tst.c:7:5>
>
>
> I added the following code. It seems to work for my example and others. Not 
> sure potential hazard with it.
>         ...
>          else if(!decl
>                  && POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND(inner, 0)))
>                  && TYPE_RESTRICT (TREE_TYPE (TREE_OPERAND(inner, 0))))
>            {
>               return new_alias_set ();
>            }
>         
>

Ah, of course.  As I said - restrict support is broken.

Richard.
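
For reference, the most common way to give the compiler a restrict-qualified
pointer declaration is to put the qualifier on the function parameters
themselves.  The sketch below is plain C99 and is not taken from the thread;
copy_ints is a hypothetical name, and nothing here claims that GCC 4.4
generates better code for it than for the casts discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Copy N ints from SRC to DEST.  The restrict qualifiers are the
   caller's promise that the two regions do not overlap.  */
static void
copy_ints (int *restrict dest, const int *restrict src, size_t n)
{
  size_t i;

  assert (dest != NULL && src != NULL);
  for (i = 0; i < n; i++)
    dest[i] = src[i];
}
```

Because the qualifier sits on a declaration rather than on a cast inside an
expression, there is no temporary for an optimization pass to remove.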


RE: Restrict keyword doesn't work correctly in GCC 4.4

2009-06-03 Thread Bingfeng Mei
Richard,
Thanks. I tried your patch and the -fno-tree-ter, and none works. The problem 
is that 

   decl = find_base_decl (TREE_OPERAND (inner, 0));  <--- Cannot find the 
base declaration, so decl = 0

   if (decl  <-- won't be checked
  && POINTER_TYPE_P (TREE_TYPE (decl))
  && TYPE_RESTRICT (TREE_TYPE (decl)))


The TREE_OPERAND (inner, 0) is:

 
unit size 
align 32 symtab 0 alias set 2 canonical type 0xf7f122f4 precision 
32 min  max 
pointer_to_this >
sizes-gimplified public unsigned restrict SI size  unit size 
align 32 symtab 0 alias set -1 canonical type 0xf7fa6870>
   
arg 0  unit size 
align 32 symtab 0 alias set -1 canonical type 0xf7f12438 precision 
32 min  max >
   
arg 0 
used unsigned ignored SI file tst.c line 1 col 6 size  unit size 
align 32 context 
(reg:SI 104 [ D.1768 ])>
arg 1 
tst.c:7:5>
tst.c:7:5>


I added the following code. It seems to work for my example and others. Not 
sure potential hazard with it.
 ...
  else if(!decl
  && POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND(inner, 0)))
  && TYPE_RESTRICT (TREE_TYPE (TREE_OPERAND(inner, 0))))
{
   return new_alias_set ();
}
  


Bingfeng

> -Original Message-
> From: Richard Guenther [mailto:richard.guent...@gmail.com] 
> Sent: 03 June 2009 15:10
> To: Bingfeng Mei
> Cc: gcc@gcc.gnu.org
> Subject: Re: Restrict keyword doesn't work correctly in GCC 4.4
> 
> On Wed, Jun 3, 2009 at 1:02 PM, Bingfeng Mei 
>  wrote:
> > Richard,
> > Yes, my original code does have restrict qualified decl:
> >
> >  void foo(int byte, char *a, char *b){
> >  int * restrict dest = (int *)a;
> >  int * restrict src = (int *)b;
> >
> >  for(int i = 0; i < byte/8; i++){
> >    *dest++ = *src++;
> >  }
> > }
> >
> >
> > The code I showed is produced by tree level compilation.
> >
> >  *(int * restrict) (D.1934 + 4) = *(int * restrict) (D.1936 + 4);
> >  *(int * restrict) (D.1934 + 8) = *(int * restrict) (D.1936 + 8);
> >  *(int * restrict) (D.1934 + 12) = *(int * restrict) (D.1936 + 12);
> >  *(int * restrict) (D.1934 + 16) = *(int * restrict) (D.1936 + 16);
> >  *(int * restrict) (D.1934 + 20) = *(int * restrict) (D.1936 + 20);
> >  *(int * restrict) (D.1934 + 24) = *(int * restrict) (D.1936 + 24);
> >  *(int * restrict) (D.1934 + 28) = *(int * restrict) (D.1936 + 28);
> >  *(int * restrict) (D.1934 + 32) = *(int * restrict) (D.1936 + 32);
> >  *(int * restrict) (D.1934 + 36) = *(int * restrict) (D.1936 + 36);
> >  *(int * restrict) (D.1934 + 40) = *(int * restrict) (D.1936 + 40);
> >  *(int * restrict) (D.1934 + 44) = *(int * restrict) (D.1936 + 44);
> >  *(int * restrict) (D.1934 + 48) = *(int * restrict) (D.1936 + 48);
> >  *(int * restrict) (D.1934 + 52) = *(int * restrict) (D.1936 + 52);
> >  *(int * restrict) (D.1934 + 56) = *(int * restrict) (D.1936 + 56);
> >  *(int * restrict) (D.1934 + 60) = *(int * restrict) (D.1936 + 60);
> >
> > If we agree these tree statements still preserve the 
> meaning of restrict,
> > it should be RTL expansion going wrong. Am I right?
> 
> No, it is TER that removes the temporary that is required to make
> restrict work.  Try -fno-tree-ter or fixing TER to not TER
> to-restrict-pointer conversions.
> 
> Richard.
> 
> >
> > - Bingfeng
> >
> >
> >> -Original Message-
> >> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> >> Sent: 03 June 2009 11:54
> >> To: Bingfeng Mei
> >> Cc: gcc@gcc.gnu.org
> >> Subject: Re: Restrict keyword doesn't work correctly in GCC 4.4
> >>
> >> On Wed, Jun 3, 2009 at 12:41 PM, Bingfeng Mei
> >>  wrote:
> >> > Hello,
> >> > I noticed that the restrict doesn't work fully on 4.4.0
> >> (used to work on
> >> >  our port based on 4.3 branch). The problem is that tree
> >> optimizer can do a
> >> > lot of optimization regarding pointer, e.g., at -O3. The
> >> alias set property
> >> > is not propagated accordingly.
> >> >
> >> > Is the following RTL expansion correct? Both read and write
> >> address are
> >> > converted to a restrict pointer, but the both mem rtx have
> >> the same alias set (2).
> >> >
> >> > ;; *(int * restrict) (D.1768 + 4) = *(int * restrict) 
> (D.1770 + 4);
> >>
> >> restrict only works if there is a restrict qualified 
> pointer decl in
> >> your source.
> >>
> >> I will re-implement restrict support completely for 4.5.
> >>
> >> You can try the attached hack which might help (but also cause
> >> weird effects ...).
> >>
> >> Richard.
> >>
> >> > (insn 56 55 57 tst.c:7 (set (reg:SI 124)
> >> >        (mem:SI (plus:SI (reg:SI 103 [ D.1770 ])
> >> >                (const_int 4 [0x4])) [2 S4 A32])) -1 (nil))
> >> >
> >> > (insn 57 56 0 tst.c:7 (set (mem:SI (plus:SI (reg:SI 104 
> [ D.1768 ])
> >> >                (const_int 4 [0x4])) [2 S4 A32])
> >> >        (reg:SI 1

Re: Restrict keyword doesn't work correctly in GCC 4.4

2009-06-03 Thread Richard Guenther
On Wed, Jun 3, 2009 at 1:02 PM, Bingfeng Mei  wrote:
> Richard,
> Yes, my original code does have restrict qualified decl:
>
>  void foo(int byte, char *a, char *b){
>  int * restrict dest = (int *)a;
>  int * restrict src = (int *)b;
>
>  for(int i = 0; i < byte/8; i++){
>    *dest++ = *src++;
>  }
> }
>
>
> The code I showed is produced by tree level compilation.
>
>  *(int * restrict) (D.1934 + 4) = *(int * restrict) (D.1936 + 4);
>  *(int * restrict) (D.1934 + 8) = *(int * restrict) (D.1936 + 8);
>  *(int * restrict) (D.1934 + 12) = *(int * restrict) (D.1936 + 12);
>  *(int * restrict) (D.1934 + 16) = *(int * restrict) (D.1936 + 16);
>  *(int * restrict) (D.1934 + 20) = *(int * restrict) (D.1936 + 20);
>  *(int * restrict) (D.1934 + 24) = *(int * restrict) (D.1936 + 24);
>  *(int * restrict) (D.1934 + 28) = *(int * restrict) (D.1936 + 28);
>  *(int * restrict) (D.1934 + 32) = *(int * restrict) (D.1936 + 32);
>  *(int * restrict) (D.1934 + 36) = *(int * restrict) (D.1936 + 36);
>  *(int * restrict) (D.1934 + 40) = *(int * restrict) (D.1936 + 40);
>  *(int * restrict) (D.1934 + 44) = *(int * restrict) (D.1936 + 44);
>  *(int * restrict) (D.1934 + 48) = *(int * restrict) (D.1936 + 48);
>  *(int * restrict) (D.1934 + 52) = *(int * restrict) (D.1936 + 52);
>  *(int * restrict) (D.1934 + 56) = *(int * restrict) (D.1936 + 56);
>  *(int * restrict) (D.1934 + 60) = *(int * restrict) (D.1936 + 60);
>
> If we agree these tree statements still preserve the meaning of restrict,
> it should be RTL expansion going wrong. Am I right?

No, it is TER that removes the temporary that is required to make
restrict work.  Try -fno-tree-ter or fixing TER to not TER
to-restrict-pointer conversions.

Richard.

>
> - Bingfeng
>
>
>> -Original Message-
>> From: Richard Guenther [mailto:richard.guent...@gmail.com]
>> Sent: 03 June 2009 11:54
>> To: Bingfeng Mei
>> Cc: gcc@gcc.gnu.org
>> Subject: Re: Restrict keyword doesn't work correctly in GCC 4.4
>>
>> On Wed, Jun 3, 2009 at 12:41 PM, Bingfeng Mei
>>  wrote:
>> > Hello,
>> > I noticed that the restrict doesn't work fully on 4.4.0
>> (used to work on
>> >  our port based on 4.3 branch). The problem is that tree
>> optimizer can do a
>> > lot of optimization regarding pointer, e.g., at -O3. The
>> alias set property
>> > is not propagated accordingly.
>> >
>> > Is the following RTL expansion correct? Both read and write
>> address are
>> > converted to a restrict pointer, but the both mem rtx have
>> the same alias set (2).
>> >
>> > ;; *(int * restrict) (D.1768 + 4) = *(int * restrict) (D.1770 + 4);
>>
>> restrict only works if there is a restrict qualified pointer decl in
>> your source.
>>
>> I will re-implement restrict support completely for 4.5.
>>
>> You can try the attached hack which might help (but also cause
>> weird effects ...).
>>
>> Richard.
>>
>> > (insn 56 55 57 tst.c:7 (set (reg:SI 124)
>> >        (mem:SI (plus:SI (reg:SI 103 [ D.1770 ])
>> >                (const_int 4 [0x4])) [2 S4 A32])) -1 (nil))
>> >
>> > (insn 57 56 0 tst.c:7 (set (mem:SI (plus:SI (reg:SI 104 [ D.1768 ])
>> >                (const_int 4 [0x4])) [2 S4 A32])
>> >        (reg:SI 124)) -1 (nil))
>> >
>> >
>> > The alias set property is copied from tree node:
>> >  > >    type > >        size 
>> >        unit size 
>> >        align 32 symtab 0 alias set 2 canonical type
>> 0xf7f122f4 precision 32 min > -2147483648> max 
>> >        pointer_to_this >
>> >
>> >    arg 0 > >        type > 0xf7f122f4 int>
>> >            sizes-gimplified public unsigned restrict SI
>> size  unit size 
>> >            align 32 symtab 0 alias set -1 canonical type 0xf7fa6870>
>> >
>> >        arg 0 > 0xf7f12438 long unsigned int>
>> >            arg 0 
>> >            arg 1 
>> >            tst.c:7:5>
>> >        tst.c:7:5>
>> >    tst.c:7:5>
>> >
>> > Is the RTL expansion wrong or the original tree node is
>> constructed incorrectly?
>> >
>> > Thanks,
>> > Bingfeng Mei
>> >
>> > Broadcom UK
>> >
>>
>


The Linux binutils 2.19.51.0.8 is released.

2009-06-03 Thread H.J. Lu
This is the beta release of binutils 2.19.51.0.8 for Linux, which is
based on binutils 2009 0603 in CVS on sourceware.org plus various
changes. It is purely for Linux.

All relevant patches in patches have been applied to the source tree.
You can take a look at patches/README to see what has been applied and
in what order.

Starting from the 2.18.50.0.4 release, the x86 assembler no longer
accepts

fnstsw %eax

fnstsw stores 16bit into %ax and the upper 16bit of %eax is unchanged.
Please use

fnstsw %ax

Starting from the 2.17.50.0.4 release, the default output section LMA
(load memory address) has changed for allocatable sections from being
equal to VMA (virtual memory address), to keeping the difference between
LMA and VMA the same as the previous output section in the same region.

For

.data.init_task : { *(.data.init_task) }

LMA of .data.init_task section is equal to its VMA with the old linker.
With the new linker, it depends on the previous output section. You
can use

.data.init_task : AT (ADDR(.data.init_task)) { *(.data.init_task) }

to ensure that LMA of .data.init_task section is always equal to its
VMA. The linker script in the older 2.6 x86-64 kernel depends on the
old behavior.  You can add AT (ADDR(section)) to force LMA of
.data.init_task section equal to its VMA. It will work with both old
and new linkers. The x86-64 kernel linker script in kernel 2.6.13 and
above is OK.

The new x86_64 assembler no longer accepts

monitor %eax,%ecx,%edx

You should use

monitor %rax,%ecx,%edx

or
monitor

which works with both old and new x86_64 assemblers. They should
generate the same opcode.

The new i386/x86_64 assemblers no longer accept instructions for moving
between a segment register and a 32bit memory location, i.e.,

movl (%eax),%ds
movl %ds,(%eax)

To generate instructions for moving between a segment register and a
16bit memory location without the 16bit operand size prefix, 0x66,

mov (%eax),%ds
mov %ds,(%eax)

should be used. It will work with both new and old assemblers. The
assembler starting from 2.16.90.0.1 will also support

movw (%eax),%ds
movw %ds,(%eax)

without the 0x66 prefix. Patches for 2.4 and 2.6 Linux kernels are
available at

http://www.kernel.org/pub/linux/devel/binutils/linux-2.4-seg-4.patch
http://www.kernel.org/pub/linux/devel/binutils/linux-2.6-seg-5.patch

The ia64 assembler is now defaulted to tune for Itanium 2 processors.
To build a kernel for Itanium 1 processors, you will need to add

ifeq ($(CONFIG_ITANIUM),y)
CFLAGS += -Wa,-mtune=itanium1
AFLAGS += -Wa,-mtune=itanium1
endif

to arch/ia64/Makefile in your kernel source tree.

Please report any bugs related to binutils 2.19.51.0.8 to
hjl.to...@gmail.com

and

http://www.sourceware.org/bugzilla/

Changes from binutils 2.19.51.0.7:

1. Update from binutils 2009 0603.
2. Fix STT_GNU_IFUNC symbol with pointer equality.

Changes from binutils 2.19.51.0.6:

1. Update from binutils 2009 0601.
2. Update STT_GNU_IFUNC support. PR 10205.
3. Fix x86 assembler Intel syntax regression with '$'. PR 10198.

Changes from binutils 2.19.51.0.5:

1. Update from binutils 2009 0529.
2. Rewrite STT_GNU_IFUNC, R_386_IRELATIVE and R_X86_64_IRELATIVE linker
support for STT_GNU_IFUNC symbols in shared library, dynamic executable
and static executable.
3. Add plugin support.
4. Improve spu support.

Changes from binutils 2.19.51.0.4:

1. Update from binutils 2009 0525.
2. Add STT_GNU_IFUNC, R_386_IRELATIVE and R_X86_64_IRELATIVE support to
assembler and linker.
3. Add LD_AS_NEEDED support to linker.
4. Remove AMD SSE5 support.
5. A new Intel syntax parser in x86 assembler.
6. Add DWARF discriminator support.
7. Add --64 support for x86 PE/COFF assembler.
8. Support common symbol with alignment for PE/COFF.
9. Improve gold support.
10. Improve arm support.
11. Improve mep support.
12. Improve mips support.
13. Improve ppc support.
14. Improve spu support.

Changes from binutils 2.19.51.0.3:

1. Update from binutils 2009 0418.
2. Remove EFI targets and use PEI targets for EFI. Add --file-alignment,
--heap, --image-base, --section-alignment, --stack and --subsystem command
line options for objcopy.  PR 10074.
3. Update linker to warn alternate ELF machine code.
4. Fix x86 linker TLS transition.  PR 9938.
5. Improve DWARF dumper to check relocations against STT_SECTION
symbol.
6. Guard DWARF dumper on bad DWARF input.
7. Add EM_ETPU and EM_SLE9X.  Reserve 3 ELF machine types for Intel.
8. Adding a linker missing entry symbol warning for -pie. PR 9970.
9. Make the -e option for linker to imply -u.  PR 6766.
10. Properly handle paging for PEI targets.
11. Fix assembler listing with input from stdin.
12. Update objcopy/string to generate symbol table if there is any
relocation in output.  PR 9945.
13. Require texinfo 4.7 for build.  PR 10039.
14. Add moxie support.
15. Improve gold support.
16. Improve AIX support.
17.

Re: Auto-import problem

2009-06-03 Thread Dave Korn
Piotr Wyderski wrote:
> Trying to work-around PR40269 (which doesn't happen
> anymore on trunk, so you may close it) I've commented
> out the dllexport/dllimport section:
> 
> #define BASE_DLLEXPORT  /*__declspec(dllexport)*/
> #define BASE_DLLIMPORT  /*__declspec(dllimport)*/
> 
> Then the program compiled successfully, emitting a lot
> of auto-import warning messages, but crashed unexpectedly
> somewhere in the middle of execution. If these __declspecs
> are uncommented, then it works correctly, as expected. Is
> it a known auto-import bug or "feature" I should be aware of,
> or should I dig deeper into the subject and ask out a debugger
> for a passionate afternoon session?

  You deleted correct code, ignored all the warnings, and got a crashing
executable.  Is that a bug?  I can't be sure without seeing your full
testcase, but I'd guess not.  Auto-import is something of a best-effort
last-resort fallback.  It cannot handle everything that properly annotating
the source code can handle.

  See the section of the ld manual documenting `--enable-auto-import' for the
full and gory details.  Does passing --enable-runtime-pseudo-reloc to the
linker help any in this case?

cheers,
  DaveK
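
As an illustration of what "properly annotating the source code" usually
looks like, here is a minimal sketch of the common export/import macro
idiom.  It is not Piotr's code: BASE_API, BASE_BUILDING_DLL and base_answer
are hypothetical names, and the assumption is that the build system defines
BASE_BUILDING_DLL only while compiling the DLL itself.

```c
/* BASE_BUILDING_DLL is a hypothetical macro that the build system would
   define (for example with -DBASE_BUILDING_DLL) only while compiling
   the DLL.  It is defined here so the sketch is self-contained.  */
#define BASE_BUILDING_DLL 1

#if defined (_WIN32) || defined (__CYGWIN__)
# ifdef BASE_BUILDING_DLL
#  define BASE_API __declspec(dllexport)
# else
#  define BASE_API __declspec(dllimport)
# endif
#else
# define BASE_API   /* No annotation needed on other targets.  */
#endif

/* Declaration as it would appear in the shared header.  */
BASE_API int base_answer (void);

/* Definition inside the DLL.  */
int
base_answer (void)
{
  return 42;
}
```

Clients include the same header without BASE_BUILDING_DLL defined, so they
see the dllimport form and do not depend on auto-import.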


RE: Restrict keyword doesn't work correctly in GCC 4.4

2009-06-03 Thread Bingfeng Mei
Richard,
Yes, my original code does have restrict qualified decl:

 void foo(int byte, char *a, char *b){
  int * restrict dest = (int *)a;
  int * restrict src = (int *)b;

  for(int i = 0; i < byte/8; i++){
*dest++ = *src++;
  }
}  


The code I showed is produced by tree level compilation.

  *(int * restrict) (D.1934 + 4) = *(int * restrict) (D.1936 + 4);
  *(int * restrict) (D.1934 + 8) = *(int * restrict) (D.1936 + 8);
  *(int * restrict) (D.1934 + 12) = *(int * restrict) (D.1936 + 12);
  *(int * restrict) (D.1934 + 16) = *(int * restrict) (D.1936 + 16);
  *(int * restrict) (D.1934 + 20) = *(int * restrict) (D.1936 + 20);
  *(int * restrict) (D.1934 + 24) = *(int * restrict) (D.1936 + 24);
  *(int * restrict) (D.1934 + 28) = *(int * restrict) (D.1936 + 28);
  *(int * restrict) (D.1934 + 32) = *(int * restrict) (D.1936 + 32);
  *(int * restrict) (D.1934 + 36) = *(int * restrict) (D.1936 + 36);
  *(int * restrict) (D.1934 + 40) = *(int * restrict) (D.1936 + 40);
  *(int * restrict) (D.1934 + 44) = *(int * restrict) (D.1936 + 44);
  *(int * restrict) (D.1934 + 48) = *(int * restrict) (D.1936 + 48);
  *(int * restrict) (D.1934 + 52) = *(int * restrict) (D.1936 + 52);
  *(int * restrict) (D.1934 + 56) = *(int * restrict) (D.1936 + 56);
  *(int * restrict) (D.1934 + 60) = *(int * restrict) (D.1936 + 60);

If we agree these tree statements still preserve the meaning of restrict,
it should be RTL expansion going wrong. Am I right? 

- Bingfeng


> -----Original Message-----
> From: Richard Guenther [mailto:richard.guent...@gmail.com] 
> Sent: 03 June 2009 11:54
> To: Bingfeng Mei
> Cc: gcc@gcc.gnu.org
> Subject: Re: Restrict keyword doesn't work correctly in GCC 4.4
> 
> On Wed, Jun 3, 2009 at 12:41 PM, Bingfeng Mei 
>  wrote:
> > Hello,
> > I noticed that the restrict doesn't work fully on 4.4.0 
> (used to work on
> >  our port based on 4.3 branch). The problem is that tree 
> optimizer can do a
> > lot of optimization regarding pointer, e.g., at -O3. The 
> alias set property
> > is not propagated accordingly.
> >
> > Is the following RTL expansion correct? Both read and write 
> address are
> > converted to a restrict pointer, but the both mem rtx have 
> the same alias set (2).
> >
> > ;; *(int * restrict) (D.1768 + 4) = *(int * restrict) (D.1770 + 4);
> 
> restrict only works if there is a restrict qualified pointer decl in
> your source.
> 
> I will re-implement restrict support completely for 4.5.
> 
> You can try the attached hack which might help (but also cause
> weird effects ...).
> 
> Richard.
> 
> > (insn 56 55 57 tst.c:7 (set (reg:SI 124)
> >        (mem:SI (plus:SI (reg:SI 103 [ D.1770 ])
> >                (const_int 4 [0x4])) [2 S4 A32])) -1 (nil))
> >
> > (insn 57 56 0 tst.c:7 (set (mem:SI (plus:SI (reg:SI 104 [ D.1768 ])
> >                (const_int 4 [0x4])) [2 S4 A32])
> >        (reg:SI 124)) -1 (nil))
> >
> >
> > The alias set property is copied from tree node:
> >   >    type  >        size 
> >        unit size 
> >        align 32 symtab 0 alias set 2 canonical type 
> 0xf7f122f4 precision 32 min  -2147483648> max 
> >        pointer_to_this >
> >
> >    arg 0  >        type  0xf7f122f4 int>
> >            sizes-gimplified public unsigned restrict SI 
> size  unit size 
> >            align 32 symtab 0 alias set -1 canonical type 0xf7fa6870>
> >
> >        arg 0  0xf7f12438 long unsigned int>
> >            arg 0 
> >            arg 1 
> >            tst.c:7:5>
> >        tst.c:7:5>
> >    tst.c:7:5>
> >
> > Is the RTL expansion wrong, or is the original tree node
> > constructed incorrectly?
> >
> > Thanks,
> > Bingfeng Mei
> >
> > Broadcom UK
> >
> 


Auto-import problem

2009-06-03 Thread Piotr Wyderski
Trying to work-around PR40269 (which doesn't happen
anymore on trunk, so you may close it) I've commented
out the dllexport/dllimport section:

#define BASE_DLLEXPORT  /*__declspec(dllexport)*/
#define BASE_DLLIMPORT  /*__declspec(dllimport)*/

Then the program compiled successfully, emitting a lot
of auto-import warning messages, but crashed unexpectedly
somewhere in the middle of execution. If these __declspecs
are uncommented, then it works correctly, as expected. Is
it a known auto-import bug or "feature" I should be aware of,
or should I dig deeper into the subject and ask out a debugger
for a passionate afternoon session?

Best regards
Piotr Wyderski


Re: Restrict keyword doesn't work correctly in GCC 4.4

2009-06-03 Thread Richard Guenther
On Wed, Jun 3, 2009 at 12:41 PM, Bingfeng Mei  wrote:
> Hello,
> I noticed that the restrict doesn't work fully on 4.4.0 (used to work on
>  our port based on 4.3 branch). The problem is that tree optimizer can do a
> lot of optimization regarding pointer, e.g., at -O3. The alias set property
> is not propagated accordingly.
>
> Is the following RTL expansion correct? Both read and write address are
> converted to a restrict pointer, but the both mem rtx have the same alias set 
> (2).
>
> ;; *(int * restrict) (D.1768 + 4) = *(int * restrict) (D.1770 + 4);

restrict only works if there is a restrict qualified pointer decl in
your source.
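
As an illustration of a restrict-qualified declaration, here is a minimal
sketch with the qualifier on the parameters themselves.  copy_words is a
hypothetical helper, not code from this thread, and whether 4.4 propagates
the qualifier any better in this form is not something established here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy n ints from src to dest.
   The restrict qualifiers on the parameter declarations promise the
   compiler that the two arrays do not overlap. */
void copy_words(int *restrict dest, const int *restrict src, size_t n)
{
    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dest[i] = src[i];
}
```

Calling it with overlapping buffers is undefined behaviour, which is the
property the optimizers are allowed to exploit.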

I will re-implement restrict support completely for 4.5.

You can try the attached hack which might help (but also cause
weird effects ...).

Richard.

> (insn 56 55 57 tst.c:7 (set (reg:SI 124)
>        (mem:SI (plus:SI (reg:SI 103 [ D.1770 ])
>                (const_int 4 [0x4])) [2 S4 A32])) -1 (nil))
>
> (insn 57 56 0 tst.c:7 (set (mem:SI (plus:SI (reg:SI 104 [ D.1768 ])
>                (const_int 4 [0x4])) [2 S4 A32])
>        (reg:SI 124)) -1 (nil))
>
>
> The alias set property is copied from tree node:
>      type         size 
>        unit size 
>        align 32 symtab 0 alias set 2 canonical type 0xf7f122f4 precision 32 
> min  max  2147483647>
>        pointer_to_this >
>
>    arg 0         type 
>            sizes-gimplified public unsigned restrict SI size  0xf7f0f9d8 32> unit size 
>            align 32 symtab 0 alias set -1 canonical type 0xf7fa6870>
>
>        arg 0  unsigned int>
>            arg 0 
>            arg 1 
>            tst.c:7:5>
>        tst.c:7:5>
>    tst.c:7:5>
>
> Is the RTL expansion wrong, or is the original tree node constructed
> incorrectly?
>
> Thanks,
> Bingfeng Mei
>
> Broadcom UK
>


p
Description: Binary data


Restrict keyword doesn't work correctly in GCC 4.4

2009-06-03 Thread Bingfeng Mei
Hello, 
I noticed that restrict doesn't work fully on 4.4.0 (it used to work on
our port based on the 4.3 branch). The problem is that the tree optimizers can
do a lot of pointer optimization, e.g., at -O3. The alias set property
is not propagated accordingly. 

Is the following RTL expansion correct? Both the read and write addresses are
converted to a restrict pointer, but both mem rtx have the same alias set 
(2). 

;; *(int * restrict) (D.1768 + 4) = *(int * restrict) (D.1770 + 4);

(insn 56 55 57 tst.c:7 (set (reg:SI 124)
(mem:SI (plus:SI (reg:SI 103 [ D.1770 ])
(const_int 4 [0x4])) [2 S4 A32])) -1 (nil))

(insn 57 56 0 tst.c:7 (set (mem:SI (plus:SI (reg:SI 104 [ D.1768 ])
(const_int 4 [0x4])) [2 S4 A32])
(reg:SI 124)) -1 (nil))


The alias set property is copied from tree node:
 
unit size 
align 32 symtab 0 alias set 2 canonical type 0xf7f122f4 precision 32 
min  max 
pointer_to_this >
   
arg 0 
sizes-gimplified public unsigned restrict SI size  unit size 
align 32 symtab 0 alias set -1 canonical type 0xf7fa6870>
   
arg 0 
arg 0 
arg 1 
tst.c:7:5>
tst.c:7:5>
tst.c:7:5>

Is the RTL expansion wrong, or is the original tree node constructed incorrectly? 

Thanks,
Bingfeng Mei

Broadcom UK


Re: Any comment about the replacement of gcc news?

2009-06-03 Thread Ben Elliston
On Wed, 2009-06-03 at 16:33 +0800, Eric Fisher wrote:

> Sorry, I hope it's not an offensive or boring topic.

No, just off-topic.  This list is for developing gcc, not plotting our
demise.

Cheers, Ben




Re: Enquiry

2009-06-03 Thread Vijay
Thanks for the response, Mukti.  I think the options could be
-mlong-calls, -mno-ep and -mno-prolog-function.  Could you please tell me
how to specify these options in a makefile?  I use gmake (in a
Cygwin shell).


Thanks,
-Vijay



mukti jain wrote:

Can you experiment with optimization options and -m850 and -m850e?
Have a look at this too if you are looking for a quick fix.
 http://sourceware.org/ml/binutils/2005-08/msg00214.html

Thanks,
Mukti

On Tue, Jun 2, 2009 at 11:18 PM, mukti jain wrote:




On Tue, Jun 2, 2009 at 7:22 PM, Vijay Holimath <vi...@nii.ac.jp> wrote:

Dear Sir,

  I am using the gcc compiler for a v850e CPU. When I use the
attribute __attribute__ ((interrupt_handler)) or
__attribute__ ((interrupt)) for an interrupt function, for
example

void swnmi() __attribute__ ((interrupt_handler));

void swnmi()
{
...
..
}

main()
{
..
..
}

I am getting the following error messages when I compile:

main.o (.text+0xea): In function 'swnmi': undefined reference
to '__ep'

main.o (.text+0xee): In function 'swnmi': undefined reference
to '__ep'

collect2: ld returned 1 exit status


I will be grateful to you if you could help me to get rid of
these error messages.  Probably I have to link some libraries?

No, __ep is a linker variable.

man page says..

(http://ftp.gnu.org/pub/pub/old-gnu/Manuals/gas-2.9.1/html_chapter/as_24.html)

This can either be set up automatically by the linker, or
specifically set by using the -defsym __ep= command
line option.



Thanks,
Mukti



Many thanks,
-Vijay

-- 
Vijay Holimath, Ph.D

National Institute of Informatics
2-1-2 Hitotsubashi Chiyoda-ku Tokyo 101-8430, Japan
Tel: +81-3-4212-2662 |Fax:+81-3-3556-1916
Mobile: +81-80-3542-1560
e-mail: vi...@nii.ac.jp 
http://www.linkedin.com/in/vijayholimath
Skype name: vijay.holimath







Re: Using a umulhisi3

2009-06-03 Thread Julian Brown
On Wed, 3 Jun 2009 21:39:34 +1200
Michael Hope  wrote:

> How does the combine stage work?  It looks like it could get multiple
> potential matches for a set of RTLs.  Does it use some type of costing
> function to pick between them?  Can I tell combine that a umulhisi3 is
> cheaper than a mulsi3?

You could try defining TARGET_RTX_COSTS, if you haven't already.
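
To make the idea concrete, here is a toy model of cost-based selection.  It
is not GCC code and does not use the real hook's signature: the enum, the
cost numbers and both function names are invented for illustration, and it
assumes a rewrite is rejected when it costs more than what it replaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy opcodes; hypothetical stand-ins, not GCC's rtx codes. */
enum toy_op { TOY_MULSI3, TOY_UMULHISI3 };

/* Assumed relative costs for an imaginary target on which the
   16x16->32 multiply is cheaper than the full 32x32 multiply. */
int toy_insn_cost(enum toy_op op)
{
    switch (op) {
    case TOY_UMULHISI3: return 4;
    case TOY_MULSI3:    return 12;
    }
    assert(!"unknown toy opcode");
    return 0;
}

/* Accept a replacement only if it is no more expensive than the
   instruction it replaces. */
bool toy_accept_replacement(enum toy_op old_op, enum toy_op new_op)
{
    return toy_insn_cost(new_op) <= toy_insn_cost(old_op);
}
```

With these made-up numbers, turning a umulhisi3 into a mulsi3 would be
refused, while the opposite rewrite would be allowed.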

Julian


Using a umulhisi3

2009-06-03 Thread Michael Hope
Hi there.  The architecture I'm working on is a 32 bit, word based
machine with a 16x16 -> 32 unsigned multiply.  For some reason the
combine stage is converting the umulhisi3 into a mulsi3 and I'm not
sure how to track this down.

The test code is part of an alpha blend:

void blend(uint8_t* sb, uint8_t* db)
{
  uint16_t ia = 256 - *sb;
  uint16_t d = *db;

  *db = ((d * ia) >> 8) + *sb;
}

I've defined the different multiplies in the .md file:
(define_insn "umulhisi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (mult:SI (zero_extend:SI
                  (match_operand:HI 1 "register_operand" "%r"))
                 (zero_extend:SI
                  (match_operand:HI 2 "register_operand" "r"))))]
  ""
...

(define_insn "mulsi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
(mult:SI (match_operand:SI 1 "register_operand" "%r")
 (match_operand:SI 2 "register_operand" "r")))]
   ""
...

Running at -O level optimisations gives the following in
umul.157r.outof_cfglayout, just before the combine stage:
---
(insn 3 6 4 2 umul.c:16 (set (reg/v/f:SI 28 [ sb ])
(reg:SI 0 R10 [ sb ])) 8 {movsi} (expr_list:REG_DEAD (reg:SI 0
R10 [ sb ])
(nil)))

(insn 4 3 5 2 umul.c:16 (set (reg/v/f:SI 29 [ db ])
(reg:SI 1 R11 [ db ])) 8 {movsi} (expr_list:REG_DEAD (reg:SI 1
R11 [ db ])
(nil)))

(note 5 4 8 2 NOTE_INSN_FUNCTION_BEG)

(insn 8 5 9 2 umul.c:17 (set (reg:SI 26 [ D.1217 ])
(zero_extend:SI (mem:QI (reg/v/f:SI 28 [ sb ]) [0 S1 A8]))) 27
{zero_extendqisi2} (expr_list:REG_DEAD (reg/v/f:SI 28 [ sb ])
(nil)))

(insn 9 8 10 2 umul.c:20 (set (reg:HI 30)
(const_int 256 [0x100])) 1 {movhi_insn} (nil))

(insn 10 9 11 2 umul.c:20 (set (reg:SI 31)
(minus:SI (subreg:SI (reg:HI 30) 0)
(reg:SI 26 [ D.1217 ]))) 12 {subsi3} (expr_list:REG_DEAD (reg:HI 30)
(nil)))

(insn 11 10 12 2 umul.c:20 (set (reg:SI 33)
(zero_extend:SI (mem:QI (reg/v/f:SI 29 [ db ]) [0 S1 A8]))) 27
{zero_extendqisi2} (nil))

(insn 12 11 13 2 umul.c:20 (set (reg:HI 32)
(subreg:HI (reg:SI 33) 0)) 1 {movhi_insn} (expr_list:REG_DEAD
(reg:SI 33)
(nil)))

(insn 13 12 14 2 umul.c:20 (set (reg:SI 34)
(mult:SI (zero_extend:SI (reg:HI 32))
(zero_extend:SI (subreg:HI (reg:SI 31) 0)))) 14
{umulhisi3} (expr_list:REG_DEAD (reg:HI 32)
(expr_list:REG_DEAD (reg:SI 31)
(nil))))

(insn 14 13 15 2 umul.c:20 (set (reg:SI 35)
(ashiftrt:SI (reg:SI 34)
(const_int 8 [0x8]))) 21 {ashrsi3_const}
(expr_list:REG_DEAD (reg:SI 34)
(nil)))

(insn 15 14 16 2 umul.c:20 (set (reg:QI 36)
(subreg:QI (reg:SI 35) 0)) 0 {movqi_insn} (expr_list:REG_DEAD
(reg:SI 35)
(nil)))

(insn 16 15 17 2 umul.c:20 (set (reg:SI 37)
(plus:SI (reg:SI 26 [ D.1217 ])
(subreg:SI (reg:QI 36) 0))) 11 {addsi3}
(expr_list:REG_DEAD (reg:QI 36)
(expr_list:REG_DEAD (reg:SI 26 [ D.1217 ])
(nil))))

(insn 17 16 0 2 umul.c:20 (set (mem:QI (reg/v/f:SI 29 [ db ]) [0 S1 A8])
(subreg:QI (reg:SI 37) 0)) 0 {movqi_insn} (expr_list:REG_DEAD
(reg:SI 37)
(expr_list:REG_DEAD (reg/v/f:SI 29 [ db ])
(nil))))
---
The umulhisi3 has been correctly found and used at this stage.  In the
following combine stage however, it gets converted into a mulsi3.  The
.combine dump is attached.

The xtensa port is the closest match I can find as it is 32 bit, word
based, and has the umulhisi3.  It correctly keeps the 16 bit multiply.

Some other test cases like:
uint32_t mul(uint16_t a, uint16_t b)
{
return a*b;
}

come through fine.  It might be something to do with the memory access.
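
For comparison, here is a variant of the blend with the widening multiply
spelled out through explicit casts.  This is just a sketch: umul16 and
blend_widening are names I made up for it, and I have not checked whether
writing it this way changes what combine does on my port.

```c
#include <assert.h>
#include <stdint.h>

/* 16x16 -> 32 unsigned multiply with explicit widening casts
   (hypothetical helper). */
uint32_t umul16(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* Same arithmetic as blend(), but the product goes through umul16
   so both operands are visibly zero-extended 16-bit values. */
void blend_widening(uint8_t *sb, uint8_t *db)
{
    assert(sb != 0 && db != 0);
    uint16_t ia = (uint16_t)(256 - *sb);   /* inverse alpha, 1..256 */
    uint16_t d = *db;

    /* The sum never exceeds 255, so the narrowing cast loses nothing. */
    *db = (uint8_t)((umul16(d, ia) >> 8) + *sb);
}
```

The shift here is on an unsigned value, whereas the original expression is
promoted to int and ends up as an arithmetic shift in the dump above.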

How does the combine stage work?  It looks like it could get multiple
potential matches for a set of RTLs.  Does it use some type of costing
function to pick between them?  Can I tell combine that a umulhisi3 is
cheaper than a mulsi3?

Thanks for the earlier help on the post reload split to use the
accumulator - it's working well.

-- Michael


umul.i.159r.combine
Description: Binary data


Any comment about the replacement of gcc news?

2009-06-03 Thread Eric Fisher
Hello

Sorry, I hope it's not an offensive or boring topic.

Some of my friends asked me if it's true that gcc will be replaced by
other compilers on a few OS and what is the problem.

Any comment?

Best wishes
Eric Fisher


Re: gcc --help for options which are not warnings or optimizations

2009-06-03 Thread Nick Clifton

Hi Ian, Hi Diego,

> Diego Novillo wrote:

--help=other?


That works. :-)

Actually, there already is a qualifier which will select 
-fstack-protector, I had just forgotten about it:


  % gcc --help=common | grep stack
  -Wstack-protector           Warn when not issuing stack smashing protection
  -fdefer-pop                 Defer popping functions args from stack until
  -fomit-frame-pointer        When possible do not generate stack frames
  -fstack-check               Insert stack checking code into the program
  -fstack-limit               This switch lacks documentation
  -fstack-limit-register=     Trap if the stack goes past 
  -fstack-limit-symbol=       Trap if the stack goes past symbol 
  -fstack-protector           Use propolice as a stack protection method
  -fstack-protector-all       Use a stack protection method for every function


Cheers
  Nick


Re: question about TARGET_MUST_PASS_IN_STACK

2009-06-03 Thread Ian Lance Taylor
DJ Delorie  writes:

> On xstormy16, when structures with variable-length arrays are passed
> to functions (execute/20020412-1.c), it appears that they're passed by
> reference (based on examining the stack), despite the port not
> explicitly requesting that.
>
> This causes a mis-match in the va_arg code, which assumes the array is
> passed by value, just pushed to the stack portion of the argument
> list.
>
> Which interpretation of these macros is correct?  (based on that, I'll
> debug further)
>
> Xstormy16 uses the default TARGET_MUST_PASS_IN_STACK, which returns
> true only for variable-length arrays, and uses the default
> TARGET_PASS_BY_REFERENCE, which always returns false.

See the function pass_by_reference in function.c.

  /* GCC post 3.4 passes *all* variable sized types by reference.  */

Ian