Re: Delay slot filling - what still matters, and what doesn't matter so much anymore?

2013-04-19 Thread Steven Bosscher
On Thu, Apr 18, 2013 at 12:58 PM, Bernd Schmidt wrote:
> In general I think if a new target wants more than one delay slot, it
> should try to use the C6X method instead of reorg.c. It would be nice
> for someone to try it on a target like mips or PA as well;

Agreed on both points. I actually considered this first, but it's a
little involved because you have to somehow model nullified branches
on targets that don't have predication support. Just the
INSN_FROM_TARGET flag isn't good enough, and wrapping insns in
annulling branch delay slots in a COND_EXEC would require some changes
in recog that I didn't really like.

I think that it would be hard to put all of the smarts of reorg in the
scheduler proper. For all its ugliness, AFAICT reorg is remarkably
effective for targets with a single delay slot, with its
"opposite-arithmetic" trick, "un-PRE" around loops, code hoisting and
redundancy elimination, etc. My new toy scheduler can't compete unless
I implement at least a few of these tricks (but hopefully a little
more cleanly)...

Ciao!
Steven


Re: Delay slot filling - what still matters, and what doesn't matter so much anymore?

2013-04-19 Thread Steven Bosscher
On Thu, Apr 18, 2013 at 6:22 AM, Jeff Law wrote:
> On 04/17/2013 03:52 PM, Steven Bosscher wrote:
>>
>> First of all: What is still important to handle?
>>
>> It's clear that the expectations in reorg.c are "anything goes" but
>> modern RISCs (everything since the PA-8000, say) probably have some
>> limitations on what is helpful to have, or not have, in a delay slot.
>> According to the comments in pa.h about MASK_JUMP_IN_DELAY, having
>> jumps in delay slots of other jumps is one such thing: They don't
>> bring benefit to the PA-8000 and they don't work with DWARF2 CFI. As
>> far as I know, SPARC and MIPS don't allow jumps in delay slots, SH
>> looks like it doesn't allow it either, and CRIS can do it for short
>> branches but doesn't because the trade-off between benefit and
>> machine description complexity comes out negative.
>
> Note that sparc and/or mips might use the "adjust the return pointer" trick.
> I know it wasn't my idea when I added it to the PA.

If they use that trick, how it should work isn't documented. It
doesn't look like they do, though.

Test case:
 8< 
void f1 (void);
void f2 (void);
void f3 (void);

void foo (long a)
{
  if (a != 0)
    {
      f1 ();
      goto skip_some;
    }
  else
    f2 ();
skip_some:
  f3 ();
}
 8< 

sparc64 assembly (with -O2 -fno-reorder-blocks):
 8< 
foo:
save    %sp, -176, %sp
brz,pt  %i0, .L2
 nop
call    f1, 0
 nop
ba,pt   %xcc, .L3
 nop
.L2:
call    f2, 0
 nop
.L3:
call    f3, 0
 restore
 8< 
sparc32 is identical except for the frame size.

mipsisa64 assembly (also with -O2 -fno-reorder-blocks):
 8< 
foo:
.frame  $sp,8,$31   # vars= 0, regs= 1/0, args= 0, gp= 0
.mask   0x8000,0
.fmask  0x,0
.set    noreorder
.set    nomacro
daddiu  $sp,$sp,-8
beq $4,$0,$L2
sd  $31,0($sp)

jal f1
nop

j   $L6
ld  $31,0($sp)

.align  3
$L2:
jal f2
nop

$L3 = .
ld  $31,0($sp)
$L6:
j   f3
daddiu  $sp,$sp,8
 8< 


>>  On the scheduler
>> implementation side: Branches as delayed insns in delay slots of other
>> branches is impossible to express in the CFG (at least in GCC, but I
>> think in general it can't be done cleanly). Therefore I want to drop
>> support for branches in delay slots. What do you think about this?
>
> Certainly no need to support it in the generic case.  The only question is
> whether or not it's worth supporting the "adjust the return pointer in the
> delay slot" stuff.  Given a target without a call/ret predictor stack, it can
> be a significant advantage.  Such things might exist in the embedded space.

This shouldn't be very difficult to support if the target models this
as a jump in the delay slot of calls only. I can let the delay slot
filler allow jumps in delay slots of calls but not in delay slots of
other jumps. But for the moment I'm going to ignore this case unless
someone knows a target in the FSF tree that would benefit from it.
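
The rule described above (jumps allowed in the delay slots of calls, but
not in the delay slots of other jumps) can be sketched as a small
predicate. This is only an illustrative model: the insn_kind enum and
may_fill_delay_slot are hypothetical names and are not taken from reorg.c
or from the new scheduler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified classification of insns; GCC's real RTL
   representation is far richer than this.  */
enum insn_kind { INSN_PLAIN, INSN_JUMP, INSN_CALL };

/* Return true if an insn of kind CANDIDATE may be placed in a delay
   slot of an insn of kind OWNER under the proposed rule.  */
static bool
may_fill_delay_slot (enum insn_kind owner, enum insn_kind candidate)
{
  /* Only jumps and calls have delay slots in this model.  */
  assert (owner == INSN_JUMP || owner == INSN_CALL);

  switch (candidate)
    {
    case INSN_PLAIN:
      return true;
    case INSN_JUMP:
      /* The "adjust the return pointer" case: a jump is accepted
         only in the delay slot of a call.  */
      return owner == INSN_CALL;
    case INSN_CALL:
      /* Assumption: calls are never delay-slot candidates here.  */
      return false;
    }
  return false;
}
```

Keeping the decision in one predicate means the CFG never has to
represent a branch nested inside the delay slot of another branch.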


>> What about multiple delay slots? It looks like reorg.c has code to
>> handle insns with multiple delay slots, but there currently are no GCC
>> targets in the FSF tree that have insns with multiple delay slots and
>> that use define_delay.
>
> Ping Hans, I think he was the last person who tried to deal with reorg and
> multiple delay slots (c4x?).  I certainly wouldn't lose any sleep if we
> killed the limited support for multiple delay slots.

Right, c4x has 3 delay slots. There are also out-of-tree ports for
targets like SHARC. But most such DSP-like targets have some form of
support for predication, so Bernd's c6x scheme would be a better fit.
(And c4x is too old to care about anyway :-)


>> Another thing I noticed about targets with delay slots that can be
>> nullified, is that at least some of the ifcvt.c transformations could
>> be applied to fill more delay slots (obviously if_case_1 and
>> if_case_2). In reorg.c, optimize_skip does some kind of if-conversion.
>> Has anyone looked at whether optimize_skip still does something, and
>> derived a test case for that?
>
> I doubt anyone has looked at it recently.  It pre-dates our if-conversion
> code by a decade or more.

So I collected some stats myself, for a small number (31) of files from gcc
itself, mostly from libcpp and various generator files, compiled at
-O2 for sparc64:

        pass 1                  pass 2
total   simple  eager   skip    simple  eager   skip
insns   9743    3488    22      1297    525     0
filled  5918    2980    22      21      0       0
hit%    61%     31%     0%      0%      0%      0%

total   pass 1  pass 2
insns   9743    1297
filled  8920    21
hit%    92%     2%

So the first fill_simple_delay_slots pass fills ~60% of
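
As a cross-check on the figures above: the per-kind hit rates in the
first table appear to be computed against the 9743 pass-1 candidate insns
rather than against each column's own insn count, while the totals table
divides the pass-2 fills by the pass-2 insn count. A small sketch of that
arithmetic (the helper name hit_percent is made up for illustration):

```c
#include <assert.h>

/* Percentage of FILLED out of TOTAL, rounded to the nearest integer
   (halves round up).  */
static int
hit_percent (int filled, int total)
{
  assert (total > 0 && filled >= 0);
  return (200 * filled + total) / (2 * total);
}
```

For example, hit_percent (2980, 9743) gives 31, which matches the table,
whereas hit_percent (2980, 3488) would give 85.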

GTY question

2013-04-19 Thread Hendrik Greving
The GCC port I am using has a 'hack' that shares some data structure
interfaces (not the data itself) between the front end and the back end.
As a result, I am in the very ugly situation of having to define
these data structures both in the .h file and in the .c file
(for GTY). Is there a more elegant way to make structures visible to
GTY? Can I make GTY look at header files? I am not at all familiar
with GTY.

Thanks,
Hendrik Greving


Re: LRA assign same hard register with live range overlapped pseudos

2013-04-19 Thread Vladimir Makarov

On 13-04-17 11:11 PM, Shiva Chen wrote:

Hi, Vladimir

Overlapped live range RTL is from line 7577 to 7597 in test2.c.209r.reload

The previous patch was probably incomplete.
The new patch records lra_reg_info[i].offset as the offset from the
eliminable register to pseudo i
and keeps updating it whenever the stack changes.
Therefore, lra-assign can use the latest offset to determine whether
the contents of the pseudos are equal.

  gcc/lra-assigns.c  |6 --
  gcc/lra-eliminations.c |   12 ++--
  gcc/lra-int.h  |2 ++
  gcc/lra.c  |5 -
  4 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c
index b204513..daf0aa9 100644
--- a/gcc/lra-assigns.c
+++ b/gcc/lra-assigns.c
@@ -448,7 +448,7 @@ find_hard_regno_for (int regno, int *cost, int
try_only_hard_regno)
int hr, conflict_hr, nregs;
enum machine_mode biggest_mode;
unsigned int k, conflict_regno;
-  int val, biggest_nregs, nregs_diff;
+  int offset, val, biggest_nregs, nregs_diff;
enum reg_class rclass;
bitmap_iterator bi;
bool *rclass_intersect_p;
@@ -508,9 +508,11 @@ find_hard_regno_for (int regno, int *cost, int
try_only_hard_regno)
  #endif
sparseset_clear_bit (conflict_reload_and_inheritance_pseudos, regno);
val = lra_reg_info[regno].val;
+  offset = lra_reg_info[regno].offset;
CLEAR_HARD_REG_SET (impossible_start_hard_regs);
EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno)
-if (val == lra_reg_info[conflict_regno].val)
+if ((val == lra_reg_info[conflict_regno].val)
+&& (offset == lra_reg_info[conflict_regno].offset))
{
 conflict_hr = live_pseudos_reg_renumber[conflict_regno];
 nregs = (hard_regno_nregs[conflict_hr]
diff --git a/gcc/lra-eliminations.c b/gcc/lra-eliminations.c
index 9df0bae..2d34b51 100644
--- a/gcc/lra-eliminations.c
+++ b/gcc/lra-eliminations.c
@@ -1046,6 +1046,7 @@ spill_pseudos (HARD_REG_SET set)
  static void
  update_reg_eliminate (bitmap insns_with_changed_offsets)
  {
+  int i;
bool prev;
struct elim_table *ep, *ep1;
HARD_REG_SET temp_hard_reg_set;
@@ -1124,8 +1125,15 @@ update_reg_eliminate (bitmap insns_with_changed_offsets)
setup_elimination_map ();
for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
  if (elimination_map[ep->from] == ep && ep->previous_offset != ep->offset)
-  bitmap_ior_into (insns_with_changed_offsets,
-  &lra_reg_info[ep->from].insn_bitmap);
+  {
+bitmap_ior_into (insns_with_changed_offsets,
+&lra_reg_info[ep->from].insn_bitmap);
+
+   /* Update offset when the eliminate offset have been changed.  */
+for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+ if (lra_reg_info[i].val - 1 == ep->from)

I guess the -1 here is a typo.

+   lra_reg_info[i].offset += (ep->offset - ep->previous_offset);
+  }
  }

  /* Initialize the table of hard registers to eliminate.
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 98f2ff7..944cad1 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -116,6 +116,8 @@ struct lra_reg
/* Value holding by register. If the pseudos have the same value
   they do not conflict.  */
int val;
+  /* Offset from relative eliminate register to pesudo reg.  */
+  int offset;
/* These members are set up in lra-lives.c and updated in
   lra-coalesce.c.  */
/* The biggest size mode in which each pseudo reg is referred in
diff --git a/gcc/lra.c b/gcc/lra.c
index 9df24b5..7a60281 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -194,7 +194,10 @@ lra_create_new_reg (enum machine_mode md_mode,
rtx original,
new_reg
  = lra_create_new_reg_with_unique_value (md_mode, original, rclass, title);
if (original != NULL_RTX && REG_P (original))
-lra_reg_info[REGNO (new_reg)].val = lra_reg_info[REGNO (original)].val;
+{
+  lra_reg_info[REGNO (new_reg)].val = lra_reg_info[REGNO (original)].val;
+  lra_reg_info[REGNO (new_reg)].offset = 0;
+}
return new_reg;
  }



Thanks for the dump files.  They helped me understand the situation 
better.  The patch is better, but it is still incomplete.  There are 
more places where the values are used (more places in lra-assigns.c, a 
few places in lra-constraints.c).


As there are many places it would be nice to have helper functions (I'd 
make them static inline and put them in lra-int.h):


static inline void
lra_set_up_reg_val (int regno, int val, int offset)

and

static inline bool
lra_reg_val_equal_p (int regno, int val, int offset)

and use them wherever it is possible.
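
A minimal sketch of what the two helpers might look like, assuming they
simply wrap the val and offset fields used in the patch. The bodies are a
guess at the intent, not the committed implementation; struct
lra_reg_sketch, reg_info_sketch and MAX_REGS_SKETCH are stand-ins so that
the example compiles outside of GCC.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_REGS_SKETCH 64

/* Stand-in for the two relevant fields of struct lra_reg.  */
struct lra_reg_sketch
{
  int val;     /* Value held by the register.  */
  int offset;  /* Offset from the eliminable register.  */
};

static struct lra_reg_sketch reg_info_sketch[MAX_REGS_SKETCH];

/* Set up value VAL and offset OFFSET for pseudo REGNO.  */
static inline void
lra_set_up_reg_val (int regno, int val, int offset)
{
  assert (regno >= 0 && regno < MAX_REGS_SKETCH);
  reg_info_sketch[regno].val = val;
  reg_info_sketch[regno].offset = offset;
}

/* Return true if pseudo REGNO holds value VAL at offset OFFSET.  */
static inline bool
lra_reg_val_equal_p (int regno, int val, int offset)
{
  assert (regno >= 0 && regno < MAX_REGS_SKETCH);
  return (reg_info_sketch[regno].val == val
          && reg_info_sketch[regno].offset == offset);
}
```

With helpers of this shape, the conflict test in find_hard_regno_for
could be written as lra_reg_val_equal_p (conflict_regno, val, offset),
so that no caller can compare val and forget offset.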

So could you check all the places where .val is used, define the helper 
functions, use them wherever possible, and send me the new 
version of the patch?  I'll approve it after some checking.


Thanks for working on this problem.  I really appreciate it.



Related to GSOC 2013 project

2013-04-19 Thread Abhishek Sharma
Good Day to all.

I'm an aspirant for GSoC 2013.
I'm currently studying at Jaypee University of Information and
Technology, HP, India.

I'm interested in GCC, which is why I'm looking forward to working on one of
your projects, titled "Improvements to GCC on Windows".

After reading the resources available on the internet, I have two questions:

1. Which programming languages are required for this project? I'm fluent
in C, C++ and Java.
2. Where can I find the relevant forms for the "Legal Prerequisites"?

Kindly guide me on these two points so that I can get started on
this project.

Yours sincerely
Abhishek


Re: vec<> inside GTYed struct

2013-04-19 Thread Diego Novillo

On 2013-04-19 10:21 , Paulo Matos wrote:

struct GTY(()) LOOP_INFO
{
   ...
   vec infos;


You are declaring a heap vector here.  Since your structure is in GC 
memory, the vector must also be in GC memory.  Add 'va_gc' to the 
arguments; and make infos a pointer (a sad side-effect of using GC):


vec<..., va_gc> *infos;


Diego.


Re: section attribute

2013-04-19 Thread reed kotler

My mistake here.

I have not used the section attribute much myself, and I was looking at 
gcc code that just passed the right-hand side of the .section directive 
to the attribute string.

# Stub function for foovf (float)
.section .mips16.fn.foovf,"ax",@progbits
.align 2
.set nomips16
.set nomicromips
.ent __fn_stub_foovf
.type __fn_stub_foovf, @function
__fn_stub_foovf:
la $25,foovf
mfc1 $4,$f12
jr $25
.end __fn_stub_foovf
.text
$__fn_local_foovf = foovf


On 04/19/2013 08:22 AM, reed kotler wrote:

I tried to report a bug against llvm for not properly handling the
section attribute but they claim that it's not the intention for gcc 
to work this way.


I reported it as an X86 problem because it's more generally 
understandable to people but actually the problem occurs when mips16 
generates stubs for floating point interoperability with mips32.


The gcc documentation says that the section attribute takes the 
section name.


What is the real rule?

Is this "feature" used by other ports except for gcc mips16?

TIA.

Reed

Here is what I reported:

Consider the following code:

void x(int i) __attribute((section(".mySection,\"aw\",@progbits#")));

void x(int i) {
}


If you compile this with gcc you get:

.file "sectbug.c"
.section .mySection,"aw",@progbits#,"ax",@progbits
.globl x


 With Clang you get

.file "sectbug.c"
.section ".mySection,\"aw\",@progbits#","ax",@progbits
.globl x







section attribute

2013-04-19 Thread reed kotler

I tried to report a bug against llvm for not properly handling the
section attribute but they claim that it's not the intention for gcc to 
work this way.


I reported it as an X86 problem because it's more generally 
understandable to people but actually the problem occurs when mips16 
generates stubs for floating point interoperability with mips32.


The gcc documentation says that the section attribute takes the section 
name.


What is the real rule?

Is this "feature" used by other ports except for gcc mips16?

TIA.

Reed

Here is what I reported:

Consider the following code:

void x(int i) __attribute((section(".mySection,\"aw\",@progbits#")));

void x(int i) {
}


If you compile this with gcc you get:

.file "sectbug.c"
.section .mySection,"aw",@progbits#,"ax",@progbits
.globl x


 With Clang you get

.file "sectbug.c"
.section ".mySection,\"aw\",@progbits#","ax",@progbits
.globl x




vec<> inside GTYed struct

2013-04-19 Thread Paulo Matos
Hello,

Should I be concerned about using a vec<> inside a GTYed struct? Something like:
typedef loop_info * LOOP_INFO;

struct GTY(()) LOOP_INFO 
{
  ...
  vec infos;
};

is causing me some pain due to an invalid free() / delete / delete[] / realloc() 
(as reported by valgrind after a segfault).

Are there any rules of thumb for what can go inside a GTYed struct? I read 
http://gcc.gnu.org/onlinedocs/gcc-4.8.0/gccint/Type-Information.html#Type-Information
but unfortunately it doesn't mention the use of vecs inside GTYed structs.

For the sake of completeness, the backtrace looks like:
==30111== Invalid free() / delete / delete[] / realloc()
==30111==    at 0x4A078F0: realloc (vg_replace_malloc.c:632)
==30111==    by 0x1037CEC: xrealloc (xmalloc.c:179)
==30111==    by 0x663C4E: void va_heap::reserve(vec*&, unsigned int, bool) (vec.h:300)
==30111==    by 0x663B49: vec::reserve(unsigned int, bool) (vec.h:1468)
==30111==    by 0xCEA04E: firepath_reorg_loops(_IO_FILE*) (firepath.c:8764)
==30111==    by 0xCE6107: firepath_reorg() (firepath.c:6678)
==30111==    by 0x9BE01D: rest_of_handle_machine_reorg() (reorg.c:3927)
==30111==    by 0x94BC0A: execute_one_pass(opt_pass*) (passes.c:2379)
==30111==    by 0x94BDFE: execute_pass_list(opt_pass*) (passes.c:2427)
==30111==    by 0x94BE2F: execute_pass_list(opt_pass*) (passes.c:2428)
==30111==    by 0x94BE2F: execute_pass_list(opt_pass*) (passes.c:2428)
==30111==    by 0x68A254: expand_function(cgraph_node*) (cgraphunit.c:1640)

Cheers,

Paulo Matos