Re: question about rtl loop-iv analysis

2007-08-28 Thread Zdenek Dvorak
Hello,

> And finally at the stage of rtl unrolling it looks like this:
> [6]   r186 = r2 + C;
>   r318 = r186 + 160;
>   loop:
>   r186 = r186 + 16
>   if (r186 != r318) then goto loop else exit
> 
> Then, in loop-unroll.c we call iv_number_of_iterations, which eventually
> calls iv_analyze_biv (with r258/r186), which in turn calls
> latch_dominating_def.
> There, when processing the vectorized case, it encounters the def in the
> loop ('r186+16'), and then the def outside the loop ('r2+C'), at which
> point it fails cause it found two defs (and so we get that this is not a
> simple iv, and not a simple loop, and unrolling fails: "Unable to prove
> that the loop iterates constant times").
> When processing the unvectorized case, we also first encounter the def in
> the loop ('r258+16'), and then the def outside the loop ('0'), but this def
> succeeds the test "if (!bitmapset (bb_info->out,))", and so we don't
> fail when we encounter the second def, and all is well.
> 
> So one question I have is what is that bitmap exactly, and why does loop
> [6] fail rtl iv-analysis?

the bitmap bb_info->out should contain reaching definitions at the end
of the latch block; so the definition of r186 outside of the loop
should not be contained in it.  This seems to be a dataflow analysis
problem.

Zdenek


Re: question about rtl loop-iv analysis

2007-08-28 Thread Kenneth Zadeck
Zdenek Dvorak wrote:
> Hello,
>
>   
>> And finally at the stage of rtl unrolling it looks like this:
>> [6]   r186 = r2 + C;
>>   r318 = r186 + 160;
>>   loop:
>>   r186 = r186 + 16
>>   if (r186 != r318) then goto loop else exit
>>
>> Then, in loop-unroll.c we call iv_number_of_iterations, which eventually
>> calls iv_analyze_biv (with r258/r186), which in turn calls
>> latch_dominating_def.
>> There, when processing the vectorized case, it encounters the def in the
>> loop ('r186+16'), and then the def outside the loop ('r2+C'), at which
>> point it fails cause it found two defs (and so we get that this is not a
>> simple iv, and not a simple loop, and unrolling fails: "Unable to prove
>> that the loop iterates constant times").
>> When processing the unvectorized case, we also first encounter the def in
>> the loop ('r258+16'), and then the def outside the loop ('0'), but this def
>> succeeds the test "if (!bitmapset (bb_info->out,))", and so we don't
>> fail when we encounter the second def, and all is well.
>>
>> So one question I have is what is that bitmap exactly, and why does loop
>> [6] fail rtl iv-analysis?
>> 
>
> the bitmap bb_info->out should contain reaching definitions at the end
> of the latch block; so the definition of r186 outside of the loop
> should not be contained in it.  This seems to be a dataflow analysis
> problem.
>
> Zdenek
>   
It could be, but it is hard to see how.  There is nothing magic about
the code, i goes over the set of blocks looking for defs and that is
what it uses.  Each call to df_set_blocks resets the table of defs that
can be considered.

In particular, you need to first verify that you are passing exactly the
correct set of blocks to df_set_blocks.


Kenny


Re: question about rtl loop-iv analysis

2007-08-28 Thread Zdenek Dvorak
Hello,

> >> And finally at the stage of rtl unrolling it looks like this:
> >> [6]   r186 = r2 + C;
> >>   r318 = r186 + 160;
> >>   loop:
> >>   r186 = r186 + 16
> >>   if (r186 != r318) then goto loop else exit
> >>
> >> Then, in loop-unroll.c we call iv_number_of_iterations, which eventually
> >> calls iv_analyze_biv (with r258/r186), which in turn calls
> >> latch_dominating_def.
> >> There, when processing the vectorized case, it encounters the def in the
> >> loop ('r186+16'), and then the def outside the loop ('r2+C'), at which
> >> point it fails cause it found two defs (and so we get that this is not a
> >> simple iv, and not a simple loop, and unrolling fails: "Unable to prove
> >> that the loop iterates constant times").
> >> When processing the unvectorized case, we also first encounter the def in
> >> the loop ('r258+16'), and then the def outside the loop ('0'), but this def
> >> succeeds the test "if (!bitmapset (bb_info->out,))", and so we don't
> >> fail when we encounter the second def, and all is well.
> >>
> >> So one question I have is what is that bitmap exactly, and why does loop
> >> [6] fail rtl iv-analysis?
> >> 
> >
> > the bitmap bb_info->out should contain reaching definitions at the end
> > of the latch block; so the definition of r186 outside of the loop
> > should not be contained in it.  This seems to be a dataflow analysis
> > problem.
> >
> It could be, but it is hard to see how.  There is nothing magic about
> the code, i goes over the set of blocks looking for defs and that is
> what it uses.  Each call to df_set_blocks resets the table of defs that
> can be considered.
> 
> In particular, you need to first verify that you are passing exactly the
> correct set of blocks to df_set_blocks.

regardless of the set of blocks considered, there is no way how both the
definition outside of the loop and the definition inside the loop could
both be reaching definitions at the same point in the program.

Zdenek


Re: question about rtl loop-iv analysis

2007-08-28 Thread Kenneth Zadeck
Zdenek Dvorak wrote:
> Hello,
>
>   
 And finally at the stage of rtl unrolling it looks like this:
 [6]   r186 = r2 + C;
   r318 = r186 + 160;
   loop:
   r186 = r186 + 16
   if (r186 != r318) then goto loop else exit

 Then, in loop-unroll.c we call iv_number_of_iterations, which eventually
 calls iv_analyze_biv (with r258/r186), which in turn calls
 latch_dominating_def.
 There, when processing the vectorized case, it encounters the def in the
 loop ('r186+16'), and then the def outside the loop ('r2+C'), at which
 point it fails cause it found two defs (and so we get that this is not a
 simple iv, and not a simple loop, and unrolling fails: "Unable to prove
 that the loop iterates constant times").
 When processing the unvectorized case, we also first encounter the def in
 the loop ('r258+16'), and then the def outside the loop ('0'), but this def
 succeeds the test "if (!bitmapset (bb_info->out,))", and so we don't
 fail when we encounter the second def, and all is well.

 So one question I have is what is that bitmap exactly, and why does loop
 [6] fail rtl iv-analysis?
 
 
>>> the bitmap bb_info->out should contain reaching definitions at the end
>>> of the latch block; so the definition of r186 outside of the loop
>>> should not be contained in it.  This seems to be a dataflow analysis
>>> problem.
>>>
>>>   
>> It could be, but it is hard to see how.  There is nothing magic about
>> the code, i goes over the set of blocks looking for defs and that is
>> what it uses.  Each call to df_set_blocks resets the table of defs that
>> can be considered.
>>
>> In particular, you need to first verify that you are passing exactly the
>> correct set of blocks to df_set_blocks.
>> 
>
> regardless of the set of blocks considered, there is no way how both the
> definition outside of the loop and the definition inside the loop could
> both be reaching definitions at the same point in the program.
>
> Zdenek
>   
sure it can.  if the def is not a killing def, i.e. the def is a partial
or conditional, then the prior defs reach right in.

kenny


Re: question about rtl loop-iv analysis

2007-08-28 Thread Michael Matz
On Tue, 28 Aug 2007, Kenneth Zadeck wrote:

>  And finally at the stage of rtl unrolling it looks like this:
>  [6]   r186 = r2 + C;
>    r318 = r186 + 160;
>    loop:
>    r186 = r186 + 16
>    if (r186 != r318) then goto loop else exit
> 
> >
> > regardless of the set of blocks considered, there is no way how both the
> > definition outside of the loop and the definition inside the loop could
> > both be reaching definitions at the same point in the program.
> >   
> sure it can.  if the def is not a killing def, i.e. the def is a partial
> or conditional, then the prior defs reach right in.

I don't think that's the case above for r186.

Apart from that I agree that in the case of conditional defs multiple ones 
might reach a use.  I disagree for partial defs, if one dominates the 
other.  For most cases (except register allocation) a partial def is just 
a complete killing def, which happens to leave some contents unchanged (by 
conceptually copying that part from what was in there before, hence the 
usual read-mod-write dependency from a partial def which generates also a 
use).  But the use doesn't care for this, for all it knows that def is the 
one changing the contents, hence a use-def chain should not contain a def 
dominating other defs (partial or not).  The relationship between both 
defs will be described by the use-def chain hanging off the implicit use 
generated by the partial def.

E.g. in this example (using my old hopefully understandable notation), if 
p1 is a DImode pseudo:

1 p1+0:SI <- bla
2 p1+4:SI <- blubb
3 use <- p1:DI

insn 3 should only see def 2 in it's chain.  That it also is influenced by 
the def 1 will be captured by the implicit use of p1 in insn 2.  I.e. not 
different to

1 p1 <- bla
2 p1 <- p1 + blubb
3 use <- p1

Exact partial defs should only (and do) matter for register allocation.  
That's one of the reasons why (some years ago for new-ra) I exposed 
real partial defs of subregs from the df framework only for the register 
allocator, but not for general code.


Ciao,
Michael.


Re: question about rtl loop-iv analysis

2007-08-28 Thread Zdenek Dvorak
Hello,

>  And finally at the stage of rtl unrolling it looks like this:
>  [6]   r186 = r2 + C;
>    r318 = r186 + 160;
>    loop:
>    r186 = r186 + 16
>    if (r186 != r318) then goto loop else exit
> 
>  Then, in loop-unroll.c we call iv_number_of_iterations, which eventually
>  calls iv_analyze_biv (with r258/r186), which in turn calls
>  latch_dominating_def.
>  There, when processing the vectorized case, it encounters the def in the
>  loop ('r186+16'), and then the def outside the loop ('r2+C'), at which
>  point it fails cause it found two defs (and so we get that this is not a
>  simple iv, and not a simple loop, and unrolling fails: "Unable to prove
>  that the loop iterates constant times").
>  When processing the unvectorized case, we also first encounter the def in
>  the loop ('r258+16'), and then the def outside the loop ('0'), but this 
>  def
>  succeeds the test "if (!bitmapset (bb_info->out,))", and so we don't
>  fail when we encounter the second def, and all is well.
> 
>  So one question I have is what is that bitmap exactly, and why does loop
>  [6] fail rtl iv-analysis?
>  
>  
> >>> the bitmap bb_info->out should contain reaching definitions at the end
> >>> of the latch block; so the definition of r186 outside of the loop
> >>> should not be contained in it.  This seems to be a dataflow analysis
> >>> problem.
> >>>
> >>>   
> >> It could be, but it is hard to see how.  There is nothing magic about
> >> the code, i goes over the set of blocks looking for defs and that is
> >> what it uses.  Each call to df_set_blocks resets the table of defs that
> >> can be considered.
> >>
> >> In particular, you need to first verify that you are passing exactly the
> >> correct set of blocks to df_set_blocks.
> >> 
> >
> > regardless of the set of blocks considered, there is no way how both the
> > definition outside of the loop and the definition inside the loop could
> > both be reaching definitions at the same point in the program.
> >
> sure it can.  if the def is not a killing def, i.e. the def is a partial
> or conditional, then the prior defs reach right in.

that obviously is not the case here, though.  Do you (or someone else
responsible for df) have time to have a look at this problem?
Otherwise, we may discuss it forever, but we will not get anywhere.

Zdenek


Re: question about rtl loop-iv analysis

2007-08-28 Thread Seongbae Park (박성배, 朴成培)
On 8/28/07, Zdenek Dvorak <[EMAIL PROTECTED]> wrote:
...
> that obviously is not the case here, though.  Do you (or someone else
> responsible for df) have time to have a look at this problem?
> Otherwise, we may discuss it forever, but we will not get anywhere.
>
> Zdenek

Open a PR and assign it to me, if you're not in a hurry -
I should be able to look at it next week.
-- 
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: question about rtl loop-iv analysis

2007-08-29 Thread Dorit Nuzman
"Seongbae Park (박성배, 朴成培)" <[EMAIL PROTECTED]> wrote on
29/08/2007 01:01:42:

> On 8/28/07, Zdenek Dvorak <[EMAIL PROTECTED]> wrote:
> ...
> > that obviously is not the case here, though.  Do you (or someone else
> > responsible for df) have time to have a look at this problem?
> > Otherwise, we may discuss it forever, but we will not get anywhere.
> >
> > Zdenek
>
> Open a PR and assign it to me, if you're not in a hurry -
> I should be able to look at it next week.

Thanks! This is PR33224

dorit

> --
> #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";



Re: question about rtl loop-iv analysis

2007-08-29 Thread Bernd Schmidt

Michael Matz wrote:


Apart from that I agree that in the case of conditional defs multiple ones 
might reach a use.  I disagree for partial defs, if one dominates the 
other.  For most cases (except register allocation) a partial def is just 
a complete killing def, which happens to leave some contents unchanged (by 
conceptually copying that part from what was in there before, hence the 
usual read-mod-write dependency from a partial def which generates also a 
use).  But the use doesn't care for this, for all it knows that def is the 
one changing the contents, hence a use-def chain should not contain a def 
dominating other defs (partial or not).  The relationship between both 
defs will be described by the use-def chain hanging off the implicit use 
generated by the partial def.


I completely agree, and I've tried very hard to argue that with Kenneth 
last year when I spotted this kind of bogosity in the dataflow code, but 
I got nowhere.  I wish you better luck.



Bernd
--
This footer brought to you by insane German lawmakers.
Analog Devices GmbH  Wilhelm-Wagenfeld-Str. 6  80807 Muenchen
Sitz der Gesellschaft Muenchen, Registergericht Muenchen HRB 40368
Geschaeftsfuehrer Thomas Wessel, William A. Martin, Margaret Seif


Re: question about rtl loop-iv analysis

2007-08-29 Thread Michael Matz
Hi,

On Wed, 29 Aug 2007, Bernd Schmidt wrote:

> > Apart from that I agree that in the case of conditional defs multiple 
> > ones might reach a use.  I disagree for partial defs, if one dominates 
> > the other. For most cases (except register allocation) a partial def 
> > is just a complete killing def, which happens to leave some contents 
> > unchanged (by conceptually copying that part from what was in there 
> > before, hence the usual read-mod-write dependency from a partial def 
> > which generates also a use). But the use doesn't care for this, for 
> > all it knows that def is the one changing the contents, hence a 
> > use-def chain should not contain a def dominating other defs (partial 
> > or not).  The relationship between both defs will be described by the 
> > use-def chain hanging off the implicit use generated by the partial 
> > def.
> 
> I completely agree, and I've tried very hard to argue that with Kenneth 
> last year when I spotted this kind of bogosity in the dataflow code, but 

I initially implemented giving out partial defs in df.c on the new-ra 
branch because I needed it for precise subreg register allocation.  
Before that df.c was doing simply the read-mod-write scheme (i.e. 
generating a use-ref and a def-ref for each partial write).  That code 
later also landed in mainline, with new-ra being removed somewhat later, 
but not the df.c changes (but they were still conditional on some flag, 
that you really wanted those partial defs, for exclusive use of a reg 
allocator).  I believe that code was ported forward in all later df*.c 
rewrites and extensions, and hence still is there.

If that is true, then I can say for sure that I definitely never intended 
to expose partial defs without use-refs to the outside except for very 
constrained use cases, namely register allocation, or generally any pass 
which might be interested in exact subreg life times _and_ is prepared to 
handle them, which most usual code is not, simply because it doesn't 
matter for them.  Because if they aren't prepared to handle them they will 
become confused as to what is and what is not a "complete" definition (in 
the sense that between that def and the use in question no contents from 
before that bordering def remain; in fact they have no concept of that) 
and be surprised by uses reached by two defs where one dominates the other 
(e.g. because both are in the same basic block).



If df*.c is now doing that for all it's callers I'm pretty sure that just 
started as an artifact and later transformed into the normal way of doing 
things.  It would be wrong nevertheless.


Ciao,
Michael.


Re: question about rtl loop-iv analysis

2007-08-30 Thread Kenneth Zadeck
Michael Matz wrote:
> Hi,
>
> On Wed, 29 Aug 2007, Bernd Schmidt wrote:
>
>   
>>> Apart from that I agree that in the case of conditional defs multiple 
>>> ones might reach a use.  I disagree for partial defs, if one dominates 
>>> the other. For most cases (except register allocation) a partial def 
>>> is just a complete killing def, which happens to leave some contents 
>>> unchanged (by conceptually copying that part from what was in there 
>>> before, hence the usual read-mod-write dependency from a partial def 
>>> which generates also a use). But the use doesn't care for this, for 
>>> all it knows that def is the one changing the contents, hence a 
>>> use-def chain should not contain a def dominating other defs (partial 
>>> or not).  The relationship between both defs will be described by the 
>>> use-def chain hanging off the implicit use generated by the partial 
>>> def.
>>>   
>> I completely agree, and I've tried very hard to argue that with Kenneth 
>> last year when I spotted this kind of bogosity in the dataflow code, but 
>> 
>
> I initially implemented giving out partial defs in df.c on the new-ra 
> branch because I needed it for precise subreg register allocation.  
> Before that df.c was doing simply the read-mod-write scheme (i.e. 
> generating a use-ref and a def-ref for each partial write).  That code 
> later also landed in mainline, with new-ra being removed somewhat later, 
> but not the df.c changes (but they were still conditional on some flag, 
> that you really wanted those partial defs, for exclusive use of a reg 
> allocator).  I believe that code was ported forward in all later df*.c 
> rewrites and extensions, and hence still is there.
>
> If that is true, then I can say for sure that I definitely never intended 
> to expose partial defs without use-refs to the outside except for very 
> constrained use cases, namely register allocation, or generally any pass 
> which might be interested in exact subreg life times _and_ is prepared to 
> handle them, which most usual code is not, simply because it doesn't 
> matter for them.  Because if they aren't prepared to handle them they will 
> become confused as to what is and what is not a "complete" definition (in 
> the sense that between that def and the use in question no contents from 
> before that bordering def remain; in fact they have no concept of that) 
> and be surprised by uses reached by two defs where one dominates the other 
> (e.g. because both are in the same basic block).
>
> 
>
> If df*.c is now doing that for all it's callers I'm pretty sure that just 
> started as an artifact and later transformed into the normal way of doing 
> things.  It would be wrong nevertheless.
>
>
> Ciao,
> Michael.
>   
Michael and Bernd,

I you look at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33224 you will
see that this bug has nothing to do with the type of def.  It is pilot
error on the author of the loop unrolling code.  They are looking in the
wrong places in the bitvectors.

Kenny


Re: question about rtl loop-iv analysis

2007-08-30 Thread Michael Matz
Hi,

On Thu, 30 Aug 2007, Kenneth Zadeck wrote:

> Michael and Bernd,
> 
> I you look at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33224 you will 
> see that this bug has nothing to do with the type of def.  It is pilot 
> error on the author of the loop unrolling code.  They are looking in the 
> wrong places in the bitvectors.

Yes, but I wasn't talking about the specific bug in loop-iv, sorry if that 
didn't become clear.  I was specifically reacting to your claim that 
multiple partial defs might reach one use even if one dominates the other 
and that this is the correct thing to happen.  IMO it is not and it would 
be a mistake if df*.c let this happen.  I tried to explain why I think so, 
and again wasn't refering in any way to the loop-iv bug.


Ciao,
Michael.


Re: question about rtl loop-iv analysis

2007-09-03 Thread Dorit Nuzman
Zdenek's patch here -
http://gcc.gnu.org/ml/gcc-patches/2007-08/msg02291.html - solved the
problem.
Kenny, Zdenek - many thanks for solving this issue!

dorit

> "Seongbae Park (박성배, 朴成培)" <[EMAIL PROTECTED]> wrote on
> 29/08/2007 01:01:42:
>
> > On 8/28/07, Zdenek Dvorak <[EMAIL PROTECTED]> wrote:
> > ...
> > > that obviously is not the case here, though.  Do you (or someone else
> > > responsible for df) have time to have a look at this problem?
> > > Otherwise, we may discuss it forever, but we will not get anywhere.
> > >
> > > Zdenek
> >
> > Open a PR and assign it to me, if you're not in a hurry -
> > I should be able to look at it next week.
>
> Thanks! This is PR33224
>
> dorit
>
> > --
> > #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
>