Limits of stage3 changes

2007-11-16 Thread Steven Bosscher
Hello,

The amount of duplicate work done on RTL is sometimes really amazing,
especially since the merge of the dataflow branch.  Some of the people
who have worked on the dataflow branch had hoped that other developers
would help with the follow-up actions to actually *use* all the
information that, for example, the df-scan problem collects.

My favorite example of this lack of follow-through is gcse.c.  It
computes reg-def chains and monotonic insn IDs. Guess what df-scan
provides?

Attached is an example of the cleanups I would have liked to see
during stage 2 of GCC 4.3.  The patch is not finished, but it gives
you an idea of what is possible.  The patch avoids computation of
redundant information by making gcse use the df-scan information,
instead of computing everything for itself.  In theory, this saves
memory and it reduces the number of passes over all RTX'en in memory.
So, if finished, this should give compile time improvements and a
slightly smaller memory footprint.

Question is, whether this kind of rather large changes is acceptable
for stage 3 or not.  Me, I call it a "regression fix" if it reduces
compile time.  But I will not work on it now (or ask help from others)
if it is a priori not acceptable for stage 3.

Thus, thoughts please! :-)

(FWIW I have almost complete rewrites of half the passes in gcse.c,
and the attached patch is a kind-of back port of the ideas from my new
implementations...)

Gr.
Steven
	* gcse.c (uid_cuid, max_uid, INSN_CUID, max_cuid, struct reg_set,
	reg_set_table, REG_SET_TABLE_SLOP, reg_set_in_block,
	alloc_reg_set_mem, free_reg_set_mem, record_one_set,
	record_set_info, compute_set, grealloc): Remove.
	(recompute_all_luids): New function.
	(gcse_main): Don't compute sets, and don't do related memory
	allocations/free-ing.  If something changed before the end of the
	pass, update LUIDs using recompute_all_luids.
	(alloc_gcse_mem): Don't compute LUIDs.  Don't allocate reg_set memory.
	(free_gcse_mem): Don't free it either.
	(oprs_unchanged_p, load_killed_in_block, record_last_reg_set_info):
	Use the df insn LUIDs.
	(load_killed_in_block): Likewise.
	(compute_hash_table_work): Don't compute reg_set_in_block.
	(compute_transp): Use DF_REG_DEF_CHAINs.
	(local_cprop_pass): Don't use compute_sets and related functions.
	(one_cprop_pass, pre_gcse, one_pre_gcse_pass, one_code_hoisting_pass):
	Use get_max_uid() instead of max_cuid.
	(insert_insn_end_basic_block, pre_insert_copy_insn,
	update_ld_motion_stores): Don't try to
	keep reg_set tables up to date.
	(pre_insert_copies): Use df insn LUIDs.
	(reg_set_info): Don't use extra bitmap argument.
	(compute_store_table): Don't compute reg_set_in_block.  Use DF scan
	information to compute regs_set_in_block.
	(free_store_memory, store_motion): Don't nullify reg_set_in_block.
	(bypass_jumps): Don't use compute_sets and friends.

Index: gcse.c
===
*** gcse.c	(revision 130236)
--- gcse.c	(working copy)
*** along with GCC; see the file COPYING3.  
*** 27,36 
 - a store to the same address as a load does not kill the load if the
   source of the store is also the destination of the load.  Handling this
   allows more load motion, particularly out of loops.
-- ability to realloc sbitmap vectors would allow one initial computation
-  of reg_set_in_block with only subsequent additions, rather than
-  recomputing it for each pass
- 
  */
  
  /* References searched while implementing this.
--- 27,32 
*** static struct hash_table expr_hash_table
*** 362,431 
  /* Copy propagation hash table.  */
  static struct hash_table set_hash_table;
  
- /* Mapping of uids to cuids.
-Only real insns get cuids.  */
- static int *uid_cuid;
- 
- /* Highest UID in UID_CUID.  */
- static int max_uid;
- 
- /* Get the cuid of an insn.  */
- #ifdef ENABLE_CHECKING
- #define INSN_CUID(INSN) \
-   (gcc_assert (INSN_UID (INSN) <= max_uid), uid_cuid[INSN_UID (INSN)])
- #else
- #define INSN_CUID(INSN) (uid_cuid[INSN_UID (INSN)])
- #endif
- 
- /* Number of cuids.  */
- static int max_cuid;
- 
  /* Maximum register number in function prior to doing gcse + 1.
 Registers created during this pass have regno >= max_gcse_regno.
 This is named with "gcse" to not collide with global of same name.  */
  static unsigned int max_gcse_regno;
  
- /* Table of registers that are modified.
- 
-For each register, each element is a list of places where the pseudo-reg
-is set.
- 
-For simplicity, GCSE is done on sets of pseudo-regs only.  PRE GCSE only
-requires knowledge of which blocks kill which regs [and thus could use
-a bitmap instead of the lists `reg_set_table' uses].
- 
-`reg_set_table' and could be turned into an array of bitmaps (num-bbs x
-num-regs) [however perhaps it may be useful to keep the data as is].  One
-advantage of recording things this way is that `reg_set_table' is fairly
-sparse with r

Re: Limits of stage3 changes

2007-11-16 Thread Richard Guenther
On Nov 16, 2007 11:43 PM, Steven Bosscher <[EMAIL PROTECTED]> wrote:
> Hello,
>
> The amount of duplicate work done on RTL is sometimes really amazing,
> especially since the merge of the dataflow branch.  Some of the people
> who have worked on the dataflow branch had hoped that other developers
> would help with the follow-up actions to actually *use* all the
> information that, for example, the df-scan problem collects.
>
> My favorite example of this lack of follow-through is gcse.c.  It
> computes reg-def chains and monotonic insn IDs. Guess what df-scan
> provides?
>
> Attached is an example of the cleanups I would have liked to see
> during stage 2 of GCC 4.3.  The patch is not finished, but it gives
> you an idea of what is possible.  The patch avoids computation of
> redundant information by making gcse use the df-scan information,
> instead of computing everything for itself.  In theory, this saves
> memory and it reduces the number of passes over all RTX'en in memory.
> So, if finished, this should give compile time improvements and a
> slightly smaller memory footprint.
>
> Question is, whether this kind of rather large changes is acceptable
> for stage 3 or not.  Me, I call it a "regression fix" if it reduces
> compile time.  But I will not work on it now (or ask help from others)
> if it is a priori not acceptable for stage 3.
>
> Thus, thoughts please! :-)

I think we should be able to rely on DF enough to make mostly mechanical
changes like this ok for stage3.  I don't know where we are in terms of
compile-time compared to 4.2 or 4.1, but certainly if you go back further
compile-time is always a "regression".

Thanks,
Richard.


Re: Limits of stage3 changes

2007-11-17 Thread Paolo Bonzini



My favorite example of this lack of follow-through is gcse.c.  It
computes reg-def chains and monotonic insn IDs. Guess what df-scan
provides?


This is great.  I did that for combine and CSE, but I didn't know GCSE 
as well.  From the description from my first read I like this patch, and 
I think you are the person that has studied GCSE more so I trust it a 
lot.  (Most of this could be done in postreload GCSE too, by the way).


Paolo


Re: Limits of stage3 changes

2007-11-17 Thread Rask Ingemann Lambertsen
On Fri, Nov 16, 2007 at 11:43:31PM +0100, Steven Bosscher wrote:
> Hello,
> 
> The amount of duplicate work done on RTL is sometimes really amazing,
> especially since the merge of the dataflow branch.  Some of the people
> who have worked on the dataflow branch had hoped that other developers
> would help with the follow-up actions to actually *use* all the
> information that, for example, the df-scan problem collects.

   I think a lot of people don't know how to use DF. Myself, I wouldn't know
where to look other than in the source code. That's fine as a reference, but
doesn't help a complete newbie to DF a lot. Because DF is new, there will be
newbies to DF.

-- 
Rask Ingemann Lambertsen
Danish law requires addresses in e-mail to be logged and stored for a year


Re: Limits of stage3 changes

2007-11-17 Thread Kaveh R. GHAZI
On Fri, 16 Nov 2007, Steven Bosscher wrote:

> Question is, whether this kind of rather large changes is acceptable
> for stage 3 or not.  Me, I call it a "regression fix" if it reduces
> compile time.  But I will not work on it now (or ask help from others)
> if it is a priori not acceptable for stage 3.
>
> Thus, thoughts please! :-)


I think the answer is "maybe".  In the past we have counted compile-time
savings as appropriate for stage3 regression fixes.  However IMHO you
would need to provide some measurement of the improvements (memory saved,
speed timings) so the RM and perhaps middle-end maintainers can weigh the
risk vs benefit and make an informed decision.

So far you've only shown us the "risky" part, i.e. the patch.

--Kaveh
--
Kaveh R. Ghazi  [EMAIL PROTECTED]


Re: Limits of stage3 changes

2007-11-18 Thread Steven Bosscher
On Nov 18, 2007 7:28 AM, Kaveh R. GHAZI <[EMAIL PROTECTED]> wrote:
> On Fri, 16 Nov 2007, Steven Bosscher wrote:
>
> > Question is, whether this kind of rather large changes is acceptable
> > for stage 3 or not.  Me, I call it a "regression fix" if it reduces
> > compile time.  But I will not work on it now (or ask help from others)
> > if it is a priori not acceptable for stage 3.
...
> I think the answer is "maybe".  In the past we have counted compile-time
> savings as appropriate for stage3 regression fixes.  However IMHO you
> would need to provide some measurement of the improvements (memory saved,
> speed timings) so the RM and perhaps middle-end maintainers can weigh the
> risk vs benefit and make an informed decision.
>
> So far you've only shown us the "risky" part, i.e. the patch.

Did you read what I wrote (and you quoted)?

1. I call it a "regression fix" if it reduces compile time.
[and a patch that fixes a regression should be considered for stage 3]

2. But *I will not work on it* now (or ask help from others) if it is
*a priori* not acceptable for stage 3.
["a priori" as in "don't replace the dataflow engine in our global
optimizers during stage2", and "not work on" includes not spend time
and effort doing testing/measurements unless I have the feeling it is
worth the trouble.]

Gr.
Steven


Re: Limits of stage3 changes

2007-11-18 Thread David Edelsohn
> Kaveh R GHAZI writes:

Kaveh> I think the answer is "maybe".  In the past we have counted compile-time
Kaveh> savings as appropriate for stage3 regression fixes.  However IMHO you
Kaveh> would need to provide some measurement of the improvements (memory saved,
Kaveh> speed timings) so the RM and perhaps middle-end maintainers can weigh the
Kaveh> risk vs benefit and make an informed decision.

Kaveh,

The time savings fall out of the algorithm improvements.

David



Re: Limits of stage3 changes

2007-11-18 Thread Kaveh R. GHAZI
On Sun, 18 Nov 2007, Steven Bosscher wrote:

> 2. But *I will not work on it* now (or ask help from others) if it is *a
> priori* not acceptable for stage 3.

As I parse your sentence, you were asking if your patch would be
automatically (a priori) rejected for stage3.  If I say it may be
acceptable, that means the rejection is not presumptively automatic.
Therefore IMHO, I answered your question in an affirmative way.

OTOH, I don't think final acceptance of your patch is guaranteed either,
without the same analysis we ask of any contributor.  You'll have to
decide how you feel about "maybe", is the glass half full or half empty?

--Kaveh
--
Kaveh R. Ghazi  [EMAIL PROTECTED]


Re: Limits of stage3 changes

2007-11-18 Thread Kaveh R. GHAZI
On Sun, 18 Nov 2007, David Edelsohn wrote:

> > Kaveh R GHAZI writes:
>
> Kaveh> I think the answer is "maybe".  In the past we have counted 
> compile-time
> Kaveh> savings as appropriate for stage3 regression fixes.  However IMHO you
> Kaveh> would need to provide some measurement of the improvements (memory 
> saved,
> Kaveh> speed timings) so the RM and perhaps middle-end maintainers can weigh 
> the
> Kaveh> risk vs benefit and make an informed decision.
>
> Kaveh,
>   The time savings fall out of the algorithm improvements.
> David

I believe you that there are improvements to be had, and I'd like to see
them in GCC.  And I want to clearly state that I believe compile-time
fixes are appropriate for stage 3.

Perhaps the confusion was whether Steven was asking if he's facing a
priori rejection, or if he's asking for a priori approval.  Those are two
entirely different requests.

--Kaveh
--
Kaveh R. Ghazi  [EMAIL PROTECTED]


Re: Limits of stage3 changes

2007-11-18 Thread Mark Mitchell
Kaveh R. GHAZI wrote:

>> Kaveh> I think the answer is "maybe".  In the past we have counted 
>> compile-time
>> Kaveh> savings as appropriate for stage3 regression fixes. 

Yes, we have, and I continue to believe that's reasonable.  If the
patches provide compile-time improvements, then I think they would count
as "bug fixes" and thus be potentially acceptable in Stage 3.  Thus, to
answer Steven's implied question:

> But I will not work on it now (or ask help from others)
> if it is a priori not acceptable for stage 3.

I think the answer is that the patch is not a priori unacceptable.

But, given that we're talking about a relatively large change, I think
the bar should be set higher than for a change to just a few lines of
code.  In particular, I would want to see that there is, in fact,
measurable improvement to compile-time -- whereas in Stage 1, we might
just approve the patch on the grounds that it moves us towards the
infrastructure that we want to be using throughout the compiler.

I would also want a relevant maintainer to carefully review the patch.
It might be that even though we think the patch is likely to be correct,
either that maintainer, or I, decide that there's too much associated
risk.  So, I don't want to promise that we would accept the patch in
Stage 3, either.

Steven, I recognize that might not be as definitive an answer as you
would like, but I hope you will understand my thinking.

Thanks,

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: Limits of stage3 changes

2007-11-18 Thread Steven Bosscher
On Nov 18, 2007 8:32 PM, Kaveh R. GHAZI <[EMAIL PROTECTED]> wrote:
> On Sun, 18 Nov 2007, Steven Bosscher wrote:
>
> > 2. But *I will not work on it* now (or ask help from others) if it is *a
> > priori* not acceptable for stage 3.
>
> As I parse your sentence, you were asking if your patch would be
> automatically (a priori) rejected for stage3.  If I say it may be
> acceptable, that means the rejection is not presumptively automatic.
> Therefore IMHO, I answered your question in an affirmative way.

The way I read your answer, you were asking for more than the "risky"
stuff.  A simple yes or no would have been sufficient.

> OTOH, I don't think final acceptance of your patch is guaranteed either,
> without the same analysis we ask of any contributor.

I don't understand why you keep repeating this. I'm not stupid, I'm
not exactly new to contributing to gcc (been working on it for more
than 7 years, in fact), and I never asked for special treatment.  I
know what I have to do to make a patch acceptable.  But, again,
approval of the patch was not what I was asking for.

Gr.
Steven


Re: Limits of stage3 changes

2007-11-18 Thread Steven Bosscher
On Nov 18, 2007 10:29 PM, Mark Mitchell <[EMAIL PROTECTED]> wrote:
> I think the answer is that the patch is not a priori unacceptable.
>
> But, given that we're talking about a relatively large change, I think
> the bar should be set higher than for a change to just a few lines of
> code.  In particular, I would want to see that there is, in fact,
> measurable improvement to compile-time -- whereas in Stage 1, we might
> just approve the patch on the grounds that it moves us towards the
> infrastructure that we want to be using throughout the compiler.

OK, that was more or less what I was expecting.

I don't expect the compile time impact to be very significant. The
changes in the patch I posted would remove 2 passes over all RTL insns
when compiling with -fgcse.  That is significant but not
earth-shattering, compared to the ~80 per function we do during a
compilation.  Plus, the real bottle-necks in gcse.c are not in
compute_sets() and in the computation of CUIDs, but in the dataflow
solver and functions like find_avail_set().

(Those bottle-necks are also fixable (they are gone in my CPROP
rewrite) but that's something I would not even want to contribute for
stage3 ;-))

> I would also want a relevant maintainer to carefully review the patch.
> It might be that even though we think the patch is likely to be correct,
> either that maintainer, or I, decide that there's too much associated
> risk.

Not to brag, but I think I know df and gcse as well as anyone. So I
know the associated risks, and I can share my thoughts about them with
you ;-)

My biggest worry was that the removal of CUIDs would break something,
and the patch as posted indeed does break PRE (causes ICEs when taking
the INSN_CUID of a deleted insn).  The removal of compute_sets() is
quite safe, because compute_sets() computes exactly the same
information as df-scan does.

>  So, I don't want to promise that we would accept the patch in
> Stage 3, either.

OK, then, all things considered, I think I can better spend my time on
something else than this patch.

> Steven, I recognize that might not be as definitive an answer as you
> would like, but I hope you will understand my thinking.

No, this is the kind of answer I was looking for.  I fully understand
your thinking, and I just wanted to check my own thoughts against the
opinions on the list.  So thanks for the elaborate answer!

Gr.
Steven