Limits of stage3 changes
Hello, The amount of duplicate work done on RTL is sometimes really amazing, especially since the merge of the dataflow branch. Some of the people who have worked on the dataflow branch had hoped that other developers would help with the follow-up actions to actually *use* all the information that, for example, the df-scan problem collects. My favorite example of this lack of follow-through is gcse.c. It computes reg-def chains and monotonic insn IDs. Guess what df-scan provides? Attached is an example of the cleanups I would have liked to see during stage 2 of GCC 4.3. The patch is not finished, but it gives you an idea of what is possible. The patch avoids computation of redundant information by making gcse use the df-scan information, instead of computing everything for itself. In theory, this saves memory and it reduces the number of passes over all RTX'en in memory. So, if finished, this should give compile time improvements and a slightly smaller memory footprint. Question is, whether this kind of rather large changes is acceptable for stage 3 or not. Me, I call it a "regression fix" if it reduces compile time. But I will not work on it now (or ask help from others) if it is a priori not acceptable for stage 3. Thus, thoughts please! :-) (FWIW I have almost complete rewrites of half the passes in gcse.c, and the attached patch is a kind-of back port of the ideas from my new implementations...) Gr. Steven * gcse.c (uid_cuid, max_uid, INSN_CUID, max_cuid, struct reg_set, reg_set_table, REG_SET_TABLE_SLOP, reg_set_in_block, alloc_reg_set_mem, free_reg_set_mem, record_one_set, record_set_info, compute_set, grealloc): Remove. (recompute_all_luids): New function. (gcse_main): Don't compute sets, and don't do related memory allocations/free-ing. If something changed before the end of the pass, update LUIDs using recompute_all_luids. (alloc_gcse_mem): Don't compute LUIDs. Don't allocate reg_set memory. (free_gcse_mem): Don't free it either. (oprs_unchanged_p, load_killed_in_block, record_last_reg_set_info): Use the df insn LUIDs. (load_killed_in_block): Likewise. (compute_hash_table_work): Don't compute reg_set_in_block. (compute_transp): Use DF_REG_DEF_CHAINs. (local_cprop_pass): Don't use compute_sets and related functions. (one_cprop_pass, pre_gcse, one_pre_gcse_pass, one_code_hoisting_pass): Use get_max_uid() instead of max_cuid. (insert_insn_end_basic_block, pre_insert_copy_insn, update_ld_motion_stores): Don't try to keep reg_set tables up to date. (pre_insert_copies): Use df insn LUIDs. (reg_set_info): Don't use extra bitmap argument. (compute_store_table): Don't compute reg_set_in_block. Use DF scan information to compute regs_set_in_block. (free_store_memory, store_motion): Don't nullify reg_set_in_block. (bypass_jumps): Don't use compute_sets and friends. Index: gcse.c === *** gcse.c (revision 130236) --- gcse.c (working copy) *** along with GCC; see the file COPYING3. *** 27,36 - a store to the same address as a load does not kill the load if the source of the store is also the destination of the load. Handling this allows more load motion, particularly out of loops. -- ability to realloc sbitmap vectors would allow one initial computation - of reg_set_in_block with only subsequent additions, rather than - recomputing it for each pass - */ /* References searched while implementing this. --- 27,32 *** static struct hash_table expr_hash_table *** 362,431 /* Copy propagation hash table. */ static struct hash_table set_hash_table; - /* Mapping of uids to cuids. -Only real insns get cuids. */ - static int *uid_cuid; - - /* Highest UID in UID_CUID. */ - static int max_uid; - - /* Get the cuid of an insn. */ - #ifdef ENABLE_CHECKING - #define INSN_CUID(INSN) \ - (gcc_assert (INSN_UID (INSN) <= max_uid), uid_cuid[INSN_UID (INSN)]) - #else - #define INSN_CUID(INSN) (uid_cuid[INSN_UID (INSN)]) - #endif - - /* Number of cuids. */ - static int max_cuid; - /* Maximum register number in function prior to doing gcse + 1. Registers created during this pass have regno >= max_gcse_regno. This is named with "gcse" to not collide with global of same name. */ static unsigned int max_gcse_regno; - /* Table of registers that are modified. - -For each register, each element is a list of places where the pseudo-reg -is set. - -For simplicity, GCSE is done on sets of pseudo-regs only. PRE GCSE only -requires knowledge of which blocks kill which regs [and thus could use -a bitmap instead of the lists `reg_set_table' uses]. - -`reg_set_table' and could be turned into an array of bitmaps (num-bbs x -num-regs) [however perhaps it may be useful to keep the data as is]. One -advantage of recording things this way is that `reg_set_table' is fairly -sparse with r
Re: Limits of stage3 changes
On Nov 16, 2007 11:43 PM, Steven Bosscher <[EMAIL PROTECTED]> wrote: > Hello, > > The amount of duplicate work done on RTL is sometimes really amazing, > especially since the merge of the dataflow branch. Some of the people > who have worked on the dataflow branch had hoped that other developers > would help with the follow-up actions to actually *use* all the > information that, for example, the df-scan problem collects. > > My favorite example of this lack of follow-through is gcse.c. It > computes reg-def chains and monotonic insn IDs. Guess what df-scan > provides? > > Attached is an example of the cleanups I would have liked to see > during stage 2 of GCC 4.3. The patch is not finished, but it gives > you an idea of what is possible. The patch avoids computation of > redundant information by making gcse use the df-scan information, > instead of computing everything for itself. In theory, this saves > memory and it reduces the number of passes over all RTX'en in memory. > So, if finished, this should give compile time improvements and a > slightly smaller memory footprint. > > Question is, whether this kind of rather large changes is acceptable > for stage 3 or not. Me, I call it a "regression fix" if it reduces > compile time. But I will not work on it now (or ask help from others) > if it is a priori not acceptable for stage 3. > > Thus, thoughts please! :-) I think we should be able to rely on DF enough to make mostly mechanical changes like this ok for stage3. I don't know where we are in terms of compile-time compared to 4.2 or 4.1, but certainly if you go back further compile-time is always a "regression". Thanks, Richard.
Re: Limits of stage3 changes
My favorite example of this lack of follow-through is gcse.c. It computes reg-def chains and monotonic insn IDs. Guess what df-scan provides? This is great. I did that for combine and CSE, but I didn't know GCSE as well. From the description from my first read I like this patch, and I think you are the person that has studied GCSE more so I trust it a lot. (Most of this could be done in postreload GCSE too, by the way). Paolo
Re: Limits of stage3 changes
On Fri, Nov 16, 2007 at 11:43:31PM +0100, Steven Bosscher wrote: > Hello, > > The amount of duplicate work done on RTL is sometimes really amazing, > especially since the merge of the dataflow branch. Some of the people > who have worked on the dataflow branch had hoped that other developers > would help with the follow-up actions to actually *use* all the > information that, for example, the df-scan problem collects. I think a lot of people don't know how to use DF. Myself, I wouldn't know where to look other than in the source code. That's fine as a reference, but doesn't help a complete newbie to DF a lot. Because DF is new, there will be newbies to DF. -- Rask Ingemann Lambertsen Danish law requires addresses in e-mail to be logged and stored for a year
Re: Limits of stage3 changes
On Fri, 16 Nov 2007, Steven Bosscher wrote: > Question is, whether this kind of rather large changes is acceptable > for stage 3 or not. Me, I call it a "regression fix" if it reduces > compile time. But I will not work on it now (or ask help from others) > if it is a priori not acceptable for stage 3. > > Thus, thoughts please! :-) I think the answer is "maybe". In the past we have counted compile-time savings as appropriate for stage3 regression fixes. However IMHO you would need to provide some measurement of the improvements (memory saved, speed timings) so the RM and perhaps middle-end maintainers can weigh the risk vs benefit and make an informed decision. So far you've only shown us the "risky" part, i.e. the patch. --Kaveh -- Kaveh R. Ghazi [EMAIL PROTECTED]
Re: Limits of stage3 changes
On Nov 18, 2007 7:28 AM, Kaveh R. GHAZI <[EMAIL PROTECTED]> wrote: > On Fri, 16 Nov 2007, Steven Bosscher wrote: > > > Question is, whether this kind of rather large changes is acceptable > > for stage 3 or not. Me, I call it a "regression fix" if it reduces > > compile time. But I will not work on it now (or ask help from others) > > if it is a priori not acceptable for stage 3. ... > I think the answer is "maybe". In the past we have counted compile-time > savings as appropriate for stage3 regression fixes. However IMHO you > would need to provide some measurement of the improvements (memory saved, > speed timings) so the RM and perhaps middle-end maintainers can weigh the > risk vs benefit and make an informed decision. > > So far you've only shown us the "risky" part, i.e. the patch. Did you read what I wrote (and you quoted)? 1. I call it a "regression fix" if it reduces compile time. [and a patch that fixes a regression should be considered for stage 3] 2. But *I will not work on it* now (or ask help from others) if it is *a priori* not acceptable for stage 3. ["a priori" as in "don't replace the dataflow engine in our global optimizers during stage2", and "not work on" includes not spend time and effort doing testing/measurements unless I have the feeling it is worth the trouble.] Gr. Steven
Re: Limits of stage3 changes
> Kaveh R GHAZI writes: Kaveh> I think the answer is "maybe". In the past we have counted compile-time Kaveh> savings as appropriate for stage3 regression fixes. However IMHO you Kaveh> would need to provide some measurement of the improvements (memory saved, Kaveh> speed timings) so the RM and perhaps middle-end maintainers can weigh the Kaveh> risk vs benefit and make an informed decision. Kaveh, The time savings fall out of the algorithm improvements. David
Re: Limits of stage3 changes
On Sun, 18 Nov 2007, Steven Bosscher wrote: > 2. But *I will not work on it* now (or ask help from others) if it is *a > priori* not acceptable for stage 3. As I parse your sentence, you were asking if your patch would be automatically (a priori) rejected for stage3. If I say it may be acceptable, that means the rejection is not presumptively automatic. Therefore IMHO, I answered your question in an affirmative way. OTOH, I don't think final acceptance of your patch is guaranteed either, without the same analysis we ask of any contributor. You'll have to decide how you feel about "maybe", is the glass half full or half empty? --Kaveh -- Kaveh R. Ghazi [EMAIL PROTECTED]
Re: Limits of stage3 changes
On Sun, 18 Nov 2007, David Edelsohn wrote: > > Kaveh R GHAZI writes: > > Kaveh> I think the answer is "maybe". In the past we have counted > compile-time > Kaveh> savings as appropriate for stage3 regression fixes. However IMHO you > Kaveh> would need to provide some measurement of the improvements (memory > saved, > Kaveh> speed timings) so the RM and perhaps middle-end maintainers can weigh > the > Kaveh> risk vs benefit and make an informed decision. > > Kaveh, > The time savings fall out of the algorithm improvements. > David I believe you that there are improvements to be had, and I'd like to see them in GCC. And I want to clearly state that I believe compile-time fixes are appropriate for stage 3. Perhaps the confusion was whether Steven was asking if he's facing a priori rejection, or if he's asking for a priori approval. Those are two entirely different requests. --Kaveh -- Kaveh R. Ghazi [EMAIL PROTECTED]
Re: Limits of stage3 changes
Kaveh R. GHAZI wrote: >> Kaveh> I think the answer is "maybe". In the past we have counted >> compile-time >> Kaveh> savings as appropriate for stage3 regression fixes. Yes, we have, and I continue to believe that's reasonable. If the patches provide compile-time improvements, then I think they would count as "bug fixes" and thus be potentially acceptable in Stage 3. Thus, to answer Steven's implied question: > But I will not work on it now (or ask help from others) > if it is a priori not acceptable for stage 3. I think the answer is that the patch is not a priori unacceptable. But, given that we're talking about a relatively large change, I think the bar should be set higher than for a change to just a few lines of code. In particular, I would want to see that there is, in fact, measurable improvement to compile-time -- whereas in Stage 1, we might just approve the patch on the grounds that it moves us towards the infrastructure that we want to be using throughout the compiler. I would also want a relevant maintainer to carefully review the patch. It might be that even though we think the patch is likely to be correct, either that maintainer, or I, decide that there's too much associated risk. So, I don't want to promise that we would accept the patch in Stage 3, either. Steven, I recognize that might not be as definitive an answer as you would like, but I hope you will understand my thinking. Thanks, -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Re: Limits of stage3 changes
On Nov 18, 2007 8:32 PM, Kaveh R. GHAZI <[EMAIL PROTECTED]> wrote: > On Sun, 18 Nov 2007, Steven Bosscher wrote: > > > 2. But *I will not work on it* now (or ask help from others) if it is *a > > priori* not acceptable for stage 3. > > As I parse your sentence, you were asking if your patch would be > automatically (a priori) rejected for stage3. If I say it may be > acceptable, that means the rejection is not presumptively automatic. > Therefore IMHO, I answered your question in an affirmative way. The way I read your answer, you were asking for more than the "risky" stuff. A simple yes or no would have been sufficient. > OTOH, I don't think final acceptance of your patch is guaranteed either, > without the same analysis we ask of any contributor. I don't understand why you keep repeating this. I'm not stupid, I'm not exactly new to contributing to gcc (been working on it for more than 7 years, in fact), and I never asked for special treatment. I know what I have to do to make a patch acceptable. But, again, approval of the patch was not what I was asking for. Gr. Steven
Re: Limits of stage3 changes
On Nov 18, 2007 10:29 PM, Mark Mitchell <[EMAIL PROTECTED]> wrote: > I think the answer is that the patch is not a priori unacceptable. > > But, given that we're talking about a relatively large change, I think > the bar should be set higher than for a change to just a few lines of > code. In particular, I would want to see that there is, in fact, > measurable improvement to compile-time -- whereas in Stage 1, we might > just approve the patch on the grounds that it moves us towards the > infrastructure that we want to be using throughout the compiler. OK, that was more or less what I was expecting. I don't expect the compile time impact to be very significant. The changes in the patch I posted would remove 2 passes over all RTL insns when compiling with -fgcse. That is significant but not earth-shattering, compared to the ~80 per function we do during a compilation. Plus, the real bottle-necks in gcse.c are not in compute_sets() and in the computation of CUIDs, but in the dataflow solver and functions like find_avail_set(). (Those bottle-necks are also fixable (they are gone in my CPROP rewrite) but that's something I would not even want to contribute for stage3 ;-)) > I would also want a relevant maintainer to carefully review the patch. > It might be that even though we think the patch is likely to be correct, > either that maintainer, or I, decide that there's too much associated > risk. Not to brag, but I think I know df and gcse as well as anyone. So I know the associated risks, and I can share my thoughts about them with you ;-) My biggest worry was that the removal of CUIDs would break something, and the patch as posted indeed does break PRE (causes ICEs when taking the INSN_CUID of a deleted insn). The removal of compute_sets() is quite safe, because compute_sets() computes exactly the same information as df-scan does. > So, I don't want to promise that we would accept the patch in > Stage 3, either. OK, then, all things considered, I think I can better spend my time on something else than this patch. > Steven, I recognize that might not be as definitive an answer as you > would like, but I hope you will understand my thinking. No, this is the kind of answer I was looking for. I fully understand your thinking, and I just wanted to check my own thoughts against the opinions on the list. So thanks for the elaborate answer! Gr. Steven