On Wed, May 27, 2015 at 2:11 PM, Richard Biener <richard.guent...@gmail.com> wrote: > On Tue, May 26, 2015 at 3:10 PM, Evgeniya Maenkova > <evgeniya.maenk...@gmail.com> wrote: >> Hi, Richard >> >> Thanks for review starting. >> >> Do you see any major issues with this patch (i.e. algorithms and ideas >> that should be completely replaced, effectively causing the re-write >> of most code)? >> >> To decide if there are major issues in the patch, perhaps, you need >> additional clarifications from me? Could you point at the places where >> additional explanations could save you most effort? >> >> Your answers to these questions are looking the first priority ones. >> You wrote about several issues in the code, which are looking as easy >> (or almost easy ;) to fix(inline functions, unswitch-loops flag, >> comments, etc). But, I think you agree, let’s first decide about the >> major issues (I mean, whether we continue with this patch or starting >> new one, this will save a lot of time for both of us). > > I didn't get an overall idea on how the patch works, that is, how it > integrates > with the existing algorithm. If you can elaborate on that a bit that would > be helpful. > Hi, Sure, I'll write you some notes in several days.
> I think the code-generation part needs some work (whether by following > my idea with re-using copy_bbs or whether by basically re-implementing > it is up to debate). How does your code handle > > for () > { > if (cond1) > { > if (cond2) > invariant; > if (cond3) > invariant; > } > } > > ? Optimally we'd have before the loop exactly the same if () structure > (thus if (cond1) is shared). If both invariants are going out of the same loop (i mean tgt_level), then if structure will be the same. for1() for () { if (cond1) { if (cond2) invariant1; if (cond3) invariant2; } } will be transformed to for1() if (cond1) { if (cond2) invariant1; if (cond3) invariant2; } } for () { if (cond1) { if (cond2); if (cond3); } } (I don't cleanup empty if's in lim code). If these invarians are moved in different loops then for1 for2() for() { if (cond1) { if (cond2) invariant1; if (cond3) invariant2; } } will be transformed to: for1 { if (cond1) if (cond2) invariant1; for2() { if (cond1) if (cond3) invariant2; for() { if (cond1) { if (cond2); if (cond3); } } } } Of course, there could be some bugs, but the idea was as mentioned above. This transformation was looking logical to me. What do you think? Thanks, Evgeniya > > Richard. > > >> Thanks, >> >> Evgeniya >> >> On Tue, May 26, 2015 at 2:31 PM, Richard Biener >> <richard.guent...@gmail.com> wrote: >>> On Fri, May 8, 2015 at 11:07 PM, Evgeniya Maenkova >>> <evgeniya.maenk...@gmail.com> wrote: >>>> Hi, >>>> >>>> Could you please review my patch for predicated lim? >>>> >>>> Let me note some details about it: >>>> >>>> >>>> >>>> 1) Phi statements are still moved only if they have 1 or 2 >>>> arguments. However, phi statements could be move under conditions (as >>>> it’s done for the other statements). Probably, phi statement motion >>>> with 3 + arguments could be implemented in the next patch after >>>> predicated lim. >>>> >>>> 2) Patch has limitations/features like (it was ok to me to >>>> implement it such way, maybe I’m not correct. ): >>>> >>>> a) Loop1 >>>> >>>> { >>>> >>>> If (a) >>>> >>>> Loop2 >>>> >>>> { >>>> >>>> Stmt - Invariant for Loop1 >>>> >>>> } >>>> >>>> } >>>> >>>> In this case Stmt will be moved only out of Loop2, because of if >>>> (a). >>>> >>>> b) Or >>>> >>>> Loop1 >>>> >>>> { >>>> >>>> … >>>> >>>> If (cond1) >>>> >>>> If (cond2) >>>> >>>> If (cond3) >>>> >>>> Stmt; >>>> >>>> } >>>> >>>> Stmt will be moved out only if cond1 is always executed in Loop1. >>>> >>>> c) It took me a long time to write all of these code, so there >>>> might be other peculiarities which I forgot to mention. :) >>>> >>>> Let’s discuss these ones as you will review my >>>> patch. >>>> >>>> 3) Patch consists of 9 files: >>>> >>>> a) gcc/testsuite/gcc.dg/tree-ssa/loop-7.c, >>>> gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests: >>>> >>>> - gcc/testsuite/gcc.dg/tree-ssa/loop-7.c changed as >>>> predicated lim moves 2 more statements out of the loop; >>>> >>>> - gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional >>>> lim recip optimization in this test doesn’t work (the corresponding >>>> value is below threshold as I could see in the code for recip, 1<3). >>>> So to have recip working in this test I changed test a little bit. >>>> >>>> b) gcc/tree-ssa-loop-im.c – the patched lim per se >>>> >>>> c) gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c, >>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c, >>>> >>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c, >>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c, >>>> >>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c, >>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c >>>> >>>> the tests for conditional lim. >>>> >>>> 4) Patch testing: >>>> >>>> a) make –k check (no difference in results for me for the clean >>>> build and the patched one, >>>> >>>> - Revision: 222849, >>>> >>>> - uname -a >>>> >>>> Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct >>>> 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux >>>> >>>> b) Bootstrap. >>>> >>>> It goes well now, however to fix it I have made a temporary hack in >>>> the lim code. And with this fix patch definitely shouldn’t be >>>> committed. >>>> >>>> I did so, as I would like to discuss this issue first. >>>> >>>> The problem is: I got stage2-stage3 comparison failure on the single >>>> file (tree-vect-data-refs.o). After some investigation I understood >>>> that tree-vect-data-refs.o differs being compiled with and without >>>> ‘-g’ option (yes, more exactly on stage 2 this is ‘-g –O2 –gtoggle’, >>>> and for stage 3 this is ‘-g –O2’. But to simplify things I can >>>> reproduce this difference on the same build (even not bootstrapped), >>>> with option ‘ –g’ and without it). >>>> >>>> Of course, the file compiled with –g option will contain debug >>>> information and will differ from the corresponding file without debug >>>> information. I mean there is the difference reported by >>>> contrib/compare-debug (I mean we talk about stripped files). >>>> >>>> Then I compared assemblers and lim dump files and made a conclusion >>>> the difference is: >>>> >>>> There is statement _484=something, which is conditionally moved out of >>>> loop X. In non debug cases no usages of _484 are left inside the loop >>>> X. In debug case, there is the single use in debug statement. So for >>>> debug case my patch generates phi statement for _484 to make it >>>> available inside the loop X. For non debug case, no such phi statement >>>> generated as there is no uses of _484. >>>> >>>> There is one more statement with lhs=_493, with exactly this pattern >>>> of use. But this is not important for our discussion. >>>> >>>> >>>> >>>> So, in my opinion it’s ok to generate additional phi node for debug >>>> case. But I’m not a compiler expert and maybe there is some >>>> requirement that debug and non-debug versions should differ only by >>>> debug statements, I don’t know. >>> >>> As Andi said code generation needs to be the same. >>> >>>> My temporary hack fix is just removing of such debug statements (no >>>> debug statements, no problems J). >>> >>> That's fine and generally what is done in other passes. >>> >>>> The alternatives I see are: >>>> >>>> - Move debug statements outside the loop (not good in my opinion); >>>> >>>> - Somehow reset values in debug statements (not good in my >>>> opinion, if I could provide correct information for them); >>> >>> You could re-set them via gimple_debug_bind_reset_value which >>> will tell the user that at this point the value is optimized out >>> (that's slightly >>> better than removing the debug stmts). >>> >>>> - Generate phi for debug statements and fix bootstrap (say, >>>> let’s have some list of files, which we have special check for. I mean >>>> for my case, the difference is not in stage2 and stage3, it’s in debug >>>> and non-debug). I like this variant. However, as I don’t see such list >>>> of special files in the whole gcc, I think I might be missing >>>> something. >>> >>> The policy ;) >>> >>>> What do you think? >>> >>> Now about the patch. Generally there seem to be spurious whitespace-only >>> or formatting differences - try to avoid these. >>> >>> All new functiuons need a comment explaining them and their function >>> parameters. >>> >>> +static cond_data_map *get_cond_data_map (struct loop * level) >>> +{ >>> >>> per GCC coding conventions this should be formatted like >>> >>> static cond_data_map * >>> get_cond_data_map (struct loop * level) >>> { >>> >>> +#define SET_ALWAYS_EXECUTED_IN(BB, VAL)\ >>> + BB->aux = (void *) (XCNEW (struct move_cond));\ >>> + BB_MOVE_COND (BB)->type=ALWAYS_EXECUTED_IN;\ >>> + BB_MOVE_COND (BB)->loop=VAL; >>> >>> this kind of side-effects in macros are bad - better turn them into >>> inline functions. >>> >>> - if (flag_unswitch_loops >>> - && gimple_code (stmt) == GIMPLE_COND) >>> - { >>> - /* If we perform unswitching, force the operands of the invariant >>> - condition to be moved out of the loop. */ >>> - return MOVE_POSSIBLE; >>> - } >>> + if (gimple_code (stmt) == GIMPLE_COND) >>> + return MOVE_POSSIBLE; >>> >>> any reason you removed the guard on flag_unswitch_loops? We may >>> want to add a new flag, certainly this transform might not be suitable >>> for -O[12s] for example. Like if not all stmts inside a conditional can >>> be moved the transform will increase code-size. >>> >>> You are doing a (lot of?) refactoring which makes review of the patch >>> harder. For example >>> >>> +bool calculate_tgt_level (gimple stmt, struct loop *outermost, >>> + bool cond_executed, basic_block bb) >>> +{ >>> + struct lim_aux_data *lim_dat >>> ... >>> + >>> + if (lim_data->cost >= LIM_EXPENSIVE) >>> + set_profitable_level (stmt); >>> + return true; >>> +} >>> >>> ... >>> >>> - >>> - lim_data = init_lim_data (stmt); >>> - lim_data->always_executed_in = outermost; >>> - >>> - if (!determine_max_movement (stmt, false)) >>> - { >>> - lim_data->max_loop = NULL; >>> - continue; >>> - } >>> - >>> - if (dump_file && (dump_flags & TDF_DETAILS)) >>> - { >>> - print_gimple_stmt (dump_file, stmt, 2, 0); >>> - fprintf (dump_file, " invariant up to level %d, cost %d.\n\n", >>> - loop_depth (lim_data->max_loop), >>> - lim_data->cost); >>> - } >>> - >>> - if (lim_data->cost >= LIM_EXPENSIVE) >>> - set_profitable_level (stmt); >>> + if (!calculate_tgt_level (stmt, outermost, cond_executed, bb)) >>> + continue; >>> >>> where it is not obvious why moving set_profitable_level should be >>> in a function named calculate_tgt_level. In fact determine_max_movement >>> was supposed to be the abstraction already. >>> >>> Without yet getting an idea of the overall algorithm (I'm really missing >>> such >>> a thing...) it seems to me that the current handling of doing invariant >>> motion >>> of PHIs should handle the analysis phase just fine (you might be adding >>> more advanced cases in this patch, but I'd rather do that as a followup for >>> initial support). >>> >>> The code-gen part looks somewhat ad-hoc to me. In principle if we have >>> a list of PHIs to hoist we can materialize the needed part of the original >>> basic-block structure on the pre-header edge together with the controlling >>> conditionals. The loop pre-header would be the always-executed BB >>> and the needed part would always start with the loop header and be a >>> single-entry multiple-exit region. So ideally we could use sth like >>> gimple_duplicate_sese_region with the BB copying worker copy_bbs >>> refactored/replaced with something only copying controlling conditionals >>> and statements/PHIs that a callback identifies (tree-loop-distribution.c >>> would also like sth like that - currently it copies whole loops and removes >>> stmts it didn't intend to copy...). >>> >>> So basically invariantness_dom_walker would record a vector of PHIs >>> to hoist (or we'd construct that at move_computations time). >>> If there is any control-flow to hoist then we'd do an alternative >>> move_computations by "copying" the needed CFG and covered stmts. >>> In theory doing it the loop-distribution way (simply copy the whole >>> loop body and then remove anything not needed) would also be possible >>> though that wouldn't be the very best idea obviously ;) >>> >>> Thanks, >>> Richard. >>> >>>> >>>> >>>> Thanks, >>>> >>>> >>>> >>>> Evgeniya >> >> >> >> -- >> Thanks, >> >> Evgeniya >> >> perfstories.wordpress.com -- Thanks, Evgeniya perfstories.wordpress.com