On 28.01.2016 23:17, Richard Sandiford wrote: > Bernd Edlinger <bernd.edlin...@hotmail.de> writes: >> On 26.01.2016 22:18, Richard Sandiford wrote: >>> [cc-ing Eric as RTL maintainer] >>> >>> Matthew Fortune <matthew.fort...@imgtec.com> writes: >>>> Bernd Edlinger <bernd.edlin...@hotmail.de> writes: >>>>> Matthew Fortune <matthew.fort...@imgtec.com> writes: >>>>>> Has the patch been tested beyond just building GCC? I can do a >>>>>> test run for you if you don't have things set up to do one yourself. >>>>> >>>>> I built a cross-gcc with all languages and a cross-glibc, but I have >>>>> not set up an emulation environment, so if you could give it a test >>>>> that would be highly welcome. >>>> >>>> mipsel-linux-gnu test results are the same before and after this patch. >>>> >>>> Please go ahead and commit. >>> >>> I still object to this. And it feels like the patch was posted >>> as though it was a new one in order to avoid answering the objections >>> that were raised when it was last posted: >>> >>> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02218.html >>> >> >> Richard, I am really sorry when you feel now like I did not take your >> objections seriously. Let me first explain what happened from my point >> of view: >> >> When I posted this response to your objections here: >> >> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00235.html >> >> I was waiting for your response, but nothing happened, so I kind of >> forgot about the issue. In the meantime Ubuntu and Debian began to >> roll out GCC-6 and they got stuck at the same issue, but instead of >> waiting for pr69012 to be eventually resolved they created a duplicate >> pr69129, and then last Thursday Nick applied my initial patch without >> my intervention, probably because of the pressure they put on us. > > Ah, I'd missed that, sorry. It's not obvious from the email thread > that the patch was actually approved. I just see a message from Matthias > saying that it worked for him and then a message from Nick saying that > he'd applied it. > > Ah well. I guess at some point I have to get over the fact that I'm no > longer the MIPS maintainer :-) > >> That changed significantly how I looked at the issue after that point, >> as there was no actual regression anymore, the revised patch still made >> sense, but for another reason. When you look at the 20+ targets in the >> gcc tree you'll see that almost all of them have a frame-layout >> computation function and all except mips have a shortcut >> "if (reload_completed) return;" in that function. And OTOH mips has >> one of the most complicated frame layout functions of all targets. >> >> For all of these reasons I posted a new patch which tries to resolve >> differences between mips and other targets inital_elimination_offset >> functions. > > OK. But the point still stands that the patch is only useful because > we're now calling mips_compute_frame_info in cases where we wouldn't > previously, because of the rtx_addr_can_trap_p changes. >
Yes of course, but it cannot hurt to have all targets behave identical in such a central point. Even if at some point also the implementation in rtx_addr_can_trap_p will eventually be improved in one way or the other. >> I still think that it is not OK for the mips target to do the frame >> layout already in mips_frame_pointer_required because the frame layout >> will change several times, until reload is completed, and that function >> is only called right in the beginning. > > I don't think it's any better or worse than doing the frame layout in > INITIAL_ELIMINATION_OFFSET (which is common practice and pretty much > required). They're both part of the initial setup phase -- > targetm.frame_layout_required determines frame_pointer_needed, which is > a vital input to the code that decides which eliminations to make. > Hmm... That can also be a difference between LRA and traditional reload. Reload calls targetm.frame_pointer_required in update_eliminables, and can apparently handle 0=>1 and 1=>0 transitions here. While LRA calls frame_pointer_required only once in ira_setup_eliminable_regset and can only handle 0=>1 transitions of frame_pointer_needed in setup_can_eliminate, but not based on targetm.frame_pointer_required but targetm.can_eliminate(FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM). At least it I read the source correctly, I could be wrong of course. > And this is an example of us doing the kind of caching that I was > suggesting. Code that wants to know whether the frame pointer is needed > for the current function should use frame_pointer_needed. Only the code > that sets up frame_pointer_needed should call frame_pointer_required > directly. > But frame_pointer_needed adds at least some value to the raw targetm.frame_pointer_required, while a cached result of initial_elimination_offset would not, just conserve the current interface exactly as it is. >> And I think that it is not really well-designed to have a frame layout >> function F(x,y)->z unless you assume F to be a trivial O(1) function >> that has no significant computation overhead. When you assume F to be >> an expensive O(N) or higher complexity you would design the interface >> like compute_frame_layout_now() and >> get_cached_initial_elimination_offset(x,y)->z. >> >> Also note, that with the current design, of initial_elimination_offset >> the frame layout is already computed several times, because reload has >> to call the function with different parameters, and there is no way how >> to know when the cached value can be used and when not. > > Agreed -- the current interface is pretty poor. But that's even more > reason not to add uses of it after LRA/reload. Caching would be both > simpler and more efficient. > >> I do also think that having to cache the initial elimination offset >> values in the back-end that are already cached in the target does not >> improve anything. Then I would prefer to have a set of new target >> callbacks that makes it easy to access the cached target values. > > I don't agree. Asking each backend to cache a particular value in > its frame_info structure and then providing a hook to query that > cached value means adding code to every target. And calling an > indirect function is going to much less efficient than accessing > a structure field. > > It seems better to have one piece of code (or maybe two, until > reload goes away) that caches the information that the target-independent > code needs. > It is for sure a matter of personal taste, nothing more. And yes, it is do-able, and not even really complicated, but it would not improve anything, just conserve the interface as it is (or as it was before I spoiled it :). I would rather like to improve something at this point. For instance I think it would be good to split the initial_elimination_offset function in an optional compute_initial_frame_layout function, doing only the O(N) computation and leave the actual O(1) querying of the result in the initial_elimination_offset function. If the target does not implement the compute_initial_frame_layout the default just does nothing, and the initial_elimination_offset function does not have to change, but if the target implements the optional compute_initial_frame_layout, it can rely on that function to be called immediately before the initial_elimination_offset, which will then be only an O(1) function. That would at least improve a tiny bit over the current state of the initial_elimination_offset function. Tanks Bernd. > I think that applies to other frame-related information too. At the > moment there's no simple way for the postreload passes to tell which > registers are available for use and which have to be left untouched, > so we get complicated tests like: > > for (i = nregs - 1; i >= 0; --i) > if (TEST_HARD_REG_BIT (this_unavailable, new_reg + i) > || fixed_regs[new_reg + i] > || global_regs[new_reg + i] > /* Can't use regs which aren't saved by the prologue. */ > || (! df_regs_ever_live_p (new_reg + i) > && ! call_used_regs[new_reg + i]) > #ifdef LEAF_REGISTERS > /* We can't use a non-leaf register if we're in a > leaf function. */ > || (crtl->is_leaf > && !LEAF_REGISTERS[new_reg + i]) > #endif > || ! HARD_REGNO_RENAME_OK (reg + i, new_reg + i)) > return false; > > This is an example of relying (to some extent) on querying the target > code every time we want to know something, but surely it would be > better to have LRA/reload create a single "usable registers" HARD_REG_SET. > (I have an old patch series for that but never found time to clean it up > and submit it.) > >> If you want to propose a patch for that I will not object, but I doubt >> it will be suitable for Stage 4. > > Fair enough. Guess I just have to learn to live with it. > > Thanks, > Richard >