On 29.01.2016 02:09, Bernd Schmidt wrote:
> On 01/28/2016 12:36 AM, Eric Botcazou wrote:
>>> [cc-ing Eric as RTL maintainer]
>>
>> Sorry for the delay, the message apparently bounced]
>>
>>> IMO the problem is that rtx_addr_can_trap_p_1 duplicates a large
>>> bit of LRA/reload logic:
>>>
>>> [...]
>>>
>>> Under the current interface macros like INITIAL_ELIMINATION_OFFSET
>>> are expected to trigger the calculation of the target's frame layout
>>> (since you need that information to answer the question).
>>> To me it seems wrong that we're attempting to call that sort of
>>> macro in a query routine like rtx_addr_can_trap_p_1.
>>
>> I'm a little uncomfortable stepping in here because, while I
>> essentially share
>> your objections and was opposed to the patch (I was almost sure that
>> it would
>> introduce maintainance issues for no practical benefit), I didn't
>> imagine that
>> such a failure mode would have been possible (computing an
>> approximation of
>> the frame layout after reload being problematic) so I didn't really
>> object to
>> being overruled after seeing Bernd's patch...
>>
>>> IMO we should cache the information we need@the start of each
>>> LRA/reload cycle.  This will be more robust, because there will
>>> be no accidental changes to global state either during or after
>>> LRA/reload.  It will also be more efficient because
>>> rtx_addr_can_trap_p_1 can read the cached variables rather
>>> than calling back into the target.
>>
>> That would be a better design for sure and would eliminate the
>> maintainance
>> issues I was originally afraid of.
>>
>> My recommendation would be to back out Bernd's patch for GCC 6.0
>> (again, it
>> doesn't fix any regression and, more importantly, any bug in real
>> software,
>> but only corner case bugs in dumb computer-generated testcases) but
>> with the
>> commitment to address the underlying issue for GCC 7.0 and backport
>> the fix to
>> GCC 6.x unless really impracticable.  That being said, it's ultimately
>> Jakub
>> and Richard's call.
>
> I'm on the fence; I do think the original problem is an issue we should
> fix, but I'm also not terribly happy with the implementation we have
> right now. Besides the issues already mentioned, doesn't it kind of
> assume these offsets are constant (which they definitely aren't,
> consider arg pushes for example)?
>

Yes that is right.  I saw it as a thing that could possibly happen more
often than we know, because it is difficult to spot a wrong code by
ordinary tests.  Even the reproducer from pr61047 does not crash when
it runs in the gcc testsuite, I had to tweak the example first to use
a larger offset to compensate the large number of environment values
that are passed from the test suite in each test execution.

Yes, rtx_addr_can_trap_p_1 does not know how many bytes are pushed on
the stack and I saw no easy way how to get to the REG_NOTES for
instance, because they are attached to the INSN and rtx_addr_can_trap_p
has only access to a MEM rtx.  It is no exact science, but the error is
on the safe side.

Nevertheless, with the data from the target hook the approximation is
good enough, to not pessimize any single bit of code that is generated
in stage2 vs. stage3, which would not have been the case without some
help from the target.

> I think a better approach might be to just mark accesses at known
> locations in the frame, or arg pushes, as MEM_NOTRAP_P, and consider
> accesses with non-constant or calculated offsets as potentially trapping.
>

Yes I think also that might be a next step.  It would probably be good
to somehow fixate the result of rtx_addr_can_trap_p immediately before
reload when the RTX's are still FRAMEP+x and ARGP+x, and annotate that
somehow to the reloaded RTX's, that way it would finally be superfluous
to call the target hook at all, because the actual addresses should not
change during or after reload.  It will probably have to be a spare bit
on the RTX that is currently unused, because MEM_NOTRAP_P is already
used for something different.

It will however not be simple to find a valid piece of C code where the
current implementation with all of its limitations generates different
code compared to an implementation that has access to the exact offsets.
As I said, I tried already, but could not find an example of a missed
optimization due to my patch.


Thanks
Bernd.

>
> Bernd

Reply via email to