On Wednesday 17 January 2018 11:13 PM, Wilco Dijkstra wrote:
> Are you saying the same issue exists for all stores with writeback? If so then
> your patch would need to address that too.

Yes, I'll be posting a separate patch for that because its condition set
is slightly different.  It will also be accompanied by slightly
different addrcost tuning, which is why it needs separate testing.

> It seems way more fundamental if it affects anything that isn't a simple
> immediate offset. Again I suggest using the existing cost infrastructure
> to find a setting that improves performance. If discouraging pre/post
> increment helps Falkor then that's better than doing nothing.

The existing costs don't differentiate between loads and stores, and that
is specifically what I need for Falkor.

>>> I think a special case for Falkor in aarch64_address_cost would be
>>> acceptable in GCC8 - that would be much smaller and cleaner than the
>>> current patch. If required we could improve upon this in GCC9 and add
>>> a way to differentiate between loads and stores.
>>
>> I can't do this in address_cost since I can't determine whether the
>> address is a load or a store location.  The most minimal way seems to be
>> using the patterns in the md file.
> 
> Well I don't think the approach of blocking specific patterns is a good
> solution to this problem and may not be accepted by AArch64 maintainers.
> Try your example with -funroll-loops and compare with my suggestion (with
> or without extra code to increase cost of writeback too). It seems to me
> adjusting costs is always going to result in better overall code quality,
> even if it also applies to loads for the time being.

Costs are not useful for this scenario because they cannot differentiate
between loads and stores.  To make that distinction I have to block
specific patterns, unless there's a better way I'm unaware of that helps
determine whether a memory reference is a load or a store.

Another approach I am trying, to minimize the change, is to add a new
ADDR_QUERY_STR for aarch64_legitimate_address_p, which can then be used
in classify_address to skip the register addressing mode for Falkor.
That way we avoid the additional hook.  It will still need the
additional Utf memory constraint though.
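
To make the idea concrete, here is a rough standalone sketch of the
check.  It is a toy model, not the GCC code: every toy_* / TOY_* name is
made up for illustration, and TOY_QUERY_STR only stands in for the
proposed ADDR_QUERY_STR.  The point is that the query type carries the
"this is a store" information that the address itself does not.

```c
#include <stdbool.h>

/* Toy model only: these enums are assumptions for illustration and do
   not match GCC's real definitions.  */
enum toy_cpu { TOY_CPU_GENERIC, TOY_CPU_FALKOR };
enum toy_addr_form { TOY_ADDR_REG_IMM, TOY_ADDR_REG_REG };

/* TOY_QUERY_STR stands in for the proposed ADDR_QUERY_STR: the caller
   passes it only when the address is being validated for a store.  */
enum toy_query { TOY_QUERY_ANY, TOY_QUERY_STR };

/* Return true if FORM is an acceptable address for QUERY when tuning
   for CPU.  The only rejection is register-offset addressing for a
   store query on Falkor; loads and other CPUs are unaffected.  */
static bool
toy_address_ok (enum toy_cpu cpu, enum toy_addr_form form,
                enum toy_query query)
{
  if (cpu == TOY_CPU_FALKOR
      && query == TOY_QUERY_STR
      && form == TOY_ADDR_REG_REG)
    return false;

  return true;
}
```

In this model a store pattern would ask with TOY_QUERY_STR and a load
pattern with TOY_QUERY_ANY, so the same register-offset address is
rejected for the former and accepted for the latter.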

Do you know of a way I can distinguish between loads and stores in costs
tuning?

Siddhesh
