On Wednesday 17 January 2018 11:13 PM, Wilco Dijkstra wrote:
> Are you saying the same issue exists for all stores with writeback? If so then
> your patch would need to address that too.

Yes, I'll be posting a separate patch for that because the condition set is slightly different for it. It will also be accompanied by a slightly different tuning for addrcost, which is why it needs separate testing.

> It seems way more fundamental if it affects anything that isn't a simple
> immediate offset. Again I suggest using the existing cost infrastructure
> to find a setting that improves performance. If discouraging pre/post
> increment helps Falkor then that's better than doing nothing.

The existing costs don't differentiate between loads and stores and that is specifically what I need for Falkor.

>>> I think a special case for Falkor in aarch64_address_cost would be
>>> acceptable in GCC8 - that would be much smaller and cleaner than the
>>> current patch. If required we could improve upon this in GCC9 and add
>>> a way to differentiate between loads and stores.
>>
>> I can't do this in address_cost since I can't determine whether the
>> address is a load or a store location. The most minimal way seems to be
>> using the patterns in the md file.
>
> Well I don't think the approach of blocking specific patterns is a good
> solution to this problem and may not be accepted by AArch64 maintainers.
> Try your example with -funroll-loops and compare with my suggestion
> (with or without extra code to increase cost of writeback too). It seems
> to me adjusting costs is always going to result in better overall code
> quality, even if it also applies to loads for the time being.

Costs are not useful for this scenario because they cannot differentiate between loads and stores. To make that distinction I have to block specific patterns, unless there's a better way I'm unaware of that helps determine whether a memory reference is a load or a store.

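To illustrate the limitation being described, here is a toy model in C. It is not GCC code and every name in it is hypothetical. It assumes a cost hook that, like the address-cost query discussed here, is handed only the address form, so whatever the caller knows about the access being a load or a store never reaches it.

```c
#include <stdbool.h>

/* Toy model only: none of these names come from GCC. */
enum toy_addr_kind {
    TOY_ADDR_IMM_OFFSET,   /* base + immediate */
    TOY_ADDR_REG_OFFSET,   /* base + register */
    TOY_ADDR_WRITEBACK     /* pre/post increment */
};

/* A memory access as the caller sees it: address form plus direction. */
struct toy_mem_access {
    enum toy_addr_kind kind;
    bool is_store;
};

/* The cost query receives the address form and nothing else. */
int toy_address_cost(enum toy_addr_kind kind)
{
    switch (kind) {
    case TOY_ADDR_IMM_OFFSET: return 0;
    case TOY_ADDR_REG_OFFSET: return 1;
    case TOY_ADDR_WRITEBACK:  return 2;
    }
    return 0;
}

/* The caller knows is_store, but has no way to pass it to the cost query,
   so a load and a store with the same address form get the same cost. */
int toy_cost_of_access(struct toy_mem_access access)
{
    return toy_address_cost(access.kind);
}
```

In this sketch, raising the cost of TOY_ADDR_REG_OFFSET to discourage it for stores would penalize loads by exactly the same amount, which is the trade-off under debate in this thread.
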
Another approach I am trying, to minimize the change, is to add a new ADDR_QUERY_STR for aarch64_legitimate_address_p, which can then be used in classify_address to skip register addressing mode for Falkor. That way we avoid the additional hook. It will still need the additional Utf memory constraint though.

Do you know of a way I can distinguish between loads and stores in costs tuning?

Siddhesh
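
For illustration only, the query-type idea can be modelled with a small C sketch. This is a toy, not the actual patch or GCC's API: the enum values, the function name, and the tuning flag are all hypothetical. It assumes the legitimacy check receives a query type, so a store-specific query can reject register-offset addressing while other queries still accept it.

```c
#include <stdbool.h>

/* Toy model only: all names below are hypothetical, not GCC's. */
enum sketch_addr_kind {
    SKETCH_ADDR_IMM_OFFSET,   /* base + immediate */
    SKETCH_ADDR_REG_OFFSET    /* base + register */
};

enum sketch_addr_query {
    SKETCH_QUERY_ANY,   /* generic legitimacy query */
    SKETCH_QUERY_STR    /* query made on behalf of a store */
};

/* Legitimacy check that knows why it is being asked. When the (assumed)
   tuning flag is set, register-offset addresses are rejected for store
   queries only; every other combination stays legitimate. */
bool sketch_legitimate_address_p(enum sketch_addr_kind kind,
                                 enum sketch_addr_query query,
                                 bool tune_avoid_reg_offset_stores)
{
    if (tune_avoid_reg_offset_stores
        && query == SKETCH_QUERY_STR
        && kind == SKETCH_ADDR_REG_OFFSET)
        return false;
    return true;
}
```

Unlike a cost, which only biases the choice, a check of this shape removes the addressing mode for stores outright, so loads are unaffected but stores lose the option entirely.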
