On 05/16/14 04:07, Bin.Cheng wrote:
On Fri, May 16, 2014 at 1:13 AM, Jeff Law <l...@redhat.com> wrote:
On 05/15/14 10:51, Mike Stump wrote:

On May 15, 2014, at 12:26 AM, bin.cheng <bin.ch...@arm.com> wrote:

Here is a new GCC pass that looks through each basic block
and merges paired loads and stores even when they are not adjacent to each
other.


So I have a target that has load and store multiple support that
supports a large number of registers (2-n registers), and I added a
sched0 pass that is a light copy of the regular scheduling pass that
uses a different cost function which arranges all loads first, then
all stores then everything else.  Within a group of loads or stores
the secondary key is the base register, the next key is the offset.
The net result, all loads off the same register are sorted in
increasing order.
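
The ordering described above can be sketched as a three-part sort key. This is only an illustration, not the actual sched0 code: the Insn record, the register names, and the helper names are all hypothetical, and a real scheduler pass may only reorder insns as far as the dependency information allows, which this sketch ignores.

```python
from typing import NamedTuple, Optional


class Insn(NamedTuple):
    """Hypothetical, simplified stand-in for an insn (not a GCC type)."""
    kind: str             # "load", "store", or "other"
    base: Optional[str]   # base register name; None for non-memory insns
    offset: int           # byte offset from the base register


# Primary key: all loads first, then all stores, then everything else.
_KIND_RANK = {"load": 0, "store": 1, "other": 2}


def sched0_key(insn: Insn):
    # Secondary key is the base register, the next key is the offset.
    return (_KIND_RANK[insn.kind], insn.base or "", insn.offset)


def sched0_order(insns):
    # Assumption: every insn is free to move. A real pass would have to
    # respect data and memory dependencies between insns.
    return sorted(insns, key=sched0_key)
```

With this key, loads off the same base register end up next to each other in increasing offset order, which is the arrangement a load-multiple pattern needs.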

Glad to see someone else stumble on (ab)using the scheduler to do this.
Emm, if it's (ab)using, should we still do it then?
I think it'd still be fine. There's even been a comment about doing this kind of thing in the scheduler that's been around since the early 90s...

The scheduler is a bit interesting in that it has a wealth of dependency information and the ability to reorganize the insn stream in relatively arbitrary ways. That seems to make it a natural place to think about transformations of this nature. We just haven't had a good infrastructure for doing that.

In theory we're a lot closer now to being able to plug in different costing/sorting models and let the scheduler do its thing. Those models might rewrite for register pressure, or encourage certain independent insns to issue back-to-back to encourage combining, or to build candidate insns for delay slot scheduling, etc.

As Mike stated, merging of consecutive memory accesses is all about
the base register and the offset. I am thinking of another method:
collecting all memory accesses with the same base register, then doing the
merge work.  In this way, we should be able to merge more than 2
instructions, and it would also be possible to remove redundant load
instructions in one pass.
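
A rough sketch of that idea, under strong simplifying assumptions: every access is a load of the same width, the base registers are not modified within the block, and no intervening store aliases any of the loads. The function name and the (base, offset) representation are made up for illustration and are not GCC data structures.

```python
from collections import defaultdict


def merge_candidates(accesses, width=4):
    """Group loads by base register and find mergeable runs.

    accesses: iterable of (base_register, offset) pairs, each a load of
    `width` bytes. Returns (runs, redundant): `runs` maps a base register
    to lists of consecutive offsets (two or more per list), `redundant`
    lists loads that repeat an earlier (base, offset) pair.
    """
    by_base = defaultdict(list)
    seen = set()
    redundant = []
    for base, offset in accesses:
        if (base, offset) in seen:
            # Same address loaded again: a candidate for removal.
            redundant.append((base, offset))
            continue
        seen.add((base, offset))
        by_base[base].append(offset)

    runs = {}
    for base, offsets in by_base.items():
        offsets.sort()
        groups, current = [], [offsets[0]]
        for off in offsets[1:]:
            if off == current[-1] + width:
                current.append(off)      # extends the consecutive run
            else:
                if len(current) >= 2:
                    groups.append(current)
                current = [off]          # start a new run
        if len(current) >= 2:
            groups.append(current)
        if groups:
            runs[base] = groups
    return runs, redundant
```

Because the runs are built per base register rather than per pair, a run can cover more than 2 accesses, and the duplicate check falls out of the same walk over the block.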

My question is how many of these redundant loads there could be.  Is there any
RTL pass responsible for this now?
I suspect it's a lot less important now than it used to be. But there are probably some cases where it'd be useful. Combining sub-word accesses into full-word accesses comes immediately to mind.
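
As a small illustration of the sub-word case, assuming byte-sized accesses off a single base register and a 4-byte word (both assumptions, as is the function name), the check for which aligned words are fully covered could look like this:

```python
def full_word_groups(byte_offsets, word=4):
    """Return the aligned word start offsets whose every byte is accessed.

    byte_offsets: offsets of single-byte accesses off one base register.
    """
    present = set(byte_offsets)
    # Round each offset down to the start of its aligned word.
    starts = sorted({off - off % word for off in present})
    # Keep only the words where all `word` bytes are present.
    return [s for s in starts if all(s + i in present for i in range(word))]
```

Only the fully covered words are candidates for a single full-word access; a partially covered word would still need its byte accesses.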

I'm not aware of any pass which does this kind of change in a general form. Some passes (caller-save) do a fair amount of work to track when they can generate multi-object loads/stores (and it was a huge win back on the old SPARC processors).


jeff
