On Fri, Sep 29, 2006 at 05:27:10AM +0000, Erich Plondke wrote:
> rs6000 and Sparc ports seem to use a peephole2 to get the ldd or lfq
> instructions (respectively), but it looks like there's no reason for
> the register allocator to allocate registers together. The peephole2
> just picks up loads to adjacent memory locations if the allocator
> happens to choose adjacent registers (is that correct?) or the
> variables are specified as living in hard registers with the help
> of an asm.
>
> Several other architectures have paired loads: some ARM targets have ldrd
> which can be cheaper than a ldm, and ia64 has a pair load.
>
> It seems like GCC does a good job of knowing how to modify
> register-sized subregs of two- or four-register larger modes. So if I
> could tell GCC to turn:
>
>   [(set (reg:SI X) (mem:SI (addr)))
>    (set (reg:SI Y) (mem:SI (addr+4)))]
>
> (where addr is aligned to DI) into something like:
>
>   [(set (reg:DI T) (mem:DI (addr)))
>    (set (reg:SI X) (subreg:SI (reg:DI T) 0))
>    (set (reg:SI Y) (subreg:SI (reg:DI T) 4))]
>
> and I could do so early enough, GCC would know to access the subregs
> directly in instruction(s) using the loaded values, and I would end up
> loading the register pair and using the individual elements. But it has
> to be done early on; after register allocation even if I could get a
> DI temporary I'd probably have the two SI moves and that's probably
> not a win.
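
To make the requested transformation concrete, here is a rough C-level
analogue of the two RTL sequences quoted above. It is only a sketch: the
function names load_two_words and load_pair are made up for illustration,
and the memcpy calls stand in for the DImode load and the two SImode
subregs at byte offsets 0 and 4. Nothing here says how GCC would or
should implement this internally.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "before" form: two independent 32-bit (SImode) loads
   from adjacent words, as in the first RTL sequence. */
static void load_two_words(const uint32_t *addr, uint32_t *x, uint32_t *y)
{
    *x = addr[0];   /* (set (reg:SI X) (mem:SI (addr)))   */
    *y = addr[1];   /* (set (reg:SI Y) (mem:SI (addr+4))) */
}

/* Hypothetical "after" form: one 64-bit (DImode) load into a temporary T,
   then the two halves are taken at byte offsets 0 and 4 of T. */
static void load_pair(const uint32_t *addr, uint32_t *x, uint32_t *y)
{
    uint64_t t;

    /* Precondition from the original mail: addr is aligned to DI. */
    assert(((uintptr_t)addr % 8) == 0);

    memcpy(&t, addr, sizeof t);                           /* T = mem:DI (addr) */
    memcpy(x, (const unsigned char *)&t + 0, sizeof *x);  /* subreg:SI T, 0    */
    memcpy(y, (const unsigned char *)&t + 4, sizeof *y);  /* subreg:SI T, 4    */
}
```

Both functions produce the same X and Y on either endianness, because the
halves are selected by byte offset in memory order rather than by shifting.
Whether the second form is actually cheaper depends on the target having a
paired load and on the halves being usable in place.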

You may have success using the combine pass to do this. The difficulty is
that combine only tries to combine instructions when the LOG_LINKS field is
set up. I think this only happens for plain SET insns when subregs are
involved, e.g.

  (set (subreg:SI (reg:DI T) 0) (mem:SI addr))
  (set (subreg:SI (reg:DI T) 4) (mem:SI addr+4))

For example, I don't know how to make this work with adjacent structure
fields.

You could try to extend the optimization that GCC already does for loading
adjacent structure fields smaller than a word; the one enabled by
SLOW_BYTE_ACCESS.

-- 
Rask Ingemann Lambertsen