On 08/04/2011 01:10 PM, Andrew Haley wrote:
>>  It's the sort of thing that gets done in threaded interpreters,
>>  where you really need to keep a few pointers in registers and
>>  the interpreter itself is a very long function.  gcc has always
>>  done a dreadful job of register allocation in such cases.
>
>  Sure, but what I have seen people use global register variables
>  for this (which means they get taken away from the register allocator).

Not always though, and the x86 has so few registers that using a
global register variable is very problematic.  I suppose you could
compile the threaded interpreter in a file of its own, but I'm not
sure that has quite the same semantics as local register variables.

Indeed, local register variables give almost the same benefit as globals with half the burden. The idea is that you don't care about the exact register that holds the value but, by specifying a callee-save register, you get GCC to keep the value there instead of in memory across calls. This reduces the number of spills _a lot_.

The problem is that people who care about this stuff very much don't
always read...@gcc.gnu.org  so won't be heard.  But in their own world
(LISP, Forth) nice features like register variables and labels as
values have led to gcc being the preferred compiler for this kind of
work.

/me raises hands.

For GNU Smalltalk, using

#if defined(__i386__)
# define __DECL_REG1 __asm("%esi")
# define __DECL_REG2 __asm("%edi")
# define __DECL_REG3 /* no more callee-save regs if PIC is in use!  */
#endif

#if defined(__x86_64__)
# define __DECL_REG1 __asm("%r12")
# define __DECL_REG2 __asm("%r13")
# define __DECL_REG3 __asm("%rbx")
#endif

...

  register unsigned char *ip __DECL_REG1;
  register OOP * sp __DECL_REG2;
  register intptr_t arg __DECL_REG3;

improves performance by up to 20% if I remember correctly. I can benchmark it if desired.
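
To show where such declarations sit, here is a minimal sketch of a threaded interpreter loop. It is not GNU Smalltalk's code: the VM_REG_* macros, the opcodes and vm_run are hypothetical names made up for the example, and it assumes GNU C (labels as values) on x86-64, falling back to plain locals elsewhere. Note that the current GCC manual only guarantees the chosen register at inline asm operands, so whether the values really stay in r12/r13 has to be checked in the generated code.

```c
#include <stdint.h>

/* Hypothetical macros in the spirit of __DECL_REGn above; they expand
   to nothing on targets this sketch does not know about.  */
#if defined(__GNUC__) && defined(__x86_64__)
# define VM_REG_IP __asm__("r12")   /* callee-saved in the SysV ABI */
# define VM_REG_SP __asm__("r13")   /* callee-saved in the SysV ABI */
#else
# define VM_REG_IP
# define VM_REG_SP
#endif

/* Made-up opcodes for a tiny stack machine.  */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Run a bytecode program and return the value on top of the stack.
   No bounds checking: this is an illustration only.  */
intptr_t vm_run(const unsigned char *code)
{
  /* One label address per opcode (GNU C "labels as values").  */
  static void *const dispatch[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
  intptr_t stack[64];

  /* The two hot interpreter pointers, pinned to hard registers.  */
  register const unsigned char *ip VM_REG_IP = code;
  register intptr_t *sp VM_REG_SP = stack;

#define NEXT() goto *dispatch[*ip++]
  NEXT();

do_push:                      /* operand byte follows the opcode */
  *sp++ = (intptr_t) *ip++;
  NEXT();
do_add:                       /* pop two values, push their sum */
  sp--;
  sp[-1] += sp[0];
  NEXT();
do_mul:                       /* pop two values, push their product */
  sp--;
  sp[-1] *= sp[0];
  NEXT();
do_halt:
  return sp[-1];
#undef NEXT
}
```

Calling vm_run on the bytes OP_PUSH 2, OP_PUSH 3, OP_ADD, OP_PUSH 4, OP_MUL, OP_HALT evaluates (2 + 3) * 4. In a real interpreter the opcode bodies call out to helper functions, which is where keeping ip and sp in callee-save registers is supposed to pay off.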

It does not come for free: in some cases the register allocator does some stupid things because of the hard register declaration. But the code is much better overall, so who cares about the micro-optimization.

Of course, if the register allocator did the right thing, or if I could simply use

  unsigned char *ip __attribute__((__do_not_spill_me__(20)));
  OOP *sp __attribute__((__do_not_spill_me__(10)));
  intptr_t arg __attribute__((__do_not_spill_me__(0)));

that would be just fine.

Paolo
