Starting from the unbearable fact that optimized compiled C is still faster than parrot -j (on primes.pasm), I did this experiment:
- do register allocation for JIT in imcc
- use the first N registers as MAPped processor registers

Here is the JIT optimized PASM output of

$ imcc -Oj -o p.pasm primes.pasm
$ cat p.pasm
set ri2, 1
set I5, 50
set I4, 0
print "N primes up to "
print I5
print " is: "
time N1
set rn1, N1 # load
REDO:
set ri0, 2
div ri3, ri2, 2
LOOP:
cmod ri1, ri2, ri0
if ri1, OK # with -O1j unless ri1, NEXT
branch NEXT # deleted
OK: # deleted
inc ri0
le ri0, ri3, LOOP
inc I4
set I6, ri2
NEXT:
inc ri2
le ri2, I5, REDO
time N0
set rn0, N0 # load
print I4
print "\nlast is: "
print I6
print "\n"
sub rn0, rn1
set N0, rn0 # save
print "Elapsed time: "
print N0
print "\n"
end

The ri? and rn? are processor registers. The above is for Intel (4 mapped int/float regs); you can translate the ri? to (%ebx, %edi, %esi, %edx).
The processor regs are represented as (-1 - parrot_reg),
i.e. %ebx == -1, %edi == -2 ...
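
To make the numbering concrete, here is a minimal C sketch of the encoding. It is not code from imcc or jit_emit.h: the names N_MAPPED, mapped_int_regs, encode_mapped, is_mapped, mapped_slot and mapped_name are hypothetical, and only the four register names and the (-1 - n) rule come from the text above.

```c
#include <stddef.h>

/* Hypothetical names throughout; this only illustrates the encoding. */
#define N_MAPPED 4

/* Order taken from the post: ri0..ri3 on i386. */
static const char *const mapped_int_regs[N_MAPPED] = {
    "%ebx", "%edi", "%esi", "%edx"
};

/* Encode mapped slot k (0-based) as the negative number -1 - k. */
static int encode_mapped(int slot) { return -1 - slot; }

/* Any negative register number denotes a processor register. */
static int is_mapped(int reg) { return reg < 0; }

/* Invert the encoding: -1 -> 0, -2 -> 1, ... */
static int mapped_slot(int reg) { return -1 - reg; }

/* Name of the processor register, or NULL for a Parrot register
 * (reg >= 0) or a slot beyond the mapped ones. */
static const char *mapped_name(int reg)
{
    int slot;
    if (!is_mapped(reg))
        return NULL;
    slot = mapped_slot(reg);
    return slot < N_MAPPED ? mapped_int_regs[slot] : NULL;
}
```

With this, ri1 is encoded as -2 and decodes back to slot 1, i.e. %edi, while a non-negative number such as 5 (I5) stays an ordinary Parrot register.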

The MAP macro in jit_emit.h would then be:
# define MAP(i) ((i)>= 0 ? 0 : ...map_branch[jit_info->op_i -1-(i)])
where the mappings are directly intval_map or floatval_map. JIT wouldn't need any further calculations.

The load/save instructions get inserted by looking at op_jit[].extcall, i.e. if an external instruction reads or writes a mapped register, the register gets saved before or loaded after it, and the parrot register is used instead. (Only the print and time ops are external in i386).
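
The decision rule can be sketched in C as follows. This is a simplified model, not the imcc code: the OpInfo struct, its fields, and fixups_for are hypothetical stand-ins, and only the idea of consulting an extcall flag comes from the paragraph above.

```c
/* Hypothetical, simplified description of one op. */
typedef struct {
    const char *name;   /* opcode name, e.g. "print" */
    int extcall;        /* nonzero: op runs as external code, not inline JIT */
    int reads_mapped;   /* nonzero: op reads a register that is mapped */
    int writes_mapped;  /* nonzero: op writes a register that is mapped */
} OpInfo;

enum {
    FIX_NONE        = 0,
    FIX_SAVE_BEFORE = 1,  /* emit "set Nx, rnx" before the op */
    FIX_LOAD_AFTER  = 2   /* emit "set rnx, Nx" after the op */
};

/* Decide which fix-up instructions an op needs. Inline (non-external)
 * ops work on the processor registers directly and need none. */
static int fixups_for(const OpInfo *op)
{
    int fix = FIX_NONE;
    if (!op->extcall)
        return fix;
    if (op->reads_mapped)
        fix |= FIX_SAVE_BEFORE;
    if (op->writes_mapped)
        fix |= FIX_LOAD_AFTER;
    return fix;
}
```

In the listing above, "time N0" writes its register and is followed by a load, while "print N0" reads its register and is preceded by a save; "inc ri0" is not external and gets neither.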

I currently have the imcc part for some common cases, enough for the above output.

What do people think?

For reference: a similar idea: "Of mops and microops"

leo
PS: -O3 C 3.64s, JIT ~3.55s.
