Re: Using imcc as JIT optimizer

2003-02-20 Thread Leopold Toetsch
Sean O'Rourke wrote:

> On Thu, 20 Feb 2003, Leopold Toetsch wrote:
>
> > What do people think?
>
> Cool idea -- a lot of optimization-helpers could eventually be passed on
> to the jit (possibly in the metadata?).  One thought -- the information
> imcc computes should be platform-independent.  e.g. it could pass a
> control flow graph to the JIT, but it probably shouldn't do register
> allocation for a specific number of registers.  How much worse do you
> think it would be to have IMCC just rank the Parrot registers in order of
> decreasing spill cost, then have the JIT take the top N, where N is the
> number of available architectural registers?
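A minimal sketch of the ranking idea in the quoted message, written in Python rather than imcc's C: the compiler emits the Parrot registers ordered by decreasing spill cost, and each JIT backend keeps the first N in hardware. The cost values and the names `rank_by_spill_cost` and `pick_mapped` are made up for illustration; this is not imcc's actual allocator.

```python
from typing import Dict, List


def rank_by_spill_cost(spill_cost: Dict[str, float]) -> List[str]:
    """Order register names by decreasing spill cost (ties broken by name)."""
    return sorted(spill_cost, key=lambda reg: (-spill_cost[reg], reg))


def pick_mapped(ranked: List[str], n_hw_regs: int) -> List[str]:
    """JIT side: keep the N registers that would be most expensive to spill."""
    return ranked[:n_hw_regs]


# Hypothetical costs, e.g. use counts weighted by loop depth.
costs = {"I2": 40.0, "I3": 35.0, "I4": 2.0, "I5": 12.0, "I6": 2.0, "I7": 30.0}
ranked = rank_by_spill_cost(costs)

# The same platform-independent ranking serves backends with different N.
mapped_on_4_regs = pick_mapped(ranked, 4)
mapped_on_2_regs = pick_mapped(ranked, 2)
```

The ranking itself carries no architecture knowledge; only the cut-off N does, which is the split the message proposes.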



The registers are already in that order (with -Op or -Oj), so this wouldn't
be a problem. Difficulties arise when it comes to the register
load/save instructions, which get inserted by imcc in my scheme. These
are definitely processor/$arch specific. They depend on the number of
mappable (and non-preserved too) registers, and on the state of the
op_jit function table.

Of course CFG and register life information could be passed on to the
JIT, but this seems a little bit complicated, as the JIT has its own
sections, which either match a basic block from imcc or are a sequence
of non-JITable instructions.
But in the long run, it could be a way to go. OTOH, PBC compatibility
is not a big point here when JIT is involved: 99% of the time the
code would run on the machine where it is generated.
And AFAIK it would be easier to make some JIT cross-compiler. This would
basically only need the number of mappable registers and the extcall
bits from the jump table, read in from some config file.


/s



leo






Using imcc as JIT optimizer

2003-02-20 Thread Leopold Toetsch
Starting from the unbearable fact that optimized compiled C is still
faster than parrot -j (in primes.pasm), I did this experiment:
- do register allocation for JIT in imcc
- use the first N registers as MAPped processor registers

Here is the JIT optimized PASM output of

$ imcc -Oj -o p.pasm primes.pasm
$ cat p.pasm
set ri2, 1
set I5, 50
set I4, 0
print "N primes up to "
print I5
print " is: "
time N1
set rn1, N1 # load
REDO:
set ri0, 2
div ri3, ri2, 2
LOOP:
cmod ri1, ri2, ri0
if ri1, OK			# with -O1j unless ri1, NEXT
branch NEXT			# deleted
OK:		 
		# deleted
inc ri0
le ri0, ri3, LOOP
inc I4
set I6, ri2
NEXT:
inc ri2
le ri2, I5, REDO
time N0
set rn0, N0 # load
print I4
print "\nlast is: "
print I6
print "\n"
sub rn0, rn1
set N0, rn0 # save
print "Elapsed time: "
print N0
print "\n"
end

The ri? and rn? are processor registers. The above is for Intel (4 mapped
int/float regs); you can translate the ri? to (%ebx, %edi, %esi, %edx).
The processor regs are represented as (-1 - parrot_reg),
i.e. %ebx == -1, %edi == -2 ...

The MAP macro in jit_emit.h would then be:
# define MAP(i) ((i) >= 0 ? 0 : ...map_branch[jit_info->op_i -1-(i)])
where the mappings are directly intval_map or floatval_map. JIT wouldn't 
need any further calculations.
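The encoding described above can be sketched as follows. This only illustrates the (-1 - parrot_reg) numbering and the table lookup, in Python rather than the C of jit_emit.h; the helper names `encode_hw` and `decode` are hypothetical, and the elided part of the MAP macro is not reconstructed here.

```python
from typing import List, Optional

# Order taken from the post: ri0..ri3 on i386.
INTVAL_MAP = ["%ebx", "%edi", "%esi", "%edx"]


def encode_hw(slot: int) -> int:
    """Encode mapped slot 0, 1, 2, ... as operand -1, -2, -3, ..."""
    return -1 - slot


def decode(operand: int, hw_map: List[str]) -> Optional[str]:
    """Return the processor register for a negative operand, else None.

    Non-negative operands are ordinary Parrot register numbers and are
    not mapped, which mirrors the `(i) >= 0 ? 0 : ...` test in MAP.
    """
    if operand >= 0:
        return None
    return hw_map[-1 - operand]
```

With this numbering, `decode(-1, INTVAL_MAP)` gives `%ebx` and `decode(-2, INTVAL_MAP)` gives `%edi`, matching the example in the post.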

The load/save instructions get inserted by looking at op_jit[].extcall,
i.e. if an external instruction reads a mapped register, the register is
saved before it, and if it writes one, the register is loaded after it; the
instruction itself uses the parrot register instead. (Only the print
and time ops are external on i386.)
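A rough sketch of that insertion pass, in Python rather than imcc's C. The instruction representation, the `EXTERNAL` set and the `WRITES_FIRST_OPERAND` table are assumptions standing in for op_jit[].extcall and the real op metadata; the function name `insert_load_save` is hypothetical.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, List[str]]

# Assumed stand-ins for the extcall bits and the ops' in/out operand info.
EXTERNAL = {"print", "time"}
WRITES_FIRST_OPERAND = {"time"}


def insert_load_save(code: List[Instr], mapping: Dict[str, str]) -> List[Instr]:
    """Rewrite code so JITted ops use processor registers.

    External ops keep their Parrot registers; a 'save' (set Px, rx) goes
    before each mapped register they read, and a 'load' (set rx, Px) goes
    after each mapped register they write.
    """
    out: List[Instr] = []
    for op, args in code:
        if op not in EXTERNAL:
            out.append((op, [mapping.get(a, a) for a in args]))
            continue
        written = set(args[:1]) if op in WRITES_FIRST_OPERAND else set()
        for a in args:
            if a in mapping and a not in written:
                out.append(("set", [a, mapping[a]]))  # save
        out.append((op, list(args)))
        for a in args:
            if a in mapping and a in written:
                out.append(("set", [mapping[a], a]))  # load
    return out


# Tail of the primes example, before mapping.
tail = [("time", ["N0"]), ("sub", ["N0", "N1"]), ("print", ["N0"])]
result = insert_load_save(tail, {"N0": "rn0", "N1": "rn1"})
```

On this input the pass produces the same `time N0` / `set rn0, N0` / `sub rn0, rn1` / `set N0, rn0` / `print N0` sequence shown in the listing. A real pass would also skip saves for registers that have not changed since the last save; this sketch always emits them.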

I currently have the imcc part for some common cases, enough for the above
output.

What do people think?

For reference: a similar idea: Of mops and microops

leo
PS: -O3 C 3.64s, JIT ~3.55.



Re: Using imcc as JIT optimizer

2003-02-20 Thread Daniel Grunblatt
On Thursday 20 February 2003 18:14, Leopold Toetsch wrote:
> Tupshin Harper wrote:
> > Leopold Toetsch wrote:
> > > Starting from the unbearable fact, that optimized compiled C is still
> > > faster than parrot -j (in primes.pasm)
> >
> > Lol...what are you going to do when somebody comes along with the
> > unbearable example of primes.s(optimized x86 assembly), and you are
> > forced to throw up your hands in defeat? ;-)
>
> It only may be equally fast, that's it :)

Nahh, you know it can be faster... maybe in a couple of years ;-D

> > Cool idea, if I understand correctly, and I am in awe of how fast the
> > bloody thing is already.
>
> That's integer/float only. When it comes to objects, different things
> matter.
>
> > -Tupshin
>
> leo



Re: Configure.pl --cgoto=0 doesn't work

2003-02-20 Thread Simon Glover

On Thu, 20 Feb 2003, Nicholas Clark wrote:

> If I
>
> perl Configure.pl --cgoto=0 && make all test
>
> then the build fails with:
>
> ccache /usr/local/bin/gcc -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H
> -I/usr/local/include  -Wall -Wstrict-prototypes -Wmissing-prototypes -Winline
> -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Waggregate-return
> -Winline -W -Wno-unused -Wsign-compare -Wformat-nonliteral -Wformat-security
> -Wpacked -Wpadded -Wdisabled-optimization  -I./include  -DHAS_JIT -DI386 -o
> jit_cpu.o -c jit_cpu.c
> In file included from jit_cpu.c:39:
> include/parrot/jit_emit.h:2302:39: parrot/oplib/core_ops_cgp.h: No such file or
> directory
> In file included from jit_cpu.c:39:
> include/parrot/jit_emit.h: In function `Parrot_jit_begin':
> include/parrot/jit_emit.h:2349: `cgp_core' undeclared (first use in this function)
> include/parrot/jit_emit.h:2349: (Each undeclared identifier is reported only once
> include/parrot/jit_emit.h:2349: for each function it appears in.)
> *** Error code 1

 The problem is that an ifdef in jit/i386/jit_emit.h is defining JIT_CGP
 based on whether or not the compiler is GCC, and not on whether
 HAS_COMPUTED_GOTO is defined. The attached patch fixes this, but I'm
 not sure if the __GCC__ bit is still necessary. Leo?

 Simon

--- jit/i386/jit_emit.h.old Thu Feb 20 20:59:11 2003
+++ jit/i386/jit_emit.h Thu Feb 20 20:58:52 2003
@@ -8,7 +8,7 @@

 #include <assert.h>

-#ifdef __GNUC__
+#if defined HAS_COMPUTED_GOTO && defined __GCC__
 #  define JIT_CGP
 #endif




Re: Configure.pl --cgoto=0 doesn't work

2003-02-20 Thread Simon Glover

On Thu, 20 Feb 2003, Simon Glover wrote:

> On Thu, 20 Feb 2003, Nicholas Clark wrote:
>
> > If I
> >
> > perl Configure.pl --cgoto=0 && make all test
> >
> > then the build fails with:
> >
> > ccache /usr/local/bin/gcc -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H
> > -I/usr/local/include  -Wall -Wstrict-prototypes -Wmissing-prototypes -Winline
> > -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings
> > -Waggregate-return -Winline -W -Wno-unused -Wsign-compare -Wformat-nonliteral
> > -Wformat-security -Wpacked -Wpadded -Wdisabled-optimization  -I./include
> > -DHAS_JIT -DI386 -o jit_cpu.o -c jit_cpu.c
> > In file included from jit_cpu.c:39:
> > include/parrot/jit_emit.h:2302:39: parrot/oplib/core_ops_cgp.h: No such file or
> > directory
> > In file included from jit_cpu.c:39:
> > include/parrot/jit_emit.h: In function `Parrot_jit_begin':
> > include/parrot/jit_emit.h:2349: `cgp_core' undeclared (first use in this function)
> > include/parrot/jit_emit.h:2349: (Each undeclared identifier is reported only once
> > include/parrot/jit_emit.h:2349: for each function it appears in.)
> > *** Error code 1
>
> The problem is that an ifdef in jit/i386/jit_emit.h is defining JIT_CGP
> based on whether or not the compiler is GCC, and not on whether
> HAS_COMPUTED_GOTO is defined. The attached patch fixes this, but I'm
> not sure if the __GCC__ bit is still necessary. Leo?

 OK, let's try this again, with the _correct_ spelling this time...

 Simon

--- jit/i386/jit_emit.h.old Thu Feb 20 21:43:53 2003
+++ jit/i386/jit_emit.h Thu Feb 20 21:43:20 2003
@@ -8,7 +8,7 @@

 #include <assert.h>

-#ifdef __GNUC__
+#if defined HAVE_COMPUTED_GOTO && defined __GCC__
 #  define JIT_CGP
 #endif




Re: --optimize

2003-02-20 Thread Simon Glover

On Tue, 18 Feb 2003 [EMAIL PROTECTED] wrote:

> I think --optimize alone is busted.


 You'd be right: the code adding the optimization flags to the list of
 compiler flags is only executed if 'debugging' is defined, which is
 only the case when the --debugging flag has been used.

 The patches below seem to fix this problem and get the --optimize
 option working as intended.

 One minor niggle that remains is that when I use the --debugging option,
 I get two copies of '-g' appended to the ldflags, ie:

  LINKFLAGS =  -L/usr/local/lib
  LDFLAGS =  -L/usr/local/lib -g  -g

 I'm not sure what's happening here, but fortunately it seems to be
 harmless.

 Simon

--- config/init/debug.pl.old Thu Feb 20 21:56:31 2003
+++ config/init/debug.pl Thu Feb 20 22:01:42 2003
@@ -10,11 +10,11 @@ $description="Enabling debugging...";

 sub runstep {
   if (Configure::Data->get('debugging')) {
-    my($ccflags, $linkflags, $ldflags, $optimize) =
-      Configure::Data->get(qw(ccflags linkflags ldflags optimize));
+    my($ccflags, $linkflags, $ldflags) =
+      Configure::Data->get(qw(ccflags linkflags ldflags));
     my($cc_debug, $link_debug, $ld_debug) =
       Configure::Data->get(qw(cc_debug link_debug ld_debug));
-    $ccflags .= " $cc_debug $optimize";
+    $ccflags .= " $cc_debug";
     $linkflags .= " $link_debug";
     $ldflags .= " $ld_debug";

--- /dev/null   Thu Aug 30 16:30:55 2001
+++ config/init/optimize.pl Thu Feb 20 22:01:57 2003
@@ -0,0 +1,26 @@
+package Configure::Step;
+
+use strict;
+use vars qw($description @args);
+use Parrot::Configure::Step;
+
+$description="Enabling optimization...";
+
+@args=();
+
+sub runstep {
+  if (Configure::Data->get('optimize')) {
+    my($ccflags, $optimize) =
+      Configure::Data->get(qw(ccflags optimize));
+    $ccflags .= " $optimize";
+
+    Configure::Data->set(
+      ccflags => $ccflags,
+    );
+  }
+  else {
+    print "(none requested) ";
+  }
+}
+
+1;

--- lib/Parrot/Configure/RunSteps.pm.old Thu Feb 20 21:59:48 2003
+++ lib/Parrot/Configure/RunSteps.pm Thu Feb 20 22:00:03 2003
@@ -10,6 +10,7 @@ use vars qw(@steps);
     init/miniparrot.pl
     init/hints.pl
     init/debug.pl
+    init/optimize.pl
     inter/progs.pl
     inter/types.pl
     inter/ops.pl

--- MANIFEST.old Thu Feb 20 22:13:05 2003
+++ MANIFEST Thu Feb 20 22:13:17 2003
@@ -112,6 +112,7 @@ config/init/hints/os2.pl
 config/init/hints/vms.pl
 config/init/manifest.pl
 config/init/miniparrot.pl
+config/init/optimize.pl
 config/inter/exp.pl
 config/inter/ops.pl
 config/inter/pmc.pl



Re: Using imcc as JIT optimizer

2003-02-20 Thread Tupshin Harper
Leopold Toetsch wrote:

> Starting from the unbearable fact, that optimized compiled C is still
> faster than parrot -j (in primes.pasm)

Lol...what are you going to do when somebody comes along with the
unbearable example of primes.s(optimized x86 assembly), and you are
forced to throw up your hands in defeat? ;-)

Cool idea, if I understand correctly, and I am in awe of how fast the 
bloody thing is already.

-Tupshin





Re: Objects, methods, attributes, properties, and other related frobnitzes

2003-02-20 Thread Dan Sugalski
At 2:06 PM + 2/19/03, Peter Haworth wrote:
> On Fri, 14 Feb 2003 15:56:25 -0500, Dan Sugalski wrote:
> > I got clarification. The sequence is:
> >
> > 1) Search for method of the matching name in inheritance tree
> > 2) if #1 fails, search for an AUTOLOAD
> > 3) if #2 fails (or all AUTOLOADs give up) then do MM dispatch
>
> Shouldn't we be traversing the inheritance tree once, doing these three
> steps at each node until one works, rather than doing each step once for the
> whole tree? MM dispatch probably complicates this, though.

No, you have to do it multiple times. AUTOLOAD is a last-chance
fallback, so it ought not be called until all other chances have
failed.

> If my derived class has an autoloaded method which overrides the base class'
> method, I don't want the base class method to be called, just because parrot
> does things in a peculiar order. Well, I know it's the same order that perl5
> does things, but it's still peculiar.

If you prototype the sub but AUTOLOAD the body it'll work OK.
--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Configure.pl --cgoto=0 doesn't work

2003-02-20 Thread Nicholas Clark
If I

perl Configure.pl --cgoto=0 && make all test

then the build fails with:

ccache /usr/local/bin/gcc -DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H  -I/usr/local/include  
-Wall -Wstrict-prototypes -Wmissing-prototypes -Winline -Wshadow -Wpointer-arith 
-Wcast-qual -Wcast-align -Wwrite-strings -Waggregate-return -Winline -W -Wno-unused 
-Wsign-compare -Wformat-nonliteral -Wformat-security -Wpacked -Wpadded 
-Wdisabled-optimization  -I./include  -DHAS_JIT -DI386 -o jit_cpu.o -c jit_cpu.c
In file included from jit_cpu.c:39:
include/parrot/jit_emit.h:2302:39: parrot/oplib/core_ops_cgp.h: No such file or 
directory
In file included from jit_cpu.c:39:
include/parrot/jit_emit.h: In function `Parrot_jit_begin':
include/parrot/jit_emit.h:2349: `cgp_core' undeclared (first use in this function)
include/parrot/jit_emit.h:2349: (Each undeclared identifier is reported only once
include/parrot/jit_emit.h:2349: for each function it appears in.)
*** Error code 1


And if I don't disable cgoto I run out of swap. :-(
I've got 96M of swap, which happens to be enough to build world, perl and
gcc 3.2

I've not tried X yet :-)

Nicholas Clark


Re: Using imcc as JIT optimizer

2003-02-20 Thread Leopold Toetsch
Tupshin Harper wrote:

> Leopold Toetsch wrote:
>
> > Starting from the unbearable fact, that optimized compiled C is still
> > faster than parrot -j (in primes.pasm)
>
> Lol...what are you going to do when somebody comes along with the
> unbearable example of primes.s(optimized x86 assembly), and you are
> forced to throw up your hands in defeat? ;-)

It only may be equally fast, that's it :)

> Cool idea, if I understand correctly, and I am in awe of how fast the
> bloody thing is already.

That's integer/float only. When it comes to objects, different things
matter.

> -Tupshin

leo






Re: Objects, methods, attributes, properties, and other related frobnitzes

2003-02-20 Thread Mark Jason Dominus
Dan Sugalski [EMAIL PROTECTED]:
> At 2:06 PM + 2/19/03, Peter Haworth wrote:
> > On Fri, 14 Feb 2003 15:56:25 -0500, Dan Sugalski wrote:
> > > I got clarification. The sequence is:
> > >
> > > 1) Search for method of the matching name in inheritance tree
> > > 2) if #1 fails, search for an AUTOLOAD
> > > 3) if #2 fails (or all AUTOLOADs give up) then do MM dispatch
> >
> > Shouldn't we be traversing the inheritance tree once, doing these three
> > steps at each node until one works, rather than doing each step once for the
> > whole tree? MM dispatch probably complicates this, though.
>
> No, you have to do it multiple times. AUTOLOAD is a last-chance
> fallback, so it ought not be called until all other chances have
> failed.

Pardon me for coming in in the middle, but it seems to me that only
one traversal should be necessary.  The first traversal can accumulate
a temporary linked list of AUTOLOAD subroutines.  If the first
traversal locates an appropriate method, the linked list is discarded.
If no appropriate method is found, control is dispatched to the
AUTOLOAD subroutine at the head of the list, if there is one; if the
list is empty the MM dispatch is tried.
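A small sketch of the single-traversal scheme just described, in Python rather than Parrot's C. Classes are modelled as plain dicts from method name to callable, the search order is passed in as a list, and a Python list stands in for the temporary linked list; the names `dispatch` and `mm_dispatch` are hypothetical.

```python
from typing import Callable, Dict, List

ClassTable = Dict[str, Callable[[], str]]


def dispatch(search_order: List[ClassTable], name: str,
             mm_dispatch: Callable[[], str]) -> Callable[[], str]:
    """Walk the inheritance tree once and pick what to call for `name`."""
    autoloads: List[Callable[[], str]] = []
    for cls in search_order:
        if name in cls:
            # A real method wins; the collected AUTOLOADs are discarded.
            return cls[name]
        if "AUTOLOAD" in cls:
            autoloads.append(cls["AUTOLOAD"])
    if autoloads:
        # No method found: control goes to the AUTOLOAD at the head of the list.
        return autoloads[0]
    return mm_dispatch


def base_speak() -> str:
    return "base speak"


def base_autoload() -> str:
    return "base AUTOLOAD"


def derived_autoload() -> str:
    return "derived AUTOLOAD"


def multimethod() -> str:
    return "MM dispatch"


derived: ClassTable = {"AUTOLOAD": derived_autoload}
base: ClassTable = {"speak": base_speak, "AUTOLOAD": base_autoload}
order = [derived, base]
```

Here `dispatch(order, "speak", multimethod)` still returns the base class method even though the derived class has an AUTOLOAD, so the ordering is the same as in the three-step sequence; only the number of traversals changes. Handling an AUTOLOAD that "gives up" would mean trying the remaining list entries in turn, which this sketch leaves out.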