Re: OO benches

2004-04-19 Thread Dan Sugalski
At 11:19 AM +0100 4/17/04, Piers Cawley wrote:
> Leopold Toetsch <[EMAIL PROTECTED]> writes:
>
>> Aaron Sherman <[EMAIL PROTECTED]> wrote:
>>> On Fri, 2004-04-16 at 18:18, Leopold Toetsch wrote:
>>
>>> Sorry, I gave the wrong impression. I meant it looks suspiciously like
>>> Python is doing a lazy construction on those objects, not that there is
>>> anything wrong with the benchmark.
>>
>> No, I don't think that this is happening. Parrot's slightly slower
>> object instantiation is due to register preserving mainly. The "__init"
>> code is run from inside the "new PObj, IClass" opcode. As it's not known
>> that a method call is happening here, we can't use register preserving
>> operations that only save needed registers--we have to save all
>> registers. These two memcpys are the most heavy part of the operation.
>
> Maybe we should rethink that then and make allocation and
> initialization two different phases.

That's the way I'm leaning. I know it's a *bad* idea from a 
high-level language point of view, but from the lower levels it's 
less of a bad idea.

New, then, would allocate the object and you'd need to then call its 
constructor, with the constructor call using full-on parrot calling 
conventions and giving the calling code a chance to save the 
registers it was interested in. Of course, then we get into the issue 
of handling return values from multiple calls into methods as we 
automatically redispatch the constructor, but...
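
A minimal sketch of that two-phase scheme, written in Python rather than PASM and purely illustrative. ParrotClass, allocate and construct are hypothetical names, not Parrot APIs, and running the parent constructors root-first is an assumption. Returning one result per constructor is only one possible answer to the return-value question raised above.

```python
class ParrotClass:
    """Hypothetical stand-in for a class; not Parrot's real API."""

    def __init__(self, name, parent=None, constructor=None):
        self.name = name
        self.parent = parent
        self.constructor = constructor  # callable taking the object, or None

    def chain(self):
        """Return the classes from the root ancestor down to this class."""
        classes = []
        cls = self
        while cls is not None:
            classes.append(cls)
            cls = cls.parent
        return list(reversed(classes))


class Obj:
    """A bare object: a class pointer plus attribute storage."""

    def __init__(self, cls):
        self.cls = cls
        self.attrs = {}


def allocate(cls):
    """Phase 1: allocate only; no user code runs here."""
    return Obj(cls)


def construct(obj):
    """Phase 2: an ordinary call that redispatches to every constructor.

    Because this is a normal call, the caller can decide beforehand
    which of its own registers are worth saving.
    """
    results = []
    for cls in obj.cls.chain():
        if cls.constructor is not None:
            results.append(cls.constructor(obj))
    return results


base = ParrotClass("Base", constructor=lambda o: o.attrs.setdefault("r", 0))
derived = ParrotClass("Derived", parent=base,
                      constructor=lambda o: o.attrs.setdefault("s", "x"))

obj = allocate(derived)          # nothing initialized yet
attrs_before = dict(obj.attrs)
results = construct(obj)         # one return value per constructor
```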
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: OO benches

2004-04-17 Thread Leopold Toetsch
Piers Cawley wrote:

> Leopold Toetsch <[EMAIL PROTECTED]> writes:
>> These two memcpys are the most heavy part of the operation.
>
> Maybe we should rethink that then and make allocation and
> initialization two different phases. Or dictate that
>
>    new PObj, IClass
>
> should be treated as if it were a function call with all the caller
> saves implications that go with it.

Well, it's not only object creation. While this is a bit special and
could have a special syntax, the problem is with all delegate usage, e.g.
for tying.

If we need some extra speed for object creation, we could define it as

  new PObj, IClass, "BUILD"  # call sub in BUILD prop
  new PObj, IClass, "CONSTRUCT"  # call sub in CONSTRUCT prop
  new PObj, IClass   # no init call at all

and just save needed registers, as we know that a Sub is called (or not).

But as said, it doesn't help here:

$ time perl ff.pl
010
real    0m3.287s
$ time parrot -j  ff.pasm
010
real    0m2.334s

leo :)


ff.pl
Description: Perl program
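    newclass P1, "FF"           # create class FF
    addattribute P1, 'r'        # with a single attribute, r
    find_type I12, "FF"
    new P2, I12                 # instantiate; __init runs inside this op

    set I10, 0
    set I11, 50
loop:
    set I15, P2                 # integer fetch is delegated to __get_integer
    inc I10
    lt I10, I11, loop

    set I15, P2
    print I15
    set I15, P2
    print I15
    set I15, P2
    print I15
    print "\n"
    end

.namespace ["FF"]
.pcc_sub __init:
    classoffset I0, P2, 'FF'
    new P3, .PerlInt
    setattribute P2, I0, P3     # store a fresh PerlInt in attribute r
    invoke P1                   # return through the continuation in P1
.pcc_sub __get_integer:
    classoffset I0, P2, 'FF'
    getattribute P3, P2, I0
    new P4, .PerlInt
    band P4, P3, 1              # P4 = r & 1
    inc P3                      # r++
    if P4, r1
    set I5, 0                   # r was even: return 0
    invoke P1
r1: set I5, 1                   # r was odd: return 1
    invoke P1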



Re: OO benches

2004-04-17 Thread Piers Cawley
Leopold Toetsch <[EMAIL PROTECTED]> writes:

> Aaron Sherman <[EMAIL PROTECTED]> wrote:
>> On Fri, 2004-04-16 at 18:18, Leopold Toetsch wrote:
>
>> Sorry, I gave the wrong impression. I meant it looks suspiciously like
>> Python is doing a lazy construction on those objects, not that there is
>> anything wrong with the benchmark.
>
> No, I don't think that this is happening. Parrot's slightly slower
> object instantiation is due to register preserving mainly. The "__init"
> code is run from inside the "new PObj, IClass" opcode. As it's not known
> that a method call is happening here, we can't use register preserving
> operations that only save needed registers--we have to save all
> registers. These two memcpys are the most heavy part of the operation.

Maybe we should rethink that then and make allocation and
initialization two different phases. Or dictate that 

   new PObj, IClass

should be treated as if it were a function call with all the caller
saves implications that go with it. 




Re: OO benches

2004-04-17 Thread Leopold Toetsch
Jeff Clites <[EMAIL PROTECTED]> wrote:

> BTW, I'm failing a bunch of tests now (Mac OS X); not sure if it's
> related:

Fixed. It was caused by the faster PMC creation code I've put in earlier
in the week, if ARENA_DOD_FLAGS is off (e.g. due to missing memalign).

Thanks for reporting,
leo


Re: OO benches

2004-04-17 Thread Leopold Toetsch
Jeff Clites <[EMAIL PROTECTED]> wrote:

> BTW, I'm failing a bunch of tests now (Mac OS X); not sure if it's
> related:

> t/op/gc.NOK 11# Failed test (t/op/gc.t at line 219)
> #  got: 'get_pmc_keyed_str() not implemented in class
> 'RetContinuation''

Have that now too - recompiled with ARENA_DOD_FLAGS turned off. The
property hash got freed during DOD. I'm still searching why.

leo


Re: OO benches

2004-04-17 Thread Jeff Clites
On Apr 16, 2004, at 11:19 PM, Jeff Clites wrote:

> BTW, I'm failing a bunch of tests now (Mac OS X); not sure if it's
> related:
>
> Failed Test          Stat Wstat Total Fail  Failed  List of Failed
> ------------------------------------------------------------------
> t/op/gc.t               1   256    13    1   7.69%  11
> t/pmc/dumper.t         13  3328    13   13 100.00%  1-13
> t/pmc/object-meths.t    1   256    19    1   5.26%  9
> t/pmc/objects.t         7  1792    37    7  18.92%  23-26 28 35-36

And of those, only these 2 fail if run without --gc-debug, _or_ if
configured with --optimize (seems like an odd correlation):

Failed Test          Stat Wstat Total Fail  Failed  List of Failed
------------------------------------------------------------------
t/op/gc.t               1   256    13    1   7.69%  11
t/pmc/dumper.t          1   256    13    1   7.69%  12

JEff



Re: OO benches

2004-04-16 Thread Leopold Toetsch
Jeff Clites <[EMAIL PROTECTED]> wrote:

> On Apr 16, 2004, at 9:29 AM, Leopold Toetsch wrote:

>> $ ./bench -b=^oo[234f]

> Looks cool!

Yep.

> BTW, I'm failing a bunch of tests now (Mac OS X); not sure if it's
> related:

Strange. valgrind doesn't indicate any problem with these tests.

> I'll poke a bit and see if I can figure out what's going on.

Yes please.

>> - constant strings e.g. "BUILD" get a precomputed hash value from
>> c2str.pl

> This isn't checked in yet, right? (Didn't see c2str.pl anywhere.)

It was attached to yesterday's message "Constant strings - again". But
I'll resend it with my recent changes WRT hashvalue precalculation.

>> - use of _S("BUILD") and _S("CONSTRUCT") in objects.c

> Mac OS X doesn't like the _S()--it seems it may already be defined to
> something. How about something clearer (and less likely to conflict)
> instead, like STRING_LITERAL()?

We can undef it before using. STRING_LITERAL is more typing and doesn't
assure uniqueness - so rather not. Maybe PSC() - Parrot String Constant.

> JEff

leo


Re: OO benches

2004-04-16 Thread Leopold Toetsch
Aaron Sherman <[EMAIL PROTECTED]> wrote:
> On Fri, 2004-04-16 at 18:18, Leopold Toetsch wrote:

> Sorry, I gave the wrong impression. I meant it looks suspiciously like
> Python is doing a lazy construction on those objects, not that there is
> anything wrong with the benchmark.

No, I don't think that this is happening. Parrot's slightly slower
object instantiation is due to register preserving mainly. The "__init"
code is run from inside the "new PObj, IClass" opcode. As it's not known
that a method call is happening here, we can't use register preserving
operations that only save needed registers--we have to save all
registers. These two memcpys are the most heavy part of the operation.
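
The difference between the two save strategies can be sketched as follows. This is an illustrative Python model, not Parrot code. It assumes four register banks (I, N, S, P) of 32 registers each, and save_all, save_needed and restore_needed are hypothetical helpers.

```python
BANKS = ("I", "N", "S", "P")
REGS_PER_BANK = 32  # assumption: 32 registers per bank


def new_register_file():
    """Create an empty register file: one list per bank."""
    return {bank: [None] * REGS_PER_BANK for bank in BANKS}


def save_all(regs):
    """Callee unknown: copy every register in every bank."""
    return {bank: list(values) for bank, values in regs.items()}


def save_needed(regs, live):
    """Call known in advance: copy only the registers named as live."""
    return {(bank, idx): regs[bank][idx] for bank, idx in live}


def restore_needed(regs, saved):
    """Write the saved values back into their registers."""
    for (bank, idx), value in saved.items():
        regs[bank][idx] = value


regs = new_register_file()
regs["I"][10] = 7
regs["P"][2] = "object"

full = save_all(regs)
partial = save_needed(regs, [("I", 10), ("P", 2)])

# Number of register slots copied by each strategy.
copied_full = sum(len(values) for values in full.values())
copied_partial = len(partial)

# Simulate a callee clobbering a live register, then restore it.
regs["I"][10] = 99
restore_needed(regs, partial)
```

In this model the full save copies 128 slots where the partial save copies 2, which is the kind of gap the paragraph above attributes to the memcpys.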

> Lazy construction is perhaps something Parrot should think about too,

I can't imagine that lazy construction could be of any value. You have
to construct it finally. Sum up the two parts.
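
A small Python sketch of that argument, with hypothetical EagerObject and LazyObject classes that are not taken from Parrot or from the benchmark sources. Deferring construction saves nothing when every object ends up being used; it only helps for objects that are never touched.

```python
class EagerObject:
    """Fully constructed at creation time."""

    def __init__(self, stats):
        stats["constructed"] += 1
        self._r = 0

    @property
    def r(self):
        return self._r


class LazyObject:
    """Created as a stub; the real construction happens on first use."""

    def __init__(self, stats):
        self._stats = stats
        self._built = False

    def _build(self):
        if not self._built:
            self._stats["constructed"] += 1
            self._r = 0
            self._built = True

    @property
    def r(self):
        self._build()
        return self._r


def run(factory, count, use_objects):
    """Create `count` objects and return how many were fully constructed."""
    stats = {"constructed": 0}
    objects = [factory(stats) for _ in range(count)]
    if use_objects:
        for obj in objects:
            obj.r  # touching the attribute forces construction
    return stats["constructed"]


eager_used = run(EagerObject, 1000, True)
lazy_used = run(LazyObject, 1000, True)
lazy_unused = run(LazyObject, 1000, False)
```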

And 90% (or ~100% with gcc 3.3.3 on a Pentium) of Python's performance
isn't that bad, all the more since Python AFAIK is constructing a kind of
hash while we have a full-fledged object.

leo


Re: OO benches

2004-04-16 Thread Jeff Clites
On Apr 16, 2004, at 9:29 AM, Leopold Toetsch wrote:

> With all current optimizations[1] I now have these timings:
>
> $ ./bench -b=^oo[234f]
> Numbers are relative to the first one. (lower is better)
>         p-j-Oc  perl-th perl    python  ruby
> oo2     100%    182%    152%    90%     132%
> oo3     100%    276%    256%    333%    383%
> oo4     100%    137%    128%    171%    292%
> oofib   100%    303%    261%    157%    161%

Looks cool!

BTW, I'm failing a bunch of tests now (Mac OS X); not sure if it's
related:

Failed Test          Stat Wstat Total Fail  Failed  List of Failed
------------------------------------------------------------------
t/op/gc.t               1   256    13    1   7.69%  11
t/pmc/dumper.t         13  3328    13   13 100.00%  1-13
t/pmc/object-meths.t    1   256    19    1   5.26%  9
t/pmc/objects.t         7  1792    37    7  18.92%  23-26 28 35-36

The gc test is failing with:

t/op/gc.NOK 11# Failed test (t/op/gc.t at line 219)
#  got: 'get_pmc_keyed_str() not implemented in class  
'RetContinuation''
# expected: 'hello
# hello
# '
# '(cd . && ./parrot -b --gc-debug /tmp/gc_11.pasm)' failed with exit  
code 2

and all of the dumper ones look like double-frees:

t/pmc/dumperNOK 7# Failed test (t/pmc/dumper.t at line  
359)
#  got: '*** malloc[9416]: Deallocation of a pointer not  
malloced: 0x200ee30; This could be a double free(), or free() called  
with the middle of an allocated block

I'll poke a bit and see if I can figure out what's going on.

> - constant strings e.g. "BUILD" get a precomputed hash value from
> c2str.pl

This isn't checked in yet, right? (Didn't see c2str.pl anywhere.)

> - use of _S("BUILD") and _S("CONSTRUCT") in objects.c

Mac OS X doesn't like the _S()--it seems it may already be defined to
something. How about something clearer (and less likely to conflict)
instead, like STRING_LITERAL()?

JEff



Re: OO benches

2004-04-16 Thread Aaron Sherman
On Fri, 2004-04-16 at 18:18, Leopold Toetsch wrote:
> Aaron Sherman <[EMAIL PROTECTED]> wrote:
> 
> > That looks suspicious... especially Python.
> 
> You have the sources in examples/benchmarks. Maybe we are comparing
> apples and oranges. But the code looks good to me.

Sorry, I gave the wrong impression. I meant it looks suspiciously like
Python is doing a lazy construction on those objects, not that there is
anything wrong with the benchmark.

Lazy construction is perhaps something Parrot should think about too,
though I've not looked into what Parrot does now. How often would it be
of any value to construct a stub object which still needed to be fully
constructed before use? Do such objects pop into existence implicitly in
any commonly-used places that would yield a performance win in the
general case?

Just wondering why Python would do that (if, indeed it is doing that).

-- 
Aaron Sherman <[EMAIL PROTECTED]>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback




Re: OO benches

2004-04-16 Thread Leopold Toetsch
Aaron Sherman <[EMAIL PROTECTED]> wrote:

> That looks suspicious... especially Python.

You have the sources in examples/benchmarks. Maybe we are comparing
apples and oranges. But the code looks good to me.

> I would suggest using iterations that go much longer so that you can
> detect over-optimizations and such more easily.

More benchmarks welcome.

> Very nice!

leo


Re: OO benches

2004-04-16 Thread Aaron Sherman
On Fri, 2004-04-16 at 12:29, Leopold Toetsch wrote:
> With all current optimizations[1] I now have these timings:
> 
> $ ./bench -b=^oo[234f]
> Numbers are relative to the first one. (lower is better)
>         p-j-Oc  perl-th perl    python  ruby
> oo2     100%    182%    152%    90%     132%
> oo3     100%    276%    256%    333%    383%

That looks suspicious... especially Python. It smells like there's some lazy
evaluation going on here, and that object doesn't get fully instantiated
until oo3. I suspect, in that light, that the numbers aren't quite as
bad for Parrot as they look in oo2, nor as good for Parrot as they look
in oo3 (well, maybe as good as they look, but not as bad... I have to
think about that).

> $ time CALL__BUILD=1  parrot -j oo2b.pasm
> real    0m2.566s
> vs  0m2.630s  for oo2.pasm
> (w.o any of these optimizations oo2b takes 3.9s)

I would suggest using iterations that go much longer so that you can
detect over-optimizations and such more easily.

Very nice!

-- 
Aaron Sherman <[EMAIL PROTECTED]>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback




OO benches

2004-04-16 Thread Leopold Toetsch
With all current optimizations[1] I now have these timings:

$ ./bench -b=^oo[234f]
Numbers are relative to the first one. (lower is better)
        p-j-Oc  perl-th perl    python  ruby
oo2     100%    182%    152%    90%     132%
oo3     100%    276%    256%    333%    383%
oo4     100%    137%    128%    171%    292%
oofib   100%    303%    261%    157%    161%

And

$ time CALL__BUILD=1  parrot -j oo2b.pasm
real    0m2.566s
vs      0m2.630s  for oo2.pasm
(w.o any of these optimizations oo2b takes 3.9s)

oo2 is basically object instantiation, oo2b calls the method in the
BUILD property, oo2 calls __init directly.
oo3 is attribute get
oo4 is attribute set (where Parrot creates new PMCs, which isn't really
needed :)
oofib tests mostly function/method call speed
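
For reference, "relative to the first one" can be computed as in the short Python sketch below. The helper relative_percent and the dictionary labels are hypothetical and this is not the bench script itself. The three inputs are the oo2b/oo2 wall-clock times quoted above.

```python
def relative_percent(timings, baseline):
    """Express each timing as a rounded percentage of the baseline entry."""
    base = timings[baseline]
    return {name: round(100 * seconds / base)
            for name, seconds in timings.items()}


# Wall-clock seconds quoted in this message.
timings = {
    "oo2b (CALL__BUILD=1)": 2.566,
    "oo2": 2.630,
    "oo2b (no optimizations)": 3.9,
}

relative = relative_percent(timings, "oo2b (CALL__BUILD=1)")
for name, percent in relative.items():
    print(f"{name:<26}{percent}%")
```

Read this way, oo2 takes about 102% of the oo2b time, and the unoptimized oo2b about 152%.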

[1]
- set_string_native references the string
- constant strings e.g. "BUILD" get a precomputed hash value from c2str.pl
- use of _S("BUILD") and _S("CONSTRUCT") in objects.c
Athlon 800
Parrot -O3 (gcc 2.92.2)
perl-th is threaded 5.8
perl is 5.8 with long double support
python 2.3.3
ruby 1.8.0
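
The precomputed-hash item in [1] can be pictured with the Python sketch below. It is an illustration only: string_hash is a generic multiplicative hash and not Parrot's real hash function, and ConstString, S and PropertyHash are hypothetical names standing in for what c2str.pl and _S() provide. The point is that the hash of "BUILD" is computed once, when the constant table is built, and not on every property lookup.

```python
def string_hash(text):
    """Illustrative 32-bit multiplicative hash (not Parrot's actual one)."""
    h = 0
    for byte in text.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


class ConstString:
    """A constant string carrying a hash value computed ahead of time."""

    __slots__ = ("text", "hashval")

    def __init__(self, text):
        self.text = text
        self.hashval = string_hash(text)  # done once, at table build time


# Built once, playing the role of the generated constant-string table.
CONST_STRINGS = {name: ConstString(name) for name in ("BUILD", "CONSTRUCT")}


def S(name):
    """Look up a constant string; no hashing of its contents happens here."""
    return CONST_STRINGS[name]


class PropertyHash:
    """Tiny chained hash table that reuses the key's stored hash value."""

    def __init__(self, size=16):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[key.hashval % len(self.buckets)]

    def set(self, key, value):
        bucket = self._bucket(key)
        for entry in bucket:
            if entry[0].text == key.text:
                entry[1] = value
                return
        bucket.append([key, value])

    def get(self, key):
        for stored_key, value in self._bucket(key):
            if stored_key.text == key.text:
                return value
        return None


props = PropertyHash()
props.set(S("BUILD"), "init_sub")
found = props.get(S("BUILD"))
missing = props.get(S("CONSTRUCT"))
```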