Darwin Lives!

2001-11-26 Thread Dan Sugalski

Granted, I need to have fink installed to provide dynaloading,
but with it I get a reasonably clean build, link, and test. Woohoo!

I'll check in the darwin hints file in a bit.

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




mem manager direction

2001-11-26 Thread Michael Maraist

I've done a bunch of reading, and though I'm not finished, I'm starting to
look towards the following overall algorithm based on the below specified
assumptions.  I'm not _necessarily_ looking for comments at this point,
because I haven't finished evaluating the specifics of several algorithms,
but feel free to comment.

Foreword:
I'm looking at a hybrid of several algorithms, since different regions
have different advantages / disadvantages.  The typical downside of
hybridizing is extra overhead in the inner loop.  Here, the cost is a
greater number of branches in the inner loop.  You can skip to the
posted algorithm pseudo-code for a quick overview.


Since we're inlining the data-size, I'm selecting which sub-algorithm to
use based on the size.  The data-type (be it a raw character buffer
like a string, or a nested data structure such as an Array) will have
to be known and thus interleaved within the GC for now.  What I'd like to do is
define vtable-ish ops for each data-type, and have the perl-make-script
generate the core routine by concatenating the different types.
I don't want the overhead of a vtable function call, but I don't want
to expose the garbage collector to extension writers.  More to come
on this.

Small data-types ( size <= 256 Bytes):
Ideally, there's zero header-space due to the relative space-cost, and
since we're already using the size this should be possible.  For now, I'm
looking towards a generic copying collector.  Generational schemes can
require additional header-space (either for marking or
boundary-references).  The only advantage of making this generational would
be to minimize the scan-depth (most generational algorithms don't descend
through data-structures living in aged heaps), but this requires a
write-barrier, and I'm currently opposed to that (it defeats much of the
advantage of abstracted garbage collection over ref-counting).  I'm still
considering this for terminal (non descending) data-types.  I figure small
data-types will be the most common by far, and so the less work done per
object the better.  Further, since each object will probably be visited on
each reclamation stage, we don't get to take advantage of
anti-swap-thrashing techniques without imposing relatively high
out-of-band overhead.  The associated heap is grown (doubled in size)
whenever less than 33% of the heap is available after a reclamation.
Unfortunately we don't know that we want to grow the heap until we've
finished reclaiming, and a regrowth involves another copying phase, so
it's deferred until the next collection.  I found that this isn't so bad
if we keep the max data-size much less than 33% of the heap.  I'm looking
at 64K as a starting heap-size, thus 256B is much less than 21K.

Medium data-types ( 256 < size <= page-size ):
Here we can afford header-prefixes, while copies are more expensive
(relative to other operations).  Allocations in this region can be pretty
erratic and thus hard to coalesce, so I'm looking at using a generational
scheme.  Unfortunately many such schemes "age" the wrong data, or require
a lot of extra copying.  Some of the nicer schemes only require comparison
to a water-mark (below which an object is considered aged, and above which
the object is considered young and thus likely to expire soon). Such
watermark schemes allow adaptive modification of the water-mark(s).  In
other words, when we find that most data is not dying, we want to raise
the water-mark (to avoid unneeded copying).  When we find that a lot of
aged data is dying, we want to lower the water-mark (to alleviate garbage
in the aged region).  I'm currently fiddling with a compacting collector
here.  This requires two stages.  The first stage applies a "version" tag
prefixed to each object (a 4-byte integer, where possible).  On each
execution of gc_reclaim, the version number is incremented.  In the
Dead-Object-Detection stage (which, in comparison, is also where small
objects are copied out to their semi-space), versions of all live objects
are updated.  Additionally, the allocated size of both the young and old
regions are calculated.  Unlike the copying-collector, we have the option
of growing the mid-sized heap before performing the copying/compaction;
providing more up-to-date and thus accurate feed-back.  If we don't copy,
then we only compact the "young" region.  Thus the advantage of aging
objects is to avoid copying them (at the expense of leaving fragments).
Alternatively, if we determine that there is too much free space in the
aged section, we can temporarily reset the water-mark and compact that
region too.  The big advantage of compaction is that the age is directly
related to how far up the heap we are; objects near the top are guaranteed
to be younger than those at lower levels.  The water-mark is therefore an
accurate representation of age.  Since this heap is efficiently growable,
I'm looking to start it at 16K (or some fraction of the semi-space), since
each thread requires its own local 

RE: Benchmarking the proposed RE engine

2001-11-26 Thread Dan Sugalski

At 11:10 AM 11/26/2001 -0800, Brent Dax wrote:
>Dan Sugalski;
># What I really want to know is whether using ops to do regexes
># is feasible.
># After that we'll work on programmatically generating the op stream.
>
>There are definitely some weaknesses to it (for example, it's hard to do
>operations on 'optimized' forms of things, like using bitmaps on
>strings) but it's certainly feasible.  (And that particular example can
>be sped up by caching in a hash once hashes are available.)

Then let's have some numbers! :) If all the general-purpose stuff's not good 
enough for RE performance, now is the time to find out. If it's not, we can 
do whatever we need to, short of single-threading regexes, to make them faster.

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




RE: Benchmarking the proposed RE engine

2001-11-26 Thread Brent Dax

Dan Sugalski;
# What I really want to know is whether using ops to do regexes
# is feasible.
# After that we'll work on programmatically generating the op stream.

There are definitely some weaknesses to it (for example, it's hard to do
operations on 'optimized' forms of things, like using bitmaps on
strings) but it's certainly feasible.  (And that particular example can
be sped up by caching in a hash once hashes are available.)

--Brent Dax
[EMAIL PROTECTED]
Configure pumpking for Perl 6

"Nothing important happened today."
--George III of England's diary entry for 4-Jul-1776




Re: We have PMCs. Time to start work.

2001-11-26 Thread Dan Sugalski

At 01:46 PM 11/23/2001 -0500, brian wheeler wrote:
>On Fri, 2001-11-23 at 13:41, Simon Cozens wrote:
> > On Fri, Nov 23, 2001 at 06:04:29PM +, Simon Cozens wrote:
> > > * Rewrite mops.pasm to use integer PMCs, and compare the speeds.
> >
> > I couldn't wait. :)
> >
> >  % ../../test_prog mops.pbc
> > Iterations:1
> > Estimated ops: 2
> > Elapsed time:  9.948440
> > M op/s:20.103654
> >
> >  % ../../test_prog mops_p.pbc
> > Iterations:1
> > Estimated ops: 2
> > done
> > Elapsed time:  20.994231
> > M op/s:9.526427
> >
>
>I don't get it.  What kind of machine was that on?  Here are my numbers:
>
>
>[bdwheele@thor parrot]$ ./test_prog mops.pbc
>Iterations:1
>Estimated ops: 2
>Elapsed time:  11.016154
>M op/s:18.155156
>[bdwheele@thor parrot]$ ./test_prog mops_p.pbc
>Iterations:1
>Estimated ops: 2
>done
>Elapsed time:  14.288639
>M op/s:13.997134
>
>My I-regs one is slower than yours, and my P-regs one is faster...this
>is on a P4/1.7G under Linux 2.4.15pre8

Welcome to the joys of architecture specific wackiness. :)

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Did I miss this?!?!

2001-11-26 Thread Dan Sugalski

At 02:54 PM 11/22/2001 -0800, Wizard wrote:
>I was wandering around looking for some non-parrot related stuff, and came
>across this wonderful tidbit. Was this mentioned somewhere in the mail list
>or on perl.com and I missed it?
>http://www.unixreview.com/documents/s=1780/urm0111h/0111h.htm
>
>If it wasn't posted to the prl6-* list, it should be.

I didn't realize that'd hit the web. I'll forward it on appropriately.

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




RE: PMC Classes Preprocessor

2001-11-26 Thread Angel Faus


Hi,

Currently pmc2c.pl requires that the user write methods in .pmc files in
exactly the same order as in vtable.tbl.  That's not a nice thing to do.

The version I am attaching hopefully fixes this, and creates a dummy version
of the method if the user has not provided one.

This allows us to add new methods in vtable.tbl, without worrying about old
.pmc files.

btw, sorry for that silly Text::Balanced dependency. I stupidly assumed it
was a standard module without checking.

Sincerely,

---
Angel Faus
[EMAIL PROTECTED]



pmc2c.pl
Description: Binary data


RE: Benchmarking the proposed RE engine

2001-11-26 Thread Dan Sugalski

At 07:34 PM 11/25/2001 -0800, Brent Dax wrote:
>Dan Sugalski:
># I realize that benchmarking the RE engine's a pain, what with
># no GC so we
># leak until we blow memory and die, but...
>#
># I'd like to take a series of regexes that exercise various bits of the
># perl 5 engine and time them against the equivalent perl 6 RE
># code. I think
># it's important at this point to see where we stand relative to the
># benchmark standard, to see whether this is even worth doing
># from a speed
># standpoint. (I really, really hope so, but...)
>#
># When we've a reasonably comparable engine then I think it'll
># be time to
># write the text->opcode compiler so we can start turning
># scalar variables
># and suchlike things into regexes.
>
>There's a big problem with that: Perl 5's regex compiler, pregcomp.
>
>Perl 5's REs will always appear faster because Perl 5 has an
>intelligent, optimizing regex compiler.

I'm reasonably certain you can beat perl 5's regex compiler if you try. :)

Seriously, the benchmarks I want first are for a hand-rolled version of the 
regex in the proposed parrot re ops. Feel free to optimize as you see fit 
for this as long as the optimizations are something a reasonable optimizer 
would make.

What I really want to know is whether using ops to do regexes is feasible. 
After that we'll work on programmatically generating the op stream.


Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: [Proposal] Regex documentation

2001-11-26 Thread David M. Lloyd

On Sat, 24 Nov 2001, Bryan C. Warnock wrote:

> On a side note, do we want to be positive-oriented or
> negative-oriented? (Do we want to succeed as fast as possible, or fail
> as fast as possible?  I'm thinking fail, since you're probably more
> likely to do that.)  If you weight each kind of regex op by cost, and
> then run them in cost order (where possible), you may get better overall
> performance, even with the overhead.

Maybe let the user decide, if that's possible... the user will know
whether the regex is likely to fail or succeed.  Make the default fail.

- D

<[EMAIL PROTECTED]>




Re: Benchmarking the proposed RE engine

2001-11-26 Thread Bart Lateur

On Sun, 25 Nov 2001 19:34:15 -0800, Brent Dax wrote:

>Perl 5's REs will always appear faster because Perl 5 has an
>intelligent, optimizing regex compiler.  For example, take the following
>simple regex:
>
>   /a+bc+/
>
>pregcomp will optimize that by searching for a 'b' and working outwards
>both ways from there.  (Actually, it might search for 'abc' and work
>from there; I'm not really sure.)  Without considering pregcomp's
>optimizations, that RE is pretty easy to write in Parrot:
>
>RE_0:
>   #/a+bc+/
>   rx_minlength 3
>   branch $start
>$advance:
>   rx_advance $fail
>$start:
>   rx_literal P0, "a", $advance
>$a_loop:
>   rx_literal P0, "a", $b
>   rx_pushindex P0
>   branch $a_loop
>$a_back:
>   rx_popindex P0, $advance
>$b:
>   rx_literal P0, "bc", $a_back
>$c_loop:
>   rx_literal P0, "c", $succeed
>   branch $c_loop
>$succeed:
>   rx_succeed P0
>$fail:
>   rx_fail P0

Before we go down that road for good, may I draw your attention to the
principle of a regex matcher that appeared in an article in DDJ a few
years ago? The name was "Grouse Grep"; it has a website, but I think
(I'm not absolutely sure) that the latest version has stepped away from
the principle that made the original interesting.

That principle was, in a nutshell: implement each character class as a
jump table.

I guess that is infeasible for Unicode, but for 8-bit characters it's
easy to do.

What you need is a table of 256 entries for each state, and using the
next character in the input stream, you simply look up the address of
the next state in the lookup table. Yes, that means that you *always*
use character classes, even if just to match one literal character.

The original used i386 machine code to do it (I think it was a
combination of LODSB to fetch the byte, and XLAT to look up the lowest
byte of the address in the jump table), but I would think that jumping
through a lookup table should be fairly easy to implement on top of
Parrot VM instructions.

The DDJ article appeared in November 1997, but it looks like it's not
online.

(Grouse Grep 2 appears to be released under the GNU license, but I
wouldn't *use* the code, only re-implement the idea.)

-- 
Bart.



Re: PMC Classes Preprocessor

2001-11-26 Thread Simon Cozens

On Sun, Nov 25, 2001 at 04:39:40PM +0100, Bart Lateur wrote:
> I think that this may come close:
 
I think it does. Thanks very much, applied.

-- 
"You can have my Unix system when you pry it from my cold, dead fingers."