RE: [ID 20020130.001] Unicode broken for 0x10FFFF

2002-01-30 Thread Brent Dax

Larry Wall:
# For various reasons, some of which relate to the sequence-of-integer
# abstraction, and some of which relate to "infinite" strings and arrays,
# I think Perl 6 strings are likely to be represented by a list of
# chunks, where each chunk is a sequence of integers of the same size or
# representation, but different chunks can have different integer sizes
# or representations.  The abstract string interface must hide this from
# any module that wishes to work at the abstract string level.  In
# particular, it must hide this from the regex engine, which works on
# pure sequences in the abstract.
#
# Note that I did not use the phrase "pure sequences of integers" in the
# last sentence.  The regex engine must not care if it is matching
# characters from a string of known length, or token objects from an
# array that is being grown arbitrarily on demand.  Matching on UTF-32
# is not good enough.
#
# This is just a heads up for some of the stuff in Apocalypse 5.
# Backtracking behavior will not necessarily be limited to regexes in
# Perl 6, and if so, we have to consider very carefully how regex
# backtracking, continuations, and temp variable unifications all work
# together.  (This is part of the reason I pushed earlier for the regex
# opcodes to be meshed with the normal opcodes.)
#
# I seriously intend that it be trivial to write a Perl parser (or any
# other parser) in Perl, and that changing a grammar rule be as simple as
# swapping in a different qr// (or a sub equivalent to a qr//).  More
# generally, I want logic programming to be one of the paradigms that
# Perl supports.  And as usual, I want to support it without forcing it
# on people who aren't interested.

As the regex guy for Parrot, my first response to this sounded something
like "oh, crap".  This'll be hard to make efficient, hard to implement
for all cases, and all that.  But as I thought about it more, I realized
that there's a fairly easy way to do this.

The first thing is to make sure that, at the Parrot level, "$left =~
$right" calls $right->vtable->match, not $left.  The second thing is to
make sure that =~ on characters (or character streams) is the same as
"eq"--character-set-independent comparison.

Once that's done, it's quite easy.

A regex becomes a series of =~ operations.  For example, let's say @toke
contains a series of tokens:

@toke=(... new Perl6::Toke::Term(), new Perl6::Toke::Operator::Plus(),
new Perl6::Toke::Term() ...);

Now, assume that \t{Foo} in a regex is like $curitem =~
Perl6::Toke::Foo.  (I assume Larry will come up with a more general
mechanism, but you get the idea.)

Finally, assume =~ on classes is an ISA search.

Now, to find the first addition operation in the given token stream, you
just do something like this:

@toke =~ m<\t{Value}\t{Operator::Plus}\t{Value}>;
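As a rough illustration of the idea so far, here is a minimal C sketch of matching a fixed sequence of token classes against a token array, where each step is an ISA search. The names (TokeClass, toke_isa, toke_find) are invented for this sketch and are not Parrot API, and it assumes single inheritance and that Term inherits from Value.

```c
#include <stddef.h>

/* Hypothetical token class with a single parent, enough for an ISA search. */
typedef struct TokeClass {
    const char             *name;
    const struct TokeClass *parent;  /* NULL at the root */
} TokeClass;

/* The "=~ on classes" step: is cls the wanted class or one of its subclasses? */
int toke_isa(const TokeClass *cls, const TokeClass *wanted)
{
    for (; cls != NULL; cls = cls->parent)
        if (cls == wanted)
            return 1;
    return 0;
}

/* Find the first position in toks[0..ntoks) where the tokens match
 * pattern[0..npat) element by element.  Returns the index, or -1. */
long toke_find(const TokeClass *const *toks, size_t ntoks,
               const TokeClass *const *pattern, size_t npat)
{
    if (npat == 0 || npat > ntoks)
        return -1;
    for (size_t start = 0; start + npat <= ntoks; start++) {
        size_t i = 0;
        while (i < npat && toke_isa(toks[start + i], pattern[i]))
            i++;
        if (i == npat)
            return (long)start;
    }
    return -1;
}
```

With a token stream of Term, Minus, Term, Plus, Term and the pattern Value, Plus, Value, toke_find reports index 2, the first addition. Quantifiers and backtracking are left out on purpose; a real engine would supply those.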

To find the first unary plus operator:

@toke =~ m<(?;  #or something like that

To compress all value/addition-precedence-operator/value sequences into
value tokens:

@toke =~ s<
\t{Value}
[
\t{Operator::Plus}
\t{Operator::Minus}
\t{Operator::Underscore}
]
\t{Value}
><
new Perl6::Toke::Value($&)
>eg;

Now, check this one out:

$unop=qr[(?, qr<
\t{Value}
\t{Operator::StarStar}
\t{Value}
>r, qr<
$unop
[
\t{Operator::Exclamation}
\t{Operator::Tilde}
\t{Operator::Backslash}
\t{Operator::Plus}
\t{Operator::Minus}
]
\t{Value}
>, qr<
\t{Value}
[
\t{Operator::EqualsTilde}
\t{Operator::ExclamationTilde}
]
\t{Value}
>, qr<
\t{Value}
[
\t{Operator::Star}
\t{Operator::Slash}
\t{Operator::Percent}
\t{Operator::X}
]
\t{Value}
>, qr<
\t{Value}
[
\t{Operator::Plus}
\t{Operator::Minus}
\t{Operator::Underscore}
]
\t{Value}
>, ...
);

($top)=map { @toke =~ s/$_/new Perl6::Toke::Value($&)/e } @rules;

For those who can't see what that is (and I don't blame you if

Re: Jit on Solaris: using dis instead of objdump?

2002-01-30 Thread Jason Gloudon

On Wed, Jan 30, 2002 at 03:27:18PM -0500, Andy Dougherty wrote:
> On Solaris, it looks like JIT will now be enabled if the user has also
> installed GNU objdump.  However, there is (often) already a disassembler
> in /usr/ccs/bin/dis.  It's output is similar, but not identical to,
> objdump.  Is anyone with a Solaris system familiar enough with jit
> internals to have a go at adapting it to use dis instead of GNU objdump?

The difference was pretty minimal. It should work with 'dis'.

-- 
Jason



Re: parrot rx engine

2002-01-30 Thread Bryan C. Warnock

On Wednesday 30 January 2002 21:42, Dan Sugalski wrote:
> I think we may want trees as a fundamental data type at some point...

I wonder about the trees

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: parrot rx engine

2002-01-30 Thread Dan Sugalski

At 6:28 PM -0800 1/30/02, Steve Fink wrote:
>I'm sure in Apoc 5 Larry's going to go way beyond that and embed full
>parsers, not just regularish language matchers, but the above is
>easier to grasp.

Odds are, yes. And don't be surprised if the RE engine's required to
return data structures as well. (Nested parens return you a tree
struct, for example.)

I think we may want trees as a fundamental data type at some point...
-- 
 Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
   teddy bears get drunk



Re: parrot rx engine

2002-01-30 Thread Steve Fink

On Wed, Jan 30, 2002 at 08:37:30PM -0500, Bryan C. Warnock wrote:
> "But if you know they're going to be twenty times slower, why are you doing 
> it?"  Because we know / think / hope / pray / have been making sacrifices to 

Tangential note: current benchmarking indicates that we're doing a lot
better than this. Two times slower is the right ballpark, and in cases
where we can tell the re compiler that we only need restricted
information out of the match, something like 20%. But that's based on
a probably nonrepresentative example and a duck walked within four
miles of my computer while I ran the test, so my numbers are about as
meaningful as the points on Whose Line Is It Anyway.

Somebody's been watching too much TV.

And on another tangent (tangential to this thread, not the list),
here's some motivation for an op-based regex engine: array matching.

if (@ARGV =~ regex('-o', '(', '.', ')')) { # pardon the syntax
    $output = $1;
}

Maybe that would look better as regex/-o (.)/ ?

With an opcode-based engine, you get to reuse all the mechanics of
* + ? *? (?>) etc., and just add a new match_list_elt op. Which could
itself invoke a parrot subroutine so we're not restricted to element
equality matching as I implied in my above example.
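To make the match_list_elt idea a little more concrete, here is a minimal C sketch in which each pattern step calls a predicate on one list element, so a step is not limited to equality. Everything here (EltStep, elt_equals, elt_any, list_match) is a made-up name for illustration, not an actual Parrot op or API, and only fixed-length patterns are handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical element matcher: returns nonzero if elt is acceptable.
 * arg carries per-step data, e.g. the literal to compare against. */
typedef int (*elt_pred)(const char *elt, const void *arg);

typedef struct {
    elt_pred    pred;
    const void *arg;
    int         capture;  /* nonzero: remember the matched element, like (.) */
} EltStep;

int elt_equals(const char *elt, const void *arg)
{
    return strcmp(elt, (const char *)arg) == 0;
}

int elt_any(const char *elt, const void *arg)
{
    (void)elt;
    (void)arg;
    return 1;
}

/* Unanchored match of steps[] against list[].  Returns 1 on a match and
 * stores the first captured element (or NULL) in *captured; returns 0
 * if no position matches. */
int list_match(const char *const *list, size_t n,
               const EltStep *steps, size_t nsteps,
               const char **captured)
{
    for (size_t start = 0; start + nsteps <= n; start++) {
        const char *cap = NULL;
        size_t i = 0;
        for (; i < nsteps; i++) {
            const char *elt = list[start + i];
            if (!steps[i].pred(elt, steps[i].arg))
                break;
            if (steps[i].capture && cap == NULL)
                cap = elt;
        }
        if (i == nsteps) {
            if (captured != NULL)
                *captured = cap;
            return 1;
        }
    }
    return 0;
}
```

For the list "prog", "-v", "-o", "out.txt" and the two steps (elt_equals "-o", then elt_any with capture), list_match succeeds and captures "out.txt". The quantifiers mentioned above are exactly what this sketch lacks and what an op-based engine would contribute.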

I'm sure in Apoc 5 Larry's going to go way beyond that and embed full
parsers, not just regularish language matchers, but the above is
easier to grasp.



Re: parrot rx engine

2002-01-30 Thread Bryan C. Warnock

On Wednesday 30 January 2002 11:13, Ashley Winters wrote:
> First, we set the rx engine to case-insensitive. Why is that bad? It's
> setting a runtime property for what should be compile-time

{snip}

> Now, the current CVS rx engine is/would do this at runtime.

We're also currently a compiler short.  

> What I see is that rx_literal is a speed hack to avoid compiling this
> into parrot code:

{excellent example of a pure_parrot regex engine snipped}

Is something *wrong* with speed hacks? 

When we talk about wanting to make Parrot blazingly, blindingly fast, we're 
talking in relative terms.  The mechanics of interpretation - sans JIT - 
pretty much restrict you to racing in the 125cc class.  You may blow away 
every other bike in the class, but good luck going up against some 800cc 
monster.  That's why these virtual machines aren't very RISCish.  There's 
entirely too much stuff that has to be done that is unrelated to what you're 
actually trying to do - the more you can stuff into an op, the faster you 
will be.

Yes, that means that the fastest regex engine would probably be, yes, one op:
match.  The rationale for *not* doing that (yet) is a design choice - we 
want regex ops - in some form - to be first-class citizens of Parrot 
opcodes.  But in truth, if Parrot can be four times as fast as Perl 5 - 
currently, from an op dispatch perspective, it's a measly two; from a 
functionality perspective, it's much greater, but then again, we don't have 
all the functionality - would you be content in having your regexes run 
twenty times slower?

"But if you know they're going to be twenty times slower, why are you doing 
it?"  Because we know / think / hope / pray / have been making sacrifices to 
the gods that we can make up the speed in other ways.  A smarter regex 
engine.  Faster op dispatch.  Pure compilation.  The JIT.

Our target is to match the current speed.  If we can't do that, we'll more 
than likely reduce the number of Parrot ops.  If we blow away previous 
marks; well, then, we can expand.

>
> Am I fool, or an idiot? Discuss.

Overzealous, perhaps.  It'd be nice for Perl 7 to be written in Perl 7, but 
I don't think that's realistic.

>
> Mostly, I'd like to hear how either Unicode character-ranges aren't
> deterministic at compile-time (I doubt that) or how crippling to
> performance this would be (and by implication how slow parrot will be)
> in either time or space.

Literal character classes ([abc]) will most likely be compiled.  
Meta-character classes (\d) may be compiled.  Character ranges ([a-f]) may 
or may not be.  It's hard to say, because we seem to still not be sure what 
any of those mean.  (Particularly when locale comes into play.)  That's not 
a regex issue, it's a Unicode one.
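For the literal case, "compiled" could look something like the following C sketch: the class is turned into a bitmap once, and each later test is a shift and a mask. CharClass, cc_compile and cc_contains are hypothetical names, and the sketch assumes 8-bit characters only; it says nothing about how ranges or \d should behave under a locale or for full Unicode.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical compiled form of a literal class such as [abc]:
 * one bit per 8-bit character value. */
typedef struct {
    unsigned char bits[32];  /* 256 bits */
} CharClass;

/* "Compile time": build the bitmap once from the class members. */
void cc_compile(CharClass *cc, const char *members)
{
    memset(cc->bits, 0, sizeof cc->bits);
    for (; *members != '\0'; members++) {
        unsigned char ch = (unsigned char)*members;
        cc->bits[ch >> 3] |= (unsigned char)(1u << (ch & 7));
    }
}

/* "Run time": membership is a shift and a mask, with no locale lookup. */
int cc_contains(const CharClass *cc, unsigned char ch)
{
    return (cc->bits[ch >> 3] >> (ch & 7)) & 1;
}
```

After cc_compile(&cc, "abc"), cc_contains(&cc, 'a') is 1 and cc_contains(&cc, 'd') is 0.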

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: parrot rx engine

2002-01-30 Thread Bryan C. Warnock

On Wednesday 30 January 2002 12:32, Brent Dax wrote:
> # Mostly, I'd like to hear how either Unicode character-ranges aren't
> # deterministic at compile-time (I doubt that) or how crippling to
>
> One word: locale.

Not that locales couldn't provide pre-compiled character classes.

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: Interpreter startup environment

2002-01-30 Thread Melvin Smith

At 06:21 PM 1/30/2002 -0500, Dan Sugalski wrote:
>A quick recap and elaboration for folks following along at home.
>
>On interpreter startup. P0 will hold an Array with ARGV in it if there is 
>one, or NULL if not.
>
>P1 will hold a Hash with %ENV in it if there is one, or NULL if not
>
>P2 (this is the new bit) will hold an Array of three elements 
>corresponding to stdin, stdout, and stderr. If NULL the default's used, 
>and if an entry is undef that file's not available.

Sounds good. This jogged my memory: I'm using interp->piodata->table[0-2]
for the standard handles, but I've thought about caching those directly in the
interp struct to cut out 2 derefs in the most common cases. I'll do the mod if
you say the word.

-Melvin




Re: New PMC vtable methods

2002-01-30 Thread Dan Sugalski

At 11:40 PM + 1/30/02, Nicholas Clark wrote:
>On Wed, Jan 30, 2002 at 06:23:43PM -0500, Dan Sugalski wrote:
>>  We're adding the following:
>>
>> INTVAL get_character(PMC *, INTVAL)
>> INTVAL get_character(PMC *, KEY *, INTVAL)
>>
>>  to return the character at position INTVAL in the passed in PMC.
>
>are characters really INTVAL? I have this gut feeling that they ought to
>be UINTVAL. At least, that's my personal world view.

That lets us reserve the negative characters for error conditions. 31 
bits should be enough for a long time, even for Unicode.
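A small C sketch of that convention, assuming a 32-bit signed stand-in for INTVAL (the real type is chosen at configure time) and a made-up error code. The function name and signature are illustrative only and do not match the vtable entry above.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for INTVAL; Parrot's real INTVAL is configured at build time. */
typedef int32_t INTVAL_SKETCH;

#define CHAR_ERR_RANGE (-1)  /* hypothetical error code: index out of range */

/* Return the code point at index, or a negative error code.  Valid
 * characters are never negative, so the sign bit separates the two. */
INTVAL_SKETCH get_character_sketch(const uint32_t *chars, size_t len,
                                   INTVAL_SKETCH index)
{
    if (index < 0 || (size_t)index >= len)
        return CHAR_ERR_RANGE;
    return (INTVAL_SKETCH)chars[index];
}
```

The largest Unicode code point, 0x10FFFF, needs only 21 bits, so it fits in the 31 value bits with room to spare.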
-- 
 Dan




Re: [nick@unfortu.net: [PATCH] MANIFEST.SKIP]

2002-01-30 Thread Steve Fink

On Wed, Jan 30, 2002 at 11:20:28PM +, Nicholas Clark wrote:
> On Wed, Jan 30, 2002 at 02:55:54PM -0800, Steve Fink wrote:
> 
> Ah. This suggests that this bit of the proposed MANIFEST.SKIP:
> 
> ...
> 
> becomes
> 
> ^classes/.*\.[ch]$
> 
> Is it valid to assume that the only .h and .c files in classes/ are
> autogenerated?

Well, if it isn't, then classes/.cvsignore should be told.



Re: New PMC vtable methods

2002-01-30 Thread Nicholas Clark

On Wed, Jan 30, 2002 at 06:23:43PM -0500, Dan Sugalski wrote:
> We're adding the following:
> 
>INTVAL get_character(PMC *, INTVAL)
>INTVAL get_character(PMC *, KEY *, INTVAL)
> 
> to return the character at position INTVAL in the passed in PMC.

are characters really INTVAL? I have this gut feeling that they ought to
be UINTVAL. At least, that's my personal world view.

Nicholas Clark
-- 
EMCFT http://www.ccl4.org/~nick/CV.html



Re: [nick@unfortu.net: [PATCH] MANIFEST.SKIP]

2002-01-30 Thread Nicholas Clark

On Wed, Jan 30, 2002 at 02:55:54PM -0800, Steve Fink wrote:
> On Wed, Jan 30, 2002 at 09:32:45PM +, Nicholas Clark wrote:
> > You can now do:
> > 
> > nick@thinking-cap maniskip$ make manitest
> > perl14405-32 -MExtUtils::Manifest=fullcheck -e fullcheck
> > Not in MANIFEST: Configure.pl.rej
> > Not in MANIFEST: MANIFEST.SKIP.orig
> > Not in MANIFEST: MANIFEST.SKIP~
> > Not in MANIFEST: MANIFEST.orig
> > Not in MANIFEST: Makefile.in.orig
> > Not in MANIFEST: Makefile.in~
> > Not in MANIFEST: classes/array.c
> > Not in MANIFEST: classes/array.h
> > Not in MANIFEST: docs/embed.pod
> > Not in MANIFEST: docs/io_ops.pod
> > Not in MANIFEST: newpatch
> > Not in MANIFEST: patch
> > 
> > Should I add classes/array.c classes/array.h docs/embed.pod docs/io_ops.pod
> > to MANIFEST?
> 
> *.pod, yes. classes/*.[ch], no. Autogenerated.

Ah. This suggests that this bit of the proposed MANIFEST.SKIP:

^classes/default\.h$
^classes/default\.c$
^classes/intqueue\.h$
^classes/intqueue\.c$
^classes/parrotpointer\.h$
^classes/parrotpointer\.c$
^classes/perlarray\.h$
^classes/perlarray\.c$
^classes/perlhash\.h$
^classes/perlhash\.c$
^classes/perlint\.h$
^classes/perlint\.c$
^classes/perlnum\.h$
^classes/perlnum\.c$
^classes/perlstring\.h$
^classes/perlstring\.c$
^classes/perlundef\.h$
^classes/perlundef\.c$

becomes

^classes/.*\.[ch]$

Is it valid to assume that the only .h and .c files in classes/ are
autogenerated?
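One way to sanity-check the collapsed pattern is to run it against a few paths. MANIFEST.SKIP entries are Perl regexes, but this particular pattern reads the same as a POSIX extended regex, so the C sketch below uses regcomp/regexec. The helper name skip_matches is made up, and the sample paths other than those listed above are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if path matches the POSIX extended regex pattern, 0 if not,
 * and -1 if the pattern fails to compile. */
int skip_matches(const char *pattern, const char *path)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, path, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With the pattern "^classes/.*\\.[ch]$" (backslash doubled for the C string), classes/perlarray.c and classes/default.h match, while docs/embed.pod does not. Note that a hand-written classes/foo.c, or a .c file in a subdirectory of classes/, would also be skipped, which is exactly the assumption being asked about.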

Nicholas Clark
-- 
EMCFT http://www.ccl4.org/~nick/CV.html



Re: New Todo

2002-01-30 Thread Steve Fink

On Wed, Jan 30, 2002 at 10:01:50PM +, Simon Cozens wrote:
> begin quote from Steve Fink:
> > Perhaps a target version for each item?
> 
> Oh, bother. This is the second time I've been asked about this, so I
> suspect that my goals for the forthcoming releases aren't amazingly
> clear.

Or perhaps people lose track of your last pronouncement over time.
Rather than periodically resending it to the list, might I suggest
checking it into CVS? TODO sounds like a nice filename for it. :-)



[COMMIT] infer possible control flow changes

2002-01-30 Thread Steve Fink

I just committed a patch to jit2h.pl, Op.pm, and OpsFile.pm that
infers what ops may modify control flow, used by the jit to decide
whether to fall through to the next op or jump. (Daniel Grunblatt is
ok with it.) Just FYI.



New PMC vtable methods

2002-01-30 Thread Dan Sugalski

We're adding the following:

INTVAL get_character(PMC *, INTVAL)
INTVAL get_character(PMC *, KEY *, INTVAL)

to return the character at position INTVAL in the passed in PMC.
-- 
 Dan




Interpreter startup environment

2002-01-30 Thread Dan Sugalski

A quick recap and elaboration for folks following along at home.

On interpreter startup, P0 will hold an Array with ARGV in it if 
there is one, or NULL if not.

P1 will hold a Hash with %ENV in it if there is one, or NULL if not.

P2 (this is the new bit) will hold an Array of three elements 
corresponding to stdin, stdout, and stderr. If NULL the default's 
used, and if an entry is undef that file's not available.
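The NULL-versus-undef distinction for P2 can be sketched in C as follows. StdHandles and resolve_handle are hypothetical stand-ins (a plain struct instead of an Array PMC, a NULL slot instead of undef) and are not part of the interpreter.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the P2 array: slots for stdin, stdout, stderr.
 * A NULL array means "use the defaults"; a NULL slot stands for undef,
 * meaning that file is not available. */
typedef struct {
    FILE *slot[3];
} StdHandles;

/* Resolve handle number which (0, 1 or 2); NULL means "not available". */
FILE *resolve_handle(const StdHandles *p2, int which)
{
    FILE *defaults[3];

    defaults[0] = stdin;
    defaults[1] = stdout;
    defaults[2] = stderr;

    if (which < 0 || which > 2)
        return NULL;
    if (p2 == NULL)
        return defaults[which];   /* no P2 at all: fall back to defaults */
    return p2->slot[which];       /* may be NULL: that file is unavailable */
}
```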
-- 
 Dan




Re: [Patch] manually disabling jit compilation, testing for cranky jit-unfriendly compilers [APPLIED]

2002-01-30 Thread Dan Sugalski

At 4:28 PM -0500 1/30/02, Josh Wilmes wrote:
>This patch allows parrot to mostly-build with tcc.  It allows one to skip
>compiling the JIT stuff (by specifying --define jitcapable=0), and it
>introduces a test program which gives a friendlier error in this case for
>compilers which are as picky as tcc is about function pointer conversion.
>
>If anyone figures out the proper way to cast these function pointers this
>may not be necessary.

Applied, thanks.
-- 
 Dan




Re: [nick@unfortu.net: [PATCH] MANIFEST.SKIP]

2002-01-30 Thread Steve Fink

On Wed, Jan 30, 2002 at 09:32:45PM +, Nicholas Clark wrote:
> You can now do:
> 
> nick@thinking-cap maniskip$ make manitest
> perl14405-32 -MExtUtils::Manifest=fullcheck -e fullcheck
> Not in MANIFEST: Configure.pl.rej
> Not in MANIFEST: MANIFEST.SKIP.orig
> Not in MANIFEST: MANIFEST.SKIP~
> Not in MANIFEST: MANIFEST.orig
> Not in MANIFEST: Makefile.in.orig
> Not in MANIFEST: Makefile.in~
> Not in MANIFEST: classes/array.c
> Not in MANIFEST: classes/array.h
> Not in MANIFEST: docs/embed.pod
> Not in MANIFEST: docs/io_ops.pod
> Not in MANIFEST: newpatch
> Not in MANIFEST: patch
> 
> Should I add classes/array.c classes/array.h docs/embed.pod docs/io_ops.pod
> to MANIFEST?

*.pod, yes. classes/*.[ch], no. Autogenerated.



Re: New Todo

2002-01-30 Thread Simon Cozens

begin quote from Steve Fink:
> Perhaps a target version for each item?

Oh, bother. This is the second time I've been asked about this, so I
suspect that my goals for the forthcoming releases aren't amazingly
clear.

Here is the Grand Pronouncement!

0.0.4 WILL HAPPEN WHEN we have decent keyed aggregate support, and
support for other string encodings to the same level as ASCII. (This may
have to slip, since it's a biggy.)

0.0.5 WILL HAPPEN WHEN we have symbol table and heap access. This also
includes storage of subroutines in the symbol table, which in turn
requires the serialisation of PMCs to the bytecode constant table.

0.0.6 WILL HAPPEN WHEN we have Really Good GC support.

0.1.0 WILL HAPPEN WHEN we have an implementation of one reasonably
well-known high-level programming language. That's to say, Perl, Python,
Scheme, etc. Parsing does not need to be included, nor a full "library";
just running bytecode is fine.

Given that the Unicode requirement may slip, expect 0.0.4 relatively
soon.
-- 
buf[hdr[0]] = 0;/* unbelievably lazy ken (twit) */  - Andrew Hume



Re: [nick@unfortu.net: [PATCH] MANIFEST.SKIP]

2002-01-30 Thread Nicholas Clark

On Mon, Jan 28, 2002 at 08:13:11PM +, Nicholas Clark wrote:
> Is a MANIFEST.SKIP a good idea, even if Configure.pl doesn't check it by
> default?

Revised patch. Any objections?
[Either express objections or remove my commit privs else it goes in in 24
hours :-)]

You can now do:

nick@thinking-cap maniskip$ make manitest
perl14405-32 -MExtUtils::Manifest=fullcheck -e fullcheck
Not in MANIFEST: Configure.pl.rej
Not in MANIFEST: MANIFEST.SKIP.orig
Not in MANIFEST: MANIFEST.SKIP~
Not in MANIFEST: MANIFEST.orig
Not in MANIFEST: Makefile.in.orig
Not in MANIFEST: Makefile.in~
Not in MANIFEST: classes/array.c
Not in MANIFEST: classes/array.h
Not in MANIFEST: docs/embed.pod
Not in MANIFEST: docs/io_ops.pod
Not in MANIFEST: newpatch
Not in MANIFEST: patch

Should I add classes/array.c classes/array.h docs/embed.pod docs/io_ops.pod
to MANIFEST?

Nicholas Clark

--- MANIFEST.orig   Wed Jan 30 17:33:48 2002
+++ MANIFEST   Wed Jan 30 21:01:47 2002
@@ -4,6 +4,7 @@
 KNOWN_ISSUES
 LICENSES/Artistic
 MANIFEST
+MANIFEST.SKIP
 Makefile.in
 NEWS
 README
--- Makefile.in.orig   Wed Jan 30 10:31:28 2002
+++ Makefile.in Wed Jan 30 21:10:17 2002
@@ -408,6 +408,8 @@
 reconfig:
    $(MAKE) clean; $(PERL) Configure.pl --reconfig
 
+manitest:
+   $(PERL) -MExtUtils::Manifest=fullcheck -e fullcheck
 
 ###
 #
--- /dev/null   Wed Jan 30 19:14:25 2002
+++ MANIFEST.SKIP   Wed Jan 30 21:05:44 2002
@@ -0,0 +1,53 @@
+\.o$
+^\.cvsignore$
+/\.cvsignore$
+CVS/[^/]+$
+^include/parrot/config\.h$
+^include/parrot/platform\.h$
+^Makefile$
+/Makefile$
+^lib/Parrot/Types\.pm$
+^lib/Parrot/Config\.pm$
+^platform\.c$
+^config\.opt$
+
+^vtable\.ops$
+^include/parrot/vtable\.h$
+^include/parrot/jit_struct\.h$
+^include/parrot/oplib/core_ops\.h$
+^include/parrot/oplib/core_ops_prederef\.h$
+
+^core_ops\.c$
+^core_ops_prederef\.c$
+^vtable_ops\.c$
+
+^lib/Parrot/Jit\.pm$
+^lib/Parrot/PMC\.pm$
+^lib/Parrot/OpLib/core\.pm$
+
+^classes/default\.h$
+^classes/default\.c$
+^classes/intqueue\.h$
+^classes/intqueue\.c$
+^classes/parrotpointer\.h$
+^classes/parrotpointer\.c$
+^classes/perlarray\.h$
+^classes/perlarray\.c$
+^classes/perlhash\.h$
+^classes/perlhash\.c$
+^classes/perlint\.h$
+^classes/perlint\.c$
+^classes/perlnum\.h$
+^classes/perlnum\.c$
+^classes/perlstring\.h$
+^classes/perlstring\.c$
+^classes/perlundef\.h$
+^classes/perlundef\.c$
+
+^docs/packfile-c\.pod$
+^docs/packfile-perl\.pod$
+^docs/core_ops\.pod$
+
+^test_parrot$
+^pdump$
+^blib/



[Patch] manually disabling jit compilation, testing for cranky jit-unfriendly compilers

2002-01-30 Thread Josh Wilmes


This patch allows parrot to mostly-build with tcc.  It allows one to skip 
compiling the JIT stuff (by specifying --define jitcapable=0), and it 
introduces a test program which gives a friendlier error in this case for
compilers which are as picky as tcc is about function pointer conversion.

If anyone figures out the proper way to cast these function pointers this 
may not be necessary.
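For reference, one workaround sometimes used is to avoid the cast expression altogether and copy the pointer's bytes, as in the C sketch below. This is not the test program in the patch, the names are made up, and whether tcc accepts it has not been checked here. ISO C still does not guarantee that object and function pointers share a representation, so this only helps on platforms where they do (POSIX requires the conversion to work for dlsym results).

```c
#include <string.h>

typedef int (*func_t)(int);

/* Target function: returns 0 for 42 and -1 otherwise. */
int is_forty_two(int n)
{
    return n == 42 ? 0 : -1;
}

/* Round-trip a function pointer through void * by copying its bytes
 * instead of casting, then call it.  Returns -2 if the two pointer
 * kinds differ in size, in which case the trick does not apply. */
int call_through_void(func_t fn, int arg)
{
    void  *raw  = NULL;
    func_t back = NULL;

    if (sizeof raw != sizeof fn)
        return -2;
    memcpy(&raw, &fn, sizeof raw);     /* function pointer -> void * */
    memcpy(&back, &raw, sizeof back);  /* void * -> function pointer */
    return back(arg);
}
```

On a platform where the sizes agree, call_through_void(is_forty_two, 42) returns 0.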

--Josh

-- 
Josh Wilmes  ([EMAIL PROTECTED]) | http://www.hitchhiker.org




Index: Configure.pl
===================================================================
RCS file: /home/perlcvs/parrot/Configure.pl,v
retrieving revision 1.87
diff -u -r1.87 Configure.pl
--- Configure.pl   30 Jan 2002 04:20:37 -  1.87
+++ Configure.pl   30 Jan 2002 21:25:12 -
@@ -162,6 +162,8 @@
 }
 }
 
+$jitcapable = $opt_defines{jitcapable} if exists $opt_defines{jitcapable};
+
 unless($jitcapable){
 $jitarchname = 'i386-nojit';
 }
@@ -262,6 +264,11 @@
 
 my $ccname = $Config{ccname} || $Config{cc};
 
+# Make one more check before allowing the use of the JIT code.
+# make sure that their choice of compiler and cflags will allow our JIT's
+# non-ansi use of function pointers.
+#
+
 # Add the -DHAS_JIT if we're jitcapable
 if ($jitcapable) {
 $c{cc_hasjit} = " -DHAS_JIT -D" . uc $jitcpuarch;
@@ -348,8 +355,8 @@
 my %gnuc;
 
 compiletestc("test_gnuc");
-%gnuc=eval(runtestc()) or die "Can't run the test program: $!";
-unlink("test_siz$c{exe}", "test$c{o}");
+%gnuc=eval(runtestc("test_gnuc")) or die "Can't run the test program: $!";
+cleantestc("test_gnuc");
 
 unless (exists $gnuc{__GNUC__}) {
 print <<'END';
@@ -490,12 +497,13 @@
 my %newc;
 
 buildfile("test_c");
-compiletestc();
-%newc=eval(runtestc()) or die "Can't run the test program: $!";
+compiletestc("test");
+%newc=eval(runtestc("test")) or die "Can't run the test program: $!";
 
 @c{keys %newc}=values %newc;
 
-unlink('test.c', "test_siz$c{exe}", "test$c{o}");
+cleantestc("test");
+unlink('test.c');
 }
 
 print <<"END";
@@ -611,6 +619,26 @@
 buildfile("Types_pm", "lib/Parrot");
 
 buildconfigpm();
+print "\n";
+
+
+if ($jitcapable) {
+print "Verifying that the compiler supports function pointer casts...\n";
+eval { compiletestc("testparrotfuncptr"); };
+
+if ($@ || !(runtestc("testparrotfuncptr") =~ /OK/)) {
+print "Although it is not required by the ANSI C standard,\n";
+print "Parrot requires the ability to cast from void pointers to function\n";
+print "pointers for its JIT support.\n\n";
+print "Your compiler does not appear to support this behavior with the\n";
+print "flags you have specified.  You must adjust your settings in order\n";
+print "to use the JIT code.\n\n";
+print "If you wish to continue without JIT support, please re-run this script\n";
+print "with the '--define jitcapable=0' argument.\n";
+exit(-1);
+}
+cleantestc("testparrotfuncptr");
+}
 
 
 #
@@ -632,13 +660,14 @@
 close NEEDED;
 buildfile("testparrotsizes_c");
 compiletestc("testparrotsizes");
-%newc=eval(runtestc()) or die "Can't run the test program: $!";
-
+%newc=eval(runtestc("testparrotsizes"))
+  or die "Can't run the test program: $!";
 @c{keys %newc}=values %newc;
 
 @c{qw(stacklow intlow numlow strlow pmclow)} = lowbitmask(@c{qw(stackchunk iregchunk nregchunk sregchunk pregchunk)});
 
-unlink('testparrotsizes.c', "test_siz$c{exe}", "test$c{o}");
+cleantestc("testparrotsizes");
+unlink('testparrotsizes.c');
 unlink("include/parrot/vtable.h");
 }
 
@@ -846,10 +875,13 @@
 #
 
 sub compiletestc {
-my $name;
-$name = shift;
-$name = "test" unless $name;
-system("$c{cc} $c{ccflags} -I./include $c{cc_exe_out}test_siz$c{exe} $name.c $c{cc_ldflags} $c{ldflags} $c{libs}") and die "C compiler died!";
+my ($name) = @_;
+
+my $cmd = "$c{cc} $c{ccflags} -I./include -c $c{ld_out} $name$c{o} $name.c";
+system($cmd) and die "C compiler died!  Command was '$cmd'\n";
+
+$cmd = "$c{ld} $c{ldflags} $c{libs} $name$c{o} $c{cc_exe_out}$name$c{exe}";
+system($cmd) and die "Linker died!  Command was '$cmd'\n";
 }
 
 
@@ -858,9 +890,21 @@
 #
 
 sub runtestc {
-`./test_siz$c{exe}`
+my ($name) = @_;
+
+my $cmd = "$name$c{exe}";
+`./$cmd`;
 }
 
+#
+# cleantestc
+#
+
+sub cleantestc {
+my ($name) = @_;
+
+unlink("$name$c{o}", "$name$c{exe}");
+}
 
 #
 # lowbitmas()
--- /dev/null   Sat Jul 14 02:37:41 2001
+++ testparrotfuncptr.c Wed Jan  9 02:52:47 2002
@@ -0,0 +1,30 @@
+/*
+ * testparrotfuncptr.c - figure out if the compiler will let us do
+ *   non-ansi function pointer casts.
+ */
+
+#include <stdio.h>
+
+int a_function(int some_number) {
+   if (some_number == 42) {
+  printf("OK\n");
+  return 0;
+   } else {
+  printf("FAIL\n");
+  return -1;
+   }
+}
+
+typedef int (*func_t)(int);
+
+in

Re: [PATCH] POST_MORTERM, running.pod [APPLIED]

2002-01-30 Thread Dan Sugalski

At 12:36 PM -0800 1/30/02, Steve Fink wrote:
>I'm being anal again. Here's an update to docs/running.pod to better
>reflect the current state (both the test_parrot and assemble.pl
>improvements, plus documentation of a few more things.) And also a
>speling fiks s/POST_MORTERM/POST_MORTEM/.
>
>I could also replace some "perl foo" calls with "./foo" if someone
>wanted to set the executable flag in CVS on assemble.pl, optimize.pl,
>etc.

Applied with some chagrin. Thanks.
-- 
 Dan




Re: Jit on Solaris: using dis instead of objdump?

2002-01-30 Thread Dan Sugalski

At 3:27 PM -0500 1/30/02, Andy Dougherty wrote:
>On Solaris, it looks like JIT will now be enabled if the user has also
>installed GNU objdump.  However, there is (often) already a disassembler
>in /usr/ccs/bin/dis.  It's output is similar, but not identical to,
>objdump.  Is anyone with a Solaris system familiar enough with jit
>internals to have a go at adapting it to use dis instead of GNU objdump?

Apparently in progress even as we type.
-- 
 Dan




Re: New Todo

2002-01-30 Thread Steve Fink

On Wed, Jan 30, 2002 at 08:39:17PM +, Alex Gough wrote:
> On Wed, 30 Jan 2002, Steve Fink wrote:
> 
> > Any idea what of this will become 0.0.4?
> 
> Is there any chance someone (simon) could make a TODO_FIRST [1], which contains
> the goals for our next point release.  I'm far too lazy to search through
> mailing list archives to find it every time I want it.
> 
> [1] or TODO_NOW or TO_REALLY_DO or JFDI or something.

Perhaps a target version for each item?

[0.0.4] collision resolution in hashtables
[0.0.5] PMC attributes
[future] translate Larry's brain to parrot opcodes



Re: New Todo

2002-01-30 Thread Alex Gough

On Wed, 30 Jan 2002, Steve Fink wrote:

> Any idea what of this will become 0.0.4?

Is there any chance someone (simon) could make a TODO_FIRST [1], which contains
the goals for our next point release.  I'm far too lazy to search through
mailing list archives to find it every time I want it.

[1] or TODO_NOW or TO_REALLY_DO or JFDI or something.

Alex Gough




[PATCH] POST_MORTERM, running.pod

2002-01-30 Thread Steve Fink

I'm being anal again. Here's an update to docs/running.pod to better
reflect the current state (both the test_parrot and assemble.pl
improvements, plus documentation of a few more things.) And also a
speling fiks s/POST_MORTERM/POST_MORTEM/.

I could also replace some "perl foo" calls with "./foo" if someone
wanted to set the executable flag in CVS on assemble.pl, optimize.pl,
etc.

Index: docs/running.pod
===================================================================
RCS file: /home/perlcvs/parrot/docs/running.pod,v
retrieving revision 1.2
diff -p -u -b -r1.2 running.pod
--- docs/running.pod   22 Jan 2002 23:57:15 -  1.2
+++ docs/running.pod   30 Jan 2002 20:31:32 -
@@ -11,10 +11,10 @@ them and modify this document accordingl
 
 Converts a Parrot Assembly file to Parrot ByteCode.
 
-  assemble.pl foo.pasm > foo.pbc
+  perl assemble.pl foo.pasm > foo.pbc
 
-Usage information: no usage message available. There is some amount of
-malformed POD visible by running C.
+Usage information: C. Detailed documentation on the
+underlying module can be read with C.
 
 =item C
 
@@ -52,7 +52,8 @@ Usage information: none available.
 
 =item C
 
-Does something. Use it by running
+Causes a segmentation fault and dumps core. (Not its intention,
+despite the name!) Use it by running
 
   something
 
@@ -64,7 +65,7 @@ Converts a bytecode file to a native .c 
 
   perl pbc2c.pl foo.pbc > foo.c
 
-Usage information: C
+Usage information (and malformed pod error message): C
 
 No documentation is available for compiling the .c file to a binary.
 This works, but produces a binary that crashes:
@@ -72,5 +73,26 @@ This works, but produces a binary that c
   ./assemble.pl examples/assembly/life.pasm > life.pbc
   perl pbc2c.pl life.pbc > life.c
   ls **/*.o | egrep -v 'pdump|test_main' | xargs gcc -Iinclude -o life life.c -lm -ldl
+
+=item B
+
+C will compile anything that needs to be compiled and run
+all standard regression tests. To look at a test more closely, run the
+appropriate test file in the t/ directory:
+
+  perl -Ilib t/op/basic.t
+
+To keep a copy of all of the test C<.pasm> and C<.pbc> files
+generated, set the environment variable POSTMORTEM to 1:
+
+  POSTMORTEM=1 perl -Ilib t/op/basic.t 
+  ls t/op/basic*
+
+To run tests with a different dispatcher, edit
+C<$Parrot::Config::PConfig{test_prog}> in lib/Parrot/Config.pm:
+
+   'test_prog' => 'test_parrot -P',
+
+and then use any of the above methods for running tests.
 
 =back
Index: lib/Parrot/Test.pm
===================================================================
RCS file: /home/perlcvs/parrot/lib/Parrot/Test.pm,v
retrieving revision 1.13
diff -p -u -b -r1.13 Test.pm
--- lib/Parrot/Test.pm  30 Jan 2002 11:42:44 -  1.13
+++ lib/Parrot/Test.pm  30 Jan 2002 20:31:32 -
@@ -81,7 +81,7 @@ foreach my $func ( keys %Test_Map ) {
 my $meth = $Test_Map{$func};
 my $pass = $Builder->$meth( $prog_output, $output, $desc );
 
-unless($ENV{POSTMORTERM}) {
+unless($ENV{POSTMORTEM}) {
   foreach my $i ( $as_f, $by_f, $out_f ) {
 unlink $i;
   }



Jit on Solaris: using dis instead of objdump?

2002-01-30 Thread Andy Dougherty

On Solaris, it looks like JIT will now be enabled if the user has also
installed GNU objdump.  However, there is (often) already a disassembler
in /usr/ccs/bin/dis.  Its output is similar, but not identical to,
objdump.  Is anyone with a Solaris system familiar enough with jit
internals to have a go at adapting it to use dis instead of GNU objdump?

-- 
Andy Dougherty  [EMAIL PROTECTED]




Re: New Todo

2002-01-30 Thread Steve Fink

Any idea what of this will become 0.0.4?



Re: [ID 20020130.001] Unicode broken for 0x10FFFF

2002-01-30 Thread Larry Wall

Jarkko Hietaniemi writes:
: > What I notice, though, is that the current code does not warn for
: > characters beyond 0x10FFFF, which is definitely a bug.
: 
: Ahh, it's all coming back now... warning about such characters
: causes pain in the complementing tr///... have to look at this later.

I think the general policy of Perl should be that it is allowed to
think about bad thoughts, because that is the only way to understand
what's bad about the bad thoughts Perl receives on input.  If there is
to be any self-censorship, it should be on the output, I believe.
That's why they're called "disciplines", after all. :-) So it's fine if
the default output discipline enforces that the internal representation
is transformed to well-formed UTF-8.  It's even okay if the default
input discipline enforces well-formedness, as long as there's a way
to get at the raw badness.

But within Perl, character strings are simply sequences of integers.
The internal representation must be optimized for this concept, not for
any particular Unicode representation, whether UTF-8 or UTF-16 or
UTF-32.  Any of these could be used as underlying representations, but
the abstraction of sequences of integers must be there explicitly in
the internal high-level string API.  To oversimplify, the high-level
API must not have any parameters whose type contains the string "UTF".

In the absence of other type information, these integers are assumed
to be Unicode code points.  Additional strictures are possible and even
useful, but should not be the default (except for certain operations that
are explicitly designed for Unicode.)

For various reasons, some of which relate to the sequence-of-integer
abstraction, and some of which relate to "infinite" strings and arrays,
I think Perl 6 strings are likely to be represented by a list of
chunks, where each chunk is a sequence of integers of the same size or
representation, but different chunks can have different integer sizes
or representations.  The abstract string interface must hide this from
any module that wishes to work at the abstract string level.  In
particular, it must hide this from the regex engine, which works on
pure sequences in the abstract.
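
A minimal sketch in C of the chunked representation described above. The names Chunk and chunked_get are hypothetical and are not part of any Parrot or Perl API; the sketch only shows how an accessor can hide per-chunk integer widths from a caller that sees one sequence of integers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk: a run of integers that all share one width. */
typedef struct Chunk {
    int width;                  /* bytes per element: 1, 2, or 4 */
    size_t len;                 /* number of elements in this chunk */
    const void *data;           /* element storage */
    const struct Chunk *next;   /* following chunk, or NULL */
} Chunk;

/* Return the integer at abstract index i, or -1 past the end.
   The caller never learns which width the containing chunk uses. */
int64_t chunked_get(const Chunk *c, size_t i)
{
    for (; c != NULL; c = c->next) {
        if (i < c->len) {
            switch (c->width) {
            case 1:  return ((const uint8_t  *)c->data)[i];
            case 2:  return ((const uint16_t *)c->data)[i];
            case 4:  return ((const uint32_t *)c->data)[i];
            default: return -1;   /* unknown width */
            }
        }
        i -= c->len;              /* skip this chunk entirely */
    }
    return -1;
}
```

Note that the signature mentions neither an encoding nor an element width, which is the point of the "no parameter whose type contains UTF" rule of thumb. A lazily grown sequence would need a different end-of-data test, which this sketch does not attempt.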

Note that I did not use the phrase "pure sequences of integers" in the
last sentence.  The regex engine must not care if it is matching
characters from a string of known length, or tokens objects from an
array that is being grown arbitrarily on demand.  Matching on UTF-32
is not good enough.

This is just a heads up for some of the stuff in Apocalypse 5.
Backtracking behavior will not necessarily be limited to regexes in
Perl 6, and if so, we have to consider very carefully how regex
backtracking, continuations, and temp variable unifications all work
together.  (This is part of the reason I pushed earlier for the regex
opcodes to be meshed with the normal opcodes.)

I seriously intend that it be trivial to write a Perl parser (or any
other parser) in Perl, and that changing a grammar rule be as simple as
swapping in a different qr// (or a sub equivalent to a qr//).  More
generally, I want logic programming to be one of the paradigms that
Perl supports.  And as usual, I want to support it without forcing it
on people who aren't interested.

Sorry I can't be more clear yet.  Story of my life.  That's the basic
problem with the bear-of-very-little-brain approach.  So please "bear"
with me.

[I've cross-posted because of the wide interest, but I don't want to
start a general frenzy cross-posted to all the lists.  Please answer
specific points in separate messages, and please direct each followup
to the appropriate list.  Thanks.]

Larry



Re: [Reposted PATCH] Parrot::Assembler pod clean-up [APPLIED]

2002-01-30 Thread Dan Sugalski

At 6:30 PM + 1/30/02, Simon Glover wrote:
>  I posted this patch a while back, but it seems to have slipped
>  through the cracks. It fixes the POD in Parrot::Assembler so that perldoc
>  can read it and just tidies it up generally. It also adds documentation
>  for the constantize_integer and constantize_number functions.

Applied, thanks.
-- 
 Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
   teddy bears get drunk



Re: New Todo

2002-01-30 Thread Dan Sugalski

At 11:06 AM + 1/30/02, Tim Bunce wrote:
>On Wed, Jan 30, 2002 at 07:48:25AM +0100, Paul Johnson wrote:
>>  On Tue, Jan 29, 2002 at 09:57:16PM +, Simon Cozens wrote:
>>
>>  > I've started a new TODO list. Remind me of anything else that needs
>>  > doing;
>>
>>  Sandboxes.
>>
>>  Has anyone given any thought as to whether Parrot should support
>>  "use Safe", and if so, how?
>
>And remember that Safe is built on ops (ops.pm etc) and ops is very
>useful in its own right (eg for allowing limited perl ops in a config file).

And our safe interpreter will use the same sort of mechanism. (Though 
you won't necessarily be able to do that from within a safe 
interpreter. Some stuff won't be overridable, but that's fine.)
-- 
 Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
   teddy bears get drunk



[Reposted PATCH] Parrot::Assembler pod clean-up

2002-01-30 Thread Simon Glover


 I posted this patch a while back, but it seems to have slipped
 through the cracks. It fixes the POD in Parrot::Assembler so that perldoc
 can read it and just tidies it up generally. It also adds documentation
 for the constantize_integer and constantize_number functions.

 Simon 

--- lib/Parrot/Assembler.pm.old Wed Jan 30 18:20:46 2002
+++ lib/Parrot/Assembler.pm Wed Jan 30 18:21:51 2002
@@ -67,6 +67,7 @@
 output_listing() if $options{'listing'};
 exit 0;
 
+=cut
 
 ###
 ###
@@ -85,6 +86,7 @@
 my $pf = $asm->assemble($code);
 exit $interp->run($pf);
 
+=cut
 
 ###
 ###
@@ -105,8 +107,8 @@
 
 =head2 %type_to_suffix
 
-type_to_suffix is used to change from an argument type to the suffix that
-would be used in the name of the function that contained that argument.
+This is used to change from an argument type to the suffix that would be 
+used in the name of the function that contained that argument.
 
 =cut
 
@@ -120,26 +122,26 @@
 
 =head2 @program
 
-@program will hold an array ref for each line in the program. Each array ref
-will contain:
+This holds an array ref for each line in the program. Each array ref
+contains: 
 
 =over 4
 
 =item 1
 
-The file name in which the source line was found
+The file name in which the source line was found.
 
 =item 2
 
-The line number in the file of the source line
+The line number in the file of the source line.
 
 =item 3
 
-The chomped source line without beginning and ending spaces
+The chomped source line without beginning and ending spaces.
 
 =item 4
 
-The chomped source line
+The chomped source line.
 
 =back
 
@@ -150,25 +152,17 @@
 
 ###
 
-=head2 $output
-=head2 $listing
-=head2 $bytecode
-
-=over 4
-
-=item $output
-
-will be what is output to the bytecode file.
+=head2 $output 
 
-=item $listing
+What is output to the bytecode file.
 
-will be what is output to the listing file.
+=head2 $listing
 
-=item $bytecode
+What is output to the listing file.
 
-is the program's bytecode (executable instructions).
+=head2 $bytecode
 
-=back
+The program's bytecode (executable instructions).
 
 =cut
 
@@ -177,14 +171,10 @@
 
 ###
 
-=head2 $file
-=head2 $line
-=head2 $pline
-=head2 $sline
-
-$file, $line, $pline, and $sline are used to reference information from the
-@program array.  Please look at the comments for @program for the description
-of each.
+=head2 $file, $line, $pline, $sline
+
+These variables are used to reference information from the C<@program> array.  
+Please look at the comments for C<@program> for the description of each.
 
 =cut
 
@@ -194,41 +184,31 @@
 ###
 
 =head2 %label
-=head2 %fixup
-=head2 %macros
-=head2 %local_label
-=head2 %local_fixup
-=head2 $last_label
-
-=over 4
-
-=item %label
 
-will hold each label and the PC at which it was defined.
+This holds each label and the PC at which it was defined.
 
-=item %fixup
-
-will hold labels that have not yet been defined, where they are used in
-the source code, and the PC at that point. It is used for backpatching.
+=head2 %fixup
 
-=item %macros
+This holds labels that have not yet been defined, the position they are 
+used in the source code, and the PC at that point. It is used for 
+backpatching.
 
-will map a macro name to an array of program lines with the same format
-as @program.
+=head2 %macros
 
-=item %local_label
+This maps a macro name to an array of program lines with the same format
+as C<@program>.
 
-will hold local label definitions,
+=head2 %local_label
 
-=item %local_fixup
+This holds local label definitions.
 
-will hold the occurances of local labels in the source file.
+=head2 %local_fixup
 
-=item $last_label
+This holds the occurrences of local labels in the source file.
 
-is the name of the last label seen
+=head2 $last_label
 
-=back
+This is the name of the last label seen.
 
 =cut
 
@@ -238,10 +218,12 @@
 ###
 
 =head2 $pc
+
+This is the current program counter. 
+
 =head2 $op_pc
 
-pc is the current program counter. op_pc is the program counter for the most
-recent operator.
+This is the program counter for the most recent operator.
 
 =cut
 
@@ -251,11 +233,13 @@
 ###
 
 =head2 %constants
+
+This maps the name of each constant to its index in the constant table.
+
 =head2 @constants
 
-%constants is a map of constant name to index in the constant table
-@constant

Re: [PATCH] Post-reorganization clearup [APPLIED]

2002-01-30 Thread Dan Sugalski

At 5:53 PM + 1/30/02, Simon Glover wrote:
>  Many of the Perl scripts in the distribution (including assemble.pl !)
>  can no longer find the Parrot::* modules. Enclosed patch fixes (although
>  it would be nice if there were an easier way to do this).

Applied, thanks.
-- 
 Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
   teddy bears get drunk



Re: [PATCH] Post-reorganization clearup

2002-01-30 Thread Simon Glover


 Oops, scratch a couple of those; the pmc2c.pl one's not necessary,
 and I see Daniel's already patched pbc2c.pl

 Simon
 





[PATCH] Post-reorganization clearup

2002-01-30 Thread Simon Glover


 Many of the Perl scripts in the distribution (including assemble.pl !)
 can no longer find the Parrot::* modules. Enclosed patch fixes (although
 it would be nice if there were an easier way to do this).

 Simon

--- languages/scheme/Scheme/Test.pm.old  Wed Jan 30 17:42:08 2002
+++ languages/scheme/Scheme/Test.pm Wed Jan 30 17:42:11 2002
@@ -4,7 +4,7 @@
 
 use strict;
 use vars qw(@EXPORT @ISA);
-use lib '../..';
+use lib '../../lib';
 use Parrot::Config;
 
 require Exporter;

--- classes/pmc2c.pl.old Wed Jan 30 17:35:26 2002
+++ classes/pmc2c.pl Wed Jan 30 17:35:51 2002
@@ -6,6 +6,7 @@
 #
 
 use FindBin;
+use lib 'lib';
 use lib "$FindBin::Bin/..";
 use lib "$FindBin::Bin/../lib";
 use Parrot::Vtable;

--- classes/genclass.pl.old Wed Jan 30 17:35:19 2002
+++ classes/genclass.pl   Wed Jan 30 17:35:41 2002
@@ -1,6 +1,7 @@
 # $Id: genclass.pl,v 1.7 2002/01/04 16:09:01 dan Exp $
 
 use FindBin;
+use lib 'lib';
 use lib "$FindBin::Bin/..";
 use Parrot::Vtable;
 my %vtbl = parse_vtable("$FindBin::Bin/../vtable.tbl");

--- assemble.pl.old Wed Jan 30 17:09:20 2002
+++ assemble.pl Wed Jan 30 17:14:41 2002
@@ -6,6 +6,7 @@
 #
 
 use strict;
+use lib 'lib';
 use Parrot::Assembler;
 
 init_assembler(@ARGV);

--- disassemble.pl.old  Wed Jan 30 17:09:26 2002
+++ disassemble.pl  Wed Jan 30 17:14:53 2002
@@ -12,7 +12,7 @@
 #
 
 use strict;
-
+use lib 'lib';
 use Parrot::Config;
 
 use Parrot::OpLib::core;

--- optimizer.pl.old Wed Jan 30 17:26:14 2002
+++ optimizer.pl Wed Jan 30 17:26:17 2002
@@ -1,7 +1,7 @@
 #!/usr/bin/perl -w
 
 use strict;
-use lib '.';
+use lib 'lib';
 use Parrot::Optimizer;
 
 my $file = $ARGV[0];

--- pbc2c.pl.old Wed Jan 30 17:26:57 2002
+++ pbc2c.pl Wed Jan 30 17:27:02 2002
@@ -12,6 +12,7 @@
 #
 
 use strict;
+use lib 'lib';
 
 use Parrot::Types;
 use Parrot::PackFile;




Re: parrot rx engine

2002-01-30 Thread Graham Barr

On Wed, Jan 30, 2002 at 09:32:49AM -0800, Brent Dax wrote:
> # rx_setprops P0, "i", 2
> # branch $start0
> # $advance:
> # rx_advance P0, $fail
> # $start0:
> # rx_literal P0, "a", $advance
> #
> # First, we set the rx engine to case-insensitive. Why is that bad? It's
> # setting a runtime property for what should be compile-time
> # unicode-character-kung-fu. Assuming your "CPU" knows what the gritty
> # details of unicode in the first place just feels wrong, but I digress.
> 
> That "i" does a once-off case-folding operation on the target string.
> All other input to the engine MUST already be case-folded for speed.

Hm, is that going to work?  What about an rx like /^a(?i:b)C/, where the
case insensitivity only applies to part of the pattern?
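
To make the concern concrete, here is a small C sketch comparing the two strategies for /^a(?i:b)C/. Everything in it is hypothetical (the Elem type and both match functions are not rx.ops code), and it folds with tolower(), so it only covers ASCII rather than real Unicode case folding.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical pattern element: one literal plus its own /i flag. */
typedef struct { char lit; int fold; } Elem;

/* Anchored match where each element decides whether case matters. */
int match_per_elem(const Elem *pat, size_t n, const char *s)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[i];
        unsigned char p = (unsigned char)pat[i].lit;
        if (c == '\0')
            return 0;                       /* target too short */
        if (pat[i].fold ? tolower(c) != tolower(p) : c != p)
            return 0;
    }
    return 1;
}

/* Anchored match that behaves as if the whole target and the whole
   pattern had been case-folded once, up front. */
int match_folded(const Elem *pat, size_t n, const char *s)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c == '\0')
            return 0;
        if (tolower(c) != tolower((unsigned char)pat[i].lit))
            return 0;
    }
    return 1;
}
```

With the pattern {'a',0}, {'b',1}, {'C',0}, the per-element matcher accepts "aBC" and rejects "abc", while the fold-everything matcher accepts "abc" as well, because the case of the final C has already been thrown away.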

> # Mostly, I'd like to hear how either Unicode character-ranges aren't
> # deterministic at compile-time (I doubt that) or how crippling to
> 
> One word: locale.


How did I know you would say that :)

Graham.




Re: parrot rx engine

2002-01-30 Thread Simon Cozens

begin quote from Ashley Winters:
> I think that's exactly what you should be doing! Neither parrot nor the
> rx engine should try to be a full compiler. The rx engine definitely
> should have opcodes in the virtual machine, but those opcodes should
> simply contain state-machine/backtracking info, not godly unicode info.

If you want to hear how much fun there is to be had in bolting on Unicode
semantics from a language level to a regular expression engine that doesn't
have them built-in, I'll buy Jarkko a double whisky and send him in your
direction. I don't think you really want that.

I agree we should have a black-box regular expression engine. I believe,
however, it should conform to Unicode Technical Report #18. Because believe
me, if it doesn't do so out of the box, there's no hope it ever will.

-- 
int three = 128+64, two = 128, one=64;
- plan9 has a bad day



RE: parrot rx engine

2002-01-30 Thread Brent Dax

Ashley Winters:
# Who the hell am I?
# I've been only a weblog-lurker till now. It's been a couple
# years since
# I last contributed to Perl5. I just read the latest Apocalypse and it
# inspired me to get a parrot snapshot and look around.

Welcome back to the land of the living.  :^)

# What's my beef?
# I don't like the rx_literal and rx_oneof ops, and I don't like how the
# "on parrot strings" thread is being related to the regex
# engine. I know
# how useless "wouldn't it be nice" messages are, so please
# understand my
# advocacy on this is more than just raving lunacy.
#
# Basically, I see a black-box being built in the interests of speed.
# Voodoo array formats, bitmaps, and other such things to avoid actually
# spelling out what the regular expression is doing *in parrot code*.
#
# What the hell am I talking about?
# Let me cut&paste some code from the great rx.ops
# documentation (million
# thanks to the authors).

And a million "you're welcome"s.  :^)  (To be fair, japhy and Angel Faus
helped with the design a LOT.)

# rx_setprops P0, "i", 2
# branch $start0
# $advance:
# rx_advance P0, $fail
# $start0:
# rx_literal P0, "a", $advance
#
# First, we set the rx engine to case-insensitive. Why is that bad? It's
# setting a runtime property for what should be compile-time
# unicode-character-kung-fu. Assuming your "CPU" knows what the gritty
# details of unicode in the first place just feels wrong, but I digress.

That "i" does a once-off case-folding operation on the target string.
All other input to the engine MUST already be case-folded for speed.

# Next, a branch. No problem there.
#
# Next, a comparison between the string in P0, and whatever the hell "a"
# means. In this case, it probably means at least:
#
# 0041 LATIN CAPITAL LETTER A
# 0061 LATIN SMALL LETTER A
# FF21 FULLWIDTH LATIN CAPITAL LETTER A
# FF41 FULLWIDTH LATIN SMALL LETTER A

Which have been case-folded and normalized (Normalization Form KC,
probably) to LATIN SMALL LETTER A.

# If you include various diacritic thingies through some voodoo
# /switch-fu
#
# LATIN CAPITAL LETTER A WITH .*
# LATIN SMALL LETTER A WITH .*
# AKA.
# 0041, 0061, 00C0-00C5, 00E0-00E5, 0100-0105,
# 01CD-01E1, 01FA, 01FB, 0200-0203, 0226, 0227,
# 1E00, 1E01, 1E9A, 1EA0-1EB7, and perhaps more.

Once again, the switch-fu plus normalization will have converted all of
that to LATIN SMALL LETTER A.

# Now, the current CVS rx engine is/would do this at runtime. I
# read that
# someone else is working on doing that at compile-time (a
# necessity) and
# caching the results in some data structure. The "some data structure"
# part bothers me. Using Perl to create "some data structure" which is
# needed by C seems dubious at best. Whatever. Moving on...

Why?  If you know what info is contained in the "some data structure"
and it gives you a speedup, who cares?

# What I see is that rx_literal is a speed hack to avoid compiling this
# into parrot code:

That's more or less true.  Speed hacks are extremely important in regex
engines.

# given $a_utf32_code_point {
# when U+41 {}
# when U+61 {}
# when U+C0 <= $_ <= U+C5 {}
# when U+E0 <= $_ <= U+E5 {}
# # . on and on and on .
# default { next ON_SOME_LOOP }
# }
#
# I think that's exactly what you should be doing! Neither
# parrot nor the
# rx engine should try to be a full compiler. The rx engine definitely
# should have opcodes in the virtual machine, but those opcodes should
# simply contain state-machine/backtracking info, not godly
# unicode info.

This "godly Unicode info" is actually limited to "call a transcoding
function, a normalizing function, and a case-folding function if /i is
used".  All of this information is built-in to the string library from
the start.

# If you want to optimize a regular expression, you should write that
# optimizer in Perl6, or Python, or Scheme, or whatever, not in C.

C will almost always be faster with this sort of thing than Perl or any
other language, simply because of all the extra layers of Stuff you'll
have to go through in a higher-level language.

# So, what am I saying?
#
# Once you squash rx_literal and friends, any attempt to benchmark the
# "rx" engine really becomes a benchmark of parrot itself. When
# you speed
# up parrot, you speed up regular expressions. Voila, no more black box.

This is true already.  The regex engine consists of normal opcodes, so a
fast, general Parrot opcode dispatch speedup will speed up regexes, too.

# If Parrot is just too damn slow for you, whip out libmylang and do the
# nitty gritty yourself. Since this is mostly a "just don't do it" post,
# no code is actually *required* from me, right? :)
#
# Here is an example based on the rx.ops example written in
# pseudo-Perl6.apocalypse.4. It may or may not be relevant. Any errors
# are my own. Any formatting problems are my fault for using an inferior
# mail system. This is purely for

RE: parrot rx engine

2002-01-30 Thread Angel Faus


Ashley Winters wrote:
>First, we set the rx engine to case-insensitive. Why is that bad? It's
>setting a runtime property for what should be compile-time
>unicode-character-kung-fu. Assuming your "CPU" knows what the gritty
>details of unicode in the first place just feels wrong, but I digress.

I tend to agree with that. Many run-time options can be turned into compile-time
versions of the opcodes, which hopefully will produce a speed increase.

>Once you squash rx_literal and friends, any attempt to benchmark the
>"rx" engine really becomes a benchmark of parrot itself. When you speed
>up parrot, you speed up regular expressions. Voila, no more black box.
>If Parrot is just too damn slow for you, whip out libmylang and do the
>nitty gritty yourself. Since this is mostly a "just don't do it" post,
>no code is actually *required* from me, right? :)

We are already doing so. What you are suggesting, in fact, is to compile
regular expressions down to Perl code (and that, in turn, to Parrot). This will
always be slower than generating Parrot directly, because some Perl features
prevent heavy use of some optimizations (think JIT) that are necessary
if we want good regex performance.

In other words, with your proposal a better general-purpose optimizer
gives you better regex performance, but it will always remain
worse than the current state.

If what you are suggesting is that everything be compiled to general-purpose
opcodes (branch, unicode's, etc.) [which is what follows from your
words, but not from your examples], I still believe this to be a performance
mistake. It would dramatically reduce the code density,
and no matter how fast Parrot dispatch is, this will kill your performance.

And using stacks too heavily (as the use of exceptions would probably
require) will also be too slow (as Brent Dax showed me when we
were discussing our two regex opcode designs).

Just my 2 cents (of euros) :)

---
Angel faus
[EMAIL PROTECTED]




Re: parrot rx engine

2002-01-30 Thread Melvin Smith


>Basically, I see a black-box being built in the interests of speed.
>Voodoo array formats, bitmaps, and other such things to avoid actually
>spelling out what the regular expression is doing *in parrot code*.
[snip]
>What I see is that rx_literal is a speed hack to avoid compiling this
>into parrot code:
[snip]
>I think that's exactly what you should be doing! Neither parrot nor the
>rx engine should try to be a full compiler. The rx engine definitely
>should have opcodes in the virtual machine, but those opcodes should
[snip]
>Once you squash rx_literal and friends, any attempt to benchmark the
>"rx" engine really becomes a benchmark of parrot itself. When you speed
>up parrot, you speed up regular expressions. Voila, no more black box.
>If Parrot is just too damn slow for you, whip out libmylang and do the

This is a serious reply, I'm not taking potshots, but correct me if I'm
wrong: by your argument, we should implement lots of other black boxes
in "parrot" rather than C such as anything that is not a basic low level
call (for example upper layer IO system, buffering, etc.).

Otherwise I'm unsure where you think a black box is appropriate and
where it isn't.

-Melvin




Re: parrot rx engine

2002-01-30 Thread Jonathan Scott Duff

On Wed, Jan 30, 2002 at 08:13:55AM -0800, Ashley Winters wrote:
> I think that's exactly what you should be doing! Neither parrot nor the
> rx engine should try to be a full compiler. The rx engine definitely
> should have opcodes in the virtual machine, but those opcodes should
> simply contain state-machine/backtracking info, not godly unicode info.

So, basically, you just want to push Unicode onto the language that
sits atop parrot.  If that language were Perl, for instance, you'd
advocate that everywhere the user had written /a/ be replaced (by the
Perl compiler) with the big long "given" you described?  Have I got
that right?

Excerpt from Apocalypse 2:

Perl 6 programs are notionally written in Unicode, and assume
Unicode semantics by default even when they happen to be
processing other character sets behind the scenes. Note that
when we say that Perl is written in Unicode, we're speaking of
an abstract character set, not any particular encoding. (The
typical program will likely be written in UTF-8 in the West, and
in some 16-bit character set in the East.)

It seems to me that in order for Perl 6 programs to be written in
Unicode, Parrot needs to grok Unicode (everywhere, including regular
expressions).

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]



parrot rx engine

2002-01-30 Thread Ashley Winters

Hello p6i,

Who the hell am I?
I've been only a weblog-lurker till now. It's been a couple years since
I last contributed to Perl5. I just read the latest Apocalypse and it
inspired me to get a parrot snapshot and look around.

What's my beef?
I don't like the rx_literal and rx_oneof ops, and I don't like how the
"on parrot strings" thread is being related to the regex engine. I know
how useless "wouldn't it be nice" messages are, so please understand my
advocacy on this is more than just raving lunacy.

Basically, I see a black-box being built in the interests of speed.
Voodoo array formats, bitmaps, and other such things to avoid actually
spelling out what the regular expression is doing *in parrot code*.

What the hell am I talking about?
Let me cut&paste some code from the great rx.ops documentation (million
thanks to the authors).

rx_setprops P0, "i", 2
branch $start0
$advance:
rx_advance P0, $fail
$start0:
rx_literal P0, "a", $advance

First, we set the rx engine to case-insensitive. Why is that bad? It's
setting a runtime property for what should be compile-time
unicode-character-kung-fu. Assuming your "CPU" knows what the gritty
details of unicode in the first place just feels wrong, but I digress.

Next, a branch. No problem there.

Next, a comparison between the string in P0, and whatever the hell "a"
means. In this case, it probably means at least:

0041 LATIN CAPITAL LETTER A
0061 LATIN SMALL LETTER A
FF21 FULLWIDTH LATIN CAPITAL LETTER A
FF41 FULLWIDTH LATIN SMALL LETTER A

If you include various diacritic thingies through some voodoo
/switch-fu

LATIN CAPITAL LETTER A WITH .*
LATIN SMALL LETTER A WITH .*
AKA.
0041, 0061, 00C0-00C5, 00E0-00E5, 0100-0105,
01CD-01E1, 01FA, 01FB, 0200-0203, 0226, 0227,
1E00, 1E01, 1E9A, 1EA0-1EB7, and perhaps more.

Now, the current CVS rx engine is/would do this at runtime. I read that
someone else is working on doing that at compile-time (a necessity) and
caching the results in some data structure. The "some data structure"
part bothers me. Using Perl to create "some data structure" which is
needed by C seems dubious at best. Whatever. Moving on...

What I see is that rx_literal is a speed hack to avoid compiling this
into parrot code:

given $a_utf32_code_point {
when U+41 {}
when U+61 {}
when U+C0 <= $_ <= U+C5 {}
when U+E0 <= $_ <= U+E5 {}
# . on and on and on .
default { next ON_SOME_LOOP }
}

I think that's exactly what you should be doing! Neither parrot nor the
rx engine should try to be a full compiler. The rx engine definitely
should have opcodes in the virtual machine, but those opcodes should
simply contain state-machine/backtracking info, not godly unicode info.
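
Whether it is spelled as a given/when block in bytecode or hidden inside an op, the test above is a membership check of one code point against a sorted list of ranges. A minimal C sketch of that check follows; the Range type, the in_ranges function and the latin_a table are hypothetical, and the table holds only the first few ranges quoted in this message.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive code point range. */
typedef struct { uint32_t lo, hi; } Range;

/* Partial, sorted table built from the ranges listed above. */
static const Range latin_a[] = {
    { 0x0041, 0x0041 }, { 0x0061, 0x0061 },
    { 0x00C0, 0x00C5 }, { 0x00E0, 0x00E5 },
    { 0x0100, 0x0105 },
};

/* Binary search: returns 1 if cp falls in any range of t, else 0.
   The table must be sorted and non-overlapping. */
int in_ranges(const Range *t, size_t n, uint32_t cp)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < t[mid].lo)
            hi = mid;
        else if (cp > t[mid].hi)
            lo = mid + 1;
        else
            return 1;
    }
    return 0;
}
```

A chain of when clauses does the same work linearly; the table form simply moves the "on and on and on" part into data.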

If you want to optimize a regular expression, you should write that
optimizer in Perl6, or Python, or Scheme, or whatever, not in C.

So, what am I saying?

Once you squash rx_literal and friends, any attempt to benchmark the
"rx" engine really becomes a benchmark of parrot itself. When you speed
up parrot, you speed up regular expressions. Voila, no more black box.
If Parrot is just too damn slow for you, whip out libmylang and do the
nitty gritty yourself. Since this is mostly a "just don't do it" post,
no code is actually *required* from me, right? :)

Here is an example based on the rx.ops example written in
pseudo-Perl6.apocalypse.4. It may or may not be relevant. Any errors
are my own. Any formatting problems are my fault for using an inferior
mail system. This is purely for entertainment purposes, no warranty or
specification expressed or implied.

#!/usr/bin/perl6
# /ab*[cd]+/i
sub match ($string) {
return false if $string.length < 2;
my $r = rx::allocateinfo($string);
ADVANCE:
loop {
NEXT { $r.advance or last ADVANCE }
# /a/
given $r.current_code_point {   # whatever
when U+41, U+61 {} # it's an "a"
default { next ADVANCE }
}
# rx_literal used to move the current pointer
# upon success. replace with rx_next_code_point?
$r.next_code_point;
$r.pushmark;# backtracking starts here

# /b*/
loop {
NEXT { $r.next_code_point or last; $r.pushindex }
given $r.current_code_point {
when U+42, U+62 {}   # it's a "b"
# one-to-many unicode ops resolved at compile-time?
#   when $_ =~ any(toupper(U+62), tolower(U+62),
#  totitle(U+62), tofold(U+62)) {}
default { last }
}
}

loop {
NEXT {
if $r.distance_from_last_index > 1 {
return true;   # success...
} else {
$r.popindex or last;   # backtrack or start over
}
}
# /[cd]+/
loop {
NEXT { $r.next_code_point or last }
 

Re: flags in io/io_unix.c

2002-01-30 Thread Melvin Smith

At 10:16 AM 1/30/2002 -0500, Andy Dougherty wrote:
>Sun's compiler is (rightly) complaining about the following lines in
>io/io_unix.c:
>
>PIO_unix_fdopen() is defined to take a UINTVAL fourth argument:
>
> ParrotIO * PIO_unix_fdopen(theINTERP, ParrotIOLayer * layer,
> PIOHANDLE fd, UINTVAL flags);
>
>but it is later called with a string fourth argument, e.g.:
>
> PIO_unix_fdopen(interpreter, layer, STDIN_FILENO, "<"))
>
>Does anyone know the actual intent?  Which one is right?

Yep, that's my bug. Low level fdopen should take an int val flags, not
a string.  I'll commit a fix.
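
One way to reconcile the two call styles, sketched in C purely as an illustration: translate the Perl-style mode string into flag bits once, in a higher layer, and pass only the integer down to the low-level fdopen. The SK_F_* constants and sketch_parse_flags are hypothetical names invented for this sketch; they are not the flags or functions in io/io_unix.c, and this is not the committed fix.

```c
#include <stddef.h>

/* Hypothetical flag bits, for illustration only. */
enum { SK_F_READ = 1, SK_F_WRITE = 2, SK_F_APPEND = 4 };

/* Translate "<", ">" or ">>" into flag bits.
   Returns 0 for NULL, empty, or unrecognised mode strings. */
unsigned long sketch_parse_flags(const char *mode)
{
    unsigned long flags = 0;

    if (mode == NULL)
        return 0;
    for (; *mode != '\0'; mode++) {
        switch (*mode) {
        case '<':
            flags |= SK_F_READ;
            break;
        case '>':
            flags |= SK_F_WRITE;
            if (mode[1] == '>') {       /* ">>" means append */
                flags |= SK_F_APPEND;
                mode++;
            }
            break;
        default:
            return 0;                   /* unknown character */
        }
    }
    return flags;
}
```

A caller would then write something like fdopen_layer(..., sketch_parse_flags("<")), where fdopen_layer stands in for whatever the integer-taking function ends up being called.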

-Melvin




Re: [PATCH] MANIFEST update [APPLIED]

2002-01-30 Thread Daniel Grunblatt

On Wed, 30 Jan 2002, Simon Glover wrote:

>
>  Enclosed patch adds the new SPARC-based JIT files to the manifest,
>  and also puts it back into alphabetical order.
>
>  Simon
>
Applied thanks.





flags in io/io_unix.c

2002-01-30 Thread Andy Dougherty

Sun's compiler is (rightly) complaining about the following lines in
io/io_unix.c:

PIO_unix_fdopen() is defined to take a UINTVAL fourth argument:

ParrotIO * PIO_unix_fdopen(theINTERP, ParrotIOLayer * layer,
PIOHANDLE fd, UINTVAL flags);

but it is later called with a string fourth argument, e.g.:

PIO_unix_fdopen(interpreter, layer, STDIN_FILENO, "<"))

Does anyone know the actual intent?  Which one is right?

-- 
Andrew Dougherty[EMAIL PROTECTED]




Re: [PATCH] Clean-up warnings

2002-01-30 Thread Nicholas Clark

On Wed, Jan 30, 2002 at 01:15:42PM +, Simon Glover wrote:
> 
>  This patch clears up warnings in embed.c and test_main.c coming
>  from function declarations of the form:

Thanks applied (with modifications, in that the functions in test_main are
now declared static).

Nicholas Clark
-- 
EMCFT http://www.ccl4.org/~nick/CV.html



[PATCH] Clean-up warnings

2002-01-30 Thread Simon Glover


 This patch clears up warnings in embed.c and test_main.c coming
 from function declarations of the form:

 void
 foobar();

 which should properly be

 void
 foobar(void);
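
 For anyone wondering why the two forms differ: in C before C23, an
 empty parameter list in a declaration leaves the parameters
 unspecified, so the compiler cannot check calls against it, whereas
 (void) is a real prototype that says "no arguments". A small sketch,
 with made-up function names:

```c
/* Old-style declaration: the parameter list is unspecified, so a
   pre-C23 compiler will not reject a call such as old_style(1, 2). */
int old_style();

/* Prototype: takes no arguments; proto(1) is a compile-time error. */
int proto(void);

int old_style() { return 1; }
int proto(void) { return 2; }
```

 Both functions behave identically when called with no arguments; the
 difference is only in what the compiler is able to diagnose.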


 Simon



--- test_main.c.old Tue Jan 29 20:00:01 2002
+++ test_main.c Wed Jan 30 12:05:36 2002
@@ -18,10 +18,10 @@
 parseflags(struct Parrot_Interp *interpreter, int *argc, char **argv[]);
 
 void
-usage();
+usage(void);
 
 void
-version();
+version(void);
 
 int
 main(int argc, char *argv[]) {


--- embed.c.old Wed Jan 30 12:10:03 2002
+++ embed.c Wed Jan 30 12:13:53 2002
@@ -19,7 +19,7 @@
 static BOOLVAL world_inited=0;
 
 struct Parrot_Interp *
-Parrot_new() {
+Parrot_new(void) {
 if(!world_inited) {
 world_inited=1;
 init_world();





[PATCH] interp_new and Parrot_new

2002-01-30 Thread Simon Glover


 In embed.h, we declare a function:

  struct Parrot_Interp *interp_new();

 that's never subsequently used. On the other hand, in embed.c,
 we use a function

 struct Parrot_Interp * Parrot_new() { ...

 that isn't previously declared. Are these supposed to be the same thing? 
 If so, the patch below fixes up the header.

 Simon

--- include/parrot/embed.h.old  Wed Jan 30 12:03:38 2002
+++ include/parrot/embed.h  Wed Jan 30 12:15:37 2002
@@ -31,7 +31,7 @@
 struct Parrot_Interp;
 struct PackFile;
 
-struct Parrot_Interp *interp_new();
+struct Parrot_Interp *Parrot_new(void);
 
 void Parrot_init(struct Parrot_Interp *);
 
 
 




Re: CVS Reorganization Complete

2002-01-30 Thread Simon Cozens

begin quote from Simon Cozens:
> > - Moved t/op/pmc* to t/op/pmc/

Sorry. Bad instructions. t/op/pmc/pmc_(.*) should become t/pmc/$1.
Thanks to Ask for fixing this. I'll fix the MANIFEST.

-- 
The warly race may riches chase,
An' riches still may fly them, O;
An' tho' at last they catch them fast,
Their hearts can ne'er enjoy them, O.



[PATCH] MANIFEST update

2002-01-30 Thread Simon Glover


 Enclosed patch adds the new SPARC-based JIT files to the manifest,
 and also puts it back into alphabetical order.

 Simon

--- MANIFEST.old Wed Jan 30 11:42:42 2002
+++ MANIFEST Wed Jan 30 11:46:42 2002
@@ -6,35 +6,9 @@
 MANIFEST
 Makefile.in
 NEWS
-lib/Parrot/Assembler.pm
-lib/Parrot/BuildUtil.pm
-lib/Parrot/Jit/alpha-bsd.pm
-lib/Parrot/Jit/alpha-linux.pm
-lib/Parrot/Jit/alphaGeneric.pm
-lib/Parrot/Jit/i386-bsd.pm
-lib/Parrot/Jit/i386-linux.pm
-lib/Parrot/Jit/i386-nojit.pm
-lib/Parrot/Jit/i386Generic.pm
-lib/Parrot/Op.pm
-lib/Parrot/OpTrans.pm
-lib/Parrot/OpTrans/C.pm
-lib/Parrot/OpTrans/CGoto.pm
-lib/Parrot/OpTrans/CPrederef.pm
-lib/Parrot/OpsFile.pm
-lib/Parrot/Optimizer.pm
-lib/Parrot/PackFile.pm
-lib/Parrot/PackFile/ConstTable.pm
-lib/Parrot/PackFile/Constant.pm
-lib/Parrot/PackFile/FixupTable.pm
-lib/Parrot/String.pm
-lib/Parrot/Test.pm
-lib/Parrot/Vtable.pm
 README
 README.OS_X
 TODO
-lib/Test/Builder.pm
-lib/Test/More.pm
-lib/Test/Simple.pm
 Types_pm.in
 VERSION
 assemble.pl
@@ -147,6 +121,9 @@
 jit/i386/core.jit
 jit/i386/lib.jit
 jit/i386/string.jit
+jit/sun4/core.jit
+jit/sun4/lib.jit
+jit/sun4/string.jit
 jit2h.pl
 key.c
 languages/Makefile.in
@@ -189,6 +166,34 @@
 languages/scheme/t/harness
 languages/scheme/t/io/basic.t
 languages/scheme/t/logic/basic.t
+lib/Parrot/Assembler.pm
+lib/Parrot/BuildUtil.pm
+lib/Parrot/Jit/alpha-bsd.pm
+lib/Parrot/Jit/alpha-linux.pm
+lib/Parrot/Jit/alphaGeneric.pm
+lib/Parrot/Jit/i386-bsd.pm
+lib/Parrot/Jit/i386-linux.pm
+lib/Parrot/Jit/i386-nojit.pm
+lib/Parrot/Jit/i386Generic.pm
+lib/Parrot/Jit/sun4-solaris.pm
+lib/Parrot/Jit/sun4Generic.pm
+lib/Parrot/Op.pm
+lib/Parrot/OpTrans.pm
+lib/Parrot/OpTrans/C.pm
+lib/Parrot/OpTrans/CGoto.pm
+lib/Parrot/OpTrans/CPrederef.pm
+lib/Parrot/OpsFile.pm
+lib/Parrot/Optimizer.pm
+lib/Parrot/PackFile.pm
+lib/Parrot/PackFile/ConstTable.pm
+lib/Parrot/PackFile/Constant.pm
+lib/Parrot/PackFile/FixupTable.pm
+lib/Parrot/String.pm
+lib/Parrot/Test.pm
+lib/Parrot/Vtable.pm
+lib/Test/Builder.pm
+lib/Test/More.pm
+lib/Test/Simple.pm
 make.pl
 make_vtable_ops.pl
 manicheck.pl
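
 The ordering the patch restores appears to be plain byte order
 (uppercase names such as README sort before lowercase ones such as
 lib/...). As a rough sketch of how such an ordering could be checked,
 assuming the entries are already loaded into an array: the function
 name first_unsorted is hypothetical and is not part of the Parrot
 tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the index of the first entry that sorts before its
 * predecessor under strcmp() (byte order, so uppercase precedes
 * lowercase), or -1 if the whole list is in order. */
int
first_unsorted(const char *const entries[], size_t count)
{
    size_t i;

    assert(entries != NULL || count == 0);
    for (i = 1; i < count; i++) {
        if (strcmp(entries[i - 1], entries[i]) > 0)
            return (int)i;
    }
    return -1;
}
```

 Fed the old file, a check like this would stop at README, because
 the lib/Parrot/... entries had been placed ahead of it.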




Re: [PATCH lib/Parrot/Test.pm] More info about failed compiles

2002-01-30 Thread Simon Cozens

begin quote from Michael G Schwern:
> This little patch makes command failures in tests (i.e. if Parrot pukes
> on compile) report the command and exit code like so:

Thank you. This is quite wonderful.

-- 
I want you to know that I create nice things like this because it
pleases the Author of my story.  If this bothers you, then your notion
of Authorship needs some revision.  But you can use perl anyway. :-)
- Larry Wall



[PATCH lib/Parrot/Test.pm] More info about failed compiles

2002-01-30 Thread Michael G Schwern

This little patch makes command failures in tests (i.e. if Parrot pukes
on compile) report the command and exit code like so:

# 'perl assemble.pl t/op/basic2.pasm --output t/op/basic2.pbc' failed with exit code 1

I don't know if that's informative enough, but it's a start anyway.


--- lib/Parrot/Test.pm  29 Jan 2002 02:32:15 -  1.12
+++ lib/Parrot/Test.pm  30 Jan 2002 11:38:11 -
@@ -37,6 +37,8 @@
   }
 
   system "$^X -e \"$redir_string;system q{$command};\"";
+  my $exit_code = $? / 256;
+  $Builder->diag("'$command' failed with exit code $exit_code") if $exit_code;
 }
 
 my $count;
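
A note on the arithmetic in the patch: Perl's $? holds the raw wait
status, with the exit code in the high byte and signal information in
the low byte. Dividing by 256 therefore gives the exit code whenever
the low byte is zero; the Perl documentation writes the same thing as
$? >> 8. Below is a minimal sketch of the equivalent decoding in C on
a POSIX system. The function name command_exit_code is hypothetical,
and returning -1 for "did not exit normally" is a choice made for
this example, not something the patch does.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Runs a shell command and returns its exit code, or -1 if the
 * command could not be run or did not exit normally (for example,
 * because it was killed by a signal). POSIX only. */
int
command_exit_code(const char *command)
{
    int status;

    assert(command != NULL);
    status = system(command);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    /* WEXITSTATUS extracts the exit code from the wait status;
     * this is the part that $? >> 8 computes in Perl. */
    return WEXITSTATUS(status);
}
```

Treating the signal case separately avoids reporting a fractional
"exit code" when the low byte of the status is non-zero.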


-- 

Michael G. Schwern   <[EMAIL PROTECTED]>   http://www.pobox.com/~schwern/
Perl Quality Assurance  <[EMAIL PROTECTED]>   Kwalitee Is Job One
sort God kill 9, @ARGV;



Re: New Todo

2002-01-30 Thread Tim Bunce

On Wed, Jan 30, 2002 at 07:48:25AM +0100, Paul Johnson wrote:
> On Tue, Jan 29, 2002 at 09:57:16PM +, Simon Cozens wrote:
> 
> > I've started a new TODO list. Remind me of anything else that needs
> > doing;
> 
> Sandboxes.
> 
> Has anyone given any thought as to whether Parrot should support
> "use Safe", and if so, how?

And remember that Safe is built on ops (ops.pm etc) and ops is very
useful in its own right (e.g. for allowing limited perl ops in a config file).

Tim.



Re: CVS Reorganization Complete

2002-01-30 Thread Simon Cozens

begin quote from Robert Spier:
> - Renamed include/parrot/register_funcs.h to regfuncs.h
> - Renamed languages/miniperl/miniperlc to mpc
> - Moved t/op/pmc* to t/op/pmc/
> - Moved Parrot/* to lib/Parrot/
> - Moved Test/* to lib/Test/

Thanks to Robert and Jeff for this reorganisation and the post-reorg
cleanups.

-- 
Look, there are only a few billion people in the world, right?  And they can 
only possibly know a few thousand bits of information not known by someone 
else, right?  So the human race will never have a real need for more than a 
few terabits of storage, except possibly as cache. - Geraint Jones