RE: [ID 20020130.001] Unicode broken for 0x10FFFF
Larry Wall:
# For various reasons, some of which relate to the sequence-of-integer
# abstraction, and some of which relate to "infinite" strings and arrays,
# I think Perl 6 strings are likely to be represented by a list of
# chunks, where each chunk is a sequence of integers of the same size or
# representation, but different chunks can have different integer sizes
# or representations. The abstract string interface must hide this from
# any module that wishes to work at the abstract string level. In
# particular, it must hide this from the regex engine, which works on
# pure sequences in the abstract.
#
# Note that I did not use the phrase "pure sequences of integers" in the
# last sentence. The regex engine must not care if it is matching
# characters from a string of known length, or tokens objects from an
# array that is being grown arbitrarily on demand. Matching on UTF-32
# is not good enough.
#
# This is just a heads up for some of the stuff in Apocalypse 5.
# Backtracking behavior will not necessarily be limited to regexes in
# Perl 6, and if so, we have to consider very carefully how regex
# backtracking, continuations, and temp variable unifications all work
# together. (This is part of the reason I pushed earlier for the regex
# opcodes to be meshed with the normal opcodes.)
#
# I seriously intend that it be trivial to write a Perl parser (or any
# other parser) in Perl, and that changing a grammar rule be as simple as
# swapping in a different qr// (or a sub equivalent to a qr//). More
# generally, I want logic programming to be one of the paradigms that
# Perl supports. And as usual, I want to support it without forcing it
# on people who aren't interested.

As the regex guy for Parrot, my first response to this sounded something like "oh, crap". This'll be hard to make efficient, hard to implement for all cases, and all that. But as I thought about it more, I realized that there's a fairly easy way to do this.

The first thing is to make sure that, at the Parrot level, "$left =~ $right" calls $right->vtable->match, not $left. The second thing is to make sure that =~ on characters (or character streams) is the same as "eq"--character-set-independent comparison. Once that's done, it's quite easy. A regex becomes a series of =~ operations. For example, let's say @toke contains a series of tokens: @toke=(... new Perl6::Toke::Term(), new Perl6::Toke::Operator::Plus(), new Perl6::Toke::Term() ...); Now, assume that \t{Foo} in a regex is like $curitem =~ Perl6::Toke::Foo. (I assume Larry will come up with a more general mechanism, but you get the idea.) Finally, assume =~ on classes is an ISA search. Now, to find the first addition operation in the given token stream, you just do something like this: @toke =~ m<\t{Value}\t{Operator::Plus}\t{Value}>; To find the first unary plus operator: @toke =~ m<(?; #or something like that To compress all value/addition-precedence-operator/value sequences into value tokens: @toke =~ s< \t{Value} [ \t{Operator::Plus} \t{Operator::Minus} \t{Operator::Underscore} ] \t{Value} >< new Perl6::Toke::Value($&) >eg; Now, check this one out: $unop=qr[(?, qr< \t{Value} \t{Operator::StarStar} \t{Value} >r, qr< $unop [ \t{Operator::Exclamation} \t{Operator::Tilde} \t{Operator::Backslash} \t{Operator::Plus} \t{Operator::Minus} ] \t{Value} >, qr< \t{Value} [ \t{Operator::EqualsTilde} \t{Operator::ExclamationTilde} ] \t{Value} >, qr< \t{Value} [ \t{Operator::Star} \t{Operator::Slash} \t{Operator::Percent} \t{Operator::X} ] \t{Value} >, qr< \t{Value} [ \t{Operator::Plus} \t{Operator::Minus} \t{Operator::Underscore} ] \t{Value} >, ... ); ($top)=map { @toke =~ s/$_/new Perl6::Toke::Value($&)/e } @rules; For those who can't see what that is (and I don't blame you if
Re: Jit on Solaris: using dis instead of objdump?
On Wed, Jan 30, 2002 at 03:27:18PM -0500, Andy Dougherty wrote:
> On Solaris, it looks like JIT will now be enabled if the user has also
> installed GNU objdump. However, there is (often) already a disassembler
> in /usr/ccs/bin/dis. Its output is similar, but not identical, to
> objdump's. Is anyone with a Solaris system familiar enough with jit
> internals to have a go at adapting it to use dis instead of GNU objdump?

The difference was pretty minimal. It should work with 'dis'.

-- Jason
Re: parrot rx engine
On Wednesday 30 January 2002 21:42, Dan Sugalski wrote: > I think we may want trees as a fundamental data type at some point... I wonder about the trees -- Bryan C. Warnock [EMAIL PROTECTED]
Re: parrot rx engine
At 6:28 PM -0800 1/30/02, Steve Fink wrote: >I'm sure in Apoc 5 Larry's going to go way beyond that and embed full >parsers, not just regularish language matchers, but the above is >easier to grasp. Odds are, yes. And don't be surprised if the RE engine's required to return data structures as well. (Nested parens returns you a tree struct, for example) I think we may want trees as a fundamental data type at some point... -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: parrot rx engine
On Wed, Jan 30, 2002 at 08:37:30PM -0500, Bryan C. Warnock wrote:
> "But if you know they're going to be twenty times slower, why are you doing
> it?" Because we know / think / hope / pray / have been making sacrifices to

Tangential note: current benchmarking indicates that we're doing a lot better than this. Two times slower is the right ballpark, and in cases where we can tell the re compiler that we only need restricted information out of the match, something like 20%. But that's based on a probably nonrepresentative example and a duck walked within four miles of my computer while I ran the test, so my numbers are about as meaningful as the points on Whose Line Is It Anyway.

Somebody's been watching too much TV.

And on another tangent (tangential to this thread, not the list), here's some motivation for an op-based regex engine: array matching.

    if (@ARGV =~ regex('-o', '(', '.', ')')) { # pardon the syntax
        $output = $1;
    }

Maybe that would look better as regex/-o (.)/ ?

With an opcode-based engine, you get to reuse all the mechanics of * + ? *? (?>) etc., and just add a new match_list_elt op. Which could itself invoke a parrot subroutine so we're not restricted to element equality matching as I implied in my above example.

I'm sure in Apoc 5 Larry's going to go way beyond that and embed full parsers, not just regularish language matchers, but the above is easier to grasp.
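To make the array-matching idea a little more concrete, here is a minimal sketch in C of what matching '-o' followed by one captured element might look like. Everything in it (DemoElt, demo_match_list) is a hypothetical name invented for illustration, not a Parrot op or API, and it handles only fixed-length patterns with element equality. The point of the message above is that a real op-based engine would get quantifiers and backtracking for free, which this sketch does not attempt.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pattern element: a literal string to compare against,
 * or NULL meaning "any element" (like '.' in a regex). */
typedef struct {
    const char *literal;  /* NULL matches any element */
    int         capture;  /* nonzero: remember the matched element */
} DemoElt;

/* Try the pattern at every start offset in the list.
 * Returns the last captured element of the first match, or NULL if
 * nothing matched (or the pattern captured nothing). */
const char *demo_match_list(const char *const *items, size_t n_items,
                            const DemoElt *pat, size_t n_pat)
{
    size_t start, i;

    for (start = 0; start + n_pat <= n_items; start++) {
        const char *captured = NULL;

        for (i = 0; i < n_pat; i++) {
            const char *item = items[start + i];

            /* Element equality is the only test in this sketch. */
            if (pat[i].literal != NULL && strcmp(pat[i].literal, item) != 0)
                break;
            if (pat[i].capture)
                captured = item;
        }
        if (i == n_pat)
            return captured;
    }
    return NULL;
}
```

Called with the list ("prog", "-v", "-o", "out.txt") and the two-element pattern { "-o", any-with-capture }, it returns "out.txt". Swapping the strcmp for a callback would be the analogue of letting match_list_elt invoke a subroutine.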
Re: parrot rx engine
On Wednesday 30 January 2002 11:13, Ashley Winters wrote:
> First, we set the rx engine to case-insensitive. Why is that bad? It's
> setting a runtime property for what should be compile-time
{snip}
> Now, the current CVS rx engine is/would do this at runtime.

We're also currently a compiler short.

> What I see is that rx_literal is a speed hack to avoid compiling this
> into parrot code:
{excellent example of a pure_parrot regex engine snipped}

Is something *wrong* with speed hacks?

When we talk about wanting to make Parrot blazingly, blindly fast, we're talking in relative terms. The mechanics of interpretation - sans JIT - pretty much restrict you to racing in the 125cc class. You may blow away every other bike in the class, but good luck going up against some 800cc monster.

That's why these virtual machines aren't very RISCish. There's entirely too much stuff that has to be done that is unrelated to what you're actually trying to do - the more you can stuff into an op, the faster you will be.

Yes, that means that the fastest regex engine would probably be, yes, one op. match. The rationale for *not* doing that (yet) is a design choice - we want regex ops - in some form - to be first-class citizens of Parrot opcodes.

But in truth, if Parrot can be four times as fast as Perl 5 - currently, from an op dispatch perspective, it's a measly two; from a functionality perspective, it's much greater, but then again, we don't have all the functionality - would you be content in having your regexes run twenty times slower?

"But if you know they're going to be twenty times slower, why are you doing it?" Because we know / think / hope / pray / have been making sacrifices to the gods that we can make up the speed in other ways. A smarter regex engine. Faster op dispatch. Pure compilation. The JIT.

Our target is to match the current speed. If we can't do that, we'll more than likely reduce the number of Parrot ops. If we blow away previous marks; well, then, we can expand.

> Am I fool, or an idiot? Discuss.

Overzealous, perhaps. It'd be nice for Perl 7 to be written in Perl 7, but I don't think that's realistic.

> Mostly, I'd like to hear how either Unicode character-ranges aren't
> deterministic at compile-time (I doubt that) or how crippling to
> performance this would be (and by implication how slow parrot will be)
> in either time or space.

Literal character classes ([abc]) will most likely be compiled. Meta-character classes (\d) may be compiled. Character ranges ([a-f]) may or may not be. It's hard to say, because we seem to still not be sure what any of those mean. (Particularly when locale comes into play.) That's not a regex issue, it's a Unicode one.

-- Bryan C. Warnock [EMAIL PROTECTED]
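For the easy case mentioned above, a literal class such as [abc], "compiled" could mean something as simple as a precomputed membership bitmap. The sketch below assumes 8-bit characters only, which deliberately sidesteps the Unicode and locale questions the message raises. DemoClass and both functions are hypothetical names, not anything in the Parrot rx engine.

```c
#include <stdint.h>
#include <string.h>

/* 256 bits, one per possible byte value. */
typedef struct { uint8_t bits[32]; } DemoClass;

/* Build the bitmap once, at "compile time", from the literal members. */
void demo_class_compile(DemoClass *cls, const char *members)
{
    memset(cls->bits, 0, sizeof cls->bits);
    for (; *members != '\0'; members++) {
        unsigned char ch = (unsigned char)*members;
        cls->bits[ch >> 3] |= (uint8_t)(1u << (ch & 7));
    }
}

/* Membership test at match time: one shift and one mask. */
int demo_class_match(const DemoClass *cls, unsigned char ch)
{
    return (cls->bits[ch >> 3] >> (ch & 7)) & 1;
}
```

A range like [a-f] could be expanded into the same bitmap only if its meaning is fixed when the pattern is compiled, which is exactly the open question here.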
Re: parrot rx engine
On Wednesday 30 January 2002 12:32, Brent Dax wrote:
> # Mostly, I'd like to hear how either Unicode character-ranges aren't
> # deterministic at compile-time (I doubt that) or how crippling to
>
> One word: locale.

Not that locales couldn't provide pre-compiled character classes.

-- Bryan C. Warnock [EMAIL PROTECTED]
Re: Interpreter startup environment
At 06:21 PM 1/30/2002 -0500, Dan Sugalski wrote:
>A quick recap and elaboration for folks following along at home.
>
>On interpreter startup. P0 will hold an Array with ARGV in it if there is
>one, or NULL if not.
>
>P1 will hold a Hash with %ENV in it if there is one, or NULL if not
>
>P2 (this is the new bit) will hold an Array of three elements
>corresponding to stdin, stdout, and stderr. If NULL the default's used,
>and if an entry is undef that file's not available.

Sounds good. This jogged my memory: I'm using interp->piodata->table[0-2] for the standard handles, but I've thought about caching those directly in the interp struct to cut out two derefs in the most common cases. I'll do the mod if you say the word.

-Melvin
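A rough sketch of the caching idea, for anyone who wants to see the two access paths side by side. The struct layouts are invented for illustration; only the interp->piodata->table[n] path comes from the message above, and the std_handles field and all Demo* names are assumptions.

```c
#include <stddef.h>

/* All types here are hypothetical stand-ins, not Parrot's real structs. */
typedef struct DemoIO { int fd; } DemoIO;

typedef struct DemoIOData {
    DemoIO **table;             /* table[0..2]: stdin, stdout, stderr */
} DemoIOData;

typedef struct DemoInterp {
    DemoIOData *piodata;
    DemoIO     *std_handles[3]; /* assumed cache of table[0..2] */
} DemoInterp;

/* Uncached path: interp -> piodata -> table -> entry. */
DemoIO *demo_std_handle_slow(const DemoInterp *interp, int n)
{
    return interp->piodata->table[n];
}

/* Copy the three standard handles into the interpreter struct. */
void demo_cache_std_handles(DemoInterp *interp)
{
    int i;

    for (i = 0; i < 3; i++)
        interp->std_handles[i] = interp->piodata->table[i];
}

/* Cached path: the entry is read straight out of the interp struct. */
DemoIO *demo_std_handle_fast(const DemoInterp *interp, int n)
{
    return interp->std_handles[n];
}
```

The obvious cost is that the cache has to be refreshed whenever table[0..2] changes, for example if a standard handle is reopened.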
Re: New PMC vtable methods
At 11:40 PM + 1/30/02, Nicholas Clark wrote:
>On Wed, Jan 30, 2002 at 06:23:43PM -0500, Dan Sugalski wrote:
>> We're adding the following:
>>
>> INTVAL get_character(PMC *, INTVAL)
>> INTVAL get_character(PMC *, KEY *, INTVAL)
>>
>> to return the character at position INTVAL in the passed in PMC.
>
>are characters really INTVAL? I have this gut feeling that they ought to
>be UINTVAL. At least, that's my personal world view.

That lets us reserve the negative characters for error conditions. 31 bits should be enough for a long time, even for Unicode.

-- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
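As an illustration of the signed-return convention described above, here is a small sketch in which a negative result signals an error and any non-negative result is a character. The string type, the error code, and the function name are all hypothetical; the real vtable methods take a PMC and are not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not Parrot's real definitions. */
typedef long INTVAL;

typedef struct {
    const uint32_t *codepoints;  /* one integer per character */
    INTVAL          length;
} DemoString;

enum { DEMO_ERR_OUT_OF_RANGE = -1 };

/* Return the character at `pos`, or a negative value on error.
 * Because valid characters are never negative, the caller can tell
 * the two cases apart with a single sign test. */
INTVAL demo_get_character(const DemoString *s, INTVAL pos)
{
    if (s == NULL || pos < 0 || pos >= s->length)
        return DEMO_ERR_OUT_OF_RANGE;
    return (INTVAL)s->codepoints[pos];
}
```

With an unsigned return type the same function would need either a separate status out-parameter or a reserved in-band value.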
Re: [nick@unfortu.net: [PATCH] MANIFEST.SKIP]
On Wed, Jan 30, 2002 at 11:20:28PM +, Nicholas Clark wrote: > On Wed, Jan 30, 2002 at 02:55:54PM -0800, Steve Fink wrote: > > Ah. This suggests that this bit of the proposed MANIFEST.SKIP: > > ... > > becomes > > ^classes/.*\.[ch]$ > > Is it valid to assume that the only .h and .c files in classes/ are > autogenerated? Well, if it isn't, then classes/.cvsignore should be told.
Re: New PMC vtable methods
On Wed, Jan 30, 2002 at 06:23:43PM -0500, Dan Sugalski wrote:
> We're adding the following:
>
> INTVAL get_character(PMC *, INTVAL)
> INTVAL get_character(PMC *, KEY *, INTVAL)
>
> to return the character at position INTVAL in the passed in PMC.

are characters really INTVAL? I have this gut feeling that they ought to be UINTVAL. At least, that's my personal world view.

Nicholas Clark
-- EMCFT http://www.ccl4.org/~nick/CV.html
Re: [nick@unfortu.net: [PATCH] MANIFEST.SKIP]
On Wed, Jan 30, 2002 at 02:55:54PM -0800, Steve Fink wrote: > On Wed, Jan 30, 2002 at 09:32:45PM +, Nicholas Clark wrote: > > You can now do: > > > > nick@thinking-cap maniskip$ make manitest > > perl14405-32 -MExtUtils::Manifest=fullcheck -e fullcheck > > Not in MANIFEST: Configure.pl.rej > > Not in MANIFEST: MANIFEST.SKIP.orig > > Not in MANIFEST: MANIFEST.SKIP~ > > Not in MANIFEST: MANIFEST.orig > > Not in MANIFEST: Makefile.in.orig > > Not in MANIFEST: Makefile.in~ > > Not in MANIFEST: classes/array.c > > Not in MANIFEST: classes/array.h > > Not in MANIFEST: docs/embed.pod > > Not in MANIFEST: docs/io_ops.pod > > Not in MANIFEST: newpatch > > Not in MANIFEST: patch > > > > Should I add classes/array.c classes/array.h docs/embed.pod docs/io_ops.pod > > to MANIFEST? > > *.pod, yes. classes/*.[ch], no. Autogenerated. Ah. This suggests that this bit of the proposed MANIFEST.SKIP: ^classes/default\.h$ ^classes/default\.c$ ^classes/intqueue\.h$ ^classes/intqueue\.c$ ^classes/parrotpointer\.h$ ^classes/parrotpointer\.c$ ^classes/perlarray\.h$ ^classes/perlarray\.c$ ^classes/perlhash\.h$ ^classes/perlhash\.c$ ^classes/perlint\.h$ ^classes/perlint\.c$ ^classes/perlnum\.h$ ^classes/perlnum\.c$ ^classes/perlstring\.h$ ^classes/perlstring\.c$ ^classes/perlundef\.h$ ^classes/perlundef\.c$ becomes ^classes/.*\.[ch]$ Is it valid to assume that the only .h and .c files in classes/ are autogenerated? Nicholas Clark -- EMCFT http://www.ccl4.org/~nick/CV.html
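A quick way to sanity-check the collapsed pattern is to run it over a few paths. MANIFEST.SKIP entries are Perl regexes; the sketch below uses POSIX extended regexes from regex.h as a stand-in, on the assumption that this particular pattern uses only syntax the two dialects share. demo_skip_matches is a made-up helper name.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `path` matches `pattern` (POSIX extended regex),
 * 0 if it does not, and -1 if the pattern fails to compile. */
int demo_skip_matches(const char *pattern, const char *path)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, path, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With the pattern written as the C string "^classes/.*\\.[ch]$", classes/perlint.c and classes/array.h match while docs/embed.pod does not. The single pattern would also skip any hand-written .c or .h file under classes/, which is the question being asked above.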
Re: New Todo
On Wed, Jan 30, 2002 at 10:01:50PM +, Simon Cozens wrote: > begin quote from Steve Fink: > > Perhaps a target version for each item? > > Oh, bother. This is the second time I've been asked about this, so I > suspect that my goals for the forthcoming releases aren't amazingly > clear. Or perhaps people lose track of your last pronouncement over time. Rather than periodically resending it to the list, might I suggest checking it into CVS? TODO sounds like a nice filename for it. :-)
[COMMIT] infer possible control flow changes
I just committed a patch to jit2h.pl, Op.pm, and OpsFile.pm that infers what ops may modify control flow, used by the jit to decide whether to fall through to the next op or jump. (Daniel Grunblatt is ok with it.) Just FYI.
New PMC vtable methods
We're adding the following:

    INTVAL get_character(PMC *, INTVAL)
    INTVAL get_character(PMC *, KEY *, INTVAL)

to return the character at position INTVAL in the passed in PMC.

-- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Interpreter startup environment
A quick recap and elaboration for folks following along at home.

On interpreter startup, P0 will hold an Array with ARGV in it if there is one, or NULL if not.

P1 will hold a Hash with %ENV in it if there is one, or NULL if not.

P2 (this is the new bit) will hold an Array of three elements corresponding to stdin, stdout, and stderr. If NULL the default's used, and if an entry is undef that file's not available.

-- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
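The P2 rule has three outcomes per handle, which the following sketch spells out. It models the array as a C array of pointers, with a NULL array standing for "no P2" and a NULL entry standing for undef. None of these names or types are Parrot's; they are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for an I/O handle. */
typedef struct { int fd; } DemoHandle;

typedef enum {
    DEMO_USE_DEFAULT,  /* no array was passed at all */
    DEMO_UNAVAILABLE,  /* array passed, but this entry is undef */
    DEMO_USE_GIVEN     /* array passed and the entry is set */
} DemoChoice;

/* `p2` models the three-element array (NULL if absent); a NULL entry
 * models undef. `slot` is 0 (stdin), 1 (stdout), or 2 (stderr). */
DemoChoice demo_pick_handle(DemoHandle *const *p2, int slot, DemoHandle **out)
{
    *out = NULL;
    if (p2 == NULL)
        return DEMO_USE_DEFAULT;
    if (p2[slot] == NULL)
        return DEMO_UNAVAILABLE;
    *out = p2[slot];
    return DEMO_USE_GIVEN;
}
```

An embedder that wants stderr only would pass an array with the first two entries undef, and the interpreter would treat stdin and stdout as unavailable rather than falling back to the defaults.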
Re: [Patch] manually disabling jit compilation, testing forcranky jit-unfriendly compilers [APPLIED]
At 4:28 PM -0500 1/30/02, Josh Wilmes wrote: >This patch allows parrot to mostly-build with tcc. It allows one to skip >compiling the JIT stuff (by specifying --define jitcapable=0), and it >introduces a test program which gives a friendlier error in this case for >compilers which are as picky as tcc is about function pointer conversion. > >If anyone figures out the proper way to cast these function pointers this >may not be necessary. Applied, thanks. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: [nick@unfortu.net: [PATCH] MANIFEST.SKIP]
On Wed, Jan 30, 2002 at 09:32:45PM +, Nicholas Clark wrote: > You can now do: > > nick@thinking-cap maniskip$ make manitest > perl14405-32 -MExtUtils::Manifest=fullcheck -e fullcheck > Not in MANIFEST: Configure.pl.rej > Not in MANIFEST: MANIFEST.SKIP.orig > Not in MANIFEST: MANIFEST.SKIP~ > Not in MANIFEST: MANIFEST.orig > Not in MANIFEST: Makefile.in.orig > Not in MANIFEST: Makefile.in~ > Not in MANIFEST: classes/array.c > Not in MANIFEST: classes/array.h > Not in MANIFEST: docs/embed.pod > Not in MANIFEST: docs/io_ops.pod > Not in MANIFEST: newpatch > Not in MANIFEST: patch > > Should I add classes/array.c classes/array.h docs/embed.pod docs/io_ops.pod > to MANIFEST? *.pod, yes. classes/*.[ch], no. Autogenerated.
Re: New Todo
begin quote from Steve Fink: > Perhaps a target version for each item? Oh, bother. This is the second time I've been asked about this, so I suspect that my goals for the forthcoming releases aren't amazingly clear. Here is the Grand Pronouncement! 0.0.4 WILL HAPPEN WHEN we have decent keyed aggregate support, and support for other string encodings to the same level as ASCII. (This may have to slip, since it's a biggy.) 0.0.5 WILL HAPPEN WHEN we have symbol table and heap access. This also includes storage of subroutines in the symbol table, which in turn requires the serialisation of PMCs to the bytecode constant table. 0.0.6 WILL HAPPEN WHEN we have Really Good GC support. 0.1.0 WILL HAPPEN WHEN we have an implementation of one reasonably well-known high-level programming language. That's to say, Perl, Python, Scheme, etc. Parsing does not need to be included, nor a full "library"; just running bytecode is fine. Given that the Unicode requirement may slip, expect 0.0.4 relatively soon. -- buf[hdr[0]] = 0;/* unbelievably lazy ken (twit) */ - Andrew Hume
Re: [nick@unfortu.net: [PATCH] MANIFEST.SKIP]
On Mon, Jan 28, 2002 at 08:13:11PM +, Nicholas Clark wrote: > Is a MANIFEST.SKIP a good idea, even if Configure.pl doesn't check it by > default? Revised patch. Any objections? [Either express objections or remove my commit privs else it goes in in 24 hours :-)] You can now do: nick@thinking-cap maniskip$ make manitest perl14405-32 -MExtUtils::Manifest=fullcheck -e fullcheck Not in MANIFEST: Configure.pl.rej Not in MANIFEST: MANIFEST.SKIP.orig Not in MANIFEST: MANIFEST.SKIP~ Not in MANIFEST: MANIFEST.orig Not in MANIFEST: Makefile.in.orig Not in MANIFEST: Makefile.in~ Not in MANIFEST: classes/array.c Not in MANIFEST: classes/array.h Not in MANIFEST: docs/embed.pod Not in MANIFEST: docs/io_ops.pod Not in MANIFEST: newpatch Not in MANIFEST: patch Should I add classes/array.c classes/array.h docs/embed.pod docs/io_ops.pod to MANIFEST? Nicholas Clark --- MANIFEST.orig Wed Jan 30 17:33:48 2002 +++ MANIFESTWed Jan 30 21:01:47 2002 @@ -4,6 +4,7 @@ KNOWN_ISSUES LICENSES/Artistic MANIFEST +MANIFEST.SKIP Makefile.in NEWS README --- Makefile.in.origWed Jan 30 10:31:28 2002 +++ Makefile.in Wed Jan 30 21:10:17 2002 @@ -408,6 +408,8 @@ reconfig: $(MAKE) clean; $(PERL) Configure.pl --reconfig +manitest: + $(PERL) -MExtUtils::Manifest=fullcheck -e fullcheck ### # --- /dev/null Wed Jan 30 19:14:25 2002 +++ MANIFEST.SKIP Wed Jan 30 21:05:44 2002 @@ -0,0 +1,53 @@ +\.o$ +^\.cvsignore$ +/\.cvsignore$ +CVS/[^/]+$ +^include/parrot/config\.h$ +^include/parrot/platform\.h$ +^Makefile$ +/Makefile$ +^lib/Parrot/Types\.pm$ +^lib/Parrot/Config\.pm$ +^platform\.c$ +^config.opt$ + +^vtable\.ops$ +^include/parrot/vtable\.h$ +^include/parrot/jit_struct\.h$ +^include/parrot/oplib/core_ops\.h$ +^include/parrot/oplib/core_ops_prederef\.h$ + +^core_ops\.c$ +^core_ops_prederef\.c$ +^vtable_ops\.c$ + +^lib/Parrot/Jit\.pm$ +^lib/Parrot/PMC\.pm$ +^lib/Parrot/OpLib/core\.pm$ + +^classes/default\.h$ +^classes/default\.c$ +^classes/intqueue\.h$ +^classes/intqueue\.c$ +^classes/parrotpointer\.h$ 
+^classes/parrotpointer\.c$ +^classes/perlarray\.h$ +^classes/perlarray\.c$ +^classes/perlhash\.h$ +^classes/perlhash\.c$ +^classes/perlint\.h$ +^classes/perlint\.c$ +^classes/perlnum\.h$ +^classes/perlnum\.c$ +^classes/perlstring\.h$ +^classes/perlstring\.c$ +^classes/perlundef\.h$ +^classes/perlundef\.c$ + +^docs/packfile-c\.pod$ +^docs/packfile-perl\.pod$ +^docs/core_ops\.pod$ + +^test_parrot$ +^pdump$ +^blib/
[Patch] manually disabling jit compilation, testing for cranky jit-unfriendly compilers
This patch allows parrot to mostly-build with tcc. It allows one to skip compiling the JIT stuff (by specifying --define jitcapable=0), and it introduces a test program which gives a friendlier error in this case for compilers which are as picky as tcc is about function pointer conversion. If anyone figures out the proper way to cast these function pointers this may not be necessary. --Josh -- Josh Wilmes ([EMAIL PROTECTED]) | http://www.hitchhiker.org Index: Configure.pl === RCS file: /home/perlcvs/parrot/Configure.pl,v retrieving revision 1.87 diff -u -r1.87 Configure.pl --- Configure.pl30 Jan 2002 04:20:37 - 1.87 +++ Configure.pl30 Jan 2002 21:25:12 - @@ -162,6 +162,8 @@ } } +$jitcapable = $opt_defines{jitcapable} if exists $opt_defines{jitcapable}; + unless($jitcapable){ $jitarchname = 'i386-nojit'; } @@ -262,6 +264,11 @@ my $ccname = $Config{ccname} || $Config{cc}; +# Make one more check before allowing the use of the JIT code. +# make sure that their choice of compiler and cflags will allow our JIT's +# non-ansi use of function pointers. +# + # Add the -DHAS_JIT if we're jitcapable if ($jitcapable) { $c{cc_hasjit} = " -DHAS_JIT -D" . 
uc $jitcpuarch; @@ -348,8 +355,8 @@ my %gnuc; compiletestc("test_gnuc"); -%gnuc=eval(runtestc()) or die "Can't run the test program: $!"; -unlink("test_siz$c{exe}", "test$c{o}"); +%gnuc=eval(runtestc("test_gnuc")) or die "Can't run the test program: $!"; +cleantestc("test_gnuc"); unless (exists $gnuc{__GNUC__}) { print <<'END'; @@ -490,12 +497,13 @@ my %newc; buildfile("test_c"); -compiletestc(); -%newc=eval(runtestc()) or die "Can't run the test program: $!"; +compiletestc("test"); +%newc=eval(runtestc("test")) or die "Can't run the test program: $!"; @c{keys %newc}=values %newc; -unlink('test.c', "test_siz$c{exe}", "test$c{o}"); +cleantestc("test"); +unlink('test.c'); } print <<"END"; @@ -611,6 +619,26 @@ buildfile("Types_pm", "lib/Parrot"); buildconfigpm(); +print "\n"; + + +if ($jitcapable) { +print "Verifying that the compiler supports function pointer casts...\n"; +eval { compiletestc("testparrotfuncptr"); }; + +if ($@ || !(runtestc("testparrotfuncptr") =~ /OK/)) { +print "Although it is not required by the ANSI C standard,\n"; +print "Parrot requires the ability to cast from void pointers to function\n"; +print "pointers for its JIT support.\n\n"; +print "Your compiler does not appear to support this behavior with the\n"; +print "flags you have specified. 
You must adjust your settings in order\n"; + print "to use the JIT code.\n\n"; +print "If you wish to continue without JIT support, please re-run this +script\n"; + print "With the '--define jitcapable=0' argument.\n"; + exit(-1); +} +cleantestc("testparrotfuncptr"); +} # @@ -632,13 +660,14 @@ close NEEDED; buildfile("testparrotsizes_c"); compiletestc("testparrotsizes"); -%newc=eval(runtestc()) or die "Can't run the test program: $!"; - +%newc=eval(runtestc("testparrotsizes")) + or die "Can't run the test program: $!"; @c{keys %newc}=values %newc; @c{qw(stacklow intlow numlow strlow pmclow)} = lowbitmask(@c{qw(stackchunk iregchunk nregchunk sregchunk pregchunk)}); -unlink('testparrotsizes.c', "test_siz$c{exe}", "test$c{o}"); +cleantestc("testparrotsizes"); +unlink('testparrotsizes.c'); unlink("include/parrot/vtable.h"); } @@ -846,10 +875,13 @@ # sub compiletestc { -my $name; -$name = shift; -$name = "test" unless $name; -system("$c{cc} $c{ccflags} -I./include $c{cc_exe_out}test_siz$c{exe} $name.c $c{cc_ldflags} $c{ldflags} $c{libs}") and die "C compiler died!"; +my ($name) = @_; + +my $cmd = "$c{cc} $c{ccflags} -I./include -c $c{ld_out} $name$c{o} $name.c"; +system($cmd) and die "C compiler died! Command was '$cmd'\n"; + +$cmd = "$c{ld} $c{ldflags} $c{libs} $name$c{o} $c{cc_exe_out}$name$c{exe}"; +system($cmd) and die "Linker died! Command was '$cmd'\n"; } @@ -858,9 +890,21 @@ # sub runtestc { -`./test_siz$c{exe}` +my ($name) = @_; + +my $cmd = "$name$c{exe}"; +`./$cmd`; } +# +# cleantestc +# + +sub cleantestc { +my ($name) = @_; + +unlink("$name$c{o}", "$name$c{exe}"); +} # # lowbitmas() --- /dev/null Sat Jul 14 02:37:41 2001 +++ testparrotfuncptr.c Wed Jan 9 02:52:47 2002 @@ -0,0 +1,30 @@ +/* + * testparrotfuncptr.c - figure out if the compiler will let us do + * non-ansi function pointer casts. 
+ */ + +#include + +int a_function(int some_number) { + if (some_number == 42) { + printf("OK\n"); + return 0; + } else { + printf("FAIL\n"); + return -1; + } +} + +typedef int (*func_t)(int); + +in
Re: [PATCH] POST_MORTERM, running.pod [APPLIED]
At 12:36 PM -0800 1/30/02, Steve Fink wrote: >I'm being anal again. Here's an update to docs/running.pod to better >reflect the current state (both the test_parrot and assemble.pl >improvements, plus documentation of a few more things.) And also a >speling fiks s/POST_MORTERM/POST_MORTEM/. > >I could also replace some "perl foo" calls with "./foo" if someone >wanted to set the executable flag in CVS on assemble.pl, optimize.pl, >etc. Applied with some chagrin. Thanks. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Jit on Solaris: using dis instead of objdump?
At 3:27 PM -0500 1/30/02, Andy Dougherty wrote:
>On Solaris, it looks like JIT will now be enabled if the user has also
>installed GNU objdump. However, there is (often) already a disassembler
>in /usr/ccs/bin/dis. Its output is similar, but not identical, to
>objdump's. Is anyone with a Solaris system familiar enough with jit
>internals to have a go at adapting it to use dis instead of GNU objdump?

Apparently in progress even as we type.

-- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: New Todo
On Wed, Jan 30, 2002 at 08:39:17PM +, Alex Gough wrote: > On Wed, 30 Jan 2002, Steve Fink wrote: > > > Any idea what of this will become 0.0.4? > > Is there any chance someone (simon) could make a TODO_FIRST [1], which contains > the goals for our next point release. I'm far too lazy to search through > mailing list archives to find it every time I want it. > > [1] or TODO_NOW or TO_REALLY_DO or JFDI or something. Perhaps a target version for each item? [0.0.4] collision resolution in hashtables [0.0.5] PMC attributes [future] translate Larry's brain to parrot opcodes
Re: New Todo
On Wed, 30 Jan 2002, Steve Fink wrote: > Any idea what of this will become 0.0.4? Is there any chance someone (simon) could make a TODO_FIRST [1], which contains the goals for our next point release. I'm far too lazy to search through mailing list archives to find it every time I want it. [1] or TODO_NOW or TO_REALLY_DO or JFDI or something. Alex Gough
[PATCH] POST_MORTERM, running.pod
I'm being anal again. Here's an update to docs/running.pod to better reflect the current state (both the test_parrot and assemble.pl improvements, plus documentation of a few more things.) And also a speling fiks s/POST_MORTERM/POST_MORTEM/. I could also replace some "perl foo" calls with "./foo" if someone wanted to set the executable flag in CVS on assemble.pl, optimize.pl, etc. Index: docs/running.pod === RCS file: /home/perlcvs/parrot/docs/running.pod,v retrieving revision 1.2 diff -p -u -b -r1.2 running.pod --- docs/running.pod22 Jan 2002 23:57:15 - 1.2 +++ docs/running.pod30 Jan 2002 20:31:32 - @@ -11,10 +11,10 @@ them and modify this document accordingl Converts a Parrot Assembly file to Parrot ByteCode. - assemble.pl foo.pasm > foo.pbc + perl assemble.pl foo.pasm > foo.pbc -Usage information: no usage message available. There is some amount of -malformed POD visible by running C. +Usage information: C. Detailed documentation on the +underlying module can be read with C. =item C @@ -52,7 +52,8 @@ Usage information: none available. =item C -Does something. Use it by running +Causes a segmentation fault and dumps core. (Not its intention, +despite the name!) Use it by running something @@ -64,7 +65,7 @@ Converts a bytecode file to a native .c perl pbc2c.pl foo.pbc > foo.c -Usage information: C +Usage information (and malformed pod error message): C No documentation is available for compiling the .c file to a binary. This works, but produces a binary that crashes: @@ -72,5 +73,26 @@ This works, but produces a binary that c ./assemble.pl examples/assembly/life.pasm > life.pbc perl pbc2c.pl life.pbc > life.c ls **/*.o | egrep -v 'pdump|test_main' | xargs gcc -Iinclude -o life life.c -lm -ldl + +=item B + +C will compile anything that needs to be compiled and run +all standard regression tests. 
To look at a test more closely, run the +appropriate test file in the t/ directory: + + perl -Ilib t/op/basic.t + +To keep a copy of all of the test C<.pasm> and C<.pbc> files +generated, set the environment variable POST_MORTEM to 1: + + POSTMORTEM=1 perl -Ilib t/op/basic.t + ls t/op/basic* + +To run tests with a different dispatcher, edit +C<$Parrot::Config::PConfig{test_prog}> in lib/Parrot/Config.pm: + + 'test_prog' => 'test_parrot -P', + +and then use any of the above methods for running tests. =back Index: lib/Parrot/Test.pm === RCS file: /home/perlcvs/parrot/lib/Parrot/Test.pm,v retrieving revision 1.13 diff -p -u -b -r1.13 Test.pm --- lib/Parrot/Test.pm 30 Jan 2002 11:42:44 - 1.13 +++ lib/Parrot/Test.pm 30 Jan 2002 20:31:32 - @@ -81,7 +81,7 @@ foreach my $func ( keys %Test_Map ) { my $meth = $Test_Map{$func}; my $pass = $Builder->$meth( $prog_output, $output, $desc ); -unless($ENV{POSTMORTERM}) { +unless($ENV{POSTMORTEM}) { foreach my $i ( $as_f, $by_f, $out_f ) { unlink $i; }
Jit on Solaris: using dis instead of objdump?
On Solaris, it looks like JIT will now be enabled if the user has also installed GNU objdump. However, there is (often) already a disassembler in /usr/ccs/bin/dis. Its output is similar, but not identical, to objdump's. Is anyone with a Solaris system familiar enough with jit internals to have a go at adapting it to use dis instead of GNU objdump?

-- Andy Dougherty [EMAIL PROTECTED]
Re: New Todo
Any idea what of this will become 0.0.4?
Re: [ID 20020130.001] Unicode broken for 0x10FFFF
Jarkko Hietaniemi writes:
: > What I notice, though, is that the current code does not warn for
: > characters beyond 0x10, which is definitely a bug.
:
: Ahh, it's all coming back now... warning about such characters
: causes pain in the complementing tr///... have to look at this later.

I think the general policy of Perl should be that it is allowed to think about bad thoughts, because that is the only way to understand what's bad about the bad thoughts Perl receives on input. If there is to be any self-censorship, it should be on the output, I believe. That's why they're called "disciplines", after all. :-)

So it's fine if the default output discipline enforces that the internal representation is transformed to well-formed UTF-8. It's even okay if the default input discipline enforces well-formedness, as long as there's a way to get at the raw badness.

But within Perl, character strings are simply sequences of integers. The internal representation must be optimized for this concept, not for any particular Unicode representation, whether UTF-8 or UTF-16 or UTF-32. Any of these could be used as underlying representations, but the abstraction of sequences of integers must be there explicitly in the internal high-level string API. To oversimplify, the high-level API must not have any parameters whose type contains the string "UTF".

In the absence of other type information, these integers are assumed to be Unicode code points. Additional strictures are possible and even useful, but should not be the default (except for certain operations that are explicitly designed for Unicode.)

For various reasons, some of which relate to the sequence-of-integer abstraction, and some of which relate to "infinite" strings and arrays, I think Perl 6 strings are likely to be represented by a list of chunks, where each chunk is a sequence of integers of the same size or representation, but different chunks can have different integer sizes or representations. The abstract string interface must hide this from any module that wishes to work at the abstract string level. In particular, it must hide this from the regex engine, which works on pure sequences in the abstract.

Note that I did not use the phrase "pure sequences of integers" in the last sentence. The regex engine must not care if it is matching characters from a string of known length, or token objects from an array that is being grown arbitrarily on demand. Matching on UTF-32 is not good enough.

This is just a heads up for some of the stuff in Apocalypse 5. Backtracking behavior will not necessarily be limited to regexes in Perl 6, and if so, we have to consider very carefully how regex backtracking, continuations, and temp variable unifications all work together. (This is part of the reason I pushed earlier for the regex opcodes to be meshed with the normal opcodes.)

I seriously intend that it be trivial to write a Perl parser (or any other parser) in Perl, and that changing a grammar rule be as simple as swapping in a different qr// (or a sub equivalent to a qr//). More generally, I want logic programming to be one of the paradigms that Perl supports. And as usual, I want to support it without forcing it on people who aren't interested.

Sorry I can't be more clear yet. Story of my life. That's the basic problem with the bear-of-very-little-brain approach. So please "bear" with me.

[I've cross-posted because of the wide interest, but I don't want to start a general frenzy cross-posted to all the lists. Please answer specific points in separate messages, and please direct each followup to the appropriate list. Thanks.]

Larry
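A minimal sketch of the "list of chunks" idea described above, assuming a singly linked list of chunks whose elements are 1, 2 or 4 bytes wide. The names chunk, chunked_at and chunked_length are hypothetical and come from neither Perl nor Parrot; the point is only that callers see a sequence of integers and never learn the storage width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk: a run of integers that all share one width. */
typedef struct chunk {
    unsigned width;            /* bytes per element: 1, 2 or 4 */
    size_t length;             /* number of elements in this chunk */
    const void *data;          /* element storage, aligned for its width */
    const struct chunk *next;  /* next chunk, or NULL at the end */
} chunk;

/* Total number of integers in the string, across all chunks. */
size_t chunked_length(const chunk *c)
{
    size_t total = 0;
    for (; c != NULL; c = c->next)
        total += c->length;
    return total;
}

/* Fetch element i of the whole string as a plain integer.
   Returns 0 on success, -1 if i is past the end. */
int chunked_at(const chunk *c, size_t i, uint32_t *out)
{
    for (; c != NULL; c = c->next) {
        if (i < c->length) {
            assert(c->width == 1 || c->width == 2 || c->width == 4);
            switch (c->width) {
            case 1:  *out = ((const uint8_t *)c->data)[i];  break;
            case 2:  *out = ((const uint16_t *)c->data)[i]; break;
            default: *out = ((const uint32_t *)c->data)[i]; break;
            }
            return 0;
        }
        i -= c->length;  /* skip this chunk and keep looking */
    }
    return -1;
}
```

A caller such as a regex engine would use only chunked_length and chunked_at, so a string whose first chunk is bytes and whose second chunk holds 0x10FFFF as a 32-bit value looks like one uniform sequence. Random access here is linear in the number of chunks; a real design would presumably index the chunks.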
Re: [Reposted PATCH] Parrot::Assembler pod clean-up [APPLIED]
At 6:30 PM + 1/30/02, Simon Glover wrote:
> I posted this patch a while back, but it seems to have slipped
> through the cracks. It fixes the POD in Parrot::Assembler so that perldoc
> can read it and just tidies it up generally. It also adds documentation
> for the constantize_integer and constantize_number functions.

Applied, thanks.

-- 
Dan

--"it's like this"---
Dan Sugalski          even samurai
[EMAIL PROTECTED]     have teddy bears and even
                      teddy bears get drunk
Re: New Todo
At 11:06 AM + 1/30/02, Tim Bunce wrote:
>On Wed, Jan 30, 2002 at 07:48:25AM +0100, Paul Johnson wrote:
>> On Tue, Jan 29, 2002 at 09:57:16PM +, Simon Cozens wrote:
>>
>> > I've started a new TODO list. Remind me of anything else that needs
>> > doing;
>>
>> Sandboxes.
>>
>> Has anyone given any thought as to whether Parrot should support
>> "use Safe", and if so, how?
>
>And remember that Safe is built on ops (ops.pm etc) and ops is very
>useful in its own right (eg for allowing limited perl ops in a config file).

And our safe interpreter will use the same sort of mechanism. (Though you won't necessarily be able to do that from within a safe interpreter. Some stuff won't be overridable, but that's fine.)

-- 
Dan

--"it's like this"---
Dan Sugalski          even samurai
[EMAIL PROTECTED]     have teddy bears and even
                      teddy bears get drunk
[Reposted PATCH] Parrot::Assembler pod clean-up
I posted this patch a while back, but it seems to have slipped through the cracks. It fixes the POD in Parrot::Assember so that perldoc can read it and just tidies it up generally. It also adds documentation for the constantize_integer and constantize_number functions. Simon --- lib/Parrot/Assembler.pm.old Wed Jan 30 18:20:46 2002 +++ lib/Parrot/Assembler.pm Wed Jan 30 18:21:51 2002 @@ -67,6 +67,7 @@ output_listing() if $options{'listing'}; exit 0; +=cut ### ### @@ -85,6 +86,7 @@ my $pf = $asm->assemble($code); exit $interp->run($pf); +=cut ### ### @@ -105,8 +107,8 @@ =head2 %type_to_suffix -type_to_suffix is used to change from an argument type to the suffix that -would be used in the name of the function that contained that argument. +This is used to change from an argument type to the suffix that would be +used in the name of the function that contained that argument. =cut @@ -120,26 +122,26 @@ =head2 @program -@program will hold an array ref for each line in the program. Each array ref -will contain: +This holds an array ref for each line in the program. Each array ref +contains: =over 4 =item 1 -The file name in which the source line was found +The file name in which the source line was found. =item 2 -The line number in the file of the source line +The line number in the file of the source line. =item 3 -The chomped source line without beginning and ending spaces +The chomped source line without beginning and ending spaces. =item 4 -The chomped source line +The chomped source line. =back @@ -150,25 +152,17 @@ ### -=head2 $output -=head2 $listing -=head2 $bytecode - -=over 4 - -=item $output - -will be what is output to the bytecode file. +=head2 $output -=item $listing +What is output to the bytecode file. -will be what is output to the listing file. +=head2 $listing -=item $bytecode +What is output to the listing file. -is the program's bytecode (executable instructions). +=head2 $bytecode -=back +The program's bytecode (executable instructions). 
=cut @@ -177,14 +171,10 @@ ### -=head2 $file -=head2 $line -=head2 $pline -=head2 $sline - -$file, $line, $pline, and $sline are used to reference information from the -@program array. Please look at the comments for @program for the description -of each. +=head2 $file, $line, $pline, $sline + +These variables are used to reference information from the C<@program> array. +Please look at the comments for C<@program> for the description of each. =cut @@ -194,41 +184,31 @@ ### =head2 %label -=head2 %fixup -=head2 %macros -=head2 %local_label -=head2 %local_fixup -=head2 $last_label - -=over 4 - -=item %label -will hold each label and the PC at which it was defined. +This holds each label and the PC at which it was defined. -=item %fixup - -will hold labels that have not yet been defined, where they are used in -the source code, and the PC at that point. It is used for backpatching. +=head2 %fixup -=item %macros +This holds labels that have not yet been defined, the position they are +used in the source code, and the PC at that point. It is used for +backpatching. -will map a macro name to an array of program lines with the same format -as @program. +=head2 %macros -=item %local_label +This maps a macro name to an array of program lines with the same format +as C<@program>. -will hold local label definitions, +=head2 %local_label -=item %local_fixup +This holds local label definitions. -will hold the occurances of local labels in the source file. +=head2 %local_fixup -=item $last_label +This holds the occurrences of local labels in the source file. -is the name of the last label seen +=head2 $last_label -=back +This the name of the last label seen. =cut @@ -238,10 +218,12 @@ ### =head2 $pc + +This is the current program counter. + =head2 $op_pc -pc is the current program counter. op_pc is the program counter for the most -recent operator. +This is the program counter for the most recent operator. 
=cut @@ -251,11 +233,13 @@ ### =head2 %constants + +This maps the name of each constant to its index in the constant table. + =head2 @constants -%constants is a map of constant name to index in the constant table -@constant
Re: [PATCH] Post-reorganization clearup [APPLIED]
At 5:53 PM + 1/30/02, Simon Glover wrote:
> Many of the Perl scripts in the distribution (including assemble.pl !)
> can no longer find the Parrot::* modules. Enclosed patch fixes (although
> it would be nice if there were an easier way to do this).

Applied, thanks.

-- 
Dan

--"it's like this"---
Dan Sugalski          even samurai
[EMAIL PROTECTED]     have teddy bears and even
                      teddy bears get drunk
Re: [PATCH] Post-reorganization clearup
Oops, scratch a couple of those; the pmc2c.pl one's not necessary, and I see Daniel's already patched pbc2c.pl.

Simon
[PATCH] Post-reorganization clearup
Many of the Perl scripts in the distribution (including assemble.pl !) can no longer find the Parrot::* modules. Enclosed patch fixes (although it would be nice if there were an easier way to do this).

Simon

--- languages/scheme/Scheme/Test.pm.old Wed Jan 30 17:42:08 2002
+++ languages/scheme/Scheme/Test.pm Wed Jan 30 17:42:11 2002
@@ -4,7 +4,7 @@
 use strict;
 use vars qw(@EXPORT @ISA);
-use lib '../..';
+use lib '../../lib';
 use Parrot::Config;
 require Exporter;
--- classes/pmc2c.pl.old Wed Jan 30 17:35:26 2002
+++ classes/pmc2c.pl Wed Jan 30 17:35:51 2002
@@ -6,6 +6,7 @@
 #
 use FindBin;
+use lib 'lib';
 use lib "$FindBin::Bin/..";
 use lib "$FindBin::Bin/../lib";
 use Parrot::Vtable;
--- classes/genclass.pl.old Wed Jan 30 17:35:19 2002
+++ classes/genclass.pl Wed Jan 30 17:35:41 2002
@@ -1,6 +1,7 @@
 # $Id: genclass.pl,v 1.7 2002/01/04 16:09:01 dan Exp $
 use FindBin;
+use lib 'lib';
 use lib "$FindBin::Bin/..";
 use Parrot::Vtable;
 my %vtbl = parse_vtable("$FindBin::Bin/../vtable.tbl");
--- assemble.pl.old Wed Jan 30 17:09:20 2002
+++ assemble.pl Wed Jan 30 17:14:41 2002
@@ -6,6 +6,7 @@
 #
 use strict;
+use lib 'lib';
 use Parrot::Assembler;
 init_assembler(@ARGV);
--- disassemble.pl.old Wed Jan 30 17:09:26 2002
+++ disassemble.pl Wed Jan 30 17:14:53 2002
@@ -12,7 +12,7 @@
 #
 use strict;
-
+use lib 'lib';
 use Parrot::Config;
 use Parrot::OpLib::core;
--- optimizer.pl.old Wed Jan 30 17:26:14 2002
+++ optimizer.pl Wed Jan 30 17:26:17 2002
@@ -1,7 +1,7 @@
 #!/usr/bin/perl -w
 use strict;
-use lib '.';
+use lib 'lib';
 use Parrot::Optimizer;
 my $file = $ARGV[0];
--- pbc2c.pl.old Wed Jan 30 17:26:57 2002
+++ pbc2c.pl Wed Jan 30 17:27:02 2002
@@ -12,6 +12,7 @@
 #
 use strict;
+use lib 'lib';
 use Parrot::Types;
 use Parrot::PackFile;
Re: parrot rx engine
On Wed, Jan 30, 2002 at 09:32:49AM -0800, Brent Dax wrote:
> # rx_setprops P0, "i", 2
> # branch $start0
> # $advance:
> # rx_advance P0, $fail
> # $start0:
> # rx_literal P0, "a", $advance
> #
> # First, we set the rx engine to case-insensitive. Why is that bad? It's
> # setting a runtime property for what should be compile-time
> # unicode-character-kung-fu. Assuming your "CPU" knows what the gritty
> # details of unicode in the first place just feels wrong, but I digress.
>
> That "i" does a once-off case-folding operation on the target string.
> All other input to the engine MUST already be case-folded for speed.

Hm, is that going to work? What about an rx like /^a(?i:b)C/, where the case insensitivity applies to only part of the pattern?

> # Mostly, I'd like to hear how either Unicode character-ranges aren't
> # deterministic at compile-time (I doubt that) or how crippling to
>
> One word: locale.

How did I know you would say that :)

Graham.
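The concern about /^a(?i:b)C/ can be made concrete with a small sketch. It assumes ASCII-only input and a pattern made solely of literals, each carrying its own case flag; lit, match_prefix and fold_all are hypothetical names and not part of rx.ops. Folding the whole target once, as a global "i" would, destroys the information the case-sensitive "C" needs.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* One literal of a pattern, with its own case-sensitivity flag. */
typedef struct {
    char ch;
    int fold;   /* nonzero: compare case-insensitively */
} lit;

/* Anchored match of n literals against the start of s (ASCII only). */
int match_prefix(const lit *pat, size_t n, const char *s)
{
    assert(pat != NULL && s != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[i];
        unsigned char p = (unsigned char)pat[i].ch;
        if (c == '\0')
            return 0;                       /* target ran out */
        if (pat[i].fold) {
            if (tolower(c) != tolower(p))
                return 0;
        } else if (c != p) {
            return 0;
        }
    }
    return 1;
}

/* Fold the whole target in place, as a one-off global fold would. */
void fold_all(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)tolower((unsigned char)*s);
}
```

With the pattern {a, (?i:b), C}, the targets "abC" and "aBC" match and "abc" does not. After fold_all, "aBC" becomes "abc" and the match fails, so a per-pattern-element flag (or separate folded and unfolded views of the target) seems to be needed for scoped case insensitivity.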
Re: parrot rx engine
begin quote from Ashley Winters: > I think that's exactly what you should be doing! Neither parrot nor the > rx engine should try to be a full compiler. The rx engine definitely > should have opcodes in the virtual machine, but those opcodes should > simply contain state-machine/backtracking info, not godly unicode info. If you want to hear how much fun there is to be had in bolting on Unicode semantics from a language level to a regular expression engine that doesn't have them built-in, I'll buy Jarkko a double whisky and send him in your direction. I don't think you really want that. I agree we should have a black-box regular expression engine. I believe, however, it should conform to Unicode Technical Report #18. Because believe me, if it doesn't do so out of the box, there's no hope it ever will. -- int three = 128+64, two = 128, one=64; - plan9 has a bad day
RE: parrot rx engine
Ashley Winters: # Who the hell am I? # I've been only a weblog-lurker till now. It's been a couple # years since # I last contributed to Perl5. I just read the latest Apocalypse and it # inspired me to get a parrot snapshot and look around. Welcome back to the land of the living. :^) # What's my beef? # I don't like the rx_literal and rx_oneof ops, and I don't like how the # "on parrot strings" thread is being related to the regex # engine. I know # how useless "wouldn't it be nice" messages are, so please # understand my # advocacy on this is more than just raving lunacy. # # Basically, I see a black-box being built in the interests of speed. # Voodoo array formats, bitmaps, and other such things to avoid actually # spelling out what the regular expression is doing *in parrot code*. # # What the hell am I talking about? # Let me cut&paste some code from the great rx.ops # documentation (million # thanks to the authors). And a million "you're welcome"s. :^) (To be fair, japhy and Angel Faus helped with the design a LOT.) # rx_setprops P0, "i", 2 # branch $start0 # $advance: # rx_advance P0, $fail # $start0: # rx_literal P0, "a", $advance # # First, we set the rx engine to case-insensitive. Why is that bad? It's # setting a runtime property for what should be compile-time # unicode-character-kung-fu. Assuming your "CPU" knows what the gritty # details of unicode in the first place just feels wrong, but I digress. That "i" does a once-off case-folding operation on the target string. All other input to the engine MUST already be case-folded for speed. # Next, a branch. No problem there. # # Next, a comparison between the string in P0, and whatever the hell "a" # means. In this case, it probably means at least: # # 0041 LATIN CAPITAL LETTER A # 0061 LATIN SMALL LETTER A # FF21 FULLWIDTH LATIN CAPITAL LETTER A # FF41 FULLWIDTH LATIN SMALL LETTER A Which have been case-folded and normalized (Normalization Form KC, probably) to LATIN SMALL LETTER A. 
# If you include various diacritic thingies through some voodoo # /switch-fu # # LATIN CAPITAL LETTER A WITH .* # LATIN SMALL LETTER A WITH .* # AKA. # 0041, 0061, 00C0-00C5, 00E0-00E5, 0100-0105, # 01CD-01E1, 01FA, 01FB, 0200-0203, 0226, 0227, # 1E00, 1E01, 1E9A, 1EA0-1EB7, and perhaps more. Once again, the switch-fu plus normalization will have converted all of that to LATIN SMALL LETTER A. # Now, the current CVS rx engine is/would do this at runtime. I # read that # someone else is working on doing that at compile-time (a # necessity) and # caching the results in some data structure. The "some data structure" # part bothers me. Using Perl to create "some data structure" which is # needed by C seems dubious at best. Whatever. Moving on... Why? If you know what info is contained in the "some data structure" and it gives you a speedup, who cares? # What I see is that rx_literal is a speed hack to avoid compiling this # into parrot code: That's more or less true. Speed hacks are extremely important in regex engines. # given $a_utf32_code_point { # when U+41 {} # when U+61 {} # when U+C0 <= $_ <= U+C5 {} # when U+E0 <= $_ <= U+E5 {} # # . on and on and on . # default { next ON_SOME_LOOP } # } # # I think that's exactly what you should be doing! Neither # parrot nor the # rx engine should try to be a full compiler. The rx engine definitely # should have opcodes in the virtual machine, but those opcodes should # simply contain state-machine/backtracking info, not godly # unicode info. This "godly Unicode info" is actually limited to "call a transcoding function, a normalizing function, and a case-folding function if /i is used". All of this information is built-in to the string library from the start. # If you want to optimize a regular expression, you should write that # optimizer in Perl6, or Python, or Scheme, or whatever, not in C. 
C will almost always be faster with this sort of thing than Perl or any other language, simply because of all the extra layers of Stuff you'll have to go through in a higher-level language. # So, what am I saying? # # Once you squash rx_literal and friends, any attempt to benchmark the # "rx" engine really becomes a benchmark of parrot itself. When # you speed # up parrot, you speed up regular expressions. Voila, no more black box. This is true already. The regex engine consists of normal opcodes, so a fast, general Parrot opcode dispatch speedup will speed up regexes, too. # If Parrot is just too damn slow for you, whip out libmylang and do the # nitty gritty yourself. Since this is mostly a "just don't do it" post, # no code is actually *required* from me, right? :) # # Here is an example based on the rx.ops example written in # psuedo-Perl6.apocalypse.4. It may or may not be relevant. Any errors # are my own. Any formatting problems are my fault for using an inferior # mail system. This is purely for
RE: parrot rx engine
Ashley Winters wrote:
>First, we set the rx engine to case-insensitive. Why is that bad? It's
>setting a runtime property for what should be compile-time
>unicode-character-kung-fu. Assuming your "CPU" knows what the gritty
>details of unicode in the first place just feels wrong, but I digress.

I tend to agree with that. Many run-time options can be turned into compile-time versions of the opcodes, which will hopefully produce a speed increase.

>Once you squash rx_literal and friends, any attempt to benchmark the
>"rx" engine really becomes a benchmark of parrot itself. When you speed
>up parrot, you speed up regular expressions. Voila, no more black box.
>If Parrot is just too damn slow for you, whip out libmylang and do the
>nitty gritty yourself. Since this is mostly a "just don't do it" post,
>no code is actually *required* from me, right? :)

We are already doing so.

What you are suggesting, in fact, is to compile regular expressions down to Perl code (and that, in turn, to Parrot). This will always be slower than generating Parrot directly, because some Perl features prevent the heavy use of some optimizations (think JIT) that are necessary if we want good regex performance.

In other words: with your proposal, a better general-purpose optimizer will give you better regex performance, but it will always remain worse than the current state.

If what you are suggesting is that everything be compiled to general-purpose opcodes (branch, unicode's, etc.) [which is what follows from your words, but not from your examples], I still believe this to be a performance mistake. It would dramatically reduce the code density, and no matter how fast Parrot dispatch is, this will kill your performance.

And using too many stacks (as the use of exceptions would probably require) will also be too slow (as Brent Dax showed me when we were discussing our two regex opcode designs).

Just my 2 cents (of euros) :)

---
Angel faus
[EMAIL PROTECTED]
Re: parrot rx engine
>Basically, I see a black-box being built in the interests of speed. >Voodoo array formats, bitmaps, and other such things to avoid actually >spelling out what the regular expression is doing *in parrot code*. [snip] >What I see is that rx_literal is a speed hack to avoid compiling this >into parrot code: [snip] >I think that's exactly what you should be doing! Neither parrot nor the >rx engine should try to be a full compiler. The rx engine definitely >should have opcodes in the virtual machine, but those opcodes should [snip] >Once you squash rx_literal and friends, any attempt to benchmark the >"rx" engine really becomes a benchmark of parrot itself. When you speed >up parrot, you speed up regular expressions. Voila, no more black box. >If Parrot is just too damn slow for you, whip out libmylang and do the This is a serious reply, I'm not taking potshots, but correct me if I'm wrong: by your argument, we should implement lots of other black boxes in "parrot" rather than C such as anything that is not a basic low level call (for example upper layer IO system, buffering, etc.). Otherwise I'm unsure where you think a black box is appropriate and where it isn't. -Melvin
Re: parrot rx engine
On Wed, Jan 30, 2002 at 08:13:55AM -0800, Ashley Winters wrote:
> I think that's exactly what you should be doing! Neither parrot nor the
> rx engine should try to be a full compiler. The rx engine definitely
> should have opcodes in the virtual machine, but those opcodes should
> simply contain state-machine/backtracking info, not godly unicode info.

So, basically, you just want to push Unicode onto the language that sits atop parrot. If that language were Perl, for instance, you'd advocate that everywhere the user had written /a/ be replaced (by the Perl compiler) with the big long "given" you described? Have I got that right?

Excerpt from Apocalypse 2:

    Perl 6 programs are notionally written in Unicode, and assume
    Unicode semantics by default even when they happen to be processing
    other character sets behind the scenes. Note that when we say that
    Perl is written in Unicode, we're speaking of an abstract character
    set, not any particular encoding. (The typical program will likely
    be written in UTF-8 in the West, and in some 16-bit character set
    in the East.)

It seems to me that in order for Perl 6 programs to be written in Unicode, Parrot needs to grok Unicode (everywhere, including regular expressions).

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]
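To make the "big long given" concrete, here is a sketch of what a compiler might emit for a case-insensitive /a/ if it expanded the literal into code point ranges itself. It uses only the first four ranges quoted in the thread (U+41, U+61, U+C0..U+C5, U+E0..U+E5); the names cp_range, LATIN_A and cp_in_ranges are hypothetical, and the table is illustrative, not a complete Unicode answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive range of code points. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} cp_range;

/* Sorted, non-overlapping ranges taken from the example in the thread. */
const cp_range LATIN_A[] = {
    {0x41, 0x41},   /* LATIN CAPITAL LETTER A */
    {0x61, 0x61},   /* LATIN SMALL LETTER A */
    {0xC0, 0xC5},   /* capital A with diacritics */
    {0xE0, 0xE5},   /* small a with diacritics */
};
const size_t LATIN_A_COUNT = sizeof LATIN_A / sizeof LATIN_A[0];

/* Binary search: returns 1 if cp falls inside any range, else 0. */
int cp_in_ranges(const cp_range *r, size_t n, uint32_t cp)
{
    size_t lo = 0, hi = n;
    assert(r != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < r[mid].lo)
            hi = mid;
        else if (cp > r[mid].hi)
            lo = mid + 1;
        else
            return 1;
    }
    return 0;
}
```

Whether such a table lives in compiler output or behind an opcode like rx_literal is exactly the disagreement in this thread; the sketch only shows that the per-character test itself is cheap once the ranges are known.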
parrot rx engine
Hello p6i, Who the hell am I? I've been only a weblog-lurker till now. It's been a couple years since I last contributed to Perl5. I just read the latest Apocalypse and it inspired me to get a parrot snapshot and look around. What's my beef? I don't like the rx_literal and rx_oneof ops, and I don't like how the "on parrot strings" thread is being related to the regex engine. I know how useless "wouldn't it be nice" messages are, so please understand my advocacy on this is more than just raving lunacy. Basically, I see a black-box being built in the interests of speed. Voodoo array formats, bitmaps, and other such things to avoid actually spelling out what the regular expression is doing *in parrot code*. What the hell am I talking about? Let me cut&paste some code from the great rx.ops documentation (million thanks to the authors). rx_setprops P0, "i", 2 branch $start0 $advance: rx_advance P0, $fail $start0: rx_literal P0, "a", $advance First, we set the rx engine to case-insensitive. Why is that bad? It's setting a runtime property for what should be compile-time unicode-character-kung-fu. Assuming your "CPU" knows what the gritty details of unicode in the first place just feels wrong, but I digress. Next, a branch. No problem there. Next, a comparison between the string in P0, and whatever the hell "a" means. In this case, it probably means at least: 0041 LATIN CAPITAL LETTER A 0061 LATIN SMALL LETTER A FF21 FULLWIDTH LATIN CAPITAL LETTER A FF41 FULLWIDTH LATIN SMALL LETTER A If you include various diacritic thingies through some voodoo /switch-fu LATIN CAPITAL LETTER A WITH .* LATIN SMALL LETTER A WITH .* AKA. 0041, 0061, 00C0-00C5, 00E0-00E5, 0100-0105, 01CD-01E1, 01FA, 01FB, 0200-0203, 0226, 0227, 1E00, 1E01, 1E9A, 1EA0-1EB7, and perhaps more. Now, the current CVS rx engine is/would do this at runtime. I read that someone else is working on doing that at compile-time (a necessity) and caching the results in some data structure. 
The "some data structure" part bothers me. Using Perl to create "some data structure" which is needed by C seems dubious at best. Whatever. Moving on... What I see is that rx_literal is a speed hack to avoid compiling this into parrot code: given $a_utf32_code_point { when U+41 {} when U+61 {} when U+C0 <= $_ <= U+C5 {} when U+E0 <= $_ <= U+E5 {} # . on and on and on . default { next ON_SOME_LOOP } } I think that's exactly what you should be doing! Neither parrot nor the rx engine should try to be a full compiler. The rx engine definitely should have opcodes in the virtual machine, but those opcodes should simply contain state-machine/backtracking info, not godly unicode info. If you want to optimize a regular expression, you should write that optimizer in Perl6, or Python, or Scheme, or whatever, not in C. So, what am I saying? Once you squash rx_literal and friends, any attempt to benchmark the "rx" engine really becomes a benchmark of parrot itself. When you speed up parrot, you speed up regular expressions. Voila, no more black box. If Parrot is just too damn slow for you, whip out libmylang and do the nitty gritty yourself. Since this is mostly a "just don't do it" post, no code is actually *required* from me, right? :) Here is an example based on the rx.ops example written in psuedo-Perl6.apocalypse.4. It may or may not be relevant. Any errors are my own. Any formatting problems are my fault for using an inferior mail system. This is purely for entertainment purposes, no warranty or specification expressed or implied. #!/usr/bin/perl6 # /ab*[cd]+/i sub match ($string) { return false if $string.length < 2; my $r = rx::allocateinfo($string); ADVANCE: loop { NEXT { $r.advance or last ADVANCE } # /a/ given $r.current_code_point { # whatever when U+41, U+61 {} # it's an "a" default { next ADVANCE } } # rx_literal used to move the current pointer # upon success. replace with rx_next_code_point? 
$r.next_code_point; $r.pushmark;# backtracking starts here # /b*/ loop { NEXT { $r.next_code_point or last; $r.pushindex } given $r.current_code_point { when U+42, U+62 {} # it's a "b" # one-to-many unicode ops resolved at compile-time? # when $_ =~ any(toupper(U+62), tolower(U+62), # totitle(U+62), tofold(U+62)) {} default { last } } } loop { NEXT { if $r.distance_from_last_index > 1 { return true; # success... } else { $r.popindex or last; # backtrack or start over } } # /[cd]+/ loop { NEXT { $r.next_code_point or last }
Re: flags in io/io_unix.c
At 10:16 AM 1/30/2002 -0500, Andy Dougherty wrote:
>Sun's compiler is (rightly) complaining about the following lines in
>io/io_unix.c:
>
>PIO_unix_fdopen() is defined to take a UINTVAL fourth argument:
>
> ParrotIO * PIO_unix_fdopen(theINTERP, ParrotIOLayer * layer,
> PIOHANDLE fd, UINTVAL flags);
>
>but it is later called with a string fourth argument, e.g.:
>
> PIO_unix_fdopen(interpreter, layer, STDIN_FILENO, "<"))
>
>Does anyone know the actual intent? Which one is right?

Yep, that's my bug. Low-level fdopen should take an integer flags value, not a string. I'll commit a fix.

-Melvin
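One way to keep the low-level call integer-only is to translate a Perl-style mode string into flag bits in a higher layer before calling down. The sketch below is an assumption about how that could look; the SK_F_* constants and sk_parse_mode are hypothetical and are not Parrot's actual flag names or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for the sketch (not Parrot's real values). */
#define SK_F_READ   (1u << 0)
#define SK_F_WRITE  (1u << 1)
#define SK_F_APPEND (1u << 2)

/* Translate a mode string such as "<", ">", ">>" or "+<" into flags.
   Returns 0 if the string is empty or contains an unknown character. */
unsigned sk_parse_mode(const char *mode)
{
    unsigned flags = 0;
    assert(mode != NULL);
    for (const char *p = mode; *p != '\0'; p++) {
        switch (*p) {
        case '<':
            flags |= SK_F_READ;
            break;
        case '>':
            flags |= SK_F_WRITE;
            if (p[1] == '>') {          /* ">>" means append */
                flags |= SK_F_APPEND;
                p++;
            }
            break;
        case '+':
            flags |= SK_F_READ | SK_F_WRITE;
            break;
        default:
            return 0;                   /* unknown mode character */
        }
    }
    return flags;
}
```

With something like this, the call quoted above would pass sk_parse_mode("<"), or more simply the read flag itself, as the fourth argument instead of the string.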
Re: [PATCH] MANIFEST update [APPLIED]
On Wed, 30 Jan 2002, Simon Glover wrote: > > Enclosed patch adds the new SPARC-based JIT files to the manifest, > and also puts it back into alphabetical order. > > Simon > Applied thanks.
flags in io/io_unix.c
Sun's compiler is (rightly) complaining about the following lines in io/io_unix.c:

PIO_unix_fdopen() is defined to take a UINTVAL fourth argument:

    ParrotIO * PIO_unix_fdopen(theINTERP, ParrotIOLayer * layer,
                               PIOHANDLE fd, UINTVAL flags);

but it is later called with a string fourth argument, e.g.:

    PIO_unix_fdopen(interpreter, layer, STDIN_FILENO, "<"))

Does anyone know the actual intent? Which one is right?

-- 
Andrew Dougherty [EMAIL PROTECTED]
Re: [PATCH] Clean-up warnings
On Wed, Jan 30, 2002 at 01:15:42PM +, Simon Glover wrote: > > This patch clears up warnings in embed.c and test_main.c coming > from function declarations of the form: Thanks applied (with modifications, in that the functions in test_main are now declared static). Nicholas Clark -- EMCFT http://www.ccl4.org/~nick/CV.html
[PATCH] Clean-up warnings
This patch clears up warnings in embed.c and test_main.c coming from function declarations of the form:

    void foobar();

which should properly be

    void foobar(void);

Simon

--- test_main.c.old Tue Jan 29 20:00:01 2002
+++ test_main.c Wed Jan 30 12:05:36 2002
@@ -18,10 +18,10 @@
 parseflags(struct Parrot_Interp *interpreter, int *argc, char **argv[]);
 
 void
-usage();
+usage(void);
 
 void
-version();
+version(void);
 
 int
 main(int argc, char *argv[]) {
--- embed.c.old Wed Jan 30 12:10:03 2002
+++ embed.c Wed Jan 30 12:13:53 2002
@@ -19,7 +19,7 @@
 static BOOLVAL world_inited=0;
 
 struct Parrot_Interp *
-Parrot_new() {
+Parrot_new(void) {
 if(!world_inited) {
 world_inited=1;
 init_world();
[PATCH] interp_new and Parrot_new
In embed.h, we declare a function:

    struct Parrot_Interp *interp_new();

that's never subsequently used. On the other hand, in embed.c, we use a function

    struct Parrot_Interp *
    Parrot_new() { ...

that isn't previously declared. Are these supposed to be the same thing? If so, the patch below fixes up the header.

Simon

--- include/parrot/embed.h.old Wed Jan 30 12:03:38 2002
+++ include/parrot/embed.h Wed Jan 30 12:15:37 2002
@@ -31,7 +31,7 @@
 struct Parrot_Interp;
 struct PackFile;
-struct Parrot_Interp *interp_new();
+struct Parrot_Interp *Parrot_new(void);
 void Parrot_init(struct Parrot_Interp *);
Re: CVS Reorganization Complete
begin quote from Simon Cozens: > > - Moved t/op/pmc* to t/op/pmc/ Sorry. Bad instructions. t/op/pmc/pmc_(.*) should become t/pmc/$1. Thanks to Ask for fixing this. I'll fix the MANIFEST. -- The warly race may riches chase, An' riches still may fly them, O; An' tho' at last they catch them fast, Their hearts can ne'er enjoy them, O.
[PATCH] MANIFEST update
Enclosed patch adds the new SPARC-based JIT files to the manifest, and also puts it back into alphabetical order.

Simon

--- MANIFEST.old Wed Jan 30 11:42:42 2002
+++ MANIFEST Wed Jan 30 11:46:42 2002
@@ -6,35 +6,9 @@
 MANIFEST
 Makefile.in
 NEWS
-lib/Parrot/Assembler.pm
-lib/Parrot/BuildUtil.pm
-lib/Parrot/Jit/alpha-bsd.pm
-lib/Parrot/Jit/alpha-linux.pm
-lib/Parrot/Jit/alphaGeneric.pm
-lib/Parrot/Jit/i386-bsd.pm
-lib/Parrot/Jit/i386-linux.pm
-lib/Parrot/Jit/i386-nojit.pm
-lib/Parrot/Jit/i386Generic.pm
-lib/Parrot/Op.pm
-lib/Parrot/OpTrans.pm
-lib/Parrot/OpTrans/C.pm
-lib/Parrot/OpTrans/CGoto.pm
-lib/Parrot/OpTrans/CPrederef.pm
-lib/Parrot/OpsFile.pm
-lib/Parrot/Optimizer.pm
-lib/Parrot/PackFile.pm
-lib/Parrot/PackFile/ConstTable.pm
-lib/Parrot/PackFile/Constant.pm
-lib/Parrot/PackFile/FixupTable.pm
-lib/Parrot/String.pm
-lib/Parrot/Test.pm
-lib/Parrot/Vtable.pm
 README
 README.OS_X
 TODO
-lib/Test/Builder.pm
-lib/Test/More.pm
-lib/Test/Simple.pm
 Types_pm.in
 VERSION
 assemble.pl
@@ -147,6 +121,9 @@
 jit/i386/core.jit
 jit/i386/lib.jit
 jit/i386/string.jit
+jit/sun4/core.jit
+jit/sun4/lib.jit
+jit/sun4/string.jit
 jit2h.pl
 key.c
 languages/Makefile.in
@@ -189,6 +166,34 @@
 languages/scheme/t/harness
 languages/scheme/t/io/basic.t
 languages/scheme/t/logic/basic.t
+lib/Parrot/Assembler.pm
+lib/Parrot/BuildUtil.pm
+lib/Parrot/Jit/alpha-bsd.pm
+lib/Parrot/Jit/alpha-linux.pm
+lib/Parrot/Jit/alphaGeneric.pm
+lib/Parrot/Jit/i386-bsd.pm
+lib/Parrot/Jit/i386-linux.pm
+lib/Parrot/Jit/i386-nojit.pm
+lib/Parrot/Jit/i386Generic.pm
+lib/Parrot/Jit/sun4-solaris.pm
+lib/Parrot/Jit/sun4Generic.pm
+lib/Parrot/Op.pm
+lib/Parrot/OpTrans.pm
+lib/Parrot/OpTrans/C.pm
+lib/Parrot/OpTrans/CGoto.pm
+lib/Parrot/OpTrans/CPrederef.pm
+lib/Parrot/OpsFile.pm
+lib/Parrot/Optimizer.pm
+lib/Parrot/PackFile.pm
+lib/Parrot/PackFile/ConstTable.pm
+lib/Parrot/PackFile/Constant.pm
+lib/Parrot/PackFile/FixupTable.pm
+lib/Parrot/String.pm
+lib/Parrot/Test.pm
+lib/Parrot/Vtable.pm
+lib/Test/Builder.pm
+lib/Test/More.pm
+lib/Test/Simple.pm
 make.pl
 make_vtable_ops.pl
 manicheck.pl
Re: [PATCH lib/Parrot/Test.pm] More info about failed compiles
begin quote from Michael G Schwern: > This little patch makes command failures in tests (ie. if Parrot pukes > on compile) report the command and exit code like so: Thank you. This is quite wonderful. -- I want you to know that I create nice things like this because it pleases the Author of my story. If this bothers you, then your notion of Authorship needs some revision. But you can use perl anyway. :-) - Larry Wall
[PATCH lib/Parrot/Test.pm] More info about failed compiles
This little patch makes command failures in tests (ie. if Parrot pukes on compile) report the command and exit code like so:

    # 'perl assemble.pl t/op/basic2.pasm --output t/op/basic2.pbc' failed with exit code 1

I don't know if that's informative enough, but it's a start anyway.

--- lib/Parrot/Test.pm 29 Jan 2002 02:32:15 - 1.12
+++ lib/Parrot/Test.pm 30 Jan 2002 11:38:11 -
@@ -37,6 +37,8 @@
 }
 
 system "$^X -e \"$redir_string;system q{$command};\"";
+ my $exit_code = $? / 256;
+ $Builder->diag("'$command' failed with exit code $exit_code") if $exit_code;
 }
 
 my $count;

-- 
Michael G. Schwern <[EMAIL PROTECTED]> http://www.pobox.com/~schwern/
Perl Quality Assurance <[EMAIL PROTECTED]> Kwalitee Is Job One
sort God kill 9, @ARGV;
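The patch above derives the exit code as $? / 256. As a hedged aside: Perl's documentation describes $? as a 16-bit wait status with the exit value in the high byte and the terminating signal in the low seven bits, so the usual idiom is a shift rather than a division. The C sketch below mirrors that layout with hypothetical helper names; it illustrates the bit layout only and is not a replacement for the patch.

```c
#include <assert.h>

/* Exit value stored in the high byte of a 16-bit wait status. */
int exit_code_from_status(int status)
{
    int code = (status >> 8) & 0xff;
    assert(code >= 0 && code <= 255);
    return code;
}

/* Terminating signal stored in the low seven bits (0 if none). */
int signal_from_status(int status)
{
    return status & 0x7f;
}
```

For a plain "exit 1" the status is 256 and both approaches give 1. For a child killed by signal 9 the status is 9: the shift gives exit code 0 and signal 9, whereas floating-point division in Perl gives 9 / 256 = 0.03515625, which is true and would be reported as the exit code.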
Re: New Todo
On Wed, Jan 30, 2002 at 07:48:25AM +0100, Paul Johnson wrote:
> On Tue, Jan 29, 2002 at 09:57:16PM +, Simon Cozens wrote:
>
> > I've started a new TODO list. Remind me of anything else that needs
> > doing;
>
> Sandboxes.
>
> Has anyone given any thought as to whether Parrot should support
> "use Safe", and if so, how?

And remember that Safe is built on ops (ops.pm etc) and ops is very useful in its own right (eg for allowing limited perl ops in a config file).

Tim.
Re: CVS Reorganization Complete
begin quote from Robert Spier:
> - Renamed include/parrot/register_funcs.h to regfuncs.h
> - Renamed languages/miniperl/miniperlc to mpc
> - Moved t/op/pmc* to t/op/pmc/
> - Moved Parrot/* to lib/Parrot/
> - Moved Test/* to lib/Test/

Thanks to Robert and Jeff for this reorganisation and the post-reorg cleanups.

-- 
Look, there are only a few billion people in the world, right? And they can only possibly know a few thousand bits of information not known by someone else, right? So the human race will never have a real need for more than a few terabits of storage, except possibly as cache.
    - Geraint Jones