[PATCH] Regular expression stacks
The attached patch adds a new stack type that only handles INTVALs. These are much more efficient than generic stacks--on Win32 they shave a few ten-thousandths of a second off each run of the rx_popindex op, and take a full hundredth of a second off the benchmark. It also shows performance improvements on BSD. They also take up less memory. All tests pass on both platforms; one warning is removed (as a side effect of the modified interface for regex stacks) and no new ones are introduced.

--Brent Dax
[EMAIL PROTECTED]
Parrot Configure pumpking and regex hacker

<obra> . hawt sysadmin chx0rs
<lathos> This is sad. I know of *a* hawt sysamin chx0r.
<obra> I know more than a few.
<lathos> obra: There are two? Are you sure it's not the same one?

diff -ruN -x CVS parrot-cvs/parrot/MANIFEST parrot/parrot/MANIFEST
--- parrot-cvs/parrot/MANIFEST Wed Jan 16 01:14:26 2002
+++ parrot/parrot/MANIFEST Tue Jan 15 18:03:52 2002
@@ -114,6 +114,8 @@
 include/parrot/resources.h
 include/parrot/runops_cores.h
 include/parrot/rx.h
+include/parrot/rxstacks.h
+include/parrot/runops_cores.h
 include/parrot/stacks.h
 include/parrot/string.h
 include/parrot/trace.h
@@ -193,6 +195,7 @@
 runops_cores.c
 rx.c
 rx.ops
+rxstacks.c
 stacks.c
 string.c
 t/harness
diff -ruN -x CVS parrot-cvs/parrot/Makefile.in parrot/parrot/Makefile.in
--- parrot-cvs/parrot/Makefile.in Wed Jan 16 01:14:26 2002
+++ parrot/parrot/Makefile.in Tue Jan 15 18:02:56 2002
@@ -63,7 +63,7 @@
 $(INC)/global_setup.h $(INC)/vtable.h $(INC)/oplib/core_ops.h $(INC)/oplib/core_ops_prederef.h \
 $(INC)/runops_cores.h $(INC)/trace.h \
 $(INC)/pmc.h $(INC)/key.h $(INC)/resources.h $(INC)/platform.h \
-$(INC)/interp_guts.h ${jit_h} ${jit_struct_h} $(INC)/rx.h
+$(INC)/interp_guts.h ${jit_h} ${jit_struct_h} $(INC)/rx.h $(INC)/rxstacks.h
 CLASS_O_FILES = classes/default$(O) classes/perlint$(O) classes/perlstring$(O) \
 classes/perlnum$(O) classes/perlarray$(O) classes/perlundef$(O) \
@@ -79,7 +79,7 @@
 INTERP_O_FILES = exceptions$(O) global_setup$(O) interpreter$(O) parrot$(O) register$(O) \
 core_ops$(O) core_ops_prederef$(O) memory$(O) packfile$(O) stacks$(O) \
 string$(O) encoding$(O) chartype$(O) runops_cores$(O) trace$(O) pmc$(O) key$(O) \
-platform$(O) ${jit_o} resources$(O) rx$(O)
+platform$(O) ${jit_o} resources$(O) rx$(O) rxstacks$(O)
 O_FILES = $(INTERP_O_FILES) $(IO_O_FILES) $(CLASS_O_FILES) $(ENCODING_O_FILES) $(CHARTYPE_O_FILES)
@@ -303,6 +303,8 @@
 register$(O): $(H_FILES)
 rx$(O): $(H_FILES)
+
+rxstacks$(O): $(H_FILES)
 stacks$(O): $(H_FILES)
diff -ruN -x CVS parrot-cvs/parrot/include/parrot/rx.h parrot/parrot/include/parrot/rx.h
--- parrot-cvs/parrot/include/parrot/rx.h Wed Jan 16 01:14:26 2002
+++ parrot/parrot/include/parrot/rx.h Tue Jan 15 14:26:50 2002
@@ -1,7 +1,7 @@
 /* rx.h
  * Copyright: (When this is determined...it will go here)
  * CVS Info
- * $Id: rx.h,v 1.8 2002/01/15 22:13:39 brentdax Exp $
+ * $Id: rx.h,v 1.7 2002/01/15 16:54:35 brentdax Exp $
  * Overview:
  * Supporting file for the regular expression engine
  * Data Structure and Algorithms:
@@ -16,6 +16,7 @@
 #define PARROT_RX_H_GUARD
 #include "parrot/parrot.h"
+#include "parrot/rxstacks.h"
 typedef struct bitmap_t {
     char *bmp;
@@ -57,7 +58,7 @@
     opcode_t *substfunc;
-    struct Stack_chunk_t* stack;
+    rxStack stack;
 } rxinfo;
 #if __cplusplus
diff -ruN -x CVS parrot-cvs/parrot/include/parrot/rxstacks.h parrot/parrot/include/parrot/rxstacks.h
--- parrot-cvs/parrot/include/parrot/rxstacks.h Wed Dec 31 16:00:00 1969
+++ parrot/parrot/include/parrot/rxstacks.h Tue Jan 15 14:30:16 2002
@@ -0,0 +1,55 @@
+/* stacks.h
+ * Copyright: (When this is determined...it will go here)
+ * CVS Info
+ * $Id$
+ * Overview:
+ * Regex stack handling routines for Parrot
+ * Data Structure and Algorithms:
+ * History:
+ * Notes:
+ * References:
+ */
+
+#if !defined(PARROT_RXSTACKS_H_GUARD)
+#define PARROT_RXSTACKS_H_GUARD
+
+#include "parrot/parrot.h"
+
+#define STACK_CHUNK_DEPTH 256
+
+typedef struct rxStack_entry_t {
+    INTVAL value;
+}* rxStack_Entry;
+
+typedef struct rxStack_chunk_t {
+    INTVAL used;
+    struct rxStack_chunk_t *next;
+    struct rxStack_chunk_t *prev;
+    struct rxStack_entry_t entry[STACK_CHUNK_DEPTH];
+}* rxStack_Chunk;
+
+typedef rxStack_Chunk rxStack;
+
+rxStack
+rxstack_new(struct Parrot_Interp *);
+
+INTVAL
+rxstack_depth(struct Parrot_Interp *, rxStack);
+
+void
+rxstack_push(struct Parrot_Interp *, rxStack, INTVAL);
+
+INTVAL
+rxstack_pop(struct Parrot_Interp *, rxStack);
+
+#endif
+
+/*
+ * Local variables:
+ * c-indentation-style: bsd
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ *
+ * vim: expandtab shiftwidth=4:
+*/
diff -ruN -x CVS parrot-cvs/parrot/rx.c parrot/parrot/rx.c
--- parrot-cvs/parrot/rx.c Wed Jan 16 01:14:26 2002
+++ parrot/parrot/rx.c Tue Jan 15 17:57:12 2002
@@ -37,7 +37,7 @@
 rx->groupstart=pmc_new(interpreter, enum_class_PerlArray);
Re: gcc warnings: rx-startindex
On Tue, Jan 15, 2002 at 06:51:25PM -0500, Melvin Smith wrote:
> Hey Nicholas,
>
> Just to be clear, I wasn't directing my concern at anyone, nor am I not glad for the work, heck you've probably contributed more to this project than me. It was just a general concern that I felt should be thought about.

I've contributed a higher talk-to-do ratio than I'd like. And I fear I've upset this Finnish guy by mentioning decaf, so I might have to appease him by contributing more patches to perl5. :-)

I was also feeling guilty that I've effectively suggested that compiler warnings would be a good thing long term, volunteered to clean the current ones up, and then found, once compiler warnings were turned on, that there are so many of the things that it's a far bigger job than I'd realised to clean them up (without introducing bugs), and that I'm not going to be able to deliver on cleaning them all up in the short term. (Or my between-the-lines position of getting to zero warnings and then deciding that all new warnings were introduced by code patches, not my patches to makefiles, so I don't feel duty bound to clean them up, and go off and hack perl5 again instead.)

> I could also be overly paranoid.

No no no. There is no such thing as overly paranoid. Even now the bugs are breeding, conspiring, out to get each and every one of us...

Nicholas Clark
--
ENOJOB http://www.ccl4.org/~nick/CV.html
[PATCH] (Easy) cleanup of obvious warnings
The following patch cleans up some 700+ warnings on my Solaris 8/gcc-2.8 system. I've bundled them all together since they are (I hope) non-controversial.

Two hunks merit special mention: The first is removing -ansi -pedantic, which I ranted about yesterday, and is necessary to apply to get anywhere at all.

The second is classes/pmc2c.pl. I now have things like default.c include their relevant header files, e.g. default.h. A dependency issue could eventually arise as header files are needed before they are built, but that's a problem for another day, I think.

Patch and enjoy,

diff -r -u parrot/Configure.pl parrot-andy/Configure.pl
--- parrot/Configure.pl Tue Jan 15 10:08:15 2002
+++ parrot-andy/Configure.pl Wed Jan 16 09:39:51 2002
@@ -393,7 +393,7 @@
 my @opt_and_vers =
-(0 => "-Wall -ansi -pedantic -Wstrict-prototypes -Wmissing-prototypes -Winline -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wconversion -Waggregate-return -Winline -W -Wsign-compare -Wno-unused",
+(0 => "-Wall -Wstrict-prototypes -Wmissing-prototypes -Winline -Wshadow
+-Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wconversion
+-Waggregate-return -Winline -W -Wsign-compare -Wno-unused",
 # others; ones we might like marked with ?
 # ? -Wundef for undefined idenfiers in #if
 # ? -Wbad-function-cast
diff -r -u parrot/classes/pmc2c.pl parrot-andy/classes/pmc2c.pl
--- parrot/classes/pmc2c.pl Thu Jan 3 21:29:18 2002
+++ parrot-andy/classes/pmc2c.pl Wed Jan 16 10:57:04 2002
@@ -228,7 +228,12 @@
 my $includes = '';
 foreach my $class (keys %visible_supers) {
-    next if $class eq $classname;
+    # This used to be
+    #next if $class eq $classname;
+    # But most files (e.g. default.c) should include their own headers
+    # (e.g. default.h). I'm not sure what this was attempting to
+    # guard against, so I've left this comment in as a reminder.
+    # -- A.D. 1/2002.
     $includes .= qq(#include "\L$class.h"\n);
 }
@@ -243,6 +248,9 @@
 unless (exists $flags{noinit}) {
     my $initline = 1+count_newlines($OUT)+1;
     $OUT .= qq(#line $initline "$cfile"\n) unless $suppress_lines;
+    $HOUT .= <<EOH;
+void $initname (void);
+EOH
     $OUT .= <<EOC;
 void $initname (void) {
diff -r -u parrot/key.c parrot-andy/key.c
--- parrot/key.c Tue Jan 15 10:08:16 2002
+++ parrot-andy/key.c Wed Jan 16 10:08:21 2002
@@ -26,14 +26,14 @@
 debug_key (struct Parrot_Interp* interpreter, KEY* key) {
     INTVAL i;
     fprintf(stderr, "*** key %p\n",key);
-    fprintf(stderr, "*** size %d\n",key->size);
+    fprintf(stderr, "*** size " INTVAL_FMT "\n",key->size);
     for(i=0;i<key->size;i++) {
         INTVAL type = key->keys[i].type;
         if(type == enum_key_bucket) {
-            fprintf(stderr, "*** Bucket %d type %d\n",i,type);
+            fprintf(stderr, "*** Bucket " INTVAL_FMT " type " INTVAL_FMT "\n",i,type);
         }
         else if(type != enum_key_undef) {
-            fprintf(stderr, "*** Other %d type %d\n",i,type);
+            fprintf(stderr, "*** Other " INTVAL_FMT " type " INTVAL_FMT "\n",i,type);
         }
     }
 }
diff -r -u parrot/runops_cores.c parrot-andy/runops_cores.c
--- parrot/runops_cores.c Tue Jan 15 10:08:17 2002
+++ parrot-andy/runops_cores.c Wed Jan 16 10:27:28 2002
@@ -45,7 +45,7 @@
     INTVAL code_size;
     opcode_t * code_end;
     opcode_t * lastpc = NULL;
-    FLOATVAL time = 0;
+    FLOATVAL starttime = 0;
     code_start = (opcode_t *)interpreter->code->byte_code;
     code_size = interpreter->code->byte_code_size;
@@ -59,7 +59,7 @@
         if (interpreter->flags & PARROT_PROFILE_FLAG) {
             interpreter->profile[*pc].numcalls++;
             lastpc=pc;
-            time=Parrot_floatval_time();
+            starttime=Parrot_floatval_time();
         }
         DO_OP(pc, interpreter);
@@ -68,7 +68,7 @@
             trace_op(interpreter, code_start, code_end, pc);
         }
         if (interpreter->flags & PARROT_PROFILE_FLAG) {
-            interpreter->profile[*lastpc].time += Parrot_floatval_time() - time;
+            interpreter->profile[*lastpc].time += Parrot_floatval_time() - starttime;
         }
     }
diff -r -u parrot/test_main.c parrot-andy/test_main.c
--- parrot/test_main.c Tue Jan 15 10:08:21 2002
+++ parrot-andy/test_main.c Wed Jan 16 09:54:30 2002
@@ -246,7 +246,7 @@
     unsigned int j;
     int op_count = 0;
     int call_count = 0;
-    FLOATVAL time = 0.0;
+    FLOATVAL tottime = 0.0;
     printf("Operation profile:\n\n");
@@ -257,7 +257,7 @@
     if(interpreter->profile[j].numcalls > 0) {
         op_count++;
         call_count += interpreter->profile[j].numcalls;
-        time += interpreter->profile[j].time;
+        tottime += interpreter->profile[j].time;
         printf(" %5d %-12s %12ld %5.6f %5.6f\n", j,
patchy patch applications
Folks, I've been downsized, and as a result I'm sans laptop for a bit. I'm going to fix that soon, but until then my patch application will be a bit spotty as I'm not quite set up for it. So, if you've got commit privs and the patch passes muster on-list, or is sensible, go commit it and we'll deal with the potential aftermath (and I don't expect there will be) later. If you *don't* have commit privs but should (Andy, Steve, Nick, Melvin, other folks I've forgotten) go over to dev.perl.com, set up an account, and pop me mail with your account name. I'll get you set up. If there are any outstanding patches (the scheme one is a biggie) that need some discussion, I'll try and get to them as soon as I can. Thanks. Dan
Re: [PATCH] Regular expression stacks
On Wed, Jan 16, 2002 at 01:30:42AM -0800, Brent Dax wrote:
> The attached patch adds a new stack type that only handles INTVALs. These are much more efficient than generic stacks--on Win32 they shave a few ten-thousandths of a second off each run of the rx_popindex op, and take a full hundredth of a second off the benchmark. It also shows performance improvements on BSD. They also take up less memory. All tests pass on both platforms; one warning is removed (as a side effect of the modified interface for regex stacks) and no new ones are introduced.

Why call them rxStacks if they're just stacks of INTVALs? Why not intStack or something? I can see them being useful in other code too.
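[Editorial sketch] For readers following the thread, the chunked stack declared in rxstacks.h might work roughly as below. This is only an illustration built from the header in the patch: rxstacks.c itself is not shown in this archive, so the names (IntStackChunk, intstack_*), the `long` stand-in for INTVAL, and the dropped interpreter argument are all assumptions, not the patch's actual code.

```c
#include <assert.h>
#include <stdlib.h>

typedef long INTVAL;        /* assumption: the real type is chosen by Configure */
#define CHUNK_DEPTH 256     /* same depth as STACK_CHUNK_DEPTH in the patch */

/* One fixed-size block of entries; blocks form a doubly linked list. */
typedef struct IntStackChunk {
    INTVAL used;                    /* number of entries in use in this chunk */
    struct IntStackChunk *next;     /* chunk above this one, or NULL */
    struct IntStackChunk *prev;     /* chunk below this one, or NULL */
    INTVAL entry[CHUNK_DEPTH];
} IntStackChunk;

/* Allocate an empty chunk; the first chunk doubles as the stack handle. */
IntStackChunk *intstack_new(void)
{
    IntStackChunk *chunk = calloc(1, sizeof *chunk);
    assert(chunk != NULL);
    return chunk;
}

/* Total number of entries across all chunks. */
INTVAL intstack_depth(const IntStackChunk *base)
{
    INTVAL depth = 0;
    for (; base != NULL; base = base->next)
        depth += base->used;
    return depth;
}

/* Push onto the last chunk, linking in a new chunk when it is full. */
void intstack_push(IntStackChunk *base, INTVAL value)
{
    IntStackChunk *c = base;
    while (c->next != NULL)
        c = c->next;
    if (c->used == CHUNK_DEPTH) {
        IntStackChunk *fresh = intstack_new();
        fresh->prev = c;
        c->next = fresh;
        c = fresh;
    }
    c->entry[c->used++] = value;
}

/* Pop from the last chunk, releasing it first if it has been emptied. */
INTVAL intstack_pop(IntStackChunk *base)
{
    IntStackChunk *c = base;
    while (c->next != NULL)
        c = c->next;
    if (c->used == 0 && c->prev != NULL) {
        c = c->prev;
        free(c->next);
        c->next = NULL;
    }
    assert(c->used > 0);    /* popping an empty stack is a caller error */
    return c->entry[--c->used];
}
```

Nothing here is regex-specific, which is the point of the naming question above: each entry is a bare INTVAL with no type tag, so a push or pop is an array store or load plus occasional chunk bookkeeping.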
Re: [PATCH] (Easy) cleanup of obvious warnings
On Wed, Jan 16, 2002 at 11:11:10AM -0500, Andy Dougherty wrote:
> diff -r -u parrot/classes/pmc2c.pl parrot-andy/classes/pmc2c.pl
> --- parrot/classes/pmc2c.pl Thu Jan 3 21:29:18 2002
> +++ parrot-andy/classes/pmc2c.pl Wed Jan 16 10:57:04 2002
> @@ -228,7 +228,12 @@
>  my $includes = '';
>  foreach my $class (keys %visible_supers) {
> -    next if $class eq $classname;
> +    # This used to be
> +    # next if $class eq $classname;
> +    # But most files (e.g. default.c) should include their own headers
> +    # (e.g. default.h). I'm not sure what this was attempting to
> +    # guard against, so I've left this comment in as a reminder.
> +    # -- A.D. 1/2002.
>      $includes .= qq(#include "\L$class.h"\n);
>  }

That was me. I'm still unsure of the best way to do this. Before I added inheritance, there were no .h files (except for default.h, which was hand-generated), and more importantly, all of the PMC methods were static. I was trying to make as little a change as possible, so first I just spewed out prototypes for everything a particular .c needed, then later decided that was silly and went the .h route. The line you commented out was purely a thinko resulting from that process.

However, I'm still not sure if we want to pollute the global namespace with all of the method names for all PMC classes. I wonder if it would be better to export a single Parrot_class_fillin_vtable function from each PMC, and just call each of your parents' in turn, and then override with the locally defined methods. Then none of the method names would need to be exported. On the other hand, if we wanted to use SUPER::method, we'd need to keep a copy of the table just before overriding it with the locally defined methods. Am I making sense?

Andy's patch is definitely correct, but anyone want to venture an opinion about namespace pollution?

> -    fprintf(stderr, "*** size %d\n",key->size);
> +    fprintf(stderr, "*** size " INTVAL_FMT "\n",key->size);

That's what I've been doing in my local copy, but is that portable? I seem to remember that some preprocessors require strange tricks to concatenate strings.
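[Editorial sketch] The fill-in idea floated above (one exported function per PMC class, parents called first, local methods overriding afterwards, and a saved copy of the parent table for SUPER-style calls) could look roughly like this. The two-method VTable and the Base/Derived class names are hypothetical, chosen only to keep the example small; Parrot's real vtable and pmc2c.pl output are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-method vtable; the real one has many more slots. */
typedef struct VTable {
    const char *(*name)(void);
    long (*get_integer)(long raw);
} VTable;

/* "Base" class: the methods are static, so their names are not exported. */
static const char *base_name(void) { return "base"; }
static long base_get_integer(long raw) { return raw; }

/* The only symbol the base class exports. */
void Base_fillin_vtable(VTable *vt)
{
    assert(vt != NULL);
    vt->name = base_name;
    vt->get_integer = base_get_integer;
}

/* "Derived" class: keeps the parent's table for SUPER-style calls. */
static VTable derived_super;

/* Override: clamp negatives to zero, then defer to the parent's method. */
static long derived_get_integer(long raw)
{
    return derived_super.get_integer(raw < 0 ? 0 : raw);
}

/* The only symbol the derived class exports. */
void Derived_fillin_vtable(VTable *vt)
{
    Base_fillin_vtable(vt);     /* start from everything the parent defines */
    derived_super = *vt;        /* snapshot taken just before overriding */
    vt->get_integer = derived_get_integer;
}
```

Under these assumptions, a table filled by Derived_fillin_vtable still reports the inherited name "base" while get_integer goes through the override. The trade-off raised in the mail shows up as the extra static copy: each class that wants SUPER-style calls pays for one saved table.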
Re: patchy patch applications
On Wed, Jan 16, 2002 at 11:25:29AM -0500, Dan Sugalski wrote:
> should (Andy, Steve, Nick, Melvin, other folks I've forgotten) go over to
> dev.perl.com, set up an account, and pop me mail with your account

dev.perl.org
Re: [PATCH] (Easy) cleanup of obvious warnings
On Wed, 16 Jan 2002, Steve Fink wrote:
> On Wed, Jan 16, 2002 at 11:11:10AM -0500, Andy Dougherty wrote:
> > -    fprintf(stderr, "*** size %d\n",key->size);
> > +    fprintf(stderr, "*** size " INTVAL_FMT "\n",key->size);
>
> That's what I've been doing in my local copy, but is that portable? I seem to remember that some preprocessors require strange tricks to concatenate strings.

Yes, it's portable enough. (It is what current bleadperl uses.) Your memory is correct, however, in that K&R-era preprocessors did indeed require quite strange tricks, and Configure used to try to find the best one.

--
Andy Dougherty [EMAIL PROTECTED]
Dept. of Physics
Lafayette College, Easton PA 18042
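[Editorial sketch] The construct under discussion relies on ANSI C joining adjacent string literals into one, so a format macro can be dropped between two quoted pieces. A minimal illustration follows; the `long` typedef, the "%ld" definition of INTVAL_FMT, and the helper name format_key_size are assumptions made for the example (in Parrot the real definitions come from the configure step).

```c
#include <assert.h>
#include <stdio.h>

typedef long INTVAL;          /* assumption for this sketch */
#define INTVAL_FMT "%ld"      /* assumption: must match the INTVAL typedef */

/*
 * Hypothetical helper: formats the "size" line from debug_key() into buf.
 * The three adjacent literals are joined by the compiler into the single
 * format string "*** size %ld\n" before snprintf ever sees it.
 */
int format_key_size(char *buf, size_t len, INTVAL size)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "*** size " INTVAL_FMT "\n", size);
}
```

The benefit over a hard-coded %d is that the format and the type change together: if INTVAL becomes a wider type, only the two definitions at the top need updating.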
Re: [PATCH] (Easy) cleanup of obvious warnings
On Wed, 16 Jan 2002, Steve Fink wrote:
> > -    fprintf(stderr, "*** size %d\n",key->size);
> > +    fprintf(stderr, "*** size " INTVAL_FMT "\n",key->size);
>
> That's what I've been doing in my local copy, but is that portable? I seem to remember that some preprocessors require strange tricks to concatenate strings.

K&R says that this is an OK thing to do.

Alex Gough
RE: [PATCH] Regular expression stacks
Steve Fink:
# On Wed, Jan 16, 2002 at 01:30:42AM -0800, Brent Dax wrote:
# > The attached patch adds a new stack type that only handles INTVALs.
# > These are much more efficient than generic stacks--on Win32 they shave a
# > few ten-thousandths of a second off each run of the rx_popindex op, and
# > take a full hundredth of a second off the benchmark. It also shows
# > performance improvements on BSD. They also take up less memory. All
# > tests pass on both platforms; one warning is removed (as a side effect
# > of the modified interface for regex stacks) and no new ones are
# > introduced.
#
# Why call them rxStacks if they're just stacks of INTVALs? Why not
# intStack or something? I can see them being useful in other code too.

That makes sense, I suppose. I don't know where else it would be used, so I didn't think of it.

--Brent Dax
[EMAIL PROTECTED]
Parrot Configure pumpking and regex hacker

<obra> . hawt sysadmin chx0rs
<lathos> This is sad. I know of *a* hawt sysamin chx0r.
<obra> I know more than a few.
<lathos> obra: There are two? Are you sure it's not the same one?