HP-UX 11.00 back on track again
Automated smoke report for patch Oct 28 20:00:01 2001 UTC
v0.02 on hpux using cc version B.11.11.02

O = OK
F = Failure(s), extended report at the bottom
? = still running or test results not (yet) available
Build failures during:  - = unknown  c = Configure, m = make, t = make test-prep

Configuration
-------------------------------------------------------------
O O
O O  nv=double
O O  iv=int
O O  iv=int --define nv=double
O O  iv=long
O O  iv=long --define nv=double
| |
| +- --debugging
+--- normal

--
H.Merijn Brand        Amsterdam Perl Mongers (http://www.amsterdam.pm.org/)
using perl-5.6.1, 5.7.2 629 on HP-UX 10.20 11.00, AIX 4.2, AIX 4.3,
WinNT 4, Win2K pro WinCE 2.11. Smoking perl CORE: [EMAIL PROTECTED]
http:[EMAIL PROTECTED]/ [EMAIL PROTECTED]
send smoke reports to: [EMAIL PROTECTED], QA: http://qa.perl.org
Re: Parameter passing conventions
Dan --

On Fri, 2001-10-26 at 16:38, Dan Sugalski wrote:

> Okay, here are the conventions.

Looks like I'm going to have to write some real logic in jakoc pretty soon...

> *) The callee is responsible for saving and restoring non-scratch
>    registers

Nice for the callee, since if its work fits into five regs of each type it's not going to have to do any saves or restores. The caller, though, is going to have to vacate those regs. So, if the caller got args in those regs and then calls anyone else, it has to move them from those regs (or save them).

> *) The first five registers (I0-I4, S0-S4, P0-P4, N0-N4) are scratch
>    and do not have to be preserved by the callee

Still thinking about this... We are reducing the overall number of reg copies going on by adding these special cases. I just wish we had an approach that was both uniform (simple, no special cases) and fast too.

> *) In *ALL* cases where the stack is used, things are put on the stack
>    in *reverse* order. The topmost stack element *must* be the integer
>    count of the number of elements on the stack

OK.

> *) The callee is responsible for making sure the stack is cleaned off.

So, in the case of zero args, do we still push a zero on the stack to make a proper frame? I think yes...

> Inbound args
>
> If the called subroutine has a fixed number of arguments, they will be
> placed in the first five registers of the appropriate register types.
> First integer goes in I0, second in I1, and so on. If there are too
> many arguments of a particular type, the overflow go on the stack. If
> there are a variable number of arguments, all the *non* fixed args go
> on the stack.

So for right now, just pretend that all Jako subroutines take a variable number of args... :) (Until I get the time to write fully compatible conventions in jakoc, anyway.)

Can we have ops to inquire on the type of the topmost stack entry?

[snip]

Regards,

-- Gregor
 _
/ perl -e 'srand(-2091643526); print chr rand 90 for (0..4)'    \
Gregor N. Purdy                          [EMAIL PROTECTED]
Focus Research, Inc.            http://www.focusresearch.com/
8080 Beckett Center Drive #203               513-860-3570 vox
West Chester, OH 45069                       513-860-3579 fax
\_/
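The stack discipline discussed above (args pushed in *reverse* order, the integer count on top, callee cleans up) can be sketched in a few lines. This is an illustrative Python model, not Parrot code; the function names are invented.

```python
# Toy model of the convention: the caller pushes overflow args in *reverse*
# order and then the count, so the callee pops the count first and receives
# the args back in their original order. The callee leaves the stack clean.

def caller_push(stack, overflow_args):
    for arg in reversed(overflow_args):   # reverse order, per the convention
        stack.append(arg)
    stack.append(len(overflow_args))      # topmost element *must* be the count

def callee_pop(stack):
    count = stack.pop()                   # count comes off first
    return [stack.pop() for _ in range(count)]
```

Note that in this model a zero-arg call still pushes a count of 0, which is exactly the "proper frame" question raised in the thread.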
Re: Parameter passing conventions
Sam --

> > Okay, here are the conventions.
>
> Great. Anyone want to offer up some examples or should I just wait for
> Jako support to see this in action?

I'll be working on making jakoc support the convention, but it may take a while with my day job duties as they are. If I can get it in quickly I will, but please continue breathing. :)

The first step I'm going to take is to start putting the arg and result counts on the stack, and remove the stack rotation stuff. Then, I'll start thinking about how I want to wrap up the conventions so I don't have to think about them more than once.

Hey! We should be thinking about the minimum amount of stuff we need to do to support separate compilation, so we can implement the conventions in more than one of the Parrot-targeted languages and do a demo of mixed-language programming. Here's a partial list:

* Export table segment in the packfile. Put the subroutine entry points here.

* Import table segment in the packfile (is a fixup table sufficient for this?). Put the unresolved external symbols here.

* Possibly unify all this into a symbol table segment.

* A linker that takes multiple pbc files and concatenates them, doing relocation to produce a single pbc file.

Regards,

-- Gregor
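The linker item in the list above can be illustrated with a toy model: concatenate two modules, offset the second module's code addresses, and patch call sites through a merged export table. The dict layout here is invented purely for the sketch; a real packfile would be binary.

```python
# Hypothetical module layout: "code" is a flat list of cells, "exports" maps
# symbol -> entry address, and "fixups" lists (offset, symbol) cells that
# still need a resolved address (the proposed import/fixup table).

def link(mod_a, mod_b):
    base = len(mod_a["code"])                  # where mod_b's code is relocated to
    exports = dict(mod_a["exports"])
    for name, addr in mod_b["exports"].items():
        exports[name] = addr + base            # relocate mod_b's entry points
    code = mod_a["code"] + mod_b["code"]
    fixups = mod_a["fixups"] + [(off + base, sym) for off, sym in mod_b["fixups"]]
    for off, sym in fixups:
        code[off] = exports[sym]               # resolve external symbols
    return {"code": code, "exports": exports, "fixups": []}
```

The same walk over the fixup list is what runtime symbol resolution would do lazily instead of at link time.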
Re: Parameter passing conventions
At 08:43 AM 10/29/2001 -0500, Gregor N. Purdy wrote:

> On Fri, 2001-10-26 at 16:38, Dan Sugalski wrote:
>
> > Okay, here are the conventions.
>
> Looks like I'm going to have to write some real logic in jakoc pretty
> soon...

Ahhh! The horror! :-)

Seriously, the conventions are geared towards full-blown compilers with a reasonable register ordering module at the very least, which isn't unreasonable to expect for a language implementation. (And folks that want to fake out using a stack will probably work with the top few registers to avoid having to deal with parameter conflicts.)

> > *) The callee is responsible for saving and restoring non-scratch
> >    registers
>
> Nice for the callee, since if its work fits into five regs of each type
> it's not going to have to do any saves or restores. The caller, though,
> is going to have to vacate those regs. So, if the caller got args in
> those regs and then calls anyone else, it has to move them from those
> regs (or save them).

The caller will only have to vacate those registers if they're being used and need to last past the call to the function. If the register assignment algorithm's clever (which is a big if), the lifetime of temporaries will keep function calls in mind.

> > *) The first five registers (I0-I4, S0-S4, P0-P4, N0-N4) are scratch
> >    and do not have to be preserved by the callee
>
> Still thinking about this... We are reducing the overall number of reg
> copies going on by adding these special cases. I just wish we had an
> approach that was both uniform (simple, no special cases) and fast too.

You, and me, and about a zillion other people. Generally speaking the choices are fast, uniform, and scalable. Choose two.

This is really only an issue for folks writing code generators by hand, and with 32 of each register type most people won't hit it. Plain parser add-ons will use the core code generator, so they won't need to worry about it.

> > *) The callee is responsible for making sure the stack is cleaned off.
>
> So, in the case of zero args, do we still push a zero on the stack to
> make a proper frame? I think yes...

If the function is listed as taking a variable number of args, yes. Functions marked as taking no args at all don't get anything put on the stack.

> > Inbound args
> [snip]
>
> So for right now, just pretend that all Jako subroutines take a
> variable number of args... :) (Until I get the time to write fully
> compatible conventions in jakoc, anyway.)

That's fine. A perfectly workable solution.

> Can we have ops to inquire on the type of the topmost stack entry?

In the works, yep.

					Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                               even samurai
[EMAIL PROTECTED]                          have teddy bears and even
                                           teddy bears get drunk
Re: Parameter passing conventions
At 08:52 AM 10/29/2001 -0500, Gregor N. Purdy wrote:

> The first step I'm going to take is to start putting the arg and result
> counts on the stack, and remove the stack rotation stuff.

Leave the rotate opcode, though. That might come in handy for the Forth/Scheme/Postscript folks, once we have them.

> Hey! We should be thinking about the minimum amount of stuff we need to
> do to support separate compilation so we can implement the conventions
> in more than one of the Parrot-targeted languages and do a demo of
> mixed language programming.

Darned straight. Anyone want to take a shot at a proposed bytecode file format update?

> Here's a partial list:
>
> * export table segment in packfile. Put the subroutine entry points here.

Yep.

> * import table segment in packfile (fixup table sufficient for this?)
>   Put the unresolved external symbols here.

Dunno if we need this. We can leave symbol resolution to runtime when we come across them, but we probably ought to have it for those languages that want full link-time resolution.

> * possibly unify all this into symbol table segment.

That would be spiffy-keen. :)

> * linker that takes multiple pbc files and concatenates them, doing
>   relocation to produce a single pbc file.

While I don't think we need this for normal use, it could be quite handy. (I don't want to require linking before running--loading up module bytecode at runtime is definitely a requirement.)

					Dan
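A bytecode file format update along these lines mostly needs self-describing segments, so an export or import table can be added without breaking existing readers. Here is a minimal (type, length, payload) framing, sketched in Python with an invented layout; it is not the actual packfile format.

```python
import struct

# Each segment: 4-byte little-endian type id, 4-byte payload length, then
# the payload. Readers can skip segment types they don't understand.

def write_segments(segments):
    out = b""
    for seg_type, payload in segments:
        out += struct.pack("<II", seg_type, len(payload)) + payload
    return out

def read_segments(data):
    segments, pos = [], 0
    while pos < len(data):
        seg_type, length = struct.unpack_from("<II", data, pos)
        pos += 8
        segments.append((seg_type, data[pos:pos + length]))
        pos += length
    return segments
```

Unifying export and import tables into one symbol table segment would then just be a choice of payload encoding, not a framing change.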
Re: Parameter passing conventions
Dan --

> > Looks like I'm going to have to write some real logic in jakoc pretty
> > soon...
>
> Ahhh! The horror! :-)

:)

> Seriously, the conventions are geared towards full-blown compilers with
> a reasonable register ordering module at the very least, which isn't
> unreasonable to expect for a language implementation. (And folks that
> want to fake out using a stack will probably work with the top few
> registers to avoid having to deal with parameter conflicts)

I am thinking about having Jako take the position that it doesn't use those regs except for calls, and values are immediately copied from those regs to the real regs for the variables as the tail end of the callee part of the subroutine linkage. That will at least permit Jako to be correct, even if it isn't as efficient as possible. Later I can worry about being smarter. Heck, right now Jako sometimes generates code with a branch to the next instruction (ah, the joy of simple code generators...).

> > Nice for the callee, since if its work fits into five regs of each
> > type it's not going to have to do any saves or restores. The caller,
> > though, is going to have to vacate those regs.
>
> Caller will only have to vacate those registers if they're being used
> and need to last past the call to the function. If the register
> assignment algorithm's clever (which is a big if) the lifetime of
> temporaries will keep function calls in mind.

Big if indeed. At least for Jako's near future.

> > Still thinking about this... We are reducing the overall number of
> > reg copies going on by adding these special cases. I just wish we had
> > an approach that was both uniform (simple, no special cases) and fast
> > too.
>
> You, and me, and about a zillion other people. Generally speaking the
> choices are fast, uniform, and scalable. Choose two.

Hmmm. I tried reading section 29 (Subroutine Linkage) of the MMIXware book (pages 32-34) for inspiration, but I didn't see how anything there could help us. MMIX has 256 logical general-purpose 64-bit registers. That's a handy reg size, since a reasonable float can sit in there as well as an unreasonable int. The local-marginal-global register distinction used by MMIX is interesting, but I think it might lose its appeal with 4 distinct typed register files.

Knuth does make the statement:

    These conventions for parameter passing are admittedly a bit
    confusing in the general case, and I suppose people who use them
    extensively might sometime find themselves talking about the
    "infamous MMIX register shuffle." However, there is good use for
    subroutines that convert a sequence of register contents like
    (x, a, b, c) into (f, a, b, c), where f is a function of a, b, and
    c but not x. Moreover, PUSHGO and POP can be implemented with great
    efficiency, and subroutine linkage tends to be a significant
    bottleneck when other conventions are used.

It's that last sentence that got my attention... But I still don't know if we could make use of any of those ideas. I can imagine having separate L and G for each register file, and otherwise following the same procedure, but I suspect we'd be unhappy with the MMIX conventions without having a larger number of registers.

BTW, how did you choose 32 for the number of regs?

> This is really only an issue for folks writing code generators by
> hand, and with 32 of each register type most people won't hit it. Plain
> parser add-ons will use the core code generator, so they won't need to
> worry about it.

Yeah. I'm trying very hard not to put anything really sophisticated into jakoc (at least not yet). Right now I can still tweak things reasonably well. If I add much more complexity, I'm going to have to actually write a real compiler, and if I write a real compiler I probably won't be able to resist the temptation to turn Jako into the language I *really* wish I had, and that would be a bigger project.

> > So, in the case of zero args, do we still push a zero on the stack to
> > make a proper frame? I think yes...
>
> If the function is listed as taking a variable number of args, yes.
> Functions marked as taking no args at all don't get anything put on the
> stack.

I'm thinking yes because of stack unwinding. Don't we need to have parity between return addresses on their stack and frames of args on their stack?

Oh wait. We're popping (restoring) those off the stack on subroutine entry, so in general the arg stack should be empty most of the time, right? Adding to that the fact that most of the time our args and results will be passed in regs, and I guess I can see that we won't need it. Except for
Re: Parameter passing conventions
At 11:17 AM 10/29/2001 -0500, Gregor N. Purdy wrote:

> [snip]
>
> Hmmm. I tried reading section 29 (Subroutine Linkage) of the MMIXware
> book (pages 32-34) for inspiration, but I didn't see how anything there
> could help us. MMIX has 256 logical general-purpose 64-bit registers.
> That's a handy reg size, since a reasonable float can sit in there as
> well as an unreasonable int. The local-marginal-global register
> distinction used by MMIX is interesting, but I think it might lose its
> appeal with 4 distinct typed register files.
>
> [snip]
>
> It's that last sentence that got my attention... But I still don't know
> if we could make use of any of those ideas. I can imagine having
> separate L and G for each register file, and otherwise following the
> same procedure, but I suspect we'd be unhappy with the MMIX conventions
> without having a larger number of registers.

I'll have to snag that manual next time I'm around a good bookstore. I've not read it as of yet, and Knuth generally has good things to say.

A split between local, marginal, and global registers would be an interesting thing to do, and I can see it making the code more elegant. I worry about it making things more complex, though, especially with us already having multiple register types. (We'd double or triple the number of register types essentially, and to some extent blow cache even more than we do now. Might be a win in other ways, though. I'll have to ponder a bit.)

> BTW, how did you choose 32 for the number of regs?

Picked it out of the air. :)

Seriously, I wanted a power-of-two number, I wanted the resulting size of a register file to be equal to or smaller than your average page size (512 bytes for most folks IIRC), and I wanted to be able to encode the register number and type in a single byte if it turned out that the overhead of decoding was smaller than the speed hit we took from the extra bus bandwidth wasting a full 32-bit word for each parameter. So, the two-bit type limits us to 64 registers max, and that seemed a bit too big in the general case. 16 was too few by a bit (most of my compiler books say that's not quite enough for most code, and you'll end up with overflow to the stack to handle temps), so that left 32. Still a bit big in some cases, especially considering we have four full sets of registers, but we'll see how that goes.

> Yeah. I'm trying very hard not to put anything really sophisticated
> into jakoc (at least not yet). Right now I can still tweak things
> reasonably well. If I add much more complexity, I'm going to have to
> actually write a real compiler, and if I write a real compiler I
> probably won't be able to resist the temptation to turn Jako into the
> language I *really* wish I had, and that would be a bigger project.

And this would be a bad thing because? (Well, besides the demands on what little free time you might have now, but that's not our problem... :)

> > So, in the case of zero args, do we still push a zero on the stack to
> > make a proper frame? I think yes...
>
> I'm thinking yes because of stack unwinding. Don't we need to have
> parity between return addresses on their stack and frames of args on
> their stack?

Sort of. The only place we really need to have it is for the exception handling, which needs to quickly unwind the register stacks, but I'm thinking we'll push the addresses of the current register files when we push an exception handler, and restore them (along with the stack) when we catch an exception.

> Oh wait. We're popping (restoring) those off the stack on subroutine
> entry, so in general the arg stack should be empty most of the time,
> right?

I don't know that it'll be empty all the time, as
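Dan's single-byte encoding arithmetic above is easy to check: a 2-bit type field leaves 6 bits in the byte for the register number, hence the 64-register ceiling he mentions. A quick sketch (the exact field layout is an assumption, but it is consistent with the numbers in the thread):

```python
TYPE_BITS = 2                 # four register files: I, S, P, N
NUM_BITS = 8 - TYPE_BITS      # 6 bits left in the byte for the register number

def encode(reg_type, reg_num):
    assert 0 <= reg_type < (1 << TYPE_BITS)
    assert 0 <= reg_num < (1 << NUM_BITS)   # so at most 64 registers per type
    return (reg_type << NUM_BITS) | reg_num

def decode(byte):
    return byte >> NUM_BITS, byte & ((1 << NUM_BITS) - 1)
```

With 32 registers per type, only half the number space is used, leaving headroom without spilling past one byte.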
Re: Parameter passing conventions
Dan --

[snip]

> I'll have to snag that manual next time I'm around a good bookstore.
> I've not read it as of yet, and Knuth generally has good things to say.

You can grab PDFs here:

    http://link.springer.de/link/service/series/0558/tocs/t1750.htm

Of course, you can also browse around on Knuth's site for other related stuff...

    http://www-cs-faculty.stanford.edu/~knuth/mmix-news.html

> A split between local, marginal, and global registers would be an
> interesting thing to do, and I can see it making the code more elegant.
> I worry about it making things more complex, though, especially with us
> already having multiple register types. (We'd double or triple the
> number of register types essentially, and to some extent blow cache
> even more than we do now. Might be a win in other ways, though. I'll
> have to ponder a bit.)

Yeah, I didn't like the idea of proliferating that more either. I still sometimes dream about a single register file of N regs into which we can put whatever we want. Each block of registers has room for the reg contents and the type info too. Seems you've got some of the support for that figured out in the stack already. Just declare either (a) that it is illegal (or behavior undefined) to do

    set $2, 5
    set $3, "foo bar"
    add $1, $2, $3

[just because we have higher-level data types than a real machine doesn't mean we can't still have general-purpose registers, I think] or (b) that if you do something numeric with a register that is of a non-numeric type, type mucking happens behind the scenes and throws an exception if there is a problem. Certainly this wouldn't be surprising to anyone who had been looking at what we do with PMCs and arithmetic ops.

If we ever did move to such a single-register-file model, I'd support looking seriously at the calling conventions of MMIX to see if we can get the appropriate performance characteristics. And, BTW, we have 4*32 = 128 regs now. We could even match the logical register count of MMIX (256) with only a doubling of total register count. And, if we ever determined we needed another kind of register (such as one that can be used for address arithmetic, since INTVAL doesn't cut it), we wouldn't have to add a fifth file, we'd just add another type (thinking again about the stack implementation).

[snip]

> > Yeah. I'm trying very hard not to put anything really sophisticated
> > into jakoc (at least not yet). [snip]
>
> And this would be a bad thing because? (Well, besides the demands on
> what little free time you might have now, but that's not our
> problem... :)

It might be a bad thing because Jako would then not be a little demo language. I suppose I could start from scratch, but then I'd have to come up with another language name (oh the horrors!)

[snip]

Regards,

-- Gregor
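Gregor's option (b) above -- a single register file whose slots carry a type tag, with numeric ops checking the tag -- can be mocked up in a few lines. Everything here (the class name, the tag strings) is invented for illustration; it is a model of the idea, not Parrot's design.

```python
class RegFile:
    """One uniform file of tagged registers, instead of four typed files."""

    def __init__(self, n=128):            # 4 * 32 = 128 regs in today's scheme
        self.slots = [("int", 0)] * n     # each slot: (type tag, contents)

    def set(self, i, tag, value):
        self.slots[i] = (tag, value)

    def add(self, dst, a, b):
        ta, va = self.slots[a]
        tb, vb = self.slots[b]
        if ta not in ("int", "num") or tb not in ("int", "num"):
            raise TypeError("add on a non-numeric register")   # option (b)
        self.slots[dst] = ("num" if "num" in (ta, tb) else "int", va + vb)
```

The per-op tag check is the cost that the four-file design avoids by making the type part of the register name.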
Re: Parameter passing conventions
Dan --

You can also look at section 1.4.1' of

    http://www-cs-faculty.stanford.edu/~knuth/fasc1.ps.gz

for another view of subroutine linkage from the upcoming TAOCP.

Regards,

-- Gregor
Re: String rationale
In message [EMAIL PROTECTED]
          Dan Sugalski [EMAIL PROTECTED] wrote:

> At 04:23 PM 10/27/2001 +0100, Tom Hughes wrote:
>
> > Attached is my first pass at this - it's not fully ready yet but is
> > something for people to cast an eye over before I spend lots of time
> > going down the wrong path ;-)
>
> It looks pretty good on first glance.

I've done a bit more work now, and the latest version is attached.

This version can do transcoding. The intention is that there will be some sort of cache in chartype_lookup_transcoder to avoid repeating the expensive lookups by name too much.

One interesting question is who is responsible for transcoding from character set A to character set B - is it A or B? and how about the other way? My code currently allows either set to provide the transform, on the grounds that otherwise the unicode module would have to either know how to convert to everything else or from everything else.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

# This is a patch for parrot to update it to parrot-ns
#
# To apply this patch:
# STEP 1: Chdir to the source directory.
# STEP 2: Run the 'applypatch' program with this patch file as input.
#
# If you do not have 'applypatch', it is part of the 'makepatch' package
# that you can fetch from the Comprehensive Perl Archive Network:
# http://www.perl.com/CPAN/authors/Johan_Vromans/makepatch-x.y.tar.gz
# In the above URL, 'x' should be 2 or higher.
#
# To apply this patch without the use of 'applypatch':
# STEP 1: Chdir to the source directory.
# If you have a decent Bourne-type shell:
# STEP 2: Run the shell with this file as input.
# If you don't have such a shell, you may need to manually create/delete
# the files/directories as shown below.
# STEP 3: Run the 'patch' program with this file as input.
#
# These are the commands needed to create/delete files/directories:
#
mkdir 'chartypes'
chmod 0755 'chartypes'
mkdir 'encodings'
chmod 0755 'encodings'
rm -f 'transcode.c'
rm -f 'strutf8.c'
rm -f 'strutf32.c'
rm -f 'strutf16.c'
rm -f 'strnative.c'
rm -f 'include/parrot/transcode.h'
rm -f 'include/parrot/strutf8.h'
rm -f 'include/parrot/strutf32.h'
rm -f 'include/parrot/strutf16.h'
rm -f 'include/parrot/strnative.h'
touch 'chartype.c'
chmod 0644 'chartype.c'
touch 'chartypes/unicode.c'
chmod 0644 'chartypes/unicode.c'
touch 'chartypes/usascii.c'
chmod 0644 'chartypes/usascii.c'
touch 'encoding.c'
chmod 0644 'encoding.c'
touch 'encodings/singlebyte.c'
chmod 0644 'encodings/singlebyte.c'
touch 'encodings/utf16.c'
chmod 0644 'encodings/utf16.c'
touch 'encodings/utf32.c'
chmod 0644 'encodings/utf32.c'
touch 'encodings/utf8.c'
chmod 0644 'encodings/utf8.c'
touch 'include/parrot/chartype.h'
chmod 0644 'include/parrot/chartype.h'
touch 'include/parrot/encoding.h'
chmod 0644 'include/parrot/encoding.h'
#
# This command terminates the shell and need not be executed manually.
exit
# End of Preamble

# Patch data follows
diff -c 'parrot/MANIFEST' 'parrot-ns/MANIFEST'
Index: ./MANIFEST
*** ./MANIFEST	Sun Oct 28 17:11:21 2001
--- ./MANIFEST	Sun Oct 28 17:11:07 2001
***************
*** 1,5 ****
--- 1,8 ----
  assemble.pl
  ChangeLog
+ chartype.c
+ chartypes/unicode.c
+ chartypes/usascii.c
  classes/genclass.pl
  classes/intclass.c
  classes/scalarclass.c
***************
*** 15,20 ****
--- 18,28 ----
  docs/parrotbyte.pod
  docs/strings.pod
  docs/vtables.pod
+ encoding.c
+ encodings/singlebyte.c
+ encodings/utf8.c
+ encodings/utf16.c
+ encodings/utf32.c
  examples/assembly/bsr.pasm
  examples/assembly/call.pasm
  examples/assembly/euclid.pasm
***************
*** 30,35 ****
--- 38,45 ----
  global_setup.c
  hints/mswin32.pl
  hints/vms.pl
+ include/parrot/chartype.h
+ include/parrot/encoding.h
  include/parrot/events.h
  include/parrot/exceptions.h
  include/parrot/global_setup.h
***************
*** 46,56 ****
  include/parrot/runops_cores.h
  include/parrot/stacks.h
  include/parrot/string.h
- include/parrot/strnative.h
- include/parrot/strutf16.h
- include/parrot/strutf32.h
- include/parrot/strutf8.h
- include/parrot/transcode.h
  include/parrot/trace.h
  include/parrot/unicode.h
  interpreter.c
--- 56,61 ----
***************
*** 108,117 ****
  runops_cores.c
  stacks.c
  string.c
- strnative.c
- strutf16.c
- strutf32.c
- strutf8.c
  test_c.in
  test_main.c
  Test/More.pm
--- 113,118 ----
***************
*** 129,135 ****
  t/op/time.t
  t/op/trans.t
  trace.c
- transcode.c
  Types_pm.in
  vtable_h.pl
  vtable.tbl
--- 130,135 ----
diff -c 'parrot/Makefile.in' 'parrot-ns/Makefile.in'
Index: ./Makefile.in
*** ./Makefile.in	Wed Oct 24 19:23:47 2001
--- ./Makefile.in	Sat Oct 27 15:02:45 2001
***************
*** 11,19 ****
  	$(INC)/pmc.h $(INC)/resources.h

  O_FILES = global_setup$(O) interpreter$(O) parrot$(O) register$(O) \
! core_ops$(O) memory$(O) packfile$(O) stacks$(O) string$(O) strnative$(O) \
! strutf8$(O) strutf16$(O) strutf32$(O) transcode$(O) runops_cores$(O) \
! trace$(O) vtable_ops$(O)
RE: String rationale
You might consider requiring that all character sets be able to convert to Unicode, and otherwise only have to know how to convert other character sets to their own set.

-----Original Message-----
From: Tom Hughes [mailto:[EMAIL PROTECTED]]
Sent: Monday, October 29, 2001 02:31 PM
To: [EMAIL PROTECTED]
Subject: Re: String rationale

[snip]

One interesting question is who is responsible for transcoding from character set A to character set B - is it A or B? and how about the other way? My code currently allows either set to provide the transform, on the grounds that otherwise the unicode module would have to either know how to convert to everything else or from everything else.

Tom
RE: String rationale
At 02:52 PM 10/29/2001 -0500, Stephen Howard wrote:

> You might consider requiring all character sets be able to convert to
> Unicode,

That's already a requirement. All character sets must be able to go to or come from Unicode. They can do others if they want, but it's not required. (And we'll have to figure out how to allow that reasonably efficiently.)

					Dan
RE: String rationale
Right. I had just keyed in on this from Tom's message:

> My code currently allows either set to provide the transform on the
> grounds that otherwise the unicode module would have to either know how
> to convert to everything else or from everything else.

...which seemed to posit that the Unicode module could be responsible for all the transcodings to and from its own character set, which seemed backwards to me.

-Stephen

-----Original Message-----
From: Dan Sugalski [mailto:[EMAIL PROTECTED]]
Sent: Monday, October 29, 2001 02:43 PM
To: Stephen Howard; Tom Hughes; [EMAIL PROTECTED]
Subject: RE: String rationale

[snip]
Anybody write a threaded dispatcher yet?
Anybody do a gcc-specific goto *pc dispatcher for Parrot yet? On some architectures it really cooks. - Ken
Re: Anybody write a threaded dispatcher yet?
At 03:33 PM 10/29/2001 -0500, Ken Fox wrote:

> Anybody do a gcc-specific goto *pc dispatcher for Parrot yet? On some
> architectures it really cooks.

That's a good question. There was talk, and benchmark numbers from a variety of different dispatchers. C'mon folks, kick in the code. I'll weld dispatch selection into configure.pl if I've got the dispatchers to work from...

					Dan
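For comparison with the gcc computed-goto approach, here is the shape of a plain table-driven dispatcher, written in Python rather than C so the control flow is explicit. The three ops and their encoding are invented for the example; the point is that each handler returns the next pc, and dispatch is one table lookup per op. (gcc's `goto *pc` removes even that lookup-and-call overhead by jumping straight to the next op's code.)

```python
# A tiny op set: 0 = push immediate, 1 = add top two, 2 = halt.

def op_push(pc, code, stack):
    stack.append(code[pc + 1])        # inline operand
    return pc + 2

def op_add(pc, code, stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)
    return pc + 1

def op_halt(pc, code, stack):
    return -1                         # negative pc stops the loop

OPS = {0: op_push, 1: op_add, 2: op_halt}

def run(code):
    stack, pc = [], 0
    while pc >= 0:
        pc = OPS[code[pc]](pc, code, stack)
    return stack
```

Running `run([0, 2, 0, 3, 1, 2])` pushes 2 and 3, adds them, and halts with 5 on the stack.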
New patch
OK, there is another workaround to make pbc2c.pl work which still uses the goto model, so speed is not affected, but it's harder to maintain since it's not as generic as the other one.

Daniel.

Index: pbc2c.pl
===================================================================
RCS file: /home/perlcvs/parrot/pbc2c.pl,v
retrieving revision 1.3
diff -r1.3 pbc2c.pl
68a69
> my $op;
70a72
> my @pcs = ();
88c90
< int i;
---
> int cur_opcode, to;
142c144
< my $op;
---
> my $jump;
145a148
> $jump .= "case " . $pc . ": goto PC_" . $pc . ";\n";
162a166
> $source = "cur_opcode = " . $pc . ";\n" . $source if ($op->full_name eq 'bsr_ic');
172a177,187
> JUMP: {
>     switch (to) {
>         case 0: goto PC_0;
> END_C
>     print $jump;
>     print <<END_C;
>         default: exit(0);
>     }
> }
189c204,208
< return sprintf("goto PC_%d", $addr);
---
> if ($op->full_name =~ 'ret') {
>     return sprintf("to = dest;\ngoto JUMP");
> } else {
>     return sprintf("goto PC_%d", $addr);
> }
201c220,224
< return sprintf("goto PC_%d", $pc + $offset);
---
> if ($op->full_name eq 'jump_i') {
>     return sprintf("to = " . $pc . " + " . $offset . ";\ngoto JUMP");
> } else {
>     return sprintf("goto PC_%d", $pc + $offset);
> }

Index: pbc2c.pl
===================================================================
RCS file: /home/perlcvs/parrot/pbc2c.pl,v
retrieving revision 1.3
diff -r1.3 pbc2c.pl
70a71
> my @functions = ();
79a81,82
> void start();
85a89,90
> struct Parrot_Interp * interpreter;
88,89d92
< int i;
< struct Parrot_Interp * interpreter;
134a138,142
> print <<END_C;
>     start();
>     return 0;
> }
> END_C
163c171,172
< printf("PC_%d: { /* %s */\n%s}\n\n", $pc, $op->full_name, $source);
---
> push(@functions, $pc);
> printf("int\nPC_%d(int cur_opcode) /* %s */\n{\n%s}\n\n", $pc, $op->full_name, $source);
168,171c177,181
< PC_$new_pc:
< PC_0: {
<     exit(0);
< }
---
> void start() {
>     int (*functions[$pc])(int);
>     int j = 1;
173c183,191
< return 0;
---
> END_C
> foreach (0..scalar(@functions) - 1) {
>     print "functions[" . $functions[$_] . "] = (int (*)(int))PC_" . $functions[$_] . ";\n";
> }
> print <<END_C;
>     while (j) { j = (*functions[j])(j); };
>     exit(0);
189c207
< return sprintf("goto PC_%d", $addr);
---
> return sprintf("return (" . $addr . ");");
201c219
< return sprintf("goto PC_%d", $pc + $offset);
---
> return sprintf("return (cur_opcode + " . $offset . ");");
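As a rough illustration (this is not Daniel's actual generated code; the labels and ops are invented), the switch-trampoline pattern the first patch makes pbc2c.pl emit can be sketched in C. Dynamic branch targets (ret, jump_i) set a "to" variable and funnel through one switch that maps the value back onto goto labels, so indirect jumps still work in straight-line generated code:

```c
#include <assert.h>

/* Toy trampoline: two op labels, a dynamic "to" target, and the
 * JUMP switch that dispatches to the matching label. */
static int run(int start)
{
    int to = start;   /* dynamic branch target, as in the patch */
    int acc = 0;

JUMP:
    switch (to) {
        case 0: goto PC_0;
        case 2: goto PC_2;
        default: return acc;   /* stand-in for exit(0) */
    }

PC_0:                 /* a plain op: falls through to the next label */
    acc += 1;
PC_2:                 /* an indirect-branch op: computes "to" at runtime */
    if (acc < 3) { to = 0; goto JUMP; }
    to = -1;          /* fall out via the default case */
    goto JUMP;
}
```

The cost is one extra switch dispatch per indirect branch, which is why the patch only routes bsr/ret/jump_i through JUMP and leaves direct branches as plain gotos.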
Re: New patch
Just to make it clear: both of them still need a LOT of work, but I don't know which one I should stick with. On Mon, 29 Oct 2001, Daniel Grunblatt wrote: OK, there is another workaround to make pbc2c.pl work which still uses the goto model, so speed is not affected, but it's harder to maintain since it's not as generic as the other one. Daniel.
Re: Schedule of things to come
John Siracusa writes: I think we're due out in reasonably good alpha/beta shape for the summer. Heh, the phrase "suitably vague" springs to mind... :) There's a good reason for that, for why I've tried hard to avoid giving promises of when things would be ready. Have you seen Apache 2 and Mozilla slip their schedules? I'm making everyone take things feature-by-feature, and we'll give a release schedule when we can see the end in sight and not before. What would be the point of naming an arbitrary date when we don't even know when Larry will finish his Apocalypses? It seems crazy to have dates before you have specifications of the final system. Nat
RE: String rationale
In message [EMAIL PROTECTED] Stephen Howard [EMAIL PROTECTED] wrote: right. I had just keyed in on this from Tom's message: My code currently allows either set to provide the transform on the grounds that otherwise the unicode module would have to either know how to convert to everything else or from everything else. ...which seemed to posit that the Unicode module could be responsible for all the transcodings to and from its own character set, which seemed backwards to me. I was only positing it long enough to acknowledge that such a rule was untenable. What it comes down to is that there are three possible rules, namely:

1. Each character set defines transforms from itself to other character sets.
2. Each character set defines transforms to itself from other character sets.
3. Each character set defines transforms both from itself to other character sets and from other character sets to itself.

We have established that the first two will not work because of the unicode problem. That leaves the third, which is what I have implemented. When looking to transcode from A to B it will first ask A if it can transcode to B, and if that fails then it will ask B if it can transcode from A. That way each character set can manage its own translations both to and from unicode as we require. The problem it raises is, who is responsible for transcoding from ASCII to Latin-1? And back again? If we're not careful both ends will implement both translations and we will have effective duplication. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
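The rule-3 lookup Tom describes (ask A for an A->B transcoder, then ask B for a B<-A one, then fall back) can be sketched in C. The type names and table layout here are invented for illustration; Parrot's real chartype structures differ:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*transcoder_t)(const char *src, char *dst, size_t len);

struct chartype {
    const char *name;
    /* each returns NULL if this charset has no such transform */
    transcoder_t (*transcode_to)(const struct chartype *other);
    transcoder_t (*transcode_from)(const struct chartype *other);
};

/* First ask A if it can transcode to B; if that fails, ask B if it
 * can transcode from A.  NULL means: go via Unicode instead. */
transcoder_t lookup_transcoder(const struct chartype *a,
                               const struct chartype *b)
{
    transcoder_t t = NULL;
    if (a->transcode_to)
        t = a->transcode_to(b);          /* A -> B */
    if (!t && b->transcode_from)
        t = b->transcode_from(a);        /* B <- A */
    return t;
}
```

When both lookups return NULL, the caller (string_transcode, in Tom's code) would route the conversion through Unicode as the intermediary.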
Re: New patch
On Mon, Oct 29, 2001 at 03:15:07PM -0300, Daniel Grunblatt wrote: Just to make it clear both of them still need a LOT of work, but I don't know to which should I stick. Just in case anyone wonders what's up with this patch, I'm waiting for some feedback from others before applying. -- So i get the chance to reread my postings to asr at times, with a corresponding conservation of the almighty leviam00se, Kai Henningsen. -- Megahal (trained on asr), 1998-11-06
Improved storage-to-storage architecture performance
A little while back I posted some code that implemented a storage-to-storage architecture. It was slow, but I tossed that off as an implementation detail. Really. It was. :) Well, I've tuned things up a bit. It's now hitting 56 mops with the mops.pasm example. Parrot turns in 24 mops on the same machine with the same compiler options. This is not a fair comparison because the Parrot dispatcher isn't optimal, but it shows I'm not hand waving about the architecture any more... ;) Dan was right. It's a lot faster to emit explicit scope change instructions than to include a scope tag everywhere. Memory usage is about the same, but the explicit instructions permit code threading which is a *huge* win on some architectures. The assembler does 99% of the optimizations, and it still uses scope tagged instructions, so nothing is really lost by ripping out the scope tags. One thing I learned is that it's not necessary (or desirable) to do enter/exit scope ops. I implemented sync_scope which takes a scope id as an operand and switches the VM into that scope, adjusting the current lexical environment as necessary. This works really well. The reason why sync_scope works better than explicit enter/exit ops is because sync_scope doesn't force any execution order on the code. Compilers just worry about flow control and the VM figures out how to adjust the environment automatically. For example, Algol-style non-local goto is very fast -- faster and cleaner than exceptions for escaping from deep recursion. One other thing I tested was subroutine calling. This is an area where a storage-to-storage arch really shines. I called a naive factorial(5) in a loop 10 million times. Subroutine call performance obviously dominates. 
Here's the code and the times:

Parrot: 237,000 fact(5)/sec

fact:
    clonei
    eq I0, 1, done
    set I1, I0
    dec I0, 1
    bsr fact
    mul I0, I0, I1
done:
    save I0
    popi
    restore I0
    ret

Kakapo: 467,000 fact(5)/sec

.begin
fact:
    arg L0, 0
    cmp L1, L0, 1
    brne L1, else
    ret.i 1
else:
    sub L2, L0, 1
    jsr L3, fact, L2
    mul L4, L0, L3
    ret.i L4
.end

I think the main thing that makes the storage-to-storage architecture faster is that the callee won't step on the caller's registers. The caller's arguments can be fetched directly by the callee. There's no argument stack or save/restore needed. Here's the calling conventions for Kakapo. On a sub call, the pc is saved in the ret_pc register. Any frames not shared (lexically) between the caller and callee are dumped to the stack (just the frame pointers; the frames themselves are never copied). A sync_scope instruction at the start of a sub takes care of building the callee's lexical environment. The caller passes arguments by reference. The arg instruction uses the operands in the jsr instruction as an argument list. (The jsr instruction is easy to access because the ret_pc register points to it.) arg works exactly like set except that it uses the caller's lexical environment to fetch the source value. Yes, this makes jsr a variable-size instruction, but so what? There's no penalty on a software VM. - Ken
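The sync_scope idea above (declare the scope you want; let the VM enter/exit frames until it matches, rather than emitting explicit enter/exit ops) can be sketched in C. All names here are invented; Kakapo's real scope records carry frame pointers as well:

```c
#include <assert.h>
#include <stddef.h>

struct scope { int id; struct scope *parent; };

/* depth of s in its parent chain */
static int depth(const struct scope *s)
{
    int d = 0;
    while (s) { d++; s = s->parent; }
    return d;
}

/* Adjust the VM from *cur to target: exit scopes while we are too
 * deep, enter scopes while too shallow, and converge on the common
 * ancestor otherwise.  Returns how many enter/exit adjustments were
 * needed -- zero when we are already in the target scope (a nop). */
int sync_scope(const struct scope **cur, const struct scope *target)
{
    int ops = 0;
    const struct scope *c = *cur, *t = target;
    int dc = depth(c), dt = depth(t);
    while (dc > dt) { c = c->parent; dc--; ops++; }   /* exit  */
    while (dt > dc) { t = t->parent; dt--; ops++; }   /* enter */
    while (c != t) { c = c->parent; t = t->parent; ops += 2; }
    *cur = target;
    return ops;
}
```

This is what makes non-local goto cheap: the compiler only emits the flow control, and the guard op pops or builds whatever frames the jump actually requires.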
Re: Improved storage-to-storage architecture performance
At 04:44 PM 10/29/2001 -0500, Ken Fox wrote: Well, I've tuned things up a bit. It's now hitting 56 mops with the mops.pasm example. Parrot turns in 24 mops on the same machine with the same compiler options. Damn. I hate it when things outside my comfort zone end up being faster. :) This is not a fair comparison because the Parrot dispatcher isn't optimal, but it shows I'm not hand waving about the architecture any more... ;) I didn't think you were, unfortunately. (for me, at least) A SS architecture skips a level of indirection, and that'll end up being faster generally. What sort of dispatch was your version using, and what sort was parrot using in your test? One thing I learned is that it's not necessary (or desirable) to do enter/exit scope ops. Don't forget that you'll need those for higher-level constructs. For example, this code: { my Dog $spot is color('brindle'):breed('welsh corgi'); } will need to call Dog's constructor and attribute setting code every time you enter that scope. You also potentially need to allocate a new scope object every time you enter a scope so you can remember it properly if any closures are created. I implemented sync_scope which takes a scope id as an operand and switches the VM into that scope, adjusting the current lexical environment as necessary. How does this handle nested copies of a single scope? That's the spot a SS architecture needs to switch to indirect access from direct, otherwise you can only have a single instance of a particular scope active at any one time, and that won't work. I'm curious as to whether the current bytecode could be translated on load to something a SS interpreter could handle. Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: String rationale
On Mon, Oct 29, 2001 at 08:32:16PM +, Tom Hughes wrote: We have established that the first two will not work because of the unicode problem. Hm. I think instead of requiring Unicode to support everything, we should require Unicode to support /nothing/. If A and B have no mutual transcoding function, we should use Unicode as an intermediary. (This means that charsets that are lossy to unicode need to transcode to each other directly, like Far Eastern sets. (And Klingon, but that can't transcode to anything.)) This still makes Unicode a special case, but not a terrible one. (In fact, unicode can be treated like any other charset, except when we want to transcode between mutually incompatible sets, since we always try both A->B and A<-B. (Notational note: A->B means that A is implementing a transcoding from itself to B. A<-B means that A is implementing a transcoding from B to A.) That leaves the third, which is what I have implemented. When looking to transcode from A to B it will first ask A if it can transcode to B and if that fails then it will ask B if it can transcode from A. I propose another variant on this: If that fails, it asks A to transcode to Unicode, and B to transcode from Unicode. (Not Unicode to transcode to B; Unicode implements no transcodings.) The problem it raises is, who is responsible for transcoding from ASCII to Latin-1? And back again? If we're not careful both ends will implement both translations and we will have effective duplication. 1) Neither. Each must support transcoding to and from Unicode. 2) But either can support converting directly if it wants. I also think that, for efficiency, we might want a "7-bit chars match ASCII" flag, since most character sets do, and that means that we don't have to deal with the overhead for strings that fit in 7 bits. This smells of premature optimization, though, so somebody just file this away in their heads for future reference.
That would also mean that neither is responsible for converting between Latin-1 and ASCII, because core will do it, most of the time, and the rest of the time, it isn't possible. Hm. But it isn't possible _losslessly_, though it is possible lossily. IMHO, there should be two ways to transcode, or the transcoding function should flag to its caller somehow. (Sorry for the train-of-thought, but I think it's decently clear.) (BTW, for those paying attention, I'm waiting on this discussion for my chr/ord patch, since I want them in terms of charsets, not encodings.) -=- James Mastros
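James's "flag lossiness to the caller" idea can be sketched as a transcode result code instead of a bare buffer, so callers can reject lossy conversions when they need round-trip fidelity. The names and the toy Latin-1 to ASCII conversion below are invented for illustration:

```c
#include <assert.h>

enum transcode_result {
    TC_OK         = 0,   /* exact, round-trippable conversion */
    TC_LOSSY      = 1,   /* converted, but some characters approximated */
    TC_IMPOSSIBLE = 2    /* no conversion exists at all */
};

/* Toy Latin-1 -> ASCII: bytes above 0x7F have no exact ASCII form,
 * so we substitute '?' and report the loss instead of hiding it. */
enum transcode_result latin1_to_ascii(const unsigned char *src, int len,
                                      unsigned char *dst)
{
    enum transcode_result r = TC_OK;
    for (int i = 0; i < len; i++) {
        if (src[i] > 0x7F) {
            dst[i] = '?';
            r = TC_LOSSY;
        } else {
            dst[i] = src[i];
        }
    }
    return r;
}
```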
Re: String rationale
In message [EMAIL PROTECTED] James Mastros [EMAIL PROTECTED] wrote: That leaves the third, which is what I have implemented. When looking to transcode from A to B it will first ask A if it can transcode to B and if that fails then it will ask B if it can transcode from A. I propose another variant on this: If that fails, it asks A to transcode to Unicode, and B to transcode from Unicode. (Not Unicode to transcode to B; Unicode implements no transcodings.) My code does that, though at a slightly higher level. If you look at string_transcode() you will see that if it can't find a direct mapping it will go via unicode. If C had closures then I'd have buried that down in the chartype_lookup_transcoder() layer, but it doesn't so I couldn't ;-) The problem it raises is, who is responsible for transcoding from ASCII to Latin-1? And back again? If we're not careful both ends will implement both translations and we will have effective duplication. 1) Neither. Each must support transcoding to and from Unicode. Absolutely. 2) But either can support converting directly if it wants. The danger is that everybody tries to be clever and support direct conversion to and from as many other character sets as possible, which leads to lots of duplication. I also think that, for efficiency, we might want a "7-bit chars match ASCII" flag, since most character sets do, and that means that we don't have to deal with the overhead for strings that fit in 7 bits. This smells of premature optimization, though, so somebody just file this away in their heads for future reference. I have already been thinking about this although it does get more complicated as you have to consider the encoding as well - if you have a single byte encoded ASCII string then transcoding to a single byte encoded Latin-1 string is a no-op, but that may not be true for other encodings if such a thing makes sense for those character types.
(BTW, for those paying attention, I'm waiting on this discussion for my chr/ord patch, since I want them in terms of charsets, not encodings.) I suspect that the encode and decode methods in the encoding vtable are enough for doing chr/ord aren't they? Surely chr() is just encoding the argument in the chosen encoding (which can be the default encoding for the char type if you want) and then setting the type and encoding of the resulting string appropriately. Equally ord() is decoding the first character of the string to get a number. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
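Tom's point that chr/ord reduce to the encoding vtable's encode and decode methods can be sketched in C. The struct layout and the toy fixed-width 32-bit encoding here are invented; Parrot's real encoding vtable has more entries:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct encoding {
    int      (*encode)(uint32_t codepoint, unsigned char *buf); /* bytes written */
    uint32_t (*decode)(const unsigned char *buf);               /* first char */
};

/* A toy utf32-like encoding: every character is four bytes. */
static int utf32_encode(uint32_t cp, unsigned char *buf)
{ memcpy(buf, &cp, 4); return 4; }

static uint32_t utf32_decode(const unsigned char *buf)
{ uint32_t cp; memcpy(&cp, buf, 4); return cp; }

static const struct encoding utf32 = { utf32_encode, utf32_decode };

/* chr: encode one codepoint into a one-character string buffer. */
int vm_chr(const struct encoding *enc, uint32_t cp, unsigned char *buf)
{ return enc->encode(cp, buf); }

/* ord: decode the first character of a string. */
uint32_t vm_ord(const struct encoding *enc, const unsigned char *buf)
{ return enc->decode(buf); }
```

The charset question is separate: the returned codepoint is only meaningful relative to the string's character set, which is exactly James's concern.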
Re: String rationale
On Mon, Oct 29, 2001 at 11:20:47PM +, Tom Hughes wrote: 2) But either can support converting directly if it wants. The danger is that everybody tries to be clever and support direct conversion to and from as many other character sets as possible, which leads to lots of duplication. Yeah. But that's a convention thing, I think. I also think that most people won't go to the bother of writing conversion functions that they don't have to. What we need to worry about is both, say, big5 and shiftjis writing both of the conversions. And it shouldn't come up all that much, because Unicode is /supposed to be/ lossless for most things. I have already been thinking about this although it does get more complicated as you have to consider the encoding as well - if you have a single byte encoded ASCII string then transcoding to a single byte encoded Latin-1 string is a no-op, but that may not be true for other encodings if such a thing makes sense for those character types. Hm. All the encodings I can think of (which is rather limited -- the UTFs), you can scan for units (i.e. ints of the proper size) > 0x7f, and if you don't find any, it's 7-bit, and you can just change the charset marker without doing any work. In any case, it's up to the encoding to tell if we've got a pure 7-bit string. If that's complicated for it, it can just always return FALSE. I suspect that the encode and decode methods in the encoding vtable are enough for doing chr/ord aren't they? Hmm... come to think of it, yes. chr will always create a utf32-encoded string with the given charset number (or unicode for the two-arg version), ord will return the codepoint within the current charset. (This, BTW, means that only encodings that feel like it have to provide either, but all encodings must be able to convert to utf32.) Powers-that-be (I'm looking at you, Dan), is that good? -=- James Mastros
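The 7-bit purity check James describes is a one-pass scan: look for any code unit above 0x7f, and if none is found the "7-bit chars match ASCII" fast path applies and the transcode is just a charset retag. A minimal sketch (byte-sized units only; function name invented):

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when every unit fits in 7 bits, so transcoding between
 * ASCII-compatible charsets can be a no-op marker change. */
int is_seven_bit(const unsigned char *units, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (units[i] > 0x7F)
            return 0;
    return 1;
}
```

For wider encodings the same loop would run over 16- or 32-bit units; an encoding that can't cheaply answer just returns 0, as the post suggests.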
Re: Parameter passing conventions
A split between local, marginal, and global registers would be an interesting thing to do, and I can see it making the code more elegant. I worry about it making things more complex, though, especially with us already having multiple register types. (We'd double or triple the number of register types essentially, and to some extent blow cache even more than we do now. Might be a win in other ways, though. I'll have to ponder a bit) Yeah, I didn't like the idea of proliferating that more either. I still sometimes dream about a single register file of N regs into which we can put whatever we want. Each block of registers has room for the reg contents and the type info too. Seems you've got some of the support for that figured out in the stack already. Just declare that either (a) it is illegal (or behavior undefined) to do

    set $2, 5
    set $3, "foo bar"
    add $1, $2, $3

[just because we have higher-level data types than a real machine doesn't mean we can't still have general-purpose registers, I think] or (b) that if you do something numeric with a register that is non-numeric, type mucking happens behind the scenes and throws an exception if there is a problem. Certainly this wouldn't be surprising to anyone who had been looking at what we do with PMCs and arithmetic ops. If we ever did move to such a single-register-file model, I'd support looking seriously at the calling conventions of MMIX to see if we can get the appropriate performance characteristics. And, BTW, we have 4*32 = 128 regs now. We could even match the logical register count of MMIX (256) with only a doubling of total register count. And, if we ever determined we needed another kind of register (such as one that can be used for address arithmetic, since INTVAL doesn't cut it), we wouldn't have to add a fifth file, we'd just add another type (thinking again about the stack implementation). After reading the entire MMIX chapter, my mind went back and forth.
First of all, the only reason that 256 registers were used was because of the byte-aligned register arguments to the op-codes (plus modern intuition is that more registers = faster execution). Currently we have 4B arguments and don't perform boundary-condition checks, so this is neither here nor there. Currently we translate P1 to:

    interpreter->num_reg->registers[ cur_opcode[1] ]

which requires 4 indirections. (Multiple Px instances should be optimized by gcc so that subsequent accesses only require 2 indirections.) To avoid core-dumping on invalid arguments, we could up the reg-set to 256 and convert the above to:

    interpreter->num_reg->registers[ (unsigned char)cur_opcode[1] ]

and adjust the assembler accordingly. Alternatively, to work with 32 regs, we'd have:

    interpreter->num_reg->registers[ cur_opcode[1] & 0x001F ]

As for MMIX: I don't see a need for globals, since we're going to have various global symbol stashes available to us. Further, I don't see a value in providing special trapping code to return zero when reading from a marginal register or extending the local variable space when writing. For writing, an explicit reserve (once (or less) per function call) shouldn't be too much bother. And if the code is silly enough to write or read from this marginal region, then we'll pretend that they're using uninitialized values. Further, the reserve op must require that there is enough space to fully utilize the n-register set (currently 32), so that set $r31, 5 doesn't spill into the tail of the register set (since that was previously handled by the trapping code). This modifies the MMIX spec such that we potentially waste up to 31 register slots in the register window (which is trivial when we have a window of size = 1024). The rolling register set can be accomplished via three methods. First, realloc the register stack every time an extend exceeds the size. (This sucks for recursive functions.) Second, use paged register sets and copy values during partial spillover (very complex).
Lastly, utilize a [pow2] modulus on a fixed-size register stack. This has a very interesting implication: that we completely do away with the (push|pop)[ipsn] ops and their associated data structures. Thus the P1 translation becomes:

    // (for a 1K rolling stack)
    #define STACK_MASK 0x03FF
    interpreter->num_reg[ ( interp->num_offset + cur_opcode[1] ) & STACK_MASK ]

This has 4 indirections and two integer ops. As above, for multiple uses of Px in an op-code, this should be optimized to 2 indirections. Since indirections are significantly slower than bitwise logical operations, this should be roughly equivalent in speed to our current interpreter. If we were hell-bent on speed, we could utilize STACK_SIZE-aligned memory regions and perform direct memory arithmetic as with:

    interp->x_reg_base = [ ... ] 1K chunk of memory aligned to a 1K boundary
    interp->x_reg = interp->x_reg_base + interp->x_offset
    #define P1 *(int*)(((int)( interp->x_reg
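The power-of-two rolling window can be sketched as a self-contained C fragment: register index = (window offset + operand) & mask, so a "push" of a new frame is just moving the offset, and indices wrap around the fixed-size file. Field names and sizes are invented to match the post's example:

```c
#include <assert.h>

#define STACK_SIZE 1024                 /* must be a power of two */
#define STACK_MASK (STACK_SIZE - 1)

struct interp {
    int num_reg[STACK_SIZE];            /* the rolling register file */
    int num_offset;                     /* base of the current window */
};

/* Translate an opcode operand into a register slot.  The mask makes
 * the window wrap instead of spilling past the end of the file. */
static int *reg(struct interp *i, int operand)
{
    return &i->num_reg[(i->num_offset + operand) & STACK_MASK];
}
```

A callee would bump num_offset by its caller's window size on entry and restore it on return; no register copying happens in either direction, which is the point of dropping the (push|pop)[ipsn] ops.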
Re: Improved storage-to-storage architecture performance
Dan Sugalski wrote: What sort of dispatch was your version using, and what sort was parrot using in your test? Parrot used the standard function call dispatcher without bounds checking. Kakapo used a threaded dispatcher. There's a pre-processing phase that does byte code verification because threading makes for some outrageously unsafe code. Parrot and Kakapo should have very similar mops when using the same dispatcher. You all know what a Parrot add op looks like. Here's the Kakapo add op:

op_add:
    STORE(kvm_int32, pc[1]) = FETCH(kvm_int32, pc[2]) +
                              FETCH(kvm_int32, pc[3]);
    pc += 4;
    NEXT_OP;

Ok, ok. You want to know what those macros do... ;)

op_add:
    *(kvm_int32 *)(frame[pc[1].word.hi] + pc[1].word.lo) =
        *(const kvm_int32 *)(frame[pc[2].word.hi] + pc[2].word.lo) +
        *(const kvm_int32 *)(frame[pc[3].word.hi] + pc[3].word.lo);
    pc += 4;
    goto *(pc->i_addr);

I haven't counted derefs, but Parrot and Kakapo should be close. On architectures with very slow word instructions, some code bloat to store hi/lo offsets in native ints might be worth faster address calculations. Ken Fox wrote: One thing I learned is that it's not necessary (or desirable) to do enter/exit scope ops. Don't forget that you'll need those for higher-level constructs. For example, this code: { my Dog $spot is color('brindle'):breed('welsh corgi'); } will need to call Dog's constructor and attribute setting code every time you enter that scope. Definitely. I didn't say Kakapo doesn't have enter/exit scope semantics -- it does. There's no byte code enter scope op though. What happens is more declarative. There's a sync_scope guard op that means the VM must be in lexical scope X to properly run the following code. If the VM is already in scope X, then it's a nop. If the VM is in the parent of X, then it's an enter scope. If the VM is in a child of X, then it's an exit scope. This makes it *very* easy for a compiler to generate flow control instructions. For example:

{
    my Dog $spot ...
    {
        my Cat $fluffy ...
middle:
        $spot->chases($fluffy);
    }
}

What happens when you goto middle depends on where you started. sync_scope might have to create both Dog and Cat scopes when code jumps to the middle. Or, code might already be in a sub-scope of Cat, so sync_scope would just pop scopes until it gets back to Cat. This is where sync_scope is very useful. It allows the compiler to say this is the environment I want here and delegates the job to the VM on how it happens. You also potentially need to allocate a new scope object every time you enter a scope so you can remember it properly if any closures are created. Closures in Kakapo are simple. All it needs to do is:

1. copy any current stack frames to the heap
2. copy the display (array of frame pointers) to the heap
3. save the pc

Step #1 can be optimized because the assembler will have a pretty good idea which frames escape -- the run-time can scribble a note on the scope definition if it finds one the assembler missed. Escaping frames will just be allocated on the heap to begin with. This means that taking a closure is almost as cheap as calling a subroutine. Calling a closure is also almost as cheap as calling a subroutine because we just swap in an entirely new frame display. How does this handle nested copies of a single scope? That's the spot a SS architecture needs to switch to indirect access from direct, otherwise you can only have a single instance of a particular scope active at any one time, and that won't work. Calling a subroutine basically does this:

1. pushes previous return state on the stack
2. sets the return state registers
3. finds the deepest shared scope between caller and callee's parent
4. pushes the non-shared frames onto the stack
5. transfers control to the callee
6. sync_scope at the callee creates any frames it needs

I'm curious as to whether the current bytecode could be translated on load to something a SS interpreter could handle.
Never thought of that -- I figured the advantage of an SS machine is that brain-dead compilers can still generate fast code. Taking a really smart compiler generating register-based code and then translating it to an SS machine seems like a losing scenario. I think this is why storage-to-storage architectures have lost favor -- today's compilers are just too smart. Possibly with a software VM the memory pressure argument favoring registers isn't strong enough to offset the disadvantage of requiring smart compilers. I just put up the 0.2 version of Kakapo at http://www.msen.com/~fox/Kakapo-0.2.tar.gz This version has the sync_scope instruction, threaded dispatch, immediate mode operands, and a really crappy rewrite technique for instruction selection. One other thing that I discovered is how sensitive the VM is to dereferences. Adding the immediate mode versions of add and cmp gave me 10 more mops in the
Re: Improved storage-to-storage architecture performance
Uri Guttman wrote: and please don't bring in hardware comparisons again. a VM design cannot be compared in any way to a hardware design. I have absolutely no idea what you are talking about. I didn't say a single thing about hardware. My entire post was simply about an alternative VM architecture. It's not a theory. You can go get the code right now. I'm just messing around on a storage-to-storage VM system I've named Kakapo. It's a dead-end. A fat, flightless, endangered kind of parrot. It's fun to experiment with ideas and I hope that good ideas might make it into Parrot. - Ken
Re: Improved storage-to-storage architecture performance
Uri Guttman wrote: that is good. i wasn't disagreeing with your alternative architecture. i was just making sure that the priority was execution over compilation speed. I use a snazzy quintuple-pass object-oriented assembler written in equal parts spit and string (with a little RecDescent thrown in for good measure). A real speed demon it is... ;) The real motivation of my work is to see if a storage-to-storage machine ends up using cache better and with less compiler effort than a register machine. When I read about CRISP, the first thing that came to mind was the top-of-stack-register-file could be simulated exactly with high-speed cache in a software VM. Dropping the stack-machine instructions in favor of Parrot's 3 operand ones made it sound even better. ... then be mmap'ed in and run with hopefully impressive speed. I'm impressed with the possibilities of the pbc->C translator. The core modules on my system probably won't be mmap'ed byte code -- they'll be mmap'ed executable. Reducing memory foot-print this way might take some of the pressure off the need to share byte code. Lots of really nice optimizations require frobbing the byte code, which definitely hurts sharing. - Ken
Request for new feature: attach a perl debugger to a running process
Hi, I would like to request a new feature for perl: The ability to attach a perl debugger to a running process. Also, it would be nice to have the capability to generate a dump (core file) for post-mortem analysis. The perl debugger could then read the core file. These capabilities would add a lot of value to perl. Thanks in advance!! David
Re: Request for new feature: attach a perl debugger to a running process
On Mon, Oct 29, 2001 at 05:27:30PM +, David Trusty wrote: I would like to request a new feature for perl: The ability to attach a perl debugger to a running process. The DB module gives you the tools to do this sort of thing, though there is some assembly required for certain very large values of some. -- Michael G. Schwern [EMAIL PROTECTED]http://www.pobox.com/~schwern/ Perl6 Quality Assurance [EMAIL PROTECTED] Kwalitee Is Job One There is a disturbing lack of PASTE ENEMA on the internet.