Re: Transferring control between code segments, eval, and suchlike things
At 03:00 PM 1/22/2003 -0500, you wrote:
> Okay, since this has all come up, here's the scoop from a design perspective.
>
> First, the branch opcodes (branch, bsr, and the conditionals) are all meant for movement within a segment of bytecode. They are *not* supposed to leave a segment. To do so was arguably a bad idea, now it's officially an error. If you need to do so, branch to an op that can transfer across boundaries.
>
> Design Edict #1: Branches, which is any transfer of control that takes an offset, may *not* escape the current bytecode segment.

Seems reasonable. Especially when the bytecode loader may not guarantee the relative placement of segments (think mmap()). Although, all this would seem to suggest that we'd need/want a special-purpose allocator for bytecode segments, since every sub has to fit within precisely one segment (and I know _I'd_ like to keep bytecode segments on their own memory pages, to e.g. maximize sharing on fork()).

> Next, jumps. Jumps take absolute addresses, so either need fixup at load time (blech), are only valid in dynamically generated code (okay, but limiting), or can only jump to values in registers (that's fine). Jumps aren't a problem in general.

Fixups aren't so bad if we make the jump opcode itself take an index into a table of fixups (thus letting the bytecode stream stay read-only). Register jumps are dangerous, since parrot can't control what the user code loads into the register (while we can theoretically protect the fixup table from anything short of native code).

> Design Edict #2: Jumps may go anywhere.
>
> Destinations. These are a pain, since if we can go anywhere then the JIT has to do all sorts of nasty and unpleasant things to compensate, and to make every op a valid destination. Yuck.
>
> Design Edict #3: All destinations *must* be marked as such in the bytecode metadata segment.
> (I am officially nervous about this, as I can see a number of ways to subvert this for evil)

Marked destinations are very important; as for evil subversion, how about just saying untrusted code only gets pure interpretation, and the untrusting interpreter bounds-checks everything?

[snip]

> Calling actual routines--subs, methods, functions, whatever--at the high level isn't done with branches or jumps. It is, instead, done with the call series of ops. (call, callmeth, callcc, tailcall, tailcallmeth, tailcallcc (though that one makes my head hurt), invoke) These are specifically for calling code that's potentially in other segments, and to call into them at fixed points. I think these need to be hashed out a bit to make them more JIT-friendly, but they're the primary transfer destination point
>
> Design Edict #6: The first op in a sub is always a valid jump/branch/control transfer destination

Wouldn't make much sense if you had a sub but couldn't call it, now would it? :-D

> Now. Eval. The compile opcode going in is phenomenally cool (thanks, Leo!) but has pointed out some holes in the semantics. I got handwavey and, well, it shows. No cookie for me.
>
> The compreg op should compile the passed code in the language that is indicated and should load that bytecode into the current interpreter. That means that if there are any symbols that get installed because someone's defined a sub then, well, they should get installed into the interpreter's symbol tables.
>
> Compiled code is an interesting thing. In some cases it should return a sub PMC, in some cases it should execute and return a value, and in some cases it should install a bunch of stuff in a symbol table and then return a value. These correspond to:
>
>     eval print 12;
>     $foo = eval sub bar{return 1;};
>     require foo.pm;
>
> respectively. It's sort of a mixed bag, and unfortunately we can't count on the code doing the compilation to properly handle the semantics of the language being compiled. So...
>
> Design Edict #7: the compreg opcode will execute the compiled code, calling in with parrot's calling conventions. If it should return something, then it had darned well better build it and return it.

How does this play with

    eval 'sub bar { change_foo(); }
          BEGIN { bar(); }
          (...stuff that depends on foo...)';

? The semantics of BEGIN{} would seem to require that bar be installed into the symbol table immediately... but then how do we reproduce that if we're e.g. loading precompiled bytecode?

> Oh, and:
>
> Design Edict #8: compreg is prototyped. It takes a single string and must return a single PMC. The compiler may cheat as need be. (No need to check and see if it returned a string, or an int)
>
> Yes, this does mean that for plain assembly that we want to compile and return a sub ref for we need to do extra in the assembly we pass in. Tough, we can deal. If it was dead-simple it wouldn't be assembly. :)

That makes sense.

-- BKS
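A minimal sketch of what enforcing Design Edict #1 might look like in C: a branch is resolved relative to the current position and rejected if it would land outside the segment. The `segment` struct and `branch_in_segment` are hypothetical names for illustration only, not Parrot's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one bytecode segment (not Parrot's real struct). */
typedef struct {
    const long *ops;  /* start of the segment's op stream */
    size_t      len;  /* number of op words in the segment */
} segment;

/* Resolve a relative branch from position pc. Returns true and stores the
 * target in *dest if the target stays inside the segment; returns false if
 * the branch would escape it (an error under Edict #1). */
static bool branch_in_segment(const segment *seg, size_t pc, long offset,
                              size_t *dest)
{
    assert(seg != NULL && dest != NULL && pc < seg->len);

    if (offset < 0) {
        /* Magnitude computed in unsigned arithmetic to avoid overflow. */
        size_t back = (size_t)0 - (size_t)offset;
        if (back > pc)
            return false;          /* would land before the segment start */
        *dest = pc - back;
    } else {
        if ((size_t)offset >= seg->len - pc)
            return false;          /* would land past the segment end */
        *dest = pc + (size_t)offset;
    }
    return true;
}
```

An interpreter running untrusted code could call a check like this on every branch; a trusted or JITted path could presumably skip it once the segment has been verified at load time.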
interpreter passing (was Re: Large string patch)
--- Dan Sugalski [EMAIL PROTECTED] wrote:
> At 07:30 AM 12/30/2001 -1000, David Lisa Jacobs wrote:
> > From: Dan Sugalski [EMAIL PROTECTED]
> > > At 08:33 PM 12/29/2001 -1000, David Lisa Jacobs wrote:
> > > GC will manage all the memory. Everything managed should either be hung off a PMC or an internal structure. (There are GC hooks in the vtable for complex things)
> >
> > So does that mean I can get rid of passing around the interpreter?
>
> Sort of. Memory and structure (pmc header, string header) allocation must be from interpreter-local pools. There's a patch to use TLS for the interpreter pointer rather than passing it as an argument--I've pretty much decided it's The Way To Go, so I'm going to dig it out and apply it.
>
> So you still need the interpreter pointer, you just don't have to pass it.

Are you really sure about this? The reason perl5 threads are MULTIPLICITY-based (pass around an interpreter pointer) is that Sarathy got a noticeable speedup from not having to call pthread_getspecific() every time he needed to allocate memory or look up a symbol. It can be good to have _nocontext functions that fetch the interpreter when it's really needed (e.g. to throw an error), but do we want to have to make an extra library call of unknown efficiency on _every_ call to string_make()?

-- BKS
Re: Request for comments
--- Dan Sugalski [EMAIL PROTECTED] wrote:
> At 08:03 PM 12/18/2001 -0800, Benjamin Stuhl wrote:
> > --- Melvin Smith [EMAIL PROTECTED] wrote:
> > > 3) Perl IO has conditional compilation for using stdio. Dan has said no STDIO but are we going to abandon conditional support for Parrot? (I vote for ditching conditional STDIO support because then it's easier to stop thinking in STDIO terms...)
> >
> > Unfortunately, I don't think we can completely do without stdio in parrot for one reason: miniparrot. Without doing a full configure.pl run, the _only_ I/O API we're guaranteed is a basic stdio.
>
> Ah, but we can abandon stdio completely, unless you file read/write in with stdio. Which is reasonable, but I don't generally count them.

I was going to argue that unix-ish open()/read()/write()/close() aren't portable, but even Win32's runtime provides an emulation of that much (as does VMS's). Are there any platforms that don't provide this API? (I argued for stdio because it's in the ANSI spec, and so _must_ be there, as opposed to us simply assuming it'll be there.)

> > > I want comments now or else I threaten to post replies to myself in a creepy third person way.
> >
> > No! Anything but that. BKS hates things like that!
>
> Suckered into those freshmen harmless Psych 101 experiments, I see :)

Way too many friends in Prof. Moss's cult^Hclass... ;-)

-- BKS
Re: Hello? Win32 on fire?
--- Andy Dougherty [EMAIL PROTECTED] wrote:
> One idiom which might work is
>
>     cd foo && $(MAKE)
>
> Since lines in makefiles are handed off to the native shell, this will be dependent upon the user's native shell. I don't know any details, but I gather the various shells in Win95, Win98, WinNT, and WinXP are not necessarily identical. I *think* the above idiom works in NT, but not Win98. I'm hopeful it will work in XP. (Of course, the user may well have installed a different command shell, in which case who knows what will happen.)

It should work, IIRC, since XP's shell is the latest version of cmd.exe, the NT shell.

> If the user is using dmake, but is stuck with Win95(?)'s command.com, then he or she can still use perl5's win32/genmk95.pl. Here are the comments from it:
>
>     # genmk95.pl - uses miniperl to generate a makefile that command.com will
>     #              understand given one that cmd.exe will understand
>     # Author: Benjamin K. Stuhl
>     # Date: 10-16-1999
>     # how it works:
>     #    dmake supports an alternative form for its recipes, called "group
>     #    recipes", in which all elements of a recipe are run with only one shell.
>     #    This program converts the standard dmake makefile.mk to one using group
>     #    recipes. This is done so that lines using && or || (which command.com
>     #    doesn't understand) may be split into two lines that will still be run
>     #    with one shell.
>
> We would need permission from the author to include the script itself in parrot.

Consider permission granted, but I probably won't get around to rigging it up to work with parrot myself. (I finally got a newer system with Win2k, so I don't need to arm-wrestle command.com anymore, woohoo!)

> Of course, this idiom won't AFAIK work on VMS, but that shouldn't surprise anyone.

-- BKS
Re: Key stuff for aggregates
--- Dan Sugalski [EMAIL PROTECTED] wrote:
> At 10:28 AM 12/5/2001 -0500, Jason Gloudon wrote:
> > Using the aggregate's vtable is another way of getting the job done that avoids all the extra reference PMCs. However, references will have to be supported.
>
> References are interesting. I'm currently thinking that:
>
> *) PMCs should have a get_reference vtable entry
> *) Accessing a reference should be just like accessing the referent. (i.e. you pass in the same key stuff and the reference vtable does the indirect lookup for you)
> *) Some references will need to be 'smart', so if you do:
>
>        $foo = \@bar[4];
>
>    and @bar's a packed array, $foo's actually a fancy ref that knows it points to @bar[4] and calls @bar's vtables when you access it.
>
> Or something like that.

This looks interesting, as far as it goes, but how will parrot support the perl5ish

    use overload '@{}' => \&deref_as_array,
                 '%{}' => \&deref_as_hash;

? Do we pass in the PMC type that we want it to come back as? But then how do we tell between the various types of e.g. arrays? (To put it simply, how do you say "I want an array. No, I don't _care_ if it's a PerlPMCArray or a PerlIntArray or a PerlWhatHaveYouArray!"?)

-- BKS
Re: Opcode numbers
--- Gregor N. Purdy [EMAIL PROTECTED] wrote:
> Brian --
>
> None of these are issues with the approach I've been working on / advocating. I'm hoping we can avoid these altogether.
>
> > I think this is a cool concept, but it seems like a lot of overhead with the string lookups.
>
> I'm hoping we can keep the string lookups in order to sidestep the versioning issue. They can be made pretty cheap with a hashtable or search tree, and the lookups only happen once when we load. And, we may even be able to create the tree or hash table structure as part of the oplib.so, so we don't even have to pay to construct it at run time.
>
> I guess I'm making the provisional assumption that by the time we go out and dynamically load the oplib, a few op lookups by name won't be too big a deal if we are smart about it. Of course, I could be wrong, but I'd like to see it in action before passing judgement on it.

[snip]

Better than doing two string lookups for every op we use (library, op_name), we can vector the library through the fixup section. This is sort of how I at least envision accessing global variables: the fixup has an entry (PAR_FIXUP_GLOBVAR, strtab_ref($foo)), where strtab_ref() is the index of a string in the string table. So loading another oplib becomes as simple as (PAR_FIXUP_OPLIB, core). The individual op descriptors then simply reference the fixup for that library, which after fixup contains the global index of the library.

Actually, if libraries are good about not reusing op numbers, we don't have to do _any_ string lookups. Since we're building each module's op table at load time anyway, we don't lose any cache space by not reusing op numbers, since unused ops will never show up in any module's table. e.g.

    ..use core
    ..use perl
    set I0, 2
    set I1, 3
    add I3, I0, I1
    fetch P0, $foo
    inc P0

produces

    # .section .fixup
    PAR_FIXUP_OPLIB, 1
    PAR_FIXUP_OPLIB, 2
    PAR_FIXUP_OPTABLE_SIZE, 4   # put this here so .optable can
                                # be processed w/o any special
                                # cases and we can prealloc the
                                # table
    PAR_FIXUP_VARREF, 3         # this becomes the pointer to the
                                # entry in the symbol table (not
                                # the variable itself - its slot
                                # in the table so that aliasing
                                # works right)
    # .section .strtab
    core
    perl
    $foo
    # .section .optable
    1, 54   # core::set_i_ic
    1, 33   # core::add_i_i_i
    2, 1    # perl::fetch_p_ic
    2, 23   # perl::inc_p
    # .section .text
    # numbers are the actual opcode/operand values
    1 0 2
    1 1 3
    2 3 0 1
    3 0 4
    4 0

By using this scheme we manage to not do _any_ string lookups, and if we pick our base set of ops well enough that we don't end up obsoleting many of them, we also won't be using as much memory as the string lookups would require.

-- BKS
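The load-time step described above can be sketched in C roughly as follows: each `.optable` entry names a library (by fixup index) and an op number within it, and the loader resolves those pairs into a dense per-module dispatch table that the `.text` stream then indexes. All type and function names here (`oplib`, `optable_entry`, `build_op_table`, the two demo ops) are assumptions made for illustration; the real op signature would certainly differ.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*op_func)(void);   /* stand-in for a real op signature */

/* A loaded op library: op numbers index directly into funcs. */
typedef struct {
    const char    *name;
    const op_func *funcs;
    size_t         n_funcs;
} oplib;

/* One .optable entry: library (1-based fixup index) and op number in it. */
typedef struct {
    size_t lib_fixup;
    size_t op_number;
} optable_entry;

/* Two toy ops so the sketch can be exercised. */
static void demo_set(void) {}
static void demo_inc(void) {}

/* Resolve a module's .optable into a dense dispatch table.
 * libs[i] is the library that fixup i+1 resolved to. No string lookups
 * happen here. Returns 0 on success, -1 on an unknown library or op. */
static int build_op_table(const oplib *const *libs, size_t n_libs,
                          const optable_entry *entries, size_t n_entries,
                          op_func *table_out)
{
    assert(libs != NULL && entries != NULL && table_out != NULL);

    for (size_t i = 0; i < n_entries; i++) {
        size_t lib = entries[i].lib_fixup;
        size_t op  = entries[i].op_number;

        if (lib == 0 || lib > n_libs)
            return -1;
        if (op >= libs[lib - 1]->n_funcs || libs[lib - 1]->funcs[op] == NULL)
            return -1;
        table_out[i] = libs[lib - 1]->funcs[op];
    }
    return 0;
}
```

Because only the ops a module actually names end up in its table, sparse op numbering in the libraries costs nothing in the dispatch table itself, which is the point made above about not reusing op numbers.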
Re: Revamping the build system
--- Dan Sugalski [EMAIL PROTECTED] wrote:
> Okay, I think it's time to abstract out how the build system's handled a bit. I'm not sure how much we need, but filling in a template makefile's not going to cut it, I think.
>
> We've a couple of things we need to do generically:
>
> *) Compile C code to an object module and put that module in a library

We'll also need to be able to apply specific compiler options to specific source files from at least the platform-specific hints files (e.g. on platforms whose optimizer breaks regexec.c).

-- BKS
Re: [PATCH] Big patch to have DO_OP as optional switch() statment
--- Paolo Molaro [EMAIL PROTECTED] wrote:
[snip, snip]
> The problem here is to make sure we really need the opcode swap functionality, it's really something that is going to kill dispatch performance. If a module wants to change the meaning of, eg the + operator, it can simply request the compiler to insert a call to a subroutine, instead of changing the meaning assigned to the VM opcode. The compiler is free to inline the sub, of course, just don't cripple the normal case with unnecessary overhead and let the special case pay the price of flexibility. Of course, if the special case is not so special, a _new_ opcode can be introduced, but there is really no reason to change the meaning of an opcode on the fly, IMHO.
>
> Comment, or flame, away.

Unfortunately, compiler tricks only work at compile time. They're great for static languages like C++ or C#, but Perl supports doing

    %CORE::GLOBAL::{'print'} = \&myprint;

at _runtime_. This is much too late to be going back and patching up any occurrences of print_p in the opstream, so we need a level of indirection on every overridable opcode. (Note that the _overloadable_ ones like the math routines don't need that level of indirection - they get it by vectoring through the PMC vtables).

-- BKS
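One way to picture the "level of indirection on every overridable opcode" is a per-interpreter table of function pointers that the op body calls through, so a runtime override only rewrites one slot and never touches the opstream. This is a sketch under that assumption; `interp`, `OP_PRINT`, `override_op` and the toy implementations are hypothetical names, and the ops return an int purely so the behaviour is easy to observe.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*op_impl)(int);

enum { OP_PRINT, OP_COUNT };          /* hypothetical overridable ops */

/* Toy implementations: the default and a user-supplied replacement. */
static int default_print(int x) { return x; }
static int my_print(int x)      { return -x; }

/* Per-interpreter table of the current implementation of each
 * overridable op. */
typedef struct {
    op_impl overridable[OP_COUNT];
} interp;

static void interp_init(interp *in)
{
    assert(in != NULL);
    in->overridable[OP_PRINT] = default_print;
}

/* Runtime override: swap one slot; already-compiled bytecode is untouched. */
static void override_op(interp *in, int op, op_impl impl)
{
    assert(in != NULL && op >= 0 && op < OP_COUNT && impl != NULL);
    in->overridable[op] = impl;
}

/* What the op body in the runloop would do: one extra pointer load. */
static int dispatch(const interp *in, int op, int arg)
{
    assert(in != NULL && op >= 0 && op < OP_COUNT);
    return in->overridable[op](arg);
}
```

The cost is one extra indirect call per overridable op, which is the overhead the quoted message objects to; ops that are merely overloadable would not go through this table at all.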
RE: Parrot 0.0.2
--- Brent Dax [EMAIL PROTECTED] wrote: --Brent Dax [EMAIL PROTECTED] Configure pumpking for Perl 6 They *will* pay for what they've done. # -Original Message- # From: Simon Cozens [mailto:[EMAIL PROTECTED]] # Sent: Wednesday, October 03, 2001 09:51 # To: Brent Dax # Cc: [EMAIL PROTECTED] # Subject: Re: Parrot 0.0.2 # # # OK, let's try and clear this up. # # On Wed, Oct 03, 2001 at 09:39:32AM -0700, Brent Dax wrote: # # got: 'Seem to have negative Nx # not ok # ' # # expected: 'Seem to have negative Nx # Seem to have positive Nx after pop # ' # # Don't know what's going on here. # # t/op/string.NOK 4# Failed test (Parrot/Test.pm at line 74) # # got: 'Error: Control left bounds of byte-code # block (now at # location # 31)! # # There isn't an end on that test. Fixed. # # # got: 'failure # ' # # Since there was no other output, this failed: # timeI0 # ge I0, 0, OK1 # # Now that's anyone's guess. # # I've added some debugging prints, can you try a resync? I resynced at 3:05pm EDT today, and I'm seeing the same errors (Pentium III, Win2k): C:\parrotperl t/harness t/op/basic..ok t/op/bitwiseok t/op/integerok t/op/number.ok t/op/stacks.NOK 5# Failed test (Parrot/Test.pm at line 74) # got: 'Seem to have negative Nx not ok ' # expected: 'Seem to have negative Nx Seem to have positive Nx after pop ' t/op/stacks.ok 9/9# Looks like you failed 1 tests of 9. t/op/stacks.dubious Test returned status 1 (wstat 256, 0x100) DIED. FAILED test 5 Failed 1/9 tests, 88.89% okay (-3 skipped tests: 5 okay, 55.56%) t/op/string.ok, 1/11 skipped: TODO: printing empty string reg segfaults t/op/time...NOK 2# Failed test (Parrot/Test.pm at line 74) # got: 'failure ' # expected: 'ok, (!= 1970) Grateful Dead not ok, (nowbefore) timelords need not apply ' # Looks like you failed 1 tests of 2. t/op/time...dubious Test returned status 1 (wstat 256, 0x100) DIED. 
FAILED test 2 Failed 1/2 tests, 50.00% okay t/op/trans..ok Failed Test Status Wstat Total Fail Failed List of Failed t/op/stacks.t 1 256 91 11.11% 5 t/op/time.t1 256 21 50.00% 2 4 subtests skipped. Failed 2/8 test scripts, 75.00% okay. 2/100 subtests failed, 98.00% okay. -- BKS __ Do You Yahoo!? NEW from Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month. http://geocities.yahoo.com/ps/info1
Re: SV: Parrot multithreading?
--- Alan Burlison [EMAIL PROTECTED] wrote:
> > > or have entered a mutex,
> >
> > If they're holding a mutex over a function call without a _really_ good reason, it's their own fault.
>
> Rubbish. It is common to take out a lock in an outer function and then to call several other functions under the protection of the lock.

Let me be more specific: if you're holding a mutex over a call back into parrot, it's your own fault. Parrot itself knows which functions may croak() and which won't, so it can use utility functions that return a status in places where it'd be unsafe to croak(). (And true panics probably should not be croak()s the way they are in perl5 - there's not much an application can do with "Bizarre copy of ARRAY")

> > The alternative is that _every_ function simply return a status, which is fundamentally expensive (your real retval has to be an out parameter, to start with).
>
> Are we talking 'expensive in C' or 'expensive in parrot?'

Expensive in C (wasted memory bandwidth, code bloat - cache waste), which translates to a slower parrot.

> > It is also slow, and speed is priority #1.
>
> As far as I'm aware, trading correctness for speed is not an option.

This is true, which is why I asked if there were any platforms that have a nonfunctional (set|long)jmp.

-- BKS
Re: SV: Parrot multithreading?
Thus did the Illustrious Dan Sugalski [EMAIL PROTECTED] write:
> Croak's going to throw an interpreter exception. There's a little bit of documentation about the exception handling opcodes in docs/parrot_assembly.pod, with more to come soonish.

This is fine at the target language level (e.g. perl6, python, jako, whatever), but how do we throw catchable exceptions up through six or eight levels of C code? AFAICS, this is more of why perl5 uses the JMP_BUF stuff - so that XS and functions like sv_setsv() can Perl_croak() without caring about who's above them in the call stack.

The alternative is that _every_ function simply return a status, which is fundamentally expensive (your real retval has to be an out parameter, to start with).

-- BKS
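For readers unfamiliar with the technique, here is a small C sketch of the setjmp/longjmp pattern being argued for: a deeply nested function can "croak" without any of its callers checking a status, and control lands back at the nearest protected call. This is not perl5's actual JMP_BUF machinery; `interp`, `croak`, `protected_call` and the `level*` functions are made-up names, and the sketch ignores the cleanup problems (locks, non-GC resources) raised elsewhere in this thread.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

typedef struct {
    jmp_buf *top;    /* innermost active handler, or NULL */
    int      error;  /* code passed to the most recent croak() */
} interp;

/* Unwind straight to the innermost handler; never returns. */
static void croak(interp *in, int code)
{
    assert(in != NULL && in->top != NULL && code != 0);
    in->error = code;
    longjmp(*in->top, 1);
}

/* Three levels of ordinary C code; none of them checks a status. */
static int level3(interp *in, int x)
{
    if (x < 0)
        croak(in, 42);
    return x * 2;
}
static int level2(interp *in, int x) { return level3(in, x) + 1; }
static int level1(interp *in, int x) { return level2(in, x) + 1; }

/* Run level1() under a handler. Returns 0 and stores the result on
 * success, or the croak code if something below threw. */
static int protected_call(interp *in, int x, int *result)
{
    jmp_buf  here;
    jmp_buf *saved = in->top;   /* set before setjmp, never modified after */

    in->top = &here;
    if (setjmp(here) == 0) {
        *result = level1(in, x);
        in->top = saved;
        return 0;
    }
    in->top = saved;            /* arrived here via longjmp() */
    return in->error;
}
```

The status-returning alternative would need every one of `level1` through `level3` to take an out parameter and test a return code, which is the overhead the message above calls fundamentally expensive.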
RE: SV: Parrot multithreading?
--- Hong Zhang [EMAIL PROTECTED] wrote:
> > This is fine at the target language level (e.g. perl6, python, jako, whatever), but how do we throw catchable exceptions up through six or eight levels of C code? AFAICS, this is more of why perl5 uses the JMP_BUF stuff - so that XS and functions like sv_setsv() can Perl_croak() without caring about who's above them in the call stack.
>
> This is my point exactly. This is the wrong assumption. If you don't care about the call stack, how can you expect the [sig]longjmp can successfully unwind stack? The caller may have a malloc memory block,

Irrelevant with a GC.

> or have entered a mutex,

If they're holding a mutex over a function call without a _really_ good reason, it's their own fault.

> or acquire the file lock of Perl cvs directory. You probably have to call Dan or Simon for the last case.
>
> > The alternative is that _every_ function simply return a status, which is fundamentally expensive (your real retval has to be an out parameter, to start with).
>
> This is the only right solution generally. If you really really really know everything between setjmp and longjmp, you can use it. However, the chance is very low.

It is also slow, and speed is priority #1.

[snip, snip]
> code. The problem is they can not be used inside signal handler under MT, and it is (almost) impossible to write a thread-safe version.

Signals are an event, and so don't need jumps. Under MT, it's not like there would be a lot of contention for PAR_jump_lock.

-- BKS
Re: 0.0.2 needs what?
--- Dan Sugalski [EMAIL PROTECTED] wrote:
> At 06:07 PM 9/25/2001 -0700, Benjamin Stuhl wrote:
> > Just to make sure that it's making the _right_ sense, the fixup section is basically our single level of indirection so that we can make the bytecode itself be position-independent, right?
>
> Yup.
>
> > But why store it in this format? What we really need to store is the list of what we expect in the table and where.
>
> We have 8 bytes per entry. We can store a lot in there. :)

But not enough to allow us to vector Perl-level variable references - to support run-time aliasing, we need to have the fixup table entry be built from the symbol names: e.g. for accessing $Foo::bar, we need

    index in bytecode (offset in current fixup table)
      -> fixup table entry (constructed by looking up $Foo::bar in the
         symbol table and getting the address of its symtab entry)
      -> the PMC * from the symtab entry

This extra level of indirection lets us be sure that all the references to a variable are updated when someone does

    %Foo::{'$bar'} = \$glock;

> > Can't we just have the bytecode header have
> >
> >     int32 *data_template;
> >     int32 fixup_space_needed;
> >
> > and build the final table as needed?
>
> If we do that, the fixup section won't be part of the same section of memory as the bytecode, which means we'll need to touch at least some of the actual bytecode so we can set the absolute address of the fixups. Sticking it on the end means we can access it relatively, and padding to 8k means the section should start on its own memory page so we won't be making a private copy of anything but the fixup section.

But it should be just as cheap to have a pointer to the current fixup section in the interpreter structure and just vector off of that, rather than doing relative lookups. Besides, for non-constant entries in the fixup table (PMCs, again), the fixup section needs to be per-interpreter, since different interpreters will want to reference their own PMCs, not someone else's.

-- BKS
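The three-step lookup chain described above (bytecode index, then fixup entry, then symbol-table slot, then PMC) might look like this in C. It is only a sketch: `PMC`, `symtab_entry`, `interp`, `fetch_global` and `alias_symbol` are illustrative stand-ins, not Parrot's real types or API.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int value; } PMC;        /* stand-in for a real PMC */

/* A symbol-table slot: the thing that aliasing rewrites. */
typedef struct {
    const char *name;
    PMC        *pmc;
} symtab_entry;

/* Per-interpreter fixup table: each entry points at a symtab slot,
 * not at the PMC itself. */
typedef struct {
    symtab_entry **fixups;
    size_t         n_fixups;
} interp;

/* bytecode operand -> fixup entry -> symtab slot -> PMC */
static PMC *fetch_global(const interp *in, size_t fixup_index)
{
    assert(in != NULL && fixup_index < in->n_fixups);
    return in->fixups[fixup_index]->pmc;
}

/* Run-time aliasing: repoint the slot. Every fixup that refers to this
 * slot sees the new PMC on its next fetch. */
static void alias_symbol(symtab_entry *slot, PMC *target)
{
    assert(slot != NULL && target != NULL);
    slot->pmc = target;
}
```

Because the fixup table hangs off the interpreter struct in this sketch, two interpreters running the same read-only bytecode can each resolve the same operand to their own PMC.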
RE: [PATCH] assemble.pl registers go from 0-31
--- Hong Zhang [EMAIL PROTECTED] wrote:
> Just curious, do we need a dedicated zero register and sink register? The zero register always reads zero, and can not be written. The sink register can not be read, and write to it can be ignored.

Those, probably not - we have a real nop, and it takes the same number of bits to encode a register as it does a literal integral zero. What we may want (and I've brought up before) is special PMC registers, so our constant tables aren't clogged up with undefs or the like.

-- BKS
Re: pack(d) packs floats, I think.
--- Simon Cozens [EMAIL PROTECTED] wrote:
> On Sun, Sep 23, 2001 at 02:17:40AM +0300, Jarkko Hietaniemi wrote:
> > unaligned access
>
> Bother. It is as I feared. Dan, we need to do something about this. The choices are: put floats into the constant section, or ensure instructions are assigned on an appropriate boundary. I can see pros and cons of both.

What I ended up doing in my work on the perl5 B::Bytecode stuff was actually storing floats as their string representation and restoring them back into native floats at load time. Yes, it's slow and hackish, but strings are the only *portable* format for floating-point numbers, and they have the added benefit of degrading gracefully when bytecode from a machine with more bits of precision is loaded on a machine with fewer.

-- BKS
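A rough C sketch of the string round trip described above, assuming IEEE double and a C99 library: the writer formats the number with enough digits to reproduce it exactly, and the loader parses it back with strtod(). This is not the B::Bytecode code itself; `num_to_string`, `num_from_string` and `NUM_BUF_SIZE` are invented for the example, and NaN/infinity handling is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_BUF_SIZE 32   /* enough for "%.17g" of any finite double */

/* Writer side: store the number as text. 17 significant digits are
 * enough to round-trip an IEEE double. Returns 0 on success, -1 if
 * the buffer is too small. */
static int num_to_string(double v, char *buf, size_t size)
{
    int n;

    assert(buf != NULL && size > 0);
    n = snprintf(buf, size, "%.17g", v);
    return (n > 0 && (size_t)n < size) ? 0 : -1;
}

/* Loader side: parse into whatever the native double is. A machine
 * with less precision simply gets the nearest value it can represent. */
static double num_from_string(const char *s)
{
    assert(s != NULL);
    return strtod(s, NULL);
}
```

Storing text also sidesteps the alignment problem quoted above, since the loader never reads a raw float out of the op stream.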
Re: Patch to add string_nprintf
--- Simon Cozens [EMAIL PROTECTED] wrote:
> On Mon, Sep 17, 2001 at 09:33:56AM +0100, Tom Hughes wrote:
> > The attached patch adds string_nprintf, the last unimplemented function listed in strings.pod as far as I can see.
>
> Thanks; but I think I'm going to wait for the portability police to comment. There's every likelihood we want to write our own sprintf-like function.

I'm quite sure we will. There are so many unportabilities in *printf (like, what's the format for an IV? an NV?), as well as lots of potential buffer overruns (not all platforms have vsnprintf). And besides, we may not want a vsnprintf - we may want something that autogrows the string. Take a look at sv_vcatpvf in perl5's sv.c for inspiration.

-- BKS
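To make "something that autogrows the string" concrete, here is a small C sketch of an appending formatter that measures first and then grows the buffer. Note the caveat from the message itself: this leans on C99 vsnprintf(), which is exactly what is not available everywhere, so a portable version would need its own format engine. `gstring` and `gstring_catf` are hypothetical names and are not modelled on sv_vcatpvf's internals.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A growable, NUL-terminated string buffer. */
typedef struct {
    char  *buf;
    size_t len;   /* bytes used, not counting the NUL */
    size_t cap;   /* bytes allocated */
} gstring;

/* Append formatted text, growing the buffer as needed.
 * Returns 0 on success, -1 on a format or allocation failure. */
static int gstring_catf(gstring *s, const char *fmt, ...)
{
    va_list ap;
    int     need;

    assert(s != NULL && fmt != NULL);

    /* Pass 1: ask vsnprintf how many bytes the output needs (C99). */
    va_start(ap, fmt);
    need = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (need < 0)
        return -1;

    /* Grow so the new text plus the NUL always fits. */
    if (s->len + (size_t)need + 1 > s->cap) {
        size_t new_cap = (s->len + (size_t)need + 1) * 2;
        char  *p = realloc(s->buf, new_cap);
        if (p == NULL)
            return -1;
        s->buf = p;
        s->cap = new_cap;
    }

    /* Pass 2: format into the space we just guaranteed. */
    va_start(ap, fmt);
    vsnprintf(s->buf + s->len, s->cap - s->len, fmt, ap);
    va_end(ap);
    s->len += (size_t)need;
    return 0;
}
```

Because the buffer is sized from the measured length, there is no fixed limit for a caller to overrun, which addresses the buffer-overrun worry; the IV/NV format-specifier problem is untouched by this sketch.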
RE: [PATCH] testsuite and Win32 compilation
--- Brent Dax [EMAIL PROTECTED] wrote:
> Gibbs Tanton - tgibbs:
> # ## +#if defined(WIN32)
> # ## +program_code = malloc( file_stat.st_size );
> # ## +#else
> #
> # Also, since more than win32 is not going to have mmap, perhaps you could add
> # a Configure #define for HAS_MMAP or something like that. Then you could
> # test the cc compiler to check for mmap availability.
>
> Configure sets up a bunch of HAS_HEADER_FOO macros in parrot/config.h, including HAS_HEADER_MEMORY (undef on my Win32 system). Would this be the correct file?

I'd recommend HAS_HEADER_SYSMMAN (and if anyone saw it, I posted a patch yesterday that started making header includes actually be dependent on the configure macros).

-- BKS
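As an illustration of keying on a configure-time macro instead of `WIN32`, the loader below maps the file when `HAS_MMAP` is defined and otherwise falls back to malloc() plus fread(). `HAS_MMAP` here is the hypothetical macro floated in the quoted text (the message above suggests keying on the header macro instead), and `code_image`, `load_image` and `free_image` are invented names; this is a sketch, not the code from the patch under discussion.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef HAS_MMAP                 /* hypothetical Configure-provided macro */
#  include <fcntl.h>
#  include <sys/mman.h>
#  include <sys/stat.h>
#  include <unistd.h>
#endif

typedef struct {
    void  *data;
    size_t size;
    int    mapped;   /* nonzero if data came from mmap() */
} code_image;

/* Load a bytecode file read-only. Returns 0 on success, -1 on failure. */
static int load_image(const char *path, code_image *img)
{
    assert(path != NULL && img != NULL);
#ifdef HAS_MMAP
    {
        struct stat st;
        int fd = open(path, O_RDONLY);

        if (fd < 0)
            return -1;
        if (fstat(fd, &st) != 0 || st.st_size <= 0) {
            close(fd);
            return -1;
        }
        img->data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);                      /* the mapping outlives the fd */
        if (img->data == MAP_FAILED)
            return -1;
        img->size   = (size_t)st.st_size;
        img->mapped = 1;
        return 0;
    }
#else
    {
        FILE *fp = fopen(path, "rb");
        long  size;

        if (fp == NULL)
            return -1;
        if (fseek(fp, 0, SEEK_END) != 0 || (size = ftell(fp)) <= 0) {
            fclose(fp);
            return -1;
        }
        rewind(fp);
        img->data = malloc((size_t)size);
        if (img->data == NULL ||
            fread(img->data, 1, (size_t)size, fp) != (size_t)size) {
            free(img->data);
            fclose(fp);
            return -1;
        }
        fclose(fp);
        img->size   = (size_t)size;
        img->mapped = 0;
        return 0;
    }
#endif
}

/* Release an image obtained from load_image(). */
static void free_image(code_image *img)
{
    assert(img != NULL);
#ifdef HAS_MMAP
    if (img->mapped)
        munmap(img->data, img->size);
#else
    free(img->data);
#endif
    img->data = NULL;
    img->size = 0;
}
```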
[PATCH] Win32 build
As promised, here's the patch that gets Parrot building out of the box on Win32. I had to comment out the mmap(); I'll start working on doing that portably after class.

-- BKS
Re: Call/savestack popping semantics
--- Dan Sugalski [EMAIL PROTECTED] wrote:
> So, I'm currently working on the stack system for Parrot. I've got the following issue here. Assuming there's one general stack to save stuff on, where stuff is:

Out of curiosity, why only one stack? Perl 5 has at least four or five that I can think of off hand (and actually several times that, since it stack-switches when it's calling a tie or a signal handler or a hook or...).

> * Scope entries
> * Return addresses for JSRs
> * Saved individual registers
> * Local() calls

These really ought to be separate stacks - not every scope has a return address and not every return address has a scope (e.g. scopeless subs). Why conflate them when we don't need to? (similar arguments hold true for register saving and dynamic variables)

> Should plain returns at the parrot level clean things up? Which is to say, when I walk back up the stack looking for a return address, if I come across any scope entry markers, shall I clean 'em up, toss them, or pitch a fit? (Currently I'm going to pitch a fit, but that's a temporary solution)
>
> Having two types of returns, one that cleans up and one that doesn't, is also an option. (I can see both being useful and reasonable)

I would say go with this. It will probably be quite common that we'll need to do cleanup on return, but there are situations where we won't need it, and so shouldn't pay the overhead.

-- BKS
Re: RFC: Bytecode file format
--- Brian Wheeler [EMAIL PROTECTED] wrote:
> I've been thinking a lot about the bytecode file format lately. It's going to get really gross really fast when we start adding other (optional) sections to the code.
>
> So, with that in mind, here's what I propose:
>
> * All data sizes are in longwords (4 bytes) because that's just the way things are :)

We can't do that. There are platforms on both ends that have _no_ native 32-bit data formats (Crays, some 16-bit CPUs?). They still need to be able to load and generate bytecode without ridiculous CPU penalties (your Palm III is not running on a 700MHz Pentium III, after all!)

> * The file is composed of a header (which is really just a magic cookie), a series of data chunks, and a directory (of sorts)
>
>   Offset   Length   Description
>   0        1        Magic Cookie (0x013155a1)
>   1        n        Data
>   n+1      m        Directory Table
>   m+n+1    1        Offset of beginning of directory table (i.e. n+1)

No, we _really_ need some versioning info (either major/minor or just a single integer (if we go through even 16000 revisions of the bytecode, we've screwed up somewhere)). As Dan said, we also need a BOM of some sort and a word size indicator. I think we certainly want the directory at the front, especially since we will likely end up with data segments that we might not want/need to load (e.g. the original source code, debugging info (?), the optimized parse tree, etc.). If we have to map in the entire file to find the end so we can find what parts we don't want to map, that sort of defeats the purpose.

-- BKS
Re: RFC: Bytecode file format
--- Brian Wheeler [EMAIL PROTECTED] wrote:

Ok, what if we did IFF with these caveats:
* all chunks must be padded to 4 bytes (instead of IFF's 2)
* no nesting of FORMs

Chunks we'd need are:

Name: 'PINF' - Parrot Information
Size: 28 bytes + size of directory
Optional: No
Data:
  long:          magic cookie (or will 'PINF' be enough?)
  8-byte word:   endianness (magic value 0x123456789abcdef0)
  byte:          word size
  byte[7]:       empty
  word:          major version
  word:          minor version
  long:          count of directory entries
  --- directory goes here ---
  -- each entry as follows --
  long:          type of chunk
  long:          offset

Name: 'PBYT' - Parrot Bytecode
Size: Varies
Optional: Sure. :)
Data: bytes of the bytecode

Name: 'PSTR' - Parrot String Table
Size: Varies
Optional: Yes
Data:
  long:          count of string entries
  --- each string as follows ---
  long:          byte length
  n bytes + pad: string data

Name: 'PFIX' - Parrot Fixup Table
Size: Varies
Optional: Yes
Data: --- beats me...how are we doing fixups? ---

Name: 'PNOT' - Parrot Notes Block
Size: Varies
Optional: Yes
Data: free-form text for 'notes' about the file.

How's this?

A few more chunks:

Name: 'PCON' - Parrot Constants Block
Size: Varies
Optional: Yes
Data: the constants section (you know, all those initialized PMCs... :-)

Name: 'PSOU' - Parrot Source Block
Size: Varies
Optional: Yes
Data: the source code for the program

Name: 'PMOD' - Parrot Module list
Size: Varies
Optional: Yes
Data: the list of all the modules that need to be loaded for this module to work (since there may be a lot of time between BEGIN{} and execution, we need to record just what was loaded so we can reload and initialize it)

Name: 'PSYM' - Parrot symbol table
Size: Varies
Optional: Yes
Data: an offset-based hash table for every symbol defined in the module (dynamic symbol table manipulation is two-level: an in-memory table that is queried first for dynamic overrides, and the static per-module one for compile-time definitions)

Name: 'PSPS' - Parrot Special Subroutines
Size: Varies
Optional: Yes
Data:
  word:          number of special routines
  (followed by a list of "word: type, word: bytecode offset" pairs)
  (the point of this is that multiple BEGIN/INIT/CHECK/END subs are allowed, so they can't be simply stored in the symbol table)

There are probably some other sections I can't think of, but these are a start.

-- BKS
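
For what it's worth, here is a rough sketch of how a loader might use the 8-byte endianness field from the proposed 'PINF' chunk. It assumes the writer stores the magic value 0x123456789abcdef0 in its own native byte order; the names pinf_byte_order and pinf_order are made up for illustration, and only pure big-endian and little-endian layouts are recognized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Magic value from the proposed 'PINF' chunk. */
#define PINF_ENDIAN_MAGIC UINT64_C(0x123456789abcdef0)

/* Hypothetical result type; not part of the proposal. */
typedef enum {
    PINF_ORDER_UNKNOWN,
    PINF_ORDER_LITTLE,
    PINF_ORDER_BIG
} pinf_order;

/* Decide which byte order the file was written in by reading the
 * eight raw bytes both ways and comparing against the magic value. */
pinf_order pinf_byte_order(const unsigned char field[8])
{
    uint64_t as_big = 0;     /* field[0] is the most significant byte  */
    uint64_t as_little = 0;  /* field[7] is the most significant byte  */
    size_t i;

    assert(field != NULL);

    for (i = 0; i < 8; i++) {
        as_big    = (as_big << 8)    | field[i];
        as_little = (as_little << 8) | field[7 - i];
    }

    if (as_big == PINF_ENDIAN_MAGIC)
        return PINF_ORDER_BIG;
    if (as_little == PINF_ENDIAN_MAGIC)
        return PINF_ORDER_LITTLE;
    return PINF_ORDER_UNKNOWN;   /* mixed-endian or corrupt header */
}
```

Reading the field byte by byte means the check itself does not depend on the loading machine's byte order, though it does lean on a 64-bit integer type being available.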
Quick success report for Win32
I had to hand-apply the NV patch and some of the casting patches to get VC++ to shut up and compile, but Parrot works on Win32 (Win2k, VC++ 6.0 SP5). (It takes 1 sec to count to ten million on my 1GHz PIII.) I'll post the hacked-up Makefile that I fed through nmake to get it to work when I get back from classes.

-- BKS
Re: #include config.h or #include parrot/config.h
--- Dave Mitchell [EMAIL PROTECTED] wrote:

Andy Dougherty [EMAIL PROTECTED] wrote: On Wed, 12 Sep 2001, Dan Sugalski wrote:

changing parrot.h to do #include "parrot/config.h" and then changing Makefile to add -I./include to CCFLAGS.

One thing to keep in mind is that the directory may not be sufficient on some platforms. VMS, specifically, ignores the directory portion of the include filename. (And the suffix, generally, but that's separate)

Hmm. So would you suggest adding -I[.include] -I[.include.parrot] for VMS as well? (My VMS days were a very long time ago.)

Not, mind, that I'm proposing prepending parrot_ to all the filenames, though that's an option certainly.

That would be fun on 8.3 filesystems :-).

Perhaps I'm missing something here, but I always thought that #include "config.h" rather than #include <config.h> would ensure that the local Perl version would always get picked up in preference.

The point is not us finding our config.h; the problem is that if we are embedded, we don't want to clobber our embedder's config.h. Besides, when we're being embedded, there are no promises about what order the -I flags will be in when we're compiled. E.g.:

    #include "parrot/embed.h"   /* includes parrot/config.h */
    #include "project.h"        /* includes config.h */

    int call_parrot(...) {
        IV i;
        ...
    #ifdef USE_FUNCTION_FOO
        embedders_foo();
    #endif
        ...
    }

-- BKS
Re: patch: assembly listings from assembler
--- Brian Wheeler [EMAIL PROTECTED] wrote: Index: assemble.pl === RCS file: /home/perlcvs/parrot/assemble.pl,v retrieving revision 1.14 diff -r1.14 assemble.pl 7a8 use Getopt::Long; 9,12c10,33 my $opt_c; if (@ARGV and $ARGV[0] eq -c) { shift @ARGV; $opt_c = 1; --- my %options; GetOptions(\%options,('checksyntax', 'help', 'version', 'verbose', 'output=s', 'listing=s')); Could we please get in the habit of adding a -c or a -u to our CVS diffs, just as we would with normal patches? Many thanks, -- BKS __ Terrorist Attacks on U.S. - How can you help? Donate cash, emergency relief information http://dailynews.yahoo.com/fc/US/Emergency_Information/
Re: #include config.h or #include parrot/config.h
--- Andy Dougherty [EMAIL PROTECTED] wrote:

In perl5, we've had occasional header file name conflicts over the years. One common example is someone putting a file named config.h in /usr/local/include. Other conflicts with string.h and memory.h are also conceivable. I'd suggest

    cd parrot
    mkdir include
    mkdir include/parrot
    mv *.h include/parrot

changing parrot.h to do #include "parrot/config.h" and then changing Makefile to add -I./include to CCFLAGS.

YES!!! This is something I've wanted to do to Perl5 for years, but we can't because every XS module in the world expects the headers to be at the root. We should _definitely_ do this -- the fewer namespaces we pollute, the better.

-- BKS
Re: Muddled Boundaries - Perl 6 vs Parrot
--- Simon Cozens [EMAIL PROTECTED] wrote:

On Mon, Sep 10, 2001 at 11:26:03AM -0700, Benjamin Stuhl wrote:

It's not a priority, but it's so much easier to walk the correct path from the start. Since it's all Parrot, it's even easier.

Hear, hear! I remember the pain in 5.005_5* of turning off PERL_POLLUTE. I expect that there may still be CPAN modules that won't build without manually defining it.

You are, of course, correct; I back down. However, if you care that much, I'm going to make you prove it by implementing it. :)

On my personal todo list for this afternoon is to go through each of the source files and enforce the coding PDD. However, before I do this, I would like to bring up the question of prefixes once again. Do we really need a 7 letter, mixed-case prefix (Parrot_)? If Apache can do ap_, why can't we do par_ (and maybe parp_ for private stuff)? Also, I am planning to go through the structs and prefix them (STRING -> PAR_STRING, for instance) and clean up the subsystem naming. What do you think of par_gc_* for memory management (yes, that means that uncollected memory is gotten by par_gc_memalloc_nogc(), but if you really want it, you deserve to type in all those letters) and par_io_* for I/O?

-- BKS
Feature testing API?
It seems to me that Parrot should expose a feature testing API. Something on the order of

    int [read: boolean] par_has_feature(PAR_STRING *feature);

(yes, I _do_ think that claiming STRING is unnecessary namespace pollution - especially as ANSI compilers, AFAIK, aren't required to be case-sensitive)

This would be very useful for any language running on Parrot, so that a program can just test (via some language binding) for a feature's presence at the beginning, rather than bombing out in the middle when it hits an unimplemented call. Example features might be async I/O, run-time compilation (eval) of different languages, dynamic loading, etc. Basically, I would like to be able to say:

    use features qw[asyncio eval(perl6)];

and know that if my program loads, it's not going to panic when it gets to an

    eval 'Async::queueio_out($out_fh, $text);';

What d'y'all think?

-- BKS
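
A minimal sketch of what such a lookup could look like on the C side, purely for discussion. It takes a plain C string instead of a PAR_STRING * to stay self-contained, and the feature table (including the "dynaload" name) is invented; a real build would presumably generate that table at configure time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of features this build supports. */
static const char *const par_features[] = {
    "asyncio",
    "dynaload",
    "eval(perl6)"
};

/* Returns 1 if the named feature is available, 0 otherwise.
 * Simplification: const char * stands in for PAR_STRING *. */
int par_has_feature(const char *feature)
{
    size_t i;

    assert(feature != NULL);

    for (i = 0; i < sizeof par_features / sizeof par_features[0]; i++) {
        if (strcmp(par_features[i], feature) == 0)
            return 1;
    }
    return 0;
}
```

A language binding for "use features" would then call this once per requested name at load time and refuse to start the program if any call returns 0.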
Re: Muddled Boundaries - Perl 6 vs Parrot
--- Bryan C. Warnock [EMAIL PROTECTED] wisely wrote:

On Monday 10 September 2001 01:08 pm, Simon Cozens wrote:

And in addition - why are we worrying about namespace collision RIGHT NOW? Sure, when Parrot can be embedded, then we should ensure that our names aren't going to clash. But who in their right minds is going to embed Parrot in anything in its current state? (Leon, I said in their right minds) It's not a priority, compared to getting working code out there. We can sort it out later.

Oh, how many times have I heard that before? It's not a priority, but it's so much easier to walk the correct path from the start. Since it's all Parrot, it's even easier.

Hear, hear! I remember the pain in 5.005_5* of turning off PERL_POLLUTE. I expect that there may still be CPAN modules that won't build without manually defining it. It's much better if we get it right from the start so that there's less that we need to go back and fix.

-- BKS
Re: Math functions? (Particularly transcendental ones)
--- Dan Sugalski [EMAIL PROTECTED] wrote:

Okay, I'm whipping together the fancy math section of the interpreter assembly language. I've got:

sin, cos, tan : Plain ones
asin, acos, atan : arc-whatevers
sinh, cosh, tanh : Hyperbolic whatevers
log2, log10, log : Base 2, base 10, and explicit base logarithms
pow : Raise x to the y power

Can anyone think of things I've forgotten? It's been a while since I've done numeric work.

ln, asinh, acosh, atanh2?

-- BKS
language agnosticism and internal naming
I had a thought this morning on function/struct/global prefixes for Parrot. If we really plan to also run Python/Ruby/whatever on it, it does not look good for the entire API to be prefixed with perl_. We really (IMHO) ought to pick something else so that we don't give people a convenient target for FUD. For lack of anything better, I propose par_ for functions. We might still be able to get away with PL_ for globals (Parrot Library?), but I doubt it. Just something else to consider. (But hopefully a topic that won't make Dan's brain hurt any more than it probably does. :-)

-- BKS
Re: An overview of the Parrot interpreter
--- Dan Sugalski [EMAIL PROTECTED] wrote:

At 03:48 PM 9/4/2001 -0400, Uri Guttman wrote:

DS == Dan Sugalski [EMAIL PROTECTED] writes:
DS Ah. I've always wanted to do that with tied hashes. Okay, even
DS more reason to pass the data in! (We're going to end up with a
DS WANT register by the time we're done...)

that is not a bad idea. we could allocate a PMC register (e.g. #31) permanently to store WANT info (in a hash i assume like the RFC implies).

I don't think I'd want to soak up a PMC register that way. Maybe an integer one.

Maybe not a general purpose PMC register, but what about a special one? Since the proposal was to lazily update it, it doesn't need to be part of the standard register frame. Besides, I thought we were going with having a few special PMC registers (PL_sv_yes, PL_sv_no, PL_sv_undef, etc.) to reduce the size of the constants section?

-- BKS
-g vs. -O
Alright, here's an issue I was musing on after dinner yesterday: There are huge sets of optimizations that could be made *if* the user promises not to do certain things. For instance, who needs a symbol table when the user has promised not to do any symbolic lookups? (Yes, I know, the debugger, but only _if_ the user actually wants to debug the program.) Thus, I propose 3 command-line switches (highly reminiscent of gcc...):

-g : include all information required for debugging (symbol tables, etc.) and do not perform optimizations

-O# : controls just how complex the optimizations we try to make are, given the constraints we're under from what the user did _not_ promise to abstain from

-fpromise : this is a bit different from gcc, where it controls individual optimizations - instead, in Parrot it enumerates the promises the user makes (e.g. I solemnly swear to never use symbolic references, count on specific op patterns, or use any number large enough to require bignums.)

If certain promises become very common, they could possibly get their own flag or something. Thoughts?

-- BKS
Re: new event loop
Thus spake the enlightened Uri Guttman [EMAIL PROTECTED]:

i am going to make a proposal that we ('we' to be defined later) develop a new common event loop with two major goals in mind: 1. the event loop should be fully portable over all modern unix OS's and the win32 server flavors (nt, 2k).

VMS! We must have VMS! Oh, and it should probably be modular enough that one can use a stripped down version of it to write Palm apps too. (Perl6 for the TI-89, anyone?)

-- BKS
Re: ~ for concat / negation (Re: The Perl 6 Emulator)
In summary: 1. I don't like ~ for concat 2. But if it does become concat, then we still shouldn't change ~'s current unary meaning Thanks for listening. -Nate I agree completely. However, this is no longer really a topic for -internals, it's really a purely language thing. -- BKS __ Do You Yahoo!? Get personalized email addresses from Yahoo! Mail http://personal.mail.yahoo.com/
Re: A quick sketch of the interpreter
--- Dan Sugalski [EMAIL PROTECTED] wrote:

=head1 Stacks

[snip]

The stacks are at least:

=over 4

=item Temp stack

for squirreling away the contents of individual registers

=item Register stack

For pushing the entire register file at once. There are four sets, one for each register type.

=item state stack

For the interpreter's internal state

=back

Perl 5-ish save stack for dynamic scoping? (whatever term replaces 'local') What is the subroutine calling convention? Caller cleans or callee cleans?

=head1 Registers

We have four sets. Each set has 64 members

Do we really need 64 ints and 64 floats? 64 stringish ones I can understand (sort of) - the RE engine could use them. Maybe only 32 each of ints and floats? Also, what about the suggestion to have the various special values (PL_sv_undef, PL_sv_yes, PL_sv_no) be registers (so undef $foo becomes 'st sp_reg0, $foo' or somesuch)? Also, what about having one or more of the registers be the 'lexical state register(s)' to implement pragmas (or is this the state stack?)?

=head1 Opcodes

Opcodes are all dispatched indirectly via an opcode function table. Each segment of bytecode (a segment roughly corresponding to a compilation unit--a precompiled module would be in its own segment, for example) has its own opcode function table.

Be wary of this. I tried this in Perl 5 (on an old sun4c, granted), and I came out with something like a 5% slowdown over having the function pointer actually stored in the op, IIRC.

=head1 The opcode loop

This is a tight loop. All it does is call an opcode function, get back a pointer to the next opcode to execute, and check the event dispatch flag. Lather, rinse, repeat ad infinitum.

How does this port to a TIL form?

=head1 Bytecode

Looks good from here (and a _lot_ prettier than B::Bytecode/ByteLoader!).

-- BKS
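
To make the opcode-loop description concrete, here is a toy version of that loop in C: each op function is looked up in a table, does its own argument decoding, and returns a pointer to the next op, after which the loop checks an event flag. Everything here (the interp struct, the three demo ops, the flat array of longs used as bytecode) is invented for illustration and says nothing about the real encoding.

```c
#include <assert.h>
#include <stddef.h>

typedef struct interp interp;

/* An op function decodes its own arguments and returns the address
 * of the next op to run, or NULL to stop. */
typedef const long *(*op_func)(interp *in, const long *pc);

struct interp {
    const op_func *op_table;  /* opcode function table for this segment */
    long int_reg[4];          /* toy integer register file */
    int event_pending;        /* the "event dispatch flag" */
    int events_handled;
};

/* op 0: end */
static const long *op_end(interp *in, const long *pc)
{
    (void)in; (void)pc;
    return NULL;
}

/* op 1: set <reg>, <constant> */
static const long *op_set(interp *in, const long *pc)
{
    in->int_reg[pc[1]] = pc[2];
    return pc + 3;
}

/* op 2: add <dest reg>, <src reg> */
static const long *op_add(interp *in, const long *pc)
{
    in->int_reg[pc[1]] += in->int_reg[pc[2]];
    return pc + 3;
}

static const op_func demo_ops[] = { op_end, op_set, op_add };

/* The loop itself: dispatch, take the next pc, check for events. */
void run_ops(interp *in, const long *pc)
{
    assert(in != NULL && in->op_table != NULL);

    while (pc != NULL) {
        pc = in->op_table[*pc](in, pc);
        if (in->event_pending) {
            in->event_pending = 0;   /* stand-in for real event dispatch */
            in->events_handled++;
        }
    }
}
```

Running the program {1,0,40, 1,1,2, 2,0,1, 0} through run_ops sets register 0 to 40, register 1 to 2, adds them into register 0, and stops. The extra indirection through op_table on every op is exactly the cost being questioned in the reply above.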
Re: Should the op dispatch loop decode?
--- Dan Sugalski [EMAIL PROTECTED] wrote: 'Kay, here's a question to ponder. Should the op dispatch loop handle argument decoding, or should that be left to the opcode functions? [good analysis of trade-off's snipped] At the moment I'm leaning towards the functions doing their own decoding, as it seems likely to be faster. (Though we'd be duplicating the decoding logic everywhere, and bigger's reasonably bad) Possibly mandating shadow functions for each opcode function, where the shadow does the decoding and calls the real functions which take real things rather than our registers. Opinions anyone? I don't see where shadow functions are really necessary - after all, no one has ever complained that you can't do pp_chomp(sv); /* or pp_add(sv1, sv2), for that matter */ in Perl 5. Quite frankly the shadow thing sounds like a bundle of unnecessary function calls. But that's just my opinion, feel free to disagree. -- BKS __ Do You Yahoo!? Get personalized email addresses from Yahoo! Mail - only $35 a year! http://personal.mail.yahoo.com/
vtbl-based SVs and sv_setsv()
How is setting one SV from another going to be implemented? My (admittedly vague) recollection was that it would be something like

    void sv_setsv(SV* dest, SV* src) {
        dest->sv_vtbl->delete(dest);        /* clear the old value */
        dest->sv_vtbl = src->sv_vtbl;
        dest->sv_vtbl->dupfrom(dest, src);  /* and copy in the new */
    }

That is, in $a = $b, $a would get a new vtbl, the one from $b. My question is how does this work with the need for assign-by-value, which is required for things like ties and overloads?

-- BKS
Re: standard representations
--- Dan Sugalski [EMAIL PROTECTED] wrote: At 08:02 AM 12/26/00 -0800, Benjamin Stuhl wrote: Thus spake the illustrious Dan Sugalski [EMAIL PROTECTED]: For integers, we have two types, platform native, and bigint. No guarantees are made as to the size of a native int. bigints can be of any size. I'm not sure about the wisdom of not making any guarrantees about int size, since that means that extensions have to go through the same hoops perl5 has, dealing with "unspecified" behaviors (cf. fun with ANSI stdio). To make life easy, we might want to ordain sizeof(p6int) = sizeof(void *) sizeof(p6int) = 4. Perl will (well, should at least) automagically upgrade to bigints if a regular int overflows (Assuming that conversion's not been forbidden by a particular variable), so that's not going to be an issue inside variables. As for types presented to extensions, we can certainly provide I8, I16, I32, and friends. On the other hand, this makes a port of the PVM to Palms and the like somewhat harder (but would it be much easier to wedge them into the standard PVM?). Also, can we please mandate 2s-complement integral math? Perl 5 really always has, but can we please make it official? Why? For variables, math is math--2+2=4 regardless of whether you're one or two's complement, or BCD-encoded, or use the EBCDIC signed characters, or... Mandating representations seems rather too low-level to me, though if you've got a good argument I'm OK with it. Mostly because it seems to be a requirement for intelligent integer-preserving maths (cf. PERL_PRESERVE_IVUV in perl5). For floats, we also have two types, C double and bigfloat. No guarantees to the size or accuracy of the double. bigfloats can be of any size. Floating point is even harder, and will require a lot of build-time checks anyway. The big issue I have with floats is that bigfloats will be more precise than regular floats/doubles, and so downconverting will lose data, which I don't like. 
Other than that, because floats should autoconvert like ints, I don't see any problem. (Not to say there isn't one, just that I don't see it... :)

My question here is whether each supported platform is going to need to provide its own overflow detection/autoconversion decision routines, since the portable part of perl6 will have no idea how far it can go with native numbers.

Strings can be of three types--binary data, platform native, and UTF-32.

"platform native"?

ASCII, EBCDIC, 16-bit chars, whatever. I'd rather not deal with variable-length characters at all, so things like UTF-8 and friends aren't really on the list. (Though they could be with the regex engine dealing with them in UTF-32 format)

But why is perl6 messing with them, since it has no idea what they mean?

No, we are not messing around with UTF-8 or 16, nor are we messing with EBCDIC, shift-JIS, or any of that stuff. Strings can be stored internally that way (and the native form might be one of them) but as far as the interface is concerned we have only three. Yes, this does mean if we mess with strings in UTF-8 format on a non-UTF-8 system they'll need to be fed out in UTF-32. It's bigger, but we can deal.

The issue with UTF-32 is that we'd need to write an entire string-handling library, while quite a few modern platforms have _wstr* or equivalent.

I'm not sure there's much in the way of string handling that we need to do that's not perl-specific. It's also not all that much work anyway and, while it is stuff we'll need to do, the benefits seem worth it.

The only issue is that the CRTL's versions may be in hand-tuned assembler, and perl6 will be doing a _lot_ of strlen()s, I expect. But with good compilers, I suppose, it's an open question how much performance one really gains from hand assembly.

-- BKS
Re: standard representations
Thus spake the illustrious Dan Sugalski [EMAIL PROTECTED]: Okay, here's what I'm currently thinking of for standard representations of integers, numbers, strings, and (possibly) complex data. These are not necessarily indicative of how the data's stored in scalars (or hashes or arrays), merely the types that will need to be dealt with in the vtables. In addition to each of the types below, each vtable will have a 'same type' entry that'll be used if the optimizer can guarantee that the scalars involved in an operation are of the identical type. (Presumably things can be faster that way) For integers, we have two types, platform native, and bigint. No guarantees are made as to the size of a native int. bigints can be of any size. I'm not sure about the wisdom of not making any guarrantees about int size, since that means that extensions have to go through the same hoops perl5 has, dealing with "unspecified" behaviors (cf. fun with ANSI stdio). To make life easy, we might want to ordain sizeof(p6int) = sizeof(void *) sizeof(p6int) = 4. On the other hand, this makes a port of the PVM to Palms and the like somewhat harder (but would it be much easier to wedge them into the standard PVM?). Also, can we please mandate 2s-complement integral math? Perl 5 really always has, but can we please make it official? For floats, we also have two types, C double and bigfloat. No guarantees to the size or accuracy of the double. bigfloats can be of any size. Floating point is even harder, and will require a lot of build-time checks anyway. Strings can be of three types--binary data, platform native, and UTF-32. "platform native"? No, we are not messing around with UTF-8 or 16, nor are we messing with EBCDIC, shift-JIS, or any of that stuff. Strings can be stored internally that way (and the native form might be one of them) but as far as the interface is concerned we have only three. 
Yes, this does mean if we mess with strings in UTF-8 format on a non-UTF-8 system they'll need to be fed out in UTF-32. It's bigger, but we can deal.

The issue with UTF-32 is that we'd need to write an entire string-handling library, while quite a few modern platforms have _wstr* or equivalent.

Finally, complex numbers, if we deal with them, will be either double or bigfloat complexes. (I don't see any reason to mess with integer versions, nor with mixed double/bigfloat types) And, unless Larry objects, I feel that all vtable methods should have the option of going with a 'scalar native' form of the operation if it's determined at runtime that two scalars are the same type, though this is optional and may be skipped for cost reasons. (Doing it with, for example, complex numbers might be worth it, or when expensive conversions might be avoided)

This part sounds good.

Comments? I'm trying to balance out accuracy and DWIMmery with cost here, and I'm not 100% sure things are quite right yet. Dan

-- BKS
Re: [not quite an RFC] shared bytecode/optree
--- Chaim Frenkel [EMAIL PROTECTED] wrote:

"BS" == Benjamin Stuhl [EMAIL PROTECTED] writes:
BS 1. Bytecode can just be mmap'ed or read in, no playing
BS around with relocations on loading or games with RVAs
BS (which can't be used anyway, since variable RVAs vary based
BS on what's been allocated or freed earlier).

(What is an RVA?)

relative virtual address

And how does the actual runtime use a relocatable pointer? If it is an offset, then any access becomes an add. And depending upon the source of the pointer, it would either be a real address or an offset. Or if everything is a handle, then each access requires two fetches. And I don't see where you avoided the relocation. The handle table that would come in with the bytecode would need to be adjusted to reflect the real address. I can vaguely see a TIL that uses machine code linkage (real machine code jumps) that perhaps could use relative addressing as not needing relocation. But I'm not sure that all architectures support long enough relative jumps/calls. Doing the actual relocation should be quite fast. I believe that all current executables have to be relocated upon loading. Not to mention the calls to shared modules/dlls.

chaim
-- Chaim Frenkel Nonlinear Knowledge, Inc. [EMAIL PROTECTED] +1-718-236-0183

My primary goal (it may not have come across strongly enough) in this proposal was sharing bytecode between threads even with an ithreadsish model (variables are thread-private, except when explicitly shared). This requires that the bytecode not contain direct pointers to variables, but rather references with at least one level of indirection. Avoiding fixups/relocations and allowing bytecode to be mmap()ed are additional potential benefits. But my first goal was to not have one copy of each subroutine in File::Spec::Functions for each thread I run.

-- BKS
[not quite an RFC] shared bytecode/optree
Firstly, by "bytecode" I mean a .pmc and by "optree" I mean the perl6 VM's internal form that it goes through executing. It seems to me that one thing that the perl6 bytecode implementation _should_ do (in the interests of being light and fast, as well as meshing well with MT) is be position-independent. What do I mean? That all direct references to SV*'s or regexes or anything else in the bytecode _and_ the optree should actually be handles of some sort. This has several benefits:

1. Bytecode can just be mmap'ed or read in, no playing around with relocations on loading or games with RVAs (which can't be used anyway, since variable RVAs vary based on what's been allocated or freed earlier).

2. (more importantly, IMHO) Bytecode and the optree are shareable between threads. My primary reason for opposing the RFC proposing that modules must be reloaded in each thread is the immense amount of memory that would be wasted without bytecode/optree sharing.

3. With a good slab allocator and possibly some mprotect() calls (and a good OS) bytecode/optree suddenly becomes _completely_ shared between child processes. No more needing to restart httpd and mod_perl6 because the mixing of code and data has doubled the core usage of each process!

I don't have the background to seriously argue implementation, but I might suggest a "handle table" of sorts which defines for each thread and CV which variable goes with which handle. This sort of ties in with my (vague) idea that CVs should carry around instructions for building their scratchpad, rather than the pad itself (IOW, scratchpads become purely part of the stack frame, rather than the subroutine's carrier variable). This is all for the purpose of reducing the required locking around subroutine calls to nil or almost nil (perhaps one to make sure that no-one's changed the subroutine out from under us via eval("*foo = \&bar;"); or the like).
At any rate, I'm just spouting off ideas sparked by various recent discussions (I probably need a higher blood sugar or something). It's probably too early to seriously argue technical merits, but on the other hand, basic VM design can start before we know the precise grammar. -- BKS __ Do You Yahoo!? Yahoo! Messenger - Talk while you surf! It's FREE. http://im.yahoo.com/
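
As a very rough illustration of the "handle table" idea, the sketch below keeps only a small integer handle in the (shared, read-only) op and resolves it through a table that each thread owns. All the names (sv_stub, handle_table, fetch_op, resolve_handle) are hypothetical, and a real design would also have to key the table per CV and deal with table growth.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a real SV; just enough to tell values apart. */
typedef struct {
    long value;
} sv_stub;

/* One of these per thread: handle -> that thread's variable. */
typedef struct {
    sv_stub **slots;
    size_t    count;
} handle_table;

/* Lives in the shared bytecode/optree: holds a handle, not a pointer. */
typedef struct {
    size_t handle;
} fetch_op;

/* Resolve an op's handle against the calling thread's table.
 * Returns NULL if the handle is out of range for this table. */
sv_stub *resolve_handle(const handle_table *table, const fetch_op *op)
{
    assert(table != NULL && op != NULL);

    if (op->handle >= table->count)
        return NULL;
    return table->slots[op->handle];
}
```

The same fetch_op can then be executed by two threads and yield two different variables, at the price of one extra memory fetch per access. Nothing in the op needs fixing up at load time, since the handle is just an index.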
Re: RFCs for thread models
--- Chaim Frenkel [EMAIL PROTECTED] wrote:

"SWM" == Steven W McDougall [EMAIL PROTECTED] writes:
SWM If you actually compile a Perl program, like
SWM $a = $b
SWM and then look at the op tree, you won't find the symbol "$b", or "b"
SWM anywhere in it. The fetch() op does not have the name of the variable
SWM $b; rather, it holds a pointer to the value for $b.

Where did you get this idea from? P5 currently does many lookups for names. All globals. Lexicals live elsewhere.

Globals whose names can be resolved at compile time are, with the SV* stuck into o->op_sv.

SWM If each thread is to have its own value for $b, then the fetch() op
SWM can't hold a pointer to *the* value. Instead, it must hold a pointer
SWM to a map that indexes from thread ID to the value of $b for that
SWM thread. Thread IDs tend to be sparse, so the map can't be implemented
SWM as an array. It will have to be a hash, or a B*-tree, or a balanced
SWM B-tree, or the like.

Or, say, a hash table by pointer value that only contains thread-local-ified globals - the rest just use the stored pointer (so for only a few thread-local globals, there is very little overhead). I.e.

    OP* PERL_FASTCALL p6_pp_fetch (perl_thread *t) {
        SV *real_sv = ((SVOP*)PL_op)->op_sv, *tsv;
        /* use the thread-local copy if this global has one */
        if ((tsv = p6_ptrtbl_fetch(t->t_localsvs, real_sv)))
            real_sv = tsv;
        p6_extend_stack(t->t_stack, 1);
        p6_push(t->t_stack, real_sv);
        return PL_op->op_next;  /* hand back the next op, as perl5 pp functions do */
    }

Now where sub recursive() { my $a :shared; ; return recursive() } would put $a or even which $a is meant, is left as an exercise for someone brighter than me.

%P6-E-MEANINGLESS, "my $a : shared" is a meaningless construct.

-- BKS
YAVTBL: yet another vtbl scheme
All - I fail to see the reason for imposing that all variables "know" how to perform ops upon themselves. An operation is separate from the data it operates on. Therefore, I propose the following vtbl scheme, with two goals:

1. that the minimal vtbl be just that, minimal
2. that it be possible (convenient) to override ops as needed

First, a few basic types (these are sample only, and should be beaten on for cache-friendliness, etc. once a design is formalized).

    typedef struct _ovl {
        U32 ov_type;
        U32 ov_flags;
        void *ov_vtbl;
        void *ov_data;
        struct _ovl *ov_next;
    } OVERLOAD;

    typedef union {
        SCALAR_VTBL s;
        ARRAY_VTBL a;
        HASH_VTBL h;
    } SV_VTBL;

    typedef struct sv {
        void *sv_data;
        OVERLOAD *sv_magic;
        SV_VTBL *sv_vtbl;
        U32 sv_flags;  /* and type (SV, AV, HV) */
        (... GC stuff ... MT-safe stuff ...)
    } SV, *PMC;

SV_VTBL, then, supports basic operations on perlish data types (get, store, and a few housekeeping things). Since no one (outside perl and libperl.so) should be directly calling vtbl functions, this makes it easy to put checks in that a variable is the appropriate type (i.e., av_fetch will die if the variable is really a scalar). Here is what each data type should support (each get/set may require an argument giving a bit more detail (i.e., U16 vs. I64, UTF8 vs. UTF16-bigendian, etc.)):

SCALAR_VTBL:
    get_int
    get_string
    get_real
    get_ref
    num_sign          /* positive or negative (or zero?) */
    num_is_integral
    set_int
    set_string
    set_real
    set_ref
    set_multival      /* == perl5ish sv_setpv(sv...); sv_setiv(sv,...); SvPOK_on(sv); (esp this part) */
    undef
    construct
    finalize

ARRAY_VTBL:
    get_at
    set_at
    grow              /* a hint on where we plan to put values, i.e. av->sv_vtbl.a.grow(bottom_ix, top_ix) */
    size
    clear             /* @av = (); */
    undef             /* undef @av; */
    get_iterator
    construct
    finalize

HASH_VTBL:
    fetch
    store
    get_iterator
    get_iterator_keys    /* not sure if these two are needed */
    get_iterator_values
    clear
    undef
    size
    construct
    finalize

In order to allow overriding of opcodes for, say, BigInts, several types of OVERLOAD are defined (4 basic types (flags in bottom byte of ov_type?) are defined, based on what flavor of vtbl is in ov_vtbl). These are OV_GET, OV_SET, OV_RANDOM, OV_OPS and are denoted in sv->sv_flags. The first three correspond to the perl5 GMG, SMG, and RMG. The last marks that the vtbl is an overload of one or more opcodes. Every op checks to see if it is overloaded, and if it is, calls that. Some ops don't need to (i.e., vec() can just do a set_string and add an OVERLOAD for the bitwise ops). If necessary, additional subclasses of OV_OPS may be defined (i.e., OV_NUMERIC, OV_STRING, OV_IO).

-- BKS
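
For the "every op checks to see if it is overloaded" step, here is a small sketch of what that check might look like: a quick test of sv_flags, then a walk down the sv_magic chain. The struct layouts are trimmed copies of the ones in the message (the struct tag is spelled ovl here because a leading underscore is reserved in C), while the numeric values of the OV_* flags and the function name sv_find_overload are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long U32;   /* at least 32 bits wide */

/* Flag values are invented; the message only names the four types. */
enum { OV_GET = 1, OV_SET = 2, OV_RANDOM = 4, OV_OPS = 8 };

typedef struct ovl {
    U32 ov_type;
    U32 ov_flags;
    void *ov_vtbl;
    void *ov_data;
    struct ovl *ov_next;
} OVERLOAD;

/* Trimmed SV: only the fields the lookup needs. */
typedef struct sv {
    void *sv_data;
    OVERLOAD *sv_magic;
    U32 sv_flags;
} SV;

/* Return the first overload of the given type attached to sv,
 * or NULL if there is none. */
OVERLOAD *sv_find_overload(const SV *sv, U32 type)
{
    OVERLOAD *ov;

    assert(sv != NULL);

    /* Fast path: sv_flags says no overload of this type exists. */
    if (!(sv->sv_flags & type))
        return NULL;

    for (ov = sv->sv_magic; ov != NULL; ov = ov->ov_next) {
        if (ov->ov_type & type)
            return ov;
    }
    return NULL;
}
```

An op such as add would call sv_find_overload(sv, OV_OPS) and, on a non-NULL result, dispatch through ov_vtbl instead of its built-in code. The common un-overloaded case costs a single flag test.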
Re: RFC 146 (v1) Remove socket functions from core
--- "Stephen P. Potter" [EMAIL PROTECTED] wrote:
> Lightning flashed, thunder crashed and Tom Christiansen [EMAIL PROTECTED] whispered:
> | Unless that's done completely transparently, you'll pretty much screw the
> | pooch as far as "Perl is the Cliff Notes of Unix" notion. Not to
> | mention running a very strong risk of butchering the performance.
>
> I don't think there is any ruling from Larry that perl must remain the "Cliff Notes of Unix." In fact, there seems to be a bit of a concerted effort (partly suggested by Larry, IIRC) to make perl *less* Unix-centric and more friendly for other environments.
>
> I'm not concerned with performance, per se. I have confidence in the people who will actually write the code to take care of that issue. Performance will be a factor in deciding whether this can be implemented or not. If performance will suffer unacceptably, then this won't get implemented.

It probably would. Dynamic loading is not cheap, and having to do a dlopen() and a dlsym() (or a LoadLibrary() and a GetProcAddress()) to find out the square root of 2 is not my idea of a _useful_ lightweight programming language.

> | I don't understand this desire to eviscerate Perl's guts. Having
> | everything you want just *there* is part of what's made Perl fast,
> | fun, and successful. Good luck on preserving all three.
>
> This desire stems from having a wonderful mechanism for making the core more lightweight (hopefully improving performance) called loadable modules. Larry designed this feature for a reason, and has been saying since the early perl5 alphas that we could/should migrate some things out of the core. I'm simply suggesting all the parts that I think reasonably go together that could be migrated. They can still be "there", just in a module. If the AUTOLOAD stuff that is being discussed works out, you won't even know the internals have changed.

AUTOLOAD searches are not cheap either. It can take a lot of stat() calls to even _find_ the correct module, much less load it. The average math function in the perl5 core is about 13 lines of C code. Eviscerating it out of the core would accomplish nothing.

> I don't understand this desire to not want anything to change. This is an opportunity to clean up the language, make it more useable, and more fun.

Slowing perl down and forcing everyone to add 5 "use" statements to the top of every program to get any useful features would make it neither more useful nor more fun.

> I would have a lot more fun if perl were a better performer and if it was easy for me to expand it, contract it, reshape it, improve it, etc.
>
> -spp

-- BKS
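As a rough illustration of the stat() point above, the following C sketch probes a list of directories for a module file and counts how many stat() calls that takes. It is not how perl itself locates modules. The function find_module and its parameters are hypothetical, and the only system call used is POSIX stat().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: probe each directory in turn for `name`.
 * Returns the index of the first directory that contains it, or -1.
 * *probes receives the number of stat() calls that were made.
 */
int find_module(const char **dirs, size_t ndirs,
                const char *name, size_t *probes) {
    char path[4096];
    struct stat st;

    assert(dirs != NULL && name != NULL && probes != NULL);
    *probes = 0;
    for (size_t i = 0; i < ndirs; i++) {
        int len = snprintf(path, sizeof path, "%s/%s", dirs[i], name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;               /* path too long: skip this directory */
        (*probes)++;
        if (stat(path, &st) == 0)
            return (int)i;          /* found: stop searching */
    }
    return -1;                      /* every directory was a miss */
}
```

A module that lives in the last directory of the search path, or is missing entirely, costs one stat() per directory before any loading starts. That is the overhead being objected to for functions that are only a dozen lines of C to begin with.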
Re: Avoid memory copy and redundant loops in reduce/fold
The normal problem with this type of structure is that the previous statement would create 2 array copies, and 3 loops for most compilers. In perl speak, it might look like:

    $dummy1[$_] = $b[$_]*$c[$_] for (0..$#b);
    $dummy2[$_] = $d[$_]+$dummy1[$_] for (0..$#dummy1);
    $sum+=$_ for (@dummy2);

(Sorry if this isn't very idiomatic perl--it's not really my native language.)

Progressive C++ numeric programming libraries like POOMA and Blitz++ use template meta-programming techniques to implement 'expression templates'. Templates are used to create the parse tree for these kinds of array expressions at compile time, and the compiler then optimises out the extra loops and array copies to create something like:

    $sum+=$b[$_]*$c[$_]+$d[$_] for (0..$#b);

Without this optimisation, array semantics become next to useless for numeric programming, because their overhead is just so high. But writing numerically intensive programs without array semantics is messy--they become littered with control structures and loops (which is particularly unintuitive for mathematicians used to the compact notation of mathematics).

So, could perl 6 do this optimisation (assuming that the array notation/folding stuff makes its way into the language)? Given that some amount of compilation or interpretation will presumably still be done at run-time, perhaps this is much easier in perl than in C++...

This actually leads to a much more general question, namely passing of arrays to functions. For ppcode at least, and probably any code using the perl API, it should be possible and IMHO desirable to push the AV* (or equivalent), rather than expanding the array and pushing each of its elements. Furthermore, if we do this, it would make passing named array arguments (sub foo (@baz, @qux) { ... }) much simpler. If a subroutine asks for @_, then we can go the old way and push everything, but reducing the number of stack pushes on list operators could be a major win.

-- BKS
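For comparison, here is the same computation sketched in C, once with the two temporary arrays and three passes described above and once as the single fused loop an optimiser would ideally produce. The function names are made up for this example. It only shows the shape of the two evaluation strategies and says nothing about how perl 6 would implement either.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Naive evaluation: two temporary arrays and three passes over the data. */
double sum_with_temporaries(const double *b, const double *c,
                            const double *d, size_t n) {
    assert(n == 0 || (b != NULL && c != NULL && d != NULL));
    /* Allocate at least one slot so n == 0 still yields valid pointers. */
    size_t slots = n ? n : 1;
    double *tmp1 = calloc(slots, sizeof *tmp1);
    double *tmp2 = calloc(slots, sizeof *tmp2);
    if (tmp1 == NULL || tmp2 == NULL)
        abort();

    for (size_t i = 0; i < n; i++)      /* pass 1: tmp1 = b * c */
        tmp1[i] = b[i] * c[i];
    for (size_t i = 0; i < n; i++)      /* pass 2: tmp2 = d + tmp1 */
        tmp2[i] = d[i] + tmp1[i];

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)      /* pass 3: reduce tmp2 */
        sum += tmp2[i];

    free(tmp1);
    free(tmp2);
    return sum;
}

/* Fused evaluation: one pass, no temporaries. */
double sum_fused(const double *b, const double *c,
                 const double *d, size_t n) {
    assert(n == 0 || (b != NULL && c != NULL && d != NULL));
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += b[i] * c[i] + d[i];
    return sum;
}
```

Both functions compute the same reduction. The fused form reads each input element once and allocates nothing, which is where the win for large arrays would come from.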