IMCC fixes
I've fixed nearly all of the breakage in IMCC that was introduced with the last large patch. I'm currently trying to localize all APIs to the IMC_Unit, but I'm not quite there yet. A hash test is failing, but I have no clue how my IMCC work could have affected that code. I'm hoping it was already failing before I synced? -Melvin
Re: Unifying call and return
Dan Sugalski [EMAIL PROTECTED] wrote: YHO would be incorrect here. There's a lot of runtime mutability, and there's no guarantee that a sub or method has the same prototype at runtime that it did at compiletime. *if* the pdds allow such weirdness with native types. Can we define another property for such calls:

.sub _weird prototyped check_args

I still think that the *common* case (i.e. fixed and known arguments on both sides) should be as fast as possible. Dan leo
Re: Arena flags and floating exception
Melvin Smith [EMAIL PROTECTED] wrote: When I compile with Electric Fence (linux Athlon XP) I get a floating point exception on startup. valgrind doesn't show this problem - strange. -Melvin leo
Re: [CVS ci] hash compare
Dan Sugalski [EMAIL PROTECTED] wrote: You're going to run into problems no matter what you do, and as transcoding could happen with each comparison arguably you need to make a local copy of the string for each comparison, as otherwise you run the risk of significant data loss as a string gets transcoded back and forth across a lossy boundary. Here again is what I had already proposed:

* as long as there are only ascii keys: noop
* on first non-ascii key, convert the whole hash to utf8 - doesn't change hash values
* then, if a key is non-ascii and non-utf8, transcode it in find_bucket() before string_compare

The hash (assuming ascii is used mainly) starts out with a compare function pointing to a strcmp()-alike compare function. Each key that enters the hash, either for insert or for find, is checked for its encoding/type. When the first non-ascii key is inserted, hash keys are converted to utf8 and the compare function pointer is changed to do utf8 compare. Non-ascii search keys are always transcoded to utf8 first - and only once. Regardless, I think at least a single string copy with comparison against that copy within the hash functions is the only way to get correct results. Yes. That's the point - a single string copy. Now each compare could do a transcode, i.e. generate a new string. Dan leo
Re: [CVS ci] hash compare
Peter Gibbs [EMAIL PROTECTED] wrote: I would prefer this to be done via an iterator, as it would also solve the skip_backward problems with DBCS encoding. Something like: There was a discussion that the current string iterators are wrong. They should take a position argument (and the start of the string) instead of the pointer...

typedef struct string_iterator_t {
    String *str;
    UINTVAL bytepos;
    UINTVAL charpos;
    UINTVAL (*decode_and_advance)(struct string_iterator_t *i);
} string_iterator;

... which is done here. Does anybody think this is worth implementing? Yes. It's a nice speed-up and a clean interface. Peter Gibbs leo
Re: Calling conventions. Again
I'm getting a little confused about what we're arguing about. I will take a stab at describing the playing field, so people can correct me where I'm wrong:

Nonprototyped functions: these are simpler. The only point of contention here is whether args should be passed in P5..P15, overflowing into P3; or just everything in P3. Dan has stated at least once that he much prefers the P5..P15 approach, and there hasn't been much disagreement, so I'll assume that that's the way it'll be.

Prototyped functions: there are a range of possibilities.

1. Everything gets PMC-ized and passed in P3. (Oops, I wasn't going to mention this. I did because Joe Wilson seemed to be proposing this.) No arg counts.

2. Everything gets PMC-ized and passed in P5..P15+P3. Ix is an arg count for the number of args passed in P5..P15. P3 is empty if argcount <= 11 (so you have to completely fill P5..P15 before putting stuff in P3.)

3. Same as above, but you can start overflowing into P3 whenever you want. Mentioned for completeness. Not gonna happen. In fact, anything above this point ain't gonna happen.

4. PMCs get passed in P5..P15+P3, ints get passed in I5..I15+P3, etc. Ix is a total argument count (number of non-overflowed PMC args + number of non-overflowed int args + ...). Arguments are always ordered, so it is unambiguous which ones were omitted in a varargs situation. I think this is what Leo is arguing for.

5. PMCs get passed in P5..P15+P3, ints get passed in I5..I15+P3, etc. Ix is the number of non-overflowed PMC args, Iy is the number of non-overflowed int args, etc. I think this is what Dan is arguing for.

6. PMCs get passed in Px..P15+P3, ints get passed in I5..I15+P4, etc. Ix is the number of non-overflowed PMC args, Iy is the number of non-overflowed int args, etc. I made this one up; see below.
Given that all different types of arguments get overflowed into the same array (P3) in #4 and #5, #4 makes some sense -- if you want to separate out the types, then perhaps it should be done consistently, with both argument counts _and_ overflow arrays. That's what #6 would be. Note that it burns a lot of PMC registers.

The other question is how much high-level argument passing stuff (e.g., default values) should be crammed in. The argument against is that it will bloat the interface and slow down calling. The argument for is that it increases the amount of shared semantics between Parrot-hosted languages. An example of how default values could be wedged in is to say that any PMC parameter can be passed a Null PMC, which is the signal to use the default value (which would need to be computed in the callee, remember), or die loudly if the parameter is required. Supporting optional integer, numeric, or string parameters would be trickier. Or disallowed. Hopefully I got all that right.
Re: IMCC fixes
Melvin Smith [EMAIL PROTECTED] wrote: I've fixed nearly all of breakage with IMCC that was introduced with the last large patch. Great, thanks. A hash test is failing, but I have no clue how my IMCC work affected that code. I'm hoping it was already failing before I synced? Yep. Obviously introduced by my hash value randomization patch. -Melvin leo
Re: Calling conventions. Again
Steve Fink wrote: Prototyped functions: there are a range of possibilities. 2. Everything gets PMC-ized and passed in P5..P15+P3. Ix is an arg count for the number of args passed in P5..P15. P3 is empty if argcount <= 11 (so you have to completely fill P5..P15 before putting stuff in P3.) That is exactly the unprototyped case. 4. PMCs get passed in P5..P15+P3, ints get passed in I5..I15+P3, etc. Ix is a total argument count (number of non-overflowed PMC args + number of non-overflowed int args + ...). Arguments are always ordered, so it is unambiguous which ones were omitted in a varargs situation. I think this is what Leo is arguing for. Almost. An argument count doesn't help in the general case (e.g. against swapped params of the same type or such). So I'd just omit it for the probably more common case of fixed and known arguments. But we need something for varargs or even changed signatures (wherever they might come from). The HLL compiler shouldn't know much about the underlying calling conventions. OTOH the code generated for argument binding in the callee has to interface with the HLL code that provides values for missing (default) arguments. We might need something like this:

.sub prototyped var_args    # I1 is count of I-args
    .param int a            # I5 when present
    .if_missing_param a     # if I1 >= 1 goto a_is_valid
        # HLL code to set a
        # ...
        # a = some
        # I5 = ...
.end                        # a_is_valid:

So while it's up to the HLL to produce the code for providing a default value, it's up to our calling conventions to allow such code to interface when a param is missing. Just having counts isn't enough to provide all we need. We must provide an interface for the HLL to fill in code for missing arguments. Changed signatures can't be handled with counts anyway. The other question is how much high-level argument passing stuff (e.g., default values) should be crammed in. See above. ... The argument against is that it will bloat the interface and slow down calling.
I'd really like to have the fixed and vararg cases separated for this reason. The fib() benchmark, which mainly measures raw call speed, is already slow enough. leo
Re: [CVS ci] hash compare
On Thu, 13 Nov 2003, Leopold Toetsch wrote: Dan Sugalski [EMAIL PROTECTED] wrote: You're going to run into problems no matter what you do, and as transcoding could happen with each comparison arguably you need to make a local copy of the string for each comparison, as otherwise you run the risk of significant data loss as a string gets transcoded back and forth across a lossy boundary. Here is again, what I already had proposed: * as long as there are only ascii keys: noop * on first non ascii key, convert all hash to utf8 - doesn't change hash values Well... this is the place where things fall down. It does change hash values. You may find yourself transcoding from, say, Shift-JIS to Unicode, which will result in most (if not all) of the characters in the string changing code-points. That's likely to change hash values just a little... Regardless, I think at least a single string copy with comparison against that copy within the hash functions is the only way to get correct results. Yes. That's the point - a single string copy. Now each compare could do a transcode i.e. generate a new string. When I said "at least a single copy" I meant that we might have multiple copies made, though that will definitely do nasty things to performance. Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: [CVS ci] hash compare
Dan Sugalski [EMAIL PROTECTED] wrote: On Thu, 13 Nov 2003, Leopold Toetsch wrote: * as long as there are only ascii keys: noop * on first non ascii key, convert all hash to utf8 - doesn't change hash values Well... this is the place where things fall down. It does change hash values. You may find yourself transcoding from, say, Shift-JIS to Unicode, Dan. It starts with ascii keys, unicode code-points 0x00..0x7f. When the first non-ascii key is to be stored, *ascii* keys are changed to utf8. Dan leo
Re: [CVS ci] hash compare
On Thu, 13 Nov 2003, Leopold Toetsch wrote: Dan Sugalski [EMAIL PROTECTED] wrote: On Thu, 13 Nov 2003, Leopold Toetsch wrote: * as long as there are only ascii keys: noop * on first non ascii key, convert all hash to utf8 - doesn't change hash values Well... this is the place where things fall down. It does change hash values. You may find yourself transcoding from, say, Shift-JIS to Unicode, Dan. It starts with ascii keys, unicode code-points 0x00..0x7f. When the first non-ascii key is to be stored, *ascii* keys are changed to utf8. Which doesn't do much good if we've got non-ascii, non-unicode keys. Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Calling conventions. Again
On Thu, 13 Nov 2003, Steve Fink wrote: I'm getting a little confused about what we're arguing about. I will take a stab at describing the playing field, so people can correct me where I'm wrong: The current big issue is whether non-PMC parameter types get counts. There's not really anything else up for dispute. I think the for/against argument there is:

for) Since we can't guarantee any sort of compiletime/runtime coherence, if we're passing parameters in I/S/N registers we need to note how many.

against) That takes a lot of time and is unneeded most of the time.

While I sympathize with the argument against, for it to be a feasible solution would require that runtime signatures not change from what was in place at compiletime. Given that there may be some significant amount of time between run and compile times (what with precompiled executables and libraries), and that we are in general targeting, supporting, and expecting fairly dynamic languages, I'm expecting that we're going to have issues here. While the code that uses N/I/S registers is likely going to run at a lower level than most perl/python/ruby code, we're looking to encourage their use with HLL prototypes and such, which means we're going to see more use of low-level types than we might have in the past, so we're going to get more use of them than I think folks might be expecting. The other question is how much high-level argument passing stuff (eg, default values) should be crammed in. The argument against is that it will bloat the interface and slow down calling. The argument for is that it increases the amount of shared semantics between Parrot-hosted languages. An example of how default values could be wedged in is to say that any PMC parameter can be passed a Null PMC, which is the signal to use the default value (which would need to be computed in the callee, remember), or die loudly if the parameter is required. Supporting optional integer, numeric, or string parameters would be trickier.
Or disallowed. That's something separate, though needing addressing. Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: This week's summary
On Tue, 11 Nov 2003 12:21, Piers Cawley wrote: Freeze/thaw data format and PBC Leo Tötsch is working on the data serialization/deserialization (aka Freeze/Thaw) system discussed over the last few weeks. He wondered if there were any plans for the frozen image data format. Leo's plan is to use the PBC constant format (with possible extensions) so things integrate neatly into bytecode. Dan had a bunch of comments, but the PBC-based format idea seemed to be well received, with the caveat that it should be a 'dense' format. Cool. Are there hooks in place for tools like Pixie and Tangram when these objects are being stored? Pixie works (Piers, please correct me if I'm wrong) by hanging magic off the object to let Pixie know when it encounters a storage object. It then lets the serialiser freeze the object. So, it needs to intercept the serialiser on a per-object *instance* basis. I've been considering the differences and inadequacies between Tangram and Pixie for quite some time now, and I think all I need, to write a persistence tool that is as useful as Pixie but as OLTP-capable as Tangram, is to be able to intercept freezing, and to be able to feed the thaw'er, on a defined *Class/Property* basis. I.e., I want to let the serialiser work just like Pixie, but when you encounter an object whose Class is defined to have a slightly different storage mechanism (perhaps there is a column which you want a database to add to an index for you; maybe the object is a static-sized object and all the rows in it map to columns), it could be extracted without re-inventing the serialisation wheel for the process, or expecting the Classes to implement special methods. Much as I appreciate that Perl-based index objects are just as functional as database indexes, I'd quite like to use my database for that. It's actually quite good at it :-). Would it be too much to ask for such hooks? Or should I come up with a sample implementation / design?
-- Sam Vilain, [EMAIL PROTECTED] We must become the change we want to see. -- Mahatma Gandhi
[perl #24489] intor.pod contains a slight error.
# New Ticket Created by [EMAIL PROTECTED]
# Please include the string: [perl #24489]
# in the subject line of all future correspondence about this issue.
# URL: http://rt.perl.org/rt2/Ticket/Display.html?id=24489

I hope this is the correct place to send this. intro.pod contains an error in one of the examples.

304c304
< set I2 0
---
> set I2, 0

Harry
Re: [CVS ci] hash compare
Dan Sugalski [EMAIL PROTECTED] wrote: On Thu, 13 Nov 2003, Leopold Toetsch wrote: Dan. It starts with ascii keys, unicode code-points 0x00..0x7f. When the first non-ascii key is to be stored, *ascii* keys are changed to utf8. Which doesn't do much good if we've got non-ascii, non-unicode keys. Whatever these are (PDDs?), they aren't handled with string_compare (or string_equal) either. The hash has had a pluggable interface for the compare (and the hash_value) function for some time. Such keys need a different compare function. I just would like to replace the current slow string_compare solution with a faster equivalent. Dan leo
Word for the day: Undocumentation
Just a reminder for new checkins. Please make sure there is, at a minimum, a header comment for each routine you check in, describing just what the heck the routine does. Debugging certain parts of Parrot has become akin to mapping out a rabbit hole using marking flares. For example, just picking a random file (ok, not really so random...) which will remain nameless to protect the innocent, with a 20-second search I located blocks of 129 and 112 consecutive lines with no comments. Not that comments have to hit an exact ratio, but ours is approaching 1 percent. I think overall there is just too much undocumentation going on. :) Cheers, -Melvin
Review of a book about VM
Disclaimer: Pardon my French :) I have bought Virtual Machine Design and Implementation in C++ by Bill Blunden. This book has very positive reviews (see slashdot or amazon.com). It seems to impress people by the apparent breadth of covered topics. Most of it is off topic. The book gives the moderately knowledgeable reader no insight about virtual machines or even about language compilation. No architectural clues. Absolutely nothing about tight run loops, copy-on-write, compiling, optimizing code (SSA, tail calls, continuation passing style) or jitting code. The remarks about memory management are laughable. The book presents a trivial VM and makes no attempt to compare the features of existing VMs. The author has the chutzpah, pedantry or naiveté to compare himself to Knuth: If I had money, I could offer an award like Don Knuth (pronounced Ka-Nooth). But it is unclear that the book has been through any reviewing or editorial process before publishing. It is not that there are obvious typos or that many factual errors; the book is just vain and vacuous. It is difficult to write a review because the book has almost no substance concerning its stated subject: virtual machines. The author takes tangent after tangent and gives his opinion about almost anything concerning the computer industry, cramming in irrelevant knowledge and speaking of all the wars he has waged. He usually states in one paragraph what can be said in one sentence. Most of the code in the book has no interest whatsoever, like gigantic switches or declarations of symbolic constants. Visually disturbing notes are plain rants or state the obvious: virtual machines are run-time systems, but not all run-time systems are virtual machines (boy! I was thinking it was VMs all the way down). One note even propagates the old urban myth about TCP/IP: [the DoD] was originally interested in designing a command and control network that could survive a nuclear war.
In Chapter 1, History and Goals, the author conveys the idea that coding on top of a virtual machine is a good protection against obsolescence of the real architectures. But the book will hardly convince you of the relevance of the presented VM by this criterion, because it apparently only runs on x86. The author states his goals for the VM he presents: 1/ portability 2/ simplicity 3/ performance. Modulo order, these properties are true of any sensible software project. This does not give a clue about what the purpose of the VM is. He cites Java, so the purpose may be to run such a language. But after reading the book, I had no further clue. Chapters 2, 5 and 7 present his assembler. Don't ask me about the logic or the interest of the presentation. Almost nothing is specific to VMs. So why bother to say more? Chapter 3 talks about a debugger. Chapter 6 is about virtual interrupts. The author seems to have used DOS as a model and clearly limits his discussion to x86. So much for portability. Chapter 8, called advanced topics, explains the software industry's progression toward increased abstraction, going from binary to high-level languages. There is a comparison between Linux and Windows. Go figure. The anemic index is 6 pages and reflects the book; almost none of the entries are specific to VMs. Each chapter is followed by a reference section with a book list and an explanation of what they cover. Certainly better than a gratuitous show-off list of academic papers. The choice of listed books is usually pretty good, but often the comments show the cluelessness of the author. I enjoyed this one concerning the Tanenbaum book about OSes: People don't often realize that it was Minix that led to the development of Linux. I don't think people give Andrew enough credit. His ideas on modularity are very contemporary. The back cover says it covers in detail features such as memory management, TCP/IP networking (...) everything expected from a commercial system. In fact the author decides that real garbage collection is too slow and too complex, does not even consider reference counting, and decides to roll his own malloc. I don't know what TCP/IP has to do with VMs... The book certainly does not fulfill the back cover's promises. It seems to me that the described VM is to VMs what a hello world is to real C programs. I did not bother to look at the content of the bundled CD-ROM. -- stef
Re: Review of a book about VM
At 06:30 PM 11/13/2003 +0100, Stéphane Payrard wrote: Disclaimer: Pardon my French :) I have bought Virtual Machine Design and Implementation in C++ by Bill Blunden. This book has very positive reviews (see slashdot or amazon.com). It seems to impress people by the apparent width of covered topics. Most of it is off topic. The book gives to the moderately knowledgeable reader no insight about virtual machines or even about language compilation. Stef, Thanks for the review. Yours is the 2nd review I've read about this book. I was about to buy it but after the 1st review I changed my mind as it echoed your review in almost every sense. I did browse it at the bookstore and it seemed that the book tackled no real topics/problems associated with modern VM design. Disappointing. -Melvin
Re: Word for the day: Undocumentation
snip ...too much undocumentation going on. One of the reasons I started putting stuff on the wiki was because I could see that updating documentation was not a high priority. On the wiki I neither have to have CVS checkin rights, nor do I have to wait for someone with those rights to act upon what I suggest. This has led to my own parallel documentation - I even document the state of the documentation. I know what I have put together is incomplete and inadequate, but I do move it forward, and I keep it up to date. When it comes to pointing out that parrot_assembly.pod is just an earlier version of PDD 6, or that the Per-entity comments section in PDD 7 needs some thought, or that a submissions.pod should be added, I get warnocked. I'm fine with that, I understand why - this is not a rant - but I do think that Parrot has a steep learning curve and that good documentation is essential if we want to lower it. The potential benefits seem obvious. I'd like to volunteer myself as official Parrot documentation person - a semi-autonomous process with clearly defined protocols and goals - and the necessary rights to achieve them. I'm happy to expand on what I mean by that - if I get a response. Mike
Re: Word for the day: Undocumentation
On Thu, 13 Nov 2003, Michael Scott wrote: I'd like to volunteer myself as official Parrot documentation person - a semi-autonomous process with clearly defined protocols and goals - and the necessary rights to achieve them. I'm happy to expand on what I mean by that - if I get a response. If you're willing, fine with me. Get yourself an account on perl.org and mail me the name and we'll get you CVS checkin privs. And we can discuss what you've got in mind. Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Word for the day: Undocumentation
At 08:10 PM 11/13/2003 +0100, Michael Scott wrote: snip ...too much undocumentation going on. One of the reasons I started putting stuff on the wiki was because I could see that updating documentation was not a high priority. On the wiki I neither have to have CVS checkin rights, nor do I have to wait for someone with those rights to act upon what I suggest. This has led to my own parallel documentation - I even document the state of the documentation. I know what I have put together is incomplete and inadequate, but I do move it forward, and I keep it up to date. I've used the wiki quite a few times lately. It is great. When it comes to pointing out that parrot_assembly.pod is just an earlier version of PDD 6, or that the Per-entity comments section in PDD 7 needs some thought, or that a submissions.pod should be added, I get warnocked. That will keep happening until our number of active, core developers grows beyond its current number of 4-5. Also, a couple of the guys split time between Parrot and P5, so it's worse. I'd like to volunteer myself as official Parrot documentation person - a semi-autonomous process with clearly defined protocols and goals - and the necessary rights to achieve them. Would commit privs to the docs directories help, for a start? -Melvin
Re: This week's summary
Sam Vilain [EMAIL PROTECTED] wrote: On Tue, 11 Nov 2003 12:21, Piers Cawley wrote: Freeze/thaw data format and PBC Leo Tötsch is working on the data serialization/deserialization Cool. Are there hooks in place for tools like Pixie and Tangram when these objects are being stored? Eventually, freeze/thaw will be overridable, as will the {de,}serializer itself. The latter has an interface looking like io->vtable->push_pmc() or i = io->vtable->shift_integer(), which is the same as a PMC has. Would it be too much to ask for such hooks? Or should I come up with a sample implementation / design? We can discuss that better when the code is in. leo
Re: Review of a book about VM
From: Melvin Smith [EMAIL PROTECTED] At 06:30 PM 11/13/2003 +0100, Stéphane Payrard wrote: Disclaimer: Pardon my French :) I have bought Virtual Machine Design and Implementation in C++ by Bill Blunden. This book has very positive reviews (see slashdot or amazon.com). It seems to impress people by the apparent width of covered topics. Most of it is off topic. The book gives to the moderately knowledgeable reader no insight about virtual machines or even about language compilation. Stef, Thanks for the review. Yours is the 2nd review I've read about this book. I was about to buy it but after the 1st review I changed my mind as it echoed your review in almost every sense. I did browse it at the bookstore and it seemed that the book tackled no real topics/problems associated with modern VM design. Disappointing. Any chance anyone could recommend a good book about VM design and implementation? Preferably one that covers some stuff that's relevant to Parrot. Thanks, Jonathan
Re: Review of a book about VM
On Thu, 2003-11-13 at 20:08, Melvin Smith wrote: At 06:30 PM 11/13/2003 +0100, Stéphane Payrard wrote: I have bought Virtual Machine Design and Implementation in C++ by Bill Blunden. This book has very positive reviews (see slashdot or amazon.com). It seems to impress people by the apparent width of covered topics. Most of it is off topic. The book gives to the moderately knowledgeable reader no insight about virtual machines or even about language compilation. Thanks for the review. Yours is the 2nd review I've read about this book. I was about to buy it but after the 1st review I changed my mind as it echoed your review in almost every sense. On a related note: what, if any, books would you recommend to someone who is eager to join parrot-development, but feels he doesn't know enough about compiler design, virtual machines etc. Michel Rijnders PS I'm new to this list: hi all!
Re: [CVS ci] hash compare
On Nov 13, 2003, at 2:21 PM, Nicholas Clark wrote: On Wed, Nov 12, 2003 at 02:07:52PM -0800, Mark A. Biggar wrote: And even when the sequence of Unicode code-points is the same, some encodings have multiple byte sequences for the same code-point. For example, UTF-8 has two ways to encode a code-point that is larger than 0xFFFF (Unicode has code-points up to 0x10FFFF): either as two 16-bit surrogate code points, each encoded as a 3-byte UTF-8 sequence, or as a single value encoded as a single 4- or 5-byte UTF-8 sequence. Is it legal to encode surrogate pairs as UTF8? Or does that count as malformed UTF8? No, it's not legal. As of Unicode 3.2, it's not permissible to encode a non-BMP (that is, code point > 0xFFFF) character in UTF-8 via two 3-byte UTF-8 sequences. There is another encoding which does this, called CESU-8, but I don't think it's really ever used. JEff