newPMC() (was: Re: PDD 2, vtables)
Dan Sugalski wrote: Grab one via a utility function. getPMC() or something of the sort. newPMC() ? ;-) I think we shouldn't rule out the possibility of having multiple newPMC()-style functions for grabbing PMCs used for different activities (e.g. lexicals vs tmps vs guaranteed-to-have-refcount=1 vs whatever), which may all have different allocation/GC schemes. Heck, we might even consider having newPMC(N) return a pointer to a contiguous chunk of N PMCs; this might then allow us to have good locality of reference for all the stuff in a scratchpad, for example. I don't think we can have a sensible discussion of this until the GC PDD has been released, but I thought I'd flag up the idea anyway.
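As a rough illustration of the newPMC(N) idea above, here is a minimal C sketch of a fixed-size arena that hands out runs of adjacent PMCs. Everything in it is an assumption made for the example: the two-field PMC layout, the Arena type, the ARENA_SIZE constant and the newPMC_n() name do not come from any PDD, and a real allocator would have to cooperate with whatever the GC PDD ends up specifying.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical PMC layout: only the two fields mentioned in the thread. */
typedef struct pmc {
    void *vtbl;          /* would point at a class vtable */
    void *private_data;  /* class-specific payload */
} PMC;

#define ARENA_SIZE 64    /* arbitrary capacity chosen for the sketch */

typedef struct arena {
    PMC    slots[ARENA_SIZE];
    size_t next;         /* index of the first unused slot */
} Arena;

/* Hand out n adjacent, zeroed PMCs, or NULL if the request cannot be met. */
PMC *newPMC_n(Arena *a, size_t n)
{
    assert(a != NULL);
    if (n == 0 || n > ARENA_SIZE - a->next)
        return NULL;

    PMC *first = &a->slots[a->next];
    for (size_t i = 0; i < n; i++) {
        first[i].vtbl = NULL;          /* an "empty" PMC: no class yet */
        first[i].private_data = NULL;
    }
    a->next += n;
    return first;
}

/* The single-PMC case is just a run of length one. */
PMC *newPMC(Arena *a)
{
    return newPMC_n(a, 1);
}
```

Because the PMCs of one request sit next to each other in slots[], everything belonging to a scratchpad allocated with one newPMC_n() call would share cache lines, which is the locality benefit suggested above. The sketch never reclaims slots; that part is exactly what depends on the GC design.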
Re: PDD 2, vtables
Dan Sugalski wrote: If PMC is a pointer to a structure, "new" will need to allocate memory for a new structure, and hence the value of mypmc will have to change. Nope. PMC structures will be parcelled out from arenas and not malloc'd, and they won't be freed and re-malloced much. If we're passing in a PMC pointer, we won't be reallocating the memory pointed to--rather we'll be reusing it. So how do you get hold of a PMC from the arena in the first place? Alan Burlison
Re: PDD 2, vtables
At 12:45 AM 2/19/2001 +, Alan Burlison wrote: Dan Sugalski wrote: If PMC is a pointer to a structure, "new" will need to allocate memory for a new structure, and hence the value of mypmc will have to change. Nope. PMC structures will be parcelled out from arenas and not malloc'd, and they won't be freed and re-malloced much. If we're passing in a PMC pointer, we won't be reallocating the memory pointed to--rather we'll be reusing it. So how do you get hold of a PMC from the arena in the first place? Grab one via a utility function. getPMC() or something of the sort. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: PDD 2, vtables
Dan Sugalski wrote: Grab one via a utility function. getPMC() or something of the sort. newPMC() ? ;-) Alan Burlison
Re: PDD 2, vtables
At 01:13 AM 2/19/2001 +, Alan Burlison wrote: Dan Sugalski wrote: Grab one via a utility function. getPMC() or something of the sort. newPMC() ? ;-) Works for me. Though for some reason it brings up visions of the Village People, and that's generally a Bad Thing... :) Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: PDD 2, vtables
On Sun, Feb 18, 2001 at 09:34:46PM -0500, Dan Sugalski wrote: At 01:13 AM 2/19/2001 +, Alan Burlison wrote: Dan Sugalski wrote: Grab one via a utility function. getPMC() or something of the sort. newPMC() ? ;-) Works for me. Slight that-sucks alert: So, if I have to get a new PMC of class X (which is something that's going to have to happen often) I call X->vtbl->new(newPMC()) or X->vtbl->morph(newPMC(),X); ? -- It's is not, it isn't ain't, and it's it's, not its, if you mean it is. If you don't, it's its. Then too, it's hers. It isn't her's. It isn't our's either. It's ours, and likewise yours and theirs. -- Oxford University Press, Edpress News
Re: PDD 2, vtables
At 02:50 AM 2/19/2001 +, Simon Cozens wrote: On Sun, Feb 18, 2001 at 09:34:46PM -0500, Dan Sugalski wrote: At 01:13 AM 2/19/2001 +, Alan Burlison wrote: Dan Sugalski wrote: Grab one via a utility function. getPMC() or something of the sort. newPMC() ? ;-) Works for me. Slight that-sucks alert: So, if I have to get a new PMC of class X (which is something that's going to have to happen often) I call X->vtbl->new(newPMC()) or X->vtbl->morph(newPMC(),X); X->vtbl->new. (Whether you snag a new PMC with newPMC or get handed a scratch one by the interpreter depends on circumstance) Morph is used for active variables, while new can assume that the PMC it was handed is dead. I was originally thinking that new should handle the case where the PMC it was handed might need cleanup, but at this point I think that's a bad thing--it forces important code out to too many places, and if we want at least semi-determinate cleanup we ought to be doing it someplace else anyway. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: PDD 2, vtables
On Mon, Feb 05, 2001 at 05:14:44PM -0500, Dan Sugalski wrote: =item new void new(PMC[, key]); Creates a new variable of the appropriate type out of the passed PMC, destroying the current contents if there are any. This is a class function.

Can I suggest this becomes

    PMC new(PMC[, key]);

It gets really hard to get things started otherwise:

    PMC mypmc;
    (sviv_vtable.new)(mypmc);
    printf("mypmc's class is %s\n", (mypmc->vtbl->name)(mypmc));

If PMC is a pointer to a structure, "new" will need to allocate memory for a new structure, and hence the value of mypmc will have to change. So I'd rather say

    PMC mypmc = NULL;
    mypmc = (sviv_vtable.new)(mypmc); /* Old value of mypmc is ignored */

Unless it's a pointer to a pointer to a structure. Which is possible, but yuck. The structure

    struct _pmc {
        VTABLE* vtbl;
        void* private_data;
    };

makes it nice and easy to morph one PMC class into another; the "new" entry in the hypothetical sviv_vtable becomes:

    PMC int_new(PMC pmc, ...) {
        if (pmc)
            (pmc->vtbl->destroy)(pmc);               /* reuse: drop the old contents */
        else
            pmc = (PMC)malloc(sizeof(struct _pmc));  /* size of the struct, not of the pointer */
        pmc->vtbl = &sviv_vtable;
        pmc->private_data = malloc(sizeof(IV));      /* room for one IV */
        return pmc;
    }

-- If you give a man a fire, he'll be warm for a day. If you set a man on fire, he'll be warm for the rest of his life.
Re: PDD 2, vtables
On Sat, Feb 17, 2001 at 02:34:08PM -0500, Dan Sugalski wrote: Well, the idea was that the passed in PMC is either reusable, can be trashed, or is an aggregate of some point and we may autoviv the element corresponding to the key. Right, OK, but how do we create them in the first place? Nope. PMC structures will be parcelled out from arenas and not malloc'd, and they won't be freed and re-malloced much. Oh, phew, good. A bit too much Perl5-think on my part. We talked a few months ago about the structure of the base PMC piece. IIRC the general consensus is we need a flags field, a field for the garbage collector, and were going to tack on an integer and float fields as well for speed. Yuh, that dawned on me just after I sent it. Basically, I've been doing some quick mock-ups, and wasn't worrying about GC and things just yet. Simon -- Some people claim that the UNIX learning curve is steep, but at least you only have to climb it once.
Re: PDD 2, vtables
After a week's delay where I've just been too busy, I thought I'd resurrect the corpse of a thread I was involved in. First off, on Wed, 07 Feb 2001 14:37:33, Dan Sugalski [EMAIL PROTECTED] wrote: At 07:05 PM 2/7/2001 +, David Mitchell wrote: Dan, before I followup your reply to my list of nits about the PDD, can I clarify one thing: destruction. I am assuming that many PMCs will require destruction, eg calling destroy() on a string PMC will cause the memory used by the string data to be freed or whatever. Only very simple PMCs (such as integers) need to do no detruction. Is this the same as your perception of reality :-) ? Nope. The things that call destroy will be those things that have some sort of active destruction. Freeing memory doesn't count in this case, since the plan is to have some sort of external garbage collector that handles that. I also gather that PMCs will have a flag saying whether they need destroying, (eg ints say no, strings say yes), and that calls to destroy() are preceeded by a check on this flag for efficiency? Yep, though strings will be a no in this case. Ah, I see. I really must get my head round this GC business (my copy of the GC book is on order, but alas hasn't yet arrived :-( ). As a quick aside, If a PMC contains a pointer to a string, is there ever any need to set that pointer to NULL to allow the GC to reclaim the string? Given that PMCs will have a flag saying whether they need destroying, the naive view is that the external API definition for vtables will say 'you must check the NEED_DESTROY flag, and if true, you must call the destroy() method'. I'd regard this an implementation and optimisation detail which should be hideen from the user. Instead, I'd prefer that there be a standard set of macros to invoke methods, and that the macro for destroy() checks the flag before calling the real method. Then the API docs just say 'you must always call destroy()'. 
Similarly, I suggest that PMC flags appear (via the use of macros) to be identical to vtable methods. So that as implementations change, whether something can be determined purely by examining a flag or needs to be done via a full method call, is hidden from the user. Or to put it another way, PDD 2 should pretend that everything is a method, and whether some of these get optimised into checks for flag bits or conditional calls, is an implementation detail that we can hide from the user by the judicious use of macros or whatever. =item new void new(PMC[, key]); Creates a new variable of the appropriate type out of the passed PMC, destroying the current contents if there are any. This is a class function. As an aside, am I right in assuming that there will be a function somewhere (outside the scope of this PDD) that creates new, empty PMCs, which can then be acted upon by new() to turn them into PMCs of a particular type? Probably, yes. More likely, PMCs will be declared nukable unless a "clean me up" flag is set, in which case we'd just stomp on what was in there. (Since we generally don't care about the contents of a PMC when trashing it, with some relatively rare exceptions) Will PMCs that have just been new()ed have a default null value, eg 0/"" ? Note that they wont be undefined - at least, I'm assuming that undefined is handled by a separate class. Perhaps we need instead a range of new()s that initialise to various string, numeric etc values? I'm figuring that'll be handled by the bits that deal with constants, but there's no reason that a plain new PMC can't be undef. (Basically set the private data pointer to NULL and the vtable pointer to the undef class vtable) Hmm, there doesnt seem to be anything related to handling constants in PDD 2. The above definition of new() implies that it first calls destroy() to release any previous contents. 
I think it would be better to define new() as operating on an empty PMC (so it is the caller's responsibility to call destroy() first, if necessary). destroy will only be called if the PMC needs it. Most PMCs will just get their contents trashed and let the GC clean up after. Yes, but we have to agree whether it's the caller's or the callee's responsibility to determine whether to check for nuke requirements. I think it should be the caller's responsibility: often the caller will know that the thing it is passing has already been nuked, so it knows not to waste time checking. Actually, I suspect that the whole area of new/clone/destroy etc will need to be examined carefully in the light of a 'typical' variable lifecycle, to avoid unnecessary transitions. For example, my $a = 'abc' might involve $a going from empty -> undef -> empty -> "" -> "abc" or similar, if we're not careful. That's an area for the optimizer. I'd like it to go from empty -> "abc", assuming we don't skip the empty step. But it's also a task for us to define PDD 2 carefully
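The "standard set of macros" proposal in this message can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the flags field, the PMC_NEED_DESTROY bit, the PMC_DESTROY macro and the counting_destroy() helper are names invented here, and clearing the flag after the call is a choice made for the sketch, not something the thread settles.

```c
#include <assert.h>
#include <stddef.h>

struct pmc;

/* Hypothetical one-entry vtable; a real one would carry many more methods. */
typedef struct vtable {
    void (*destroy)(struct pmc *);
} VTABLE;

typedef struct pmc {
    VTABLE      *vtbl;
    unsigned int flags;        /* assumed flags word in the PMC header */
    void        *private_data;
} PMC;

#define PMC_NEED_DESTROY 0x1u  /* assumed bit: "this PMC has active destruction" */

/* Callers always write PMC_DESTROY(p). Whether that turns into a real call
 * is decided here, so the API docs can simply say "always call destroy".
 * Note that the argument is evaluated more than once. */
#define PMC_DESTROY(p)                                   \
    do {                                                 \
        if ((p)->flags & PMC_NEED_DESTROY) {             \
            (p)->vtbl->destroy(p);                       \
            (p)->flags &= ~PMC_NEED_DESTROY;             \
        }                                                \
    } while (0)

/* Demo destroy method that only counts how often it runs. */
int destroy_calls = 0;

void counting_destroy(PMC *pmc)
{
    (void)pmc;
    destroy_calls++;
}
```

With this shape, an integer PMC that never sets the flag costs one test and no indirect call, while the caller-side code looks identical for every class. The same pattern would extend to the suggestion that flags and vtable methods be indistinguishable from the outside.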
Re: PDD 2, vtables
On Wed, Feb 14, 2001 at 06:37:18PM +, David Mitchell wrote: Hmm, there doesn't seem to be anything related to handling constants in PDD 2. I anticipate constants will be PMCs with a small vtable of "get methods", possibly with several different types of value (string, numeric, float, etc.) precomputed at compile time. -- We *have* dirty minds. This is not news. - Kake Pugh
Re: PDD 2, vtables
At 08:47 AM 2/10/2001 -0200, Branden wrote: Dan Sugalski wrote: The string API should be sufficiently smart to be able to convert data from one encoding to another as it's more convenient. No, the vtable functions for the variables should know how to convert from and to perl's preferred string representations, and can do whatever Bizarre Magic they care to iternally. I don't see why Perl couldn't deal with multiple representations internally. Conversion could be done on the way in, internally for efficiency on certain operations, and on the way out, again. It can, and it will. The question is "which ones". The regex engine will almost undoubtedly deal with only fixed-sized characters. Perl itself will probably restrict itself to fixed width characters as well. Individual variable classes can store data in any form they want. (If someone wants to leverage zlib to write a class that compresses its data, I'm fine with that) On the other side, for a string that is matched against regexps, it doesn't matter much if it has variable character length, since regexps normally read all the string anyway, and indexing characters isn't much of a concern. You underestimate the impact of variable-length data, I think. Regexes should go rather faster on fixed-length than variable length data. How much so depends on your processor. (I can guarantee that Alphas will run a darned sight faster on UTF-32 than UTF-8...) Aggreed. Should go faster. But maybe I don't need it that fast! That's fine. Speed is my #1 priority. Memory usage is secondary. (An important secondary, but secondary nonetheless) Which doesn't rule out UTF-8, of course--it may turn out that converting things is slower than dealing with variable width data, in which case priority #1 wins. 
(I really think it shouldn't be so much slower than doing it on an ASCII string with the same total buffer size, it only would have to fetch another byte on certain conditions and build the extended character representation, what isn't hard either.) You might not think so, but you would be wrong. You have a test and potential branch (possibly more--folks with lots of UTF-8 data, which includes everyone with a non-latin alphabet) on *every* character. That is not cheap on modern processors. Yes, you're pulling in significantly less data, which has an impact with UTF-32 (and garbage collection) but I'm not sure you'll find it a win. We can benchmark it and see if my feeling is wrong once we get some code and a testing scaffold built. It would be nice if the user had some control to this, for example by saying "I don't care this string will be used by substr, leave it in UTF-8 since it's too big and I don't want to waste memory!", or "This string isn't too big, so I should convert it to bloated UTF-32 at once!", or even "use less 'memory';". That would be: my str $foo : utf8 : fixed; or possibly use less qw(memory); Probably not my str $foo :utf8 :fixed, since then if I have $bar = $foo it would convert the string value from $foo to anything else, right? Might. Larry's not set the rules on what attributes are passed on with assignment. If you're really worried, there's no reason not to set attributes on $bar either. Generally speaking you probably don't want to do this. Odds are if you think you know what's going on better than the compiler, you're wrong. (Not always, but in a non-trivial number of cases, in my experience) I can't beat the compiler, that's for sure. But I really don't think I want to read a 100KB file into a variable all at once and end up with 400KB memory usage only for that file. And I really don't care if `regexps' go slower on that, I can live with it... If it's binary data or 8-bit characters, you won't. 
If it's UTF-8 you might see expansion, but how much depends on how many 7-bit characters you have. And then only if something actually asks for the data in UTF-32 format. This has been enough to convince me that there should be UTF-8 as one of the base character types for vtables, even if we don't use it in many places internally. For stuff that's just read and printed, it'll save memory, I think. Hope, at least. (Though it probably means the regex engine should deal with variable-width characters, and I'd really rather it didn't) And I believe 8-bit ASCII will always be an option, for those who don't care about extended characters and want the best of both worlds on speed and memory usage. 8-bit characters in general, yep. (ASCII is really 7-bit) ASCII, EBCDIC, or raw byte buffers. That includes Latin-1, Latin-etc. (I believe they're 10 or 12), which are the same as the ISO-8859-1, ISO-8859-(etc). Yes. Anything that doesn't require UTF-8. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED]
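To put numbers on the expansion question discussed above, here is a small C sketch that counts code points in a UTF-8 buffer and reports how many bytes the same text would occupy as UTF-32. The function names are made up for the example, and the code assumes the input is already well-formed UTF-8; it does no validation.

```c
#include <assert.h>
#include <stddef.h>

/* Count code points in a well-formed UTF-8 buffer. Every byte that is not a
 * continuation byte (bit pattern 10xxxxxx) starts a new code point. Note the
 * test on every byte: this is the per-character branch the thread worries
 * about for variable-width data. */
size_t utf8_code_points(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if ((buf[i] & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* UTF-32 spends exactly four bytes on every code point. */
size_t utf32_bytes_needed(const unsigned char *buf, size_t len)
{
    return 4 * utf8_code_points(buf, len);
}
```

For pure 7-bit text every byte is its own code point, so a 100KB file becomes 400KB, which is the 4x case raised in the thread. Text made mostly of three-byte UTF-8 sequences grows by only about a third, which is why the answer "depends on how many 7-bit characters you have".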
Re: PDD 2, vtables
On Thu, Feb 08, 2001 at 09:55:17PM -0600, Jarkko Hietaniemi wrote: Umm, one way or another I suspect UTF-8 will be in there. I suspect so too but very grudgingly. If we abstract the string handling nicely, it can be added on later or even separately. (Credo: We don't have to write all of Perl 6 at once.) Yes, UTF8 is a bit of a pig to deal with, but since most of the world's data is still 7-bit, it's almost certainly worth it. -- Britain has football hooligans, Germany has neo-Nazis, and France has farmers. -The Times
Re: PDD 2, vtables
Dan Sugalski wrote: At 04:02 PM 2/7/2001 +, David Mitchell wrote: Please see my previous post on the subject. As I pointed out there, implementing || and like that breaks short-circuits. No, it doesn't. Just because you pass in two PMCs doesn't mean that they both need to be evaluated. Though the PDD does need to be clearer about how that happens. Hmmm, I can't quite see how that trick works. How would the following get evaluated: $opened || open(F, ...) The second PMC would point to a lazy list, so it wouldn't be evaluated unless its value gets fetched. This implies there will be PMCs containing or referring to an arbitrary amount of bytecode (with delayed execution). Consider this: $dest = $i || ${ BLOCK }; I wonder which vtable function $i->logical_or will use to trigger the delayed evaluation of the right operand. (I assume it will be evaluated before the assignment.) The only thing $i->logical_or can know about ${ BLOCK } is that it will be something scalar. Problem: 1. the evaluation of ${ BLOCK } must be done now. 2. the result should not be forced to a certain type, because || doesn't do this. (Contrary to arithmetic operators the left operand $i has nothing to say in this.) I don't see any vtable functions in the PDD which could solve this. Maybe there should be get_scalar and get_list functions, which do not force anything but evaluation in scalar/list context. -Edwin
Re: PDD 2, vtables
On Wed, Feb 07, 2001 at 01:24:27PM -0500, Dan Sugalski wrote: At 06:12 PM 2/7/2001 +, Nicholas Clark wrote: But I don't like the thought of going in and out of a lot of generic routines for $a = 3; $a += 2; when the integer scalar ought to know what the inside of another integer scalar looks like, and that 2 + 3 doesn't overflow. That particular case would get caught by the optimizer (I'd hope) so it'd not be an issue anyway. Oops. Should have written it as sub foo { $_[0] += 2; } and somewhere else $a = 3; foo ($a); but I agree - we need to time these things, not talk about them. Hmm. += isn't another opcode; it's a special case of a = b + c where the PMCs for a and b are the same thing. And I see no real reason why it can't be part of the + entry. Whether a special case in the code would get a speedup or not's up in the air. (Is the test and branch faster than a generic doing-it routine?) I'd want to test that and see before I decided. Yes, we're trading size [data cache (larger virtual tables on scalars) + instruction cache (2 routines)] versus branch slowdown. Or probably something more complicated than that. Hence your approach of making it flexible enough and trying both. Nicholas Clark
Re: PDD 2, vtables
At 12:12 PM 2/8/2001 +, Nicholas Clark wrote: On Wed, Feb 07, 2001 at 01:24:27PM -0500, Dan Sugalski wrote: At 06:12 PM 2/7/2001 +, Nicholas Clark wrote: But I don't like the thought of going in and out of a lot of generic routines for $a = 3; $a += 2; when the integer scalar ought to know what the inside of another integer scalar looks like, and that 2 + 3 doesn't overflow. That particular case would get caught by the optimizer (I'd hope) so it'd not be an issue anyway. Oops. Should have written it as sub foo { $[0] += 2; } and somewhere else $a = 3; foo ($a); It's examples like these that make me wince at the things we can't do in the perl optimizer. (This is a great candidate for inlining, and we probably won't be able to.) but I agree - we need to time these things, not talk about them Once we hammer out the vtable spec, I want to start writing code for the base vtable types. (Plain scalar, plain array, plain hash) I'm hoping this won't be hindered by the current licensing terms. (Which are laid out in http://archive.develooper.com/perl6-internals%40perl.org/msg01678.html for the interested) Hmm. += isn't another opcode it's a special case of a = b + c where the PMCs for a and b are the same thing. And I see no real reason why it can't be part of the + entry. Whether a special case in the code would get a speedup or not's up in the air. (Is the test and branch faster than a generic doing it routine?) I'd want to test that and see before I decided. Yes, we're trading size [data cache (larger virtual tables on scalars) + instruction cache (2 routines)] versus branch slowdown. Or probably something more complicated than that. And probably completely counter-intuitive on top of it. (Plus the performance characteristics will probably be completely opposite on x86 and Alpha/SPARC machines. That'd be about right) Hence your approach of make it flexible enough and try both. Yep. Time it and try it, I think. 
Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
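The "+= is a special case of a = b + c" idea can be written down as a tiny C sketch, which also makes the test-and-branch being debated visible. The one-field PMC, the int_add() name and the inplace_hits counter are inventions for this example, and integer overflow is ignored entirely.

```c
#include <assert.h>

/* Hypothetical integer-only PMC; a real one would carry a vtable pointer. */
typedef struct pmc {
    long ival;
} PMC;

/* Counts how often the in-place path is taken, for demonstration only. */
int inplace_hits = 0;

/* One "+" entry that serves both  a = b + c  and  a += c.
 * The pointer comparison is the extra test and branch whose cost would have
 * to be measured against keeping a separate routine for +=. */
void int_add(PMC *dst, const PMC *a, const PMC *b)
{
    if (dst == a) {
        dst->ival += b->ival;   /* a += b: update in place */
        inplace_hits++;
        return;
    }
    dst->ival = a->ival + b->ival;
}
```

For plain integers the two paths compute the same thing, so the branch buys nothing here; it would only pay off for classes where updating in place avoids a copy or an allocation. That is consistent with the conclusion in the thread: build both variants and time them.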
Re: PDD 2, vtables
At 11:57 AM 2/8/2001 +0100, Edwin Steiner wrote: Dan Sugalski wrote: At 04:02 PM 2/7/2001 +, David Mitchell wrote: Please see my previous post on the subject. As I pointed there, implementing || and like that breaks short-circuits. No, it doesn't. Just because you pass in two PMCs doesn't mean that they both need to be evaluated. Though the PDD does need to be clearer about how that happens. Hmmm, I can't quite how that trick works. How whould the following get evaluated: $opened || open(F, ...) The second PMC would point to a lazy list, so it wouldn't be evaluated unless its value gets fetched. This implies there will be PMCs containing or refering to an arbitrary amount of bytecode (with delayed execution). Consider this: $dest = $i || ${ BLOCK }; I wonder which vtable function $i-logical_or will use to trigger the delayed evaluation of the right operand. (I assume it will be evaluated before the assignment.) The only thing $i-logical_or can know about ${ BLOCK } is that it will be something scalar. What we can do in that case is treat the right-hand block as an anonymous sub, and put a code ref in the list for the righthand side. Problem: 1. the evaluation of ${ BLOCK } must be done now. Must it? Lazy evaluation would lead me to think it shouldn't be done unless the left side of the || evaluates to something false. 2. the result should not be forced to a certain type, because || doesn't do this. (Contrary to arithmetic operators the left operand $i has nothing to say in this.) Well, the right side of || gets the context of the left side of the assignment, if it's in one. But dealing with odd context isn't anything new in perl. I don't see any vtable functions in the PDD which could solve this. Maybe there should be get_scalar and get_list functions, which do not force anything but evaluation in scalar/list context. Context is set separately, and I'm not sure how to do it yet. 
I'm considering some sort of interpreter variable with the current context, but I'm not sure that's the best way. We will need utility functions to find it out, though. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: PDD 2, vtables
Dan Sugalski wrote: At 11:57 AM 2/8/2001 +0100, Edwin Steiner wrote: Dan Sugalski wrote: At 04:02 PM 2/7/2001 +, David Mitchell wrote: Please see my previous post on the subject. As I pointed there, implementing || and like that breaks short-circuits. No, it doesn't. Just because you pass in two PMCs doesn't mean that they both need to be evaluated. Though the PDD does need to be clearer about how that happens. Hmmm, I can't quite how that trick works. How whould the following get evaluated: $opened || open(F, ...) The second PMC would point to a lazy list, so it wouldn't be evaluated unless its value gets fetched. This implies there will be PMCs containing or refering to an arbitrary amount of bytecode (with delayed execution). Consider this: $dest = $i || ${ BLOCK }; I wonder which vtable function $i-logical_or will use to trigger the delayed evaluation of the right operand. (I assume it will be evaluated before the assignment.) The only thing $i-logical_or can know about ${ BLOCK } is that it will be something scalar. What we can do in that case is treat the right-hand block as an anonymous sub, and put a code ref in the list for the righthand side. That would be perfectly fine, I think. Would the sub (itself, not the reference) be a PMC? Either way you still need a means to evaluate/call this thing. I thought there should be virtual functions for this on the ref PMC or on the sub PMC itself. Is this a misconception? Problem: 1. the evaluation of ${ BLOCK } must be done now. Must it? Lazy evaluation would lead me to think it shouldn't be done unless the left side of the || evaluates to something false. I didn't express myself clearly. I was (unfortunately not explicitly) refering to the case when the left side evaluates to false, and only to this case. By "now" I meant after the evaluation of the left side (to false) and before the assignment. 
The problem I was pointing out is triggering the delayed evaluation "now" (probably by calling a get_* function) without forcing the result to be one of (integer/number/string/bool). You suggest making it an anon-sub-ref call, which is fine. But as I stated above I expected there would be virtual functions for this. After all the internals of a PMC should only be known to the virtual functions, right? (eg. Where the ref PMC stores the function/sub pointer would be internal.) Your words (PDD 2): Nothing outside the core of perl (in fact, nothing outside the data type's vtable routines) should infer anything about a PMC. (hence the Magic part) 2. the result should not be forced to a certain type, because || doesn't do this. (Contrary to arithmetic operators the left operand $i has nothing to say in this.) Well, the right side of || gets the context of the left side of the assignment, if it's in one. But dealing with odd context isn't anything new in perl. Exactly. I actually found that what I meant (in this example) is described as "don't-care scalar context" in the Camel Book. I was troubled because in the PDD there are only get_* functions for more specific cases (int,number,string,bool). I don't see any vtable functions in the PDD which could solve this. Maybe there should be get_scalar and get_list functions, which do not force anything but evaluation in scalar/list context. Context is set separately, and I'm not sure how to do it yet. I'm considering some sort of interpreter variable with the current context, but I'm not sure that's the best way. We will need utility functions to find it out, though. -Edwin
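One way to picture the "anonymous sub on the right-hand side" approach is the C sketch below, where the right operand of || is passed as a thunk (a code pointer plus environment) and only run when the left operand is false. All names here (Value, Thunk, logical_or, count_and_open) are hypothetical, and the Value struct is a stand-in for a PMC; the sketch says nothing about how context would be communicated.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a PMC: a truth value plus a label so results can be told apart. */
typedef struct value {
    int         is_true;
    const char *label;
} Value;

/* Delayed computation: code to run later and the data it needs. */
typedef struct thunk {
    Value (*code)(void *env);
    void  *env;
} Thunk;

/* Short-circuit or: the right-hand thunk is invoked only if left is false.
 * The result is returned as-is, without forcing it to int, number, string
 * or bool. */
Value logical_or(Value left, Thunk right)
{
    if (left.is_true)
        return left;
    return right.code(right.env);
}

/* Example right-hand side with a visible side effect, in the spirit of
 * $opened || open(F, ...): it records that it ran. */
Value count_and_open(void *env)
{
    int *calls = env;
    (*calls)++;
    Value v = { 1, "opened" };
    return v;
}
```

Here the caller of logical_or() never looks inside the thunk; invoking it is the only operation needed, which is the role a "call" or "evaluate" vtable entry on the sub or ref PMC would play in the discussion above.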
Re: PDD 2, vtables
On Tue, Feb 06, 2001 at 12:28:23PM -0500, Dan Sugalski wrote: At 11:26 AM 2/6/2001 +, Tim Bunce wrote: [First off: I've not really been paying attention so forgive me if I'm being dumb here. And many thanks for helping to drive this forwards.] On Mon, Feb 05, 2001 at 05:14:44PM -0500, Dan Sugalski wrote: =head2 Core datatypes For ease of use, we define the following semi-abstract data types Probably worth stating upfront that it'll be easy to add new types to avoid people argusing for their favorite type to be added here. I'm not sure it should be--that'd mean extending the vtables in ways they have little room to grow. Adding new perl datatypes is easy, adding new low-level types is harder. That's pretty much what I meant. I think it's worth saying. =item INT =item NUM =item STR =item BOOL What about references? Special type of scalar, not dealt with here. But should be at least mentioned. =item UTF-32 string =item Native string =item Foreign string I'm a little surprised not to see UTF-8 there, but since I'm also confused about what Native string and Foreign string are I'll skip it. Except to say that some clarification here may help, and explicitly mentioning UTF-8 (even to say it won't be a core type and provide a reference to why) would be good. I didn't put UTF-8 in on purpose, because I'd just as soon not deal with it internally. Variable length character data's a pain in the butt, and if we can avoid having the internals deal with it except as a source that gets converted to UTF-32, that's fine with me. I agree with Branden that a default 4x memory bloat would not be popular. The native and foreign string data types were an attempt to accommodate UTF-8, as well as ASCII and EBCDIC character data. One of the three will likely be the native type, and the rest will be foreign strings. I'm not sure if perl should have only one foreign string type, or if we should have a type tag along with the other bits for strings. 
Umm, one way or another I suspect UTF-8 will be in there. =item is_same BOOL is_same(PMC1, PMC2[, key]); Returns TRUE if CPMC1 and CPMC2 refer to the same value, and FALSE otherwise. I think that needs more clarification, especially where they are of different types. Contrast with is_equal() below. If they're different types they can't be the same. This would be used to check if two references have the same referent, or if two magic variables (database handles, say) pointed to the same thing. Okay, so say so in the PPD. "refer to the same value" isn't very clear (the word value is probably the problem). =item concatenate void concatenate(PMC1, PMC2, PMC3[, key]); ## Concatenates the strings in CPMC2 and CPMC3, storing the result in CPMC1. and insert (ala sv_insert) etc? Hadn't considered them. Care to elaborate on the etc? Er, I haven't looked at sv.c for ages but basically all the kinds of string manipulations that ended up in there for good reason will probably need to be in perl6. sv_insert is a good example (and possibly the only one :-) =item logical_or =item logical_and =item logical_not Er, why not just use get_bool? The only reason I can think of is to support three-value-logic but that would probably be better handled via a higher-level overloading kind of mechanism. Either way, clarify. Well, there's overloading. Plus the potential that a class will do something odd with it--if you || on two custom arrays in list context you might get an array with each pair (left[0] || right [0] and so on) logically or'd. Okay, don't forget xor then :) =item match void match(PMC1, PMC2, REGEX[, key]); Performs a regular expression match on CPMC2 against the expression CREGEX, placing the results in CPMC1. Results, plural = container = array or hash. Needs clarifying. Yep, especially since I'd considered tossing the match destination entirely. (Though that means special variables, and I'm not sure I want to go there) It'll likely just return true or false. 
I'll rethink it. A BOOL return would be good. But "placing the results in CPMC1" is also good (assuming 'results' are equiv to $1, $2 etc in perl5). =head1 REFERENCES PDD 3: Perl's Internal Data Types. Some references to any other vtable based languages would be good. (I presume people have looked at some and learnt lessons.) Alas not. This is pretty much head of zeus stuff, modulo some ego. (Mine's not *that* big...) :-) Without studying history we may be doomed to repeat it. So can anyone point to vtable based language implementations? Tim.
Re: PDD 2, vtables
Some comments about the vtable PDD... First a general comment. I think we really need to make it clear for each method, which arg represents the object that is having its method called (ie which is $self/this so to speak). One way to make this clear would be to insist that the first arg is always $self, but failing that, it should be explicitly mentioned for each function. Also, can I suggest a bit of terminology, which I will use below? I define an *empty* PMC as a PMC which exists, but does not have a valid vtable ptr or content (and so whose methods must not be called under any circumstances). I specifically contrast this with an undefined PMC, which has a valid vtable pointer that points to a bunch of methods that mostly call carp("use of undefined value ..."). Now onwards and upwards... The C<key> parameter is optional, and if passed it refers to an array of key structure pointers. A mere detail, but would it not be more efficient to just pass them as extra args, ie add(PMC1, PMC2, PMC3, key1, key2, key3), rather than having to potentially create and populate a tmp struct just to call the function??? IV type(PMC[, subtype]); Returns the type of the PMC. If the subtype is passed (int, string, num) it returns the subtype of the PMC. This is generally a class function rather than a variable one, but the PMC is passed in just in case. (And so we can have the subtype be a vararg parameter) I don't understand the subtype bit. If for example we pass an optional 2nd arg of INT (assuming this is some sort of enum?), what does type() return? =item new void new(PMC[, key]); Creates a new variable of the appropriate type out of the passed PMC, destroying the current contents if there are any. This is a class function. As an aside, am I right in assuming that there will be a function somewhere (outside the scope of this PDD) that creates new, empty PMCs, which can then be acted upon by new() to turn them into PMCs of a particular type?
Will PMCs that have just been new()ed have a default null value, eg 0/"" ? Note that they won't be undefined - at least, I'm assuming that undefined is handled by a separate class. Perhaps we need instead a range of new()s that initialise to various string, numeric etc values? The above definition of new() implies that it first calls destroy() to release any previous contents. I think it would be better to define new() as operating on an empty PMC (so it is the caller's responsibility to call destroy() first, if necessary). Actually, I suspect that the whole area of new/clone/destroy etc will need to be examined carefully in the light of a 'typical' variable lifecycle, to avoid unnecessary transitions. For example, my $a = 'abc' might involve $a going from empty -> undef -> empty -> "" -> "abc" or similar, if we're not careful. void clone(PMC1, PMC2[, int flags[, key]]); Copies C<PMC2> into C<PMC1>. The C<flags> parameter notes whether a deep copy should be done. (Possibly other things as well, if someone thinks of something reasonable) One flag that would be very useful is 'destroy', which tells clone() to destroy PMC2 immediately after the clone operation. This is because a clone will often be immediately followed by a destroy of the copied PMC, and delegating the destroy() to clone() allows clone() the chance to do things more efficiently (eg even when asked to do a deep copy, it just copies the vtable pointer and payload pointer(s), then scrubs the old PMC). I also think that clone should expect PMC1 to be empty - ie it assumes the caller has already called destroy() if necessary. I guess we also need an assign() method, to handle $a = $b and the like. Note that assign and clone are very different operations (although assign may well call clone): assign() copies *to* itself, while clone() copies *from* itself.
Note that if $a is a 'simple' variable, $a->assign($b) will itself just fall through to $b->clone($a) and let $a be wiped; while if $a is magic or tied or whatever, then $a->assign($b) will take a more active role in setting its own value, based on the value of $b. void morph(PMC, type[, key]); Tells the PMC to change itself into a PMC of the specified type. I don't really see what the difference is between this and new(). void destroy(PMC[, key]); Destroys the variable the PMC represents, leaving it undef. (See also my comments earlier about new/clone/destroy etc). I think destroy should leave an empty PMC rather than an undef one, since as I said earlier, I think undef is a class in its own right. =item exists (x) BOOL exists(PMC1[, key]); Presumably we also need defined(). (where most classes will always return false, while the 'undefined' class always returns true.)
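The empty/undef distinction drawn above can be sketched in C. This is only an illustration under assumptions: the struct layout and the names pmc_is_empty, pmc_new_undef and pmc_destroy are hypothetical and do not come from the PDD. An empty PMC is modelled as one whose vtable pointer is NULL, new() is assumed to require an empty PMC, and destroy() is assumed to leave the PMC empty rather than undef.

```c
#include <assert.h>
#include <stddef.h>

typedef struct PMC PMC;

/* Hypothetical, cut-down vtable: only the entries this sketch needs. */
typedef struct VTable {
    const char *class_name;
    void (*destroy)(PMC *self);   /* release contents; leaves the PMC empty */
} VTable;

struct PMC {
    const VTable *vtable;   /* NULL marks an "empty" PMC: call no methods */
    void *data;             /* private payload, owned by the class */
};

static void undef_destroy(PMC *self)
{
    /* undef owns no payload, so there is nothing to free */
    self->vtable = NULL;
    self->data = NULL;
}

static const VTable undef_vtable = { "undef", undef_destroy };

static int pmc_is_empty(const PMC *p)
{
    return p->vtable == NULL;
}

/* new() as proposed above: the caller must hand over an empty PMC */
static void pmc_new_undef(PMC *p)
{
    assert(pmc_is_empty(p));
    p->vtable = &undef_vtable;
    p->data = NULL;
}

/* destroy() leaves the PMC empty, not undef */
static void pmc_destroy(PMC *p)
{
    if (!pmc_is_empty(p))
        p->vtable->destroy(p);
}
```

With this shape, "undef" is just another class with its own vtable, and the caller decides when a PMC goes back to the empty state.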
Re: PDD 2, vtables
Please see my previous post on the subject. As I pointed out there, implementing || and like that breaks short-circuits. No, it doesn't. Just because you pass in two PMCs doesn't mean that they both need to be evaluated. Though the PDD does need to be clearer about how that happens. Hmmm, I can't quite see how that trick works. How would the following get evaluated: $opened || open(F, ...)
Re: PDD 2, vtables
At 04:02 PM 2/7/2001 +, David Mitchell wrote: Please see my previous post on the subject. As I pointed out there, implementing || and like that breaks short-circuits. No, it doesn't. Just because you pass in two PMCs doesn't mean that they both need to be evaluated. Though the PDD does need to be clearer about how that happens. Hmmm, I can't quite see how that trick works. How would the following get evaluated: $opened || open(F, ...) The second PMC would point to a lazy list, so it wouldn't be evaluated unless its value gets fetched. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
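One way to picture the lazy second operand is a value that carries a deferred computation and only runs it when its boolean is fetched. The sketch below is an assumption-laden illustration, not the PDD's design: LazyValue, lazy_get_bool and the thunk field are hypothetical names, and a single deferred call stands in for the "lazy list" mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lazy value: the computation runs on first fetch only. */
typedef struct LazyValue {
    int forced;                 /* has the thunk already run? */
    int value;                  /* cached result once forced */
    int (*thunk)(void *env);    /* deferred computation, e.g. open(F, ...) */
    void *env;                  /* state handed to the thunk */
} LazyValue;

static int lazy_get_bool(LazyValue *v)
{
    if (!v->forced) {
        v->value = v->thunk(v->env);
        v->forced = 1;
    }
    return v->value != 0;
}

/* Both operands are passed in, but the right one is fetched
 * only when the left one turns out to be false. */
static int logical_or(LazyValue *left, LazyValue *right)
{
    if (lazy_get_bool(left))
        return 1;
    return lazy_get_bool(right);
}

/* Stand-in for a side-effecting call: counts how often it runs. */
static int count_calls(void *env)
{
    ++*(int *)env;
    return 1;
}
```

Passing two operands to logical_or therefore does not force two evaluations; the side effect in count_calls happens only on the path where the left operand is false.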
Re: PDD 2, vtables
On Wed, Feb 07, 2001 at 04:03:49PM +, David Mitchell wrote: BTW, should the vtable include all the mutator operators too, ie ++, += and so on, on the grounds that an implementation may be able to do this more efficiently internally? ++ and -- are already slightly messy in perl5: pp_preinc, pp_postinc, pp_predec and pp_postdec live in with all the ops. They know how to increment and decrement integers that don't overflow, and call routines in sv.c to increment and decrement anything else. Actually, this nearly provides a divide between values and operators that has been suggested, with the speed up hack for the common case. Nicholas Clark
Re: PDD 2, vtables
Nicholas Clark wrote: ++ and -- are already slightly messy in perl5: pp_preinc, pp_postinc, pp_predec and pp_postdec live in with all the ops. They know how to increment and decrement integers that don't overflow, and call routines in sv.c to increment and decrement anything else. Actually, this nearly provides a divide between values and operators that has been suggested, with the speed up hack for the common case. Nicholas Clark I guess everything (including get/set, add/sub/mul/...) could have a speed up hack for the common case. The vtables (or the PMC's flags as well) could have a flag that indicates ``No tying and no overloading here, nothing special, just another plain old variable''. Every operation would check the `special' flag of the values it operates on, and do the right thing on them. Otherwise, call the vtable for the generic way of doing it. I _think_ this would be a great speed up if the program doesn't use much magic, but perhaps the overhead would be too big and make tying slower than in Perl 5... something to consider though. - Branden
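The flag check described above might look roughly like the following. Everything here is a hypothetical sketch: PMC_PLAIN_INT, op_add and the struct layout are invented for illustration, and overflow handling is deliberately left out.

```c
#include <assert.h>

typedef struct PMC PMC;

typedef struct VTable {
    void (*add)(PMC *dest, const PMC *a, const PMC *b);
} VTable;

/* Hypothetical flag: "no tying, no overloading, plain old integer". */
enum { PMC_PLAIN_INT = 1 };

struct PMC {
    const VTable *vtable;
    unsigned flags;
    long ival;
};

static int generic_calls = 0;   /* counts trips through the vtable */

static void generic_add(PMC *dest, const PMC *a, const PMC *b)
{
    ++generic_calls;
    dest->ival = a->ival + b->ival;
}

static const VTable int_vtable = { generic_add };

/* Opcode body: take the inline path only when every operand is plain. */
static void op_add(PMC *dest, const PMC *a, const PMC *b)
{
    if (dest->flags & a->flags & b->flags & PMC_PLAIN_INT) {
        dest->ival = a->ival + b->ival;   /* overflow check omitted */
        return;
    }
    a->vtable->add(dest, a, b);           /* generic path */
}
```

Whether the extra test and branch beats always calling through the vtable is exactly the kind of thing that would need measuring.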
Re: PDD 2, vtables
Nicholas Clark [EMAIL PROTECTED] mused: On Wed, Feb 07, 2001 at 04:03:49PM +, David Mitchell wrote: BTW, should the vtable include all the mutator operators too, ie ++, += and so on, on the grounds that an implementation may be able to do this more efficiently internally? ++ and -- are already slightly messy in perl5: pp_preinc, pp_postinc, pp_predec and pp_postdec live in with all the ops. They know how to increment and decrement integers that don't overflow, and call routines in sv.c to increment and decrement anything else. Actually, this nearly provides a divide between values and operators that has been suggested, with the speed up hack for the common case. I'm not sure I follow you. What is the "this" in "this nearly provides a divide"? Confused of Sheffield.
Re: PDD 2, vtables
On Wed, Feb 07, 2001 at 05:19:16PM +, David Mitchell wrote: Nicholas Clark [EMAIL PROTECTED] mused: On Wed, Feb 07, 2001 at 04:03:49PM +, David Mitchell wrote: BTW, should the vtable include all the mutator operators too, ie ++, += and so on, on the grounds that an implementation may be able to do this more efficiently internally? ++ and -- are already slightly messy in perl5: pp_preinc, pp_postinc, pp_predec and pp_postdec live in with all the ops. They know how to increment and decrement integers that don't overflow, and call routines in sv.c to increment and decrement anything else. Actually, this nearly provides a divide between values and operators that has been suggested, with the speed up hack for the common case. I'm not sure I follow you. What is the "this" in "this nearly provides a divide"? this example. I think the "nearly" probably should go. Maybe I should have written "++ and -- in perl5 provide an example of a (nearly clean) divide between operator and value". Confused of Sheffield. Hmm. Yes. I'm confused too. Confused of Newcastle
Re: PDD 2, vtables
At 03:09 PM 2/7/2001 +, David Mitchell wrote: Some comments about the vtable PDD... First a general comment. I think we really need to make it clear for each method, which arg represents the object that is having its method called (ie which is $self/this so to speak). One way to make this clear would be to insist that the first arg is always $self, but failing that, it should be explicitly mentioned for each function. The docs are unclear there. I'll patch 'em up. FWIW, generally the first argument is the destination. Also, can I suggest a bit of terminology, which I will use below? I define an *empty* PMC as a PMC which exists, but does not have a valid vtable ptr or content (and so whose methods must not be called under any circumstances). I specifically contrast this with an undefined PMC, which has a valid vtable pointer that points to a bunch of methods that mostly call carp("use of undefined value ..."). This is an area that needs addressing, that's for sure. I'll deal with that as well. The C<key> parameter is optional, and if passed it refers to an array of key structure pointers. A mere detail, but would it not be more efficient to just pass them as extra args, ie add(PMC1, PMC2, PMC3, key1, key2, key3), rather than having to potentially create and populate a tmp struct just to call the function??? Well, extra arguments cost. I'd originally had only a single key optionally passed in, but that meant potentially two of the three PMCs had to be real, rather than keyed into containers. (And thus possibly virtual) Hence the key array. I admit it's not a swell decision--I'm not sure whether the single optional parameter is better, or several optional (or required) parameters are better. I can see it going either way, and I didn't put all that much thought into it. IV type(PMC[, subtype]); Returns the type of the PMC. If the subtype is passed (int, string, num) it returns the subtype of the PMC.
This is generally a class function rather than a variable one, but the PMC is passed in just in case. (And so we can have the subtype be a vararg parameter) I don't understand the subtype bit. If for example we pass an optional 2nd arg of INT (assuming this is some sort of enum?), what does type() return? If you pass in INT, you'll get back NATIVE or BIGINT, depending on which one the PMC would rather be. STR would get BINARY, NATIVE, FOREIGN, or UTF_32. Basically, subtype says "If I asked you what kind of string you were, what would you tell me?" =item new void new(PMC[, key]); Creates a new variable of the appropriate type out of the passed PMC, destroying the current contents if there are any. This is a class function. As an aside, am I right in assuming that there will be a function somewhere (outside the scope of this PDD) that creates new, empty PMCs, which can then be acted upon by new() to turn them into PMCs of a particular type? Probably, yes. More likely, PMCs will be declared nukable unless a "clean me up" flag is set, in which case we'd just stomp on what was in there. (Since we generally don't care about the contents of a PMC when trashing it, with some relatively rare exceptions) Will PMCs that have just been new()ed have a default null value, eg 0/"" ? Note that they won't be undefined - at least, I'm assuming that undefined is handled by a separate class. Perhaps we need instead a range of new()s that initialise to various string, numeric etc values? I'm figuring that'll be handled by the bits that deal with constants, but there's no reason that a plain new PMC can't be undef. (Basically set the private data pointer to NULL and the vtable pointer to the undef class vtable) The above definition of new() implies that it first calls destroy() to release any previous contents. I think it would be better to define new() as operating on an empty PMC (so it is the caller's responsibility to call destroy() first, if necessary).
destroy will only be called if the PMC needs it. Most PMCs will just get their contents trashed and let the GC clean up after. Actually, I suspect that the whole area of new/clone/destroy etc will need to be examined carefully in the light of a 'typical' variable lifecycle, to avoid unnecessary transitions. For example, my $a = 'abc' might involve $a going from empty -> undef -> empty -> "" -> "abc" or similar, if we're not careful. That's an area for the optimizer. I'd like it to go from empty -> "abc", assuming we don't skip the empty step. void clone(PMC1, PMC2[, int flags[, key]]); Copies C<PMC2> into C<PMC1>. The C<flags> parameter notes whether a deep copy should be done. (Possibly other things as well, if someone thinks of something reasonable) One flag that would be very useful is 'destroy', which tells clone() to destroy PMC2 immediately after the clone operation. That makes no sense to me. If we're cloning PMC2 then trashing it, why not just set
Re: PDD 2, vtables
++ and -- are already slightly messy in perl5: pp_preinc, pp_postinc, pp_predec and pp_postdec live in with all the ops. They know how to increment and decrement integers that don't overflow, and call routines in sv.c to increment and decrement anything else. Actually, this nearly provides a divide between values and operators that has been suggested, with the speed up hack for the common case. I'm not sure I follow you. What is the "this" in "this nearly provides a divide"? this example. I think the "nearly" probably should go. Maybe I should have written "++ and -- in perl5 provide an example of a (nearly clean) divide between operator and value". Well, many of the vtable methods are operator-ish rather than value-ish, presumably on the grounds of efficiency. A pure 'value' vtable wouldn't have add(), concatenate() etc. Which leads me back to: I'm not sure whether you are in favour of, or oppose, += etc being vtable methods. Confused of Sheffield. Hmm. Yes. I'm confused too. Confused of Newcastle Fancy swapping some cutlery for some Brown Ale? ;-)
Re: PDD 2, vtables
Dan Sugalski wrote: At 03:09 PM 2/7/2001 +, David Mitchell wrote: A mere detail, but would it not be more efficient to just pass them as extra args, ie add(PMC1, PMC2, PMC3, key1, key2, key3), rather than having to potentially create and populate a tmp struct just to call the function??? Well, extra arguments cost. I'd originally had only a single key optionally passed in, but that meant potentially two of the three PMCs had to be real, rather than keyed into containers. (And thus possibly virtual) Hence the key array. I admit it's not a swell decision--I'm not sure whether the single optional parameter is better, or several optional (or required) parameters are better. I can see it going either way, and I didn't put all that much thought into it. I think filling an array costs at least as much as passing parameters, not to mention the cost of passing that array as a parameter, also... Unless the array is constant and can be pre-filled by the compiler (which I think is a somewhat rare case, considering all the three arguments), or once filled there are cases it can be reused, I'm not sure it's worth using an array to save 2 pushes into the stack... - Branden
Re: PDD 2, vtables
On Wed, Feb 07, 2001 at 05:54:14PM +, David Mitchell wrote: Well, many of the vtable methods are operator-ish rather than value-ish, presumably on the grounds of efficiency. A pure 'value' vtable wouldn't have add(), concatenate() etc. Which leads me back to: I'm not sure whether you are in favour of, or oppose, += etc being vtable methods. I'm not either. They feel like they should be operators. But I don't like the thought of going in and out of a lot of generic routines for $a = 3; $a += 2; when the integer scalar ought to know what the inside of another integer scalar looks like, and that 2 + 3 doesn't overflow. Hmm. += isn't another opcode; it's a special case of a = b + c where the PMCs for a and b are the same thing. And I see no real reason why it can't be part of the + entry. Nicholas Clark
Re: PDD 2, vtables
David Mitchell wrote: Well, many of the vtable methods are operator-ish rather than value-ish, presumably on the grounds of efficiency. A pure 'value' vtable wouldn't have add(), concatenate() etc. Which leads me back to: I'm not sure whether you are in favour of, or oppose, += etc being vtable methods. Oppose. (Actually I'm talking about my idea on vtables, i.e. separate +/-/* in one vtable and store/fetch in another). My proposal on ++ and -- would be having the `value'-part of the vtable (the one that handles +/-/*) return a value corresponding to what would be the value of it after an increment or decrement. store would be used to actually commit the ++/-- operation. This would serve both postfix and prefix cases, because in one case the value before the store would be used, and in the other the one after. (I was just reminded of the C++ overloading of ++, that uses a dummy parameter to tell if it's a pre or a post increment. So bad...) - Branden
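The split proposed above (a value part that computes the successor without side effects, and a variable part that fetches and stores) can be sketched like this. The names IntVar, value_succ, var_fetch and var_store are hypothetical, and a bare long stands in for whatever a "value" would really be.

```c
#include <assert.h>

/* Hypothetical variable: just a slot that can be fetched and stored. */
typedef struct IntVar {
    long value;
} IntVar;

/* "Value" part: computes the incremented value, no side effects. */
static long value_succ(long v)
{
    return v + 1;
}

/* "Variable" part: fetch and store only. */
static long var_fetch(const IntVar *x)
{
    return x->value;
}

static void var_store(IntVar *x, long v)
{
    x->value = v;
}

/* Prefix ++: the expression yields the value after the store. */
static long pre_inc(IntVar *x)
{
    long next = value_succ(var_fetch(x));
    var_store(x, next);
    return next;
}

/* Postfix ++: the expression yields the value before the store. */
static long post_inc(IntVar *x)
{
    long old = var_fetch(x);
    var_store(x, value_succ(old));
    return old;
}
```

Both forms use the same two primitives; they differ only in which value the surrounding expression sees.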
Re: PDD 2, vtables
At 06:12 PM 2/7/2001 +, Nicholas Clark wrote: On Wed, Feb 07, 2001 at 05:54:14PM +, David Mitchell wrote: Well, many of the vtable methods are operator-ish rather than value-ish, presumably on the grounds of efficiency. A pure 'value' vtable wouldn't have add(), concatenate() etc. Which leads me back to: I'm not sure whether you are in favour of, or oppose, += etc being vtable methods. I'm not either. They feel like they should be operators. But I don't like the thought of going in and out of a lot of generic routines for $a = 3; $a += 2; when the integer scalar ought to know what the inside of another integer scalar looks like, and that 2 + 3 doesn't overflow. That particular case would get caught by the optimizer (I'd hope) so it'd not be an issue anyway. Hmm. += isn't another opcode; it's a special case of a = b + c where the PMCs for a and b are the same thing. And I see no real reason why it can't be part of the + entry. Whether a special case in the code would get a speedup or not's up in the air. (Is the test and branch faster than a generic doing it routine?) I'd want to test that and see before I decided. Dan
Re: PDD 2, vtables
At 04:15 PM 2/7/2001 -0200, Branden wrote: David Mitchell wrote: Well, many of the vtable methods are operator-ish rather than value-ish, presumably on the grounds of efficiency. A pure 'value' vtable wouldn't have add(), concatenate() etc. Which leads me back to: I'm not sure whether you are in favour of, or oppose, += etc being vtable methods. Oppose. (Actually I'm talking about my idea on vtables, i.e. separate +/-/* in one vtable and store/fetch in another). My proposal on ++ and -- would be having the `value'-part of the vtable (the one that handles +/-/*) return a value corresponding to what would be the value of it after an increment or decrement. store would be used to actually commit the ++/-- operation. This would serve both postfix and prefix cases, because in one case the value before the store would be used, and in the other the one after. Splitting the vtable into two pieces, with one piece not tied to a PMC, makes some things impossible. Consider this: @foo = @bar * @baz; where all three arrays are really matrix types. In the separate load/store and do vtable scheme it means you get the value of @bar and @baz in scalar context, and multiply the results. Two operations, and the resultant values are sanitized. In the single vtable scheme, we'd execute @bar's multiply routine, which would be clever enough (because we wrote it that way) to see the second parameter's also a matrix, and do matrix math. Splitting things up also loses information when moving data between the two vtable routines. While that's not a big deal generally (as the info lost is irrelevant) it forbids some rather interesting side cases. (I was just reminded of the C++ overloading of ++, that uses a dummy parameter to tell if it's a pre or a post increment. So bad...) I'm not sure it's worth having both a preinc and postinc operator, as opposed to splitting it into an inc/fetch or fetch/inc pair. (And yes, I know, earlier I was arguing that one opcode's better than two.
This one's rare enough that profiling it would probably be in order...) Dan
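The single-vtable argument above hinges on the left operand's multiply routine being able to look at the type of the right operand. A minimal sketch, with several assumptions: matrices are fixed at 2x2, the type tag and the name matrix_multiply are hypothetical, and a non-matrix right operand is treated as a plain scalar factor.

```c
#include <assert.h>

typedef enum { PMC_SCALAR, PMC_MATRIX2 } PMCType;

/* Hypothetical PMC: a type tag plus a 2x2 payload.
 * For PMC_SCALAR only m[0][0] is meaningful. */
typedef struct PMC {
    PMCType type;
    double m[2][2];
} PMC;

/* The matrix class's multiply entry: it inspects the right operand
 * and picks matrix math or scaling accordingly. */
static void matrix_multiply(PMC *dest, const PMC *left, const PMC *right)
{
    PMC out;   /* temporary, so dest may alias left or right */
    int i, j;

    assert(left->type == PMC_MATRIX2);
    out.type = PMC_MATRIX2;
    for (i = 0; i < 2; i++) {
        for (j = 0; j < 2; j++) {
            if (right->type == PMC_MATRIX2)
                out.m[i][j] = left->m[i][0] * right->m[0][j]
                            + left->m[i][1] * right->m[1][j];
            else
                out.m[i][j] = left->m[i][j] * right->m[0][0];
        }
    }
    *dest = out;
}
```

If the operands had first been flattened to plain values by a separate fetch step, the type tag on the right operand would no longer be available to make this choice.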
Re: PDD 2, vtables
I'm not either. They feel like they should be operators. But I don't like the thought of going in and out of a lot of generic routines for $a = 3; $a += 2; when the integer scalar ought to know what the inside of another integer scalar looks like, and that 2 + 3 doesn't overflow. That particular case would get caught by the optimizer (I'd hope) so it'd not be an issue anyway. Hmm. += isn't another opcode; it's a special case of a = b + c where the PMCs for a and b are the same thing. And I see no real reason why it can't be part of the + entry. Whether a special case in the code would get a speedup or not's up in the air. (Is the test and branch faster than a generic doing it routine?) I'd want to test that and see before I decided. Are we all clear then, that in perl 6, since the opcodes etc are no longer allowed to rummage around in the internals of a PMC, it's purely a question of whether $a += 3 invokes add($a,$a,3) or eqadd($a,3) and whether $a++ invokes add($a,$a,1) or postinc($a) etc? And that this decision is mainly a 'time it and see' decision?
Re: PDD 2, vtables
Dan, before I follow up your reply to my list of nits about the PDD, can I clarify one thing: destruction. I am assuming that many PMCs will require destruction, eg calling destroy() on a string PMC will cause the memory used by the string data to be freed or whatever. Only very simple PMCs (such as integers) need to do no destruction. Is this the same as your perception of reality :-) ? I also gather that PMCs will have a flag saying whether they need destroying, (eg ints say no, strings say yes), and that calls to destroy() are preceded by a check on this flag for efficiency?
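The flag-guarded destroy described above could be sketched as follows. The flag name PMC_NEEDS_DESTROY, the helper pmc_release and the struct layout are assumptions made for illustration; the only point is that an integer PMC skips the vtable call entirely while a string PMC does not.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct PMC PMC;

typedef struct VTable {
    void (*destroy)(PMC *self);   /* may be NULL for trivial classes */
} VTable;

/* Hypothetical flag: set by classes that own resources. */
enum { PMC_NEEDS_DESTROY = 1 };

struct PMC {
    const VTable *vtable;
    unsigned flags;
    void *data;     /* e.g. heap buffer for a string */
    long ival;      /* e.g. inline integer payload */
};

static int destroy_calls = 0;   /* counts real destroy() invocations */

static void string_destroy(PMC *self)
{
    ++destroy_calls;
    free(self->data);
    self->data = NULL;
}

static const VTable string_vtable = { string_destroy };
static const VTable int_vtable = { NULL };

/* Check the flag first, so simple PMCs never pay for a destroy() call. */
static void pmc_release(PMC *p)
{
    if (p->flags & PMC_NEEDS_DESTROY) {
        assert(p->vtable->destroy != NULL);
        p->vtable->destroy(p);
    }
    p->vtable = NULL;
    p->flags = 0;
}
```

The cost for simple PMCs is then one flag test, and only classes that set the flag need to supply a destroy entry at all.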
Re: PDD 2, vtables
Dan Sugalski wrote: Splitting the vtable into two pieces, with one piece not tied to a PMC, makes some things impossible. Consider this: @foo = @bar * @baz; where all three arrays are really matrix types. By the PDD's notion of `key', what would be the `key' of a matrix type ? (I think that's actually a -language question, but) What $foo[42] (where @foo is matrix) would compile to? In the separate load/store and do vtable scheme it means you get the value of @bar and @baz in scalar context, and multiply the results. Two operations, and the resultant values are sanitized. In the single vtable scheme, we'd execute @bar's multiply routine, which would be clever enough (because we wrote it that way) to see the second parameter's also a matrix, and do matrix math. Actually, not necessarily. It depends on what the compiler does... There could be special entries for array operations, like +/-/*/... . The problem I see with it is what happens when you @a = @b. Actually, if @b is a matrix, @a = @b makes @a a matrix or evaluates @b in list context? What about @a = (@b) ? What if @a is a tied array? This matrix thing is actually getting very confusing to me... I think all these proposed additions to the language should be carefully examined for possible mis-interpretations like these. - Branden
Re: PDD 2, vtables
At 06:08 PM 2/7/2001 -0200, Branden wrote: Dan Sugalski wrote: Splitting the vtable into two pieces, with one piece not tied to a PMC, makes some things impossible. Consider this: @foo = @bar * @baz; where all three arrays are really matrix types. By the PDD's notion of `key', what would be the `key' of a matrix type ? Probably an integer, possibly a list for multidimensional matrices. (And I haven't thought about how to handle that--probably force a series of index lookups) (I think that's actually a -language question, but) What $foo[42] (where @foo is matrix) would compile to? Identically to how $foo[42] would if @foo were a plain array. In the separate load/store and do vtable scheme it means you get the value of @bar and @baz in scalar context, and multiply the results. Two operations, and the resultant values are sanitized. In the single vtable scheme, we'd execute @bar's multiply routine, which would be clever enough (because we wrote it that way) to see the second parameter's also a matrix, and do matrix math. Actually, not necessarily. It depends on what the compiler does... There could be special entries for array operations, like +/-/*/... . The problem I see with it is what happens when you @a = @b. Actually, if @b is a matrix, @a = @b makes @a a matrix or evaluates @b in list context? That's a language issue. I don't know--I can see it going either way. I'd prefer a straight assign and let the assignment vtable entry handle it, but I don't know that we'll have that option. What about @a = (@b) ? Good question. I'd like to see it handled the same way as @a=@b, but I'm not sure that's going to happen. It's Larry's decision. (Mainly because I'd like to see this: @a = (@b, @c); turn into: @a = @b; push @a, @c; but I don't know that we'll be able to) What if @a is a tied array? What if? Larry's call as to whether it makes @a a copy of the data from @b, or creates a new tied thing, or an alias.
Probably @a would be a plain copy of @b with no magic, but that's not my call. This matrix thing is actually getting very confusing to me... I think all these proposed additions to the language should be carefully examined for possible mis-interpretations like these. I'm not proposing they go in (well, OK, I am, but I'm not forcing it). What I am doing is trying to not preclude the possibility if it's decided that it will happen. Dan
RE: PDD 2, vtables [pointers to related documentation]
From: Tim Bunce [mailto:[EMAIL PROTECTED]] On Tue, Feb 06, 2001 at 12:28:23PM -0500, Dan Sugalski wrote: At 11:26 AM 2/6/2001 +, Tim Bunce wrote: On Mon, Feb 05, 2001 at 05:14:44PM -0500, Dan Sugalski wrote: =head2 Core datatypes For ease of use, we define the following semi-abstract data types Probably worth stating upfront that it'll be easy to add new types to avoid people arguing for their favorite type to be added here. I'm not sure it should be--that'd mean extending the vtables in ways they have little room to grow. Adding new perl datatypes is easy, adding new low-level types is harder. That's pretty much what I meant. I think it's worth saying. Adding comments like the ones Tim is suggesting, are just what someone like myself needs. A statement of the obvious and the context in which it fits... for those who haven't a clue and are trying to piece together a conceptual model of system. It saves me the conflict of deciding whether or not to pester you all with questions or not ask, never know and continue to be my own little mushroom. =head1 REFERENCES PDD 3: Perl's Internal Data Types. Some references to any other vtable based languages would be good.(I presume people have looked at some and learnt lessons.) Alas not. This is pretty much head of zeus stuff, modulo some ego. (Mine's not *that* big...) Without studying history we may be doomed to repeat it. So can anyone point to vtable based language implementations? Well, I may be one of the least qualified subscribers on this list, but I'm a pretty good gopher... Some of this relates to languages implementing vtables as opposed to being implemented with them. Everything I've scanned so far seems to raise the flag concerning overhead associated... 
Title: Portable Inheritance and Polymorphism in C
URL: http://www.embedded.com/97/fe29712.htm
Abstract: A lower-level view that assumes only a procedural language like C for embedded developers who want to apply OO without switching to an OO language

Title: Programming Language Pragmatics
URL: http://www.amazon.com/exec/obidos/ASIN/1558604421
Abstract: Mentions virtual methods and tables in Sections 10.4-5. It discusses vtables from a high level in the general context of Eiffel, Simula, C++, and Ada.

Title: SableVM: A Research Framework for the Efficient Execution of Java Bytecode
URL: http://www.j-meg.com/~egagnon/sable-report_2000-3/
Abstract: SableVM is an open-source virtual machine for Java2, intended as a research framework for efficient execution of Java bytecode. The framework is essentially composed of an extensible bytecode interpreter using state-of-the-art and innovative techniques. Written in the C programming language, and assuming minimal system dependencies, the interpreter emphasizes high-level techniques to support efficient execution. In particular, we introduce new data layouts for classes, virtual tables and object instances that reduce the cost of interface method calls to that of normal virtual calls, allow efficient garbage collection and light synchronization, and make effective use of memory space.

Title: C++ Producer Guide
URL: http://www.cse.unsw.edu.au/~patrykz/TenDRA/tcpplus/lib.html#vtable
Abstract: vtable implementation in C++

Title: C++ ABI for IA-64
URL: http://reality.sgi.com/dehnert_engr/cxx/abi.html
Abstract: vtables layout, etc. is discussed in sections 2.5-2.6 and scattered throughout. You can find similar information in C++ ABI documentation for Macintosh, etc.
Re: PDD 2, vtables
[First off: I've not really been paying attention so forgive me if I'm being dumb here. And many thanks for helping to drive this forwards.] On Mon, Feb 05, 2001 at 05:14:44PM -0500, Dan Sugalski wrote: =head2 Core datatypes For ease of use, we define the following semi-abstract data types Probably worth stating upfront that it'll be easy to add new types to avoid people arguing for their favorite type to be added here. =item INT =item NUM =item STR =item BOOL What about references? Arrays and hashes should probably be at least mentioned here. =head3 String data types =item binary buffer 'Binary string' =item UTF-32 string =item Native string =item Foreign string I'm a little surprised not to see UTF-8 there, but since I'm also confused about what Native string and Foreign string are I'll skip it. Except to say that some clarification here may help, and explicitly mentioning UTF-8 (even to say it won't be a core type and provide a reference to why) would be good. The functions are divided into two broad categories, those that perl will use the value of internally (for example the type functions) and those that produce or modify a PMC, such as the add function. So possibly a good idea to explicitly group them that way. =head2 Functions in detail =item type =item name STR name(PMC[, key]); Returns the name of the class the PMC belongs to. So I'd call it type_name (or maybe class_name as you seem to be using the words interchangeably. If type != class then clarify somewhere.). =item move_to BOOL move_to(void *, PMC); Tells the PMC to move its contents to a block of memory starting at the passed address. Used by the garbage collector to compact memory, this call can return a false value if the move can't be done for some reason. The pointer is guaranteed to point to a chunk of memory at least as large as that returned by the C<real_size> vtable function. Shouldn't the PMC be the first arg for consistency?
> =item real_size
>
>   IV real_size(PMC[, key]);
>
> Returns an integer value that represents the real size of the data portion, excluding the vtable, of the PMC.

Contiguous? Sum of parts (allowing for alignment) if it contains multiple chunks of data?

> =item destroy
>
>   void destroy(PMC[, key]);
>
> Destroys the variable the PMC represents, leaving it undef.

Using the word 'variable' here probably isn't a good idea. Maybe "Destroys the contents of the PMC leaving it undef."

> =item is_same
>
>   BOOL is_same(PMC1, PMC2[, key]);
>
> Returns TRUE if C<PMC1> and C<PMC2> refer to the same value, and FALSE otherwise.

I think that needs more clarification, especially where they are of different types. Contrast with is_equal() below.

> =item concatenate
>
>   void concatenate(PMC1, PMC2, PMC3[, key]); ##
>
> Concatenates the strings in C<PMC2> and C<PMC3>, storing the result in C<PMC1>.

and insert (ala sv_insert) etc?

> =item is_equal

Contrast with is_same() above.

> =item logical_or
> =item logical_and
> =item logical_not

Er, why not just use get_bool? The only reason I can think of is to support three-value-logic but that would probably be better handled via a higher-level overloading kind of mechanism. Either way, clarify.

> =item match
>
>   void match(PMC1, PMC2, REGEX[, key]);
>
> Performs a regular expression match on C<PMC2> against the expression C<REGEX>, placing the results in C<PMC1>.

Results, plural = container = array or hash. Needs clarifying.

> =item repeat (x)
>
>   void repeat(PMC1, PMC2, PMC3[, key]); ##
>
> Performs the following sequence of operations: finds the string value from C<PMC2>; finds an integer value I<n> from C<PMC3>; replicates the string I<n> times; stores the resulting string in C<PMC1>.

So call it replicate? Could also work for arrays.

> =item nextkey (x)
>
>   void nextkey(PMC1, PMC2, start_key[, key]);
>
> Looks up the key C<start_key> in C<PMC2> and then stores the key after it in C<PMC1>. If start_key is C<undef>, the first key is returned, and C<PMC1> is set to undef if there is no next key.

Containers again.
And I'd call it key_next()

> =item exists (x)

Likewise, key_exists()

> =head1 TODO
>
> The effects of each function on scalar, array, hash, list, and IO PMCs needs to be hashed out.

Before that I think a section on containers needs to be added.

> =head1 REFERENCES
>
> PDD 3: Perl's Internal Data Types.

Some references to any other vtable based languages would be good. (I presume people have looked at some and learnt lessons.)

Tim.
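The functions under review above (name, real_size, move_to, destroy) can be pictured as slots in a per-class table of function pointers. What follows is only a minimal sketch under stated assumptions: the struct layout, the field names, the `string_vtable` instance and the "PerlString" class name are all hypothetical and do not come from the PDD, and the optional key argument is left out for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: these struct and field names are not from the PDD. */
typedef struct pmc_struct *PMC;

typedef struct vtable {
    const char *(*name)(PMC self);                /* class the PMC belongs to */
    size_t      (*real_size)(PMC self);           /* bytes of data, vtable excluded */
    int         (*move_to)(void *dest, PMC self); /* 0 if the move can't be done */
    void        (*destroy)(PMC self);             /* leaves the PMC undef */
} VTABLE;

struct pmc_struct {
    const VTABLE *vtable;
    void         *data;  /* NULL stands in for undef */
    size_t        len;   /* size of the data block in bytes */
};

static const char *str_name(PMC self) { (void)self; return "PerlString"; }

static size_t str_real_size(PMC self) { return self->data ? self->len : 0; }

/* Copy the contents to dest and repoint the PMC. The caller guarantees that
   dest has room for at least real_size() bytes; memmove tolerates overlap. */
static int str_move_to(void *dest, PMC self) {
    if (dest == NULL || self->data == NULL) return 0;
    memmove(dest, self->data, self->len);
    self->data = dest;
    return 1;
}

/* The old block is assumed to belong to the collector, so nothing is freed. */
static void str_destroy(PMC self) { self->data = NULL; self->len = 0; }

const VTABLE string_vtable = { str_name, str_real_size, str_move_to, str_destroy };
```

A caller would go through the table, for example `p->vtable->move_to(new_block, p)`. The sketch keeps the argument order from the PDD text, with the destination address first.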
Re: PDD 2, vtables
On Tue, Feb 06, 2001 at 11:26:57AM +, Tim Bunce wrote:
>> =item UTF-32 string
>> =item Native string
>> =item Foreign string
>
> I'm a little surprised not to see UTF-8 there, but since I'm also confused about what Native string and Foreign string are I'll skip it.

"Native string encoding" is an abstraction that allows us to say "some encoding that Perl knows how to deal with, but you don't need to care about". The idea, if I understand it correctly, is that strings can be treated as arrays of numbers, and you don't need to know how they're *really* stored. They might be stored as UTF8 internally; I hope so.

"Foreign string" is the same, but with the implication that some external code will be needed to tell Perl how it should convert between an array of numbers and this encoding.

> Using the word 'variable' here probably isn't a good idea. Maybe "Destroys the contents of the PMC leaving it undef."

Agreed.

>> =item logical_or
>> =item logical_and
>> =item logical_not
>
> Er, why not just use get_bool?

Overloading.

>> =item repeat (x)
>>
>>   void repeat(PMC1, PMC2, PMC3[, key]); ##
>>
>> Performs the following sequence of operations: finds the string value from C<PMC2>; finds an integer value I<n> from C<PMC3>; replicates the string I<n> times; stores the resulting string in C<PMC1>.
>
> So call it replicate?

Well, the Perl-space operator is called "repeat", and the Perl 5 operator is OP_REPEAT, so...

> Before that I think a section on containers needs to be added.

This is basically what the whole "key" thing is about. I'm not sure that *that* much more needs to be described (well, something needs to be described, but not in great detail) other than the fact that operating on a container PMC-key pair is equivalent to operating on a scalar PMC. Hence, I can say:

    (on a hash)    get_number(hash, "key")
    (on an array)  get_number(array, elem)
    (on a scalar)  get_number(scalar)

and not worry about what's going on underneath. There ought to be special functions for containers though, yeah.
-- "I find that anthropomorphism really doesn't help me with a place full of bugs." -- Megahal (trained on asr), 1998-11-06
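The "key" idea in the message above, where one get_number call works on a hash, an array or a scalar, might be sketched in C as below. This is an illustration under assumptions, not the PDD's design: the `KEY` and `ENTRY` types, the struct layout and the use of 0.0 as a stand-in for undef are all hypothetical, and a NULL key stands for the "no key" scalar case.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical types: the thread shows only the calling pattern. */
typedef struct key {
    const char *name;   /* used by hashes */
    size_t      index;  /* used by arrays */
} KEY;

typedef struct pmc_struct *PMC;
struct pmc_struct {
    double (*get_number)(PMC self, const KEY *key);
    const void *data;
    size_t      count;  /* elements (array) or entries (hash) */
};

typedef struct { const char *name; double value; } ENTRY;

static double scalar_get_number(PMC self, const KEY *key) {
    (void)key;  /* scalars ignore the key */
    return *(const double *)self->data;
}

static double array_get_number(PMC self, const KEY *key) {
    const double *elems = self->data;
    if (key == NULL || key->index >= self->count) return 0.0;  /* stand-in for undef */
    return elems[key->index];
}

static double hash_get_number(PMC self, const KEY *key) {
    const ENTRY *entries = self->data;
    if (key == NULL || key->name == NULL) return 0.0;
    for (size_t i = 0; i < self->count; i++)
        if (strcmp(entries[i].name, key->name) == 0) return entries[i].value;
    return 0.0;  /* stand-in for undef */
}

/* One entry point for all three cases; pass NULL as the key for a scalar. */
double get_number(PMC p, const KEY *key) { return p->get_number(p, key); }
```

The caller never inspects the container. Each class decides what to do with the key, which is what lets a container-plus-key pair behave like a scalar.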
Re: PDD 2, vtables
At 11:32 AM 2/6/2001 -0200, Branden wrote:
> Simon Cozens wrote:
>>>> =item logical_or
>>>> =item logical_and
>>>> =item logical_not
>>>
>>> Er, why not just use get_bool?
>>
>> Overloading.
>
> Please see my previous post on the subject. As I pointed out there, implementing || and the like that way breaks short-circuits.

No, it doesn't. Just because you pass in two PMCs doesn't mean that they both need to be evaluated. Though the PDD does need to be clearer about how that happens.

Dan

--"it's like this"---
Dan Sugalski          even samurai
[EMAIL PROTECTED]     have teddy bears and even
                      teddy bears get drunk
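One way to read Dan's point is that a vtable-level logical_or receives both operands but only asks the right-hand one for its truth value when the left-hand one is false. The following is a rough sketch of that reading, not the PDD's mechanism: the struct layout, the `evaluations` counter and the function bodies are assumptions made for illustration, and the result is stored as a plain boolean rather than as the winning operand.

```c
#include <stdbool.h>

/* Hypothetical layout: a PMC whose truth value is obtained by calling get_bool. */
typedef struct pmc_struct *PMC;
struct pmc_struct {
    bool (*get_bool)(PMC self);  /* may be expensive or have side effects */
    bool value;
    int  evaluations;            /* counts get_bool calls, for demonstration */
};

static bool counted_get_bool(PMC self) {
    self->evaluations++;
    return self->value;
}

/* dest = left || right, evaluating right only if left is false. */
void logical_or(PMC dest, PMC left, PMC right) {
    if (left->get_bool(left)) {
        dest->value = true;
        return;
    }
    dest->value = right->get_bool(right);
}

/* dest = left && right, evaluating right only if left is true. */
void logical_and(PMC dest, PMC left, PMC right) {
    if (!left->get_bool(left)) {
        dest->value = false;
        return;
    }
    dest->value = right->get_bool(right);
}
```

In this sketch both PMCs are passed in, yet the second one's get_bool is never called when the first already decides the result.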
Re: PDD 2, vtables
At 11:26 AM 2/6/2001 +, Tim Bunce wrote:
> [First off: I've not really been paying attention so forgive me if I'm being dumb here. And many thanks for helping to drive this forwards.]
>
> On Mon, Feb 05, 2001 at 05:14:44PM -0500, Dan Sugalski wrote:
>> =head2 Core datatypes
>>
>> For ease of use, we define the following semi-abstract data types
>
> Probably worth stating upfront that it'll be easy to add new types, to avoid people arguing for their favorite type to be added here.

I'm not sure it should be--that'd mean extending the vtables in ways they have little room to grow. Adding new perl datatypes is easy, adding new low-level types is harder.

>> =item INT
>> =item NUM
>> =item STR
>> =item BOOL
>
> What about references?

Special type of scalar, not dealt with here.

> Arrays and hashes should probably be at least mentioned here.

And lists, yes. Or they need their own PDD with details.

>> =head3 String data types
>>
>> =item binary buffer
>
> 'Binary string'

I avoided that on purpose. Label it a string and people think of its contents as characters, and they're probably not going to be a good chunk of the time. Might not outweigh the consistency issue, though.

>> =item UTF-32 string
>> =item Native string
>> =item Foreign string
>
> I'm a little surprised not to see UTF-8 there, but since I'm also confused about what Native string and Foreign string are I'll skip it. Except to say that some clarification here may help, and explicitly mentioning UTF-8 (even to say it won't be a core type and provide a reference to why) would be good.

I didn't put UTF-8 in on purpose, because I'd just as soon not deal with it internally. Variable length character data's a pain in the butt, and if we can avoid having the internals deal with it except as a source that gets converted to UTF-32, that's fine with me.

The native and foreign string data types were an attempt to accommodate UTF-8, as well as ASCII and EBCDIC character data. One of the three will likely be the native type, and the rest will be foreign strings.
I'm not sure if perl should have only one foreign string type, or if we should have a type tag along with the other bits for strings.

>> The functions are divided into two broad categories, those that perl will use the value of internally (for example the type functions) and those that produce or modify a PMC, such as the add function.
>
> So possibly a good idea to explicitly group them that way.

They were, but I see I lost that.

>> =head2 Functions in detail
>>
>> =item type
>>
>> =item name
>>
>>   STR name(PMC[, key]);
>>
>> Returns the name of the class the PMC belongs to.
>
> So I'd call it type_name (or maybe class_name, as you seem to be using the words interchangeably. If type != class then clarify somewhere.).

The interchange is due to sloppy thinking. I'll redo it so that class == perl data type, while type == (NUM|STR|BOOL|INT).

>> =item move_to
>>
>>   BOOL move_to(void *, PMC);
>>
>> Tells the PMC to move its contents to a block of memory starting at the passed address. Used by the garbage collector to compact memory, this call can return a false value if the move can't be done for some reason. The pointer is guaranteed to point to a chunk of memory at least as large as that returned by the C<real_size> vtable function.
>
> Shouldn't the PMC be the first arg for consistency?

First arg of the PMC is the destination PMC. We don't have one here.

>> =item real_size
>>
>>   IV real_size(PMC[, key]);
>>
>> Returns an integer value that represents the real size of the data portion, excluding the vtable, of the PMC.
>
> Contiguous? Sum of parts (allowing for alignment) if it contains multiple chunks of data?

Size we'd need to allocate if we were going to move the data. Though knowing how much space is currently taken would also be useful, assuming they're not the same. (They probably would be within a few bytes, though.)

>> =item destroy
>>
>>   void destroy(PMC[, key]);
>>
>> Destroys the variable the PMC represents, leaving it undef.
>
> Using the word 'variable' here probably isn't a good idea. Maybe "Destroys the contents of the PMC leaving it undef."

Better.
Thanks.

>> =item is_same
>>
>>   BOOL is_same(PMC1, PMC2[, key]);
>>
>> Returns TRUE if C<PMC1> and C<PMC2> refer to the same value, and FALSE otherwise.
>
> I think that needs more clarification, especially where they are of different types. Contrast with is_equal() below.

If they're different types they can't be the same. This would be used to check if two references have the same referent, or if two magic variables (database handles, say) pointed to the same thing.

>> =item concatenate
>>
>>   void concatenate(PMC1, PMC2, PMC3[, key]); ##
>>
>> Concatenates the strings in C<PMC2> and C<PMC3>, storing the result in C<PMC1>.
>
> and insert (ala sv_insert) etc?

Hadn't considered them. Care to elaborate on the etc?

>> =item is_equal
>
> Contrast with is_same() above.

>> =item logical_or
>> =item
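Dan's description of is_same (different types are never the same; otherwise check whether both PMCs point at the same underlying thing) can be sketched as an identity test. The code below is only an illustration: the class tags, the struct layout and the `is_equal_int` helper are hypothetical, and the reading of is_equal as a value comparison is an assumption, since the thread does not spell it out.

```c
#include <stdbool.h>

/* Hypothetical class tags and layout, for illustration only. */
enum pmc_class { CLASS_REFERENCE, CLASS_DB_HANDLE };

typedef struct pmc_struct *PMC;
struct pmc_struct {
    enum pmc_class klass;
    void          *target;  /* referent, or underlying handle */
};

/* Identity: same class and same underlying thing. */
bool is_same(PMC a, PMC b) {
    if (a->klass != b->klass) return false;  /* different types can't be the same */
    return a->target == b->target;
}

/* Assumption: equality compares values. Both targets must point to ints here. */
bool is_equal_int(PMC a, PMC b) {
    return *(const int *)a->target == *(const int *)b->target;
}
```

Two references to distinct variables that both hold 5 would be equal under this sketch but not the same, which is the contrast Tim asks the PDD to make explicit.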
Re: PDD 2, vtables
Dan Sugalski wrote:
> At 11:26 AM 2/6/2001 +, Tim Bunce wrote:
>> Arrays and hashes should probably be at least mentioned here.
>
> And lists, yes. Or they need their own PDD with details.

What's the difference between array and list? How is a list currently (Perl5) implemented? Which operations does it support?

>>> =item UTF-32 string
>>> =item Native string
>>> =item Foreign string
>>
>> I'm a little surprised not to see UTF-8 there, but since I'm also confused about what Native string and Foreign string are I'll skip it. Except to say that some clarification here may help, and explicitly mentioning UTF-8 (even to say it won't be a core type and provide a reference to why) would be good.
>
> I didn't put UTF-8 in on purpose, because I'd just as soon not deal with it internally. Variable length character data's a pain in the butt, and if we can avoid having the internals deal with it except as a source that gets converted to UTF-32, that's fine with me.

It would bother me a lot, having my strings occupying 4x more memory than they do now... For me, at least, UTF32 can be set aside, while UTF8 is a need nowadays (damn XML!).

>>> =item match
>>>
>>>   void match(PMC1, PMC2, REGEX[, key]);
>>>
>>> Performs a regular expression match on C<PMC2> against the expression C<REGEX>, placing the results in C<PMC1>.
>>
>> Results, plural = container = array or hash. Needs clarifying.
>
> Yep, especially since I'd considered tossing the match destination entirely. (Though that means special variables, and I'm not sure I want to go there) It'll likely just return true or false. I'll rethink it.

Will Perl 6 still be based on a stack, to pass a list of parameters and return a list of results to the subs? Or is there any other approach discussed for it? Is it still undefined? Will it work the same as in Perl 5 or will it take changes? Too soon to talk about it?

- Branden
Re: PDD 2, vtables
At 05:01 PM 2/6/2001 -0200, Branden wrote:
> Dan Sugalski wrote:
>> At 11:26 AM 2/6/2001 +, Tim Bunce wrote:
>>> Arrays and hashes should probably be at least mentioned here.
>>
>> And lists, yes. Or they need their own PDD with details.
>
> What's the difference between array and list? How is a list currently (Perl5) implemented? Which operations does it support?

I'll leave this as an exercise for the reader. (Which is a polite way to say "go find out, grasshopper" :)

>>>> =item UTF-32 string
>>>> =item Native string
>>>> =item Foreign string
>>>
>>> I'm a little surprised not to see UTF-8 there, but since I'm also confused about what Native string and Foreign string are I'll skip it. Except to say that some clarification here may help, and explicitly mentioning UTF-8 (even to say it won't be a core type and provide a reference to why) would be good.
>>
>> I didn't put UTF-8 in on purpose, because I'd just as soon not deal with it internally. Variable length character data's a pain in the butt, and if we can avoid having the internals deal with it except as a source that gets converted to UTF-32, that's fine with me.
>
> It would bother me a lot, having my strings occupying 4x more memory than they do now... For me, at least, UTF32 can be set aside, while UTF8 is a need nowadays (damn XML!).

It's a speed/space tradeoff. Dealing with the middle of a string with variable-length characters either requires an offset array (in which case you've just used more memory and time than a fixed-width representation) or scanning from the beginning of the array, which can be costly if you need to go too far in. (How costly depends on the architecture. Dealing with arrays of 8-bit ints is actually slower on the Alpha than the same size (element count, not bytecount) array of 32-bit ints. YMMV, though)

>>>> =item match
>>>>
>>>>   void match(PMC1, PMC2, REGEX[, key]);
>>>>
>>>> Performs a regular expression match on C<PMC2> against the expression C<REGEX>, placing the results in C<PMC1>.
>>>
>>> Results, plural = container = array or hash. Needs clarifying.
>> Yep, especially since I'd considered tossing the match destination entirely. (Though that means special variables, and I'm not sure I want to go there) It'll likely just return true or false. I'll rethink it.
>
> Will Perl 6 still be based on a stack, to pass a list of parameters and return a list of results to the subs? Or is there any other approach discussed for it? Is it still undefined? Will it work the same as in Perl 5 or will it take changes? Too soon to talk about it?

We'll probably pass lists in and out, but it'll have a stack of sorts--that's pretty much a requirement for Algol-based languages.

Dan

--"it's like this"---
Dan Sugalski          even samurai
[EMAIL PROTECTED]     have teddy bears and even
                      teddy bears get drunk
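The speed/space tradeoff Dan describes for variable-length characters can be made concrete with a small comparison. This is a sketch only: the function names are made up for illustration, the UTF-8 input is assumed to be valid, and no claim is made about how perl would actually store strings.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of the n-th code point (0-based) in valid UTF-8.
   Scans from the start, so the cost grows with n.
   Returns len if the string has n or fewer code points. */
size_t utf8_offset(const unsigned char *s, size_t len, size_t n) {
    size_t i = 0;
    while (i < len && n > 0) {
        i++;                                        /* step over the lead byte */
        while (i < len && (s[i] & 0xC0) == 0x80) {  /* skip continuation bytes */
            i++;
        }
        n--;
    }
    return i;
}

/* n-th code point in UTF-32: a direct index, at four bytes per character.
   The caller must ensure n is within the array. */
uint32_t utf32_at(const uint32_t *s, size_t n) {
    return s[n];
}
```

For the string "a", "é", "€", "z" the UTF-8 form takes 7 bytes and the UTF-32 form takes 16, but reaching the last character takes a scan in the first case and one array access in the second.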
Re: PDD 2, vtables
On Tue, Feb 06, 2001 at 05:01:38PM -0200, Branden wrote:
> How is a list currently (Perl5) implemented?

It's a bunch of SVs sitting on the stack, followed by a mark.

> Which operations does it support?

None.

--
Rule the Empire through force. -- Shogun Tokugawa
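The picture of a list as values on a stack delimited by a mark can be sketched with a value stack and a second stack of saved positions. Everything here is an assumption made for illustration rather than a description of Perl 5's source: ints stand in for SV pointers, the mark is modelled as a stack index recorded before the items are pushed, and the names `Stack`, `push_mark`, `push_value` and `pop_list` are hypothetical.

```c
#include <stddef.h>

#define STACK_MAX 64

/* No bounds checking, to keep the sketch short. */
typedef struct {
    int    values[STACK_MAX];  /* stand-ins for SV pointers */
    size_t sp;                 /* number of values on the stack */
    size_t marks[STACK_MAX];   /* saved stack positions */
    size_t mp;                 /* number of marks */
} Stack;

/* Remember where the next list starts. */
void push_mark(Stack *s) { s->marks[s->mp++] = s->sp; }

void push_value(Stack *s, int v) { s->values[s->sp++] = v; }

/* Pop the most recent mark and everything pushed since it.
   Returns the item count; *first points at the first item, which stays
   readable until something else is pushed. */
size_t pop_list(Stack *s, const int **first) {
    size_t start = s->marks[--s->mp];
    size_t count = s->sp - start;
    *first = &s->values[start];
    s->sp = start;
    return count;
}
```

The list has no structure of its own in this sketch; it is just the run of stack slots between a saved position and the top, which fits Simon's answer that a list supports no operations.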
Re: PDD 2, vtables
Branden wrote:
> Where can I find how Perl5's stack works (especially about parameter passing and returning from subs)?

Oh boy. What a masochist. ;-)

Alan Burlison