Re: PDD 2, vtables
On Tue, Feb 06, 2001 at 12:28:23PM -0500, Dan Sugalski wrote:
> At 11:26 AM 2/6/2001 +, Tim Bunce wrote:
> > [First off: I've not really been paying attention so forgive me if
> > I'm being dumb here. And many thanks for helping to drive this
> > forwards.]
> >
> > On Mon, Feb 05, 2001 at 05:14:44PM -0500, Dan Sugalski wrote:
> > > =head2 Core datatypes
> > >
> > > For ease of use, we define the following semi-abstract data types
> >
> > Probably worth stating upfront that it'll be easy to add new types,
> > to avoid people arguing for their favorite type to be added here.
>
> I'm not sure it should be--that'd mean extending the vtables in ways
> they have little room to grow. Adding new perl datatypes is easy,
> adding new low-level types is harder.

That's pretty much what I meant. I think it's worth saying.

> > > =item INT
> > > =item NUM
> > > =item STR
> > > =item BOOL
> >
> > What about references?
>
> Special type of scalar, not dealt with here.

But should be at least mentioned.

> > > =item UTF-32 string
> > > =item Native string
> > > =item Foreign string
> >
> > I'm a little surprised not to see UTF-8 there, but since I'm also
> > confused about what Native string and Foreign string are I'll skip
> > it. Except to say that some clarification here may help, and
> > explicitly mentioning UTF-8 (even to say it won't be a core type and
> > provide a reference to why) would be good.
>
> I didn't put UTF-8 in on purpose, because I'd just as soon not deal
> with it internally. Variable length character data's a pain in the
> butt, and if we can avoid having the internals deal with it except as
> a source that gets converted to UTF-32, that's fine with me.

I agree with Branden that a default 4x memory bloat would not be popular.

> The native and foreign string data types were an attempt to
> accommodate UTF-8, as well as ASCII and EBCDIC character data. One of
> the three will likely be the native type, and the rest will be foreign
> strings. I'm not sure if perl should have only one foreign string
> type, or if we should have a type tag along with the other bits for
> strings.

Umm, one way or another I suspect UTF-8 will be in there.

> > > =item is_same
> > >
> > >   BOOL is_same(PMC1, PMC2[, key]);
> > >
> > > Returns TRUE if C<PMC1> and C<PMC2> refer to the same value, and
> > > FALSE otherwise.
> >
> > I think that needs more clarification, especially where they are of
> > different types. Contrast with is_equal() below.
>
> If they're different types they can't be the same. This would be used
> to check if two references have the same referent, or if two magic
> variables (database handles, say) pointed to the same thing.

Okay, so say so in the PDD. "refer to the same value" isn't very clear (the word value is probably the problem).

> > > =item concatenate
> > >
> > >   void concatenate(PMC1, PMC2, PMC3[, key]);
> > >
> > > Concatenates the strings in C<PMC2> and C<PMC3>, storing the
> > > result in C<PMC1>.
> >
> > and insert (ala sv_insert) etc?
>
> Hadn't considered them. Care to elaborate on the etc?

Er, I haven't looked at sv.c for ages but basically all the kinds of string manipulations that ended up in there for good reason will probably need to be in perl6. sv_insert is a good example (and possibly the only one :-)

> > > =item logical_or
> > > =item logical_and
> > > =item logical_not
> >
> > Er, why not just use get_bool? The only reason I can think of is to
> > support three-value-logic but that would probably be better handled
> > via a higher-level overloading kind of mechanism. Either way,
> > clarify.
>
> Well, there's overloading. Plus the potential that a class will do
> something odd with it--if you || on two custom arrays in list context
> you might get an array with each pair (left[0] || right[0] and so on)
> logically or'd.

Okay, don't forget xor then :)

> > > =item match
> > >
> > >   void match(PMC1, PMC2, REGEX[, key]);
> > >
> > > Performs a regular expression match on C<PMC2> against the
> > > expression C<REGEX>, placing the results in C<PMC1>.
> >
> > Results, plural = container = array or hash. Needs clarifying.
>
> Yep, especially since I'd considered tossing the match destination
> entirely. (Though that means special variables, and I'm not sure I
> want to go there) It'll likely just return true or false.
>
> I'll rethink it.

A BOOL return would be good. But "placing the results in C<PMC1>" is also good (assuming 'results' are equiv to $1, $2 etc in perl5).

> > > =head1 REFERENCES
> > >
> > > PDD 3: Perl's Internal Data Types.
> >
> > Some references to any other vtable based languages would be good.
> > (I presume people have looked at some and learnt lessons.)
>
> Alas not. This is pretty much head of zeus stuff, modulo some ego.
> (Mine's not *that* big...) :-)

Without studying history we may be doomed to repeat it. So can anyone point to vtable based language implementations?

Tim.
Re: Magic [Slightly Off-Topic... please point me to documentation]
On Tue, 6 Feb 2001 17:53:17 -0200, Branden wrote:
> It appears you're blessing one reference and returning another... like
>
>     sub new {
>         my $key;
>         my $a = \$key;
>         my $b = \$key;
>         bless $a;
>         return $b;
>     }
>
> I think the problem is not with the overloading magic, but with the
> code snippet...

A recent thread on comp.lang.perl.misc discussed how bless() works with the reference, but allegedly it's the underlying thing that gets blessed, not the reference itself.

    my $a = \$x;
    my $b = \$x;
    bless $a, 'FOO';
    print $b;
    -->
    FOO=SCALAR(0x8a652e4)

It sure looks like they're right. Oh, this is perl 5.6.0.

--
Bart.
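The behaviour Bart observes can be modelled in a few lines of C. This is only a sketch under an assumed layout (the `Referent` and `Reference` structs and both functions are hypothetical, not perl's actual SV internals): the package name is stored on the referent, so blessing through one reference is visible through every other reference to the same thing.

```c
#include <stddef.h>

/* Hypothetical model: the package name lives on the referent,
 * not on the reference that was used to bless it. */
typedef struct {
    int         value;
    const char *package;   /* NULL means "not blessed" */
} Referent;

typedef struct {
    Referent *target;      /* a reference is just a pointer to the referent */
} Reference;

/* Blessing goes through a reference but marks the thing it points at. */
void bless(Reference ref, const char *package) {
    ref.target->package = package;
}

/* Ask, through any reference, which package the referent belongs to. */
const char *blessed_into(Reference ref) {
    return ref.target->package;
}
```

With two references `a` and `b` to the same `Referent`, calling `bless(a, "FOO")` makes `blessed_into(b)` report "FOO" as well, which mirrors the `print $b` output above.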
Re: Another approach to vtables
Edwin Steiner wrote:
> Dan Sugalski wrote:
> [snip]
> > That's OK, since my example was wrong. (D'oh! Chalk it up to remnants
> > of the martian death flu, along with too much blood in my caffeine
> > stream) The example
> >
> >   $foo{bar} = $baz + $xyzzy[42];
> >
> > turns into
> >
> >   baz->vtable->add[NATIVE](foo, baz, xyzzy, key);
>
> Why does the bytecode compiler know it should generate NATIVE int
> addition? Are you assuming 'use integer'? (see also next comment)
[snip]

I thought about it once more. Maybe I was confused by the *constant* NATIVE. Are you suggesting a kind of multiple dispatch (first operand selects the vtable, second operand selects the slot in the vtable)?

So

    $dest = $first + $second

becomes

    first->vtable->add[second->vtable->type(second)](dest,first,second,key);

? Or maybe

    first->vtable->add[second->vtable->slot_select](dest,first,second,key);

which saves a call by directly reading an integer from the vtable of second.

(BTW, this is also how overloading with respect to the second argument could be handled (should it be decided on the language level to do that): There could be a slot like add[ARCANE_MAGIC] selected by second->vtable->slot_select which does all kinds of complicated checks and branches without any cost for the vfunctions in the other slots.)

Such a multiple dispatch seems to me like the only solution which avoids the following (eg. in Python): 'first + second' becomes

1. call virtual function 'add' on first
2. inside first->add do lots of checks about type of second

-Edwin
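The two-level dispatch Edwin describes can be sketched in C as follows. All names here (`SLOT_INT`, `SLOT_NUM`, the `PMC` fields, `pmc_add`) are illustrative assumptions and not taken from the PDD; the sketch only shows the shape of "first operand picks the vtable, second operand picks the slot".

```c
/* Slot numbers a type can advertise as a right-hand operand (assumed). */
enum { SLOT_INT = 0, SLOT_NUM = 1, SLOT_COUNT = 2 };

typedef struct PMC PMC;
typedef void (*add_fn)(PMC *dest, const PMC *first, const PMC *second);

typedef struct VTable {
    int    slot_select;        /* which add[] slot this type selects */
    add_fn add[SLOT_COUNT];    /* one add routine per right-operand type */
} VTable;

struct PMC {
    const VTable *vtable;
    long          i;           /* payload when the PMC is an integer */
    double        n;           /* payload when the PMC is a number */
};

void add_int_int(PMC *dest, const PMC *a, const PMC *b);
void add_int_num(PMC *dest, const PMC *a, const PMC *b);
void add_num_int(PMC *dest, const PMC *a, const PMC *b);
void add_num_num(PMC *dest, const PMC *a, const PMC *b);

const VTable int_vtable = { SLOT_INT, { add_int_int, add_int_num } };
const VTable num_vtable = { SLOT_NUM, { add_num_int, add_num_num } };

/* Each routine already knows both operand types, so it needs no checks. */
void add_int_int(PMC *dest, const PMC *a, const PMC *b) {
    dest->vtable = &int_vtable; dest->i = a->i + b->i; dest->n = 0.0;
}
void add_int_num(PMC *dest, const PMC *a, const PMC *b) {
    dest->vtable = &num_vtable; dest->n = (double)a->i + b->n; dest->i = 0;
}
void add_num_int(PMC *dest, const PMC *a, const PMC *b) {
    dest->vtable = &num_vtable; dest->n = a->n + (double)b->i; dest->i = 0;
}
void add_num_num(PMC *dest, const PMC *a, const PMC *b) {
    dest->vtable = &num_vtable; dest->n = a->n + b->n; dest->i = 0;
}

/* first selects the vtable, second selects the slot: one indexed call. */
void pmc_add(PMC *dest, const PMC *first, const PMC *second) {
    first->vtable->add[second->vtable->slot_select](dest, first, second);
}
```

The point of the layout is that `pmc_add` contains no type tests at all; the cost of supporting an unusual right-hand type is one more slot, not one more branch in every add routine.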
Re: vtables: Assignment vs. Aliasing
[CC'ed to language, because I think it's there that it belongs]

On Mon, 5 Feb 2001 15:35:18 -0200, Branden wrote:
> There are two possible things that could happen when you say:
>
>     $a = $b;
>     @a = @b; # or %a = %b;
>
> These two things are assignment and aliasing.

No way. Although I think aliasing is a great tool, assignment is by value. Always. (Well, except for referenced things...)

> In perl5 terms:
>
>     *a = \$b;
>     *a = \@b; # or *a = \%b;
>
> However, typeglobs are said to disappear from Perl6,

I think Larry wants to drop typeglobs themselves, i.e. keeping different kinds of variables of the same name in one record, but not the possibilities they offer. Aliasing is likely the most interesting feature of them all.

...

> My preference:
>
> * Alias when assigning to a reference:
>
>     \$a = \$b;
>     \@a = \@b;
>     \%a = \%b;

I think this is a nice symmetrical syntax.

> * Make aliasing the default for = and provide another way of
>   assigning (NO WAY!!!)

Indeed, no way. Look, if you'd do the latter, you would not only make Perl effectively a different language, but you'd also be missing out on one of the great benefits of aliasing. For example, you pass a reference of a hash to a sub, so the original hash can be accessed and modified. With the latter syntax, you can't even do that through an alias. In the former syntax:

    foo(\%bar);
    sub foo {
        my \%hash = shift;    # alias through reference
        print $hash{FOO};
    }

You can now access the passed hash as a hash, and not through the slightly awkward syntax of accessing it through a reference:

    sub foo2 {
        my $hash = shift;
        print $hash->{FOO};
    }

(You don't think it's that awkward? Try getting a hash slice through a hash reference. Ugh.)

--
Bart.
Re: Rare Salt-Water Camel May Be Separate Species
On Wed, 7 Feb 2001 09:17:30 -0500, Joshua N Pritikin [EMAIL PROTECTED] wrote:
> http://www.nytimes.com/2001/02/07/science/07reuters-camel.html

Which is of no use if you don't have a subscriber ID (and do not want to have one) to the NYT, since it is quite useless in europe ...

--
H.Merijn Brand Amsterdam Perl Mongers (http://www.amsterdam.pm.org/)
using perl-5.005.03, 5.6.0, 5.6.1, 5.7.1 623 on HP-UX 10.20 11.00, AIX 4.2 AIX 4.3, WinNT 4.0 SP-6a, and Win2000pro often with Tk800.022 /| DBD-Unify
ftp://ftp.funet.fi/pub/languages/perl/CPAN/authors/id/H/HM/HMBRAND/
Re: Rare Salt-Water Camel May Be Separate Species
On Wed, Feb 07, 2001 at 03:33:39PM +0100, H . Merijn Brand wrote: On Wed, 7 Feb 2001 09:17:30 -0500, Joshua N Pritikin [EMAIL PROTECTED] wrote: http://www.nytimes.com/2001/02/07/science/07reuters-camel.html Which is of no use if you don't have a subscriber ID (and do not want to have one) to th NYT, since it is quite useless in europe ... http://news.bbc.co.uk/hi/english/sci/tech/newsid_1156000/1156212.stm -- I am familiar with this particular stupid user; it lives inside one's head and takes control at unexpected moments. - Roger Burton West
Re: PDD 2, vtables
Some comments about the vtable PDD...

First a general comment. I think we really need to make it clear, for each method, which arg represents the object that is having its method called (ie which is $self/this so to speak). One way to make this clear would be to insist that the first arg is always $self, but failing that, it should be explicitly mentioned for each function.

Also, can I suggest a bit of terminology, which I will use below? I define an *empty* PMC as a PMC which exists, but does not have a valid vtable ptr or content (and so whose methods must not be called under any circumstances). I specifically contrast this with an undefined PMC, which has a valid vtable pointer that points to a bunch of methods that mostly call carp("use of undefined value ...").

Now onwards and upwards

> The C<key> parameter is optional, and if passed it refers to an array
> of key structure pointers.

A mere detail, but would it not be more efficient to just pass them as extra args, ie add(PMC1, PMC2, PMC3, key1, key2, key3), rather than having to potentially create and populate a tmp struct just to call the function???

>   IV type(PMC[, subtype]);
>
> Returns the type of the PMC. If the subtype is passed (int, string,
> num) it returns the subtype of the PMC. This is generally a class
> function rather than a variable one, but the PMC is passed in just in
> case. (And so we can have the subtype be a vararg parameter)

I don't understand the subtype bit. If for example we pass an optional 2nd arg of INT (assuming this is some sort of enum?), what does type() return?

> =item new
>
>   void new(PMC[, key]);
>
> Creates a new variable of the appropriate type out of the passed PMC,
> destroying the current contents if there are any. This is a class
> function.

As an aside, am I right in assuming that there will be a function somewhere (outside the scope of this PDD) that creates new, empty PMCs, which can then be acted upon by new() to turn them into PMCs of a particular type?

Will PMCs that have just been new()ed have a default null value, eg 0/"" ? Note that they won't be undefined - at least, I'm assuming that undefined is handled by a separate class. Perhaps we need instead a range of new()s that initialise to various string, numeric etc values?

The above definition of new() implies that it first calls destroy() to release any previous contents. I think it would be better to define new() as operating on an empty PMC (so it is the caller's responsibility to call destroy() first, if necessary).

Actually, I suspect that the whole area of new/clone/destroy etc will need to be examined carefully in the light of a 'typical' variable lifecycle, to avoid unnecessary transitions. For example, my $a = 'abc' might involve $a going from empty -> undef -> empty -> "" -> "abc" or similar, if we're not careful.

>   void clone(PMC1, PMC2 [, int flags[, key]]);
>
> Copies C<PMC2> into C<PMC1>. The C<flags> parameter notes whether a
> deep copy should be done. (Possibly other things as well, if someone
> thinks of something reasonable)

One flag that would be very useful is 'destroy', which tells clone() to destroy PMC2 immediately after the clone operation. This is because a clone will often be immediately followed by a destroy of the copied PMC, and delegating the destroy() to clone() allows clone() the chance to do things more efficiently (eg even when asked to do a deep copy, it just copies the vtable pointer and payload pointer(s), then scrubs the old PMC).

I also think that clone should expect PMC1 to be empty - ie it assumes the caller has already called destroy() if necessary.

I guess we also need an assign() method, to handle $a = $b and the like. Note that assign and clone are very different operations (although assign may well call clone): assign() copies *to* itself, while clone() copies *from* itself.

Note that if $a is a 'simple' variable, $a->assign($b) will itself just fall through to $b->clone($a) and let $a be wiped; while if $a is magic or tied or whatever, then $a->assign($b) will take a more active role in setting its own value, based on the value of $b.

>   void morph(PMC, type[, key]);
>
> Tells the PMC to change itself into a PMC of the specified type.

I don't really see what the difference is between this and new().

>   void destroy(PMC[, key]);
>
> Destroys the variable the PMC represents, leaving it undef.

(See also my comments earlier about new/clone/destroy etc). I think destroy should leave an empty PMC rather than an undef one, since as I said earlier, I think undef is a class in its own right.

> =item exists (x)
>
>   BOOL exists(PMC1[, key]);

Presumably we also need defined(). (where most classes will always return true, while the 'undefined' class always returns false.)
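A small C sketch may make the empty/undef distinction and the proposed 'destroy' flag for clone() more concrete. Everything here is an assumption for illustration (a two-field `PMC`, a `CLONE_DESTROY` flag, string payloads only); the PDD defines none of these names.

```c
#include <stdlib.h>
#include <string.h>

typedef struct VTable { const char *name; } VTable;
typedef struct PMC { const VTable *vtable; char *data; } PMC;

const VTable undef_vtable  = { "undef" };
const VTable string_vtable = { "string" };

enum { CLONE_DESTROY = 1 };   /* hypothetical flag: scrub src after cloning */

/* Empty: no vtable at all, so no method may be called on it. */
int pmc_is_empty(const PMC *p) { return p->vtable == NULL; }

/* Undef: a real class with its own vtable, distinct from empty. */
void pmc_set_undef(PMC *p) { p->vtable = &undef_vtable; p->data = NULL; }

/* destroy() leaves the PMC empty rather than undef. */
void pmc_destroy(PMC *p) { free(p->data); p->data = NULL; p->vtable = NULL; }

/* new() for strings; expects an empty PMC, as suggested above. */
int pmc_new_string(PMC *p, const char *s) {
    size_t len = strlen(s) + 1;
    char *copy;
    if (!pmc_is_empty(p)) return -1;
    copy = (char *)malloc(len);
    if (copy == NULL) return -1;
    memcpy(copy, s, len);
    p->vtable = &string_vtable;
    p->data = copy;
    return 0;
}

/* clone(): dest must be empty. With CLONE_DESTROY the payload pointer is
 * simply moved and src is scrubbed, so no allocation or copy happens. */
int pmc_clone(PMC *dest, PMC *src, int flags) {
    if (!pmc_is_empty(dest)) return -1;
    if (flags & CLONE_DESTROY) {
        *dest = *src;
        src->vtable = NULL;
        src->data = NULL;
        return 0;
    }
    dest->data = NULL;
    if (src->data != NULL) {
        size_t len = strlen(src->data) + 1;
        dest->data = (char *)malloc(len);
        if (dest->data == NULL) return -1;
        memcpy(dest->data, src->data, len);
    }
    dest->vtable = src->vtable;
    return 0;
}
```

Under these assumptions, a clone with `CLONE_DESTROY` hands over the original payload pointer and leaves the source empty, while a plain clone allocates a fresh copy and leaves the source untouched.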
Re: Rare Salt-Water Camel May Be Separate Species
On Wed, 7 Feb 2001 15:05:55 +, Simon Cozens [EMAIL PROTECTED] wrote: On Wed, Feb 07, 2001 at 03:33:39PM +0100, H . Merijn Brand wrote: On Wed, 7 Feb 2001 09:17:30 -0500, Joshua N Pritikin [EMAIL PROTECTED] wrote: http://www.nytimes.com/2001/02/07/science/07reuters-camel.html Which is of no use if you don't have a subscriber ID (and do not want to have one) to th NYT, since it is quite useless in europe ... http://news.bbc.co.uk/hi/english/sci/tech/newsid_1156000/1156212.stm :-)) Thanks -- H.Merijn Brand Amsterdam Perl Mongers (http://www.amsterdam.pm.org/) using perl-5.005.03, 5.6.0, 5.6.1, 5.7.1 623 on HP-UX 10.20 11.00, AIX 4.2 AIX 4.3, WinNT 4.0 SP-6a, and Win2000pro often with Tk800.022 /| DBD-Unify ftp://ftp.funet.fi/pub/languages/perl/CPAN/authors/id/H/HM/HMBRAND/
Re: Another approach to vtables
Dan Sugalski wrote:
> At 05:41 PM 2/6/2001 -0200, Branden wrote:
> > I actually don't see a reason why the vtable entries should be the
> > opcodes. Is there?
>
> Speed.
>
> > Actually, I don't see the problem of defining a C function that
> > would do:
> >
> >   void add(SVAR *result, SVAR *lop, SVAR *rop, SVAL *tmp1, SVAL *tmp2) {
> >       /* tmp comes from the temporary file */
> >       lop->vtable->FETCH(tmp1, lop);
> >       rop->vtable->FETCH(tmp2, rop);
> >       lop->vtable->ADD(tmp1, tmp1, tmp2);
> >       result->vtable->STORE(result, tmp1, tmp2);
> >   }
> >
> > And have it be my opcode. Passing the indexes wouldn't be a problem
> > either. Is there any problem here?
>
> Well, no, but what's the point? If you're calling lop's ADD vtable
> entry, why not have that entry deal with the rest of it, rather than
> stick things into, and later remove them, from temp slots?

Dan, I see you talk about vtables and opcodes as related things. I really don't see why you think that's necessary, and I'd like to hear your reasons.

As far as I know (and I could be _very_ wrong), the primary objectives of vtables are:

1. Allowing extensible datatypes to be created by extensions and used in Perl.
2. Making the implementation of `tie' and `overload' more efficient ('cause it's very slow in Perl 5).
3. Replacing Perl5's SV*,AV*,HV*,... (I don't know if it should replace or complement -- ?)

And some secondary objectives are:

4. Allow the use of different string encodings internally.
5. Allow int's and float's to become bigint's and bigfloat's when an overflow occurs.

Is this right or am I missing something?

The point I want to make is that vtables are directly related to what tie and overload are today (or sv_magic, if you want the underlying thing [I don't know what's underlying in overload's case, as it's apparently not documented]). So I don't really see why opcodes are in the discussion. For me, at least, tie and overload are related to data, and opcodes to execution. They are orthogonal things, or at least should be (sure, that doesn't mean they are unrelated and must be implemented separately, but they surely can be).

Of course I understand some opcodes are related to data manipulation, and should of course be modelled after vtables, but they surely can be separated.

Before, I proposed the code above for `add', taking 3 SVAR's and doing the same as what you proposed (considering no arrays/hashes). I thought about it, and I saw that it could be extended to handle 3 PMC's, instead of SVAR's. If they are SVAL's, they are passed directly to the vtable, otherwise they are fetched/stored using keys that are passed by parameters. The add method would be able to determine it by the TYPE vtable entry (finally found a use for it...). That would actually result in the exact same opcodes that your approach would.

[[ And I actually see an advantage here. As I expect SVAL's to be used by temporaries, in a longer expression that involves multiple operations, all results would be stored in SVAL's, that don't have store/fetch operation (since it's trivial) and so it's cheaper to use them. ]] -- I actually am wrong here. I thought having to call store/fetch would be an issue, since it is for tied things in Perl 5, but here it only costs what the method has to do, which gives me the hint that in Perl 6, tie will be the fastest thing ever.

Some more about opcode dispatching. What it has to do is:

1. Fetch an instruction (that would be a byte indicating which operation)
2. Find the function that handles this instruction (that would be lookup in a table)
3. Call the function (here I refer to the stack handling, of passing and returning parameters and return addresses...)
4. [ Here goes the instruction computing, made by the fetched function ]
5. Some cleanup (I think nothing really expensive)
6. Loop. Go back to 1 and fetch the next instruction.

Ok. Am I right here or am I missing something really dumb? If it's this way, what is so expensive here that makes 4 instructions so much slower than 1, if the 4 together do the same thing as the 1? Compared to what an `add' function would typically do, testing the argument types, and possibly converting things to bigints, or even a `concat' function having to allocate memory and copy possibly big blocks of bytes, I don't see what's the problem with some CPU cycles to push the parameters to a stack...

Of course, there is probably something very dumb I'm missing here. Please point it out to me.

- Branden
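The six dispatch steps listed above can be written as a very small C loop. This is a toy sketch, not the perl runtime: the `VM` struct, the three opcodes, and the one-byte operand encoding are all invented for illustration, and there are no bounds checks on the stack or the code array.

```c
#include <stddef.h>

typedef struct VM {
    long   stack[16];
    int    sp;        /* number of values on the stack */
    size_t pc;        /* index of the next byte to fetch */
    int    running;
} VM;

typedef void (*op_fn)(VM *vm, const unsigned char *code);

enum { OP_HALT, OP_PUSH, OP_ADD, OP_COUNT };

void op_halt(VM *vm, const unsigned char *code) {
    (void)code;
    vm->running = 0;
}

/* PUSH takes a one-byte immediate operand that follows the opcode. */
void op_push(VM *vm, const unsigned char *code) {
    vm->stack[vm->sp++] = code[vm->pc++];
}

/* ADD pops two values and pushes their sum. */
void op_add(VM *vm, const unsigned char *code) {
    long rhs;
    (void)code;
    rhs = vm->stack[--vm->sp];
    vm->stack[vm->sp - 1] += rhs;
}

const op_fn op_table[OP_COUNT] = { op_halt, op_push, op_add };

long run(const unsigned char *code) {
    VM vm = { {0}, 0, 0, 1 };
    while (vm.running) {
        unsigned char op = code[vm.pc++];   /* 1. fetch the instruction   */
        op_fn handler = op_table[op];       /* 2. look up its function    */
        handler(&vm, code);                 /* 3-5. call, compute, return */
    }                                       /* 6. loop                    */
    return vm.sp > 0 ? vm.stack[vm.sp - 1] : 0;
}
```

Per instruction the loop itself costs one fetch, one table lookup and one indirect call, which is the overhead being weighed against the work done inside each handler.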
Re: PDD 2, vtables
> > Please see my previous post on the subject. As I pointed out there,
> > implementing || and && like that breaks short-circuits.
>
> No, it doesn't. Just because you pass in two PMCs doesn't mean that
> they both need to be evaluated. Though the PDD does need to be clearer
> about how that happens.

Hmmm, I can't quite see how that trick works. How would the following get evaluated:

    $opened || open(F, ...)
Re: PDD 2, vtables
At 04:02 PM 2/7/2001 +, David Mitchell wrote:
> > > Please see my previous post on the subject. As I pointed out
> > > there, implementing || and && like that breaks short-circuits.
> >
> > No, it doesn't. Just because you pass in two PMCs doesn't mean that
> > they both need to be evaluated. Though the PDD does need to be
> > clearer about how that happens.
>
> Hmmm, I can't quite see how that trick works. How would the following
> get evaluated:
>
>     $opened || open(F, ...)

The second PMC would point to a lazy list, so it wouldn't be evaluated unless its value gets fetched.

Dan

--"it's like this"---
Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
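One way to picture Dan's answer is a right-hand operand that carries a deferred computation and runs it only when its value is fetched. The C sketch below assumes a hypothetical `Lazy` struct and a stand-in `fake_open` function; it illustrates the idea only and says nothing about how the real lazy list would be implemented.

```c
/* A deferred value: compute() runs at most once, on first fetch. */
typedef struct Lazy {
    int  (*compute)(void *ctx);
    void  *ctx;
    int    evaluated;
    int    value;
} Lazy;

int lazy_get_bool(Lazy *l) {
    if (!l->evaluated) {
        l->value = l->compute(l->ctx);
        l->evaluated = 1;
    }
    return l->value != 0;
}

/* Both operands are passed in, but the right one is only fetched
 * (and therefore only evaluated) when the left one is false. */
int logical_or(int left, Lazy *right) {
    return left ? 1 : lazy_get_bool(right);
}

/* Stand-in for open(F, ...): counts how many times it really ran. */
int fake_open(void *ctx) {
    int *calls = (int *)ctx;
    (*calls)++;
    return 1;
}
```

With a true left operand, `logical_or` returns without touching `right`, so `fake_open` never runs; with a false left operand it runs exactly once.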
Re: Another approach to vtables
At 11:13 AM 2/7/2001 +0100, Edwin Steiner wrote:
> Dan Sugalski wrote:
> [snip]
> > That's OK, since my example was wrong. (D'oh! Chalk it up to remnants
> > of the martian death flu, along with too much blood in my caffeine
> > stream) The example
> >
> >   $foo{bar} = $baz + $xyzzy[42];
> >
> > turns into
> >
> >   baz->vtable->add[NATIVE](foo, baz, xyzzy, key);
>
> Why does the bytecode compiler know it should generate NATIVE int
> addition? Are you assuming 'use integer'? (see also next comment)

It's in there for clarity. It's likely either been cached somewhere, or comes from a call to the type vtable entry for the second parameter.

> > with the add routine doing
> >
> >   IV lside = baz->vtable->get_int[NATIVE](baz, key+1);
> >   IV rside = xyzzy->vtable->get_int[NATIVE](xyzzy, key+2);
>
> This code assumes $xyzzy[42] will fit in an IV. It could be bigint,
> bigfloat, ... and already create an overflow when *converted* to IV,
> not only when added to lside. I know this is so because it's the
> add[NATIVE] implementation, right(?).

Right.

> But why does the bytecode compiler (or any other code from parsing to
> execution) know that NATIVE will be appropriate? Where does this
> assumption about both $baz and $xyzzy[42] come from?

Well, we know about baz because it's baz's vtable being used. We know about $xyzzy[42] because we asked it and left that out for clarity. (Or because we wanted to treat it explicitly as a native integer type)

> >   IV sum;
> >   bigstr *bigsum;
> >   CHECK_OVERFLOW(bigstr, (sum = lside + rside));
> >   if (OVERFLOW) {
> >       foo->vtable->set_integer[BIGINT](foo, bigsum, key[0]);
> >   } else {
> >       foo->vtable->set_integer[NATIVE](foo, sum, key[0]);
> >   }
> >
> > and foo's set_integer storing the passed data in the database as
> > appropriate.
> >
> > > And if we replace that line for the correspondent set_string
> > > operation, I don't see the need to have add in a vtable, because I
> > > cannot understand how it could be implemented differently for
> > > another variable aside from foo.
> >
> > Now, as to this... What happens if you have an overloaded array on
> > the left side? Or a complex number? Or a bigint?
> >
> > The point of having add in the vtable (along with a lot of the other
> > stuff that's in there) is so we can have a lot of special-purpose
> > code rather than a hunk of general-purpose code. The idea is to
> > reduce the number of tests and branches and shrink the size of the
> > code we actually execute so we don't blow processor cache or have to
> > clear out execution pipelines.
>
> Having the `key' data structure looks like special-p. -> general-p. to
> me. Is it your idea that the key pointers will simply get passed along
> most of the time and in the end there will be few branches because eg.
> an array value knows it has to do indexing and a scalar value will
> ignore the key pointer?

Yep. (unless we put some meaning to a key for a scalar) Container variables can have their vtable entries called with or without keys--passing a key of 42 to get_integer for an array gets you the integer value of one of the array's entries, while calling get_integer with no key gets you the integer value of the entire array. (Which is probably how scalar(@array) is going to be implemented)

> Are there estimates about the relative frequencies of $a vs. $a[] or
> $a{} in perl code?

I know chip has some, but I've been unable to get them from him. One of the points he made for his Topaz talk is that the majority of the memory a perl program of any size takes up is tied to its hash and array usage.

> How will non-constant keys be handled? Will there be key data
> structures created on the C-stack or in reused buffers? Or will it be
> like this:
>
> 1. one or more ops calculate the key and fetch the PMC from the
>    container,
> 2. the PMC is passed to other functions with a NULL key entry.

Good question. I don't know yet.

Dan

--"it's like this"---
Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
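Two ideas from this exchange are easy to show in plain C: checking for overflow before a native add so the result can be promoted, and a keyed versus unkeyed get_integer on a container. The sketch assumes `long` stands in for IV and only reports that a bigint would be needed rather than building one; `IntResult`, `IntArray` and both function names are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

typedef enum { INT_NATIVE, INT_BIG } IntKind;

typedef struct {
    IntKind kind;
    long    native;   /* only meaningful when kind == INT_NATIVE */
} IntResult;

/* Test before adding: signed overflow is undefined behaviour in C,
 * so the check must not rely on the wrapped result. */
int add_would_overflow(long a, long b) {
    return (b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b);
}

/* The add[NATIVE] case: stay native if the sum fits, otherwise report
 * that the destination should be set through its BIGINT entry. */
IntResult add_native(long lside, long rside) {
    IntResult r;
    if (add_would_overflow(lside, rside)) {
        r.kind = INT_BIG;
        r.native = 0;
    } else {
        r.kind = INT_NATIVE;
        r.native = lside + rside;
    }
    return r;
}

typedef struct { const long *items; long count; } IntArray;

/* With a key: the integer value of one entry. Without a key (NULL):
 * the integer value of the whole array, taken here to be its length. */
long array_get_integer(const IntArray *a, const long *key) {
    if (key == NULL) return a->count;
    return a->items[*key];
}
```

The unkeyed branch corresponds to Dan's remark about scalar(@array); treating that value as the element count is an assumption carried over from perl5 behaviour.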
Re: PDD 2, vtables
On Wed, Feb 07, 2001 at 04:03:49PM +, David Mitchell wrote:
> BTW, should the vtable include all the mutator operators too, ie ++,
> += and so on, on the grounds that an implementation may be able do
> this more efficiently internally?

++ and -- are already slightly messy in perl5. pp_preinc, pp_postinc, pp_predec and pp_postdec live in with all the ops. They know how to increment and decrement integers that don't overflow, and call routines in sv.c to increment and decrement anything else.

Actually, this nearly provides a divide between values and operators that has been suggested, with the speed up hack for the common case.

Nicholas Clark
Re: PDD 2, vtables
Nicholas Clark wrote:
> ++ and -- are already slightly messy in perl5. pp_preinc, pp_postinc,
> pp_predec and pp_postdec live in with all the ops. They know how to
> increment and decrement integers that don't overflow, and call
> routines in sv.c to increment and decrement anything else.
>
> Actually, this nearly provides a divide between values and operators
> that has been suggested, with the speed up hack for the common case.
>
> Nicholas Clark

I guess everything (including get/set, add/sub/mul/...) could have a speed up hack for the common case. The vtables (or the PMC's flags as well) could have a flag that indicates ``No tying and no overloading here, nothing special, just another plain old variable''. Every operation would check the `special' flag of the values it operates on, and do the right thing on them. Otherwise, call the vtable for the generic way of doing it.

I _think_ this would be a great speed up if the program doesn't use much magic, but perhaps the overhead would be too big and make tying slower than in Perl 5... something to consider though.

- Branden
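Branden's "special flag" idea amounts to one test in front of the vtable call. The following C sketch uses invented names (`PMC_SPECIAL`, `op_add`, `traced_add`) and a bare integer payload; overflow handling is left out to keep the fast path visible.

```c
#include <stddef.h>

typedef struct PMC PMC;

typedef struct VTable {
    void (*add)(PMC *dest, const PMC *a, const PMC *b);
} VTable;

struct PMC {
    unsigned      flags;
    long          ival;
    const VTable *vtable;   /* only consulted when PMC_SPECIAL is set */
};

enum { PMC_SPECIAL = 1u };  /* set for tied or overloaded values */

int slow_path_calls = 0;    /* instrumentation for the example only */

/* Stand-in for a tied/overloaded add: same result, but via the vtable. */
void traced_add(PMC *dest, const PMC *a, const PMC *b) {
    slow_path_calls++;
    dest->ival = a->ival + b->ival;
}

const VTable traced_vtable = { traced_add };

/* One flag test covers all three operands; plain values never reach
 * the vtable. Overflow is ignored in this sketch. */
void op_add(PMC *dest, const PMC *a, const PMC *b) {
    if (((dest->flags | a->flags | b->flags) & PMC_SPECIAL) == 0) {
        dest->ival = a->ival + b->ival;
        return;
    }
    a->vtable->add(dest, a, b);
}
```

The trade-off Branden raises is visible here: plain variables save an indirect call, while special ones pay for the extra test on top of the vtable call they needed anyway.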
Re: PDD 2, vtables
Nicholas Clark [EMAIL PROTECTED] mused:
> On Wed, Feb 07, 2001 at 04:03:49PM +, David Mitchell wrote:
> > BTW, should the vtable include all the mutator operators too, ie ++,
> > += and so on, on the grounds that an implementation may be able do
> > this more efficiently internally?
>
> ++ and -- are already slightly messy in perl5. pp_preinc, pp_postinc,
> pp_predec and pp_postdec live in with all the ops. They know how to
> increment and decrement integers that don't overflow, and call
> routines in sv.c to increment and decrement anything else.
>
> Actually, this nearly provides a divide between values and operators
> that has been suggested, with the speed up hack for the common case.

I'm not sure I follow you. What is the "this" in "this nearly provides a divide"?

Confused of Sheffield.
Re: PDD 2, vtables
On Wed, Feb 07, 2001 at 05:19:16PM +, David Mitchell wrote:
> Nicholas Clark [EMAIL PROTECTED] mused:
> > On Wed, Feb 07, 2001 at 04:03:49PM +, David Mitchell wrote:
> > > BTW, should the vtable include all the mutator operators too, ie
> > > ++, += and so on, on the grounds that an implementation may be
> > > able do this more efficiently internally?
> >
> > ++ and -- are already slightly messy in perl5. pp_preinc,
> > pp_postinc, pp_predec and pp_postdec live in with all the ops. They
> > know how to increment and decrement integers that don't overflow,
> > and call routines in sv.c to increment and decrement anything else.
> >
> > Actually, this nearly provides a divide between values and operators
> > that has been suggested, with the speed up hack for the common case.
>
> I'm not sure I follow you. What is the "this" in "this nearly provides
> a divide"?

this example. I think the "nearly" probably should go. Maybe I should have written "++ and -- in perl5 provide an example of a (nearly clean) divide between operator and value".

> Confused of Sheffield.

Hmm. Yes. I'm confused too.

Confused of Newcastle
Re: PDD 2, vtables
At 03:09 PM 2/7/2001 +, David Mitchell wrote:
> Some comments about the vtable PDD...
>
> First a general comment. I think we really need to make it clear, for
> each method, which arg represents the object that is having its method
> called (ie which is $self/this so to speak). One way to make this
> clear would be to insist that the first arg is always $self, but
> failing that, it should be explicitly mentioned for each function.

The docs are unclear there. I'll patch 'em up. FWIW, generally the first argument is the destination.

> Also, can I suggest a bit of terminology, which I will use below? I
> define an *empty* PMC as a PMC which exists, but does not have a valid
> vtable ptr or content (and so whose methods must not be called under
> any circumstances). I specifically contrast this with an undefined
> PMC, which has a valid vtable pointer that points to a bunch of
> methods that mostly call carp("use of undefined value ...").

This is an area that needs addressing, that's for sure. I'll deal with that as well.

> > The C<key> parameter is optional, and if passed it refers to an
> > array of key structure pointers.
>
> A mere detail, but would it not be more efficient to just pass them as
> extra args, ie add(PMC1, PMC2, PMC3, key1, key2, key3), rather than
> having to potentially create and populate a tmp struct just to call
> the function???

Well, extra arguments cost. I'd originally had only a single key optionally passed in, but that meant potentially two of the three PMCs had to be real, rather than keyed into containers. (And thus possibly virtual) Hence the key array.

I admit it's not a swell decision--I'm not sure whether the single optional parameter is better, or several optional (or required) parameters are better. I can see it going either way, and I didn't put all that much thought into it.

> >   IV type(PMC[, subtype]);
> >
> > Returns the type of the PMC. If the subtype is passed (int, string,
> > num) it returns the subtype of the PMC. This is generally a class
> > function rather than a variable one, but the PMC is passed in just
> > in case. (And so we can have the subtype be a vararg parameter)
>
> I don't understand the subtype bit. If for example we pass an optional
> 2nd arg of INT (assuming this is some sort of enum?), what does type()
> return?

If you pass in INT, you'll get back NATIVE or BIGINT, depending on which one the PMC would rather be. STR would get BINARY, NATIVE, FOREIGN, or UTF_32. Basically, subtype says "If I asked you what kind of string you were, what would you tell me?"

> > =item new
> >
> >   void new(PMC[, key]);
> >
> > Creates a new variable of the appropriate type out of the passed
> > PMC, destroying the current contents if there are any. This is a
> > class function.
>
> As an aside, am I right in assuming that there will be a function
> somewhere (outside the scope of this PDD) that creates new, empty
> PMCs, which can then be acted upon by new() to turn them into PMCs of
> a particular type?

Probably, yes. More likely, PMCs will be declared nukable unless a "clean me up" flag is set, in which case we'd just stomp on what was in there. (Since we generally don't care about the contents of a PMC when trashing it, with some relatively rare exceptions)

> Will PMCs that have just been new()ed have a default null value, eg
> 0/"" ? Note that they won't be undefined - at least, I'm assuming that
> undefined is handled by a separate class. Perhaps we need instead a
> range of new()s that initialise to various string, numeric etc values?

I'm figuring that'll be handled by the bits that deal with constants, but there's no reason that a plain new PMC can't be undef. (Basically set the private data pointer to NULL and the vtable pointer to the undef class vtable)

> The above definition of new() implies that it first calls destroy() to
> release any previous contents. I think it would be better to define
> new() as operating on an empty PMC (so it is the caller's
> responsibility to call destroy() first, if necessary).

destroy will only be called if the PMC needs it. Most PMCs will just get their contents trashed and let the GC clean up after.

> Actually, I suspect that the whole area of new/clone/destroy etc will
> need to be examined carefully in the light of a 'typical' variable
> lifecycle, to avoid unnecessary transitions. For example,
> my $a = 'abc' might involve $a going from
> empty -> undef -> empty -> "" -> "abc" or similar, if we're not
> careful.

That's an area for the optimizer. I'd like it to go from empty->"abc", assuming we don't skip the empty step.

> >   void clone(PMC1, PMC2 [, int flags[, key]]);
> >
> > Copies C<PMC2> into C<PMC1>. The C<flags> parameter notes whether a
> > deep copy should be done. (Possibly other things as well, if someone
> > thinks of something reasonable)
>
> One flag that would be very useful is 'destroy', which tells clone()
> to destroy PMC2 immediately after the clone operation.

That makes no sense to me. If we're cloning PMC2 then trashing it, why not just set
Re: Another approach to vtables
Dan, I think there is a real problem with your vtable approach. It involves tying, overloading and assignment. I'm not sure if I really got what you meant with the PDD, but I'm assuming: 1. PMC's replace SV*. 2. Tying is handled by vtables that implement set_* and get_* entries to do the magic stuff. 3. Overloading is handled by vtables that implement add/subtract/mul/... entries to do the magic stuff. 4. There's only one vtable for each class of variables, i.e. all variables tied to class X share the same vtable, and all objects that have overloading defined by class Y share the same vtable (this seems obvious, since it's the vtables that define the behaviour in case of tying/overloading). 5. On a $a = $b assignment, the PMC correspondent to $a after the assignment is the same it corresponded before it. I.e. $a is set to the value of $b by calling set_* methods of $a passing some information of $b or $b itself as parameters, and not replaced with a new generated PMC derived from $b somehow. The problem I see concerns assignment, like $a = $b. What will be the vtable of $a? Suppose $b is tied. The vtable of $a should not be the one of $b, since by this assignment $a doesn't get tied, it only fetches the value of $b. That means the vtable of $a should probably be one of the `plain' vtables of perl, to handle simple datatypes (of course it could be a special one, but it doesn't matter here). Now suppose $b is overloaded. For instance, let's say $b is a bigint, with overloaded +,-,*,... to do the right thing in the case it's operated with other values. Then, by $a = $b, $a receives the bigint stored in $b, and should inherit (bad word) or copy $b's behaviour of add/subtract/mul/... . As I suppose a new vtable wouldn't be created (as this would cause one vtable per variable on multiple following assignments), I presume the vtable of $b would have to be copied to $a. 
But now we get into a contradiction: if $b is tied, the vtable should not be copied; but if $b is overloaded, it should be copied. Now what should happen if $b is both tied and overloaded? That's not impossible, since $b can, for example, be tied to a random number generator and be programmed to return bigints of 512 bits (that's actually only to say it's thinkable to have tie and overload together, I'm not expecting someone would do something like this...). Other awkward consequences would happen if $a is tied and $b is overloaded. $a = $b would make $a have to use $b's add/subtract/mul/..., but using $b's vtable in its entirety would untie $a, right? That's actually what made me feel the need for a separation between store/fetch and add/subtract/mul/... . I've been trying to figure out how your proposal would fit this situation, but I couldn't find a way... I actually don't know if my assumptions are wrong, and tying and overloading would not be handled by set_*/get_* and add/subtract/mul/..., but I actually can't see another way. What do you think about it? - Branden
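To make assumption 5 concrete, here is a minimal C sketch of an assignment in which the destination keeps its own vtable and only receives a value through its set entry. Every name in it (the two-slot VTable, pmc_assign, the "tied" class with a fetch counter) is invented for illustration and is not taken from the PDD. It shows why a tie on $b does not propagate to $a, and by the same mechanism why any overloaded behaviour carried only by $b's vtable would be lost too.

```c
#include <assert.h>
#include <stddef.h>

typedef struct PMC PMC;

/* Hypothetical two-entry vtable; the table in the PDD has many more slots. */
typedef struct VTable {
    long (*get_int)(PMC *self);
    void (*set_int)(PMC *self, long value);
} VTable;

struct PMC {
    const VTable *vtable;
    long          ival;        /* payload of the plain integer class */
    long          fetch_count; /* bookkeeping used only by the "tied" class */
};

static long plain_get(PMC *self) { return self->ival; }
static void plain_set(PMC *self, long value) { self->ival = value; }

/* Stand-in for a tied FETCH: every read has a visible side effect. */
static long tied_get(PMC *self) { self->fetch_count++; return self->ival; }
static void tied_set(PMC *self, long value) { self->ival = value; }

static const VTable plain_int_vtable = { plain_get, plain_set };
static const VTable tied_int_vtable  = { tied_get, tied_set };

/* $a = $b under assumption 5: dest keeps its vtable and only gets a value. */
void pmc_assign(PMC *dest, PMC *src)
{
    assert(dest != NULL && src != NULL);
    dest->vtable->set_int(dest, src->vtable->get_int(src));
}
```

After pmc_assign the destination still points at the plain vtable, which is the desired result for ties and the problematic one for a bigint-style class.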
Re: PDD 2, vtables
++ and -- are already slightly messy in perl5. pp_preinc, pp_postinc, pp_predec and pp_postdec live in with all the ops. They know how to increment and decrement integers that don't overflow, and call routines in sv.c to increment and decrement anything else. Actually, this nearly provides a divide between values and operators that has been suggested, with the speed up hack for the common case. I'm not sure I follow you. What is the "this" in "this nearly provides a divide"? This example. I think the "nearly" probably should go. Maybe I should have written "++ and -- in perl5 provide an example of a (nearly clean) divide between operator and value". Well, many of the vtable methods are operator-ish rather than value-ish, presumably on the grounds of efficiency. A pure 'value' vtable wouldn't have add(), concatenate() etc. Which leads me back to: I'm not sure whether you are in favour of, or oppose, += etc being vtable methods. Confused of Sheffield. Hmm. Yes. I'm confused too. Confused of Newcastle. Fancy swapping some cutlery for some Brown Ale? ;-)
Re: PDD 2, vtables
Dan Sugalski wrote: At 03:09 PM 2/7/2001 +, David Mitchell wrote: A mere detail, but would it not be more efficient to just pass them as extra args, ie add(PMC1, PMC2, PMC3, key1, key2, key3), rather than having to potentially create and populate a tmp struct just to call the function??? Well, extra arguments cost. I'd originally had only a single key optionally passed in, but that meant potentially two of the three PMCs had to be real, rather than keyed into containers. (And thus possibly virtual) Hence the key array. I admit it's not a swell decision--I'm not sure whether the single optional parameter is better, or several optional (or required) parameters are better. I can see it going either way, and I didn't put all that much thought into it. I think filling an array costs at least as much as passing parameters, not to mention the cost of passing that array as a parameter, also... Unless the array is constant and can be pre-filled by the compiler (which I think is a somewhat rare case, considering all the three arguments), or once filled there are cases it can be reused, I'm not sure it's worth using an array to save 2 pushes into the stack... - Branden
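The two calling conventions being weighed here can be put side by side in a short C sketch. The PMC and KEY layouts below are toy assumptions (a four-slot integer container and a plain index), and both function names are made up; the PDD does not define either signature in this form. A NULL key means "use the PMC itself", modelled here as slot 0.

```c
#include <assert.h>
#include <stddef.h>

typedef struct PMC { long slots[4]; } PMC;  /* toy container of integers */
typedef struct KEY { size_t index; } KEY;   /* hypothetical key: a plain index */

/* Resolve an optional key to the storage location it names. */
static long *resolve(PMC *pmc, const KEY *key)
{
    assert(pmc != NULL);
    if (key == NULL)
        return &pmc->slots[0];
    assert(key->index < 4);
    return &pmc->slots[key->index];
}

/* Variant 1: each key travels as its own extra argument. */
void add_with_key_args(PMC *dest, PMC *left, PMC *right,
                       const KEY *kd, const KEY *kl, const KEY *kr)
{
    *resolve(dest, kd) = *resolve(left, kl) + *resolve(right, kr);
}

/* Variant 2: the caller fills a three-element key array first.
 * Passing NULL for the array means that no operand is keyed. */
void add_with_key_array(PMC *dest, PMC *left, PMC *right,
                        const KEY *const *keys)
{
    const KEY *kd = keys ? keys[0] : NULL;
    const KEY *kl = keys ? keys[1] : NULL;
    const KEY *kr = keys ? keys[2] : NULL;
    *resolve(dest, kd) = *resolve(left, kl) + *resolve(right, kr);
}
```

The array variant trades three argument pushes for one pointer plus the stores needed to fill the array, so it only wins if the array can be prepared once and reused, which is the point Branden raises.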
Re: PDD 2, vtables
On Wed, Feb 07, 2001 at 05:54:14PM +, David Mitchell wrote: Well, many of the vtable methods are operator-ish rather than value-ish, presumably on the grounds of efficiency. A pure 'value' vtable wouldn't have add(), concatenate() etc. Which leads me back to: I'm not sure whether you are in favour of, or oppose, += etc being vtable methods. I'm not either. They feel like they should be operators. But I don't like the thought of going in and out of a lot of generic routines for $a = 3; $a += 2; when the integer scalar ought to know what the inside of another integer scalar looks like, and that 2 + 3 doesn't overflow. Hmm. += isn't another opcode; it's a special case of a = b + c where the PMCs for a and b are the same thing. And I see no real reason why it can't be part of the + entry. Nicholas Clark
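As a rough illustration of "+= is just + with the destination aliased to the left operand", here is a C sketch of a three-operand integer add entry. The one-field PMC and the name int_add are assumptions for the example only. Both operands are read before the destination is written, so the aliased call needs no separate entry, and the overflow check stands in for the point where a real implementation might promote to a bigint.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct PMC { long ival; } PMC;  /* toy integer scalar */

/* dest = left + right. Returns false and leaves dest untouched on overflow.
 * dest may be the same PMC as left (or right), which covers $a += $b. */
bool int_add(PMC *dest, const PMC *left, const PMC *right)
{
    assert(dest != NULL && left != NULL && right != NULL);
    long a = left->ival;
    long b = right->ival;
    if ((b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b))
        return false;  /* a fuller version might switch to a bigint here */
    dest->ival = a + b;
    return true;
}
```

Calling int_add(&a, &a, &two) behaves as $a += 2. Whether an explicit dest == left fast path would be any quicker is a measurement question that this sketch does not try to answer.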
Re: PDD 2, vtables
David Mitchell wrote: Well, many of the vtable methods are operator-ish rather than value-ish, presumably on the grounds of efficiency. A pure 'value' vtable wouldn't have add(), concatenate() etc. Which leads me back to: I'm not sure whether you are in favour of, or oppose, += etc being vtable methods. Oppose. (Actually I'm talking about my idea on vtables, i.e. separate +/-/* in one vtable and store/fetch in another). My proposal on ++ and -- would be having the `value'-part of the vtable (the one that handles +/-/*) return a value corresponding to what would be the value of it after an increment or decrement. store would be used to actually commit the ++/-- operation. This would serve both postfix and prefix cases, because in one case the value before the store would be used, and in the other the one after. (I was just reminded of the C++ overloading of ++, which uses a dummy parameter to tell whether it's a pre- or a post-increment. So bad...) - Branden
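A small C sketch may make the split proposal easier to picture. It assumes two separate tables, one for access (fetch/store) and one for value operations (here only a successor function); all of the type and function names are hypothetical and the sketch ignores integer overflow. Prefix and postfix increment then differ only in which value the expression yields.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Value { long n; } Value;
typedef struct Variable Variable;

/* Access half of the split: how a variable is read and written. */
typedef struct AccessVTable {
    Value (*fetch)(Variable *var);
    void  (*store)(Variable *var, Value v);
} AccessVTable;

/* Value half of the split: operations that never touch a variable. */
typedef struct ValueVTable {
    Value (*succ)(Value v);  /* the value after an increment */
} ValueVTable;

struct Variable {
    const AccessVTable *access;
    const ValueVTable  *ops;
    Value               held;
};

static Value plain_fetch(Variable *var) { return var->held; }
static void  plain_store(Variable *var, Value v) { var->held = v; }
static Value int_succ(Value v) { v.n += 1; return v; }  /* no overflow check */

static const AccessVTable plain_access = { plain_fetch, plain_store };
static const ValueVTable  int_ops      = { int_succ };

/* ++$x: commit the successor and yield the new value. */
Value pre_increment(Variable *var)
{
    assert(var != NULL);
    Value next = var->ops->succ(var->access->fetch(var));
    var->access->store(var, next);
    return next;
}

/* $x++: commit the successor but yield the value seen before the store. */
Value post_increment(Variable *var)
{
    assert(var != NULL);
    Value old = var->access->fetch(var);
    var->access->store(var, var->ops->succ(old));
    return old;
}
```

Each increment costs one fetch, one value operation and one store, which is the two-table overhead discussed elsewhere in the thread.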
Re: Another approach to vtables
At 01:35 PM 2/7/2001 -0200, Branden wrote: Dan Sugalski wrote: At 05:41 PM 2/6/2001 -0200, Branden wrote: I actually don't see a reason why the vtable entries should be the opcodes. Is there? Speed. Actually, I don't see the problem of defining a C function that would do: void add(SVAR *result, SVAR *lop, SVAR *rop, SVAL *tmp1, SVAL *tmp2) { /* tmp comes from the temporary file */ lop->vtable->FETCH(tmp1, lop); rop->vtable->FETCH(tmp2, rop); lop->vtable->ADD(tmp1, tmp1, tmp2); result->vtable->STORE(result, tmp1, tmp2); } And have it be my opcode. Passing the indexes wouldn't be a problem either. Is there any problem here? Well, no, but what's the point? If you're calling lop's ADD vtable entry, why not have that entry deal with the rest of it, rather than stick things into, and later remove them, from temp slots? Dan, I see you talk about vtables and opcodes as related things. I really don't see why you think that's necessary, I'd like to hear why you think it. They are. Since the only code that will be calling vtable routines will be the opcode functions, designing the two to go hand in hand makes sense to me. As far as I know (and I could be _very_ wrong), the primary objectives of vtables are: 1. Allowing extensible datatypes to be created by extensions and used in Perl. Secondarily, yes. 2. Making the implementation of `tie' and `overload' more efficient ('cause it's very slow in Perl 5). No, not at all. This isn't really a consideration as such. (The vtable functions as designed are inadequate for most overloading, for example) 3. Replacing Perl5's SV*,AV*,HV*,... (I don't know if it should replace or complement -- ?) Not really a goal. You forgot #0. Go faster. The big reason is to make the functions that perl calls smaller with fewer branches. We can't avoid branches--code that makes no decisions is generally dull--but we want to make as few as we can. Making the vtable code targeted means the functions don't have to check much at runtime.
And some secondary objectives are: 4. Allow the use of different string encodings internally. Nope. Happy side-effect. 5. Allow int's and float's to become bigint's and bigfloat's when an overflow occurs. Nope, not a reason for vtables. You can do it just fine without them. (Just means more code in the opcode functions that do math) Is this right or am I missing something? Missing something, as you can see. That's OK, though. The vtable PDD should have made all this stuff clear in the preamble text. I've been assuming folks have been following along since the beginning. (And possibly know about the other stuff in my head (no, not *that* stuff. The other other stuff...)) The point I want to make, is that vtables are directly related to what tie and overload are today (or sv_magic, if you want the underlying thing [I don't know what's underlying in overload case, as it's apparently not documented]). No, they aren't. tie and overloading are a level or three up from this. So I don't really see why opcodes are in the discussion. For me, at least, tie and overload are related to data, and opcodes with execution. They are orthogonal things, or at least should be (sure that doesn't mean they are not related and should be implemented separately, but they sure can). You're thinking at too high a level, or have too many levels jammed together into one. This really isn't the place to be doing all of overloading, nor all of tying. Also, since the vtable won't be exposed to the rest of the world, we don't want to force ties and overloads to use it, since it means we'll need to change lots of stuff if we toss vtables for some reason. (like, say, their performance turns out to be bad) Of course I understand some opcodes are related to data manipulation, and should of course be modelled after vtables, but they surely can be separated. Before I proposed the code above for `add', taking 3 SVAR's and doing the same as what you proposed (considering no arrays/hashes). 
I thought about it, and I saw that it could be extended to handle 3 PMC's, instead of SVAR's. If they are SVAL's, they are passed directly to the vtable, otherwise they are fetched/stored using keys that are passed by parameters. The add method would be able to determine it by the TYPE vtable entry (finally found a use for it...). That would actually result in the exact same opcodes that your approach would. Well, not exactly. If you're fetching data out of arrays and hashes, the fetch_scalar/get_value pair may well end up creating temporary scalars. If the source array's declaration is: my @foo : int; Then there aren't any scalars inside of @foo, and fetching them out means creating new ones. That's why the keys are used, so we can avoid that in some cases. Some more about opcode dispatching. What it has to do is: 1. Fetch an instruction (that would be a byte
Re: PDD 2, vtables
At 06:12 PM 2/7/2001 +, Nicholas Clark wrote: On Wed, Feb 07, 2001 at 05:54:14PM +, David Mitchell wrote: Well, many of the vtable methods are operator-ish rather than value-ish, presumably on the grounds of efficiency. A pure 'value' vtable wouldn't have add(), concatenate() etc. Which leads me back to: I'm not sure whether you are in favour of, or oppose, += etc being vtable methods. I'm not either. They feel like they should be operators. But I don't like the thought of going in and out of a lot of generic routines for $a = 3; $a += 2; when the integer scalar ought to know what the inside of another integer scalar looks like, and that 2 + 3 doesn't overflow. That particular case would get caught by the optimizer (I'd hope) so it'd not be an issue anyway. Hmm. += isn't another opcode; it's a special case of a = b + c where the PMCs for a and b are the same thing. And I see no real reason why it can't be part of the + entry. Whether a special case in the code would get a speedup or not's up in the air. (Is the test and branch faster than a generic doing it routine?) I'd want to test that and see before I decided. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: PDD 2, vtables
At 04:15 PM 2/7/2001 -0200, Branden wrote: David Mitchell wrote: Well, many of the vtable methods are operator-ish rather than value-ish, presumably on the grounds of efficiency. A pure 'value' vtable wouldn't have add(), concatenate() etc. Which leads me back to: I'm not sure whether you are in favour of, or oppose, += etc being vtable methods. Oppose. (Actually I'm talking about my idea on vtables, i.e. separate +/-/* in one vtable and store/fetch in another). My proposal on ++ and -- would be having the `value'-part of the vtable (the one that handles +/-/*) return a value corresponding to what would be the value of it after an increment or decrement. store would be used to actually commit the ++/-- operation. This would serve both postfix and prefix cases, because in one case the value before the store would be used, and in the other the one after. Splitting the vtable into two pieces, with one piece not tied to a PMC, makes some things impossible. Consider this: @foo = @bar * @baz; where all three arrays are really matrix types. In the separate load/store and do vtable scheme it means you get the value of @bar and @baz in scalar context, and multiply the results. Two operations, and the resultant values are sanitized. In the single vtable scheme, we'd execute @bar's multiply routine, which would be clever enough (because we wrote it that way) to see the second parameter's also a matrix, and do matrix math. Splitting things up also loses information when moving data between the two vtables routines. While that's not a big deal generally (as the info lost is irrelevant) it forbids some rather interesting side cases. (I was just reminded of the C++ overloading of ++, which uses a dummy parameter to tell whether it's a pre- or a post-increment. So bad...) I'm not sure it's worth having both a preinc and postinc operator, as opposed to splitting it into an inc/fetch or fetch/inc pair. (And yes, I know, earlier I was arguing that one opcode's better than two.
This one's rare enough that profiling it would probably be in order...) Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
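The single-vtable behaviour Dan describes for @foo = @bar * @baz can be sketched in C roughly as follows. This is a toy under stated assumptions: a fixed 2x2 matrix class, a scalar class that stores its number in m[0][0], and a vtable with only type and multiply entries. None of these names or layouts come from the PDD. The left operand's multiply entry inspects the right operand's type and picks matrix or element-wise scaling accordingly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TYPE_SCALAR, TYPE_MATRIX2 } PMCType;
typedef struct PMC PMC;

/* Hypothetical vtable holding just the two entries this sketch needs. */
typedef struct VTable {
    PMCType (*type)(const PMC *self);
    void    (*multiply)(PMC *dest, const PMC *left, const PMC *right);
} VTable;

struct PMC {
    const VTable *vtable;
    double        m[2][2];  /* a scalar keeps its value in m[0][0] */
};

static PMCType scalar_type(const PMC *self) { (void)self; return TYPE_SCALAR; }
static PMCType matrix_type(const PMC *self) { (void)self; return TYPE_MATRIX2; }

/* Scalar class: this sketch only handles a scalar right operand. */
static void scalar_multiply(PMC *dest, const PMC *left, const PMC *right)
{
    assert(right->vtable->type(right) == TYPE_SCALAR);
    double product = left->m[0][0] * right->m[0][0];
    dest->vtable = left->vtable;
    dest->m[0][0] = product;
}

/* Matrix class: checks what the right operand is before choosing the math. */
static void matrix_multiply(PMC *dest, const PMC *left, const PMC *right)
{
    double out[2][2];
    int i, j;
    for (i = 0; i < 2; i++) {
        for (j = 0; j < 2; j++) {
            if (right->vtable->type(right) == TYPE_MATRIX2)
                out[i][j] = left->m[i][0] * right->m[0][j]
                          + left->m[i][1] * right->m[1][j];
            else
                out[i][j] = left->m[i][j] * right->m[0][0];
        }
    }
    dest->vtable = left->vtable;       /* the result is a matrix as well */
    memcpy(dest->m, out, sizeof out);  /* written last, so dest may alias */
}

static const VTable scalar_vtable = { scalar_type, scalar_multiply };
static const VTable matrix_vtable = { matrix_type, matrix_multiply };

/* What a multiply opcode might do: dispatch on the left operand only. */
void op_multiply(PMC *dest, const PMC *left, const PMC *right)
{
    assert(dest != NULL && left != NULL && right != NULL);
    left->vtable->multiply(dest, left, right);
}
```

With a split fetch/compute/store scheme the multiply step would see only the fetched values, so the type information that this routine branches on would have to be carried some other way.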
Re: Another approach to vtables
Dan Sugalski wrote: 2. Making the implementation of `tie' and `overload' more efficient ('cause it's very slow in Perl 5). No, not at all. This isn't really a consideration as such. (The vtable functions as designed are inadequate for most overloading, for example) Well, if it's not tie/overload, I didn't really understand why a vtable would have to be attached to a variable. I'd really like to see an example of variables whose vtables would have set_* and get_* different one from another, and another example of variables whose vtables would have add/subtract/mul/... different one from another. What happens with vtables on assignment? (in $a = $b, $a copies its vtable from $b or not?) And I really don't see why tie/overload couldn't be handled in a level below the level of the opcodes (in a sense that one opcode calls various methods of a (potentially) tied/overloaded variable/value). The example of `my @a :int' really shows your point. I was actually thinking current Perl5 syntax as a target, and I really wouldn't know how to deal with this... (but sure I'll think about it!) - Branden
Re: Another approach to vtables
Dan, I think there is a real problem with your vtable approach. [ etc etc ] I think there's an important misconception about tying and overloading going on here which I will attempt to clear up. (Then Dan and co can point out that I'm wrong and have just made matters worse ;-) First off, people should be very clear that there are two *completely* different types of tying and overloading possible in Perl 6, which I shall call Perl-mode and C-mode for convenience. Perl 5 only supports Perl-mode tying and overloading - ie, where the tied or overloaded functions that get called are Perl functions. This is slow and heavyweight, but it is easy to code (ie you write a Perl module with a few functions). In addition, Perl 6 allows C-mode tying and overloading. This is where the tied or overloaded functions that get called are C functions. This is much more low-level, but much faster. It's also much harder to code. C-mode tying and overloading is what vtables are. Ie you can write a custom data type and have control over how its data is accessed, and how its data is operated on. Note also that C-mode overloading and Perl-mode overloading are quite different at the language level: Perl 5 overloading is really the overloading of *reference* operators. Ie if you have my ($a, $b); my ($ra, $rb) = (\$a, \$b); Then at this point $ra + $rb gives you nonsense (last time I looked, it returned the sum of the 2 addresses). Now if you bless $a, $b into some overloaded class, then $ra + $rb suddenly starts doing what you want - ie you have overloaded the definition of addition *on references*. Importantly, $a + $b still doesn't do what you want. Under Perl 6, with C-mode overloading, $a + $b *will* DWIM. Interestingly enough, perl-mode tying and overloading will presumably be implemented using vtables.
For example, when you call the add() vtable method associated with a tied variable, that add() method just calls the (Perl) FETCH function from the relevant module, then just passes control on to the add() vtable method associated with the PMC returned by FETCH, passing through the original args. Thus, the only(-ish) place within the perl src that needs to understand about Perl-mode ties and overloading is within a handful vtable classes
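The delegation described in this message can be sketched in C along the following lines. It is only an illustration: the Perl-level FETCH is replaced by a C callback, the PMC layout and all names are invented, and only the left operand is treated as possibly tied. The tied class's add entry obtains the real PMC from FETCH and passes control to that PMC's own add entry with the original destination and right operand.

```c
#include <assert.h>
#include <stddef.h>

typedef struct PMC PMC;

typedef struct VTable {
    void (*add)(PMC *dest, PMC *left, PMC *right);
} VTable;

struct PMC {
    const VTable *vtable;
    long          ival;
    PMC        *(*fetch)(PMC *self);  /* stand-in for a Perl-level FETCH */
    PMC          *backing;            /* what this sketch's FETCH hands back */
    int           fetches;            /* counts FETCH calls, for illustration */
};

/* Plain integer class: no knowledge of ties at all. */
static void int_add(PMC *dest, PMC *left, PMC *right)
{
    dest->ival = left->ival + right->ival;
}
static const VTable int_vtable = { int_add };

/* Example FETCH callback: record the call and return the backing PMC. */
static PMC *counting_fetch(PMC *self)
{
    self->fetches++;
    return self->backing;
}

/* Tied class: all tie handling lives here, not in the opcode. */
static void tied_add(PMC *dest, PMC *left, PMC *right)
{
    assert(left != NULL && left->fetch != NULL);
    PMC *fetched = left->fetch(left);
    fetched->vtable->add(dest, fetched, right);
}
static const VTable tied_vtable = { tied_add };
```

An opcode would simply call left->vtable->add(dest, left, right) and never learn whether the left operand was tied, which is the sense in which only a handful of vtable classes need to know about ties.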
Re: Another approach to vtables
Branden wrote: Well, if it's not tie/overload, I didn't really understand why a vtable would have to be attached to a variable. I'd really like to see an example of variables whose vtables would have set_* and get_* different one from another, and another example of variables whose vtables would have add/subtract/mul/... different one from another. Try to answer that myself: my @a : int; @a = (1, 2, 3); @a would have set_* and get_* different from the same entries in the usual array of scalars. @a = @b * @c when @b and @c are matrices instead of arrays, mul would be different from the same entries in the usual array of scalars. What happens with vtables on assignment? (in $a = $b, $a copies its vtable from $b or not?) Already answered, $a can change its PMC for another one, and ties should be done above that. And I really don't see why tie/overload couldn't be handled in a level below the level of the opcodes (in a sense that one opcode calls various methods of a (potentially) tied/overloaded variable/value). I think this still would be a good thing, at least in cases scalars are considered (I'll still think a little about my @a : int...). - Branden
Re: PDD 2, vtables
I'm not either. They feel like they should be operators. But I don't like the thought of going in and out of a lot of generic routines for $a = 3; $a += 2; when the integer scalar ought to know what the inside of another integer scalar looks like, and that 2 + 3 doesn't overflow. That particular case would get caught by the optimizer (I'd hope) so it'd not be an issue anyway. Hmm. += isn't another opcode; it's a special case of a = b + c where the PMCs for a and b are the same thing. And I see no real reason why it can't be part of the + entry. Whether a special case in the code would get a speedup or not's up in the air. (Is the test and branch faster than a generic doing it routine?) I'd want to test that and see before I decided. Are we all clear then, that in perl 6, since the opcodes etc are no longer allowed to rummage around in the internals of a PMC, it's purely a question of whether $a += 3 invokes add($a,$a,3) or eqadd($a,3) and whether $a++ invokes add($a,$a,1) or postinc($a) etc? And that this decision is mainly a 'time it and see' decision?
Re: PDD 2, vtables
Dan, before I follow up your reply to my list of nits about the PDD, can I clarify one thing: destruction. I am assuming that many PMCs will require destruction, eg calling destroy() on a string PMC will cause the memory used by the string data to be freed or whatever. Only very simple PMCs (such as integers) need to do no destruction. Is this the same as your perception of reality :-) ? I also gather that PMCs will have a flag saying whether they need destroying, (eg ints say no, strings say yes), and that calls to destroy() are preceded by a check on this flag for efficiency?
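The flag check being asked about could look something like the C sketch below. The flag name, the three-field PMC and the helper functions are all hypothetical. It combines two ideas from this thread: destroy() is only called when a "needs destroying" flag is set, and a recycled PMC becomes undef by pointing its vtable at an undef class and setting its private data pointer to NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PMC_NEEDS_DESTROY 0x1u  /* hypothetical flag bit */

typedef struct PMC PMC;
typedef struct VTable { void (*destroy)(PMC *self); } VTable;

struct PMC {
    const VTable *vtable;
    unsigned      flags;
    void         *data;   /* private data pointer; NULL for undef */
};

static int destroy_calls = 0;  /* instrumentation for the example only */

static void noop_destroy(PMC *self) { (void)self; }
static void string_destroy(PMC *self)
{
    destroy_calls++;
    free(self->data);
    self->data = NULL;
}

static const VTable undef_vtable  = { noop_destroy };
static const VTable int_vtable    = { noop_destroy };
static const VTable string_vtable = { string_destroy };

/* Build a string PMC that owns a heap copy of text. */
PMC pmc_new_string(const char *text)
{
    PMC pmc = { &string_vtable, PMC_NEEDS_DESTROY, NULL };
    size_t len = strlen(text) + 1;
    pmc.data = malloc(len);
    assert(pmc.data != NULL);
    memcpy(pmc.data, text, len);
    return pmc;
}

/* Trash a PMC's contents: destroy() runs only when the flag asks for it. */
void pmc_recycle(PMC *pmc)
{
    assert(pmc != NULL);
    if (pmc->flags & PMC_NEEDS_DESTROY)
        pmc->vtable->destroy(pmc);
    pmc->vtable = &undef_vtable;
    pmc->flags  = 0;
    pmc->data   = NULL;
}
```

For an integer PMC the cost of recycling is one flag test and three stores, with no indirect call, which is the efficiency argument for checking the flag before calling destroy().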
Re: Another approach to vtables
David Mitchell wrote: Perl 5 only supports Perl-mode tying and overloading - ie, where the tied or overloaded functions that get called are Perl functions. This is slow and heavyweight, but it is easy to code (ie you write a Perl module with a few functions). Actually, I think Perl 5 supports tying in C level, through sv_magic. I think overloading is also possible, however I've never seen documentation on that. In addition, Perl 6 allows C-mode tying and overloading. This is where the tied or overloaded functions that get called are C functions. This is much more low-level, but much faster. It's also much harder to code. C-mode tying and overloading is what vtables are. Ie you can write a custom data type and have control over how its data is accessed, and how its data is operated on. [...] Interestingly enough, perl-mode tying and overloading will presumably be implemented using vtables. [...] Thus, the only(-ish) place within the perl src that needs to understand about Perl-mode ties and overloading is within a handful vtable classes This is exactly what I was talking about, but Dan told me that's not what he wants with vtables. At least I see I'm not the only one with this wrong idea... Note also that C-mode overloading and Perl-mode overloading are quite different at the language level: Perl 5 overloading is really the overloading of *reference* operators. Ie if you have Agreed. Don't see why Perl 6 shouldn't be different, e.g. have "foo" with a + overloaded to do concatenation. Now if you bless $a, $b into some overloaded class, then $ra + $rb suddenly starts doing what you want - ie you have overloaded the definition of addition *on references*. Importantly, $a + $b still doesn't do what you want. Under Perl 6, with C-mode overloading, $a + $b *will* DWIM. But then I don't see how vtables will propagate in an assignment. If $b has a bigint and I do $a = $b, $a should copy the add/subtract/mul/...
entries of $b's vtable, because without them a bigint doesn't really make sense. In the other side, if $a copies $b's vtable, and $b is tied to something, $a will be tied to the thing $b is tied to, what is wrong, since this is an assignment and not an aliasing. Copying and not copying is not possible. Copying partially (only add/subtract/mul/...) is only possible if the entries are in separate vtables, since there are probably many PMC's pointing to one vtable (meaning it shouldn't be modified) and creating per-variable vtables is a bad idea (it's just like saying to the user he has a 256-byte (or more) storage to a 32-bit integer, because of all vtable entries). For example, when you call the add() vtable method associated with a tied variable, that add() method just calls the (Perl) FETCH function from the relevant module, then just passes control on to the add() vtable method associated with the PMC returned by FETCH, passing through the original args. Tying is clear to me. I only see a problem with overloading on assignment, that clearly cannot co-exist with tie, as I explained above. - Branden
Re: Another approach to vtables
At 01:14 PM 02-07-2001 -0500, Dan Sugalski wrote: At 01:35 PM 2/7/2001 -0200, Branden wrote: As far as I know (and I could be _very_ wrong), the primary objectives of vtables are: 1. Allowing extensible datatypes to be created by extensions and used in Perl. Secondarily, yes. 2. Making the implementation of `tie' and `overload' more efficient ('cause it's very slow in Perl 5). No, not at all. This isn't really a consideration as such. (The vtable functions as designed are inadequate for most overloading, for example) Hmm, I seem to remember vtables were being cited as a cure for lots of ills (perhaps combined with other aspects, like "make Perl nearly as fast as C".) The vtables were implied (or possibly outright stated) as giving the low-level core a more object-oriented structure: as you state below, branching and conditionals in the runtime can be eliminated by the values knowing how to operate on themselves. It was also implied (or outright claimed) that different objects/classes/packages/whatever could have class-specific vtables, defined at run-time, that would be used to handle the class-specific implementation details. I'm not sure what that could refer to except ties and overloading; class-specific methods wouldn't go in the vtable. There was some discussion that allowing the vtables to refer to functions written in perl would be a good idea, as it would allow extensions to be written in perl -- which is a good thing. I had gotten the impression that the perl code-sequence: $a = $b + $c; would generate the same op-code sequence regardless of the type of $a, $b, $c, and the vtables would do all the magic behind the scenes, calling tied or overloaded versions of the base functions if so defined for $a, $b, or $c. Now I seem to be hearing that this is not the case, that variable ties and overloads are at a much higher level, never touching the vtables.
It now seems that the vtables will exist only for built-in types, and be inaccessible for user-defined types (unless those types are defined by the perl6 equivalent of XS, for example). This almost seems to be defaulting on the promise of vtables I thought was made.
Re: Another approach to vtables
Dan Sugalski wrote: At 01:35 PM 2/7/2001 -0200, Branden wrote: 2. Making the implementation of `tie' and `overload' more efficient ('cause it's very slow in Perl 5). No, not at all. This isn't really a consideration as such. (The vtable functions as designed are inadequate for most overloading, for example) [...] Is this right or am I missing something? Missing something, as you can see. That's OK, though. The vtable PDD should have made all this stuff clear in the preamble text. I've been assuming folks have been following along since the beginning. (And possibly know about the other stuff in my head (no, not *that* stuff. The other other stuff...)) In http:[EMAIL PROTECTED]/msg00464.html Nick Ing-Simmons talks about using vtables to implement 'magic hacks' of Perl 5, which are ties. He says ``(they are even called vtables in the sources)''. In http:[EMAIL PROTECTED]/msg00494.html Ken Fox says ``It should be possible to use the dispatch tables to easily implement overloading because operators should map fairly easily into the table.'' In the recent http:[EMAIL PROTECTED]/msg02376.html David Mitchell talks about `C-mode' tying and overloading, which is the same thing I was thinking about. I really haven't been following along since the beginning, but I see I'm not the only one who got the wrong point. The conclusion I'm reaching is that we're definitely talking about different things. You are talking about fast opcode dispatching through clever dispatch tables. I'm talking about using polymorphism to implement tying and overloading in the low-level. I guess they are different, right? What I would like to know: * the things we're talking about are compatible or not? * can they be built into just one, that makes fast opcode dispatch and easy tying/overloading? * can one use another to be implemented? * what is a vtable after all? * do all this stuff I wrote even _make_ sense? - Branden
Re: PDD 2, vtables
Dan Sugalski wrote: Splitting the vtable into two pieces, with one piece not tied to a PMC, makes some things impossible. Consider this: @foo = @bar * @baz; where all three arrays are really matrix types. By the PDD's notion of `key', what would be the `key' of a matrix type ? (I think that's actually a -language question, but) What $foo[42] (where @foo is matrix) would compile to? In the separate load/store and do vtable scheme it means you get the value of @bar and @baz in scalar context, and multiply the results. Two operations, and the resultant values are sanitized. In the single vtable scheme, we'd execute @bar's multiply routine, which would be clever enough (because we wrote it that way) to see the second parameter's also a matrix, and do matrix math. Actually, not necessarily. It depends on what the compiler does... There could be special entries for array operations, like +/-/*/... . The problem I see with it is what happens when you @a = @b. Actually, if @b is a matrix, @a = @b makes @a a matrix or evaluates @b in list context? What about @a = (@b) ? What if @a is a tied array? This matrix thing is actually getting very confusing to me... I think all these proposed additions to the language should be carefully examined for possible mis-interpretations like these. - Branden
Re: PDD 2, vtables
At 06:08 PM 2/7/2001 -0200, Branden wrote: Dan Sugalski wrote: Splitting the vtable into two pieces, with one piece not tied to a PMC, makes some things impossible. Consider this: @foo = @bar * @baz; where all three arrays are really matrix types. By the PDD's notion of `key', what would be the `key' of a matrix type ? Probably an integer, possibly a list for multidimensional matrices. (And I haven't thought about how to handle that--probably force a series of index lookups) (I think that's actually a -language question, but) What $foo[42] (where @foo is matrix) would compile to? Identically to how $foo[42] would if @foo were a plain array. In the separate load/store and do vtable scheme it means you get the value of @bar and @baz in scalar context, and multiply the results. Two operations, and the resultant values are sanitized. In the single vtable scheme, we'd execute @bar's multiply routine, which would be clever enough (because we wrote it that way) to see the second parameter's also a matrix, and do matrix math. Actually, not necessarily. It depends on what the compiler does... There could be special entries for array operations, like +/-/*/... . The problem I see with it is what happens when you @a = @b. Actually, if @b is a matrix, @a = @b makes @a a matrix or evaluates @b in list context? That's a language issue. I don't know--I can see it going either way. I'd prefer a straight assign and let the assignment vtable entry handle it, but I don't know that we'll have that option. What about @a = (@b) ? Good question. I'd like to see it handled the same way as @a=@b, but I'm not sure that's going to happen. It's Larry's decision. (Mainly because I'd like to see this: @a = (@b, @c); turn into: @a = @b; push @a, @c; but I don't know that we'll be able to) What if @a is a tied array? What if? Larry's call as to whether it makes @a a copy of the data from @b, or creates a new tied thing, or an alias.
Probably @a would be a plain copy of @b with no magic, but that's not my call. This matrix thing is actually getting very confusing to me... I think all these proposed additions to the language should be carefully examined for possible mis-interpretations like these. I'm not proposing they go in (well, OK, I am, but I'm not forcing it). What I am doing is trying to not preclude the possibility if its decided that it will happen. Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
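As a rough illustration of the single-vtable scheme described in this message, here is a minimal C sketch in which the multiply op dispatches through the left operand's vtable, and the matrix entry checks whether its second parameter is also a matrix. Every name in it (`PMC` as laid out here, `VTable`, `TypeTag`, `op_multiply`, the fixed 2x2 matrix, and the element-count rule in `scalar_of`) is an assumption made for the example; none of it is taken from PDD 2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags; PDD 2 does not define these. */
typedef enum { TYPE_INT, TYPE_MATRIX2 } TypeTag;

struct PMC;

/* A one-entry vtable: dest = left * right, looked up on the LEFT operand. */
typedef struct VTable {
    TypeTag type;
    void (*multiply)(struct PMC *dest, const struct PMC *left,
                     const struct PMC *right);
} VTable;

typedef struct PMC {
    const VTable *vtable;
    long ival;      /* payload when the type is TYPE_INT */
    long m[2][2];   /* payload when the type is TYPE_MATRIX2 */
} PMC;

void int_multiply(PMC *dest, const PMC *left, const PMC *right);
void matrix_multiply(PMC *dest, const PMC *left, const PMC *right);

const VTable int_vtable    = { TYPE_INT, int_multiply };
const VTable matrix_vtable = { TYPE_MATRIX2, matrix_multiply };

/* Assumed scalar-context value: the integer itself, or a matrix's
 * element count (4 for the fixed 2x2 matrix used here). */
long scalar_of(const PMC *p)
{
    return p->vtable->type == TYPE_INT ? p->ival : 4;
}

void int_multiply(PMC *dest, const PMC *left, const PMC *right)
{
    long v = left->ival * scalar_of(right);
    dest->vtable = &int_vtable;
    dest->ival = v;
}

/* The "clever" routine: it inspects the second parameter's type. */
void matrix_multiply(PMC *dest, const PMC *left, const PMC *right)
{
    long out[2][2];   /* temporary, so dest may alias left or right */
    int i, j;
    for (i = 0; i < 2; i++) {
        for (j = 0; j < 2; j++) {
            if (right->vtable->type == TYPE_MATRIX2)
                out[i][j] = left->m[i][0] * right->m[0][j]
                          + left->m[i][1] * right->m[1][j];
            else
                out[i][j] = left->m[i][j] * right->ival;   /* scale */
        }
    }
    dest->vtable = &matrix_vtable;
    for (i = 0; i < 2; i++)
        for (j = 0; j < 2; j++)
            dest->m[i][j] = out[i][j];
}

/* What a multiply op might do: one call through the left vtable. */
void op_multiply(PMC *dest, const PMC *left, const PMC *right)
{
    left->vtable->multiply(dest, left, right);
}

PMC make_int(long v)
{
    PMC p = { &int_vtable, v, { { 0, 0 }, { 0, 0 } } };
    return p;
}

PMC make_matrix(long a, long b, long c, long d)
{
    PMC p = { &matrix_vtable, 0, { { a, b }, { c, d } } };
    return p;
}
```

With two matrix operands, `op_multiply` never takes a scalar-context value of either side; the matrix entry alone decides how to combine them, which is the behaviour the two-piece scheme would rule out.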
Re: Another approach to vtables
At 02:33 PM 2/7/2001 -0500, Buddha Buck wrote:
> At 01:14 PM 02-07-2001 -0500, Dan Sugalski wrote:
>> At 01:35 PM 2/7/2001 -0200, Branden wrote:
>>> As far as I know (and I could be _very_ wrong), the primary objectives of vtables are:
>>>
>>> 1. Allowing extensible datatypes to be created by extensions and used in Perl.
>>
>> Secondarily, yes.
>>
>>> 2. Making the implementation of `tie' and `overload' more efficient ('cause it's very slow in Perl 5).
>>
>> No, not at all. This isn't really a consideration as such. (The vtable functions as designed are inadequate for most overloading, for example)
>
> Hmm, I seem to remember vtables were being cited as a cure for lots of ills (perhaps combined with other aspects, like "make Perl nearly as fast as C".)

Yep, and they'll wash your windows and do your laundry, too. :)

> The vtables were implied (or possibly out-right stated) as giving the low-level core a more object-oriented structure: as you state below, branching and conditionals in the runtime can be eliminated by the values knowing how to operate on themselves.
>
> It was also implied (or out-right claimed) that different objects/classes/packages/whatever could have class-specific vtables, defined at run-time, that would be used to handle the class-specific implementation details. I'm not sure what that could refer to except ties and overloading; class-specific methods wouldn't go in the vtable.

While defining tie and overload functions will affect the vtable generated for a package, users generally won't be writing all the vtable functions, nor will they be writing directly to the vtable spec. The current list of functions defined as needed for tying and overloading is sufficient, with some glue, to build up a fully functional vtable.

I really, *really* don't want people skipping the tie or overload interface and heading straight for the low-level vtable interface, since they'll be really screwed if we change how vtables are defined or work, or scrap them altogether.

> There was some discussion that allowing the vtables to refer to functions written in perl would be a good idea, as it would allow extensions to be written in perl -- which is a good thing.
>
> I had gotten the impression that the perl code-sequence:
>
>     $a = $b + $c;
>
> would generate the same op-code sequence regardless of the type of $a, $b, $c, and the vtables would do all the magic behind the scenes, calling tied or overloaded versions of the base functions if so defined for $a, $b, or $c.

Unfortunately overloading presents certain problems. Generally speaking we're OK if a custom variable behaves as if it were a builtin type, but when it doesn't the vtable scheme isn't sufficient. For example, this:

    @foo = 12 * @bar;

will cause problems if @bar is an odd type. We'll be using 12's vtable (the integer one) as the source of the functions to do things, while it's actually the right-hand side that's messed up. With some things, + and * particularly, we can probably get by with some arithmetic reordering. With - and /, though, we can't since those operations aren't commutative.

Because of this, unless Larry says "Left hand side wins for custom types", we need more support than is in the vtable definitions to manage things. That support will be provided by the compiler, assuming you write to the tie or overload interface and not directly to the vtable interface.

> Now I seem to be hearing that this is not the case, that variable ties and overloads are at a much higher level, never touching the vtables. It now seems that the vtables will exist only for built-in types, and be inaccessible for user-defined types (unless those types are defined by the perl6 equivalent of XS, for example). This almost seems to be defaulting on the promise of vtables I thought was made.

It will be possible to write full vtables in perl--that's one of the things that Larry wants. (He wants to be able to call perl subs like C functions, so it pretty much follows)

Dan

--"it's like this"---
Dan Sugalski        even samurai
[EMAIL PROTECTED]   have teddy bears and even
                    teddy bears get drunk
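The left-hand-side problem with `12 * @bar` can be sketched as a small decision function. This is only an illustration of the reasoning in the message above, not a description of how the compiler will actually do it: `BinOp`, `Dispatch`, `Operand`, and `choose_vtable` are all hypothetical names, and treating `+` and `*` as safely reorderable is the assumption stated in the message (it would not hold for a custom type whose own `*` is non-commutative).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operator and outcome tags, invented for this sketch. */
typedef enum { OP_ADD, OP_SUB, OP_MUL, OP_DIV } BinOp;

typedef enum {
    USE_LEFT,             /* call through the left operand's vtable */
    USE_RIGHT_SWAPPED,    /* reorder operands, call the right one's vtable */
    NEEDS_COMPILER_HELP   /* plain vtable dispatch is not enough */
} Dispatch;

/* Only one property matters here: builtin type or custom type. */
typedef struct {
    bool is_custom;
} Operand;

/* Assumption from the thread: + and * tolerate arithmetic reordering. */
bool is_commutative(BinOp op)
{
    return op == OP_ADD || op == OP_MUL;
}

/* Decide whose vtable should supply the function for "left op right". */
Dispatch choose_vtable(BinOp op, const Operand *left, const Operand *right)
{
    /* A custom left side, or two builtins: the left vtable is fine. */
    if (left->is_custom || !right->is_custom)
        return USE_LEFT;

    /* Builtin on the left, custom on the right, e.g. 12 * @bar. */
    if (is_commutative(op))
        return USE_RIGHT_SWAPPED;

    /* 12 - @bar or 12 / @bar: swapping would change the result. */
    return NEEDS_COMPILER_HELP;
}
```

The third outcome is the case the message says needs more support than the vtable definitions provide, unless the rule becomes "left hand side wins for custom types".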
RE: PDD 2, vtables [pointers to related documentation]
From: Tim Bunce [mailto:[EMAIL PROTECTED]]
> On Tue, Feb 06, 2001 at 12:28:23PM -0500, Dan Sugalski wrote:
>> At 11:26 AM 2/6/2001 +, Tim Bunce wrote:
>>> On Mon, Feb 05, 2001 at 05:14:44PM -0500, Dan Sugalski wrote:
>>>> =head2 Core datatypes
>>>>
>>>> For ease of use, we define the following semi-abstract data types
>>>
>>> Probably worth stating upfront that it'll be easy to add new types to avoid people arguing for their favorite type to be added here.
>>
>> I'm not sure it should be--that'd mean extending the vtables in ways they have little room to grow. Adding new perl datatypes is easy, adding new low-level types is harder.
>
> That's pretty much what I meant. I think it's worth saying.

Comments like the ones Tim is suggesting are just what someone like myself needs: a statement of the obvious and the context in which it fits, for those who haven't a clue and are trying to piece together a conceptual model of the system. It saves me the conflict of deciding whether to pester you all with questions or not ask, never know, and continue to be my own little mushroom.

>>>> =head1 REFERENCES
>>>>
>>>> PDD 3: Perl's Internal Data Types.
>>>
>>> Some references to any other vtable based languages would be good. (I presume people have looked at some and learnt lessons.)
>>
>> Alas not. This is pretty much head of zeus stuff, modulo some ego. (Mine's not *that* big...)
>
> Without studying history we may be doomed to repeat it. So can anyone point to vtable based language implementations?

Well, I may be one of the least qualified subscribers on this list, but I'm a pretty good gopher... Some of this relates to languages implementing vtables as opposed to being implemented with them. Everything I've scanned so far seems to raise a flag about the associated overhead...
Title:    Portable Inheritance and Polymorphism in C
URL:      http://www.embedded.com/97/fe29712.htm
Abstract: A lower-level view that assumes only a procedural language like C for embedded developers who want to apply OO without switching to an OO language

Title:    Programming Language Pragmatics
URL:      http://www.amazon.com/exec/obidos/ASIN/1558604421
Abstract: Mentions virtual methods and tables in Sections 10.4-5. It discusses vtables from a high level in the general context of Eiffel, Simula, C++, and Ada.

Title:    SableVM: A Research Framework for the Efficient Execution of Java Bytecode
URL:      http://www.j-meg.com/~egagnon/sable-report_2000-3/
Abstract: SableVM is an open-source virtual machine for Java2, intended as a research framework for efficient execution of Java bytecode. The framework is essentially composed of an extensible bytecode interpreter using state-of-the-art and innovative techniques. Written in the C programming language, and assuming minimal system dependencies, the interpreter emphasizes high-level techniques to support efficient execution. In particular, we introduce new data layouts for classes, virtual tables and object instances that reduce the cost of interface method calls to that of normal virtual calls, allow efficient garbage collection and light synchronization, and make effective use of memory space.

Title:    C++ Producer Guide
URL:      http://www.cse.unsw.edu.au/~patrykz/TenDRA/tcpplus/lib.html#vtable
Abstract: vtable implementation in C++

Title:    C++ ABI for IA-64
URL:      http://reality.sgi.com/dehnert_engr/cxx/abi.html
Abstract: vtables layout, etc. is discussed in sections 2.5-2.6 and scattered throughout. You can find similar information in C++ ABI documentation for Macintosh, etc.
Re: yydestruct leaks
[EMAIL PROTECTED] wrote:
> Hmm, so this is (kind of) akin to the regcomp fix - it is the "new" stuff that is in yyl?val that should be free-d. And it is worse than that as yyl?val is just the top of the parser state, so if I understand correctly it isn't only their current values, but also anything that is in the parser's stack (ysave->yyvs) at the time-of-death that needs to go. And all of those use the horrid yacc-ish YYSTYPE union, so we don't know what they are. Yacc did, it had a table which mapped tokens to types which it used to get union assigns right. But byacc does not put that info anywhere for run-time to use, so to get it right we would need to re-process perly.y and then look at the state stack as we popped it.

Yup - that's about the size of it.

> Yugh. The way I usually do this is make YYSTYPE an "object" or something like a Pascal variant record - which has a tag.

That was my idea - I just couldn't figure out any clean way of capturing the type information. If only byacc had a $$type variable as well as $$ etc...

> This would not be easy to fix for perl5. The best I can come up with is to make them all OP *, inventing special parse-time-only "ops" which can hold ival/pval/gvval values. Then yydestruct could just free the ops in yylval and yyvs[], freeing a gvalOP or pvalOP would do the right thing. Almost certainly far more than we want to do to the maint branch.

That seems workable, although as you say, far too radical for the maint branch :-(

> The other way this mess is handled is to use a "slab allocator" e.g. GCC used an "obstack" - this allows all the memory allocated since one noted a "checkpoint" to be free-d. One could fake that idea by making malloc "pluggable" and plugging it during parse to build a linked list or some such.

Well, that's kinda what we have with the scope stack, the problem is that you don't know the type of the thing that needs freeing.

> The down side of that scheme is that auxiliary allocators tend to upset Purify-like tools almost as much as memory leaks do.

I've tried 2 approaches to this. The first is to add "#ifdef PURIFY" code to pp_ctl.c along the lines of the following:

    S_doeval(...)
    {
        ...
        /* Flush any existing leaks to the log */
        purify_new_leaks();
        ...
        if (yyparse() == failed) {
            ...
            /* Ignore any leaks */
            purify_clear_leaks();
        }
        ...
    }

However I'm still suspicious of this because of the number of leaks that only appear when S_doeval is somewhere in the stack trace.

The other approach is to postprocess the purify log and ignore anything that has S_doeval or Perl_pp_entereval in the stack. That's the approach I'm currently using, but of course it ignores any real leaks that coincidentally appear within an eval. I think I'll try getting rid of as many leaks as possible under this restricted regime - even with this restriction, and with the bugs I've already fixed, the test suite contains 141 memory errors.

The truth of the matter is that I suspect eval and die will always leak until it is re-architected in perl6 - whenever that might be.

Alan Burlison
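The "variant record with a tag" idea quoted in this message can be sketched in C as follows. This is an illustration under stated assumptions, not perl's parser code: `TaggedVal`, `ValTag`, and the helper functions are hypothetical names, and only two of the value kinds mentioned in the thread (ival and pval) are modelled. The point is that once every semantic value carries its own tag, a cleanup routine can walk the parser's value stack and free what is there without needing byacc's token-to-type table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tags for the kinds of value a stack slot can hold. */
typedef enum { VAL_NONE, VAL_IVAL, VAL_PVAL } ValTag;

/* A tagged replacement for the bare YYSTYPE union. */
typedef struct {
    ValTag tag;
    union {
        long  ival;   /* plain integer, nothing to free */
        char *pval;   /* heap-allocated string, owned by this slot */
    } u;
} TaggedVal;

TaggedVal make_ival(long i)
{
    TaggedVal v;
    v.tag = VAL_IVAL;
    v.u.ival = i;
    return v;
}

TaggedVal make_pval(const char *s)
{
    TaggedVal v;
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    v.tag = VAL_PVAL;
    v.u.pval = copy;   /* may be NULL if malloc failed */
    return v;
}

/* Free whatever one slot owns; returns 1 if heap memory was released. */
int tagged_destroy(TaggedVal *v)
{
    int freed = 0;
    if (v->tag == VAL_PVAL && v->u.pval != NULL) {
        free(v->u.pval);
        freed = 1;
    }
    v->tag = VAL_NONE;   /* makes a second destroy harmless */
    v->u.ival = 0;
    return freed;
}

/* Pop and destroy every live slot, as a cleanup hook might do when a
 * parse dies part-way through. Returns the number of slots freed. */
int destroy_value_stack(TaggedVal *stack, size_t depth)
{
    int freed = 0;
    while (depth > 0)
        freed += tagged_destroy(&stack[--depth]);
    return freed;
}
```

Resetting the tag to `VAL_NONE` after freeing is a deliberate choice in this sketch: it lets the cleanup run more than once over the same slots without a double free.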
Re: yydestruct leaks
Alan Burlison [EMAIL PROTECTED] writes:
> If an eval{} fails because of a syntax error, yydestruct is called on leaving the eval scope. Unfortunately it does this:
>
>     yyval = ysave->oldyyval;
>     yylval = ysave->oldyylval;
>
> So anything that is in those 2 vars that hasn't made its way into the parse tree is lost forever.
>
> I know this is an old problem, but I've been trying to think of a way to fix it. Has anybody any suggestions?

Hmm, so this is (kind of) akin to the regcomp fix - it is the "new" stuff that is in yyl?val that should be free-d.

And it is worse than that as yyl?val is just the top of the parser state, so if I understand correctly it isn't only their current values, but also anything that is in the parser's stack (ysave->yyvs) at the time-of-death that needs to go.

And all of those use the horrid yacc-ish YYSTYPE union, so we don't know what they are. Yacc did, it had a table which mapped tokens to types which it used to get union assigns right. But byacc does not put that info anywhere for run-time to use, so to get it right we would need to re-process perly.y and then look at the state stack as we popped it.

Yugh.

The way I usually do this is make YYSTYPE an "object" or something like a Pascal variant record - which has a tag.

This would not be easy to fix for perl5. The best I can come up with is to make them all OP *, inventing special parse-time-only "ops" which can hold ival/pval/gvval values. Then yydestruct could just free the ops in yylval and yyvs[], freeing a gvalOP or pvalOP would do the right thing. Almost certainly far more than we want to do to the maint branch.

The other way this mess is handled is to use a "slab allocator" e.g. GCC used an "obstack" - this allows all the memory allocated since one noted a "checkpoint" to be free-d. One could fake that idea by making malloc "pluggable" and plugging it during parse to build a linked list or some such.

The down side of that scheme is that auxiliary allocators tend to upset Purify-like tools almost as much as memory leaks do.

-- Nick Ing-Simmons
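The "fake an obstack with a linked list" idea from this message could look roughly like the C sketch below. It is not GCC's obstack API and not anything in perl; `Arena`, `Header`, `Checkpoint`, and the three functions are hypothetical names chosen for the example. Each allocation is chained onto a list, a checkpoint is just the list head at some moment, and releasing to a checkpoint frees everything allocated after it.

```c
#include <assert.h>
#include <stdlib.h>

/* Header placed in front of every allocation. The extra union members
 * exist only to keep the payload that follows suitably aligned. */
typedef union Header {
    union Header *prev;   /* previously allocated block, or NULL */
    long double align_ld;
    long long   align_ll;
} Header;

typedef struct {
    Header *head;   /* most recent allocation */
    size_t  live;   /* number of blocks currently allocated */
} Arena;

/* A checkpoint is simply the list head at the time it was taken. */
typedef Header *Checkpoint;

void *arena_alloc(Arena *a, size_t n)
{
    Header *h = malloc(sizeof(Header) + n);
    if (h == NULL)
        return NULL;
    h->prev = a->head;
    a->head = h;
    a->live++;
    return h + 1;   /* payload starts just past the header */
}

Checkpoint arena_mark(const Arena *a)
{
    return a->head;
}

/* Free every block allocated since the checkpoint was taken.
 * Returns the number of blocks released. */
size_t arena_release(Arena *a, Checkpoint mark)
{
    size_t freed = 0;
    while (a->head != NULL && a->head != mark) {
        Header *dead = a->head;
        a->head = dead->prev;
        free(dead);
        a->live--;
        freed++;
    }
    return freed;
}
```

A failed parse would release back to a checkpoint taken before the parse began, and no type information is needed for that. The catch is that anything already handed to the parse tree would be released as well, so a successful parse has to keep its blocks rather than release them, and the down side noted above about auxiliary allocators and Purify-like tools still applies.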