Re: This week's summary
On Wed, 22 Sep 2004 21:11:02 +0100, The Perl 6 Summarizer <[EMAIL PROTECTED]> wrote: > The Perl 6 Summary for the week ending 2004-09-17 >Another week, another summary, and I'm running late. So: > > This week in perl6-compiler > > Bootstrapping the grammar >Uri Guttman had some thoughts on bootstrapping Perl 6's grammar. He >hoped that his suggested approach would enable lots of people to work on >the thing at once without necessarily getting in each other's way. Adam >Turoff pointed everyone at a detailed description of how Squeak (a free >Smalltalk) got bootstrapped. > >http://xrl.us/c6kp This link doesn't seem to be working, and www.perl6.org doesn't have the archives of perl6-compiler online yet. Does anyone have a link to the archives that works?
Re: Please rename 'but' to 'has'.
At 09:45 AM 04-26-2002 -0700, Larry Wall wrote: >Tim Bunce writes: >: For perl at least I thought Larry has said that you'll be able to >: create new ops but only give them the same precedence as any one >: of the existing ops. > >Close, but not quite. What I think I said was that you can't specify >a raw precedence--you can only specify a precedence relative to an >existing operator. That way it doesn't matter what the initial >precedence assignments are. We can always change them internally. > >: Why not use a 16 bit int and specify that languages should use >: default precedence levels spread through the range but keeping the >: bottom 8 bits all zero. That gives 255 levels between '3' and '4'. >: Seems like enough to me! >: >: Floating point seems like over-egging the omelette. > >It's also under-egging the omelette, and not just because you >eventually run out of bits. I don't think either integer or floating >point is the best solution, because in either case you have to remember >separately how many levels of derivation from the standard precedence >levels you are, so you know which bit to flip, or which increment to >add or subtract from the floater.

So you'd have something like:

    sub operator:mult($a, $b) is looser('*') is inline {...}
    sub operator:add($a, $b) is tighter("+") is inline {...}
    sub operator:div($a,$b) is looser("/") is inline {...}

Assuming default Perl5 precedences for *, +, and /, you would have the precedence strings for *, +, /, mult, add, and div be "S", "R", "S", "S2", "S1", "S2" respectively? So mult and div would have the same precedence? Hmmm.

What problems would be caused by:

    sub operator:radd($a,$b) is tighter("+") is inline is rightassociative {...}
    sub operator:ladd($a,$b) is tighter("+") is inline is leftassociative {...}

Right now, all the operator precedence levels in Perl5 have either right, left, or no associativity, but they do not mix right and left associative operators. Will that be allowed in Perl6?

>Larry
Re: PMCs, setting, and suchlike things
At 03:43 PM 02-13-2002 +, Dave Mitchell wrote: >Dan Sugalski <[EMAIL PROTECTED]> wrote: > > >So in the following: > > > > > >my Complex $c = 3+4i; > > >my $plain = 1.1; > > >$plain = $c; > > > > > >I presume that $plain ends up as type Complex (with value 3+4i)? > > > > Yup. > > > > >If so, how does $plain know how to "morph itself into the RHS's type"? > > > > The general rule is: If a PMC is not a fixed type, it tosses its > > contents and becomes whatever's assigned to it. If it is a fixed > > type, it extracts what it can as best it can from the source and uses > > that. > >Thanks. >I just want to assert/clarify that the job of "becoming whatever's >assigned to it" is delegated to the src PMC, since $plain won't itself know >how to do this?

I assumed that the logic for assigning PMC to PMC would be something like:

    if (destPMC is specified as typeX) {
        if (srcPMC ISA typeX) {
            destPMC <- srcPMC
        } else {
            destPMC <- typeX.convert(srcPMC);
        }
    } else {
        destPMC <- srcPMC
    }

in pseudocode form. If we assume that there is a universal "root" type such that all PMCs are ISA typeRoot, and that typeX.convert(PMCofTypeY) is trivial if typeY ISA typeX, then this simplifies to

    destPMC <- destPMC.declaredtype.convert(srcPMC);

Why does that look too simple?
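For what it's worth, here is a minimal sketch of that simplified rule in Python rather than Parrot pseudocode. Everything in it (PMCType, PMC, ROOT, NUM, COMPLEX, assign) is a hypothetical name made up for illustration, and it assumes single inheritance for ISA; it is not how Parrot does or will do it.

```python
class PMCType:
    """Hypothetical type descriptor with a single-inheritance ISA chain."""

    def __init__(self, name, parent=None, coerce=lambda payload: payload):
        self.name = name
        self.parent = parent
        self.coerce = coerce  # how to squeeze a foreign payload into this type

    def isa(self, other):
        t = self
        while t is not None:
            if t is other:
                return True
            t = t.parent
        return False

    def convert(self, src):
        # Trivial case: the source already ISA this type, so take it as-is.
        if src.type.isa(self):
            return src
        # Otherwise extract what we can from the source payload.
        return PMC(self, self.coerce(src.payload))


class PMC:
    def __init__(self, type_, payload):
        self.type = type_
        self.payload = payload


# Assumed universal root type: every PMC ISA ROOT.
ROOT = PMCType("Root")
NUM = PMCType("Num", ROOT,
              coerce=lambda p: float(p.real) if isinstance(p, complex) else float(p))
COMPLEX = PMCType("Complex", ROOT, coerce=complex)


def assign(declared_type, src):
    """destPMC <- destPMC.declaredtype.convert(srcPMC)"""
    return declared_type.convert(src)
```

With this, an untyped destination (declared type ROOT) simply becomes the Complex value, while a destination declared as NUM extracts the real part as best it can.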
Re: scheme-pairs?
At 11:32 AM 01-24-2002 -0500, Dan Sugalski wrote: >At 4:19 PM + 1/24/02, Dave Mitchell wrote: >>Dan Sugalski <[EMAIL PROTECTED]> wrote: >>> That was my biggest objection. I like the thought of having a scheme >>> pair data type. The interpreter should see it, and it should be >>> accessed, as a restricted array, one with only two entries. >> >>Is this then the same datatype as a Perl6 pair (cf '=>' op in Apo 3) ?? > >Good point. it probably is, yes. (Though there may be potential >differences--depends on whether the scheme pair can only have scalars on >each side, or should allow other things)

In scheme, at least, pairs can contain any data on either side. The notation for a pair is (value . value), and standard list notation (a b c d e f g) is simply syntactic sugar for (a . (b . (c . (d . (e . (f . (g . '()))))))). Although only the cdr of each of these pairs contains a pair, in a list like ((a a) (b b)) (also written as "((a . (a . '())) . ((b . (b . '())) . '()))"), both the car and cdr of the outermost pair contain pairs.
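A small sketch of the list-as-nested-pairs idea, in Python for convenience. The names cons, from_list and to_dotted are made up here, and using a 2-tuple for a pair and None for the empty list is just an assumption of the sketch (so tuples and None cannot themselves be stored as atoms).

```python
NIL = None  # stands in for '()


def cons(car, cdr):
    """A pair is just a two-entry container; either side may hold anything."""
    return (car, cdr)


def from_list(items):
    """Build (a b c) as (a . (b . (c . '())))."""
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result


def to_dotted(x):
    """Render a value in fully dotted pair notation."""
    if x is NIL:
        return "'()"
    if isinstance(x, tuple):
        return "(" + to_dotted(x[0]) + " . " + to_dotted(x[1]) + ")"
    return str(x)
```

Calling to_dotted(from_list([from_list(["a", "a"]), from_list(["b", "b"])])) gives the nested form quoted above, with pairs in both the car and the cdr of the outermost pair.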
Re: RFC: Bytecode file format
At 03:10 PM 09-14-2001 -0500, Brian Wheeler wrote: >I've been thinking alot about the bytecode file format lately. Its >going to get really gross really fast when we start adding other >(optional) sections to the code. > >So, with that in mind, here's what I propose: >What do you guys think? Have you taken a look at the old Amiga IFF format? It consisted mainly of "chunks" identified by a 32-bit type code and a chunk-length code. While most implementations were for specific multi-media applications (chunks defining sound formats, chunks defining image formats, etc), the standard itself was data-neutral. I believe that Microsoft is using a derivative of that format for some of its files, and I think that TIFF files are another instantiation. It may be worth looking at to avoid re-inventing wheels. >Brian
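To make the chunk idea concrete, here is a rough Python sketch of reading and writing IFF-style chunks: a 4-byte type code, a 32-bit big-endian length, then the data, padded to an even length. The chunk names used with it (CODE, DBUG) are invented for illustration and are not part of any proposed bytecode format.

```python
import struct

HEADER = ">4sI"  # 4-byte chunk type, 32-bit big-endian length
HEADER_SIZE = struct.calcsize(HEADER)


def make_chunk(chunk_type, payload):
    """Serialize one chunk, padding odd-length data with a zero byte."""
    pad = b"\x00" if len(payload) % 2 else b""
    return struct.pack(HEADER, chunk_type, len(payload)) + payload + pad


def iter_chunks(data):
    """Yield (type, payload) for each chunk; unknown types can simply be skipped."""
    offset = 0
    while offset + HEADER_SIZE <= len(data):
        chunk_type, length = struct.unpack_from(HEADER, data, offset)
        start = offset + HEADER_SIZE
        yield chunk_type, data[start:start + length]
        offset = start + length + (length & 1)  # step over the pad byte
```

The attraction for optional bytecode sections is that a reader which doesn't understand a chunk type can use the length to jump past it.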
Re: Using int32_t instead of IV for code
At 04:55 PM 09-13-2001 -0400, Andy Dougherty wrote: >In perl.perl6.internals, you wrote: > > >The attached patch makes all bytecode have a type of int32_t rather than > >IV; it also contains the other stuff I needed to get the tests running > >on my Alpha (modifications to config.h.in and register.c). > >I think this is a bad idea. There simply is no guarantee that there's >a native integral type with 32 bits. And having an int32_t type that >*isn't* 32-bits is just plain confusing. Just ask anyone who's gotten >burnt by perl5's I32, which has the exact same problem. Well, since bytecode is defined to be 32-bit, it makes sense to define it as an int32_t type and have the definition of an int32_t be platform-specific.
Re: Math functions? (Particularly transcendental ones)
Dan Sugalski <[EMAIL PROTECTED]> writes: > At 07:43 PM 9/8/2001 -0700, Wizard wrote: > >Questions regarding Bitwise operators: > > > > > =item rol tx, ty, tz * > >... > > > =item ror tx, ty, tz * > > > >Are these with or without carry? > > That's a good question. Now that we have a list of bitwise ops, we can > decide how they work. What happens when you rotate/shift/bit-or a float? Or > a bitint/bigfloat? Or a string? Important questions, and we can hammer > something out now that we know what they are.

I'd like to suggest that the shift- and roll/rotate- ops take a 4th parameter, that being the "word"-size in bits. For Bigints and arbitrary-length bit-vectors, the size of a "word" to rotate or shift could be infinite, which probably isn't what is wanted. It would also simplify operations that might come up in some cryptographic routines, like "rotate the upper 64 bits left 3 bits", which would be encoded as (assuming "rotate_l dest, source, roll-amount, wordsize"):

    rotate_l P1, P1, 64, 128
    rotate_l P1, P1, 3, 64
    rotate_r P1, P1, 64, 128

Just my 2 centums.
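Here is a sketch of what I have in mind, using Python's arbitrary-precision integers as a stand-in for a Bigint. It assumes the op rotates only the low wordsize bits and leaves any higher bits untouched, and that values are non-negative; the function names mirror the hypothetical ops above.

```python
def rotate_l(value, amount, wordsize):
    """Rotate the low `wordsize` bits of value left; higher bits are untouched."""
    mask = (1 << wordsize) - 1
    word = value & mask
    amount %= wordsize
    rotated = ((word << amount) | (word >> (wordsize - amount))) & mask
    return (value & ~mask) | rotated


def rotate_r(value, amount, wordsize):
    """Rotating right by n is rotating left by wordsize - n."""
    return rotate_l(value, wordsize - (amount % wordsize), wordsize)


def rotate_upper64_left3(value):
    """The three-op sequence: swap halves, rotate the low 64 bits, swap back."""
    value = rotate_l(value, 64, 128)
    value = rotate_l(value, 3, 64)
    return rotate_r(value, 64, 128)
```

Without the explicit word size there is no sensible answer to "rotate left by 3" for a value that can grow without bound.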
Re: Math functions? (Particularly transcendental ones)
Dan Sugalski <[EMAIL PROTECTED]> writes: > Okay, I'm whipping together the "fancy math" section of the interpreter > assembly language. I've got: > Can anyone think of things I've forgotten? It's been a while since I've > done numeric work. Uri mentioned exp(x) = e^x, but I think if you are going to include log2, log10, log, etc, you should also include ln.
Re: pads and lexicals
At 10:45 AM 09-06-2001 -0400, Ken Fox wrote: >Dave Mitchell wrote: > > So how does that all work then? What does the parrot assembler for > > > > foo($x+1, $x+2, ..., $x+65) >The arg list will be on the stack. Parrot just allocates new PMCs and >pushes the PMC on the stack. > >I assume it will look something like > > new_pmc pmc_register[0] > add pmc_register[0], $x, 1 > push pmc_register[0] > > new_pmc pmc_register[0] > add pmc_register[0], $x, 2 > push pmc_register[0] > > ... > > call foo, 65

Hmmm, I assumed it would be something like:

    load $x, P0      ;; load $x into PMC register 0
    new P2           ;; Create a new PMC in register 2
    push p0,p2       ;; Make P2 be ($x)
    add p0,#1,p1     ;; Add 1 to $x, store in PMC register 1
    push p1,p2       ;; Make P2 be ($x,$x+1)
    add p0,#2,p1     ;; Add 2 to $x, store in PMC register 1
    push p1,p2       ;; Make P2 be ($x,$x+1,$x+2)
    ...
    call foo,p2      ;; Call foo($x,$x+1,...,$x+65)

Although this would be premature optimization, since I see this idiom being used a lot, it may be useful to have some special-purpose ops to handle creating arg-lists, like a "new_array size,register" op, that would create a new PMC containing a pre-sized array (thus eliminating repeatedly growing the array with the push ops), or a "push5 destreg, reg1, reg2, reg3, reg4, reg5" op (and corresponding pushN ops for N=2 to 31) that push the specified registers (in order) onto the destreg.

>Hmm. It didn't occur to me that raw values might go on the call >stack. Is the call stack going to store PMCs only? That would >simplify things a lot.

If ops and functions should be able to be used interchangeably, I wouldn't expect any function arguments to be stored on the stack, but passed via registers (or lists referenced in registers).

>- Ken
Re: More character matching bits
Jarkko Hietaniemi <[EMAIL PROTECTED]> writes: > > Perl came from ASCII-centric roots, so it's likely that most of our > > biases are ASCII-centric. And for a couple of reasons, it's going to > > be hard to deal with that: > > > > 1. Backwards compatability with existing Perl practice, > > > > and > > > > 2. To do language-neutral right is -really- hard; look at locales and > > Unicode as examples. > > > > As such, instead of trying to make Perl work for all languages out of > > the box, why not make Perl's language handling extensible from within > > the language and have it be as language-free as possible (except for > > backwards compatability stuff) out of the box. > > Right on. > > > Examples of what we can do: > > > > I. Make ranges work on Unicode code-points (if they don't already). > > U, yes, they do, if you by code-point ranges mean \x{...}-\x{...} > but in general I would like to discourage the use of ranges. What do > you think [a-\N{KATAKANA LETTER KI}] should mean? I think it should > mean a compile time error. People misuse ranges for classes. Ranges > also imply some collation, which is, as discussed, really bad. I think, following my line of thought, that [a-\N{KATAKANA LETTER KI}] should be equivalent to [\x{0061}-\x{30AD}], which would match any of the 12365 characters between \x{0061} and \x{30AD}. Admittedly, this probably isn't that useful of a class, but it's what I see was asked for. Collation is something I hadn't considered. My initial thought would be that by default, collation order would be code-point order, but that should probably be able to be overridden. Code-point order at least allows us to collate 'a' and KATAKANA LETTER KI, which I can't think of any other sensible way to do it. > > -- > $jhi++; # http://www.iki.fi/jhi/ > # There is this special biologist word we use for 'stable'. > # It is 'dead'. -- Jack Cohen
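As a quick sanity check on the code-point reading of that range, here is a Python sketch. It treats the range purely as a span of code points (assigned or not), which is the assumption behind the 12365 figure; in_range is a made-up helper name.

```python
import unicodedata

LO = ord("a")
HI = ord(unicodedata.lookup("KATAKANA LETTER KI"))


def in_range(ch):
    """Membership in [a-\\N{KATAKANA LETTER KI}] read as a code-point range."""
    return LO <= ord(ch) <= HI


# Inclusive count of code points in the range, not of assigned characters.
CODE_POINTS_IN_RANGE = HI - LO + 1
```

Note that uppercase ASCII falls outside the range while most of hiragana falls inside it, which is part of why such a class is of doubtful use.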
Re: More character matching bits
Dan Sugalski <[EMAIL PROTECTED]> writes: > > We probably also ought to answer the question "How accommodating to > non-latin writing systems are we going to be?" It's an uncomfortable > question, but one that needs asking. Answering by Larry, probably, but > definitely asking. Perl's not really language-neutral now (If you think so, > go wave locales at Jarkko and see what happens... :) but all our biases are > sort of implicit and un (or under) stated. I'd rather they be explicit, > though I know that's got problems in and of itself.

Perl came from ASCII-centric roots, so it's likely that most of our biases are ASCII-centric. And for a couple of reasons, it's going to be hard to deal with that:

1. Backwards compatibility with existing Perl practice,

and

2. To do language-neutral right is -really- hard; look at locales and Unicode as examples.

As such, instead of trying to make Perl work for all languages out of the box, why not make Perl's language handling extensible from within the language and have it be as language-free as possible (except for backwards compatibility stuff) out of the box.

Examples of what we can do:

I. Make ranges work on Unicode code-points (if they don't already).

II. Make POSIX-style character classes (e.g. [:space:]) user-definable and modifiable. That way, a Unicode::Japanese module could do something like:

    [:hiragana:] = /[\x{3041}-\x{3094}]/;
    [:katakana:] = /[\x{30A1}-\x{30F4}]/;
    [:kana:] = [:hiragana:] + [:katakana:];

and then each of those three classes could be used in RE's when needed.

III. Allow for character equivalence tables to be user-definable. This would allow for the /i behavior of RE's to be generalized. As an example, consider the following code:

    $kanainsensitive = td/[:hiragana:]/[:katakana:]/;
    if ($japanesetext =~ m/$japanesepattern/i{$kanainsensitive}) {
        print "$japanesetext matched $japanesepattern\n";
    }

The new td// construct would create a character equivalence table that could be used with a generalized /i option to indicate that hiragana and katakana should be treated equivalently. A more sophisticated example could be:

    $vowelsoptional = td/aeiouAEIOU//;

which would make vowels equivalent to no characters at all. For certain applications, it would be useful to allow matches of more than one character:

    $kanainsensitive += td/\x{304B}\x{3042}/\x{30AB}\x{30FC}/r
                      + td/\x{304D}\x{3044}/\x{30AD}\x{30FC}/r
                      + ... ;

In this case, it represents the fact that long vowels are represented by one form in hiragana (HIRAGANA LETTER KA + HIRAGANA LETTER A), and a different form in katakana (KATAKANA LETTER KA + KATAKANA-HIRAGANA PROLONGED SOUND MARK). I used a /r there to indicate that the two parts of the td/// are regular expressions which are designed to be treated as equivalent. That would allow lines like those above to be written along the lines of:

    $kanainsensitive += td/([\x{304B}\x{304D}])[\x{3042}\x{3044}]/\1\x{30FC}/r;

It would also allow people to deal with combining forms, although there are probably better ways than this.

IV. Make the character class switches be redefinable, but default to the current set. That would allow someone who is doing lots of work in Japanese to be able to use \w to mean kanji, hiragana, and katakana instead of the default of [0-9A-Za-z_].

There are probably lots of things I overlooked, but if it can be done cheaply, abstracting out the existing biases and making them user-expandable/definable would probably go a long way towards getting rid of language bias.

> > Dan > > --"it's like this"--- > Dan Sugalski even samurai > [EMAIL PROTECTED] have teddy bears and even > teddy bears get drunk
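To show roughly what a kana-insensitive match would do, here is a Python sketch that folds hiragana onto katakana before matching, much as /i folds case. It is only an approximation of the td// idea: it handles single-character equivalences (no long-vowel rules), and fold_kana and kana_insensitive_search are made-up names. It relies on the fact that hiragana U+3041..U+3096 sit exactly 0x60 below their katakana counterparts.

```python
import re

# Single-character equivalence table: hiragana code point -> katakana code point.
HIRAGANA_TO_KATAKANA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}


def fold_kana(text):
    """Map every hiragana character to the corresponding katakana."""
    return text.translate(HIRAGANA_TO_KATAKANA)


def kana_insensitive_search(pattern, text):
    """Fold both pattern and subject, then match as usual."""
    return re.search(fold_kana(pattern), fold_kana(text))
```

The table lives outside the regex engine, which is the point: the hook is generic, and the mapping could ship in a library module.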
Re: More character matching bits
At 01:14 PM 06-11-2001 -0700, Russ Allbery wrote: >Dan Sugalski <[EMAIL PROTECTED]> writes: > > At 01:05 PM 6/11/2001 -0700, Russ Allbery wrote: > >> Dan Sugalski <[EMAIL PROTECTED]> writes: > > >>> Should perl's regexes and other character comparison bits have an > >>> option to consider different characters for the same thing as > >>> identical beasts? I'm thinking in particular of the Katakana/Hiragana > >>> bits of japanese, but other languages may have the same concepts. > > >> I think canonicalization gets you that if that's what you want. > > > I don't think canonicalization should do this. (I really hope not) This > > isn't really a canonicalization matter--words written with one character > > set aren't (AFAIK) the same as words written with the other, and which > > alphabet you use matters. (Which sort of argues against being able to do > > this, I suppose...) > >I guess I don't know what the definition of "the same thing" you're using >here is. I thought Dan was talking about something equivalent to the m//i functionality. Would it, or should it, be possible to tell m// to treat Katakana characters as the same as hiragana characters, in much the same way as m//i treats UPPERCASE the same as lowercase? Canonicalization won't get you that. My feeling is that the hooks should be there, but the specific equivalence mappings should be in the library, not the core.
Re: Should we care much about this Unicode-ish criticism?
Nick Ing-Simmons <[EMAIL PROTECTED]> writes: > Dan Sugalski <[EMAIL PROTECTED]> writes: > > > >It does bring up a deeper issue, however. Unicode is, at the moment, > >apparently inadequate to represent at least some part of the asian > >languages. Are the encodings currently in use less inadequate? I've been > >assuming that an Anything->Unicode translation will be lossless, but this > >makes me wonder whether that assumption is correct. > > One reason perl5.7.1+'s Encode does not do asian encodings yet is that > the tables I have found so far (Mainly Unicode 3.0 based) are lossy. Er, are the Unicode tables going to be embedded in /usr/bin/perl6? That doesn't give me a warm, cozy feeling about Perl-6 support of Unicode. I think it's great that Perl internals will be able to handle arbitrary strings of Unicode characters (using some version of UTF-*), but may I suggest that anything that relies on the properties of characters (case, conversions, combining, visibility, etc) require explicit library support? We'd lose some things, like normalization, but we wouldn't have to carry around huge tables, either. > > > -- > Nick Ing-Simmons > who is looking for a new job see http://www.ni-s.u-net.com/
Re: Stacks & registers
At 12:59 PM 05-23-2001 -0400, Dan Sugalski wrote: >Okay, folks, here's the current conundrum: > >Should Parrot be a register or stack-based system, and if a register-based >one, should we go with typed registers? >My current thoughts are this: > >We have a set of N registers. They're all linked. Nothing implicitly sets >values in any of the registers (if you want an integer value, you need to >make one). Each register has a set of validity markers for each type (int, >flaot, string, PMC) that may or may not be bits. We have a stack of sorts >that we can push the registers on to if we need. In the section I snipped, you described "linked" registers in relation to multiple sets of typed registers, with linking meaning that IntR1 would have the same value as FloatR1, etc. What do you mean by "linked" here, with each register being (as I read it) dynamically typed? Is N fixed, or can we have different number of visible registers at a time? When we push registers onto the stack, do we push them individually, or as a set? 
I mean, can we get away with something like (assuming C++-style overloading on "Register"):

--
    Register rFile[MAXREGSTACK][N];
    int rDepth = 0;

    ParrotOp add(int addend, int addor, int sum) {
        rFile[rDepth][sum] = rFile[rDepth][addor] + rFile[rDepth][addend];
    }

    ParrotOp pushrframe(void) {
        /* rDepth must stay a valid index into rFile after the increment */
        if (rDepth + 1 >= MAXREGSTACK)
            die("Register Stack exceeded");
        rDepth++;
    }

Or we could go the SPARC register window route: (Note: SPARC register windows overlap: they have three sets of 8 registers, and when a push happens, the old 3rd set becomes the new 1st set, allowing the caller and callee to share a set of 8 registers)

    Register *rFile = NULL;
    Register *rFrame = NULL;
    int rFileSize = 0;

    ParrotOp add(unsigned int addend, unsigned int addor, unsigned int sum) {
        assert(sum < 3*N);
        assert(addend < 3*N);
        assert(addor < 3*N);
        rFrame[sum] = rFrame[addend] + rFrame[addor];
    }

    ParrotOp pushRframe(void) {
        /* the new frame spans rFrame+2*N .. rFrame+5*N; grow if it won't fit */
        if ((rFrame - rFile) > rFileSize - 5*N) {
            if (resize(rFile, rFileSize*2)) {
                rFileSize *= 2;
            } else {
                die("Register Stack Frame out of memory");
            }
        }
        rFrame += 2*N;
    }
--

>I'm definitely feeling unsure about this, so feel free (please!) to wade >in with comments, criticisms, or personal attacks... :) > > Dan > >--"it's like this"--- >Dan Sugalski even samurai >[EMAIL PROTECTED] have teddy bears and even > teddy bears get drunk
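The overlapping-window variant can be sketched like this in Python, purely to illustrate how caller and callee share a set of registers. N, RegisterWindows and its methods are invented for the sketch, and it assumes each frame is 3*N registers with successive frames 2*N apart.

```python
N = 8  # registers per set; a visible frame is three sets


class RegisterWindows:
    def __init__(self):
        self.file = [0] * (3 * N)  # backing store, grown on demand
        self.base = 0              # start of the current frame

    def get(self, reg):
        assert 0 <= reg < 3 * N
        return self.file[self.base + reg]

    def set(self, reg, value):
        assert 0 <= reg < 3 * N
        self.file[self.base + reg] = value

    def push_frame(self):
        # Slide by two sets so the old 3rd set becomes the new 1st set.
        self.base += 2 * N
        needed = self.base + 3 * N
        if needed > len(self.file):
            self.file.extend([0] * (needed - len(self.file)))

    def pop_frame(self):
        assert self.base >= 2 * N, "no frame to pop"
        self.base -= 2 * N
```

A caller puts arguments in its last set, the callee sees them in its first set after push_frame, and results travel back the same way on pop_frame.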
Re: PDD 4: Internal data types
At 11:14 AM 03-22-2001 -0800, Hong Zhang wrote: >Please not fight on wording. For most encodings I know of, the concept of >normalization does not even exist. What is your definition of normalization?

To me, the usual definition of "normalization" is conversion of something into a standard form, especially when there are multiple equivalent forms it could be in. Since there are multiple ways within Unicode to express a single character that are considered (by Unicode) to be identical, conversion into a single common form is necessary for comparison purposes.

Example: The sequence of Unicode code points

    006E 0061 0069 0308 0076 0065

and the sequence

    006E 0061 00EF 0076 0065

both represent the same string in Unicode (the English word "naive", with a diaeresis over the i). Both represent 5-character strings, and both are supposed to compare identically. However, they use a different sequence of code points to represent one particular character: the 'i' with a diaeresis: 0069 0308 versus 00EF.

If we have $naive5 and $naive6 be variables containing the two example strings (the 5- and 6-code-point forms respectively), what do we want as the value of the following expressions?

    $naive5 eq $naive6;
    length($naive5);
    length($naive6);

and so forth. As far as my very limited understanding of the Unicode standard goes, they should compare equal, and both have a length of 5. But their encoded byte sequences may not be identical.

>I fully understand this. This is one of the reasons I propose sole UTF-8 >encoding. If length() and substr() depend on string internal encoding, >are they still useful? Who can handle this magic length().

UTF-8 encoding doesn't fix the above problem. UTF-8 would still encode the two strings differently, because they have different code point sequences. For that matter, so would any of the other encoding suggestions. As such, for the above problem, encoding is pretty much a non-issue.
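The two sequences can be checked with Python's standard unicodedata module; this is only meant to illustrate the point about normalization versus encoding, not to suggest what Perl's eq or length should do. The helper name canonically_equal is made up.

```python
import unicodedata

naive6 = "\u006E\u0061\u0069\u0308\u0076\u0065"  # i followed by COMBINING DIAERESIS
naive5 = "\u006E\u0061\u00EF\u0076\u0065"        # precomposed i with diaeresis


def canonically_equal(a, b):
    """Compare after converting both strings to one common form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Code-point-wise the strings differ, and so do their UTF-8 encodings.
raw_equal = naive5 == naive6
utf8_equal = naive5.encode("utf-8") == naive6.encode("utf-8")
```

Here raw_equal and utf8_equal are both False while canonically_equal(naive5, naive6) is True, which is the sense in which the choice of encoding doesn't help.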
Re: Tolkein (was Re: PDD for code comments ????)
At 06:18 PM 02-20-2001 +, Nicholas Clark wrote: >As long as Terry Pratchett writes books faster than perl consumes quotes. >Based on the fact that he's still very alive, we aren't in danger yet. True... And he has some very good quotes. >However, Larry has already commented on the danger of running out of LOTR >quotes: > >http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2000-02/msg00369.html That thread does raise another concern in my mind. While a quote here and a quote there would constitute "Fair Use", if we are seriously in danger of running out of quotes, we are pushing the bounds of "Fair Use". I think, for copyright considerations alone, it's worth expanding to other authors. "Literate Programming"? Well, the Perl source is well read... >Nicholas Clark
Please shoot down this GC idea...
Why won't this work:

As I see it, we can't guarantee that DESTROYable objects will be DESTROYed immediately when they become garbage without a full ref-counting scheme. A full ref-counting scheme is potentially expensive. Even full ref-counting schemes can't guarantee proper and timely destruction in the face of circular data structures, which ref-counting schemes leak. Partial ref-counting is very difficult to get right, and is likely to be even more expensive than full ref-counts.

I haven't seen another possible problem with DESTROY-by-GC brought up: non-refcounting GCs can be fast because they don't have to look at the garbage, only the non-garbage. If we want the GC to DESTROY garbage that needs to be DESTROYed, it will have to look at the garbage to find the DESTROYable garbage -- which negates the advantage of just looking at non-garbage.

So, here's an idea:

1. Maintain a list of DESTROYable objects. This list is automagically maintained by bless and DESTROY.

2. If the compiler can determine that an object is DESTROYable and garbage, the compiler can automatically insert a call to DESTROY at the appropriate place. e.g:

    { $fh = new Destroyable; $fh->methodcalls(); }

could be transformed to:

    { $fh = new Destroyable; $fh->methodcalls(); $fh->DESTROY(); }

This step may not be always possible -- can the compiler determine that $fh->methodcalls doesn't do anything to keep $fh alive? If not, it can't do this step.

3. After finding live objects, the GC would walk the DESTROYables list looking for objects not found alive. If/when it finds them, it DESTROYs them. It needs to do this before it writes over the reclaimed space, so that the data necessary for the DESTROY is still available.

I feel that the number of objects that need to be DESTROYed will likely be small compared to the total number of Perl objects, so the DESTROYables list will be relatively small and fast to walk. Automagically detecting when an object can be DESTROYed (where possible) should also help in keeping the DESTROYables list short.

I'm sure this idea has flaws. But it's an idea. Tell me what I'm missing.
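A toy model of steps 1 and 3, in Python, in case it makes the idea easier to shoot at. It assumes a simple mark phase over explicit reference lists; Obj, mark and collect are made-up names, and the compiler-inserted DESTROY of step 2 is not modelled.

```python
class Obj:
    """Toy heap object; `destroy` is set only for DESTROYable objects."""

    def __init__(self, name, destroy=None):
        self.name = name
        self.refs = []
        self.destroy = destroy


def mark(roots):
    """Find live objects by tracing from the roots; garbage is never visited."""
    live = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in live:
            continue
        live.add(id(obj))
        stack.extend(obj.refs)
    return live


def collect(roots, destroyables):
    """After marking, walk only the DESTROYables list and DESTROY the dead ones."""
    live = mark(roots)
    survivors = []
    for obj in destroyables:
        if id(obj) in live:
            survivors.append(obj)
        else:
            obj.destroy(obj)  # runs before the object's space is reused
    destroyables[:] = survivors
```

The cost of the extra pass is proportional to the length of the DESTROYables list, not to the amount of garbage, and unreachable cycles get DESTROYed too.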
Re: Garbage collection (was Re: JWZ on s/Java/Perl/)
At 01:45 PM 02-12-2001 -0300, Branden wrote: >I think having both copying-GC and refcounting-GC is a good idea. I may be >saying a stupid thing, since I'm not a GC expert, but I think objects that >rely on having their destructors called the soonest possible for resource >cleanup could use a refcount-GC, while other objects that don't need that >could use a copy-GC. I really don't know if this is really feasible, it's >only an idea now. I also note that objects that are associated to resources >aren't typically the ones that get shared much in Perl, so using refcount >for them wouldn't be very expensive... > >Am I too wrong here?

It's... complicated...

Here's an example of where things could go wrong:

    sub foo {
        my $destroyme1 = new SomeClass;
        my $destroyme2 = new SomeClass;
        my @processme1;
        my @processme2;
        ...
        push @processme1, $destroyme1;
        push @processme2, $destroyme2;
        ...
        return \@processme2;
    }

At the end of &foo(), $destroyme1 and @processme1 are dead, but $destroyme2 is alive. If $destroyme1 and $destroyme2 are ref-counted, but @processme1 and @processme2 are not, then at the end of &foo(), both objects will have ref-counts of 1 ($destroyme1 because of the ref from @processme1, which is a spurious ref-count; $destroyme2 because of the ref from @processme2, which is valid). $destroyme1 won't be destroyed until @processme1 is finalized, presumably by the GC, which could take a long time. That ref-count from @processme1 is necessary because if @processme1 escapes scope (like @processme2 did) then $destroyme1 is still alive, and can't be finalized.

Going with full ref-counts solves the problem, because when @processme1 goes out of scope, its ref-count drops to 0, and it gets finalized immediately, thus dropping $destroyme1 to 0, and it gets finalized. But with @processme2, its refcount drops from 2 to 1, so it survives and so does $destroyme2.
Full ref-counting has a potentially large overhead for values that don't require finalization, which is likely the majority of our data. Going with partial ref-counts solves the simple case when the object is only referred to by objects with ref-counts, but could allow some objects' finalization to be delayed until the GC kicks in. Going with no ref-counts doesn't have the overhead of full refcounting, but unless some other mechanism (as yet undescribed) helps, finalization on all objects could be delayed until GC. >- Branden
Re: Another approach to vtables
At 01:14 PM 02-07-2001 -0500, Dan Sugalski wrote: >At 01:35 PM 2/7/2001 -0200, Branden wrote: >>As far as I know (and I could be _very_ wrong), the primary objectives of >>vtables are: >>1. Allowing extensible datatypes to be created by extensions and used in >>Perl. > >Secondarily, yes. > >>2. Making the implementation of `tie' and `overload' more efficient ('cause >>it's very slow in Perl 5). > >No, not at all. This isn't really a consideration as such. (The vtable >functions as desinged are inadequate for most overloading, for example) Hmm, I seem to remember vtables were being cited as a cure for lots of ills (perhaps combined with other aspects, like "make Perl nearly as fast as C".) The vtables were implied (or possibly out-right stated) as giving the low-level core a more object-oriented structure: as you state below, branching and conditionals in the runtime can be eliminated by the values knowing how to operate on themselves. It was also implied (or out-right claimed) that different objects/classes/packages/whatever could have class-specific vtables, defined at run-time, that would be used to handle the class-specific implementation details. I'm not sure what that could refer to except ties and overloading; class-specific methods wouldn't go in the vtable. There was some discussion that allowing the vtables to refer to functions written in perl would be a good idea, as it would allow extensions to be written in perl -- which is a good thing. I had gotten the impression that the perl code-sequence: $a = $b + $c; would generate the same op-code sequence regardless of the type of $a, $b, $c, and the vtables would do all the magic behind the scenes, calling tied or overloaded versions of the base functions if so defined for $a, $b, or $c. Now I seem to be hearing that this is not the case, that variable ties and overloads are at a much higher level, never touching the vtables. 
It now seems that the vtables will exist only for built-in types, and be inaccessible for user-defined types (unless those types are defined by the perl6 equivalent of XS, for example). This almost seems to be defaulting on the promise of vtables I thought was made.
RE: Meta-design
At 03:54 PM 12-06-2000 -0500, Sam Tregar wrote: >On Wed, 6 Dec 2000, Dan Sugalski wrote: > > > Non-refcounting GC schemes are more expensive when they collect, but less > > expensive otherwise, and it apparently is a win for the non-refcount > > schemes. > >Which is why GC is intimately tied to DESTROY consideration in terms of >Perl. If we intend to honor predictable DESTROY timing, and I think we >should, then we will need to reference count. No ifs, elses or >alternations. Anyone care to refute?

This is not a complete refutation, but...

It seems to me that there are three types of thingies[1] we are concerned about, conceptually:

A) Thingies with no DESTROY considerations, which don't need refcounts.

B) Thingies with DESTROY methods, but which aren't timing-sensitive. They can be destroyed anytime after they die. These don't really need refcounts either.

C) Thingies with DESTROY methods which need to be DESTROYed as soon as they die. These would seem to need refcounts.

I think that distinguishing between B and C is a syntax issue out of scope here. Although B could be lumped with A if we could tell B and C apart, I'll assume that we must lump B and C together.

If we could refcount only C for destruction, and let the GC-of-your-choice handle the actual memory reclamation, then the expense of refcounting should only affect C thingies. I am uncertain what the ratio of C thingies to A thingies is, so I can't judge how big a win it is. Theoretically, a non-refcount GC should never find any C thingies that would have a refcount>0, so the non-refcount GC shouldn't have to worry about it.

>If we're going to be ref-counting anyway then the performance gain of a >non-refcounting GC, avoiding counting, is basically moot. If we're >ref-counting for DESTROY timing then we may as well use that data in the >GC.

But we only care about the ref-count for DESTROY timing. If we can avoid counting for DESTROY timing insensitive thingies, we may still have a net performance gain.
>I'm not some kind of ref-count true-believer - if you think we should put
>this discussion off to a later date then I'm cool. I'm just spoiling for
>some Perl 6 work to do and this area seemed ripe for critical development.
>
>-sam
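A minimal sketch, in C, of the "refcount only the timing-sensitive thingies" idea from the reply above. Everything named here (Thingy, thingy_ref, thingy_unref, gc_reclaim, the timely flag) is hypothetical and invented for illustration; none of it is an actual Perl 6 interface, and a real collector would do far more than this.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Thingy Thingy;

struct Thingy {
    void (*destroy)(Thingy *self); /* NULL for type A thingies */
    int timely;                    /* nonzero only for type C thingies */
    int refcount;                  /* maintained only when timely is set */
    int destroyed;                 /* guards against running DESTROY twice */
};

/* Example DESTROY method (assumption): it only counts its invocations. */
int destroy_calls = 0;
void count_destroy(Thingy *self) { (void)self; destroy_calls++; }

/* Run DESTROY at most once, and only if the thingy has one. */
static void run_destroy(Thingy *t) {
    if (t->destroy != NULL && !t->destroyed) {
        t->destroyed = 1;
        t->destroy(t);
    }
}

/* Taking a reference: only type C thingies pay for the count. */
void thingy_ref(Thingy *t) {
    if (t->timely)
        t->refcount++;
}

/* Dropping a reference: a type C thingy is DESTROYed the moment its count
 * reaches zero. Memory is not freed here; that stays with the collector. */
void thingy_unref(Thingy *t) {
    if (!t->timely)
        return;
    assert(t->refcount > 0);
    if (--t->refcount == 0)
        run_destroy(t);
}

/* Called by the (imagined) collector for an unreachable thingy. Type B
 * thingies get their DESTROY here; type C ones have already had theirs. */
void gc_reclaim(Thingy *t) {
    run_destroy(t);
    /* a real collector would release the memory at this point */
}
```

With this split, A and B thingies never touch the refcount field, so the counting cost falls on C thingies alone, which is the trade-off the reply is weighing.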
Re: Opcodes (was Re: The external interface for the parser piece)
At 02:27 PM 11-30-2000 -0500, Dan Sugalski wrote:
>At 05:59 PM 11/30/00 +, Nicholas Clark wrote:
>>On Thu, Nov 30, 2000 at 12:46:26PM -0500, Dan Sugalski wrote:
>> > (Moved over to -internals, since it's not really a parser API thing)
>> >
>> > At 11:06 AM 11/30/00 -0600, Jarkko Hietaniemi wrote:
>> > >Presumably. But why are you then still talking about "the IV slot in
>> > >a scalar"...? I'm slow today. Show me how
>> > >
>> > >   $a = 1.2; $b = 3; $c = $a + $b;
>> > >
>> > >is going to work, what kind of opcodes do you see being used?
>> > >(for the purposes of this exercise, you may not assume the optimizer
>> > > doing $c = (1.2+3) behind the curtains :-)
>>
>>$a=1; $b =3; $c = $a + $b
>
>No, that's naughty--it's much more interesting if the scalars are
>different types.

OK, how would this sequence convert to opcodes?

    $a=1.2; $b=5; $c = ($a.$b)*4;

Something like (using a load/store paradigm for the opcodes, for variety):

    setnum 1.2, r1       ;; $a = 1.2
    store  r1, $a
    setint 5, r2         ;; $b = 5
    store  r2, $b
    load   $a, r1        ;; ($a.$b)
    load   $b, r2
    append r1, r2, r3
    mul    r3, int 4, r4 ;; $c = ($a.$b)*4
    store  r4, $c

This is before obvious optimization (the loads are completely unnecessary,
but are here as an example).

The append would do something like r1->vtable->append[int](r2), as per
your last example, and would be responsible for coercing r2 to a string.
mul would do something like r3->vtable->mul[int](int2P6Scaler(4)), and the
mul[int] associated with strings would do the necessary conversions.

What I'm curious about is the following sequence:

    use MyRomanNumerals;
    $a = MyRomanNumerals->new(4);
    print $a;       # should print "IV"
    $b = 4;
    print $a + $b;  # should print "VIII", maybe...
    print $b + $a;  # should print "8", maybe...

The execution of the two additions should be, based on what was said
before, something like:

    a->vtable->add[typeof b](b);
    b->vtable->add[typeof a](a);

How does b->vtable->add[] get an entry for MyRomanNumerals?
I seem to remember a suggestion made a long time ago that would have the
vtable include methods to convert to the "standard types". If the calls
were b->vtable->add(b,a), then the add routine would do
a->vtable->fetchint(a) to get the appropriate value. (Both operands have
to be passed in; this is C we're talking about, not C++ or perl. OO has to
be done manually.) Or something like that.

Have I confused something?

>Yup. What add does is based on the types of the two operands. In the more
>odd cases, I assume its type stuff will be based on the left-hand
>operand, but I wouldn't bet the farm on that yet, as that's a Larry call.

That's what I assumed above, but who knows?

>> > But that probably doesn't help much. Let me throw together something more
>> > detailed and we'll see where we go from there.
>>
>>Hopefully it will cover the above case too.
>
>What, the "what if one of the operands is really bizarre" case?

And with Perl6, I thought we were planning to allow some really bizarre
cases? Has Larry indicated at all what his thoughts about fast powerful
TIED variables were?

> Dan
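For what it's worth, the "convert to a standard type" suggestion recalled above can be sketched in C roughly as follows. This is only an illustration under assumptions: the Scalar and VTable layouts, the to_string entry, op_add, and the left-operand dispatch rule are all made up here, and only the fetchint and add names come from the message. The plain integer type knows nothing about Roman numerals; its add simply asks the other operand for an integer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct Scalar Scalar;

typedef struct VTable {
    long   (*fetchint)(const Scalar *self);                 /* "standard type" conversion */
    Scalar (*add)(const Scalar *self, const Scalar *other);
    void   (*to_string)(const Scalar *self, char *buf, size_t len);
} VTable;

struct Scalar {
    const VTable *vtable;
    long value;
};

static long   any_fetchint(const Scalar *self);
static Scalar int_add(const Scalar *self, const Scalar *other);
static Scalar roman_add(const Scalar *self, const Scalar *other);
static void   int_to_string(const Scalar *self, char *buf, size_t len);
static void   roman_to_string(const Scalar *self, char *buf, size_t len);

const VTable int_vtable   = { any_fetchint, int_add,   int_to_string };
const VTable roman_vtable = { any_fetchint, roman_add, roman_to_string };

static long any_fetchint(const Scalar *self) { return self->value; }

/* Neither add needs an entry for the other type: it only calls fetchint. */
static Scalar int_add(const Scalar *self, const Scalar *other) {
    Scalar r = { &int_vtable, self->value + other->vtable->fetchint(other) };
    return r;
}

static Scalar roman_add(const Scalar *self, const Scalar *other) {
    Scalar r = { &roman_vtable, self->value + other->vtable->fetchint(other) };
    return r;
}

static void int_to_string(const Scalar *self, char *buf, size_t len) {
    snprintf(buf, len, "%ld", self->value);
}

static void roman_to_string(const Scalar *self, char *buf, size_t len) {
    static const struct { long n; const char *s; } tbl[] = {
        {1000, "M"}, {900, "CM"}, {500, "D"}, {400, "CD"}, {100, "C"},
        {90, "XC"}, {50, "L"}, {40, "XL"}, {10, "X"}, {9, "IX"},
        {5, "V"}, {4, "IV"}, {1, "I"}
    };
    long v = self->value;
    size_t used = 0;
    if (len == 0)
        return;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof tbl / sizeof tbl[0]; i++) {
        size_t sl = strlen(tbl[i].s);
        while (v >= tbl[i].n && used + sl < len) {
            memcpy(buf + used, tbl[i].s, sl + 1); /* copies the terminator too */
            used += sl;
            v -= tbl[i].n;
        }
    }
}

/* Dispatch on the left-hand operand (an assumption, per the thread). */
Scalar op_add(const Scalar *left, const Scalar *right) {
    assert(left != NULL && left->vtable != NULL && left->vtable->add != NULL);
    return left->vtable->add(left, right);
}
```

Under these assumptions a Roman 4 plus an integer 4 stringifies as "VIII", while the integer 4 plus the Roman 4 stringifies as "8", matching the "maybe" comments in the Perl example.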
Re: Opcodes (was Re: The external interface for the parser piece)
At 05:59 PM 11-30-2000 +, Nicholas Clark wrote:
>On Thu, Nov 30, 2000 at 12:46:26PM -0500, Dan Sugalski wrote:

(Note, Dan was writing about "$a=1.2; $b=3; $c = $a + $b")

>$a=1; $b =3; $c = $a + $b
>
> > If they don't exist already, then something like:
> >
> >   newscalar a, num, 1.2
> >   newscalar b, int, 3
> >   newscalar c, num, 0
> >   add t3, a, b
>
>and $c ends up a num?
>why that line "newscalar c, num, 0" ?
>It looks to me like add needs to be polymorphic and work out the best
>compromise for the type of scalar to create based on the integer/num/
>complex/oddball types of its two operands.

I think the "add t3, a, b" was a typo, and should be "add c, a, b"

Another way of looking at it, assuming that the Perl6 interpreter is
stack-based, not register-based, is that the sequence would get converted
into something like this:

    push num 1.2    ;; literal can be precomputed at compile time
    dup
    newscaler a     ;; get value from top of stack
    push int 3      ;; literal can be precomputed at compile time
    dup
    newscaler b
    push a
    push b
    add
    newscaler c

The "add" op would, in C code, do something like:

    void add() {
        P6Scaler *addend;
        P6Scaler *adder;
        adder  = pop();   /* right operand: pushed last, so popped first */
        addend = pop();   /* left operand */
        push(addend->vtable->add(addend, adder));
    }

It would be up to the addend->vtable->add() to figure out how to do the
actual addition, and what type to return.

> > But that probably doesn't help much. Let me throw together something more
> > detailed and we'll see where we go from there.
>
>Hopefully it will cover the above case too.
>
>Nicholas Clark
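The stack-based "add" op above could be fleshed out into compilable C along these lines. It is a sketch under assumptions: the fixed-size stack, the int/num type tag, make_int/make_num, and generic_add's "int only if both operands are int" rule are invented for illustration; only P6Scaler, push, pop and the vtable->add call come from the message.

```c
#include <assert.h>

typedef struct P6Scaler P6Scaler;

typedef struct P6VTable {
    P6Scaler (*add)(const P6Scaler *left, const P6Scaler *right);
} P6VTable;

typedef enum { T_INT, T_NUM } P6Type;

struct P6Scaler {
    const P6VTable *vtable;
    P6Type type;
    double value;   /* one slot for both kinds, to keep the sketch short */
};

static P6Scaler generic_add(const P6Scaler *left, const P6Scaler *right);
static const P6VTable basic_vtable = { generic_add };

/* The vtable's add decides the result type: int only if both are int. */
static P6Scaler generic_add(const P6Scaler *left, const P6Scaler *right) {
    P6Scaler out;
    out.vtable = left->vtable;
    out.type = (left->type == T_INT && right->type == T_INT) ? T_INT : T_NUM;
    out.value = left->value + right->value;
    return out;
}

P6Scaler make_int(long v) {
    P6Scaler s = { &basic_vtable, T_INT, (double)v };
    return s;
}

P6Scaler make_num(double v) {
    P6Scaler s = { &basic_vtable, T_NUM, v };
    return s;
}

#define STACK_MAX 64
static P6Scaler stack[STACK_MAX];
static int sp = 0;

void push(P6Scaler s) {
    assert(sp < STACK_MAX);
    stack[sp++] = s;
}

P6Scaler pop(void) {
    assert(sp > 0);
    return stack[--sp];
}

/* The "add" op: pop both operands, let the left one's vtable do the work,
 * and push whatever scalar it returns. */
void op_add(void) {
    P6Scaler right = pop();   /* pushed last, so popped first */
    P6Scaler left  = pop();
    push(left.vtable->add(&left, &right));
}
```

Pushing num 1.2 and int 3 and then running op_add leaves a num on the stack, so no "newscalar c, num, 0" style pre-declaration is needed in this version; the op itself settles the result type.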
Re: A tentative list of vtable functions
At 04:43 PM 8/31/00 -0400, Dan Sugalski wrote:
>Okay, here's a list of functions I think should go into variable vtables.
>Functions marked with a * will take an optional type offset so we can
>handle asking for various permutations of the basic type.

Perhaps I'm missing something... Is this for scalars alone? I see no
arrays/hashes here.

>type
>name
>get_bool
>get_string *
>get_int *
>get_float *
>get_value
>set_string *
>set_int *
>set_float *
>set_value
>add *
>subtract *
>multiply *
>divide *
>modulus *
>clone (returns a new copy of the thing in question)
>new (creates a new thing)
>concatenate
>is_equal (true if this thing is equal to the parameter thing)
>is_same (True if this thing is the same thing as the parameter thing)
>logical_or
>logical_and
>logical_not
>bind (For =~)
>repeat (For x)
>
>Anyone got anything to add before I throw together the base vtable RFC?
>
> Dan
>
>--"it's like this"---
>Dan Sugalski even samurai
>[EMAIL PROTECTED] have teddy bears and even
> teddy bears get drunk
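One possible reading of the starred entries, sketched in C: each starred slot holds one function per type offset, so the caller picks the variant by index. This is a guess for illustration, not the proposed RFC layout. The offsets (OFF_INT, OFF_NUM), the Thing struct, the in-place add, and thing_add are all assumptions; only the entry names name, get_bool and add come from the list.

```c
#include <assert.h>

/* Hypothetical type offsets: which kind of operand the caller supplies. */
enum { OFF_INT = 0, OFF_NUM = 1, OFF_COUNT };

typedef struct Thing Thing;

typedef struct VTable {
    const char *name;                     /* "name" entry from the list */
    int (*get_bool)(const Thing *self);   /* unstarred: a single variant */
    /* starred: one function per type offset */
    void (*add[OFF_COUNT])(Thing *self, const void *operand);
} VTable;

struct Thing {
    const VTable *vtable;
    double value;
};

static int num_get_bool(const Thing *self) {
    return self->value != 0.0;
}

/* add variant for a native integer operand */
static void num_add_int(Thing *self, const void *operand) {
    self->value += (double)*(const long *)operand;
}

/* add variant for a native floating-point operand */
static void num_add_num(Thing *self, const void *operand) {
    self->value += *(const double *)operand;
}

const VTable num_vtable = {
    "num",
    num_get_bool,
    { num_add_int, num_add_num }
};

/* Select the add variant by type offset and call it. */
void thing_add(Thing *self, int offset, const void *operand) {
    assert(offset >= 0 && offset < OFF_COUNT);
    self->vtable->add[offset](self, operand);
}
```

Arrays and hashes would presumably need further entries (or vtables of their own), which is the gap the question above points at.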
Re: RFC 127 (v1) Sane resolution to large function returns
At 11:26 AM 8/23/00 -0700, Larry Wall wrote:
>I expect that we'll get more compile-time benefit from
>
>    my HASH sub foo {
>        ...
>    }
>
>    %bar = foo();

So how would you fill in the type in:

    my TYPE sub foo {
        ...
        if (wanthash())   { return %bar; }
        if (wantarray())  { return @baz; }
        if (wantscalar()) { return $quux; }
    }

    $scalar = foo();
    @array = foo();
    %hash = foo();

>Larry