> The Perl 6 Summary for the week ending 2004-09-17
>Another week, another summary, and I'm running late. So:
> This week in perl6-compiler
>  Bootstrapping the grammar
>Uri Guttman had some thoughts on bootstrapping Perl 6's grammar. He
>hoped that his suggested approach would enable lots of people to work on
>the thing at once without necessarily getting in each other's way. Adam
>Turoff pointed everyone at a detailed description of how Squeak (a free
>Smalltalk) got bootstrapped.

This link doesn't seem to be working, and doesn't have
the archives of perl6-compiler online yet.  Does anyone have a link to
the archives that works?

>Tim Bunce writes:
>: For perl at least I thought Larry has said that you'll be able to
>: create new ops but only give them the same precedence as any one
>: of the existing ops.
>Close, but not quite.  What I think I said was that you can't specify
>a raw precedence--you can only specify a precedence relative to an
>existing operator.  That way it doesn't matter what the initial
>precedence assignments are.  We can always change them internally.
>: Why not use a 16 bit int and specify that languages should use
>: default precedence levels spread through the range but keeping the
>: bottom 8 bits all zero. That gives 255 levels between '3' and '4'.
>: Seems like enough to me!
>: Floating point seems like over-egging the omelette.
>It's also under-egging the omelette, and not just because you
>eventually run out of bits.  I don't think either integer or floating
>point is the best solution, because in either case you have to remember
>separately how many levels of derivation from the standard precedence
>levels you are, so you know which bit to flip, or which increment to
>add or subtract from the floater.

So you'd have something like:

sub operator:mult($a, $b) is looser('*') is inline {...}
sub operator:add($a, $b) is tighter("+") is inline {...}
sub operator:div($a,$b) is looser("/") is inline {...}

assuming default Perl5 precedences for *, +, and / you would have the 
precedence strings for *, +, /, mult, add, and div to be "S", "R", "S", 
"S2", "S1", "S2" respectively?  So mult and div would have the same 

Hmmm  What problems would be caused by:

sub operator:radd($a,$b) is tighter("+") is inline is rightassociative {...}
sub operator:ladd($a,$b) is tighter("+") is inline is leftassociative {...}

Right now, all the operator precedence levels in Perl5 have either right, 
left, or no associativity, but they do not mix right and left associative 
operators.  Will that be allowed in Perl6?


>Dan Sugalski <[EMAIL PROTECTED]> wrote:
> > >So in the following:
> > >
> > >my Complex $c = 3+4i;
> > >my $plain = 1.1;
> > >$plain = $c;
> > >
> > >I presume that $plain ends up as type Complex (with value 3+4i)?
> >
> > Yup.
> >
> > >If so, how does $plain know how to "morph itself into the RHS's type"?
> >
> > The general rule is: If a PMC is not a fixed type, it tosses its
> > contents and becomes whatever's assigned to it. If it is a fixed
> > type, it extracts what it can as best it can from the source and uses
> > that.
>I just want to assert/clarify that the job of "becoming whatever's
>assigned to it" is delegated to the src PMC, since $plain won't itself know
>how to do this?

I assumed that the logic for assigning PMC to PMC would be something like:

if (destPMC is specified as typeX) {
   if (srcPMC ISA typeX) {
      destPMC <- srcPMC
   } else {
      destPMC <- typeX.convert(srcPMC);
   }
} else {
   destPMC <- srcPMC
}

in pseudocode form.

If we assume that there is a universal "root" type such that all PMCs 
ISA typeRoot, and that typeX.convert(PMCofTypeY) is trivial if typeY ISA 
typeX, then this simplifies to

   destPMC <- destPMC.declaredtype.convert(srcPMC);

Why does that look too simple?
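
To make the simplification concrete, here is a minimal Python sketch of the one-line rule. Everything in it is an assumption for illustration: the PMC class, the declared_type field, and the convert() helper are hypothetical names, and Python's object stands in for the universal root type. It is not how Parrot actually stores PMCs.

```python
class PMC:
    """Hypothetical stand-in for a PMC; declared_type=None means 'not a fixed type'."""
    def __init__(self, value, declared_type=None):
        self.value = value
        self.declared_type = declared_type


def convert(type_x, value):
    # Trivial when the value already ISA type_x; otherwise coerce as best we can.
    return value if isinstance(value, type_x) else type_x(value)


def assign(dest, src):
    # An untyped destination is treated as declared with the root type, so the
    # typed and untyped cases collapse into the same call.
    target = dest.declared_type or object
    dest.value = convert(target, src.value)


plain = PMC(1.1)                    # my $plain = 1.1;
assign(plain, PMC(3 + 4j))          # $plain = $c;  becomes the complex value

fixed = PMC(0, declared_type=int)   # a fixed-type destination
assign(fixed, PMC(2.9))             # extracts what it can: int(2.9)
```

In this sketch the untyped destination simply takes the source value, while the fixed-type destination ends up holding 2.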

>At 4:19 PM + 1/24/02, Dave Mitchell wrote:
>>Dan Sugalski <[EMAIL PROTECTED]> wrote:
>>>  That was my biggest objection. I like the thought of having a scheme
>>>  pair data type. The interpreter should see it, and it should be
>>>  accessed, as a restricted array, one with only two entries.
>>Is this then the same datatype as a Perl6 pair (cf '=>' op in Apo 3) ??
>Good point. it probably is, yes. (Though there may be potential 
>differences--depends on whether the scheme pair can only have scalars on 
>each side, or should allow other things)

In Scheme, at least, pairs can contain any data on either side.   The 
notation for a pair is (value . value), and standard list notation (a b c d 
e f g) is simply syntactic sugar for (a . (b . (c . (d . (e . (f . (g . 
'()))))))).  Although only the cdr of these pairs contains pairs, in a list 
like ((a a) (b b)) (also written as "((a . (a . '())) . ((b . (b  . '())) . 
'()))"), both the car and cdr of the outermost pair contain pairs.
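
The list-as-nested-pairs idea can be sketched in a few lines of Python. The Pair type, the NIL marker, and the make_list() and show() helpers are names I made up for this illustration; None stands in for the empty list '().

```python
from typing import Any, NamedTuple


class Pair(NamedTuple):
    car: Any   # either side may hold any value, including another Pair
    cdr: Any


NIL = None  # stands in for '()


def make_list(*items):
    # Build (a . (b . (... . '()))) from right to left.
    result = NIL
    for item in reversed(items):
        result = Pair(item, result)
    return result


def show(x):
    # Render a value in fully dotted notation.
    if x is NIL:
        return "'()"
    if isinstance(x, Pair):
        return f"({show(x.car)} . {show(x.cdr)})"
    return str(x)


flat = make_list("a", "b")
nested = make_list(make_list("a", "a"), make_list("b", "b"))
```

Here show(nested) gives the same dotted form as written above, with pairs in both the car and the cdr of the outermost pair.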

>I've been thinking a lot about the bytecode file format lately.  It's
>going to get really gross really fast when we start adding other
>(optional) sections to the code.
>So, with that in mind, here's what I propose:

>What do you guys think?

Have you taken a look at the old Amiga IFF format?  It consisted mainly of 
"chunks" identified by a 32-bit type code and  a chunk-length code.  While 
most implementations were for specific multi-media applications (chunks 
defining sound formats, chunks defining image formats, etc), the standard 
itself was data-neutral.

I believe that Microsoft is using a derivative of that format for some of 
its files, and I think that TIFF files are another instantiation.

It may be worth looking at to avoid re-inventing wheels.
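
For a feel of how little machinery a chunked layout needs, here is a rough Python sketch of IFF-style chunks as I remember them: a 4-byte type code, a 32-bit big-endian length, the payload, and a pad byte when the payload length is odd. The chunk names CODE and CNST are invented for the example and are not a proposal for real section names.

```python
import struct


def write_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    # 4-byte type code + 32-bit big-endian length + payload (+ pad to even length).
    pad = b"\x00" if len(payload) % 2 else b""
    return struct.pack(">4sI", chunk_id, len(payload)) + payload + pad


def read_chunks(data: bytes):
    # Walk the buffer chunk by chunk; a reader that does not recognise a
    # type code could skip it using the length field alone.
    chunks = []
    offset = 0
    while offset + 8 <= len(data):
        chunk_id, length = struct.unpack_from(">4sI", data, offset)
        start = offset + 8
        chunks.append((chunk_id, data[start:start + length]))
        offset = start + length + (length % 2)
    return chunks


blob = write_chunk(b"CODE", b"\x01\x02\x03") + write_chunk(b"CNST", b"hi")
parsed = read_chunks(blob)
```

The attraction for optional bytecode sections is that old readers can step over chunk types they have never heard of.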


>In perl.perl6.internals, you wrote:
> >The attached patch makes all bytecode have a type of int32_t rather than
> >IV; it also contains the other stuff I needed to get the tests running
> >on my Alpha (modifications to and register.c).
>I think this is a bad idea.  There simply is no guarantee that there's
>a native integral type with 32 bits.  And having an int32_t type that
>*isn't* 32-bits is just plain confusing.  Just ask anyone who's gotten
>burnt by perl5's I32, which has the exact same problem.

Well, since bytecode is defined to be 32-bit, it makes sense to define it 
as an int32_t type and have the definition of an int32_t be platform-specific.

> At 07:43 PM 9/8/2001 -0700, Wizard wrote:
> >Questions regarding Bitwise operators:
> >
> > > =item rol tx, ty, tz *
> >...
> > > =item ror tx, ty, tz *
> >
> >Are these with or without carry?
> That's a good question. Now that we have a list of bitwise ops, we can 
> decide how they work. What happens when you rotate/shift/bit-or a float? Or 
> a bigint/bigfloat? Or a string? Important questions, and we can hammer 
> something out now that we know what they are.

I'd like to suggest that the shift- and roll/rotate- ops take a 4th
parameter, that being the "word"-size in bits.  For Bigints and
arbitrary-length bit-vectors, the size of a "word" to rotate or shift
could otherwise be infinite, which probably isn't what is wanted.

It would also make simpler such operations that might come up in some
cryptographic routines, like "rotate the upper 64 bits left 3 bits",
which would be encoded as (assuming "rotate_l dest, source, roll-amount,
word-size"):

rotate_l P1, P1, 64, 128
rotate_l P1, P1,  3,  64
rotate_r P1, P1, 64, 128

Just my 2 centums.
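
A small Python sketch of the three-op sequence above, under one assumption of mine about the semantics: the op rotates only the low word-size bits of the value and leaves any higher bits untouched. The function names mirror the hypothetical ops; nothing here is an existing Parrot op.

```python
def rotate_l(value: int, amount: int, width: int) -> int:
    # Rotate the low `width` bits left by `amount`; higher bits are preserved.
    mask = (1 << width) - 1
    low = value & mask
    amount %= width
    rotated = ((low << amount) | (low >> (width - amount))) & mask
    return (value & ~mask) | rotated


def rotate_r(value: int, amount: int, width: int) -> int:
    # A right rotation is a left rotation by the complementary amount.
    return rotate_l(value, width - (amount % width), width)


# "Rotate the upper 64 bits left 3 bits" of a 128-bit value:
x = (0x8000000000000001 << 64) | 0x1234
x = rotate_l(x, 64, 128)   # swap halves: the upper word is now the low word
x = rotate_l(x, 3, 64)     # rotate just that word
x = rotate_r(x, 64, 128)   # swap the halves back
```

With that reading, the upper word 0x8000000000000001 becomes 0xC and the lower word 0x1234 comes back unchanged.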

> Okay, I'm whipping together the "fancy math" section of the interpreter 
> assembly language. I've got:

> Can anyone think of things I've forgotten? It's been a while since I've 
> done numeric work.

Uri mentioned exp(x) = e^x, but I think if you are going to include
log2, log10, log, etc, you should also include ln.

>Dave Mitchell wrote:
> > So how does that all work then? What does the parrot assembler for
> >
> >   foo($x+1, $x+2, ..., $x+65)
>The arg list will be on the stack. Parrot just allocates new PMCs and
>pushes the PMC on the stack.
>I assume it will look something like
>   new_pmc pmc_register[0]
>   add pmc_register[0], $x, 1
>   push pmc_register[0]
>   new_pmc pmc_register[0]
>   add pmc_register[0], $x, 2
>   push pmc_register[0]
>   ...
>   call foo, 65

Hmmm, I assumed it would be something like:

load $x, P0    ;; load $x into PMC register 0
new  P2        ;; Create a new PMC in register 2
push P0, P2    ;; Make P2 be ($x)
add  P0, #1, P1 ;; Add 1 to $x, store in PMC register 1
push P1, P2    ;; Make P2 be ($x,$x+1)
add  P0, #2, P1 ;; Add 2 to $x, store in PMC register 1
push P1, P2    ;; Make P2 be ($x,$x+1,$x+2)
call foo, P2   ;; Call foo($x,$x+1,...,$x+65)

Although this would be premature optimization, since I see this idiom being 
used a lot, it may be useful to have some special-purpose ops to handle 
creating arg-lists, like a "new_array size,register" op, that would create 
a new PMC containing a pre-sized array (thus eliminating repeatedly growing 
the array with the push ops), or a "push5 destreg, reg1, reg2, reg3, reg4, 
reg5" op (and corresponding pushN ops for N=2 to 31) that push the 
specified registers (in order) onto the destreg.

>Hmm. It didn't occur to me that raw values might go on the call
>stack. Is the call stack going to store PMCs only? That would
>simplify things a lot.

If ops and functions should be able to be used interchangeably, I wouldn't 
expect any function arguments to be stored on the stack, but passed via 
registers (or lists referenced in registers).

>- Ken

> > Perl came from ASCII-centric roots, so it's likely that most of our
> > biases are ASCII-centric.  And for a couple of reasons, it's going to
> > be hard to deal with that:
> > 
> > 1. Backwards compatibility with existing Perl practice,
> > 
> > and
> > 
> > 2. To do language-neutral right is -really- hard; look at locales and
> > Unicode as examples.
> > 
> > As such, instead of trying to make Perl work for all languages out of
> > the box, why not make Perl's language handling extensible from within
> > the language and have it be as language-free as possible (except for
> > backwards compatibility stuff) out of the box.
> Right on.
> > Examples of what we can do:
> > 
> > I. Make ranges work on Unicode code-points (if they don't already).
> U, yes, they do, if you by code-point ranges mean \x{...}-\x{...}
> but in general I would like to discourage the use of ranges.  What do
> you think [a-\N{KATAKANA LETTER KI}] should mean?  I think it should
> mean a compile time error.  People misuse ranges for classes.  Ranges
> also imply some collation, which is, as discussed, really bad.

I think, following my line of thought, that [a-\N{KATAKANA LETTER KI}]
should be equivalent to [\x{0061}-\x{30AD}], which would match any of
the 12365 characters between \x{0061} and \x{30AD}.  Admittedly, this
probably isn't that useful of a class, but it's what I see was asked

Collation is something I hadn't considered.  My initial thought would
be that by default, collation order would be code-point order, but
that should probably be able to be overridden.

Code-point order at least allows us to collate 'a' and KATAKANA LETTER
KI; I can't think of any other sensible way to order the two.
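
The arithmetic behind that range, and the code-point collation default, can be checked with a short Python sketch. It only uses the standard unicodedata module; the in_range() helper is a name I introduced for the illustration.

```python
import unicodedata

lo = ord("a")                                        # U+0061
hi = ord(unicodedata.lookup("KATAKANA LETTER KI"))   # U+30AD
count = hi - lo + 1                                  # inclusive range size


def in_range(ch: str) -> bool:
    # Membership test for the class [a-\N{KATAKANA LETTER KI}] read as
    # a plain range of code points.
    return lo <= ord(ch) <= hi


# Python compares strings by code point, which is the default collation
# suggested above.
ordered = sorted(["\u30ad", "a"])
```

The count comes out at 12365, and 'a' sorts before KATAKANA LETTER KI.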
> We probably also ought to answer the question "How accommodating to 
> non-latin writing systems are we going to be?" It's an uncomfortable 
> question, but one that needs asking. Answering by Larry, probably, but 
> definitely asking. Perl's not really language-neutral now (If you think so, 
> go wave locales at Jarkko and see what happens... :) but all our biases are 
> sort of implicit and un (or under) stated. I'd rather they be explicit, 
> though I know that's got problems in and of itself.

Perl came from ASCII-centric roots, so it's likely that most of our
biases are ASCII-centric.  And for a couple of reasons, it's going to
be hard to deal with that:

1. Backwards compatibility with existing Perl practice,

and

2. To do language-neutral right is -really- hard; look at locales and
Unicode as examples.

As such, instead of trying to make Perl work for all languages out of
the box, why not make Perl's language handling extensible from within
the language and have it be as language-free as possible (except for
backwards compatibility stuff) out of the box.

Examples of what we can do:

I. Make ranges work on Unicode code-points (if they don't already).

II. Make POSIX-style character classes (e.g. [:space:])
user-definable and modifiable.  That way, a Unicode::Japanese module
could do something like:

[:hiragana:] = /[\x{3041}-\x{3094}]/;
[:katakana:] = /[\x{30A1}-\x{30F4}]/;
[:kana:] = [:hiragana:] + [:katakana:];

and then each of those three classes could be used in RE's when needed.
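
As a rough illustration of what user-definable classes could look like from the outside, here is a Python sketch that keeps a registry of named classes and expands [:name:] in a pattern before handing it to the regex engine. The CLASSES table and the expand() helper are hypothetical; the ranges are the ones given above.

```python
import re

# Registry of user-defined classes, stored as bracket-expression contents.
CLASSES = {
    "hiragana": "\u3041-\u3094",
    "katakana": "\u30a1-\u30f4",
}
# [:kana:] = [:hiragana:] + [:katakana:]
CLASSES["kana"] = CLASSES["hiragana"] + CLASSES["katakana"]


def expand(pattern: str) -> str:
    # Replace each [:name:] with an ordinary character class.
    return re.sub(r"\[:(\w+):\]",
                  lambda m: "[" + CLASSES[m.group(1)] + "]",
                  pattern)


mixed = "\u3072\u3089\u304c\u306a\u30ab\u30bf\u30ab\u30ca"   # hiragana then katakana
all_kana = re.fullmatch(expand("[:kana:]+"), mixed)
only_hiragana = re.fullmatch(expand("[:hiragana:]+"), mixed)
```

A module could add or change entries in the registry without touching the regex engine itself.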

III. Allow for character equivalence tables to be user-definable.
This would allow for the /i behavior of RE's to be generalized.

As an example, consider the following code:

$kanainsensitive = td/[:hiragana:]/[:katakana:]/;

if ($japanesetext =~ m/$japanesepattern/i{$kanainsensitive}) {
   print "$japanesetext matched $japanesepattern\n";
}

The new td// construct would create a character equivalence table that
could be used with a generalized /i option to indicate that hiragana
and katakana should be treated equivalently.
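
One way to picture such an equivalence table is as a fold applied to both pattern and text before matching, much as case-insensitive matching folds case. The Python sketch below does that for the single-character case only. It relies on the katakana block U+30A1..U+30F4 sitting exactly 0x60 above the matching hiragana; the table and function names are made up for the example.

```python
import re

# Equivalence table: each katakana code point maps to its hiragana twin.
KATAKANA_TO_HIRAGANA = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F5)}


def fold_kana(text: str) -> str:
    return text.translate(KATAKANA_TO_HIRAGANA)


def kana_insensitive_search(pattern: str, text: str):
    # Fold both sides, then match as usual.
    return re.search(fold_kana(pattern), fold_kana(text))


katakana_word = "\u30ab\u30bf\u30ab\u30ca"       # "katakana" written in katakana
hit = kana_insensitive_search("\u304b\u305f", katakana_word)  # hiragana "kata"
```

Multi-character equivalences, like the long-vowel forms discussed next, would need more than a per-character table.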

A more sophisticated example could be:

$vowelsoptional = td/aeiouAEIOU//;

which would make vowels equivalent to no characters at all.

For certain applications, it would be useful to allow matches of more
than one character:

$kanainsensitive +=   td/\x{304C}\x{3042}/\x{30AC}\x{30FC}/r
+ td/\x{304D}\x{3044}/\x{30AD}\x{30FC}/r
+ ... ;

In this case, it represents the fact that long vowels are represented
by one form in hiragana (HIRAGANA LETTER GA + HIRAGANA LETTER A), and
a different form in katakana (KATAKANA LETTER GA + KATAKANA-HIRAGANA
PROLONGED SOUND MARK).

I used a /r there to indicate that the two parts of the td/// are
regular expressions which are designed to be treated as equivalent.  That
would allow both of those lines above to be written:

$kanainsensitive +=  td/([\x{304C}\x{304D}])\x{3042}/\1\x{30FC}/r;

It would also allow people to deal with combining forms, although
there are probably better ways than this.

IV.  Make the character class switches be redefinable, but default to
the current set.  That would allow someone who is doing lots of work
in Japanese to use \w to mean kanji, hiragana, and katakana
instead of the default of [0-9A-Za-z_].

There are probably lots of things I overlooked, but if it can be done
cheaply, abstracting out the existing biases and making them
user-expandable/definable would probably go a long way towards getting
rid of language bias.

>Dan Sugalski <[EMAIL PROTECTED]> writes:
> > At 01:05 PM 6/11/2001 -0700, Russ Allbery wrote:
> >> Dan Sugalski <[EMAIL PROTECTED]> writes:
> >>> Should perl's regexes and other character comparison bits have an
> >>> option to consider different characters for the same thing as
> >>> identical beasts?  I'm thinking in particular of the Katakana/Hiragana
> >>> bits of japanese, but other languages may have the same concepts.
> >> I think canonicalization gets you that if that's what you want.
> > I don't think canonicalization should do this. (I really hope not) This
> > isn't really a canonicalization matter--words written with one character
> > set aren't (AFAIK) the same as words written with the other, and which
> > alphabet you use matters. (Which sort of argues against being able to do
> > this, I suppose...)
>I guess I don't know what the definition of "the same thing" you're using
>here is.

I thought Dan was talking about something equivalent to the m//i functionality.

Would it, or should it, be possible to tell m// to treat Katakana 
characters as the same as hiragana characters, in much the same way as m//i 
treats UPPERCASE the same as lowercase?  Canonicalization won't get you that.

My feeling is that the hooks should be there, but the specific equivalence 
mappings should be in the library, not the core.

> Dan Sugalski <[EMAIL PROTECTED]> writes:
> >
> >It does bring up a deeper issue, however. Unicode is, at the moment, 
> >apparently inadequate to represent at least some part of the asian 
> >languages. Are the encodings currently in use less inadequate? I've been 
> >assuming that an Anything->Unicode translation will be lossless, but this 
> >makes me wonder whether that assumption is correct.
> One reason perl5.7.1+'s Encode does not do asian encodings yet is that 
> the tables I have found so far (Mainly Unicode 3.0 based) are lossy.

Er, are the Unicode tables going to be embedded in /usr/bin/perl6?
That doesn't give me a warm, cozy feeling about Perl-6 support of

I think it's great that Perl internals will be able to handle
arbitrary strings of Unicode characters (using some version of UTF-*),
but may I suggest that anything that relies on the properties of
characters (case, conversions, combining, visibility, etc) require
explicit library support?  We'd lose some things, like normalization,
but we wouldn't have to carry around huge tables, either.

>Okay, folks, here's the current conundrum:
>Should Parrot be a register or stack-based system, and if a register-based 
>one, should we go with typed registers?

>My current thoughts are this:
>We have a set of N registers. They're all linked. Nothing implicitly sets 
>values in any of the registers (if you want an integer value, you need to 
>make one). Each register has a set of validity markers for each type (int, 
>float, string, PMC) that may or may not be bits. We have a stack of sorts 
>that we can push the registers on to if we need.

In the section I snipped, you described "linked" registers in relation to 
multiple sets of typed registers, with linking meaning that IntR1 would 
have the same value as FloatR1, etc.  What do you mean by "linked" here, 
with each register being (as I read it) dynamically typed?

Is N fixed, or can we have different number of visible registers at a 
time?  When we push registers onto the stack, do we push them individually, 
or as a set?

I mean, can we get away with something like (assuming C++-style overloading 
on "Register"):

Register rFile[MAXREGSTACK][N];
int rDepth = 0;

add(int addend, int addor, int sum)
{
 rFile[rDepth][sum] = rFile[rDepth][addor] + rFile[rDepth][addend];
}

if (rDepth == MAXREGSTACK)
   die("Register Stack exceeded");


Or we could go the SPARC register window route:
(Note:  SPARC register windows overlap: they have three sets of 8 
registers, and when a push happens, the old 3rd set becomes the new 1st 
set, allowing the caller and callee to share a set of 8 registers)

Register *rFile = NULL;
Register *rFrame = NULL;
int rFileSize = 0;

add(unsigned int addend, unsigned int addor, unsigned int sum)
{
   assert(sum < 3*N); assert(addend < 3*N); assert(addor < 3*N);
   rFrame[sum] = rFrame[addend] + rFrame[addor];
}

   if ((rFrame - rFile) < rFileSize - 5*N) {
      if (resize(rFile,rFileSize*2)) {
         rFileSize *= 2;
      } else {
         die("Register Stack Frame out of memory");
      }
   }
   rFrame += 2*N;
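
To show the overlap more directly, here is a small Python model of the register-window route. It is only a sketch under my own assumptions: N registers per set, three sets visible at a time, a push that slides the window by 2*N, and a backing list that simply grows on demand. The class and method names are invented for the illustration.

```python
N = 8  # registers per set; three sets (3*N registers) are visible at once


class RegisterWindows:
    def __init__(self):
        self.regs = [0] * (3 * N)
        self.frame = 0           # index of the first visible register

    def push(self):
        # Slide by two sets so the caller's third set becomes the callee's first.
        self.frame += 2 * N
        needed = self.frame + 3 * N
        if needed > len(self.regs):
            self.regs.extend([0] * (needed - len(self.regs)))

    def pop(self):
        if self.frame < 2 * N:
            raise IndexError("register stack underflow")
        self.frame -= 2 * N

    @staticmethod
    def _check(index):
        if not 0 <= index < 3 * N:
            raise IndexError("register outside the visible window")
        return index

    def read(self, index):
        return self.regs[self.frame + self._check(index)]

    def write(self, index, value):
        self.regs[self.frame + self._check(index)] = value


rf = RegisterWindows()
rf.write(2 * N + 1, 42)      # caller puts an argument in its third set
rf.push()
seen_by_callee = rf.read(1)  # callee sees it in its first set
rf.write(0, 7)               # callee leaves a return value
rf.pop()
seen_by_caller = rf.read(2 * N)
```

The shared set is what lets arguments and return values cross a call without any copying.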


>I'm definitely feeling unsure about this, so feel free (please!) to wade 
>in with comments, criticisms, or personal attacks... :)
> Dan
>Please not fight on wording. For most encodings I know of, the concept of
>normalization does not even exist. What is your definition of normalization?

To me, the usual definition of "normalization" is conversion of something 
into a standard form, especially when there are multiple equivalent forms 
it could be in.

Since there are multiple ways within Unicode to express a single character 
that are considered (by Unicode) to be identical, conversion into a single 
common form is necessary for comparison purposes.

Example:  The sequence of Unicode code points 006E 0061 0069 0308 0076 0065 
and the sequence 006E 0061 00EF 0076 0065 both represent the same string in 
Unicode (the english word "naive", with a diaeresis over the i).  Both 
represent 5-character strings, and both are supposed to compare 
identically.  However, they use a different sequence of code points to 
represent one particular character: the 'i' with a diaeresis: 0069 0308 
versus 00EF.

If we have $naive5 and $naive6 be variables containing the two example 
strings, what do we want as the value of the following expressions?

   $naive5 eq $naive6;
and so forth.

As far as my very limited understanding of the Unicode standard goes, they 
should compare equal, and both have a length of 5.  But their encoded byte 
sequences may not be identical.
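
The two example strings can be tried out with Python's standard unicodedata module. This is only an illustration of the behavior being discussed, not a statement about what Perl will do; note that Python's len() counts code points, so the decomposed string reports 6 until it is normalized. The canonically_equal() helper is my own name.

```python
import unicodedata

naive6 = "nai\u0308ve"   # 006E 0061 0069 0308 0076 0065
naive5 = "na\u00efve"    # 006E 0061 00EF 0076 0065


def canonically_equal(a: str, b: str) -> bool:
    # Compare after converting both strings to one common form (NFC here).
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


raw_equal = naive5 == naive6                          # code point by code point
normalized_equal = canonically_equal(naive5, naive6)
composed_length = len(unicodedata.normalize("NFC", naive6))
utf8_differs = naive5.encode("utf-8") != naive6.encode("utf-8")
```

Without normalization the comparison fails and the UTF-8 bytes differ; after normalization the strings compare equal and both have length 5.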

>I fully understand this. This is one of the reasons I propose sole UTF-8
>encoding. If length() and substr() depend on string internal encoding,
>are they still useful? Who can handle this magic length().

UTF-8 encoding doesn't fix the above problem.  UTF-8 would still encode the 
two strings differently, because they have different code point 
sequences.  For that matter, so would any of the other encoding 
suggestions.   As such, for the above problem, encoding is pretty much a 

>As long as Terry Pratchett writes books faster than perl consumes quotes.
>Based on the fact that he's still very alive, we aren't in danger yet.

True...  And he has some very good quotes.

>However, Larry has already commented on the danger of running out of LOTR

That thread does raise another concern in my mind.  While a quote here and 
a quote there would constitute "Fair Use", if we are seriously in danger of 
running out of quotes, we are pushing the bounds of "Fair Use".

I think, for copyright considerations alone, it's worth expanding to other 

"Literate Programming"?  Well, the Perl source is well read...

>Nicholas Clark

Why won't this work:

As I see it, we can't guarantee that DESTROYable objects will be DESTROYed 
immediately when they become garbage without a full ref-counting scheme.  A 
full ref-counting scheme is potentially expensive.

Even full ref-counting schemes can't guarantee proper and timely 
destruction in the face of circular data structures, which ref-counting 
schemes leak.

Partial ref-counting is very difficult to get right, and is likely to be 
even more expensive than full ref-counts.

I haven't seen another possible problem with DESTROY-by-GC brought 
up:  non-refcounting GCs can be fast because they don't have to look at the 
garbage, only the non-garbage.  If we want them to DESTROY garbage that needs 
to be DESTROYed, they will have to look at the garbage to find the 
DESTROYable garbage -- which negates the advantage of just looking at 

So, here's an idea:

1. Maintain a list of DESTROYable objects.  This list is automagically 
maintained by bless and DESTROY.

2. If the compiler can determine that an object is DESTROYable and garbage, 
the compiler can automatically insert a call to DESTROY at the appropriate 


{ $fh = new Destroyable; $fh->methodcalls(); }

could be transformed to:

{ $fh = new Destroyable; $fh->methodcalls(); $fh->DESTROY(); }

This step may not be always possible -- can the compiler determine that 
$fh->methodcalls doesn't do anything to keep $fh alive?  If not, it can't 
do this step.

3. After finding live objects, the GC would walk the DESTROYables list looking 
for objects not found alive.  If/when it finds them, it DESTROYs them.  It 
needs to do this before it rewrites over the reclaimed space, so that the 
data necessary for the DESTROY is still available.

I feel that the number of objects that need to be DESTROYed will likely be 
small compared to the total number of Perl objects, so the DESTROYables 
list will be relatively small and fast to walk.  Automagically 
detecting when an object can be DESTROYed (if possible) should also help 
in keeping the DESTROYables list short.
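
Steps 1 and 3 can be sketched as a toy mark phase followed by a walk of the DESTROYables list. This Python model is a simplification resting on my own assumptions: objects carry an explicit refs list, liveness is tracked by identity, and Heap, Obj, bless() and collect() are hypothetical names. Step 2, the compiler-inserted call, is not modeled.

```python
class Obj:
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)   # outgoing references
        self.destroyed = False

    def destroy(self):
        self.destroyed = True    # stands in for the DESTROY method


class Heap:
    def __init__(self):
        self.destroyables = []   # step 1: maintained by bless (and DESTROY)

    def bless(self, obj):
        self.destroyables.append(obj)

    def collect(self, roots):
        # Mark: visit only the live objects, never the garbage.
        live = set()
        stack = list(roots)
        while stack:
            obj = stack.pop()
            if id(obj) in live:
                continue
            live.add(id(obj))
            stack.extend(obj.refs)
        # Step 3: walk the (short) DESTROYables list before any space is reused.
        survivors = []
        for obj in self.destroyables:
            if id(obj) in live:
                survivors.append(obj)
            else:
                obj.destroy()
        self.destroyables = survivors


heap = Heap()
kept = Obj("kept")
dropped = Obj("dropped")
root = Obj("root", refs=[kept])
heap.bless(kept)
heap.bless(dropped)
heap.collect([root])
```

The collector still never scans the garbage as a whole; it only checks the listed objects against the live set.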

I'm sure this idea has flaws.  But it's an idea.  Tell me what I'm missing.

At 01:45 PM 02-12-2001 -0300, Branden wrote:
>I think having both copying-GC and refcounting-GC is a good idea. I may be
>saying a stupid thing, since I'm not a GC expert, but I think objects that
>rely on having their destructors called the soonest possible for resource
>cleanup could use a refcount-GC, while other objects that don't need that
>could use a copy-GC. I really don't know if this is really feasible, it's
>only an idea now. I also note that objects that are associated to resources
>aren't typically the ones that get shared much in Perl, so using refcount
>for them wouldn't be very expensive...
>Am I too wrong here?

It's... complicated...

Here's an example of where things could go wrong:

sub foo {
 my $destroyme1 = new SomeClass;
 my $destroyme2 = new SomeClass;
 my @processme1;
 my @processme2;
 push @processme1, $destroyme1;
 push @processme2, $destroyme2;
 return \@processme2;
}

At the end of &foo(), $destroyme1 and @processme1 are dead, but $destroyme2 
is alive.

If $destroyme1 and $destroyme2 are ref-counted, but @processme1 and 
@processme2 are not, then at the end of &foo(), both objects will have 
ref-counts of 1 ($destroyme1 because of the ref from @processme1, which is 
a spurious ref-count; $destroyme2 because of the ref from @processme2, 
which is valid).  $destroyme1 won't be destroyed until @processme1 is 
finalized, presumably by the GC, which could take a long time.

That ref-count from @processme1 is necessary because if @processme1 escapes 
scope (like @processme2 did) then $destroyme1 is still alive, and can't be 

Going with full ref-counts solves the problem, because when @processme1 
goes out of scope, its ref-count drops to 0, and it gets finalized 
immediately, thus dropping $destroyme1 to 0, and it gets finalized.  But 
with @processme2, its refcount drops from 2 to 1, so it survives and so 
does $destroyme2.

Full ref-counting has a potentially large overhead for values that don't 
require finalization, which is likely the majority of our data.

Going with partial ref-counts solves the simple case when the object is 
only referred to by objects with ref-counts, but could allow some objects' 
finalization to be delayed until the GC kicks in.

Going with no ref-counts doesn't have the overhead of full refcounting, but 
unless some other mechanism (as yet undescribed) helps, finalization on all 
objects could be delayed until GC.

At 01:14 PM 02-07-2001 -0500, Dan Sugalski wrote:
>At 01:35 PM 2/7/2001 -0200, Branden wrote:
>>As far as I know (and I could be _very_ wrong), the primary objectives of
>>vtables are:
>>1. Allowing extensible datatypes to be created by extensions and used in
>Secondarily, yes.
>>2. Making the implementation of `tie' and `overload' more efficient ('cause
>>it's very slow in Perl 5).
>No, not at all. This isn't really a consideration as such. (The vtable 
>functions as designed are inadequate for most overloading, for example)

Hmm, I seem to remember vtables were being cited as a cure for lots of ills 
(perhaps combined with other aspects, like "make Perl nearly as fast as C".)

The vtables were implied (or possibly out-right stated) as giving the 
low-level core a more object-oriented structure: as you state below, 
branching and conditionals in the runtime can be eliminated by the values 
knowing how to operate on themselves.

It was also implied (or out-right claimed) that different 
objects/classes/packages/whatever could have class-specific vtables, 
defined at run-time, that would be used to handle the class-specific 
implementation details.  I'm not sure what that could refer to except ties 
and overloading; class-specific methods wouldn't go in the vtable.

There was some discussion that allowing the vtables to refer to functions 
written in perl would be a good idea, as it would allow extensions to be 
written in perl -- which is a good thing.

I had gotten the impression that the perl code-sequence:

   $a = $b + $c;

would generate the same op-code sequence regardless of the type of $a, $b, 
$c, and the vtables would do all the magic behind the scenes, calling tied 
or overloaded versions of the base functions if so defined for $a, $b, or $c.

Now I seem to be hearing that this is not the case, that variable ties and 
overloads are at a much higher level, never touching the vtables.  It now 
seems that the vtables will exist only for built-in types, and be 
inaccessible for user-defined types (unless those types are defined by the 
perl6 equivalent of XS, for example).  This almost seems to be defaulting 
on the promise of vtables I thought was made.

>On Wed, 6 Dec 2000, Dan Sugalski wrote:
> > Non-refcounting GC schemes are more expensive when they collect, but less
> > expensive otherwise, and it apparently is a win for the non-refcount
> > schemes.
>Which is why GC is intimately tied to DESTROY consideration in terms of
>Perl.  If we intend to honor predictable DESTROY timing, and I think we
>should, then we will need to reference count.  No ifs, elses or
>alternations.  Anyone care to refute?

This is not a complete refutation, but...

It seems to me that there are three types of thingies[1] we are concerned 
about, conceptually:

A) Thingies with no DESTROY considerations, which don't need refcounts.
B) Thingies with DESTROY methods, but aren't timing-sensitive.  They can be 
destroyed anytime after they die.  These don't really need refcounts either.
C) Thingies with DESTROY methods which need to be DESTROYed as soon as they 
die.  These would seem to need refcounts.

I think that distinguishing between B and C is a syntax issue out of scope 
here.  Although B could be lumped with A if we could tell B and C apart, 
I'll assume that we must lump B and C together.

If we could refcount only C for destruction, and let the GC-of-your-choice 
handle the actual memory reclamation, then the expense of refcounting 
should only affect C thingies.  I am uncertain what the ratio of C thingies 
to A thingies is, so I can't judge how big a win it is.

Theoretically, a non-refcount GC should never find any C thingies that 
would have a refcount>0, so the non-refcount GC shouldn't have to worry 
about it.

>If we're going to be ref-counting anyway then the performance gain of a
>non-refcounting GC, avoiding counting, is basically moot.  If we're
>ref-counting for DESTROY timing then we may as well use that data in the

But we only care about the ref-count for DESTROY timing.  If we can avoid 
counting for DESTROY timing insensitive thingies, we may still have a net 
performance gain.

>I'm not some kind of ref-count true-believer - if you think we should put
>this discussion off to a later date then I'm cool.  I'm just spoiling for
>some Perl 6 work to do and this area seemed ripe for critical development.

>At 05:59 PM 11/30/00 +, Nicholas Clark wrote:
>>On Thu, Nov 30, 2000 at 12:46:26PM -0500, Dan Sugalski wrote:
>> > (Moved over to -internals, since it's not really a parser API thing)
>> >
>> > At 11:06 AM 11/30/00 -0600, Jarkko Hietaniemi wrote:
>> > >Presumably.  But why are you then still talking about "the IV slot in
>> > >a scalar"...?  I'm slow today.  Show me how
>> > >
>> > > $a = 1.2; $b = 3; $c = $a + $b;
>> > >
>> > >is going to work, what kind of opcodes do you see being used?
>> > >(for the purposes of this exercise, you may not assume the optimizer
>> > >  doing $c = (1.2+3) behind the curtains :-)
>>$a=1; $b =3; $c = $a + $b
>No, that's naughty--it's much more interesting if the scalars are 
>different types.

OK, how would this sequence convert to opcodes?

$a=1.2; $b=5; $c = ($a.$b)*4;

Something like (using a load/store paradigm for the opcodes, for variety):

  setnum 1.2, r1;; $a = 1.2
  store  r1, $a
  setint 5, r2  ;; $b = 5
  store  r2, $b
  load   $a, r1 ;; ($a.$b)
  load   $b, r2
  append r1, r2, r3
  mul    r3, int 4, r4  ;; $c = ($a.$b)*4
  store  r4, $c

This is before obvious optimization (the loads are completely unnecessary, 
but are here as an example)

The append would do something like r1->vtable->append[int](r2), as per your 
last example, and would be responsible for coercing r2 to a string.  mul 
would do something like r3->vtable->mul[int](int2P6Scaler(4)), and the 
mul[int] associated with strings would do the necessary conversions.

What I'm curious about is the following sequence:

use MyRomanNumerals;
$a = MyRomanNumerals->new(4);
print $a; # should print "IV"
$b = 4;
print $a + $b; # should print "VIII", maybe...
print $b + $a; # should print "8", maybe...

The execution of the two additions should be, based on what was said 
before, something like:

   a->vtable->add[typeof b](b);
   b->vtable->add[typeof a](a);

How does b->vtable->add[] get an entry for MyRomanNumerals?

I seem to remember a suggestion made a long time ago that would have the 
vtable include methods to convert to the "standard types", so that if the 
calls were b->vtable->add(b,a) (and both operands had to be passed in; this 
is C we're talking about, not C++ or perl.  OO has to be done manually), 
then the add routine would do a->vtable->fetchint(a) to get the appropriate 
value.  Or something like that.  Have I confused something?
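
A minimal C sketch of that conversion-method idea, as I understand it. The names (Scalar, fetchint, add_via_fetchint, op_add) and the rule that the result takes the left operand's type are my assumptions, not anything settled on the list.

```c
#include <stdio.h>
#include <string.h>

typedef struct Scalar Scalar;

typedef struct VTable {
    long (*fetchint)(const Scalar *self);   /* convert to a "standard type" */
    void (*add)(const Scalar *self, const Scalar *other, Scalar *out);
    void (*to_string)(const Scalar *self, char *buf, size_t len);
} VTable;

struct Scalar {
    const VTable *vtable;
    long value;
};

static long plain_fetchint(const Scalar *self) { return self->value; }

/* add never needs to know the other operand's type: it just asks the
 * other operand for an integer.  The result keeps the left operand's
 * type (an assumption of this sketch). */
static void add_via_fetchint(const Scalar *self, const Scalar *other, Scalar *out) {
    out->vtable = self->vtable;
    out->value  = self->vtable->fetchint(self) + other->vtable->fetchint(other);
}

static void int_to_string(const Scalar *self, char *buf, size_t len) {
    snprintf(buf, len, "%ld", self->value);
}

/* Stand-in for the hypothetical MyRomanNumerals stringification. */
static void roman_to_string(const Scalar *self, char *buf, size_t len) {
    static const struct { long v; const char *s; } tab[] = {
        {1000, "M"}, {900, "CM"}, {500, "D"}, {400, "CD"}, {100, "C"},
        {90, "XC"}, {50, "L"}, {40, "XL"}, {10, "X"}, {9, "IX"},
        {5, "V"}, {4, "IV"}, {1, "I"}
    };
    long n = self->value;
    size_t used = 0;
    if (len == 0) return;
    buf[0] = '\0';
    for (size_t k = 0; k < sizeof tab / sizeof tab[0]; k++) {
        size_t sl = strlen(tab[k].s);
        /* Stop appending (truncate) if the buffer is too small. */
        while (n >= tab[k].v && used + sl < len) {
            memcpy(buf + used, tab[k].s, sl + 1);
            used += sl;
            n -= tab[k].v;
        }
    }
}

const VTable int_vtable   = { plain_fetchint, add_via_fetchint, int_to_string };
const VTable roman_vtable = { plain_fetchint, add_via_fetchint, roman_to_string };

/* Both operands are passed explicitly, since C has no implicit "this". */
void op_add(const Scalar *a, const Scalar *b, Scalar *out) {
    a->vtable->add(a, b, out);
}
```

Under these assumptions the plain integer's add needs no entry for MyRomanNumerals at all. With a roman 4 and a plain 4, op_add(a, b) stringifies as a Roman numeral and op_add(b, a) as a plain integer, which matches the "maybe" comments in the Perl above.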

>Yup. What add does is based on the types of the two operands. In the more 
>odd cases, I assume its type stuff will be based on the left-hand 
>operand, but I wouldn't bet the farm on that yet, as that's a Larry call.

That's what I assumed above, but who knows?

>> > But that probably doesn't help much. Let me throw together something more
>> > detailed and we'll see where we go from there.
>>Hopefully it will cover the above case too.
>What, the "what if one of the operands is really bizarre" case?

And with Perl6, I thought we were planning on allowing some really bizarre 
cases?  Has Larry indicated at all what his thoughts on fast, powerful 
TIED variables are?

> Dan

At 05:59 PM 11-30-2000 +, Nicholas Clark wrote:
>On Thu, Nov 30, 2000 at 12:46:26PM -0500, Dan Sugalski wrote:

(Note, Dan was writing about "$a=1.2; $b=3; $c = $a + $b")

>$a=1; $b =3; $c = $a + $b
> > If they don't exist already, then something like:
> >
> >  newscalar   a, num, 1.2
> >  newscalar   b, int, 3
> >  newscalar   c, num, 0
> >  add t3, a, b
>and $c ends up a num?
>why that line "newscalar c, num, 0" ?
>It looks to me like add needs to be polymorphic and work out the best
>compromise for the type of scalar to create based on the integer/num/
>complex/oddball types of its two operands.

I think the "add t3, a, b" was a typo, and should be "add c, a, b".

Another way of looking at it, assuming that the Perl6 interpreter is 
stack-based, not register-based, is that the sequence would get converted 
into something like this:

 push  num 1.2   ;; literal can be precomputed at compile time
 newscaler a     ;; get value from top of stack
 push  int 3     ;; literal can be precomputed at compile time
 newscaler b
 push  a
 push  b
 add             ;; pops both operands, pushes the result
 newscaler c

The "add" op would, in C code, do something like:

void add(void) {
   P6Scaler *addend;
   P6Scaler *adder;

   /* pop the two operands, then push whatever the vtable's add returns */
   addend = pop();  adder = pop();
   push(addend->vtable->add(addend, adder));
}

It would be up to addend->vtable->add() to figure out how to do the 
actual addition, and what type to return.
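
Here is a slightly fuller, compilable sketch of the same idea. It is only a sketch under my own assumptions: a small fixed-size stack of values rather than pointers, dispatch on the left-hand operand (the one pushed first), and a promotion rule where int plus num gives num. The helper names (p6_int, p6_num, op_add, as_num) are made up.

```c
#include <stddef.h>

typedef struct P6Scaler P6Scaler;

typedef struct VTable {
    /* Decides both how to add and what type the result has. */
    P6Scaler (*add)(const P6Scaler *self, const P6Scaler *other);
} VTable;

struct P6Scaler {
    const VTable *vtable;
    int    is_num;   /* 0: value is in i, 1: value is in n */
    long   i;
    double n;
};

extern const VTable int_vtable, num_vtable;

static double as_num(const P6Scaler *s) {
    return s->is_num ? s->n : (double)s->i;
}

static P6Scaler num_add(const P6Scaler *self, const P6Scaler *other) {
    P6Scaler r = { &num_vtable, 1, 0, as_num(self) + as_num(other) };
    return r;
}

static P6Scaler int_add(const P6Scaler *self, const P6Scaler *other) {
    if (other->is_num)                 /* int + num: promote to num */
        return num_add(self, other);
    P6Scaler r = { &int_vtable, 0, self->i + other->i, 0.0 };
    return r;
}

const VTable int_vtable = { int_add };
const VTable num_vtable = { num_add };

P6Scaler p6_int(long v)   { P6Scaler s = { &int_vtable, 0, v, 0.0 }; return s; }
P6Scaler p6_num(double v) { P6Scaler s = { &num_vtable, 1, 0, v };   return s; }

#define STACK_MAX 16
static P6Scaler stack[STACK_MAX];
static size_t   depth = 0;

/* push and pop return 0 on success, -1 on overflow or underflow. */
int push(P6Scaler v) {
    if (depth == STACK_MAX) return -1;
    stack[depth++] = v;
    return 0;
}

int pop(P6Scaler *out) {
    if (depth == 0) return -1;
    *out = stack[--depth];
    return 0;
}

/* The "add" op: pop right then left, let the left operand's vtable decide. */
int op_add(void) {
    P6Scaler rhs, lhs;
    if (depth < 2) return -1;
    pop(&rhs);
    pop(&lhs);
    return push(lhs.vtable->add(&lhs, &rhs));
}
```

Pushing p6_num(1.2) and p6_int(3) and then calling op_add() leaves a num on the stack, while two ints leave an int. The op itself never looks at the types, which is the point: the "best compromise" logic lives entirely behind the vtable.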

> > But that probably doesn't help much. Let me throw together something more
> > detailed and we'll see where we go from there.
>Hopefully it will cover the above case too.
>Nicholas Clark

At 04:43 PM 8/31/00 -0400, Dan Sugalski wrote:
>Okay, here's a list of functions I think should go into variable vtables. 
>Functions marked with a * will take an optional type offset so we can 
>handle asking for various permutations of the basic type.

Perhaps I'm missing something...  Is this for scalars alone?  I see no 
arrays/hashes here.

>get_string *
>get_int *
>get_float *
>set_string *
>set_int *
>set_float *
>add *
>subtract *
>multiply *
>divide *
>modulus *
>clone (returns a new copy of the thing in question)
>new (creates a new thing)
>is_equal (true if this thing is equal to the parameter thing)
>is_same (True if this thing is the same thing as the parameter thing)
>bind (For =~)
>repeat (For x)
>Anyone got anything to add before I throw together the base vtable RFC?
> Dan
>--"it's like this"---
>Dan Sugalski  even samurai
>[EMAIL PROTECTED] have teddy bears and even
>  teddy bears get drunk
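
For discussion, the list above might translate into a C struct along these lines. This is purely a guess at the shape: every signature, the PerlVar type, the int type_offset parameter standing in for the "optional type offset", and the new_var name (for "new") are my assumptions. Only a few slots are filled in, for a plain integer.

```c
#include <stddef.h>

typedef struct PerlVar PerlVar;

/* Shared shape for the arithmetic entries marked with * in the list. */
typedef void (*arith_fn)(const PerlVar *self, const PerlVar *other,
                         PerlVar *out, int type_offset);

typedef struct VarVTable {
    const char *(*get_string)(const PerlVar *self, int type_offset);
    long        (*get_int)(const PerlVar *self, int type_offset);
    double      (*get_float)(const PerlVar *self, int type_offset);
    void (*set_string)(PerlVar *self, const char *v, int type_offset);
    void (*set_int)(PerlVar *self, long v, int type_offset);
    void (*set_float)(PerlVar *self, double v, int type_offset);
    arith_fn add, subtract, multiply, divide, modulus;
    PerlVar *(*clone)(const PerlVar *self);       /* new copy of the thing */
    PerlVar *(*new_var)(void);                    /* "new" in the list */
    int (*is_equal)(const PerlVar *self, const PerlVar *other);
    int (*is_same)(const PerlVar *self, const PerlVar *other);
    int (*bind)(PerlVar *self, const PerlVar *pattern);             /* =~ */
    void (*repeat)(const PerlVar *self, long count, PerlVar *out);  /* x  */
} VarVTable;

struct PerlVar {
    const VarVTable *vtable;
    long ival;   /* payload for the integer case only */
};

static long int_get_int(const PerlVar *self, int off) {
    (void)off;
    return self->ival;
}

static void int_set_int(PerlVar *self, long v, int off) {
    (void)off;
    self->ival = v;
}

static void int_add(const PerlVar *self, const PerlVar *other,
                    PerlVar *out, int off) {
    out->vtable = self->vtable;
    out->ival   = self->ival + other->vtable->get_int(other, off);
}

/* Equal: same value.  Same: literally the same variable. */
static int int_is_equal(const PerlVar *self, const PerlVar *other) {
    return self->ival == other->vtable->get_int(other, 0);
}

static int int_is_same(const PerlVar *self, const PerlVar *other) {
    return self == other;
}

/* Slots not listed here are left NULL in this sketch. */
const VarVTable int_var_vtable = {
    .get_int  = int_get_int,
    .set_int  = int_set_int,
    .add      = int_add,
    .is_equal = int_is_equal,
    .is_same  = int_is_same,
};
```

Note that nothing here covers arrays or hashes, which is the gap I was asking about above.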

At 11:26 AM 8/23/00 -0700, Larry Wall wrote:

>I expect that we'll get more compile-time benefit from
> my HASH sub foo {
> ...
> }
> %bar = foo();

So how would you fill in the type in:

my TYPE sub foo {
   if (wanthash())   { return %bar;  }
   if (wantarray())  { return @baz;  }
   if (wantscalar()) { return $quux; }
}

$scalar = foo();
@array  = foo();
%hash   = foo();
