This part of my response is about types, layers of abstraction, and tracking information as the stages of compilation progress.

And, I probably haven't said it enough yet, but the work you've done here is absolutely wonderful, Patrick. There's nothing like a solid chunk of working code to push the design to the next stage of evolution. :)

Patrick R. Michaud wrote:
On Sun, Nov 26, 2006 at 08:30:32PM -0800, Allison Randal wrote:

- Is there no way to indicate what type of variable a PAST::Var is? Scalar/Array/Hash? (high-level types, not low-level types)

Sure, that's what 'vtype' is -- it indicates the type of value
that the variable ought to hold.

My plan has been to follow the Perl6 concept of "implementation types"
and "value types" within PAST.  Thus far I've only put in the support
for the value types, as the "vtype" attribute (and vtype can be any
high-level type the language happens to support).  I'm expecting
to add an "itype" attribute at some point when we're a bit farther
along; I'm still working out the details.

Hrm... you've really got two HLL types: the container type (scalar/array/hash) and the value type (Str, Int, Foo::Bar, Array, Hash, Matrix, Custom::Hash, etc).

You've also essentially got two PIR types: the container type (int/num/str/pmc) and the value type (int, num, str, or some pmc type).

By "implementation type" do you mean the PIR value type?


A YAML config file to map HLL value types to PIR value types for a particular compiler would be another nice addition. PAST doesn't need to know anything about PIR types.
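
To make the mapping idea concrete, here is a minimal sketch in Python, used purely for illustration. The dict stands in for whatever a parsed YAML file would produce. The particular HLL type names, the PMC names chosen for them, and the function name `pir_value_type` are all assumptions, not part of past-pm.

```python
# Hypothetical per-compiler type map, standing in for a parsed YAML file.
# The HLL names (left) and the PMC names chosen for them (right) are
# illustrative assumptions only.
HLL_TO_PIR_TYPE = {
    "Int": "Integer",
    "Num": "Float",
    "Str": "String",
    "Array": "ResizablePMCArray",
    "Hash": "Hash",
}


def pir_value_type(hll_type, type_map=HLL_TO_PIR_TYPE):
    """Return the PIR value type configured for an HLL type name.

    Raises KeyError with a readable message when the compiler's
    config has no entry for the type.
    """
    try:
        return type_map[hll_type]
    except KeyError:
        raise KeyError(
            f"no PIR type configured for HLL type {hll_type!r}"
        ) from None
```

With something like this, the AST only ever carries the name on the left, and only the code generator consults the map.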


- In PAST nodes, the attribute 'ctype' isn't actually storing a C language type. Better name?

It really stands for "constant type", and is one of 'i', 'n', or
's' depending on whether it can be treated as an int, num, or
string when being handled as a constant in PIR.

Okay, 'const_type' is a better name.

- The attribute 'vtype' is both variable type in POST::Var and value type in POST::Val. Handy generalization, but it's not clear from the name that 'vtype' is either of those things.

I think you meant PAST::Var/PAST::Val here, as there isn't a POST::Var
or POST::Val.

Indeed I did. Though, why isn't there a POST::Var or POST::Val? POST has both variables and values.

But 'vtype' really stands for "value type" in both
cases -- it's the type of value returned by either a PAST::Var
or PAST::Val node.

Hmm... If a PAST::Var is, say, an integer constant, will it have the same 'value_type' as an integer PAST::Val?

(Definitely go with the longer name instead of 'vtype'.)

- The values for both 'ctype' and 'vtype' are obscure. Better to establish a general system for representing types, than to include raw Parrot types or 1-letter codes in the AST.

Ultimately I expect that the types that appear in 'vtype' will
be the types defined by the HLL itself.  For example, in perl6
one would see 'vtype'=>'Str' to indicate a Perl 6 string constant.
Unfortunately it's been difficult to illustrate this in real code because of the HLL classname conflicts that I've been reporting in other contexts.

What bug # is that? It's hard to imagine how an HLL type name that's only stored in an AST would conflict with a Parrot class name. Or, are you assuming that the HLL type names have to be the same as the Parrot class names? Shouldn't need to be the same, you just need a config file mapping between the two.

I agree the values and name for 'ctype' are a bit obscure, and
will gladly accept any suggestions for improving it. The 'ctype' attribute is really just a code optimization in the final output, and it does assume some knowledge of the target. If no ctype is specified, past-pm assumes that the constant value must first be placed into a PMC in order to be useful. With
a ctype present, then past-pm can match up the (PIR) opcode
contexts in which the value can be directly used as an int/num/string in an operation. It's the difference between

    # $b + 2                             # $b + 2
    get_global $P0, '$b'                 get_global $P0, '$b'
    new $P2, .Undef                      new $P1, .Integer
    add $P2, $P0, 2                      assign $P1, 2
                                         new $P2, .Undef
                                         add $P2, $P0, $P1

or

    # say 3, 4, 5                        # say 3, 4, 5
    "say"(3, 4, 5)                       new $P1, .Integer
                                         assign $P1, 3
                                         new $P2, .Integer
                                         assign $P2, 4
                                         new $P3, .Integer
                                         assign $P3, 5
                                         "say"($P1, $P2, $P3)

Okay, if ctype is an optimization hint, then you don't actually need to list the specific types (i/n/s) in the PAST nodes. All you need is the name of the HLL value type, and a small bit of config info for that type name. Whether a particular HLL type can be used directly as an int, num, or string, and which it can be used as, is always consistent for that type. Int can be used as a low-level integer, and Matrix can never be used as a low-level constant.

So, PAST provides the HLL type name, a configuration file provides details about that type, and the PAST-to-POST transformation decides when to use direct values (for the HLL types that allow it).
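
A rough sketch of that division of labor, in Python for illustration only. The `TYPE_CONFIG` layout, the `emit_constant` function, and the `accepts` parameter (the set of operand kinds the consuming opcode slot takes directly) are assumptions made up for this example. The boxed output follows the `new`/`assign` pattern from the PIR listing above.

```python
# Hypothetical type config: for each HLL type, which raw PIR constant kind
# it may be used as ('i' int, 'n' num, 's' string) and which PMC class
# boxes it. All names here are illustrative assumptions.
TYPE_CONFIG = {
    "Int": {"const_type": "i", "pmc": "Integer"},
    "Num": {"const_type": "n", "pmc": "Float"},
    "Str": {"const_type": "s", "pmc": "String"},
}


def emit_constant(value_type, literal, accepts, next_reg):
    """Return (pir_lines, operand) for a constant of the given HLL type.

    If the consuming opcode slot accepts the type's raw constant kind,
    the literal is used directly and no setup code is emitted.
    Otherwise the value is boxed into a fresh PMC register first.
    """
    info = TYPE_CONFIG[value_type]
    if info["const_type"] in accepts:
        return [], literal
    reg = f"$P{next_reg}"
    return [f"new {reg}, .{info['pmc']}", f"assign {reg}, {literal}"], reg
```

For example, `emit_constant("Int", "2", {"i", "P"}, 1)` yields no setup lines and the operand `2`, while the same call with `accepts={"P"}` yields the `new`/`assign` pair and the operand `$P1`. The PAST node itself only needs to say "Int".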


- In PAST nodes again, I'm not clear on what 'pirop' (PAST::Op) represents. Is it the literal name of a PIR opcode, or a generic representation of standard low-level operations? I'm more in favor of the latter. Better still, give compiler-writers a standard format lookup table they can write to allow the PAST to POST transformation to select the right PIR operation from the HLL op name. (See the comments on boundaries of abstraction.)

I think past-pm already has exactly what you want here, but it
may not be entirely clear.  First, 'pirop' does exactly what you
request in 'Better still, ...' -- it provides a way for the compiler
writer to identify the right PIR operation from the HLL op name.
In particular, in the operator-precedence specification a
compiler writer writes:

    proto infix:+ is pirop('add') { ... }
    proto infix:- is pirop('sub') { ... }

and this provides an easy way for the parse-to-past transformation
to associate the correct PIR operation from the HLL op name.
Essentially, the transformation looks for a 'pirop' trait on
the operator, and if found it puts it in the 'pirop' attribute
of the corresponding PAST::Op node.

Aye, that's how I have it working now. (Actually 'n_add' instead of 'add', because 'add' didn't work, so I cargo-culted from the perl6 implementation.)

The values of 'pirop' are really generic representations of
standard low-level operators. Unfortunately, PIR is not as regular as we might like it to be -- some PIR operations will work only with pmc operands, some will work with a variety of int/num/string/pmc operands, and still others won't work with
pmcs at all.  So, POST.pir has a lookup table (%pirtable)
that takes the generic name given by 'pirop' and does any
necessary coercions to get the operands to match.  So far
this table is incomplete -- I've been adding entries only
as I need them.

Okay, good. This is a nice abstraction layer. And, I note it can work equally well whether the optable is generated from the parser grammar or from a separate config file. Also good.
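
For readers following along, the lookup-and-coerce idea might look something like the sketch below (Python, illustration only). This is not the actual %pirtable format in POST.pir: the table layout, the `BOX` map, the `coerce_operands` function, and the temporary register numbering starting at 100 are all assumptions.

```python
# Hypothetical table: generic op name -> operand kinds each source slot
# accepts directly. 'P' = PMC, 'i' = int, 'n' = num, 's' = string.
# The first kind listed for a slot is the one we coerce to.
PIRTABLE = {
    "add": ["Pin", "Pin"],
    "concat": ["s", "s"],
}

# PMC class used when a raw value has to be boxed (illustrative).
BOX = {"i": "Integer", "n": "Float", "s": "String"}


def coerce_operands(pirop, operands):
    """Coerce (kind, text) operands to what the generic op accepts.

    Returns (setup_lines, operand_texts). Operands that already match
    are passed through untouched.
    """
    accepted = PIRTABLE[pirop]
    if len(accepted) != len(operands):
        raise ValueError(f"{pirop} takes {len(accepted)} operands")
    lines, out = [], []
    for n, ((kind, text), ok) in enumerate(zip(operands, accepted)):
        if kind in ok:
            out.append(text)
            continue
        target = ok[0]
        if target == "P":
            # Box a raw value into a temporary PMC register.
            reg = f"$P{100 + n}"
            lines += [f"new {reg}, .{BOX[kind]}", f"assign {reg}, {text}"]
        else:
            # Convert into a temporary int/num/string register.
            reg = f"${target.upper()}{100 + n}"
            lines.append(f"set {reg}, {text}")
        out.append(reg)
    return lines, out
```

Here `coerce_operands("concat", [("P", "$P0"), ("s", "'x'")])` first converts `$P0` into a string register, while `coerce_operands("add", [("P", "$P0"), ("i", "2")])` passes both operands through unchanged.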

The idea is that a compiler writer can use 'pirop' to specify the mapping of HLL operators into PIR opcodes
directly in the grammar files where the HLL operators are
being defined.  Furthermore, the compiler writer doesn't have
to keep track of the low-level details for each PIR opcode;
i.e., when specifying 'pirop'=>'concat' the past-pm code
generation knows that concat needs string operands.  (However,
if a compiler writer needs a specific PIR opcode, then they
could specify it with something like 'pirop'=>'concat_p_sc'.)

Reasonable. The association to PIR opcode names has to be declared somewhere. We can probably come up with better syntax than Parrot's cryptic internal 'concat_p_sc', but it's good enough to start.

- It would be easier to maintain (and create) the list of HLL to PIR operator associations in something like a YAML file than embedded in the parser grammar file. [...]

Hmm.  My feeling was that it was easier to put the operator
associations in the parser grammar file, but I can see the value
of placing them somewhere else, and I definitely would like to
keep Parrot-isms out of the AST as much as possible.

OTOH, there are many times when for optimization reasons or
other items it's useful to be able to drop some Parrot hints
directly into the AST (e.g., the 'ctype' attribute above), and
so I think that as long as full program semantics are captured
in the AST without any Parrot-specific items, it's okay to have
Parrot-specific items available in the AST as compiler hints
simply because it's sometimes easier to place them there than
elsewhere.

Sounds like we're in philosophical agreement. I'm okay with having a limited amount of Parrot-specific information in the AST, if it's extraneous to representing the semantics of the source code. At the same time, if the compiler hints are stored in an optable that's accessible from all stages of compilation, I don't see the advantage of annotating them in the AST. It just spends additional processor time and storage to create an unused copy of the information. So, "allowed but rare" would be my rule of thumb.

Still, that question is completely separate from the question of where the compiler writer declares the optable information. For now, let's take both options on that one: keep the traits on the operator precedence parser rules, but provide a config file format to generate optables independently. (We probably need to provide the second option anyway, since not every compiler writer will use PAST, or even PGE.)

At the very least, the 'pirop' property on parser rules could be handled by the PAST-to-POST transformation, so the compiler writer doesn't have to manually pull those values out of the parser grammar's optable when creating the AST.
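
As a sketch of what I mean (Python, illustration only): the optable is built once from a standalone config, and the PAST-to-POST step looks operations up by HLL op name, so the AST node never has to carry the 'pirop' value. The line-oriented config format and the function names `load_optable` and `pirop_for` are made up for this example and are not a proposed standard. The two entries mirror the `infix:+` and `infix:-` traits quoted earlier.

```python
def load_optable(text):
    """Parse lines of the form '<hll-op-name> <pirop>' into a dict.

    Blank lines and lines starting with '#' are ignored.
    """
    table = {}
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(
                f"line {lineno}: expected '<op> <pirop>', got {raw!r}"
            )
        table[parts[0]] = parts[1]
    return table


# Hypothetical standalone config, equivalent to the grammar traits.
CONFIG = """\
# HLL operator   PIR operation
infix:+          add
infix:-          sub
"""


def pirop_for(op_name, optable):
    """Look up the generic PIR operation for an HLL op name.

    Returns None when the optable has no entry for the operator.
    """
    return optable.get(op_name)
```

The same table could just as easily be filled from the traits on the operator precedence parser rules. The point is only that the lookup happens in the transformation rather than in hand-written AST construction code.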

Agreed -- I'll work on this.

Excellent.

Allison
