This fragment of response is about types, layers of abstraction and
tracking information as the stages of compilation progress.
And, I probably haven't said it enough yet, but the work you've done
here is absolutely wonderful, Patrick. There's nothing like a solid
chunk of working code to push the design to the next stage of evolution. :)
Patrick R. Michaud wrote:
On Sun, Nov 26, 2006 at 08:30:32PM -0800, Allison Randal wrote:
- Is there no way to indicate what type of variable a PAST::Var is?
Scalar/Array/Hash? (high-level types, not low-level types)
Sure, that's what 'vtype' is -- it indicates the type of value
that the variable ought to hold.
My plan has been to follow the Perl6 concept of "implementation types"
and "value types" within PAST. Thus far I've only put in the support
for the value types, as the "vtype" attribute (and vtype can be any
high-level type the language happens to support). I'm expecting
to add an "itype" attribute at some point when we're a bit farther
along; I'm still working out the details.
Hrm... you've really got two HLL types: the container type
(scalar/array/hash) and the value type (Str, Int, Foo::Bar, Array, Hash,
Matrix, Custom::Hash, etc).
You've also essentially got two PIR types: the container type
(int/num/str/pmc) and the value type (int, num, str, or some pmc type).
By "implementation type" do you mean the PIR value type?
A YAML config file to map HLL value types to PIR value types for a
particular compiler would be another nice addition. PAST doesn't need to
know anything about PIR types.
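
As a rough sketch of that mapping idea (written as a Python dict rather than YAML so it stays self-contained): the table contents and the name pir_type_for are illustrative assumptions, not anything past-pm defines. The point is only that PAST carries the HLL name and a per-compiler table supplies the PIR side.

```python
# Hypothetical per-compiler table: HLL value type -> PIR value type.
# In a real compiler this would be loaded from a config file.
HLL_TO_PIR = {
    "Int": "Integer",
    "Num": "Float",
    "Str": "String",
    "Array": "ResizablePMCArray",
    "Hash": "Hash",
}


def pir_type_for(hll_type):
    """Return the PIR value type configured for an HLL value type name."""
    try:
        return HLL_TO_PIR[hll_type]
    except KeyError:
        raise KeyError(f"no PIR mapping configured for HLL type {hll_type!r}")


if __name__ == "__main__":
    print(pir_type_for("Str"))
```

An unmapped HLL type fails loudly here, on the assumption that a silent default would hide a missing config entry.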
- In PAST nodes, the attribute 'ctype' isn't actually storing a C
language type. Better name?
It really stands for "constant type", and is one of 'i', 'n', or
's' depending on whether it can be treated as an int, num, or
string when being handled as a constant in PIR.
Okay, 'const_type' is a better name.
- The attribute 'vtype' is both variable type in POST::Var and value
type in POST::Val. Handy generalization, but it's not clear from the
name that 'vtype' is either of those things.
I think you meant PAST::Var/PAST::Val here, as there isn't a POST::Var
or POST::Val.
Indeed I did. Though, why isn't there a POST::Var or POST::Val? POST has
both variables and values.
But 'vtype' really stands for "value type" in both
cases -- it's the type of value returned by either a PAST::Var
or PAST::Val node.
Hmm... If a PAST::Var is, say, an integer constant, will it have the
same 'value_type' as an integer PAST::Val?
(Definitely go with the longer name instead of 'vtype'.)
- The values for both 'ctype' and 'vtype' are obscure. Better to
establish a general system for representing types, than to include raw
Parrot types or 1-letter codes in the AST.
Ultimately I expect that the types that appear in 'vtype' will
be the types defined by the HLL itself. For example, in perl6
one would see 'vtype'=>'Str' to indicate a Perl 6 string constant.
Unfortunately it's been difficult to illustrate this in real code
because of the HLL classname conflicts that I've been reporting
in other contexts.
What bug # is that? It's hard to imagine how an HLL type name that's
only stored in an AST would conflict with a Parrot class name. Or, are
you assuming that the HLL type names have to be the same as the Parrot
class names? Shouldn't need to be the same, you just need a config file
mapping between the two.
I agree the values and name for 'ctype' are a bit obscure, and
will gladly accept any suggestions for improving it. The 'ctype'
attribute is really just code optimization in the final output,
and it does assume some knowledge of the target. If no ctype
is specified, past-pm assumes that the constant value must
first be placed into a PMC in order to be useful. With
a ctype present, then past-pm can match up the (PIR) opcode
contexts in which the value can be directly used as an
int/num/string in an operation. It's the difference between
    # $b + 2                      # $b + 2
    get_global $P0, '$b'          get_global $P0, '$b'
    new $P2, .Undef               new $P1, .Integer
    add $P2, $P0, 2               assign $P1, 2
                                  new $P2, .Undef
                                  add $P2, $P0, $P1
or
    # say 3, 4, 5                 # say 3, 4, 5
    "say"(3, 4, 5)                new $P1, .Integer
                                  assign $P1, 3
                                  new $P2, .Integer
                                  assign $P2, 4
                                  new $P3, .Integer
                                  assign $P3, 5
                                  "say"($P1, $P2, $P3)
Okay, if ctype is an optimization hint, then you don't actually need to
list the specific types (i/n/s) in the PAST nodes. All you need is the
name of the HLL value type, and a small bit of config info for that type
name. Whether a particular HLL type can be used directly as an int, num,
or string, and which it can be used as, is always consistent for that
type. Int can be used as a low-level integer, and Matrix can never be
used as a low-level constant.
So, PAST provides the HLL type name, a configuration file provides
details about that type, and the PAST-to-POST transformation decides
when to use direct values (for the HLL types that allow it).
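
A minimal sketch of that division of labor, in Python for illustration only. The TYPE_INFO table, its field names, and emit_constant are all hypothetical; the emitted PIR lines follow the shape of the examples quoted above.

```python
# Hypothetical per-type config: which PMC boxes the value, and whether the
# type can be used directly as a low-level int/num/string constant.
TYPE_INFO = {
    "Int": {"pmc": "Integer", "const_type": "i"},
    "Num": {"pmc": "Float", "const_type": "n"},
    "Str": {"pmc": "String", "const_type": "s"},
    "Matrix": {"pmc": "Matrix", "const_type": None},  # never a raw constant
}


def emit_constant(value_type, literal, reg, allow_inline):
    """Return (operand, pir_lines) for a constant of the given HLL type.

    If the opcode context allows an inline constant and the type supports
    it, the literal is used directly; otherwise it is boxed into a PMC.
    """
    info = TYPE_INFO[value_type]
    if allow_inline and info["const_type"] is not None:
        return literal, []
    return reg, [f"new {reg}, .{info['pmc']}", f"assign {reg}, {literal}"]


if __name__ == "__main__":
    print(emit_constant("Int", "2", "$P1", allow_inline=True))
    print(emit_constant("Int", "2", "$P1", allow_inline=False))
```

The PAST node only needs to say "Int"; the inline-or-box decision lives in the transformation, driven by the table.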
- In PAST nodes again, I'm not clear on what 'pirop' (PAST::Op)
represents. Is it the literal name of a PIR opcode, or a generic
representation of standard low-level operations? I'm more in favor of
the latter. Better still, give compiler-writers a standard format lookup
table they can write to allow the PAST to POST transformation to select
the right PIR operation from the HLL op name. (See the comments on
boundaries of abstraction.)
I think past-pm already has exactly what you want here, but it
may not be entirely clear. First, 'pirop' does exactly what you
request in 'Better still, ...' -- it provides a way for the compiler
writer to identify the right PIR operation from the HLL op name.
In particular, in the operator-precedence specification a
compiler writer writes:
proto infix:+ is pirop('add') { ... }
proto infix:- is pirop('sub') { ... }
and this provides an easy way for the parse-to-past transformation
to associate the correct PIR operation with the HLL op name.
Essentially, the transformation looks for a 'pirop' trait on
the operator, and if found it puts it in the 'pirop' attribute
of the corresponding PAST::Op node.
Aye, that's how I have it working now. (Actually 'n_add' instead of
'add', because 'add' didn't work, so I cargo-culted from the perl6
implementation.)
The values of 'pirop' are really generic representations of
standard low-level operators. Unfortunately, PIR is not as
regular as we might like it to be -- some PIR operations will
work only with pmc operands, some will work with a variety of
int/num/string/pmc operands, and still others won't work with
pmcs at all. So, POST.pir has a lookup table (%pirtable)
that takes the generic name given by 'pirop' and does any
necessary coercions to get the operands to match. So far
this table is incomplete -- I've been adding entries only
as I need them.
Okay, good. This is a nice abstraction layer. And, I note it can work
equally well whether the optable is generated from the parser grammar or
from a separate config file. Also good.
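
To make the lookup-and-coerce idea concrete, here is a small Python sketch. It does not reproduce the actual format of %pirtable in POST.pir; the table layout, the temporary register numbering, and coerce_operands are assumptions chosen for illustration, and only the source operands are modeled.

```python
# Hypothetical table: generic op name -> operand kind wanted per position.
# Kinds: 'i', 'n', 's', 'p', or 'any' when the opcode accepts anything.
PIRTABLE = {
    "concat": ["s", "s"],   # concat wants string operands
    "add": ["p", "any"],    # first operand a PMC, second unrestricted
}


def coerce_operands(pirop, operands):
    """Return (pir_lines, operand_names) with operands coerced as needed.

    operands is a list of (kind, name) pairs describing what we have.
    """
    wanted = PIRTABLE[pirop]
    if len(wanted) != len(operands):
        raise ValueError(f"{pirop} expects {len(wanted)} operands")
    lines, names = [], []
    for want, (kind, name) in zip(wanted, operands):
        if want in ("any", kind):
            names.append(name)            # already acceptable as-is
        elif want in ("i", "n", "s"):
            temp = f"${want.upper()}{100 + len(lines)}"
            lines.append(f"set {temp}, {name}")   # unbox into a register
            names.append(temp)
        else:
            # Boxing into a PMC needs a target type; left out of this sketch.
            raise NotImplementedError(f"cannot coerce {kind} to {want}")
    return lines, names


if __name__ == "__main__":
    print(coerce_operands("concat", [("p", "$P0"), ("s", "$S1")]))
```

The same function works whether PIRTABLE is built from grammar traits or read from a separate file, since it only sees the finished table.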
The idea is that a compiler writer can use 'pirop' to
specify the mapping of HLL operators into PIR opcodes
directly in the grammar files where the HLL operators are
being defined. Furthermore, the compiler writer doesn't have
to keep track of the low-level details for each PIR opcode;
i.e., when specifying 'pirop'=>'concat' the past-pm code
generation knows that concat needs string operands. (However,
if a compiler writer needs a specific PIR opcode, then they
could specify it with something like 'pirop'=>'concat_p_sc'.)
Reasonable. The association to PIR opcode names has to be declared
somewhere. We can probably come up with better syntax than Parrot's
cryptic internal 'concat_p_sc', but it's good enough to start.
- It would be easier to maintain (and create) the list of HLL to PIR
operator associations in something like a YAML file than embedded in the
parser grammar file. [...]
Hmm. My feeling was that it was easier to put the operator
associations in the parser grammar file, but I can see the value
of placing them somewhere else, and I definitely would like to
keep Parrot-isms out of the AST as much as possible.
OTOH, there are many times when for optimization reasons or
other items it's useful to be able to drop some Parrot hints
directly into the AST (e.g., the 'ctype' attribute above), and
so I think that as long as full program semantics are captured
in the AST without any Parrot-specific items, it's okay to have
Parrot-specific items available in the AST as compiler hints
simply because it's sometimes easier to place them there than
elsewhere.
Sounds like we're in philosophical agreement. I'm okay with having a
limited amount of Parrot-specific information in the AST, if it's
extraneous to representing the semantics of the source code. At the same
time, if the compiler hints are stored in an optable that's accessible
from all stages of compilation, I don't see the advantage of annotating
them in the AST. It just spends additional processor time and storage to
create an unused copy of the information. So, "allowed but rare" would
be my rule of thumb.
Still, that question is completely separate from the question of where
the compiler writer declares the optable information. For now, let's
take both options on that one: keep the traits on the operator
precedence parser rules, but provide a config file format to generate
optables independently. (We probably need to provide the second option
anyway, since not every compiler writer will use PAST, or even PGE.)
At the very least, the 'pirop' property on parser rules could be handled
by the PAST-to-POST transformation, so the compiler writer doesn't have
to manually pull those values out of the parser grammar's optable when
creating the AST.
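
A sketch of what the config-file route might look like, using JSON from the Python standard library as a stand-in for whatever format gets chosen. The file layout and the names load_optable and pirop_for are hypothetical; the two entries mirror the infix:+ and infix:- traits quoted earlier.

```python
import json

# Hypothetical optable config: HLL operator name -> properties.
OPTABLE_CONFIG = """
{
  "infix:+": {"pirop": "add"},
  "infix:-": {"pirop": "sub"}
}
"""


def load_optable(text):
    """Parse an optable config into a dict keyed by HLL operator name."""
    return json.loads(text)


def pirop_for(optable, hll_op):
    """Look up the pirop for an HLL operator, or None if none is declared.

    This is the lookup the PAST-to-POST stage could do itself, so the
    compiler writer does not have to copy the value into each AST node.
    """
    entry = optable.get(hll_op)
    return entry["pirop"] if entry else None


if __name__ == "__main__":
    optable = load_optable(OPTABLE_CONFIG)
    print(pirop_for(optable, "infix:+"))
```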
Agreed -- I'll work on this.
Excellent.
Allison