Author: allison
Date: Tue Jul 29 12:34:52 2008
New Revision: 29859

Modified:
   trunk/docs/pdds/draft/pdd19_pir.pod
Log: [pdd] Architectural review of PIR PDD. Modified: trunk/docs/pdds/draft/pdd19_pir.pod ============================================================================== --- trunk/docs/pdds/draft/pdd19_pir.pod (original) +++ trunk/docs/pdds/draft/pdd19_pir.pod Tue Jul 29 12:34:52 2008 @@ -12,20 +12,15 @@ =head1 ABSTRACT -This document outlines the architecture and core syntax of the Parrot +This document outlines the architecture and core syntax of Parrot Intermediate Representation (PIR). -This document describes PIR, a stable, middle-level language for both -compiler and human to target on. - =head1 DESCRIPTION PIR is a stable, middle-level language intended both as a target for the generated output from high-level language compilers, and for human use developing core features and extensions for Parrot. -=head1 IMPLEMENTATION - =head2 Basic Syntax A valid PIR program consists of a sequence of statements, directives, comments @@ -75,14 +70,14 @@ A label declaration consists of a label name followed by a colon. A label name conforms to the standard requirements for identifiers. A label declaration may occur at the start of a statement, or stand alone on a line, but always within -a compilation unit. +a subroutine. A reference to a label consists of only the label name, and is generally used as an argument to an instruction or directive. -A PIR label is accessible only in the compilation unit where it's defined. A -label name must be unique within a compilation unit, but it can be reused in -other compilation units. +A PIR label is accessible only in the subroutine where it's defined. A label +name must be unique within a subroutine, but it can be reused in other +subroutines. goto label1 ... @@ -90,13 +85,8 @@ =head3 Registers and Variables -There are three ways of referencing Parrot's registers. The first is direct -access to a specific register by name In, Sn, Nn, Pn. The second is through a -temporary register variable $In, $Sn, $Nn, $Pn. 
I<n> consists of digit(s) -only. There is no limit on the size of I<n>. - -The third syntax for accessing registers is through named local variables -declared with C<.local>. +There are two ways of referencing Parrot's registers. The first is +through named local variables declared with C<.local>. .local pmc foo @@ -104,12 +94,16 @@ corresponding to the types of registers. No other types are used. [See RT#42769] -The difference between direct register access and register variables or local -variables is largely a matter of allocation. If you directly reference C<P99>, -Parrot will blindly allocate 100 registers for that compilation unit. If you -reference C<$P99> or a named variable C<foo>, on the other hand, Parrot will -intelligently allocate a literal register in the background. So, C<$P99> may -be stored in C<P0>, if it is the only register in the compilation unit. +The second way of referencing a register is through a register variable +C<$In>, C<$Sn>, C<$Nn>, or C<$Pn>. The capital letter indicates the type +of the register (integer, string, number, or PMC). I<n> consists of +digit(s) only. There is no limit on the size of I<n>. There is no direct +correspondence between the value of I<n> and the position of the +register in the register set, C<$P42> may be stored in the zeroth PMC +register, if it is the only register in the subroutine. + +{{DEPRECATION NOTE: PIR will no longer support the old PASM-style syntax +for registers without dollar signs: C<In>, C<Sn>, C<Nn>, C<Pn>.}} =head2 Constants @@ -194,11 +188,17 @@ set S0, utf8:unicode:"«" -The encoding and charset gets attached to the string, no further processing -is done, specifically escape sequences are not honored. +The encoding and charset are attached to the string constant, and +adopted by any string containter the constant is assigned to. 
+ +The standard escape sequences are honored within strings with an +alternate encoding, so in the example above, you can include a +particular Unicode character as either a literal sequence of bytes, or +as an escape sequence. =item numeric constants +Both integers (C<42>) and numbers (C<3.14159>) may appear as constants. C<0x> and C<0b> denote hex and binary constants respectively. =back @@ -209,15 +209,15 @@ =item .local <type> <identifier> [:unique_reg] -Define a local name I<identifier> for this compilation unit with the given -I<type>. You can define multiple identifiers of the same type by separating -them with commas: +Define a local name I<identifier> within a subroutine with the given +I<type>. You can define multiple identifiers of the same type by +separating them with commas: .local int i, j The optional C<:unique_reg> modifier will force the register allocator to associate the identifier with a unique register for the duration of the -compilation unit. +subroutine. =item .lex <string constant>, <reg> @@ -239,44 +239,34 @@ =item .const <type> <identifier> = <const> -{{ PROPOSAL: add - .const <string constant> <identifier> = <const> - as an alternative to allow ".const 'Sub' ... " -}} - Define a constant named I<identifier> of type I<type> and assign value -I<const> to it. The constant is stored in the constant table of the current +I<const> to it. The I<type> may be either an integer value or a string +constant. The constant is stored in the constant table of the current bytecode file. =item .globalconst <type> <identifier> = <const> As C<.const> above, but the defined constant is globally accessible. -=item .namespace <identifier> [deprecated: See RT #48737] +=item .sub -Open a new scope block. This "namespace" is not the same as the -.namespace [ <identifier> ] syntax, which is used for storing subroutines -in a particular namespace in the global symbol table. -This directive is useful in cases such as (pseudocode): + .sub <identifier> [:<flag> ...] 
+ .sub <quoted string> [:<flag> ...] - local x = 1; - print(x); # prints 1 - do # open a new namespace/scope block - local x = 2; # this x hides the previous x - print(x); # prints 2 - end # close the current namespace - print(x); # prints 1 again +Define a subroutine. All code in a PIR source file must be defined in a +subroutine. See the section L<Subroutine flags> for available flags. +Optional flags are a list of I<flag>, separated by spaces. -All types of common language constructs such as if, for, while, repeat and -such that have nested scopes, can use this directive. +The name of the sub may be either a bare identifier or a quoted string +constant. Bare identifiers must be valid PIR identifiers (see L<Identifiers> +above), but string sub names can contain any characters, including characters +from different character sets (see L<Constants> above). -{{ NOTE: this variation of C<.namespace> and C<.endnamespace> are deprecated. -They were a hackish attempt at implementing scopes in Parrot, but didn't -actually turn out to be useful.}} +Always paired with C<.end>. -=item .endnamespace <identifier> [deprecated: See RT #48737] +=item .end -Closes the scope block that was opened with .namespace <identifier>. +End a subroutine. Always paired with C<.sub>. =item .namespace [ <identifier> ; <identifier> ] @@ -295,21 +285,8 @@ The brackets are not optional, although the string inside them is. -{{ NOTE: currently the brackets *are* optional. TODO: make decision whether - we want the brackets optional. }} - - -=item .pragma n_operators - -Convert arithmethic infix operators to n_infix operations. The unary opcodes -C<abs>, C<not>, C<bnot>, C<bnots>, and C<neg> are also changed to use a C<n_> -prefix. - - .pragma n_operators 1 - .sub foo - ... - $P0 = $P1 + $P2 # n_add $P0, $P1, $P2 - $P2 = abs $P0 # n_abs $P2, $P0 +{{ NOTE: currently the brackets *are* optional, so this is an +implementation change. 
}} =item .loadlib "lib_name" @@ -319,73 +296,87 @@ A library loaded this way is also available at runtime, as if it has been loaded again in C<:load>, so there is no need to call C<loadlib> at runtime. -=item .HLL <hll_name>, <hll_lib> +=item .HLL <hll_name> -Define the HLL for the current file. Takes two string constants. If the string -I<hll_lib> isn't empty this compile time pragma also loads the shared lib for -the HLL, so that integer type constants are working for creating new PMCs. - -{{ PROPOSAL: make the ",<hll_lib>" part optional, so you don't have to - specify an empty string for the library. - (Alternatively, make this two different directives: .HLL_name, .HLL_lib) -}} +Define the HLL for the current file. Takes one string constant, the name +of the HLL. -=item .HLL_map <core_type>, <user_type> +=item .HLL <hll_name>, <hll_lib> [deprecated] -{{ PROPOSAL: make the ',' an "->", "=>", "=", for instance, so it's easier - to remember what argument comes first, the core type or the user type. -}} +An old form of the .HLL directive that also loaded a shared lib for the +HLL. Use C<.loadlib> instead. + +=item .HLL_map <core_type> = <user_type> + +{{ NOTE: the '=' used to be ','. }} Whenever Parrot has to create PMCs inside C code on behalf of the running -user program it consults the current type mapping for the executing HLL +user program, it consults the current type mapping for the executing HLL and creates a PMC of type I<user_type> instead of I<core_type>, if such a mapping is defined. I<core_type> and I<user_type> may be any valid string constant. -For example, with this code snippet ... +For example, with this code snippet: .loadlib 'dynlexpad' .HLL "Foo", "" - .HLL_map 'LexPad', 'DynLexPad' + .HLL_map 'LexPad' = 'DynLexPad' .sub main :main ... -... all subroutines for language I<Foo> would use a dynamic lexpad pmc. +all subroutines for language I<Foo> would use a dynamic lexpad pmc. 
-{{ PROPOSAL: stop using integer constants for types RT#45453 }} +=item .line <integer>, <string> -=item .sub +Set the line number and filename to the value specified. This is useful in +case the PIR code is generated from some source file, and error messages +should print the source file, not the line number and filename of the +generated file. - .sub <identifier> [:<flag> ...] - .sub <quoted string> [:<flag> ...] +{{ DEPRECATION NOTE: was C<<#line <integer> <string>>>. See [RT#45857], +[RT#43269], and [RT#47141]. }} -Define a compilation unit. All code in a PIR source file must be defined in a -compilation unit. See the section C<Subroutine flags> for -available flags. Optional flags are a list of I<flag>, separated by empty -spaces. +=item .namespace <identifier> [deprecated: See RT #48737] -The name of the sub may be either a bare identifier or a quoted string -constant. Bare identifiers must be valid PIR identifiers (see L<Identifiers> -above), but string sub names can contain any characters, including characters -from different character sets (see L<Constants> above). +{{ DEPRECATION NOTE: this variation of C<.namespace> and +C<.endnamespace> are deprecated. They were a hackish attempt at +implementing scopes in Parrot, but didn't actually turn out to be +useful.}} -Always paired with C<.end>. +Open a new scope block. This "namespace" is not the same as the +.namespace [ <identifier> ] syntax, which is used for storing subroutines +in a particular namespace in the global symbol table. +This directive is useful in cases such as (pseudocode): -=item .end + local x = 1; + print(x); # prints 1 + do # open a new namespace/scope block + local x = 2; # this x hides the previous x + print(x); # prints 2 + end # close the current namespace + print(x); # prints 1 again -End a compilation unit. Always paired with C<.sub>. +All types of common language constructs such as if, for, while, repeat and +such that have nested scopes, can use this directive. 
-=item .line <integer>, <string> +=item .endnamespace <identifier> [deprecated: See RT #48737] -Set the line number and filename to the value specified. This is useful in -case the PIR code is generated from some source file, and any error messages -should print the source file, not the line number and filename of the -generated file. +Closes the scope block that was opened with .namespace <identifier>. + +=item .pragma n_operators [deprecated] + +Convert arithmethic infix operators to n_infix operations. The unary opcodes +C<abs>, C<not>, C<bnot>, C<bnots>, and C<neg> are also changed to use a C<n_> +prefix. + + .pragma n_operators 1 + .sub foo + ... + $P0 = $P1 + $P2 # n_add $P0, $P1, $P2 + $P2 = abs $P0 # n_abs $P2, $P0 -{{ DEPRECATION NOTE: was C<<#line <integer> <string>>>. See [RT#45857], -[RT#43269], and [RT#47141]. }} =back @@ -483,26 +474,34 @@ =item :method -The marked C<.sub> is a method. In the method body, the object PMC -can be referred to with C<self>. + .sub bar :method + .sub bar :method("foo") + +The marked C<.sub> is a method, added as a method in the class that +corresponds to the current namespace, and not stored in the namespace. +In the method body, the object PMC can be referred to with C<self>. + +If a string argument is given to C<:method> the method is stored with +that name instead of the C<.sub> name. =item :vtable -The marked C<.sub> overrides a v-table method. By default, a sub with the same -name as a v-table method does not override the v-table method. To specify that -there should be no namespace entry (that is, it just overrides the v-table -method but is callable as a normal method), use B<:vtable :anon>. To give the -v-table method a different name, use B<:vtable("...")>. For example, to have -the method B<ToString> also be the v-table method B<get_string>), use -B<:vtable("get_string")>. + .sub bar :vtable + .sub bar :vtable("foo") + +The marked C<.sub> overrides a vtable function, and is not stored in the +namespace. 
By default, it overrides a vtable function with the same name +as the C<.sub> name. To override a different vtable function, use +C<:vtable("...")>. For example, to have a C<.sub> named I<ToString> also +be the vtable function C<get_string>), use C<:vtable("get_string")>. When the B<:vtable> flag is set, the object PMC can be referred to with C<self>, as with the B<:method> flag. - =item :outer(subname) -The marked C<.sub> is lexically nested within the sub known by B<subname>. +The marked C<.sub> is lexically nested within the sub known by +I<subname>. =item :lexid( <string_constant> ) @@ -591,7 +590,10 @@ be stored. Available flags: C<:slurpy>, C<:named>, C<:optional>, C<:opt_flag> and C<:unique_reg>. -=item .param <type> "<identifier>" => <identifier> [:<flag>]* +=item .param <type> "<identifier>" => <identifier> [:<flag>]* [deprecate] + +{{ NOTE: if this is already implemented, deprecate, otherwise, just +delete from spec.}} Define a named parameter. This is syntactic sugar for: @@ -648,59 +650,56 @@ =item if <var> goto <identifier> -If I<var> evaluates as true, jump to the named I<identifier>. Translate to -C<if var, identifier>. +If I<var> evaluates as true, jump to the named I<identifier>. =item unless <var> goto <identifier> -Unless I<var> evaluates as true, jump to the named I<identifier>. Translate -to C<unless var, identifier>. +Unless I<var> evaluates as true, jump to the named I<identifier>. =item if null <var> goto <identifier> -If I<var> evaluates as null, jump to the named I<identifier>. Translate to -C<if_null var, identifier>. +If I<var> evaluates as null, jump to the named I<identifier>. =item unless null <var> goto <identifier> -Unless I<var> evaluates as null, jump to the named I<identifier>. Translate -to C<unless_null var, identifier>. +Unless I<var> evaluates as null, jump to the named I<identifier>. 
=item if <var1> <relop> <var2> goto <identifier> -The I<relop> can be: C<E<lt>, E<lt>=, ==, != E<gt>= E<gt>> which translate +The I<relop> can be: C<E<lt>, E<lt>=, ==, != E<gt>= E<gt>>. + which translate to the PASM opcodes C<lt>, C<le>, C<eq>, C<ne>, C<ge> or C<gt>. If I<var1 relop var2> evaluates as true, jump to the named I<identifier>. =item unless <var1> <relop> <var2> goto <identifier> -The I<relop> can be: C<E<lt>, E<lt>=, ==, != E<gt>= E<gt>> which translate -to the PASM opcodes C<lt>, C<le>, C<eq>, C<ne>, C<ge> or C<gt>. Unless +The I<relop> can be: C<E<lt>, E<lt>=, ==, != E<gt>= E<gt>>. Unless I<var1 relop var2> evaluates as true, jump to the named I<identifier>. =item <var1> = <var2> -Assign a value. Translates to C<set var1, var2>. +Assign a value. =item <var1> = <unary> <var2> -The unaries C<!>, C<-> and C<~> generate C<not>, C<neg> and C<bnot> ops. +Unary operations C<!> (NOT), C<-> (negation) and C<~> (bitwise NOT). =item <var1> = <var2> <binary> <var3> -The binaries C<+>, C<->, C<*>, C</>, C<%> and C<**> generate -C<add>, C<sub>, C<mul>, C<div>, C<mod> and C<pow> arithmetic ops. -binary C<.> is C<concat> and only valid for string arguments. +Binary arithmetic operations C<+> (addition), C<-> (subtraction), C<*> +(multiplication), C</> (division), C<%> (modulus) and C<**> (exponent). +Binary C<.> is concatenation and only valid for string arguments. -C<E<lt>E<lt>> and C<E<gt>E<gt>> are arithmetic shifts C<shl> and C<shr>. -C<E<gt>E<gt>E<gt>> is the logical shift C<lsr>. +C<E<lt>E<lt>> and C<E<gt>E<gt>> are arithmetic shifts left and right. +C<E<gt>E<gt>E<gt>> is the logical shift right. -C<&&>, C<||> and C<~~> are logic C<and>, C<or> and C<xor>. +Binary logic operations C<&&> (AND), C<||> (OR) and C<~~> (XOR). -C<&>, C<|> and C<~> are binary C<band>, C<bor> and C<bxor>. +Binary bitwise operations C<&> (bitwise AND), C<|> (bitwise OR) and C<~> +(bitwise XOR). 
{{PROPOSAL: Change description to support logic operators (comparisons) as -implemented (and working) in imcc.y.}} +implemented (and working) in imcc.y. ANR: proposal not clear.}} =item <var1> <op>= <var2> @@ -712,8 +711,10 @@ =item <var> = <var> [ <var> ] -This generates either a keyed C<set> operation or C<substr var, var, -var, 1> for string arguments and an integer key. +A keyed C<set> operation for PMCs or a substring operation for string +arguments and an integer key. + +{{ DEPRECATION NOTE: Possibly deprecate the substring variant. }} =item <var> = <var> [ <key> ] @@ -783,30 +784,30 @@ =item <var>."_method"([arg [:<flag> ...], ...]) -=item <var>._method([arg [:<flag> ...], ...]) +=item <var>.<var>([arg [:<flag> ...], ...]) Function or method call. These notations are shorthand for a longer PCC function call. I<var> can denote a global subroutine, a local I<identifier> or a I<reg>. -{{We should review the (currently inconsistent) specification of the -method name. Currently it can be a bare word, a quoted string or a -string register. See #45859.}} +{{ DEPRECATION NOTE: bare word method names (e.g. C<foo.bar()> where +C<bar> is not a local variable name) are deprecated. Use a quoted string +instead. See #45859. }} =item .return ([<var> [:<flag> ...], ...]) -Return from the current compilation unit with zero or more values. +Return from the current subroutine with zero or more values. -The surrounded parentheses are mandatory. Besides making sequence -break more conspicuous, this is necessary to distinguish this syntax -from other uses of the C<.return> directive that will be probably +The parentheses surrounding the arguments are mandatory. Besides making +sequence break more conspicuous, this is necessary to distinguish this +syntax from other uses of the C<.return> directive that will be probably deprecated. 
=item .return <var>(args) =item .return <var>."somemethod"(args) -=item .return <var>.somemethod(args) +=item .return <var>.<var>(args) Tail call: call a function or method and return from the sub with the function or method call return values. @@ -827,28 +828,16 @@ or a C-level conversion (int cast, float cast, a string copy, or a call to one of the conversion functions like C<string_to_num>). -A PMC source with a low-level destination, calls the C<get_integer>, -C<get_number>, or C<get_string> vtable function on the PMC. A low-level source -with a PMC destination calls the C<set_integer_native>, C<set_number_native>, -or C<set_string_native> vtable function on the PMC (assign to value -semantics). Two PMC arguments are a direct C assignment (assign to container -semantics). +Assigning a PMC argument to a low-level argument calls the +C<get_integer>, C<get_number>, or C<get_string> vtable function on the +PMC. Assigning a low-level argument to a PMC argument calls the +C<set_integer_native>, C<set_number_native>, or C<set_string_native> +vtable function on the PMC (assign to value semantics). Two PMC +arguments are a direct C assignment (assign to container semantics). For assign to value semantics for two PMC arguments use C<assign>, which calls the C<assign_pmc> vtable function. - -{{ NOTE: response to the question: - - <pmichaud> I don't think that 'morph' as a method call is a good idea - <pmichaud> we need something that says "assign to value" versus - "assign to container" - <pmichaud> we can't eliminate the existing 'morph' opcode until we have a - replacement - -}} - - =head2 Macros This section describes the macro layer of the PIR language. The macro layer of @@ -867,7 +856,7 @@ runtime/parrot/include, in that order. The first file of that name to be found is included. -{{ Check the include directive's search order and whether it's complete }} +{{ NOTE: the C<include> directive's search order is subject to change. 
}}

 =item * C<.macro> <identifier> [<parameters>]

@@ -1275,6 +1264,10 @@
 argument before a variable number of following arguments is the argument
 count.

+=head1 IMPLEMENTATION
+
+There are multiple implementations of PIR, each of which will meet this
+specification for the syntax.

 =head1 ATTACHMENTS
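
For readers skimming the patch, the following is a minimal PIR sketch of the syntax as the revised text describes it. It is an illustration only and is not part of r29859: the names C<main>, C<i>, C<loop> and C<done> are made up for the example, and it assumes a Parrot build that accepts the post-revision forms (a C<.sub>/C<.end> pair, C<.local> variables, dollar-sign register variables, and labels scoped to the subroutine).

```pir
# Hypothetical example; names are illustrative, not taken from the commit.
.sub main :main
    .local int i            # named local variable declared with .local
    i = 0
    $S0 = ""                # register variable; the digits do not fix its position
  loop:                     # label is visible only inside this subroutine
    if i >= 3 goto done     # if <var1> <relop> <var2> goto <identifier>
    $S1 = i                 # integer-to-string conversion on assignment
    $S0 = $S0 . $S1         # binary '.' is string concatenation
    i = i + 1               # binary '+' is addition
    goto loop
  done:
    print $S0
    print "\n"
.end
```

The sketch deliberately avoids the forms this revision deprecates: PASM-style registers without dollar signs (C<S0>, C<I0>), the scoping variant of C<.namespace>, and bare word method names.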