draft

allison Tue, 29 Jul 2008 12:35:24 -0700

Author: allison
Date: Tue Jul 29 12:34:52 2008
New Revision: 29859

Modified:
   trunk/docs/pdds/draft/pdd19_pir.pod


Log:
[pdd] Architectural review of PIR PDD.


Modified: trunk/docs/pdds/draft/pdd19_pir.pod
==============================================================================
--- trunk/docs/pdds/draft/pdd19_pir.pod (original)
+++ trunk/docs/pdds/draft/pdd19_pir.pod Tue Jul 29 12:34:52 2008
@@ -12,20 +12,15 @@
 
 =head1 ABSTRACT
 
-This document outlines the architecture and core syntax of the Parrot
+This document outlines the architecture and core syntax of Parrot
 Intermediate Representation (PIR).
 
-This document describes PIR, a stable, middle-level language for both
-compiler and human to target on.
-
 =head1 DESCRIPTION
 
 PIR is a stable, middle-level language intended both as a target for the
 generated output from high-level language compilers, and for human use
 developing core features and extensions for Parrot.
 
-=head1 IMPLEMENTATION
-
 =head2 Basic Syntax
 
 A valid PIR program consists of a sequence of statements, directives, comments
@@ -75,14 +70,14 @@
 A label declaration consists of a label name followed by a colon. A label name
 conforms to the standard requirements for identifiers. A label declaration may
 occur at the start of a statement, or stand alone on a line, but always within
-a compilation unit.
+a subroutine.
 
 A reference to a label consists of only the label name, and is generally used
 as an argument to an instruction or directive.
 
-A PIR label is accessible only in the compilation unit where it's defined. A
-label name must be unique within a compilation unit, but it can be reused in
-other compilation units.
+A PIR label is accessible only in the subroutine where it's defined. A label
+name must be unique within a subroutine, but it can be reused in other
+subroutines.
 
   goto label1
      ...
@@ -90,13 +85,8 @@
 
 =head3 Registers and Variables
 
-There are three ways of referencing Parrot's registers. The first is direct
-access to a specific register by name In, Sn, Nn, Pn. The second is through a
-temporary register variable $In, $Sn, $Nn, $Pn. I<n> consists of digit(s)
-only.  There is no limit on the size of I<n>.
-
-The third syntax for accessing registers is through named local variables
-declared with C<.local>.
+There are two ways of referencing Parrot's registers. The first is
+through named local variables declared with C<.local>.
 
   .local pmc foo
 
@@ -104,12 +94,16 @@
 corresponding to the types of registers. No other types are used. [See
 RT#42769]
 
-The difference between direct register access and register variables or local
-variables is largely a matter of allocation. If you directly reference C<P99>,
-Parrot will blindly allocate 100 registers for that compilation unit. If you
-reference C<$P99> or a named variable C<foo>, on the other hand, Parrot will
-intelligently allocate a literal register in the background. So, C<$P99> may
-be stored in C<P0>, if it is the only register in the compilation unit.
+The second way of referencing a register is through a register variable
+C<$In>, C<$Sn>, C<$Nn>, or C<$Pn>. The capital letter indicates the type
+of the register (integer, string, number, or PMC). I<n> consists of
+digit(s) only. There is no limit on the size of I<n>. There is no direct
+correspondence between the value of I<n> and the position of the
+register in the register set; C<$P42> may be stored in the zeroth PMC
+register, if it is the only register in the subroutine.
+
+{{DEPRECATION NOTE: PIR will no longer support the old PASM-style syntax
+for registers without dollar signs: C<In>, C<Sn>, C<Nn>, C<Pn>.}}
 
 =head2 Constants
 
@@ -194,11 +188,17 @@
 
   set S0, utf8:unicode:"«"
 
-The encoding and charset gets attached to the string, no further processing
-is done, specifically escape sequences are not honored.
+The encoding and charset are attached to the string constant, and
+adopted by any string container the constant is assigned to.
+
+The standard escape sequences are honored within strings with an
+alternate encoding, so in the example above, you can include a
+particular Unicode character as either a literal sequence of bytes, or
+as an escape sequence.
 
 =item numeric constants
 
+Both integers (C<42>) and numbers (C<3.14159>) may appear as constants.
 C<0x> and C<0b> denote hex and binary constants respectively.
 
 =back
@@ -209,15 +209,15 @@
 
 =item .local <type> <identifier> [:unique_reg]
 
-Define a local name I<identifier> for this compilation unit with the given
-I<type>. You can define multiple identifiers of the same type by separating
-them with commas:
+Define a local name I<identifier> within a subroutine with the given
+I<type>. You can define multiple identifiers of the same type by
+separating them with commas:
 
   .local int i, j
 
 The optional C<:unique_reg> modifier will force the register allocator to
 associate the identifier with a unique register for the duration of the
-compilation unit.
+subroutine.
 
 =item .lex <string constant>, <reg>
 
@@ -239,44 +239,34 @@
 
 =item .const <type> <identifier> = <const>
 
-{{ PROPOSAL: add
-   .const <string constant> <identifier> = <const>
-   as an alternative to allow ".const 'Sub' ... "
-}}
-
 Define a constant named I<identifier> of type I<type> and assign value
-I<const> to it. The constant is stored in the constant table of the current
+I<const> to it. The I<type> may be either an integer value or a string
+constant. The constant is stored in the constant table of the current
 bytecode file.
 
 =item .globalconst <type> <identifier> = <const>
 
 As C<.const> above, but the defined constant is globally accessible.
 
-=item .namespace <identifier> [deprecated: See RT #48737]
+=item .sub
 
-Open a new scope block. This "namespace" is not the same as the
-.namespace [ <identifier> ] syntax, which is used for storing subroutines
-in a particular namespace in the global symbol table.
-This directive is useful in cases such as (pseudocode):
+  .sub <identifier> [:<flag> ...]
+  .sub <quoted string> [:<flag> ...]
 
-  local x = 1;
-  print(x);       # prints 1
-  do              # open a new namespace/scope block
-    local x = 2;  # this x hides the previous x
-    print(x);     # prints 2
-  end             # close the current namespace
-  print(x);       # prints 1 again
+Define a subroutine. All code in a PIR source file must be defined in a
+subroutine. See the section L<Subroutine flags> for available flags.
+Optional flags are a list of I<flag>, separated by spaces.
 
-All types of common language constructs such as if, for, while, repeat and
-such that have nested scopes, can use this directive.
+The name of the sub may be either a bare identifier or a quoted string
+constant. Bare identifiers must be valid PIR identifiers (see L<Identifiers>
+above), but string sub names can contain any characters, including characters
+from different character sets (see L<Constants> above).
 
-{{ NOTE: this variation of C<.namespace> and C<.endnamespace> are deprecated.
-They were a hackish attempt at implementing scopes in Parrot, but didn't
-actually turn out to be useful.}}
+Always paired with C<.end>.
 
-=item .endnamespace <identifier> [deprecated: See RT #48737]
+=item .end
 
-Closes the scope block that was opened with .namespace <identifier>.
+End a subroutine. Always paired with C<.sub>.
 
 =item .namespace [ <identifier> ; <identifier> ]
 
@@ -295,21 +285,8 @@
 
 The brackets are not optional, although the string inside them is.
 
-{{ NOTE: currently the brackets *are* optional. TODO: make decision whether
-   we want the brackets optional. }}
-
-
-=item .pragma n_operators
-
-Convert arithmethic infix operators to n_infix operations. The unary opcodes
-C<abs>, C<not>, C<bnot>, C<bnots>, and C<neg> are also changed to use a C<n_>
-prefix.
-
- .pragma n_operators 1
- .sub foo
-   ...
-   $P0 = $P1 + $P2           # n_add $P0, $P1, $P2
-   $P2 = abs $P0             # n_abs $P2, $P0
+{{ NOTE: currently the brackets *are* optional, so this is an
+implementation change. }}
 
 =item .loadlib "lib_name"
 
@@ -319,73 +296,87 @@
 A library loaded this way is also available at runtime, as if it has been
 loaded again in C<:load>, so there is no need to call C<loadlib> at runtime.
 
-=item .HLL <hll_name>, <hll_lib>
+=item .HLL <hll_name>
 
-Define the HLL for the current file. Takes two string constants. If the string
-I<hll_lib> isn't empty this compile time pragma also loads the shared lib for
-the HLL, so that integer type constants are working for creating new PMCs.
-
-{{ PROPOSAL: make the ",<hll_lib>" part optional, so you don't have to
-   specify an empty string for the library.
-   (Alternatively, make this two different directives: .HLL_name, .HLL_lib)
-}}
+Define the HLL for the current file. Takes one string constant, the name
+of the HLL.
 
-=item .HLL_map <core_type>, <user_type>
+=item .HLL <hll_name>, <hll_lib> [deprecated]
 
-{{ PROPOSAL: make the ',' an "->", "=>", "=", for instance, so it's easier
-   to remember what argument comes first, the core type or the user type.
-}}
+An old form of the .HLL directive that also loaded a shared lib for the
+HLL. Use C<.loadlib> instead.
+
+=item .HLL_map <core_type> = <user_type>
+
+{{ NOTE: the '=' used to be ','. }}
 
 Whenever Parrot has to create PMCs inside C code on behalf of the running
-user program it consults the current type mapping for the executing HLL
+user program, it consults the current type mapping for the executing HLL
 and creates a PMC of type I<user_type> instead of I<core_type>, if such
 a mapping is defined. I<core_type> and I<user_type> may be any valid string
 constant.
 
-For example, with this code snippet ...
+For example, with this code snippet:
 
   .loadlib 'dynlexpad'
 
   .HLL "Foo", ""
-  .HLL_map 'LexPad', 'DynLexPad'
+  .HLL_map 'LexPad' = 'DynLexPad'
 
   .sub main :main
     ...
 
-... all subroutines for language I<Foo> would use a dynamic lexpad pmc.
+all subroutines for language I<Foo> would use a dynamic lexpad pmc.
 
-{{ PROPOSAL: stop using integer constants for types RT#45453 }}
+=item .line <integer>, <string>
 
-=item .sub
+Set the line number and filename to the value specified. This is useful in
+case the PIR code is generated from some source file, and error messages
+should print the source file, not the line number and filename of the
+generated file.
 
-  .sub <identifier> [:<flag> ...]
-  .sub <quoted string> [:<flag> ...]
+{{ DEPRECATION NOTE: was C<<#line <integer> <string>>>. See [RT#45857],
+[RT#43269], and [RT#47141]. }}
 
-Define a compilation unit. All code in a PIR source file must be defined in a
-compilation unit. See the section C<Subroutine flags> for
-available flags.  Optional flags are a list of I<flag>, separated by empty
-spaces.
+=item .namespace <identifier> [deprecated: See RT #48737]
 
-The name of the sub may be either a bare identifier or a quoted string
-constant. Bare identifiers must be valid PIR identifiers (see L<Identifiers>
-above), but string sub names can contain any characters, including characters
-from different character sets (see L<Constants> above).
+{{ DEPRECATION NOTE: these variations of C<.namespace> and
+C<.endnamespace> are deprecated.  They were a hackish attempt at
+implementing scopes in Parrot, but didn't actually turn out to be
+useful.}}
 
-Always paired with C<.end>.
+Open a new scope block. This "namespace" is not the same as the
+.namespace [ <identifier> ] syntax, which is used for storing subroutines
+in a particular namespace in the global symbol table.
+This directive is useful in cases such as (pseudocode):
 
-=item .end
+  local x = 1;
+  print(x);       # prints 1
+  do              # open a new namespace/scope block
+    local x = 2;  # this x hides the previous x
+    print(x);     # prints 2
+  end             # close the current namespace
+  print(x);       # prints 1 again
 
-End a compilation unit. Always paired with C<.sub>.
+All types of common language constructs such as if, for, while, repeat and
+such that have nested scopes, can use this directive.
 
-=item .line <integer>, <string>
+=item .endnamespace <identifier> [deprecated: See RT #48737]
 
-Set the line number and filename to the value specified. This is useful in
-case the PIR code is generated from some source file, and any error messages
-should print the source file, not the line number and filename of the
-generated file.
+Closes the scope block that was opened with .namespace <identifier>.
+
+=item .pragma n_operators [deprecated]
+
+Convert arithmetic infix operators to n_infix operations. The unary opcodes
+C<abs>, C<not>, C<bnot>, C<bnots>, and C<neg> are also changed to use a C<n_>
+prefix.
+
+ .pragma n_operators 1
+ .sub foo
+   ...
+   $P0 = $P1 + $P2           # n_add $P0, $P1, $P2
+   $P2 = abs $P0             # n_abs $P2, $P0
 
-{{ DEPRECATION NOTE: was C<<#line <integer> <string>>>. See [RT#45857],
-[RT#43269], and [RT#47141]. }}
 
 =back
 
@@ -483,26 +474,34 @@
 
 =item :method
 
-The marked C<.sub> is a method. In the method body, the object PMC
-can be referred to with C<self>.
+  .sub bar :method
+  .sub bar :method("foo")
+  
+The marked C<.sub> is a method, added as a method in the class that
+corresponds to the current namespace, and not stored in the namespace.
+In the method body, the object PMC can be referred to with C<self>.
+
+If a string argument is given to C<:method> the method is stored with
+that name instead of the C<.sub> name.
 
 =item :vtable
 
-The marked C<.sub> overrides a v-table method. By default, a sub with the same
-name as a v-table method does not override the v-table method. To specify that
-there should be no namespace entry (that is, it just overrides the v-table
-method but is callable as a normal method), use B<:vtable :anon>. To give the
-v-table method a different name, use B<:vtable("...")>. For example, to have
-the method B<ToString> also be the v-table method B<get_string>), use
-B<:vtable("get_string")>.
+  .sub bar :vtable
+  .sub bar :vtable("foo")
+
+The marked C<.sub> overrides a vtable function, and is not stored in the
+namespace. By default, it overrides a vtable function with the same name
+as the C<.sub> name.  To override a different vtable function, use
+C<:vtable("...")>. For example, to have a C<.sub> named I<ToString> also
+be the vtable function C<get_string>, use C<:vtable("get_string")>.
 
 When the B<:vtable> flag is set, the object PMC can be referred to with 
 C<self>, as with the B<:method> flag.
 
-
 =item :outer(subname)
 
-The marked C<.sub> is lexically nested within the sub known by B<subname>.
+The marked C<.sub> is lexically nested within the sub known by
+I<subname>.
 
 =item :lexid( <string_constant> )
 
@@ -591,7 +590,10 @@
 be stored. Available flags:
 C<:slurpy>, C<:named>, C<:optional>, C<:opt_flag> and C<:unique_reg>.
 
-=item .param <type> "<identifier>" => <identifier> [:<flag>]*
+=item .param <type> "<identifier>" => <identifier> [:<flag>]* [deprecate]
+
+{{ NOTE: if this is already implemented, deprecate, otherwise, just
+delete from spec.}}
 
 Define a named parameter. This is syntactic sugar for:
 
@@ -648,59 +650,56 @@
 
 =item if <var> goto <identifier>
 
-If I<var> evaluates as true, jump to the named I<identifier>. Translate to
-C<if var, identifier>.
+If I<var> evaluates as true, jump to the named I<identifier>.
 
 =item unless <var> goto <identifier>
 
-Unless I<var> evaluates as true, jump to the named I<identifier>. Translate
-to C<unless var, identifier>.
+Unless I<var> evaluates as true, jump to the named I<identifier>.
 
 =item if null <var> goto <identifier>
 
-If I<var> evaluates as null, jump to the named I<identifier>. Translate to
-C<if_null var, identifier>.
+If I<var> evaluates as null, jump to the named I<identifier>.
 
 =item unless null <var> goto <identifier>
 
-Unless I<var> evaluates as null, jump to the named I<identifier>. Translate
-to C<unless_null var, identifier>.
+Unless I<var> evaluates as null, jump to the named I<identifier>.
 
 =item if <var1> <relop> <var2> goto <identifier>
 
-The I<relop> can be: C<E<lt>, E<lt>=, ==, != E<gt>= E<gt>> which translate
+The I<relop> can be: C<E<lt>, E<lt>=, ==, !=, E<gt>=, E<gt>>,
+which translate
 to the PASM opcodes C<lt>, C<le>, C<eq>, C<ne>, C<ge> or C<gt>. If
 I<var1 relop var2> evaluates as true, jump to the named I<identifier>.
 
 =item unless <var1> <relop> <var2> goto <identifier>
 
-The I<relop> can be: C<E<lt>, E<lt>=, ==, != E<gt>= E<gt>> which translate
-to the PASM opcodes C<lt>, C<le>, C<eq>, C<ne>, C<ge> or C<gt>. Unless
+The I<relop> can be: C<E<lt>, E<lt>=, ==, !=, E<gt>=, E<gt>>. Unless
 I<var1 relop var2> evaluates as true, jump to the named I<identifier>.
 
 =item <var1> = <var2>
 
-Assign a value. Translates to C<set var1, var2>.
+Assign a value.
 
 =item <var1> = <unary> <var2>
 
-The unaries C<!>, C<-> and C<~> generate C<not>, C<neg> and C<bnot> ops.
+Unary operations C<!> (NOT), C<-> (negation) and C<~> (bitwise NOT).
 
 =item <var1> = <var2> <binary> <var3>
 
-The binaries C<+>, C<->, C<*>, C</>, C<%> and C<**> generate
-C<add>, C<sub>, C<mul>, C<div>, C<mod> and C<pow> arithmetic ops.
-binary C<.> is C<concat> and only valid for string arguments.
+Binary arithmetic operations C<+> (addition), C<-> (subtraction), C<*>
+(multiplication), C</> (division), C<%> (modulus) and C<**> (exponent).
+Binary C<.> is concatenation and only valid for string arguments.
 
-C<E<lt>E<lt>> and C<E<gt>E<gt>> are arithmetic shifts C<shl> and C<shr>.
-C<E<gt>E<gt>E<gt>> is the logical shift C<lsr>.
+C<E<lt>E<lt>> and C<E<gt>E<gt>> are arithmetic shifts left and right.
+C<E<gt>E<gt>E<gt>> is the logical shift right.
 
-C<&&>, C<||> and C<~~> are logic C<and>, C<or> and C<xor>.
+Binary logic operations C<&&> (AND), C<||> (OR) and C<~~> (XOR).
 
-C<&>, C<|> and C<~> are binary C<band>, C<bor> and C<bxor>.
+Binary bitwise operations C<&> (bitwise AND), C<|> (bitwise OR) and C<~>
+(bitwise XOR).
 
 {{PROPOSAL: Change description to support logic operators (comparisons) as
-implemented (and working) in imcc.y.}}
+implemented (and working) in imcc.y. ANR: proposal not clear.}}
 
 =item <var1> <op>= <var2>
 
@@ -712,8 +711,10 @@
 
 =item <var> = <var> [ <var> ]
 
-This generates either a keyed C<set> operation or C<substr var, var,
-var, 1> for string arguments and an integer key.
+A keyed C<set> operation for PMCs or a substring operation for string
+arguments and an integer key.
+
+{{ DEPRECATION NOTE: Possibly deprecate the substring variant. }}
 
 =item <var> = <var> [ <key> ]
 
@@ -783,30 +784,30 @@
 
 =item <var>."_method"([arg [:<flag> ...], ...])
 
-=item <var>._method([arg [:<flag> ...], ...])
+=item <var>.<var>([arg [:<flag> ...], ...])
 
 Function or method call. These notations are shorthand for a longer PCC
 function call. I<var> can denote a global subroutine, a local I<identifier> or
 a I<reg>.
 
-{{We should review the (currently inconsistent) specification of the
-method name. Currently it can be a bare word, a quoted string or a
-string register. See #45859.}}
+{{ DEPRECATION NOTE: bare word method names (e.g. C<foo.bar()> where
+C<bar> is not a local variable name) are deprecated. Use a quoted string
+instead. See #45859. }}
 
 =item .return ([<var> [:<flag> ...], ...])
 
-Return from the current compilation unit with zero or more values.
+Return from the current subroutine with zero or more values.
 
-The surrounded parentheses are mandatory. Besides making sequence
-break more conspicuous, this is necessary to distinguish this syntax
-from other uses of the C<.return> directive that will be probably
+The parentheses surrounding the arguments are mandatory. Besides making
+the sequence break more conspicuous, this is necessary to distinguish this
+syntax from other uses of the C<.return> directive that will probably be
 deprecated.
 
 =item .return <var>(args)
 
 =item .return <var>."somemethod"(args)
 
-=item .return <var>.somemethod(args)
+=item .return <var>.<var>(args)
 
 Tail call: call a function or method and return from the sub with the
 function or method call return values.
@@ -827,28 +828,16 @@
 or a C-level conversion (int cast, float cast, a string copy, or a call to one
 of the conversion functions like C<string_to_num>).
 
-A PMC source with a low-level destination, calls the C<get_integer>,
-C<get_number>, or C<get_string> vtable function on the PMC. A low-level source
-with a PMC destination calls the C<set_integer_native>, C<set_number_native>,
-or C<set_string_native> vtable function on the PMC (assign to value
-semantics).  Two PMC arguments are a direct C assignment (assign to container
-semantics).
+Assigning a PMC argument to a low-level argument calls the
+C<get_integer>, C<get_number>, or C<get_string> vtable function on the
+PMC. Assigning a low-level argument to a PMC argument calls the
+C<set_integer_native>, C<set_number_native>, or C<set_string_native>
+vtable function on the PMC (assign to value semantics). Two PMC
+arguments are a direct C assignment (assign to container semantics).
 
 For assign to value semantics for two PMC arguments use C<assign>, which calls
 the C<assign_pmc> vtable function.
 
-
-{{ NOTE: response to the question:
-
-    <pmichaud>  I don't think that 'morph' as a method call is a good idea
-    <pmichaud>  we need something that says "assign to value" versus
-        "assign to container"
-    <pmichaud>  we can't eliminate the existing 'morph' opcode until we have a
-        replacement
-
-}}
-
-
 =head2 Macros
 
 This section describes the macro layer of the PIR language. The macro layer of
@@ -867,7 +856,7 @@
 runtime/parrot/include, in that order. The first file of that name to be found
 is included.
 
-{{ Check the include directive's search order and whether it's complete }}
+{{ NOTE: the C<.include> directive's search order is subject to change. }}
 
 =item * C<.macro> <identifier> [<parameters>]
 
@@ -1275,6 +1264,10 @@
 argument before a variable number of following arguments is the
 argument count.
 
+=head1 IMPLEMENTATION
+
+There are multiple implementations of PIR, each of which will meet this
+specification for the syntax.
 
 =head1 ATTACHMENTS
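
As a reading aid for the label and register wording changed in this revision, here is a minimal PIR sketch written against the revised text. The sub name countdown and the label names loop and done are made up for illustration, and the sketch assumes a Parrot build that accepts this syntax. It is not taken from the PDD itself.

```pir
# Both subs use the labels 'loop' and 'done'. Under the revised wording a
# label is visible only inside the subroutine that declares it, so the
# names do not clash.
.sub main :main
    .local int i            # named local variable, declared with .local
    i = 0
  loop:
    if i >= 3 goto done
    $S0 = i                 # register variable; the 0 is only a name and
                            # need not map to string register 0
    print $S0
    print "\n"
    inc i
    goto loop
  done:
    countdown(2)            # short-form call to the sub defined below
.end

.sub countdown
    .param int n
  loop:
    if n < 0 goto done
    print n
    print "\n"
    dec n
    goto loop
  done:
    .return ()              # parentheses are mandatory, even with no values
.end
```

The sketch uses only the dollar-sign register form ($S0) and named locals, since the revision deprecates the bare PASM-style names in PIR.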

[svn:parrot-pdd] r29859 - trunk/docs/pdds/draft
