On Mon, Aug 08, 2005 at 12:03:25AM +0100, Jonathan Worthington wrote:
> So, I re-wrote it.  It now talks about PIR, and has examples in PIR.  It 
> mentions how PIR differs from PASM.  Subroutines now get a look in to the 
> introduction, and it mentions in passing that Parrot is capable of doing OO 
> stuff, threads and GC.  I've attached it for review and, if nobody yelps in 
> horror, I'll commit it in a couple of days.  Suggestions for improvements 
> are welcome, though as it's an intro it probably doesn't want to get too 
> much longer.

I did some spellchecking, PODification, and also some pedantic
wordsmithing (as attached).  Hope it helps.

Thanks,
/Autrijus/
# Copyright: 2001-2005 The Perl Foundation.  All Rights Reserved.
# $Id: intro.pod 8818 2005-08-05 13:21:09Z leo $

=head1 NAME

docs/intro.pod - The Parrot Primer

=head1 Welcome to Parrot

This document provides a gentle introduction to the Parrot virtual machine for
anyone considering writing code for Parrot by hand, writing a compiler that
targets Parrot, getting involved with Parrot development or simply wondering
what on earth Parrot is.

=head1 What is Parrot?

=head2 Virtual Machines

Parrot is a virtual machine. To understand what a virtual machine is, consider
what happens when you write a program in a language such as Perl, then run it
with the applicable interpreter (in the case of Perl, the perl executable).
First, the program you have written in a high level language is turned into
simple instructions, for example I<fetch the value of the variable named x>,
I<add 2 to this value>, I<store this value in the variable named y>, etc. A
single line of code in a high level language may be converted into tens of
these simple instructions. This stage is called I<compilation>.

The second stage involves executing these simple instructions. Some languages
(for example, C) are often compiled to instructions that are understood by the
CPU and as such can be executed by the hardware. Other languages, such as Perl,
Python and Java, are usually compiled to CPU-independent instructions.  A
I<virtual machine> (sometimes known as an I<interpreter>) is required to
execute those instructions.
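
To make the two stages concrete, here is a minimal sketch written in Python
rather than in any Parrot language. It is a toy model only: the instruction
names (C<fetch>, C<add_const>, C<store>) and both helper functions are
invented for this illustration and are not Parrot opcodes or APIs.

```python
# Toy model of the two stages described above. The instruction names
# and helpers are hypothetical; they are not Parrot opcodes.

def compile_assignment(target, source, constant):
    """Stage 1: turn 'target = source + constant' into simple instructions."""
    return [
        ("fetch", source),         # fetch the value of the variable
        ("add_const", constant),   # add the constant to this value
        ("store", target),         # store the value in the target variable
    ]

def execute(instructions, variables):
    """Stage 2: a tiny 'virtual machine' that runs the instructions."""
    accumulator = 0
    for op, arg in instructions:
        if op == "fetch":
            accumulator = variables[arg]
        elif op == "add_const":
            accumulator += arg
        elif op == "store":
            variables[arg] = accumulator
        else:
            raise ValueError("unknown instruction: " + op)
    return variables

program = compile_assignment("y", "x", 2)   # models: y = x + 2
result = execute(program, {"x": 40})
print(result["y"])
```

The point of the split is that C<compile_assignment> knows nothing about how
instructions are run, and C<execute> knows nothing about the source language.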

While the central role of a virtual machine is to efficiently execute
instructions, it also performs a number of other functions. One of these is to
abstract away the details of the hardware and operating system that a program
is running on. Once a program has been compiled to run on a virtual machine, it
will run on any platform that the VM has been implemented on. VMs may also
provide security (by allowing more fine-grained limitations to be placed on a
program), memory management functionality and support for high level language
features (such as objects, data structures, types, subroutines, etc).

=head2 Design goals

Parrot is designed with the needs of dynamically typed languages (such as Perl
and Python) in mind, and should be able to run programs written in these
languages more efficiently than VMs developed with static languages in mind
(JVM, .NET). Parrot is also designed to provide interoperability between
languages that compile to it. In theory, you will be able to write a class in
Perl, subclass it in Python and then instantiate and use that subclass in a Tcl
program.

Historically, Parrot started out as the runtime for Perl 6. Unlike Perl 5, the
Perl 6 compiler and runtime (VM) are to be much more clearly separated. The
name I<Parrot> was chosen after the 2001 April Fool's joke which had Perl and
Python collaborating on the next version of their languages. The name reflects
the intention to build a VM to run not just Perl 6, but also many other
languages.


=head1 Parrot concepts and jargon

=head2 Instruction formats

Parrot can currently accept instructions to execute in four forms. PIR (Parrot
Intermediate Representation) is designed to be written by people and generated
by compilers. It hides away some low-level details, such as the way parameters
are passed to functions. PASM (Parrot Assembly) is a level below PIR - it is
still human readable/writable and can be generated by a compiler, but the
author has to take care of details such as calling conventions and register
spilling. PAST (Parrot Abstract Syntax Tree) enables Parrot to accept an
abstract syntax tree style input - useful for those writing compilers.

All of the above forms of input are automatically converted inside Parrot to
PBC (Parrot Bytecode), the fourth form. This is much like machine code, but
understood by the Parrot interpreter. It is not intended to be human-readable
or human-writable, but unlike the other forms, execution can start immediately,
without the need for an assembly phase. Parrot bytecode is platform
independent.

=head2 The instruction set

The Parrot instruction set includes arithmetic and logical operators, compare
and branch/jump (for implementing loops, if...then constructs, etc), finding
and storing global and lexical variables, working with classes and objects,
calling subroutines and methods along with their parameters, I/O, threads and
more.

=head2 Registers and fundamental data types

The Parrot VM is register based. This means that, like a hardware CPU, it has a
number of fast-access units of storage called registers. There are 4 types of
register in Parrot: integers (I), numbers (N), strings (S) and PMCs (P). There
are 32 of each of these, named I0..I31, N0..N31, etc. Integer registers are the
same size as a word on the machine Parrot is running on and number registers
also map to a native floating point type.
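
The layout described above can be pictured with a small Python model. This is
only a sketch: the C<RegisterFile> class, its method names and the initial
register values are assumptions made for the illustration, and say nothing
about how Parrot stores registers internally.

```python
# Toy model of the register file described above: four register types
# with 32 registers each. The class, its methods and the initial values
# are hypothetical choices made for this sketch.

class RegisterFile:
    COUNT = 32

    def __init__(self):
        self.banks = {
            "I": [0] * self.COUNT,      # integers
            "N": [0.0] * self.COUNT,    # floating point numbers
            "S": [""] * self.COUNT,     # strings
            "P": [None] * self.COUNT,   # PMCs (any complex object)
        }

    def _locate(self, name):
        # "S31" -> the S bank, index 31
        kind, index = name[0], int(name[1:])
        if kind not in self.banks or not 0 <= index < self.COUNT:
            raise KeyError("no such register: " + name)
        return self.banks[kind], index

    def set(self, name, value):
        bank, index = self._locate(name)
        bank[index] = value

    def get(self, name):
        bank, index = self._locate(name)
        return bank[index]

regs = RegisterFile()
regs.set("I0", 42)
regs.set("S31", "Hello world!\n")
print(regs.get("I0"), repr(regs.get("S31")))
```

Asking this model for C<I32> raises an error, mirroring the fact that each
type has exactly 32 registers numbered 0 to 31.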

PMC stands for Parrot Magic Cookie. PMCs represent any complex data structure
or type, including aggregate data types (arrays, hash tables, etc). A PMC can
implement its own behaviour for arithmetic, logical and string operations
performed on it, allowing for language-specific behaviour to be introduced.
PMCs can be built in to the Parrot executable or dynamically loaded when they
are needed.

=head2 Garbage Collection

Parrot provides garbage collection, meaning that Parrot programs do not need to
free memory explicitly; it will be freed when it is no longer in use (that is,
no longer referenced) whenever the garbage collector runs.
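
The idea of freeing whatever is no longer referenced can be sketched with a toy
mark-and-sweep pass in Python. This models the general technique only and
should not be read as a description of Parrot's collector; the C<Obj> class
and the C<collect> function are hypothetical names used for the illustration.

```python
# Toy mark-and-sweep collector. This models the general idea only and is
# not a description of Parrot's actual collector; all names are hypothetical.

class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []      # other objects this object refers to

def collect(heap, roots):
    """Return the objects still reachable from roots; the rest are freed."""
    marked = set()
    stack = list(roots)
    while stack:                      # mark phase: follow every reference
        obj = stack.pop()
        if id(obj) in marked:
            continue
        marked.add(id(obj))
        stack.extend(obj.refs)
    # sweep phase: keep only what was marked
    return [obj for obj in heap if id(obj) in marked]

a, b, c = Obj("a"), Obj("b"), Obj("c")
a.refs.append(b)            # a refers to b; nothing refers to c
heap = [a, b, c]
heap = collect(heap, roots=[a])
print([obj.name for obj in heap])
```

Here C<c> is dropped because nothing reachable from the roots refers to it,
while C<b> survives because C<a> still does.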


=head1 Obtaining, building and testing Parrot

=head2 Where to get Parrot

Periodic, numbered releases will appear on CPAN.  At this stage of the project,
an awful lot is changing between releases. You can get a copy of the latest
Parrot from the SVN repository. This is done as follows:

  svn co https://svn.perl.org/parrot/trunk parrot

You can find more instructions at: http://www.parrotcode.org/source.html

=head2 Building Parrot

The first step to building Parrot is to run the F<Configure.pl> program, which
looks at your platform and decides how Parrot should be built. This is done by
typing:

  perl Configure.pl

Once this is complete, run the C<make> program (sometimes called C<nmake> or
C<dmake>).  This should complete, giving you a working Parrot executable.

Please report any problems that you encounter while building Parrot so the
developers can fix them. You can do this by sending a message to
L<[EMAIL PROTECTED]> containing a description of your problem. Please include
the F<myconfig> file that was generated as part of the build process and any
errors that you observed.

=head2 The Parrot test suite

Parrot has an extensive regression test suite. This can be run by typing:

  make test

Substitute the name of the make program on your platform for C<make>. The
output will look something like this:

  C:\Perl\bin\perl.exe t\harness --gc-debug --running-make-test
    t\library\*.t  t\op\*.t  t\pmc\*.t  t\run\*.t  t\native_pbc\*.t
    imcc\t\*\*.t  t\dynclass\*.t  t\p6rules\*.t t\src\*.t t\perl\*.t
  t\library\dumper...............ok
  t\library\getopt_long..........ok
  ...
  All tests successful, 4 test and 71 subtests skipped.
  Files=163, Tests=2719, 192 wallclock secs ( 0.00 cusr +  0.00 csys =  0.00 CPU)

It is possible that a number of tests may fail. If this is a small number, then
there is probably little to worry about, especially if you have the latest
Parrot from the SVN repository. However, please do not let this discourage you
from reporting test failures, using the same method as described for reporting
build problems.


=head1 Some simple Parrot programs

=head2 Hello world!

Create a file called F<hello.pir> that contains the following code.

  .sub _main
      print "Hello world!\n"
      end
  .end

Then run it by typing:

  parrot hello.pir

As expected, this will display the text C<Hello world!> on the console,
followed by a new line (due to the C<\n>).

Let's take the program apart. C<.sub _main> states that the instructions that
follow make up a subroutine named C<_main>, until a C<.end> is encountered. The
second line contains the C<print> instruction. In this case, we are calling the
variant of the instruction that accepts a constant string. The assembler takes
care of deciding which variant of the instruction to use for us. The third line
contains the C<end> instruction, which causes the interpreter to terminate.

=head2 Using registers

We can modify F<hello.pir> to first store the string C<Hello world!\n> in a
register and then use that register with the C<print> instruction.

  .sub _main
      set S0, "Hello world!\n"
      print S0
      end
  .end

Here we have stated exactly which register to use. However, by replacing C<S0>
with C<$S0> we can delegate the choice of which register to use to Parrot, and
it will take care of any register spilling that needs to be done for us. It is
also possible to use an C<=> notation instead of writing the C<set>
instruction.

  .sub _main
      $S0 = "Hello world!\n"
      print $S0
      end
  .end

To make PIR even more readable, named registers can be used. These are later
mapped to real numbered registers.

  .sub _main
      .local string hello
      hello = "Hello world!\n"
      print hello
      end
  .end

The C<.local> directive indicates that the named register is only needed inside
the current compilation unit (that is, between C<.sub> and C<.end>). Following
C<.local> is a type. This can be C<int> (for I registers), C<float> (for N
registers), C<string> (for S registers), C<pmc> (for P registers) or the name
of a PMC type.

=head2 PIR vs PASM

PIR can be turned into PASM by running:

  parrot -o hello.pasm hello.pir

The PASM for the final example looks like this:

  _main:
      set S30, "Hello world!\n"
      print S30
      end

PASM does not handle register allocation or provide support for named
registers. It also does not have the C<.sub> and C<.end> directives, instead
replacing them with a label at the start of the instructions.

=head2 Summing squares

This example introduces some more instructions and PIR syntax. Lines starting
with a C<#> are comments.

  .sub _main
      # State the number of squares to sum.
      .local int maxnum
      maxnum = 10
      
      # Some named registers we'll use. Note how we can declare many
      # registers of the same type on one line.
      .local int i, total, temp
      total = 0
      
      # Loop to do the sum.
      i = 1
  loop:
      temp = i * i
      total += temp
      inc i
      if i <= maxnum goto loop
      
      # Output result.
      print "The sum of the first "
      print maxnum
      print " squares is "
      print total
      print ".\n"
      end
  .end

PIR provides a bit of syntactic sugar that makes it look more high level than
assembly. For example:

  temp = i * i

Is just another way of writing the more assembly-ish:

  mul temp, i, i

And:

  if i <= maxnum goto loop

Is the same as:

  le i, maxnum, loop

And:

  total += temp

Is the same as:

  add total, temp

As a rule, whenever a Parrot instruction modifies the contents of a register,
that register is written first when the instruction is given in assembly form.

As is usual in assembly languages, loops and selection are implemented in terms
of conditional branch statements and labels, as shown above. Assembly
programming is one place where using C<goto> is not bad form!
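
As a cross-check of the summing-squares loop, here is the same computation
rendered in Python and kept deliberately close to the PIR structure. It is an
illustration only: Python has no C<goto>, so the conditional branch back to
C<loop> is modelled as a loop that tests its condition at the bottom.

```python
# Python rendering of the summing-squares loop, kept close to the PIR
# structure. Python has no goto, so the conditional branch becomes a
# while loop that tests its condition at the bottom.

maxnum = 10
total = 0
i = 1
while True:
    temp = i * i          # mul temp, i, i
    total += temp         # add total, temp
    i += 1                # inc i
    if not i <= maxnum:   # if i <= maxnum goto loop
        break

message = "The sum of the first %d squares is %d.\n" % (maxnum, total)
print(message, end="")
```

Running it prints C<The sum of the first 10 squares is 385.>, which is the
value to expect from the PIR program as well.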

=head2 Recursively computing factorial

In this example we define a factorial function and call it recursively to
compute factorials.

  .sub _fact
      # Get input parameter.
      .param int n
      
      # return (n > 1 ? n * _fact(n - 1) : 1)
      .local int result
      
      if n > 1 goto recurse
      result = 1
      goto return
  
  recurse:
      $I0 = n - 1
      result = _fact($I0)
      result *= n
      
  return:
      .return (result)
  .end
  
  
  .sub _main @MAIN
      .local int f, i
      
      # We'll do factorial 0 to 10.
      i = 0
  loop:
      f = _fact(i)
      
      print "Factorial of "
      print i
      print " is "
      print f
      print ".\n"
      
      inc i
      if i <= 10 goto loop
      
      # That's it.
      end
  .end

Let's look at the C<_fact> sub first. A point that was glossed over earlier is
why the names of subroutines all start with an underscore. This is done simply
as a way of showing that the label is global rather than scoped to a particular
subroutine. This is significant as the label is then visible to other
subroutines.

The first line, C<.param int n>, specifies that this subroutine takes one
integer parameter and that we'd like to refer to the register it was passed in
by the name C<n> for the rest of the sub.

Much of what follows has been seen in previous examples, apart from the line
reading:

  result = _fact($I0)

This single line of PIR actually represents quite a few lines of PASM. First,
the value in register C<$I0> is moved into the appropriate register for it to
be received as an integer parameter by the C<_fact> function. Other
calling-related registers are then set up, followed by C<_fact> being invoked.
Then, once C<_fact> returns, the value returned by C<_fact> is placed into the
register given the name C<result>.

Right before the C<.end> of the C<_fact> sub, a C<.return> directive is used to
ensure the value held in the register named C<result> is placed into the correct
register for it to be seen as a return value by the code calling the sub.

The call to C<_fact> in C<_main> works in just the same way as the recursive
call to C<_fact> within the sub C<_fact> itself. The only remaining bit of new
syntax is the C<@MAIN>, written after C<.sub _main>. By default, PIR assumes
that execution begins with the first sub in the file. This behaviour can be
changed by marking the sub to start in with C<@MAIN>.
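
For comparison, the control flow of C<_fact> and C<_main> can be mirrored in
Python. This sketch only reproduces the logic; it does not model Parrot's
calling conventions, and the name C<fact> is simply the Python counterpart
chosen for the illustration.

```python
# Python counterpart of the _fact and _main subs, mirroring their logic.
# It does not model Parrot's calling conventions.

def fact(n):
    # return (n > 1 ? n * fact(n - 1) : 1)
    if n > 1:
        result = fact(n - 1)    # result = _fact($I0), with $I0 = n - 1
        result *= n             # result *= n
    else:
        result = 1
    return result

# Factorial 0 to 10, as in the _main loop.
lines = ["Factorial of %d is %d.\n" % (i, fact(i)) for i in range(0, 11)]
print("".join(lines), end="")
```

The last line printed is C<Factorial of 10 is 3628800.>, which gives a value
to compare the PIR program's output against.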

=head2 Compiling to PBC

To compile PIR to bytecode, use the C<-o> flag and specify an output file with
the extension F<.pbc>.

  parrot -o factorial.pbc factorial.pir


=head1 Where next?

=head2 Documentation

What documentation you read next depends upon what you are looking to do with
Parrot. The opcodes reference and built-in PMCs reference are useful to dip
into for pretty much everyone. If you intend to write or compile to PIR then
there are a number of documents about PIR that are worth a read. For compiler
writers, the Compiler FAQ is essential reading. If you want to get involved
with Parrot development, the PDDs (Parrot Design Documents) contain some
details of the internals of Parrot; a few other documents fill in the gaps. One
way of helping Parrot development is to write tests, and there is a document
entitled I<Testing Parrot> that will help with this.

=head2 The Parrot Mailing List

Much Parrot development and discussion takes place on the perl6-internals
mailing list. You can subscribe by sending an email to
L<[EMAIL PROTECTED]> or read the perl6-internals NNTP archive.

=head2 IRC

The Parrot IRC channel is hosted on irc.perl.org and is named C<#parrot>.

=cut
