Documenting Perl6 part 2

Mark Overmeer Fri, 06 Jul 2007 04:59:59 -0700

<intro>
  Hi Mongers,

  Part one of this story, which I released last week on this list, was
  really provoking people to rethink the way we work now (not!)


  The challenge was made by Damian, to design a documentation system which
  is easier to use: where code and doc work together, instead of being
  orthogonally separated.  Well, the gauntlet has been taken up, resulting
  in this 0.0 version; of course incomplete.

  Most of the non-inlined features of this design are available for
  Perl5 via the OODoc distribution.  It is a pity that Perl5 subroutines
  provide so little information about how they are to be used: therefore
  you have to specify much more explicitly than required for Perl6.

  As I have asked twice before, I would really like to see a demo by
  Damian on how to document a small Perl6 class (say 30 lines) using his
  S26 specs.  Including everything that is needed to get it configured
  in a way that tools may swallow it.  In return, I will use the design
  described here to do the same... for comparison.

    MarkOv
</intro>

======== Documentation of Perl6
=== part2: general syntax, version 0.0

The crucial difference between Damian's S26 design and my opinion about
documentation is that, IMO, the focus must be on being able to create tools
which produce the best end-user documentation (see part1 of this story);
the markup language is just one (minor) issue.  Damian's Synopsis S26
does design a markup language, but does not want to define code
interconnection nor predefined markup tags to help the tools.

=== Choosing Symbols

Let's try to avoid reinventing the syntax wheel.  In Perl5 we use POD for
end-user documentation (text surrounded by lines which start with a '=')
and comments to explain the program (texts with a leading '#').  In Perl6,
the same symbols are allocated, and people are used to that.  Let's stick
with them, for now.

Two ways are defined to add documentation fragments to the code:

(1) the POD(5) way:

   =over 4
   =item . one
   =back
   =cut

   As in POD(5), any line which starts with a '=' will start a
   documentation block, and '=cut' ends it.

   Extensions to POD(5)

   - "internal" blank lines are optional; only the line before the
     first line of a documentation fragment must be blank.  This will
     make the documentation much more compact to write, so code will
     get more visible in the file.

   - each tag is matched with  /^\=+(\w+)(?:\s+|$)/
     By permitting more than one '=' as leader, the author can make
     his/her own visual helpers.  For instance, the '=head1' in the
     file looks as heavy as the '=over', but is of course of much
     more importance.  So: the author may decide to say '====head1'.
     The extra '=' are not significant.

   - a (long) list of logical markup tags will be used to add information
     about what is being documented.  We will not write
          =item print OPTIONS

     but  =sub print OPTIONS
      or  ====sub print OPTIONS
     (if we need to)

   - the tag /^\=+end\s+$1/ will also terminate a pod block, like
     =cut, but then with a semantic role as well.
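
The tag rule above can be sketched as follows.  This is only an
illustration in Python, not part of OODoc or any existing tool; it
assumes that the last part of the tag pattern means "whitespace or end
of line", and the name parse_tag is made up for the example.

```python
import re

# Assumption: one or more '=', then the tag word, then whitespace or end of line.
# The number of leading '=' is not significant, so it is not captured.
TAG = re.compile(r"^=+(\w+)(?:\s+|$)(.*)")

def parse_tag(line):
    """Return (tag, rest) for a tag line, or None for ordinary text."""
    m = TAG.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2).strip()

print(parse_tag("====head1 NAME"))
print(parse_tag("=sub print OPTIONS"))
print(parse_tag("just some text"))
```

With this reading, '=head1' and '====head1' give the same tag, and
'=end sub swap' is seen as tag 'end' with 'sub swap' as the rest.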

(2) Inlined documentation, like comments

   Many (small) features in the program need to get some documentation,
   for instance, class attributes.  Starting a documentation block, as
   described in (1), for each of them is both a lot of work to type and
   makes it harder to get an overview of the code.

   Without extension of the available symbols for documentation in Perl6,
   our hands are tied.  So, therefore (for the moment) I use
        /(?:^|\s|;)\#\=/    for user documentation, where
        /(?:^|\s|;)\#[^=]/  is programmer's comment.

   Inlined docs are automatically linked to the *declaration* above or
   before them:
       has Point $.center;  #= center coordinate of the universe

       method move(float $dx, float $dy, float $dz) {
         #= Jump to a different location, relative to the
         #= current position.
         #=param $dx in parsec

         # of course, time should be included in this
         # interface.  We will do that later. [normal comment]
       }

   The last two lines are code comments.
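
How a tool could tell user documentation from programmer's comment can
be sketched like this.  It is a rough Python illustration under the
assumption that '#=' or '#' counts only at the start of a line, after
whitespace or after ';'.  It does not look inside string literals, which
a real implementation working on the parsed source would have to do.
The name classify is made up for the example.

```python
import re

# Assumption: '#=' (at line start, after whitespace or ';') is user documentation;
# a '#' in the same position that is not followed by '=' is a programmer's comment.
DOC = re.compile(r"(?:^|\s|;)#=\s?(.*)$")
COMMENT = re.compile(r"(?:^|\s|;)#(?!=)\s?(.*)$")

def classify(line):
    """Return ('doc' | 'comment' | 'code', text) for one source line."""
    m = DOC.search(line)
    if m:
        return ("doc", m.group(1))
    m = COMMENT.search(line)
    if m:
        return ("comment", m.group(1))
    return ("code", line.strip())

for line in [
    "  #= Jump to a different location, relative to the",
    "  #=param $dx in parsec",
    "  # of course, time should be included in this",
]:
    print(classify(line))
```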

=== Producing Document fragments

Both inlined and block documentation provide the same features.
The fragments start in one of three ways:

(a) method print() {...}   # implementation anywhere in the file

    #=method print           =method print
    #=description            =description
    #= line1                 line1
    #= line2                 line2
    #=end method             =end method


(b) method print() {...}   # implementation anywhere in the file

    #=method print           =method print
    #= line1                 line1
    #= line2                 line2
    #=end method             =end method

(c) method print() {...}     method print() {...}
    #= line1
    #= line2                 =description
                             line1
                             line2
                             =cut

    When followed by the next method/end section, no "=end"
    is required.  Just after a container item, we always start
    with a description.

Clearly, the inlined version of (c) is most compact.
Also in the following cases, you will have information collected for
the manual pages to be produced later... still without description.

(d) has $.center;
    method print() {...}


The description is always kept in the first lines of a container.
Additional (nested) informational items start with
   #=<name> <parameters>
Their description can start on the same line, or on the next line.
Equivalent are:

 #=param $x
 #= the horizontal coordinate.

 #=param $x the horizontal coordinate.

 #=param $x the horizontal
 #= coordinate.
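
That these forms are equivalent can be made concrete with a small
sketch.  It is Python for illustration only, and it assumes that a
continuation line carries the '#=' prefix followed by a blank, while
'#=<name>' opens a new item.  The function name collect_items is made
up here.

```python
def collect_items(lines):
    """Group '#=' lines into (name, text) items.

    '#=<name> ...' opens a new item; '#= text' continues the current one.
    Text before any named item is taken as the description.
    """
    items = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("#="):
            continue  # code or programmer's comment: not documentation
        body = line[2:]
        if body == "" or body[:1].isspace():
            if not items:
                items.append(["description", []])
            items[-1][1].append(body.strip())
        else:
            name, _, rest = body.partition(" ")
            items.append([name, [rest.strip()]])
    return [(name, " ".join(p for p in parts if p)) for name, parts in items]

a = collect_items(["#=param $x", "#= the horizontal coordinate."])
b = collect_items(["#=param $x the horizontal coordinate."])
c = collect_items(["#=param $x the horizontal", "#= coordinate."])
print(a == b == c)
```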

=== Merging with code

One of the targets of this design is to avoid replication of information:
when the program says that a parameter has default '10', then the
documentation shouldn't say '42' (... unless you really want to)

Long example from Apocalypse A06:

    method action ($self:
                int  $x = 10,
                int ?$y,
                int ?$z,
             Adverb +$how,
        Beneficiary +$for,
           Location +$at is copy,
           Location +$toward is copy,
           Location +$from is copy,
             Reason +$why,
                    *%named,
                    [EMAIL PROTECTED]
                ) {...}

Without any intervention, this will produce something comparable to
the following.  Some items are hidden for simplicity, and the (ignored)
additional '=' and blank lines are used to show some structure.

 ====method action
 ===visibility public
 ===call ($self: int  $x, int ?$y, int ?$z, OPTIONS, LIST)

 ==param $x
 =type int
 =use required
 =default 10

 ==param $y
 =type int
 =use optional

 ==option $how
 =type Adverb

 ==option $from
 =type Location
 =pass copy
 ... etc ...
 =end call


The purpose of the first phase of the documentation processor is to
generate as much (consistent) information about the (public) interface
as possible.  The back-end manual-page generators will limit the amount
of information they present: it is the user's decision what they want
to read, not the author's!

This automatic extraction process is the most complicated part of
the whole implementation, for sure.  It is wise to have this info not
collected by some POD tool, but directly extracted from the Perl6 AST.
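
The generation step can be sketched as below, in Python for
illustration.  The Param record is a hypothetical stand-in for what
would really be read from the Perl6 AST; only the output format follows
the example above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Param:
    # Hypothetical record; a real tool would fill this from the AST.
    kind: str                       # "param" (positional) or "option" (named)
    name: str
    type: Optional[str] = None
    use: Optional[str] = None       # "required" or "optional"
    default: Optional[str] = None
    passing: Optional[str] = None   # e.g. "copy"

def render(method, params):
    """Produce the generated tag lines for one method."""
    lines = [f"====method {method}"]
    for p in params:
        lines.append(f"=={p.kind} {p.name}")
        for tag, value in (("type", p.type), ("use", p.use),
                           ("default", p.default), ("pass", p.passing)):
            if value is not None:
                lines.append(f"={tag} {value}")
    return lines

params = [
    Param("param", "$x", type="int", use="required", default="10"),
    Param("param", "$y", type="int", use="optional"),
    Param("option", "$from", type="Location", passing="copy"),
]
print("\n".join(render("action", params)))
```

Because the default '10' is taken from the code, the documentation
cannot accidentally say something else.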

Now a complex example from Apocalypse A06:

    sub swap (*@_ is rw) { @_[0,1] = @_[1,0] };

This could be documented by overruling nearly all generated information.
Explicitly overruling the parameter list with "call" will override the
automatically generated parameter information.  This is especially
useful to merge the details about multi methods/subs.

    sub swap (*@_ is rw) { @_[0,1] = @_[1,0] };
    #= exchange the content of two variables.
    #=call (A, B)
    #=return the reverse list
    #=param A will be replaced by B
    #=param B will be replaced by A
    #  this is just to demonstrate programmer's comment:
    #  we probably should check the number of values passed.

If you like to describe your code before it is used, you need a
reference for the documentation fragment:

    #====sub swap
    #= exchange the content of two variables.
    #=call (A, B)
    #=return the reverse list
    #=param A will be replaced by B
    #=param B will be replaced by A

    sub swap (*@_ is rw) { @_[0,1] = @_[1,0] };
       #  we probably should check the number of values passed.

or

    ====sub swap
    exchange the content of two variables.
    =call (A, B)
    =return the reverse list
    =param A will be replaced by B
    =param B will be replaced by A
    =cut   # or =end sub  or =end sub swap

    sub swap (*@_ is rw) { @_[0,1] = @_[1,0] };
       #  we probably should check the number of values passed.

Adding some example to swap, anywhere in the file (probably close by,
or in the same block without need for a reference)

    ====sub swap
    =example
       my ($a, $b) = (10,42);
       swap($a, $b);
       say ":$a:$b:";    # :42:10:
    =end sub

In the same way, you can add descriptions of produced error and
warning messages ('=error', '=warning').

=== Ordering

Each documentation fragment type (both block and inlined) describes
a specific kind of knowledge; therefore it is always defined where it
belongs, either implicitly or explicitly.

When being processed, the document fragments are organized into a
tree ("DocTree"), derived from the Perl6 AST (Abstract Syntax Tree).
As Perl6 code can be distributed in compiled form, this "DocTree"
can also be distributed as a half-product.

Back-end documentation tools will use (one or more of these) DocTrees as
only source of information to produce user manuals: they should not (need
to) process the source code themselves to collect additional information.

The created DocTree looks something like this:

  root::
     distribution(MyDist)
        file(MyDist.pm)
           chapter(copyrights)
           chapter(authors)
        package(MyDist)
           manual-data
        class(MyClass)
           inheritance-info
           manual-data

  manual-data::
     chapter(name)
     chapter(description)
     chapter(methods)
         section(constructors)
             method(CLASS, dup)
                call
                  option(Debug)
                    type(BOOLEAN)
                    default(false)
                  parameter
                example
                 
You can now either explicitly or implicitly add something to a block.

(1) Explicitly is the clearest.  It uses the predefined logical
    markup statements.  Example:

  =chapter METHODS
  =section Constructors
  =method dup
  Duplicate an object.

    The DocTree generator will look up additional information about the
    dup() method automatically, and place that on the explicitly indicated
    spot in the tree as well.
 
(2) Implicit reference is a bit tricky: some tags found by the Perl6
    parser will automatically insert tags.

    The parser will not only insert starting tags, but also ending
    tags: for instance, when a new chapter starts, then all lower
    and equal level nested block-structures will be closed automatically.

An example which shows implicit and explicit references:

  class Mail::Message {
    #= a general message object.

  =chapter DESCRIPTION
  Implementation of...
  =cut

     method print() {...}
       #= output message header and body.
  }

is equivalent to

  class Mail::Message {

  =class Mail::Message
  a general message object

  =chapter DESCRIPTION
  Implementation of...

  =end chapter
  =chapter METHODS
  =cut

     method print() {...}

  =method print
  output message header and body.
  =call ()
  =end method
  =end chapter
  =end class

  }

As should be clear from the above example: implicit references make
life a lot easier for documentation authors.  I chose to use
"=end method" instead of "=method end", to avoid conflicts with a method
or subroutine named "end()".  You may also use "=end chapter METHODS",
in which case the provided name shall match.
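
The automatic closing of blocks can be sketched with a simple stack.
This is a Python illustration only; the numeric nesting levels in LEVEL
are an assumption derived from the examples above, not a defined part
of the design.

```python
# Assumed nesting depth per block kind; lower numbers are outer blocks.
LEVEL = {"class": 0, "chapter": 1, "section": 2, "subsection": 3,
         "subsubsection": 4, "method": 5, "sub": 5}

def with_end_tags(tags):
    """Insert the implicit '=end' tags into a list of (kind, name) start tags.

    A new block first closes every open block of equal or lower (deeper) level.
    """
    out, stack = [], []
    for kind, name in tags:
        while stack and LEVEL[stack[-1]] >= LEVEL[kind]:
            out.append(f"=end {stack.pop()}")
        out.append(f"={kind} {name}")
        stack.append(kind)
    while stack:  # end of the enclosing code block closes what is still open
        out.append(f"=end {stack.pop()}")
    return out

print("\n".join(with_end_tags([
    ("class", "Mail::Message"),
    ("chapter", "DESCRIPTION"),
    ("chapter", "METHODS"),
    ("method", "print"),
])))
```

For the Mail::Message example this gives the same order of start and
end tags as written out in the expanded version.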

When there is author supplied information about a code feature,
that block location will be used to collect all information about the
feature.  If there is no author supplied info, then the location of the
implementation will be used.  This concept will make it easy to create
the whole manual-page below all code.

The DocTree structural definition [sloppy]

  root:  distribution*

  distribution: (file|package|class|grammar)*

  file: chapter*
  package: chapter*, exporter-info?
  class,grammar: chapter*, inheritance-info?

  chapter: description, (section|example|callable)*
  section: description, (subsection|example|callable)*
  subsection: description, (subsubsection|example|callable)*
  subsubsection: description, (example|callable)*

  callable: method|rule|sub|macro|....

  method,rule,sub: description,(call|example|report)*

  call: (param|option)*
  param,option: name, description?, default?, type?, use?

  description,example: text?
  report: text?   # describes errors and warnings

  text: unicode-string   # a documentation fragment using markup.
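
As an illustration, the structural definition can be turned into a small
checker.  This Python sketch only tests which kind of node may appear
below which other kind; it ignores order and the '?' and '*' counts.
Node, ALLOWED and violations are names invented for this example, and
giving 'macro' the same children as 'sub' is an assumption.

```python
from dataclasses import dataclass, field

CALLABLE = {"method", "rule", "sub", "macro"}
CALLABLE_PARTS = {"description", "call", "example", "report"}
PARAM_PARTS = {"name", "description", "default", "type", "use"}

# Which child kinds each node kind may contain, following the sloppy grammar.
ALLOWED = {
    "root": {"distribution"},
    "distribution": {"file", "package", "class", "grammar"},
    "file": {"chapter"},
    "package": {"chapter", "exporter-info"},
    "class": {"chapter", "inheritance-info"},
    "grammar": {"chapter", "inheritance-info"},
    "chapter": {"description", "section", "example"} | CALLABLE,
    "section": {"description", "subsection", "example"} | CALLABLE,
    "subsection": {"description", "subsubsection", "example"} | CALLABLE,
    "subsubsection": {"description", "example"} | CALLABLE,
    "method": CALLABLE_PARTS, "rule": CALLABLE_PARTS,
    "sub": CALLABLE_PARTS, "macro": CALLABLE_PARTS,
    "call": {"param", "option"},
    "param": PARAM_PARTS, "option": PARAM_PARTS,
}

@dataclass
class Node:
    kind: str
    children: list = field(default_factory=list)
    text: str = ""

def violations(node):
    """Return 'parent > child' for every child kind the grammar does not allow."""
    bad = []
    allowed = ALLOWED.get(node.kind, set())  # leaf kinds allow no children
    for child in node.children:
        if child.kind not in allowed:
            bad.append(f"{node.kind} > {child.kind}")
        bad.extend(violations(child))
    return bad

good = Node("root", [Node("distribution", [Node("class", [Node("chapter", [
    Node("description"),
    Node("method", [Node("description"),
                    Node("call", [Node("option", [Node("name"), Node("type")])])]),
])])])])
print(violations(good))
```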

=== The markup language

The text blocks in the DocTree will use POD(5) block markup syntax, so
the users may define their own markup syntax (like Synopsis S26 defines),
as long as POD(5) is recorded in the DocTree.

For the sake of better references between documentation elements, POD needs
a more detailed reference syntax:

  A<Some::Module>
    refers to a package/class/grammar with that name

  A<do_something()>
    refers to a sub/method/rule in this resp. package/class/grammar.
    May be available via inheritance from some base class/grammar,
    or from import().

  A<Some::Module::do_something()>
    refers specifically to an element in a different page

  A<do_something(option)>
  A<Some::Module::do_something(option)>
    references to a parameter or named-parameter description.

All of the above define references.  Some back-ends, like UNIX manual-pages
and POD, may not support such fine resolution, and need to rewrite these
links into text.
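
How a back-end might take such references apart is sketched below, in
Python for illustration.  It assumes that the content of A<...> holds no
nested angle brackets; parse_ref and find_refs are names invented for
this example.

```python
import re

def parse_ref(inner):
    """Split the content of an A<...> reference into module, callable and arg."""
    module = name = arg = None
    if inner.endswith(")") and "(" in inner:
        head, _, rest = inner.partition("(")
        arg = rest[:-1] or None                  # text between the parentheses
        module, _, name = head.rpartition("::")  # last '::' separates the callable
        module = module or None                  # no module: look in this page
    else:
        module = inner                           # a package/class/grammar name
    return {"module": module, "callable": name, "arg": arg}

def find_refs(text):
    """Return the parsed form of every A<...> reference found in a text."""
    return [parse_ref(inner) for inner in re.findall(r"A<([^<>]+)>", text)]

for ref in ["Some::Module", "do_something()",
            "Some::Module::do_something(option)"]:
    print(parse_ref(ref))
```

A back-end without fine-grained links can use the same three parts to
rewrite the reference into plain text.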

The destination components do not need to register themselves as anchor
points.  This is very important, because the destination may very
well be part of a different distribution.

=== User documentation

A documentation generating back-end takes the DocTree of one or
more distributions, and uses only the fragments it finds interesting.
That sub-set of features is converted into end-user manuals, for instance
into traditional POD, man, HTML, XML, or LaTeX.  (User provided) templates
can really simplify this process.

          developer
              |
              v
  Perl(6) files of a distribution
              |
              +  Perl6 compilation
              v
           Perl AST ---> code
              |
              +  fragment collector/splitter
              +  markup translator to POD+
              v
           DocTree  (distributable)
              |
              | ,-----< DocTree* additional
              ||
              ++ Manual-page generator (templates/style sheets)
              |
        static manuals (distributable)
              |
              v
           end-user
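
The last step, a back-end which picks only the fragments it finds
interesting, can be sketched as follows.  This is Python for
illustration; the nested-dict form of the DocTree and the mapping of
chapters and methods onto '=head1' and '=head2' are assumptions made
for the example, not part of the design.

```python
def to_pod(node, keep):
    """Walk a DocTree node and emit traditional POD lines for the kept kinds."""
    lines = []
    if node["kind"] in keep:
        if node["kind"] == "chapter":
            lines.append(f"=head1 {node['name']}")
        elif node["kind"] == "method":
            lines.append(f"=head2 {node['name']}")
        elif node["kind"] == "description":
            lines.append(node["text"])
    for child in node.get("children", []):
        lines.extend(to_pod(child, keep))
    return lines

tree = {"kind": "chapter", "name": "METHODS", "children": [
    {"kind": "method", "name": "print", "children": [
        {"kind": "description", "text": "output message header and body."},
        {"kind": "example", "text": "$msg.print;"},
    ]},
]}

print("\n".join(to_pod(tree, keep={"chapter", "method", "description"})))
```

Here the example fragment stays in the DocTree, but this back-end simply
does not show it; another back-end may decide differently.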

=== Syntax Alternatives

In the above syntax, only the currently allocated '=' and '#=' symbols are
used.  When additional symbols are made available, a visually
cleaner syntax might be developed.

(0) With current symbols

      method compute()          method compute()
        #= some text               = some text
        #= and more                = and more

    These '#=' are quite visually heavy.  Of course we are used to an
    extensive application of the '#' as comment, but still.  Without '#',
    it is much prettier.

(1) The Python look:

      method compute()
        """ some text
            and more
        """

    There are already so many quotes in a text, that it is
    confusing.  The character is pale, which doesn't feel pleasant.

(2) like attaching a label

      method compute()          method compute($x)
        ` some text               ` some text
        ` and more                =param $x
                                  ` the starting point

(3) like a line

      method compute()          method compute()
        | some text                : some text
        | and more                 : and more

There are a zillion other possibilities, which all require a change in
the current Perl6 syntax definition.  It could be a good plan to create
a larger example module, and then experiment with the above suggestions,
within Perl6's boundaries.
