[RFC] A more extensible/flexible POD (ROUGH-DRAFT)

Stevan Little Mon, 14 Mar 2005 17:55:36 -0800

Gang,

My proposal is for an extensible version of POD. Basically, what XML is to HTML/SGML, this will be for POD. This is a very very very rough draft. I am submitting it in hopes of getting some basic feedback on the idea, to see if I should carry it any further or not. So if you have an opinion on this, either positive or negative, please chime in and let me know.

If nothing else, I am hoping this might bring the debate over the future of POD (or Kwid) out of the more transient #perl6 and into the more concrete world of email, as I think there are currently too many opinions and not enough dialog.

- Stevan

(p.s. - I apologize in advance if my ideas sound a little jumbled together; remember, this is very very rough)

INTRODUCTION

There has been a lot of chatter on #perl6 about POD vs. Kwid and how best to do this and that, etc. etc. etc. and so on and so forth. It seems to me that we will never be able to create a documentation format which will make everyone happy in all cases.

But my question really is, why should we?

Just as with HTML, at some point the formatting information overtakes the data being displayed, and the data no longer has any real meaning in relation to the markup around it. IMO, POD and Kwid are both too formatting-centric and, while much simpler than HTML, suffer on some level from the same problem.

The early promise of XML was that data and formatting would be separate: you put the data in the XML and gave it meaning and context, then you put the formatting into your stylesheet (XSL or CSS). Of course we all know XML (and XSL and all its variants) has now become a big ugly monster, but that does not mean those early ideas were not good ones.

I am of the opinion that while documentation is traditionally a very static thing, it need not be so. In fact, I feel that documentation should be as much meta-data as it is prose. I think the extensible POD-like format I will describe below could possibly bridge that gap between meta-data and static documentation.

THE BASIC IDEA

When grossly simplified, SGML, HTML & XML all share a base set of similar constructs. They are made up of Containers, Elements and Entities. An Element is basically a tag with no embedded tags (<BR> or <HR> are the most common in HTML). A Container is a tag which itself has Elements within it. An Entity is just a special chunk of text embedded within other text (in *ML they begin with '&' and end with ';'). (NOTE: I am making the assumption that text is itself an implicit element.)

If we port this idea to a POD-like syntax, it might look something like this:

=container

=element: some string with E<entities> in it

Some plain text contained I<within> this container.
        
=end

The Container is begun the normal "POD"-ish way, and easily delimited by an '=end' marker. An Element is very "POD"-ish as well, but is identified by a trailing ':' after its name. And entities take the common POD form of an uppercase character followed by something inside a pair of angle brackets.

NOTE: Anything following an Element or a Container declaration on the same line can be thought of as being "part of" that element/container. This is somewhat like how *ML tags have attributes.
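To make the three constructs concrete, here is a minimal sketch in Python of how a reader might classify each line. The grammar is only what I could infer from the examples in this message, and the names classify, entities, DIRECTIVE and ENTITY are hypothetical, not part of any existing POD tool. Nested entities are not handled.

```python
import re

# Assumed grammar, inferred from the examples in this message (not a spec):
#   =name            opens a container
#   =name: text      is a one-line element
#   =end             closes the innermost container
#   X<...>           is an entity: one uppercase letter plus angle brackets
DIRECTIVE = re.compile(r"^\s*=([A-Za-z_]+)(:)?\s*(.*)$")
ENTITY = re.compile(r"([A-Z])<([^<>]*)>")


def classify(line):
    """Return (kind, name, text) for one line of extensible POD."""
    match = DIRECTIVE.match(line)
    if match is None:
        # Anything that is not a directive is plain text.
        return ("text", None, line.strip())
    name, colon, rest = match.groups()
    if name == "end" and colon is None:
        return ("end", None, rest)
    if colon:
        return ("element", name, rest)
    return ("container", name, rest)


def entities(text):
    """List the (letter, content) pairs of all entities in a string."""
    return ENTITY.findall(text)


sample = """=container

=element: some string with E<entities> in it

Some plain text contained I<within> this container.

=end"""

for line in sample.splitlines():
    if line.strip():
        print(classify(line))
```

Whatever follows the name on the same line comes back as the third item of the tuple, which is one way to model the "attribute"-like text described in the note above.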

A more complex example might be something like this:

=module
        =project: Pugs
        =name: Perl6::Pugs
        =version: 6.0.11
        =author
                =name: Autrijus Tang
                =email: [EMAIL PROTECTED]
        =end
        =description: Pugs - Perl6 Users Golfing System.
        =dependencies
                GHC 6.2 or above
        =end
        =copyright: Copyright 2005 Autrijus Tang.
        =url: L<http://www.autrijus.org>
        =license
                This library is free software; you can redistribute it
                and/or modify it under the same terms as Perl itself.
        =end
=end

As you can see, any container or element name can be a normal identifier ('a-zA-Z_', no spaces). Since containers can be nested, name conflicts can be avoided through scoping (module/name and module/author/name do not conflict). Like *ML (but unlike POD and Kwid), whitespace and line breaks are not significant (at least not in the same way). Unlike *ML, the "tags" are not so verbose and are fairly easily readable by humans (IMHO, at least). The "entities" can be pretty much any single uppercase letter (26 is likely enough).
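The scoping point can be illustrated with a rough sketch in Python that builds a tree from the nesting and looks nodes up by a slash-separated path. The node layout and the parse and find helpers are assumptions made for this illustration only; blank lines are simply skipped, which fits the claim that line breaks are not significant.

```python
import re

DIRECTIVE = re.compile(r"^\s*=([A-Za-z_]+)(:)?\s*(.*)$")


def parse(source):
    """Parse extensible POD into nested dicts (hypothetical node layout)."""
    root = {"name": "", "attr": "", "children": [], "text": []}
    stack = [root]
    for line in source.splitlines():
        if not line.strip():
            continue  # whitespace is treated as insignificant
        match = DIRECTIVE.match(line)
        if match is None:
            # Plain text belongs to the innermost open container.
            stack[-1]["text"].append(line.strip())
            continue
        name, colon, rest = match.groups()
        if name == "end" and not colon:
            if len(stack) == 1:
                raise ValueError("=end without an open container")
            stack.pop()
            continue
        node = {"name": name, "attr": rest, "children": [], "text": []}
        stack[-1]["children"].append(node)
        if not colon:
            stack.append(node)  # a container stays open until its =end
    if len(stack) != 1:
        raise ValueError("unclosed container: " + stack[-1]["name"])
    return root


def find(node, path):
    """Return the nodes reached by a path such as 'module/author/name'."""
    nodes = [node]
    for step in path.split("/"):
        nodes = [c for n in nodes for c in n["children"] if c["name"] == step]
    return nodes


doc = """
=module
        =name: Perl6::Pugs
        =author
                =name: Autrijus Tang
        =end
        =dependencies
                GHC 6.2 or above
        =end
=end
"""

tree = parse(doc)
print([n["attr"] for n in find(tree, "module/name")])
print([n["attr"] for n in find(tree, "module/author/name")])
print(find(tree, "module/dependencies")[0]["text"])
```

Because lookup only ever descends into direct children, the two "name" elements stay distinct without any extra naming rules.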

Now, I am sure some of you are thinking that it is starting to look a little like YAML. But YAML is much more complex and structured, and therefore not really good for documentation. YAML also has a much larger syntax; for what I am proposing, what you see here is all of it.

And really, that's pretty much all of it. Details like code and verbatim sections are not part of this; they are dictated by the formatter. All we have in here is data, pure and simple.

FORMATTERS

(NOTE: this is currently the weakest part of this proposal/idea, so feedback on it is much appreciated)

The question of "how will I format this in <insert favorite format here>" must be addressed. However, I will dodge that quickly and propose a different approach instead.

I like how POD does not dictate, and instead just suggests, how a formatter should handle things. However, this ideal has led to many different and somewhat incompatible POD parsers/formatters out there. I do not think that I can solve this problem, but I do think that we can at least solve a part of it by dictating how formatters will interact with the extensible POD data.

This is modeled after the base idea of XSL; however, I do not in any way want to create some kind of POD/XSL hybrid monster here. Think of it instead as more of an API for formatters, which is somewhat more akin to the XML-DOM.
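As a rough illustration of what such a formatter API could look like, here is a sketch in Python where a formatter walks an already-parsed tree and dispatches on each node's name. Everything here is hypothetical: the node layout, the Formatter and ManStyle classes, and the on_* handler convention are assumptions of mine, not a worked-out design.

```python
def node(name, attr="", children=(), text=()):
    """Build one document node; the layout is an assumption of this sketch."""
    return {"name": name, "attr": attr,
            "children": list(children), "text": list(text)}


class Formatter:
    """Walks a document tree and dispatches to per-name handler methods."""

    def render(self, n):
        # Look for a handler named after the node, else use the fallback.
        handler = getattr(self, "on_" + n["name"], self.on_default)
        return handler(n)

    def children(self, n):
        out = []
        for child in n["children"]:
            out.extend(self.render(child))
        return out

    def on_default(self, n):
        # Unknown names are kept, not dropped: print a heading and indent
        # whatever text and children the node holds.
        label = n["name"].upper()
        head = [f"{label}: {n['attr']}"] if n["attr"] else [label]
        body = ["  " + line for line in n["text"] + self.children(n)]
        return head + body


class ManStyle(Formatter):
    """One possible layout: collapse each method into a signature line."""

    def on_method(self, n):
        fields = {c["name"]: c["attr"] for c in n["children"]}
        name = fields.get("name", "?")
        args = fields.get("args", "()")
        returns = fields.get("returns", "?")
        return [f"{name}{args} -> {returns}"]


doc = node("methods", children=[
    node("method", children=[
        node("name", "hello_world"),
        node("args", "($which_world, $is_friendly)"),
        node("returns", "void"),
    ]),
])

print("\n".join(ManStyle().render(doc)))
```

The point of the fallback handler is that a formatter only needs to know about the names it cares about; a different subclass could turn the same method nodes into HTML or diagram input without the data changing.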

EXAMPLE

Here is a basic example of the common set of CPAN information. It is more verbose than standard POD (and than Kwid would be), but keep in mind that this same information could be used to generate not only basic search.cpan.org HTML docs and man pages, but also simple UML diagrams from the method information. Testing of synopsis code, while easy in POD, is even easier here.

It is also important to note that while I deliberately mimicked the CPAN style here, there is no reason that the data needs to be structured in this way. It can be structured in whatever way suits the data, letting the formatter dictate the eventual layout (again, think XML/XSL).

=name: My::Module - A perl extension for my module
=synopsis

        my $m = My::Module->new();
        $m->method();

=end
=description

This is my module, I hope you like it.

=end
=methods

=method
        =name: hello_world
        =args: ($which_world, $is_friendly)
        =returns: void
        =description
                This method greets a particular world and takes an
                optional C<$is_friendly> flag.
        =end
=end

=end
=see_also
        L<My::Other::Module>
=end
=author
        =name: Stevan Little
        =email: [EMAIL PROTECTED]
=end

CONCLUSION

Okay, nothing much here. It's getting late, and has been a long day, so I won't bore you anymore.

Thanks for reading this far, and please send me all your comments (both good and bad).
