Re: Recommendations about POD -> XML

Sean M. Burke Wed, 14 Jul 2004 21:37:25 -0700

(To the Pod-People list: the Pod::Simple docs say that "If you actually want to use Pod as a format that you want to render to XML (particularly if to an XML instance with more elements than normal Pod has), please email me ([EMAIL PROTECTED]) and I'll probably have some recommendations." That's mostly a placeholder for a real description that I keep meaning to write. But anyway, someone just wrote to me taking me up on my "please email me" note, and this is my reply.)


Mr Liyanage,

OK, I've been keeping you hanging for a while, so here are my main notes about using Pod for things beyond just what it's currently for:


-- Making new N<...> codes --

You can make a new [uppercaseletter]<...> code.  But:

* You can't take over a currently assigned letter. You can't take over B or I or L or C or X or whatever. Those are fixed.

* It can't be a "magic" code like L<...> which has all that crazy parser-hacked processing of its content text. (If you do want to implement something like L<>, ugh, go right ahead, but it can't happen in the parser like L does.) (L<> and Z<> and E<> and S<> are the only "magical" codes: L because it does that kookoo processing of like in L<foo/bar>, Z<> because it must always be empty, and E<stuff> because it turns into whatever character "stuff" means; and S<stuff whatever> because the parser can, optionally, rewrite the spaces in the content to nonbreaking spaces. None of that magic is available for new codes.)

* You have to declare how your new code will behave for Pod processors that don't understand it. The choices are:

- it disappears, taking all its content with it. Example: "I like pieA<foo> and it likes me!" turns into "I like pie and it likes me!"

- it disappears, but leaves its content, but with no special extra formatting. Example: "I like pieA<foo> and it likes me!" turns into "I like piefoo and it likes me!"

- it turns into another (established) formatting code. Example: "I like pieA<foo> and it likes me!" turns into "I like pieI<foo> and it likes me!"

- it turns into several (established) formatting codes. Example: "I like pieA<foo> and it likes me!" turns into "I like pieI<B<foo>> and it likes me!"

* Then you have to declare how your code should look to Pod processors that DO understand it. Your options are basically expressed as a list of XML element names.

This is declared in a =foo directive called "=extend", which has this syntax:

=extend Newcode Fallback Canonicalnames

Newcode is one uppercase letter, the name of the new lettercode you'll use. E.g., "A" or "Q" or "M", etc.

Fallback says how uses of this code should look to Pod processors that don't understand it. The syntax is one of four things, corresponding to the four options above:

- The digit "0" (which means that it disappears and takes its content with it)

- The digit "1" (which means that it disappears but leaves its content)

- A single lettercode like "I" (which means that it becomes that code, like A<foo> turns into I<foo>)

- A comma-separated list of lettercodes like "I,B" (which means that like A<foo> turns into I<B<foo>>)

Then Canonicalnames is a comma-separated list of element names that uses of this new lettercode will appear as, in the eyes of the Pod processor. That is, a Canonicalnames of "M,N,O" means "if the Pod processor understands M, then this new code I'm declaring will show up as an M to it; otherwise if it understands N, make it appear as N; otherwise if it understands O, make it appear as O." (And if it doesn't understand any of these, then the parser considers Fallback, as described above.) However, note that Canonicalnames doesn't have to be a list of single uppercase letter codes; it accepts anything that's legal as an XML element name (which happens to include single uppercase letters). Example:

=extend  P  C,I   paramname

So you write a Pod processor that says it understands "paramname" (thru a call to $podparser->accept_codes('paramname')), and then when you say "blah blah P<crunkle> blah", it looks to the processor as if it were the XML "blah blah <paramname>crunkle</paramname> blah". Note that the Pod processor doesn't see the "P" anymore. So you could just as well say

=extend  V  C,I   paramname
...
blah blah V<crunkle> blah

and it would look the same ("blah blah <paramname>crunkle</paramname> blah") to the Pod processor.
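Here's a rough sketch of the processor side, assuming your installed Pod::Simple handles =extend the way this message describes. It uses Pod::Simple::DumpAsXML (which ships with Pod::Simple) as a stand-in for your own processor, declares that it understands "paramname", and dumps whatever the parser hands over. The variable names and the sample document are only for illustration.

```perl
use strict;
use warnings;
use Pod::Simple::DumpAsXML;

# Sample document: declares P with fallback C,I and canonical name paramname.
my $pod = <<'END_POD';
=pod

=extend P C,I paramname

blah blah P<crunkle> blah

=cut
END_POD

my $parser = Pod::Simple::DumpAsXML->new;

# Tell the parser that this processor understands "paramname" elements.
$parser->accept_codes('paramname');

# Collect the dump in a string instead of printing straight to STDOUT.
my $xml = '';
$parser->output_string(\$xml);
$parser->parse_string_document($pod);

print $xml;
```

If the accept_codes call is left out, the same document should come through with the C,I fallback instead of a paramname element.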

Another example, with several canonicalnames this time:

=extend  P  C,I   Parammy,paramname,Marc:Paramour

If the Pod processor has said it understands "Parammy" elements, then that's how P<crunkle> looks: "<Parammy>crunkle</Parammy>". Otherwise, if it understands "paramname", it gets that; otherwise, if it understands "Marc:Paramour", it gets that; otherwise it gets the fallback, <C><I>...</I></C>.

There's a short-form syntax:

=extend newcode fallback

...which is just a shortcut for "=extend newcode fallback newcode", i.e., "=extend A B" is a shortcut for "=extend A B A". That is, the canonical name is just the one-letter uppercase code. So A<foo> produces <A>foo</A>, which works if the Pod processor has called $podparser->accept_code("A").
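To make the resolution order concrete, here is a small stand-alone model of it in plain Perl. This is not Pod::Simple's own code: resolve_extend and view_as_xml are hypothetical helpers that just follow the rules as stated above (first understood canonical name wins, otherwise the fallback applies, and a missing Canonicalnames means the short form).

```perl
use strict;
use warnings;

# Hypothetical model (not Pod::Simple's own code) of how one
# "=extend Newcode Fallback Canonicalnames" declaration resolves.
# $accepted is a hashref of names the processor says it understands.
# Returns undef if the code and its content vanish (fallback "0"),
# otherwise an arrayref of element names to wrap the content in,
# outermost first (empty for fallback "1").
sub resolve_extend {
    my ($declaration, $accepted) = @_;
    my ($newcode, $fallback, $canonical) = split ' ', $declaration;
    $canonical = $newcode unless defined $canonical;    # short form

    # The first canonical name the processor understands wins.
    for my $name (split /,/, $canonical) {
        return [$name] if $accepted->{$name};
    }

    # Nothing understood, so consider the fallback.
    return undef if $fallback eq '0';
    return []    if $fallback eq '1';
    return [ split /,/, $fallback ];
}

# Show content the way the processor would see it, as XML-ish text.
sub view_as_xml {
    my ($wrappers, $content) = @_;
    return '' unless defined $wrappers;
    my $out = $content;
    $out = "<$_>$out</$_>" for reverse @$wrappers;
    return $out;
}

my $decl = 'P C,I Parammy,paramname,Marc:Paramour';
print view_as_xml(resolve_extend($decl, { paramname => 1 }), 'crunkle'), "\n";
print view_as_xml(resolve_extend($decl, {}), 'crunkle'), "\n";
print view_as_xml(resolve_extend('A B', { A => 1 }), 'foo'), "\n";
```

The three print lines cover a processor that understands "paramname", one that understands none of the canonical names (so the C,I fallback nests as C around I), and the short form.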

This whole =extend business is tested in Pod-Simple's t/ac_c_extend.t, but I can't say I've /extensively/ tested it beyond that, so if you use this, be sure to let me know whether or not it behaves properly, okay? I consider this an important feature that I want to have working right.

I actually have written a few Pod processors that declare that they understand particular extended lettercodes. They are Pod::Simple::HTML and Pod::Simple::RTF. P'S'RTF's extensions are for getting at cosmetic commands in RTF (like underlining, smallcaps, superscript, etc) and P'S'HTML's extensions are for getting at HTML commands, some of which are cosmetic (like for superscript) and some of which are semantic (like <cite> for the typically-italicized names of movies, books, etc). Anyhoo, if you have a document with this:

=extend U I underline

I U<like> pie.

If you then use Pod::Simple::HTML to process this, Pod::Simple::HTML has declared that it understands "underline" (and it internally maps it to HTML "u"), so you get: I <u>like</u> pie. If you use Pod::Simple::RTF, Pod::Simple::RTF has declared that it understands "underline" (and it internally maps it to RTF "\ul"), so you get: I {\ul like} pie.

But if you use it with some other Pod processor that doesn't declare that it understands "underline", then that processor will just see an I code instead.

-- Making new =foo directives --

In theory, Pod processors that use Pod::Simple can declare that they accept new "=whatever" codes, like: "=methodname crunch". However, I strongly discourage this, because there's no way for a Pod processor that doesn't understand "=methodname" to know what to do when it sees it. (Altho if you really wanted to add a "=methodname" and not care about Pod processors that don't understand it, you would do it by calling $podparser->accept_directive_as_verbatim('methodname') or $podparser->accept_directive_as_processed('methodname') or $podparser->accept_directive_as_data('methodname'), depending on whether you wanted the contents of the text in the paragraph that follows "=methodname ..." to be treated as verbatim (with tabs expanded to spaces, notably), as processed (with I<...>, etc parsed), or as data (totally untouched).)

So instead of adding a "=foo", you instead do this:


-- Making new "=for foo" (a/k/a "=begin foo ... =end foo") targets --

A Pod processor declares that it understands "foo" by calling $podparser -> accept_target_as_data( "foo" ). Then if you have this in a document:

=for foo Hooboy pie!

Then the Pod processor sees events (or nodes) as if it saw this XML:
<for target="foo" target_matching="foo"
  ><Data xml:space="preserve"
  >Hooboy pie!</Data></for>

And that's just the same as if you'd done this:

=begin foo

Hooboy pie!

=end foo

...since =for... is just a shorthand one-paragraph form of =begin ... =end.
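A minimal sketch of that, again with Pod::Simple::DumpAsXML standing in for a real processor. One assumption to flag: released versions of Pod::Simple document the call for accepting a target as data as accept_targets (with accept_target as an alias), so that is the spelling used here. Check it against your installed version.

```perl
use strict;
use warnings;
use Pod::Simple::DumpAsXML;

# Sample document with a single one-paragraph "=for foo" region.
my $pod = <<'END_POD';
=pod

=for foo Hooboy pie!

=cut
END_POD

my $parser = Pod::Simple::DumpAsXML->new;

# Declare that this processor wants to see sections targeted at "foo",
# with their content passed through as data.
$parser->accept_targets('foo');

my $xml = '';
$parser->output_string(\$xml);
$parser->parse_string_document($pod);

print $xml;
```

Without the accept_targets call, the parser should skip the "=for foo" paragraph entirely, which is the whole point of using targets instead of new directives.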

In "=for target ..." (or "=begin target\n\n...\n\n=end target"), "target" is normally the name of a single target -- but it can actually be a comma-separated list of targets, any of which can match, like here:

=for meta,metadata,versioning V=1.9.2 ; A=Joe Shmo

With that paragraph, if the Pod processor has called accept_target_as_data( "meta" ) or accept_target_as_data( "metadata" ) or accept_target_as_data( "versioning" ), then it sees:

<for target="meta,metadata,versioning" target_matching="..."
  ><Data xml:space="preserve"
  >V=1.9.2 ; A=Joe Shmo</Data></for>

(With the "..." replaced with the name of whichever target matched first.)

You can also have a =for directive that targets processors that /don't/ understand a particular formatter:

=for !meta,metadata,versioning (See source for versioning)

Then every processor that doesn't understand any of meta, metadata, or versioning, will see:

<for target="!meta,metadata,versioning" target_matching="!"><Data xml:space="preserve">(See source for versioning)</Data></for>

To make the Pod parser treat the contents of a =for group as processed text (with N<...> things parsed) instead of as data, you can explicitly say so by prefixing a ":" to the target:

=for :meta,metadata

...stuff...

and/or:

=for !:meta,metadata

("!:" and ":!" are equivalent, so don't worry about which to use.)
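The matching rules for target lists, "!" and ":" can be summed up in a few lines of plain Perl. This is only a model of the behavior described above, not the parser's actual code, and match_target is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical model of matching a "=for"/"=begin" target spec against the
# set of targets a processor accepts ($accepted is a hashref of names).
# Returns nothing if the section is skipped; otherwise a hashref with the
# target_matching value and whether the content is parsed as Pod text.
sub match_target {
    my ($spec, $accepted) = @_;

    # A leading "!" and/or ":" may appear in either order.
    my ($flags, $list) = $spec =~ /^([!:]{0,2})(.*)$/s;
    my $negated   = $flags =~ /!/ ? 1 : 0;
    my $processed = $flags =~ /:/ ? 1 : 0;

    # First listed target that the processor accepts, if any.
    my ($hit) = grep { $accepted->{$_} } split /,/, $list;

    if ($negated) {
        return if defined $hit;    # processor understands one, so skip
        return { target_matching => '!', processed => $processed };
    }
    return unless defined $hit;    # nothing understood, so skip
    return { target_matching => $hit, processed => $processed };
}

my $seen = match_target('meta,metadata,versioning', { metadata => 1 });
print $seen ? "matched $seen->{target_matching}\n" : "skipped\n";

my $other = match_target('!:meta,metadata', { html => 1 });
print $other ? "matched $other->{target_matching}\n" : "skipped\n";
```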

This =for foo stuff is tested in Pod-Simple's t/for*.t tests, but I haven't used it extensively.

Am I talking in a direction that makes sense for what you're thinking about doing with a superset of Pod feeding into a bunch of XML tools?

--
Sean M. Burke    http://search.cpan.org/~sburke/
