Re: perlpodspec, draft 2

Philip Newton Tue, 25 Sep 2001 10:58:17 -0700

On Sun, 23 Sep 2001 23:05:27 -0600, in perl.perl5.porters you wrote:

>  [endquote]
>  I think that those paragraphs should just be removed; paragraph-based
>  parsing  seems to have been largely abandoned, because of the hassle
>  with non-empty blank lines messing up what people meant by "paragraph".
>  Even if the "it makes parsing easier" bit were especially true,
>  it wouldn't be worth the confusion of having perl and pod2whatever
>  actually disagree on what can constitute a Pod block.


One possible "fix" is to amend what Perl thinks is a Pod block :)

Oh well, I guess it's not that big a deal, really (though I got rather
worried about it at first). The only things it influences are what starts
an entire Pod block (and there you recommend using a blank line before
it -- e.g. in your "firecracker goes boom" example and in Pod after an
__END__ line, for "older parsers"), and what ends one. And as someone
pointed out, people don't talk about =cut all that often, so the risk of
that word accidentally beginning a line through paragraph reformatting
is minimal (and can be further reduced by spelling it C<=cut> or Z<>=cut
or E<61>cut).

> It is advised that formatnames match the regexp
> C<m/^:?[-a-zA-Z0-9_]+$/s>.

What's the /s for? There's no . in your regexp.
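
To make the point concrete, here is a minimal sketch in Python rather than Perl, assuming Python's re.DOTALL is a fair stand-in for Perl's /s (both only change what "." matches). The helper name is_formatname and the sample names are made up for illustration.

```python
import re

# The regexp quoted above, transcribed unchanged for Python's re module.
FORMATNAME = r"^:?[-a-zA-Z0-9_]+$"

def is_formatname(name, dotall=False):
    """Return True if name matches the advised formatname pattern.

    dotall=True adds re.DOTALL, Python's counterpart of Perl's /s.
    (is_formatname is a hypothetical helper, not part of any Pod module.)
    """
    flags = re.DOTALL if dotall else 0
    return re.search(FORMATNAME, name, flags) is not None

# Hypothetical sample names, valid and invalid.
samples = ["html", ":biblio", "x-foo_1", "two words", "", ":", "a\nb"]

# Each value is (result without DOTALL, result with DOTALL).
results = {s: (is_formatname(s), is_formatname(s, dotall=True)) for s in samples}

for name, (plain, with_s) in results.items():
    print(repr(name), plain, with_s)
```

Since the pattern contains no ".", the two columns agree for every sample.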

> =item *
> 
> A formatting code starts with a capital letter (just US-ASCII [A-Z])
> followed by two or more "<"'s

Ah, so the << >> thing is explicitly allowed for things besides C<>!
That's good to know. I was going to ask about that in my response to
perlpod, but forgot :).

> , one or more whitespace characters,
> any number of characters, one or more whitespace characters,
> and ending with the first matching sequence of two or more ">"'s, where
> the number of ">"'s equals the number of "<"'s in the opening of this
> formatting code.

Are the whitespace characters stripped before rendering? All of them?
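
For illustration, a rough Python sketch of how a parser might read one such code. It assumes that the padding whitespace is stripped (which is exactly what I am asking about) and that the closing run must be exactly as long as the opening one. The function name parse_wide_code is made up.

```python
import re

def parse_wide_code(text):
    """Parse one formatting code of the form X<< content >> at the start of text.

    Returns (letter, content, rest), or None if text does not start with
    such a code. Assumptions, not taken from the draft: the padding
    whitespace is stripped from content, and the closing run of ">" must be
    exactly as long as the opening run of "<".
    """
    opening = re.match(r"([A-Z])(<{2,})\s", text)
    if opening is None:
        return None
    letter, brackets = opening.group(1), opening.group(2)
    # Whitespace, then the same number of ">" and no further ">" after them.
    closing = re.compile(r"\s+" + ">" * len(brackets) + r"(?!>)")
    end = closing.search(text, opening.end())
    if end is None:
        return None
    content = text[opening.end():end.start()].strip()
    return letter, content, text[end.end():]

print(parse_wide_code("C<< $x >> y"))
print(parse_wide_code("C<<< $a->[0] >> 1 >>> done"))
```

With three brackets on each side, the " >> " inside the second example is left alone as ordinary content.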

> Consider:
> 
>     C<$x ? $y    :  $z>
> 
>     S<C<$x ? $y     :  $z>>
> 
> Both signify the monospace (c[ode] style) text consisting of
> "$x", one space, "?", one space, ":", one space, "$z".  The
> difference is that in the latter, with the S code, those spaces
> are not "normal" spaces, but instead are nonbreaking spaces.

Does

    C<S<$x ? $y     :  $z>>

make any sense? Does it mean anything different from S<C<...>>? I
imagine not.
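
As I read the quoted paragraph, the whitespace handling amounts to something like the Python sketch below. The function names are made up, and the sketch says nothing about whether the nesting order matters. It only shows runs of whitespace collapsing to one space, and S turning those spaces into U+00A0.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE

def collapse(text):
    """Collapse each run of whitespace into one ordinary space."""
    return re.sub(r"\s+", " ", text)

def nonbreaking(text):
    """Collapse whitespace, then make every remaining space nonbreaking.

    This is an assumed reading of what S<...> does to its content.
    """
    return collapse(text).replace(" ", NBSP)

code = "$x ? $y    :  $z"
plain = collapse(code)        # what C<...> alone would carry
sticky = nonbreaking(code)    # what S<C<...>> would carry

print(repr(plain))
print(repr(sticky))
```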

> =item *
> 
> Pod parsers should not, by default, try to coerce apostrophe (') and
> quote (") into smart quotes (little 9's, 66's, 99's, etc), nor try to
> turn backtick (`) into anything else but a single backtick character
> (distinct from an openquote character!), nor "--" into anything but
> two minus signs.  They I<must never> do any of those things to text
> in CE<lt>...> formatting codes, and never I<ever> to text in verbatim
> paragraphs.

Ah. This is something I like, though I know of others who think
otherwise.

> =item *
> 
> Authors of Pod formatters/processors should make every effort to
> avoid writing their own Pod parser.  There are already several in
> CPAN, with a wide range of interface styles -- and one of them,
> Pod::Parser, comes with modern versions of Perl.

<smile>

> =item * Be warned
> that some formatters cannot reliably render characters outside 32-126;
> and many are able to handle 32-126 and 160-255, but nothing above
> 255.

Put a newline between "*" and "Be warned", or the text will disappear on
formatting (if I understood your comment "Pod parsers will infer the
list type from the first element" correctly).

Hm, you say:

>                                                         Pod parsers,
> when faced with some unknown "EE<lt>I<identifier>>" code,
> shouldn't simply replace it with nullstring (by default, at least),
> but may pass it through as a string consisting of the literal characters
> E, less-than, I<identifier>, greater-than.  Or Pod parsers may offer the
> alternative option of processing such unknown
> "EE<lt>I<identifier>>" codes by firing an event especially
> for such codes, or by adding a special node-type to the in-memory
> document tree.  Such "EE<lt>I<identifier>>" may have special meaning
> to some processors, or some processors may choose to add them to
> a special error report.

, but then:

>                                                            Implementers
> are not expected to bend over backwards in an attempt to render
> Cherokee syllabics, Etruscan runes, Byzantine musical symbols, or any
> of the other weird things that Unicode can encode.)

What's a parser supposed to do on encountering E<12345> if it only
"does" Latin-1? Output the sequence "E < 1 2 3 4 5 >", or use one of the
alternatives listed for E<identifier> above? Or is there no fallback?

>                                                      And
> if a Pod document uses a character not found in such a mapping, the
> formatter should consider it an unrenderable character.

Does this mean "E<12345>, where the mapping table has no mapping for
'12345'", or "a literal character with the value 0x12345"? It sounds
like "literal character" rather than "E<> sequence", but you didn't talk
about literal non-ASCII characters and what should be done with them
(yet?).

>         If you are in this circumstance, you should begin with the
> characters in the range 0x00A0 - 0x00FF, which is mostly the heavily
> used accented characters).
                          ^
unmatched parenthesis.

> =item Third:
> 
> The name or URL, or undef if none.  (E.g., in "LE<lt>Perl
> Functions|Sperlfunc>", the name -- also sometimes called the page --
> is "perlfunc".  In "LE<lt>/CAVEATS>", the name is undef.)

s/Sperl/perl/

> Note that you can distinguish URL-links from anything else by the
> fact that they match C<m/^\w+\:[^:\s]\S+$/s>.  So
> C<LE<lt>http://www.perl.comE<gt>> is a URL, but
> C<LE<lt>HTTP::ResponseE<gt>> isn't.

Delete the \ from the \:, and the /s modifier.

(<nitpick>that means that mailto:a is not a URL? darn!</nitpick>)
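
A quick check of both points, sketched in Python on the assumption that its re module treats this particular pattern the same way Perl does. The helper is_url_link and the sample targets other than the two from the draft are made up.

```python
import re

# The pattern as quoted in the draft, and with the "\" removed as suggested.
ORIGINAL = re.compile(r"^\w+\:[^:\s]\S+$")
SIMPLIFIED = re.compile(r"^\w+:[^:\s]\S+$")

def is_url_link(target, pattern=SIMPLIFIED):
    """Return True if a link target looks like a URL under the given pattern."""
    return pattern.search(target) is not None

targets = ["http://www.perl.com", "HTTP::Response", "mailto:a", "mailto:ab"]

for t in targets:
    # The escaped and unescaped colon behave identically.
    assert is_url_link(t, ORIGINAL) == is_url_link(t, SIMPLIFIED)
    print(t, is_url_link(t))
```

"mailto:a" fails because the pattern wants at least two non-space characters after the colon: one for [^:\s] and at least one for \S+.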

> The ":" on these identifiers means simply "process this stuff
> normally, even though the result will be for some special target".
> I suggest that parser APIs report "biblio" as the target identifier,
> but also report that it had a ":" prefix.  (And similarly, with the
> above "html", report "html" as the target identifier, and note the
> I<lack> of a ":" prefix.)

Does this mean that the format names "foo" and ":foo" are necessarily
related -- two different kinds of format that differ only in whether the
contents are parsed?

For example, there are formatters that understand =for html. Do they
also have to "automatically" support =for :html?
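
For reference, the reporting scheme suggested in the quoted paragraph could be sketched in Python as below. The function name split_target is made up, and the sketch takes no position on whether a formatter that handles "html" must also handle ":html".

```python
def split_target(formatname):
    """Split a format name into (target identifier, had_colon_prefix).

    Hypothetical helper: it reports the identifier without the ":" and,
    separately, whether the ":" was there.
    """
    has_colon = formatname.startswith(":")
    identifier = formatname[1:] if has_colon else formatname
    return identifier, has_colon

print(split_target(":biblio"))
print(split_target("html"))
```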

> Incidentally, that there's no easy way to express a data
> paragraph starting with something that looks like a command.

Is there a "note" missing after the comma, i.e. "Incidentally, note
that ..."?

Cheers,
Philip
