On Sun, 23 Sep 2001 23:05:27 -0600, in perl.perl5.porters you wrote:

> [endquote]
> I think that those paragraphs should just be removed; paragraph-based
> parsing seems to have been largely abandoned, because of the hassle
> with non-empty blank lines messing up what people meant by "paragraph".
> Even if the "it makes parsing easier" bit were especially true,
> it wouldn't be worth the confusion of having perl and pod2whatever
> actually disagree on what can constitute a Pod block.

One possible "fix" is to amend what Perl thinks is a Pod block :)
Oh well, I guess it's not that big a deal, really (though I got rather
worried about it at first). The only things it influences are what
starts an entire Pod block (and there you recommend using a blank line
before it -- e.g. in your "firecracker goes boom" example and in Pod
after an __END__ line, for "older parsers"), and what ends one. And as
someone pointed out, people don't talk about =cut all that often, so
the risk of that word accidentally beginning a line through paragraph
reformatting is minimal (and can be further reduced by spelling it
C<=cut> or Z<>=cut or E<61>cut).

> It is advised that formatnames match the regexp
> C<m/^:?[-a-zA-Z0-9_]+$/s>.

What's the /s for? There's no . in your regexp.

> =item *
>
> A formatting code starts with a capital letter (just US-ASCII [A-Z])
> followed by two or more "<"'s

Ah, so the << >> thing is explicitly allowed for things besides C<>!
That's good to know. I was going to ask about that in my response to
perlpod, but forgot :).

> , one or more whitespace characters,
> any number of characters, one or more whitespace characters,
> and ending with the first matching sequence of two or more ">"'s, where
> the number of ">"'s equals the number of "<"'s in the opening of this
> formatting code.

Are the whitespace characters stripped before rendering? All of them?

> Consider:
>
>   C<$x ? $y : $z>
>
>   S<C<$x ? $y : $z>>
>
> Both signify the monospace (c[ode] style) text consisting of
> "$x", one space, "?", one space, ":", one space, "$z". The
> difference is that in the latter, with the S code, those spaces
> are not "normal" spaces, but instead are nonbreaking spaces.

Does C<S<$x ? $y : $z>> make any sense? Does it mean something different
than S<C<...>>? I can imagine not.

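Coming back to the /s question on the formatname regexp: here is a rough
sketch of why the modifier looks redundant. It is written in Python
rather than Perl, on the assumption that re.DOTALL is a fair stand-in
for /s (both only change what "." matches). The sample names are made
up for illustration and are not taken from the spec.

```python
import re

# Python stand-in for the Perl pattern m/^:?[-a-zA-Z0-9_]+$/s.
# Assumption: re.DOTALL plays the role of Perl's /s modifier, which only
# changes what "." matches. The pattern contains no ".", so the flag
# should make no difference.
FORMATNAME = r"^:?[-a-zA-Z0-9_]+$"
with_s = re.compile(FORMATNAME, re.DOTALL)
without_s = re.compile(FORMATNAME)

# Hypothetical sample names, chosen only to exercise the pattern.
samples = ["html", ":html", "biblio", ":biblio", "x-my_format2",
           "", ":", "two words", "html\n", "html\nfoo", "::html"]


def classify(name):
    """Return (matches with the flag, matches without it) for one name."""
    return (bool(with_s.search(name)), bool(without_s.search(name)))


results = {name: classify(name) for name in samples}

if __name__ == "__main__":
    for name, (a, b) in results.items():
        print(f"{name!r:16} with /s: {a!s:5}  without /s: {b}")
```

For every sample the two verdicts agree, which is what one would expect
from a pattern with no "." in it.
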
> =item *
>
> Pod parsers should not, by default, try to coerce apostrophe (') and
> quote (") into smart quotes (little 9's, 66's, 99's, etc), nor try to
> turn backtick (`) into anything else but a single backtick character
> (distinct from an openquote character!), nor "--" into anything but
> two minus signs. They I<must never> do any of those things to text
> in CE<lt>...> formatting codes, and never I<ever> to text in verbatim
> paragraphs.

Ah. This is something I like, though I know of others who think
otherwise.

> =item *
>
> Authors of Pod formatters/processors should make every effort to
> avoid writing their own Pod parser. There are already several in
> CPAN, with a wide range of interface styles -- and one of them,
> Pod::Parser, comes with modern versions of Perl.

<smile>

> =item * Be warned
> that some formatters cannot reliably render characters outside 32-126;
> and many are able to handle 32-126 and 160-255, but nothing above
> 255.

Put a newline between "*" and "Be warned", or the text will disappear on
formatting (if I understood your comment "Pod parsers will infer the
list type from the first element" correctly).

Hm, you say:

> Pod parsers,
> when faced with some unknown "EE<lt>I<identifier>>" code,
> shouldn't simply replace it with nullstring (by default, at least),
> but may pass it through as a string consisting of the literal characters
> E, less-than, I<identifier>, greater-than. Or Pod parsers may offer the
> alternative option of processing such unknown
> "EE<lt>I<identifier>>" codes by firing an event especially
> for such codes, or by adding a special node-type to the in-memory
> document tree. Such "EE<lt>I<identifier>>" may have special meaning
> to some processors, or some processors may choose to add them to
> a special error report.

, but then:

> Implementers
> are not expected to bend over backwards in an attempt to render
> Cherokee syllabics, Etruscan runes, Byzantine musical symbols, or any
> of the other weird things that Unicode can encode.)

What's a parser supposed to do on encountering E<12345> if it only
"does" Latin-1? Output the sequence "E < 1 2 3 4 5 >" or the
alternatives listed for E<identifier> above? Or is there no fallback?

> And
> if a Pod document uses a character not found in such a mapping, the
> formatter should consider it an unrenderable character.

Does this mean "E<12345>, where the mapping table has no mapping for
'12345'", or "a literal character with the value 0x12345"? It sounds
like "literal character" rather than "E<> sequence", but you didn't talk
about literal non-ASCII characters and what should be done with them
(yet?).

> If you are in this circumstance, you should begin with the
> characters in the range 0x00A0 - 0x00FF, which is mostly the heavily
> used accented characters).
                          ^ unmatched parenthesis.

> =item Third:
>
> The name or URL, or undef if none. (E.g., in "LE<lt>Perl
> Functions|Sperlfunc>", the name -- also sometimes called the page --
> is "perlfunc". In "LE<lt>/CAVEATS>", the name is undef.)

s/Sperl/perl/

> Note that you can distinguish URL-links from anything else by the
> fact that they match C<m/^\w+\:[^:\s]\S+$/s>. So
> C<LE<lt>http://www.perl.comE<gt>> is a URL, but
> C<LE<lt>HTTP::ResponseE<gt>> isn't.

Delete the \ from the \:, and the /s modifier. (<nitpick>that means that
mailto:a is not a URL? darn!</nitpick>)

> The ":" on these identifiers means simply "process this stuff
> normally, even though the result will be for some special target".
> I suggest that parser APIs report "biblio" as the target identifier,
> but also report that it had a ":" prefix. (And similarly, with the
> above "html", report "html" as the target identifier, and note the
> I<lack> of a ":" prefix.)

Does this mean that the format names "foo" and ":foo" are necessarily
related -- two different kinds of format that differ only in whether the
contents are parsed? For example, there are formatters that understand
=for html. Do they also have to "automatically" support =for :html?

> Incidentally, that there's no easy way to express a data
> paragraph starting with something that looks like a command.

Is there a "note" missing at the beginning of the sentence, after the
comma?

Cheers,
Philip
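
P.S. On the URL-link regexp: a small sketch of what the pattern accepts,
and of the simplified form without the backslash and the /s modifier.
This is Python, not Perl, and assumes that re.DOTALL corresponds to /s
and that the link targets are plain ASCII; the helper name is_url_link
and the extra samples beyond the two from the spec are made up for
illustration.

```python
import re

# Python rendering of m/^\w+\:[^:\s]\S+$/s (re.DOTALL standing in for /s),
# and of the same pattern with the "\" before ":" and the modifier dropped.
ORIGINAL = re.compile(r"^\w+\:[^:\s]\S+$", re.DOTALL)
SIMPLIFIED = re.compile(r"^\w+:[^:\s]\S+$")


def is_url_link(target, pattern=SIMPLIFIED):
    """Hypothetical helper: True if an L<> target looks like a URL."""
    return pattern.search(target) is not None


# The first two targets come from the quoted spec text; the rest are
# made-up probes, including the "mailto:a" nitpick.
samples = ["http://www.perl.com", "HTTP::Response",
           "mailto:a", "mailto:ab", "perlfunc"]

verdicts = {s: (is_url_link(s, ORIGINAL), is_url_link(s)) for s in samples}

if __name__ == "__main__":
    for target, (orig, simple) in verdicts.items():
        print(f"{target:22} original: {orig!s:5}  simplified: {simple}")
```

Both forms give the same answer for each sample. "mailto:a" is rejected
because the pattern needs at least two characters after the colon (one
for [^:\s] and one or more for \S+), while "mailto:ab" gets through.
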