Pod as shorthand for XML

Sean M. Burke Thu, 03 Jan 2002 13:02:09 -0800

So I've been thinking many Deep Thoughts lately about Pod.

I have competing goals in the design of Pod as a document format:


The first and foremost goal is the absolute requirement that Pod be
sufficient for easily writing text documentation, and that its semantics be
simple enough for all its constructs to be easily translatable into any
sane markup language or typesetting system. I think that with the new
perlpod and perlpodspec, and my forthcoming (in-progress) Pod parser, that
is pretty much taken care of. (Then there's just the cleanup work of actually
bringing the formatters up to date, notably the currently appalling
repulsive oozing hissing pod2html.) 

The second goal is that Pod be extensible enough that you could use it as a
sort of "Huffman-coding for XML" (as I remember Larry once expressing the
idea, although I'm quoting from memory, as I can't now find the exact
message). This idea isn't a real requirement, but hooboy, it'd be nice if I
could pull it off. I'm thinking not of all XML, but just XML for document
formatting.  I feel like I've got most of the idea so far, but it's all
quite tentative and hazy and incomplete.  So I'm posting it here in the
hopes someone might have a clever idea about it.


The basic idea is that current Pod syntax is a notational variant of a
subset of XML, and the job of a Pod parser such as the one I'm writing is
either to turn the Pod syntax into the equivalent XML, or to produce the SAX
events you'd get if you fed that XML into a SAX parser.  So
"B<foo>" turns into either "<B>foo</B>", or into the three events
start_element() for "B", characters() for "foo", and end_element() for "B".
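A minimal Python sketch of that mapping follows, using only the standard `xml.sax` modules (Python spells the events `startElement`, `characters` and `endElement`). It is an illustration, not the parser described here: `pod_codes_to_sax`, `EventRecorder` and `pod_to_xml` are hypothetical names, and the sketch assumes only single-bracket codes like B<...>, ignoring the C<< ... >> form and E<...> escapes. The point is that one event stream serves both outputs: hand it an `XMLGenerator` and you get XML text, hand it any other handler and you get the events.

```python
import io
import re
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# A formatting code starts with one capital letter immediately followed by "<".
CODE_START = re.compile(r"[A-Z]<")


def pod_codes_to_sax(text, handler):
    """Drive a SAX ContentHandler from the formatting codes in one paragraph.

    Simplification (assumption): only single-bracket codes such as B<...>
    are handled; the C<< ... >> form and E<...> escapes are not.
    """
    open_codes = []  # stack of formatting codes not yet closed
    pending = []     # character data not yet sent to the handler

    def flush():
        if pending:
            handler.characters("".join(pending))
            pending.clear()

    i = 0
    while i < len(text):
        if CODE_START.match(text, i):
            flush()
            handler.startElement(text[i], AttributesImpl({}))
            open_codes.append(text[i])
            i += 2  # skip the letter and the "<"
        elif text[i] == ">" and open_codes:
            flush()
            handler.endElement(open_codes.pop())
            i += 1
        else:
            pending.append(text[i])  # ordinary text, including stray "<" or ">"
            i += 1
    flush()
    if open_codes:
        raise ValueError(f"unclosed formatting code(s): {open_codes}")


class EventRecorder(ContentHandler):
    """Records events under the start_element/characters/end_element names."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start_element", name))

    def characters(self, content):
        self.events.append(("characters", content))

    def endElement(self, name):
        self.events.append(("end_element", name))


def pod_to_xml(text):
    """Same events, but serialized as XML text by the stdlib XMLGenerator."""
    out = io.StringIO()
    pod_codes_to_sax(text, XMLGenerator(out))
    return out.getvalue()
```

For instance, pod_to_xml("a I<b C<c>> d") returns "a <I>b <C>c</C></I> d", and feeding "B<foo>" to an EventRecorder yields exactly the three events listed above.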

So that gets you as far as my first goal, making a parser for just Pod as
defined in the specs.  (That's without getting into what "=item *" turns
into, but that's just a detail.)

However, I want to do the whole extensibility thing.  My dream, in its
grandest form, is that for whatever XML document format people are dealing
with (say, DocBook), no one will have to key in XML by hand; people could
key it in with Pod syntax instead.  So instead of:

The <emphasis>destructor</emphasis> (<function>DESTROY</function>) for the
object <literal>$b</literal> will be called...

You would have something like:

=equate M emphasis,B

=equate U function,C

=equate T literal,C

and then anytime later...

The M<destructor> (U<DESTROY>) for the object T<$b> will be called...

The meaning of "=equate M emphasis,B" is "In the rest of this document, I
may use a nonstandard formatting code 'M' as shorthand for 'emphasis'
(that's an XML element name), but unless the Pod processor has told the Pod
parser library that it understands 'emphasis', fall back on using 'B'
instead".  The end of that list has to be either one of the standard Pod
formatting codes (B, C, F, etc.), or the specials "0" or "1" -- "0" meaning
"ignore it and its content", and "1" meaning "just have its content, with
no code around it".

(Incidentally, I'm still not committed to that exact syntax -- among other
things, I keep waffling between "=equate" and something like "=extend" or
something klunkier like "=defcode".)
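The fallback rule can be sketched in a few lines of Python. This assumes the parser has already turned a paragraph into a small tree of plain strings and (code, children) tuples; that tree shape, the function names, and the `understood` set (standing in for whatever the Pod processor tells the parser library it can handle) are all hypothetical. `resolve` walks the list left to right and takes the first name that is understood, a standard Pod code, or one of the specials "0" and "1".

```python
from xml.sax.saxutils import escape

# The standard Pod formatting codes, plus the two special fallbacks.
STANDARD_CODES = set("IBCLEFSXZ")
SPECIALS = {"0", "1"}  # "0": drop code and content; "1": keep content only


def parse_equate(line):
    """Turn '=equate M emphasis,B' into ('M', ['emphasis', 'B'])."""
    directive, code, targets = line.split(maxsplit=2)
    if directive != "=equate":
        raise ValueError(f"not an =equate line: {line!r}")
    names = targets.split(",")
    if names[-1] not in STANDARD_CODES | SPECIALS:
        raise ValueError(f"last entry must be a standard code, 0 or 1: {line!r}")
    return code, names


def resolve(code, equates, understood):
    """Pick the first name in the fallback list that can actually be used."""
    if code in STANDARD_CODES:
        return code
    if code not in equates:
        raise ValueError(f"no =equate declared for {code}<...>")
    for name in equates[code]:
        if name in understood or name in STANDARD_CODES | SPECIALS:
            return name


def apply_equates(nodes, equates, understood):
    """Rewrite a (hypothetical) tree of strings and (code, children) tuples."""
    out = []
    for node in nodes:
        if isinstance(node, str):
            out.append(node)
            continue
        code, children = node
        target = resolve(code, equates, understood)
        if target == "0":
            continue  # ignore the code and its content
        kids = apply_equates(children, equates, understood)
        if target == "1":
            out.extend(kids)  # keep the content, with no element around it
        else:
            out.append((target, kids))
    return out


def to_xml(nodes):
    """Serialize the rewritten tree as XML text."""
    parts = []
    for node in nodes:
        if isinstance(node, str):
            parts.append(escape(node))
        else:
            name, children = node
            parts.append(f"<{name}>{to_xml(children)}</{name}>")
    return "".join(parts)
```

Given the three =equate lines above and understood={"emphasis", "function", "literal"}, the destructor sentence comes back out as the DocBook markup shown earlier; with an empty set the same tree degrades to <B>destructor</B>, <C>DESTROY</C> and <C>$b</C>.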

That's all fine, and I think it addresses most of the problems that are
out there.  But there are some residual issues, I think:

0) It doesn't allow PIs (<?foo bar?>) or comments (<!-- foo -->).
Yes, you can do
=for whateverxml <?foo bar?>
But you can't do it in the middle of a paragraph.
Nor can you do arcana like entity declarations or anything.
Ho hum, I don't know that this is a big problem.

1) There's no provision for something like this:
<some-block-level-thing>
<some-other-block-level-thing>
<para>I'm a cucumber, I'm a cucumber, I'm a cucumber, I'm a cucumber, Don't
you take me to the pickle farm!</para>
</some-other-block-level-thing>
</some-block-level-thing>

That is, =equate doesn't allow defining block-scope elements, since all Pod
formatting codes (N<...>) are only within a paragraph (or heading, etc.).
Maybe I could just say that if you want this, you should use "=for whateverxml
<some-block-level-thing><some-other-block-level-thing>", and then later
"=for whateverxml
</some-other-block-level-thing></some-block-level-thing>".  But if there's
some better way, I'd like to hear it.

2) In the example above, there's no way to have paragraph-like things that
are in a <whoozits>...</whoozits> rather than a <para>...</para>.  Maybe
this could just be done with something like "=equate Para =whoozits,Para",
where Para means the invisible label you get on plain paragraphs?  I'm less
happy with this.

3) Attributes -- so far I have ways of getting <foo>...</foo> things, but
no way of getting a bar="baz" in <foo bar="baz">...</foo>.  Maybe I could
do some hoodoo like:
=extend K SuperMagicName,0
where anything aliased to SuperMagicName sets the attribute value(s) for
the next element and then disappears, so K<class="thang"
level="3">B<shazbot> emits <B class="thang" level="3">shazbot</B>.
I'd like to find some really convincing way of saying that no, we don't
need attributes.  Some XML instances really need them, but maybe I get to
say "if you need attributes, then you need to write yourself some layer of
indirection between your Pod and the output XML, which inserts whatever
attributes you need".  But on the other hand, that layer of indirection
would /possibly/ be something not vastly different from what I'm proposing
with the SuperMagicName thing, so by not providing it I might just be
making everyone independently (and badly) reinvent the wheel that I'm
refusing to provide.
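One way to picture the SuperMagicName hoodoo is as a SAX filter sitting between the parser and the output: it swallows the aliased element, reads its text as name="value" pairs, and attaches them to the next element that starts. This is a hypothetical Python sketch; `AttributeCarrierFilter`, the attribute regex, and `emit_example` are invented for illustration, and the events are replayed by hand rather than coming from a real Pod parser.

```python
import io
import re
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

MAGIC = "SuperMagicName"  # the magic element name from the proposal
# Assumed attribute syntax inside the magic code: name="value" pairs.
ATTR_PAIR = re.compile(r'([A-Za-z_][\w.-]*)="([^"]*)"')


class AttributeCarrierFilter(ContentHandler):
    """Hypothetical filter: the magic element vanishes, its attributes move on."""

    def __init__(self, downstream):
        super().__init__()
        self.downstream = downstream
        self.capturing = False
        self.captured = []
        self.pending = {}  # attributes waiting for the next element

    def startElement(self, name, attrs):
        if name == MAGIC:
            self.capturing = True
            self.captured = []
            return
        merged = dict(attrs.items())
        merged.update(self.pending)
        self.pending = {}
        self.downstream.startElement(name, AttributesImpl(merged))

    def characters(self, content):
        if self.capturing:
            self.captured.append(content)  # attribute text, not output text
        else:
            self.downstream.characters(content)

    def endElement(self, name):
        if name == MAGIC:
            self.capturing = False
            self.pending.update(ATTR_PAIR.findall("".join(self.captured)))
            return
        self.downstream.endElement(name)


def emit_example():
    """Replay the events for K<class="thang" level="3">B<shazbot>,
    with K aliased to SuperMagicName."""
    out = io.StringIO()
    f = AttributeCarrierFilter(XMLGenerator(out))
    f.startElement(MAGIC, AttributesImpl({}))
    f.characters('class="thang" level="3"')
    f.endElement(MAGIC)
    f.startElement("B", AttributesImpl({}))
    f.characters("shazbot")
    f.endElement("B")
    return out.getvalue()
```

Running emit_example() gives <B class="thang" level="3">shazbot</B>. Because the filter keeps a single pending attribute set, two magic codes in a row simply merge, with later values winning.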


I realize that these are all "problems" not with Pod itself, but with the
attempt to use Pod as a shorthand for XML.  But like I say, if
there's some way to kill many birds with one stone (without requiring that
stone to be the size of Ireland, be in hyperspace, and/or be made of
neutronium), it'd be nice to do, so that we could spread around the
numminess of Pod!

Thoughts, anyone?

--
Sean M. Burke    [EMAIL PROTECTED]    http://www.spinn.net/~sburke/
