This is an idea originally from 2010-04-08 that I just never got
around to publishing until now.

Now that I’ve written the below, I’m tempted to try to hack this
together tonight, but I think I should probably sleep instead, and
maybe do this on the weekend.

Motivation
----------

This week I’ve been working on a bibliographic website, which, among
other things, renders citations into HTML.  As a result, I’m writing a
lot of HAML templates that say things like:

    - if @publication.booktitle?
      In #{@publication.booktitle}.
    - if @publication.month?
      #{@publication.month} #{@publication.year}.

The general pattern here is that there are one or more properties that
need to be present, and if they’re present, we format them together,
along with some other window dressing intended to format them
properly.  But this involves a lot of very local duplication in the
code, and even though that duplication is local, it is still
error-prone; consider what happens in the above case if the year is
missing.

Now, aside from the question of whether there are already existing
BibTeX HTML formatters for Rails, this is an interesting kind of
problem to solve.

Introducing the solution: an example
------------------------------------

So I remembered this thing I’d written up in a notebook a couple of
years ago, under the name "Backtracking HTML Templates", which makes
that kind of thing very simple, and eliminates the duplication.
Here’s a presentation by example:

    In $booktitle.<;> $month $year.<;>

This template consists of three "transactions", separated by `<;>`.
The first one concatenates "In ", the value of the variable
`booktitle`, and ".".  If it succeeds, that concatenation will be
emitted.  If it fails, nothing is emitted for that transaction.  It
will fail if any of the three things it’s concatenating fails.  Two of
them are literal strings, which always succeed, but the middle one is
a variable reference, which will only succeed if the variable is set.
The second transaction is similar, but it interpolates two variables
instead of one, so it fails unless *both* variables are set.  The
third transaction is the empty string after the second `<;>`, which is
boring but worth mentioning.
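
To make the semantics concrete, here is a minimal sketch in Python of
how these transactions could be evaluated.  Nothing here is an actual
implementation; the names `Fail`, `lit`, `var`, `transaction`, and
`render` are made up for illustration, and I'm assuming a variable
counts as "set" when it is present and not `None`.

```python
class Fail(Exception):
    """Signals that a piece of a transaction could not be produced."""


def lit(text):
    # Literal text always succeeds.
    return lambda env: text


def var(name):
    # A variable reference succeeds only if the variable is set
    # (assumption: present in the dict and not None).
    def run(env):
        if env.get(name) is None:
            raise Fail(name)
        return str(env[name])
    return run


def transaction(*parts):
    # Concatenate the parts; if any one of them fails, emit nothing.
    def run(env):
        try:
            return "".join(part(env) for part in parts)
        except Fail:
            return ""
    return run


def render(transactions, env):
    return "".join(t(env) for t in transactions)


# In $booktitle.<;> $month $year.<;>
citation = [
    transaction(lit("In "), var("booktitle"), lit(".")),
    transaction(lit(" "), var("month"), lit(" "), var("year"), lit(".")),
    transaction(),  # the empty third transaction
]

# The year is missing, so the whole second transaction is dropped.
print(render(citation, {"booktitle": "Proceedings", "month": "April"}))
# prints "In Proceedings."
```

Note that the missing year suppresses the month too, which is exactly
the case the HAML version gets wrong.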

A longer example
----------------

This shows off the full feature set of this templating language,
albeit in a fairly artificial setting:

    <!DOCTYPE html>
    <html>
      <head><title><{>$title - rabujogopo.com<|>RABUJOGOPO<}></title></head>
    <body><;>
      <h1>$title</h1><;>

      <p> Hello<{>, $fullname<|>, $firstname $lastname<;><}>. <;> You
          have arrived from a search engine seeking $searchterms. <;>
          This article is tagged as 
          <{><@tags><a href="/tag/$tagname">$tagname</a><,> <}>.
          <|> This article is not tagged.<;> 
          This article is important. <if $importance == high> <;>
      </p>
      <{>
        <@sections>
        <h2>$title</h2> <!-- the section title -->
        <@contents>
        <if $type == paragraph>
          <p>$text</p>
        <|>
          <@subsections>
          <h3>$title</h3>
          <@contents>
          <p>$p</p>
      <}>
    </body></html>

### Informal explanation ###

This contains literal text, `$var`, `<;>`, `<{><}>`, `<|>`, `<if
...>`, `<@var>`, and `<,>`, which are the entire feature set of the
language.  I’ve already explained literal text, `$var`, `<;>`, and
concatenation.

#### Alternation: `<|>` and `<{><}>` ####

The `<;>` is used to isolate a transaction, so that if it fails, you
just get an empty string instead of, say, an error-message page.  It’s
just syntactic sugar for the more general alternation construct
written with `<|>`, though.  This template:

    $foo bar<;>

could be written like this, and mean exactly the same thing:

    <{>$foo bar<|><}>

The `<|>` makes an *alternation* of two transactions.  If the first
transaction succeeds, the alternation yields the result of the first
transaction; if it fails, the alternation runs the other transaction
and yields its result.  In this case, the second transaction is the
empty string, which will always succeed.  The `<{><}>` syntactically
delimit the reach of the `<|>` operator, so that it doesn’t suck up
your entire document as its operands.

But your second transaction could be something more interesting than
an empty string.  For example, it could explain that some piece of
data was missing.

    <{>By $author.<|>Author unknown.<}>

So, you see, in the longer example template above, the page title has
a fallback of “RABUJOGOPO”.  And we greet the user by the variable
`fullname` if we have it, otherwise `firstname lastname`, or otherwise
just with “Hello.”
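
Here is a rough Python sketch of alternation, under the assumption
that failure is modeled as an exception.  The helper names (`Fail`,
`lit`, `var`, `seq`, `alt`, `isolate`) are hypothetical, not part of
any existing implementation.

```python
class Fail(Exception):
    """Signals that a piece of a template could not be produced."""


def lit(text):
    return lambda env: text


def var(name):
    def run(env):
        if env.get(name) is None:
            raise Fail(name)
        return str(env[name])
    return run


def seq(*parts):
    # Concatenation: fails if any part fails.
    return lambda env: "".join(part(env) for part in parts)


def alt(first, second):
    # <{>first<|>second<}>: try the first; on failure, run the second.
    def run(env):
        try:
            return first(env)
        except Fail:
            return second(env)
    return run


def isolate(node):
    # X<;> is sugar for <{>X<|><}>.
    return alt(node, lit(""))


# Hello<{>, $fullname<|>, $firstname $lastname<;><}>.
greeting = seq(
    lit("Hello"),
    isolate(alt(
        seq(lit(", "), var("fullname")),
        seq(lit(", "), var("firstname"), lit(" "), var("lastname")),
    )),
    lit("."),
)

print(greeting({"firstname": "Ada", "lastname": "Lovelace"}))
# prints "Hello, Ada Lovelace."
```

With only `firstname` set, both alternatives fail and the `<;>` falls
back to the empty string, giving just "Hello."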

#### `<if condition>` ####

`<if condition>` succeeds and produces the empty string if `condition`
is true, and fails otherwise.  Because of how failure works, you can
put it before, after, or in the middle of the text you want it to
control.  So far I’ve only thought about string-equality conditions.

So, in the example above, the sentence “This article is important.” is
only emitted if the variable `importance` has the value `high`,
because otherwise the `<if>` fails, failing the whole transaction
containing that sentence.
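
A small Python sketch of why the position of the `<if>` doesn't
matter, assuming failure is an exception that aborts the enclosing
transaction.  The names `Fail`, `lit`, `if_eq`, and `transaction` are
invented for this illustration.

```python
class Fail(Exception):
    """Signals that a piece of a transaction could not be produced."""


def lit(text):
    return lambda env: text


def if_eq(name, expected):
    # <if $name == expected>: yields "" when true, fails otherwise.
    def run(env):
        if env.get(name) != expected:
            raise Fail(name)
        return ""
    return run


def transaction(*parts):
    # Any failing part discards the whole transaction's output.
    def run(env):
        try:
            return "".join(part(env) for part in parts)
        except Fail:
            return ""
    return run


sentence = lit("This article is important.")
check = if_eq("importance", "high")

# The condition can come after or before the text it controls.
after = transaction(sentence, check)
before = transaction(check, sentence)

for env in ({"importance": "high"}, {"importance": "low"}, {}):
    assert after(env) == before(env)
```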

#### Iteration: `<@var>` and `<,>` ####

`<@var>` begins a loop.  The variable named needs to name a list of
dicts.  If the variable doesn’t exist or is an empty list, the
transaction fails.  Otherwise, the text that follows `<@var>` is
evaluated once for each dict in the list, with the names from the dict
added to the local namespace, and the resulting values are
concatenated.  If any of the iterations of the loop fails, the whole
loop fails.

(This namespace thing is the one thing I’m not sure about here;
merging the global namespace with the per-iteration namespace means
that both human readers and the compiler have to guess which scope a
given variable refers to.)

Because the loop fails if it executes zero times, you can use `<|>` to
provide an alternative to display in the case where a collection came
up empty, as in the “This article is not tagged.” example above.

`<,>` is an optional part of the loop construct.  `<@var>a<,>b`
evaluates `ab` once for each item in the list *except the last*, and
for the last item, evaluates only `a`.

So, in the above, we actually have loops nested four deep, with
sections containing contents, which may contain either text or
subsections, which contain further contents.  And in the loop over
`<@tags>`, the links to the individual tags are separated by spaces,
but the last tag is not followed by a space, because the space sits
between the `<,>` and the closing `<}>`, so it is only a separator.
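
The loop rules can be sketched in Python roughly as follows.  This is
an illustration under my own assumptions: the helper names (`Fail`,
`lit`, `var`, `seq`, `alt`, `loop`) are hypothetical, and the
per-iteration namespace is simply the outer dict with the item's keys
merged over it.

```python
class Fail(Exception):
    """Signals that a piece of a template could not be produced."""


def lit(text):
    return lambda env: text


def var(name):
    def run(env):
        if env.get(name) is None:
            raise Fail(name)
        return str(env[name])
    return run


def seq(*parts):
    return lambda env: "".join(part(env) for part in parts)


def alt(first, second):
    def run(env):
        try:
            return first(env)
        except Fail:
            return second(env)
    return run


def loop(name, body, separator=None):
    # <@name>body<,>separator
    def run(env):
        items = env.get(name)
        if not items:  # missing variable or empty list: fail
            raise Fail(name)
        out = []
        for i, item in enumerate(items):
            scope = {**env, **item}  # merged namespace (an assumption)
            out.append(body(scope))
            if separator is not None and i < len(items) - 1:
                out.append(separator(scope))  # skipped for the last item
        return "".join(out)
    return run


link = seq(lit('<a href="/tag/'), var("tagname"), lit('">'),
           var("tagname"), lit("</a>"))
tag_line = alt(
    seq(lit("This article is tagged as "),
        loop("tags", link, lit(" ")), lit(".")),
    lit("This article is not tagged."),
)

print(tag_line({"tags": [{"tagname": "html"}, {"tagname": "ruby"}]}))
```

An exception raised inside any iteration propagates out of `loop`, so
one bad item fails the whole loop and the `<|>` alternative is used.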

Some notes on grammar
---------------------

For the most part, this is a relatively traditional infix grammar,
masquerading as a markup language.  `<{><}>` provide nothing more than
syntactic grouping; `<|>`, `<;>`, bare juxtaposition, and `<@var>` are
infix operators; and `<,>` can be thought of as an infix operator that
only makes sense within the right operand of `<@var>`.
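
Since the operators are just short tags, splitting a template into
tokens takes one regular expression.  The following Python tokenizer
is a sketch, not a finished design: the token names are invented, only
the `<if $var == value>` form of condition is recognized, and there is
no escaping mechanism.

```python
import re

TOKEN = re.compile(r"""
    (?P<op><[;|,{}]>)                # <;> <|> <,> <{> <}>
  | <@(?P<loop>\w+)>                 # <@var>
  | <if\s+\$(?P<if_var>\w+)\s*==\s*(?P<if_val>[^>\s]+)\s*>
  | \$(?P<var>\w+)                   # $var
""", re.VERBOSE)


def tokenize(template):
    """Yield ("text", s), ("var", name), ("op", char), ("loop", name),
    or ("if", name, value) tuples in document order."""
    pos = 0
    for m in TOKEN.finditer(template):
        if m.start() > pos:
            # Anything between operators is literal text, HTML included.
            yield ("text", template[pos:m.start()])
        if m.group("op"):
            yield ("op", m.group("op")[1])
        elif m.group("loop"):
            yield ("loop", m.group("loop"))
        elif m.group("if_var"):
            yield ("if", m.group("if_var"), m.group("if_val"))
        else:
            yield ("var", m.group("var"))
        pos = m.end()
    if pos < len(template):
        yield ("text", template[pos:])


for token in tokenize("In $booktitle.<;> $month $year.<;>"):
    print(token)
```

A precedence-climbing parser over this token stream would then apply
whatever binding order turns out to be sane.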

The precedence order I’ve implicitly used above, from tightest binding
to loosest:

* concatenation or juxtaposition
* `<|>`
* `<;>`
* `<@var>` and `<,>`

I don’t know if this is the best order, or even a sane one.

As it happens, all of juxtaposition, `<|>`, and `<;>` are associative,
which provides some liberty; but the looping construct is
not. `<{> a <@b> c <}> <@d> e` is not the same as `a <@b> <{> c <@d> e <}>`, 
because in the first case, `<@b>` and `<@d>` are separate iterations
that get concatenated, and in the second case, `<@d>` is nested inside
`<@b>`.

(It’s kind of nice to have a syntactically flat structure that’s
nevertheless capable of expressing complex things.)

In the above, I treated `<@var>` as right-associative: `<@a><@b>x` is
`<@a><{><@b>x<}>`, not `<{><@a><}><@b>x`.  That was just because it
made that particular example come out syntactically simpler, not
because of any deep analysis.

Variations
----------

The current syntax completely fails to take advantage of the most
desirable ASCII punctuation characters, which are “.”, “:”, “'”, and
“"”.  You could write `<.sections>` instead of `<@sections>`, and `<:
$importance == high >` instead of `<if $importance == high>`, thus
reducing the visual noise a bit.  However, `<if ...>` is more
understandable.  Maybe `<:if ...>` to avoid clashes with SGML.

Similarly, `<<` and `>>` might be better than `<{>` and `<}>`, which
look a bit unbalanced.  `>>` has the disadvantage that it could
legitimately occur in normal HTML, and both could occur in normal JS
or similar languages.  Also, they’re ambiguous when they’re used with
HTML tags adjacent to them.

With those two variations we would get:

    <!DOCTYPE html>
    <html>
      <head><title><<$title - rabujogopo.com<|>RABUJOGOPO>></title></head>
    <body><;>
      <h1>$title</h1><;>

      <p> Hello<<, $fullname<|>, $firstname $lastname<;>>>. <;> You
          have arrived from a search engine seeking $searchterms. <;>
          This article is tagged as 
          <<<.tags><a href="/tag/$tagname">$tagname</a><,> >>.
          <|> This article is not tagged.<;> 
          This article is important. <:$importance == high> <;>
      </p>
      <<
        <.sections>
        <h2>$title</h2> <!-- the section title -->
        <.contents>
        <:$type == paragraph>
          <p>$text</p>
        <|>
          <.subsections>
          <h3>$title</h3>
          <.contents>
          <p>$p</p>
      >>
    </body></html>

You need some kind of escaping mechanism in any case.

The loop-namespace problem bothers me.  One approach is to name the
loop variable, and extend `$var` to `$var.prop`:

        <@sec in $sections>
        <h2>$sec.title</h2>
        <@c in $sec.contents>
        <if $c.type == paragraph>
          <p>$c.text</p>
        <|>
          <@s in $c.subsections>
          <h3>$s.title</h3>
          <@p in $s.contents>
          <p>$p.p</p>
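
A Python sketch of how named loop variables and `$var.prop` lookups
might work.  The helpers (`Fail`, `lookup`, `lit`, `var`, `seq`,
`loop`) are hypothetical, and I'm assuming each loop binds exactly one
new name on top of the enclosing namespace.

```python
class Fail(Exception):
    """Signals that a piece of a template could not be produced."""


def lookup(env, path):
    # Resolve "sec.title" step by step; fail if any step is missing.
    value = env
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            raise Fail(path)
        value = value[key]
    return value


def lit(text):
    return lambda env: text


def var(path):
    return lambda env: str(lookup(env, path))


def seq(*parts):
    return lambda env: "".join(part(env) for part in parts)


def loop(name, path, body):
    # <@name in $path>body
    def run(env):
        items = lookup(env, path)
        if not items:
            raise Fail(path)
        # Only `name` is added, so every reference says which scope it means.
        return "".join(body({**env, name: item}) for item in items)
    return run


# <@sec in $sections><h2>$title: $sec.title</h2>
#   <@c in $sec.contents><p>$c.text</p>
page = loop("sec", "sections", seq(
    lit("<h2>"), var("title"), lit(": "), var("sec.title"), lit("</h2>"),
    loop("c", "sec.contents", seq(lit("<p>"), var("c.text"), lit("</p>"))),
))

data = {
    "title": "Page",
    "sections": [
        {"title": "Intro", "contents": [{"text": "hi"}, {"text": "there"}]},
    ],
}
print(page(data))
```

Here `$title` is unambiguously the page title and `$sec.title` the
section title, at the cost of a little more typing.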

Another approach might be to use Perl/BASIC/Ruby/CoffeeScript sigils
to distinguish variables from different scopes.  Ruby has
`@instance_variables` and `@@class_variables`.  If we only supported
two levels of scope (globals, and variables inside the innermost
loop), we could do the same thing, using separate sigils, like
`$.name` and `<@.name>`, for loop-locals:

        <@sections>
        <h2>$.title</h2> <!-- the section title -->
        <@.contents>
        <if $.type == paragraph>
          <p>$.text</p>
        <|>
          <@.subsections>
          <h3>$.title</h3>
          <@.contents>
          <p>$.p</p>
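
The two-scope idea could be sketched in Python like this, with every
node taking the global namespace and the innermost loop's dict as
separate arguments.  All names here are invented for illustration; a
leading `.` in a name stands for the `$.` or `<@.` sigil.

```python
class Fail(Exception):
    """Signals that a piece of a template could not be produced."""


def pick(name, global_ns, local_ns):
    # ".x" reads the innermost loop's dict; "x" reads the globals.
    scope = local_ns if name.startswith(".") else global_ns
    value = scope.get(name.lstrip("."))
    if value is None:
        raise Fail(name)
    return value


def lit(text):
    return lambda g, l: text


def var(name):
    return lambda g, l: str(pick(name, g, l))


def seq(*parts):
    return lambda g, l: "".join(part(g, l) for part in parts)


def loop(name, body):
    def run(g, l):
        items = pick(name, g, l)
        if not items:
            raise Fail(name)
        # Each item replaces the previous local scope entirely.
        return "".join(body(g, item) for item in items)
    return run


# <@sections><h2>$title: $.title</h2><@.contents><p>$.text</p>
page = loop("sections", seq(
    lit("<h2>"), var("title"), lit(": "), var(".title"), lit("</h2>"),
    loop(".contents", seq(lit("<p>"), var(".text"), lit("</p>"))),
))

data = {
    "title": "Page",
    "sections": [
        {"title": "Intro", "contents": [{"text": "hi"}, {"text": "there"}]},
    ],
}
print(page(data, {}))
```

Inside the inner loop, `$.title` would fail in this sketch, because
the section's dict is no longer reachable once the local scope has
been replaced.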

I think that if you want to have more than two active scopes, you
can’t rely on sigils; you end up having to count nested scopes, and
basically writing de Bruijn indices by hand, which is not an
acceptable user interface.  You’d need to use the `$c.text` approach I
suggested earlier.

I had at one point thought about making the data model simpler:
instead of a namespace being a mapping from names to either strings or
lists of namespaces, a namespace would be a mapping from names to
values, where a value was either a string or a list of values, like in
Scheme’s macro system.  So you’d iterate down lists in parallel,
hoping they were the same length and depth:

      <dl>
        <{> <@ $n $v>
          <dt>$n</dt>
          <dd>$v</dd>
        <}>
      </dl>
    <|>
      Empty.

Or:

    <p> <@ $word $synonym>
    <b>$word</b>: <{><@ $synonym>$synonym<,>, <}>
    </p>
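
As a rough Python sketch of that parallel iteration: the loop below
walks several lists in lockstep and binds one element of each per
iteration.  The helper names are hypothetical, and treating a length
mismatch as failure is my assumption, since the idea above only hopes
the lists line up.

```python
class Fail(Exception):
    """Signals that a piece of a template could not be produced."""


def lit(text):
    return lambda env: text


def var(name):
    # Succeeds only once the name is bound to a string, not a list.
    def run(env):
        if not isinstance(env.get(name), str):
            raise Fail(name)
        return env[name]
    return run


def seq(*parts):
    return lambda env: "".join(part(env) for part in parts)


def alt(first, second):
    def run(env):
        try:
            return first(env)
        except Fail:
            return second(env)
    return run


def parallel(names, body):
    # <@ $n $v>body: iterate down the named lists together.
    def run(env):
        columns = [env.get(name) for name in names]
        if any(not isinstance(col, list) for col in columns):
            raise Fail(names)
        if not columns[0] or len({len(col) for col in columns}) != 1:
            raise Fail(names)  # empty, or lengths differ (assumption)
        return "".join(
            body({**env, **dict(zip(names, row))}) for row in zip(*columns)
        )
    return run


entries = alt(
    seq(lit("<dl>"),
        parallel(["n", "v"], seq(lit("<dt>"), var("n"), lit("</dt><dd>"),
                                 var("v"), lit("</dd>"))),
        lit("</dl>")),
    lit("Empty."),
)

print(entries({"n": ["a", "b"], "v": ["1", "2"]}))
```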

In Scheme, you only have to mention the variable names once, instead
of twice; you stick a `...` after the structure containing them to
indicate at which syntactic level the repetition is supposed to occur.
If you did the same thing here, using `<[><]>` to indicate repetition,
you’d get this:

    <[>
    <p><b>$word</b>: <[>$synonym<,>, <]></p>
    <]>

That gives the same kind of DRY implicit-DWIM feeling to iteration
that `<|>` gives to conditionals, and with the sigil approach, you
could even mix global variables with loop variables.  Still, I’m not
convinced that it’s better, particularly since it forces what I think
of as a fairly inflexible data model.