This is an idea originally from 2010-04-08 that I just never got around to publishing until now.
Now that I’ve written the below, I’m tempted to try to hack this together tonight, but I think I should probably sleep instead, and maybe do this on the weekend.

Motivation
----------

This week I’ve been working on a bibliographic website, which, among other things, renders citations into HTML. As a result, I’m writing a lot of HAML templates that say things like:

    - if @publication.booktitle?
      In #{@publication.booktitle}.
    - if @publication.month?
      #{@publication.month} #{@publication.year}.

The general pattern here is that there are one or more properties that need to be present, and if they’re present, we format them together, along with some other window dressing intended to format them properly.

But this involves a lot of very local duplication in the code, and even though that duplication is local, it is still error-prone; consider what happens in the above case if the year is missing.

Now, aside from the question of whether there are already existing BibTeX HTML formatters for Rails, this is an interesting kind of problem to solve.

Introducing the solution: an example
------------------------------------

So I remembered this thing I’d written up in a notebook a couple of years ago, under the name "Backtracking HTML Templates", which makes that kind of thing very simple, and eliminates the duplication. Here’s a presentation by example:

    In $booktitle.<;>
    $month $year.<;>

This template consists of three "transactions", separated by `<;>`. The first one concatenates "In ", the value of the variable `booktitle`, and ".". If it succeeds, that concatenation will be emitted. If it fails, nothing is emitted for that transaction. It will fail if any of the three things it’s concatenating fail. Two of them are literal strings, which always succeed, but the middle one is a variable reference, which will only succeed if the variable is set.
The second transaction is similar, but it interpolates two variables instead of one, so it fails unless *both* variables are set. The third transaction is the empty string after the second `<;>`, which is boring but worth mentioning.

A longer example
----------------

This shows off the full feature set of this templating language, albeit in a fairly artificial setting:

    <!DOCTYPE html>
    <html>
    <head><title><{>$title - rabujogopo.com<|>RABUJOGOPO<}></title></head>
    <body><;>
    <h1>$title</h1><;>
    <p>
      Hello<{>, $fullname<|>, $firstname $lastname<;><}>. <;>
      You have arrived from a search engine seeking $searchterms. <;>
      This article is tagged as <{><@tags><a href="/tag/$tagname">$tagname</a><,> <}>.
      <|> This article is not tagged.<;>
      This article is important. <if $importance == high> <;>
    </p>
    <{>
    <@sections>
      <h2>$title</h2>   <!-- the section title -->
      <@contents>
        <if $type == paragraph> <p>$text</p>
        <|> <@subsections>
              <h3>$title</h3>
              <@contents> <p>$p</p>
    <}>
    </body></html>

### Informal explanation ###

This contains literal text, `$var`, `<;>`, `<{><}>`, `<|>`, `<if ...>`, `<@var>`, and `<,>`, which are the entire feature set of the language. I’ve already explained literal text, `$var`, `<;>`, and concatenation.

#### Alternation: `<|>` and `<{><}>` ####

The `<;>` is used to isolate a transaction, so that if it fails, you just get an empty string instead of, say, an error-message page. It’s just syntactic sugar for the more general alternation construct written with `<|>`, though. This template:

    $foo bar<;>

could be written like this, and mean exactly the same thing:

    <{>$foo bar<|><}>

The `<|>` makes an *alternation* of two transactions. If the first transaction succeeds, the alternation yields the result of the first transaction; if it fails, the alternation runs the other transaction and yields its result. In this case, the second transaction is the empty string, which will always succeed.
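
The success-or-failure behaviour described so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an implementation of the language: a transaction is modelled as a function from a variable dict to a string, with `None` standing for failure, and the helper names `lit`, `var`, `cat`, `alt`, and `isolate` are made up for this sketch. It also assumes that a variable counts as "set" whenever it is present as a string.

```python
from typing import Callable, Mapping, Optional

# Assumption: a transaction maps an environment to a string,
# or to None when it fails.
Env = Mapping[str, object]
Transaction = Callable[[Env], Optional[str]]


def lit(text: str) -> Transaction:
    """Literal text: always succeeds."""
    return lambda env: text


def var(name: str) -> Transaction:
    """$name: succeeds only if the variable is present as a string."""
    def run(env: Env) -> Optional[str]:
        value = env.get(name)
        return value if isinstance(value, str) else None
    return run


def cat(*parts: Transaction) -> Transaction:
    """Juxtaposition: fails as soon as any part fails."""
    def run(env: Env) -> Optional[str]:
        pieces = []
        for part in parts:
            piece = part(env)
            if piece is None:
                return None
            pieces.append(piece)
        return "".join(pieces)
    return run


def alt(first: Transaction, second: Transaction) -> Transaction:
    """<{>first<|>second<}>: use second only if first fails."""
    def run(env: Env) -> Optional[str]:
        result = first(env)
        return result if result is not None else second(env)
    return run


def isolate(body: Transaction) -> Transaction:
    """body<;> treated as sugar for <{>body<|><}>."""
    return alt(body, lit(""))


# Roughly "In $booktitle.<;> $month $year.<;>", with the separating
# space placed at the start of the second transaction.
citation = cat(
    isolate(cat(lit("In "), var("booktitle"), lit("."))),
    isolate(cat(lit(" "), var("month"), lit(" "), var("year"), lit("."))),
)
```

Calling `citation({"booktitle": "Proceedings of Foo", "month": "April"})` drops the whole month-and-year transaction because `year` is missing, which is the case the HAML version gets wrong.
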
The `<{><}>` syntactically delimit the reach of the `<|>` operator, so that it doesn’t suck up your entire document as its operands.

But your second transaction could be something more interesting than an empty string. For example, it could explain that some piece of data was missing.

    <{>By $author.<|>Author unknown.<}>

So, you see, in the longer example template above, the page title has a fallback of “RABUJOGOPO”. And we greet the user by the variable `fullname` if we have it, otherwise `firstname lastname`, or otherwise just with “Hello.”

#### `<if condition>` ####

`<if condition>` succeeds and produces the empty string if `condition` is true, and fails otherwise. Because of how failure works, you can put it before, after, or in the middle of the text you want it to control. So far I’ve only thought about string-equality conditions.

So, in the example above, the sentence “This article is important.” is only emitted if the variable `importance` has the value `high`, because otherwise the `<if>` fails, failing the whole transaction containing that sentence.

#### Iteration: `<@var>` and `<,>` ####

`<@var>` begins a loop. The variable named needs to name a list of dicts. If the variable doesn’t exist or is an empty list, the transaction fails. Otherwise, the text that follows `<@var>` is evaluated once for each of the dicts in the list, with the names from that dict added to the local namespace, and the resulting values are concatenated. If any of the iterations of the loop fails, the whole loop fails.

(This namespace thing is the one thing I’m not sure about here; merging the global namespace with the per-iteration namespace means that both human readers and the compiler have to guess which scope a given variable refers to.)

Because the loop fails if it executes zero times, you can use `<|>` to provide an alternative to display in the case where a collection came up empty, as in the “This article is not tagged.” example above.

`<,>` is an optional part of the loop construct. `<@var>a<,>b` evaluates `ab` once for each item in the list *except the last*, and for the last item, evaluates only `a`.

So, in the above, we actually have loops nested four deep, with sections containing contents, which may contain either text or subsections, which contain further contents. And in the loop over `<@tags>`, the links to the individual tags are separated by spaces, but the last tag is not followed by a space, because the space is between the `<,>` and the closing `<}>`.

Some notes on grammar
---------------------

For the most part, this is a relatively traditional infix grammar, masquerading as a markup language. `<{><}>` provide nothing more than syntactic grouping; `<|>`, `<;>`, bare juxtaposition, and `<@var>` are infix operators; and `<,>` can be thought of as an infix operator that only makes sense within the right operand of `<@var>`.

The precedence order I’ve implicitly used above, from tightest binding to loosest:

* concatenation or juxtaposition
* `<|>`
* `<;>`
* `<@var>` and `<,>`

I don’t know if this is the best order, or even a sane one.

As it happens, all of juxtaposition, `<|>`, and `<;>` are associative, which provides some liberty; but the looping construct is not. `<{> a <@b> c <}> <@d> e` is not the same as `a <@b> <{> c <@d> e <}>`, because in the first case, `<@b>` and `<@d>` are separate iterations that get concatenated, and in the second case, `<@d>` is nested inside `<@b>`. (It’s kind of nice to have a syntactically flat structure that’s nevertheless capable of expressing complex things.)

In the above, I treated `<@var>` as right-associative: `<@a><@b>x` is `<@a><{><@b>x<}>`, not `<{><@a><}><@b>x`. That was just because it made that particular example come out syntactically simpler, not because of any deep analysis.
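
The loop and `<if>` semantics described above can be sketched the same way in Python. This is a hedged illustration only: transactions are again modelled as functions returning a string or `None` for failure, all helper names (`lit`, `var`, `cat`, `alt`, `when_equal`, `loop`) are invented for the sketch, and the per-iteration namespace is assumed to be the enclosing namespace with the item's names layered on top, which is exactly the merging behaviour the text is unsure about. The small helpers are defined here so the block stands on its own.

```python
from typing import Callable, Mapping, Optional

Env = Mapping[str, object]
Transaction = Callable[[Env], Optional[str]]  # None means failure


def lit(text: str) -> Transaction:
    return lambda env: text


def var(name: str) -> Transaction:
    def run(env: Env) -> Optional[str]:
        value = env.get(name)
        return value if isinstance(value, str) else None
    return run


def cat(*parts: Transaction) -> Transaction:
    def run(env: Env) -> Optional[str]:
        pieces = [part(env) for part in parts]
        return None if None in pieces else "".join(pieces)
    return run


def alt(first: Transaction, second: Transaction) -> Transaction:
    def run(env: Env) -> Optional[str]:
        result = first(env)
        return result if result is not None else second(env)
    return run


def when_equal(name: str, expected: str) -> Transaction:
    """<if $name == expected>: empty string if true, failure otherwise."""
    return lambda env: "" if env.get(name) == expected else None


def loop(name: str, body: Transaction,
         separator: Optional[Transaction] = None) -> Transaction:
    """<@name>body<,>separator: fails on a missing or empty list."""
    def run(env: Env) -> Optional[str]:
        items = env.get(name)
        if not isinstance(items, list) or not items:
            return None
        pieces = []
        for index, item in enumerate(items):
            scope = {**env, **item}  # assumption: item names shadow outer ones
            piece = body(scope)
            if piece is None:
                return None  # one failed iteration fails the whole loop
            pieces.append(piece)
            if separator is not None and index < len(items) - 1:
                sep = separator(scope)  # skipped for the last item
                if sep is None:
                    return None
                pieces.append(sep)
        return "".join(pieces)
    return run


# The tags sentence from the longer example, with its fallback.
tag_link = cat(lit('<a href="/tag/'), var("tagname"), lit('">'),
               var("tagname"), lit("</a>"))
tags_line = alt(
    cat(lit("This article is tagged as "),
        loop("tags", tag_link, lit(" ")), lit(".")),
    lit("This article is not tagged."),
)

# "This article is important. <if $importance == high>"
important = cat(lit("This article is important."),
                when_equal("importance", "high"))
```

With `{"tags": [{"tagname": "a"}, {"tagname": "b"}]}` the two links come out separated by one space and followed directly by the period; with no tags, or an empty list, the loop fails and the fallback sentence is used instead.
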

Variations
----------

The current syntax completely fails to take advantage of the most desirable ASCII punctuation characters, which are “.”, “:”, “'”, and “"”. You could write `<.sections>` instead of `<@sections>`, and `<: $importance == high >` instead of `<if $importance == high>`, thus reducing the visual noise a bit. However, `<if ...>` is more understandable. Maybe `<:if ...>` would do, to avoid clashes with SGML.

Similarly, `<<` and `>>` might be better than `<{>` and `<}>`, which look a bit unbalanced. `>>` has the disadvantage that it could legitimately occur in normal HTML, and both could occur in normal JS or similar languages. Also, they’re ambiguous when they’re used with HTML tags adjacent to them.

With those two variations we would get:

    <!DOCTYPE html>
    <html>
    <head><title><<$title - rabujogopo.com<|>RABUJOGOPO>></title></head>
    <body><;>
    <h1>$title</h1><;>
    <p>
      Hello<<, $fullname<|>, $firstname $lastname<;>>>. <;>
      You have arrived from a search engine seeking $searchterms. <;>
      This article is tagged as <<<.tags><a href="/tag/$tagname">$tagname</a><,> >>.
      <|> This article is not tagged.<;>
      This article is important. <:$importance == high> <;>
    </p>
    <<
    <.sections>
      <h2>$title</h2>   <!-- the section title -->
      <.contents>
        <:$type == paragraph> <p>$text</p>
        <|> <.subsections>
              <h3>$title</h3>
              <.contents> <p>$p</p>
    >>
    </body></html>

You need some kind of escaping mechanism in any case.

The loop-namespace problem bothers me. One approach is to name the loop variable, and extend `$var` to `$var.prop`:

    <@sec in $sections>
      <h2>$sec.title</h2>
      <@c in $sec.contents>
        <if $c.type == paragraph> <p>$c.text</p>
        <|> <@s in $c.subsections>
              <h3>$s.title</h3>
              <@p in $s.contents> <p>$p.p</p>

Another approach might be to use Perl/BASIC/Ruby/CoffeeScript sigils to distinguish variables from different scopes. Ruby has `@instance_variables` and `@@class_variables`.
If we only supported two levels of scope (globals, and variables inside the innermost loop) we could do the same thing, using separate sigils, like `$.` and `<@. >`, for loop-locals:

    <@sections>
      <h2>$.title</h2>   <!-- the section title -->
      <@.contents>
        <if $.type == paragraph> <p>$.text</p>
        <|> <@.subsections>
              <h3>$.title</h3>
              <@.contents> <p>$.p</p>

I think that if you want to have more than two active scopes, you can’t rely on sigils; you end up having to count nested scopes, and basically writing de Bruijn indices by hand, which is not an acceptable user interface. You’d need to use the `$c.text` approach I suggested earlier.

I had at one point thought about making the data model simpler: instead of a namespace being a mapping from names to either strings or lists of namespaces, a namespace would be a mapping from names to values, where a value was either a string or a list of values, like in Scheme’s macro system. So you’d iterate down lists in parallel, hoping they were the same length and depth:

    <dl>
    <{> <@ $n $v>
      <dt>$n</dt> <dd>$v</dd>
    <}>
    </dl>
    <|> Empty.

Or:

    <p>
    <@ $word $synonym>
      <b>$word</b>: <{><@ $synonym>$synonym<,>, <}>
    </p>

In Scheme, you only have to mention the variable names once, instead of twice; you stick a `...` after the structure containing them to indicate at which syntactic level the repetition is supposed to occur. If you did the same thing here, using `<[><]>` to indicate repetition, you’d get this:

    <[>
    <p><b>$word</b>: <[>$synonym<,>, <]></p>
    <]>

That gives the same kind of DRY implicit-DWIM feeling to iteration that `<|>` gives to conditionals, and with the sigil approach, you could even mix global variables with loop variables. Still, I’m not convinced that it’s better, particularly since it forces what I think of as a fairly inflexible data model.