I wrote this incomplete essay (design document?) in January. It describes an unimplemented technique for making server-side web applications more transparent, inspired by TinyTemplate, TAL, Nevow, and HTML::Template. I wanted to finish it up before sending it out to the world, but it's been three months now, so I thought I'd better send it out soon.
Lossless template expansion
===========================

So suppose we tag certain elements in HTML with an attribute that marks them for replacement with a value: t:src="title", or what have you. Then we can write

    ...<head><title t:src="title">Sample title</title></head><body><h1 t:src="title">Sample title</h1>...

and have our template be a well-formed, valid XHTML document, with "Sample title" standing in for the actual value of the "title" variable. Cool, huh? This is the idea behind Tiny Template. (The t: stands for an XML namespace.)

Now suppose that we leave these attributes in the rendered HTML, rather than stripping them out upon rendering (as Tiny Template does). Now our formatted HTML document contains all the semantic information about the variables that was in the original template --- in fact, you can use the formatted HTML document as a template, and it's semantically equal to the original template. The content and the formatting have become completely orthogonal and the operation of combining them has become lossless!

The sacrifice is that some validators will complain that there are attributes from some foreign namespace in your HTML, but web browsers are guaranteed not to care.

Sequences and attributes
------------------------

Extending this to cover the necessities of a practical HTML templating language is a little tricky. Tiny Template allows the code generating the template replacement values to generate different kinds of values that have different effects; by default, you generate a content-replacement value, but there are attribute-replacement values and repetition values as well. In order to be able to extract these values from the rendered template, the information about which attribute or attributes are being replaced, and what is being repeated, must live in the template itself.
I suggest the following solutions:

- for sequences:
  - as in HTML::Template, a page context is a set of name-value pairs, where each value is either a string or a sequence of page contexts; sequences of strings are not allowed.
  - an element that corresponds to a sequence of page contexts is repeated once per page context, and is tagged t:for="varname" rather than t:src="varname". Elements within it are applied to the page contexts within the sequence rather than the context the sequence belongs to.
  - in the interest of readability, whitespace following that element is also repeated.
- for attributes:
  - define a t:dest attribute that names the attribute being replaced, e.g. <a href="http://example.com/" t:src="url" t:dest="href">See other.</a>.
  - define t:src2, t:dest2, and so on up through 4 or so, to handle multiple replacements in the same element, e.g. <a t:src="linktitle" t:src2="url" t:dest2="href" href="http://example.com/">Example.com</a>.

So now we have t:src, t:dest, their numbered mates, and t:for. So you can write, for example:

    <table><tr><th>Name</th><th>Phone</th></tr>
    <tr t:for="person"><td t:src="name">John Smith</td><td t:src="phone">555-1212</td></tr>
    <tr t:for="person"><td t:src="name">John Smith</td><td t:src="phone">555-1212</td></tr>
    </table>

and have it render nicely.

Now we have a relatively full *lossless* HTML templating system, as long as none of your sequences are empty; it's only missing conditionals, and conditionals are incompatible with losslessness anyway. (You can achieve the effect with an empty sequence.)

Round-trip editing
------------------

Losslessness means that you could, for example, download the rendered page, edit it, and re-upload it, and have the guy at the other end understand the structure well enough to update both the template and some back-end database so that the page, when next rendered, will look like the version you uploaded; so you can do WYSIWYG editing of database contents with, say, Dreamweaver.
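To make the "understand the structure" step concrete, here is a minimal sketch of reading a page context back out of a rendered page. It is not an existing implementation: the namespace URI, the function name extract_context, and the second person's data are all made up for illustration, values are treated as plain strings rather than document fragments, and t:src2/t:dest2 are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the t: prefix; the essay does not name one.
T_NS = "urn:example:lossless-template"
T_SRC = "{%s}src" % T_NS
T_DEST = "{%s}dest" % T_NS
T_FOR = "{%s}for" % T_NS


def extract_context(element, context=None):
    """Read a page context (name -> string or list of contexts) from a page.

    Only the descendants of `element` are examined, not `element` itself.
    """
    if context is None:
        context = {}
    for child in element:
        sequence = child.get(T_FOR)
        if sequence is not None:
            # Each repeated element contributes one nested page context.
            context.setdefault(sequence, []).append(extract_context(child))
            continue
        name = child.get(T_SRC)
        dest = child.get(T_DEST)
        if name is not None and dest is not None:
            # With t:dest, the value lives in the named attribute.
            context[name] = child.get(dest, "")
        elif name is not None:
            # Without t:dest, the value is the element's content.
            context[name] = child.text or ""
            continue
        extract_context(child, context)
    return context


PAGE = """<table xmlns:t="%s"><tr><th>Name</th><th>Phone</th></tr>
<tr t:for="person"><td t:src="name">John Smith</td><td t:src="phone">555-1212</td></tr>
<tr t:for="person"><td t:src="name">Mary Major</td><td t:src="phone">555-1313</td></tr>
</table>""" % T_NS

print(extract_context(ET.fromstring(PAGE)))
```

A server receiving an edited upload could compare this extracted context with the stored one to decide which back-end values changed; how it would detect template changes is left open here.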
DHTML user interface
--------------------

You could imagine DHTML that helps with the editing:

- click on a named field to edit its value
- each sequence item also has
  - a "+" symbol which duplicates it when clicked
  - a "-" symbol which deletes it when clicked
  - "<" and ">" symbols which rearrange the sequence
- clicking on any HTML element brings up an edit box for its contents, whether it's database-backed or not; this lets you edit your template as easily as you can edit the data poured into it. The edit box includes a button to click to move the editing focus to its parent element.
- a button to save the edited version as a new page, rather than updating the original (as a bookmarklet, this could also work with arbitrary HTML pages from other sources entirely)

HTML as canonical format
------------------------

Losslessness also means there doesn't need to be a back-end database at all; you can simply store the "rendered" form of everything as HTML, and parse the HTML when you want to extract the actual data. To make this practical, we need to solve five major problems:

- list/detail views: embedding data from one page into another
- editing a template that applies to many pages (each representing a database record of some kind)
- querying the collection of data
- handling syntactic problems (like ill-formed XML, mismatched sequence items), semantically dangerous operations (like deleting a field), and semantic errors (like a field with no value).
- an expression language for reports

I will eventually get around to explaining how to solve these.

Document fragments
------------------

Notice that the values of all variables so far are either sequences, or X(HT)ML document fragments, rather than strings. A document fragment is like an XML document, except that it doesn't contain PIs, XMLPIs, or the tags around the root element.
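One way to handle document fragments with an ordinary XML parser is to wrap them in a throwaway root element and strip it again on output. The sketch below assumes that approach; the wrapper name "fragment" and both function names are invented for the example.

```python
import xml.etree.ElementTree as ET


def parse_fragment(text):
    """Parse a document fragment by wrapping it in a throwaway root element."""
    return ET.fromstring("<fragment>%s</fragment>" % text)


def fragment_text(wrapper):
    """Serialize everything inside the wrapper, without the wrapper's own tags."""
    return (wrapper.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in wrapper
    )


fragment = parse_fragment("Hello, <b>world</b>!")
print(fragment_text(fragment))
```

A fragment with no elements at all, such as a bare string, passes through the same two functions unchanged, which is why strings can be treated as a special case of fragments.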
URLs as variable names
----------------------

You can use a URL in place of a variable name wherever a variable name is required: in t:src and t:for, so far. The value of the URL is a (possibly cached) resource GETted from it, either as an XML document-as-document-fragment if possible, or as a string of text.

The value of a URL, without a fragment identifier (you know, #sec1), being used as a variable in t:src is the entire resource; if there is a fragment identifier that identifies some particular element, the value of the URL is the entire element, including its enclosing tags.

If the URL is being used with t:for instead of t:src, the value of a URL with no fragment identifier is a sequence containing a single page context, containing the contents of that page. (In the context of t:for, the value of a URL with a fragment identifier is irrelevant because it's not useful; except, as explained later, when the fragment id is actually a variable name.)

This makes it possible to put old content wine in new template bottles.

I'm not sure what to do if someone edits the content of a variable from another document. Should we try to propagate the edit (and if so, with what credentials?) or should we just discard the change, possibly with some warning?

Variable names as fragment identifiers in URLs
----------------------------------------------

You can use a variable name inside of the document as a fragment identifier in the URL. If we were being pure, foo.html#walnuts should refer to the element whose id is walnuts, not whose t:src is walnuts, since there might be several elements with the same t:src. But we're not being pure.

So if we have "<span id='a' t:src='b'>wiggle</span>" in c.html, then (for t:src) the value of c.html#a is "<span id='a'>wiggle</span>", and the value of c.html#b is "wiggle".
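The c.html example can be sketched as a lookup over an already-parsed document. This is only an illustration: the namespace URI and function names are invented, fetching the URL is left out, and trying element ids before variable names is my assumption, since the text does not say which wins when both match.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the t: prefix; the essay does not name one.
T_NS = "urn:example:lossless-template"
T_SRC = "{%s}src" % T_NS
ET.register_namespace("t", T_NS)


def content_of(el):
    """Serialize an element's content (text plus children) as a fragment."""
    return (el.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in el
    )


def resolve_for_src(root, fragid):
    """Value of doc#fragid when used in t:src, or None if nothing matches."""
    for el in root.iter():
        if el.get("id") == fragid:
            # An id names the whole element, tags included. As in the
            # c.html example, template attributes are left out of the value.
            plain = {k: v for k, v in el.attrib.items()
                     if not k.startswith("{%s}" % T_NS)}
            whole = ET.Element(el.tag, plain)
            whole.text = el.text
            whole.extend(list(el))
            return ET.tostring(whole, encoding="unicode")
    for el in root.iter():
        if el.get(T_SRC) == fragid:
            # A variable name yields only the content of the first match.
            return content_of(el)
    return None


C_HTML = '<p xmlns:t="%s">x <span id="a" t:src="b">wiggle</span> y</p>' % T_NS
doc = ET.fromstring(C_HTML)
print(resolve_for_src(doc, "a"))
print(resolve_for_src(doc, "b"))
```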
For t:for, normally the fragment identifier identifies another t:for in the source documents, and its value is the sequence of page contexts in the rendered output of that original t:for. (I don't think it makes sense to refer to a t:src variable in a t:for, whether that variable is in the same page or not.)

This makes it possible to reformat small bits of other documents.

XPath expressions as fragment identifiers (strike this? What's the point?)
-----------------------------------------

You can use an XPath expression preceded by a slash as the fragment identifier instead, to facilitate access to data that isn't actually stored in a defined field.

An XPath expression evaluates to a set of nodes, rather than a particular node; I don't know that there's a really aesthetically pleasing way to handle sets of non-unity cardinality in this case in general. Perhaps the nodes could be concatenated when used in t:src, but wrapped appropriately for a t:for.

By and large I don't think this is useful because it's an insufficiently effective way of extracting interesting parts from other people's documents, and an insufficiently simple way of extracting interesting parts from your own documents.

Template overriding
-------------------

To edit a template that applies to many pages, we have a couple of possibilities. We could either have some kind of global search-and-replace system that lets you select exactly which pages you want to edit the templates of, whenever you change a template, or we could make it possible to have many pages refer to a template that's stored in a single other page. I choose the second choice.

The low-level mechanism for this is another attribute, t:template, whose value is a URL from which to get the template for the element it's attached to.
So for this fragment:

    John <b t:src="surname" t:template="#position" id="crap">Smith</b> <font color="#f70" t:src="position">President</font>

we emit instead:

    John <font color="#f70" t:src="surname" t:template="#position">Smith</font> <font color="#f70" t:src="position">President</font>

Note that this is not lossless in that it loses some information in the original template, namely that "Smith" was originally in bold, but it is lossless in that the round-trip modification is idempotent.

If the URL for the template cannot be fetched, it is relatively harmless to continue to use the template it would be replacing --- in the above example, the bold. (Although since that's an intra-document anchor, the fetch could only fail if the variable name were misspelled.) After an edit round-trip, the markup in the template will match the markup from the external template, so it serves as a sort of cache in case the external template can't be fetched. I'm not sure whether this cache should be updated automatically whenever you view a page; it would certainly be useful to have a batch job to do this.

When you save an edited page as a new page, if you haven't changed the templates, the root element of the new page contains a t:template reference back to the original page.

When you save an edited page as itself, the edits may conflict with some referenced template; I think that, in this case, the t:template reference should be removed, but there should be an easy way to put it back.

I also think that you should be able to apply an effective t:template attribute to a page as a URL argument. In particular, I think there should be a debugging template that can be applied to any page.

If the referenced template has a t:dest attribute, it is used, as are any non-template attributes, and so is the template's content. That is, this:

    <a t:src="xref" class="xref" t:dest="href" href="http://example.com">q.v.</a> ...
    <span class="xref" t:src="xrefn" t:template="#xref">http://r2d2h2g2.example.org</span>

should render as this:

    <a t:src="xref" class="xref" t:dest="href" href="http://example.com">q.v.</a> ...
    <a t:src="xrefn" class="xref" t:dest="href" href="http://r2d2h2g2.example.org" t:template="#xref">q.v.</a>

and not as this:

    <a t:src="xref" class="xref" t:dest="href" href="http://example.com">q.v.</a> ...
    <a t:src="xrefn" class="xref" href="http://example.com" t:template="#xref">http://r2d2h2g2.example.org</a>

If the t:template attribute appears on an element without a t:src or t:for attribute, then neither should the top element of the template to which it refers, and that template is applied to the current document context.

If t:template has a fragment identifier that is an element id, the value of that template is the entire element identified by that id; and if its fragment identifier is a variable name, then the value of that template is the first element that has that variable as t:src or t:for.

Indirection
-----------

You could argue that this should include the contents of the document named by the variable "url":

    <p t:src="url" t:dest="t:template" t:template="http://oldvalue.example.com/#disclaimer" />

But this has a couple of problems:

- It requires t:template to be the value of an attribute, and that's usually not particularly simple to implement if t: is really an XML namespace and you're using an XML parser that handles XML namespaces for you.
- as specified earlier, t:dest from the source template gets copied over and used; that breaks this template's round-trip-ness.

So instead I'm going to add another attribute: rather than specifying the name of the variable to get the replacement value from, the way t:src does, it is called t:embed and it specifies the name of the variable from which the value for t:src comes. The effect of t:embed="walnuts" is very similar to t:src2="walnuts" t:dest2="t:src", except that it works.
In particular, the value of "walnuts" can be found in t:src. (Normally, in this case, you'll want to make sure walnuts has a fragment identifier in it, or you'll get the whole thing!)

So we write the above example as:

    <p t:src="http://oldvalue.example.com/#disclaimer" t:embed="url" />

This doesn't handle double indirection in any particularly good way, so maybe I should use a formula/expression instead?

Now, if an element has t:for, it can't also have t:src, so t:for with t:embed can safely have a slightly different meaning: replace the value of t:for, rather than the value of t:src. This allows reformatting of documents referenced indirectly. (Uh-oh, t:for does need to be able to have t:src; see the section about t:span for details.)

It might also be desirable to have some way to indirect t:template and t:pattern (see below) URL references through a variable, and t:embed doesn't quite seem to reach that --- maybe you should just say:

    <p t:src="$url=http://oldvalue.example.com/#disclaimer" />

(You need the value since 'url' might or might not be mentioned anywhere else in the page.) Then you ......

Classes as fragment identifiers (strike this?)
-------------------------------

Modern HTML normally uses the 'class' attribute to describe the semantics of elements, in some vocabulary specific to the particular application. As a convenience, you can use a class name, preceded by a dot ".", as a fragment identifier.

In the context of t:src, the value of this fragment is the *content* of the first element of that class; in the context of t:for, I am not sure what its value should be; and in the context of t:template, the value of this fragment is the first element of that class, including both the content and the enclosing tags.

This is not as useful as I hoped it would be because it doesn't provide a useful way of accessing data that is not unique within a document.
Pattern-matching
----------------

Often there are external documents we would like to parse, as if they had been rendered by some template, then had the template attributes removed. Given the original template, perhaps created by a human being editing a similar page to add the template attributes, we'd like to recover the variable values.

If the original page exactly matched some possible rendering of the specified template, this is mostly a solvable problem; it's always possible to produce some set of variable values, and the only problem is that there might be more than one, in the unusual case that there are two identical adjacent t:for elements with nothing between them.

But that's a much simpler problem than the ones we encounter in the real world, where we're trying to recover data from pages that get reformatted by other people without warning. So I propose a looser method of matching. I'll use the word "pattern" for the template we're trying to match the foreign page against.

From the pattern, for each element that substitutes a variable, we extract a set of features:

- element name
- content
- various prefixes of content: first character, first two characters, first word, first subelement
- similarly, various suffixes of content
- names of subelements
- element attributes
- text before
- text after
- for table cells, the content of the cell at the top of the column and the beginning of the row, and the index of the column
- for <dd>, the content of the corresponding <dt>
- all of the above for previous and following siblings
- all of the above for each ancestor element, as both "nth ancestor" and "some ancestor"
- all of the above for each element that links to this element

If the substituted element is wrapped in a t:for, it will occur more than once, and features that do not match for all of the instances of that element are dropped.
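A small sketch of the feature extraction, covering only a handful of the features listed above (element name, two content prefixes, subelement names, attributes, parent name, and position within the parent as a stand-in for column index). The function names and the tuple encoding of features are my own choices, and the sample rows are made up.

```python
import xml.etree.ElementTree as ET


def features(el, parents):
    """Return a set of feature tuples describing one element."""
    found = {("tag", el.tag)}
    text = (el.text or "").strip()
    if text:
        found.add(("first-char", text[0]))
        found.add(("first-word", text.split()[0]))
    for child in el:
        found.add(("subelement", child.tag))
    for name, value in el.attrib.items():
        # Skip namespaced attributes such as t:src; a foreign page lacks them.
        if not name.startswith("{"):
            found.add(("attr", name, value))
    parent = parents.get(el)
    if parent is not None:
        found.add(("parent-tag", parent.tag))
        found.add(("child-index", list(parent).index(el)))
    return found


def common_features(elements, parents):
    """Keep only the features shared by every instance inside a t:for."""
    sets = [features(el, parents) for el in elements]
    if not sets:
        return set()
    return set.intersection(*sets)


PATTERN = """<table>
<tr><td class="n">John Smith</td><td class="p">555-1212</td></tr>
<tr><td class="n">Mary Major</td><td class="p">555-1313</td></tr>
</table>"""

root = ET.fromstring(PATTERN)
# ElementTree keeps no parent pointers, so build a child -> parent map.
parents = {child: parent for parent in root.iter() for child in parent}
name_cells = [row[0] for row in root]
print(sorted(common_features(name_cells, parents), key=repr))
```

In this example the first word and first character differ between the two rows, so they drop out, while the element name, class attribute, parent name, and column position survive as identifying features for the name cell.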
If there are other elements that match the features for some substituted element just as well as the substituted element itself does, but that are not substituted (or are substituted with a different variable), we have a potential problem, and it needs to be possible to find out about it somehow.

It may be necessary to allow multiple pages to constitute the same template, in order to prevent spurious variations from getting used as identifying features for variables that occur only once per page.

Now, when we try to match some foreign page against the pattern, for each variable in the pattern, we look for the element (or, for t:for, sequence of elements) in the foreign page that matches the largest number of features from some element in the template that substitutes that variable, and use the specified part of it.

This should allow a fairly large degree of variation in the matched page without breaking the pattern matching.

To specify that a page should be matched against some pattern, instead of using whatever t:attributes might or might not be embedded in it, we use the t:pattern attribute to name the template to use as the pattern.

It might be worthwhile to allow multiple possible variations for a particular variable, which don't necessarily have to match; for example, there might be lists containing two different kinds of things intermixed indiscriminately.

Inheritance
-----------

Suppose you render a page context containing no variables using a template that has some variables. What do you use for the variable values?

The simplest answer is that you do not substitute those values --- you use the values from the template. This allows you to change the default value of the variable in the template and have the change be reflected in any pages that haven't been edited since the variable was added.

Ideally, you'd like the values to remain dependent on the template values until that particular value is edited, rather than simply until the entire page is edited.
To provide this function, there is a t:inherit attribute whose presence specifies that the value of the variable is inherited from the template rather than being supplied by the page. This attribute should be added by default when saving a page as a new page, and automatically removed whenever a page save updates the variable's value. This requires keeping around the original value of the variable from the template until the edited page is saved; the simplest place to keep it is in the t:inherit attribute itself.

Collections
-----------

Formulas and Queries
--------------------

Sometimes you need something more complex than a simple full-content interpolation or pulling out existing named variables. For example:

- URL composition from a base and a relative URL
- counting the number of items in a list
- iterating over only the first few items in a list: the first ten items of search results, the first paragraph of a blog post
- summing a column
- A "No items found." message when a list is empty
- limiting the size of interpolated strings: the first 80 characters of an email body
- all the CSS selector stuff: first, last, if href contains 'images', whatever
- interpolating URLs into JavaScript URLs
- replacing double newlines with <p> tags

I think the best solution for these is some kind of expression language that can be evaluated to get document fragments or sequences of document contexts, and that uses variable values as its inputs. Other possibilities include constraint languages, imperative languages, and appeals to external REST services.

This leaves me with two reasonable options: either allow arbitrary expressions of the expression language in place of variable names, with variable names just being a common special case, or specify rules for calculating variable values out-of-line:

    <t:formula name="cssurl" src="absolute_url(base, 'style.css')" />
    ...
    <link rel="stylesheet" t:src="cssurl" t:dest="href" href="foo" />

This is more verbose, but allows for a layer of abstraction, and also allows the code to be separated from the presentation.

Probably the expression language should be JavaScript, since it's a reasonably nice language and already widely known among the folks who would use this.

t:span
------

Suppose I want to write a template that extracts the words in some foreign HTML document, so that I can do some kind of operation on them, like display them one at a time with JavaScript, or count them, or whatever. To do this, I need to be able to build a template that matches bits of text that aren't HTML elements in the source documents.

So I define a tag <t:span>, which has the same purpose as <span> in HTML: it merely marks a section of text and allows attributes to be attached to it, without itself implying any semantics.

This is also necessary for some other scenarios. If formulas are expressed...

Queries
-------

URL parameters
--------------

Summary
-------

A variable can contain either a document fragment or a sequence of page contexts; some variables contain both.

Attributes:

t:src replaces the content of an element with the document-fragment value of the specified variable.

t:dest specifies that t:src should replace an attribute rather than the content.

t:src2, t:dest2, etc., specify other replacements to perform on the same element. Ugly hack.

t:for repeats an element once for each value of a page-context-sequence variable, rendering its contents with the template contained in the t:for.

t:template specifies that a particular element should use a template from somewhere else; if that template has a t:src or t:for attribute in its root element, then its value is not used, but the element with the t:template attribute must have the same attribute, either t:src or t:for, and inversely, if the template has no such attribute, neither should the element referring to it.
t:embed is applied to an element that already has t:src or t:for, and indicates that that element's t:src or t:for value should come from the variable specified by t:embed, rather than the value in the template.

t:pattern names a pattern template to use to parse the variables out of the specified source resource, so that you can extract semantic data from web pages not generated with this toolkit.

t:inherit, applied to an element with t:src or t:for, specifies that the value found in the element itself should be overridden with whatever value is found for that variable in the external template; and, to facilitate the breaking of this inheritance link if someone edits this value and saves the page, it contains the previous value. (Maybe it should only contain a checksum of it?)

Variables can be specified in many ways:

A simple text string is just the name of a variable inside this file, and it normally has only one value, either document-fragment or page-context-sequence. At present I want to exclude punctuation, but not whitespace, from this syntax.

A URL without a fragment identifier has, as its document-fragment value, the entire content of the named resource (or rather, of its representation as an entity), and as its page-context-sequence value, a sequence containing one page context: the context of that page. (t:template and t:pattern have URLs as their values, but those URLs are not being used as variables.)

A URL with a fragment identifier can be interpreted in several ways, depending on the fragment identifier:

If the fragid is the id of some element in the entity retrieved, then its document-fragment value is that element, and its page-context-sequence value is not yet defined.

If the fragid is the name of some variable in the entity retrieved, as indicated by t:src, then its document-fragment value is the content of the element with the t:src attribute.
If the fragid is the name of some variable in the entity retrieved, as indicated by t:for, then its page-context-sequence value is the sequence of page contexts from that t:for.

I have uncompelling uses for XPaths and class names as fragids in variable names.

t:template and t:pattern have uses for URLs as ways of retrieving templates, normally the entire retrieved entity. In these URLs, fragment identifiers specify that only a part of the retrieved entity should be used, either the element with that id or the first element (in the appropriate page context) with the specified variable name.

Plan for Implementation
-----------------------

I don't have a permanent plan for how to order the steps, but obviously we won't have to finish everything before releasing anything. I'm thinking I could start this in Perl.

Here's a list of some of the most crucial features, in a plausible order of implementation, with rough estimates in abstract "points":

- parsing (plain) t:src variables out of an HTML file (3)
- finding t:src variables pointing to external URLs with fragment IDs naming t:src variables in the other file (2)
- a command-line tool to update the external values in an HTML file by
  - parsing its t:src names out
  - fetching the external data
  - interpolating the t:src variables into the template (1)
- parsing t:template URLs out of an HTML file (1)
- make command-line tool fetch external templates and update the file (2)
- some way to handle extra fields not mentioned in external template! probably becomes a generic error-reporting mechanism. (2)
- t:dest (1)
- some kind of handling of document fragments being shoved into attributes (drop tags? Complain?) (1)
- t:for with a plain variable name (not a URL) (4)
- t:for with an external URL with a fragment ID naming a t:for variable in it (2)

--- first releasable point is here (19 points so far)

- t:template for t:for (may have to be implemented earlier) (2)
- some sort of on-the-fly rendering scheme, so the software runs on the web server when you view a page (5)
- caching of fetched URLs (2)
- HTML form-POST-based page update (requires authentication) (3)
- minimal DHTML UI: upload current HTML to HTML form-POST-based page update (2)
- DHTML: integrate editnode bookmarklet? (1)

--- second releasable point is here (another 15 points)

- DHTML: make t:src fields editable with just a click (if authorized)? (2)
- DHTML: save edited version as new page (2)
- t:inherit (4)
  - set it on "save as new page"! (1)

--- third releasable point is here (another 9 points)

- some kind of indirection, maybe with t:embed as described (3)
- minimal t:pattern support, maybe just based on tag hierarchy (5)
- some way to debug pattern-matching (2)
- more t:pattern heuristics (8)

--- fourth releasable point is here (another 18 points)

- basic formula support (binding to JavaScript) (4)
- DHTML: add + and - buttons on t:for fields (if authorized) (3)
- some kind of query language (embedded in formula language or not) (4)
- support for URLs without fragment IDs as variable names (1)
- support for real fragment IDs (not t:src or t:for variables) (1)
- t:span (3)

--- at this point we have another 16 points; total is 77 points