> On 8 Jan 2015, at 10:16 am, Dave Fisher <[email protected]> wrote:
>
> Hi Peter,
>
> This is a helpful email from your concrete discussion I can better understand
> the mapping between the abstract / HTML model and the concrete / DOCX, ODT.
>
> You mention differences in the style runs for Word and ODT of which I am
> familiar from the OOXML side. Does the abstract model / HTML take a
> particular approach towards style runs? Is there a concrete version of the
> HTML model? Is there a specification or plan for the abstract model?
As a general principle, no: a given filter is expected to handle arbitrary HTML. However, there is a function for “normalising” an HTML document, which turns nested sets of inline elements (span, b, i, etc.) into a flat sequence of runs, each represented as a span element. The Word filter uses this because Word has a flat model of inline runs.

ODF text documents, on the other hand, *do* support nested formatting runs, so when writing the ODF filter it may make sense not to apply the normalisation process used in the Word filter. That is the right choice if the nesting carries information that would be lost by flattening the structure the way we do for Word.

There have been a few occasions where the question of which internal representation we should use has come up: whether we should stick with HTML, come up with our own entirely different model, or something else. I personally think HTML is a good choice, but for those who have raised the idea of an alternative intermediate form, this might be a good time to start that discussion ;)

--
Dr Peter M. Kelly
[email protected]

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
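To make the flattening idea concrete, here is a minimal sketch of the kind of normalisation described above, written in Python purely for illustration. It is not the project's actual normalisation function: the `Element`/`Run` types, the `TAG_STYLES` table and the function names are all hypothetical, and only a few simple formatting tags are handled (links, fields and so on are ignored).

```python
import html
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

# Hypothetical mapping from inline tags to the CSS properties they imply.
TAG_STYLES: Dict[str, Dict[str, str]] = {
    "b": {"font-weight": "bold"},
    "i": {"font-style": "italic"},
    "u": {"text-decoration": "underline"},
    "span": {},
}


@dataclass
class Element:
    """Hypothetical inline element: a tag, its children, and its own style."""
    tag: str
    children: List[Union[str, "Element"]] = field(default_factory=list)
    style: Dict[str, str] = field(default_factory=dict)


@dataclass
class Run:
    """A flat run: one piece of text with its complete, resolved style."""
    style: Dict[str, str]
    text: str


def flatten(node: Union[str, Element],
            inherited: Optional[Dict[str, str]] = None) -> List[Run]:
    """Walk the nested tree, pushing each ancestor's formatting down onto the text."""
    inherited = {} if inherited is None else inherited
    if isinstance(node, str):
        return [Run(dict(inherited), node)] if node else []
    # The element's own formatting overrides whatever it inherits.
    style = {**inherited, **TAG_STYLES.get(node.tag, {}), **node.style}
    runs: List[Run] = []
    for child in node.children:
        runs.extend(flatten(child, style))
    return runs


def merge_adjacent(runs: List[Run]) -> List[Run]:
    """Join neighbouring runs whose resolved styles are identical."""
    merged: List[Run] = []
    for run in runs:
        if merged and merged[-1].style == run.style:
            merged[-1] = Run(merged[-1].style, merged[-1].text + run.text)
        else:
            merged.append(run)
    return merged


def to_span_html(runs: List[Run]) -> str:
    """Serialise each run as a single, non-nested <span>."""
    parts = []
    for run in runs:
        css = "; ".join(f"{k}: {v}" for k, v in sorted(run.style.items()))
        attr = f' style="{html.escape(css)}"' if css else ""
        parts.append(f"<span{attr}>{html.escape(run.text)}</span>")
    return "".join(parts)


# Equivalent to: <span>Plain <b>bold <i>both</i></b> end</span>
tree = Element("span", ["Plain ", Element("b", ["bold ", Element("i", ["both"])]), " end"])
print(to_span_html(merge_adjacent(flatten(tree))))
```

In the flattened output the italic run carries both properties explicitly, and the fact that it was nested inside the bold element is gone. That nesting is exactly the kind of information an ODF filter might want to preserve by skipping this step.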
