> On 8 Jan 2015, at 10:16 am, Dave Fisher <[email protected]> wrote:
> 
> Hi Peter,
> 
> This is a helpful email. From your concrete discussion I can better understand 
> the mapping between the abstract / HTML model and the concrete / DOCX, ODT.
> 
> You mention differences in the style runs for Word and ODT, with which I am 
> familiar from the OOXML side. Does the abstract model / HTML take a 
> particular approach towards style runs? Is there a concrete version of the 
> HTML model? Is there a specification or plan for the abstract model?

As a general principle, no - a given filter is expected to handle arbitrary 
HTML.

However, there is a function for “normalising” an HTML document to change nested 
sets of inline elements (span, b, i, etc.) into a flat sequence of runs (each 
represented as a span element). The Word filter uses this, due to Word’s flat 
model of inline runs.
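
To make the idea concrete, here is a minimal sketch of that kind of flattening 
in Python. It is not the project's actual normalisation function; the names 
flatten_runs, runs_to_spans and INLINE_PROPS, and the use of a class attribute 
to carry formatting, are assumptions made purely for illustration. It walks 
nested inline elements, tracks which formatting is active, and emits one span 
per run of text.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from inline tags to formatting properties
# (illustrative only; a real filter would handle far more than this).
INLINE_PROPS = {"b": "bold", "i": "italic", "u": "underline", "span": None}

def flatten_runs(elem, active=frozenset()):
    """Return a flat list of (text, properties) runs for elem's inline content."""
    runs = []
    if elem.text:
        runs.append((elem.text, active))
    for child in elem:
        prop = INLINE_PROPS.get(child.tag)
        # Formatting accumulates as we descend into nested elements.
        child_active = active | {prop} if prop else active
        runs.extend(flatten_runs(child, child_active))
        # Text following the child belongs to the enclosing formatting.
        if child.tail:
            runs.append((child.tail, active))
    return runs

def runs_to_spans(runs):
    """Build a <p> whose children are flat <span> elements, one per run."""
    p = ET.Element("p")
    for text, props in runs:
        span = ET.SubElement(p, "span")
        if props:
            span.set("class", " ".join(sorted(props)))
        span.text = text
    return p

para = ET.fromstring("<p>plain <b>bold <i>both</i> bold</b> end</p>")
flat = runs_to_spans(flatten_runs(para))
print(ET.tostring(flat, encoding="unicode"))
```

The nested b/i structure becomes five sibling spans, none of which contains 
another element, which is the shape a flat run model like Word's needs.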

ODF text documents, on the other hand, *do* support nested formatting runs, so 
when writing the ODF filter it may make sense not to apply the normalisation 
process used in the Word filter. Skipping normalisation is the right call if 
there is information that could not be represented in HTML and would be lost 
by flattening the structure as we do for Word.

There have been a few times where the topic of what internal representation we 
should use has been raised - whether we should stick with HTML, come up with 
our own entirely different model, or something else. I personally think HTML is 
a good choice, but perhaps for those who have raised the issue of an alternate 
intermediate form, this might be a good time to start that discussion ;)

—
Dr Peter M. Kelly
[email protected]

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
