-- reply below to --
From: jan i [mailto:[email protected]]
Sent: Thursday, January 8, 2015 08:12
To: [email protected]
Subject: Re: ODF filter

On 8 January 2015 at 16:59, Peter Kelly <[email protected]> wrote:

[ ... ]
> As a general principle, no - a given filter is expected to handle
> arbitrary HTML.
>
> However, there is a function for “normalising” a HTML document to change
> nested sets of inline elements (span, b, i, etc.) into a flat sequence of
> runs (each represented as a span element). The Word filter uses this, due
> to Word’s flat model of inline runs.
>
> ODF text documents, on the other hand, *do* support nested formatting
> runs, so when writing this filter it may make sense not to apply the
> normalisation process used in the word filter. This should be done if there
> is information that could not be represented in HTML and would be lost by
> flattening the structure like we do for word.
>
> There’s been a few times where the topic of what internal representation
> we should use has been raised - whether we should stick with HTML, come up
> with our own entirely different model, or something else. I personally
> think HTML is a good choice, but perhaps for those who have raised the
> issue of an alternate intermediate form, this might be a good time to start
> that discussion ;)
>

Point taken; I am, I assume, the first who questioned it. But just to be
precise: I am happy having HTML as the internal structure, but I am unhappy
that filters can do what they like with the HTML.

My goal is to define a set of access functions that filters should use to
navigate/insert/delete tags, together with restrictions on what can be put
in the tags.

Just imagine one filter needs to identify some tags and therefore uses id=,
while another filter needs to name some tags and therefore uses name=. If we
are not careful here it will explode, and reading the HTML becomes nearly as
complicated as reading the formats directly.

We should have one and only one HTML definition, which the filters can use.

rgds
jan I.

<orcmid>
I'm not following this well.
Let me ask it this way: Are we talking about fixing some sort of DOM over the HTML5, or are we allowing arbitrary HTML5 and transforming to and from it? I am having trouble visualizing this process -- is the intermediate form concrete HTML and not some DOM view? This relates to how inter-conversion is to be tested. Is there some abstraction against which document features are assessed and mapped through, or are we working from concrete level to concrete level, and that is essentially it? Help me calibrate my understanding of the thrust.
</orcmid>
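
To help picture the normalisation Peter describes (nested inline elements turned into a flat sequence of runs), here is a minimal sketch in Python. It is not the project's actual function: the names `INLINE_TAGS`, `RunFlattener` and `flatten_runs` are made up for illustration, and the sketch assumes that a run is simply a piece of text plus the set of inline tags open around it. Attributes are ignored, which also illustrates how flattening can lose information.

```python
from html.parser import HTMLParser

# Hypothetical list of inline formatting elements; a real filter
# would need a fuller and more carefully defined set.
INLINE_TAGS = {"span", "b", "i", "u", "em", "strong"}


class RunFlattener(HTMLParser):
    """Collect flat (text, formatting) runs from nested inline markup."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # inline elements currently open, outermost first
        self.runs = []       # flat result: list of (text, tuple_of_tags)

    def handle_starttag(self, tag, attrs):
        # Attributes are deliberately dropped in this sketch.
        if tag in INLINE_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the innermost open element with this name, if there is one.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break

    def handle_data(self, data):
        # A bare span carries no formatting of its own here.
        fmt = tuple(sorted(set(self.open_tags) - {"span"}))
        if self.runs and self.runs[-1][1] == fmt:
            # Same formatting as the previous run: merge the text.
            self.runs[-1] = (self.runs[-1][0] + data, fmt)
        else:
            self.runs.append((data, fmt))


def flatten_runs(html):
    """Return the flat list of runs for an HTML fragment."""
    parser = RunFlattener()
    parser.feed(html)
    parser.close()
    return parser.runs


if __name__ == "__main__":
    for text, fmt in flatten_runs("<b>bold <i>both</i></b> plain"):
        print(repr(text), fmt)
```

Each run could then be written out as a single span, which suits a flat model such as Word's. A format that supports nesting could skip this step and keep the original tree.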
