Re: [Document Model] Initial questions about web-based application

Peter Kelly Tue, 10 Mar 2015 17:26:35 -0700

> On 8 Mar 2015, at 10:20 pm, Franz de Copenhague 
> <[email protected]> wrote:
> 
> I agree that HTML5 is a good model to feed into the editing library to 
> support the editing of paragraphs, lists, text, tables, images. But what 
> about sections, headers, footers, fields (author, date, etc.), styles and 
> themes? All of them are document features implemented in either docx or odt, 
> and so far they are not supported by the DocFormats API.


So this is where things get tricky :)

HTML5 does not directly support all the features of OOXML/ODF word processing 
documents, so we need to figure out whether or not we are going to support 
these features and, if so, how. With UX Write I’ve always taken the position 
that it was never intended to be a complete replacement for Word/OO and that it 
was subject to the inherent limitations of HTML (e.g. no page breaks, tabs, 
headers/footers etc). But I got a *lot* of complaints about the lack of those 
features. That is a difficult situation, because they can only be added 
properly (at least in terms of doing the layout calculations) by modifying the 
web layout engine itself - although some of them can be “faked” using 
JavaScript. But I think we can find a way to support most of these.

Sections: (Unfortunately this term is ambiguous, as it can mean either a 
numbered part of a document, e.g. “See Section 3.2 for details”, or a part of 
the document that has separate page layout settings.) We could support the 
latter using a <div> with a custom CSS class, e.g. “corinthia-section”. A 
browser or any other HTML-supporting program will still be able to make sense 
of the document, but we will know that class=“corinthia-section” has special 
semantics that we handle appropriately in both DocFormats and the editor.

There are actually a few instances already where I’ve used custom class names 
for this purpose - see DocFormats/core/src/common/DFClassNames.h. Currently 
these use the “uxwrite-” prefix, which should be changed to “corinthia-”. This 
is a fairly easy task for someone to take on if they want to start making a 
contribution, since it’s largely just find and replace. When the change is 
made, the tests must be updated as well.
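
To make the rename concrete, here is a minimal JavaScript sketch that rewrites 
the prefix inside a class attribute value. The function name 
renameClassPrefix and the sample class names are made up for illustration and 
are not taken from DFClassNames.h; the actual task would mostly consist of 
editing that header and the test data.

```javascript
// Hypothetical helper: rewrite class names that start with oldPrefix so
// that they start with newPrefix instead. Other class names are kept.
function renameClassPrefix(classAttr, oldPrefix, newPrefix) {
    return classAttr
        .split(/\s+/)
        .filter(function (name) { return name.length > 0; })
        .map(function (name) {
            return name.startsWith(oldPrefix)
                ? newPrefix + name.slice(oldPrefix.length)
                : name;
        })
        .join(" ");
}

// The class names below are illustrative only.
console.log(renameClassPrefix("uxwrite-example note", "uxwrite-", "corinthia-"));
```

Only names that begin with the old prefix are touched, so a class that merely 
contains “uxwrite-” somewhere in the middle is left alone.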

For sections, we could alternatively use the <article> tag, which is also in 
HTML5. Thinking about it, I’d actually favour this over a div, since then we 
can avoid relying on a custom class name. There is a <section> element as 
well, but this is for sections in the “see section 3.2” sense (i.e. what 
appears in the table of contents of a report).

Headers and footers: HTML5 actually has <header> and <footer> elements - but, 
bizarrely, they don’t seem to be intended for the same purpose as the headers 
and footers we think of in traditional word processing. However, checking the 
spec just now, it seems they’ve made it a little clearer. Even if browsers 
won’t necessarily display them as such, due to the non-paginated layout 
model used on the web, it’s at least the closest we can get in terms of how we 
represent things. We may be able to have the editor use CSS tricks to display 
the header and footer content at the top and bottom of the screen.

Fields: There are a few of these that are handled already, though the set is 
fairly limited. These are:

- Table of contents
- List of figures
- List of tables
- Cross-reference (to a section, figure or table) - can be text only, label + 
number, caption text, etc.

See DFClassNames.h for the list of these, and also grep through the JS files in 
Editor/src and the OOXML filter to see how they’re used. I think it would be 
appropriate to identify them using custom CSS class names, and perhaps data- 
attributes where we need extra information.
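
As a rough sketch of the class name plus data- attribute idea, the following 
JavaScript builds the opening tag of a field placeholder. Everything named 
here is an assumption made for the example: the helper functions, the 
“corinthia-ref” class, and the “target” and “display” attribute names are not 
taken from DFClassNames.h or from the existing Editor code.

```javascript
// Escape a string for use inside a double-quoted HTML attribute.
function escapeAttr(value) {
    return String(value)
        .replace(/&/g, "&amp;")
        .replace(/"/g, "&quot;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}

// Hypothetical helper: build the opening tag of a field placeholder. The
// class name says what kind of field it is; data- attributes carry any
// extra information. Keys are sorted so the output is deterministic.
function fieldOpenTag(tagName, className, data) {
    var attrs = ' class="' + escapeAttr(className) + '"';
    Object.keys(data).sort().forEach(function (key) {
        attrs += ' data-' + key + '="' + escapeAttr(data[key]) + '"';
    });
    return "<" + tagName + attrs + ">";
}

// Illustrative only: a cross-reference displayed as label + number.
console.log(fieldOpenTag("a", "corinthia-ref",
                         { target: "section-3-2", display: "label-number" }));
```

A browser that knows nothing about these conventions would still see an 
ordinary element, which is the point of keeping the extra semantics in class 
names and data- attributes.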

Incidentally, one nice thing about how these are handled in the Editor is that 
it updates them automatically, in the same way that a spreadsheet automatically 
recalculates formulas. Every time you add, remove, or rename a section (<h1> to 
<h6>), figure, or table (in the case of the latter two, by changing the content 
of the caption), the table of contents and all cross-references are updated. 
This is handled in Outline.js, which also reports changes to the outline 
structure of these items to callback functions, so the editor can display a 
“document map” or outline view in the UI.
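
The general pattern can be sketched as follows. This is not the code in 
Outline.js, just a small stand-alone model of the idea under simplifying 
assumptions: headings are kept in a flat list, only sections are covered (no 
figures or tables), and the Outline constructor and its method names are 
invented for the example.

```javascript
// Hypothetical outline model: headings in document order, each with an
// id, a level (1 to 6, for <h1> to <h6>), and its text.
function Outline() {
    this.headings = [];
    this.listeners = [];
}

// Register a callback that receives the recomputed table of contents.
Outline.prototype.onChange = function (callback) {
    this.listeners.push(callback);
};

// Derive hierarchical numbers ("1", "1.1", "2", ...) from heading levels.
// A heading that skips a level gets a 0 in the skipped position.
Outline.prototype.tableOfContents = function () {
    var counters = [0, 0, 0, 0, 0, 0];
    return this.headings.map(function (h) {
        counters[h.level - 1] += 1;
        for (var i = h.level; i < counters.length; i++) {
            counters[i] = 0; // reset the counters of deeper levels
        }
        var number = counters.slice(0, h.level).join(".");
        return { id: h.id, number: number, text: h.text };
    });
};

// Recompute the table of contents and pass it to every listener.
Outline.prototype.notify = function () {
    var toc = this.tableOfContents();
    this.listeners.forEach(function (callback) { callback(toc); });
};

Outline.prototype.addHeading = function (index, heading) {
    this.headings.splice(index, 0, heading);
    this.notify();
};

Outline.prototype.removeHeading = function (id) {
    this.headings = this.headings.filter(function (h) { return h.id !== id; });
    this.notify();
};

Outline.prototype.renameHeading = function (id, text) {
    this.headings.forEach(function (h) {
        if (h.id === id) { h.text = text; }
    });
    this.notify();
};

// Text of a cross-reference to a heading, or null if the target is gone.
Outline.prototype.crossReference = function (id) {
    var entries = this.tableOfContents().filter(function (e) {
        return e.id === id;
    });
    return entries.length > 0 ? "Section " + entries[0].number : null;
};
```

Because numbers are always derived from the current list of headings rather 
than stored, a cross-reference can never go stale: removing the first section 
renumbers every later one the next time the reference text is requested.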

Styles: Already handled, via CSS. See for example 
DocFormats/filters/ooxml/src/word/WordStyles.c which is where the translation 
is done for OOXML Word documents.

In the Editor’s JS code there are no facilities for manipulating styles 
directly, other than simply getting and setting the CSS text. DocFormats 
provides a set of classes for representing CSS stylesheets, styles, and 
property collections, which can be used in native (C/C++/Objective C) code. UX 
Write uses this API, and the Qt editor can do the same. For a web-based 
version of the editor, we’ll need to create a similar set of data structures 
for the Web UI.
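
To give a feel for what such data structures might look like on the JS side, 
here is a minimal sketch that turns CSS text into a map from selector to 
property collection and back again. It does not mirror the DocFormats API; 
the function names are invented, and the parser deliberately ignores 
comments, at-rules, nested blocks, and values that contain “;”.

```javascript
// Parse the declarations of one rule body, e.g. "color: red; margin: 0",
// into a property collection (a plain object from name to value).
function parseProperties(body) {
    var properties = {};
    body.split(";").forEach(function (decl) {
        var colon = decl.indexOf(":");
        if (colon === -1) { return; }
        var name = decl.slice(0, colon).trim();
        var value = decl.slice(colon + 1).trim();
        if (name.length > 0 && value.length > 0) {
            properties[name] = value;
        }
    });
    return properties;
}

// Parse flat CSS text of the form "selector { declarations }" into a map
// from selector to property collection. If a selector appears twice, the
// later rule replaces the earlier one.
function parseStyleSheet(cssText) {
    var styles = {};
    var rulePattern = /([^{}]+)\{([^{}]*)\}/g;
    var match;
    while ((match = rulePattern.exec(cssText)) !== null) {
        var selector = match[1].trim();
        if (selector.length > 0) {
            styles[selector] = parseProperties(match[2]);
        }
    }
    return styles;
}

// Serialize the map back to CSS text, one rule per line.
function styleSheetToText(styles) {
    return Object.keys(styles).map(function (selector) {
        var props = styles[selector];
        var body = Object.keys(props).map(function (name) {
            return name + ": " + props[name];
        }).join("; ");
        return selector + " { " + body + " }";
    }).join("\n");
}
```

With something along these lines, the Web UI could change a single property 
(say, the colour of h1) on the parsed structure and then hand the 
regenerated CSS text back to the Editor, which only needs to get and set the 
text as it does now.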

Themes: I’m not sure what the best strategy for this is, but I’d say something 
along the lines of CSS stylesheets that can be reused among different documents 
would probably be the way to go. This requires a lot of thought and 
investigation.

—
Dr Peter M. Kelly
[email protected]

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
