On Tue, 5 Jan 2021 at 10:37, Joel Kulesza <jkule...@gmail.com> wrote:
> On Tue, Jan 5, 2021 at 1:19 AM Pavel Sanda <sa...@lyx.org> wrote:
>
>> On Mon, Jan 04, 2021 at 09:48:42PM +0100, Thibaut Cuvelier wrote:
>> > There are multiple issues here. What is needed to generate HTML and DocBook
>> > is a simple SAX writer, not a parser. I've done plenty of research about
>> > it, there's no XML library that does that. Most of them are using a DOM,
>> > which is a total waste of memory for such an application: it stores a
>> > complete XML tree in memory before serialising it. With SAX, you just need
>> > a string backend, which is much more lightweight (by several factors).
>>
>> After little bit more thinking, is using DOM actually that big issue?
>> I mean how much it takes - for document of length n its O(n) in space?
>>
>> Sure, it might be cut to constant, but practically speaking when you have
>> 100 pages document what is the real time/memory consumption. Timewise
>> you spent 1s in XML compared to next 30s in conversion figures to pdf or
>> whatever format? Spacewise probably one more time than what we
>> already allocated for document itself.
>>
>> If using more heavy-weight caliber xml lib is not pain from API point
>> of view (and I do not know, you are the expert here) then we might
>> actually consider it, given the difficulties in SAX space?
>
> I had a similar thought and will note that I've had good success on other
> projects with pugixml.

It's typical for a DOM tree to be two to five times larger than the raw text, which is not always negligible (Xerces is close to a factor of 2, Java implementations anywhere between 2 and 5; I haven't checked pugixml or TinyXML2 for this specific criterion). But that's not the real issue: for generating HTML and DocBook, DOM is not that useful from a developer's point of view, at least for now. DOM is better suited to handling or modifying an existing document than to generating one from scratch.
A SAX writer is really the most appropriate approach, given the way LyX is structured internally: there is very little need to go backward while generating the file (e.g., to add something to the header when encountering some LyX inset). Using DOM would not really simplify the code (I'm speaking of the DocBook export, which is very similar to the HTML one), although it might make its logic easier to understand for a newcomer.

Nevertheless, DOM comes with a more complex syntax: with SAX, you only append content to the file, using plain strings; with DOM, you have to indicate where you want to write something (with methods like InsertEndChild), and you pass around complete XML nodes (built from the same strings). More specifically, in SAX (where stream is essentially a large string object with helper methods):

    stream.writeStartTag("tag");

With DOM, taking TinyXML2 as an example (where document is the root of the DOM tree and node is the node of the tree being filled):

    node->InsertEndChild( document->NewElement("tag") );

Both are perfectly good choices, though. If we write a thin layer on top of a DOM writer (as Riki suggested, this would also decouple LyX from the actual XML library), we might get a syntax close to that of SAX while keeping the extra flexibility of DOM. This way, the LyX code would stay clean and avoid the current intricacies of outputting things at the right place (in DocBook, especially the <info> tag).

More specifically, @Pavel: for DocBook, you spend 0% of your time dealing with images, as that is supposed to be done afterwards by the DocBook processor. Any gain in the XML part of LyX will therefore be noticeable by the user for large (book-sized) documents. (And I wouldn't say that something being O(n) is negligible in this case: I use exponential-time algorithms every day that run much faster in practice than polynomial-time ones…)
--
lyx-devel mailing list
lyx-devel@lists.lyx.org
http://lists.lyx.org/mailman/listinfo/lyx-devel