Re: XML stream writer library

2021-01-12 Thread Richard Kimberly Heck
On 1/12/21 6:19 PM, Thibaut Cuvelier wrote:
> On Tue, 12 Jan 2021 at 16:33, Lorenzo Bertini <lorenzobertin...@gmail.com>
> wrote:
>
> On 08/01/21 03:00, Thibaut Cuvelier wrote:
> > A tour of some C++ libraries for XML:
> > - RapidXML: mostly unmaintained since 2013, no support for namespaces
> > (except in forks: https://github.com/dwd/rapidxml)
> > - Boost Property Tree: no XML parser, which limits further use (it can
> > use RapidXML though, see above)
> > - libstudxml: C++ library, designed for speed, no DOM
> > - libxml2: C library, designed for features and not speed (also includes
> > XPath and XSLT, DTD and XML Schema, namespaces), "mature" and barely
> > evolving anymore
> > - libxml++: depends on glibmm2
> > - Xerces-C++: C++ library, designed for features and not speed (also
> > includes XPath, DTD and XML Schema, namespaces), "mature" and barely
> > evolving anymore; no XSLT (Xalan could be used, but it only works with an
> > ancient version of Xerces; XQilla implemented XPath 2, but has not been
> > developed since 2016)
> > - Expat: C library, designed for speed, no DOM by default (provided by
> > https://github.com/kolotsey/expat-dom), with namespaces
> > - tinyxml2: C++ library, designed for speed only (also includes XPath
> > through the unmaintained https://github.com/stanthomas/tinyxml2-ex, no
> > validation, no namespaces), mature and slowly evolving
> > - pugixml: C++ library, designed for speed with a few features (like
> > XPath, no validation, no namespaces), mature and evolving
> > - libroxml: C library, no clear design goal (includes XPath, namespaces,
> > no validation), evolving
> > - Saxon-C: C/C++ wrapper of the state-of-the-art Java library, largest
> > set of features (XPath and XSLT 3, DTD and XML Schema validation, RelaxNG
> > through an extension: http://www.cfoster.net/saxon-jing/, namespaces),
> > very mature, actively evolving (both performance and features), but it
> > requires a JVM (Excelsior is built-in, even though it hasn't been
> > maintained for quite a long time)
> > - Qt: no, I was joking :). Qt XML is not supported anymore; it's
> > recommended to switch to QXmlStreamReader and QXmlStreamWriter (which
> > are only SAX-like). Qt XML Patterns used to have XPath, XSLT, and XML
> > Schema, but it was deprecated a while ago (Qt 5.13 for the last
> > wake-up call, but it hasn't been touched since Qt 4, basically)
> >
> > If LyX is really serious about XML (i.e. moving as many things as
> > possible to XML technologies), Saxon is probably the way to go.
> > Otherwise, it's going to be too heavy to ship Saxon and a JVM along with
> > LyX. Instead, pugixml seems to me like a good choice: a few features
> > (XPath is the most relevant for LyX, and is included in the base
> > library, no need for add-ons), good performance, still maintained (there
> > is a chance to have bugs fixed in a newer version, plus security
> > vulnerabilities taken care of).
> Was this addressed in the virtual meeting? 
>
>
> As far as I know, it wasn't discussed.

We were pretty focused on planning for 2.4.0.

 

> Anyhow, I think that for a start we'd need only the most basic features
> (tag insertion, indent), as was the purpose of #12055 in the first place
> (I'm sorry to have opened this Pandora's box), so maybe no harm will
> come if we start wrapping pugi.
>
> Let me know what you think, and if this is not the time for this, as
> with LyX 2.4 coming out there might be other things that need focus.
>
>
> It looks like the patches cannot get integrated into the master
> development branch before 2.4 is out (or at least branched). However,
> in the meantime, I think I can create a feature branch and push your
> patches there (https://www.lyx.org/trac/browser/features).

Yes, that would be the way to go.

Riki



-- 
lyx-devel mailing list
lyx-devel@lists.lyx.org
http://lists.lyx.org/mailman/listinfo/lyx-devel


Re: XML stream writer library

2021-01-12 Thread Thibaut Cuvelier
On Tue, 12 Jan 2021 at 16:33, Lorenzo Bertini 
wrote:

> On 08/01/21 03:00, Thibaut Cuvelier wrote:
> > A tour of some C++ libraries for XML:
> > - RapidXML: mostly unmaintained since 2013, no support for namespaces
> > (except in forks: https://github.com/dwd/rapidxml)
> > - Boost Property Tree: no XML parser, which limits further use (it can
> > use RapidXML though, see above)
> > - libstudxml: C++ library, designed for speed, no DOM
> > - libxml2: C library, designed for features and not speed (also includes
> > XPath and XSLT, DTD and XML Schema, namespaces), "mature" and barely
> > evolving anymore
> > - libxml++: depends on glibmm2
> > - Xerces-C++: C++ library, designed for features and not speed (also
> > includes XPath, DTD and XML Schema, namespaces), "mature" and barely
> > evolving anymore; no XSLT (Xalan could be used, but it only works with an
> > ancient version of Xerces; XQilla implemented XPath 2, but has not been
> > developed since 2016)
> > - Expat: C library, designed for speed, no DOM by default (provided by
> > https://github.com/kolotsey/expat-dom), with namespaces
> > - tinyxml2: C++ library, designed for speed only (also includes XPath
> > through the unmaintained https://github.com/stanthomas/tinyxml2-ex, no
> > validation, no namespaces), mature and slowly evolving
> > - pugixml: C++ library, designed for speed with a few features (like
> > XPath, no validation, no namespaces), mature and evolving
> > - libroxml: C library, no clear design goal (includes XPath, namespaces,
> > no validation), evolving
> > - Saxon-C: C/C++ wrapper of the state-of-the-art Java library, largest
> > set of features (XPath and XSLT 3, DTD and XML Schema validation, RelaxNG
> > through an extension: http://www.cfoster.net/saxon-jing/, namespaces),
> > very mature, actively evolving (both performance and features), but it
> > requires a JVM (Excelsior is built-in, even though it hasn't been
> > maintained for quite a long time)
> > - Qt: no, I was joking :). Qt XML is not supported anymore; it's
> > recommended to switch to QXmlStreamReader and QXmlStreamWriter (which
> > are only SAX-like). Qt XML Patterns used to have XPath, XSLT, and XML
> > Schema, but it was deprecated a while ago (Qt 5.13 for the last
> > wake-up call, but it hasn't been touched since Qt 4, basically)
> >
> > If LyX is really serious about XML (i.e. moving as many things as
> > possible to XML technologies), Saxon is probably the way to go.
> > Otherwise, it's going to be too heavy to ship Saxon and a JVM along with
> > LyX. Instead, pugixml seems to me like a good choice: a few features
> > (XPath is the most relevant for LyX, and is included in the base
> > library, no need for add-ons), good performance, still maintained (there
> > is a chance to have bugs fixed in a newer version, plus security
> > vulnerabilities taken care of).
> Was this addressed in the virtual meeting?


As far as I know, it wasn't discussed.


> Also, since Xerces-C was the most full-featured and mature library after
> Saxon-C, I was curious as to why you didn't mention it.
>

Actually, Xerces-C and Xerces-C++ are just the same thing (the official
name being Xerces-C++ and the name of the packages Xerces-C, if I got it
correctly).


> Anyhow, I think that for a start we'd need only the most basic features
> (tag insertion, indent), as was the purpose of #12055 in the first place
> (I'm sorry to have opened this Pandora's box), so maybe no harm will
> come if we start wrapping pugi.
>
> Let me know what you think, and if this is not the time for this, as
> with LyX 2.4 coming out there might be other things that need focus.
>

It looks like the patches cannot get integrated into the master development
branch before 2.4 is out (or at least branched). However, in the meantime,
I think I can create a feature branch and push your patches there (
https://www.lyx.org/trac/browser/features).


Re: XML stream writer library

2021-01-12 Thread Lorenzo Bertini

On 08/01/21 03:00, Thibaut Cuvelier wrote:

> A tour of some C++ libraries for XML:
> - RapidXML: mostly unmaintained since 2013, no support for namespaces
> (except in forks: https://github.com/dwd/rapidxml)
> - Boost Property Tree: no XML parser, which limits further use (it can
> use RapidXML though, see above)
> - libstudxml: C++ library, designed for speed, no DOM
> - libxml2: C library, designed for features and not speed (also includes
> XPath and XSLT, DTD and XML Schema, namespaces), "mature" and barely
> evolving anymore
> - libxml++: depends on glibmm2
> - Xerces-C++: C++ library, designed for features and not speed (also
> includes XPath, DTD and XML Schema, namespaces), "mature" and barely
> evolving anymore; no XSLT (Xalan could be used, but it only works with an
> ancient version of Xerces; XQilla implemented XPath 2, but has not been
> developed since 2016)
> - Expat: C library, designed for speed, no DOM by default (provided by
> https://github.com/kolotsey/expat-dom), with namespaces
> - tinyxml2: C++ library, designed for speed only (also includes XPath
> through the unmaintained https://github.com/stanthomas/tinyxml2-ex, no
> validation, no namespaces), mature and slowly evolving
> - pugixml: C++ library, designed for speed with a few features (like
> XPath, no validation, no namespaces), mature and evolving
> - libroxml: C library, no clear design goal (includes XPath, namespaces,
> no validation), evolving
> - Saxon-C: C/C++ wrapper of the state-of-the-art Java library, largest
> set of features (XPath and XSLT 3, DTD and XML Schema validation, RelaxNG
> through an extension: http://www.cfoster.net/saxon-jing/, namespaces),
> very mature, actively evolving (both performance and features), but it
> requires a JVM (Excelsior is built-in, even though it hasn't been
> maintained for quite a long time)
> - Qt: no, I was joking :). Qt XML is not supported anymore; it's
> recommended to switch to QXmlStreamReader and QXmlStreamWriter (which
> are only SAX-like). Qt XML Patterns used to have XPath, XSLT, and XML
> Schema, but it was deprecated a while ago (Qt 5.13 for the last
> wake-up call, but it hasn't been touched since Qt 4, basically)
>
> If LyX is really serious about XML (i.e. moving as many things as
> possible to XML technologies), Saxon is probably the way to go.
> Otherwise, it's going to be too heavy to ship Saxon and a JVM along with
> LyX. Instead, pugixml seems to me like a good choice: a few features
> (XPath is the most relevant for LyX, and is included in the base
> library, no need for add-ons), good performance, still maintained (there
> is a chance to have bugs fixed in a newer version, plus security
> vulnerabilities taken care of).
Was this addressed in the virtual meeting? Also, since Xerces-C was the
most full-featured and mature library after Saxon-C, I was curious as to
why you didn't mention it.


Anyhow, I think that for a start we'd need only the most basic features
(tag insertion, indent), as was the purpose of #12055 in the first place
(I'm sorry to have opened this Pandora's box), so maybe no harm will
come if we start wrapping pugi.


Let me know what you think, and if this is not the time for this, as 
with LyX 2.4 coming out there might be other things that need focus.



Re: XML stream writer library

2021-01-07 Thread Thibaut Cuvelier
On Thu, 7 Jan 2021 at 18:23, Thibaut Cuvelier  wrote:

> On Thu, 7 Jan 2021, 12:52 Lorenzo Bertini, 
> wrote:
>
>> I think almost all the options are on the table at this point. For the
>> sake of completeness, I think it's worth mentioning the DOM library Boost
>> Property Tree, which popped up frequently while searching.
>>
>> I think Thibaut is right when saying that, for the way LyX is structured
>> now, a SAX writer would be more appropriate, because we won't work on
>> XML directly but convert the LyX file. However, most of the libraries
>> take a DOM approach; also, if someday we convert the LyX format to
>> something XML-like, we might have to start all of this again.
>>
>> I did a small benchmark with pugixml, reading and writing a 2.2 MB XML
>> document (equivalent to ~100-120 pages chock full of math): both reading
>> and writing take negligible time on my really modest laptop (A10-9600).
>> Peak memory consumption was 14 MB, but since some MathML was corrupted
>> (it has trouble with backslashes \), it might be much lower once fixed;
>> LyX's consumption when opening the corresponding LyX file was ~120 MB.
>> The benchmark table in
>> <http://rapidxml.sourceforge.net/manual.html#namespacerapidxml_1performance>
>> seems to indicate that pugixml and RapidXML are only about one order of
>> magnitude slower than strlen, so I don't think parse time will ever be a
>> problem.
>
>
> Thanks for your benchmark. For me, the major difference between the two
> libraries is that pugixml is still maintained, while RapidXML isn't
> really. And XML parsing is very often a source of security problems (not
> just XXE).
>
>> I'm unfamiliar with the concept of "wrapping" libraries and "layers": is
>> it when you write your own classes and methods on top of some common
>> stuff those libraries do, so if for whatever reason you have to switch
>> you can "plug" another easily?
>>
>
> Yes, exactly.
>

Below is my take on
https://stackoverflow.com/questions/9387610/what-xml-parser-should-i-use-in-c
and https://github.com/fffaraz/awesome-cpp#xml

XPath would be very useful if LyX switches to an XML representation (easy
queries on an XML document; think of SQL for XML).
XSLT is a way to describe transformations from XML to anything. If LyX
switches to an XML representation, it might be used to replace the C++
exporters (but formula conversion will be a pain!). It might lower the
barrier to entry for new contributors, even though XSLT is not an easy
language.
XQuery is a scripting language for processing XML.
Apart from Java libraries, only version 1.0 of these standards is
implemented; except for XPath, that really limits their use… A
state-of-the-art implementation of the current standards is Saxon, which
has a C binding.

To allow for validation of XML files (i.e. checking that they respect some
grammar), DTD is the oldest way (inherited from SGML); XML Schema adds many
features over DTD (like types). The best technology nowadays is RelaxNG
(it's not recent: 2005), which is much more powerful than XML Schema.

XInclude is the XML way of specifying includes of other files (not
necessarily XML). Think \input in LaTeX or LyX child documents with a few
more features.

Namespaces are similar to those of C++, and are especially useful when
mixing several standards (like MathML and DocBook).

A tour of some C++ libraries for XML:
- RapidXML: mostly unmaintained since 2013, no support for namespaces
(except in forks: https://github.com/dwd/rapidxml)
- Boost Property Tree: no XML parser, which limits further use (it can use
RapidXML though, see above)
- libstudxml: C++ library, designed for speed, no DOM
- libxml2: C library, designed for features and not speed (also includes
XPath and XSLT, DTD and XML Schema, namespaces), "mature" and barely
evolving anymore
- libxml++: depends on glibmm2
- Xerces-C++: C++ library, designed for features and not speed (also
includes XPath, DTD and XML Schema, namespaces), "mature" and barely
evolving anymore; no XSLT (Xalan could be used, but it only works with an
ancient version of Xerces; XQilla implemented XPath 2, but has not been
developed since 2016)
- Expat: C library, designed for speed, no DOM by default (provided by
https://github.com/kolotsey/expat-dom), with namespaces
- tinyxml2: C++ library, designed for speed only (also includes XPath
through the unmaintained https://github.com/stanthomas/tinyxml2-ex, no
validation, no namespaces), mature and slowly evolving
- pugixml: C++ library, designed for speed with a few features (like XPath,
no validation, no namespaces), mature and evolving
- libroxml: C library, no clear design goal (includes XPath, namespaces, no
validation), evolving
- Saxon-C: C/C++ wrapper of the state-of-the-art Java library, largest
amount of fea

Re: XML stream writer library

2021-01-07 Thread Thibaut Cuvelier
On Thu, 7 Jan 2021, 12:52 Lorenzo Bertini, 
wrote:

> I think almost all the options are on the table at this point. For the
> sake of completeness, I think it's worth mentioning the DOM library Boost
> Property Tree, which popped up frequently while searching.
>
> I think Thibaut is right when saying that, for the way LyX is structured
> now, a SAX writer would be more appropriate, because we won't work on
> XML directly but convert the LyX file. However, most of the libraries
> take a DOM approach; also, if someday we convert the LyX format to
> something XML-like, we might have to start all of this again.
>
> I did a small benchmark with pugixml, reading and writing a 2.2 MB XML
> document (equivalent to ~100-120 pages chock full of math): both reading
> and writing take negligible time on my really modest laptop (A10-9600).
> Peak memory consumption was 14 MB, but since some MathML was corrupted
> (it has trouble with backslashes \), it might be much lower once fixed;
> LyX's consumption when opening the corresponding LyX file was ~120 MB.
> The benchmark table in
> <http://rapidxml.sourceforge.net/manual.html#namespacerapidxml_1performance>
> seems to indicate that pugixml and RapidXML are only about one order of
> magnitude slower than strlen, so I don't think parse time will ever be a
> problem.


Thanks for your benchmark. For me, the major difference between the two
libraries is that pugixml is still maintained, while RapidXML isn't really.
And XML parsing is very often a source of security problems (not just XXE).

> I'm unfamiliar with the concept of "wrapping" libraries and "layers": is
> it when you write your own classes and methods on top of some common
> stuff those libraries do, so if for whatever reason you have to switch
> you can "plug" another easily?
>

Yes, exactly.



Re: XML stream writer library

2021-01-07 Thread Lorenzo Bertini
I think almost all the options are on the table at this point. For the
sake of completeness, I think it's worth mentioning the DOM library Boost
Property Tree, which popped up frequently while searching.


I think Thibaut is right when saying that, for the way LyX is structured
now, a SAX writer would be more appropriate, because we won't work on
XML directly but convert the LyX file. However, most of the libraries
take a DOM approach; also, if someday we convert the LyX format to
something XML-like, we might have to start all of this again.


I did a small benchmark with pugixml, reading and writing a 2.2 MB XML
document (equivalent to ~100-120 pages chock full of math): both reading
and writing take negligible time on my really modest laptop (A10-9600).
Peak memory consumption was 14 MB, but since some MathML was corrupted
(it has trouble with backslashes \), it might be much lower once fixed;
LyX's consumption when opening the corresponding LyX file was ~120 MB.
The benchmark table in
<http://rapidxml.sourceforge.net/manual.html#namespacerapidxml_1performance>
seems to indicate that pugixml and RapidXML are only about one order of
magnitude slower than strlen, so I don't think parse time will ever be a
problem.


I'm unfamiliar with the concept of "wrapping" libraries and "layers": is 
it when you write your own classes and methods on top of some common 
stuff those libraries do, so if for whatever reason you have to switch 
you can "plug" another easily?


Thanks, Lo.


Re: XML stream writer library

2021-01-06 Thread Thibaut Cuvelier
On Tue, 5 Jan 2021 at 10:37, Joel Kulesza  wrote:

> On Tue, Jan 5, 2021 at 1:19 AM Pavel Sanda  wrote:
>
>> On Mon, Jan 04, 2021 at 09:48:42PM +0100, Thibaut Cuvelier wrote:
>> > There are multiple issues here. What is needed to generate HTML and
>> > DocBook is a simple SAX writer, not a parser. I've done plenty of
>> > research on it; there's no XML library that does that. Most of them use
>> > a DOM, which is a total waste of memory for such an application: it
>> > stores a complete XML tree in memory before serialising it. With SAX,
>> > you just need a string backend, which is much more lightweight (by
>> > several factors).
>>
>> After a little more thinking: is using DOM actually that big an issue?
>> I mean, how much does it take? For a document of length n, it's O(n) in
>> space?
>>
>> Sure, it might be cut to a constant, but practically speaking, when you
>> have a 100-page document, what is the real time/memory consumption?
>> Timewise, do you spend 1 s in XML compared to the next 30 s converting
>> figures to PDF or whatever format? Spacewise, probably about as much
>> again as what we already allocate for the document itself.
>>
>> If using a heavier-weight XML library is not a pain from an API point of
>> view (and I do not know, you are the expert here), then we might
>> actually consider it, given the difficulties in the SAX space?
>>
>
> I had a similar thought and will note that I've had good success on other
> projects with pugixml.
>

It's typical to have a DOM tree that is two to five times larger than the
raw text; that's not always negligible (Xerces is close to 2, Java
implementations anywhere between 2 and 5; I haven't checked pugixml or
TinyXML2 for this specific criterion). But that's not the real issue: for
generating HTML and DocBook, for now, DOM is not so useful from a developer
point of view. DOM is more suitable for handling or modifying an existing
document, not really for generating one from scratch. A SAX writer is
really the most appropriate, given the way LyX is internally structured:
there is very little need to go backward when generating the file (e.g.,
to add something to the header when encountering some LyX inset).
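For scale, a back-of-the-envelope figure, assuming the 2x to 5x overhead above applies to a document the size of the 2.2 MB test file mentioned elsewhere in this thread:

```latex
2 \times 2.2\,\mathrm{MB} = 4.4\,\mathrm{MB}, \qquad 5 \times 2.2\,\mathrm{MB} = 11\,\mathrm{MB}
```

That range is well below the ~120 MB LyX was reported to use when opening the corresponding file, which is consistent with memory not being the decisive argument here.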

Using DOM will not really simplify the code (I'm speaking for the DocBook
export, which is highly similar to HTML). However, it might make its logic
easier to understand for a newcomer. Nevertheless, DOM comes with a more
complex syntax: with SAX, you are only appending content to the file, with
only strings; with DOM, you have to indicate where you want to write
something (with methods like InsertEndChild), and you pass around complete
XML nodes (built from the same strings).

More specifically, in SAX (where stream is mostly a large string object
with helper methods):

stream.writeStartTag("tag");

With DOM, taking the example of TinyXML2 (where document is the root of the
DOM tree and node the node in the tree that is being filled):

node->InsertEndChild( document->NewElement("tag") );

Both are perfectly good choices, though. If we write a thin layer on top of
a DOM writer (as Riki suggested, this would allow decoupling from the
actual XML library), we might be able to have a syntax close to that of SAX
while having the extra flexibility of DOM. This way, the LyX code would be
clean and avoid the current intricacies of outputting things at the right
place (in DocBook, especially the  tag).

More specifically, @Pavel: for DocBook, you spend 0% of your time dealing
with images, as that's supposed to be done by the DocBook processor
afterwards. Any gain in the XML part of LyX will be noticeable by the user
for large (book-sized) documents.
(And I wouldn't say that something being O(n) is negligible in this case: I
use exponential-time algorithms daily that work so much faster than
polynomial-time ones…)


Re: XML stream writer library

2021-01-05 Thread Joel Kulesza
On Tue, Jan 5, 2021 at 1:19 AM Pavel Sanda  wrote:

> On Mon, Jan 04, 2021 at 09:48:42PM +0100, Thibaut Cuvelier wrote:
> > There are multiple issues here. What is needed to generate HTML and
> > DocBook is a simple SAX writer, not a parser. I've done plenty of
> > research on it; there's no XML library that does that. Most of them use
> > a DOM, which is a total waste of memory for such an application: it
> > stores a complete XML tree in memory before serialising it. With SAX,
> > you just need a string backend, which is much more lightweight (by
> > several factors).
>
> After a little more thinking: is using DOM actually that big an issue?
> I mean, how much does it take? For a document of length n, it's O(n) in
> space?
>
> Sure, it might be cut to a constant, but practically speaking, when you
> have a 100-page document, what is the real time/memory consumption?
> Timewise, do you spend 1 s in XML compared to the next 30 s converting
> figures to PDF or whatever format? Spacewise, probably about as much
> again as what we already allocate for the document itself.
>
> If using a heavier-weight XML library is not a pain from an API point of
> view (and I do not know, you are the expert here), then we might
> actually consider it, given the difficulties in the SAX space?
>

I had a similar thought and will note that I've had good success on other
projects with pugixml.

Regards,
Joel


Re: XML stream writer library

2021-01-05 Thread Pavel Sanda
On Mon, Jan 04, 2021 at 09:48:42PM +0100, Thibaut Cuvelier wrote:
> There are multiple issues here. What is needed to generate HTML and DocBook
> is a simple SAX writer, not a parser. I've done plenty of research on it;
> there's no XML library that does that. Most of them use a DOM, which is a
> total waste of memory for such an application: it stores a complete XML
> tree in memory before serialising it. With SAX, you just need a string
> backend, which is much more lightweight (by several factors).

After a little more thinking: is using DOM actually that big an issue?
I mean, how much does it take? For a document of length n, it's O(n) in
space?

Sure, it might be cut to a constant, but practically speaking, when you
have a 100-page document, what is the real time/memory consumption?
Timewise, do you spend 1 s in XML compared to the next 30 s converting
figures to PDF or whatever format? Spacewise, probably about as much again
as what we already allocate for the document itself.

If using a heavier-weight XML library is not a pain from an API point of
view (and I do not know, you are the expert here), then we might actually
consider it, given the difficulties in the SAX space?

Pavel


Re: XML stream writer library

2021-01-04 Thread Richard Kimberly Heck
On 1/4/21 5:10 PM, Pavel Sanda wrote:
> On Mon, Jan 04, 2021 at 09:48:42PM +0100, Thibaut Cuvelier wrote:
>> My recommendation, based on a quite long study of XML libraries (i.e.
>> several years, but quite far from full-time): either use QXmlStreamWriter
>> (which is mostly a SAX implementation in C++) or write our own.
>> QXmlStreamWriter is almost 4k lines long, but it could be substantially
>> simplified in our case (
>> https://github.com/qt/qtbase/blob/54875be84de059374920e4c0deacd13a41caaa13/src/corelib/serialization/qxmlstream.cpp).
>>
>>
>> TinyXML2 (https://github.com/leethomason/tinyxml2), pugixml (
>> https://github.com/zeux/pugixml), and Xerces-C++ (
>> https://xerces.apache.org/xerces-c/) are only DOM-based. There are quite a
>> few C libraries, like libxml2, that can be SAX-like, but C libraries are
>> horrible to use (http://www.xmlsoft.org/examples/testWriter.c).

I did some searching and, yes, I see the problem. Word is that recent
versions of libxml and libxml2 have dependencies on Gnome libraries that
we don't want.

I'll let you know if I get any answers to my question on the Fedora list.


> I do not dare to make any qualified recommendation between the choices
> above. But thinking aloud -- if there de facto isn't an alternative
> to QXmlStreamWriter, would it be hard to separate that class from
> the rest of Qt, fork and include it as an internal LyX routine?
> We would have full control over that code without unnecessary surprises
> of Qt's development.

I was going to suggest something in this spirit.

If, as our usual policy has been, we confine QXmlStreamWriter to
support/, then what that basically means is writing our own LyX API as a
kind of wrapper around the Qt stuff. (Thibaut, if you haven't already,
you might look at the FileName class; much of it is a wrapper around
QFile.) Some, even many, of the routines might just directly call the Qt
equivalent (probably after a call to toqstr, from qstring_helpers). This
would be a relatively quick way to get something that worked and was
easy to use, and work on adapting DocBook and HTML export to this code
could proceed.

At that point, we could then write our own XML backend, possibly
adapting it from the Qt code. There are quite a few dependencies there,
but I'd guess we do not need some of them (e.g., the QApplication and
QFile dependencies). Or we build a lightweight library from scratch.
(It does seem like maybe there's a general need for that.) With the
already functioning backend from QXmlStreamWriter, it would be easy to
test our own code and make sure it was producing the same output.

Riki




Re: XML stream writer library

2021-01-04 Thread Pavel Sanda
On Mon, Jan 04, 2021 at 09:48:42PM +0100, Thibaut Cuvelier wrote:
> My recommendation, based on a quite long study of XML libraries (i.e.
> several years, but quite far from full-time): either use QXmlStreamWriter
> (which is mostly a SAX implementation in C++) or write our own.
> QXmlStreamWriter is almost 4k lines long, but it could be substantially
> simplified in our case (
> https://github.com/qt/qtbase/blob/54875be84de059374920e4c0deacd13a41caaa13/src/corelib/serialization/qxmlstream.cpp).
> 
> 
> TinyXML2 (https://github.com/leethomason/tinyxml2), pugixml (
> https://github.com/zeux/pugixml), and Xerces-C++ (
> https://xerces.apache.org/xerces-c/) are only DOM-based. There are quite a
> few C libraries, like libxml2, that can be SAX-like, but C libraries are
> horrible to use (http://www.xmlsoft.org/examples/testWriter.c).

I do not dare to make any qualified recommendation between the choices
above. But thinking aloud -- if there de facto isn't an alternative
to QXmlStreamWriter, would it be hard to separate that class from
the rest of Qt, fork and include it as an internal LyX routine?
We would have full control over that code without unnecessary surprises
of Qt's development.

Pavel


Re: XML stream writer library

2021-01-04 Thread Yuriy Skalko

> TinyXML2 (https://github.com/leethomason/tinyxml2), pugixml (
> https://github.com/zeux/pugixml), and Xerces-C++ (
> https://xerces.apache.org/xerces-c/) are only DOM-based. There are quite a
> few C libraries, like libxml2, that can be SAX-like, but C libraries are
> horrible to use (http://www.xmlsoft.org/examples/testWriter.c).


There are several C++ wrappers for libxml2 on GitHub. Maybe they can be 
useful:


https://github.com/libxmlplusplus/libxmlplusplus
https://github.com/rioki/libxmlmm


Yuriy


Re: XML stream writer library

2021-01-04 Thread Thibaut Cuvelier
On Mon, 4 Jan 2021 at 20:30, Richard Kimberly Heck  wrote:

> On 1/3/21 3:37 PM, Lorenzo Bertini wrote:
>
> Hello list,
>
> In 12055 <https://www.lyx.org/trac/ticket/12055>, discussing the merge of
> some MathMLStream and XmlStream components, we were contemplating the
> possibility of using an external library to handle XML streams, for example
> with indentation and tag insertion. One of the candidates was
> QXmlStreamWriter <https://doc.qt.io/qt-5/qxmlstreamwriter.html> class,
> but with the talk about removing unnecessary Qt components we thought to
> ask the list.
>
> Let us know what you think is the best course, and whether you know of
> other libraries we should look at.
>
> As I mention in the bug, I looked over various XML libraries a while ago,
> when I was thinking about the long-standing idea of converting LyX's own
> format to XML. There seemed to be a myriad of options, and I never settled
> upon one. But it looks like there's a general feeling that we don't want to
> get too married to Qt---any more than we already are. That is in part
> because Qt seems to break itself fairly frequently (especially on OSX) and
> partly because they keep changing their attitude towards open source. There
> was something not long ago about how recent updates would only be
> available to paid subscribers right away, or something like that.
>
> So I'd generally suggest searching around for good, well-maintained XML
> libraries, maybe asking on Stack Exchange what people like. I'll send an
> email to the Fedora list and see what suggestions pop up.
>
There are multiple issues here. What is needed to generate HTML and DocBook
is a simple SAX writer, not a parser. I've done plenty of research on
this, and there's no XML library that does that. Most of them use a DOM,
which is a total waste of memory for such an application: it stores a
complete XML tree in memory before serialising it. With SAX, you just need
a string backend, which is several times more lightweight. In this case,
as the content is generated without ever looking back, SAX is the best
choice.
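To make the DOM-versus-streaming point concrete, here is a minimal sketch of a SAX-style (streaming) writer. It writes tags straight to an output stream and keeps nothing but the stack of open element names, so its state grows with nesting depth rather than with document size. This is not QXmlStreamWriter's API or LyX's XmlStream. The class `XmlStreamSketch` and its methods are hypothetical, and the sketch assumes UTF-8 text, no namespaces and no indentation.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal streaming XML writer (not Qt's or LyX's API).
// Output goes straight to the stream; the only state kept is the stack of
// open element names and whether a start tag is still open for attributes.
class XmlStreamSketch {
public:
    explicit XmlStreamSketch(std::ostream & os) : os_(os) {}

    void startElement(std::string const & name) {
        closeStartTag();
        os_ << '<' << name;
        open_.push_back(name);
        inStartTag_ = true;  // attributes may still follow
    }

    void attribute(std::string const & name, std::string const & value) {
        assert(inStartTag_ && "attribute() must follow startElement()");
        os_ << ' ' << name << "=\"" << escape(value, true) << '"';
    }

    void characters(std::string const & text) {
        closeStartTag();
        os_ << escape(text, false);
    }

    void endElement() {
        assert(!open_.empty() && "endElement() without matching start");
        if (inStartTag_) {
            os_ << "/>";  // nothing was written inside: <name/>
            inStartTag_ = false;
        } else {
            os_ << "</" << open_.back() << '>';
        }
        open_.pop_back();
    }

    bool balanced() const { return open_.empty(); }

private:
    void closeStartTag() {
        if (inStartTag_) {
            os_ << '>';
            inStartTag_ = false;
        }
    }

    // Escape the characters that are special in XML text and attributes.
    static std::string escape(std::string const & s, bool inAttribute) {
        std::string out;
        for (char c : s) {
            switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += inAttribute ? "&quot;" : "\""; break;
            default: out += c;
            }
        }
        return out;
    }

    std::ostream & os_;
    std::vector<std::string> open_;
    bool inStartTag_ = false;
};
```

For example, `startElement("para")`, `characters("x < y")`, `endElement()` writes `<para>x &lt; y</para>` immediately. No tree is built at any point.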

You have more choices in the Java world, and the standard library is often
enough (well, the standard extensions javax and JAXP). If you need a good
XML tool, chances are it will be written in Java, especially if it's open
source (Saxon for XSLT or XQuery, eXist or MarkLogic for XML database).

On the other hand, if you want to represent a complete LyX document and
work on it, you'd rather go for DOM, as you will always have the whole
structure in memory: you may want to edit things at any point in the
document (unless there is never an operation on the file structure, only
on the set of insets of the document).

My recommendation, based on a quite long study of XML libraries (i.e.
several years, but quite far from full-time): either use QXmlStreamWriter
(which is mostly a SAX implementation in C++) or write our own.
QXmlStreamWriter is almost 4,000 lines long, but it can be substantially
simplified in our case (
https://github.com/qt/qtbase/blob/54875be84de059374920e4c0deacd13a41caaa13/src/corelib/serialization/qxmlstream.cpp).


TinyXML2 (https://github.com/leethomason/tinyxml2), pugixml (
https://github.com/zeux/pugixml), and Xerces-C++ (
https://xerces.apache.org/xerces-c/) are only DOM-based. There are quite a
few C libraries, like libxml2, that can be SAX-like, but C libraries are
horrible to use (http://www.xmlsoft.org/examples/testWriter.c).


Re: XML stream writer library

2021-01-04 Thread Richard Kimberly Heck
On 1/3/21 3:37 PM, Lorenzo Bertini wrote:
>
> Hello list,
>
> In 12055 <https://www.lyx.org/trac/ticket/12055>, discussing the merge
> of some MathMLStream and XmlStream components, we were contemplating
> the possibility of using an external library to handle XML streams,
> for example with indentation and tag insertion. One of the candidates
> was QXmlStreamWriter <https://doc.qt.io/qt-5/qxmlstreamwriter.html>
> class, but with the talk about removing unnecessary Qt components we
> thought to ask the list.
>
> Let us know what you think is the best course, and whether you know
> of other libraries we should look at.
>
As I mention in the bug, I looked over various XML libraries a while
ago, when I was thinking about the long-standing idea of converting
LyX's own format to XML. There seemed to be a myriad of options, and I
never settled upon one. But it looks like there's a general feeling that
we don't want to get too married to Qt---any more than we already are.
That is in part because Qt seems to break itself fairly frequently
(especially on OSX) and partly because they keep changing their attitude
towards open source. There was something not long ago about how recent
updates would only be available to paid subscribers right away, or
something like that.

So I'd generally suggest searching around for good, well-maintained XML
libraries, maybe asking on Stack Exchange what people like. I'll send an
email to the Fedora list and see what suggestions pop up.

Riki




XML stream writer library

2021-01-03 Thread Lorenzo Bertini
Hello list,

In 12055 <https://www.lyx.org/trac/ticket/12055>, discussing the merge
of some MathMLStream and XmlStream components, we were contemplating the
possibility of using an external library to handle XML streams, for
example with indentation and tag insertion. One of the candidates was
QXmlStreamWriter <https://doc.qt.io/qt-5/qxmlstreamwriter.html> class,
but with the talk about removing unnecessary Qt components we thought to
ask the list.

Let us know what you think is the best course, and whether you know of
other libraries we should look at.

Lo (lynx in trac).



Re: writer [and Insets]

1999-02-10 Thread Allan Rae

On Tue, 9 Feb 1999, John Weiss wrote:
[...]
> > How often do we add new insets? 
> > (I'd say more often than new writers)
> 
> Therein lies a major problem:
> 
> Every time someone wants to support some new whiz-bang LaTeX package,
> or an SGMLDocBook feature, and so on, they go and add:
>1) A new Inset for it, for the LyX document editing pane.
>2) A new menu entry for it.
>3) A [possibly] new submenu for it.
>4) A new popup, or subpopup, or new widgets in an existing popup.
> 
> Clearly, this approach is not scalable.

[other good scalability points snipped...]

> Why not instead make the kernel --- including output formats, Insets,
> display, and popups --- generic?  I said this many moons ago: after
> cleanup, LyX should be extensible through *.layout files for 90% of
> new features.  This is what TeX/LaTeX did; they seldom need to alter
> the core code, which is highly stable.
> 
> Yes, it will require more work to do this, especially with popups and,
> to a degree, Insets.  Figuring out a generic scheme by which something
> like a *.layout file can specify a new popup will be *very* tricky.
> Doing the same for insets won't be quite as hard, but will still be
> tricky.  I think, however, that we can lay the foundation for this by
> 1.1.

> I realize I'm all talk at the moment.  Truth be told, I haven't looked
> at the source for 1.1.  I would like to, however.  I'd like to
> stimulate a discussion on this.
> 
> Some Points:
>   Support for optional arguments, or secondary/tertiary arguments,
>   to environments are accreting onto the Paragraph Layout popup.
>   For most environments, however, that support need only be a popup
>   with a line edit and a descriptive label for each arg, plus some
>   overall description for these.
> 
>   Certain optional args lend themselves to "fixed" collapsible
>   insets.  These are insets like a footnote, but with a full-line,
>   normal font label.  [E.g.: Think short-form section headings and
>   captions.]

My latest email on the "writer" thread proposed generic handling of the
optional arguments (admittedly I didn't spell out how this would all
work).  The generic begin(...), end(...), and command(...) member
functions of the Writer hierarchy are driven by definitions in the .layout
files.  The LaTeXName parameter's format would be changed to include the
list of options, such as "section[]{}".  Then, if we have generic insets
based on our current breakup of paragraph types (ITEM_ENVIRONMENT,
ENVIRONMENT, COMMAND, etc.), these insets can handle the various data as
required, probably in a vector<LString>, since each new command/environment
has a different number of options and only compulsory options must be
present in the call to begin(...) or command(...).
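The idea of driving command(...) from a LaTeXName spec such as "section[]{}" plus a vector of arguments can be sketched as follows. This is a minimal illustration under stated assumptions, not LyX code. `std::string` stands in for LString, the helpers `parseSpec` and `writeCommand` are hypothetical, and an empty string is assumed to mean "optional argument not given".

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// One argument slot of a spec such as "section[]{}" (hypothetical format,
// following the example in the mail).
struct ArgSlot {
    bool optional;  // true for "[]", false for "{}"
};

// Split a spec like "section[]{}" into the command name and its slots.
void parseSpec(std::string const & spec, std::string & name,
               std::vector<ArgSlot> & slots) {
    std::size_t pos = spec.find_first_of("[{");
    name = spec.substr(0, pos);
    slots.clear();
    while (pos != std::string::npos) {
        char const close = (spec[pos] == '[') ? ']' : '}';
        if (pos + 1 >= spec.size() || spec[pos + 1] != close)
            throw std::invalid_argument("malformed spec: " + spec);
        slots.push_back(ArgSlot{spec[pos] == '['});
        pos = spec.find_first_of("[{", pos + 2);
    }
}

// Write the command with one argument per slot (std::string for LString).
// An empty optional argument is skipped; compulsory ones are always written.
std::string writeCommand(std::string const & spec,
                         std::vector<std::string> const & args) {
    std::string name;
    std::vector<ArgSlot> slots;
    parseSpec(spec, name, slots);
    assert(args.size() == slots.size() && "one argument per slot");
    std::string out = "\\" + name;
    for (std::size_t i = 0; i < slots.size(); ++i) {
        if (slots[i].optional) {
            if (!args[i].empty())
                out += '[' + args[i] + ']';
        } else {
            out += '{' + args[i] + '}';
        }
    }
    return out;
}
```

With the spec "section[]{}", the arguments {"Intro", "Introduction"} give `\section[Intro]{Introduction}`, and {"", "Introduction"} gives `\section{Introduction}`. Because the layout string alone describes the arguments, adding a new command would not need new C++ code.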

That leaves the generic popup.  I think it was probably you, John, who
suggested ages ago that we have a generic popup with six or seven text
input fields for such a purpose.  The problem of accessing the popup is a
little trickier, since we have two scopes: document and paragraph.
Paragraph level is easy enough -- right mouse click on the paragraph
brings up a menu appropriate to that paragraph with a few other useful
entries perhaps (this is already planned anyway).  We might offer
a "Paragraph format -- Document Scope" option on that menu; otherwise,
document scope is a bit trickier.  Hmm... how would a given paragraph
find out what its corresponding document scope settings are? 

The labels for said input fields would be kept in the .layouts (and hence
they'd need to be included in POTFILES).

> Some Questions:
>   How expensive would it be to make the Insets a base class with
>   many protected members, member to perform the most frequently
>   used operations for any Inset?
>   [The idea:  special Insets, e.g. Math Mode, would subclass from
>   this and use this "inset toolkit" internally.  There would also
>   be one child class, "GenericInset", that implements any of these
>   operations/features depending on the use of constructor flags.]

I'm not sure what you're getting at here.  What effect will the
constructor flags have:  reconfiguring which inherited methods are
supported?

>   What are common features of LaTeX, SGML, XML, HTML, etc., that
>   we'll need to support?

The ones we currently support now:
fonts
item environments
environments
commands
specialised versions of above:
ref
toc
figures
tables
math
footnotes/marginnotes

>   What existing features fit a similar schema?

What exciting new packages might also fit this schema?
And better yet which ones don't?

>   What features could use a similar popup?  Similar insets?
> 
Most environments should be able to be supported by the environment inset
as described above (and likewise the command inset handling o


Re: writer

1999-02-09 Thread Lars Gullik Bjønnes

  >> Allan Rae writes:

  >>  class PainterWriter : Writer { writeInsetUrl(<data args in
  >> inseturl>) { painter.drawButton(text text); } };
  >> 
  >> Think of a verbatim inset:
  >> 
  >> InsetVerbatim::write(Writer wri) {
  >> wri.writeInsetVerbatim(vector<verbatimvalidobjects> vec); }
  >> 
  >> LaTeXWriter::writeInsetVerbatim(vector<verbatimvalidobjects> vec)
  >> { filestr << "\\begin{verbatim}" << endl for (a = vec.begin(); a
  >> != vec.end(); ++a) { (*a).write(this);

  AR> What exactly is in a (*a)? I'd have thought it would be an

There is a declaration missing:

vector<verbatimvalidobjects>::iterator a;

  AR> LString but it looks like you are having other insets -- hmm...
  AR> multiple paragraphs in a verbatim inset one insetparagraph each
  AR> perhaps?

  >> } filestr.endineol(); filestr << "\\end{verbatim}" << endl; }
  >> 
  >> Well I don't know anymore, but it seemed like nice idea when I
  >> was taking a shower. Especially since it hid all the hairy stuff.

  AR> Trying to come up with an alternate scheme is difficult. Below
  AR> is a scheme that tries to push the writers and insets to be more
  AR> independent. It does however introduce some complications for

I am not sure if I see this scheme separating writer and inset more. I
will claim that my solution using only public methods of the insets
separates them more.

  AR> the writers (and maybe maintenance difficulties later -- unless
  AR> we are careful in implementing the state machines). It all
  AR> involves making writers look like ostreams. I'll show you some
  AR> code (no warranty implied ;) :

  AR> class Writer { enum {command_start, command_end,
  AR> command_option_start, command_option_end,verbatim_start,
  AR> verbatim_end...} WriterStyles; Writer(LString const &);
  AR> //filename virtual Writer & operator<< (LString const &) = 0;
  AR> virtual Writer & operator<< (int const &) = 0; virtual Writer &
  AR> operator<< (WriterStyles const &) = 0; ... }

  AR> class LaTeXWriter : Writer { public: LaTeXWriter(LString const
  AR> &); // filename virtual Writer & operator<< (LString const &);
  AR> virtual Writer & operator<< (int const &); virtual Writer &
  AR> operator<< (WriterStyles const &); ... private:
  AR> auto_ptr<ostream> our_output_file; }

  AR> Writer & LaTeXWriter::operator<< (WriterStyles const & ws) {
  AR> static LWriterState state = default_state; switch (state) { case

this state thing is not something I care a lot for.

  AR> default_state: switch (ws) { case verbatim_start: state =
  AR> in_environment; *our_output_file << "\\begin{verbatim}"; // we
  AR> could get fancy and use a number of different // optimizations
  AR> such as: // enviro_name = "verbatim"; // and then catching all
  AR> environment ends and // only having one output statement. break;
  AR> case *_end: // error no _end's allowed in default_state ...
  AR> break; ... } case in_environment: switch (ws) { case
  AR> verbatim_end: state = default_state; *our_output_file <<
  AR> "\\end{verbatim}" << endl; // another thing we'd probably do is
  AR> to do our // initial output to an LString buffer like we do //
  AR> now so we can look backwards in the output // and break long
  AR> lines or add/remove '\n'. break; ... } ... } return *this; }

  AR> class InsetVerbatim : Inset { virtual write(Writer &) const; ...
  AR> }

  AR> Writer & InsetVerbatim::write(Writer & wr) const { return wr <<
  AR> WriterStyles::verbatim_start << contents <<
  AR> WriterStyles::verbatim_end; }

So for every kind of environment we will need a *_start and *_end ?

  AR> Of course it'd be nice to be able to write: LaTeXWriter
  AR> lwr("somefile.tex") << preamble_and_stuff(); //
  AR> preamble_and_stuff() could actually be in an // InsetPreamble or
  AR> something similar. for (document_structure::const_iterator iter
  AR> = buffer.begin(); iter != buffer.end(); ++iter) { lwr <<
  AR> (*iter); } lwr << closing_of_document();

Buffer.write(Writer*w);

Sounds nice.

  AR> where (*iter) is effectively any inset. So we therefore need to
  AR> add:

  AR> Writer & operator<< (Writer & wr, InsetVerbatim const & iv) {
  AR> return iv.write(wr); }

  AR> and likewise for all the other different insets *or* I think the
  AR> following will work:

  AR> Writer & operator<< (Writer & wr, Inset const & inset) { return
  AR> inset.write(wr); }

inset.write(this); 

Hmm, unless you write these as non-members, of course,

and drop one of the parameters.

  AR> Thus giving us a double-dispatch but if we inline this one it
  AR> shouldn't cost much in time or space.
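The ostream-like scheme quoted above can be sketched in modern C++ to check that the pieces fit together. This is a hypothetical illustration, not LyX code. `std::string` replaces LString, only the verbatim styles are modelled, and the writer state is a data member rather than the function-local `static` in the quoted code, because a static would be shared by every LaTeXWriter instance.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical sketch; std::string stands in for LString.
enum class WriterStyle { verbatim_start, verbatim_end };

class Writer {
public:
    virtual ~Writer() = default;
    virtual Writer & operator<<(std::string const & s) = 0;
    virtual Writer & operator<<(WriterStyle ws) = 0;
};

class LaTeXWriter : public Writer {
public:
    explicit LaTeXWriter(std::ostream & os) : os_(os) {}

    Writer & operator<<(std::string const & s) override {
        os_ << s;
        return *this;
    }

    // A two-state machine, held per writer object (not a static).
    Writer & operator<<(WriterStyle ws) override {
        switch (ws) {
        case WriterStyle::verbatim_start:
            assert(!inVerbatim_ && "nested verbatim_start");
            os_ << "\\begin{verbatim}\n";
            inVerbatim_ = true;
            break;
        case WriterStyle::verbatim_end:
            assert(inVerbatim_ && "verbatim_end without verbatim_start");
            os_ << "\n\\end{verbatim}\n";
            inVerbatim_ = false;
            break;
        }
        return *this;
    }

private:
    std::ostream & os_;
    bool inVerbatim_ = false;
};

class Inset {
public:
    virtual ~Inset() = default;
    virtual Writer & write(Writer & wr) const = 0;
};

class InsetVerbatim : public Inset {
public:
    explicit InsetVerbatim(std::string contents)
        : contents_(std::move(contents)) {}

    Writer & write(Writer & wr) const override {
        return wr << WriterStyle::verbatim_start << contents_
                  << WriterStyle::verbatim_end;
    }

private:
    std::string contents_;
};

// A single non-member operator<< for every inset type: the virtual write()
// selects the inset, then the virtual operator<< selects the writer.
inline Writer & operator<<(Writer & wr, Inset const & inset) {
    return inset.write(wr);
}
```

The generic `operator<<(Writer &, Inset const &)` proposed above is therefore enough. No per-inset overload is needed, and the cost is two virtual calls per inset.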

  AR> The scheme above makes the writers look somewhat uglier than
  AR> Lars scheme does but it should also be a bit more independent of
  AR> the insets. The hardest part is going to be the state machines
  AR> in each of the writers WriterStyles handlers. For example:

  AR> class PainterWriter : Writer { public: PainterWriter(Painter &);
  AR> virtual Writer & operator<< (LString const &); virtual Writer &
  AR> operator<< (int const &); virtual Writer & operator<<
  AR> (WriterStyles const &); ... private: enum {defau

Re: writer

1999-02-09 Thread Jean-Marc Lasgouttes

>>>>> "Joacim" == Joacim Persson <[EMAIL PROTECTED]> writes:

Joacim> It's also the most obvious object-oriented approach. Maybe one
Joacim> day we want to have dynamically loadable insets like `ez' has
Joacim> (wp for AUIS).  That would be a lot easier to implement if all
Joacim> the inset-specific code is in one place.  I missed the start
Joacim> of this thread; was there some problem with this method?  How
Joacim> will these three designs affect the user interface design btw?

At least, if we have a script language, it should be possible to write
(simple) insets in this script language...

Joacim> I can imagine cases when the objects (insets) will be sending
Joacim> their output to something else than a stream, and cases when
Joacim> some insets doesn't implement output methods for all formats;
Joacim> Ascii from figure insets for instance -- wouldn't make much
Joacim> sense in most cases (ascii-graphics? =) (hmm.. it could be
Joacim> nice to be able to search for a figure by its filename; that
Joacim> could be something to export to a find&replace `writer')

Each inset could have a 'bool matches(LString)' method which is called
by the search function. Replace would be more difficult, of course.
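The `matches()` suggestion keeps the search code ignorant of concrete inset types. Here is a minimal sketch, assuming `std::string` in place of LString and plain substring matching. The classes `InsetText` and `InsetFigure` and the function `findFirst` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch; std::string stands in for LString.
class Inset {
public:
    virtual ~Inset() = default;
    // Does this inset match the search string?
    virtual bool matches(std::string const & what) const = 0;
};

class InsetText : public Inset {
public:
    explicit InsetText(std::string text) : text_(std::move(text)) {}
    bool matches(std::string const & what) const override {
        return text_.find(what) != std::string::npos;
    }
private:
    std::string text_;
};

// A figure has no text of its own, but can match on its file name,
// as suggested in the quoted message.
class InsetFigure : public Inset {
public:
    explicit InsetFigure(std::string filename)
        : filename_(std::move(filename)) {}
    bool matches(std::string const & what) const override {
        return filename_.find(what) != std::string::npos;
    }
private:
    std::string filename_;
};

// The search loop knows nothing about concrete inset types.
// Returns the index of the first matching inset, or -1 if there is none.
int findFirst(std::vector<std::unique_ptr<Inset>> const & insets,
              std::string const & what) {
    for (std::size_t i = 0; i < insets.size(); ++i) {
        assert(insets[i] && "null inset in document");
        if (insets[i]->matches(what))
            return static_cast<int>(i);
    }
    return -1;
}
```

As noted above, replace is harder. It would need a second virtual method that changes the inset, and a figure would have to decide what replacing its file name means.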

JMarc



Re: writer

1999-02-09 Thread Asger Alstrup Nielsen

> Note that I have said nothing about how the writing will actually be
> done. Look at Asger's proposal, a lot of reuse of code there. However
> he tries to abstract all writing, and I don't think that is easily
> feasible or maintainable.

I haven't had time to read up on all of the issues in this yet, but let me
clarify one thing:

The Writer stuff I put in there is only one element in the entire design, and
it's a fairly finished part.

I imagine that we will use a layered design approach. The stuff I did is the
very lower part that will help make the upper classes simpler, because they
don't have to bother about wrapping, margins and stuff like that.

Notice that it's entirely possible to create a new class that will exploit my
writer class to do some of the dirty work.  I don't think we should add much
more into the ones I have written.
We should try to keep each component relatively small and understandable.  In
particular, I don't think we should derive a giant LaTeXWriter from the
AsciiWriter thing I have written.
Rather, the LaTeXWriter should be a fairly small class that simply dispatches
all the different tasks to other components, such as the formatting class I
presented.
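In the layered design described here, a small LaTeXWriter delegates formatting to a separate lower-level component instead of inheriting from AsciiWriter. It might look roughly like the sketch below, which rests on several assumptions. `LineWrapper` and this `LaTeXWriter` are hypothetical classes, and greedy word wrapping stands in for whatever wrapping and margin logic the real formatting class implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Lower layer (hypothetical): knows about line width only, not about LaTeX.
class LineWrapper {
public:
    explicit LineWrapper(std::size_t width) : width_(width) {
        assert(width > 0);
    }

    // Greedy word wrap: start a new line before a word that would pass
    // the margin. A word longer than the width still gets its own line.
    void addText(std::string const & text) {
        std::istringstream in(text);
        std::string word;
        while (in >> word) {
            if (col_ > 0 && col_ + 1 + word.size() > width_) {
                out_ += '\n';
                col_ = 0;
            }
            if (col_ > 0) {
                out_ += ' ';
                ++col_;
            }
            out_ += word;
            col_ += word.size();
        }
    }

    void endLine() {
        out_ += '\n';
        col_ = 0;
    }

    std::string const & str() const { return out_; }

private:
    std::size_t width_;
    std::size_t col_ = 0;
    std::string out_;
};

// Upper layer (hypothetical): a small LaTeX writer that owns a wrapper
// (composition, not inheritance) and only knows LaTeX syntax.
class LaTeXWriter {
public:
    explicit LaTeXWriter(std::size_t width) : wrap_(width) {}

    void section(std::string const & title) {
        wrap_.addText("\\section{" + title + "}");
        wrap_.endLine();
    }

    void paragraph(std::string const & text) {
        wrap_.addText(text);
        wrap_.endLine();
    }

    std::string const & str() const { return wrap_.str(); }

private:
    LineWrapper wrap_;
};
```

With a width of 20, `section("Intro")` followed by `paragraph("one two three four five six")` yields `\section{Intro}`, `one two three four` and `five six` on separate lines. The wrapper collapses whitespace, so verbatim content would have to bypass it. That is one reason to keep it a separate component.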

--

We should have another real time meeting sometime and get this design done. 
Design by committee over the net is a slow process, because we have to explain
every little detail in detail...

Greets,

Asger



Re: writer

1999-02-08 Thread Jean-Marc Lasgouttes

>>>>> "Allan" == Allan Rae <[EMAIL PROTECTED]> writes:

Allan> The third option which I'm still inclined to prefer (for its
Allan> simplicity) is to have each Inset provide writer specific
Allan> methods.  Such a scheme wouldn't make it any harder for the
Allan> writer to keep a check of the output (to break lines at
Allan> appropriate points or insert extra spaces etc.).  It would also
Allan> be possible to use an ostream syntax for the writers.  Let's
Allan> call this Option3.

I tend to agree that this is the simplest solution. If we want to
have all the output methods in the same place, we can always do what
mathed does: all the Write methods of the different insets are in the
same file.

Allan> Other comments: I like the iostream appearance of my scheme
Allan> with the overloaded operator<<.  Unfortunately, I doubt we
Allan> could modify Lars' scheme to use overloaded operator<< unless
Allan> we changed it to being overloaded on inset types. I think my
Allan> scheme needs something better than the WriterStyles enum to
Allan> configure the Writer stream.  Maybe something similar to the
Allan> ostream manipulators (setw() and the like).

I'd say that << is merely syntactic sugar. Do we really need that?

JMarc



Re: writer

1999-01-20 Thread Amir Karger

On Wed, Jan 20, 1999 at 04:36:54PM +0100, Jean-Marc Lasgouttes wrote:
> >>>>> "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes:
> 
> Lars> Gains: - simplified code in insets - we ensure that all insets
> Lars> can be output with all different writers - when adding new
> Lars> insets, the compiler will barf unless you implement the new
> Lars> writer method in all derived classes.
> 
> The risk is that people will implement empty methods just to compile
> and forget about it later... However, I agree that it will help
> maintenance. 

Sure, but if they don't write something, then the first person who tries to
use that method will find out, won't they? And it will be extremely easy to
pinpoint where the missing code is.
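A sketch of Lars's proposal as quoted above, assuming one pure virtual method per inset kind in an abstract Writer base class. Writer, LaTeXWriter, AsciiWriter and the method names are hypothetical. The completeness check comes from C++ itself: a class that does not override every pure virtual function is abstract and cannot be instantiated.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical abstract writer: one pure virtual method per kind of inset.
// Adding a new pure virtual here (say, a footnote method) leaves every
// concrete writer that does not override it abstract, so any attempt to
// instantiate that writer becomes a compile error.
class Writer {
public:
    virtual ~Writer() = default;
    virtual void writeUrl(std::string const & url, std::string const & name) = 0;
    virtual void writeSpecialChar(char c) = 0;
    std::string str() const { return os_.str(); }
protected:
    std::ostringstream os_;
};

class LaTeXWriter : public Writer {
public:
    void writeUrl(std::string const & url, std::string const &) override {
        os_ << "\\url{" << url << '}';
    }
    // Assumes c is one of % & # $ _, which LaTeX escapes with a backslash.
    void writeSpecialChar(char c) override { os_ << '\\' << c; }
};

class AsciiWriter : public Writer {
public:
    void writeUrl(std::string const & url, std::string const & name) override {
        os_ << name << " <" << url << '>';
    }
    void writeSpecialChar(char c) override { os_ << c; }
};
```

As Jean-Marc points out, the compiler only checks that an override exists: `void writeSpecialChar(char) override {}` would satisfy it while silently dropping output.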

> Lars> Well I don't know anymore, but it seemed like a nice idea when I
> Lars> was taking a shower. 
> 
> You should take showers more often ;)
> 

I should too. I tend to get my best programming ideas there.

Jean-Marc has apparently decided that in e-mails to me, it's always
Friday...

-Amir



Re: writer

1999-01-20 Thread Lars Gullik Bjønnes

*Jean-Marc Lasgouttes writes:
 |  The risk is that people will implement empty methods just to
 | compile and forget about it later... However, I agree that it will
 | help maintenance.

Hopefully they will at least insert a "#warning complete this please".

 |  Are there other places where 'code' is needed? Getting rid of it
 | would certainly be great. I did not understand everything in
 | Allan's propositions in this respect... BTW, isn't this problem
 | related to rtti?

Yes, but to avoid rtti is also Good(tm). I'd really like to think of a
scheme where we could get rid of the Inset::Code.
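One common way to get rid of both Inset::Code and RTTI is double dispatch (the visitor pattern): each inset overrides a virtual write(Writer &) that calls the matching writer method, so the export loop never asks what type an inset is. This is a generic sketch with hypothetical names, not a description of LyX's actual code.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

class Writer {
public:
    virtual ~Writer() = default;
    virtual void writeUrl(std::string const & url, std::string const & name) = 0;
    virtual void writeLabel(std::string const & label) = 0;
};

class Inset {
public:
    virtual ~Inset() = default;
    // Each inset picks the writer method itself: no Code enum, no dynamic_cast.
    virtual void write(Writer & w) const = 0;
};

class InsetUrl : public Inset {
public:
    InsetUrl(std::string url, std::string name)
        : url_(std::move(url)), name_(std::move(name)) {}
    void write(Writer & w) const override { w.writeUrl(url_, name_); }
private:
    std::string url_;
    std::string name_;
};

class InsetLabel : public Inset {
public:
    explicit InsetLabel(std::string label) : label_(std::move(label)) {}
    void write(Writer & w) const override { w.writeLabel(label_); }
private:
    std::string label_;
};

class LaTeXWriter : public Writer {
public:
    void writeUrl(std::string const & url, std::string const &) override {
        out_ << "\\url{" << url << '}';
    }
    void writeLabel(std::string const & label) override {
        out_ << "\\label{" << label << '}';
    }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

// The export loop needs neither a switch on an inset code nor RTTI.
void writeAll(std::vector<std::unique_ptr<Inset>> const & doc, Writer & w)
{
    for (auto const & inset : doc)
        inset->write(w);
}
```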

 |  Why not directly writeInsetUrl(const InsetURL &) {...}
 | 
 | Passing the inset is certainly simpler than a list of arguments
 | (where you can put arguments in the wrong order, etc.).

then the writeInsetUrl (which should perhaps be named just writeUrl)
needs to know a lot more about InsetUrl internals.
(what functions to access to get the required info f.ex.)
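The trade-off discussed here can be shown side by side. In this sketch the InsetUrl accessors getURL() and getName() are hypothetical; writeUrl() takes plain values and never sees the inset, while writeInsetUrl() takes the inset and must know which accessors to call.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical URL inset with illustrative accessor names.
class InsetUrl {
public:
    InsetUrl(std::string url, std::string name)
        : url_(std::move(url)), name_(std::move(name)) {}
    std::string const & getURL() const { return url_; }
    std::string const & getName() const { return name_; }
private:
    std::string url_;
    std::string name_;
};

// Lars's variant: the writer receives plain values and needs no knowledge
// of InsetUrl; the inset itself decides what to pass.
std::string writeUrl(std::string const & url, std::string const & name)
{
    return name + " <" + url + ">";
}

// Jean-Marc's variant: the writer receives the inset and has to know which
// accessors to call, but there is no argument list to get out of order.
std::string writeInsetUrl(InsetUrl const & inset)
{
    return inset.getName() + " <" + inset.getURL() + ">";
}
```

For a URL the two are equally short; Jean-Marc's objection is that for math, table or text insets the plain-argument version could need around ten parameters.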

 |  You should take showers more often ;)

Hmprf, do I stink all the way to France?

Lgb



Re: writer

1999-01-20 Thread Jean-Marc Lasgouttes

>>>>> "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes:

Lars> Yes, but to avoid rtti is also Good(tm). I'd really like to
Lars> think of a scheme where we could get rid of the Inset::Code.

I absolutely agree that avoiding rtti is good. I just wanted to make
sure it was related.

Lars>  | Why not directly writeInsetUrl(const InsetURL &) {...}
Lars>  |
Lars>  | Passing the inset is certainly simpler than a list of arguments
Lars>  | (where you can put arguments in the wrong order, etc.).

Lars> then the writeInsetUrl (which should perhaps be named just
Lars> writeUrl) needs to know a lot more about InsetUrl internals.
Lars> (what functions to access to get the required info f.ex.)

Well, this is of course possible for URLs, but what about math
insets, table insets or text insets? Are you going to pass 10
arguments just to get it right? IMO, this only amounts to more code
(not much, admittedly).

Lars>  | You should take showers more often ;)

Lars> Hmprf, do I stink all the way to France?

No, but when you take showers, your bright ideas fly all the way to
France... 

JMarc