On Sun, Oct 17, 2004 at 10:34:28PM +0200, Andreas Vox wrote:
> Hi LyXers!
>
> I would like to propose an architecture for a C++ wrapper on eXpat or
> libxml for parsing LyX XML. Disclaimer: I'm not a C++ programmer,
> so don't expect correct syntax or correct STL names.
I don't think I agree with the idea.
LyX's C++ wrapper for XML can be fairly close to LyX's internal
structure. There's no need to invent another tree architecture in
parallel to the existing inset hierarchy, so a natural solution would be
something similar to the 'MetaInset' I proposed earlier. This is
sufficient to store data and attributes in a way that makes them more
uniformly accessible from the outside than the current inset hierarchy
with its individual 'hand coded' attributes, yet it plugs into the inset
hierarchy and thus can be handled in the 'customary LyX way'.
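To make the idea a bit more concrete, here is a rough sketch of such a
generic inset. Everything in it is an assumption made for illustration:
'InsetBase' is only a stand-in for the real inset base class, the member
names are invented, and it assumes a C++11 compiler. It is not existing
LyX code.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the inset base class (hypothetical, heavily reduced).
class InsetBase {
public:
    virtual ~InsetBase() = default;
    virtual std::string name() const = 0;
};

// Hypothetical generic inset: attributes are kept in a map rather than
// in individual hand-coded members, so they can be reached uniformly.
class MetaInset : public InsetBase {
public:
    explicit MetaInset(std::string tag) : tag_(std::move(tag)) {}

    std::string name() const override { return tag_; }

    void setAttribute(std::string const & key, std::string const & value)
    {
        attributes_[key] = value;
    }

    // Returns an empty string if the attribute was never set.
    std::string attribute(std::string const & key) const
    {
        auto it = attributes_.find(key);
        return it == attributes_.end() ? std::string() : it->second;
    }

    void addText(std::string const & text) { text_ += text; }
    std::string const & text() const { return text_; }

    // Takes ownership of the child and returns a reference to it.
    MetaInset & addChild(std::unique_ptr<MetaInset> child)
    {
        assert(child != nullptr);
        children_.push_back(std::move(child));
        return *children_.back();
    }

    std::vector<std::unique_ptr<MetaInset>> const & children() const
    {
        return children_;
    }

private:
    std::string tag_;
    std::string text_;
    std::map<std::string, std::string> attributes_;
    std::vector<std::unique_ptr<MetaInset>> children_;
};
```

A parser callback would then only need setAttribute(), addText() and
addChild() to fill such an object, whichever XML library drives it.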
> The first would be a class hierarchy for every type of XML element we
> want to read.
As there is already a rough 1:1 correspondence between insets and the
kind of XML elements we want to write, this class hierarchy is already
there.
> Here's the root class:
>
> XMLement.h
> ===================================================================
>
> template <class Node>
> class XMLement
Why would that be a template? Wouldn't a plain hierarchy make more
sense?
> {
> /// this is the XML element tag
> string const name() const = 0;
>
> /// used to create a new node when an XML element start with tag ==
> name() is found
> Node & create(Attribute** attributes) = 0;
>
> /// parent has this element type, child is an element below
> void addElement(Node & parent, Node & child) = 0;
>
> /// parent has this element type, adds free text
> void addText(Node & parent, string & const) = 0;
>
> /// parent has this element type, adds CDATA text
> void addCDATA(Node & parent, string & const) = 0;
>
> /// checks if attributes and elements correspond to DTD or whatever
> schema
> bool validate(Node & parent) = 0;
>
> /// called when the element's end tag es encountered, gives
> /// possibility to restructure and replace with other element type(s)
> vector<Node> resolve(Node &) = 0;
> }
> ___________________________________________________________________
>
> The next class wraps a parser and uses XMLements to build a tree:
>
> XMLBuilder.h
> ===================================================================
>
> #include "xmlparser.h"
>
> template <class Node>
> class XMLBuilder : private ContentHandler
> {
> XMLBuilder (some flags);
>
> /// uses this element type for parsing
> void register(XMLement & const);
>
> /// does the parsing
> Node & parse(istream &);
>
> private:
>
> /// stores the registered element types
> hashmap<string,XMLement<Node>> registry;
>
> ///
> XMLParser parser;
>
> /// holds the temporary nodes
> stack<Node&> nodeStack;
>
> /// holds the types of temporary nodes
> stack<XMLement&> typeStack;
>
> /// called by XML parser
> void startElement(string & const, Attribute**);
>
> void endElement(String & const);
>
> void characters(String & const);
> }
> ___________________________________________________________________
>
>
>
> Implementation would be straightforward:
>
> XMLBuilder.C
> ===================================================================
>
> include "XMLBuilder.h"
>
> XMLBuilder::XMLBuilder(some flags) : parser(some flags)
> {
> parser.setHandler(this);
> }
>
> void XMLBuilder::register (XMLement & const type)
> {
> registry.put (type.name(), type);
> }
>
> Node XMLBuilder::parse(istream & is)
> {
> nodeStack.clear();
> // anonymous root
> Node & root = new Node()
> nodeStack.push(root);
> parser.parse(is);
> return root.firstChild();
> }
>
>
> void XMLBuilder::startElement(string & const tag, Attribute**
> attributes)
> {
> XMLement & type = registry.get(tag);
> Node node = type.create(attributes);
> typeStack.push(type);
> nodeStack.push(node);
> }
>
> void XMLBuilder::endElement(string & const)
> {
> Node node = nodeStack.pop();
> XMLement type = typeStack.pop();
>
> vector<Node> children = type.resolve(node);
>
> Node parent = nodeStack.top();
> XMLement parenttype = typeStack.top();
> for (vector_iterator<node> i = children.start; i < children.end();
> i++)
> {
> parenttype.addElement(parent, *i )
> }
> }
>
> void XMLBuilder::characters(string & const data)
> {
> Node node = nodeStack.top();
> XMLement type = typeStack.top();
> type.addText(node, data);
> }
>
> stack<Node &> nodeStack;
>
> XMLParser XMLBuilder::parser;
>
> hashmap<string, XMLement<Node>> XMLBuilder::registry;
>
> ___________________________________________________________________
>
>
> An example for parsing trees with just two node types would be:
>
> nodes.h
> ===================================================================
>
> #include "XMLement.h"
>
> struct MyNode {
> bool isLeaf;
> string & data;
> vector<MyNode> children;
> }
>
> class MyLeafType : XMLement <MyNode> {
>
> MyNode create(Attribute**) // I get to lazy to start an extra
> nodes.C file !
> {
> return new MyNode : isLeaf(true) , data("");
> }
Sorry, you lost me. This does not really seem to make sense even if I
replace this children vector with something compilable and interpret
that last return as a constructor of a 'MyNode'. I simply do not see how
this hooks together. IMO to make this work, quite a few 'virtual's are
missing and the template argument is not needed. I.e. we'd be
back at a plain old hierarchy. The 'create' facility looks like
something that should be a class factory with manual registration of
element types.
Now if I call your 'XMLement' an 'Inset' and use the existing
src/factory.[hC] for creation, you describe something fairly similar to
the current LyX structure.
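For illustration only, a plain hierarchy with virtual functions plus a
factory with manual registration could look roughly like the sketch
below. The names 'Element', 'Leaf', 'Branch' and 'ElementFactory' are
made up for this example and do not reflect what src/factory.[hC]
actually contains; a C++11 compiler is assumed.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Plain (non-template) base class; the virtuals do the dispatching.
class Element {
public:
    virtual ~Element() = default;
    virtual std::string name() const = 0;
    virtual void addText(std::string const & text) = 0;
};

// Leaf elements collect the character data they are given.
class Leaf : public Element {
public:
    std::string name() const override { return "leaf"; }
    void addText(std::string const & text) override { data_ += text; }
    std::string const & data() const { return data_; }
private:
    std::string data_;
};

// Branch elements ignore character data in this sketch.
class Branch : public Element {
public:
    std::string name() const override { return "branch"; }
    void addText(std::string const &) override {}
};

// Hypothetical factory: element types are registered by hand under
// their tag name and created on demand.
class ElementFactory {
public:
    using Creator = std::function<std::unique_ptr<Element>()>;

    void registerType(std::string const & tag, Creator creator)
    {
        assert(creator != nullptr);
        creators_[tag] = std::move(creator);
    }

    // Returns a null pointer if the tag was never registered.
    std::unique_ptr<Element> create(std::string const & tag) const
    {
        auto it = creators_.find(tag);
        if (it == creators_.end())
            return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Creator> creators_;
};
```

A parser's start-element callback would call create(tag) and push the
result on a stack; no template parameter is needed for that.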
> So, that's it. The advantages I see:
>
> *** easily adaptable to other XML parsers *** enough flexibility to
> reuse element types or add new types *** concise
>
> For disadvantages it's over to you ;-)
Well, for one, it is not even close to being compilable. Secondly, if
made compilable (at least in the way I think it could work), it would
mimic existing code.
> So what do you think ?
Without further clarification I don't see any real advantage. It looks
like you are looking for a solution for the wrong problem. LyX knows how
to create and handle tree-structured data. No need to change this. The
current 'problem' is to decide _how_ to plug in _which_ XML parser and
whether this should be for reading and writing or only for reading.
My point of view is (a) that the 'which' does not matter too much if
the interface between the LyX core and such a parser is kept to a
minimum [and the proposed 'MetaInset' would be a fairly slim interface]
and (b) that we don't need any external help to _write_ XML.
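As a rough illustration of (b): writing XML by hand needs little more
than escaping and a few stream helpers. The function names below are
invented for this sketch, a C++11 compiler is assumed, and the escaping
covers only the basic special characters.

```cpp
#include <map>
#include <ostream>
#include <sstream>
#include <string>

typedef std::map<std::string, std::string> Attributes;

// Replaces the characters that must not appear literally in XML text
// or in double-quoted attribute values.
std::string escapeXml(std::string const & raw)
{
    std::string out;
    for (char c : raw) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '"': out += "&quot;"; break;
        default:  out += c;
        }
    }
    return out;
}

// Writes <tag key="value" ...>, with attributes in map (sorted) order.
void openTag(std::ostream & os, std::string const & tag,
             Attributes const & attrs = Attributes())
{
    os << '<' << tag;
    for (auto const & a : attrs)
        os << ' ' << a.first << "=\"" << escapeXml(a.second) << '"';
    os << '>';
}

// Writes escaped character data.
void writeText(std::ostream & os, std::string const & text)
{
    os << escapeXml(text);
}

// Writes </tag>.
void closeTag(std::ostream & os, std::string const & tag)
{
    os << "</" << tag << '>';
}
```

Each inset's write routine could emit its own element this way, so the
external parser would only be needed on the reading side.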
Andre'