> Writing an XML parser from scratch for boost should, IMHO, have these
> features:
> [1] It should make use of the Spirit and Regex libraries for XML and XPath
> parsing.
Whilst these libraries might be useful for the parser writer, I don't see any
benefit to requiring their use for a boost XML parser. If a submitted parser
used alternative parsing methods that should be acceptable provided it worked.
It wasn't meant as a requirement, more a suggestion/my opinion on what a Boost XML library should be like. Writing a lexer/parser is a complex task, and I see four possible options:
[1] Write the lexer/parser by hand. This is difficult to get right, and can lead to complex code that is hard to update.
[2] Use flex/bison or equivalents. These are C-based tools and as such are difficult to integrate with C++ (especially since several implementations use K&R C and make use of variables called class!). Also, would we then require that flex/bison be distributed with Boost?
[3] Use Boost.Spirit/Boost.Regex. These are written in C++ and so make use of advanced techniques. For example, the use of templates makes writing the parser as easy as writing BNF grammars! Spirit also uses ternary search trees, which give very fast lookup as an associative-style container. Using these libraries would also prevent wheel reinvention and make the code more boostified.
[4] Use another lexer/parser generator. This is an unknown quantity, and it raises the same question as [2] about distribution with Boost.
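To make the ternary search tree remark in [3] concrete, here is a minimal sketch of the data structure itself. It is not Spirit's implementation; the class name ternary_search_tree, the int payload and the tst_usage function are my own assumptions, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Minimal ternary search tree mapping non-empty string keys to int values.
// Illustrative only: this is NOT Spirit's implementation.
class ternary_search_tree {
    struct node {
        char ch;                          // character this node splits on
        bool has_value;                   // true if a key ends at this node
        int value;
        std::unique_ptr<node> lo, eq, hi; // smaller / next character / larger
        explicit node(char c) : ch(c), has_value(false), value(0) {}
    };
    std::unique_ptr<node> root_;

public:
    // Insert key, overwriting any existing value. Empty keys are ignored.
    void insert(const std::string& key, int value) {
        if (key.empty()) return;
        std::unique_ptr<node>* slot = &root_;
        std::size_t i = 0;
        for (;;) {
            if (!*slot) slot->reset(new node(key[i]));
            node& n = **slot;
            if (key[i] < n.ch) slot = &n.lo;
            else if (key[i] > n.ch) slot = &n.hi;
            else if (i + 1 == key.size()) { n.has_value = true; n.value = value; return; }
            else { slot = &n.eq; ++i; }
        }
    }

    // Return a pointer to the stored value, or nullptr if key is absent.
    const int* find(const std::string& key) const {
        if (key.empty()) return nullptr;
        const node* n = root_.get();
        std::size_t i = 0;
        while (n) {
            if (key[i] < n->ch) n = n->lo.get();
            else if (key[i] > n->ch) n = n->hi.get();
            else if (i + 1 == key.size()) return n->has_value ? &n->value : nullptr;
            else { n = n->eq.get(); ++i; }
        }
        return nullptr;
    }
};

// Usage: map the five predefined XML entity names to their characters.
inline void tst_usage() {
    ternary_search_tree entities;
    entities.insert("lt", '<');
    entities.insert("gt", '>');
    entities.insert("amp", '&');
    entities.insert("apos", '\'');
    entities.insert("quot", '"');
    assert(*entities.find("amp") == '&');
    assert(*entities.find("apos") == '\'');
    assert(entities.find("am") == nullptr);   // a prefix is not a key
    assert(entities.find("nbsp") == nullptr); // not predefined in XML
}
```

Each lookup compares one character per node visited, so a miss is often detected after only a few comparisons, which is what makes this kind of tree attractive for keyword or entity tables.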
> [2] It should conform to the following W3C standards:
> (b) DOM 1.0/2.0/3.0
Hmm. The DOM standards in particular are very Java oriented, and don't
necessarily make for efficient C++ bindings. I can see that the parser needs
to provide the same set of facilities though, even if it is done in a
different way.
If you are writing a program that interacts with XML via a scripting language, then DOM bindings would be needed (especially if you want a browser that can, for example, control SVG objects when the user interacts with them). I know that this could be done using the MS parser, but what if you wanted good Unicode support for something like MathML?
I also agree that efficient C++ bindings would be very desirable. What about a C++-to-DOM binding wrapper?
e.g. boost::w3c::dom::DOMElement< boost::xml::element >
NOTE: This is just a suggestion. That way, you can make the C++ versions very efficient, while the DOM versions will have a wrapper layer to them.
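A rough sketch of what such a wrapper layer might look like follows. Everything in it is hypothetical: the namespaces xml and w3c::dom stand in for boost::xml and boost::w3c::dom, the element struct is a placeholder for an efficient native type, and only three DOM-style members are shown.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace xml {                 // hypothetical stand-in for boost::xml
// Placeholder for an efficient, C++-style element type.
struct element {
    std::string name;
    std::vector<element> children;
};
}

namespace w3c { namespace dom { // hypothetical stand-in for boost::w3c::dom
// Thin adaptor that exposes W3C-DOM-style member names over any element
// type providing `name` and `children`. It holds a pointer, not a copy,
// so the wrapped element must outlive the wrapper.
template <typename Element>
class DOMElement {
    Element* impl_;
public:
    explicit DOMElement(Element& e) : impl_(&e) {}

    // DOM: Element.tagName
    const std::string& getTagName() const { return impl_->name; }

    // DOM: Node.hasChildNodes()
    bool hasChildNodes() const { return !impl_->children.empty(); }

    // DOM: Node.appendChild(newChild); returns a wrapper for the added node.
    // The returned wrapper is invalidated if a later append reallocates
    // the underlying vector.
    DOMElement appendChild(const Element& child) {
        impl_->children.push_back(child);
        return DOMElement(impl_->children.back());
    }
};
}}

// Usage: the native type stays plain; the DOM view is layered on top.
inline void dom_wrapper_usage() {
    xml::element svg{"svg", {}};
    w3c::dom::DOMElement<xml::element> doc_elem(svg);
    assert(doc_elem.getTagName() == "svg");
    assert(!doc_elem.hasChildNodes());

    xml::element circle{"circle", {}};
    w3c::dom::DOMElement<xml::element> added = doc_elem.appendChild(circle);
    assert(added.getTagName() == "circle");
    assert(svg.children.size() == 1); // the change is visible natively
}
```

Code that only needs the C++ interface never pays for the wrapper, while scripting bindings can be written against the DOM-named members.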
IMHO, the base parser should provide an API on which other things can be
built. For example, provided the facilities are present to retrieve the
information needed for XPath processing, the core API doesn't need to have an
XPath processor. Likewise for XSLT.
Agreed. XML, DTD and XPath parsing, plus structure navigation with Unicode support, are all that is required for the base level. That is why I put the others as optional. It would be nice if the library supported XPath navigation, XSLT, and DTD/XMLSchema validation, though, as these are common facilities.
However, I think it is important that the library does include add-on APIs for
as much of the supporting standards as possible, such as DOM-like processing,
XPath node selection, DTD and XMLSchema validation, and XSLT.
Agreed. Perhaps it would be best to organise according to facilities:
boost::xml::dom    -- C++ DOM bindings
boost::w3c::dom    -- W3C DOM bindings (requires boost::xml::dom)
boost::xml::xpath  -- XPath parsing/navigation
boost::xml::xslt   -- requires boost::xml::dom and boost::xml::xpath
boost::xml::xslfo  -- requires boost::xml::xslt
boost::xml::mathml -- requires boost::xml::dom
etc.
This way, the user can include whichever APIs he/she wants with minimal dependencies.
> [4] It should provide XPath bindings to the XML DOM in a clean way; I
> personally like the MS selectSingleNode/selectNodes extension to the XML
> node DOM interface.
There is no point in providing XPath support if it's painful to use.
I was thinking in terms of a W3C DOM. If we are thinking in terms of C++ bindings, the usage could be like this:
boost::xml::dom::node root;
// ...
// select a single node - note usage of array-style notation:
boost::xml::dom::node sel = root[ boost::xml::xpath::expr( L"/*[1]" )];
// select a collection of nodes - can accept an XPath string or XPath expression
boost::xml::xpath::result_set math( root, L"//m:math" );
This would give a cleaner interface between XML and XPath. (NOTE: I have implemented this style of syntax for my MS-XML wrappers).
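As a rough illustration of how the array-style selection could be wired up, here is a toy sketch. It is not my MS-XML wrapper code and not a real XPath engine: the xml::xpath and xml::dom namespaces are hypothetical stand-ins, only child steps of the form name, * and name[n] are handled, the node that operator[] is applied to is treated as the document root, and a pointer is returned (null when nothing matches) rather than a node by value. The result_set form is not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

namespace xml {
namespace xpath {   // hypothetical; a tiny subset of XPath location paths
struct step { std::wstring name; std::size_t position; }; // position 0 = none

class expr {
    std::vector<step> steps_;
public:
    explicit expr(const std::wstring& text) {
        std::size_t i = 0;
        while (i < text.size()) {
            if (text[i] == L'/') {
                if (i + 1 < text.size() && text[i + 1] == L'/')
                    throw std::invalid_argument("'//' is not supported in this sketch");
                ++i;
                continue;
            }
            step s; s.position = 0;
            while (i < text.size() && text[i] != L'/' && text[i] != L'[') s.name += text[i++];
            if (s.name.empty()) throw std::invalid_argument("missing name test");
            if (i < text.size() && text[i] == L'[') {
                ++i;
                while (i < text.size() && text[i] >= L'0' && text[i] <= L'9')
                    s.position = s.position * 10 + static_cast<std::size_t>(text[i++] - L'0');
                if (i == text.size() || text[i] != L']' || s.position == 0)
                    throw std::invalid_argument("only [n] predicates with n >= 1 are supported");
                ++i;
                if (i < text.size() && text[i] != L'/')
                    throw std::invalid_argument("unexpected text after predicate");
            }
            steps_.push_back(s);
        }
    }
    const std::vector<step>& steps() const { return steps_; }
};
} // namespace xpath

namespace dom {     // hypothetical
struct node {
    std::wstring name;
    std::vector<node> children;

    // Array-style selection: first match in document order, or nullptr.
    const node* operator[](const xpath::expr& e) const { return select(*this, e.steps(), 0); }

    static const node* select(const node& ctx, const std::vector<xpath::step>& steps, std::size_t k) {
        if (k == steps.size()) return &ctx;
        std::size_t seen = 0; // counts children that pass the name test
        for (const node& child : ctx.children) {
            if (steps[k].name != L"*" && steps[k].name != child.name) continue;
            ++seen;
            if (steps[k].position != 0 && seen != steps[k].position) continue;
            if (const node* hit = select(child, steps, k + 1)) return hit;
        }
        return nullptr;
    }
};
} // namespace dom
} // namespace xml

// Usage, mirroring the root[ expr ] notation above.
inline void xpath_usage() {
    using xml::dom::node;
    node p1{L"p", {}}, p2{L"p", {}};
    node body{L"body", {p1, p2}};
    node html{L"html", {body}};
    node root{L"#document", {html}};
    const node* sel = root[xml::xpath::expr(L"/*[1]")];
    assert(sel && sel->name == L"html");
    assert(root[xml::xpath::expr(L"/html/body/p[2]")] == &root.children[0].children[0].children[1]);
    assert(root[xml::xpath::expr(L"/html/table")] == nullptr);
}
```

Keeping the parsed expression in its own type means it can be built once and applied to many nodes, which is the main reason to accept an expr object as well as a plain string.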
> [5] It should have a clean access to attributes, without the user needing to
> call get/set methods.
I am not sure what you mean here.
I was thinking from a W3C DOM/MS COM PoV where the attributes are implemented via get and set methods [Example:
get_documentElement( IXMLDOMNode * )
vs
XMLDOMNode XMLDOMDocument.documentElement
], but with clean C++ bindings this is largely irrelevant.
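For what it's worth, property-like attribute access can be approximated in plain C++ with a small proxy object. The sketch below is one possible shape under my own assumptions: the xml::element class and the attr, has_attr and attribute_ref names are hypothetical, and reading an unset attribute is assumed to yield an empty string.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace xml {   // hypothetical stand-in for boost::xml
class element {
    std::map<std::string, std::string> attrs_;
public:
    // Proxy returned by attr(): assigning to it writes the attribute,
    // converting it to std::string reads the attribute.
    class attribute_ref {
        element& owner_;
        std::string name_;
    public:
        attribute_ref(element& owner, const std::string& name)
            : owner_(owner), name_(name) {}

        attribute_ref& operator=(const std::string& value) {
            owner_.attrs_[name_] = value;
            return *this;
        }

        // Reading an unset attribute yields "" and does not create it.
        operator std::string() const {
            std::map<std::string, std::string>::const_iterator it = owner_.attrs_.find(name_);
            return it == owner_.attrs_.end() ? std::string() : it->second;
        }
    };

    attribute_ref attr(const std::string& name) { return attribute_ref(*this, name); }
    bool has_attr(const std::string& name) const { return attrs_.count(name) != 0; }
};
}

// Usage: no explicit get/set calls at the call site.
inline void attribute_usage() {
    xml::element rect;
    rect.attr("width") = "100";
    std::string width = rect.attr("width");
    assert(width == "100");

    std::string height = rect.attr("height"); // never set
    assert(height.empty());
    assert(!rect.has_attr("height"));         // reading did not create it
}
```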
I am developing Axemill (http://www.sf.net/projects/axemill) to fulfil most of
these goals, with the eventual goal of submitting it to boost. If you want to
contribute code and/or ideas, please email me. Currently, it requires gcc 3.2
(though it should build with other relatively conforming compilers) and boost
1.29.0 (I intend to move to 1.30.0 shortly)
I would be happy to help out with code and ideas.
Regards, Reece