Re: [boost] API Review request: XML APIs for C++
Reece Dunn wrote: Hamish Mackenzie wrote: How do you feel about having a node_reference instead. What are the access differences? I.e. could you use '.' to access attributes/methods as opposed to '->'? yes of course, that's the whole point: it's a wrapper around a C pointer, endowed with all the methods you would expect in a C++ xml API. But, much like in my original proposal, these methods are very thin wrappers that delegate the call to libxml2. Regards, Stefan ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
Hamish Mackenzie wrote: How do you feel about having a node_reference instead. What are the access differences? I.e. could you use '.' to access attributes/methods as opposed to '->'? // libxml2 class node_reference { private: xmlNodePtr node_; }; [snip] boost::xml::dom::node_reference mynode = doc.selectNode( "/h:*[1]" ); I am not that bothered *how* it is implemented, so long as it provides a decent enough interface. I was also showing how you could make it more portable by not explicitly using a Document * doc = ...; style code fragment. Regards, Reece _ Find a cheaper internet access deal - choose one to suit you. http://www.msn.co.uk/internetaccess ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
On Fri, 2003-06-13 at 19:39, Reece Dunn wrote: > >>This would, however, allow the implementation to be adapted to when memory > >>management needs to be used and to switch between different management > >>policies for the various implementations. > > >Can you elaborate ? > > Sure. In the libxml2 versions, nodes are explicitly deleted, and thus do not > need to be managed, e.g: >typedef node_impl * node; > > If an object requires explicit management, via shared_ptr for example: >typedef boost::shared_ptr< node_impl > node; > > Or, if it is say using a wrapper around an MSXML interface: >class node: public CComPtr< IXMLDOMNode >{ ... }; > > The user can thus write: >boost::xml::dom::node mynode = doc.selectNode( "/h:*[1]" ); > > without worrying about the underlying representation of > boost::xml::dom::node. These are all pointer types though, so node_ptr is perhaps be a better name. How do you feel about having a node_reference instead. // libxml2 class node_reference { private: xmlNodePtr node_; }; // MSXML class node_reference { private: CComPtr< IXMLDOMNode > node_; }; boost::xml::dom::node_reference mynode = doc.selectNode( "/h:*[1]" ); A node_reference to an object in the document would only be safe as long as that node existed in the document. -- Hamish Mackenzie <[EMAIL PROTECTED]> ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
On Fri, 2003-06-13 at 18:53, Stefan Seefeld wrote: > > Maybe we do need to allow the use of _private in higher level layers. > > I don't think that's a good idea, in particular because it's to tightly > coupled to a particular implementation. We could allow a 'payload' for > nodes (possibly a template parameter), but we could as well do the > association between a node and its template in an external lookup > mechanism, which seems to be more clean and drag in the least overhead. Cool. I think the at least one association should belong to the document (so that associations can be removed when the node or it's parents are erased from the document). Something like template< typename node_data, ... > class document { public: typedef xmlNodePtr node_key; std::map< node_key, node_data > & node_data_map(); const std::map< node_key, node_data > & node_data_map() const; private: std::map< node_key, node_data > node_data_map_; }; class node_reference { public: typedef xmlNodePtr node_key; node_key key() { return node_; } }; A libxml2 implementation could substitute node_data_map for a class that uses _private to store the data (or to speedup lookup). However this would be an optimisation (definitely not a requirement). The user would be free to set up other associative containers using the node key. However they would be responsible for erasing the entries for deleted nodes themselves. -- Hamish Mackenzie <[EMAIL PROTECTED]> ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
Stefan Seefeld wrote I also like the xml::dom::document document = xml::dom::parse_file(argv[1]); style usage. Also: xml::dom::element e = doc.documentElement(); xml::dom::node n = e; This is just my preferred style/usage, and does not have to be adopted. you mean you vote for the 'nodes-are-references' style ? I mean, for example: namespace boost { namespace xml { namespace dom { typedef DocumentImpl * document; }}} This would, however, allow the implementation to be adapted to when memory management needs to be used and to switch between different management policies for the various implementations. Can you elaborate ? Sure. In the libxml2 versions, nodes are explicitly deleted, and thus do not need to be managed, e.g: typedef node_impl * node; If an object requires explicit management, via shared_ptr for example: typedef boost::shared_ptr< node_impl > node; Or, if it is say using a wrapper around an MSXML interface: class node: public CComPtr< IXMLDOMNode >{ ... }; The user can thus write: boost::xml::dom::node mynode = doc.selectNode( "/h:*[1]" ); without worrying about the underlying representation of boost::xml::dom::node. Regards, Reece _ Stay in touch with absent friends - get MSN Messenger http://www.msn.co.uk/messenger ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
Hamish Mackenzie wrote: Yes, I can see that, for the xml node types. But for that we don't even need anything but a single 'node_proxy' class (with a 'type()' method returning an enum). True. Though it might be useful in the next layer of the interface. If you end up with another proxy type that has lots of switch( raw_node_.some_node_property() ) { } I use a very similar system in an HTML editor where the entity name is used to lookup the handler. really the entire name ? Hmm. I'm thinking of the DOM spec(s), and domain specific extensions. There typically new node types are elements, i.e. they would (logically) all derive from dom::element. (SVGElement, say). Hmm, I'm not sure yet how to 'extend' dom::nodes towards these DOM types. But it seems inheritance is not the right solution, especially if we don't control the node's construction, i.e. we can't just plug in a 'SVG node factory'. Anyways, that's another layer...:-) Maybe we do need to allow the use of _private in higher level layers. I don't think that's a good idea, in particular because it's to tightly coupled to a particular implementation. We could allow a 'payload' for nodes (possibly a template parameter), but we could as well do the association between a node and its template in an external lookup mechanism, which seems to be more clean and drag in the least overhead. I start to like your node reference class quite a lot... :-) I am definitely leaning toward "node_reference" rather than "node_proxy" is that your preferred name too? I was talking about your proxy approach as opposed to my node wrapper classes that are constructed on the heap. But yeah, concerning the naming I prefer reference over proxy. yeah, a parser for asynchronous document creation may be interesting. But I see that as a somewhat different beast. Simple (local, synchronous) document creation from an xml file doesn't need to go over a stateful parser object. Its not a high priority but it is nice that libxml2 supports it. Agreed. So we could have both, stateless (synchronous) parsers creating a document in a single call, and statefull asynchronous parsers that can be bound to event loops or other 'signals'. Regards, Stefan ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
On Fri, 2003-06-13 at 16:11, Stefan Seefeld wrote: > well, it looks like a mix of things. What you are doing, essentially, > is wrapping a polymorphic 'do_something' method around a non-C++ > type system, i.e. the real method invocation is done with a 'type()' > discriminator. Yes. > Yes, I can see that, for the xml node types. But for that we don't even > need anything but a single 'node_proxy' class (with a 'type()' method > returning an enum). True. Though it might be useful in the next layer of the interface. If you end up with another proxy type that has lots of switch( raw_node_.some_node_property() ) { } I use a very similar system in an HTML editor where the entity name is used to lookup the handler. Because the lookup can be slow I cache the result in the node itself. We might have to expose _private if we wanted to do that :-(. Maybe we do need to allow the use of _private in higher level layers. If so how about something like this?... template< typename Value_Type, ... > class document; template< typename Value_Type, ... > class node_reference { public: void value( Value_Type * value ) { node_->_private = value; } Value_Type * value() const { return static_cast< Value_Type * >( node_->_private ); } }; Or some such??? > I think it is a good thing not to own them, but the semantics should > be clear. > > >>yes, and you could even make that an 'element_proxy' as you know that > >>parent nodes are always elements. However, with a flat set of (libxml2) > >>nodes that wouldn't work any more, so runtime polymorphism would be > >>lost. Well, may be there is no need for it either. I have to think over > >>that... > > > > > > True and that would fit in nicely with the code I outlined above > > > > node.first_child().do_something( 0 ); > > > > Would go through the polymorphic lookup and > > > > node.parent().do_something( 0 ); > > > > Would call the same code but without the lookup overhead. > > indeed, though, on further thinking, nodes themselfs don't do anything, > so we could as well keep this polymorphism outside the node class, and > let nodes only provide their type as an enum. > > I start to like your node reference class quite a lot... :-) I am definitely leaning toward "node_reference" rather than "node_proxy" is that your preferred name too? > >>>parse_stream would indeed be even better. As I recall there are > >>>functions in libxml2 that allow you to write to the parser as well. > >> > >>erm, that's even more confusing, I think. A parser should remain > >>just a parser, i.e. something that extracts tokens from an input stream. > > > > > > It is still just a parser (but works as a state machine). Check out > > xmlCreatePushParserCtxt and xmlParseChunk. > > > > Say you need to receive lots of xml files over the internet and parse > > them all at once. You could use a thread per connection and have the > > parsers read from the stream but that would require lots of threads. > > Are you suggesting that all the different xml files should be merged > into a single dom document ? No many separate documents each needing there own parser and thread. > > With the "push" interface you can use async io to read from the sockets > > and then write the data to the parsers as you get it. Because the state > > of the parse is not stored on the stack you do not need a separate > > thread for each parser. > > yeah, a parser for asynchronous document creation may be interesting. > But I see that as a somewhat different beast. Simple (local, > synchronous) document creation from an xml file doesn't need to > go over a stateful parser object. Its not a high priority but it is nice that libxml2 supports it. -- Hamish Mackenzie <[EMAIL PROTECTED]> ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
Reece Dunn wrote: Any node can define namespaces: ... ... ... so namespace lookup would propagate along the parents of each node until a matching namespace is found. This would mean that each node must store a (smart) pointer to a namespace mapping, in order to facilitate lookup (done during node construction). The node will also have a pointer to a namespace information object that looks something like: class namespaceInfo { std::string url; std::string name; }; Have I got this wrong? no, it's correct...on a conceptual level. The point I'm trying to make is that libxml2 does all this for me already. The node itself doesn't store all the active namespaces, as that would be a horrible waste. You have to query them (and therefor you'll access the document and parent nodes). For example, if a node is moved from one place to another, its context will change and therefor some necessary adjustments have to be made notably concerning the namespaces. A look into the libxml2 source code convinced me that it's quite a tricky job... I also like the xml::dom::document document = xml::dom::parse_file(argv[1]); style usage. Also: xml::dom::element e = doc.documentElement(); xml::dom::node n = e; This is just my preferred style/usage, and does not have to be adopted. you mean you vote for the 'nodes-are-references' style ? This would, however, allow the implementation to be adapted to when memory management needs to be used and to switch between different management policies for the various implementations. Can you elaborate ? Regards, Stefan ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
Reece Dunn wrote: ... ... ... This should have been (with a default namespace): ... ... ... Regards, Reece _ Stay in touch with absent friends - get MSN Messenger http://www.msn.co.uk/messenger ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
Hamish Mackenzie wrote: How about this for reading an input stream... xml::dom::document doc( parse_stream( std::cin.rdbuf() ) ); I like this facility. You could also make it accept an istream like this: class parse_stream { public: parse_stream( istream & is ): sb( is.rdbuf()){...} ... }; or something similar, making the above the simpler: xml::dom::document doc( parse_stream( std::cin )); - Stefan Seefeld wrote: Yes, but is that a problem ? Of course it has to be written in bold strokes: "Don't delete a document while operating on its content !", but I think the main idea to get across is that nodes *can only* exist in the context of a document. That's not only because of memory ownership issues, but also for a variety of other contextual data associated with a node, such as namespaces. Any node can define namespaces: ... ... ... so namespace lookup would propagate along the parents of each node until a matching namespace is found. This would mean that each node must store a (smart) pointer to a namespace mapping, in order to facilitate lookup (done during node construction). The node will also have a pointer to a namespace information object that looks something like: class namespaceInfo { std::string url; std::string name; }; Have I got this wrong? - I also like the xml::dom::document document = xml::dom::parse_file(argv[1]); style usage. Also: xml::dom::element e = doc.documentElement(); xml::dom::node n = e; This is just my preferred style/usage, and does not have to be adopted. This would, however, allow the implementation to be adapted to when memory management needs to be used and to switch between different management policies for the various implementations. _ Tired of 56k? Get a FREE BT Broadband connection http://www.msn.co.uk/specials/btbroadband ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
Hamish Mackenzie wrote: On Fri, 2003-06-13 at 14:11, Stefan Seefeld wrote: Hmm, I see your point. Well, that would be possible, but that way you are unable to make nodes polymorphic. Neither with respect to the basic node types (Element, Attribute, Text, CData, etc.) nor later when implementing real DOM support on top of it. How about this (I forget the name of this pattern but its in the GOF book) well, it looks like a mix of things. What you are doing, essentially, is wrapping a polymorphic 'do_something' method around a non-C++ type system, i.e. the real method invocation is done with a 'type()' discriminator. Yes, I can see that, for the xml node types. But for that we don't even need anything but a single 'node_proxy' class (with a 'type()' method returning an enum). yeah, looks interesting, and even more thin than my wrapper. However, the thinner the wrapper gets the greater is indeed the danger of having the whole design tied to a particular implementation, as William pointed out earlier. I don't think that this is a problem with my wrapper lib, but with your implementation you get dangerously close...:-) Making it thicker won't make it any easier to apply to other parsers. Especially if you rely on the existence of something like _private. Indeed, and I think this hits the nail on the head: we still only provide ownership semantics implicitly by delegating to the implementation. I think it is crucial to define the right ownership semantics, and then make sure it is implementable with different libs. In your case you explicitely operate on node references ('proxies'), while I use 'raw pointers'. We both don't own the real nodes. I think it is a good thing not to own them, but the semantics should be clear. yes, and you could even make that an 'element_proxy' as you know that parent nodes are always elements. However, with a flat set of (libxml2) nodes that wouldn't work any more, so runtime polymorphism would be lost. Well, may be there is no need for it either. I have to think over that... True and that would fit in nicely with the code I outlined above node.first_child().do_something( 0 ); Would go through the polymorphic lookup and node.parent().do_something( 0 ); Would call the same code but without the lookup overhead. indeed, though, on further thinking, nodes themselfs don't do anything, so we could as well keep this polymorphism outside the node class, and let nodes only provide their type as an enum. I start to like your node reference class quite a lot... :-) parse_stream would indeed be even better. As I recall there are functions in libxml2 that allow you to write to the parser as well. erm, that's even more confusing, I think. A parser should remain just a parser, i.e. something that extracts tokens from an input stream. It is still just a parser (but works as a state machine). Check out xmlCreatePushParserCtxt and xmlParseChunk. Say you need to receive lots of xml files over the internet and parse them all at once. You could use a thread per connection and have the parsers read from the stream but that would require lots of threads. Are you suggesting that all the different xml files should be merged into a single dom document ? With the "push" interface you can use async io to read from the sockets and then write the data to the parsers as you get it. Because the state of the parse is not stored on the stack you do not need a separate thread for each parser. yeah, a parser for asynchronous document creation may be interesting. But I see that as a somewhat different beast. Simple (local, synchronous) document creation from an xml file doesn't need to go over a stateful parser object. Regards, Stefan ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
On Fri, 2003-06-13 at 14:11, Stefan Seefeld wrote: > Hmm, I see your point. Well, that would be possible, but that way you > are unable to make nodes polymorphic. Neither with respect to the basic > node types (Element, Attribute, Text, CData, etc.) nor later when > implementing real DOM support on top of it. How about this (I forget the name of this pattern but its in the GOF book) class node_type_handler { public: virtual void do_something( node_proxy & node, int params_go_here ) = 0; static node_type_handler * handler( int node_type ); }; class element_node_type_handler : public node_type_handler { public: virtual void do_something( node_proxy & node, int params_go_here ) { element_proxy( node ).do_something( params_go_here ); } }; static node_type_handler * node_type_handler::handler( int node_type ) { static node_type_handler * handlers[ node_type ] = { new element_node_type_handler(), new attribute_node_type_handler() // ... }; return handlers[ node_type ]; } class node_proxy { public: void do_something( int params_go_here ) { node_type_handler::handler( type() )-> do_something( *this, params_go_here ); } }; class element_proxy : public node_proxy { public: explicit element_proxy( const node_proxy & node ); void do_something( int params_go_here ) { // Actually do_something } }; > > I have attached the wrappers I have written. They do not cover much of > > libxml2 (just what I needed at the time). Feel free to borrow as much > > or as little from it as you like. > > yeah, looks interesting, and even more thin than my wrapper. However, > the thinner the wrapper gets the greater is indeed the danger of having > the whole design tied to a particular implementation, as William pointed > out earlier. I don't think that this is a problem with my wrapper lib, > but with your implementation you get dangerously close...:-) Making it thicker won't make it any easier to apply to other parsers. Especially if you rely on the existence of something like _private. If the wrapper is a good tight wrapper for libxml2 then if something needs changing to be portable then we can wrap the wrapper with a more portability layer. > > Looking up this node's parent node is thus simply > > > >>static_cast(this->my_impl->parent->_private); > > > > > > If there was a parent lookup in node_proxy it would be > > > > class node_proxy > > { > > public: > > node_proxy parent() { return node_proxy( node_->parent ); } > > ... > > private: > > xmlNodePtr node_; > > }; > > yes, and you could even make that an 'element_proxy' as you know that > parent nodes are always elements. However, with a flat set of (libxml2) > nodes that wouldn't work any more, so runtime polymorphism would be > lost. Well, may be there is no need for it either. I have to think over > that... True and that would fit in nicely with the code I outlined above node.first_child().do_something( 0 ); Would go through the polymorphic lookup and node.parent().do_something( 0 ); Would call the same code but without the lookup overhead. > well, but you could also make it such that > > xml::dom::document doc = xml::dom::parse_file("a.xml"); > > works with parse_file being a function. That would mean the document is > copied, but then following your philosophy xml::dom::document could be > a proxy, too, so copying could be cheap... I don't think xml::dom::document should be a proxy as I see it as the container and owner of all the nodes. But if I remember correctly the syntax above will work if parse_file is a class and document has a constructor that takes it. > I really don't like the idea of 'parse_file' being an object (whichs > state being a potentially already parsed document). It's unintuive. > > > parse_stream would indeed be even better. As I recall there are > > functions in libxml2 that allow you to write to the parser as well. > > erm, that's even more confusing, I think. A parser should remain > just a parser, i.e. something that extracts tokens from an input stream. It is still just a parser (but works as a state machine). Check out xmlCreatePushParserCtxt and xmlParseChunk. Say you need to receive lots of xml files over the internet and parse them all at once. You could use a thread per connection and have the parsers read from the stream but that would require lots of threads. With the "push" interface you can use async io to read from the sockets and then write the data to the parsers as you get it. Because the state of the parse is not stored on the stack you do not need a separate thread for each parser. -- Hamish Mackenzie ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
On Fri, 2003-06-13 at 12:14, Peter Dimov wrote: > Hamish Mackenzie wrote: > > > > 3) Why dom::basic_document::clone? Why not have the copy constructor > > and assignment operator should do a deep copy of the document? This > > is consistent with other containers. If you want to stick with clone > > return > > an auto_ptr and and derive basic_document from boost::noncopyable. > > Whether clone is appropriate depends on the object model that we choose for > Document and Node. One option is the Java-ish > Only shallow copy if it is a pointer, iterator or reference. So you would have to call them document_ptr, node_ptr, etc. To make it clear they are pointers. > A safer Node alternative is > > struct Node > { > xmlNodePtr impl_; // or however it's spelled > > shared_ptr pi_; // keep Document alive > }; But it won't prevent someone removing the root node from the document and thereby invalidating all the document's nodes. So it is a false sense of security. I can't see an easy way to implement a "node" type, but that's ok because we don't need to What we need is a node_iterator which returns a node_proxy when dereferenced (see the files attached to my second post). node_proxy is the reference_type of the container there would be no value_type (perhaps node_reference would be a better name). > Reference semantics are convenient when passing and returning Documents > to/from functions. Documents can be deep copied with clone(). > > Another option is to drop the reference semantics. A Document can be > noncopyable with clone(), mandating the use of auto_ptr, or it can have deep > copy semantics. Which is what I meant when I said... > > If you want to stick with clone return > > an auto_ptr and and derive basic_document from boost::noncopyable But you have to ask yourself how would you feel if std::vector worked this way? > All of these solutions have their pros and cons, but other things being > equal I tend towards the Java model. Deep copy seems inappropriate for a > Document since it is a very expensive operation that's better given an > explicit name. (I just found a bug in my code where I accidentally passed an > expensive data structure by value; performance went downhill, I was > stumped.) It might be expensive for large documents (probably order N) but so is std::vector's copy constructor. How would you have felt if you had used an object believing it to deep copy only to find out it didn't and as a result all the customer records in your database ended up pointing to the same address value? shared_ptr< document > is easy to spot. Shallow copy is for iterator, pointer and reference types only! It is dangerous and should be written on the tin in bold print. -- Hamish Mackenzie ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
Hamish Mackenzie wrote: That way I'v got a pointer from the C struct to the C++ wrapper (_private), as well as a pointer from the C++ wrapper to the C struct (my_impl). Would a thin proxy object not be a better way to go? More in keeping with the "you don't pay for what you don't use" philosophy of C++. Hmm, I see your point. Well, that would be possible, but that way you are unable to make nodes polymorphic. Neither with respect to the basic node types (Element, Attribute, Text, CData, etc.) nor later when implementing real DOM support on top of it. I have attached the wrappers I have written. They do not cover much of libxml2 (just what I needed at the time). Feel free to borrow as much or as little from it as you like. yeah, looks interesting, and even more thin than my wrapper. However, the thinner the wrapper gets the greater is indeed the danger of having the whole design tied to a particular implementation, as William pointed out earlier. I don't think that this is a problem with my wrapper lib, but with your implementation you get dangerously close...:-) If you look in node_iterator.h you can see that it uses a proxy containing just a pointer to the libxml2 node. indeed. Looking up this node's parent node is thus simply static_cast(this->my_impl->parent->_private); If there was a parent lookup in node_proxy it would be class node_proxy { public: node_proxy parent() { return node_proxy( node_->parent ); } ... private: xmlNodePtr node_; }; yes, and you could even make that an 'element_proxy' as you know that parent nodes are always elements. However, with a flat set of (libxml2) nodes that wouldn't work any more, so runtime polymorphism would be lost. Well, may be there is no need for it either. I have to think over that... making 'parse_file' a class suggests it is carrying some data/state. What would that be ? I'm thinking of 'parse_file' as a stateless factory, i.e. a function returning a newly created document. Sorry I should have included the ... bit class parse_file { public: parse_file( const std::string & f ) : file_name_ {} private: const std::string & file_name_; template< ... > friend class ::boost::xml::dom::document; }; Put that together with the constructor... template<...> class basic_document { public: basic_document( const parse_file & f ) { // does what parse_file function does now } }; And you can write code that goes xml::dom::document doc( xml::dom::parse_file( "a.xml" ) ); No auto_ptr needed. It's not in my wrappers as I only thought of it while I was reading through yours. well, but you could also make it such that xml::dom::document doc = xml::dom::parse_file("a.xml"); works with parse_file being a function. That would mean the document is copied, but then following your philosophy xml::dom::document could be a proxy, too, so copying could be cheap... I really don't like the idea of 'parse_file' being an object (whichs state being a potentially already parsed document). It's unintuive. parse_stream would indeed be even better. As I recall there are functions in libxml2 that allow you to write to the parser as well. erm, that's even more confusing, I think. A parser should remain just a parser, i.e. something that extracts tokens from an input stream. Best regards, Stefan ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
Hamish Mackenzie wrote: > > 3) Why dom::basic_document::clone? Why not have the copy constructor > and assignment operator should do a deep copy of the document? This > is consistent with other containers. If you want to stick with clone > return > an auto_ptr and and derive basic_document from boost::noncopyable. Whether clone is appropriate depends on the object model that we choose for Document and Node. One option is the Java-ish Document d = parse_file("test.xml"); Node n = d.root(); with Document and Node having reference semantics. Internally Document could be struct Document { shared_ptr pi_; }; and Node could be struct Node { xmlNodePtr impl_; // or however it's spelled }; A safer Node alternative is struct Node { xmlNodePtr impl_; // or however it's spelled shared_ptr pi_; // keep Document alive }; Reference semantics are convenient when passing and returning Documents to/from functions. Documents can be deep copied with clone(). Another option is to drop the reference semantics. A Document can be noncopyable with clone(), mandating the use of auto_ptr, or it can have deep copy semantics. All of these solutions have their pros and cons, but other things being equal I tend towards the Java model. Deep copy seems inappropriate for a Document since it is a very expensive operation that's better given an explicit name. (I just found a bug in my code where I accidentally passed an expensive data structure by value; performance went downhill, I was stumped.) ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
On Thu, 2003-06-12 at 22:12, Stefan Seefeld wrote: > I'm wrapping libxml2. The structs provided by libxml2 all carry a > '_private' member, precisely because it's a good way for extensions > such as language wrapping. > libxml2 itself calls callbacks whenever it allocates instances of these > structs, and I allocate my C++ wrappers in these callbacks, and let the > _private member point to it. > That way I'v got a pointer from the C struct to the C++ wrapper > (_private), as well as a pointer from the C++ wrapper to the C struct > (my_impl). Would a thin proxy object not be a better way to go? More in keeping with the "you don't pay for what you don't use" philosophy of C++. I have attached the wrappers I have written. They do not cover much of libxml2 (just what I needed at the time). Feel free to borrow as much or as little from it as you like. If you look in node_iterator.h you can see that it uses a proxy containing just a pointer to the libxml2 node. Looking up this node's parent node is thus simply > static_cast(this->my_impl->parent->_private); If there was a parent lookup in node_proxy it would be class node_proxy { public: node_proxy parent() { return node_proxy( node_->parent ); } ... private: xmlNodePtr node_; }; > > Here are some suggestions... > > > > 1) parse_file could be > > > > class parse_file > > { > > public: > > parse_file( const std::string & ) : {} > > ... > > }; > > making 'parse_file' a class suggests it is carrying some data/state. > What would that be ? I'm thinking of 'parse_file' as a stateless > factory, i.e. a function returning a newly created document. Sorry I should have included the ... bit class parse_file { public: parse_file( const std::string & f ) : file_name_ {} private: const std::string & file_name_; template< ... > friend class ::boost::xml::dom::document; }; Put that together with the constructor... template<...> class basic_document { public: basic_document( const parse_file & f ) { // does what parse_file function does now } }; And you can write code that goes xml::dom::document doc( xml::dom::parse_file( "a.xml" ) ); No auto_ptr needed. It's not in my wrappers as I only thought of it while I was reading through yours. > > 2) How about parse_string (based on xmlParseMemory)? > > hmm, while that would be possible, I think it's more C++'ish to > provide document extraction from a std::streambuf, which could be > a string_buf or any other buffer implementation. (Note that I wouldn't > use std::iostreams as that would suggest that formatted extraction is > possible, which would only work on ascii, not on unicode content. parse_stream would indeed be even better. As I recall there are functions in libxml2 that allow you to write to the parser as well. How about this for reading an input stream... xml::dom::document doc( parse_stream( std::cin.rdbuf() ) ); and if you want to write to to the parser (for instance if the data is coming from a series of asynchronous read operations). class document { public: std::streambuf & parser(); }; xml::dom::document doc; doc.parser().write( buffer, buffer_size ); > > 4) How about something like > > > > class basic_document > > { > > public: > > ... > > > > typedef node_iterator iterator; > > typedef const_node_iterator const_iterator; > > > > iterator begin() > > { return iterator( ptr_->children ); } > > const_iterator begin() const > > { return const_iterator( ptr_->children ); } > > iterator end() { return iterator(); } > > const_iterator end() const { return const_iterator(); } > > > > iterator root() > > { return iterator( xmlDocGetRootElement( ptr_ ) ); } > > const_iterator root() const > > { return const_iterator( xmlDocGetRootElement( ptr_ ) ); } > > }; > > I don't understand: are you suggesting an iterator that traverses > the whole tree (as opposed the children of a single node) ? No just the children. If figure node.begin() and node.end() be fore iteration over the child nodes. -- Hamish Mackenzie <[EMAIL PROTECTED]> xml_stuff.tar.gz Description: application/compressed-tar ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
Hamish Mackenzie wrote: Looks good. I have only looked at the dom stuff so far. Why are you storing information in _private? What goes in there that could not be extracted when the nodes are accessed? It seems like a lot of unnecessary overhead. I'm wrapping libxml2. The structs provided by libxml2 all carry a '_private' member, precisely because it's a good way for extensions such as language wrapping. libxml2 itself calls callbacks whenever it allocates instances of these structs, and I allocate my C++ wrappers in these callbacks, and let the _private member point to it. That way I'v got a pointer from the C struct to the C++ wrapper (_private), as well as a pointer from the C++ wrapper to the C struct (my_impl). Looking up this node's parent node is thus simply static_cast(this->my_impl->parent->_private); Here are some suggestions... 1) parse_file could be class parse_file { public: parse_file( const std::string & ) : {} ... }; making 'parse_file' a class suggests it is carrying some data/state. What would that be ? I'm thinking of 'parse_file' as a stateless factory, i.e. a function returning a newly created document. 2) How about parse_string (based on xmlParseMemory)? hmm, while that would be possible, I think it's more C++'ish to provide document extraction from a std::streambuf, which could be a string_buf or any other buffer implementation. (Note that I wouldn't use std::iostreams as that would suggest that formatted extraction is possible, which would only work on ascii, not on unicode content. 3) Why dom::basic_document::clone? Why not have the copy constructor and assignment operator should do a deep copy of the document? This is consistent with other containers. If you want to stick with clone return an auto_ptr and and derive basic_document from boost::noncopyable. fair enough. 4) How about something like class basic_document { public: ... typedef node_iterator iterator; typedef const_node_iterator const_iterator; iterator begin() { return iterator( ptr_->children ); } const_iterator begin() const { return const_iterator( ptr_->children ); } iterator end() { return iterator(); } const_iterator end() const { return const_iterator(); } iterator root() { return iterator( xmlDocGetRootElement( ptr_ ) ); } const_iterator root() const { return const_iterator( xmlDocGetRootElement( ptr_ ) ); } }; I don't understand: are you suggesting an iterator that traverses the whole tree (as opposed the children of a single node) ? While that would be possible, I don't think it would actually be useful. I'v written a Visitor which I use to look up specific nodes. As the node 'type system' is known without C++ RTTI that doesn't require double dispatching though... 5) basic_xpath and basic_xpath_result should be derived from boost::noncopyable ok 6) Leading _ chars are a no no (reserved for compiler implementors). I know _private is defined in libxml2, but what you see below is not... ok Thanks for the feedback ! Regards, Stefan ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
Looks good. I have only looked at the dom stuff so far. Why are you storing information in _private? What goes in there that could not be extracted when the nodes are accessed? It seems like a lot of unnecessary overhead. Here are some suggestions... 1) parse_file could be class parse_file { public: parse_file( const std::string & ) : {} ... }; template<...> class basic_document { public: basic_document( const parse_file & ); } 2) How about parse_string (based on xmlParseMemory)? 3) Why dom::basic_document::clone? Why not have the copy constructor and assignment operator should do a deep copy of the document? This is consistent with other containers. If you want to stick with clone return an auto_ptr and and derive basic_document from boost::noncopyable. 4) How about something like class basic_document { public: ... typedef node_iterator iterator; typedef const_node_iterator const_iterator; iterator begin() { return iterator( ptr_->children ); } const_iterator begin() const { return const_iterator( ptr_->children ); } iterator end() { return iterator(); } const_iterator end() const { return const_iterator(); } iterator root() { return iterator( xmlDocGetRootElement( ptr_ ) ); } const_iterator root() const { return const_iterator( xmlDocGetRootElement( ptr_ ) ); } }; 5) basic_xpath and basic_xpath_result should be derived from boost::noncopyable 6) Leading _ chars are a no no (reserved for compiler implementors). I know _private is defined in libxml2, but what you see below is not... $ grep ' _' * basic_attribute.h:#ifndef _dom_basic_attribute_h basic_attribute.h:#define _dom_basic_attribute_h basic_comment.h:#ifndef _dom_basic_comment_h basic_comment.h:#define _dom_basic_comment_h basic_document.h:#ifndef _dom_basic_document_h basic_document.h:#define _dom_basic_document_h basic_dtd.h:#ifndef _dom_basic_dtd_h basic_dtd.h:#define _dom_basic_dtd_h basic_element.h:#ifndef _dom_basic_element_h basic_element.h:#define _dom_basic_element_h basic_node.h:#ifndef _dom_basic_node_h basic_node.h:#define _dom_basic_node_h basic_node.h: basic_node_iterator(impl *current = 0) : _current(current) {} basic_node.h: bool operator == (self i) { return _current == i._current;} basic_node.h: void increment() { _current = _current->next;} basic_node.h: void decrement() { _current = _current->prev;} basic_node.h: basic_node_const_iterator(const impl *current = 0) : _current(current) {} basic_node.h: bool operator == (self i) { return _current == i._current;} basic_node.h: void increment() { _current = _current->next;} basic_node.h: void decrement() { _current = _current->prev;} basic_pi.h:#ifndef _dom_basic_pi_h basic_pi.h:#define _dom_basic_pi_h basic_text.h:#ifndef _dom_basic_text_h basic_text.h:#define _dom_basic_text_h basic_traversal.h:#ifndef _dom_basic_traversal_h basic_traversal.h:#define _dom_basic_traversal_h Hamish On Thu, 2003-06-12 at 03:45, Stefan Seefeld wrote: > hi there, > > following some discussion we had some weeks ago, > I'd like to invite everybody to review xml++.tgz at > > http://groups.yahoo.com/group/boost/files/xml/ > > It's a DOM-like and a SAX-like API currently implemented > on top of libxml2 (http://www.xmlsoft.org). > > What it provides: > > * parsing of xml files and creation of a document tree > * manipulation of document tree, i.e. insertion and >deletion of nodes > * node iteration, search (xpath based) > * document output to a (xml) file > > * event driven xml file parsing (sax) > > To be added: > > * validation (dtd, schema, etc.) > * ? > > Is there any interest in this library evolving > into a boost::xml library ? If so, what needs to change, > what needs to be added / removed ? > > Regards, > Stefan > > ___ > Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost -- Hamish Mackenzie <[EMAIL PROTECTED]> ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
Peter Dimov wrote: Looks reasonable, but we don't want the architecture of the backend to affect the interface. right. So what is would be reasonable semantics to expect from a dom API ? May be I'v just got used to libxml2's way of life, but I think it is a good choice. Nodes are owned by their parents, so you can do dom::element *child = parent->add_child("info"); And calling dom::element::iterator i = parent->find(child); parent->erase_child(i); will invalidate 'child'. I don't know of any way to make that more safe while still being efficient. There is also the problem that the user can be left with an invalid pointer after the document has been deleted. Yes, but is that a problem ? Of course it has to be written in bold strokes: "Don't delete a document while operating on its content !", but I think the main idea to get across is that nodes *can only* exist in the context of a document. That's not only because of memory ownership issues, but also for a variety of other contextual data associated with a node, such as namespaces. I had a long discussion with the libxml2 author about ownership semantics and he convinced me that the current way is the best tradeoff between simplicity/ease of use and efficiency. Since the DOM is a tree and has no cycles, it should be possible to get fairly close to the Java interface using strict ownership or shared_ptr underneath. In the libxml2 case every Node would need to keep the whole Document alive but this may not be necessary given a different backend. I don't understand that. The document owns its nodes, so letting nodes reference the document would create loops, no ? Of course if this isn't practical a quick fix would be to return a shared_ptr from parse_file. yeah, that's unrelated. But why not std::auto_ptr then ? Regards, Stefan ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
Stefan Seefeld wrote: > Peter Dimov wrote: >> >> I think that there is considerable interest in a boost::xml library. >> But... >> >> Document *document = Document::parse_file(argv[1]); >> >> ... I don't believe that a raw pointer based interface is acceptable. >> >> xml::dom::document document = xml::dom::parse_file(argv[1]); >> >> looks much better. > > Good catch. However, it looks worse than it actually is :-) : > > The memory management for nodes is entirely handled by the backend > (libxml2), i.e. nodes are always created and deleted by their parents. > Constructors and destructors are protected. > > The 'Document' class is the only one that is owned directly by the > user, and thus has to be deleted. Looks reasonable, but we don't want the architecture of the backend to affect the interface. There is also the problem that the user can be left with an invalid pointer after the document has been deleted. Since the DOM is a tree and has no cycles, it should be possible to get fairly close to the Java interface using strict ownership or shared_ptr underneath. In the libxml2 case every Node would need to keep the whole Document alive but this may not be necessary given a different backend. Of course if this isn't practical a quick fix would be to return a shared_ptr from parse_file. ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
Peter Dimov wrote: Stefan Seefeld wrote: Is there any interest in this library evolving into a boost::xml library ? If so, what needs to change, what needs to be added / removed ? I think that there is considerable interest in a boost::xml library. But... Document *document = Document::parse_file(argv[1]); ... I don't believe that a raw pointer based interface is acceptable. xml::dom::document document = xml::dom::parse_file(argv[1]); looks much better. Good catch. However, it looks worse than it actually is :-) : The memory management for nodes is entirely handled by the backend (libxml2), i.e. nodes are always created and deleted by their parents. Constructors and destructors are protected. The 'Document' class is the only one that is owned directly by the user, and thus has to be deleted. BTW why is basic_document<>::write_file virtual but basic_document<>::clone isn't? hmm, good question. I don't see any need for it to be virtual, I wouldn't expect anybody to derive from Document. Regards, Stefan ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Re: [boost] API Review request: XML APIs for C++
Stefan Seefeld wrote: > > Is there any interest in this library evolving > into a boost::xml library ? If so, what needs to change, > what needs to be added / removed ? I think that there is considerable interest in a boost::xml library. But... Document *document = Document::parse_file(argv[1]); ... I don't believe that a raw pointer based interface is acceptable. xml::dom::document document = xml::dom::parse_file(argv[1]); looks much better. BTW why is basic_document<>::write_file virtual but basic_document<>::clone isn't? The SAX part looks OK to me. ___ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost