Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Hamish Mackenzie
On Fri, 2003-06-13 at 12:14, Peter Dimov wrote:
 Hamish Mackenzie wrote:
 
  3) Why dom::basic_document::clone?  Why not have the copy constructor
  and assignment operator do a deep copy of the document?  This is
  consistent with other containers.  If you want to stick with clone,
  return an auto_ptr and derive basic_document from boost::noncopyable.
 
 Whether clone is appropriate depends on the object model that we choose for
 Document and Node. One option is the Java-ish
 

Only shallow copy if it is a pointer, iterator or reference.

So you would have to call them document_ptr, node_ptr, etc., to make it
clear they are pointers.

 A safer Node alternative is
 
 struct Node
 {
 xmlNodePtr impl_; // or however it's spelled
 
 shared_ptr<document_impl> pi_; // keep Document alive
 };

But it won't prevent someone removing the root node from the document
and thereby invalidating all the document's nodes.  So it is a false
sense of security.

I can't see an easy way to implement a node type, but that's OK
because we don't need to.

What we need is a node_iterator which returns a node_proxy when
dereferenced (see the files attached to my second post).  node_proxy is
the reference_type of the container; there would be no value_type
(perhaps node_reference would be a better name).
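
A minimal sketch of the iterator-plus-proxy idea, not the code from the attached files. The `raw_node` struct is a hypothetical stand-in for a libxml2 node, and `join_names` exists only to exercise the iterator. Because `operator*` returns a handle by value, the iterator is tagged as an input iterator; `value_type` is set to the handle type here only to keep `std::iterator_traits` usable.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical stand-in for a libxml2 node: only the fields this sketch needs.
struct raw_node
{
    const char * name;
    raw_node * next;   // next sibling, or null at the end of the list
};

// Cheap handle that refers to a node without owning it.
class node_reference
{
public:
    explicit node_reference( raw_node * node ) : node_( node ) {}
    std::string name() const { return node_->name; }
private:
    raw_node * node_;
};

// Iterator over a sibling list; dereferencing yields a handle by value.
class node_iterator
{
public:
    typedef std::input_iterator_tag iterator_category;
    typedef node_reference value_type;
    typedef node_reference reference;
    typedef void pointer;
    typedef std::ptrdiff_t difference_type;

    explicit node_iterator( raw_node * current = 0 ) : current_( current ) {}

    reference operator*() const { return node_reference( current_ ); }
    node_iterator & operator++() { current_ = current_->next; return *this; }
    node_iterator operator++( int ) { node_iterator old( *this ); ++*this; return old; }
    bool operator==( const node_iterator & other ) const { return current_ == other.current_; }
    bool operator!=( const node_iterator & other ) const { return current_ != other.current_; }
private:
    raw_node * current_;
};

// Concatenate the names of a sibling list, to show the iterator in use.
inline std::string join_names( raw_node * first )
{
    std::string result;
    for ( node_iterator i( first ); i != node_iterator(); ++i )
    {
        if ( !result.empty() ) result += ',';
        result += ( *i ).name();
    }
    return result;
}
```

A default-constructed iterator serves as the end marker, which matches a null `next` pointer at the end of a sibling list.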

 Reference semantics are convenient when passing and returning Documents
 to/from functions. Documents can be deep copied with clone().
 
 Another option is to drop the reference semantics. A Document can be
 noncopyable with clone(), mandating the use of auto_ptr, or it can have deep
 copy semantics.

Which is what I meant when I said...
  If you want to stick with clone, return
  an auto_ptr and derive basic_document from boost::noncopyable.

But you have to ask yourself: how would you feel if std::vector worked
this way?
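
For comparison, a minimal sketch of the noncopyable-with-clone option. It swaps in C++11 facilities that postdate this thread: deleted copy operations stand in for boost::noncopyable and `std::unique_ptr` stands in for `auto_ptr`. The `document` class and its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical document type: implicit copying is disabled,
// duplication has to be requested by name.
class document
{
public:
    document() {}
    document( const document & ) = delete;
    document & operator=( const document & ) = delete;

    void add( const std::string & text ) { nodes_.push_back( text ); }
    std::size_t size() const { return nodes_.size(); }

    // Deep copy under an explicit name; ownership passes to the caller.
    std::unique_ptr< document > clone() const
    {
        std::unique_ptr< document > copy( new document );
        copy->nodes_ = nodes_;
        return copy;
    }
private:
    std::vector< std::string > nodes_;
};
```

With this shape `document b = a;` does not compile, so an accidental pass-by-value is caught at build time, at the cost of behaving differently from the standard containers.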

 All of these solutions have their pros and cons, but other things being
 equal I tend towards the Java model. Deep copy seems inappropriate for a
 Document since it is a very expensive operation that's better given an
 explicit name. (I just found a bug in my code where I accidentally passed an
 expensive data structure by value; performance went downhill, I was
 stumped.)

It might be expensive for large documents (probably order N) but so is
std::vector's copy constructor.

How would you have felt if you had used an object believing it to deep
copy only to find out it didn't and as a result all the customer records
in your database ended up pointing to the same address value? 
shared_ptr< document > is easy to spot.

Shallow copy is for iterator, pointer and reference types only! It is
dangerous and should be written on the tin in bold print.
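
A small sketch of the aliasing problem described here, using hypothetical record types and `std::shared_ptr` in place of boost::shared_ptr. The two functions perform the same steps; only the copy semantics of the member differ.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct address { std::string street; };

// Value semantics: every copy of a record gets its own address.
struct record_by_value { address home; };

// Handle semantics: copies share one address, and the member type says so.
struct record_by_handle { std::shared_ptr< address > home; };

// Copy a value-style record, edit the copy, and report the original's street.
inline std::string original_after_editing_value_copy()
{
    record_by_value first;
    first.home.street = "1 High St";
    record_by_value second = first;      // deep copy
    second.home.street = "2 Low Rd";
    return first.home.street;
}

// Same steps with a handle-style record: the edit shows through.
inline std::string original_after_editing_handle_copy()
{
    record_by_handle first;
    first.home = std::make_shared< address >();
    first.home->street = "1 High St";
    record_by_handle second = first;     // shallow copy, shared address
    second.home->street = "2 Low Rd";
    return first.home->street;
}
```

The value version leaves the original at "1 High St"; the handle version changes it to "2 Low Rd".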

-- 
Hamish Mackenzie

___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost


Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Hamish Mackenzie
On Fri, 2003-06-13 at 14:11, Stefan Seefeld wrote:
 Hmm, I see your point. Well, that would be possible, but that way you
 are unable to make nodes polymorphic. Neither with respect to the basic
 node types (Element, Attribute, Text, CData, etc.) nor later when 
 implementing real DOM support on top of it.

How about this (I forget the name of this pattern but it's in the GoF
book)

class node_type_handler
{
public:
  virtual void do_something( node_proxy & node, int params_go_here )
    = 0;
  static node_type_handler * handler( int node_type );
};

class element_node_type_handler : public node_type_handler
{
public:
  virtual void do_something( node_proxy & node, int params_go_here )
  {
    element_proxy( node ).do_something( params_go_here );
  }
};

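node_type_handler * node_type_handler::handler( int node_type )
{
  // Table indexed by node type; the array size is deduced from the
  // initializer, since a static array cannot be sized by a parameter.
  static node_type_handler * handlers[] =
  {
    new element_node_type_handler(),
    new attribute_node_type_handler()
    // ...
  };

  return handlers[ node_type ];
}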

class node_proxy
{
public:
  void do_something( int params_go_here )
  {
    node_type_handler::handler( type() )->
      do_something( *this, params_go_here );
  }
};

class element_proxy : public node_proxy
{
public:
  explicit element_proxy( const node_proxy & node );
  void do_something( int params_go_here )
  {
// Actually do_something
  }
};
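
The outline above leaves `type()` and the attribute handler undefined. A self-contained sketch of the same table-driven dispatch follows, with hypothetical names (`node_kind`, `describe`) and a string result so the effect is observable. The handlers are function-local statics rather than heap objects.

```cpp
#include <cassert>
#include <string>

// Hypothetical discriminator, standing in for the backend's node type field.
enum node_kind { element_kind = 0, attribute_kind = 1, kind_count = 2 };

class node_reference
{
public:
    explicit node_reference( node_kind kind ) : kind_( kind ) {}
    node_kind type() const { return kind_; }
    std::string describe() const;   // dispatches on type()
private:
    node_kind kind_;
};

class node_type_handler
{
public:
    virtual ~node_type_handler() {}
    virtual std::string describe( const node_reference & node ) const = 0;
    static const node_type_handler & handler( node_kind kind );
};

class element_handler : public node_type_handler
{
public:
    virtual std::string describe( const node_reference & ) const { return "element"; }
};

class attribute_handler : public node_type_handler
{
public:
    virtual std::string describe( const node_reference & ) const { return "attribute"; }
};

// One handler per node kind, looked up by index.
const node_type_handler & node_type_handler::handler( node_kind kind )
{
    static element_handler elements;
    static attribute_handler attributes;
    static const node_type_handler * const handlers[ kind_count ] =
        { &elements, &attributes };
    return *handlers[ kind ];
}

std::string node_reference::describe() const
{
    return node_type_handler::handler( type() ).describe( *this );
}
```

The node handle itself stays a plain value with no virtual functions; the virtual call happens only inside the handler table.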

  I have attached the wrappers I have written.  They do not cover much of
  libxml2 (just what I needed at the time).  Feel free to borrow as much
  or as little from it as you like.
 
 yeah, looks interesting, and even more thin than my wrapper. However,
 the thinner the wrapper gets the greater is indeed the danger of having
 the whole design tied to a particular implementation, as William pointed
 out earlier. I don't think that this is a problem with my wrapper lib,
 but with your implementation you get dangerously close...:-)

Making it thicker won't make it any easier to apply to other parsers,
especially if you rely on the existence of something like _private.  If
the wrapper is a good tight wrapper for libxml2, then when something
needs changing to be portable we can wrap the wrapper with a
portability layer.

  Looking up this node's parent node is thus simply 
  
 static_cast<Node *>(this->my_impl->parent->_private);
  
  
  If there was a parent lookup in node_proxy it would be
  
  class node_proxy
  {
  public:
    node_proxy parent() { return node_proxy( node_->parent ); }
  ...
  private:
xmlNodePtr node_;
  };
 
 yes, and you could even make that an 'element_proxy' as you know that
 parent nodes are always elements. However, with a flat set of (libxml2)
 nodes that wouldn't work any more, so runtime polymorphism would be
 lost. Well, may be there is no need for it either. I have to think over
 that...

True and that would fit in nicely with the code I outlined above

node.first_child().do_something( 0 );

Would go through the polymorphic lookup and

node.parent().do_something( 0 );

Would call the same code but without the lookup overhead.

 well, but you could also make it such that
 
 xml::dom::document doc = xml::dom::parse_file("a.xml");
 
 works with parse_file being a function. That would mean the document is
 copied, but then following your philosophy xml::dom::document could be
 a proxy, too, so copying could be cheap...

I don't think xml::dom::document should be a proxy as I see it as the
container and owner of all the nodes.  But if I remember correctly the
syntax above will work if parse_file is a class and document has a
constructor that takes it.

 I really don't like the idea of 'parse_file' being an object (whose
 state is a potentially already parsed document). It's unintuitive.
 
  parse_stream would indeed be even better.  As I recall there are
  functions in libxml2 that allow you to write to the parser as well.
 
 erm, that's even more confusing, I think. A parser should remain
 just a parser, i.e. something that extracts tokens from an input stream.

It is still just a parser (but works as a state machine).  Check out
xmlCreatePushParserCtxt and xmlParseChunk.

Say you need to receive lots of xml files over the internet and parse
them all at once.  You could use a thread per connection and have the
parsers read from the stream but that would require lots of threads.

With the push interface you can use async io to read from the sockets
and then write the data to the parsers as you get it.  Because the state
of the parse is not stored on the stack you do not need a separate
thread for each parser.
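
To illustrate why a push interface needs no thread of its own, here is a toy sketch that does not use libxml2 at all. The hypothetical `push_tag_counter` keeps its parse state in data members, so a chunk may end at any byte and the next `feed` call resumes where the last one stopped. It only counts start tags and would miscount a '<' inside comments or CDATA sections.

```cpp
#include <cassert>
#include <cstddef>

// Toy push "parser": counts element start tags, keeping state between chunks.
class push_tag_counter
{
public:
    push_tag_counter() : after_open_( false ), start_tags_( 0 ) {}

    // Feed the next chunk; it may end anywhere, even right after '<'.
    void feed( const char * data, std::size_t size )
    {
        for ( std::size_t i = 0; i != size; ++i )
        {
            const char c = data[ i ];
            if ( after_open_ )
            {
                // '</', '<?' and '<!' do not open an element.
                if ( c != '/' && c != '?' && c != '!' ) ++start_tags_;
                after_open_ = false;
            }
            else if ( c == '<' )
            {
                after_open_ = true;
            }
        }
    }

    std::size_t start_tags() const { return start_tags_; }
private:
    bool after_open_;          // the previous character was '<'
    std::size_t start_tags_;   // start tags seen so far
};
```

One such object per connection can be driven from a single async I/O loop: whenever a socket has data, pass it to that connection's `feed`.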

-- 
Hamish Mackenzie



Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Reece Dunn
Hamish Mackenzie wrote:

How about this for reading an input stream...

xml::dom::document doc( parse_stream( std::cin.rdbuf() ) );
I like this facility. You could also make it accept an istream like this:

class parse_stream
{
  public:
 parse_stream( std::istream & is ): sb( is.rdbuf()){...}
 ...
};
or something similar, making the above simpler:
  xml::dom::document doc( parse_stream( std::cin ));
-

Stefan Seefeld wrote:

Yes, but is that a problem ? Of course it has to be written in bold
strokes: Don't delete a document while operating on its content !, but I
think the main idea to get across is that nodes *can only* exist in the
context of a document. That's not only because of memory ownership issues,
but also for a variety of other contextual data associated with a node,
such as namespaces.
Any node can define namespaces:

<h:html xmlns:h = "html4" xmlns:h = "html4">
  <body>
     <m:math xmlns:m = "mathml2">...</m:math>
     <h:span>...</h:span>
     <mml:math xmlns:m = "mathml2">...</mml:math>
  </body>
</h:html>
so namespace lookup would propagate along the parents of each node until a 
matching namespace is found. This would mean that each node must store a 
(smart) pointer to a namespace mapping, in order to facilitate lookup (done 
during node construction). The node will also have a pointer to a namespace 
information object that looks something like:

class namespaceInfo
{
  std::string url;
  std::string name;
};
Have I got this wrong?
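
A minimal sketch of the parent-walking lookup described above, assuming a hypothetical `node` struct with a parent link and a per-node map of declared prefixes. It resolves a prefix on demand rather than caching a pointer at node construction, and it is not how libxml2 stores namespaces.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical node: its own namespace declarations plus a parent link.
struct node
{
    explicit node( const node * parent_node = 0 ) : parent( parent_node ) {}
    const node * parent;
    std::map< std::string, std::string > declared;   // prefix -> namespace URI
};

// Walk from the node towards the root; the nearest declaration wins.
// Returns an empty string when the prefix is not bound anywhere.
inline std::string lookup_namespace( const node * start, const std::string & prefix )
{
    for ( const node * n = start; n != 0; n = n->parent )
    {
        std::map< std::string, std::string >::const_iterator found =
            n->declared.find( prefix );
        if ( found != n->declared.end() ) return found->second;
    }
    return std::string();
}
```

Because the result depends on the chain of parents, moving a node elsewhere in the tree can change what a prefix resolves to.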

-

I also like the
  xml::dom::document document = xml::dom::parse_file(argv[1]);
style usage. Also:
  xml::dom::element e = doc.documentElement();
  xml::dom::node n = e;
This is just my preferred style/usage, and does not have to be adopted. This 
would, however, allow the implementation to be adapted to when memory 
management needs to be used and to switch between different management 
policies for the various implementations.



Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Reece Dunn
Reece Dunn wrote:

<h:html xmlns:h = "html4" xmlns:h = "html4">  <body> <m:math xmlns:m =
"mathml2">...</m:math> <h:span>...</h:span> <mml:math xmlns:m =
"mathml2">...</mml:math>  </body> </h:html>
This should have been (with a default namespace):

<h:html xmlns:h = "html4" xmlns = "html4">
 <body>
    <m:math xmlns:m = "mathml2">...</m:math>
    <h:span>...</h:span>
    <mml:math xmlns:m = "mathml2">...</mml:math>
 </body>
</h:html>
Regards,
Reece


Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Stefan Seefeld
Reece Dunn wrote:

Any node can define namespaces:

<h:html xmlns:h = "html4" xmlns:h = "html4">
  <body>
     <m:math xmlns:m = "mathml2">...</m:math>
     <h:span>...</h:span>
     <mml:math xmlns:m = "mathml2">...</mml:math>
  </body>
</h:html>
so namespace lookup would propagate along the parents of each node until 
a matching namespace is found. This would mean that each node must store 
a (smart) pointer to a namespace mapping, in order to facilitate lookup 
(done during node construction). The node will also have a pointer to a 
namespace information object that looks something like:

class namespaceInfo
{
  std::string url;
  std::string name;
};
Have I got this wrong?
no, it's correct...on a conceptual level.

The point I'm trying to make is that libxml2 does all this for me
already. The node itself doesn't store all the active namespaces, as
that would be a horrible waste. You have to query them (and therefore
you'll access the document and parent nodes).
For example, if a node is moved from one place to another, its context
will change and therefore some necessary adjustments have to be made,
notably concerning the namespaces. A look into the libxml2 source code
convinced me that it's quite a tricky job...
I also like the
  xml::dom::document document = xml::dom::parse_file(argv[1]);
style usage. Also:
  xml::dom::element e = doc.documentElement();
  xml::dom::node n = e;
This is just my preferred style/usage, and does not have to be adopted. 
you mean you vote for the 'nodes-are-references' style ?

This would, however, allow the implementation to be adapted to when 
memory management needs to be used and to switch between different 
management policies for the various implementations.
Can you elaborate ?

Regards,
Stefan


Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Hamish Mackenzie
On Fri, 2003-06-13 at 16:11, Stefan Seefeld wrote:
 well, it looks like a mix of things. What you are doing, essentially,
 is wrapping a polymorphic 'do_something' method around a non-C++
 type system, i.e. the real method invocation is done with a 'type()'
 discriminator.

Yes.

 Yes, I can see that, for the xml node types. But for that we don't even
 need anything but a single 'node_proxy' class (with a 'type()' method
 returning an enum).

True.

Though it might be useful in the next layer of the interface. If you end
up with another proxy type that has lots of

switch( raw_node_.some_node_property() )
{

}

I use a very similar system in an HTML editor where the entity name is
used to lookup the handler.  Because the lookup can be slow I cache the
result in the node itself.  We might have to expose _private if we
wanted to do that :-(.

Maybe we do need to allow the use of _private in higher level layers.

If so how about something like this?...

template< typename Value_Type, ... >
class document;

template< typename Value_Type, ... >
class node_reference
{
public:
  void value( Value_Type * value )
  {
    node_->_private = value;
  }
  Value_Type * value() const
  {
    return static_cast< Value_Type * >( node_->_private );
  }
};

Or some such???

 I think it is a good thing not to own them, but the semantics should
 be clear.
 
 yes, and you could even make that an 'element_proxy' as you know that
 parent nodes are always elements. However, with a flat set of (libxml2)
 nodes that wouldn't work any more, so runtime polymorphism would be
 lost. Well, may be there is no need for it either. I have to think over
 that...
  
  
  True and that would fit in nicely with the code I outlined above
  
  node.first_child().do_something( 0 );
  
  Would go through the polymorphic lookup and
  
  node.parent().do_something( 0 );
  
  Would call the same code but without the lookup overhead.
 
 indeed, though, on further thinking, nodes themselves don't do anything,
 so we could as well keep this polymorphism outside the node class, and
 let nodes only provide their type as an enum.
 
 I start to like your node reference class quite a lot... :-)

I am definitely leaning toward node_reference rather than node_proxy.
Is that your preferred name too?

 parse_stream would indeed be even better.  As I recall there are
 functions in libxml2 that allow you to write to the parser as well.
 
 erm, that's even more confusing, I think. A parser should remain
 just a parser, i.e. something that extracts tokens from an input stream.
  
  
  It is still just a parser (but works as a state machine).  Check out
  xmlCreatePushParserCtxt and xmlParseChunk.
  
  Say you need to receive lots of xml files over the internet and parse
  them all at once.  You could use a thread per connection and have the
  parsers read from the stream but that would require lots of threads.
 
 Are you suggesting that all the different xml files should be merged
 into a single dom document ?

No, many separate documents, each needing their own parser and thread.

  With the push interface you can use async io to read from the sockets
  and then write the data to the parsers as you get it.  Because the state
  of the parse is not stored on the stack you do not need a separate
  thread for each parser.
 
 yeah, a parser for asynchronous document creation may be interesting.
 But I see that as a somewhat different beast. Simple (local, 
 synchronous) document creation from an xml file doesn't need to
 go over a stateful parser object.

It's not a high priority but it is nice that libxml2 supports it.

-- 
Hamish Mackenzie [EMAIL PROTECTED]



Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Reece Dunn
Stefan Seefeld wrote

I also like the
  xml::dom::document document = xml::dom::parse_file(argv[1]);
style usage. Also:
  xml::dom::element e = doc.documentElement();
  xml::dom::node n = e;

This is just my preferred style/usage, and does not have to be adopted.

you mean you vote for the 'nodes-are-references' style ?
I mean, for example:
  namespace boost { namespace xml { namespace dom
  {
 typedef DocumentImpl * document;
  }}}
This would, however, allow the implementation to be adapted to when memory 
management needs to be used and to switch between different management 
policies for the various implementations.

Can you elaborate ?
Sure. In the libxml2 versions, nodes are explicitly deleted, and thus do not 
need to be managed, e.g:
  typedef node_impl * node;

If an object requires explicit management, via shared_ptr for example:
  typedef boost::shared_ptr< node_impl > node;
Or, if it is say using a wrapper around an MSXML interface:
  class node: public CComPtr< IXMLDOMNode > { ... };
The user can thus write:
  boost::xml::dom::node mynode = doc.selectNode( "/h:*[1]" );
without worrying about the underlying representation of 
boost::xml::dom::node.

Regards,
Reece


Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Hamish Mackenzie
On Fri, 2003-06-13 at 19:39, Reece Dunn wrote:
 This would, however, allow the implementation to be adapted to when memory 
 management needs to be used and to switch between different management 
 policies for the various implementations.
 
 Can you elaborate ?
 
 Sure. In the libxml2 versions, nodes are explicitly deleted, and thus do not 
 need to be managed, e.g:
typedef node_impl * node;
 
 If an object requires explicit management, via shared_ptr for example:
typedef boost::shared_ptr< node_impl > node;
 
 Or, if it is say using a wrapper around an MSXML interface:
class node: public CComPtr< IXMLDOMNode > { ... };
 
 The user can thus write:
boost::xml::dom::node mynode = doc.selectNode( "/h:*[1]" );
 
 without worrying about the underlying representation of 
 boost::xml::dom::node.

These are all pointer types though, so node_ptr is perhaps a better
name.

How do you feel about having a node_reference instead?

// libxml2
class node_reference
{
private:
  xmlNodePtr node_;
};

// MSXML
class node_reference
{
private:
  CComPtr< IXMLDOMNode > node_;
};

boost::xml::dom::node_reference mynode = doc.selectNode( "/h:*[1]" );

A node_reference to an object in the document would only be safe as long
as that node existed in the document.

-- 
Hamish Mackenzie [EMAIL PROTECTED]



Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Reece Dunn
Hamish Mackenzie wrote:

How do you feel about having a node_reference instead.
What are the access differences? I.e. could you use '.' to access
attributes/methods as opposed to '->'?

// libxml2
class node_reference
{
private:
  xmlNodePtr node_;
};
[snip]

boost::xml::dom::node_reference mynode = doc.selectNode( "/h:*[1]" );
I am not that bothered *how* it is implemented, so long as it provides a 
decent enough interface.

I was also showing how you could make it more portable by not explicitly 
using a
  Document * doc = ...;
style code fragment.

Regards,
Reece


Re: [boost] API Review request: XML APIs for C++

2003-06-12 Thread Peter Dimov
Stefan Seefeld wrote:

 Is there any interest in this library evolving
 into a boost::xml library ? If so, what needs to change,
 what needs to be added / removed ?

I think that there is considerable interest in a boost::xml library. But...

Document *document = Document::parse_file(argv[1]);

... I don't believe that a raw pointer based interface is acceptable.

xml::dom::document document = xml::dom::parse_file(argv[1]);

looks much better. BTW why is basic_document::write_file virtual but
basic_document::clone isn't?

The SAX part looks OK to me.



Re: [boost] API Review request: XML APIs for C++

2003-06-12 Thread Stefan Seefeld
Peter Dimov wrote:
Stefan Seefeld wrote:

Is there any interest in this library evolving
into a boost::xml library ? If so, what needs to change,
what needs to be added / removed ?


I think that there is considerable interest in a boost::xml library. But...

Document *document = Document::parse_file(argv[1]);

... I don't believe that a raw pointer based interface is acceptable.

xml::dom::document document = xml::dom::parse_file(argv[1]);

looks much better.
Good catch. However, it looks worse than it actually is :-) :

The memory management for nodes is entirely handled by the backend
(libxml2), i.e. nodes are always created and deleted by their parents.
Constructors and destructors are protected.
The 'Document' class is the only one that is owned directly by the
user, and thus has to be deleted.
BTW why is basic_document::write_file virtual but
basic_document::clone isn't?
hmm, good question. I don't see any need for it to be virtual, I
wouldn't expect anybody to derive from Document.
Regards,
Stefan


Re: [boost] API Review request: XML APIs for C++

2003-06-12 Thread Peter Dimov
Stefan Seefeld wrote:
 Peter Dimov wrote:

 I think that there is considerable interest in a boost::xml library.
 But...

 Document *document = Document::parse_file(argv[1]);

 ... I don't believe that a raw pointer based interface is acceptable.

 xml::dom::document document = xml::dom::parse_file(argv[1]);

 looks much better.

 Good catch. However, it looks worse than it actually is :-) :

 The memory management for nodes is entirely handled by the backend
 (libxml2), i.e. nodes are always created and deleted by their parents.
 Constructors and destructors are protected.

 The 'Document' class is the only one that is owned directly by the
 user, and thus has to be deleted.

Looks reasonable, but we don't want the architecture of the backend to
affect the interface. There is also the problem that the user can be left
with an invalid pointer after the document has been deleted.

Since the DOM is a tree and has no cycles, it should be possible to get
fairly close to the Java interface using strict ownership or shared_ptr
underneath. In the libxml2 case every Node would need to keep the whole
Document alive but this may not be necessary given a different backend.

Of course if this isn't practical a quick fix would be to return a
shared_ptr<Document> from parse_file.
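
A minimal sketch of "every Node keeps the whole Document alive", using `std::shared_ptr` in place of boost::shared_ptr. The `document_impl` and `node` types are hypothetical; a node here is an index into storage owned by the document, which sidesteps the question of what happens when a node is removed from the tree.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical backend: the document owns every node's storage.
struct document_impl
{
    std::vector< std::string > names;   // one entry per node
};

// Node handle: a position in the document plus a share in its lifetime.
class node
{
public:
    node( const std::shared_ptr< document_impl > & doc, std::size_t index )
        : doc_( doc ), index_( index ) {}
    const std::string & name() const { return doc_->names[ index_ ]; }
private:
    std::shared_ptr< document_impl > doc_;   // keeps the document alive
    std::size_t index_;
};

// Build a document, hand out one node, then drop the document handle.
inline node first_node_of_temporary_document()
{
    std::shared_ptr< document_impl > doc = std::make_shared< document_impl >();
    doc->names.push_back( "root" );
    return node( doc, 0 );
}
```

The returned node stays usable after the local document handle is gone. Ownership points only from node handles to the document, so no reference cycle arises.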



Re: [boost] API Review request: XML APIs for C++

2003-06-12 Thread Stefan Seefeld
Peter Dimov wrote:

Looks reasonable, but we don't want the architecture of the backend to
affect the interface.
right. So what would be reasonable semantics to expect from a dom API
? Maybe I've just got used to libxml2's way of life, but I think it is
a good choice. Nodes are owned by their parents, so you can do
dom::element *child = parent->add_child(info);

And calling

dom::element::iterator i = parent->find(child);
parent->erase_child(i);
will invalidate 'child'. I don't know of any way to make that safer
while still being efficient.
There is also the problem that the user can be left
with an invalid pointer after the document has been deleted.
Yes, but is that a problem ? Of course it has to be written in bold 
strokes: Don't delete a document while operating on its content !,
but I think the main idea to get across is that nodes *can only* exist
in the context of a document. That's not only because of memory 
ownership issues, but also for a variety of other contextual data
associated with a node, such as namespaces.

I had a long discussion with the libxml2 author about ownership
semantics and he convinced me that the current way is the best tradeoff
between simplicity/ease of use and efficiency.
Since the DOM is a tree and has no cycles, it should be possible to get
fairly close to the Java interface using strict ownership or shared_ptr
underneath. In the libxml2 case every Node would need to keep the whole
Document alive but this may not be necessary given a different backend.
I don't understand that. The document owns its nodes, so letting nodes
reference the document would create loops, no ?
Of course if this isn't practical a quick fix would be to return a
shared_ptr<Document> from parse_file.
yeah, that's unrelated. But why not std::auto_ptr then ?

Regards,
Stefan


Re: [boost] API Review request: XML APIs for C++

2003-06-12 Thread Hamish Mackenzie
Looks good.  I have only looked at the dom stuff so far.  Why are you
storing information in _private?  What goes in there that could not be
extracted when the nodes are accessed?  It seems like a lot of
unnecessary overhead.

Here are some suggestions...

1) parse_file could be

class parse_file
{
public:
  parse_file( const std::string & ) {}
  ...
};

template<...>
class basic_document
{
public:
  basic_document( const parse_file & );
};

2) How about parse_string (based on xmlParseMemory)?

3) Why dom::basic_document::clone?  Why not have the copy constructor
and assignment operator do a deep copy of the document?  This is
consistent with other containers. If you want to stick with clone, return
an auto_ptr and derive basic_document from boost::noncopyable.

4) How about something like

class basic_document
{
public:
  ...

  typedef node_iterator iterator;
  typedef const_node_iterator const_iterator;

  iterator begin()
    { return iterator( ptr_->children ); }
  const_iterator begin() const
    { return const_iterator( ptr_->children ); }
  iterator end() { return iterator(); }
  const_iterator end() const { return const_iterator(); }

  iterator root()
    { return iterator( xmlDocGetRootElement( ptr_ ) ); }
  const_iterator root() const
    { return const_iterator( xmlDocGetRootElement( ptr_ ) ); }
};

5) basic_xpath and basic_xpath_result should be derived from
boost::noncopyable

6) Leading _ chars are a no no (reserved for compiler implementors).  I
know _private is defined in libxml2, but what you see below is not...

$ grep ' _' *
basic_attribute.h:#ifndef _dom_basic_attribute_h
basic_attribute.h:#define _dom_basic_attribute_h
basic_comment.h:#ifndef _dom_basic_comment_h
basic_comment.h:#define _dom_basic_comment_h
basic_document.h:#ifndef _dom_basic_document_h
basic_document.h:#define _dom_basic_document_h
basic_dtd.h:#ifndef _dom_basic_dtd_h
basic_dtd.h:#define _dom_basic_dtd_h
basic_element.h:#ifndef _dom_basic_element_h
basic_element.h:#define _dom_basic_element_h
basic_node.h:#ifndef _dom_basic_node_h
basic_node.h:#define _dom_basic_node_h
basic_node.h:  basic_node_iterator(impl *current = 0) :
_current(current) {}
basic_node.h:  bool operator == (self i) { return _current ==
i._current;}
basic_node.h:  void increment() { _current = _current->next;}
basic_node.h:  void decrement() { _current = _current->prev;}
basic_node.h:  basic_node_const_iterator(const impl *current = 0) :
_current(current) {}
basic_node.h:  bool operator == (self i) { return _current ==
i._current;}
basic_node.h:  void increment() { _current = _current->next;}
basic_node.h:  void decrement() { _current = _current->prev;}
basic_pi.h:#ifndef _dom_basic_pi_h
basic_pi.h:#define _dom_basic_pi_h
basic_text.h:#ifndef _dom_basic_text_h
basic_text.h:#define _dom_basic_text_h
basic_traversal.h:#ifndef _dom_basic_traversal_h
basic_traversal.h:#define _dom_basic_traversal_h

Hamish

On Thu, 2003-06-12 at 03:45, Stefan Seefeld wrote: 
 hi there,
 
 following some discussion we had some weeks ago,
 I'd like to invite everybody to review xml++.tgz at
 
 http://groups.yahoo.com/group/boost/files/xml/
 
 It's a DOM-like and a SAX-like API currently implemented
 on top of libxml2 (http://www.xmlsoft.org).
 
 What it provides:
 
 * parsing of xml files and creation of a document tree
 * manipulation of document tree, i.e. insertion and
deletion of nodes
 * node iteration, search (xpath based)
 * document output to a (xml) file
 
 * event driven xml file parsing (sax)
 
 To be added:
 
 * validation (dtd, schema, etc.)
 * ?
 
 Is there any interest in this library evolving
 into a boost::xml library ? If so, what needs to change,
 what needs to be added / removed ?
 
 Regards,
   Stefan
 
-- 
Hamish Mackenzie [EMAIL PROTECTED]



Re: [boost] API Review request: XML APIs for C++

2003-06-12 Thread Stefan Seefeld
Hamish Mackenzie wrote:
Looks good.  I have only looked at the dom stuff so far.  Why are you
storing information in _private?  What goes in there that could not be
extracted when the nodes are accessed?  It seems like a lot of
unnecessary overhead.
I'm wrapping libxml2. The structs provided by libxml2 all carry a
'_private' member, precisely because it's a good way for extensions
such as language wrapping.
libxml2 itself calls callbacks whenever it allocates instances of these
structs, and I allocate my C++ wrappers in these callbacks, and let the
_private member point to it.
That way I've got a pointer from the C struct to the C++ wrapper
(_private), as well as a pointer from the C++ wrapper to the C struct
(my_impl).
Looking up this node's parent node is thus simply
static_cast<Node *>(this->my_impl->parent->_private);

Here are some suggestions...

1) parse_file could be

class parse_file
{
public:
  parse_file( const std::string & ) {}
  ...
};
making 'parse_file' a class suggests it is carrying some data/state.
What would that be ? I'm thinking of 'parse_file' as a stateless
factory, i.e. a function returning a newly created document.
2) How about parse_string (based on xmlParseMemory)?
hmm, while that would be possible, I think it's more C++'ish to
provide document extraction from a std::streambuf, which could be
a stringbuf or any other buffer implementation. (Note that I wouldn't
use std::iostreams as that would suggest that formatted extraction is
possible, which would only work on ascii, not on unicode content.)
3) Why dom::basic_document::clone?  Why not have the copy constructor
and assignment operator do a deep copy of the document?  This is
consistent with other containers. If you want to stick with clone, return
an auto_ptr and derive basic_document from boost::noncopyable.
fair enough.

4) How about something like

class basic_document
{
public:
  ...
  typedef node_iterator iterator;
  typedef const_node_iterator const_iterator;
  iterator begin()
    { return iterator( ptr_->children ); }
  const_iterator begin() const
    { return const_iterator( ptr_->children ); }
  iterator end() { return iterator(); }
  const_iterator end() const { return const_iterator(); }
  iterator root()
    { return iterator( xmlDocGetRootElement( ptr_ ) ); }
  const_iterator root() const
    { return const_iterator( xmlDocGetRootElement( ptr_ ) ); }
};
I don't understand: are you suggesting an iterator that traverses
the whole tree (as opposed to the children of a single node) ?
While that would be possible, I don't think it would actually
be useful. I've written a Visitor which I use to look up specific
nodes. As the node 'type system' is known without C++ RTTI that
doesn't require double dispatching though...
5) basic_xpath and basic_xpath_result should be derived from
boost::noncopyable

ok

6) Leading _ chars are a no no (reserved for compiler implementors).  I
know _private is defined in libxml2, but what you see below is not...

ok

Thanks for the feedback!

Regards,
Stefan
___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost


Re: [boost] API Review request: XML APIs for C++

2003-06-12 Thread Hamish Mackenzie
On Thu, 2003-06-12 at 22:12, Stefan Seefeld wrote: 
 I'm wrapping libxml2. The structs provided by libxml2 all carry a
 '_private' member, precisely because it's a good way for extensions
 such as language wrapping.
 libxml2 itself calls callbacks whenever it allocates instances of these
 structs, and I allocate my C++ wrappers in these callbacks, and let the
 _private member point to it.
 That way I've got a pointer from the C struct to the C++ wrapper
 (_private), as well as a pointer from the C++ wrapper to the C struct
 (my_impl).
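
The two-way link described in the quoted text can be sketched as follows. This is a minimal illustration only: c_node is a hypothetical stand-in for a libxml2 struct (only the parent and _private fields are modelled), Node is a hypothetical wrapper, and the allocation callbacks mentioned above are left out.

```cpp
#include <cassert>

// Hypothetical stand-in for a libxml2 node struct; only the fields used here.
struct c_node
{
  c_node *parent;
  void   *_private; // slot reserved for the language wrapper
};

// Hypothetical C++ wrapper that keeps the pointer back to the C struct.
class Node
{
public:
  explicit Node( c_node *impl ) : my_impl( impl ) { impl->_private = this; }
  ~Node() { my_impl->_private = nullptr; }

  Node( const Node & ) = delete;
  Node &operator=( const Node & ) = delete;

  // Parent lookup goes wrapper -> C struct -> parent C struct -> wrapper.
  Node *parent() const
  {
    c_node *p = my_impl->parent;
    return p ? static_cast<Node *>( p->_private ) : nullptr;
  }

private:
  c_node *my_impl;
};
```

Every C struct that is reachable this way needs a live wrapper object behind its _private slot, which is the cost the reply below questions.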

Would a thin proxy object not be a better way to go?  It is more in keeping
with the "you don't pay for what you don't use" philosophy of C++.

I have attached the wrappers I have written.  They do not cover much of
libxml2 (just what I needed at the time).  Feel free to borrow as much
or as little from it as you like.

If you look in node_iterator.h you can see that it uses a proxy
containing just a pointer to the libxml2 node.

 Looking up this node's parent node is thus simply
 static_cast<Node *>(this->my_impl->parent->_private);

If there was a parent lookup in node_proxy it would be

class node_proxy
{
public:
  node_proxy parent() { return node_proxy( node_->parent ); }
...
private:
  xmlNodePtr node_;
};
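
A self-contained version of this idea might look like the sketch below. The fake_node struct is an assumed stand-in for libxml2's node type so the example compiles without the library; is_null() and operator== are additions for illustration and are not taken from the attached wrappers.

```cpp
#include <cassert>

// Assumed stand-in for libxml2's xmlNode; only the field used here.
struct fake_node
{
  fake_node *parent;
};

// Thin proxy: holds one pointer, allocates nothing, is freely copyable.
class node_proxy
{
public:
  explicit node_proxy( fake_node *node ) : node_( node ) {}

  // Precondition: the proxy is not null.
  node_proxy parent() const { return node_proxy( node_->parent ); }

  bool is_null() const { return node_ == nullptr; }

  friend bool operator==( node_proxy a, node_proxy b )
  {
    return a.node_ == b.node_;
  }

private:
  fake_node *node_;
};
```

Because the proxy is created on demand and thrown away, nothing has to be allocated when the underlying library creates a node.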

  Here are some suggestions...
  
  1) parse_file could be
  
  class parse_file
  {
  public:
    parse_file( const std::string & ) : {}
    ...
  };
 
 making 'parse_file' a class suggests it is carrying some data/state.
 What would that be ? I'm thinking of 'parse_file' as a stateless
 factory, i.e. a function returning a newly created document.

Sorry, I should have included the "..." bit:

class parse_file
{
public:
  parse_file( const std::string & f ) : file_name_( f ) {}
private:
  const std::string & file_name_;

  template <...>
  friend class ::boost::xml::dom::document;
};

Put that together with the constructor...

template <...>
class basic_document
{
public:
  basic_document( const parse_file & f )
  {
    // does what parse_file function does now
  }
};

And you can write code that goes

xml::dom::document doc( xml::dom::parse_file( "a.xml" ) );

No auto_ptr needed.  It's not in my wrappers as I only thought of it
while I was reading through yours.
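
Put together without the templates, the idiom could be sketched like this. The document class here is a hypothetical stand-in that only records the requested file name; a real constructor would run the parser at the marked point.

```cpp
#include <cassert>
#include <string>

class document; // forward declaration for the friend below

// Argument object: carries the file name to document's constructor.
class parse_file
{
public:
  explicit parse_file( const std::string &f ) : file_name_( f ) {}

private:
  // Reference member: only valid while the argument string is alive,
  // which is long enough when parse_file is used as a temporary.
  const std::string &file_name_;
  friend class document;
};

// Hypothetical stand-in for xml::dom::document.
class document
{
public:
  // A real implementation would parse the file here (assumption);
  // this sketch only stores the name.
  document( const parse_file &f ) : source_( f.file_name_ ) {}

  const std::string &source() const { return source_; }

private:
  std::string source_;
};
```

With these definitions, `document doc( parse_file( "a.xml" ) );` constructs the document in place. The reference member is safe in that expression but would dangle if a parse_file object outlived the string it was built from.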


  2) How about parse_string (based on xmlParseMemory)?
 
 hmm, while that would be possible, I think it's more C++'ish to
 provide document extraction from a std::streambuf, which could be
 a string_buf or any other buffer implementation. (Note that I wouldn't
 use std::iostreams as that would suggest that formatted extraction is
 possible, which would only work on ASCII, not on Unicode content.)

parse_stream would indeed be even better.  As I recall there are
functions in libxml2 that allow you to write to the parser as well.

How about this for reading an input stream...

xml::dom::document doc( parse_stream( std::cin.rdbuf() ) );

and, if you want to write to the parser (for instance, if the data is
coming from a series of asynchronous read operations):

class document
{
public:
  std::streambuf & parser();
};

xml::dom::document doc;
doc.parser().sputn( buffer, buffer_size );
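
One way such a parser() buffer could be built is an unbuffered std::streambuf that forwards each write to an incremental parser. The sketch below assumes a hypothetical chunk_sink in place of the parser; in a libxml2-based version the feed step would presumably call the push-parser interface (xmlParseChunk), which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <streambuf>
#include <string>

// Hypothetical stand-in for an incremental parser: it only accumulates input.
struct chunk_sink
{
  std::string received;
  void feed( const char *data, std::size_t size ) { received.append( data, size ); }
};

// Unbuffered output streambuf that hands every write straight to the sink.
class parser_buf : public std::streambuf
{
public:
  explicit parser_buf( chunk_sink &sink ) : sink_( sink ) {}

protected:
  // Called by sputn() for block writes.
  std::streamsize xsputn( const char *s, std::streamsize n ) override
  {
    sink_.feed( s, static_cast<std::size_t>( n ) );
    return n;
  }

  // Called by sputc() for single characters, since no put area is set up.
  int_type overflow( int_type ch ) override
  {
    if ( !traits_type::eq_int_type( ch, traits_type::eof() ) )
    {
      const char c = traits_type::to_char_type( ch );
      sink_.feed( &c, 1 );
    }
    return traits_type::not_eof( ch );
  }

private:
  chunk_sink &sink_;
};
```

Callers then push data with sputn() as each asynchronous read completes, or wrap the buffer in a std::ostream if they prefer write().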

  4) How about something like
  
  class basic_document
  {
  public:
    ...
  
    typedef node_iterator iterator;
    typedef const_node_iterator const_iterator;
  
    iterator begin()
      { return iterator( ptr_->children ); }
    const_iterator begin() const
      { return const_iterator( ptr_->children ); }
    iterator end() { return iterator(); }
    const_iterator end() const { return const_iterator(); }
  
    iterator root()
      { return iterator( xmlDocGetRootElement( ptr_ ) ); }
    const_iterator root() const
      { return const_iterator( xmlDocGetRootElement( ptr_ ) ); }
  };
 
 I don't understand: are you suggesting an iterator that traverses
 the whole tree (as opposed to the children of a single node)?

No, just the children.  I figure node.begin() and node.end() would be for
iteration over the child nodes.
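
Child-only iteration of this kind can be sketched as a forward walk along the sibling list. The tree_node struct below is an assumed stand-in for the libxml2 node (first-child and next-sibling pointers only), and for brevity the iterator dereferences to the node itself rather than to a node_proxy.

```cpp
#include <cassert>
#include <string>

// Assumed stand-in for a libxml2 node: children form a sibling-linked list.
struct tree_node
{
  const char *name;
  tree_node  *children; // first child, or null
  tree_node  *next;     // next sibling, or null
};

class node_iterator
{
public:
  node_iterator() : node_( nullptr ) {} // default-constructed == end
  explicit node_iterator( tree_node *n ) : node_( n ) {}

  tree_node &operator*() const { return *node_; }
  tree_node *operator->() const { return node_; }
  node_iterator &operator++() { node_ = node_->next; return *this; }

  friend bool operator==( node_iterator a, node_iterator b ) { return a.node_ == b.node_; }
  friend bool operator!=( node_iterator a, node_iterator b ) { return !( a == b ); }

private:
  tree_node *node_;
};

// begin()/end() cover the direct children of one node only.
inline node_iterator begin( tree_node &n ) { return node_iterator( n.children ); }
inline node_iterator end( tree_node & ) { return node_iterator(); }

// Example use: join the names of a node's direct children.
inline std::string child_names( tree_node &parent )
{
  std::string out;
  for ( node_iterator it = begin( parent ); it != end( parent ); ++it )
  {
    if ( !out.empty() ) out += ',';
    out += it->name;
  }
  return out;
}
```

Grandchildren are never visited; walking the whole tree would mean calling begin() again on each child.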

-- 
Hamish Mackenzie [EMAIL PROTECTED]


xml_stuff.tar.gz
Description: application/compressed-tar


[boost] API Review request: XML APIs for C++

2003-06-11 Thread Stefan Seefeld
hi there,

following some discussion we had some weeks ago,
I'd like to invite everybody to review xml++.tgz at
http://groups.yahoo.com/group/boost/files/xml/

It's a DOM-like and a SAX-like API currently implemented
on top of libxml2 (http://www.xmlsoft.org).
What it provides:

* parsing of XML files and creation of a document tree
* manipulation of the document tree, i.e. insertion and
  deletion of nodes
* node iteration, search (XPath-based)
* document output to an (XML) file
* event-driven XML file parsing (SAX)

To be added:

* validation (DTD, schema, etc.)
* ?

Is there any interest in this library evolving
into a boost::xml library? If so, what needs to change,
what needs to be added / removed?

Regards,
Stefan