subject:"Re\: \[boost\] API Review request\: XML APIs for C\+\+"

Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Stefan Seefeld

Reece Dunn wrote:
Hamish Mackenzie wrote:

How do you feel about having a node_reference instead.


What are the access differences? I.e. could you use '.' to access 
attributes/methods as opposed to '->'?
yes of course, that's the whole point: it's a wrapper around a C 
pointer, endowed with all the methods you would expect in a C++ xml API.
But, much like in my original proposal, these methods are very thin
wrappers that delegate the call to libxml2.

Regards,
Stefan
___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Reece Dunn

Hamish Mackenzie wrote:

How do you feel about having a node_reference instead.
What are the access differences? I.e. could you use '.' to access 
attributes/methods as opposed to '->'?

// libxml2
class node_reference
{
private:
  xmlNodePtr node_;
};
[snip]

boost::xml::dom::node_reference mynode = doc.selectNode( "/h:*[1]" );
I am not that bothered *how* it is implemented, so long as it provides a 
decent enough interface.

I was also showing how you could make it more portable by not explicitly 
using a
  Document * doc = ...;
style code fragment.

Regards,
Reece
_
Find a cheaper internet access deal - choose one to suit you. 
http://www.msn.co.uk/internetaccess

___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Hamish Mackenzie

On Fri, 2003-06-13 at 19:39, Reece Dunn wrote:
> >>This would, however, allow the implementation to be adapted to when memory 
> >>management needs to be used and to switch between different management 
> >>policies for the various implementations.
> 
> >Can you elaborate ?
> 
> Sure. In the libxml2 versions, nodes are explicitly deleted, and thus do not 
> need to be managed, e.g:
>typedef node_impl * node;
> 
> If an object requires explicit management, via shared_ptr for example:
>typedef boost::shared_ptr< node_impl > node;
> 
> Or, if it is say using a wrapper around an MSXML interface:
>class node: public CComPtr< IXMLDOMNode >{ ... };
> 
> The user can thus write:
>boost::xml::dom::node mynode = doc.selectNode( "/h:*[1]" );
> 
> without worrying about the underlying representation of 
> boost::xml::dom::node.

These are all pointer types though, so node_ptr is perhaps be a better
name.

How do you feel about having a node_reference instead.

// libxml2
class node_reference
{
private:
  xmlNodePtr node_;
};

// MSXML
class node_reference
{
private:
  CComPtr< IXMLDOMNode > node_; 
};

boost::xml::dom::node_reference mynode = doc.selectNode( "/h:*[1]" );

A node_reference to an object in the document would only be safe as long
as that node existed in the document.

-- 
Hamish Mackenzie <[EMAIL PROTECTED]>

___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Hamish Mackenzie

On Fri, 2003-06-13 at 18:53, Stefan Seefeld wrote:
> > Maybe we do need to allow the use of _private in higher level layers.
> 
> I don't think that's a good idea, in particular because it's to tightly
> coupled to a particular implementation. We could allow a 'payload' for
> nodes (possibly a template parameter), but we could as well do the 
> association between a node and its template in an external lookup
> mechanism, which seems to be more clean and drag in the least overhead.

Cool.  I think the at least one association should belong to the
document (so that associations can be removed when the node or it's
parents are erased from the document).  Something like

template< typename node_data, ... >
class document
{
public:
  typedef xmlNodePtr node_key;
  std::map< node_key, node_data > & node_data_map(); 
  const std::map< node_key, node_data > & node_data_map() const;

private:
  std::map< node_key, node_data > node_data_map_; 
};

class node_reference
{
public:
  typedef xmlNodePtr node_key;
  node_key key() { return node_; }
};

A libxml2 implementation could substitute node_data_map for a class that
uses _private to store the data (or to speedup lookup).  However this
would be an optimisation (definitely not a requirement).

The user would be free to set up other associative containers using the
node key.  However they would be responsible for erasing the entries for
deleted nodes themselves. 

-- 
Hamish Mackenzie <[EMAIL PROTECTED]>

___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Reece Dunn

Stefan Seefeld wrote

I also like the
  xml::dom::document document = xml::dom::parse_file(argv[1]);
style usage. Also:
  xml::dom::element e = doc.documentElement();
  xml::dom::node n = e;

This is just my preferred style/usage, and does not have to be adopted.

you mean you vote for the 'nodes-are-references' style ?
I mean, for example:
  namespace boost { namespace xml { namespace dom
  {
 typedef DocumentImpl * document;
  }}}
This would, however, allow the implementation to be adapted to when memory 
management needs to be used and to switch between different management 
policies for the various implementations.

Can you elaborate ?
Sure. In the libxml2 versions, nodes are explicitly deleted, and thus do not 
need to be managed, e.g:
  typedef node_impl * node;

If an object requires explicit management, via shared_ptr for example:
  typedef boost::shared_ptr< node_impl > node;
Or, if it is say using a wrapper around an MSXML interface:
  class node: public CComPtr< IXMLDOMNode >{ ... };
The user can thus write:
  boost::xml::dom::node mynode = doc.selectNode( "/h:*[1]" );
without worrying about the underlying representation of 
boost::xml::dom::node.

Regards,
Reece
_
Stay in touch with absent friends - get MSN Messenger 
http://www.msn.co.uk/messenger

___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Stefan Seefeld

Hamish Mackenzie wrote:

Yes, I can see that, for the xml node types. But for that we don't even
need anything but a single 'node_proxy' class (with a 'type()' method
returning an enum).


True.

Though it might be useful in the next layer of the interface. If you end
up with another proxy type that has lots of
switch( raw_node_.some_node_property() )
{

}
I use a very similar system in an HTML editor where the entity name is
used to lookup the handler.
really the entire name ? Hmm. I'm thinking of the DOM spec(s), and
domain specific extensions. There typically new node types are
elements, i.e. they would (logically) all derive from dom::element.
(SVGElement, say).
Hmm, I'm not sure yet how to 'extend' dom::nodes towards these DOM
types. But it seems inheritance is not the right solution, especially
if we don't control the node's construction, i.e. we can't just plug
in a 'SVG node factory'.
Anyways, that's another layer...:-)
Maybe we do need to allow the use of _private in higher level layers.
I don't think that's a good idea, in particular because it's to tightly
coupled to a particular implementation. We could allow a 'payload' for
nodes (possibly a template parameter), but we could as well do the 
association between a node and its template in an external lookup
mechanism, which seems to be more clean and drag in the least overhead.

I start to like your node reference class quite a lot... :-)


I am definitely leaning toward "node_reference" rather than "node_proxy"
is that your preferred name too? 
I was talking about your proxy approach as opposed to my node wrapper 
classes that are constructed on the heap.

But yeah, concerning the naming I prefer reference over proxy.

yeah, a parser for asynchronous document creation may be interesting.
But I see that as a somewhat different beast. Simple (local, 
synchronous) document creation from an xml file doesn't need to
go over a stateful parser object.


Its not a high priority but it is nice that libxml2 supports it.  
Agreed. So we could have both, stateless (synchronous) parsers creating
a document in a single call, and statefull asynchronous parsers that
can be bound to event loops or other 'signals'.
Regards,
Stefan
___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Hamish Mackenzie

On Fri, 2003-06-13 at 16:11, Stefan Seefeld wrote:
> well, it looks like a mix of things. What you are doing, essentially,
> is wrapping a polymorphic 'do_something' method around a non-C++
> type system, i.e. the real method invocation is done with a 'type()'
> discriminator.

Yes.

> Yes, I can see that, for the xml node types. But for that we don't even
> need anything but a single 'node_proxy' class (with a 'type()' method
> returning an enum).

True.

Though it might be useful in the next layer of the interface. If you end
up with another proxy type that has lots of

switch( raw_node_.some_node_property() )
{

}

I use a very similar system in an HTML editor where the entity name is
used to lookup the handler.  Because the lookup can be slow I cache the
result in the node itself.  We might have to expose _private if we
wanted to do that :-(.

Maybe we do need to allow the use of _private in higher level layers.

If so how about something like this?...

template< typename Value_Type, ... >
class document;

template< typename Value_Type, ... >
class node_reference
{
public:
  void value( Value_Type * value )
  {
node_->_private = value;
  }
  Value_Type * value() const
  {
return static_cast< Value_Type * >( node_->_private );
  }
};

Or some such???

> I think it is a good thing not to own them, but the semantics should
> be clear.
> 
> >>yes, and you could even make that an 'element_proxy' as you know that
> >>parent nodes are always elements. However, with a flat set of (libxml2)
> >>nodes that wouldn't work any more, so runtime polymorphism would be
> >>lost. Well, may be there is no need for it either. I have to think over
> >>that...
> > 
> > 
> > True and that would fit in nicely with the code I outlined above
> > 
> > node.first_child().do_something( 0 );
> > 
> > Would go through the polymorphic lookup and
> > 
> > node.parent().do_something( 0 );
> > 
> > Would call the same code but without the lookup overhead.
> 
> indeed, though, on further thinking, nodes themselfs don't do anything,
> so we could as well keep this polymorphism outside the node class, and
> let nodes only provide their type as an enum.
> 
> I start to like your node reference class quite a lot... :-)

I am definitely leaning toward "node_reference" rather than "node_proxy"
is that your preferred name too? 

> >>>parse_stream would indeed be even better.  As I recall there are
> >>>functions in libxml2 that allow you to write to the parser as well.
> >>
> >>erm, that's even more confusing, I think. A parser should remain
> >>just a parser, i.e. something that extracts tokens from an input stream.
> > 
> > 
> > It is still just a parser (but works as a state machine).  Check out
> > xmlCreatePushParserCtxt and xmlParseChunk.
> > 
> > Say you need to receive lots of xml files over the internet and parse
> > them all at once.  You could use a thread per connection and have the
> > parsers read from the stream but that would require lots of threads.
> 
> Are you suggesting that all the different xml files should be merged
> into a single dom document ?

No many separate documents each needing there own parser and thread.

> > With the "push" interface you can use async io to read from the sockets
> > and then write the data to the parsers as you get it.  Because the state
> > of the parse is not stored on the stack you do not need a separate
> > thread for each parser.
> 
> yeah, a parser for asynchronous document creation may be interesting.
> But I see that as a somewhat different beast. Simple (local, 
> synchronous) document creation from an xml file doesn't need to
> go over a stateful parser object.

Its not a high priority but it is nice that libxml2 supports it.  

-- 
Hamish Mackenzie <[EMAIL PROTECTED]>

___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Stefan Seefeld

Reece Dunn wrote:

Any node can define namespaces:


  
 ...
 ...
 ...
  

so namespace lookup would propagate along the parents of each node until 
a matching namespace is found. This would mean that each node must store 
a (smart) pointer to a namespace mapping, in order to facilitate lookup 
(done during node construction). The node will also have a pointer to a 
namespace information object that looks something like:

class namespaceInfo
{
  std::string url;
  std::string name;
};
Have I got this wrong?
no, it's correct...on a conceptual level.

The point I'm trying to make is that libxml2 does all this for me
already. The node itself doesn't store all the active namespaces, as
that would be a horrible waste. You have to query them (and therefor
you'll access the document and parent nodes).
For example, if a node is moved from one place to another, its context
will change and therefor some necessary adjustments have to be made
notably concerning the namespaces. A look into the libxml2 source code
convinced me that it's quite a tricky job...
I also like the
  xml::dom::document document = xml::dom::parse_file(argv[1]);
style usage. Also:
  xml::dom::element e = doc.documentElement();
  xml::dom::node n = e;
This is just my preferred style/usage, and does not have to be adopted. 
you mean you vote for the 'nodes-are-references' style ?

This would, however, allow the implementation to be adapted to when 
memory management needs to be used and to switch between different 
management policies for the various implementations.
Can you elaborate ?

Regards,
Stefan
___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Reece Dunn

Reece Dunn wrote:

   ... ... ...   
This should have been (with a default namespace):


 
...
...
...
 

Regards,
Reece
_
Stay in touch with absent friends - get MSN Messenger 
http://www.msn.co.uk/messenger

___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Reece Dunn

Hamish Mackenzie wrote:

How about this for reading an input stream...

xml::dom::document doc( parse_stream( std::cin.rdbuf() ) );
I like this facility. You could also make it accept an istream like this:

class parse_stream
{
  public:
 parse_stream( istream & is ): sb( is.rdbuf()){...}
 ...
};
or something similar, making the above the simpler:
  xml::dom::document doc( parse_stream( std::cin ));
-

Stefan Seefeld wrote:

Yes, but is that a problem ? Of course it has to be written in bold 
strokes:
"Don't delete a document while operating on its content !", but I think the 
main idea to get across is that nodes *can only* exist in the context
of a document. That's not only because of memory ownership issues, but also 
for a
variety of other contextual data associated with a node, such as 
namespaces.
Any node can define namespaces:


  
 ...
 ...
 ...
  

so namespace lookup would propagate along the parents of each node until a 
matching namespace is found. This would mean that each node must store a 
(smart) pointer to a namespace mapping, in order to facilitate lookup (done 
during node construction). The node will also have a pointer to a namespace 
information object that looks something like:

class namespaceInfo
{
  std::string url;
  std::string name;
};
Have I got this wrong?

-

I also like the
  xml::dom::document document = xml::dom::parse_file(argv[1]);
style usage. Also:
  xml::dom::element e = doc.documentElement();
  xml::dom::node n = e;
This is just my preferred style/usage, and does not have to be adopted. This 
would, however, allow the implementation to be adapted to when memory 
management needs to be used and to switch between different management 
policies for the various implementations.

_
Tired of 56k? Get a FREE BT Broadband connection 
http://www.msn.co.uk/specials/btbroadband

___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Stefan Seefeld

Hamish Mackenzie wrote:
On Fri, 2003-06-13 at 14:11, Stefan Seefeld wrote:

Hmm, I see your point. Well, that would be possible, but that way you
are unable to make nodes polymorphic. Neither with respect to the basic
node types (Element, Attribute, Text, CData, etc.) nor later when 
implementing real DOM support on top of it.


How about this (I forget the name of this pattern but its in the GOF
book)
well, it looks like a mix of things. What you are doing, essentially,
is wrapping a polymorphic 'do_something' method around a non-C++
type system, i.e. the real method invocation is done with a 'type()'
discriminator.
Yes, I can see that, for the xml node types. But for that we don't even
need anything but a single 'node_proxy' class (with a 'type()' method
returning an enum).
yeah, looks interesting, and even more thin than my wrapper. However,
the thinner the wrapper gets the greater is indeed the danger of having
the whole design tied to a particular implementation, as William pointed
out earlier. I don't think that this is a problem with my wrapper lib,
but with your implementation you get dangerously close...:-)


Making it thicker won't make it any easier to apply to other parsers. 
Especially if you rely on the existence of something like _private.
Indeed, and I think this hits the nail on the head: we still only
provide ownership semantics implicitly by delegating to the
implementation.
I think it is crucial to define the right ownership semantics, and
then make sure it is implementable with different libs. In your case
you explicitely operate on node references ('proxies'), while I use
'raw pointers'. We both don't own the real nodes.
I think it is a good thing not to own them, but the semantics should
be clear.
yes, and you could even make that an 'element_proxy' as you know that
parent nodes are always elements. However, with a flat set of (libxml2)
nodes that wouldn't work any more, so runtime polymorphism would be
lost. Well, may be there is no need for it either. I have to think over
that...


True and that would fit in nicely with the code I outlined above

node.first_child().do_something( 0 );

Would go through the polymorphic lookup and

node.parent().do_something( 0 );

Would call the same code but without the lookup overhead.
indeed, though, on further thinking, nodes themselfs don't do anything,
so we could as well keep this polymorphism outside the node class, and
let nodes only provide their type as an enum.
I start to like your node reference class quite a lot... :-)

parse_stream would indeed be even better.  As I recall there are
functions in libxml2 that allow you to write to the parser as well.
erm, that's even more confusing, I think. A parser should remain
just a parser, i.e. something that extracts tokens from an input stream.


It is still just a parser (but works as a state machine).  Check out
xmlCreatePushParserCtxt and xmlParseChunk.
Say you need to receive lots of xml files over the internet and parse
them all at once.  You could use a thread per connection and have the
parsers read from the stream but that would require lots of threads.
Are you suggesting that all the different xml files should be merged
into a single dom document ?
With the "push" interface you can use async io to read from the sockets
and then write the data to the parsers as you get it.  Because the state
of the parse is not stored on the stack you do not need a separate
thread for each parser.
yeah, a parser for asynchronous document creation may be interesting.
But I see that as a somewhat different beast. Simple (local, 
synchronous) document creation from an xml file doesn't need to
go over a stateful parser object.

Regards,
Stefan
___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Hamish Mackenzie

On Fri, 2003-06-13 at 14:11, Stefan Seefeld wrote:
> Hmm, I see your point. Well, that would be possible, but that way you
> are unable to make nodes polymorphic. Neither with respect to the basic
> node types (Element, Attribute, Text, CData, etc.) nor later when 
> implementing real DOM support on top of it.

How about this (I forget the name of this pattern but its in the GOF
book)

class node_type_handler
{
public:
  virtual void do_something( node_proxy & node, int params_go_here )
= 0;
  static node_type_handler * handler( int node_type );
};

class element_node_type_handler : public node_type_handler
{
public:
  virtual void do_something( node_proxy & node, int params_go_here )
  {
element_proxy( node ).do_something( params_go_here );
  } 
};

static node_type_handler * node_type_handler::handler( int node_type )
{ 
  static node_type_handler * handlers[ node_type ] =
  {
new element_node_type_handler(),
new attribute_node_type_handler()
// ...
  };
  
  return handlers[ node_type ];
}

class node_proxy
{
public:
  void do_something( int params_go_here )
  {
node_type_handler::handler( type() )->
  do_something( *this, params_go_here );
  }
};

class element_proxy : public node_proxy
{
public:
  explicit element_proxy( const node_proxy & node );
  void do_something( int params_go_here )
  {
// Actually do_something
  }
};

> > I have attached the wrappers I have written.  They do not cover much of
> > libxml2 (just what I needed at the time).  Feel free to borrow as much
> > or as little from it as you like.
> 
> yeah, looks interesting, and even more thin than my wrapper. However,
> the thinner the wrapper gets the greater is indeed the danger of having
> the whole design tied to a particular implementation, as William pointed
> out earlier. I don't think that this is a problem with my wrapper lib,
> but with your implementation you get dangerously close...:-)

Making it thicker won't make it any easier to apply to other parsers. 
Especially if you rely on the existence of something like _private.  If
the wrapper is a good tight wrapper for libxml2 then if something needs
changing to be portable then we can wrap the wrapper with a more
portability layer.

> > Looking up this node's parent node is thus simply 
> > 
> >>static_cast(this->my_impl->parent->_private);
> > 
> > 
> > If there was a parent lookup in node_proxy it would be
> > 
> > class node_proxy
> > {
> > public:
> >   node_proxy parent() { return node_proxy( node_->parent ); }
> > ...
> > private:
> >   xmlNodePtr node_;
> > };
> 
> yes, and you could even make that an 'element_proxy' as you know that
> parent nodes are always elements. However, with a flat set of (libxml2)
> nodes that wouldn't work any more, so runtime polymorphism would be
> lost. Well, may be there is no need for it either. I have to think over
> that...

True and that would fit in nicely with the code I outlined above

node.first_child().do_something( 0 );

Would go through the polymorphic lookup and

node.parent().do_something( 0 );

Would call the same code but without the lookup overhead.

> well, but you could also make it such that
> 
> xml::dom::document doc = xml::dom::parse_file("a.xml");
> 
> works with parse_file being a function. That would mean the document is
> copied, but then following your philosophy xml::dom::document could be
> a proxy, too, so copying could be cheap...

I don't think xml::dom::document should be a proxy as I see it as the
container and owner of all the nodes.  But if I remember correctly the
syntax above will work if parse_file is a class and document has a
constructor that takes it.

> I really don't like the idea of 'parse_file' being an object (whichs
> state being a potentially already parsed document). It's unintuive.
> 
> > parse_stream would indeed be even better.  As I recall there are
> > functions in libxml2 that allow you to write to the parser as well.
> 
> erm, that's even more confusing, I think. A parser should remain
> just a parser, i.e. something that extracts tokens from an input stream.

It is still just a parser (but works as a state machine).  Check out
xmlCreatePushParserCtxt and xmlParseChunk.

Say you need to receive lots of xml files over the internet and parse
them all at once.  You could use a thread per connection and have the
parsers read from the stream but that would require lots of threads.

With the "push" interface you can use async io to read from the sockets
and then write the data to the parsers as you get it.  Because the state
of the parse is not stored on the stack you do not need a separate
thread for each parser.

-- 
Hamish Mackenzie

___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Hamish Mackenzie

On Fri, 2003-06-13 at 12:14, Peter Dimov wrote:
> Hamish Mackenzie wrote:
> >
> > 3) Why dom::basic_document::clone?  Why not have the copy constructor
> > and assignment operator should do a deep copy of the document?  This
> > is consistent with other containers. If you want to stick with clone
> > return
> > an auto_ptr and and derive basic_document from boost::noncopyable.
> 
> Whether clone is appropriate depends on the object model that we choose for
> Document and Node. One option is the Java-ish
> 

Only shallow copy if it is a pointer, iterator or reference.

So you would have to call them document_ptr, node_ptr, etc.  To make it
clear they are pointers.

> A safer Node alternative is
> 
> struct Node
> {
> xmlNodePtr impl_; // or however it's spelled
> 
> shared_ptr pi_; // keep Document alive
> };

But it won't prevent someone removing the root node from the document
and thereby invalidating all the document's nodes.  So it is a false
sense of security.

I can't see an easy way to implement a "node" type, but that's ok
because we don't need to

What we need is a node_iterator which returns a node_proxy when
dereferenced (see the files attached to my second post).  node_proxy is
the reference_type of the container there would be no value_type
(perhaps node_reference would be a better name).

> Reference semantics are convenient when passing and returning Documents
> to/from functions. Documents can be deep copied with clone().
> 
> Another option is to drop the reference semantics. A Document can be
> noncopyable with clone(), mandating the use of auto_ptr, or it can have deep
> copy semantics.

Which is what I meant when I said...
> > If you want to stick with clone return
> > an auto_ptr and and derive basic_document from boost::noncopyable

But you have to ask yourself how would you feel if std::vector worked
this way?

> All of these solutions have their pros and cons, but other things being
> equal I tend towards the Java model. Deep copy seems inappropriate for a
> Document since it is a very expensive operation that's better given an
> explicit name. (I just found a bug in my code where I accidentally passed an
> expensive data structure by value; performance went downhill, I was
> stumped.)

It might be expensive for large documents (probably order N) but so is
std::vector's copy constructor.

How would you have felt if you had used an object believing it to deep
copy only to find out it didn't and as a result all the customer records
in your database ended up pointing to the same address value? 
shared_ptr< document > is easy to spot.

Shallow copy is for iterator, pointer and reference types only! It is
dangerous and should be written on the tin in bold print.

-- 
Hamish Mackenzie

___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Stefan Seefeld

Hamish Mackenzie wrote:

That way I'v got a pointer from the C struct to the C++ wrapper
(_private), as well as a pointer from the C++ wrapper to the C struct
(my_impl).


Would a thin proxy object not be a better way to go?  More in keeping
with the "you don't pay for what you don't use" philosophy of C++.
Hmm, I see your point. Well, that would be possible, but that way you
are unable to make nodes polymorphic. Neither with respect to the basic
node types (Element, Attribute, Text, CData, etc.) nor later when 
implementing real DOM support on top of it.

I have attached the wrappers I have written.  They do not cover much of
libxml2 (just what I needed at the time).  Feel free to borrow as much
or as little from it as you like.
yeah, looks interesting, and even more thin than my wrapper. However,
the thinner the wrapper gets the greater is indeed the danger of having
the whole design tied to a particular implementation, as William pointed
out earlier. I don't think that this is a problem with my wrapper lib,
but with your implementation you get dangerously close...:-)
If you look in node_iterator.h you can see that it uses a proxy
containing just a pointer to the libxml2 node.
indeed.

Looking up this node's parent node is thus simply 

static_cast(this->my_impl->parent->_private);


If there was a parent lookup in node_proxy it would be

class node_proxy
{
public:
  node_proxy parent() { return node_proxy( node_->parent ); }
...
private:
  xmlNodePtr node_;
};
yes, and you could even make that an 'element_proxy' as you know that
parent nodes are always elements. However, with a flat set of (libxml2)
nodes that wouldn't work any more, so runtime polymorphism would be
lost. Well, may be there is no need for it either. I have to think over
that...
making 'parse_file' a class suggests it is carrying some data/state.
What would that be ? I'm thinking of 'parse_file' as a stateless
factory, i.e. a function returning a newly created document.


Sorry I should have included the ... bit

class parse_file
{
public:
  parse_file( const std::string & f ) : file_name_ {}
private:
  const std::string & file_name_;
  
  template< ... >
  friend class ::boost::xml::dom::document;
};

Put that together with the constructor...

template<...>
class basic_document
{
public:
  basic_document( const parse_file & f )
  {
// does what parse_file function does now
  }
};
And you can write code that goes

xml::dom::document doc( xml::dom::parse_file( "a.xml" ) );

No auto_ptr needed.  It's not in my wrappers as I only thought of it
while I was reading through yours.
well, but you could also make it such that

xml::dom::document doc = xml::dom::parse_file("a.xml");

works with parse_file being a function. That would mean the document is
copied, but then following your philosophy xml::dom::document could be
a proxy, too, so copying could be cheap...
I really don't like the idea of 'parse_file' being an object (whichs
state being a potentially already parsed document). It's unintuive.
parse_stream would indeed be even better.  As I recall there are
functions in libxml2 that allow you to write to the parser as well.
erm, that's even more confusing, I think. A parser should remain
just a parser, i.e. something that extracts tokens from an input stream.
Best regards,
Stefan
___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-13 Thread Peter Dimov

Hamish Mackenzie wrote:
>
> 3) Why dom::basic_document::clone?  Why not have the copy constructor
> and assignment operator should do a deep copy of the document?  This
> is consistent with other containers. If you want to stick with clone
> return
> an auto_ptr and and derive basic_document from boost::noncopyable.

Whether clone is appropriate depends on the object model that we choose for
Document and Node. One option is the Java-ish

Document d = parse_file("test.xml");
Node n = d.root();

with Document and Node having reference semantics. Internally Document could
be

struct Document
{
shared_ptr pi_;
};

and Node could be

struct Node
{
xmlNodePtr impl_; // or however it's spelled
};

A safer Node alternative is

struct Node
{
xmlNodePtr impl_; // or however it's spelled

shared_ptr pi_; // keep Document alive
};

Reference semantics are convenient when passing and returning Documents
to/from functions. Documents can be deep copied with clone().

Another option is to drop the reference semantics. A Document can be
noncopyable with clone(), mandating the use of auto_ptr, or it can have deep
copy semantics.

All of these solutions have their pros and cons, but other things being
equal I tend towards the Java model. Deep copy seems inappropriate for a
Document since it is a very expensive operation that's better given an
explicit name. (I just found a bug in my code where I accidentally passed an
expensive data structure by value; performance went downhill, I was
stumped.)

___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-12 Thread Hamish Mackenzie

On Thu, 2003-06-12 at 22:12, Stefan Seefeld wrote: 
> I'm wrapping libxml2. The structs provided by libxml2 all carry a
> '_private' member, precisely because it's a good way for extensions
> such as language wrapping.
> libxml2 itself calls callbacks whenever it allocates instances of these
> structs, and I allocate my C++ wrappers in these callbacks, and let the
> _private member point to it.
> That way I'v got a pointer from the C struct to the C++ wrapper
> (_private), as well as a pointer from the C++ wrapper to the C struct
> (my_impl).

Would a thin proxy object not be a better way to go?  More in keeping
with the "you don't pay for what you don't use" philosophy of C++.

I have attached the wrappers I have written.  They do not cover much of
libxml2 (just what I needed at the time).  Feel free to borrow as much
or as little from it as you like.

If you look in node_iterator.h you can see that it uses a proxy
containing just a pointer to the libxml2 node.

Looking up this node's parent node is thus simply 
> static_cast(this->my_impl->parent->_private);

If there was a parent lookup in node_proxy it would be

class node_proxy
{
public:
  node_proxy parent() { return node_proxy( node_->parent ); }
...
private:
  xmlNodePtr node_;
};

> > Here are some suggestions...
> > 
> > 1) parse_file could be
> > 
> > class parse_file
> > {
> > public:
> >   parse_file( const std::string & ) : {}
> >   ...
> > };
> 
> making 'parse_file' a class suggests it is carrying some data/state.
> What would that be ? I'm thinking of 'parse_file' as a stateless
> factory, i.e. a function returning a newly created document.

Sorry I should have included the ... bit

class parse_file
{
public:
  parse_file( const std::string & f ) : file_name_ {}
private:
  const std::string & file_name_;
  
  template< ... >
  friend class ::boost::xml::dom::document;
};

Put that together with the constructor...

template<...>
class basic_document
{
public:
  basic_document( const parse_file & f )
  {
// does what parse_file function does now
  }
};

And you can write code that goes

xml::dom::document doc( xml::dom::parse_file( "a.xml" ) );

No auto_ptr needed.  It's not in my wrappers as I only thought of it
while I was reading through yours.


> > 2) How about parse_string (based on xmlParseMemory)?
> 
> hmm, while that would be possible, I think it's more C++'ish to
> provide document extraction from a std::streambuf, which could be
> a string_buf or any other buffer implementation. (Note that I wouldn't
> use std::iostreams as that would suggest that formatted extraction is
> possible, which would only work on ascii, not on unicode content.

parse_stream would indeed be even better.  As I recall there are
functions in libxml2 that allow you to write to the parser as well.

How about this for reading an input stream...

xml::dom::document doc( parse_stream( std::cin.rdbuf() ) );

and if you want to write to to the parser (for instance if the data is
coming from a series of asynchronous read operations).

class document
{
public:
  std::streambuf & parser();
};

xml::dom::document doc;
doc.parser().write( buffer, buffer_size );

> > 4) How about something like
> > 
> > class basic_document
> > {
> > public:
> >   ...
> > 
> >   typedef node_iterator iterator;
> >   typedef const_node_iterator const_iterator;
> > 
> >   iterator begin()
> > { return iterator( ptr_->children ); }
> >   const_iterator begin() const
> > { return const_iterator( ptr_->children ); }
> >   iterator end() { return iterator(); }
> >   const_iterator end() const { return const_iterator(); }
> > 
> >   iterator root()
> > { return iterator( xmlDocGetRootElement( ptr_ ) ); }
> >   const_iterator root() const
> > { return const_iterator( xmlDocGetRootElement( ptr_ ) ); }
> > };
> 
> I don't understand: are you suggesting an iterator that traverses
> the whole tree (as opposed the children of a single node) ?

No just the children.  If figure node.begin() and node.end() be fore
iteration over the child nodes. 

-- 
Hamish Mackenzie <[EMAIL PROTECTED]>


xml_stuff.tar.gz
Description: application/compressed-tar
___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-12 Thread Stefan Seefeld

Hamish Mackenzie wrote:
Looks good.  I have only looked at the dom stuff so far.  Why are you
storing information in _private?  What goes in there that could not be
extracted when the nodes are accessed?  It seems like a lot of
unnecessary overhead.
I'm wrapping libxml2. The structs provided by libxml2 all carry a
'_private' member, precisely because it's a good way for extensions
such as language wrapping.
libxml2 itself calls callbacks whenever it allocates instances of these
structs, and I allocate my C++ wrappers in these callbacks, and let the
_private member point to it.
That way I'v got a pointer from the C struct to the C++ wrapper
(_private), as well as a pointer from the C++ wrapper to the C struct
(my_impl).
Looking up this  node's parent node is thus simply
static_cast(this->my_impl->parent->_private);

Here are some suggestions...

1) parse_file could be

class parse_file
{
public:
  parse_file( const std::string & ) : {}
  ...
};
making 'parse_file' a class suggests it is carrying some data/state.
What would that be ? I'm thinking of 'parse_file' as a stateless
factory, i.e. a function returning a newly created document.
2) How about parse_string (based on xmlParseMemory)?
hmm, while that would be possible, I think it's more C++'ish to
provide document extraction from a std::streambuf, which could be
a string_buf or any other buffer implementation. (Note that I wouldn't
use std::iostreams as that would suggest that formatted extraction is
possible, which would only work on ascii, not on unicode content.
3) Why dom::basic_document::clone?  Why not have the copy constructor
and assignment operator should do a deep copy of the document?  This is
consistent with other containers. If you want to stick with clone return
an auto_ptr and and derive basic_document from boost::noncopyable.
fair enough.

4) How about something like

class basic_document
{
public:
  ...
  typedef node_iterator iterator;
  typedef const_node_iterator const_iterator;
  iterator begin()
{ return iterator( ptr_->children ); }
  const_iterator begin() const
{ return const_iterator( ptr_->children ); }
  iterator end() { return iterator(); }
  const_iterator end() const { return const_iterator(); }
  iterator root()
{ return iterator( xmlDocGetRootElement( ptr_ ) ); }
  const_iterator root() const
{ return const_iterator( xmlDocGetRootElement( ptr_ ) ); }
};
I don't understand: are you suggesting an iterator that traverses
the whole tree (as opposed the children of a single node) ?
While that would be possible, I don't think it would actually
be useful. I'v written a Visitor which I use to look up specific
nodes. As the node 'type system' is known without C++ RTTI that
doesn't require double dispatching though...
5) basic_xpath and basic_xpath_result should be derived from
boost::noncopyable
ok

6) Leading _ chars are a no no (reserved for compiler implementors).  I
know _private is defined in libxml2, but what you see below is not...
ok
Thanks for the feedback !
Regards,
Stefan
___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-12 Thread Hamish Mackenzie

Looks good.  I have only looked at the dom stuff so far.  Why are you
storing information in _private?  What goes in there that could not be
extracted when the nodes are accessed?  It seems like a lot of
unnecessary overhead.

Here are some suggestions...

1) parse_file could be

class parse_file
{
public:
  parse_file( const std::string & ) : {}
  ...
};

template<...>
class basic_document
{
public:
  basic_document( const parse_file & );
}

2) How about parse_string (based on xmlParseMemory)?

3) Why dom::basic_document::clone?  Why not have the copy constructor
and assignment operator should do a deep copy of the document?  This is
consistent with other containers. If you want to stick with clone return
an auto_ptr and and derive basic_document from boost::noncopyable.

4) How about something like

class basic_document
{
public:
  ...

  typedef node_iterator iterator;
  typedef const_node_iterator const_iterator;

  iterator begin()
{ return iterator( ptr_->children ); }
  const_iterator begin() const
{ return const_iterator( ptr_->children ); }
  iterator end() { return iterator(); }
  const_iterator end() const { return const_iterator(); }

  iterator root()
{ return iterator( xmlDocGetRootElement( ptr_ ) ); }
  const_iterator root() const
{ return const_iterator( xmlDocGetRootElement( ptr_ ) ); }
};

5) basic_xpath and basic_xpath_result should be derived from
boost::noncopyable

6) Leading _ chars are a no no (reserved for compiler implementors).  I
know _private is defined in libxml2, but what you see below is not...

$ grep ' _' *
basic_attribute.h:#ifndef _dom_basic_attribute_h
basic_attribute.h:#define _dom_basic_attribute_h
basic_comment.h:#ifndef _dom_basic_comment_h
basic_comment.h:#define _dom_basic_comment_h
basic_document.h:#ifndef _dom_basic_document_h
basic_document.h:#define _dom_basic_document_h
basic_dtd.h:#ifndef _dom_basic_dtd_h
basic_dtd.h:#define _dom_basic_dtd_h
basic_element.h:#ifndef _dom_basic_element_h
basic_element.h:#define _dom_basic_element_h
basic_node.h:#ifndef _dom_basic_node_h
basic_node.h:#define _dom_basic_node_h
basic_node.h:  basic_node_iterator(impl *current = 0) :
_current(current) {}
basic_node.h:  bool operator == (self i) { return _current ==
i._current;}
basic_node.h:  void increment() { _current = _current->next;}
basic_node.h:  void decrement() { _current = _current->prev;}
basic_node.h:  basic_node_const_iterator(const impl *current = 0) :
_current(current) {}
basic_node.h:  bool operator == (self i) { return _current ==
i._current;}
basic_node.h:  void increment() { _current = _current->next;}
basic_node.h:  void decrement() { _current = _current->prev;}
basic_pi.h:#ifndef _dom_basic_pi_h
basic_pi.h:#define _dom_basic_pi_h
basic_text.h:#ifndef _dom_basic_text_h
basic_text.h:#define _dom_basic_text_h
basic_traversal.h:#ifndef _dom_basic_traversal_h
basic_traversal.h:#define _dom_basic_traversal_h

Hamish

On Thu, 2003-06-12 at 03:45, Stefan Seefeld wrote: 
> hi there,
> 
> following some discussion we had some weeks ago,
> I'd like to invite everybody to review xml++.tgz at
> 
> http://groups.yahoo.com/group/boost/files/xml/
> 
> It's a DOM-like and a SAX-like API currently implemented
> on top of libxml2 (http://www.xmlsoft.org).
> 
> What it provides:
> 
> * parsing of xml files and creation of a document tree
> * manipulation of document tree, i.e. insertion and
>deletion of nodes
> * node iteration, search (xpath based)
> * document output to a (xml) file
> 
> * event driven xml file parsing (sax)
> 
> To be added:
> 
> * validation (dtd, schema, etc.)
> * ?
> 
> Is there any interest in this library evolving
> into a boost::xml library ? If so, what needs to change,
> what needs to be added / removed ?
> 
> Regards,
>   Stefan
> 
> ___
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost







-- 







Hamish Mackenzie <[EMAIL PROTECTED]>

___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-12 Thread Stefan Seefeld

Peter Dimov wrote:

Looks reasonable, but we don't want the architecture of the backend to
affect the interface.
right. So what is would be reasonable semantics to expect from a dom API
? May be I'v just got used to libxml2's way of life, but I think it is
a good choice. Nodes are owned by their parents, so you can do
dom::element *child = parent->add_child("info");

And calling

dom::element::iterator i = parent->find(child);
parent->erase_child(i);
will invalidate 'child'. I don't know of any way to make that more safe
while still being efficient.
There is also the problem that the user can be left
with an invalid pointer after the document has been deleted.
Yes, but is that a problem ? Of course it has to be written in bold 
strokes: "Don't delete a document while operating on its content !",
but I think the main idea to get across is that nodes *can only* exist
in the context of a document. That's not only because of memory 
ownership issues, but also for a variety of other contextual data
associated with a node, such as namespaces.

I had a long discussion with the libxml2 author about ownership
semantics and he convinced me that the current way is the best tradeoff
between simplicity/ease of use and efficiency.
Since the DOM is a tree and has no cycles, it should be possible to get
fairly close to the Java interface using strict ownership or shared_ptr
underneath. In the libxml2 case every Node would need to keep the whole
Document alive but this may not be necessary given a different backend.
I don't understand that. The document owns its nodes, so letting nodes
reference the document would create loops, no ?
Of course if this isn't practical a quick fix would be to return a
shared_ptr from parse_file.
yeah, that's unrelated. But why not std::auto_ptr then ?

Regards,
Stefan
___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-12 Thread Peter Dimov

Stefan Seefeld wrote:
> Peter Dimov wrote:
>>
>> I think that there is considerable interest in a boost::xml library.
>> But...
>>
>> Document *document = Document::parse_file(argv[1]);
>>
>> ... I don't believe that a raw pointer based interface is acceptable.
>>
>> xml::dom::document document = xml::dom::parse_file(argv[1]);
>>
>> looks much better.
>
> Good catch. However, it looks worse than it actually is :-) :
>
> The memory management for nodes is entirely handled by the backend
> (libxml2), i.e. nodes are always created and deleted by their parents.
> Constructors and destructors are protected.
>
> The 'Document' class is the only one that is owned directly by the
> user, and thus has to be deleted.

Looks reasonable, but we don't want the architecture of the backend to
affect the interface. There is also the problem that the user can be left
with an invalid pointer after the document has been deleted.

Since the DOM is a tree and has no cycles, it should be possible to get
fairly close to the Java interface using strict ownership or shared_ptr
underneath. In the libxml2 case every Node would need to keep the whole
Document alive but this may not be necessary given a different backend.

Of course if this isn't practical a quick fix would be to return a
shared_ptr from parse_file.

___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-12 Thread Stefan Seefeld

Peter Dimov wrote:
Stefan Seefeld wrote:

Is there any interest in this library evolving
into a boost::xml library ? If so, what needs to change,
what needs to be added / removed ?


I think that there is considerable interest in a boost::xml library. But...

Document *document = Document::parse_file(argv[1]);

... I don't believe that a raw pointer based interface is acceptable.

xml::dom::document document = xml::dom::parse_file(argv[1]);

looks much better.
Good catch. However, it looks worse than it actually is :-) :

The memory management for nodes is entirely handled by the backend
(libxml2), i.e. nodes are always created and deleted by their parents.
Constructors and destructors are protected.
The 'Document' class is the only one that is owned directly by the
user, and thus has to be deleted.
BTW why is basic_document<>::write_file virtual but
basic_document<>::clone isn't?
hmm, good question. I don't see any need for it to be virtual, I
wouldn't expect anybody to derive from Document.
Regards,
Stefan
___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

2003-06-12 Thread Peter Dimov

Stefan Seefeld wrote:
>
> Is there any interest in this library evolving
> into a boost::xml library ? If so, what needs to change,
> what needs to be added / removed ?

I think that there is considerable interest in a boost::xml library. But...

Document *document = Document::parse_file(argv[1]);

... I don't believe that a raw pointer based interface is acceptable.

xml::dom::document document = xml::dom::parse_file(argv[1]);

looks much better. BTW why is basic_document<>::write_file virtual but
basic_document<>::clone isn't?

The SAX part looks OK to me.

___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

Re: [boost] API Review request: XML APIs for C++

22 matches

Site Navigation

Mail list logo

Footer information