subject:"\[HACKERS\] xpath processing brain dead"

Re: [HACKERS] xpath processing brain dead

2009-03-20 Thread Andrew Dunstan




Andrew Dunstan wrote:



Hannu Krosing wrote:

Is it just that in you _can't_ use Xpath on fragments, and you _need_ to
pass full documents to Xpath ?
At least this is my reading of Xpath standard.

  


I think that's possibly overstating it., unless I have missed 
something (W3 standards are sometimes not much more clear than the SQL 
standards ;-( )


For instance, there's this, that implies at least that the tree might 
not be a document:


   A / at the beginning of a path expression is an abbreviation for
   the initial step fn:root(self::node()) treat as document-node()/
   (however, if the / is the entire path expression, the trailing /
   is omitted from the expansion.) The effect of this initial step is
   to begin the path at the root node of the tree that contains the
   context node. If the context item is not a node, a type error is
   raised [err:XPTY0020]. At evaluation time, if the root node above
   the context node is not a document node, a dynamic error is raised
   [err:XPDY0050].

The problem is that we certainly do have to provide a context node 
(the standard is clear about that), and unless we want to convert a 
non-document to a node-set as James suggested and then apply the xpath 
expression to each node in the node-set, we have no way of sanely 
specifying the context node.




No-one has come up with an answer to this, so I propose to remove the 
hackery. That leaves the question of what to do when the xml is not a 
well formed document ... raise an error?


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-03-20 Thread James Pye


On Mar 20, 2009, at 8:56 AM, Andrew Dunstan wrote:

Andrew Dunstan wrote:

  A / at the beginning of a path expression is an abbreviation for
  the initial step fn:root(self::node()) treat as document-node()/
  (however, if the / is the entire path expression, the trailing  
/

  is omitted from the expansion.) The effect of this initial step is
  to begin the path at the root node of the tree that contains the
  context node. If the context item is not a node, a type error is
  raised [err:XPTY0020]. At evaluation time, if the root node above
  the context node is not a document node, a dynamic error is raised
  [err:XPDY0050].

The problem is that we certainly do have to provide a context node  
(the standard is clear about that), and unless we want to convert a  
non-document to a node-set as James suggested and then apply the  
xpath expression to each node in the node-set, we have no way of  
sanely specifying the context node.


libxml2 only supports xpath1. Why are you referencing xpath20? And, if  
SQL-XML requires an xpath20 conforming xpath() function, we have  
bigger problems than '/x' + xpath_str. ;)


Although, this is not terribly important to me as:


No-one has come up with an answer to this,


I don't feel fragment()/node-set() is a good idea from a usability  
standpoint alone. I only mentioned it because that's how I've always  
worked with fragments in the xslt1 world.. Mere curiosity drove most  
of the remaining interest I had in it.



so I propose to remove the hackery.


I think this would probably be best for the core xpath() function.

However, it may be wise to relocate the munging functionality into  
another function so users don't have invent their own when they feel  
so inclined to work with fragments.



raise an error?


+1, xpath requires a well-formed document

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-03-02 Thread Peter Eisentraut


Andrew Dunstan wrote:
can you point me at any call in libxml2 which will evaluate an xpath 
expression in the context of a nodeset instead of a document? Quite 
apart from anything else, xpath requires there to be a (single) context 
node (see http://www.w3.org/TR/xpath20/#context ). For a doc, we set 
that node to the document node, but what would it be for a node-set or a 
fragment? If we can't get over that hurdle we're screwed in pursuing 
your line of thought.


Which may hint at the fact that running xpath on content fragments is 
ill-defined to begin with?!?



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-03-02 Thread Simon Riggs


On Sun, 2009-03-01 at 18:22 -0500, Andrew Dunstan wrote:

 I think the XML type needs to conform to the SQL/XML spec. However, we
 are trying to apply XPath, which has a different data model, to that 
 type - hence the impedance mismatch.
 
 I think that the best we can do (for 8.4, having fixed 8.3 as best we 
 can without adversely changing behaviour) is to throw the
 responsibility 
 for ensuring that the XML passed to the function is an XML document
 back on the programmer. Anything else, especially any mangling of the
 XPath 
 expression, presents a very real danger of breaking on correct input.

Can we provide a single function to bridge the gap between fragment and
document? It will be clearer to do this than to see various forms of
appending/munging, even if that function is a simple wrapper around an
append.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-03-02 Thread Andrew Dunstan




Peter Eisentraut wrote:

Andrew Dunstan wrote:
can you point me at any call in libxml2 which will evaluate an xpath 
expression in the context of a nodeset instead of a document? Quite 
apart from anything else, xpath requires there to be a (single) 
context node (see http://www.w3.org/TR/xpath20/#context ). For a doc, 
we set that node to the document node, but what would it be for a 
node-set or a fragment? If we can't get over that hurdle we're 
screwed in pursuing your line of thought.


Which may hint at the fact that running xpath on content fragments is 
ill-defined to begin with?!?




Right. But that's no excuse for what we have been doing, which was 
demonstrably providing false results on good input.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-03-02 Thread Andrew Dunstan




Simon Riggs wrote:

On Sun, 2009-03-01 at 18:22 -0500, Andrew Dunstan wrote:

  

I think the XML type needs to conform to the SQL/XML spec. However, we
are trying to apply XPath, which has a different data model, to that 
type - hence the impedance mismatch.


I think that the best we can do (for 8.4, having fixed 8.3 as best we 
can without adversely changing behaviour) is to throw the
responsibility 
for ensuring that the XML passed to the function is an XML document

back on the programmer. Anything else, especially any mangling of the
XPath 
expression, presents a very real danger of breaking on correct input.



Can we provide a single function to bridge the gap between fragment and
document? It will be clearer to do this than to see various forms of
appending/munging, even if that function is a simple wrapper around an
append.

  


I have no objection to providing an *extra* function that explicitly 
wraps non-documents and prefixes the xpath expression in that case, and 
is documented to have limitations. But I don't think we can provide a 
single function that always does the right thing, especially when that 
is so ill-defined in the case of fragments.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-03-02 Thread Hannu Krosing

On Mon, 2009-03-02 at 07:54 -0500, Andrew Dunstan wrote:
 
 Simon Riggs wrote:
  On Sun, 2009-03-01 at 18:22 -0500, Andrew Dunstan wrote:
 

  I think the XML type needs to conform to the SQL/XML spec. However, we
  are trying to apply XPath, which has a different data model, to that 
  type - hence the impedance mismatch.
 
  I think that the best we can do (for 8.4, having fixed 8.3 as best we 
  can without adversely changing behaviour) is to throw the
  responsibility 
  for ensuring that the XML passed to the function is an XML document
  back on the programmer. Anything else, especially any mangling of the
  XPath 
  expression, presents a very real danger of breaking on correct input.
  
 
  Can we provide a single function to bridge the gap between fragment and
  document? It will be clearer to do this than to see various forms of
  appending/munging, even if that function is a simple wrapper around an
  append.
 

 
 I have no objection to providing an *extra* function that explicitly 
 wraps non-documents and prefixes the xpath expression in that case, and 
 is documented to have limitations. But I don't think we can provide a 
 single function that always does the right thing, especially when that 
 is so ill-defined in the case of fragments.

Is it just that in you _can't_ use Xpath on fragments, and you _need_ to
pass full documents to Xpath ? 

At least this is my reading of Xpath standard.

 cheers
 
 andrew
-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-03-02 Thread Peter Eisentraut


Hannu Krosing wrote:

Is it just that in you _can't_ use Xpath on fragments, and you _need_ to
pass full documents to Xpath ? 


At least this is my reading of Xpath standard.


It is easy to read the XPath standard that way, because the concept of 
fragments is not defined outside of SQL/XML, and is therefore unknown to 
the XPath standard.  The question at hand is rather whether we can 
usefully adapt it.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-03-02 Thread Andrew Dunstan




Hannu Krosing wrote:

Is it just that in you _can't_ use Xpath on fragments, and you _need_ to
pass full documents to Xpath ? 


At least this is my reading of Xpath standard.

  


I think that's possibly overstating it., unless I have missed something 
(W3 standards are sometimes not much more clear than the SQL standards ;-( )


For instance, there's this, that implies at least that the tree might 
not be a document:


   A / at the beginning of a path expression is an abbreviation for
   the initial step fn:root(self::node()) treat as document-node()/
   (however, if the / is the entire path expression, the trailing /
   is omitted from the expansion.) The effect of this initial step is
   to begin the path at the root node of the tree that contains the
   context node. If the context item is not a node, a type error is
   raised [err:XPTY0020]. At evaluation time, if the root node above
   the context node is not a document node, a dynamic error is raised
   [err:XPDY0050].

The problem is that we certainly do have to provide a context node (the 
standard is clear about that), and unless we want to convert a 
non-document to a node-set as James suggested and then apply the xpath 
expression to each node in the node-set, we have no way of sanely 
specifying the context node.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-03-02 Thread Hannu Krosing

On Mon, 2009-03-02 at 15:25 +0200, Peter Eisentraut wrote:
 Hannu Krosing wrote:
  Is it just that in you _can't_ use Xpath on fragments, and you _need_ to
  pass full documents to Xpath ? 
  
  At least this is my reading of Xpath standard.
 
 It is easy to read the XPath standard that way, because the concept of 
 fragments is not defined outside of SQL/XML, and is therefore unknown to 
 the XPath standard. 

How is the opposite - Does SQL/XML specify Xpath usage for XML(SEQUENCE)
and XML(CONTENT) ?

  The question at hand is rather whether we can 
 usefully adapt it.

This sounds like trying to adapt integer arithmetic to
lists-of-integers.

Even for simple things like addition, there are several ways of doing it

[1,2,3] + [1,1,1] = [1,2,3,1,1,1]
[1,2,3] + [1,1,1] = [2,3,4]
[1,2,3] + [1,1,1] = [[1,2,3],[1,1,1]]

all seem possible and logical


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-03-01 Thread Andrew Dunstan




Hannu Krosing wrote:
Some of the functions, including some specified in the standard, produce 
fragments. That's why we have the 'IS DOCUMENT' test.



But then you could use xmlfragments as the functions return type, no ?

Does tha standard require that the same field type must store both
documents and fragments ?

  


Yes, the standard very explicitly provides for one XML type which need 
not be an XML document. We have no choice about that.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-03-01 Thread Hannu Krosing

On Sun, 2009-03-01 at 10:13 -0500, Andrew Dunstan wrote:
 
 Hannu Krosing wrote:
  Some of the functions, including some specified in the standard, produce 
  fragments. That's why we have the 'IS DOCUMENT' test.
  
 
  But then you could use xmlfragments as the functions return type, no ?
 
  Does tha standard require that the same field type must store both
  documents and fragments ?
 

 
 Yes, the standard very explicitly provides for one XML type which need 
 not be an XML document. We have no choice about that.

What is it then ? A sequence of XML elements ?

Which standard does postgreSQL XML type need to confirm to - general XML
DB, Xpath or some other XML ?


XML Path Language (XPath) Version 1.0
--

starts with this Abstract:

XPath is a language for addressing parts of an XML document, designed
to be used by both XSLT and XPointer.

So I think that using Xpath on anything else than XML document is
invalid and results are undefined.

XML 1.0 and XML 1.1
---

Also, both XML 1.0 and XML 1.1 standards are about a thing called an
XML document, so I don't see anything there, which would make us
accept anything else as being XML.


And SQL 2008, Part 14: XML-Related Specifications (SQL/XML)
---
Says:

SQL defines a predefined data type named by the following key word:
XML.
...
The data types XML(DOCUMENT(UNTYPED)), XML(DOCUMENT(ANY)),
XML(DOCUMENT(XMLSCHEMA)), XML(CONTENT(UNTYPED)), XML(CONTENT(ANY)),
XML(CONTENT(XMLSCHEMA)), and XML(SEQUENCE) are referred to as the XML
types. Values of XML types are called XML values.


So while the type itself could be called XML, there are several
subtypes, like Document, Content and Sequence

Could the XML(SEQUENCE) better be represented as an array 
of xml documents aka. xml[] , and maybe CONTENT could be done as 
xmlelement[] where xmlelement can be any single XML element, including
CDATA and plain text ?

 cheers
 
 andrew
-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-03-01 Thread Andrew Dunstan




Hannu Krosing wrote:

On Sun, 2009-03-01 at 10:13 -0500, Andrew Dunstan wrote:
  

Hannu Krosing wrote:

Some of the functions, including some specified in the standard, produce 
fragments. That's why we have the 'IS DOCUMENT' test.



But then you could use xmlfragments as the functions return type, no ?

Does tha standard require that the same field type must store both
documents and fragments ?

  
  
Yes, the standard very explicitly provides for one XML type which need 
not be an XML document. We have no choice about that.



What is it then ? A sequence of XML elements ?
  


In its typically oblique way, the 2003 draft says this:

   NOTE 1 — An XML root information item is similar to an XML document
   information item, with the following modifications: an
   XML root information item does not have a [document element]
   property, a [base URI] property, a [character encoding scheme] property,
   or an [all declarations processed] property; and the [children]
   property permits more than one XML element information item.

   An SQL/XML information item is either an XML root information item
   or one of the following (defined in
   Subclause 3.1.3, “Definitions provided in Part 14”): an XML
   attribute information item, an XML character
   information item, an XML comment information item, an XML document
   type declaration information item,
   an XML element information item, an XML namespace information item,
   an XML notation information item,
   an XML processing instruction information item, an XML unexpanded
   entity reference information item, or
   an XML unparsed entity information item.

   An XML value is either the null value, or a collection of SQL/XML
   information items, consisting of exactly
   one XML root information item, plus any other SQL/XML information
   items that can be reached recursively
   by traversing the properties of the SQL/XML information items.



Which standard does postgreSQL XML type need to confirm to - general XML
DB, Xpath or some other XML ?
  


I think the XML type needs to conform to the SQL/XML spec. However, we 
are trying to apply XPath, which has a different data model, to that 
type - hence the impedance mismatch.


I think that the best we can do (for 8.4, having fixed 8.3 as best we 
can without adversely changing behaviour) is to throw the responsibility 
for ensuring that the XML passed to the function is an XML document back 
on the programmer. Anything else, especially any mangling of the XPath 
expression, presents a very real danger of breaking on correct input.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-28 Thread James Pye


sigh.. I got curious. :P

On Feb 27, 2009, at 7:19 PM, James Pye wrote:

Well, that or force the user to call it explicitly.



Attached is the patch that I used to get the results below..
This is just a proof of concept, so it's quite lacking. Notably, it  
doesn't even try to identify well-formed documents.



Purpose/idea being, give the user access to the poorly-formed document  
as a node-set via the fragment function instead of mangling the  
xpath and xml:



postgres=# SELECT xpath('fragment()//*', 'blehfoo/bar/'::xml);
 xpath
---
 {}
(1 row)

postgres=# SELECT xpath('fragment()//*', 'blehmehsub//mehfoo/ 
bar/'::xml);

  xpath
--
 {sub/}
(1 row)

postgres=# SELECT xpath('fragment()/*', 'blehmehsub//mehfoo/ 
bar/'::xml);

  xpath
--
 {sub/}
(1 row)

postgres=# SELECT xpath('fragment()', 'blehmehsub//mehfoo/bar/ 
'::xml);

 xpath

 {bleh,meh
   sub/
 /meh,foo/,bar/}
(1 row)

postgres=# SELECT xpath('/*', 'blehmehsub//mehfoo/bar/'::xml);
 xpath
---
 {}
(1 row)

postgres=# SELECT xpath('fragment()[local-name()=foo]/@att',
 'blehmehsub//mehfoo att=sometin/bar/'::xml);
   xpath
---
 {sometin}
(1 row)

postgres=# SELECT xpath('fragment()[local-name()=meh]/*',
'blehmehsub//mehfoo att=sometin/bar/'::xml);
  xpath
--
 {sub/}
(1 row)

postgres=# SELECT xpath('fragment()[local-name()=meh or local- 
name()=bar]', 'blehmehsub//mehfoo att=sometin/bar/'::xml);

  xpath
-
 {meh
   sub/
 /meh,bar/}
(1 row)

postgres=# SELECT xpath('fragment()[local-name()=bar]',
'blehmehsub//mehfoo att=sometin/bar/'::xml);
  xpath
--
 {bar/}
(1 row)

postgres=# SELECT xpath('fragment()[...@*]',
'blehmehsub//mehothertextfoo att=sometin/bar/'::xml);
   xpath

 {foo att=\sometin\/}
(1 row)


Can't say that I've ever been thrilled with using node-sets, but  
*shrug*.


I'm sleepy now..



git.diff
Description: Binary data


xmlc_file.diff
Description: Binary data



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-28 Thread Andrew Dunstan




James Pye wrote:

sigh.. I got curious. :P

On Feb 27, 2009, at 7:19 PM, James Pye wrote:

Well, that or force the user to call it explicitly.



Attached is the patch that I used to get the results below..
This is just a proof of concept, so it's quite lacking. Notably, it 
doesn't even try to identify well-formed documents.



This is entirely out of the question for 8.3, as it's a significant 
change of behaviour.


I'd also want to see this usage blessed by some xpath guru ... I'm not 
sure it meets the standard's requirements, but I could be wrong.


And it seems to me much better to provide the facility as a separate 
function e.g. xpath_fragment() (if at all) rather than by adding on a 
non-standard xpath function, but that's just a first impression.


Nice piece of work, though.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-28 Thread James Pye


On Feb 28, 2009, at 7:53 AM, Andrew Dunstan wrote:
This is entirely out of the question for 8.3, as it's a significant  
change of behaviour.


Yep. Even with implicit prefixing, the semantics are very different.


What got me thinking about it was this:

 http://archives.postgresql.org/pgsql-bugs/2008-07/msg00058.php

If it's desirable to avoid prefixing, what options remain?

(At least I find it desirable to avoid prefixing =)


I'd also want to see this usage blessed by some xpath guru ... I'm  
not sure it meets the standard's requirements, but I could be wrong.


Oh, the context node question you raised? I think it would be easy to  
expect that the standard is expecting a well-formed document to query  
against in the first place, so I *do* think it's a very valid concern.


http://www.w3.org/TR/xpath
http://www.w3.org/TR/xpath#data-model
http://www.w3.org/TR/xpath#infoset

Curious, if we constructed an actual document fragment node from the  
node list and set it as the document's root, would that be enough to  
satisfy any requirements? It does appear to talk about nodes quite  
generally.



In the current case, we're shaving the corners of the square peg so it  
will fit in the round hole. In fragment()'s case, it seems we would be  
trying to circumvent the round hole altogether..


I don't really like either way. :P

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-28 Thread Andrew Dunstan




I wrote:




I'll test again on some longer fragments since you don't seem convinced.




I set up a test with a much larger XML fragment - over 1Mb - basically 
it's the English source of the SVN Turtle book.


The result is that the extra parsing cost is still pretty much unmeasurable:

regression=# select avg(length(foo)) from (select 
repeat(xpath('//title',src)::text,i) as foo from xpathtest4, 
generate_series(1,100) as i ) x;
avg 
--

1309869.
(1 row)

Without fix:
Time: 5695.930 ms
Time: 4855.837 ms
Time: 5453.044 ms

With fix:
Time: 5232.810 ms
Time: 5272.375 ms
Time: 5528.434 ms


So I'm going to go ahead and commit this change for 8.3, with Tom's 
suggested ammendments.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-28 Thread Hannu Krosing

On Fri, 2009-02-27 at 22:55 -0500, Andrew Dunstan wrote:
 
 Hannu Krosing wrote:


  Some of the functions, including some specified in the standard, produce 
  fragments. That's why we have the 'IS DOCUMENT' test.
  
 
  But then you could use xmlfragments as the functions return type, no ?
 

 
 
 Not in the case of xpath, no.

single xml document is a sub-case of xmlforest, so xmlforest should be
allowed as return type, no ?

 There is a very complete standard for the Xpath data model, 
 and we need to adhere to it.

Is declaring a single all-covering xml data type the best or even the
only way to adhere ?

 cheers
 
 andrew
 
-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-27 Thread Hannu Krosing

On Thu, 2009-02-26 at 13:51 -0500, Robert Haas wrote:
  What I have proposed for 8.3 should not break a single case that currently
  behaves usefully. If anyone has a counter-example please show it.
 
  What I have proposed for 8.4 possibly would break current useful behaviour
  (FSVO useful), but should be done anyway on correctness grounds.
 
 I dunno, aren't XML document fragments sort of a pretty common case?

I'd rather argue that xml datatype should not even accept anything but
complete xml documents. Same as int field does not accept int[].

Or maybe we rather need separate xmldocument and xmlforest/xmlfragments
types in next releases and leave the base xml as it is but deprecated
due to inability to fix it without breaking backwards compatibility.


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-27 Thread Andrew Dunstan




Hannu Krosing wrote:

On Thu, 2009-02-26 at 13:51 -0500, Robert Haas wrote:
  

What I have proposed for 8.3 should not break a single case that currently
behaves usefully. If anyone has a counter-example please show it.

What I have proposed for 8.4 possibly would break current useful behaviour
(FSVO useful), but should be done anyway on correctness grounds.
  

I dunno, aren't XML document fragments sort of a pretty common case?



I'd rather argue that xml datatype should not even accept anything but
complete xml documents. Same as int field does not accept int[].

Or maybe we rather need separate xmldocument and xmlforest/xmlfragments
types in next releases and leave the base xml as it is but deprecated
due to inability to fix it without breaking backwards compatibility.

  


Some of the functions, including some specified in the standard, produce 
fragments. That's why we have the 'IS DOCUMENT' test.


You can also force validation as a document by saying  SET XML OPTION 
DOCUMENT;


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-27 Thread Andrew Dunstan




Andrew Dunstan wrote:



Tom Lane wrote:

Hmm, does this proposal require adding a test of well-formed-ness to
a code path that doesn't currently have one?  If so, is that likely
to contribute any noticeable slowdown?

I can't offhand see an objection to this other than possible performance
impact.

   
  


Yeah, testing the well-formedness might cost a bit. We could 
short-circuit the test by applying some comparatively fast heuristic 
tests.


Or we could decide that we'll just fix the xpath prefix part for 8.3 
and keep the wrapping. I don't want to spend a huge effort on fixing 
something I regard as fundamentally broken.


I'll do some tests to see what the cost of extra xml parsing might be.



The extra cost appears to be fairly negligible.

regression=# create table xpathtest3 as select xmlconcat(xmlelement(name 
unique1, unique1), '\n\t',xmlelement(name unique2, unique2), 
'\n\t',xmlelement(name two, two), '\n\t',xmlelement(name four, 
four),'\n\t',xmlelement(name ten,ten),'\n\t',xmlelement(name 
twenty,twenty),'\n\t',xmlelement(name 
hundred,hundred),'\n\t',xmlelement(name 
thousand,thousand),'\n\t',xmlelement(name 
twothusand,twothousand),'\n\t',xmlelement(name 
fivethous,fivethous),'\n\t',xmlelement(name 
tenthous,tenthous),'\n\t',xmlelement(name 
odd,odd),'\n\t',xmlelement(name even,even),'\n\t',xmlelement(name 
stringu1,stringu1),'\n\t',xmlelement(name 
stringu2,stringu2),'\n\t',xmlelement(name string4,string4),'\n') from tenk1;


regression=# select count(*) from (select 
xpath('//two[text()=0]/text()',xmlconcat) as elems from xpathtest3, 
generate_series(1,10) ) x ;
count 


10
(1 row)

Time: 27.722 ms


Proposed patch for 8.3 attached. (Note: it only reparses in the 
non-document case)


cheers

andrew


Index: src/backend/utils/adt/xml.c
===
RCS file: /cvsroot/pgsql/src/backend/utils/adt/xml.c,v
retrieving revision 1.68.2.6
diff -c -r1.68.2.6 xml.c
*** src/backend/utils/adt/xml.c	10 Nov 2008 18:02:27 -	1.68.2.6
--- src/backend/utils/adt/xml.c	27 Feb 2009 20:59:28 -
***
*** 3320,3360 
  
  	xml_init();
  
! 	/*
! 	 * To handle both documents and fragments, regardless of the fact whether
! 	 * the XML datum has a single root (XML well-formedness), we wrap the XML
! 	 * datum in a dummy element (x.../x) and extend the XPath expression
! 	 * accordingly.  To do it, throw away the XML prolog, if any.
! 	 */
! 	if (len = 5 
! 		xmlStrncmp((xmlChar *) datastr, (xmlChar *) ?xml, 5) == 0)
! 	{
! 		i = 5;
! 		while (i  len 
! 			   !(datastr[i - 1] == '?'  datastr[i] == ''))
! 			i++;
! 
! 		if (i == len)
! 			xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR,
! 		could not parse XML data);
  
! 		++i;
  
! 		datastr += i;
! 		len -= i;
! 	}
! 
! 	string = (xmlChar *) palloc((len + 8) * sizeof(xmlChar));
! 	memcpy(string, x, 3);
! 	memcpy(string + 3, datastr, len);
! 	memcpy(string + 3 + len, /x, 5);
! 	len += 7;
! 
! 	xpath_expr = (xmlChar *) palloc((xpath_len + 3) * sizeof(xmlChar));
! 	memcpy(xpath_expr, /x, 2);
! 	memcpy(xpath_expr + 2, VARDATA(xpath_expr_text), xpath_len);
! 	xpath_expr[xpath_len + 2] = '\0';
! 	xpath_len += 2;
  
  	xmlInitParser();
  
--- 3320,3332 
  
  	xml_init();
  
! 	string = (xmlChar *) palloc((len + 8) * sizeof(xmlChar));
  
! 	xpath_expr = (xmlChar *) palloc((xpath_len + 5) * sizeof(xmlChar));
  
! 	memcpy (string, datastr, len);
! 	string[len] = '\0';
! 	
  
  	xmlInitParser();
  
***
*** 3367,3375 
  		xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
  	could not allocate parser context);
  	doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
! 	if (doc == NULL)
! 		xml_ereport(ERROR, ERRCODE_INVALID_XML_DOCUMENT,
! 	could not parse XML data);
  	xpathctx = xmlXPathNewContext(doc);
  	if (xpathctx == NULL)
  		xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
--- 3339,3410 
  		xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
  	could not allocate parser context);
  	doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
! 
! 	if (doc == NULL || xmlDocGetRootElement(doc) == NULL)
! 	{
! 
! 		/*
! 		 * In case we have a fragment rather than a well-formed XML document,
! 		 * which has a single root (XML well-formedness), we try again after
! 		 * transforming the xml by stripping away the XML prolog, if any, and
! 		 * wrapping the remainder in a dummy element (x.../x),
! 		 * and later extending the XPath expression accordingly. 
! 		 */
! 		if (len = 5 
! 			xmlStrncmp((xmlChar *) datastr, (xmlChar *) ?xml, 5) == 0)
! 		{
! 			i = 5;
! 			while (i  len 
!    !(datastr[i - 1] == '?'  datastr[i] == ''))
! i++;
! 			
! 			if (i == len)
! xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR,
! 			could not parse XML data);
! 			
! 			++i;
! 			
! 			datastr += i;
! 			len -= i;
! 		}
! 
! 		memcpy(string, x, 3);
! 		memcpy(string + 3, datastr, len);
! 		memcpy(string + 3 + len, /x, 5);
! 		len += 7;
! 
! 		doc =

Re: [HACKERS] xpath processing brain dead

2009-02-27 Thread Tom Lane

Andrew Dunstan and...@dunslane.net writes:
 I'll do some tests to see what the cost of extra xml parsing might be.

 The extra cost appears to be fairly negligible.

Uh, you didn't actually show a comparison of before and after?
What it looks like to me is that this approach is free or nearly so for
well-formed documents, but doubles the parsing cost for forests.
Which is likely to annoy anyone who's really depending on the
capability.

Also,

 ! if (*VARDATA(xpath_expr_text) == '/')

This is risking a core dump if the xpath expr is of zero length.  You
need something like

if (xpath_len  0  *VARDATA(xpath_expr_text) == '/')

It would also be a good idea if the allocation of string and xpath_expr
had a comment about why it's allocating extra space (something like see
hacks below for use of this extra space would be sufficient).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-27 Thread Andrew Dunstan




Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:
  

I'll do some tests to see what the cost of extra xml parsing might be.
  


  

The extra cost appears to be fairly negligible.



Uh, you didn't actually show a comparison of before and after?
What it looks like to me is that this approach is free or nearly so for
well-formed documents, but doubles the parsing cost for forests.
Which is likely to annoy anyone who's really depending on the
capability.
  



The difference is lost in the noise.

Without fix:

Time: 24.619 ms
Time: 24.245 ms
Time: 25.179 ms

With fix:

Time: 24.084 ms
Time: 21.980 ms
Time: 23.765 ms

The test is done on 10,000 short fragments each parsed 10 times (or 20 
times with the fix).


I'll test again on some longer fragments since you don't seem convinced.


Also,

  

!   if (*VARDATA(xpath_expr_text) == '/')



This is risking a core dump if the xpath expr is of zero length.  You
need something like

if (xpath_len  0  *VARDATA(xpath_expr_text) == '/')
  


OK.


It would also be a good idea if the allocation of string and xpath_expr
had a comment about why it's allocating extra space (something like see
hacks below for use of this extra space would be sufficient).
  



OK.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-27 Thread Hannu Krosing

On Fri, 2009-02-27 at 16:37 -0500, Andrew Dunstan wrote:
 
 Hannu Krosing wrote:
  On Thu, 2009-02-26 at 13:51 -0500, Robert Haas wrote:

  What I have proposed for 8.3 should not break a single case that currently
  behaves usefully. If anyone has a counter-example please show it.
 
  What I have proposed for 8.4 possibly would break current useful 
  behaviour
  (FSVO useful), but should be done anyway on correctness grounds.

  I dunno, aren't XML document fragments sort of a pretty common case?
  
 
  I'd rather argue that xml datatype should not even accept anything but
  complete xml documents. Same as int field does not accept int[].
 
  Or maybe we rather need separate xmldocument and xmlforest/xmlfragments
  types in next releases and leave the base xml as it is but deprecated
  due to inability to fix it without breaking backwards compatibility.
 

 
 Some of the functions, including some specified in the standard, produce 
 fragments. That's why we have the 'IS DOCUMENT' test.

But then you could use xmlfragments as the functions return type, no ?

Does tha standard require that the same field type must store both
documents and fragments ?

 You can also force validation as a document by saying  SET XML OPTION 
 DOCUMENT;
 
 cheers
 
 andrew
-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-27 Thread James Pye


On Feb 26, 2009, at 7:03 PM, Andrew Dunstan wrote:
can you point me at any call in libxml2 which will evaluate an xpath  
expression in the context of a nodeset instead of a document?


No, I can't. node-sets are XPath objects not xmlNode objects, so I  
don't think it would be as simple as modifying:


xml.c:xpath() {
   ...
   xpathctx-node = xmlDocGetRootElement(doc);

with the result of xmlXPathNewNodeSet..

[snip other questions]

My *guess* would be that if we were to use a node-set instead, we'd  
still have to prefix the XPath query. In this case, with a function  
call to an xpath extension function that creates the NodeSet from the  
content fragment(s?) of the document created by xml_parse(ie, more or  
less, a re-implementation of exsl:node-set() tailored for our use- 
case). Well, that or force the user to call it explicitly. Possible or  
not--wrt using a content fragment/document as the context node, I find  
this less desirable than the current mangling, so I'm becoming quite  
indifferent. :)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-27 Thread Andrew Dunstan




Hannu Krosing wrote:
  
  
Some of the functions, including some specified in the standard, produce 
fragments. That's why we have the 'IS DOCUMENT' test.



But then you could use xmlfragments as the functions return type, no ?

  



Not in the case of xpath, no.

There is a very complete standard for the Xpath data model, and we need 
to adhere to it.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-26 Thread Andrew Dunstan




Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:
  
For fear of passing an ill formed fragment of xml to the processor, we 
strip the xml declaration if any and surround what's left with 'x and 
'/x' and prepend '/x' to the supposed xpath. This is just horrible.



I seem to recall having complained about that at the time, but I didn't
(and don't) know enough about xpath to do any better.
  


Well, a few of us do. I guess I took my eye off the ball a bit back when 
we were putting this into 8.3.


  
This whole thing is a mess, and I suspect the only fix for now is to 
undo all the mangling of both the xml and the xpath expression.



I don't think we should change the behavior if it's just to arrive at
another less-than-desirable behavior.  Whacking semantics around afresh
with each release does not endear us to users.  If we know how to fix it
right, great; but if we can't then we should keep compatibility with 8.3
until we can.


  


Honestly, this is a bug, pure and simple. There really can't be an 
argument about that. For the stable branch, we could make the following 
changes that should result in a Pareto improvement (nothing gets worse 
while some things get better):


   * only do the xml transformation if the xml is known not to be be
 well formed
   * if we need to mangle the xpath expression due to doing the xml
 transformation, then unless the xpath expression begins with a
 '/', prepend it with '/x//'. (two slashes corresponds to the
 descendent axis in xpath - in effect it stands for any number of
 descendent elements).


But that's just a holding operation. For 8.4 we should stop this 
nonsense and simply say that it is up to the programmer to ensure that 
the xml passed to the processor is well formed.


The thing that is so very bad about this is that if the programmer *has* 
made sure that the inputs are correct, s/he can still end up with broken 
results. If we're going to try to fix bad inputs, we must make damn sure 
that we don't break on correct inputs as a result. However, I can't off 
hand think of a lightning way to fix bad inputs that doesn't carry some 
danger to good inputs. Until someone comes up with something tolerably 
bulletproof, I suggest that we simply say that it is the programmer's 
responsibility.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-26 Thread Peter Eisentraut


Andrew Dunstan wrote:
For fear of passing an ill formed fragment of xml to the processor, we 
strip the xml declaration if any and surround what's left with 'x and 
'/x' and prepend '/x' to the supposed xpath. This is just horrible. It 
will break for every xpath expression that doesn't begin with a '/' and 
probably for many that do.


This whole thing is a mess, and I suspect the only fix for now is to 
undo all the mangling of both the xml and the xpath expression. If the 
programmer passes an ill formed piece of xml to the processor that is 
their lookout, but I think we should ensure that we give back correct 
results on well formed input.


It's not about ill-formed pieces, it is about (well-formed) content 
fragments that are not full documents (exactly one root element). 
libxml2 doesn't support xpath on content fragments.


The tradeoff here is either supporting no xpath at all on content 
fragments or supporting some xpath on both kinds of xml data.  Whoever 
made the initial implementation of this (Nikolay, Cc) decided for the 
latter, but I can't say I'm happy with it either.  I'd be OK with 
changing it, but I haven't had time to get to it, unfortunately.


See also http://wiki.postgresql.org/wiki/XPath_Todo#XPath.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-26 Thread Andrew Dunstan




Peter Eisentraut wrote:

Andrew Dunstan wrote:
For fear of passing an ill formed fragment of xml to the processor, 
we strip the xml declaration if any and surround what's left with 
'x and '/x' and prepend '/x' to the supposed xpath. This is just 
horrible. It will break for every xpath expression that doesn't begin 
with a '/' and probably for many that do.


This whole thing is a mess, and I suspect the only fix for now is to 
undo all the mangling of both the xml and the xpath expression. If 
the programmer passes an ill formed piece of xml to the processor 
that is their lookout, but I think we should ensure that we give back 
correct results on well formed input.


It's not about ill-formed pieces, it is about (well-formed) content 
fragments that are not full documents (exactly one root element). 
libxml2 doesn't support xpath on content fragments.


The tradeoff here is either supporting no xpath at all on content 
fragments or supporting some xpath on both kinds of xml data.  Whoever 
made the initial implementation of this (Nikolay, Cc) decided for the 
latter, but I can't say I'm happy with it either.  I'd be OK with 
changing it, but I haven't had time to get to it, unfortunately.


See also http://wiki.postgresql.org/wiki/XPath_Todo#XPath.



I don't think it is our responsibility to remedy the deficiencies of 
libxml2, especially if it involves breaking the processing of valid 
xpath expressions on well formed XML.


If someone can come up with a correct scheme for handling fragments, I'm 
all ears, but otherwise I think a) we should rip this out of 8.4 and b) 
we should try to make 8.3 slightly less broken at least, along the lines 
of my earlier suggestion.


cheers

andrew





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-26 Thread Tom Lane

Andrew Dunstan and...@dunslane.net writes:
 I don't think it is our responsibility to remedy the deficiencies of 
 libxml2, especially if it involves breaking the processing of valid 
 xpath expressions on well formed XML.

 If someone can come up with a correct scheme for handling fragments, I'm 
 all ears, but otherwise I think a) we should rip this out of 8.4 and b) 
 we should try to make 8.3 slightly less broken at least, along the lines 
 of my earlier suggestion.

I'm okay with removing this for 8.4, but I think it's too late to
change the behavior of 8.3.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-26 Thread Andrew Dunstan




Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:
  
I don't think it is our responsibility to remedy the deficiencies of 
libxml2, especially if it involves breaking the processing of valid 
xpath expressions on well formed XML.



  
If someone can come up with a correct scheme for handling fragments, I'm 
all ears, but otherwise I think a) we should rip this out of 8.4 and b) 
we should try to make 8.3 slightly less broken at least, along the lines 
of my earlier suggestion.



I'm okay with removing this for 8.4, but I think it's too late to
change the behavior of 8.3.


  


It's never too late to fix a bug. The current behaviour is just plain 
nonsense ... if your xpath expression is 'foo/bar' it ends up searching 
for '/xfoo/bar' which can't possibly be right. Surely we can at least 
fix that.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-26 Thread Tom Lane

Andrew Dunstan and...@dunslane.net writes:
 Tom Lane wrote:
 I'm okay with removing this for 8.4, but I think it's too late to
 change the behavior of 8.3.

 It's never too late to fix a bug.

When the proposed fix involves breaking cases that used to behave
usefully, you need a much stronger argument than that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-26 Thread Andrew Dunstan




Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:
  

Tom Lane wrote:


I'm okay with removing this for 8.4, but I think it's too late to
change the behavior of 8.3.
  


  

It's never too late to fix a bug.



When the proposed fix involves breaking cases that used to behave
usefully, you need a much stronger argument than that.
  



What I have proposed for 8.3 should not break a single case that 
currently behaves usefully. If anyone has a counter-example please show it.


What I have proposed for 8.4 possibly would break current useful 
behaviour (FSVO useful), but should be done anyway on correctness grounds.



cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-26 Thread Robert Haas

 What I have proposed for 8.3 should not break a single case that currently
 behaves usefully. If anyone has a counter-example please show it.

 What I have proposed for 8.4 possibly would break current useful behaviour
 (FSVO useful), but should be done anyway on correctness grounds.

I dunno, aren't XML document fragments sort of a pretty common case?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-26 Thread Andrew Dunstan




Robert Haas wrote:

What I have proposed for 8.3 should not break a single case that currently
behaves usefully. If anyone has a counter-example please show it.

What I have proposed for 8.4 possibly would break current useful behaviour
(FSVO useful), but should be done anyway on correctness grounds.



I dunno, aren't XML document fragments sort of a pretty common case?


  


They are. So what? How does that contradict either of the statements 
made above?


Programmers using libxml2 are used to handling this problem. Why must 
postgres alone use a totally brain dead and utterly incorrect wrapping 
to solve a problem that everyone else leaves up to the programmer to handle?


People in this thread are not concentrating on the fact that what we do 
now can break correct input. That makes it an unquestionable bug, 
IMNSHO. There seems to be agreement about what to do for 8.4, so we seem 
to be arguing about what to do for 8.3. Are you *really* arguing that 
prepending the xpath expression with '/x' in all cases should be allowed 
to continue? If you are I can only assume your use of xml/xpath is 
probably fairly light.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-26 Thread Robert Haas

On Thu, Feb 26, 2009 at 3:18 PM, Andrew Dunstan and...@dunslane.net wrote:
 I dunno, aren't XML document fragments sort of a pretty common case?

 They are. So what? How does that contradict either of the statements made
 above?

 Programmers using libxml2 are used to handling this problem. Why must
 postgres alone use a totally brain dead and utterly incorrect wrapping to
 solve a problem that everyone else leaves up to the programmer to handle?

I can't think of any reason, especially not when you put it that way.  :-)

 People in this thread are not concentrating on the fact that what we do now
 can break correct input. That makes it an unquestionable bug, IMNSHO. There
 seems to be agreement about what to do for 8.4, so we seem to be arguing
 about what to do for 8.3. Are you *really* arguing that prepending the xpath
 expression with '/x' in all cases should be allowed to continue? If you are
 I can only assume your use of xml/xpath is probably fairly light.

Mine is very light indeed.  But the change you're proposing seems like
it could CONCEIVABLY break a working application that counts on
PostgreSQL's particular flavor of misbehavior, and therefore it seems
questionable to put that into a stable branch.  The fact that the
application perhaps should not have been written that way to begin
with is neither here nor there.  We don't want people to be afraid of
upgrading to the latest point release when we fix, say, a backend
crash or a data corruption bug.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-26 Thread Joshua D. Drake

On Thu, 2009-02-26 at 15:37 -0500, Robert Haas wrote:

 Mine is very light indeed.  But the change you're proposing seems like
 it could CONCEIVABLY break a working application that counts on
 PostgreSQL's particular flavor of misbehavior,

That is what release notes are for.

Joshua D. Drake


-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-26 Thread Andrew Dunstan




Robert Haas wrote:

But the change you're proposing seems like
it could CONCEIVABLY break a working application that counts on
PostgreSQL's particular flavor of misbehavior, and therefore it seems
questionable to put that into a stable branch.  The fact that the
application perhaps should not have been written that way to begin
with is neither here nor there.  We don't want people to be afraid of
upgrading to the latest point release when we fix, say, a backend
crash or a data corruption bug.


  


Let me reiterate the changes that I propose, and again challenge you to 
provide a working counter-example if you believe it will break anything 
not currently broken, even cases involving fragments.


First, I propose that we abandon this mangling, if, and only if, the xml 
is in fact a well formed XML document. Since the whole point of the 
mangling is to handle situations where the XML is not a well formed 
document, that seems fairly straight-forward. If this change were to 
upset any user, it must be because they are relying on undisputably 
incorrect results.


Second, I propose that, in the remaining cases, where we do mangle the 
XML, if the xpath expression does not begin with a '/', instead of 
prepending it with '/x/, which can not possibly be correct under any 
circumstance, we prepend it with '/x//' which has some possibility of 
giving correct results.


IOW, these proposals will expand the number of correct results from the 
code, without contributing any new incorrect cases. These changes are 
*very* conservative, as is usual when we fix things on stable branches.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-26 Thread Andrew Dunstan




I wrote:


Second, I propose that, in the remaining cases, where we do mangle the 
XML, if the xpath expression does not begin with a '/', instead of 
prepending it with '/x/, which can not possibly be correct under any 
circumstance, we prepend it with '/x//' which has some possibility of 
giving correct results.



   ^^^
  '/x'


cheers  


andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-26 Thread Tom Lane

Andrew Dunstan and...@dunslane.net writes:
 First, I propose that we abandon this mangling, if, and only if, the xml 
 is in fact a well formed XML document. Since the whole point of the 
 mangling is to handle situations where the XML is not a well formed 
 document, that seems fairly straight-forward. If this change were to 
 upset any user, it must be because they are relying on undisputably 
 incorrect results.

 Second, I propose that, in the remaining cases, where we do mangle the 
 XML, if the xpath expression does not begin with a '/', instead of 
 prepending it with '/x/, which can not possibly be correct under any 
 circumstance, we prepend it with '/x//' which has some possibility of 
 giving correct results.

Hmm, does this proposal require adding a test of well-formed-ness to
a code path that doesn't currently have one?  If so, is that likely
to contribute any noticeable slowdown?

I can't offhand see an objection to this other than possible performance
impact.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-26 Thread Robert Haas

On Thu, Feb 26, 2009 at 3:54 PM, Andrew Dunstan and...@dunslane.net wrote:
 Robert Haas wrote:
 But the change you're proposing seems like
 it could CONCEIVABLY break a working application that counts on
 PostgreSQL's particular flavor of misbehavior, and therefore it seems
 questionable to put that into a stable branch.  The fact that the
 application perhaps should not have been written that way to begin
 with is neither here nor there.  We don't want people to be afraid of
 upgrading to the latest point release when we fix, say, a backend
 crash or a data corruption bug.

 Let me reiterate the changes that I propose, and again challenge you to
 provide a working counter-example if you believe it will break anything not
 currently broken, even cases involving fragments.

 First, I propose that we abandon this mangling, if, and only if, the xml is
 in fact a well formed XML document. Since the whole point of the mangling is
 to handle situations where the XML is not a well formed document, that seems
 fairly straight-forward. If this change were to upset any user, it must be
 because they are relying on undisputably incorrect results.

 Second, I propose that, in the remaining cases, where we do mangle the XML,
 if the xpath expression does not begin with a '/', instead of prepending it
 with '/x/, which can not possibly be correct under any circumstance, we
 prepend it with '/x//' which has some possibility of giving correct results.

 IOW, these proposals will expand the number of correct results from the
 code, without contributing any new incorrect cases. These changes are *very*
 conservative, as is usual when we fix things on stable branches.

 cheers

 andrew

You are right.

Nuff said,

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-26 Thread James Pye


On Feb 26, 2009, at 9:37 AM, Peter Eisentraut wrote:
It's not about ill-formed pieces, it is about (well-formed) content  
fragments that are not full documents (exactly one root element).  
libxml2 doesn't support xpath on content fragments.


exslt:node-set() to the rescue? Or is that/equivalent functionality  
not easily accessed at the C-level with libxml2?


http://www.exslt.org/exsl/functions/node-set/index.html

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-26 Thread Andrew Dunstan




James Pye wrote:

On Feb 26, 2009, at 9:37 AM, Peter Eisentraut wrote:
It's not about ill-formed pieces, it is about (well-formed) content 
fragments that are not full documents (exactly one root element). 
libxml2 doesn't support xpath on content fragments.


exslt:node-set() to the rescue? Or is that/equivalent functionality 
not easily accessed at the C-level with libxml2?


http://www.exslt.org/exsl/functions/node-set/index.html


A node-set isn't a document. In any case, this functionality doesn't 
appear to be in libxml2, it's in libxslt according to the reference you 
provided.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-26 Thread James Pye


On Feb 26, 2009, at 5:41 PM, Andrew Dunstan wrote:

http://www.exslt.org/exsl/functions/node-set/index.html


A node-set isn't a document.


yes.. :)

I guess what I'm saying is that if:

tinman=# SELECT XML'foo/bar/';
 xml
--
 foo/bar/
(1 row)

is considered to be valid per *SQL-XML*, then it should probably be  
treated as a node-set in the context of xpath, not mangled with  
x.../x..


Certainly, I would expect an implicit node-set() call long before  
wrapping the fragment in x.../x and prefixing my xpath query.


However, I find the validity of the above, XML'foo/bar/', a bit  
disturbing to begin with. :P



In any case, this functionality doesn't appear to be in libxml2,
it's in libxslt according to the reference you provided.


I think that's *just* referencing the list of xslt implementations  
that the extension function is known to be available in.. I doubt that  
means to imply that the function or equivalent functionality is not  
available in libxml2 itself. I'd wager that equivalent functionality  
could be implemented if it weren't directly/already available.. =)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-26 Thread Andrew Dunstan




James Pye wrote:

On Feb 26, 2009, at 5:41 PM, Andrew Dunstan wrote:

http://www.exslt.org/exsl/functions/node-set/index.html


A node-set isn't a document.


yes.. :)

I guess what I'm saying is that if:

tinman=# SELECT XML'foo/bar/';
 xml
--
 foo/bar/
(1 row)

is considered to be valid per *SQL-XML*, then it should probably be 
treated as a node-set in the context of xpath, not mangled with 
x.../x..


Certainly, I would expect an implicit node-set() call long before 
wrapping the fragment in x.../x and prefixing my xpath query.


However, I find the validity of the above, XML'foo/bar/', a bit 
disturbing to begin with. :P



In any case, this functionality doesn't appear to be in libxml2,
it's in libxslt according to the reference you provided.


I think that's *just* referencing the list of xslt implementations 
that the extension function is known to be available in.. I doubt that 
means to imply that the function or equivalent functionality is not 
available in libxml2 itself. I'd wager that equivalent functionality 
could be implemented if it weren't directly/already available.. =)




James,

can you point me at any call in libxml2 which will evaluate an xpath 
expression in the context of a nodeset instead of a document? Quite 
apart from anything else, xpath requires there to be a (single) context 
node (see http://www.w3.org/TR/xpath20/#context ). For a doc, we set 
that node to the document node, but what would it be for a node-set or a 
fragment? If we can't get over that hurdle we're screwed in pursuing 
your line of thought.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-26 Thread Andrew Dunstan




Tom Lane wrote:

Hmm, does this proposal require adding a test of well-formed-ness to
a code path that doesn't currently have one?  If so, is that likely
to contribute any noticeable slowdown?

I can't offhand see an objection to this other than possible performance
impact.


  


Yeah, testing the well-formedness might cost a bit. We could 
short-circuit the test by applying some comparatively fast heuristic tests.


Or we could decide that we'll just fix the xpath prefix part for 8.3 and 
keep the wrapping. I don't want to spend a huge effort on fixing 
something I regard as fundamentally broken.


I'll do some tests to see what the cost of extra xml parsing might be.

cheers

andrew

We

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] xpath processing brain dead

2009-02-25 Thread Andrew Dunstan



Andrew Gierth was just pointing out to me how badly broken our XPath 
processing is.


For fear of passing an ill formed fragment of xml to the processor, we 
strip the xml declaration if any and surround what's left with 'x and 
'/x' and prepend '/x' to the supposed xpath. This is just horrible. It 
will break for every xpath expression that doesn't begin with a '/' and 
probably for many that do.


This whole thing is a mess, and I suspect the only fix for now is to 
undo all the mangling of both the xml and the xpath expression. If the 
programmer passes an ill formed piece of xml to the processor that is 
their lookout, but I think we should ensure that we give back correct 
results on well formed input.


The only good piece of news is that the xpath procedures in contrib/xml2 
don't apparently suffer these faults.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] xpath processing brain dead

2009-02-25 Thread Tom Lane

Andrew Dunstan and...@dunslane.net writes:
 For fear of passing an ill formed fragment of xml to the processor, we 
 strip the xml declaration if any and surround what's left with 'x and 
 '/x' and prepend '/x' to the supposed xpath. This is just horrible.

I seem to recall having complained about that at the time, but I didn't
(and don't) know enough about xpath to do any better.

 This whole thing is a mess, and I suspect the only fix for now is to 
 undo all the mangling of both the xml and the xpath expression.

I don't think we should change the behavior if it's just to arrive at
another less-than-desirable behavior.  Whacking semantics around afresh
with each release does not endear us to users.  If we know how to fix it
right, great; but if we can't then we should keep compatibility with 8.3
until we can.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

48 matches

Mail list logo