Re: [HACKERS] xpath processing brain dead
Andrew Dunstan wrote: Hannu Krosing wrote: Is it just that in you _can't_ use Xpath on fragments, and you _need_ to pass full documents to Xpath ? At least this is my reading of Xpath standard. I think that's possibly overstating it., unless I have missed something (W3 standards are sometimes not much more clear than the SQL standards ;-( ) For instance, there's this, that implies at least that the tree might not be a document: A / at the beginning of a path expression is an abbreviation for the initial step fn:root(self::node()) treat as document-node()/ (however, if the / is the entire path expression, the trailing / is omitted from the expansion.) The effect of this initial step is to begin the path at the root node of the tree that contains the context node. If the context item is not a node, a type error is raised [err:XPTY0020]. At evaluation time, if the root node above the context node is not a document node, a dynamic error is raised [err:XPDY0050]. The problem is that we certainly do have to provide a context node (the standard is clear about that), and unless we want to convert a non-document to a node-set as James suggested and then apply the xpath expression to each node in the node-set, we have no way of sanely specifying the context node. No-one has come up with an answer to this, so I propose to remove the hackery. That leaves the question of what to do when the xml is not a well formed document ... raise an error? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
On Mar 20, 2009, at 8:56 AM, Andrew Dunstan wrote: Andrew Dunstan wrote: A / at the beginning of a path expression is an abbreviation for the initial step fn:root(self::node()) treat as document-node()/ (however, if the / is the entire path expression, the trailing / is omitted from the expansion.) The effect of this initial step is to begin the path at the root node of the tree that contains the context node. If the context item is not a node, a type error is raised [err:XPTY0020]. At evaluation time, if the root node above the context node is not a document node, a dynamic error is raised [err:XPDY0050]. The problem is that we certainly do have to provide a context node (the standard is clear about that), and unless we want to convert a non-document to a node-set as James suggested and then apply the xpath expression to each node in the node-set, we have no way of sanely specifying the context node. libxml2 only supports xpath1. Why are you referencing xpath20? And, if SQL-XML requires an xpath20 conforming xpath() function, we have bigger problems than '/x' + xpath_str. ;) Although, this is not terribly important to me as: No-one has come up with an answer to this, I don't feel fragment()/node-set() is a good idea from a usability standpoint alone. I only mentioned it because that's how I've always worked with fragments in the xslt1 world.. Mere curiosity drove most of the remaining interest I had in it. so I propose to remove the hackery. I think this would probably be best for the core xpath() function. However, it may be wise to relocate the munging functionality into another function so users don't have invent their own when they feel so inclined to work with fragments. raise an error? +1, xpath requires a well-formed document -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Andrew Dunstan wrote: can you point me at any call in libxml2 which will evaluate an xpath expression in the context of a nodeset instead of a document? Quite apart from anything else, xpath requires there to be a (single) context node (see http://www.w3.org/TR/xpath20/#context ). For a doc, we set that node to the document node, but what would it be for a node-set or a fragment? If we can't get over that hurdle we're screwed in pursuing your line of thought. Which may hint at the fact that running xpath on content fragments is ill-defined to begin with?!? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
On Sun, 2009-03-01 at 18:22 -0500, Andrew Dunstan wrote: I think the XML type needs to conform to the SQL/XML spec. However, we are trying to apply XPath, which has a different data model, to that type - hence the impedance mismatch. I think that the best we can do (for 8.4, having fixed 8.3 as best we can without adversely changing behaviour) is to throw the responsibility for ensuring that the XML passed to the function is an XML document back on the programmer. Anything else, especially any mangling of the XPath expression, presents a very real danger of breaking on correct input. Can we provide a single function to bridge the gap between fragment and document? It will be clearer to do this than to see various forms of appending/munging, even if that function is a simple wrapper around an append. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Peter Eisentraut wrote: Andrew Dunstan wrote: can you point me at any call in libxml2 which will evaluate an xpath expression in the context of a nodeset instead of a document? Quite apart from anything else, xpath requires there to be a (single) context node (see http://www.w3.org/TR/xpath20/#context ). For a doc, we set that node to the document node, but what would it be for a node-set or a fragment? If we can't get over that hurdle we're screwed in pursuing your line of thought. Which may hint at the fact that running xpath on content fragments is ill-defined to begin with?!? Right. But that's no excuse for what we have been doing, which was demonstrably providing false results on good input. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Simon Riggs wrote: On Sun, 2009-03-01 at 18:22 -0500, Andrew Dunstan wrote: I think the XML type needs to conform to the SQL/XML spec. However, we are trying to apply XPath, which has a different data model, to that type - hence the impedance mismatch. I think that the best we can do (for 8.4, having fixed 8.3 as best we can without adversely changing behaviour) is to throw the responsibility for ensuring that the XML passed to the function is an XML document back on the programmer. Anything else, especially any mangling of the XPath expression, presents a very real danger of breaking on correct input. Can we provide a single function to bridge the gap between fragment and document? It will be clearer to do this than to see various forms of appending/munging, even if that function is a simple wrapper around an append. I have no objection to providing an *extra* function that explicitly wraps non-documents and prefixes the xpath expression in that case, and is documented to have limitations. But I don't think we can provide a single function that always does the right thing, especially when that is so ill-defined in the case of fragments. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
On Mon, 2009-03-02 at 07:54 -0500, Andrew Dunstan wrote: Simon Riggs wrote: On Sun, 2009-03-01 at 18:22 -0500, Andrew Dunstan wrote: I think the XML type needs to conform to the SQL/XML spec. However, we are trying to apply XPath, which has a different data model, to that type - hence the impedance mismatch. I think that the best we can do (for 8.4, having fixed 8.3 as best we can without adversely changing behaviour) is to throw the responsibility for ensuring that the XML passed to the function is an XML document back on the programmer. Anything else, especially any mangling of the XPath expression, presents a very real danger of breaking on correct input. Can we provide a single function to bridge the gap between fragment and document? It will be clearer to do this than to see various forms of appending/munging, even if that function is a simple wrapper around an append. I have no objection to providing an *extra* function that explicitly wraps non-documents and prefixes the xpath expression in that case, and is documented to have limitations. But I don't think we can provide a single function that always does the right thing, especially when that is so ill-defined in the case of fragments. Is it just that in you _can't_ use Xpath on fragments, and you _need_ to pass full documents to Xpath ? At least this is my reading of Xpath standard. cheers andrew -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Hannu Krosing wrote: Is it just that in you _can't_ use Xpath on fragments, and you _need_ to pass full documents to Xpath ? At least this is my reading of Xpath standard. It is easy to read the XPath standard that way, because the concept of fragments is not defined outside of SQL/XML, and is therefore unknown to the XPath standard. The question at hand is rather whether we can usefully adapt it. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Hannu Krosing wrote: Is it just that in you _can't_ use Xpath on fragments, and you _need_ to pass full documents to Xpath ? At least this is my reading of Xpath standard. I think that's possibly overstating it., unless I have missed something (W3 standards are sometimes not much more clear than the SQL standards ;-( ) For instance, there's this, that implies at least that the tree might not be a document: A / at the beginning of a path expression is an abbreviation for the initial step fn:root(self::node()) treat as document-node()/ (however, if the / is the entire path expression, the trailing / is omitted from the expansion.) The effect of this initial step is to begin the path at the root node of the tree that contains the context node. If the context item is not a node, a type error is raised [err:XPTY0020]. At evaluation time, if the root node above the context node is not a document node, a dynamic error is raised [err:XPDY0050]. The problem is that we certainly do have to provide a context node (the standard is clear about that), and unless we want to convert a non-document to a node-set as James suggested and then apply the xpath expression to each node in the node-set, we have no way of sanely specifying the context node. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
On Mon, 2009-03-02 at 15:25 +0200, Peter Eisentraut wrote: Hannu Krosing wrote: Is it just that in you _can't_ use Xpath on fragments, and you _need_ to pass full documents to Xpath ? At least this is my reading of Xpath standard. It is easy to read the XPath standard that way, because the concept of fragments is not defined outside of SQL/XML, and is therefore unknown to the XPath standard. How is the opposite - Does SQL/XML specify Xpath usage for XML(SEQUENCE) and XML(CONTENT) ? The question at hand is rather whether we can usefully adapt it. This sounds like trying to adapt integer arithmetic to lists-of-integers. Even for simple things like addition, there are several ways of doing it [1,2,3] + [1,1,1] = [1,2,3,1,1,1] [1,2,3] + [1,1,1] = [2,3,4] [1,2,3] + [1,1,1] = [[1,2,3],[1,1,1]] all seem possible and logical -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Hannu Krosing wrote: Some of the functions, including some specified in the standard, produce fragments. That's why we have the 'IS DOCUMENT' test. But then you could use xmlfragments as the functions return type, no ? Does tha standard require that the same field type must store both documents and fragments ? Yes, the standard very explicitly provides for one XML type which need not be an XML document. We have no choice about that. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
On Sun, 2009-03-01 at 10:13 -0500, Andrew Dunstan wrote: Hannu Krosing wrote: Some of the functions, including some specified in the standard, produce fragments. That's why we have the 'IS DOCUMENT' test. But then you could use xmlfragments as the functions return type, no ? Does tha standard require that the same field type must store both documents and fragments ? Yes, the standard very explicitly provides for one XML type which need not be an XML document. We have no choice about that. What is it then ? A sequence of XML elements ? Which standard does postgreSQL XML type need to confirm to - general XML DB, Xpath or some other XML ? XML Path Language (XPath) Version 1.0 -- starts with this Abstract: XPath is a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer. So I think that using Xpath on anything else than XML document is invalid and results are undefined. XML 1.0 and XML 1.1 --- Also, both XML 1.0 and XML 1.1 standards are about a thing called an XML document, so I don't see anything there, which would make us accept anything else as being XML. And SQL 2008, Part 14: XML-Related Specifications (SQL/XML) --- Says: SQL defines a predefined data type named by the following key word: XML. ... The data types XML(DOCUMENT(UNTYPED)), XML(DOCUMENT(ANY)), XML(DOCUMENT(XMLSCHEMA)), XML(CONTENT(UNTYPED)), XML(CONTENT(ANY)), XML(CONTENT(XMLSCHEMA)), and XML(SEQUENCE) are referred to as the XML types. Values of XML types are called XML values. So while the type itself could be called XML, there are several subtypes, like Document, Content and Sequence Could the XML(SEQUENCE) better be represented as an array of xml documents aka. xml[] , and maybe CONTENT could be done as xmlelement[] where xmlelement can be any single XML element, including CDATA and plain text ? cheers andrew -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Hannu Krosing wrote: On Sun, 2009-03-01 at 10:13 -0500, Andrew Dunstan wrote: Hannu Krosing wrote: Some of the functions, including some specified in the standard, produce fragments. That's why we have the 'IS DOCUMENT' test. But then you could use xmlfragments as the functions return type, no ? Does tha standard require that the same field type must store both documents and fragments ? Yes, the standard very explicitly provides for one XML type which need not be an XML document. We have no choice about that. What is it then ? A sequence of XML elements ? In its typically oblique way, the 2003 draft says this: NOTE 1 — An XML root information item is similar to an XML document information item, with the following modifications: an XML root information item does not have a [document element] property, a [base URI] property, a [character encoding scheme] property, or an [all declarations processed] property; and the [children] property permits more than one XML element information item. An SQL/XML information item is either an XML root information item or one of the following (defined in Subclause 3.1.3, “Definitions provided in Part 14”): an XML attribute information item, an XML character information item, an XML comment information item, an XML document type declaration information item, an XML element information item, an XML namespace information item, an XML notation information item, an XML processing instruction information item, an XML unexpanded entity reference information item, or an XML unparsed entity information item. An XML value is either the null value, or a collection of SQL/XML information items, consisting of exactly one XML root information item, plus any other SQL/XML information items that can be reached recursively by traversing the properties of the SQL/XML information items. Which standard does postgreSQL XML type need to confirm to - general XML DB, Xpath or some other XML ? I think the XML type needs to conform to the SQL/XML spec. However, we are trying to apply XPath, which has a different data model, to that type - hence the impedance mismatch. I think that the best we can do (for 8.4, having fixed 8.3 as best we can without adversely changing behaviour) is to throw the responsibility for ensuring that the XML passed to the function is an XML document back on the programmer. Anything else, especially any mangling of the XPath expression, presents a very real danger of breaking on correct input. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
sigh.. I got curious. :P On Feb 27, 2009, at 7:19 PM, James Pye wrote: Well, that or force the user to call it explicitly. Attached is the patch that I used to get the results below.. This is just a proof of concept, so it's quite lacking. Notably, it doesn't even try to identify well-formed documents. Purpose/idea being, give the user access to the poorly-formed document as a node-set via the fragment function instead of mangling the xpath and xml: postgres=# SELECT xpath('fragment()//*', 'blehfoo/bar/'::xml); xpath --- {} (1 row) postgres=# SELECT xpath('fragment()//*', 'blehmehsub//mehfoo/ bar/'::xml); xpath -- {sub/} (1 row) postgres=# SELECT xpath('fragment()/*', 'blehmehsub//mehfoo/ bar/'::xml); xpath -- {sub/} (1 row) postgres=# SELECT xpath('fragment()', 'blehmehsub//mehfoo/bar/ '::xml); xpath {bleh,meh sub/ /meh,foo/,bar/} (1 row) postgres=# SELECT xpath('/*', 'blehmehsub//mehfoo/bar/'::xml); xpath --- {} (1 row) postgres=# SELECT xpath('fragment()[local-name()=foo]/@att', 'blehmehsub//mehfoo att=sometin/bar/'::xml); xpath --- {sometin} (1 row) postgres=# SELECT xpath('fragment()[local-name()=meh]/*', 'blehmehsub//mehfoo att=sometin/bar/'::xml); xpath -- {sub/} (1 row) postgres=# SELECT xpath('fragment()[local-name()=meh or local- name()=bar]', 'blehmehsub//mehfoo att=sometin/bar/'::xml); xpath - {meh sub/ /meh,bar/} (1 row) postgres=# SELECT xpath('fragment()[local-name()=bar]', 'blehmehsub//mehfoo att=sometin/bar/'::xml); xpath -- {bar/} (1 row) postgres=# SELECT xpath('fragment()[...@*]', 'blehmehsub//mehothertextfoo att=sometin/bar/'::xml); xpath {foo att=\sometin\/} (1 row) Can't say that I've ever been thrilled with using node-sets, but *shrug*. I'm sleepy now.. git.diff Description: Binary data xmlc_file.diff Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
James Pye wrote: sigh.. I got curious. :P On Feb 27, 2009, at 7:19 PM, James Pye wrote: Well, that or force the user to call it explicitly. Attached is the patch that I used to get the results below.. This is just a proof of concept, so it's quite lacking. Notably, it doesn't even try to identify well-formed documents. This is entirely out of the question for 8.3, as it's a significant change of behaviour. I'd also want to see this usage blessed by some xpath guru ... I'm not sure it meets the standard's requirements, but I could be wrong. And it seems to me much better to provide the facility as a separate function e.g. xpath_fragment() (if at all) rather than by adding on a non-standard xpath function, but that's just a first impression. Nice piece of work, though. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
On Feb 28, 2009, at 7:53 AM, Andrew Dunstan wrote: This is entirely out of the question for 8.3, as it's a significant change of behaviour. Yep. Even with implicit prefixing, the semantics are very different. What got me thinking about it was this: http://archives.postgresql.org/pgsql-bugs/2008-07/msg00058.php If it's desirable to avoid prefixing, what options remain? (At least I find it desirable to avoid prefixing =) I'd also want to see this usage blessed by some xpath guru ... I'm not sure it meets the standard's requirements, but I could be wrong. Oh, the context node question you raised? I think it would be easy to expect that the standard is expecting a well-formed document to query against in the first place, so I *do* think it's a very valid concern. http://www.w3.org/TR/xpath http://www.w3.org/TR/xpath#data-model http://www.w3.org/TR/xpath#infoset Curious, if we constructed an actual document fragment node from the node list and set it as the document's root, would that be enough to satisfy any requirements? It does appear to talk about nodes quite generally. In the current case, we're shaving the corners of the square peg so it will fit in the round hole. In fragment()'s case, it seems we would be trying to circumvent the round hole altogether.. I don't really like either way. :P -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
I wrote: I'll test again on some longer fragments since you don't seem convinced. I set up a test with a much larger XML fragment - over 1Mb - basically it's the English source of the SVN Turtle book. The result is that the extra parsing cost is still pretty much unmeasurable: regression=# select avg(length(foo)) from (select repeat(xpath('//title',src)::text,i) as foo from xpathtest4, generate_series(1,100) as i ) x; avg -- 1309869. (1 row) Without fix: Time: 5695.930 ms Time: 4855.837 ms Time: 5453.044 ms With fix: Time: 5232.810 ms Time: 5272.375 ms Time: 5528.434 ms So I'm going to go ahead and commit this change for 8.3, with Tom's suggested ammendments. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
On Fri, 2009-02-27 at 22:55 -0500, Andrew Dunstan wrote: Hannu Krosing wrote: Some of the functions, including some specified in the standard, produce fragments. That's why we have the 'IS DOCUMENT' test. But then you could use xmlfragments as the functions return type, no ? Not in the case of xpath, no. single xml document is a sub-case of xmlforest, so xmlforest should be allowed as return type, no ? There is a very complete standard for the Xpath data model, and we need to adhere to it. Is declaring a single all-covering xml data type the best or even the only way to adhere ? cheers andrew -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
On Thu, 2009-02-26 at 13:51 -0500, Robert Haas wrote: What I have proposed for 8.3 should not break a single case that currently behaves usefully. If anyone has a counter-example please show it. What I have proposed for 8.4 possibly would break current useful behaviour (FSVO useful), but should be done anyway on correctness grounds. I dunno, aren't XML document fragments sort of a pretty common case? I'd rather argue that xml datatype should not even accept anything but complete xml documents. Same as int field does not accept int[]. Or maybe we rather need separate xmldocument and xmlforest/xmlfragments types in next releases and leave the base xml as it is but deprecated due to inability to fix it without breaking backwards compatibility. -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Hannu Krosing wrote: On Thu, 2009-02-26 at 13:51 -0500, Robert Haas wrote: What I have proposed for 8.3 should not break a single case that currently behaves usefully. If anyone has a counter-example please show it. What I have proposed for 8.4 possibly would break current useful behaviour (FSVO useful), but should be done anyway on correctness grounds. I dunno, aren't XML document fragments sort of a pretty common case? I'd rather argue that xml datatype should not even accept anything but complete xml documents. Same as int field does not accept int[]. Or maybe we rather need separate xmldocument and xmlforest/xmlfragments types in next releases and leave the base xml as it is but deprecated due to inability to fix it without breaking backwards compatibility. Some of the functions, including some specified in the standard, produce fragments. That's why we have the 'IS DOCUMENT' test. You can also force validation as a document by saying SET XML OPTION DOCUMENT; cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Andrew Dunstan wrote: Tom Lane wrote: Hmm, does this proposal require adding a test of well-formed-ness to a code path that doesn't currently have one? If so, is that likely to contribute any noticeable slowdown? I can't offhand see an objection to this other than possible performance impact. Yeah, testing the well-formedness might cost a bit. We could short-circuit the test by applying some comparatively fast heuristic tests. Or we could decide that we'll just fix the xpath prefix part for 8.3 and keep the wrapping. I don't want to spend a huge effort on fixing something I regard as fundamentally broken. I'll do some tests to see what the cost of extra xml parsing might be. The extra cost appears to be fairly negligible. regression=# create table xpathtest3 as select xmlconcat(xmlelement(name unique1, unique1), '\n\t',xmlelement(name unique2, unique2), '\n\t',xmlelement(name two, two), '\n\t',xmlelement(name four, four),'\n\t',xmlelement(name ten,ten),'\n\t',xmlelement(name twenty,twenty),'\n\t',xmlelement(name hundred,hundred),'\n\t',xmlelement(name thousand,thousand),'\n\t',xmlelement(name twothusand,twothousand),'\n\t',xmlelement(name fivethous,fivethous),'\n\t',xmlelement(name tenthous,tenthous),'\n\t',xmlelement(name odd,odd),'\n\t',xmlelement(name even,even),'\n\t',xmlelement(name stringu1,stringu1),'\n\t',xmlelement(name stringu2,stringu2),'\n\t',xmlelement(name string4,string4),'\n') from tenk1; regression=# select count(*) from (select xpath('//two[text()=0]/text()',xmlconcat) as elems from xpathtest3, generate_series(1,10) ) x ; count 10 (1 row) Time: 27.722 ms Proposed patch for 8.3 attached. (Note: it only reparses in the non-document case) cheers andrew Index: src/backend/utils/adt/xml.c === RCS file: /cvsroot/pgsql/src/backend/utils/adt/xml.c,v retrieving revision 1.68.2.6 diff -c -r1.68.2.6 xml.c *** src/backend/utils/adt/xml.c 10 Nov 2008 18:02:27 - 1.68.2.6 --- src/backend/utils/adt/xml.c 27 Feb 2009 20:59:28 - *** *** 3320,3360 xml_init(); ! /* ! * To handle both documents and fragments, regardless of the fact whether ! * the XML datum has a single root (XML well-formedness), we wrap the XML ! * datum in a dummy element (x.../x) and extend the XPath expression ! * accordingly. To do it, throw away the XML prolog, if any. ! */ ! if (len = 5 ! xmlStrncmp((xmlChar *) datastr, (xmlChar *) ?xml, 5) == 0) ! { ! i = 5; ! while (i len ! !(datastr[i - 1] == '?' datastr[i] == '')) ! i++; ! ! if (i == len) ! xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR, ! could not parse XML data); ! ++i; ! datastr += i; ! len -= i; ! } ! ! string = (xmlChar *) palloc((len + 8) * sizeof(xmlChar)); ! memcpy(string, x, 3); ! memcpy(string + 3, datastr, len); ! memcpy(string + 3 + len, /x, 5); ! len += 7; ! ! xpath_expr = (xmlChar *) palloc((xpath_len + 3) * sizeof(xmlChar)); ! memcpy(xpath_expr, /x, 2); ! memcpy(xpath_expr + 2, VARDATA(xpath_expr_text), xpath_len); ! xpath_expr[xpath_len + 2] = '\0'; ! xpath_len += 2; xmlInitParser(); --- 3320,3332 xml_init(); ! string = (xmlChar *) palloc((len + 8) * sizeof(xmlChar)); ! xpath_expr = (xmlChar *) palloc((xpath_len + 5) * sizeof(xmlChar)); ! memcpy (string, datastr, len); ! string[len] = '\0'; ! xmlInitParser(); *** *** 3367,3375 xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY, could not allocate parser context); doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0); ! if (doc == NULL) ! xml_ereport(ERROR, ERRCODE_INVALID_XML_DOCUMENT, ! could not parse XML data); xpathctx = xmlXPathNewContext(doc); if (xpathctx == NULL) xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY, --- 3339,3410 xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY, could not allocate parser context); doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0); ! ! if (doc == NULL || xmlDocGetRootElement(doc) == NULL) ! { ! ! /* ! * In case we have a fragment rather than a well-formed XML document, ! * which has a single root (XML well-formedness), we try again after ! * transforming the xml by stripping away the XML prolog, if any, and ! * wrapping the remainder in a dummy element (x.../x), ! * and later extending the XPath expression accordingly. ! */ ! if (len = 5 ! xmlStrncmp((xmlChar *) datastr, (xmlChar *) ?xml, 5) == 0) ! { ! i = 5; ! while (i len ! !(datastr[i - 1] == '?' datastr[i] == '')) ! i++; ! ! if (i == len) ! xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR, ! could not parse XML data); ! ! ++i; ! ! datastr += i; ! len -= i; ! } ! ! memcpy(string, x, 3); ! memcpy(string + 3, datastr, len); ! memcpy(string + 3 + len, /x, 5); ! len += 7; ! ! doc =
Re: [HACKERS] xpath processing brain dead
Andrew Dunstan and...@dunslane.net writes: I'll do some tests to see what the cost of extra xml parsing might be. The extra cost appears to be fairly negligible. Uh, you didn't actually show a comparison of before and after? What it looks like to me is that this approach is free or nearly so for well-formed documents, but doubles the parsing cost for forests. Which is likely to annoy anyone who's really depending on the capability. Also, ! if (*VARDATA(xpath_expr_text) == '/') This is risking a core dump if the xpath expr is of zero length. You need something like if (xpath_len 0 *VARDATA(xpath_expr_text) == '/') It would also be a good idea if the allocation of string and xpath_expr had a comment about why it's allocating extra space (something like see hacks below for use of this extra space would be sufficient). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: I'll do some tests to see what the cost of extra xml parsing might be. The extra cost appears to be fairly negligible. Uh, you didn't actually show a comparison of before and after? What it looks like to me is that this approach is free or nearly so for well-formed documents, but doubles the parsing cost for forests. Which is likely to annoy anyone who's really depending on the capability. The difference is lost in the noise. Without fix: Time: 24.619 ms Time: 24.245 ms Time: 25.179 ms With fix: Time: 24.084 ms Time: 21.980 ms Time: 23.765 ms The test is done on 10,000 short fragments each parsed 10 times (or 20 times with the fix). I'll test again on some longer fragments since you don't seem convinced. Also, ! if (*VARDATA(xpath_expr_text) == '/') This is risking a core dump if the xpath expr is of zero length. You need something like if (xpath_len 0 *VARDATA(xpath_expr_text) == '/') OK. It would also be a good idea if the allocation of string and xpath_expr had a comment about why it's allocating extra space (something like see hacks below for use of this extra space would be sufficient). OK. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
On Fri, 2009-02-27 at 16:37 -0500, Andrew Dunstan wrote: Hannu Krosing wrote: On Thu, 2009-02-26 at 13:51 -0500, Robert Haas wrote: What I have proposed for 8.3 should not break a single case that currently behaves usefully. If anyone has a counter-example please show it. What I have proposed for 8.4 possibly would break current useful behaviour (FSVO useful), but should be done anyway on correctness grounds. I dunno, aren't XML document fragments sort of a pretty common case? I'd rather argue that xml datatype should not even accept anything but complete xml documents. Same as int field does not accept int[]. Or maybe we rather need separate xmldocument and xmlforest/xmlfragments types in next releases and leave the base xml as it is but deprecated due to inability to fix it without breaking backwards compatibility. Some of the functions, including some specified in the standard, produce fragments. That's why we have the 'IS DOCUMENT' test. But then you could use xmlfragments as the functions return type, no ? Does tha standard require that the same field type must store both documents and fragments ? You can also force validation as a document by saying SET XML OPTION DOCUMENT; cheers andrew -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
On Feb 26, 2009, at 7:03 PM, Andrew Dunstan wrote: can you point me at any call in libxml2 which will evaluate an xpath expression in the context of a nodeset instead of a document? No, I can't. node-sets are XPath objects not xmlNode objects, so I don't think it would be as simple as modifying: xml.c:xpath() { ... xpathctx-node = xmlDocGetRootElement(doc); with the result of xmlXPathNewNodeSet.. [snip other questions] My *guess* would be that if we were to use a node-set instead, we'd still have to prefix the XPath query. In this case, with a function call to an xpath extension function that creates the NodeSet from the content fragment(s?) of the document created by xml_parse(ie, more or less, a re-implementation of exsl:node-set() tailored for our use- case). Well, that or force the user to call it explicitly. Possible or not--wrt using a content fragment/document as the context node, I find this less desirable than the current mangling, so I'm becoming quite indifferent. :) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Hannu Krosing wrote: Some of the functions, including some specified in the standard, produce fragments. That's why we have the 'IS DOCUMENT' test. But then you could use xmlfragments as the functions return type, no ? Not in the case of xpath, no. There is a very complete standard for the Xpath data model, and we need to adhere to it. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: For fear of passing an ill formed fragment of xml to the processor, we strip the xml declaration if any and surround what's left with 'x and '/x' and prepend '/x' to the supposed xpath. This is just horrible. I seem to recall having complained about that at the time, but I didn't (and don't) know enough about xpath to do any better. Well, a few of us do. I guess I took my eye off the ball a bit back when we were putting this into 8.3. This whole thing is a mess, and I suspect the only fix for now is to undo all the mangling of both the xml and the xpath expression. I don't think we should change the behavior if it's just to arrive at another less-than-desirable behavior. Whacking semantics around afresh with each release does not endear us to users. If we know how to fix it right, great; but if we can't then we should keep compatibility with 8.3 until we can. Honestly, this is a bug, pure and simple. There really can't be an argument about that. For the stable branch, we could make the following changes that should result in a Pareto improvement (nothing gets worse while some things get better): * only do the xml transformation if the xml is known not to be be well formed * if we need to mangle the xpath expression due to doing the xml transformation, then unless the xpath expression begins with a '/', prepend it with '/x//'. (two slashes corresponds to the descendent axis in xpath - in effect it stands for any number of descendent elements). But that's just a holding operation. For 8.4 we should stop this nonsense and simply say that it is up to the programmer to ensure that the xml passed to the processor is well formed. The thing that is so very bad about this is that if the programmer *has* made sure that the inputs are correct, s/he can still end up with broken results. If we're going to try to fix bad inputs, we must make damn sure that we don't break on correct inputs as a result. However, I can't off hand think of a lightning way to fix bad inputs that doesn't carry some danger to good inputs. Until someone comes up with something tolerably bulletproof, I suggest that we simply say that it is the programmer's responsibility. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Andrew Dunstan wrote: For fear of passing an ill formed fragment of xml to the processor, we strip the xml declaration if any and surround what's left with 'x and '/x' and prepend '/x' to the supposed xpath. This is just horrible. It will break for every xpath expression that doesn't begin with a '/' and probably for many that do. This whole thing is a mess, and I suspect the only fix for now is to undo all the mangling of both the xml and the xpath expression. If the programmer passes an ill formed piece of xml to the processor that is their lookout, but I think we should ensure that we give back correct results on well formed input. It's not about ill-formed pieces, it is about (well-formed) content fragments that are not full documents (exactly one root element). libxml2 doesn't support xpath on content fragments. The tradeoff here is either supporting no xpath at all on content fragments or supporting some xpath on both kinds of xml data. Whoever made the initial implementation of this (Nikolay, Cc) decided for the latter, but I can't say I'm happy with it either. I'd be OK with changing it, but I haven't had time to get to it, unfortunately. See also http://wiki.postgresql.org/wiki/XPath_Todo#XPath. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Peter Eisentraut wrote: Andrew Dunstan wrote: For fear of passing an ill formed fragment of xml to the processor, we strip the xml declaration if any and surround what's left with 'x and '/x' and prepend '/x' to the supposed xpath. This is just horrible. It will break for every xpath expression that doesn't begin with a '/' and probably for many that do. This whole thing is a mess, and I suspect the only fix for now is to undo all the mangling of both the xml and the xpath expression. If the programmer passes an ill formed piece of xml to the processor that is their lookout, but I think we should ensure that we give back correct results on well formed input. It's not about ill-formed pieces, it is about (well-formed) content fragments that are not full documents (exactly one root element). libxml2 doesn't support xpath on content fragments. The tradeoff here is either supporting no xpath at all on content fragments or supporting some xpath on both kinds of xml data. Whoever made the initial implementation of this (Nikolay, Cc) decided for the latter, but I can't say I'm happy with it either. I'd be OK with changing it, but I haven't had time to get to it, unfortunately. See also http://wiki.postgresql.org/wiki/XPath_Todo#XPath. I don't think it is our responsibility to remedy the deficiencies of libxml2, especially if it involves breaking the processing of valid xpath expressions on well formed XML. If someone can come up with a correct scheme for handling fragments, I'm all ears, but otherwise I think a) we should rip this out of 8.4 and b) we should try to make 8.3 slightly less broken at least, along the lines of my earlier suggestion. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Andrew Dunstan and...@dunslane.net writes: I don't think it is our responsibility to remedy the deficiencies of libxml2, especially if it involves breaking the processing of valid xpath expressions on well formed XML. If someone can come up with a correct scheme for handling fragments, I'm all ears, but otherwise I think a) we should rip this out of 8.4 and b) we should try to make 8.3 slightly less broken at least, along the lines of my earlier suggestion. I'm okay with removing this for 8.4, but I think it's too late to change the behavior of 8.3. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: I don't think it is our responsibility to remedy the deficiencies of libxml2, especially if it involves breaking the processing of valid xpath expressions on well formed XML. If someone can come up with a correct scheme for handling fragments, I'm all ears, but otherwise I think a) we should rip this out of 8.4 and b) we should try to make 8.3 slightly less broken at least, along the lines of my earlier suggestion. I'm okay with removing this for 8.4, but I think it's too late to change the behavior of 8.3. It's never too late to fix a bug. The current behaviour is just plain nonsense ... if your xpath expression is 'foo/bar' it ends up searching for '/xfoo/bar' which can't possibly be right. Surely we can at least fix that. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Andrew Dunstan and...@dunslane.net writes: Tom Lane wrote: I'm okay with removing this for 8.4, but I think it's too late to change the behavior of 8.3. It's never too late to fix a bug. When the proposed fix involves breaking cases that used to behave usefully, you need a much stronger argument than that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: Tom Lane wrote: I'm okay with removing this for 8.4, but I think it's too late to change the behavior of 8.3. It's never too late to fix a bug. When the proposed fix involves breaking cases that used to behave usefully, you need a much stronger argument than that. What I have proposed for 8.3 should not break a single case that currently behaves usefully. If anyone has a counter-example please show it. What I have proposed for 8.4 possibly would break current useful behaviour (FSVO useful), but should be done anyway on correctness grounds. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
What I have proposed for 8.3 should not break a single case that currently behaves usefully. If anyone has a counter-example please show it. What I have proposed for 8.4 possibly would break current useful behaviour (FSVO useful), but should be done anyway on correctness grounds. I dunno, aren't XML document fragments sort of a pretty common case? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Robert Haas wrote: What I have proposed for 8.3 should not break a single case that currently behaves usefully. If anyone has a counter-example please show it. What I have proposed for 8.4 possibly would break current useful behaviour (FSVO useful), but should be done anyway on correctness grounds. I dunno, aren't XML document fragments sort of a pretty common case? They are. So what? How does that contradict either of the statements made above? Programmers using libxml2 are used to handling this problem. Why must postgres alone use a totally brain dead and utterly incorrect wrapping to solve a problem that everyone else leaves up to the programmer to handle? People in this thread are not concentrating on the fact that what we do now can break correct input. That makes it an unquestionable bug, IMNSHO. There seems to be agreement about what to do for 8.4, so we seem to be arguing about what to do for 8.3. Are you *really* arguing that prepending the xpath expression with '/x' in all cases should be allowed to continue? If you are I can only assume your use of xml/xpath is probably fairly light. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
On Thu, Feb 26, 2009 at 3:18 PM, Andrew Dunstan and...@dunslane.net wrote: I dunno, aren't XML document fragments sort of a pretty common case? They are. So what? How does that contradict either of the statements made above? Programmers using libxml2 are used to handling this problem. Why must postgres alone use a totally brain dead and utterly incorrect wrapping to solve a problem that everyone else leaves up to the programmer to handle? I can't think of any reason, especially not when you put it that way. :-) People in this thread are not concentrating on the fact that what we do now can break correct input. That makes it an unquestionable bug, IMNSHO. There seems to be agreement about what to do for 8.4, so we seem to be arguing about what to do for 8.3. Are you *really* arguing that prepending the xpath expression with '/x' in all cases should be allowed to continue? If you are I can only assume your use of xml/xpath is probably fairly light. Mine is very light indeed. But the change you're proposing seems like it could CONCEIVABLY break a working application that counts on PostgreSQL's particular flavor of misbehavior, and therefore it seems questionable to put that into a stable branch. The fact that the application perhaps should not have been written that way to begin with is neither here nor there. We don't want people to be afraid of upgrading to the latest point release when we fix, say, a backend crash or a data corruption bug. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
On Thu, 2009-02-26 at 15:37 -0500, Robert Haas wrote: Mine is very light indeed. But the change you're proposing seems like it could CONCEIVABLY break a working application that counts on PostgreSQL's particular flavor of misbehavior, That is what release notes are for. Joshua D. Drake -- PostgreSQL - XMPP: jdr...@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Robert Haas wrote: But the change you're proposing seems like it could CONCEIVABLY break a working application that counts on PostgreSQL's particular flavor of misbehavior, and therefore it seems questionable to put that into a stable branch. The fact that the application perhaps should not have been written that way to begin with is neither here nor there. We don't want people to be afraid of upgrading to the latest point release when we fix, say, a backend crash or a data corruption bug. Let me reiterate the changes that I propose, and again challenge you to provide a working counter-example if you believe it will break anything not currently broken, even cases involving fragments. First, I propose that we abandon this mangling, if, and only if, the xml is in fact a well formed XML document. Since the whole point of the mangling is to handle situations where the XML is not a well formed document, that seems fairly straight-forward. If this change were to upset any user, it must be because they are relying on undisputably incorrect results. Second, I propose that, in the remaining cases, where we do mangle the XML, if the xpath expression does not begin with a '/', instead of prepending it with '/x/, which can not possibly be correct under any circumstance, we prepend it with '/x//' which has some possibility of giving correct results. IOW, these proposals will expand the number of correct results from the code, without contributing any new incorrect cases. These changes are *very* conservative, as is usual when we fix things on stable branches. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
I wrote: Second, I propose that, in the remaining cases, where we do mangle the XML, if the xpath expression does not begin with a '/', instead of prepending it with '/x/, which can not possibly be correct under any circumstance, we prepend it with '/x//' which has some possibility of giving correct results. ^^^ '/x' cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Andrew Dunstan and...@dunslane.net writes: First, I propose that we abandon this mangling, if, and only if, the xml is in fact a well formed XML document. Since the whole point of the mangling is to handle situations where the XML is not a well formed document, that seems fairly straight-forward. If this change were to upset any user, it must be because they are relying on undisputably incorrect results. Second, I propose that, in the remaining cases, where we do mangle the XML, if the xpath expression does not begin with a '/', instead of prepending it with '/x/, which can not possibly be correct under any circumstance, we prepend it with '/x//' which has some possibility of giving correct results. Hmm, does this proposal require adding a test of well-formed-ness to a code path that doesn't currently have one? If so, is that likely to contribute any noticeable slowdown? I can't offhand see an objection to this other than possible performance impact. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
On Thu, Feb 26, 2009 at 3:54 PM, Andrew Dunstan and...@dunslane.net wrote: Robert Haas wrote: But the change you're proposing seems like it could CONCEIVABLY break a working application that counts on PostgreSQL's particular flavor of misbehavior, and therefore it seems questionable to put that into a stable branch. The fact that the application perhaps should not have been written that way to begin with is neither here nor there. We don't want people to be afraid of upgrading to the latest point release when we fix, say, a backend crash or a data corruption bug. Let me reiterate the changes that I propose, and again challenge you to provide a working counter-example if you believe it will break anything not currently broken, even cases involving fragments. First, I propose that we abandon this mangling, if, and only if, the xml is in fact a well formed XML document. Since the whole point of the mangling is to handle situations where the XML is not a well formed document, that seems fairly straight-forward. If this change were to upset any user, it must be because they are relying on undisputably incorrect results. Second, I propose that, in the remaining cases, where we do mangle the XML, if the xpath expression does not begin with a '/', instead of prepending it with '/x/, which can not possibly be correct under any circumstance, we prepend it with '/x//' which has some possibility of giving correct results. IOW, these proposals will expand the number of correct results from the code, without contributing any new incorrect cases. These changes are *very* conservative, as is usual when we fix things on stable branches. cheers andrew You are right. Nuff said, ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
On Feb 26, 2009, at 9:37 AM, Peter Eisentraut wrote: It's not about ill-formed pieces, it is about (well-formed) content fragments that are not full documents (exactly one root element). libxml2 doesn't support xpath on content fragments. exslt:node-set() to the rescue? Or is that/equivalent functionality not easily accessed at the C-level with libxml2? http://www.exslt.org/exsl/functions/node-set/index.html -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
James Pye wrote: On Feb 26, 2009, at 9:37 AM, Peter Eisentraut wrote: It's not about ill-formed pieces, it is about (well-formed) content fragments that are not full documents (exactly one root element). libxml2 doesn't support xpath on content fragments. exslt:node-set() to the rescue? Or is that/equivalent functionality not easily accessed at the C-level with libxml2? http://www.exslt.org/exsl/functions/node-set/index.html A node-set isn't a document. In any case, this functionality doesn't appear to be in libxml2, it's in libxslt according to the reference you provided. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
On Feb 26, 2009, at 5:41 PM, Andrew Dunstan wrote: http://www.exslt.org/exsl/functions/node-set/index.html A node-set isn't a document. yes.. :) I guess what I'm saying is that if: tinman=# SELECT XML'foo/bar/'; xml -- foo/bar/ (1 row) is considered to be valid per *SQL-XML*, then it should probably be treated as a node-set in the context of xpath, not mangled with x.../x.. Certainly, I would expect an implicit node-set() call long before wrapping the fragment in x.../x and prefixing my xpath query. However, I find the validity of the above, XML'foo/bar/', a bit disturbing to begin with. :P In any case, this functionality doesn't appear to be in libxml2, it's in libxslt according to the reference you provided. I think that's *just* referencing the list of xslt implementations that the extension function is known to be available in.. I doubt that means to imply that the function or equivalent functionality is not available in libxml2 itself. I'd wager that equivalent functionality could be implemented if it weren't directly/already available.. =) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
James Pye wrote: On Feb 26, 2009, at 5:41 PM, Andrew Dunstan wrote: http://www.exslt.org/exsl/functions/node-set/index.html A node-set isn't a document. yes.. :) I guess what I'm saying is that if: tinman=# SELECT XML'foo/bar/'; xml -- foo/bar/ (1 row) is considered to be valid per *SQL-XML*, then it should probably be treated as a node-set in the context of xpath, not mangled with x.../x.. Certainly, I would expect an implicit node-set() call long before wrapping the fragment in x.../x and prefixing my xpath query. However, I find the validity of the above, XML'foo/bar/', a bit disturbing to begin with. :P In any case, this functionality doesn't appear to be in libxml2, it's in libxslt according to the reference you provided. I think that's *just* referencing the list of xslt implementations that the extension function is known to be available in.. I doubt that means to imply that the function or equivalent functionality is not available in libxml2 itself. I'd wager that equivalent functionality could be implemented if it weren't directly/already available.. =) James, can you point me at any call in libxml2 which will evaluate an xpath expression in the context of a nodeset instead of a document? Quite apart from anything else, xpath requires there to be a (single) context node (see http://www.w3.org/TR/xpath20/#context ). For a doc, we set that node to the document node, but what would it be for a node-set or a fragment? If we can't get over that hurdle we're screwed in pursuing your line of thought. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Tom Lane wrote: Hmm, does this proposal require adding a test of well-formed-ness to a code path that doesn't currently have one? If so, is that likely to contribute any noticeable slowdown? I can't offhand see an objection to this other than possible performance impact. Yeah, testing the well-formedness might cost a bit. We could short-circuit the test by applying some comparatively fast heuristic tests. Or we could decide that we'll just fix the xpath prefix part for 8.3 and keep the wrapping. I don't want to spend a huge effort on fixing something I regard as fundamentally broken. I'll do some tests to see what the cost of extra xml parsing might be. cheers andrew We -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] xpath processing brain dead
Andrew Gierth was just pointing out to me how badly broken our XPath processing is. For fear of passing an ill formed fragment of xml to the processor, we strip the xml declaration if any and surround what's left with 'x and '/x' and prepend '/x' to the supposed xpath. This is just horrible. It will break for every xpath expression that doesn't begin with a '/' and probably for many that do. This whole thing is a mess, and I suspect the only fix for now is to undo all the mangling of both the xml and the xpath expression. If the programmer passes an ill formed piece of xml to the processor that is their lookout, but I think we should ensure that we give back correct results on well formed input. The only good piece of news is that the xpath procedures in contrib/xml2 don't apparently suffer these faults. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] xpath processing brain dead
Andrew Dunstan and...@dunslane.net writes: For fear of passing an ill formed fragment of xml to the processor, we strip the xml declaration if any and surround what's left with 'x and '/x' and prepend '/x' to the supposed xpath. This is just horrible. I seem to recall having complained about that at the time, but I didn't (and don't) know enough about xpath to do any better. This whole thing is a mess, and I suspect the only fix for now is to undo all the mangling of both the xml and the xpath expression. I don't think we should change the behavior if it's just to arrive at another less-than-desirable behavior. Whacking semantics around afresh with each release does not endear us to users. If we know how to fix it right, great; but if we can't then we should keep compatibility with 8.3 until we can. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers