On Wed, 29 Jun 2011 22:26:39 +0200, Florian Pflug wrote:
On Jun29, 2011, at 19:57 , Radosław Smogura wrote:
This is review of patch
https://commitfest.postgresql.org/action/patch_view?id=565
"Bugfix for XPATH() if expression returns a scalar value"

SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth)) FROM (SELECT (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES (XMLELEMENT(name "root", XMLATTRIBUTES('<n' AS xmlns, '<v' AS value),'<t'))) v(x)) as foo;

      xmlelement
-------------------------
<root sth="&amp;lt;n"/>

It's clearly visible that value from attribute is "&lt;n", not "<". Every parser will read this as "&lt;n" which is not-natural and will require form consumer/developer to de-escape this on his side - roughly speaking this will
be reported as serious bug.

There's a problem there, no doubt, but your proposed solution just moves the
problem around.

Here's your query, reformatted to be more legible

SELECT
  XMLELEMENT(name root,
             XMLATTRIBUTES(foo.namespace AS sth)
  )
FROM (
  SELECT
    (XPATH('namespace-uri(/*)', x))[1] AS namespace
  FROM (VALUES (XMLELEMENT(name "root",
                XMLATTRIBUTES('<n' AS xmlns,
                              '<v' AS value),'<t')
  )
) v(x)) as foo;

What happens is that the XPATH expression selects the xmlns
attribute with the value '<n'. To be well-formed xml, that value
must be escaped, so what is actually returned by the XPATH
call is '&lt;n'. The XMLATTRIBUTES() function, however, doesn't
distinguish between input of type XML and input of type TEXT,
so it figures it has to represent the *textual* value '&lt;n'
in xml, and produces '&amp;lt;n'.

Not escaping textual values returned by XPATH isn't a solution,
though. For example, assume someone inserts the value returned
by the XPATH() call in your query into a column of type XML.
If XPATH() returned '<n' literally instead of escaping it,
the column would contain non-well-formed XML in a column of type
XML. Not pretty, and pretty fatal during your next dump/reload
cycle.

Or, if you want another example, simply remove the XMLATTRIBUTES
call and use the value returned by XPATH as a child node directly.

SELECT
  XMLELEMENT(name root,
             foo.namespace
  )
FROM (
  SELECT
    (XPATH('namespace-uri(/*)', x))[1] AS namespace
  FROM (VALUES (XMLELEMENT(name "root",
                XMLATTRIBUTES('<n' AS xmlns,
                              '<v' AS value),'<t')
  )
) v(x)) as foo;

     xmlelement
--------------------
 <root>&lt;n</root>

Note that this correct! The <root> node contains a text node
representing the textual value '<n'. If XPATH() hadn't return
the value escaped, the query above would have produces

  <root><n</root>

which is obviously wrong. It works in this case because XMLELEMENT
is smart enough to distinguish between child nodes gives as TEXT
values (those are escaped) and child nodes provided as XML values
(those are inserted unchanged).

XMLATTRIBUTES, however, doesn't make that distinction, and simply
escaped all attributes values independent of their type. Now, the
reason it does that is probably that *not* all XML values are
well-formed attribute values without additional escaping. For example,
you cannot have literal '<' and '>' in attribute values. So if
XMLATTRIBUTES, like XMLELEMENT never escaped values which are
already of type XML, the following piece of code

  XMLELEMENT(name "a", XMLATTRIBUTES('<node/>'::xml as v))

would produce

  <a v="<node/>"/>

which, alas, is not well-formed :-(

The correct solution, I believe, is to teach XMLATTRIBUTES to
only escape those characters which are invalid in attribute
values but valid in XML (Probably only '<' and '>' but I'll
check). If there are no objects, I'll prepare a separate patch
for that. I don't want to include it in this patch because it's
really a distinct issue (even modifies a different function).

best regards,
Florian Pflug

You may manually enter invalid xml too, You don't need xpath for this. Much more
In PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 64-bit
I executed
SELECT XMLELEMENT(name "a", XMLATTRIBUTES('<node/>'::xml as v))
and it's looks like I got correct output "<a v="&lt;node/&gt;"/>"
when I looked with text editor into table files I saw same value.

I will check on last master if I can reproduce this.

But indeed, now PostgreSQL is not type safe against XML type. See SELECT XMLELEMENT(name "root", '<', (xpath('//text()', '<root>&lt;</root>'))[1])).

You found some problem with escaping, but solution is bad, I think problem lies with XML type, which may be used to hold strings and proper xml. For example above query can't distinguish if (xpath('//text()', '<root>&lt;</root>'))[1] is xml text, or xml element.

For me adding support for scalar is really good and needed, but escaping is not.

Regards,
Radosław Smogura


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to