Re: [HACKERS] patch: function xmltable

Craig Ringer Sun, 11 Sep 2016 21:29:54 -0700

> I'll take a closer read-through shortly.


Missing file. You omitted executor/tableexpr.h from the patch, so I
can't compile.

I've expanded and copy-edited the docs. Some of it is guesswork based
on the references you sent and a glance at the code. Please check my
changes carefully. I found a few surprises, like the fact that DEFAULT
isn't a normal literal, it's an xpath expression evaluated at the same
time as the rowexpression.

Updated patch attached as XMLTABLE-v3 includes the docs changes. Note
that it's missing tableexpr.h. For convenient review or to apply to
your working tree I also attach a diff of just my docs changes as
proposed-docs-changes.diff.


Docs:

- Can you send the sample data used to generate the example output?
I'd like to include at least a cut down part of it in the docs to make
it clear how the input correlates with output, and preferably put the
whole thing in an appendix.

- How does it decide what subset of the document to iterate over?
That's presumably rowexpr, which is xpath in postgresql? (I added this
to docs).

- xmlnamespaces clause in docs needs an example for a non-default namespace.

- What effect does xmlnamespaces clause have? Does supplying it allow
you to reference qualified names in xpath? What happens if you don't
specify it for a document that has namespaces or don't define all the
namespaces? What if you reference an undefined namespace in xpath?
What about if an undefined namespace isn't referenced by xpath, but is
inside a node selected by an xpath expression?

- What are the rules for converting the matched XML node into a value?
If the matched node is not a simple text node or lacks a text node as
its single child, what happens?

- What happens if the matched has multiple text node children? This
can happen if, for example, you have something like

    <matchedNode>
      some text <!-- comment splits up text node --> other text
    </matchedNode>

- Is there a way to get an attribute as a value? If so, an example
should show this because it's going to be a common need. Presumably
you want   node/@attrname ?

- What happens if COLUMNS is not specified at all? It looks like it
returns a single column result set with the matched entries as 'xml'
type, so added to docs, please verify.


- PASSING clause isn't really defined. You can specify one PASSING
entry as a literal/colref/expression, and it's the argument xml
document, right? The external docs you referred to say that PASSING
may have a BY VALUE keyword, alias its argument with AS, and may have
expressions, e.g.

    PASSING BY VALUE '<x/>' AS a, '<y/>' AS b

  Neither aliases nor multiple entries are supported by the code or
grammar. Should this be documented as a restriction? Do you know if
that's an extension by the other implementation or if it's SQL/XML
standard? (I've drafted a docs update to cover this in the updated
patch).


- What does BY REF mean? Should this just be mentioned with a "see
xmlexists(...)" since it seems to be compatibility noise? Is there a
corresponding BY VALUE or similar?


- The parser definitions re-use xmlexists_argument . I don't mind
that, but it's worth noting here in case others do.


- Why do the expression arguments take c_expr (anything allowed in
a_expr or b_expr), not b_expr (restricted expression) ?


- Column definitions are underdocumented. The grammar says they can be
NOT NULL, for example, but I don't see that in any of the references
you mailed me nor in the docs. What behaviour is expected for a NOT
NULL column? I've documented my best guess (not checked closely
against the code, please verify).

-




Test suggestions:

- Coverage of multiple text() node children of an element, where split
up by comment or similar

- Coverage of xpath that matches a node with child element nodes



More to come. Please review my docs changes in the mean time. I'm
spending a lot more time on this than I expected so I might have to
get onto other things for a while too.



-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

From 464e8f16386c0fbfb8b5c6fdee66402c95c6ae9f Mon Sep 17 00:00:00 2001
From: Craig Ringer <[email protected]>
Date: Mon, 12 Sep 2016 12:26:14 +0800
Subject: [PATCH] Proposed docs changes

---
 doc/src/sgml/func.sgml | 103 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 82 insertions(+), 21 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ca861d2..3ba492e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10099,7 +10099,7 @@ SELECT xmlroot(xmlparse(document '<?xml version="1.1"?><content>abc</content>'),
     </para>
    </sect3>
 
-   <sect3>
+   <sect3 id="functions-xmltable" xreflabel="XMLTABLE">
     <title><literal>xmltable</literal></title>
 
    <indexterm>
@@ -10111,7 +10111,9 @@ SELECT xmlroot(xmlparse(document '<?xml version="1.1"?><content>abc</content>'),
 </synopsis>
 
     <para>
-      The <function>xmltable</function> produces table based on passed XML value.
+      The <function>xmltable</function> produces a table based on the
+      passed XML value, an xpath filter to extract rows, and an
+      optional set of column definitions.
     </para>
 
     <para>
@@ -10140,11 +10142,11 @@ SELECT  xmltable.*
     </para>
 
    <para>
-     The optional <literal>xmlnamespaces</literal> clause allow to specify a list
-     of namespaces specified by <replaceable>namespace URI</replaceable> and
-     <replaceable>namespace name</replaceable> (alias>. The default namespace is
-     specified by <replaceable>namespace URI</replaceable> after keyword
-     <literal>DEFAULT</literal>.
+     The optional <literal>xmlnamespaces</> clause is a comma-separated list of
+     namespaces of the form <replaceable>namespace-URI</> <literal>AS</>
+     <replaceable>namespace-name</>. It specifies the XML namespaces used in
+     the document and their aliases. The default namespace is specified with
+     <literal>DEFAULT</> <replaceable>namespace-URI</>.
 
 <screen><![CDATA[
 SELECT *
@@ -10159,25 +10161,84 @@ SELECT *
    </para>
 
     <para>
+     The required <replaceable>rowexpr</> argument is an xpath
+     expression that is evaluated against the supplied XML document to
+     obtain an ordered sequence of XML nodes. This sequence is what
+     <function>xmltable</> transforms into output rows.
+    </para>
+
+    <note>
+     <para>
+      The SQL/XML standard specifies the the
+      <function>xmltable</function> construct should take an XQuery
+      expression as the first argument, but PostgreSQL currently only
+      supports XPath, which is a subset of XQuery.
+     </para>
+    </note>
+
+    <para>
+     The <literal>PASSING</> <replaceable>xml</> clause provides the
+     XML document to operate on.
      The <literal>BY REF</literal> clauses have no effect in
      PostgreSQL, but are allowed for SQL conformance and compatibility
-     with other implementations.  Per SQL standard, the
-     first <literal>BY REF</literal> is required, the second is
-     optional.  Also note that the SQL standard specifies
-     the <function>xmlexists</function> construct to take an XQuery
-     expression as first argument, but PostgreSQL currently only
-     supports XPath, which is a subset of XQuery.
+     with other implementations. Per the SQL/XML standard, the first
+     <literal>BY REF</literal> is required, the second is optional.
+     Passing <literal>BY VALUE</> is not supported. Multiple
+     comma-separated terms in the PASSING clause are not supported.
+     <literal>AS</> aliases are not supported.
     </para>
 
    <para>
-     The optional <literal>COLUMNS</literal> clause allow to specify a list
-     of colums of generated table. The column with special mark
-     <literal>FOR ORDINALITY</literal> ensures row numbers in this column.
-     When <literal>PATH</literal> is not defined, then the name of column is
-     used as implicit path. Only one column should be marked <literal>FOR ORDINALITY</literal>.
-     When path expression is empty, then possible <literal>DEFAULT</literal> value
-     is used.
-<screen><![CDATA[
+     The optional <literal>COLUMNS</literal> clause specifies the list
+     of colums in the generated table. If the whole
+     <literal>COLUMNS</> list is omitted, then the result set is a
+     single column where each row is a field of <literal>xml</> type
+     containing the data matched by the <replaceable>rowexpr</>.
+     Otherwise each entry describes a single column. See the syntax
+     summary above for the format. The column name and type are
+     required; the path and not-null specifier are optional. 
+    </para>
+
+    <note>
+     <para>
+      Column names and types are identifiers so unlike normal
+      function arguments they may not be specified as bind parameters,
+      column references, or expressions. The path and default are
+      ordinary values.
+     </para>
+    </note>
+
+    <para>
+     The <literal>PATH</> for a column is an xpath expression that is
+     evaluated for each row, relative to the result of the
+     <replaceable>rowexpr</>, to find the value of the column. If no
+     <literal>PATH</> is given then the column name is used as an
+     implicit path.
+    </para>
+
+    <para>
+     If the path expression does not match for a given row but a
+     <literal>DEFAULT</> xpath expression is specified, the resulting
+     default value is used. If no <literal>DEFAULT</> is given the
+     field will be <literal>NULL</>.  Note that unlike
+     <literal>PATH</>, the <literal>DEFAULT</> for a column is
+     evaluated only once, relative to the root of the
+     <replaceable>xml</> argument, when processing begins.
+    </para>
+
+    <para>
+     Columns may be marked <literal>NOT NULL</>. If the xpath expression
+     for a <literal>NOT NULL</> column does not match anything and there
+     is no <literal>DEFAULT</> or the <literal>DEFAULT</> also evaluated
+     to null then the function terminates with an ERROR.
+    </para>
+     
+    <para>
+     A column marked with the
+     <literal>FOR ORDINALITY</literal> keyword will be populated with
+     row numbers that match the order in which the the output rows appeared
+     in the original input XML document.  Only one column should be
+     marked <literal>FOR ORDINALITY</literal>. <screen><![CDATA[
 SELECT *
   FROM XMLTABLE(XMLNAMESPACES(DEFAULT 'http://x.y')
                 '/rows/row' PASSING '<rows xmlns="http://x.y";><row><a>10</a></row></rows>');
-- 
2.5.5

0001-XMLTABLE-v3.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] patch: function xmltable

Reply via email to