On 02/28/2011 10:51 AM, Tom Lane wrote:
Andrew Dunstan<and...@dunslane.net>  writes:
xpath_table is severely broken by design IMNSHO. We need a new design,
but I'm reluctant to work on that until someone does LATERAL, because a
replacement would be much nicer to design with it than without it.
Well, maybe I'm missing something, but I don't really understand why
xpath_table's design is so unreasonable.  Also, what would a better
solution look like exactly?  (Feel free to assume LATERAL is available.)



What's unreasonable about it is that the supplied paths are independent of each other, and evaluated in the context of the entire XML document.

Let's take the given example in the docs, changed slightly to assume each piece of XML can have more than one article listing in it (i.e,. 'article' is not the root node of the document):

   SELECT * FROM
   xpath_table('article_id',
                'article_xml',
                'articles',
                '//article/author|//article/pages|//article/title',
                'date_entered>  ''2003-01-01'' ')
   AS t(article_id integer, author text, page_count integer, title text);

There is nothing that says that the author has to come from the same article as the title, nor is there any way of saying that they must. If an article node is missing author or pages or title, or has more than one where its siblings do not, they will line up wrongly.

An alternative would be to supply a single xpath expression that would specify the context nodes to be iterated over (in this case that would be '//article') and a set of xpath expressions to be evaluated in the context of those nodes (in this case 'article|pages|title' ort better yet, supply these as a text array). We'd produce exactly one row for each node found by the context expression, and take the first value found by each of the column expressions in that context (or we could error out if we found more than one, or supply an array if the result field is an array). So with LATERAL taking care of the rest, the function signature could be something like:

   xpath_table_new(
        doc xml,
        context_xpath text,
        column_xpath text[])
   returns setof record


Given this, you could not get a row with title and author from different article nodes in the source document like you can now.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to