Re: [HACKERS] remove contrib/xml2

2010-03-01 Thread Andrew Dunstan



Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:
  

Tom Lane wrote:


... The reason for that behavior is that xpath_table runs through
the XPATH_NODESET results generated by the various XPaths and dumps the
k'th one of each into the k'th output row generated for the current
input row.
  


  
ISTM the missing piece is really in our API. We need to be able to 
specify a nodeset to iterate over, and then for each node take the first 
value produced by each xpath expression. So the example above would look 
something like:



  

SELECT * FROM xpath_table('id', 't', 'xpath_test',
'/rowlist/row', '@a|@b', 'true') as t(id int4, a text, b text);



Hm.  It seems like that still leaves you open to the possibility of
out-of-sync results.  If you consider the current behavior as what
you'd get with an empty root nodeset spec, then restricting it to
produce only the first output row doesn't help at all -- it would still
associate 1 with oops.  In general if the nodeset spec doesn't
select a unique subnode then you're at risk of bogus answers.
Maybe that could be defined as user error but it sure seems like it
would be error-prone to use.

  


Well, I think that's going to be hard or impossible to avoid in the 
general case. My suggestion was intended to give the user a much better 
chance of avoiding it, however.


Arbitrary XML (or JSON or YAML or any artbitrarilly tree structured data 
markup) doesn't map well to a rectangular structure, and this is always 
likely to cause problems like this IMO.


I guess in the end the user could preprocess the XML with XSLT to remove 
the irregularities before passing to to xpath_table.


We certainly need to put  some warnings in the docs about it.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-02-28 Thread Tom Lane
I believe I have fixed all the reported crashes in contrib/xml2.
However there is still this issue pointed out by Robert:

 CREATE TABLE xpath_test (id integer NOT NULL, t xml);
 INSERT INTO xpath_test VALUES (1, 'rowlistrow a=1/row a=2 
 b=oops//rowlist');
 SELECT * FROM xpath_table('id', 't', 'xpath_test',
 '/rowlist/row/@a|/rowlist/row/@b', 'true') as t(id int4, a text, b text);

 which yields an answer that is, at least, extremely surprising, if not
 flat-out wrong:

  id | a |  b
 +---+--
   1 | 1 | oops
   1 | 2 |
 (2 rows)

the point being that it seems like oops should be associated with 2
not 1.  The reason for that behavior is that xpath_table runs through
the XPATH_NODESET results generated by the various XPaths and dumps the
k'th one of each into the k'th output row generated for the current
input row.  If there is any way to synchronize which node in each array
goes with each node in each other array, it's not apparent to me, but
I don't know libxml's API at all.  Perhaps there is some other call we
should be using to evaluate all the XPaths in parallel?

(The code is also unbelievably inefficient, recompiling each XPath
expression once per output row (!); but it doesn't seem worth fixing
that right away given that we might have to throw away the logic
entirely in order to fix this bug.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-02-28 Thread Andrew Dunstan



Tom Lane wrote:

I believe I have fixed all the reported crashes in contrib/xml2.
  


Yay! Well done! That at least removes any possibly urgency about 
removing the module.



However there is still this issue pointed out by Robert:

  

CREATE TABLE xpath_test (id integer NOT NULL, t xml);
INSERT INTO xpath_test VALUES (1, 'rowlistrow a=1/row a=2 
b=oops//rowlist');
SELECT * FROM xpath_table('id', 't', 'xpath_test',
'/rowlist/row/@a|/rowlist/row/@b', 'true') as t(id int4, a text, b text);



  

which yields an answer that is, at least, extremely surprising, if not
flat-out wrong:



  

 id | a |  b
+---+--
  1 | 1 | oops
  1 | 2 |
(2 rows)



the point being that it seems like oops should be associated with 2
not 1.  The reason for that behavior is that xpath_table runs through
the XPATH_NODESET results generated by the various XPaths and dumps the
k'th one of each into the k'th output row generated for the current
input row.  If there is any way to synchronize which node in each array
goes with each node in each other array, it's not apparent to me, but
I don't know libxml's API at all.  Perhaps there is some other call we
should be using to evaluate all the XPaths in parallel?

(The code is also unbelievably inefficient, recompiling each XPath
expression once per output row (!); but it doesn't seem worth fixing
that right away given that we might have to throw away the logic
entirely in order to fix this bug.)


  


Damn that's ugly.


ISTM the missing piece is really in our API. We need to be able to 
specify a nodeset to iterate over, and then for each node take the first 
value produced by each xpath expression. So the example above would look 
something like:


   SELECT * FROM xpath_table('id', 't', 'xpath_test',
   '/rowlist/row', '@a|@b', 'true') as t(id int4, a text, b text);


Maybe we could approximate that with the current API by factoring out 
the common root of the xpath expressions, but that's likely to be 
extremely fragile and error prone, and we've already got bad experience 
of trying to be too cute with xpath expressions.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-02-28 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Tom Lane wrote:
 ... The reason for that behavior is that xpath_table runs through
 the XPATH_NODESET results generated by the various XPaths and dumps the
 k'th one of each into the k'th output row generated for the current
 input row.

 Damn that's ugly.

Yup :-(

 ISTM the missing piece is really in our API. We need to be able to 
 specify a nodeset to iterate over, and then for each node take the first 
 value produced by each xpath expression. So the example above would look 
 something like:

 SELECT * FROM xpath_table('id', 't', 'xpath_test',
 '/rowlist/row', '@a|@b', 'true') as t(id int4, a text, b text);

Hm.  It seems like that still leaves you open to the possibility of
out-of-sync results.  If you consider the current behavior as what
you'd get with an empty root nodeset spec, then restricting it to
produce only the first output row doesn't help at all -- it would still
associate 1 with oops.  In general if the nodeset spec doesn't
select a unique subnode then you're at risk of bogus answers.
Maybe that could be defined as user error but it sure seems like it
would be error-prone to use.

 Maybe we could approximate that with the current API by factoring out 
 the common root of the xpath expressions, but that's likely to be 
 extremely fragile and error prone, and we've already got bad experience 
 of trying to be too cute with xpath expressions.

Agreed, we do not want to be doing textual manipulations of XPaths,
which is what burnt us before.  But does libxml2 offer any more
abstract path representation we could work on?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-02-18 Thread Alvaro Herrera
M Z escribió:
 Hi Alvaro,
 
 I followed your instruction but put the patch on 8.4.2 as I found it
 crashes. It looks like the server still crash in the same way. Can you and
 anyone give me some ideas how to fix this bug?

See src/backend/utils/adt/xml.c.  Note the comment at the top:

/*
 * Notes on memory management:
 *
 * Sometimes libxml allocates global structures in the hope that it can reuse
 * them later on.  This makes it impractical to change the xmlMemSetup
 * functions on-the-fly; that is likely to lead to trying to pfree() chunks
 * allocated with malloc() or vice versa.  Since libxml might be used by
 * loadable modules, eg libperl, our only safe choices are to change the
 * functions at postmaster/backend launch or not at all.  Since we'd rather
 * not activate libxml in sessions that might never use it, the latter choice
 * is the preferred one.  However, for debugging purposes it can be awfully
 * handy to constrain libxml's allocations to be done in a specific palloc
 * context, where they're easy to track.  Therefore there is code here that
 * can be enabled in debug builds to redirect libxml's allocations into a
 * special context LibxmlContext.  It's not recommended to turn this on in
 * a production build because of the possibility of bad interactions with
 * external modules.
 */
/* #define USE_LIBXMLCONTEXT */



Then if you look at xpath.c in contrib/xml2 you notice that it's doing
exactly the thing that the core module says it's unreliable: using
palloc and friends in xmlMemSetup.  So to fix the bug what's needed is
that the xmlMemSetup call in contrib is removed altogether, and all
memory is tracked and released by hand.  It's rather tedious, and it's
also difficult to plug all resulting memory leaks.  But AFAIUI doing
that would fix (some of?) the crashes.  Not sure if your crash is in
this category.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-02-18 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Then if you look at xpath.c in contrib/xml2 you notice that it's doing
 exactly the thing that the core module says it's unreliable: using
 palloc and friends in xmlMemSetup.  So to fix the bug what's needed is
 that the xmlMemSetup call in contrib is removed altogether, and all
 memory is tracked and released by hand.  It's rather tedious, and it's
 also difficult to plug all resulting memory leaks.  But AFAIUI doing
 that would fix (some of?) the crashes.  Not sure if your crash is in
 this category.

FWIW, the core xml code seems to have been pretty stable since we gave
up on trying to redirect libxml's memory allocations to palloc.
So what you basically need to do to xpath.c is something like this:
http://archives.postgresql.org/pgsql-committers/2009-05/msg00229.php

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-02-18 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 FWIW, the core xml code seems to have been pretty stable since we gave
 up on trying to redirect libxml's memory allocations to palloc.
 So what you basically need to do to xpath.c is something like this:
 http://archives.postgresql.org/pgsql-committers/2009-05/msg00229.php

Which translates to this git URL, for lazy browsers such as me:

  
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=f8059d7f8ae8bfac840fbda3c8efbc0f7c09b123
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-02-17 Thread M Z
Hi Alvaro,

I followed your instruction but put the patch on 8.4.2 as I found it
crashes. It looks like the server still crash in the same way. Can you and
anyone give me some ideas how to fix this bug?

==
conifer=# CREATE TABLE xpath_test (id integer NOT NULL, t xml);
CREATE TABLE
conifer=# INSERT INTO xpath_test VALUES (1, 'docint1/int/doc');
INSERT 0 1
conifer=# SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int',
'true') as t(id int4);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!
==

Best,
M Z



  CREATE TABLE xpath_test (id integer NOT NULL, t xml);
  INSERT INTO xpath_test VALUES (1, 'docint1/int/doc');
  SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int', 'true')
  as t(id int4);

  Hmm.  Well, all I know is that the first thing I tried crashed the
 server.



 This trivial patch lingering on my system fixes this crasher (this is
 for the 8.3 branch).  It makes the problem in alloc set ExprContext
 warning show up instead.

 There are still lotsa other holes, but hey, this is a start ...

 Index: contrib/xml2/xpath.c
 ===
 RCS file: /home/alvherre/Code/cvs/pgsql/contrib/xml2/xpath.c,v
 retrieving revision 1.16.2.1
 diff -c -p -r1.16.2.1 xpath.c
 *** contrib/xml2/xpath.c26 Mar 2008 01:19:11 -  1.16.2.1
 --- contrib/xml2/xpath.c27 Jan 2010 15:30:56 -
 *** xpath_table(PG_FUNCTION_ARGS)
 *** 793,798 
 --- 793,801 
   */
pgxml_parser_init();

 +   PG_TRY();
 +   {
 +
/* For each row i.e. document returned from SPI */
for (i = 0; i  proc; i++)
{
 *** xpath_table(PG_FUNCTION_ARGS)
 *** 929,934 
 --- 932,944 
if (xmldoc)
pfree(xmldoc);
}
 +   }
 +   PG_CATCH();
 +   {
 +   xmlCleanupParser();
 +   PG_RE_THROW();
 +   }
 +   PG_END_TRY();

xmlCleanupParser();
  /* Needed to flag completeness in 7.3.1. 7.4 defines it as a no-op. */

 --
 Alvaro Herrera
 http://www.CommandPrompt.com/
 The PostgreSQL Company - Command Prompt, Inc.

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] remove contrib/xml2

2010-02-06 Thread M Z
The thing is, why it doesn't crash on 8.3.8 but crash on 8.4.2? Any idea? A
patch was applied to 8.3 but not to 8.4.2?

Thanks,
M Z

On Fri, Feb 5, 2010 at 1:45 PM, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Feb 3, 2010 at 8:49 AM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Robert Haas escribió:
  On Mon, Feb 1, 2010 at 5:23 PM, Andrew Dunstan and...@dunslane.net
 wrote:
   Robert Haas wrote:
   (2) add a very, very large warning that this will crash if you do
   almost anything with it.
  
   I think that's an exaggeration. Certain people are known to be using
 it
   quite successfully.
 
  Hmm.  Well, all I know is that the first thing I tried crashed the
 server.
 
  CREATE TABLE xpath_test (id integer NOT NULL, t xml);
  INSERT INTO xpath_test VALUES (1, 'docint1/int/doc');
  SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int', 'true')
  as t(id int4);
 
  This trivial patch lingering on my system fixes this crasher (this is
  for the 8.3 branch).  It makes the problem in alloc set ExprContext
  warning show up instead.
 
  There are still lotsa other holes, but hey, this is a start ...

 Interestingly M Z found he couldn't reproduce this crash on 8.3.  Can
 you?  If so, +1 for applying this and backpatching it as far as make
 sense.

 ...Robert

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] remove contrib/xml2

2010-02-06 Thread Tom Lane
M Z jm80...@gmail.com writes:
 The thing is, why it doesn't crash on 8.3.8 but crash on 8.4.2? Any idea?

Pure luck.  Memory-clobber bugs like these are notoriously
nondeterministic.  Any minor, logically unrelated change could make them
visible or not visible, because the clobber happens to clobber data that
is or is not in active use at the moment.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-02-05 Thread Robert Haas
On Thu, Feb 4, 2010 at 10:51 PM, M Z jm80...@gmail.com wrote:
 I did some tests followed Robert's test cases on both postgresql 8.4.2-0ubu
 and 8.3.8-1, OS: Ubuntu Karmic.

 1) 1st test case, it doesn't crash on 8.3.8 but crash on 8.4.2;

Interesting.  So, that's a regression of some kind.

 2) 2nd test case, both 8.3.8 and 8.4.2 are fine, and no warning (different
 from Robert's test?);

I built with --enable-debug and --enable-cassert, which might be relevant.

 3) 3rd test case (and modified test case for 8.3.8), both 8.3.8 and 8.4.2
 are not correct, same with Robert's test (8.5 beta?);

As I think about that further, it might not be a bug - how is the
processor supposed to know what we expect to happen?  But then, I
don't really know how this is supposed to work.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-02-05 Thread Robert Haas
On Wed, Feb 3, 2010 at 8:49 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Robert Haas escribió:
 On Mon, Feb 1, 2010 at 5:23 PM, Andrew Dunstan and...@dunslane.net wrote:
  Robert Haas wrote:
  (2) add a very, very large warning that this will crash if you do
  almost anything with it.
 
  I think that's an exaggeration. Certain people are known to be using it
  quite successfully.

 Hmm.  Well, all I know is that the first thing I tried crashed the server.

 CREATE TABLE xpath_test (id integer NOT NULL, t xml);
 INSERT INTO xpath_test VALUES (1, 'docint1/int/doc');
 SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int', 'true')
 as t(id int4);

 This trivial patch lingering on my system fixes this crasher (this is
 for the 8.3 branch).  It makes the problem in alloc set ExprContext
 warning show up instead.

 There are still lotsa other holes, but hey, this is a start ...

Interestingly M Z found he couldn't reproduce this crash on 8.3.  Can
you?  If so, +1 for applying this and backpatching it as far as make
sense.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-02-04 Thread M Z
I did some tests followed Robert's test cases on both postgresql 8.4.2-0ubu
and 8.3.8-1, OS: Ubuntu Karmic.

1) 1st test case, it doesn't crash on 8.3.8 but crash on 8.4.2;
2) 2nd test case, both 8.3.8 and 8.4.2 are fine, and no warning (different
from Robert's test?);
3) 3rd test case (and modified test case for 8.3.8), both 8.3.8 and 8.4.2
are not correct, same with Robert's test (8.5 beta?);


*
1st test case:
==
8.3.8
==
conifer=# CREATE TABLE xpath_test (id integer NOT NULL, t xml);
CREATE TABLE
conifer=# INSERT INTO xpath_test VALUES (1, 'docint1/int/doc');
INSERT 0 1
conifer=# SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int',
'true') as t(id int4);
 id

  1
(1 row)

==
8.4.2
==
conifer=# CREATE TABLE xpath_test (id integer NOT NULL, t xml);
CREATE TABLE
conifer=# INSERT INTO xpath_test VALUES (1, 'docint1/int/doc');
INSERT 0 1
conifer=# SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int',
'true') as t(id int4);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!
*

*
2nd test case
==
8.3.8 and 8.4.2
==
conifer=# CREATE TABLE xpath_test (id integer NOT NULL, t text);
CREATE TABLE
conifer=# INSERT INTO xpath_test VALUES (1, 'docint1/int/doc');
INSERT 0 1
conifer=# SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int',
'true') as t(id int4);
 id

  1
(1 row)
*

*
3rd test case
==
8.3.8 and 8.4.2
==
conifer=# CREATE TABLE xpath_test (id integer NOT NULL, t text);
CREATE TABLE
conifer=# INSERT INTO xpath_test VALUES (1, 'rowlistrow a=1/row a=2
b=oops//rowlist');
INSERT 0 1
conifer=# SELECT * FROM xpath_table('id', 't', 'xpath_test',
'/rowlist/row/@a|/rowlist/row/@b', 'true') as t(id int4, a text, b text);
 id | a |  b
+---+--
  1 | 1 | oops
  1 | 2 |
(2 rows)


==
8.3.8 (modified 3rd test case, because 8.3.8 won't crash using xml)
==
conifer=# CREATE TABLE xpath_test (id integer NOT NULL, t xml);
CREATE TABLE
conifer=# INSERT INTO xpath_test VALUES (1, 'rowlistrow a=1/row a=2
b=oops//rowlist');
INSERT 0 1
conifer=# SELECT * FROM xpath_table('id', 't', 'xpath_test',
'/rowlist/row/@a|/rowlist/row/@b', 'true') as t(id int4, a text, b text);
 id | a |  b
+---+--
  1 | 1 | oops
  1 | 2 |
(2 rows)
*


For 1st test case, not sure if some paths applied to 8.3 haven't been
applied to 8.4, or other reasons cause the difference between 8.3.8 and
8.4.2.

Any ideas or comments?

Thank you,
M Z

On Mon, Feb 1, 2010 at 8:44 PM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Feb 1, 2010 at 5:23 PM, Andrew Dunstan and...@dunslane.net
 wrote:
  Robert Haas wrote:
  (2) add a very, very large warning that this will crash if you do
  almost anything with it.
 
  I think that's an exaggeration. Certain people are known to be using it
  quite successfully.

 Hmm.  Well, all I know is that the first thing I tried crashed the server.

 CREATE TABLE xpath_test (id integer NOT NULL, t xml);
 INSERT INTO xpath_test VALUES (1, 'docint1/int/doc');
 SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int', 'true')
 as t(id int4);

 It doesn't crash if you change the type of t from xml to text; instead
 you get a warning about some sort of memory allocation problem.

 DROP TABLE xpath_test;
 CREATE TABLE xpath_test (id integer NOT NULL, t text);
 INSERT INTO xpath_test VALUES (1, 'docint1/int/doc');
 SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int', 'true')
 as t(id int4);

 yields:

 WARNING:  problem in alloc set ExprContext: bogus aset link in block
 0x14645e0, chunk 0x14648b8

 And then there's this (see also bug #5285):

 DELETE FROM xpath_test;
 INSERT INTO xpath_test VALUES (1, 'rowlistrow a=1/row a=2
 b=oops//rowlist');
 SELECT * FROM xpath_table('id', 't', 'xpath_test',
 '/rowlist/row/@a|/rowlist/row/@b', 'true') as t(id int4, a text, b
 text);

 which yields an answer that is, at least, extremely surprising, if not
 flat-out wrong:

  id | a |  b
 +---+--
  1 | 1 | oops
  1 | 2 |
 (2 rows)

 Bugs #4953 and #5079 can also be reproduced in CVS HEAD.  Both crash the
 server.

 ...Robert

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] remove contrib/xml2

2010-02-03 Thread Alvaro Herrera
Robert Haas escribió:
 On Mon, Feb 1, 2010 at 5:23 PM, Andrew Dunstan and...@dunslane.net wrote:
  Robert Haas wrote:
  (2) add a very, very large warning that this will crash if you do
  almost anything with it.
 
  I think that's an exaggeration. Certain people are known to be using it
  quite successfully.
 
 Hmm.  Well, all I know is that the first thing I tried crashed the server.
 
 CREATE TABLE xpath_test (id integer NOT NULL, t xml);
 INSERT INTO xpath_test VALUES (1, 'docint1/int/doc');
 SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int', 'true')
 as t(id int4);

This trivial patch lingering on my system fixes this crasher (this is
for the 8.3 branch).  It makes the problem in alloc set ExprContext
warning show up instead.

There are still lotsa other holes, but hey, this is a start ...

Index: contrib/xml2/xpath.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/contrib/xml2/xpath.c,v
retrieving revision 1.16.2.1
diff -c -p -r1.16.2.1 xpath.c
*** contrib/xml2/xpath.c26 Mar 2008 01:19:11 -  1.16.2.1
--- contrib/xml2/xpath.c27 Jan 2010 15:30:56 -
*** xpath_table(PG_FUNCTION_ARGS)
*** 793,798 
--- 793,801 
   */
pgxml_parser_init();
  
+   PG_TRY();
+   {
+ 
/* For each row i.e. document returned from SPI */
for (i = 0; i  proc; i++)
{
*** xpath_table(PG_FUNCTION_ARGS)
*** 929,934 
--- 932,944 
if (xmldoc)
pfree(xmldoc);
}
+   }
+   PG_CATCH();
+   {
+   xmlCleanupParser();
+   PG_RE_THROW();
+   }
+   PG_END_TRY();
  
xmlCleanupParser();
  /* Needed to flag completeness in 7.3.1. 7.4 defines it as a no-op. */

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-02-01 Thread Robert Haas
On Thu, Jan 28, 2010 at 5:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andrew Dunstan and...@dunslane.net writes:
 Robert Haas wrote:
 I think we need to either (1) fix the bugs and update the
 documentation to remove the statement that this will be removed or (2)
 actually remove it.

 I agree it's a mess but I don't think just abandoning the functionality
 is a good idea.

 Yeah, we can't really remove it until we have a plausible substitute for
 the xpath_table functionality.  This is in the TODO list ...

My feeling is that if it's as flakey and unreliable as it currently
is, we shouldn't ship it.  Removing it from CVS doesn't mean you
can't use this any more; this is open source.  It just means people
will have to go and get an old copy out of CVS and presumably in so
doing they will be aware that we've removed it for a reason.  We have
a well-deserved reputation for quality and I would like to see us
preserve that.

Is anyone going to try to fix this for 9.0?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-02-01 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 My feeling is that if it's as flakey and unreliable as it currently
 is, we shouldn't ship it.  Removing it from CVS doesn't mean you
 can't use this any more; this is open source.  It just means people
 will have to go and get an old copy out of CVS and presumably in so
 doing they will be aware that we've removed it for a reason.  We have
 a well-deserved reputation for quality and I would like to see us
 preserve that.

[ shrug... ]  It is not any more flaky than it's been since it was put in.
The people who have been depending on it presumably have use-patterns
for which it doesn't fail, and we're not going to be doing them a
service by ripping out functionality for which we can't offer a
replacement.

As for the quality argument, contrib modules are not guaranteed
to be of the same standard as the core code; anyone who thinks they are
should disabuse themselves of the notion by reading some of that code.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-02-01 Thread Robert Haas
On Mon, Feb 1, 2010 at 1:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 My feeling is that if it's as flakey and unreliable as it currently
 is, we shouldn't ship it.  Removing it from CVS doesn't mean you
 can't use this any more; this is open source.  It just means people
 will have to go and get an old copy out of CVS and presumably in so
 doing they will be aware that we've removed it for a reason.  We have
 a well-deserved reputation for quality and I would like to see us
 preserve that.

 [ shrug... ]  It is not any more flaky than it's been since it was put in.
 The people who have been depending on it presumably have use-patterns
 for which it doesn't fail, and we're not going to be doing them a
 service by ripping out functionality for which we can't offer a
 replacement.

Well, then we'd at least better update the documentation to (1) remove
the statement that this will be removed in 8.4 (since we didn't), and
(2) add a very, very large warning that this will crash if you do
almost anything with it.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-02-01 Thread Andrew Dunstan



Robert Haas wrote:

(2) add a very, very large warning that this will crash if you do
almost anything with it.


  


I think that's an exaggeration. Certain people are known to be using it 
quite successfully.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-02-01 Thread Robert Haas
On Mon, Feb 1, 2010 at 5:23 PM, Andrew Dunstan and...@dunslane.net wrote:
 Robert Haas wrote:
 (2) add a very, very large warning that this will crash if you do
 almost anything with it.

 I think that's an exaggeration. Certain people are known to be using it
 quite successfully.

Hmm.  Well, all I know is that the first thing I tried crashed the server.

CREATE TABLE xpath_test (id integer NOT NULL, t xml);
INSERT INTO xpath_test VALUES (1, 'docint1/int/doc');
SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int', 'true')
as t(id int4);

It doesn't crash if you change the type of t from xml to text; instead
you get a warning about some sort of memory allocation problem.

DROP TABLE xpath_test;
CREATE TABLE xpath_test (id integer NOT NULL, t text);
INSERT INTO xpath_test VALUES (1, 'docint1/int/doc');
SELECT * FROM xpath_table('id', 't', 'xpath_test', '/doc/int', 'true')
as t(id int4);

yields:

WARNING:  problem in alloc set ExprContext: bogus aset link in block
0x14645e0, chunk 0x14648b8

And then there's this (see also bug #5285):

DELETE FROM xpath_test;
INSERT INTO xpath_test VALUES (1, 'rowlistrow a=1/row a=2
b=oops//rowlist');
SELECT * FROM xpath_table('id', 't', 'xpath_test',
'/rowlist/row/@a|/rowlist/row/@b', 'true') as t(id int4, a text, b
text);

which yields an answer that is, at least, extremely surprising, if not
flat-out wrong:

 id | a |  b
+---+--
  1 | 1 | oops
  1 | 2 |
(2 rows)

Bugs #4953 and #5079 can also be reproduced in CVS HEAD.  Both crash the server.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-01-28 Thread Andrew Dunstan




Robert Haas wrote:

There has been some more discussion lately of problems caused by contrib/xml2.

http://archives.postgresql.org/pgsql-bugs/2010-01/msg00251.php
http://archives.postgresql.org/pgsql-bugs/2010-01/msg00198.php

I think we need to either (1) fix the bugs and update the
documentation to remove the statement that this will be removed or (2)
actually remove it.  Nobody seems interested in #1, so PFA a patch to
do #2.  It also rips out all the libxslt stuff, which seems to exist
only for the purpose of supporting contrib/xml2.  
  


The problem is that there are people who use the XSLT and xpath_table 
stuff on text data and so don't run into these bugs.


I agree it's a mess but I don't think just abandoning the functionality 
is a good idea.


cheers

andrew





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-01-28 Thread Mike Rylander
On Thu, Jan 28, 2010 at 4:18 PM, Andrew Dunstan and...@dunslane.net wrote:

 Robert Haas wrote:

 There has been some more discussion lately of problems caused by
 contrib/xml2.

 http://archives.postgresql.org/pgsql-bugs/2010-01/msg00251.php
 http://archives.postgresql.org/pgsql-bugs/2010-01/msg00198.php

 I think we need to either (1) fix the bugs and update the
 documentation to remove the statement that this will be removed or (2)
 actually remove it.  Nobody seems interested in #1, so PFA a patch to
 do #2.  It also rips out all the libxslt stuff, which seems to exist
 only for the purpose of supporting contrib/xml2.

 The problem is that there are people who use the XSLT and xpath_table stuff
 on text data and so don't run into these bugs.

 I agree it's a mess but I don't think just abandoning the functionality is a
 good idea.

I'm one of those people.  :)

Expecting to see contrib/xml2 go away at some point, possibly without
replacements for xslt_process and xpath_table, I've been working on
some plpgsql and plperlu work-alikes targeted at TEXT columns, as the
xml2 versions do. I hope these (attached) will be of some help to
others.  Note, these are not the exact functions I use, they are
lightly edited to remove the use of wrappers I've created to paper
over the transition from xpath_nodeset() to core XPATH().

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com


xml2-replacements.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-01-28 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Robert Haas wrote:
 I think we need to either (1) fix the bugs and update the
 documentation to remove the statement that this will be removed or (2)
 actually remove it.

 I agree it's a mess but I don't think just abandoning the functionality 
 is a good idea.

Yeah, we can't really remove it until we have a plausible substitute for
the xpath_table functionality.  This is in the TODO list ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-01-28 Thread Josh Berkus

 Yeah, we can't really remove it until we have a plausible substitute for
 the xpath_table functionality.  This is in the TODO list ...

What about moving it to pgfoundry?

I'm really not keen on shipping known-broken stuff in /contrib.

--Josh Berkus

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-01-28 Thread Robert Haas
On Thu, Jan 28, 2010 at 5:54 PM, Josh Berkus j...@agliodbs.com wrote:
 Yeah, we can't really remove it until we have a plausible substitute for
 the xpath_table functionality.  This is in the TODO list ...

 What about moving it to pgfoundry?

 I'm really not keen on shipping known-broken stuff in /contrib.

Yeah, exactly.  Another option - if we know that it works OK with
inputs of certain types - might be to throw an error if we get an
input of a type we know it doesn't work with.  But I guess I haven't
followed this closely enough to know exactly what kinds of arguments
do/don't work.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] remove contrib/xml2

2010-01-28 Thread Alvaro Herrera
Robert Haas escribió:
 On Thu, Jan 28, 2010 at 5:54 PM, Josh Berkus j...@agliodbs.com wrote:
  Yeah, we can't really remove it until we have a plausible substitute for
  the xpath_table functionality.  This is in the TODO list ...
 
  What about moving it to pgfoundry?
 
  I'm really not keen on shipping known-broken stuff in /contrib.
 
 Yeah, exactly.  Another option - if we know that it works OK with
 inputs of certain types - might be to throw an error if we get an
 input of a type we know it doesn't work with.  But I guess I haven't
 followed this closely enough to know exactly what kinds of arguments
 do/don't work.

If it were easy to throw an error, it'd be better to avoid the crash.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers