Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Tom Lane
Florian Pflug  writes:
> Patch attached.

Will check and apply this.

> I've pondered whether to add a check to configure which verifies that
> the headers match the libxml version exactly at compile time. In the end,
> I didn't do that, for two reasons. First, there isn't anything wrong with
> using older headers together with a newer libxml, so long as both versions
> are either <= 2.7.3 or > 2.7.3. And second, with such a check in place,
> the only way to exercise libxml's promised ABI compatibility is to upgrade
> the libxml 2package after compiling postgres. That, however, is unlikely
> to happen except on production servers, and so by adding the check we'd
> increase the chance of ABI-compatibility problems remaining undetected
> during development and testing.

I concur with *not* doing that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Florian Pflug
On Jul26, 2011, at 17:07 , Tom Lane wrote:
> Florian Pflug  writes:
>> What about the suggested upgrade of the elog(ERROR) in xml_errorHandler
>> to elog(FATAL)? Shall I do that too, or leave it for now?
> 
> No objection here --- I had considered writing it that way myself.
> I refrained for fear of making a bad situation even worse, but if
> other people think FATAL would be the safer way, fine.

Patch attached.

pg_xml_init() now checks the error context after setting it, and
ereport(ERROR)s if the check fails. I've made it so that the errhint()
which suggest that compiling against libxml <= 2.7.3 but running
against > 2.7.3 might be the reason is only emitted if we actually
compiled against an older version. This is meant to avoid confusion
should there ever be another reason for that check to fail.

I've also changed the elog(ERROR) to elog(FATAL) in xml_errorHandler().

I've pondered whether to add a check to configure which verifies that
the headers match the libxml version exactly at compile time. In the end,
I didn't do that, for two reasons. First, there isn't anything wrong with
using older headers together with a newer libxml, so long as both versions
are either <= 2.7.3 or > 2.7.3. And second, with such a check in place,
the only way to exercise libxml's promised ABI compatibility is to upgrade
the libxml 2package after compiling postgres. That, however, is unlikely
to happen except on production servers, and so by adding the check we'd
increase the chance of ABI-compatibility problems remaining undetected
during development and testing.

best regards,
Florian Pflug


pg_xml_errctxcheck.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Florian Pflug
On Jul26, 2011, at 18:04 , Noah Misch wrote:
> On Tue, Jul 26, 2011 at 02:09:13PM +0200, Florian Pflug wrote:
>> On Jul26, 2011, at 01:57 , Noah Misch wrote:
>>> We could rearrange things so the xml2-config -L
>>> flags (or lack thereof) take priority over a --with-libraries directory for
>>> the purpose of finding libxml2.
>> 
>> Hm, how would we do that just for the purpose of finding libxml? Do autotools
>> provide a way to specify per-library -L flags?
> 
> Autoconf (built-in macros, that is) and Automake do not get involved in that. 
>  I
> vaguely recall Libtool having support for it, but PostgreSQL does not use
> Automake or Libtool.  I also spoke too loosely: -L is never per-library, but 
> you
> can emulate that by completing the search externally and passing a full path 
> to
> the linker.

Yeah, that was what I though would be the only way too. I had the slight hope
that I had missed something, but unfortunately it seems I didn't :-(

> Honestly, the benefit is probably too light to justify going to this trouble.

Yep. It'd probably be weeks, if not months, until we made that work on all
supported platforms. And by the time it does, we'd probably have re-invented
a sizeable portion of libtool.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Noah Misch
On Tue, Jul 26, 2011 at 02:09:13PM +0200, Florian Pflug wrote:
> On Jul26, 2011, at 01:57 , Noah Misch wrote:
> > We could rearrange things so the xml2-config -L
> > flags (or lack thereof) take priority over a --with-libraries directory for
> > the purpose of finding libxml2.
> 
> Hm, how would we do that just for the purpose of finding libxml? Do autotools
> provide a way to specify per-library -L flags?

Autoconf (built-in macros, that is) and Automake do not get involved in that.  I
vaguely recall Libtool having support for it, but PostgreSQL does not use
Automake or Libtool.  I also spoke too loosely: -L is never per-library, but you
can emulate that by completing the search externally and passing a full path to
the linker.

Honestly, the benefit is probably too light to justify going to this trouble.

-- 
Noah Mischhttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Tom Lane
Tom Lane  writes:
> Robert Haas  writes:
>> On Tue, Jul 26, 2011 at 9:52 AM, Tom Lane  wrote:
>>> I don't think so.  It would just be another headache for packagers ---
>>> in Red Hat distros, at least, packages are explicitly forbidden from
>>> containing any rpath settings.

>> So what do they do about Perl and Python?

> Patch the source to not add rpath switches.

No, I take that back.  What the RH packages do is configure with
--disable-rpath, and so long as that applies to this too, I'd
have no objection.

What I was mis-remembering is that there's a patch that *puts back*
plperl's rpath for libperl, because for some reason that no one has ever
satisfactorily explained to me, perl has an exemption from the distro
policy that loadable libraries must be installed so that they're known
to ldconfig.  Which of course is exactly why rpath isn't (supposed to
be) needed.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Tom Lane
Florian Pflug  writes:
> What about the suggested upgrade of the elog(ERROR) in xml_errorHandler
> to elog(FATAL)? Shall I do that too, or leave it for now?

No objection here --- I had considered writing it that way myself.
I refrained for fear of making a bad situation even worse, but if
other people think FATAL would be the safer way, fine.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jul 26, 2011 at 9:52 AM, Tom Lane  wrote:
>> I don't think so.  It would just be another headache for packagers ---
>> in Red Hat distros, at least, packages are explicitly forbidden from
>> containing any rpath settings.

> So what do they do about Perl and Python?

Patch the source to not add rpath switches.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Florian Pflug
On Jul26, 2011, at 16:22 , Tom Lane wrote:
> Florian Pflug  writes:
>> On further reflection, instead of checking whether we can restore the error
>> handler in pg_xml_init_library(), we could simply upgrade the elog(WARNING)
>> in pg_xml_done() to ereport(ERROR), and include a hint that explains the
>> probably cause.
> 
>> The upside being that we only fail when we actually need to restore the
>> error handler. Since there's one caller (parse_xml_decl) who calls
>> pg_xml_init_library() but not pg_xml_init()/pg_xml_done(), the difference
>> isn't only academic. At least XML would output will continue to work
>> work after libxml is upgraded from < 2.7.4 to >= 2.7.4.
> 
> Good point.  But what about failing in pg_xml_init?  That is, after
> calling xmlSetStructuredErrorFunc, check that it set the variable we
> expected it to set.

Yeah, that's even better. Will do it that way.

What about the suggested upgrade of the elog(ERROR) in xml_errorHandler
to elog(FATAL)? Shall I do that too, or leave it for now?

> The purpose of the check in pg_xml_done is not to detect library issues,
> but to detect omitted save/restore pairs and similar coding mistakes.
> I don't want to upgrade it to an ERROR, and I don't want to confuse
> people by hinting that the problem is with libxml.

I can see the concern about possible confusion, and I agree that
pg_xml_init(), right after we set our error handler, is the most logical
place to test whether we can read it back.

I guess I don't really see the benefit of the check in pg_xml_done()
triggering a WARNING instead of an ERROR, but since we won't be hijacking
that check now anyway, I don't mind it being a WARNING either.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Tom Lane
Florian Pflug  writes:
> On further reflection, instead of checking whether we can restore the error
> handler in pg_xml_init_library(), we could simply upgrade the elog(WARNING)
> in pg_xml_done() to ereport(ERROR), and include a hint that explains the
> probably cause.

> The upside being that we only fail when we actually need to restore the
> error handler. Since there's one caller (parse_xml_decl) who calls
> pg_xml_init_library() but not pg_xml_init()/pg_xml_done(), the difference
> isn't only academic. At least XML would output will continue to work
> work after libxml is upgraded from < 2.7.4 to >= 2.7.4.

Good point.  But what about failing in pg_xml_init?  That is, after
calling xmlSetStructuredErrorFunc, check that it set the variable we
expected it to set.

The purpose of the check in pg_xml_done is not to detect library issues,
but to detect omitted save/restore pairs and similar coding mistakes.
I don't want to upgrade it to an ERROR, and I don't want to confuse
people by hinting that the problem is with libxml.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Florian Pflug
On Jul26, 2011, at 15:52 , Tom Lane wrote:
> Florian Pflug  writes:
>> The only fix I can come up with is to explicitly test whether or not we're
>> able to restore the structured error context in pg_xml_init_library().
> 
> Yeah, I think you are right.
> 
>> The downside of this is that a libxml version upgrade might break postgres,
> 
> Such an upgrade would break Postgres anyway --- better we are able to
> detect and report it clearly than that we fail in bizarre ways.

On further reflection, instead of checking whether we can restore the error
handler in pg_xml_init_library(), we could simply upgrade the elog(WARNING)
in pg_xml_done() to ereport(ERROR), and include a hint that explains the
probably cause.

The upside being that we only fail when we actually need to restore the
error handler. Since there's one caller (parse_xml_decl) who calls
pg_xml_init_library() but not pg_xml_init()/pg_xml_done(), the difference
isn't only academic. At least XML would output will continue to work
work after libxml is upgraded from < 2.7.4 to >= 2.7.4.

If nobody objects, I'll post a patch to that effect later today.

BTW, come to think of it, I believe the elog(ERROR) that you put into
xml_errorHandler should to be upgraded to elog(FATAL). Once we longjmp()
out of libxml, it doesn't seem safe to re-enter it, so making the backend
exit seems to be the only safe thing to do.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Robert Haas
On Tue, Jul 26, 2011 at 9:52 AM, Tom Lane  wrote:
>>> As a side note, we don't add an rpath for libxml2 like we do for Perl and
>>> Python.
>
>> Sounds like we should change that.
>
> I don't think so.  It would just be another headache for packagers ---
> in Red Hat distros, at least, packages are explicitly forbidden from
> containing any rpath settings.

So what do they do about Perl and Python?

Not sure I understand the virtue of being inconsistent here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Tom Lane
Florian Pflug  writes:
> What's more, xml.c actually does attempt to protect against this situation
> by calling LIBXML_TEST_VERSION in pg_xml_init_library().

> But that check doesn't fire in our case, because it explicitly allows the
> actual library version to be newer than the header file version, as long
> as the major versions agree (the major version being "2" in this case).

> So in essence, libxml promises ABI backwards-compatibility, but breaks
> that promise when it comes to restoring the structured error context.

Fun.

> The only fix I can come up with is to explicitly test whether or not we're
> able to restore the structured error context in pg_xml_init_library().

Yeah, I think you are right.

> The downside of this is that a libxml version upgrade might break postgres,

Such an upgrade would break Postgres anyway --- better we are able to
detect and report it clearly than that we fail in bizarre ways.

>> As a side note, we don't add an rpath for libxml2 like we do for Perl and
>> Python.

> Sounds like we should change that.

I don't think so.  It would just be another headache for packagers ---
in Red Hat distros, at least, packages are explicitly forbidden from
containing any rpath settings.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-26 Thread Florian Pflug
On Jul26, 2011, at 01:57 , Noah Misch wrote:
> On Mon, Jul 25, 2011 at 09:06:41PM +0200, Florian Pflug wrote:
>> On Jul25, 2011, at 20:37 , Bernd Helmle wrote:
>>> Ah, but i got now what's wrong here: configure is confusing both libxml2
>>> installations, and a quick look into config.log proves that: it uses the
>>> xml2-config from the OSX libs (my $PATH has /usr in front of the bindir of
>>> MacPorts, though i seem to recall to have changed this in the past).
> 
>> Hm, but I still think there's a bug lurking there. Using a different libxml2
>> version for the configure checks than for actual builds surely isn't good...
>> 
>> From looking at configure.in, it seems that we use xml2-config to figure out
>> the CFLAGS and LDFLAGS required to build and link against libxml. I guess we
>> somehow end up not using these flags when we later test for
>> xmlStructuredErrorContext, but do use them during the actual build. Or maybe
>> the order of the -I and -L flags just ends up being different in the two 
>> cases.
> 
> I can reproduce similar behavior on GNU/Linux.  If my setup was sufficiently
> similar, Bernd's problematic build would have used this sequence of directives
> during both configuration and build:
> 
>  -I/usr/include/libxml2  -I/opt/local/include   -L/opt/local/lib
> 
> The directories passed using --with-includes and --with-libraries took
> priority over those from xml2-config.  Since libxml2 headers live in a
> `libxml2' subdirectory, --with-includes=/opt/local/include did not affect
> finding them.  --with-libraries=/opt/local/lib *did* affect finding the
> library binaries, though.  Therefore, he built entirely against /usr headers
> and /opt/local libraries.

Right. Thanks for detailed analysis, Noah!

What's more, xml.c actually does attempt to protect against this situation
by calling LIBXML_TEST_VERSION in pg_xml_init_library().

But that check doesn't fire in our case, because it explicitly allows the
actual library version to be newer than the header file version, as long
as the major versions agree (the major version being "2" in this case).

So in essence, libxml promises ABI backwards-compatibility, but breaks
that promise when it comes to restoring the structured error context.

The only fix I can come up with is to explicitly test whether or not we're
able to restore the structured error context in pg_xml_init_library(). I'm
envisioning we'd so something like

xmlStructuredErrorFunc *saved_errfunc = xmlStructuredError;
#if HAVE_XMLSTRUCTUREDERRORCONTEXT
void *saved_errctx = xmlStructuredErrorContext;
#else
void *saved_errctx = xmlGenericErrorContext;
#endif

xmlSetStructuredErrorFunc((void *) MAGIC, xml_errorHandler);

#ifdef HAVE_XMLSTRUCTUREDERRORCONTEXT
if (xmlStructuredErrorContext != MAGIC)
#else 
if (xmlGenericErrorContext != MAGIC)
#endif
{
  ereport(ERROR,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("unable to restore the libxml error context"),
 errhint("Postgres was probably built against libxml < 2.7.4 but loaded 
a version >= 2.7.4 at runtime")))
}

xmlSetStructuredErrorFunc(saved_errctx, saved_errfunc);

The downside of this is that a libxml version upgrade might break postgres,
of course. But by the time postgres 9.1 is released, 2.7.4 will be nearly 3
years old, so I dunno how like it is that a distro would afterwards decide to
upgrade from a version earlier than that to a version later than that.

> We could rearrange things so the xml2-config -L
> flags (or lack thereof) take priority over a --with-libraries directory for
> the purpose of finding libxml2.

Hm, how would we do that just for the purpose of finding libxml? Do autotools
provide a way to specify per-library -L flags?

> As a side note, we don't add an rpath for libxml2 like we do for Perl and
> Python.  That doesn't matter on Darwin, but with GNU libc, it entails setting
> LD_LIBRARY_PATH or updating /etc/ld.so.conf to make the run time linker find
> the library binary used at build time.

Sounds like we should change that.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-25 Thread Noah Misch
On Mon, Jul 25, 2011 at 09:06:41PM +0200, Florian Pflug wrote:
> On Jul25, 2011, at 20:37 , Bernd Helmle wrote:
> > Ah, but i got now what's wrong here: configure is confusing both libxml2
> > installations, and a quick look into config.log proves that: it uses the
> > xml2-config from the OSX libs (my $PATH has /usr in front of the bindir of
> > MacPorts, though i seem to recall to have changed this in the past).

> Hm, but I still think there's a bug lurking there. Using a different libxml2
> version for the configure checks than for actual builds surely isn't good...
> 
> From looking at configure.in, it seems that we use xml2-config to figure out
> the CFLAGS and LDFLAGS required to build and link against libxml. I guess we
> somehow end up not using these flags when we later test for
> xmlStructuredErrorContext, but do use them during the actual build. Or maybe
> the order of the -I and -L flags just ends up being different in the two 
> cases.

I can reproduce similar behavior on GNU/Linux.  If my setup was sufficiently
similar, Bernd's problematic build would have used this sequence of directives
during both configuration and build:

  -I/usr/include/libxml2  -I/opt/local/include   -L/opt/local/lib

The directories passed using --with-includes and --with-libraries took
priority over those from xml2-config.  Since libxml2 headers live in a
`libxml2' subdirectory, --with-includes=/opt/local/include did not affect
finding them.  --with-libraries=/opt/local/lib *did* affect finding the
library binaries, though.  Therefore, he built entirely against /usr headers
and /opt/local libraries.  We could rearrange things so the xml2-config -L
flags (or lack thereof) take priority over a --with-libraries directory for
the purpose of finding libxml2.

As a side note, we don't add an rpath for libxml2 like we do for Perl and
Python.  That doesn't matter on Darwin, but with GNU libc, it entails setting
LD_LIBRARY_PATH or updating /etc/ld.so.conf to make the run time linker find
the library binary used at build time.

-- 
Noah Mischhttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-25 Thread Florian Pflug
On Jul25, 2011, at 20:37 , Bernd Helmle wrote:
> Ah, but i got now what's wrong here: configure is confusing both libxml2
> installations, and a quick look into config.log proves that: it uses the
> xml2-config from the OSX libs (my $PATH has /usr in front of the bindir of
> MacPorts, though i seem to recall to have changed this in the past).
> 
> So, all i need to do is
> 
> XML2_CONFIG=/opt/local/bin/xml2-config ./configure --with-libxml
> --with-includes=/opt/local/include/ --with-libraries=/opt/local/lib
> 
> and everything is smooth:
> 
> % grep HAVE_XMLSTRUCTUREDERRORCONTEXT src/include/pg_config.h
> #define HAVE_XMLSTRUCTUREDERRORCONTEXT 1
> 
> Regression tests passes now. This was too obvious...


Hm, but I still think there's a bug lurking there. Using a different libxml2
version for the configure checks than for actual builds surely isn't good...

>From looking at configure.in, it seems that we use xml2-config to figure out
the CFLAGS and LDFLAGS required to build and link against libxml. I guess we
somehow end up not using these flags when we later test for
xmlStructuredErrorContext, but do use them during the actual build. Or maybe
the order of the -I and -L flags just ends up being different in the two cases.

My skills in the black art that are autotools are severely lacking, so it's
quite likely that I somehow botched the incantations we use to test for
xmlStructuredErrorContext. I don't really know where to start looking for the
error, though. Ideas, anyone?

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-25 Thread Bernd Helmle



--On 25. Juli 2011 19:57:40 +0200 Florian Pflug  wrote:


I got a theory. We do distinguish between libxml2 versions for which
the structured and the generic error context handler share the error
context (older ones), and those with don't (newer ones). Our configure
scripts checks for the availability of xmlStructuredErrorContext, and
defined HAVE_XMLSTRUCTUREDERRORCONTEXT if it is. Now, if for some reason
that test fails on your machine, even though libxml *does* provide
xmlStructuredErrorContext, then the safety-check in the error handler
would check whether xmlGenericErrorContext is set as expected, when
it really should check xmlStructuredErrorContext.

Could you check if configure defines that macro? You should find
it in the pg_config.h generated by configure.


This is what pg_config.h says:

% grep HAVE_XMLSTRUCTUREDERRORCONTEXT src/include/pg_config.h
/* #undef HAVE_XMLSTRUCTUREDERRORCONTEXT */

Ah, but i got now what's wrong here: configure is confusing both libxml2 
installations, and a quick look into config.log proves that: it uses the 
xml2-config from the OSX libs (my $PATH has /usr in front of the bindir of 
MacPorts, though i seem to recall to have changed this in the past).


So, all i need to do is

XML2_CONFIG=/opt/local/bin/xml2-config ./configure --with-libxml 
--with-includes=/opt/local/include/ --with-libraries=/opt/local/lib


and everything is smooth:

% grep HAVE_XMLSTRUCTUREDERRORCONTEXT src/include/pg_config.h#define 
HAVE_XMLSTRUCTUREDERRORCONTEXT 1


Regression tests passes now. This was too obvious...

--
Thanks

Bernd

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-25 Thread Florian Pflug
On Jul25, 2011, at 19:37 , Bernd Helmle wrote:
> --On 25. Juli 2011 19:07:50 +0200 Florian Pflug  wrote:
>> Hm, I have libxml2 2.7.8, installed via Mac Ports, and I cannot reproduce
>> this. Maybe Mac Ports uses a modified libxml2, though. I'll check that.
>> 
>> Where did you obtain libxml2 from?
> 
> This is MacPorts, too:
> 
> % port installed libxml2
> The following ports are currently installed:
> libxml2 @2.7.8_0 (active)

'bout the same here:

$ port installed libxml2
The following ports are currently installed:
  libxml2 @2.7.8_0+universal (active)

> I've reduced my configure line to the least required options
> 
> ./configure --with-libxml --with-includes=/opt/local/include 
> --with-libraries=/opt/local/lib
> 
> but still get the WARNINGs in the regression.diffs.

I got a theory. We do distinguish between libxml2 versions for which
the structured and the generic error context handler share the error
context (older ones), and those with don't (newer ones). Our configure
scripts checks for the availability of xmlStructuredErrorContext, and
defined HAVE_XMLSTRUCTUREDERRORCONTEXT if it is. Now, if for some reason
that test fails on your machine, even though libxml *does* provide
xmlStructuredErrorContext, then the safety-check in the error handler
would check whether xmlGenericErrorContext is set as expected, when
it really should check xmlStructuredErrorContext.

Could you check if configure defines that macro? You should find
it in the pg_config.h generated by configure.

> Which settings do you use?

configure \
--prefix=/Users/fgp/Installs/pg.master.max.noassert.O1 \
--with-includes=/opt/local/include \
--with-libraries=/opt/local/lib \
--enable-debug \
--enable-depend \
--enable-thread-safety \
--with-pgport=54320 \
--without-tcl \
--with-perl \
--with-python \
--without-gssapi \
--without-krb5 \
--without-pam \
--without-ldap \
--without-bonjour \
--without-openssl \
--without-ossp-uuid \
--with-libxml \
--with-libxslt CFLAGS="-pipe -O1 -g"

I also checked with otool -L that it really uses the libxml from /opt.

$ otool -L 
.//src/test/regress/tmp_check/install/Users/fgp/Installs/pg.master.max.noassert.O1/bin/postgres
.//src/test/regress/tmp_check/install/Users/fgp/Installs/pg.master.max.noassert.O1/bin/postgres:
/opt/local/lib/libxml2.2.dylib (compatibility version 10.0.0, current 
version 10.8.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
version 125.2.11)

Despite the file name, that should be libxml 2.7.8. Here's the output of
xml2-config

$ /opt/local/bin/xml2-config --version
2.7.8

And there's no other libxml2 in /opt.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-25 Thread Bernd Helmle



--On 25. Juli 2011 19:07:50 +0200 Florian Pflug  wrote:


Hm, I have libxml2 2.7.8, installed via Mac Ports, and I cannot reproduce
this. Maybe Mac Ports uses a modified libxml2, though. I'll check that.

Where did you obtain libxml2 from?


This is MacPorts, too:

% port installed libxml2
The following ports are currently installed:
 libxml2 @2.7.8_0 (active)

I've reduced my configure line to the least required options

./configure --with-libxml --with-includes=/opt/local/include 
--with-libraries=/opt/local/lib


but still get the WARNINGs in the regression.diffs. Which settings do you use?

--
Thanks

Bernd

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-25 Thread Florian Pflug
On Jul25, 2011, at 18:53 , Bernd Helmle wrote:
> --On 20. Juli 2011 13:06:17 -0400 Tom Lane  wrote:
>> I've committed this patch with the discussed changes and some other
>> editorialization.  I have to leave for an appointment and can't write
>> anything now about the changes, but feel free to ask questions if you
>> have any.
> 
> Hmm, when building against libxml2 2.7.8 i get reproducible failing
> regression tests on OSX 10.6.7. It is griping with
> 
> WARNING:  libxml error handling state is out of sync with xml.c
> 
> all over the place.
> 
> A quick check with compiling against the libxml2 shipped with OSX
> (which seems libxml2 2.7.3) causes everything to work as expected, however.


Hm, I have libxml2 2.7.8, installed via Mac Ports, and I cannot reproduce
this. Maybe Mac Ports uses a modified libxml2, though. I'll check that.

Where did you obtain libxml2 from?

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-25 Thread Bernd Helmle



--On 20. Juli 2011 13:06:17 -0400 Tom Lane  wrote:


I've committed this patch with the discussed changes and some other
editorialization.  I have to leave for an appointment and can't write
anything now about the changes, but feel free to ask questions if you
have any.


Hmm, when building against libxml2 2.7.8 i get reproducible failing regression 
tests on OSX 10.6.7. It is griping with


WARNING:  libxml error handling state is out of sync with xml.c

all over the place.

A quick check with compiling against the libxml2 shipped with OSX (which seems 
libxml2 2.7.3) causes everything to work as expected, however.


--
Thanks

Bernd

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-20 Thread Tom Lane
Bruce Momjian  writes:
> Did this fix any of the XML TODOs?
>   http://wiki.postgresql.org/wiki/Todo#XML

No, not that I know of.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-20 Thread Bruce Momjian
Tom Lane wrote:
> I've committed this patch with the discussed changes and some other
> editorialization.  I have to leave for an appointment and can't write
> anything now about the changes, but feel free to ask questions if you
> have any.

Did this fix any of the XML TODOs?

http://wiki.postgresql.org/wiki/Todo#XML

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-20 Thread Tom Lane
I've committed this patch with the discussed changes and some other
editorialization.  I have to leave for an appointment and can't write
anything now about the changes, but feel free to ask questions if you
have any.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-20 Thread Florian Pflug
On Jul20, 2011, at 17:37 , Tom Lane wrote:
> Florian Pflug  writes:
>> I'm fine with having pg_xml_init() palloc the state and pg_xml_done()
>> pfree it, but I'm kinda curious about why you prefer that over making it
>> the callers responsibility and letting callers use a stack-allocated
>> struct if they wish to.
> 
> We could do it that way, but it would require exposing the struct
> definition to callers.  As I have it coded ATM, the struct is an
> opaque typedef in xml.h and only known within xml.c, which decouples
> contrib/xml2 from any changes in it.  Another point is that if we
> changed our minds and went over to a transaction cleanup hook,
> stack-allocated structs wouldn't work at all.  Lastly, even if we
> did stack-allocate the control struct, the message buffer has to be
> palloc'd so it can be expanded at need.

You've convinced me, thanks for the detailed explanation!

>> Fair enough. Are you going to do that, or do you want me to produce an
>> updated patch? I can do that, but probably not before the weekend.
> 
> No, I'm working on it, almost done already.

Cool, thanks!

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-20 Thread Tom Lane
Florian Pflug  writes:
> I'm fine with having pg_xml_init() palloc the state and pg_xml_done()
> pfree it, but I'm kinda curious about why you prefer that over making it
> the callers responsibility and letting callers use a stack-allocated
> struct if they wish to.

We could do it that way, but it would require exposing the struct
definition to callers.  As I have it coded ATM, the struct is an
opaque typedef in xml.h and only known within xml.c, which decouples
contrib/xml2 from any changes in it.  Another point is that if we
changed our minds and went over to a transaction cleanup hook,
stack-allocated structs wouldn't work at all.  Lastly, even if we
did stack-allocate the control struct, the message buffer has to be
palloc'd so it can be expanded at need.

> Fair enough. Are you going to do that, or do you want me to produce an
> updated patch? I can do that, but probably not before the weekend.

No, I'm working on it, almost done already.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-20 Thread Florian Pflug
On Jul20, 2011, at 17:08 , Tom Lane wrote:
> Florian Pflug  writes:
>> On Jul20, 2011, at 01:42 , Tom Lane wrote:
>>> 1. If you forget pg_xml_done in some code path, you'll find out from
>>> an Assert at the next pg_xml_init, which is probably far away from where
>>> the actual problem is.
> 
>> But won't me miss that error entirely if me make it re-entrant?
> 
> I'm of the opinion at this point that the most reliable solution is to
> have a coding convention that pg_xml_init and pg_xml_done MUST be
> called in the style
> 
>   pg_xml_init();
>   PG_TRY();
>   ...
>   PG_CATCH();
>   {
>   ...
>   pg_xml_done();
>   PG_RE_THROW();
>   }
>   PG_END_TRY();
>   pg_xml_done();
> 
> If we convert contrib/xml2 over to this style, we can get rid of some of
> the weirder aspects of the LEGACY mode, such as allowing xml_ereport to
> occur after pg_xml_done

Fine with me. 

> (which would be problematic for my proposal,
> since I want pg_xml_done to pfree the state including the message
> buffer).

I'm fine with having pg_xml_init() palloc the state and pg_xml_done()
pfree it, but I'm kinda curious about why you prefer that over making it
the callers responsibility and letting callers use a stack-allocated
struct if they wish to.

>> I was tempted to make all the functions in xml2/ use TRY/CATCH blocks
>> like the ones in core's xml.c do. The reasons I held back was I that
>> I felt these cleanups weren't part of the problem my patch was trying
>> to solve.
> 
> Fair point, but contorting the error handling to avoid changing xml2/ a
> bit more doesn't seem like a good decision to me.  It's not like we
> aren't forcing some change on that module already.

Fair enough. Are you going to do that, or do you want me to produce an
updated patch? I can do that, but probably not before the weekend.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-20 Thread Tom Lane
Florian Pflug  writes:
> On Jul20, 2011, at 01:42 , Tom Lane wrote:
>> 1. If you forget pg_xml_done in some code path, you'll find out from
>> an Assert at the next pg_xml_init, which is probably far away from where
>> the actual problem is.

> Very true. In fact, I did miss one pg_xml_done() call in the xml2
> contrib module initially, and it took me a while to locate the place
> it was missing.

> But won't me miss that error entirely if me make it re-entrant?

Yeah, I don't see any very easy way to detect a missed pg_xml_done call,
but having it be an Assert failure some time later isn't good from a
robustness standpoint.

I'm of the opinion at this point that the most reliable solution is to
have a coding convention that pg_xml_init and pg_xml_done MUST be
called in the style

pg_xml_init();
PG_TRY();
...
PG_CATCH();
{
...
pg_xml_done();
PG_RE_THROW();
}
PG_END_TRY();
pg_xml_done();

If we convert contrib/xml2 over to this style, we can get rid of some of
the weirder aspects of the LEGACY mode, such as allowing xml_ereport to
occur after pg_xml_done (which would be problematic for my proposal,
since I want pg_xml_done to pfree the state including the message
buffer).

> I was tempted to make all the functions in xml2/ use TRY/CATCH blocks
> like the ones in core's xml.c do. The reasons I held back was I that
> I felt these cleanups weren't part of the problem my patch was trying
> to solve.

Fair point, but contorting the error handling to avoid changing xml2/ a
bit more doesn't seem like a good decision to me.  It's not like we
aren't forcing some change on that module already.

> So we really ought to make pg_xml_done() complain if libxml's
> current error context isn't what it expects.

Right, got that coded already.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-20 Thread Tom Lane
I wrote:
> Alvaro Herrera  writes:
>> I was thinking that maybe it's time for this module to hook onto the
>> cleanup stuff for the xact error case; or at least have a check that it
>> has been properly cleaned up elesewhere.  Maybe this can be made to work
>> reentrantly if there's a global var holding the current context, and it
>> contains a link to the next one up the stack.  At least, my impression
>> is that the PG_TRY blocks are already messy.

> Yeah, that's another way we could go.  But I'm not sure how well it
> would interact with potential third-party modules setting up their own
> libxml error handlers.  Anybody have a thought about that?

I thought a bit more about this, and realized that there's a big
obstacle to getting rid of the PG_TRY blocks this way: most of them are
responsible for telling libxml to free some data structures, not just
restoring the error handler state.  We can't drop that aspect without
introducing session-lifespan memory leaks.

In principle we could expand the responsibilities of a
transaction-cleanup hook to include freeing data structures as well as
restoring error handlers, but the idea seems a lot messier and hence
less attractive than it did before.  I was ready to get rid of the
PG_TRY blocks until I came across this problem, but now I think I'll
stick with them.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-20 Thread Florian Pflug
On Jul20, 2011, at 01:42 , Tom Lane wrote:
> Florian Pflug  writes:
>> Updated patch attached. Do you think this is "Ready for Committer"?
> 
> I've been looking through this patch.  While it's mostly good, I'm
> pretty unhappy with the way that the pg_xml_init/pg_xml_done code is
> deliberately designed to be non-reentrant (ie, throw an Assert if
> pg_xml_init is called twice without pg_xml_done in between).
> There are at least two issues with that:

Hm, yeah, I did it that way because I didn't see an immediate need
to support nested pg_xml_init/done calls. But you're right - since
we're already nearly there, we might as well go all the way.

> 1. If you forget pg_xml_done in some code path, you'll find out from
> an Assert at the next pg_xml_init, which is probably far away from where
> the actual problem is.

Very true. In fact, I did miss one pg_xml_done() call in the xml2
contrib module initially, and it took me a while to locate the place
it was missing.

But won't me miss that error entirely if me make it re-entrant?

> 2. I don't think it's entirely unlikely that uses of libxml could be
> nested.
> 
> xpath_table in particular calls an awful lot of stuff between
> pg_xml_init and pg_xml_done, and is at the very least open to loss of
> control via an elog before it's called pg_xml_done.

True. But note that this function's error handling leaves something
to be desired anyway - I think it'll leak memory if it elog()s, for
example.

I was tempted to make all the functions in xml2/ use TRY/CATCH blocks
like the ones in core's xml.c do. The reasons I held back was I that
I felt these cleanups weren't part of the problem my patch was trying
to solve. At least according to our docs the xml2 contrib module is
also deprecated, which was another reason to make only the absolutely
necessary adjustments.

> I think this patch has already paid 90% of the notational price for
> supporting fully re-entrant use of libxml.  What I'm imagining is
> that we move all five of the static variables (xml_strictness,
> xml_err_occurred, xml_err_buf, xml_structuredErrorFunc_saved,
> xml_structuredErrorContext_saved) into a struct that's palloc'd
> by pg_xml_init and eventually passed to pg_xml_done.  It could be
> passed to xml_errorHandler via the currently-unused context argument.
> A nice side benefit is that we could get rid of PG_XML_STRICTNESS_NONE.

I'd marginally prefer for the caller, not pg_xml_init(), to be responsible
for allocating the struct. That gives the caller the option to simply
allocate it on the stack.

> Now the risk factor if we do that is that if someone misses a
> pg_xml_done call, we leave an error handler installed with a context
> argument that's probably pointing at garbage, and if someone then tries
> to use libxml without re-establishing their error handler, they've 
> got problems. But they'd have problems anyway with the current form of
> the patch.

Currently it'd actually work in non-assert-enabled builds, so long
as no third-party library depends on its libxml error handler being
restored by us (which the old code simply assume never to be the case).

> We could provide some defense against this by including a
> magic identifier value in the palloc'd struct and making
> xml_errorHandler check it before doing anything dangerous.  Also, we
> could make pg_xml_done warn if libxml's current context pointer is
> different from the struct passed to it, which would provide another
> means of detection that somebody had missed a cleanup call.

There's one additional hazard. Suppose you have a functions f() and
g() defined as

g() {
  PgXmlState* xml_state = pg_xml_init();
  ...
  /* Ups. No pg_xml_done() calll here */  
}

f() {
  PgXmlState* xml_state = pg_xml_init();

  g();

  xmlSomeLibXmlFunctionCall();
  XML_CHECK_AND_EREPORT(xml_state, ...);

  pg_xml_done();
}

Error reported by xmlSomeLibXmlFunctionCall() will now be
missed by the XML_CHECK_AND_EREPORT in f(), because they'll
modify g()'s xml_state, not f()'s.

So we really ought to make pg_xml_done() complain if libxml's
current error context isn't what it expects.

> Unless someone sees a major hole in this idea, or a better way to do it,
> I'm going to modify the patch along those lines and commit.

If pg_xml_done() and the error handler do the safety check you suggested,
it seems fine.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-19 Thread Tom Lane
[ resend due to mail server hiccup ]

Alvaro Herrera  writes:
> Excerpts from Tom Lane's message of mar jul 19 19:42:54 -0400 2011:
>> Now the risk factor if we do that is that if someone misses a
>> pg_xml_done call, we leave an error handler installed with a context
>> argument that's probably pointing at garbage, and if someone then tries
>> to use libxml without re-establishing their error handler, they've 
>> got problems.

> I don't see any holes in this idea (though I didn't look very hard), but
> I was thinking that maybe it's time for this module to hook onto the
> cleanup stuff for the xact error case; or at least have a check that it
> has been properly cleaned up elesewhere.  Maybe this can be made to work
> reentrantly if there's a global var holding the current context, and it
> contains a link to the next one up the stack.  At least, my impression
> is that the PG_TRY blocks are already messy.

Yeah, that's another way we could go.  But I'm not sure how well it
would interact with potential third-party modules setting up their own
libxml error handlers.  Anybody have a thought about that?

> BTW I'd like to know your opinion on the fact that this patch added
> two new StringInfo routines defined as static in xml.c.  It seems to me
> that if we're going to extend some module's API we should do it properly
> in its own files; otherwise we're bound to repeat the functionality
> elsewhere, and lose opportunities for cleaning up some other code that
> could presumably use similar functionality.

I did think about that for a little bit, but the functions in question
are only a couple lines long and seem rather specialized to what xml.c
needs.  I'd just as soon leave them as-is until we actually have a
second use-case to help with picking a generalized API.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-19 Thread Alvaro Herrera
Excerpts from Tom Lane's message of mar jul 19 19:42:54 -0400 2011:

> Now the risk factor if we do that is that if someone misses a
> pg_xml_done call, we leave an error handler installed with a context
> argument that's probably pointing at garbage, and if someone then tries
> to use libxml without re-establishing their error handler, they've 
> got problems.  But they'd have problems anyway with the current form of
> the patch.  We could provide some defense against this by including a
> magic identifier value in the palloc'd struct and making
> xml_errorHandler check it before doing anything dangerous.  Also, we
> could make pg_xml_done warn if libxml's current context pointer is
> different from the struct passed to it, which would provide another
> means of detection that somebody had missed a cleanup call.
> 
> Unless someone sees a major hole in this idea, or a better way to do it,
> I'm going to modify the patch along those lines and commit.

I don't see any holes in this idea (though I didn't look very hard), but
I was thinking that maybe it's time for this module to hook onto the
cleanup stuff for the xact error case; or at least have a check that it
has been properly cleaned up elesewhere.  Maybe this can be made to work
reentrantly if there's a global var holding the current context, and it
contains a link to the next one up the stack.  At least, my impression
is that the PG_TRY blocks are already messy.

BTW I'd like to know your opinion on the fact that this patch added
two new StringInfo routines defined as static in xml.c.  It seems to me
that if we're going to extend some module's API we should do it properly
in its own files; otherwise we're bound to repeat the functionality
elsewhere, and lose opportunities for cleaning up some other code that
could presumably use similar functionality.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-07-19 Thread Tom Lane
Florian Pflug  writes:
> Updated patch attached. Do you think this is "Ready for Committer"?

I've been looking through this patch.  While it's mostly good, I'm
pretty unhappy with the way that the pg_xml_init/pg_xml_done code is
deliberately designed to be non-reentrant (ie, throw an Assert if
pg_xml_init is called twice without pg_xml_done in between).
There are at least two issues with that:

1. If you forget pg_xml_done in some code path, you'll find out from
an Assert at the next pg_xml_init, which is probably far away from where
the actual problem is.

2. I don't think it's entirely unlikely that uses of libxml could be
nested.

xpath_table in particular calls an awful lot of stuff between
pg_xml_init and pg_xml_done, and is at the very least open to loss of
control via an elog before it's called pg_xml_done.

I think this patch has already paid 90% of the notational price for
supporting fully re-entrant use of libxml.  What I'm imagining is
that we move all five of the static variables (xml_strictness,
xml_err_occurred, xml_err_buf, xml_structuredErrorFunc_saved,
xml_structuredErrorContext_saved) into a struct that's palloc'd
by pg_xml_init and eventually passed to pg_xml_done.  It could be
passed to xml_errorHandler via the currently-unused context argument.
A nice side benefit is that we could get rid of PG_XML_STRICTNESS_NONE.

Now the risk factor if we do that is that if someone misses a
pg_xml_done call, we leave an error handler installed with a context
argument that's probably pointing at garbage, and if someone then tries
to use libxml without re-establishing their error handler, they've 
got problems.  But they'd have problems anyway with the current form of
the patch.  We could provide some defense against this by including a
magic identifier value in the palloc'd struct and making
xml_errorHandler check it before doing anything dangerous.  Also, we
could make pg_xml_done warn if libxml's current context pointer is
different from the struct passed to it, which would provide another
means of detection that somebody had missed a cleanup call.

Unless someone sees a major hole in this idea, or a better way to do it,
I'm going to modify the patch along those lines and commit.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-06-27 Thread Noah Misch
On Mon, Jun 27, 2011 at 12:45:02AM +0200, Florian Pflug wrote:
> Updated patch attached. Do you think this is "Ready for Committer"?

Thanks.  Yes; I have just marked it that way.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-06-26 Thread Florian Pflug
Hi

Updated patch attached. Do you think this is "Ready for Committer"?

best regards,
Florian Pflug


pg_xml_errorhandling.v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-06-24 Thread Noah Misch
On Fri, Jun 24, 2011 at 04:15:35PM +0200, Florian Pflug wrote:
> On Jun24, 2011, at 13:24 , Noah Misch wrote:
> > On Wed, Jun 22, 2011 at 12:47:42PM +0200, Florian Pflug wrote:

> > There's also the " parser error :" difference; would that also be easy to
> > re-add?  (I'm assuming it's not a constant but depends on the 
> > xmlErrorDomain.)
> 
> It's the string representation of the error domain and error level. 
> Unfortunately,
> the logic which builds that string representation is contained in the static
> function xmlReportError() deep within libxml, and that function doesn't seem
> to be called at all for errors reported via a structured error handler :-(
> 
> So re-adding that would mean duplicating that code within our xml.c, which
> seems undesirable...

Yes, that seems like sufficient trouble to not undertake merely for the sake of
a cosmetic error message match.

> >> ! typedef enum
> >> ! {
> >> !  PG_XML_STRICTNESS_NONE  /* No error handling */,
> >> !  PG_XML_STRICTNESS_LEGACY/* Ignore notices/warnings/errors unless
> >> !   * function 
> >> result indicates error condition
> >> !   */,
> >> !  PG_XML_STRICTNESS_WELLFORMED/* Ignore non-parser 
> >> notices/warnings/errors */,
> >> !  PG_XML_STRICTNESS_ALL   /* Report all notices/warnings/errors 
> >> */,
> >> ! } PgXmlStrictnessType;
> > 
> > We don't generally prefix backend names with Pg/PG.  Also, I think the word
> > "Type" in "PgXmlStrictnessType" is redundant.  How about just XmlStrictness?
> 
> I agree with removing the "Type" suffix. I'm  not so sure about the "Pg"
> prefix, though. I've added that after actually hitting a conflict between
> one of this enum's constants and some constant defined by libxml. Admittedly,
> that was *before* I renamed the type to "PgXmlStrictnessType", so removing
> the "Pg" prefix now would probably work. But it seems a bit close for
> comfort - libxml defines a ton of constants named XML_something.
> 
> Still, if you feel that the "Pg" prefix looks to weird and believe that it's
> better to wait until there's an actual clash before renaming things, I won't
> object further. Just wanted to bring the issue to your attention...

I do think it looks weird, but I agree that the potential for conflict is
notably higher than normal.  I'm fine with having the prefix.

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-06-24 Thread Florian Pflug
On Jun24, 2011, at 13:24 , Noah Misch wrote:
> On Wed, Jun 22, 2011 at 12:47:42PM +0200, Florian Pflug wrote:
>> Updated patch attached.
> 
> This builds and passes all tests on both test systems I used previously
> (CentOS 5 with libxml2-2.6.26-2.1.2.8.el5_5.1 and Ubuntu 8.04 LTS with libxml2
> 2.6.31.dfsg-2ubuntu1.6).  Tested behaviors look good, too.
> 
>> On Jun20, 2011, at 22:45 , Noah Misch wrote:
>>> On Mon, Jun 20, 2011 at 09:15:51PM +0200, Florian Pflug wrote:
 On Jun20, 2011, at 19:57 , Noah Misch wrote:
>>> Assuming it's sufficient to save/restore xmlStructuredErrorContext when it's
>>> available and xmlGenericErrorContext otherwise, I would just add an Autoconf
>>> check to identify which one to use.
>> 
>> It seems that before xmlStructuredErrorContext was introduced, the structured
>> and the generic error handler shared an error context, so doing what you
>> suggested looks sensible.
>> 
>> I don't have any autoconf-fu whatsoever, though, so I'm not 100% certain I 
>> did
>> this the right way. I basically copied a similar check (for setlongjump 
>> AFAIR)
>> and adapted it to check for xmlStructuredErrorContext instead.
> 
> Turned out fine.  Note that, more often than not, committers ask for patches
> not to contain the diff of the generated "configure" itself.  (I personally
> don't mind it when the diff portion is small and generated with the correct
> version of Autoconf, as in this patch.)

Ah, OK. I didn't know what project policy was there, so I figured I'd include
it since the configure script is tracked by git too. Now that I know, I'll 
remove
it from the final patch.

 I believe that the compatibility risk is pretty low here, and removing
 the meaningless prefix makes the error message are bit nicer to read.
 But if you are concerned that there maybe applications out there who
 parse the error text, we could of course add it back. I must say that
 I don't really know what our policy regarding error message stability is...
>>> 
>>> If the libxml2 API makes it a major pain to preserve exact messages, I
>>> wouldn't worry.
>> 
>> It'd only require us to prepend "Entity: " to the message string, so
>> there's no pain there. The question is rather whether we want to continue
>> having a pure noise word in front of every libxml error message for
>> the sake of compatibility.
> 
> There's also the " parser error :" difference; would that also be easy to
> re-add?  (I'm assuming it's not a constant but depends on the xmlErrorDomain.)

It's the string representation of the error domain and error level. 
Unfortunately,
the logic which builds that string representation is contained in the static
function xmlReportError() deep within libxml, and that function doesn't seem
to be called at all for errors reported via a structured error handler :-(

So re-adding that would mean duplicating that code within our xml.c, which
seems undesirable...

>> I vote "Nay", on the grounds that I estimate the actual breakage from
>> such a change to be approximately zero. Plus the fact that libxml
>> error messages aren't completely stable either. I had to suppress the
>> DETAIL output for one of the regression tests to make them work for
>> both 2.6.23 and 2.7.8; libxml chooses to quote an invalid namespace
>> URI in it's error message in one of these versions but not the other...
> 
> I'm not particularly worried about actual breakage; I'm just putting on the
> "one change per patch"-lawyer hat.  All other things being equal, I'd prefer
> one patch that changes the mechanism for marshalling errors while minimally
> changing the format of existing errors, then another patch that proposes new
> formatting.  (Granted, the formatting-only patch would probably founder.)  But
> it's a judgement call, and this change is in a grey area.  I'm fine with
> leaving it up to the committer.

I agree with that principle, in principle. In the light of my paragraph above
about how we'd need to duplicate libxml code to keep the error messages the 
same,
however, I'll leave it up to the committer as you suggested.

> A few minor code comments:
> 
>> *** a/src/backend/utils/adt/xml.c
>> --- b/src/backend/utils/adt/xml.c
> 
>> + static bool xml_error_initialized = false;
> 
> Since xml_error_initialized is only used for asserts and now redundant with
> asserts about xml_strictness == PG_XML_STRICTNESS_NONE, consider removing it.

You're absolutely right. Will remove.

>> ! /*
>> !  * pg_xml_done --- restore libxml state after pg_xml_init().
>> !  *
>> !  * This must be called if pg_xml_init() was called with a
>> !  * purpose other than PG_XML_STRICTNESS_NONE. Resets libxml's global state
>> !  * (i.e. the structured error handler) to the original state.
> 
> The first sentence is obsolete.

Right again. Will remove.

>> *** a/src/include/utils/xml.h
>> --- b/src/include/utils/xml.h
>> *** typedef enum
>> *** 68,74 
>>  XML_STANDALONE_OMITTED
>>  }   XmlStandaloneType;
>> 
>> ! exter

Re: [HACKERS] Another issue with invalid XML values

2011-06-24 Thread Noah Misch
Hi Florian,

On Wed, Jun 22, 2011 at 12:47:42PM +0200, Florian Pflug wrote:
> Updated patch attached.

This builds and passes all tests on both test systems I used previously
(CentOS 5 with libxml2-2.6.26-2.1.2.8.el5_5.1 and Ubuntu 8.04 LTS with libxml2
2.6.31.dfsg-2ubuntu1.6).  Tested behaviors look good, too.

> On Jun20, 2011, at 22:45 , Noah Misch wrote:
> > On Mon, Jun 20, 2011 at 09:15:51PM +0200, Florian Pflug wrote:
> >> On Jun20, 2011, at 19:57 , Noah Misch wrote:
> > Assuming it's sufficient to save/restore xmlStructuredErrorContext when it's
> > available and xmlGenericErrorContext otherwise, I would just add an Autoconf
> > check to identify which one to use.
> 
> It seems that before xmlStructuredErrorContext was introduced, the structured
> and the generic error handler shared an error context, so doing what you
> suggested looks sensible.
> 
> I don't have any autoconf-fu whatsoever, though, so I'm not 100% certain I did
> this the right way. I basically copied a similar check (for setlongjump AFAIR)
> and adapted it to check for xmlStructuredErrorContext instead.

Turned out fine.  Note that, more often than not, committers ask for patches
not to contain the diff of the generated "configure" itself.  (I personally
don't mind it when the diff portion is small and generated with the correct
version of Autoconf, as in this patch.)

>  !XML_PURPOSE_LEGACY /* Save error message only, don't set error 
>  flag */,
> >>> 
> >>> It's fine to set the flag for legacy users, considering they could just 
> >>> not
> >>> read it.  The important distinction seems to be that we don't emit 
> >>> warnings or
> >>> notices in this case.  Is that correct?  If so, the comment should reflect
> >>> that emphasis.  Then, consider updating the flag unconditionally.
> >> 
> >> I took me a while to remember while I did it that way, but I think I have 
> >> now.
> >> 
> >> I initialled wanted to add an Assert(!xml_error_occurred) to catch any
> >> missing XML_REPORT_ERROR() calls. I seems to have forgotten to do that,
> >> though...
> >> 
> >> So I guess I should either refrain from setting the flag as you suggested,
> >> or add such an Assert(). I think I very slightly prefer the latter, what
> >> do you think?
> > 
> > I do like the idea of that assert.  How about setting the flag anyway, but
> > making it "Assert(xml_purpose == XML_PURPOSE_LEGACY || 
> > !xml_error_occurred)"?
> 
> I added the Assert, but didn't make the setting of the error flag 
> unconditional.
> 
> If I did that, XML_CHECK_AND_EREPORT() would stop working for the LEGACY
> case. Now, that case currently isn't exercised, but breaking nevertheless
> seemed wrong to me.

Okay.

>  *** a/src/test/regress/expected/xml.out
>  --- b/src/test/regress/expected/xml.out
>  *** INSERT INTO xmltest VALUES (3, '  *** 8,14 
>  ERROR:  invalid XML content
>  LINE 1: INSERT INTO xmltest VALUES (3, ' ^
>  ! DETAIL:  Entity: line 1: parser error : Couldn't find end of Start Tag 
>  wrong line 1
>      ^
>  SELECT * FROM xmltest;
>  --- 8,14 
>  ERROR:  invalid XML content
>  LINE 1: INSERT INTO xmltest VALUES (3, ' ^
>  ! DETAIL:  line 1: Couldn't find end of Start Tag wrong line 1
> >>> 
> >>> Any reason to change the error text this way?
> >> 
> >> The "Entity:" prefix is added by libxml only for messages reported
> >> to the generic error handler. It *doesn't* refer to entities as defined
> >> in xml, but instead used in place of the file  name if libxml
> >> doesn't have that at hand (because it's parsing from memory).
> >> 
> >> So that "Entity:" prefix really doesn't have any meaning whatsoever.
> > 
> > So xmlParserPrintFileContext() sends different content to the generic error
> > handler from what "natural" calls to the handler would send?
> 
> xmlParserPrintFileContext() only generates the "context" part of the error
> message. In the example above, these are the two lines
>^
> These lines don't contain the "Entity:" prefix - neither with the patch
> attached nor without.

Ah, my mistake.

> >> I believe that the compatibility risk is pretty low here, and removing
> >> the meaningless prefix makes the error message are bit nicer to read.
> >> But if you are concerned that there maybe applications out there who
> >> parse the error text, we could of course add it back. I must say that
> >> I don't really know what our policy regarding error message stability is...
> > 
> > If the libxml2 API makes it a major pain to preserve exact messages, I
> > wouldn't worry.
> 
> It'd only require us to prepend "Entity: " to the message string, so
> there's no pain there. The question is rather whether we want to continue
> having a pure noise word in front of every libxml error message for
> the sake of compatibility.

There's also the " parser error :" difference; woul

Re: [HACKERS] Another issue with invalid XML values

2011-06-20 Thread Noah Misch
On Mon, Jun 20, 2011 at 09:15:51PM +0200, Florian Pflug wrote:
> On Jun20, 2011, at 19:57 , Noah Misch wrote:
> > I tested this patch using two systems, a CentOS 5 box with
> > libxml2-2.6.26-2.1.2.8.el5_5.1 and an Ubuntu 8.04 box with libxml2
> > 2.6.31.dfsg-2ubuntu1.6.  Both failed to build with this error:
> > 
> > xml.c: In function `pg_xml_init':
> > xml.c:934: error: `xmlStructuredErrorContext' undeclared (first use in this 
> > function)
> > xml.c:934: error: (Each undeclared identifier is reported only once
> > xml.c:934: error: for each function it appears in.)
> > make[4]: *** [xml.o] Error 1
> > 
> > It seems `xmlStructuredErrorContext' was added rather recently.  It's not
> > obvious which version added it, but 2.7.8 has it and 2.7.2 does not.  
> > Looking
> > at 2.6.26's sources, that version seems to use `xmlGenericErrorContext' for
> > both structured and generic handler functions.  Based on that, I tried with 
> > a
> > change from `xmlStructuredErrorContext' to `xmlGenericErrorContext' in our
> > xml.c.  I ran the test suite and hit some failures in the xml test; see
> > attached regression.diffs.  I received warnings where the tests expected
> > nothing or expected errors. 
> 
> Great :-(. I wonder if maybe I should simply remove the part of the patch
> which try to restore the error handler and context in pg_xml_done(). I've
> added that because I feared that some 3rd-party libraries setup their libxml
> error handle just once, not before every library class. The code previously
> didn't care about this either, but since we're using structured instead of
> generic error reporting now, we'd now collide with 3-rd party libs which
> also use structured error handling, whereas previously that would have worked.
> OTOH, once there's more than one such library...
> 
> Whats your stand on this?

Assuming it's sufficient to save/restore xmlStructuredErrorContext when it's
available and xmlGenericErrorContext otherwise, I would just add an Autoconf
check to identify which one to use.

> > I couldn't reconcile this description with the code.  I do see that
> > xpath_internal() uses XML_PURPOSE_OTHER, but so does xmlelement().  Why is
> > xmlelement() also ready for the strictest error reporting?
> 
> xmlelement() doesn't use libxml's parser, only the xml writer. I didn't
> want to suppress any errors the writer might raise (I'm not sure it raises
> error at all, though).

Makes sense.

> > Also note that we may be able to be more strict when xmloption = document.
> 
> Yeah, probably. I think that better done in a separate patch, though - I've
> tried not to change existing behaviour with this patch except where absolutely
> necessary (i.e. for XPATH)

Likewise.

> > Ideally, an invalid namespace URI should always be an error.  While 
> > namespace
> > prefixes could become valid as you assemble a larger document, a namespace
> > URI's validity is context-free.
> 
> I agree. That, however, would require xmlelement() to check all xmlns*
> attributes, and I didn't find a way to do that with libxml() (Well,
> other than to parse the element after creating it, but that's a huge
> kludge). So I left that issue for a later time...

Likewise.

> > Remembering to put a pg_xml_done() in every relevant elog(ERROR, ...) path
> > seems fragile.  How about using RegisterXactCallback/RegisterSubXactCallback
> > in pgxml_parser_init() to handle those cases?  You can also use it to Assert
> > at non-abort transaction shutdown that pg_xml_done() was called as needed.
> 
> Hm, isn't that a bit fragile too? It seems that PG_TRY/PG_CATCH blocks don't
> necessarily create sub-transactions, do they?

PG_TRY never creates a subtransaction.  However, elog(ERROR, ...) aborts
either the subtransaction, or if none, the top-level transaction.  Therefore,
registering the same callback with both RegisterXactCallback and
RegisterSubXactCallback ensures that it gets called for any elog/ereport
non-local exit.  Then you only explicitly call pg_xml_done() in the non-error
path.

However, I see now that in the core xml.c, every error-condition pg_xml_done()
appears in a PG_CATCH block.  That's plenty reliable ...

> Also, most elog() paths already contained xmlFree() calls, because libxml
> seemingly cannot be made to use context-based memory management. So one
> already needs to be pretty careful when touching these...

... and this is a good reason to keep doing it that way.  So, scratch that idea.

> > Consider renaming XML_REPORT_ERROR it to something like XML_CHECK_ERROR,
> > emphasizing that it wraps a conditional.
> 
> Hm, I think it was named XML_CHECK_ERROR at one point, and I changed it
> because I felt it didn't convey the fact that it's actually ereport()
> and not just return false on error. But I agree that XML_REPORT_ERROR isn't
> very self-explanatory, either. What about XML_REPORT_PENDING_ERROR()
> or XML_CHECK_AND_REPORT_ERROR()?

XML_CHECK_AND_REPORT_ERROR() seems fine.  Or XML_CHECK_AND_EREPORT()?

> > On Thu, J

Re: [HACKERS] Another issue with invalid XML values

2011-06-20 Thread Florian Pflug
Hi

Thanks for the extensive review, it's very much appreciated!

On Jun20, 2011, at 19:57 , Noah Misch wrote:
> I tested this patch using two systems, a CentOS 5 box with
> libxml2-2.6.26-2.1.2.8.el5_5.1 and an Ubuntu 8.04 box with libxml2
> 2.6.31.dfsg-2ubuntu1.6.  Both failed to build with this error:
> 
> xml.c: In function `pg_xml_init':
> xml.c:934: error: `xmlStructuredErrorContext' undeclared (first use in this 
> function)
> xml.c:934: error: (Each undeclared identifier is reported only once
> xml.c:934: error: for each function it appears in.)
> make[4]: *** [xml.o] Error 1
> 
> It seems `xmlStructuredErrorContext' was added rather recently.  It's not
> obvious which version added it, but 2.7.8 has it and 2.7.2 does not.  Looking
> at 2.6.26's sources, that version seems to use `xmlGenericErrorContext' for
> both structured and generic handler functions.  Based on that, I tried with a
> change from `xmlStructuredErrorContext' to `xmlGenericErrorContext' in our
> xml.c.  I ran the test suite and hit some failures in the xml test; see
> attached regression.diffs.  I received warnings where the tests expected
> nothing or expected errors. 

Great :-(. I wonder if maybe I should simply remove the part of the patch
which try to restore the error handler and context in pg_xml_done(). I've
added that because I feared that some 3rd-party libraries setup their libxml
error handle just once, not before every library class. The code previously
didn't care about this either, but since we're using structured instead of
generic error reporting now, we'd now collide with 3-rd party libs which
also use structured error handling, whereas previously that would have worked.
OTOH, once there's more than one such library...

Whats your stand on this?

>  Looks like my version of libxml2 (2.6.26 in this
> case) classifies some of these messages differently.

Hm, that's exactly what I hoped that this patch would *prevent* from happening..
Will look into this.

> On Thu, Jun 09, 2011 at 06:11:46PM +0200, Florian Pflug wrote:
>> On Jun2, 2011, at 01:34 , Florian Pflug wrote:
>> I also realized that some libxml error (like undefined namespace prefixes)
>> must be ignored during xmlparse() and friends. Otherwise, it becomes
>> impossible to build XML documents from individual fragments. pg_xml_init()
>> therefore now takes an argument which specifies how strict the error
>> checking is supposed to be. For the moment, only XPATH() uses the strict
>> mode in which we report all errors. XMLPARSE() and friends only report
>> parse errors, not namespace errors.
> 
> Seems fair offhand.  We must preserve the invariant that any object with type
> "xml" can be passed through xml_out() and then be accepted by xml_in().  I'm
> not seeing any specific risks there, but I figure I'd mention it.
> 
> I couldn't reconcile this description with the code.  I do see that
> xpath_internal() uses XML_PURPOSE_OTHER, but so does xmlelement().  Why is
> xmlelement() also ready for the strictest error reporting?

xmlelement() doesn't use libxml's parser, only the xml writer. I didn't
want to suppress any errors the writer might raise (I'm not sure it raises
error at all, though).

> Also note that we may be able to be more strict when xmloption = document.

Yeah, probably. I think that better done in a separate patch, though - I've
tried not to change existing behaviour with this patch except where absolutely
necessary (i.e. for XPATH)

> Ideally, an invalid namespace URI should always be an error.  While namespace
> prefixes could become valid as you assemble a larger document, a namespace
> URI's validity is context-free.

I agree. That, however, would require xmlelement() to check all xmlns*
attributes, and I didn't find a way to do that with libxml() (Well,
other than to parse the element after creating it, but that's a huge
kludge). So I left that issue for a later time...

> The patch adds some trailing whitespace.

Ups, sorry for that. Will fix.

> Remembering to put a pg_xml_done() in every relevant elog(ERROR, ...) path
> seems fragile.  How about using RegisterXactCallback/RegisterSubXactCallback
> in pgxml_parser_init() to handle those cases?  You can also use it to Assert
> at non-abort transaction shutdown that pg_xml_done() was called as needed.

Hm, isn't that a bit fragile too? It seems that PG_TRY/PG_CATCH blocks don't
necessarily create sub-transactions, do they?

Also, most elog() paths already contained xmlFree() calls, because libxml
seemingly cannot be made to use context-based memory management. So one
already needs to be pretty careful when touching these...

Anyway, should we decide to give up on trying to restore the error handler
and context, we could get rid of pg_xml_done() again... (See above for the
details on that)

> Consider renaming XML_REPORT_ERROR it to something like XML_CHECK_ERROR,
> emphasizing that it wraps a conditional.

Hm, I think it was named XML_CHECK_ERROR at one point, and I changed it
because I f

Re: [HACKERS] Another issue with invalid XML values

2011-06-20 Thread Noah Misch
Hi Florian,

I tested this patch using two systems, a CentOS 5 box with
libxml2-2.6.26-2.1.2.8.el5_5.1 and an Ubuntu 8.04 box with libxml2
2.6.31.dfsg-2ubuntu1.6.  Both failed to build with this error:

xml.c: In function `pg_xml_init':
xml.c:934: error: `xmlStructuredErrorContext' undeclared (first use in this 
function)
xml.c:934: error: (Each undeclared identifier is reported only once
xml.c:934: error: for each function it appears in.)
make[4]: *** [xml.o] Error 1

It seems `xmlStructuredErrorContext' was added rather recently.  It's not
obvious which version added it, but 2.7.8 has it and 2.7.2 does not.  Looking
at 2.6.26's sources, that version seems to use `xmlGenericErrorContext' for
both structured and generic handler functions.  Based on that, I tried with a
change from `xmlStructuredErrorContext' to `xmlGenericErrorContext' in our
xml.c.  I ran the test suite and hit some failures in the xml test; see
attached regression.diffs.  I received warnings where the tests expected
nothing or expected errors.  Looks like my version of libxml2 (2.6.26 in this
case) classifies some of these messages differently.

I suggest testing the next version with both libxml2 2.6.23, the minimum
supported version, and libxml2 2.7.8, the latest version.

On Thu, Jun 09, 2011 at 06:11:46PM +0200, Florian Pflug wrote:
> On Jun2, 2011, at 01:34 , Florian Pflug wrote:
> > On Jun2, 2011, at 00:02 , Noah Misch wrote:
> >> On Wed, Jun 01, 2011 at 06:16:21PM +0200, Florian Pflug wrote:
> >>> Anyway, I'll try to come up with a patch that replaces 
> >>> xmlSetGenericErrorFunc() with xmlSetStructuredErrorFunc().
> >> 
> >> Sounds sensible.  Will this impose any new libxml2 version dependency?
> > 
> > xmlSetStructuredErrorFunc() seems to be available starting with libxml 
> > 2.6.0,
> > release on Oct 20, 2003. Since we already require the version to be >= 
> > 2.6.23,
> > we should be OK.
> > 
> > I won't have access to my PC the next few days, but I'll try to come up with
> > a patch some time next week.
> 
> Phew... I did manage to produce a patch, but it was way more work than I
> had intended to put into this.
> 
> As it turns out, you loose the nicely formatted context information that
> libxml2 provides via the generic error func once you switch to structured
> error reporting. Registering handlers for both doesn't help either, since
> the generic error handler isn't called once you register a structured one.
> 
> Fortunately, libxml does export xmlParserPrintFileContext() which
> generates these context messages. It, however, doesn't return a string,
> but instead passes them to the generic error handler (this time, independent
> from whether a structural error handler is registered or not).

Interesting API, there.

> As it stood, the code assumed that all third-party library re-install
> their libxml error handlers before each library call, and thus didn't
> bother to restore the old error handler itself. Since I revamped the
> error handling anyway, I removed that requirement. There is now a
> function pg_xml_done() which restores the original error handler that
> we overwrote in pg_xml_init().

Seems reasonable.

> I also realized that some libxml error (like undefined namespace prefixes)
> must be ignored during xmlparse() and friends. Otherwise, it becomes
> impossible to build XML documents from individual fragments. pg_xml_init()
> therefore now takes an argument which specifies how strict the error
> checking is supposed to be. For the moment, only XPATH() uses the strict
> mode in which we report all errors. XMLPARSE() and friends only report
> parse errors, not namespace errors.

Seems fair offhand.  We must preserve the invariant that any object with type
"xml" can be passed through xml_out() and then be accepted by xml_in().  I'm
not seeing any specific risks there, but I figure I'd mention it.

I couldn't reconcile this description with the code.  I do see that
xpath_internal() uses XML_PURPOSE_OTHER, but so does xmlelement().  Why is
xmlelement() also ready for the strictest error reporting?

Also note that we may be able to be more strict when xmloption = document.

> 
> Finally, I had to adjust contrib/xml2 because it uses some parts of
> the core XML support like pg_xml_init().
> 
> Heres the indended behaviour with the patch applied:
> 
> 
> We always use structured error handling. For now, the error messages
> pretty much resemble the old ones, but it's now easy to add additional
> information.
> 
> XMLPARSE() and casting to XML check for parse errors only, like they do
> without the patch. They're also capable of reporting warnings, but I
> didn't find a case where the libxml parser generates a warning.
> 
> XPATH() reports all errors and warnings. Trying to use XPATH() on
> a document with e.g. inconsistent namespace usage or invalid
> namespace URIs therefore now raises an error. This is *necessary*
> because libxml's XPath evaluator gets confus

Re: [HACKERS] Another issue with invalid XML values

2011-06-09 Thread Florian Pflug
On Jun2, 2011, at 01:34 , Florian Pflug wrote:
> On Jun2, 2011, at 00:02 , Noah Misch wrote:
>> On Wed, Jun 01, 2011 at 06:16:21PM +0200, Florian Pflug wrote:
>>> Anyway, I'll try to come up with a patch that replaces 
>>> xmlSetGenericErrorFunc() with xmlSetStructuredErrorFunc().
>> 
>> Sounds sensible.  Will this impose any new libxml2 version dependency?
> 
> xmlSetStructuredErrorFunc() seems to be available starting with libxml 2.6.0,
> release on Oct 20, 2003. Since we already require the version to be >= 2.6.23,
> we should be OK.
> 
> I won't have access to my PC the next few days, but I'll try to come up with
> a patch some time next week.

Phew... I did manage to produce a patch, but it was way more work than I
had intended to put into this.

As it turns out, you loose the nicely formatted context information that
libxml2 provides via the generic error func once you switch to structured
error reporting. Registering handlers for both doesn't help either, since
the generic error handler isn't called once you register a structured one.

Fortunately, libxml does export xmlParserPrintFileContext() which
generates these context messages. It, however, doesn't return a string,
but instead passes them to the generic error handler (this time, independent
from whether a structural error handler is registered or not).

As it stood, the code assumed that all third-party library re-install
their libxml error handlers before each library call, and thus didn't
bother to restore the old error handler itself. Since I revamped the
error handling anyway, I removed that requirement. There is now a
function pg_xml_done() which restores the original error handler that
we overwrote in pg_xml_init().

I also realized that some libxml error (like undefined namespace prefixes)
must be ignored during xmlparse() and friends. Otherwise, it becomes
impossible to build XML documents from individual fragments. pg_xml_init()
therefore now takes an argument which specifies how strict the error
checking is supposed to be. For the moment, only XPATH() uses the strict
mode in which we report all errors. XMLPARSE() and friends only report
parse errors, not namespace errors.

Finally, I had to adjust contrib/xml2 because it uses some parts of
the core XML support like pg_xml_init().

Heres the indended behaviour with the patch applied:


We always use structured error handling. For now, the error messages
pretty much resemble the old ones, but it's now easy to add additional
information.

XMLPARSE() and casting to XML check for parse errors only, like they do
without the patch. They're also capable of reporting warnings, but I
didn't find a case where the libxml parser generates a warning.

XPATH() reports all errors and warnings. Trying to use XPATH() on
a document with e.g. inconsistent namespace usage or invalid
namespace URIs therefore now raises an error. This is *necessary*
because libxml's XPath evaluator gets confused if it encounters
e.g. invalid namespace URI and outputs invalid XML in response.

contrib/xml2's behaviour hasn't changed.

Patch is attached, and comments are welcome.

best regards,
Florian Pflug


pg_xml_errorhandling.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-06-01 Thread Florian Pflug
On Jun2, 2011, at 00:02 , Noah Misch wrote:
> On Wed, Jun 01, 2011 at 06:16:21PM +0200, Florian Pflug wrote:
>> Anyway, I'll try to come up with a patch that replaces 
>> xmlSetGenericErrorFunc() with xmlSetStructuredErrorFunc().
> 
> Sounds sensible.  Will this impose any new libxml2 version dependency?

xmlSetStructuredErrorFunc() seems to be available starting with libxml 2.6.0,
release on Oct 20, 2003. Since we already require the version to be >= 2.6.23,
we should be OK.

I won't have access to my PC the next few days, but I'll try to come up with
a patch some time next week.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-06-01 Thread Noah Misch
On Wed, Jun 01, 2011 at 06:16:21PM +0200, Florian Pflug wrote:
> On Jun1, 2011, at 03:17 , Florian Pflug wrote:
> > My nagging suspicion is that libxml reports errors like there via some 
> > callback function, and only returns a non-zero result if there are 
> > structural errors in the XML. But my experience with libxml is pretty 
> > limited, so maybe someone with more experience in this area can shed some 
> > light on this...
> 
> As it turns out, this is actually the case.

Thanks for getting to the bottom of this.  I had wondered why, for some versions
of libxml2, xml_in would accept things that xmllint rejected.  Sounds like the
list of errors that actually affect the return code has changed over time.

> Anyway, I'll try to come up with a patch that replaces 
> xmlSetGenericErrorFunc() with xmlSetStructuredErrorFunc().

Sounds sensible.  Will this impose any new libxml2 version dependency?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-06-01 Thread Florian Pflug
On Jun1, 2011, at 03:17 , Florian Pflug wrote:
> My nagging suspicion is that libxml reports errors like there via some 
> callback function, and only returns a non-zero result if there are structural 
> errors in the XML. But my experience with libxml is pretty limited, so maybe 
> someone with more experience in this area can shed some light on this...

As it turns out, this is actually the case.

libxml reports some errors (like invalid xmlns attributes) via the error 
handler set using xmlSetGenericErrorFunc() but still returns zero (indicating 
success) from xmlCtxtReadDoc() and xmlParseBalancedChunkMemory().

If I modify xml_parse() to complain not only if one of these functions return 
non-zero, but also if xml_err_buf has non-zero length, invalid xmlns attributes 
are reported correctly.

However, the error function set using xmlSetGenericErrorFunc() cannot 
distinguish between error and warnings, so doing this causes XMLPARSE() to also 
complain about things like non-absolute namespace URIs (which are allowed but 
deprecated as far as I understand).

To fix that, xmlSetGenericErrorFunc() would probably have to be replace by 
xmlSetStructuredErrorFunc(). Structured error functions receive a pointer to an 
xmlError structore which, amongst other things, contains an xmlErrorLevel 
(NONE, WARNING, ERROR, FATAL).

While digging through the code in src/backend/utils/adt/xml.c, I also noticed 
that we set a global error handler instead of a per-context one. I guess this 
is because xmlParseBalancedChunkMemory(), which we use to parse XML fragments, 
doesn't provide a way to pass in a context but rather creates it itself. Still, 
I wonder if there isn't some other API which we could use which does allow us 
to specify a context. Again, it'd be nice if someone more familiar with this 
code could explain the reasons behind the current design.

Anyway, I'll try to come up with a patch that replaces xmlSetGenericErrorFunc() 
with xmlSetStructuredErrorFunc().

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Another issue with invalid XML values

2011-05-31 Thread Florian Pflug
Hi

Unfortunately, I found another way to produce invalid XML values.

template1=# SELECT (XPATH('/*', XMLELEMENT(NAME "root", XMLATTRIBUTES('<' as 
xmlns[1];
   xpath   
---
 

Since a literal "<" is not allowed in XML attributes, this XML value is not 
well-formed. And indeed

template1=# SELECT (XPATH('/*', XMLELEMENT(NAME "root", XMLATTRIBUTES('<' as 
xmlns[1]::TEXT::XML;
ERROR:  invalid XML content
DETAIL:  Entity: line 1: parser error : Unescaped '<' not allowed in attributes 
values

Note that this only affects namespace declarations (xmlns). The following case 
works correctly

template1=# SELECT (XPATH('/*', XMLELEMENT(NAME "root", XMLATTRIBUTES('<' as 
value[1];   
xpath 
--
 

The root of this issue is that "<" isn't a valid namespace URI to begin with, 
since "<" isn't in the set of allowed characters for URIs. Thus, when 
converting an XML node back to text, libxml doesn't escape xmlns attribute 
values.

I don't have a good solution for this issue yet. Special-casing attributes 
called "xmlns" (or "xmlns:") in XMLATTRIBUTES() solves only part of the 
problem - the TEXT to XML cast is similarly lenient and doesn't complain if you 
do ''::XML.

Why this cast succeeds is somewhat beyond me though - piping the very same XML 
document into xmllint produces

$ echo '' | xmllint -
-:1: namespace error : xmlns: '<' is not a valid URI

My nagging suspicion is that libxml reports errors like there via some callback 
function, and only returns a non-zero result if there are structural errors in 
the XML. But my experience with libxml is pretty limited, so maybe someone with 
more experience in this area can shed some light on this...

best regards,
Florian Pflug
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers