2017-03-12 21:57 GMT+01:00 Noah Misch <n...@leadboat.com>:

> On Sun, Mar 12, 2017 at 08:36:58PM +0100, Pavel Stehule wrote:
> > 2017-03-12 0:56 GMT+01:00 Noah Misch <n...@leadboat.com>:
> > > On Mon, Feb 20, 2017 at 07:48:18PM +0100, Pavel Stehule wrote:
> > > > There are possible two fixes
> > > >
> > > > a) clean decl on input - the encoding info can be removed from decl part
> > > >
> > > > b) use xml_out_internal everywhere before transformation to
> > > > xmlChar. pg_xmlCharStrndup can be good candidate.
> > >
> > > I'd prefer (a) if the xml type were a new feature, because no good can come of
> > > storing an encoding in each xml field when we know the actual encoding is the
> > > database encoding.  However, if you implemented (a), we'd still see untreated
> > > values brought over via pg_upgrade.  Therefore, I would try (b) first.  I
> > > suspect the intent of xml_parse() was to implement (b); it will be interesting
> > > to see your test case that malfunctions.
> > >
> >
> > I looked there again and found that this issue is related to the xpath
> > function only
> >
> > Functions based on xml_parse are working without problems. xpath_internal
> > uses own direct xmlCtxtReadMemory without correct encoding sanitation.
> >
> > so fix is pretty simple
>
> Please add a test case.
>

It needs an application - currently there is no way to import an XML
document via the recv API :(

I wrote a pgimportdoc utility, but it is not part of core.


>
> > --- a/src/backend/utils/adt/xml.c
> > +++ b/src/backend/utils/adt/xml.c
> > @@ -3874,9 +3874,11 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
> >         ns_count = 0;
> >     }
> >
> > -   datastr = VARDATA(data);
> > -   len = VARSIZE(data) - VARHDRSZ;
> > +   datastr = xml_out_internal(data, 0);
>
> Why not use xml_parse() instead of calling xmlCtxtReadMemory() directly?  The
> answer is probably in the archives, because someone understood the problem
> enough to document "Some XML-related functions may not work at all on
> non-ASCII data when the server encoding is not UTF-8. This is known to be an
> issue for xpath() in particular."


Probably there are two possible issues

1. what I touched - the recv function converts the data to the database
encoding, but the encoding named in the document's declaration is not updated.

2. there is no way to convert from the document encoding to the database
encoding.
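
To make issue 1 concrete: after the recv conversion the bytes are in the
database encoding, yet the declaration can still name the original encoding,
so a parser that trusts the declaration misreads non-ASCII data. Below is a
minimal standalone sketch of fix (a), dropping the encoding from the
declaration. It is not the posted patch and not PostgreSQL code; the name
strip_decl_encoding and the caller-supplied output buffer are assumptions made
for illustration, and it only handles a declaration at the very start of the
input (no BOM, no leading whitespace).

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): copy "in" to "out", dropping the
 * encoding pseudo-attribute from a leading XML declaration if one is present.
 * Returns 0 on success, -1 if "out" is too small to hold the input.
 */
static int
strip_decl_encoding(const char *in, char *out, size_t outlen)
{
	const char *decl_end;
	const char *enc;
	const char *p;
	const char *val_end;
	size_t		inlen = strlen(in);

	/* The result is never longer than the input. */
	if (inlen + 1 > outlen)
		return -1;

	/* Default: pass the input through unchanged. */
	memcpy(out, in, inlen + 1);

	/* No leading declaration, or an unterminated one: nothing to strip. */
	if (strncmp(in, "<?xml", 5) != 0)
		return 0;
	decl_end = strstr(in, "?>");
	if (decl_end == NULL)
		return 0;

	/* The pseudo-attribute must sit inside the declaration itself. */
	enc = strstr(in, " encoding");
	if (enc == NULL || enc > decl_end)
		return 0;

	/* Expect: optional spaces, '=', optional spaces, then a quoted value. */
	p = enc + strlen(" encoding");
	while (*p == ' ')
		p++;
	if (*p != '=')
		return 0;
	p++;
	while (*p == ' ')
		p++;
	if (*p != '"' && *p != '\'')
		return 0;
	val_end = strchr(p + 1, *p);
	if (val_end == NULL || val_end > decl_end)
		return 0;

	/* Keep everything before " encoding" and after the closing quote. */
	memcpy(out, in, (size_t) (enc - in));
	strcpy(out + (enc - in), val_end + 1);
	return 0;
}
```

With this sketch, a document starting with
<?xml version="1.0" encoding="windows-1250"?> comes out starting with
<?xml version="1.0"?>, and input without a declaration is copied unchanged.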


> > +   len = strlen(datastr);
> > +
> >     xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ;
> > +
>
> The two lines of functional change don't create a cause for more newlines, so
> don't add these two newlines.
>

ok
