2017-03-12 22:26 GMT+01:00 Pavel Stehule <pavel.steh...@gmail.com>:

>
>
> 2017-03-12 21:57 GMT+01:00 Noah Misch <n...@leadboat.com>:
>
>> On Sun, Mar 12, 2017 at 08:36:58PM +0100, Pavel Stehule wrote:
>> > 2017-03-12 0:56 GMT+01:00 Noah Misch <n...@leadboat.com>:
>> > > On Mon, Feb 20, 2017 at 07:48:18PM +0100, Pavel Stehule wrote:
>> > > > There are two possible fixes:
>> > > >
>> > > > a) clean the decl on input: the encoding info can be removed from the
>> > > > decl part
>> > > >
>> > > > b) use xml_out_internal everywhere before the transformation to
>> > > > xmlChar. pg_xmlCharStrndup could be a good candidate.
>> > >
>> > > I'd prefer (a) if the xml type were a new feature, because no good can
>> > > come of storing an encoding in each xml field when we know the actual
>> > > encoding is the database encoding.  However, if you implemented (a),
>> > > we'd still see untreated values brought over via pg_upgrade.
>> > > Therefore, I would try (b) first.  I suspect the intent of xml_parse()
>> > > was to implement (b); it will be interesting to see your test case that
>> > > malfunctions.
>> > >
>> >
>> > I looked at it again and found that this issue is related to the xpath
>> > function only.
>> >
>> > Functions based on xml_parse work without problems. xpath_internal
>> > uses its own direct xmlCtxtReadMemory call without correct encoding
>> > sanitization.
>> >
>> > So the fix is pretty simple:
>>
>> Please add a test case.
>>
>
> It needs an application; currently there is no way to import an XML
> document via the recv API :(
>
> I wrote a pgimportdoc utility, but it is not part of core.
>
>
>>
>> > --- a/src/backend/utils/adt/xml.c
>> > +++ b/src/backend/utils/adt/xml.c
>> > @@ -3874,9 +3874,11 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
>> >         ns_count = 0;
>> >     }
>> >
>> > -   datastr = VARDATA(data);
>> > -   len = VARSIZE(data) - VARHDRSZ;
>> > +   datastr = xml_out_internal(data, 0);
>>
>> Why not use xml_parse() instead of calling xmlCtxtReadMemory() directly?
>> The answer is probably in the archives, because someone understood the
>> problem enough to document "Some XML-related functions may not work at
>> all on non-ASCII data when the server encoding is not UTF-8. This is
>> known to be an issue for xpath() in particular."
>
>
> There are probably two issues:
>
> 1. The one I touched: the recv function converts the document to the
> database encoding, but the encoding in the document's declaration is not
> updated.
>
> 2. There is no possibility to convert from the document encoding to the
> database encoding.
>
>
>> > +   len = strlen(datastr);
>> > +
>> >     xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ;
>> > +
>>
>> The two lines of functional change don't create a cause for more
>> newlines, so don't add these two newlines.
>>
>
> ok
>
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 1908b13db5..2786d5b1cb 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3874,8 +3874,8 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
 		ns_count = 0;
 	}
 
-	datastr = VARDATA(data);
-	len = VARSIZE(data) - VARHDRSZ;
+	datastr = xml_out_internal(data, 0);
+	len = strlen(datastr);
 	xpath_len = VARSIZE_ANY_EXHDR(xpath_expr_text);
 	if (xpath_len == 0)
 		ereport(ERROR,
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)