2017-03-12 21:57 GMT+01:00 Noah Misch <n...@leadboat.com>:

> On Sun, Mar 12, 2017 at 08:36:58PM +0100, Pavel Stehule wrote:
> > 2017-03-12 0:56 GMT+01:00 Noah Misch <n...@leadboat.com>:
> > > On Mon, Feb 20, 2017 at 07:48:18PM +0100, Pavel Stehule wrote:
> > > > There are possible two fixes
> > > >
> > > > a) clean decl on input - the encoding info can be removed from decl part
> > > >
> > > > b) use xml_out_internal everywhere before transformation to
> > > > xmlChar. pg_xmlCharStrndup can be good candidate.
> > >
> > > I'd prefer (a) if the xml type were a new feature, because no good can come of
> > > storing an encoding in each xml field when we know the actual encoding is the
> > > database encoding. However, if you implemented (a), we'd still see untreated
> > > values brought over via pg_upgrade. Therefore, I would try (b) first. I
> > > suspect the intent of xml_parse() was to implement (b); it will be interesting
> > > to see your test case that malfunctions.
> >
> > I looked there again and I found so this issue is related to xpath function
> > only
> >
> > Functions based on xml_parse are working without problems. xpath_internal
> > uses own direct xmlCtxtReadMemory without correct encoding sanitation.
> >
> > so fix is pretty simple
>
> Please add a test case.
>
It needs an application: currently there is no way to import an XML
document via the recv API :(

I wrote a pgimportdoc utility, but it is not part of core.

> > --- a/src/backend/utils/adt/xml.c
> > +++ b/src/backend/utils/adt/xml.c
> > @@ -3874,9 +3874,11 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
> >         ns_count = 0;
> >     }
> >
> > -   datastr = VARDATA(data);
> > -   len = VARSIZE(data) - VARHDRSZ;
> > +   datastr = xml_out_internal(data, 0);
>
> Why not use xml_parse() instead of calling xmlCtxtReadMemory() directly? The
> answer is probably in the archives, because someone understood the problem
> enough to document "Some XML-related functions may not work at all on
> non-ASCII data when the server encoding is not UTF-8. This is known to be an
> issue for xpath() in particular."

There are probably two possible issues:

1. The one I touched: the recv function converts the data to the database
encoding, but the encoding named in the document declaration is not updated.

2. There is no way to convert from the document encoding to the database
encoding.

> > +   len = strlen(datastr);
> > +
> >     xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ;
> > +
>
> The two lines of functional change don't create a cause for more newlines, so
> don't add these two newlines.

ok
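
For illustration only, here is a minimal standalone sketch of what fix (a) from upthread (removing the encoding info from the declaration) could look like. It assumes a NUL-terminated string in an ASCII-compatible encoding. The name strip_decl_encoding is hypothetical: it is not part of xml.c, and the patch above uses xml_out_internal() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Whitespace as allowed inside an XML declaration. */
static int
is_xml_ws(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/*
 * Hypothetical helper, for illustration only (not a PostgreSQL function).
 *
 * Copy src into dst, dropping the encoding pseudo-attribute from a leading
 * XML declaration if there is one.  Anything that does not look like
 * <?xml ... encoding="..." ... ?> is copied unchanged.
 *
 * Returns 0 on success, -1 if dst is too small to hold a copy of src.
 */
int
strip_decl_encoding(const char *src, char *dst, size_t dstlen)
{
    const char *decl_end;
    const char *enc;
    const char *start;
    const char *p;
    const char *val_end;
    char        quote;
    size_t      srclen;
    size_t      prefix;
    size_t      tail;

    assert(src != NULL && dst != NULL);

    srclen = strlen(src);
    if (srclen + 1 > dstlen)
        return -1;

    /* Start from an unchanged copy; every early return below keeps it. */
    memcpy(dst, src, srclen + 1);

    /* Only a declaration at the very start of the document counts. */
    if (strncmp(src, "<?xml", 5) != 0 || !is_xml_ws(src[5]))
        return 0;
    decl_end = strstr(src, "?>");
    if (decl_end == NULL)
        return 0;

    /* Simplification: the first "encoding" inside the declaration wins. */
    enc = strstr(src, "encoding");
    if (enc == NULL || enc >= decl_end)
        return 0;

    /* Expect: encoding = "value"  (single or double quotes). */
    p = enc + strlen("encoding");
    while (is_xml_ws(*p))
        p++;
    if (*p != '=')
        return 0;
    p++;
    while (is_xml_ws(*p))
        p++;
    if (*p != '"' && *p != '\'')
        return 0;
    quote = *p++;
    val_end = strchr(p, quote);
    if (val_end == NULL || val_end >= decl_end)
        return 0;
    val_end++;                  /* one past the closing quote */

    /* Also drop the whitespace that separated it from the previous item. */
    start = enc;
    while (start > src + 5 && is_xml_ws(start[-1]))
        start--;

    /* Close the gap in dst, moving the tail and its terminating NUL. */
    prefix = (size_t) (start - src);
    tail = (size_t) (val_end - src);
    memmove(dst + prefix, dst + tail, srclen - tail + 1);
    return 0;
}
```

As noted upthread, cleaning the declaration on input would not help values already stored and carried over by pg_upgrade, which is why the output-side approach (b) was suggested first.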