Hi
> xpath-bugfix.patch affected only xml values containing an xml declaration > with > "encoding" attribute. In UTF8 databases, this latest proposal > (xpath-parsing-error-fix.patch) is equivalent to xpath-bugfix.patch. In > non-UTF8 databases, xpath-parsing-error-fix.patch affects all xml values > containing non-ASCII data. In a LATIN1 database, the following works today > but breaks under your latest proposal: > > SELECT xpath('text()', ('<x>' || convert_from('\xc2b0', 'LATIN1') || > '</x>')::xml); > I don't understand, why it should not work? > > It's acceptable to break that, since the documentation explicitly disclaims > support for it. xpath-bugfix.patch breaks different use cases, which are > likewise acceptable to break. See my 2017-08-08 review for details. > > We have xpath-bugfix.patch and xpath-parsing-error-fix.patch. Both are > equivalent under supported use cases (xpath in UTF8 databases). Among > non-supported use cases, they each make different things better and > different > things worse. We should prefer to back-patch the version harming fewer > applications. I expect non-ASCII data is more common than xml declarations > with "encoding" attribute, so xpath-bugfix.patch will harm fewer > applications. > > Having said that, I now see a third option. Condition this thread's > patch's > effects on GetDatabaseEncoding()==PG_UTF8. That way, we fix supported > cases, > and we remain bug-compatible in unsupported cases. I think that's better > than > the other options discussed so far. If you agree, please send a patch > based > on xpath-bugfix.patch with the GetDatabaseEncoding()==PG_UTF8 change and > the > two edits I described earlier. > I am sorry - too long day today. Do you think some like diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index 24229c2dff..9fd6f3509f 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -3914,7 +3914,14 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces, if (ctxt == NULL || xmlerrcxt->err_occurred) xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY, "could not allocate parser context"); - doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0); + + /* + * Passed XML is always in server encoding. When server encoding + * is UTF8, we can pass this information to libxml2 to ignore + * possible invalid encoding declaration in XML document. + */ + doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, + GetDatabaseEncoding() == PG_UTF8 ? "UTF-8" : NULL, 0); if (doc == NULL || xmlerrcxt->err_occurred) xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT, "could not parse XML document"); This fix only UTF8 servers and has no any effect on other cases ? Regards Pavel > Thanks, > nm >