Re: [HACKERS] possible encoding issues with libxml2 functions
2017-08-20 4:17 GMT+02:00 Noah Misch:

> On Fri, Aug 18, 2017 at 11:43:19PM +0200, Pavel Stehule wrote:
> > yes, probably libXML2 tries to check the conversion from UTF8 to the
> > encoding specified in the header.
>
> Yes.  That has been the topic of this thread.
>
> > a) all values created by the xml_in interface are in the database
> > encoding - the input string is stored without any change. xml_parse is
> > called only for validation.
> >
> > b) inside xml_parse, the input is converted to UTF8, and the document
> > is read by xmlCtxtReadDoc with explicitly specified "UTF-8" encoding,
> > or by xmlParseBalancedChunkMemory with explicitly specified encoding
> > "UTF8" and the decl section removed.
> >
> > So for the "xml_parse" based functions (xml_in, texttoxml,
> > xml_is_document, wellformated_xml) the database encoding is not
> > important.
> >
> > c) the xml_recv function does validation via xml_parse and translation
> > to the database encoding.
> >
> > Now I don't see a difference between @b and @c - so my hypothesis
> > about the necessity of using the recv interface was wrong.
>
> Yes.  You posted, on 2017-04-05, a test case not requiring the recv
> interface.
>
> On Sat, Aug 19, 2017 at 09:13:50AM +0200, Pavel Stehule wrote:
> > I didn't find any info on how to enable libXML2 XPath functions for an
> > encoding other than UTF8 :( ??
>
> http://xmlsoft.org/encoding.html is the relevant authority.  To
> summarize, we should send only UTF8 to libxml2.

libxml2 converts XML to UTF8 by itself; everything else must already be in
UTF8.  I found some references to the xmlSwitchEncoding function, but I
didn't find any examples of its usage - probably nobody uses it.  The
result is always in UTF8.

> On Sat, Aug 19, 2017 at 10:53:19PM +0200, Pavel Stehule wrote:
> > I am sending a POC - it supports XPATH and XMLTABLE for non-UTF8
> > server encodings.
> >
> > In this case, all strings should be converted to UTF8 before calling
> > libXML2 functions, and the result should be converted back from UTF8.
>
> Adding support for xpath in non-UTF8 databases is a v11 feature
> proposal.  Please start a new thread for this, and add it to the open
> CommitFest.
>
> In this thread, would you provide the version of your patch that I
> described in my 2017-08-08 post to this thread?  That's a
> back-patchable bug fix.

There are three issues:

1. Processing XML documents in a single-byte encoding that carry an
encoding declaration - this should be fixed by
ecoding_for_xmlCtxtReadMemory.patch.  This patch is very short and safe
and can be applied immediately (it includes regression tests).

2. Encoding issues in the XPath specification (and namespaces) - because
multibyte characters are rarely used in tag names, this issue hits few
users.

3. Encoding issues in XPath and XMLTABLE results - this is a bad one: the
XMLTABLE function is not functional on non-UTF8 databases.  Fortunately,
fewer users run such encodings, but it should probably be applied as a
fix in Postgres 10/11.

> > I found some previous experiments
> > https://marc.info/?l=pgsql-bugs&m=123407176408688
>
> https://wiki.postgresql.org/wiki/Todo#XML links to other background on
> this feature proposal.  See Tom Lane's review of a previous patch.
> Ensure your patch does not have the problems he found during that
> review.  Do that before starting a thread for this feature.

Good information - thank you.  I'll start a new thread for issues @2 and
@3.  I am not sure I can prepare a good enough patch for the next
commitfest; a committer can later decide whether to backpatch it.

Regards

Pavel

> Thanks,
> nm
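For reference, a rough and untested sketch of how xmlSwitchEncoding might
be called, based only on the declarations in libxml2's parserInternals.h;
it forces a parser context to treat its input as LATIN1, overriding any
in-document encoding declaration.  Nobody in this thread has run this, so
the names and behavior should be verified against the libxml2 headers:

    #include <libxml/parser.h>
    #include <libxml/parserInternals.h>

    static xmlDocPtr
    parse_as_latin1(const char *buf, int len)
    {
        xmlParserCtxtPtr ctxt = xmlCreateMemoryParserCtxt(buf, len);
        xmlDocPtr   doc = NULL;

        if (ctxt == NULL)
            return NULL;

        /* override whatever <?xml ... encoding=...?> declares */
        xmlSwitchEncoding(ctxt, XML_CHAR_ENCODING_8859_1);

        if (xmlParseDocument(ctxt) == 0)
            doc = ctxt->myDoc;  /* caller must xmlFreeDoc() this */

        xmlFreeParserCtxt(ctxt);
        return doc;
    }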
Re: [HACKERS] possible encoding issues with libxml2 functions
On Fri, Aug 18, 2017 at 11:43:19PM +0200, Pavel Stehule wrote:
> yes, probably libXML2 tries to check the conversion from UTF8 to the
> encoding specified in the header.

Yes.  That has been the topic of this thread.

> a) all values created by the xml_in interface are in the database
> encoding - the input string is stored without any change. xml_parse is
> called only for validation.
>
> b) inside xml_parse, the input is converted to UTF8, and the document is
> read by xmlCtxtReadDoc with explicitly specified "UTF-8" encoding, or by
> xmlParseBalancedChunkMemory with explicitly specified encoding "UTF8"
> and the decl section removed.
>
> So for the "xml_parse" based functions (xml_in, texttoxml,
> xml_is_document, wellformated_xml) the database encoding is not
> important.
>
> c) the xml_recv function does validation via xml_parse and translation
> to the database encoding.
>
> Now I don't see a difference between @b and @c - so my hypothesis about
> the necessity of using the recv interface was wrong.

Yes.  You posted, on 2017-04-05, a test case not requiring the recv
interface.

On Sat, Aug 19, 2017 at 09:13:50AM +0200, Pavel Stehule wrote:
> I didn't find any info on how to enable libXML2 XPath functions for an
> encoding other than UTF8 :( ??

http://xmlsoft.org/encoding.html is the relevant authority.  To summarize,
we should send only UTF8 to libxml2.

On Sat, Aug 19, 2017 at 10:53:19PM +0200, Pavel Stehule wrote:
> I am sending a POC - it supports XPATH and XMLTABLE for non-UTF8 server
> encodings.
>
> In this case, all strings should be converted to UTF8 before calling
> libXML2 functions, and the result should be converted back from UTF8.

Adding support for xpath in non-UTF8 databases is a v11 feature proposal.
Please start a new thread for this, and add it to the open CommitFest.

In this thread, would you provide the version of your patch that I
described in my 2017-08-08 post to this thread?  That's a back-patchable
bug fix.

> I found some previous experiments
> https://marc.info/?l=pgsql-bugs&m=123407176408688

https://wiki.postgresql.org/wiki/Todo#XML links to other background on
this feature proposal.  See Tom Lane's review of a previous patch.  Ensure
your patch does not have the problems he found during that review.  Do
that before starting a thread for this feature.

Thanks,
nm
Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?
From: Chris Travers
> Why can't you do all this in a language handler and treat it as a user
> defined function?
> ...
> If you have a language handler for cypher, why do you need in_region or
> cast_region?  Why not just have a graph_search() function which takes in
> a cypher query and returns a set of records?

The language handler is for *stored* functions.  A user-defined function
(UDF) doesn't participate in the planning of the outer (top-level) query.
And both assume that they are executed in SQL commands.

I want the data models to meet these requirements:

1) The query language can be used as a top-level session language.  For
example, if an app specifies "region=cypher_graph" at database connection,
it can use the database as a graph database and submit Cypher queries
without embedding them in SQL.

2) When a query contains multiple query fragments of different data
models, all those fragments are parsed and planned before execution.  The
planner comes up with the best plan, crossing the data model boundary.
Take the query example in my first mail, which joins a relational table
and the result of a graph query: the relational planner considers how to
scan the table, the graph planner considers how to search the graph, and
the relational planner considers how to join the two fragments.

So in_region() and cast_region() are not functions to be executed during
the execution phase; they are syntax constructs that are converted, during
the analysis phase, into calls to another region's parser/analyzer and an
inter-model cast routine.

1. The relational parser finds in_region('cypher_graph', 'graph query')
and produces a parse node InRegion(region_name, query) in the parse tree.

2. The relational analyzer looks up the system catalog to check whether
the specified region exists, then calls its parser/analyzer to produce the
query tree for the graph query fragment.  The relational analyzer attaches
the graph query tree to the InRegion node.

3. When the relational planner finds the graph query tree, it passes it to
the graph planner to produce the graph execution plan.

4. The relational planner produces a join plan node, based on the
costs/statistics of the relational table scan and the graph query.  The
graph execution plan is attached to the join plan node.

The parse/query/plan nodes have a label denoting a region, so that the
appropriate region's routines can be called (a rough sketch of such a node
follows below).

Regards
MauMau
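A rough sketch of the InRegion parse node described above, following
PostgreSQL's node conventions; the field names are illustrative guesses
based on this description, not taken from any patch:

    /* hypothetical parse node for in_region(); names are illustrative */
    typedef struct InRegion
    {
        NodeTag     type;           /* a new node tag, e.g. T_InRegion */
        char       *region_name;    /* target region, e.g. "cypher_graph" */
        char       *query;          /* fragment in the region's language */
        Node       *region_query;   /* query tree produced by the region's
                                     * analyzer; filled in during analysis */
    } InRegion;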
Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)
Noah Misch wrote:
> I think you're contending that, as formulated, this is not a valid v10
> open item.  Are you?

As the person who came up with this formulation, I'd like to give a quick
summary of my current understanding of the item's status:

* We're in agreement that we ought to have initdb create initial
collations based on ICU locales, not based on distinct ICU collations [1].

* We're in agreement that variant keywords should not be created for each
base locale/collation [2].

Once these two changes are made, I think that everything will be in good
shape as far as pg_collation name stability goes.  It shouldn't take
Peter E. long to write the patch.  I'm happy to write the patch on his
behalf if that saves time.

We're also going to work on the documentation, to make keyword variants
like -emoji and -traditional at least somewhat discoverable, and to
explain the capabilities of custom ICU collations more generally.

[1] https://postgr.es/m/f67f36d7-ceb6-cfbd-28d4-413c6d22f...@2ndquadrant.com
[2] https://postgr.es/m/3862d484-f0a5-9eef-c54e-3f6808338...@2ndquadrant.com

--
Peter Geoghegan
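For context: even with the initdb changes above, a keyword variant that is
no longer pre-created can still be defined by hand using the documented
v10 syntax.  A sketch (the collation name here is arbitrary):

    -- custom ICU collation using a collation-keyword variant
    CREATE COLLATION german_phonebook
        (provider = icu, locale = 'de-u-co-phonebk');

    -- and use it
    SELECT 'Göbel' < 'Goldmann' COLLATE "german_phonebook";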
Re: [HACKERS] Support for Secure Transport SSL library on macOS as OpenSSL alternative
On Sun, Aug 20, 2017 at 8:10 AM, Thomas Munro wrote:
> On Fri, Aug 18, 2017 at 2:14 AM, Daniel Gustafsson wrote:
>> Attached is an updated set of patches, rebased on top of master, with
>> bug fixes and additional features missing in the first set.  While not
>> complete (yet), in case anyone is testing this I’d rather send a fresh
>> batch rather than sitting on them too long while I keep hacking at the
>> docs.  While not every part of this rather large changeset has been
>> touched, this includes all the patches for completeness sake.
>
> Hi,
>
> +#if defined(USE_OPENSSL) || defined(USE_SECURETRANSPORT)
>  #define USE_SSL
> +#if defined(USE_OPENSSL)
> +#define SSL_LIBRARY "OpenSSL"
> +#elif defined(USE_SECURETRANSPORT)
> +#define SSL_LIBRARY "Secure Transport"
> +#endif
>  #endif
>
> If you configure with neither --with-securetransport nor --with-openssl
> then SSL_LIBRARY finishes up undefined, and then guc.c doesn't compile:
>
> ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
> -Wformat-security -fno-strict-aliasing -fwrapv
> -fexcess-precision=standard -g -O2 -I. -I. -I../../../../src/include
> -D_GNU_SOURCE -c -o guc.o guc.c
> guc.c:3309:3: error: ‘SSL_LIBRARY’ undeclared here (not in a function)
>    SSL_LIBRARY,
>    ^~~
>
> I guess it should have a fallback definition, though I don't know what
> it should be.

Or maybe the guc should only exist if SSL_LIBRARY is defined?  I mean
#if defined(SSL_LIBRARY) around this:

+	{
+		/* Can't be set in postgresql.conf */
+		{"ssl_library", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows the SSL library used."),
+			NULL,
+			GUC_REPORT | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&ssl_library_string,
+		SSL_LIBRARY,
+		NULL, NULL, NULL
+	},

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] possible encoding issues with libxml2 functions
Hi

2017-08-19 22:53 GMT+02:00 Pavel Stehule:

> Hi
>
> I am sending a POC - it supports XPATH and XMLTABLE for non-UTF8 server
> encodings.
>
> In this case, all strings should be converted to UTF8 before calling
> libXML2 functions, and the result should be converted back from UTF8.
>
> I found some previous experiments
> https://marc.info/?l=pgsql-bugs&m=123407176408688
>
> Note: I got some information that the xmlNodeDump function used here is
> deprecated - so we should replace it sometime too.
>
> Regards

I forgot a debug elog in the previous patch.

> Pavel

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index c47624eff6..a43cf13d16 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -147,6 +147,7 @@ static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
 					  ArrayBuildState *astate,
 					  PgXmlErrorContext *xmlerrcxt);
 static xmlChar *pg_xmlCharStrndup(char *str, size_t len);
+static xmlChar *pg_xmlCharUtf8(char *str, size_t len);
 #endif							/* USE_LIBXML */
 
 static void xmldata_root_element_start(StringInfo result, const char *eltname,
@@ -459,8 +460,28 @@ cstring_to_xmltype(const char *string)
 static xmltype *
 xmlBuffer_to_xmltype(xmlBufferPtr buf)
 {
-	return (xmltype *) cstring_to_text_with_len((const char *) xmlBufferContent(buf),
+	if (GetDatabaseEncoding() != PG_UTF8)
+	{
+		char	   *utf8str = (char *) xmlBufferContent(buf);
+		char	   *str;
+		xmltype    *result;
+
+		str = (char *) pg_do_encoding_conversion((unsigned char *) utf8str,
+												 xmlBufferLength(buf),
+												 PG_UTF8,
+												 GetDatabaseEncoding());
+
+		Assert(str != utf8str);
+		result = (xmltype *) cstring_to_text(str);
+		pfree(str);
+
+		return result;
+	}
+	else
+	{
+		return (xmltype *) cstring_to_text_with_len((const char *) xmlBufferContent(buf),
 												xmlBufferLength(buf));
+	}
 }
 #endif
@@ -1176,6 +1197,28 @@ pg_xmlCharStrndup(char *str, size_t len)
 }
 
 /*
+ * LibXML2 internal encoding is UTF8. Sometimes LibXML2 enforce
+ * encoding to UTF8 by self, sometimes it expects UTF8 strings.
+ * This function is used for encoding from database encoding to
+ * UTF8.
+ */
+static xmlChar *
+pg_xmlCharUtf8(char *str, size_t len)
+{
+	char	   *result;
+
+	result = (char *) pg_do_encoding_conversion((unsigned char *) str,
+												len,
+												GetDatabaseEncoding(),
+												PG_UTF8);
+
+	if (result != str)
+		return BAD_CAST result;
+
+	return pg_xmlCharStrndup(str, len);
+}
+
+/*
  * str is the null-terminated input string.  Remaining arguments are
  * output arguments; each can be NULL if value is not wanted.
  * version and encoding are returned as locally-palloc'd strings.
@@ -3714,9 +3757,16 @@ xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
 	}
 	else
 	{
-		xmlChar    *str;
+		xmlChar    *utf8str;
+		char	   *str = NULL;
+
+		utf8str = xmlXPathCastNodeToString(cur);
+
+		str = (char *) pg_do_encoding_conversion((unsigned char *) utf8str,
+												 strlen((char *) utf8str),
+												 PG_UTF8,
+												 GetDatabaseEncoding());
 
-		str = xmlXPathCastNodeToString(cur);
 		PG_TRY();
 		{
 			/* Here we rely on XML having the same representation as TEXT */
@@ -3727,11 +3777,18 @@ xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
 		}
 		PG_CATCH();
 		{
-			xmlFree(str);
+			if (str != (char *) utf8str)
+				pfree(str);
+
+			xmlFree(utf8str);
 			PG_RE_THROW();
 		}
 		PG_END_TRY();
-		xmlFree(str);
+
+		if (str != (char *) utf8str)
+			pfree(str);
+
+		xmlFree(utf8str);
 	}
 
 	return result;
@@ -3758,6 +3815,7 @@ xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
 	Datum		datum;
 	Oid			datumtype;
 	char	   *result_str;
+	char	   *str = NULL;
 
 	switch (xpathobj->type)
 	{
@@ -3797,7 +3855,18 @@ xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
 		case XPATH_STRING:
 			if (astate == NULL)
 				return 1;
-			datum = CStringGetDatum((char *) xpathobj->stringval);
+
+			/*
+			 * returned string is in UTF8 encoding - should be encoded
+			 * to database encoding first.
+			 */
+			str = (char *) pg_do_encoding_conversion((unsigned char *) xpathobj->stringval,
+													 strlen((char *) xpathobj->stringval),
+													 PG_UTF8,
+													 GetDatabaseEncoding());
+
+			datum = CStringGetDatum(str);
+
 			datumtype = CSTRINGOID;
 			break;
@@ -3812,6 +3881,7 @@ xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
 			datum = PointerGetDatum(cstring_to_xmltype(result_str));
 			(void) accumArrayResult(astate, datum, false,
 									XMLOID, CurrentMemoryContext);
+
 			return 1;
 	}
@@ -3895,7 +3965,7 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
 				 errmsg("empty XPath expression")));
 
 	string = pg_xmlCharStrndup(datastr, len);
-	xpath_expr = pg_xmlCharStrndup(VARDATA_ANY(xpath_expr_text), xpath_len);
+	xpath_expr = pg_xmlCharUtf8(VARDATA_ANY(xpath_expr_text), xpath_len);
 
 	xmlerrcxt = pg_xml_init(PG_XML_STRICTNESS_ALL);
 
@@ -3911
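A quick manual way to exercise the POC on a non-UTF8 (say LATIN2) database
could look like the following; this is only an illustration, not part of
the patch's regression tests:

    -- both calls go through libxml2's XPath code and return non-ASCII
    -- text, so they only work once results are recoded from UTF8 back
    -- to the server encoding
    SELECT xpath('/a/text()', '<a>příliš žluťoučký</a>'::xml);

    SELECT *
      FROM xmltable('/root/row'
                    PASSING '<root><row><b>ž</b></row></root>'::xml
                    COLUMNS b text);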
Re: [HACKERS] possible encoding issues with libxml2 functions
Hi

I am sending a POC - it supports XPATH and XMLTABLE for non-UTF8 server
encodings.

In this case, all strings should be converted to UTF8 before calling
libXML2 functions, and the result should be converted back from UTF8.

I found some previous experiments
https://marc.info/?l=pgsql-bugs&m=123407176408688

Note: I got some information that the xmlNodeDump function used here is
deprecated - so we should replace it sometime too.

Regards

Pavel

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index c47624eff6..d02cec88ab 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -147,6 +147,7 @@ static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
 					  ArrayBuildState *astate,
 					  PgXmlErrorContext *xmlerrcxt);
 static xmlChar *pg_xmlCharStrndup(char *str, size_t len);
+static xmlChar *pg_xmlCharUtf8(char *str, size_t len);
 #endif							/* USE_LIBXML */
 
 static void xmldata_root_element_start(StringInfo result, const char *eltname,
@@ -459,8 +460,28 @@ cstring_to_xmltype(const char *string)
 static xmltype *
 xmlBuffer_to_xmltype(xmlBufferPtr buf)
 {
-	return (xmltype *) cstring_to_text_with_len((const char *) xmlBufferContent(buf),
+	if (GetDatabaseEncoding() != PG_UTF8)
+	{
+		char	   *utf8str = (char *) xmlBufferContent(buf);
+		char	   *str;
+		xmltype    *result;
+
+		str = (char *) pg_do_encoding_conversion((unsigned char *) utf8str,
+												 xmlBufferLength(buf),
+												 PG_UTF8,
+												 GetDatabaseEncoding());
+
+		Assert(str != utf8str);
+		result = (xmltype *) cstring_to_text(str);
+		pfree(str);
+
+		return result;
+	}
+	else
+	{
+		return (xmltype *) cstring_to_text_with_len((const char *) xmlBufferContent(buf),
 												xmlBufferLength(buf));
+	}
 }
 #endif
@@ -1176,6 +1197,28 @@ pg_xmlCharStrndup(char *str, size_t len)
 }
 
 /*
+ * LibXML2 internal encoding is UTF8. Sometimes LibXML2 enforce
+ * encoding to UTF8 by self, sometimes it expects UTF8 strings.
+ * This function is used for encoding from database encoding to
+ * UTF8.
+ */
+static xmlChar *
+pg_xmlCharUtf8(char *str, size_t len)
+{
+	char	   *result;
+
+	result = (char *) pg_do_encoding_conversion((unsigned char *) str,
+												len,
+												GetDatabaseEncoding(),
+												PG_UTF8);
+
+	if (result != str)
+		return BAD_CAST result;
+
+	return pg_xmlCharStrndup(str, len);
+}
+
+/*
  * str is the null-terminated input string.  Remaining arguments are
  * output arguments; each can be NULL if value is not wanted.
  * version and encoding are returned as locally-palloc'd strings.
@@ -3714,9 +3757,16 @@ xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
 	}
 	else
 	{
-		xmlChar    *str;
+		xmlChar    *utf8str;
+		char	   *str = NULL;
+
+		utf8str = xmlXPathCastNodeToString(cur);
+
+		str = (char *) pg_do_encoding_conversion((unsigned char *) utf8str,
+												 strlen((char *) utf8str),
+												 PG_UTF8,
+												 GetDatabaseEncoding());
 
-		str = xmlXPathCastNodeToString(cur);
 		PG_TRY();
 		{
 			/* Here we rely on XML having the same representation as TEXT */
@@ -3727,11 +3777,18 @@ xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
 		}
 		PG_CATCH();
 		{
-			xmlFree(str);
+			if (str != (char *) utf8str)
+				pfree(str);
+
+			xmlFree(utf8str);
 			PG_RE_THROW();
 		}
 		PG_END_TRY();
-		xmlFree(str);
+
+		if (str != (char *) utf8str)
+			pfree(str);
+
+		xmlFree(utf8str);
 	}
 
 	return result;
@@ -3758,6 +3815,7 @@ xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
 	Datum		datum;
 	Oid			datumtype;
 	char	   *result_str;
+	char	   *str = NULL;
 
 	switch (xpathobj->type)
 	{
@@ -3797,7 +3855,20 @@ xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
 		case XPATH_STRING:
 			if (astate == NULL)
 				return 1;
-			datum = CStringGetDatum((char *) xpathobj->stringval);
+
+			/*
+			 * returned string is in UTF8 encoding - should be encoded
+			 * to database encoding first.
+			 */
+			str = (char *) pg_do_encoding_conversion((unsigned char *) xpathobj->stringval,
+													 strlen((char *) xpathobj->stringval),
+													 PG_UTF8,
+													 GetDatabaseEncoding());
+
+elog(NOTICE, ">>>%s<<<", str);
+
+			datum = CStringGetDatum(str);
+
 			datumtype = CSTRINGOID;
 			break;
@@ -3812,6 +3883,7 @@ xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
 			datum = PointerGetDatum(cstring_to_xmltype(result_str));
 			(void) accumArrayResult(astate, datum, false,
 									XMLOID, CurrentMemoryContext);
+
 			return 1;
 	}
@@ -3895,7 +3967,7 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
 				 errmsg("empty XPath expression")));
 
 	string = pg_xmlCharStrndup(datastr, len);
-	xpath_expr = pg_xmlCharStrndup(VARDATA_ANY(xpath_expr_text), xpath_len);
+	xpath_expr = pg_xmlCharUtf8(VARDATA_ANY(xpath_expr_text), xpath_len);
 
 	xmlerrcxt = pg_xml_init(PG_XML_STRICTNESS_ALL);
 
@@ -3911,7 +3983,9 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces, i
Re: [HACKERS] Support for Secure Transport SSL library on macOS as OpenSSL alternative
On Fri, Aug 18, 2017 at 2:14 AM, Daniel Gustafsson wrote:
> Attached is an updated set of patches, rebased on top of master, with
> bug fixes and additional features missing in the first set.  While not
> complete (yet), in case anyone is testing this I’d rather send a fresh
> batch rather than sitting on them too long while I keep hacking at the
> docs.  While not every part of this rather large changeset has been
> touched, this includes all the patches for completeness sake.

Hi,

+#if defined(USE_OPENSSL) || defined(USE_SECURETRANSPORT)
 #define USE_SSL
+#if defined(USE_OPENSSL)
+#define SSL_LIBRARY "OpenSSL"
+#elif defined(USE_SECURETRANSPORT)
+#define SSL_LIBRARY "Secure Transport"
+#endif
 #endif

If you configure with neither --with-securetransport nor --with-openssl
then SSL_LIBRARY finishes up undefined, and then guc.c doesn't compile:

ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
-g -O2 -I. -I. -I../../../../src/include -D_GNU_SOURCE -c -o guc.o guc.c
guc.c:3309:3: error: ‘SSL_LIBRARY’ undeclared here (not in a function)
   SSL_LIBRARY,
   ^~~

I guess it should have a fallback definition, though I don't know what it
should be.

--
Thomas Munro
http://www.enterprisedb.com
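One possible shape for such a fallback, sketched here with an empty string
as an arbitrary placeholder value:

    #if defined(USE_OPENSSL) || defined(USE_SECURETRANSPORT)
    #define USE_SSL
    #if defined(USE_OPENSSL)
    #define SSL_LIBRARY "OpenSSL"
    #elif defined(USE_SECURETRANSPORT)
    #define SSL_LIBRARY "Secure Transport"
    #endif
    #else
    /* no SSL support compiled in; keep guc.c compiling anyway */
    #define SSL_LIBRARY ""
    #endif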
[HACKERS] Re: [COMMITTERS] pgsql: Account for catalog snapshot in PGXACT->xmin updates.
On Tue, Dec 06, 2016 at 01:59:18PM -0500, Robert Haas wrote:
> On Tue, Dec 6, 2016 at 1:36 PM, Tom Lane wrote:
> > Robert Haas writes:
> > > On Tue, Nov 15, 2016 at 3:55 PM, Tom Lane wrote:
> > > > Account for catalog snapshot in PGXACT->xmin updates.
> >
> > > It seems to me that this makes it possible for
> > > ThereAreNoPriorRegisteredSnapshots() to fail spuriously.  The only
> > > core code that calls that function is in copy.c, and apparently we
> > > never reach that point with the catalog snapshot set.  But that
> > > seems fragile.

I recently noticed this by way of the copy2 regression test failing, in
9.4 and later, under REPEATABLE READ and SERIALIZABLE.  That failure
started with $SUBJECT.  Minimal example:

  CREATE TABLE vistest (a text);
  BEGIN ISOLATION LEVEL REPEATABLE READ;
  TRUNCATE vistest;
  COPY vistest FROM stdin CSV FREEZE;
  x
  \.

Before $SUBJECT, the registered snapshot count was one.  Since $SUBJECT,
the catalog snapshot raises the count to two.

> > Hm.  If there were some documentation explaining what
> > ThereAreNoPriorRegisteredSnapshots is supposed to do, it would be
> > possible for us to have a considered argument as to whether this patch
> > broke it or not.  As things stand, it is not obvious whether it ought
> > to be ignoring the catalog snapshot or not; one could construct
> > plausible arguments

The rows COPY FREEZE writes will be visible (until deleted) to every
possible catalog snapshot, so whether CatalogSnapshot==NULL doesn't
matter.  It may be useful to error out if a catalog scan is in-flight, but
the registration for CatalogSnapshot doesn't confirm or refute that.

> > either way.  It's not even very obvious why both 0 and 1 registered
> > snapshots should be considered okay; that looks a lot more like a hack
> > than reasoned-out behavior.  So I'm satisfied if COPY FREEZE does what

Fair summary.  ThereAreNoPriorRegisteredSnapshots() has functioned that
way since an early submission[1], and I don't see that we ever discussed
it explicitly.  Adding Simon for his recollection.

I think the goal was to raise an error if any scan of the COPY-target
table might use an older CommandId; this prevents a way of seeing tuples
that the scan would not see after COPY without FREEZE.  Allowing one
registered snapshot was a satisfactory heuristic at the time.  It rejected
running plain queries, which have been registering two snapshots.  It did
not reject otherwise-idle REPEATABLE READ transactions, which register
FirstXactSnapshot once; that has been okay, because any use of
FirstXactSnapshot in a query pushes or registers a copy.  I think the code
was wrong to consider registered snapshots and ignore ActiveSnapshot,
though.  (We must ignore COPY's own ActiveSnapshot, but it should be the
only one.)  I don't blame $SUBJECT for changing this landscape; COPY
should not attribute meaning to a particular nonzero registered snapshot
count.

> > it's supposed to do, which it still appears to do.
> >
> > IOW, I'm not interested in touching this unless someone first provides
> > adequate documentation for the pre-existing kluge here.  There is no
> > API spec at all on the function itself, and the comment in front of
> > the one call site is too muddle-headed to provide any insight.
>
> COPY FREEZE is designed to be usable only in situations where the new
> tuples won't be visible earlier than would otherwise be the case.
> Thus, copy.c has a check that the relfilenode was created by our
> (sub)transaction, which guards against the possibility that other
> transactions might see the data early, and also a check that there are
> no other registered snapshots or ready portals, which guarantees that,
> for example, a cursor belonging to the current subtransaction but with
> a lower command-ID doesn't see the rows early.  I am fuzzy why we need
> to check for both snapshots and portals, but the overall intent seems

The snapshot check helps in this case:

  CREATE TABLE vistest (a text);
  BEGIN ISOLATION LEVEL READ COMMITTED;
  CREATE FUNCTION copy_vistest() RETURNS void LANGUAGE plpgsql
  AS $$BEGIN COPY vistest FROM '/dev/null' CSV FREEZE; END$$;
  CREATE FUNCTION count_vistest() RETURNS int LANGUAGE plpgsql STABLE
  AS $$BEGIN RETURN (SELECT count(*) FROM vistest); END$$;
  TRUNCATE vistest;
  SELECT copy_vistest(), count_vistest();
  ROLLBACK;

Other sessions still see rows they ideally would not see[2], the same kind
of anomaly the portal and snapshot tests exist to prevent.  It's odd that
we protect visibility solely within the COPY transaction, the transaction
most likely to know what it's doing.  I'm therefore skeptical of having
these tests at all, but I hesitate to remove them in back branches.  Even
in master, we may want them later if we add visibility protection for
other transactions.

I'm tempted to add this hack, ...

--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2445,2 +2445,3 @@ CopyFrom(CopyState cstate)
 	{
+		InvalidateCatalogSnapshot();
Re: [HACKERS] Proposal: global index
Ildar Musin wrote:

> I found the thread about indirect indexes
> (https://www.postgresql.org/message-id/20161018182843.xczrxsa2yd47pnru%40alvherre.pgsql),
> but it seems that it hasn't been updated for some time and I couldn't
> find InMemoryIndexTuple in the latest patch.  Is there a newer version?
> Generally I think this may be a good idea.

Nope, I haven't posted a version with the InMemoryIndexTuple stuff yet.
I'm working on it.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] GSoC 2017: Foreign Key Arrays
I have a concern that after supporting UPDATE/DELETE CASCADE, the
performance would drop.

On Thu, Jul 27, 2017 at 12:54 PM, Alexander Korotkov wrote:
>
> I wonder how an RI trigger may work so fast if it has to do some job
> besides an index search with no results?
>

Best Regards,
Mark Rofail
Re: [HACKERS] Segmentation Fault during pg_restore using '--use-list' and '--jobs'
Fabrízio de Royes Mello writes:
> I'm facing a 'segmentation fault' error using '--use-list' and '--jobs'
> options after update to 9.5.8.

Ooops :-(.  Fixed, thanks for the report.

			regards, tom lane
Re: [HACKERS] One-shot expanded output in psql using \gx
Noah, all,

* Noah Misch (n...@leadboat.com) wrote:
> On Tue, Aug 15, 2017 at 10:24:34PM +0200, Tobias Bussmann wrote:
> > I've tested the new \gx against 10beta and current git HEAD.  Actually
> > one of my favourite features of PostgreSQL 10!  However in my
> > environment it was behaving strangely.  After some debugging I found
> > that \gx does not work if you have \set FETCH_COUNT n before.  Please
> > find attached a patch that fixes this incl. new regression test.
>
> [Action required within three days.  This is a generic notification.]

I'll address this on Tuesday, 8/22.

Thanks!

Stephen
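For anyone wanting to reproduce the reported interaction, a minimal psql
session might look like this (a sketch; exact output depends on the
build):

    \set FETCH_COUNT 100
    SELECT 1 AS a, 2 AS b \gx
    -- expected: expanded ("-[ RECORD 1 ]") output
    -- reported: \gx is silently ignored and ordinary tabular output
    -- appears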
Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?
"MauMau" writes: > I'm thinking of making PostgreSQL a multi-model database by supporting > data models other than the current relational model. A data model > consists of a query language (e.g. SQL for relational model, Cypher > for graph model), a parser and analyzer to transform a query into a > query tree, a planner to transform the query tree into an execution > plan, an executor, and a storage engine. It sounds like what you want is to replace all of Postgres except the name. I'm not clear on the point. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?
On Sat, Aug 19, 2017 at 4:29 PM, MauMau wrote:
> Hello,
>
> Please forgive me for asking such a stupid and rough question.
>
> I'm thinking of making PostgreSQL a multi-model database by supporting
> data models other than the current relational model.  A data model
> consists of a query language (e.g. SQL for the relational model, Cypher
> for the graph model), a parser and analyzer to transform a query into a
> query tree, a planner to transform the query tree into an execution
> plan, an executor, and a storage engine.
>
> To promote data model development, I want to make data models
> pluggable.  The rough sketch is:
>
> 1) A data model developer implements the parser, analyzer, transformer,
> planner, executor, and storage engine functions in a shared library.
>
> 2) The DBA registers the data model.
>
> CREATE QUERY LANGUAGE Cypher (
>     PARSER = ...
> );
>
> CREATE DATA MODEL graph (
>     QUERY LANGUAGE = Cypher,
>     ANALYZER = ...,
>     TRANSFORMER = ...,
>     PLANNER = ...,
>     EXECUTOR = ...,
>     STORAGE ENGINE = ...
> );
>
> CREATE REGION cypher_graph (
>     QUERY LANGUAGE = Cypher,
>     DATA MODEL = graph
> );

Why can't you do all this in a language handler and treat it as a user
defined function?

> The region is just a combination of a query language and a data model,
> much like a locale is a combination of a language and a country.  This
> is because there may be multiple popular query languages for a data
> model.
>
> 3) The application connects to the database, specifying a desired
> region.  The specified region's query language becomes the default
> query language for the session.
>
> The application can use the data of multiple data models in one query
> by specifying another region and its query via in_region().  For
> example, the following query combines the relational restaurant table
> and a social graph to get the five Chinese restaurants in Tokyo that
> are most popular among friends of John and their friends.
>
> SELECT r.name, g.num_likers
> FROM restaurant r,
>      cast_region(
>        in_region('cypher_graph',
>          'MATCH (:Person {name:"John"})-[:IS_FRIEND_OF*1..2]-(friend),
>                 (friend)-[:LIKES]->(restaurant:Restaurant)
>           RETURN restaurant.name, count(*)'),
>        'relational', 'g', '(name text, num_likers int)')
> WHERE r.name = g.name AND
>       r.city = 'Tokyo' AND r.cuisine = 'chinese'
> ORDER BY g.num_likers DESC LIMIT 5;

If you have a language handler for cypher, why do you need in_region or
cast_region?  Why not just have a graph_search() function which takes in
a cypher query and returns a set of records?  (A sketch of what I mean is
below.)

> What do you think would be difficult about making data models pluggable,
> especially related to plugging in the parser, planner, executor, etc.?
> One possible concern is that various PostgreSQL components might be too
> dependent on the data model being relational, and it would be difficult
> to separate the tight coupling.

I guess I am missing why the current language handler structure is not
enough.  Maybe I am missing something?

> Regards
> MauMau

--
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin
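To make that alternative concrete, here is a sketch assuming a
hypothetical procedural language "cypher" installed through a language
handler; none of these names exist today:

    -- hypothetical: the function body is a Cypher query, planned and
    -- executed entirely inside the handler
    CREATE FUNCTION friends_favorite_restaurants(person text)
    RETURNS TABLE (name text, num_likers int)
    LANGUAGE cypher AS $$
        MATCH (:Person {name: $person})-[:IS_FRIEND_OF*1..2]-(friend),
              (friend)-[:LIKES]->(restaurant:Restaurant)
        RETURN restaurant.name, count(*)
    $$;

    -- joined against a relational table like any set-returning function
    SELECT r.name, g.num_likers
    FROM restaurant r
         JOIN friends_favorite_restaurants('John') g ON r.name = g.name
    WHERE r.city = 'Tokyo' AND r.cuisine = 'chinese'
    ORDER BY g.num_likers DESC LIMIT 5;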
[HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?
Hello,

Please forgive me for asking such a stupid and rough question.

I'm thinking of making PostgreSQL a multi-model database by supporting
data models other than the current relational model.  A data model
consists of a query language (e.g. SQL for the relational model, Cypher
for the graph model), a parser and analyzer to transform a query into a
query tree, a planner to transform the query tree into an execution plan,
an executor, and a storage engine.

To promote data model development, I want to make data models pluggable.
The rough sketch is:

1) A data model developer implements the parser, analyzer, transformer,
planner, executor, and storage engine functions in a shared library.

2) The DBA registers the data model.

CREATE QUERY LANGUAGE Cypher (
    PARSER = ...
);

CREATE DATA MODEL graph (
    QUERY LANGUAGE = Cypher,
    ANALYZER = ...,
    TRANSFORMER = ...,
    PLANNER = ...,
    EXECUTOR = ...,
    STORAGE ENGINE = ...
);

CREATE REGION cypher_graph (
    QUERY LANGUAGE = Cypher,
    DATA MODEL = graph
);

The region is just a combination of a query language and a data model,
much like a locale is a combination of a language and a country.  This is
because there may be multiple popular query languages for a data model.

3) The application connects to the database, specifying a desired region.
The specified region's query language becomes the default query language
for the session.

The application can use the data of multiple data models in one query by
specifying another region and its query via in_region().  For example,
the following query combines the relational restaurant table and a social
graph to get the five Chinese restaurants in Tokyo that are most popular
among friends of John and their friends.

SELECT r.name, g.num_likers
FROM restaurant r,
     cast_region(
       in_region('cypher_graph',
         'MATCH (:Person {name:"John"})-[:IS_FRIEND_OF*1..2]-(friend),
                (friend)-[:LIKES]->(restaurant:Restaurant)
          RETURN restaurant.name, count(*)'),
       'relational', 'g', '(name text, num_likers int)')
WHERE r.name = g.name AND
      r.city = 'Tokyo' AND r.cuisine = 'chinese'
ORDER BY g.num_likers DESC LIMIT 5;

What do you think would be difficult about making data models pluggable,
especially related to plugging in the parser, planner, executor, etc.?
One possible concern is that various PostgreSQL components might be too
dependent on the data model being relational, and it would be difficult
to separate the tight coupling.

Regards
MauMau
Re: [HACKERS] recovery_target_time = 'now' is not an error but still impractical setting
On 2017-08-18 20:51, Robert Haas wrote:
> On Fri, Aug 18, 2017 at 4:39 AM, Piotr Stefaniak wrote:
>> On 2017-08-17 11:24, Simon Riggs wrote:
>>> Your suggestion of "furthest" is already the default behaviour.
>>>
>>> Why are you using 'now'?  Why would you want to pick a randomly
>>> selected end time?
>>
>> The idea in both cases was to stop the recovery in order to prevent
>> the standby from selecting a new timeline.  I want to be able to
>> continue the recovery from future WAL files.  "Furthest" really meant
>> "as far as possible without completing recovery".
>
> Can you use recovery_target_action='shutdown' instead?

The thing I've tried was a combination of recovery_target_action =
'shutdown' and recovery_target_time = 'now'.  Do you mean
recovery_target_action = 'shutdown' and everything else unset (default)?
That leads to the standby selecting a new timeline anyway.  Adding
standby_mode = on prevents that, but then there is no shutdown.  Am I
missing something?

The only way I've managed to get recovery_target_action = 'shutdown' to
work as I needed was to follow advice by Matthijs and Christoph to
combine recovery_target_action with either setting recovery_target_time
to "now" spelled in the usual, non-special way or setting
recovery_target_xid to the latest transaction ID pg_xlogdump could find.
You could also try recovery_target_lsn, but I don't have that in 9.4.

I don't like that line of thought though, so for now I'll use the
restore_command hack I mentioned in the first message of this thread.
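For reference, the working combination described above would look roughly
like this in recovery.conf; the path and timestamp are placeholders, and
recovery_target_action needs 9.5 or later:

    restore_command = 'cp /path/to/archive/%f "%p"'
    recovery_target_time = '2017-08-20 12:00:00+02'  # "now" spelled explicitly
    recovery_target_action = 'shutdown'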
Re: [HACKERS] Proposal: global index
On 18/08/2017 17:40, Alvaro Herrera wrote:
> Ildar Musin wrote:
>
>> While we've been developing the pg_pathman extension, one of the most
>> frequent questions we got from our users was about global index
>> support.  We cannot provide it within an extension.  And I couldn't
>> find any recent discussion about someone implementing it.  So I'm
>> thinking about giving it a shot and starting to work on a patch for
>> postgres.
>>
>> One possible solution is to create an extended version of the item
>> pointer which would store the relation oid along with the block number
>> and position:
>
> I've been playing with the index code in order to allow indirect
> tuples, which are stored in a format different from IndexTupleData.
>
> I've been adding an "InMemoryIndexTuple" (maybe there's a better name)
> which internally has pointers to both IndexTupleData and
> IndirectIndexTupleData, which makes it easier to pass around the index
> tuple in either format.  It's very easy to add an OID to that struct,
> which then allows including the OID in either an indirect index tuple
> or a regular one.
>
> Then, wherever we're using IndexTupleData in the index AM code, we
> would replace it with InMemoryIndexTuple.  This should satisfy both
> your use case and mine.

I found the thread about indirect indexes
(https://www.postgresql.org/message-id/20161018182843.xczrxsa2yd47pnru%40alvherre.pgsql),
but it seems that it hasn't been updated for some time and I couldn't
find InMemoryIndexTuple in the latest patch.  Is there a newer version?
Generally I think this may be a good idea.

--
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
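For illustration, the "extended item pointer" idea quoted above could
look something like this; the struct and field names are guesses, not
taken from a patch:

    /* hypothetical global-index TID: an ordinary TID plus the heap's OID */
    typedef struct GlobalItemPointerData
    {
        Oid          ip_relid;   /* OID of the partition (heap relation) */
        BlockIdData  ip_blkid;   /* block number, as in ItemPointerData */
        OffsetNumber ip_posid;   /* line pointer offset within the block */
    } GlobalItemPointerData;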
Re: [HACKERS] Proposal: global index
On 18/08/2017 17:45, Alvaro Herrera wrote:
> Erikjan Rijkers wrote:
>> On 2017-08-18 11:12, Ildar Musin wrote:
>>> Hi hackers,
>>>
>>> While we've been developing the pg_pathman extension, one of the most
>>> frequent questions we got from our users was about global index
>>> support.  We cannot provide it within an extension.  And I couldn't
>>> find any recent discussion about someone implementing it.  So I'm
>>> thinking about giving it a shot and starting to work on a patch for
>>> postgres.
>>
>> Sorry to be dense but what exactly is a "Global Index"?
>
> A global index covers all partitions of a partitioned table.  It allows
> you to have unique indexes across the partitioned table.
>
> The main disadvantage of global indexes is that you need some kind of
> cleanup after you drop a partition.  Either make partition drop wait
> until all the index pointers are removed, or you need some kind of
> after-commit cleanup process that removes them afterwards (which
> requires some assurance that they are really all gone).  You can't let
> them linger forever, or you risk a new partition that reuses the same
> OID causing the whole index to become automatically corrupt.

Thanks for pointing that out; I haven't thought this through yet.  We
could probably keep the list of removed partitions somewhere in the
catalog or in the index itself.  Usually autovacuum (or the process we
start right after a partition drop, as you suggested) would clean the
index up before OID wraparound.  But if it didn't, and a user is trying
to add a new partition with the same OID, then the cleanup would be
forced.  I think the latter is very unlikely.

--
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [HACKERS] possible encoding issues with libxml2 functions
>
> Isn't the most correct solution to call the xml_parse function?
>

I am replying to myself.  Probably not.

Now I think I have found the reason for this issue.

The document processed in xpath_internal is passed to libXML2 by

    doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);

We don't pass an encoding parameter, so libXML2 expects "UTF8" or a
correct encoding declaration in the XML document.  When we pass an
incorrect document - XML in the database encoding, but with the original
encoding declaration - it fails.

The regression test can look like yours (but all chars there are valid):

postgres=# do $$
declare str text;
begin
  if current_setting('server_encoding') <> 'UTF8' then return; end if;
  str = '<?xml version="1.0" encoding="latin2"?><enprimeur><vino><id>909</id>'
     || convert_from('\xc588', 'UTF8')
     || '</vino></enprimeur>';
  raise notice '%', xpath('/enprimeur/vino/id', str::xml);
end;
$$;
ERROR:  could not parse XML document
DETAIL:  input conversion failed due to input error, bytes 0x88 0x3C 0x2F 0x72
line 1: switching encoding: encoder error
CONTEXT:  PL/pgSQL function inline_code_block line 8 at RAISE

After the correct fix:

    doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL,
                            pg_encoding_to_char(GetDatabaseEncoding()), 0);

it is working:

postgres=# do $$
declare str text;
begin
  if current_setting('server_encoding') <> 'UTF8' then return; end if;
  str = '<?xml version="1.0" encoding="latin2"?><enprimeur><vino><id>909</id>'
     || convert_from('\xc588', 'UTF8')
     || '</vino></enprimeur>';
  raise notice '%', xpath('/enprimeur/vino/id', str::xml);
end;
$$;
NOTICE:  {<id>909</id>}
DO

This fix should be applied to the xmltable function too.  Patch attached.

It doesn't fix the xpath and xmltable function issues when the server
encoding is not UTF8.  It looks like the XPath functions from libXML2
require UTF8-encoded strings and the result is in UTF8 too - so the
result should be recoded to the server encoding.  I didn't find any info
on how to enable libXML2 XPath functions for encodings other than
UTF8 :( ??

Regards

Pavel

> Regards
>
> Pavel

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index c47624eff6..b09ce23cdb 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3911,7 +3911,9 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
 	if (ctxt == NULL || xmlerrcxt->err_occurred)
 		xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
 					"could not allocate parser context");
-	doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
+	doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL,
+							pg_encoding_to_char(GetDatabaseEncoding()), 0);
+
 	if (doc == NULL || xmlerrcxt->err_occurred)
 		xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
 					"could not parse XML document");
@@ -4242,18 +4244,14 @@ XmlTableSetDocument(TableFuncScanState *state, Datum value)
 
 	xtCxt = GetXmlTableBuilderPrivateData(state, "XmlTableSetDocument");
 
-	/*
-	 * Use out function for casting to string (remove encoding property). See
-	 * comment in xml_out.
-	 */
-	str = xml_out_internal(xmlval, 0);
-
-	length = strlen(str);
+	str = VARDATA(xmlval);
+	length = VARSIZE(xmlval) - VARHDRSZ;
 	xstr = pg_xmlCharStrndup(str, length);
 
 	PG_TRY();
 	{
-		doc = xmlCtxtReadMemory(xtCxt->ctxt, (char *) xstr, length, NULL, NULL, 0);
+		doc = xmlCtxtReadMemory(xtCxt->ctxt, (char *) xstr, length, NULL,
+								pg_encoding_to_char(GetDatabaseEncoding()), 0);
 		if (doc == NULL || xtCxt->xmlerrcxt->err_occurred)
 			xml_ereport(xtCxt->xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
 						"could not parse XML document");
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index bcc585d427..6a43896d40 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1452,3 +1452,24 @@ SELECT xmltable.* FROM xmltest2, LATERAL xmltable(('/d/r/' || lower(_path) || 'c
  14
 (4 rows)
 
+-- XML is saved in database encoding with original encoding declaration.
+-- There can be incosistency based on wrong user input, different server/client
+-- encoding or reading XML with recv function. All XML functions should to
+-- work with this partially broken XML.
+DO $$
+DECLARE str text;
+BEGIN
+  -- leave early without error, when we are not sure about result of conversion
+  IF current_setting('server_encoding') NOT IN ('UTF8', 'LATIN2') THEN return; END IF;
+
+  -- build valid UTF8 XML with broken encoding declaration
+  str = '<?xml version="1.0" encoding="windows-1250"?><enprimeur><vino><id>909</id>'
+     || convert_from('\xf2', 'windows-1250')
+     || '</vino></enprimeur>';
+
+  -- should to work
+  RAISE NOTICE '%', xpath('/enprimeur/vino/id', str::xml);
+  RAISE NOTICE '%', (SELECT id FROM xmltable('/enprimeur/vino' PASSING (str::xml) COLUMNS id int));
+END; $$;
+NOTICE:  {<id>909</id>}
+NOTICE:  909
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index eb4687fb09..97a3aa9de2 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -558,3 +558,23 @@ INSERT INTO xmltest2 VALUES('2', 'D');