Re: [HACKERS] possible encoding issues with libxml2 functions
2017-08-20 4:17 GMT+02:00 Noah Misch:

> On Fri, Aug 18, 2017 at 11:43:19PM +0200, Pavel Stehule wrote:
> > yes, probably libXML2 tries to check the conversion from UTF8 to the
> > encoding specified in the header.
>
> Yes.  That has been the topic of this thread.
>
> > a) all values created by the xml_in interface are in the database
> > encoding - the input string is stored without any change. xml_parse is
> > called only for validation.
> >
> > b) inside xml_parse, the input is converted to UTF8, and the document
> > is read by xmlCtxtReadDoc with explicitly specified "UTF-8" encoding,
> > or by xmlParseBalancedChunkMemory with explicitly specified encoding
> > "UTF8" and the decl section removed.
> >
> > So for the "xml_parse" based functions (xml_in, texttoxml,
> > xml_is_document, wellformated_xml) the database encoding is not
> > important.
> >
> > c) the xml_recv function does validation via xml_parse and translation
> > to the database encoding.
> >
> > Now I don't see a difference between @b and @c - so my hypothesis
> > about the necessity of using the recv interface was wrong.
>
> Yes.  You posted, on 2017-04-05, a test case not requiring the recv
> interface.
>
> On Sat, Aug 19, 2017 at 09:13:50AM +0200, Pavel Stehule wrote:
> > I didn't find any info on how to enable libXML2 XPath functions for an
> > encoding other than UTF8 :( ??
>
> http://xmlsoft.org/encoding.html is the relevant authority.  To
> summarize, we should send only UTF8 to libxml2.

libxml2 converts XML to UTF8 by itself; everything else must already be in
UTF8.  I found some references to the xmlSwitchEncoding function, but I
didn't find any examples of its usage - probably nobody uses it.  The
result is always in UTF8.

> On Sat, Aug 19, 2017 at 10:53:19PM +0200, Pavel Stehule wrote:
> > I am sending a POC - it supports XPATH and XMLTABLE for non-UTF8
> > server encodings.
> >
> > In this case, all strings should be converted to UTF8 before calling
> > libXML2 functions, and the result should be converted back from UTF8.
>
> Adding support for xpath in non-UTF8 databases is a v11 feature
> proposal.  Please start a new thread for this, and add it to the open
> CommitFest.
>
> In this thread, would you provide the version of your patch that I
> described in my 2017-08-08 post to this thread?  That's a
> back-patchable bug fix.

There are three issues:

1. Processing XML documents in a single-byte encoding that carry an
encoding declaration - this should be fixed by
ecoding_for_xmlCtxtReadMemory.patch.  This patch is very short and safe
and can be applied immediately (it includes regression tests).

2. Encoding issues in the XPath specification (and namespaces) - because
multibyte characters are rarely used in tag names, this issue hits few
users.

3. Encoding issues in XPath and XMLTABLE results - this is a bad one: the
XMLTABLE function is not functional on non-UTF8 databases.  Fortunately,
fewer users run such encodings, but it should probably be applied as a
fix in Postgres 10/11.

> > I found some previous experiments
> > https://marc.info/?l=pgsql-bugs&m=123407176408688
>
> https://wiki.postgresql.org/wiki/Todo#XML links to other background on
> this feature proposal.  See Tom Lane's review of a previous patch.
> Ensure your patch does not have the problems he found during that
> review.  Do that before starting a thread for this feature.

Good information - thank you.  I'll start a new thread for issues @2 and
@3.  I am not sure I can prepare a good enough patch for the next
commitfest; a committer can later decide whether to backpatch it.

Regards

Pavel

> Thanks,
> nm
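For reference, a rough and untested sketch of how xmlSwitchEncoding might
be called, based only on the declarations in libxml2's parserInternals.h;
it forces a parser context to treat its input as LATIN1, overriding any
in-document encoding declaration.  Nobody in this thread has run this, so
the names and behavior should be verified against the libxml2 headers:

    #include <libxml/parser.h>
    #include <libxml/parserInternals.h>

    static xmlDocPtr
    parse_as_latin1(const char *buf, int len)
    {
        xmlParserCtxtPtr ctxt = xmlCreateMemoryParserCtxt(buf, len);
        xmlDocPtr   doc = NULL;

        if (ctxt == NULL)
            return NULL;

        /* override whatever <?xml ... encoding=...?> declares */
        xmlSwitchEncoding(ctxt, XML_CHAR_ENCODING_8859_1);

        if (xmlParseDocument(ctxt) == 0)
            doc = ctxt->myDoc;  /* caller must xmlFreeDoc() this */

        xmlFreeParserCtxt(ctxt);
        return doc;
    }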
Re: [HACKERS] possible encoding issues with libxml2 functions
On Fri, Aug 18, 2017 at 11:43:19PM +0200, Pavel Stehule wrote:
> yes, probably libXML2 tries to check the conversion from UTF8 to the
> encoding specified in the header.

Yes.  That has been the topic of this thread.

> a) all values created by the xml_in interface are in the database
> encoding - the input string is stored without any change. xml_parse is
> called only for validation.
>
> b) inside xml_parse, the input is converted to UTF8, and the document is
> read by xmlCtxtReadDoc with explicitly specified "UTF-8" encoding, or by
> xmlParseBalancedChunkMemory with explicitly specified encoding "UTF8"
> and the decl section removed.
>
> So for the "xml_parse" based functions (xml_in, texttoxml,
> xml_is_document, wellformated_xml) the database encoding is not
> important.
>
> c) the xml_recv function does validation via xml_parse and translation
> to the database encoding.
>
> Now I don't see a difference between @b and @c - so my hypothesis about
> the necessity of using the recv interface was wrong.

Yes.  You posted, on 2017-04-05, a test case not requiring the recv
interface.

On Sat, Aug 19, 2017 at 09:13:50AM +0200, Pavel Stehule wrote:
> I didn't find any info on how to enable libXML2 XPath functions for an
> encoding other than UTF8 :( ??

http://xmlsoft.org/encoding.html is the relevant authority.  To summarize,
we should send only UTF8 to libxml2.

On Sat, Aug 19, 2017 at 10:53:19PM +0200, Pavel Stehule wrote:
> I am sending a POC - it supports XPATH and XMLTABLE for non-UTF8 server
> encodings.
>
> In this case, all strings should be converted to UTF8 before calling
> libXML2 functions, and the result should be converted back from UTF8.

Adding support for xpath in non-UTF8 databases is a v11 feature proposal.
Please start a new thread for this, and add it to the open CommitFest.

In this thread, would you provide the version of your patch that I
described in my 2017-08-08 post to this thread?  That's a back-patchable
bug fix.

> I found some previous experiments
> https://marc.info/?l=pgsql-bugs&m=123407176408688

https://wiki.postgresql.org/wiki/Todo#XML links to other background on
this feature proposal.  See Tom Lane's review of a previous patch.  Ensure
your patch does not have the problems he found during that review.  Do
that before starting a thread for this feature.

Thanks,
nm
Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?
From: Chris Travers
> Why can't you do all this in a language handler and treat it as a user
> defined function?
> ...
> If you have a language handler for cypher, why do you need in_region or
> cast_region?  Why not just have a graph_search() function which takes in
> a cypher query and returns a set of records?

The language handler is for *stored* functions.  A user-defined function
(UDF) doesn't participate in the planning of the outer (top-level) query.
And both assume that they are executed in SQL commands.

I want the data models to meet these requirements:

1) The query language can be used as a top-level session language.  For
example, if an app specifies "region=cypher_graph" at database connection,
it can use the database as a graph database and submit Cypher queries
without embedding them in SQL.

2) When a query contains multiple query fragments of different data
models, all those fragments are parsed and planned before execution.  The
planner comes up with the best plan, crossing the data model boundary.
Take the query example in my first mail, which joins a relational table
and the result of a graph query: the relational planner considers how to
scan the table, the graph planner considers how to search the graph, and
the relational planner considers how to join the two fragments.

So in_region() and cast_region() are not functions to be executed during
the execution phase; they are syntax constructs that are converted, during
the analysis phase, into calls to another region's parser/analyzer and an
inter-model cast routine.

1. The relational parser finds in_region('cypher_graph', 'graph query')
and produces a parse node InRegion(region_name, query) in the parse tree.

2. The relational analyzer looks up the system catalog to check whether
the specified region exists, then calls its parser/analyzer to produce the
query tree for the graph query fragment.  The relational analyzer attaches
the graph query tree to the InRegion node.

3. When the relational planner finds the graph query tree, it passes it to
the graph planner to produce the graph execution plan.

4. The relational planner produces a join plan node, based on the
costs/statistics of the relational table scan and the graph query.  The
graph execution plan is attached to the join plan node.

The parse/query/plan nodes have a label denoting a region, so that the
appropriate region's routines can be called (a rough sketch of such a node
follows below).

Regards
MauMau
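A rough sketch of the InRegion parse node described above, following
PostgreSQL's node conventions; the field names are illustrative guesses
based on this description, not taken from any patch:

    /* hypothetical parse node for in_region(); names are illustrative */
    typedef struct InRegion
    {
        NodeTag     type;           /* a new node tag, e.g. T_InRegion */
        char       *region_name;    /* target region, e.g. "cypher_graph" */
        char       *query;          /* fragment in the region's language */
        Node       *region_query;   /* query tree produced by the region's
                                     * analyzer; filled in during analysis */
    } InRegion;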
Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)
Noah Misch wrote:
> I think you're contending that, as formulated, this is not a valid v10
> open item.  Are you?

As the person who came up with this formulation, I'd like to give a quick
summary of my current understanding of the item's status:

* We're in agreement that we ought to have initdb create initial
collations based on ICU locales, not based on distinct ICU collations [1].

* We're in agreement that variant keywords should not be created for each
base locale/collation [2].

Once these two changes are made, I think that everything will be in good
shape as far as pg_collation name stability goes.  It shouldn't take
Peter E. long to write the patch.  I'm happy to write the patch on his
behalf if that saves time.

We're also going to work on the documentation, to make keyword variants
like -emoji and -traditional at least somewhat discoverable, and to
explain the capabilities of custom ICU collations more generally.

[1] https://postgr.es/m/f67f36d7-ceb6-cfbd-28d4-413c6d22f...@2ndquadrant.com
[2] https://postgr.es/m/3862d484-f0a5-9eef-c54e-3f6808338...@2ndquadrant.com

--
Peter Geoghegan
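For context: even with the initdb changes above, a keyword variant that is
no longer pre-created can still be defined by hand using the documented
v10 syntax.  A sketch (the collation name here is arbitrary):

    -- custom ICU collation using a collation-keyword variant
    CREATE COLLATION german_phonebook
        (provider = icu, locale = 'de-u-co-phonebk');

    -- and use it
    SELECT 'Göbel' < 'Goldmann' COLLATE "german_phonebook";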
Re: [HACKERS] Support for Secure Transport SSL library on macOS as OpenSSL alternative
On Sun, Aug 20, 2017 at 8:10 AM, Thomas Munro wrote:
> On Fri, Aug 18, 2017 at 2:14 AM, Daniel Gustafsson wrote:
>> Attached is an updated set of patches, rebased on top of master, with
>> bug fixes and additional features missing in the first set.  While not
>> complete (yet), in case anyone is testing this I’d rather send a fresh
>> batch rather than sitting on them too long while I keep hacking at the
>> docs.  While not every part of this rather large changeset has been
>> touched, this includes all the patches for completeness sake.
>
> Hi,
>
> +#if defined(USE_OPENSSL) || defined(USE_SECURETRANSPORT)
>  #define USE_SSL
> +#if defined(USE_OPENSSL)
> +#define SSL_LIBRARY "OpenSSL"
> +#elif defined(USE_SECURETRANSPORT)
> +#define SSL_LIBRARY "Secure Transport"
> +#endif
>  #endif
>
> If you configure with neither --with-securetransport nor --with-openssl
> then SSL_LIBRARY finishes up undefined, and then guc.c doesn't compile:
>
> ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
> -Wformat-security -fno-strict-aliasing -fwrapv
> -fexcess-precision=standard -g -O2 -I. -I. -I../../../../src/include
> -D_GNU_SOURCE -c -o guc.o guc.c
> guc.c:3309:3: error: ‘SSL_LIBRARY’ undeclared here (not in a function)
>    SSL_LIBRARY,
>    ^~~
>
> I guess it should have a fallback definition, though I don't know what
> it should be.

Or maybe the guc should only exist if SSL_LIBRARY is defined?  I mean
#if defined(SSL_LIBRARY) around this:

+	{
+		/* Can't be set in postgresql.conf */
+		{"ssl_library", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows the SSL library used."),
+			NULL,
+			GUC_REPORT | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&ssl_library_string,
+		SSL_LIBRARY,
+		NULL, NULL, NULL
+	},

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] possible encoding issues with libxml2 functions
Hi

2017-08-19 22:53 GMT+02:00 Pavel Stehule:

> Hi
>
> I am sending a POC - it supports XPATH and XMLTABLE for non-UTF8 server
> encodings.
>
> In this case, all strings should be converted to UTF8 before calling
> libXML2 functions, and the result should be converted back from UTF8.
>
> I found some previous experiments
> https://marc.info/?l=pgsql-bugs&m=123407176408688
>
> Note: I got some information that the xmlNodeDump function used here is
> deprecated - so we should replace it sometime too.
>
> Regards

I forgot a debug elog in the previous patch.

> Pavel

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index c47624eff6..a43cf13d16 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -147,6 +147,7 @@ static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
 					  ArrayBuildState *astate,
 					  PgXmlErrorContext *xmlerrcxt);
 static xmlChar *pg_xmlCharStrndup(char *str, size_t len);
+static xmlChar *pg_xmlCharUtf8(char *str, size_t len);
 #endif							/* USE_LIBXML */
 
 static void xmldata_root_element_start(StringInfo result, const char *eltname,
@@ -459,8 +460,28 @@ cstring_to_xmltype(const char *string)
 static xmltype *
 xmlBuffer_to_xmltype(xmlBufferPtr buf)
 {
-	return (xmltype *) cstring_to_text_with_len((const char *) xmlBufferContent(buf),
+	if (GetDatabaseEncoding() != PG_UTF8)
+	{
+		char	   *utf8str = (char *) xmlBufferContent(buf);
+		char	   *str;
+		xmltype    *result;
+
+		str = (char *) pg_do_encoding_conversion((unsigned char *) utf8str,
+												 xmlBufferLength(buf),
+												 PG_UTF8,
+												 GetDatabaseEncoding());
+
+		Assert(str != utf8str);
+		result = (xmltype *) cstring_to_text(str);
+		pfree(str);
+
+		return result;
+	}
+	else
+	{
+		return (xmltype *) cstring_to_text_with_len((const char *) xmlBufferContent(buf),
 												xmlBufferLength(buf));
+	}
 }
 #endif
@@ -1176,6 +1197,28 @@ pg_xmlCharStrndup(char *str, size_t len)
 }
 
 /*
+ * LibXML2 internal encoding is UTF8. Sometimes LibXML2 enforce
+ * encoding to UTF8 by self, sometimes it expects UTF8 strings.
+ * This function is used for encoding from database encoding to
+ * UTF8.
+ */
+static xmlChar *
+pg_xmlCharUtf8(char *str, size_t len)
+{
+	char	   *result;
+
+	result = (char *) pg_do_encoding_conversion((unsigned char *) str,
+												len,
+												GetDatabaseEncoding(),
+												PG_UTF8);
+
+	if (result != str)
+		return BAD_CAST result;
+
+	return pg_xmlCharStrndup(str, len);
+}
+
+/*
  * str is the null-terminated input string.  Remaining arguments are
  * output arguments; each can be NULL if value is not wanted.
  * version and encoding are returned as locally-palloc'd strings.
@@ -3714,9 +3757,16 @@ xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
 	}
 	else
 	{
-		xmlChar    *str;
+		xmlChar    *utf8str;
+		char	   *str = NULL;
+
+		utf8str = xmlXPathCastNodeToString(cur);
+
+		str = (char *) pg_do_encoding_conversion((unsigned char *) utf8str,
+												 strlen((char *) utf8str),
+												 PG_UTF8,
+												 GetDatabaseEncoding());
 
-		str = xmlXPathCastNodeToString(cur);
 		PG_TRY();
 		{
 			/* Here we rely on XML having the same representation as TEXT */
@@ -3727,11 +3777,18 @@ xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
 		}
 		PG_CATCH();
 		{
-			xmlFree(str);
+			if (str != (char *) utf8str)
+				pfree(str);
+
+			xmlFree(utf8str);
 			PG_RE_THROW();
 		}
 		PG_END_TRY();
-		xmlFree(str);
+
+		if (str != (char *) utf8str)
+			pfree(str);
+
+		xmlFree(utf8str);
 	}
 
 	return result;
@@ -3758,6 +3815,7 @@ xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
 	Datum		datum;
 	Oid			datumtype;
 	char	   *result_str;
+	char	   *str = NULL;
 
 	switch (xpathobj->type)
 	{
@@ -3797,7 +3855,18 @@ xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
 		case XPATH_STRING:
 			if (astate == NULL)
 				return 1;
-			datum = CStringGetDatum((char *) xpathobj->stringval);
+
+			/*
+			 * returned string is in UTF8 encoding - should be encoded
+			 * to database encoding first.
+			 */
+			str = (char *) pg_do_encoding_conversion((unsigned char *) xpathobj->stringval,
+													 strlen((char *) xpathobj->stringval),
+													 PG_UTF8,
+													 GetDatabaseEncoding());
+
+			datum = CStringGetDatum(str);
+
 			datumtype = CSTRINGOID;
 			break;
@@ -3812,6 +3881,7 @@ xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
 			datum = PointerGetDatum(cstring_to_xmltype(result_str));
 			(void) accumArrayResult(astate, datum, false,
 									XMLOID, CurrentMemoryContext);
+
 			return 1;
 	}
@@ -3895,7 +3965,7 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
 				 errmsg("empty XPath expression")));
 
 	string = pg_xmlCharStrndup(datastr, len);
-	xpath_expr = pg_xmlCharStrndup(VARDATA_ANY(xpath_expr_text), xpath_len);
+	xpath_expr = pg_xmlCharUtf8(VARDATA_ANY(xpath_expr_text), xpath_len);
 
 	xmlerrcxt = pg_xml_init(PG_XML_STRICTNESS_ALL);
 
@@ -3911
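A quick manual way to exercise the POC on a non-UTF8 (say LATIN2) database
could look like the following; this is only an illustration, not part of
the patch's regression tests:

    -- both calls go through libxml2's XPath code and return non-ASCII
    -- text, so they only work once results are recoded from UTF8 back
    -- to the server encoding
    SELECT xpath('/a/text()', '<a>příliš žluťoučký</a>'::xml);

    SELECT *
      FROM xmltable('/root/row'
                    PASSING '<root><row><b>ž</b></row></root>'::xml
                    COLUMNS b text);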
Re: [HACKERS] possible encoding issues with libxml2 functions
Hi

I am sending a POC - it supports XPATH and XMLTABLE for non-UTF8 server
encodings.

In this case, all strings should be converted to UTF8 before calling
libXML2 functions, and the result should be converted back from UTF8.

I found some previous experiments
https://marc.info/?l=pgsql-bugs&m=123407176408688

Note: I got some information that the xmlNodeDump function used here is
deprecated - so we should replace it sometime too.

Regards

Pavel

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index c47624eff6..d02cec88ab 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -147,6 +147,7 @@ static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
 					  ArrayBuildState *astate,
 					  PgXmlErrorContext *xmlerrcxt);
 static xmlChar *pg_xmlCharStrndup(char *str, size_t len);
+static xmlChar *pg_xmlCharUtf8(char *str, size_t len);
 #endif							/* USE_LIBXML */
 
 static void xmldata_root_element_start(StringInfo result, const char *eltname,
@@ -459,8 +460,28 @@ cstring_to_xmltype(const char *string)
 static xmltype *
 xmlBuffer_to_xmltype(xmlBufferPtr buf)
 {
-	return (xmltype *) cstring_to_text_with_len((const char *) xmlBufferContent(buf),
+	if (GetDatabaseEncoding() != PG_UTF8)
+	{
+		char	   *utf8str = (char *) xmlBufferContent(buf);
+		char	   *str;
+		xmltype    *result;
+
+		str = (char *) pg_do_encoding_conversion((unsigned char *) utf8str,
+												 xmlBufferLength(buf),
+												 PG_UTF8,
+												 GetDatabaseEncoding());
+
+		Assert(str != utf8str);
+		result = (xmltype *) cstring_to_text(str);
+		pfree(str);
+
+		return result;
+	}
+	else
+	{
+		return (xmltype *) cstring_to_text_with_len((const char *) xmlBufferContent(buf),
 												xmlBufferLength(buf));
+	}
 }
 #endif
@@ -1176,6 +1197,28 @@ pg_xmlCharStrndup(char *str, size_t len)
 }
 
 /*
+ * LibXML2 internal encoding is UTF8. Sometimes LibXML2 enforce
+ * encoding to UTF8 by self, sometimes it expects UTF8 strings.
+ * This function is used for encoding from database encoding to
+ * UTF8.
+ */
+static xmlChar *
+pg_xmlCharUtf8(char *str, size_t len)
+{
+	char	   *result;
+
+	result = (char *) pg_do_encoding_conversion((unsigned char *) str,
+												len,
+												GetDatabaseEncoding(),
+												PG_UTF8);
+
+	if (result != str)
+		return BAD_CAST result;
+
+	return pg_xmlCharStrndup(str, len);
+}
+
+/*
  * str is the null-terminated input string.  Remaining arguments are
  * output arguments; each can be NULL if value is not wanted.
  * version and encoding are returned as locally-palloc'd strings.
@@ -3714,9 +3757,16 @@ xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
 	}
 	else
 	{
-		xmlChar    *str;
+		xmlChar    *utf8str;
+		char	   *str = NULL;
+
+		utf8str = xmlXPathCastNodeToString(cur);
+
+		str = (char *) pg_do_encoding_conversion((unsigned char *) utf8str,
+												 strlen((char *) utf8str),
+												 PG_UTF8,
+												 GetDatabaseEncoding());
 
-		str = xmlXPathCastNodeToString(cur);
 		PG_TRY();
 		{
 			/* Here we rely on XML having the same representation as TEXT */
@@ -3727,11 +3777,18 @@ xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
 		}
 		PG_CATCH();
 		{
-			xmlFree(str);
+			if (str != (char *) utf8str)
+				pfree(str);
+
+			xmlFree(utf8str);
 			PG_RE_THROW();
 		}
 		PG_END_TRY();
-		xmlFree(str);
+
+		if (str != (char *) utf8str)
+			pfree(str);
+
+		xmlFree(utf8str);
 	}
 
 	return result;
@@ -3758,6 +3815,7 @@ xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
 	Datum		datum;
 	Oid			datumtype;
 	char	   *result_str;
+	char	   *str = NULL;
 
 	switch (xpathobj->type)
 	{
@@ -3797,7 +3855,20 @@ xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
 		case XPATH_STRING:
 			if (astate == NULL)
 				return 1;
-			datum = CStringGetDatum((char *) xpathobj->stringval);
+
+			/*
+			 * returned string is in UTF8 encoding - should be encoded
+			 * to database encoding first.
+			 */
+			str = (char *) pg_do_encoding_conversion((unsigned char *) xpathobj->stringval,
+													 strlen((char *) xpathobj->stringval),
+													 PG_UTF8,
+													 GetDatabaseEncoding());
+
+elog(NOTICE, ">>>%s<<<", str);
+
+			datum = CStringGetDatum(str);
+
 			datumtype = CSTRINGOID;
 			break;
@@ -3812,6 +3883,7 @@ xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
 			datum = PointerGetDatum(cstring_to_xmltype(result_str));
 			(void) accumArrayResult(astate, datum, false,
 									XMLOID, CurrentMemoryContext);
+
 			return 1;
 	}
@@ -3895,7 +3967,7 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
 				 errmsg("empty XPath expression")));
 
 	string = pg_xmlCharStrndup(datastr, len);
-	xpath_expr = pg_xmlCharStrndup(VARDATA_ANY(xpath_expr_text), xpath_len);
+	xpath_expr = pg_xmlCharUtf8(VARDATA_ANY(xpath_expr_text), xpath_len);
 
 	xmlerrcxt = pg_xml_init(PG_XML_STRICTNESS_ALL);
 
@@ -3911,7 +3983,9 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces, i
Re: [HACKERS] Support for Secure Transport SSL library on macOS as OpenSSL alternative
On Fri, Aug 18, 2017 at 2:14 AM, Daniel Gustafsson wrote:
> Attached is an updated set of patches, rebased on top of master, with
> bug fixes and additional features missing in the first set.  While not
> complete (yet), in case anyone is testing this I’d rather send a fresh
> batch rather than sitting on them too long while I keep hacking at the
> docs.  While not every part of this rather large changeset has been
> touched, this includes all the patches for completeness sake.

Hi,

+#if defined(USE_OPENSSL) || defined(USE_SECURETRANSPORT)
 #define USE_SSL
+#if defined(USE_OPENSSL)
+#define SSL_LIBRARY "OpenSSL"
+#elif defined(USE_SECURETRANSPORT)
+#define SSL_LIBRARY "Secure Transport"
+#endif
 #endif

If you configure with neither --with-securetransport nor --with-openssl
then SSL_LIBRARY finishes up undefined, and then guc.c doesn't compile:

ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
-g -O2 -I. -I. -I../../../../src/include -D_GNU_SOURCE -c -o guc.o guc.c
guc.c:3309:3: error: ‘SSL_LIBRARY’ undeclared here (not in a function)
   SSL_LIBRARY,
   ^~~

I guess it should have a fallback definition, though I don't know what it
should be.

--
Thomas Munro
http://www.enterprisedb.com
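One possible shape for such a fallback, sketched here with an empty string
as an arbitrary placeholder value:

    #if defined(USE_OPENSSL) || defined(USE_SECURETRANSPORT)
    #define USE_SSL
    #if defined(USE_OPENSSL)
    #define SSL_LIBRARY "OpenSSL"
    #elif defined(USE_SECURETRANSPORT)
    #define SSL_LIBRARY "Secure Transport"
    #endif
    #else
    /* no SSL support compiled in; keep guc.c compiling anyway */
    #define SSL_LIBRARY ""
    #endif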
[HACKERS] Re: [COMMITTERS] pgsql: Account for catalog snapshot in PGXACT->xmin updates.
On Tue, Dec 06, 2016 at 01:59:18PM -0500, Robert Haas wrote:
> On Tue, Dec 6, 2016 at 1:36 PM, Tom Lane wrote:
> > Robert Haas writes:
> > > On Tue, Nov 15, 2016 at 3:55 PM, Tom Lane wrote:
> > > > Account for catalog snapshot in PGXACT->xmin updates.
> >
> > > It seems to me that this makes it possible for
> > > ThereAreNoPriorRegisteredSnapshots() to fail spuriously.  The only
> > > core code that calls that function is in copy.c, and apparently we
> > > never reach that point with the catalog snapshot set.  But that
> > > seems fragile.

I recently noticed this by way of the copy2 regression test failing, in
9.4 and later, under REPEATABLE READ and SERIALIZABLE.  That failure
started with $SUBJECT.  Minimal example:

  CREATE TABLE vistest (a text);
  BEGIN ISOLATION LEVEL REPEATABLE READ;
  TRUNCATE vistest;
  COPY vistest FROM stdin CSV FREEZE;
  x
  \.

Before $SUBJECT, the registered snapshot count was one.  Since $SUBJECT,
the catalog snapshot raises the count to two.

> > Hm.  If there were some documentation explaining what
> > ThereAreNoPriorRegisteredSnapshots is supposed to do, it would be
> > possible for us to have a considered argument as to whether this patch
> > broke it or not.  As things stand, it is not obvious whether it ought
> > to be ignoring the catalog snapshot or not; one could construct
> > plausible arguments

The rows COPY FREEZE writes will be visible (until deleted) to every
possible catalog snapshot, so whether CatalogSnapshot==NULL doesn't
matter.  It may be useful to error out if a catalog scan is in-flight, but
the registration for CatalogSnapshot doesn't confirm or refute that.

> > either way.  It's not even very obvious why both 0 and 1 registered
> > snapshots should be considered okay; that looks a lot more like a hack
> > than reasoned-out behavior.  So I'm satisfied if COPY FREEZE does what

Fair summary.  ThereAreNoPriorRegisteredSnapshots() has functioned that
way since an early submission[1], and I don't see that we ever discussed
it explicitly.  Adding Simon for his recollection.

I think the goal was to raise an error if any scan of the COPY-target
table might use an older CommandId; this prevents a way of seeing tuples
that the scan would not see after COPY without FREEZE.  Allowing one
registered snapshot was a satisfactory heuristic at the time.  It rejected
running plain queries, which have been registering two snapshots.  It did
not reject otherwise-idle REPEATABLE READ transactions, which register
FirstXactSnapshot once; that has been okay, because any use of
FirstXactSnapshot in a query pushes or registers a copy.  I think the code
was wrong to consider registered snapshots and ignore ActiveSnapshot,
though.  (We must ignore COPY's own ActiveSnapshot, but it should be the
only one.)  I don't blame $SUBJECT for changing this landscape; COPY
should not attribute meaning to a particular nonzero registered snapshot
count.

> > it's supposed to do, which it still appears to do.
> >
> > IOW, I'm not interested in touching this unless someone first provides
> > adequate documentation for the pre-existing kluge here.  There is no
> > API spec at all on the function itself, and the comment in front of
> > the one call site is too muddle-headed to provide any insight.
>
> COPY FREEZE is designed to be usable only in situations where the new
> tuples won't be visible earlier than would otherwise be the case.
> Thus, copy.c has a check that the relfilenode was created by our
> (sub)transaction, which guards against the possibility that other
> transactions might see the data early, and also a check that there are
> no other registered snapshots or ready portals, which guarantees that,
> for example, a cursor belonging to the current subtransaction but with
> a lower command-ID doesn't see the rows early.  I am fuzzy why we need
> to check for both snapshots and portals, but the overall intent seems

The snapshot check helps in this case:

  CREATE TABLE vistest (a text);
  BEGIN ISOLATION LEVEL READ COMMITTED;
  CREATE FUNCTION copy_vistest() RETURNS void LANGUAGE plpgsql
  AS $$BEGIN COPY vistest FROM '/dev/null' CSV FREEZE; END$$;
  CREATE FUNCTION count_vistest() RETURNS int LANGUAGE plpgsql STABLE
  AS $$BEGIN RETURN (SELECT count(*) FROM vistest); END$$;
  TRUNCATE vistest;
  SELECT copy_vistest(), count_vistest();
  ROLLBACK;

Other sessions still see rows they ideally would not see[2], the same kind
of anomaly the portal and snapshot tests exist to prevent.  It's odd that
we protect visibility solely within the COPY transaction, the transaction
most likely to know what it's doing.  I'm therefore skeptical of having
these tests at all, but I hesitate to remove them in back branches.  Even
in master, we may want them later if we add visibility protection for
other transactions.

I'm tempted to add this hack, ...

--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2445,2 +2445,3 @@ CopyFrom(CopyState cstate)
 	{
+		InvalidateCatalogSnapshot();
Re: [HACKERS] Proposal: global index
Ildar Musin wrote:

> I found the thread about indirect indexes
> (https://www.postgresql.org/message-id/20161018182843.xczrxsa2yd47pnru%40alvherre.pgsql),
> but it seems that it hasn't been updated for some time and I couldn't
> find InMemoryIndexTuple in the latest patch.  Is there a newer version?
> Generally I think this may be a good idea.

Nope, I haven't posted a version with the InMemoryIndexTuple stuff yet.
I'm working on it.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] GSoC 2017: Foreign Key Arrays
I have a concern that after supporting UPDATE/DELETE CASCADE, the
performance would drop.

On Thu, Jul 27, 2017 at 12:54 PM, Alexander Korotkov wrote:
>
> I wonder how an RI trigger may work so fast if it has to do some job
> besides an index search with no results?
>

Best Regards,
Mark Rofail
Re: [HACKERS] Segmentation Fault during pg_restore using '--use-list' and '--jobs'
Fabrízio de Royes Mello writes:
> I'm facing a 'segmentation fault' error using '--use-list' and '--jobs'
> options after update to 9.5.8.

Ooops :-(.  Fixed, thanks for the report.

			regards, tom lane
Re: [HACKERS] One-shot expanded output in psql using \gx
Noah, all,

* Noah Misch (n...@leadboat.com) wrote:
> On Tue, Aug 15, 2017 at 10:24:34PM +0200, Tobias Bussmann wrote:
> > I've tested the new \gx against 10beta and current git HEAD.  Actually
> > one of my favourite features of PostgreSQL 10!  However in my
> > environment it was behaving strangely.  After some debugging I found
> > that \gx does not work if you have \set FETCH_COUNT n before.  Please
> > find attached a patch that fixes this incl. new regression test.
>
> [Action required within three days.  This is a generic notification.]

I'll address this on Tuesday, 8/22.

Thanks!

Stephen
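For anyone wanting to reproduce the reported interaction, a minimal psql
session might look like this (a sketch; exact output depends on the
build):

    \set FETCH_COUNT 100
    SELECT 1 AS a, 2 AS b \gx
    -- expected: expanded ("-[ RECORD 1 ]") output
    -- reported: \gx is silently ignored and ordinary tabular output
    -- appears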
Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?
"MauMau" writes: > I'm thinking of making PostgreSQL a multi-model database by supporting > data models other than the current relational model. A data model > consists of a query language (e.g. SQL for relational model, Cypher > for graph model), a parser and analyzer to transform a query into a > query tree, a planner to transform the query tree into an execution > plan, an executor, and a storage engine. It sounds like what you want is to replace all of Postgres except the name. I'm not clear on the point. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?
On Sat, Aug 19, 2017 at 4:29 PM, MauMau wrote:
> Hello,
>
> Please forgive me for asking such a stupid and rough question.
>
> I'm thinking of making PostgreSQL a multi-model database by supporting
> data models other than the current relational model.  A data model
> consists of a query language (e.g. SQL for the relational model, Cypher
> for the graph model), a parser and analyzer to transform a query into a
> query tree, a planner to transform the query tree into an execution
> plan, an executor, and a storage engine.
>
> To promote data model development, I want to make data models
> pluggable.  The rough sketch is:
>
> 1) A data model developer implements the parser, analyzer, transformer,
> planner, executor, and storage engine functions in a shared library.
>
> 2) The DBA registers the data model.
>
> CREATE QUERY LANGUAGE Cypher (
>     PARSER = ...
> );
>
> CREATE DATA MODEL graph (
>     QUERY LANGUAGE = Cypher,
>     ANALYZER = ...,
>     TRANSFORMER = ...,
>     PLANNER = ...,
>     EXECUTOR = ...,
>     STORAGE ENGINE = ...
> );
>
> CREATE REGION cypher_graph (
>     QUERY LANGUAGE = Cypher,
>     DATA MODEL = graph
> );

Why can't you do all this in a language handler and treat it as a user
defined function?

> The region is just a combination of a query language and a data model,
> much like a locale is a combination of a language and a country.  This
> is because there may be multiple popular query languages for a data
> model.
>
> 3) The application connects to the database, specifying a desired
> region.  The specified region's query language becomes the default
> query language for the session.
>
> The application can use the data of multiple data models in one query
> by specifying another region and its query via in_region().  For
> example, the following query combines the relational restaurant table
> and a social graph to get the five Chinese restaurants in Tokyo that
> are most popular among friends of John and their friends.
>
> SELECT r.name, g.num_likers
> FROM restaurant r,
>      cast_region(
>        in_region('cypher_graph',
>          'MATCH (:Person {name:"John"})-[:IS_FRIEND_OF*1..2]-(friend),
>                 (friend)-[:LIKES]->(restaurant:Restaurant)
>           RETURN restaurant.name, count(*)'),
>        'relational', 'g', '(name text, num_likers int)')
> WHERE r.name = g.name AND
>       r.city = 'Tokyo' AND r.cuisine = 'chinese'
> ORDER BY g.num_likers DESC LIMIT 5;

If you have a language handler for cypher, why do you need in_region or
cast_region?  Why not just have a graph_search() function which takes in
a cypher query and returns a set of records?  (A sketch of what I mean is
below.)

> What do you think would be difficult about making data models pluggable,
> especially related to plugging in the parser, planner, executor, etc.?
> One possible concern is that various PostgreSQL components might be too
> dependent on the data model being relational, and it would be difficult
> to separate the tight coupling.

I guess I am missing why the current language handler structure is not
enough.  Maybe I am missing something?

> Regards
> MauMau

--
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin
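To make that alternative concrete, here is a sketch assuming a
hypothetical procedural language "cypher" installed through a language
handler; none of these names exist today:

    -- hypothetical: the function body is a Cypher query, planned and
    -- executed entirely inside the handler
    CREATE FUNCTION friends_favorite_restaurants(person text)
    RETURNS TABLE (name text, num_likers int)
    LANGUAGE cypher AS $$
        MATCH (:Person {name: $person})-[:IS_FRIEND_OF*1..2]-(friend),
              (friend)-[:LIKES]->(restaurant:Restaurant)
        RETURN restaurant.name, count(*)
    $$;

    -- joined against a relational table like any set-returning function
    SELECT r.name, g.num_likers
    FROM restaurant r
         JOIN friends_favorite_restaurants('John') g ON r.name = g.name
    WHERE r.city = 'Tokyo' AND r.cuisine = 'chinese'
    ORDER BY g.num_likers DESC LIMIT 5;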
[HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?
Hello,

Please forgive me for asking such a stupid and rough question.

I'm thinking of making PostgreSQL a multi-model database by supporting
data models other than the current relational model.  A data model
consists of a query language (e.g. SQL for the relational model, Cypher
for the graph model), a parser and analyzer to transform a query into a
query tree, a planner to transform the query tree into an execution plan,
an executor, and a storage engine.

To promote data model development, I want to make data models pluggable.
The rough sketch is:

1) A data model developer implements the parser, analyzer, transformer,
planner, executor, and storage engine functions in a shared library.

2) The DBA registers the data model.

CREATE QUERY LANGUAGE Cypher (
    PARSER = ...
);

CREATE DATA MODEL graph (
    QUERY LANGUAGE = Cypher,
    ANALYZER = ...,
    TRANSFORMER = ...,
    PLANNER = ...,
    EXECUTOR = ...,
    STORAGE ENGINE = ...
);

CREATE REGION cypher_graph (
    QUERY LANGUAGE = Cypher,
    DATA MODEL = graph
);

The region is just a combination of a query language and a data model,
much like a locale is a combination of a language and a country.  This is
because there may be multiple popular query languages for a data model.

3) The application connects to the database, specifying a desired region.
The specified region's query language becomes the default query language
for the session.

The application can use the data of multiple data models in one query by
specifying another region and its query via in_region().  For example,
the following query combines the relational restaurant table and a social
graph to get the five Chinese restaurants in Tokyo that are most popular
among friends of John and their friends.

SELECT r.name, g.num_likers
FROM restaurant r,
     cast_region(
       in_region('cypher_graph',
         'MATCH (:Person {name:"John"})-[:IS_FRIEND_OF*1..2]-(friend),
                (friend)-[:LIKES]->(restaurant:Restaurant)
          RETURN restaurant.name, count(*)'),
       'relational', 'g', '(name text, num_likers int)')
WHERE r.name = g.name AND
      r.city = 'Tokyo' AND r.cuisine = 'chinese'
ORDER BY g.num_likers DESC LIMIT 5;

What do you think would be difficult about making data models pluggable,
especially related to plugging in the parser, planner, executor, etc.?
One possible concern is that various PostgreSQL components might be too
dependent on the data model being relational, and it would be difficult
to separate the tight coupling.

Regards
MauMau
Re: [HACKERS] recovery_target_time = 'now' is not an error but still impractical setting
On 2017-08-18 20:51, Robert Haas wrote:
> On Fri, Aug 18, 2017 at 4:39 AM, Piotr Stefaniak wrote:
>> On 2017-08-17 11:24, Simon Riggs wrote:
>>> Your suggestion of "furthest" is already the default behaviour.
>>>
>>> Why are you using 'now'?  Why would you want to pick a randomly
>>> selected end time?
>>
>> The idea in both cases was to stop the recovery in order to prevent
>> the standby from selecting a new timeline.  I want to be able to
>> continue the recovery from future WAL files.  "Furthest" really meant
>> "as far as possible without completing recovery".
>
> Can you use recovery_target_action='shutdown' instead?

The thing I've tried was a combination of recovery_target_action =
'shutdown' and recovery_target_time = 'now'.  Do you mean
recovery_target_action = 'shutdown' and everything else unset (default)?
That leads to the standby selecting a new timeline anyway.  Adding
standby_mode = on prevents that, but then there is no shutdown.  Am I
missing something?

The only way I've managed to get recovery_target_action = 'shutdown' to
work as I needed was to follow advice by Matthijs and Christoph to
combine recovery_target_action with either setting recovery_target_time
to "now" spelled in the usual, non-special way or setting
recovery_target_xid to the latest transaction ID pg_xlogdump could find.
You could also try recovery_target_lsn, but I don't have that in 9.4.

I don't like that line of thought though, so for now I'll use the
restore_command hack I mentioned in the first message of this thread.
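For reference, the working combination described above would look roughly
like this in recovery.conf; the path and timestamp are placeholders, and
recovery_target_action needs 9.5 or later:

    restore_command = 'cp /path/to/archive/%f "%p"'
    recovery_target_time = '2017-08-20 12:00:00+02'  # "now" spelled explicitly
    recovery_target_action = 'shutdown'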
Re: [HACKERS] Proposal: global index
On 18/08/2017 17:40, Alvaro Herrera wrote:
> Ildar Musin wrote:
>
>> While we've been developing the pg_pathman extension, one of the most
>> frequent questions we got from our users was about global index
>> support.  We cannot provide it within an extension.  And I couldn't
>> find any recent discussion about someone implementing it.  So I'm
>> thinking about giving it a shot and starting to work on a patch for
>> postgres.
>>
>> One possible solution is to create an extended version of the item
>> pointer which would store the relation oid along with the block number
>> and position:
>
> I've been playing with the index code in order to allow indirect
> tuples, which are stored in a format different from IndexTupleData.
>
> I've been adding an "InMemoryIndexTuple" (maybe there's a better name)
> which internally has pointers to both IndexTupleData and
> IndirectIndexTupleData, which makes it easier to pass around the index
> tuple in either format.  It's very easy to add an OID to that struct,
> which then allows including the OID in either an indirect index tuple
> or a regular one.
>
> Then, wherever we're using IndexTupleData in the index AM code, we
> would replace it with InMemoryIndexTuple.  This should satisfy both
> your use case and mine.

I found the thread about indirect indexes
(https://www.postgresql.org/message-id/20161018182843.xczrxsa2yd47pnru%40alvherre.pgsql),
but it seems that it hasn't been updated for some time and I couldn't
find InMemoryIndexTuple in the latest patch.  Is there a newer version?
Generally I think this may be a good idea.

--
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
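For illustration, the "extended item pointer" idea quoted above could
look something like this; the struct and field names are guesses, not
taken from a patch:

    /* hypothetical global-index TID: an ordinary TID plus the heap's OID */
    typedef struct GlobalItemPointerData
    {
        Oid          ip_relid;   /* OID of the partition (heap relation) */
        BlockIdData  ip_blkid;   /* block number, as in ItemPointerData */
        OffsetNumber ip_posid;   /* line pointer offset within the block */
    } GlobalItemPointerData;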
Re: [HACKERS] Proposal: global index
On 18/08/2017 17:45, Alvaro Herrera wrote:
> Erikjan Rijkers wrote:
>> On 2017-08-18 11:12, Ildar Musin wrote:
>>> Hi hackers,
>>>
>>> While we've been developing the pg_pathman extension, one of the most
>>> frequent questions we got from our users was about global index
>>> support.  We cannot provide it within an extension.  And I couldn't
>>> find any recent discussion about someone implementing it.  So I'm
>>> thinking about giving it a shot and starting to work on a patch for
>>> postgres.
>>
>> Sorry to be dense but what exactly is a "Global Index"?
>
> A global index covers all partitions of a partitioned table.  It allows
> you to have unique indexes across the partitioned table.
>
> The main disadvantage of global indexes is that you need some kind of
> cleanup after you drop a partition.  Either make partition drop wait
> until all the index pointers are removed, or you need some kind of
> after-commit cleanup process that removes them afterwards (which
> requires some assurance that they are really all gone).  You can't let
> them linger forever, or you risk a new partition that reuses the same
> OID causing the whole index to become automatically corrupt.

Thanks for pointing that out; I haven't thought this through yet.  We
could probably keep the list of removed partitions somewhere in the
catalog or in the index itself.  Usually autovacuum (or the process we
start right after a partition drop, as you suggested) would clean the
index up before OID wraparound.  But if it didn't, and a user is trying
to add a new partition with the same OID, then the cleanup would be
forced.  I think the latter is very unlikely.

--
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: [HACKERS] possible encoding issues with libxml2 functions
>
> Isn't the most correct solution to call the xml_parse function?
>

I am replying to myself.  Probably not.

Now I think I have found the reason for this issue.

The document processed in xpath_internal is passed to libXML2 by

    doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);

We don't pass an encoding parameter, so libXML2 expects "UTF8" or a
correct encoding declaration in the XML document.  When we pass an
incorrect document - XML in the database encoding, but with the original
encoding declaration - it fails.

The regression test can look like yours (but all chars there are valid):

postgres=# do $$
declare str text;
begin
  if current_setting('server_encoding') <> 'UTF8' then return; end if;
  str = '<?xml version="1.0" encoding="latin2"?><enprimeur><vino><id>909</id>'
     || convert_from('\xc588', 'UTF8')
     || '</vino></enprimeur>';
  raise notice '%', xpath('/enprimeur/vino/id', str::xml);
end;
$$;
ERROR:  could not parse XML document
DETAIL:  input conversion failed due to input error, bytes 0x88 0x3C 0x2F 0x72
line 1: switching encoding: encoder error
CONTEXT:  PL/pgSQL function inline_code_block line 8 at RAISE

After the correct fix:

    doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL,
                            pg_encoding_to_char(GetDatabaseEncoding()), 0);

it is working:

postgres=# do $$
declare str text;
begin
  if current_setting('server_encoding') <> 'UTF8' then return; end if;
  str = '<?xml version="1.0" encoding="latin2"?><enprimeur><vino><id>909</id>'
     || convert_from('\xc588', 'UTF8')
     || '</vino></enprimeur>';
  raise notice '%', xpath('/enprimeur/vino/id', str::xml);
end;
$$;
NOTICE:  {<id>909</id>}
DO

This fix should be applied to the xmltable function too.  Patch attached.

It doesn't fix the xpath and xmltable function issues when the server
encoding is not UTF8.  It looks like the XPath functions from libXML2
require UTF8-encoded strings and the result is in UTF8 too - so the
result should be recoded to the server encoding.  I didn't find any info
on how to enable libXML2 XPath functions for encodings other than
UTF8 :( ??

Regards

Pavel

> Regards
>
> Pavel

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index c47624eff6..b09ce23cdb 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3911,7 +3911,9 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
 	if (ctxt == NULL || xmlerrcxt->err_occurred)
 		xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
 					"could not allocate parser context");
-	doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
+	doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL,
+							pg_encoding_to_char(GetDatabaseEncoding()), 0);
+
 	if (doc == NULL || xmlerrcxt->err_occurred)
 		xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
 					"could not parse XML document");
@@ -4242,18 +4244,14 @@ XmlTableSetDocument(TableFuncScanState *state, Datum value)
 
 	xtCxt = GetXmlTableBuilderPrivateData(state, "XmlTableSetDocument");
 
-	/*
-	 * Use out function for casting to string (remove encoding property). See
-	 * comment in xml_out.
-	 */
-	str = xml_out_internal(xmlval, 0);
-
-	length = strlen(str);
+	str = VARDATA(xmlval);
+	length = VARSIZE(xmlval) - VARHDRSZ;
 	xstr = pg_xmlCharStrndup(str, length);
 
 	PG_TRY();
 	{
-		doc = xmlCtxtReadMemory(xtCxt->ctxt, (char *) xstr, length, NULL, NULL, 0);
+		doc = xmlCtxtReadMemory(xtCxt->ctxt, (char *) xstr, length, NULL,
+								pg_encoding_to_char(GetDatabaseEncoding()), 0);
 		if (doc == NULL || xtCxt->xmlerrcxt->err_occurred)
 			xml_ereport(xtCxt->xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
 						"could not parse XML document");
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index bcc585d427..6a43896d40 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -1452,3 +1452,24 @@ SELECT xmltable.* FROM xmltest2, LATERAL xmltable(('/d/r/' || lower(_path) || 'c
  14
 (4 rows)
 
+-- XML is saved in database encoding with original encoding declaration.
+-- There can be incosistency based on wrong user input, different server/client
+-- encoding or reading XML with recv function. All XML functions should to
+-- work with this partially broken XML.
+DO $$
+DECLARE str text;
+BEGIN
+  -- leave early without error, when we are not sure about result of conversion
+  IF current_setting('server_encoding') NOT IN ('UTF8', 'LATIN2') THEN return; END IF;
+
+  -- build valid UTF8 XML with broken encoding declaration
+  str = '<?xml version="1.0" encoding="windows-1250"?><enprimeur><vino><id>909</id>'
+     || convert_from('\xf2', 'windows-1250')
+     || '</vino></enprimeur>';
+
+  -- should to work
+  RAISE NOTICE '%', xpath('/enprimeur/vino/id', str::xml);
+  RAISE NOTICE '%', (SELECT id FROM xmltable('/enprimeur/vino' PASSING (str::xml) COLUMNS id int));
+END; $$;
+NOTICE:  {<id>909</id>}
+NOTICE:  909
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index eb4687fb09..97a3aa9de2 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -558,3 +558,23 @@ INSERT INTO xmltest2 VALUES('2', 'D');